Posted to dev@flink.apache.org by Jeff Zhang <zj...@gmail.com> on 2019/05/29 08:17:49 UTC

Re: [DISCUSS] Flink client api enhancement for downstream project

Sorry folks, the design doc is later than expected. Here's the design doc
I drafted; welcome any comments and feedback.

https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing



On Thu, Feb 14, 2019 at 8:43 PM Stephan Ewen <se...@apache.org> wrote:

> Nice that this discussion is happening.
>
> In the FLIP, we could also revisit the entire role of the environments
> again.
>
> Initially, the idea was:
>   - the environments take care of the specific setup for standalone (no
> setup needed), yarn, mesos, etc.
>   - the session ones have control over the session. The environment holds
> the session client.
>   - running a job gives a "control" object for that job. That behavior is
> the same in all environments.
>
> The actual implementation diverged quite a bit from that. Happy to see a
> discussion about straightening this out a bit more.
>
>
> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zj...@gmail.com> wrote:
>
> > Hi folks,
> >
> > Sorry for the late response. It seems we have reached consensus on this,
> > so I will create a FLIP with a more detailed design.
> >
> >
> > On Fri, Dec 21, 2018 at 11:43 AM Thomas Weise <th...@apache.org> wrote:
> >
> > > Great to see this discussion seeded! The problems you face with the
> > > Zeppelin integration are also affecting other downstream projects, like
> > > Beam.
> > >
> > > We just enabled the savepoint restore option in RemoteStreamEnvironment
> > [1]
> > > and that was more difficult than it should be. The main issue is that
> > > environment and cluster client aren't decoupled. Ideally it should be
> > > possible to just get the matching cluster client from the environment
> and
> > > then control the job through it (environment as factory for cluster
> > > client). But note that the environment classes are part of the public
> > API,
> > > and it is not straightforward to make larger changes without breaking
> > > backward compatibility.
> > >
> > > ClusterClient currently exposes internal classes like JobGraph and
> > > StreamGraph. But it should be possible to wrap this with a new public
> API
> > > that brings the required job control capabilities for downstream
> > projects.
> > > Perhaps it is helpful to look at some of the interfaces in Beam while
> > > thinking about this: [2] for the portable job API and [3] for the old
> > > asynchronous job control from the Beam Java SDK.
> > >
> > > The backward compatibility discussion [4] is also relevant here. A new
> > API
> > > should shield downstream projects from internals and allow them to
> > > interoperate with multiple future Flink versions in the same release
> line
> > > without forced upgrades.
> > >
> > > Thanks,
> > > Thomas
> > >
> > > [1] https://github.com/apache/flink/pull/7249
> > > [2]
> > >
> > >
> >
> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > > [3]
> > >
> > >
> >
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > > [4]
> > >
> > >
> >
> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> > >
> > >
> > > On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zj...@gmail.com> wrote:
> > >
> > > > >>> I'm not so sure whether the user should be able to define where
> the
> > > job
> > > > runs (in your example Yarn). This is actually independent of the job
> > > > development and is something which is decided at deployment time.
> > > >
> > > > Users don't need to specify the execution mode programmatically. They
> > > > can also pass the execution mode via arguments to the "flink run"
> > > > command, e.g.
> > > >
> > > > bin/flink run -m yarn-cluster ....
> > > > bin/flink run -m local ...
> > > > bin/flink run -m host:port ...
> > > >
> > > > Does this make sense to you ?
> > > >
> > > > >>> To me it makes sense that the ExecutionEnvironment is not directly
> > > > initialized by the user and is instead context-sensitive to how you
> > > > want to execute your job (Flink CLI vs. IDE, for example).
> > > >
> > > > Right. Currently I notice that Flink creates different
> > > > ContextExecutionEnvironments based on the submission scenario (Flink
> > > > CLI vs. IDE). To me this is a kind of hack, not very straightforward.
> > > > What I suggested above is that Flink should always create the same
> > > > ExecutionEnvironment but with a different configuration, and based on
> > > > that configuration it would create the proper ClusterClient for the
> > > > different behaviors.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Dec 20, 2018 at 11:18 PM Till Rohrmann <tr...@apache.org> wrote:
> > > >
> > > > > You are probably right that we have code duplication when it comes
> > > > > to the creation of the ClusterClient. This should be reduced in the
> > > > > future.
> > > > >
> > > > > I'm not so sure whether the user should be able to define where the
> > > > > job runs (in your example Yarn). This is actually independent of the
> > > > > job development and is something which is decided at deployment time.
> > > > > To me it makes sense that the ExecutionEnvironment is not directly
> > > > > initialized by the user and is instead context-sensitive to how you
> > > > > want to execute your job (Flink CLI vs. IDE, for example). However, I
> > > > > agree that the ExecutionEnvironment should give you access to the
> > > > > ClusterClient and to the job (maybe in the form of the JobGraph or a
> > > > > job plan).
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zj...@gmail.com>
> wrote:
> > > > >
> > > > > > Hi Till,
> > > > > > Thanks for the feedback. You are right that I expect a better
> > > > > > programmatic job submission/control API which could be used by
> > > > > > downstream projects, and it would benefit the Flink ecosystem. When
> > > > > > I look at the code of Flink's scala-shell and sql-client (I believe
> > > > > > they are not the core of Flink, but belong to its ecosystem), I
> > > > > > find a lot of duplicated code for creating a ClusterClient from
> > > > > > user-provided configuration (the configuration format may differ
> > > > > > between scala-shell and sql-client) and then using that
> > > > > > ClusterClient to manipulate jobs. I don't think this is convenient
> > > > > > for downstream projects. What I expect is that a downstream project
> > > > > > only needs to provide the necessary configuration info (maybe by
> > > > > > introducing a class FlinkConf), then build an ExecutionEnvironment
> > > > > > based on this FlinkConf, and the ExecutionEnvironment will create
> > > > > > the proper ClusterClient. This would not only benefit downstream
> > > > > > project development but also help their integration tests with
> > > > > > Flink. Here's one sample code snippet that I expect:
> > > > > >
> > > > > > val conf = new FlinkConf().mode("yarn")
> > > > > > val env = new ExecutionEnvironment(conf)
> > > > > > val jobId = env.submit(...)
> > > > > > val jobStatus = env.getClusterClient().queryJobStatus(jobId)
> > > > > > env.getClusterClient().cancelJob(jobId)
> > > > > >
> > > > > > What do you think ?
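Fleshed out, the snippet above might look like the following Java sketch. FlinkConf, ExecutionEnvironment, and ClusterClient here are hypothetical stand-ins for the API the proposal describes, not existing Flink classes, so everything is stubbed for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// All names below (FlinkConf, ClusterClient, ...) are hypothetical
// stand-ins for the API sketched in this thread, not real Flink classes.
public class ClientApiSketch {

    static class FlinkConf {
        private String mode = "local";
        FlinkConf mode(String mode) { this.mode = mode; return this; }
        String mode() { return mode; }
    }

    interface ClusterClient {
        String submit(String jobPlan);        // returns a job id
        String queryJobStatus(String jobId);
        void cancelJob(String jobId);
    }

    // A trivial in-memory client standing in for e.g. a YARN or REST client.
    static class StubClusterClient implements ClusterClient {
        private final Map<String, String> jobs = new HashMap<>();
        public String submit(String jobPlan) {
            String id = UUID.randomUUID().toString();
            jobs.put(id, "RUNNING");
            return id;
        }
        public String queryJobStatus(String jobId) { return jobs.get(jobId); }
        public void cancelJob(String jobId) { jobs.put(jobId, "CANCELED"); }
    }

    static class ExecutionEnvironment {
        private final ClusterClient client;
        ExecutionEnvironment(FlinkConf conf) {
            // The env, not the caller, decides which client to build. A real
            // implementation would switch on conf.mode() ("yarn", "local",
            // host:port, ...); here every mode maps to the same stub.
            this.client = new StubClusterClient();
        }
        ClusterClient getClusterClient() { return client; }
        String submit(String jobPlan) { return client.submit(jobPlan); }
    }

    public static void main(String[] args) {
        FlinkConf conf = new FlinkConf().mode("yarn");
        ExecutionEnvironment env = new ExecutionEnvironment(conf);
        String jobId = env.submit("word-count-plan");
        System.out.println("status: " + env.getClusterClient().queryJobStatus(jobId));
        env.getClusterClient().cancelJob(jobId);
        System.out.println("status: " + env.getClusterClient().queryJobStatus(jobId));
    }
}
```

The point of the sketch is that the caller never chooses a concrete client; the environment derives it from the configuration.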
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Dec 11, 2018 at 6:28 PM Till Rohrmann <tr...@apache.org> wrote:
> > > > > >
> > > > > > > Hi Jeff,
> > > > > > >
> > > > > > > what you are proposing is to provide the user with better
> > > > > > > programmatic job control. There was actually an effort to achieve
> > > > > > > this, but it has never been completed [1]. However, there are
> > > > > > > some improvements in the code base now. Look for example at the
> > > > > > > NewClusterClient interface, which offers non-blocking job
> > > > > > > submission. But I agree that we need to improve Flink in this
> > > > > > > regard.
> > > > > > >
> > > > > > > I would not be in favour of exposing all ClusterClient calls via
> > > > > > > the ExecutionEnvironment because it would clutter the class and
> > > > > > > would not be a good separation of concerns. Instead, one idea
> > > > > > > could be to retrieve the current ClusterClient from the
> > > > > > > ExecutionEnvironment, which can then be used for cluster and job
> > > > > > > control. But before we start an effort here, we need to agree on
> > > > > > > and capture what functionality we want to provide.
> > > > > > >
> > > > > > > Initially, the idea was that we have the ClusterDescriptor
> > > > > > > describing how to talk to a cluster manager like Yarn or Mesos.
> > > > > > > The ClusterDescriptor can be used for deploying Flink clusters
> > > > > > > (job and session) and gives you a ClusterClient. The
> > > > > > > ClusterClient controls the cluster (e.g. submitting jobs, listing
> > > > > > > all running jobs). And then there was the idea to introduce a
> > > > > > > JobClient which you obtain from the ClusterClient to trigger
> > > > > > > job-specific operations (e.g. taking a savepoint, cancelling the
> > > > > > > job).
> > > > > > >
> > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-4272
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zj...@gmail.com>
> > > > wrote:
> > > > > > >
> > > > > > > > Hi Folks,
> > > > > > > >
> > > > > > > > I am trying to integrate Flink into Apache Zeppelin, which is
> > > > > > > > an interactive notebook, and I hit several issues caused by the
> > > > > > > > Flink client API. So I'd like to propose the following changes
> > > > > > > > to the Flink client API.
> > > > > > > >
> > > > > > > > 1. Support nonblocking execution. Currently,
> > > > > > > > ExecutionEnvironment#execute is a blocking method which does
> > > > > > > > two things: first submit the job, then wait until it is
> > > > > > > > finished. I'd like to introduce a nonblocking execution method
> > > > > > > > like ExecutionEnvironment#submit, which only submits the job
> > > > > > > > and then returns the jobId to the client, allowing users to
> > > > > > > > query the job status via the jobId.
> > > > > > > >
> > > > > > > > 2. Add a cancel API in
> > > > > > > > ExecutionEnvironment/StreamExecutionEnvironment. Currently the
> > > > > > > > only way to cancel a job is via the CLI (bin/flink), which is
> > > > > > > > not convenient for downstream projects. So I'd like to add a
> > > > > > > > cancel API in ExecutionEnvironment.
> > > > > > > >
> > > > > > > > 3. Add a savepoint API in
> > > > > > > > ExecutionEnvironment/StreamExecutionEnvironment. Similar to the
> > > > > > > > cancel API, we should use ExecutionEnvironment as the unified
> > > > > > > > API for third parties to integrate with Flink.
> > > > > > > >
> > > > > > > > 4. Add a listener for the job execution lifecycle, something
> > > > > > > > like the following, so that downstream projects can run custom
> > > > > > > > logic at each stage of a job's lifecycle. E.g. Zeppelin would
> > > > > > > > capture the jobId after the job is submitted and then use this
> > > > > > > > jobId to cancel it later when necessary.
> > > > > > > >
> > > > > > > > public interface JobListener {
> > > > > > > >
> > > > > > > >    void onJobSubmitted(JobID jobId);
> > > > > > > >
> > > > > > > >    void onJobExecuted(JobExecutionResult jobResult);
> > > > > > > >
> > > > > > > >    void onJobCanceled(JobID jobId);
> > > > > > > > }
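A downstream project such as Zeppelin might use the proposed listener roughly as sketched below. JobID and JobExecutionResult are stubbed here, since this is only an illustration of the proposed hook, not the real Flink types:

```java
import java.util.ArrayList;
import java.util.List;

public class JobListenerSketch {

    // Minimal stand-ins for Flink's JobID / JobExecutionResult types,
    // just enough to exercise the proposed listener interface.
    static class JobID {
        final String id;
        JobID(String id) { this.id = id; }
        public String toString() { return id; }
    }

    static class JobExecutionResult {
        final JobID jobId;
        JobExecutionResult(JobID jobId) { this.jobId = jobId; }
    }

    // The listener interface as proposed in the mail.
    interface JobListener {
        void onJobSubmitted(JobID jobId);
        void onJobExecuted(JobExecutionResult jobResult);
        void onJobCanceled(JobID jobId);
    }

    // What a notebook like Zeppelin might do: remember submitted job ids
    // so it can cancel them later on user request.
    static class CapturingListener implements JobListener {
        final List<JobID> submitted = new ArrayList<>();
        public void onJobSubmitted(JobID jobId) { submitted.add(jobId); }
        public void onJobExecuted(JobExecutionResult jobResult) {
            submitted.remove(jobResult.jobId);
        }
        public void onJobCanceled(JobID jobId) { submitted.remove(jobId); }
    }

    public static void main(String[] args) {
        CapturingListener listener = new CapturingListener();
        JobID job = new JobID("job-1");
        listener.onJobSubmitted(job);   // the env would call this on submit
        System.out.println("tracked: " + listener.submitted.size());
        listener.onJobCanceled(job);    // ...and this on cancel
        System.out.println("tracked: " + listener.submitted.size());
    }
}
```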
> > > > > > > >
> > > > > > > > 5. Re-enable sessions in ExecutionEnvironment. Currently they
> > > > > > > > are disabled, but sessions are very convenient for third
> > > > > > > > parties that submit jobs continually. I hope Flink can enable
> > > > > > > > them again.
> > > > > > > >
> > > > > > > > 6. Unify all flink client api into
> > > > > > > > ExecutionEnvironment/StreamExecutionEnvironment.
> > > > > > > >
> > > > > > > > This is a long-term issue which needs more careful thinking
> > > > > > > > and design. Currently some Flink features are exposed in
> > > > > > > > ExecutionEnvironment/StreamExecutionEnvironment, but others are
> > > > > > > > exposed in the CLI instead of the API, like the cancel and
> > > > > > > > savepoint operations I mentioned above. I think the root cause
> > > > > > > > is that Flink didn't unify the interaction paths. Here I list 3
> > > > > > > > scenarios of Flink operation:
> > > > > > > >
> > > > > > > >    - Local job execution. Flink will create a LocalEnvironment
> > > > > > > >      and then use this LocalEnvironment to create a
> > > > > > > >      LocalExecutor for job execution.
> > > > > > > >    - Remote job execution. Flink will create a ClusterClient
> > > > > > > >      first, then create a ContextEnvironment based on the
> > > > > > > >      ClusterClient, and then run the job.
> > > > > > > >    - Job cancelation. Flink will create a ClusterClient first
> > > > > > > >      and then cancel the job via this ClusterClient.
> > > > > > > >
> > > > > > > > As you can see in the above 3 scenarios, Flink doesn't use the
> > > > > > > > same approach (code path) to interact with Flink. What I
> > > > > > > > propose is the following: create the proper
> > > > > > > > LocalEnvironment/RemoteEnvironment (based on user
> > > > > > > > configuration) --> use this Environment to create the proper
> > > > > > > > ClusterClient (LocalClusterClient or RestClusterClient) to
> > > > > > > > interact with Flink (job execution or cancelation).
> > > > > > > >
> > > > > > > > This way we can unify the process of local and remote
> > > > > > > > execution, and it becomes much easier for third parties to
> > > > > > > > integrate with Flink, because ExecutionEnvironment is the
> > > > > > > > unified entry point. What a third party needs to do is just
> > > > > > > > pass configuration to ExecutionEnvironment, and
> > > > > > > > ExecutionEnvironment will do the right thing based on that
> > > > > > > > configuration. The Flink CLI can also be considered a Flink API
> > > > > > > > consumer: it just passes the configuration to
> > > > > > > > ExecutionEnvironment and lets ExecutionEnvironment create the
> > > > > > > > proper ClusterClient, instead of creating the ClusterClient
> > > > > > > > directly.
> > > > > > > >
> > > > > > > >
> > > > > > > > Point 6 would involve a large code refactoring, so I think we
> > > > > > > > can defer it to a future release; 1-5 could be done at once, I
> > > > > > > > believe. Let me know your comments and feedback. Thanks!
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best Regards
> > > > > > > >
> > > > > > > > Jeff Zhang
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best Regards
> > > > > >
> > > > > > Jeff Zhang
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best Regards
> > > >
> > > > Jeff Zhang
> > > >
> > >
> >
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
> >
>


-- 
Best Regards

Jeff Zhang

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Zili Chen <wa...@gmail.com>.
Hi Stephan,

Thanks for waking up this thread.

Jeff and I had a discussion yesterday sharing our respective
observations and ideas on client API enhancements. We are
glad to make some progress in Flink 1.10.

It's really nice to hear that you're going to participate in
this thread soon. In the ML threads we can see a consensus on
client API enhancement, and through several offline discussions
we have rough ideas on how to introduce these enhancements.
Hopefully we can draft a FLIP based on Jeff's doc and the
discussions so far, and it would be appreciated if some of our
committers/PMCs could shepherd the FLIP.

By the way, there is a separate thread happening concurrently [1]
which, besides the major enhancements, helps make progress on
refactoring the client API.

Best,
tison.

[1]
https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E


On Wed, Jul 24, 2019 at 12:58 AM Stephan Ewen <se...@apache.org> wrote:

> Hi all!
>
> This thread has stalled for a bit, which I assume is mostly due to the
> Flink 1.9 feature freeze and release testing effort.
>
> I personally still recognize this issue as an important one to solve. I'd
> be happy to help resume this discussion soon (after the 1.9 release) and
> see if we can take some steps towards this in Flink 1.10.
>
> Best,
> Stephan
>
>
>
> On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <po...@okkam.it>
> wrote:
>
> > That's exactly what I suggested a long time ago: the Flink REST client
> > should not require any Flink dependency, only an HTTP library to call
> > the REST services to submit and monitor a job.
> > What I also suggested in [1] was to have a way to automatically suggest
> > to the user (via a UI) the available main classes and their required
> > parameters [2].
> > Another problem we have with Flink is that the REST client and the CLI
> > client behave differently, and we use the CLI client (via ssh) because
> > it allows calling other methods after env.execute() [3] (we have to
> > call another REST service to signal the end of the job).
> > In this regard, a dedicated interface, like the JobListener suggested
> > in the previous emails, would be very helpful (IMHO).
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-10864
> > [2] https://issues.apache.org/jira/browse/FLINK-10862
> > [3] https://issues.apache.org/jira/browse/FLINK-10879
> >
> > Best,
> > Flavio
> >
> > On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zj...@gmail.com> wrote:
> >
> > > Hi, Tison,
> > >
> > > Thanks for your comments. Overall I agree with you that it is
> > > difficult for downstream projects to integrate with Flink and that we
> > > need to refactor the current Flink client API.
> > > I also agree that CliFrontend should only parse command line arguments
> > > and then pass them to ExecutionEnvironment. It is
> > > ExecutionEnvironment's responsibility to compile the job, create the
> > > cluster, and submit the job. Besides that, Flink currently has many
> > > ExecutionEnvironment implementations, and it will use a specific one
> > > based on the context. IMHO, this is not necessary;
> > > ExecutionEnvironment should be able to do the right thing based on the
> > > FlinkConf it receives. Having too many ExecutionEnvironment
> > > implementations is another burden for downstream project integration.
> > >
> > > One thing I'd like to mention is Flink's scala-shell and sql-client:
> > > although they are sub-modules of Flink, they could be treated as
> > > downstream projects which use Flink's client API. Currently you will
> > > find it is not easy for them to integrate with Flink, and they share a
> > > lot of duplicated code. This is another sign that we should refactor
> > > the Flink client API.
> > >
> > > I believe it is a large and hard change, and I am afraid we cannot
> > > keep compatibility, since many of the changes are user facing.
> > >
> > >
> > >
On Mon, Jun 24, 2019 at 2:53 PM Zili Chen <wa...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > After a closer look at our client APIs, I can see there are two major
> > > > issues with consistency and integration, namely the deployment of a
> > > > job cluster, which couples job graph creation and cluster deployment,
> > > > and submission via CliFrontend, which confuses the control flow of
> > > > job graph compilation and job submission. I'd like to follow the
> > > > discussion above, mainly the process described by Jeff and Stephan,
> > > > and share my ideas on these issues.
> > > >
> > > > 1) CliFrontend confuses the control flow of job compilation and
> > > > submission. Following the process of job submission Stephan and Jeff
> > > > described, the execution environment knows all the configs of the
> > > > cluster and the topology/settings of the job. Ideally, the main
> > > > method of the user program calls #execute (or a method named
> > > > #submit), and Flink deploys the cluster, compiles the job graph and
> > > > submits it to the cluster. However, the current CliFrontend does all
> > > > these things inside its #runProgram method, which introduces a lot of
> > > > subclasses of the (stream) execution environment.
> > > >
> > > > Actually, it sets up an exec env that hijacks the
> > > > #execute/#executePlan method, initializes the job graph and aborts
> > > > execution. Control then flows back to CliFrontend, which deploys the
> > > > cluster (or retrieves the client) and submits the job graph. This is
> > > > quite a specific internal process inside Flink and not consistent
> > > > with anything else.
> > > >
> > > > 2) Deployment of a job cluster couples job graph creation and cluster
> > > > deployment. Abstractly, going from a user job to a concrete
> > > > submission requires the following dependency:
> > > >
> > > >      create JobGraph --------\
> > > >                               --> submit JobGraph
> > > > create ClusterClient --------/
> > > >
> > > > The ClusterClient is created by deploying or retrieving. JobGraph
> > > > submission requires a compiled JobGraph and a valid ClusterClient,
> > > > but the creation of the ClusterClient is abstractly independent of
> > > > that of the JobGraph. However, in job cluster mode, we deploy the job
> > > > cluster with a job graph, which means we use another process:
> > > >
> > > > create JobGraph --> deploy cluster with the JobGraph
> > > >
> > > > Here is another inconsistency, and downstream projects/client APIs
> > > > are forced to handle different cases with little support from Flink.
> > > >
> > > > Since we have likely reached a consensus that
> > > >
> > > > 1. all configs are gathered in the Flink configuration and passed on,
> > > > 2. the execution environment knows all configs and handles execution
> > > > (both deployment and submission),
> > > >
> > > > I propose eliminating the inconsistencies above with the following
> > > > approach:
> > > >
> > > > 1) CliFrontend should be exactly a front end, at least for the "run"
> > > > command. That means it just gathers and passes all config from the
> > > > command line to the main method of the user program. The execution
> > > > environment knows all the info, and with an additional utility for
> > > > ClusterClient we gracefully get a ClusterClient by deploying or
> > > > retrieving. In this way, we don't need to hijack the
> > > > #execute/#executePlan methods and can remove the various hacky
> > > > subclasses of the exec env, as well as the #run methods in
> > > > ClusterClient (for an interface-ized ClusterClient). The control flow
> > > > then goes from CliFrontend to the main method and never returns.
> > > >
> > > > 2) A job cluster means a cluster for a specific job. From another
> > > > perspective, it is an ephemeral session. We may decouple the
> > > > deployment from a compiled job graph, and instead start a session
> > > > with an idle timeout and submit the job afterwards.
> > > >
> > > > Before we go into more details on design or implementation, it is
> > > > better to be aware of these topics and discuss them to reach a
> > > > consensus.
> > > >
> > > > Best,
> > > > tison.
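Tison's point 2), treating a job cluster as an ephemeral session, can be sketched as follows. All types and method names here are invented for illustration, not Flink's real deployment API:

```java
import java.time.Duration;

// Illustrative sketch: instead of deploying a job cluster *with* a
// JobGraph, start an ephemeral session (with an idle timeout) and then
// submit the graph, so both modes share one code path.
public class EphemeralSessionSketch {

    static class JobGraph {
        final String name;
        JobGraph(String name) { this.name = name; }
    }

    static class SessionCluster {
        final Duration idleTimeout;
        int submittedJobs = 0;
        SessionCluster(Duration idleTimeout) { this.idleTimeout = idleTimeout; }
        String submit(JobGraph graph) {
            submittedJobs++;
            return graph.name + "@session";
        }
    }

    // Session mode and "job cluster" mode now differ only in the timeout:
    // a job cluster is a session that dies as soon as it goes idle.
    static SessionCluster deploySession(Duration idleTimeout) {
        return new SessionCluster(idleTimeout);
    }

    public static void main(String[] args) {
        // "Job cluster": ephemeral session + a single submission.
        SessionCluster ephemeral = deploySession(Duration.ZERO);
        System.out.println(ephemeral.submit(new JobGraph("wordcount")));

        // Long-lived session: same code path, many submissions.
        SessionCluster session = deploySession(Duration.ofMinutes(30));
        session.submit(new JobGraph("job-a"));
        session.submit(new JobGraph("job-b"));
        System.out.println(session.submittedJobs);
    }
}
```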
> > > >
> > > >
On Thu, Jun 20, 2019 at 3:21 AM Zili Chen <wa...@gmail.com> wrote:
> > > >
> > > >> Hi Jeff,
> > > >>
> > > >> Thanks for raising this thread and the design document!
> > > >>
> > > >> As @Thomas Weise mentioned above, passing config through to Flink
> > > >> requires far more effort than it should. Another example is that we
> > > >> achieve detached mode by introducing another execution environment
> > > >> which also hijacks the #execute method.
> > > >>
> > > >> I agree with your idea that the user should configure everything and
> > > >> Flink should "just" respect it. On this topic I think the unusual
> > > >> control flow when CliFrontend handles the "run" command is the
> > > >> problem. It handles several configs, mainly about cluster settings,
> > > >> and thus the main method of the user program is unaware of them.
> > > >> Also, it compiles the app to a job graph by running the main method
> > > >> with a hijacked exec env, which constrains the main method further.
> > > >>
> > > >> I'd like to write down a few notes on passing and respecting
> > > >> configs/args, as well as decoupling job compilation and submission,
> > > >> and share them on this thread later.
> > > >>
> > > >> Best,
> > > >> tison.
> > > >>
> > > >>
> > > >> On Mon, Jun 17, 2019 at 7:29 PM SHI Xiaogang <sh...@gmail.com> wrote:
> > > >>
> > > >>> Hi Jeff and Flavio,
> > > >>>
> > > >>> Thanks Jeff a lot for proposing the design document.
> > > >>>
> > > >>> We are also working on refactoring ClusterClient to allow flexible
> > and
> > > >>> efficient job management in our real-time platform.
> > > >>> We would like to draft a document to share our ideas with you.
> > > >>>
> > > >>> I think it's a good idea to have something like Apache Livy for
> > Flink,
> > > >>> and
> > > >>> the efforts discussed here will take a great step forward to it.
> > > >>>
> > > >>> Regards,
> > > >>> Xiaogang
> > > >>>
> > > >>> On Mon, Jun 17, 2019 at 7:13 PM Flavio Pompermaier <po...@okkam.it> wrote:
> > > >>>
> > > >>> > Is there any possibility to have something like Apache Livy [1]
> > also
> > > >>> for
> > > >>> > Flink in the future?
> > > >>> >
> > > >>> > [1] https://livy.apache.org/
> > > >>> >
> > > >>> > On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zj...@gmail.com>
> > wrote:
> > > >>> >
> > > >>> > > >>>  Any API we expose should not have dependencies on the
> > runtime
> > > >>> > > (flink-runtime) package or other implementation details. To me,
> > > this
> > > >>> > means
> > > >>> > > that the current ClusterClient cannot be exposed to users
> because
> > > it
> > > >>> >  uses
> > > >>> > > quite some classes from the optimiser and runtime packages.
> > > >>> > >
> > > >>> > > We should change ClusterClient from a class to an interface.
> > > >>> > > ExecutionEnvironment would only use the ClusterClient interface,
> > > >>> > > which should live in flink-clients, while the concrete
> > > >>> > > implementation class could be in flink-runtime.
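A minimal sketch of that split might look like the following. The package placement and method set are assumptions for illustration only; note how submitting an opaque payload keeps internal classes like JobGraph out of the interface, which addresses the dependency concern quoted above:

```java
// Hypothetical sketch of splitting ClusterClient into an API-side interface
// and a runtime-side implementation, as suggested in the mail. Names and
// method signatures are assumptions, not the actual Flink refactoring.

// --- would live in flink-clients (no runtime dependencies) ---
interface ClusterClient {
    String submitJob(byte[] serializedJobGraph);  // opaque payload, no JobGraph type
    String getJobStatus(String jobId);
    void cancelJob(String jobId);
}

// --- would live in flink-runtime ---
class RestClusterClientImpl implements ClusterClient {
    private final java.util.Map<String, String> jobs = new java.util.HashMap<>();

    public String submitJob(byte[] serializedJobGraph) {
        String id = "job-" + jobs.size();
        jobs.put(id, "RUNNING");  // a real impl would POST to the REST endpoint
        return id;
    }
    public String getJobStatus(String jobId) { return jobs.get(jobId); }
    public void cancelJob(String jobId) { jobs.put(jobId, "CANCELED"); }
}

public class InterfaceSplitSketch {
    public static void main(String[] args) {
        // Downstream code programs only against the interface.
        ClusterClient client = new RestClusterClientImpl();
        String id = client.submitJob(new byte[0]);
        client.cancelJob(id);
        System.out.println(client.getJobStatus(id));
    }
}
```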
> > > >>> > >
> > > >>> > > >>> What happens when a failure/restart in the client happens?
> > > There
> > > >>> need
> > > >>> > > to be a way of re-establishing the connection to the job, set
> up
> > > the
> > > >>> > > listeners again, etc.
> > > >>> > >
> > > >>> > > Good point. First we need to define what failure/restart in the
> > > >>> > > client means. IIUC, that usually means a network failure, which
> > > >>> > > would happen in the RestClient class. If my understanding is
> > > >>> > > correct, the restart/retry mechanism should be implemented in
> > > >>> > > RestClient.
> > > >>> > >
> > > >>> > >
> > > >>> > >
> > > >>> > >
> > > >>> > >
> > > >>> > > On Tue, Jun 11, 2019 at 11:10 PM Aljoscha Krettek <al...@apache.org> wrote:
> > > >>> > >
> > > >>> > > > Some points to consider:
> > > >>> > > >
> > > >>> > > > * Any API we expose should not have dependencies on the
> runtime
> > > >>> > > > (flink-runtime) package or other implementation details. To
> me,
> > > >>> this
> > > >>> > > means
> > > >>> > > > that the current ClusterClient cannot be exposed to users
> > because
> > > >>> it
> > > >>> > >  uses
> > > >>> > > > quite some classes from the optimiser and runtime packages.
> > > >>> > > >
> > > >>> > > > * What happens when a failure/restart in the client happens?
> > > There
> > > >>> need
> > > >>> > > to
> > > >>> > > > be a way of re-establishing the connection to the job, set up
> > the
> > > >>> > > listeners
> > > >>> > > > again, etc.
> > > >>> > > >
> > > >>> > > > Aljoscha
> > > >>> > > >
> > > >>> > > > > On 29. May 2019, at 10:17, Jeff Zhang <zj...@gmail.com>
> > > wrote:
> > > >>> > > > >
> > > >>> > > > > Sorry folks, the design doc is late as you expected. Here's
> > the
> > > >>> > design
> > > >>> > > > doc
> > > >>> > > > > I drafted, welcome any comments and feedback.
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>>
> > >
> >
> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > > Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午8:43写道:
> > > >>> > > > >
> > > >>> > > > >> Nice that this discussion is happening.
> > > >>> > > > >>
> > > >>> > > > >> In the FLIP, we could also revisit the entire role of the
> > > >>> > environments
> > > >>> > > > >> again.
> > > >>> > > > >>
> > > >>> > > > >> Initially, the idea was:
> > > >>> > > > >>  - the environments take care of the specific setup for
> > > >>> standalone
> > > >>> > (no
> > > >>> > > > >> setup needed), yarn, mesos, etc.
> > > >>> > > > >>  - the session ones have control over the session. The
> > > >>> environment
> > > >>> > > holds
> > > >>> > > > >> the session client.
> > > >>> > > > >>  - running a job gives a "control" object for that job.
> That
> > > >>> > behavior
> > > >>> > > is
> > > >>> > > > >> the same in all environments.
> > > >>> > > > >>
> > > >>> > > > >> The actual implementation diverged quite a bit from that.
> > > Happy
> > > >>> to
> > > >>> > > see a
> > > >>> > > > >> discussion about straitening this out a bit more.
> > > >>> > > > >>
> > > >>> > > > >>
> > > >>> > > > >> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <
> > zjffdu@gmail.com>
> > > >>> > wrote:
> > > >>> > > > >>
> > > >>> > > > >>> Hi folks,
> > > >>> > > > >>>
> > > >>> > > > >>> Sorry for late response, It seems we reach consensus on
> > > this, I
> > > >>> > will
> > > >>> > > > >> create
> > > >>> > > > >>> FLIP for this with more detailed design
> > > >>> > > > >>>
> > > >>> > > > >>>
> > > >>> > > > >>> Thomas Weise <th...@apache.org> 于2018年12月21日周五 上午11:43写道:
> > > >>> > > > >>>
> > > >>> > > > >>>> Great to see this discussion seeded! The problems you
> > > >>> > > > >>>> face with the Zeppelin integration are also affecting
> > > >>> > > > >>>> other downstream projects, like Beam.
> > > >>> > > > >>>>
> > > >>> > > > >>>> We just enabled the savepoint restore option in
> > > >>> > > > >>>> RemoteStreamEnvironment [1] and that was more difficult
> > > >>> > > > >>>> than it should be. The main issue is that the environment
> > > >>> > > > >>>> and cluster client aren't decoupled. Ideally it should be
> > > >>> > > > >>>> possible to just get the matching cluster client from the
> > > >>> > > > >>>> environment and then control the job through it
> > > >>> > > > >>>> (environment as factory for cluster client). But note
> > > >>> > > > >>>> that the environment classes are part of the public API,
> > > >>> > > > >>>> and it is not straightforward to make larger changes
> > > >>> > > > >>>> without breaking backward compatibility.
> > > >>> > > > >>>>
> > > >>> > > > >>>> ClusterClient currently exposes internal classes like
> > > >>> > > > >>>> JobGraph and StreamGraph. But it should be possible to
> > > >>> > > > >>>> wrap this with a new public API that brings the required
> > > >>> > > > >>>> job control capabilities for downstream projects. Perhaps
> > > >>> > > > >>>> it is helpful to look at some of the interfaces in Beam
> > > >>> > > > >>>> while thinking about this: [2] for the portable job API
> > > >>> > > > >>>> and [3] for the old asynchronous job control from the
> > > >>> > > > >>>> Beam Java SDK.
> > > >>> > > > >>>>
> > > >>> > > > >>>> The backward compatibility discussion [4] is also
> > > >>> > > > >>>> relevant here. A new API should shield downstream
> > > >>> > > > >>>> projects from internals and allow them to interoperate
> > > >>> > > > >>>> with multiple future Flink versions in the same release
> > > >>> > > > >>>> line without forced upgrades.
> > > >>> > > > >>>>
> > > >>> > > > >>>> Thanks,
> > > >>> > > > >>>> Thomas
> > > >>> > > > >>>>
> > > >>> > > > >>>> [1] https://github.com/apache/flink/pull/7249
> > > >>> > > > >>>> [2] https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > > >>> > > > >>>> [3] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > > >>> > > > >>>> [4] https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> > > >>> > > > >>>>
> > > >>> > > > >>>>
> > > >>> > > > >>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > >>> > > > >>>>
> > > >>> > > > >>>>>>>> I'm not so sure whether the user should be able to
> > > >>> > > > >>>>>>>> define where the job runs (in your example Yarn).
> > > >>> > > > >>>>>>>> This is actually independent of the job development
> > > >>> > > > >>>>>>>> and is something which is decided at deployment time.
> > > >>> > > > >>>>>
> > > >>> > > > >>>>> Users don't need to specify the execution mode
> > > >>> > > > >>>>> programmatically. They can also pass the execution mode
> > > >>> > > > >>>>> as an argument to the flink run command, e.g.
> > > >>> > > > >>>>>
> > > >>> > > > >>>>> bin/flink run -m yarn-cluster ....
> > > >>> > > > >>>>> bin/flink run -m local ...
> > > >>> > > > >>>>> bin/flink run -m host:port ...
> > > >>> > > > >>>>>
> > > >>> > > > >>>>> Does this make sense to you?
> > > >>> > > > >>>>>
> > > >>> > > > >>>>>>>> To me it makes sense that the ExecutionEnvironment is
> > > >>> > > > >>>>>>>> not directly initialized by the user and is instead
> > > >>> > > > >>>>>>>> context sensitive to how you want to execute your job
> > > >>> > > > >>>>>>>> (Flink CLI vs. IDE, for example).
> > > >>> > > > >>>>>
> > > >>> > > > >>>>> Right. Currently I notice Flink creates different
> > > >>> > > > >>>>> ContextExecutionEnvironments based on the submission
> > > >>> > > > >>>>> scenario (Flink CLI vs. IDE). To me this is kind of a
> > > >>> > > > >>>>> hack, and not very straightforward. What I suggested
> > > >>> > > > >>>>> above is that Flink should always create the same
> > > >>> > > > >>>>> ExecutionEnvironment, just with different configuration,
> > > >>> > > > >>>>> and based on that configuration it would create the
> > > >>> > > > >>>>> proper ClusterClient for the different behaviors.
> > > >>> > > > >>>>>
> > > >>> > > > >>>>>
> > > >>> > > > >>>>>
> > > >>> > > > >>>>>
> > > >>> > > > >>>>>
> > > >>> > > > >>>>>
> > > >>> > > > >>>>>
> > > >>> > > > >>>>> Till Rohrmann <tr...@apache.org> wrote on Thu, Dec 20, 2018 at 11:18 PM:
> > > >>> > > > >>>>>
> > > >>> > > > >>>>>> You are probably right that we have code duplication
> > > >>> > > > >>>>>> when it comes to the creation of the ClusterClient.
> > > >>> > > > >>>>>> This should be reduced in the future.
> > > >>> > > > >>>>>>
> > > >>> > > > >>>>>> I'm not so sure whether the user should be able to
> > > >>> > > > >>>>>> define where the job runs (in your example Yarn). This
> > > >>> > > > >>>>>> is actually independent of the job development and is
> > > >>> > > > >>>>>> something which is decided at deployment time. To me it
> > > >>> > > > >>>>>> makes sense that the ExecutionEnvironment is not
> > > >>> > > > >>>>>> directly initialized by the user and is instead context
> > > >>> > > > >>>>>> sensitive to how you want to execute your job (Flink
> > > >>> > > > >>>>>> CLI vs. IDE, for example). However, I agree that the
> > > >>> > > > >>>>>> ExecutionEnvironment should give you access to the
> > > >>> > > > >>>>>> ClusterClient and to the job (maybe in the form of the
> > > >>> > > > >>>>>> JobGraph or a job plan).
> > > >>> > > > >>>>>>
> > > >>> > > > >>>>>> Cheers,
> > > >>> > > > >>>>>> Till
> > > >>> > > > >>>>>>
> > > >>> > > > >>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > >>> > > > >>>>>>
> > > >>> > > > >>>>>>> Hi Till,
> > > >>> > > > >>>>>>> Thanks for the feedback. You are right that I expect a
> > > >>> > > > >>>>>>> better programmatic job submission/control API which
> > > >>> > > > >>>>>>> could be used by downstream projects, and it would
> > > >>> > > > >>>>>>> benefit the Flink ecosystem. When I look at the code
> > > >>> > > > >>>>>>> of the Flink scala-shell and sql-client (I believe
> > > >>> > > > >>>>>>> they are not the core of Flink, but belong to the
> > > >>> > > > >>>>>>> ecosystem of Flink), I find a lot of duplicated code
> > > >>> > > > >>>>>>> for creating a ClusterClient from user-provided
> > > >>> > > > >>>>>>> configuration (the configuration format may differ
> > > >>> > > > >>>>>>> between scala-shell and sql-client) and then using
> > > >>> > > > >>>>>>> that ClusterClient to manipulate jobs. I don't think
> > > >>> > > > >>>>>>> this is convenient for downstream projects. What I
> > > >>> > > > >>>>>>> expect is that a downstream project only needs to
> > > >>> > > > >>>>>>> provide the necessary configuration info (maybe
> > > >>> > > > >>>>>>> introducing a class FlinkConf), then build an
> > > >>> > > > >>>>>>> ExecutionEnvironment based on this FlinkConf, and the
> > > >>> > > > >>>>>>> ExecutionEnvironment will create the proper
> > > >>> > > > >>>>>>> ClusterClient. This would not only benefit downstream
> > > >>> > > > >>>>>>> project development but also help their integration
> > > >>> > > > >>>>>>> tests with Flink. Here's one sample code snippet that
> > > >>> > > > >>>>>>> I expect:
> > > >>> > > > >>>>>>>
> > > >>> > > > >>>>>>> val conf = new FlinkConf().mode("yarn")
> > > >>> > > > >>>>>>> val env = new ExecutionEnvironment(conf)
> > > >>> > > > >>>>>>> val jobId = env.submit(...)
> > > >>> > > > >>>>>>> val jobStatus = env.getClusterClient().queryJobStatus(jobId)
> > > >>> > > > >>>>>>> env.getClusterClient().cancelJob(jobId)
> > > >>> > > > >>>>>>>
> > > >>> > > > >>>>>>> What do you think?
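The idea in the snippet above can be fleshed out as a small, self-contained Java sketch. All names here (FlinkConfSketch, Env, the client-kind strings) are hypothetical and purely illustrative, not actual Flink API; the point is only that the environment, not the caller, derives the right client from the configuration:

```java
public class FlinkConfSketch {
    // hypothetical configuration holder, modelled on the FlinkConf idea above
    public static final class FlinkConf {
        private String mode = "local";
        public FlinkConf mode(String mode) { this.mode = mode; return this; }
        public String mode() { return mode; }
    }

    // hypothetical environment that picks the matching client from the conf
    public static final class Env {
        private final String clientKind;
        public Env(FlinkConf conf) {
            switch (conf.mode()) {
                case "yarn":   clientKind = "RestClusterClient(yarn)"; break;
                case "remote": clientKind = "RestClusterClient"; break;
                default:       clientKind = "LocalClusterClient"; break;
            }
        }
        public String clientKind() { return clientKind; }
    }

    public static void main(String[] args) {
        Env env = new Env(new FlinkConf().mode("yarn"));
        System.out.println(env.clientKind());
    }
}
```

Downstream code would then never branch on the deployment target itself; it only hands over the configuration.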
> > > >>> > > > >>>>>>>
> > > >>> > > > >>>>>>>
> > > >>> > > > >>>>>>>
> > > >>> > > > >>>>>>>
> > > >>> > > > >>>>>>> Till Rohrmann <tr...@apache.org> wrote on Tue, Dec 11, 2018 at 6:28 PM:
> > > >>> > > > >>>>>>>
> > > >>> > > > >>>>>>>> Hi Jeff,
> > > >>> > > > >>>>>>>>
> > > >>> > > > >>>>>>>> What you are proposing is to provide the user with
> > > >>> > > > >>>>>>>> better programmatic job control. There was actually
> > > >>> > > > >>>>>>>> an effort to achieve this but it has never been
> > > >>> > > > >>>>>>>> completed [1]. However, there are some improvements
> > > >>> > > > >>>>>>>> in the code base now. Look for example at the
> > > >>> > > > >>>>>>>> NewClusterClient interface, which offers a
> > > >>> > > > >>>>>>>> non-blocking job submission. But I agree that we need
> > > >>> > > > >>>>>>>> to improve Flink in this regard.
> > > >>> > > > >>>>>>>>
> > > >>> > > > >>>>>>>> I would not be in favour of exposing all
> > > >>> > > > >>>>>>>> ClusterClient calls via the ExecutionEnvironment
> > > >>> > > > >>>>>>>> because it would clutter the class and would not be a
> > > >>> > > > >>>>>>>> good separation of concerns. Instead, one idea could
> > > >>> > > > >>>>>>>> be to retrieve the current ClusterClient from the
> > > >>> > > > >>>>>>>> ExecutionEnvironment, which can then be used for
> > > >>> > > > >>>>>>>> cluster and job control. But before we start an
> > > >>> > > > >>>>>>>> effort here, we need to agree on and capture what
> > > >>> > > > >>>>>>>> functionality we want to provide.
> > > >>> > > > >>>>>>>>
> > > >>> > > > >>>>>>>> Initially, the idea was that we have the
> > > >>> > > > >>>>>>>> ClusterDescriptor describing how to talk to a cluster
> > > >>> > > > >>>>>>>> manager like Yarn or Mesos. The ClusterDescriptor can
> > > >>> > > > >>>>>>>> be used for deploying Flink clusters (job and
> > > >>> > > > >>>>>>>> session) and gives you a ClusterClient. The
> > > >>> > > > >>>>>>>> ClusterClient controls the cluster (e.g. submitting
> > > >>> > > > >>>>>>>> jobs, listing all running jobs). And then there was
> > > >>> > > > >>>>>>>> the idea to introduce a JobClient, which you obtain
> > > >>> > > > >>>>>>>> from the ClusterClient, to trigger job-specific
> > > >>> > > > >>>>>>>> operations (e.g. taking a savepoint, cancelling the
> > > >>> > > > >>>>>>>> job).
> > > >>> > > > >>>>>>>>
> > > >>> > > > >>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-4272
> > > >>> > > > >>>>>>>>
> > > >>> > > > >>>>>>>> Cheers,
> > > >>> > > > >>>>>>>> Till
> > > >>> > > > >>>>>>>>
> > > >>> > > > >>>>>>>> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > >>> > > > >>>>>>>>
> > > >>> > > > >>>>>>>>> Hi Folks,
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> I am trying to integrate Flink into Apache Zeppelin,
> > > >>> > > > >>>>>>>>> which is an interactive notebook, and I hit several
> > > >>> > > > >>>>>>>>> issues that are caused by the Flink client API. So
> > > >>> > > > >>>>>>>>> I'd like to propose the following changes to the
> > > >>> > > > >>>>>>>>> Flink client API.
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> 1. Support nonblocking execution. Currently,
> > > >>> > > > >>>>>>>>> ExecutionEnvironment#execute is a blocking method
> > > >>> > > > >>>>>>>>> which does two things: first submit the job, then
> > > >>> > > > >>>>>>>>> wait for the job until it is finished. I'd like to
> > > >>> > > > >>>>>>>>> introduce a nonblocking execution method like
> > > >>> > > > >>>>>>>>> ExecutionEnvironment#submit which only submits the
> > > >>> > > > >>>>>>>>> job and then returns the jobId to the client, and
> > > >>> > > > >>>>>>>>> allow the user to query the job status via the
> > > >>> > > > >>>>>>>>> jobId.
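A minimal sketch of that nonblocking shape, using plain JDK futures (MiniEnv and JobHandle are hypothetical names for illustration, not Flink classes): submit() returns immediately with an id, and waiting for the result becomes an explicit, optional step:

```java
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

public class MiniEnv {
    // a handle returned immediately at submission time
    public static final class JobHandle {
        public final String jobId;
        public final CompletableFuture<String> result;
        JobHandle(String jobId, CompletableFuture<String> result) {
            this.jobId = jobId;
            this.result = result;
        }
    }

    // submit() returns right away with a job id; the result future is an
    // opt-in thing to wait on -- this is the nonblocking shape proposed above
    public JobHandle submit(Supplier<String> job) {
        String jobId = UUID.randomUUID().toString();
        return new JobHandle(jobId, CompletableFuture.supplyAsync(job));
    }

    public static void main(String[] args) {
        MiniEnv env = new MiniEnv();
        JobHandle handle = env.submit(() -> "finished");
        System.out.println("submitted job " + handle.jobId);
        System.out.println(handle.result.join()); // explicit, optional blocking wait
    }
}
```

The existing blocking execute() would then just be submit() followed by an immediate join on the result.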
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> 2. Add a cancel API in
> > > >>> > > > >>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
> > > >>> > > > >>>>>>>>> Currently the only way to cancel a job is via the
> > > >>> > > > >>>>>>>>> CLI (bin/flink), which is not convenient for
> > > >>> > > > >>>>>>>>> downstream projects. So I'd like to add a cancel API
> > > >>> > > > >>>>>>>>> in ExecutionEnvironment.
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> 3. Add a savepoint API in
> > > >>> > > > >>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment. It
> > > >>> > > > >>>>>>>>> is similar to the cancel API; we should use
> > > >>> > > > >>>>>>>>> ExecutionEnvironment as the unified API for third
> > > >>> > > > >>>>>>>>> parties to integrate with Flink.
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> 4. Add a listener for the job execution lifecycle,
> > > >>> > > > >>>>>>>>> something like the following, so that downstream
> > > >>> > > > >>>>>>>>> projects can run custom logic in the lifecycle of a
> > > >>> > > > >>>>>>>>> job. E.g. Zeppelin would capture the jobId after the
> > > >>> > > > >>>>>>>>> job is submitted and then use this jobId to cancel
> > > >>> > > > >>>>>>>>> it later when necessary.
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> public interface JobListener {
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>>   void onJobSubmitted(JobID jobId);
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>>   void onJobExecuted(JobExecutionResult jobResult);
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>>   void onJobCanceled(JobID jobId);
> > > >>> > > > >>>>>>>>> }
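To make the intended use concrete, here is a hedged, self-contained sketch of how a notebook front-end could capture the job id through such a listener and cancel later. String stands in for JobID, and the toy submit driver is purely illustrative:

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerSketch {
    // simplified listener, mirroring the JobListener proposal (String ~ JobID)
    public interface JobListener {
        void onJobSubmitted(String jobId);
        void onJobCanceled(String jobId);
    }

    // records submitted ids so the front-end can act on them later
    public static final class CapturingListener implements JobListener {
        public final List<String> submitted = new ArrayList<>();
        @Override public void onJobSubmitted(String jobId) { submitted.add(jobId); }
        @Override public void onJobCanceled(String jobId) { submitted.remove(jobId); }
    }

    // toy stand-in for an environment that notifies listeners on submit
    public static String submit(JobListener listener) {
        String jobId = "job-1";
        listener.onJobSubmitted(jobId);
        return jobId;
    }

    public static void main(String[] args) {
        CapturingListener listener = new CapturingListener();
        String jobId = submit(listener);
        // the front-end now holds the id and can cancel when necessary
        listener.onJobCanceled(jobId);
        System.out.println("remaining jobs: " + listener.submitted); // []
    }
}
```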
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> 5. Enable session in ExecutionEnvironment. Currently
> > > >>> > > > >>>>>>>>> it is disabled, but session is very convenient for
> > > >>> > > > >>>>>>>>> third parties submitting jobs continually. I hope
> > > >>> > > > >>>>>>>>> Flink can enable it again.
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> 6. Unify all Flink client APIs into
> > > >>> > > > >>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> This is a long-term issue which needs more careful
> > > >>> > > > >>>>>>>>> thinking and design. Currently some features of
> > > >>> > > > >>>>>>>>> Flink are exposed in
> > > >>> > > > >>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment, but
> > > >>> > > > >>>>>>>>> some are exposed in the CLI instead of the API, like
> > > >>> > > > >>>>>>>>> the cancel and savepoint operations I mentioned
> > > >>> > > > >>>>>>>>> above. I think the root cause is that Flink didn't
> > > >>> > > > >>>>>>>>> unify the interaction with Flink. Here I list three
> > > >>> > > > >>>>>>>>> scenarios of Flink operation:
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>>   - Local job execution. Flink creates a
> > > >>> > > > >>>>>>>>> LocalEnvironment and then uses this LocalEnvironment
> > > >>> > > > >>>>>>>>> to create a LocalExecutor for job execution.
> > > >>> > > > >>>>>>>>>   - Remote job execution. Flink creates a
> > > >>> > > > >>>>>>>>> ClusterClient first, then creates a
> > > >>> > > > >>>>>>>>> ContextEnvironment based on the ClusterClient, and
> > > >>> > > > >>>>>>>>> then runs the job.
> > > >>> > > > >>>>>>>>>   - Job cancellation. Flink creates a ClusterClient
> > > >>> > > > >>>>>>>>> first and then cancels the job via this
> > > >>> > > > >>>>>>>>> ClusterClient.
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> As you can see, in the above three scenarios Flink
> > > >>> > > > >>>>>>>>> doesn't use the same approach (code path) to
> > > >>> > > > >>>>>>>>> interact with Flink. What I propose is the
> > > >>> > > > >>>>>>>>> following: create the proper
> > > >>> > > > >>>>>>>>> LocalEnvironment/RemoteEnvironment (based on user
> > > >>> > > > >>>>>>>>> configuration) --> use this Environment to create
> > > >>> > > > >>>>>>>>> the proper ClusterClient (LocalClusterClient or
> > > >>> > > > >>>>>>>>> RestClusterClient) to interact with Flink (job
> > > >>> > > > >>>>>>>>> execution or cancellation).
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> This way we can unify the process of local execution
> > > >>> > > > >>>>>>>>> and remote execution, and it becomes much easier for
> > > >>> > > > >>>>>>>>> third parties to integrate with Flink, because
> > > >>> > > > >>>>>>>>> ExecutionEnvironment is the unified entry point for
> > > >>> > > > >>>>>>>>> Flink. What third parties need to do is just pass
> > > >>> > > > >>>>>>>>> configuration to ExecutionEnvironment, and
> > > >>> > > > >>>>>>>>> ExecutionEnvironment will do the right thing based
> > > >>> > > > >>>>>>>>> on the configuration. The Flink CLI can also be
> > > >>> > > > >>>>>>>>> considered a Flink API consumer: it just passes the
> > > >>> > > > >>>>>>>>> configuration to ExecutionEnvironment and lets
> > > >>> > > > >>>>>>>>> ExecutionEnvironment create the proper
> > > >>> > > > >>>>>>>>> ClusterClient, instead of letting the CLI create the
> > > >>> > > > >>>>>>>>> ClusterClient directly.
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> Point 6 would involve large code refactoring, so I
> > > >>> > > > >>>>>>>>> think we can defer it to a future release; 1, 2, 3,
> > > >>> > > > >>>>>>>>> 4 and 5 could be done at once, I believe. Let me
> > > >>> > > > >>>>>>>>> know your comments and feedback, thanks.
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> --
> > > >>> > > > >>>>>>>>> Best Regards
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> Jeff Zhang

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Aljoscha Krettek <al...@apache.org>.
Yes, this sounds reasonable!

Aljoscha

> On 19. Sep 2019, at 08:07, Zili Chen <wa...@gmail.com> wrote:
> 
> Hi Jeff,
> 
> I'm drafting the FLIP for JobClient. In my opinion, JobClient is a thin
> encapsulation of the current ClusterClient, so that users need not (and
> should not) pass a JobID for the corresponding ClusterClient functions.
> 
> Mainly it contains interfaces as below.
> 
> getJobResult(): CompletableFuture<JobResult>
> getJobStatus(): CompletableFuture<JobStatus>
> 
> cancel(): CompletableFuture<Acknowledge>
> cancelWithSavepoint(savepointDir): CompletableFuture<savepoint path>
> stopWithSavepoint(savepointDir,
> advanceToEndOfEventTime): CompletableFuture<savepoint path>
> 
> triggerSavepoint(savepointDir): CompletableFuture<savepoint path>
> getAccumulators: CompletableFuture<Map<acc name, acc value>>
> 
> Is it the same as that kept in your mind?
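Transcribed into Java for concreteness, the surface listed above might look like the sketch below. JobResult is simplified to String and Acknowledge to Void here, and the in-memory stub exists only so the shape can be exercised; none of this is the actual FLIP code:

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

public class JobClientSketch {
    public enum JobStatus { RUNNING, FINISHED, CANCELED }

    // one method per item in the list above; all calls are asynchronous
    public interface JobClient {
        CompletableFuture<String> getJobResult();        // JobResult in the proposal
        CompletableFuture<JobStatus> getJobStatus();
        CompletableFuture<Void> cancel();                // Acknowledge in the proposal
        CompletableFuture<String> cancelWithSavepoint(String savepointDir);
        CompletableFuture<String> stopWithSavepoint(String savepointDir,
                                                    boolean advanceToEndOfEventTime);
        CompletableFuture<String> triggerSavepoint(String savepointDir);
        CompletableFuture<Map<String, Object>> getAccumulators();
    }

    /** In-memory stub, only here so the interface can be exercised. */
    public static final class StubJobClient implements JobClient {
        private JobStatus status = JobStatus.RUNNING;
        public CompletableFuture<String> getJobResult() {
            return CompletableFuture.completedFuture("result");
        }
        public CompletableFuture<JobStatus> getJobStatus() {
            return CompletableFuture.completedFuture(status);
        }
        public CompletableFuture<Void> cancel() {
            status = JobStatus.CANCELED;
            return CompletableFuture.completedFuture(null);
        }
        public CompletableFuture<String> cancelWithSavepoint(String dir) {
            status = JobStatus.CANCELED;
            return CompletableFuture.completedFuture(dir + "/savepoint-1");
        }
        public CompletableFuture<String> stopWithSavepoint(String dir, boolean advance) {
            status = JobStatus.FINISHED;
            return CompletableFuture.completedFuture(dir + "/savepoint-1");
        }
        public CompletableFuture<String> triggerSavepoint(String dir) {
            return CompletableFuture.completedFuture(dir + "/savepoint-1");
        }
        public CompletableFuture<Map<String, Object>> getAccumulators() {
            return CompletableFuture.completedFuture(Collections.emptyMap());
        }
    }

    public static void main(String[] args) {
        JobClient client = new StubJobClient();
        client.cancel().join();
        System.out.println(client.getJobStatus().join()); // CANCELED
    }
}
```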
> 
> Besides, I recommend you join the Slack channel[1] so that
> we can follow the consensus above and discuss there :-)
> 
> Best,
> tison.
> 
> [1] https://the-asf.slack.com/messages/CNA3ADZPH
> 
> 
Zili Chen <wa...@gmail.com> wrote on Wed, Sep 11, 2019 at 5:54 PM:
> 
>> s/ASK/ASF/
>> 
>> Best,
>> tison.
>> 
>> 
Zili Chen <wa...@gmail.com> wrote on Wed, Sep 11, 2019 at 5:53 PM:
>> 
>>> FYI I've created a public channel #flink-client-api-enhancement under ASK
>>> slack.
>>> 
>>> You can reach it after joining ASK Slack and searching for the channel.
>>> 
>>> Best,
>>> tison.
>>> 
>>> 
Kostas Kloudas <kk...@gmail.com> wrote on Wed, Sep 11, 2019 at 5:09 PM:
>>> 
>>>> +1 for creating a channel.
>>>> 
>>>> Kostas
>>>> 
>>>> On Wed, Sep 11, 2019 at 10:57 AM Zili Chen <wa...@gmail.com> wrote:
>>>>> 
>>>>> Hi Aljoscha,
>>>>> 
>>>>> I'm OK to use the ASF slack.
>>>>> 
>>>>> Best,
>>>>> tison.
>>>>> 
>>>>> 
>>>>> Jeff Zhang <zj...@gmail.com> wrote on Wed, Sep 11, 2019 at 4:48 PM:
>>>>>> 
>>>>>> +1 for using slack for instant communication
>>>>>> 
>>>>>> Aljoscha Krettek <al...@apache.org> wrote on Wed, Sep 11, 2019 at 4:44 PM:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> We could try and use the ASF slack for this purpose, that would
>>>>>>> probably be easiest. See https://s.apache.org/slack-invite. We could
>>>>>>> create a dedicated channel for our work and would still use the open
>>>>>>> ASF infrastructure, and people can have a look if they are interested
>>>>>>> because the discussion would be public. What do you think?
>>>>>>> 
>>>>>>> P.S. Committers/PMCs should be able to log in with their Apache ID.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Aljoscha
>>>>>>> 
>>>>>>>> On 6. Sep 2019, at 14:24, Zili Chen <wa...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Hi Aljoscha,
>>>>>>>> 
>>>>>>>> I'd like to gather all the ideas here and among the documents, and
>>>>>>>> draft a formal FLIP that keeps us on the same page. Hopefully I can
>>>>>>>> start a FLIP thread next week.
>>>>>>>> 
>>>>>>>> For the implementation, or say the POC part, I'd like to work with
>>>>>>>> you guys who proposed the concept of the Executor to make sure that
>>>>>>>> we go in the same direction, and I'm wondering whether a dedicated
>>>>>>>> thread or a Slack group is the proper venue. In my opinion we can
>>>>>>>> involve the team in a Slack group, start our branch concurrently
>>>>>>>> with the FLIP process, and once we reach a consensus on the FLIP,
>>>>>>>> open an umbrella issue about the framework and start subtasks. What
>>>>>>>> do you think?
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> tison.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Aljoscha Krettek <al...@apache.org> wrote on Thu, Sep 5, 2019 at 9:39 PM:
>>>>>>>> 
>>>>>>>>> Hi Tison,
>>>>>>>>> 
>>>>>>>>> To keep this moving forward, maybe you want to start working on a
>>>>>>>>> proof-of-concept implementation for the new JobClient interface,
>>>>>>>>> maybe with a new method executeAsync() in the environment that
>>>>>>>>> returns the JobClient, and implement the methods to see how that
>>>>>>>>> works and where we get. Would you be interested in that?
>>>>>>>>> 
>>>>>>>>> Also, at some point we should collect all the ideas and start
>>>>>>>>> forming an actual FLIP.
>>>>>>>>> Best,
>>>>>>>>> Aljoscha
>>>>>>>>> 
>>>>>>>>>> On 4. Sep 2019, at 12:04, Zili Chen <wa...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Thanks for your update Kostas!
>>>>>>>>>> 
>>>>>>>>>> It looks good to me to clean up the existing code paths as a
>>>>>>>>>> first pass. I'd like to help with review and file subtasks if I
>>>>>>>>>> find any.
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> tison.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Kostas Kloudas <kk...@gmail.com> wrote on Wed, Sep 4, 2019 at 5:52 PM:
>>>>>>>>>> Here is the issue, and I will keep on updating it as I find more
>>>> issues.
>>>>>>>>>> 
>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-13954
>>>>>>>>>> 
>>>>>>>>>> This will also cover the refactoring of the Executors that we
>>>>>>>>>> discussed in this thread, without any additional functionality
>>>>>>>>>> (such as the job client).
>>>>>>>>>> 
>>>>>>>>>> Kostas
>>>>>>>>>> 
>>>>>>>>>> On Wed, Sep 4, 2019 at 11:46 AM Kostas Kloudas <kkloudas@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Great idea Tison!
>>>>>>>>>>> 
>>>>>>>>>>> I will create the umbrella issue and post it here so that we
>>>> are all
>>>>>>>>>>> on the same page!
>>>>>>>>>>> 
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Kostas
>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Sep 4, 2019 at 11:36 AM Zili Chen <wander4096@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi Kostas & Aljoscha,
>>>>>>>>>>>> 
>>>>>>>>>>>> I notice that there is a JIRA (FLINK-13946) which could be
>>>>>>>>>>>> included in this refactoring thread. Since we agree on most of
>>>>>>>>>>>> the directions in the big picture, is it reasonable that we
>>>>>>>>>>>> create an umbrella issue for refactoring the client APIs, with
>>>>>>>>>>>> linked subtasks? It would be a better way to join the forces of
>>>>>>>>>>>> our community.
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> tison.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Zili Chen <wa...@gmail.com> wrote on Sat, Aug 31, 2019 at 12:52 PM:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Great Kostas! Looking forward to your POC!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> tison.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Jeff Zhang <zj...@gmail.com> wrote on Fri, Aug 30, 2019 at 11:07 PM:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Awesome, @Kostas Looking forward your POC.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Kostas Kloudas <kk...@gmail.com> wrote on Fri, Aug 30, 2019 at 8:33 PM:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I am just writing here to let you know that I am working on
>>>>>>>>>>>>>>> a POC that tries to refactor the current state of job
>>>>>>>>>>>>>>> submission in Flink. I want to stress that it introduces NO
>>>>>>>>>>>>>>> CHANGES to the current behaviour of Flink. It just
>>>>>>>>>>>>>>> re-arranges things and introduces the notion of an Executor,
>>>>>>>>>>>>>>> which is the entity responsible for taking the user code and
>>>>>>>>>>>>>>> submitting it for execution.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Given this, the discussion about the functionality that the
>>>>>>>>>>>>>>> JobClient will expose to the user can go on independently,
>>>>>>>>>>>>>>> and the same holds for all the open questions so far.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I hope I will have some more news to share soon.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Kostas
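As a rough illustration of that Executor notion (all names below are guesses at the shape for illustration, not the actual POC code): an Executor takes the user program plus configuration and hands back a result future, so the environments no longer need to know how submission works:

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

public class ExecutorSketch {
    // the single entity that knows how to submit a user program; different
    // deployment targets would provide different implementations
    public interface Executor {
        CompletableFuture<String> execute(Supplier<String> userProgram);
    }

    /** A "local" executor that simply runs the program asynchronously. */
    public static final class LocalExecutor implements Executor {
        @Override
        public CompletableFuture<String> execute(Supplier<String> userProgram) {
            return CompletableFuture.supplyAsync(userProgram);
        }
    }

    public static void main(String[] args) {
        // in a real system the executor would be chosen from configuration
        Executor executor = new LocalExecutor();
        String result = executor.execute(() -> "job finished").join();
        System.out.println(result);
    }
}
```

With this split, the environments stay a thin API surface while submission details live behind the Executor interface.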
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Mon, Aug 26, 2019 at 4:20 AM Yang Wang <danrtsey.wy@gmail.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Zili,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> It makes sense to me that a dedicated cluster is started
>>>>>>>>>>>>>>>> for a per-job cluster and will not accept more jobs. I just
>>>>>>>>>>>>>>>> have a question about the command line.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Currently we could use the following commands to start
>>>>>>>>>>>>>>>> different clusters.
>>>>>>>>>>>>>>>> *per-job cluster*
>>>>>>>>>>>>>>>> ./bin/flink run -d -p 5 -ynm perjob-cluster1 -m yarn-cluster
>>>>>>>>>>>>>>>> examples/streaming/WindowJoin.jar
>>>>>>>>>>>>>>>> *session cluster*
>>>>>>>>>>>>>>>> ./bin/flink run -p 5 -ynm session-cluster1 -m yarn-cluster
>>>>>>>>>>>>>>>> examples/streaming/WindowJoin.jar
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> What will it look like after the client enhancement?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Yang
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Zili Chen <wa...@gmail.com> wrote on Fri, Aug 23, 2019 at 10:46 PM:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi Till,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks for your update. Nice to hear :-)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> tison.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Till Rohrmann <tr...@apache.org> wrote on Fri, Aug 23, 2019 at 10:39 PM:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hi Tison,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> just a quick comment concerning the class loading issues
>>>>>>>>>>>>>>>>>> when using the per-job mode. The community wants to
>>>>>>>>>>>>>>>>>> change it so that the StandaloneJobClusterEntryPoint
>>>>>>>>>>>>>>>>>> actually uses the user code class loader with child-first
>>>>>>>>>>>>>>>>>> class loading [1]. Hence, I hope that this problem will
>>>>>>>>>>>>>>>>>> be resolved soon.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-13840
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>> Till
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas <kkloudas@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On the topic of web submission, I agree with Till that
>>>>>>>>>>>>>>>>>>> it only seems to complicate things. It is bad for
>>>>>>>>>>>>>>>>>>> security and job isolation (anybody can submit/cancel
>>>>>>>>>>>>>>>>>>> jobs), and its implementation complicates some parts of
>>>>>>>>>>>>>>>>>>> the code. So, if we were to redesign the WebUI, maybe
>>>>>>>>>>>>>>>>>>> this part could be left out. In addition, I would say
>>>>>>>>>>>>>>>>>>> that the ability to cancel jobs could also be left out.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Also, I would be in favour of removing the "detached"
>>>>>>>>>>>>>>>>>>> mode, for the reasons mentioned above (i.e. because now
>>>>>>>>>>>>>>>>>>> we will have a future representing the result, on which
>>>>>>>>>>>>>>>>>>> the user can choose to wait or not).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Now, for separating job submission and cluster creation,
>>>>>>>>>>>>>>>>>>> I am in favour of keeping both. Once again, the reasons
>>>>>>>>>>>>>>>>>>> are mentioned above by Stephan, Till and Aljoscha, and
>>>>>>>>>>>>>>>>>>> Zili also seems to agree. They mainly have to do with
>>>>>>>>>>>>>>>>>>> security, isolation and ease of resource management for
>>>>>>>>>>>>>>>>>>> the user, as he knows that "when my job is done,
>>>>>>>>>>>>>>>>>>> everything will be cleared up". This is also the
>>>>>>>>>>>>>>>>>>> experience you get when launching a process on your
>>>>>>>>>>>>>>>>>>> local OS.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On excluding the per-job mode from returning a JobClient
>>>>>>>>>>>>>>>>>>> or not, I believe that eventually it would be nice to
>>>>>>>>>>>>>>>>>>> allow users to get back a JobClient. The reasons are
>>>>>>>>>>>>>>>>>>> that 1) I cannot find any objective reason why the user
>>>>>>>>>>>>>>>>>>> experience should diverge, and 2) this will be the way
>>>>>>>>>>>>>>>>>>> that the user will be able to interact with his running
>>>>>>>>>>>>>>>>>>> job. Assuming that the necessary ports are open for the
>>>>>>>>>>>>>>>>>>> REST API to work, I think that the JobClient can run
>>>>>>>>>>>>>>>>>>> against the REST API without problems. If the needed
>>>>>>>>>>>>>>>>>>> ports are not open, then we are safe to not return a
>>>>>>>>>>>>>>>>>>> JobClient, as the user explicitly chose to close all
>>>>>>>>> points of
>>>>>>>>>>>>>>>>>>> communication to his running job.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On the topic of not hijacking the "env.execute()" in
>>>>>>>>> order to get
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> Plan, I definitely agree but
>>>>>>>>>>>>>>>>>>> for the proposal of having a "compile()" method in the
>>>>>>>>> env, I would
>>>>>>>>>>>>>>>>>>> like to have a better look at
>>>>>>>>>>>>>>>>>>> the existing code.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>> Kostas
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Fri, Aug 23, 2019 at 5:52 AM Zili Chen <
>>>>>>>>> wander4096@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Hi Yang,
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> It would be helpful if you check Stephan's last
>>>>>>>>> comment,
>>>>>>>>>>>>>>>>>>>> which states that isolation is important.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> For per-job mode, we run a dedicated cluster(maybe it
>>>>>>>>>>>>>>>>>>>> should have been a couple of JM and TMs during FLIP-6
>>>>>>>>>>>>>>>>>>>> design) for a specific job. Thus the process is
>>>>>>>>> prevented
>>>>>>>>>>>>>>>>>>>> from other jobs.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> In our cases there was a time we suffered from multi
>>>>>>>>>>>>>>>>>>>> jobs submitted by different users and they affected
>>>>>>>>>>>>>>>>>>>> each other so that all ran into an error state. Also,
>>>>>>>>>>>>>>>>>>>> run the client inside the cluster could save client
>>>>>>>>>>>>>>>>>>>> resource at some points.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> However, we also face several issues as you mentioned,
>>>>>>>>>>>>>>>>>>>> that in per-job mode it always uses parent classloader
>>>>>>>>>>>>>>>>>>>> thus classloading issues occur.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> BTW, one can make an analogy between session/per-job
>>>>>>>>> mode
>>>>>>>>>>>>>>>>>>>> in Flink, and client/cluster mode in Spark.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>> tison.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Yang Wang <da...@gmail.com> 于2019年8月22日周四
>>>>>>>>> 上午11:25写道:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> From the user's perspective, it is really confused
>>>>>>>>> about the
>>>>>>>>>>>>>>> scope
>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>> per-job cluster.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> If it means a flink cluster with single job, so that
>>>>>>>>> we could
>>>>>>>>>>>>>>> get
>>>>>>>>>>>>>>>>>>> better
>>>>>>>>>>>>>>>>>>>>> isolation.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Now it does not matter how we deploy the cluster,
>>>>>>>>> directly
>>>>>>>>>>>>>>>>>>> deploy(mode1)
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> or start a flink cluster and then submit job through
>>>>>>>>> cluster
>>>>>>>>>>>>>>>>>>> client(mode2).
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Otherwise, if it just means directly deploy, how
>>>>>>>>> should we
>>>>>>>>>>>>>>> name the
>>>>>>>>>>>>>>>>>>> mode2,
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> session with job or something else?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> We could also benefit from the mode2. Users could
>>>>>>>>> get the same
>>>>>>>>>>>>>>>>>>> isolation
>>>>>>>>>>>>>>>>>>>>> with mode1.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> The user code and dependencies will be loaded by
>>>>>>>>> user class
>>>>>>>>>>>>>>> loader
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> to avoid class conflict with framework.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Anyway, both of the two submission modes are useful.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> We just need to clarify the concepts.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Yang
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Zili Chen <wa...@gmail.com> 于2019年8月20日周二
>>>>>>>>> 下午5:58写道:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Thanks for the clarification.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> The idea JobDeployer ever came into my mind when I
>>>>>>>>> was
>>>>>>>>>>>>>>> muddled
>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>> how to execute per-job mode and session mode with
>>>>>>>>> the same
>>>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>>>>>>> and framework codepath.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> With the concept JobDeployer we back to the
>>>>>>>>> statement that
>>>>>>>>>>>>>>>>>>> environment
>>>>>>>>>>>>>>>>>>>>>> knows every configs of cluster deployment and job
>>>>>>>>>>>>>>> submission. We
>>>>>>>>>>>>>>>>>>>>>> configure or generate from configuration a specific
>>>>>>>>>>>>>>> JobDeployer
>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>> environment and then code align on
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> *JobClient client = env.execute().get();*
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> which in session mode returned by
>>>>>>>>> clusterClient.submitJob
>>>>>>>>>>>>>>> and in
>>>>>>>>>>>>>>>>>>> per-job
>>>>>>>>>>>>>>>>>>>>>> mode returned by
>>>>>>>>> clusterDescriptor.deployJobCluster.
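The call-site alignment described above can be pictured with a minimal runnable sketch. All names here are hypothetical stand-ins, not the actual Flink API; the JobGraph is reduced to a String for brevity. The point is only that user code calls `env.execute().get()` identically, while the configured deployer decides between submitting to an existing session cluster and deploying a dedicated per-job cluster.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: both modes hand back the same JobClient handle
// from execute(), so user programs need no code changes between modes.
interface JobClient {
    String jobId();
}

// Strategy the environment is configured with; user code never sees it.
interface JobDeployer {
    CompletableFuture<JobClient> deploy(String jobGraph);
}

// Session mode: submit the graph to an existing cluster.
class SessionDeployer implements JobDeployer {
    public CompletableFuture<JobClient> deploy(String jobGraph) {
        return CompletableFuture.completedFuture(() -> "session:" + jobGraph);
    }
}

// Per-job mode: deploy a dedicated cluster initialized with the graph.
class PerJobDeployer implements JobDeployer {
    public CompletableFuture<JobClient> deploy(String jobGraph) {
        return CompletableFuture.completedFuture(() -> "per-job:" + jobGraph);
    }
}

class Env {
    private final JobDeployer deployer;
    Env(JobDeployer deployer) { this.deployer = deployer; }
    CompletableFuture<JobClient> execute(String jobGraph) {
        return deployer.deploy(jobGraph);
    }
}

public class Main {
    public static void main(String[] args) throws Exception {
        // Identical call site regardless of which deployer was configured.
        JobClient client = new Env(new SessionDeployer()).execute("wordcount").get();
        System.out.println(client.jobId());
    }
}
```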
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Here comes a problem that currently we directly run
>>>>>>>>>>>>>>>>>> ClusterEntrypoint
>>>>>>>>>>>>>>>>>>>>>> with extracted job graph. Follow the JobDeployer
>>>>>>>>> way we'd
>>>>>>>>>>>>>>> better
>>>>>>>>>>>>>>>>>>>>>> align entry point of per-job deployment at
>>>>>>>>> JobDeployer.
>>>>>>>>>>>>>>> Users run
>>>>>>>>>>>>>>>>>>>>>> their main method or by a Cli(finally call main
>>>>>>>>> method) to
>>>>>>>>>>>>>>> deploy
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> job cluster.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>> tison.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org> 于2019年8月20日周二
>>>>>>>>> 下午4:40写道:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Till has made some good comments here.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Two things to add:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> - The job mode is very nice in the way that it
>>>>>>>>> runs the
>>>>>>>>>>>>>>>>> client
>>>>>>>>>>>>>>>>>>> inside
>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> cluster (in the same image/process that is the
>>>>>>>>> JM) and thus
>>>>>>>>>>>>>>>>>> unifies
>>>>>>>>>>>>>>>>>>>>> both
>>>>>>>>>>>>>>>>>>>>>>> applications and what the Spark world calls the
>>>>>>>>> "driver
>>>>>>>>>>>>>>> mode".
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> - Another thing I would add is that during the
>>>>>>>>> FLIP-6
>>>>>>>>>>>>>>> design,
>>>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>> were
>>>>>>>>>>>>>>>>>>>>>>> thinking about setups where Dispatcher and
>>>>>>>>> JobManager are
>>>>>>>>>>>>>>>>>> separate
>>>>>>>>>>>>>>>>>>>>>>> processes.
>>>>>>>>>>>>>>>>>>>>>>>   A Yarn or Mesos Dispatcher of a session
>>>>>>>>> could run
>>>>>>>>>>>>>>>>>> independently
>>>>>>>>>>>>>>>>>>>>> (even
>>>>>>>>>>>>>>>>>>>>>>> as privileged processes executing no code).
>>>>>>>>>>>>>>>>>>>>>>>   Then the "per-job" mode could still be
>>>>>>>>> helpful:
>>>>>>>>>>>>>>> when a
>>>>>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>> submitted to the dispatcher, it launches the JM
>>>>>>>>> again in a
>>>>>>>>>>>>>>>>>> per-job
>>>>>>>>>>>>>>>>>>>>> mode,
>>>>>>>>>>>>>>>>>>>>>> so
>>>>>>>>>>>>>>>>>>>>>>> that JM and TM processes are bound to the
>>>>>>>>> only. For
>>>>>>>>>>>>>>> higher
>>>>>>>>>>>>>>>>>>> security
>>>>>>>>>>>>>>>>>>>>>>> setups, it is important that processes are not
>>>>>>>>> reused
>>>>>>>>>>>>>>> across
>>>>>>>>>>>>>>>>>> jobs.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <
>>>>>>>>>>>>>>>>>>> trohrmann@apache.org>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> I would not be in favour of getting rid of the
>>>>>>>>> per-job
>>>>>>>>>>>>>>> mode
>>>>>>>>>>>>>>>>>>> since it
>>>>>>>>>>>>>>>>>>>>>>>> simplifies the process of running Flink jobs
>>>>>>>>>>>>>>> considerably.
>>>>>>>>>>>>>>>>>>> Moreover,
>>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>>> not only well suited for container deployments
>>>>>>>>> but also
>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>> deployments
>>>>>>>>>>>>>>>>>>>>>>>> where you want to guarantee job isolation. For
>>>>>>>>> example, a
>>>>>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>>>>>> use
>>>>>>>>>>>>>>>>>>>>>>>> the per-job mode on Yarn to execute his job on
>>>>>>>>> a separate
>>>>>>>>>>>>>>>>>>> cluster.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> I think that having two notions of cluster
>>>>>>>>> deployments
>>>>>>>>>>>>>>>>> (session
>>>>>>>>>>>>>>>>>>> vs.
>>>>>>>>>>>>>>>>>>>>>>> per-job
>>>>>>>>>>>>>>>>>>>>>>>> mode) does not necessarily contradict your
>>>>>>>>> ideas for the
>>>>>>>>>>>>>>>>> client
>>>>>>>>>>>>>>>>>>> api
>>>>>>>>>>>>>>>>>>>>>>>> refactoring. For example one could have the
>>>>>>>>> following
>>>>>>>>>>>>>>>>>> interfaces:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> - ClusterDeploymentDescriptor: encapsulates
>>>>>>>>> the logic
>>>>>>>>>>>>>>> how to
>>>>>>>>>>>>>>>>>>> deploy a
>>>>>>>>>>>>>>>>>>>>>>>> cluster.
>>>>>>>>>>>>>>>>>>>>>>>> - ClusterClient: allows to interact with a
>>>>>>>>> cluster
>>>>>>>>>>>>>>>>>>>>>>>> - JobClient: allows to interact with a running
>>>>>>>>> job
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Now the ClusterDeploymentDescriptor could have
>>>>>>>>> two
>>>>>>>>>>>>>>> methods:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> - ClusterClient deploySessionCluster()
>>>>>>>>>>>>>>>>>>>>>>>> - JobClusterClient/JobClient
>>>>>>>>>>>>>>> deployPerJobCluster(JobGraph)
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> where JobClusterClient is either a supertype of
>>>>>>>>>>>>>>> ClusterClient
>>>>>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>>>>> does
>>>>>>>>>>>>>>>>>>>>>>>> not give you the functionality to submit jobs
>>>>>>>>> or
>>>>>>>>>>>>>>>>>>> deployPerJobCluster
>>>>>>>>>>>>>>>>>>>>>>>> returns directly a JobClient.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> When setting up the ExecutionEnvironment, one
>>>>>>>>> would then
>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>> provide
>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient to submit jobs but a JobDeployer
>>>>>>>>> which,
>>>>>>>>>>>>>>>>> depending
>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> selected mode, either uses a ClusterClient
>>>>>>>>> (session
>>>>>>>>>>>>>>> mode) to
>>>>>>>>>>>>>>>>>>> submit
>>>>>>>>>>>>>>>>>>>>>> jobs
>>>>>>>>>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>>>>>>>>> a ClusterDeploymentDescriptor to deploy per a
>>>>>>>>> job mode
>>>>>>>>>>>>>>>>> cluster
>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>>>>>>>> to execute.
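The interfaces outlined above can be sketched concretely. This is an illustrative toy only (signatures are hypothetical, the JobGraph is a String, and an in-memory stand-in replaces real YARN/Mesos/Kubernetes deployment), showing the shape of the session-vs-per-job contract:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed separation between cluster
// deployment, cluster interaction, and job interaction.
interface JobClient {
    String jobId();
}

interface ClusterClient {
    JobClient submitJob(String jobGraph);
}

interface ClusterDeploymentDescriptor {
    // Session mode: bring up a cluster; jobs are submitted later.
    ClusterClient deploySessionCluster();
    // Per-job mode: the cluster is born with exactly one job.
    JobClient deployPerJobCluster(String jobGraph);
}

class InMemoryDescriptor implements ClusterDeploymentDescriptor {
    final List<String> runningJobs = new ArrayList<>();

    public ClusterClient deploySessionCluster() {
        return jobGraph -> {
            runningJobs.add(jobGraph);
            final String id = "job-" + runningJobs.size();
            return () -> id;
        };
    }

    public JobClient deployPerJobCluster(String jobGraph) {
        runningJobs.add(jobGraph);
        final String id = "job-" + runningJobs.size();
        return () -> id;
    }
}

public class Main {
    public static void main(String[] args) {
        InMemoryDescriptor d = new InMemoryDescriptor();
        ClusterClient session = d.deploySessionCluster();
        JobClient a = session.submitJob("jobA");      // session: submit later
        JobClient b = d.deployPerJobCluster("jobB");  // per-job: deploy with job
        System.out.println(a.jobId() + " " + b.jobId());
    }
}
```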
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> These are just some thoughts how one could
>>>>>>>>> make it
>>>>>>>>>>>>>>> working
>>>>>>>>>>>>>>>>>>> because I
>>>>>>>>>>>>>>>>>>>>>>>> believe there is some value in using the per
>>>>>>>>> job mode
>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Concerning the web submission, this is indeed
>>>>>>>>> a bit
>>>>>>>>>>>>>>> tricky.
>>>>>>>>>>>>>>>>>> From
>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>> cluster
>>>>>>>>>>>>>>>>>>>>>>>> management stand point, I would in favour of
>>>>>>>>> not
>>>>>>>>>>>>>>> executing
>>>>>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> REST endpoint. Especially when considering
>>>>>>>>> security, it
>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>> good
>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>> have a well defined cluster behaviour where it
>>>>>>>>> is
>>>>>>>>>>>>>>> explicitly
>>>>>>>>>>>>>>>>>>> stated
>>>>>>>>>>>>>>>>>>>>>> where
>>>>>>>>>>>>>>>>>>>>>>>> user code and, thus, potentially risky code is
>>>>>>>>> executed.
>>>>>>>>>>>>>>>>>> Ideally
>>>>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>>> limit
>>>>>>>>>>>>>>>>>>>>>>>> it to the TaskExecutor and JobMaster.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>> Till
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 9:40 AM Flavio
>>>>>>>>> Pompermaier <
>>>>>>>>>>>>>>>>>>>>>> pompermaier@okkam.it
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> In my opinion the client should not use any
>>>>>>>>>>>>>>> environment to
>>>>>>>>>>>>>>>>>> get
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> Job
>>>>>>>>>>>>>>>>>>>>>>>>> graph because the jar should reside ONLY on
>>>>>>>>> the cluster
>>>>>>>>>>>>>>>>> (and
>>>>>>>>>>>>>>>>>>> not in
>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>> client classpath otherwise there are always
>>>>>>>>>>>>>>> inconsistencies
>>>>>>>>>>>>>>>>>>> between
>>>>>>>>>>>>>>>>>>>>>>>> client
>>>>>>>>>>>>>>>>>>>>>>>>> and Flink Job manager's classpath).
>>>>>>>>>>>>>>>>>>>>>>>>> In the YARN, Mesos and Kubernetes scenarios
>>>>>>>>> you have
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> jar
>>>>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>> you
>>>>>>>>>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>>>>>>>>> start a cluster that has the jar on the Job
>>>>>>>>> Manager as
>>>>>>>>>>>>>>> well
>>>>>>>>>>>>>>>>>>> (but
>>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>>>> the only case where I think you can assume
>>>>>>>>> that the
>>>>>>>>>>>>>>> client
>>>>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> jar
>>>>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>>> the classpath..in the REST job submission
>>>>>>>>> you don't
>>>>>>>>>>>>>>> have
>>>>>>>>>>>>>>>>> any
>>>>>>>>>>>>>>>>>>>>>>> classpath).
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Thus, always in my opinion, the JobGraph
>>>>>>>>> should be
>>>>>>>>>>>>>>>>> generated
>>>>>>>>>>>>>>>>>>> by the
>>>>>>>>>>>>>>>>>>>>>> Job
>>>>>>>>>>>>>>>>>>>>>>>>> Manager REST API.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <
>>>>>>>>>>>>>>>>>>> wander4096@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> I would like to involve Till & Stephan here
>>>>>>>>> to clarify
>>>>>>>>>>>>>>>>> some
>>>>>>>>>>>>>>>>>>>>> concept
>>>>>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>>>> per-job mode.
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> The term per-job is one of modes a cluster
>>>>>>>>> could run
>>>>>>>>>>>>>>> on.
>>>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>> mainly
>>>>>>>>>>>>>>>>>>>>>>>>>> aimed
>>>>>>>>>>>>>>>>>>>>>>>>>> at spawning
>>>>>>>>>>>>>>>>>>>>>>>>>> a dedicated cluster for a specific job
>>>>>>>>> while the job
>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>> packaged
>>>>>>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>>>>> Flink
>>>>>>>>>>>>>>>>>>>>>>>>>> itself and thus the cluster initialized
>>>>>>>>> with job so
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> get
>>>>>>>>>>>>>>>>>>> rid
>>>>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>>>> separated
>>>>>>>>>>>>>>>>>>>>>>>>>> submission step.
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> This is useful for container deployments
>>>>>>>>> where one
>>>>>>>>>>>>>>> create
>>>>>>>>>>>>>>>>>> his
>>>>>>>>>>>>>>>>>>>>> image
>>>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>>>>> the job
>>>>>>>>>>>>>>>>>>>>>>>>>> and then simply deploy the container.
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> However, it is out of client scope since a
>>>>>>>>>>>>>>>>>>> client(ClusterClient
>>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>>>> example) is for
>>>>>>>>>>>>>>>>>>>>>>>>>> communicate with an existing cluster and
>>>>>>>>>>>>>>>>>>>>>>>> perform
>>>>>>>>>>>>>>>>>> actions.
>>>>>>>>>>>>>>>>>>>>>>> Currently,
>>>>>>>>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>>>>>> per-job
>>>>>>>>>>>>>>>>>>>>>>>>>> mode, we extract the job graph and bundle
>>>>>>>>> it into
>>>>>>>>>>>>>>> cluster
>>>>>>>>>>>>>>>>>>>>> deployment
>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>> thus no
>>>>>>>>>>>>>>>>>>>>>>>>>> concept of client gets involved. It looks
>>>>>>>>> reasonable
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>> exclude
>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>> deployment
>>>>>>>>>>>>>>>>>>>>>>>>>> of per-job cluster from client api and use
>>>>>>>>> dedicated
>>>>>>>>>>>>>>>>> utility
>>>>>>>>>>>>>>>>>>>>>>>>>> classes(deployers) for
>>>>>>>>>>>>>>>>>>>>>>>>>> deployment.
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Zili Chen <wa...@gmail.com>
>>>>>>>>> 于2019年8月20日周二
>>>>>>>>>>>>>>> 下午12:37写道:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Aljoscha,
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your reply and participance.
>>>>>>>>> The Google
>>>>>>>>>>>>>>> Doc
>>>>>>>>>>>>>>>>> you
>>>>>>>>>>>>>>>>>>>>> linked
>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>> requires
>>>>>>>>>>>>>>>>>>>>>>>>>>> permission and I think you could use a
>>>>>>>>> share link
>>>>>>>>>>>>>>>>> instead.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> I agree with that we almost reach a
>>>>>>>>> consensus that
>>>>>>>>>>>>>>>>>>> JobClient is
>>>>>>>>>>>>>>>>>>>>>>>>>> necessary
>>>>>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>> interacte with a running Job.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me check your open questions one by
>>>>>>>>> one.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. Separate cluster creation and job
>>>>>>>>> submission for
>>>>>>>>>>>>>>>>>> per-job
>>>>>>>>>>>>>>>>>>>>> mode.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> As you mentioned here is where the
>>>>>>>>> opinions
>>>>>>>>>>>>>>> diverge. In
>>>>>>>>>>>>>>>>> my
>>>>>>>>>>>>>>>>>>>>>> document
>>>>>>>>>>>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>>>>>> an alternative[2] that proposes excluding
>>>>>>>>> per-job
>>>>>>>>>>>>>>>>>> deployment
>>>>>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>>>>>>>> client
>>>>>>>>>>>>>>>>>>>>>>>>>>> api
>>>>>>>>>>>>>>>>>>>>>>>>>>> scope and now I find it is more
>>>>>>>>> reasonable we do the
>>>>>>>>>>>>>>>>>>> exclusion.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> When in per-job mode, a dedicated
>>>>>>>>> JobCluster is
>>>>>>>>>>>>>>> launched
>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>> execute
>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>> specific job. It is like a Flink
>>>>>>>>> Application more
>>>>>>>>>>>>>>> than a
>>>>>>>>>>>>>>>>>>>>>> submission
>>>>>>>>>>>>>>>>>>>>>>>>>>> of Flink Job. Client only takes care of
>>>>>>>>> job
>>>>>>>>>>>>>>> submission
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>> assumes
>>>>>>>>>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>>>>>> an existing cluster. In this way we are
>>>>>>>>> able to
>>>>>>>>>>>>>>> consider
>>>>>>>>>>>>>>>>>>> per-job
>>>>>>>>>>>>>>>>>>>>>>>> issues
>>>>>>>>>>>>>>>>>>>>>>>>>>> individually and JobClusterEntrypoint
>>>>>>>>> would be the
>>>>>>>>>>>>>>>>> utility
>>>>>>>>>>>>>>>>>>> class
>>>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>>>>> per-job
>>>>>>>>>>>>>>>>>>>>>>>>>>> deployment.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Nevertheless, user program works in both
>>>>>>>>> session
>>>>>>>>>>>>>>> mode
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>> per-job
>>>>>>>>>>>>>>>>>>>>>>> mode
>>>>>>>>>>>>>>>>>>>>>>>>>>> without
>>>>>>>>>>>>>>>>>>>>>>>>>>> necessary to change code. JobClient in
>>>>>>>>> per-job mode
>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>> returned
>>>>>>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>>>>>>>>>>> env.execute as normal. However, it would
>>>>>>>>> be no
>>>>>>>>>>>>>>> longer a
>>>>>>>>>>>>>>>>>>> wrapper
>>>>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>>>>> RestClusterClient but a wrapper of
>>>>>>>>>>>>>>> PerJobClusterClient
>>>>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>>>>>>>>> communicates
>>>>>>>>>>>>>>>>>>>>>>>>>>> to Dispatcher locally.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. How to deal with plan preview.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> With env.compile functions users can get
>>>>>>>>> JobGraph or
>>>>>>>>>>>>>>>>>>> FlinkPlan
>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>> thus
>>>>>>>>>>>>>>>>>>>>>>>>>>> they can preview the plan with
>>>>>>>>> programming.
>>>>>>>>>>>>>>> Typically it
>>>>>>>>>>>>>>>>>>> looks
>>>>>>>>>>>>>>>>>>>>>> like
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> if (preview configured) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>   FlinkPlan plan = env.compile();
>>>>>>>>>>>>>>>>>>>>>>>>>>>   new JSONDumpGenerator(...).dump(plan);
>>>>>>>>>>>>>>>>>>>>>>>>>>> } else {
>>>>>>>>>>>>>>>>>>>>>>>>>>>   env.execute();
>>>>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> And `flink info` would be invalid any
>>>>>>>>> more.
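The control flow in that pseudo-code can be made concrete with a runnable toy. The `compile()`/`Plan` names are hypothetical and unrelated to Flink's real FlinkPlan/OptimizedPlan types; the sketch only shows that a preview path inspects the plan without starting a job, while the normal path executes:

```java
// Toy sketch of the preview-vs-execute split: a hypothetical compile()
// exposes the plan without hijacking execute().
class Plan {
    final String[] operators;
    Plan(String... operators) { this.operators = operators; }
}

class Env {
    private final String[] pipeline = {"source", "map", "sink"};

    Plan compile() {                  // preview path: no job is started
        return new Plan(pipeline);
    }

    String execute() {                // normal path: actually run the job
        return "executed " + pipeline.length + " operators";
    }
}

public class Main {
    public static void main(String[] args) {
        boolean preview = args.length > 0 && args[0].equals("--preview");
        Env env = new Env();
        if (preview) {
            // stands in for new JSONDumpGenerator(...).dump(plan)
            System.out.println(String.join(" -> ", env.compile().operators));
        } else {
            System.out.println(env.execute());
        }
    }
}
```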
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 3. How to deal with Jar Submission at the
>>>>>>>>> Web
>>>>>>>>>>>>>>> Frontend.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> There is one more thread talked on this
>>>>>>>>> topic[1].
>>>>>>>>>>>>>>> Apart
>>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>>>>>>> removing
>>>>>>>>>>>>>>>>>>>>>>>>>>> the functions there are two alternatives.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> One is to introduce an interface has a
>>>>>>>>> method
>>>>>>>>>>>>>>> returns
>>>>>>>>>>>>>>>>>>>>>>>> JobGraph/FilnkPlan
>>>>>>>>>>>>>>>>>>>>>>>>>>> and Jar Submission only support main-class
>>>>>>>>>>>>>>> implements
>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>> interface.
>>>>>>>>>>>>>>>>>>>>>>>>>>> And then extract the JobGraph/FlinkPlan
>>>>>>>>> just by
>>>>>>>>>>>>>>> calling
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> method.
>>>>>>>>>>>>>>>>>>>>>>>>>>> In this way, it is even possible to
>>>>>>>>> consider a
>>>>>>>>>>>>>>>>> separation
>>>>>>>>>>>>>>>>>>> of job
>>>>>>>>>>>>>>>>>>>>>>>>>> creation
>>>>>>>>>>>>>>>>>>>>>>>>>>> and job submission.
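The first alternative can be pictured as follows. The `PlanProvider` interface and the reflection usage are purely illustrative assumptions, not an existing Flink facility; they show how a web frontend could extract a graph from a user's main class without ever running `main()`:

```java
// Illustrative sketch: the user's main class additionally implements a
// hypothetical PlanProvider, so the web frontend can obtain the graph by
// reflection instead of executing main() and hijacking execute().
interface PlanProvider {
    String getJobGraph();
}

class WordCountProgram implements PlanProvider {
    public String getJobGraph() {
        return "source->tokenize->count->sink";
    }
    public static void main(String[] args) { /* normal execution path */ }
}

public class Main {
    public static void main(String[] args) throws Exception {
        // What a web frontend could do with the submitted main class:
        Class<?> mainClass = Class.forName("WordCountProgram");
        if (PlanProvider.class.isAssignableFrom(mainClass)) {
            PlanProvider p =
                (PlanProvider) mainClass.getDeclaredConstructor().newInstance();
            System.out.println(p.getJobGraph()); // graph extracted, main() never ran
        }
    }
}
```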
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> The other is, as you mentioned, let
>>>>>>>>> execute() do the
>>>>>>>>>>>>>>>>>> actual
>>>>>>>>>>>>>>>>>>>>>>> execution.
>>>>>>>>>>>>>>>>>>>>>>>>>>> We won't execute the main method in the
>>>>>>>>> WebFrontend
>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>> spawn a
>>>>>>>>>>>>>>>>>>>>>>>> process
>>>>>>>>>>>>>>>>>>>>>>>>>>> at WebMonitor side to execute. For return
>>>>>>>>> part we
>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>>> generate
>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>> JobID from WebMonitor and pass it to the
>>>>>>>>> execution
>>>>>>>>>>>>>>>>>>> environemnt.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 4. How to deal with detached mode.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> I think detached mode is a temporary
>>>>>>>>> solution for
>>>>>>>>>>>>>>>>>>> non-blocking
>>>>>>>>>>>>>>>>>>>>>>>>>> submission.
>>>>>>>>>>>>>>>>>>>>>>>>>>> In my document both submission and
>>>>>>>>> execution return
>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>> CompletableFuture
>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>> users control whether or not wait for the
>>>>>>>>> result. In
>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>> point
>>>>>>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>>>>> don't
>>>>>>>>>>>>>>>>>>>>>>>>>>> need a detached option but the
>>>>>>>>> functionality is
>>>>>>>>>>>>>>> covered.
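That unification can be shown with a plain CompletableFuture (a sketch under the stated assumption that submission always returns a future; the `submit` helper is hypothetical). "Detached" is then just the caller choosing not to block, with no separate mode flag:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Sketch: submission returns a future immediately; waiting is optional.
public class Main {

    static CompletableFuture<String> submit(String jobGraph) {
        return CompletableFuture.supplyAsync(() -> {
            try { TimeUnit.MILLISECONDS.sleep(50); }   // the job running
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            return jobGraph + ": FINISHED";
        });
    }

    public static void main(String[] args) throws Exception {
        // "Attached": wait for the result.
        String result = submit("jobA").get();
        System.out.println(result);

        // "Detached": keep the handle, return immediately, come back later.
        CompletableFuture<String> handle = submit("jobB");
        System.out.println("submitted, not waiting");
        handle.get();  // or never call get() at all
    }
}
```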
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 5. How does per-job mode interact with
>>>>>>>>> interactive
>>>>>>>>>>>>>>>>>>> programming.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> All of YARN, Mesos and Kubernetes
>>>>>>>>> scenarios follow
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> pattern
>>>>>>>>>>>>>>>>>>>>>>> launch
>>>>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>>>>> JobCluster now. And I don't think there
>>>>>>>>> would be
>>>>>>>>>>>>>>>>>>> inconsistency
>>>>>>>>>>>>>>>>>>>>>>> between
>>>>>>>>>>>>>>>>>>>>>>>>>>> different resource management.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>>>>> tison.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>> https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
>>>>>>>>>>>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Aljoscha Krettek <al...@apache.org>
>>>>>>>>>>>>>>> 于2019年8月16日周五
>>>>>>>>>>>>>>>>>>> 下午9:20写道:
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I read both Jeffs initial design
>>>>>>>>> document and the
>>>>>>>>>>>>>>> newer
>>>>>>>>>>>>>>>>>>>>> document
>>>>>>>>>>>>>>>>>>>>>> by
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Tison. I also finally found the time to
>>>>>>>>> collect our
>>>>>>>>>>>>>>>>>>> thoughts on
>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>> issue,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I had quite some discussions with Kostas
>>>>>>>>> and this
>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> result:
>>>>>>>>>>>>>>>>>>>>>>> [1].
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think overall we agree that this part
>>>>>>>>> of the
>>>>>>>>>>>>>>> code is
>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>> dire
>>>>>>>>>>>>>>>>>>>>>> need
>>>>>>>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>>>>>> some refactoring/improvements but I
>>>>>>>>> think there are
>>>>>>>>>>>>>>>>> still
>>>>>>>>>>>>>>>>>>> some
>>>>>>>>>>>>>>>>>>>>>> open
>>>>>>>>>>>>>>>>>>>>>>>>>>>> questions and some differences in
>>>>>>>>> opinion what
>>>>>>>>>>>>>>> those
>>>>>>>>>>>>>>>>>>>>> refactorings
>>>>>>>>>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>>>>>>>>>>> look like.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think the API-side is quite clear,
>>>>>>>>> i.e. we need
>>>>>>>>>>>>>>> some
>>>>>>>>>>>>>>>>>>>>> JobClient
>>>>>>>>>>>>>>>>>>>>>>> API
>>>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>>>> allows interacting with a running Job.
>>>>>>>>> It could be
>>>>>>>>>>>>>>>>>>> worthwhile
>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>> spin
>>>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>>>> off into a separate FLIP because we can
>>>>>>>>> probably
>>>>>>>>>>>>>>> find
>>>>>>>>>>>>>>>>>>> consensus
>>>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>>>> part more easily.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> For the rest, the main open questions
>>>>>>>>> from our doc
>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>> these:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Do we want to separate cluster
>>>>>>>>> creation and job
>>>>>>>>>>>>>>>>>>> submission
>>>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>>>>>> per-job mode? In the past, there were
>>>>>>>>> conscious
>>>>>>>>>>>>>>> efforts
>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>> *not*
>>>>>>>>>>>>>>>>>>>>>>>>>> separate
job submission from cluster creation for per-job clusters for Mesos, YARN, and Kubernetes (see StandaloneJobClusterEntryPoint). Tison suggests in his design document to decouple this in order to unify job submission.

- How to deal with plan preview, which needs to hijack execute() and let the outside code catch an exception?

- How to deal with Jar Submission at the Web Frontend, which needs to hijack execute() and let the outside code catch an exception? CliFrontend.run() "hijacks" ExecutionEnvironment.execute() to get a JobGraph and then executes that JobGraph manually. We could get around that by letting execute() do the actual execution. One caveat is that the main() method then doesn't return (or is forced to return by an exception thrown from execute()), which means that for Jar Submission from the WebFrontend we would have a long-running main() method inside the WebFrontend. This doesn't sound very good. We could avoid it by removing the plan preview feature and by removing Jar Submission/Running.

- How to deal with detached mode? Right now, DetachedEnvironment executes the job and returns immediately. If users control when they want to return, by waiting on the job completion future, how do we deal with this? Do we simply remove the distinction between detached/non-detached?

- How does per-job mode interact with "interactive programming" (FLIP-36)? For YARN, each execute() call could spawn a new Flink YARN cluster. What about Mesos and Kubernetes?

The first open question is where the opinions diverge, I think. The rest are open questions and interesting things that we need to consider.

Best,
Aljoscha

[1]
https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
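The "hijack execute()" control-flow trick Aljoscha describes can be made concrete with a minimal sketch. All types below are illustrative stand-ins, not Flink's real classes: the environment's execute() captures the compiled plan and aborts the user's main() via an exception that the caller catches.

```java
// Sketch of the "hijack execute()" pattern: a preview environment whose
// execute() never runs the job, but throws an exception carrying the
// compiled graph, which the front-end catches.
public class HijackDemo {
    // Stand-in for a compiled JobGraph.
    static class JobGraph {
        final String name;
        JobGraph(String name) { this.name = name; }
    }

    // Thrown from execute() to abort the user's main() early.
    static class ProgramAbortException extends RuntimeException {
        final JobGraph jobGraph;
        ProgramAbortException(JobGraph jobGraph) { this.jobGraph = jobGraph; }
    }

    // Stand-in execution environment that never actually runs the job.
    static class PreviewEnvironment {
        void execute(String jobName) {
            throw new ProgramAbortException(new JobGraph(jobName));
        }
    }

    // The "user program": builds a pipeline and calls execute().
    static void userMain(PreviewEnvironment env) {
        env.execute("word-count");
        // Never reached: execute() aborts the program.
    }

    // The "front end": runs the user main and catches the abort
    // to extract the JobGraph without executing it.
    static JobGraph extractJobGraph() {
        PreviewEnvironment env = new PreviewEnvironment();
        try {
            userMain(env);
        } catch (ProgramAbortException e) {
            return e.jobGraph;
        }
        throw new IllegalStateException("user main() returned without calling execute()");
    }

    public static void main(String[] args) {
        System.out.println(extractJobGraph().name); // prints "word-count"
    }
}
```

The caveat named in the mail follows directly from this shape: the user's main() only "returns" by exception, so any caller that wants the job to actually run must keep the main() frame alive.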
On 31. Jul 2019, at 15:23, Jeff Zhang <zjffdu@gmail.com> wrote:

Thanks tison for the effort. I left a few comments.
Zili Chen <wa...@gmail.com> wrote on Wed, Jul 31, 2019 at 8:24 PM:

Hi Flavio,

Thanks for your reply.

Neither in the current implementation nor in the design does ClusterClient take responsibility for generating the JobGraph. (What you see in the current codebase are several class methods.)

Instead, the user describes the program in the main method with the ExecutionEnvironment APIs and calls env.compile() or env.optimize() to get the FlinkPlan and JobGraph, respectively.

For listing the main classes in a jar and choosing one for submission, you are already able to customize a CLI to do it. Specifically, the path of the jar is passed as an argument, and in the customized CLI you list the main classes and choose one to submit to the cluster.

Best,
tison.
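The explicit-compilation style tison describes, where the environment returns the compiled artifact instead of hijacking execute(), can be sketched as follows. The types are stand-ins, and compile() merely mirrors the env.compile()/env.optimize() names from the mail; it is not a guaranteed public Flink API.

```java
// Sketch of explicit compilation: the environment exposes a method that
// returns the compiled JobGraph directly, leaving submission to the caller.
public class CompileDemo {
    // Stand-in for a compiled JobGraph.
    static class JobGraph {
        final int operatorCount;
        JobGraph(int operatorCount) { this.operatorCount = operatorCount; }
    }

    // Stand-in environment accumulating a program description.
    static class Environment {
        private int operators;
        Environment addOperator() { operators++; return this; }
        // Compile the described program into a JobGraph; no cluster involved.
        JobGraph compile() { return new JobGraph(operators); }
    }

    public static void main(String[] args) {
        JobGraph g = new Environment().addOperator().addOperator().compile();
        System.out.println(g.operatorCount); // prints 2
    }
}
```

Compared with the exception-based hijack, the caller gets the JobGraph through an ordinary return value, which is what makes the flow reusable by downstream projects.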
Flavio Pompermaier <pompermaier@okkam.it> wrote on Wed, Jul 31, 2019 at 8:12 PM:

Just one note on my side: it is not clear to me whether the client needs to be able to generate a job graph or not. In my opinion, the job jar should reside only on the server/JobManager side, and the client needs a way to get the job graph. If you really want access to the job graph, I'd add dedicated methods on the ClusterClient, like:

- getJobGraph(jarId, mainClass): JobGraph
- listMainClasses(jarId): List<String>

These would require some additions on the job manager endpoint as well. What do you think?
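The two additions Flavio proposes can be sketched as an interface plus a toy implementation. The method signatures follow the mail; the in-memory backing is purely illustrative (a real implementation would introspect the uploaded jar on the JobManager side).

```java
import java.util.List;
import java.util.Map;

// Sketch of the proposed ClusterClient additions: list the main classes
// of an uploaded jar, and compile a JobGraph for a chosen main class.
public class JarIntrospectionDemo {
    // Stand-in for a compiled JobGraph.
    static class JobGraph {
        final String mainClass;
        JobGraph(String mainClass) { this.mainClass = mainClass; }
    }

    interface ClusterClient {
        List<String> listMainClasses(String jarId);
        JobGraph getJobGraph(String jarId, String mainClass);
    }

    // Toy implementation backed by a map of jarId -> main classes.
    static class InMemoryClusterClient implements ClusterClient {
        private final Map<String, List<String>> jars;
        InMemoryClusterClient(Map<String, List<String>> jars) { this.jars = jars; }

        @Override
        public List<String> listMainClasses(String jarId) {
            return jars.getOrDefault(jarId, List.of());
        }

        @Override
        public JobGraph getJobGraph(String jarId, String mainClass) {
            if (!listMainClasses(jarId).contains(mainClass)) {
                throw new IllegalArgumentException("unknown main class: " + mainClass);
            }
            return new JobGraph(mainClass);
        }
    }

    public static void main(String[] args) {
        ClusterClient client = new InMemoryClusterClient(
                Map.of("jar-1", List.of("com.acme.WordCount", "com.acme.PageRank")));
        System.out.println(client.listMainClasses("jar-1"));
        System.out.println(client.getJobGraph("jar-1", "com.acme.WordCount").mainClass);
    }
}
```

The jar id and class names here are made-up examples; the point is that a UI could call listMainClasses first and feed the user's pick into getJobGraph.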
On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <wander4096@gmail.com> wrote:
Hi all,

Here is a document[1] on client API enhancement from our perspective. We have investigated the current implementations, and we propose to:

1. Unify the implementation of cluster deployment and job submission in Flink.
2. Provide programmatic interfaces to allow flexible job and cluster management.

The first proposal aims at reducing the code paths of cluster deployment and job submission so that one can adopt Flink easily. The second proposal aims at providing rich interfaces for advanced users who want accurate control of these stages.

Quick reference on open questions:

1. Exclude job cluster deployment from the client side, or redefine the semantics of the job cluster? It fits in a process quite different from session cluster deployment and job submission.

2. Maintain the code paths handling the class o.a.f.api.common.Program, or implement customized program handling logic via a customized CliFrontend? See also this thread[2] and the document[1].

3. Expose ClusterClient as a public API, or just expose an API in ExecutionEnvironment and delegate to ClusterClient? Further, in either case, is it worth introducing a JobClient, an encapsulation of ClusterClient associated with a specific job?

Best,
tison.

[1]
https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
[2]
https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
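Open question 3 above, a JobClient encapsulating a ClusterClient for one specific job, can be sketched like this. All types are stand-ins chosen for illustration; the real API shape was exactly what the thread was still debating.

```java
// Sketch of a JobClient that wraps a ClusterClient and a fixed job id,
// so per-job operations don't need the id threaded through every call.
public class JobClientDemo {
    interface ClusterClient {
        String getStatus(String jobId);
        void cancel(String jobId);
    }

    // Encapsulates one job: delegates to the ClusterClient with a bound id.
    static class JobClient {
        private final ClusterClient cluster;
        private final String jobId;
        JobClient(ClusterClient cluster, String jobId) {
            this.cluster = cluster;
            this.jobId = jobId;
        }
        String getStatus() { return cluster.getStatus(jobId); }
        void cancel() { cluster.cancel(jobId); }
    }

    public static void main(String[] args) {
        ClusterClient cluster = new ClusterClient() {
            @Override public String getStatus(String jobId) { return "RUNNING(" + jobId + ")"; }
            @Override public void cancel(String jobId) { /* no-op in this sketch */ }
        };
        JobClient job = new JobClient(cluster, "job-42");
        System.out.println(job.getStatus()); // prints "RUNNING(job-42)"
    }
}
```

The design choice at stake: downstream projects hold a handle scoped to their job, while the cluster-level client stays an internal or advanced-user API.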
Jeff Zhang <zj...@gmail.com> wrote on Wed, Jul 24, 2019 at 9:19 AM:

Thanks Stephan, I will follow up on this issue in the next few weeks and will refine the design doc. We can discuss more details after the 1.9 release.
Stephan Ewen <se...@apache.org> wrote on Wed, Jul 24, 2019 at 12:58 AM:

Hi all!

This thread has stalled for a bit, which I assume is mostly due to the Flink 1.9 feature freeze and release testing effort.

I personally still recognize this issue as an important one to be solved. I'd be happy to help resume this discussion soon (after the 1.9 release) and see if we can take some steps towards this in Flink 1.10.

Best,
Stephan

On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <pompermaier@okkam.it> wrote:
That's exactly what I suggested a long time ago: the Flink REST client should not require any Flink dependency, only an HTTP library to call the REST services to submit and monitor a job.

What I also suggested in [1] was to have a way to automatically suggest to the user (via a UI) the available main classes and their required parameters [2].

Another problem we have with Flink is that the REST client and the CLI client behave differently, and we use the CLI client (via ssh) because it allows calling some other method after env.execute() [3] (we have to call another REST service to signal the end of the job).

In this regard, a dedicated interface, like the JobListener suggested in the previous emails, would be very helpful (IMHO).

[1] https://issues.apache.org/jira/browse/FLINK-10864
[2] https://issues.apache.org/jira/browse/FLINK-10862
[3] https://issues.apache.org/jira/browse/FLINK-10879

Best,
Flavio

On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zjffdu@gmail.com> wrote:
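Flavio's point, that submission should need only an HTTP library, can be illustrated against Flink's REST endpoints (`/jars/:jarid/run` to start a job from an uploaded jar, `/jobs/:jobid` for status) with nothing but JDK types. A sketch: the requests are only constructed here, not sent, and the host/port, jar id, and job id are made-up examples.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: building Flink REST requests with the JDK's java.net.http types
// and zero Flink dependencies. A real client would send these with
// HttpClient and parse the JSON responses.
public class RestSubmitDemo {
    // URI for running an uploaded jar: POST /jars/:jarid/run
    static String runJarUri(String base, String jarId) {
        return base + "/jars/" + jarId + "/run";
    }

    // URI for polling a job: GET /jobs/:jobid
    static String jobStatusUri(String base, String jobId) {
        return base + "/jobs/" + jobId;
    }

    public static void main(String[] args) {
        String base = "http://localhost:8081"; // example REST address

        HttpRequest run = HttpRequest.newBuilder()
                .uri(URI.create(runJarUri(base, "abc123_job.jar")))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"entryClass\":\"com.acme.WordCount\"}"))
                .build();
        System.out.println(run.method() + " " + run.uri());

        HttpRequest status = HttpRequest.newBuilder()
                .uri(URI.create(jobStatusUri(base, "fd72014d4c864993a2e5a9287b4a9c5d")))
                .GET()
                .build();
        System.out.println(status.method() + " " + status.uri());
    }
}
```

This is exactly the dependency footprint Flavio is arguing for: the client knows URLs and JSON, nothing about Flink's classpath.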
Hi Tison,

Thanks for your comments. Overall I agree with you that it is difficult for downstream projects to integrate with Flink and that we need to refactor the current Flink client API.

And I agree that CliFrontend should only parse command line arguments and then pass them to the ExecutionEnvironment. It is the ExecutionEnvironment's responsibility to compile the job, create the cluster, and submit the job. Besides that, Flink currently has many ExecutionEnvironment implementations, and Flink uses a specific one based on the context. IMHO this is not necessary: the ExecutionEnvironment should be able to do the right thing based on the FlinkConf it receives. Too many ExecutionEnvironment implementations are another burden for downstream project integration.

One thing I'd like to mention is Flink's Scala shell and SQL client: although they are sub-modules of Flink, they can be treated as downstream projects that use Flink's client API. Currently you will find it is not easy for them to integrate with Flink, and they share a lot of duplicated code. It is another sign that we should refactor the Flink client API.

I believe it is a large and hard change, and I am afraid we cannot keep compatibility, since many of the changes are user facing.
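Jeff's idea that a single ExecutionEnvironment could pick the right execution behavior from the configuration it receives, rather than Flink shipping many environment subclasses, can be sketched as a small factory. The `execution.target` key and the target names are illustrative assumptions, not something this thread fixed.

```java
import java.util.Map;

// Sketch: one environment-side factory that selects execution behavior
// from configuration instead of relying on environment subclasses.
public class UnifiedEnvDemo {
    interface Executor { String execute(String jobName); }

    // Chooses an executor from the (hypothetical) "execution.target" entry.
    static Executor forConfig(Map<String, String> conf) {
        String target = conf.getOrDefault("execution.target", "local");
        switch (target) {
            case "local":   return job -> "executed " + job + " in-process";
            case "session": return job -> "submitted " + job + " to existing session";
            case "per-job": return job -> "deployed dedicated cluster for " + job;
            default: throw new IllegalArgumentException("unknown target: " + target);
        }
    }

    public static void main(String[] args) {
        Executor exec = forConfig(Map.of("execution.target", "per-job"));
        System.out.println(exec.execute("word-count"));
        // prints "deployed dedicated cluster for word-count"
    }
}
```

Downstream projects like the Scala shell or SQL client would then only construct a configuration, not pick (or subclass) an environment type.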
Zili Chen <wander4096@gmail.com> wrote on Mon, Jun 24, 2019 at 2:53 PM:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> After a closer look on our
>>>>>>>>> client apis,
>>>>>>>>>>>>>>> I can
>>>>>>>>>>>>>>>>>> see
>>>>>>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> two
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> major
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> issues to consistency and
>>>>>>>>> integration,
>>>>>>>>>>>>>>> namely
>>>>>>>>>>>>>>>>>>>>> different
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> deployment
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job cluster which couples job
>>>>>>>>> graph
>>>>>>>>>>>>>>> creation
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>> cluster
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> deployment,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and submission via CliFrontend
>>>>>>>>> confusing
>>>>>>>>>>>>>>>>>> control
>>>>>>>>>>>>>>>>>>> flow
>>>>>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> graph
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> compilation and job
>>>>>>>>> submission. I'd like
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> follow
>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> discuss
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> above,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mainly the process described
>>>>>>>>> by Jeff and
>>>>>>>>>>>>>>>>>>> Stephan, and
>>>>>>>>>>>>>>>>>>>>>>> share
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> my
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ideas on these issues.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) CliFrontend confuses the
>>>>>>>>> control flow
>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>>>>>>> compilation
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> submission.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Following the process of job
>>>>>>>>> submission
>>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>> Jeff
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> described,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> execution environment knows
>>>>>>>>> all configs
>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> cluster
>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> topos/settings
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of the job. Ideally, in the
>>>>>>>>> main method
>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>>>>>> program,
>>>>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> calls
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> #execute
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (or named #submit) and Flink
>>>>>>>>> deploys the
>>>>>>>>>>>>>>>>>> cluster,
>>>>>>>>>>>>>>>>>>>>>> compile
>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> graph
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and submit it to the cluster.
>>>>>>>>> However,
>>>>>>>>>>>>>>>>> current
>>>>>>>>>>>>>>>>>>>>>>> CliFrontend
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> does
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> all
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> things inside its #runProgram
>>>>>>>>> method,
>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>>>> introduces
>>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>> lot
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> subclasses
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of (stream) execution
>>>>>>>>> environment.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Actually, it sets up an exec
>>>>>>>>> env that
>>>>>>>>>>>>>>> hijacks
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> #execute/executePlan
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> method, initializes the job
>>>>>>>>> graph and
>>>>>>>>>>>>>>> abort
>>>>>>>>>>>>>>>>>>>>> execution.
>>>>>>>>>>>>>>>>>>>>>>> And
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> then
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> control flow back to
>>>>>>>>> CliFrontend, it
>>>>>>>>>>>>>>> deploys
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> cluster(or
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> retrieve
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the client) and submits the
>>>>>>>>> job graph.
>>>>>>>>>>>>>>> This
>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>> quite
>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> specific
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> internal
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> process inside Flink and none
>>>>>>>>> of
>>>>>>>>>>>>>>> consistency
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>> anything.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Deployment of job cluster
>>>>>>>>> couples job
>>>>>>>>>>>>>>>>> graph
>>>>>>>>>>>>>>>>>>>>> creation
>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cluster
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> deployment. Abstractly, from
>>>>>>>>> user job to
>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>> concrete
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> submission,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> requires
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   create JobGraph --\
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> create ClusterClient -->
>>>>>>>>> submit JobGraph
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> such a dependency.
>>>>>>>>> ClusterClient was
>>>>>>>>>>>>>>> created
>>>>>>>>>>>>>>>>> by
>>>>>>>>>>>>>>>>>>>>>> deploying
>>>>>>>>>>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> retrieving.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> JobGraph submission requires a
>>>>>>>>> compiled
>>>>>>>>>>>>>>>>>> JobGraph
>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>> valid
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> but the creation of
>>>>>>>>> ClusterClient is
>>>>>>>>>>>>>>>>> abstractly
>>>>>>>>>>>>>>>>>>>>>>> independent
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> JobGraph. However, in job
>>>>>>>>> cluster mode,
>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>> deploy job
>>>>>>>>>>>>>>>>>>>>>>>> cluster
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> graph, which means we use
>>>>>>>>> another
>>>>>>>>>>>>>>> process:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> create JobGraph --> deploy
>>>>>>>>> cluster with
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> JobGraph
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Here is another inconsistency
>>>>>>>>> and
>>>>>>>>>>>>>>> downstream
>>>>>>>>>>>>>>>>>>>>>>>> projects/client
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> apis
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> forced to handle different
>>>>>>>>> cases with
>>>>>>>>>>>>>>> rare
>>>>>>>>>>>>>>>>>>> supports
>>>>>>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Flink.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Since we likely reached a
>>>>>>>>> consensus on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. all configs gathered by
>>>>>>>>> Flink
>>>>>>>>>>>>>>>>> configuration
>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>> passed
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. execution environment knows
>>>>>>>>> all
>>>>>>>>>>>>>>> configs
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>> handles
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> execution(both
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> deployment and submission)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to the issues above I propose
>>>>>>>>> eliminating
>>>>>>>>>>>>>>>>>>>>>> inconsistencies
>>>>>>>>>>>>>>>>>>>>>>>> by
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> following
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> approach:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
> 1) CliFrontend should be exactly a front end, at least for the "run"
> command. That means it just gathers and passes all config from the
> command line to the main method of the user program. The execution
> environment knows all the info, and with an addition to utils for
> ClusterClient we gracefully get a ClusterClient by deploying or
> retrieving one. In this way, we don't need to hijack the
> #execute/#executePlan methods and can remove various hacky subclasses
> of the execution environment, as well as the #run methods in
> ClusterClient (for an interface-ized ClusterClient). The control flow
> then flows from CliFrontend to the main method and never returns.
>
> 2) A job cluster means a cluster for a specific job. From another
> perspective, it is an ephemeral session. We may decouple the deployment
> from the compiled job graph: start a session with an idle timeout and
> submit the job afterwards.
>
> These topics are better to be aware of and discussed to consensus
> before we go into more detail on design or implementation.
>
> Best,
> tison.
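tison's point 1) can be sketched roughly as follows. This is a hypothetical illustration of "deploying or retrieving" a ClusterClient from configuration; the class names, config key, and return values are made up for the sketch and are not Flink's actual API.

```java
import java.util.Map;

// Hypothetical sketch: the environment resolves a ClusterClient from
// configuration by either retrieving an existing cluster or deploying a
// new one -- no hijacked #execute method needed.
public class ClusterClientResolution {

    interface ClusterClient {
        String submit(String jobName); // returns an illustrative job handle
    }

    static class ExistingClusterClient implements ClusterClient {
        private final String address;
        ExistingClusterClient(String address) { this.address = address; }
        public String submit(String jobName) { return "job@" + address + "/" + jobName; }
    }

    static class FreshlyDeployedClient implements ClusterClient {
        public String submit(String jobName) { return "job@new-cluster/" + jobName; }
    }

    // "deploying or retrieving": if the config points at a running cluster,
    // retrieve a client for it; otherwise deploy a new cluster first.
    static ClusterClient deployOrRetrieve(Map<String, String> config) {
        String address = config.get("cluster.address"); // hypothetical key
        return address != null ? new ExistingClusterClient(address) : new FreshlyDeployedClient();
    }

    public static void main(String[] args) {
        ClusterClient c1 = deployOrRetrieve(Map.of("cluster.address", "host:8081"));
        ClusterClient c2 = deployOrRetrieve(Map.of());
        System.out.println(c1.submit("wordcount")); // job@host:8081/wordcount
        System.out.println(c2.submit("wordcount")); // job@new-cluster/wordcount
    }
}
```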
>
> Zili Chen <wander4096@gmail.com> wrote on Thu, Jun 20, 2019 at 3:21 AM:
>
>> Hi Jeff,
>>
>> Thanks for raising this thread and the design document!
>>
>> As @Thomas Weise mentioned above, extending config to Flink requires
>> far more effort than it should. Another example is that we achieve
>> detached mode by introducing another execution environment, which also
>> hijacks the #execute method.
>>
>> I agree with your idea that the user would configure all things and
>> Flink "just" respects them. On this topic I think the unusual control
>> flow when CliFrontend handles the "run" command is the problem. It
>> handles several configs, mainly about cluster settings, and thus the
>> main method of the user program is unaware of them. Also, it compiles
>> the application to a job graph by running the main method with a
>> hijacked execution environment, which constrains the main method
>> further.
>>
>> I'd like to write down a few notes on how configs/args are passed and
>> respected, as well as on decoupling job compilation and submission,
>> and share them on this thread later.
>>
>> Best,
>> tison.
>>
>> SHI Xiaogang <shixiaogangg@gmail.com> wrote on Mon, Jun 17, 2019 at 7:29 PM:
>>
>>> Hi Jeff and Flavio,
>>>
>>> Thanks Jeff a lot for proposing the design document.
>>>
>>> We are also working on refactoring ClusterClient to allow flexible and
>>> efficient job management in our real-time platform. We would like to
>>> draft a document to share our ideas with you.
>>>
>>> I think it's a good idea to have something like Apache Livy for Flink,
>>> and the efforts discussed here will take a great step forward to it.
>>>
>>> Regards,
>>> Xiaogang
>>>
>>> Flavio Pompermaier <pompermaier@okkam.it> wrote on Mon, Jun 17, 2019 at 7:13 PM:
>>>
>>>> Is there any possibility to have something like Apache Livy [1] also
>>>> for Flink in the future?
>>>>
>>>> [1] https://livy.apache.org/
>>>>
>>>> On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zjffdu@gmail.com> wrote:
>>>>
>>>>>> Any API we expose should not have dependencies on the runtime
>>>>>> (flink-runtime) package or other implementation details. To me,
>>>>>> this means that the current ClusterClient cannot be exposed to
>>>>>> users because it uses quite some classes from the optimiser and
>>>>>> runtime packages.
>>>>>
>>>>> We should change ClusterClient from a class to an interface.
>>>>> ExecutionEnvironment would only use the ClusterClient interface,
>>>>> which should live in flink-clients, while the concrete
>>>>> implementation class could be in flink-runtime.
>>>>>
>>>>>> What happens when a failure/restart in the client happens? There
>>>>>> needs to be a way of re-establishing the connection to the job,
>>>>>> setting up the listeners again, etc.
>>>>>
>>>>> Good point. First we need to define what failure/restart in the
>>>>> client means. IIUC, that usually means a network failure, which
>>>>> would happen in class RestClient. If my understanding is correct,
>>>>> the restart/retry mechanism should be done in RestClient.
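The retry behaviour Jeff describes could look roughly like the following. This is a hedged sketch, not Flink's actual RestClient: the helper name, the retry policy, and the simulated request are all illustrative; the point is only that transient network failures are absorbed inside the REST layer by retrying a bounded number of times.

```java
import java.util.concurrent.Callable;

// Illustrative retry wrapper for a REST-style request. Transient failures
// (e.g. a reset connection) are retried up to maxAttempts times before the
// last exception is propagated to the caller.
public class RetryingRestCall {

    static <T> T callWithRetry(Callable<T> request, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.call();
            } catch (Exception e) {
                last = e; // transient failure: remember it and try again
            }
        }
        throw last; // all attempts exhausted
    }

    public static void main(String[] args) throws Exception {
        int[] failures = {2}; // simulate a request that fails twice, then succeeds
        String result = callWithRetry(() -> {
            if (failures[0]-- > 0) throw new java.io.IOException("connection reset");
            return "jobs/running";
        }, 5);
        System.out.println(result); // jobs/running
    }
}
```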
>>>>>
>>>>> Aljoscha Krettek <aljoscha@apache.org> wrote on Tue, Jun 11, 2019 at 11:10 PM:
>>>>>
>>>>>> Some points to consider:
>>>>>>
>>>>>> * Any API we expose should not have dependencies on the runtime
>>>>>> (flink-runtime) package or other implementation details. To me,
>>>>>> this means that the current ClusterClient cannot be exposed to
>>>>>> users because it uses quite some classes from the optimiser and
>>>>>> runtime packages.
>>>>>>
>>>>>> * What happens when a failure/restart in the client happens? There
>>>>>> needs to be a way of re-establishing the connection to the job,
>>>>>> setting up the listeners again, etc.
>>>>>>
>>>>>> Aljoscha
>>>>>>
>>>>>>> On 29. May 2019, at 10:17, Jeff Zhang <zjffdu@gmail.com> wrote:
>>>>>>>
>>>>>>> Sorry folks, the design doc is late as you expected. Here's the
>>>>>>> design doc I drafted, welcome any comments and feedback.
>>>>>>>
>>>>>>> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
>>>>>>>
>>>>>>> Stephan Ewen <sewen@apache.org> wrote on Thu, Feb 14, 2019 at 8:43 PM:
>>>>>>>
>>>>>>>> Nice that this discussion is happening.
>>>>>>>>
>>>>>>>> In the FLIP, we could also revisit the entire role of the
>>>>>>>> environments again.
>>>>>>>>
>>>>>>>> Initially, the idea was:
>>>>>>>> - the environments take care of the specific setup for standalone
>>>>>>>> (no setup needed), yarn, mesos, etc.
>>>>>>>> - the session ones have control over the session. The environment
>>>>>>>> holds the session client.
>>>>>>>> - running a job gives a "control" object for that job. That
>>>>>>>> behavior is the same in all environments.
>>>>>>>>
>>>>>>>> The actual implementation diverged quite a bit from that. Happy to
>>>>>>>> see a discussion about straightening this out a bit more.
>>>>>>>>
>>>>>>>> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zjffdu@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi folks,
>>>>>>>>>
>>>>>>>>> Sorry for the late response. It seems we have reached consensus
>>>>>>>>> on this; I will create a FLIP for this with a more detailed
>>>>>>>>> design.
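Stephan's original three-point design can be sketched as follows. All names here (SessionEnvironment, JobControl, the id scheme) are hypothetical stand-ins: the sketch only illustrates that the environment owns the session, while running a job hands back a uniform control object regardless of the deployment target.

```java
// Sketch of the original environment roles Stephan recalls: the session
// environment holds the session client, and run() returns a "control"
// object whose shape is the same in all environments.
public class EnvironmentRoles {

    interface JobControl {               // uniform across all environments
        String jobId();
        void cancel();
    }

    static class SimpleJobControl implements JobControl {
        private final String id;
        private boolean cancelled;
        SimpleJobControl(String id) { this.id = id; }
        public String jobId() { return id; }
        public void cancel() { cancelled = true; }
        boolean isCancelled() { return cancelled; }
    }

    static class SessionEnvironment {
        private final String sessionId;  // the environment owns the session
        private int counter;
        SessionEnvironment(String sessionId) { this.sessionId = sessionId; }

        // Environment-specific setup happens here; the result type does not.
        JobControl run(String jobName) {
            return new SimpleJobControl(sessionId + "-" + jobName + "-" + (++counter));
        }
    }

    public static void main(String[] args) {
        SessionEnvironment env = new SessionEnvironment("session-1");
        JobControl control = env.run("wordcount");
        System.out.println(control.jobId()); // session-1-wordcount-1
        control.cancel();
    }
}
```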
>>>>>>>>>
>>>>>>>>> Thomas Weise <thw@apache.org> wrote on Fri, Dec 21, 2018 at 11:43 AM:
>>>>>>>>>
>>>>>>>>>> Great to see this discussion seeded! The problems you face with
>>>>>>>>>> the Zeppelin integration are also affecting other downstream
>>>>>>>>>> projects, like Beam.
>>>>>>>>>>
>>>>>>>>>> We just enabled the savepoint restore option in
>>>>>>>>>> RemoteStreamEnvironment [1], and that was more difficult than it
>>>>>>>>>> should be. The main issue is that environment and cluster client
>>>>>>>>>> aren't decoupled. Ideally it should be possible to just get the
>>>>>>>>>> matching cluster client from the environment and then control
>>>>>>>>>> the job through it (environment as factory for cluster client).
>>>>>>>>>> But note that the environment classes are part of the public
>>>>>>>>>> API, and it is not straightforward to make larger changes
>>>>>>>>>> without breaking backward compatibility.
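Thomas's "environment as factory for cluster client" idea could be sketched like this. The classes and the savepoint-path handling are invented for illustration; the point is only the decoupling: the environment creates a matching client, and all job control (including savepoint restore) then goes through the client rather than through the environment.

```java
// Hedged sketch: the environment acts only as a factory for a matching
// cluster client; the caller controls the job through that client, e.g.
// submitting with or without a savepoint to restore from.
public class EnvAsClientFactory {

    interface ClusterClient {
        // savepointPath may be null for a fresh start
        String submit(String jobName, String savepointPath);
    }

    static class RemoteEnvironment {
        private final String host;
        private final int port;
        RemoteEnvironment(String host, int port) { this.host = host; this.port = port; }

        // The environment only *creates* the client; it does not run jobs itself.
        ClusterClient getClusterClient() {
            String target = host + ":" + port;
            return (jobName, savepointPath) ->
                savepointPath == null
                    ? target + "/" + jobName
                    : target + "/" + jobName + "?restoreFrom=" + savepointPath;
        }
    }

    public static void main(String[] args) {
        ClusterClient client = new RemoteEnvironment("jm-host", 8081).getClusterClient();
        System.out.println(client.submit("wordcount", null));
        System.out.println(client.submit("wordcount", "s3://savepoints/sp-42"));
    }
}
```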
>>>>>>>>>> ClusterClient currently exposes internal classes like JobGraph
>>>>>>>>>> and StreamGraph. But it should be possible to wrap this with a
>>>>>>>>>> new public API that brings the required job control capabilities
>>>>>>>>>> for downstream projects. Perhaps it is helpful to look at some
>>>>>>>>>> of the interfaces in Beam while thinking about this: [2] for the
>>>>>>>>>> portable job API and [3] for the old asynchronous job control
>>>>>>>>>> from the Beam Java SDK.
>>>>>>>>>>
>>>>>>>>>> The backward compatibility discussion [4] is also relevant here.
>>>>>>>>>> A new API should shield downstream projects from internals and
>>>>>>>>>> allow them to interoperate with multiple future Flink versions
>>>>>>>>>> in the same release line without forced upgrades.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Thomas
>>>>>>>>>>
>>>>>>>>>> [1] https://github.com/apache/flink/pull/7249
>>>>>>>>>> [2] https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
>>>>>>>>>> [3] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
>>>>>>>>>> [4] https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Dec 20, 2018
>>>>>>>>> at 6:15 PM
>>>>>>>>>>>>>>> Jeff
>>>>>>>>>>>>>>>>>>> Zhang <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> zjffdu@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm not so sure
>>>>>>>>> whether the
>>>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> able
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> define
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> where
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> runs (in your
>>>>>>>>> example Yarn).
>>>>>>>>>>>>>>> This
>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>> actually
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> independent
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> development and is
>>>>>>>>> something
>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>> decided
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> deployment
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> time.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> User don't need to
>>>>>>>>> specify
>>>>>>>>>>>>>>>>> execution
>>>>>>>>>>>>>>>>>>> mode
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> programmatically.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> They
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> also
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pass the execution
>>>>>>>>> mode from
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> arguments
>>>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> flink
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> run
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> command.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> e.g.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> bin/flink run -m
>>>>>>>>> yarn-cluster
>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> bin/flink run -m
>>>>>>>>> local ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> bin/flink run -m
>>>>>>>>> host:port ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Does this make sense
>>>>>>>>> to you ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>> To me it makes sense that the ExecutionEnvironment is not directly
>> initialized by the user and instead context sensitive how you want to
>> execute your job (Flink CLI vs. IDE, for example).
>
> Right, currently I notice Flink would create different
> ContextExecutionEnvironments based on different submission scenarios
> (Flink CLI vs. IDE). To me this is kind of a hack approach, not so
> straightforward. What I suggested above is that Flink should always
> create the same ExecutionEnvironment but with different configuration,
> and based on the configuration it would create the proper
> ClusterClient for the different behaviors.
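The configuration-driven approach described above can be sketched as follows. Everything here is a hypothetical stand-in for illustration (FlinkConf, the client classes, and the `execution.mode` key are not Flink's actual API); the point is that user code always constructs the same environment, and the environment alone decides which client to create from configuration:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client hierarchy: one interface, several deployment targets.
interface ClusterClient { String describe(); }

class LocalClusterClient implements ClusterClient {
    public String describe() { return "local"; }
}

class YarnClusterClient implements ClusterClient {
    public String describe() { return "yarn"; }
}

class RemoteClusterClient implements ClusterClient {
    private final String address;
    RemoteClusterClient(String address) { this.address = address; }
    public String describe() { return "remote:" + address; }
}

// Hypothetical FlinkConf: a plain key/value bag supplied by the caller.
class FlinkConf {
    private final Map<String, String> settings = new HashMap<>();
    FlinkConf set(String key, String value) { settings.put(key, value); return this; }
    String get(String key, String def) { return settings.getOrDefault(key, def); }
}

// Stand-in for the environment: the same class is built in every scenario,
// and it derives the proper ClusterClient from the configuration.
class ExecutionEnvironment {
    private final ClusterClient client;

    ExecutionEnvironment(FlinkConf conf) {
        String mode = conf.get("execution.mode", "local");
        switch (mode) {
            case "yarn":  client = new YarnClusterClient(); break;
            case "local": client = new LocalClusterClient(); break;
            default:      client = new RemoteClusterClient(mode); // "host:port"
        }
    }

    ClusterClient getClusterClient() { return client; }
}

public class ConfDrivenEnvDemo {
    public static void main(String[] args) {
        ExecutionEnvironment yarnEnv =
            new ExecutionEnvironment(new FlinkConf().set("execution.mode", "yarn"));
        ExecutionEnvironment remoteEnv =
            new ExecutionEnvironment(new FlinkConf().set("execution.mode", "jm-host:8081"));
        System.out.println(yarnEnv.getClusterClient().describe());   // yarn
        System.out.println(remoteEnv.getClusterClient().describe()); // remote:jm-host:8081
    }
}
```

With this shape, the `-m yarn-cluster` / `-m local` / `-m host:port` CLI flags above would simply translate into different configuration values, not different environment classes.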
> Till Rohrmann <trohrmann@apache.org> wrote on Thu, Dec 20, 2018 at 11:18 PM:
>
>> You are probably right that we have code duplication when it comes to
>> the creation of the ClusterClient. This should be reduced in the
>> future.
>>
>> I'm not so sure whether the user should be able to define where the
>> job runs (in your example Yarn). This is actually independent of the
>> job development and is something which is decided at deployment time.
>> To me it makes sense that the ExecutionEnvironment is not directly
>> initialized by the user and instead context sensitive how you want to
>> execute your job (Flink CLI vs. IDE, for example). However, I agree
>> that the ExecutionEnvironment should give you access to the
>> ClusterClient and to the job (maybe in the form of the JobGraph or a
>> job plan).
>>
>> Cheers,
>> Till
>>
>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zjffdu@gmail.com> wrote:
>>
>>> Hi Till,
>>>
>>> Thanks for the feedback. You are right that I expect a better
>>> programmatic job submission/control api which could be used by
>>> downstream projects, and it would benefit the flink ecosystem. When I
>>> look at the code of flink scala-shell and sql-client (I believe they
>>> are not the core of flink, but belong to the ecosystem of flink), I
>>> find a lot of duplicated code for creating a ClusterClient from user
>>> provided configuration (the configuration format may differ between
>>> scala-shell and sql-client) and then using that ClusterClient to
>>> manipulate jobs. I don't think this is convenient for downstream
>>> projects. What I expect is that a downstream project only needs to
>>> provide the necessary configuration info (maybe introducing a class
>>> FlinkConf), then build an ExecutionEnvironment based on this
>>> FlinkConf, and the ExecutionEnvironment will create the proper
>>> ClusterClient. This not only benefits downstream project development
>>> but is also helpful for their integration tests with flink. Here's
>>> one sample code snippet that I expect:
>>>
>>> val conf = new FlinkConf().mode("yarn")
>>> val env = new ExecutionEnvironment(conf)
>>> val jobId = env.submit(...)
>>> val jobStatus = env.getClusterClient().queryJobStatus(jobId)
>>> env.getClusterClient().cancelJob(jobId)
>>>
>>> What do you think ?
>>>
>>> Till Rohrmann <trohrmann@apache.org> wrote on Tue, Dec 11, 2018 at 6:28 PM:
>>>
>>>> Hi Jeff,
>>>>
>>>> what you are proposing is to provide the user with better
>>>> programmatic job control. There was actually an effort to achieve
>>>> this but it has never been completed [1]. However, there are some
>>>> improvements in the code base now. Look for example at the
>>>> NewClusterClient interface which offers a non-blocking job
>>>> submission. But I agree that we need to improve Flink in this
>>>> regard.
>>>>
>>>> I would not be in favour of exposing all ClusterClient calls via the
>>>> ExecutionEnvironment because it would clutter the class and would
>>>> not be a good separation of concerns. Instead one idea could be to
>>>> retrieve the current ClusterClient from the ExecutionEnvironment
>>>> which can then be used for cluster and job control. But before we
>>>> start an effort here, we need to agree on and capture what
>>>> functionality we want to provide.
>>>>
>>>> Initially, the idea was that we have the ClusterDescriptor
>>>> describing how to talk to a cluster manager like Yarn or Mesos. The
>>>> ClusterDescriptor can be used for deploying Flink clusters (job and
>>>> session) and gives you a ClusterClient. The ClusterClient controls
>>>> the cluster (e.g. submitting jobs, listing all running jobs). And
>>>> then there was the idea to introduce a JobClient which you obtain
>>>> from the ClusterClient to trigger job specific operations (e.g.
>>>> taking a savepoint, cancelling the job).
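The three-layer design Till describes (ClusterDescriptor deploys a cluster and yields a ClusterClient; the ClusterClient owns cluster-wide operations and hands out per-job JobClients) can be sketched like this. All type and method names here are illustrative stand-ins, not Flink's actual interfaces:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Per-job handle: job-specific operations only.
interface JobClient {
    String jobId();
    void cancel();
    String triggerSavepoint();
}

// Cluster-wide handle: submission and cluster queries.
interface ClusterClient {
    JobClient submitJob(String jobName);
    List<String> listJobs();
}

// Knows how to talk to a cluster manager (Yarn, Mesos, ...) and deploy.
interface ClusterDescriptor {
    ClusterClient deploySessionCluster();
}

// A toy in-memory "cluster" standing in for a real deployment target.
class InMemoryCluster implements ClusterClient {
    private final List<String> jobs = new ArrayList<>();

    public JobClient submitJob(String jobName) {
        final String id = UUID.randomUUID().toString();
        jobs.add(id);
        return new JobClient() {
            public String jobId() { return id; }
            public void cancel() { jobs.remove(id); }
            public String triggerSavepoint() { return "savepoint-for-" + id; }
        };
    }

    public List<String> listJobs() { return new ArrayList<>(jobs); }
}

public class LayeringDemo {
    public static void main(String[] args) {
        ClusterDescriptor descriptor = InMemoryCluster::new; // deployment decision
        ClusterClient cluster = descriptor.deploySessionCluster();
        JobClient job = cluster.submitJob("wordcount");
        System.out.println(cluster.listJobs().size()); // 1
        job.cancel();                                  // job-level control
        System.out.println(cluster.listJobs().size()); // 0
    }
}
```

The separation keeps deployment (descriptor), cluster control (cluster client), and job control (job client) at distinct layers, which is the concern Till raises about not flattening everything into the ExecutionEnvironment.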
>>>>
>>>> [1] https://issues.apache.org/jira/browse/FLINK-4272
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zjffdu@gmail.com> wrote:
>>>>
>>>>> Hi Folks,
>>>>>
>>>>> I am trying to integrate flink into apache zeppelin, which is an
>>>>> interactive notebook, and I hit several issues that are caused by
>>>>> the flink client api. So I'd like to propose the following changes
>>>>> to the flink client api.
>>>>>
>>>>> 1. Support nonblocking execution. Currently,
>>>>> ExecutionEnvironment#execute is a blocking method which does 2
>>>>> things: first submit the job and then wait for the job until it is
>>>>> finished. I'd like to introduce a nonblocking execution method like
>>>>> ExecutionEnvironment#submit which only submits the job and then
>>>>> returns the jobId to the client, and allow the user to query the
>>>>> job status via the jobId.
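The blocking-vs-nonblocking distinction proposed here can be illustrated with a small self-contained sketch: `execute` submits and waits for the result, while a hypothetical `submit` returns a job id immediately and lets the caller poll the status. The environment and "job" machinery below are toy stand-ins, not Flink's API:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class NonBlockingDemo {
    enum JobStatus { RUNNING, FINISHED }

    private final AtomicLong counter = new AtomicLong();
    private final Map<String, CompletableFuture<Long>> jobs = new ConcurrentHashMap<>();

    // Nonblocking: kick off the work and hand back a job id right away.
    String submit(long n) {
        String jobId = "job-" + counter.incrementAndGet();
        jobs.put(jobId, CompletableFuture.supplyAsync(() -> {
            long sum = 0;                      // the "job": sum 1..n
            for (long i = 1; i <= n; i++) sum += i;
            return sum;
        }));
        return jobId;
    }

    // The caller can poll the status via the job id...
    JobStatus queryJobStatus(String jobId) {
        return jobs.get(jobId).isDone() ? JobStatus.FINISHED : JobStatus.RUNNING;
    }

    // ...or explicitly wait for the result when it chooses to.
    long waitFor(String jobId) {
        return jobs.get(jobId).join();
    }

    // Blocking: the equivalent of execute() — submit, then wait.
    long execute(long n) {
        return waitFor(submit(n));
    }

    public static void main(String[] args) {
        NonBlockingDemo env = new NonBlockingDemo();
        String jobId = env.submit(1_000_000);          // returns immediately
        env.waitFor(jobId);                            // block only when desired
        System.out.println(env.queryJobStatus(jobId)); // FINISHED
        System.out.println(env.execute(10));           // 55
    }
}
```

This is the shape of API a notebook like Zeppelin needs: the submit call returns control to the UI at once, and the job id drives later status queries or cancellation.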
>>>>>
>>>>> 2. Add a cancel api in
>>>>> ExecutionEnvironment/StreamExecutionEnvironment. Currently the only
>>>>> way to cancel a job is via the cli (bin/flink), which is not
>>>>> convenient for downstream projects to use. So I'd like to add a
>>>>> cancel api in ExecutionEnvironment.
>>>>>
>>>>> 3. Add a savepoint api in
>>>>> ExecutionEnvironment/StreamExecutionEnvironment. It is similar to
>>>>> the cancel api; we should use ExecutionEnvironment as the unified
>>>>> api for third parties to integrate with flink.
>>>>>
>>>>> 4. Add a listener for the job execution lifecycle, something like
>>>>> the following, so that downstream projects can run custom logic in
>>>>> the lifecycle of a job. E.g. Zeppelin would capture the jobId after
>>>>> the job is submitted and then use this jobId to cancel it later
>>>>> when necessary.
>>>>>
>>>>> public interface JobListener {
>>>>>
>>>>>   void onJobSubmitted(JobID jobId);
>>>>>
>>>>>   void onJobExecuted(JobExecutionResult jobResult);
>>>>>
>>>>>   void onJobCanceled(JobID jobId);
>>>>> }
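How such a listener could be wired in might look like the following runnable sketch: the environment keeps registered listeners and fires the callbacks around submission and cancellation. JobID is a simplified stand-in for the real Flink type, `registerJobListener` is a hypothetical method, and only two of the three proposed hooks are exercised:

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerDemo {
    // Simplified stand-in for Flink's JobID.
    static class JobID {
        final String id;
        JobID(String id) { this.id = id; }
    }

    interface JobListener {
        void onJobSubmitted(JobID jobId);
        void onJobCanceled(JobID jobId);
    }

    private final List<JobListener> listeners = new ArrayList<>();
    private int seq = 0;

    void registerJobListener(JobListener listener) { listeners.add(listener); }

    JobID submit() {
        JobID jobId = new JobID("job-" + (++seq));
        listeners.forEach(l -> l.onJobSubmitted(jobId)); // lifecycle hook
        return jobId;
    }

    void cancel(JobID jobId) {
        listeners.forEach(l -> l.onJobCanceled(jobId));  // lifecycle hook
    }

    public static void main(String[] args) {
        ListenerDemo env = new ListenerDemo();
        List<String> events = new ArrayList<>();
        // e.g. a notebook captures the job id on submit so it can cancel later
        env.registerJobListener(new JobListener() {
            public void onJobSubmitted(JobID jobId) { events.add("submitted:" + jobId.id); }
            public void onJobCanceled(JobID jobId)  { events.add("canceled:" + jobId.id); }
        });
        JobID id = env.submit();
        env.cancel(id);
        System.out.println(events); // [submitted:job-1, canceled:job-1]
    }
}
```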
>>>>>
>>>>> 5. Enable session in ExecutionEnvironment. Currently it is
>>>>> disabled, but session is very convenient for third parties to
>>>>> submit jobs continually. I hope flink can enable it again.
>>>>>
>>>>> 6. Unify all flink client api into
>>>>> ExecutionEnvironment/StreamExecutionEnvironment. This is a long
>>>>> term issue which needs more careful thinking and design. Currently
>>>>> some features of flink are exposed in
>>>>> ExecutionEnvironment/StreamExecutionEnvironment, but some are
>>>>> exposed in the cli instead of the api, like the cancel and
>>>>> savepoint I mentioned above. I think the root cause is that flink
>>>>> didn't unify the interaction with flink. Here I list 3 scenarios of
>>>>> flink operation:
>>>>>
>>>>> - Local job execution. Flink will create a LocalEnvironment and
>>>>>   then use this LocalEnvironment to create a LocalExecutor for job
>>>>>   execution.
>>>>> - Remote job execution. Flink will create a ClusterClient first,
>>>>>   then create a ContextEnvironment based on the ClusterClient, and
>>>>>   then run the job.
>>>>> - Job cancelation. Flink will create a ClusterClient first and then
>>>>>   cancel this job via this ClusterClient.
>>>>>
>>>>> As you can see in the above 3 scenarios, Flink didn't use the same
>>>>> approach (code path) to
>>>>>>>>>>>>>>> interact
>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>> flink
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What I propose is
>>>>>>>>>>>>>>> following:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Create the proper
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> LocalEnvironment/RemoteEnvironment
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (based
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> configuration)
>>>>>>>>> --> Use this
>>>>>>>>>>>>>>>>>>> Environment
>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> create
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> proper
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>> (LocalClusterClient or
>>>>>>>>>>>>>>>>>>>>> RestClusterClient)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interactive
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Flink (
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> execution or
>>>>>>>>> cancelation)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This way we can
>>>>>>>>> unify the
>>>>>>>>>>>>>>>>> process
>>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>> local
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> execution
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> remote
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> execution.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> And it is much
>>>>>>>>> easier for
>>>>>>>>>>>>>>> third
>>>>>>>>>>>>>>>>>>> party
>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> integrate
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> flink,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> because
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>> ExecutionEnvironment is the
>>>>>>>>>>>>>>>>>> unified
>>>>>>>>>>>>>>>>>>>>> entry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> point
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> flink.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> third
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> party
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> needs to do is
>>>>>>>>> just pass
>>>>>>>>>>>>>>>>>>> configuration
>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>> ExecutionEnvironment will
>>>>>>>>>>>>>>> do
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> right
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> based
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> configuration.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Flink cli can
>>>>>>>>> also be
>>>>>>>>>>>>>>>>> considered
>>>>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>>>> flink
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> api
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consumer.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> just
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pass
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> configuration to
>>>>>>>>>>>>>>>>>>> ExecutionEnvironment
>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> let
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> create the proper
>>>>>>>>>>>>>>> ClusterClient
>>>>>>>>>>>>>>>>>>> instead
>>>>>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> letting
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cli
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> create
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient
>>>>>>>>> directly.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 6 would involve
>>>>>>>>> large code
>>>>>>>>>>>>>>>>>>> refactoring,
>>>>>>>>>>>>>>>>>>>>>> so
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> think
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> defer
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> future release,
>>>>>>>>> 1,2,3,4,5
>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>> done
>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> once I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> believe.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> me
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> know
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> your
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> comments and
>>>>>>>>> feedback,
>>>>>>>>>>>>>>> thanks
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
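The unified code path proposed in the quoted message above — a single environment, built from configuration, that creates the matching ClusterClient for both execution and cancelation — could be sketched roughly as follows. All class and method names here are illustrative stand-ins, not the actual Flink API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the environment is the only component that knows
// how to turn configuration into the right kind of ClusterClient, so the
// CLI, Zeppelin, Beam, etc. all go through the same code path.
public class UnifiedEntryPointSketch {

    /** Minimal stand-in for a cluster client (illustrative only). */
    interface ClusterClient {
        String submitJob(String jobName);
        void cancelJob(String jobId);
    }

    static class LocalClusterClient implements ClusterClient {
        public String submitJob(String jobName) { return "local-" + jobName; }
        public void cancelJob(String jobId) { /* cancel the in-process job */ }
    }

    static class RestClusterClient implements ClusterClient {
        public String submitJob(String jobName) { return "remote-" + jobName; }
        public void cancelJob(String jobId) { /* call the REST cancel endpoint */ }
    }

    /** The environment picks the right client purely from configuration. */
    static ClusterClient createClusterClient(Map<String, String> config) {
        return "local".equals(config.get("execution.target"))
                ? new LocalClusterClient()
                : new RestClusterClient();
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("execution.target", "local");
        ClusterClient client = createClusterClient(config);
        // Submission and cancelation now share one code path.
        String jobId = client.submitJob("WordCount");
        client.cancelJob(jobId);
        System.out.println(jobId); // prints "local-WordCount"
    }
}
```

A downstream project would only ever touch the environment and the configuration map; which client is instantiated becomes an internal detail.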


Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Zili Chen <wa...@gmail.com>.
Hi Jeff,

I'm drafting the FLIP of JobClient. In my opinion, JobClient is a thin
encapsulation of the current ClusterClient, with which users need not and
should not pass a JobID to invoke the corresponding ClusterClient functions.

Mainly it contains interfaces as below.

getJobResult(): CompletableFuture<JobResult>
getJobStatus(): CompletableFuture<JobStatus>

cancel(): CompletableFuture<Acknowledge>
cancelWithSavepoint(savepointDir): CompletableFuture<savepoint path>
stopWithSavepoint(savepointDir,
advanceToEndOfEventTime): CompletableFuture<savepoint path>

triggerSavepoint(savepointDir): CompletableFuture<savepoint path>
getAccumulators: CompletableFuture<Map<acc name, acc value>>

Is it the same as that kept in your mind?
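A rough Java sketch of what such a JobClient could look like, based on the method list above. JobStatus, JobResult, and Acknowledge are placeholder types here, not the real Flink classes, and the trivial implementation is for illustration only:

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Sketch of the proposed JobClient: every method is bound to one job, so
// no JobID argument appears anywhere, unlike the current ClusterClient.
public class JobClientSketch {

    enum JobStatus { RUNNING, FINISHED, CANCELED }
    static class JobResult {}
    static class Acknowledge {}

    interface JobClient {
        CompletableFuture<JobResult> getJobResult();
        CompletableFuture<JobStatus> getJobStatus();
        CompletableFuture<Acknowledge> cancel();
        CompletableFuture<String> cancelWithSavepoint(String savepointDir);
        CompletableFuture<String> stopWithSavepoint(String savepointDir,
                                                    boolean advanceToEndOfEventTime);
        CompletableFuture<String> triggerSavepoint(String savepointDir);
        CompletableFuture<Map<String, Object>> getAccumulators();
    }

    /** Trivial happy-path implementation, purely for demonstration. */
    static JobClient finishedJob(String savepointPath) {
        return new JobClient() {
            public CompletableFuture<JobResult> getJobResult() {
                return CompletableFuture.completedFuture(new JobResult());
            }
            public CompletableFuture<JobStatus> getJobStatus() {
                return CompletableFuture.completedFuture(JobStatus.FINISHED);
            }
            public CompletableFuture<Acknowledge> cancel() {
                return CompletableFuture.completedFuture(new Acknowledge());
            }
            public CompletableFuture<String> cancelWithSavepoint(String dir) {
                return CompletableFuture.completedFuture(savepointPath);
            }
            public CompletableFuture<String> stopWithSavepoint(String dir, boolean advance) {
                return CompletableFuture.completedFuture(savepointPath);
            }
            public CompletableFuture<String> triggerSavepoint(String dir) {
                return CompletableFuture.completedFuture(savepointPath);
            }
            public CompletableFuture<Map<String, Object>> getAccumulators() {
                return CompletableFuture.completedFuture(Collections.emptyMap());
            }
        };
    }

    public static void main(String[] args) {
        JobClient client = finishedJob("/savepoints/sp-1");
        System.out.println(client.getJobStatus().join()); // prints "FINISHED"
    }
}
```

Returning CompletableFuture everywhere lets the caller decide whether to block (join) or compose asynchronously, which also makes a separate "detached" mode unnecessary.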

Besides, I recommend you join the Slack channel[1] so that
we can follow the consensus above and discuss there :-)

Best,
tison.

[1] https://the-asf.slack.com/messages/CNA3ADZPH


Zili Chen <wa...@gmail.com> 于2019年9月11日周三 下午5:54写道:

> s/ASK/ASF/
>
> Best,
> tison.
>
>
> Zili Chen <wa...@gmail.com> 于2019年9月11日周三 下午5:53写道:
>
>> FYI I've created a public channel #flink-client-api-enhancement under ASK
>> slack.
>>
>> You can reach it after joining ASK Slack and searching for the channel.
>>
>> Best,
>> tison.
>>
>>
>> Kostas Kloudas <kk...@gmail.com> 于2019年9月11日周三 下午5:09写道:
>>
>>> +1 for creating a channel.
>>>
>>> Kostas
>>>
>>> On Wed, Sep 11, 2019 at 10:57 AM Zili Chen <wa...@gmail.com> wrote:
>>> >
>>> > Hi Aljoscha,
>>> >
>>> > I'm OK to use the ASF slack.
>>> >
>>> > Best,
>>> > tison.
>>> >
>>> >
>>> > Jeff Zhang <zj...@gmail.com> 于2019年9月11日周三 下午4:48写道:
>>> >>
>>> >> +1 for using slack for instant communication
>>> >>
>>> >> Aljoscha Krettek <al...@apache.org> 于2019年9月11日周三 下午4:44写道:
>>> >>>
>>> >>> Hi,
>>> >>>
>>> >>> We could try and use the ASF slack for this purpose, that would
>>> probably be easiest. See https://s.apache.org/slack-invite. We could
>>> create a dedicated channel for our work and would still use the open ASF
>>> infrastructure and people can have a look if they are interested because
>>> discussion would be public. What do you think?
>>> >>>
>>> >>> P.S. Committers/PMCs should should be able to login with their
>>> apache ID.
>>> >>>
>>> >>> Best,
>>> >>> Aljoscha
>>> >>>
>>> >>> > On 6. Sep 2019, at 14:24, Zili Chen <wa...@gmail.com> wrote:
>>> >>> >
>>> >>> > Hi Aljoscha,
>>> >>> >
>>> >>> > I'd like to gather all the ideas here and among documents, and
>>> draft a
>>> >>> > formal FLIP
>>> >>> > that keeps us on the same page. Hopefully I'll start a FLIP thread in
>>> next week.
>>> >>> >
>>> >>> > For the implementation or said POC part, I'd like to work with you
>>> guys who
>>> >>> > proposed
>>> >>> > the concept Executor to make sure that we go in the same
>>> direction. I'm
>>> >>> > wondering
>>> >>> > whether a dedicated thread or a Slack group is the proper one. In
>>> my opinion
>>> >>> > we can
>>> >>> > involve the team in a Slack group and, concurrently with the FLIP
>>> process, start
>>> >>> > our branch,
>>> >>> > and once we reach a consensus on the FLIP, open an umbrella issue
>>> about the
>>> >>> > framework
>>> >>> > and start subtasks. What do you think?
>>> >>> >
>>> >>> > Best,
>>> >>> > tison.
>>> >>> >
>>> >>> >
>>> >>> > Aljoscha Krettek <al...@apache.org> 于2019年9月5日周四 下午9:39写道:
>>> >>> >
>>> >>> >> Hi Tison,
>>> >>> >>
>>> >>> >> To keep this moving forward, maybe you want to start working on a
>>> proof of
>>> >>> >> concept implementation for the new JobClient interface, maybe
>>> with a new
>>> >>> >> method executeAsync() in the environment that returns the
>>> JobClient and
>>> >>> >> implement the methods to see how that works and to see where we
>>> get. Would
>>> >>> >> you be interested in that?
>>> >>> >>
>>> >>> >> Also, at some point we should collect all the ideas and start
>>> forming an
>>> >>> >> actual FLIP.
>>> >>> >>
>>> >>> >> Best,
>>> >>> >> Aljoscha
>>> >>> >>
>>> >>> >>> On 4. Sep 2019, at 12:04, Zili Chen <wa...@gmail.com>
>>> wrote:
>>> >>> >>>
>>> >>> >>> Thanks for your update Kostas!
>>> >>> >>>
>>> >>> >>> It looks good to me that clean up existing code paths as first
>>> >>> >>> pass. I'd like to help on review and file subtasks if I find
>>> ones.
>>> >>> >>>
>>> >>> >>> Best,
>>> >>> >>> tison.
>>> >>> >>>
>>> >>> >>>
>>> >>> >>> Kostas Kloudas <kk...@gmail.com> 于2019年9月4日周三 下午5:52写道:
>>> >>> >>> Here is the issue, and I will keep on updating it as I find more
>>> issues.
>>> >>> >>>
>>> >>> >>> https://issues.apache.org/jira/browse/FLINK-13954
>>> >>> >>>
>>> >>> >>> This will also cover the refactoring of the Executors that we
>>> discussed
>>> >>> >>> in this thread, without any additional functionality (such as
>>> the job
>>> >>> >> client).
>>> >>> >>>
>>> >>> >>> Kostas
>>> >>> >>>
>>> >>> >>> On Wed, Sep 4, 2019 at 11:46 AM Kostas Kloudas <
>>> kkloudas@gmail.com>
>>> >>> >> wrote:
>>> >>> >>>>
>>> >>> >>>> Great idea Tison!
>>> >>> >>>>
>>> >>> >>>> I will create the umbrella issue and post it here so that we
>>> are all
>>> >>> >>>> on the same page!
>>> >>> >>>>
>>> >>> >>>> Cheers,
>>> >>> >>>> Kostas
>>> >>> >>>>
>>> >>> >>>> On Wed, Sep 4, 2019 at 11:36 AM Zili Chen <wander4096@gmail.com
>>> >
>>> >>> >> wrote:
>>> >>> >>>>>
>>> >>> >>>>> Hi Kostas & Aljoscha,
>>> >>> >>>>>
>>> >>> >>>>> I notice that there is a JIRA(FLINK-13946) which could be
>>> included
>>> >>> >>>>> in this refactor thread. Since we agree on most of directions
>>> in
>>> >>> >>>>> big picture, is it reasonable that we create an umbrella issue
>>> for
>>> >>> >>>>> refactor client APIs and also linked subtasks? It would be a
>>> better
>>> >>> >>>>> way for us to join the forces of our community.
>>> >>> >>>>>
>>> >>> >>>>> Best,
>>> >>> >>>>> tison.
>>> >>> >>>>>
>>> >>> >>>>>
>>> >>> >>>>> Zili Chen <wa...@gmail.com> 于2019年8月31日周六 下午12:52写道:
>>> >>> >>>>>>
>>> >>> >>>>>> Great Kostas! Looking forward to your POC!
>>> >>> >>>>>>
>>> >>> >>>>>> Best,
>>> >>> >>>>>> tison.
>>> >>> >>>>>>
>>> >>> >>>>>>
>>> >>> >>>>>> Jeff Zhang <zj...@gmail.com> 于2019年8月30日周五 下午11:07写道:
>>> >>> >>>>>>>
>>> >>> >>>>>>> Awesome, @Kostas Looking forward your POC.
>>> >>> >>>>>>>
>>> >>> >>>>>>> Kostas Kloudas <kk...@gmail.com> 于2019年8月30日周五 下午8:33写道:
>>> >>> >>>>>>>
>>> >>> >>>>>>>> Hi all,
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> I am just writing here to let you know that I am working on
>>> a
>>> >>> >> POC that
>>> >>> >>>>>>>> tries to refactor the current state of job submission in
>>> Flink.
>>> >>> >>>>>>>> I want to stress out that it introduces NO CHANGES to the
>>> current
>>> >>> >>>>>>>> behaviour of Flink. It just re-arranges things and
>>> introduces the
>>> >>> >>>>>>>> notion of an Executor, which is the entity responsible for
>>> >>> >> taking the
>>> >>> >>>>>>>> user-code and submitting it for execution.
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> Given this, the discussion about the functionality that the
>>> >>> >> JobClient
>>> >>> >>>>>>>> will expose to the user can go on independently and the same
>>> >>> >>>>>>>> holds for all the open questions so far.
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> I hope I will have some more news to share soon.
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> Thanks,
>>> >>> >>>>>>>> Kostas
>>> >>> >>>>>>>>
>>> >>> >>>>>>>> On Mon, Aug 26, 2019 at 4:20 AM Yang Wang <
>>> danrtsey.wy@gmail.com>
>>> >>> >> wrote:
>>> >>> >>>>>>>>>
>>> >>> >>>>>>>>> Hi Zili,
>>> >>> >>>>>>>>>
>>> >>> >>>>>>>>> It makes sense to me that a dedicated cluster is started
>>> for a
>>> >>> >> per-job
>>> >>> >>>>>>>>> cluster and will not accept more jobs.
>>> >>> >>>>>>>>> Just have a question about the command line.
>>> >>> >>>>>>>>>
>>> >>> >>>>>>>>> Currently we could use the following commands to start
>>> >>> >> different
>>> >>> >>>>>>>> clusters.
>>> >>> >>>>>>>>> *per-job cluster*
>>> >>> >>>>>>>>> ./bin/flink run -d -p 5 -ynm perjob-cluster1 -m
>>> yarn-cluster
>>> >>> >>>>>>>>> examples/streaming/WindowJoin.jar
>>> >>> >>>>>>>>> *session cluster*
>>> >>> >>>>>>>>> ./bin/flink run -p 5 -ynm session-cluster1 -m yarn-cluster
>>> >>> >>>>>>>>> examples/streaming/WindowJoin.jar
>>> >>> >>>>>>>>>
>>> >>> >>>>>>>>> What will it look like after client enhancement?
>>> >>> >>>>>>>>>
>>> >>> >>>>>>>>>
>>> >>> >>>>>>>>> Best,
>>> >>> >>>>>>>>> Yang
>>> >>> >>>>>>>>>
>>> >>> >>>>>>>>> Zili Chen <wa...@gmail.com> 于2019年8月23日周五 下午10:46写道:
>>> >>> >>>>>>>>>
>>> >>> >>>>>>>>>> Hi Till,
>>> >>> >>>>>>>>>>
>>> >>> >>>>>>>>>> Thanks for your update. Nice to hear :-)
>>> >>> >>>>>>>>>>
>>> >>> >>>>>>>>>> Best,
>>> >>> >>>>>>>>>> tison.
>>> >>> >>>>>>>>>>
>>> >>> >>>>>>>>>>
>>> >>> >>>>>>>>>> Till Rohrmann <tr...@apache.org> 于2019年8月23日周五
>>> >>> >> 下午10:39写道:
>>> >>> >>>>>>>>>>
>>> >>> >>>>>>>>>>> Hi Tison,
>>> >>> >>>>>>>>>>>
>>> >>> >>>>>>>>>>> just a quick comment concerning the class loading issues
>>> >>> >> when using
>>> >>> >>>>>>>> the
>>> >>> >>>>>>>>>> per
>>> >>> >>>>>>>>>>> job mode. The community wants to change it so that the
>>> >>> >>>>>>>>>>> StandaloneJobClusterEntryPoint actually uses the user
>>> code
>>> >>> >> class
>>> >>> >>>>>>>> loader
>>> >>> >>>>>>>>>>> with child first class loading [1]. Hence, I hope that
>>> >>> >> this problem
>>> >>> >>>>>>>> will
>>> >>> >>>>>>>>>> be
>>> >>> >>>>>>>>>>> resolved soon.
>>> >>> >>>>>>>>>>>
>>> >>> >>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-13840
>>> >>> >>>>>>>>>>>
>>> >>> >>>>>>>>>>> Cheers,
>>> >>> >>>>>>>>>>> Till
>>> >>> >>>>>>>>>>>
>>> >>> >>>>>>>>>>> On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas <
>>> >>> >> kkloudas@gmail.com>
>>> >>> >>>>>>>>>> wrote:
>>> >>> >>>>>>>>>>>
>>> >>> >>>>>>>>>>>> Hi all,
>>> >>> >>>>>>>>>>>>
>>> >>> >>>>>>>>>>>> On the topic of web submission, I agree with Till that
>>> >>> >> it only
>>> >>> >>>>>>>> seems
>>> >>> >>>>>>>>>>>> to complicate things.
>>> >>> >>>>>>>>>>>> It is bad for security, job isolation (anybody can
>>> >>> >> submit/cancel
>>> >>> >>>>>>>> jobs),
>>> >>> >>>>>>>>>>>> and its
>>> >>> >>>>>>>>>>>> implementation complicates some parts of the code. So,
>>> >>> >> if we were
>>> >>> >>>>>>>> to
>>> >>> >>>>>>>>>>>> redesign the
>>> >>> >>>>>>>>>>>> WebUI, maybe this part could be left out. In addition, I
>>> >>> >> would say
>>> >>> >>>>>>>>>>>> that the ability to cancel
>>> >>> >>>>>>>>>>>> jobs could also be left out.
>>> >>> >>>>>>>>>>>>
>>> >>> >>>>>>>>>>>> Also I would also be in favour of removing the
>>> >>> >> "detached" mode, for
>>> >>> >>>>>>>>>>>> the reasons mentioned
>>> >>> >>>>>>>>>>>> above (i.e. because now we will have a future
>>> >>> >> representing the
>>> >>> >>>>>>>> result
>>> >>> >>>>>>>>>>>> on which the user
>>> >>> >>>>>>>>>>>> can choose to wait or not).
>>> >>> >>>>>>>>>>>>
>>> >>> >>>>>>>>>>>> Now, for separating job submission and cluster
>>> >>> >> creation, I am in
>>> >>> >>>>>>>>>>>> favour of keeping both.
>>> >>> >>>>>>>>>>>> Once again, the reasons are mentioned above by Stephan,
>>> >>> >> Till,
>>> >>> >>>>>>>> Aljoscha
>>> >>> >>>>>>>>>>>> and also Zili seems
>>> >>> >>>>>>>>>>>> to agree. They mainly have to do with security,
>>> >>> >> isolation and ease
>>> >>> >>>>>>>> of
>>> >>> >>>>>>>>>>>> resource management
>>> >>> >>>>>>>>>>>> for the user as he knows that "when my job is done,
>>> >>> >> everything
>>> >>> >>>>>>>> will be
>>> >>> >>>>>>>>>>>> cleared up". This is
>>> >>> >>>>>>>>>>>> also the experience you get when launching a process on
>>> >>> >> your local
>>> >>> >>>>>>>> OS.
>>> >>> >>>>>>>>>>>>
>>> >>> >>>>>>>>>>>> On excluding the per-job mode from returning a JobClient
>>> >>> >> or not, I
>>> >>> >>>>>>>>>>>> believe that eventually
>>> >>> >>>>>>>>>>>> it would be nice to allow users to get back a jobClient.
>>> >>> >> The
>>> >>> >>>>>>>> reason is
>>> >>> >>>>>>>>>>>> that 1) I cannot
>>> >>> >>>>>>>>>>>> find any objective reason why the user-experience should
>>> >>> >> diverge,
>>> >>> >>>>>>>> and
>>> >>> >>>>>>>>>>>> 2) this will be the
>>> >>> >>>>>>>>>>>> way that the user will be able to interact with his
>>> >>> >> running job.
>>> >>> >>>>>>>>>>>> Assuming that the necessary
>>> >>> >>>>>>>>>>>> ports are open for the REST API to work, then I think
>>> >>> >> that the
>>> >>> >>>>>>>>>>>> JobClient can run against the
>>> >>> >>>>>>>>>>>> REST API without problems. If the needed ports are not
>>> >>> >> open, then
>>> >>> >>>>>>>> we
>>> >>> >>>>>>>>>>>> are safe to not return
>>> >>> >>>>>>>>>>>> a JobClient, as the user explicitly chose to close all
>>> >>> >> points of
>>> >>> >>>>>>>>>>>> communication to his running job.
>>> >>> >>>>>>>>>>>>
>>> >>> >>>>>>>>>>>> On the topic of not hijacking the "env.execute()" in
>>> >>> >> order to get
>>> >>> >>>>>>>> the
>>> >>> >>>>>>>>>>>> Plan, I definitely agree but
>>> >>> >>>>>>>>>>>> for the proposal of having a "compile()" method in the
>>> >>> >> env, I would
>>> >>> >>>>>>>>>>>> like to have a better look at
>>> >>> >>>>>>>>>>>> the existing code.
>>> >>> >>>>>>>>>>>>
>>> >>> >>>>>>>>>>>> Cheers,
>>> >>> >>>>>>>>>>>> Kostas
>>> >>> >>>>>>>>>>>>
>>> >>> >>>>>>>>>>>> On Fri, Aug 23, 2019 at 5:52 AM Zili Chen <
>>> >>> >> wander4096@gmail.com>
>>> >>> >>>>>>>>>> wrote:
>>> >>> >>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>> Hi Yang,
>>> >>> >>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>> It would be helpful if you check Stephan's last
>>> >>> >> comment,
>>> >>> >>>>>>>>>>>>> which states that isolation is important.
>>> >>> >>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>> For per-job mode, we run a dedicated cluster(maybe it
>>> >>> >>>>>>>>>>>>> should have been a couple of JM and TMs during FLIP-6
>>> >>> >>>>>>>>>>>>> design) for a specific job. Thus the process is
>>> >>> >> prevented
>>> >>> >>>>>>>>>>>>> from other jobs.
>>> >>> >>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>> In our case there was a time we suffered from multiple
>>> >>> >>>>>>>>>>>>> jobs submitted by different users and they affected
>>> >>> >>>>>>>>>>>>> each other so that all ran into an error state. Also,
>>> >>> >>>>>>>>>>>>> running the client inside the cluster could save client
>>> >>> >>>>>>>>>>>>> resources at some point.
>>> >>> >>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>> However, we also face several issues as you mentioned,
>>> >>> >>>>>>>>>>>>> that in per-job mode it always uses parent classloader
>>> >>> >>>>>>>>>>>>> thus classloading issues occur.
>>> >>> >>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>> BTW, one can make an analogy between session/per-job
>>> >>> >> mode
>>> >>> >>>>>>>>>>>>> in  Flink, and client/cluster mode in Spark.
>>> >>> >>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>> Best,
>>> >>> >>>>>>>>>>>>> tison.
>>> >>> >>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>> Yang Wang <da...@gmail.com> 于2019年8月22日周四
>>> >>> >> 上午11:25写道:
>>> >>> >>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>> From the user's perspective, it is really confusing
>>> >>> >> what the
>>> >>> >>>>>>>> scope
>>> >>> >>>>>>>>>> of
>>> >>> >>>>>>>>>>>>>> a per-job cluster is.
>>> >>> >>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>> If it means a flink cluster with single job, so that
>>> >>> >> we could
>>> >>> >>>>>>>> get
>>> >>> >>>>>>>>>>>> better
>>> >>> >>>>>>>>>>>>>> isolation.
>>> >>> >>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>> Now it does not matter how we deploy the cluster,
>>> >>> >> directly
>>> >>> >>>>>>>>>>>> deploy(mode1)
>>> >>> >>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>> or start a flink cluster and then submit job through
>>> >>> >> cluster
>>> >>> >>>>>>>>>>>> client(mode2).
>>> >>> >>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>> Otherwise, if it just means directly deploy, how
>>> >>> >> should we
>>> >>> >>>>>>>> name the
>>> >>> >>>>>>>>>>>> mode2,
>>> >>> >>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>> session with job or something else?
>>> >>> >>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>> We could also benefit from the mode2. Users could
>>> >>> >> get the same
>>> >>> >>>>>>>>>>>> isolation
>>> >>> >>>>>>>>>>>>>> with mode1.
>>> >>> >>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>> The user code and dependencies will be loaded by
>>> >>> >> user class
>>> >>> >>>>>>>> loader
>>> >>> >>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>> to avoid class conflicts with the framework.
>>> >>> >>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>> Anyway, both of the two submission modes are useful.
>>> >>> >>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>> We just need to clarify the concepts.
>>> >>> >>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>> Best,
>>> >>> >>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>> Yang
>>> >>> >>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>> Zili Chen <wa...@gmail.com> 于2019年8月20日周二
>>> >>> >> 下午5:58写道:
>>> >>> >>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>> Thanks for the clarification.
>>> >>> >>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>> The idea of a JobDeployer came into my mind when I
>>> >>> >> was
>>> >>> >>>>>>>> muddled
>>> >>> >>>>>>>>>> with
>>> >>> >>>>>>>>>>>>>>> how to execute per-job mode and session mode with
>>> >>> >> the same
>>> >>> >>>>>>>> user
>>> >>> >>>>>>>>>>> code
>>> >>> >>>>>>>>>>>>>>> and framework codepath.
>>> >>> >>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>> With the concept of a JobDeployer, we come back to the
>>> >>> >> statement that
>>> >>> >>>>>>>>>>>> the environment
>>> >>> >>>>>>>>>>>>>>> knows all the configs of cluster deployment and job
>>> >>> >>>>>>>> submission. We
>>> >>> >>>>>>>>>>>>>>> configure or generate from configuration a specific
>>> >>> >>>>>>>> JobDeployer
>>> >>> >>>>>>>>>> in
>>> >>> >>>>>>>>>>>>>>> environment and then code align on
>>> >>> >>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>> *JobClient client = env.execute().get();*
>>> >>> >>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>> which in session mode is returned by
>>> >>> >> clusterClient.submitJob
>>> >>> >>>>>>>> and in
>>> >>> >>>>>>>>>>>> per-job
>>> >>> >>>>>>>>>>>>>>> mode is returned by
>>> >>> >> clusterDescriptor.deployJobCluster.
>>> >>> >>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>> Here comes a problem that currently we directly run
>>> >>> >>>>>>>>>>> ClusterEntrypoint
>>> >>> >>>>>>>>>>>>>>> with the extracted job graph. Following the JobDeployer
>>> >>> >> way, we'd
>>> >>> >>>>>>>> better
>>> >>> >>>>>>>>>>>>>>> align entry point of per-job deployment at
>>> >>> >> JobDeployer.
>>> >>> >>>>>>>> Users run
>>> >>> >>>>>>>>>>>>>>> their main method, or a CLI that finally calls the main
>>> >>> >> method, to
>>> >>> >>>>>>>> deploy
>>> >>> >>>>>>>>>>> the
>>> >>> >>>>>>>>>>>>>>> job cluster.
>>> >>> >>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>> Best,
>>> >>> >>>>>>>>>>>>>>> tison.
>>> >>> >>>>>>>>>>>>>>>
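The single code path tison sketches above — one env.execute() call yielding a JobClient, backed either by submitting to a session cluster or by deploying a dedicated per-job cluster — could look roughly like the following. Every name here (JobDeployer, SessionDeployer, PerJobDeployer) is hypothetical, not actual Flink API:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of the JobDeployer idea: session mode and per-job
// mode both hide behind one env.execute() call returning a JobClient.
public class JobDeployerSketch {

    interface JobClient { String jobId(); }

    /** Deployment strategy chosen by the environment from its configuration. */
    interface JobDeployer {
        CompletableFuture<JobClient> deploy(String jobGraph);
    }

    static class SessionDeployer implements JobDeployer {
        public CompletableFuture<JobClient> deploy(String jobGraph) {
            // Would call clusterClient.submitJob(jobGraph) on a running session.
            JobClient client = () -> "session-job";
            return CompletableFuture.completedFuture(client);
        }
    }

    static class PerJobDeployer implements JobDeployer {
        public CompletableFuture<JobClient> deploy(String jobGraph) {
            // Would call clusterDescriptor.deployJobCluster(jobGraph).
            JobClient client = () -> "per-job";
            return CompletableFuture.completedFuture(client);
        }
    }

    static class Environment {
        private final JobDeployer deployer;
        Environment(JobDeployer deployer) { this.deployer = deployer; }
        CompletableFuture<JobClient> execute(String jobGraph) {
            return deployer.deploy(jobGraph);
        }
    }

    public static void main(String[] args) {
        // User code is identical in both modes; only the deployer differs.
        Environment env = new Environment(new SessionDeployer());
        JobClient client = env.execute("WindowJoin").join();
        System.out.println(client.jobId()); // prints "session-job"
    }
}
```

The open question from the thread remains how per-job deployment reaches this entry point: the user's main method (or a CLI invoking it) would have to run the deployer itself rather than Flink directly launching a ClusterEntrypoint with an extracted job graph.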
>>> >>> >>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org> 于2019年8月20日周二
>>> >>> >> 下午4:40写道:
>>> >>> >>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>> Till has made some good comments here.
>>> >>> >>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>> Two things to add:
>>> >>> >>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>  - The job mode is very nice in the way that it
>>> >>> >> runs the
>>> >>> >>>>>>>>>> client
>>> >>> >>>>>>>>>>>> inside
>>> >>> >>>>>>>>>>>>>>> the
>>> >>> >>>>>>>>>>>>>>>> cluster (in the same image/process that is the
>>> >>> >> JM) and thus
>>> >>> >>>>>>>>>>> unifies
>>> >>> >>>>>>>>>>>>>> both
>>> >>> >>>>>>>>>>>>>>>> applications and what the Spark world calls the
>>> >>> >> "driver
>>> >>> >>>>>>>> mode".
>>> >>> >>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>  - Another thing I would add is that during the
>>> >>> >> FLIP-6
>>> >>> >>>>>>>> design,
>>> >>> >>>>>>>>>>> we
>>> >>> >>>>>>>>>>>> were
>>> >>> >>>>>>>>>>>>>>>> thinking about setups where Dispatcher and
>>> >>> >> JobManager are
>>> >>> >>>>>>>>>>> separate
>>> >>> >>>>>>>>>>>>>>>> processes.
>>> >>> >>>>>>>>>>>>>>>>    A Yarn or Mesos Dispatcher of a session
>>> >>> >> could run
>>> >>> >>>>>>>>>>> independently
>>> >>> >>>>>>>>>>>>>> (even
>>> >>> >>>>>>>>>>>>>>>> as privileged processes executing no code).
>>> >>> >>>>>>>>>>>>>>>>    Then the "per-job" mode could still be
>>> >>> >> helpful:
>>> >>> >>>>>>>> when a
>>> >>> >>>>>>>>>>> job
>>> >>> >>>>>>>>>>>> is
>>> >>> >>>>>>>>>>>>>>>> submitted to the dispatcher, it launches the JM
>>> >>> >> again in a
>>> >>> >>>>>>>>>>> per-job
>>> >>> >>>>>>>>>>>>>> mode,
>>> >>> >>>>>>>>>>>>>>> so
>>> >>> >>>>>>>>>>>>>>>> that JM and TM processes are bound to the
>>> >>> >> only. For
>>> >>> >>>>>>>> higher
>>> >>> >>>>>>>>>>>> security
>>> >>> >>>>>>>>>>>>>>>> setups, it is important that processes are not
>>> >>> >> reused
>>> >>> >>>>>>>> across
>>> >>> >>>>>>>>>>> jobs.
>>> >>> >>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <
>>> >>> >>>>>>>>>>>> trohrmann@apache.org>
>>> >>> >>>>>>>>>>>>>>>> wrote:
>>> >>> >>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>> I would not be in favour of getting rid of the
>>> >>> >> per-job
>>> >>> >>>>>>>> mode
>>> >>> >>>>>>>>>>>> since it
>>> >>> >>>>>>>>>>>>>>>>> simplifies the process of running Flink jobs
>>> >>> >>>>>>>> considerably.
>>> >>> >>>>>>>>>>>> Moreover,
>>> >>> >>>>>>>>>>>>>> it
>>> >>> >>>>>>>>>>>>>>>> is
>>> >>> >>>>>>>>>>>>>>>>> not only well suited for container deployments
>>> >>> >> but also
>>> >>> >>>>>>>> for
>>> >>> >>>>>>>>>>>>>> deployments
>>> >>> >>>>>>>>>>>>>>>>> where you want to guarantee job isolation. For
>>> >>> >> example, a
>>> >>> >>>>>>>>>> user
>>> >>> >>>>>>>>>>>> could
>>> >>> >>>>>>>>>>>>>>> use
>>> >>> >>>>>>>>>>>>>>>>> the per-job mode on Yarn to execute his job on
>>> >>> >> a separate
>>> >>> >>>>>>>>>>>> cluster.
>>> >>> >>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>> I think that having two notions of cluster
>>> >>> >> deployments
>>> >>> >>>>>>>>>> (session
>>> >>> >>>>>>>>>>>> vs.
>>> >>> >>>>>>>>>>>>>>>> per-job
>>> >>> >>>>>>>>>>>>>>>>> mode) does not necessarily contradict your
>>> >>> >> ideas for the
>>> >>> >>>>>>>>>> client
>>> >>> >>>>>>>>>>>> api
>>> >>> >>>>>>>>>>>>>>>>> refactoring. For example one could have the
>>> >>> >> following
>>> >>> >>>>>>>>>>> interfaces:
>>> >>> >>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>> - ClusterDeploymentDescriptor: encapsulates
>>> >>> >> the logic
>>> >>> >>>>>>>> how to
>>> >>> >>>>>>>>>>>> deploy a
>>> >>> >>>>>>>>>>>>>>>>> cluster.
>>> >>> >>>>>>>>>>>>>>>>> - ClusterClient: allows to interact with a
>>> >>> >> cluster
>>> >>> >>>>>>>>>>>>>>>>> - JobClient: allows to interact with a running
>>> >>> >> job
>>> >>> >>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>> Now the ClusterDeploymentDescriptor could have
>>> >>> >> two
>>> >>> >>>>>>>> methods:
>>> >>> >>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>> - ClusterClient deploySessionCluster()
>>> >>> >>>>>>>>>>>>>>>>> - JobClusterClient/JobClient
>>> >>> >>>>>>>> deployPerJobCluster(JobGraph)
>>> >>> >>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>> where JobClusterClient is either a supertype of
>>> >>> >>>>>>>> ClusterClient
>>> >>> >>>>>>>>>>>> which
>>> >>> >>>>>>>>>>>>>>> does
>>> >>> >>>>>>>>>>>>>>>>> not give you the functionality to submit jobs
>>> >>> >> or
>>> >>> >>>>>>>>>>>> deployPerJobCluster
>>> >>> >>>>>>>>>>>>>>>>> returns directly a JobClient.
>>> >>> >>>>>>>>>>>>>>>>>
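The three interfaces and two deploy methods sketched above could hang together roughly as follows. This is a minimal, non-authoritative sketch: only the names ClusterDeploymentDescriptor, ClusterClient, JobClient, deploySessionCluster and deployPerJobCluster come from the proposal; the JobGraph stub and the in-memory descriptor are invented purely for illustration and are not Flink code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

public class ClientApiSketch {

    /** Placeholder for a compiled job; stands in for Flink's JobGraph. */
    static class JobGraph {}

    /** Allows interacting with one running job. */
    interface JobClient {
        String getJobId();
        CompletableFuture<Void> cancel();
    }

    /** Allows interacting with a cluster, e.g. submitting further jobs. */
    interface ClusterClient {
        JobClient submitJob(JobGraph jobGraph);
    }

    /** Encapsulates how to deploy a cluster in either of the two modes. */
    interface ClusterDeploymentDescriptor {
        ClusterClient deploySessionCluster();
        /** Per-job mode: deployment and submission collapse into one step. */
        JobClient deployPerJobCluster(JobGraph jobGraph);
    }

    /** Toy in-memory descriptor, just to show how the pieces fit together. */
    static class InMemoryDescriptor implements ClusterDeploymentDescriptor {
        private final AtomicInteger ids = new AtomicInteger();

        private JobClient newJobClient() {
            String id = "job-" + ids.incrementAndGet();
            return new JobClient() {
                public String getJobId() { return id; }
                public CompletableFuture<Void> cancel() {
                    return CompletableFuture.completedFuture(null);
                }
            };
        }

        public ClusterClient deploySessionCluster() {
            // Session mode: a reusable client that can submit many jobs.
            return jobGraph -> newJobClient();
        }

        public JobClient deployPerJobCluster(JobGraph jobGraph) {
            // Per-job mode: deploying yields a client bound to this one job.
            return newJobClient();
        }
    }

    public static void main(String[] args) {
        ClusterDeploymentDescriptor d = new InMemoryDescriptor();
        ClusterClient session = d.deploySessionCluster();
        System.out.println(session.submitJob(new JobGraph()).getJobId());
        System.out.println(d.deployPerJobCluster(new JobGraph()).getJobId());
    }
}
```

In session mode the descriptor yields a reusable ClusterClient; in per-job mode deployment and submission collapse into one call that yields a client bound to that single job.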
>>> >>> >>>>>>>>>>>>>>>>> When setting up the ExecutionEnvironment, one
>>> >>> >> would then
>>> >>> >>>>>>>> not
>>> >>> >>>>>>>>>>>> provide
>>> >>> >>>>>>>>>>>>>> a
>>> >>> >>>>>>>>>>>>>>>>> ClusterClient to submit jobs but a JobDeployer
>>> >>> >> which,
>>> >>> >>>>>>>>>> depending
>>> >>> >>>>>>>>>>>> on
>>> >>> >>>>>>>>>>>>>> the
>>> >>> >>>>>>>>>>>>>>>>> selected mode, either uses a ClusterClient
>>> >>> >> (session
>>> >>> >>>>>>>> mode) to
>>> >>> >>>>>>>>>>>> submit
>>> >>> >>>>>>>>>>>>>>> jobs
>>> >>> >>>>>>>>>>>>>>>> or
>>> >>> >>>>>>>>>>>>>>>>> a ClusterDeploymentDescriptor to deploy a per-job
>>> >>> >> mode
>>> >>> >>>>>>>>>> cluster
>>> >>> >>>>>>>>>>>> with
>>> >>> >>>>>>>>>>>>>> the
>>> >>> >>>>>>>>>>>>>>>> job
>>> >>> >>>>>>>>>>>>>>>>> to execute.
>>> >>> >>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>> These are just some thoughts how one could
>>> >>> >> make it
>>> >>> >>>>>>>> working
>>> >>> >>>>>>>>>>>> because I
>>> >>> >>>>>>>>>>>>>>>>> believe there is some value in using the per
>>> >>> >> job mode
>>> >>> >>>>>>>> from
>>> >>> >>>>>>>>>> the
>>> >>> >>>>>>>>>>>>>>>>> ExecutionEnvironment.
>>> >>> >>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>> Concerning the web submission, this is indeed
>>> >>> >> a bit
>>> >>> >>>>>>>> tricky.
>>> >>> >>>>>>>>>>> From
>>> >>> >>>>>>>>>>>> a
>>> >>> >>>>>>>>>>>>>>>> cluster
>>> >>> >>>>>>>>>>>>>>>>> management stand point, I would in favour of
>>> >>> >> not
>>> >>> >>>>>>>> executing
>>> >>> >>>>>>>>>> user
>>> >>> >>>>>>>>>>>> code
>>> >>> >>>>>>>>>>>>>> on
>>> >>> >>>>>>>>>>>>>>>> the
>>> >>> >>>>>>>>>>>>>>>>> REST endpoint. Especially when considering
>>> >>> >> security, it
>>> >>> >>>>>>>> would
>>> >>> >>>>>>>>>>> be
>>> >>> >>>>>>>>>>>> good
>>> >>> >>>>>>>>>>>>>>> to
>>> >>> >>>>>>>>>>>>>>>>> have a well defined cluster behaviour where it
>>> >>> >> is
>>> >>> >>>>>>>> explicitly
>>> >>> >>>>>>>>>>>> stated
>>> >>> >>>>>>>>>>>>>>> where
>>> >>> >>>>>>>>>>>>>>>>> user code and, thus, potentially risky code is
>>> >>> >> executed.
>>> >>> >>>>>>>>>>> Ideally
>>> >>> >>>>>>>>>>>> we
>>> >>> >>>>>>>>>>>>>>> limit
>>> >>> >>>>>>>>>>>>>>>>> it to the TaskExecutor and JobMaster.
>>> >>> >>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>> Cheers,
>>> >>> >>>>>>>>>>>>>>>>> Till
>>> >>> >>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 9:40 AM Flavio
>>> >>> >> Pompermaier <
>>> >>> >>>>>>>>>>>>>>> pompermaier@okkam.it
>>> >>> >>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>> wrote:
>>> >>> >>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>> In my opinion the client should not use any
>>> >>> >>>>>>>> environment to
>>> >>> >>>>>>>>>>> get
>>> >>> >>>>>>>>>>>> the
>>> >>> >>>>>>>>>>>>>>> Job
>>> >>> >>>>>>>>>>>>>>>>>> graph because the jar should reside ONLY on
>>> >>> >> the cluster
>>> >>> >>>>>>>>>> (and
>>> >>> >>>>>>>>>>>> not in
>>> >>> >>>>>>>>>>>>>>> the
>>> >>> >>>>>>>>>>>>>>>>>> client classpath otherwise there are always
>>> >>> >>>>>>>> inconsistencies
>>> >>> >>>>>>>>>>>> between
>>> >>> >>>>>>>>>>>>>>>>> client
>>> >>> >>>>>>>>>>>>>>>>>> and Flink Job manager's classpath).
>>> >>> >>>>>>>>>>>>>>>>>> In the YARN, Mesos and Kubernetes scenarios
>>> >>> >> you have
>>> >>> >>>>>>>> the
>>> >>> >>>>>>>>>> jar
>>> >>> >>>>>>>>>>>> but
>>> >>> >>>>>>>>>>>>>> you
>>> >>> >>>>>>>>>>>>>>>>> could
>>> >>> >>>>>>>>>>>>>>>>>> start a cluster that has the jar on the Job
>>> >>> >> Manager as
>>> >>> >>>>>>>> well
>>> >>> >>>>>>>>>>>> (but
>>> >>> >>>>>>>>>>>>>> this
>>> >>> >>>>>>>>>>>>>>>> is
>>> >>> >>>>>>>>>>>>>>>>>> the only case where I think you can assume
>>> >>> >> that the
>>> >>> >>>>>>>> client
>>> >>> >>>>>>>>>>> has
>>> >>> >>>>>>>>>>>> the
>>> >>> >>>>>>>>>>>>>>> jar
>>> >>> >>>>>>>>>>>>>>>> on
>>> >>> >>>>>>>>>>>>>>>>>> the classpath..in the REST job submission
>>> >>> >> you don't
>>> >>> >>>>>>>> have
>>> >>> >>>>>>>>>> any
>>> >>> >>>>>>>>>>>>>>>> classpath).
>>> >>> >>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>> Thus, always in my opinion, the JobGraph
>>> >>> >> should be
>>> >>> >>>>>>>>>> generated
>>> >>> >>>>>>>>>>>> by the
>>> >>> >>>>>>>>>>>>>>> Job
>>> >>> >>>>>>>>>>>>>>>>>> Manager REST API.
>>> >>> >>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <
>>> >>> >>>>>>>>>>>> wander4096@gmail.com>
>>> >>> >>>>>>>>>>>>>>>> wrote:
>>> >>> >>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>> I would like to involve Till & Stephan here
>>> >>> >> to clarify
>>> >>> >>>>>>>>>> some
>>> >>> >>>>>>>>>>>>>> concept
>>> >>> >>>>>>>>>>>>>>> of
>>> >>> >>>>>>>>>>>>>>>>>>> per-job mode.
>>> >>> >>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>> The term per-job is one of the modes a cluster
>>> >>> >>>>>>>>>>>>>>>>>>> can run in. It is mainly aimed at spawning a
>>> >>> >>>>>>>>>>>>>>>>>>> dedicated cluster for a specific job; the job can
>>> >>> >>>>>>>>>>>>>>>>>>> be packaged with Flink itself, so that the
>>> >>> >>>>>>>>>>>>>>>>>>> cluster is initialized with the job and a
>>> >>> >>>>>>>>>>>>>>>>>>> separate submission step is avoided.
>>> >>> >>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>> This is useful for container deployments, where
>>> >>> >>>>>>>>>>>>>>>>>>> one creates his image with the job and then
>>> >>> >>>>>>>>>>>>>>>>>>> simply deploys the container.
>>> >>> >>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>> However, it is out of client scope, since a
>>> >>> >>>>>>>>>>>>>>>>>>> client (ClusterClient, for example) is for
>>> >>> >>>>>>>>>>>>>>>>>>> communicating with an existing cluster and
>>> >>> >>>>>>>>>>>>>>>>>>> performing actions on it. Currently, in per-job
>>> >>> >>>>>>>>>>>>>>>>>>> mode, we extract the job graph and bundle it into
>>> >>> >>>>>>>>>>>>>>>>>>> the cluster deployment, and thus no concept of a
>>> >>> >>>>>>>>>>>>>>>>>>> client gets involved. It looks reasonable to
>>> >>> >>>>>>>>>>>>>>>>>>> exclude the deployment of a per-job cluster from
>>> >>> >>>>>>>>>>>>>>>>>>> the client api and to use dedicated utility
>>> >>> >>>>>>>>>>>>>>>>>>> classes (deployers) for deployment.
>>> >>> >>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>> Zili Chen <wa...@gmail.com> wrote on Tue, Aug
>>> >>> >>>>>>>>>>>>>>>>>>> 20, 2019 at 12:37 PM:
>>> >>> >>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>> Hi Aljoscha,
>>> >>> >>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>> Thanks for your reply and participation. The
>>> >>> >>>>>>>>>>>>>>>>>>>> Google Doc you linked to requires permission; I
>>> >>> >>>>>>>>>>>>>>>>>>>> think you could use a share link instead.
>>> >>> >>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>> I agree that we have almost reached a consensus
>>> >>> >>>>>>>>>>>>>>>>>>>> that a JobClient is necessary to interact with a
>>> >>> >>>>>>>>>>>>>>>>>>>> running job.
>>> >>> >>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>> Let me check your open questions one by
>>> >>> >> one.
>>> >>> >>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>> 1. Separate cluster creation and job
>>> >>> >> submission for
>>> >>> >>>>>>>>>>> per-job
>>> >>> >>>>>>>>>>>>>> mode.
>>> >>> >>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>> As you mentioned, this is where the opinions
>>> >>> >>>>>>>>>>>>>>>>>>>> diverge. In my document there is an
>>> >>> >>>>>>>>>>>>>>>>>>>> alternative[2] that proposes excluding per-job
>>> >>> >>>>>>>>>>>>>>>>>>>> deployment from the client api scope, and now I
>>> >>> >>>>>>>>>>>>>>>>>>>> find it more reasonable that we do the
>>> >>> >>>>>>>>>>>>>>>>>>>> exclusion.
>>> >>> >>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>> When in per-job mode, a dedicated
>>> >>> >> JobCluster is
>>> >>> >>>>>>>> launched
>>> >>> >>>>>>>>>>> to
>>> >>> >>>>>>>>>>>>>>> execute
>>> >>> >>>>>>>>>>>>>>>>> the
>>> >>> >>>>>>>>>>>>>>>>>>>> specific job. It is like a Flink
>>> >>> >> Application more
>>> >>> >>>>>>>> than a
>>> >>> >>>>>>>>>>>>>>> submission
>>> >>> >>>>>>>>>>>>>>>>>>>> of Flink Job. Client only takes care of
>>> >>> >> job
>>> >>> >>>>>>>> submission
>>> >>> >>>>>>>>>> and
>>> >>> >>>>>>>>>>>>>> assume
>>> >>> >>>>>>>>>>>>>>>>> there
>>> >>> >>>>>>>>>>>>>>>>>>> is
>>> >>> >>>>>>>>>>>>>>>>>>>> an existing cluster. In this way we are
>>> >>> >> able to
>>> >>> >>>>>>>> consider
>>> >>> >>>>>>>>>>>> per-job
>>> >>> >>>>>>>>>>>>>>>>> issues
>>> >>> >>>>>>>>>>>>>>>>>>>> individually and JobClusterEntrypoint
>>> >>> >> would be the
>>> >>> >>>>>>>>>> utility
>>> >>> >>>>>>>>>>>> class
>>> >>> >>>>>>>>>>>>>>> for
>>> >>> >>>>>>>>>>>>>>>>>>>> per-job
>>> >>> >>>>>>>>>>>>>>>>>>>> deployment.
>>> >>> >>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>> Nevertheless, the user program works in both
>>> >>> >>>>>>>>>>>>>>>>>>>> session mode and per-job mode without any need
>>> >>> >>>>>>>>>>>>>>>>>>>> to change code. JobClient in
>>> >>> >> per-job mode
>>> >>> >>>>>>>> is
>>> >>> >>>>>>>>>>>> returned
>>> >>> >>>>>>>>>>>>>>> from
>>> >>> >>>>>>>>>>>>>>>>>>>> env.execute as normal. However, it would
>>> >>> >> be no
>>> >>> >>>>>>>> longer a
>>> >>> >>>>>>>>>>>> wrapper
>>> >>> >>>>>>>>>>>>>> of
>>> >>> >>>>>>>>>>>>>>>>>>>> RestClusterClient but a wrapper of
>>> >>> >>>>>>>> PerJobClusterClient
>>> >>> >>>>>>>>>>> which
>>> >>> >>>>>>>>>>>>>>>>>>> communicates
>>> >>> >>>>>>>>>>>>>>>>>>>> to Dispatcher locally.
>>> >>> >>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>> 2. How to deal with plan preview.
>>> >>> >>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>> With env.compile functions users can get
>>> >>> >> JobGraph or
>>> >>> >>>>>>>>>>>> FlinkPlan
>>> >>> >>>>>>>>>>>>>> and
>>> >>> >>>>>>>>>>>>>>>>> thus
>>> >>> >>>>>>>>>>>>>>>>>>>> they can preview the plan with
>>> >>> >> programming.
>>> >>> >>>>>>>> Typically it
>>> >>> >>>>>>>>>>>> looks
>>> >>> >>>>>>>>>>>>>>> like
>>> >>> >>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>> if (preview configured) {
>>> >>> >>>>>>>>>>>>>>>>>>>>    FlinkPlan plan = env.compile();
>>> >>> >>>>>>>>>>>>>>>>>>>>    new JSONDumpGenerator(...).dump(plan);
>>> >>> >>>>>>>>>>>>>>>>>>>> } else {
>>> >>> >>>>>>>>>>>>>>>>>>>>    env.execute();
>>> >>> >>>>>>>>>>>>>>>>>>>> }
>>> >>> >>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>> And `flink info` would no longer be valid.
>>> >>> >>>>>>>>>>>>>>>>>>>>
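Spelled out with toy stand-ins (the Env and FlinkPlan classes, the method bodies and the return values here are all assumptions for illustration; env.compile() is the proposed API, not an existing public one), the dispatch in the snippet above reads:

```java
public class PlanPreviewSketch {

    /** Stand-in for the optimized plan that env.compile() would return. */
    static class FlinkPlan {
        String toJson() { return "{\"nodes\":[]}"; }
    }

    /** Toy environment mirroring the proposed compile()/execute() split. */
    static class Env {
        FlinkPlan compile() { return new FlinkPlan(); }
        String execute() { return "EXECUTED"; }
    }

    /** Mirrors the if/else above: preview dumps the plan, otherwise run. */
    static String run(Env env, boolean previewConfigured) {
        if (previewConfigured) {
            return env.compile().toJson();
        }
        return env.execute();
    }

    public static void main(String[] args) {
        System.out.println(run(new Env(), true));   // plan as JSON
        System.out.println(run(new Env(), false));  // actual execution
    }
}
```

The point is that preview becomes an ordinary branch inside the user program, so no external tool has to hijack execute().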
>>> >>> >>>>>>>>>>>>>>>>>>>> 3. How to deal with Jar Submission at the
>>> >>> >> Web
>>> >>> >>>>>>>> Frontend.
>>> >>> >>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>> There is one more thread talked on this
>>> >>> >> topic[1].
>>> >>> >>>>>>>> Apart
>>> >>> >>>>>>>>>>> from
>>> >>> >>>>>>>>>>>>>>>> removing
>>> >>> >>>>>>>>>>>>>>>>>>>> the functions there are two alternatives.
>>> >>> >>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>> One is to introduce an interface with a method
>>> >>> >>>>>>>>>>>>>>>>>>>> that returns a JobGraph/FlinkPlan, and have Jar
>>> >>> >>>>>>>>>>>>>>>>>>>> Submission only support main classes
>>> >>> >>>>>>>>>>>>>>>>>>>> implementing this interface. Then the
>>> >>> >>>>>>>>>>>>>>>>>>>> JobGraph/FlinkPlan is extracted just by calling
>>> >>> >>>>>>>>>>>>>>>>>>>> the method. In this way, it is even possible to
>>> >>> >>>>>>>>>>>>>>>>>>>> consider a separation of job creation and job
>>> >>> >>>>>>>>>>>>>>>>>>>> submission.
>>> >>> >>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>> The other is, as you mentioned, let
>>> >>> >> execute() do the
>>> >>> >>>>>>>>>>> actual
>>> >>> >>>>>>>>>>>>>>>> execution.
>>> >>> >>>>>>>>>>>>>>>>>>>> We won't execute the main method in the
>>> >>> >> WebFrontend
>>> >>> >>>>>>>> but
>>> >>> >>>>>>>>>>>> spawn a
>>> >>> >>>>>>>>>>>>>>>>> process
>>> >>> >>>>>>>>>>>>>>>>>>>> at WebMonitor side to execute. For return
>>> >>> >> part we
>>> >>> >>>>>>>> could
>>> >>> >>>>>>>>>>>> generate
>>> >>> >>>>>>>>>>>>>>> the
>>> >>> >>>>>>>>>>>>>>>>>>>> JobID from WebMonitor and pass it to the
>>> >>> >> execution
>>> >>> >>>>>>>>>>>> environment.
>>> >>> >>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>> 4. How to deal with detached mode.
>>> >>> >>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>> I think detached mode is a temporary
>>> >>> >> solution for
>>> >>> >>>>>>>>>>>> non-blocking
>>> >>> >>>>>>>>>>>>>>>>>>> submission.
>>> >>> >>>>>>>>>>>>>>>>>>>> In my document both submission and
>>> >>> >> execution return
>>> >>> >>>>>>>> a
>>> >>> >>>>>>>>>>>>>>>>> CompletableFuture
>>> >>> >>>>>>>>>>>>>>>>>>> and
>>> >>> >>>>>>>>>>>>>>>>>>>> users control whether or not to wait for the
>>> >>> >> result. In
>>> >>> >>>>>>>>>> this
>>> >>> >>>>>>>>>>>> point
>>> >>> >>>>>>>>>>>>>> we
>>> >>> >>>>>>>>>>>>>>>>> don't
>>> >>> >>>>>>>>>>>>>>>>>>>> need a detached option but the
>>> >>> >> functionality is
>>> >>> >>>>>>>> covered.
>>> >>> >>>>>>>>>>>>>>>>>>>>
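The idea that a future subsumes the detached/attached distinction can be sketched like this (the executeAsync() name and the String result are assumptions for illustration, not actual Flink API):

```java
import java.util.concurrent.CompletableFuture;

public class DetachedModeSketch {

    // Assumed shape: submission returns a future of the job result.
    static CompletableFuture<String> executeAsync() {
        return CompletableFuture.supplyAsync(() -> "FINISHED");
    }

    public static void main(String[] args) {
        CompletableFuture<String> result = executeAsync();

        // "Attached" behaviour: block until the job completes.
        System.out.println(result.join());

        // "Detached" behaviour: simply never call join()/get();
        // no separate detached option is needed.
    }
}
```

With this shape, detached mode is not a mode at all: it is just the caller choosing not to wait on the returned future.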
>>> >>> >>>>>>>>>>>>>>>>>>>> 5. How does per-job mode interact with
>>> >>> >> interactive
>>> >>> >>>>>>>>>>>> programming.
>>> >>> >>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>> All of the YARN, Mesos and Kubernetes scenarios
>>> >>> >>>>>>>>>>>>>>>>>>>> follow the pattern of launching a JobCluster
>>> >>> >>>>>>>>>>>>>>>>>>>> now, and I don't think there would be any
>>> >>> >>>>>>>>>>>>>>>>>>>> inconsistency between the different resource
>>> >>> >>>>>>>>>>>>>>>>>>>> managers.
>>> >>> >>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>> Best,
>>> >>> >>>>>>>>>>>>>>>>>>>> tison.
>>> >>> >>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>> [1]
>>> https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
>>> >>> >>>>>>>>>>>>>>>>>>>> [2]
>>> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
>>> >>> >>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>> Aljoscha Krettek <al...@apache.org> wrote on
>>> >>> >>>>>>>>>>>>>>>>>>>> Fri, Aug 16, 2019 at 9:20 PM:
>>> >>> >>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>> Hi,
>>> >>> >>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>> I read both Jeffs initial design
>>> >>> >> document and the
>>> >>> >>>>>>>> newer
>>> >>> >>>>>>>>>>>>>> document
>>> >>> >>>>>>>>>>>>>>> by
>>> >>> >>>>>>>>>>>>>>>>>>>>> Tison. I also finally found the time to
>>> >>> >> collect our
>>> >>> >>>>>>>>>>>> thoughts on
>>> >>> >>>>>>>>>>>>>>> the
>>> >>> >>>>>>>>>>>>>>>>>>> issue,
>>> >>> >>>>>>>>>>>>>>>>>>>>> I had quite some discussions with Kostas
>>> >>> >> and this
>>> >>> >>>>>>>> is
>>> >>> >>>>>>>>>> the
>>> >>> >>>>>>>>>>>>>> result:
>>> >>> >>>>>>>>>>>>>>>> [1].
>>> >>> >>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>> I think overall we agree that this part
>>> >>> >> of the
>>> >>> >>>>>>>> code is
>>> >>> >>>>>>>>>> in
>>> >>> >>>>>>>>>>>> dire
>>> >>> >>>>>>>>>>>>>>> need
>>> >>> >>>>>>>>>>>>>>>>> of
>>> >>> >>>>>>>>>>>>>>>>>>>>> some refactoring/improvements but I
>>> >>> >> think there are
>>> >>> >>>>>>>>>> still
>>> >>> >>>>>>>>>>>> some
>>> >>> >>>>>>>>>>>>>>> open
>>> >>> >>>>>>>>>>>>>>>>>>>>> questions and some differences in
>>> >>> >> opinion what
>>> >>> >>>>>>>> those
>>> >>> >>>>>>>>>>>>>> refactorings
>>> >>> >>>>>>>>>>>>>>>>>>> should
>>> >>> >>>>>>>>>>>>>>>>>>>>> look like.
>>> >>> >>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>> I think the API-side is quite clear,
>>> >>> >> i.e. we need
>>> >>> >>>>>>>> some
>>> >>> >>>>>>>>>>>>>> JobClient
>>> >>> >>>>>>>>>>>>>>>> API
>>> >>> >>>>>>>>>>>>>>>>>>> that
>>> >>> >>>>>>>>>>>>>>>>>>>>> allows interacting with a running Job.
>>> >>> >> It could be
>>> >>> >>>>>>>>>>>> worthwhile
>>> >>> >>>>>>>>>>>>>> to
>>> >>> >>>>>>>>>>>>>>>> spin
>>> >>> >>>>>>>>>>>>>>>>>>> that
>>> >>> >>>>>>>>>>>>>>>>>>>>> off into a separate FLIP because we can
>>> >>> >> probably
>>> >>> >>>>>>>> find
>>> >>> >>>>>>>>>>>> consensus
>>> >>> >>>>>>>>>>>>>>> on
>>> >>> >>>>>>>>>>>>>>>>> that
>>> >>> >>>>>>>>>>>>>>>>>>>>> part more easily.
>>> >>> >>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>> For the rest, the main open questions
>>> >>> >> from our doc
>>> >>> >>>>>>>> are
>>> >>> >>>>>>>>>>>> these:
>>> >>> >>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>  - Do we want to separate cluster
>>> >>> >> creation and job
>>> >>> >>>>>>>>>>>> submission
>>> >>> >>>>>>>>>>>>>>> for
>>> >>> >>>>>>>>>>>>>>>>>>>>> per-job mode? In the past, there were
>>> >>> >> conscious
>>> >>> >>>>>>>> efforts
>>> >>> >>>>>>>>>>> to
>>> >>> >>>>>>>>>>>>>> *not*
>>> >>> >>>>>>>>>>>>>>>>>>> separate
>>> >>> >>>>>>>>>>>>>>>>>>>>> job submission from cluster creation for
>>> >>> >> per-job
>>> >>> >>>>>>>>>> clusters
>>> >>> >>>>>>>>>>>> for
>>> >>> >>>>>>>>>>>>>>>> Mesos,
>>> >>> >>>>>>>>>>>>>>>>>>> YARN,
>>> >>> >>>>>>>>>>>>>>>>>>>>> Kubernetes (see
>>> >>> >> StandaloneJobClusterEntryPoint).
>>> >>> >>>>>>>> Tison
>>> >>> >>>>>>>>>>>> suggests
>>> >>> >>>>>>>>>>>>>> in
>>> >>> >>>>>>>>>>>>>>>> his
>>> >>> >>>>>>>>>>>>>>>>>>>>> design document to decouple this in
>>> >>> >> order to unify
>>> >>> >>>>>>>> job
>>> >>> >>>>>>>>>>>>>>> submission.
>>> >>> >>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>  - How to deal with plan preview, which
>>> >>> >> needs to
>>> >>> >>>>>>>>>> hijack
>>> >>> >>>>>>>>>>>>>>> execute()
>>> >>> >>>>>>>>>>>>>>>>> and
>>> >>> >>>>>>>>>>>>>>>>>>>>> let the outside code catch an exception?
>>> >>> >>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>  - How to deal with Jar Submission at
>>> >>> >> the Web
>>> >>> >>>>>>>>>> Frontend,
>>> >>> >>>>>>>>>>>> which
>>> >>> >>>>>>>>>>>>>>>> needs
>>> >>> >>>>>>>>>>>>>>>>> to
>>> >>> >>>>>>>>>>>>>>>>>>>>> hijack execute() and let the outside
>>> >>> >> code catch an
>>> >>> >>>>>>>>>>>> exception?
>>> >>> >>>>>>>>>>>>>>>>>>>>> CliFrontend.run() “hijacks”
>>> >>> >>>>>>>>>>> ExecutionEnvironment.execute()
>>> >>> >>>>>>>>>>>> to
>>> >>> >>>>>>>>>>>>>>> get a
>>> >>> >>>>>>>>>>>>>>>>>>>>> JobGraph and then execute that JobGraph
>>> >>> >> manually.
>>> >>> >>>>>>>> We
>>> >>> >>>>>>>>>>> could
>>> >>> >>>>>>>>>>>> get
>>> >>> >>>>>>>>>>>>>>>> around
>>> >>> >>>>>>>>>>>>>>>>>>> that
>>> >>> >>>>>>>>>>>>>>>>>>>>> by letting execute() do the actual
>>> >>> >> execution. One
>>> >>> >>>>>>>>>> caveat
>>> >>> >>>>>>>>>>>> for
>>> >>> >>>>>>>>>>>>>> this
>>> >>> >>>>>>>>>>>>>>>> is
>>> >>> >>>>>>>>>>>>>>>>>>> that
>>> >>> >>>>>>>>>>>>>>>>>>>>> now the main() method doesn’t return (or
>>> >>> >> is forced
>>> >>> >>>>>>>> to
>>> >>> >>>>>>>>>>>> return by
>>> >>> >>>>>>>>>>>>>>>>>>> throwing an
>>> >>> >>>>>>>>>>>>>>>>>>>>> exception from execute()) which means
>>> >>> >> that for Jar
>>> >>> >>>>>>>>>>>> Submission
>>> >>> >>>>>>>>>>>>>>> from
>>> >>> >>>>>>>>>>>>>>>>> the
>>> >>> >>>>>>>>>>>>>>>>>>>>> WebFrontend we have a long-running
>>> >>> >> main() method
>>> >>> >>>>>>>>>> running
>>> >>> >>>>>>>>>>>> in the
>>> >>> >>>>>>>>>>>>>>>>>>>>> WebFrontend. This doesn’t sound very
>>> >>> >> good. We
>>> >>> >>>>>>>> could get
>>> >>> >>>>>>>>>>>> around
>>> >>> >>>>>>>>>>>>>>> this
>>> >>> >>>>>>>>>>>>>>>>> by
>>> >>> >>>>>>>>>>>>>>>>>>>>> removing the plan preview feature and by
>>> >>> >> removing
>>> >>> >>>>>>>> Jar
>>> >>> >>>>>>>>>>>>>>>>>>> Submission/Running.
>>> >>> >>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>  - How to deal with detached mode?
>>> >>> >> Right now,
>>> >>> >>>>>>>>>>>>>>> DetachedEnvironment
>>> >>> >>>>>>>>>>>>>>>>> will
>>> >>> >>>>>>>>>>>>>>>>>>>>> execute the job and return immediately.
>>> >>> >> If users
>>> >>> >>>>>>>>>> control
>>> >>> >>>>>>>>>>>> when
>>> >>> >>>>>>>>>>>>>>> they
>>> >>> >>>>>>>>>>>>>>>>>>> want to
>>> >>> >>>>>>>>>>>>>>>>>>>>> return, by waiting on the job completion
>>> >>> >> future,
>>> >>> >>>>>>>> how do
>>> >>> >>>>>>>>>>> we
>>> >>> >>>>>>>>>>>> deal
>>> >>> >>>>>>>>>>>>>>>> with
>>> >>> >>>>>>>>>>>>>>>>>>> this?
>>> >>> >>>>>>>>>>>>>>>>>>>>> Do we simply remove the distinction
>>> >>> >> between
>>> >>> >>>>>>>>>>>>>>> detached/non-detached?
>>> >>> >>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>  - How does per-job mode interact with
>>> >>> >>>>>>>> “interactive
>>> >>> >>>>>>>>>>>>>> programming”
>>> >>> >>>>>>>>>>>>>>>>>>>>> (FLIP-36). For YARN, each execute() call
>>> >>> >> could
>>> >>> >>>>>>>> spawn a
>>> >>> >>>>>>>>>>> new
>>> >>> >>>>>>>>>>>>>> Flink
>>> >>> >>>>>>>>>>>>>>>> YARN
>>> >>> >>>>>>>>>>>>>>>>>>>>> cluster. What about Mesos and Kubernetes?
>>> >>> >>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>> The first open question is where the
>>> >>> >> opinions
>>> >>> >>>>>>>> diverge,
>>> >>> >>>>>>>>>> I
>>> >>> >>>>>>>>>>>> think.
>>> >>> >>>>>>>>>>>>>>> The
>>> >>> >>>>>>>>>>>>>>>>>>> rest
>>> >>> >>>>>>>>>>>>>>>>>>>>> are just open questions and interesting
>>> >>> >> things
>>> >>> >>>>>>>> that we
>>> >>> >>>>>>>>>>>> need to
>>> >>> >>>>>>>>>>>>>>>>>>> consider.
>>> >>> >>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>> Best,
>>> >>> >>>>>>>>>>>>>>>>>>>>> Aljoscha
>>> >>> >>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>> [1]
>>> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
>>> >>> >>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>> On 31. Jul 2019, at 15:23, Jeff Zhang <
>>> >>> >>>>>>>>>>> zjffdu@gmail.com>
>>> >>> >>>>>>>>>>>>>>> wrote:
>>> >>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>> Thanks tison for the effort. I left a
>>> >>> >> few
>>> >>> >>>>>>>> comments.
>>> >>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>> Zili Chen <wa...@gmail.com> wrote on Wed, Jul
>>> >>> >>>>>>>>>>>>>>>>>>>>>> 31, 2019 at 8:24 PM:
>>> >>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> Hi Flavio,
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> Thanks for your reply.
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> In both the current implementation and the
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> design, ClusterClient never takes
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> responsibility for generating the JobGraph
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> (what you see in the current codebase is
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> several class methods).
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> Instead, user describes his program
>>> >>> >> in the main
>>> >>> >>>>>>>>>> method
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> with ExecutionEnvironment apis and
>>> >>> >> calls
>>> >>> >>>>>>>>>> env.compile()
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> or env.optimize() to get FlinkPlan
>>> >>> >> and JobGraph
>>> >>> >>>>>>>>>>>>>> respectively.
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> For listing main classes in a jar and
>>> >>> >> choose
>>> >>> >>>>>>>> one for
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> submission, you're now able to
>>> >>> >> customize a CLI
>>> >>> >>>>>>>> to do
>>> >>> >>>>>>>>>>> it.
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> Specifically, the path of the jar is passed
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> as an argument, and
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> in the customized CLI you list main
>>> >>> >> classes,
>>> >>> >>>>>>>> choose
>>> >>> >>>>>>>>>>> one
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> to submit to the cluster.
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> Best,
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> tison.
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> Flavio Pompermaier <
>>> >>> >> pompermaier@okkam.it>
>>> >>> >>>>>>>>>>> wrote on Wed, Jul 31, 2019 at 8:12 PM:
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>> Just one note on my side: it is not
>>> >>> >> clear to me
>>> >>> >>>>>>>>>>>> whether the
>>> >>> >>>>>>>>>>>>>>>>> client
>>> >>> >>>>>>>>>>>>>>>>>>>>> needs
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> to
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>> be able to generate a job graph or
>>> >>> >> not.
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>> In my opinion, the job jar must
>>> >>> >> reside only
>>> >>> >>>>>>>> on the
>>> >>> >>>>>>>>>>>>>>>>>>> server/jobManager
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> side
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>> and the client requires a way to get
>>> >>> >> the job
>>> >>> >>>>>>>> graph.
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>> If you really want to access to the
>>> >>> >> job graph,
>>> >>> >>>>>>>> I'd
>>> >>> >>>>>>>>>>> add
>>> >>> >>>>>>>>>>>> a
>>> >>> >>>>>>>>>>>>>>>>> dedicated
>>> >>> >>>>>>>>>>>>>>>>>>>>> method
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>> on the ClusterClient. like:
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>  - getJobGraph(jarId, mainClass):
>>> >>> >> JobGraph
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>  - listMainClasses(jarId):
>>> >>> >> List<String>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>> These would also require some additions on
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>> the job manager endpoint. What do you think?
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>
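The two suggested ClusterClient additions could be sketched with an in-memory stand-in like this (only the method names getJobGraph/listMainClasses follow the suggestion above; the fake jar registry, the stub JobGraph and everything else are assumptions for illustration, since the real implementation would scan jars on the JobManager side):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class JarClusterClientSketch {

    /** Stub for Flink's JobGraph; only records the chosen entry point. */
    static class JobGraph {
        final String mainClass;
        JobGraph(String mainClass) { this.mainClass = mainClass; }
    }

    /** The two suggested additions: the jar itself stays server-side. */
    interface JarAwareClusterClient {
        List<String> listMainClasses(String jarId);
        JobGraph getJobGraph(String jarId, String mainClass);
    }

    /** Fake client backed by a map instead of scanning real jars. */
    static class FakeClient implements JarAwareClusterClient {
        private final Map<String, List<String>> jars = Map.of(
            "wordcount.jar",
            Arrays.asList("org.example.WordCount", "org.example.WordCountSQL"));

        public List<String> listMainClasses(String jarId) {
            return jars.getOrDefault(jarId, List.of());
        }

        public JobGraph getJobGraph(String jarId, String mainClass) {
            if (!listMainClasses(jarId).contains(mainClass)) {
                throw new IllegalArgumentException("unknown main class: " + mainClass);
            }
            return new JobGraph(mainClass);
        }
    }

    public static void main(String[] args) {
        FakeClient client = new FakeClient();
        // First discover the entry points, then pick one for the job graph.
        System.out.println(client.listMainClasses("wordcount.jar"));
        System.out.println(
            client.getJobGraph("wordcount.jar", "org.example.WordCount").mainClass);
    }
}
```

The design point is that the client never needs the jar on its own classpath: discovery and job-graph generation both happen against the server-held jar.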
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jul 31, 2019 at 12:42 PM
>>> >>> >> Zili Chen <
>>> >>> >>>>>>>>>>>>>>>> wander4096@gmail.com
>>> >>> >>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>> wrote:
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> Here is a document[1] on client api
>>> >>> >>>>>>>> enhancement
>>> >>> >>>>>>>>>> from
>>> >>> >>>>>>>>>>>> our
>>> >>> >>>>>>>>>>>>>>>>>>> perspective.
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> We have investigated current
>>> >>> >> implementations.
>>> >>> >>>>>>>> And
>>> >>> >>>>>>>>>> we
>>> >>> >>>>>>>>>>>>>> propose
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> 1. Unify the implementation of
>>> >>> >> cluster
>>> >>> >>>>>>>> deployment
>>> >>> >>>>>>>>>>> and
>>> >>> >>>>>>>>>>>> job
>>> >>> >>>>>>>>>>>>>>>>>>> submission
>>> >>> >>>>>>>>>>>>>>>>>>>>> in
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> Flink.
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> 2. Provide programmatic interfaces
>>> >>> >> to allow
>>> >>> >>>>>>>>>> flexible
>>> >>> >>>>>>>>>>>> job
>>> >>> >>>>>>>>>>>>>> and
>>> >>> >>>>>>>>>>>>>>>>>>> cluster
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> management.
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> The first proposal is aimed at
>>> >>> >> reducing code
>>> >>> >>>>>>>> paths
>>> >>> >>>>>>>>>>> of
>>> >>> >>>>>>>>>>>>>>> cluster
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> deployment
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> and
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> job submission so that one can
>>> >>> >> adopt Flink in
>>> >>> >>>>>>>> his
>>> >>> >>>>>>>>>>>> usage
>>> >>> >>>>>>>>>>>>>>>> easily.
>>> >>> >>>>>>>>>>>>>>>>>>> The
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>> second
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> proposal is aimed at providing rich
>>> >>> >>>>>>>> interfaces for
>>> >>> >>>>>>>>>>>>>> advanced
>>> >>> >>>>>>>>>>>>>>>>> users
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> who want fine-grained control over these
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> stages.
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> Quick reference on open questions:
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> 1. Exclude job cluster deployment
>>> >>> >> from client
>>> >>> >>>>>>>> side
>>> >>> >>>>>>>>>>> or
>>> >>> >>>>>>>>>>>>>>> redefine
>>> >>> >>>>>>>>>>>>>>>>> the
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>> semantic
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> of job cluster? Since it fits in a
>>> >>> >> process
>>> >>> >>>>>>>> quite
>>> >>> >>>>>>>>>>>> different
>>> >>> >>>>>>>>>>>>>>>> from
>>> >>> >>>>>>>>>>>>>>>>>>>>> session
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> cluster deployment and job
>>> >>> >> submission.
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> 2. Maintain the codepaths handling
>>> >>> >> class
>>> >>> >>>>>>>>>>>>>>>>> o.a.f.api.common.Program
>>> >>> >>>>>>>>>>>>>>>>>>> or
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> implement customized program
>>> >>> >> handling logic by
>>> >>> >>>>>>>>>>>> customized
>>> >>> >>>>>>>>>>>>>>>>>>>>> CliFrontend?
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> See also this thread[2] and the
>>> >>> >> document[1].
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> 3. Expose ClusterClient as public
>>> >>> >> api or just
>>> >>> >>>>>>>>>> expose
>>> >>> >>>>>>>>>>>> api
>>> >>> >>>>>>>>>>>>>> in
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> and delegate them to ClusterClient?
>>> >>> >> Further,
>>> >>> >>>>>>>> in
>>> >>> >>>>>>>>>>>> either way
>>> >>> >>>>>>>>>>>>>>> is
>>> >>> >>>>>>>>>>>>>>>> it
>>> >>> >>>>>>>>>>>>>>>>>>>>> worth
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> to
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> introduce a JobClient which is an
>>> >>> >>>>>>>> encapsulation of
>>> >>> >>>>>>>>>>>>>>>> ClusterClient
>>> >>> >>>>>>>>>>>>>>>>>>> that
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> associated to specific job?
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> tison.
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>
>>> >>> >>>>>>>>>>>
>>> >>> >>>>>>>>>>
>>> >>> >>>>>>>>
>>> >>> >>
>>> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> [2]
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>
>>> >>> >>>>>>>>>>>
>>> >>> >>>>>>>>>>
>>> >>> >>>>>>>>
>>> >>> >>
>>> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
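Open question 3 can be made concrete with a small sketch. All names below are illustrative only, not an existing Flink API: the idea is that a JobClient wraps a ClusterClient plus a job id, so a caller manages one job without touching cluster internals.

```java
import java.util.concurrent.CompletableFuture;

public class JobClientSketch {
    /** Cluster-level operations (illustrative, not Flink's real interface). */
    interface ClusterClient {
        CompletableFuture<String> getJobStatus(String jobId);
        CompletableFuture<Void> cancelJob(String jobId);
    }

    /** Per-job handle: a ClusterClient bound to one job id. */
    static class JobClient {
        private final ClusterClient cluster;
        private final String jobId;

        JobClient(ClusterClient cluster, String jobId) {
            this.cluster = cluster;
            this.jobId = jobId;
        }

        CompletableFuture<String> status() { return cluster.getJobStatus(jobId); }
        CompletableFuture<Void> cancel() { return cluster.cancelJob(jobId); }
    }

    /** In-memory stand-in for a deployed cluster, for demonstration only. */
    static ClusterClient fakeCluster() {
        return new ClusterClient() {
            public CompletableFuture<String> getJobStatus(String jobId) {
                return CompletableFuture.completedFuture("RUNNING(" + jobId + ")");
            }
            public CompletableFuture<Void> cancelJob(String jobId) {
                return CompletableFuture.completedFuture(null);
            }
        };
    }

    public static void main(String[] args) {
        JobClient job = new JobClient(fakeCluster(), "job-1");
        System.out.println(job.status().join());
    }
}
```

With such a handle, downstream projects would hold one object per submitted job instead of carrying a (ClusterClient, jobId) pair around.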
Jeff Zhang <zj...@gmail.com> 于2019年7月24日周三 上午9:19写道:

> Thanks Stephan, I will follow up on this issue in the next few weeks and
> will refine the design doc. We can discuss more details after the 1.9
> release.
Stephan Ewen <se...@apache.org> 于2019年7月24日周三 上午12:58写道:

> Hi all!
>
> This thread has stalled for a bit, which I assume is mostly due to the
> Flink 1.9 feature freeze and release-testing effort.
>
> I personally still recognize this issue as an important one to solve. I'd
> be happy to help resume this discussion soon (after the 1.9 release) and
> see if we can take some steps towards this in Flink 1.10.
>
> Best,
> Stephan
On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <pompermaier@okkam.it> wrote:

> That's exactly what I suggested a long time ago: the Flink REST client
> should not require any Flink dependency, only an http library to call the
> REST services to submit and monitor a job.
> What I also suggested in [1] was to have a way to automatically suggest to
> the user (via a UI) the available main classes and their required
> parameters [2].
> Another problem we have with Flink is that the REST client and the CLI one
> behave differently, and we use the CLI client (via ssh) because it allows
> calling some other method after env.execute() [3] (we have to call another
> REST service to signal the end of the job).
> In this regard, a dedicated interface, like the JobListener suggested in
> the previous emails, would be very helpful (IMHO).
>
> [1] https://issues.apache.org/jira/browse/FLINK-10864
> [2] https://issues.apache.org/jira/browse/FLINK-10862
> [3] https://issues.apache.org/jira/browse/FLINK-10879
>
> Best,
> Flavio
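The JobListener Flavio asks for could look roughly like the sketch below. This is only an illustration of the idea, not an existing Flink interface; the callback names and the "job-42" id are assumptions. The point is that the end-of-job REST call Flavio describes becomes a callback instead of an ssh-wrapped CLI invocation.

```java
public class JobListenerSketch {
    /** Callbacks around submission and completion (names are illustrative). */
    interface JobListener {
        void onJobSubmitted(String jobId);
        void onJobExecuted(String jobId, boolean succeeded);
    }

    /** Stand-in for env.execute(): drives the listener through the job
     *  lifecycle instead of forcing the caller to poll from outside. */
    static void executeWith(JobListener listener) {
        String jobId = "job-42";          // would come from the cluster
        listener.onJobSubmitted(jobId);
        // ... the job runs to completion ...
        listener.onJobExecuted(jobId, true);
    }

    public static void main(String[] args) {
        // A user would e.g. signal job completion to an external service here.
        executeWith(new JobListener() {
            public void onJobSubmitted(String id) {
                System.out.println("submitted: " + id);
            }
            public void onJobExecuted(String id, boolean ok) {
                System.out.println("done: " + id + " ok=" + ok);
            }
        });
    }
}
```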
On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zjffdu@gmail.com> wrote:

> Hi Tison,
>
> Thanks for your comments. Overall I agree with you that it is difficult
> for downstream projects to integrate with Flink and that we need to
> refactor the current Flink client api.
> I also agree that CliFrontend should only parse command line arguments
> and then pass them to ExecutionEnvironment. It is ExecutionEnvironment's
> responsibility to compile the job, create the cluster, and submit the job.
> Besides that, Flink currently has many ExecutionEnvironment
> implementations, and it picks the specific one based on the context. IMHO
> this is not necessary: ExecutionEnvironment should be able to do the
> right thing based on the FlinkConf it receives. Having too many
> ExecutionEnvironment implementations is another burden for downstream
> project integration.
>
> One thing I'd like to mention is Flink's scala shell and sql client.
> Although they are sub-modules of Flink, they can be treated as downstream
> projects which use Flink's client api. You will find it is currently not
> easy for them to integrate with Flink, and they share a lot of duplicated
> code. That is another sign that we should refactor the Flink client api.
>
> I believe this is a large and hard change, and I am afraid we cannot keep
> compatibility, since many of the changes are user facing.
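Jeff's "one environment that does the right thing based on the FlinkConf" can be sketched as a configuration-driven factory. Everything here is hypothetical: the key name "execution.target", the target names, and the Executor interface are assumptions made for illustration, not Flink code.

```java
import java.util.Map;

public class EnvFactory {
    interface Executor { String describe(); }

    /** One environment, many targets: the submission path is chosen from the
     *  configuration instead of from one env subclass per deployment mode. */
    static Executor forConfig(Map<String, String> conf) {
        String target = conf.getOrDefault("execution.target", "local");
        switch (target) {
            case "local":        return () -> "run in an embedded mini-cluster";
            case "remote":       return () -> "submit to an existing session cluster";
            case "yarn-per-job": return () -> "deploy a YARN job cluster, then submit";
            default:
                throw new IllegalArgumentException("unknown execution.target: " + target);
        }
    }

    public static void main(String[] args) {
        System.out.println(forConfig(Map.of("execution.target", "remote")).describe());
    }
}
```

A downstream project would then only build a configuration; no RemoteStreamEnvironment-style subclass selection would leak into its code.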
Zili Chen <wander4096@gmail.com> 于2019年6月24日周一 下午2:53写道:

> Hi all,
>
> After a closer look at our client apis, I can see there are two major
> issues of consistency and integration: the different deployment of a job
> cluster, which couples job graph creation and cluster deployment, and
> submission via CliFrontend, which confuses the control flow of job graph
> compilation and job submission. I'd like to follow the discussion above,
> mainly the process described by Jeff and Stephan, and share my ideas on
> these issues.
>
> 1) CliFrontend confuses the control flow of job compilation and
> submission. Following the process of job submission that Stephan and Jeff
> described, the execution environment knows all configs of the cluster and
> topos/settings of the job. Ideally, the main method of the user program
> calls #execute (or, say, #submit), and Flink deploys the cluster,
> compiles the job graph and submits it to the cluster. However, the
> current CliFrontend does all these things inside its #runProgram method,
> which introduces a lot of subclasses of (stream) execution environment.
>
> Actually, it sets up an exec env that hijacks the #execute/#executePlan
> method, initializes the job graph and aborts execution. Control flow then
> returns to CliFrontend, which deploys the cluster (or retrieves the
> client) and submits the job graph. This is quite a specific internal
> process inside Flink, consistent with nothing else.
>
> 2) Deployment of a job cluster couples job graph creation and cluster
> deployment. Abstractly, getting from a user job to a concrete submission
> requires such a dependency:
>
>    create JobGraph --\
>    create ClusterClient --> submit JobGraph
>
> A ClusterClient is created by deploying or retrieving. JobGraph
> submission requires a compiled JobGraph and a valid ClusterClient, but
> the creation of a ClusterClient is abstractly independent of that of a
> JobGraph. However, in job cluster mode, we deploy the job cluster with a
> job graph, which means we use another process:
>
>    create JobGraph --> deploy cluster with the JobGraph
>
> Here is another inconsistency, and downstream projects/client apis are
> forced to handle the different cases with rare support from Flink.
>
> Since we likely reached a consensus on
>
> 1. all configs gathered by Flink configuration and passed on, and
> 2. the execution environment knows all configs and handles execution
> (both deployment and submission),
>
> for the issues above I propose eliminating the inconsistencies with the
> following approach:
>
> 1) CliFrontend should be exactly a front end, at least for the "run"
> command. That means it just gathers all config from the command line and
> passes it to the main method of the user program. The execution
> environment knows all the info, and with an addition of utils for
> ClusterClient we gracefully get a ClusterClient by deploying or
> retrieving. In this way we don't need to hijack #execute/#executePlan
> methods and can remove the various hacking subclasses of exec env, as
> well as the #run methods in ClusterClient (for an interface-ized
> ClusterClient). The control flow then goes from CliFrontend into the main
> method and never returns.
>
> 2) A job cluster means a cluster for a specific job. From another
> perspective, it is an ephemeral session. We may decouple deployment from
> the compiled job graph: start a session with an idle timeout and submit
> the job afterwards.
>
> These topics are better made explicit and discussed towards a consensus
> before we go into more detail on design or implementation.
>
> Best,
> tison.
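tison's proposal 2) — deployment never takes a job graph; a "job cluster" is just an ephemeral session — can be sketched as follows. All type names here (ClusterDescriptor, ClusterClient, JobGraph) are illustrative stand-ins, not Flink's actual classes, and the fake in-memory cluster exists only to make the flow runnable.

```java
import java.util.ArrayList;
import java.util.List;

public class Decoupled {
    static class JobGraph {
        final String name;
        JobGraph(String name) { this.name = name; }
    }

    interface ClusterClient { String submit(JobGraph graph); }

    /** Deployment is independent of any job: it only yields a client. */
    interface ClusterDescriptor { ClusterClient deploySession(long idleTimeoutMs); }

    /** "Per-job" execution re-expressed as an ephemeral session:
     *  deploy, submit, and let the idle timeout tear the cluster down. */
    static List<String> runPerJob(ClusterDescriptor descriptor, JobGraph graph) {
        List<String> events = new ArrayList<>();
        ClusterClient client = descriptor.deploySession(60_000);
        events.add("deployed session");
        events.add("submitted " + client.submit(graph));
        return events;
    }

    public static void main(String[] args) {
        // Fake cluster: deploySession returns a client whose submit echoes
        // the job name back as the "job id".
        ClusterDescriptor fake = idleTimeoutMs -> graph -> graph.name;
        System.out.println(runPerJob(fake, new JobGraph("wordcount")));
    }
}
```

Note how the same two-step flow (deploy, then submit) now serves both the session and the per-job case, removing the second code path tison describes.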
Zili Chen <wander4096@gmail.com> 于2019年6月20日周四 上午3:21写道:

> Hi Jeff,
>
> Thanks for raising this thread and the design document!
>
> As @Thomas Weise mentioned above, extending config to Flink requires far
> more effort than it should. Another example: we achieve detached mode by
> introducing another execution environment which also hijacks the #execute
> method.
>
> I agree with your idea that the user configures all things and Flink
> "just" respects them. On this topic, I think the unusual control flow
> when CliFrontend handles the "run" command is the problem. It handles
> several configs, mainly about cluster settings, of which the main method
> of the user program is therefore unaware. It also compiles the app to a
> job graph by running the main method with a hijacked exec env, which
> constrains the main method further.
>
> I'd like to write down a few notes on how configs/args are passed and
> respected, as well as on decoupling job compilation and submission. I
> will share them on this thread later.
>
> Best,
> tison.
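tison's "gather config in the front end, pass it on, and respect it in the main method" can be sketched with a hypothetical helper. This is not CliFrontend's real parsing code; the `-Dkey=value` convention is an assumption chosen for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class FrontEndSketch {
    /** Parse "-Dkey=value" style arguments into a flat configuration map.
     *  Everything else is left untouched for the user program's main method,
     *  so the front end never interprets cluster settings itself. */
    static Map<String, String> gatherConfig(String[] args) {
        Map<String, String> conf = new HashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("-D") && eq > 2) {
                conf.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(gatherConfig(
            new String[] {"-Dexecution.target=remote", "run", "job.jar"}));
    }
}
```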
SHI Xiaogang <shixiaogangg@gmail.com> 于2019年6月17日周一 下午7:29写道:

> Hi Jeff and Flavio,
>
> Thanks Jeff a lot for proposing the design document.
>
> We are also working on refactoring ClusterClient to allow flexible and
> efficient job management in our real-time platform. We would like to
> draft a document to share our ideas with you.
>
> I think it's a good idea to have something like Apache Livy for Flink,
> and the efforts discussed here will take a great step forward to it.
>
> Regards,
> Xiaogang
Flavio Pompermaier <pompermaier@okkam.it> 于2019年6月17日周一 下午7:13写道:

> Is there any possibility to have something like Apache Livy [1] also for
> Flink in the future?
>
> [1] https://livy.apache.org/
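Part of what Livy does for Spark (REST-based job submission) is already reachable through Flink's monitoring REST API, as Flavio noted earlier in the thread: a client needs only an HTTP library. The sketch below builds the request for running a previously uploaded jar; the endpoint path and payload fields follow the REST API docs but may differ across Flink versions, and "flink-host" plus the jar id are placeholders. For brevity, only the request construction is shown (the actual POST would use any HTTP client).

```java
import java.net.URI;

public class RestSubmit {
    /** URL for running a jar previously uploaded via POST /v1/jars/upload. */
    static URI runJarUrl(String host, int port, String jarId) {
        return URI.create(
            String.format("http://%s:%d/v1/jars/%s/run", host, port, jarId));
    }

    /** JSON payload selecting the entry class and program arguments. */
    static String runJarPayload(String entryClass, String programArgs) {
        return String.format(
            "{\"entryClass\":\"%s\",\"programArgs\":\"%s\"}",
            entryClass, programArgs);
    }

    public static void main(String[] args) {
        // POSTing this payload to this URL starts the job; the response
        // carries a "jobid" that GET /v1/jobs/<jobid> can then poll.
        System.out.println(runJarUrl("flink-host", 8081, "myjob.jar"));
        System.out.println(runJarPayload("org.example.WordCount", "--input /tmp/in"));
    }
}
```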
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jun 11, 2019 at
>>> >>> >> 5:23 PM Jeff
>>> >>> >>>>>>>>>> Zhang <
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>> zjffdu@gmail.com
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>> Any API we expose should not have dependencies on the runtime
>>>> (flink-runtime) package or other implementation details. To me, this
>>>> means that the current ClusterClient cannot be exposed to users
>>>> because it uses quite some classes from the optimiser and runtime
>>>> packages.
>>>
>>> We should change ClusterClient from a class to an interface.
>>> ExecutionEnvironment should only use the ClusterClient interface, which
>>> should be in flink-clients, while the concrete implementation class
>>> could be in flink-runtime.
>>>
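The split Jeff describes could be sketched as below. This is a minimal illustration only; everything except the ClusterClient name (the methods, the LocalClusterClient stand-in) is hypothetical and not Flink's actual API:

```java
// Sketch: only the ClusterClient interface would live in flink-clients,
// while concrete implementations would live in flink-runtime.
import java.util.concurrent.CompletableFuture;

interface ClusterClient {
    CompletableFuture<String> submitJob(String jobName); // returns a job id
    CompletableFuture<Void> cancelJob(String jobId);
}

// Stand-in for a concrete implementation that users never see directly.
class LocalClusterClient implements ClusterClient {
    @Override
    public CompletableFuture<String> submitJob(String jobName) {
        return CompletableFuture.completedFuture("job-" + jobName.hashCode());
    }

    @Override
    public CompletableFuture<Void> cancelJob(String jobId) {
        return CompletableFuture.completedFuture(null);
    }
}

public class ClusterClientSketch {
    public static void main(String[] args) throws Exception {
        // The environment would hand out the interface, never the class.
        ClusterClient client = new LocalClusterClient();
        System.out.println("submitted " + client.submitJob("wordcount").get());
    }
}
```

With such a split, downstream projects would compile only against flink-clients and stay insulated from optimiser/runtime internals.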
>>>> What happens when a failure/restart in the client happens? There
>>>> needs to be a way of re-establishing the connection to the job,
>>>> setting up the listeners again, etc.
>>>
>>> Good point. First we need to define what failure/restart in the client
>>> means. IIUC, that usually means a network failure, which will happen in
>>> the RestClient class. If my understanding is correct, the restart/retry
>>> mechanism should be done in RestClient.
>>>
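The retry-on-network-failure idea could look roughly like this sketch. The helper and its policy are hypothetical illustrations, not Flink's actual RestClient behavior:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Sketch of a client-side retry wrapper: treat I/O errors as transient
// and re-attempt with a simple linear backoff.
public class RetrySketch {

    static <T> T withRetry(Callable<T> call, int maxAttempts, long backoffMillis)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException e) { // transient network failure
                last = e;
                Thread.sleep(backoffMillis * attempt); // linear backoff
            }
        }
        throw last; // all attempts failed
    }

    public static void main(String[] args) throws Exception {
        final int[] failures = {2}; // fail twice, then succeed
        String status = withRetry(() -> {
            if (failures[0]-- > 0) {
                throw new IOException("connection reset");
            }
            return "RUNNING";
        }, 5, 10);
        System.out.println("job status: " + status);
    }
}
```

Re-registering job listeners after a successful reconnect (Aljoscha's second concern) would sit on top of such a wrapper.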
>>> Aljoscha Krettek <aljoscha@apache.org> 于2019年6月11日周二 下午11:10写道:
>>>
>>>> Some points to consider:
>>>>
>>>> * Any API we expose should not have dependencies on the runtime
>>>> (flink-runtime) package or other implementation details. To me, this
>>>> means that the current ClusterClient cannot be exposed to users
>>>> because it uses quite some classes from the optimiser and runtime
>>>> packages.
>>>>
>>>> * What happens when a failure/restart in the client happens? There
>>>> needs to be a way of re-establishing the connection to the job,
>>>> setting up the listeners again, etc.
>>>>
>>>> Aljoscha
>>>>
>>>>> On 29. May 2019, at 10:17, Jeff Zhang <zjffdu@gmail.com> wrote:
>>>>>
>>>>> Sorry folks, the design doc is late as you expected. Here's the
>>>>> design doc I drafted, welcome any comments and feedback.
>>>>>
>>>>> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
>>>>>
>>>>> Stephan Ewen <sewen@apache.org> 于2019年2月14日周四 下午8:43写道:
>>>>>
>>>>>> Nice that this discussion is happening.
>>>>>>
>>>>>> In the FLIP, we could also revisit the entire role of the
>>>>>> environments again.
>>>>>>
>>>>>> Initially, the idea was:
>>>>>>   - the environments take care of the specific setup for standalone
>>>>>>     (no setup needed), yarn, mesos, etc.
>>>>>>   - the session ones have control over the session. The environment
>>>>>>     holds the session client.
>>>>>>   - running a job gives a "control" object for that job. That
>>>>>>     behavior is the same in all environments.
>>>>>>
>>>>>> The actual implementation diverged quite a bit from that. Happy to
>>>>>> see a discussion about straightening this out a bit more.
>>>>>>
>>>>>> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zjffdu@gmail.com> wrote:
>>>>>>
>>>>>>> Hi folks,
>>>>>>>
>>>>>>> Sorry for the late response. It seems we reached consensus on this;
>>>>>>> I will create a FLIP for this with a more detailed design.
>>>>>>>
>>>>>>> Thomas Weise <thw@apache.org> 于2018年12月21日周五 上午11:43写道:
>>>>>>>
>>>>>>>> Great to see this discussion seeded! The problems you face with
>>>>>>>> the Zeppelin integration are also affecting other downstream
>>>>>>>> projects, like Beam.
>>>>>>>>
>>>>>>>> We just enabled the savepoint restore option in
>>>>>>>> RemoteStreamEnvironment [1] and that was more difficult than it
>>>>>>>> should be. The main issue is that environment and cluster client
>>>>>>>> aren't decoupled. Ideally it should be possible to just get the
>>>>>>>> matching cluster client from the environment and then control the
>>>>>>>> job through it (environment as factory for cluster client). But
>>>>>>>> note that the environment classes are part of the public API, and
>>>>>>>> it is not straightforward to make larger changes without breaking
>>>>>>>> backward compatibility.
>>>>>>>>
>>>>>>>> ClusterClient currently exposes internal classes like JobGraph and
>>>>>>>> StreamGraph. But it should be possible to wrap this with a new
>>>>>>>> public API that brings the required job control capabilities for
>>>>>>>> downstream projects. Perhaps it is helpful to look at some of the
>>>>>>>> interfaces in Beam while thinking about this: [2] for the portable
>>>>>>>> job API and [3] for the old asynchronous job control from the Beam
>>>>>>>> Java SDK.
>>>>>>>>
>>>>>>>> The backward compatibility discussion [4] is also relevant here. A
>>>>>>>> new API should shield downstream projects from internals and allow
>>>>>>>> them to interoperate with multiple future Flink versions in the
>>>>>>>> same release line without forced upgrades.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Thomas
>>>>>>>>
>>>>>>>> [1] https://github.com/apache/flink/pull/7249
>>>>>>>> [2] https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
>>>>>>>> [3] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
>>>>>>>> [4] https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
>>>>>>>>
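A narrow public job-control surface of the kind Thomas describes, similar in spirit to Beam's PipelineResult, might look like this sketch; all names here are hypothetical, not an actual Flink or Beam API:

```java
import java.time.Duration;

// Sketch: a small job-control interface that hides JobGraph/StreamGraph
// and other internals from downstream projects.
interface JobControl {
    enum State { RUNNING, CANCELLED, DONE, FAILED }

    State getState();

    State cancel();

    State waitUntilFinish(Duration timeout) throws InterruptedException;
}

// Toy implementation standing in for a real client-backed one.
class FinishedJobControl implements JobControl {
    @Override
    public State getState() { return State.DONE; }

    @Override
    public State cancel() { return State.CANCELLED; }

    @Override
    public State waitUntilFinish(Duration timeout) { return State.DONE; }
}

public class JobControlSketch {
    public static void main(String[] args) throws Exception {
        JobControl job = new FinishedJobControl();
        System.out.println(job.waitUntilFinish(Duration.ofSeconds(1)));
    }
}
```

Because only this small interface would be public, the implementation behind it could evolve across Flink releases without breaking downstream projects.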
>>>>>>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zjffdu@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>> I'm not so sure whether the user should be able to define where
>>>>>>>>>> the job runs (in your example Yarn). This is actually independent
>>>>>>>>>> of the job development and is something which is decided at
>>>>>>>>>> deployment time.
>>>>>>>>>
>>>>>>>>> Users don't need to specify the execution mode programmatically.
>>>>>>>>> They can also pass the execution mode via the arguments of the
>>>>>>>>> flink run command, e.g.
>>>>>>>>>
>>>>>>>>> bin/flink run -m yarn-cluster ...
>>>>>>>>> bin/flink run -m local ...
>>>>>>>>> bin/flink run -m host:port ...
>>>>>>>>>
>>>>>>>>> Does this make sense to you?
>>>>>>>>>
>>>>>>>>>> To me it makes sense that the ExecutionEnvironment is not
>>>>>>>>>> directly initialized by the user and instead context sensitive
>>>>>>>>>> how you want to execute your job (Flink CLI vs. IDE, for
>>>>>>>>>> example).
>>>>>>>>>
>>>>>>>>> Right, currently I notice Flink would create a different
>>>>>>>>> ContextExecutionEnvironment based on different submission
>>>>>>>>> scenarios (Flink CLI vs IDE). To me this is kind of a hack
>>>>>>>>> approach, not so straightforward. What I suggested above is that
>>>>>>>>> Flink should always create the same ExecutionEnvironment but with
>>>>>>>>> different configuration, and based on the configuration it would
>>>>>>>>> create the proper ClusterClient for different behaviors.
>>>>>>>>>
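Configuration-driven client selection, as Jeff suggests, could be sketched as below: one environment, with the deployment target read from configuration instead of baked into a per-scenario environment subclass. The "execution.target" key and the client class names are hypothetical illustrations:

```java
import java.util.Map;

// Sketch: pick the client implementation from configuration rather than
// from which ContextExecutionEnvironment subclass happened to be created.
public class ClientFactorySketch {

    static String clientFor(Map<String, String> config) {
        String target = config.getOrDefault("execution.target", "local");
        switch (target) {
            case "yarn-cluster":
                return "YarnClusterClient";
            case "local":
                return "LocalClusterClient";
            default: // assume host:port of a remote session cluster
                return "RestClusterClient(" + target + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(clientFor(Map.of("execution.target", "yarn-cluster")));
        System.out.println(clientFor(Map.of()));
    }
}
```

The CLI's `-m yarn-cluster` / `-m local` / `-m host:port` flags would then simply translate into this configuration before the environment is created.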
>>>>>>>>> Till Rohrmann <trohrmann@apache.org> 于2018年12月20日周四 下午11:18写道:
>>>>>>>>>
>>>>>>>>>> You are probably right that we have code duplication when it
>>>>>>>>>> comes to the creation of the ClusterClient. This should be
>>>>>>>>>> reduced in the future.
>>>>>>>>>>
>>>>>>>>>> I'm not so sure whether the user should be able to define where
>>>>>>>>>> the job runs (in your example Yarn). This is actually independent
>>>>>>>>>> of the job development and is something which is decided at
>>>>>>>>>> deployment time. To me it makes sense that the
>>>>>>>>>> ExecutionEnvironment is not directly initialized by the user and
>>>>>>>>>> is instead context sensitive to how you want to execute your job
>>>>>>>>>> (Flink CLI vs. IDE, for example). However, I agree that the
>>>>>>>>>> ExecutionEnvironment should give you access to the ClusterClient
>>>>>>>>>> and to the job (maybe in the form of the JobGraph or a job plan).
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Till
>>>>>>>>>>
>>>>>>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zjffdu@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Till,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the feedback. You are right that I expect a better
>>>>>>>>>>> programmatic job submission/control API which could be used by
>>>>>>>>>>> downstream projects, and it would benefit the Flink ecosystem.
>>>>>>>>>>> When I look at the code of the Flink scala-shell and sql-client
>>>>>>>>>>> (I believe they are not the core of Flink, but belong to the
>>>>>>>>>>> ecosystem of Flink), I find much duplicated code for creating a
>>>>>>>>>>> ClusterClient from user-provided configuration (the
>>>>>>>>>>> configuration format may differ between scala-shell and
>>>>>>>>>>> sql-client) and then using that ClusterClient to manipulate
>>>>>>>>>>> jobs. I don't think this is convenient for downstream
>>>>>>>>>>> projects. What I
>>> >>> >> expect is
>>> >>> >>>>>>>> that
>>> >>> >>>>>>>>>>>>>> downstream
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> project
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> only
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> needs
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> provide
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> necessary
>>> >>> >> configuration info
>>> >>> >>>>>>>>>> (maybe
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> introducing
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> class
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> FlinkConf),
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> then
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> build
>>> >>> >> ExecutionEnvironment
>>> >>> >>>>>>>> based
>>> >>> >>>>>>>>>> on
>>> >>> >>>>>>>>>>>> this
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> FlinkConf,
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >> ExecutionEnvironment will
>>> >>> >>>>>>>> create
>>> >>> >>>>>>>>>>> the
>>> >>> >>>>>>>>>>>>>> proper
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient.
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> only
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> benefit for the
>>> >>> >> downstream
>>> >>> >>>>>>>>>> project
>>> >>> >>>>>>>>>>>>>>>>>>>>>>> development
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> but
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> also
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> be
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> helpful
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> their integration
>>> >>> >> test with
>>> >>> >>>>>>>>>> flink.
>>> >>> >>>>>>>>>>>> Here's
>>> >>> >>>>>>>>>>>>>>> one
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> sample
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> code
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> snippet
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> expect.
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> val conf = new
>>> >>> >>>>>>>>>>>> FlinkConf().mode("yarn")
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> val env = new
>>> >>> >>>>>>>>>>>> ExecutionEnvironment(conf)
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> val jobId =
>>> >>> >> env.submit(...)
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> val jobStatus =
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>> env.getClusterClient().queryJobStatus(jobId)
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>> env.getClusterClient().cancelJob(jobId)
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What do you think ?
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
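Jeff's Scala snippet above can be expanded into a runnable sketch (in Java here). All names — FlinkConf, SimpleClusterClient, SketchExecutionEnvironment and their methods — are hypothetical stand-ins for the proposed API, not existing Flink classes:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical configuration holder; "mode" would select local/remote/yarn execution.
class FlinkConf {
    private final Map<String, String> props = new HashMap<>();

    FlinkConf mode(String mode) {
        props.put("execution.mode", mode);
        return this;
    }

    String get(String key) {
        return props.get(key);
    }
}

// Hypothetical cluster client, created internally from the configuration.
class SimpleClusterClient {
    private final Map<String, String> jobStatus = new HashMap<>();

    String submit(String jobName) {
        String jobId = UUID.randomUUID().toString();
        jobStatus.put(jobId, "RUNNING");
        return jobId;
    }

    String queryJobStatus(String jobId) {
        return jobStatus.get(jobId);
    }

    void cancelJob(String jobId) {
        jobStatus.put(jobId, "CANCELED");
    }
}

// Hypothetical environment that builds the right client from the conf,
// so downstream projects never construct a ClusterClient themselves.
class SketchExecutionEnvironment {
    private final SimpleClusterClient client = new SimpleClusterClient();

    SketchExecutionEnvironment(FlinkConf conf) {
        // A real implementation would inspect conf.get("execution.mode")
        // and build a local or remote client accordingly.
    }

    String submit(String jobName) {
        return client.submit(jobName);
    }

    SimpleClusterClient getClusterClient() {
        return client;
    }
}
```

The point of the sketch is the shape of the calls: configuration in, environment out, client obtained from the environment rather than constructed by the caller.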
On Tue, Dec 11, 2018 at 6:28 PM, Till Rohrmann <trohrmann@apache.org> wrote:
Hi Jeff,

what you are proposing is to provide the user with better programmatic job control. There was actually an effort to achieve this, but it has never been completed [1]. However, there are some improvements in the code base now. Look for example at the NewClusterClient interface, which offers a non-blocking job submission. But I agree that we need to improve Flink in this regard.

I would not be in favour of exposing all ClusterClient calls via the ExecutionEnvironment because it would clutter the class and would not be a good separation of concerns. Instead, one idea could be to retrieve the current ClusterClient from the ExecutionEnvironment, which can then be used for cluster and job control. But before we start an effort here, we need to agree on and capture what functionality we want to provide.

Initially, the idea was that we have the ClusterDescriptor describing how to talk to a cluster manager like Yarn or Mesos. The ClusterDescriptor can be used for deploying Flink clusters (job and session) and gives you a ClusterClient. The ClusterClient controls the cluster (e.g. submitting jobs, listing all running jobs). And then there was the idea to introduce a JobClient, which you obtain from the ClusterClient, to trigger job-specific operations (e.g. taking a savepoint, cancelling the job).

[1] https://issues.apache.org/jira/browse/FLINK-4272

Cheers,
Till
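Till's three-layer design can be sketched as minimal interfaces. ClusterDescriptor and ClusterClient are real Flink concepts; the bodies below and the JobClient shown here are illustrative stubs under the "Sketch" prefix, not the actual API:

```java
import java.util.UUID;

// Layer 1: describes how to reach a cluster manager (Yarn, Mesos, standalone, ...).
interface SketchClusterDescriptor {
    SketchClusterClient deploySessionCluster();
}

// Layer 2: controls the cluster — submits jobs, hands out per-job handles.
interface SketchClusterClient {
    SketchJobClient submitJob(String jobGraph);
}

// Layer 3: controls a single job — savepoints, cancellation.
interface SketchJobClient {
    String jobId();
    void cancel();
    String triggerSavepoint(String targetDirectory);
}

// A trivial in-memory implementation wiring the three layers together.
class LocalSketchDescriptor implements SketchClusterDescriptor {
    public SketchClusterClient deploySessionCluster() {
        return jobGraph -> new SketchJobClient() {
            private final String id = UUID.randomUUID().toString();
            public String jobId() { return id; }
            public void cancel() { /* would call the cluster's REST API */ }
            public String triggerSavepoint(String dir) {
                return dir + "/savepoint-" + id;
            }
        };
    }
}
```

Each layer owns one concern: deployment, cluster control, and job control, which is the separation Till argues for instead of folding everything into the environment.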
On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zjffdu@gmail.com> wrote:
Hi Folks,

I am trying to integrate Flink into Apache Zeppelin, which is an interactive notebook, and I hit several issues caused by the Flink client API. So I'd like to propose the following changes to the Flink client API.

1. Support nonblocking execution. Currently, ExecutionEnvironment#execute is a blocking method which does 2 things: first submit the job and then wait for the job until it is finished. I'd like to introduce a nonblocking execution method like ExecutionEnvironment#submit which only submits the job and then returns the jobId to the client, and allow the user to query the job status via the jobId.

2. Add a cancel API in ExecutionEnvironment/StreamExecutionEnvironment. Currently the only way to cancel a job is via the cli (bin/flink), which is not convenient for downstream projects to use. So I'd like to add a cancel API in ExecutionEnvironment.

3. Add a savepoint API in ExecutionEnvironment/StreamExecutionEnvironment. It is similar to the cancel API; we should use ExecutionEnvironment as the unified API for third parties to integrate with Flink.

4. Add a listener for the job execution lifecycle, something like the following, so that downstream projects can run custom logic in the lifecycle of a job. E.g. Zeppelin would capture the jobId after the job is submitted and then use this jobId to cancel it later when necessary.

public interface JobListener {

    void onJobSubmitted(JobID jobId);

    void onJobExecuted(JobExecutionResult jobResult);

    void onJobCanceled(JobID jobId);
}
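A sketch of how a downstream project like Zeppelin could use such a listener to capture the jobId. Only the JobListener interface itself comes from the proposal; the registration method, the driver class, and the JobID/JobExecutionResult stand-ins (replacing Flink's real classes, to keep the sketch self-contained) are assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Stand-ins for Flink's JobID / JobExecutionResult, so the sketch compiles on its own.
class JobID {
    final String value = UUID.randomUUID().toString();
}

class JobExecutionResult {
    final JobID jobId;
    JobExecutionResult(JobID jobId) { this.jobId = jobId; }
}

// The listener interface from the proposal.
interface JobListener {
    void onJobSubmitted(JobID jobId);
    void onJobExecuted(JobExecutionResult jobResult);
    void onJobCanceled(JobID jobId);
}

// Hypothetical environment that notifies registered listeners around submission.
class ListeningEnvironment {
    private final List<JobListener> listeners = new ArrayList<>();

    void registerJobListener(JobListener listener) {
        listeners.add(listener);
    }

    JobID submit() {
        JobID jobId = new JobID();
        listeners.forEach(l -> l.onJobSubmitted(jobId));
        return jobId;
    }

    void cancel(JobID jobId) {
        listeners.forEach(l -> l.onJobCanceled(jobId));
    }
}
```

A notebook would register a listener, stash the jobId it receives in onJobSubmitted, and use that handle to cancel or monitor the job later.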
5. Enable sessions in ExecutionEnvironment. Currently this is disabled, but sessions are very convenient for third parties that submit jobs continually. I hope Flink can enable it again.

6. Unify all Flink client APIs into ExecutionEnvironment/StreamExecutionEnvironment.

This is a long-term issue which needs more careful thinking and design. Currently some features of Flink are exposed in ExecutionEnvironment/StreamExecutionEnvironment, but some are exposed in the cli instead of the API, like the cancel and savepoint operations I mentioned above. I think the root cause is that Flink didn't unify the interaction with Flink. Here I list 3 scenarios of Flink operation:

- Local job execution. Flink will create a LocalEnvironment and then use this LocalEnvironment to create a LocalExecutor for job execution.
- Remote job execution. Flink will create a ClusterClient first, then create a ContextEnvironment based on the ClusterClient, and then run the job.
- Job cancelation. Flink will create a ClusterClient first and then cancel the job via this ClusterClient.

As you can see, in the above 3 scenarios Flink didn't use the same approach (code path) to interact with Flink. What I propose is the following:

Create the proper LocalEnvironment/RemoteEnvironment (based on user configuration) --> Use this Environment to create the proper ClusterClient (LocalClusterClient or RestClusterClient) to interact with Flink (job execution or cancelation).
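The proposed unified flow — configuration picks the environment, the environment picks the client — can be sketched as follows. LocalClusterClient and RestClusterClient are real Flink class names, but this dispatch logic is an illustration of the proposal, not Flink's actual code path:

```java
// Common client abstraction both execution paths would share (illustrative).
interface Client {
    String describe();
}

// Stand-ins for Flink's LocalClusterClient / RestClusterClient.
class LocalClusterClientSketch implements Client {
    public String describe() { return "local"; }
}

class RestClusterClientSketch implements Client {
    public String describe() { return "rest"; }
}

// Single entry point: the configuration decides which client is built,
// so the cli, scala-shell, sql-client and notebooks all share one code path.
class UnifiedEnvironment {
    private final Client client;

    UnifiedEnvironment(String mode) {
        this.client = "local".equals(mode)
                ? new LocalClusterClientSketch()
                : new RestClusterClientSketch();
    }

    Client getClient() {
        return client;
    }
}
```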
This way we can unify the process of local execution and remote execution. And it is much easier for third parties to integrate with Flink, because ExecutionEnvironment is the unified entry point for Flink. All a third party needs to do is pass configuration to the ExecutionEnvironment, and the ExecutionEnvironment will do the right thing based on that configuration. The Flink cli can also be considered a Flink API consumer: it just passes the configuration to the ExecutionEnvironment and lets the ExecutionEnvironment create the proper ClusterClient, instead of the cli creating the ClusterClient directly.

6 would involve large code refactoring, so I think we can defer it to a future release; 1, 2, 3, 4, 5 could be done at once, I believe. Let me know your comments and feedback, thanks.

--
Best Regards

Jeff Zhang

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Zili Chen <wa...@gmail.com>.
s/ASK/ASF/

Best,
tison.


Zili Chen <wa...@gmail.com> 于2019年9月11日周三 下午5:53写道:

> FYI I've created a public channel #flink-client-api-enhancement under ASK
> slack.
>
> You can reach it after joining ASK Slack and searching for the channel.
>
> Best,
> tison.
>
>
> Kostas Kloudas <kk...@gmail.com> 于2019年9月11日周三 下午5:09写道:
>
>> +1 for creating a channel.
>>
>> Kostas
>>
>> On Wed, Sep 11, 2019 at 10:57 AM Zili Chen <wa...@gmail.com> wrote:
>> >
>> > Hi Aljoscha,
>> >
>> > I'm OK to use the ASF slack.
>> >
>> > Best,
>> > tison.
>> >
>> >
>> > Jeff Zhang <zj...@gmail.com> 于2019年9月11日周三 下午4:48写道:
>> >>
>> >> +1 for using slack for instant communication
>> >>
>> >> Aljoscha Krettek <al...@apache.org> 于2019年9月11日周三 下午4:44写道:
>> >>>
>> >>> Hi,
>> >>>
>> >>> We could try using the ASF Slack for this purpose; that would
>> probably be easiest. See https://s.apache.org/slack-invite. We could
>> create a dedicated channel for our work and would still be using the open
>> ASF infrastructure, and people can have a look if they are interested,
>> because the discussion would be public. What do you think?
>> >>>
>> >>> P.S. Committers/PMC members should be able to log in with their Apache
>> ID.
>> >>>
>> >>> Best,
>> >>> Aljoscha
>> >>>
>> >>> > On 6. Sep 2019, at 14:24, Zili Chen <wa...@gmail.com> wrote:
>> >>> >
>> >>> > Hi Aljoscha,
>> >>> >
>> >>> > I'd like to gather all the ideas here and in the documents, and
>> draft a
>> >>> > formal FLIP
>> >>> > that keeps us on the same page. Hopefully I will start a FLIP thread
>> next week.
>> >>> >
>> >>> > For the implementation, i.e. the POC part, I'd like to work with you
>> guys who
>> >>> > proposed
>> >>> > the Executor concept to make sure that we go in the same direction.
>> I'm
>> >>> > wondering
>> >>> > whether a dedicated thread or a Slack group is the proper venue. In my
>> opinion
>> >>> > we can
>> >>> > involve the team in a Slack group and, concurrently with the FLIP
>> process,
>> >>> > start our branch;
>> >>> > once we reach a consensus on the FLIP, we open an umbrella issue
>> about the
>> >>> > framework
>> >>> > and start subtasks. What do you think?
>> >>> >
>> >>> > Best,
>> >>> > tison.
>> >>> >
>> >>> >
>> >>> > Aljoscha Krettek <al...@apache.org> 于2019年9月5日周四 下午9:39写道:
>> >>> >
>> >>> >> Hi Tison,
>> >>> >>
>> >>> >> To keep this moving forward, maybe you want to start working on a
>> proof of
>> >>> >> concept implementation for the new JobClient interface, perhaps with
>> a new
>> >>> >> method executeAsync() in the environment that returns the
>> JobClient, and
>> >>> >> implement the methods to see how that works and to see where we
>> get. Would
>> >>> >> you be interested in that?
>> >>> >>
>> >>> >> Also, at some point we should collect all the ideas and start
>> forming an
>> >>> >> actual FLIP.
>> >>> >>
>> >>> >> Best,
>> >>> >> Aljoscha
>> >>> >>
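[Editor's note: a minimal, hypothetical sketch of the executeAsync()/JobClient idea discussed above. All names (SketchEnvironment, getJobStatus, etc.) are illustrative stand-ins, not Flink's actual API.]

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch only: mirrors the discussion, not real Flink code.
interface JobClient {
    String getJobId();
    CompletableFuture<String> getJobStatus();
    CompletableFuture<Void> cancel();
}

class SketchEnvironment {
    // executeAsync() returns immediately with a handle to the running job,
    // instead of blocking until the job finishes as execute() does.
    JobClient executeAsync(String jobName) {
        return new JobClient() {
            public String getJobId() { return jobName + "-0001"; }
            public CompletableFuture<String> getJobStatus() {
                return CompletableFuture.completedFuture("RUNNING");
            }
            public CompletableFuture<Void> cancel() {
                return CompletableFuture.completedFuture(null);
            }
        };
    }
}

public class ExecuteAsyncSketch {
    public static void main(String[] args) {
        JobClient client = new SketchEnvironment().executeAsync("wordcount");
        System.out.println(client.getJobId());            // wordcount-0001
        System.out.println(client.getJobStatus().join()); // RUNNING
        client.cancel().join();                           // completes at once in this stub
    }
}
```

The point of the sketch is the shape of the handle: submission returns a client immediately, and waiting, status polling, and cancellation become explicit operations on it.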
>> >>> >>> On 4. Sep 2019, at 12:04, Zili Chen <wa...@gmail.com> wrote:
>> >>> >>>
>> >>> >>> Thanks for your update Kostas!
>> >>> >>>
>> >>> >>> It looks good to me that clean up existing code paths as first
>> >>> >>> pass. I'd like to help on review and file subtasks if I find ones.
>> >>> >>>
>> >>> >>> Best,
>> >>> >>> tison.
>> >>> >>>
>> >>> >>>
>> >>> >>> Kostas Kloudas <kk...@gmail.com> 于2019年9月4日周三 下午5:52写道:
>> >>> >>> Here is the issue, and I will keep on updating it as I find more
>> issues.
>> >>> >>>
>> >>> >>> https://issues.apache.org/jira/browse/FLINK-13954
>> >>> >>>
>> >>> >>> This will also cover the refactoring of the Executors that we
>> discussed
>> >>> >>> in this thread, without any additional functionality (such as the
>> job
>> >>> >> client).
>> >>> >>>
>> >>> >>> Kostas
>> >>> >>>
>> >>> >>> On Wed, Sep 4, 2019 at 11:46 AM Kostas Kloudas <
>> kkloudas@gmail.com>
>> >>> >> wrote:
>> >>> >>>>
>> >>> >>>> Great idea Tison!
>> >>> >>>>
>> >>> >>>> I will create the umbrella issue and post it here so that we are
>> all
>> >>> >>>> on the same page!
>> >>> >>>>
>> >>> >>>> Cheers,
>> >>> >>>> Kostas
>> >>> >>>>
>> >>> >>>> On Wed, Sep 4, 2019 at 11:36 AM Zili Chen <wa...@gmail.com>
>> >>> >> wrote:
>> >>> >>>>>
>> >>> >>>>> Hi Kostas & Aljoscha,
>> >>> >>>>>
>> >>> >>>>> I notice that there is a JIRA (FLINK-13946) which could be
>> included
>> >>> >>>>> in this refactoring thread. Since we agree on most directions in
>> >>> >>>>> the big picture, is it reasonable to create an umbrella issue
>> for
>> >>> >>>>> refactoring the client APIs, with linked subtasks? It would be a
>> better
>> >>> >>>>> way to join the forces of our community.
>> >>> >>>>>
>> >>> >>>>> Best,
>> >>> >>>>> tison.
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>> Zili Chen <wa...@gmail.com> 于2019年8月31日周六 下午12:52写道:
>> >>> >>>>>>
>> >>> >>>>>> Great Kostas! Looking forward to your POC!
>> >>> >>>>>>
>> >>> >>>>>> Best,
>> >>> >>>>>> tison.
>> >>> >>>>>>
>> >>> >>>>>>
>> >>> >>>>>> Jeff Zhang <zj...@gmail.com> 于2019年8月30日周五 下午11:07写道:
>> >>> >>>>>>>
>> >>> >>>>>>> Awesome, @Kostas Looking forward your POC.
>> >>> >>>>>>>
>> >>> >>>>>>> Kostas Kloudas <kk...@gmail.com> 于2019年8月30日周五 下午8:33写道:
>> >>> >>>>>>>
>> >>> >>>>>>>> Hi all,
>> >>> >>>>>>>>
>> >>> >>>>>>>> I am just writing here to let you know that I am working on a
>> >>> >> POC that
>> >>> >>>>>>>> tries to refactor the current state of job submission in
>> Flink.
>> >>> >>>>>>>> I want to stress that it introduces NO CHANGES to the
>> current
>> >>> >>>>>>>> behaviour of Flink. It just re-arranges things and
>> introduces the
>> >>> >>>>>>>> notion of an Executor, which is the entity responsible for
>> >>> >> taking the
>> >>> >>>>>>>> user-code and submitting it for execution.
>> >>> >>>>>>>>
>> >>> >>>>>>>> Given this, the discussion about the functionality that the
>> >>> >> JobClient
>> >>> >>>>>>>> will expose to the user can go on independently and the same
>> >>> >>>>>>>> holds for all the open questions so far.
>> >>> >>>>>>>>
>> >>> >>>>>>>> I hope I will have some more news to share soon.
>> >>> >>>>>>>>
>> >>> >>>>>>>> Thanks,
>> >>> >>>>>>>> Kostas
>> >>> >>>>>>>>
>> >>> >>>>>>>> On Mon, Aug 26, 2019 at 4:20 AM Yang Wang <
>> danrtsey.wy@gmail.com>
>> >>> >> wrote:
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> Hi Zili,
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> It makes sense to me that, in
>> >>> >> per-job
>> >>> >>>>>>>>> mode, a dedicated cluster is started and will not accept more jobs.
>> >>> >>>>>>>>> Just have a question about the command line.
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> Currently we could use the following commands to start
>> >>> >> different
>> >>> >>>>>>>> clusters.
>> >>> >>>>>>>>> *per-job cluster*
>> >>> >>>>>>>>> ./bin/flink run -d -p 5 -ynm perjob-cluster1 -m yarn-cluster
>> >>> >>>>>>>>> examples/streaming/WindowJoin.jar
>> >>> >>>>>>>>> *session cluster*
>> >>> >>>>>>>>> ./bin/flink run -p 5 -ynm session-cluster1 -m yarn-cluster
>> >>> >>>>>>>>> examples/streaming/WindowJoin.jar
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> What will it look like after client enhancement?
>> >>> >>>>>>>>>
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> Best,
>> >>> >>>>>>>>> Yang
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> Zili Chen <wa...@gmail.com> 于2019年8月23日周五 下午10:46写道:
>> >>> >>>>>>>>>
>> >>> >>>>>>>>>> Hi Till,
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>>> Thanks for your update. Nice to hear :-)
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>>> Best,
>> >>> >>>>>>>>>> tison.
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>>> Till Rohrmann <tr...@apache.org> 于2019年8月23日周五
>> >>> >> 下午10:39写道:
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>>>> Hi Tison,
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>> just a quick comment concerning the class loading issues
>> >>> >> when using
>> >>> >>>>>>>> the
>> >>> >>>>>>>>>> per
>> >>> >>>>>>>>>>> job mode. The community wants to change it so that the
>> >>> >>>>>>>>>>> StandaloneJobClusterEntryPoint actually uses the user code
>> >>> >> class
>> >>> >>>>>>>> loader
>> >>> >>>>>>>>>>> with child first class loading [1]. Hence, I hope that
>> >>> >> this problem
>> >>> >>>>>>>> will
>> >>> >>>>>>>>>> be
>> >>> >>>>>>>>>>> resolved soon.
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-13840
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>> Cheers,
>> >>> >>>>>>>>>>> Till
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>> On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas <
>> >>> >> kkloudas@gmail.com>
>> >>> >>>>>>>>>> wrote:
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>>> Hi all,
>> >>> >>>>>>>>>>>>
>> >>> >>>>>>>>>>>> On the topic of web submission, I agree with Till that
>> >>> >> it only
>> >>> >>>>>>>> seems
>> >>> >>>>>>>>>>>> to complicate things.
>> >>> >>>>>>>>>>>> It is bad for security, job isolation (anybody can
>> >>> >> submit/cancel
>> >>> >>>>>>>> jobs),
>> >>> >>>>>>>>>>>> and its
>> >>> >>>>>>>>>>>> implementation complicates some parts of the code. So,
>> >>> >> if it were
>> >>> >>>>>>>> to
>> >>> >>>>>>>>>>>> redesign the
>> >>> >>>>>>>>>>>> WebUI, maybe this part could be left out. In addition, I
>> >>> >> would say
>> >>> >>>>>>>>>>>> that the ability to cancel
>> >>> >>>>>>>>>>>> jobs could also be left out.
>> >>> >>>>>>>>>>>>
>> >>> >>>>>>>>>>>> I would also be in favour of removing the
>> >>> >> "detached" mode, for
>> >>> >>>>>>>>>>>> the reasons mentioned
>> >>> >>>>>>>>>>>> above (i.e. because now we will have a future
>> >>> >> representing the
>> >>> >>>>>>>> result
>> >>> >>>>>>>>>>>> on which the user
>> >>> >>>>>>>>>>>> can choose to wait or not).
>> >>> >>>>>>>>>>>>
>> >>> >>>>>>>>>>>> Now, as for separating job submission and cluster
>> >>> >> creation, I am in
>> >>> >>>>>>>>>>>> favour of keeping both.
>> >>> >>>>>>>>>>>> Once again, the reasons are mentioned above by Stephan,
>> >>> >> Till,
>> >>> >>>>>>>> Aljoscha
>> >>> >>>>>>>>>>>> and also Zili seems
>> >>> >>>>>>>>>>>> to agree. They mainly have to do with security,
>> >>> >> isolation and ease
>> >>> >>>>>>>> of
>> >>> >>>>>>>>>>>> resource management
>> >>> >>>>>>>>>>>> for the user as he knows that "when my job is done,
>> >>> >> everything
>> >>> >>>>>>>> will be
>> >>> >>>>>>>>>>>> cleared up". This is
>> >>> >>>>>>>>>>>> also the experience you get when launching a process on
>> >>> >> your local
>> >>> >>>>>>>> OS.
>> >>> >>>>>>>>>>>>
>> >>> >>>>>>>>>>>> On excluding the per-job mode from returning a JobClient
>> >>> >> or not, I
>> >>> >>>>>>>>>>>> believe that eventually
>> >>> >>>>>>>>>>>> it would be nice to allow users to get back a jobClient.
>> >>> >> The
>> >>> >>>>>>>> reason is
>> >>> >>>>>>>>>>>> that 1) I cannot
>> >>> >>>>>>>>>>>> find any objective reason why the user-experience should
>> >>> >> diverge,
>> >>> >>>>>>>> and
>> >>> >>>>>>>>>>>> 2) this will be the
>> >>> >>>>>>>>>>>> way that the user will be able to interact with his
>> >>> >> running job.
>> >>> >>>>>>>>>>>> Assuming that the necessary
>> >>> >>>>>>>>>>>> ports are open for the REST API to work, then I think
>> >>> >> that the
>> >>> >>>>>>>>>>>> JobClient can run against the
>> >>> >>>>>>>>>>>> REST API without problems. If the needed ports are not
>> >>> >> open, then
>> >>> >>>>>>>> we
>> >>> >>>>>>>>>>>> are safe to not return
>> >>> >>>>>>>>>>>> a JobClient, as the user explicitly chose to close all
>> >>> >> points of
>> >>> >>>>>>>>>>>> communication to his running job.
>> >>> >>>>>>>>>>>>
>> >>> >>>>>>>>>>>> On the topic of not hijacking the "env.execute()" in
>> >>> >> order to get
>> >>> >>>>>>>> the
>> >>> >>>>>>>>>>>> Plan, I definitely agree but
>> >>> >>>>>>>>>>>> for the proposal of having a "compile()" method in the
>> >>> >> env, I would
>> >>> >>>>>>>>>>>> like to have a better look at
>> >>> >>>>>>>>>>>> the existing code.
>> >>> >>>>>>>>>>>>
>> >>> >>>>>>>>>>>> Cheers,
>> >>> >>>>>>>>>>>> Kostas
>> >>> >>>>>>>>>>>>
>> >>> >>>>>>>>>>>> On Fri, Aug 23, 2019 at 5:52 AM Zili Chen <
>> >>> >> wander4096@gmail.com>
>> >>> >>>>>>>>>> wrote:
>> >>> >>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>> Hi Yang,
>> >>> >>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>> It would be helpful if you check Stephan's last
>> >>> >> comment,
>> >>> >>>>>>>>>>>>> which states that isolation is important.
>> >>> >>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>> For per-job mode, we run a dedicated cluster (maybe it
>> >>> >>>>>>>>>>>>> should have been a couple of JMs and TMs during the FLIP-6
>> >>> >>>>>>>>>>>>> design) for a specific job. Thus the process is
>> >>> >>>>>>>>>>>>> isolated
>> >>> >>>>>>>>>>>>> from other jobs.
>> >>> >>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>> In our case there was a time when we suffered from multiple
>> >>> >>>>>>>>>>>>> jobs submitted by different users affecting
>> >>> >>>>>>>>>>>>> each other, so that all of them ran into an error state. Also,
>> >>> >>>>>>>>>>>>> running the client inside the cluster can save client
>> >>> >>>>>>>>>>>>> resources at some points.
>> >>> >>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>> However, as you mentioned, we also face several issues:
>> >>> >>>>>>>>>>>>> in per-job mode the parent classloader is always used,
>> >>> >>>>>>>>>>>>> thus classloading issues occur.
>> >>> >>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>> BTW, one can draw an analogy between session/per-job
>> >>> >>>>>>>>>>>>> mode
>> >>> >>>>>>>>>>>>> in Flink, and client/cluster mode in Spark.
>> >>> >>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>> Best,
>> >>> >>>>>>>>>>>>> tison.
>> >>> >>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>> Yang Wang <da...@gmail.com> 于2019年8月22日周四
>> >>> >> 上午11:25写道:
>> >>> >>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>> From the user's perspective, it is really confusing what
>> >>> >> the
>> >>> >>>>>>>> scope
>> >>> >>>>>>>>>> of a
>> >>> >>>>>>>>>>>>>> per-job cluster is.
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>> If it means a flink cluster with single job, so that
>> >>> >> we could
>> >>> >>>>>>>> get
>> >>> >>>>>>>>>>>> better
>> >>> >>>>>>>>>>>>>> isolation.
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>> Now it does not matter how we deploy the cluster,
>> >>> >> directly
>> >>> >>>>>>>>>>>> deploy(mode1)
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>> or start a flink cluster and then submit job through
>> >>> >> cluster
>> >>> >>>>>>>>>>>> client(mode2).
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>> Otherwise, if it just means directly deploy, how
>> >>> >> should we
>> >>> >>>>>>>> name the
>> >>> >>>>>>>>>>>> mode2,
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>> session with job or something else?
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>> We could also benefit from the mode2. Users could
>> >>> >> get the same
>> >>> >>>>>>>>>>>> isolation
>> >>> >>>>>>>>>>>>>> with mode1.
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>> The user code and dependencies will be loaded by
>> >>> >> the user class
>> >>> >>>>>>>> loader
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>> to avoid class conflicts with the framework.
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>> Anyway, both of the two submission modes are useful.
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>> We just need to clarify the concepts.
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>> Best,
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>> Yang
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>> Zili Chen <wa...@gmail.com> 于2019年8月20日周二
>> >>> >> 下午5:58写道:
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>> Thanks for the clarification.
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>> The idea of a JobDeployer came to my mind when I
>> >>> >> was
>> >>> >>>>>>>> puzzling
>> >>> >>>>>>>>>> over
>> >>> >>>>>>>>>>>>>>> how to execute per-job mode and session mode with
>> >>> >> the same
>> >>> >>>>>>>> user
>> >>> >>>>>>>>>>> code
>> >>> >>>>>>>>>>>>>>> and framework code path.
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>> With the JobDeployer concept we come back to the
>> >>> >> statement that the
>> >>> >>>>>>>>>>>> environment
>> >>> >>>>>>>>>>>>>>> knows every config for cluster deployment and job
>> >>> >>>>>>>> submission. We
>> >>> >>>>>>>>>>>>>>> configure, or generate from the configuration, a specific
>> >>> >>>>>>>> JobDeployer
>> >>> >>>>>>>>>> in
>> >>> >>>>>>>>>>>>>>> the environment, and then the code aligns on
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>> *JobClient client = env.execute().get();*
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>> which in session mode returned by
>> >>> >> clusterClient.submitJob
>> >>> >>>>>>>> and in
>> >>> >>>>>>>>>>>> per-job
>> >>> >>>>>>>>>>>>>>> mode returned by
>> >>> >> clusterDescriptor.deployJobCluster.
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>> Here comes a problem: currently we directly run the
>> >>> >>>>>>>>>>> ClusterEntrypoint
>> >>> >>>>>>>>>>>>>>> with the extracted job graph. Following the JobDeployer
>> >>> >> way, we'd
>> >>> >>>>>>>> better
>> >>> >>>>>>>>>>>>>>> align the entry point of per-job deployment at the
>> >>> >> JobDeployer.
>> >>> >>>>>>>> Users run
>> >>> >>>>>>>>>>>>>>> their main method, or use a CLI (which finally calls the main
>> >>> >> method), to
>> >>> >>>>>>>> deploy
>> >>> >>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>> job cluster.
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>> Best,
>> >>> >>>>>>>>>>>>>>> tison.
>> >>> >>>>>>>>>>>>>>>
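[Editor's note: a toy sketch of the JobDeployer idea discussed above. The types here stand in for Flink's real classes and all names are illustrative; the point is the single user-facing code path across session and per-job mode.]

```java
import java.util.concurrent.CompletableFuture;

// Toy stand-ins for Flink's types; names are illustrative only.
class JobGraph {
    final String name;
    JobGraph(String name) { this.name = name; }
}

interface JobClient {
    String describe();
}

// The JobDeployer idea: the environment picks a deployer from its
// configuration, and user code never branches on the mode.
interface JobDeployer {
    CompletableFuture<JobClient> deploy(JobGraph job);
}

class SessionDeployer implements JobDeployer {
    // Session mode: would correspond to clusterClient.submitJob(jobGraph).
    public CompletableFuture<JobClient> deploy(JobGraph g) {
        return CompletableFuture.completedFuture(
                () -> "submitted " + g.name + " to an existing session cluster");
    }
}

class PerJobDeployer implements JobDeployer {
    // Per-job mode: would correspond to clusterDescriptor.deployJobCluster(jobGraph).
    public CompletableFuture<JobClient> deploy(JobGraph g) {
        return CompletableFuture.completedFuture(
                () -> "spawned a dedicated cluster for " + g.name);
    }
}

public class JobDeployerSketch {
    public static void main(String[] args) {
        JobGraph graph = new JobGraph("wordcount");
        // The same line works for either deployer -- mirroring
        // "JobClient client = env.execute().get();" from the discussion.
        for (JobDeployer d : new JobDeployer[] {new SessionDeployer(), new PerJobDeployer()}) {
            JobClient client = d.deploy(graph).join();
            System.out.println(client.describe());
        }
    }
}
```

Under this sketch, choosing session vs. per-job mode is purely a configuration concern of the environment; user programs are written once against the deploy/JobClient path.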
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org> 于2019年8月20日周二
>> >>> >> 下午4:40写道:
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>> Till has made some good comments here.
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>> Two things to add:
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>  - The job mode is very nice in the way that it
>> >>> >> runs the
>> >>> >>>>>>>>>> client
>> >>> >>>>>>>>>>>> inside
>> >>> >>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>> cluster (in the same image/process that is the
>> >>> >> JM) and thus
>> >>> >>>>>>>>>>> unifies
>> >>> >>>>>>>>>>>>>> both
>> >>> >>>>>>>>>>>>>>>> applications and what the Spark world calls the
>> >>> >> "driver
>> >>> >>>>>>>> mode".
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>  - Another thing I would add is that during the
>> >>> >> FLIP-6
>> >>> >>>>>>>> design,
>> >>> >>>>>>>>>>> we
>> >>> >>>>>>>>>>>> were
>> >>> >>>>>>>>>>>>>>>> thinking about setups where Dispatcher and
>> >>> >> JobManager are
>> >>> >>>>>>>>>>> separate
>> >>> >>>>>>>>>>>>>>>> processes.
>> >>> >>>>>>>>>>>>>>>>    A Yarn or Mesos Dispatcher of a session
>> >>> >> could run
>> >>> >>>>>>>>>>> independently
>> >>> >>>>>>>>>>>>>> (even
>> >>> >>>>>>>>>>>>>>>> as privileged processes executing no code).
>> >>> >>>>>>>>>>>>>>>>    Then the "per-job" mode could still be
>> >>> >> helpful:
>> >>> >>>>>>>> when a
>> >>> >>>>>>>>>>> job
>> >>> >>>>>>>>>>>> is
>> >>> >>>>>>>>>>>>>>>> submitted to the dispatcher, it launches the JM
>> >>> >> again in a
>> >>> >>>>>>>>>>> per-job
>> >>> >>>>>>>>>>>>>> mode,
>> >>> >>>>>>>>>>>>>>> so
>> >>> >>>>>>>>>>>>>>>> that JM and TM processes are bound to the job
>> >>> >> only. For
>> >>> >>>>>>>> higher
>> >>> >>>>>>>>>>>> security
>> >>> >>>>>>>>>>>>>>>> setups, it is important that processes are not
>> >>> >> reused
>> >>> >>>>>>>> across
>> >>> >>>>>>>>>>> jobs.
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <
>> >>> >>>>>>>>>>>> trohrmann@apache.org>
>> >>> >>>>>>>>>>>>>>>> wrote:
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>> I would not be in favour of getting rid of the
>> >>> >> per-job
>> >>> >>>>>>>> mode
>> >>> >>>>>>>>>>>> since it
>> >>> >>>>>>>>>>>>>>>>> simplifies the process of running Flink jobs
>> >>> >>>>>>>> considerably.
>> >>> >>>>>>>>>>>> Moreover,
>> >>> >>>>>>>>>>>>>> it
>> >>> >>>>>>>>>>>>>>>> is
>> >>> >>>>>>>>>>>>>>>>> not only well suited for container deployments
>> >>> >> but also
>> >>> >>>>>>>> for
>> >>> >>>>>>>>>>>>>> deployments
>> >>> >>>>>>>>>>>>>>>>> where you want to guarantee job isolation. For
>> >>> >> example, a
>> >>> >>>>>>>>>> user
>> >>> >>>>>>>>>>>> could
>> >>> >>>>>>>>>>>>>>> use
>> >>> >>>>>>>>>>>>>>>>> the per-job mode on Yarn to execute his job on
>> >>> >> a separate
>> >>> >>>>>>>>>>>> cluster.
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>> I think that having two notions of cluster
>> >>> >> deployments
>> >>> >>>>>>>>>> (session
>> >>> >>>>>>>>>>>> vs.
>> >>> >>>>>>>>>>>>>>>> per-job
>> >>> >>>>>>>>>>>>>>>>> mode) does not necessarily contradict your
>> >>> >> ideas for the
>> >>> >>>>>>>>>> client
>> >>> >>>>>>>>>>>> api
>> >>> >>>>>>>>>>>>>>>>> refactoring. For example one could have the
>> >>> >> following
>> >>> >>>>>>>>>>> interfaces:
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>> - ClusterDeploymentDescriptor: encapsulates
>> >>> >> the logic
>> >>> >>>>>>>> how to
>> >>> >>>>>>>>>>>> deploy a
>> >>> >>>>>>>>>>>>>>>>> cluster.
>> >>> >>>>>>>>>>>>>>>>> - ClusterClient: allows to interact with a
>> >>> >> cluster
>> >>> >>>>>>>>>>>>>>>>> - JobClient: allows to interact with a running
>> >>> >> job
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>> Now the ClusterDeploymentDescriptor could have
>> >>> >> two
>> >>> >>>>>>>> methods:
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>> - ClusterClient deploySessionCluster()
>> >>> >>>>>>>>>>>>>>>>> - JobClusterClient/JobClient
>> >>> >>>>>>>> deployPerJobCluster(JobGraph)
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>> where JobClusterClient is either a supertype of
>> >>> >>>>>>>> ClusterClient
>> >>> >>>>>>>>>>>> which
>> >>> >>>>>>>>>>>>>>> does
>> >>> >>>>>>>>>>>>>>>>> not give you the functionality to submit jobs,
>> >>> >>>>>>>>>>>>>>>>> or
>> >>> >>>>>>>>>>>> deployPerJobCluster
>> >>> >>>>>>>>>>>>>>>>> directly returns a JobClient.
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>> When setting up the ExecutionEnvironment, one
>> >>> >> would then
>> >>> >>>>>>>> not
>> >>> >>>>>>>>>>>> provide
>> >>> >>>>>>>>>>>>>> a
>> >>> >>>>>>>>>>>>>>>>> ClusterClient to submit jobs but a JobDeployer
>> >>> >> which,
>> >>> >>>>>>>>>> depending
>> >>> >>>>>>>>>>>> on
>> >>> >>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>> selected mode, either uses a ClusterClient
>> >>> >> (session
>> >>> >>>>>>>> mode) to
>> >>> >>>>>>>>>>>> submit
>> >>> >>>>>>>>>>>>>>> jobs
>> >>> >>>>>>>>>>>>>>>> or
>> >>> >>>>>>>>>>>>>>>>> a ClusterDeploymentDescriptor to deploy per a
>> >>> >> job mode
>> >>> >>>>>>>>>> cluster
>> >>> >>>>>>>>>>>> with
>> >>> >>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>> job
>> >>> >>>>>>>>>>>>>>>>> to execute.
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>> These are just some thoughts on how one could
>> >>> >> make it
>> >>> >>>>>>>> work,
>> >>> >>>>>>>>>>>> because I
>> >>> >>>>>>>>>>>>>>>>> believe there is some value in using the per-job
>> >>> >> mode
>> >>> >>>>>>>> from
>> >>> >>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>> ExecutionEnvironment.
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>> Concerning the web submission, this is indeed
>> >>> >> a bit
>> >>> >>>>>>>> tricky.
>> >>> >>>>>>>>>>> From
>> >>> >>>>>>>>>>>> a
>> >>> >>>>>>>>>>>>>>>> cluster
>> >>> >>>>>>>>>>>>>>>>> management standpoint, I would be in favour of
>> >>> >> not
>> >>> >>>>>>>> executing
>> >>> >>>>>>>>>> user
>> >>> >>>>>>>>>>>> code
>> >>> >>>>>>>>>>>>>> on
>> >>> >>>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>> REST endpoint. Especially when considering
>> >>> >> security, it
>> >>> >>>>>>>> would
>> >>> >>>>>>>>>>> be
>> >>> >>>>>>>>>>>> good
>> >>> >>>>>>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>>>>> have a well defined cluster behaviour where it
>> >>> >> is
>> >>> >>>>>>>> explicitly
>> >>> >>>>>>>>>>>> stated
>> >>> >>>>>>>>>>>>>>> where
>> >>> >>>>>>>>>>>>>>>>> user code and, thus, potentially risky code is
>> >>> >> executed.
>> >>> >>>>>>>>>>> Ideally
>> >>> >>>>>>>>>>>> we
>> >>> >>>>>>>>>>>>>>> limit
>> >>> >>>>>>>>>>>>>>>>> it to the TaskExecutor and JobMaster.
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>> Cheers,
>> >>> >>>>>>>>>>>>>>>>> Till
>> >>> >>>>>>>>>>>>>>>>>
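[Editor's note: Till's three-interface split above can be sketched as follows. The interface and method names come from the email; the toy implementation and everything else is an illustrative assumption, not Flink code.]

```java
import java.util.concurrent.CompletableFuture;

// Illustrative stand-in for Flink's JobGraph.
class JobGraph {
    final String name;
    JobGraph(String name) { this.name = name; }
}

interface JobClient {
    CompletableFuture<String> requestJobStatus();
}

interface ClusterClient {
    // Session mode: the cluster outlives individual jobs.
    JobClient submitJob(JobGraph job);
}

interface ClusterDeploymentDescriptor {
    ClusterClient deploySessionCluster();
    // Per-job mode: a dedicated cluster bound to exactly one job.
    JobClient deployPerJobCluster(JobGraph job);
}

// A toy in-memory descriptor to show how the pieces fit together.
public class DeploymentSketch implements ClusterDeploymentDescriptor {
    private JobClient clientFor(JobGraph job) {
        return () -> CompletableFuture.completedFuture(job.name + ": RUNNING");
    }

    public ClusterClient deploySessionCluster() {
        return this::clientFor;
    }

    public JobClient deployPerJobCluster(JobGraph job) {
        return clientFor(job);
    }

    public static void main(String[] args) {
        ClusterDeploymentDescriptor cdd = new DeploymentSketch();
        // Session mode: deploy once, submit many jobs.
        JobClient a = cdd.deploySessionCluster().submitJob(new JobGraph("job-a"));
        // Per-job mode: deployment and submission collapse into one call.
        JobClient b = cdd.deployPerJobCluster(new JobGraph("job-b"));
        System.out.println(a.requestJobStatus().join());  // job-a: RUNNING
        System.out.println(b.requestJobStatus().join());  // job-b: RUNNING
    }
}
```

Either way, both modes hand the user the same JobClient handle, which is the property the thread converges on.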
>> >>> >>>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 9:40 AM Flavio
>> >>> >> Pompermaier <
>> >>> >>>>>>>>>>>>>>> pompermaier@okkam.it
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>> wrote:
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>> In my opinion the client should not use any
>> >>> >>>>>>>> environment to
>> >>> >>>>>>>>>>> get
>> >>> >>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>> Job
>> >>> >>>>>>>>>>>>>>>>>> graph because the jar should reside ONLY on
>> >>> >> the cluster
>> >>> >>>>>>>>>> (and
>> >>> >>>>>>>>>>>> not in
>> >>> >>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>>> client classpath, otherwise there are always
>> >>> >>>>>>>> inconsistencies
>> >>> >>>>>>>>>>>> between
>> >>> >>>>>>>>>>>>>>>>> client
>> >>> >>>>>>>>>>>>>>>>>> and Flink Job manager's classpath).
>> >>> >>>>>>>>>>>>>>>>>> In the YARN, Mesos and Kubernetes scenarios
>> >>> >> you have
>> >>> >>>>>>>> the
>> >>> >>>>>>>>>> jar
>> >>> >>>>>>>>>>>> but
>> >>> >>>>>>>>>>>>>> you
>> >>> >>>>>>>>>>>>>>>>> could
>> >>> >>>>>>>>>>>>>>>>>> start a cluster that has the jar on the Job
>> >>> >> Manager as
>> >>> >>>>>>>> well
>> >>> >>>>>>>>>>>> (but
>> >>> >>>>>>>>>>>>>> this
>> >>> >>>>>>>>>>>>>>>> is
>> >>> >>>>>>>>>>>>>>>>>> the only case where I think you can assume
>> >>> >> that the
>> >>> >>>>>>>> client
>> >>> >>>>>>>>>>> has
>> >>> >>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>> jar
>> >>> >>>>>>>>>>>>>>>> on
>> >>> >>>>>>>>>>>>>>>>>> the classpath; in the REST job submission
>> >>> >> you don't
>> >>> >>>>>>>> have
>> >>> >>>>>>>>>> any
>> >>> >>>>>>>>>>>>>>>> classpath).
>> >>> >>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>> Thus, always in my opinion, the JobGraph
>> >>> >> should be
>> >>> >>>>>>>>>> generated
>> >>> >>>>>>>>>>>> by the
>> >>> >>>>>>>>>>>>>>> Job
>> >>> >>>>>>>>>>>>>>>>>> Manager REST API.
>> >>> >>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <
>> >>> >>>>>>>>>>>> wander4096@gmail.com>
>> >>> >>>>>>>>>>>>>>>> wrote:
>> >>> >>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>> I would like to involve Till & Stephan here
>> >>> >> to clarify
>> >>> >>>>>>>>>> some
>> >>> >>>>>>>>>>>>>> concept
>> >>> >>>>>>>>>>>>>>> of
>> >>> >>>>>>>>>>>>>>>>>>> per-job mode.
>> >>> >>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>> The term per-job is one of the modes a cluster
>> >>> >> could run
>> >>> >>>>>>>> in.
>> >>> >>>>>>>>>>>>>>>>>>> It is mainly aimed at spawning
>> >>> >>>>>>>>>>>>>>>>>>> a dedicated cluster for a specific job,
>> >>> >> while the job
>> >>> >>>>>>>>>>>>>>>>>>> could be packaged with Flink
>> >>> >>>>>>>>>>>>>>>>>>> itself; thus the cluster is initialized
>> >>> >> with the job, so
>> >>> >>>>>>>>>>>>>>>>>>> that we get rid of a
>> >>> >>>>>>>>>>>>>>>>>>> separate
>> >>> >>>>>>>>>>>>>>>>>>> submission step.
>> >>> >>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>> This is useful for container deployments
>> >>> >> where one
>> >>> >>>>>>>> create
>> >>> >>>>>>>>>>> his
>> >>> >>>>>>>>>>>>>> image
>> >>> >>>>>>>>>>>>>>>> with
>> >>> >>>>>>>>>>>>>>>>>>> the job
>> >>> >>>>>>>>>>>>>>>>>>> and then simply deploy the container.
>> >>> >>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>> However, it is out of the client scope, since a
>> >>> >>>>>>>>>>>> client (ClusterClient
>> >>> >>>>>>>>>>>>>> for
>> >>> >>>>>>>>>>>>>>>>>>> example) is for
>> >>> >>>>>>>>>>>>>>>>>>> communicating with an existing cluster and
>> >>> >> performing
>> >>> >>>>>>>>>>> actions.
>> >>> >>>>>>>>>>>>>>>> Currently,
>> >>> >>>>>>>>>>>>>>>>>>> in
>> >>> >>>>>>>>>>>>>>>>>>> per-job
>> >>> >>>>>>>>>>>>>>>>>>> mode, we extract the job graph and bundle
>> >>> >> it into the
>> >>> >>>>>>>> cluster
>> >>> >>>>>>>>>>>>>> deployment,
>> >>> >>>>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>> thus no
>> >>> >>>>>>>>>>>>>>>>>>> concept of a client gets involved. It looks
>> >>> >> reasonable
>> >>> >>>>>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>>>>>>> exclude
>> >>> >>>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>>>> deployment
>> >>> >>>>>>>>>>>>>>>>>>> of a per-job cluster from the client API and use
>> >>> >> dedicated
>> >>> >>>>>>>>>> utility
>> >>> >>>>>>>>>>>>>>>>>>> classes (deployers) for
>> >>> >>>>>>>>>>>>>>>>>>> deployment.
>> >>> >>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>> Zili Chen <wa...@gmail.com>
>> >>> >> 于2019年8月20日周二
>> >>> >>>>>>>> 下午12:37写道:
>> >>> >>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>> Hi Aljoscha,
>> >>> >>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>> Thanks for your reply and participation.
>> >>> >> The Google
>> >>> >>>>>>>> Doc
>> >>> >>>>>>>>>> you
>> >>> >>>>>>>>>>>>>> linked
>> >>> >>>>>>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>>>>>>>> requires
>> >>> >>>>>>>>>>>>>>>>>>>> permission and I think you could use a
>> >>> >> share link
>> >>> >>>>>>>>>> instead.
>> >>> >>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>> I agree with that we almost reach a
>> >>> >> consensus that
>> >>> >>>>>>>>>>>> JobClient is
>> >>> >>>>>>>>>>>>>>>>>>> necessary
>> >>> >>>>>>>>>>>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>>>>>>>> interact with a running Job.
>> >>> >>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>> Let me check your open questions one by
>> >>> >> one.
>> >>> >>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>> 1. Separate cluster creation and job
>> >>> >> submission for
>> >>> >>>>>>>>>>> per-job
>> >>> >>>>>>>>>>>>>> mode.
>> >>> >>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>> As you mentioned here is where the
>> >>> >> opinions
>> >>> >>>>>>>> diverge. In
>> >>> >>>>>>>>>> my
>> >>> >>>>>>>>>>>>>>> document
>> >>> >>>>>>>>>>>>>>>>>>> there
>> >>> >>>>>>>>>>>>>>>>>>>> is
>> >>> >>>>>>>>>>>>>>>>>>>> an alternative[2] that proposes excluding
>> >>> >> per-job
>> >>> >>>>>>>>>>> deployment
>> >>> >>>>>>>>>>>>>> from
>> >>> >>>>>>>>>>>>>>>>> client
>> >>> >>>>>>>>>>>>>>>>>>>> api
>> >>> >>>>>>>>>>>>>>>>>>>> scope and now I find it is more
>> >>> >> reasonable we do the
>> >>> >>>>>>>>>>>> exclusion.
>> >>> >>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>> When in per-job mode, a dedicated
>> >>> >> JobCluster is
>> >>> >>>>>>>> launched
>> >>> >>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>>> execute
>> >>> >>>>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>>>>> specific job. It is like a Flink
>> >>> >> Application more
>> >>> >>>>>>>> than a
>> >>> >>>>>>>>>>>>>>> submission
>> >>> >>>>>>>>>>>>>>>>>>>> of a Flink Job. Client only takes care of
>> >>> >> job
>> >>> >>>>>>>> submission
>> >>> >>>>>>>>>> and
>> >>> >>>>>>>>>>>>>> assumes
>> >>> >>>>>>>>>>>>>>>>> there
>> >>> >>>>>>>>>>>>>>>>>>> is
>> >>> >>>>>>>>>>>>>>>>>>>> an existing cluster. In this way we are
>> >>> >> able to
>> >>> >>>>>>>> consider
>> >>> >>>>>>>>>>>> per-job
>> >>> >>>>>>>>>>>>>>>>> issues
>> >>> >>>>>>>>>>>>>>>>>>>> individually and JobClusterEntrypoint
>> >>> >> would be the
>> >>> >>>>>>>>>> utility
>> >>> >>>>>>>>>>>> class
>> >>> >>>>>>>>>>>>>>> for
>> >>> >>>>>>>>>>>>>>>>>>>> per-job
>> >>> >>>>>>>>>>>>>>>>>>>> deployment.
>> >>> >>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>> Nevertheless, user program works in both
>> >>> >> session
>> >>> >>>>>>>> mode
>> >>> >>>>>>>>>> and
>> >>> >>>>>>>>>>>>>> per-job
>> >>> >>>>>>>>>>>>>>>> mode
>> >>> >>>>>>>>>>>>>>>>>>>> without
>> >>> >>>>>>>>>>>>>>>>>>>> necessary to change code. JobClient in
>> >>> >> per-job mode
>> >>> >>>>>>>> is
>> >>> >>>>>>>>>>>> returned
>> >>> >>>>>>>>>>>>>>> from
>> >>> >>>>>>>>>>>>>>>>>>>> env.execute as normal. However, it would
>> >>> >> be no
>> >>> >>>>>>>> longer a
>> >>> >>>>>>>>>>>> wrapper
>> >>> >>>>>>>>>>>>>> of
>> >>> >>>>>>>>>>>>>>>>>>>> RestClusterClient but a wrapper of
>> >>> >>>>>>>> PerJobClusterClient
>> >>> >>>>>>>>>>> which
>> >>> >>>>>>>>>>>>>>>>>>> communicates
>> >>> >>>>>>>>>>>>>>>>>>>> to Dispatcher locally.
>> >>> >>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>> 2. How to deal with plan preview.
>> >>> >>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>> With env.compile functions users can get
>> >>> >> JobGraph or
>> >>> >>>>>>>>>>>> FlinkPlan
>> >>> >>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>> thus
>> >>> >>>>>>>>>>>>>>>>>>>> they can preview the plan with
>> >>> >> programming.
>> >>> >>>>>>>> Typically it
>> >>> >>>>>>>>>>>> looks
>> >>> >>>>>>>>>>>>>>> like
>> >>> >>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>> if (preview configured) {
>> >>> >>>>>>>>>>>>>>>>>>>>    FlinkPlan plan = env.compile();
>> >>> >>>>>>>>>>>>>>>>>>>>    new JSONDumpGenerator(...).dump(plan);
>> >>> >>>>>>>>>>>>>>>>>>>> } else {
>> >>> >>>>>>>>>>>>>>>>>>>>    env.execute();
>> >>> >>>>>>>>>>>>>>>>>>>> }
>> >>> >>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>> And `flink info` would no longer be
>> >>> >> valid.
>> >>> >>>>>>>>>>>>>>>>>>>>
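[The pseudocode above can be turned into a small compilable sketch. `SketchEnvironment`, `FlinkPlan`, and `PlanPreview` here are hypothetical stand-ins for the proposed `env.compile()` API, not Flink's real classes:]

```java
// Hypothetical stand-in for the FlinkPlan the email mentions.
interface FlinkPlan {
    String describe();
}

// Hypothetical stand-in for an ExecutionEnvironment with the proposed
// compile() method alongside the usual execute().
class SketchEnvironment {
    private final String jobName;

    SketchEnvironment(String jobName) {
        this.jobName = jobName;
    }

    // Proposed: turn the program into a plan without executing it.
    FlinkPlan compile() {
        return () -> "{\"job\":\"" + jobName + "\"}";
    }

    // Normal execution path.
    String execute() {
        return "submitted:" + jobName;
    }
}

class PlanPreview {
    // Mirrors the pseudocode above: dump the plan if preview is configured,
    // otherwise execute the job.
    static String run(SketchEnvironment env, boolean previewConfigured) {
        if (previewConfigured) {
            return env.compile().describe();
        } else {
            return env.execute();
        }
    }
}
```

[With this shape, plan preview becomes an ordinary library call in the user program instead of a code path that hijacks `execute()`.]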
>> >>> >>>>>>>>>>>>>>>>>>>> 3. How to deal with Jar Submission at the
>> >>> >> Web
>> >>> >>>>>>>> Frontend.
>> >>> >>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>> There is one more thread talked on this
>> >>> >> topic[1].
>> >>> >>>>>>>> Apart
>> >>> >>>>>>>>>>> from
>> >>> >>>>>>>>>>>>>>>> removing
>> >>> >>>>>>>>>>>>>>>>>>>> the functions there are two alternatives.
>> >>> >>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>> One is to introduce an interface that has a
>> >>> >> method
>> >>> >>>>>>>> returning
>> >>> >>>>>>>>>>>>>>>>> JobGraph/FlinkPlan
>> >>> >>>>>>>>>>>>>>>>>>>> and Jar Submission only supports a main class that
>> >>> >>>>>>>> implements
>> >>> >>>>>>>>>> this
>> >>> >>>>>>>>>>>>>>>> interface.
>> >>> >>>>>>>>>>>>>>>>>>>> And then extract the JobGraph/FlinkPlan
>> >>> >> just by
>> >>> >>>>>>>> calling
>> >>> >>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>> method.
>> >>> >>>>>>>>>>>>>>>>>>>> In this way, it is even possible to
>> >>> >> consider a
>> >>> >>>>>>>>>> separation
>> >>> >>>>>>>>>>>> of job
>> >>> >>>>>>>>>>>>>>>>>>> creation
>> >>> >>>>>>>>>>>>>>>>>>>> and job submission.
>> >>> >>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>> The other is, as you mentioned, let
>> >>> >> execute() do the
>> >>> >>>>>>>>>>> actual
>> >>> >>>>>>>>>>>>>>>> execution.
>> >>> >>>>>>>>>>>>>>>>>>>> We won't execute the main method in the
>> >>> >> WebFrontend
>> >>> >>>>>>>> but
>> >>> >>>>>>>>>>>> spawn a
>> >>> >>>>>>>>>>>>>>>>> process
>> >>> >>>>>>>>>>>>>>>>>>>> at the WebMonitor side to execute. For the return
>> >>> >> part we
>> >>> >>>>>>>> could
>> >>> >>>>>>>>>>>> generate
>> >>> >>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>>>>> JobID from WebMonitor and pass it to the
>> >>> >> execution
>> >>> >>>>>>>>>>>> environment.
>> >>> >>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>> 4. How to deal with detached mode.
>> >>> >>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>> I think detached mode is a temporary
>> >>> >> solution for
>> >>> >>>>>>>>>>>> non-blocking
>> >>> >>>>>>>>>>>>>>>>>>> submission.
>> >>> >>>>>>>>>>>>>>>>>>>> In my document both submission and
>> >>> >> execution return
>> >>> >>>>>>>> a
>> >>> >>>>>>>>>>>>>>>>> CompletableFuture
>> >>> >>>>>>>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>>> users control whether or not wait for the
>> >>> >> result. In
>> >>> >>>>>>>>>> this
>> >>> >>>>>>>>>>>> point
>> >>> >>>>>>>>>>>>>> we
>> >>> >>>>>>>>>>>>>>>>> don't
>> >>> >>>>>>>>>>>>>>>>>>>> need a detached option but the
>> >>> >> functionality is
>> >>> >>>>>>>> covered.
>> >>> >>>>>>>>>>>>>>>>>>>>
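[A hedged sketch of the future-based submission described above, where "detached" is simply not waiting on the returned future. `JobResult` and `FutureSubmission` are illustrative names, not Flink classes:]

```java
import java.util.concurrent.CompletableFuture;

// Illustrative result type for a finished job.
class JobResult {
    final String jobId;

    JobResult(String jobId) {
        this.jobId = jobId;
    }
}

class FutureSubmission {
    // Proposed shape: submission returns a CompletableFuture. A real client
    // would hand the JobGraph to the cluster here; we complete
    // asynchronously just to show the control flow.
    static CompletableFuture<JobResult> submit(String jobName) {
        return CompletableFuture.supplyAsync(() -> new JobResult(jobName + "-id"));
    }
}
```

[Detached submission is `submit(...)` with the future ignored; attached submission is the same call followed by `.join()`, so no separate detached option is needed.]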
>> >>> >>>>>>>>>>>>>>>>>>>> 5. How does per-job mode interact with
>> >>> >> interactive
>> >>> >>>>>>>>>>>> programming.
>> >>> >>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>> All of YARN, Mesos and Kubernetes
>> >>> >> scenarios follow
>> >>> >>>>>>>> the
>> >>> >>>>>>>>>>>> pattern
>> >>> >>>>>>>>>>>>>>>> of launching
>> >>> >>>>>>>>>>>>>>>>> a
>> >>> >>>>>>>>>>>>>>>>>>>> JobCluster now. And I don't think there
>> >>> >> would be
>> >>> >>>>>>>>>>>> inconsistency
>> >>> >>>>>>>>>>>>>>>> between
>> >>> >>>>>>>>>>>>>>>>>>>> different resource management.
>> >>> >>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>> Best,
>> >>> >>>>>>>>>>>>>>>>>>>> tison.
>> >>> >>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>> [1]
>> >>> >>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>
>> >>> >>
>> https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
>> >>> >>>>>>>>>>>>>>>>>>>> [2]
>> >>> >>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>
>> >>> >>
>> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
>> >>> >>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>> Aljoscha Krettek <al...@apache.org>
>> >>> >>>>>>>> 于2019年8月16日周五
>> >>> >>>>>>>>>>>> 下午9:20写道:
>> >>> >>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>> Hi,
>> >>> >>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>> I read both Jeff's initial design
>> >>> >> document and the
>> >>> >>>>>>>> newer
>> >>> >>>>>>>>>>>>>> document
>> >>> >>>>>>>>>>>>>>> by
>> >>> >>>>>>>>>>>>>>>>>>>>> Tison. I also finally found the time to
>> >>> >> collect our
>> >>> >>>>>>>>>>>> thoughts on
>> >>> >>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>>>> issue,
>> >>> >>>>>>>>>>>>>>>>>>>>> I had quite some discussions with Kostas
>> >>> >> and this
>> >>> >>>>>>>> is
>> >>> >>>>>>>>>> the
>> >>> >>>>>>>>>>>>>> result:
>> >>> >>>>>>>>>>>>>>>> [1].
>> >>> >>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>> I think overall we agree that this part
>> >>> >> of the
>> >>> >>>>>>>> code is
>> >>> >>>>>>>>>> in
>> >>> >>>>>>>>>>>> dire
>> >>> >>>>>>>>>>>>>>> need
>> >>> >>>>>>>>>>>>>>>>> of
>> >>> >>>>>>>>>>>>>>>>>>>>> some refactoring/improvements but I
>> >>> >> think there are
>> >>> >>>>>>>>>> still
>> >>> >>>>>>>>>>>> some
>> >>> >>>>>>>>>>>>>>> open
>> >>> >>>>>>>>>>>>>>>>>>>>> questions and some differences in
>> >>> >> opinion what
>> >>> >>>>>>>> those
>> >>> >>>>>>>>>>>>>> refactorings
>> >>> >>>>>>>>>>>>>>>>>>> should
>> >>> >>>>>>>>>>>>>>>>>>>>> look like.
>> >>> >>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>> I think the API-side is quite clear,
>> >>> >> i.e. we need
>> >>> >>>>>>>> some
>> >>> >>>>>>>>>>>>>> JobClient
>> >>> >>>>>>>>>>>>>>>> API
>> >>> >>>>>>>>>>>>>>>>>>> that
>> >>> >>>>>>>>>>>>>>>>>>>>> allows interacting with a running Job.
>> >>> >> It could be
>> >>> >>>>>>>>>>>> worthwhile
>> >>> >>>>>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>>>> spin
>> >>> >>>>>>>>>>>>>>>>>>> that
>> >>> >>>>>>>>>>>>>>>>>>>>> off into a separate FLIP because we can
>> >>> >> probably
>> >>> >>>>>>>> find
>> >>> >>>>>>>>>>>> consensus
>> >>> >>>>>>>>>>>>>>> on
>> >>> >>>>>>>>>>>>>>>>> that
>> >>> >>>>>>>>>>>>>>>>>>>>> part more easily.
>> >>> >>>>>>>>>>>>>>>>>>>>>
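[To make the JobClient idea concrete, here is a hedged sketch of such an interface with a trivial in-memory implementation; the method names are assumptions for illustration, not the final FLIP API:]

```java
import java.util.concurrent.CompletableFuture;

// Sketch of the JobClient surface the thread converges on: an object bound
// to one running job, used to query and control it.
interface SketchJobClient {
    String getJobId();
    CompletableFuture<String> getJobStatus();
    CompletableFuture<Void> cancel();
}

// In-memory stand-in; a real implementation would delegate to a
// ClusterClient (REST in session mode, local Dispatcher in per-job mode).
class InMemoryJobClient implements SketchJobClient {
    private final String jobId;
    private volatile String status = "RUNNING";

    InMemoryJobClient(String jobId) {
        this.jobId = jobId;
    }

    public String getJobId() {
        return jobId;
    }

    public CompletableFuture<String> getJobStatus() {
        return CompletableFuture.completedFuture(status);
    }

    public CompletableFuture<Void> cancel() {
        status = "CANCELED";
        return CompletableFuture.completedFuture(null);
    }
}
```

[Because the interface hides the concrete ClusterClient, the same user program could receive it from `env.execute()` in both session and per-job mode.]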
>> >>> >>>>>>>>>>>>>>>>>>>>> For the rest, the main open questions
>> >>> >> from our doc
>> >>> >>>>>>>> are
>> >>> >>>>>>>>>>>> these:
>> >>> >>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>  - Do we want to separate cluster
>> >>> >> creation and job
>> >>> >>>>>>>>>>>> submission
>> >>> >>>>>>>>>>>>>>> for
>> >>> >>>>>>>>>>>>>>>>>>>>> per-job mode? In the past, there were
>> >>> >> conscious
>> >>> >>>>>>>> efforts
>> >>> >>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>> *not*
>> >>> >>>>>>>>>>>>>>>>>>> separate
>> >>> >>>>>>>>>>>>>>>>>>>>> job submission from cluster creation for
>> >>> >> per-job
>> >>> >>>>>>>>>> clusters
>> >>> >>>>>>>>>>>> for
>> >>> >>>>>>>>>>>>>>>> Mesos,
>> >>> >>>>>>>>>>>>>>>>>>> YARN,
>> >>> >>>>>>>>>>>>>>>>>>>>> Kubernetes (see
>> >>> >> StandaloneJobClusterEntryPoint).
>> >>> >>>>>>>> Tison
>> >>> >>>>>>>>>>>> suggests
>> >>> >>>>>>>>>>>>>> in
>> >>> >>>>>>>>>>>>>>>> his
>> >>> >>>>>>>>>>>>>>>>>>>>> design document to decouple this in
>> >>> >> order to unify
>> >>> >>>>>>>> job
>> >>> >>>>>>>>>>>>>>> submission.
>> >>> >>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>  - How to deal with plan preview, which
>> >>> >> needs to
>> >>> >>>>>>>>>> hijack
>> >>> >>>>>>>>>>>>>>> execute()
>> >>> >>>>>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>>>> let the outside code catch an exception?
>> >>> >>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>  - How to deal with Jar Submission at
>> >>> >> the Web
>> >>> >>>>>>>>>> Frontend,
>> >>> >>>>>>>>>>>> which
>> >>> >>>>>>>>>>>>>>>> needs
>> >>> >>>>>>>>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>>>>>>>>> hijack execute() and let the outside
>> >>> >> code catch an
>> >>> >>>>>>>>>>>> exception?
>> >>> >>>>>>>>>>>>>>>>>>>>> CliFrontend.run() “hijacks”
>> >>> >>>>>>>>>>> ExecutionEnvironment.execute()
>> >>> >>>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>>> get a
>> >>> >>>>>>>>>>>>>>>>>>>>> JobGraph and then execute that JobGraph
>> >>> >> manually.
>> >>> >>>>>>>> We
>> >>> >>>>>>>>>>> could
>> >>> >>>>>>>>>>>> get
>> >>> >>>>>>>>>>>>>>>> around
>> >>> >>>>>>>>>>>>>>>>>>> that
>> >>> >>>>>>>>>>>>>>>>>>>>> by letting execute() do the actual
>> >>> >> execution. One
>> >>> >>>>>>>>>> caveat
>> >>> >>>>>>>>>>>> for
>> >>> >>>>>>>>>>>>>> this
>> >>> >>>>>>>>>>>>>>>> is
>> >>> >>>>>>>>>>>>>>>>>>> that
>> >>> >>>>>>>>>>>>>>>>>>>>> now the main() method doesn’t return (or
>> >>> >> is forced
>> >>> >>>>>>>> to
>> >>> >>>>>>>>>>>> return by
>> >>> >>>>>>>>>>>>>>>>>>> throwing an
>> >>> >>>>>>>>>>>>>>>>>>>>> exception from execute()) which means
>> >>> >> that for Jar
>> >>> >>>>>>>>>>>> Submission
>> >>> >>>>>>>>>>>>>>> from
>> >>> >>>>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>>>>>> WebFrontend we have a long-running
>> >>> >> main() method
>> >>> >>>>>>>>>> running
>> >>> >>>>>>>>>>>> in the
>> >>> >>>>>>>>>>>>>>>>>>>>> WebFrontend. This doesn’t sound very
>> >>> >> good. We
>> >>> >>>>>>>> could get
>> >>> >>>>>>>>>>>> around
>> >>> >>>>>>>>>>>>>>> this
>> >>> >>>>>>>>>>>>>>>>> by
>> >>> >>>>>>>>>>>>>>>>>>>>> removing the plan preview feature and by
>> >>> >> removing
>> >>> >>>>>>>> Jar
>> >>> >>>>>>>>>>>>>>>>>>> Submission/Running.
>> >>> >>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>  - How to deal with detached mode?
>> >>> >> Right now,
>> >>> >>>>>>>>>>>>>>> DetachedEnvironment
>> >>> >>>>>>>>>>>>>>>>> will
>> >>> >>>>>>>>>>>>>>>>>>>>> execute the job and return immediately.
>> >>> >> If users
>> >>> >>>>>>>>>> control
>> >>> >>>>>>>>>>>> when
>> >>> >>>>>>>>>>>>>>> they
>> >>> >>>>>>>>>>>>>>>>>>> want to
>> >>> >>>>>>>>>>>>>>>>>>>>> return, by waiting on the job completion
>> >>> >> future,
>> >>> >>>>>>>> how do
>> >>> >>>>>>>>>>> we
>> >>> >>>>>>>>>>>> deal
>> >>> >>>>>>>>>>>>>>>> with
>> >>> >>>>>>>>>>>>>>>>>>> this?
>> >>> >>>>>>>>>>>>>>>>>>>>> Do we simply remove the distinction
>> >>> >> between
>> >>> >>>>>>>>>>>>>>> detached/non-detached?
>> >>> >>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>  - How does per-job mode interact with
>> >>> >>>>>>>> “interactive
>> >>> >>>>>>>>>>>>>> programming”
>> >>> >>>>>>>>>>>>>>>>>>>>> (FLIP-36). For YARN, each execute() call
>> >>> >> could
>> >>> >>>>>>>> spawn a
>> >>> >>>>>>>>>>> new
>> >>> >>>>>>>>>>>>>> Flink
>> >>> >>>>>>>>>>>>>>>> YARN
>> >>> >>>>>>>>>>>>>>>>>>>>> cluster. What about Mesos and Kubernetes?
>> >>> >>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>> The first open question is where the
>> >>> >> opinions
>> >>> >>>>>>>> diverge,
>> >>> >>>>>>>>>> I
>> >>> >>>>>>>>>>>> think.
>> >>> >>>>>>>>>>>>>>> The
>> >>> >>>>>>>>>>>>>>>>>>> rest
>> >>> >>>>>>>>>>>>>>>>>>>>> are just open questions and interesting
>> >>> >> things
>> >>> >>>>>>>> that we
>> >>> >>>>>>>>>>>> need to
>> >>> >>>>>>>>>>>>>>>>>>> consider.
>> >>> >>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>> Best,
>> >>> >>>>>>>>>>>>>>>>>>>>> Aljoscha
>> >>> >>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>> [1]
>> >>> >>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>
>> >>> >>
>> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
>> >>> >>>>>>>>>>>>>>>>>>>>> <
>> >>> >>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>
>> >>> >>
>> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
>> >>> >>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>> On 31. Jul 2019, at 15:23, Jeff Zhang <
>> >>> >>>>>>>>>>> zjffdu@gmail.com>
>> >>> >>>>>>>>>>>>>>> wrote:
>> >>> >>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>> Thanks tison for the effort. I left a
>> >>> >> few
>> >>> >>>>>>>> comments.
>> >>> >>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>> Zili Chen <wa...@gmail.com>
>> >>> >> 于2019年7月31日周三
>> >>> >>>>>>>>>>> 下午8:24写道:
>> >>> >>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>> Hi Flavio,
>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>> Thanks for your reply.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>> In both the current impl and the design,
>> >>> >>>>>>>> ClusterClient
>> >>> >>>>>>>>>>>>>>>>>>>>>>> never takes responsibility for
>> >>> >> generating
>> >>> >>>>>>>> JobGraph.
>> >>> >>>>>>>>>>>>>>>>>>>>>>> (what you see in current codebase is
>> >>> >> several
>> >>> >>>>>>>> class
>> >>> >>>>>>>>>>>> methods)
>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>> Instead, user describes his program
>> >>> >> in the main
>> >>> >>>>>>>>>> method
>> >>> >>>>>>>>>>>>>>>>>>>>>>> with ExecutionEnvironment apis and
>> >>> >> calls
>> >>> >>>>>>>>>> env.compile()
>> >>> >>>>>>>>>>>>>>>>>>>>>>> or env.optimize() to get FlinkPlan
>> >>> >> and JobGraph
>> >>> >>>>>>>>>>>>>> respectively.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>> For listing main classes in a jar and
>> >>> >> choosing
>> >>> >>>>>>>> one for
>> >>> >>>>>>>>>>>>>>>>>>>>>>> submission, you're now able to
>> >>> >> customize a CLI
>> >>> >>>>>>>> to do
>> >>> >>>>>>>>>>> it.
>> >>> >>>>>>>>>>>>>>>>>>>>>>> Specifically, the path of jar is
>> >>> >> passed as
>> >>> >>>>>>>> arguments
>> >>> >>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>>>>>> in the customized CLI you list main
>> >>> >> classes,
>> >>> >>>>>>>> choose
>> >>> >>>>>>>>>>> one
>> >>> >>>>>>>>>>>>>>>>>>>>>>> to submit to the cluster.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>> Best,
>> >>> >>>>>>>>>>>>>>>>>>>>>>> tison.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>> Flavio Pompermaier <
>> >>> >> pompermaier@okkam.it>
>> >>> >>>>>>>>>>> 于2019年7月31日周三
>> >>> >>>>>>>>>>>>>>>> 下午8:12写道:
>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> Just one note on my side: it is not
>> >>> >> clear to me
>> >>> >>>>>>>>>>>> whether the
>> >>> >>>>>>>>>>>>>>>>> client
>> >>> >>>>>>>>>>>>>>>>>>>>> needs
>> >>> >>>>>>>>>>>>>>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> be able to generate a job graph or
>> >>> >> not.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> In my opinion, the job jar must
>> >>> >> resides only
>> >>> >>>>>>>> on the
>> >>> >>>>>>>>>>>>>>>>>>> server/jobManager
>> >>> >>>>>>>>>>>>>>>>>>>>>>> side
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> and the client requires a way to get
>> >>> >> the job
>> >>> >>>>>>>> graph.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> If you really want to access the
>> >>> >> job graph,
>> >>> >>>>>>>> I'd
>> >>> >>>>>>>>>>> add
>> >>> >>>>>>>>>>>> a
>> >>> >>>>>>>>>>>>>>>>> dedicated
>> >>> >>>>>>>>>>>>>>>>>>>>> method
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> on the ClusterClient. like:
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>  - getJobGraph(jarId, mainClass):
>> >>> >> JobGraph
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>  - listMainClasses(jarId):
>> >>> >> List<String>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> These would require some additions
>> >>> >> on the
>> >>> >>>>>>>> job
>> >>> >>>>>>>>>>>> manager
>> >>> >>>>>>>>>>>>>>>>> endpoint
>> >>> >>>>>>>>>>>>>>>>>>> as
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> well. What do you think?
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>
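[A hedged sketch of the proposed `listMainClasses(jarId)`: this version only reads the `Main-Class` manifest attribute (a real implementation would also scan class files for `main(String[])` methods), and `writeDemoJar` just builds a throwaway jar to exercise it:]

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

class JarInspector {
    // Minimal stand-in for listMainClasses: report the Main-Class named in
    // the jar's manifest, if any.
    static List<String> listMainClasses(Path jarPath) {
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            Manifest mf = jar.getManifest();
            if (mf == null) {
                return Collections.emptyList();
            }
            String main = mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
            return main == null
                    ? Collections.emptyList()
                    : Collections.singletonList(main);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a throwaway jar whose manifest names the given main class,
    // purely to demonstrate the inspector.
    static Path writeDemoJar(String mainClass) {
        try {
            Path jar = Files.createTempFile("demo", ".jar");
            Manifest mf = new Manifest();
            // MANIFEST_VERSION must be set or the main attributes are not written.
            mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
            mf.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
            try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar), mf)) {
                // no entries needed; the manifest alone is enough here
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

[Exposing such a method on the ClusterClient would indeed need a matching JobManager endpoint, since the jar may only reside on the server side.]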
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jul 31, 2019 at 12:42 PM
>> >>> >> Zili Chen <
>> >>> >>>>>>>>>>>>>>>> wander4096@gmail.com
>> >>> >>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>> wrote:
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> Here is a document[1] on client api
>> >>> >>>>>>>> enhancement
>> >>> >>>>>>>>>> from
>> >>> >>>>>>>>>>>> our
>> >>> >>>>>>>>>>>>>>>>>>> perspective.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> We have investigated current
>> >>> >> implementations.
>> >>> >>>>>>>> And
>> >>> >>>>>>>>>> we
>> >>> >>>>>>>>>>>>>> propose
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> 1. Unify the implementation of
>> >>> >> cluster
>> >>> >>>>>>>> deployment
>> >>> >>>>>>>>>>> and
>> >>> >>>>>>>>>>>> job
>> >>> >>>>>>>>>>>>>>>>>>> submission
>> >>> >>>>>>>>>>>>>>>>>>>>> in
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> Flink.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> 2. Provide programmatic interfaces
>> >>> >> to allow
>> >>> >>>>>>>>>> flexible
>> >>> >>>>>>>>>>>> job
>> >>> >>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>> cluster
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> management.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> The first proposal is aimed at
>> >>> >> reducing code
>> >>> >>>>>>>> paths
>> >>> >>>>>>>>>>> of
>> >>> >>>>>>>>>>>>>>> cluster
>> >>> >>>>>>>>>>>>>>>>>>>>>>> deployment
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> job submission so that one can
>> >>> >> adopt Flink in
>> >>> >>>>>>>> his
>> >>> >>>>>>>>>>>> usage
>> >>> >>>>>>>>>>>>>>>> easily.
>> >>> >>>>>>>>>>>>>>>>>>> The
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> second
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> proposal is aimed at providing rich
>> >>> >>>>>>>> interfaces for
>> >>> >>>>>>>>>>>>>> advanced
>> >>> >>>>>>>>>>>>>>>>> users
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> who want to make accurate control
>> >>> >> of these
>> >>> >>>>>>>> stages.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> Quick reference on open questions:
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> 1. Exclude job cluster deployment
>> >>> >> from client
>> >>> >>>>>>>> side
>> >>> >>>>>>>>>>> or
>> >>> >>>>>>>>>>>>>>> redefine
>> >>> >>>>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> semantic
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> of job cluster? Since it fits in a
>> >>> >> process
>> >>> >>>>>>>> quite
>> >>> >>>>>>>>>>>> different
>> >>> >>>>>>>>>>>>>>>> from
>> >>> >>>>>>>>>>>>>>>>>>>>> session
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> cluster deployment and job
>> >>> >> submission.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> 2. Maintain the codepaths handling
>> >>> >> class
>> >>> >>>>>>>>>>>>>>>>> o.a.f.api.common.Program
>> >>> >>>>>>>>>>>>>>>>>>> or
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> implement customized program
>> >>> >> handling logic by
>> >>> >>>>>>>>>>>> customized
>> >>> >>>>>>>>>>>>>>>>>>>>> CliFrontend?
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> See also this thread[2] and the
>> >>> >> document[1].
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> 3. Expose ClusterClient as public
>> >>> >> api or just
>> >>> >>>>>>>>>> expose
>> >>> >>>>>>>>>>>> api
>> >>> >>>>>>>>>>>>>> in
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> and delegate them to ClusterClient?
>> >>> >> Further,
>> >>> >>>>>>>> in
>> >>> >>>>>>>>>>>> either way
>> >>> >>>>>>>>>>>>>>> is
>> >>> >>>>>>>>>>>>>>>> it
>> >>> >>>>>>>>>>>>>>>>>>>>> worth
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> introducing a JobClient which is an
>> >>> >>>>>>>> encapsulation of
>> >>> >>>>>>>>>>>>>>>> ClusterClient
>> >>> >>>>>>>>>>>>>>>>>>> that
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> associated to specific job?
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> Best,
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> tison.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> [1]
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>
>> >>> >>
>> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> [2]
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>
>> >>> >>
>> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang <zj...@gmail.com>
>> >>> >> 于2019年7月24日周三
>> >>> >>>>>>>>>>> 上午9:19写道:
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks Stephan, I will follow up
>> >>> >> this issue
>> >>> >>>>>>>> in
>> >>> >>>>>>>>>> next
>> >>> >>>>>>>>>>>> few
>> >>> >>>>>>>>>>>>>>>> weeks,
>> >>> >>>>>>>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>>>>>> will
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> refine the design doc. We could
>> >>> >> discuss more
>> >>> >>>>>>>>>>> details
>> >>> >>>>>>>>>>>>>> after
>> >>> >>>>>>>>>>>>>>>> 1.9
>> >>> >>>>>>>>>>>>>>>>>>>>>>> release.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org>
>> >>> >>>>>>>> 于2019年7月24日周三
>> >>> >>>>>>>>>>>> 上午12:58写道:
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi all!
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> This thread has stalled for a
>> >>> >> bit, which I
>> >>> >>>>>>>>>> assume
>> >>> >>>>>>>>>>>> is
>> >>> >>>>>>>>>>>>>>> mostly
>> >>> >>>>>>>>>>>>>>>>>>> due to
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Flink 1.9 feature freeze and
>> >>> >> release testing
>> >>> >>>>>>>>>>> effort.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> I personally still recognize this
>> >>> >> issue as
>> >>> >>>>>>>> one
>> >>> >>>>>>>>>>>> important
>> >>> >>>>>>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>>>> be
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> solved.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> I'd
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> be happy to help resume this
>> >>> >> discussion soon
>> >>> >>>>>>>>>>> (after
>> >>> >>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>> 1.9
>> >>> >>>>>>>>>>>>>>>>>>>>>>> release)
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> see if we can do some step
>> >>> >> towards this in
>> >>> >>>>>>>> Flink
>> >>> >>>>>>>>>>>> 1.10.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Stephan
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 10:41 AM
>> >>> >> Flavio
>> >>> >>>>>>>>>>> Pompermaier
>> >>> >>>>>>>>>>>> <
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> pompermaier@okkam.it>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> That's exactly what I suggested
>> >>> >> a long time
>> >>> >>>>>>>>>> ago:
>> >>> >>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>> Flink
>> >>> >>>>>>>>>>>>>>>>> REST
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> client
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> should not require any Flink
>> >>> >> dependency,
>> >>> >>>>>>>> only
>> >>> >>>>>>>>>>> http
>> >>> >>>>>>>>>>>>>>> library
>> >>> >>>>>>>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>>>>>>>>>>> call
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> REST
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> services to submit and monitor a
>> >>> >> job.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> What I suggested also in [1] was
>> >>> >> to have a
>> >>> >>>>>>>> way
>> >>> >>>>>>>>>> to
>> >>> >>>>>>>>>>>>>>>>> automatically
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> suggest
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> to the
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> user (via a UI) the available
>> >>> >> main classes
>> >>> >>>>>>>> and
>> >>> >>>>>>>>>>>> their
>> >>> >>>>>>>>>>>>>>>> required
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> parameters[2].
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Another problem we have with
>> >>> >> Flink is that
>> >>> >>>>>>>> the
>> >>> >>>>>>>>>>> Rest
>> >>> >>>>>>>>>>>>>>> client
>> >>> >>>>>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> CLI
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> one
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> behave differently and we use the CLI
>> >>> >> the CLI
>> >>> >>>>>>>> client
>> >>> >>>>>>>>>>> (via
>> >>> >>>>>>>>>>>> ssh)
>> >>> >>>>>>>>>>>>>>>>> because
>> >>> >>>>>>>>>>>>>>>>>>>>>>> it
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> allows
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> to call some other method after
>> >>> >>>>>>>> env.execute()
>> >>> >>>>>>>>>> [3]
>> >>> >>>>>>>>>>>> (we
>> >>> >>>>>>>>>>>>>>> have
>> >>> >>>>>>>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>>>>>>>>>>> call
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> another
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> REST service to signal the end
>> >>> >> of the job).
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> In this regard, a dedicated
>> >>> >> interface,
>> >>> >>>>>>>> like the
>> >>> >>>>>>>>>>>>>>> JobListener
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> suggested
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> in
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> the previous emails, would be
>> >>> >> very helpful
>> >>> >>>>>>>>>>> (IMHO).
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
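[A dependency-free REST client along these lines could look like the sketch below. The `GET /jobs/<jobid>` path follows Flink's documented REST API, while `RestJobClient` and the in-process stub server are illustrative only; JSON parsing and error handling are omitted:]

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// A client built only on the JDK's HTTP support, with no Flink dependency.
class RestJobClient {
    private final String baseUrl;

    RestJobClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // Query job details via GET /jobs/<jobid> and return the raw JSON body.
    String getJobDetails(String jobId) {
        try {
            URL url = new URL(baseUrl + "/jobs/" + jobId);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class StubJobManager {
    // Runs the client against a throwaway in-process HTTP stub that stands
    // in for a real JobManager, and returns the response body.
    static String demo() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
            server.createContext("/jobs/abc", exchange -> {
                byte[] body = "{\"state\":\"RUNNING\"}".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            server.start();
            try {
                String base = "http://localhost:" + server.getAddress().getPort();
                return new RestJobClient(base).getJobDetails("abc");
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

[Keeping the client this thin is what makes it usable from downstream projects like Beam or Zeppelin without pulling in Flink jars.]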
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>> >>> >>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-10864
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> [2]
>> >>> >>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-10862
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> [3]
>> >>> >>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-10879
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Flavio
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 9:54 AM
>> >>> >> Jeff Zhang
>> >>> >>>>>>>> <
>> >>> >>>>>>>>>>>>>>>> zjffdu@gmail.com
>> >>> >>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> wrote:
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, Tison,
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your comments.
>> >>> >> Overall I agree
>> >>> >>>>>>>> with
>> >>> >>>>>>>>>>> you
>> >>> >>>>>>>>>>>>>> that
>> >>> >>>>>>>>>>>>>>> it
>> >>> >>>>>>>>>>>>>>>>> is
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> difficult
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> for
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> downstream projects to
>> >>> >> integrate with
>> >>> >>>>>>>> flink
>> >>> >>>>>>>>>> and
>> >>> >>>>>>>>>>> we
>> >>> >>>>>>>>>>>>>> need
>> >>> >>>>>>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> refactor
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> current flink client api.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> And I agree that CliFrontend
>> >>> >> should only
>> >>> >>>>>>>>>> parsing
>> >>> >>>>>>>>>>>>>> command
>> >>> >>>>>>>>>>>>>>>>> line
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> arguments
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> then pass them to
>> >>> >> ExecutionEnvironment.
>> >>> >>>>>>>> It is
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment's
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> responsibility to compile job,
>> >>> >> create
>> >>> >>>>>>>> cluster,
>> >>> >>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>> submit
>> >>> >>>>>>>>>>>>>>>>> job.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> Besides
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> that, Currently flink has many
>> >>> >>>>>>>>>>>> ExecutionEnvironment
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> implementations,
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> flink will use the specific one
>> >>> >> based on
>> >>> >>>>>>>> the
>> >>> >>>>>>>>>>>> context.
>> >>> >>>>>>>>>>>>>>>> IMHO,
>> >>> >>>>>>>>>>>>>>>>> it
>> >>> >>>>>>>>>>>>>>>>>>>>>>> is
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> not
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> necessary, ExecutionEnvironment
>> >>> >> should be
>> >>> >>>>>>>> able
>> >>> >>>>>>>>>>> to
>> >>> >>>>>>>>>>>> do
>> >>> >>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>> right
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> thing
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> based
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> on the FlinkConf it is
>> >>> >> received. Too many
>> >>> >>>>>>>>>>>>>>>>> ExecutionEnvironment
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> implementation is another
>> >>> >> burden for
>> >>> >>>>>>>>>> downstream
>> >>> >>>>>>>>>>>>>> project
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> integration.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> One thing I'd like to mention
>> >>> >> is flink's
>> >>> >>>>>>>> scala
>> >>> >>>>>>>>>>>> shell
>> >>> >>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>> sql
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> client,
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> although they are sub-modules
>> >>> >> of flink,
>> >>> >>>>>>>> they
>> >>> >>>>>>>>>>>> could be
>> >>> >>>>>>>>>>>>>>>>> treated
>> >>> >>>>>>>>>>>>>>>>>>>>>>> as
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> downstream
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> project which use flink's
>> >>> >> client api.
>> >>> >>>>>>>>>> Currently
>> >>> >>>>>>>>>>>> you
>> >>> >>>>>>>>>>>>>> will
>> >>> >>>>>>>>>>>>>>>>> find
>> >>> >>>>>>>>>>>>>>>>>>>>>>> it
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> is
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> not
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> easy for them to integrate with
>> >>> >> flink,
>> >>> >>>>>>>> they
>> >>> >>>>>>>>>>> share
>> >>> >>>>>>>>>>>> many
>> >>> >>>>>>>>>>>>>>>>>>>>>>> duplicated
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> code.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> It
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> is another sign that we should
>> >>> >> refactor
>> >>> >>>>>>>> flink
>> >>> >>>>>>>>>>>> client
>> >>> >>>>>>>>>>>>>>> api.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I believe it is a large and
>> >>> >> hard change,
>> >>> >>>>>>>> and I
>> >>> >>>>>>>>>>> am
>> >>> >>>>>>>>>>>>>> afraid
>> >>> >>>>>>>>>>>>>>>> we
>> >>> >>>>>>>>>>>>>>>>>>> can
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> not
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> keep
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> compatibility since many of
>> >>> >> changes are
>> >>> >>>>>>>> user
>> >>> >>>>>>>>>>>> facing.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
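The configuration-driven environment Jeff describes could look roughly like the following sketch. All names here (FlinkConf, Executor, the target strings) are hypothetical stand-ins, not Flink's actual API: the point is only that one environment dispatches on the configuration it receives instead of one ExecutionEnvironment subclass per deployment target.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a single environment whose behavior is chosen by
// configuration, instead of one ExecutionEnvironment subclass per target.
public class ConfigDrivenEnv {

    /** Minimal stand-in for a Flink configuration object. */
    public static class FlinkConf {
        private final Map<String, String> values = new HashMap<>();
        public FlinkConf set(String key, String value) {
            values.put(key, value);
            return this;
        }
        public String get(String key, String fallback) {
            return values.getOrDefault(key, fallback);
        }
    }

    /** Stand-in for "how to execute": standalone, per-job cluster, ... */
    interface Executor {
        String execute(String jobName);
    }

    private final FlinkConf conf;

    public ConfigDrivenEnv(FlinkConf conf) {
        this.conf = conf;
    }

    /** The environment looks up the target in its config and dispatches. */
    public String execute(String jobName) {
        String target = conf.get("execution.target", "standalone");
        Executor executor;
        switch (target) {
            case "standalone":
                executor = job -> "submitted " + job + " to standalone cluster";
                break;
            case "yarn-per-job":
                executor = job -> "deployed job cluster on YARN for " + job;
                break;
            default:
                throw new IllegalArgumentException("unknown target: " + target);
        }
        return executor.execute(jobName);
    }
}
```

With such a shape, a downstream project only builds a FlinkConf and calls #execute; which cluster backend is used is purely a matter of configuration.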
> On Mon, Jun 24, 2019 at 2:53 PM Zili Chen <wander4096@gmail.com> wrote:
>
> Hi all,
>
> After a closer look at our client APIs, I can see there are two major
> issues for consistency and integration, namely the deployment of a job
> cluster, which couples job graph creation with cluster deployment, and
> submission via CliFrontend, which confuses the control flow of job
> graph compilation and job submission. I'd like to follow the
> discussion above, mainly the process described by Jeff and Stephan,
> and share my ideas on these issues.
>
> 1) CliFrontend confuses the control flow of job compilation and
> submission. Following the process of job submission that Stephan and
> Jeff described, the execution environment knows all configs of the
> cluster and the topology/settings of the job. Ideally, the main method
> of the user program calls #execute (or, say, #submit), and Flink
> deploys the cluster, compiles the job graph and submits it to the
> cluster. However, the current CliFrontend does all these things inside
> its #runProgram method, which introduces a lot of subclasses of the
> (stream) execution environment.
>
> Actually, it sets up an exec env that hijacks the
> #execute/#executePlan method, initializes the job graph and aborts
> execution. Control flow then returns to CliFrontend, which deploys the
> cluster (or retrieves the client) and submits the job graph. This is
> quite a specific internal process inside Flink and is consistent with
> nothing else.
>
> 2) Deployment of a job cluster couples job graph creation and cluster
> deployment. Abstractly, going from a user job to a concrete submission
> requires the following dependency:
>
>     create JobGraph --------\
>     create ClusterClient --> submit JobGraph
>
> A ClusterClient is created by deploying or retrieving a cluster.
> JobGraph submission requires a compiled JobGraph and a valid
> ClusterClient, but the creation of the ClusterClient is abstractly
> independent of that of the JobGraph. However, in job cluster mode we
> deploy the cluster with a job graph, which means we use another
> process:
>
>     create JobGraph --> deploy cluster with the JobGraph
>
> Here is another inconsistency, and downstream projects/client APIs are
> forced to handle the different cases with little support from Flink.
>
> Since we likely reached a consensus that
>
> 1. all configs are gathered by the Flink configuration and passed on,
> 2. the execution environment knows all configs and handles execution
>    (both deployment and submission),
>
> I propose eliminating the inconsistencies above with the following
> approach:
>
> 1) CliFrontend should be exactly a front end, at least for the "run"
> command. That means it just gathers all config from the command line
> and passes it to the main method of the user program. The execution
> environment knows all the info, and with an added utility for creating
> a ClusterClient, we gracefully obtain one by deploying or retrieving.
> This way, we don't need to hijack the #execute/#executePlan methods
> and can remove the various hacky subclasses of the exec env, as well
> as the #run methods in ClusterClient (for an interface-ized
> ClusterClient). The control flow then goes from CliFrontend to the
> main method and never returns.
>
> 2) A job cluster means a cluster for one specific job. From another
> perspective, it is an ephemeral session. We could decouple the
> deployment from the compiled job graph: start a session with an idle
> timeout and submit the job afterwards.
>
> These topics are better made explicit and discussed to a consensus
> before we go into more detail on design or implementation.
>
> Best,
> tison.
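The decoupled flow tison proposes (job graph creation independent of cluster-client creation, with a "job cluster" treated as an ephemeral session) can be sketched as below. All types and method names here are hypothetical illustrations, not Flink's real client API:

```java
// Hypothetical sketch of the decoupled submission flow: JobGraph creation
// and ClusterClient creation are independent steps; only submission needs
// both. A "job cluster" is then just an ephemeral session with a timeout.
public class DecoupledSubmission {

    /** Stand-in for a compiled job graph. */
    public static class JobGraph {
        public final String name;
        public JobGraph(String name) { this.name = name; }
    }

    /** Stand-in for an interface-ized cluster client. */
    public interface ClusterClient {
        String submit(JobGraph graph);
    }

    /** Deploy a new cluster (session with idle timeout), return its client. */
    public static ClusterClient deploySession(long idleTimeoutMs) {
        return graph -> "job '" + graph.name
                + "' submitted to new session (idleTimeout=" + idleTimeoutMs + "ms)";
    }

    /** Retrieve a client for an already-running cluster. */
    public static ClusterClient retrieveSession(String clusterId) {
        return graph -> "job '" + graph.name
                + "' submitted to existing session " + clusterId;
    }

    public static void main(String[] args) {
        JobGraph graph = new JobGraph("wordcount");           // independent of any cluster
        ClusterClient perJob = deploySession(60_000);          // "job cluster" = ephemeral session
        ClusterClient session = retrieveSession("session-1");  // long-running session
        System.out.println(perJob.submit(graph));              // submission needs both pieces
        System.out.println(session.submit(graph));
    }
}
```

Note that both the per-job and the session path go through the same submit step, which is exactly the consistency the proposal is after.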
> On Thu, Jun 20, 2019 at 3:21 AM Zili Chen <wander4096@gmail.com> wrote:
>
> Hi Jeff,
>
> Thanks for raising this thread and the design document!
>
> As @Thomas Weise mentioned above, extending config handling in Flink
> requires far more effort than it should. Another example: we achieve
> detach mode by introducing yet another execution environment which
> also hijacks the #execute method.
>
> I agree with your idea that the user configures everything and Flink
> "just" respects it. On this topic, I think the unusual control flow
> when CliFrontend handles the "run" command is the problem. It handles
> several configs, mainly about cluster settings, so the main method of
> the user program is unaware of them. It also compiles the application
> to a job graph by running the main method with a hijacked exec env,
> which constrains the main method further.
>
> I'd like to write down a few notes on how configs/args are passed and
> respected, as well as on decoupling job compilation and submission. I
> will share them on this thread later.
>
> Best,
> tison.
> On Mon, Jun 17, 2019 at 7:29 PM SHI Xiaogang <shixiaogangg@gmail.com> wrote:
>
> Hi Jeff and Flavio,
>
> Thanks Jeff a lot for proposing the design document.
>
> We are also working on refactoring ClusterClient to allow flexible and
> efficient job management in our real-time platform. We would like to
> draft a document to share our ideas with you.
>
> I think it's a good idea to have something like Apache Livy for Flink,
> and the efforts discussed here take a great step towards it.
>
> Regards,
> Xiaogang
> On Mon, Jun 17, 2019 at 7:13 PM Flavio Pompermaier <pompermaier@okkam.it> wrote:
>
> Is there any possibility to have something like Apache Livy [1] also
> for Flink in the future?
>
> [1] https://livy.apache.org/
> On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zjffdu@gmail.com> wrote:
>
> > > Any API we expose should not have dependencies on the runtime
> > > (flink-runtime) package or other implementation details. To me,
> > > this means that the current ClusterClient cannot be exposed to
> > > users because it uses quite some classes from the optimiser and
> > > runtime packages.
>
> We should change ClusterClient from a class to an interface.
> ExecutionEnvironment would only use the ClusterClient interface, which
> should live in flink-clients, while the concrete implementation class
> could live in flink-runtime.
>
> > > What happens when a failure/restart in the client happens? There
> > > needs to be a way of re-establishing the connection to the job,
> > > setting up the listeners again, etc.
>
> Good point. First we need to define what failure/restart in the client
> means. IIUC, that usually means a network failure, which will surface
> in the RestClient class. If my understanding is correct, the
> restart/retry mechanism should be implemented in RestClient.
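The interface/implementation split Jeff proposes could be sketched as follows. The method names and the toy RestClusterClient below are illustrative assumptions, not Flink's real classes; the point is only that the interface downstream code sees carries no runtime types, while the implementation elsewhere is free to use them:

```java
// Hypothetical sketch of splitting ClusterClient into an interface (for
// flink-clients, no runtime/optimizer dependencies) and an implementation
// (for flink-runtime, free to use internal classes).
public class ClusterClientSplit {

    // --- would live in flink-clients ---
    public interface ClusterClient extends AutoCloseable {
        /** Submit a serialized job graph; returns a job id. */
        String submitJob(byte[] serializedJobGraph);
        void cancelJob(String jobId);
        @Override
        void close();
    }

    // --- would live in flink-runtime ---
    public static class RestClusterClient implements ClusterClient {
        private boolean closed = false;

        @Override
        public String submitJob(byte[] serializedJobGraph) {
            if (closed) {
                throw new IllegalStateException("client closed");
            }
            // A real implementation would issue a REST call here.
            return "job-" + serializedJobGraph.length;
        }

        @Override
        public void cancelJob(String jobId) {
            // A real implementation would issue a REST call here.
        }

        @Override
        public void close() {
            closed = true;
        }
    }
}
```

ExecutionEnvironment (and downstream projects such as Zeppelin or Beam) would program only against the interface, so the runtime-dependent implementation never leaks into their classpath contract.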
> On Tue, Jun 11, 2019 at 11:10 PM Aljoscha Krettek <aljoscha@apache.org> wrote:
>
> Some points to consider:
>
> * Any API we expose should not have dependencies on the runtime
>   (flink-runtime) package or other implementation details. To me, this
>   means that the current ClusterClient cannot be exposed to users
>   because it uses quite some classes from the optimiser and runtime
>   packages.
>
> * What happens when a failure/restart in the client happens? There
>   needs to be a way of re-establishing the connection to the job,
>   setting up the listeners again, etc.
>
> Aljoscha
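The client-failure concern raised above, with retry pushed into the REST layer as Jeff suggests, could be handled by a bounded retry wrapper along these lines. This is a minimal sketch under stated assumptions (a generic call, retries on any runtime failure); a real client would additionally reconnect, re-register job listeners, and back off between attempts:

```java
import java.util.function.Supplier;

// Hypothetical sketch of retry-on-network-failure inside the REST layer:
// run a call, retrying a bounded number of times before giving up.
public class RetryingCall {

    /** Run `call`, retrying up to maxAttempts times (maxAttempts >= 1). */
    public static <T> T withRetry(int maxAttempts, Supplier<T> call) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                // In a real client: reconnect, re-establish listeners, back off.
                last = e;
            }
        }
        throw last;
    }
}
```

Keeping this inside the client means callers (the execution environment, downstream projects) never see transient network failures at all unless the retry budget is exhausted.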
> > On 29. May 2019, at 10:17, Jeff Zhang <zjffdu@gmail.com> wrote:
> > [...]
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> possible to just get
>> >>> >> the
>> >>> >>>>>>>> matching
>> >>> >>>>>>>>>>>> cluster
>> >>> >>>>>>>>>>>>>>> client
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> from
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> environment
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> then control the job
>> >>> >> through it
>> >>> >>>>>>>>>>>> (environment
>> >>> >>>>>>>>>>>>>>> as
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> factory
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> for
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cluster
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> client). But note
>> >>> >> that the
>> >>> >>>>>>>>>> environment
>> >>> >>>>>>>>>>>>>> classes
>> >>> >>>>>>>>>>>>>>>>>>>>>>> are
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> part
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> of
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> public
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> API,
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and it is not
>> >>> >> straightforward to
>> >>> >>>>>>>>>> make
>> >>> >>>>>>>>>>>> larger
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> changes
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> without
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> breaking
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backward
>> >>> >> compatibility.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient
>> >>> >> currently exposes
>> >>> >>>>>>>>>>> internal
>> >>> >>>>>>>>>>>>>>> classes
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> like
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> JobGraph and
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> StreamGraph. But it
>> >>> >> should be
>> >>> >>>>>>>>>> possible
>> >>> >>>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>> wrap
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> this
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> with a
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> new
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> public
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> API
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that brings the
>> >>> >> required job
>> >>> >>>>>>>> control
>> >>> >>>>>>>>>>>>>>>>>>>>>>> capabilities
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> for
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> downstream
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> projects.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Perhaps it is helpful
>> >>> >> to look at
>> >>> >>>>>>>>>> some
>> >>> >>>>>>>>>>>> of the
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> interfaces
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> in
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Beam
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> while
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thinking about this:
>> >>> >> [2] for the
>> >>> >>>>>>>>>>>> portable
>> >>> >>>>>>>>>>>>>> job
>> >>> >>>>>>>>>>>>>>>>>>>>>>> API
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> [3]
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> for
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> old
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> asynchronous job
>> >>> >> control from
>> >>> >>>>>>>> the
>> >>> >>>>>>>>>> Beam
>> >>> >>>>>>>>>>>> Java
>> >>> >>>>>>>>>>>>>>> SDK.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The backward
>> >>> >> compatibility
>> >>> >>>>>>>>>> discussion
>> >>> >>>>>>>>>>>> [4] is
>> >>> >>>>>>>>>>>>>>>>>>>>>>> also
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> relevant
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> here. A
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> new
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> API
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should shield
>> >>> >> downstream
>> >>> >>>>>>>> projects
>> >>> >>>>>>>>>> from
>> >>> >>>>>>>>>>>>>>> internals
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> allow
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> them to
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interoperate with
>> >>> >> multiple
>> >>> >>>>>>>> future
>> >>> >>>>>>>>>>> Flink
>> >>> >>>>>>>>>>>>>>> versions
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> in
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> same
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> release
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> line
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> without forced
>> >>> >> upgrades.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thomas
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>> >>> >>>>>>>>>>>>>> https://github.com/apache/flink/pull/7249
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [2]
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>
>> >>> >>
>> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [3]
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>
>> >>> >>
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [4]
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>
>> >>> >>
>> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zjffdu@gmail.com> wrote:
> > >
> > > > > I'm not so sure whether the user should be able to define
> > > > > where the job runs (in your example Yarn). This is actually
> > > > > independent of the job development and is something which is
> > > > > decided at deployment time.
> > > >
> > > > Users don't need to specify the execution mode
> > > > programmatically. They can also pass the execution mode via the
> > > > arguments of the flink run command, e.g.
> > > >
> > > > bin/flink run -m yarn-cluster ...
> > > > bin/flink run -m local ...
> > > > bin/flink run -m host:port ...
> > > >
> > > > Does this make sense to you?
> > > > > To me it makes sense that the ExecutionEnvironment is not
> > > > > directly initialized by the user and is instead context
> > > > > sensitive to how you want to execute your job (Flink CLI vs.
> > > > > IDE, for example).
> > > >
> > > > Right, currently I notice that Flink creates a different
> > > > ContextExecutionEnvironment for each submission scenario (Flink
> > > > CLI vs. IDE). To me this is kind of a hack, and not very
> > > > straightforward. What I suggested above is that Flink should
> > > > always create the same ExecutionEnvironment, just with
> > > > different configuration, and based on that configuration it
> > > > would create the proper ClusterClient for the different
> > > > behaviors.
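Jeff's suggestion here, a single ExecutionEnvironment class whose ClusterClient is chosen purely by configuration, could look roughly like the following. This is an illustrative sketch only; the names (UnifiedEnvironment, the "execution.mode" key, the client stubs) are hypothetical and not Flink's actual API:

```java
// Hypothetical sketch: one environment type that picks its
// ClusterClient from configuration, instead of different
// ContextExecutionEnvironment subclasses per submission scenario.
import java.util.HashMap;
import java.util.Map;

interface ClusterClient {
    String describe();
}

class YarnClusterClient implements ClusterClient {
    public String describe() { return "yarn"; }
}

class LocalClusterClient implements ClusterClient {
    public String describe() { return "local"; }
}

class RemoteClusterClient implements ClusterClient {
    private final String address;
    RemoteClusterClient(String address) { this.address = address; }
    public String describe() { return "remote:" + address; }
}

public class UnifiedEnvironment {
    private final Map<String, String> conf;

    public UnifiedEnvironment(Map<String, String> conf) { this.conf = conf; }

    // The environment is always the same class; only the client differs,
    // based on the configuration it was constructed with.
    public ClusterClient createClusterClient() {
        String mode = conf.getOrDefault("execution.mode", "local");
        switch (mode) {
            case "yarn":  return new YarnClusterClient();
            case "local": return new LocalClusterClient();
            default:      return new RemoteClusterClient(mode); // host:port
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("execution.mode", "yarn");
        // prints "yarn"
        System.out.println(new UnifiedEnvironment(conf).createClusterClient().describe());
    }
}
```

The same mechanism mirrors the `flink run -m yarn-cluster|local|host:port` choice above: the deployment target is a config value decided at submission time, not baked into the program.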
> > > > Till Rohrmann <trohrmann@apache.org> 于2018年12月20日周四 下午11:18写道:
> > > >
> > > > > You are probably right that we have code duplication when it
> > > > > comes to the creation of the ClusterClient. This should be
> > > > > reduced in the future.
> > > > >
> > > > > I'm not so sure whether the user should be able to define
> > > > > where the job runs (in your example Yarn). This is actually
> > > > > independent of the job development and is something which is
> > > > > decided at deployment time. To me it makes sense that the
> > > > > ExecutionEnvironment is not directly initialized by the user
> > > > > and is instead context sensitive to how you want to execute
> > > > > your job (Flink CLI vs. IDE, for example). However, I agree
> > > > > that the ExecutionEnvironment should give you access to the
> > > > > ClusterClient and to the job (maybe in the form of the
> > > > > JobGraph or a job plan).
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > > > Hi Till,
> > > > > >
> > > > > > Thanks for the feedback. You are right that I expect a
> > > > > > better programmatic job submission/control API which could
> > > > > > be used by downstream projects, and that would benefit the
> > > > > > Flink ecosystem. When I look at the code of the Flink
> > > > > > scala-shell and sql-client (I believe they are not the core
> > > > > > of Flink, but belong to its ecosystem), I find a lot of
> > > > > > duplicated code for creating a ClusterClient from
> > > > > > user-provided configuration (the configuration format may
> > > > > > differ between scala-shell and sql-client) and then using
> > > > > > that ClusterClient to manipulate jobs. I don't think this
> > > > > > is convenient for downstream projects. What I expect is
> > > > > > that a downstream project only needs to provide the
> > > > > > necessary configuration info (maybe introducing a class
> > > > > > FlinkConf), then build an ExecutionEnvironment based on
> > > > > > this FlinkConf, and the ExecutionEnvironment will create
> > > > > > the proper ClusterClient. This would not only benefit
> > > > > > downstream project development but also help their
> > > > > > integration tests with Flink. Here's a sample code snippet
> > > > > > of what I expect:
> > > > > >
> > > > > > val conf = new FlinkConf().mode("yarn")
> > > > > > val env = new ExecutionEnvironment(conf)
> > > > > > val jobId = env.submit(...)
> > > > > > val jobStatus = env.getClusterClient().queryJobStatus(jobId)
> > > > > > env.getClusterClient().cancelJob(jobId)
> > > > > >
> > > > > > What do you think?
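The FlinkConf-style API proposed in the snippet above can be fleshed out as a runnable sketch. All types below (FlinkConf, SketchEnvironment, SketchClusterClient, JobStatus) are hypothetical stand-ins for illustration, not existing Flink classes; the in-memory "cluster" just tracks job state so the submit/query/cancel lifecycle is visible end to end:

```java
// Hypothetical sketch of the proposed downstream-facing API:
// conf -> environment -> cluster client -> job control.
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

enum JobStatus { RUNNING, CANCELED }

class FlinkConf {
    private String mode = "local";
    public FlinkConf mode(String mode) { this.mode = mode; return this; }
    public String mode() { return mode; }
}

class SketchClusterClient {
    private final Map<String, JobStatus> jobs = new HashMap<>();

    String submit(String jobPlan) {
        String jobId = UUID.randomUUID().toString();
        jobs.put(jobId, JobStatus.RUNNING);
        return jobId;
    }

    JobStatus queryJobStatus(String jobId) { return jobs.get(jobId); }

    void cancelJob(String jobId) { jobs.put(jobId, JobStatus.CANCELED); }
}

public class SketchEnvironment {
    private final SketchClusterClient client = new SketchClusterClient();

    public SketchEnvironment(FlinkConf conf) {
        // In a real implementation, conf.mode() would select the
        // concrete ClusterClient (yarn / local / remote).
    }

    public String submit(String jobPlan) { return client.submit(jobPlan); }

    public SketchClusterClient getClusterClient() { return client; }

    public static void main(String[] args) {
        SketchEnvironment env = new SketchEnvironment(new FlinkConf().mode("yarn"));
        String jobId = env.submit("word-count");
        System.out.println(env.getClusterClient().queryJobStatus(jobId)); // RUNNING
        env.getClusterClient().cancelJob(jobId);
        System.out.println(env.getClusterClient().queryJobStatus(jobId)); // CANCELED
    }
}
```

The point of the shape is that the downstream project touches only FlinkConf and the environment; everything cluster-specific hides behind the client the environment hands out.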
> > > > > > Till Rohrmann <trohrmann@apache.org> 于2018年12月11日周二 下午6:28写道:
> > > > > >
> > > > > > > Hi Jeff,
> > > > > > >
> > > > > > > What you are proposing is to provide the user with better
> > > > > > > programmatic job control. There was actually an effort to
> > > > > > > achieve this, but it has never been completed [1].
> > > > > > > However, there are some improvements in the code base
> > > > > > > now. Look for example at the NewClusterClient interface,
> > > > > > > which offers non-blocking job submission. But I agree
> > > > > > > that we need to improve Flink in this regard.
> > > > > > >
> > > > > > > I would not be in favour of exposing all ClusterClient
> > > > > > > calls via the ExecutionEnvironment because it would
> > > > > > > clutter the class and would not be a good separation of
> > > > > > > concerns. Instead, one idea could be to retrieve the
> > > > > > > current ClusterClient from the ExecutionEnvironment,
> > > > > > > which can then be used for cluster and job control. But
> > > > > > > before we start an effort here, we need to agree on and
> > > > > > > capture what functionality we want to provide.
> > > > > > >
> > > > > > > Initially, the idea was that we have the
> > > > > > > ClusterDescriptor describing how to talk to a cluster
> > > > > > > manager like Yarn or Mesos. The ClusterDescriptor can be
> > > > > > > used for deploying Flink clusters (job and session) and
> > > > > > > gives you a ClusterClient. The ClusterClient controls the
> > > > > > > cluster (e.g. submitting jobs, listing all running jobs).
> > > > > > > And then there was the idea to introduce a JobClient,
> > > > > > > which you obtain from the ClusterClient, to trigger
> > > > > > > job-specific operations (e.g. taking a savepoint,
> > > > > > > cancelling the job).
> > > > > > >
> > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-4272
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Dec 11,
>> >>> >> 2018 at
>> >>> >>>>>>>> 10:13 AM
>> >>> >>>>>>>>>>>> Jeff
>> >>> >>>>>>>>>>>>>>> Zhang
>> >>> >>>>>>>>>>>>>>>>>>>>>>> <
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> zjffdu@gmail.com
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Folks,
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am trying to
>> >>> >> integrate
>> >>> >>>>>>>> flink
>> >>> >>>>>>>>>>> into
>> >>> >>>>>>>>>>>>>>> apache
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> zeppelin
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which is
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interactive
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> notebook. And I
>> >>> >> hit several
>> >>> >>>>>>>>>>> issues
>> >>> >>>>>>>>>>>> that
>> >>> >>>>>>>>>>>>>>> is
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> caused
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> by
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> flink
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> client
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> api.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'd like to
>> >>> >> proposal the
>> >>> >>>>>>>>>>> following
>> >>> >>>>>>>>>>>>>>> changes
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> for
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> flink
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> client
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> api.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. Support
>> >>> >> nonblocking
>> >>> >>>>>>>>>> execution.
>> >>> >>>>>>>>>>>>>>>>>>>>>>> Currently,
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >> ExecutionEnvironment#execute
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is a blocking
>> >>> >> method which
>> >>> >>>>>>>>>> would
>> >>> >>>>>>>>>>>> do 2
>> >>> >>>>>>>>>>>>>>>>>>>>>>> things,
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> first
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> submit
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> then
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wait for job
>> >>> >> until it is
>> >>> >>>>>>>>>>> finished.
>> >>> >>>>>>>>>>>> I'd
>> >>> >>>>>>>>>>>>>>> like
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> introduce a
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> nonblocking
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> execution method
>> >>> >> like
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment#submit
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> which
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> only
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> submit
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> then return
>> >>> >> jobId to
>> >>> >>>>>>>> client.
>> >>> >>>>>>>>>> And
>> >>> >>>>>>>>>>>> allow
>> >>> >>>>>>>>>>>>>>> user
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> query
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> status
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> via
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jobId.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. Add cancel
>> >>> >> api in
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>> >> ExecutionEnvironment/StreamExecutionEnvironment,
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> currently the
>> >>> >> only way to
>> >>> >>>>>>>>>> cancel
>> >>> >>>>>>>>>>>> job is
>> >>> >>>>>>>>>>>>>>> via
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> cli
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (bin/flink),
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> convenient for
>> >>> >> downstream
>> >>> >>>>>>>>>> project
>> >>> >>>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>> use
>> >>> >>>>>>>>>>>>>>>>>>>>>>> this
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> feature.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So I'd
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> like
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> add
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cancel api in
>> >>> >>>>>>>>>>> ExecutionEnvironment
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3. Add savepoint
>> >>> >> api in
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is similar as
>> >>> >> cancel api,
>> >>> >>>>>>>> we
>> >>> >>>>>>>>>>>> should use
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> as
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> unified
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> api for third
>> >>> >> party to
>> >>> >>>>>>>>>> integrate
>> >>> >>>>>>>>>>>> with
>> >>> >>>>>>>>>>>>>>>>>>>>>>> flink.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 4. Add listener
>> >>> >> for job
>> >>> >>>>>>>>>> execution
>> >>> >>>>>>>>>>>>>>>>>>>>>>> lifecycle.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Something
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> like
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> following,
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> so
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that downstream
>> >>> >> project
>> >>> >>>>>>>> can do
>> >>> >>>>>>>>>>>> custom
>> >>> >>>>>>>>>>>>>>> logic
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> in
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> lifecycle
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> e.g.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Zeppelin would
>> >>> >> capture the
>> >>> >>>>>>>>>> jobId
>> >>> >>>>>>>>>>>> after
>> >>> >>>>>>>>>>>>>>> job
>> >>> >>>>>>>>>>>>>>>>>>>>>>> is
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> submitted
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> then
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> use
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jobId to cancel
>> >>> >> it later
>> >>> >>>>>>>> when
>> >>> >>>>>>>>>>>>>> necessary.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> public interface
>> >>> >>>>>>>> JobListener {
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> void
>> >>> >> onJobSubmitted(JobID
>> >>> >>>>>>>>>>> jobId);
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> void
>> >>> >>>>>>>>>>>> onJobExecuted(JobExecutionResult
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> jobResult);
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> void
>> >>> >> onJobCanceled(JobID
>> >>> >>>>>>>>>> jobId);
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> }
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 5. Enable
>> >>> >> session in
>> >>> >>>>>>>>>>>>>>> ExecutionEnvironment.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Currently
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> it
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> disabled,
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> but
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> session is very
>> >>> >> convenient
>> >>> >>>>>>>> for
>> >>> >>>>>>>>>>>> third
>> >>> >>>>>>>>>>>>>>> party
>> >>> >>>>>>>>>>>>>>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> submitting
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jobs
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> continually.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I hope flink can
>> >>> >> enable it
>> >>> >>>>>>>>>> again.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 6. Unify all
>> >>> >> flink client
>> >>> >>>>>>>> api
>> >>> >>>>>>>>>>> into
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is a long
>> >>> >> term issue
>> >>> >>>>>>>> which
>> >>> >>>>>>>>>>>> needs
>> >>> >>>>>>>>>>>>>>> more
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> careful
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thinking
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> design.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Currently some
>> >>> >> of features
>> >>> >>>>>>>> of
>> >>> >>>>>>>>>>>> flink is
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> exposed
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> in
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment,
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> but
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> some are
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> exposed
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cli instead of
>> >>> >> api, like
>> >>> >>>>>>>> the
>> >>> >>>>>>>>>>>> cancel and
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> savepoint I
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mentioned
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> above.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> think the root
>> >>> >> cause is
>> >>> >>>>>>>> due to
>> >>> >>>>>>>>>>> that
>> >>> >>>>>>>>>>>>>> flink
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> didn't
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> unify
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interaction
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> flink. Here I
>> >>> >> list 3
>> >>> >>>>>>>> scenarios
>> >>> >>>>>>>>>> of
>> >>> >>>>>>>>>>>> flink
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> operation
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Local job
>> >>> >> execution.
>> >>> >>>>>>>> Flink
>> >>> >>>>>>>>>>> will
>> >>> >>>>>>>>>>>>>>> create
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> LocalEnvironment
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> then
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> use
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this
>> >>> >> LocalEnvironment to
>> >>> >>>>>>>>>> create
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> LocalExecutor
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> for
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> job
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> execution.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Remote job
>> >>> >> execution.
>> >>> >>>>>>>> Flink
>> >>> >>>>>>>>>>> will
>> >>> >>>>>>>>>>>>>>> create
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> first
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> then
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> create
>> >>> >> ContextEnvironment
>> >>> >>>>>>>>>> based
>> >>> >>>>>>>>>>>> on the
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> then
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> run
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Job
>> >>> >> cancelation. Flink
>> >>> >>>>>>>> will
>> >>> >>>>>>>>>>>> create
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> first
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> then
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cancel
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this job via
>> >>> >> this
>> >>> >>>>>>>>>> ClusterClient.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> As you can see
>> >>> >> in the
>> >>> >>>>>>>> above 3
>> >>> >>>>>>>>>>>>>> scenarios.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> Flink
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> didn't
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> use the
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> same
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> approach(code
>> >>> >> path) to
>> >>> >>>>>>>> interact
>> >>> >>>>>>>>>>>> with
>> >>> >>>>>>>>>>>>>>> flink
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What I propose is
>> >>> >>>>>>>> following:
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Create the proper
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> LocalEnvironment/RemoteEnvironment
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (based
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> user
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> configuration)
>> >>> >> --> Use this
>> >>> >>>>>>>>>>>> Environment
>> >>> >>>>>>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> create
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> proper
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >> (LocalClusterClient or
>> >>> >>>>>>>>>>>>>> RestClusterClient)
>> >>> >>>>>>>>>>>>>>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> interactive
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Flink (
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> execution or
>> >>> >> cancelation)
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This way we can
>> >>> >> unify the
>> >>> >>>>>>>>>> process
>> >>> >>>>>>>>>>>> of
>> >>> >>>>>>>>>>>>>>> local
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> execution
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> remote
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> execution.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> And it is much
>> >>> >> easier for
>> >>> >>>>>>>> third
>> >>> >>>>>>>>>>>> party
>> >>> >>>>>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> integrate
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> with
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> flink,
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> because
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >> ExecutionEnvironment is the
>> >>> >>>>>>>>>>> unified
>> >>> >>>>>>>>>>>>>> entry
>> >>> >>>>>>>>>>>>>>>>>>>>>>>> point
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> for
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> flink.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> third
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> party
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> needs to do is
>> >>> >> just pass
>> >>> >>>>>>>>>>>> configuration
>> >>> >>>>>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >> ExecutionEnvironment will
>> >>> >>>>>>>> do
>> >>> >>>>>>>>>> the
>> >>> >>>>>>>>>>>> right
>> >>> >>>>>>>>>>>>>>>>>>>>>>> thing
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> based
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> on
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> configuration.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Flink cli can
>> >>> >> also be
>> >>> >>>>>>>>>> considered
>> >>> >>>>>>>>>>> as
>> >>> >>>>>>>>>>>>>> flink
>> >>> >>>>>>>>>>>>>>>>>>>>>>> api
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> consumer.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> just
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pass
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> configuration to
>> >>> >>>>>>>>>>>> ExecutionEnvironment
>> >>> >>>>>>>>>>>>>> and
>> >>> >>>>>>>>>>>>>>>>>>>>>>> let
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> create the proper
>> >>> >>>>>>>> ClusterClient
>> >>> >>>>>>>>>>>> instead
>> >>> >>>>>>>>>>>>>>> of
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> letting
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> cli
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> create
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient
>> >>> >> directly.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 6 would involve
>> >>> >> large code
>> >>> >>>>>>>>>>>> refactoring,
>> >>> >>>>>>>>>>>>>>> so
>> >>> >>>>>>>>>>>>>>>>>>>>>>> I
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> think
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> we
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> can
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> defer
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> future release,
>> >>> >> 1,2,3,4,5
>> >>> >>>>>>>> could
>> >>> >>>>>>>>>>> be
>> >>> >>>>>>>>>>>> done
>> >>> >>>>>>>>>>>>>>> at
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>> once I
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> believe.
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> me
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> know
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> your
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> comments and
>> >>> >> feedback,
>> >>> >>>>>>>> thanks
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
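[Editor's note] For readers skimming the archive, the JobListener plus non-blocking submit proposed in the quoted message could be exercised roughly as below. This is a self-contained toy sketch: `Env` stands in for ExecutionEnvironment, job ids are plain strings instead of Flink's JobID, and submit()/cancel() only notify listeners. None of it is actual Flink code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Toy model of the proposed listener + non-blocking submission.
public class JobListenerSketch {

    // Simplified version of the JobListener from the proposal
    // (JobID and JobExecutionResult reduced to Strings).
    interface JobListener {
        void onJobSubmitted(String jobId);
        void onJobExecuted(String jobResult);
        void onJobCanceled(String jobId);
    }

    // Stand-in for ExecutionEnvironment: submit() returns immediately
    // with a job id instead of blocking until the job finishes.
    static class Env {
        private final List<JobListener> listeners = new ArrayList<>();

        void registerJobListener(JobListener l) { listeners.add(l); }

        String submit() {
            String jobId = UUID.randomUUID().toString();
            listeners.forEach(l -> l.onJobSubmitted(jobId));
            return jobId; // caller can query status / cancel via this id
        }

        void cancel(String jobId) {
            listeners.forEach(l -> l.onJobCanceled(jobId));
        }
    }

    public static void main(String[] args) {
        Env env = new Env();
        List<String> events = new ArrayList<>();
        env.registerJobListener(new JobListener() {
            public void onJobSubmitted(String id) { events.add("submitted"); }
            public void onJobExecuted(String r)   { events.add("executed"); }
            public void onJobCanceled(String id)  { events.add("canceled"); }
        });
        String jobId = env.submit();   // returns immediately with the job id
        env.cancel(jobId);             // later, cancel via the captured id
        System.out.println(events);    // submitted and canceled callbacks fired
    }
}
```

This mirrors the Zeppelin use case described above: capture the jobId in onJobSubmitted, keep it, and cancel through the environment later.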

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Zili Chen <wa...@gmail.com>.
FYI I've created a public channel #flink-client-api-enhancement under the
ASF Slack.

You can reach it after joining the ASF Slack and searching for the channel.

Best,
tison.


Kostas Kloudas <kk...@gmail.com> 于2019年9月11日周三 下午5:09写道:

> +1 for creating a channel.
>
> Kostas
>
> On Wed, Sep 11, 2019 at 10:57 AM Zili Chen <wa...@gmail.com> wrote:
> >
> > Hi Aljoscha,
> >
> > I'm OK to use the ASF slack.
> >
> > Best,
> > tison.
> >
> >
> > Jeff Zhang <zj...@gmail.com> 于2019年9月11日周三 下午4:48写道:
> >>
> >> +1 for using slack for instant communication
> >>
> >> Aljoscha Krettek <al...@apache.org> 于2019年9月11日周三 下午4:44写道:
> >>>
> >>> Hi,
> >>>
> >>> We could try and use the ASF slack for this purpose, that would
> probably be easiest. See https://s.apache.org/slack-invite. We could
> create a dedicated channel for our work and would still use the open ASF
> infrastructure and people can have a look if they are interested because
> discussion would be public. What do you think?
> >>>
> >>> P.S. Committers/PMCs should be able to log in with their Apache
> >>> ID.
> >>>
> >>> Best,
> >>> Aljoscha
> >>>
> >>> > On 6. Sep 2019, at 14:24, Zili Chen <wa...@gmail.com> wrote:
> >>> >
> >>> > Hi Aljoscha,
> >>> >
> >>> > I'd like to gather all the ideas here and among the documents, and
> >>> > draft a formal FLIP that keeps us on the same page. Hopefully I can
> >>> > start a FLIP thread next week.
> >>> >
> >>> > For the implementation, or rather the POC part, I'd like to work
> >>> > with you guys who proposed the Executor concept, to make sure that
> >>> > we go in the same direction. I'm wondering whether a dedicated
> >>> > thread or a Slack group is the proper venue. In my opinion we can
> >>> > involve the team in a Slack group and, concurrently with the FLIP
> >>> > process, start our branch; once we reach a consensus on the FLIP,
> >>> > open an umbrella issue about the framework and start subtasks. What
> >>> > do you think?
> >>> >
> >>> > Best,
> >>> > tison.
> >>> >
> >>> >
> >>> > Aljoscha Krettek <al...@apache.org> 于2019年9月5日周四 下午9:39写道:
> >>> >
> >>> >> Hi Tison,
> >>> >>
> >>> >> To keep this moving forward, maybe you want to start working on a
> >>> >> proof-of-concept implementation for the new JobClient interface,
> >>> >> maybe with a new method executeAsync() in the environment that
> >>> >> returns the JobClient, and implement the methods to see how that
> >>> >> works and to see where we get. Would you be interested in that?
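[Editor's note] The executeAsync()/JobClient shape suggested in the quoted paragraph could look roughly like the sketch below. Types are deliberately simplified (plain Strings instead of Flink's JobID/JobStatus/JobExecutionResult), and the method and interface names follow the discussion only; none of this is the actual Flink API.

```java
import java.util.concurrent.CompletableFuture;

// Sketch: the environment submits without blocking and hands back a
// JobClient that the caller can use to interact with the running job.
public class JobClientSketch {

    interface JobClient {
        String getJobId();
        CompletableFuture<String> getJobStatus();
        CompletableFuture<Void> cancel();
        CompletableFuture<String> getJobExecutionResult();
    }

    static class Env {
        // Proposed: submit and return immediately with a handle to the job.
        JobClient executeAsync(String jobName) {
            CompletableFuture<String> result =
                    CompletableFuture.completedFuture("DONE"); // toy "job"
            return new JobClient() {
                public String getJobId() { return jobName + "-0001"; }
                public CompletableFuture<String> getJobStatus() {
                    return CompletableFuture.completedFuture("RUNNING");
                }
                public CompletableFuture<Void> cancel() {
                    return CompletableFuture.completedFuture(null);
                }
                public CompletableFuture<String> getJobExecutionResult() {
                    return result;
                }
            };
        }
    }

    public static void main(String[] args) {
        JobClient client = new Env().executeAsync("wordcount");
        // The caller decides whether to wait (old blocking execute()
        // behaviour) or to keep the handle and come back later.
        System.out.println(client.getJobExecutionResult().join());
    }
}
```

The key point of the design is the future-based result: blocking execution becomes just one caller choice (`join()` on the result future) rather than the only mode.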
> >>> >>
> >>> >> Also, at some point we should collect all the ideas and start
> >>> >> forming an actual FLIP.
> >>> >>
> >>> >> Best,
> >>> >> Aljoscha
> >>> >>
> >>> >>> On 4. Sep 2019, at 12:04, Zili Chen <wa...@gmail.com> wrote:
> >>> >>>
> >>> >>> Thanks for your update Kostas!
> >>> >>>
> >>> >>> It looks good to me to clean up the existing code paths as a
> >>> >>> first pass. I'd like to help with review and file subtasks if I
> >>> >>> find any.
> >>> >>>
> >>> >>> Best,
> >>> >>> tison.
> >>> >>>
> >>> >>>
> >>> >>> Kostas Kloudas <kk...@gmail.com> 于2019年9月4日周三 下午5:52写道:
> >>> >>> Here is the issue, and I will keep on updating it as I find more
> >>> >>> issues.
> >>> >>>
> >>> >>> https://issues.apache.org/jira/browse/FLINK-13954
> >>> >>>
> >>> >>> This will also cover the refactoring of the Executors that we
> >>> >>> discussed in this thread, without any additional functionality
> >>> >>> (such as the job client).
> >>> >>>
> >>> >>> Kostas
> >>> >>>
> >>> >>> On Wed, Sep 4, 2019 at 11:46 AM Kostas Kloudas <kkloudas@gmail.com
> >
> >>> >> wrote:
> >>> >>>>
> >>> >>>> Great idea Tison!
> >>> >>>>
> >>> >>>> I will create the umbrella issue and post it here so that we are
> >>> >>>> all on the same page!
> >>> >>>>
> >>> >>>> Cheers,
> >>> >>>> Kostas
> >>> >>>>
> >>> >>>> On Wed, Sep 4, 2019 at 11:36 AM Zili Chen <wa...@gmail.com>
> >>> >> wrote:
> >>> >>>>>
> >>> >>>>> Hi Kostas & Aljoscha,
> >>> >>>>>
> >>> >>>>> I notice that there is a JIRA (FLINK-13946) which could be
> >>> >>>>> included in this refactoring thread. Since we agree on most of
> >>> >>>>> the directions in the big picture, is it reasonable that we
> >>> >>>>> create an umbrella issue for refactoring the client APIs and
> >>> >>>>> link the subtasks to it? It would be a better way to join the
> >>> >>>>> forces of our community.
> >>> >>>>>
> >>> >>>>> Best,
> >>> >>>>> tison.
> >>> >>>>>
> >>> >>>>>
> >>> >>>>> Zili Chen <wa...@gmail.com> 于2019年8月31日周六 下午12:52写道:
> >>> >>>>>>
> >>> >>>>>> Great Kostas! Looking forward to your POC!
> >>> >>>>>>
> >>> >>>>>> Best,
> >>> >>>>>> tison.
> >>> >>>>>>
> >>> >>>>>>
> >>> >>>>>> Jeff Zhang <zj...@gmail.com> 于2019年8月30日周五 下午11:07写道:
> >>> >>>>>>>
> >>> >>>>>>> Awesome, @Kostas. Looking forward to your POC.
> >>> >>>>>>>
> >>> >>>>>>> Kostas Kloudas <kk...@gmail.com> 于2019年8月30日周五 下午8:33写道:
> >>> >>>>>>>
> >>> >>>>>>>> Hi all,
> >>> >>>>>>>>
> >>> >>>>>>>> I am just writing here to let you know that I am working on
> >>> >>>>>>>> a POC that tries to refactor the current state of job
> >>> >>>>>>>> submission in Flink. I want to stress that it introduces NO
> >>> >>>>>>>> CHANGES to the current behaviour of Flink. It just
> >>> >>>>>>>> re-arranges things and introduces the notion of an Executor,
> >>> >>>>>>>> which is the entity responsible for taking the user code and
> >>> >>>>>>>> submitting it for execution.
> >>> >>>>>>>>
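[Editor's note] The Executor notion described here can be pictured as a single interface that takes the user code plus a configuration and hides where and how execution happens. The sketch below is purely illustrative: the names, signatures, and the "execution.target" key are invented for this example and are not taken from the POC.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the Executor idea: one entity responsible for
// taking the user code and submitting it for execution, regardless of
// whether the target is local, per-job, or a session cluster.
public class ExecutorSketch {

    interface Executor {
        // User code is reduced to a Runnable plus a job name here.
        String execute(String jobName, Runnable userCode, Map<String, String> config);
    }

    // A "local" executor: runs the code in-process and returns a job id.
    static class LocalExecutor implements Executor {
        public String execute(String jobName, Runnable userCode,
                              Map<String, String> config) {
            userCode.run();
            return "local-" + jobName;
        }
    }

    // Selecting the executor from configuration is the point of the
    // abstraction: the calling code stays the same for every target.
    static Executor forConfig(Map<String, String> config) {
        // Only "local" is sketched; a real implementation would also
        // cover e.g. session and per-job deployments.
        return new LocalExecutor();
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("execution.target", "local"); // hypothetical key
        String jobId = ExecutorSketch.forConfig(config)
                .execute("wordcount",
                         () -> System.out.println("running user code"),
                         config);
        System.out.println(jobId);
    }
}
```

The design choice this illustrates is exactly the one debated earlier in the thread: the environment (or CLI) only chooses an Executor from configuration, instead of each code path building its own ClusterClient.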
> >>> >>>>>>>> Given this, the discussion about the functionality that the
> >>> >>>>>>>> JobClient will expose to the user can go on independently,
> >>> >>>>>>>> and the same holds for all the open questions so far.
> >>> >>>>>>>>
> >>> >>>>>>>> I hope I will have some more news to share soon.
> >>> >>>>>>>>
> >>> >>>>>>>> Thanks,
> >>> >>>>>>>> Kostas
> >>> >>>>>>>>
> >>> >>>>>>>> On Mon, Aug 26, 2019 at 4:20 AM Yang Wang <
> danrtsey.wy@gmail.com>
> >>> >> wrote:
> >>> >>>>>>>>>
> >>> >>>>>>>>> Hi Zili,
> >>> >>>>>>>>>
> >>> >>>>>>>>> It makes sense to me that a dedicated cluster is started
> >>> >>>>>>>>> for a per-job cluster and will not accept more jobs.
> >>> >>>>>>>>> Just have a question about the command line.
> >>> >>>>>>>>>
> >>> >>>>>>>>> Currently we could use the following commands to start
> >>> >> different
> >>> >>>>>>>> clusters.
> >>> >>>>>>>>> *per-job cluster*
> >>> >>>>>>>>> ./bin/flink run -d -p 5 -ynm perjob-cluster1 -m yarn-cluster
> >>> >>>>>>>>> examples/streaming/WindowJoin.jar
> >>> >>>>>>>>> *session cluster*
> >>> >>>>>>>>> ./bin/flink run -p 5 -ynm session-cluster1 -m yarn-cluster
> >>> >>>>>>>>> examples/streaming/WindowJoin.jar
> >>> >>>>>>>>>
> >>> >>>>>>>>> What will it look like after client enhancement?
> >>> >>>>>>>>>
> >>> >>>>>>>>>
> >>> >>>>>>>>> Best,
> >>> >>>>>>>>> Yang
> >>> >>>>>>>>>
> >>> >>>>>>>>> Zili Chen <wa...@gmail.com> 于2019年8月23日周五 下午10:46写道:
> >>> >>>>>>>>>
> >>> >>>>>>>>>> Hi Till,
> >>> >>>>>>>>>>
> >>> >>>>>>>>>> Thanks for your update. Nice to hear :-)
> >>> >>>>>>>>>>
> >>> >>>>>>>>>> Best,
> >>> >>>>>>>>>> tison.
> >>> >>>>>>>>>>
> >>> >>>>>>>>>>
> >>> >>>>>>>>>> Till Rohrmann <tr...@apache.org> 于2019年8月23日周五
> >>> >> 下午10:39写道:
> >>> >>>>>>>>>>
> >>> >>>>>>>>>>> Hi Tison,
> >>> >>>>>>>>>>>
> >>> >>>>>>>>>>> just a quick comment concerning the class loading issues
> >>> >>>>>>>>>>> when using the per-job mode. The community wants to change
> >>> >>>>>>>>>>> it so that the StandaloneJobClusterEntryPoint actually
> >>> >>>>>>>>>>> uses the user code class loader with child-first class
> >>> >>>>>>>>>>> loading [1]. Hence, I hope that this problem will be
> >>> >>>>>>>>>>> resolved soon.
> >>> >>>>>>>>>>>
> >>> >>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-13840
> >>> >>>>>>>>>>>
> >>> >>>>>>>>>>> Cheers,
> >>> >>>>>>>>>>> Till
> >>> >>>>>>>>>>>
> >>> >>>>>>>>>>> On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas <
> >>> >> kkloudas@gmail.com>
> >>> >>>>>>>>>> wrote:
> >>> >>>>>>>>>>>
> >>> >>>>>>>>>>>> Hi all,
> >>> >>>>>>>>>>>>
> >>> >>>>>>>>>>>> On the topic of web submission, I agree with Till that
> >>> >> it only
> >>> >>>>>>>> seems
> >>> >>>>>>>>>>>> to complicate things.
> >>> >>>>>>>>>>>> It is bad for security, job isolation (anybody can
> >>> >> submit/cancel
> >>> >>>>>>>> jobs),
> >>> >>>>>>>>>>>> and its
> >>> >>>>>>>>>>>> implementation complicates some parts of the code. So,
> >>> >> if it were
> >>> >>>>>>>> to
> >>> >>>>>>>>>>>> redesign the
> >>> >>>>>>>>>>>> WebUI, maybe this part could be left out. In addition, I
> >>> >> would say
> >>> >>>>>>>>>>>> that the ability to cancel
> >>> >>>>>>>>>>>> jobs could also be left out.
> >>> >>>>>>>>>>>>
> >>> >>>>>>>>>>>> Also I would also be in favour of removing the
> >>> >> "detached" mode, for
> >>> >>>>>>>>>>>> the reasons mentioned
> >>> >>>>>>>>>>>> above (i.e. because now we will have a future
> >>> >> representing the
> >>> >>>>>>>> result
> >>> >>>>>>>>>>>> on which the user
> >>> >>>>>>>>>>>> can choose to wait or not).
> >>> >>>>>>>>>>>>
> >>> >>>>>>>>>>>> Now for the separating job submission and cluster
> >>> >> creation, I am in
> >>> >>>>>>>>>>>> favour of keeping both.
> >>> >>>>>>>>>>>> Once again, the reasons are mentioned above by Stephan,
> >>> >> Till,
> >>> >>>>>>>> Aljoscha
> >>> >>>>>>>>>>>> and also Zili seems
> >>> >>>>>>>>>>>> to agree. They mainly have to do with security,
> >>> >> isolation and ease
> >>> >>>>>>>> of
> >>> >>>>>>>>>>>> resource management
> >>> >>>>>>>>>>>> for the user as he knows that "when my job is done,
> >>> >> everything
> >>> >>>>>>>> will be
> >>> >>>>>>>>>>>> cleared up". This is
> >>> >>>>>>>>>>>> also the experience you get when launching a process on
> >>> >> your local
> >>> >>>>>>>> OS.
> >>> >>>>>>>>>>>>
> >>> >>>>>>>>>>>> On excluding the per-job mode from returning a JobClient
> >>> >> or not, I
> >>> >>>>>>>>>>>> believe that eventually
> >>> >>>>>>>>>>>> it would be nice to allow users to get back a jobClient.
> >>> >> The
> >>> >>>>>>>> reason is
> >>> >>>>>>>>>>>> that 1) I cannot
> >>> >>>>>>>>>>>> find any objective reason why the user-experience should
> >>> >> diverge,
> >>> >>>>>>>> and
> >>> >>>>>>>>>>>> 2) this will be the
> >>> >>>>>>>>>>>> way that the user will be able to interact with his
> >>> >> running job.
> >>> >>>>>>>>>>>> Assuming that the necessary
> >>> >>>>>>>>>>>> ports are open for the REST API to work, then I think
> >>> >> that the
> >>> >>>>>>>>>>>> JobClient can run against the
> >>> >>>>>>>>>>>> REST API without problems. If the needed ports are not
> >>> >> open, then
> >>> >>>>>>>> we
> >>> >>>>>>>>>>>> are safe to not return
> >>> >>>>>>>>>>>> a JobClient, as the user explicitly chose to close all
> >>> >> points of
> >>> >>>>>>>>>>>> communication to his running job.
> >>> >>>>>>>>>>>>
> >>> >>>>>>>>>>>> On the topic of not hijacking the "env.execute()" in
> >>> >> order to get
> >>> >>>>>>>> the
> >>> >>>>>>>>>>>> Plan, I definitely agree but
> >>> >>>>>>>>>>>> for the proposal of having a "compile()" method in the
> >>> >> env, I would
> >>> >>>>>>>>>>>> like to have a better look at
> >>> >>>>>>>>>>>> the existing code.
> >>> >>>>>>>>>>>>
> >>> >>>>>>>>>>>> Cheers,
> >>> >>>>>>>>>>>> Kostas
> >>> >>>>>>>>>>>>
> >>> >>>>>>>>>>>> On Fri, Aug 23, 2019 at 5:52 AM Zili Chen <
> >>> >> wander4096@gmail.com>
> >>> >>>>>>>>>> wrote:
> >>> >>>>>>>>>>>>>
> >>> >>>>>>>>>>>>> Hi Yang,
> >>> >>>>>>>>>>>>>
> >>> >>>>>>>>>>>>> It would be helpful if you check Stephan's last
> >>> >> comment,
> >>> >>>>>>>>>>>>> which states that isolation is important.
> >>> >>>>>>>>>>>>>
> >>> >>>>>>>>>>>>> For per-job mode, we run a dedicated cluster(maybe it
> >>> >>>>>>>>>>>>> should have been a couple of JM and TMs during FLIP-6
> >>> >>>>>>>>>>>>> design) for a specific job. Thus the process is
> >>> >> prevented
> >>> >>>>>>>>>>>>> from other jobs.
> >>> >>>>>>>>>>>>>
> >>> >>>>>>>>>>>>> In our cases there was a time when we suffered from
> >>> >>>>>>>>>>>>> multiple jobs submitted by different users that affected
> >>> >>>>>>>>>>>>> each other so that all ran into an error state. Also,
> >>> >>>>>>>>>>>>> running the client inside the cluster can save client
> >>> >>>>>>>>>>>>> resources at some points.
> >>> >>>>>>>>>>>>>
> >>> >>>>>>>>>>>>> However, we also face several issues as you mentioned,
> >>> >>>>>>>>>>>>> that in per-job mode it always uses parent classloader
> >>> >>>>>>>>>>>>> thus classloading issues occur.
> >>> >>>>>>>>>>>>>
> >>> >>>>>>>>>>>>> BTW, one can make an analogy between session/per-job
> >>> >> mode
> >>> >>>>>>>>>>>>> in Flink, and client/cluster mode in Spark.
> >>> >>>>>>>>>>>>>
> >>> >>>>>>>>>>>>> Best,
> >>> >>>>>>>>>>>>> tison.
> >>> >>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>
> >>> >>>>>>>>>>>>> Yang Wang <da...@gmail.com> 于2019年8月22日周四
> >>> >> 上午11:25写道:
> >>> >>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>> From the user's perspective, it is really confusing
> >>> >>>>>>>>>>>>>> what the scope of a per-job cluster is.
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>> If it means a flink cluster with single job, so that
> >>> >> we could
> >>> >>>>>>>> get
> >>> >>>>>>>>>>>> better
> >>> >>>>>>>>>>>>>> isolation.
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>> Now it does not matter how we deploy the cluster,
> >>> >> directly
> >>> >>>>>>>>>>>> deploy(mode1)
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>> or start a flink cluster and then submit job through
> >>> >> cluster
> >>> >>>>>>>>>>>> client(mode2).
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>> Otherwise, if it just means directly deploy, how
> >>> >> should we
> >>> >>>>>>>> name the
> >>> >>>>>>>>>>>> mode2,
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>> session with job or something else?
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>> We could also benefit from the mode2. Users could
> >>> >> get the same
> >>> >>>>>>>>>>>> isolation
> >>> >>>>>>>>>>>>>> with mode1.
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>> The user code and dependencies will be loaded by the
> >>> >>>>>>>>>>>>>> user class loader to avoid class conflicts with the
> >>> >>>>>>>>>>>>>> framework.
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>> Anyway, both of the two submission modes are useful.
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>> We just need to clarify the concepts.
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>> Best,
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>> Yang
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>> Zili Chen <wa...@gmail.com> 于2019年8月20日周二
> >>> >> 下午5:58写道:
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>> Thanks for the clarification.
> >>> >>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>> The idea JobDeployer ever came into my mind when I
> >>> >> was
> >>> >>>>>>>> muddled
> >>> >>>>>>>>>> with
> >>> >>>>>>>>>>>>>>> how to execute per-job mode and session mode with
> >>> >> the same
> >>> >>>>>>>> user
> >>> >>>>>>>>>>> code
> >>> >>>>>>>>>>>>>>> and framework codepath.
> >>> >>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>> With the concept JobDeployer we back to the
> >>> >> statement that
> >>> >>>>>>>>>>>> environment
> >>> >>>>>>>>>>>>>>> knows every configs of cluster deployment and job
> >>> >>>>>>>> submission. We
> >>> >>>>>>>>>>>>>>> configure or generate from configuration a specific
> >>> >>>>>>>> JobDeployer
> >>> >>>>>>>>>> in
> >>> >>>>>>>>>>>>>>> environment and then code align on
> >>> >>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>> *JobClient client = env.execute().get();*
> >>> >>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>> which in session mode returned by
> >>> >> clusterClient.submitJob
> >>> >>>>>>>> and in
> >>> >>>>>>>>>>>> per-job
> >>> >>>>>>>>>>>>>>> mode returned by
> >>> >> clusterDescriptor.deployJobCluster.
> >>> >>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>> Here comes a problem: currently we directly run the
> >>> >>>>>>>>>>>>>>> ClusterEntrypoint with an extracted job graph. Following
> >>> >>>>>>>>>>>>>>> the JobDeployer way, we had better align the entry point
> >>> >>>>>>>>>>>>>>> of per-job deployment at the JobDeployer. Users run
> >>> >>>>>>>>>>>>>>> their main method, or a CLI (which finally calls the
> >>> >>>>>>>>>>>>>>> main method), to deploy the job cluster.
> >>> >>>>>>>>>>>>>>>
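[Editor's note] The unified codepath discussed above can be sketched roughly as follows. This is a hypothetical illustration, not real Flink API: `JobDeployer`, `JobClient`, and the `String` stand-in for a job graph are all assumptions drawn from this thread.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical handle to a running job (name taken from this thread).
interface JobClient {
    String jobId();
}

// Hypothetical JobDeployer: hides whether we submit to an existing session
// cluster or deploy a dedicated per-job cluster.
interface JobDeployer {
    CompletableFuture<JobClient> deploy(String jobGraph);
}

// Session mode: submit the graph to an already-running cluster.
class SessionJobDeployer implements JobDeployer {
    @Override
    public CompletableFuture<JobClient> deploy(String jobGraph) {
        return CompletableFuture.completedFuture(() -> "session-" + jobGraph);
    }
}

// Per-job mode: deploy a dedicated cluster bundled with the graph.
class PerJobDeployer implements JobDeployer {
    @Override
    public CompletableFuture<JobClient> deploy(String jobGraph) {
        return CompletableFuture.completedFuture(() -> "perjob-" + jobGraph);
    }
}

// The environment is configured with a deployer, so user code is identical
// in both modes: JobClient client = env.execute(...).get();
class Environment {
    private final JobDeployer deployer;

    Environment(JobDeployer deployer) {
        this.deployer = deployer;
    }

    CompletableFuture<JobClient> execute(String jobGraph) {
        return deployer.deploy(jobGraph);
    }
}
```

With this shape, only the construction of the `Environment` differs between modes; the `execute().get()` call site stays the same.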
> >>> >>>>>>>>>>>>>>> Best,
> >>> >>>>>>>>>>>>>>> tison.
> >>> >>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org> 于2019年8月20日周二
> >>> >> 下午4:40写道:
> >>> >>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>> Till has made some good comments here.
> >>> >>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>> Two things to add:
> >>> >>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>  - The job mode is very nice in the way that it
> >>> >> runs the
> >>> >>>>>>>>>> client
> >>> >>>>>>>>>>>> inside
> >>> >>>>>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>> cluster (in the same image/process that is the
> >>> >> JM) and thus
> >>> >>>>>>>>>>> unifies
> >>> >>>>>>>>>>>>>> both
> >>> >>>>>>>>>>>>>>>> applications and what the Spark world calls the
> >>> >> "driver
> >>> >>>>>>>> mode".
> >>> >>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>  - Another thing I would add is that during the
> >>> >> FLIP-6
> >>> >>>>>>>> design,
> >>> >>>>>>>>>>> we
> >>> >>>>>>>>>>>> were
> >>> >>>>>>>>>>>>>>>> thinking about setups where Dispatcher and
> >>> >> JobManager are
> >>> >>>>>>>>>>> separate
> >>> >>>>>>>>>>>>>>>> processes.
> >>> >>>>>>>>>>>>>>>>    A Yarn or Mesos Dispatcher of a session
> >>> >> could run
> >>> >>>>>>>>>>> independently
> >>> >>>>>>>>>>>>>> (even
> >>> >>>>>>>>>>>>>>>> as privileged processes executing no code).
> >>> >>>>>>>>>>>>>>>>    Then the "per-job" mode could still be
> >>> >> helpful:
> >>> >>>>>>>> when a
> >>> >>>>>>>>>>> job
> >>> >>>>>>>>>>>> is
> >>> >>>>>>>>>>>>>>>> submitted to the dispatcher, it launches the JM
> >>> >> again in a
> >>> >>>>>>>>>>> per-job
> >>> >>>>>>>>>>>>>> mode,
> >>> >>>>>>>>>>>>>>> so
> >>> >>>>>>>>>>>>>>>> that JM and TM processes are bound to the job
> >>> >> only. For
> >>> >>>>>>>> higher
> >>> >>>>>>>>>>>> security
> >>> >>>>>>>>>>>>>>>> setups, it is important that processes are not
> >>> >> reused
> >>> >>>>>>>> across
> >>> >>>>>>>>>>> jobs.
> >>> >>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <
> >>> >>>>>>>>>>>> trohrmann@apache.org>
> >>> >>>>>>>>>>>>>>>> wrote:
> >>> >>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>> I would not be in favour of getting rid of the
> >>> >> per-job
> >>> >>>>>>>> mode
> >>> >>>>>>>>>>>> since it
> >>> >>>>>>>>>>>>>>>>> simplifies the process of running Flink jobs
> >>> >>>>>>>> considerably.
> >>> >>>>>>>>>>>> Moreover,
> >>> >>>>>>>>>>>>>> it
> >>> >>>>>>>>>>>>>>>> is
> >>> >>>>>>>>>>>>>>>>> not only well suited for container deployments
> >>> >> but also
> >>> >>>>>>>> for
> >>> >>>>>>>>>>>>>> deployments
> >>> >>>>>>>>>>>>>>>>> where you want to guarantee job isolation. For
> >>> >> example, a
> >>> >>>>>>>>>> user
> >>> >>>>>>>>>>>> could
> >>> >>>>>>>>>>>>>>> use
> >>> >>>>>>>>>>>>>>>>> the per-job mode on Yarn to execute his job on
> >>> >> a separate
> >>> >>>>>>>>>>>> cluster.
> >>> >>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>> I think that having two notions of cluster
> >>> >> deployments
> >>> >>>>>>>>>> (session
> >>> >>>>>>>>>>>> vs.
> >>> >>>>>>>>>>>>>>>> per-job
> >>> >>>>>>>>>>>>>>>>> mode) does not necessarily contradict your
> >>> >> ideas for the
> >>> >>>>>>>>>> client
> >>> >>>>>>>>>>>> api
> >>> >>>>>>>>>>>>>>>>> refactoring. For example one could have the
> >>> >> following
> >>> >>>>>>>>>>> interfaces:
> >>> >>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>> - ClusterDeploymentDescriptor: encapsulates
> >>> >> the logic
> >>> >>>>>>>> how to
> >>> >>>>>>>>>>>> deploy a
> >>> >>>>>>>>>>>>>>>>> cluster.
> >>> >>>>>>>>>>>>>>>>> - ClusterClient: allows to interact with a
> >>> >> cluster
> >>> >>>>>>>>>>>>>>>>> - JobClient: allows to interact with a running
> >>> >> job
> >>> >>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>> Now the ClusterDeploymentDescriptor could have
> >>> >> two
> >>> >>>>>>>> methods:
> >>> >>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>> - ClusterClient deploySessionCluster()
> >>> >>>>>>>>>>>>>>>>> - JobClusterClient/JobClient
> >>> >>>>>>>> deployPerJobCluster(JobGraph)
> >>> >>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>> where JobClusterClient is either a supertype of
> >>> >>>>>>>> ClusterClient
> >>> >>>>>>>>>>>> which
> >>> >>>>>>>>>>>>>>> does
> >>> >>>>>>>>>>>>>>>>> not give you the functionality to submit jobs
> >>> >> or
> >>> >>>>>>>>>>>> deployPerJobCluster
> >>> >>>>>>>>>>>>>>>>> returns directly a JobClient.
> >>> >>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>> When setting up the ExecutionEnvironment, one
> >>> >> would then
> >>> >>>>>>>> not
> >>> >>>>>>>>>>>> provide
> >>> >>>>>>>>>>>>>> a
> >>> >>>>>>>>>>>>>>>>> ClusterClient to submit jobs but a JobDeployer
> >>> >> which,
> >>> >>>>>>>>>> depending
> >>> >>>>>>>>>>>> on
> >>> >>>>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>>> selected mode, either uses a ClusterClient
> >>> >> (session
> >>> >>>>>>>> mode) to
> >>> >>>>>>>>>>>> submit
> >>> >>>>>>>>>>>>>>> jobs
> >>> >>>>>>>>>>>>>>>> or
> >>> >>>>>>>>>>>>>>>>> a ClusterDeploymentDescriptor to deploy per a
> >>> >> job mode
> >>> >>>>>>>>>> cluster
> >>> >>>>>>>>>>>> with
> >>> >>>>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>> job
> >>> >>>>>>>>>>>>>>>>> to execute.
> >>> >>>>>>>>>>>>>>>>>
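[Editor's note] Rough shapes for the split Till describes, as a sketch inferred from this thread rather than Flink's actual interfaces. `getJobClient()` on `JobClusterClient` and the in-memory `LocalDescriptor` are assumptions added here only to show how the pieces compose.

```java
// Hypothetical handle to a running job.
interface JobClient {
    String jobId();
}

// Can only interact with the single job it was created for; no submission.
interface JobClusterClient {
    JobClient getJobClient();
}

// A session cluster client additionally accepts new submissions
// (ClusterClient as a subtype of JobClusterClient, per the mail above).
interface ClusterClient extends JobClusterClient {
    JobClient submitJob(String jobGraph);
}

// Encapsulates how to deploy a cluster, in either of the two modes.
interface ClusterDeploymentDescriptor {
    ClusterClient deploySessionCluster();
    JobClusterClient deployPerJobCluster(String jobGraph);
}

// Toy in-memory stand-in, just to make the shapes concrete.
class LocalDescriptor implements ClusterDeploymentDescriptor {
    @Override
    public ClusterClient deploySessionCluster() {
        return new ClusterClient() {
            @Override
            public JobClient submitJob(String jobGraph) {
                return () -> "session:" + jobGraph;
            }

            @Override
            public JobClient getJobClient() {
                throw new IllegalStateException("no job submitted yet");
            }
        };
    }

    @Override
    public JobClusterClient deployPerJobCluster(String jobGraph) {
        // The per-job client exposes exactly one job and no submit method.
        return () -> (JobClient) () -> "perjob:" + jobGraph;
    }
}
```

Note how the type system enforces the isolation argument: code holding only a `JobClusterClient` cannot submit further jobs to a per-job cluster.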
> >>> >>>>>>>>>>>>>>>>> These are just some thoughts on how one could
> >>> >>>>>>>>>>>>>>>>> make it work because I
> >>> >>>>>>>>>>>>>>>>> believe there is some value in using the per
> >>> >> job mode
> >>> >>>>>>>> from
> >>> >>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>>> ExecutionEnvironment.
> >>> >>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>> Concerning the web submission, this is indeed
> >>> >> a bit
> >>> >>>>>>>> tricky.
> >>> >>>>>>>>>>> From
> >>> >>>>>>>>>>>> a
> >>> >>>>>>>>>>>>>>>> cluster
> >>> >>>>>>>>>>>>>>>>> management stand point, I would in favour of
> >>> >> not
> >>> >>>>>>>> executing
> >>> >>>>>>>>>> user
> >>> >>>>>>>>>>>> code
> >>> >>>>>>>>>>>>>> on
> >>> >>>>>>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>>> REST endpoint. Especially when considering
> >>> >> security, it
> >>> >>>>>>>> would
> >>> >>>>>>>>>>> be
> >>> >>>>>>>>>>>> good
> >>> >>>>>>>>>>>>>>> to
> >>> >>>>>>>>>>>>>>>>> have a well defined cluster behaviour where it
> >>> >> is
> >>> >>>>>>>> explicitly
> >>> >>>>>>>>>>>> stated
> >>> >>>>>>>>>>>>>>> where
> >>> >>>>>>>>>>>>>>>>> user code and, thus, potentially risky code is
> >>> >> executed.
> >>> >>>>>>>>>>> Ideally
> >>> >>>>>>>>>>>> we
> >>> >>>>>>>>>>>>>>> limit
> >>> >>>>>>>>>>>>>>>>> it to the TaskExecutor and JobMaster.
> >>> >>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>> Cheers,
> >>> >>>>>>>>>>>>>>>>> Till
> >>> >>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 9:40 AM Flavio
> >>> >> Pompermaier <
> >>> >>>>>>>>>>>>>>> pompermaier@okkam.it
> >>> >>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>> wrote:
> >>> >>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>> In my opinion the client should not use any
> >>> >>>>>>>> environment to
> >>> >>>>>>>>>>> get
> >>> >>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>> Job
> >>> >>>>>>>>>>>>>>>>>> graph because the jar should reside ONLY on
> >>> >> the cluster
> >>> >>>>>>>>>> (and
> >>> >>>>>>>>>>>> not in
> >>> >>>>>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>>>> client classpath otherwise there are always
> >>> >>>>>>>> inconsistencies
> >>> >>>>>>>>>>>> between
> >>> >>>>>>>>>>>>>>>>> client
> >>> >>>>>>>>>>>>>>>>>> and Flink Job manager's classpath).
> >>> >>>>>>>>>>>>>>>>>> In the YARN, Mesos and Kubernetes scenarios
> >>> >> you have
> >>> >>>>>>>> the
> >>> >>>>>>>>>> jar
> >>> >>>>>>>>>>>> but
> >>> >>>>>>>>>>>>>> you
> >>> >>>>>>>>>>>>>>>>> could
> >>> >>>>>>>>>>>>>>>>>> start a cluster that has the jar on the Job
> >>> >> Manager as
> >>> >>>>>>>> well
> >>> >>>>>>>>>>>> (but
> >>> >>>>>>>>>>>>>> this
> >>> >>>>>>>>>>>>>>>> is
> >>> >>>>>>>>>>>>>>>>>> the only case where I think you can assume
> >>> >> that the
> >>> >>>>>>>> client
> >>> >>>>>>>>>>> has
> >>> >>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>> jar
> >>> >>>>>>>>>>>>>>>> on
> >>> >>>>>>>>>>>>>>>>>> the classpath..in the REST job submission
> >>> >> you don't
> >>> >>>>>>>> have
> >>> >>>>>>>>>> any
> >>> >>>>>>>>>>>>>>>> classpath).
> >>> >>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>> Thus, always in my opinion, the JobGraph
> >>> >> should be
> >>> >>>>>>>>>> generated
> >>> >>>>>>>>>>>> by the
> >>> >>>>>>>>>>>>>>> Job
> >>> >>>>>>>>>>>>>>>>>> Manager REST API.
> >>> >>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <
> >>> >>>>>>>>>>>> wander4096@gmail.com>
> >>> >>>>>>>>>>>>>>>> wrote:
> >>> >>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>> I would like to involve Till & Stephan here
> >>> >> to clarify
> >>> >>>>>>>>>> some
> >>> >>>>>>>>>>>>>> concept
> >>> >>>>>>>>>>>>>>> of
> >>> >>>>>>>>>>>>>>>>>>> per-job mode.
> >>> >>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>> The term per-job is one of modes a cluster
> >>> >> could run
> >>> >>>>>>>> on.
> >>> >>>>>>>>>> It
> >>> >>>>>>>>>>> is
> >>> >>>>>>>>>>>>>>> mainly
> >>> >>>>>>>>>>>>>>>>>>> aimed
> >>> >>>>>>>>>>>>>>>>>>> at spawning
> >>> >>>>>>>>>>>>>>>>>>> a dedicated cluster for a specific job
> >>> >> while the job
> >>> >>>>>>>> could
> >>> >>>>>>>>>>> be
> >>> >>>>>>>>>>>>>>> packaged
> >>> >>>>>>>>>>>>>>>>>>> with
> >>> >>>>>>>>>>>>>>>>>>> Flink
> >>> >>>>>>>>>>>>>>>>>>> itself and thus the cluster is initialized with the
> >>> >>>>>>>>>>>>>>>>>>> job, getting rid of a separate submission step.
> >>> >>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>> This is useful for container deployments
> >>> >> where one
> >>> >>>>>>>> create
> >>> >>>>>>>>>>> his
> >>> >>>>>>>>>>>>>> image
> >>> >>>>>>>>>>>>>>>> with
> >>> >>>>>>>>>>>>>>>>>>> the job
> >>> >>>>>>>>>>>>>>>>>>> and then simply deploy the container.
> >>> >>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>> However, it is out of client scope, since a
> >>> >>>>>>>>>>>>>>>>>>> client (ClusterClient, for example) is for
> >>> >>>>>>>>>>>>>>>>>>> communicating with an existing cluster and
> >>> >>>>>>>>>>>>>>>>>>> performing actions.
> >>> >>>>>>>>>>>>>>>> Currently,
> >>> >>>>>>>>>>>>>>>>>>> in
> >>> >>>>>>>>>>>>>>>>>>> per-job
> >>> >>>>>>>>>>>>>>>>>>> mode, we extract the job graph and bundle
> >>> >> it into
> >>> >>>>>>>> cluster
> >>> >>>>>>>>>>>>>> deployment
> >>> >>>>>>>>>>>>>>>> and
> >>> >>>>>>>>>>>>>>>>>>> thus no
> >>> >>>>>>>>>>>>>>>>>>> concept of client get involved. It looks
> >>> >> like
> >>> >>>>>>>> reasonable
> >>> >>>>>>>>>> to
> >>> >>>>>>>>>>>>>> exclude
> >>> >>>>>>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>>>>> deployment
> >>> >>>>>>>>>>>>>>>>>>> of per-job cluster from client api and use
> >>> >> dedicated
> >>> >>>>>>>>>> utility
> >>> >>>>>>>>>>>>>>>>>>> classes(deployers) for
> >>> >>>>>>>>>>>>>>>>>>> deployment.
> >>> >>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>> Zili Chen <wa...@gmail.com>
> >>> >> 于2019年8月20日周二
> >>> >>>>>>>> 下午12:37写道:
> >>> >>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>> Hi Aljoscha,
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>> Thanks for your reply and participation. The Google
> >>> >>>>>>>>>>>>>>>>>>>> Doc you linked to requires permission; I think you
> >>> >>>>>>>>>>>>>>>>>>>> could use a share link instead.
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>> I agree that we almost reach a consensus that
> >>> >>>>>>>>>>>>>>>>>>>> JobClient is necessary
> >>> >>>>>>>>>>>>>>>>>>>> to interact with a running Job.
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>> Let me check your open questions one by
> >>> >> one.
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>> 1. Separate cluster creation and job
> >>> >> submission for
> >>> >>>>>>>>>>> per-job
> >>> >>>>>>>>>>>>>> mode.
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>> As you mentioned here is where the
> >>> >> opinions
> >>> >>>>>>>> diverge. In
> >>> >>>>>>>>>> my
> >>> >>>>>>>>>>>>>>> document
> >>> >>>>>>>>>>>>>>>>>>> there
> >>> >>>>>>>>>>>>>>>>>>>> is
> >>> >>>>>>>>>>>>>>>>>>>> an alternative[2] that proposes excluding
> >>> >> per-job
> >>> >>>>>>>>>>> deployment
> >>> >>>>>>>>>>>>>> from
> >>> >>>>>>>>>>>>>>>>> client
> >>> >>>>>>>>>>>>>>>>>>>> api
> >>> >>>>>>>>>>>>>>>>>>>> scope and now I find it is more
> >>> >> reasonable we do the
> >>> >>>>>>>>>>>> exclusion.
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>> When in per-job mode, a dedicated
> >>> >> JobCluster is
> >>> >>>>>>>> launched
> >>> >>>>>>>>>>> to
> >>> >>>>>>>>>>>>>>> execute
> >>> >>>>>>>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>>>>>> specific job. It is like a Flink
> >>> >> Application more
> >>> >>>>>>>> than a
> >>> >>>>>>>>>>>>>>> submission
> >>> >>>>>>>>>>>>>>>>>>>> of Flink Job. Client only takes care of
> >>> >> job
> >>> >>>>>>>> submission
> >>> >>>>>>>>>> and
> >>> >>>>>>>>>>>>>> assume
> >>> >>>>>>>>>>>>>>>>> there
> >>> >>>>>>>>>>>>>>>>>>> is
> >>> >>>>>>>>>>>>>>>>>>>> an existing cluster. In this way we are
> >>> >> able to
> >>> >>>>>>>> consider
> >>> >>>>>>>>>>>> per-job
> >>> >>>>>>>>>>>>>>>>> issues
> >>> >>>>>>>>>>>>>>>>>>>> individually and JobClusterEntrypoint
> >>> >> would be the
> >>> >>>>>>>>>> utility
> >>> >>>>>>>>>>>> class
> >>> >>>>>>>>>>>>>>> for
> >>> >>>>>>>>>>>>>>>>>>>> per-job
> >>> >>>>>>>>>>>>>>>>>>>> deployment.
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>> Nevertheless, user program works in both
> >>> >> session
> >>> >>>>>>>> mode
> >>> >>>>>>>>>> and
> >>> >>>>>>>>>>>>>> per-job
> >>> >>>>>>>>>>>>>>>> mode
> >>> >>>>>>>>>>>>>>>>>>>> without
> >>> >>>>>>>>>>>>>>>>>>>> necessary to change code. JobClient in
> >>> >> per-job mode
> >>> >>>>>>>> is
> >>> >>>>>>>>>>>> returned
> >>> >>>>>>>>>>>>>>> from
> >>> >>>>>>>>>>>>>>>>>>>> env.execute as normal. However, it would
> >>> >> be no
> >>> >>>>>>>> longer a
> >>> >>>>>>>>>>>> wrapper
> >>> >>>>>>>>>>>>>> of
> >>> >>>>>>>>>>>>>>>>>>>> RestClusterClient but a wrapper of
> >>> >>>>>>>> PerJobClusterClient
> >>> >>>>>>>>>>> which
> >>> >>>>>>>>>>>>>>>>>>> communicates
> >>> >>>>>>>>>>>>>>>>>>>> to Dispatcher locally.
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>> 2. How to deal with plan preview.
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>> With env.compile functions users can get a JobGraph
> >>> >>>>>>>>>>>>>>>>>>>> or FlinkPlan and thus preview the plan
> >>> >>>>>>>>>>>>>>>>>>>> programmatically. Typically it looks like
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>> if (preview configured) {
> >>> >>>>>>>>>>>>>>>>>>>>    FlinkPlan plan = env.compile();
> >>> >>>>>>>>>>>>>>>>>>>>    new JSONDumpGenerator(...).dump(plan);
> >>> >>>>>>>>>>>>>>>>>>>> } else {
> >>> >>>>>>>>>>>>>>>>>>>>    env.execute();
> >>> >>>>>>>>>>>>>>>>>>>> }
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>> And `flink info` would no longer be valid.
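[Editor's note] Spelled out with toy stand-ins, the preview branch above could look like the sketch below. `Env`, `FlinkPlan`, and the JSON dump are placeholders, since `env.compile()` is only a proposal in this thread.

```java
// Toy stand-in for a compiled plan; real FlinkPlan is far richer.
class FlinkPlan {
    final String json;

    FlinkPlan(String json) {
        this.json = json;
    }
}

// Toy environment illustrating the compile-vs-execute branch.
class Env {
    private final boolean previewConfigured;

    Env(boolean previewConfigured) {
        this.previewConfigured = previewConfigured;
    }

    // Builds the plan without submitting anything to a cluster.
    FlinkPlan compile() {
        return new FlinkPlan("{\"nodes\":[]}");
    }

    // Would actually submit the job; here it only reports that it did.
    String execute() {
        return "submitted";
    }

    // The user-side branch from the snippet above: dump the plan or run.
    String run() {
        if (previewConfigured) {
            return compile().json;
        }
        return execute();
    }
}
```

The point is that the preview decision moves into user code, so no tooling needs to hijack `execute()` to extract the plan.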
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>> 3. How to deal with Jar Submission at the
> >>> >> Web
> >>> >>>>>>>> Frontend.
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>> There is one more thread talked on this
> >>> >> topic[1].
> >>> >>>>>>>> Apart
> >>> >>>>>>>>>>> from
> >>> >>>>>>>>>>>>>>>> removing
> >>> >>>>>>>>>>>>>>>>>>>> the functions there are two alternatives.
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>> One is to introduce an interface that has a method
> >>> >>>>>>>>>>>>>>>>>>>> returning a JobGraph/FlinkPlan, and Jar Submission
> >>> >>>>>>>>>>>>>>>>>>>> would only support main classes implementing this
> >>> >>>>>>>>>>>>>>>>>>>> interface.
> >>> >>>>>>>>>>>>>>>>>>>> And then extract the JobGraph/FlinkPlan
> >>> >> just by
> >>> >>>>>>>> calling
> >>> >>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>> method.
> >>> >>>>>>>>>>>>>>>>>>>> In this way, it is even possible to
> >>> >> consider a
> >>> >>>>>>>>>> separation
> >>> >>>>>>>>>>>> of job
> >>> >>>>>>>>>>>>>>>>>>> creation
> >>> >>>>>>>>>>>>>>>>>>>> and job submission.
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>> The other is, as you mentioned, let
> >>> >> execute() do the
> >>> >>>>>>>>>>> actual
> >>> >>>>>>>>>>>>>>>> execution.
> >>> >>>>>>>>>>>>>>>>>>>> We won't execute the main method in the
> >>> >> WebFrontend
> >>> >>>>>>>> but
> >>> >>>>>>>>>>>> spawn a
> >>> >>>>>>>>>>>>>>>>> process
> >>> >>>>>>>>>>>>>>>>>>>> at WebMonitor side to execute. For return
> >>> >> part we
> >>> >>>>>>>> could
> >>> >>>>>>>>>>>> generate
> >>> >>>>>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>>>>>> JobID from WebMonitor and pass it to the
> >>> >> execution
> >>> >>>>>>>>>>>> environment.
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>> 4. How to deal with detached mode.
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>> I think detached mode is a temporary
> >>> >> solution for
> >>> >>>>>>>>>>>> non-blocking
> >>> >>>>>>>>>>>>>>>>>>> submission.
> >>> >>>>>>>>>>>>>>>>>>>> In my document both submission and
> >>> >> execution return
> >>> >>>>>>>> a
> >>> >>>>>>>>>>>>>>>>> CompletableFuture
> >>> >>>>>>>>>>>>>>>>>>> and
> >>> >>>>>>>>>>>>>>>>>>>> users control whether or not wait for the
> >>> >> result. In
> >>> >>>>>>>>>> this
> >>> >>>>>>>>>>>> point
> >>> >>>>>>>>>>>>>> we
> >>> >>>>>>>>>>>>>>>>> don't
> >>> >>>>>>>>>>>>>>>>>>>> need a detached option but the
> >>> >> functionality is
> >>> >>>>>>>> covered.
> >>> >>>>>>>>>>>>>>>>>>>>
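[Editor's note] The argument that a future-returning submission subsumes the detached flag can be made concrete with a small sketch; `Submission`, `Caller`, and the string job graph are assumptions for illustration only.

```java
import java.util.concurrent.CompletableFuture;

// Sketch: a single future-returning execute() covers both behaviours.
// "Detached" is simply not blocking on the returned future.
class Submission {
    static CompletableFuture<String> execute(String jobGraph) {
        // Stand-in for an asynchronous submission to a cluster.
        return CompletableFuture.supplyAsync(() -> "result-of-" + jobGraph);
    }
}

class Caller {
    // Attached behaviour: block until the job result is available.
    static String attached(String jobGraph) {
        return Submission.execute(jobGraph).join();
    }

    // Detached behaviour: fire the submission and return immediately;
    // the caller may still decide to wait later, or never.
    static CompletableFuture<String> detached(String jobGraph) {
        return Submission.execute(jobGraph);
    }
}
```

Under this model the detached/non-detached distinction disappears from the API and becomes a choice at the call site.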
> >>> >>>>>>>>>>>>>>>>>>>> 5. How does per-job mode interact with
> >>> >> interactive
> >>> >>>>>>>>>>>> programming.
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>> All of YARN, Mesos and Kubernetes
> >>> >> scenarios follow
> >>> >>>>>>>> the
> >>> >>>>>>>>>>>> pattern
> >>> >>>>>>>>>>>>>>>> launch
> >>> >>>>>>>>>>>>>>>>> a
> >>> >>>>>>>>>>>>>>>>>>>> JobCluster now. And I don't think there
> >>> >> would be
> >>> >>>>>>>>>>>> inconsistency
> >>> >>>>>>>>>>>>>>>> between
> >>> >>>>>>>>>>>>>>>>>>>> different resource management.
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>> Best,
> >>> >>>>>>>>>>>>>>>>>>>> tison.
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>> [1]
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>
> >>> >>>>>>>>>>>
> >>> >>>>>>>>>>
> >>> >>>>>>>>
> >>> >>
> https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
> >>> >>>>>>>>>>>>>>>>>>>> [2]
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>
> >>> >>>>>>>>>>>
> >>> >>>>>>>>>>
> >>> >>>>>>>>
> >>> >>
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>> Aljoscha Krettek <al...@apache.org>
> >>> >>>>>>>> 于2019年8月16日周五
> >>> >>>>>>>>>>>> 下午9:20写道:
> >>> >>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>> Hi,
> >>> >>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>> I read both Jeff's initial design
> >>> >> document and the
> >>> >>>>>>>> newer
> >>> >>>>>>>>>>>>>> document
> >>> >>>>>>>>>>>>>>> by
> >>> >>>>>>>>>>>>>>>>>>>>> Tison. I also finally found the time to
> >>> >> collect our
> >>> >>>>>>>>>>>> thoughts on
> >>> >>>>>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>>>>> issue,
> >>> >>>>>>>>>>>>>>>>>>>>> I had quite some discussions with Kostas
> >>> >> and this
> >>> >>>>>>>> is
> >>> >>>>>>>>>> the
> >>> >>>>>>>>>>>>>> result:
> >>> >>>>>>>>>>>>>>>> [1].
> >>> >>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>> I think overall we agree that this part
> >>> >> of the
> >>> >>>>>>>> code is
> >>> >>>>>>>>>> in
> >>> >>>>>>>>>>>> dire
> >>> >>>>>>>>>>>>>>> need
> >>> >>>>>>>>>>>>>>>>> of
> >>> >>>>>>>>>>>>>>>>>>>>> some refactoring/improvements but I
> >>> >> think there are
> >>> >>>>>>>>>> still
> >>> >>>>>>>>>>>> some
> >>> >>>>>>>>>>>>>>> open
> >>> >>>>>>>>>>>>>>>>>>>>> questions and some differences in
> >>> >> opinion what
> >>> >>>>>>>> those
> >>> >>>>>>>>>>>>>> refactorings
> >>> >>>>>>>>>>>>>>>>>>> should
> >>> >>>>>>>>>>>>>>>>>>>>> look like.
> >>> >>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>> I think the API-side is quite clear,
> >>> >> i.e. we need
> >>> >>>>>>>> some
> >>> >>>>>>>>>>>>>> JobClient
> >>> >>>>>>>>>>>>>>>> API
> >>> >>>>>>>>>>>>>>>>>>> that
> >>> >>>>>>>>>>>>>>>>>>>>> allows interacting with a running Job.
> >>> >> It could be
> >>> >>>>>>>>>>>> worthwhile
> >>> >>>>>>>>>>>>>> to
> >>> >>>>>>>>>>>>>>>> spin
> >>> >>>>>>>>>>>>>>>>>>> that
> >>> >>>>>>>>>>>>>>>>>>>>> off into a separate FLIP because we can
> >>> >> probably
> >>> >>>>>>>> find
> >>> >>>>>>>>>>>> consensus
> >>> >>>>>>>>>>>>>>> on
> >>> >>>>>>>>>>>>>>>>> that
> >>> >>>>>>>>>>>>>>>>>>>>> part more easily.
> >>> >>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>> For the rest, the main open questions
> >>> >> from our doc
> >>> >>>>>>>> are
> >>> >>>>>>>>>>>> these:
> >>> >>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>  - Do we want to separate cluster
> >>> >> creation and job
> >>> >>>>>>>>>>>> submission
> >>> >>>>>>>>>>>>>>> for
> >>> >>>>>>>>>>>>>>>>>>>>> per-job mode? In the past, there were
> >>> >> conscious
> >>> >>>>>>>> efforts
> >>> >>>>>>>>>>> to
> >>> >>>>>>>>>>>>>> *not*
> >>> >>>>>>>>>>>>>>>>>>> separate
> >>> >>>>>>>>>>>>>>>>>>>>> job submission from cluster creation for
> >>> >> per-job
> >>> >>>>>>>>>> clusters
> >>> >>>>>>>>>>>> for
> >>> >>>>>>>>>>>>>>>> Mesos,
> >>> >>>>>>>>>>>>>>>>>>> YARN,
> >>> >>>>>>>>>>>>>>>>>>>>> Kubernets (see
> >>> >> StandaloneJobClusterEntryPoint).
> >>> >>>>>>>> Tison
> >>> >>>>>>>>>>>> suggests
> >>> >>>>>>>>>>>>>> in
> >>> >>>>>>>>>>>>>>>> his
> >>> >>>>>>>>>>>>>>>>>>>>> design document to decouple this in
> >>> >> order to unify
> >>> >>>>>>>> job
> >>> >>>>>>>>>>>>>>> submission.
> >>> >>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>  - How to deal with plan preview, which
> >>> >> needs to
> >>> >>>>>>>>>> hijack
> >>> >>>>>>>>>>>>>>> execute()
> >>> >>>>>>>>>>>>>>>>> and
> >>> >>>>>>>>>>>>>>>>>>>>> let the outside code catch an exception?
> >>> >>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>  - How to deal with Jar Submission at
> >>> >> the Web
> >>> >>>>>>>>>> Frontend,
> >>> >>>>>>>>>>>> which
> >>> >>>>>>>>>>>>>>>> needs
> >>> >>>>>>>>>>>>>>>>> to
> >>> >>>>>>>>>>>>>>>>>>>>> hijack execute() and let the outside
> >>> >> code catch an
> >>> >>>>>>>>>>>> exception?
> >>> >>>>>>>>>>>>>>>>>>>>> CliFrontend.run() “hijacks”
> >>> >>>>>>>>>>> ExecutionEnvironment.execute()
> >>> >>>>>>>>>>>> to
> >>> >>>>>>>>>>>>>>> get a
> >>> >>>>>>>>>>>>>>>>>>>>> JobGraph and then execute that JobGraph
> >>> >> manually.
> >>> >>>>>>>> We
> >>> >>>>>>>>>>> could
> >>> >>>>>>>>>>>> get
> >>> >>>>>>>>>>>>>>>> around
> >>> >>>>>>>>>>>>>>>>>>> that
> >>> >>>>>>>>>>>>>>>>>>>>> by letting execute() do the actual
> >>> >> execution. One
> >>> >>>>>>>>>> caveat
> >>> >>>>>>>>>>>> for
> >>> >>>>>>>>>>>>>> this
> >>> >>>>>>>>>>>>>>>> is
> >>> >>>>>>>>>>>>>>>>>>> that
> >>> >>>>>>>>>>>>>>>>>>>>> now the main() method doesn’t return (or
> >>> >> is forced
> >>> >>>>>>>> to
> >>> >>>>>>>>>>>> return by
> >>> >>>>>>>>>>>>>>>>>>> throwing an
> >>> >>>>>>>>>>>>>>>>>>>>> exception from execute()) which means
> >>> >> that for Jar
> >>> >>>>>>>>>>>> Submission
> >>> >>>>>>>>>>>>>>> from
> >>> >>>>>>>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>>>>>>> WebFrontend we have a long-running
> >>> >> main() method
> >>> >>>>>>>>>> running
> >>> >>>>>>>>>>>> in the
> >>> >>>>>>>>>>>>>>>>>>>>> WebFrontend. This doesn’t sound very
> >>> >> good. We
> >>> >>>>>>>> could get
> >>> >>>>>>>>>>>> around
> >>> >>>>>>>>>>>>>>> this
> >>> >>>>>>>>>>>>>>>>> by
> >>> >>>>>>>>>>>>>>>>>>>>> removing the plan preview feature and by
> >>> >> removing
> >>> >>>>>>>> Jar
> >>> >>>>>>>>>>>>>>>>>>> Submission/Running.
> >>> >>>>>>>>>>>>>>>>>>>>>
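The execute() "hijack" described above (capture the plan, then abort the user's main() with an exception the caller catches) can be sketched like this. This is an illustrative toy under assumed names, not the actual Flink classes:

```java
// Illustrative sketch of the execute() "hijack": the preview environment
// records the plan, then aborts the user's main() by throwing, and the
// outside code catches that exception. All names here are hypothetical.
class ProgramAbortException extends RuntimeException {}

class PreviewEnvironment {
    String capturedPlan;

    // Stand-in for ExecutionEnvironment.execute(): capture, then abort.
    void execute(String plan) {
        capturedPlan = plan;
        throw new ProgramAbortException();
    }
}

class PlanPreviewer {
    // Runs the user's main() and swallows the expected abort.
    static String preview(PreviewEnvironment env, Runnable userMain) {
        try {
            userMain.run();
        } catch (ProgramAbortException expected) {
            // expected: main() never runs past execute()
        }
        return env.capturedPlan;
    }
}
```

The caveat raised in the mail follows directly from this sketch: if execute() instead performs the real execution, main() no longer aborts early and the caller loses this interception point.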
> >>> >>>>>>>>>>>>>>>>>>>>>  - How to deal with detached mode?
> >>> >> Right now,
> >>> >>>>>>>>>>>>>>> DetachedEnvironment
> >>> >>>>>>>>>>>>>>>>> will
> >>> >>>>>>>>>>>>>>>>>>>>> execute the job and return immediately.
> >>> >> If users
> >>> >>>>>>>>>> control
> >>> >>>>>>>>>>>> when
> >>> >>>>>>>>>>>>>>> they
> >>> >>>>>>>>>>>>>>>>>>> want to
> >>> >>>>>>>>>>>>>>>>>>>>> return, by waiting on the job completion
> >>> >> future,
> >>> >>>>>>>> how do
> >>> >>>>>>>>>>> we
> >>> >>>>>>>>>>>> deal
> >>> >>>>>>>>>>>>>>>> with
> >>> >>>>>>>>>>>>>>>>>>> this?
> >>> >>>>>>>>>>>>>>>>>>>>> Do we simply remove the distinction
> >>> >> between
> >>> >>>>>>>>>>>>>>> detached/non-detached?
> >>> >>>>>>>>>>>>>>>>>>>>>
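Removing the detached/non-detached distinction, as asked above, could look like the following sketch: execute() always returns immediately with a handle, and the caller decides whether to wait. Names are assumptions, not Flink's actual API:

```java
import java.util.concurrent.CompletableFuture;

// Sketch: one execute() for both modes. "Detached" means the caller simply
// does not wait on the handle; "attached" means it blocks on the future.
class JobHandle {
    final CompletableFuture<String> result = new CompletableFuture<>();

    // Attached behaviour: block until the job finishes.
    String awaitResult() {
        return result.join();
    }
}

class SketchEnvironment {
    JobHandle execute(String jobName) {
        JobHandle handle = new JobHandle();
        // Simulate the cluster completing the job asynchronously.
        CompletableFuture.runAsync(() -> handle.result.complete("FINISHED:" + jobName));
        return handle;
    }
}
```

Under this model the detached/non-detached split disappears from the environment entirely and becomes a choice made at each call site.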
> >>> >>>>>>>>>>>>>>>>>>>>>  - How does per-job mode interact with
> >>> >>>>>>>> “interactive
> >>> >>>>>>>>>>>>>> programming”
> >>> >>>>>>>>>>>>>>>>>>>>> (FLIP-36). For YARN, each execute() call
> >>> >> could
> >>> >>>>>>>> spawn a
> >>> >>>>>>>>>>> new
> >>> >>>>>>>>>>>>>> Flink
> >>> >>>>>>>>>>>>>>>> YARN
> >>> >>>>>>>>>>>>>>>>>>>>> cluster. What about Mesos and Kubernetes?
> >>> >>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>> The first open question is where the
> >>> >> opinions
> >>> >>>>>>>> diverge,
> >>> >>>>>>>>>> I
> >>> >>>>>>>>>>>> think.
> >>> >>>>>>>>>>>>>>> The
> >>> >>>>>>>>>>>>>>>>>>> rest
> >>> >>>>>>>>>>>>>>>>>>>>> are just open questions and interesting
> >>> >> things
> >>> >>>>>>>> that we
> >>> >>>>>>>>>>>> need to
> >>> >>>>>>>>>>>>>>>>>>> consider.
> >>> >>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>> Best,
> >>> >>>>>>>>>>>>>>>>>>>>> Aljoscha
> >>> >>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>> [1]
> >>> >>>>>>>>>>>>>>>>>>>>>
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> >>> >>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>> On 31. Jul 2019, at 15:23, Jeff Zhang <
> >>> >>>>>>>>>>> zjffdu@gmail.com>
> >>> >>>>>>>>>>>>>>> wrote:
> >>> >>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>> Thanks tison for the effort. I left a
> >>> >> few
> >>> >>>>>>>> comments.
> >>> >>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>> Zili Chen <wa...@gmail.com>
> >>> >> 于2019年7月31日周三
> >>> >>>>>>>>>>> 下午8:24写道:
> >>> >>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>> Hi Flavio,
> >>> >>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>> Thanks for your reply.
> >>> >>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>> Either current impl and in the design,
> >>> >>>>>>>> ClusterClient
> >>> >>>>>>>>>>>>>>>>>>>>>>> never takes responsibility for
> >>> >> generating
> >>> >>>>>>>> JobGraph.
> >>> >>>>>>>>>>>>>>>>>>>>>>> (what you see in current codebase is
> >>> >> several
> >>> >>>>>>>> class
> >>> >>>>>>>>>>>> methods)
> >>> >>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>> Instead, user describes his program
> >>> >> in the main
> >>> >>>>>>>>>> method
> >>> >>>>>>>>>>>>>>>>>>>>>>> with ExecutionEnvironment apis and
> >>> >> calls
> >>> >>>>>>>>>> env.compile()
> >>> >>>>>>>>>>>>>>>>>>>>>>> or env.optimize() to get FlinkPlan
> >>> >> and JobGraph
> >>> >>>>>>>>>>>>>> respectively.
> >>> >>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>> For listing main classes in a jar and
> >>> >> choose
> >>> >>>>>>>> one for
> >>> >>>>>>>>>>>>>>>>>>>>>>> submission, you're now able to
> >>> >> customize a CLI
> >>> >>>>>>>> to do
> >>> >>>>>>>>>>> it.
> >>> >>>>>>>>>>>>>>>>>>>>>>> Specifically, the path of jar is
> >>> >> passed as
> >>> >>>>>>>> arguments
> >>> >>>>>>>>>>> and
> >>> >>>>>>>>>>>>>>>>>>>>>>> in the customized CLI you list main
> >>> >> classes,
> >>> >>>>>>>> choose
> >>> >>>>>>>>>>> one
> >>> >>>>>>>>>>>>>>>>>>>>>>> to submit to the cluster.
> >>> >>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>> Best,
> >>> >>>>>>>>>>>>>>>>>>>>>>> tison.
> >>> >>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>> Flavio Pompermaier <
> >>> >> pompermaier@okkam.it>
> >>> >>>>>>>>>>> 于2019年7月31日周三
> >>> >>>>>>>>>>>>>>>> 下午8:12写道:
> >>> >>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>> Just one note on my side: it is not
> >>> >> clear to me
> >>> >>>>>>>>>>>> whether the
> >>> >>>>>>>>>>>>>>>>> client
> >>> >>>>>>>>>>>>>>>>>>>>> needs
> >>> >>>>>>>>>>>>>>>>>>>>>>> to
> >>> >>>>>>>>>>>>>>>>>>>>>>>> be able to generate a job graph or
> >>> >> not.
> >>> >>>>>>>>>>>>>>>>>>>>>>>> In my opinion, the job jar must
> >>> >> resides only
> >>> >>>>>>>> on the
> >>> >>>>>>>>>>>>>>>>>>> server/jobManager
> >>> >>>>>>>>>>>>>>>>>>>>>>> side
> >>> >>>>>>>>>>>>>>>>>>>>>>>> and the client requires a way to get
> >>> >> the job
> >>> >>>>>>>> graph.
> >>> >>>>>>>>>>>>>>>>>>>>>>>> If you really want to access to the
> >>> >> job graph,
> >>> >>>>>>>> I'd
> >>> >>>>>>>>>>> add
> >>> >>>>>>>>>>>> a
> >>> >>>>>>>>>>>>>>>>> dedicated
> >>> >>>>>>>>>>>>>>>>>>>>> method
> >>> >>>>>>>>>>>>>>>>>>>>>>>> on the ClusterClient. like:
> >>> >>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>  - getJobGraph(jarId, mainClass):
> >>> >> JobGraph
> >>> >>>>>>>>>>>>>>>>>>>>>>>>  - listMainClasses(jarId):
> >>> >> List<String>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>> These would require some addition
> >>> >> also on the
> >>> >>>>>>>> job
> >>> >>>>>>>>>>>> manager
> >>> >>>>>>>>>>>>>>>>> endpoint
> >>> >>>>>>>>>>>>>>>>>>> as
> >>> >>>>>>>>>>>>>>>>>>>>>>>> well... what do you think?
> >>> >>>>>>>>>>>>>>>>>>>>>>>>
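The two methods proposed above can be made concrete with an in-memory stub. The method names come from the mail; the backing map and the String-typed "job graph" are stand-ins for illustration, not Flink code:

```java
import java.util.*;

// In-memory stub of the proposed ClusterClient extension, showing the
// suggested contract. Storage and return types are toy assumptions.
interface JarAwareClient {
    List<String> listMainClasses(String jarId);
    String getJobGraph(String jarId, String mainClass);
}

class StubJarAwareClient implements JarAwareClient {
    private final Map<String, List<String>> mainClassesByJar = new HashMap<>();

    void registerJar(String jarId, List<String> mainClasses) {
        mainClassesByJar.put(jarId, mainClasses);
    }

    @Override
    public List<String> listMainClasses(String jarId) {
        return mainClassesByJar.getOrDefault(jarId, Collections.emptyList());
    }

    @Override
    public String getJobGraph(String jarId, String mainClass) {
        if (!listMainClasses(jarId).contains(mainClass)) {
            throw new IllegalArgumentException("Unknown main class: " + mainClass);
        }
        return "JobGraph[" + jarId + "#" + mainClass + "]";
    }
}
```

As the mail notes, a real version would need matching endpoints on the JobManager side, since the jar lives on the server in this design.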
> >>> >>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jul 31, 2019 at 12:42 PM
> >>> >> Zili Chen <
> >>> >>>>>>>>>>>>>>>> wander4096@gmail.com
> >>> >>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>> >>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> Here is a document[1] on client api
> >>> >>>>>>>> enhancement
> >>> >>>>>>>>>> from
> >>> >>>>>>>>>>>> our
> >>> >>>>>>>>>>>>>>>>>>> perspective.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> We have investigated current
> >>> >> implementations.
> >>> >>>>>>>> And
> >>> >>>>>>>>>> we
> >>> >>>>>>>>>>>>>> propose
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> 1. Unify the implementation of
> >>> >> cluster
> >>> >>>>>>>> deployment
> >>> >>>>>>>>>>> and
> >>> >>>>>>>>>>>> job
> >>> >>>>>>>>>>>>>>>>>>> submission
> >>> >>>>>>>>>>>>>>>>>>>>> in
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> Flink.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> 2. Provide programmatic interfaces
> >>> >> to allow
> >>> >>>>>>>>>> flexible
> >>> >>>>>>>>>>>> job
> >>> >>>>>>>>>>>>>> and
> >>> >>>>>>>>>>>>>>>>>>> cluster
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> management.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> The first proposal is aimed at
> >>> >> reducing code
> >>> >>>>>>>> paths
> >>> >>>>>>>>>>> of
> >>> >>>>>>>>>>>>>>> cluster
> >>> >>>>>>>>>>>>>>>>>>>>>>> deployment
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> and
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> job submission so that one can
> >>> >> adopt Flink in
> >>> >>>>>>>> his
> >>> >>>>>>>>>>>> usage
> >>> >>>>>>>>>>>>>>>> easily.
> >>> >>>>>>>>>>>>>>>>>>> The
> >>> >>>>>>>>>>>>>>>>>>>>>>>> second
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> proposal is aimed at providing rich
> >>> >>>>>>>> interfaces for
> >>> >>>>>>>>>>>>>> advanced
> >>> >>>>>>>>>>>>>>>>> users
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> who want to make accurate control
> >>> >> of these
> >>> >>>>>>>> stages.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> Quick reference on open questions:
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> 1. Exclude job cluster deployment
> >>> >> from client
> >>> >>>>>>>> side
> >>> >>>>>>>>>>> or
> >>> >>>>>>>>>>>>>>> redefine
> >>> >>>>>>>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>>>>>>>>>> semantic
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> of job cluster? Since it fits in a
> >>> >> process
> >>> >>>>>>>> quite
> >>> >>>>>>>>>>>> different
> >>> >>>>>>>>>>>>>>>> from
> >>> >>>>>>>>>>>>>>>>>>>>> session
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> cluster deployment and job
> >>> >> submission.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> 2. Maintain the codepaths handling
> >>> >> class
> >>> >>>>>>>>>>>>>>>>> o.a.f.api.common.Program
> >>> >>>>>>>>>>>>>>>>>>> or
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> implement customized program
> >>> >> handling logic by
> >>> >>>>>>>>>>>> customized
> >>> >>>>>>>>>>>>>>>>>>>>> CliFrontend?
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> See also this thread[2] and the
> >>> >> document[1].
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> 3. Expose ClusterClient as public
> >>> >> api or just
> >>> >>>>>>>>>> expose
> >>> >>>>>>>>>>>> api
> >>> >>>>>>>>>>>>>> in
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> and delegate them to ClusterClient?
> >>> >> Further,
> >>> >>>>>>>> in
> >>> >>>>>>>>>>>> either way
> >>> >>>>>>>>>>>>>>> is
> >>> >>>>>>>>>>>>>>>> it
> >>> >>>>>>>>>>>>>>>>>>>>> worth
> >>> >>>>>>>>>>>>>>>>>>>>>>> to
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> introduce a JobClient which is an
> >>> >>>>>>>> encapsulation of
> >>> >>>>>>>>>>>>>>>> ClusterClient
> >>> >>>>>>>>>>>>>>>>>>> that
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> associated to specific job?
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> tison.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> [1]
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> [2]
> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
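Open question 3 above (a JobClient encapsulating a ClusterClient, bound to one job) can be sketched as follows. The interface and method names are illustrative assumptions, not the eventual Flink API:

```java
import java.util.concurrent.CompletableFuture;

// Sketch: a JobClient pins a ClusterClient to one job id, so per-job
// operations need no explicit id at each call site.
interface SketchClusterClient {
    CompletableFuture<String> requestJobStatus(String jobId);
    CompletableFuture<Void> cancelJob(String jobId);
}

class SketchJobClient {
    private final SketchClusterClient cluster;
    private final String jobId;

    SketchJobClient(SketchClusterClient cluster, String jobId) {
        this.cluster = cluster;
        this.jobId = jobId;
    }

    CompletableFuture<String> status() { return cluster.requestJobStatus(jobId); }
    CompletableFuture<Void> cancel() { return cluster.cancelJob(jobId); }
}
```

With such a wrapper, ExecutionEnvironment could hand out a per-job handle without exposing the whole ClusterClient as public API.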
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang <zj...@gmail.com>
> >>> >> 于2019年7月24日周三
> >>> >>>>>>>>>>> 上午9:19写道:
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks Stephan, I will follow up
> >>> >> this issue
> >>> >>>>>>>> in
> >>> >>>>>>>>>> next
> >>> >>>>>>>>>>>> few
> >>> >>>>>>>>>>>>>>>> weeks,
> >>> >>>>>>>>>>>>>>>>>>> and
> >>> >>>>>>>>>>>>>>>>>>>>>>> will
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> refine the design doc. We could
> >>> >> discuss more
> >>> >>>>>>>>>>> details
> >>> >>>>>>>>>>>>>> after
> >>> >>>>>>>>>>>>>>>> 1.9
> >>> >>>>>>>>>>>>>>>>>>>>>>> release.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org>
> >>> >>>>>>>> 于2019年7月24日周三
> >>> >>>>>>>>>>>> 上午12:58写道:
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi all!
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> This thread has stalled for a
> >>> >> bit, which I
> >>> >>>>>>>>>> assume
> >>> >>>>>>>>>>>> ist
> >>> >>>>>>>>>>>>>>> mostly
> >>> >>>>>>>>>>>>>>>>>>> due to
> >>> >>>>>>>>>>>>>>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Flink 1.9 feature freeze and
> >>> >> release testing
> >>> >>>>>>>>>>> effort.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> I personally still recognize this
> >>> >> issue as
> >>> >>>>>>>> one
> >>> >>>>>>>>>>>> important
> >>> >>>>>>>>>>>>>>> to
> >>> >>>>>>>>>>>>>>>> be
> >>> >>>>>>>>>>>>>>>>>>>>>>>> solved.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> I'd
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> be happy to help resume this
> >>> >> discussion soon
> >>> >>>>>>>>>>> (after
> >>> >>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>> 1.9
> >>> >>>>>>>>>>>>>>>>>>>>>>> release)
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> and
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> see if we can do some step
> >>> >> towards this in
> >>> >>>>>>>> Flink
> >>> >>>>>>>>>>>> 1.10.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Stephan
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 10:41 AM
> >>> >> Flavio
> >>> >>>>>>>>>>> Pompermaier
> >>> >>>>>>>>>>>> <
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> pompermaier@okkam.it>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> That's exactly what I suggested
> >>> >> a long time
> >>> >>>>>>>>>> ago:
> >>> >>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>> Flink
> >>> >>>>>>>>>>>>>>>>> REST
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> client
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> should not require any Flink
> >>> >> dependency,
> >>> >>>>>>>> only
> >>> >>>>>>>>>>> http
> >>> >>>>>>>>>>>>>>> library
> >>> >>>>>>>>>>>>>>>> to
> >>> >>>>>>>>>>>>>>>>>>>>>>> call
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> REST
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> services to submit and monitor a
> >>> >> job.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> What I suggested also in [1] was
> >>> >> to have a
> >>> >>>>>>>> way
> >>> >>>>>>>>>> to
> >>> >>>>>>>>>>>>>>>>> automatically
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> suggest
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> user (via a UI) the available
> >>> >> main classes
> >>> >>>>>>>> and
> >>> >>>>>>>>>>>> their
> >>> >>>>>>>>>>>>>>>> required
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> parameters[2].
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Another problem we have with
> >>> >> Flink is that
> >>> >>>>>>>> the
> >>> >>>>>>>>>>> Rest
> >>> >>>>>>>>>>>>>>> client
> >>> >>>>>>>>>>>>>>>>> and
> >>> >>>>>>>>>>>>>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> CLI
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> one
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> behaves differently and we use
> >>> >> the CLI
> >>> >>>>>>>> client
> >>> >>>>>>>>>>> (via
> >>> >>>>>>>>>>>> ssh)
> >>> >>>>>>>>>>>>>>>>> because
> >>> >>>>>>>>>>>>>>>>>>>>>>> it
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> allows
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> to call some other method after
> >>> >>>>>>>> env.execute()
> >>> >>>>>>>>>> [3]
> >>> >>>>>>>>>>>> (we
> >>> >>>>>>>>>>>>>>> have
> >>> >>>>>>>>>>>>>>>> to
> >>> >>>>>>>>>>>>>>>>>>>>>>> call
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> another
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> REST service to signal the end
> >>> >> of the job).
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> In this regard, a dedicated
> >>> >> interface,
> >>> >>>>>>>> like the
> >>> >>>>>>>>>>>>>>> JobListener
> >>> >>>>>>>>>>>>>>>>>>>>>>>> suggested
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> in
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> the previous emails, would be
> >>> >> very helpful
> >>> >>>>>>>>>>> (IMHO).
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
> >>> >>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-10864
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> [2]
> >>> >>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-10862
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> [3]
> >>> >>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-10879
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Flavio
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 9:54 AM
> >>> >> Jeff Zhang
> >>> >>>>>>>> <
> >>> >>>>>>>>>>>>>>>> zjffdu@gmail.com
> >>> >>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, Tison,
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your comments.
> >>> >> Overall I agree
> >>> >>>>>>>> with
> >>> >>>>>>>>>>> you
> >>> >>>>>>>>>>>>>> that
> >>> >>>>>>>>>>>>>>> it
> >>> >>>>>>>>>>>>>>>>> is
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> difficult
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> for
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> down stream project to
> >>> >> integrate with
> >>> >>>>>>>> flink
> >>> >>>>>>>>>> and
> >>> >>>>>>>>>>> we
> >>> >>>>>>>>>>>>>> need
> >>> >>>>>>>>>>>>>>> to
> >>> >>>>>>>>>>>>>>>>>>>>>>>> refactor
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> current flink client api.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> And I agree that CliFrontend
> >>> >> should only
> >>> >>>>>>>>>> parsing
> >>> >>>>>>>>>>>>>> command
> >>> >>>>>>>>>>>>>>>>> line
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> arguments
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> and
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> then pass them to
> >>> >> ExecutionEnvironment.
> >>> >>>>>>>> It is
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment's
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> responsibility to compile job,
> >>> >> create
> >>> >>>>>>>> cluster,
> >>> >>>>>>>>>>> and
> >>> >>>>>>>>>>>>>>> submit
> >>> >>>>>>>>>>>>>>>>> job.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> Besides
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> that, Currently flink has many
> >>> >>>>>>>>>>>> ExecutionEnvironment
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> implementations,
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> and
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> flink will use the specific one
> >>> >> based on
> >>> >>>>>>>> the
> >>> >>>>>>>>>>>> context.
> >>> >>>>>>>>>>>>>>>> IMHO,
> >>> >>>>>>>>>>>>>>>>> it
> >>> >>>>>>>>>>>>>>>>>>>>>>> is
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> not
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> necessary, ExecutionEnvironment
> >>> >> should be
> >>> >>>>>>>> able
> >>> >>>>>>>>>>> to
> >>> >>>>>>>>>>>> do
> >>> >>>>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>>> right
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> thing
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> based
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> on the FlinkConf it is
> >>> >> received. Too many
> >>> >>>>>>>>>>>>>>>>> ExecutionEnvironment
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> implementation is another
> >>> >> burden for
> >>> >>>>>>>>>> downstream
> >>> >>>>>>>>>>>>>> project
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> integration.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> One thing I'd like to mention
> >>> >> is flink's
> >>> >>>>>>>> scala
> >>> >>>>>>>>>>>> shell
> >>> >>>>>>>>>>>>>> and
> >>> >>>>>>>>>>>>>>>> sql
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> client,
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> although they are sub-modules
> >>> >> of flink,
> >>> >>>>>>>> they
> >>> >>>>>>>>>>>> could be
> >>> >>>>>>>>>>>>>>>>> treated
> >>> >>>>>>>>>>>>>>>>>>>>>>> as
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> downstream
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> project which use flink's
> >>> >> client api.
> >>> >>>>>>>>>> Currently
> >>> >>>>>>>>>>>> you
> >>> >>>>>>>>>>>>>> will
> >>> >>>>>>>>>>>>>>>>> find
> >>> >>>>>>>>>>>>>>>>>>>>>>> it
> >>> >>>>>>>>>>>>>>>>>>>>>>>> is
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> not
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> easy for them to integrate with
> >>> >> flink,
> >>> >>>>>>>> they
> >>> >>>>>>>>>>> share
> >>> >>>>>>>>>>>> many
> >>> >>>>>>>>>>>>>>>>>>>>>>> duplicated
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> code.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> It
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> is another sign that we should
> >>> >> refactor
> >>> >>>>>>>> flink
> >>> >>>>>>>>>>>> client
> >>> >>>>>>>>>>>>>>> api.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I believe it is a large and
> >>> >> hard change,
> >>> >>>>>>>> and I
> >>> >>>>>>>>>>> am
> >>> >>>>>>>>>>>>>> afraid
> >>> >>>>>>>>>>>>>>>> we
> >>> >>>>>>>>>>>>>>>>>>> can
> >>> >>>>>>>>>>>>>>>>>>>>>>>> not
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> keep
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> compatibility since many of
> >>> >> changes are
> >>> >>>>>>>> user
> >>> >>>>>>>>>>>> facing.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Zili Chen <wander4096@gmail.com
> >>> >>>
> >>> >>>>>>>>>> 于2019年6月24日周一
> >>> >>>>>>>>>>>>>>> 下午2:53写道:
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> After a closer look on our
> >>> >> client apis,
> >>> >>>>>>>> I can
> >>> >>>>>>>>>>> see
> >>> >>>>>>>>>>>>>> there
> >>> >>>>>>>>>>>>>>>> are
> >>> >>>>>>>>>>>>>>>>>>>>>>> two
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> major
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> issues to consistency and
> >>> >> integration,
> >>> >>>>>>>> namely
> >>> >>>>>>>>>>>>>> different
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> deployment
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> of
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job cluster which couples job
> >>> >> graph
> >>> >>>>>>>> creation
> >>> >>>>>>>>>>> and
> >>> >>>>>>>>>>>>>>> cluster
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> deployment,
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and submission via CliFrontend
> >>> >> confusing
> >>> >>>>>>>>>>> control
> >>> >>>>>>>>>>>> flow
> >>> >>>>>>>>>>>>>>> of
> >>> >>>>>>>>>>>>>>>>> job
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> graph
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> compilation and job
> >>> >> submission. I'd like
> >>> >>>>>>>> to
> >>> >>>>>>>>>>>> follow
> >>> >>>>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>>>>>>>>> discuss
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> above,
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mainly the process described
> >>> >> by Jeff and
> >>> >>>>>>>>>>>> Stephan, and
> >>> >>>>>>>>>>>>>>>> share
> >>> >>>>>>>>>>>>>>>>>>>>>>> my
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ideas on these issues.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) CliFrontend confuses the
> >>> >> control flow
> >>> >>>>>>>> of
> >>> >>>>>>>>>> job
> >>> >>>>>>>>>>>>>>>> compilation
> >>> >>>>>>>>>>>>>>>>>>>>>>> and
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> submission.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Following the process of job
> >>> >> submission
> >>> >>>>>>>>>> Stephan
> >>> >>>>>>>>>>>> and
> >>> >>>>>>>>>>>>>>> Jeff
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> described,
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> execution environment knows
> >>> >> all configs
> >>> >>>>>>>> of
> >>> >>>>>>>>>> the
> >>> >>>>>>>>>>>>>> cluster
> >>> >>>>>>>>>>>>>>>> and
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> topos/settings
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of the job. Ideally, in the
> >>> >> main method
> >>> >>>>>>>> of
> >>> >>>>>>>>>> user
> >>> >>>>>>>>>>>>>>> program,
> >>> >>>>>>>>>>>>>>>> it
> >>> >>>>>>>>>>>>>>>>>>>>>>>> calls
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> #execute
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (or named #submit) and Flink
> >>> >> deploys the
> >>> >>>>>>>>>>> cluster,
> >>> >>>>>>>>>>>>>>> compile
> >>> >>>>>>>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>>>>>>>>>> job
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> graph
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and submit it to the cluster.
> >>> >> However,
> >>> >>>>>>>>>> current
> >>> >>>>>>>>>>>>>>>> CliFrontend
> >>> >>>>>>>>>>>>>>>>>>>>>>> does
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> all
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> these
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> things inside its #runProgram
> >>> >> method,
> >>> >>>>>>>> which
> >>> >>>>>>>>>>>>>> introduces
> >>> >>>>>>>>>>>>>>> a
> >>> >>>>>>>>>>>>>>>>> lot
> >>> >>>>>>>>>>>>>>>>>>>>>>> of
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> subclasses
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of (stream) execution
> >>> >> environment.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Actually, it sets up an exec
> >>> >> env that
> >>> >>>>>>>> hijacks
> >>> >>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> #execute/executePlan
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> method, initializes the job
> >>> >> graph and
> >>> >>>>>>>> abort
> >>> >>>>>>>>>>>>>> execution.
> >>> >>>>>>>>>>>>>>>> And
> >>> >>>>>>>>>>>>>>>>>>>>>>> then
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> control flow back to
> >>> >> CliFrontend, it
> >>> >>>>>>>> deploys
> >>> >>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>> cluster(or
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> retrieve
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the client) and submits the
> >>> >> job graph.
> >>> >>>>>>>> This
> >>> >>>>>>>>>> is
> >>> >>>>>>>>>>>> quite
> >>> >>>>>>>>>>>>>> a
> >>> >>>>>>>>>>>>>>>>>>>>>>> specific
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> internal
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> process inside Flink and none
> >>> >> of
> >>> >>>>>>>> consistency
> >>> >>>>>>>>>> to
> >>> >>>>>>>>>>>>>>> anything.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Deployment of job cluster
> >>> >> couples job
> >>> >>>>>>>>>> graph
> >>> >>>>>>>>>>>>>> creation
> >>> >>>>>>>>>>>>>>>> and
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> cluster
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> deployment. Abstractly, from
> >>> >> user job to
> >>> >>>>>>>> a
> >>> >>>>>>>>>>>> concrete
> >>> >>>>>>>>>>>>>>>>>>>>>>> submission,
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> it
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> requires
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    create JobGraph --------\
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    create ClusterClient ---+--> submit JobGraph
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> such a dependency.
> >>> >> ClusterClient was
> >>> >>>>>>>> created
> >>> >>>>>>>>>> by
> >>> >>>>>>>>>>>>>>> deploying
> >>> >>>>>>>>>>>>>>>>> or
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> retrieving.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> JobGraph submission requires a
> >>> >> compiled
> >>> >>>>>>>>>>> JobGraph
> >>> >>>>>>>>>>>> and
> >>> >>>>>>>>>>>>>>>> valid
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient,
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> but the creation of
> >>> >> ClusterClient is
> >>> >>>>>>>>>> abstractly
> >>> >>>>>>>>>>>>>>>> independent
> >>> >>>>>>>>>>>>>>>>>>>>>>> of
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> that
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> of
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> JobGraph. However, in job
> >>> >> cluster mode,
> >>> >>>>>>>> we
> >>> >>>>>>>>>>>> deploy job
> >>> >>>>>>>>>>>>>>>>> cluster
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> with
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> a
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> job
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> graph, which means we use
> >>> >> another
> >>> >>>>>>>> process:
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> create JobGraph --> deploy
> >>> >> cluster with
> >>> >>>>>>>> the
> >>> >>>>>>>>>>>> JobGraph
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Here is another inconsistency
> >>> >> and
> >>> >>>>>>>> downstream
> >>> >>>>>>>>>>>>>>>>> projects/client
> >>> >>>>>>>>>>>>>>>>>>>>>>>> apis
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> are
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> forced to handle different
> >>> >> cases with
> >>> >>>>>>>> rare
> >>> >>>>>>>>>>>> supports
> >>> >>>>>>>>>>>>>>> from
> >>> >>>>>>>>>>>>>>>>>>>>>>> Flink.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
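The decoupling argued for above (JobGraph creation and ClusterClient creation as independent steps, joined only at submission) can be sketched with toy stand-ins; none of these types are Flink classes:

```java
// Sketch: cluster deployment needs no JobGraph, and JobGraph compilation
// needs no cluster; only the submit() call requires both.
class SketchJobGraph {
    final String name;
    SketchJobGraph(String name) { this.name = name; }
}

class SubmittingClient {
    // Works the same whether the client came from deploying or retrieving.
    String submit(SketchJobGraph graph) {
        return "submitted:" + graph.name;
    }
}

interface Deployer {
    SubmittingClient deploySession();   // no JobGraph parameter here
}

class LocalDeployer implements Deployer {
    public SubmittingClient deploySession() { return new SubmittingClient(); }
}
```

Under this shape, the per-job "deploy cluster with the JobGraph" path collapses into the same three steps as the session path, which is the consistency the mail asks for.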
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Since we likely reached a
> >>> >> consensus on
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. all configs gathered by
> >>> >> Flink
> >>> >>>>>>>>>> configuration
> >>> >>>>>>>>>>>> and
> >>> >>>>>>>>>>>>>>> passed
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. execution environment knows
> >>> >> all
> >>> >>>>>>>> configs
> >>> >>>>>>>>>> and
> >>> >>>>>>>>>>>>>> handles
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> execution(both
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> deployment and submission)
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to the issues above I propose
> >>> >> eliminating
> >>> >>>>>>>>>>>>>>> inconsistencies
> >>> >>>>>>>>>>>>>>>>> by
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> following
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> approach:
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) CliFrontend should exactly
> >>> >> be a front
> >>> >>>>>>>> end,
> >>> >>>>>>>>>>> at
> >>> >>>>>>>>>>>>>> least
> >>> >>>>>>>>>>>>>>>> for
> >>> >>>>>>>>>>>>>>>>>>>>>>>> "run"
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> command.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> That means it just gathered
> >>> >> and passed
> >>> >>>>>>>> all
> >>> >>>>>>>>>>> config
> >>> >>>>>>>>>>>>>> from
> >>> >>>>>>>>>>>>>>>>>>>>>>> command
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> line
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> to
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the main method of user
> >>> >> program.
> >>> >>>>>>>> Execution
> >>> >>>>>>>>>>>>>> environment
> >>> >>>>>>>>>>>>>>>>> knows
> >>> >>>>>>>>>>>>>>>>>>>>>>>> all
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> info
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and with an addition to utils
> >>> >> for
> >>> >>>>>>>>>>> ClusterClient,
> >>> >>>>>>>>>>>> we
> >>> >>>>>>>>>>>>>>>>>>>>>>> gracefully
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> get
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> a
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient by deploying or
> >>> >>>>>>>> retrieving. In
> >>> >>>>>>>>>>> this
> >>> >>>>>>>>>>>>>> way,
> >>> >>>>>>>>>>>>>>> we
> >>> >>>>>>>>>>>>>>>>>>>>>>> don't
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> need
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> to
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> hijack #execute/executePlan
> >>> >> methods and
> >>> >>>>>>>> can
> >>> >>>>>>>>>>>> remove
> >>> >>>>>>>>>>>>>>>> various
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> hacking
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> subclasses of exec env, as
> >>> >> well as #run
> >>> >>>>>>>>>> methods
> >>> >>>>>>>>>>>> in
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient(for
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> an
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interface-ized ClusterClient).
> >>> >> Now the
> >>> >>>>>>>>>> control
> >>> >>>>>>>>>>>> flow
> >>> >>>>>>>>>>>>>>> flows
> >>> >>>>>>>>>>>>>>>>>>>>>>> from
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> CliFrontend
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to the main method and never
> >>> >> returns.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
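Proposal 1 above (CliFrontend as a pure front end that gathers config and passes control to the user's main, never hijacking execute()) could be sketched like this. The `-Dkey=value` convention is an assumption for illustration:

```java
import java.util.*;

// Sketch: the frontend only parses arguments into a configuration map and
// invokes the user's main; compilation and submission stay inside the
// execution environment, out of the frontend's hands.
class ThinFrontend {
    static Map<String, String> parseArgs(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        for (String arg : args) {
            if (arg.startsWith("-D") && arg.indexOf('=') > 2) {
                int eq = arg.indexOf('=');
                conf.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return conf;
    }

    // Control flows into the user's main and never returns to the frontend.
    static void run(String[] args, java.util.function.Consumer<Map<String, String>> userMain) {
        userMain.accept(parseArgs(args));
    }
}
```

This is the "flows from CliFrontend to the main method and never returns" behaviour: the frontend holds no knowledge of JobGraphs or clusters.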
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Job cluster means a cluster
> >>> >> for the
> >>> >>>>>>>>>> specific
> >>> >>>>>>>>>>>> job.
> >>> >>>>>>>>>>>>>>> From
> >>> >>>>>>>>>>>>>>>>>>>>>>>> another
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> perspective, it is an
> >>> >> ephemeral session.
> >>> >>>>>>>> We
> >>> >>>>>>>>>> may
> >>> >>>>>>>>>>>>>>> decouple
> >>> >>>>>>>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> deployment
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with a compiled job graph, but
> >>> >> start a
> >>> >>>>>>>>>> session
> >>> >>>>>>>>>>>> with
> >>> >>>>>>>>>>>>>>> idle
> >>> >>>>>>>>>>>>>>>>>>>>>>>> timeout
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and submit the job following.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
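The "ephemeral session" idea above (deploy a session with an idle timeout, then submit the job) can be sketched as follows; the timing logic is illustrative, not a real cluster entrypoint:

```java
// Sketch: a job cluster as a session that tears itself down once no
// submission has kept it alive within the idle timeout.
class EphemeralSession {
    private final long idleTimeoutMillis;
    private long lastActivityMillis;

    EphemeralSession(long idleTimeoutMillis, long nowMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
        this.lastActivityMillis = nowMillis;
    }

    String submit(String jobName, long nowMillis) {
        lastActivityMillis = nowMillis;  // submissions keep the session alive
        return "submitted:" + jobName;
    }

    // The entrypoint would poll this and shut the cluster down when true.
    boolean idleExpired(long nowMillis) {
        return nowMillis - lastActivityMillis > idleTimeoutMillis;
    }
}
```

This keeps cluster deployment free of any compiled JobGraph while still giving per-job clusters their single-job lifecycle.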
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> These topics, before we go
> >>> >> into more
> >>> >>>>>>>> details
> >>> >>>>>>>>>> on
> >>> >>>>>>>>>>>>>> design
> >>> >>>>>>>>>>>>>>> or
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> implementation,
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> are better to be aware and
> >>> >> discussed for
> >>> >>>>>>>> a
> >>> >>>>>>>>>>>> consensus.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> tison.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
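The "deploy or retrieve" flow described above can be sketched roughly as follows. All class and method names here (SketchClusterClient, SketchClusterDescriptor, retrieve, deploy) are illustrative assumptions for this thread's discussion, not Flink's actual API:

```java
import java.util.Optional;

// Hypothetical sketch of the "deploy or retrieve" flow; the types and
// methods are illustrative names, not Flink's real ClusterClient API.
interface SketchClusterClient {
    String clusterId();
}

interface SketchClusterDescriptor {
    Optional<SketchClusterClient> retrieve(String clusterId); // existing session
    SketchClusterClient deploy();                             // new (ephemeral) cluster
}

public class DeployOrRetrieve {
    /** Reuse an existing session cluster if present, otherwise deploy a new one. */
    public static SketchClusterClient clientFor(SketchClusterDescriptor d, String clusterId) {
        return d.retrieve(clusterId).orElseGet(d::deploy);
    }

    /** Fake descriptor for demonstration: pretends a session exists (or not). */
    public static SketchClusterDescriptor fakeDescriptor(boolean hasSession) {
        return new SketchClusterDescriptor() {
            public Optional<SketchClusterClient> retrieve(String id) {
                return hasSession ? Optional.of(() -> id) : Optional.empty();
            }
            public SketchClusterClient deploy() {
                return () -> "new-ephemeral-cluster";
            }
        };
    }

    public static void main(String[] args) {
        // Session exists: retrieve it. No session: deploy an ephemeral one.
        System.out.println(clientFor(fakeDescriptor(true), "session-1").clusterId());
        System.out.println(clientFor(fakeDescriptor(false), "session-1").clusterId());
    }
}
```

With such a factory, CliFrontend would hand the main method a client instead of hijacking #execute.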
> Zili Chen <wander4096@gmail.com> 于2019年6月20日周四 上午3:21写道:
>
> Hi Jeff,
>
> Thanks for raising this thread and the design document!
>
> As @Thomas Weise mentioned above, extending configuration in Flink requires
> far more effort than it should. Another example is that we achieve detach
> mode by introducing another execution environment which also hijacks the
> #execute method.
>
> I agree with your idea that the user should configure all things and Flink
> should "just" respect them. On this topic I think the unusual control flow
> when CliFrontend handles the "run" command is the problem. It handles several
> configs, mainly about cluster settings, so the main method of the user
> program is unaware of them. It also compiles the application to a job graph
> by running the main method with a hijacked execution environment, which
> constrains the main method further.
>
> I'd like to write down a few notes on how configs/args are passed and
> respected, as well as on decoupling job compilation and submission. I will
> share them on this thread later.
>
> Best,
> tison.
>
> SHI Xiaogang <shixiaogangg@gmail.com> 于2019年6月17日周一 下午7:29写道:
>
> Hi Jeff and Flavio,
>
> Thanks Jeff a lot for proposing the design document.
>
> We are also working on refactoring ClusterClient to allow flexible and
> efficient job management in our real-time platform. We would like to draft a
> document to share our ideas with you.
>
> I think it's a good idea to have something like Apache Livy for Flink, and
> the efforts discussed here will take a great step forward to it.
>
> Regards,
> Xiaogang
>
> Flavio Pompermaier <pompermaier@okkam.it> 于2019年6月17日周一 下午7:13写道:
>
> Is there any possibility to have something like Apache Livy [1] also for
> Flink in the future?
>
> [1] https://livy.apache.org/
>
> On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zjffdu@gmail.com> wrote:
>
> > > Any API we expose should not have dependencies on the runtime
> > > (flink-runtime) package or other implementation details. To me, this
> > > means that the current ClusterClient cannot be exposed to users because
> > > it uses quite some classes from the optimiser and runtime packages.
>
> We should change ClusterClient from a class to an interface.
> ExecutionEnvironment would only use the ClusterClient interface, which
> should live in flink-clients, while the concrete implementation class could
> live in flink-runtime.
>
> > > What happens when a failure/restart in the client happens? There needs
> > > to be a way of re-establishing the connection to the job, setting up
> > > the listeners again, etc.
>
> Good point. First we need to define what failure/restart in the client
> means. IIUC, that usually means a network failure, which will happen in the
> RestClient class. If my understanding is correct, the restart/retry
> mechanism should be done in RestClient.
>
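The suggestion that restart/retry belongs in the REST layer can be illustrated with a generic bounded-retry helper. This is only a sketch of the pattern, not Flink's actual RestClient implementation:

```java
import java.util.concurrent.Callable;

// Illustrative retry helper of the kind the thread suggests could live in
// RestClient; a sketch of the pattern, not Flink's actual implementation.
public class RetrySketch {
    /** Run the call, retrying up to maxAttempts times on failure. */
    public static <T> T retry(Callable<T> call, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e; // e.g. a transient network failure; try again
            }
        }
        throw last; // all attempts exhausted: surface the last failure
    }

    public static void main(String[] args) throws Exception {
        int[] remainingFailures = {2}; // fail twice, then succeed
        String status = retry(() -> {
            if (remainingFailures[0]-- > 0) {
                throw new java.io.IOException("connection reset");
            }
            return "RUNNING";
        }, 5);
        System.out.println("job status: " + status);
    }
}
```

A production version would add backoff between attempts and only retry on retryable (network) exceptions.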
> Aljoscha Krettek <aljoscha@apache.org> 于2019年6月11日周二 下午11:10写道:
>
> Some points to consider:
>
> * Any API we expose should not have dependencies on the runtime
> (flink-runtime) package or other implementation details. To me, this means
> that the current ClusterClient cannot be exposed to users because it uses
> quite some classes from the optimiser and runtime packages.
>
> * What happens when a failure/restart in the client happens? There needs to
> be a way of re-establishing the connection to the job, setting up the
> listeners again, etc.
>
> Aljoscha
>
> > On 29. May 2019, at 10:17, Jeff Zhang <zjffdu@gmail.com> wrote:
> >
> > Sorry folks, the design doc is late as you expected. Here's the design doc
> > I drafted, welcome any comments and feedback.
> >
> > https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
>
> Stephan Ewen <sewen@apache.org> 于2019年2月14日周四 下午8:43写道:
>
> Nice that this discussion is happening.
>
> In the FLIP, we could also revisit the entire role of the environments
> again.
>
> Initially, the idea was:
> - the environments take care of the specific setup for standalone (no setup
> needed), yarn, mesos, etc.
> - the session ones have control over the session. The environment holds the
> session client.
> - running a job gives a "control" object for that job. That behavior is the
> same in all environments.
>
> The actual implementation diverged quite a bit from that. Happy to see a
> discussion about straightening this out a bit more.
>
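The "control object" idea in Stephan's message, where running a job returns the same kind of handle in every environment, could look roughly like this. JobControl and its methods are assumed names for illustration, not the actual FLIP design:

```java
import java.util.concurrent.CompletableFuture;

// Sketch of the "control object" idea: running a job returns the same kind
// of handle in every environment. JobControl and its methods are assumed
// names for illustration, not an actual Flink interface.
interface JobControl {
    String jobId();
    CompletableFuture<String> result(); // completes when the job finishes
    CompletableFuture<Void> cancel();
}

public class ControlObjectSketch {
    /** Every environment's execute() would hand back a JobControl. */
    public static JobControl execute(String jobName) {
        return new JobControl() {
            public String jobId() { return jobName + "-1"; }
            public CompletableFuture<String> result() {
                return CompletableFuture.completedFuture("FINISHED");
            }
            public CompletableFuture<Void> cancel() {
                return CompletableFuture.completedFuture(null);
            }
        };
    }

    public static void main(String[] args) {
        JobControl control = execute("wordcount");
        // Same calls work whether the environment is local, yarn, mesos, ...
        System.out.println(control.jobId() + " -> " + control.result().join());
    }
}
```

The point is that the handle's API is identical across environments; only how the environment deploys and connects differs.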
> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zjffdu@gmail.com> wrote:
>
> Hi folks,
>
> Sorry for the late response. It seems we have reached consensus on this, so
> I will create a FLIP for this with a more detailed design.
>
> Thomas Weise <thw@apache.org> 于2018年12月21日周五 上午11:43写道:
>
> Great to see this discussion seeded! The problems you face with the
> Zeppelin integration are also affecting other downstream projects, like
> Beam.
>
> We just enabled the savepoint restore option in RemoteStreamEnvironment
> [1], and that was more difficult than it should be. The main issue is that
> the environment and the cluster client aren't decoupled. Ideally it should
> be possible to just get the matching cluster client from the environment
> and then control the job through it (environment as factory for cluster
> client). But note that the environment classes are part of the public API,
> and it is not straightforward to make larger changes without breaking
> backward compatibility.
>
> ClusterClient currently exposes internal classes like JobGraph and
> StreamGraph. But it should be possible to wrap this with a new public API
> that brings the required job control capabilities for downstream projects.
> Perhaps it is helpful to look at some of the interfaces in Beam while
> thinking about this: [2] for the portable job API and [3] for the old
> asynchronous job control from the Beam Java SDK.
>
> The backward compatibility discussion [4] is also relevant here. A new API
> should shield downstream projects from internals and allow them to
> interoperate with multiple future Flink versions in the same release line
> without forced upgrades.
>
> Thanks,
> Thomas
>
> [1] https://github.com/apache/flink/pull/7249
> [2] https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> [3] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> [4] https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
>
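The suggestion above of wrapping ClusterClient in a new public API, so downstream projects never touch JobGraph or StreamGraph, amounts to a thin facade. All class and method names in this sketch are illustrative assumptions:

```java
// Sketch of the facade idea: a small public surface over the internal
// ClusterClient so downstream projects never see JobGraph/StreamGraph.
// All names here are illustrative assumptions, not Flink's actual API.
class InternalClusterClient {               // stands in for today's ClusterClient
    String submitInternal(Object jobGraph) { return "job-42"; }
    String statusInternal(String jobId)    { return "RUNNING"; }
}

public class PublicJobApiSketch {
    private final InternalClusterClient internal;

    private PublicJobApiSketch(InternalClusterClient internal) {
        this.internal = internal;
    }

    public static PublicJobApiSketch create() {
        return new PublicJobApiSketch(new InternalClusterClient());
    }

    /** Public surface: plain ids and strings only; no runtime classes leak out. */
    public String submit(Object compiledJob) { return internal.submitInternal(compiledJob); }
    public String status(String jobId)       { return internal.statusInternal(jobId); }

    public static void main(String[] args) {
        PublicJobApiSketch api = create();
        String id = api.submit(new Object());
        System.out.println(id + ": " + api.status(id));
    }
}
```

Because only the facade is public, the internal client can change across releases without breaking downstream projects, which is the interoperability goal of the backward-compatibility discussion.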
> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zjffdu@gmail.com> wrote:
>
> > > I'm not so sure whether the user should be able to define where the job
> > > runs (in your example Yarn). This is actually independent of the job
> > > development and is something which is decided at deployment time.
>
> Users don't need to specify the execution mode programmatically. They can
> also pass the execution mode via the arguments of the flink run command,
> e.g.
>
> bin/flink run -m yarn-cluster ....
> bin/flink run -m local ...
> bin/flink run -m host:port ...
>
> Does this make sense to you?
>
> > > To me it makes sense that the ExecutionEnvironment is not directly
> > > initialized by the user and is instead context sensitive to how you
> > > want to execute your job (Flink CLI vs. IDE, for example).
>
> Right, currently I notice Flink creates a different
> ContextExecutionEnvironment based on the submission scenario (Flink CLI vs.
> IDE). To me this is kind of a hack, not so straightforward. What I
> suggested above is that Flink should always create the same
> ExecutionEnvironment but with a different configuration, and based on that
> configuration it would create the proper ClusterClient for the different
> behaviors.
>
> Till Rohrmann <trohrmann@apache.org> wrote on Thu, Dec 20, 2018 at 11:18 PM:
> > You are probably right that we have code duplication when it comes to the creation of the ClusterClient. This should be reduced in the future.
> >
> > I'm not so sure whether the user should be able to define where the job runs (in your example, Yarn). This is actually independent of the job development and is something which is decided at deployment time. To me it makes sense that the ExecutionEnvironment is not directly initialized by the user and is instead context sensitive to how you want to execute your job (Flink CLI vs. IDE, for example). However, I agree that the ExecutionEnvironment should give you access to the ClusterClient and to the job (maybe in the form of the JobGraph or a job plan).
> >
> > Cheers,
> > Till
> > On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > Hi Till,
> > >
> > > Thanks for the feedback. You are right that I expect a better programmatic job submission/control API that could be used by downstream projects; it would benefit the whole Flink ecosystem. When I look at the code of the Flink scala-shell and sql-client (I believe they are not the core of Flink, but belong to its ecosystem), I find a lot of duplicated code for creating a ClusterClient from user-provided configuration (the configuration format may even differ between scala-shell and sql-client) and then using that ClusterClient to manipulate jobs. I don't think this is convenient for downstream projects. What I expect is that a downstream project only needs to provide the necessary configuration info (maybe introducing a class FlinkConf), then build an ExecutionEnvironment based on this FlinkConf, and the ExecutionEnvironment will create the proper ClusterClient. This would not only benefit downstream project development but also help their integration tests with Flink. Here's one sample code snippet of what I expect:
> > >
> > > val conf = new FlinkConf().mode("yarn")
> > > val env = new ExecutionEnvironment(conf)
> > > val jobId = env.submit(...)
> > > val jobStatus = env.getClusterClient().queryJobStatus(jobId)
> > > env.getClusterClient().cancelJob(jobId)
> > >
> > > What do you think?
> > > Till Rohrmann <trohrmann@apache.org> wrote on Tue, Dec 11, 2018 at 6:28 PM:
> > > > Hi Jeff,
> > > >
> > > > what you are proposing is to provide the user with better programmatic job control. There was actually an effort to achieve this, but it has never been completed [1]. However, there are some improvements in the code base now. Look, for example, at the NewClusterClient interface, which offers a non-blocking job submission. But I agree that we need to improve Flink in this regard.
> > > >
> > > > I would not be in favour of exposing all ClusterClient calls via the ExecutionEnvironment because it would clutter the class and would not be a good separation of concerns. Instead, one idea could be to retrieve the current ClusterClient from the ExecutionEnvironment, which can then be used for cluster and job control. But before we start an effort here, we need to agree on and capture what functionality we want to provide.
> > > >
> > > > Initially, the idea was that we have the ClusterDescriptor describing how to talk to a cluster manager like Yarn or Mesos. The ClusterDescriptor can be used for deploying Flink clusters (job and session) and gives you a ClusterClient. The ClusterClient controls the cluster (e.g. submitting jobs, listing all running jobs). And then there was the idea to introduce a JobClient, which you obtain from the ClusterClient, to trigger job-specific operations (e.g. taking a savepoint, cancelling the job).
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-4272
> > > >
> > > > Cheers,
> > > > Till
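The ClusterDescriptor -> ClusterClient -> JobClient layering Till describes can be sketched in plain Java as below. This is a minimal, self-contained illustration: all the interfaces are simplified stand-ins (the real Flink types have different signatures), and `InMemoryCluster` is an invented stub used only to make the sketch runnable.

```java
import java.util.ArrayList;
import java.util.List;

public class LayeringSketch {
    // Job-scoped control: operations on one specific job.
    interface JobClient {
        String cancel();
        String triggerSavepoint();
    }

    // Cluster-scoped control: submit jobs, list jobs, hand out JobClients.
    interface ClusterClient {
        String submitJob(String jobName);       // returns a job id
        List<String> listJobs();
        JobClient getJobClient(String jobId);
    }

    // Describes how to reach a cluster manager (Yarn, Mesos, ...) and
    // deploy a cluster, yielding a ClusterClient.
    interface ClusterDescriptor {
        ClusterClient deploySessionCluster();
    }

    static class InMemoryCluster implements ClusterClient {
        private final List<String> jobs = new ArrayList<>();

        public String submitJob(String jobName) {
            String id = "job-" + jobs.size();
            jobs.add(id);
            return id;
        }

        public List<String> listJobs() { return jobs; }

        public JobClient getJobClient(String jobId) {
            return new JobClient() {
                public String cancel() { return jobId + " canceled"; }
                public String triggerSavepoint() { return jobId + " savepoint"; }
            };
        }
    }

    public static void main(String[] args) {
        ClusterDescriptor descriptor = InMemoryCluster::new;     // "how to deploy"
        ClusterClient cluster = descriptor.deploySessionCluster();
        String jobId = cluster.submitJob("wordcount");           // cluster-wide op
        JobClient job = cluster.getJobClient(jobId);             // job-scoped op
        System.out.println(cluster.listJobs());
        System.out.println(job.cancel());
    }
}
```

The point of the layering is that each level has one concern: deployment, cluster control, and single-job control never bleed into each other.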
> > > > On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > > Hi Folks,
> > > > >
> > > > > I am trying to integrate Flink into Apache Zeppelin, which is an interactive notebook, and I hit several issues that are caused by the Flink client API. So I'd like to propose the following changes to the Flink client API.
> > > > >
> > > > > 1. Support non-blocking execution. Currently, ExecutionEnvironment#execute is a blocking method which does two things: first submit the job, then wait for the job until it is finished. I'd like to introduce a non-blocking execution method like ExecutionEnvironment#submit which only submits the job and then returns the jobId to the client, and allow the user to query the job status via that jobId.
> > > > >
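The blocking/non-blocking split in point 1 can be mimicked in plain Java. The names `submit` and `queryJobStatus` are taken from the proposal and are not existing Flink API; here they are backed by a thread pool purely for illustration.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class NonBlockingEnv {
    enum JobStatus { RUNNING, FINISHED }

    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final Map<String, Future<?>> jobs = new ConcurrentHashMap<>();

    // Non-blocking: start the job and hand back a job id right away.
    String submit(Runnable job) {
        String jobId = UUID.randomUUID().toString();
        jobs.put(jobId, pool.submit(job));
        return jobId;
    }

    // The id lets the caller poll the status later, as point 1 asks for.
    JobStatus queryJobStatus(String jobId) {
        return jobs.get(jobId).isDone() ? JobStatus.FINISHED : JobStatus.RUNNING;
    }

    // Blocking: the equivalent of today's execute() - submit, then wait.
    void execute(Runnable job) throws Exception {
        jobs.get(submit(job)).get();
    }

    public static void main(String[] args) throws Exception {
        NonBlockingEnv env = new NonBlockingEnv();
        String jobId = env.submit(() -> {
            try { Thread.sleep(200); } catch (InterruptedException ignored) { }
        });
        // submit() returned immediately with a non-empty id, before the job finished
        System.out.println("submitted " + jobId.isEmpty());
        Thread.sleep(500);
        System.out.println(env.queryJobStatus(jobId));
        env.pool.shutdown();
    }
}
```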
> > > > > 2. Add a cancel API in ExecutionEnvironment/StreamExecutionEnvironment. Currently the only way to cancel a job is via the CLI (bin/flink), which is not convenient for downstream projects to use. So I'd like to add a cancel API in ExecutionEnvironment.
> > > > >
> > > > > 3. Add a savepoint API in ExecutionEnvironment/StreamExecutionEnvironment. It is similar to the cancel API; we should use ExecutionEnvironment as the unified API for third parties to integrate with Flink.
> > > > >
> > > > > 4. Add a listener for the job execution lifecycle, something like the following, so that downstream projects can run custom logic in the lifecycle of a job. E.g. Zeppelin would capture the jobId after the job is submitted and then use this jobId to cancel it later when necessary.
> > > > >
> > > > > public interface JobListener {
> > > > >
> > > > >   void onJobSubmitted(JobID jobId);
> > > > >
> > > > >   void onJobExecuted(JobExecutionResult jobResult);
> > > > >
> > > > >   void onJobCanceled(JobID jobId);
> > > > > }
> > > > >
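A downstream project such as Zeppelin could then use the listener roughly as below. The notification plumbing (`Env`, `JobID`) is stubbed for illustration, and `onJobExecuted` is omitted to keep the sketch focused on the capture-then-cancel scenario; only the JobListener shape comes from the proposal.

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerSketch {
    static class JobID {
        final String id;
        JobID(String id) { this.id = id; }
        public String toString() { return id; }
    }

    interface JobListener {
        void onJobSubmitted(JobID jobId);
        void onJobCanceled(JobID jobId);
    }

    // Stand-in for the environment firing lifecycle notifications.
    static class Env {
        final List<JobListener> listeners = new ArrayList<>();

        JobID submit() {
            JobID id = new JobID("job-1");
            listeners.forEach(l -> l.onJobSubmitted(id));
            return id;
        }

        void cancel(JobID id) {
            listeners.forEach(l -> l.onJobCanceled(id));
        }
    }

    public static void main(String[] args) {
        Env env = new Env();
        List<String> captured = new ArrayList<>();
        env.listeners.add(new JobListener() {
            // Zeppelin-style: remember the id at submission time...
            public void onJobSubmitted(JobID jobId) { captured.add("submitted:" + jobId); }
            public void onJobCanceled(JobID jobId) { captured.add("canceled:" + jobId); }
        });
        JobID id = env.submit();
        env.cancel(id);   // ...so the job can be cancelled later with it
        System.out.println(captured);
    }
}
```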
> > > > > 5. Enable sessions in ExecutionEnvironment. Currently this is disabled, but a session is very convenient for third parties submitting jobs continually. I hope Flink can enable it again.
> > > > >
> > > > > 6. Unify all Flink client APIs into ExecutionEnvironment/StreamExecutionEnvironment.
> > > > >
> > > > > This is a long-term issue which needs more careful thinking and design. Currently some features of Flink are exposed in ExecutionEnvironment/StreamExecutionEnvironment, but others are exposed in the CLI instead of the API, like the cancel and savepoint operations I mentioned above. I think the root cause is that Flink didn't unify the interaction with Flink. Here I list 3 scenarios of Flink operation:
> > > > >
> > > > > - Local job execution. Flink will create a LocalEnvironment and then use this LocalEnvironment to create a LocalExecutor for job execution.
> > > > > - Remote job execution. Flink will create a ClusterClient first, then create a ContextEnvironment based on the ClusterClient, and then run the job.
> > > > > - Job cancelation. Flink will create a ClusterClient first and then cancel the job via this ClusterClient.
> > > > >
> > > > > As you can see, in the above 3 scenarios Flink didn't use the same approach (code path) to interact with Flink. What I propose is the following:
> > > > >
> > > > > Create the proper LocalEnvironment/RemoteEnvironment (based on user configuration) --> use this Environment to create the proper ClusterClient (LocalClusterClient or RestClusterClient) to interact with Flink (job execution or cancelation).
> > > > >
> > > > > This way we can unify the process of local execution and remote execution. And it is much easier for third parties to integrate with Flink, because ExecutionEnvironment is the unified entry point for Flink. What a third party needs to do is just pass configuration to the ExecutionEnvironment, and the ExecutionEnvironment will do the right thing.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> based
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> on
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> configuration.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Flink cli can
> >>> >> also be
> >>> >>>>>>>>>> considered
> >>> >>>>>>>>>>> as
> >>> >>>>>>>>>>>>>> flink
> >>> >>>>>>>>>>>>>>>>>>>>>>> api
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> consumer.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> just
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pass
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> configuration to
> >>> >>>>>>>>>>>> ExecutionEnvironment
> >>> >>>>>>>>>>>>>> and
> >>> >>>>>>>>>>>>>>>>>>>>>>> let
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> create the proper
> >>> >>>>>>>> ClusterClient
> >>> >>>>>>>>>>>> instead
> >>> >>>>>>>>>>>>>>> of
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> letting
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> cli
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> create
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient
> >>> >> directly.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 6 would involve
> >>> >> large code
> >>> >>>>>>>>>>>> refactoring,
> >>> >>>>>>>>>>>>>>> so
> >>> >>>>>>>>>>>>>>>>>>>>>>> I
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>> think
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>> we
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> can
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> defer
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> future release,
> >>> >> 1,2,3,4,5
> >>> >>>>>>>> could
> >>> >>>>>>>>>>> be
> >>> >>>>>>>>>>>> done
> >>> >>>>>>>>>>>>>>> at
> >>> >>>>>>>>>>>>>>>>>>>>>>>>> once I
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> believe.
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> me
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> know
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> your
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> comments and
> >>> >> feedback,
> >>> >>>>>>>> thanks
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
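[Editor's note] The unification proposed in the quoted mail above (ExecutionEnvironment as the single entry point that derives the right ClusterClient from configuration) can be sketched as a stand-alone illustration. Everything below is hypothetical: the names ClusterClient, LocalClusterClient and RestClusterClient mirror Flink classes, but the interfaces and the "execution.target" key are invented for this sketch and are not Flink's actual API.

```java
import java.util.Map;

// Minimal stand-ins for Flink's client classes (illustrative only).
interface ClusterClient {
    String submitJob(String jobName);   // returns a job id
    void cancelJob(String jobId);
}

class LocalClusterClient implements ClusterClient {
    public String submitJob(String jobName) { return "local-" + jobName; }
    public void cancelJob(String jobId) { /* stop the in-process job */ }
}

class RestClusterClient implements ClusterClient {
    private final String address;
    RestClusterClient(String address) { this.address = address; }
    public String submitJob(String jobName) { return address + "/" + jobName; }
    public void cancelJob(String jobId) { /* call the REST endpoint */ }
}

// The environment is the single entry point: it derives the right client
// from configuration instead of letting the CLI build one directly.
class ExecutionEnvironment {
    private final Map<String, String> config;
    ExecutionEnvironment(Map<String, String> config) { this.config = config; }

    ClusterClient createClusterClient() {
        String target = config.getOrDefault("execution.target", "local");
        if (target.equals("local")) {
            return new LocalClusterClient();
        }
        return new RestClusterClient(config.get("rest.address"));
    }
}
```

A CLI, a notebook such as Zeppelin, or any other downstream project would then only pass configuration to the ExecutionEnvironment, and all of them would share the same code path for execution and cancelation.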
> >> --
> >> Best Regards
> >>
> >> Jeff Zhang
>

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Kostas Kloudas <kk...@gmail.com>.
+1 for creating a channel.

Kostas

On Wed, Sep 11, 2019 at 10:57 AM Zili Chen <wa...@gmail.com> wrote:
>
> Hi Aljoscha,
>
> I'm OK to use the ASF slack.
>
> Best,
> tison.
>
>
> Jeff Zhang <zj...@gmail.com> wrote on Wed, Sep 11, 2019, at 4:48 PM:
>>
>> +1 for using slack for instant communication
>>
>> Aljoscha Krettek <al...@apache.org> wrote on Wed, Sep 11, 2019, at 4:44 PM:
>>>
>>> Hi,
>>>
>>> We could try and use the ASF slack for this purpose, that would probably be easiest. See https://s.apache.org/slack-invite. We could create a dedicated channel for our work and would still use the open ASF infrastructure and people can have a look if they are interested because discussion would be public. What do you think?
>>>
>>> P.S. Committers/PMCs should be able to log in with their Apache ID.
>>>
>>> Best,
>>> Aljoscha
>>>
>>> > On 6. Sep 2019, at 14:24, Zili Chen <wa...@gmail.com> wrote:
>>> >
>>> > Hi Aljoscha,
>>> >
>>> > I'd like to gather all the ideas here and among documents, and draft a
>>> > formal FLIP
>>> > that keeps us on the same page. Hopefully I will start a FLIP thread next week.
>>> >
>>> > For the implementation, i.e. the POC part, I'd like to work with you guys
>>> > who proposed the concept Executor to make sure that we go in the same
>>> > direction. I'm wondering whether a dedicated thread or a Slack group is
>>> > the proper place. In my opinion we can involve the team in a Slack group,
>>> > start our branch concurrently with the FLIP process, and once we reach a
>>> > consensus on the FLIP, open an umbrella issue about the framework and
>>> > start subtasks. What do you think?
>>> >
>>> > Best,
>>> > tison.
>>> >
>>> >
>>> > Aljoscha Krettek <al...@apache.org> wrote on Thu, Sep 5, 2019, at 9:39 PM:
>>> >
>>> >> Hi Tison,
>>> >>
>>> >> To keep this moving forward, maybe you want to start working on a
>>> >> proof-of-concept implementation of the new JobClient interface, perhaps
>>> >> with a new method executeAsync() in the environment that returns the
>>> >> JobClient, and implement its methods to see how that works and where we
>>> >> get. Would you be interested in that?
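[Editor's note] The executeAsync()/JobClient pair suggested above might take roughly the following shape. This is a toy, self-contained sketch: the JobClient methods follow the discussion in this thread, while the in-memory "job" is purely illustrative and nothing here is real Flink code.

```java
import java.util.concurrent.CompletableFuture;

// What the user gets back from executeAsync(): a handle to the running job.
interface JobClient {
    String getJobId();
    CompletableFuture<String> getJobStatus();          // e.g. "RUNNING", "FINISHED"
    CompletableFuture<Void> cancel();
    CompletableFuture<String> getJobExecutionResult(); // resolves when the job ends
}

class StreamExecutionEnvironment {
    // Non-blocking submission: returns a future of the JobClient so that
    // callers can decide whether to wait ("attached") or not ("detached").
    CompletableFuture<JobClient> executeAsync(String jobName) {
        CompletableFuture<String> result =
                CompletableFuture.supplyAsync(() -> "result-of-" + jobName);
        return CompletableFuture.completedFuture(new JobClient() {
            public String getJobId() { return "job-" + jobName; }
            public CompletableFuture<String> getJobStatus() {
                return CompletableFuture.completedFuture(
                        result.isDone() ? "FINISHED" : "RUNNING");
            }
            public CompletableFuture<Void> cancel() {
                result.cancel(true);
                return CompletableFuture.completedFuture(null);
            }
            public CompletableFuture<String> getJobExecutionResult() {
                return result;
            }
        });
    }
}
```

With such a handle, the blocking execute() becomes executeAsync(...).get().getJobExecutionResult().get(), and the "detached" mode is simply not waiting on the result future.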
>>> >>
>>> >> Also, at some point we should collect all the ideas and start forming an
>>> >> actual FLIP.
>>> >>
>>> >> Best,
>>> >> Aljoscha
>>> >>
>>> >>> On 4. Sep 2019, at 12:04, Zili Chen <wa...@gmail.com> wrote:
>>> >>>
>>> >>> Thanks for your update Kostas!
>>> >>>
>>> >>> It sounds good to me to clean up the existing code paths as a first
>>> >>> pass. I'd like to help with review and file subtasks if I find any.
>>> >>>
>>> >>> Best,
>>> >>> tison.
>>> >>>
>>> >>>
>>> >>> Kostas Kloudas <kk...@gmail.com> wrote on Wed, Sep 4, 2019, at 5:52 PM:
>>> >>> Here is the issue, and I will keep on updating it as I find more issues.
>>> >>>
>>> >>> https://issues.apache.org/jira/browse/FLINK-13954
>>> >>>
>>> >>> This will also cover the refactoring of the Executors that we discussed
>>> >>> in this thread, without any additional functionality (such as the job
>>> >> client).
>>> >>>
>>> >>> Kostas
>>> >>>
>>> >>> On Wed, Sep 4, 2019 at 11:46 AM Kostas Kloudas <kk...@gmail.com> wrote:
>>> >>>>
>>> >>>> Great idea Tison!
>>> >>>>
>>> >>>> I will create the umbrella issue and post it here so that we are all
>>> >>>> on the same page!
>>> >>>>
>>> >>>> Cheers,
>>> >>>> Kostas
>>> >>>>
>>> >>>> On Wed, Sep 4, 2019 at 11:36 AM Zili Chen <wa...@gmail.com> wrote:
>>> >>>>>
>>> >>>>> Hi Kostas & Aljoscha,
>>> >>>>>
>>> >>>>> I notice that there is a JIRA (FLINK-13946) which could be included
>>> >>>>> in this refactor thread. Since we agree on most directions of the
>>> >>>>> big picture, is it reasonable to create an umbrella issue for
>>> >>>>> refactoring the client APIs, with linked subtasks? That would be a
>>> >>>>> better way for our community to join forces.
>>> >>>>>
>>> >>>>> Best,
>>> >>>>> tison.
>>> >>>>>
>>> >>>>>
>>> >>>>> Zili Chen <wa...@gmail.com> wrote on Sat, Aug 31, 2019, at 12:52 PM:
>>> >>>>>>
>>> >>>>>> Great Kostas! Looking forward to your POC!
>>> >>>>>>
>>> >>>>>> Best,
>>> >>>>>> tison.
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>> Jeff Zhang <zj...@gmail.com> wrote on Fri, Aug 30, 2019, at 11:07 PM:
>>> >>>>>>>
>>> >>>>>>> Awesome, @Kostas Looking forward your POC.
>>> >>>>>>>
>>> >>>>>>> Kostas Kloudas <kk...@gmail.com> wrote on Fri, Aug 30, 2019, at 8:33 PM:
>>> >>>>>>>
>>> >>>>>>>> Hi all,
>>> >>>>>>>>
>>> >>>>>>>> I am just writing here to let you know that I am working on a
>>> >> POC that
>>> >>>>>>>> tries to refactor the current state of job submission in Flink.
>>> >>>>>>>> I want to stress that it introduces NO CHANGES to the current
>>> >>>>>>>> behaviour of Flink. It just re-arranges things and introduces the
>>> >>>>>>>> notion of an Executor, which is the entity responsible for
>>> >> taking the
>>> >>>>>>>> user-code and submitting it for execution.
>>> >>>>>>>>
>>> >>>>>>>> Given this, the discussion about the functionality that the
>>> >> JobClient
>>> >>>>>>>> will expose to the user can go on independently and the same
>>> >>>>>>>> holds for all the open questions so far.
>>> >>>>>>>>
>>> >>>>>>>> I hope I will have some more news to share soon.
>>> >>>>>>>>
>>> >>>>>>>> Thanks,
>>> >>>>>>>> Kostas
>>> >>>>>>>>
>>> >>>>>>>> On Mon, Aug 26, 2019 at 4:20 AM Yang Wang <da...@gmail.com> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>> Hi Zili,
>>> >>>>>>>>>
>>> >>>>>>>>> It makes sense to me that a dedicated cluster is started for a
>>> >>>>>>>>> per-job cluster and will not accept more jobs. I just have a
>>> >>>>>>>>> question about the command line.
>>> >>>>>>>>>
>>> >>>>>>>>> Currently we could use the following commands to start different
>>> >>>>>>>>> clusters.
>>> >>>>>>>>> *per-job cluster*
>>> >>>>>>>>> ./bin/flink run -d -p 5 -ynm perjob-cluster1 -m yarn-cluster
>>> >>>>>>>>> examples/streaming/WindowJoin.jar
>>> >>>>>>>>> *session cluster*
>>> >>>>>>>>> ./bin/flink run -p 5 -ynm session-cluster1 -m yarn-cluster
>>> >>>>>>>>> examples/streaming/WindowJoin.jar
>>> >>>>>>>>>
>>> >>>>>>>>> What will it look like after the client enhancement?
>>> >>>>>>>>>
>>> >>>>>>>>> Best,
>>> >>>>>>>>> Yang
>>> >>>>>>>>>
>>> >>>>>>>>>> Zili Chen <wa...@gmail.com> wrote on Fri, Aug 23, 2019, at 10:46 PM:
>>> >>>>>>>>>
>>> >>>>>>>>>> Hi Till,
>>> >>>>>>>>>>
>>> >>>>>>>>>> Thanks for your update. Nice to hear :-)
>>> >>>>>>>>>>
>>> >>>>>>>>>> Best,
>>> >>>>>>>>>> tison.
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>> Till Rohrmann <tr...@apache.org> wrote on Fri, Aug 23, 2019, at 10:39 PM:
>>> >>>>>>>>>>
>>> >>>>>>>>>>> Hi Tison,
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> just a quick comment concerning the class loading issues when
>>> >>>>>>>>>>> using the per-job mode. The community wants to change it so that
>>> >>>>>>>>>>> the StandaloneJobClusterEntryPoint actually uses the user code
>>> >>>>>>>>>>> class loader with child-first class loading [1]. Hence, I hope
>>> >>>>>>>>>>> that this problem will be resolved soon.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-13840
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Cheers,
>>> >>>>>>>>>>> Till
>>> >>>>>>>>>>>
>>> >>>>>>>>>> On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas <kkloudas@gmail.com> wrote:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>> Hi all,
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> On the topic of web submission, I agree with Till that it only
>>> >>>>>>>>>>>> seems to complicate things. It is bad for security and job
>>> >>>>>>>>>>>> isolation (anybody can submit/cancel jobs), and its
>>> >>>>>>>>>>>> implementation complicates some parts of the code. So, if we
>>> >>>>>>>>>>>> were to redesign the WebUI, maybe this part could be left out.
>>> >>>>>>>>>>>> In addition, I would say that the ability to cancel jobs could
>>> >>>>>>>>>>>> also be left out.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> I would also be in favour of removing the "detached" mode, for
>>> >>>>>>>>>>>> the reasons mentioned above (i.e. because now we will have a
>>> >>>>>>>>>>>> future representing the result, on which the user can choose to
>>> >>>>>>>>>>>> wait or not).
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Now, as for separating job submission and cluster creation, I
>>> >>>>>>>>>>>> am in favour of keeping both. Once again, the reasons are
>>> >>>>>>>>>>>> mentioned above by Stephan, Till and Aljoscha, and Zili also
>>> >>>>>>>>>>>> seems to agree. They mainly have to do with security, isolation
>>> >>>>>>>>>>>> and ease of resource management for the user, as he knows that
>>> >>>>>>>>>>>> "when my job is done, everything will be cleared up". This is
>>> >>>>>>>>>>>> also the experience you get when launching a process on your
>>> >>>>>>>>>>>> local OS.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> On excluding the per-job mode from returning a JobClient or
>>> >>>>>>>>>>>> not, I believe that eventually it would be nice to allow users
>>> >>>>>>>>>>>> to get back a JobClient. The reasons are that 1) I cannot find
>>> >>>>>>>>>>>> any objective reason why the user experience should diverge,
>>> >>>>>>>>>>>> and 2) this will be the way that the user will be able to
>>> >>>>>>>>>>>> interact with his running job. Assuming that the necessary
>>> >>>>>>>>>>>> ports are open for the REST API to work, the JobClient can run
>>> >>>>>>>>>>>> against the REST API without problems. If the needed ports are
>>> >>>>>>>>>>>> not open, then we are safe to not return a JobClient, as the
>>> >>>>>>>>>>>> user explicitly chose to close all points of communication to
>>> >>>>>>>>>>>> his running job.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> On the topic of not hijacking "env.execute()" in order to get
>>> >>>>>>>>>>>> the Plan, I definitely agree, but for the proposal of having a
>>> >>>>>>>>>>>> "compile()" method in the env, I would like to have a better
>>> >>>>>>>>>>>> look at the existing code.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Cheers,
>>> >>>>>>>>>>>> Kostas
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> On Fri, Aug 23, 2019 at 5:52 AM Zili Chen <wander4096@gmail.com> wrote:
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> Hi Yang,
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> It would be helpful if you check Stephan's last comment,
>>> >>>>>>>>>>>>> which states that isolation is important.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> For per-job mode, we run a dedicated cluster (maybe it should
>>> >>>>>>>>>>>>> have been a couple of JMs and TMs during the FLIP-6 design)
>>> >>>>>>>>>>>>> for a specific job. Thus the process is isolated from other
>>> >>>>>>>>>>>>> jobs.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> In our case there was a time we suffered from multiple jobs
>>> >>>>>>>>>>>>> submitted by different users that affected each other, so
>>> >>>>>>>>>>>>> that all of them ran into an error state. Also, running the
>>> >>>>>>>>>>>>> client inside the cluster could save client resources at some
>>> >>>>>>>>>>>>> points.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> However, we also face several issues as you mentioned, e.g.
>>> >>>>>>>>>>>>> that per-job mode always uses the parent classloader and thus
>>> >>>>>>>>>>>>> classloading issues occur.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> BTW, one can make an analogy between session/per-job mode in
>>> >>>>>>>>>>>>> Flink and client/cluster mode in Spark.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> Best,
>>> >>>>>>>>>>>>> tison.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> Yang Wang <da...@gmail.com> wrote on Thu, Aug 22, 2019, at 11:25 AM:
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> From the user's perspective, it is really confusing what
>>> >>>>>>>>>>>>>> the scope of a per-job cluster is.
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> If it means a Flink cluster with a single job, so that we
>>> >>>>>>>>>>>>>> get better isolation, then it does not matter how we deploy
>>> >>>>>>>>>>>>>> the cluster: directly deploy it (mode 1), or start a Flink
>>> >>>>>>>>>>>>>> cluster and then submit the job through a cluster client
>>> >>>>>>>>>>>>>> (mode 2).
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> Otherwise, if it just means direct deployment, how should
>>> >>>>>>>>>>>>>> we name mode 2: session with job, or something else?
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> We could also benefit from mode 2. Users could get the same
>>> >>>>>>>>>>>>>> isolation as with mode 1. The user code and dependencies
>>> >>>>>>>>>>>>>> will be loaded by the user class loader to avoid class
>>> >>>>>>>>>>>>>> conflicts with the framework.
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> Anyway, both of the two submission modes are useful. We
>>> >>>>>>>>>>>>>> just need to clarify the concepts.
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> Best,
>>> >>>>>>>>>>>>>> Yang
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> Zili Chen <wa...@gmail.com> wrote on Tue, Aug 20, 2019, at 5:58 PM:
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> Thanks for the clarification.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> The idea of a JobDeployer once came into my mind when I
>>> >>>>>>>>>>>>>>> was muddled with how to execute per-job mode and session
>>> >>>>>>>>>>>>>>> mode with the same user code and framework code path.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> With the concept of a JobDeployer we get back to the
>>> >>>>>>>>>>>>>>> statement that the environment knows every config of
>>> >>>>>>>>>>>>>>> cluster deployment and job submission. We configure, or
>>> >>>>>>>>>>>>>>> generate from configuration, a specific JobDeployer in the
>>> >>>>>>>>>>>>>>> environment, and then the code aligns on
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> *JobClient client = env.execute().get();*
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> which in session mode is returned by
>>> >>>>>>>>>>>>>>> clusterClient.submitJob and in per-job mode is returned by
>>> >>>>>>>>>>>>>>> clusterDescriptor.deployJobCluster.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> Here comes a problem: currently we directly run
>>> >>>>>>>>>>>>>>> ClusterEntrypoint with the extracted job graph. Following
>>> >>>>>>>>>>>>>>> the JobDeployer way, we had better align the entry point
>>> >>>>>>>>>>>>>>> of per-job deployment at the JobDeployer. Users run their
>>> >>>>>>>>>>>>>>> main method, or a CLI (which finally calls the main
>>> >>>>>>>>>>>>>>> method), to deploy the job cluster.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> Best,
>>> >>>>>>>>>>>>>>> tison.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org> wrote on Tue, Aug 20, 2019, at 4:40 PM:
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> Till has made some good comments here.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> Two things to add:
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>   - The job mode is very nice in the way that it runs the
>>> >>>>>>>>>>>>>>>> client inside the cluster (in the same image/process that
>>> >>>>>>>>>>>>>>>> is the JM) and thus unifies both applications and what
>>> >>>>>>>>>>>>>>>> the Spark world calls the "driver mode".
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>   - Another thing I would add is that during the FLIP-6
>>> >>>>>>>>>>>>>>>> design, we were thinking about setups where Dispatcher
>>> >>>>>>>>>>>>>>>> and JobManager are separate processes. A Yarn or Mesos
>>> >>>>>>>>>>>>>>>> Dispatcher of a session could run independently (even as
>>> >>>>>>>>>>>>>>>> a privileged process executing no code). Then the
>>> >>>>>>>>>>>>>>>> "per-job" mode could still be helpful: when a job is
>>> >>>>>>>>>>>>>>>> submitted to the dispatcher, it launches the JM again in
>>> >>>>>>>>>>>>>>>> a per-job mode, so that JM and TM processes are bound to
>>> >>>>>>>>>>>>>>>> the job only. For higher-security setups, it is important
>>> >>>>>>>>>>>>>>>> that processes are not reused across jobs.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <trohrmann@apache.org> wrote:
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> I would not be in favour of getting rid of the per-job
>>> >>>>>>>>>>>>>>>>> mode since it simplifies the process of running Flink
>>> >>>>>>>>>>>>>>>>> jobs considerably. Moreover, it is not only well suited
>>> >>>>>>>>>>>>>>>>> for container deployments but also for deployments where
>>> >>>>>>>>>>>>>>>>> you want to guarantee job isolation. For example, a user
>>> >>>>>>>>>>>>>>>>> could use the per-job mode on Yarn to execute his job on
>>> >>>>>>>>>>>>>>>>> a separate cluster.
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> I think that having two notions of cluster deployments
>>> >>>>>>>>>>>>>>>>> (session vs. per-job mode) does not necessarily
>>> >>>>>>>>>>>>>>>>> contradict your ideas for the client api refactoring.
>>> >>>>>>>>>>>>>>>>> For example one could have the following interfaces:
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> - ClusterDeploymentDescriptor: encapsulates the logic of
>>> >>>>>>>>>>>>>>>>>   how to deploy a cluster.
>>> >>>>>>>>>>>>>>>>> - ClusterClient: allows interaction with a cluster.
>>> >>>>>>>>>>>>>>>>> - JobClient: allows interaction with a running job.
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> Now the ClusterDeploymentDescriptor could have two
>>> >>>>>>>>>>>>>>>>> methods:
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> - ClusterClient deploySessionCluster()
>>> >>>>>>>>>>>>>>>>> - JobClusterClient/JobClient deployPerJobCluster(JobGraph)
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> where JobClusterClient is either a supertype of
>>> >>>>>>>>>>>>>>>>> ClusterClient that does not give you the functionality
>>> >>>>>>>>>>>>>>>>> to submit jobs, or deployPerJobCluster directly returns
>>> >>>>>>>>>>>>>>>>> a JobClient.
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> When setting up the ExecutionEnvironment, one would then
>>> >>>>>>>>>>>>>>>>> not provide a ClusterClient to submit jobs but a
>>> >>>>>>>>>>>>>>>>> JobDeployer which, depending on the selected mode,
>>> >>>>>>>>>>>>>>>>> either uses a ClusterClient (session mode) to submit
>>> >>>>>>>>>>>>>>>>> jobs or a ClusterDeploymentDescriptor to deploy a
>>> >>>>>>>>>>>>>>>>> per-job mode cluster with the job to execute.
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> These are just some thoughts on how one could make it
>>> >>>>>>>>>>>>>>>>> work, because I believe there is some value in using the
>>> >>>>>>>>>>>>>>>>> per-job mode from the ExecutionEnvironment.
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> Concerning the web submission, this is indeed a bit
>>> >>>>>>>>>>>>>>>>> tricky. From a cluster management standpoint, I would be
>>> >>>>>>>>>>>>>>>>> in favour of not executing user code on the REST
>>> >>>>>>>>>>>>>>>>> endpoint. Especially when considering security, it would
>>> >>>>>>>>>>>>>>>>> be good to have a well defined cluster behaviour where
>>> >>>>>>>>>>>>>>>>> it is explicitly stated where user code and, thus,
>>> >>>>>>>>>>>>>>>>> potentially risky code is executed. Ideally we limit it
>>> >>>>>>>>>>>>>>>>> to the TaskExecutor and JobMaster.
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> Cheers,
>>> >>>>>>>>>>>>>>>>> Till
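[Editor's note] Till's proposed interface split above can be written down as a type sketch. The interface names come from his mail; the stub implementation and its string "job graphs" are invented for illustration and are not Flink code.

```java
// A handle for interacting with a running job.
interface JobClient {
    String getJobId();
}

// A handle for interacting with a (session) cluster; only session
// clusters accept further job submissions.
interface ClusterClient {
    JobClient submitJob(String jobGraph);
}

// Encapsulates how to bring a cluster up on some target (YARN, Mesos, ...).
interface ClusterDeploymentDescriptor {
    ClusterClient deploySessionCluster();
    JobClient deployPerJobCluster(String jobGraph); // cluster bound to one job
}

// A trivial in-memory stand-in to show the call pattern.
class StubDeploymentDescriptor implements ClusterDeploymentDescriptor {
    public ClusterClient deploySessionCluster() {
        // The returned client can submit any number of jobs.
        return jobGraph -> () -> "session-" + jobGraph;
    }
    public JobClient deployPerJobCluster(String jobGraph) {
        // No ClusterClient is handed out: the cluster accepts no more jobs.
        return () -> "perjob-" + jobGraph;
    }
}
```

A JobDeployer inside the environment would then pick deploySessionCluster().submitJob(...) or deployPerJobCluster(...) depending on the configured mode, while the user-facing code stays identical in both cases.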
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 9:40 AM Flavio Pompermaier <pompermaier@okkam.it> wrote:
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> In my opinion the client should not use any environment
>>> >>>>>>>>>>>>>>>>>> to get the JobGraph, because the jar should reside ONLY
>>> >>>>>>>>>>>>>>>>>> on the cluster (and not on the client classpath,
>>> >>>>>>>>>>>>>>>>>> otherwise there are always inconsistencies between the
>>> >>>>>>>>>>>>>>>>>> client's and the Flink Job Manager's classpaths).
>>> >>>>>>>>>>>>>>>>>> In the YARN, Mesos and Kubernetes scenarios you have the
>>> >>>>>>>>>>>>>>>>>> jar, but you could start a cluster that has the jar on
>>> >>>>>>>>>>>>>>>>>> the Job Manager as well (and this is the only case where
>>> >>>>>>>>>>>>>>>>>> I think you can assume that the client has the jar on
>>> >>>>>>>>>>>>>>>>>> its classpath; in the REST job submission you don't have
>>> >>>>>>>>>>>>>>>>>> any classpath).
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> Thus, always in my opinion, the JobGraph should be
>>> >>>>>>>>>>>>>>>>>> generated by the Job Manager REST API.
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <wander4096@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> I would like to involve Till & Stephan here to clarify
>>> >>>>>>>>>>>>>>>>>>> some concepts of per-job mode.
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> The term per-job is one of the modes a cluster can run
>>> >>>>>>>>>>>>>>>>>>> in. It is mainly aimed at spawning a dedicated cluster
>>> >>>>>>>>>>>>>>>>>>> for a specific job, while the job can be packaged with
>>> >>>>>>>>>>>>>>>>>>> Flink itself and the cluster thus initialized with the
>>> >>>>>>>>>>>>>>>>>>> job, so that we get rid of a separate submission step.
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> This is useful for container deployments, where one
>>> >>>>>>>>>>>>>>>>>>> creates an image with the job and then simply deploys
>>> >>>>>>>>>>>>>>>>>>> the container.
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> However, it is out of the client's scope, since a
>>> >>>>>>>>>>>>>>>>>>> client (ClusterClient for example) is for communicating
>>> >>>>>>>>>>>>>>>>>>> with an existing cluster and performing actions.
>>> >>>>>>>>>>>>>>>>>>> Currently, in per-job mode, we extract the job graph
>>> >>>>>>>>>>>>>>>>>>> and bundle it into the cluster deployment, and thus no
>>> >>>>>>>>>>>>>>>>>>> concept of a client gets involved. It looks reasonable
>>> >>>>>>>>>>>>>>>>>>> to exclude the deployment of a per-job cluster from the
>>> >>>>>>>>>>>>>>>>>>> client api and use dedicated utility classes
>>> >>>>>>>>>>>>>>>>>>> (deployers) for deployment.
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> Zili Chen <wa...@gmail.com> wrote on Tue, Aug 20, 2019 at 12:37 PM:
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> Hi Aljoscha,
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> Thanks for your reply and participation. The Google Doc you linked to requires permission; I think you could use a share link instead.
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> I agree that we have almost reached a consensus that JobClient is necessary to interact with a running job.
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> Let me go through your open questions one by one.
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> 1. Separate cluster creation and job submission for per-job mode.
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> As you mentioned, here is where the opinions diverge. In my document there is an alternative[2] that proposes excluding per-job deployment from the client api scope, and now I find it more reasonable that we do the exclusion.
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> In per-job mode, a dedicated JobCluster is launched to execute the specific job. It is more like a Flink Application than a submission of a Flink Job. The client only takes care of job submission and assumes there is an existing cluster. In this way we are able to consider per-job issues individually, and JobClusterEntrypoint would be the utility class for per-job deployment.
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> Nevertheless, user programs work in both session mode and per-job mode without any need to change code. JobClient in per-job mode is returned from env.execute as normal. However, it would no longer be a wrapper of RestClusterClient but a wrapper of PerJobClusterClient, which communicates with the Dispatcher locally.
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> 2. How to deal with plan preview.
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> With env.compile functions users can get the JobGraph or FlinkPlan and thus preview the plan programmatically. Typically it looks like
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> if (preview configured) {
>>> >>>>>>>>>>>>>>>>>>>>    FlinkPlan plan = env.compile();
>>> >>>>>>>>>>>>>>>>>>>>    new JSONDumpGenerator(...).dump(plan);
>>> >>>>>>>>>>>>>>>>>>>> } else {
>>> >>>>>>>>>>>>>>>>>>>>    env.execute();
>>> >>>>>>>>>>>>>>>>>>>> }
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> And `flink info` would no longer be valid.
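The preview flow described above can be made concrete with a small self-contained sketch. Note that `Env`, `compile()`, `FlinkPlan` and `JsonDumpGenerator` here are toy stand-ins for the proposed (hypothetical) APIs from the discussion, not existing Flink classes:

```java
import java.util.List;

// Toy stand-ins for the proposed APIs: env.compile() returns a plan
// that can be dumped as JSON instead of being executed.
final class FlinkPlan {
    final List<String> operators;
    FlinkPlan(List<String> operators) { this.operators = operators; }
}

final class Env {
    FlinkPlan compile() {
        // In the proposal this would run the optimizer without executing.
        return new FlinkPlan(List.of("Source", "Map", "Sink"));
    }
    void execute() { System.out.println("job submitted"); }
}

final class JsonDumpGenerator {
    String dump(FlinkPlan plan) {
        return "{\"nodes\":[\"" + String.join("\",\"", plan.operators) + "\"]}";
    }
}

public class PlanPreview {
    public static void main(String[] args) {
        boolean preview = args.length > 0 && args[0].equals("--preview");
        Env env = new Env();
        if (preview) {
            // Preview path: dump the compiled plan instead of submitting.
            System.out.println(new JsonDumpGenerator().dump(env.compile()));
        } else {
            env.execute();
        }
    }
}
```

With such an API the `flink info` behavior would simply become a library call inside the user program rather than a CLI feature.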
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> 3. How to deal with Jar Submission at the Web Frontend.
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> There is one more thread discussing this topic[1]. Apart from removing the functions, there are two alternatives.
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> One is to introduce an interface with a method that returns a JobGraph/FlinkPlan, and have Jar Submission only support main classes that implement this interface. We then extract the JobGraph/FlinkPlan just by calling the method. In this way, it is even possible to consider a separation of job creation and job submission.
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> The other is, as you mentioned, to let execute() do the actual execution. We won't execute the main method in the WebFrontend but spawn a process at the WebMonitor side to execute it. For the return part, we could generate the JobID at the WebMonitor and pass it to the execution environment.
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> 4. How to deal with detached mode.
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> I think detached mode is a temporary solution for non-blocking submission. In my document both submission and execution return a CompletableFuture, and users control whether or not to wait for the result. From this point of view we don't need a detached option, but the functionality is covered.
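The future-based submission described in point 4 can be sketched as follows. The `submit`/`JobResult` names are assumptions drawn from the discussion, not the final interface; the point is that "detached" reduces to simply not joining the returned future:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Sketch: submission returns a future immediately; "detached" is just
// not waiting on it, "attached" is join()-ing it. Names are hypothetical.
public class FutureSubmission {
    static class JobResult {
        final String jobId;
        JobResult(String jobId) { this.jobId = jobId; }
    }

    // Pretend submission: completes asynchronously after a short delay.
    static CompletableFuture<JobResult> submit(String jobName) {
        CompletableFuture<JobResult> result = new CompletableFuture<>();
        CompletableFuture.delayedExecutor(50, TimeUnit.MILLISECONDS)
                .execute(() -> result.complete(new JobResult(jobName + "-001")));
        return result;
    }

    public static void main(String[] args) {
        CompletableFuture<JobResult> f = submit("wordcount");
        // Detached behavior: return here without waiting on f.
        // Attached behavior: block until the job finishes.
        JobResult r = f.join();
        System.out.println("finished: " + r.jobId);
    }
}
```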
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> 5. How does per-job mode interact with interactive programming.
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> All of the YARN, Mesos and Kubernetes scenarios now follow the pattern of launching a JobCluster. And I don't think there would be inconsistency between the different resource managements.
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> Best,
>>> >>>>>>>>>>>>>>>>>>>> tison.
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> [1]
>>> >>>>>>>>>>>>>>>>>>>> https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
>>> >>>>>>>>>>>>>>>>>>>> [2]
>>> >>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> Aljoscha Krettek <al...@apache.org> wrote on Fri, Aug 16, 2019 at 9:20 PM:
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> Hi,
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> I read both Jeff's initial design document and the newer document by Tison. I also finally found the time to collect our thoughts on the issue; I had quite some discussions with Kostas and this is the result: [1].
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> I think overall we agree that this part of the code is in dire need of some refactoring/improvements, but I think there are still some open questions and some differences in opinion on what those refactorings should look like.
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> I think the API side is quite clear, i.e. we need some JobClient API that allows interacting with a running Job. It could be worthwhile to spin that off into a separate FLIP because we can probably find consensus on that part more easily.
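The JobClient idea the thread converges on could look roughly like the sketch below. All names and method signatures here are guesses at the shape being discussed at this point in the thread, not the interface Flink eventually shipped; a trivial in-memory stub is included so the sketch is runnable:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical shape of a client bound to one running job.
interface JobClient {
    String getJobId();
    CompletableFuture<String> getJobStatus();          // e.g. "RUNNING"
    CompletableFuture<String> triggerSavepoint(String targetDirectory);
    CompletableFuture<Void> cancel();
}

// Minimal in-memory stub standing in for a real cluster-backed client.
class StubJobClient implements JobClient {
    private final String jobId;
    private volatile String status = "RUNNING";
    StubJobClient(String jobId) { this.jobId = jobId; }
    public String getJobId() { return jobId; }
    public CompletableFuture<String> getJobStatus() {
        return CompletableFuture.completedFuture(status);
    }
    public CompletableFuture<String> triggerSavepoint(String dir) {
        return CompletableFuture.completedFuture(dir + "/savepoint-" + jobId);
    }
    public CompletableFuture<Void> cancel() {
        status = "CANCELED";
        return CompletableFuture.completedFuture(null);
    }
}

public class JobClientSketch {
    public static void main(String[] args) {
        JobClient client = new StubJobClient("a1b2");
        System.out.println(client.getJobStatus().join());
        client.cancel().join();
        System.out.println(client.getJobStatus().join());
    }
}
```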
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> For the rest, the main open questions from our doc are these:
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>  - Do we want to separate cluster creation and job submission for per-job mode? In the past, there were conscious efforts to *not* separate job submission from cluster creation for per-job clusters for Mesos, YARN, Kubernetes (see StandaloneJobClusterEntryPoint). Tison suggests in his design document to decouple this in order to unify job submission.
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>  - How to deal with plan preview, which needs to hijack execute() and let the outside code catch an exception?
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>  - How to deal with Jar Submission at the Web Frontend, which needs to hijack execute() and let the outside code catch an exception? CliFrontend.run() “hijacks” ExecutionEnvironment.execute() to get a JobGraph and then execute that JobGraph manually. We could get around that by letting execute() do the actual execution. One caveat for this is that then the main() method doesn’t return (or is forced to return by throwing an exception from execute()), which means that for Jar Submission from the WebFrontend we have a long-running main() method running in the WebFrontend. This doesn’t sound very good. We could get around this by removing the plan preview feature and by removing Jar Submission/Running.
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>  - How to deal with detached mode? Right now, DetachedEnvironment will execute the job and return immediately. If users control when they want to return, by waiting on the job completion future, how do we deal with this? Do we simply remove the distinction between detached/non-detached?
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>  - How does per-job mode interact with “interactive programming” (FLIP-36)? For YARN, each execute() call could spawn a new Flink YARN cluster. What about Mesos and Kubernetes?
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> The first open question is where the opinions diverge, I think. The rest are just open questions and interesting things that we need to consider.
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> Best,
>>> >>>>>>>>>>>>>>>>>>>>> Aljoscha
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> [1]
>>> >>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> On 31. Jul 2019, at 15:23, Jeff Zhang <zjffdu@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> Thanks tison for the effort. I left a few comments.
>>> >>>>>>>> comments.
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> Zili Chen <wa...@gmail.com> wrote on Wed, Jul 31, 2019 at 8:24 PM:
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>> Hi Flavio,
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>> Thanks for your reply.
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>> Neither in the current implementation nor in the design does ClusterClient take responsibility for generating the JobGraph. (What you see in the current codebase is several class methods.)
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>> Instead, the user describes his program in the main method with ExecutionEnvironment apis and calls env.compile() or env.optimize() to get the FlinkPlan and JobGraph respectively.
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>> For listing the main classes in a jar and choosing one for submission, you are already able to customize a CLI to do it. Specifically, the path of the jar is passed as an argument, and in the customized CLI you list the main classes and choose one to submit to the cluster.
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>> Best,
>>> >>>>>>>>>>>>>>>>>>>>>>> tison.
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>> Flavio Pompermaier <pompermaier@okkam.it> wrote on Wed, Jul 31, 2019 at 8:12 PM:
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>> Just one note on my side: it is not clear to me whether the client needs to be able to generate a job graph or not.
>>> >>>>>>>>>>>>>>>>>>>>>>>> In my opinion, the job jar must reside only on the server/jobManager side, and the client requires a way to get the job graph. If you really want access to the job graph, I'd add a dedicated method to the ClusterClient, like:
>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>  - getJobGraph(jarId, mainClass): JobGraph
>>> >>>>>>>>>>>>>>>>>>>>>>>>  - listMainClasses(jarId): List<String>
>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>> These would require some additions on the job manager endpoint as well. What do you think?
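The two methods proposed above can be sketched as an interface plus an in-memory fake, to show how a UI might drive them (list the main classes, let the user pick one, then fetch the job graph). The interface, the `jarId` handles, and the class names are all illustrative assumptions, not an existing Flink API:

```java
import java.util.List;
import java.util.Map;

// Sketch of the two proposed ClusterClient additions; a String handle
// stands in for a real JobGraph to keep the example self-contained.
interface JarAwareClusterClient {
    List<String> listMainClasses(String jarId);
    String getJobGraph(String jarId, String mainClass);
}

// In-memory fake in place of a real cluster endpoint.
class FakeClusterClient implements JarAwareClusterClient {
    private final Map<String, List<String>> jars =
            Map.of("jar-1", List.of("com.acme.WordCount", "com.acme.PageRank"));
    public List<String> listMainClasses(String jarId) {
        return jars.getOrDefault(jarId, List.of());
    }
    public String getJobGraph(String jarId, String mainClass) {
        if (!listMainClasses(jarId).contains(mainClass)) {
            throw new IllegalArgumentException("unknown main class: " + mainClass);
        }
        return "jobgraph:" + jarId + ":" + mainClass;
    }
}

public class JarSubmissionFlow {
    public static void main(String[] args) {
        JarAwareClusterClient client = new FakeClusterClient();
        // A UI would present this list and let the user pick one entry.
        String chosen = client.listMainClasses("jar-1").get(0);
        System.out.println(client.getJobGraph("jar-1", chosen));
    }
}
```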
>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <wander4096@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> Here is a document[1] on client api enhancement from our perspective. We have investigated the current implementations, and we propose:
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> 1. Unify the implementation of cluster deployment and job submission in Flink.
>>> >>>>>>>>>>>>>>>>>>>>>>>>> 2. Provide programmatic interfaces to allow flexible job and cluster management.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> The first proposal is aimed at reducing the code paths of cluster deployment and job submission so that one can adopt Flink easily. The second proposal is aimed at providing rich interfaces for advanced users who want precise control of these stages.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> Quick reference on open questions:
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> 1. Exclude job cluster deployment from the client side, or redefine the semantics of a job cluster? It fits in a process quite different from session cluster deployment and job submission.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> 2. Maintain the code paths handling the class o.a.f.api.common.Program, or implement customized program handling logic via a customized CliFrontend? See also this thread[2] and the document[1].
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> 3. Expose ClusterClient as a public api, or just expose an api in ExecutionEnvironment and delegate to ClusterClient? Further, in either case, is it worth introducing a JobClient, an encapsulation of ClusterClient that is associated with a specific job?
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>> >>>>>>>>>>>>>>>>>>>>>>>>> tison.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>> >>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
>>> >>>>>>>>>>>>>>>>>>>>>>>>> [2]
>>> >>>>>>>>>>>>>>>>>>>>>>>>> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang <zj...@gmail.com> wrote on Wed, Jul 24, 2019 at 9:19 AM:
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks Stephan, I will follow up on this issue in the next few weeks and will refine the design doc. We can discuss more details after the 1.9 release.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org> wrote on Wed, Jul 24, 2019 at 12:58 AM:
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi all!
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> This thread has stalled for a bit, which I assume is mostly due to the Flink 1.9 feature freeze and release testing effort.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> I personally still recognize this issue as an important one to solve. I'd be happy to help resume this discussion soon (after the 1.9 release) and see if we can take some steps towards this in Flink 1.10.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Stephan
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <pompermaier@okkam.it> wrote:
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> That's exactly what I suggested a long time ago: the Flink REST client should not require any Flink dependency, only an http library to call the REST services to submit and monitor a job.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> What I suggested also in [1] was to have a way to automatically suggest to the user (via a UI) the available main classes and their required parameters[2].
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Another problem we have with Flink is that the REST client and the CLI one behave differently, and we use the CLI client (via ssh) because it allows calling some other method after env.execute() [3] (we have to call another REST service to signal the end of the job).
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> In this regard, a dedicated interface, like the JobListener suggested in the previous emails, would be very helpful (IMHO).
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-10864
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-10862
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> [3] https://issues.apache.org/jira/browse/FLINK-10879
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Flavio
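The JobListener idea mentioned above can be sketched as a small callback interface that a client (CLI or REST) would notify around submission and completion, so both behave the same. The callback names and the `execute` stand-in are assumptions for illustration, not the interface Flink later added:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener notified around job submission/completion.
interface JobListener {
    void onJobSubmitted(String jobId);
    void onJobFinished(String jobId, boolean success);
}

// Records events so the flow can be observed.
class RecordingListener implements JobListener {
    final List<String> events = new ArrayList<>();
    public void onJobSubmitted(String jobId) { events.add("submitted:" + jobId); }
    public void onJobFinished(String jobId, boolean success) {
        events.add("finished:" + jobId + ":" + success);
    }
}

public class ListenerDemo {
    // Stand-in for env.execute() that notifies registered listeners.
    static void execute(String jobId, List<JobListener> listeners) {
        listeners.forEach(l -> l.onJobSubmitted(jobId));
        // ... job runs here ...
        listeners.forEach(l -> l.onJobFinished(jobId, true));
    }

    public static void main(String[] args) {
        RecordingListener listener = new RecordingListener();
        execute("job-7", List.of(listener));
        System.out.println(listener.events);
    }
}
```

With such hooks, the "call another REST service when the job ends" workaround becomes an `onJobFinished` callback instead of an ssh-driven CLI script.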
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zjffdu@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, Tison,
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your comments. Overall I agree with you that it is difficult for downstream projects to integrate with Flink and that we need to refactor the current Flink client api.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> And I agree that CliFrontend should only parse command line arguments and then pass them to ExecutionEnvironment. It is ExecutionEnvironment's responsibility to compile the job, create the cluster, and submit the job. Besides that, Flink currently has many ExecutionEnvironment implementations, and Flink will use a specific one based on the context. IMHO, this is not necessary: ExecutionEnvironment should be able to do the right thing based on the FlinkConf it receives. Too many ExecutionEnvironment implementations are another burden for downstream project integration.
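The "one entry point, behavior chosen by configuration" idea argued for above can be sketched as a small factory. The `execution.target` key and the environment descriptions are assumptions for illustration of the design direction, not existing Flink configuration:

```java
import java.util.Map;

// Sketch: a single factory picks the execution behavior from config,
// instead of exposing many ExecutionEnvironment subclasses to users.
public class EnvFactory {
    interface Environment { String describe(); }

    static Environment create(Map<String, String> conf) {
        String target = conf.getOrDefault("execution.target", "local");
        switch (target) {
            case "local":        return () -> "local in-JVM execution";
            case "remote":       return () -> "submit to existing session cluster";
            case "yarn-per-job": return () -> "deploy dedicated YARN job cluster";
            default: throw new IllegalArgumentException("unknown target: " + target);
        }
    }

    public static void main(String[] args) {
        // Downstream projects would only ever touch this one entry point.
        System.out.println(create(Map.of("execution.target", "remote")).describe());
    }
}
```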
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> One thing I'd like to mention is Flink's scala shell and sql client: although they are sub-modules of Flink, they can be treated as downstream projects which use Flink's client api. Currently you will find it is not easy for them to integrate with Flink; they share a lot of duplicated code. It is another sign that we should refactor the Flink client api.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I believe it is a large and hard change, and I am afraid we cannot keep compatibility, since many of the changes are user facing.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
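[Editor's note] The split of responsibilities proposed above can be sketched in one place. This is a minimal illustration only; `Configuration`, `ExecutionEnvironment`, and `fromArgs` here are hypothetical stand-ins, not the actual Flink classes:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not real Flink classes): one ExecutionEnvironment
// that decides its behavior from the configuration it receives, instead
// of Flink choosing among many environment subclasses for the caller.
public class EnvSketch {

    // Stand-in for Flink's Configuration: a simple key/value map.
    static class Configuration {
        private final Map<String, String> values = new HashMap<>();
        Configuration set(String key, String value) { values.put(key, value); return this; }
        String get(String key, String fallback) { return values.getOrDefault(key, fallback); }
    }

    static class ExecutionEnvironment {
        private final Configuration conf;
        ExecutionEnvironment(Configuration conf) { this.conf = conf; }

        // The environment, not the CLI, decides how to execute:
        // it reads the target from the configuration it was given.
        String execute(String jobName) {
            String target = conf.get("execution.target", "local");
            return "job '" + jobName + "' submitted via " + target;
        }
    }

    // A CliFrontend reduced to its proper role: parse "--key value" pairs
    // into a Configuration and hand it to the environment untouched.
    static ExecutionEnvironment fromArgs(String... args) {
        Configuration conf = new Configuration();
        for (int i = 0; i + 1 < args.length; i += 2) {
            conf.set(args[i].replaceFirst("^--", ""), args[i + 1]);
        }
        return new ExecutionEnvironment(conf);
    }
}
```

With this shape, adding a new deployment mode means adding a configuration value, not another environment subclass.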
> On Mon, Jun 24, 2019 at 2:53 PM, Zili Chen <wander4096@gmail.com> wrote:
>
>> Hi all,
>>
>> After a closer look at our client apis, I can see there are two major
>> issues to consistency and integration, namely deployment of a job
>> cluster, which couples job graph creation and cluster deployment, and
>> submission via CliFrontend, which confuses the control flow of job
>> graph compilation and job submission. I'd like to follow the discussion
>> above, mainly the process described by Jeff and Stephan, and share my
>> ideas on these issues.
>>
>> 1) CliFrontend confuses the control flow of job compilation and
>> submission. Following the process of job submission Stephan and Jeff
>> described, the execution environment knows all configs of the cluster
>> and the topology/settings of the job. Ideally, in the main method of
>> the user program, it calls #execute (or, named, #submit) and Flink
>> deploys the cluster, compiles the job graph and submits it to the
>> cluster. However, the current CliFrontend does all these things inside
>> its #runProgram method, which introduces a lot of subclasses of the
>> (stream) execution environment.
>>
>> Actually, it sets up an exec env that hijacks the #execute/#executePlan
>> method, initializes the job graph and aborts execution. Then control
>> flows back to CliFrontend, which deploys the cluster (or retrieves the
>> client) and submits the job graph. This is quite a specific internal
>> process inside Flink, consistent with nothing else.
>>
>> 2) Deployment of a job cluster couples job graph creation and cluster
>> deployment. Abstractly, going from a user job to a concrete submission
>> requires such a dependency:
>>
>>      create JobGraph --\
>> create ClusterClient --> submit JobGraph
>>
>> The ClusterClient is created by deploying or retrieving. JobGraph
>> submission requires a compiled JobGraph and a valid ClusterClient, but
>> the creation of the ClusterClient is abstractly independent of that of
>> the JobGraph. However, in job cluster mode we deploy the job cluster
>> with a job graph, which means we use another process:
>>
>> create JobGraph --> deploy cluster with the JobGraph
>>
>> Here is another inconsistency, and downstream projects/client apis are
>> forced to handle different cases with little support from Flink.
>>
>> Since we likely reached a consensus on
>>
>> 1. all configs gathered by Flink configuration and passed
>> 2. the execution environment knows all configs and handles execution
>>    (both deployment and submission)
>>
>> I propose eliminating the inconsistencies above by the following
>> approach:
>>
>> 1) CliFrontend should be exactly a front end, at least for the "run"
>> command. That means it just gathers and passes all config from the
>> command line to the main method of the user program. The execution
>> environment knows all the info, and with an addition of utils for
>> ClusterClient, we gracefully get a ClusterClient by deploying or
>> retrieving. In this way, we don't need to hijack the
>> #execute/#executePlan methods and can remove the various hacking
>> subclasses of exec env, as well as the #run methods in ClusterClient
>> (for an interface-ized ClusterClient). Now the control flow goes from
>> CliFrontend to the main method and never returns.
>>
>> 2) A job cluster means a cluster for a specific job. From another
>> perspective, it is an ephemeral session. We may decouple the deployment
>> from a compiled job graph, and instead start a session with an idle
>> timeout and submit the job afterwards.
>>
>> These topics are better to be aware of and discussed to a consensus
>> before we go into more details on design or implementation.
>>
>> Best,
>> tison.
>>
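[Editor's note] The two submission paths tison contrasts can be sketched side by side. Every type and method below is a hypothetical stand-in used only to model the shapes, not a real Flink interface:

```java
// Hypothetical sketch of the two submission paths described above.
// None of these types are real Flink APIs; they only model the shapes.
public class SubmissionSketch {

    static class JobGraph {
        final String name;
        JobGraph(String name) { this.name = name; }
    }

    // A ClusterClient whose creation is independent of any JobGraph.
    interface ClusterClient {
        String submit(JobGraph graph);
    }

    // Session path: deploy (or retrieve) a cluster first, then submit
    // an independently compiled JobGraph to it.
    static String sessionSubmit(ClusterClient client, JobGraph graph) {
        return client.submit(graph);
    }

    // Job-cluster path as it exists today: the deployment itself takes
    // the JobGraph, coupling graph creation to cluster deployment.
    static String deployJobCluster(JobGraph graph) {
        return "job cluster deployed with " + graph.name;
    }

    // The proposal: treat a job cluster as an ephemeral session, so both
    // paths collapse into "get a client, then submit".
    static String ephemeralSessionSubmit(JobGraph graph) {
        ClusterClient client = g -> "submitted " + g.name + " to ephemeral session";
        return sessionSubmit(client, graph);
    }
}
```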
>> On Thu, Jun 20, 2019 at 3:21 AM, Zili Chen <wander4096@gmail.com> wrote:
>>
>>> Hi Jeff,
>>>
>>> Thanks for raising this thread and the design document!
>>>
>>> As @Thomas Weise mentioned above, extending config to Flink requires
>>> far more effort than it should. Another example is that we achieve
>>> detach mode by introducing another execution environment, which also
>>> hijacks the #execute method.
>>>
>>> I agree with your idea that the user would configure all things and
>>> Flink would "just" respect them. On this topic I think the unusual
>>> control flow when CliFrontend handles the "run" command is the
>>> problem. It handles several configs, mainly about cluster settings,
>>> and thus the main method of the user program is unaware of them.
>>> Also, it compiles the app to a job graph by running the main method
>>> with a hijacked exec env, which constrains the main method further.
>>>
>>> I'd like to write down a few notes on passing and respecting
>>> configs/args, as well as decoupling job compilation and submission.
>>> I will share them on this thread later.
>>>
>>> Best,
>>> tison.
>>>
>>> On Mon, Jun 17, 2019 at 7:29 PM, SHI Xiaogang <shixiaogangg@gmail.com> wrote:
>>>
>>>> Hi Jeff and Flavio,
>>>>
>>>> Thanks Jeff a lot for proposing the design document.
>>>>
>>>> We are also working on refactoring ClusterClient to allow flexible
>>>> and efficient job management in our real-time platform. We would
>>>> like to draft a document to share our ideas with you.
>>>>
>>>> I think it's a good idea to have something like Apache Livy for
>>>> Flink, and the efforts discussed here will take a great step forward
>>>> towards it.
>>>>
>>>> Regards,
>>>> Xiaogang
>>>>
>>>> On Mon, Jun 17, 2019 at 7:13 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
>>>>
>>>>> Is there any possibility to have something like Apache Livy [1]
>>>>> also for Flink in the future?
>>>>>
>>>>> [1] https://livy.apache.org/
>>>>>
>>>>> On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zjffdu@gmail.com> wrote:
>>>>>
>>>>>> > Any API we expose should not have dependencies on the runtime
>>>>>> > (flink-runtime) package or other implementation details. To me,
>>>>>> > this means that the current ClusterClient cannot be exposed to
>>>>>> > users because it uses quite some classes from the optimiser and
>>>>>> > runtime packages.
>>>>>>
>>>>>> We should change ClusterClient from a class to an interface.
>>>>>> ExecutionEnvironment would only use the ClusterClient interface,
>>>>>> which should be in flink-clients, while the concrete implementation
>>>>>> class could be in flink-runtime.
>>>>>>
>>>>>> > What happens when a failure/restart in the client happens? There
>>>>>> > needs to be a way of re-establishing the connection to the job,
>>>>>> > setting up the listeners again, etc.
>>>>>>
>>>>>> Good point. First we need to define what failure/restart in the
>>>>>> client means. IIUC, that usually means a network failure, which
>>>>>> will happen in the class RestClient. If my understanding is
>>>>>> correct, the restart/retry mechanism should be done in RestClient.
>>>>>>
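[Editor's note] A minimal sketch of the interface split Jeff proposes. The names mirror the discussion but are hypothetical illustrations, not the actual flink-clients types:

```java
// Hypothetical sketch of an interface-ized ClusterClient. The interface
// would live in flink-clients and expose no runtime internals; concrete
// implementations (REST-based, etc.) would live in flink-runtime.
public class ClientSketch {

    // Public-facing interface: only stable, user-level notions appear here.
    interface ClusterClient {
        String submitJob(String jobName);      // returns a job id
        String getJobStatus(String jobId);
    }

    // Concrete implementation, free to depend on runtime details.
    static class RestClusterClient implements ClusterClient {
        private int counter = 0;

        @Override
        public String submitJob(String jobName) {
            // A real implementation would POST the job to the REST
            // endpoint and handle retries on network failures here.
            return jobName + "-" + (++counter);
        }

        @Override
        public String getJobStatus(String jobId) {
            return "RUNNING";
        }
    }

    // The environment programs against the interface only, so swapping
    // implementations never touches user-facing code.
    static String runThrough(ClusterClient client, String jobName) {
        String id = client.submitJob(jobName);
        return id + ":" + client.getJobStatus(id);
    }
}
```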
>>>>>> On Tue, Jun 11, 2019 at 11:10 PM, Aljoscha Krettek <aljoscha@apache.org> wrote:
>>>>>>
>>>>>>> Some points to consider:
>>>>>>>
>>>>>>> * Any API we expose should not have dependencies on the runtime
>>>>>>> (flink-runtime) package or other implementation details. To me,
>>>>>>> this means that the current ClusterClient cannot be exposed to
>>>>>>> users because it uses quite some classes from the optimiser and
>>>>>>> runtime packages.
>>>>>>>
>>>>>>> * What happens when a failure/restart in the client happens?
>>>>>>> There needs to be a way of re-establishing the connection to the
>>>>>>> job, setting up the listeners again, etc.
>>>>>>>
>>>>>>> Aljoscha
>>>>>>>
>>>>>>> On 29. May 2019, at 10:17, Jeff Zhang <zjffdu@gmail.com> wrote:
>>>>>>>
>>>>>>>> Sorry folks, the design doc is late as you expected. Here's the
>>>>>>>> design doc I drafted, welcome any comments and feedback.
>>>>>>>>
>>>>>>>> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
>>>>>>>>
>>>>>>>> On Thu, Feb 14, 2019 at 8:43 PM, Stephan Ewen <sewen@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Nice that this discussion is happening.
>>>>>>>>>
>>>>>>>>> In the FLIP, we could also revisit the entire role of the
>>>>>>>>> environments again.
>>>>>>>>>
>>>>>>>>> Initially, the idea was:
>>>>>>>>> - the environments take care of the specific setup for
>>>>>>>>>   standalone (no setup needed), yarn, mesos, etc.
>>>>>>>>> - the session ones have control over the session. The
>>>>>>>>>   environment holds the session client.
>>>>>>>>> - running a job gives a "control" object for that job. That
>>>>>>>>>   behavior is the same in all environments.
>>>>>>>>>
>>>>>>>>> The actual implementation diverged quite a bit from that. Happy
>>>>>>>>> to see a discussion about straightening this out a bit more.
>>>>>>>>>
>>>>>>>>> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zjffdu@gmail.com> wrote:
>>>>>>>>>
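[Editor's note] Stephan's per-job "control" object can be sketched as a uniform handle returned by execute(). `JobControl` and its methods are hypothetical names for illustration, not Flink's actual API:

```java
// Hypothetical sketch of a uniform per-job "control" object that every
// environment returns from execute(), regardless of how the cluster was
// set up (standalone, yarn, mesos, ...).
public class ControlSketch {

    interface JobControl {
        String status();
        boolean cancel();
    }

    // One trivial in-memory implementation; real ones would talk to the
    // cluster through a cluster client.
    static class SimpleJobControl implements JobControl {
        private boolean cancelled = false;

        @Override
        public String status() { return cancelled ? "CANCELED" : "RUNNING"; }

        @Override
        public boolean cancel() {
            if (cancelled) return false;
            cancelled = true;
            return true;
        }
    }

    // Every environment, whatever its setup, hands back the same kind
    // of handle, so downstream projects control jobs uniformly.
    static JobControl execute(String jobName) {
        return new SimpleJobControl();
    }
}
```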
>>>>>>>>>> Hi folks,
>>>>>>>>>>
>>>>>>>>>> Sorry for the late response. It seems we reached consensus on
>>>>>>>>>> this; I will create a FLIP for this with a more detailed design.
>>>>>>>>>>
>>>>>>>>>> On Fri, Dec 21, 2018 at 11:43 AM, Thomas Weise <thw@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Great to see this discussion seeded! The problems you face
>>>>>>>>>>> with the Zeppelin integration are also affecting other
>>>>>>>>>>> downstream projects, like Beam.
>>>>>>>>>>>
>>>>>>>>>>> We just enabled the savepoint restore option in
>>>>>>>>>>> RemoteStreamEnvironment [1] and that was more difficult than
>>>>>>>>>>> it should be. The main issue is that environment and cluster
>>>>>>>>>>> client aren't decoupled. Ideally it should be possible to
>>>>>>>>>>> just get the matching cluster client from the environment and
>>>>>>>>>>> then control the job through it (environment as factory for
>>>>>>>>>>> cluster client). But note that the environment classes are
>>>>>>>>>>> part of the public API, and it is not straightforward to make
>>>>>>>>>>> larger changes without breaking backward compatibility.
>>>>>>>>>>>
>>>>>>>>>>> ClusterClient currently exposes internal classes like
>>>>>>>>>>> JobGraph and StreamGraph. But it should be possible to wrap
>>>>>>>>>>> this with a new public API that brings the required job
>>>>>>>>>>> control capabilities for downstream projects. Perhaps it is
>>>>>>>>>>> helpful to look at some of the interfaces in Beam while
>>>>>>>>>>> thinking about this: [2] for the portable job API and [3] for
>>>>>>>>>>> the old asynchronous job control from the Beam Java SDK.
>>>>>>>>>>>
>>>>>>>>>>> The backward compatibility discussion [4] is also relevant
>>>>>>>>>>> here. A new API should shield downstream projects from
>>>>>>>>>>> internals and allow them to interoperate with multiple future
>>>>>>>>>>> Flink versions in the same release line without forced
>>>>>>>>>>> upgrades.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thomas
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>> >>>>>>>>>>>>>> https://github.com/apache/flink/pull/7249
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [2]
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>
>>> >> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [3]
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>
>>> >> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [4]
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>
>>> >> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
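The narrow public facade Thomas is asking for — job control without exposing JobGraph/StreamGraph, in the spirit of Beam's PipelineResult — might look roughly like the sketch below. Every name here (JobHandle, FakeJobHandle, the method signatures) is an illustrative assumption, not existing Flink or Beam API:

```java
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

// Hypothetical stable job-control facade; none of these names exist in Flink.
interface JobHandle {
    String jobId();
    CompletableFuture<String> requestStatus(); // e.g. "RUNNING", "CANCELED"
    CompletableFuture<String> cancel();        // resolves to the final status
}

// In-memory stand-in so the sketch runs without a cluster.
class FakeJobHandle implements JobHandle {
    private final String id = UUID.randomUUID().toString();
    private volatile String state = "RUNNING";

    public String jobId() { return id; }
    public CompletableFuture<String> requestStatus() {
        return CompletableFuture.completedFuture(state);
    }
    public CompletableFuture<String> cancel() {
        state = "CANCELED";
        return CompletableFuture.completedFuture(state);
    }
}
```

Downstream projects would compile only against the facade; how a handle is produced (standalone, YARN, Mesos) would stay an implementation detail behind it, which is what keeps the surface compatible across Flink versions.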
> > > On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zjffdu@gmail.com> wrote:
> > >
> > > > > I'm not so sure whether the user should be able to define where the
> > > > > job runs (in your example Yarn). This is actually independent of the
> > > > > job development and is something which is decided at deployment time.
> > > >
> > > > Users don't need to specify the execution mode programmatically. They
> > > > can also pass the execution mode via the arguments of the flink run
> > > > command, e.g.
> > > >
> > > > bin/flink run -m yarn-cluster ...
> > > > bin/flink run -m local ...
> > > > bin/flink run -m host:port ...
> > > >
> > > > Does this make sense to you?
> > > >
> > > > > To me it makes sense that the ExecutionEnvironment is not directly
> > > > > initialized by the user and instead context sensitive how you want
> > > > > to execute your job (Flink CLI vs. IDE, for example).
> > > >
> > > > Right, currently I notice that Flink creates a different
> > > > ContextExecutionEnvironment for each submission scenario (Flink CLI
> > > > vs. IDE). To me this is kind of a hack, not very straightforward.
> > > > What I suggested above is that Flink should always create the same
> > > > ExecutionEnvironment, just with different configuration, and based on
> > > > that configuration it would create the proper ClusterClient for the
> > > > different behaviors.
> > > >
> > > > On Thu, Dec 20, 2018 at 11:18 PM, Till Rohrmann <trohrmann@apache.org> wrote:
> > > >
> > > > > You are probably right that we have code duplication when it comes
> > > > > to the creation of the ClusterClient. This should be reduced in the
> > > > > future.
> > > > >
> > > > > I'm not so sure whether the user should be able to define where the
> > > > > job runs (in your example Yarn). This is actually independent of the
> > > > > job development and is something which is decided at deployment
> > > > > time. To me it makes sense that the ExecutionEnvironment is not
> > > > > directly initialized by the user and instead context sensitive how
> > > > > you want to execute your job (Flink CLI vs. IDE, for example).
> > > > > However, I agree that the ExecutionEnvironment should give you
> > > > > access to the ClusterClient and to the job (maybe in the form of
> > > > > the JobGraph or a job plan).
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > >
> > > > > > Hi Till,
> > > > > >
> > > > > > Thanks for the feedback. You are right that I expect a better
> > > > > > programmatic job submission/control API which could be used by
> > > > > > downstream projects, and it would benefit the Flink ecosystem.
> > > > > > When I look at the code of the Flink scala-shell and sql-client
> > > > > > (I believe they are not the core of Flink, but belong to its
> > > > > > ecosystem), I find a lot of duplicated code for creating a
> > > > > > ClusterClient from user-provided configuration (the configuration
> > > > > > format may even differ between scala-shell and sql-client) and
> > > > > > then using that ClusterClient to manipulate jobs. I don't think
> > > > > > this is convenient for downstream projects. What I expect is that
> > > > > > a downstream project only needs to provide the necessary
> > > > > > configuration info (maybe by introducing a class FlinkConf), then
> > > > > > builds an ExecutionEnvironment based on this FlinkConf, and the
> > > > > > ExecutionEnvironment will create the proper ClusterClient. This
> > > > > > would not only benefit downstream project development but also
> > > > > > help their integration tests with Flink. Here's a sample code
> > > > > > snippet of what I expect:
> > > > > >
> > > > > > val conf = new FlinkConf().mode("yarn")
> > > > > > val env = new ExecutionEnvironment(conf)
> > > > > > val jobId = env.submit(...)
> > > > > > val jobStatus = env.getClusterClient().queryJobStatus(jobId)
> > > > > > env.getClusterClient().cancelJob(jobId)
> > > > > >
> > > > > > What do you think?
> > > > > >
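A runnable toy version of the FlinkConf idea from Jeff's snippet could look like the following; FlinkConf, SketchExecutionEnvironment, and the returned client names are all hypothetical, taken from the snippet rather than from real Flink classes:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical configuration holder from the proposal above.
class FlinkConf {
    private final Map<String, String> settings = new HashMap<>();
    FlinkConf mode(String mode) { settings.put("mode", mode); return this; }
    String mode() { return settings.getOrDefault("mode", "local"); }
}

// Sketch of an environment whose behavior is driven only by the conf.
class SketchExecutionEnvironment {
    private final FlinkConf conf;
    SketchExecutionEnvironment(FlinkConf conf) { this.conf = conf; }

    // A real implementation would instantiate and cache the matching
    // ClusterClient here; the sketch just reports which kind it would pick.
    String clusterClientKind() {
        switch (conf.mode()) {
            case "yarn":  return "YarnClusterClient";
            case "local": return "MiniClusterClient";
            default:      return "RestClusterClient";
        }
    }
}
```

The point of the design is visible even in the toy: the deployment choice lives entirely in the configuration object, so switching from local testing to YARN changes one line, not the environment class.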
> > > > > > On Tue, Dec 11, 2018 at 6:28 PM, Till Rohrmann <trohrmann@apache.org> wrote:
> > > > > >
> > > > > > > Hi Jeff,
> > > > > > >
> > > > > > > what you are proposing is to provide the user with better
> > > > > > > programmatic job control. There was actually an effort to
> > > > > > > achieve this but it has never been completed [1]. However,
> > > > > > > there are some improvements in the code base now. Look for
> > > > > > > example at the NewClusterClient interface which offers a
> > > > > > > non-blocking job submission. But I agree that we need to
> > > > > > > improve Flink in this regard.
> > > > > > >
> > > > > > > I would not be in favour of exposing all ClusterClient calls
> > > > > > > via the ExecutionEnvironment because it would clutter the class
> > > > > > > and would not be a good separation of concerns. Instead, one
> > > > > > > idea could be to retrieve the current ClusterClient from the
> > > > > > > ExecutionEnvironment, which can then be used for cluster and
> > > > > > > job control. But before we start an effort here, we need to
> > > > > > > agree on and capture what functionality we want to provide.
> > > > > > >
> > > > > > > Initially, the idea was that we have the ClusterDescriptor
> > > > > > > describing how to talk to a cluster manager like Yarn or Mesos.
> > > > > > > The ClusterDescriptor can be used for deploying Flink clusters
> > > > > > > (job and session) and gives you a ClusterClient. The
> > > > > > > ClusterClient controls the cluster (e.g. submitting jobs,
> > > > > > > listing all running jobs). And then there was the idea to
> > > > > > > introduce a JobClient, which you obtain from the ClusterClient,
> > > > > > > to trigger job-specific operations (e.g. taking a savepoint,
> > > > > > > cancelling the job).
> > > > > > >
> > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-4272
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
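The layering Till recalls — a descriptor deploys a cluster, a cluster client controls that cluster, a job client controls one job — can be mocked up in a few lines. The Sketch*/Fake* names below are illustrative stand-ins, not the actual Flink interfaces or their signatures:

```java
import java.util.ArrayList;
import java.util.List;

// Cluster-level control: submit and list jobs, hand out per-job clients.
interface SketchClusterClient {
    String submitJob(String jobName);            // returns a job id
    List<String> listJobs();
    SketchJobClient getJobClient(String jobId);  // per-job control
}

// Job-level control, scoped to a single job id.
interface SketchJobClient {
    void cancel();
    String status();
}

// In-memory stand-in for what a ClusterDescriptor would deploy.
class FakeSessionCluster implements SketchClusterClient {
    private final List<String> jobs = new ArrayList<>();
    private final List<String> canceled = new ArrayList<>();

    public String submitJob(String jobName) {
        String id = "job-" + jobs.size();
        jobs.add(id);
        return id;
    }
    public List<String> listJobs() { return new ArrayList<>(jobs); }
    public SketchJobClient getJobClient(String jobId) {
        return new SketchJobClient() {
            public void cancel() { canceled.add(jobId); }
            public String status() {
                return canceled.contains(jobId) ? "CANCELED" : "RUNNING";
            }
        };
    }
}
```

Keeping the three levels separate is what avoids the clutter Till mentions: the environment never has to expose cluster-wide operations just to let a user cancel their own job.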
> > > > > > > On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Folks,
> > > > > > > >
> > > > > > > > I am trying to integrate Flink into Apache Zeppelin, which is
> > > > > > > > an interactive notebook, and I hit several issues that are
> > > > > > > > caused by the Flink client API. So I'd like to propose the
> > > > > > > > following changes to the Flink client API.
> > > > > > > >
> > > > > > > > 1. Support nonblocking execution. Currently,
> > > > > > > > ExecutionEnvironment#execute is a blocking method which does
> > > > > > > > two things: first submit the job, then wait for the job until
> > > > > > > > it is finished. I'd like to introduce a nonblocking execution
> > > > > > > > method like ExecutionEnvironment#submit which only submits
> > > > > > > > the job and then returns the jobId to the client, and allows
> > > > > > > > the user to query the job status via the jobId.
> > > > > > > >
> > > > > > > > 2. Add a cancel API in
> > > > > > > > ExecutionEnvironment/StreamExecutionEnvironment. Currently
> > > > > > > > the only way to cancel a job is via the CLI (bin/flink),
> > > > > > > > which is not convenient for downstream projects to use. So
> > > > > > > > I'd like to add a cancel API in ExecutionEnvironment.
> > > > > > > >
> > > > > > > > 3. Add a savepoint API in
> > > > > > > > ExecutionEnvironment/StreamExecutionEnvironment. It is
> > > > > > > > similar to the cancel API; we should use ExecutionEnvironment
> > > > > > > > as the unified API for third parties to integrate with Flink.
> > > > > > > >
> > > > > > > > 4. Add a listener for the job execution lifecycle, something
> > > > > > > > like the following, so that downstream projects can run
> > > > > > > > custom logic in the lifecycle of the job. E.g. Zeppelin would
> > > > > > > > capture the jobId after the job is submitted and then use
> > > > > > > > this jobId to cancel the job later when necessary.
> > > > > > > >
> > > > > > > > public interface JobListener {
> > > > > > > >
> > > > > > > > void onJobSubmitted(JobID jobId);
> > > > > > > >
> > > > > > > > void
>>> >>>>>>>>>>>> onJobExecuted(JobExecutionResult
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> jobResult);
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> void
>>> >> onJobCanceled(JobID
>>> >>>>>>>>>> jobId);
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> }
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 5. Enable
>>> >> session in
>>> >>>>>>>>>>>>>>> ExecutionEnvironment.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Currently
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> it
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> disabled,
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> but
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> session is very
>>> >> convenient
>>> >>>>>>>> for
>>> >>>>>>>>>>>> third
>>> >>>>>>>>>>>>>>> party
>>> >>>>>>>>>>>>>>>>>>>>>>> to
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> submitting
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jobs
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> continually.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I hope flink can
>>> >> enable it
>>> >>>>>>>>>> again.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 6. Unify all
>>> >> flink client
>>> >>>>>>>> api
>>> >>>>>>>>>>> into
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is a long
>>> >> term issue
>>> >>>>>>>> which
>>> >>>>>>>>>>>> needs
>>> >>>>>>>>>>>>>>> more
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> careful
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thinking
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> design.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Currently some
>>> >> of features
>>> >>>>>>>> of
>>> >>>>>>>>>>>> flink is
>>> >>>>>>>>>>>>>>>>>>>>>>>> exposed
>>> >>>>>>>>>>>>>>>>>>>>>>>>> in
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment,
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> but
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> some are
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> exposed
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cli instead of
>>> >> api, like
>>> >>>>>>>> the
>>> >>>>>>>>>>>> cancel and
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> savepoint I
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mentioned
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> above.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> think the root
>>> >> cause is
>>> >>>>>>>> due to
>>> >>>>>>>>>>> that
>>> >>>>>>>>>>>>>> flink
>>> >>>>>>>>>>>>>>>>>>>>>>>>> didn't
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> unify
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interaction
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> flink. Here I
>>> >> list 3
>>> >>>>>>>> scenarios
>>> >>>>>>>>>> of
>>> >>>>>>>>>>>> flink
>>> >>>>>>>>>>>>>>>>>>>>>>>>> operation
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Local job
>>> >> execution.
>>> >>>>>>>> Flink
>>> >>>>>>>>>>> will
>>> >>>>>>>>>>>>>>> create
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> LocalEnvironment
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> then
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> use
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this
>>> >> LocalEnvironment to
>>> >>>>>>>>>> create
>>> >>>>>>>>>>>>>>>>>>>>>>>> LocalExecutor
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> for
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> job
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> execution.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Remote job
>>> >> execution.
>>> >>>>>>>> Flink
>>> >>>>>>>>>>> will
>>> >>>>>>>>>>>>>>> create
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> first
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> then
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> create
>>> >> ContextEnvironment
>>> >>>>>>>>>> based
>>> >>>>>>>>>>>> on the
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> then
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> run
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Job
>>> >> cancelation. Flink
>>> >>>>>>>> will
>>> >>>>>>>>>>>> create
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> first
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> then
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cancel
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this job via
>>> >> this
>>> >>>>>>>>>> ClusterClient.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> As you can see
>>> >> in the
>>> >>>>>>>> above 3
>>> >>>>>>>>>>>>>> scenarios.
>>> >>>>>>>>>>>>>>>>>>>>>>>> Flink
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> didn't
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> use the
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> same
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> approach(code
>>> >> path) to
>>> >>>>>>>> interact
>>> >>>>>>>>>>>> with
>>> >>>>>>>>>>>>>>> flink
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What I propose is
>>> >>>>>>>> following:
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Create the proper
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> LocalEnvironment/RemoteEnvironment
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (based
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> user
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> configuration)
>>> >> --> Use this
>>> >>>>>>>>>>>> Environment
>>> >>>>>>>>>>>>>>> to
>>> >>>>>>>>>>>>>>>>>>>>>>>>> create
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> proper
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >> (LocalClusterClient or
>>> >>>>>>>>>>>>>> RestClusterClient)
>>> >>>>>>>>>>>>>>>>>>>>>>> to
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> interactive
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Flink (
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> execution or
>>> >> cancelation)
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This way we can
>>> >> unify the
>>> >>>>>>>>>> process
>>> >>>>>>>>>>>> of
>>> >>>>>>>>>>>>>>> local
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> execution
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> remote
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> execution.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> And it is much
>>> >> easier for
>>> >>>>>>>> third
>>> >>>>>>>>>>>> party
>>> >>>>>>>>>>>>>> to
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> integrate
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> with
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> flink,
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> because
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >> ExecutionEnvironment is the
>>> >>>>>>>>>>> unified
>>> >>>>>>>>>>>>>> entry
>>> >>>>>>>>>>>>>>>>>>>>>>>> point
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> for
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> flink.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> third
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> party
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> needs to do is
>>> >> just pass
>>> >>>>>>>>>>>> configuration
>>> >>>>>>>>>>>>>> to
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >> ExecutionEnvironment will
>>> >>>>>>>> do
>>> >>>>>>>>>> the
>>> >>>>>>>>>>>> right
>>> >>>>>>>>>>>>>>>>>>>>>>> thing
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> based
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> on
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> configuration.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Flink cli can
>>> >> also be
>>> >>>>>>>>>> considered
>>> >>>>>>>>>>> as
>>> >>>>>>>>>>>>>> flink
>>> >>>>>>>>>>>>>>>>>>>>>>> api
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> consumer.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> just
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pass
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> configuration to
>>> >>>>>>>>>>>> ExecutionEnvironment
>>> >>>>>>>>>>>>>> and
>>> >>>>>>>>>>>>>>>>>>>>>>> let
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> create the proper
>>> >>>>>>>> ClusterClient
>>> >>>>>>>>>>>> instead
>>> >>>>>>>>>>>>>>> of
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> letting
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> cli
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> create
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient
>>> >> directly.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 6 would involve
>>> >> large code
>>> >>>>>>>>>>>> refactoring,
>>> >>>>>>>>>>>>>>> so
>>> >>>>>>>>>>>>>>>>>>>>>>> I
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> think
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> we
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> can
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> defer
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> future release,
>>> >> 1,2,3,4,5
>>> >>>>>>>> could
>>> >>>>>>>>>>> be
>>> >>>>>>>>>>>> done
>>> >>>>>>>>>>>>>>> at
>>> >>>>>>>>>>>>>>>>>>>>>>>>> once I
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> believe.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> me
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> know
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> your
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> comments and
>>> >> feedback,
>>> >>>>>>>> thanks
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> --
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> --
>>> >>>>>>>>>>>>>>>>>>>>>> Best Regards
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> --
>>> >>>>>>> Best Regards
>>> >>>>>>>
>>> >>>>>>> Jeff Zhang
>>> >>
>>> >>
>>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
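
The JobListener proposal above could be exercised roughly as follows. This is a minimal, self-contained sketch: JobID and JobExecutionResult are local stand-ins for Flink's real classes, and NotebookJobTracker is a hypothetical downstream implementation (in the spirit of the Zeppelin use case), not an existing API.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class JobListenerSketch {

    // Minimal stand-ins for Flink types, so the sketch compiles on its own.
    record JobID(String value) {}
    record JobExecutionResult(JobID jobId, long netRuntimeMillis) {}

    // The interface as proposed in the mail above.
    interface JobListener {
        void onJobSubmitted(JobID jobId);
        void onJobExecuted(JobExecutionResult jobResult);
        void onJobCanceled(JobID jobId);
    }

    // Hypothetical: how a notebook frontend might remember submitted jobs
    // so it can cancel them later via the captured jobId.
    static class NotebookJobTracker implements JobListener {
        private final ConcurrentHashMap<String, JobID> byParagraph = new ConcurrentHashMap<>();
        private final String paragraphId;

        NotebookJobTracker(String paragraphId) { this.paragraphId = paragraphId; }

        @Override public void onJobSubmitted(JobID jobId) { byParagraph.put(paragraphId, jobId); }
        @Override public void onJobExecuted(JobExecutionResult r) { byParagraph.remove(paragraphId); }
        @Override public void onJobCanceled(JobID jobId) { byParagraph.remove(paragraphId); }

        Optional<JobID> runningJob() { return Optional.ofNullable(byParagraph.get(paragraphId)); }
    }

    public static void main(String[] args) {
        NotebookJobTracker tracker = new NotebookJobTracker("paragraph-1");
        JobID id = new JobID("job-42");
        tracker.onJobSubmitted(id);   // the environment would call this after submission
        System.out.println(tracker.runningJob().isPresent());
        tracker.onJobCanceled(id);    // frontend canceled the job via the captured jobId
        System.out.println(tracker.runningJob().isPresent());
    }
}
```

The point of the callback shape is that the environment, not the downstream project, owns submission, while the downstream project only observes jobId handoffs.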

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Zili Chen <wa...@gmail.com>.
Hi Aljoscha,

I'm OK to use the ASF slack.

Best,
tison.


Jeff Zhang <zj...@gmail.com> 于2019年9月11日周三 下午4:48写道:

> +1 for using slack for instant communication
>
> Aljoscha Krettek <al...@apache.org> 于2019年9月11日周三 下午4:44写道:
>
>> Hi,
>>
>> We could try and use the ASF slack for this purpose, that would probably
>> be easiest. See https://s.apache.org/slack-invite. We could create a
>> dedicated channel for our work and would still use the open ASF
>> infrastructure and people can have a look if they are interested because
>> discussion would be public. What do you think?
>>
>> P.S. Committers/PMCs should be able to log in with their Apache ID.
>>
>> Best,
>> Aljoscha
>>
>> > On 6. Sep 2019, at 14:24, Zili Chen <wa...@gmail.com> wrote:
>> >
>> > Hi Aljoscha,
>> >
>> > I'd like to gather all the ideas here and among documents, and draft a
>> > formal FLIP that keeps us on the same page. Hopefully I can start a
>> > FLIP thread next week.
>> >
>> > For the implementation, or said POC, part, I'd like to work with you
>> > guys who proposed the concept Executor to make sure that we go in the
>> > same direction. I'm wondering whether a dedicated thread or a Slack
>> > group is the proper one. In my opinion we can involve the team in a
>> > Slack group, concurrently with the FLIP process start our branch, and
>> > once we reach a consensus on the FLIP, open an umbrella issue about
>> > the framework and start subtasks. What do you think?
>> >
>> > Best,
>> > tison.
>> >
>> >
>> > Aljoscha Krettek <al...@apache.org> 于2019年9月5日周四 下午9:39写道:
>> >
>> >> Hi Tison,
>> >>
>> >> To keep this moving forward, maybe you want to start working on a
>> >> proof-of-concept implementation for the new JobClient interface,
>> >> maybe with a new method executeAsync() in the environment that
>> >> returns the JobClient, and implement the methods to see how that
>> >> works and where we get. Would you be interested in that?
>> >>
>> >> Also, at some point we should collect all the ideas and start
>> >> forming an actual FLIP.
>> >>
>> >> Best,
>> >> Aljoscha
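
The executeAsync()/JobClient pair discussed above might look roughly like the following. This is an illustrative sketch only: all names (JobClient, JobStatus, executeAsync, ToyEnvironment) are proposals under discussion, not Flink's final API, and the in-memory "cluster" is a stand-in so the example runs on its own.

```java
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

public class JobClientSketch {

    enum JobStatus { RUNNING, FINISHED, CANCELED }

    /** What the discussed JobClient might expose; method names are illustrative. */
    interface JobClient {
        String jobId();
        CompletableFuture<JobStatus> status();
        CompletableFuture<Void> cancel();
    }

    /** Toy environment: executeAsync() returns immediately with a JobClient handle. */
    static class ToyEnvironment {
        JobClient executeAsync() {
            String id = UUID.randomUUID().toString();
            CompletableFuture<JobStatus> terminal = new CompletableFuture<>();
            // Pretend the "job" finishes right away on a background thread.
            CompletableFuture.runAsync(() -> terminal.complete(JobStatus.FINISHED));
            return new JobClient() {
                public String jobId() { return id; }
                public CompletableFuture<JobStatus> status() { return terminal; }
                public CompletableFuture<Void> cancel() {
                    terminal.complete(JobStatus.CANCELED);
                    return CompletableFuture.completedFuture(null);
                }
            };
        }
    }

    public static void main(String[] args) throws Exception {
        // The caller chooses whether to block: get() waits, or the future can be chained.
        JobClient client = new ToyEnvironment().executeAsync();
        System.out.println(client.status().get());
    }
}
```

The design point is that submission and waiting are decoupled: the "detached" question disappears because the user simply decides whether to wait on the returned future.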
>> >>
>> >>> On 4. Sep 2019, at 12:04, Zili Chen <wa...@gmail.com> wrote:
>> >>>
>> >>> Thanks for your update Kostas!
>> >>>
>> >>> It looks good to me to clean up existing code paths as a first
>> >>> pass. I'd like to help review and file subtasks if I find any.
>> >>>
>> >>> Best,
>> >>> tison.
>> >>>
>> >>>
>> >>> Kostas Kloudas <kk...@gmail.com> 于2019年9月4日周三 下午5:52写道:
>> >>> Here is the issue, and I will keep on updating it as I find more
>> >>> issues.
>> >>>
>> >>> https://issues.apache.org/jira/browse/FLINK-13954
>> >>>
>> >>> This will also cover the refactoring of the Executors that we
>> >>> discussed in this thread, without any additional functionality
>> >>> (such as the job client).
>> >>>
>> >>> Kostas
>> >>>
>> >>> On Wed, Sep 4, 2019 at 11:46 AM Kostas Kloudas <kk...@gmail.com>
>> >> wrote:
>> >>>>
>> >>>> Great idea Tison!
>> >>>>
>> >>>> I will create the umbrella issue and post it here so that we are all
>> >>>> on the same page!
>> >>>>
>> >>>> Cheers,
>> >>>> Kostas
>> >>>>
>> >>>> On Wed, Sep 4, 2019 at 11:36 AM Zili Chen <wa...@gmail.com>
>> >> wrote:
>> >>>>>
>> >>>>> Hi Kostas & Aljoscha,
>> >>>>>
>> >>>>> I notice that there is a JIRA(FLINK-13946) which could be included
>> >>>>> in this refactor thread. Since we agree on most of directions in
>> >>>>> big picture, is it reasonable that we create an umbrella issue for
>> >>>>> refactor client APIs and also linked subtasks? It would be a better
>> >>>>> way to join forces across our community.
>> >>>>>
>> >>>>> Best,
>> >>>>> tison.
>> >>>>>
>> >>>>>
>> >>>>> Zili Chen <wa...@gmail.com> 于2019年8月31日周六 下午12:52写道:
>> >>>>>>
>> >>>>>> Great Kostas! Looking forward to your POC!
>> >>>>>>
>> >>>>>> Best,
>> >>>>>> tison.
>> >>>>>>
>> >>>>>>
>> >>>>>> Jeff Zhang <zj...@gmail.com> 于2019年8月30日周五 下午11:07写道:
>> >>>>>>>
>> >>>>>>> Awesome, @Kostas. Looking forward to your POC.
>> >>>>>>>
>> >>>>>>> Kostas Kloudas <kk...@gmail.com> 于2019年8月30日周五 下午8:33写道:
>> >>>>>>>
>> >>>>>>>> Hi all,
>> >>>>>>>>
>> >>>>>>>> I am just writing here to let you know that I am working on a
>> >>>>>>>> POC that tries to refactor the current state of job submission
>> >>>>>>>> in Flink. I want to stress that it introduces NO CHANGES to the
>> >>>>>>>> current behaviour of Flink. It just re-arranges things and
>> >>>>>>>> introduces the notion of an Executor, which is the entity
>> >>>>>>>> responsible for taking the user code and submitting it for
>> >>>>>>>> execution.
>> >>>>>>>>
>> >>>>>>>> Given this, the discussion about the functionality that the
>> >>>>>>>> JobClient will expose to the user can go on independently, and
>> >>>>>>>> the same holds for all the open questions so far.
>> >>>>>>>>
>> >>>>>>>> I hope I will have some more news to share soon.
>> >>>>>>>>
>> >>>>>>>> Thanks,
>> >>>>>>>> Kostas
>> >>>>>>>>
>> >>>>>>>>> On Mon, Aug 26, 2019 at 4:20 AM Yang Wang <danrtsey.wy@gmail.com>
>> >>>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>> Hi Zili,
>> >>>>>>>>>
>> >>>>>>>>> It makes sense to me that a dedicated cluster is started for a
>> >>>>>>>>> per-job cluster and will not accept more jobs.
>> >>>>>>>>> Just have a question about the command line.
>> >>>>>>>>>
>> >>>>>>>>> Currently we could use the following commands to start
>> >>>>>>>>> different clusters.
>> >>>>>>>>> *per-job cluster*
>> >>>>>>>>> ./bin/flink run -d -p 5 -ynm perjob-cluster1 -m yarn-cluster
>> >>>>>>>>> examples/streaming/WindowJoin.jar
>> >>>>>>>>> *session cluster*
>> >>>>>>>>> ./bin/flink run -p 5 -ynm session-cluster1 -m yarn-cluster
>> >>>>>>>>> examples/streaming/WindowJoin.jar
>> >>>>>>>>>
>> >>>>>>>>> What will it look like after the client enhancement?
>> >>>>>>>>>
>> >>>>>>>>> Best,
>> >>>>>>>>> Yang
>> >>>>>>>>>
>> >>>>>>>>> Zili Chen <wa...@gmail.com> 于2019年8月23日周五 下午10:46写道:
>> >>>>>>>>>
>> >>>>>>>>>> Hi Till,
>> >>>>>>>>>>
>> >>>>>>>>>> Thanks for your update. Nice to hear :-)
>> >>>>>>>>>>
>> >>>>>>>>>> Best,
>> >>>>>>>>>> tison.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Till Rohrmann <tr...@apache.org> 于2019年8月23日周五 下午10:39写道:
>> >>>>>>>>>>
>> >>>>>>>>>>> Hi Tison,
>> >>>>>>>>>>>
>> >>>>>>>>>>> just a quick comment concerning the class loading issues when
>> >>>>>>>>>>> using the per-job mode. The community wants to change it so
>> >>>>>>>>>>> that the StandaloneJobClusterEntryPoint actually uses the
>> >>>>>>>>>>> user code class loader with child-first class loading [1].
>> >>>>>>>>>>> Hence, I hope that this problem will be resolved soon.
>> >>>>>>>>>>>
>> >>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-13840
>> >>>>>>>>>>>
>> >>>>>>>>>>> Cheers,
>> >>>>>>>>>>> Till
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas <kkloudas@gmail.com> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Hi all,
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On the topic of web submission, I agree with Till that it
>> >>>>>>>>>>>> only seems to complicate things. It is bad for security, job
>> >>>>>>>>>>>> isolation (anybody can submit/cancel jobs), and its
>> >>>>>>>>>>>> implementation complicates some parts of the code. So, if we
>> >>>>>>>>>>>> were to redesign the WebUI, maybe this part could be left
>> >>>>>>>>>>>> out. In addition, I would say that the ability to cancel
>> >>>>>>>>>>>> jobs could also be left out.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Also, I would be in favour of removing the "detached" mode,
>> >>>>>>>>>>>> for the reasons mentioned above (i.e. because now we will
>> >>>>>>>>>>>> have a future representing the result, on which the user can
>> >>>>>>>>>>>> choose to wait or not).
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Now, for separating job submission and cluster creation, I
>> >>>>>>>>>>>> am in favour of keeping both. Once again, the reasons are
>> >>>>>>>>>>>> mentioned above by Stephan, Till, Aljoscha, and also Zili
>> >>>>>>>>>>>> seems to agree. They mainly have to do with security,
>> >>>>>>>>>>>> isolation and ease of resource management for the user, as
>> >>>>>>>>>>>> he knows that "when my job is done, everything will be
>> >>>>>>>>>>>> cleared up". This is also the experience you get when
>> >>>>>>>>>>>> launching a process on your local OS.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On excluding the per-job mode from returning a JobClient or
>> >>>>>>>>>>>> not, I believe that eventually it would be nice to allow
>> >>>>>>>>>>>> users to get back a JobClient. The reasons are that 1) I
>> >>>>>>>>>>>> cannot find any objective reason why the user experience
>> >>>>>>>>>>>> should diverge, and 2) this will be the way that the user
>> >>>>>>>>>>>> will be able to interact with his running job. Assuming that
>> >>>>>>>>>>>> the necessary ports are open for the REST API to work, I
>> >>>>>>>>>>>> think that the JobClient can run against the REST API
>> >>>>>>>>>>>> without problems. If the needed ports are not open, then we
>> >>>>>>>>>>>> are safe not to return a JobClient, as the user explicitly
>> >>>>>>>>>>>> chose to close all points of communication to his running
>> >>>>>>>>>>>> job.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On the topic of not hijacking "env.execute()" in order to
>> >>>>>>>>>>>> get the Plan, I definitely agree, but for the proposal of
>> >>>>>>>>>>>> having a "compile()" method in the env, I would like to have
>> >>>>>>>>>>>> a better look at the existing code.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Cheers,
>> >>>>>>>>>>>> Kostas
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Fri, Aug 23, 2019 at 5:52 AM Zili Chen <wander4096@gmail.com> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> Hi Yang,
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> It would be helpful if you check Stephan's last comment,
>> >>>>>>>>>>>>> which states that isolation is important.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> For per-job mode, we run a dedicated cluster (maybe it
>> >>>>>>>>>>>>> should have been a couple of JMs and TMs during the FLIP-6
>> >>>>>>>>>>>>> design) for a specific job. Thus the process is isolated
>> >>>>>>>>>>>>> from other jobs.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> In our case there was a time we suffered from multiple jobs
>> >>>>>>>>>>>>> submitted by different users that affected each other, so
>> >>>>>>>>>>>>> that all ran into an error state. Also, running the client
>> >>>>>>>>>>>>> inside the cluster could save client resources at some
>> >>>>>>>>>>>>> points.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> However, we also face several issues, as you mentioned: in
>> >>>>>>>>>>>>> per-job mode Flink always uses the parent classloader, so
>> >>>>>>>>>>>>> classloading issues occur.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> BTW, one can make an analogy between session/per-job mode
>> >>>>>>>>>>>>> in Flink, and client/cluster mode in Spark.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>> tison.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Yang Wang <da...@gmail.com> 于2019年8月22日周四 上午11:25写道:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> From the user's perspective, it is really confusing what
>> >>>>>>>>>>>>>> the scope of a per-job cluster is.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> If it means a Flink cluster with a single job, so that we
>> >>>>>>>>>>>>>> could get better isolation, then it does not matter how we
>> >>>>>>>>>>>>>> deploy the cluster: directly deploy (mode 1), or start a
>> >>>>>>>>>>>>>> Flink cluster and then submit the job through the cluster
>> >>>>>>>>>>>>>> client (mode 2).
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Otherwise, if it just means directly deploy, how should we
>> >>>>>>>>>>>>>> name mode 2: session with job, or something else?
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> We could also benefit from mode 2. Users could get the
>> >>>>>>>>>>>>>> same isolation as with mode 1. The user code and
>> >>>>>>>>>>>>>> dependencies will be loaded by the user class loader to
>> >>>>>>>>>>>>>> avoid class conflicts with the framework.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Anyway, both of the two submission modes are useful. We
>> >>>>>>>>>>>>>> just need to clarify the concepts.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>> Yang
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Zili Chen <wa...@gmail.com> 于2019年8月20日周二 下午5:58写道:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Thanks for the clarification.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> The idea of a JobDeployer came into my mind when I was
>> >>>>>>>>>>>>>>> muddled with how to execute per-job mode and session mode
>> >>>>>>>>>>>>>>> with the same user code and framework code path.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> With the concept of a JobDeployer we go back to the
>> >>>>>>>>>>>>>>> statement that the environment knows every config for
>> >>>>>>>>>>>>>>> cluster deployment and job submission. We configure, or
>> >>>>>>>>>>>>>>> generate from configuration, a specific JobDeployer in
>> >>>>>>>>>>>>>>> the environment, and then code aligns on
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> *JobClient client = env.execute().get();*
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> which in session mode is returned by
>> >>>>>>>>>>>>>>> clusterClient.submitJob and in per-job mode by
>> >>>>>>>>>>>>>>> clusterDescriptor.deployJobCluster.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Here comes a problem: currently we directly run
>> >>>>>>>>>>>>>>> ClusterEntrypoint with the extracted job graph. Following
>> >>>>>>>>>>>>>>> the JobDeployer way, we had better align the entry point
>> >>>>>>>>>>>>>>> of per-job deployment at the JobDeployer. Users run their
>> >>>>>>>>>>>>>>> main method, or a CLI (which finally calls the main
>> >>>>>>>>>>>>>>> method), to deploy the job cluster.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>> tison.
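
The JobDeployer alignment described above can be sketched as follows. Everything here is hypothetical (JobDeployer, the `execution.mode` key, the stub deployers): it only illustrates how session and per-job mode could converge on the same `env.execute()` code path, with the deployer chosen from configuration.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;

public class JobDeployerSketch {

    // Stand-in for the handle returned by execution; names are illustrative.
    record JobClient(String jobId, String deployedBy) {}

    /** One interface, two code paths: submit to a session, or deploy a job cluster. */
    interface JobDeployer {
        CompletableFuture<JobClient> deploy(String jobGraph);
    }

    static class SessionDeployer implements JobDeployer {
        public CompletableFuture<JobClient> deploy(String jobGraph) {
            // Session mode: a ClusterClient would submit the graph to a running cluster.
            return CompletableFuture.completedFuture(new JobClient("job-1", "session"));
        }
    }

    static class PerJobDeployer implements JobDeployer {
        public CompletableFuture<JobClient> deploy(String jobGraph) {
            // Per-job mode: a ClusterDescriptor would spin up a dedicated cluster.
            return CompletableFuture.completedFuture(new JobClient("job-1", "per-job"));
        }
    }

    static class Environment {
        private final JobDeployer deployer;

        Environment(Map<String, String> conf) {
            // The deployer is derived from configuration, not hard-coded by the caller.
            this.deployer = "per-job".equals(conf.get("execution.mode"))
                    ? new PerJobDeployer() : new SessionDeployer();
        }

        CompletableFuture<JobClient> execute() { return deployer.deploy("<job graph>"); }
    }

    public static void main(String[] args) throws Exception {
        JobClient client = new Environment(Map.of("execution.mode", "per-job")).execute().get();
        System.out.println(client.deployedBy());
    }
}
```

The caller's code is identical in both modes; only the configuration passed to the environment changes, which is the property the thread is arguing for.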
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org> 于2019年8月20日周二
>> >> 下午4:40写道:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Till has made some good comments here.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Two things to add:
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>  - The job mode is very nice in the way that it
>> >> runs the
>> >>>>>>>>>> client
>> >>>>>>>>>>>> inside
>> >>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>> cluster (in the same image/process that is the
>> >> JM) and thus
>> >>>>>>>>>>> unifies
>> >>>>>>>>>>>>>> both
>> >>>>>>>>>>>>>>>> applications and what the Spark world calls the
>> >> "driver
>> >>>>>>>> mode".
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>  - Another thing I would add is that during the
>> >> FLIP-6
>> >>>>>>>> design,
>> >>>>>>>>>>> we
>> >>>>>>>>>>>> were
>> >>>>>>>>>>>>>>>> thinking about setups where Dispatcher and
>> >> JobManager are
>> >>>>>>>>>>> separate
>> >>>>>>>>>>>>>>>> processes.
>> >>>>>>>>>>>>>>>>    A Yarn or Mesos Dispatcher of a session
>> >> could run
>> >>>>>>>>>>> independently
>> >>>>>>>>>>>>>> (even
>> >>>>>>>>>>>>>>>> as privileged processes executing no code).
>> >>>>>>>>>>>>>>>>    Then the "per-job" mode could still be
>> >> helpful:
>> >>>>>>>> when a
>> >>>>>>>>>>> job
>> >>>>>>>>>>>> is
>> >>>>>>>>>>>>>>>> submitted to the dispatcher, it launches the JM
>> >> again in a
>> >>>>>>>>>>> per-job
>> >>>>>>>>>>>>>> mode,
>> >>>>>>>>>>>>>>> so
>> >>>>>>>>>>>>>>>> that JM and TM processes are bound to the job
>> >> only. For
>> >>>>>>>> higher
>> >>>>>>>>>>>> security
>> >>>>>>>>>>>>>>>> setups, it is important that processes are not
>> >> reused
>> >>>>>>>> across
>> >>>>>>>>>>> jobs.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <
>> >>>>>>>>>>>> trohrmann@apache.org>
>> >>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> I would not be in favour of getting rid of the
>> >> per-job
>> >>>>>>>> mode
>> >>>>>>>>>>>> since it
>> >>>>>>>>>>>>>>>>> simplifies the process of running Flink jobs
>> >>>>>>>> considerably.
>> >>>>>>>>>>>> Moreover,
>> >>>>>>>>>>>>>> it
>> >>>>>>>>>>>>>>>> is
>> >>>>>>>>>>>>>>>>> not only well suited for container deployments
>> >> but also
>> >>>>>>>> for
>> >>>>>>>>>>>>>> deployments
>> >>>>>>>>>>>>>>>>> where you want to guarantee job isolation. For
>> >> example, a
>> >>>>>>>>>> user
>> >>>>>>>>>>>> could
>> >>>>>>>>>>>>>>> use
>> >>>>>>>>>>>>>>>>> the per-job mode on Yarn to execute his job on
>> >> a separate
>> >>>>>>>>>>>> cluster.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> I think that having two notions of cluster
>> >> deployments
>> >>>>>>>>>> (session
>> >>>>>>>>>>>> vs.
>> >>>>>>>>>>>>>>>> per-job
>> >>>>>>>>>>>>>>>>> mode) does not necessarily contradict your
>> >> ideas for the
>> >>>>>>>>>> client
>> >>>>>>>>>>>> api
>> >>>>>>>>>>>>>>>>> refactoring. For example one could have the
>> >> following
>> >>>>>>>>>>> interfaces:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> - ClusterDeploymentDescriptor: encapsulates
>> >> the logic
>> >>>>>>>> how to
>> >>>>>>>>>>>> deploy a
>> >>>>>>>>>>>>>>>>> cluster.
>> >>>>>>>>>>>>>>>>> - ClusterClient: allows to interact with a
>> >> cluster
>> >>>>>>>>>>>>>>>>> - JobClient: allows to interact with a running
>> >> job
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Now the ClusterDeploymentDescriptor could have
>> >> two
>> >>>>>>>> methods:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> - ClusterClient deploySessionCluster()
>> >>>>>>>>>>>>>>>>> - JobClusterClient/JobClient
>> >>>>>>>> deployPerJobCluster(JobGraph)
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> where JobClusterClient is either a supertype of
>> >>>>>>>> ClusterClient
>> >>>>>>>>>>>> which
>> >>>>>>>>>>>>>>> does
>> >>>>>>>>>>>>>>>>> not give you the functionality to submit jobs
>> >> or
>> >>>>>>>>>>>> deployPerJobCluster
>> >>>>>>>>>>>>>>>>> returns directly a JobClient.
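[Editor's note: a minimal Java sketch of the interfaces proposed above. The type names come from the mail itself; every signature, the String stand-in for JobGraph, and the toy implementation are assumptions for illustration, not the actual Flink API.]

```java
import java.util.concurrent.CompletableFuture;

// Names follow the proposal in the mail above; all signatures are
// assumptions for illustration, not existing Flink API.
public class ClientApiSketch {

    interface JobClient {
        String getJobId();
    }

    interface ClusterClient {
        CompletableFuture<JobClient> submitJob(String jobGraph);
    }

    interface ClusterDeploymentDescriptor {
        // session mode: deploy the cluster first, submit jobs later via the client
        ClusterClient deploySessionCluster();
        // per-job mode: the cluster is created together with the job,
        // so a JobClient can be returned directly
        JobClient deployPerJobCluster(String jobGraph);
    }

    // Toy in-memory implementation, only to make the sketch self-contained.
    static class ToyDescriptor implements ClusterDeploymentDescriptor {
        public ClusterClient deploySessionCluster() {
            return jobGraph ->
                CompletableFuture.completedFuture((JobClient) () -> "session-" + jobGraph);
        }
        public JobClient deployPerJobCluster(String jobGraph) {
            return () -> "perjob-" + jobGraph;
        }
    }

    public static void main(String[] args) {
        ClusterDeploymentDescriptor d = new ToyDescriptor();
        JobClient fromSession = d.deploySessionCluster().submitJob("g1").join();
        JobClient fromPerJob  = d.deployPerJobCluster("g2");
        System.out.println(fromSession.getJobId()); // session-g1
        System.out.println(fromPerJob.getJobId());  // perjob-g2
    }
}
```

Under this split, whether `deployPerJobCluster` returns a restricted `JobClusterClient` or a `JobClient` directly is exactly the open design question discussed in the mail.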
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> When setting up the ExecutionEnvironment, one
>> >> would then
>> >>>>>>>> not
>> >>>>>>>>>>>> provide
>> >>>>>>>>>>>>>> a
>> >>>>>>>>>>>>>>>>> ClusterClient to submit jobs but a JobDeployer
>> >> which,
>> >>>>>>>>>> depending
>> >>>>>>>>>>>> on
>> >>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>> selected mode, either uses a ClusterClient
>> >> (session
>> >>>>>>>> mode) to
>> >>>>>>>>>>>> submit
>> >>>>>>>>>>>>>>> jobs
>> >>>>>>>>>>>>>>>> or
>> >>>>>>>>>>>>>>>>> a ClusterDeploymentDescriptor to deploy per a
>> >> job mode
>> >>>>>>>>>> cluster
>> >>>>>>>>>>>> with
>> >>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>> job
>> >>>>>>>>>>>>>>>>> to execute.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> These are just some thoughts how one could
>> >> make it
>> >>>>>>>> working
>> >>>>>>>>>>>> because I
>> >>>>>>>>>>>>>>>>> believe there is some value in using the per
>> >> job mode
>> >>>>>>>> from
>> >>>>>>>>>> the
>> >>>>>>>>>>>>>>>>> ExecutionEnvironment.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Concerning the web submission, this is indeed
>> >> a bit
>> >>>>>>>> tricky.
>> >>>>>>>>>>> From
>> >>>>>>>>>>>> a
>> >>>>>>>>>>>>>>>> cluster
>> >>>>>>>>>>>>>>>>> management stand point, I would in favour of
>> >> not
>> >>>>>>>> executing
>> >>>>>>>>>> user
>> >>>>>>>>>>>> code
>> >>>>>>>>>>>>>> on
>> >>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>> REST endpoint. Especially when considering
>> >> security, it
>> >>>>>>>> would
>> >>>>>>>>>>> be
>> >>>>>>>>>>>> good
>> >>>>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>> have a well defined cluster behaviour where it
>> >> is
>> >>>>>>>> explicitly
>> >>>>>>>>>>>> stated
>> >>>>>>>>>>>>>>> where
>> >>>>>>>>>>>>>>>>> user code and, thus, potentially risky code is
>> >> executed.
>> >>>>>>>>>>> Ideally
>> >>>>>>>>>>>> we
>> >>>>>>>>>>>>>>> limit
>> >>>>>>>>>>>>>>>>> it to the TaskExecutor and JobMaster.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Cheers,
>> >>>>>>>>>>>>>>>>> Till
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 9:40 AM Flavio
>> >> Pompermaier <
>> >>>>>>>>>>>>>>> pompermaier@okkam.it
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> In my opinion the client should not use any
>> >>>>>>>> environment to
>> >>>>>>>>>>> get
>> >>>>>>>>>>>> the
>> >>>>>>>>>>>>>>> Job
>> >>>>>>>>>>>>>>>>>> graph because the jar should reside ONLY on
>> >> the cluster
>> >>>>>>>>>> (and
>> >>>>>>>>>>>> not in
>> >>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>> client classpath otherwise there are always
>> >>>>>>>> inconsistencies
>> >>>>>>>>>>>> between
>> >>>>>>>>>>>>>>>>> client
>> >>>>>>>>>>>>>>>>>> and Flink Job manager's classpath).
>> >>>>>>>>>>>>>>>>>> In the YARN, Mesos and Kubernetes scenarios
>> >> you have
>> >>>>>>>> the
>> >>>>>>>>>> jar
>> >>>>>>>>>>>> but
>> >>>>>>>>>>>>>> you
>> >>>>>>>>>>>>>>>>> could
>> >>>>>>>>>>>>>>>>>> start a cluster that has the jar on the Job
>> >> Manager as
>> >>>>>>>> well
>> >>>>>>>>>>>> (but
>> >>>>>>>>>>>>>> this
>> >>>>>>>>>>>>>>>> is
>> >>>>>>>>>>>>>>>>>> the only case where I think you can assume
>> >> that the
>> >>>>>>>> client
>> >>>>>>>>>>> has
>> >>>>>>>>>>>> the
>> >>>>>>>>>>>>>>> jar
>> >>>>>>>>>>>>>>>> on
>> >>>>>>>>>>>>>>>>>> the classpath..in the REST job submission
>> >> you don't
>> >>>>>>>> have
>> >>>>>>>>>> any
>> >>>>>>>>>>>>>>>> classpath).
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Thus, always in my opinion, the JobGraph
>> >> should be
>> >>>>>>>>>> generated
>> >>>>>>>>>>>> by the
>> >>>>>>>>>>>>>>> Job
>> >>>>>>>>>>>>>>>>>> Manager REST API.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <
>> >>>>>>>>>>>> wander4096@gmail.com>
>> >>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> I would like to involve Till & Stephan here
>> >> to clarify
>> >>>>>>>>>> some
>> >>>>>>>>>>>>>> concept
>> >>>>>>>>>>>>>>> of
>> >>>>>>>>>>>>>>>>>>> per-job mode.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> The term per-job is one of modes a cluster
>> >> could run
>> >>>>>>>> on.
>> >>>>>>>>>> It
>> >>>>>>>>>>> is
>> >>>>>>>>>>>>>>> mainly
>> >>>>>>>>>>>>>>>>>>> aimed
>> >>>>>>>>>>>>>>>>>>> at spawning
>> >>>>>>>>>>>>>>>>>>> a dedicated cluster for a specific job
>> >> while the job
>> >>>>>>>> could
>> >>>>>>>>>>> be
>> >>>>>>>>>>>>>>> packaged
>> >>>>>>>>>>>>>>>>>>> with
>> >>>>>>>>>>>>>>>>>>> Flink
>> >>>>>>>>>>>>>>>>>>> itself and thus the cluster initialized
>> >> with job so
>> >>>>>>>> that
>> >>>>>>>>>> get
>> >>>>>>>>>>>> rid
>> >>>>>>>>>>>>>> of
>> >>>>>>>>>>>>>>> a
>> >>>>>>>>>>>>>>>>>>> separated
>> >>>>>>>>>>>>>>>>>>> submission step.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> This is useful for container deployments
>> >> where one
>> >>>>>>>> create
>> >>>>>>>>>>> his
>> >>>>>>>>>>>>>> image
>> >>>>>>>>>>>>>>>> with
>> >>>>>>>>>>>>>>>>>>> the job
>> >>>>>>>>>>>>>>>>>>> and then simply deploy the container.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> However, it is out of client scope since a
>> >>>>>>>>>>>> client(ClusterClient
>> >>>>>>>>>>>>>> for
>> >>>>>>>>>>>>>>>>>>> example) is for
>> >>>>>>>>>>>>>>>>>>> communicating with an existing cluster and
>> >> performing
>> >>>>>>>>>>> actions.
>> >>>>>>>>>>>>>>>> Currently,
>> >>>>>>>>>>>>>>>>>>> in
>> >>>>>>>>>>>>>>>>>>> per-job
>> >>>>>>>>>>>>>>>>>>> mode, we extract the job graph and bundle
>> >> it into
>> >>>>>>>> cluster
>> >>>>>>>>>>>>>> deployment
>> >>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>> thus no
>> >>>>>>>>>>>>>>>>>>> concept of client get involved. It looks
>> >> like
>> >>>>>>>> reasonable
>> >>>>>>>>>> to
>> >>>>>>>>>>>>>> exclude
>> >>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>> deployment
>> >>>>>>>>>>>>>>>>>>> of per-job cluster from client api and use
>> >> dedicated
>> >>>>>>>>>> utility
>> >>>>>>>>>>>>>>>>>>> classes(deployers) for
>> >>>>>>>>>>>>>>>>>>> deployment.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Zili Chen <wa...@gmail.com>
>> >> 于2019年8月20日周二
>> >>>>>>>> 下午12:37写道:
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Hi Aljoscha,
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Thanks for your reply and participance.
>> >> The Google
>> >>>>>>>> Doc
>> >>>>>>>>>> you
>> >>>>>>>>>>>>>> linked
>> >>>>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>>>> requires
>> >>>>>>>>>>>>>>>>>>>> permission and I think you could use a
>> >> share link
>> >>>>>>>>>> instead.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> I agree with that we almost reach a
>> >> consensus that
>> >>>>>>>>>>>> JobClient is
>> >>>>>>>>>>>>>>>>>>> necessary
>> >>>>>>>>>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>>>> interact with a running Job.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Let me check your open questions one by
>> >> one.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> 1. Separate cluster creation and job
>> >> submission for
>> >>>>>>>>>>> per-job
>> >>>>>>>>>>>>>> mode.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> As you mentioned here is where the
>> >> opinions
>> >>>>>>>> diverge. In
>> >>>>>>>>>> my
>> >>>>>>>>>>>>>>> document
>> >>>>>>>>>>>>>>>>>>> there
>> >>>>>>>>>>>>>>>>>>>> is
>> >>>>>>>>>>>>>>>>>>>> an alternative[2] that proposes excluding
>> >> per-job
>> >>>>>>>>>>> deployment
>> >>>>>>>>>>>>>> from
>> >>>>>>>>>>>>>>>>> client
>> >>>>>>>>>>>>>>>>>>>> api
>> >>>>>>>>>>>>>>>>>>>> scope and now I find it is more
>> >> reasonable we do the
>> >>>>>>>>>>>> exclusion.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> When in per-job mode, a dedicated
>> >> JobCluster is
>> >>>>>>>> launched
>> >>>>>>>>>>> to
>> >>>>>>>>>>>>>>> execute
>> >>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>>> specific job. It is like a Flink
>> >> Application more
>> >>>>>>>> than a
>> >>>>>>>>>>>>>>> submission
>> >>>>>>>>>>>>>>>>>>>> of Flink Job. Client only takes care of
>> >> job
>> >>>>>>>> submission
>> >>>>>>>>>> and
>> >>>>>>>>>>>>>> assume
>> >>>>>>>>>>>>>>>>> there
>> >>>>>>>>>>>>>>>>>>> is
>> >>>>>>>>>>>>>>>>>>>> an existing cluster. In this way we are
>> >> able to
>> >>>>>>>> consider
>> >>>>>>>>>>>> per-job
>> >>>>>>>>>>>>>>>>> issues
>> >>>>>>>>>>>>>>>>>>>> individually and JobClusterEntrypoint
>> >> would be the
>> >>>>>>>>>> utility
>> >>>>>>>>>>>> class
>> >>>>>>>>>>>>>>> for
>> >>>>>>>>>>>>>>>>>>>> per-job
>> >>>>>>>>>>>>>>>>>>>> deployment.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Nevertheless, user program works in both
>> >> session
>> >>>>>>>> mode
>> >>>>>>>>>> and
>> >>>>>>>>>>>>>> per-job
>> >>>>>>>>>>>>>>>> mode
>> >>>>>>>>>>>>>>>>>>>> without
>> >>>>>>>>>>>>>>>>>>>> necessary to change code. JobClient in
>> >> per-job mode
>> >>>>>>>> is
>> >>>>>>>>>>>> returned
>> >>>>>>>>>>>>>>> from
>> >>>>>>>>>>>>>>>>>>>> env.execute as normal. However, it would
>> >> be no
>> >>>>>>>> longer a
>> >>>>>>>>>>>> wrapper
>> >>>>>>>>>>>>>> of
>> >>>>>>>>>>>>>>>>>>>> RestClusterClient but a wrapper of
>> >>>>>>>> PerJobClusterClient
>> >>>>>>>>>>> which
>> >>>>>>>>>>>>>>>>>>> communicates
>> >>>>>>>>>>>>>>>>>>>> to Dispatcher locally.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> 2. How to deal with plan preview.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> With env.compile functions users can get
>> >> JobGraph or
>> >>>>>>>>>>>> FlinkPlan
>> >>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>> thus
>> >>>>>>>>>>>>>>>>>>>> they can preview the plan with
>> >> programming.
>> >>>>>>>> Typically it
>> >>>>>>>>>>>> looks
>> >>>>>>>>>>>>>>> like
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> if (preview configured) {
>> >>>>>>>>>>>>>>>>>>>>    FlinkPlan plan = env.compile();
>> >>>>>>>>>>>>>>>>>>>>    new JSONDumpGenerator(...).dump(plan);
>> >>>>>>>>>>>>>>>>>>>> } else {
>> >>>>>>>>>>>>>>>>>>>>    env.execute();
>> >>>>>>>>>>>>>>>>>>>> }
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> And `flink info` would be invalid any
>> >> more.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> 3. How to deal with Jar Submission at the
>> >> Web
>> >>>>>>>> Frontend.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> There is one more thread talked on this
>> >> topic[1].
>> >>>>>>>> Apart
>> >>>>>>>>>>> from
>> >>>>>>>>>>>>>>>> removing
>> >>>>>>>>>>>>>>>>>>>> the functions there are two alternatives.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> One is to introduce an interface has a
>> >> method
>> >>>>>>>> returns
>> >>>>>>>>>>>>>>>>> JobGraph/FlinkPlan
>> >>>>>>>>>>>>>>>>>>>> and Jar Submission only support main-class
>> >>>>>>>> implements
>> >>>>>>>>>> this
>> >>>>>>>>>>>>>>>> interface.
>> >>>>>>>>>>>>>>>>>>>> And then extract the JobGraph/FlinkPlan
>> >> just by
>> >>>>>>>> calling
>> >>>>>>>>>>> the
>> >>>>>>>>>>>>>>> method.
>> >>>>>>>>>>>>>>>>>>>> In this way, it is even possible to
>> >> consider a
>> >>>>>>>>>> separation
>> >>>>>>>>>>>> of job
>> >>>>>>>>>>>>>>>>>>> creation
>> >>>>>>>>>>>>>>>>>>>> and job submission.
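[Editor's note: a sketch of the "main-class implements an interface" alternative described above, so the WebFrontend can extract the JobGraph through a well-known method instead of executing main(). All names here are invented for illustration; the thread's o.a.f.api.common.Program is the precedent, not this code.]

```java
// Hypothetical sketch, not existing Flink API.
public class ProgramInterfaceSketch {

    // Stand-in for the real JobGraph type.
    static class JobGraph {
        final String name;
        JobGraph(String name) { this.name = name; }
    }

    interface PlanProvider {
        // Job *creation*: build the graph without executing anything.
        JobGraph getJobGraph(String[] args);
    }

    // A user main-class implementing the interface.
    static class WordCountProgram implements PlanProvider {
        public JobGraph getJobGraph(String[] args) {
            return new JobGraph("word-count");
        }
    }

    public static void main(String[] args) {
        // What a web frontend could do after loading the main class:
        PlanProvider program = new WordCountProgram();
        JobGraph graph = program.getJobGraph(new String[0]);
        System.out.println(graph.name); // word-count
        // job *submission* would then happen separately, e.g. via a cluster client
    }
}
```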
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> The other is, as you mentioned, let
>> >> execute() do the
>> >>>>>>>>>>> actual
>> >>>>>>>>>>>>>>>> execution.
>> >>>>>>>>>>>>>>>>>>>> We won't execute the main method in the
>> >> WebFrontend
>> >>>>>>>> but
>> >>>>>>>>>>>> spawn a
>> >>>>>>>>>>>>>>>>> process
>> >>>>>>>>>>>>>>>>>>>> at WebMonitor side to execute. For return
>> >> part we
>> >>>>>>>> could
>> >>>>>>>>>>>> generate
>> >>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>>> JobID from WebMonitor and pass it to the
>> >> execution
>> >>>>>>>>>>>> environment.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> 4. How to deal with detached mode.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> I think detached mode is a temporary
>> >> solution for
>> >>>>>>>>>>>> non-blocking
>> >>>>>>>>>>>>>>>>>>> submission.
>> >>>>>>>>>>>>>>>>>>>> In my document both submission and
>> >> execution return
>> >>>>>>>> a
>> >>>>>>>>>>>>>>>>> CompletableFuture
>> >>>>>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>>> users control whether or not wait for the
>> >> result. In
>> >>>>>>>>>> this
>> >>>>>>>>>>>> point
>> >>>>>>>>>>>>>> we
>> >>>>>>>>>>>>>>>>> don't
>> >>>>>>>>>>>>>>>>>>>> need a detached option but the
>> >> functionality is
>> >>>>>>>> covered.
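[Editor's note: a sketch of how a future-based execute() could subsume the detached/attached distinction described above. `executeAsync()`, `JobClient` and `getJobResult()` are proposal-level names taken from this thread, not existing API; the toy environment completes immediately only to keep the sketch runnable.]

```java
import java.util.concurrent.CompletableFuture;

public class DetachedModeSketch {

    interface JobClient {
        String getJobId();
        CompletableFuture<String> getJobResult();
    }

    // Toy environment whose submission and result futures complete immediately.
    static class ToyEnv {
        CompletableFuture<JobClient> executeAsync() {
            JobClient client = new JobClient() {
                public String getJobId() { return "job-1"; }
                public CompletableFuture<String> getJobResult() {
                    return CompletableFuture.completedFuture("FINISHED");
                }
            };
            return CompletableFuture.completedFuture(client);
        }
    }

    public static void main(String[] args) {
        ToyEnv env = new ToyEnv();

        // "detached": take the client once submission completes and return
        JobClient client = env.executeAsync().join();
        System.out.println(client.getJobId()); // job-1

        // "attached": additionally block until the job result arrives
        System.out.println(client.getJobResult().join()); // FINISHED
    }
}
```

The caller decides where to stop waiting, so no separate detached flag is needed.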
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> 5. How does per-job mode interact with
>> >> interactive
>> >>>>>>>>>>>> programming.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> All of YARN, Mesos and Kubernetes
>> >> scenarios follow
>> >>>>>>>> the
>> >>>>>>>>>>>> pattern
>> >>>>>>>>>>>>>>>> launch
>> >>>>>>>>>>>>>>>>> a
>> >>>>>>>>>>>>>>>>>>>> JobCluster now. And I don't think there
>> >> would be
>> >>>>>>>>>>>> inconsistency
>> >>>>>>>>>>>>>>>> between
>> >>>>>>>>>>>>>>>>>>>> different resource management.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>>>>>>> tison.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> [1]
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>
>> >>
>> https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
>> >>>>>>>>>>>>>>>>>>>> [2]
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>
>> >>
>> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Aljoscha Krettek <al...@apache.org>
>> >>>>>>>> 于2019年8月16日周五
>> >>>>>>>>>>>> 下午9:20写道:
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> Hi,
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> I read both Jeffs initial design
>> >> document and the
>> >>>>>>>> newer
>> >>>>>>>>>>>>>> document
>> >>>>>>>>>>>>>>> by
>> >>>>>>>>>>>>>>>>>>>>> Tison. I also finally found the time to
>> >> collect our
>> >>>>>>>>>>>> thoughts on
>> >>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>> issue,
>> >>>>>>>>>>>>>>>>>>>>> I had quite some discussions with Kostas
>> >> and this
>> >>>>>>>> is
>> >>>>>>>>>> the
>> >>>>>>>>>>>>>> result:
>> >>>>>>>>>>>>>>>> [1].
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> I think overall we agree that this part
>> >> of the
>> >>>>>>>> code is
>> >>>>>>>>>> in
>> >>>>>>>>>>>> dire
>> >>>>>>>>>>>>>>> need
>> >>>>>>>>>>>>>>>>> of
>> >>>>>>>>>>>>>>>>>>>>> some refactoring/improvements but I
>> >> think there are
>> >>>>>>>>>> still
>> >>>>>>>>>>>> some
>> >>>>>>>>>>>>>>> open
>> >>>>>>>>>>>>>>>>>>>>> questions and some differences in
>> >> opinion what
>> >>>>>>>> those
>> >>>>>>>>>>>>>> refactorings
>> >>>>>>>>>>>>>>>>>>> should
>> >>>>>>>>>>>>>>>>>>>>> look like.
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> I think the API-side is quite clear,
>> >> i.e. we need
>> >>>>>>>> some
>> >>>>>>>>>>>>>> JobClient
>> >>>>>>>>>>>>>>>> API
>> >>>>>>>>>>>>>>>>>>> that
>> >>>>>>>>>>>>>>>>>>>>> allows interacting with a running Job.
>> >> It could be
>> >>>>>>>>>>>> worthwhile
>> >>>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>> spin
>> >>>>>>>>>>>>>>>>>>> that
>> >>>>>>>>>>>>>>>>>>>>> off into a separate FLIP because we can
>> >> probably
>> >>>>>>>> find
>> >>>>>>>>>>>> consensus
>> >>>>>>>>>>>>>>> on
>> >>>>>>>>>>>>>>>>> that
>> >>>>>>>>>>>>>>>>>>>>> part more easily.
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> For the rest, the main open questions
>> >> from our doc
>> >>>>>>>> are
>> >>>>>>>>>>>> these:
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>  - Do we want to separate cluster
>> >> creation and job
>> >>>>>>>>>>>> submission
>> >>>>>>>>>>>>>>> for
>> >>>>>>>>>>>>>>>>>>>>> per-job mode? In the past, there were
>> >> conscious
>> >>>>>>>> efforts
>> >>>>>>>>>>> to
>> >>>>>>>>>>>>>> *not*
>> >>>>>>>>>>>>>>>>>>> separate
>> >>>>>>>>>>>>>>>>>>>>> job submission from cluster creation for
>> >> per-job
>> >>>>>>>>>> clusters
>> >>>>>>>>>>>> for
>> >>>>>>>>>>>>>>>> Mesos,
>> >>>>>>>>>>>>>>>>>>> YARN,
>> >>>>>>>>>>>>>>>>>>>>> Kubernetes (see
>> >> StandaloneJobClusterEntryPoint).
>> >>>>>>>> Tison
>> >>>>>>>>>>>> suggests
>> >>>>>>>>>>>>>> in
>> >>>>>>>>>>>>>>>> his
>> >>>>>>>>>>>>>>>>>>>>> design document to decouple this in
>> >> order to unify
>> >>>>>>>> job
>> >>>>>>>>>>>>>>> submission.
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>  - How to deal with plan preview, which
>> >> needs to
>> >>>>>>>>>> hijack
>> >>>>>>>>>>>>>>> execute()
>> >>>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>>>> let the outside code catch an exception?
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>  - How to deal with Jar Submission at
>> >> the Web
>> >>>>>>>>>> Frontend,
>> >>>>>>>>>>>> which
>> >>>>>>>>>>>>>>>> needs
>> >>>>>>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>>>>> hijack execute() and let the outside
>> >> code catch an
>> >>>>>>>>>>>> exception?
>> >>>>>>>>>>>>>>>>>>>>> CliFrontend.run() “hijacks”
>> >>>>>>>>>>> ExecutionEnvironment.execute()
>> >>>>>>>>>>>> to
>> >>>>>>>>>>>>>>> get a
>> >>>>>>>>>>>>>>>>>>>>> JobGraph and then execute that JobGraph
>> >> manually.
>> >>>>>>>> We
>> >>>>>>>>>>> could
>> >>>>>>>>>>>> get
>> >>>>>>>>>>>>>>>> around
>> >>>>>>>>>>>>>>>>>>> that
>> >>>>>>>>>>>>>>>>>>>>> by letting execute() do the actual
>> >> execution. One
>> >>>>>>>>>> caveat
>> >>>>>>>>>>>> for
>> >>>>>>>>>>>>>> this
>> >>>>>>>>>>>>>>>> is
>> >>>>>>>>>>>>>>>>>>> that
>> >>>>>>>>>>>>>>>>>>>>> now the main() method doesn’t return (or
>> >> is forced
>> >>>>>>>> to
>> >>>>>>>>>>>> return by
>> >>>>>>>>>>>>>>>>>>> throwing an
>> >>>>>>>>>>>>>>>>>>>>> exception from execute()) which means
>> >> that for Jar
>> >>>>>>>>>>>> Submission
>> >>>>>>>>>>>>>>> from
>> >>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>>>> WebFrontend we have a long-running
>> >> main() method
>> >>>>>>>>>> running
>> >>>>>>>>>>>> in the
>> >>>>>>>>>>>>>>>>>>>>> WebFrontend. This doesn’t sound very
>> >> good. We
>> >>>>>>>> could get
>> >>>>>>>>>>>> around
>> >>>>>>>>>>>>>>> this
>> >>>>>>>>>>>>>>>>> by
>> >>>>>>>>>>>>>>>>>>>>> removing the plan preview feature and by
>> >> removing
>> >>>>>>>> Jar
>> >>>>>>>>>>>>>>>>>>> Submission/Running.
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>  - How to deal with detached mode?
>> >> Right now,
>> >>>>>>>>>>>>>>> DetachedEnvironment
>> >>>>>>>>>>>>>>>>> will
>> >>>>>>>>>>>>>>>>>>>>> execute the job and return immediately.
>> >> If users
>> >>>>>>>>>> control
>> >>>>>>>>>>>> when
>> >>>>>>>>>>>>>>> they
>> >>>>>>>>>>>>>>>>>>> want to
>> >>>>>>>>>>>>>>>>>>>>> return, by waiting on the job completion
>> >> future,
>> >>>>>>>> how do
>> >>>>>>>>>>> we
>> >>>>>>>>>>>> deal
>> >>>>>>>>>>>>>>>> with
>> >>>>>>>>>>>>>>>>>>> this?
>> >>>>>>>>>>>>>>>>>>>>> Do we simply remove the distinction
>> >> between
>> >>>>>>>>>>>>>>> detached/non-detached?
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>  - How does per-job mode interact with
>> >>>>>>>> “interactive
>> >>>>>>>>>>>>>> programming”
>> >>>>>>>>>>>>>>>>>>>>> (FLIP-36). For YARN, each execute() call
>> >> could
>> >>>>>>>> spawn a
>> >>>>>>>>>>> new
>> >>>>>>>>>>>>>> Flink
>> >>>>>>>>>>>>>>>> YARN
>> >>>>>>>>>>>>>>>>>>>>> cluster. What about Mesos and Kubernetes?
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> The first open question is where the
>> >> opinions
>> >>>>>>>> diverge,
>> >>>>>>>>>> I
>> >>>>>>>>>>>> think.
>> >>>>>>>>>>>>>>> The
>> >>>>>>>>>>>>>>>>>>> rest
>> >>>>>>>>>>>>>>>>>>>>> are just open questions and interesting
>> >> things
>> >>>>>>>> that we
>> >>>>>>>>>>>> need to
>> >>>>>>>>>>>>>>>>>>> consider.
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>>>>>>>> Aljoscha
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> [1]
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>
>> >>
>> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
>> >>>>>>>>>>>>>>>>>>>>> <
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>
>> >>
>> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> On 31. Jul 2019, at 15:23, Jeff Zhang <
>> >>>>>>>>>>> zjffdu@gmail.com>
>> >>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> Thanks tison for the effort. I left a
>> >> few
>> >>>>>>>> comments.
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> Zili Chen <wa...@gmail.com>
>> >> 于2019年7月31日周三
>> >>>>>>>>>>> 下午8:24写道:
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Hi Flavio,
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Thanks for your reply.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Either current impl and in the design,
>> >>>>>>>> ClusterClient
>> >>>>>>>>>>>>>>>>>>>>>>> never takes responsibility for
>> >> generating
>> >>>>>>>> JobGraph.
>> >>>>>>>>>>>>>>>>>>>>>>> (what you see in current codebase is
>> >> several
>> >>>>>>>> class
>> >>>>>>>>>>>> methods)
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Instead, user describes his program
>> >> in the main
>> >>>>>>>>>> method
>> >>>>>>>>>>>>>>>>>>>>>>> with ExecutionEnvironment apis and
>> >> calls
>> >>>>>>>>>> env.compile()
>> >>>>>>>>>>>>>>>>>>>>>>> or env.optimize() to get FlinkPlan
>> >> and JobGraph
>> >>>>>>>>>>>>>> respectively.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> For listing main classes in a jar and
>> >> choose
>> >>>>>>>> one for
>> >>>>>>>>>>>>>>>>>>>>>>> submission, you're now able to
>> >> customize a CLI
>> >>>>>>>> to do
>> >>>>>>>>>>> it.
>> >>>>>>>>>>>>>>>>>>>>>>> Specifically, the path of jar is
>> >> passed as
>> >>>>>>>> arguments
>> >>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>>>>>> in the customized CLI you list main
>> >> classes,
>> >>>>>>>> choose
>> >>>>>>>>>>> one
>> >>>>>>>>>>>>>>>>>>>>>>> to submit to the cluster.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>>>>>>>>>> tison.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Flavio Pompermaier <
>> >> pompermaier@okkam.it>
>> >>>>>>>>>>> 于2019年7月31日周三
>> >>>>>>>>>>>>>>>> 下午8:12写道:
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> Just one note on my side: it is not
>> >> clear to me
>> >>>>>>>>>>>> whether the
>> >>>>>>>>>>>>>>>>> client
>> >>>>>>>>>>>>>>>>>>>>> needs
>> >>>>>>>>>>>>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>>>>>>>> be able to generate a job graph or
>> >> not.
>> >>>>>>>>>>>>>>>>>>>>>>>> In my opinion, the job jar must
>> >> reside only
>> >>>>>>>> on the
>> >>>>>>>>>>>>>>>>>>> server/jobManager
>> >>>>>>>>>>>>>>>>>>>>>>> side
>> >>>>>>>>>>>>>>>>>>>>>>>> and the client requires a way to get
>> >> the job
>> >>>>>>>> graph.
>> >>>>>>>>>>>>>>>>>>>>>>>> If you really want to access to the
>> >> job graph,
>> >>>>>>>> I'd
>> >>>>>>>>>>> add
>> >>>>>>>>>>>> a
>> >>>>>>>>>>>>>>>>> dedicated
>> >>>>>>>>>>>>>>>>>>>>> method
>> >>>>>>>>>>>>>>>>>>>>>>>> on the ClusterClient. like:
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>  - getJobGraph(jarId, mainClass):
>> >> JobGraph
>> >>>>>>>>>>>>>>>>>>>>>>>>  - listMainClasses(jarId):
>> >> List<String>
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> These would require some addition
>> >> also on the
>> >>>>>>>> job
>> >>>>>>>>>>>> manager
>> >>>>>>>>>>>>>>>>> endpoint
>> >>>>>>>>>>>>>>>>>>> as
>> >>>>>>>>>>>>>>>>>>>>>>>> well..what do you think?
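[Editor's note: a sketch of the two client-side additions suggested above, where the cluster holds the jar and the client only identifies it by id. Nothing here is existing Flink API; the stub stands in for a REST-backed implementation.]

```java
import java.util.Arrays;
import java.util.List;

public class JarInspectionSketch {

    interface JarAwareClusterClient {
        // Ask the cluster (which holds the jar) which main classes exist.
        List<String> listMainClasses(String jarId);
        // Let the cluster build the job graph from a chosen main class, so the
        // client never needs the user jar on its own classpath.
        String getJobGraph(String jarId, String mainClass);
    }

    // Stub implementation standing in for calls to a job manager endpoint.
    static class StubClient implements JarAwareClusterClient {
        public List<String> listMainClasses(String jarId) {
            return Arrays.asList("org.example.WordCount", "org.example.PageRank");
        }
        public String getJobGraph(String jarId, String mainClass) {
            return "graph-of-" + mainClass;
        }
    }

    public static void main(String[] args) {
        JarAwareClusterClient client = new StubClient();
        String mainClass = client.listMainClasses("jar-42").get(0);
        System.out.println(client.getJobGraph("jar-42", mainClass));
        // graph-of-org.example.WordCount
    }
}
```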
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jul 31, 2019 at 12:42 PM
>> >> Zili Chen <
>> >>>>>>>>>>>>>>>> wander4096@gmail.com
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> Here is a document[1] on client api
>> >>>>>>>> enhancement
>> >>>>>>>>>> from
>> >>>>>>>>>>>> our
>> >>>>>>>>>>>>>>>>>>> perspective.
>> >>>>>>>>>>>>>>>>>>>>>>>>> We have investigated current
>> >> implementations.
>> >>>>>>>> And
>> >>>>>>>>>> we
>> >>>>>>>>>>>>>> propose
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> 1. Unify the implementation of
>> >> cluster
>> >>>>>>>> deployment
>> >>>>>>>>>>> and
>> >>>>>>>>>>>> job
>> >>>>>>>>>>>>>>>>>>> submission
>> >>>>>>>>>>>>>>>>>>>>> in
>> >>>>>>>>>>>>>>>>>>>>>>>>> Flink.
>> >>>>>>>>>>>>>>>>>>>>>>>>> 2. Provide programmatic interfaces
>> >> to allow
>> >>>>>>>>>> flexible
>> >>>>>>>>>>>> job
>> >>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>> cluster
>> >>>>>>>>>>>>>>>>>>>>>>>>> management.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> The first proposal is aimed at
>> >> reducing code
>> >>>>>>>> paths
>> >>>>>>>>>>> of
>> >>>>>>>>>>>>>>> cluster
>> >>>>>>>>>>>>>>>>>>>>>>> deployment
>> >>>>>>>>>>>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>>>>>>>> job submission so that one can
>> >> adopt Flink in
>> >>>>>>>> his
>> >>>>>>>>>>>> usage
>> >>>>>>>>>>>>>>>> easily.
>> >>>>>>>>>>>>>>>>>>> The
>> >>>>>>>>>>>>>>>>>>>>>>>> second
>> >>>>>>>>>>>>>>>>>>>>>>>>> proposal is aimed at providing rich
>> >>>>>>>> interfaces for
>> >>>>>>>>>>>>>> advanced
>> >>>>>>>>>>>>>>>>> users
>> >>>>>>>>>>>>>>>>>>>>>>>>> who want to make accurate control
>> >> of these
>> >>>>>>>> stages.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> Quick reference on open questions:
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> 1. Exclude job cluster deployment
>> >> from client
>> >>>>>>>> side
>> >>>>>>>>>>> or
>> >>>>>>>>>>>>>>> redefine
>> >>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>>>>>>> semantic
>> >>>>>>>>>>>>>>>>>>>>>>>>> of job cluster? Since it fits in a
>> >> process
>> >>>>>>>> quite
>> >>>>>>>>>>>> different
>> >>>>>>>>>>>>>>>> from
>> >>>>>>>>>>>>>>>>>>>>> session
>> >>>>>>>>>>>>>>>>>>>>>>>>> cluster deployment and job
>> >> submission.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> 2. Maintain the codepaths handling
>> >> class
>> >>>>>>>>>>>>>>>>> o.a.f.api.common.Program
>> >>>>>>>>>>>>>>>>>>> or
>> >>>>>>>>>>>>>>>>>>>>>>>>> implement customized program
>> >> handling logic by
>> >>>>>>>>>>>> customized
>> >>>>>>>>>>>>>>>>>>>>> CliFrontend?
>> >>>>>>>>>>>>>>>>>>>>>>>>> See also this thread[2] and the
>> >> document[1].
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> 3. Expose ClusterClient as public
>> >> api or just
>> >>>>>>>>>> expose
>> >>>>>>>>>>>> api
>> >>>>>>>>>>>>>> in
>> >>>>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment
>> >>>>>>>>>>>>>>>>>>>>>>>>> and delegate them to ClusterClient?
>> >> Further,
>> >>>>>>>> in
>> >>>>>>>>>>>> either way
>> >>>>>>>>>>>>>>> is
>> >>>>>>>>>>>>>>>> it
>> >>>>>>>>>>>>>>>>>>>>> worth
>> >>>>>>>>>>>>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>>>>>>>>> introduce a JobClient which is an
>> >>>>>>>> encapsulation of
>> >>>>>>>>>>>>>>>> ClusterClient
>> >>>>>>>>>>>>>>>>>>> that
>> >>>>>>>>>>>>>>>>>>>>>>>>> associated to specific job?
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>>>>>>>>>>>> tison.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> [1]
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>
>> >>
>> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
>> >>>>>>>>>>>>>>>>>>>>>>>>> [2]
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>
>> >>
>> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang <zj...@gmail.com>
>> >> 于2019年7月24日周三
>> >>>>>>>>>>> 上午9:19写道:
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks Stephan, I will follow up
>> >> this issue
>> >>>>>>>> in
>> >>>>>>>>>> next
>> >>>>>>>>>>>> few
>> >>>>>>>>>>>>>>>> weeks,
>> >>>>>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>>>>>> will
>> >>>>>>>>>>>>>>>>>>>>>>>>>> refine the design doc. We could
>> >> discuss more
>> >>>>>>>>>>> details
>> >>>>>>>>>>>>>> after
>> >>>>>>>>>>>>>>>> 1.9
>> >>>>>>>>>>>>>>>>>>>>>>> release.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
Stephan Ewen <se...@apache.org> 于2019年7月24日周三 上午12:58写道:

Hi all!

This thread has stalled for a bit, which I assume is mostly due to the Flink 1.9 feature freeze and release-testing effort.

I personally still recognize this issue as an important one to solve. I'd be happy to help resume this discussion soon (after the 1.9 release) and see if we can take a step towards this in Flink 1.10.

Best,
Stephan

On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <pompermaier@okkam.it> wrote:

That's exactly what I suggested a long time ago: the Flink REST client should not require any Flink dependency, only an HTTP library to call the REST services that submit and monitor a job.
What I also suggested in [1] was a way to automatically suggest to the user (via a UI) the available main classes and their required parameters [2].
Another problem we have with Flink is that the REST client and the CLI client behave differently, and we use the CLI client (via ssh) because it allows calling some other method after env.execute() [3] (we have to call another REST service to signal the end of the job).
In this regard, a dedicated interface, like the JobListener suggested in the previous emails, would be very helpful (IMHO).

[1] https://issues.apache.org/jira/browse/FLINK-10864
[2] https://issues.apache.org/jira/browse/FLINK-10862
[3] https://issues.apache.org/jira/browse/FLINK-10879

Best,
Flavio

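A dedicated callback interface of the kind Flavio mentions (the JobListener idea) could look roughly like the following minimal sketch. The interface name, its methods, and the in-memory driver are all illustrative assumptions for this discussion, not an existing Flink API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a JobListener-style callback interface: downstream
// projects get notified of submission/completion instead of having to poll
// a REST endpoint after env.execute() returns.
interface JobListener {
    void onJobSubmitted(String jobId);
    void onJobFinished(String jobId, boolean succeeded);
}

// A tiny stand-in for an execution environment that notifies listeners.
public class ListenerDemo {
    private final List<JobListener> listeners = new ArrayList<>();

    public void register(JobListener listener) { listeners.add(listener); }

    // Simulate a submit/run cycle and fire the callbacks.
    public void execute(String jobId) {
        listeners.forEach(l -> l.onJobSubmitted(jobId));
        // ... job runs ...
        listeners.forEach(l -> l.onJobFinished(jobId, true));
    }

    public static void main(String[] args) {
        ListenerDemo env = new ListenerDemo();
        List<String> events = new ArrayList<>();
        env.register(new JobListener() {
            public void onJobSubmitted(String id) { events.add("submitted:" + id); }
            public void onJobFinished(String id, boolean ok) { events.add("finished:" + id + ":" + ok); }
        });
        env.execute("job-1");
        System.out.println(events);  // prints [submitted:job-1, finished:job-1:true]
    }
}
```

With such a hook, the "call another REST service to signal the end of the job" step could live in a listener rather than in ssh-driven CLI scripting.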
On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zjffdu@gmail.com> wrote:

Hi Tison,

Thanks for your comments. Overall I agree with you that it is difficult for downstream projects to integrate with Flink and that we need to refactor the current Flink client API.
And I agree that CliFrontend should only parse command-line arguments and then pass them to ExecutionEnvironment. It is ExecutionEnvironment's responsibility to compile the job, create the cluster, and submit the job. Besides that, Flink currently has many ExecutionEnvironment implementations, and it picks a specific one based on the context. IMHO this is not necessary: ExecutionEnvironment should be able to do the right thing based on the FlinkConf it receives. Having too many ExecutionEnvironment implementations is another burden for downstream project integration.

One thing I'd like to mention is Flink's Scala shell and SQL client: although they are sub-modules of Flink, they can be treated as downstream projects that use Flink's client API. Currently you will find it is not easy for them to integrate with Flink, and they share a lot of duplicated code. That is another sign that we should refactor the Flink client API.

I believe this is a large and hard change, and I am afraid we cannot keep compatibility, since many of the changes are user-facing.

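Jeff's point that a single ExecutionEnvironment could "do the right thing" from configuration alone can be sketched like this. The key name `execution.target` and the set of values are assumptions made for illustration, not an actual option being proposed in this thread:

```java
import java.util.Map;

// Hypothetical illustration of one environment resolving its deployment mode
// from configuration, instead of a separate ExecutionEnvironment subclass per
// mode. The "execution.target" key and its values are illustrative only.
public class TargetResolver {
    public static String resolveTarget(Map<String, String> conf) {
        String target = conf.getOrDefault("execution.target", "local");
        switch (target) {
            case "local":
            case "remote":
            case "yarn-session":
            case "yarn-per-job":
                return target;
            default:
                throw new IllegalArgumentException("Unknown execution.target: " + target);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveTarget(Map.of("execution.target", "yarn-per-job")));
        System.out.println(resolveTarget(Map.of()));  // falls back to "local"
    }
}
```

The environment would branch on the resolved target internally, so downstream projects program against one type regardless of where the job runs.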
Zili Chen <wander4096@gmail.com> 于2019年6月24日周一 下午2:53写道:

Hi all,

After a closer look at our client APIs, I can see there are two major issues with consistency and integration, namely the deployment of a job cluster, which couples job graph creation and cluster deployment, and submission via CliFrontend, which confuses the control flow of job graph compilation and job submission. I'd like to follow the discussion above, mainly the process described by Jeff and Stephan, and share my ideas on these issues.

1) CliFrontend confuses the control flow of job compilation and submission.
Following the process of job submission that Stephan and Jeff described, the execution environment knows all the configs of the cluster and the topology/settings of the job. Ideally, in the main method of the user program, it calls #execute (or, say, #submit) and Flink deploys the cluster, compiles the job graph, and submits it to the cluster. However, the current CliFrontend does all these things inside its #runProgram method, which introduces a lot of subclasses of the (stream) execution environment.

Actually, it sets up an exec env that hijacks the #execute/#executePlan method, initializes the job graph, and aborts execution. Then control flow goes back to CliFrontend, which deploys the cluster (or retrieves the client) and submits the job graph. This is quite a specific internal process inside Flink and is not consistent with anything else.

2) Deployment of a job cluster couples job graph creation and cluster deployment. Abstractly, getting from a user job to a concrete submission requires

   create JobGraph --\
create ClusterClient --> submit JobGraph

such a dependency. The ClusterClient is created by deploying or retrieving. JobGraph submission requires a compiled JobGraph and a valid ClusterClient, but the creation of the ClusterClient is abstractly independent of that of the JobGraph. However, in job cluster mode, we deploy the job cluster with a job graph, which means we use another process:

create JobGraph --> deploy cluster with the JobGraph

Here is another inconsistency, and downstream projects/client APIs are forced to handle the different cases with little support from Flink.

Since we likely reached a consensus that

1. all configs are gathered by the Flink configuration and passed along, and
2. the execution environment knows all configs and handles execution (both deployment and submission),

for the issues above I propose eliminating the inconsistencies with the following approach:

1) CliFrontend should be exactly a front end, at least for the "run" command. That means it just gathers all config from the command line and passes it to the main method of the user program. The execution environment knows all the info, and with an addition of utilities for ClusterClient, we gracefully get a ClusterClient by deploying or retrieving. In this way, we don't need to hijack the #execute/#executePlan methods and can remove the various hacky subclasses of the exec env, as well as the #run methods in ClusterClient (for an interface-ized ClusterClient). Now the control flow goes from CliFrontend to the main method and never returns.

2) A job cluster means a cluster for a specific job. From another perspective, it is an ephemeral session. We may decouple the deployment from a compiled job graph, and instead start a session with an idle timeout and submit the job afterwards.

These topics are better to be aware of and discussed for a consensus before we go into more details on design or implementation.

Best,
tison.

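The dependency tison draws above, where JobGraph creation and ClusterClient creation are independent steps that only join at submission, can be sketched with simplified stand-in types (none of these are Flink's real classes):

```java
// Sketch of the decoupled flow: compile a JobGraph, obtain a ClusterClient
// (by deploying a new cluster or retrieving an existing one), then submit.
// All types here are simplified illustrations, not Flink's actual API.
public class SubmissionFlow {
    static class JobGraph {
        final String name;
        JobGraph(String name) { this.name = name; }
    }

    interface ClusterClient {
        String submit(JobGraph graph);   // returns a job id
    }

    // A client can come from deploying or retrieving; either way the caller
    // only ever sees the interface, never how the cluster came to exist.
    static ClusterClient deployOrRetrieve(String clusterId) {
        return graph -> clusterId + "/" + graph.name;
    }

    public static void main(String[] args) {
        JobGraph graph = new JobGraph("wordcount");           // compile step
        ClusterClient client = deployOrRetrieve("session-1"); // deploy/retrieve step
        System.out.println(client.submit(graph));             // submit step
        // prints: session-1/wordcount
    }
}
```

The point of the sketch is that neither step needs to know about the other until `submit`, which is exactly what job cluster mode (deploy-with-JobGraph) breaks.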
Zili Chen <wander4096@gmail.com> 于2019年6月20日周四 上午3:21写道:

Hi Jeff,

Thanks for raising this thread and the design document!

As @Thomas Weise mentioned above, extending config to Flink requires far more effort than it should. Another example is that we achieve detached mode by introducing yet another execution environment which also hijacks the #execute method.

I agree with your idea that the user configures everything and Flink "just" respects it. On this topic, I think the unusual control flow when CliFrontend handles the "run" command is the problem. It handles several configs, mainly about cluster settings, and thus the main method of the user program is unaware of them. It also compiles the app to a job graph by running the main method with a hijacked exec env, which constrains the main method further.

I'd like to write down a few notes on how configs/args are passed and respected, as well as on decoupling job compilation and submission, and share them on this thread later.

Best,
tison.

SHI Xiaogang <shixiaogangg@gmail.com> 于2019年6月17日周一 下午7:29写道:

Hi Jeff and Flavio,

Thanks Jeff a lot for proposing the design document.

We are also working on refactoring ClusterClient to allow flexible and efficient job management in our real-time platform. We would like to draft a document to share our ideas with you.

I think it's a good idea to have something like Apache Livy for Flink, and the efforts discussed here are a great step towards it.

Regards,
Xiaogang

Flavio Pompermaier <pompermaier@okkam.it> 于2019年6月17日周一 下午7:13写道:

Is there any possibility of having something like Apache Livy [1] also for Flink in the future?

[1] https://livy.apache.org/

On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zjffdu@gmail.com> wrote:

> Any API we expose should not have dependencies on the runtime (flink-runtime) package or other implementation details. To me, this means that the current ClusterClient cannot be exposed to users because it uses quite some classes from the optimiser and runtime packages.

We should change ClusterClient from a class to an interface. ExecutionEnvironment would only use the ClusterClient interface, which should live in flink-clients, while the concrete implementation class could be in flink-runtime.

> What happens when a failure/restart in the client happens? There needs to be a way of re-establishing the connection to the job, setting up the listeners again, etc.

Good point. First we need to define what a failure/restart in the client means. IIUC, that usually means a network failure, which happens in the RestClient class. If my understanding is correct, the restart/retry mechanism should be implemented in RestClient.

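The class-to-interface split proposed here could look roughly like the following: a ClusterClient interface exposing only plain JDK types (which could live in flink-clients), with the concrete implementation kept behind it (e.g. in flink-runtime). Everything in this sketch, including the method names, is a hypothetical illustration:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of an interface-ized ClusterClient: the user-facing contract depends
// on no runtime classes, while implementations are free to use internals.
public class ClientSplitDemo {

    /** What user-facing code would see: plain JDK types only. */
    public interface ClusterClient extends AutoCloseable {
        String submitJob(byte[] serializedJobGraph) throws Exception;
        void cancelJob(String jobId) throws Exception;
        @Override default void close() {}
    }

    /** Stand-in for a concrete implementation (could live in flink-runtime). */
    static class InMemoryClusterClient implements ClusterClient {
        private final Set<String> running = new HashSet<>();
        private int counter = 0;

        public String submitJob(byte[] serializedJobGraph) {
            String id = "job-" + (++counter);
            running.add(id);
            return id;
        }

        public void cancelJob(String jobId) {
            if (!running.remove(jobId)) {
                throw new IllegalStateException("No such job: " + jobId);
            }
        }

        boolean isRunning(String jobId) { return running.contains(jobId); }
    }

    public static void main(String[] args) throws Exception {
        ClusterClient client = new InMemoryClusterClient();
        String id = client.submitJob(new byte[0]);
        System.out.println("submitted " + id);
        client.cancelJob(id);
    }
}
```

Because callers hold only the interface, swapping a REST-backed implementation for a test double (as here) requires no change on the user side.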
Aljoscha Krettek <aljoscha@apache.org> 于2019年6月11日周二 下午11:10写道:

Some points to consider:

* Any API we expose should not have dependencies on the runtime (flink-runtime) package or other implementation details. To me, this means that the current ClusterClient cannot be exposed to users because it uses quite some classes from the optimiser and runtime packages.

* What happens when a failure/restart in the client happens? There needs to be a way of re-establishing the connection to the job, setting up the listeners again, etc.

Aljoscha

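For the failure/restart concern raised above, a retry-with-backoff loop of the kind Jeff suggests putting inside RestClient could be sketched generically like this; it is a generic pattern, not Flink's actual RestClient behavior:

```java
import java.util.concurrent.Callable;

// Generic retry-with-backoff sketch for re-establishing a flaky connection.
// A real client would likely use exponential backoff and distinguish
// retryable (network) failures from fatal ones.
public class RetryDemo {
    public static <T> T retry(Callable<T> action, int maxAttempts, long backoffMillis)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;                      // e.g. a transient network failure
                Thread.sleep(backoffMillis);   // fixed backoff for simplicity
            }
        }
        throw last;  // all attempts failed
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fails twice, then succeeds -- simulating a dropped connection.
        String result = retry(() -> {
            if (++calls[0] < 3) throw new java.io.IOException("connection reset");
            return "connected";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Re-registering listeners after a reconnect would then be one more step inside the retried action.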
> On 29. May 2019, at 10:17, Jeff Zhang <zjffdu@gmail.com> wrote:

Sorry folks, the design doc is late, as you expected. Here's the design doc I drafted; welcome any comments and feedback.

https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing

Stephan Ewen <sewen@apache.org> 于2019年2月14日周四 下午8:43写道:

Nice that this discussion is happening.

In the FLIP, we could also revisit the entire role of the environments again.

Initially, the idea was:
- the environments take care of the specific setup for standalone (no setup needed), yarn, mesos, etc.
- the session ones have control over the session. The environment holds the session client.
- running a job gives a "control" object for that job. That behavior is the same in all environments.

The actual implementation diverged quite a bit from that. Happy to see a discussion about straightening this out a bit more.

>>>
>>> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zjffdu@gmail.com> wrote:
>>>
>>>> Hi folks,
>>>>
>>>> Sorry for the late response. It seems we reached consensus on this; I
>>>> will create a FLIP for this with a more detailed design.
>>>>
>>>> Thomas Weise <thw@apache.org> 于2018年12月21日周五 上午11:43写道:
>>>>
>>>>> Great to see this discussion seeded! The problems you face with the
>>>>> Zeppelin integration are also affecting other downstream projects,
>>>>> like Beam.
>>>>>
>>>>> We just enabled the savepoint restore option in
>>>>> RemoteStreamEnvironment [1] and that was more difficult than it
>>>>> should be. The main issue is that environment and cluster client
>>>>> aren't decoupled. Ideally it should be possible to just get the
>>>>> matching cluster client from the environment and then control the
>>>>> job through it (environment as factory for cluster client). But note
>>>>> that the environment classes are part of the public API, and it is
>>>>> not straightforward to make larger changes without breaking backward
>>>>> compatibility.
>>>>>
>>>>> ClusterClient currently exposes internal classes like JobGraph and
>>>>> StreamGraph. But it should be possible to wrap this with a new
>>>>> public API that brings the required job control capabilities for
>>>>> downstream projects. Perhaps it is helpful to look at some of the
>>>>> interfaces in Beam while thinking about this: [2] for the portable
>>>>> job API and [3] for the old asynchronous job control from the Beam
>>>>> Java SDK.
>>>>>
>>>>> The backward compatibility discussion [4] is also relevant here. A
>>>>> new API should shield downstream projects from internals and allow
>>>>> them to interoperate with multiple future Flink versions in the same
>>>>> release line without forced upgrades.
>>>>>
>>>>> Thanks,
>>>>> Thomas
>>>>>
>>>>> [1] https://github.com/apache/flink/pull/7249
>>>>> [2] https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
>>>>> [3] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
>>>>> [4] https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
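The asynchronous job control Thomas points to in Beam's PipelineResult [3] boils down to a small handle interface: submission returns immediately, and the caller polls state or cancels through the handle. A minimal sketch of that shape (the class and method names below are illustrative, not the actual Flink or Beam API):

```java
// Hypothetical sketch of an asynchronous job-control handle in the spirit
// of Beam's PipelineResult: submit() would return a handle right away and
// the caller drives the job through it. All names here are illustrative.
public class JobHandleSketch {

    enum JobState { RUNNING, FINISHED, CANCELLED, FAILED }

    /** Minimal job-control surface a new public API could expose. */
    interface JobHandle {
        JobState getState();
        void cancel();
    }

    /** Toy in-memory implementation standing in for a cluster-backed one. */
    static class InMemoryJobHandle implements JobHandle {
        private JobState state = JobState.RUNNING;

        @Override
        public JobState getState() { return state; }

        @Override
        public void cancel() {
            if (state == JobState.RUNNING) {
                state = JobState.CANCELLED;
            }
        }
    }

    public static void main(String[] args) {
        JobHandle handle = new InMemoryJobHandle(); // as if returned by submit()
        System.out.println(handle.getState());      // RUNNING
        handle.cancel();
        System.out.println(handle.getState());      // CANCELLED
    }
}
```

The point of the shape is that nothing in it references JobGraph or other internals, which is exactly the decoupling Thomas asks for.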
>>>>>
>>>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zjffdu@gmail.com> wrote:
>>>>>
>>>>>>> I'm not so sure whether the user should be able to define where
>>>>>>> the job runs (in your example Yarn). This is actually independent
>>>>>>> of the job development and is something which is decided at
>>>>>>> deployment time.
>>>>>>
>>>>>> The user doesn't need to specify the execution mode
>>>>>> programmatically. They can also pass the execution mode via the
>>>>>> arguments of the flink run command, e.g.
>>>>>>
>>>>>> bin/flink run -m yarn-cluster ....
>>>>>> bin/flink run -m local ...
>>>>>> bin/flink run -m host:port ...
>>>>>>
>>>>>> Does this make sense to you?
>>>>>>
>>>>>>> To me it makes sense that the ExecutionEnvironment is not directly
>>>>>>> initialized by the user and instead context sensitive how you want
>>>>>>> to execute your job (Flink CLI vs. IDE, for example).
>>>>>>
>>>>>> Right, currently I notice that Flink creates a different
>>>>>> ContextExecutionEnvironment for each submission scenario (Flink CLI
>>>>>> vs. IDE). To me this is kind of a hack, and not so straightforward.
>>>>>> What I suggested above is that Flink should always create the same
>>>>>> ExecutionEnvironment but with different configuration, and based on
>>>>>> that configuration it would create the proper ClusterClient for the
>>>>>> different behaviors.
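Jeff's "same environment, different configuration" idea amounts to a client factory keyed on an execution-mode setting. A rough sketch of that dispatch (hypothetical names and config keys; the real Flink classes differ):

```java
// Rough sketch of "same ExecutionEnvironment, different configuration":
// one factory inspects an execution-mode setting and hands back the
// matching client. All names here are illustrative, not Flink's real API.
import java.util.Map;

public class ClientFactorySketch {

    interface ClusterClient {
        String describe();
    }

    static class LocalClient implements ClusterClient {
        public String describe() { return "local"; }
    }

    static class YarnClient implements ClusterClient {
        public String describe() { return "yarn"; }
    }

    static class RemoteClient implements ClusterClient {
        private final String address;
        RemoteClient(String address) { this.address = address; }
        public String describe() { return "remote:" + address; }
    }

    /** Pick the client purely from configuration, not from an env subclass. */
    static ClusterClient createClient(Map<String, String> conf) {
        String mode = conf.getOrDefault("execution.mode", "local");
        switch (mode) {
            case "yarn":  return new YarnClient();
            case "local": return new LocalClient();
            default:      return new RemoteClient(mode); // treat as "host:port"
        }
    }

    public static void main(String[] args) {
        System.out.println(createClient(Map.of("execution.mode", "yarn")).describe());
        System.out.println(createClient(Map.of()).describe());
    }
}
```

This mirrors the CLI behavior above: `-m yarn-cluster`, `-m local`, and `-m host:port` would just set the same configuration value that the factory reads.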
>>>>>>
>>>>>> Till Rohrmann <trohrmann@apache.org> 于2018年12月20日周四 下午11:18写道:
>>>>>>
>>>>>>> You are probably right that we have code duplication when it comes
>>>>>>> to the creation of the ClusterClient. This should be reduced in
>>>>>>> the future.
>>>>>>>
>>>>>>> I'm not so sure whether the user should be able to define where
>>>>>>> the job runs (in your example Yarn). This is actually independent
>>>>>>> of the job development and is something which is decided at
>>>>>>> deployment time. To me it makes sense that the
>>>>>>> ExecutionEnvironment is not directly initialized by the user and
>>>>>>> instead context sensitive how you want to execute your job (Flink
>>>>>>> CLI vs. IDE, for example). However, I agree that the
>>>>>>> ExecutionEnvironment should give you access to the ClusterClient
>>>>>>> and to the job (maybe in the form of the JobGraph or a job plan).
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Till
>>>>>>>
>>>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zjffdu@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Till,
>>>>>>>>
>>>>>>>> Thanks for the feedback. You are right that I expect a better
>>>>>>>> programmatic job submission/control API which could be used by
>>>>>>>> downstream projects, and it would benefit the Flink ecosystem.
>>>>>>>> When I look at the code of the Flink scala-shell and sql-client
>>>>>>>> (I believe they are not the core of Flink, but belong to its
>>>>>>>> ecosystem), I find a lot of duplicated code for creating a
>>>>>>>> ClusterClient from user-provided configuration (the configuration
>>>>>>>> format may even differ between scala-shell and sql-client) and
>>>>>>>> then using that ClusterClient to manipulate jobs. I don't think
>>>>>>>> this is convenient for downstream projects. What I expect is that
>>>>>>>> a downstream project only needs to provide the necessary
>>>>>>>> configuration info (maybe introducing a class FlinkConf), then
>>>>>>>> build an ExecutionEnvironment based on this FlinkConf, and the
>>>>>>>> ExecutionEnvironment will create the proper ClusterClient. This
>>>>>>>> would not only benefit downstream project development but also
>>>>>>>> help their integration tests with Flink. Here's one sample code
>>>>>>>> snippet of what I expect:
>>>>>>>>
>>>>>>>> val conf = new FlinkConf().mode("yarn")
>>>>>>>> val env = new ExecutionEnvironment(conf)
>>>>>>>> val jobId = env.submit(...)
>>>>>>>> val jobStatus = env.getClusterClient().queryJobStatus(jobId)
>>>>>>>> env.getClusterClient().cancelJob(jobId)
>>>>>>>>
>>>>>>>> What do you think?
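Jeff's proposed flow can be mocked end to end with a few stub classes. This is only a sketch of the proposal's shape; FlinkConf, submit, queryJobStatus, and cancelJob are the proposed API, not anything that exists in Flink:

```java
// Mock of the proposed FlinkConf / ExecutionEnvironment flow. Every class
// here is a stand-in for the proposal, not an existing Flink API.
import java.util.HashMap;
import java.util.Map;

public class FlinkConfSketch {

    static class FlinkConf {
        String mode = "local";
        FlinkConf mode(String m) { this.mode = m; return this; }
    }

    static class ClusterClient {
        private final Map<String, String> jobs = new HashMap<>();
        private int counter = 0;

        String submit() {
            String jobId = "job-" + (++counter);
            jobs.put(jobId, "RUNNING");
            return jobId;
        }
        String queryJobStatus(String jobId) { return jobs.get(jobId); }
        void cancelJob(String jobId) { jobs.put(jobId, "CANCELED"); }
    }

    static class ExecutionEnvironment {
        private final ClusterClient client = new ClusterClient();
        ExecutionEnvironment(FlinkConf conf) {
            // A real implementation would pick the client from conf.mode.
        }
        String submit() { return client.submit(); }
        ClusterClient getClusterClient() { return client; }
    }

    public static void main(String[] args) {
        FlinkConf conf = new FlinkConf().mode("yarn");
        ExecutionEnvironment env = new ExecutionEnvironment(conf);
        String jobId = env.submit();
        System.out.println(env.getClusterClient().queryJobStatus(jobId)); // RUNNING
        env.getClusterClient().cancelJob(jobId);
        System.out.println(env.getClusterClient().queryJobStatus(jobId)); // CANCELED
    }
}
```

An in-memory ClusterClient like this is also what would make the integration-testing use case cheap: downstream projects could run the same code against a mock client and a real cluster.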
>>>>>>>>
>>>>>>>> Till Rohrmann <trohrmann@apache.org> 于2018年12月11日周二 下午6:28写道:
>>>>>>>>
>>>>>>>>> Hi Jeff,
>>>>>>>>>
>>>>>>>>> what you are proposing is to provide the user with better
>>>>>>>>> programmatic job control. There was actually an effort to
>>>>>>>>> achieve this but it has never been completed [1]. However, there
>>>>>>>>> are some improvements in the code base now. Look for example at
>>>>>>>>> the NewClusterClient interface which offers a non-blocking job
>>>>>>>>> submission. But I agree that we need to improve Flink in this
>>>>>>>>> regard.
>>>>>>>>>
>>>>>>>>> I would not be in favour of exposing all ClusterClient calls via
>>>>>>>>> the ExecutionEnvironment because it would clutter the class and
>>>>>>>>> would not be a good separation of concerns. Instead one idea
>>>>>>>>> could be to retrieve the current ClusterClient from the
>>>>>>>>> ExecutionEnvironment which can then be used for cluster and job
>>>>>>>>> control. But before we start an effort here, we need to agree on
>>>>>>>>> and capture what functionality we want to provide.
>>>>>>>>>
>>>>>>>>> Initially, the idea was that we have the ClusterDescriptor
>>>>>>>>> describing how to talk to a cluster manager like Yarn or Mesos.
>>>>>>>>> The ClusterDescriptor can be used for deploying Flink clusters
>>>>>>>>> (job and session) and gives you a ClusterClient. The
>>>>>>>>> ClusterClient controls the cluster (e.g. submitting jobs,
>>>>>>>>> listing all running jobs). And then there was the idea to
>>>>>>>>> introduce a JobClient which you obtain from the ClusterClient to
>>>>>>>>> trigger job-specific operations (e.g. taking a savepoint,
>>>>>>>>> cancelling the job).
>>>>>>>>>
>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-4272
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Till
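The layering Till describes (ClusterDescriptor deploys a cluster and yields a ClusterClient; the ClusterClient yields a per-job JobClient) can be sketched as plain interfaces. This is an illustration of the described design only; method names and the in-memory implementation are made up, not Flink's actual classes:

```java
// Interface-level sketch of the ClusterDescriptor -> ClusterClient ->
// JobClient layering described above. Method names are illustrative.
import java.util.ArrayList;
import java.util.List;

public class LayeringSketch {

    /** Knows how to talk to a cluster manager and deploy a cluster. */
    interface ClusterDescriptor {
        ClusterClient deploySessionCluster();
    }

    /** Cluster-scoped operations: submit jobs, list running jobs. */
    interface ClusterClient {
        JobClient submitJob(String name);
        List<String> listJobs();
    }

    /** Job-scoped operations: savepoints, cancellation, ... */
    interface JobClient {
        String getJobName();
        void cancel();
    }

    /** Toy cluster that tracks submitted jobs in a list. */
    static class InMemoryCluster implements ClusterClient {
        private final List<String> jobs = new ArrayList<>();

        @Override
        public JobClient submitJob(String name) {
            jobs.add(name);
            return new JobClient() {
                @Override public String getJobName() { return name; }
                @Override public void cancel() { jobs.remove(name); }
            };
        }

        @Override
        public List<String> listJobs() { return jobs; }
    }

    public static void main(String[] args) {
        ClusterDescriptor descriptor = InMemoryCluster::new;
        ClusterClient cluster = descriptor.deploySessionCluster();
        JobClient job = cluster.submitJob("wordcount");
        System.out.println(cluster.listJobs()); // [wordcount]
        job.cancel();
        System.out.println(cluster.listJobs()); // []
    }
}
```

Each layer keeps its own concern: the descriptor never sees jobs, and job-scoped calls never leak onto the cluster-scoped interface, which is the separation of concerns argued for above.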
>>>>>>>>>
>>>>>>>>> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zjffdu@gmail.com> wrote:
>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Folks,
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am trying to
>> >> integrate
>> >>>>>>>> flink
>> >>>>>>>>>>> into
>> >>>>>>>>>>>>>>> apache
>> >>>>>>>>>>>>>>>>>>>>>>>>>> zeppelin
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which is
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interactive
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> notebook. And I
>> >> hit several
>> >>>>>>>>>>> issues
>> >>>>>>>>>>>> that
>> >>>>>>>>>>>>>>> is
>> >>>>>>>>>>>>>>>>>>>>>>>>> caused
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> by
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> flink
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> client
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> api.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'd like to
>> >> proposal the
>> >>>>>>>>>>> following
>> >>>>>>>>>>>>>>> changes
>> >>>>>>>>>>>>>>>>>>>>>>>> for
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> flink
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> client
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> api.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>
>>> 1. Support nonblocking execution. Currently,
>>> ExecutionEnvironment#execute is a blocking method which does two
>>> things: first submit the job, then wait for the job until it is
>>> finished. I'd like to introduce a nonblocking execution method like
>>> ExecutionEnvironment#submit which only submits the job and then
>>> returns the jobId to the client, allowing the user to query the job
>>> status via the jobId.
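The nonblocking submission sketched in point 1 can be illustrated with a small, self-contained Java model. The `JobHandle` type, the `submit()` signature, and the id format below are illustrative stand-ins, not real Flink classes:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Toy model of the proposed nonblocking submission: submit() returns a job
// id immediately, while the job result is exposed as a future that the
// caller may (or may not) wait on. No real Flink types are used here.
public class NonBlockingSubmitSketch {

    static class JobHandle {
        final String jobId;
        final CompletableFuture<String> result;

        JobHandle(String jobId, CompletableFuture<String> result) {
            this.jobId = jobId;
            this.result = result;
        }
    }

    // Models ExecutionEnvironment#submit: returns as soon as the job id is
    // known, without blocking on job completion.
    static JobHandle submit(String jobName) {
        String jobId = "job-" + Math.abs(jobName.hashCode());
        CompletableFuture<String> result =
            CompletableFuture.supplyAsync(() -> jobName + " finished");
        return new JobHandle(jobId, result);
    }

    public static void main(String[] args) throws Exception {
        JobHandle handle = submit("WindowJoin");
        // The client gets the job id back without waiting for the job.
        System.out.println("submitted: " + handle.jobId);
        // Blocking on the result stays an explicit, separate choice.
        System.out.println(handle.result.get(5, TimeUnit.SECONDS));
    }
}
```

The key property is that `submit()` returns as soon as the job id is known; waiting on the result becomes an optional step rather than part of submission.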
>>>
>>> 2. Add a cancel API in
>>> ExecutionEnvironment/StreamExecutionEnvironment. Currently the only
>>> way to cancel a job is via the CLI (bin/flink), which is not
>>> convenient for downstream projects to use. So I'd like to add a
>>> cancel API in ExecutionEnvironment.
>>>
>>> 3. Add a savepoint API in
>>> ExecutionEnvironment/StreamExecutionEnvironment. It is similar to
>>> the cancel API; we should use ExecutionEnvironment as the unified
>>> API for third parties to integrate with Flink.
>>>
>>> 4. Add a listener for the job execution lifecycle, something like the
>>> following, so that downstream projects can run custom logic in the
>>> lifecycle of a job. E.g. Zeppelin would capture the jobId after the
>>> job is submitted and then use this jobId to cancel it later when
>>> necessary.
>>>
>>> public interface JobListener {
>>>
>>>     void onJobSubmitted(JobID jobId);
>>>
>>>     void onJobExecuted(JobExecutionResult jobResult);
>>>
>>>     void onJobCanceled(JobID jobId);
>>> }
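A toy model of how a downstream project such as Zeppelin could use the proposed listener to capture the jobId at submission time. The `ToyEnvironment` and the String-typed ids are stand-ins for the real ExecutionEnvironment, JobID, and JobExecutionResult:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed JobListener in use. A toy environment fires the
// lifecycle callbacks so a caller can capture the job id at submit time.
public class JobListenerSketch {

    interface JobListener {
        void onJobSubmitted(String jobId);
        void onJobExecuted(String jobResult);
        void onJobCanceled(String jobId);
    }

    static class ToyEnvironment {
        private final List<JobListener> listeners = new ArrayList<>();

        void registerJobListener(JobListener l) { listeners.add(l); }

        // Fires onJobSubmitted at submit time and onJobExecuted when done.
        void execute(String jobId) {
            listeners.forEach(l -> l.onJobSubmitted(jobId));
            listeners.forEach(l -> l.onJobExecuted(jobId + ": OK"));
        }
    }

    public static void main(String[] args) {
        ToyEnvironment env = new ToyEnvironment();
        final String[] captured = new String[1];
        env.registerJobListener(new JobListener() {
            public void onJobSubmitted(String jobId) { captured[0] = jobId; }
            public void onJobExecuted(String result) { System.out.println(result); }
            public void onJobCanceled(String jobId) { }
        });
        env.execute("job-42");
        // A notebook could now cancel via the captured id when necessary.
        System.out.println("captured " + captured[0]);
    }
}
```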
>>>
>>> 5. Enable sessions in ExecutionEnvironment. Currently this is
>>> disabled, but a session is very convenient for third parties that
>>> submit jobs continually. I hope Flink can enable it again.
>>>
>>> 6. Unify all Flink client APIs into
>>> ExecutionEnvironment/StreamExecutionEnvironment.
>>>
>>> This is a long-term issue which needs more careful thinking and
>>> design. Currently some features of Flink are exposed in
>>> ExecutionEnvironment/StreamExecutionEnvironment, but some are
>>> exposed in the CLI instead of the API, like the cancel and savepoint
>>> operations I mentioned above. I think the root cause is that Flink
>>> didn't unify the interaction with the cluster. Here I list 3
>>> scenarios of Flink operation:
>>>
>>> - Local job execution. Flink will create a LocalEnvironment and then
>>> use this LocalEnvironment to create a LocalExecutor for job
>>> execution.
>>> - Remote job execution. Flink will create a ClusterClient first,
>>> then create a ContextEnvironment based on the ClusterClient, and
>>> then run the job.
>>> - Job cancelation. Flink will create a ClusterClient first and then
>>> cancel the job via this ClusterClient.
>>>
>>> As you can see in the above 3 scenarios, Flink didn't use the same
>>> approach (code path) to interact with the cluster.
>>> What I propose is the following:
>>> Create the proper LocalEnvironment/RemoteEnvironment (based on user
>>> configuration) --> use this Environment to create the proper
>>> ClusterClient (LocalClusterClient or RestClusterClient) to interact
>>> with Flink (job execution or cancelation).
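The proposed configuration-driven code path (configuration in, proper client out) can be modeled roughly like this. The configuration key and the client classes below are illustrative stand-ins, not the real Flink types:

```java
import java.util.Map;

// Sketch of the proposed unified code path: the environment picks the
// proper client purely from configuration, instead of the CLI wiring up a
// ClusterClient itself. All names here are stand-ins for illustration.
public class ClientFactorySketch {

    interface ClusterClient {
        String describe();
    }

    static class LocalClusterClient implements ClusterClient {
        public String describe() { return "local"; }
    }

    static class RestClusterClient implements ClusterClient {
        private final String address;
        RestClusterClient(String address) { this.address = address; }
        public String describe() { return "rest@" + address; }
    }

    // Models "Environment --> proper ClusterClient based on configuration".
    static ClusterClient createClient(Map<String, String> config) {
        String target = config.getOrDefault("execution.target", "local");
        return target.equals("local")
            ? new LocalClusterClient()
            : new RestClusterClient(
                  config.getOrDefault("rest.address", "localhost:8081"));
    }

    public static void main(String[] args) {
        System.out.println(createClient(Map.of()).describe());
        System.out.println(
            createClient(Map.of("execution.target", "remote")).describe());
    }
}
```

With a factory like this, both the CLI and any downstream project would hand a configuration to the environment and share one code path for obtaining the client.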
>>>
>>> This way we can unify the process of local execution and remote
>>> execution.
>>> It is also much easier for third parties to integrate with Flink,
>>> because ExecutionEnvironment is the unified entry point for Flink.
>>> All a third party needs to do is pass configuration to the
>>> ExecutionEnvironment, and the ExecutionEnvironment will do the right
>>> thing based on that configuration.
>>> The Flink CLI can also be considered a Flink API consumer: it just
>>> passes the configuration to the ExecutionEnvironment and lets the
>>> ExecutionEnvironment create the proper ClusterClient, instead of
>>> letting the CLI create the ClusterClient directly.
>>>
>>> 6 would involve large code refactoring, so I think we can defer it
>>> to a future release; 1, 2, 3, 4, 5 could be done at once, I believe.
>>> Let me know your comments and feedback, thanks.
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
> --
> Best Regards
>
> Jeff Zhang
>

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Jeff Zhang <zj...@gmail.com>.
+1 for using slack for instant communication

Aljoscha Krettek <al...@apache.org> wrote on Wed, Sep 11, 2019 at 4:44 PM:

> Hi,
>
> We could try and use the ASF slack for this purpose, that would probably
> be easiest. See https://s.apache.org/slack-invite. We could create a
> dedicated channel for our work and would still use the open ASF
> infrastructure and people can have a look if they are interested because
> discussion would be public. What do you think?
>
> P.S. Committers/PMCs should be able to login with their apache ID.
>
> Best,
> Aljoscha
>
> > On 6. Sep 2019, at 14:24, Zili Chen <wa...@gmail.com> wrote:
> >
> > Hi Aljoscha,
> >
> > I'd like to gather all the ideas here and among the documents, and
> > draft a formal FLIP that keeps us on the same page. Hopefully I'll
> > start a FLIP thread next week.
> >
> > For the implementation, or say the POC part, I'd like to work with
> > you guys who proposed the concept of an Executor, to make sure that
> > we go in the same direction. I'm wondering whether a dedicated thread
> > or a Slack group is the proper venue. In my opinion we can involve
> > the team in a Slack group, start our branch concurrently with the
> > FLIP process, and once we reach a consensus on the FLIP, open an
> > umbrella issue about the framework and start subtasks. What do you
> > think?
> >
> > Best,
> > tison.
> >
> >
> > Aljoscha Krettek <al...@apache.org> wrote on Thu, Sep 5, 2019 at 9:39 PM:
> >
> >> Hi Tison,
> >>
> >> To keep this moving forward, maybe you want to start working on a
> >> proof of concept implementation for the new JobClient interface,
> >> maybe with a new method executeAsync() in the environment that
> >> returns the JobClient, and implement the methods to see how that
> >> works and where we get. Would you be interested in that?
> >>
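A minimal sketch of what such an executeAsync()/JobClient pair could look like. The names executeAsync and JobClient come from this thread; all signatures and the toy implementation below are assumptions for illustration, not the API that was eventually implemented:

```java
import java.util.concurrent.CompletableFuture;

// Toy sketch of the executeAsync()/JobClient idea under discussion:
// executeAsync() returns a handle to the running job instead of blocking
// until the job finishes. No real Flink types are used here.
public class ExecuteAsyncSketch {

    interface JobClient {
        String getJobID();
        CompletableFuture<Void> cancel();
        CompletableFuture<String> getJobExecutionResult();
    }

    // Models StreamExecutionEnvironment#executeAsync.
    static JobClient executeAsync(String jobName) {
        CompletableFuture<String> result =
            CompletableFuture.supplyAsync(() -> jobName + " done");
        return new JobClient() {
            public String getJobID() { return "jobid-" + jobName; }
            public CompletableFuture<Void> cancel() {
                return CompletableFuture.completedFuture(null);
            }
            public CompletableFuture<String> getJobExecutionResult() {
                return result;
            }
        };
    }

    public static void main(String[] args) throws Exception {
        JobClient client = executeAsync("WordCount");
        // Available immediately, without waiting for the job to finish.
        System.out.println(client.getJobID());
        // Waiting for the result is an explicit, opt-in call.
        System.out.println(client.getJobExecutionResult().get());
    }
}
```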
> >> Also, at some point we should collect all the ideas and start forming an
> >> actual FLIP.
> >>
> >> Best,
> >> Aljoscha
> >>
> >>> On 4. Sep 2019, at 12:04, Zili Chen <wa...@gmail.com> wrote:
> >>>
> >>> Thanks for your update Kostas!
> >>>
> >>> It looks good to me that clean up existing code paths as first
> >>> pass. I'd like to help on review and file subtasks if I find ones.
> >>>
> >>> Best,
> >>> tison.
> >>>
> >>>
> >>> Kostas Kloudas <kk...@gmail.com> wrote on Wed, Sep 4, 2019 at 5:52 PM:
> >>> Here is the issue, and I will keep on updating it as I find more
> >>> issues.
> >>>
> >>> https://issues.apache.org/jira/browse/FLINK-13954
> >>>
> >>> This will also cover the refactoring of the Executors that we discussed
> >>> in this thread, without any additional functionality (such as the job
> >> client).
> >>>
> >>> Kostas
> >>>
> >>> On Wed, Sep 4, 2019 at 11:46 AM Kostas Kloudas <kk...@gmail.com>
> >> wrote:
> >>>>
> >>>> Great idea Tison!
> >>>>
> >>>> I will create the umbrella issue and post it here so that we are all
> >>>> on the same page!
> >>>>
> >>>> Cheers,
> >>>> Kostas
> >>>>
> >>>> On Wed, Sep 4, 2019 at 11:36 AM Zili Chen <wa...@gmail.com>
> >> wrote:
> >>>>>
> >>>>> Hi Kostas & Aljoscha,
> >>>>>
> >>>>> I notice that there is a JIRA issue (FLINK-13946) which could be
> >>>>> included in this refactoring thread. Since we agree on most of
> >>>>> the directions in the big picture, is it reasonable to create an
> >>>>> umbrella issue for refactoring the client APIs, with linked
> >>>>> subtasks? That would be a better way to join the forces of our
> >>>>> community.
> >>>>>
> >>>>> Best,
> >>>>> tison.
> >>>>>
> >>>>>
> >>>>> Zili Chen <wa...@gmail.com> wrote on Sat, Aug 31, 2019 at 12:52 PM:
> >>>>>>
> >>>>>> Great Kostas! Looking forward to your POC!
> >>>>>>
> >>>>>> Best,
> >>>>>> tison.
> >>>>>>
> >>>>>>
> >>>>>>> Jeff Zhang <zj...@gmail.com> wrote on Fri, Aug 30, 2019 at 11:07 PM:
> >>>>>>>
> >>>>>>> Awesome, @Kostas Looking forward your POC.
> >>>>>>>
> >>>>>>> Kostas Kloudas <kk...@gmail.com> wrote on Fri, Aug 30, 2019 at 8:33 PM:
> >>>>>>>
> >>>>>>>> Hi all,
> >>>>>>>>
> >>>>>>>> I am just writing here to let you know that I am working on a
> >>>>>>>> POC that tries to refactor the current state of job submission
> >>>>>>>> in Flink. I want to stress that it introduces NO CHANGES to
> >>>>>>>> the current behaviour of Flink. It just re-arranges things and
> >>>>>>>> introduces the notion of an Executor, which is the entity
> >>>>>>>> responsible for taking the user code and submitting it for
> >>>>>>>> execution.
> >>>>>>>>
> >>>>>>>> Given this, the discussion about the functionality that the
> >>>>>>>> JobClient will expose to the user can go on independently, and
> >>>>>>>> the same holds for all the open questions so far.
> >>>>>>>>
> >>>>>>>> I hope I will have some more news to share soon.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Kostas
> >>>>>>>>
> >>>>>>>> On Mon, Aug 26, 2019 at 4:20 AM Yang Wang <da...@gmail.com>
> >> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Zili,
> >>>>>>>>>
> >>>>>>>>> It makes sense to me that a dedicated cluster is started for
> >>>>>>>>> a per-job cluster and will not accept more jobs.
> >>>>>>>>> Just have a question about the command line.
> >>>>>>>>>
> >>>>>>>>> Currently we could use the following commands to start
> >>>>>>>>> different clusters.
> >>>>>>>>> *per-job cluster*
> >>>>>>>>> ./bin/flink run -d -p 5 -ynm perjob-cluster1 -m yarn-cluster
> >>>>>>>>> examples/streaming/WindowJoin.jar
> >>>>>>>>> *session cluster*
> >>>>>>>>> ./bin/flink run -p 5 -ynm session-cluster1 -m yarn-cluster
> >>>>>>>>> examples/streaming/WindowJoin.jar
> >>>>>>>>>
> >>>>>>>>> What will it look like after client enhancement?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Yang
> >>>>>>>>>
> >>>>>>>>>> Zili Chen <wa...@gmail.com> wrote on Fri, Aug 23, 2019 at 10:46 PM:
> >>>>>>>>>
> >>>>>>>>>> Hi Till,
> >>>>>>>>>>
> >>>>>>>>>> Thanks for your update. Nice to hear :-)
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> tison.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Till Rohrmann <tr...@apache.org> wrote on Fri, Aug 23,
> >>>>>>>>>> 2019 at 10:39 PM:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Tison,
> >>>>>>>>>>>
> >>>>>>>>>>> just a quick comment concerning the class loading issues
> >>>>>>>>>>> when using the per-job mode. The community wants to change
> >>>>>>>>>>> it so that the StandaloneJobClusterEntryPoint actually uses
> >>>>>>>>>>> the user code class loader with child-first class loading
> >>>>>>>>>>> [1]. Hence, I hope that this problem will be resolved soon.
> >>>>>>>>>>>
> >>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-13840
> >>>>>>>>>>>
> >>>>>>>>>>> Cheers,
> >>>>>>>>>>> Till
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas
> >>>>>>>>>>> <kkloudas@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>
> >>>>>>>>>>>> On the topic of web submission, I agree with Till that it
> >>>>>>>>>>>> only seems to complicate things.
> >>>>>>>>>>>> It is bad for security and job isolation (anybody can
> >>>>>>>>>>>> submit/cancel jobs), and its implementation complicates
> >>>>>>>>>>>> some parts of the code. So, if we were to redesign the
> >>>>>>>>>>>> WebUI, maybe this part could be left out. In addition, I
> >>>>>>>>>>>> would say that the ability to cancel jobs could also be
> >>>>>>>>>>>> left out.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I would also be in favour of removing the "detached" mode,
> >>>>>>>>>>>> for the reasons mentioned above (i.e. because now we will
> >>>>>>>>>>>> have a future representing the result, on which the user
> >>>>>>>>>>>> can choose to wait or not).
> >>>>>>>>>>>>
> >>>>>>>>>>>> Now, for separating job submission and cluster creation, I
> >>>>>>>>>>>> am in favour of keeping both. Once again, the reasons are
> >>>>>>>>>>>> mentioned above by Stephan, Till and Aljoscha, and Zili
> >>>>>>>>>>>> also seems to agree. They mainly have to do with security,
> >>>>>>>>>>>> isolation and ease of resource management for the user, as
> >>>>>>>>>>>> he knows that "when my job is done, everything will be
> >>>>>>>>>>>> cleared up". This is also the experience you get when
> >>>>>>>>>>>> launching a process on your local OS.
> >>>>>>>>>>>>
> >>>>>>>>>>>> On whether to exclude the per-job mode from returning a
> >>>>>>>>>>>> JobClient, I believe that eventually it would be nice to
> >>>>>>>>>>>> allow users to get back a JobClient. The reasons are that
> >>>>>>>>>>>> 1) I cannot find any objective reason why the user
> >>>>>>>>>>>> experience should diverge, and 2) this will be the way the
> >>>>>>>>>>>> user is able to interact with his running job. Assuming
> >>>>>>>>>>>> that the necessary ports are open for the REST API to
> >>>>>>>>>>>> work, I think the JobClient can run against the REST API
> >>>>>>>>>>>> without problems. If the needed ports are not open, then
> >>>>>>>>>>>> we are safe not to return a JobClient, as the user
> >>>>>>>>>>>> explicitly chose to close all points of communication to
> >>>>>>>>>>>> his running job.
> >>>>>>>>>>>>
> >>>>>>>>>>>> On the topic of not hijacking "env.execute()" in order to
> >>>>>>>>>>>> get the Plan, I definitely agree, but for the proposal of
> >>>>>>>>>>>> having a "compile()" method in the env, I would like to
> >>>>>>>>>>>> have a better look at the existing code.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Cheers,
> >>>>>>>>>>>> Kostas
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Aug 23, 2019 at 5:52 AM Zili Chen
> >>>>>>>>>>>> <wander4096@gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Yang,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It would be helpful if you check Stephan's last comment,
> >>>>>>>>>>>>> which states that isolation is important.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> For per-job mode, we run a dedicated cluster (maybe it
> >>>>>>>>>>>>> should have been a couple of JMs and TMs during the
> >>>>>>>>>>>>> FLIP-6 design) for a specific job. Thus the process is
> >>>>>>>>>>>>> isolated from other jobs.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> In our case there was a time when we suffered from
> >>>>>>>>>>>>> multiple jobs submitted by different users affecting each
> >>>>>>>>>>>>> other so that all ran into an error state. Also, running
> >>>>>>>>>>>>> the client inside the cluster could save client resources
> >>>>>>>>>>>>> at some points.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> However, we also face several issues, as you mentioned:
> >>>>>>>>>>>>> in per-job mode it always uses the parent classloader,
> >>>>>>>>>>>>> so classloading issues occur.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> BTW, one can make an analogy between session/per-job mode
> >>>>>>>>>>>>> in Flink and client/cluster mode in Spark.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>> tison.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Yang Wang <da...@gmail.com> wrote on Thu, Aug 22, 2019
> >>>>>>>>>>>>> at 11:25 AM:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> From the user's perspective, it is really confusing what
> >>>>>>>>>>>>>> the scope of a per-job cluster is.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> If it means a Flink cluster with a single job, then we
> >>>>>>>>>>>>>> could get better isolation.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Then it does not matter how we deploy the cluster:
> >>>>>>>>>>>>>> directly deploy (mode 1), or start a Flink cluster and
> >>>>>>>>>>>>>> then submit the job through a cluster client (mode 2).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Otherwise, if it just means direct deployment, how
> >>>>>>>>>>>>>> should we name mode 2: "session with job" or something
> >>>>>>>>>>>>>> else?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> We could also benefit from mode2. Users could get the
> >>>>>>>>>>>>>> same isolation as with mode1: the user code and
> >>>>>>>>>>>>>> dependencies will be loaded by the user classloader
> >>>>>>>>>>>>>> to avoid class conflicts with the framework.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Anyway, both of the two submission modes are useful.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> We just need to clarify the concepts.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yang
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Zili Chen <wa...@gmail.com> wrote on Tue, Aug 20, 2019
> >>>>>>>>>>>>>> at 5:58 PM:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks for the clarification.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The idea of a JobDeployer came to my mind when I was
> >>>>>>>>>>>>>>> puzzling over how to execute per-job mode and session
> >>>>>>>>>>>>>>> mode with the same user code and framework codepath.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> With the concept of a JobDeployer we come back to the
> >>>>>>>>>>>>>>> statement that the environment knows every config for
> >>>>>>>>>>>>>>> cluster deployment and job submission. We configure,
> >>>>>>>>>>>>>>> or generate from the configuration, a specific
> >>>>>>>>>>>>>>> JobDeployer in the environment, and then the code
> >>>>>>>>>>>>>>> aligns on
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> *JobClient client = env.execute().get();*
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> which in session mode is returned by
> >>>>>>>>>>>>>>> clusterClient.submitJob and in per-job mode by
> >>>>>>>>>>>>>>> clusterDescriptor.deployJobCluster.
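To make the shape of that unified codepath concrete, here is a minimal, self-contained sketch in plain Java. All names in it (JobDeployer, PerJobDeployer, Environment, the "wordcount" job graph stand-in) are hypothetical, taken from this discussion, not actual Flink API:

```java
import java.util.concurrent.CompletableFuture;

// Sketch: the environment is configured with a deployer strategy; user code
// only ever sees env.execute() returning a future JobClient, regardless of
// whether session mode or per-job mode is selected.
public class ClientApiSketch {

    interface JobClient {
        String jobId();
    }

    // Selected when the environment is set up: session mode would wrap a
    // ClusterClient, per-job mode would wrap a ClusterDescriptor.
    interface JobDeployer {
        CompletableFuture<JobClient> deploy(String jobGraph);
    }

    static class SessionDeployer implements JobDeployer {
        // stand-in for clusterClient.submitJob(jobGraph)
        public CompletableFuture<JobClient> deploy(String jobGraph) {
            return CompletableFuture.completedFuture(() -> "session-" + jobGraph);
        }
    }

    static class PerJobDeployer implements JobDeployer {
        // stand-in for clusterDescriptor.deployJobCluster(jobGraph)
        public CompletableFuture<JobClient> deploy(String jobGraph) {
            return CompletableFuture.completedFuture(() -> "perjob-" + jobGraph);
        }
    }

    // User-facing environment: the same user program runs in either mode.
    static class Environment {
        private final JobDeployer deployer;
        Environment(JobDeployer deployer) { this.deployer = deployer; }
        CompletableFuture<JobClient> execute() { return deployer.deploy("wordcount"); }
    }

    public static void main(String[] args) {
        // "JobClient client = env.execute().get();" from the thread,
        // using join() here to avoid checked exceptions
        JobClient client = new Environment(new SessionDeployer()).execute().join();
        System.out.println(client.jobId());
    }
}
```

Swapping SessionDeployer for PerJobDeployer changes how the cluster is obtained without touching the user program, which is the point being argued above.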
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Here comes a problem: currently we directly run the
> >>>>>>>>>>>>>>> ClusterEntrypoint with an extracted job graph.
> >>>>>>>>>>>>>>> Following the JobDeployer way, we had better align the
> >>>>>>>>>>>>>>> entry point of per-job deployment at the JobDeployer.
> >>>>>>>>>>>>>>> Users run their main method, or a CLI (which finally
> >>>>>>>>>>>>>>> calls the main method), to deploy the job cluster.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>> tison.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org> wrote on Tue, Aug 20,
> >>>>>>>>>>>>>>> 2019 at 4:40 PM:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Till has made some good comments here.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Two things to add:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>  - The job mode is very nice in the way that it
> >> runs the
> >>>>>>>>>> client
> >>>>>>>>>>>> inside
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>> cluster (in the same image/process that is the
> >> JM) and thus
> >>>>>>>>>>> unifies
> >>>>>>>>>>>>>> both
> >>>>>>>>>>>>>>>> applications and what the Spark world calls the
> >> "driver
> >>>>>>>> mode".
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>  - Another thing I would add is that during the
> >> FLIP-6
> >>>>>>>> design,
> >>>>>>>>>>> we
> >>>>>>>>>>>> were
> >>>>>>>>>>>>>>>> thinking about setups where Dispatcher and
> >> JobManager are
> >>>>>>>>>>> separate
> >>>>>>>>>>>>>>>> processes.
> >>>>>>>>>>>>>>>>    A Yarn or Mesos Dispatcher of a session could run
> >>>>>>>>>>>>>>>> independently (even as privileged processes executing
> >>>>>>>>>>>>>>>> no code).
> >>>>>>>>>>>>>>>>    Then the "per-job" mode could still be helpful:
> >>>>>>>>>>>>>>>> when a job is submitted to the dispatcher, it
> >>>>>>>>>>>>>>>> launches the JM again in per-job mode, so that JM and
> >>>>>>>>>>>>>>>> TM processes are bound to the job only. For higher
> >>>>>>>>>>>>>>>> security setups, it is important that processes are
> >>>>>>>>>>>>>>>> not reused across jobs.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <
> >>>>>>>>>>>> trohrmann@apache.org>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I would not be in favour of getting rid of the
> >> per-job
> >>>>>>>> mode
> >>>>>>>>>>>> since it
> >>>>>>>>>>>>>>>>> simplifies the process of running Flink jobs
> >>>>>>>> considerably.
> >>>>>>>>>>>> Moreover,
> >>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>> not only well suited for container deployments
> >> but also
> >>>>>>>> for
> >>>>>>>>>>>>>> deployments
> >>>>>>>>>>>>>>>>> where you want to guarantee job isolation. For
> >> example, a
> >>>>>>>>>> user
> >>>>>>>>>>>> could
> >>>>>>>>>>>>>>> use
> >>>>>>>>>>>>>>>>> the per-job mode on Yarn to execute his job on
> >> a separate
> >>>>>>>>>>>> cluster.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I think that having two notions of cluster
> >> deployments
> >>>>>>>>>> (session
> >>>>>>>>>>>> vs.
> >>>>>>>>>>>>>>>> per-job
> >>>>>>>>>>>>>>>>> mode) does not necessarily contradict your
> >> ideas for the
> >>>>>>>>>> client
> >>>>>>>>>>>> api
> >>>>>>>>>>>>>>>>> refactoring. For example one could have the
> >> following
> >>>>>>>>>>> interfaces:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> - ClusterDeploymentDescriptor: encapsulates
> >> the logic
> >>>>>>>> how to
> >>>>>>>>>>>> deploy a
> >>>>>>>>>>>>>>>>> cluster.
> >>>>>>>>>>>>>>>>> - ClusterClient: allows to interact with a
> >> cluster
> >>>>>>>>>>>>>>>>> - JobClient: allows to interact with a running
> >> job
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Now the ClusterDeploymentDescriptor could have
> >> two
> >>>>>>>> methods:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> - ClusterClient deploySessionCluster()
> >>>>>>>>>>>>>>>>> - JobClusterClient/JobClient
> >>>>>>>> deployPerJobCluster(JobGraph)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> where JobClusterClient is either a supertype of
> >>>>>>>> ClusterClient
> >>>>>>>>>>>> which
> >>>>>>>>>>>>>>> does
> >>>>>>>>>>>>>>>>> not give you the functionality to submit jobs
> >> or
> >>>>>>>>>>>> deployPerJobCluster
> >>>>>>>>>>>>>>>>> returns directly a JobClient.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> When setting up the ExecutionEnvironment, one
> >> would then
> >>>>>>>> not
> >>>>>>>>>>>> provide
> >>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>> ClusterClient to submit jobs but a JobDeployer
> >> which,
> >>>>>>>>>> depending
> >>>>>>>>>>>> on
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> selected mode, either uses a ClusterClient
> >> (session
> >>>>>>>> mode) to
> >>>>>>>>>>>> submit
> >>>>>>>>>>>>>>> jobs
> >>>>>>>>>>>>>>>> or
> >>>>>>>>>>>>>>>>> a ClusterDeploymentDescriptor to deploy per a
> >> job mode
> >>>>>>>>>> cluster
> >>>>>>>>>>>> with
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>> job
> >>>>>>>>>>>>>>>>> to execute.
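A rough sketch of the interfaces Till proposes above, as self-contained Java. These are hypothetical names from this discussion (ClusterDeploymentDescriptor, its two deploy methods, and the toy in-memory descriptor standing in for YARN/Mesos/Kubernetes deployers), not actual Flink API:

```java
// Sketch: one descriptor, two deployment paths. A session cluster hands back
// a ClusterClient that can keep submitting jobs; a per-job cluster hands back
// a JobClient bound to that single job, with no job-submission capability.
public class DeploymentSketch {

    interface JobClient {
        String jobId();
    }

    interface ClusterClient {
        JobClient submitJob(String jobGraph);
        void shutdown();
    }

    interface ClusterDeploymentDescriptor {
        ClusterClient deploySessionCluster();
        JobClient deployPerJobCluster(String jobGraph);
    }

    // Toy in-memory implementation, standing in for a real resource manager.
    static class FakeDescriptor implements ClusterDeploymentDescriptor {
        public ClusterClient deploySessionCluster() {
            return new ClusterClient() {
                public JobClient submitJob(String jobGraph) {
                    return () -> "session/" + jobGraph;
                }
                public void shutdown() {}
            };
        }
        public JobClient deployPerJobCluster(String jobGraph) {
            // cluster lifetime is bound to this one job
            return () -> "per-job/" + jobGraph;
        }
    }

    public static void main(String[] args) {
        ClusterDeploymentDescriptor d = new FakeDescriptor();
        System.out.println(d.deploySessionCluster().submitJob("g1").jobId());
        System.out.println(d.deployPerJobCluster("g2").jobId());
    }
}
```

Note how the type system encodes the isolation argument: the per-job path never exposes submitJob, so a per-job cluster cannot be reused for other jobs.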
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> These are just some thoughts how one could
> >> make it
> >>>>>>>> working
> >>>>>>>>>>>> because I
> >>>>>>>>>>>>>>>>> believe there is some value in using the per
> >> job mode
> >>>>>>>> from
> >>>>>>>>>> the
> >>>>>>>>>>>>>>>>> ExecutionEnvironment.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Concerning the web submission, this is indeed
> >> a bit
> >>>>>>>> tricky.
> >>>>>>>>>>> From
> >>>>>>>>>>>> a
> >>>>>>>>>>>>>>>> cluster
> >>>>>>>>>>>>>>>>> management stand point, I would in favour of
> >> not
> >>>>>>>> executing
> >>>>>>>>>> user
> >>>>>>>>>>>> code
> >>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> REST endpoint. Especially when considering
> >> security, it
> >>>>>>>> would
> >>>>>>>>>>> be
> >>>>>>>>>>>> good
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>> have a well defined cluster behaviour where it
> >> is
> >>>>>>>> explicitly
> >>>>>>>>>>>> stated
> >>>>>>>>>>>>>>> where
> >>>>>>>>>>>>>>>>> user code and, thus, potentially risky code is
> >> executed.
> >>>>>>>>>>> Ideally
> >>>>>>>>>>>> we
> >>>>>>>>>>>>>>> limit
> >>>>>>>>>>>>>>>>> it to the TaskExecutor and JobMaster.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>> Till
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 9:40 AM Flavio
> >> Pompermaier <
> >>>>>>>>>>>>>>> pompermaier@okkam.it
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> In my opinion the client should not use any
> >>>>>>>>>>>>>>>>>> environment to get the job graph, because the jar
> >>>>>>>>>>>>>>>>>> should reside ONLY on the cluster (and not in the
> >>>>>>>>>>>>>>>>>> client classpath, otherwise there are always
> >>>>>>>>>>>>>>>>>> inconsistencies between the client and the Flink
> >>>>>>>>>>>>>>>>>> Job Manager's classpath).
> >>>>>>>>>>>>>>>>>> In the YARN, Mesos and Kubernetes scenarios you
> >>>>>>>>>>>>>>>>>> have the jar, but you could start a cluster that
> >>>>>>>>>>>>>>>>>> has the jar on the Job Manager as well (this is the
> >>>>>>>>>>>>>>>>>> only case where I think you can assume that the
> >>>>>>>>>>>>>>>>>> client has the jar on the classpath... in the REST
> >>>>>>>>>>>>>>>>>> job submission you don't have any classpath).
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thus, always in my opinion, the JobGraph
> >> should be
> >>>>>>>>>> generated
> >>>>>>>>>>>> by the
> >>>>>>>>>>>>>>> Job
> >>>>>>>>>>>>>>>>>> Manager REST API.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <
> >>>>>>>>>>>> wander4096@gmail.com>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I would like to involve Till & Stephan here
> >> to clarify
> >>>>>>>>>> some
> >>>>>>>>>>>>>> concept
> >>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>> per-job mode.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> The term per-job is one of the modes a cluster
> >>>>>>>>>>>>>>>>>>> could run in. It is mainly aimed at spawning a
> >>>>>>>>>>>>>>>>>>> dedicated cluster for a specific job, where the
> >>>>>>>>>>>>>>>>>>> job is packaged with Flink itself and the cluster
> >>>>>>>>>>>>>>>>>>> is thus initialized with the job, getting rid of a
> >>>>>>>>>>>>>>>>>>> separate submission step.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> This is useful for container deployments, where
> >>>>>>>>>>>>>>>>>>> one creates his image with the job and then simply
> >>>>>>>>>>>>>>>>>>> deploys the container.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> However, it is out of the client scope, since a
> >>>>>>>>>>>>>>>>>>> client (ClusterClient for example) is for
> >>>>>>>>>>>>>>>>>>> communicating with an existing cluster and
> >>>>>>>>>>>>>>>>>>> performing actions. Currently,
> >>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>> per-job
> >>>>>>>>>>>>>>>>>>> mode, we extract the job graph and bundle
> >> it into
> >>>>>>>> cluster
> >>>>>>>>>>>>>> deployment
> >>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> thus no
> >>>>>>>>>>>>>>>>>>> concept of client gets involved. It looks
> >>>>>>>>>>>>>>>>>>> reasonable to exclude the deployment of per-job
> >>>>>>>>>>>>>>>>>>> clusters from the client api and use dedicated
> >>>>>>>>>>>>>>>>>>> utility classes (deployers) for deployment.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Zili Chen <wa...@gmail.com> wrote on Tue, Aug 20,
> >>>>>>>>>>>>>>>>>>> 2019 at 12:37 PM:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Hi Aljoscha,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks for your reply and participation. The
> >>>>>>>>>>>>>>>>>>>> Google Doc you linked to requires permission; I
> >>>>>>>>>>>>>>>>>>>> think you could use a share link instead.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I agree that we have almost reached a consensus
> >>>>>>>>>>>>>>>>>>>> that a JobClient is necessary to interact with a
> >>>>>>>>>>>>>>>>>>>> running job.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Let me check your open questions one by
> >> one.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> 1. Separate cluster creation and job
> >> submission for
> >>>>>>>>>>> per-job
> >>>>>>>>>>>>>> mode.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> As you mentioned, here is where the opinions
> >>>>>>>>>>>>>>>>>>>> diverge. In my document there is an
> >>>>>>>>>>>>>>>>>>>> alternative[2] that proposes excluding per-job
> >>>>>>>>>>>>>>>>>>>> deployment from the client api scope, and now I
> >>>>>>>>>>>>>>>>>>>> find it more reasonable to do the exclusion.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> When in per-job mode, a dedicated JobCluster is
> >>>>>>>>>>>>>>>>>>>> launched to execute the specific job. It is more
> >>>>>>>>>>>>>>>>>>>> like a Flink application than a submission of a
> >>>>>>>>>>>>>>>>>>>> Flink job. The client only takes care of job
> >>>>>>>>>>>>>>>>>>>> submission and assumes there is an existing
> >>>>>>>>>>>>>>>>>>>> cluster. In this way we are able to consider
> >>>>>>>>>>>>>>>>>>>> per-job issues individually, and
> >>>>>>>>>>>>>>>>>>>> JobClusterEntrypoint would be the utility class
> >>>>>>>>>>>>>>>>>>>> for per-job deployment.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Nevertheless, the user program works in both
> >>>>>>>>>>>>>>>>>>>> session mode and per-job mode without needing to
> >>>>>>>>>>>>>>>>>>>> change code. The JobClient in per-job mode is
> >>>>>>>>>>>>>>>>>>>> returned from env.execute as normal. However, it
> >>>>>>>>>>>>>>>>>>>> would no longer be a wrapper of RestClusterClient
> >>>>>>>>>>>>>>>>>>>> but a wrapper of PerJobClusterClient, which
> >>>>>>>>>>>>>>>>>>>> communicates with the Dispatcher locally.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> 2. How to deal with plan preview.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> With the env.compile functions, users can get
> >>>>>>>>>>>>>>>>>>>> the JobGraph or FlinkPlan and thus preview the
> >>>>>>>>>>>>>>>>>>>> plan programmatically. Typically it looks like:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> if (preview configured) {
> >>>>>>>>>>>>>>>>>>>>    FlinkPlan plan = env.compile();
> >>>>>>>>>>>>>>>>>>>>    new JSONDumpGenerator(...).dump(plan);
> >>>>>>>>>>>>>>>>>>>> } else {
> >>>>>>>>>>>>>>>>>>>>    env.execute();
> >>>>>>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> And `flink info` would no longer be valid.
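A runnable toy version of the preview-vs-execute branching described above. The Environment, compile(), and execute() here are stand-ins for the discussed API (in real Flink the optimizer plan can be rendered to JSON, but this sketch only demonstrates the control flow):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the same environment either "compiles" (builds a plan description
// without running anything) or "executes", depending on a preview flag.
public class PreviewSketch {

    static class Environment {
        private final List<String> ops = new ArrayList<>();
        Environment addOperator(String name) { ops.add(name); return this; }
        // stand-in for env.compile(): produce the plan, do not run it
        String compile() { return String.join(" -> ", ops); }
        // stand-in for env.execute(): actually run (here: just report)
        String execute() { return "executed " + ops.size() + " operators"; }
    }

    static String run(Environment env, boolean previewConfigured) {
        if (previewConfigured) {
            return env.compile();   // preview the plan only
        } else {
            return env.execute();   // normal execution
        }
    }

    public static void main(String[] args) {
        Environment env = new Environment()
                .addOperator("source").addOperator("map").addOperator("sink");
        System.out.println(run(env, true));   // source -> map -> sink
        System.out.println(run(env, false));  // executed 3 operators
    }
}
```

The key property, as argued in the thread, is that preview becomes ordinary user code instead of a special `flink info` codepath that hijacks execute().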
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> 3. How to deal with Jar Submission at the
> >> Web
> >>>>>>>> Frontend.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> There is one more thread that talked about this
> >>>>>>>>>>>>>>>>>>>> topic[1]. Apart from removing the functions,
> >>>>>>>>>>>>>>>>>>>> there are two alternatives.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> One is to introduce an interface with a method
> >>>>>>>>>>>>>>>>>>>> that returns a JobGraph/FlinkPlan, and have Jar
> >>>>>>>>>>>>>>>>>>>> Submission only support main classes implementing
> >>>>>>>>>>>>>>>>>>>> this interface. Then we extract the
> >>>>>>>>>>>>>>>>>>>> JobGraph/FlinkPlan just by calling the method.
> >>>>>>>>>>>>>>>>>>>> In this way, it is even possible to consider a
> >>>>>>>>>>>>>>>>>>>> separation of job creation and job submission.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> The other is, as you mentioned, to let execute()
> >>>>>>>>>>>>>>>>>>>> do the actual execution. We won't execute the
> >>>>>>>>>>>>>>>>>>>> main method in the WebFrontend but spawn a
> >>>>>>>>>>>>>>>>>>>> process on the WebMonitor side to execute it. For
> >>>>>>>>>>>>>>>>>>>> the return part, we could generate the JobID in
> >>>>>>>>>>>>>>>>>>>> the WebMonitor and pass it to the execution
> >>>>>>>>>>>>>>>>>>>> environment.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> 4. How to deal with detached mode.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I think detached mode is a temporary solution
> >>>>>>>>>>>>>>>>>>>> for non-blocking submission. In my document, both
> >>>>>>>>>>>>>>>>>>>> submission and execution return a
> >>>>>>>>>>>>>>>>>>>> CompletableFuture and users control whether or
> >>>>>>>>>>>>>>>>>>>> not to wait for the result. With this, we don't
> >>>>>>>>>>>>>>>>>>>> need a detached option, but the functionality is
> >>>>>>>>>>>>>>>>>>>> covered.
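A small sketch of how a future-based submission subsumes the detached flag. The submitJob helper is hypothetical; the point is only the calling pattern: submission always returns immediately with a future, "attached" callers block on it, "detached" callers simply don't:

```java
import java.util.concurrent.CompletableFuture;

// Sketch: one async submission API covers both the old detached and
// non-detached behaviours, moving the choice to the caller.
public class DetachedSketch {

    // stand-in for an asynchronous job submission that eventually
    // completes with the job result
    static CompletableFuture<String> submitJob(String jobName) {
        return CompletableFuture.supplyAsync(() -> "result-of-" + jobName);
    }

    public static void main(String[] args) {
        // "detached" style: fire and continue; optionally react later
        CompletableFuture<String> detached = submitJob("jobA");
        detached.thenAccept(r -> System.out.println("done: " + r));

        // "attached" style: explicitly wait for completion
        String result = submitJob("jobB").join();
        System.out.println(result);
    }
}
```

With this shape, removing the detached/non-detached distinction is purely an API simplification: nothing a detached user could do before becomes impossible.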
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> 5. How does per-job mode interact with
> >> interactive
> >>>>>>>>>>>> programming.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> All of the YARN, Mesos and Kubernetes scenarios
> >>>>>>>>>>>>>>>>>>>> now follow the pattern of launching a JobCluster.
> >>>>>>>>>>>>>>>>>>>> And I don't think there would be inconsistencies
> >>>>>>>>>>>>>>>>>>>> between different resource managers.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>> tison.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> [1]
> https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
> >>>>>>>>>>>>>>>>>>>> [2]
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Aljoscha Krettek <al...@apache.org> wrote on
> >>>>>>>>>>>>>>>>>>>> Fri, Aug 16, 2019 at 9:20 PM:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I read both Jeff's initial design document and
> >>>>>>>>>>>>>>>>>>>>> the newer document by Tison. I also finally
> >>>>>>>>>>>>>>>>>>>>> found the time to collect our thoughts on the
> >>>>>>>>>>>>>>>>>>>>> issue; I had quite some discussions with Kostas
> >>>>>>>>>>>>>>>>>>>>> and this is the result: [1].
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I think overall we agree that this part
> >> of the
> >>>>>>>> code is
> >>>>>>>>>> in
> >>>>>>>>>>>> dire
> >>>>>>>>>>>>>>> need
> >>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>> some refactoring/improvements but I
> >> think there are
> >>>>>>>>>> still
> >>>>>>>>>>>> some
> >>>>>>>>>>>>>>> open
> >>>>>>>>>>>>>>>>>>>>> questions and some differences in
> >> opinion what
> >>>>>>>> those
> >>>>>>>>>>>>>> refactorings
> >>>>>>>>>>>>>>>>>>> should
> >>>>>>>>>>>>>>>>>>>>> look like.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I think the API-side is quite clear,
> >> i.e. we need
> >>>>>>>> some
> >>>>>>>>>>>>>> JobClient
> >>>>>>>>>>>>>>>> API
> >>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>> allows interacting with a running Job.
> >> It could be
> >>>>>>>>>>>> worthwhile
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>> spin
> >>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>> off into a separate FLIP because we can
> >> probably
> >>>>>>>> find
> >>>>>>>>>>>> consensus
> >>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>> part more easily.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> For the rest, the main open questions
> >> from our doc
> >>>>>>>> are
> >>>>>>>>>>>> these:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>  - Do we want to separate cluster
> >> creation and job
> >>>>>>>>>>>> submission
> >>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>> per-job mode? In the past, there were
> >> conscious
> >>>>>>>> efforts
> >>>>>>>>>>> to
> >>>>>>>>>>>>>> *not*
> >>>>>>>>>>>>>>>>>>> separate
> >>>>>>>>>>>>>>>>>>>>> job submission from cluster creation for
> >> per-job
> >>>>>>>>>> clusters
> >>>>>>>>>>>> for
> >>>>>>>>>>>>>>>> Mesos,
> >>>>>>>>>>>>>>>>>>> YARN,
> >>>>>>>>>>>>>>>>>>>>> Kubernets (see
> >> StandaloneJobClusterEntryPoint).
> >>>>>>>> Tison
> >>>>>>>>>>>> suggests
> >>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>> his
> >>>>>>>>>>>>>>>>>>>>> design document to decouple this in
> >> order to unify
> >>>>>>>> job
> >>>>>>>>>>>>>>> submission.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>  - How to deal with plan preview, which
> >> needs to
> >>>>>>>>>> hijack
> >>>>>>>>>>>>>>> execute()
> >>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>> let the outside code catch an exception?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>  - How to deal with Jar Submission at
> >> the Web
> >>>>>>>>>> Frontend,
> >>>>>>>>>>>> which
> >>>>>>>>>>>>>>>> needs
> >>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>> hijack execute() and let the outside
> >> code catch an
> >>>>>>>>>>>> exception?
> >>>>>>>>>>>>>>>>>>>>> CliFrontend.run() “hijacks”
> >>>>>>>>>>> ExecutionEnvironment.execute()
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>>> get a
> >>>>>>>>>>>>>>>>>>>>> JobGraph and then execute that JobGraph
> >> manually.
> >>>>>>>> We
> >>>>>>>>>>> could
> >>>>>>>>>>>> get
> >>>>>>>>>>>>>>>> around
> >>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>> by letting execute() do the actual
> >> execution. One
> >>>>>>>>>> caveat
> >>>>>>>>>>>> for
> >>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>> now the main() method doesn’t return (or
> >> is forced
> >>>>>>>> to
> >>>>>>>>>>>> return by
> >>>>>>>>>>>>>>>>>>> throwing an
> >>>>>>>>>>>>>>>>>>>>> exception from execute()) which means
> >> that for Jar
> >>>>>>>>>>>> Submission
> >>>>>>>>>>>>>>> from
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> WebFrontend we have a long-running
> >> main() method
> >>>>>>>>>> running
> >>>>>>>>>>>> in the
> >>>>>>>>>>>>>>>>>>>>> WebFrontend. This doesn’t sound very
> >> good. We
> >>>>>>>> could get
> >>>>>>>>>>>> around
> >>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>> by
> >>>>>>>>>>>>>>>>>>>>> removing the plan preview feature and by
> >> removing
> >>>>>>>> Jar
> >>>>>>>>>>>>>>>>>>> Submission/Running.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>  - How to deal with detached mode?
> >> Right now,
> >>>>>>>>>>>>>>> DetachedEnvironment
> >>>>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>>>>>> execute the job and return immediately.
> >> If users
> >>>>>>>>>> control
> >>>>>>>>>>>> when
> >>>>>>>>>>>>>>> they
> >>>>>>>>>>>>>>>>>>> want to
> >>>>>>>>>>>>>>>>>>>>> return, by waiting on the job completion
> >> future,
> >>>>>>>> how do
> >>>>>>>>>>> we
> >>>>>>>>>>>> deal
> >>>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>> this?
> >>>>>>>>>>>>>>>>>>>>> Do we simply remove the distinction
> >> between
> >>>>>>>>>>>>>>> detached/non-detached?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>  - How does per-job mode interact with
> >>>>>>>> “interactive
> >>>>>>>>>>>>>> programming”
> >>>>>>>>>>>>>>>>>>>>> (FLIP-36). For YARN, each execute() call
> >> could
> >>>>>>>> spawn a
> >>>>>>>>>>> new
> >>>>>>>>>>>>>> Flink
> >>>>>>>>>>>>>>>> YARN
> >>>>>>>>>>>>>>>>>>>>> cluster. What about Mesos and Kubernetes?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> The first open question is where the
> >> opinions
> >>>>>>>> diverge,
> >>>>>>>>>> I
> >>>>>>>>>>>> think.
> >>>>>>>>>>>>>>> The
> >>>>>>>>>>>>>>>>>>> rest
> >>>>>>>>>>>>>>>>>>>>> are just open questions and interesting
> >> things
> >>>>>>>> that we
> >>>>>>>>>>>> need to
> >>>>>>>>>>>>>>>>>>> consider.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>> Aljoscha
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> [1]
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On 31. Jul 2019, at 15:23, Jeff Zhang <
> >>>>>>>>>>> zjffdu@gmail.com>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Thanks tison for the effort. I left a
> >> few
> >>>>>>>> comments.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Zili Chen <wa...@gmail.com> wrote on Wed,
> >>>>>>>>>>>>>>>>>>>>>> Jul 31, 2019 at 8:24 PM:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Hi Flavio,
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Thanks for your reply.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> In both the current impl and in the design,
> >>>>>>>>>>>>>>>>>>>>>>> ClusterClient never takes responsibility for
> >>>>>>>>>>>>>>>>>>>>>>> generating the JobGraph (what you see in the
> >>>>>>>>>>>>>>>>>>>>>>> current codebase is several class methods).
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Instead, the user describes his program in
> >>>>>>>>>>>>>>>>>>>>>>> the main method with the ExecutionEnvironment
> >>>>>>>>>>>>>>>>>>>>>>> apis and calls env.compile() or env.optimize()
> >>>>>>>>>>>>>>>>>>>>>>> to get the FlinkPlan and JobGraph
> >>>>>>>>>>>>>>>>>>>>>>> respectively.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> For listing the main classes in a jar and
> >>>>>>>>>>>>>>>>>>>>>>> choosing one for submission, you're already
> >>>>>>>>>>>>>>>>>>>>>>> able to customize a CLI to do it.
> >>>>>>>>>>>>>>>>>>>>>>> Specifically, the path of the jar is passed as
> >>>>>>>>>>>>>>>>>>>>>>> an argument, and in the customized CLI you
> >>>>>>>>>>>>>>>>>>>>>>> list the main classes and choose one to submit
> >>>>>>>>>>>>>>>>>>>>>>> to the cluster.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>> tison.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Flavio Pompermaier <pompermaier@okkam.it>
> >>>>>>>>>>>>>>>>>>>>>>> wrote on Wed, Jul 31, 2019 at 8:12 PM:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Just one note on my side: it is not
> >> clear to me
> >>>>>>>>>>>> whether the
> >>>>>>>>>>>>>>>>> client
> >>>>>>>>>>>>>>>>>>>>> needs
> >>>>>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>> be able to generate a job graph or
> >> not.
> >>>>>>>>>>>>>>>>>>>>>>>> In my opinion, the job jar must reside only
> >>>>>>>>>>>>>>>>>>>>>>>> on the server/jobManager side, and the client
> >>>>>>>>>>>>>>>>>>>>>>>> requires a way to get the job graph.
> >>>>>>>>>>>>>>>>>>>>>>>> If you really want access to the job graph,
> >>>>>>>>>>>>>>>>>>>>>>>> I'd add dedicated methods on the
> >>>>>>>>>>>>>>>>>>>>>>>> ClusterClient, like:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>  - getJobGraph(jarId, mainClass):
> >> JobGraph
> >>>>>>>>>>>>>>>>>>>>>>>>  - listMainClasses(jarId):
> >> List<String>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> These would require some additions on the
> >>>>>>>>>>>>>>>>>>>>>>>> job manager endpoint as well... what do you
> >>>>>>>>>>>>>>>>>>>>>>>> think?
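A sketch of the two client-side additions proposed just above, as self-contained Java. This is hypothetical API shaped after the discussion, not actual Flink: the jar lives only on the cluster side, and the client asks the JobManager endpoint which main classes the jar contains and for the job graph of a chosen one:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: getJobGraph(jarId, mainClass) and listMainClasses(jarId) as
// client methods backed (in a real system) by JobManager REST calls.
public class JarServiceSketch {

    interface ClusterClient {
        List<String> listMainClasses(String jarId);
        String getJobGraph(String jarId, String mainClass);
    }

    // Toy server-side stand-in: in reality these calls would hit a REST
    // endpoint on the JobManager, which holds the uploaded jar.
    static class FakeClusterClient implements ClusterClient {
        public List<String> listMainClasses(String jarId) {
            return Arrays.asList("com.example.WordCount", "com.example.PageRank");
        }
        public String getJobGraph(String jarId, String mainClass) {
            return "jobgraph(" + jarId + ", " + mainClass + ")";
        }
    }

    public static void main(String[] args) {
        ClusterClient client = new FakeClusterClient();
        String main = client.listMainClasses("jar-1").get(0);
        System.out.println(client.getJobGraph("jar-1", main));
    }
}
```

This keeps the client classpath out of the picture entirely, matching the argument that the client and the Job Manager's classpath should never need to agree.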
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jul 31, 2019 at 12:42 PM
> >> Zili Chen <
> >>>>>>>>>>>>>>>> wander4096@gmail.com
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Here is a document[1] on client api
> >>>>>>>> enhancement
> >>>>>>>>>> from
> >>>>>>>>>>>> our
> >>>>>>>>>>>>>>>>>>> perspective.
> >>>>>>>>>>>>>>>>>>>>>>>>> We have investigated current
> >> implementations.
> >>>>>>>> And
> >>>>>>>>>> we
> >>>>>>>>>>>>>> propose
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> 1. Unify the implementation of
> >> cluster
> >>>>>>>> deployment
> >>>>>>>>>>> and
> >>>>>>>>>>>> job
> >>>>>>>>>>>>>>>>>>> submission
> >>>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>>> Flink.
> >>>>>>>>>>>>>>>>>>>>>>>>> 2. Provide programmatic interfaces
> >> to allow
> >>>>>>>>>> flexible
> >>>>>>>>>>>> job
> >>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> cluster
> >>>>>>>>>>>>>>>>>>>>>>>>> management.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> The first proposal is aimed at reducing the
> >>>>>>>>>>>>>>>>>>>>>>>>> code paths of cluster deployment and job
> >>>>>>>>>>>>>>>>>>>>>>>>> submission so that one can adopt Flink in
> >>>>>>>>>>>>>>>>>>>>>>>>> their usage easily. The second proposal is
> >>>>>>>>>>>>>>>>>>>>>>>>> aimed at providing rich interfaces for
> >>>>>>>>>>>>>>>>>>>>>>>>> advanced users who want precise control of
> >>>>>>>>>>>>>>>>>>>>>>>>> these stages.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Quick reference on open questions:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> 1. Exclude job cluster deployment
> >> from client
> >>>>>>>> side
> >>>>>>>>>>> or
> >>>>>>>>>>>>>>> redefine
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>> semantic
> >>>>>>>>>>>>>>>>>>>>>>>>> of job cluster? Since it fits in a
> >> process
> >>>>>>>> quite
> >>>>>>>>>>>> different
> >>>>>>>>>>>>>>>> from
> >>>>>>>>>>>>>>>>>>>>> session
> >>>>>>>>>>>>>>>>>>>>>>>>> cluster deployment and job
> >> submission.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> 2. Maintain the codepaths handling
> >> class
> >>>>>>>>>>>>>>>>> o.a.f.api.common.Program
> >>>>>>>>>>>>>>>>>>> or
> >>>>>>>>>>>>>>>>>>>>>>>>> implement customized program
> >> handling logic by
> >>>>>>>>>>>> customized
> >>>>>>>>>>>>>>>>>>>>> CliFrontend?
> >>>>>>>>>>>>>>>>>>>>>>>>> See also this thread[2] and the
> >> document[1].
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> 3. Expose ClusterClient as public
> >> api or just
> >>>>>>>>>> expose
> >>>>>>>>>>>> api
> >>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment
> >>>>>>>>>>>>>>>>>>>>>>>>> and delegate them to ClusterClient?
> >> Further,
> >>>>>>>> in
> >>>>>>>>>>>> either way
> >>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>>> worth
> >>>>>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>> introduce a JobClient which is an
> >>>>>>>> encapsulation of
> >>>>>>>>>>>>>>>> ClusterClient
> >>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>>> associated to specific job?
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>>> tison.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
> >>>>>>>>>>>>>>>>>>>>>>>>> [2]
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>
> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
> >>>>>>>>>>>>>>>>>>>>>>>>>
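[Editor's note: a minimal sketch of the JobClient idea in question 3, a thin
job-scoped wrapper around a ClusterClient-style interface. All names and
method signatures below are hypothetical illustrations for this discussion,
not Flink's actual API.]

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical minimal ClusterClient interface; illustrative only.
interface ClusterClient {
    CompletableFuture<String> getJobStatus(String jobId);
    CompletableFuture<Void> cancelJob(String jobId);
}

// JobClient: wraps a ClusterClient and binds it to one specific job, so
// callers manage "their" job without juggling raw job ids.
class JobClient {
    private final ClusterClient cluster;
    private final String jobId;

    JobClient(ClusterClient cluster, String jobId) {
        this.cluster = cluster;
        this.jobId = jobId;
    }

    CompletableFuture<String> status() { return cluster.getJobStatus(jobId); }
    CompletableFuture<Void> cancel()   { return cluster.cancelJob(jobId); }

    public static void main(String[] args) throws Exception {
        // In-memory stand-in for a deployed cluster, for illustration.
        ClusterClient cluster = new ClusterClient() {
            public CompletableFuture<String> getJobStatus(String id) {
                return CompletableFuture.completedFuture("RUNNING");
            }
            public CompletableFuture<Void> cancelJob(String id) {
                return CompletableFuture.completedFuture(null);
            }
        };
        JobClient job = new JobClient(cluster, "job-42");
        System.out.println("job-42 status: " + job.status().get());
    }
}
```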
> Jeff Zhang <zj...@gmail.com> wrote on Wed, Jul 24, 2019 at 9:19 AM:
>
> Thanks Stephan, I will follow up on this issue in the next few weeks and
> will refine the design doc. We can discuss more details after the 1.9
> release.
>
> Stephan Ewen <se...@apache.org> wrote on Wed, Jul 24, 2019 at 12:58 AM:
>
> Hi all!
>
> This thread has stalled for a bit, which I assume is mostly due to the
> Flink 1.9 feature freeze and release-testing effort.
>
> I personally still recognize this issue as an important one to solve. I'd
> be happy to help resume this discussion soon (after the 1.9 release) and
> see if we can take a step towards this in Flink 1.10.
>
> Best,
> Stephan
>
> On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <pompermaier@okkam.it> wrote:
>
> That's exactly what I suggested a long time ago: the Flink REST client
> should not require any Flink dependency, only an HTTP library to call
> the REST services to submit and monitor a job.
> What I also suggested in [1] was a way to automatically suggest to the
> user (via a UI) the available main classes and their required
> parameters [2].
> Another problem we have with Flink is that the REST client and the CLI
> client behave differently, and we use the CLI client (via ssh) because
> it allows calling some other method after env.execute() [3] (we have to
> call another REST service to signal the end of the job).
> In this regard, a dedicated interface, like the JobListener suggested in
> the previous emails, would be very helpful (IMHO).
>
> [1] https://issues.apache.org/jira/browse/FLINK-10864
> [2] https://issues.apache.org/jira/browse/FLINK-10862
> [3] https://issues.apache.org/jira/browse/FLINK-10879
>
> Best,
> Flavio
>
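[Editor's note: to make the JobListener idea concrete, here is a rough sketch
of what such a callback interface could look like. The interface name, its
methods, and the stand-in environment are assumptions for illustration, not
the API Flink ships.]

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical JobListener callback interface: downstream code reacts to
// the job lifecycle instead of blocking on env.execute() and then polling
// REST endpoints to learn that the job finished.
interface JobListener {
    void onJobSubmitted(String jobId);
    void onJobFinished(String jobId, boolean succeeded);
}

// Stand-in for an execution environment that notifies registered listeners.
class ListenerDemo {
    private final List<JobListener> listeners = new ArrayList<>();

    void register(JobListener l) { listeners.add(l); }

    // Stand-in for env.execute(): fires callbacks around the "job".
    void execute(String jobId) {
        listeners.forEach(l -> l.onJobSubmitted(jobId));
        // ... the job itself would run here ...
        listeners.forEach(l -> l.onJobFinished(jobId, true));
    }

    public static void main(String[] args) {
        ListenerDemo env = new ListenerDemo();
        env.register(new JobListener() {
            public void onJobSubmitted(String id) { System.out.println("submitted " + id); }
            public void onJobFinished(String id, boolean ok) { System.out.println("finished " + id + ", ok=" + ok); }
        });
        env.execute("job-1");
    }
}
```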
> On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zjffdu@gmail.com> wrote:
>
> Hi Tison,
>
> Thanks for your comments. Overall I agree with you that it is difficult
> for a downstream project to integrate with Flink and that we need to
> refactor the current Flink client API.
> I also agree that CliFrontend should only parse command-line arguments
> and then pass them to the ExecutionEnvironment. It is
> ExecutionEnvironment's responsibility to compile the job, create the
> cluster, and submit the job. Besides that, Flink currently has many
> ExecutionEnvironment implementations, and it picks a specific one based
> on the context. IMHO this is not necessary; the ExecutionEnvironment
> should be able to do the right thing based on the FlinkConf it receives.
> Having so many ExecutionEnvironment implementations is another burden
> for downstream project integration.
>
> One thing I'd like to mention is Flink's Scala shell and SQL client.
> Although they are sub-modules of Flink, they could be treated as
> downstream projects which use Flink's client API. Currently you will
> find that it is not easy for them to integrate with Flink, and they
> share a lot of duplicated code. This is another sign that we should
> refactor the Flink client API.
>
> I believe this is a large and hard change, and I am afraid we cannot
> keep compatibility, since many of the changes are user-facing.
>
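[Editor's note: Jeff's point that a single ExecutionEnvironment could "do the
right thing" from configuration alone can be sketched like this. The class
and the "execution.target" key are made-up illustrations, not Flink code.]

```java
import java.util.Map;

// Sketch: one environment whose behavior is selected purely from the
// passed configuration, instead of one environment subclass per
// deployment mode.
class ConfiguredEnv {
    private final Map<String, String> conf;

    ConfiguredEnv(Map<String, String> conf) { this.conf = conf; }

    // Decides how execution would proceed, based only on the configuration.
    String executionTarget() {
        String mode = conf.getOrDefault("execution.target", "local");
        switch (mode) {
            case "yarn-session": return "submit to an existing YARN session";
            case "yarn-per-job": return "deploy an ephemeral YARN cluster, then submit";
            default:             return "run in a local mini cluster";
        }
    }

    public static void main(String[] args) {
        ConfiguredEnv env = new ConfiguredEnv(Map.of("execution.target", "yarn-per-job"));
        System.out.println(env.executionTarget());
    }
}
```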
> Zili Chen <wander4096@gmail.com> wrote on Mon, Jun 24, 2019 at 2:53 PM:
>
> Hi all,
>
> After a closer look at our client APIs, I can see two major issues with
> consistency and integration: the deployment of a job cluster, which
> couples job graph creation and cluster deployment, and submission via
> CliFrontend, which confuses the control flow of job graph compilation
> and job submission. I'd like to follow the discussion above, mainly the
> process described by Jeff and Stephan, and share my ideas on these
> issues.
>
> 1) CliFrontend confuses the control flow of job compilation and
> submission. Following the process of job submission Stephan and Jeff
> described, the execution environment knows all configs of the cluster
> and the topology/settings of the job. Ideally, the main method of the
> user program calls #execute (or a method named #submit), and Flink
> deploys the cluster, compiles the job graph and submits it to the
> cluster. However, the current CliFrontend does all of these things
> inside its #runProgram method, which has introduced a lot of subclasses
> of the (stream) execution environment.
>
> Actually, it sets up an exec env that hijacks the #execute/#executePlan
> method, initializes the job graph and aborts execution. Control then
> flows back to CliFrontend, which deploys the cluster (or retrieves the
> client) and submits the job graph. This is quite a specific internal
> process inside Flink and is consistent with nothing else.
>
> 2) Deployment of a job cluster couples job graph creation and cluster
> deployment. Abstractly, going from a user job to a concrete submission
> requires the following dependency:
>
>    create JobGraph --\
> create ClusterClient --> submit JobGraph
>
> The ClusterClient is created by deploying or retrieving. JobGraph
> submission requires a compiled JobGraph and a valid ClusterClient, but
> the creation of the ClusterClient is, abstractly, independent of that of
> the JobGraph. However, in job cluster mode we deploy the cluster with a
> job graph, which means we use another process:
>
> create JobGraph --> deploy cluster with the JobGraph
>
> Here is another inconsistency, and downstream projects/client APIs are
> forced to handle the different cases with little support from Flink.
>
> Since we likely reached a consensus on
>
> 1. all configs being gathered by the Flink configuration and passed on,
> 2. the execution environment knowing all configs and handling execution
> (both deployment and submission),
>
> I propose eliminating the inconsistencies above with the following
> approach:
>
> 1) CliFrontend should be exactly a front end, at least for the "run"
> command. That means it just gathers all config from the command line and
> passes it to the main method of the user program. The execution
> environment knows all the info, and with an additional utility for
> ClusterClient we gracefully get a ClusterClient by deploying or
> retrieving. In this way, we don't need to hijack the
> #execute/#executePlan methods and can remove the various hacky
> subclasses of the exec env, as well as the #run methods in ClusterClient
> (for an interface-ized ClusterClient). Now the control flow goes from
> CliFrontend to the main method and never returns.
>
> 2) A job cluster is a cluster for one specific job; from another
> perspective, it is an ephemeral session. We could decouple the
> deployment from the compiled job graph by starting a session with an
> idle timeout and then submitting the job to it.
>
> These topics are better to be aware of and agreed on before we go into
> more detail on design or implementation.
>
> Best,
> tison.
>
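[Editor's note: proposal 1 above can be sketched as a control flow in which
the front end only parses arguments into a configuration and hands everything
to the user program, whose environment then deploys and submits. All names
below are hypothetical stand-ins, not Flink classes.]

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of proposal 1: the front end parses args and invokes the user's
// main method; deployment and submission happen inside the environment,
// and control never flows back to the front end.
class FrontEndSketch {

    // The front end's whole job: turn "key=value" args into a configuration.
    static Map<String, String> parseArgs(String[] args) {
        Map<String, String> conf = new HashMap<>();
        for (String a : args) {
            String[] kv = a.split("=", 2);
            if (kv.length == 2) conf.put(kv[0], kv[1]);
        }
        return conf;
    }

    // Stand-in for the execution environment: it alone deploys and submits.
    static String execute(Map<String, String> conf, String jobName) {
        String target = conf.getOrDefault("target", "local");
        return "deployed " + target + " cluster and submitted " + jobName;
    }

    // Stand-in for the user program's main method.
    static String userMain(Map<String, String> conf) {
        return execute(conf, "wordcount");
    }

    public static void main(String[] args) {
        // The front end parses, then hands everything to the user program.
        System.out.println(userMain(parseArgs(new String[]{"target=yarn"})));
    }
}
```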
> Zili Chen <wander4096@gmail.com> wrote on Thu, Jun 20, 2019 at 3:21 AM:
>
> Hi Jeff,
>
> Thanks for raising this thread and the design document!
>
> As @Thomas Weise mentioned above, extending config to Flink requires far
> more effort than it should. Another example: we achieve detach mode by
> introducing another execution environment which also hijacks the
> #execute method.
>
> I agree with your idea that the user should configure everything and
> Flink should "just" respect it. On this topic, I think the unusual
> control flow when CliFrontend handles the "run" command is the problem.
> It handles several configs, mainly about cluster settings, and thus the
> main method of the user program is unaware of them. Also, it compiles
> the app to a job graph by running the main method with a hijacked exec
> env, which constrains the main method further.
>
> I'd like to write down a few notes on passing and respecting
> configs/args, as well as on decoupling job compilation and submission,
> and will share them on this thread later.
>
> Best,
> tison.
>
> SHI Xiaogang <shixiaogangg@gmail.com> wrote on Mon, Jun 17, 2019 at 7:29 PM:
>
> Hi Jeff and Flavio,
>
> Thanks a lot, Jeff, for proposing the design document.
>
> We are also working on refactoring ClusterClient to allow flexible and
> efficient job management in our real-time platform. We would like to
> draft a document to share our ideas with you.
>
> I think it's a good idea to have something like Apache Livy for Flink,
> and the efforts discussed here will be a great step forward towards it.
>
> Regards,
> Xiaogang
>
> Flavio Pompermaier <pompermaier@okkam.it> wrote on Mon, Jun 17, 2019 at 7:13 PM:
>
> Is there any possibility to have something like Apache Livy [1] also for
> Flink in the future?
>
> [1] https://livy.apache.org/
>
> On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zjffdu@gmail.com> wrote:
>
> > > Any API we expose should not have dependencies on the runtime
> > > (flink-runtime) package or other implementation details. To me, this
> > > means that the current ClusterClient cannot be exposed to users
> > > because it uses quite some classes from the optimiser and runtime
> > > packages.
>
> We should change ClusterClient from a class to an interface.
> ExecutionEnvironment would only use the ClusterClient interface, which
> should live in flink-clients, while the concrete implementation class
> could be in flink-runtime.
>
> > > What happens when a failure/restart in the client happens? There
> > > needs to be a way of re-establishing the connection to the job,
> > > setting up the listeners again, etc.
>
> Good point. First we need to define what failure/restart in the client
> means. IIUC, that usually means a network failure, which would surface
> in the class RestClient. If my understanding is correct, the
> restart/retry mechanism should be implemented in RestClient.
>
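[Editor's note: a minimal sketch of the kind of bounded retry loop that such
a restart/retry mechanism inside RestClient could use. This is an
illustration, not Flink's actual RestClient code.]

```java
import java.util.concurrent.Callable;

// Sketch: retry a request a bounded number of times on (e.g. transient
// network) failure, rethrowing the last failure if all attempts fail.
class RetryingCall {

    static <T> T withRetry(Callable<T> call, int maxAttempts) throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e; // e.g. a transient connection reset; try again
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] failuresLeft = {2}; // simulate a request that fails twice, then succeeds
        String status = withRetry(() -> {
            if (failuresLeft[0]-- > 0) throw new java.io.IOException("connection reset");
            return "RUNNING";
        }, 5);
        System.out.println(status);
    }
}
```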
> Aljoscha Krettek <aljoscha@apache.org> wrote on Tue, Jun 11, 2019 at 11:10 PM:
>
> Some points to consider:
>
> * Any API we expose should not have dependencies on the runtime
> (flink-runtime) package or other implementation details. To me, this
> means that the current ClusterClient cannot be exposed to users because
> it uses quite some classes from the optimiser and runtime packages.
>
> * What happens when a failure/restart in the client happens? There needs
> to be a way of re-establishing the connection to the job, setting up the
> listeners again, etc.
>
> Aljoscha
>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 29. May 2019, at
> >> 10:17, Jeff
> >>>>>>>> Zhang <
> >>>>>>>>>>>>>>>>>>>>>>>>> zjffdu@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sorry folks, the design
> >> doc is
> >>>>>>>> late as
> >>>>>>>>>>> you
> >>>>>>>>>>>>>>>>>>>>>>> expected.
> >>>>>>>>>>>>>>>>>>>>>>>>>> Here's
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> design
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> doc
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I drafted, welcome any
> >> comments and
> >>>>>>>>>>>> feedback.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>
> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Stephan Ewen <sewen@apache.org> wrote on Thu, Feb 14, 2019:

Nice that this discussion is happening.

In the FLIP, we could also revisit the entire role of the environments again.

Initially, the idea was:
- the environments take care of the specific setup for standalone (no setup needed), yarn, mesos, etc.
- the session ones have control over the session. The environment holds the session client.
- running a job gives a "control" object for that job. That behavior is the same in all environments.

The actual implementation diverged quite a bit from that. Happy to see a discussion about straightening this out a bit more.
On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zjffdu@gmail.com> wrote:

Hi folks,

Sorry for the late response. It seems we have reached consensus on this, so I will create a FLIP with a more detailed design.
Thomas Weise <thw@apache.org> wrote on Fri, Dec 21, 2018:

Great to see this discussion seeded! The problems you face with the Zeppelin integration are also affecting other downstream projects, like Beam.

We just enabled the savepoint restore option in RemoteStreamEnvironment [1] and that was more difficult than it should be. The main issue is that environment and cluster client aren't decoupled. Ideally it should be possible to just get the matching cluster client from the environment and then control the job through it (environment as factory for cluster client). But note that the environment classes are part of the public API, and it is not straightforward to make larger changes without breaking backward compatibility.

ClusterClient currently exposes internal classes like JobGraph and StreamGraph. But it should be possible to wrap this with a new public API that brings the required job control capabilities for downstream projects. Perhaps it is helpful to look at some of the interfaces in Beam while thinking about this: [2] for the portable job API and [3] for the old asynchronous job control from the Beam Java SDK.

The backward compatibility discussion [4] is also relevant here. A new API should shield downstream projects from internals and allow them to interoperate with multiple future Flink versions in the same release line without forced upgrades.

Thanks,
Thomas

[1] https://github.com/apache/flink/pull/7249
[2] https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
[3] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
[4] https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
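Thomas's "environment as factory for cluster client" idea can be sketched with a few toy types. Everything below (FlinkEnv, SimpleClusterClient, and their method names) is invented for illustration; none of it is Flink's actual API.

```java
import java.util.HashMap;
import java.util.Map;

// The job-control surface a downstream project would program against.
interface ClusterClient {
    String submit(String jobName); // returns a job id
    String status(String jobId);
    void cancel(String jobId);
}

// A toy in-memory client standing in for a real YARN/standalone client.
class SimpleClusterClient implements ClusterClient {
    private final Map<String, String> jobs = new HashMap<>();
    private int counter = 0;

    public String submit(String jobName) {
        String id = "job-" + (++counter);
        jobs.put(id, "RUNNING");
        return id;
    }
    public String status(String jobId) { return jobs.getOrDefault(jobId, "UNKNOWN"); }
    public void cancel(String jobId) { jobs.put(jobId, "CANCELED"); }
}

// The environment acts as a factory for the matching cluster client,
// instead of hiding it behind a blocking execute() call.
class FlinkEnv {
    private final ClusterClient client = new SimpleClusterClient();
    public ClusterClient getClusterClient() { return client; }
}

public class EnvAsFactory {
    public static void main(String[] args) {
        FlinkEnv env = new FlinkEnv();
        ClusterClient client = env.getClusterClient();
        String jobId = client.submit("wordcount");
        System.out.println(jobId + " " + client.status(jobId));
        client.cancel(jobId);
        System.out.println(jobId + " " + client.status(jobId));
    }
}
```

The point of the shape: the caller never constructs the client for a specific cluster manager; it asks the environment for the one that matches the deployment.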
On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zjffdu@gmail.com> wrote:

> I'm not so sure whether the user should be able to define where the job runs (in your example Yarn). This is actually independent of the job development and is something which is decided at deployment time.

Users don't need to specify the execution mode programmatically. They can also pass the execution mode via arguments to the flink run command, e.g.:

bin/flink run -m yarn-cluster ...
bin/flink run -m local ...
bin/flink run -m host:port ...

Does this make sense to you?
> To me it makes sense that the ExecutionEnvironment is not directly initialized by the user and is instead sensitive to the context in which you want to execute your job (Flink CLI vs. IDE, for example).

Right, currently I notice that Flink creates a different ContextExecutionEnvironment depending on the submission scenario (Flink CLI vs. IDE). To me this is a rather hacky approach, not very straightforward. What I suggested above is that Flink should always create the same ExecutionEnvironment, but with different configuration; based on that configuration it would create the proper ClusterClient for the different behaviors.
Till Rohrmann <trohrmann@apache.org> wrote on Thu, Dec 20, 2018:

You are probably right that we have code duplication when it comes to the creation of the ClusterClient. This should be reduced in the future.

I'm not so sure whether the user should be able to define where the job runs (in your example Yarn). This is actually independent of the job development and is something which is decided at deployment time. To me it makes sense that the ExecutionEnvironment is not directly initialized by the user and is instead sensitive to the context in which you want to execute your job (Flink CLI vs. IDE, for example). However, I agree that the ExecutionEnvironment should give you access to the ClusterClient and to the job (maybe in the form of the JobGraph or a job plan).

Cheers,
Till

On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zjffdu@gmail.com> wrote:
Hi Till,

Thanks for the feedback. You are right that I expect a better programmatic job submission/control API that could be used by downstream projects, which would benefit the Flink ecosystem. When I look at the code of the Flink scala-shell and sql-client (I believe they are not the core of Flink, but belong to its ecosystem), I find much duplicated code for creating a ClusterClient from user-provided configuration (the configuration format may differ between scala-shell and sql-client) and then using that ClusterClient to manipulate jobs. I don't think this is convenient for downstream projects. What I expect is that a downstream project only needs to provide the necessary configuration info (maybe introducing a class FlinkConf), then build an ExecutionEnvironment based on this FlinkConf, and the ExecutionEnvironment will create the proper ClusterClient. This would not only benefit downstream project development but also help their integration tests with Flink. Here's a sample code snippet of what I expect:

val conf = new FlinkConf().mode("yarn")
val env = new ExecutionEnvironment(conf)
val jobId = env.submit(...)
val jobStatus = env.getClusterClient().queryJobStatus(jobId)
env.getClusterClient().cancelJob(jobId)

What do you think?
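Jeff's FlinkConf proposal can be fleshed out into a runnable toy model. FlinkConf, ExecutionEnv, and the way mode() selects the client are all hypothetical; only the overall shape mirrors the proposal.

```java
import java.util.UUID;

// Hypothetical configuration holder; mode() is a fluent setter as in the proposal.
class FlinkConf {
    private String mode = "local";
    FlinkConf mode(String m) { this.mode = m; return this; }
    String getMode() { return mode; }
}

// Toy stand-in for Flink's ClusterClient, parameterized by the execution mode.
class ClusterClient {
    private final String mode;
    ClusterClient(String mode) { this.mode = mode; }
    String queryJobStatus(String jobId) { return "RUNNING on " + mode; }
    void cancelJob(String jobId) { /* would talk to the cluster here */ }
}

// The environment, not the caller, decides which client matches the mode.
class ExecutionEnv {
    private final ClusterClient client;
    ExecutionEnv(FlinkConf conf) { this.client = new ClusterClient(conf.getMode()); }
    String submit(String program) { return UUID.randomUUID().toString(); }
    ClusterClient getClusterClient() { return client; }
}

public class FlinkConfSketch {
    public static void main(String[] args) {
        FlinkConf conf = new FlinkConf().mode("yarn");
        ExecutionEnv env = new ExecutionEnv(conf);
        String jobId = env.submit("wordcount");
        System.out.println(env.getClusterClient().queryJobStatus(jobId));
    }
}
```

Downstream projects such as Zeppelin would then only supply a FlinkConf and never duplicate client-construction logic.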
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Till Rohrmann <
> >>>>>>>>>>> trohrmann@apache.org>
> >>>>>>>>>>>>>>>>>>>>>>>>>> 于2018年12月11日周二
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 下午6:28写道:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jeff,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> what you are
> >> proposing is to
> >>>>>>>>>>>> provide the
> >>>>>>>>>>>>>>>>>>>>>>> user
> >>>>>>>>>>>>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> better
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> programmatic
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> control. There
> >> was actually
> >>>>>>>> an
> >>>>>>>>>>>> effort to
> >>>>>>>>>>>>>>>>>>>>>>>> achieve
> >>>>>>>>>>>>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> has
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> never
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> been
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> completed [1].
> >> However,
> >>>>>>>> there
> >>>>>>>>>> are
> >>>>>>>>>>>> some
> >>>>>>>>>>>>>>>>>>>>>>>>> improvement
> >>>>>>>>>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> code
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> base
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> now.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Look for example
> >> at the
> >>>>>>>>>>>> NewClusterClient
> >>>>>>>>>>>>>>>>>>>>>>>>> interface
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> offers a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> non-blocking job
> >> submission.
> >>>>>>>>>> But I
> >>>>>>>>>>>> agree
> >>>>>>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>>>>>>>>>> need
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> improve
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Flink
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this regard.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would not be in
> >> favour if
> >>>>>>>>>>>> exposing all
> >>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> calls
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> via
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >> ExecutionEnvironment
> >>>>>>>> because it
> >>>>>>>>>>>> would
> >>>>>>>>>>>>>>>>>>>>>>> clutter
> >>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> class
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> would
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> good separation
> >> of concerns.
> >>>>>>>>>>>> Instead one
> >>>>>>>>>>>>>>>>>>>>>>> idea
> >>>>>>>>>>>>>>>>>>>>>>>>>> could
> >>>>>>>>>>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> retrieve
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> current
> >> ClusterClient from
> >>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> then
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> used
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for cluster and
> >> job
> >>>>>>>> control. But
> >>>>>>>>>>>> before
> >>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>>>>>>> start
> >>>>>>>>>>>>>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> effort
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> here,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> agree and capture
> >> what
> >>>>>>>>>>>> functionality we
> >>>>>>>>>>>>>>> want
> >>>>>>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> provide.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Initially, the
> >> idea was
> >>>>>>>> that we
> >>>>>>>>>>>> have the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterDescriptor
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> describing
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> how
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to talk to
> >> cluster manager like Yarn or Mesos. The ClusterDescriptor can be used
> >> for deploying Flink clusters (job and session) and gives you a
> >> ClusterClient. The ClusterClient controls the cluster (e.g. submitting
> >> jobs, listing all running jobs). And then there was the idea to
> >> introduce a JobClient which you obtain from the ClusterClient to
> >> trigger job specific operations (e.g. taking a savepoint, cancelling
> >> the job).
> >>
> >> [1] https://issues.apache.org/jira/browse/FLINK-4272
> >>
> >> Cheers,
> >> Till
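The layering Till describes — a ClusterDescriptor deploys a cluster and yields a ClusterClient, which submits jobs and hands back a job-scoped JobClient — can be sketched as below. This is an illustrative sketch with an in-memory stand-in for a deployed cluster; the names and signatures are assumptions, not Flink's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

public class LayeringSketch {
    // Job-scoped control, obtained from the ClusterClient.
    interface JobClient {
        String jobId();
        CompletableFuture<String> triggerSavepoint(String targetDirectory);
        CompletableFuture<Void> cancel();
    }

    // Cluster-scoped control: submit jobs, list running jobs.
    interface ClusterClient {
        JobClient submitJob(String jobGraph);
        List<String> listJobs();
    }

    // An in-memory stand-in for a deployed cluster, for illustration only.
    static ClusterClient inMemoryCluster() {
        List<String> jobs = new ArrayList<>();
        return new ClusterClient() {
            public JobClient submitJob(String jobGraph) {
                String id = UUID.randomUUID().toString();
                jobs.add(id);
                return new JobClient() {
                    public String jobId() { return id; }
                    public CompletableFuture<String> triggerSavepoint(String dir) {
                        return CompletableFuture.completedFuture(dir + "/savepoint-" + id);
                    }
                    public CompletableFuture<Void> cancel() {
                        jobs.remove(id);
                        return CompletableFuture.completedFuture(null);
                    }
                };
            }
            public List<String> listJobs() { return jobs; }
        };
    }

    public static void main(String[] args) {
        ClusterClient cluster = inMemoryCluster();
        JobClient job = cluster.submitJob("example-job-graph");
        System.out.println("running jobs: " + cluster.listJobs().size());  // 1
        job.cancel().join();
        System.out.println("running jobs: " + cluster.listJobs().size());  // 0
    }
}
```

The point of the split is that cluster-level concerns (deployment, listing) and job-level concerns (savepoint, cancel) live on different objects.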
> >>
> >> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> >>> Hi Folks,
> >>>
> >>> I am trying to integrate flink into apache zeppelin, which is an
> >>> interactive notebook, and I hit several issues that are caused by the
> >>> flink client api. So I'd like to propose the following changes for the
> >>> flink client api.
> >>>
> >>> 1. Support nonblocking execution. Currently,
> >>> ExecutionEnvironment#execute is a blocking method which does 2 things:
> >>> first submit the job, then wait for the job until it is finished. I'd
> >>> like to introduce a nonblocking execution method like
> >>> ExecutionEnvironment#submit which only submits the job and then
> >>> returns the jobId to the client, and allows the user to query the job
> >>> status via the jobId.
> >>>
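The split proposed in point 1 — a submit that returns immediately, with the blocking behaviour recoverable by waiting on the result — could look roughly like this sketch. JobHandle, submit(), and the integer result are hypothetical stand-ins, not Flink API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class NonBlockingSketch {
    enum JobStatus { RUNNING, FINISHED }

    // A hypothetical handle returned immediately on submission.
    static class JobHandle {
        final String jobId;
        final CompletableFuture<Integer> result; // stands in for a JobExecutionResult
        JobHandle(String jobId, CompletableFuture<Integer> result) {
            this.jobId = jobId;
            this.result = result;
        }
        JobStatus status() { return result.isDone() ? JobStatus.FINISHED : JobStatus.RUNNING; }
    }

    // submit() returns right away with a handle instead of blocking.
    static JobHandle submit(String jobName) {
        CompletableFuture<Integer> result = CompletableFuture.supplyAsync(() -> {
            try { TimeUnit.MILLISECONDS.sleep(100); } catch (InterruptedException ignored) {}
            return 42; // pretend job result
        });
        return new JobHandle("job-" + jobName, result);
    }

    public static void main(String[] args) {
        JobHandle handle = submit("wordcount");
        System.out.println("submitted " + handle.jobId); // prints immediately
        int result = handle.result.join();               // execute() == submit() + wait
        System.out.println("result: " + result);
    }
}
```

With this shape, the existing blocking execute() simply becomes submit() followed by a join on the result.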
> >>> 2. Add a cancel api in
> >>> ExecutionEnvironment/StreamExecutionEnvironment. Currently the only
> >>> way to cancel a job is via the cli (bin/flink); this is not convenient
> >>> for downstream projects to use. So I'd like to add a cancel api in
> >>> ExecutionEnvironment.
> >>>
> >>> 3. Add a savepoint api in
> >>> ExecutionEnvironment/StreamExecutionEnvironment. It is similar to the
> >>> cancel api; we should use ExecutionEnvironment as the unified api for
> >>> third parties to integrate with flink.
> >>>
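A minimal sketch of what points 2 and 3 might look like if cancel and savepoint were exposed on the environment instead of only on bin/flink. Every name here is hypothetical, and String ids stand in for Flink's real JobID and savepoint types.

```java
import java.util.HashMap;
import java.util.Map;

public class EnvControlSketch {
    static class Environment {
        private final Map<String, Boolean> running = new HashMap<>(); // jobId -> alive

        String submit(String jobName) {
            String jobId = "job-" + jobName;
            running.put(jobId, true);
            return jobId;
        }
        // Point 2: cancel through the environment, not the CLI.
        void cancel(String jobId) { running.put(jobId, false); }
        // Point 3: savepoint through the environment; returns the savepoint path.
        String triggerSavepoint(String jobId, String targetDir) {
            if (!running.getOrDefault(jobId, false)) {
                throw new IllegalStateException(jobId + " is not running");
            }
            return targetDir + "/savepoint-" + jobId;
        }
        boolean isRunning(String jobId) { return running.getOrDefault(jobId, false); }
    }

    public static void main(String[] args) {
        Environment env = new Environment();
        String id = env.submit("etl");
        System.out.println(env.triggerSavepoint(id, "s3://bucket"));
        env.cancel(id);
        System.out.println("running: " + env.isRunning(id));
    }
}
```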
> >>> 4. Add a listener for the job execution lifecycle, something like the
> >>> following, so that downstream projects can run custom logic in the
> >>> lifecycle of a job. E.g. Zeppelin would capture the jobId after the
> >>> job is submitted and then use this jobId to cancel it later when
> >>> necessary.
> >>>
> >>> public interface JobListener {
> >>>
> >>>   void onJobSubmitted(JobID jobId);
> >>>
> >>>   void onJobExecuted(JobExecutionResult jobResult);
> >>>
> >>>   void onJobCanceled(JobID jobId);
> >>> }
> >>>
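A sketch of how a downstream project such as Zeppelin might use a listener like the one above: capture the job id on submission so the job can be cancelled later. The registerJobListener method and the String-based types are illustrative stand-ins for whatever the real API would use.

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerSketch {
    interface JobListener {
        void onJobSubmitted(String jobId);
        void onJobExecuted(String jobResult);
        void onJobCanceled(String jobId);
    }

    static class Environment {
        private final List<JobListener> listeners = new ArrayList<>();
        void registerJobListener(JobListener l) { listeners.add(l); }

        String execute(String jobName) {
            String jobId = "job-" + jobName;
            listeners.forEach(l -> l.onJobSubmitted(jobId));            // fire on submit
            listeners.forEach(l -> l.onJobExecuted("result-of-" + jobId)); // fire on completion
            return jobId;
        }
    }

    public static void main(String[] args) {
        Environment env = new Environment();
        List<String> captured = new ArrayList<>();
        // A notebook front-end captures the jobId so it can cancel later.
        env.registerJobListener(new JobListener() {
            public void onJobSubmitted(String jobId) { captured.add(jobId); }
            public void onJobExecuted(String jobResult) {}
            public void onJobCanceled(String jobId) {}
        });
        env.execute("notebook-paragraph-1");
        System.out.println("captured: " + captured);
    }
}
```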
> >>> 5. Enable session in ExecutionEnvironment. Currently it is disabled,
> >>> but session is very convenient for third parties submitting jobs
> >>> continually. I hope flink can enable it again.
> >>>
> >>> 6. Unify all flink client api into
> >>> ExecutionEnvironment/StreamExecutionEnvironment.
> >>>
> >>> This is a long term issue which needs more careful thinking and
> >>> design. Currently some features of flink are exposed in
> >>> ExecutionEnvironment/StreamExecutionEnvironment, but some are exposed
> >>> in the cli instead of the api, like the cancel and savepoint I
> >>> mentioned above. I think the root cause is that flink didn't unify the
> >>> interaction with flink. Here I list 3 scenarios of flink operation:
> >>>
> >>> - Local job execution. Flink will create a LocalEnvironment and then
> >>>   use this LocalEnvironment to create a LocalExecutor for job
> >>>   execution.
> >>> - Remote job execution. Flink will create a ClusterClient first, then
> >>>   create a ContextEnvironment based on the ClusterClient, and then run
> >>>   the job.
> >>> - Job cancelation. Flink will create a ClusterClient first and then
> >>>   cancel the job via this ClusterClient.
> >>>
> >>> As you can see in the above 3 scenarios, Flink didn't use the same
> >>> approach (code path) to interact with flink. What I propose is the
> >>> following: create the proper LocalEnvironment/RemoteEnvironment (based
> >>> on user configuration) --> use this Environment to create the proper
> >>> ClusterClient (LocalClusterClient or RestClusterClient) to interact
> >>> with Flink (job execution or cancelation).
> >>>
> >>> This way we can unify the process of local execution and remote
> >>> execution. And it is much easier for third parties to integrate with
> >>> flink, because ExecutionEnvironment is the unified entry point for
> >>> flink. What a third party needs to do is just pass configuration to
> >>> ExecutionEnvironment, and ExecutionEnvironment will do the right thing
> >>> based on the configuration. The flink cli can also be considered a
> >>> flink api consumer: it just passes the configuration to
> >>> ExecutionEnvironment and lets ExecutionEnvironment create the proper
> >>> ClusterClient, instead of letting the cli create the ClusterClient
> >>> directly.
> >>>
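The unified code path proposed in point 6 — the environment, not the CLI, reads the configuration and builds the matching client — can be sketched as follows. The class names, the config keys, and the String-typed job graph are all illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

public class UnifiedEntrySketch {
    interface ClusterClient { String submit(String jobGraph); }

    static class LocalClusterClient implements ClusterClient {
        public String submit(String g) { return "local:" + g; }
    }

    static class RestClusterClient implements ClusterClient {
        final String address;
        RestClusterClient(String address) { this.address = address; }
        public String submit(String g) { return "remote@" + address + ":" + g; }
    }

    // The environment decides which client to create, based only on config.
    // The CLI and any third party would go through this same method.
    static ClusterClient clientFor(Map<String, String> config) {
        String target = config.getOrDefault("execution.target", "local");
        if (target.equals("local")) {
            return new LocalClusterClient();
        }
        return new RestClusterClient(config.get("rest.address"));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(clientFor(conf).submit("g1")); // local path
        conf.put("execution.target", "remote");
        conf.put("rest.address", "jm-host:8081");
        System.out.println(clientFor(conf).submit("g1")); // remote path
    }
}
```

Both code paths converge on one factory, which is exactly the unification the proposal asks for.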
> >>> 6 would involve large code refactoring, so I think we can defer it to
> >>> a future release; 1, 2, 3, 4, 5 could be done at once, I believe. Let
> >>> me know your comments and feedback, thanks.
> >>>
> >>> --
> >>> Best Regards
> >>>
> >>> Jeff Zhang

-- 
Best Regards

Jeff Zhang

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Aljoscha Krettek <al...@apache.org>.
Hi,

We could try and use the ASF Slack for this purpose; that would probably be easiest. See https://s.apache.org/slack-invite. We could create a dedicated channel for our work and would still use the open ASF infrastructure, and people can have a look if they are interested because the discussion would be public. What do you think?

P.S. Committers/PMCs should be able to log in with their apache ID.

Best,
Aljoscha

> On 6. Sep 2019, at 14:24, Zili Chen <wa...@gmail.com> wrote:
> 
> Hi Aljoscha,
> 
> I'd like to gather all the ideas here and among the documents, and draft a
> formal FLIP that keeps us on the same page. Hopefully I'll start a FLIP
> thread next week.
> 
> For the implementation, or rather the POC part, I'd like to work with you
> guys who proposed the Executor concept to make sure that we go in the same
> direction. I'm wondering whether a dedicated thread or a Slack group is the
> proper venue. In my opinion we can involve the team in a Slack group, start
> our branch concurrently with the FLIP process, and once we reach a
> consensus on the FLIP, open an umbrella issue about the framework and start
> subtasks. What do you think?
> 
> Best,
> tison.
> 
> 
> Aljoscha Krettek <al...@apache.org> wrote on Thu, Sep 5, 2019 at 9:39 PM:
> 
>> Hi Tison,
>> 
>> To keep this moving forward, maybe you want to start working on a proof of
>> concept implementation for the new JobClient interface, maybe with a new
>> method executeAsync() in the environment that returns the JobClient and
>> implement the methods to see how that works and to see where we get. Would
>> you be interested in that?
>> 
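The executeAsync() proof of concept suggested here could take roughly this shape: execute() stays blocking and is defined on top of executeAsync(), which returns a JobClient immediately. This is a sketch of the idea only, not Flink's final API; the method names mirror the proposal, while the String-typed job and result are stand-ins.

```java
import java.util.concurrent.CompletableFuture;

public class ExecuteAsyncSketch {
    interface JobClient {
        String getJobID();
        CompletableFuture<String> getJobExecutionResult();
    }

    // Returns a JobClient right away; the result completes in the background.
    static JobClient executeAsync(String jobName) {
        CompletableFuture<String> result =
                CompletableFuture.supplyAsync(() -> "result-of-" + jobName);
        return new JobClient() {
            public String getJobID() { return "id-" + jobName; }
            public CompletableFuture<String> getJobExecutionResult() { return result; }
        };
    }

    // The blocking execute() can then be defined on top of executeAsync().
    static String execute(String jobName) {
        return executeAsync(jobName).getJobExecutionResult().join();
    }

    public static void main(String[] args) {
        JobClient client = executeAsync("windowed-join");
        System.out.println(client.getJobID());
        System.out.println(client.getJobExecutionResult().join());
    }
}
```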
>> Also, at some point we should collect all the ideas and start forming an
>> actual FLIP.
>> 
>> Best,
>> Aljoscha
>> 
>>> On 4. Sep 2019, at 12:04, Zili Chen <wa...@gmail.com> wrote:
>>> 
>>> Thanks for your update Kostas!
>>> 
>>> It looks good to me that clean up existing code paths as first
>>> pass. I'd like to help on review and file subtasks if I find ones.
>>> 
>>> Best,
>>> tison.
>>> 
>>> 
>>> Kostas Kloudas <kk...@gmail.com> wrote on Wed, Sep 4, 2019 at 5:52 PM:
>>> Here is the issue, and I will keep on updating it as I find more issues.
>>> 
>>> https://issues.apache.org/jira/browse/FLINK-13954
>>> 
>>> This will also cover the refactoring of the Executors that we discussed
>>> in this thread, without any additional functionality (such as the job
>> client).
>>> 
>>> Kostas
>>> 
>>> On Wed, Sep 4, 2019 at 11:46 AM Kostas Kloudas <kk...@gmail.com>
>> wrote:
>>>> 
>>>> Great idea Tison!
>>>> 
>>>> I will create the umbrella issue and post it here so that we are all
>>>> on the same page!
>>>> 
>>>> Cheers,
>>>> Kostas
>>>> 
>>>> On Wed, Sep 4, 2019 at 11:36 AM Zili Chen <wa...@gmail.com>
>> wrote:
>>>>> 
>>>>> Hi Kostas & Aljoscha,
>>>>> 
>>>>> I notice that there is a JIRA(FLINK-13946) which could be included
>>>>> in this refactor thread. Since we agree on most of directions in
>>>>> big picture, is it reasonable that we create an umbrella issue for
>>>>> refactor client APIs and also linked subtasks? It would be a better
>>>>> way that we join forces of our community.
>>>>> 
>>>>> Best,
>>>>> tison.
>>>>> 
>>>>> 
>>>>> Zili Chen <wa...@gmail.com> wrote on Sat, Aug 31, 2019 at 12:52 PM:
>>>>>> 
>>>>>> Great Kostas! Looking forward to your POC!
>>>>>> 
>>>>>> Best,
>>>>>> tison.
>>>>>> 
>>>>>> 
>>>>>> Jeff Zhang <zj...@gmail.com> wrote on Fri, Aug 30, 2019 at 11:07 PM:
>>>>>>> 
>>>>>>> Awesome, @Kostas Looking forward your POC.
>>>>>>> 
>>>>>>> Kostas Kloudas <kk...@gmail.com> wrote on Fri, Aug 30, 2019 at 8:33 PM:
>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> I am just writing here to let you know that I am working on a
>> POC that
>>>>>>>> tries to refactor the current state of job submission in Flink.
>>>>>>>> I want to stress out that it introduces NO CHANGES to the current
>>>>>>>> behaviour of Flink. It just re-arranges things and introduces the
>>>>>>>> notion of an Executor, which is the entity responsible for
>> taking the
>>>>>>>> user-code and submitting it for execution.
>>>>>>>> 
>>>>>>>> Given this, the discussion about the functionality that the
>> JobClient
>>>>>>>> will expose to the user can go on independently and the same
>>>>>>>> holds for all the open questions so far.
>>>>>>>> 
>>>>>>>> I hope I will have some more news to share soon.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Kostas
>>>>>>>> 
>>>>>>>> On Mon, Aug 26, 2019 at 4:20 AM Yang Wang <da...@gmail.com>
>> wrote:
>>>>>>>>> 
>>>>>>>>> Hi Zili,
>>>>>>>>> 
>>>>>>>>> It make sense to me that a dedicated cluster is started for a
>> per-job
>>>>>>>>> cluster and will not accept more jobs.
>>>>>>>>> Just have a question about the command line.
>>>>>>>>> 
>>>>>>>>> Currently we could use the following commands to start
>> different
>>>>>>>> clusters.
>>>>>>>>> *per-job cluster*
>>>>>>>>> ./bin/flink run -d -p 5 -ynm perjob-cluster1 -m yarn-cluster
>>>>>>>>> examples/streaming/WindowJoin.jar
>>>>>>>>> *session cluster*
>>>>>>>>> ./bin/flink run -p 5 -ynm session-cluster1 -m yarn-cluster
>>>>>>>>> examples/streaming/WindowJoin.jar
>>>>>>>>> 
>>>>>>>>> What will it look like after client enhancement?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Yang
>>>>>>>>> 
>>>>>>>>>> Zili Chen <wa...@gmail.com> wrote on Fri, Aug 23, 2019 at 10:46 PM:
>>>>>>>>> 
>>>>>>>>>> Hi Till,
>>>>>>>>>> 
>>>>>>>>>> Thanks for your update. Nice to hear :-)
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> tison.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Till Rohrmann <tr...@apache.org> wrote on Fri, Aug 23, 2019 at 10:39 PM:
>>>>>>>>>> 
>>>>>>>>>>> Hi Tison,
>>>>>>>>>>> 
>>>>>>>>>>> just a quick comment concerning the class loading issues
>> when using
>>>>>>>> the
>>>>>>>>>> per
>>>>>>>>>>> job mode. The community wants to change it so that the
>>>>>>>>>>> StandaloneJobClusterEntryPoint actually uses the user code
>> class
>>>>>>>> loader
>>>>>>>>>>> with child first class loading [1]. Hence, I hope that
>> this problem
>>>>>>>> will
>>>>>>>>>> be
>>>>>>>>>>> resolved soon.
>>>>>>>>>>> 
>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-13840
>>>>>>>>>>> 
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Till
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas <
>> kkloudas@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>> 
>>>>>>>>>>>> On the topic of web submission, I agree with Till that
>> it only
>>>>>>>> seems
>>>>>>>>>>>> to complicate things.
>>>>>>>>>>>> It is bad for security, job isolation (anybody can
>> submit/cancel
>>>>>>>> jobs),
>>>>>>>>>>>> and its
>>>>>>>>>>>> implementation complicates some parts of the code. So,
>> if it were
>>>>>>>> to
>>>>>>>>>>>> redesign the
>>>>>>>>>>>> WebUI, maybe this part could be left out. In addition, I
>> would say
>>>>>>>>>>>> that the ability to cancel
>>>>>>>>>>>> jobs could also be left out.
>>>>>>>>>>>> 
>>>>>>>>>>>> Also I would also be in favour of removing the
>> "detached" mode, for
>>>>>>>>>>>> the reasons mentioned
>>>>>>>>>>>> above (i.e. because now we will have a future
>> representing the
>>>>>>>> result
>>>>>>>>>>>> on which the user
>>>>>>>>>>>> can choose to wait or not).
>>>>>>>>>>>> 
>>>>>>>>>>>> Now for the separating job submission and cluster
>> creation, I am in
>>>>>>>>>>>> favour of keeping both.
>>>>>>>>>>>> Once again, the reasons are mentioned above by Stephan,
>> Till,
>>>>>>>> Aljoscha
>>>>>>>>>>>> and also Zili seems
>>>>>>>>>>>> to agree. They mainly have to do with security,
>> isolation and ease
>>>>>>>> of
>>>>>>>>>>>> resource management
>>>>>>>>>>>> for the user as he knows that "when my job is done,
>> everything
>>>>>>>> will be
>>>>>>>>>>>> cleared up". This is
>>>>>>>>>>>> also the experience you get when launching a process on
>> your local
>>>>>>>> OS.
>>>>>>>>>>>> 
>>>>>>>>>>>> On excluding the per-job mode from returning a JobClient
>> or not, I
>>>>>>>>>>>> believe that eventually
>>>>>>>>>>>> it would be nice to allow users to get back a jobClient.
>> The
>>>>>>>> reason is
>>>>>>>>>>>> that 1) I cannot
>>>>>>>>>>>> find any objective reason why the user-experience should
>> diverge,
>>>>>>>> and
>>>>>>>>>>>> 2) this will be the
>>>>>>>>>>>> way that the user will be able to interact with his
>> running job.
>>>>>>>>>>>> Assuming that the necessary
>>>>>>>>>>>> ports are open for the REST API to work, then I think
>> that the
>>>>>>>>>>>> JobClient can run against the
>>>>>>>>>>>> REST API without problems. If the needed ports are not
>> open, then
>>>>>>>> we
>>>>>>>>>>>> are safe to not return
>>>>>>>>>>>> a JobClient, as the user explicitly chose to close all
>> points of
>>>>>>>>>>>> communication to his running job.
>>>>>>>>>>>> 
>>>>>>>>>>>> On the topic of not hijacking the "env.execute()" in
>> order to get
>>>>>>>> the
>>>>>>>>>>>> Plan, I definitely agree but
>>>>>>>>>>>> for the proposal of having a "compile()" method in the
>> env, I would
>>>>>>>>>>>> like to have a better look at
>>>>>>>>>>>> the existing code.
>>>>>>>>>>>> 
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Kostas
>>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Aug 23, 2019 at 5:52 AM Zili Chen <
>> wander4096@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Yang,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It would be helpful if you check Stephan's last
>> comment,
>>>>>>>>>>>>> which states that isolation is important.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> For per-job mode, we run a dedicated cluster(maybe it
>>>>>>>>>>>>> should have been a couple of JM and TMs during FLIP-6
>>>>>>>>>>>>> design) for a specific job. Thus the process is
>> prevented
>>>>>>>>>>>>> from other jobs.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> In our cases there was a time we suffered from multi
>>>>>>>>>>>>> jobs submitted by different users and they affected
>>>>>>>>>>>>> each other so that all ran into an error state. Also,
>>>>>>>>>>>>> run the client inside the cluster could save client
>>>>>>>>>>>>> resource at some points.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> However, we also face several issues as you mentioned,
>>>>>>>>>>>>> that in per-job mode it always uses parent classloader
>>>>>>>>>>>>> thus classloading issues occur.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> BTW, one can makes an analogy between session/per-job
>> mode
>>>>>>>>>>>>> in  Flink, and client/cluster mode in Spark.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> tison.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Yang Wang <da...@gmail.com> wrote on Thu, Aug 22, 2019 at 11:25 AM:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> From the user's perspective, it is really confused
>> about the
>>>>>>>> scope
>>>>>>>>>> of
>>>>>>>>>>>>>> per-job cluster.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> If it means a flink cluster with single job, so that
>> we could
>>>>>>>> get
>>>>>>>>>>>> better
>>>>>>>>>>>>>> isolation.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Now it does not matter how we deploy the cluster,
>> directly
>>>>>>>>>>>> deploy(mode1)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> or start a flink cluster and then submit job through
>> cluster
>>>>>>>>>>>> client(mode2).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Otherwise, if it just means directly deploy, how
>> should we
>>>>>>>> name the
>>>>>>>>>>>> mode2,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> session with job or something else?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> We could also benefit from the mode2. Users could
>> get the same
>>>>>>>>>>>> isolation
>>>>>>>>>>>>>> with mode1.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The user code and dependencies will be loaded by
>> user class
>>>>>>>> loader
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> to avoid class conflict with framework.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Anyway, both of the two submission modes are useful.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> We just need to clarify the concepts.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Yang
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Zili Chen <wa...@gmail.com> 于2019年8月20日周二
>> 下午5:58写道:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks for the clarification.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The idea JobDeployer ever came into my mind when I
>> was
>>>>>>>> muddled
>>>>>>>>>> with
>>>>>>>>>>>>>>> how to execute per-job mode and session mode with
>> the same
>>>>>>>> user
>>>>>>>>>>> code
>>>>>>>>>>>>>>> and framework codepath.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> With the concept JobDeployer we back to the
>> statement that
>>>>>>>>>>>> environment
>>>>>>>>>>>>>>> knows every configs of cluster deployment and job
>>>>>>>> submission. We
>>>>>>>>>>>>>>> configure or generate from configuration a specific
>>>>>>>> JobDeployer
>>>>>>>>>> in
>>>>>>>>>>>>>>> environment and then code align on
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> *JobClient client = env.execute().get();*
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> which in session mode returned by
>> clusterClient.submitJob
>>>>>>>> and in
>>>>>>>>>>>> per-job
>>>>>>>>>>>>>>> mode returned by
>> clusterDescriptor.deployJobCluster.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Here comes a problem that currently we directly run
>>>>>>>>>>> ClusterEntrypoint
>>>>>>>>>>>>>>> with extracted job graph. Following the JobDeployer
>> way we'd
>>>>>>>> better
>>>>>>>>>>>>>>> align entry point of per-job deployment at
>> JobDeployer.
>>>>>>>> Users run
>>>>>>>>>>>>>>> their main method or by a Cli(finally call main
>> method) to
>>>>>>>> deploy
>>>>>>>>>>> the
>>>>>>>>>>>>>>> job cluster.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> tison.
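A rough sketch of how the pieces discussed above could fit together. All names here (JobDeployer, JobClient, EnvironmentSketch) are assumptions taken from this thread, not existing Flink API:

```java
import java.util.concurrent.CompletableFuture;

// Placeholder standing in for Flink's real JobGraph class.
final class JobGraph {}

// Handle for interacting with a running job, as discussed in this thread.
interface JobClient {
    CompletableFuture<String> getJobStatus();
}

// Hides *where* the graph goes: an existing session cluster
// (clusterClient.submitJob) or a freshly deployed per-job cluster
// (clusterDescriptor.deployJobCluster).
interface JobDeployer {
    CompletableFuture<JobClient> deploy(JobGraph jobGraph);
}

// The environment is configured with a deployer, so the user program
// is identical in session mode and per-job mode.
final class EnvironmentSketch {
    private final JobDeployer deployer;

    EnvironmentSketch(JobDeployer deployer) {
        this.deployer = deployer;
    }

    CompletableFuture<JobClient> execute() {
        JobGraph graph = new JobGraph(); // stands in for compiling the user program
        return deployer.deploy(graph);
    }
}
```

With this shape, `env.execute().get()` gives the same `JobClient` in both modes; only the configured deployer differs.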
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org> 于2019年8月20日周二
>> 下午4:40写道:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Till has made some good comments here.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Two things to add:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>  - The job mode is very nice in the way that it
>> runs the
>>>>>>>>>> client
>>>>>>>>>>>> inside
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> cluster (in the same image/process that is the
>> JM) and thus
>>>>>>>>>>> unifies
>>>>>>>>>>>>>> both
>>>>>>>>>>>>>>>> applications and what the Spark world calls the
>> "driver
>>>>>>>> mode".
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>  - Another thing I would add is that during the
>> FLIP-6
>>>>>>>> design,
>>>>>>>>>>> we
>>>>>>>>>>>> were
>>>>>>>>>>>>>>>> thinking about setups where Dispatcher and
>> JobManager are
>>>>>>>>>>> separate
>>>>>>>>>>>>>>>> processes.
>>>>>>>>>>>>>>>>    A Yarn or Mesos Dispatcher of a session
>> could run
>>>>>>>>>>> independently
>>>>>>>>>>>>>> (even
>>>>>>>>>>>>>>>> as privileged processes executing no code).
>>>>>>>>>>>>>>>>    Then the "per-job" mode could still be
>> helpful:
>>>>>>>> when a
>>>>>>>>>>> job
>>>>>>>>>>>> is
>>>>>>>>>>>>>>>> submitted to the dispatcher, it launches the JM
>> again in a
>>>>>>>>>>> per-job
>>>>>>>>>>>>>> mode,
>>>>>>>>>>>>>>> so
>>>>>>>>>>>>>>>> that JM and TM processes are bound to the job
>> only. For
>>>>>>>> higher
>>>>>>>>>>>> security
>>>>>>>>>>>>>>>> setups, it is important that processes are not
>> reused
>>>>>>>> across
>>>>>>>>>>> jobs.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <
>>>>>>>>>>>> trohrmann@apache.org>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I would not be in favour of getting rid of the
>> per-job
>>>>>>>> mode
>>>>>>>>>>>> since it
>>>>>>>>>>>>>>>>> simplifies the process of running Flink jobs
>>>>>>>> considerably.
>>>>>>>>>>>> Moreover,
>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>> not only well suited for container deployments
>> but also
>>>>>>>> for
>>>>>>>>>>>>>> deployments
>>>>>>>>>>>>>>>>> where you want to guarantee job isolation. For
>> example, a
>>>>>>>>>> user
>>>>>>>>>>>> could
>>>>>>>>>>>>>>> use
>>>>>>>>>>>>>>>>> the per-job mode on Yarn to execute his job on
>> a separate
>>>>>>>>>>>> cluster.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I think that having two notions of cluster
>> deployments
>>>>>>>>>> (session
>>>>>>>>>>>> vs.
>>>>>>>>>>>>>>>> per-job
>>>>>>>>>>>>>>>>> mode) does not necessarily contradict your
>> ideas for the
>>>>>>>>>> client
>>>>>>>>>>>> api
>>>>>>>>>>>>>>>>> refactoring. For example one could have the
>> following
>>>>>>>>>>> interfaces:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> - ClusterDeploymentDescriptor: encapsulates
>> the logic
>>>>>>>> how to
>>>>>>>>>>>> deploy a
>>>>>>>>>>>>>>>>> cluster.
>>>>>>>>>>>>>>>>> - ClusterClient: allows to interact with a
>> cluster
>>>>>>>>>>>>>>>>> - JobClient: allows to interact with a running
>> job
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Now the ClusterDeploymentDescriptor could have
>> two
>>>>>>>> methods:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> - ClusterClient deploySessionCluster()
>>>>>>>>>>>>>>>>> - JobClusterClient/JobClient
>>>>>>>> deployPerJobCluster(JobGraph)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> where JobClusterClient is either a supertype of
>>>>>>>> ClusterClient
>>>>>>>>>>>> which
>>>>>>>>>>>>>>> does
>>>>>>>>>>>>>>>>> not give you the functionality to submit jobs
>> or
>>>>>>>>>>>> deployPerJobCluster
>>>>>>>>>>>>>>>>> returns directly a JobClient.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> When setting up the ExecutionEnvironment, one
>> would then
>>>>>>>> not
>>>>>>>>>>>> provide
>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>> ClusterClient to submit jobs but a JobDeployer
>> which,
>>>>>>>>>> depending
>>>>>>>>>>>> on
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> selected mode, either uses a ClusterClient
>> (session
>>>>>>>> mode) to
>>>>>>>>>>>> submit
>>>>>>>>>>>>>>> jobs
>>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>> a ClusterDeploymentDescriptor to deploy a per-job
>> mode
>>>>>>>>>> cluster
>>>>>>>>>>>> with
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>> to execute.
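The interfaces proposed above could be sketched as follows. This is only an illustration of the proposed split, with placeholder types, not existing Flink API:

```java
// Placeholder standing in for Flink's real JobGraph class.
final class JobGraph {}

// Interact with a running job only; no ability to submit further jobs.
interface JobClient {
    String getJobId();
}

// Interact with a running cluster, including submitting new jobs.
interface ClusterClient {
    JobClient submitJob(JobGraph jobGraph);
}

// Encapsulates how a cluster is deployed (YARN, Mesos, Kubernetes, ...).
interface ClusterDeploymentDescriptor {
    // Session mode: bring up the cluster first, submit jobs later.
    ClusterClient deploySessionCluster();

    // Per-job mode: cluster and job are deployed together; the caller gets
    // a handle to the job, not a job-submitting client.
    JobClient deployPerJobCluster(JobGraph jobGraph);
}
```

The key design point is that `deployPerJobCluster` never hands out a job-submission capability, which preserves the isolation guarantees of a per-job cluster.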
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> These are just some thoughts how one could
>> make it
>>>>>>>> working
>>>>>>>>>>>> because I
>>>>>>>>>>>>>>>>> believe there is some value in using the per
>> job mode
>>>>>>>> from
>>>>>>>>>> the
>>>>>>>>>>>>>>>>> ExecutionEnvironment.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Concerning the web submission, this is indeed
>> a bit
>>>>>>>> tricky.
>>>>>>>>>>> From
>>>>>>>>>>>> a
>>>>>>>>>>>>>>>> cluster
>>>>>>>>>>>>>>>>> management standpoint, I would be in favour of
>> not
>>>>>>>> executing
>>>>>>>>>> user
>>>>>>>>>>>> code
>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> REST endpoint. Especially when considering
>> security, it
>>>>>>>> would
>>>>>>>>>>> be
>>>>>>>>>>>> good
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> have a well defined cluster behaviour where it
>> is
>>>>>>>> explicitly
>>>>>>>>>>>> stated
>>>>>>>>>>>>>>> where
>>>>>>>>>>>>>>>>> user code and, thus, potentially risky code is
>> executed.
>>>>>>>>>>> Ideally
>>>>>>>>>>>> we
>>>>>>>>>>>>>>> limit
>>>>>>>>>>>>>>>>> it to the TaskExecutor and JobMaster.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> Till
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 9:40 AM Flavio
>> Pompermaier <
>>>>>>>>>>>>>>> pompermaier@okkam.it
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> In my opinion the client should not use any
>>>>>>>> environment to
>>>>>>>>>>> get
>>>>>>>>>>>> the
>>>>>>>>>>>>>>> Job
>>>>>>>>>>>>>>>>>> graph because the jar should reside ONLY on
>> the cluster
>>>>>>>>>> (and
>>>>>>>>>>>> not in
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> client classpath otherwise there are always
>>>>>>>> inconsistencies
>>>>>>>>>>>> between
>>>>>>>>>>>>>>>>> client
>>>>>>>>>>>>>>>>>> and Flink Job manager's classpath).
>>>>>>>>>>>>>>>>>> In the YARN, Mesos and Kubernetes scenarios
>> you have
>>>>>>>> the
>>>>>>>>>> jar
>>>>>>>>>>>> but
>>>>>>>>>>>>>> you
>>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>> start a cluster that has the jar on the Job
>> Manager as
>>>>>>>> well
>>>>>>>>>>>> (but
>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>> the only case where I think you can assume
>> that the
>>>>>>>> client
>>>>>>>>>>> has
>>>>>>>>>>>> the
>>>>>>>>>>>>>>> jar
>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>> the classpath..in the REST job submission
>> you don't
>>>>>>>> have
>>>>>>>>>> any
>>>>>>>>>>>>>>>> classpath).
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thus, always in my opinion, the JobGraph
>> should be
>>>>>>>>>> generated
>>>>>>>>>>>> by the
>>>>>>>>>>>>>>> Job
>>>>>>>>>>>>>>>>>> Manager REST API.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <
>>>>>>>>>>>> wander4096@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I would like to involve Till & Stephan here
>> to clarify
>>>>>>>>>> some
>>>>>>>>>>>>>> concept
>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>> per-job mode.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> The term per-job is one of modes a cluster
>> could run
>>>>>>>> on.
>>>>>>>>>> It
>>>>>>>>>>> is
>>>>>>>>>>>>>>> mainly
>>>>>>>>>>>>>>>>>>> aimed
>>>>>>>>>>>>>>>>>>> at spawning
>>>>>>>>>>>>>>>>>>> a dedicated cluster for a specific job
>> while the job
>>>>>>>> could
>>>>>>>>>>> be
>>>>>>>>>>>>>>> packaged
>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>> Flink
>>>>>>>>>>>>>>>>>>> itself and thus the cluster initialized
>> with job so
>>>>>>>> that
>>>>>>>>>> get
>>>>>>>>>>>> rid
>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>> separated
>>>>>>>>>>>>>>>>>>> submission step.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> This is useful for container deployments
>> where one
>>>>>>>> create
>>>>>>>>>>> his
>>>>>>>>>>>>>> image
>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>> the job
>>>>>>>>>>>>>>>>>>> and then simply deploy the container.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> However, it is out of client scope since a
>>>>>>>>>>>> client(ClusterClient
>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>> example) is for
>>>>>>>>>>>>>>>>>>> communicating with an existing cluster and performing
>> performance
>>>>>>>>>>> actions.
>>>>>>>>>>>>>>>> Currently,
>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>> per-job
>>>>>>>>>>>>>>>>>>> mode, we extract the job graph and bundle
>> it into
>>>>>>>> cluster
>>>>>>>>>>>>>> deployment
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>> thus no
>>>>>>>>>>>>>>>>>>> concept of client get involved. It looks
>> like
>>>>>>>> reasonable
>>>>>>>>>> to
>>>>>>>>>>>>>> exclude
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> deployment
>>>>>>>>>>>>>>>>>>> of per-job cluster from client api and use
>> dedicated
>>>>>>>>>> utility
>>>>>>>>>>>>>>>>>>> classes(deployers) for
>>>>>>>>>>>>>>>>>>> deployment.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Zili Chen <wa...@gmail.com>
>> 于2019年8月20日周二
>>>>>>>> 下午12:37写道:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Hi Aljoscha,
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thanks for your reply and participation. The Google
>> The Google
>>>>>>>> Doc
>>>>>>>>>> you
>>>>>>>>>>>>>> linked
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>> requires
>>>>>>>>>>>>>>>>>>>> permission and I think you could use a
>> share link
>>>>>>>>>> instead.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I agree with that we almost reach a
>> consensus that
>>>>>>>>>>>> JobClient is
>>>>>>>>>>>>>>>>>>> necessary
>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>> interacte with a running Job.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Let me check your open questions one by
>> one.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 1. Separate cluster creation and job
>> submission for
>>>>>>>>>>> per-job
>>>>>>>>>>>>>> mode.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> As you mentioned here is where the
>> opinions
>>>>>>>> diverge. In
>>>>>>>>>> my
>>>>>>>>>>>>>>> document
>>>>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>> an alternative[2] that proposes excluding
>> per-job
>>>>>>>>>>> deployment
>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>> client
>>>>>>>>>>>>>>>>>>>> api
>>>>>>>>>>>>>>>>>>>> scope and now I find it is more
>> reasonable we do the
>>>>>>>>>>>> exclusion.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> When in per-job mode, a dedicated
>> JobCluster is
>>>>>>>> launched
>>>>>>>>>>> to
>>>>>>>>>>>>>>> execute
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> specific job. It is like a Flink
>> Application more
>>>>>>>> than a
>>>>>>>>>>>>>>> submission
>>>>>>>>>>>>>>>>>>>> of Flink Job. Client only takes care of
>> job
>>>>>>>> submission
>>>>>>>>>> and
>>>>>>>>>>>>>> assume
>>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>> an existing cluster. In this way we are
>> able to
>>>>>>>> consider
>>>>>>>>>>>> per-job
>>>>>>>>>>>>>>>>> issues
>>>>>>>>>>>>>>>>>>>> individually and JobClusterEntrypoint
>> would be the
>>>>>>>>>> utility
>>>>>>>>>>>> class
>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>> per-job
>>>>>>>>>>>>>>>>>>>> deployment.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Nevertheless, user program works in both
>> session
>>>>>>>> mode
>>>>>>>>>> and
>>>>>>>>>>>>>> per-job
>>>>>>>>>>>>>>>> mode
>>>>>>>>>>>>>>>>>>>> without
>>>>>>>>>>>>>>>>>>>> necessary to change code. JobClient in
>> per-job mode
>>>>>>>> is
>>>>>>>>>>>> returned
>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>>>> env.execute as normal. However, it would
>> be no
>>>>>>>> longer a
>>>>>>>>>>>> wrapper
>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>> RestClusterClient but a wrapper of
>>>>>>>> PerJobClusterClient
>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>> communicates
>>>>>>>>>>>>>>>>>>>> to Dispatcher locally.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 2. How to deal with plan preview.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> With env.compile functions users can get
>> JobGraph or
>>>>>>>>>>>> FlinkPlan
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> thus
>>>>>>>>>>>>>>>>>>>> they can preview the plan with
>> programmatically.
>>>>>>>> Typically it
>>>>>>>>>>>> looks
>>>>>>>>>>>>>>> like
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> if (preview configured) {
>>>>>>>>>>>>>>>>>>>>    FlinkPlan plan = env.compile();
>>>>>>>>>>>>>>>>>>>>    new JSONDumpGenerator(...).dump(plan);
>>>>>>>>>>>>>>>>>>>> } else {
>>>>>>>>>>>>>>>>>>>>    env.execute();
>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> And `flink info` would no longer be valid.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 3. How to deal with Jar Submission at the
>> Web
>>>>>>>> Frontend.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> There is one more thread that talked about this
>> topic[1].
>>>>>>>> Apart
>>>>>>>>>>> from
>>>>>>>>>>>>>>>> removing
>>>>>>>>>>>>>>>>>>>> the functions there are two alternatives.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> One is to introduce an interface with a
>> method that
>>>>>>> returns
>>>>>>>>>>>>>>>>> JobGraph/FlinkPlan
>>>>>>>>>>>>>>>>>>>> and Jar Submission only supports main classes
>>>>>>> that implement
>>>>>>>>>> this
>>>>>>>>>>>>>>>> interface.
>>>>>>>>>>>>>>>>>>>> And then extract the JobGraph/FlinkPlan
>> just by
>>>>>>>> calling
>>>>>>>>>>> the
>>>>>>>>>>>>>>> method.
>>>>>>>>>>>>>>>>>>>> In this way, it is even possible to
>> consider a
>>>>>>>>>> separation
>>>>>>>>>>>> of job
>>>>>>>>>>>>>>>>>>> creation
>>>>>>>>>>>>>>>>>>>> and job submission.
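The first alternative could look roughly like the sketch below. The interface and class names here are made up for illustration; the point is only that a main class can hand out its JobGraph without being executed, so the WebFrontend never has to run main():

```java
// Placeholder standing in for Flink's real JobGraph class.
final class JobGraph {}

// Hypothetical marker interface: exposes the job graph without execution.
interface FlinkProgram {
    JobGraph getJobGraph(String[] args);
}

// Example user program: job *creation* is now separate from job *submission*.
final class WordCountProgram implements FlinkProgram {
    @Override
    public JobGraph getJobGraph(String[] args) {
        // In a real program: build the dataflow and compile it to a JobGraph.
        return new JobGraph();
    }
}
```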
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> The other is, as you mentioned, let
>> execute() do the
>>>>>>>>>>> actual
>>>>>>>>>>>>>>>> execution.
>>>>>>>>>>>>>>>>>>>> We won't execute the main method in the
>> WebFrontend
>>>>>>>> but
>>>>>>>>>>>> spawn a
>>>>>>>>>>>>>>>>> process
>>>>>>>>>>>>>>>>>>>> at WebMonitor side to execute. For return
>> part we
>>>>>>>> could
>>>>>>>>>>>> generate
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> JobID from WebMonitor and pass it to the
>> execution
>>>>>>>>>>>> environment.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 4. How to deal with detached mode.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I think detached mode is a temporary
>> solution for
>>>>>>>>>>>> non-blocking
>>>>>>>>>>>>>>>>>>> submission.
>>>>>>>>>>>>>>>>>>>> In my document both submission and
>> execution return
>>>>>>>> a
>>>>>>>>>>>>>>>>> CompletableFuture
>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>> users control whether or not wait for the
>> result. In
>>>>>>>>>> this
>>>>>>>>>>>> point
>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>> don't
>>>>>>>>>>>>>>>>>>>> need a detached option but the
>> functionality is
>>>>>>>> covered.
>>>>>>>>>>>>>>>>>>>> 
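The point about detached mode can be made concrete with a small sketch. Once execute() returns a future, "detached" is simply "don't wait on the future", so no separate option is needed. All names below are illustrative:

```java
import java.util.concurrent.CompletableFuture;

// Minimal stand-in for a job's terminal result.
final class JobResult {
    final String jobId;
    JobResult(String jobId) { this.jobId = jobId; }
}

final class SubmissionSketch {
    // Stands in for env.execute(): submission returns quickly, and the
    // returned future completes when the job finishes.
    static CompletableFuture<JobResult> execute() {
        return CompletableFuture.supplyAsync(() -> new JobResult("job-1"));
    }

    // Old detached mode: submit and return immediately.
    static CompletableFuture<JobResult> detached() {
        return execute();
    }

    // Old attached mode: submit and block until the job finishes.
    static JobResult attached() {
        return execute().join();
    }
}
```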
>>>>>>>>>>>>>>>>>>>> 5. How does per-job mode interact with
>> interactive
>>>>>>>>>>>> programming.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> All of YARN, Mesos and Kubernetes
>> scenarios follow
>>>>>>>> the
>>>>>>>>>>>> pattern
>>>>>>>>>>>>>>>> launch
>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>> JobCluster now. And I don't think there
>> would be
>>>>>>>>>>>> inconsistency
>>>>>>>>>>>>>>>> between
>>>>>>>>>>>>>>>>>>>> different resource management.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>> tison.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>> https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
>>>>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Aljoscha Krettek <al...@apache.org>
>>>>>>>> 于2019年8月16日周五
>>>>>>>>>>>> 下午9:20写道:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I read both Jeffs initial design
>> document and the
>>>>>>>> newer
>>>>>>>>>>>>>> document
>>>>>>>>>>>>>>> by
>>>>>>>>>>>>>>>>>>>>> Tison. I also finally found the time to
>> collect our
>>>>>>>>>>>> thoughts on
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> issue,
>>>>>>>>>>>>>>>>>>>>> I had quite some discussions with Kostas
>> and this
>>>>>>>> is
>>>>>>>>>> the
>>>>>>>>>>>>>> result:
>>>>>>>>>>>>>>>> [1].
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I think overall we agree that this part
>> of the
>>>>>>>> code is
>>>>>>>>>> in
>>>>>>>>>>>> dire
>>>>>>>>>>>>>>> need
>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>> some refactoring/improvements but I
>> think there are
>>>>>>>>>> still
>>>>>>>>>>>> some
>>>>>>>>>>>>>>> open
>>>>>>>>>>>>>>>>>>>>> questions and some differences in
>> opinion what
>>>>>>>> those
>>>>>>>>>>>>>> refactorings
>>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>>>> look like.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I think the API-side is quite clear,
>> i.e. we need
>>>>>>>> some
>>>>>>>>>>>>>> JobClient
>>>>>>>>>>>>>>>> API
>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>> allows interacting with a running Job.
>> It could be
>>>>>>>>>>>> worthwhile
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> spin
>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>> off into a separate FLIP because we can
>> probably
>>>>>>>> find
>>>>>>>>>>>> consensus
>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>> part more easily.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> For the rest, the main open questions
>> from our doc
>>>>>>>> are
>>>>>>>>>>>> these:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>  - Do we want to separate cluster
>> creation and job
>>>>>>>>>>>> submission
>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>> per-job mode? In the past, there were
>> conscious
>>>>>>>> efforts
>>>>>>>>>>> to
>>>>>>>>>>>>>> *not*
>>>>>>>>>>>>>>>>>>> separate
>>>>>>>>>>>>>>>>>>>>> job submission from cluster creation for
>> per-job
>>>>>>>>>> clusters
>>>>>>>>>>>> for
>>>>>>>>>>>>>>>> Mesos,
>>>>>>>>>>>>>>>>>>> YARN,
>>>>>>>>>>>>>>>>>>>>> Kubernets (see
>> StandaloneJobClusterEntryPoint).
>>>>>>>> Tison
>>>>>>>>>>>> suggests
>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>> his
>>>>>>>>>>>>>>>>>>>>> design document to decouple this in
>> order to unify
>>>>>>>> job
>>>>>>>>>>>>>>> submission.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>  - How to deal with plan preview, which
>> needs to
>>>>>>>>>> hijack
>>>>>>>>>>>>>>> execute()
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>> let the outside code catch an exception?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>  - How to deal with Jar Submission at
>> the Web
>>>>>>>>>> Frontend,
>>>>>>>>>>>> which
>>>>>>>>>>>>>>>> needs
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>> hijack execute() and let the outside
>> code catch an
>>>>>>>>>>>> exception?
>>>>>>>>>>>>>>>>>>>>> CliFrontend.run() “hijacks”
>>>>>>>>>>> ExecutionEnvironment.execute()
>>>>>>>>>>>> to
>>>>>>>>>>>>>>> get a
>>>>>>>>>>>>>>>>>>>>> JobGraph and then execute that JobGraph
>> manually.
>>>>>>>> We
>>>>>>>>>>> could
>>>>>>>>>>>> get
>>>>>>>>>>>>>>>> around
>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>> by letting execute() do the actual
>> execution. One
>>>>>>>>>> caveat
>>>>>>>>>>>> for
>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>> now the main() method doesn’t return (or
>> is forced
>>>>>>>> to
>>>>>>>>>>>> return by
>>>>>>>>>>>>>>>>>>> throwing an
>>>>>>>>>>>>>>>>>>>>> exception from execute()) which means
>> that for Jar
>>>>>>>>>>>> Submission
>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> WebFrontend we have a long-running
>> main() method
>>>>>>>>>> running
>>>>>>>>>>>> in the
>>>>>>>>>>>>>>>>>>>>> WebFrontend. This doesn’t sound very
>> good. We
>>>>>>>> could get
>>>>>>>>>>>> around
>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>> by
>>>>>>>>>>>>>>>>>>>>> removing the plan preview feature and by
>> removing
>>>>>>>> Jar
>>>>>>>>>>>>>>>>>>> Submission/Running.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>  - How to deal with detached mode?
>> Right now,
>>>>>>>>>>>>>>> DetachedEnvironment
>>>>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>>>>>>> execute the job and return immediately.
>> If users
>>>>>>>>>> control
>>>>>>>>>>>> when
>>>>>>>>>>>>>>> they
>>>>>>>>>>>>>>>>>>> want to
>>>>>>>>>>>>>>>>>>>>> return, by waiting on the job completion
>> future,
>>>>>>>> how do
>>>>>>>>>>> we
>>>>>>>>>>>> deal
>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>> this?
>>>>>>>>>>>>>>>>>>>>> Do we simply remove the distinction
>> between
>>>>>>>>>>>>>>> detached/non-detached?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>  - How does per-job mode interact with
>>>>>>>> “interactive
>>>>>>>>>>>>>> programming”
>>>>>>>>>>>>>>>>>>>>> (FLIP-36). For YARN, each execute() call
>> could
>>>>>>>> spawn a
>>>>>>>>>>> new
>>>>>>>>>>>>>> Flink
>>>>>>>>>>>>>>>> YARN
>>>>>>>>>>>>>>>>>>>>> cluster. What about Mesos and Kubernetes?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> The first open question is where the
>> opinions
>>>>>>>> diverge,
>>>>>>>>>> I
>>>>>>>>>>>> think.
>>>>>>>>>>>>>>> The
>>>>>>>>>>>>>>>>>>> rest
>>>>>>>>>>>>>>>>>>>>> are just open questions and interesting
>> things
>>>>>>>> that we
>>>>>>>>>>>> need to
>>>>>>>>>>>>>>>>>>> consider.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>> Aljoscha
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
>>>>>>>>>>>>>>>>>>>>> <
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On 31. Jul 2019, at 15:23, Jeff Zhang <
>>>>>>>>>>> zjffdu@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Thanks tison for the effort. I left a
>> few
>>>>>>>> comments.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Zili Chen <wa...@gmail.com>
>> 于2019年7月31日周三
>>>>>>>>>>> 下午8:24写道:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Hi Flavio,
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Thanks for your reply.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Either current impl and in the design,
>>>>>>>> ClusterClient
>>>>>>>>>>>>>>>>>>>>>>> never takes responsibility for
>> generating
>>>>>>>> JobGraph.
>>>>>>>>>>>>>>>>>>>>>>> (what you see in current codebase is
>> several
>>>>>>>> class
>>>>>>>>>>>> methods)
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Instead, user describes his program
>> in the main
>>>>>>>>>> method
>>>>>>>>>>>>>>>>>>>>>>> with ExecutionEnvironment apis and
>> calls
>>>>>>>>>> env.compile()
>>>>>>>>>>>>>>>>>>>>>>> or env.optimize() to get FlinkPlan
>> and JobGraph
>>>>>>>>>>>>>> respectively.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> For listing main classes in a jar and
>> choose
>>>>>>>> one for
>>>>>>>>>>>>>>>>>>>>>>> submission, you're now able to
>> customize a CLI
>>>>>>>> to do
>>>>>>>>>>> it.
>>>>>>>>>>>>>>>>>>>>>>> Specifically, the path of jar is
>> passed as
>>>>>>>> arguments
>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>> in the customized CLI you list main
>> classes,
>>>>>>>> choose
>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>>>>>> to submit to the cluster.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>> tison.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Flavio Pompermaier <
>> pompermaier@okkam.it>
>>>>>>>>>>> 于2019年7月31日周三
>>>>>>>>>>>>>>>> 下午8:12写道:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Just one note on my side: it is not
>> clear to me
>>>>>>>>>>>> whether the
>>>>>>>>>>>>>>>>> client
>>>>>>>>>>>>>>>>>>>>> needs
>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>> be able to generate a job graph or
>> not.
>>>>>>>>>>>>>>>>>>>>>>>> In my opinion, the job jar must
>> reside only
>>>>>>>> on the
>>>>>>>>>>>>>>>>>>> server/jobManager
>>>>>>>>>>>>>>>>>>>>>>> side
>>>>>>>>>>>>>>>>>>>>>>>> and the client requires a way to get
>> the job
>>>>>>>> graph.
>>>>>>>>>>>>>>>>>>>>>>>> If you really want to access to the
>> job graph,
>>>>>>>> I'd
>>>>>>>>>>> add
>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>> dedicated
>>>>>>>>>>>>>>>>>>>>> method
>>>>>>>>>>>>>>>>>>>>>>>> on the ClusterClient. like:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>  - getJobGraph(jarId, mainClass):
>> JobGraph
>>>>>>>>>>>>>>>>>>>>>>>>  - listMainClasses(jarId):
>> List<String>
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> These would require some addition
>> also on the
>>>>>>>> job
>>>>>>>>>>>> manager
>>>>>>>>>>>>>>>>> endpoint
>>>>>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>>>>>>> well..what do you think?
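The two proposed client calls could be sketched as below. The interface name and the REST paths in the comments are illustrative assumptions, not existing Flink API:

```java
import java.util.Arrays;
import java.util.List;

// Placeholder standing in for Flink's real JobGraph class.
final class JobGraph {}

interface JarAwareClusterClient {
    // could be backed by something like GET /jars/:jarid/main-classes
    List<String> listMainClasses(String jarId);

    // could be backed by something like GET /jars/:jarid/plan?entry-class=...
    JobGraph getJobGraph(String jarId, String mainClass);
}

// Trivial in-memory stub showing the intended call sequence: list the main
// classes in an uploaded jar, pick one, then ask the cluster for its graph.
final class StubClient implements JarAwareClusterClient {
    @Override
    public List<String> listMainClasses(String jarId) {
        return Arrays.asList("org.example.WordCount", "org.example.PageRank");
    }

    @Override
    public JobGraph getJobGraph(String jarId, String mainClass) {
        return new JobGraph();
    }
}
```

This keeps the jar, and therefore the classpath, entirely on the cluster side; the client only ever sees class names and the resulting graph.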
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jul 31, 2019 at 12:42 PM
>> Zili Chen <
>>>>>>>>>>>>>>>> wander4096@gmail.com
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Here is a document[1] on client api
>>>>>>>> enhancement
>>>>>>>>>> from
>>>>>>>>>>>> our
>>>>>>>>>>>>>>>>>>> perspective.
>>>>>>>>>>>>>>>>>>>>>>>>> We have investigated current
>> implementations.
>>>>>>>> And
>>>>>>>>>> we
>>>>>>>>>>>>>> propose
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 1. Unify the implementation of
>> cluster
>>>>>>>> deployment
>>>>>>>>>>> and
>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>>> submission
>>>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>>>>> Flink.
>>>>>>>>>>>>>>>>>>>>>>>>> 2. Provide programmatic interfaces
>> to allow
>>>>>>>>>> flexible
>>>>>>>>>>>> job
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>> cluster
>>>>>>>>>>>>>>>>>>>>>>>>> management.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> The first proposal is aimed at
>> reducing code
>>>>>>>> paths
>>>>>>>>>>> of
>>>>>>>>>>>>>>> cluster
>>>>>>>>>>>>>>>>>>>>>>> deployment
>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>> job submission so that one can
>> adopt Flink in
>>>>>>>> his
>>>>>>>>>>>> usage
>>>>>>>>>>>>>>>> easily.
>>>>>>>>>>>>>>>>>>> The
>>>>>>>>>>>>>>>>>>>>>>>> second
>>>>>>>>>>>>>>>>>>>>>>>>> proposal is aimed at providing rich
>>>>>>>> interfaces for
>>>>>>>>>>>>>> advanced
>>>>>>>>>>>>>>>>> users
>>>>>>>>>>>>>>>>>>>>>>>>> who want to make accurate control
>> of these
>>>>>>>> stages.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Quick reference on open questions:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 1. Exclude job cluster deployment
>> from client
>>>>>>>> side
>>>>>>>>>>> or
>>>>>>>>>>>>>>> redefine
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> semantic
>>>>>>>>>>>>>>>>>>>>>>>>> of job cluster? Since it fits in a
>> process
>>>>>>>> quite
>>>>>>>>>>>> different
>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>>>>> session
>>>>>>>>>>>>>>>>>>>>>>>>> cluster deployment and job
>> submission.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 2. Maintain the codepaths handling
>> class
>>>>>>>>>>>>>>>>> o.a.f.api.common.Program
>>>>>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>>>>>>>>>> implement customized program
>> handling logic by
>>>>>>>>>>>> customized
>>>>>>>>>>>>>>>>>>>>> CliFrontend?
>>>>>>>>>>>>>>>>>>>>>>>>> See also this thread[2] and the
>> document[1].
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 3. Expose ClusterClient as public
>> api or just
>>>>>>>>>> expose
>>>>>>>>>>>> api
>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment
>>>>>>>>>>>>>>>>>>>>>>>>> and delegate them to ClusterClient?
>> Further,
>>>>>>>> in
>>>>>>>>>>>> either way
>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>> worth
>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>> introduce a JobClient which is an
>>>>>>>> encapsulation of
>>>>>>>>>>>>>>>> ClusterClient
>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>> associated to specific job?
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>>> tison.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
>>>>>>>>>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
>>>>>>>>>>>>>>>>>>>>>>>>> 
>> Jeff Zhang <zj...@gmail.com> 于2019年7月24日周三 上午9:19写道:
>>
>> Thanks Stephan, I will follow up on this issue in the next few weeks,
>> and will refine the design doc. We can discuss more details after the
>> 1.9 release.
>>
>> Stephan Ewen <se...@apache.org> 于2019年7月24日周三 上午12:58写道:
>>
>> Hi all!
>>
>> This thread has stalled for a bit, which I assume is mostly due to the
>> Flink 1.9 feature freeze and release testing effort.
>>
>> I personally still recognize this issue as an important one to solve.
>> I'd be happy to help resume this discussion soon (after the 1.9
>> release) and see if we can take a step towards this in Flink 1.10.
>>
>> Best,
>> Stephan
>>
>> On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <pompermaier@okkam.it> wrote:
>>
>> That's exactly what I suggested a long time ago: the Flink REST client
>> should not require any Flink dependency, only an HTTP library to call
>> the REST services to submit and monitor a job.
>> What I also suggested in [1] was to have a way to automatically
>> suggest to the user (via a UI) the available main classes and their
>> required parameters [2].
>> Another problem we have with Flink is that the REST client and the CLI
>> one behave differently, and we use the CLI client (via ssh) because it
>> allows calling some other method after env.execute() [3] (we have to
>> call another REST service to signal the end of the job).
>> In this regard, a dedicated interface, like the JobListener suggested
>> in the previous emails, would be very helpful (IMHO).
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-10864
>> [2] https://issues.apache.org/jira/browse/FLINK-10862
>> [3] https://issues.apache.org/jira/browse/FLINK-10879
>>
>> Best,
>> Flavio
>>
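To make Flavio's point concrete: submitting and monitoring a job really only needs a handful of REST endpoints. The sketch below builds the URLs and the run payload with no Flink dependency at all. The endpoint paths follow Flink's documented REST API, while the `RestJobClient` class itself is hypothetical; an actual client would POST/GET these URLs with any HTTP library.

```java
// Sketch of a dependency-free "REST-only" Flink client, as Flavio suggests:
// it only needs an HTTP library, never flink-runtime classes. The endpoint
// paths follow Flink's documented REST API; the class itself is hypothetical.
public class RestJobClient {
    private final String baseUrl; // e.g. "http://jobmanager:8081"

    public RestJobClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // POST the jar as a multipart upload to this URL; the response
    // contains the jar id.
    public String uploadJarUrl() {
        return baseUrl + "/v1/jars/upload";
    }

    // POST a JSON body like runJarBody(...) to this URL to submit the job.
    public String runJarUrl(String jarId) {
        return baseUrl + "/v1/jars/" + jarId + "/run";
    }

    // GET this URL to monitor the status of a submitted job.
    public String jobStatusUrl(String jobId) {
        return baseUrl + "/v1/jobs/" + jobId;
    }

    // JSON payload selecting the main class and program arguments.
    public String runJarBody(String entryClass, String programArgs) {
        return "{\"entryClass\":\"" + entryClass + "\","
             + "\"programArgs\":\"" + programArgs + "\"}";
    }
}
```

The jar id returned by the upload call is what gets passed to `runJarUrl`, and the job id from the run response feeds `jobStatusUrl`, so the whole submit-and-monitor loop stays pure HTTP.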
>> On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zjffdu@gmail.com> wrote:
>>
>> Hi Tison,
>>
>> Thanks for your comments. Overall I agree with you that it is
>> difficult for downstream projects to integrate with Flink and that we
>> need to refactor the current Flink client API.
>> I also agree that CliFrontend should only parse command-line arguments
>> and then pass them to the ExecutionEnvironment. It is the
>> ExecutionEnvironment's responsibility to compile the job, create the
>> cluster, and submit the job. Besides that, Flink currently has many
>> ExecutionEnvironment implementations and picks a specific one based on
>> the context. IMHO this is not necessary: the ExecutionEnvironment
>> should be able to do the right thing based on the FlinkConf it
>> receives. Having too many ExecutionEnvironment implementations is
>> another burden for downstream project integration.
>>
>> One thing I'd like to mention is Flink's Scala shell and SQL client.
>> Although they are sub-modules of Flink, they can be treated as
>> downstream projects that use Flink's client API. Currently it is not
>> easy for them to integrate with Flink, and they share a lot of
>> duplicated code. That is another sign that we should refactor the
>> Flink client API.
>>
>> I believe this is a large and hard change, and I am afraid we cannot
>> keep compatibility, since many of the changes are user-facing.
>>
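Jeff's suggestion that a single ExecutionEnvironment should "do the right thing" based on the configuration it receives can be sketched as a config-driven executor lookup. The `execution.target` key and the executor names below are illustrative assumptions for this sketch, not existing Flink API at the time of this thread:

```java
import java.util.Map;

// Hypothetical sketch: one environment that picks its submission strategy
// from the configuration it receives, instead of Flink shipping one
// ExecutionEnvironment subclass per deployment mode. The config key and
// the executor names are made up for illustration.
public class UnifiedEnvironment {
    interface Executor {
        String name();
        // void execute(JobGraph job); // actual submission elided here
    }

    static Executor forConfig(Map<String, String> conf) {
        // One well-known key decides the deployment target; the env never
        // needs to be subclassed per mode.
        String target = conf.getOrDefault("execution.target", "local");
        switch (target) {
            case "local":        return () -> "LocalExecutor";
            case "remote":       return () -> "RemoteExecutor";
            case "yarn-session": return () -> "YarnSessionExecutor";
            case "yarn-per-job": return () -> "YarnJobClusterExecutor";
            default:
                throw new IllegalArgumentException(
                    "Unknown execution.target: " + target);
        }
    }
}
```

With this shape, the CLI, the Scala shell, the SQL client, and downstream projects like Zeppelin would all hand a configuration to the same environment rather than each picking a subclass.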
>> Zili Chen <wander4096@gmail.com> 于2019年6月24日周一 下午2:53写道:
>>
>> Hi all,
>>
>> After a closer look at our client APIs, I can see two major issues
>> affecting consistency and integration: the deployment of a job
>> cluster, which couples job graph creation with cluster deployment, and
>> submission via CliFrontend, which confuses the control flow of job
>> graph compilation and job submission. I'd like to follow up on the
>> discussion above, mainly the process described by Jeff and Stephan,
>> and share my ideas on these issues.
>>
>> 1) CliFrontend confuses the control flow of job compilation and
>> submission. Following the submission process Stephan and Jeff
>> described, the execution environment knows all configs of the cluster
>> and the topology/settings of the job. Ideally, the main method of the
>> user program calls #execute (or, say, #submit), and Flink deploys the
>> cluster, compiles the job graph, and submits it to the cluster.
>> However, the current CliFrontend does all these things inside its
>> #runProgram method, which introduces a lot of subclasses of the
>> (stream) execution environment.
>>
>> Actually, it sets up an exec env that hijacks the
>> #execute/#executePlan methods, initializes the job graph, and aborts
>> execution. Control flow then returns to CliFrontend, which deploys the
>> cluster (or retrieves the client) and submits the job graph. This is
>> quite a specific internal process inside Flink and consistent with
>> nothing else.
>>
>> 2) Deployment of a job cluster couples job graph creation and cluster
>> deployment. Abstractly, going from a user job to a concrete submission
>> requires the following dependency:
>>
>>       create JobGraph --\
>>    create ClusterClient --> submit JobGraph
>>
>> The ClusterClient is created by deploying or retrieving. JobGraph
>> submission requires a compiled JobGraph and a valid ClusterClient, but
>> the creation of the ClusterClient is abstractly independent of that of
>> the JobGraph. However, in job cluster mode we deploy the job cluster
>> with a job graph, which means we use another process:
>>
>>    create JobGraph --> deploy cluster with the JobGraph
>>
>> This is another inconsistency, and downstream projects/client APIs are
>> forced to handle the different cases with little support from Flink.
>>
>> Since we likely reached a consensus that
>>
>> 1. all configs are gathered by the Flink configuration and passed on,
>>    and
>> 2. the execution environment knows all configs and handles execution
>>    (both deployment and submission),
>>
>> I propose eliminating the inconsistencies above with the following
>> approach:
>>
>> 1) CliFrontend should be exactly a front end, at least for the "run"
>> command. That means it just gathers all config from the command line
>> and passes it to the main method of the user program. The execution
>> environment knows all the info, and with an additional utility for
>> ClusterClient we can gracefully get a ClusterClient by deploying or
>> retrieving. This way we don't need to hijack the #execute/#executePlan
>> methods and can remove the various hacky subclasses of the exec env,
>> as well as the #run methods in ClusterClient (for an interface-ized
>> ClusterClient). The control flow then goes from CliFrontend into the
>> main method and never returns.
>>
>> 2) A job cluster means a cluster for a specific job. From another
>> perspective, it is an ephemeral session. We could decouple the
>> deployment from a compiled job graph: start a session with an idle
>> timeout and submit the job afterwards.
>>
>> Before we go into more details on design or implementation, it would
>> be good to be aware of these topics and discuss them towards a
>> consensus.
>>
>> Best,
>> tison.
>>
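The decoupling tison argues for (obtain a ClusterClient by deploying or retrieving, independently of job graph compilation, so that submission is the only point that needs both) might be sketched like this. All names here are hypothetical, not Flink's actual classes:

```java
// Hypothetical sketch of the decoupled flow tison describes: the cluster
// client is obtained by deploying or retrieving, independently of job
// graph compilation; submission is the only point that joins the two.
public class ClusterClientSketch {
    static class JobGraph {
        final String name;
        JobGraph(String name) { this.name = name; }
    }

    interface ClusterClient {
        String submit(JobGraph job);   // returns a job id
        void close();
    }

    // "retrieve" path: attach to an already-running (session) cluster.
    static ClusterClient retrieve(String clusterId) {
        return new ClusterClient() {
            private int counter = 0;
            public String submit(JobGraph job) {
                // stand-in for a real submission; just mints a job id
                return clusterId + "/" + job.name + "-" + (counter++);
            }
            public void close() {}
        };
    }

    // "deploy" path: an ephemeral session created without any job graph,
    // to which the job is submitted afterwards (tison's point 2).
    static ClusterClient deploySession(String clusterId, long idleTimeoutMs) {
        return retrieve(clusterId); // deployment details elided
    }
}
```

Under this shape, the job-cluster case stops being special: it is a `deploySession` with an idle timeout, followed by an ordinary `submit`.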
>> Zili Chen <wander4096@gmail.com> 于2019年6月20日周四 上午3:21写道:
>>
>> Hi Jeff,
>>
>> Thanks for raising this thread and the design document!
>>
>> As @Thomas Weise mentioned above, extending Flink's config requires
>> far more effort than it should. Another example: we achieve detached
>> mode by introducing another execution environment that also hijacks
>> the #execute method.
>>
>> I agree with your idea that the user configures everything and Flink
>> "just" respects it. On this topic I think the unusual control flow
>> when CliFrontend handles the "run" command is the problem. It handles
>> several configs, mainly cluster settings, so the main method of the
>> user program is unaware of them. It also compiles the application to a
>> job graph by running the main method with a hijacked exec env, which
>> constrains the main method further.
>>
>> I'd like to write down a few notes on passing and respecting
>> configs/args, as well as on decoupling job compilation and submission,
>> and will share them on this thread later.
>>
>> Best,
>> tison.
>>
>> SHI Xiaogang <shixiaogangg@gmail.com> 于2019年6月17日周一 下午7:29写道:
>>
>> Hi Jeff and Flavio,
>>
>> Thanks a lot, Jeff, for proposing the design document.
>>
>> We are also working on refactoring ClusterClient to allow flexible and
>> efficient job management in our real-time platform. We would like to
>> draft a document to share our ideas with you.
>>
>> I think it's a good idea to have something like Apache Livy for Flink,
>> and the efforts discussed here are a great step towards it.
>>
>> Regards,
>> Xiaogang
>>
>> Flavio Pompermaier <pompermaier@okkam.it> 于2019年6月17日周一 下午7:13写道:
>>
>> Is there any possibility to have something like Apache Livy [1] also
>> for Flink in the future?
>>
>> [1] https://livy.apache.org/
>>
>> On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zjffdu@gmail.com> wrote:
>>
>>> Any API we expose should not have dependencies on the runtime
>>> (flink-runtime) package or other implementation details. To me, this
>>> means that the current ClusterClient cannot be exposed to users,
>>> because it uses quite a few classes from the optimiser and runtime
>>> packages.
>>
>> We should change ClusterClient from a class to an interface.
>> ExecutionEnvironment should only use the ClusterClient interface,
>> which should be in flink-clients, while the concrete implementation
>> class could be in flink-runtime.
>>
>>> What happens when a failure/restart in the client happens? There
>>> needs to be a way of re-establishing the connection to the job,
>>> setting up the listeners again, etc.
>>
>> Good point. First we need to define what a failure/restart in the
>> client means. IIUC, that usually means a network failure, which
>> happens in the RestClient class. If my understanding is correct, the
>> restart/retry mechanism should be implemented in RestClient.
>>
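If, as Jeff suggests, client failures are mostly transient network errors to be handled inside RestClient, the retry mechanism could be a generic retry-with-backoff wrapper along these lines. This is illustrative only, not Flink's actual RestClient code:

```java
import java.util.concurrent.Callable;

// Generic retry-with-backoff sketch for the transient network failures
// Jeff attributes to RestClient. Illustrative, not Flink's implementation.
public class RetryingCall {
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long backoffMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;             // assume transient; retry after backoff
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs * attempt); // linear backoff
                }
            }
        }
        throw last; // exhausted all attempts; surface the last failure
    }
}
```

A real client would retry only on exception types known to be transient (connection reset, timeout) and rethrow everything else immediately; that classification is elided here.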
>> Aljoscha Krettek <aljoscha@apache.org> 于2019年6月11日周二 下午11:10写道:
>>
>> Some points to consider:
>>
>> * Any API we expose should not have dependencies on the runtime
>> (flink-runtime) package or other implementation details. To me, this
>> means that the current ClusterClient cannot be exposed to users,
>> because it uses quite a few classes from the optimiser and runtime
>> packages.
>>
>> * What happens when a failure/restart in the client happens? There
>> needs to be a way of re-establishing the connection to the job,
>> setting up the listeners again, etc.
>>
>> Aljoscha
>>
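Aljoscha's second point, re-establishing the connection and setting the listeners up again, together with the JobListener idea from earlier in the thread, could be served by a small callback interface plus a re-attach hook. Everything below is a hypothetical sketch, not existing Flink API:

```java
// Hypothetical sketch combining the JobListener idea from this thread with
// Aljoscha's concern: after a client failure, re-attach to the running job
// by its id and re-register the listeners.
public class JobControlSketch {
    interface JobListener {
        void onJobSubmitted(String jobId);
        void onJobFinished(String jobId);
    }

    static class JobHandle {
        final String jobId;
        final java.util.List<JobListener> listeners = new java.util.ArrayList<>();

        JobHandle(String jobId) { this.jobId = jobId; }

        void register(JobListener l) {
            listeners.add(l);
            l.onJobSubmitted(jobId); // replay current state to late listeners
        }
    }

    // After a client restart, the job id (persisted by the caller) is
    // enough to rebuild the handle and re-register the listeners.
    static JobHandle reattach(String jobId, JobListener... ls) {
        JobHandle h = new JobHandle(jobId);
        for (JobListener l : ls) h.register(l);
        return h;
    }
}
```

The key design point is that the handle is reconstructable from the job id alone, so a client restart loses no control capability as long as the id was persisted somewhere.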
>> On 29. May 2019, at 10:17, Jeff Zhang <zjffdu@gmail.com> wrote:
>>
>> Sorry folks, the design doc is late as you expected. Here's the design
>> doc I drafted, welcome any comments and feedback.
>>
>> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
>>
>> Stephan Ewen <sewen@apache.org> 于2019年2月14日周四 下午8:43写道:
>>
>> Nice that this discussion is happening.
>>
>> In the FLIP, we could also revisit the entire role of the environments
>> again.
>>
>> Initially, the idea was:
>>   - the environments take care of the specific setup for standalone
>> (no setup needed), yarn, mesos, etc.
>>   - the session ones have control over the session. The environment
>> holds the session client.
>>   - running a job gives a "control" object for that job. That behavior
>> is the same in all environments.
>>
>> The actual implementation diverged quite a bit from that. Happy to see
>> a discussion about straightening this out a bit more.
>>
>> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zjffdu@gmail.com> wrote:
>>
>> Hi folks,
>>
>> Sorry for the late response. It seems we have reached consensus on
>> this; I will create a FLIP for this with a more detailed design.
>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thomas Weise <
>> thw@apache.org>
>>>>>>>>>>>> 于2018年12月21日周五
>>>>>>>>>>>>>>>>>>>>>>>>>> 上午11:43写道:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Great to see this
>> discussion
>>>>>>>> seeded!
>>>>>>>>>>> The
>>>>>>>>>>>>>>>>>>>>>>> problems
>>>>>>>>>>>>>>>>>>>>>>>>> you
>>>>>>>>>>>>>>>>>>>>>>>>>>> face
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Zeppelin integration
>> are also
>>>>>>>>>>> affecting
>>>>>>>>>>>>>> other
>>>>>>>>>>>>>>>>>>>>>>>>>> downstream
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> projects,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> like
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Beam.
> > >
> > > We just enabled the savepoint restore option in RemoteStreamEnvironment
> > > [1] and that was more difficult than it should be. The main issue is that
> > > environment and cluster client aren't decoupled. Ideally it should be
> > > possible to just get the matching cluster client from the environment and
> > > then control the job through it (environment as factory for cluster
> > > client). But note that the environment classes are part of the public API,
> > > and it is not straightforward to make larger changes without breaking
> > > backward compatibility.
> > >
> > > ClusterClient currently exposes internal classes like JobGraph and
> > > StreamGraph. But it should be possible to wrap this with a new public API
> > > that brings the required job control capabilities for downstream projects.
> > > Perhaps it is helpful to look at some of the interfaces in Beam while
> > > thinking about this: [2] for the portable job API and [3] for the old
> > > asynchronous job control from the Beam Java SDK.
> > >
> > > The backward compatibility discussion [4] is also relevant here. A new API
> > > should shield downstream projects from internals and allow them to
> > > interoperate with multiple future Flink versions in the same release line
> > > without forced upgrades.
> > >
> > > Thanks,
> > > Thomas
> > >
> > > [1] https://github.com/apache/flink/pull/7249
> > > [2]
> > > https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > > [3]
> > > https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > > [4]
> > > https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> > >
> > > On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > >
> > > > > I'm not so sure whether the user should be able to define where the
> > > > > job runs (in your example Yarn). This is actually independent of the
> > > > > job development and is something which is decided at deployment time.
> > > >
> > > > Users don't need to specify the execution mode programmatically. They
> > > > can also pass the execution mode via the arguments of the flink run
> > > > command, e.g.
> > > >
> > > > bin/flink run -m yarn-cluster ...
> > > > bin/flink run -m local ...
> > > > bin/flink run -m host:port ...
> > > >
> > > > Does this make sense to you?
> > > >
> > > > > To me it makes sense that the ExecutionEnvironment is not directly
> > > > > initialized by the user and instead context sensitive how you want to
> > > > > execute your job (Flink CLI vs. IDE, for example).
> > > >
> > > > Right, currently I notice Flink would create a different
> > > > ContextExecutionEnvironment based on the submission scenario (Flink CLI
> > > > vs. IDE). To me this is kind of a hack, not so straightforward. What I
> > > > suggested above is that Flink should always create the same
> > > > ExecutionEnvironment but with different configuration, and based on that
> > > > configuration it would create the proper ClusterClient for the different
> > > > behaviors.
> > > >
> > > > Till Rohrmann <trohrmann@apache.org> wrote on Thu, Dec 20, 2018 at 11:18 PM:
> > > > >
> > > > > You are probably right that we have code duplication when it comes to
> > > > > the creation of the ClusterClient. This should be reduced in the
> > > > > future.
> > > > >
> > > > > I'm not so sure whether the user should be able to define where the
> > > > > job runs (in your example Yarn). This is actually independent of the
> > > > > job development and is something which is decided at deployment time.
> > > > > To me it makes sense that the ExecutionEnvironment is not directly
> > > > > initialized by the user and instead context sensitive how you want to
> > > > > execute your job (Flink CLI vs. IDE, for example). However, I agree
> > > > > that the ExecutionEnvironment should give you access to the
> > > > > ClusterClient and to the job (maybe in the form of the JobGraph or a
> > > > > job plan).
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > > >
> > > > > > Hi Till,
> > > > > >
> > > > > > Thanks for the feedback. You are right that I expect a better
> > > > > > programmatic job submission/control api which could be used by
> > > > > > downstream projects, and it would benefit the flink ecosystem. When
> > > > > > I look at the code of flink scala-shell and sql-client (I believe
> > > > > > they are not the core of flink, but belong to the ecosystem of
> > > > > > flink), I find much duplicated code for creating a ClusterClient
> > > > > > from user-provided configuration (the configuration format may
> > > > > > differ between scala-shell and sql-client) and then using that
> > > > > > ClusterClient to manipulate jobs. I don't think this is convenient
> > > > > > for downstream projects. What I expect is that a downstream project
> > > > > > only needs to provide the necessary configuration info (maybe
> > > > > > introducing a class FlinkConf), then build an ExecutionEnvironment
> > > > > > based on this FlinkConf, and the ExecutionEnvironment will create
> > > > > > the proper ClusterClient. This would not only benefit downstream
> > > > > > project development but also help their integration tests with
> > > > > > flink. Here's one sample code snippet that I expect.
> > > > > >
> > > > > > val conf = new FlinkConf().mode("yarn")
> > > > > > val env = new ExecutionEnvironment(conf)
> > > > > > val jobId = env.submit(...)
> > > > > > val jobStatus = env.getClusterClient().queryJobStatus(jobId)
> > > > > > env.getClusterClient().cancelJob(jobId)
> > > > > >
> > > > > > What do you think?
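The proposed flow in the snippet above can be modeled with a small in-memory stand-in. The class and method names (`FlinkConf`, `submit`, `queryJobStatus`, `cancelJob`) follow the proposal in the snippet; all the bodies below are mock behavior for illustration, not Flink code.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory model of the FlinkConf-driven flow proposed above. The names
// follow the snippet; everything else is a mock for illustration.
public class FlinkConfSketch {

    static class FlinkConf {
        private String mode = "local";
        FlinkConf mode(String mode) { this.mode = mode; return this; }
        String getMode() { return mode; }
    }

    static class ClusterClient {
        private final Map<String, String> status = new HashMap<>();

        String submit(String job) {
            String id = "job-" + status.size();
            status.put(id, "RUNNING");
            return id;
        }
        String queryJobStatus(String jobId) { return status.get(jobId); }
        void cancelJob(String jobId) { status.put(jobId, "CANCELED"); }
    }

    static class ExecutionEnvironment {
        private final ClusterClient client;

        ExecutionEnvironment(FlinkConf conf) {
            // A real implementation would pick a yarn/local/remote client
            // based on conf.getMode(); this mock always uses one client.
            this.client = new ClusterClient();
        }
        String submit(String job) { return client.submit(job); }
        ClusterClient getClusterClient() { return client; }
    }

    public static void main(String[] args) {
        FlinkConf conf = new FlinkConf().mode("yarn");
        ExecutionEnvironment env = new ExecutionEnvironment(conf);
        String jobId = env.submit("my-job");
        System.out.println(env.getClusterClient().queryJobStatus(jobId)); // RUNNING
        env.getClusterClient().cancelJob(jobId);
        System.out.println(env.getClusterClient().queryJobStatus(jobId)); // CANCELED
    }
}
```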
> > > > > >
> > > > > > Till Rohrmann <trohrmann@apache.org> wrote on Tue, Dec 11, 2018 at 6:28 PM:
> > > > > > >
> > > > > > > Hi Jeff,
> > > > > > >
> > > > > > > what you are proposing is to provide the user with better
> > > > > > > programmatic job control. There was actually an effort to achieve
> > > > > > > this but it has never been completed [1]. However, there are some
> > > > > > > improvements in the code base now. Look for example at the
> > > > > > > NewClusterClient interface which offers a non-blocking job
> > > > > > > submission. But I agree that we need to improve Flink in this
> > > > > > > regard.
> > > > > > >
> > > > > > > I would not be in favour of exposing all ClusterClient calls via
> > > > > > > the ExecutionEnvironment because it would clutter the class and
> > > > > > > would not be a good separation of concerns. Instead one idea could
> > > > > > > be to retrieve the current ClusterClient from the
> > > > > > > ExecutionEnvironment which can then be used for cluster and job
> > > > > > > control. But before we start an effort here, we need to agree and
> > > > > > > capture what functionality we want to provide.
> > > > > > >
> > > > > > > Initially, the idea was that we have the ClusterDescriptor
> > > > > > > describing how to talk to a cluster manager like Yarn or Mesos.
> > > > > > > The ClusterDescriptor can be used for deploying Flink clusters
> > > > > > > (job and session) and gives you a ClusterClient. The ClusterClient
> > > > > > > controls the cluster (e.g. submitting jobs, listing all running
> > > > > > > jobs). And then there was the idea to introduce a JobClient which
> > > > > > > you obtain from the ClusterClient to trigger job specific
> > > > > > > operations (e.g. taking a savepoint, cancelling the job).
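The three-level layering described above (descriptor deploys a cluster, cluster client controls the cluster, job client controls a single job) can be sketched as below. The method signatures and the `StandaloneDescriptor` implementation are assumptions made for this illustration, not the actual Flink interfaces.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the layering described above: ClusterDescriptor ->
// ClusterClient -> JobClient. Signatures are illustrative assumptions.
public class ClientLayeringSketch {

    /** Controls one job: savepoints, cancellation. */
    interface JobClient {
        String jobId();
        void cancel();
    }

    /** Controls one cluster: submits jobs, lists running jobs. */
    interface ClusterClient {
        JobClient submitJob(String jobName);
        List<String> listJobs();
    }

    /** Knows how to talk to a cluster manager (Yarn, Mesos, standalone). */
    interface ClusterDescriptor {
        ClusterClient deploySessionCluster();
    }

    /** Trivial in-memory implementation to make the layering concrete. */
    static class StandaloneDescriptor implements ClusterDescriptor {
        public ClusterClient deploySessionCluster() {
            final List<String> jobs = new ArrayList<>();
            return new ClusterClient() {
                public JobClient submitJob(final String jobName) {
                    jobs.add(jobName);
                    return new JobClient() {
                        public String jobId() { return jobName; }
                        public void cancel() { jobs.remove(jobName); }
                    };
                }
                public List<String> listJobs() { return jobs; }
            };
        }
    }

    public static void main(String[] args) {
        ClusterClient client = new StandaloneDescriptor().deploySessionCluster();
        JobClient job = client.submitJob("wordcount");
        System.out.println(client.listJobs()); // [wordcount]
        job.cancel();
        System.out.println(client.listJobs()); // []
    }
}
```

Each layer only hands out handles to the next: callers never reach past the JobClient into cluster internals.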
> > > > > > >
> > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-4272
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi Folks,
> > > > > > > >
> > > > > > > > I am trying to integrate flink into apache zeppelin, which is an
> > > > > > > > interactive notebook, and I hit several issues that are caused
> > > > > > > > by the flink client api. So I'd like to propose the following
> > > > > > > > changes for the flink client api.
> > > > > > > >
> > > > > > > > 1. Support nonblocking execution. Currently,
> > > > > > > > ExecutionEnvironment#execute is a blocking method which does 2
> > > > > > > > things: first submit the job and then wait for the job until it
> > > > > > > > is finished. I'd like to introduce a nonblocking execution
> > > > > > > > method like ExecutionEnvironment#submit which only submits the
> > > > > > > > job and then returns the jobId to the client, and allow the user
> > > > > > > > to query the job status via the jobId.
> > > > > > > >
> > > > > > > > 2. Add a cancel api in
> > > > > > > > ExecutionEnvironment/StreamExecutionEnvironment. Currently the
> > > > > > > > only way to cancel a job is via the cli (bin/flink); this is not
> > > > > > > > convenient for downstream projects to use. So I'd like to add a
> > > > > > > > cancel api in ExecutionEnvironment.
> > > > > > > >
> > > > > > > > 3. Add a savepoint api in
> > > > > > > > ExecutionEnvironment/StreamExecutionEnvironment. It is similar
> > > > > > > > to the cancel api; we should use ExecutionEnvironment as the
> > > > > > > > unified api for third parties to integrate with flink.
> > > > > > > >
> > > > > > > > 4. Add a listener for the job execution lifecycle, something
> > > > > > > > like the following, so that downstream projects can run custom
> > > > > > > > logic in the lifecycle of a job. E.g. Zeppelin would capture the
> > > > > > > > jobId after the job is submitted and then use this jobId to
> > > > > > > > cancel it later when necessary.
> > > > > > > >
> > > > > > > > public interface JobListener {
> > > > > > > >
> > > > > > > >   void onJobSubmitted(JobID jobId);
> > > > > > > >
> > > > > > > >   void onJobExecuted(JobExecutionResult jobResult);
> > > > > > > >
> > > > > > > >   void onJobCanceled(JobID jobId);
> > > > > > > > }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 5. Enable
>> session in
>>>>>>>>>>>>>>> ExecutionEnvironment.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Currently
>>>>>>>>>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> disabled,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> session is very
>> convenient
>>>>>>>> for
>>>>>>>>>>>> third
>>>>>>>>>>>>>>> party
>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> submitting
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jobs
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> continually.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I hope flink can
>> enable it
>>>>>>>>>> again.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 6. Unify all
>> flink client
>>>>>>>> api
>>>>>>>>>>> into
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is a long
>> term issue
>>>>>>>> which
>>>>>>>>>>>> needs
>>>>>>>>>>>>>>> more
>>>>>>>>>>>>>>>>>>>>>>>>>> careful
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thinking
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> design.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Currently some
>> of features
>>>>>>>> of
>>>>>>>>>>>> flink is
>>>>>>>>>>>>>>>>>>>>>>>> exposed
>>>>>>>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment,
>>>>>>>>>>>>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> some are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> exposed
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cli instead of
>> api, like
>>>>>>>> the
>>>>>>>>>>>> cancel and
>>>>>>>>>>>>>>>>>>>>>>>>>> savepoint I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mentioned
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> above.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> think the root
>> cause is
>>>>>>>> due to
>>>>>>>>>>> that
>>>>>>>>>>>>>> flink
>>>>>>>>>>>>>>>>>>>>>>>>> didn't
>>>>>>>>>>>>>>>>>>>>>>>>>>>> unify
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interaction
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> flink. Here I
>> list 3
>>>>>>>> scenarios
>>>>>>>>>> of
>>>>>>>>>>>> flink
>>>>>>>>>>>>>>>>>>>>>>>>> operation
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Local job
>> execution.
>>>>>>>> Flink
>>>>>>>>>>> will
>>>>>>>>>>>>>>> create
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> LocalEnvironment
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> then
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> use
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this
>> LocalEnvironment to
>>>>>>>>>> create
>>>>>>>>>>>>>>>>>>>>>>>> LocalExecutor
>>>>>>>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> execution.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Remote job
>> execution.
>>>>>>>> Flink
>>>>>>>>>>> will
>>>>>>>>>>>>>>> create
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> first
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> then
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> create
>> ContextEnvironment
>>>>>>>>>> based
>>>>>>>>>>>> on the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> then
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> run
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Job
>> cancelation. Flink
>>>>>>>> will
>>>>>>>>>>>> create
>>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> first
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> then
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cancel
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this job via
>> this
>>>>>>>>>> ClusterClient.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> As you can see
>> in the
>>>>>>>> above 3
>>>>>>>>>>>>>> scenarios.
>>>>>>>>>>>>>>>>>>>>>>>> Flink
>>>>>>>>>>>>>>>>>>>>>>>>>>> didn't
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> use the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> same
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> approach(code
>> path) to
>>>>>>>> interact
>>>>>>>>>>>> with
>>>>>>>>>>>>>>> flink
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What I propose is
>>>>>>>> following:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Create the proper
>>>>>>>>>>>>>>>>>>>>>>>>>>> LocalEnvironment/RemoteEnvironment
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (based
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> configuration)
>> --> Use this
>>>>>>>>>>>> Environment
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>> create
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> proper
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>> (LocalClusterClient or
>>>>>>>>>>>>>> RestClusterClient)
>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> interactive
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Flink (
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> execution or
>> cancelation)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This way we can
>> unify the
>>>>>>>>>> process
>>>>>>>>>>>> of
>>>>>>>>>>>>>>> local
>>>>>>>>>>>>>>>>>>>>>>>>>>> execution
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> remote
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> execution.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> And it is much
>> easier for
>>>>>>>> third
>>>>>>>>>>>> party
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>> integrate
>>>>>>>>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> flink,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> because
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>> ExecutionEnvironment is the
>>>>>>>>>>> unified
>>>>>>>>>>>>>> entry
>>>>>>>>>>>>>>>>>>>>>>>> point
>>>>>>>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> flink.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> third
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> party
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> needs to do is
>> just pass
>>>>>>>>>>>> configuration
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>> ExecutionEnvironment will
>>>>>>>> do
>>>>>>>>>> the
>>>>>>>>>>>> right
>>>>>>>>>>>>>>>>>>>>>>> thing
>>>>>>>>>>>>>>>>>>>>>>>>>> based
>>>>>>>>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> configuration.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Flink cli can
>> also be
>>>>>>>>>> considered
>>>>>>>>>>> as
>>>>>>>>>>>>>> flink
>>>>>>>>>>>>>>>>>>>>>>> api
>>>>>>>>>>>>>>>>>>>>>>>>>>>> consumer.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> just
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pass
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> configuration to
>>>>>>>>>>>> ExecutionEnvironment
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>> let
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> create the proper
>>>>>>>> ClusterClient
>>>>>>>>>>>> instead
>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>>>> letting
>>>>>>>>>>>>>>>>>>>>>>>>>>>> cli
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> create
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ClusterClient
>> directly.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 6 would involve
>> large code
>>>>>>>>>>>> refactoring,
>>>>>>>>>>>>>>> so
>>>>>>>>>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>>>>>>>>>>>> think
>>>>>>>>>>>>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> defer
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> future release,
>> 1,2,3,4,5
>>>>>>>> could
>>>>>>>>>>> be
>>>>>>>>>>>> done
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>> once I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> believe.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> me
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> know
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> your
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> comments and
>> feedback,
>>>>>>>> thanks
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Best Regards
>>>>>>> 
>>>>>>> Jeff Zhang
>> 
>> 
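The JobListener proposed above can be exercised without any Flink dependency. The sketch below is a minimal, self-contained illustration of how a host application such as Zeppelin might track submitted job ids for later cancelation; Flink's JobID and JobExecutionResult types are replaced by plain Strings, so everything beyond the listener shape is an assumption for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins: the real proposal uses Flink's JobID and
// JobExecutionResult types; plain Strings are used here so the sketch
// runs without any Flink dependency.
interface JobListener {
    void onJobSubmitted(String jobId);
    void onJobExecuted(String jobResult);
    void onJobCanceled(String jobId);
}

// A listener that records submitted job ids so the host application
// (e.g. a notebook frontend) can cancel the jobs later.
class TrackingJobListener implements JobListener {
    final List<String> runningJobs = new ArrayList<>();

    public void onJobSubmitted(String jobId) { runningJobs.add(jobId); }
    public void onJobExecuted(String jobResult) { /* e.g. display result */ }
    public void onJobCanceled(String jobId) { runningJobs.remove(jobId); }
}

public class Main {
    public static void main(String[] args) {
        TrackingJobListener listener = new TrackingJobListener();
        listener.onJobSubmitted("job-1");
        listener.onJobSubmitted("job-2");
        listener.onJobCanceled("job-1");
        System.out.println(listener.runningJobs); // [job-2]
    }
}
```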


Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Zili Chen <wa...@gmail.com>.
Hi Aljoscha,

I'd like to gather all the ideas here and in the documents, and draft a
formal FLIP
that keeps us on the same page. Hopefully I can start a FLIP thread next week.

For the implementation, or rather the POC part, I'd like to work with you
guys who
proposed the concept of an Executor, to make sure that we go in the same
direction. I'm
wondering whether a dedicated thread or a Slack group is the proper venue.
In my opinion
we can involve the team in a Slack group, start our branch concurrently with
the FLIP
process, and once we reach a consensus on the FLIP, open an umbrella issue
for the
framework and start the subtasks. What do you think?

Best,
tison.


Aljoscha Krettek <al...@apache.org> 于2019年9月5日周四 下午9:39写道:

> Hi Tison,
>
> To keep this moving forward, maybe you want to start working on a proof of
> concept implementation for the new JobClient interface, maybe with a new
> method executeAsync() in the environment that returns the JobClient and
> implement the methods to see how that works and to see where we get. Would
> you be interested in that?
>
> Also, at some point we should collect all the ideas and start forming an
> actual FLIP.
>
> Best,
> Aljoscha
>
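To make the executeAsync() suggestion concrete, here is a small self-contained sketch of what an environment method returning a JobClient could look like. The names executeAsync and JobClient are taken from this thread, not from a released Flink API, and SketchEnvironment is a toy stand-in rather than a real execution environment.

```java
import java.util.concurrent.CompletableFuture;

// Only the shape of the discussed API is shown; names follow the
// thread, not released Flink code.
interface JobClient {
    String getJobId();
    CompletableFuture<String> getJobExecutionResult();
    CompletableFuture<Void> cancel();
}

class SketchEnvironment {
    // executeAsync() returns a handle immediately instead of blocking
    // until the job finishes, as a plain execute() would.
    JobClient executeAsync(String jobName) {
        CompletableFuture<String> result =
            CompletableFuture.completedFuture(jobName + ": FINISHED");
        return new JobClient() {
            public String getJobId() { return "job-" + jobName; }
            public CompletableFuture<String> getJobExecutionResult() { return result; }
            public CompletableFuture<Void> cancel() {
                return CompletableFuture.completedFuture(null);
            }
        };
    }
}

public class Main {
    public static void main(String[] args) {
        JobClient client = new SketchEnvironment().executeAsync("wordcount");
        System.out.println(client.getJobId());                     // job-wordcount
        System.out.println(client.getJobExecutionResult().join()); // wordcount: FINISHED
    }
}
```

The caller decides whether to block (join/get on the future) or continue, which also covers what the "detached" mode used to provide.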
> > On 4. Sep 2019, at 12:04, Zili Chen <wa...@gmail.com> wrote:
> >
> > Thanks for your update Kostas!
> >
> > It looks good to me that clean up existing code paths as first
> > pass. I'd like to help on review and file subtasks if I find ones.
> >
> > Best,
> > tison.
> >
> >
> > Kostas Kloudas <kk...@gmail.com> 于2019年9月4日周三 下午5:52写道:
> > Here is the issue, and I will keep on updating it as I find more issues.
> >
> > https://issues.apache.org/jira/browse/FLINK-13954
> >
> > This will also cover the refactoring of the Executors that we discussed
> > in this thread, without any additional functionality (such as the job
> client).
> >
> > Kostas
> >
> > On Wed, Sep 4, 2019 at 11:46 AM Kostas Kloudas <kk...@gmail.com>
> wrote:
> > >
> > > Great idea Tison!
> > >
> > > I will create the umbrella issue and post it here so that we are all
> > > on the same page!
> > >
> > > Cheers,
> > > Kostas
> > >
> > > On Wed, Sep 4, 2019 at 11:36 AM Zili Chen <wa...@gmail.com>
> wrote:
> > > >
> > > > Hi Kostas & Aljoscha,
> > > >
> > > > I notice that there is a JIRA(FLINK-13946) which could be included
> > > > in this refactor thread. Since we agree on most of directions in
> > > > big picture, is it reasonable that we create an umbrella issue for
> > > > refactor client APIs and also linked subtasks? It would be a better
> > > > way that we join forces of our community.
> > > >
> > > > Best,
> > > > tison.
> > > >
> > > >
> > > > Zili Chen <wa...@gmail.com> 于2019年8月31日周六 下午12:52写道:
> > > >>
> > > >> Great Kostas! Looking forward to your POC!
> > > >>
> > > >> Best,
> > > >> tison.
> > > >>
> > > >>
> > > >> Jeff Zhang <zj...@gmail.com> 于2019年8月30日周五 下午11:07写道:
> > > >>>
> > > >>> Awesome, @Kostas! Looking forward to your POC.
> > > >>>
> > > >>> Kostas Kloudas <kk...@gmail.com> 于2019年8月30日周五 下午8:33写道:
> > > >>>
> > > >>> > Hi all,
> > > >>> >
> > > >>> > I am just writing here to let you know that I am working on a
> POC that
> > > >>> > tries to refactor the current state of job submission in Flink.
> > > >>> > I want to stress that it introduces NO CHANGES to the current
> > > >>> > behaviour of Flink. It just re-arranges things and introduces the
> > > >>> > notion of an Executor, which is the entity responsible for
> taking the
> > > >>> > user-code and submitting it for execution.
> > > >>> >
> > > >>> > Given this, the discussion about the functionality that the
> JobClient
> > > >>> > will expose to the user can go on independently and the same
> > > >>> > holds for all the open questions so far.
> > > >>> >
> > > >>> > I hope I will have some more news to share soon.
> > > >>> >
> > > >>> > Thanks,
> > > >>> > Kostas
> > > >>> >
> > > >>> > On Mon, Aug 26, 2019 at 4:20 AM Yang Wang <da...@gmail.com>
> wrote:
> > > >>> > >
> > > >>> > > Hi Zili,
> > > >>> > >
> > >>> > > It makes sense to me that a dedicated cluster is started for a
> > >>> > > per-job cluster and will not accept more jobs.
> > > >>> > > Just have a question about the command line.
> > > >>> > >
> > > >>> > > Currently we could use the following commands to start
> different
> > > >>> > clusters.
> > > >>> > > *per-job cluster*
> > > >>> > > ./bin/flink run -d -p 5 -ynm perjob-cluster1 -m yarn-cluster
> > > >>> > > examples/streaming/WindowJoin.jar
> > > >>> > > *session cluster*
> > > >>> > > ./bin/flink run -p 5 -ynm session-cluster1 -m yarn-cluster
> > > >>> > > examples/streaming/WindowJoin.jar
> > > >>> > >
> > > >>> > > What will it look like after client enhancement?
> > > >>> > >
> > > >>> > >
> > > >>> > > Best,
> > > >>> > > Yang
> > > >>> > >
> > > >>> > > Zili Chen <wa...@gmail.com> 于2019年8月23日周五 下午10:46写道:
> > > >>> > >
> > > >>> > > > Hi Till,
> > > >>> > > >
> > > >>> > > > Thanks for your update. Nice to hear :-)
> > > >>> > > >
> > > >>> > > > Best,
> > > >>> > > > tison.
> > > >>> > > >
> > > >>> > > >
> > > >>> > > > Till Rohrmann <tr...@apache.org> 于2019年8月23日周五
> 下午10:39写道:
> > > >>> > > >
> > > >>> > > > > Hi Tison,
> > > >>> > > > >
> > > >>> > > > > just a quick comment concerning the class loading issues
> when using
> > > >>> > the
> > > >>> > > > per
> > > >>> > > > > job mode. The community wants to change it so that the
> > > >>> > > > > StandaloneJobClusterEntryPoint actually uses the user code
> class
> > > >>> > loader
> > > >>> > > > > with child first class loading [1]. Hence, I hope that
> this problem
> > > >>> > will
> > > >>> > > > be
> > > >>> > > > > resolved soon.
> > > >>> > > > >
> > > >>> > > > > [1] https://issues.apache.org/jira/browse/FLINK-13840
> > > >>> > > > >
> > > >>> > > > > Cheers,
> > > >>> > > > > Till
> > > >>> > > > >
> > > >>> > > > > On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas <
> kkloudas@gmail.com>
> > > >>> > > > wrote:
> > > >>> > > > >
> > > >>> > > > > > Hi all,
> > > >>> > > > > >
> > > >>> > > > > > On the topic of web submission, I agree with Till that
> it only
> > > >>> > seems
> > > >>> > > > > > to complicate things.
> > > >>> > > > > > It is bad for security, job isolation (anybody can
> submit/cancel
> > > >>> > jobs),
> > > >>> > > > > > and its
> > >>> > > > > > implementation complicates some parts of the code. So, if we
> > >>> > > > > > were to redesign the WebUI, maybe this part could be left
> > >>> > > > > > out. In addition, I would say
> > > >>> > > > > > that the ability to cancel
> > > >>> > > > > > jobs could also be left out.
> > > >>> > > > > >
> > >>> > > > > > I would also be in favour of removing the
> "detached" mode, for
> > > >>> > > > > > the reasons mentioned
> > > >>> > > > > > above (i.e. because now we will have a future
> representing the
> > > >>> > result
> > > >>> > > > > > on which the user
> > > >>> > > > > > can choose to wait or not).
> > > >>> > > > > >
> > > >>> > > > > > Now for the separating job submission and cluster
> creation, I am in
> > > >>> > > > > > favour of keeping both.
> > > >>> > > > > > Once again, the reasons are mentioned above by Stephan,
> Till,
> > > >>> > Aljoscha
> > > >>> > > > > > and also Zili seems
> > > >>> > > > > > to agree. They mainly have to do with security,
> isolation and ease
> > > >>> > of
> > > >>> > > > > > resource management
> > > >>> > > > > > for the user as he knows that "when my job is done,
> everything
> > > >>> > will be
> > > >>> > > > > > cleared up". This is
> > > >>> > > > > > also the experience you get when launching a process on
> your local
> > > >>> > OS.
> > > >>> > > > > >
> > > >>> > > > > > On excluding the per-job mode from returning a JobClient
> or not, I
> > > >>> > > > > > believe that eventually
> > > >>> > > > > > it would be nice to allow users to get back a jobClient.
> The
> > > >>> > reason is
> > > >>> > > > > > that 1) I cannot
> > > >>> > > > > > find any objective reason why the user-experience should
> diverge,
> > > >>> > and
> > > >>> > > > > > 2) this will be the
> > > >>> > > > > > way that the user will be able to interact with his
> running job.
> > > >>> > > > > > Assuming that the necessary
> > > >>> > > > > > ports are open for the REST API to work, then I think
> that the
> > > >>> > > > > > JobClient can run against the
> > > >>> > > > > > REST API without problems. If the needed ports are not
> open, then
> > > >>> > we
> > > >>> > > > > > are safe to not return
> > > >>> > > > > > a JobClient, as the user explicitly chose to close all
> points of
> > > >>> > > > > > communication to his running job.
> > > >>> > > > > >
> > > >>> > > > > > On the topic of not hijacking the "env.execute()" in
> order to get
> > > >>> > the
> > > >>> > > > > > Plan, I definitely agree but
> > > >>> > > > > > for the proposal of having a "compile()" method in the
> env, I would
> > > >>> > > > > > like to have a better look at
> > > >>> > > > > > the existing code.
> > > >>> > > > > >
> > > >>> > > > > > Cheers,
> > > >>> > > > > > Kostas
> > > >>> > > > > >
> > > >>> > > > > > On Fri, Aug 23, 2019 at 5:52 AM Zili Chen <
> wander4096@gmail.com>
> > > >>> > > > wrote:
> > > >>> > > > > > >
> > > >>> > > > > > > Hi Yang,
> > > >>> > > > > > >
> > > >>> > > > > > > It would be helpful if you check Stephan's last
> comment,
> > > >>> > > > > > > which states that isolation is important.
> > > >>> > > > > > >
> > >>> > > > > > > For per-job mode, we run a dedicated cluster (maybe it
> > >>> > > > > > > should have been a couple of JMs and TMs during the FLIP-6
> > >>> > > > > > > design) for a specific job. Thus the process is isolated
> > >>> > > > > > > from other jobs.
> > > >>> > > > > > >
> > > >>> > > > > > > In our cases there was a time we suffered from multi
> > > >>> > > > > > > jobs submitted by different users and they affected
> > > >>> > > > > > > each other so that all ran into an error state. Also,
> > > >>> > > > > > > run the client inside the cluster could save client
> > > >>> > > > > > > resource at some points.
> > > >>> > > > > > >
> > > >>> > > > > > > However, we also face several issues as you mentioned,
> > > >>> > > > > > > that in per-job mode it always uses parent classloader
> > > >>> > > > > > > thus classloading issues occur.
> > > >>> > > > > > >
> > > >>> > > > > > > BTW, one can makes an analogy between session/per-job
> mode
> > > >>> > > > > > > in  Flink, and client/cluster mode in Spark.
> > > >>> > > > > > >
> > > >>> > > > > > > Best,
> > > >>> > > > > > > tison.
> > > >>> > > > > > >
> > > >>> > > > > > >
> > > >>> > > > > > > Yang Wang <da...@gmail.com> 于2019年8月22日周四
> 上午11:25写道:
> > > >>> > > > > > >
> > >>> > > > > > > > From the user's perspective, the scope of a per-job
> > >>> > > > > > > > cluster is really confusing.
> > > >>> > > > > > > >
> > > >>> > > > > > > >
> > > >>> > > > > > > > If it means a flink cluster with single job, so that
> we could
> > > >>> > get
> > > >>> > > > > > better
> > > >>> > > > > > > > isolation.
> > > >>> > > > > > > >
> > > >>> > > > > > > > Now it does not matter how we deploy the cluster,
> directly
> > > >>> > > > > > deploy(mode1)
> > > >>> > > > > > > >
> > > >>> > > > > > > > or start a flink cluster and then submit job through
> cluster
> > > >>> > > > > > client(mode2).
> > > >>> > > > > > > >
> > > >>> > > > > > > >
> > > >>> > > > > > > > Otherwise, if it just means directly deploy, how
> should we
> > > >>> > name the
> > > >>> > > > > > mode2,
> > > >>> > > > > > > >
> > > >>> > > > > > > > session with job or something else?
> > > >>> > > > > > > >
> > > >>> > > > > > > > We could also benefit from the mode2. Users could
> get the same
> > > >>> > > > > > isolation
> > > >>> > > > > > > > with mode1.
> > > >>> > > > > > > >
> > > >>> > > > > > > > The user code and dependencies will be loaded by
> user class
> > > >>> > loader
> > > >>> > > > > > > >
> > > >>> > > > > > > > to avoid class conflict with framework.
> > > >>> > > > > > > >
> > > >>> > > > > > > >
> > > >>> > > > > > > >
> > > >>> > > > > > > > Anyway, both of the two submission modes are useful.
> > > >>> > > > > > > >
> > > >>> > > > > > > > We just need to clarify the concepts.
> > > >>> > > > > > > >
> > > >>> > > > > > > >
> > > >>> > > > > > > >
> > > >>> > > > > > > >
> > > >>> > > > > > > > Best,
> > > >>> > > > > > > >
> > > >>> > > > > > > > Yang
> > > >>> > > > > > > >
> > > >>> > > > > > > > Zili Chen <wa...@gmail.com> 于2019年8月20日周二
> 下午5:58写道:
> > > >>> > > > > > > >
> > > >>> > > > > > > > > Thanks for the clarification.
> > > >>> > > > > > > > >
> > >>> > > > > > > > > The idea of a JobDeployer came into my mind when I was
> > >>> > > > > > > > > muddled about how to execute per-job mode and session
> > >>> > > > > > > > > mode with the same user code and framework code path.
> > > >>> > > > > > > > >
> > > >>> > > > > > > > > With the concept JobDeployer we back to the
> statement that
> > > >>> > > > > > environment
> > > >>> > > > > > > > > knows every configs of cluster deployment and job
> > > >>> > submission. We
> > > >>> > > > > > > > > configure or generate from configuration a specific
> > > >>> > JobDeployer
> > > >>> > > > in
> > > >>> > > > > > > > > environment and then code align on
> > > >>> > > > > > > > >
> > > >>> > > > > > > > > *JobClient client = env.execute().get();*
> > > >>> > > > > > > > >
> > > >>> > > > > > > > > which in session mode returned by
> clusterClient.submitJob
> > > >>> > and in
> > > >>> > > > > > per-job
> > > >>> > > > > > > > > mode returned by
> clusterDescriptor.deployJobCluster.
> > > >>> > > > > > > > >
> > >>> > > > > > > > > Here comes a problem: currently we directly run the
> > >>> > > > > > > > > ClusterEntrypoint with the extracted job graph. Following
> > >>> > > > > > > > > the JobDeployer way, we'd better align the entry point of
> > >>> > > > > > > > > per-job deployment at the JobDeployer. Users run their
> > >>> > > > > > > > > main method, or a Cli (which finally calls the main
> > >>> > > > > > > > > method), to deploy the job cluster.
> > > >>> > > > > > > > >
> > > >>> > > > > > > > > Best,
> > > >>> > > > > > > > > tison.
> > > >>> > > > > > > > >
> > > >>> > > > > > > > >
> > > >>> > > > > > > > > Stephan Ewen <se...@apache.org> 于2019年8月20日周二
> 下午4:40写道:
> > > >>> > > > > > > > >
> > > >>> > > > > > > > > > Till has made some good comments here.
> > > >>> > > > > > > > > >
> > > >>> > > > > > > > > > Two things to add:
> > > >>> > > > > > > > > >
> > > >>> > > > > > > > > >   - The job mode is very nice in the way that it
> runs the
> > > >>> > > > client
> > > >>> > > > > > inside
> > > >>> > > > > > > > > the
> > > >>> > > > > > > > > > cluster (in the same image/process that is the
> JM) and thus
> > > >>> > > > > unifies
> > > >>> > > > > > > > both
> > > >>> > > > > > > > > > applications and what the Spark world calls the
> "driver
> > > >>> > mode".
> > > >>> > > > > > > > > >
> > > >>> > > > > > > > > >   - Another thing I would add is that during the
> FLIP-6
> > > >>> > design,
> > > >>> > > > > we
> > > >>> > > > > > were
> > > >>> > > > > > > > > > thinking about setups where Dispatcher and
> JobManager are
> > > >>> > > > > separate
> > > >>> > > > > > > > > > processes.
> > > >>> > > > > > > > > >     A Yarn or Mesos Dispatcher of a session
> could run
> > > >>> > > > > independently
> > > >>> > > > > > > > (even
> > > >>> > > > > > > > > > as privileged processes executing no code).
> > > >>> > > > > > > > > >     Then you the "per-job" mode could still be
> helpful:
> > > >>> > when a
> > > >>> > > > > job
> > > >>> > > > > > is
> > > >>> > > > > > > > > > submitted to the dispatcher, it launches the JM
> again in a
> > > >>> > > > > per-job
> > > >>> > > > > > > > mode,
> > > >>> > > > > > > > > so
> > > >>> > > > > > > > > > that JM and TM processes are bound to teh job
> only. For
> > > >>> > higher
> > > >>> > > > > > security
> > > >>> > > > > > > > > > setups, it is important that processes are not
> reused
> > > >>> > across
> > > >>> > > > > jobs.
> > > >>> > > > > > > > > >
> > > >>> > > > > > > > > > On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <
> > > >>> > > > > > trohrmann@apache.org>
> > > >>> > > > > > > > > > wrote:
> > > >>> > > > > > > > > >
> > > >>> > > > > > > > > > > I would not be in favour of getting rid of the
> per-job
> > > >>> > mode
> > > >>> > > > > > since it
> > > >>> > > > > > > > > > > simplifies the process of running Flink jobs
> > > >>> > considerably.
> > > >>> > > > > > Moreover,
> > > >>> > > > > > > > it
> > > >>> > > > > > > > > > is
> > > >>> > > > > > > > > > > not only well suited for container deployments
> but also
> > > >>> > for
> > > >>> > > > > > > > deployments
> > > >>> > > > > > > > > > > where you want to guarantee job isolation. For
> example, a
> > > >>> > > > user
> > > >>> > > > > > could
> > > >>> > > > > > > > > use
> > > >>> > > > > > > > > > > the per-job mode on Yarn to execute his job on
> a separate
> > > >>> > > > > > cluster.
> > > >>> > > > > > > > > > >
> > > >>> > > > > > > > > > > I think that having two notions of cluster
> deployments
> > > >>> > > > (session
> > > >>> > > > > > vs.
> > > >>> > > > > > > > > > per-job
> > > >>> > > > > > > > > > > mode) does not necessarily contradict your
> ideas for the
> > > >>> > > > client
> > > >>> > > > > > api
> > > >>> > > > > > > > > > > refactoring. For example one could have the
> following
> > > >>> > > > > interfaces:
> > > >>> > > > > > > > > > >
> > > >>> > > > > > > > > > > - ClusterDeploymentDescriptor: encapsulates
> the logic
> > > >>> > how to
> > > >>> > > > > > deploy a
> > > >>> > > > > > > > > > > cluster.
> > > >>> > > > > > > > > > > - ClusterClient: allows to interact with a
> cluster
> > > >>> > > > > > > > > > > - JobClient: allows to interact with a running
> job
> > > >>> > > > > > > > > > >
> > > >>> > > > > > > > > > > Now the ClusterDeploymentDescriptor could have
> two
> > > >>> > methods:
> > > >>> > > > > > > > > > >
> > > >>> > > > > > > > > > > - ClusterClient deploySessionCluster()
> > > >>> > > > > > > > > > > - JobClusterClient/JobClient
> > > >>> > deployPerJobCluster(JobGraph)
> > > >>> > > > > > > > > > >
> > > >>> > > > > > > > > > > where JobClusterClient is either a supertype of
> > > >>> > ClusterClient
> > > >>> > > > > > which
> > > >>> > > > > > > > > does
> > > >>> > > > > > > > > > > not give you the functionality to submit jobs
> or
> > > >>> > > > > > deployPerJobCluster
> > > >>> > > > > > > > > > > returns directly a JobClient.
> > > >>> > > > > > > > > > >
> > > >>> > > > > > > > > > > When setting up the ExecutionEnvironment, one
> would then
> > > >>> > not
> > > >>> > > > > > provide
> > > >>> > > > > > > > a
> > > >>> > > > > > > > > > > ClusterClient to submit jobs but a JobDeployer
> which,
> > > >>> > > > depending
> > > >>> > > > > > on
> > > >>> > > > > > > > the
> > > >>> > > > > > > > > > > selected mode, either uses a ClusterClient
> (session
> > > >>> > mode) to
> > > >>> > > > > > submit
> > > >>> > > > > > > > > jobs
> > > >>> > > > > > > > > > or
> > > >>> > > > > > > > > > > a ClusterDeploymentDescriptor to deploy a per-job
> mode
> > > >>> > > > cluster
> > > >>> > > > > > with
> > > >>> > > > > > > > the
> > > >>> > > > > > > > > > job
> > > >>> > > > > > > > > > > to execute.
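The interfaces sketched above can be made concrete with a small Java sketch. The type names (ClusterDeploymentDescriptor, ClusterClient, JobClient) come from this mail; the String stand-in for JobGraph and the in-memory implementation are assumptions added purely for illustration, not Flink's real classes.

```java
import java.util.ArrayList;
import java.util.List;

// JobGraph is stubbed as a String here; a real version would use Flink's JobGraph.
interface JobClient {
    String getJobId();
}

interface ClusterClient {
    JobClient submitJob(String jobGraph);
}

interface ClusterDeploymentDescriptor {
    // Deploys a long-running session cluster that accepts many jobs.
    ClusterClient deploySessionCluster();
    // Deploys a cluster dedicated to a single job; no further submissions.
    JobClient deployPerJobCluster(String jobGraph);
}

class DummyDeploymentDescriptor implements ClusterDeploymentDescriptor {
    final List<String> deployedJobs = new ArrayList<>();

    @Override
    public ClusterClient deploySessionCluster() {
        // The session cluster client can be reused for any number of jobs.
        return jobGraph -> {
            deployedJobs.add(jobGraph);
            return () -> "session-" + jobGraph;
        };
    }

    @Override
    public JobClient deployPerJobCluster(String jobGraph) {
        // The per-job path returns a JobClient directly, skipping ClusterClient.
        deployedJobs.add(jobGraph);
        return () -> "per-job-" + jobGraph;
    }
}
```

Note the asymmetry this makes visible: only the session path hands out a ClusterClient that can submit further jobs, which is exactly the distinction between JobClusterClient and ClusterClient discussed above.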
> > > >>> > > > > > > > > > >
> > > >>> > > > > > > > > > > These are just some thoughts how one could
> make it
> > > >>> > working
> > > >>> > > > > > because I
> > > >>> > > > > > > > > > > believe there is some value in using the per
> job mode
> > > >>> > from
> > > >>> > > > the
> > > >>> > > > > > > > > > > ExecutionEnvironment.
> > > >>> > > > > > > > > > >
> > > >>> > > > > > > > > > > Concerning the web submission, this is indeed
> a bit
> > > >>> > tricky.
> > > >>> > > > > From
> > > >>> > > > > > a
> > > >>> > > > > > > > > > cluster
> > > >>> > > > > > > > > > > management standpoint, I would be in favour of
> not
> > > >>> > executing
> > > >>> > > > user
> > > >>> > > > > > code
> > > >>> > > > > > > > on
> > > >>> > > > > > > > > > the
> > > >>> > > > > > > > > > > REST endpoint. Especially when considering
> security, it
> > > >>> > would
> > > >>> > > > > be
> > > >>> > > > > > good
> > > >>> > > > > > > > > to
> > > >>> > > > > > > > > > > have a well defined cluster behaviour where it
> is
> > > >>> > explicitly
> > > >>> > > > > > stated
> > > >>> > > > > > > > > where
> > > >>> > > > > > > > > > > user code and, thus, potentially risky code is
> executed.
> > > >>> > > > > Ideally
> > > >>> > > > > > we
> > > >>> > > > > > > > > limit
> > > >>> > > > > > > > > > > it to the TaskExecutor and JobMaster.
> > > >>> > > > > > > > > > >
> > > >>> > > > > > > > > > > Cheers,
> > > >>> > > > > > > > > > > Till
> > > >>> > > > > > > > > > >
> > > >>> > > > > > > > > > > On Tue, Aug 20, 2019 at 9:40 AM Flavio
> Pompermaier <
> > > >>> > > > > > > > > pompermaier@okkam.it
> > > >>> > > > > > > > > > >
> > > >>> > > > > > > > > > > wrote:
> > > >>> > > > > > > > > > >
> > > >>> > > > > > > > > > > > In my opinion the client should not use any
> > > >>> > environment to
> > > >>> > > > > get
> > > >>> > > > > > the
> > > >>> > > > > > > > > Job
> > > >>> > > > > > > > > > > > graph because the jar should reside ONLY on
> the cluster
> > > >>> > > > (and
> > > >>> > > > > > not in
> > > >>> > > > > > > > > the
> > > >>> > > > > > > > > > > > client classpath otherwise there are always
> > > >>> > inconsistencies
> > > >>> > > > > > between
> > > >>> > > > > > > > > > > client
> > > >>> > > > > > > > > > > > and Flink Job manager's classpath).
> > > >>> > > > > > > > > > > > In the YARN, Mesos and Kubernetes scenarios
> you have
> > > >>> > the
> > > >>> > > > jar
> > > >>> > > > > > but
> > > >>> > > > > > > > you
> > > >>> > > > > > > > > > > could
> > > >>> > > > > > > > > > > > start a cluster that has the jar on the Job
> Manager as
> > > >>> > well
> > > >>> > > > > > (but
> > > >>> > > > > > > > this
> > > >>> > > > > > > > > > is
> > > >>> > > > > > > > > > > > the only case where I think you can assume
> that the
> > > >>> > client
> > > >>> > > > > has
> > > >>> > > > > > the
> > > >>> > > > > > > > > jar
> > > >>> > > > > > > > > > on
> > > >>> > > > > > > > > > > > the classpath; in the REST job submission
> you don't
> > > >>> > have
> > > >>> > > > any
> > > >>> > > > > > > > > > classpath).
> > > >>> > > > > > > > > > > >
> > > >>> > > > > > > > > > > > Thus, always in my opinion, the JobGraph
> should be
> > > >>> > > > generated
> > > >>> > > > > > by the
> > > >>> > > > > > > > > Job
> > > >>> > > > > > > > > > > > Manager REST API.
> > > >>> > > > > > > > > > > >
> > > >>> > > > > > > > > > > >
> > > >>> > > > > > > > > > > > On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <
> > > >>> > > > > > wander4096@gmail.com>
> > > >>> > > > > > > > > > wrote:
> > > >>> > > > > > > > > > > >
> > > >>> > > > > > > > > > > >> I would like to involve Till & Stephan here
> to clarify
> > > >>> > > > some
> > > >>> > > > > > > > concept
> > > >>> > > > > > > > > of
> > > >>> > > > > > > > > > > >> per-job mode.
> > > >>> > > > > > > > > > > >>
> > > >>> > > > > > > > > > > >> The term per-job is one of modes a cluster
> could run
> > > >>> > on.
> > > >>> > > > It
> > > >>> > > > > is
> > > >>> > > > > > > > > mainly
> > > >>> > > > > > > > > > > >> aimed
> > > >>> > > > > > > > > > > >> at spawning
> > > >>> > > > > > > > > > > >> a dedicated cluster for a specific job
> while the job
> > > >>> > could
> > > >>> > > > > be
> > > >>> > > > > > > > > packaged
> > > >>> > > > > > > > > > > >> with
> > > >>> > > > > > > > > > > >> Flink
> > > >>> > > > > > > > > > > >> itself, and thus the cluster is initialized
> with the job so
> > > >>> > that
> > > >>> > > > it gets
> > > >>> > > > > > rid
> > > >>> > > > > > > > of
> > > >>> > > > > > > > > a
> > > >>> > > > > > > > > > > >> separate
> > > >>> > > > > > > > > > > >> submission step.
> > > >>> > > > > > > > > > > >>
> > > >>> > > > > > > > > > > >> This is useful for container deployments
> where one
> > > >>> > creates
> > > >>> > > > > his
> > > >>> > > > > > > > image
> > > >>> > > > > > > > > > with
> > > >>> > > > > > > > > > > >> the job
> > > >>> > > > > > > > > > > >> and then simply deploy the container.
> > > >>> > > > > > > > > > > >>
> > > >>> > > > > > > > > > > >> However, it is out of client scope since a
> > > >>> > > > > > client (ClusterClient,
> for
> > > >>> > > > > > > > > > > >> example) is for
> > > >>> > > > > > > > > > > >> communicating with an existing cluster and
> performing
> > > >>> > > > > actions.
> > > >>> > > > > > > > > > Currently,
> > > >>> > > > > > > > > > > >> in
> > > >>> > > > > > > > > > > >> per-job
> > > >>> > > > > > > > > > > >> mode, we extract the job graph and bundle
> it into
> > > >>> > cluster
> > > >>> > > > > > > > deployment
> > > >>> > > > > > > > > > and
> > > >>> > > > > > > > > > > >> thus no
> > > >>> > > > > > > > > > > >> concept of client get involved. It looks
> quite
> > > >>> > reasonable
> > > >>> > > > to
> > > >>> > > > > > > > exclude
> > > >>> > > > > > > > > > the
> > > >>> > > > > > > > > > > >> deployment
> > > >>> > > > > > > > > > > >> of per-job cluster from client api and use
> dedicated
> > > >>> > > > utility
> > > >>> > > > > > > > > > > >> classes(deployers) for
> > > >>> > > > > > > > > > > >> deployment.
> > > >>> > > > > > > > > > > >>
> > > >>> > > > > > > > > > > >> Zili Chen <wa...@gmail.com>
> 于2019年8月20日周二
> > > >>> > 下午12:37写道:
> > > >>> > > > > > > > > > > >>
> > > >>> > > > > > > > > > > >> > Hi Aljoscha,
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >> > Thanks for your reply and participation.
> The Google
> > > >>> > Doc
> > > >>> > > > you
> > > >>> > > > > > > > linked
> > > >>> > > > > > > > > to
> > > >>> > > > > > > > > > > >> > requires
> > > >>> > > > > > > > > > > >> > permission and I think you could use a
> share link
> > > >>> > > > instead.
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >> > I agree that we have almost reached a
> consensus that
> > > >>> > > > > > JobClient is
> > > >>> > > > > > > > > > > >> necessary
> > > >>> > > > > > > > > > > >> > to
> > > >>> > > > > > > > > > > >> > interact with a running Job.
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >> > Let me check your open questions one by
> one.
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >> > 1. Separate cluster creation and job
> submission for
> > > >>> > > > > per-job
> > > >>> > > > > > > > mode.
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >> > As you mentioned here is where the
> opinions
> > > >>> > diverge. In
> > > >>> > > > my
> > > >>> > > > > > > > > document
> > > >>> > > > > > > > > > > >> there
> > > >>> > > > > > > > > > > >> > is
> > > >>> > > > > > > > > > > >> > an alternative[2] that proposes excluding
> per-job
> > > >>> > > > > deployment
> > > >>> > > > > > > > from
> > > >>> > > > > > > > > > > client
> > > >>> > > > > > > > > > > >> > api
> > > >>> > > > > > > > > > > >> > scope and now I find it is more
> reasonable that we do the
> > > >>> > > > > > exclusion.
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >> > When in per-job mode, a dedicated
> JobCluster is
> > > >>> > launched
> > > >>> > > > > to
> > > >>> > > > > > > > > execute
> > > >>> > > > > > > > > > > the
> > > >>> > > > > > > > > > > >> > specific job. It is more like a Flink
> Application
> > > >>> > than a
> > > >>> > > > > > > > > submission
> > > >>> > > > > > > > > > > >> > of a Flink Job. The client only takes care of
> job
> > > >>> > submission
> > > >>> > > > and
> > > >>> > > > > > > > assumes
> > > >>> > > > > > > > > > > there
> > > >>> > > > > > > > > > > >> is
> > > >>> > > > > > > > > > > >> > an existing cluster. In this way we are
> able to
> > > >>> > consider
> > > >>> > > > > > per-job
> > > >>> > > > > > > > > > > issues
> > > >>> > > > > > > > > > > >> > individually and JobClusterEntrypoint
> would be the
> > > >>> > > > utility
> > > >>> > > > > > class
> > > >>> > > > > > > > > for
> > > >>> > > > > > > > > > > >> > per-job
> > > >>> > > > > > > > > > > >> > deployment.
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >> > Nevertheless, the user program works in both
> session
> > > >>> > mode
> > > >>> > > > and
> > > >>> > > > > > > > per-job
> > > >>> > > > > > > > > > mode
> > > >>> > > > > > > > > > > >> > without
> > > >>> > > > > > > > > > > >> > needing to change code. JobClient in
> per-job mode
> > > >>> > is
> > > >>> > > > > > returned
> > > >>> > > > > > > > > from
> > > >>> > > > > > > > > > > >> > env.execute as normal. However, it would
> no
> > > >>> > longer be a
> > > >>> > > > > > wrapper
> > > >>> > > > > > > > of
> > > >>> > > > > > > > > > > >> > RestClusterClient but a wrapper of
> > > >>> > PerJobClusterClient
> > > >>> > > > > which
> > > >>> > > > > > > > > > > >> communicates
> > > >>> > > > > > > > > > > >> > to Dispatcher locally.
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >> > 2. How to deal with plan preview.
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >> > With env.compile functions users can get
> JobGraph or
> > > >>> > > > > > FlinkPlan
> > > >>> > > > > > > > and
> > > >>> > > > > > > > > > > thus
> > > >>> > > > > > > > > > > >> > they can preview the plan with
> programming.
> > > >>> > Typically it
> > > >>> > > > > > looks
> > > >>> > > > > > > > > like
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >> > if (preview configured) {
> > > >>> > > > > > > > > > > >> >     FlinkPlan plan = env.compile();
> > > >>> > > > > > > > > > > >> >     new JSONDumpGenerator(...).dump(plan);
> > > >>> > > > > > > > > > > >> > } else {
> > > >>> > > > > > > > > > > >> >     env.execute();
> > > >>> > > > > > > > > > > >> > }
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >> > And `flink info` would no longer be
> valid.
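The branching sketched above can be written out with stand-in types. Note that env.compile() and JSONDumpGenerator are only names proposed in this thread; Env and FlinkPlan below are stubs added for illustration, not Flink's real classes.

```java
// Stand-ins for the proposed API; not the real Flink classes.
final class FlinkPlan {
    final String json;
    FlinkPlan(String json) { this.json = json; }
}

final class Env {
    // Proposed: build the plan without running the job.
    FlinkPlan compile() { return new FlinkPlan("{\"nodes\":[]}"); }
    // Run the job and report its outcome.
    String execute() { return "JOB_FINISHED"; }
}

class PreviewSketch {
    // Mirrors the if/else in the mail: dump the plan when previewing,
    // otherwise execute the program.
    static String run(Env env, boolean preview) {
        if (preview) {
            return env.compile().json; // would go through a dump generator
        }
        return env.execute();
    }
}
```

The point of the pattern is that plan preview becomes ordinary user code instead of a special CLI mode, which is why `flink info` loses its role.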
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >> > 3. How to deal with Jar Submission at the
> Web
> > > >>> > Frontend.
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >> > There is one more thread talked on this
> topic[1].
> > > >>> > Apart
> > > >>> > > > > from
> > > >>> > > > > > > > > > removing
> > > >>> > > > > > > > > > > >> > the functions there are two alternatives.
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >> > One is to introduce an interface with a
> method
> > > >>> > that returns
> > > >>> > > > > > > > > > > JobGraph/FlinkPlan,
> > > >>> > > > > > > > > > > >> > and Jar Submission only supports main classes
> > > >>> > that implement
> > > >>> > > > this
> > > >>> > > > > > > > > > interface.
> > > >>> > > > > > > > > > > >> > And then extract the JobGraph/FlinkPlan
> just by
> > > >>> > calling
> > > >>> > > > > the
> > > >>> > > > > > > > > method.
> > > >>> > > > > > > > > > > >> > In this way, it is even possible to
> consider a
> > > >>> > > > separation
> > > >>> > > > > > of job
> > > >>> > > > > > > > > > > >> creation
> > > >>> > > > > > > > > > > >> > and job submission.
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >> > The other is, as you mentioned, let
> execute() do the
> > > >>> > > > > actual
> > > >>> > > > > > > > > > execution.
> > > >>> > > > > > > > > > > >> > We won't execute the main method in the
> WebFrontend
> > > >>> > but
> > > >>> > > > > > spawn a
> > > >>> > > > > > > > > > > process
> > > >>> > > > > > > > > > > >> > at WebMonitor side to execute. For the return
> part we
> > > >>> > could
> > > >>> > > > > > generate
> > > >>> > > > > > > > > the
> > > >>> > > > > > > > > > > >> > JobID from WebMonitor and pass it to the
> execution
> > > >>> > > > > > environment.
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >> > 4. How to deal with detached mode.
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >> > I think detached mode is a temporary
> solution for
> > > >>> > > > > > non-blocking
> > > >>> > > > > > > > > > > >> submission.
> > > >>> > > > > > > > > > > >> > In my document both submission and
> execution return
> > > >>> > a
> > > >>> > > > > > > > > > > CompletableFuture
> > > >>> > > > > > > > > > > >> and
> > > >>> > > > > > > > > > > >> > users control whether or not to wait for the
> result. In
> > > >>> > > > this
> > > >>> > > > > > point
> > > >>> > > > > > > > we
> > > >>> > > > > > > > > > > don't
> > > >>> > > > > > > > > > > >> > need a detached option but the
> functionality is
> > > >>> > covered.
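The claim that a future subsumes the detached flag can be sketched as follows. JobClient is the name from this discussion; the submit() stub and the String result type are assumptions for illustration only.

```java
import java.util.concurrent.CompletableFuture;

interface JobClient {
    String getJobId();
    // Completes when the job reaches a terminal state.
    CompletableFuture<String> getJobResult();
}

class SubmissionSketch {
    // Submission itself is asynchronous; the returned JobClient lets the
    // caller decide whether to wait ("attached") or return right away
    // ("detached"), so no separate detached option is needed.
    static CompletableFuture<JobClient> submit(String jobGraph) {
        JobClient client = new JobClient() {
            public String getJobId() { return "job-" + jobGraph; }
            public CompletableFuture<String> getJobResult() {
                return CompletableFuture.completedFuture("FINISHED");
            }
        };
        return CompletableFuture.completedFuture(client);
    }
}
```

A "detached" caller stops after `submit(...).join()`; an "attached" caller additionally joins `getJobResult()`. The distinction becomes a call-site choice rather than a cluster-side mode.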
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >> > 5. How does per-job mode interact with
> interactive
> > > >>> > > > > > programming.
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >> > All of YARN, Mesos and Kubernetes
> scenarios follow
> > > >>> > the
> > > >>> > > > > > pattern
> > > >>> > > > > > > > > > of launching
> > > >>> > > > > > > > > > > a
> > > >>> > > > > > > > > > > >> > JobCluster now. And I don't think there
> would be
> > > >>> > > > > > inconsistency
> > > >>> > > > > > > > > > between
> > > >>> > > > > > > > > > > >> > different resource managers.
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >> > Best,
> > > >>> > > > > > > > > > > >> > tison.
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >> > [1]
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >>
> > > >>> > > > > > > > > > >
> > > >>> > > > > > > > > >
> > > >>> > > > > > > > >
> > > >>> > > > > > > >
> > > >>> > > > > >
> > > >>> > > > >
> > > >>> > > >
> > > >>> >
> https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
> > > >>> > > > > > > > > > > >> > [2]
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >>
> > > >>> > > > > > > > > > >
> > > >>> > > > > > > > > >
> > > >>> > > > > > > > >
> > > >>> > > > > > > >
> > > >>> > > > > >
> > > >>> > > > >
> > > >>> > > >
> > > >>> >
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >> > Aljoscha Krettek <al...@apache.org>
> > > >>> > 于2019年8月16日周五
> > > >>> > > > > > 下午9:20写道:
> > > >>> > > > > > > > > > > >> >
> > > >>> > > > > > > > > > > >> >> Hi,
> > > >>> > > > > > > > > > > >> >>
> > > >>> > > > > > > > > > > >> >> I read both Jeffs initial design
> document and the
> > > >>> > newer
> > > >>> > > > > > > > document
> > > >>> > > > > > > > > by
> > > >>> > > > > > > > > > > >> >> Tison. I also finally found the time to
> collect our
> > > >>> > > > > > thoughts on
> > > >>> > > > > > > > > the
> > > >>> > > > > > > > > > > >> issue,
> > > >>> > > > > > > > > > > >> >> I had quite some discussions with Kostas
> and this
> > > >>> > is
> > > >>> > > > the
> > > >>> > > > > > > > result:
> > > >>> > > > > > > > > > [1].
> > > >>> > > > > > > > > > > >> >>
> > > >>> > > > > > > > > > > >> >> I think overall we agree that this part
> of the
> > > >>> > code is
> > > >>> > > > in
> > > >>> > > > > > dire
> > > >>> > > > > > > > > need
> > > >>> > > > > > > > > > > of
> > > >>> > > > > > > > > > > >> >> some refactoring/improvements but I
> think there are
> > > >>> > > > still
> > > >>> > > > > > some
> > > >>> > > > > > > > > open
> > > >>> > > > > > > > > > > >> >> questions and some differences in
> opinion what
> > > >>> > those
> > > >>> > > > > > > > refactorings
> > > >>> > > > > > > > > > > >> should
> > > >>> > > > > > > > > > > >> >> look like.
> > > >>> > > > > > > > > > > >> >>
> > > >>> > > > > > > > > > > >> >> I think the API-side is quite clear,
> i.e. we need
> > > >>> > some
> > > >>> > > > > > > > JobClient
> > > >>> > > > > > > > > > API
> > > >>> > > > > > > > > > > >> that
> > > >>> > > > > > > > > > > >> >> allows interacting with a running Job.
> It could be
> > > >>> > > > > > worthwhile
> > > >>> > > > > > > > to
> > > >>> > > > > > > > > > spin
> > > >>> > > > > > > > > > > >> that
> > > >>> > > > > > > > > > > >> >> off into a separate FLIP because we can
> probably
> > > >>> > find
> > > >>> > > > > > consensus
> > > >>> > > > > > > > > on
> > > >>> > > > > > > > > > > that
> > > >>> > > > > > > > > > > >> >> part more easily.
> > > >>> > > > > > > > > > > >> >>
> > > >>> > > > > > > > > > > >> >> For the rest, the main open questions
> from our doc
> > > >>> > are
> > > >>> > > > > > these:
> > > >>> > > > > > > > > > > >> >>
> > > >>> > > > > > > > > > > >> >>   - Do we want to separate cluster
> creation and job
> > > >>> > > > > > submission
> > > >>> > > > > > > > > for
> > > >>> > > > > > > > > > > >> >> per-job mode? In the past, there were
> conscious
> > > >>> > efforts
> > > >>> > > > > to
> > > >>> > > > > > > > *not*
> > > >>> > > > > > > > > > > >> separate
> > > >>> > > > > > > > > > > >> >> job submission from cluster creation for
> per-job
> > > >>> > > > clusters
> > > >>> > > > > > for
> > > >>> > > > > > > > > > Mesos,
> > > >>> > > > > > > > > > > >> YARN,
> > > >>> > > > > > > > > > > >> >> Kubernetes (see
> StandaloneJobClusterEntryPoint).
> > > >>> > Tison
> > > >>> > > > > > suggests
> > > >>> > > > > > > > in
> > > >>> > > > > > > > > > his
> > > >>> > > > > > > > > > > >> >> design document to decouple this in
> order to unify
> > > >>> > job
> > > >>> > > > > > > > > submission.
> > > >>> > > > > > > > > > > >> >>
> > > >>> > > > > > > > > > > >> >>   - How to deal with plan preview, which
> needs to
> > > >>> > > > hijack
> > > >>> > > > > > > > > execute()
> > > >>> > > > > > > > > > > and
> > > >>> > > > > > > > > > > >> >> let the outside code catch an exception?
> > > >>> > > > > > > > > > > >> >>
> > > >>> > > > > > > > > > > >> >>   - How to deal with Jar Submission at
> the Web
> > > >>> > > > Frontend,
> > > >>> > > > > > which
> > > >>> > > > > > > > > > needs
> > > >>> > > > > > > > > > > to
> > > >>> > > > > > > > > > > >> >> hijack execute() and let the outside
> code catch an
> > > >>> > > > > > exception?
> > > >>> > > > > > > > > > > >> >> CliFrontend.run() “hijacks”
> > > >>> > > > > ExecutionEnvironment.execute()
> > > >>> > > > > > to
> > > >>> > > > > > > > > get a
> > > >>> > > > > > > > > > > >> >> JobGraph and then execute that JobGraph
> manually.
> > > >>> > We
> > > >>> > > > > could
> > > >>> > > > > > get
> > > >>> > > > > > > > > > around
> > > >>> > > > > > > > > > > >> that
> > > >>> > > > > > > > > > > >> >> by letting execute() do the actual
> execution. One
> > > >>> > > > caveat
> > > >>> > > > > > for
> > > >>> > > > > > > > this
> > > >>> > > > > > > > > > is
> > > >>> > > > > > > > > > > >> that
> > > >>> > > > > > > > > > > >> >> now the main() method doesn’t return (or
> is forced
> > > >>> > to
> > > >>> > > > > > return by
> > > >>> > > > > > > > > > > >> throwing an
> > > >>> > > > > > > > > > > >> >> exception from execute()) which means
> that for Jar
> > > >>> > > > > > Submission
> > > >>> > > > > > > > > from
> > > >>> > > > > > > > > > > the
> > > >>> > > > > > > > > > > >> >> WebFrontend we have a long-running
> main() method
> > > >>> > > > running
> > > >>> > > > > > in the
> > > >>> > > > > > > > > > > >> >> WebFrontend. This doesn’t sound very
> good. We
> > > >>> > could get
> > > >>> > > > > > around
> > > >>> > > > > > > > > this
> > > >>> > > > > > > > > > > by
> > > >>> > > > > > > > > > > >> >> removing the plan preview feature and by
> removing
> > > >>> > Jar
> > > >>> > > > > > > > > > > >> Submission/Running.
> > > >>> > > > > > > > > > > >> >>
> > > >>> > > > > > > > > > > >> >>   - How to deal with detached mode?
> Right now,
> > > >>> > > > > > > > > DetachedEnvironment
> > > >>> > > > > > > > > > > will
> > > >>> > > > > > > > > > > >> >> execute the job and return immediately.
> If users
> > > >>> > > > control
> > > >>> > > > > > when
> > > >>> > > > > > > > > they
> > > >>> > > > > > > > > > > >> want to
> > > >>> > > > > > > > > > > >> >> return, by waiting on the job completion
> future,
> > > >>> > how do
> > > >>> > > > > we
> > > >>> > > > > > deal
> > > >>> > > > > > > > > > with
> > > >>> > > > > > > > > > > >> this?
> > > >>> > > > > > > > > > > >> >> Do we simply remove the distinction
> between
> > > >>> > > > > > > > > detached/non-detached?
> > > >>> > > > > > > > > > > >> >>
> > > >>> > > > > > > > > > > >> >>   - How does per-job mode interact with
> > > >>> > “interactive
> > > >>> > > > > > > > programming”
> > > >>> > > > > > > > > > > >> >> (FLIP-36). For YARN, each execute() call
> could
> > > >>> > spawn a
> > > >>> > > > > new
> > > >>> > > > > > > > Flink
> > > >>> > > > > > > > > > YARN
> > > >>> > > > > > > > > > > >> >> cluster. What about Mesos and Kubernetes?
> > > >>> > > > > > > > > > > >> >>
> > > >>> > > > > > > > > > > >> >> The first open question is where the
> opinions
> > > >>> > diverge,
> > > >>> > > > I
> > > >>> > > > > > think.
> > > >>> > > > > > > > > The
> > > >>> > > > > > > > > > > >> rest
> > > >>> > > > > > > > > > > >> >> are just open questions and interesting
> things
> > > >>> > that we
> > > >>> > > > > > need to
> > > >>> > > > > > > > > > > >> consider.
> > > >>> > > > > > > > > > > >> >>
> > > >>> > > > > > > > > > > >> >> Best,
> > > >>> > > > > > > > > > > >> >> Aljoscha
> > > >>> > > > > > > > > > > >> >>
> > > >>> > > > > > > > > > > >> >> [1]
> > > >>> > > > > > > > > > > >> >>
> > > >>> > > > > > > > > > > >>
> > > >>> > > > > > > > > > >
> > > >>> > > > > > > > > >
> > > >>> > > > > > > > >
> > > >>> > > > > > > >
> > > >>> > > > > >
> > > >>> > > > >
> > > >>> > > >
> > > >>> >
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> > > >>> > > > > > > > > > > >> >>
> > > >>> > > > > > > > > > > >> >> > On 31. Jul 2019, at 15:23, Jeff Zhang <
> > > >>> > > > > zjffdu@gmail.com>
> > > >>> > > > > > > > > wrote:
> > > >>> > > > > > > > > > > >> >> >
> > > >>> > > > > > > > > > > >> >> > Thanks tison for the effort. I left a
> few
> > > >>> > comments.
> > > >>> > > > > > > > > > > >> >> >
> > > >>> > > > > > > > > > > >> >> >
> > > >>> > > > > > > > > > > >> >> > Zili Chen <wa...@gmail.com>
> 于2019年7月31日周三
> > > >>> > > > > 下午8:24写道:
> > > >>> > > > > > > > > > > >> >> >
> > > >>> > > > > > > > > > > >> >> >> Hi Flavio,
> > > >>> > > > > > > > > > > >> >> >>
> > > >>> > > > > > > > > > > >> >> >> Thanks for your reply.
> > > >>> > > > > > > > > > > >> >> >>
> > > >>> > > > > > > > > > > >> >> >> In both the current implementation and the design,
> > > >>> > ClusterClient
> > > >>> > > > > > > > > > > >> >> >> never takes responsibility for
> generating
> > > >>> > JobGraph.
> > > >>> > > > > > > > > > > >> >> >> (what you see in current codebase is
> several
> > > >>> > class
> > > >>> > > > > > methods)
> > > >>> > > > > > > > > > > >> >> >>
> > > >>> > > > > > > > > > > >> >> >> Instead, user describes his program
> in the main
> > > >>> > > > method
> > > >>> > > > > > > > > > > >> >> >> with ExecutionEnvironment APIs and
> calls
> > > >>> > > > env.compile()
> > > >>> > > > > > > > > > > >> >> >> or env.optimize() to get FlinkPlan
> and JobGraph
> > > >>> > > > > > > > respectively.
> > > >>> > > > > > > > > > > >> >> >>
> > > >>> > > > > > > > > > > >> >> >> For listing main classes in a jar and
> choose
> > > >>> > one for
> > > >>> > > > > > > > > > > >> >> >> submission, you're now able to
> customize a CLI
> > > >>> > to do
> > > >>> > > > > it.
> > > >>> > > > > > > > > > > >> >> >> Specifically, the path of jar is
> passed as
> > > >>> > arguments
> > > >>> > > > > and
> > > >>> > > > > > > > > > > >> >> >> in the customized CLI you list main
> classes,
> > > >>> > choose
> > > >>> > > > > one
> > > >>> > > > > > > > > > > >> >> >> to submit to the cluster.
> > > >>> > > > > > > > > > > >> >> >>
> > > >>> > > > > > > > > > > >> >> >> Best,
> > > >>> > > > > > > > > > > >> >> >> tison.
> > > >>> > > > > > > > > > > >> >> >>
> > > >>> > > > > > > > > > > >> >> >>
> > > >>> > > > > > > > > > > >> >> >> Flavio Pompermaier <
> pompermaier@okkam.it>
> > > >>> > > > > 于2019年7月31日周三
> > > >>> > > > > > > > > > 下午8:12写道:
> > > >>> > > > > > > > > > > >> >> >>
> > > >>> > > > > > > > > > > >> >> >>> Just one note on my side: it is not
> clear to me
> > > >>> > > > > > whether the
> > > >>> > > > > > > > > > > client
> > > >>> > > > > > > > > > > >> >> needs
> > > >>> > > > > > > > > > > >> >> >> to
> > > >>> > > > > > > > > > > >> >> >>> be able to generate a job graph or
> not.
> > > >>> > > > > > > > > > > >> >> >>> In my opinion, the job jar must
> reside only
> > > >>> > on the
> > > >>> > > > > > > > > > > >> server/jobManager
> > > >>> > > > > > > > > > > >> >> >> side
> > > >>> > > > > > > > > > > >> >> >>> and the client requires a way to get
> the job
> > > >>> > graph.
> > > >>> > > > > > > > > > > >> >> >>> If you really want to access to the
> job graph,
> > > >>> > I'd
> > > >>> > > > > add
> > > >>> > > > > > a
> > > >>> > > > > > > > > > > dedicated
> > > >>> > > > > > > > > > > >> >> method
> > > >>> > > > > > > > > > > >> >> >>> on the ClusterClient. like:
> > > >>> > > > > > > > > > > >> >> >>>
> > > >>> > > > > > > > > > > >> >> >>>   - getJobGraph(jarId, mainClass):
> JobGraph
> > > >>> > > > > > > > > > > >> >> >>>   - listMainClasses(jarId):
> List<String>
> > > >>> > > > > > > > > > > >> >> >>>
> > > >>> > > > > > > > > > > >> >> >>> These would require some additions
> on the
> > > >>> > job
> > > >>> > > > > > manager
> > > >>> > > > > > > > > > > endpoint
> > > >>> > > > > > > > > > > >> as
> > > >>> > > > > > > > > > > >> >> >>> well. What do you think?
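The two proposed ClusterClient additions can be sketched like this. The method shapes follow the mail; the REST interaction is stubbed out, JobGraph is again a String stand-in, and the example class names are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;

interface ClusterClient {
    // Proposed: compile the JobGraph on the JobManager side from a jar
    // that already resides on the cluster, identified by jarId.
    String getJobGraph(String jarId, String mainClass);
    // Proposed: list candidate entry points found in that jar.
    List<String> listMainClasses(String jarId);
}

class RestClusterClientSketch implements ClusterClient {
    @Override
    public String getJobGraph(String jarId, String mainClass) {
        // Would call a new JobManager REST endpoint; echoed here as a stub.
        return "jobGraph(" + jarId + ", " + mainClass + ")";
    }

    @Override
    public List<String> listMainClasses(String jarId) {
        // Would scan the jar's manifest/classes on the server side.
        return Arrays.asList("org.example.BatchJob", "org.example.StreamJob");
    }
}
```

With this shape the jar never needs to be on the client classpath: the client only names a jar the cluster already holds and receives the compiled graph back.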
> > > >>> > > > > > > > > > > >> >> >>>
> > > >>> > > > > > > > > > > >> >> >>> On Wed, Jul 31, 2019 at 12:42 PM
> Zili Chen <
> > > >>> > > > > > > > > > wander4096@gmail.com
> > > >>> > > > > > > > > > > >
> > > >>> > > > > > > > > > > >> >> wrote:
> > > >>> > > > > > > > > > > >> >> >>>
> > > >>> > > > > > > > > > > >> >> >>>> Hi all,
> > > >>> > > > > > > > > > > >> >> >>>>
> > > >>> > > > > > > > > > > >> >> >>>> Here is a document[1] on client api
> > > >>> > enhancement
> > > >>> > > > from
> > > >>> > > > > > our
> > > >>> > > > > > > > > > > >> perspective.
> > > >>> > > > > > > > > > > >> >> >>>> We have investigated current
> implementations.
> > > >>> > And
> > > >>> > > > we
> > > >>> > > > > > > > propose
> > > >>> > > > > > > > > > > >> >> >>>>
> > > >>> > > > > > > > > > > >> >> >>>> 1. Unify the implementation of
> cluster
> > > >>> > deployment
> > > >>> > > > > and
> > > >>> > > > > > job
> > > >>> > > > > > > > > > > >> submission
> > > >>> > > > > > > > > > > >> >> in
> > > >>> > > > > > > > > > > >> >> >>>> Flink.
> > > >>> > > > > > > > > > > >> >> >>>> 2. Provide programmatic interfaces
> to allow
> > > >>> > > > flexible
> > > >>> > > > > > job
> > > >>> > > > > > > > and
> > > >>> > > > > > > > > > > >> cluster
> > > >>> > > > > > > > > > > >> >> >>>> management.
> > > >>> > > > > > > > > > > >> >> >>>>
> > > >>> > > > > > > > > > > >> >> >>>> The first proposal is aimed at
> reducing code
> > > >>> > paths
> > > >>> > > > > of
> > > >>> > > > > > > > > cluster
> > > >>> > > > > > > > > > > >> >> >> deployment
> > > >>> > > > > > > > > > > >> >> >>>> and
> > > >>> > > > > > > > > > > >> >> >>>> job submission so that one can
> adopt Flink in
> > > >>> > his
> > > >>> > > > > > usage
> > > >>> > > > > > > > > > easily.
> > > >>> > > > > > > > > > > >> The
> > > >>> > > > > > > > > > > >> >> >>> second
> > > >>> > > > > > > > > > > >> >> >>>> proposal is aimed at providing rich
> > > >>> > interfaces for
> > > >>> > > > > > > > advanced
> > > >>> > > > > > > > > > > users
> > > >>> > > > > > > > > > > >> >> >>>> who want to make accurate control
> of these
> > > >>> > stages.
> > > >>> > > > > > > > > > > >> >> >>>>
> > > >>> > > > > > > > > > > >> >> >>>> Quick reference on open questions:
> > > >>> > > > > > > > > > > >> >> >>>>
> > > >>> > > > > > > > > > > >> >> >>>> 1. Exclude job cluster deployment
> from client
> > > >>> > side
> > > >>> > > > > or
> > > >>> > > > > > > > > redefine
> > > >>> > > > > > > > > > > the
> > > >>> > > > > > > > > > > >> >> >>> semantic
> > > >>> > > > > > > > > > > >> >> >>>> of job cluster? Since it fits in a
> process
> > > >>> > quite
> > > >>> > > > > > different
> > > >>> > > > > > > > > > from
> > > >>> > > > > > > > > > > >> >> session
> > > >>> > > > > > > > > > > >> >> >>>> cluster deployment and job
submission.

2. Maintain the codepaths handling class o.a.f.api.common.Program, or implement customized program handling logic by a customized CliFrontend? See also this thread[2] and the document[1].

3. Expose ClusterClient as a public api, or just expose api in ExecutionEnvironment and delegate it to ClusterClient? Further, in either way, is it worth it to introduce a JobClient which is an encapsulation of ClusterClient that is associated to a specific job?

Best,
tison.

[1] https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
[2] https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
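Question (3) asks whether a per-job JobClient wrapping ClusterClient is worth introducing. A minimal sketch of such an encapsulation could look like the following; all names and signatures here are invented for illustration and are not Flink's actual API:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical cluster-level client; names are illustrative only.
interface ClusterClient {
    CompletableFuture<String> getJobStatus(String jobId);
    CompletableFuture<Void> cancelJob(String jobId);
}

// A JobClient bound to one job id, delegating to the shared ClusterClient.
final class JobClient {
    private final ClusterClient cluster;
    private final String jobId;

    JobClient(ClusterClient cluster, String jobId) {
        this.cluster = cluster;
        this.jobId = jobId;
    }

    CompletableFuture<String> status() { return cluster.getJobStatus(jobId); }
    CompletableFuture<Void> cancel() { return cluster.cancelJob(jobId); }
}

public class JobClientSketch {
    static String demoStatus() {
        // In-memory stand-in for a real cluster connection.
        ClusterClient cluster = new ClusterClient() {
            public CompletableFuture<String> getJobStatus(String id) {
                return CompletableFuture.completedFuture("RUNNING");
            }
            public CompletableFuture<Void> cancelJob(String id) {
                return CompletableFuture.completedFuture(null);
            }
        };
        return new JobClient(cluster, "job-42").status().join();
    }

    public static void main(String[] args) {
        System.out.println(demoStatus());
    }
}
```

The point of the wrapper is that callers hold a handle scoped to one job and never pass job ids around, while the cluster-wide connection stays shared.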
Jeff Zhang <zj...@gmail.com> 于2019年7月24日周三 上午9:19写道:

Thanks Stephan, I will follow up this issue in the next few weeks, and will refine the design doc. We could discuss more details after the 1.9 release.
Stephan Ewen <se...@apache.org> 于2019年7月24日周三 上午12:58写道:

Hi all!

This thread has stalled for a bit, which I assume is mostly due to the Flink 1.9 feature freeze and release testing effort.

I personally still recognize this issue as an important one to be solved. I'd be happy to help resume this discussion soon (after the 1.9 release) and see if we can take some steps towards this in Flink 1.10.

Best,
Stephan
On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <pompermaier@okkam.it> wrote:

That's exactly what I suggested a long time ago: the Flink REST client should not require any Flink dependency, only an HTTP library to call the REST services to submit and monitor a job.
What I suggested also in [1] was to have a way to automatically suggest to the user (via a UI) the available main classes and their required parameters [2].
Another problem we have with Flink is that the REST client and the CLI one behave differently, and we use the CLI client (via ssh) because it allows calling some other method after env.execute() [3] (we have to call another REST service to signal the end of the job).
In this regard, a dedicated interface, like the JobListener suggested in the previous emails, would be very helpful (IMHO).

[1] https://issues.apache.org/jira/browse/FLINK-10864
[2] https://issues.apache.org/jira/browse/FLINK-10862
[3] https://issues.apache.org/jira/browse/FLINK-10879

Best,
Flavio
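The JobListener idea mentioned here — callbacks on submission and completion instead of polling a REST service after env.execute() — could be sketched roughly as follows. The interface and method names are assumptions for illustration, not an actual Flink API:

```java
// Hypothetical listener interface; not an actual Flink API.
interface JobListener {
    void onJobSubmitted(String jobId);
    void onJobFinished(String jobId, boolean succeeded);
}

public class JobListenerSketch {
    // Stand-in for an execution path that drives the callbacks around a run.
    static void runWithListener(JobListener listener) {
        String jobId = "job-1";
        listener.onJobSubmitted(jobId);
        // ... the job would run here ...
        listener.onJobFinished(jobId, true);
    }

    static String demo() {
        StringBuilder log = new StringBuilder();
        runWithListener(new JobListener() {
            public void onJobSubmitted(String id) {
                log.append("submitted:").append(id).append(";");
            }
            public void onJobFinished(String id, boolean ok) {
                log.append("finished:").append(ok);
            }
        });
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With such hooks a downstream client learns about job-end directly, instead of having to signal it through an extra REST call as described above.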
On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zjffdu@gmail.com> wrote:

Hi, Tison,

Thanks for your comments. Overall I agree with you that it is difficult for downstream projects to integrate with Flink and we need to refactor the current Flink client api.
And I agree that CliFrontend should only parse command line arguments and then pass them to ExecutionEnvironment. It is ExecutionEnvironment's responsibility to compile the job, create the cluster, and submit the job. Besides that, Flink currently has many ExecutionEnvironment implementations, and Flink will use the specific one based on the context. IMHO, that is not necessary: ExecutionEnvironment should be able to do the right thing based on the FlinkConf it receives. Too many ExecutionEnvironment implementations are another burden for downstream project integration.

One thing I'd like to mention is Flink's scala shell and sql client: although they are sub-modules of Flink, they could be treated as downstream projects which use Flink's client api. Currently you will find it is not easy for them to integrate with Flink, and they share much duplicated code. It is another sign that we should refactor the Flink client api.

I believe it is a large and hard change, and I am afraid we cannot keep compatibility since many of the changes are user facing.
Zili Chen <wander4096@gmail.com> 于2019年6月24日周一 下午2:53写道:

Hi all,

After a closer look at our client apis, I can see there are two major issues for consistency and integration, namely the deployment of a job cluster, which couples job graph creation and cluster deployment, and submission via CliFrontend, which confuses the control flow of job graph compilation and job submission. I'd like to follow the discussion above, mainly the process described by Jeff and Stephan, and share my ideas on these issues.

1) CliFrontend confuses the control flow of job compilation and submission. Following the process of job submission Stephan and Jeff described, the execution environment knows all configs of the cluster and the topos/settings of the job. Ideally, in the main method of the user program, it calls #execute (or named #submit) and Flink deploys the cluster, compiles the job graph and submits it to the cluster. However, the current CliFrontend does all these things inside its #runProgram method, which introduces a lot of subclasses of (stream) execution environment.

Actually, it sets up an exec env that hijacks the #execute/executePlan method, initializes the job graph and aborts execution. Then the control flow goes back to CliFrontend, which deploys the cluster (or retrieves the client) and submits the job graph. This is quite a specific internal process inside Flink and is not consistent with anything else.

2) Deployment of a job cluster couples job graph creation and cluster deployment. Abstractly, going from a user job to a concrete submission requires

    create JobGraph --\
create ClusterClient --> submit JobGraph

such a dependency. A ClusterClient is created by deploying or retrieving. JobGraph submission requires a compiled JobGraph and a valid ClusterClient, but the creation of the ClusterClient is abstractly independent of that of the JobGraph. However, in job cluster mode, we deploy the job cluster with a job graph, which means we use another process:

create JobGraph --> deploy cluster with the JobGraph

Here is another inconsistency, and downstream projects/client apis are forced to handle different cases with little support from Flink.

Since we likely reached a consensus on

1. all configs gathered by Flink configuration and passed
2. execution environment knows all configs and handles execution (both deployment and submission)

for the issues above I propose eliminating the inconsistencies by the following approach:

1) CliFrontend should be exactly a front end, at least for the "run" command. That means it just gathers all config from the command line and passes it to the main method of the user program. The execution environment knows all the info, and with an addition of utils for ClusterClient, we gracefully get a ClusterClient by deploying or retrieving. In this way, we don't need to hijack the #execute/executePlan methods and can remove the various hacking subclasses of exec env, as well as the #run methods in ClusterClient (for an interface-ized ClusterClient). Now the control flow goes from CliFrontend to the main method and never returns.

2) A job cluster means a cluster for a specific job. From another perspective, it is an ephemeral session. We may decouple the deployment from a compiled job graph, but start a session with an idle timeout and submit the job afterwards.

These topics, before we go into more details on design or implementation, are better to be aware of and discussed towards a consensus.

Best,
tison.
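The dependency diagram in the message above — JobGraph creation and ClusterClient creation independent of each other, with submission depending on both — can be sketched as follows. The types are illustrative stand-ins, not Flink's real classes:

```java
// Illustrative stand-in for a compiled job graph.
final class JobGraph {
    final String name;
    JobGraph(String name) { this.name = name; }
}

// Illustrative stand-in for a deployed-or-retrieved cluster connection.
interface SubmissionClient {
    String submit(JobGraph graph); // returns a job id
}

public class DecoupledFlowSketch {
    // Step A: compile the user program into a JobGraph.
    static JobGraph createJobGraph() {
        return new JobGraph("wordcount");
    }

    // Step B: deploy a new cluster or retrieve an existing one.
    static SubmissionClient deployOrRetrieve() {
        return graph -> "id-of-" + graph.name;
    }

    // Submission depends on both A and B, but A and B do not depend
    // on each other -- the dependency arrows from the diagram above.
    static String demo() {
        JobGraph graph = createJobGraph();
        SubmissionClient client = deployOrRetrieve();
        return client.submit(graph);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In job cluster mode the two independent steps collapse into one ("deploy cluster with the JobGraph"), which is exactly the coupling the proposal wants to remove.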
Zili Chen <wander4096@gmail.com> 于2019年6月20日周四 上午3:21写道:

Hi Jeff,

Thanks for raising this thread and the design document!

As @Thomas Weise mentioned above, extending config to Flink requires far more effort than it should. Another example is that we achieve detach mode by introducing another execution environment which also hijacks the #execute method.

I agree with your idea that the user would configure all things and Flink would "just" respect them. On this topic I think the unusual control flow when CliFrontend handles the "run" command is the problem. It handles several configs, mainly about cluster settings, and thus the main method of the user program is unaware of them. Also it compiles the app to a job graph by running the main method with a hijacked exec env, which constrains the main method further.

I'd like to write down a few notes on passing and respecting configs/args, as well as on decoupling job compilation and submission, and share them on this thread later.

Best,
tison.
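The "user configures, Flink respects" idea — e.g. detach mode as a configuration entry rather than a dedicated environment subclass that hijacks #execute — could look roughly like this. Both the configuration key and the class are hypothetical, made up purely to illustrate the shape of the design:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one environment reads a plain configuration,
// instead of the framework picking a special subclass per mode.
public class ConfigDrivenEnvSketch {
    // Invented key for illustration; not a real Flink config option.
    static final String DETACHED_KEY = "execution.detached";

    final Map<String, String> conf;

    ConfigDrivenEnvSketch(Map<String, String> conf) {
        this.conf = conf;
    }

    // Single execute path; behavior branches on configuration, not on
    // which environment subclass the caller happened to receive.
    String execute() {
        boolean detached =
            Boolean.parseBoolean(conf.getOrDefault(DETACHED_KEY, "false"));
        return detached ? "submitted-detached" : "submitted-and-awaited";
    }

    static String demo(boolean detached) {
        Map<String, String> conf = new HashMap<>();
        conf.put(DETACHED_KEY, Boolean.toString(detached));
        return new ConfigDrivenEnvSketch(conf).execute();
    }

    public static void main(String[] args) {
        System.out.println(demo(true));
    }
}
```

Under this shape, adding a new mode means adding a config entry and a branch, not another hijacking subclass of the execution environment.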
SHI Xiaogang <shixiaogangg@gmail.com> 于2019年6月17日周一 下午7:29写道:

Hi Jeff and Flavio,

Thanks Jeff a lot for proposing the design document.

We are also working on refactoring ClusterClient to allow flexible and efficient job management in our real-time platform. We would like to draft a document to share our ideas with you.

I think it's a good idea to have something like Apache Livy for Flink, and the efforts discussed here will take a great step forward towards it.

Regards,
Xiaogang
Flavio Pompermaier <pompermaier@okkam.it> 于2019年6月17日周一 下午7:13写道:

Is there any possibility to have something like Apache Livy [1] also for Flink in the future?

[1] https://livy.apache.org/
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>> On Tue, Jun 11, 2019 at
> 5:23 PM Jeff
> > > >>> > > > Zhang <
Jeff Zhang <zjffdu@gmail.com> wrote:

> Any API we expose should not have dependencies on the runtime
> (flink-runtime) package or other implementation details. To me, this
> means that the current ClusterClient cannot be exposed to users because
> it uses quite some classes from the optimiser and runtime packages.

We should change ClusterClient from a class to an interface.
ExecutionEnvironment would then only use the ClusterClient interface,
which should live in flink-clients, while the concrete implementation
class could be in flink-runtime.
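To make the proposed split concrete, here is a minimal sketch of ClusterClient as a small interface with the concrete client kept separate. All names here (JobHandle, LocalClusterClient, the method signatures) are illustrative assumptions for discussion, not Flink's actual API:

```java
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

// The interface that flink-clients would expose; no runtime classes leak through.
interface ClusterClient {
    /** Submit a serialized job plan; returns a handle for further control. */
    CompletableFuture<JobHandle> submit(byte[] jobPlan);

    /** Cancel a running job by id. */
    CompletableFuture<Void> cancel(String jobId);
}

/** Control handle returned on submission. */
final class JobHandle {
    private final String jobId;
    JobHandle(String jobId) { this.jobId = jobId; }
    String jobId() { return jobId; }
}

/** Toy in-memory implementation standing in for a REST-backed client in flink-runtime. */
final class LocalClusterClient implements ClusterClient {
    @Override
    public CompletableFuture<JobHandle> submit(byte[] jobPlan) {
        // A real implementation would talk to the cluster; here we just mint an id.
        return CompletableFuture.completedFuture(new JobHandle(UUID.randomUUID().toString()));
    }

    @Override
    public CompletableFuture<Void> cancel(String jobId) {
        return CompletableFuture.completedFuture(null);
    }
}

public class ClusterClientSketch {
    public static void main(String[] args) throws Exception {
        ClusterClient client = new LocalClusterClient();
        JobHandle handle = client.submit(new byte[0]).get();
        System.out.println("submitted job " + handle.jobId());
        client.cancel(handle.jobId()).get();
        System.out.println("cancelled");
    }
}
```

With this shape, downstream projects compile only against the interface and the handle type, so the optimiser/runtime classes stay an implementation detail.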
> What happens when a failure/restart in the client happens? There needs
> to be a way of re-establishing the connection to the job, setting up
> the listeners again, etc.

Good point. First we need to define what failure/restart in the client
means. IIUC, that usually means a network failure, which will happen in
the RestClient class. If my understanding is correct, the restart/retry
mechanism should be implemented in RestClient.
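As a rough illustration of the kind of retry mechanism a REST client could use to re-establish a lost connection, here is a generic backoff helper. The class and method names are made up for this sketch and are not part of Flink:

```java
import java.util.concurrent.Callable;

final class Retry {
    /** Run the call, retrying up to maxAttempts times with doubling backoff. */
    static <T> T withBackoff(Callable<T> call, int maxAttempts, long initialDelayMillis)
            throws Exception {
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2; // exponential backoff between reconnect attempts
                }
            }
        }
        throw last;
    }
}

public class RetrySketch {
    public static void main(String[] args) throws Exception {
        int[] failures = {2}; // simulate a connection that fails twice, then recovers
        String result = Retry.withBackoff(() -> {
            if (failures[0]-- > 0) throw new java.io.IOException("connection lost");
            return "reconnected";
        }, 5, 10);
        System.out.println(result); // prints "reconnected"
    }
}
```

A real client would additionally have to re-register any job listeners after a successful reconnect, which this sketch does not model.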
Aljoscha Krettek <aljoscha@apache.org> wrote on Tue, Jun 11, 2019 at 11:10 PM:

> Some points to consider:
>
> * Any API we expose should not have dependencies on the runtime
>   (flink-runtime) package or other implementation details. To me, this
>   means that the current ClusterClient cannot be exposed to users
>   because it uses quite some classes from the optimiser and runtime
>   packages.
>
> * What happens when a failure/restart in the client happens? There
>   needs to be a way of re-establishing the connection to the job,
>   setting up the listeners again, etc.
>
> Aljoscha
>
> > On 29. May 2019, at 10:17, Jeff Zhang <zjffdu@gmail.com> wrote:
> > Sorry folks, the design doc is late as you expected. Here's the
> > design doc I drafted; welcome any comments and feedback.
> >
> > https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> >
> > Stephan Ewen <sewen@apache.org> wrote on Thu, Feb 14, 2019 at 8:43 PM:
> > > Nice that this discussion is happening.
> > >
> > > In the FLIP, we could also revisit the entire role of the
> > > environments again.
> > >
> > > Initially, the idea was:
> > > - the environments take care of the specific setup for standalone
> > >   (no setup needed), yarn, mesos, etc.
> > > - the session ones have control over the session. The environment
> > >   holds the session client.
> > > - running a job gives a "control" object for that job. That
> > >   behavior is the same in all environments.
> > >
> > > The actual implementation diverged quite a bit from that. Happy to
> > > see a discussion about straightening this out a bit more.
> > >
> > > On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > Hi folks,
> > > >
> > > > Sorry for the late response. It seems we have reached consensus
> > > > on this; I will create a FLIP for this with a more detailed
> > > > design.
> > > >
> > > > Thomas Weise <thw@apache.org> wrote on Fri, Dec 21, 2018 at 11:43 AM:
> > > > > Great to see this discussion seeded! The problems you face with
> > > > > the Zeppelin integration are also affecting other downstream
> > > > > projects, like Beam.
> > > > >
> > > > > We just enabled the savepoint restore option in
> > > > > RemoteStreamEnvironment [1], and that was more difficult than
> > > > > it should be. The main issue is that the environment and the
> > > > > cluster client aren't decoupled. Ideally it should be possible
> > > > > to just get the matching cluster client from the environment
> > > > > and then control the job through it (environment as factory for
> > > > > cluster client). But note that the environment classes are part
> > > > > of the public API, and it is not straightforward to make larger
> > > > > changes without breaking backward compatibility.
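The "environment as factory for cluster client" idea Thomas describes could be sketched as follows, with job control (here, triggering a savepoint) going through the client obtained from the environment rather than through environment internals. All names (JobControlClient, RemoteEnvironment, triggerSavepoint) are hypothetical, not Flink's actual API:

```java
import java.util.concurrent.CompletableFuture;

// A minimal job-control surface a downstream project would program against.
interface JobControlClient {
    /** Trigger a savepoint for the given job; completes with the savepoint path. */
    CompletableFuture<String> triggerSavepoint(String jobId, String targetDir);
}

/** The environment merely holds and hands out the matching client. */
final class RemoteEnvironment {
    private final JobControlClient client;
    RemoteEnvironment(JobControlClient client) { this.client = client; }

    JobControlClient getClusterClient() { return client; }
}

public class DecouplingSketch {
    public static void main(String[] args) throws Exception {
        // Stub client standing in for a real cluster connection.
        JobControlClient stub =
            (jobId, dir) -> CompletableFuture.completedFuture(dir + "/savepoint-" + jobId);
        RemoteEnvironment env = new RemoteEnvironment(stub);

        String path = env.getClusterClient().triggerSavepoint("job-1", "/tmp/sp").get();
        System.out.println(path); // prints "/tmp/sp/savepoint-job-1"
    }
}
```

With this decoupling, adding a feature like savepoint restore would not require touching the public environment classes at all.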
> > > > > ClusterClient currently exposes internal classes like JobGraph
> > > > > and StreamGraph. But it should be possible to wrap this with a
> > > > > new public API that brings the required job control
> > > > > capabilities for downstream projects.
> > > > >
> > > > > Perhaps it is helpful to look at some of the interfaces in Beam
> > > > > while thinking about this: [2] for the portable job API and [3]
> > > > > for the old asynchronous job control from the Beam Java SDK.
> > > > > The backward compatibility discussion [4] is also relevant
> > > > > here. A new API should shield downstream projects from
> > > > > internals and allow them to interoperate with multiple future
> > > > > Flink versions in the same release line without forced
> > > > > upgrades.
> > > > >
> > > > > Thanks,
> > > > > Thomas
> > > > >
> > > > > [1] https://github.com/apache/flink/pull/7249
> > > > > [2] https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > > > > [3] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > > > > [4] https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> > > > >
> > > > > On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > > > > I'm not so sure whether the user should be able to define
> > > > > > > where the job runs (in your example Yarn). This is actually
> > > > > > > independent of the job development and is something which
> > > > > > > is decided at deployment time.
> > > > > > Users don't need to specify the execution mode
> > > > > > programmatically. They can also pass the execution mode via
> > > > > > the arguments of the flink run command, e.g.:
> > > > > >
> > > > > > bin/flink run -m yarn-cluster ....
> > > > > > bin/flink run -m local ...
> > > > > > bin/flink run -m host:port ...
> > > > > >
> > > > > > Does this make sense to you?
> > > > > > > To me it makes sense that the ExecutionEnvironment is not
> > > > > > > directly initialized by the user and is instead context
> > > > > > > sensitive to how you want to execute your job (Flink CLI
> > > > > > > vs. IDE, for example).
> > > > > > Right. Currently I notice that Flink creates different
> > > > > > ContextExecutionEnvironments based on the submission scenario
> > > > > > (Flink CLI vs. IDE). To me this is kind of a hack, and not
> > > > > > very straightforward. What I suggested above is that Flink
> > > > > > should always create the same ExecutionEnvironment, just with
> > > > > > different configuration, and based on that configuration it
> > > > > > would create the proper ClusterClient for the different
> > > > > > behaviors.
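A configuration-driven factory of the kind suggested here could look like the sketch below: one environment type, with the client implementation chosen from configuration. The configuration key and all class names are assumptions made for illustration, not Flink's actual API:

```java
import java.util.Map;

// The single client-facing abstraction; lambdas below stand in for real implementations.
interface ClusterClient {
    String describe();
}

final class Environment {
    private final Map<String, String> config;
    Environment(Map<String, String> config) { this.config = config; }

    /** Factory method: pick the client implementation from configuration alone. */
    ClusterClient getClusterClient() {
        String mode = config.getOrDefault("execution.target", "local");
        switch (mode) {
            case "yarn-cluster": return () -> "yarn client";
            case "local":        return () -> "local client";
            default:             return () -> "remote client for " + mode;
        }
    }
}

public class EnvFactorySketch {
    public static void main(String[] args) {
        // Same Environment class in the CLI, an IDE, or a notebook --
        // only the configuration differs.
        Environment env = new Environment(Map.of("execution.target", "yarn-cluster"));
        System.out.println(env.getClusterClient().describe()); // prints "yarn client"
    }
}
```

The point of the sketch is that the CLI's `-m yarn-cluster` argument and a programmatic setting would both just populate the same configuration, instead of selecting between different environment subclasses.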
> > > > > > Till Rohrmann <trohrmann@apache.org> wrote on Thu, Dec 20, 2018 at 11:18 PM:
> > > > > > > You are probably right that we have code duplication when
> > > > > > > it comes to the creation of the ClusterClient. This should
> > > > > > > be reduced in the future.
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> I'm not so sure
> whether the
> > > >>> > user
> > > >>> > > > > > should be
> > > >>> > > > > > > > > > > >> >> >> able
> > > >>> > > > > > > > > > > >> >> >>> to
> > > >>> > > > > > > > > > > >> >> >>>>>>> define
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>> where
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> the
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> job
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> runs (in your
> example Yarn).
> > > >>> > This
> > > >>> > > > is
> > > >>> > > > > > > > > actually
> > > >>> > > > > > > > > > > >> >> >>>>>>> independent
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>> of the
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> job
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> development and is
> something
> > > >>> > which
> > > >>> > > > > is
> > > >>> > > > > > > > > decided
> > > >>> > > > > > > > > > > >> >> >> at
> > > >>> > > > > > > > > > > >> >> >>>>>>>> deployment
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>> time.
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> To
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> me
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> it
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> makes sense that the
> > > >>> > > > > > ExecutionEnvironment
> is not directly initialized by the user and is instead sensitive to the
> context in which you want to execute your job (Flink CLI vs. IDE, for
> example). However, I agree that the ExecutionEnvironment should give you
> access to the ClusterClient and to the job (maybe in the form of the
> JobGraph or a job plan).
>
> Cheers,
> Till
>
> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > Hi Till,
> >
> > Thanks for the feedback. You are right that I expect a better
> > programmatic job submission/control API that could be used by downstream
> > projects; it would benefit the whole Flink ecosystem. When I look at the
> > code of the Flink scala-shell and sql-client (I believe they are not the
> > core of Flink, but belong to its ecosystem), I find a lot of duplicated
> > code for creating a ClusterClient from user-provided configuration (the
> > configuration format may even differ between scala-shell and
> > sql-client), which is then used to manipulate jobs. I don't think this
> > is convenient for downstream projects. What I expect is that a
> > downstream project only needs to provide the necessary configuration
> > info (maybe by introducing a class FlinkConf), then build an
> > ExecutionEnvironment based on this FlinkConf, and the
> > ExecutionEnvironment will create the proper ClusterClient. This would
> > not only benefit downstream project development but also help their
> > integration tests with Flink. Here's one sample code snippet of what I
> > expect:
> >
> > val conf = new FlinkConf().mode("yarn")
> > val env = new ExecutionEnvironment(conf)
> > val jobId = env.submit(...)
> > val jobStatus = env.getClusterClient().queryJobStatus(jobId)
> > env.getClusterClient().cancelJob(jobId)
> >
> > What do you think?
> >
> > Till Rohrmann <trohrmann@apache.org> wrote on Tue, Dec 11, 2018 at
> > 6:28 PM:
> > > Hi Jeff,
> > >
> > > what you are proposing is to provide the user with better programmatic
> > > job control. There was actually an effort to achieve this, but it has
> > > never been completed [1]. However, there are some improvements in the
> > > code base now. Look for example at the NewClusterClient interface,
> > > which offers a non-blocking job submission. But I agree that we need
> > > to improve Flink in this regard.
> > >
> > > I would not be in favour of exposing all ClusterClient calls via the
> > > ExecutionEnvironment, because it would clutter the class and would not
> > > be a good separation of concerns. Instead, one idea could be to
> > > retrieve the current ClusterClient from the ExecutionEnvironment,
> > > which can then be used for cluster and job control. But before we
> > > start an effort here, we need to agree on and capture what
> > > functionality we want to provide.
> > >
> > > Initially, the idea was that we have the ClusterDescriptor describing
> > > how to talk to a cluster manager like Yarn or Mesos. The
> > > ClusterDescriptor can be used for deploying Flink clusters (job and
> > > session) and gives you a ClusterClient. The ClusterClient controls the
> > > cluster (e.g. submitting jobs, listing all running jobs). And then
> > > there was the idea to introduce a JobClient, which you obtain from the
> > > ClusterClient, to trigger job-specific operations (e.g. taking a
> > > savepoint, cancelling the job).
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-4272
> > >
> > > Cheers,
> > > Till
> > >
> > > On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zjffdu@gmail.com> wrote:
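[Editor's note: the ClusterDescriptor -> ClusterClient -> JobClient layering
described above can be made concrete with a small sketch. These are NOT
Flink's real interfaces; every name, signature, and the in-memory
implementation below are simplified assumptions for illustration only.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class ClientLayeringSketch {

    // Describes how to talk to a cluster manager (Yarn, Mesos, standalone).
    public interface ClusterDescriptor {
        ClusterClient deploySessionCluster();
    }

    // Controls the cluster: submits jobs, lists all running jobs.
    public interface ClusterClient {
        JobClient submitJob(String jobName);
        List<String> listJobs();
    }

    // Controls one job: job-specific operations such as cancellation.
    public interface JobClient {
        String jobId();
        void cancel();
    }

    // Minimal in-memory implementation so the control flow can be exercised.
    public static class InMemoryDescriptor implements ClusterDescriptor {
        @Override
        public ClusterClient deploySessionCluster() {
            final List<String> jobs = new ArrayList<>();
            return new ClusterClient() {
                @Override
                public JobClient submitJob(String jobName) {
                    final String id = jobName + "-" + UUID.randomUUID();
                    jobs.add(id);
                    return new JobClient() {
                        @Override public String jobId() { return id; }
                        @Override public void cancel() { jobs.remove(id); }
                    };
                }
                @Override
                public List<String> listJobs() { return jobs; }
            };
        }
    }
}
```

The point of the layering is that each object controls exactly one scope:
the descriptor deploys, the cluster client manages the cluster, the job
client manages a single job.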
> > > > Hi Folks,
> > > >
> > > > I am trying to integrate Flink into Apache Zeppelin, which is an
> > > > interactive notebook, and I hit several issues caused by the Flink
> > > > client API. So I'd like to propose the following changes to the
> > > > Flink client API.
> > > >
> > > > 1. Support non-blocking execution. Currently,
> > > > ExecutionEnvironment#execute is a blocking method that does two
> > > > things: first it submits the job, then it waits until the job is
> > > > finished. I'd like to introduce a non-blocking execution method like
> > > > ExecutionEnvironment#submit, which only submits the job and then
> > > > returns the jobId to the client, allowing the user to query the job
> > > > status via that jobId.
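[Editor's note: the blocking-vs-non-blocking contract in point 1 can be
sketched as follows. MiniEnvironment and its String jobId are stand-in
types invented for this note, not Flink's real API.]

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class MiniEnvironment {
    public enum JobStatus { RUNNING, FINISHED }

    private final Map<String, CompletableFuture<Void>> jobs =
        new ConcurrentHashMap<>();

    // Non-blocking: submit the job and immediately return its id.
    public String submit(Runnable job) {
        String jobId = UUID.randomUUID().toString();
        jobs.put(jobId, CompletableFuture.runAsync(job));
        return jobId;
    }

    // Blocking: submit and wait for completion (what execute() does today).
    public void execute(Runnable job) {
        jobs.get(submit(job)).join();
    }

    // Query job status via the id returned by submit().
    public JobStatus queryJobStatus(String jobId) {
        return jobs.get(jobId).isDone() ? JobStatus.FINISHED
                                        : JobStatus.RUNNING;
    }
}
```

With submit() the caller keeps control of the thread and can poll or cancel
later; execute() keeps today's synchronous behavior on top of it.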
> > > > 2. Add a cancel API in
> > > > ExecutionEnvironment/StreamExecutionEnvironment. Currently the only
> > > > way to cancel a job is via the cli (bin/flink), which is not
> > > > convenient for downstream projects to use. So I'd like to add a
> > > > cancel API in ExecutionEnvironment.
> > > >
> > > > 3. Add a savepoint API in
> > > > ExecutionEnvironment/StreamExecutionEnvironment. Similar to the
> > > > cancel API, we should use ExecutionEnvironment as the unified API
> > > > for third parties to integrate with Flink.
> > > >
> > > > 4. Add a listener for the job execution lifecycle, something like
> > > > the following, so that downstream projects can run custom logic in
> > > > the lifecycle of a job. E.g. Zeppelin would capture the jobId after
> > > > the job is submitted and then use this jobId to cancel it later
> > > > when necessary.
> > > >
> > > > public interface JobListener {
> > > >
> > > >   void onJobSubmitted(JobID jobId);
> > > >
> > > >   void onJobExecuted(JobExecutionResult jobResult);
> > > >
> > > >   void onJobCanceled(JobID jobId);
> > > > }
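[Editor's note: to make the listener contract concrete, here is a small
self-contained sketch of how an environment might drive those hooks. JobID
and JobExecutionResult are placeholder stubs, and the dispatch logic is an
assumption, not Flink's actual behavior.]

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerSketch {
    // Placeholder stand-ins for Flink's JobID / JobExecutionResult.
    public static class JobID {
        public final String value;
        public JobID(String v) { value = v; }
    }
    public static class JobExecutionResult {
        public final JobID jobId;
        public JobExecutionResult(JobID id) { jobId = id; }
    }

    public interface JobListener {
        void onJobSubmitted(JobID jobId);
        void onJobExecuted(JobExecutionResult jobResult);
        void onJobCanceled(JobID jobId);
    }

    private final List<JobListener> listeners = new ArrayList<>();

    public void register(JobListener l) { listeners.add(l); }

    // A hypothetical environment would fire the hooks at each phase:
    // always on submission, then either on completion or on cancellation.
    public JobID runJob(String name, boolean cancel) {
        JobID id = new JobID(name);
        listeners.forEach(l -> l.onJobSubmitted(id));
        if (cancel) {
            listeners.forEach(l -> l.onJobCanceled(id));
        } else {
            JobExecutionResult result = new JobExecutionResult(id);
            listeners.forEach(l -> l.onJobExecuted(result));
        }
        return id;
    }
}
```

A Zeppelin-style consumer would record the id in onJobSubmitted and feed it
back into a cancel call later, exactly as described above.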
> > > > 5. Enable sessions in ExecutionEnvironment. Currently this is
> > > > disabled, but sessions are very convenient for third parties that
> > > > submit jobs continually. I hope Flink can enable it again.
> > > >
> > > > 6. Unify all Flink client APIs into
> > > > ExecutionEnvironment/StreamExecutionEnvironment.
> > > >
> > > > This is a long-term issue which needs more careful thinking and
> > > > design. Currently some Flink features are exposed in
> > > > ExecutionEnvironment/StreamExecutionEnvironment, but some are
> > > > exposed in the cli instead of the API, like the cancel and
> > > > savepoint operations I mentioned above. I think the root cause is
> > > > that Flink didn't unify the interaction with Flink. Here I list 3
> > > > scenarios of Flink operation:
> > > >
> > > >  - Local job execution. Flink will create a LocalEnvironment and
> > > >    then use this LocalEnvironment to create a LocalExecutor for
> > > >    job execution.
> > > >  - Remote job execution. Flink will create a ClusterClient first,
> > > >    then create a ContextEnvironment based on the ClusterClient,
> > > >    and then run the job.
> > > >  - Job cancellation. Flink will create a ClusterClient first and
> > > >    then cancel the job via this ClusterClient.
> > > >
> > > > As you can see in the above 3 scenarios, Flink doesn't use the
> > > > same approach (code path) to interact with Flink. What I propose
> > > > is the following: create the proper
> > > > LocalEnvironment/RemoteEnvironment (based on user configuration)
> > > > --> use this Environment to create the proper ClusterClient
> > > > (LocalClusterClient or RestClusterClient) to interact with Flink
> > > > (job execution or cancellation).
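[Editor's note: the configuration-driven factory proposed above can be
sketched as below. The names LocalClusterClient and RestClusterClient mirror
the ones used in the thread, but the code is illustrative only and not
Flink's implementation.]

```java
public class UnifiedEnvSketch {
    public interface ClusterClient {
        String describe();
    }

    // Stand-ins for the two client types mentioned in the thread.
    public static class LocalClusterClient implements ClusterClient {
        public String describe() { return "local"; }
    }
    public static class RestClusterClient implements ClusterClient {
        public String describe() { return "rest"; }
    }

    // The environment is the single entry point: it picks the right
    // client purely from the configured execution mode.
    public static ClusterClient createClient(String mode) {
        switch (mode) {
            case "local":
                return new LocalClusterClient();
            case "yarn":
            case "remote":
                return new RestClusterClient();
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }
}
```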
> > > > This way we can unify the process of local execution and remote
> > > > execution. It also becomes much easier for third parties to
> > > > integrate with Flink, because ExecutionEnvironment is the unified
> > > > entry point for Flink: all a third party needs to do is pass
> > > > configuration to the ExecutionEnvironment, and the
> > > > ExecutionEnvironment will do the right thing based on that
> > > > configuration. The Flink cli can also be considered a Flink API
> > > > consumer; it just passes the configuration to
> > > >>> > > > > > ExecutionEnvironment
> > > >>> > > > > > > > and
> > > >>> > > > > > > > > > > >> >> >> let
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> ExecutionEnvironment
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> to
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> create the proper
> > > >>> > ClusterClient
> > > >>> > > > > > instead
> > > >>> > > > > > > > > of
> > > >>> > > > > > > > > > > >> >> >>>>> letting
> > > >>> > > > > > > > > > > >> >> >>>>>>> cli
> > > >>> > > > > > > > > > > >> >> >>>>>>>> to
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> create
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ClusterClient
> directly.
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 6 would involve
> large code
> > > >>> > > > > > refactoring,
> > > >>> > > > > > > > > so
> > > >>> > > > > > > > > > > >> >> >> I
> > > >>> > > > > > > > > > > >> >> >>>>> think
> > > >>> > > > > > > > > > > >> >> >>>>>> we
> > > >>> > > > > > > > > > > >> >> >>>>>>>> can
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> defer
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> it
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> for
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> future release,
> 1,2,3,4,5
> > > >>> > could
> > > >>> > > > > be
> > > >>> > > > > > done
> > > >>> > > > > > > > > at
> > > >>> > > > > > > > > > > >> >> >>>> once I
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>> believe.
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> Let
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> me
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> know
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> your
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> comments and
> feedback,
> > > >>> > thanks
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> --
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Best Regards
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
> > > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
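The proposal quoted above (an ExecutionEnvironment that creates the matching ClusterClient from user configuration, instead of the CLI doing so) could be sketched roughly as below. All names here (ExecEnv, LocalClusterClient, RestClusterClient, the config keys) are illustrative stand-ins for the discussion, not Flink's actual classes:

```java
import java.util.Map;

// Illustrative stand-ins for the clients the environment would create.
interface ClusterClient {
    String submit(String jobName); // returns a job id / submission target
}

class LocalClusterClient implements ClusterClient {
    public String submit(String jobName) { return "local/" + jobName; }
}

class RestClusterClient implements ClusterClient {
    private final String address;
    RestClusterClient(String address) { this.address = address; }
    public String submit(String jobName) { return address + "/" + jobName; }
}

// The environment is the unified entry point: callers only hand it
// configuration, and it decides which client implementation to create.
class ExecEnv {
    private final Map<String, String> config;
    ExecEnv(Map<String, String> config) { this.config = config; }

    ClusterClient createClusterClient() {
        String target = config.getOrDefault("execution.target", "local");
        return "local".equals(target)
                ? new LocalClusterClient()
                : new RestClusterClient(config.get("rest.address"));
    }
}
```

With this shape, the CLI, a notebook, or any downstream project all follow the same code path: build a configuration, hand it to the environment, and never construct a client directly.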

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Aljoscha Krettek <al...@apache.org>.
Hi Tison,

To keep this moving forward, maybe you want to start working on a proof-of-concept implementation of the new JobClient interface, perhaps with a new method executeAsync() in the environment that returns the JobClient, and implement its methods to see how that works and where we end up. Would you be interested in that?

Also, at some point we should collect all the ideas and start forming an actual FLIP.

Best,
Aljoscha
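A minimal sketch of what such an executeAsync() returning a JobClient might look like; the names JobClient and Env below are illustrative assumptions for this discussion, not the eventual Flink API:

```java
import java.util.concurrent.CompletableFuture;

// A handle to a submitted job; the caller decides whether to block on the
// result future or to detach and only keep the job id around.
interface JobClient {
    String getJobId();
    CompletableFuture<String> getJobResult(); // completes when the job finishes
    CompletableFuture<Void> cancel();
}

class Env {
    // Non-blocking submission: returns immediately with a JobClient handle.
    // In this sketch the "job" completes instantly; a real implementation
    // would complete the future from cluster-side events.
    JobClient executeAsync(String jobName) {
        CompletableFuture<String> result =
                CompletableFuture.completedFuture(jobName + ": FINISHED");
        return new JobClient() {
            public String getJobId() { return "job-" + jobName; }
            public CompletableFuture<String> getJobResult() { return result; }
            public CompletableFuture<Void> cancel() {
                return CompletableFuture.completedFuture(null);
            }
        };
    }
}
```

A blocking execute() could then be expressed on top of this as executeAsync(...).getJobResult().join(), which also makes the "detached" mode a caller-side choice rather than a separate code path.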

> On 4. Sep 2019, at 12:04, Zili Chen <wa...@gmail.com> wrote:
> 
> Thanks for your update Kostas! 
> 
> It looks good to me to clean up the existing code paths as a first
> pass. I'd like to help with review and file subtasks if I find any.
> 
> Best,
> tison.
> 
> 
> Kostas Kloudas <kk...@gmail.com> wrote on Wed, Sep 4, 2019 at 5:52 PM:
> Here is the issue, and I will keep on updating it as I find more issues.
> 
> https://issues.apache.org/jira/browse/FLINK-13954
> 
> This will also cover the refactoring of the Executors that we discussed
> in this thread, without any additional functionality (such as the job client).
> 
> Kostas
> 
> On Wed, Sep 4, 2019 at 11:46 AM Kostas Kloudas <kk...@gmail.com> wrote:
> >
> > Great idea Tison!
> >
> > I will create the umbrella issue and post it here so that we are all
> > on the same page!
> >
> > Cheers,
> > Kostas
> >
> > On Wed, Sep 4, 2019 at 11:36 AM Zili Chen <wa...@gmail.com> wrote:
> > >
> > > Hi Kostas & Aljoscha,
> > >
> > > I notice that there is a JIRA (FLINK-13946) which could be included
> > > in this refactor thread. Since we agree on most of the directions in
> > > the big picture, is it reasonable that we create an umbrella issue for
> > > refactoring the client APIs, with linked subtasks? It would be a better
> > > way to join the forces of our community.
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > Zili Chen <wa...@gmail.com> wrote on Sat, Aug 31, 2019 at 12:52 PM:
> > >>
> > >> Great Kostas! Looking forward to your POC!
> > >>
> > >> Best,
> > >> tison.
> > >>
> > >>
> > >>> Jeff Zhang <zj...@gmail.com> wrote on Fri, Aug 30, 2019 at 11:07 PM:
> > >>>
> > >>> Awesome, @Kostas! Looking forward to your POC.
> > >>>
> > >>> Kostas Kloudas <kk...@gmail.com> wrote on Fri, Aug 30, 2019 at 8:33 PM:
> > >>>
> > >>> > Hi all,
> > >>> >
> > >>> > I am just writing here to let you know that I am working on a POC that
> > >>> > tries to refactor the current state of job submission in Flink.
> > >>> > I want to stress out that it introduces NO CHANGES to the current
> > >>> > behaviour of Flink. It just re-arranges things and introduces the
> > >>> > notion of an Executor, which is the entity responsible for taking the
> > >>> > user-code and submitting it for execution.
> > >>> >
> > >>> > Given this, the discussion about the functionality that the JobClient
> > >>> > will expose to the user can go on independently and the same
> > >>> > holds for all the open questions so far.
> > >>> >
> > >>> > I hope I will have some more news to share soon.
> > >>> >
> > >>> > Thanks,
> > >>> > Kostas
> > >>> >
> > >>> > On Mon, Aug 26, 2019 at 4:20 AM Yang Wang <da...@gmail.com> wrote:
> > >>> > >
> > >>> > > Hi Zili,
> > >>> > >
> > >>> > > It makes sense to me that a dedicated cluster is started for a per-job
> > >>> > > cluster and will not accept more jobs.
> > >>> > > Just have a question about the command line.
> > >>> > >
> > >>> > > Currently we could use the following commands to start different
> > >>> > clusters.
> > >>> > > *per-job cluster*
> > >>> > > ./bin/flink run -d -p 5 -ynm perjob-cluster1 -m yarn-cluster
> > >>> > > examples/streaming/WindowJoin.jar
> > >>> > > *session cluster*
> > >>> > > ./bin/flink run -p 5 -ynm session-cluster1 -m yarn-cluster
> > >>> > > examples/streaming/WindowJoin.jar
> > >>> > >
> > >>> > > What will it look like after the client enhancement?
> > >>> > >
> > >>> > >
> > >>> > > Best,
> > >>> > > Yang
> > >>> > >
> > >>> > > Zili Chen <wa...@gmail.com> wrote on Fri, Aug 23, 2019 at 10:46 PM:
> > >>> > >
> > >>> > > > Hi Till,
> > >>> > > >
> > >>> > > > Thanks for your update. Nice to hear :-)
> > >>> > > >
> > >>> > > > Best,
> > >>> > > > tison.
> > >>> > > >
> > >>> > > >
> > >>> > > > Till Rohrmann <tr...@apache.org> wrote on Fri, Aug 23, 2019 at 10:39 PM:
> > >>> > > >
> > >>> > > > > Hi Tison,
> > >>> > > > >
> > >>> > > > > just a quick comment concerning the class loading issues when using
> > >>> > the
> > >>> > > > per
> > >>> > > > > job mode. The community wants to change it so that the
> > >>> > > > > StandaloneJobClusterEntryPoint actually uses the user code class
> > >>> > loader
> > >>> > > > > with child first class loading [1]. Hence, I hope that this problem
> > >>> > will
> > >>> > > > be
> > >>> > > > > resolved soon.
> > >>> > > > >
> > >>> > > > > [1] https://issues.apache.org/jira/browse/FLINK-13840
> > >>> > > > >
> > >>> > > > > Cheers,
> > >>> > > > > Till
> > >>> > > > >
> > >>> > > > > On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas <kk...@gmail.com>
> > >>> > > > wrote:
> > >>> > > > >
> > >>> > > > > > Hi all,
> > >>> > > > > >
> > >>> > > > > > On the topic of web submission, I agree with Till that it only
> > >>> > seems
> > >>> > > > > > to complicate things.
> > >>> > > > > > It is bad for security, job isolation (anybody can submit/cancel
> > >>> > jobs),
> > >>> > > > > > and its
> > >>> > > > > > implementation complicates some parts of the code. So, if we were to redesign the
> > >>> > > > > > WebUI, maybe this part could be left out. In addition, I would say
> > >>> > > > > > that the ability to cancel
> > >>> > > > > > jobs could also be left out.
> > >>> > > > > >
> > >>> > > > > > Also, I would be in favour of removing the "detached" mode, for
> > >>> > > > > > the reasons mentioned
> > >>> > > > > > above (i.e. because now we will have a future representing the
> > >>> > result
> > >>> > > > > > on which the user
> > >>> > > > > > can choose to wait or not).
> > >>> > > > > >
> > >>> > > > > > Now, for separating job submission and cluster creation, I am in
> > >>> > > > > > favour of keeping both.
> > >>> > > > > > Once again, the reasons are mentioned above by Stephan, Till,
> > >>> > Aljoscha
> > >>> > > > > > and also Zili seems
> > >>> > > > > > to agree. They mainly have to do with security, isolation and ease
> > >>> > of
> > >>> > > > > > resource management
> > >>> > > > > > for the user as he knows that "when my job is done, everything
> > >>> > will be
> > >>> > > > > > cleared up". This is
> > >>> > > > > > also the experience you get when launching a process on your local
> > >>> > OS.
> > >>> > > > > >
> > >>> > > > > > On excluding the per-job mode from returning a JobClient or not, I
> > >>> > > > > > believe that eventually
> > >>> > > > > > it would be nice to allow users to get back a jobClient. The
> > >>> > reason is
> > >>> > > > > > that 1) I cannot
> > >>> > > > > > find any objective reason why the user-experience should diverge,
> > >>> > and
> > >>> > > > > > 2) this will be the
> > >>> > > > > > way that the user will be able to interact with his running job.
> > >>> > > > > > Assuming that the necessary
> > >>> > > > > > ports are open for the REST API to work, then I think that the
> > >>> > > > > > JobClient can run against the
> > >>> > > > > > REST API without problems. If the needed ports are not open, then
> > >>> > we
> > >>> > > > > > are safe to not return
> > >>> > > > > > a JobClient, as the user explicitly chose to close all points of
> > >>> > > > > > communication to his running job.
> > >>> > > > > >
> > >>> > > > > > On the topic of not hijacking the "env.execute()" in order to get
> > >>> > the
> > >>> > > > > > Plan, I definitely agree but
> > >>> > > > > > for the proposal of having a "compile()" method in the env, I would
> > >>> > > > > > like to have a better look at
> > >>> > > > > > the existing code.
> > >>> > > > > >
> > >>> > > > > > Cheers,
> > >>> > > > > > Kostas
> > >>> > > > > >
> > >>> > > > > > On Fri, Aug 23, 2019 at 5:52 AM Zili Chen <wa...@gmail.com>
> > >>> > > > wrote:
> > >>> > > > > > >
> > >>> > > > > > > Hi Yang,
> > >>> > > > > > >
> > >>> > > > > > > It would be helpful if you check Stephan's last comment,
> > >>> > > > > > > which states that isolation is important.
> > >>> > > > > > >
> > >>> > > > > > > For per-job mode, we run a dedicated cluster(maybe it
> > >>> > > > > > > should have been a couple of JM and TMs during FLIP-6
> > >>> > > > > > > design) for a specific job. Thus the process is prevented
> > >>> > > > > > > from other jobs.
> > >>> > > > > > >
> > >>> > > > > > > In our case, there was a time we suffered from multiple
> > >>> > > > > > > jobs submitted by different users, and they affected
> > >>> > > > > > > each other so that all ran into an error state. Also,
> > >>> > > > > > > run the client inside the cluster could save client
> > >>> > > > > > > resource at some points.
> > >>> > > > > > >
> > >>> > > > > > > However, we also face several issues as you mentioned,
> > >>> > > > > > > that in per-job mode it always uses parent classloader
> > >>> > > > > > > thus classloading issues occur.
> > >>> > > > > > >
> > >>> > > > > > > BTW, one can make an analogy between session/per-job mode
> > >>> > > > > > > in  Flink, and client/cluster mode in Spark.
> > >>> > > > > > >
> > >>> > > > > > > Best,
> > >>> > > > > > > tison.
> > >>> > > > > > >
> > >>> > > > > > >
> > >>> > > > > > > Yang Wang <da...@gmail.com> wrote on Thu, Aug 22, 2019 at 11:25 AM:
> > >>> > > > > > >
> > >>> > > > > > > > From the user's perspective, the scope of a per-job cluster is
> > >>> > > > > > > > really confusing.
> > >>> > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > > > If it means a Flink cluster with a single job, then we could get
> > >>> > > > > > > > better isolation.
> > >>> > > > > > > >
> > >>> > > > > > > > Now it does not matter how we deploy the cluster: directly deploy it
> > >>> > > > > > > > (mode 1), or start a Flink cluster and then submit the job through a
> > >>> > > > > > > > cluster client (mode 2).
> > >>> > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > > > Otherwise, if it just means direct deployment, how should we name
> > >>> > > > > > > > mode 2: "session with job", or something else?
> > >>> > > > > > > >
> > >>> > > > > > > > We could also benefit from mode 2. Users could get the same
> > >>> > > > > > > > isolation as with mode 1.
> > >>> > > > > > > >
> > >>> > > > > > > > The user code and dependencies will be loaded by user class
> > >>> > loader
> > >>> > > > > > > >
> > >>> > > > > > > > to avoid class conflict with framework.
> > >>> > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > > > Anyway, both submission modes are useful.
> > >>> > > > > > > >
> > >>> > > > > > > > We just need to clarify the concepts.
> > >>> > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > > > Best,
> > >>> > > > > > > >
> > >>> > > > > > > > Yang
> > >>> > > > > > > >
> > >>> > > > > > > > Zili Chen <wa...@gmail.com> wrote on Tue, Aug 20, 2019 at 5:58 PM:
> > >>> > > > > > > >
> > >>> > > > > > > > > Thanks for the clarification.
> > >>> > > > > > > > >
> > >>> > > > > > > > > The idea of a JobDeployer came into my mind when I was puzzling
> > >>> > > > > > > > > over how to execute per-job mode and session mode with the same
> > >>> > > > > > > > > user code and framework code path.
> > >>> > > > > > > > >
> > >>> > > > > > > > > With the JobDeployer concept, we are back to the statement that the
> > >>> > > > > > > > > environment knows all the configs for cluster deployment and job
> > >>> > > > > > > > > submission. We configure, or generate from configuration, a specific
> > >>> > > > > > > > > JobDeployer in the environment, and then the code aligns on
> > >>> > > > > > > > >
> > >>> > > > > > > > > *JobClient client = env.execute().get();*
> > >>> > > > > > > > >
> > >>> > > > > > > > > which in session mode is returned by clusterClient.submitJob, and
> > >>> > > > > > > > > in per-job mode by clusterDescriptor.deployJobCluster.
> > >>> > > > > > > > >
> > >>> > > > > > > > > One problem is that currently we directly run the ClusterEntrypoint
> > >>> > > > > > > > > with the extracted job graph. Following the JobDeployer approach, we
> > >>> > > > > > > > > had better align the entry point of per-job deployment at the
> > >>> > > > > > > > > JobDeployer: users run their main method, or a CLI (which finally
> > >>> > > > > > > > > calls the main method), to deploy the job cluster.
> > >>> > > > > > > > >
> > >>> > > > > > > > > Best,
> > >>> > > > > > > > > tison.
> > >>> > > > > > > > >
> > >>> > > > > > > > >
> > >>> > > > > > > > > Stephan Ewen <se...@apache.org> wrote on Tue, Aug 20, 2019 at 4:40 PM:
> > >>> > > > > > > > >
> > >>> > > > > > > > > > Till has made some good comments here.
> > >>> > > > > > > > > >
> > >>> > > > > > > > > > Two things to add:
> > >>> > > > > > > > > >
> > >>> > > > > > > > > >   - The job mode is very nice in the way that it runs the
> > >>> > > > client
> > >>> > > > > > inside
> > >>> > > > > > > > > the
> > >>> > > > > > > > > > cluster (in the same image/process that is the JM) and thus
> > >>> > > > > unifies
> > >>> > > > > > > > both
> > >>> > > > > > > > > > applications and what the Spark world calls the "driver
> > >>> > mode".
> > >>> > > > > > > > > >
> > >>> > > > > > > > > >   - Another thing I would add is that during the FLIP-6
> > >>> > design,
> > >>> > > > > we
> > >>> > > > > > were
> > >>> > > > > > > > > > thinking about setups where Dispatcher and JobManager are
> > >>> > > > > separate
> > >>> > > > > > > > > > processes.
> > >>> > > > > > > > > >     A Yarn or Mesos Dispatcher of a session could run
> > >>> > > > > independently
> > >>> > > > > > > > (even
> > >>> > > > > > > > > > as privileged processes executing no code).
> > >>> > > > > > > > > >     Then the "per-job" mode could still be helpful: when a job is
> > >>> > > > > > > > > >     submitted to the dispatcher, it launches the JM again in per-job
> > >>> > > > > > > > > >     mode, so that the JM and TM processes are bound to the job only.
> > >>> > > > > > > > > >     For higher security setups, it is important that processes are
> > >>> > > > > > > > > >     not reused across jobs.
> > >>> > > > > > > > > >
> > >>> > > > > > > > > > On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <
> > >>> > > > > > trohrmann@apache.org>
> > >>> > > > > > > > > > wrote:
> > >>> > > > > > > > > >
> > >>> > > > > > > > > > > I would not be in favour of getting rid of the per-job
> > >>> > mode
> > >>> > > > > > since it
> > >>> > > > > > > > > > > simplifies the process of running Flink jobs
> > >>> > considerably.
> > >>> > > > > > Moreover,
> > >>> > > > > > > > it
> > >>> > > > > > > > > > is
> > >>> > > > > > > > > > > not only well suited for container deployments but also
> > >>> > for
> > >>> > > > > > > > deployments
> > >>> > > > > > > > > > > where you want to guarantee job isolation. For example, a
> > >>> > > > user
> > >>> > > > > > could
> > >>> > > > > > > > > use
> > >>> > > > > > > > > > > the per-job mode on Yarn to execute his job on a separate
> > >>> > > > > > cluster.
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > I think that having two notions of cluster deployments
> > >>> > > > (session
> > >>> > > > > > vs.
> > >>> > > > > > > > > > per-job
> > >>> > > > > > > > > > > mode) does not necessarily contradict your ideas for the
> > >>> > > > client
> > >>> > > > > > api
> > >>> > > > > > > > > > > refactoring. For example one could have the following
> > >>> > > > > interfaces:
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > - ClusterDeploymentDescriptor: encapsulates the logic
> > >>> > how to
> > >>> > > > > > deploy a
> > >>> > > > > > > > > > > cluster.
> > >>> > > > > > > > > > > - ClusterClient: allows to interact with a cluster
> > >>> > > > > > > > > > > - JobClient: allows to interact with a running job
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > Now the ClusterDeploymentDescriptor could have two
> > >>> > methods:
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > - ClusterClient deploySessionCluster()
> > >>> > > > > > > > > > > - JobClusterClient/JobClient deployPerJobCluster(JobGraph)
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > where JobClusterClient is either a supertype of
> > >>> > ClusterClient
> > >>> > > > > > which
> > >>> > > > > > > > > does
> > >>> > > > > > > > > > > not give you the functionality to submit jobs or
> > >>> > > > > > deployPerJobCluster
> > >>> > > > > > > > > > > returns directly a JobClient.
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > When setting up the ExecutionEnvironment, one would then
> > >>> > not
> > >>> > > > > > provide
> > >>> > > > > > > > a
> > >>> > > > > > > > > > > ClusterClient to submit jobs but a JobDeployer which,
> > >>> > > > depending
> > >>> > > > > > on
> > >>> > > > > > > > the
> > >>> > > > > > > > > > > selected mode, either uses a ClusterClient (session
> > >>> > mode) to
> > >>> > > > > > submit
> > >>> > > > > > > > > jobs
> > >>> > > > > > > > > > or
> > >>> > > > > > > > > > > a ClusterDeploymentDescriptor to deploy a per-job mode cluster
> > >>> > > > > > > > > > > with the job to execute.
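Till's sketch of the two deployment notions behind one JobDeployer could be modeled roughly as below. All type names are taken from the discussion and are assumptions for illustration, not existing Flink classes:

```java
// A handle to a running job, and a client for an existing session cluster.
interface JobClient { String jobId(); }
interface ClusterClient { JobClient submitJob(String jobGraph); }

// Encapsulates how a cluster is brought up, in either of the two modes.
interface ClusterDeploymentDescriptor {
    ClusterClient deploySessionCluster();
    JobClient deployPerJobCluster(String jobGraph);
}

// The environment would hold a JobDeployer instead of a raw ClusterClient;
// user programs keep a single code path regardless of the selected mode.
class JobDeployer {
    private final boolean perJobMode;
    private final ClusterDeploymentDescriptor descriptor;

    JobDeployer(boolean perJobMode, ClusterDeploymentDescriptor descriptor) {
        this.perJobMode = perJobMode;
        this.descriptor = descriptor;
    }

    JobClient deploy(String jobGraph) {
        return perJobMode
                ? descriptor.deployPerJobCluster(jobGraph)          // fresh cluster per job
                : descriptor.deploySessionCluster().submitJob(jobGraph); // reuse session
    }
}

// In-memory stub, only so the sketch can be exercised without a cluster.
class StubDescriptor implements ClusterDeploymentDescriptor {
    public ClusterClient deploySessionCluster() {
        return graph -> () -> "session:" + graph;
    }
    public JobClient deployPerJobCluster(String graph) {
        return () -> "perjob:" + graph;
    }
}
```

The point of the shape is exactly what the mail argues: having two cluster notions does not force two user-facing code paths, because the mode choice is absorbed inside the deployer.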
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > These are just some thoughts how one could make it
> > >>> > working
> > >>> > > > > > because I
> > >>> > > > > > > > > > > believe there is some value in using the per job mode
> > >>> > from
> > >>> > > > the
> > >>> > > > > > > > > > > ExecutionEnvironment.
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > Concerning the web submission, this is indeed a bit
> > >>> > tricky.
> > >>> > > > > From
> > >>> > > > > > a
> > >>> > > > > > > > > > cluster
> > >>> > > > > > > > > > > management stand point, I would in favour of not
> > >>> > executing
> > >>> > > > user
> > >>> > > > > > code
> > >>> > > > > > > > on
> > >>> > > > > > > > > > the
> > >>> > > > > > > > > > > REST endpoint. Especially when considering security, it
> > >>> > would
> > >>> > > > > be
> > >>> > > > > > good
> > >>> > > > > > > > > to
> > >>> > > > > > > > > > > have a well defined cluster behaviour where it is
> > >>> > explicitly
> > >>> > > > > > stated
> > >>> > > > > > > > > where
> > >>> > > > > > > > > > > user code and, thus, potentially risky code is executed.
> > >>> > > > > Ideally
> > >>> > > > > > we
> > >>> > > > > > > > > limit
> > >>> > > > > > > > > > > it to the TaskExecutor and JobMaster.
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > Cheers,
> > >>> > > > > > > > > > > Till
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > On Tue, Aug 20, 2019 at 9:40 AM Flavio Pompermaier <
> > >>> > > > > > > > > pompermaier@okkam.it
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > wrote:
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > > In my opinion the client should not use any
> > >>> > environment to
> > >>> > > > > get
> > >>> > > > > > the
> > >>> > > > > > > > > Job
> > >>> > > > > > > > > > > > graph because the jar should reside ONLY on the cluster
> > >>> > > > (and
> > >>> > > > > > not in
> > >>> > > > > > > > > the
> > >>> > > > > > > > > > > > client classpath otherwise there are always
> > >>> > inconsistencies
> > >>> > > > > > between
> > >>> > > > > > > > > > > client
> > >>> > > > > > > > > > > > and Flink Job manager's classpath).
> > >>> > > > > > > > > > > > In the YARN, Mesos and Kubernetes scenarios you have
> > >>> > the
> > >>> > > > jar
> > >>> > > > > > but
> > >>> > > > > > > > you
> > >>> > > > > > > > > > > could
> > >>> > > > > > > > > > > > start a cluster that has the jar on the Job Manager as
> > >>> > well
> > >>> > > > > > (but
> > >>> > > > > > > > this
> > >>> > > > > > > > > > is
> > >>> > > > > > > > > > > > the only case where I think you can assume that the
> > >>> > client
> > >>> > > > > has
> > >>> > > > > > the
> > >>> > > > > > > > > jar
> > >>> > > > > > > > > > on
> > >>> > > > > > > > > > > > the classpath..in the REST job submission you don't
> > >>> > have
> > >>> > > > any
> > >>> > > > > > > > > > classpath).
> > >>> > > > > > > > > > > >
> > >>> > > > > > > > > > > > Thus, always in my opinion, the JobGraph should be
> > >>> > > > generated
> > >>> > > > > > by the
> > >>> > > > > > > > > Job
> > >>> > > > > > > > > > > > Manager REST API.
> > >>> > > > > > > > > > > >
> > >>> > > > > > > > > > > >
> > >>> > > > > > > > > > > > On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <
> > >>> > > > > > wander4096@gmail.com>
> > >>> > > > > > > > > > wrote:
> > >>> > > > > > > > > > > >
> > >>> > > > > > > > > > > >> I would like to involve Till & Stephan here to clarify
> > >>> > > > some
> > >>> > > > > > > > concept
> > >>> > > > > > > > > of
> > >>> > > > > > > > > > > >> per-job mode.
> > >>> > > > > > > > > > > >>
> > >>> > > > > > > > > > > >> The term per-job is one of the modes a cluster could run in. It is
> > >>> > > > > > > > > > > >> mainly aimed at spawning a dedicated cluster for a specific job;
> > >>> > > > > > > > > > > >> the job could be packaged with Flink itself, so that the cluster is
> > >>> > > > > > > > > > > >> initialized with the job and a separate submission step is avoided.
> > >>> > > > > > > > > > > >>
> > >>> > > > > > > > > > > >> This is useful for container deployments where one creates an image
> > >>> > > > > > > > > > > >> with the job and then simply deploys the container.
> > >>> > > > > > > > > > > >>
> > >>> > > > > > > > > > > >> However, it is out of the client's scope, since a client
> > >>> > > > > > > > > > > >> (ClusterClient for example) is for communicating with an existing
> > >>> > > > > > > > > > > >> cluster and performing actions. Currently, in per-job mode, we
> > >>> > > > > > > > > > > >> extract the job graph and bundle it into the cluster deployment,
> > >>> > > > > > > > > > > >> and thus no concept of client gets involved. It looks reasonable to
> > >>> > > > > > > > > > > >> exclude the deployment of per-job clusters from the client api and
> > >>> > > > > > > > > > > >> use dedicated utility classes (deployers) for deployment.
> > >>> > > > > > > > > > > >>
> > >>> > > > > > > > > > > >>
> > >>> > > > > > > > > > > >> Zili Chen <wa...@gmail.com> wrote on Tue, Aug 20, 2019 at 12:37 PM:
> > >>> > > > > > > > > > > >>
> > >>> > > > > > > > > > > >> > Hi Aljoscha,
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > Thanks for your reply and participation. The Google Doc you linked to
> > >>> > > > > > > > > > > >> > requires permission; I think you could use a share link instead.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > I agree that we have almost reached a consensus that a JobClient is
> > >>> > > > > > > > > > > >> > necessary to interact with a running job.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > Let me check your open questions one by one.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > 1. Separate cluster creation and job submission for per-job mode.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > As you mentioned, this is where opinions diverge. In my document there is
> > >>> > > > > > > > > > > >> > an alternative[2] that proposes excluding per-job deployment from the
> > >>> > > > > > > > > > > >> > client API's scope, and now I find it more reasonable to do that
> > >>> > > > > > > > > > > >> > exclusion.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > In per-job mode, a dedicated JobCluster is launched to execute one
> > >>> > > > > > > > > > > >> > specific job. It is more like a Flink application than a submission of a
> > >>> > > > > > > > > > > >> > Flink job. The client only takes care of job submission and assumes there
> > >>> > > > > > > > > > > >> > is an existing cluster. In this way we are able to consider per-job
> > >>> > > > > > > > > > > >> > issues individually, and JobClusterEntrypoint would be the utility class
> > >>> > > > > > > > > > > >> > for per-job deployment.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > Nevertheless, a user program works in both session mode and per-job mode
> > >>> > > > > > > > > > > >> > without any need to change code. In per-job mode, a JobClient is returned
> > >>> > > > > > > > > > > >> > from env.execute() as usual. However, it would no longer be a wrapper of
> > >>> > > > > > > > > > > >> > RestClusterClient but a wrapper of PerJobClusterClient, which
> > >>> > > > > > > > > > > >> > communicates with the Dispatcher locally.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > 2. How to deal with plan preview.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > With env.compile() functions, users can get the JobGraph or FlinkPlan and
> > >>> > > > > > > > > > > >> > thus preview the plan programmatically. Typically it looks like
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > if (preview configured) {
> > >>> > > > > > > > > > > >> >     FlinkPlan plan = env.compile();
> > >>> > > > > > > > > > > >> >     new JSONDumpGenerator(...).dump(plan);
> > >>> > > > > > > > > > > >> > } else {
> > >>> > > > > > > > > > > >> >     env.execute();
> > >>> > > > > > > > > > > >> > }
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > And `flink info` would no longer be valid.
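[Editor's note: a self-contained sketch of the preview-vs-execute flow discussed above. All types here (FlinkPlan, Env, the compile()/execute() shapes) are hypothetical stand-ins for illustration, not the actual Flink API.]

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the proposed "preview or execute" decision: env.compile()
// returns a plan without submitting anything; env.execute() submits.
public class PlanPreviewSketch {

    // Stand-in for a compiled plan (the real FlinkPlan is far richer).
    static class FlinkPlan {
        final List<String> operators;
        FlinkPlan(List<String> operators) { this.operators = operators; }
    }

    // Stand-in for the execution environment.
    static class Env {
        FlinkPlan compile() {
            // In the proposal, this only builds the plan; nothing runs.
            return new FlinkPlan(Arrays.asList("source", "map", "sink"));
        }
        String execute() {
            // In the proposal, this submits the job and returns a handle.
            return "job-submitted";
        }
    }

    static String run(Env env, boolean previewConfigured) {
        if (previewConfigured) {
            // Dump the plan instead of executing (what `flink info` did).
            return String.join(" -> ", env.compile().operators);
        } else {
            return env.execute();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(new Env(), true));
        System.out.println(run(new Env(), false));
    }
}
```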
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > 3. How to deal with Jar Submission at the Web Frontend.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > There is another thread on this topic[1]. Apart from removing the
> > >>> > > > > > > > > > > >> > functionality, there are two alternatives.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > One is to introduce an interface with a method that returns a
> > >>> > > > > > > > > > > >> > JobGraph/FlinkPlan, and have Jar Submission only support main classes
> > >>> > > > > > > > > > > >> > that implement this interface. We would then extract the
> > >>> > > > > > > > > > > >> > JobGraph/FlinkPlan simply by calling that method. In this way it even
> > >>> > > > > > > > > > > >> > becomes possible to consider a separation of job creation and job
> > >>> > > > > > > > > > > >> > submission.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > The other is, as you mentioned, to let execute() do the actual execution.
> > >>> > > > > > > > > > > >> > We would not execute the main method in the WebFrontend but spawn a
> > >>> > > > > > > > > > > >> > process on the WebMonitor side to execute it. For the return part, we
> > >>> > > > > > > > > > > >> > could generate the JobID in the WebMonitor and pass it to the execution
> > >>> > > > > > > > > > > >> > environment.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > 4. How to deal with detached mode.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > I think detached mode is a temporary solution for non-blocking
> > >>> > > > > > > > > > > >> > submission. In my document, both submission and execution return a
> > >>> > > > > > > > > > > >> > CompletableFuture, and users control whether or not to wait for the
> > >>> > > > > > > > > > > >> > result. With that, we no longer need a detached option, but its
> > >>> > > > > > > > > > > >> > functionality is covered.
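[Editor's note: a minimal sketch of how a future-based API could subsume detached mode, as described above. The JobClient and submitJob() shapes below are hypothetical, not the actual Flink API.]

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Submission completes once the cluster accepts the job; the job result
// completes later, independently. Callers choose whether to wait.
public class FutureSubmissionSketch {

    // Handle for a running job; the real proposal attaches richer operations.
    static class JobClient {
        final String jobId;
        final CompletableFuture<String> result;
        JobClient(String jobId, CompletableFuture<String> result) {
            this.jobId = jobId;
            this.result = result;
        }
    }

    static CompletableFuture<JobClient> submitJob(String jobName) {
        // Simulated job execution running asynchronously after acceptance.
        CompletableFuture<String> result =
                CompletableFuture.supplyAsync(() -> jobName + ":FINISHED");
        return CompletableFuture.completedFuture(
                new JobClient(jobName + "-id", result));
    }

    public static void main(String[] args) throws Exception {
        JobClient client = submitJob("wordcount").join();

        // "Detached" behavior: return right after submission.
        System.out.println("submitted " + client.jobId);

        // "Attached" behavior: explicitly wait for the job result.
        System.out.println(client.result.get(10, TimeUnit.SECONDS));
    }
}
```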
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > 5. How does per-job mode interact with interactive programming.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > YARN, Mesos and Kubernetes scenarios all follow the pattern of launching
> > >>> > > > > > > > > > > >> > a JobCluster now, and I don't think there would be any inconsistency
> > >>> > > > > > > > > > > >> > between the different resource managers.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > Best,
> > >>> > > > > > > > > > > >> > tison.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > [1]
> > >>> > https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
> > >>> > > > > > > > > > > >> > [2]
> > >>> > https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > Aljoscha Krettek <al...@apache.org> wrote on Fri, Aug 16, 2019, 9:20 PM:
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> >> Hi,
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >> I read both Jeff's initial design document and the newer document by
> > >>> > > > > > > > > > > >> >> Tison. I also finally found the time to collect our thoughts on the
> > >>> > > > > > > > > > > >> >> issue; I had quite some discussions with Kostas, and this is the
> > >>> > > > > > > > > > > >> >> result: [1].
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >> I think overall we agree that this part of the code is in dire need of
> > >>> > > > > > > > > > > >> >> refactoring/improvement, but there are still some open questions and
> > >>> > > > > > > > > > > >> >> some differences of opinion on what those refactorings should look
> > >>> > > > > > > > > > > >> >> like.
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >> I think the API side is quite clear, i.e. we need some JobClient API
> > >>> > > > > > > > > > > >> >> that allows interacting with a running job. It could be worthwhile to
> > >>> > > > > > > > > > > >> >> spin that off into a separate FLIP, because we can probably find
> > >>> > > > > > > > > > > >> >> consensus on that part more easily.
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >> For the rest, the main open questions from our doc are these:
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >>   - Do we want to separate cluster creation and job submission for
> > >>> > > > > > > > > > > >> >> per-job mode? In the past, there were conscious efforts to *not*
> > >>> > > > > > > > > > > >> >> separate job submission from cluster creation for per-job clusters on
> > >>> > > > > > > > > > > >> >> Mesos, YARN, and Kubernetes (see StandaloneJobClusterEntryPoint). Tison
> > >>> > > > > > > > > > > >> >> suggests in his design document decoupling this in order to unify job
> > >>> > > > > > > > > > > >> >> submission.
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >>   - How to deal with plan preview, which needs to hijack execute() and
> > >>> > > > > > > > > > > >> >> let the outside code catch an exception?
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >>   - How to deal with Jar Submission at the Web Frontend, which needs to
> > >>> > > > > > > > > > > >> >> hijack execute() and let the outside code catch an exception?
> > >>> > > > > > > > > > > >> >> CliFrontend.run() “hijacks” ExecutionEnvironment.execute() to get a
> > >>> > > > > > > > > > > >> >> JobGraph and then executes that JobGraph manually. We could get around
> > >>> > > > > > > > > > > >> >> that by letting execute() do the actual execution. One caveat is that
> > >>> > > > > > > > > > > >> >> the main() method then doesn’t return (or is forced to return by
> > >>> > > > > > > > > > > >> >> throwing an exception from execute()), which means that for Jar
> > >>> > > > > > > > > > > >> >> Submission from the WebFrontend we would have a long-running main()
> > >>> > > > > > > > > > > >> >> method running in the WebFrontend. This doesn’t sound very good. We
> > >>> > > > > > > > > > > >> >> could get around this by removing the plan preview feature and by
> > >>> > > > > > > > > > > >> >> removing Jar Submission/Running.
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >>   - How to deal with detached mode? Right now, DetachedEnvironment will
> > >>> > > > > > > > > > > >> >> execute the job and return immediately. If users control when they want
> > >>> > > > > > > > > > > >> >> to return, by waiting on the job-completion future, how do we deal with
> > >>> > > > > > > > > > > >> >> this? Do we simply remove the distinction between
> > >>> > > > > > > > > > > >> >> detached/non-detached?
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >>   - How does per-job mode interact with “interactive programming”
> > >>> > > > > > > > > > > >> >> (FLIP-36)? For YARN, each execute() call could spawn a new Flink YARN
> > >>> > > > > > > > > > > >> >> cluster. What about Mesos and Kubernetes?
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >> The first open question is where the opinions diverge, I think. The
> > >>> > > > > > > > > > > >> >> rest are just open questions and interesting things that we need to
> > >>> > > > > > > > > > > >> >> consider.
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >> Best,
> > >>> > > > > > > > > > > >> >> Aljoscha
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >> [1]
> > >>> > https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >> > On 31. Jul 2019, at 15:23, Jeff Zhang <zjffdu@gmail.com> wrote:
> > >>> > > > > > > > > > > >> >> >
> > >>> > > > > > > > > > > >> >> > Thanks Tison for the effort. I left a few comments.
> > >>> > > > > > > > > > > >> >> >
> > >>> > > > > > > > > > > >> >> >
> > >>> > > > > > > > > > > >> >> > Zili Chen <wa...@gmail.com> wrote on Wed, Jul 31, 2019, 8:24 PM:
> > >>> > > > > > > > > > > >> >> >
> > >>> > > > > > > > > > > >> >> >> Hi Flavio,
> > >>> > > > > > > > > > > >> >> >>
> > >>> > > > > > > > > > > >> >> >> Thanks for your reply.
> > >>> > > > > > > > > > > >> >> >>
> > >>> > > > > > > > > > > >> >> >> In both the current implementation and the design, ClusterClient
> > >>> > > > > > > > > > > >> >> >> never takes responsibility for generating the JobGraph (what you see
> > >>> > > > > > > > > > > >> >> >> in the current codebase is several class methods).
> > >>> > > > > > > > > > > >> >> >>
> > >>> > > > > > > > > > > >> >> >> Instead, the user describes the program in the main method with
> > >>> > > > > > > > > > > >> >> >> ExecutionEnvironment APIs and calls env.compile() or env.optimize()
> > >>> > > > > > > > > > > >> >> >> to get the FlinkPlan and JobGraph, respectively.
> > >>> > > > > > > > > > > >> >> >>
> > >>> > > > > > > > > > > >> >> >> For listing the main classes in a jar and choosing one for
> > >>> > > > > > > > > > > >> >> >> submission, you are already able to customize a CLI to do it.
> > >>> > > > > > > > > > > >> >> >> Specifically, the path of the jar is passed as an argument, and in
> > >>> > > > > > > > > > > >> >> >> the customized CLI you list the main classes and choose one to
> > >>> > > > > > > > > > > >> >> >> submit to the cluster.
> > >>> > > > > > > > > > > >> >> >>
> > >>> > > > > > > > > > > >> >> >> Best,
> > >>> > > > > > > > > > > >> >> >> tison.
> > >>> > > > > > > > > > > >> >> >>
> > >>> > > > > > > > > > > >> >> >>
> > >>> > > > > > > > > > > >> >> >> Flavio Pompermaier <po...@okkam.it> wrote on Wed, Jul 31, 2019, 8:12 PM:
> > >>> > > > > > > > > > > >> >> >>
> > >>> > > > > > > > > > > >> >> >>> Just one note on my side: it is not clear to me whether the client
> > >>> > > > > > > > > > > >> >> >>> needs to be able to generate a job graph or not. In my opinion, the
> > >>> > > > > > > > > > > >> >> >>> job jar should reside only on the server/JobManager side, and the
> > >>> > > > > > > > > > > >> >> >>> client needs a way to get the job graph.
> > >>> > > > > > > > > > > >> >> >>> If you really want access to the job graph, I'd add dedicated
> > >>> > > > > > > > > > > >> >> >>> methods on the ClusterClient, like:
> > >>> > > > > > > > > > > >> >> >>>
> > >>> > > > > > > > > > > >> >> >>>   - getJobGraph(jarId, mainClass): JobGraph
> > >>> > > > > > > > > > > >> >> >>>   - listMainClasses(jarId): List<String>
> > >>> > > > > > > > > > > >> >> >>>
> > >>> > > > > > > > > > > >> >> >>> These would also require some additions on the JobManager endpoint.
> > >>> > > > > > > > > > > >> >> >>> What do you think?
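[Editor's note: a sketch of how the two suggested methods could look on a client interface. JobGraph is stubbed, the jar ids and class names are invented for illustration, and no such methods exist in Flink today.]

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical client-side additions: the cluster compiles the job graph
// server-side from an uploaded jar, so the client needs no Flink
// compilation code and a UI can list candidate entry points.
public class JarIntrospectionSketch {

    // Stand-in for the real JobGraph.
    static class JobGraph {
        final String mainClass;
        JobGraph(String mainClass) { this.mainClass = mainClass; }
    }

    interface ClusterClientExtras {
        // Ask the cluster to build the job graph for one entry point.
        JobGraph getJobGraph(String jarId, String mainClass);

        // List the candidate main classes of an uploaded jar.
        List<String> listMainClasses(String jarId);
    }

    // Trivial in-memory stand-in for a JobManager-backed implementation.
    static class FakeClient implements ClusterClientExtras {
        public JobGraph getJobGraph(String jarId, String mainClass) {
            return new JobGraph(mainClass);
        }
        public List<String> listMainClasses(String jarId) {
            return Arrays.asList("com.example.BatchJob", "com.example.StreamJob");
        }
    }

    public static void main(String[] args) {
        ClusterClientExtras client = new FakeClient();
        String chosen = client.listMainClasses("jar-1").get(0);
        System.out.println(client.getJobGraph("jar-1", chosen).mainClass);
    }
}
```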
> > >>> > > > > > > > > > > >> >> >>>
> > >>> > > > > > > > > > > >> >> >>> On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <wander4096@gmail.com> wrote:
> > >>> > > > > > > > > > > >> >> >>>
> > >>> > > > > > > > > > > >> >> >>>> Hi all,
> > >>> > > > > > > > > > > >> >> >>>>
> > >>> > > > > > > > > > > >> >> >>>> Here is a document[1] on client API enhancement from our
> > >>> > > > > > > > > > > >> >> >>>> perspective. We have investigated the current implementations, and
> > >>> > > > > > > > > > > >> >> >>>> we propose:
> > >>> > > > > > > > > > > >> >> >>>>
> > >>> > > > > > > > > > > >> >> >>>> 1. Unify the implementation of cluster deployment and job
> > >>> > > > > > > > > > > >> >> >>>> submission in Flink.
> > >>> > > > > > > > > > > >> >> >>>> 2. Provide programmatic interfaces to allow flexible job and
> > >>> > > > > > > > > > > >> >> >>>> cluster management.
> > >>> > > > > > > > > > > >> >> >>>>
> > >>> > > > > > > > > > > >> >> >>>> The first proposal is aimed at reducing the code paths of cluster
> > >>> > > > > > > > > > > >> >> >>>> deployment and job submission so that one can adopt Flink easily.
> > >>> > > > > > > > > > > >> >> >>>> The second proposal is aimed at providing rich interfaces for
> > >>> > > > > > > > > > > >> >> >>>> advanced users who want precise control over these stages.
> > >>> > > > > > > > > > > >> >> >>>>
> > >>> > > > > > > > > > > >> >> >>>> Quick reference on open questions:
> > >>> > > > > > > > > > > >> >> >>>>
> > >>> > > > > > > > > > > >> >> >>>> 1. Exclude job cluster deployment from the client side, or redefine
> > >>> > > > > > > > > > > >> >> >>>> the semantics of a job cluster? It follows a process quite
> > >>> > > > > > > > > > > >> >> >>>> different from session cluster deployment and job submission.
> > >>> > > > > > > > > > > >> >> >>>>
> > >>> > > > > > > > > > > >> >> >>>> 2. Maintain the code paths handling the class
> > >>> > > > > > > > > > > >> >> >>>> o.a.f.api.common.Program, or implement customized program-handling
> > >>> > > > > > > > > > > >> >> >>>> logic via a customized CliFrontend? See also this thread[2] and the
> > >>> > > > > > > > > > > >> >> >>>> document[1].
> > >>> > > > > > > > > > > >> >> >>>>
> > >>> > > > > > > > > > > >> >> >>>> 3. Expose ClusterClient as a public API, or just expose an API on
> > >>> > > > > > > > > > > >> >> >>>> ExecutionEnvironment and delegate to ClusterClient? Further, either
> > >>> > > > > > > > > > > >> >> >>>> way, is it worth introducing a JobClient, an encapsulation of
> > >>> > > > > > > > > > > >> >> >>>> ClusterClient associated with a specific job?
> > >>> > > > > > > > > > > >> >> >>>>
> > >>> > > > > > > > > > > >> >> >>>> Best,
> > >>> > > > > > > > > > > >> >> >>>> tison.
> > >>> > > > > > > > > > > >> >> >>>>
> > >>> > > > > > > > > > > >> >> >>>> [1]
> > >>> > https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
> > >>> > > > > > > > > > > >> >> >>>> [2]
> > >>> > https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
> > >>> > > > > > > > > > > >> >> >>>>
> > >>> > > > > > > > > > > >> >> >>>> Jeff Zhang <zj...@gmail.com> wrote on Wed, Jul 24, 2019, 9:19 AM:
> > >>> > > > > > > > > > > >> >> >>>>
> > >>> > > > > > > > > > > >> >> >>>>> Thanks Stephan, I will follow up on this issue in the next few
> > >>> > > > > > > > > > > >> >> >>>>> weeks and will refine the design doc. We can discuss more details
> > >>> > > > > > > > > > > >> >> >>>>> after the 1.9 release.
> > >>> > > > > > > > > > > >> >> >>>>>
> > >>> > > > > > > > > > > >> >> >>>>> Stephan Ewen <se...@apache.org> wrote on Wed, Jul 24, 2019, 12:58 AM:
> > >>> > > > > > > > > > > >> >> >>>>>
> > >>> > > > > > > > > > > >> >> >>>>>> Hi all!
> > >>> > > > > > > > > > > >> >> >>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>> This thread has stalled for a bit, which I assume is mostly due
> > >>> > > > > > > > > > > >> >> >>>>>> to the Flink 1.9 feature freeze and release-testing effort.
> > >>> > > > > > > > > > > >> >> >>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>> I personally still see this as an important issue to solve. I'd
> > >>> > > > > > > > > > > >> >> >>>>>> be happy to help resume this discussion soon (after the 1.9
> > >>> > > > > > > > > > > >> >> >>>>>> release) and see if we can take some steps towards this in Flink
> > >>> > > > > > > > > > > >> >> >>>>>> 1.10.
> > >>> > > > > > > > > > > >> >> >>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>> Best,
> > >>> > > > > > > > > > > >> >> >>>>>> Stephan
> > >>> > > > > > > > > > > >> >> >>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>> On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <pompermaier@okkam.it> wrote:
> > >>> > > > > > > > > > > >> >> >>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>> That's exactly what I suggested a long time ago: the Flink REST
> > >>> > > > > > > > > > > >> >> >>>>>>> client should not require any Flink dependency, only an HTTP
> > >>> > > > > > > > > > > >> >> >>>>>>> library to call the REST services to submit and monitor a job.
> > >>> > > > > > > > > > > >> >> >>>>>>> What I also suggested in [1] was a way to automatically suggest
> > >>> > > > > > > > > > > >> >> >>>>>>> to the user (via a UI) the available main classes and their
> > >>> > > > > > > > > > > >> >> >>>>>>> required parameters[2].
> > >>> > > > > > > > > > > >> >> >>>>>>> Another problem we have with Flink is that the REST client and
> > >>> > > > > > > > > > > >> >> >>>>>>> the CLI client behave differently; we use the CLI client (via
> > >>> > > > > > > > > > > >> >> >>>>>>> SSH) because it allows calling some other method after
> > >>> > > > > > > > > > > >> >> >>>>>>> env.execute() [3] (we have to call another REST service to
> > >>> > > > > > > > > > > >> >> >>>>>>> signal the end of the job). In this regard, a dedicated
> > >>> > > > > > > > > > > >> >> >>>>>>> interface like the JobListener suggested in the previous emails
> > >>> > > > > > > > > > > >> >> >>>>>>> would be very helpful (IMHO).
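[Editor's note: a minimal sketch of the JobListener idea discussed here: callbacks fired around env.execute(), so downstream code can react to job completion without polling. The interface shape below is hypothetical.]

```java
// Environment stand-in that notifies a registered listener before and
// after job execution, mimicking the hook discussed in the thread.
public class JobListenerSketch {

    interface JobListener {
        void onJobSubmitted(String jobId);
        void onJobFinished(String jobId, boolean succeeded);
    }

    static class Env {
        private JobListener listener;
        void registerJobListener(JobListener l) { this.listener = l; }

        void execute(String jobId) {
            if (listener != null) listener.onJobSubmitted(jobId);
            // ... the actual job would run here ...
            if (listener != null) listener.onJobFinished(jobId, true);
        }
    }

    static final StringBuilder LOG = new StringBuilder();

    public static void main(String[] args) {
        Env env = new Env();
        env.registerJobListener(new JobListener() {
            public void onJobSubmitted(String jobId) {
                LOG.append("submitted:").append(jobId).append(" ");
            }
            public void onJobFinished(String jobId, boolean ok) {
                LOG.append("finished:").append(jobId);
            }
        });
        env.execute("job-42");
        System.out.println(LOG);
    }
}
```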
> > >>> > > > > > > > > > > >> >> >>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-10864
> > >>> > > > > > > > > > > >> >> >>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-10862
> > >>> > > > > > > > > > > >> >> >>>>>>> [3] https://issues.apache.org/jira/browse/FLINK-10879
> > >>> > > > > > > > > > > >> >> >>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>> Best,
> > >>> > > > > > > > > > > >> >> >>>>>>> Flavio
> > >>> > > > > > > > > > > >> >> >>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>> On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > >>> > > > > > > > > > > >> >> >>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>> Hi, Tison,
> > >>> > > > > > > > > > > >> >> >>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>> Thanks for your comments. Overall I agree with you that it is
> > >>> > > > > > > > > > > >> >> >>>>>>>> difficult for downstream projects to integrate with Flink and
> > >>> > > > > > > > > > > >> >> >>>>>>>> that we need to refactor the current Flink client API.
> > >>> > > > > > > > > > > >> >> >>>>>>>> And I agree that CliFrontend should only parse command-line
> > >>> > > > > > > > > > > >> >> >>>>>>>> arguments and then pass them to the ExecutionEnvironment. It is
> > >>> > > > > > > > > > > >> >> >>>>>>>> the ExecutionEnvironment's responsibility to compile the job,
> > >>> > > > > > > > > > > >> >> >>>>>>>> create the cluster, and submit the job. Besides that, Flink
> > >>> > > > > > > > > > > >> >> >>>>>>>> currently has many ExecutionEnvironment implementations and
> > >>> > > > > > > > > > > >> >> >>>>>>>> uses a specific one based on the context. IMHO this is not
> > >>> > > > > > > > > > > >> >> >>>>>>>> necessary; the ExecutionEnvironment should be able to do the
> > >>> > > > > > > > > > > >> >> >>>>>>>> right thing based on the FlinkConf it receives. Too many
> > >>> > > > > > > > > > > >> >> >>>>>>>> ExecutionEnvironment implementations are another burden for
> > >>> > > > > > > > > > > >> >> >>>>>>>> downstream project integration.
> > >>> > > > > > > > > > > >> >> >>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>> One thing I'd like to mention is flink's
> > >>> > scala
> > >>> > > > > > shell
> > >>> > > > > > > > and
> > >>> > > > > > > > > > sql
> > >>> > > > > > > > > > > >> >> >>>> client,
> > >>> > > > > > > > > > > >> >> >>>>>>>> although they are sub-modules of flink,
> > >>> > they
> > >>> > > > > > could be
> > >>> > > > > > > > > > > treated
> > >>> > > > > > > > > > > >> >> >> as
> > >>> > > > > > > > > > > >> >> >>>>>>> downstream
> > >>> > > > > > > > > > > >> >> >>>>>>>> project which use flink's client api.
> > >>> > > > Currently
> > >>> > > > > > you
> > >>> > > > > > > > will
> > >>> > > > > > > > > > > find
> > >>> > > > > > > > > > > >> >> >> it
> > >>> > > > > > > > > > > >> >> >>> is
> > >>> > > > > > > > > > > >> >> >>>>> not
> > >>> > > > > > > > > > > >> >> >>>>>>>> easy for them to integrate with flink,
> > >>> > they
> > >>> > > > > share
> > >>> > > > > > many
> > >>> > > > > > > > > > > >> >> >> duplicated
> > >>> > > > > > > > > > > >> >> >>>>> code.
> > >>> > > > > > > > > > > >> >> >>>>>>> It
> > >>> > > > > > > > > > > >> >> >>>>>>>> is another sign that we should refactor
> > >>> > flink
> > >>> > > > > > client
> > >>> > > > > > > > > api.
> > >>> > > > > > > > > > > >> >> >>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>> I believe it is a large and hard change,
> > >>> > and I
> > >>> > > > > am
> > >>> > > > > > > > afraid
> > >>> > > > > > > > > > we
> > >>> > > > > > > > > > > >> can
> > >>> > > > > > > > > > > >> >> >>> not
> > >>> > > > > > > > > > > >> >> >>>>>> keep
> > >>> > > > > > > > > > > >> >> >>>>>>>> compatibility since many of changes are
> > >>> > user
> > >>> > > > > > facing.
> > >>> > > > > > > > > > > >> >> >>>>>>>>
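Jeff's suggestion above — a single entry point that "does the right thing based on the FlinkConf it receives" instead of one ExecutionEnvironment subclass per deployment mode — can be sketched as follows. This is an illustrative sketch only: `EnvFactory`, `Executor`, and the `execution.target` key are hypothetical names, not Flink's actual API.

```java
import java.util.Map;

// Hypothetical sketch: one entry point that picks the deployment purely
// from configuration, instead of many ExecutionEnvironment subclasses.
public class EnvFactory {
    // Stand-in for "the thing that deploys and submits".
    interface Executor { String deploy(); }

    // The config alone decides which deployment path is taken.
    static Executor forConf(Map<String, String> conf) {
        String target = conf.getOrDefault("execution.target", "local");
        switch (target) {
            case "yarn-per-job": return () -> "deploy YARN job cluster, then submit";
            case "remote":       return () -> "connect to existing session, then submit";
            default:             return () -> "start local mini cluster, then submit";
        }
    }

    public static void main(String[] args) {
        // Same code path for every mode; only the configuration differs.
        System.out.println(forConf(Map.of("execution.target", "remote")).deploy());
    }
}
```

With this shape, a downstream project (Zeppelin, Beam, the SQL client) never branches on environment subclasses; it only assembles a configuration.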
> > >>> > > > > > > > > > > >> >> >>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>> Zili Chen <wa...@gmail.com> wrote on Mon, Jun 24, 2019 at 2:53 PM:
> > >>> > > > > > > > > > > >> >> >>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> Hi all,
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> After a closer look at our client APIs, I can see there are
> > >>> > > > > > > > > > > >> >> >>>>>>>>> two major issues to consistency and integration: the
> > >>> > > > > > > > > > > >> >> >>>>>>>>> deployment of a job cluster, which couples job graph creation
> > >>> > > > > > > > > > > >> >> >>>>>>>>> and cluster deployment, and submission via CliFrontend, which
> > >>> > > > > > > > > > > >> >> >>>>>>>>> confuses the control flow of job graph compilation and job
> > >>> > > > > > > > > > > >> >> >>>>>>>>> submission. I'd like to follow the discussion above, mainly
> > >>> > > > > > > > > > > >> >> >>>>>>>>> the process described by Jeff and Stephan, and share my ideas
> > >>> > > > > > > > > > > >> >> >>>>>>>>> on these issues.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> 1) CliFrontend confuses the control flow of job compilation
> > >>> > > > > > > > > > > >> >> >>>>>>>>> and submission. Following the process of job submission that
> > >>> > > > > > > > > > > >> >> >>>>>>>>> Stephan and Jeff described, the execution environment knows
> > >>> > > > > > > > > > > >> >> >>>>>>>>> all configs of the cluster and topos/settings of the job.
> > >>> > > > > > > > > > > >> >> >>>>>>>>> Ideally, the main method of the user program calls #execute
> > >>> > > > > > > > > > > >> >> >>>>>>>>> (or named #submit) and Flink deploys the cluster, compiles
> > >>> > > > > > > > > > > >> >> >>>>>>>>> the job graph and submits it to the cluster. However, the
> > >>> > > > > > > > > > > >> >> >>>>>>>>> current CliFrontend does all these things inside its
> > >>> > > > > > > > > > > >> >> >>>>>>>>> #runProgram method, which introduces a lot of subclasses of
> > >>> > > > > > > > > > > >> >> >>>>>>>>> (stream) execution environment.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> Actually, it sets up an exec env that hijacks the
> > >>> > > > > > > > > > > >> >> >>>>>>>>> #execute/executePlan method, initializes the job graph and
> > >>> > > > > > > > > > > >> >> >>>>>>>>> aborts execution. Control flow then goes back to CliFrontend,
> > >>> > > > > > > > > > > >> >> >>>>>>>>> which deploys the cluster (or retrieves the client) and
> > >>> > > > > > > > > > > >> >> >>>>>>>>> submits the job graph. This is quite a specific internal
> > >>> > > > > > > > > > > >> >> >>>>>>>>> process inside Flink and not consistent with anything.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> 2) Deployment of a job cluster couples job graph creation and
> > >>> > > > > > > > > > > >> >> >>>>>>>>> cluster deployment. Abstractly, going from a user job to a
> > >>> > > > > > > > > > > >> >> >>>>>>>>> concrete submission requires
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>     create JobGraph --\
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> create ClusterClient -->  submit JobGraph
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> such a dependency. The ClusterClient is created by deploying
> > >>> > > > > > > > > > > >> >> >>>>>>>>> or retrieving. JobGraph submission requires a compiled
> > >>> > > > > > > > > > > >> >> >>>>>>>>> JobGraph and a valid ClusterClient, but the creation of the
> > >>> > > > > > > > > > > >> >> >>>>>>>>> ClusterClient is abstractly independent of that of the
> > >>> > > > > > > > > > > >> >> >>>>>>>>> JobGraph. However, in job cluster mode, we deploy the job
> > >>> > > > > > > > > > > >> >> >>>>>>>>> cluster with a job graph, which means we use another process:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> create JobGraph --> deploy cluster with the JobGraph
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> Here is another inconsistency, and downstream projects/client
> > >>> > > > > > > > > > > >> >> >>>>>>>>> APIs are forced to handle the different cases with little
> > >>> > > > > > > > > > > >> >> >>>>>>>>> support from Flink.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> Since we likely reached a consensus on
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> 1. all configs gathered by Flink configuration and passed
> > >>> > > > > > > > > > > >> >> >>>>>>>>> 2. execution environment knows all configs and handles
> > >>> > > > > > > > > > > >> >> >>>>>>>>> execution (both deployment and submission)
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> I propose eliminating the inconsistencies above with the
> > >>> > > > > > > > > > > >> >> >>>>>>>>> following approach:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> 1) CliFrontend should be exactly a front end, at least for
> > >>> > > > > > > > > > > >> >> >>>>>>>>> the "run" command. That means it just gathers and passes all
> > >>> > > > > > > > > > > >> >> >>>>>>>>> config from the command line to the main method of the user
> > >>> > > > > > > > > > > >> >> >>>>>>>>> program. The execution environment knows all the info, and
> > >>> > > > > > > > > > > >> >> >>>>>>>>> with an addition to the utils for ClusterClient, we
> > >>> > > > > > > > > > > >> >> >>>>>>>>> gracefully get a ClusterClient by deploying or retrieving. In
> > >>> > > > > > > > > > > >> >> >>>>>>>>> this way, we don't need to hijack the #execute/executePlan
> > >>> > > > > > > > > > > >> >> >>>>>>>>> methods and can remove the various hacky subclasses of exec
> > >>> > > > > > > > > > > >> >> >>>>>>>>> env, as well as the #run methods in ClusterClient (for an
> > >>> > > > > > > > > > > >> >> >>>>>>>>> interface-ized ClusterClient). Now the control flow flows
> > >>> > > > > > > > > > > >> >> >>>>>>>>> from CliFrontend to the main method and never returns.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> 2) A job cluster means a cluster for a specific job. From
> > >>> > > > > > > > > > > >> >> >>>>>>>>> another perspective, it is an ephemeral session. We may
> > >>> > > > > > > > > > > >> >> >>>>>>>>> decouple the deployment from a compiled job graph, and
> > >>> > > > > > > > > > > >> >> >>>>>>>>> instead start a session with an idle timeout and submit the
> > >>> > > > > > > > > > > >> >> >>>>>>>>> job to it.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> These topics are better to be aware of and discussed to a
> > >>> > > > > > > > > > > >> >> >>>>>>>>> consensus before we go into more details on design or
> > >>> > > > > > > > > > > >> >> >>>>>>>>> implementation.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> Best,
> > >>> > > > > > > > > > > >> >> >>>>>>>>> tison.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
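The decoupling Tison proposes in 2) — compiling the JobGraph independently of cluster deployment, and treating a job cluster as an ephemeral session with an idle timeout — could look roughly like the following sketch. All types here are illustrative stand-ins, not Flink's real classes:

```java
// Hypothetical sketch of the decoupled flow: JobGraph creation and
// ClusterClient creation are independent steps, and a "job cluster" is
// just an ephemeral session the client submits into.
public class DecoupledSubmission {
    static class JobGraph { final String name; JobGraph(String n) { name = n; } }

    interface ClusterClient {
        String submit(JobGraph g);
    }

    // Deploy-or-retrieve does not take a job graph: the same path serves
    // both session mode and the "ephemeral session" per-job mode.
    static ClusterClient deployEphemeralSession(long idleTimeoutMs) {
        return g -> "submitted " + g.name
                + " (session idles out after " + idleTimeoutMs + "ms)";
    }

    public static void main(String[] args) {
        JobGraph graph = new JobGraph("wordcount");            // step 1: compile
        ClusterClient client = deployEphemeralSession(60_000); // step 2: deploy
        System.out.println(client.submit(graph));              // step 3: submit
    }
}
```

The point of the sketch is that the deployment step never needs a compiled job graph, which removes the second, divergent process Tison describes.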
> > >>> > > > > > > > > > > >> >> >>>>>>>>> Zili Chen <wa...@gmail.com> wrote on Thu, Jun 20, 2019 at 3:21 AM:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> Hi Jeff,
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> Thanks for raising this thread and the design document!
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> As @Thomas Weise mentioned above, extending config to Flink
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> requires far more effort than it should. Another example is
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> that we achieve detach mode by introducing another execution
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> environment which also hijacks the #execute method.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> I agree with your idea that the user would configure all
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> things and Flink "just" respects them. On this topic I think
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> the unusual control flow when CliFrontend handles the "run"
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> command is the problem. It handles several configs, mainly
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> about cluster settings, and thus the main method of the user
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> program is unaware of them. Also it compiles the app to a
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> job graph by running the main method with a hijacked exec
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> env, which constrains the main method further.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> I'd like to write down a few notes on passing and respecting
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> configs/args, as well as on decoupling job compilation and
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> submission, and will share them on this thread later.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> Best,
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> tison.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> SHI Xiaogang <sh...@gmail.com> wrote on Mon, Jun 17, 2019 at 7:29 PM:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> Hi Jeff and Flavio,
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> Thanks a lot, Jeff, for proposing the design document.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> We are also working on refactoring ClusterClient to allow
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> flexible and efficient job management in our real-time
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> platform. We would like to draft a document to share our
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> ideas with you.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> I think it's a good idea to have something like Apache Livy
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> for Flink, and the efforts discussed here will take a great
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> step forward to it.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> Regards,
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> Xiaogang
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> Flavio Pompermaier <pompermaier@okkam.it> wrote on Mon, Jun 17, 2019 at 7:13 PM:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>> Is there any possibility to have something like Apache
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>> Livy [1] also for Flink in the future?
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>> [1] https://livy.apache.org/
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>> On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zjffdu@gmail.com> wrote:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> Any API we expose should not have dependencies on the
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> runtime (flink-runtime) package or other implementation
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> details. To me, this means that the current
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> ClusterClient cannot be exposed to users because it
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> uses quite some classes from the optimiser and runtime
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> packages.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> We should change ClusterClient from a class to an
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> interface. ExecutionEnvironment should only use the
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> ClusterClient interface, which should be in
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> flink-clients, while the concrete implementation class
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> could be in flink-runtime.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
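An interface-ized ClusterClient as Jeff suggests — the interface in flink-clients with no optimiser/runtime types in its signatures, concrete implementations in flink-runtime — might take a shape like the following sketch. The method names and the byte[] job-graph stand-in are assumptions for illustration, not Flink's actual signatures:

```java
import java.util.concurrent.CompletableFuture;

public class ClusterClientSketch {
    // The hypothetical interface: only stable, serializable types in its
    // signatures, so it can live in flink-clients without runtime deps.
    interface ClusterClient extends AutoCloseable {
        CompletableFuture<String> submitJob(byte[] serializedJobGraph);
        CompletableFuture<Void> cancel(String jobId);
        @Override default void close() {}
    }

    // A trivial in-memory stand-in, just to show the contract in use;
    // a real implementation (REST, YARN, ...) would live in flink-runtime.
    static ClusterClient inMemory() {
        return new ClusterClient() {
            public CompletableFuture<String> submitJob(byte[] g) {
                return CompletableFuture.completedFuture("job-" + g.length);
            }
            public CompletableFuture<Void> cancel(String jobId) {
                return CompletableFuture.completedFuture(null);
            }
        };
    }

    public static void main(String[] args) throws Exception {
        try (ClusterClient client = inMemory()) {
            System.out.println(client.submitJob(new byte[4]).join());
        }
    }
}
```

Because callers only see the interface, swapping the deployment target does not ripple into downstream code.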
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> What happens when a failure/restart in the client
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> happens? There needs to be a way of re-establishing
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> the connection to the job, setting up the listeners
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> again, etc.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> Good point. First we need to define what failure/restart
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> in the client means. IIUC, that usually means a network
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> failure, which will happen in the RestClient class. If my
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> understanding is correct, the restart/retry mechanism
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> should be done in RestClient.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
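For the client-side failure handling discussed here, a retry-with-backoff loop is the usual shape. The sketch below is a generic illustration of that mechanism only — it is not Flink's actual RestClient, whose internals are not shown in this thread:

```java
import java.util.concurrent.Callable;

// Generic retry-with-backoff sketch for transient network failures,
// illustrating where client-side reconnection logic could live.
public class Retry {
    static <T> T withRetry(Callable<T> call, int maxAttempts, long backoffMs) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                Thread.sleep(backoffMs * attempt); // linear backoff between attempts
            }
        }
        throw last; // give up after maxAttempts
    }

    public static void main(String[] args) throws Exception {
        int[] tries = {0};
        // Simulate a connection that fails twice before succeeding.
        String result = withRetry(() -> {
            if (++tries[0] < 3) throw new java.io.IOException("connection reset");
            return "job status: RUNNING";
        }, 5, 10);
        System.out.println(result + " after " + tries[0] + " attempts");
    }
}
```

Keeping this loop inside the low-level client means callers (listeners, job handles) reconnect transparently instead of each re-implementing it.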
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> Aljoscha Krettek <aljoscha@apache.org> wrote on Tue, Jun 11, 2019 at 11:10 PM:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> Some points to consider:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> * Any API we expose should not have dependencies on the
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> runtime (flink-runtime) package or other implementation
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> details. To me, this means that the current
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> ClusterClient cannot be exposed to users because it uses
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> quite some classes from the optimiser and runtime
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> packages.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> * What happens when a failure/restart in the client
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> happens? There needs to be a way of re-establishing the
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> connection to the job, setting up the listeners again,
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> etc.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> Aljoscha
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>> On 29. May 2019, at 10:17, Jeff Zhang <zjffdu@gmail.com> wrote:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>> Sorry folks, the design doc is late as you expected.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>> Here's the design doc I drafted, welcome any comments
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>> and feedback.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org> wrote on Thu, Feb 14, 2019 at 8:43 PM:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> Nice that this discussion is happening.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> In the FLIP, we could also revisit the entire role of
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> the environments again.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> Initially, the idea was:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> - the environments take care of the specific setup for
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> standalone (no setup needed), yarn, mesos, etc.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> - the session ones have control over the session. The
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> environment holds the session client.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> - running a job gives a "control" object for that job.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> That behavior is the same in all environments.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> The actual implementation diverged quite a bit from
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> that. Happy to see a discussion about straightening
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> this out a bit more.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>
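Stephan's "control object" idea — running a job hands back the same kind of handle in every environment — could be sketched like this; `JobClient` and its methods are illustrative names only, not a confirmed API:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: whatever environment deployed the job, #execute
// returns the same per-job control handle.
public class JobControlSketch {
    interface JobClient {
        String jobId();
        CompletableFuture<String> getJobStatus();
        CompletableFuture<Void> cancel();
    }

    static JobClient execute(String jobName) {
        String id = "jid-" + Math.abs(jobName.hashCode());
        return new JobClient() {
            public String jobId() { return id; }
            public CompletableFuture<String> getJobStatus() {
                return CompletableFuture.completedFuture("RUNNING");
            }
            public CompletableFuture<Void> cancel() {
                return CompletableFuture.completedFuture(null);
            }
        };
    }

    public static void main(String[] args) {
        // The caller's code is identical for local, session, or per-job mode.
        JobClient control = execute("wordcount");
        System.out.println(control.jobId() + ": " + control.getJobStatus().join());
    }
}
```

A uniform handle like this is what lets downstream projects manage jobs without caring which environment produced them.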
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> Hi folks,
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> Sorry for late response, It
> > >>> > seems we
> > >>> > > > > > reach
> > >>> > > > > > > > > > > >> >> >>> consensus
> > >>> > > > > > > > > > > >> >> >>>> on
> > >>> > > > > > > > > > > >> >> >>>>>>>> this, I
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>> will
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> create
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> FLIP for this with more detailed
> > >>> > > > design
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> >
> > Thomas Weise <th...@apache.org> 于2018年12月21日周五 上午11:43写道:
> >
> > > Great to see this discussion seeded! The problems you face with the
> > > Zeppelin integration are also affecting other downstream projects, like
> > > Beam.
> > >
> > > We just enabled the savepoint restore option in RemoteStreamEnvironment
> > > [1] and that was more difficult than it should be. The main issue is
> > > that environment and cluster client aren't decoupled. Ideally it should
> > > be possible to just get the matching cluster client from the environment
> > > and then control the job through it (environment as factory for cluster
> > > client). But note that the environment classes are part of the public
> > > API, and it is not straightforward to make larger changes without
> > > breaking backward compatibility.
> > >
> > > ClusterClient currently exposes internal classes like JobGraph and
> > > StreamGraph. But it should be possible to wrap this with a new public
> > > API that brings the required job control capabilities for downstream
> > > projects. Perhaps it is helpful to look at some of the interfaces in
> > > Beam while thinking about this: [2] for the portable job API and [3]
> > > for the old asynchronous job control from the Beam Java SDK.
> > >
> > > The backward compatibility discussion [4] is also relevant here. A new
> > > API should shield downstream projects from internals and allow them to
> > > interoperate with multiple future Flink versions in the same release
> > > line without forced upgrades.
> > >
> > > Thanks,
> > > Thomas
> > >
> > > [1] https://github.com/apache/flink/pull/7249
> > > [2] https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > > [3] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > > [4] https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
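The job-control surface Thomas describes — downstream projects holding a job handle instead of internal classes like JobGraph or StreamGraph — can be sketched roughly as below. This is an illustrative mock, not Flink or Beam API; the names JobHandle and JobState are hypothetical, and the in-memory stub stands in for a handle that would really be backed by a ClusterClient.

```java
import java.util.concurrent.atomic.AtomicReference;

class JobControlSketch {

    enum JobState { RUNNING, CANCELED, FINISHED }

    /** The narrow surface a downstream project (Zeppelin, Beam) would see. */
    interface JobHandle {
        String jobId();
        JobState status();
        void cancel();
    }

    /** In-memory stand-in for a handle backed by a real cluster client. */
    static JobHandle submit(String jobId) {
        AtomicReference<JobState> state = new AtomicReference<>(JobState.RUNNING);
        return new JobHandle() {
            public String jobId() { return jobId; }
            public JobState status() { return state.get(); }
            public void cancel() { state.set(JobState.CANCELED); }
        };
    }

    public static void main(String[] args) {
        JobHandle handle = submit("job-42");
        System.out.println(handle.jobId() + " " + handle.status());
        handle.cancel();
        System.out.println(handle.status());
    }
}
```

Because the handle exposes only identity, status, and control operations, the internal graph representations can evolve between Flink versions without breaking callers — which is exactly the compatibility property Thomas asks for.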
> > >
> > > On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zjffdu@gmail.com> wrote:
> > >
> > > > > > I'm not so sure whether the user should be able to define where
> > > > > > the job runs (in your example Yarn). This is actually independent
> > > > > > of the job development and is something which is decided at
> > > > > > deployment time.
> > > >
> > > > Users don't need to specify the execution mode programmatically. They
> > > > can also pass the execution mode as an argument to the flink run
> > > > command, e.g.
> > > >
> > > > bin/flink run -m yarn-cluster ....
> > > > bin/flink run -m local ...
> > > > bin/flink run -m host:port ...
> > > >
> > > > Does this make sense to you?
> > > >
> > > > > > To me it makes sense that the ExecutionEnvironment is not
> > > > > > directly initialized by the user and instead context sensitive
> > > > > > how you want to execute your job (Flink CLI vs. IDE, for
> > > > > > example).
> > > >
> > > > Right, currently I notice that Flink creates a different
> > > > ContextExecutionEnvironment for different submission scenarios
> > > > (Flink CLI vs. IDE). To me this is kind of a hacky approach, not so
> > > > straightforward. What I suggested above is that Flink should always
> > > > create the same ExecutionEnvironment but with different
> > > > configuration, and based on that configuration it would create the
> > > > proper ClusterClient for the different behaviors.
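Jeff's point — that the execution target is pure configuration, mirroring the `-m yarn-cluster|local|host:port` values accepted by `flink run` — can be illustrated with a small sketch. The mode strings match the CLI examples above, but the mapping to client names is hypothetical, not how Flink actually dispatches.

```java
class TargetConfigSketch {

    /** Picks a (hypothetical) client implementation from a mode string. */
    static String clientFor(String mode) {
        if (mode.equals("yarn-cluster")) return "YarnClusterClient";
        if (mode.equals("local")) return "LocalClusterClient";
        // Anything of the form host:port targets an existing cluster.
        if (mode.matches("[^:]+:\\d+")) return "RestClusterClient(" + mode + ")";
        throw new IllegalArgumentException("unknown execution mode: " + mode);
    }

    public static void main(String[] args) {
        System.out.println(clientFor("yarn-cluster"));
        System.out.println(clientFor("local"));
        System.out.println(clientFor("jobmanager-host:8081"));
    }
}
```

The program itself never branches on the target; only the configuration-to-client mapping does, which is the separation of job development from deployment that both Jeff and Till argue for.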
> > > >
> > > > Till Rohrmann <trohrmann@apache.org> 于2018年12月20日周四 下午11:18写道:
> > > >
> > > > > You are probably right that we have code duplication when it comes
> > > > > to the creation of the ClusterClient. This should be reduced in
> > > > > the future.
> > > > >
> > > > > I'm not so sure whether the user should be able to define where
> > > > > the job runs (in your example Yarn). This is actually independent
> > > > > of the job development and is something which is decided at
> > > > > deployment time. To me it makes sense that the
> > > > > ExecutionEnvironment is not directly initialized by the user and
> > > > > is instead context-sensitive to how you want to execute your job
> > > > > (Flink CLI vs. IDE, for example). However, I agree that the
> > > > > ExecutionEnvironment should give you access to the ClusterClient
> > > > > and to the job (maybe in the form of the JobGraph or a job plan).
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > > > Hi Till,
> > > > > >
> > > > > > Thanks for the feedback. You are right that I expect a better
> > > > > > programmatic job submission/control API which could be used by
> > > > > > downstream projects, and it would benefit the Flink ecosystem.
> > > > > > When I look at the code of the Flink scala-shell and sql-client
> > > > > > (I believe they are not the core of Flink, but belong to its
> > > > > > ecosystem), I find a lot of duplicated code for creating a
> > > > > > ClusterClient from user-provided configuration (the
> > > > > > configuration format may even differ between scala-shell and
> > > > > > sql-client) and then using that ClusterClient to manipulate
> > > > > > jobs. I don't think this is convenient for downstream projects.
> > > > > > What I expect is that a downstream project only needs to provide
> > > > > > the necessary configuration info (maybe introducing a class
> > > > > > FlinkConf), then build an ExecutionEnvironment based on this
> > > > > > FlinkConf, and the ExecutionEnvironment will create the proper
> > > > > > ClusterClient. This would not only benefit downstream project
> > > > > > development but also help their integration tests with Flink.
> > > > > > Here's one sample code snippet of what I expect:
> > > > > >
> > > > > > val conf = new FlinkConf().mode("yarn")
> > > > > > val env = new ExecutionEnvironment(conf)
> > > > > > val jobId = env.submit(...)
> > > > > > val jobStatus = env.getClusterClient().queryJobStatus(jobId)
> > > > > > env.getClusterClient().cancelJob(jobId)
> > > > > >
> > > > > > What do you think?
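The API shape in Jeff's snippet can be mocked end-to-end in a few lines of Java. Every class below (FlinkConf, ExecutionEnvironment, ClusterClient) is a stand-in for the proposed design, not real Flink code; a stub like this is also the kind of thing the integration tests he mentions could run against.

```java
class FlinkConfSketch {

    /** Proposed configuration holder; only the execution mode is modeled. */
    static class FlinkConf {
        private String mode = "local";
        FlinkConf mode(String m) { this.mode = m; return this; }
        String mode() { return mode; }
    }

    static class ClusterClient {
        String queryJobStatus(String jobId) { return "RUNNING"; }
        void cancelJob(String jobId) { /* would talk to the cluster */ }
    }

    static class ExecutionEnvironment {
        private final ClusterClient client;
        ExecutionEnvironment(FlinkConf conf) {
            // A real implementation would inspect conf.mode() and build
            // the matching client (yarn, local, remote, ...).
            this.client = new ClusterClient();
        }
        String submit() { return "job-1"; }
        ClusterClient getClusterClient() { return client; }
    }

    public static void main(String[] args) {
        FlinkConf conf = new FlinkConf().mode("yarn");
        ExecutionEnvironment env = new ExecutionEnvironment(conf);
        String jobId = env.submit();
        System.out.println(env.getClusterClient().queryJobStatus(jobId));
        env.getClusterClient().cancelJob(jobId);
    }
}
```

Note how the caller never constructs a ClusterClient directly: the environment owns client selection, which is the deduplication Jeff wants across scala-shell, sql-client, and other downstream projects.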
> > > > > >
> > > > > > Till Rohrmann <trohrmann@apache.org> 于2018年12月11日周二 下午6:28写道:
> > > > > >
> > > > > > > Hi Jeff,
> > > > > > >
> > > > > > > what you are proposing is to provide the user with better
> > > > > > > programmatic job control. There was actually an effort to
> > > > > > > achieve this but it has never been completed [1]. However,
> > > > > > > there are some improvements in the code base now. Look for
> > > > > > > example at the NewClusterClient interface which offers a
> > > > > > > non-blocking job submission. But I agree that we need to
> > > > > > > improve Flink in this regard.
> > > > > > >
> > > > > > > I would not be in favour of exposing all ClusterClient calls
> > > > > > > via the ExecutionEnvironment because it would clutter the
> > > > > > > class and would not be a good separation of concerns. Instead,
> > > > > > > one idea could be to retrieve the current ClusterClient from
> > > > > > > the ExecutionEnvironment, which can then be used for cluster
> > > > > > > and job control. But before we start an effort here, we need
> > > > > > > to agree on and capture what functionality we want to provide.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> Initially, the idea was
> > >>> > that we
> > >>> > > > > > have the
> > >>> > > > > > > > > > > >> >> >>>>>>>> ClusterDescriptor
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> describing
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> how
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> to talk to cluster manager
> > >>> > like
> > >>> > > > > > Yarn or
> > >>> > > > > > > > > > > >> >> >> Mesos.
> > >>> > > > > > > > > > > >> >> >>>> The
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> ClusterDescriptor
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> can
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> be
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> used for deploying Flink
> > >>> > > > clusters
> > >>> > > > > > (job
> > >>> > > > > > > > and
> > >>> > > > > > > > > > > >> >> >>>>> session)
> > >>> > > > > > > > > > > >> >> >>>>>>> and
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> gives
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> you a
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> ClusterClient. The
> > >>> > ClusterClient
> > >>> > > > > > > > controls
> > >>> > > > > > > > > > > >> >> >> the
> > >>> > > > > > > > > > > >> >> >>>>>> cluster
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> (e.g.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> submitting
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> jobs, listing all running
> > >>> > jobs).
> > >>> > > > > And
> > >>> > > > > > > > then
> > >>> > > > > > > > > > > >> >> >>> there
> > >>> > > > > > > > > > > >> >> >>>>> was
> > >>> > > > > > > > > > > >> >> >>>>>>> the
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> idea
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>> to
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> introduce a
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> JobClient which you obtain
> > >>> > from
> > >>> > > > > the
> > >>> > > > > > > > > > > >> >> >>>> ClusterClient
> > >>> > > > > > > > > > > >> >> >>>>> to
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> trigger
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> job
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> specific
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> operations (e.g. taking a
> > >>> > > > > savepoint,
> > >>> > > > > > > > > > > >> >> >>> cancelling
> > >>> > > > > > > > > > > >> >> >>>>> the
> > >>> > > > > > > > > > > >> >> >>>>>>>> job).
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> [1]
> > >>> > > > > > > > > > > >> >> >>>>>>
> > >>> > > > https://issues.apache.org/jira/browse/FLINK-4272
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> Cheers,
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> Till
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
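To make the layering Till describes concrete, here is a minimal self-contained sketch. The three interface names come from the discussion above, but every method signature (deploySessionCluster, submitJob, listJobs, cancel) is an illustrative stand-in, not the actual Flink API:

```java
import java.util.ArrayList;
import java.util.List;

class LayeringExample {
    public static void main(String[] args) {
        ClusterClient cluster = new InMemoryCluster().deploySessionCluster();
        JobClient job = cluster.submitJob("window-join");
        System.out.println(cluster.listJobs());   // cluster-level view
        job.cancel();                             // job-level control
        System.out.println(cluster.listJobs());
    }
}

// Job-level control, obtained from the ClusterClient.
interface JobClient {
    void cancel();
}

// Cluster-level control: submit jobs, list all running jobs.
interface ClusterClient {
    JobClient submitJob(String jobName);
    List<String> listJobs();
}

// Knows how to reach a cluster manager (YARN, Mesos, standalone, ...).
interface ClusterDescriptor {
    ClusterClient deploySessionCluster();
}

// Toy in-memory implementation, just to exercise the layering.
class InMemoryCluster implements ClusterDescriptor {
    public ClusterClient deploySessionCluster() {
        List<String> jobs = new ArrayList<>();
        return new ClusterClient() {
            public JobClient submitJob(String jobName) {
                jobs.add(jobName);
                return () -> jobs.remove(jobName);
            }
            public List<String> listJobs() { return jobs; }
        };
    }
}
```

The point of the sketch is only the separation of concerns: deployment, cluster control and job control each live behind their own interface.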
> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zjffdu@gmail.com> wrote:
>
> > Hi Folks,
> >
> > I am trying to integrate Flink into Apache Zeppelin, which is an
> > interactive notebook, and I hit several issues that are caused by the
> > Flink client API. So I'd like to propose the following changes to the
> > Flink client API.
> >
> > 1. Support nonblocking execution. Currently,
> > ExecutionEnvironment#execute is a blocking method which does two things:
> > first submit the job and then wait for the job until it is finished. I'd
> > like to introduce a nonblocking execution method like
> > ExecutionEnvironment#submit which only submits the job and then returns
> > the jobId to the client, and allow the user to query the job status via
> > that jobId.
> >
> > 2. Add a cancel API in
> > ExecutionEnvironment/StreamExecutionEnvironment. Currently the only way
> > to cancel a job is via the cli (bin/flink), which is not convenient for
> > downstream projects to use. So I'd like to add a cancel API in
> > ExecutionEnvironment.
> >
> > 3. Add a savepoint API in
> > ExecutionEnvironment/StreamExecutionEnvironment. It is similar to the
> > cancel API; we should use ExecutionEnvironment as the unified API for
> > third parties to integrate with Flink.
> >
> > 4. Add a listener for the job execution lifecycle, something like the
> > following, so that downstream projects can run custom logic in the
> > lifecycle of a job. E.g. Zeppelin would capture the jobId after the job
> > is submitted and then use this jobId to cancel it later when necessary.
> >
> > public interface JobListener {
> >
> >   void onJobSubmitted(JobID jobId);
> >
> >   void onJobExecuted(JobExecutionResult jobResult);
> >
> >   void onJobCanceled(JobID jobId);
> > }
> >
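A minimal sketch of how a downstream project such as Zeppelin might consume the proposed listener. The JobListener shape is taken from the proposal above; JobID, JobExecutionResult, the registerJobListener method and the stub environment are all hypothetical stand-ins, not existing Flink classes:

```java
import java.util.ArrayList;
import java.util.List;

class JobListenerExample {
    public static void main(String[] args) {
        StubExecutionEnvironment env = new StubExecutionEnvironment();
        List<String> events = new ArrayList<>();
        // A notebook would capture the jobId in onJobSubmitted and keep it
        // around so it can cancel the job later.
        env.registerJobListener(new JobListener() {
            public void onJobSubmitted(JobID jobId) { events.add("submitted:" + jobId); }
            public void onJobExecuted(JobExecutionResult r) { events.add("executed:" + r.getJobID()); }
            public void onJobCanceled(JobID jobId) { events.add("canceled:" + jobId); }
        });
        env.submit();
        System.out.println(events);
    }
}

interface JobListener {
    void onJobSubmitted(JobID jobId);
    void onJobExecuted(JobExecutionResult jobResult);
    void onJobCanceled(JobID jobId);
}

// Minimal stand-ins for the real types.
class JobID {
    private final String id;
    JobID(String id) { this.id = id; }
    @Override public String toString() { return id; }
}

class JobExecutionResult {
    private final JobID jobId;
    JobExecutionResult(JobID jobId) { this.jobId = jobId; }
    JobID getJobID() { return jobId; }
}

// Hypothetical environment that notifies listeners around submission.
class StubExecutionEnvironment {
    private final List<JobListener> listeners = new ArrayList<>();

    void registerJobListener(JobListener listener) { listeners.add(listener); }

    JobID submit() {
        JobID jobId = new JobID("job-1");
        listeners.forEach(l -> l.onJobSubmitted(jobId));
        listeners.forEach(l -> l.onJobExecuted(new JobExecutionResult(jobId)));
        return jobId;
    }
}
```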
> > 5. Enable sessions in ExecutionEnvironment. Currently this is disabled,
> > but sessions are very convenient for third parties that submit jobs
> > continually. I hope Flink can enable it again.
> >
> > 6. Unify all Flink client APIs into
> > ExecutionEnvironment/StreamExecutionEnvironment.
> >
> > This is a long-term issue which needs more careful thinking and design.
> > Currently some features of Flink are exposed in
> > ExecutionEnvironment/StreamExecutionEnvironment, but some are exposed in
> > the cli instead of the API, like the cancel and savepoint operations I
> > mentioned above. I think the root cause is that Flink didn't unify the
> > interaction with Flink. Here I list 3 scenarios of Flink operation:
> >
> >  - Local job execution. Flink will create a LocalEnvironment and then
> >    use this LocalEnvironment to create a LocalExecutor for job execution.
> >  - Remote job execution. Flink will create a ClusterClient first, then
> >    create a ContextEnvironment based on the ClusterClient, and then run
> >    the job.
> >  - Job cancelation. Flink will create a ClusterClient first and then
> >    cancel the job via this ClusterClient.
> >
> > As you can see, in the above 3 scenarios Flink didn't use the same
> > approach (code path) to interact with Flink. What I propose is the
> > following: create the proper LocalEnvironment/RemoteEnvironment (based
> > on user configuration) --> use this Environment to create the proper
> > ClusterClient (LocalClusterClient or RestClusterClient) to interact with
> > Flink (job execution or cancelation).
> >
> > This way we can unify the process of local execution and remote
> > execution. And it is much easier for third parties to integrate with
> > Flink, because ExecutionEnvironment is the unified entry point for
> > Flink. What a third party needs to do is just pass configuration to
> > ExecutionEnvironment, and ExecutionEnvironment will do the right thing
> > based on the configuration. The Flink cli can also be considered a Flink
> > API consumer: it would just pass the configuration to
> > ExecutionEnvironment and let ExecutionEnvironment create the proper
> > ClusterClient, instead of letting the cli create the ClusterClient
> > directly.
> >
> > 6 would involve large code refactoring, so I think we can defer it to a
> > future release; 1, 2, 3, 4, 5 could be done at once, I believe. Let me
> > know your comments and feedback, thanks.
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
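The unified code path Jeff proposes (configuration in, proper client out, one decision point) could be sketched roughly as follows. The config key "execution.target" and all class names here are hypothetical, chosen only to illustrate the shape of the idea:

```java
import java.util.Map;

class UnifiedEntryExample {
    public static void main(String[] args) {
        // Same entry point, different backends, chosen purely by configuration.
        System.out.println(new UnifiedEnvironment(Map.of()).submit("wordcount"));
        System.out.println(new UnifiedEnvironment(
                Map.of("execution.target", "remote")).submit("wordcount"));
    }
}

// Common client abstraction over local and remote execution.
interface UnifiedClusterClient {
    String submit(String jobName);
    void cancel(String jobId);
}

class LocalClient implements UnifiedClusterClient {
    public String submit(String jobName) { return "local-" + jobName; }
    public void cancel(String jobId) { /* stop the local executor */ }
}

class RestClient implements UnifiedClusterClient {
    public String submit(String jobName) { return "rest-" + jobName; }
    public void cancel(String jobId) { /* call the REST cancel endpoint */ }
}

// The environment, not the CLI, picks the right client from configuration.
class UnifiedEnvironment {
    private final UnifiedClusterClient client;

    UnifiedEnvironment(Map<String, String> config) {
        this.client = "remote".equals(config.get("execution.target"))
                ? new RestClient() : new LocalClient();
    }

    String submit(String jobName) { return client.submit(jobName); }
    void cancel(String jobId) { client.cancel(jobId); }
}
```

With this shape, the CLI, a notebook, or any downstream project all go through the same single decision point.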


Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Zili Chen <wa...@gmail.com>.
Thanks for your update Kostas!

Cleaning up the existing code paths as a first pass sounds good to me. I'd
like to help with reviews and will file subtasks if I find any.

Best,
tison.


Kostas Kloudas <kk...@gmail.com> 于2019年9月4日周三 下午5:52写道:

> Here is the issue, and I will keep on updating it as I find more issues.
>
> https://issues.apache.org/jira/browse/FLINK-13954
>
> This will also cover the refactoring of the Executors that we discussed
> in this thread, without any additional functionality (such as the job
> client).
>
> Kostas
>
> On Wed, Sep 4, 2019 at 11:46 AM Kostas Kloudas <kk...@gmail.com> wrote:
> >
> > Great idea Tison!
> >
> > I will create the umbrella issue and post it here so that we are all
> > on the same page!
> >
> > Cheers,
> > Kostas
> >
> > On Wed, Sep 4, 2019 at 11:36 AM Zili Chen <wa...@gmail.com> wrote:
> > >
> > > Hi Kostas & Aljoscha,
> > >
> > > I noticed that there is a JIRA (FLINK-13946) which could be included
> > > in this refactoring thread. Since we agree on most of the directions in
> > > the big picture, would it be reasonable to create an umbrella issue for
> > > refactoring the client APIs, with linked subtasks? That would be a
> > > better way to join the forces of our community.
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > Zili Chen <wa...@gmail.com> 于2019年8月31日周六 下午12:52写道:
> > >>
> > >> Great Kostas! Looking forward to your POC!
> > >>
> > >> Best,
> > >> tison.
> > >>
> > >>
> > >> Jeff Zhang <zj...@gmail.com> 于2019年8月30日周五 下午11:07写道:
> > >>>
> > >>> Awesome, @Kostas. Looking forward to your POC.
> > >>>
> > >>> Kostas Kloudas <kk...@gmail.com> 于2019年8月30日周五 下午8:33写道:
> > >>>
> > >>> > Hi all,
> > >>> >
> > >>> > I am just writing here to let you know that I am working on a POC that
> > >>> > tries to refactor the current state of job submission in Flink.
> > >>> > I want to stress that it introduces NO CHANGES to the current
> > >>> > behaviour of Flink. It just re-arranges things and introduces the
> > >>> > notion of an Executor, which is the entity responsible for taking the
> > >>> > user code and submitting it for execution.
> > >>> >
> > >>> > Given this, the discussion about the functionality that the JobClient
> > >>> > will expose to the user can go on independently, and the same
> > >>> > holds for all the open questions so far.
> > >>> >
> > >>> > I hope I will have some more news to share soon.
> > >>> >
> > >>> > Thanks,
> > >>> > Kostas
> > >>> >
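The Executor notion Kostas mentions might look roughly like this. The interface name is taken from his description, but the signature and everything else is a speculative stand-in, not the actual POC code:

```java
import java.util.concurrent.CompletableFuture;

class ExecutorExample {
    public static void main(String[] args) {
        Executor executor = new BlockingCompatExecutor();
        // The caller decides whether to wait on the future (today's blocking
        // execute()) or not (a detached-style submission).
        String result = executor.execute("window-join").join();
        System.out.println(result);
    }
}

// Speculative sketch: the entity that takes the user's pipeline and submits
// it for execution, returning a handle rather than blocking.
interface Executor {
    CompletableFuture<String> execute(String pipeline);
}

// Trivial implementation that completes immediately, mimicking the current
// synchronous behaviour behind the asynchronous interface.
class BlockingCompatExecutor implements Executor {
    public CompletableFuture<String> execute(String pipeline) {
        return CompletableFuture.completedFuture("done:" + pipeline);
    }
}
```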
> > >>> > On Mon, Aug 26, 2019 at 4:20 AM Yang Wang <da...@gmail.com>
> wrote:
> > >>> > >
> > >>> > > Hi Zili,
> > >>> > >
> > >>> > > It makes sense to me that a dedicated cluster is started for a
> > >>> > > per-job execution and will not accept more jobs.
> > >>> > > Just have a question about the command line.
> > >>> > >
> > >>> > > Currently we can use the following commands to start different
> > >>> > > clusters.
> > >>> > > *per-job cluster*
> > >>> > > ./bin/flink run -d -p 5 -ynm perjob-cluster1 -m yarn-cluster
> > >>> > > examples/streaming/WindowJoin.jar
> > >>> > > *session cluster*
> > >>> > > ./bin/flink run -p 5 -ynm session-cluster1 -m yarn-cluster
> > >>> > > examples/streaming/WindowJoin.jar
> > >>> > >
> > >>> > > What will it look like after client enhancement?
> > >>> > >
> > >>> > >
> > >>> > > Best,
> > >>> > > Yang
> > >>> > >
> > >>> > > Zili Chen <wa...@gmail.com> 于2019年8月23日周五 下午10:46写道:
> > >>> > >
> > >>> > > > Hi Till,
> > >>> > > >
> > >>> > > > Thanks for your update. Nice to hear :-)
> > >>> > > >
> > >>> > > > Best,
> > >>> > > > tison.
> > >>> > > >
> > >>> > > >
> > >>> > > > Till Rohrmann <tr...@apache.org> 于2019年8月23日周五 下午10:39写道:
> > >>> > > >
> > >>> > > > > Hi Tison,
> > >>> > > > >
> > >>> > > > > just a quick comment concerning the class loading issues when
> > >>> > > > > using the per-job mode. The community wants to change it so that
> > >>> > > > > the StandaloneJobClusterEntryPoint actually uses the user code
> > >>> > > > > class loader
> > >>> > > > > with child-first class loading [1]. Hence, I hope that this
> > >>> > > > > problem will be
> > >>> > > > > resolved soon.
> > >>> > > > >
> > >>> > > > > [1] https://issues.apache.org/jira/browse/FLINK-13840
> > >>> > > > >
> > >>> > > > > Cheers,
> > >>> > > > > Till
> > >>> > > > >
> > >>> > > > > On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas <
> kkloudas@gmail.com>
> > >>> > > > wrote:
> > >>> > > > >
> > >>> > > > > > Hi all,
> > >>> > > > > >
> > >>> > > > > > On the topic of web submission, I agree with Till that it
> only
> > >>> > seems
> > >>> > > > > > to complicate things.
> > >>> > > > > > It is bad for security, job isolation (anybody can
> submit/cancel
> > >>> > jobs),
> > >>> > > > > > and its
> > >>> > > > > > implementation complicates some parts of the code. So, if
> we were
> > >>> > to
> > >>> > > > > > redesign the
> > >>> > > > > > WebUI, maybe this part could be left out. In addition, I
> would say
> > >>> > > > > > that the ability to cancel
> > >>> > > > > > jobs could also be left out.
> > >>> > > > > >
> > >>> > > > > > I would also be in favour of removing the "detached"
> mode, for
> > >>> > > > > > the reasons mentioned
> > >>> > > > > > above (i.e. because now we will have a future representing
> the
> > >>> > result
> > >>> > > > > > on which the user
> > >>> > > > > > can choose to wait or not).
> > >>> > > > > >
> > >>> > > > > > Now for the separating job submission and cluster
> creation, I am in
> > >>> > > > > > favour of keeping both.
> > >>> > > > > > Once again, the reasons are mentioned above by Stephan,
> Till,
> > >>> > Aljoscha
> > >>> > > > > > and also Zili seems
> > >>> > > > > > to agree. They mainly have to do with security, isolation
> and ease
> > >>> > of
> > >>> > > > > > resource management
> > >>> > > > > > for the user as he knows that "when my job is done,
> everything
> > >>> > will be
> > >>> > > > > > cleared up". This is
> > >>> > > > > > also the experience you get when launching a process on
> your local
> > >>> > OS.
> > >>> > > > > >
> > >>> > > > > > On excluding the per-job mode from returning a JobClient
> or not, I
> > >>> > > > > > believe that eventually
> > >>> > > > > > it would be nice to allow users to get back a jobClient.
> The
> > >>> > reason is
> > >>> > > > > > that 1) I cannot
> > >>> > > > > > find any objective reason why the user-experience should
> diverge,
> > >>> > and
> > >>> > > > > > 2) this will be the
> > >>> > > > > > way that the user will be able to interact with his
> running job.
> > >>> > > > > > Assuming that the necessary
> > >>> > > > > > ports are open for the REST API to work, then I think that
> the
> > >>> > > > > > JobClient can run against the
> > >>> > > > > > REST API without problems. If the needed ports are not
> open, then
> > >>> > we
> > >>> > > > > > are safe to not return
> > >>> > > > > > a JobClient, as the user explicitly chose to close all
> points of
> > >>> > > > > > communication to his running job.
> > >>> > > > > >
> > >>> > > > > > On the topic of not hijacking the "env.execute()" in order
> to get
> > >>> > the
> > >>> > > > > > Plan, I definitely agree but
> > >>> > > > > > for the proposal of having a "compile()" method in the
> env, I would
> > >>> > > > > > like to have a better look at
> > >>> > > > > > the existing code.
> > >>> > > > > >
> > >>> > > > > > Cheers,
> > >>> > > > > > Kostas
> > >>> > > > > >
> > >>> > > > > > On Fri, Aug 23, 2019 at 5:52 AM Zili Chen <
> wander4096@gmail.com>
> > >>> > > > wrote:
> > >>> > > > > > >
> > >>> > > > > > > Hi Yang,
> > >>> > > > > > >
> > >>> > > > > > > It would be helpful if you check Stephan's last comment,
> > >>> > > > > > > which states that isolation is important.
> > >>> > > > > > >
> > >>> > > > > > > For per-job mode, we run a dedicated cluster(maybe it
> > >>> > > > > > > should have been a couple of JM and TMs during FLIP-6
> > >>> > > > > > > design) for a specific job. Thus the process is prevented
> > >>> > > > > > > from other jobs.
> > >>> > > > > > >
> > >>> > > > > > > In our cases there was a time we suffered from multi
> > >>> > > > > > > jobs submitted by different users and they affected
> > >>> > > > > > > each other so that all ran into an error state. Also,
> > >>> > > > > > > running the client inside the cluster can save client
> > >>> > > > > > > resources at some point.
> > >>> > > > > > >
> > >>> > > > > > > However, we also face several issues as you mentioned,
> > >>> > > > > > > that in per-job mode it always uses the parent classloader,
> > >>> > > > > > > so classloading issues occur.
> > >>> > > > > > >
> > >>> > > > > > > BTW, one can make an analogy between session/per-job
> mode
> > >>> > > > > > > in Flink, and client/cluster mode in Spark.
> > >>> > > > > > >
> > >>> > > > > > > Best,
> > >>> > > > > > > tison.
> > >>> > > > > > >
> > >>> > > > > > >
> > >>> > > > > > > Yang Wang <da...@gmail.com> 于2019年8月22日周四
> 上午11:25写道:
> > >>> > > > > > >
> > >>> > > > > > > > From the user's perspective, there is real confusion
> about the
> > >>> > scope
> > >>> > > > of
> > >>> > > > > > > > per-job cluster.
> > >>> > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > > > If it means a Flink cluster with a single job, so that
> we could
> > >>> > get
> > >>> > > > > > better
> > >>> > > > > > > > isolation.
> > >>> > > > > > > >
> > >>> > > > > > > > Now it does not matter how we deploy the cluster,
> directly
> > >>> > > > > > deploy(mode1)
> > >>> > > > > > > >
> > >>> > > > > > > > or start a flink cluster and then submit job through
> cluster
> > >>> > > > > > client(mode2).
> > >>> > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > > > Otherwise, if it just means direct deployment, how
> should we
> > >>> > name the
> > >>> > > > > > mode2,
> > >>> > > > > > > >
> > >>> > > > > > > > session with job or something else?
> > >>> > > > > > > >
> > >>> > > > > > > > We could also benefit from the mode2. Users could get
> the same
> > >>> > > > > > isolation
> > >>> > > > > > > > as with mode1.
> > >>> > > > > > > >
> > >>> > > > > > > > The user code and dependencies will be loaded by the user
> class
> > >>> > loader
> > >>> > > > > > > >
> > >>> > > > > > > > to avoid class conflict with framework.
> > >>> > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > > > Anyway, both of the two submission modes are useful.
> > >>> > > > > > > >
> > >>> > > > > > > > We just need to clarify the concepts.
> > >>> > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > > > Best,
> > >>> > > > > > > >
> > >>> > > > > > > > Yang
> > >>> > > > > > > >
> > >>> > > > > > > > Zili Chen <wa...@gmail.com> 于2019年8月20日周二
> 下午5:58写道:
> > >>> > > > > > > >
> > >>> > > > > > > > > Thanks for the clarification.
> > >>> > > > > > > > >
> > >>> > > > > > > > > The idea JobDeployer ever came into my mind when I
> was
> > >>> > muddled
> > >>> > > > with
> > >>> > > > > > > > > how to execute per-job mode and session mode with
> the same
> > >>> > user
> > >>> > > > > code
> > >>> > > > > > > > > and framework codepath.
> > >>> > > > > > > > >
> > >>> > > > > > > > > With the concept JobDeployer we back to the
> statement that
> > >>> > > > > > environment
> > >>> > > > > > > > > knows every configs of cluster deployment and job
> > >>> > submission. We
> > >>> > > > > > > > > configure or generate from configuration a specific
> > >>> > JobDeployer
> > >>> > > > in
> > >>> > > > > > > > > environment and then code align on
> > >>> > > > > > > > >
> > >>> > > > > > > > > *JobClient client = env.execute().get();*
> > >>> > > > > > > > >
> > >>> > > > > > > > > which in session mode returned by
> clusterClient.submitJob
> > >>> > and in
> > >>> > > > > > per-job
> > >>> > > > > > > > > mode returned by clusterDescriptor.deployJobCluster.
> > >>> > > > > > > > >
> > >>> > > > > > > > > Here comes a problem that currently we directly run
> > >>> > > > > ClusterEntrypoint
> > >>> > > > > > > > > with extracted job graph. Follow the JobDeployer way
> we'd
> > >>> > better
> > >>> > > > > > > > > align entry point of per-job deployment at
> JobDeployer.
> > >>> > Users run
> > >>> > > > > > > > > their main method or by a Cli(finally call main
> method) to
> > >>> > deploy
> > >>> > > > > the
> > >>> > > > > > > > > job cluster.
> > >>> > > > > > > > >
> > >>> > > > > > > > > Best,
> > >>> > > > > > > > > tison.
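[Editor's note: the call site tison describes, where session and per-job mode converge on the same `env.execute()` returning a client, can be sketched roughly as below. All names here (SubmittedJob, JobDeployer, SketchEnvironment) are hypothetical stand-ins for the proposal, not Flink's actual API at the time.]

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical handle to a submitted job.
interface SubmittedJob {
    String jobId();
}

// Hypothetical deployer: session mode would implement this on top of
// ClusterClient.submitJob, per-job mode on top of
// ClusterDescriptor.deployJobCluster.
interface JobDeployer {
    CompletableFuture<SubmittedJob> deploy(String jobGraph);
}

class SketchEnvironment {
    private final JobDeployer deployer;

    SketchEnvironment(JobDeployer deployer) {
        this.deployer = deployer;
    }

    // User code is identical in both modes:
    //   SubmittedJob job = env.execute().join();
    CompletableFuture<SubmittedJob> execute() {
        return deployer.deploy("extracted-job-graph");
    }
}
```

The point of the sketch is only that the mode-specific logic lives in the configured deployer, so user programs need not change between session and per-job execution.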
> > >>> > > > > > > > >
> > >>> > > > > > > > >
> > >>> > > > > > > > > Stephan Ewen <se...@apache.org> 于2019年8月20日周二
> 下午4:40写道:
> > >>> > > > > > > > >
> > >>> > > > > > > > > > Till has made some good comments here.
> > >>> > > > > > > > > >
> > >>> > > > > > > > > > Two things to add:
> > >>> > > > > > > > > >
> > >>> > > > > > > > > >   - The job mode is very nice in the way that it
> runs the
> > >>> > > > client
> > >>> > > > > > inside
> > >>> > > > > > > > > the
> > >>> > > > > > > > > > cluster (in the same image/process that is the JM)
> and thus
> > >>> > > > > unifies
> > >>> > > > > > > > both
> > >>> > > > > > > > > > applications and what the Spark world calls the
> "driver
> > >>> > mode".
> > >>> > > > > > > > > >
> > >>> > > > > > > > > >   - Another thing I would add is that during the
> FLIP-6
> > >>> > design,
> > >>> > > > > we
> > >>> > > > > > were
> > >>> > > > > > > > > > thinking about setups where Dispatcher and
> JobManager are
> > >>> > > > > separate
> > >>> > > > > > > > > > processes.
> > >>> > > > > > > > > >     A Yarn or Mesos Dispatcher of a session could
> run
> > >>> > > > > independently
> > >>> > > > > > > > (even
> > >>> > > > > > > > > > as privileged processes executing no code).
> > >>> > > > > > > > > >     Then the "per-job" mode could still be
> helpful:
> > >>> > when a
> > >>> > > > > job
> > >>> > > > > > is
> > >>> > > > > > > > > > submitted to the dispatcher, it launches the JM
> again in a
> > >>> > > > > per-job
> > >>> > > > > > > > mode,
> > >>> > > > > > > > > so
> > >>> > > > > > > > > > that JM and TM processes are bound to the
> only. For
> > >>> > higher
> > >>> > > > > > security
> > >>> > > > > > > > > > setups, it is important that processes are not
> reused
> > >>> > across
> > >>> > > > > jobs.
> > >>> > > > > > > > > >
> > >>> > > > > > > > > > On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <
> > >>> > > > > > trohrmann@apache.org>
> > >>> > > > > > > > > > wrote:
> > >>> > > > > > > > > >
> > >>> > > > > > > > > > > I would not be in favour of getting rid of the
> per-job
> > >>> > mode
> > >>> > > > > > since it
> > >>> > > > > > > > > > > simplifies the process of running Flink jobs
> > >>> > considerably.
> > >>> > > > > > Moreover,
> > >>> > > > > > > > it
> > >>> > > > > > > > > > is
> > >>> > > > > > > > > > > not only well suited for container deployments
> but also
> > >>> > for
> > >>> > > > > > > > deployments
> > >>> > > > > > > > > > > where you want to guarantee job isolation. For
> example, a
> > >>> > > > user
> > >>> > > > > > could
> > >>> > > > > > > > > use
> > >>> > > > > > > > > > > the per-job mode on Yarn to execute his job on a
> separate
> > >>> > > > > > cluster.
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > I think that having two notions of cluster
> deployments
> > >>> > > > (session
> > >>> > > > > > vs.
> > >>> > > > > > > > > > per-job
> > >>> > > > > > > > > > > mode) does not necessarily contradict your ideas
> for the
> > >>> > > > client
> > >>> > > > > > api
> > >>> > > > > > > > > > > refactoring. For example one could have the
> following
> > >>> > > > > interfaces:
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > - ClusterDeploymentDescriptor: encapsulates the
> logic
> > >>> > how to
> > >>> > > > > > deploy a
> > >>> > > > > > > > > > > cluster.
> > >>> > > > > > > > > > > - ClusterClient: allows to interact with a
> cluster
> > >>> > > > > > > > > > > - JobClient: allows to interact with a running
> job
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > Now the ClusterDeploymentDescriptor could have
> two
> > >>> > methods:
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > - ClusterClient deploySessionCluster()
> > >>> > > > > > > > > > > - JobClusterClient/JobClient
> > >>> > deployPerJobCluster(JobGraph)
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > where JobClusterClient is either a supertype of
> > >>> > ClusterClient
> > >>> > > > > > which
> > >>> > > > > > > > > does
> > >>> > > > > > > > > > > not give you the functionality to submit jobs or
> > >>> > > > > > deployPerJobCluster
> > >>> > > > > > > > > > > returns directly a JobClient.
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > When setting up the ExecutionEnvironment, one
> would then
> > >>> > not
> > >>> > > > > > provide
> > >>> > > > > > > > a
> > >>> > > > > > > > > > > ClusterClient to submit jobs but a JobDeployer
> which,
> > >>> > > > depending
> > >>> > > > > > on
> > >>> > > > > > > > the
> > >>> > > > > > > > > > > selected mode, either uses a ClusterClient
> (session
> > >>> > mode) to
> > >>> > > > > > submit
> > >>> > > > > > > > > jobs
> > >>> > > > > > > > > > or
> > >>> > > > > > > > > > > a ClusterDeploymentDescriptor to deploy a
> per-job mode
> > >>> > > > cluster
> > >>> > > > > > with
> > >>> > > > > > > > the
> > >>> > > > > > > > > > job
> > >>> > > > > > > > > > > to execute.
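[Editor's note: the three roles Till outlines can be sketched as below. Every interface and method name here is illustrative only, mirroring the proposal in the email rather than any real Flink class.]

```java
import java.util.concurrent.CompletableFuture;

// "JobClient": interact with one running job.
interface JobHandle {
    CompletableFuture<String> requestStatus();
}

// "ClusterClient": interact with a cluster that accepts jobs.
interface SessionClusterClient {
    JobHandle submitJob(String jobGraph);
}

// "ClusterDeploymentDescriptor": encapsulates how to deploy a cluster.
interface ClusterDeploymentDescriptor {
    SessionClusterClient deploySessionCluster();

    // Per-job deployment hands back only job-level control, so the
    // caller cannot submit further jobs to the dedicated cluster.
    JobHandle deployPerJobCluster(String jobGraph);
}
```

The asymmetry of the two return types is what enforces the isolation guarantee: a per-job cluster never exposes a submission method.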
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > These are just some thoughts how one could make
> it
> > >>> > working
> > >>> > > > > > because I
> > >>> > > > > > > > > > > believe there is some value in using the per job
> mode
> > >>> > from
> > >>> > > > the
> > >>> > > > > > > > > > > ExecutionEnvironment.
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > Concerning the web submission, this is indeed a
> bit
> > >>> > tricky.
> > >>> > > > > From
> > >>> > > > > > a
> > >>> > > > > > > > > > cluster
> > >>> > > > > > > > > > > management standpoint, I would be in favour of not
> > >>> > executing
> > >>> > > > user
> > >>> > > > > > code
> > >>> > > > > > > > on
> > >>> > > > > > > > > > the
> > >>> > > > > > > > > > > REST endpoint. Especially when considering
> security, it
> > >>> > would
> > >>> > > > > be
> > >>> > > > > > good
> > >>> > > > > > > > > to
> > >>> > > > > > > > > > > have a well defined cluster behaviour where it is
> > >>> > explicitly
> > >>> > > > > > stated
> > >>> > > > > > > > > where
> > >>> > > > > > > > > > > user code and, thus, potentially risky code is
> executed.
> > >>> > > > > Ideally
> > >>> > > > > > we
> > >>> > > > > > > > > limit
> > >>> > > > > > > > > > > it to the TaskExecutor and JobMaster.
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > Cheers,
> > >>> > > > > > > > > > > Till
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > On Tue, Aug 20, 2019 at 9:40 AM Flavio
> Pompermaier <
> > >>> > > > > > > > > pompermaier@okkam.it
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > wrote:
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > > > > In my opinion the client should not use any
> > >>> > environment to
> > >>> > > > > get
> > >>> > > > > > the
> > >>> > > > > > > > > Job
> > >>> > > > > > > > > > > > graph because the jar should reside ONLY on
> the cluster
> > >>> > > > (and
> > >>> > > > > > not in
> > >>> > > > > > > > > the
> > >>> > > > > > > > > > > > client classpath otherwise there are always
> > >>> > inconsistencies
> > >>> > > > > > between
> > >>> > > > > > > > > > > client
> > >>> > > > > > > > > > > > and Flink Job manager's classpath).
> > >>> > > > > > > > > > > > In the YARN, Mesos and Kubernetes scenarios
> you have
> > >>> > the
> > >>> > > > jar
> > >>> > > > > > but
> > >>> > > > > > > > you
> > >>> > > > > > > > > > > could
> > >>> > > > > > > > > > > > start a cluster that has the jar on the Job
> Manager as
> > >>> > well
> > >>> > > > > > (but
> > >>> > > > > > > > this
> > >>> > > > > > > > > > is
> > >>> > > > > > > > > > > > the only case where I think you can assume
> that the
> > >>> > client
> > >>> > > > > has
> > >>> > > > > > the
> > >>> > > > > > > > > jar
> > >>> > > > > > > > > > on
> > >>> > > > > > > > > > > > the classpath..in the REST job submission you
> don't
> > >>> > have
> > >>> > > > any
> > >>> > > > > > > > > > classpath).
> > >>> > > > > > > > > > > >
> > >>> > > > > > > > > > > > Thus, always in my opinion, the JobGraph
> should be
> > >>> > > > generated
> > >>> > > > > > by the
> > >>> > > > > > > > > Job
> > >>> > > > > > > > > > > > Manager REST API.
> > >>> > > > > > > > > > > >
> > >>> > > > > > > > > > > >
> > >>> > > > > > > > > > > > On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <
> > >>> > > > > > wander4096@gmail.com>
> > >>> > > > > > > > > > wrote:
> > >>> > > > > > > > > > > >
> > >>> > > > > > > > > > > >> I would like to involve Till & Stephan here
> to clarify
> > >>> > > > some
> > >>> > > > > > > > concept
> > >>> > > > > > > > > of
> > >>> > > > > > > > > > > >> per-job mode.
> > >>> > > > > > > > > > > >>
> > >>> > > > > > > > > > > >> The term per-job is one of modes a cluster
> could run
> > >>> > on.
> > >>> > > > It
> > >>> > > > > is
> > >>> > > > > > > > > mainly
> > >>> > > > > > > > > > > >> aimed
> > >>> > > > > > > > > > > >> at spawning
> > >>> > > > > > > > > > > >> a dedicated cluster for a specific job while
> the job
> > >>> > could
> > >>> > > > > be
> > >>> > > > > > > > > packaged
> > >>> > > > > > > > > > > >> with
> > >>> > > > > > > > > > > >> Flink
> > >>> > > > > > > > > > > >> itself and thus the cluster initialized with
> job so
> > >>> > that
> > >>> > > > get
> > >>> > > > > > rid
> > >>> > > > > > > > of
> > >>> > > > > > > > > a
> > >>> > > > > > > > > > > >> separated
> > >>> > > > > > > > > > > >> submission step.
> > >>> > > > > > > > > > > >>
> > >>> > > > > > > > > > > >> This is useful for container deployments
> where one
> > >>> > create
> > >>> > > > > his
> > >>> > > > > > > > image
> > >>> > > > > > > > > > with
> > >>> > > > > > > > > > > >> the job
> > >>> > > > > > > > > > > >> and then simply deploy the container.
> > >>> > > > > > > > > > > >>
> > >>> > > > > > > > > > > >> However, it is out of client scope since a
> > >>> > > > > > client(ClusterClient
> > >>> > > > > > > > for
> > >>> > > > > > > > > > > >> example) is for
> > >>> > > > > > > > > > > >> communicating with an existing cluster and
> performing
> > >>> > > > > actions.
> > >>> > > > > > > > > > Currently,
> > >>> > > > > > > > > > > >> in
> > >>> > > > > > > > > > > >> per-job
> > >>> > > > > > > > > > > >> mode, we extract the job graph and bundle it
> into
> > >>> > cluster
> > >>> > > > > > > > deployment
> > >>> > > > > > > > > > and
> > >>> > > > > > > > > > > >> thus no
> > >>> > > > > > > > > > > >> concept of client gets involved. It looks
> > >>> > reasonable
> > >>> > > > to
> > >>> > > > > > > > exclude
> > >>> > > > > > > > > > the
> > >>> > > > > > > > > > > >> deployment
> > >>> > > > > > > > > > > >> of per-job cluster from client api and use
> dedicated
> > >>> > > > utility
> > >>> > > > > > > > > > > >> classes(deployers) for
> > >>> > > > > > > > > > > >> deployment.
> > >>> > > > > > > > > > > >>
> > >>> > > > > > > > > > > >> Zili Chen <wa...@gmail.com>
> 于2019年8月20日周二
> > >>> > 下午12:37写道:
> > >>> > > > > > > > > > > >>
> > >>> > > > > > > > > > > >> > Hi Aljoscha,
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > Thanks for your reply and participance. The
> Google
> > >>> > Doc
> > >>> > > > you
> > >>> > > > > > > > linked
> > >>> > > > > > > > > to
> > >>> > > > > > > > > > > >> > requires
> > >>> > > > > > > > > > > >> > permission and I think you could use a
> share link
> > >>> > > > instead.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > I agree with that we almost reach a
> consensus that
> > >>> > > > > > JobClient is
> > >>> > > > > > > > > > > >> necessary
> > >>> > > > > > > > > > > >> > to
> > >>> > > > > > > > > > > >> > interacte with a running Job.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > Let me check your open questions one by one.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > 1. Separate cluster creation and job
> submission for
> > >>> > > > > per-job
> > >>> > > > > > > > mode.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > As you mentioned here is where the opinions
> > >>> > diverge. In
> > >>> > > > my
> > >>> > > > > > > > > document
> > >>> > > > > > > > > > > >> there
> > >>> > > > > > > > > > > >> > is
> > >>> > > > > > > > > > > >> > an alternative[2] that proposes excluding
> per-job
> > >>> > > > > deployment
> > >>> > > > > > > > from
> > >>> > > > > > > > > > > client
> > >>> > > > > > > > > > > >> > api
> > >>> > > > > > > > > > > >> > scope and now I find it is more reasonable
> we do the
> > >>> > > > > > exclusion.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > When in per-job mode, a dedicated
> JobCluster is
> > >>> > launched
> > >>> > > > > to
> > >>> > > > > > > > > execute
> > >>> > > > > > > > > > > the
> > >>> > > > > > > > > > > >> > specific job. It is like a Flink
> Application more
> > >>> > than a
> > >>> > > > > > > > > submission
> > >>> > > > > > > > > > > >> > of Flink Job. Client only takes care of job
> > >>> > submission
> > >>> > > > and
> > >>> > > > > > > > assume
> > >>> > > > > > > > > > > there
> > >>> > > > > > > > > > > >> is
> > >>> > > > > > > > > > > >> > an existing cluster. In this way we are
> able to
> > >>> > consider
> > >>> > > > > > per-job
> > >>> > > > > > > > > > > issues
> > >>> > > > > > > > > > > >> > individually and JobClusterEntrypoint would
> be the
> > >>> > > > utility
> > >>> > > > > > class
> > >>> > > > > > > > > for
> > >>> > > > > > > > > > > >> > per-job
> > >>> > > > > > > > > > > >> > deployment.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > Nevertheless, user program works in both
> session
> > >>> > mode
> > >>> > > > and
> > >>> > > > > > > > per-job
> > >>> > > > > > > > > > mode
> > >>> > > > > > > > > > > >> > without
> > >>> > > > > > > > > > > >> > needing to change code.
> per-job mode
> > >>> > is
> > >>> > > > > > returned
> > >>> > > > > > > > > from
> > >>> > > > > > > > > > > >> > env.execute as normal. However, it would be
> no
> > >>> > longer a
> > >>> > > > > > wrapper
> > >>> > > > > > > > of
> > >>> > > > > > > > > > > >> > RestClusterClient but a wrapper of
> > >>> > PerJobClusterClient
> > >>> > > > > which
> > >>> > > > > > > > > > > >> communicates
> > >>> > > > > > > > > > > >> > to Dispatcher locally.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > 2. How to deal with plan preview.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > With env.compile functions users can get
> JobGraph or
> > >>> > > > > > FlinkPlan
> > >>> > > > > > > > and
> > >>> > > > > > > > > > > thus
> > >>> > > > > > > > > > > >> > they can preview the plan with programming.
> > >>> > Typically it
> > >>> > > > > > looks
> > >>> > > > > > > > > like
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > if (preview configured) {
> > >>> > > > > > > > > > > >> >     FlinkPlan plan = env.compile();
> > >>> > > > > > > > > > > >> >     new JSONDumpGenerator(...).dump(plan);
> > >>> > > > > > > > > > > >> > } else {
> > >>> > > > > > > > > > > >> >     env.execute();
> > >>> > > > > > > > > > > >> > }
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > And `flink info` would no longer be valid.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > 3. How to deal with Jar Submission at the
> Web
> > >>> > Frontend.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > There is one more thread talked on this
> topic[1].
> > >>> > Apart
> > >>> > > > > from
> > >>> > > > > > > > > > removing
> > >>> > > > > > > > > > > >> > the functions there are two alternatives.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > One is to introduce an interface whose
> method
> > >>> > returns
> > >>> > > > > > JobGraph/FlinkPlan
> > >>> > > > > > > > > > > >> > and Jar Submission only supports main-class
> > >>> > implements
> > >>> > > > this
> > >>> > > > > > > > > > interface.
> > >>> > > > > > > > > > > >> > And then extract the JobGraph/FlinkPlan
> just by
> > >>> > calling
> > >>> > > > > the
> > >>> > > > > > > > > method.
> > >>> > > > > > > > > > > >> > In this way, it is even possible to
> consider a
> > >>> > > > separation
> > >>> > > > > > of job
> > >>> > > > > > > > > > > >> creation
> > >>> > > > > > > > > > > >> > and job submission.
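[Editor's note: that first alternative, a plan-providing entry point that the web frontend could query instead of hijacking execute(), can be sketched as below. The interface name and method are invented here for illustration; they are not part of Flink.]

```java
// Hypothetical contract: a submittable main-class exposes its plan
// directly, so no main() method has to be hijacked to obtain it.
interface PlanProvider {
    // Stand-in for returning a JobGraph/FlinkPlan.
    String getJobGraph();
}

// A user entry point implementing the contract.
class WordCountEntryPoint implements PlanProvider {
    @Override
    public String getJobGraph() {
        return "compiled-word-count-plan";
    }
}
```

Under this scheme the web frontend would instantiate the class and call `getJobGraph()`, cleanly separating job creation from job submission.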
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > The other is, as you mentioned, let
> execute() do the
> > >>> > > > > actual
> > >>> > > > > > > > > > execution.
> > >>> > > > > > > > > > > >> > We won't execute the main method in the
> WebFrontend
> > >>> > but
> > >>> > > > > > spawn a
> > >>> > > > > > > > > > > process
> > >>> > > > > > > > > > > >> > at the WebMonitor side to execute. For the return part we
> part we
> > >>> > could
> > >>> > > > > > generate
> > >>> > > > > > > > > the
> > >>> > > > > > > > > > > >> > JobID from WebMonitor and pass it to the
> execution
> > >>> > > > > > environment.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > 4. How to deal with detached mode.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > I think detached mode is a temporary
> solution for
> > >>> > > > > > non-blocking
> > >>> > > > > > > > > > > >> submission.
> > >>> > > > > > > > > > > >> > In my document both submission and
> execution return
> > >>> > a
> > >>> > > > > > > > > > > CompletableFuture
> > >>> > > > > > > > > > > >> and
> > >>> > > > > > > > > > > >> > users control whether or not wait for the
> result. In
> > >>> > > > this
> > >>> > > > > > point
> > >>> > > > > > > > we
> > >>> > > > > > > > > > > don't
> > >>> > > > > > > > > > > >> > need a detached option but the
> functionality is
> > >>> > covered.
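[Editor's note: the argument that a future-based API subsumes the detached flag can be sketched as below. The class and method names are invented; `submit` is a toy stand-in for an asynchronous submission, not a real Flink call.]

```java
import java.util.concurrent.CompletableFuture;

class FutureSubmission {
    // Stand-in for an asynchronous submission; completes with the
    // job's final state.
    static CompletableFuture<String> submit(String jobGraph) {
        return CompletableFuture.supplyAsync(() -> "FINISHED");
    }

    // Old non-detached behaviour: the caller blocks until completion.
    static String attached(String jobGraph) {
        return submit(jobGraph).join();
    }

    // Old detached behaviour: the caller takes the future and returns
    // immediately, free to ignore or await it later.
    static CompletableFuture<String> detached(String jobGraph) {
        return submit(jobGraph);
    }
}
```

With one asynchronous primitive, "detached" stops being a submission mode and becomes a per-call choice of the caller.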
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > 5. How does per-job mode interact with
> interactive
> > >>> > > > > > programming.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > All of YARN, Mesos and Kubernetes scenarios
> follow
> > >>> > the
> > >>> > > > > > pattern
> > >>> > > > > > > > > > launch
> > >>> > > > > > > > > > > a
> > >>> > > > > > > > > > > >> > JobCluster now. And I don't think there
> would be
> > >>> > > > > > inconsistency
> > >>> > > > > > > > > > between
> > >>> > > > > > > > > > > >> > different resource managers.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > Best,
> > >>> > > > > > > > > > > >> > tison.
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > [1]
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >>
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > >
> > >>> > > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > >
> > >>> > > > >
> > >>> > > >
> > >>> >
> https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
> > >>> > > > > > > > > > > >> > [2]
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >>
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > >
> > >>> > > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > >
> > >>> > > > >
> > >>> > > >
> > >>> >
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> > Aljoscha Krettek <al...@apache.org>
> > >>> > 于2019年8月16日周五
> > >>> > > > > > 下午9:20写道:
> > >>> > > > > > > > > > > >> >
> > >>> > > > > > > > > > > >> >> Hi,
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >> I read both Jeffs initial design document
> and the
> > >>> > newer
> > >>> > > > > > > > document
> > >>> > > > > > > > > by
> > >>> > > > > > > > > > > >> >> Tison. I also finally found the time to
> collect our
> > >>> > > > > > thoughts on
> > >>> > > > > > > > > the
> > >>> > > > > > > > > > > >> issue,
> > >>> > > > > > > > > > > >> >> I had quite some discussions with Kostas
> and this
> > >>> > is
> > >>> > > > the
> > >>> > > > > > > > result:
> > >>> > > > > > > > > > [1].
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >> I think overall we agree that this part of
> the
> > >>> > code is
> > >>> > > > in
> > >>> > > > > > dire
> > >>> > > > > > > > > need
> > >>> > > > > > > > > > > of
> > >>> > > > > > > > > > > >> >> some refactoring/improvements but I think
> there are
> > >>> > > > still
> > >>> > > > > > some
> > >>> > > > > > > > > open
> > >>> > > > > > > > > > > >> >> questions and some differences in opinion
> what
> > >>> > those
> > >>> > > > > > > > refactorings
> > >>> > > > > > > > > > > >> should
> > >>> > > > > > > > > > > >> >> look like.
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >> I think the API-side is quite clear, i.e.
> we need
> > >>> > some
> > >>> > > > > > > > JobClient
> > >>> > > > > > > > > > API
> > >>> > > > > > > > > > > >> that
> > >>> > > > > > > > > > > >> >> allows interacting with a running Job. It
> could be
> > >>> > > > > > worthwhile
> > >>> > > > > > > > to
> > >>> > > > > > > > > > spin
> > >>> > > > > > > > > > > >> that
> > >>> > > > > > > > > > > >> >> off into a separate FLIP because we can
> probably
> > >>> > find
> > >>> > > > > > consensus
> > >>> > > > > > > > > on
> > >>> > > > > > > > > > > that
> > >>> > > > > > > > > > > >> >> part more easily.
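[Editor's note: a minimal shape for the JobClient API the thread converges on might look like the sketch below. The interface, its methods, and the toy in-memory implementation are all illustrative; the FLIP that was eventually spun off defined its own signatures.]

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical minimal API for interacting with a running job.
interface RunningJobClient {
    String getJobId();
    CompletableFuture<String> requestJobStatus();
    CompletableFuture<Void> cancel();
}

// Toy in-memory implementation, standing in for a REST-backed client.
class LocalJobClient implements RunningJobClient {
    private final String jobId;
    private String status = "RUNNING";

    LocalJobClient(String jobId) {
        this.jobId = jobId;
    }

    @Override
    public String getJobId() {
        return jobId;
    }

    @Override
    public CompletableFuture<String> requestJobStatus() {
        return CompletableFuture.completedFuture(status);
    }

    @Override
    public CompletableFuture<Void> cancel() {
        status = "CANCELED";
        return CompletableFuture.completedFuture(null);
    }
}
```

Because every operation returns a future, the same interface serves both blocking callers (via `join()`) and callers that fire and forget.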
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >> For the rest, the main open questions from
> our doc
> > >>> > are
> > >>> > > > > > these:
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >>   - Do we want to separate cluster
> creation and job
> > >>> > > > > > submission
> > >>> > > > > > > > > for
> > >>> > > > > > > > > > > >> >> per-job mode? In the past, there were
> conscious
> > >>> > efforts
> > >>> > > > > to
> > >>> > > > > > > > *not*
> > >>> > > > > > > > > > > >> separate
> > >>> > > > > > > > > > > >> >> job submission from cluster creation for
> per-job
> > >>> > > > clusters
> > >>> > > > > > for
> > >>> > > > > > > > > > Mesos,
> > >>> > > > > > > > > > > >> YARN,
> > >>> > > > > > > > > > > >> >> Kubernetes (see
> StandaloneJobClusterEntryPoint).
> > >>> > Tison
> > >>> > > > > > suggests
> > >>> > > > > > > > in
> > >>> > > > > > > > > > his
> > >>> > > > > > > > > > > >> >> design document to decouple this in order
> to unify
> > >>> > job
> > >>> > > > > > > > > submission.
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >>   - How to deal with plan preview, which
> needs to
> > >>> > > > hijack
> > >>> > > > > > > > > execute()
> > >>> > > > > > > > > > > and
> > >>> > > > > > > > > > > >> >> let the outside code catch an exception?
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >>   - How to deal with Jar Submission at the
> Web
> > >>> > > > Frontend,
> > >>> > > > > > which
> > >>> > > > > > > > > > needs
> > >>> > > > > > > > > > > to
> > >>> > > > > > > > > > > >> >> hijack execute() and let the outside code
> catch an
> > >>> > > > > > exception?
> > >>> > > > > > > > > > > >> >> CliFrontend.run() “hijacks”
> > >>> > > > > ExecutionEnvironment.execute()
> > >>> > > > > > to
> > >>> > > > > > > > > get a
> > >>> > > > > > > > > > > >> >> JobGraph and then execute that JobGraph
> manually.
> > >>> > We
> > >>> > > > > could
> > >>> > > > > > get
> > >>> > > > > > > > > > around
> > >>> > > > > > > > > > > >> that
> > >>> > > > > > > > > > > >> >> by letting execute() do the actual
> execution. One
> > >>> > > > caveat
> > >>> > > > > > for
> > >>> > > > > > > > this
> > >>> > > > > > > > > > is
> > >>> > > > > > > > > > > >> that
> > >>> > > > > > > > > > > >> >> now the main() method doesn’t return (or
> is forced
> > >>> > to
> > >>> > > > > > return by
> > >>> > > > > > > > > > > >> throwing an
> > >>> > > > > > > > > > > >> >> exception from execute()) which means that
> for Jar
> > >>> > > > > > Submission
> > >>> > > > > > > > > from
> > >>> > > > > > > > > > > the
> > >>> > > > > > > > > > > >> >> WebFrontend we have a long-running main()
> method
> > >>> > > > running
> > >>> > > > > > in the
> > >>> > > > > > > > > > > >> >> WebFrontend. This doesn’t sound very good.
> We
> > >>> > could get
> > >>> > > > > > around
> > >>> > > > > > > > > this
> > >>> > > > > > > > > > > by
> > >>> > > > > > > > > > > >> >> removing the plan preview feature and by
> removing
> > >>> > Jar
> > >>> > > > > > > > > > > >> Submission/Running.
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >>   - How to deal with detached mode? Right
> now,
> > >>> > > > > > > > > DetachedEnvironment
> > >>> > > > > > > > > > > will
> > >>> > > > > > > > > > > >> >> execute the job and return immediately. If
> users
> > >>> > > > control
> > >>> > > > > > when
> > >>> > > > > > > > > they
> > >>> > > > > > > > > > > >> want to
> > >>> > > > > > > > > > > >> >> return, by waiting on the job completion
> future,
> > >>> > how do
> > >>> > > > > we
> > >>> > > > > > deal
> > >>> > > > > > > > > > with
> > >>> > > > > > > > > > > >> this?
> > >>> > > > > > > > > > > >> >> Do we simply remove the distinction between
> > >>> > > > > > > > > detached/non-detached?
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >>   - How does per-job mode interact with
> > >>> > “interactive
> > >>> > > > > > > > programming”
> > >>> > > > > > > > > > > >> >> (FLIP-36). For YARN, each execute() call
> could
> > >>> > spawn a
> > >>> > > > > new
> > >>> > > > > > > > Flink
> > >>> > > > > > > > > > YARN
> > >>> > > > > > > > > > > >> >> cluster. What about Mesos and Kubernetes?
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >> The first open question is where the
> opinions
> > >>> > diverge,
> > >>> > > > I
> > >>> > > > > > think.
> > >>> > > > > > > > > The
> > >>> > > > > > > > > > > >> rest
> > >>> > > > > > > > > > > >> >> are just open questions and interesting
> things
> > >>> > that we
> > >>> > > > > > need to
> > >>> > > > > > > > > > > >> consider.
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >> Best,
> > >>> > > > > > > > > > > >> >> Aljoscha
> > >>> > > > > > > > > > > >> >>
[1]
https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
On 31. Jul 2019, at 15:23, Jeff Zhang <zjffdu@gmail.com> wrote:

Thanks tison for the effort. I left a few comments.

On Wed, Jul 31, 2019 at 8:24 PM Zili Chen <wa...@gmail.com> wrote:
Hi Flavio,

Thanks for your reply.

Neither in the current implementation nor in the design does ClusterClient take responsibility for generating the JobGraph (what you see in the current codebase is several class methods). Instead, the user describes his program in the main method with the ExecutionEnvironment APIs and calls env.compile() or env.optimize() to get the FlinkPlan and the JobGraph respectively.

For listing the main classes in a jar and choosing one for submission, you are already able to customize a CLI to do it. Specifically, the path of the jar is passed as an argument, and in the customized CLI you list the main classes and choose one to submit to the cluster.

Best,
tison.
On Wed, Jul 31, 2019 at 8:12 PM Flavio Pompermaier <pompermaier@okkam.it> wrote:
Just one note on my side: it is not clear to me whether the client needs to be able to generate a job graph or not. In my opinion, the job jar must reside only on the server/jobManager side, and the client requires a way to get the job graph. If you really want access to the job graph, I'd add dedicated methods on the ClusterClient, like:

  - getJobGraph(jarId, mainClass): JobGraph
  - listMainClasses(jarId): List<String>

These would require some additions on the job manager endpoint as well... what do you think?
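The two methods proposed above could be shaped roughly as follows. This is a hypothetical sketch only: JarAwareClusterClient, the placeholder JobGraph class, and the in-memory registry are all illustrative and not Flink API; a real implementation would sit behind the JobManager's REST endpoints.

```java
import java.util.*;

public class ClusterClientSketch {
    // Placeholder standing in for Flink's JobGraph class.
    static class JobGraph {
        final String jobName;
        JobGraph(String jobName) { this.jobName = jobName; }
    }

    // The two methods from the email, as a hypothetical client interface.
    interface JarAwareClusterClient {
        List<String> listMainClasses(String jarId);
        JobGraph getJobGraph(String jarId, String mainClass);
    }

    // In-memory fake: maps jarId -> main classes "found" in that jar.
    static class InMemoryClient implements JarAwareClusterClient {
        private final Map<String, List<String>> jars = new HashMap<>();
        void register(String jarId, String... mainClasses) {
            jars.put(jarId, Arrays.asList(mainClasses));
        }
        public List<String> listMainClasses(String jarId) {
            return jars.getOrDefault(jarId, Collections.emptyList());
        }
        public JobGraph getJobGraph(String jarId, String mainClass) {
            if (!listMainClasses(jarId).contains(mainClass)) {
                throw new IllegalArgumentException("unknown main class " + mainClass);
            }
            return new JobGraph(mainClass);
        }
    }

    public static void main(String[] args) {
        InMemoryClient client = new InMemoryClient();
        client.register("jar-1", "com.example.WordCount", "com.example.PageRank");
        System.out.println(client.listMainClasses("jar-1").size());
        System.out.println(client.getJobGraph("jar-1", "com.example.WordCount").jobName);
    }
}
```

The UI flow Flavio describes would then be: upload jar, call listMainClasses to populate a picker, and call getJobGraph with the chosen class.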
On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <wander4096@gmail.com> wrote:
Hi all,

Here is a document [1] on client API enhancement from our perspective. We have investigated the current implementations, and we propose:

1. Unify the implementation of cluster deployment and job submission in Flink.
2. Provide programmatic interfaces to allow flexible job and cluster management.

The first proposal aims at reducing the code paths of cluster deployment and job submission so that one can adopt Flink easily. The second proposal aims at providing rich interfaces for advanced users who want precise control of these stages.

Quick reference on open questions:

1. Exclude job cluster deployment from the client side, or redefine the semantics of job cluster? It fits in a process quite different from session cluster deployment and job submission.

2. Maintain the code paths handling the class o.a.f.api.common.Program, or implement customized program handling logic in a customized CliFrontend? See also this thread [2] and the document [1].

3. Expose ClusterClient as a public API, or only expose an API in ExecutionEnvironment and delegate to ClusterClient? Further, either way, is it worth introducing a JobClient, an encapsulation of ClusterClient associated with a specific job?

Best,
tison.

[1]
https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
[2]
https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
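The JobClient idea from open question 3 can be pictured with a small sketch. All names and signatures below are hypothetical stand-ins, not Flink's actual API; the point is only that binding a ClusterClient to one job id removes the need to thread the id through every call.

```java
import java.util.concurrent.CompletableFuture;

public class JobClientSketch {
    // Hypothetical minimal cluster-level client: every call takes a job id.
    interface ClusterClient {
        CompletableFuture<String> cancel(String jobId);
        CompletableFuture<String> status(String jobId);
    }

    // JobClient: the same operations, pre-bound to one job.
    static class JobClient {
        private final ClusterClient cluster;
        private final String jobId;
        JobClient(ClusterClient cluster, String jobId) {
            this.cluster = cluster;
            this.jobId = jobId;
        }
        CompletableFuture<String> cancel() { return cluster.cancel(jobId); }
        CompletableFuture<String> status() { return cluster.status(jobId); }
    }

    public static void main(String[] args) {
        // Trivial in-memory ClusterClient for illustration.
        ClusterClient cluster = new ClusterClient() {
            public CompletableFuture<String> cancel(String jobId) {
                return CompletableFuture.completedFuture("CANCELED:" + jobId);
            }
            public CompletableFuture<String> status(String jobId) {
                return CompletableFuture.completedFuture("RUNNING:" + jobId);
            }
        };
        JobClient job = new JobClient(cluster, "job-42");
        System.out.println(job.status().join());
        System.out.println(job.cancel().join());
    }
}
```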
On Wed, Jul 24, 2019 at 9:19 AM Jeff Zhang <zj...@gmail.com> wrote:
Thanks Stephan, I will follow up on this issue in the next few weeks and will refine the design doc. We can discuss more details after the 1.9 release.
On Wed, Jul 24, 2019 at 12:58 AM Stephan Ewen <se...@apache.org> wrote:
Hi all!

This thread has stalled for a bit, which I assume is mostly due to the Flink 1.9 feature freeze and the release testing effort.

I personally still recognize this issue as an important one to solve. I'd be happy to help resume this discussion soon (after the 1.9 release) and see if we can take some steps towards this in Flink 1.10.

Best,
Stephan
On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <pompermaier@okkam.it> wrote:
That's exactly what I suggested a long time ago: the Flink REST client should not require any Flink dependency, only an HTTP library to call the REST services to submit and monitor a job. What I also suggested in [1] was to have a way to automatically suggest to the user (via a UI) the available main classes and their required parameters [2].

Another problem we have with Flink is that the REST client and the CLI one behave differently, and we use the CLI client (via ssh) because it allows calling some other method after env.execute() [3] (we have to call another REST service to signal the end of the job). In this regard, a dedicated interface, like the JobListener suggested in the previous emails, would be very helpful (IMHO).

[1] https://issues.apache.org/jira/browse/FLINK-10864
[2] https://issues.apache.org/jira/browse/FLINK-10862
[3] https://issues.apache.org/jira/browse/FLINK-10879

Best,
Flavio
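A JobListener along the lines Flavio describes could look like the sketch below: a callback the environment fires around submission and completion, so the "signal the end of the job" call no longer needs the CLI client. The interface name echoes the thread, but the method signatures and the toy Env are assumptions, not Flink's actual API.

```java
import java.util.ArrayList;
import java.util.List;

public class JobListenerSketch {
    // Hypothetical callback interface for job lifecycle events.
    interface JobListener {
        void onJobSubmitted(String jobId);
        void onJobExecuted(String jobId, boolean success);
    }

    // Minimal stand-in for an execution environment that notifies listeners.
    static class Env {
        private final List<JobListener> listeners = new ArrayList<>();
        void registerJobListener(JobListener l) { listeners.add(l); }
        void execute() {
            String jobId = "job-1";                      // would come from submission
            listeners.forEach(l -> l.onJobSubmitted(jobId));
            // ... job runs to completion ...
            listeners.forEach(l -> l.onJobExecuted(jobId, true));
        }
    }

    public static void main(String[] args) {
        Env env = new Env();
        env.registerJobListener(new JobListener() {
            public void onJobSubmitted(String jobId) {
                System.out.println("submitted " + jobId);
            }
            public void onJobExecuted(String jobId, boolean success) {
                // e.g. call the "end of job" REST service here
                System.out.println("executed " + jobId + " success=" + success);
            }
        });
        env.execute();
    }
}
```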
On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zjffdu@gmail.com> wrote:
Hi, Tison,

Thanks for your comments. Overall I agree with you that it is difficult for downstream projects to integrate with Flink and that we need to refactor the current Flink client API. And I agree that CliFrontend should only parse command line arguments and then pass them to the ExecutionEnvironment. It is the ExecutionEnvironment's responsibility to compile the job, create the cluster, and submit the job. Besides that, Flink currently has many ExecutionEnvironment implementations, and Flink will use the specific one based on the context. IMHO, that is not necessary; the ExecutionEnvironment should be able to do the right thing based on the FlinkConf it receives. Too many ExecutionEnvironment implementations are another burden for downstream project integration.

One thing I'd like to mention is Flink's Scala shell and SQL client. Although they are sub-modules of Flink, they can be treated as downstream projects which use Flink's client API. Currently you will find it is not easy for them to integrate with Flink, and they share much duplicated code. It is another sign that we should refactor the Flink client API.

I believe it is a large and hard change, and I am afraid we cannot keep compatibility since many of the changes are user facing.
On Mon, Jun 24, 2019 at 2:53 PM Zili Chen <wa...@gmail.com> wrote:
> > >>> > > > > > > > > > > >> >> >>>>>>>>> Hi all,
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> After a closer look on our
> client apis,
> > >>> > I can
> > >>> > > > > see
> > >>> > > > > > > > there
> > >>> > > > > > > > > > are
> > >>> > > > > > > > > > > >> >> >> two
> > >>> > > > > > > > > > > >> >> >>>>> major
> > >>> > > > > > > > > > > >> >> >>>>>>>>> issues to consistency and
> integration,
> > >>> > namely
> > >>> > > > > > > > different
> > >>> > > > > > > > > > > >> >> >>>> deployment
> > >>> > > > > > > > > > > >> >> >>>>> of
> > >>> > > > > > > > > > > >> >> >>>>>>>>> job cluster which couples job
> graph
> > >>> > creation
> > >>> > > > > and
> > >>> > > > > > > > > cluster
> > >>> > > > > > > > > > > >> >> >>>>> deployment,
> > >>> > > > > > > > > > > >> >> >>>>>>>>> and submission via CliFrontend
> confusing
> > >>> > > > > control
> > >>> > > > > > flow
> > >>> > > > > > > > > of
> > >>> > > > > > > > > > > job
> > >>> > > > > > > > > > > >> >> >>>> graph
> > >>> > > > > > > > > > > >> >> >>>>>>>>> compilation and job submission.
> I'd like
> > >>> > to
> > >>> > > > > > follow
> > >>> > > > > > > > the
> > >>> > > > > > > > > > > >> >> >> discuss
> > >>> > > > > > > > > > > >> >> >>>>> above,
> > >>> > > > > > > > > > > >> >> >>>>>>>>> mainly the process described by
> Jeff and
> > >>> > > > > > Stephan, and
> > >>> > > > > > > > > > share
> > >>> > > > > > > > > > > >> >> >> my
> > >>> > > > > > > > > > > >> >> >>>>>>>>> ideas on these issues.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> 1) CliFrontend confuses the
> control flow
> > >>> > of
> > >>> > > > job
> > >>> > > > > > > > > > compilation
> > >>> > > > > > > > > > > >> >> >> and
> > >>> > > > > > > > > > > >> >> >>>>>>>> submission.
> > >>> > > > > > > > > > > >> >> >>>>>>>>> Following the process of job
> submission
> > >>> > > > Stephan
> > >>> > > > > > and
> > >>> > > > > > > > > Jeff
> > >>> > > > > > > > > > > >> >> >>>> described,
> > >>> > > > > > > > > > > >> >> >>>>>>>>> execution environment knows all
> configs
> > >>> > of
> > >>> > > > the
> > >>> > > > > > > > cluster
> > >>> > > > > > > > > > and
> > >>> > > > > > > > > > > >> >> >>>>>>> topos/settings
> > >>> > > > > > > > > > > >> >> >>>>>>>>> of the job. Ideally, in the main
> method
> > >>> > of
> > >>> > > > user
> > >>> > > > > > > > > program,
> > >>> > > > > > > > > > it
> > >>> > > > > > > > > > > >> >> >>> calls
> > >>> > > > > > > > > > > >> >> >>>>>>>> #execute
> > >>> > > > > > > > > > > >> >> >>>>>>>>> (or named #submit) and Flink
> deploys the
> > >>> > > > > cluster,
> > >>> > > > > > > > > compile
> > >>> > > > > > > > > > > the
> > >>> > > > > > > > > > > >> >> >>> job
> > >>> > > > > > > > > > > >> >> >>>>>> graph
> > >>> > > > > > > > > > > >> >> >>>>>>>>> and submit it to the cluster.
> However,
> > >>> > > > current
> > >>> > > > > > > > > > CliFrontend
> > >>> > > > > > > > > > > >> >> >> does
> > >>> > > > > > > > > > > >> >> >>>> all
> > >>> > > > > > > > > > > >> >> >>>>>>> these
> > >>> > > > > > > > > > > >> >> >>>>>>>>> things inside its #runProgram
> method,
> > >>> > which
> > >>> > > > > > > > introduces
> > >>> > > > > > > > > a
> > >>> > > > > > > > > > > lot
> > >>> > > > > > > > > > > >> >> >> of
> > >>> > > > > > > > > > > >> >> >>>>>>>> subclasses
> > >>> > > > > > > > > > > >> >> >>>>>>>>> of (stream) execution
> environment.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> Actually, it sets up an exec env
> that
> > >>> > hijacks
> > >>> > > > > the
> > >>> > > > > > > > > > > >> >> >>>>>> #execute/executePlan
> > >>> > > > > > > > > > > >> >> >>>>>>>>> method, initializes the job
> graph and
> > >>> > abort
> > >>> > > > > > > > execution.
> > >>> > > > > > > > > > And
> > >>> > > > > > > > > > > >> >> >> then
> > >>> > > > > > > > > > > >> >> >>>>>>>>> control flow back to
> CliFrontend, it
> > >>> > deploys
> > >>> > > > > the
> > >>> > > > > > > > > > cluster(or
> > >>> > > > > > > > > > > >> >> >>>>> retrieve
> > >>> > > > > > > > > > > >> >> >>>>>>>>> the client) and submits the job
> graph.
> > >>> > This
> > >>> > > > is
> > >>> > > > > > quite
> > >>> > > > > > > > a
> > >>> > > > > > > > > > > >> >> >> specific
> > >>> > > > > > > > > > > >> >> >>>>>>> internal
> > >>> > > > > > > > > > > >> >> >>>>>>>>> process inside Flink and none of
> > >>> > consistency
> > >>> > > > to
> > >>> > > > > > > > > anything.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> 2) Deployment of job cluster
> couples job
> > >>> > > > graph
> > >>> > > > > > > > creation
> > >>> > > > > > > > > > and
> > >>> > > > > > > > > > > >> >> >>>> cluster
> > >>> > > > > > > > > > > >> >> >>>>>>>>> deployment. Abstractly, from
> user job to
> > >>> > a
> > >>> > > > > > concrete
> > >>> > > > > > > > > > > >> >> >> submission,
> > >>> > > > > > > > > > > >> >> >>>> it
> > >>> > > > > > > > > > > >> >> >>>>>>>> requires
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>     create JobGraph --\
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> create ClusterClient -->  submit
> JobGraph
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> such a dependency. ClusterClient
> was
> > >>> > created
> > >>> > > > by
> > >>> > > > > > > > > deploying
> > >>> > > > > > > > > > > or
> > >>> > > > > > > > > > > >> >> >>>>>>> retrieving.
> > >>> > > > > > > > > > > >> >> >>>>>>>>> JobGraph submission requires a
> compiled
> > >>> > > > > JobGraph
> > >>> > > > > > and
> > >>> > > > > > > > > > valid
> > >>> > > > > > > > > > > >> >> >>>>>>> ClusterClient,
> > >>> > > > > > > > > > > >> >> >>>>>>>>> but the creation of
> ClusterClient is
> > >>> > > > abstractly
> > >>> > > > > > > > > > independent
> > >>> > > > > > > > > > > >> >> >> of
> > >>> > > > > > > > > > > >> >> >>>> that
> > >>> > > > > > > > > > > >> >> >>>>>> of
> > >>> > > > > > > > > > > >> >> >>>>>>>>> JobGraph. However, in job
> cluster mode,
> > >>> > we
> > >>> > > > > > deploy job
> > >>> > > > > > > > > > > cluster
> > >>> > > > > > > > > > > >> >> >>>> with
> > >>> > > > > > > > > > > >> >> >>>>> a
> > >>> > > > > > > > > > > >> >> >>>>>>> job
> > >>> > > > > > > > > > > >> >> >>>>>>>>> graph, which means we use another
> > >>> > process:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> create JobGraph --> deploy
> cluster with
> > >>> > the
> > >>> > > > > > JobGraph
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> Here is another inconsistency and
> > >>> > downstream
> > >>> > > > > > > > > > > projects/client
> > >>> > > > > > > > > > > >> >> >>> apis
> > >>> > > > > > > > > > > >> >> >>>>> are
> > >>> > > > > > > > > > > >> >> >>>>>>>>> forced to handle different cases
> with
> > >>> > rare
> > >>> > > > > > supports
> > >>> > > > > > > > > from
> > >>> > > > > > > > > > > >> >> >> Flink.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> Since we likely reached a
> consensus on
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> 1. all configs gathered by Flink
> > >>> > > > configuration
> > >>> > > > > > and
> > >>> > > > > > > > > passed
> > >>> > > > > > > > > > > >> >> >>>>>>>>> 2. execution environment knows
> all
> > >>> > configs
> > >>> > > > and
> > >>> > > > > > > > handles
> > >>> > > > > > > > > > > >> >> >>>>> execution(both
> > >>> > > > > > > > > > > >> >> >>>>>>>>> deployment and submission)
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> to the issues above I propose
> eliminating
> > >>> > > > > > > > > inconsistencies
> > >>> > > > > > > > > > > by
> > >>> > > > > > > > > > > >> >> >>>>>> following
> > >>> > > > > > > > > > > >> >> >>>>>>>>> approach:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> 1) CliFrontend should exactly be
> a front
> > >>> > end,
> > >>> > > > > at
> > >>> > > > > > > > least
> > >>> > > > > > > > > > for
> > >>> > > > > > > > > > > >> >> >>> "run"
> > >>> > > > > > > > > > > >> >> >>>>>>> command.
> > >>> > > > > > > > > > > >> >> >>>>>>>>> That means it just gathered and
> passed
> > >>> > all
> > >>> > > > > config
> > >>> > > > > > > > from
> > >>> > > > > > > > > > > >> >> >> command
> > >>> > > > > > > > > > > >> >> >>>> line
> > >>> > > > > > > > > > > >> >> >>>>>> to
> > >>> > > > > > > > > > > >> >> >>>>>>>>> the main method of user program.
> > >>> > Execution
> > >>> > > > > > > > environment
> > >>> > > > > > > > > > > knows
> > >>> > > > > > > > > > > >> >> >>> all
> > >>> > > > > > > > > > > >> >> >>>>> the
> > >>> > > > > > > > > > > >> >> >>>>>>> info
> > >>> > > > > > > > > > > >> >> >>>>>>>>> and with an addition to utils for
> > >>> > > > > ClusterClient,
> > >>> > > > > > we
> > >>> > > > > > > > > > > >> >> >> gracefully
> > >>> > > > > > > > > > > >> >> >>>> get
> > >>> > > > > > > > > > > >> >> >>>>> a
> > >>> > > > > > > > > > > >> >> >>>>>>>>> ClusterClient by deploying or
> > >>> > retrieving. In
> > >>> > > > > this
> > >>> > > > > > > > way,
> > >>> > > > > > > > > we
> > >>> > > > > > > > > > > >> >> >> don't
> > >>> > > > > > > > > > > >> >> >>>>> need
> > >>> > > > > > > > > > > >> >> >>>>>> to
> > >>> > > > > > > > > > > >> >> >>>>>>>>> hijack #execute/executePlan
> methods and
> > >>> > can
> > >>> > > > > > remove
> > >>> > > > > > > > > > various
> > >>> > > > > > > > > > > >> >> >>>> hacking
> > >>> > > > > > > > > > > >> >> >>>>>>>>> subclasses of exec env, as well
> as #run
> > >>> > > > methods
> > >>> > > > > > in
> > >>> > > > > > > > > > > >> >> >>>>> ClusterClient(for
> > >>> > > > > > > > > > > >> >> >>>>>> an
> > >>> > > > > > > > > > > >> >> >>>>>>>>> interface-ized ClusterClient).
> Now the
> > >>> > > > control
> > >>> > > > > > flow
> > >>> > > > > > > > > flows
> > >>> > > > > > > > > > > >> >> >> from
> > >>> > > > > > > > > > > >> >> >>>>>>>> CliFrontend
> > >>> > > > > > > > > > > >> >> >>>>>>>>> to the main method and never
> returns.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
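The control flow proposed in point 1 (CliFrontend only gathers config and hands off to the user's main method; the environment resolves a ClusterClient itself, so no exec-env subclass needs to hijack #execute) can be sketched roughly as below. All class and function names here are hypothetical illustrations, not actual Flink APIs:

```python
# Hypothetical sketch: CliFrontend as a pure front end.
class ClusterClient:
    """Stand-in for a client bound to one cluster deployment."""
    def __init__(self, config):
        self.config = config

    def submit(self, job_graph):
        # A real client would send the job graph to the cluster's REST API.
        return {"job": job_graph, "target": self.config["target"]}


def retrieve_cluster_client(config):
    # Utility that deploys a new cluster or retrieves an existing one
    # purely from configuration, as the proposal suggests.
    return ClusterClient(config)


class ExecutionEnvironment:
    """Knows the full config, so #execute needs no hijacked subclass."""
    def __init__(self, config):
        self.config = config

    def execute(self, job_graph):
        client = retrieve_cluster_client(self.config)
        return client.submit(job_graph)


def cli_frontend_run(argv, user_main):
    # The front end only gathers config and hands control to user code;
    # control flow never returns to the front end.
    config = {"target": argv.get("-m", "local")}
    return user_main(ExecutionEnvironment(config))
```

A caller would use it as `cli_frontend_run({"-m": "yarn-cluster"}, lambda env: env.execute("wordcount"))`.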
> > >>> > > > > > > > > > > >> >> >>>>>>>>> 2) Job cluster means a cluster
> for the
> > >>> > > > specific
> > >>> > > > > > job.
> > >>> > > > > > > > > From
> > >>> > > > > > > > > > > >> >> >>> another
> > >>> > > > > > > > > > > >> >> >>>>>>>>> perspective, it is an ephemeral
> session.
> > >>> > We
> > >>> > > > may
> > >>> > > > > > > > > decouple
> > >>> > > > > > > > > > > the
> > >>> > > > > > > > > > > >> >> >>>>>> deployment
> > >>> > > > > > > > > > > >> >> >>>>>>>>> with a compiled job graph, but
> start a
> > >>> > > > session
> > >>> > > > > > with
> > >>> > > > > > > > > idle
> > >>> > > > > > > > > > > >> >> >>> timeout
> > >>> > > > > > > > > > > >> >> >>>>>>>>> and submit the job afterwards.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
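The "job cluster as an ephemeral session" idea in point 2 (deployment decoupled from the compiled job graph; the session shuts itself down after an idle timeout) could look roughly like this. The class and method names are illustrative assumptions, not Flink APIs:

```python
import time

class EphemeralSession:
    """Sketch of a session cluster that a job cluster degenerates into:
    started without a job graph, kept alive only while jobs arrive."""
    def __init__(self, idle_timeout_s, clock=time.monotonic):
        self.idle_timeout_s = idle_timeout_s
        self.clock = clock                 # injectable for testing
        self.last_activity = clock()
        self.jobs = []

    def submit(self, job_graph):
        # Submitting a job counts as activity and keeps the session alive.
        self.last_activity = self.clock()
        self.jobs.append(job_graph)

    def should_shut_down(self):
        # True once the session has been idle longer than the timeout.
        return (self.clock() - self.last_activity) > self.idle_timeout_s
```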
> > >>> > > > > > > > > > > >> >> >>>>>>>>> These topics, before we go into
> more
> > >>> > details
> > >>> > > > on
> > >>> > > > > > > > design
> > >>> > > > > > > > > or
> > >>> > > > > > > > > > > >> >> >>>>>>> implementation,
> > >>> > > > > > > > > > > >> >> >>>>>>>>> are better to be aware and
> discussed for
> > >>> > a
> > >>> > > > > > consensus.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> Best,
> > >>> > > > > > > > > > > >> >> >>>>>>>>> tison.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>> Zili Chen <wa...@gmail.com>
> > >>> > > > 于2019年6月20日周四
> > >>> > > > > > > > > 上午3:21写道:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> Hi Jeff,
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> Thanks for raising this thread
> and the
> > >>> > > > design
> > >>> > > > > > > > > document!
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> As @Thomas Weise mentioned
> above,
> > >>> > extending
> > >>> > > > > > config
> > >>> > > > > > > > to
> > >>> > > > > > > > > > > flink
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> requires far more effort than
> it should.
> > >>> > > > > > Another
> > >>> > > > > > > > > > > example
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> is we achieve detach mode by
> introducing
> > >>> > > > another
> > >>> > > > > > > > > execution
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> environment which also hijacks
> #execute
> > >>> > > > method.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> I agree with your idea that
> user would
> > >>> > > > > > configure all
> > >>> > > > > > > > > > > things
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> and flink "just" respect it. On
> this
> > >>> > topic I
> > >>> > > > > > think
> > >>> > > > > > > > the
> > >>> > > > > > > > > > > >> >> >> unusual
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> control flow when CliFrontend
> handle
> > >>> > "run"
> > >>> > > > > > command
> > >>> > > > > > > > is
> > >>> > > > > > > > > > the
> > >>> > > > > > > > > > > >> >> >>>> problem.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> It handles several configs,
> mainly about
> > >>> > > > > cluster
> > >>> > > > > > > > > > settings,
> > >>> > > > > > > > > > > >> >> >> and
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> thus the main method of the user
> program is
> > >>> > unaware
> > >>> > > > of
> > >>> > > > > > them.
> > >>> > > > > > > > > > Also
> > >>> > > > > > > > > > > it
> > >>> > > > > > > > > > > >> >> >>>>>> compiles
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> the app to a job graph by running the
> main method
> > >>> > > > with a
> > >>> > > > > > > > > hijacked
> > >>> > > > > > > > > > > exec
> > >>> > > > > > > > > > > >> >> >>>> env,
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> which constrains the main method
> further.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> I'd like to write down a few of
> notes on
> > >>> > > > > > > > configs/args
> > >>> > > > > > > > > > pass
> > >>> > > > > > > > > > > >> >> >> and
> > >>> > > > > > > > > > > >> >> >>>>>>> respect,
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> as well as decoupling job
> compilation
> > >>> > and
> > >>> > > > > > > > submission.
> > >>> > > > > > > > > > > Share
> > >>> > > > > > > > > > > >> >> >> on
> > >>> > > > > > > > > > > >> >> >>>>> this
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> thread later.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> Best,
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> tison.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>> SHI Xiaogang <
> shixiaogangg@gmail.com>
> > >>> > > > > > 于2019年6月17日周一
> > >>> > > > > > > > > > > >> >> >> 下午7:29写道:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> Hi Jeff and Flavio,
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> Thanks Jeff a lot for
> proposing the
> > >>> > design
> > >>> > > > > > > > document.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> We are also working on
> refactoring
> > >>> > > > > > ClusterClient to
> > >>> > > > > > > > > > allow
> > >>> > > > > > > > > > > >> >> >>>>> flexible
> > >>> > > > > > > > > > > >> >> >>>>>>> and
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> efficient job management in our
> > >>> > real-time
> > >>> > > > > > platform.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> We would like to draft a
> document to
> > >>> > share
> > >>> > > > > our
> > >>> > > > > > > > ideas
> > >>> > > > > > > > > > with
> > >>> > > > > > > > > > > >> >> >>> you.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> I think it's a good idea to
> have
> > >>> > something
> > >>> > > > > like
> > >>> > > > > > > > > Apache
> > >>> > > > > > > > > > > Livy
> > >>> > > > > > > > > > > >> >> >>> for
> > >>> > > > > > > > > > > >> >> >>>>>>> Flink,
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> and
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> the efforts discussed here
> will take a
> > >>> > > > great
> > >>> > > > > > step
> > >>> > > > > > > > > > forward
> > >>> > > > > > > > > > > >> >> >> to
> > >>> > > > > > > > > > > >> >> >>>> it.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> Regards,
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> Xiaogang
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> Flavio Pompermaier <
> > >>> > pompermaier@okkam.it>
> > >>> > > > > > > > > > 于2019年6月17日周一
> > >>> > > > > > > > > > > >> >> >>>>> 下午7:13写道:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>> Is there any possibility to
> have
> > >>> > something
> > >>> > > > > > like
> > >>> > > > > > > > > Apache
> > >>> > > > > > > > > > > >> >> >> Livy
> > >>> > > > > > > > > > > >> >> >>>> [1]
> > >>> > > > > > > > > > > >> >> >>>>>>> also
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> for
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>> Flink in the future?
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>> [1] https://livy.apache.org/
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>> On Tue, Jun 11, 2019 at 5:23
> PM Jeff
> > >>> > > > Zhang <
> > >>> > > > > > > > > > > >> >> >>> zjffdu@gmail.com
> > >>> > > > > > > > > > > >> >> >>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>> wrote:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> Any API we expose should
> not have
> > >>> > > > > > dependencies
> > >>> > > > > > > > > on
> > >>> > > > > > > > > > > >> >> >>> the
> > >>> > > > > > > > > > > >> >> >>>>>>> runtime
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> (flink-runtime) package or
> other
> > >>> > > > > > implementation
> > >>> > > > > > > > > > > >> >> >> details.
> > >>> > > > > > > > > > > >> >> >>> To
> > >>> > > > > > > > > > > >> >> >>>>> me,
> > >>> > > > > > > > > > > >> >> >>>>>>>> this
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>> means
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> that the current
> ClusterClient
> > >>> > cannot be
> > >>> > > > > > exposed
> > >>> > > > > > > > to
> > >>> > > > > > > > > > > >> >> >> users
> > >>> > > > > > > > > > > >> >> >>>>>> because
> > >>> > > > > > > > > > > >> >> >>>>>>>> it
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>> uses
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> quite some classes from the
> > >>> > optimiser and
> > >>> > > > > > runtime
> > >>> > > > > > > > > > > >> >> >>> packages.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> We should change
> ClusterClient from
> > >>> > class
> > >>> > > > > to
> > >>> > > > > > > > > > interface.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> ExecutionEnvironment only
> uses the
> > >>> > > > interface
> > >>> > > > > > > > > > > >> >> >> ClusterClient
> > >>> > > > > > > > > > > >> >> >>>>> which
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> should be
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> in flink-clients while the
> concrete
> > >>> > > > > > > > implementation
> > >>> > > > > > > > > > > >> >> >> class
> > >>> > > > > > > > > > > >> >> >>>>> could
> > >>> > > > > > > > > > > >> >> >>>>>> be
> > >>> > > > > > > > > > > >> >> >>>>>>>> in
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> flink-runtime.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
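The split described above (an abstract ClusterClient exposed from flink-clients, concrete implementations kept in flink-runtime, ExecutionEnvironment depending only on the interface) can be sketched as follows. The names mirror the discussion but are not actual Flink classes:

```python
from abc import ABC, abstractmethod

class ClusterClient(ABC):
    """What flink-clients would expose: no runtime internals leak out."""
    @abstractmethod
    def submit(self, job_graph) -> str: ...

class RestClusterClient(ClusterClient):
    """What flink-runtime would provide behind the interface."""
    def __init__(self, host, port):
        self.address = f"{host}:{port}"

    def submit(self, job_graph) -> str:
        # A real implementation would talk to the cluster's REST endpoint.
        return f"submitted {job_graph} to {self.address}"

class ExecutionEnvironment:
    """Depends only on the abstract ClusterClient, never the concrete class."""
    def __init__(self, client: ClusterClient):
        self.client = client

    def execute(self, job_graph) -> str:
        return self.client.submit(job_graph)
```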
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> What happens when a
> > >>> > failure/restart in
> > >>> > > > > the
> > >>> > > > > > > > > client
> > >>> > > > > > > > > > > >> >> >>>>> happens?
> > >>> > > > > > > > > > > >> >> >>>>>>>> There
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> need
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> to be a way of
> re-establishing the
> > >>> > > > > > connection to
> > >>> > > > > > > > > the
> > >>> > > > > > > > > > > >> >> >> job,
> > >>> > > > > > > > > > > >> >> >>>> set
> > >>> > > > > > > > > > > >> >> >>>>>> up
> > >>> > > > > > > > > > > >> >> >>>>>>>> the
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> listeners again, etc.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> Good point. First we need to define
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> what failure/restart in the client
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> means. IIUC, that usually means a
> > >>> > > > > network
> > >>> > > > > > > > > failure
> > >>> > > > > > > > > > > >> >> >>> which
> > >>> > > > > > > > > > > >> >> >>>>> will
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> happen in
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> class RestClient. If my
> > >>> > understanding is
> > >>> > > > > > correct,
> > >>> > > > > > > > > > > >> >> >>>>> restart/retry
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> mechanism
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> should be done in RestClient.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
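The retry idea discussed here (if a client-side failure usually means a transient network error, the reconnect loop belongs inside RestClient so callers and listeners above it never notice) could be sketched like this. The class names and retry policy are illustrative assumptions:

```python
class TransientNetworkError(Exception):
    """Stand-in for a recoverable connection failure."""

class RestClient:
    """Sketch: retries transient failures internally, as proposed."""
    def __init__(self, transport, max_retries=3):
        self.transport = transport      # callable performing one request
        self.max_retries = max_retries

    def request(self, payload):
        attempts = 0
        while True:
            try:
                return self.transport(payload)
            except TransientNetworkError:
                attempts += 1
                if attempts > self.max_retries:
                    raise
                # A real client would back off and reconnect here before
                # retrying, invisibly to the code above it.
```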
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> Aljoscha Krettek <
> > >>> > aljoscha@apache.org>
> > >>> > > > > > > > > 于2019年6月11日周二
> > >>> > > > > > > > > > > >> >> >>>>>> 下午11:10写道:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> Some points to consider:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> * Any API we expose should
> not have
> > >>> > > > > > dependencies
> > >>> > > > > > > > > on
> > >>> > > > > > > > > > > >> >> >> the
> > >>> > > > > > > > > > > >> >> >>>>>> runtime
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> (flink-runtime) package or
> other
> > >>> > > > > > implementation
> > >>> > > > > > > > > > > >> >> >>> details.
> > >>> > > > > > > > > > > >> >> >>>> To
> > >>> > > > > > > > > > > >> >> >>>>>> me,
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> this
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> means
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> that the current
> ClusterClient
> > >>> > cannot be
> > >>> > > > > > exposed
> > >>> > > > > > > > > to
> > >>> > > > > > > > > > > >> >> >>> users
> > >>> > > > > > > > > > > >> >> >>>>>>> because
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> it
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> uses
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> quite some classes from the
> > >>> > optimiser
> > >>> > > > and
> > >>> > > > > > > > runtime
> > >>> > > > > > > > > > > >> >> >>>> packages.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> * What happens when a
> > >>> > failure/restart in
> > >>> > > > > the
> > >>> > > > > > > > > client
> > >>> > > > > > > > > > > >> >> >>>>> happens?
> > >>> > > > > > > > > > > >> >> >>>>>>>> There
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> need
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> to
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> be a way of re-establishing
> the
> > >>> > > > connection
> > >>> > > > > > to
> > >>> > > > > > > > the
> > >>> > > > > > > > > > > >> >> >> job,
> > >>> > > > > > > > > > > >> >> >>>> set
> > >>> > > > > > > > > > > >> >> >>>>> up
> > >>> > > > > > > > > > > >> >> >>>>>>> the
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> listeners
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> again, etc.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> Aljoscha
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>> On 29. May 2019, at 10:17,
> Jeff
> > >>> > Zhang <
> > >>> > > > > > > > > > > >> >> >>>> zjffdu@gmail.com>
> > >>> > > > > > > > > > > >> >> >>>>>>>> wrote:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>> Sorry folks, the design
> doc is
> > >>> > late as
> > >>> > > > > you
> > >>> > > > > > > > > > > >> >> >> expected.
> > >>> > > > > > > > > > > >> >> >>>>> Here's
> > >>> > > > > > > > > > > >> >> >>>>>>> the
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>> design
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> doc
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>> I drafted, welcome any
> comments and
> > >>> > > > > > feedback.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>
> > >>> > > > > > > > > > > >> >> >>>>
> > >>> > > > > > > > > > > >> >> >>>
> > >>> > > > > > > > > > > >> >> >>
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >>
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > >
> > >>> > > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > >
> > >>> > > > >
> > >>> > > >
> > >>> >
> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>> Stephan Ewen <
> sewen@apache.org>
> > >>> > > > > > 于2019年2月14日周四
> > >>> > > > > > > > > > > >> >> >>>> 下午8:43写道:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> Nice that this discussion
> is
> > >>> > > > happening.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> In the FLIP, we could also
> > >>> > revisit the
> > >>> > > > > > entire
> > >>> > > > > > > > > role
> > >>> > > > > > > > > > > >> >> >>> of
> > >>> > > > > > > > > > > >> >> >>>>> the
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>> environments
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> again.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> Initially, the idea was:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> - the environments take
> care of
> > >>> > the
> > >>> > > > > > specific
> > >>> > > > > > > > > > > >> >> >> setup
> > >>> > > > > > > > > > > >> >> >>>> for
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> standalone
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>> (no
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> setup needed), yarn,
> mesos, etc.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> - the session ones have
> control
> > >>> > over
> > >>> > > > the
> > >>> > > > > > > > > session.
> > >>> > > > > > > > > > > >> >> >>> The
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> environment
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> holds
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> the session client.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> - running a job gives a
> "control"
> > >>> > > > object
> > >>> > > > > > for
> > >>> > > > > > > > > that
> > >>> > > > > > > > > > > >> >> >>>> job.
> > >>> > > > > > > > > > > >> >> >>>>>> That
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>> behavior
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> is
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> the same in all
> environments.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> The actual implementation
> diverged
> > >>> > > > quite
> > >>> > > > > > a bit
> > >>> > > > > > > > > > > >> >> >> from
> > >>> > > > > > > > > > > >> >> >>>>> that.
> > >>> > > > > > > > > > > >> >> >>>>>>>> Happy
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> to
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> see a
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> discussion about
> straitening this
> > >>> > out
> > >>> > > > a
> > >>> > > > > > bit
> > >>> > > > > > > > > more.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> On Tue, Feb 12, 2019 at
> 4:58 AM
> > >>> > Jeff
> > >>> > > > > > Zhang <
> > >>> > > > > > > > > > > >> >> >>>>>>> zjffdu@gmail.com>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>> wrote:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> Hi folks,
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> Sorry for late response,
> It
> > >>> > seems we
> > >>> > > > > > reached
> > >>> > > > > > > > > > > >> >> >>> consensus
> > >>> > > > > > > > > > > >> >> >>>> on
> > >>> > > > > > > > > > > >> >> >>>>>>>> this, I
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>> will
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> create
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> a FLIP for this with a more
> detailed
> > >>> > > > design
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> Thomas Weise <
> thw@apache.org>
> > >>> > > > > > 于2018年12月21日周五
> > >>> > > > > > > > > > > >> >> >>>>> 上午11:43写道:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> Great to see this
> discussion
> > >>> > seeded!
> > >>> > > > > The
> > >>> > > > > > > > > > > >> >> >> problems
> > >>> > > > > > > > > > > >> >> >>>> you
> > >>> > > > > > > > > > > >> >> >>>>>> face
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> with
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>> the
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> Zeppelin integration
> are also
> > >>> > > > > affecting
> > >>> > > > > > > > other
> > >>> > > > > > > > > > > >> >> >>>>> downstream
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> projects,
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> like
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> Beam.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> We just enabled the
> savepoint
> > >>> > > > restore
> > >>> > > > > > option
> > >>> > > > > > > > > in
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> RemoteStreamEnvironment
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> [1]
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> and that was more
> difficult
> > >>> > than it
> > >>> > > > > > should
> > >>> > > > > > > > be.
> > >>> > > > > > > > > > > >> >> >> The
> > >>> > > > > > > > > > > >> >> >>>>> main
> > >>> > > > > > > > > > > >> >> >>>>>>>> issue
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> is
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> that
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> environment and cluster
> client
> > >>> > > > aren't
> > >>> > > > > > > > > decoupled.
> > >>> > > > > > > > > > > >> >> >>>>> Ideally
> > >>> > > > > > > > > > > >> >> >>>>>>> it
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> should
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> be
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> possible to just get the
> > >>> > matching
> > >>> > > > > > cluster
> > >>> > > > > > > > > client
> > >>> > > > > > > > > > > >> >> >>>> from
> > >>> > > > > > > > > > > >> >> >>>>>> the
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> environment
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> and
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> then control the job
> through it
> > >>> > > > > > (environment
> > >>> > > > > > > > > as
> > >>> > > > > > > > > > > >> >> >>>>> factory
> > >>> > > > > > > > > > > >> >> >>>>>>> for
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>> cluster
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> client). But note that
> the
> > >>> > > > environment
> > >>> > > > > > > > classes
> > >>> > > > > > > > > > > >> >> >> are
> > >>> > > > > > > > > > > >> >> >>>>> part
> > >>> > > > > > > > > > > >> >> >>>>>> of
> > >>> > > > > > > > > > > >> >> >>>>>>>> the
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> public
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> API,
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> and it is not
> straightforward to
> > >>> > > > make
> > >>> > > > > > larger
> > >>> > > > > > > > > > > >> >> >>> changes
> > >>> > > > > > > > > > > >> >> >>>>>>> without
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> breaking
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> backward compatibility.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> ClusterClient currently
> exposes
> > >>> > > > > internal
> > >>> > > > > > > > > classes
> > >>> > > > > > > > > > > >> >> >>>> like
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> JobGraph and
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> StreamGraph. But it
> should be
> > >>> > > > possible
> > >>> > > > > > to
> > >>> > > > > > > > wrap
> > >>> > > > > > > > > > > >> >> >>> this
> > >>> > > > > > > > > > > >> >> >>>>>> with a
> > >>> > > > > > > > > > > >> >> >>>>>>>> new
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> public
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> API
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> that brings the
> required job
> > >>> > control
> > >>> > > > > > > > > > > >> >> >> capabilities
> > >>> > > > > > > > > > > >> >> >>>> for
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> downstream
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> projects.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> Perhaps it is helpful
> to look at
> > >>> > > > some
> > >>> > > > > > of the
> > >>> > > > > > > > > > > >> >> >>>>> interfaces
> > >>> > > > > > > > > > > >> >> >>>>>> in
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> Beam
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> while
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> thinking about this:
> [2] for the
> > >>> > > > > > portable
> > >>> > > > > > > > job
> > >>> > > > > > > > > > > >> >> >> API
> > >>> > > > > > > > > > > >> >> >>>> and
> > >>> > > > > > > > > > > >> >> >>>>>> [3]
> > >>> > > > > > > > > > > >> >> >>>>>>>> for
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> the
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> old
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> asynchronous job
> control from
> > >>> > the
> > >>> > > > Beam
> > >>> > > > > > Java
> > >>> > > > > > > > > SDK.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> The backward
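A public job-control facade of the kind described above (downstream projects get cancel/status/savepoint operations without ever seeing internals like JobGraph or StreamGraph) might look roughly like this. The shape loosely follows asynchronous job-control handles such as Beam's PipelineResult, but none of these names are real Flink or Beam APIs:

```python
from enum import Enum

class JobState(Enum):
    RUNNING = "RUNNING"
    CANCELED = "CANCELED"
    FINISHED = "FINISHED"

class JobClient:
    """Public handle for one submitted job; wraps, rather than exposes,
    the optimiser/runtime classes mentioned in the thread."""
    def __init__(self, job_id):
        self.job_id = job_id
        self._state = JobState.RUNNING

    def get_state(self) -> JobState:
        return self._state

    def cancel(self) -> None:
        # A real implementation would call the cluster's REST endpoint.
        self._state = JobState.CANCELED

    def trigger_savepoint(self, target_dir: str) -> str:
        # Returns the path a completed savepoint would live at.
        return f"{target_dir}/savepoint-{self.job_id}"
```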
> compatibility
> > >>> > > > discussion
> > >>> > > > > > [4] is
> > >>> > > > > > > > > > > >> >> >> also
> > >>> > > > > > > > > > > >> >> >>>>>> relevant
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> here. A
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> new
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> API
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> should shield downstream
> > >>> > projects
> > >>> > > > from
> > >>> > > > > > > > > internals
> > >>> > > > > > > > > > > >> >> >>> and
> > >>> > > > > > > > > > > >> >> >>>>>> allow
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> them to
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> interoperate with
> multiple
> > >>> > future
> > >>> > > > > Flink
> > >>> > > > > > > > > versions
> > >>> > > > > > > > > > > >> >> >>> in
> > >>> > > > > > > > > > > >> >> >>>>> the
> > >>> > > > > > > > > > > >> >> >>>>>>> same
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>> release
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> line
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> without forced upgrades.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> Thanks,
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> Thomas
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> [1]
> > >>> > > > > > > > https://github.com/apache/flink/pull/7249
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> [2]
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>
> > >>> > > > > > > > > > > >> >> >>>>
> > >>> > > > > > > > > > > >> >> >>>
> > >>> > > > > > > > > > > >> >> >>
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >>
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > >
> > >>> > > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > >
> > >>> > > > >
> > >>> > > >
> > >>> >
> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> [3]
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>
> > >>> > > > > > > > > > > >> >> >>>>
> > >>> > > > > > > > > > > >> >> >>>
> > >>> > > > > > > > > > > >> >> >>
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >>
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > >
> > >>> > > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > >
> > >>> > > > >
> > >>> > > >
> > >>> >
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> [4]
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>
> > >>> > > > > > > > > > > >> >> >>>>
> > >>> > > > > > > > > > > >> >> >>>
> > >>> > > > > > > > > > > >> >> >>
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >>
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > >
> > >>> > > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > >
> > >>> > > > >
> > >>> > > >
> > >>> >
> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zjffdu@gmail.com> wrote:

> > I'm not so sure whether the user should be able to define where the
> > job runs (in your example Yarn). This is actually independent of the
> > job development and is something which is decided at deployment time.
>
> The user doesn't need to specify the execution mode programmatically.
> They can also pass the execution mode via the arguments of the flink
> run command, e.g.
>
> bin/flink run -m yarn-cluster ...
> bin/flink run -m local ...
> bin/flink run -m host:port ...
>
> Does this make sense to you?
>
> > To me it makes sense that the ExecutionEnvironment is not directly
> > initialized by the user and is instead context sensitive to how you
> > want to execute your job (Flink CLI vs. IDE, for example).
>
> Right. Currently Flink creates a different ContextExecutionEnvironment
> depending on the submission scenario (Flink CLI vs. IDE). To me this is
> a hack and not very straightforward. What I suggested above is that
> Flink should always create the same ExecutionEnvironment, just with
> different configuration, and based on that configuration it would
> create the proper ClusterClient for the different behaviors.
Till Rohrmann <trohrmann@apache.org> wrote on Thu, Dec 20, 2018 at 11:18 PM:

> > You are probably right that we have code duplication when it comes to
> > the creation of the ClusterClient. This should be reduced in the
> > future.
> >
> > I'm not so sure whether the user should be able to define where the
> > job runs (in your example Yarn). This is actually independent of the
> > job development and is something which is decided at deployment time.
> > To me it makes sense that the ExecutionEnvironment is not directly
> > initialized by the user and is instead context sensitive to how you
> > want to execute your job (Flink CLI vs. IDE, for example). However, I
> > agree that the ExecutionEnvironment should give you access to the
> > ClusterClient and to the job (maybe in the form of the JobGraph or a
> > job plan).
> >
> > Cheers,
> > Till
> > On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> >
> > > Hi Till,
> > >
> > > Thanks for the feedback. You are right that I expect a better
> > > programmatic job submission/control API that downstream projects
> > > can use; it would benefit the Flink ecosystem. When I look at the
> > > code of the Flink scala-shell and sql-client (I believe they are
> > > not the core of Flink, but belong to its ecosystem), I find a lot
> > > of duplicated code for creating a ClusterClient from user-provided
> > > configuration (whose format may even differ between scala-shell
> > > and sql-client) and then using that ClusterClient to manipulate
> > > jobs. I don't think this is convenient for downstream projects.
> > > What I expect is that a downstream project only needs to provide
> > > the necessary configuration info (maybe by introducing a class
> > > FlinkConf), then builds an ExecutionEnvironment based on this
> > > FlinkConf, and the ExecutionEnvironment creates the proper
> > > ClusterClient. This would benefit not only downstream project
> > > development but also their integration tests with Flink. Here's a
> > > sample code snippet of what I expect:
> > >
> > > val conf = new FlinkConf().mode("yarn")
> > > val env = new ExecutionEnvironment(conf)
> > > val jobId = env.submit(...)
> > > val jobStatus = env.getClusterClient().queryJobStatus(jobId)
> > > env.getClusterClient().cancelJob(jobId)
> > >
> > > What do you think?
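The FlinkConf-driven flow Jeff sketches can be mocked end to end. The following self-contained Java sketch mirrors the proposed shape; FlinkConf, Env, submit, queryJobStatus, and cancelJob are hypothetical names taken from the proposal, not real Flink APIs, and the client is a pure in-memory stand-in.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical configuration holder from the proposal; not a real Flink class.
class FlinkConf {
    private String mode = "local";
    FlinkConf mode(String mode) { this.mode = mode; return this; }
    String getMode() { return mode; }
}

// Mock cluster client: tracks submitted jobs and their status in memory.
class MockClusterClient {
    private final Map<String, String> jobStatus = new HashMap<>();
    String submit() {
        String jobId = UUID.randomUUID().toString();
        jobStatus.put(jobId, "RUNNING");
        return jobId;
    }
    String queryJobStatus(String jobId) { return jobStatus.get(jobId); }
    void cancelJob(String jobId) { jobStatus.put(jobId, "CANCELED"); }
}

// Hypothetical environment that owns the client, as the proposal sketches:
// the same environment class is always created, and the configuration
// decides which kind of client it builds.
class Env {
    private final MockClusterClient client = new MockClusterClient();
    Env(FlinkConf conf) { /* would pick the client type from conf.getMode() */ }
    String submit() { return client.submit(); }
    MockClusterClient getClusterClient() { return client; }
}

public class FlinkConfSketch {
    public static void main(String[] args) {
        Env env = new Env(new FlinkConf().mode("yarn"));
        String jobId = env.submit();
        System.out.println(env.getClusterClient().queryJobStatus(jobId)); // RUNNING
        env.getClusterClient().cancelJob(jobId);
        System.out.println(env.getClusterClient().queryJobStatus(jobId)); // CANCELED
    }
}
```

The point of the mock is the shape of the API surface: submission is non-blocking and returns a handle, and all further control goes through the client the environment hands out.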
> > > Till Rohrmann <trohrmann@apache.org> wrote on Tue, Dec 11, 2018 at 6:28 PM:
> > >
> > > > Hi Jeff,
> > > >
> > > > What you are proposing is to provide the user with better
> > > > programmatic job control. There was actually an effort to achieve
> > > > this, but it has never been completed [1]. However, there are
> > > > some improvements in the code base now. Look for example at the
> > > > NewClusterClient interface, which offers non-blocking job
> > > > submission. But I agree that we need to improve Flink in this
> > > > regard.
> > > >
> > > > I would not be in favour of exposing all ClusterClient calls via
> > > > the ExecutionEnvironment, because it would clutter the class and
> > > > would not be a good separation of concerns. Instead, one idea
> > > > could be to retrieve the current ClusterClient from the
> > > > ExecutionEnvironment, which can then be used for cluster and job
> > > > control. But before we start an effort here, we need to agree on
> > > > and capture what functionality we want to provide.
> > > >
> > > > Initially, the idea was that we have the ClusterDescriptor
> > > > describing how to talk to a cluster manager like Yarn or Mesos.
> > > > The ClusterDescriptor can be used for deploying Flink clusters
> > > > (job and session) and gives you a ClusterClient. The
> > > > ClusterClient controls the cluster (e.g. submitting jobs, listing
> > > > all running jobs). And then there was the idea to introduce a
> > > > JobClient, which you obtain from the ClusterClient, to trigger
> > > > job-specific operations (e.g. taking a savepoint, cancelling the
> > > > job).
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-4272
> > > >
> > > > Cheers,
> > > > Till
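The layering Till describes — a ClusterDescriptor that deploys a cluster and yields a ClusterClient, which in turn hands out per-job JobClients — can be sketched as three small interfaces with an in-memory stand-in. The interface names mirror the discussion, but everything below is an illustrative assumption, not Flink code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Per-job control handle, obtained from the ClusterClient.
interface JobClient {
    String takeSavepoint();
    void cancel();
    String status();
}

// Cluster-level control: submit and enumerate jobs, hand out JobClients.
interface ClusterClient {
    String submitJob(String jobName);
    List<String> listJobs();
    JobClient getJobClient(String jobId);
}

// Knows how to talk to a cluster manager and deploy a cluster.
interface ClusterDescriptor {
    ClusterClient deploySessionCluster();
}

// In-memory stand-in for a deployed cluster.
class InMemoryCluster implements ClusterClient {
    private final Map<String, String> jobs = new LinkedHashMap<>();
    private int counter = 0;
    public String submitJob(String jobName) {
        String id = "job-" + (++counter);
        jobs.put(id, "RUNNING");
        return id;
    }
    public List<String> listJobs() { return new ArrayList<>(jobs.keySet()); }
    public JobClient getJobClient(String jobId) {
        return new JobClient() {
            public String takeSavepoint() { return "savepoint-for-" + jobId; }
            public void cancel() { jobs.put(jobId, "CANCELED"); }
            public String status() { return jobs.get(jobId); }
        };
    }
}

public class ClusterLayering {
    public static void main(String[] args) {
        ClusterDescriptor descriptor = () -> new InMemoryCluster();
        ClusterClient client = descriptor.deploySessionCluster();
        String id = client.submitJob("wordcount");
        JobClient job = client.getJobClient(id);
        job.cancel();
        System.out.println(job.status()); // CANCELED
    }
}
```

The separation of concerns is the point: nothing job-specific lives on the cluster-level client, and nothing deployment-specific leaks into job control.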
> > > > On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > >
> > > > > Hi Folks,
> > > > >
> > > > > I am trying to integrate Flink into Apache Zeppelin, which is
> > > > > an interactive notebook, and I hit several issues caused by the
> > > > > Flink client API. So I'd like to propose the following changes
> > > > > to the Flink client API.
> > > > >
> > > > > 1. Support non-blocking execution. Currently,
> > > > > ExecutionEnvironment#execute is a blocking method which does
> > > > > two things: first it submits the job, then it waits until the
> > > > > job is finished. I'd like to introduce a non-blocking execution
> > > > > method like ExecutionEnvironment#submit which only submits the
> > > > > job and then returns the jobId to the client, and allow the
> > > > > user to query the job status via that jobId.
> > > > >
> > > > > 2. Add a cancel API in
> > > > > ExecutionEnvironment/StreamExecutionEnvironment. Currently the
> > > > > only way to cancel a job is via the CLI (bin/flink), which is
> > > > > not convenient for downstream projects to use. So I'd like to
> > > > > add a cancel API in ExecutionEnvironment.
> > > > >
> > > > > 3. Add a savepoint API in
> > > > > ExecutionEnvironment/StreamExecutionEnvironment. This is
> > > > > similar to the cancel API; we should use ExecutionEnvironment
> > > > > as the unified API for third parties to integrate with Flink.
> > > > >
> > > > > 4. Add a listener for the job execution lifecycle, something
> > > > > like the following, so that downstream projects can run custom
> > > > > logic in the lifecycle of a job. E.g. Zeppelin would capture
> > > > > the jobId after the job is submitted and then use this jobId to
> > > > > cancel the job later when necessary.
> > > > >
> > > > > public interface JobListener {
> > > > >
> > > > >   void onJobSubmitted(JobID jobId);
> > > > >
> > > > >   void onJobExecuted(JobExecutionResult jobResult);
> > > > >
> > > > >   void onJobCanceled(JobID jobId);
> > > > > }
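A minimal sketch of how an environment could dispatch to such lifecycle listeners: it keeps a list of registered listeners and fires callbacks at submission and cancellation. ListeningEnv, registerJobListener, and the String-based job ids are hypothetical stand-ins for illustration, not Flink classes, and only two of the proposed callbacks are shown.

```java
import java.util.ArrayList;
import java.util.List;

// Trimmed-down version of the proposed listener interface, with String ids
// standing in for JobID.
interface JobListener {
    void onJobSubmitted(String jobId);
    void onJobCanceled(String jobId);
}

// Hypothetical environment that notifies registered listeners at each
// lifecycle step.
class ListeningEnv {
    private final List<JobListener> listeners = new ArrayList<>();
    private int counter = 0;
    void registerJobListener(JobListener l) { listeners.add(l); }
    String submit() {
        String jobId = "job-" + (++counter);
        listeners.forEach(l -> l.onJobSubmitted(jobId)); // fire after submission
        return jobId;
    }
    void cancel(String jobId) {
        listeners.forEach(l -> l.onJobCanceled(jobId));
    }
}

public class ListenerSketch {
    public static void main(String[] args) {
        ListeningEnv env = new ListeningEnv();
        List<String> events = new ArrayList<>();
        // A Zeppelin-like consumer would capture the jobId here.
        env.registerJobListener(new JobListener() {
            public void onJobSubmitted(String id) { events.add("submitted:" + id); }
            public void onJobCanceled(String id) { events.add("canceled:" + id); }
        });
        String id = env.submit();
        env.cancel(id);
        System.out.println(events); // [submitted:job-1, canceled:job-1]
    }
}
```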
> > > > > 5. Enable sessions in ExecutionEnvironment. Sessions are
> > > > > currently disabled, but a session is very convenient for third
> > > > > parties that submit jobs continually. I hope Flink can enable
> > > > > it again.
> > > > >
> > > > > 6. Unify all Flink client APIs into
> > > > > ExecutionEnvironment/StreamExecutionEnvironment.
> > > > >
> > > > > This is a long-term issue which needs more careful thinking and
> > > > > design. Currently some features of Flink are exposed in
> > > > > ExecutionEnvironment/StreamExecutionEnvironment, but some are
> > > > > exposed in the CLI instead of the API, like the cancel and
> > > > > savepoint I mentioned
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> above.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> I
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> think the root
> cause is
> > >>> > due to
> > >>> > > > > that
> > >>> > > > > > > > flink
> > >>> > > > > > > > > > > >> >> >>>> didn't
> > >>> > > > > > > > > > > >> >> >>>>>>> unify
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> the
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> interaction
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> with
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> flink. Here I list
> 3
> > >>> > scenarios
> > >>> > > > of
> > >>> > > > > > flink
> > >>> > > > > > > > > > > >> >> >>>> operation
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Local job
> execution.
> > >>> > Flink
> > >>> > > > > will
> > >>> > > > > > > > > create
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> LocalEnvironment
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> and
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> then
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> use
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  this
> LocalEnvironment to
> > >>> > > > create
> > >>> > > > > > > > > > > >> >> >>> LocalExecutor
> > >>> > > > > > > > > > > >> >> >>>>> for
> > >>> > > > > > > > > > > >> >> >>>>>>> job
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> execution.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Remote job
> execution.
> > >>> > Flink
> > >>> > > > > will
> > >>> > > > > > > > > create
> > >>> > > > > > > > > > > >> >> >>>>>>>> ClusterClient
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> first
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> and
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> then
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  create
> ContextEnvironment
> > >>> > > > based
> > >>> > > > > > on the
> > >>> > > > > > > > > > > >> >> >>>>>>> ClusterClient
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> and
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> then
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> run
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> the
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> job.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Job
> cancelation. Flink
> > >>> > will
> > >>> > > > > > create
> > >>> > > > > > > > > > > >> >> >>>>>> ClusterClient
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> first
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> and
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> then
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> cancel
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  this job via this
> > >>> > > > ClusterClient.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> As you can see in
> the
> > >>> > above 3
> > >>> > > > > > > > scenarios.
> > >>> > > > > > > > > > > >> >> >>> Flink
> > >>> > > > > > > > > > > >> >> >>>>>> didn't
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> use the
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> same
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> approach(code
> path) to
> > >>> > interact
> > >>> > > > > > with
> > >>> > > > > > > > > flink
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> What I propose is
> > >>> > following:
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Create the proper
> > >>> > > > > > > > > > > >> >> >>>>>> LocalEnvironment/RemoteEnvironment
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> (based
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> on
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> user
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> configuration) -->
> Use this
> > >>> > > > > > Environment
> > >>> > > > > > > > > to
> > >>> > > > > > > > > > > >> >> >>>> create
> > >>> > > > > > > > > > > >> >> >>>>>>>> proper
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> ClusterClient
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> (LocalClusterClient or
> > >>> > > > > > > > RestClusterClient)
> > >>> > > > > > > > > > > >> >> >> to
> > >>> > > > > > > > > > > >> >> >>>>>>>> interactive
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> with
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Flink (
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> job
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> execution or
> cancelation)
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> This way we can
> unify the
> > >>> > > > process
> > >>> > > > > > of
> > >>> > > > > > > > > local
> > >>> > > > > > > > > > > >> >> >>>>>> execution
> > >>> > > > > > > > > > > >> >> >>>>>>>> and
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> remote
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> execution.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> And it is much
> easier for
> > >>> > third
> > >>> > > > > > party
> > >>> > > > > > > > to
> > >>> > > > > > > > > > > >> >> >>>>> integrate
> > >>> > > > > > > > > > > >> >> >>>>>>> with
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> flink,
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> because
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> ExecutionEnvironment is the
> > >>> > > > > unified
> > >>> > > > > > > > entry
> > >>> > > > > > > > > > > >> >> >>> point
> > >>> > > > > > > > > > > >> >> >>>>> for
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> flink.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> What
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> third
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> party
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> needs to do is
> just pass
> > >>> > > > > > configuration
> > >>> > > > > > > > to
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> ExecutionEnvironment
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> and
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> ExecutionEnvironment will
> > >>> > do
> > >>> > > > the
> > >>> > > > > > right
> > >>> > > > > > > > > > > >> >> >> thing
> > >>> > > > > > > > > > > >> >> >>>>> based
> > >>> > > > > > > > > > > >> >> >>>>>> on
> > >>> > > > > > > > > > > >> >> >>>>>>>> the
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> configuration.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Flink cli can also
> be
> > >>> > > > considered
> > >>> > > > > as
> > >>> > > > > > > > flink
> > >>> > > > > > > > > > > >> >> >> api
> > >>> > > > > > > > > > > >> >> >>>>>>> consumer.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> it
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> just
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> pass
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> the
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> configuration to
> > >>> > > > > > ExecutionEnvironment
> > >>> > > > > > > > and
> > >>> > > > > > > > > > > >> >> >> let
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> ExecutionEnvironment
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> to
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> create the proper
> > >>> > ClusterClient
> > >>> > > > > > instead
> > >>> > > > > > > > > of
> > >>> > > > > > > > > > > >> >> >>>>> letting
> > >>> > > > > > > > > > > >> >> >>>>>>> cli
> > >>> > > > > > > > > > > >> >> >>>>>>>> to
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> create
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ClusterClient
> directly.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 6 would involve
> large code
> > >>> > > > > > refactoring,
> > >>> > > > > > > > > so
> > >>> > > > > > > > > > > >> >> >> I
> > >>> > > > > > > > > > > >> >> >>>>> think
> > >>> > > > > > > > > > > >> >> >>>>>> we
> > >>> > > > > > > > > > > >> >> >>>>>>>> can
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> defer
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> it
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> for
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> future release,
> 1,2,3,4,5
> > >>> > could
> > >>> > > > > be
> > >>> > > > > > done
> > >>> > > > > > > > > at
> > >>> > > > > > > > > > > >> >> >>>> once I
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>> believe.
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> Let
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> me
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> know
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> your
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> comments and
> feedback,
> > >>> > thanks
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> --
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Best Regards
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> --
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> Best Regards
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> Jeff Zhang
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> --
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Best Regards
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Jeff Zhang
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> --
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> Best Regards
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> Jeff Zhang
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>> --
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>> Best Regards
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>> Jeff Zhang
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> --
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> Best Regards
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> Jeff Zhang
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>> --
> > >>> > > > > > > > > > > >> >> >>>>>>>> Best Regards
> > >>> > > > > > > > > > > >> >> >>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>> Jeff Zhang
> > >>> > > > > > > > > > > >> >> >>>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>>
> > >>> > > > > > > > > > > >> >> >>>>>
> > >>> > > > > > > > > > > >> >> >>>>>
> > >>> > > > > > > > > > > >> >> >>>>> --
> > >>> > > > > > > > > > > >> >> >>>>> Best Regards
> > >>> > > > > > > > > > > >> >> >>>>>
> > >>> > > > > > > > > > > >> >> >>>>> Jeff Zhang
> > >>> > > > > > > > > > > >> >> >>>>>
> > >>> > > > > > > > > > > >> >> >>>>
> > >>> > > > > > > > > > > >> >> >>>
> > >>> > > > > > > > > > > >> >> >>
> > >>> > > > > > > > > > > >> >> >
> > >>> > > > > > > > > > > >> >> >
> > >>> > > > > > > > > > > >> >> > --
> > >>> > > > > > > > > > > >> >> > Best Regards
> > >>> > > > > > > > > > > >> >> >
> > >>> > > > > > > > > > > >> >> > Jeff Zhang
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >> >>
> > >>> > > > > > > > > > > >>
> > >>> > > > > > > > > > > >
> > >>> > > > > > > > > > > >
> > >>> > > > > > > > > > >
> > >>> > > > > > > > > >
> > >>> > > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > >
> > >>> > > > > >
> > >>> > > > >
> > >>> > > >
> > >>> >
> > >>>
> > >>>
> > >>> --
> > >>> Best Regards
> > >>>
> > >>> Jeff Zhang
>

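The JobListener interface quoted above can be exercised end to end with a minimal, self-contained sketch. Here JobID and JobExecutionResult are stand-ins for the real Flink types, and CapturingListener is a hypothetical downstream implementation (roughly what Zeppelin might do), not anything shipped in Flink:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the JobListener proposal quoted above.
public class JobListenerSketch {

    // Stand-ins for Flink's JobID / JobExecutionResult types.
    static class JobID {
        final String value;
        JobID(String value) { this.value = value; }
    }

    static class JobExecutionResult {
        final JobID jobId;
        JobExecutionResult(JobID jobId) { this.jobId = jobId; }
    }

    // The callback interface as proposed in the thread.
    interface JobListener {
        void onJobSubmitted(JobID jobId);
        void onJobExecuted(JobExecutionResult jobResult);
        void onJobCanceled(JobID jobId);
    }

    // A downstream project (e.g. Zeppelin) captures the jobId on
    // submission so it can cancel the job later when necessary.
    static class CapturingListener implements JobListener {
        final List<JobID> running = new ArrayList<>();
        public void onJobSubmitted(JobID jobId) { running.add(jobId); }
        public void onJobExecuted(JobExecutionResult r) { running.remove(r.jobId); }
        public void onJobCanceled(JobID jobId) { running.remove(jobId); }
    }

    public static void main(String[] args) {
        CapturingListener listener = new CapturingListener();
        JobID id = new JobID("job-1");
        listener.onJobSubmitted(id);   // the environment would invoke this
        if (listener.running.size() != 1) throw new AssertionError();
        listener.onJobCanceled(id);    // later: cancel via the captured id
        if (!listener.running.isEmpty()) throw new AssertionError();
        System.out.println("ok");
    }
}
```

The point of the sketch is only the contract: the environment drives the callbacks, and the integrator keeps whatever bookkeeping it needs.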
Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Kostas Kloudas <kk...@gmail.com>.
Here is the issue, and I will keep on updating it as I find more issues.

https://issues.apache.org/jira/browse/FLINK-13954

This will also cover the refactoring of the Executors that we discussed
in this thread, without any additional functionality (such as the job client).

Kostas

On Wed, Sep 4, 2019 at 11:46 AM Kostas Kloudas <kk...@gmail.com> wrote:
>
> Great idea Tison!
>
> I will create the umbrella issue and post it here so that we are all
> on the same page!
>
> Cheers,
> Kostas
>
> On Wed, Sep 4, 2019 at 11:36 AM Zili Chen <wa...@gmail.com> wrote:
> >
> > Hi Kostas & Aljoscha,
> >
> > I notice that there is a JIRA (FLINK-13946) which could be included
> > in this refactor thread. Since we agree on most of the directions in
> > the big picture, is it reasonable that we create an umbrella issue
> > for refactoring the client APIs, with linked subtasks? It would be a
> > better way to join the forces of our community.
> >
> > Best,
> > tison.
> >
> >
> > Zili Chen <wa...@gmail.com> wrote on Sat, Aug 31, 2019 at 12:52 PM:
> >>
> >> Great Kostas! Looking forward to your POC!
> >>
> >> Best,
> >> tison.
> >>
> >>
> >> Jeff Zhang <zj...@gmail.com> wrote on Fri, Aug 30, 2019 at 11:07 PM:
> >>>
> >>> Awesome, @Kostas. Looking forward to your POC.
> >>>
> >>> Kostas Kloudas <kk...@gmail.com> wrote on Fri, Aug 30, 2019 at 8:33 PM:
> >>>
> >>> > Hi all,
> >>> >
> >>> > I am just writing here to let you know that I am working on a POC that
> >>> > tries to refactor the current state of job submission in Flink.
> >>> > I want to stress out that it introduces NO CHANGES to the current
> >>> > behaviour of Flink. It just re-arranges things and introduces the
> >>> > notion of an Executor, which is the entity responsible for taking the
> >>> > user-code and submitting it for execution.
> >>> >
> >>> > Given this, the discussion about the functionality that the JobClient
> >>> > will expose to the user can go on independently and the same
> >>> > holds for all the open questions so far.
> >>> >
> >>> > I hope I will have some more news to share soon.
> >>> >
> >>> > Thanks,
> >>> > Kostas
> >>> >
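The Executor notion Kostas describes, an entity that takes the user code and submits it for execution without changing the program's behaviour, can be put into a compilable sketch. The Executor interface and the two toy executors below are illustrative stand-ins under that assumption, not Flink's actual API:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of the Executor notion described in the thread.
public class ExecutorSketch {

    // The entity responsible for taking the user's program (here just a
    // name standing in for a compiled pipeline) and submitting it for
    // execution, hiding where it actually runs.
    interface Executor {
        CompletableFuture<String> execute(String pipelineName);
    }

    // Two toy executors with the same contract: the user program does
    // not change, only the submission target does.
    static final Executor LOCAL = p ->
            CompletableFuture.completedFuture("local:" + p);
    static final Executor REMOTE = p ->
            CompletableFuture.completedFuture("remote:" + p);

    public static void main(String[] args) throws Exception {
        if (!"local:wordcount".equals(LOCAL.execute("wordcount").get()))
            throw new AssertionError();
        if (!"remote:wordcount".equals(REMOTE.execute("wordcount").get()))
            throw new AssertionError();
        System.out.println("ok");
    }
}
```

Swapping one executor for another is a configuration decision, which is exactly the "no changes to current behaviour" property claimed above.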
> >>> > On Mon, Aug 26, 2019 at 4:20 AM Yang Wang <da...@gmail.com> wrote:
> >>> > >
> >>> > > Hi Zili,
> >>> > >
> >>> > > It makes sense to me that a dedicated cluster is started for a per-job
> >>> > > cluster and will not accept more jobs.
> >>> > > Just have a question about the command line.
> >>> > >
> >>> > > Currently we could use the following commands to start different
> >>> > clusters.
> >>> > > *per-job cluster*
> >>> > > ./bin/flink run -d -p 5 -ynm perjob-cluster1 -m yarn-cluster
> >>> > > examples/streaming/WindowJoin.jar
> >>> > > *session cluster*
> >>> > > ./bin/flink run -p 5 -ynm session-cluster1 -m yarn-cluster
> >>> > > examples/streaming/WindowJoin.jar
> >>> > >
> >>> > > What will it look like after client enhancement?
> >>> > >
> >>> > >
> >>> > > Best,
> >>> > > Yang
> >>> > >
> >>> > > Zili Chen <wa...@gmail.com> wrote on Fri, Aug 23, 2019 at 10:46 PM:
> >>> > >
> >>> > > > Hi Till,
> >>> > > >
> >>> > > > Thanks for your update. Nice to hear :-)
> >>> > > >
> >>> > > > Best,
> >>> > > > tison.
> >>> > > >
> >>> > > >
> >>> > > > Till Rohrmann <tr...@apache.org> wrote on Fri, Aug 23, 2019 at 10:39 PM:
> >>> > > >
> >>> > > > > Hi Tison,
> >>> > > > >
> >>> > > > > just a quick comment concerning the class loading issues when
> >>> > > > > using the per-job mode. The community wants to change it so
> >>> > > > > that the StandaloneJobClusterEntryPoint actually uses the user
> >>> > > > > code class loader with child-first class loading [1]. Hence, I
> >>> > > > > hope that this problem will be resolved soon.
> >>> > > > >
> >>> > > > > [1] https://issues.apache.org/jira/browse/FLINK-13840
> >>> > > > >
> >>> > > > > Cheers,
> >>> > > > > Till
> >>> > > > >
> >>> > > > > On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas <kk...@gmail.com>
> >>> > > > wrote:
> >>> > > > >
> >>> > > > > > Hi all,
> >>> > > > > >
> >>> > > > > > On the topic of web submission, I agree with Till that it
> >>> > > > > > only seems to complicate things. It is bad for security and
> >>> > > > > > job isolation (anybody can submit/cancel jobs), and its
> >>> > > > > > implementation complicates some parts of the code. So, if we
> >>> > > > > > were to redesign the WebUI, maybe this part could be left
> >>> > > > > > out. In addition, I would say that the ability to cancel
> >>> > > > > > jobs could also be left out.
> >>> > > > > >
> >>> > > > > > I would also be in favour of removing the "detached" mode,
> >>> > > > > > for the reasons mentioned above (i.e. because now we will
> >>> > > > > > have a future representing the result, on which the user can
> >>> > > > > > choose to wait or not).
> >>> > > > > >
> >>> > > > > > Now, for separating job submission and cluster creation, I
> >>> > > > > > am in favour of keeping both. Once again, the reasons were
> >>> > > > > > mentioned above by Stephan, Till and Aljoscha, and Zili also
> >>> > > > > > seems to agree. They mainly have to do with security,
> >>> > > > > > isolation and ease of resource management for the user, who
> >>> > > > > > knows that "when my job is done, everything will be cleared
> >>> > > > > > up". This is also the experience you get when launching a
> >>> > > > > > process on your local OS.
> >>> > > > > >
> >>> > > > > > On excluding the per-job mode from returning a JobClient or
> >>> > > > > > not, I believe that eventually it would be nice to allow
> >>> > > > > > users to get back a JobClient. The reasons are that 1) I
> >>> > > > > > cannot find any objective reason why the user experience
> >>> > > > > > should diverge, and 2) this will be the way that the user
> >>> > > > > > can interact with his running job. Assuming that the
> >>> > > > > > necessary ports are open for the REST API to work, the
> >>> > > > > > JobClient can run against the REST API without problems. If
> >>> > > > > > the needed ports are not open, then we are safe not to
> >>> > > > > > return a JobClient, as the user explicitly chose to close
> >>> > > > > > all points of communication to his running job.
> >>> > > > > >
> >>> > > > > > On the topic of not hijacking "env.execute()" in order to
> >>> > > > > > get the Plan, I definitely agree, but for the proposal of
> >>> > > > > > having a "compile()" method in the env, I would like to have
> >>> > > > > > a better look at the existing code.
> >>> > > > > >
> >>> > > > > > Cheers,
> >>> > > > > > Kostas
> >>> > > > > >
> >>> > > > > > On Fri, Aug 23, 2019 at 5:52 AM Zili Chen <wa...@gmail.com> wrote:
> >>> > > > > > >
> >>> > > > > > > Hi Yang,
> >>> > > > > > >
> >>> > > > > > > It would be helpful if you check Stephan's last comment,
> >>> > > > > > > which states that isolation is important.
> >>> > > > > > >
> >>> > > > > > > For per-job mode, we run a dedicated cluster (maybe it
> >>> > > > > > > should have been a couple of JMs and TMs during the
> >>> > > > > > > FLIP-6 design) for a specific job. Thus the process is
> >>> > > > > > > isolated from other jobs.
> >>> > > > > > >
> >>> > > > > > > In our case there was a time when we suffered from
> >>> > > > > > > multiple jobs submitted by different users; they affected
> >>> > > > > > > each other so that all ran into an error state. Also,
> >>> > > > > > > running the client inside the cluster could save client
> >>> > > > > > > resources at some points.
> >>> > > > > > >
> >>> > > > > > > However, we also face several issues as you mentioned:
> >>> > > > > > > in per-job mode it always uses the parent classloader,
> >>> > > > > > > thus classloading issues occur.
> >>> > > > > > >
> >>> > > > > > > BTW, one can make an analogy between session/per-job mode
> >>> > > > > > > in Flink and client/cluster mode in Spark.
> >>> > > > > > >
> >>> > > > > > > Best,
> >>> > > > > > > tison.
> >>> > > > > > >
> >>> > > > > > >
> >>> > > > > > > Yang Wang <da...@gmail.com> wrote on Thu, Aug 22, 2019 at 11:25 AM:
> >>> > > > > > >
> >>> > > > > > > > From the user's perspective, it is really confusing
> >>> > > > > > > > what the scope of a per-job cluster is.
> >>> > > > > > > >
> >>> > > > > > > > If it means a Flink cluster with a single job, so that
> >>> > > > > > > > we get better isolation, then it does not matter how we
> >>> > > > > > > > deploy the cluster: directly deploy it (mode 1), or
> >>> > > > > > > > start a Flink cluster and then submit the job through
> >>> > > > > > > > the cluster client (mode 2).
> >>> > > > > > > >
> >>> > > > > > > > Otherwise, if it just means direct deployment, how
> >>> > > > > > > > should we name mode 2: a session with a job, or
> >>> > > > > > > > something else?
> >>> > > > > > > >
> >>> > > > > > > > We could also benefit from mode 2. Users could get the
> >>> > > > > > > > same isolation as with mode 1, and the user code and
> >>> > > > > > > > dependencies will be loaded by the user class loader to
> >>> > > > > > > > avoid class conflicts with the framework.
> >>> > > > > > > >
> >>> > > > > > > > Anyway, both of the two submission modes are useful.
> >>> > > > > > > > We just need to clarify the concepts.
> >>> > > > > > > >
> >>> > > > > > > > Best,
> >>> > > > > > > > Yang
> >>> > > > > > > >
> >>> > > > > > > > Zili Chen <wa...@gmail.com> wrote on Tue, Aug 20, 2019 at 5:58 PM:
> >>> > > > > > > >
> >>> > > > > > > > > Thanks for the clarification.
> >>> > > > > > > > >
> >>> > > > > > > > > The idea of a JobDeployer came into my mind when I
> >>> > > > > > > > > was muddled over how to execute per-job mode and
> >>> > > > > > > > > session mode with the same user code and framework
> >>> > > > > > > > > code path.
> >>> > > > > > > > >
> >>> > > > > > > > > With the JobDeployer concept we are back to the
> >>> > > > > > > > > statement that the environment knows every config for
> >>> > > > > > > > > cluster deployment and job submission. We configure,
> >>> > > > > > > > > or generate from configuration, a specific
> >>> > > > > > > > > JobDeployer in the environment, and then the code
> >>> > > > > > > > > aligns on
> >>> > > > > > > > >
> >>> > > > > > > > > *JobClient client = env.execute().get();*
> >>> > > > > > > > >
> >>> > > > > > > > > which in session mode is returned by
> >>> > > > > > > > > clusterClient.submitJob and in per-job mode by
> >>> > > > > > > > > clusterDescriptor.deployJobCluster.
> >>> > > > > > > > >
> >>> > > > > > > > > Here comes a problem: currently we directly run the
> >>> > > > > > > > > ClusterEntrypoint with the extracted job graph.
> >>> > > > > > > > > Following the JobDeployer way, we had better align
> >>> > > > > > > > > the entry point of per-job deployment at the
> >>> > > > > > > > > JobDeployer. Users run their main method, or a CLI
> >>> > > > > > > > > (which finally calls the main method), to deploy the
> >>> > > > > > > > > job cluster.
> >>> > > > > > > > >
> >>> > > > > > > > > Best,
> >>> > > > > > > > > tison.
> >>> > > > > > > > >
> >>> > > > > > > > >
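The shape of `JobClient client = env.execute().get();` discussed above can be sketched minimally. Env and JobClient here are hypothetical stand-ins for illustration, not Flink's actual classes:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of execute() returning a future of a job handle.
public class JobClientSketch {

    interface JobClient {
        String jobId();
        CompletableFuture<String> cancel();
    }

    // execute() returns a future of a JobClient instead of blocking, so
    // the caller decides whether to wait; a separate "detached" mode
    // becomes unnecessary.
    static class Env {
        CompletableFuture<JobClient> execute() {
            JobClient client = new JobClient() {
                public String jobId() { return "job-42"; }
                public CompletableFuture<String> cancel() {
                    return CompletableFuture.completedFuture("CANCELED");
                }
            };
            // In session mode this future would come from
            // clusterClient.submitJob; in per-job mode from
            // clusterDescriptor.deployJobCluster.
            return CompletableFuture.completedFuture(client);
        }
    }

    public static void main(String[] args) throws Exception {
        JobClient client = new Env().execute().get(); // attached: wait
        if (!"job-42".equals(client.jobId())) throw new AssertionError();
        if (!"CANCELED".equals(client.cancel().get())) throw new AssertionError();
        System.out.println("ok");
    }
}
```

The same handle then covers the interactions (cancel, savepoint, result) that today go through the CLI rather than the API.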
> >>> > > > > > > > > Stephan Ewen <se...@apache.org> wrote on Tue, Aug 20, 2019 at 4:40 PM:
> >>> > > > > > > > >
> >>> > > > > > > > > > Till has made some good comments here.
> >>> > > > > > > > > >
> >>> > > > > > > > > > Two things to add:
> >>> > > > > > > > > >
> >>> > > > > > > > > >   - The job mode is very nice in the way that it
> >>> > > > > > > > > > runs the client inside the cluster (in the same
> >>> > > > > > > > > > image/process as the JM) and thus unifies both
> >>> > > > > > > > > > applications and what the Spark world calls the
> >>> > > > > > > > > > "driver mode".
> >>> > > > > > > > > >
> >>> > > > > > > > > >   - Another thing I would add is that during the
> >>> > > > > > > > > > FLIP-6 design, we were thinking about setups where
> >>> > > > > > > > > > Dispatcher and JobManager are separate processes.
> >>> > > > > > > > > > A Yarn or Mesos Dispatcher of a session could run
> >>> > > > > > > > > > independently (even as a privileged process
> >>> > > > > > > > > > executing no code). Then the "per-job" mode could
> >>> > > > > > > > > > still be helpful: when a job is submitted to the
> >>> > > > > > > > > > dispatcher, it launches the JM again in per-job
> >>> > > > > > > > > > mode, so that JM and TM processes are bound to the
> >>> > > > > > > > > > job only. For higher-security setups, it is
> >>> > > > > > > > > > important that processes are not reused across
> >>> > > > > > > > > > jobs.
> >>> > > > > > > > > > On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <trohrmann@apache.org> wrote:
> >>> > > > > > > > > >
> >>> > > > > > > > > > > I would not be in favour of getting rid of the
> >>> > > > > > > > > > > per-job mode since it simplifies the process of
> >>> > > > > > > > > > > running Flink jobs considerably. Moreover, it is
> >>> > > > > > > > > > > not only well suited for container deployments
> >>> > > > > > > > > > > but also for deployments where you want to
> >>> > > > > > > > > > > guarantee job isolation. For example, a user
> >>> > > > > > > > > > > could use the per-job mode on Yarn to execute his
> >>> > > > > > > > > > > job on a separate cluster.
> >>> > > > > > > > > > >
> >>> > > > > > > > > > > I think that having two notions of cluster
> >>> > > > > > > > > > > deployment (session vs. per-job mode) does not
> >>> > > > > > > > > > > necessarily contradict your ideas for the client
> >>> > > > > > > > > > > API refactoring. For example, one could have the
> >>> > > > > > > > > > > following interfaces:
> >>> > > > > > > > > > >
> >>> > > > > > > > > > > - ClusterDeploymentDescriptor: encapsulates the
> >>> > > > > > > > > > >   logic for how to deploy a cluster.
> >>> > > > > > > > > > > - ClusterClient: allows interacting with a
> >>> > > > > > > > > > >   cluster.
> >>> > > > > > > > > > > - JobClient: allows interacting with a running
> >>> > > > > > > > > > >   job.
> >>> > > > > > > > > > >
> >>> > > > > > > > > > > Now the ClusterDeploymentDescriptor could have
> >>> > > > > > > > > > > two methods:
> >>> > > > > > > > > > >
> >>> > > > > > > > > > > - ClusterClient deploySessionCluster()
> >>> > > > > > > > > > > - JobClusterClient/JobClient deployPerJobCluster(JobGraph)
> >>> > > > > > > > > > >
> >>> > > > > > > > > > > where JobClusterClient is either a supertype of
> >>> > > > > > > > > > > ClusterClient which does not give you the
> >>> > > > > > > > > > > functionality to submit jobs, or
> >>> > > > > > > > > > > deployPerJobCluster returns a JobClient directly.
> >>> > > > > > > > > > >
> >>> > > > > > > > > > > When setting up the ExecutionEnvironment, one
> >>> > > > > > > > > > > would then not provide a ClusterClient to submit
> >>> > > > > > > > > > > jobs but a JobDeployer which, depending on the
> >>> > > > > > > > > > > selected mode, either uses a ClusterClient
> >>> > > > > > > > > > > (session mode) to submit jobs or a
> >>> > > > > > > > > > > ClusterDeploymentDescriptor to deploy a
> >>> > > > > > > > > > > per-job-mode cluster with the job to execute.
> >>> > > > > > > > > > >
> >>> > > > > > > > > > > These are just some thoughts on how one could
> >>> > > > > > > > > > > make it work, because I believe there is some
> >>> > > > > > > > > > > value in using the per-job mode from the
> >>> > > > > > > > > > > ExecutionEnvironment.
> >>> > > > > > > > > > >
> >>> > > > > > > > > > > Concerning the web submission, this is indeed a
> >>> > > > > > > > > > > bit tricky. From a cluster management standpoint,
> >>> > > > > > > > > > > I would be in favour of not executing user code
> >>> > > > > > > > > > > on the REST endpoint. Especially when considering
> >>> > > > > > > > > > > security, it would be good to have well-defined
> >>> > > > > > > > > > > cluster behaviour where it is explicitly stated
> >>> > > > > > > > > > > where user code and, thus, potentially risky code
> >>> > > > > > > > > > > is executed. Ideally we limit it to the
> >>> > > > > > > > > > > TaskExecutor and JobMaster.
> >>> > > > > > > > > > >
> >>> > > > > > > > > > > Cheers,
> >>> > > > > > > > > > > Till
> >>> > > > > > > > > > >
> >>> > > > > > > > > > > On Tue, Aug 20, 2019 at 9:40 AM Flavio Pompermaier <
> >>> > > > > > > > > pompermaier@okkam.it
> >>> > > > > > > > > > >
> >>> > > > > > > > > > > wrote:
> >>> > > > > > > > > > >
> >>> > > > > > > > > > > > In my opinion the client should not use any
> >>> > > > > > > > > > > > environment to get the JobGraph, because the
> >>> > > > > > > > > > > > jar should reside ONLY on the cluster (and not
> >>> > > > > > > > > > > > on the client classpath; otherwise there are
> >>> > > > > > > > > > > > always inconsistencies between the client's and
> >>> > > > > > > > > > > > the Flink Job Manager's classpaths).
> >>> > > > > > > > > > > > In the YARN, Mesos, and Kubernetes scenarios
> >>> > > > > > > > > > > > you have the jar, but you could also start a
> >>> > > > > > > > > > > > cluster that has the jar on the Job Manager
> >>> > > > > > > > > > > > (this is the only case where I think you can
> >>> > > > > > > > > > > > assume that the client has the jar on the
> >>> > > > > > > > > > > > classpath; in the REST job submission you don't
> >>> > > > > > > > > > > > have any classpath).
> >>> > > > > > > > > > > >
> >>> > > > > > > > > > > > Thus, always in my opinion, the JobGraph should be
> >>> > > > generated
> >>> > > > > > by the
> >>> > > > > > > > > Job
> >>> > > > > > > > > > > > Manager REST API.
> >>> > > > > > > > > > > >
> >>> > > > > > > > > > > >
> >>> > > > > > > > > > > > On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <
> >>> > > > > > wander4096@gmail.com>
> >>> > > > > > > > > > wrote:
> >>> > > > > > > > > > > >
> >>> > > > > > > > > > > >> I would like to involve Till & Stephan here to clarify
> >>> > > > some
> >>> > > > > > > > concept
> >>> > > > > > > > > of
> >>> > > > > > > > > > > >> per-job mode.
> >>> > > > > > > > > > > >>
> >>> > > > > > > > > > > >> The term per-job names one of the modes a
> >>> > > > > > > > > > > >> cluster can run in. It mainly aims at spawning
> >>> > > > > > > > > > > >> a dedicated cluster for a specific job; the
> >>> > > > > > > > > > > >> job can be packaged with Flink itself so that
> >>> > > > > > > > > > > >> the cluster is initialized with the job and a
> >>> > > > > > > > > > > >> separate submission step is avoided.
> >>> > > > > > > > > > > >>
> >>> > > > > > > > > > > >> This is useful for container deployments,
> >>> > > > > > > > > > > >> where one creates an image with the job and
> >>> > > > > > > > > > > >> then simply deploys the container.
> >>> > > > > > > > > > > >>
> >>> > > > > > > > > > > >> However, this is out of the client's scope,
> >>> > > > > > > > > > > >> since a client (ClusterClient, for example)
> >>> > > > > > > > > > > >> exists to communicate with an existing cluster
> >>> > > > > > > > > > > >> and perform actions on it. Currently, in
> >>> > > > > > > > > > > >> per-job mode, we extract the job graph and
> >>> > > > > > > > > > > >> bundle it into the cluster deployment, so no
> >>> > > > > > > > > > > >> client concept gets involved. It therefore
> >>> > > > > > > > > > > >> looks reasonable to exclude the deployment of
> >>> > > > > > > > > > > >> per-job clusters from the client API and use
> >>> > > > > > > > > > > >> dedicated utility classes (deployers) for
> >>> > > > > > > > > > > >> deployment.
> >>> > > > > > > > > > > >>
> >>> > > > > > > > > > > >> Zili Chen <wa...@gmail.com> 于2019年8月20日周二
> >>> > 下午12:37写道:
> >>> > > > > > > > > > > >>
> >>> > > > > > > > > > > >> > Hi Aljoscha,
> >>> > > > > > > > > > > >> >
> >>> > > > > > > > > > > >> > Thanks for your reply and participation. The
> >>> > > > > > > > > > > >> > Google Doc you linked to requires permission;
> >>> > > > > > > > > > > >> > I think you could use a share link instead.
> >>> > > > > > > > > > > >> >
> >>> > > > > > > > > > > >> > I agree that we have almost reached a
> >>> > > > > > > > > > > >> > consensus that a JobClient is necessary to
> >>> > > > > > > > > > > >> > interact with a running job.
> >>> > > > > > > > > > > >> >
> >>> > > > > > > > > > > >> > Let me check your open questions one by one.
> >>> > > > > > > > > > > >> >
> >>> > > > > > > > > > > >> > 1. Separate cluster creation and job submission for
> >>> > > > > per-job
> >>> > > > > > > > mode.
> >>> > > > > > > > > > > >> >
> >>> > > > > > > > > > > >> > As you mentioned, this is where the opinions
> >>> > > > > > > > > > > >> > diverge. My document contains an
> >>> > > > > > > > > > > >> > alternative[2] that proposes excluding
> >>> > > > > > > > > > > >> > per-job deployment from the client API's
> >>> > > > > > > > > > > >> > scope, and I now find this exclusion more
> >>> > > > > > > > > > > >> > reasonable.
> >>> > > > > > > > > > > >> >
> >>> > > > > > > > > > > >> > In per-job mode, a dedicated JobCluster is
> >>> > > > > > > > > > > >> > launched to execute a specific job. This is
> >>> > > > > > > > > > > >> > more like a Flink application than a
> >>> > > > > > > > > > > >> > submission of a Flink job. The client only
> >>> > > > > > > > > > > >> > takes care of job submission and assumes
> >>> > > > > > > > > > > >> > there is an existing cluster. In this way we
> >>> > > > > > > > > > > >> > can consider per-job issues individually, and
> >>> > > > > > > > > > > >> > JobClusterEntrypoint would be the utility
> >>> > > > > > > > > > > >> > class for per-job deployment.
> >>> > > > > > > > > > > >> >
> >>> > > > > > > > > > > >> > Nevertheless, a user program works in both
> >>> > > > > > > > > > > >> > session mode and per-job mode without code
> >>> > > > > > > > > > > >> > changes. In per-job mode, the JobClient is
> >>> > > > > > > > > > > >> > still returned from env.execute as usual;
> >>> > > > > > > > > > > >> > however, it would no longer wrap a
> >>> > > > > > > > > > > >> > RestClusterClient but a PerJobClusterClient
> >>> > > > > > > > > > > >> > which communicates with the Dispatcher
> >>> > > > > > > > > > > >> > locally.
> >>> > > > > > > > > > > >> >
> >>> > > > > > > > > > > >> > 2. How to deal with plan preview.
> >>> > > > > > > > > > > >> >
> >>> > > > > > > > > > > >> > With the env.compile functions, users can
> >>> > > > > > > > > > > >> > obtain the JobGraph or FlinkPlan and thus
> >>> > > > > > > > > > > >> > preview the plan programmatically. Typically
> >>> > > > > > > > > > > >> > it looks like:
> >>> > > > > > > > > > > >> >
> >>> > > > > > > > > > > >> > if (preview configured) {
> >>> > > > > > > > > > > >> >     FlinkPlan plan = env.compile();
> >>> > > > > > > > > > > >> >     new JSONDumpGenerator(...).dump(plan);
> >>> > > > > > > > > > > >> > } else {
> >>> > > > > > > > > > > >> >     env.execute();
> >>> > > > > > > > > > > >> > }
> >>> > > > > > > > > > > >> >
> >>> > > > > > > > > > > >> > With that, `flink info` would no longer be valid.
> >>> > > > > > > > > > > >> >
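A self-contained sketch of the preview pattern quoted above, with stub types standing in for the real environment (env.compile(), FlinkPlan, and the JSON dump are assumptions mirroring the snippet, not Flink's actual signatures):

```java
// Stubs standing in for Flink classes, for illustration only.
final class FlinkPlan {
    // Assumed: a JSON rendering of the plan, as the dump generator would produce.
    String toJson() { return "{\"nodes\":[]}"; }
}

final class ExecutionEnvironment {
    // Assumed: builds the plan without executing the job.
    FlinkPlan compile() { return new FlinkPlan(); }
    // Assumed: submits the job for execution.
    void execute() { /* submit the job */ }
}

final class PlanPreview {
    // Mirrors the quoted if/else: dump the plan when preview is configured,
    // otherwise run the job.
    static String run(ExecutionEnvironment env, boolean previewConfigured) {
        if (previewConfigured) {
            return env.compile().toJson();
        }
        env.execute();
        return null;
    }
}
```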
> >>> > > > > > > > > > > >> > 3. How to deal with Jar Submission at the Web
> >>> > Frontend.
> >>> > > > > > > > > > > >> >
> >>> > > > > > > > > > > >> > There is one more thread talked on this topic[1].
> >>> > Apart
> >>> > > > > from
> >>> > > > > > > > > > removing
> >>> > > > > > > > > > > >> > the functions there are two alternatives.
> >>> > > > > > > > > > > >> >
> >>> > > > > > > > > > > >> > One is to introduce an interface with a
> >>> > > > > > > > > > > >> > method that returns the JobGraph/FlinkPlan,
> >>> > > > > > > > > > > >> > and let jar submission support only main
> >>> > > > > > > > > > > >> > classes implementing this interface. The
> >>> > > > > > > > > > > >> > JobGraph/FlinkPlan is then extracted just by
> >>> > > > > > > > > > > >> > calling the method. In this way, it is even
> >>> > > > > > > > > > > >> > possible to consider separating job creation
> >>> > > > > > > > > > > >> > from job submission.
> >>> > > > > > > > > > > >> >
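One way to sketch such an interface; the names (JobGraphProvider, WordCountProgram) are hypothetical illustrations, not an existing Flink contract:

```java
// Placeholder for the compiled job representation.
final class JobGraph {}

// Hypothetical contract: a main class implementing this lets the web
// frontend obtain the JobGraph without running main(), separating job
// creation from job submission.
interface JobGraphProvider {
    JobGraph getJobGraph(String[] args);
}

// Example user program implementing the contract.
final class WordCountProgram implements JobGraphProvider {
    @Override
    public JobGraph getJobGraph(String[] args) {
        return new JobGraph(); // build the pipeline and compile it here
    }
}
```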
> >>> > > > > > > > > > > >> > The other is, as you mentioned, to let
> >>> > > > > > > > > > > >> > execute() do the actual execution. We would
> >>> > > > > > > > > > > >> > not execute the main method in the
> >>> > > > > > > > > > > >> > WebFrontend but spawn a process on the
> >>> > > > > > > > > > > >> > WebMonitor side to execute it. As for the
> >>> > > > > > > > > > > >> > return value, we could generate the JobID in
> >>> > > > > > > > > > > >> > the WebMonitor and pass it to the execution
> >>> > > > > > > > > > > >> > environment.
> >>> > > > > > > > > > > >> >
> >>> > > > > > > > > > > >> > 4. How to deal with detached mode.
> >>> > > > > > > > > > > >> >
> >>> > > > > > > > > > > >> > I think detached mode is a temporary
> >>> > > > > > > > > > > >> > solution for non-blocking submission. In my
> >>> > > > > > > > > > > >> > document, both submission and execution
> >>> > > > > > > > > > > >> > return a CompletableFuture, and users control
> >>> > > > > > > > > > > >> > whether or not to wait for the result. From
> >>> > > > > > > > > > > >> > this point of view we no longer need a
> >>> > > > > > > > > > > >> > detached option, but its functionality is
> >>> > > > > > > > > > > >> > still covered.
> >>> > > > > > > > > > > >> >
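Sketched with stub types, the attached/detached distinction then collapses into whether the caller waits on the future. All names below are placeholders for the discussion, not Flink's API:

```java
import java.util.concurrent.CompletableFuture;

// Stub types for illustration only.
final class JobGraph {}
final class JobResult {}

final class StubJobClient {
    // Assumed: the result future completes when the job finishes.
    CompletableFuture<JobResult> getJobResult() {
        return CompletableFuture.completedFuture(new JobResult());
    }
}

final class StubClusterClient {
    // Assumed: submission completes once the job is accepted.
    CompletableFuture<StubJobClient> submitJob(JobGraph g) {
        return CompletableFuture.completedFuture(new StubJobClient());
    }
}

final class SubmissionModes {
    // "Attached": block until the job finishes.
    static JobResult attached(StubClusterClient client, JobGraph graph) {
        return client.submitJob(graph)
                .thenCompose(StubJobClient::getJobResult)
                .join();
    }

    // "Detached": return as soon as submission is acknowledged.
    static StubJobClient detached(StubClusterClient client, JobGraph graph) {
        return client.submitJob(graph).join();
    }
}
```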
> >>> > > > > > > > > > > >> > 5. How does per-job mode interact with interactive
> >>> > > > > > programming.
> >>> > > > > > > > > > > >> >
> >>> > > > > > > > > > > >> > The YARN, Mesos, and Kubernetes scenarios
> >>> > > > > > > > > > > >> > all follow the pattern of launching a
> >>> > > > > > > > > > > >> > JobCluster now, and I don't think there would
> >>> > > > > > > > > > > >> > be inconsistencies between the different
> >>> > > > > > > > > > > >> > resource managers.
> >>> > > > > > > > > > > >> >
> >>> > > > > > > > > > > >> > Best,
> >>> > > > > > > > > > > >> > tison.
> >>> > > > > > > > > > > >> >
> >>> > > > > > > > > > > >> > [1]
> >>> > https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
> >>> > > > > > > > > > > >> > [2]
> >>> > https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
> >>> > > > > > > > > > > >> >
> >>> > > > > > > > > > > >> > Aljoscha Krettek <al...@apache.org>
> >>> > 于2019年8月16日周五
> >>> > > > > > 下午9:20写道:
> >>> > > > > > > > > > > >> >
> >>> > > > > > > > > > > >> >> Hi,
> >>> > > > > > > > > > > >> >>
> >>> > > > > > > > > > > >> >> I read both Jeffs initial design document and the
> >>> > newer
> >>> > > > > > > > document
> >>> > > > > > > > > by
> >>> > > > > > > > > > > >> >> Tison. I also finally found the time to collect our
> >>> > > > > > thoughts on
> >>> > > > > > > > > the
> >>> > > > > > > > > > > >> issue,
> >>> > > > > > > > > > > >> >> I had quite some discussions with Kostas and this
> >>> > is
> >>> > > > the
> >>> > > > > > > > result:
> >>> > > > > > > > > > [1].
> >>> > > > > > > > > > > >> >>
> >>> > > > > > > > > > > >> >> I think overall we agree that this part of
> >>> > > > > > > > > > > >> >> the code is in dire need of some
> >>> > > > > > > > > > > >> >> refactoring/improvements, but there are
> >>> > > > > > > > > > > >> >> still some open questions and some
> >>> > > > > > > > > > > >> >> differences in opinion about what those
> >>> > > > > > > > > > > >> >> refactorings should look like.
> >>> > > > > > > > > > > >> >>
> >>> > > > > > > > > > > >> >> I think the API-side is quite clear, i.e. we need
> >>> > some
> >>> > > > > > > > JobClient
> >>> > > > > > > > > > API
> >>> > > > > > > > > > > >> that
> >>> > > > > > > > > > > >> >> allows interacting with a running Job. It could be
> >>> > > > > > worthwhile
> >>> > > > > > > > to
> >>> > > > > > > > > > spin
> >>> > > > > > > > > > > >> that
> >>> > > > > > > > > > > >> >> off into a separate FLIP because we can probably
> >>> > find
> >>> > > > > > consensus
> >>> > > > > > > > > on
> >>> > > > > > > > > > > that
> >>> > > > > > > > > > > >> >> part more easily.
> >>> > > > > > > > > > > >> >>
> >>> > > > > > > > > > > >> >> For the rest, the main open questions from our doc
> >>> > are
> >>> > > > > > these:
> >>> > > > > > > > > > > >> >>
> >>> > > > > > > > > > > >> >>   - Do we want to separate cluster creation and job
> >>> > > > > > submission
> >>> > > > > > > > > for
> >>> > > > > > > > > > > >> >> per-job mode? In the past, there were conscious
> >>> > efforts
> >>> > > > > to
> >>> > > > > > > > *not*
> >>> > > > > > > > > > > >> separate
> >>> > > > > > > > > > > >> >> job submission from cluster creation for per-job
> >>> > > > clusters
> >>> > > > > > for
> >>> > > > > > > > > > Mesos,
> >>> > > > > > > > > > > >> YARN,
> >>> > > > > > > > > > > >> >> Kubernetes (see StandaloneJobClusterEntryPoint).
> >>> > Tison
> >>> > > > > > suggests
> >>> > > > > > > > in
> >>> > > > > > > > > > his
> >>> > > > > > > > > > > >> >> design document to decouple this in order to unify
> >>> > job
> >>> > > > > > > > > submission.
> >>> > > > > > > > > > > >> >>
> >>> > > > > > > > > > > >> >>   - How to deal with plan preview, which needs to
> >>> > > > hijack
> >>> > > > > > > > > execute()
> >>> > > > > > > > > > > and
> >>> > > > > > > > > > > >> >> let the outside code catch an exception?
> >>> > > > > > > > > > > >> >>
> >>> > > > > > > > > > > >> >>   - How to deal with Jar Submission at the Web
> >>> > > > Frontend,
> >>> > > > > > which
> >>> > > > > > > > > > needs
> >>> > > > > > > > > > > to
> >>> > > > > > > > > > > >> >> hijack execute() and let the outside code catch an
> >>> > > > > > exception?
> >>> > > > > > > > > > > >> >> CliFrontend.run() “hijacks”
> >>> > > > > ExecutionEnvironment.execute()
> >>> > > > > > to
> >>> > > > > > > > > get a
> >>> > > > > > > > > > > >> >> JobGraph and then execute that JobGraph manually.
> >>> > We
> >>> > > > > could
> >>> > > > > > get
> >>> > > > > > > > > > around
> >>> > > > > > > > > > > >> that
> >>> > > > > > > > > > > >> >> by letting execute() do the actual execution. One
> >>> > > > caveat
> >>> > > > > > for
> >>> > > > > > > > this
> >>> > > > > > > > > > is
> >>> > > > > > > > > > > >> that
> >>> > > > > > > > > > > >> >> now the main() method doesn’t return (or is forced
> >>> > to
> >>> > > > > > return by
> >>> > > > > > > > > > > >> throwing an
> >>> > > > > > > > > > > >> >> exception from execute()) which means that for Jar
> >>> > > > > > Submission
> >>> > > > > > > > > from
> >>> > > > > > > > > > > the
> >>> > > > > > > > > > > >> >> WebFrontend we have a long-running main() method
> >>> > > > running
> >>> > > > > > in the
> >>> > > > > > > > > > > >> >> WebFrontend. This doesn’t sound very good. We
> >>> > could get
> >>> > > > > > around
> >>> > > > > > > > > this
> >>> > > > > > > > > > > by
> >>> > > > > > > > > > > >> >> removing the plan preview feature and by removing
> >>> > Jar
> >>> > > > > > > > > > > >> Submission/Running.
> >>> > > > > > > > > > > >> >>
> >>> > > > > > > > > > > >> >>   - How to deal with detached mode? Right now,
> >>> > > > > > > > > DetachedEnvironment
> >>> > > > > > > > > > > will
> >>> > > > > > > > > > > >> >> execute the job and return immediately. If users
> >>> > > > control
> >>> > > > > > when
> >>> > > > > > > > > they
> >>> > > > > > > > > > > >> want to
> >>> > > > > > > > > > > >> >> return, by waiting on the job completion future,
> >>> > how do
> >>> > > > > we
> >>> > > > > > deal
> >>> > > > > > > > > > with
> >>> > > > > > > > > > > >> this?
> >>> > > > > > > > > > > >> >> Do we simply remove the distinction between
> >>> > > > > > > > > detached/non-detached?
> >>> > > > > > > > > > > >> >>
> >>> > > > > > > > > > > >> >>   - How does per-job mode interact with
> >>> > “interactive
> >>> > > > > > > > programming”
> >>> > > > > > > > > > > >> >> (FLIP-36). For YARN, each execute() call could
> >>> > spawn a
> >>> > > > > new
> >>> > > > > > > > Flink
> >>> > > > > > > > > > YARN
> >>> > > > > > > > > > > >> >> cluster. What about Mesos and Kubernetes?
> >>> > > > > > > > > > > >> >>
> >>> > > > > > > > > > > >> >> The first open question is where the opinions
> >>> > diverge,
> >>> > > > I
> >>> > > > > > think.
> >>> > > > > > > > > The
> >>> > > > > > > > > > > >> rest
> >>> > > > > > > > > > > >> >> are just open questions and interesting things
> >>> > that we
> >>> > > > > > need to
> >>> > > > > > > > > > > >> consider.
> >>> > > > > > > > > > > >> >>
> >>> > > > > > > > > > > >> >> Best,
> >>> > > > > > > > > > > >> >> Aljoscha
> >>> > > > > > > > > > > >> >>
> >>> > > > > > > > > > > >> >> [1]
> >>> > https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> >>> > > > > > > > > > > >> >>
> >>> > > > > > > > > > > >> >> > On 31. Jul 2019, at 15:23, Jeff Zhang <
> >>> > > > > zjffdu@gmail.com>
> >>> > > > > > > > > wrote:
> >>> > > > > > > > > > > >> >> >
> >>> > > > > > > > > > > >> >> > Thanks tison for the effort. I left a few
> >>> > comments.
> >>> > > > > > > > > > > >> >> >
> >>> > > > > > > > > > > >> >> >
> >>> > > > > > > > > > > >> >> > Zili Chen <wa...@gmail.com> 于2019年7月31日周三
> >>> > > > > 下午8:24写道:
> >>> > > > > > > > > > > >> >> >
> >>> > > > > > > > > > > >> >> >> Hi Flavio,
> >>> > > > > > > > > > > >> >> >>
> >>> > > > > > > > > > > >> >> >> Thanks for your reply.
> >>> > > > > > > > > > > >> >> >>
> >>> > > > > > > > > > > >> >> >> In both the current implementation and
> >>> > > > > > > > > > > >> >> >> the design, ClusterClient never takes
> >>> > > > > > > > > > > >> >> >> responsibility for generating the
> >>> > > > > > > > > > > >> >> >> JobGraph (what you see in the current
> >>> > > > > > > > > > > >> >> >> codebase are several class methods).
> >>> > > > > > > > > > > >> >> >>
> >>> > > > > > > > > > > >> >> >> Instead, the user describes the program
> >>> > > > > > > > > > > >> >> >> in the main method with
> >>> > > > > > > > > > > >> >> >> ExecutionEnvironment APIs and calls
> >>> > > > > > > > > > > >> >> >> env.compile() or env.optimize() to get
> >>> > > > > > > > > > > >> >> >> the FlinkPlan or JobGraph respectively.
> >>> > > > > > > > > > > >> >> >>
> >>> > > > > > > > > > > >> >> >> To list the main classes in a jar and
> >>> > > > > > > > > > > >> >> >> choose one for submission, you can
> >>> > > > > > > > > > > >> >> >> already customize a CLI to do it: the
> >>> > > > > > > > > > > >> >> >> jar path is passed as an argument, and
> >>> > > > > > > > > > > >> >> >> in the customized CLI you list the main
> >>> > > > > > > > > > > >> >> >> classes and choose one to submit to the
> >>> > > > > > > > > > > >> >> >> cluster.
> >>> > > > > > > > > > > >> >> >>
> >>> > > > > > > > > > > >> >> >> Best,
> >>> > > > > > > > > > > >> >> >> tison.
> >>> > > > > > > > > > > >> >> >>
> >>> > > > > > > > > > > >> >> >>
> >>> > > > > > > > > > > >> >> >> Flavio Pompermaier <po...@okkam.it>
> >>> > > > > 于2019年7月31日周三
> >>> > > > > > > > > > 下午8:12写道:
> >>> > > > > > > > > > > >> >> >>
> >>> > > > > > > > > > > >> >> >>> Just one note on my side: it is not
> >>> > > > > > > > > > > >> >> >>> clear to me whether the client needs to
> >>> > > > > > > > > > > >> >> >>> be able to generate a job graph or not.
> >>> > > > > > > > > > > >> >> >>> In my opinion, the job jar must reside
> >>> > > > > > > > > > > >> >> >>> only on the server/jobManager side, and
> >>> > > > > > > > > > > >> >> >>> the client requires a way to get the
> >>> > > > > > > > > > > >> >> >>> job graph.
> >>> > > > > > > > > > > >> >> >>> If you really want to access the job
> >>> > > > > > > > > > > >> >> >>> graph, I'd add dedicated methods on the
> >>> > > > > > > > > > > >> >> >>> ClusterClient, like:
> >>> > > > > > > > > > > >> >> >>>
> >>> > > > > > > > > > > >> >> >>>   - getJobGraph(jarId, mainClass): JobGraph
> >>> > > > > > > > > > > >> >> >>>   - listMainClasses(jarId): List<String>
> >>> > > > > > > > > > > >> >> >>>
> >>> > > > > > > > > > > >> >> >>> These would also require some additions
> >>> > > > > > > > > > > >> >> >>> on the job manager endpoint. What do
> >>> > > > > > > > > > > >> >> >>> you think?
> >>> > > > > > > > > > > >> >> >>>
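Those two additions could be sketched as follows; the signatures mirror the list above but are hypothetical, and the fake implementation exists only to illustrate the call pattern:

```java
import java.util.Arrays;
import java.util.List;

// Placeholder for the compiled job representation.
final class JobGraph {}

// Hypothetical client-side additions mirroring the proposal above: the
// jar lives only on the cluster, so the client asks the Job Manager's
// REST endpoint for the graph instead of building it locally.
interface JarAwareClusterClient {
    JobGraph getJobGraph(String jarId, String mainClass);
    List<String> listMainClasses(String jarId);
}

// Trivial in-memory stand-in used for illustration; class names listed
// here are made up.
final class FakeJarAwareClient implements JarAwareClusterClient {
    @Override
    public JobGraph getJobGraph(String jarId, String mainClass) {
        return new JobGraph();
    }

    @Override
    public List<String> listMainClasses(String jarId) {
        return Arrays.asList("org.example.WordCount", "org.example.PageRank");
    }
}
```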
> >>> > > > > > > > > > > >> >> >>> On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <
> >>> > > > > > > > > > wander4096@gmail.com
> >>> > > > > > > > > > > >
> >>> > > > > > > > > > > >> >> wrote:
> >>> > > > > > > > > > > >> >> >>>
> >>> > > > > > > > > > > >> >> >>>> Hi all,
> >>> > > > > > > > > > > >> >> >>>>
> >>> > > > > > > > > > > >> >> >>>> Here is a document[1] on client api
> >>> > enhancement
> >>> > > > from
> >>> > > > > > our
> >>> > > > > > > > > > > >> perspective.
> >>> > > > > > > > > > > >> >> >>>> We have investigated current implementations.
> >>> > And
> >>> > > > we
> >>> > > > > > > > propose
> >>> > > > > > > > > > > >> >> >>>>
> >>> > > > > > > > > > > >> >> >>>> 1. Unify the implementation of cluster
> >>> > deployment
> >>> > > > > and
> >>> > > > > > job
> >>> > > > > > > > > > > >> submission
> >>> > > > > > > > > > > >> >> in
> >>> > > > > > > > > > > >> >> >>>> Flink.
> >>> > > > > > > > > > > >> >> >>>> 2. Provide programmatic interfaces to allow
> >>> > > > flexible
> >>> > > > > > job
> >>> > > > > > > > and
> >>> > > > > > > > > > > >> cluster
> >>> > > > > > > > > > > >> >> >>>> management.
> >>> > > > > > > > > > > >> >> >>>>
> >>> > > > > > > > > > > >> >> >>>> The first proposal aims at reducing
> >>> > > > > > > > > > > >> >> >>>> the code paths of cluster deployment
> >>> > > > > > > > > > > >> >> >>>> and job submission so that one can
> >>> > > > > > > > > > > >> >> >>>> adopt Flink easily. The second
> >>> > > > > > > > > > > >> >> >>>> proposal aims at providing rich
> >>> > > > > > > > > > > >> >> >>>> interfaces for advanced users who want
> >>> > > > > > > > > > > >> >> >>>> fine-grained control of these stages.
> >>> > > > > > > > > > > >> >> >>>>
> >>> > > > > > > > > > > >> >> >>>> Quick reference on open questions:
> >>> > > > > > > > > > > >> >> >>>>
> >>> > > > > > > > > > > >> >> >>>> 1. Exclude job cluster deployment from
> >>> > > > > > > > > > > >> >> >>>> the client side, or redefine the
> >>> > > > > > > > > > > >> >> >>>> semantics of a job cluster? It fits
> >>> > > > > > > > > > > >> >> >>>> into a process quite different from
> >>> > > > > > > > > > > >> >> >>>> session cluster deployment and job
> >>> > > > > > > > > > > >> >> >>>> submission.
> >>> > > > > > > > > > > >> >> >>>>
> >>> > > > > > > > > > > >> >> >>>> 2. Maintain the codepaths handling class
> >>> > > > > > > > > > > o.a.f.api.common.Program
> >>> > > > > > > > > > > >> or
> >>> > > > > > > > > > > >> >> >>>> implement customized program handling logic by
> >>> > > > > > customized
> >>> > > > > > > > > > > >> >> CliFrontend?
> >>> > > > > > > > > > > >> >> >>>> See also this thread[2] and the document[1].
> >>> > > > > > > > > > > >> >> >>>>
> >>> > > > > > > > > > > >> >> >>>> 3. Expose ClusterClient as a public
> >>> > > > > > > > > > > >> >> >>>> API, or just expose an API in
> >>> > > > > > > > > > > >> >> >>>> ExecutionEnvironment and delegate to
> >>> > > > > > > > > > > >> >> >>>> ClusterClient? Further, in either case,
> >>> > > > > > > > > > > >> >> >>>> is it worthwhile to introduce a
> >>> > > > > > > > > > > >> >> >>>> JobClient, an encapsulation of
> >>> > > > > > > > > > > >> >> >>>> ClusterClient associated with a
> >>> > > > > > > > > > > >> >> >>>> specific job?
> >>> > > > > > > > > > > >> >> >>>>
> >>> > > > > > > > > > > >> >> >>>> Best,
> >>> > > > > > > > > > > >> >> >>>> tison.
> >>> > > > > > > > > > > >> >> >>>>
> >>> > > > > > > > > > > >> >> >>>> [1]
> >>> > https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
> >>> > > > > > > > > > > >> >> >>>> [2]
> >>> > https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
> >>> > > > > > > > > > > >> >> >>>>
> >>> > > > > > > > > > > >> >> >>>> Jeff Zhang <zj...@gmail.com> 于2019年7月24日周三
> >>> > > > > 上午9:19写道:
> >>> > > > > > > > > > > >> >> >>>>
> >>> > > > > > > > > > > >> >> >>>>> Thanks Stephan, I will follow up on
> >>> > > > > > > > > > > >> >> >>>>> this issue in the next few weeks and
> >>> > > > > > > > > > > >> >> >>>>> will refine the design doc. We could
> >>> > > > > > > > > > > >> >> >>>>> discuss more details after the 1.9
> >>> > > > > > > > > > > >> >> >>>>> release.
> >>> > > > > > > > > > > >> >> >>>>>
> >>> > > > > > > > > > > >> >> >>>>> Stephan Ewen <se...@apache.org>
> >>> > 于2019年7月24日周三
> >>> > > > > > 上午12:58写道:
> >>> > > > > > > > > > > >> >> >>>>>
> >>> > > > > > > > > > > >> >> >>>>>> Hi all!
> >>> > > > > > > > > > > >> >> >>>>>>
> >>> > > > > > > > > > > >> >> >>>>>> This thread has stalled for a bit,
> >>> > > > > > > > > > > >> >> >>>>>> which I assume is mostly due to the
> >>> > > > > > > > > > > >> >> >>>>>> Flink 1.9 feature freeze and release
> >>> > > > > > > > > > > >> >> >>>>>> testing effort.
> >>> > > > > > > > > > > >> >> >>>>>>
> >>> > > > > > > > > > > >> >> >>>>>> I personally still recognize this issue as
> >>> > one
> >>> > > > > > important
> >>> > > > > > > > > to
> >>> > > > > > > > > > be
> >>> > > > > > > > > > > >> >> >>> solved.
> >>> > > > > > > > > > > >> >> >>>>> I'd
> >>> > > > > > > > > > > >> >> >>>>>> be happy to help resume this discussion soon
> >>> > > > > (after
> >>> > > > > > the
> >>> > > > > > > > > 1.9
> >>> > > > > > > > > > > >> >> >> release)
> >>> > > > > > > > > > > >> >> >>>> and
> >>> > > > > > > > > > > >> >> >>>>>> see if we can do some step towards this in
> >>> > Flink
> >>> > > > > > 1.10.
> >>> > > > > > > > > > > >> >> >>>>>>
> >>> > > > > > > > > > > >> >> >>>>>> Best,
> >>> > > > > > > > > > > >> >> >>>>>> Stephan
> >>> > > > > > > > > > > >> >> >>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>
> >>> > > > > > > > > > > >> >> >>>>>> On Mon, Jun 24, 2019 at 10:41 AM Flavio
> >>> > > > > Pompermaier
> >>> > > > > > <
> >>> > > > > > > > > > > >> >> >>>>> pompermaier@okkam.it>
> >>> > > > > > > > > > > >> >> >>>>>> wrote:
> >>> > > > > > > > > > > >> >> >>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>> That's exactly what I suggested a
> >>> > > > > > > > > > > >> >> >>>>>>> long time ago: the Flink REST client
> >>> > > > > > > > > > > >> >> >>>>>>> should not require any Flink
> >>> > > > > > > > > > > >> >> >>>>>>> dependency, only an HTTP library to
> >>> > > > > > > > > > > >> >> >>>>>>> call the REST services to submit and
> >>> > > > > > > > > > > >> >> >>>>>>> monitor a job. What I also suggested
> >>> > > > > > > > > > > >> >> >>>>>>> in [1] was a way to automatically
> >>> > > > > > > > > > > >> >> >>>>>>> suggest to the user (via a UI) the
> >>> > > > > > > > > > > >> >> >>>>>>> available main classes and their
> >>> > > > > > > > > > > >> >> >>>>>>> required parameters[2].
> >>> > > > > > > > > > > >> >> >>>>>>> Another problem we have with Flink
> >>> > > > > > > > > > > >> >> >>>>>>> is that the REST client and the CLI
> >>> > > > > > > > > > > >> >> >>>>>>> client behave differently; we use
> >>> > > > > > > > > > > >> >> >>>>>>> the CLI client (via ssh) because it
> >>> > > > > > > > > > > >> >> >>>>>>> allows calling other methods after
> >>> > > > > > > > > > > >> >> >>>>>>> env.execute() [3] (we have to call
> >>> > > > > > > > > > > >> >> >>>>>>> another REST service to signal the
> >>> > > > > > > > > > > >> >> >>>>>>> end of the job).
> >>> > > > > > > > > > > >> >> >>>>>>> Int his regard, a dedicated interface,
> >>> > like the
> >>> > > > > > > > > JobListener
> >>> > > > > > > > > > > >> >> >>> suggested
> >>> > > > > > > > > > > >> >> >>>>> in
> >>> > > > > > > > > > > >> >> >>>>>>> the previous emails, would be very helpful
> >>> > > > > (IMHO).
> >>> > > > > > > > > > > >> >> >>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>> [1]
> >>> > > > > > https://issues.apache.org/jira/browse/FLINK-10864
> >>> > > > > > > > > > > >> >> >>>>>>> [2]
> >>> > > > > > https://issues.apache.org/jira/browse/FLINK-10862
> >>> > > > > > > > > > > >> >> >>>>>>> [3]
> >>> > > > > > https://issues.apache.org/jira/browse/FLINK-10879
> >>> > > > > > > > > > > >> >> >>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>> Best,
> >>> > > > > > > > > > > >> >> >>>>>>> Flavio
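Flavio's point about a dependency-free REST client can be sketched with nothing but the JDK's HTTP client. The `/v1/jobs/overview` endpoint is part of Flink's documented REST API, but the host/port values and the helper class itself are illustrative assumptions, not Flink code.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Minimal job monitor that talks to Flink's REST API with plain JDK HTTP. */
public class RestOnlyMonitor {

    /** Builds the URI of the documented /v1/jobs/overview endpoint. */
    static URI overviewUri(String host, int port) {
        return URI.create("http://" + host + ":" + port + "/v1/jobs/overview");
    }

    /** Fetches the raw JSON job overview; requires a running JobManager. */
    static String fetchOverview(String host, int port) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(overviewUri(host, port)).GET().build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Only build the request URI here; sending it needs a live cluster.
        System.out.println(overviewUri("localhost", 8081));
    }
}
```

No Flink jar is on the classpath here, which is exactly the decoupling being asked for: job monitoring reduced to HTTP plus JSON.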
> >>> > > > > > > > > > > >> >> >>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>> On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> >>> > > > > > > > > > > >> >> >>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>> Hi, Tison,
> >>> > > > > > > > > > > >> >> >>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>> Thanks for your comments. Overall I agree with you that it is difficult
> >>> > > > > > > > > > > >> >> >>>>>>>> for downstream projects to integrate with Flink and that we need to
> >>> > > > > > > > > > > >> >> >>>>>>>> refactor the current Flink client API.
> >>> > > > > > > > > > > >> >> >>>>>>>> I also agree that CliFrontend should only parse command-line arguments
> >>> > > > > > > > > > > >> >> >>>>>>>> and then pass them to ExecutionEnvironment. It is ExecutionEnvironment's
> >>> > > > > > > > > > > >> >> >>>>>>>> responsibility to compile the job, create the cluster, and submit the
> >>> > > > > > > > > > > >> >> >>>>>>>> job. Besides that, Flink currently has many ExecutionEnvironment
> >>> > > > > > > > > > > >> >> >>>>>>>> implementations, and it uses a specific one based on the context. IMHO
> >>> > > > > > > > > > > >> >> >>>>>>>> this is not necessary: ExecutionEnvironment should be able to do the
> >>> > > > > > > > > > > >> >> >>>>>>>> right thing based on the FlinkConf it receives. Too many
> >>> > > > > > > > > > > >> >> >>>>>>>> ExecutionEnvironment implementations are another burden for downstream
> >>> > > > > > > > > > > >> >> >>>>>>>> project integration.
> >>> > > > > > > > > > > >> >> >>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>> One thing I'd like to mention is Flink's Scala shell and SQL client:
> >>> > > > > > > > > > > >> >> >>>>>>>> although they are sub-modules of Flink, they could be treated as
> >>> > > > > > > > > > > >> >> >>>>>>>> downstream projects that use Flink's client API. Currently you will
> >>> > > > > > > > > > > >> >> >>>>>>>> find it is not easy for them to integrate with Flink, and they share a
> >>> > > > > > > > > > > >> >> >>>>>>>> lot of duplicated code. It is another sign that we should refactor the
> >>> > > > > > > > > > > >> >> >>>>>>>> Flink client API.
> >>> > > > > > > > > > > >> >> >>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>> I believe it is a large and hard change, and I am afraid we cannot keep
> >>> > > > > > > > > > > >> >> >>>>>>>> compatibility, since many of the changes are user facing.
> >>> > > > > > > > > > > >> >> >>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>> Zili Chen <wa...@gmail.com> wrote on Mon, Jun 24, 2019 at 2:53 PM:
> >>> > > > > > > > > > > >> >> >>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>> Hi all,
> >>> > > > > > > > > > > >> >> >>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>> After a closer look at our client APIs, I can see two major issues
> >>> > > > > > > > > > > >> >> >>>>>>>>> with consistency and integration, namely the different deployment of a
> >>> > > > > > > > > > > >> >> >>>>>>>>> job cluster, which couples job-graph creation and cluster deployment,
> >>> > > > > > > > > > > >> >> >>>>>>>>> and submission via CliFrontend, which confuses the control flow of
> >>> > > > > > > > > > > >> >> >>>>>>>>> job-graph compilation and job submission. I'd like to follow the
> >>> > > > > > > > > > > >> >> >>>>>>>>> discussion above, mainly the process described by Jeff and Stephan,
> >>> > > > > > > > > > > >> >> >>>>>>>>> and share my ideas on these issues.
> >>> > > > > > > > > > > >> >> >>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>> 1) CliFrontend confuses the control flow of job compilation and
> >>> > > > > > > > > > > >> >> >>>>>>>>> submission. Following the process of job submission Stephan and Jeff
> >>> > > > > > > > > > > >> >> >>>>>>>>> described, the execution environment knows all configs of the cluster
> >>> > > > > > > > > > > >> >> >>>>>>>>> and topos/settings of the job. Ideally, in the main method of the user
> >>> > > > > > > > > > > >> >> >>>>>>>>> program, it calls #execute (or, named, #submit) and Flink deploys the
> >>> > > > > > > > > > > >> >> >>>>>>>>> cluster, compiles the job graph, and submits it to the cluster.
> >>> > > > > > > > > > > >> >> >>>>>>>>> However, the current CliFrontend does all these things inside its
> >>> > > > > > > > > > > >> >> >>>>>>>>> #runProgram method, which introduces a lot of subclasses of (stream)
> >>> > > > > > > > > > > >> >> >>>>>>>>> execution environment.
> >>> > > > > > > > > > > >> >> >>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>> Actually, it sets up an exec env that hijacks the
> >>> > > > > > > > > > > >> >> >>>>>>>>> #execute/executePlan method, initializes the job graph, and aborts
> >>> > > > > > > > > > > >> >> >>>>>>>>> execution. Then control flows back to CliFrontend, which deploys the
> >>> > > > > > > > > > > >> >> >>>>>>>>> cluster (or retrieves the client) and submits the job graph. This is
> >>> > > > > > > > > > > >> >> >>>>>>>>> quite a specific internal process inside Flink, consistent with
> >>> > > > > > > > > > > >> >> >>>>>>>>> nothing else.
> >>> > > > > > > > > > > >> >> >>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>> 2) Deployment of a job cluster couples job-graph creation and cluster
> >>> > > > > > > > > > > >> >> >>>>>>>>> deployment. Abstractly, going from a user job to a concrete submission
> >>> > > > > > > > > > > >> >> >>>>>>>>> requires
> >>> > > > > > > > > > > >> >> >>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>     create JobGraph --\
> >>> > > > > > > > > > > >> >> >>>>>>>>> create ClusterClient -->  submit JobGraph
> >>> > > > > > > > > > > >> >> >>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>> such a dependency. A ClusterClient is created by deploying or
> >>> > > > > > > > > > > >> >> >>>>>>>>> retrieving. JobGraph submission requires a compiled JobGraph and a
> >>> > > > > > > > > > > >> >> >>>>>>>>> valid ClusterClient, but the creation of the ClusterClient is
> >>> > > > > > > > > > > >> >> >>>>>>>>> abstractly independent of that of the JobGraph. However, in job
> >>> > > > > > > > > > > >> >> >>>>>>>>> cluster mode, we deploy the job cluster with a job graph, which means
> >>> > > > > > > > > > > >> >> >>>>>>>>> we use another process:
> >>> > > > > > > > > > > >> >> >>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>> create JobGraph --> deploy cluster with the JobGraph
> >>> > > > > > > > > > > >> >> >>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>> Here is another inconsistency, and downstream projects/client APIs are
> >>> > > > > > > > > > > >> >> >>>>>>>>> forced to handle the different cases with rare support from Flink.
> >>> > > > > > > > > > > >> >> >>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>> Since we likely reached a consensus on
> >>> > > > > > > > > > > >> >> >>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>> 1. all configs gathered by Flink configuration and passed
> >>> > > > > > > > > > > >> >> >>>>>>>>> 2. execution environment knows all configs and handles execution (both
> >>> > > > > > > > > > > >> >> >>>>>>>>> deployment and submission)
> >>> > > > > > > > > > > >> >> >>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>> for the issues above I propose eliminating the inconsistencies by the
> >>> > > > > > > > > > > >> >> >>>>>>>>> following approach:
> >>> > > > > > > > > > > >> >> >>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>> 1) CliFrontend should be exactly a front end, at least for the "run"
> >>> > > > > > > > > > > >> >> >>>>>>>>> command. That means it just gathers and passes all config from the
> >>> > > > > > > > > > > >> >> >>>>>>>>> command line to the main method of the user program. The execution
> >>> > > > > > > > > > > >> >> >>>>>>>>> environment knows all the info, and with an addition of utils for
> >>> > > > > > > > > > > >> >> >>>>>>>>> ClusterClient, we gracefully get a ClusterClient by deploying or
> >>> > > > > > > > > > > >> >> >>>>>>>>> retrieving. In this way, we don't need to hijack the
> >>> > > > > > > > > > > >> >> >>>>>>>>> #execute/executePlan methods and can remove the various hacking
> >>> > > > > > > > > > > >> >> >>>>>>>>> subclasses of the exec env, as well as the #run methods in
> >>> > > > > > > > > > > >> >> >>>>>>>>> ClusterClient (for an interface-ized ClusterClient). Now the control
> >>> > > > > > > > > > > >> >> >>>>>>>>> flow flows from CliFrontend to the main method and never returns.
> >>> > > > > > > > > > > >> >> >>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>> 2) A job cluster means a cluster for a specific job. From another
> >>> > > > > > > > > > > >> >> >>>>>>>>> perspective, it is an ephemeral session. We may decouple the
> >>> > > > > > > > > > > >> >> >>>>>>>>> deployment from a compiled job graph, but start a session with an idle
> >>> > > > > > > > > > > >> >> >>>>>>>>> timeout and submit the job afterwards.
> >>> > > > > > > > > > > >> >> >>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>> These topics, before we go into more details on design or
> >>> > > > > > > > > > > >> >> >>>>>>>>> implementation, are better to be aware of and discussed for a
> >>> > > > > > > > > > > >> >> >>>>>>>>> consensus.
> >>> > > > > > > > > > > >> >> >>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>> Best,
> >>> > > > > > > > > > > >> >> >>>>>>>>> tison.
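Tison's second proposal, a job cluster as an ephemeral session, can be sketched as follows. Every type here (JobGraph, SessionClient, deploySession) is a hypothetical stand-in for illustration; the point is only the shape of the flow: deploy a session without any job graph, then submit the job into it.

```java
import java.time.Duration;

/** Sketch of "job cluster = ephemeral session with an idle timeout". */
public class EphemeralSessionSketch {

    /** Hypothetical compiled job, standing in for Flink's JobGraph. */
    record JobGraph(String name) {}

    /** Hypothetical client for a deployed session. */
    static class SessionClient {
        private final Duration idleTimeout;
        private boolean hasJob = false;

        SessionClient(Duration idleTimeout) { this.idleTimeout = idleTimeout; }

        /** Submitting a job keeps the session alive. */
        String submit(JobGraph job) {
            hasJob = true;
            return "job-id-" + job.name();
        }

        /** An idle session past its timeout shuts down, ending the "job cluster". */
        boolean wouldShutDownAfter(Duration idle) {
            return !hasJob && idle.compareTo(idleTimeout) > 0;
        }
    }

    /** Deployment no longer takes a JobGraph: the coupling is removed. */
    static SessionClient deploySession(Duration idleTimeout) {
        return new SessionClient(idleTimeout);
    }

    public static void main(String[] args) {
        SessionClient session = deploySession(Duration.ofMinutes(1));
        String jobId = session.submit(new JobGraph("wordcount"));
        System.out.println(jobId);
    }
}
```

With this shape, session mode and job-cluster mode share one submission path; only the session's lifetime differs.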
> >>> > > > > > > > > > > >> >> >>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>> Zili Chen <wa...@gmail.com> wrote on Thu, Jun 20, 2019 at 3:21 AM:
> >>> > > > > > > > > > > >> >> >>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>> Hi Jeff,
> >>> > > > > > > > > > > >> >> >>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>> Thanks for raising this thread and the design document!
> >>> > > > > > > > > > > >> >> >>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>> As @Thomas Weise mentioned above, extending config to Flink requires
> >>> > > > > > > > > > > >> >> >>>>>>>>>> far more effort than it should. Another example is that we achieve
> >>> > > > > > > > > > > >> >> >>>>>>>>>> detach mode by introducing another execution environment which also
> >>> > > > > > > > > > > >> >> >>>>>>>>>> hijacks the #execute method.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>> I agree with your idea that the user would configure all things and
> >>> > > > > > > > > > > >> >> >>>>>>>>>> Flink "just" respects it. On this topic I think the unusual control
> >>> > > > > > > > > > > >> >> >>>>>>>>>> flow when CliFrontend handles the "run" command is the problem.
> >>> > > > > > > > > > > >> >> >>>>>>>>>> It handles several configs, mainly about cluster settings, and thus
> >>> > > > > > > > > > > >> >> >>>>>>>>>> the main method of the user program is unaware of them. Also it
> >>> > > > > > > > > > > >> >> >>>>>>>>>> compiles the app to a job graph by running the main method with a
> >>> > > > > > > > > > > >> >> >>>>>>>>>> hijacked exec env, which constrains the main method further.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>> I'd like to write down a few notes on passing and respecting
> >>> > > > > > > > > > > >> >> >>>>>>>>>> configs/args, as well as on decoupling job compilation and
> >>> > > > > > > > > > > >> >> >>>>>>>>>> submission, and share them on this thread later.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>> Best,
> >>> > > > > > > > > > > >> >> >>>>>>>>>> tison.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>> SHI Xiaogang <sh...@gmail.com> wrote on Mon, Jun 17, 2019 at 7:29 PM:
> >>> > > > > > > > > > > >> >> >>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> Hi Jeff and Flavio,
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> Thanks a lot, Jeff, for proposing the design document.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> We are also working on refactoring ClusterClient to allow flexible
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> and efficient job management in our real-time platform.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> We would like to draft a document to share our ideas with you.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> I think it's a good idea to have something like Apache Livy for
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> Flink, and the efforts discussed here will be a great step toward it.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> Regards,
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> Xiaogang
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> Flavio Pompermaier <pompermaier@okkam.it> wrote on Mon, Jun 17, 2019 at 7:13 PM:
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>> Is there any possibility to have something like Apache Livy [1]
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>> also for Flink in the future?
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>> [1] https://livy.apache.org/
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>> On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zjffdu@gmail.com> wrote:
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> Any API we expose should not have dependencies on the runtime
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> (flink-runtime) package or other implementation details. To me,
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> this means that the current ClusterClient cannot be exposed to
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> users because it uses quite some classes from the optimiser and
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> runtime packages.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> We should change ClusterClient from a class to an interface.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> ExecutionEnvironment should only use the ClusterClient interface,
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> which should be in flink-clients, while the concrete implementation
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> classes could be in flink-runtime.
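The interface-ized ClusterClient Jeff describes might look roughly like the following. The method names, generics, and types are a hypothetical shape for illustration, not Flink's actual API; the essential property is that the interface carries no flink-runtime dependency.

```java
import java.util.concurrent.CompletableFuture;

/**
 * Hypothetical shape of an interface-ized ClusterClient: the interface lives
 * in flink-clients, concrete implementations (REST-based, per-deployment, ...)
 * live elsewhere.
 */
public interface ClusterClientSketch<JobGraphT, JobIdT> extends AutoCloseable {

    /** Submits a compiled job graph; completes with the assigned job id. */
    CompletableFuture<JobIdT> submitJob(JobGraphT jobGraph);

    /** Requests cancellation of a running job. */
    CompletableFuture<Void> cancel(JobIdT jobId);

    /** Returns a human-readable address of the cluster, e.g. for logging. */
    String getClusterEndpoint();
}
```

Downstream projects would program against this interface only, so swapping a standalone, YARN, or test implementation requires no code change on their side.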
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> What happens when a failure/restart in the client happens? There
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> needs to be a way of re-establishing the connection to the job,
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> setting up the listeners again, etc.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> Good point. First we need to define what failure/restart in the
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> client means. IIUC, that usually means a network failure, which will
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> happen in the RestClient class. If my understanding is correct, the
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> restart/retry mechanism should be done in RestClient.
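The retry mechanism Jeff suggests putting inside the REST client can be sketched as a bounded retry loop with exponential backoff. The helper below is generic and illustrative; it is not Flink's actual RestClient.

```java
import java.util.concurrent.Callable;

/** Sketch of retrying a flaky network call with exponential backoff. */
public class RetrySketch {

    /** Runs the call, retrying up to maxAttempts times on any exception. */
    static <T> T withRetry(Callable<T> call, int maxAttempts, long initialBackoffMs)
            throws Exception {
        long backoff = initialBackoffMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff);
                    backoff *= 2; // exponential backoff between attempts
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] failuresLeft = {2};
        // Fails twice, then succeeds on the third attempt.
        String result = withRetry(() -> {
            if (failuresLeft[0]-- > 0) {
                throw new RuntimeException("simulated network failure");
            }
            return "connected";
        }, 5, 1);
        System.out.println(result); // prints "connected"
    }
}
```

Keeping this loop inside the client, rather than in every caller, is what makes client failures transparent to downstream projects.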
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> Aljoscha Krettek <aljoscha@apache.org> wrote on Tue, Jun 11, 2019 at 11:10 PM:
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> Some points to consider:
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> * Any API we expose should not have dependencies on the runtime
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> (flink-runtime) package or other implementation details. To me,
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> this means that the current ClusterClient cannot be exposed to
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> users because it uses quite some classes from the optimiser and
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> runtime packages.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> * What happens when a failure/restart in the client happens? There
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> needs to be a way of re-establishing the connection to the job,
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> setting up the listeners again, etc.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>> Aljoscha
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>> On 29. May 2019, at 10:17, Jeff Zhang <zjffdu@gmail.com> wrote:
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>> Sorry folks, the design doc is late as you expected. Here's the
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>> design doc I drafted, welcome any comments and feedback.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org>
> >>> > > > > > 于2019年2月14日周四
> >>> > > > > > > > > > > >> >> >>>> 下午8:43写道:
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> Nice that this discussion is
> >>> > > > happening.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> In the FLIP, we could also
> >>> > revisit the
> >>> > > > > > entire
> >>> > > > > > > > > role
> >>> > > > > > > > > > > >> >> >>> of
> >>> > > > > > > > > > > >> >> >>>>> the
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>> environments
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> again.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> Initially, the idea was:
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> - the environments take care of
> >>> > the
> >>> > > > > > specific
> >>> > > > > > > > > > > >> >> >> setup
> >>> > > > > > > > > > > >> >> >>>> for
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> standalone
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>> (no
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> setup needed), yarn, mesos, etc.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> - the session ones have control
> >>> > over
> >>> > > > the
> >>> > > > > > > > > session.
> >>> > > > > > > > > > > >> >> >>> The
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> environment
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> holds
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> the session client.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> - running a job gives a "control"
> >>> > > > object
> >>> > > > > > for
> >>> > > > > > > > > that
> >>> > > > > > > > > > > >> >> >>>> job.
> >>> > > > > > > > > > > >> >> >>>>>> That
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>> behavior
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> is
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> the same in all environments.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> The actual implementation diverged
> >>> > > > quite
> >>> > > > > > a bit
> >>> > > > > > > > > > > >> >> >> from
> >>> > > > > > > > > > > >> >> >>>>> that.
> >>> > > > > > > > > > > >> >> >>>>>>>> Happy
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> to
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> see a
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> discussion about straitening this
> >>> > out
> >>> > > > a
> >>> > > > > > bit
> >>> > > > > > > > > more.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> >
> > On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> >
> > > Hi folks,
> > >
> > > Sorry for the late response. It seems we reached consensus on this, so I
> > > will create a FLIP for this with a more detailed design.
> > >
> > > Thomas Weise <th...@apache.org> 于2018年12月21日周五 上午11:43写道:
> > > > Great to see this discussion seeded! The problems you face with the
> > > > Zeppelin integration are also affecting other downstream projects, like
> > > > Beam.
> > > >
> > > > We just enabled the savepoint restore option in RemoteStreamEnvironment
> > > > [1] and that was more difficult than it should be. The main issue is
> > > > that environment and cluster client aren't decoupled. Ideally it should
> > > > be possible to just get the matching cluster client from the environment
> > > > and then control the job through it (environment as factory for cluster
> > > > client). But note that the environment classes are part of the public
> > > > API, and it is not straightforward to make larger changes without
> > > > breaking backward compatibility.
> > > >
> > > > ClusterClient currently exposes internal classes like JobGraph and
> > > > StreamGraph. But it should be possible to wrap this with a new public
> > > > API that brings the required job control capabilities for downstream
> > > > projects. Perhaps it is helpful to look at some of the interfaces in
> > > > Beam while thinking about this: [2] for the portable job API and [3]
> > > > for the old asynchronous job control from the Beam Java SDK.
> > > >
> > > > The backward compatibility discussion [4] is also relevant here. A new
> > > > API should shield downstream projects from internals and allow them to
> > > > interoperate with multiple future Flink versions in the same release
> > > > line without forced upgrades.
> > > >
> > > > Thanks,
> > > > Thomas
> > > >
> > > > [1] https://github.com/apache/flink/pull/7249
> > > > [2] https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > > > [3] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > > > [4] https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
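The asynchronous job control that Thomas references in [3] revolves around a result handle returned by submission, which the caller can poll, cancel, or block on. A minimal sketch of that pattern, assuming hypothetical names throughout (none of this is actual Beam or Flink API):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of a PipelineResult-style job control handle.
// All names here are hypothetical stand-ins, not real Beam/Flink classes.
public class JobControlSketch {

    enum State { RUNNING, DONE, CANCELLED }

    // A handle returned by job submission; the caller controls the job
    // through it without ever touching internals like JobGraph.
    static class JobResult {
        private final CompletableFuture<State> terminal = new CompletableFuture<>();
        private volatile State state = State.RUNNING;

        State getState() { return state; }

        void cancel() {
            state = State.CANCELLED;
            terminal.complete(State.CANCELLED);
        }

        void finish() {
            state = State.DONE;
            terminal.complete(State.DONE);
        }

        // Blocks until the job reaches a terminal state.
        State waitUntilFinish() throws Exception {
            return terminal.get(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        JobResult result = new JobResult();           // as if returned by submit()
        System.out.println(result.getState());        // RUNNING
        result.cancel();
        System.out.println(result.waitUntilFinish()); // CANCELLED
    }
}
```

The point of the shape is that downstream projects hold only the handle, so the cluster-side types can evolve without breaking them.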
> > > >
> > > > On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > >
> > > > > > I'm not so sure whether the user should be able to define where
> > > > > > the job runs (in your example Yarn). This is actually independent
> > > > > > of the job development and is something which is decided at
> > > > > > deployment time.
> > > > >
> > > > > Users don't need to specify the execution mode programmatically. They
> > > > > can also pass the execution mode via the arguments of the flink run
> > > > > command, e.g.
> > > > >
> > > > > bin/flink run -m yarn-cluster ....
> > > > > bin/flink run -m local ...
> > > > > bin/flink run -m host:port ...
> > > > >
> > > > > Does this make sense to you?
> > > > >
> > > > > > To me it makes sense that the ExecutionEnvironment is not directly
> > > > > > initialized by the user and instead context sensitive how you want
> > > > > > to execute your job (Flink CLI vs. IDE, for example).
> > > > >
> > > > > Right, currently I notice Flink would create a different
> > > > > ContextExecutionEnvironment based on different submission scenarios
> > > > > (Flink CLI vs IDE). To me this is kind of a hack approach, not so
> > > > > straightforward. What I suggested above is that Flink should always
> > > > > create the same ExecutionEnvironment but with different configuration,
> > > > > and based on the configuration it would create the proper
> > > > > ClusterClient for different behaviors.
> > > > >
> > > > > Till Rohrmann <trohrmann@apache.org> 于2018年12月20日周四 下午11:18写道:
> > > > > > You are probably right that we have code duplication when it comes
> > > > > > to the creation of the ClusterClient. This should be reduced in the
> > > > > > future.
> > > > > >
> > > > > > I'm not so sure whether the user should be able to define where the
> > > > > > job runs (in your example Yarn). This is actually independent of the
> > > > > > job development and is something which is decided at deployment
> > > > > > time. To me it makes sense that the ExecutionEnvironment is not
> > > > > > directly initialized by the user and instead context sensitive how
> > > > > > you want to execute your job (Flink CLI vs. IDE, for example).
> > > > > > However, I agree that the ExecutionEnvironment should give you
> > > > > > access to the ClusterClient and to the job (maybe in the form of the
> > > > > > JobGraph or a job plan).
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > > On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > > > > Hi Till,
> > > > > > >
> > > > > > > Thanks for the feedback. You are right that I expect a better
> > > > > > > programmatic job submission/control API which could be used by
> > > > > > > downstream projects, and it would benefit the Flink ecosystem.
> > > > > > > When I look at the code of the Flink scala-shell and sql-client
> > > > > > > (I believe they are not the core of Flink, but belong to the
> > > > > > > ecosystem of Flink), I find much duplicated code for creating a
> > > > > > > ClusterClient from user-provided configuration (the configuration
> > > > > > > format may differ between scala-shell and sql-client) and then
> > > > > > > using that ClusterClient to manipulate jobs. I don't think this is
> > > > > > > convenient for downstream projects. What I expect is that a
> > > > > > > downstream project only needs to provide the necessary
> > > > > > > configuration info (maybe introducing a class FlinkConf), then
> > > > > > > build the ExecutionEnvironment based on this FlinkConf, and the
> > > > > > > ExecutionEnvironment will create the proper ClusterClient. This
> > > > > > > would not only benefit downstream project development but also
> > > > > > > help their integration tests with Flink. Here's one sample code
> > > > > > > snippet that I expect:
> > > > > > >
> > > > > > > val conf = new FlinkConf().mode("yarn")
> > > > > > > val env = new ExecutionEnvironment(conf)
> > > > > > > val jobId = env.submit(...)
> > > > > > > val jobStatus = env.getClusterClient().queryJobStatus(jobId)
> > > > > > > env.getClusterClient().cancelJob(jobId)
> > > > > > >
> > > > > > > What do you think?
> > > > > > >
> > > > > > > Till Rohrmann <trohrmann@apache.org> 于2018年12月11日周二 下午6:28写道:
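The FlinkConf flow sketched in the snippet above can be mocked end to end. The following is a runnable illustration only: FlinkConf is Jeff's proposed (non-existent) class, and the other types are simplified stand-ins, not real Flink API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Runnable mock of the proposed FlinkConf -> ExecutionEnvironment ->
// ClusterClient flow. Every class here is a hypothetical stand-in that
// only demonstrates the intended call pattern.
public class FlinkConfSketch {

    // Carries deployment configuration, decided at deployment time.
    static class FlinkConf {
        private String mode = "local";
        FlinkConf mode(String mode) { this.mode = mode; return this; }
        String getMode() { return mode; }
    }

    // In-memory stand-in for a real cluster client.
    static class ClusterClient {
        private final Map<String, String> jobs = new HashMap<>();
        String run(String job) {
            String id = UUID.randomUUID().toString();
            jobs.put(id, "RUNNING");
            return id;
        }
        String queryJobStatus(String jobId) { return jobs.getOrDefault(jobId, "UNKNOWN"); }
        void cancelJob(String jobId) { jobs.put(jobId, "CANCELED"); }
    }

    // The environment is always the same class; only the conf differs,
    // and the conf decides which client is created.
    static class ExecutionEnvironment {
        private final ClusterClient client;
        ExecutionEnvironment(FlinkConf conf) {
            // A real implementation would dispatch on conf.getMode()
            // ("yarn", "local", "host:port") to different client types.
            this.client = new ClusterClient();
        }
        String submit(String job) { return client.run(job); }
        ClusterClient getClusterClient() { return client; }
    }

    public static void main(String[] args) {
        FlinkConf conf = new FlinkConf().mode("yarn");
        ExecutionEnvironment env = new ExecutionEnvironment(conf);
        String jobId = env.submit("my-job");
        System.out.println(env.getClusterClient().queryJobStatus(jobId)); // RUNNING
        env.getClusterClient().cancelJob(jobId);
        System.out.println(env.getClusterClient().queryJobStatus(jobId)); // CANCELED
    }
}
```

The design choice here is that downstream projects depend only on FlinkConf and ExecutionEnvironment; the client construction details (and any per-target duplication) live behind the environment.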
> > > > > > > > Hi Jeff,
> > > > > > > >
> > > > > > > > what you are proposing is to provide the user with better
> > > > > > > > programmatic job control. There was actually an effort to
> > > > > > > > achieve this but it has never been completed [1]. However,
> > > > > > > > there are some improvements in the code base now. Look for
> > > > > > > > example at the NewClusterClient interface which offers a
> > > > > > > > non-blocking job submission. But I agree that we need to
> > > > > > > > improve Flink in this regard.
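The non-blocking submission shape mentioned above means submit returns a future immediately rather than blocking until the job finishes. A minimal sketch of that idea; the types below are simplified stand-ins, not the actual NewClusterClient signatures:

```java
import java.util.concurrent.CompletableFuture;

// Sketch of non-blocking job submission: the caller gets a future right
// away and composes follow-up actions on it. Stand-in types only.
public class AsyncSubmitSketch {

    static class JobSubmissionResult {
        final String jobId;
        JobSubmissionResult(String jobId) { this.jobId = jobId; }
    }

    // Returns immediately; the actual submission happens asynchronously.
    static CompletableFuture<JobSubmissionResult> submitJob(String jobGraph) {
        return CompletableFuture.supplyAsync(() -> new JobSubmissionResult("job-1"));
    }

    public static void main(String[] args) {
        CompletableFuture<String> id =
            submitJob("wordcount").thenApply(r -> r.jobId);
        // join() here is only for the demo; a caller could instead chain
        // status polling or cancellation onto the future.
        System.out.println(id.join()); // job-1
    }
}
```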
> > > > > > > > I would not be in favour of exposing all ClusterClient calls
> > > > > > > > via the ExecutionEnvironment because it would clutter the class
> > > > > > > > and would not be a good separation of concerns. Instead one
> > > > > > > > idea could be to retrieve the current ClusterClient from the
> > > > > > > > ExecutionEnvironment which can then be used for cluster and job
> > > > > > > > control. But before we start an effort here, we need to agree
> > > > > > > > on and capture what functionality we want to provide.
> > > > > > > >
> > > > > > > > Initially, the idea was that we have the ClusterDescriptor
> > > > > > > > describing how to talk to a cluster manager like Yarn or Mesos.
> > > > > > > > The ClusterDescriptor can be used for deploying Flink
> >>> > > > clusters
> >>> > > > > > (job
> >>> > > > > > > > and
> >>> > > > > > > > > > > >> >> >>>>> session)
> >>> > > > > > > > > > > >> >> >>>>>>> and
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> gives
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> you a
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> ClusterClient. The
> >>> > ClusterClient
> >>> > > > > > > > controls
> >>> > > > > > > > > > > >> >> >> the
> >>> > > > > > > > > > > >> >> >>>>>> cluster
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> (e.g.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> submitting
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> jobs, listing all running
> >>> > jobs).
> >>> > > > > And
> >>> > > > > > > > then
> >>> > > > > > > > > > > >> >> >>> there
> >>> > > > > > > > > > > >> >> >>>>> was
> >>> > > > > > > > > > > >> >> >>>>>>> the
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> idea
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>> to
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> introduce a
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> JobClient which you obtain
> >>> > from
> >>> > > > > the
> >>> > > > > > > > > > > >> >> >>>> ClusterClient
> >>> > > > > > > > > > > >> >> >>>>> to
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> trigger
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> job
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> specific
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> operations (e.g. taking a
> >>> > > > > savepoint,
> >>> > > > > > > > > > > >> >> >>> cancelling
> >>> > > > > > > > > > > >> >> >>>>> the
> >>> > > > > > > > > > > >> >> >>>>>>>> job).
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> [1]
> >>> > > > > > > > > > > >> >> >>>>>>
> >>> > > > https://issues.apache.org/jira/browse/FLINK-4272
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> Cheers,
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> Till
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> On Tue, Dec 11, 2018 at
> >>> > 10:13 AM
> >>> > > > > > Jeff
> >>> > > > > > > > > Zhang
> >>> > > > > > > > > > > >> >> >> <
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> zjffdu@gmail.com
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> wrote:
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Hi Folks,
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> I am trying to integrate
> >>> > flink
> >>> > > > > into
> >>> > > > > > > > > apache
> >>> > > > > > > > > > > >> >> >>>>> zeppelin
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> which is
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> an
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> interactive
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> notebook. And I hit several
> >>> > > > > issues
> >>> > > > > > that
> >>> > > > > > > > > is
> >>> > > > > > > > > > > >> >> >>>> caused
> >>> > > > > > > > > > > >> >> >>>>>> by
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> flink
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> client
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> api.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> So
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> I'd like to proposal the
> >>> > > > > following
> >>> > > > > > > > > changes
> >>> > > > > > > > > > > >> >> >>> for
> >>> > > > > > > > > > > >> >> >>>>>> flink
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> client
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> api.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 1. Support nonblocking
> >>> > > > execution.
> >>> > > > > > > > > > > >> >> >> Currently,
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment#execute
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> is a blocking method which
> >>> > > > would
> >>> > > > > > do 2
> >>> > > > > > > > > > > >> >> >> things,
> >>> > > > > > > > > > > >> >> >>>>> first
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> submit
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> job
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> and
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> then
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> wait for job until it is
> >>> > > > > finished.
> >>> > > > > > I'd
> >>> > > > > > > > > like
> >>> > > > > > > > > > > >> >> >>>>>>> introduce a
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> nonblocking
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> execution method like
> >>> > > > > > > > > > > >> >> >>>> ExecutionEnvironment#submit
> >>> > > > > > > > > > > >> >> >>>>>>> which
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> only
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> submit
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> job
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> and
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> then return jobId to
> >>> > client.
> >>> > > > And
> >>> > > > > > allow
> >>> > > > > > > > > user
> >>> > > > > > > > > > > >> >> >>> to
> >>> > > > > > > > > > > >> >> >>>>>> query
> >>> > > > > > > > > > > >> >> >>>>>>>> the
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> job
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> status
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> via
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> the
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> jobId.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 2. Add cancel api in
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >> ExecutionEnvironment/StreamExecutionEnvironment,
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> currently the only way to
> >>> > > > cancel
> >>> > > > > > job is
> >>> > > > > > > > > via
> >>> > > > > > > > > > > >> >> >>> cli
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> (bin/flink),
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> this
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> is
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> not
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> convenient for downstream
> >>> > > > project
> >>> > > > > > to
> >>> > > > > > > > use
> >>> > > > > > > > > > > >> >> >> this
> >>> > > > > > > > > > > >> >> >>>>>>> feature.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> So I'd
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> like
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> to
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> add
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> cancel api in
> >>> > > > > ExecutionEnvironment
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 3. Add savepoint api in
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>
> >>> > ExecutionEnvironment/StreamExecutionEnvironment.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> It
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> is similar as cancel api,
> >>> > we
> >>> > > > > > should use
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> ExecutionEnvironment
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> as
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> the
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> unified
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> api for third party to
> >>> > > > integrate
> >>> > > > > > with
> >>> > > > > > > > > > > >> >> >> flink.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 4. Add listener for job
> >>> > > > execution
> >>> > > > > > > > > > > >> >> >> lifecycle.
> >>> > > > > > > > > > > >> >> >>>>>>> Something
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> like
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> following,
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> so
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> that downstream project
> >>> > can do
> >>> > > > > > custom
> >>> > > > > > > > > logic
> >>> > > > > > > > > > > >> >> >>> in
> >>> > > > > > > > > > > >> >> >>>>> the
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> lifecycle
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> of
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> job.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> e.g.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Zeppelin would capture the
> >>> > > > jobId
> >>> > > > > > after
> >>> > > > > > > > > job
> >>> > > > > > > > > > > >> >> >> is
> >>> > > > > > > > > > > >> >> >>>>>>> submitted
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> and
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> then
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> use
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> this
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> jobId to cancel it later
> >>> > when
> >>> > > > > > > > necessary.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> public interface
> >>> > JobListener {
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  void onJobSubmitted(JobID
> >>> > > > > jobId);
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  void
> >>> > > > > > onJobExecuted(JobExecutionResult
> >>> > > > > > > > > > > >> >> >>>>> jobResult);
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  void onJobCanceled(JobID
> >>> > > > jobId);
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> }
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 5. Enable session in
> >>> > > > > > > > > ExecutionEnvironment.
> >>> > > > > > > > > > > >> >> >>>>>> Currently
> >>> > > > > > > > > > > >> >> >>>>>>> it
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> is
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> disabled,
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> but
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> session is very convenient
> >>> > for
> >>> > > > > > third
> >>> > > > > > > > > party
> >>> > > > > > > > > > > >> >> >> to
> >>> > > > > > > > > > > >> >> >>>>>>>> submitting
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> jobs
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> continually.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> I hope flink can enable it
> >>> > > > again.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 6. Unify all flink client
> >>> > api
> >>> > > > > into
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>
> >>> > ExecutionEnvironment/StreamExecutionEnvironment.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> This is a long term issue
> >>> > which
> >>> > > > > > needs
> >>> > > > > > > > > more
> >>> > > > > > > > > > > >> >> >>>>> careful
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> thinking
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> and
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> design.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Currently some of features
> >>> > of
> >>> > > > > > flink is
> >>> > > > > > > > > > > >> >> >>> exposed
> >>> > > > > > > > > > > >> >> >>>> in
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>
> >>> > ExecutionEnvironment/StreamExecutionEnvironment,
> >>> > > > > > > > > > > >> >> >>>>>> but
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> some are
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> exposed
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> in
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> cli instead of api, like
> >>> > the
> >>> > > > > > cancel and
> >>> > > > > > > > > > > >> >> >>>>> savepoint I
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> mentioned
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> above.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> I
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> think the root cause is
> >>> > due to
> >>> > > > > that
> >>> > > > > > > > flink
> >>> > > > > > > > > > > >> >> >>>> didn't
> >>> > > > > > > > > > > >> >> >>>>>>> unify
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> the
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> interaction
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> with
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> flink. Here I list 3
> >>> > scenarios
> >>> > > > of
> >>> > > > > > flink
> >>> > > > > > > > > > > >> >> >>>> operation
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Local job execution.
> >>> > Flink
> >>> > > > > will
> >>> > > > > > > > > create
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> LocalEnvironment
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> and
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> then
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> use
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  this LocalEnvironment to
> >>> > > > create
> >>> > > > > > > > > > > >> >> >>> LocalExecutor
> >>> > > > > > > > > > > >> >> >>>>> for
> >>> > > > > > > > > > > >> >> >>>>>>> job
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> execution.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Remote job execution.
> >>> > Flink
> >>> > > > > will
> >>> > > > > > > > > create
> >>> > > > > > > > > > > >> >> >>>>>>>> ClusterClient
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> first
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> and
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> then
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  create ContextEnvironment
> >>> > > > based
> >>> > > > > > on the
> >>> > > > > > > > > > > >> >> >>>>>>> ClusterClient
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> and
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> then
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> run
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> the
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> job.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Job cancelation. Flink
> >>> > will
> >>> > > > > > create
> >>> > > > > > > > > > > >> >> >>>>>> ClusterClient
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> first
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> and
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> then
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> cancel
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  this job via this
> >>> > > > ClusterClient.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> As you can see in the
> >>> > above 3
> >>> > > > > > > > scenarios.
> >>> > > > > > > > > > > >> >> >>> Flink
> >>> > > > > > > > > > > >> >> >>>>>> didn't
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> use the
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> same
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> approach(code path) to
> >>> > interact
> >>> > > > > > with
> >>> > > > > > > > > flink
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> What I propose is
> >>> > following:
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Create the proper
> >>> > > > > > > > > > > >> >> >>>>>> LocalEnvironment/RemoteEnvironment
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> (based
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> on
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> user
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> configuration) --> Use this
> >>> > > > > > Environment
> >>> > > > > > > > > to
> >>> > > > > > > > > > > >> >> >>>> create
> >>> > > > > > > > > > > >> >> >>>>>>>> proper
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> ClusterClient
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> (LocalClusterClient or
> >>> > > > > > > > RestClusterClient)
> >>> > > > > > > > > > > >> >> >> to
> >>> > > > > > > > > > > >> >> >>>>>>>> interactive
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> with
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Flink (
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> job
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> execution or cancelation)
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> This way we can unify the
> >>> > > > process
> >>> > > > > > of
> >>> > > > > > > > > local
> >>> > > > > > > > > > > >> >> >>>>>> execution
> >>> > > > > > > > > > > >> >> >>>>>>>> and
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> remote
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> execution.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> And it is much easier for
> >>> > third
> >>> > > > > > party
> >>> > > > > > > > to
> >>> > > > > > > > > > > >> >> >>>>> integrate
> >>> > > > > > > > > > > >> >> >>>>>>> with
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> flink,
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> because
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment is the
> >>> > > > > unified
> >>> > > > > > > > entry
> >>> > > > > > > > > > > >> >> >>> point
> >>> > > > > > > > > > > >> >> >>>>> for
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> flink.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> What
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> third
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> party
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> needs to do is just pass
> >>> > > > > > configuration
> >>> > > > > > > > to
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> ExecutionEnvironment
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> and
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment will
> >>> > do
> >>> > > > the
> >>> > > > > > right
> >>> > > > > > > > > > > >> >> >> thing
> >>> > > > > > > > > > > >> >> >>>>> based
> >>> > > > > > > > > > > >> >> >>>>>> on
> >>> > > > > > > > > > > >> >> >>>>>>>> the
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> configuration.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Flink cli can also be
> >>> > > > considered
> >>> > > > > as
> >>> > > > > > > > flink
> >>> > > > > > > > > > > >> >> >> api
> >>> > > > > > > > > > > >> >> >>>>>>> consumer.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> it
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> just
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> pass
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> the
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> configuration to
> >>> > > > > > ExecutionEnvironment
> >>> > > > > > > > and
> >>> > > > > > > > > > > >> >> >> let
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> ExecutionEnvironment
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> to
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> create the proper
> >>> > ClusterClient
> >>> > > > > > instead
> >>> > > > > > > > > of
> >>> > > > > > > > > > > >> >> >>>>> letting
> >>> > > > > > > > > > > >> >> >>>>>>> cli
> >>> > > > > > > > > > > >> >> >>>>>>>> to
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> create
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ClusterClient directly.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 6 would involve large code
> >>> > > > > > refactoring,
> >>> > > > > > > > > so
> >>> > > > > > > > > > > >> >> >> I
> >>> > > > > > > > > > > >> >> >>>>> think
> >>> > > > > > > > > > > >> >> >>>>>> we
> >>> > > > > > > > > > > >> >> >>>>>>>> can
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> defer
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> it
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> for
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> future release, 1,2,3,4,5
> >>> > could
> >>> > > > > be
> >>> > > > > > done
> >>> > > > > > > > > at
> >>> > > > > > > > > > > >> >> >>>> once I
> >>> > > > > > > > > > > >> >> >>>>>>>>>>> believe.
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> Let
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> me
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> know
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> your
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> comments and feedback,
> >>> > thanks
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> --
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Best Regards
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> --
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> Best Regards
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> Jeff Zhang
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> --
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Best Regards
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Jeff Zhang
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> --
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> Best Regards
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> Jeff Zhang
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>> --
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>> Best Regards
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>> Jeff Zhang
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> --
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> Best Regards
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>> Jeff Zhang
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>> --
> >>> > > > > > > > > > > >> >> >>>>>>>> Best Regards
> >>> > > > > > > > > > > >> >> >>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>> Jeff Zhang
> >>> > > > > > > > > > > >> >> >>>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>>
> >>> > > > > > > > > > > >> >> >>>>>>
> >>> > > > > > > > > > > >> >> >>>>>
> >>> > > > > > > > > > > >> >> >>>>>
> >>> > > > > > > > > > > >> >> >>>>> --
> >>> > > > > > > > > > > >> >> >>>>> Best Regards
> >>> > > > > > > > > > > >> >> >>>>>
> >>> > > > > > > > > > > >> >> >>>>> Jeff Zhang
> >>> > > > > > > > > > > >> >> >>>>>
> >>> > > > > > > > > > > >> >> >>>>
> >>> > > > > > > > > > > >> >> >>>
> >>> > > > > > > > > > > >> >> >>
> >>> > > > > > > > > > > >> >> >
> >>> > > > > > > > > > > >> >> >
> >>> > > > > > > > > > > >> >> > --
> >>> > > > > > > > > > > >> >> > Best Regards
> >>> > > > > > > > > > > >> >> >
> >>> > > > > > > > > > > >> >> > Jeff Zhang
> >>> > > > > > > > > > > >> >>
> >>> > > > > > > > > > > >> >>
> >>> > > > > > > > > > > >>
> >>> > > > > > > > > > > >
> >>> > > > > > > > > > > >
> >>> > > > > > > > > > >
> >>> > > > > > > > > >
> >>> > > > > > > > >
> >>> > > > > > > >
> >>> > > > > >
> >>> > > > > >
> >>> > > > >
> >>> > > >
> >>> >
> >>>
> >>>
> >>> --
> >>> Best Regards
> >>>
> >>> Jeff Zhang
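
[Editor's note] As a concrete illustration of point 4 in the quoted proposal, the JobListener interface can be exercised as follows. JobID and JobExecutionResult are simplified stand-ins here, not Flink's real classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Simplified stand-ins for Flink's JobID and JobExecutionResult.
final class JobID {
    private final String id = UUID.randomUUID().toString();
    @Override
    public String toString() { return id; }
}

final class JobExecutionResult {
    private final JobID jobId;
    JobExecutionResult(JobID jobId) { this.jobId = jobId; }
    public JobID getJobID() { return jobId; }
}

// The listener interface exactly as sketched in the proposal.
interface JobListener {
    void onJobSubmitted(JobID jobId);
    void onJobExecuted(JobExecutionResult jobResult);
    void onJobCanceled(JobID jobId);
}

public class JobListenerSketch {
    public static void main(String[] args) {
        final List<String> events = new ArrayList<>();

        // A downstream project (e.g. Zeppelin) captures the JobID on
        // submission and can use it to cancel the job later.
        JobListener listener = new JobListener() {
            @Override
            public void onJobSubmitted(JobID jobId) { events.add("submitted"); }
            @Override
            public void onJobExecuted(JobExecutionResult result) { events.add("executed"); }
            @Override
            public void onJobCanceled(JobID jobId) { events.add("canceled"); }
        };

        // Simulated lifecycle: submit, then cancel.
        JobID jobId = new JobID();
        listener.onJobSubmitted(jobId);
        listener.onJobCanceled(jobId);

        System.out.println(String.join(",", events));
    }
}
```

This is the use case described in the proposal: the callback on submission hands the client a JobID it can hold on to for later control operations.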

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Kostas Kloudas <kk...@gmail.com>.
Great idea Tison!

I will create the umbrella issue and post it here so that we are all
on the same page!

Cheers,
Kostas

On Wed, Sep 4, 2019 at 11:36 AM Zili Chen <wa...@gmail.com> wrote:
>
> Hi Kostas & Aljoscha,
>
> I notice that there is a JIRA (FLINK-13946) which could be included
> in this refactoring thread. Since we agree on most of the directions in
> the big picture, is it reasonable that we create an umbrella issue for
> refactoring the client APIs, with linked subtasks? It would be a better
> way to join the forces of our community.
>
> Best,
> tison.
>
>
> On Sat, Aug 31, 2019 at 12:52 PM Zili Chen <wa...@gmail.com> wrote:
>>
>> Great Kostas! Looking forward to your POC!
>>
>> Best,
>> tison.
>>
>>
>>> On Fri, Aug 30, 2019 at 11:07 PM Jeff Zhang <zj...@gmail.com> wrote:
>>>
>>> Awesome, @Kostas Looking forward your POC.
>>>
>>> On Fri, Aug 30, 2019 at 8:33 PM Kostas Kloudas <kk...@gmail.com> wrote:
>>>
>>> > Hi all,
>>> >
>>> > I am just writing here to let you know that I am working on a POC that
>>> > tries to refactor the current state of job submission in Flink.
>>> > I want to stress that it introduces NO CHANGES to the current
>>> > behaviour of Flink. It just re-arranges things and introduces the
>>> > notion of an Executor, which is the entity responsible for taking the
>>> > user-code and submitting it for execution.
>>> >
>>> > Given this, the discussion about the functionality that the JobClient
>>> > will expose to the user can go on independently and the same
>>> > holds for all the open questions so far.
>>> >
>>> > I hope I will have some more news to share soon.
>>> >
>>> > Thanks,
>>> > Kostas
>>> >
>>> > On Mon, Aug 26, 2019 at 4:20 AM Yang Wang <da...@gmail.com> wrote:
>>> > >
>>> > > Hi Zili,
>>> > >
>>> > > It makes sense to me that a dedicated cluster is started for a per-job
>>> > > cluster and will not accept more jobs.
>>> > > I just have a question about the command line.
>>> > >
>>> > > Currently we could use the following commands to start different
>>> > clusters.
>>> > > *per-job cluster*
>>> > > ./bin/flink run -d -p 5 -ynm perjob-cluster1 -m yarn-cluster
>>> > > examples/streaming/WindowJoin.jar
>>> > > *session cluster*
>>> > > ./bin/flink run -p 5 -ynm session-cluster1 -m yarn-cluster
>>> > > examples/streaming/WindowJoin.jar
>>> > >
>>> > > What will it look like after client enhancement?
>>> > >
>>> > >
>>> > > Best,
>>> > > Yang
>>> > >
>>> > > On Fri, Aug 23, 2019 at 10:46 PM Zili Chen <wa...@gmail.com> wrote:
>>> > >
>>> > > > Hi Till,
>>> > > >
>>> > > > Thanks for your update. Nice to hear :-)
>>> > > >
>>> > > > Best,
>>> > > > tison.
>>> > > >
>>> > > >
>>> > > > On Fri, Aug 23, 2019 at 10:39 PM Till Rohrmann <tr...@apache.org> wrote:
>>> > > >
>>> > > > > Hi Tison,
>>> > > > >
>>> > > > > just a quick comment concerning the class loading issues when using
>>> > > > > the per-job mode. The community wants to change it so that the
>>> > > > > StandaloneJobClusterEntryPoint actually uses the user code class
>>> > > > > loader with child-first class loading [1]. Hence, I hope that this
>>> > > > > problem will be resolved soon.
>>> > > > >
>>> > > > > [1] https://issues.apache.org/jira/browse/FLINK-13840
>>> > > > >
>>> > > > > Cheers,
>>> > > > > Till
>>> > > > >
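
[Editor's note] The "child-first class loading" Till refers to inverts the JVM's default parent-first delegation so that user-code classes shadow the classes shipped with the framework. A minimal sketch (Flink's actual ChildFirstClassLoader additionally keeps always-parent-first patterns such as "java." and "org.apache.flink.", which is omitted here):

```java
import java.net.URL;
import java.net.URLClassLoader;

// Child-first class loading: try this loader's own URLs before delegating
// to the parent, inverting the default parent-first delegation.
public class ChildFirstSketch extends URLClassLoader {

    public ChildFirstSketch(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // Child first: look in our own URLs before the parent.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Fall back to the parent (the default behaviour).
                    c = super.loadClass(name, false);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        // With no URLs of its own, every lookup falls back to the parent.
        ChildFirstSketch loader = new ChildFirstSketch(
                new URL[0], ChildFirstSketch.class.getClassLoader());
        Class<?> list = loader.loadClass("java.util.ArrayList");
        System.out.println(list.getName());
    }
}
```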
>>> > > > > On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas <kk...@gmail.com>
>>> > > > wrote:
>>> > > > >
>>> > > > > > Hi all,
>>> > > > > >
>>> > > > > > On the topic of web submission, I agree with Till that it only seems
>>> > > > > > to complicate things. It is bad for security, job isolation (anybody
>>> > > > > > can submit/cancel jobs), and its implementation complicates some
>>> > > > > > parts of the code. So, if we were to redesign the WebUI, maybe this
>>> > > > > > part could be left out. In addition, I would say that the ability to
>>> > > > > > cancel jobs could also be left out.
>>> > > > > >
>>> > > > > > I would also be in favour of removing the "detached" mode, for the
>>> > > > > > reasons mentioned above (i.e. because now we will have a future
>>> > > > > > representing the result, on which the user can choose to wait or not).
>>> > > > > >
>>> > > > > > Now, for separating job submission and cluster creation, I am in
>>> > > > > > favour of keeping both. Once again, the reasons are mentioned above
>>> > > > > > by Stephan, Till, and Aljoscha, and Zili also seems to agree. They
>>> > > > > > mainly have to do with security, isolation, and ease of resource
>>> > > > > > management for the user, as he knows that "when my job is done,
>>> > > > > > everything will be cleared up". This is also the experience you get
>>> > > > > > when launching a process on your local OS.
>>> > > > > >
>>> > > > > > On excluding the per-job mode from returning a JobClient or not, I
>>> > > > > > believe that eventually it would be nice to allow users to get back
>>> > > > > > a JobClient. The reasons are that 1) I cannot find any objective
>>> > > > > > reason why the user experience should diverge, and 2) this will be
>>> > > > > > the way that the user will be able to interact with his running job.
>>> > > > > > Assuming that the necessary ports are open for the REST API to work,
>>> > > > > > I think that the JobClient can run against the REST API without
>>> > > > > > problems. If the needed ports are not open, then we are safe to not
>>> > > > > > return a JobClient, as the user explicitly chose to close all points
>>> > > > > > of communication to his running job.
>>> > > > > >
>>> > > > > > On the topic of not hijacking "env.execute()" in order to get the
>>> > > > > > Plan, I definitely agree, but for the proposal of having a
>>> > > > > > "compile()" method in the env, I would like to have a better look at
>>> > > > > > the existing code.
>>> > > > > >
>>> > > > > > Cheers,
>>> > > > > > Kostas
>>> > > > > >
>>> > > > > > On Fri, Aug 23, 2019 at 5:52 AM Zili Chen <wa...@gmail.com>
>>> > > > wrote:
>>> > > > > > >
>>> > > > > > > Hi Yang,
>>> > > > > > >
>>> > > > > > > It would be helpful if you check Stephan's last comment,
>>> > > > > > > which states that isolation is important.
>>> > > > > > >
>>> > > > > > > For per-job mode, we run a dedicated cluster(maybe it
>>> > > > > > > should have been a couple of JM and TMs during FLIP-6
>>> > > > > > > design) for a specific job. Thus the process is prevented
>>> > > > > > > from other jobs.
>>> > > > > > >
>>> > > > > > > In our cases there was a time we suffered from multi
>>> > > > > > > jobs submitted by different users and they affected
>>> > > > > > > each other so that all ran into an error state. Also,
>>> > > > > > > running the client inside the cluster could save client
>>> > > > > > > resources at some point.
>>> > > > > > >
>>> > > > > > > However, we also face several issues as you mentioned,
>>> > > > > > > that in per-job mode it always uses parent classloader
>>> > > > > > > thus classloading issues occur.
>>> > > > > > >
>>> > > > > > > BTW, one can make an analogy between session/per-job mode
>>> > > > > > > in Flink, and client/cluster mode in Spark.
>>> > > > > > >
>>> > > > > > > Best,
>>> > > > > > > tison.
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > Yang Wang <da...@gmail.com> 于2019年8月22日周四 上午11:25写道:
>>> > > > > > >
>>> > > > > > > > From the user's perspective, it is really confusing what the
>>> > > > > > > > scope of a per-job cluster is.
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > If it means a Flink cluster with a single job, so that we could
>>> > > > > > > > get better isolation.
>>> > > > > > > >
>>> > > > > > > > Now it does not matter how we deploy the cluster, directly
>>> > > > > > deploy(mode1)
>>> > > > > > > >
>>> > > > > > > > or start a Flink cluster and then submit the job through a cluster
>>> > > > > > client(mode2).
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > Otherwise, if it just means direct deployment, how should we
>>> > name the
>>> > > > > > mode2,
>>> > > > > > > >
>>> > > > > > > > session with job or something else?
>>> > > > > > > >
>>> > > > > > > > We could also benefit from the mode2. Users could get the same
>>> > > > > > isolation
>>> > > > > > > as with mode1.
>>> > > > > > > >
>>> > > > > > > > The user code and dependencies will be loaded by the user class
>>> > loader
>>> > > > > > > >
>>> > > > > > > > to avoid class conflicts with the framework.
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > Anyway, both of the two submission modes are useful.
>>> > > > > > > >
>>> > > > > > > > We just need to clarify the concepts.
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > Best,
>>> > > > > > > >
>>> > > > > > > > Yang
>>> > > > > > > >
>>> > > > > > > > Zili Chen <wa...@gmail.com> 于2019年8月20日周二 下午5:58写道:
>>> > > > > > > >
>>> > > > > > > > > Thanks for the clarification.
>>> > > > > > > > >
>>> > > > > > > > > The idea of a JobDeployer came to my mind when I was puzzling
>>> > > > > > > > > over how to execute per-job mode and session mode with the same
>>> > > > > > > > > user code and framework code path.
>>> > > > > > > > >
>>> > > > > > > > > With the JobDeployer concept, we are back to the statement that
>>> > > > > > environment
>>> > > > > > > > > knows every config of cluster deployment and job
>>> > submission. We
>>> > > > > > > > > configure or generate from configuration a specific
>>> > JobDeployer
>>> > > > in
>>> > > > > > > > > the environment, and then the code aligns on
>>> > > > > > > > >
>>> > > > > > > > > *JobClient client = env.execute().get();*
>>> > > > > > > > >
>>> > > > > > > > > which in session mode returned by clusterClient.submitJob
>>> > and in
>>> > > > > > per-job
>>> > > > > > > > > mode returned by clusterDescriptor.deployJobCluster.
>>> > > > > > > > >
>>> > > > > > > > > Here comes a problem that currently we directly run
>>> > > > > ClusterEntrypoint
>>> > > > > > > > > with the extracted job graph. Following the JobDeployer way
>>> > > > > > > > > we'd better align the entry point of per-job deployment at JobDeployer.
>>> > Users run
>>> > > > > > > > > their main method directly, or via a CLI (which finally calls the main method), to
>>> > deploy
>>> > > > > the
>>> > > > > > > > > job cluster.
>>> > > > > > > > >
>>> > > > > > > > > Best,
>>> > > > > > > > > tison.
>>> > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > > > Stephan Ewen <se...@apache.org> 于2019年8月20日周二 下午4:40写道:
>>> > > > > > > > >
>>> > > > > > > > > > Till has made some good comments here.
>>> > > > > > > > > >
>>> > > > > > > > > > Two things to add:
>>> > > > > > > > > >
>>> > > > > > > > > >   - The job mode is very nice in the way that it runs the
>>> > > > client
>>> > > > > > inside
>>> > > > > > > > > the
>>> > > > > > > > > > cluster (in the same image/process that is the JM) and thus
>>> > > > > unifies
>>> > > > > > > > both
>>> > > > > > > > > > applications and what the Spark world calls the "driver
>>> > mode".
>>> > > > > > > > > >
>>> > > > > > > > > >   - Another thing I would add is that during the FLIP-6
>>> > design,
>>> > > > > we
>>> > > > > > were
>>> > > > > > > > > > thinking about setups where Dispatcher and JobManager are
>>> > > > > separate
>>> > > > > > > > > > processes.
>>> > > > > > > > > >     A Yarn or Mesos Dispatcher of a session could run
>>> > > > > independently
>>> > > > > > > > (even
>>> > > > > > > > > > as privileged processes executing no code).
>>> > > > > > > > > >     Then the "per-job" mode could still be helpful:
>>> > when a
>>> > > > > job
>>> > > > > > is
>>> > > > > > > > > > submitted to the dispatcher, it launches the JM again in a
>>> > > > > per-job
>>> > > > > > > > mode,
>>> > > > > > > > > so
>>> > > > > > > > > > that JM and TM processes are bound to the job only. For
>>> > higher
>>> > > > > > security
>>> > > > > > > > > > setups, it is important that processes are not reused
>>> > across
>>> > > > > jobs.
>>> > > > > > > > > >
>>> > > > > > > > > > On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <
>>> > > > > > trohrmann@apache.org>
>>> > > > > > > > > > wrote:
>>> > > > > > > > > >
>>> > > > > > > > > > > I would not be in favour of getting rid of the per-job
>>> > mode
>>> > > > > > since it
>>> > > > > > > > > > > simplifies the process of running Flink jobs
>>> > considerably.
>>> > > > > > Moreover,
>>> > > > > > > > it
>>> > > > > > > > > > is
>>> > > > > > > > > > > not only well suited for container deployments but also
>>> > for
>>> > > > > > > > deployments
>>> > > > > > > > > > > where you want to guarantee job isolation. For example, a
>>> > > > user
>>> > > > > > could
>>> > > > > > > > > use
>>> > > > > > > > > > > the per-job mode on Yarn to execute his job on a separate
>>> > > > > > cluster.
>>> > > > > > > > > > >
>>> > > > > > > > > > > I think that having two notions of cluster deployments
>>> > > > (session
>>> > > > > > vs.
>>> > > > > > > > > > per-job
>>> > > > > > > > > > > mode) does not necessarily contradict your ideas for the
>>> > > > client
>>> > > > > > api
>>> > > > > > > > > > > refactoring. For example one could have the following
>>> > > > > interfaces:
>>> > > > > > > > > > >
>>> > > > > > > > > > > - ClusterDeploymentDescriptor: encapsulates the logic
>>> > how to
>>> > > > > > deploy a
>>> > > > > > > > > > > cluster.
>>> > > > > > > > > > > - ClusterClient: allows to interact with a cluster
>>> > > > > > > > > > > - JobClient: allows to interact with a running job
>>> > > > > > > > > > >
>>> > > > > > > > > > > Now the ClusterDeploymentDescriptor could have two
>>> > methods:
>>> > > > > > > > > > >
>>> > > > > > > > > > > - ClusterClient deploySessionCluster()
>>> > > > > > > > > > > - JobClusterClient/JobClient
>>> > deployPerJobCluster(JobGraph)
>>> > > > > > > > > > >
>>> > > > > > > > > > > where JobClusterClient is either a supertype of
>>> > ClusterClient
>>> > > > > > which
>>> > > > > > > > > does
>>> > > > > > > > > > > not give you the functionality to submit jobs or
>>> > > > > > deployPerJobCluster
>>> > > > > > > > > > > returns directly a JobClient.
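The split described above can be sketched in code. This is an editorial illustration of the proposal, not Flink's actual API: the method shapes mirror the email, while the in-memory bodies are hypothetical stand-ins.

```java
// Handle for interacting with a running job.
interface JobClient {
    String jobId();
}

// Handle for interacting with a cluster (session mode can submit jobs).
interface ClusterClient {
    JobClient submitJob(String jobGraph);
}

// Encapsulates how to deploy a cluster, with the two methods proposed above.
interface ClusterDeploymentDescriptor {
    ClusterClient deploySessionCluster();
    JobClient deployPerJobCluster(String jobGraph);
}

// A trivial in-memory implementation to show the two code paths.
class InMemoryDescriptor implements ClusterDeploymentDescriptor {
    public ClusterClient deploySessionCluster() {
        return jobGraph -> () -> "session:" + jobGraph;
    }
    public JobClient deployPerJobCluster(String jobGraph) {
        return () -> "per-job:" + jobGraph;
    }
}

public class DescriptorSketch {
    public static void main(String[] args) {
        ClusterDeploymentDescriptor d = new InMemoryDescriptor();
        // Session mode: deploy a cluster once, then submit through the client.
        System.out.println(d.deploySessionCluster().submitJob("wc").jobId());
        // Per-job mode: deploying the cluster *is* the submission.
        System.out.println(d.deployPerJobCluster("wc").jobId());
    }
}
```

Note how per-job deployment never exposes a job-submitting ClusterClient, which matches the isolation argument made elsewhere in this thread.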
>>> > > > > > > > > > >
>>> > > > > > > > > > > When setting up the ExecutionEnvironment, one would then
>>> > not
>>> > > > > > provide
>>> > > > > > > > a
>>> > > > > > > > > > > ClusterClient to submit jobs but a JobDeployer which,
>>> > > > depending
>>> > > > > > on
>>> > > > > > > > the
>>> > > > > > > > > > > selected mode, either uses a ClusterClient (session
>>> > mode) to
>>> > > > > > submit
>>> > > > > > > > > jobs
>>> > > > > > > > > > or
>>> > > > > > > > > > > a ClusterDeploymentDescriptor to deploy a per-job mode
>>> > > > cluster
>>> > > > > > with
>>> > > > > > > > the
>>> > > > > > > > > > job
>>> > > > > > > > > > > to execute.
>>> > > > > > > > > > >
>>> > > > > > > > > > > These are just some thoughts how one could make it
>>> > working
>>> > > > > > because I
>>> > > > > > > > > > > believe there is some value in using the per job mode
>>> > from
>>> > > > the
>>> > > > > > > > > > > ExecutionEnvironment.
>>> > > > > > > > > > >
>>> > > > > > > > > > > Concerning the web submission, this is indeed a bit
>>> > tricky.
>>> > > > > From
>>> > > > > > a
>>> > > > > > > > > > cluster
>>> > > > > > > > > > > management standpoint, I would be in favour of not
>>> > executing
>>> > > > user
>>> > > > > > code
>>> > > > > > > > on
>>> > > > > > > > > > the
>>> > > > > > > > > > > REST endpoint. Especially when considering security, it
>>> > would
>>> > > > > be
>>> > > > > > good
>>> > > > > > > > > to
>>> > > > > > > > > > > have a well defined cluster behaviour where it is
>>> > explicitly
>>> > > > > > stated
>>> > > > > > > > > where
>>> > > > > > > > > > > user code and, thus, potentially risky code is executed.
>>> > > > > Ideally
>>> > > > > > we
>>> > > > > > > > > limit
>>> > > > > > > > > > > it to the TaskExecutor and JobMaster.
>>> > > > > > > > > > >
>>> > > > > > > > > > > Cheers,
>>> > > > > > > > > > > Till
>>> > > > > > > > > > >
>>> > > > > > > > > > > On Tue, Aug 20, 2019 at 9:40 AM Flavio Pompermaier <
>>> > > > > > > > > pompermaier@okkam.it
>>> > > > > > > > > > >
>>> > > > > > > > > > > wrote:
>>> > > > > > > > > > >
>>> > > > > > > > > > > > In my opinion the client should not use any
>>> > environment to
>>> > > > > get
>>> > > > > > the
>>> > > > > > > > > Job
>>> > > > > > > > > > > > graph because the jar should reside ONLY on the cluster
>>> > > > (and
>>> > > > > > not in
>>> > > > > > > > > the
>>> > > > > > > > > > > > client classpath otherwise there are always
>>> > inconsistencies
>>> > > > > > between
>>> > > > > > > > > > > client
>>> > > > > > > > > > > > and Flink Job manager's classpath).
>>> > > > > > > > > > > > In the YARN, Mesos and Kubernetes scenarios you have
>>> > the
>>> > > > jar
>>> > > > > > but
>>> > > > > > > > you
>>> > > > > > > > > > > could
>>> > > > > > > > > > > > start a cluster that has the jar on the Job Manager as
>>> > well
>>> > > > > > (but
>>> > > > > > > > this
>>> > > > > > > > > > is
>>> > > > > > > > > > > > the only case where I think you can assume that the
>>> > client
>>> > > > > has
>>> > > > > > the
>>> > > > > > > > > jar
>>> > > > > > > > > > on
>>> > > > > > > > > > > > the classpath..in the REST job submission you don't
>>> > have
>>> > > > any
>>> > > > > > > > > > classpath).
>>> > > > > > > > > > > >
>>> > > > > > > > > > > > Thus, always in my opinion, the JobGraph should be
>>> > > > generated
>>> > > > > > by the
>>> > > > > > > > > Job
>>> > > > > > > > > > > > Manager REST API.
>>> > > > > > > > > > > >
>>> > > > > > > > > > > >
>>> > > > > > > > > > > > On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <
>>> > > > > > wander4096@gmail.com>
>>> > > > > > > > > > wrote:
>>> > > > > > > > > > > >
>>> > > > > > > > > > > >> I would like to involve Till & Stephan here to clarify
>>> > > > some
>>> > > > > > > > concept
>>> > > > > > > > > of
>>> > > > > > > > > > > >> per-job mode.
>>> > > > > > > > > > > >>
>>> > > > > > > > > > > >> The term per-job is one of the modes a cluster could run
>>> > > > > > > > > > > >> in. It is mainly aimed at spawning a dedicated cluster for
>>> > > > > > > > > > > >> a specific job, while the job could be packaged with Flink
>>> > > > > > > > > > > >> itself and thus the cluster initialized with the job, so
>>> > > > > > > > > > > >> that we get rid of a separate submission step.
>>> > > > > > > > > > > >>
>>> > > > > > > > > > > >> This is useful for container deployments where one creates
>>> > > > > > > > > > > >> his image with the job and then simply deploys the container.
>>> > > > > > > > > > > >>
>>> > > > > > > > > > > >> However, it is out of client scope, since a
>>> > > > > > > > > > > >> client (ClusterClient for example) is for
>>> > > > > > > > > > > >> communicating with an existing cluster and performing
>>> > > > > > > > > > > >> actions. Currently, in per-job mode, we extract the job
>>> > > > > > > > > > > >> graph and bundle it into the cluster deployment, and thus
>>> > > > > > > > > > > >> no concept of client gets involved. It looks reasonable to
>>> > > > > > > > > > > >> exclude the deployment of a per-job cluster from the client
>>> > > > > > > > > > > >> api and use dedicated utility classes (deployers) for
>>> > > > > > > > > > > >> deployment.
>>> > > > > > > > > > > >>
>>> > > > > > > > > > > >> Zili Chen <wa...@gmail.com> 于2019年8月20日周二
>>> > 下午12:37写道:
>>> > > > > > > > > > > >>
>>> > > > > > > > > > > >> > Hi Aljoscha,
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > Thanks for your reply and participation. The Google
>>> > Doc
>>> > > > you
>>> > > > > > > > linked
>>> > > > > > > > > to
>>> > > > > > > > > > > >> > requires
>>> > > > > > > > > > > >> > permission and I think you could use a share link
>>> > > > instead.
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > I agree that we have almost reached a consensus that
>>> > > > > > > > > > > >> > JobClient is necessary to interact with a running Job.
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > Let me check your open questions one by one.
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > 1. Separate cluster creation and job submission for
>>> > > > > per-job
>>> > > > > > > > mode.
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > As you mentioned here is where the opinions
>>> > diverge. In
>>> > > > my
>>> > > > > > > > > document
>>> > > > > > > > > > > >> there
>>> > > > > > > > > > > >> > is
>>> > > > > > > > > > > >> > an alternative[2] that proposes excluding per-job
>>> > > > > deployment
>>> > > > > > > > from
>>> > > > > > > > > > > client
>>> > > > > > > > > > > >> > api
>>> > > > > > > > > > > >> > scope and now I find it is more reasonable we do the
>>> > > > > > exclusion.
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > When in per-job mode, a dedicated JobCluster is
>>> > launched
>>> > > > > to
>>> > > > > > > > > execute
>>> > > > > > > > > > > the
>>> > > > > > > > > > > >> > specific job. It is like a Flink Application more
>>> > than a
>>> > > > > > > > > submission
>>> > > > > > > > > > > >> > of Flink Job. Client only takes care of job
>>> > submission
>>> > > > and
>>> > > > > > > > assume
>>> > > > > > > > > > > there
>>> > > > > > > > > > > >> is
>>> > > > > > > > > > > >> > an existing cluster. In this way we are able to
>>> > consider
>>> > > > > > per-job
>>> > > > > > > > > > > issues
>>> > > > > > > > > > > >> > individually and JobClusterEntrypoint would be the
>>> > > > utility
>>> > > > > > class
>>> > > > > > > > > for
>>> > > > > > > > > > > >> > per-job
>>> > > > > > > > > > > >> > deployment.
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > Nevertheless, user program works in both session
>>> > mode
>>> > > > and
>>> > > > > > > > per-job
>>> > > > > > > > > > mode
>>> > > > > > > > > > > >> > without
>>> > > > > > > > > > > >> > necessary to change code. JobClient in per-job mode
>>> > is
>>> > > > > > returned
>>> > > > > > > > > from
>>> > > > > > > > > > > >> > env.execute as normal. However, it would be no
>>> > longer a
>>> > > > > > wrapper
>>> > > > > > > > of
>>> > > > > > > > > > > >> > RestClusterClient but a wrapper of
>>> > PerJobClusterClient
>>> > > > > which
>>> > > > > > > > > > > >> communicates
>>> > > > > > > > > > > >> > to Dispatcher locally.
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > 2. How to deal with plan preview.
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > With env.compile functions users can get JobGraph or
>>> > > > > > FlinkPlan
>>> > > > > > > > and
>>> > > > > > > > > > > thus
>>> > > > > > > > > > > >> > they can preview the plan with programming.
>>> > Typically it
>>> > > > > > looks
>>> > > > > > > > > like
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > if (previewConfigured) {
>>> > > > > > > > > > > >> >     FlinkPlan plan = env.compile();
>>> > > > > > > > > > > >> >     new JSONDumpGenerator(...).dump(plan);
>>> > > > > > > > > > > >> > } else {
>>> > > > > > > > > > > >> >     env.execute();
>>> > > > > > > > > > > >> > }
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > And `flink info` would no longer be valid.
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > 3. How to deal with Jar Submission at the Web
>>> > Frontend.
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > There is one more thread discussing this topic[1].
>>> > Apart
>>> > > > > from
>>> > > > > > > > > > removing
>>> > > > > > > > > > > >> > the functions there are two alternatives.
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > One is to introduce an interface that has a method
>>> > > > > > > > > > > >> > returning JobGraph/FlinkPlan, and Jar Submission would only
>>> > > > > > > > > > > >> > support main classes implementing this interface.
>>> > > > > > > > > > > >> > And then extract the JobGraph/FlinkPlan just by
>>> > calling
>>> > > > > the
>>> > > > > > > > > method.
>>> > > > > > > > > > > >> > In this way, it is even possible to consider a
>>> > > > separation
>>> > > > > > of job
>>> > > > > > > > > > > >> creation
>>> > > > > > > > > > > >> > and job submission.
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > The other is, as you mentioned, let execute() do the
>>> > > > > actual
>>> > > > > > > > > > execution.
>>> > > > > > > > > > > >> > We won't execute the main method in the WebFrontend
>>> > but
>>> > > > > > spawn a
>>> > > > > > > > > > > process
>>> > > > > > > > > > > >> > at WebMonitor side to execute. For return part we
>>> > could
>>> > > > > > generate
>>> > > > > > > > > the
>>> > > > > > > > > > > >> > JobID from WebMonitor and pass it to the execution
>>> > > > > > environment.
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > 4. How to deal with detached mode.
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > I think detached mode is a temporary solution for
>>> > > > > > non-blocking
>>> > > > > > > > > > > >> submission.
>>> > > > > > > > > > > >> > In my document both submission and execution return
>>> > a
>>> > > > > > > > > > > CompletableFuture
>>> > > > > > > > > > > >> and
>>> > > > > > > > > > > >> > users control whether or not to wait for the result. At
>>> > > > > > > > > > > >> > this point we don't need a detached option, but the
>>> > > > > > > > > > > >> > functionality is covered.
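The future-based submission described here can be sketched briefly. This is an editorial illustration of the idea, assuming a hypothetical `submitJob` stand-in rather than Flink's real client: the "detached" flag disappears because the caller decides whether to block.

```java
import java.util.concurrent.CompletableFuture;

public class DetachedSketch {
    // Stand-in for an asynchronous job submission; a real client would
    // wrap the REST call and complete the future with the job result.
    static CompletableFuture<String> submitJob(String jobGraph) {
        return CompletableFuture.supplyAsync(() -> "result-of-" + jobGraph);
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<String> result = submitJob("wordcount");
        // "Attached" behaviour: explicitly wait for the result.
        System.out.println(result.get());
        // "Detached" behaviour would simply not call get() and return,
        // so no separate submission mode is needed.
    }
}
```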
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > 5. How does per-job mode interact with interactive
>>> > > > > > programming.
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > All of YARN, Mesos and Kubernetes scenarios follow the
>>> > > > > > > > > > > >> > pattern of launching a
>>> > > > > > > > > > > >> > JobCluster now. And I don't think there would be
>>> > > > > > inconsistency
>>> > > > > > > > > > between
>>> > > > > > > > > > > >> > different resource management.
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > Best,
>>> > > > > > > > > > > >> > tison.
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > [1]
>>> > https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
>>> > > > > > > > > > > >> > [2]
>>> > https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > Aljoscha Krettek <al...@apache.org>
>>> > 于2019年8月16日周五
>>> > > > > > 下午9:20写道:
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> >> Hi,
>>> > > > > > > > > > > >> >>
>>> > > > > > > > > > > >> >> I read both Jeffs initial design document and the
>>> > newer
>>> > > > > > > > document
>>> > > > > > > > > by
>>> > > > > > > > > > > >> >> Tison. I also finally found the time to collect our
>>> > > > > > thoughts on
>>> > > > > > > > > the
>>> > > > > > > > > > > >> issue,
>>> > > > > > > > > > > >> >> I had quite some discussions with Kostas and this
>>> > is
>>> > > > the
>>> > > > > > > > result:
>>> > > > > > > > > > [1].
>>> > > > > > > > > > > >> >>
>>> > > > > > > > > > > >> >> I think overall we agree that this part of the
>>> > code is
>>> > > > in
>>> > > > > > dire
>>> > > > > > > > > need
>>> > > > > > > > > > > of
>>> > > > > > > > > > > >> >> some refactoring/improvements but I think there are
>>> > > > still
>>> > > > > > some
>>> > > > > > > > > open
>>> > > > > > > > > > > >> >> questions and some differences in opinion what
>>> > those
>>> > > > > > > > refactorings
>>> > > > > > > > > > > >> should
>>> > > > > > > > > > > >> >> look like.
>>> > > > > > > > > > > >> >>
>>> > > > > > > > > > > >> >> I think the API-side is quite clear, i.e. we need
>>> > some
>>> > > > > > > > JobClient
>>> > > > > > > > > > API
>>> > > > > > > > > > > >> that
>>> > > > > > > > > > > >> >> allows interacting with a running Job. It could be
>>> > > > > > worthwhile
>>> > > > > > > > to
>>> > > > > > > > > > spin
>>> > > > > > > > > > > >> that
>>> > > > > > > > > > > >> >> off into a separate FLIP because we can probably
>>> > find
>>> > > > > > consensus
>>> > > > > > > > > on
>>> > > > > > > > > > > that
>>> > > > > > > > > > > >> >> part more easily.
>>> > > > > > > > > > > >> >>
>>> > > > > > > > > > > >> >> For the rest, the main open questions from our doc
>>> > are
>>> > > > > > these:
>>> > > > > > > > > > > >> >>
>>> > > > > > > > > > > >> >>   - Do we want to separate cluster creation and job
>>> > > > > > submission
>>> > > > > > > > > for
>>> > > > > > > > > > > >> >> per-job mode? In the past, there were conscious
>>> > efforts
>>> > > > > to
>>> > > > > > > > *not*
>>> > > > > > > > > > > >> separate
>>> > > > > > > > > > > >> >> job submission from cluster creation for per-job
>>> > > > clusters
>>> > > > > > for
>>> > > > > > > > > > Mesos,
>>> > > > > > > > > > > >> YARN,
>>> > > > > > > > > > > >> >> Kubernetes (see StandaloneJobClusterEntryPoint). Tison
>>> > Tison
>>> > > > > > suggests
>>> > > > > > > > in
>>> > > > > > > > > > his
>>> > > > > > > > > > > >> >> design document to decouple this in order to unify
>>> > job
>>> > > > > > > > > submission.
>>> > > > > > > > > > > >> >>
>>> > > > > > > > > > > >> >>   - How to deal with plan preview, which needs to
>>> > > > hijack
>>> > > > > > > > > execute()
>>> > > > > > > > > > > and
>>> > > > > > > > > > > >> >> let the outside code catch an exception?
>>> > > > > > > > > > > >> >>
>>> > > > > > > > > > > >> >>   - How to deal with Jar Submission at the Web
>>> > > > Frontend,
>>> > > > > > which
>>> > > > > > > > > > needs
>>> > > > > > > > > > > to
>>> > > > > > > > > > > >> >> hijack execute() and let the outside code catch an
>>> > > > > > exception?
>>> > > > > > > > > > > >> >> CliFrontend.run() “hijacks”
>>> > > > > ExecutionEnvironment.execute()
>>> > > > > > to
>>> > > > > > > > > get a
>>> > > > > > > > > > > >> >> JobGraph and then execute that JobGraph manually.
>>> > We
>>> > > > > could
>>> > > > > > get
>>> > > > > > > > > > around
>>> > > > > > > > > > > >> that
>>> > > > > > > > > > > >> >> by letting execute() do the actual execution. One
>>> > > > caveat
>>> > > > > > for
>>> > > > > > > > this
>>> > > > > > > > > > is
>>> > > > > > > > > > > >> that
>>> > > > > > > > > > > >> >> now the main() method doesn’t return (or is forced
>>> > to
>>> > > > > > return by
>>> > > > > > > > > > > >> throwing an
>>> > > > > > > > > > > >> >> exception from execute()) which means that for Jar
>>> > > > > > Submission
>>> > > > > > > > > from
>>> > > > > > > > > > > the
>>> > > > > > > > > > > >> >> WebFrontend we have a long-running main() method
>>> > > > running
>>> > > > > > in the
>>> > > > > > > > > > > >> >> WebFrontend. This doesn’t sound very good. We
>>> > could get
>>> > > > > > around
>>> > > > > > > > > this
>>> > > > > > > > > > > by
>>> > > > > > > > > > > >> >> removing the plan preview feature and by removing
>>> > Jar
>>> > > > > > > > > > > >> Submission/Running.
>>> > > > > > > > > > > >> >>
>>> > > > > > > > > > > >> >>   - How to deal with detached mode? Right now,
>>> > > > > > > > > DetachedEnvironment
>>> > > > > > > > > > > will
>>> > > > > > > > > > > >> >> execute the job and return immediately. If users
>>> > > > control
>>> > > > > > when
>>> > > > > > > > > they
>>> > > > > > > > > > > >> want to
>>> > > > > > > > > > > >> >> return, by waiting on the job completion future,
>>> > how do
>>> > > > > we
>>> > > > > > deal
>>> > > > > > > > > > with
>>> > > > > > > > > > > >> this?
>>> > > > > > > > > > > >> >> Do we simply remove the distinction between
>>> > > > > > > > > detached/non-detached?
>>> > > > > > > > > > > >> >>
>>> > > > > > > > > > > >> >>   - How does per-job mode interact with
>>> > “interactive
>>> > > > > > > > programming”
>>> > > > > > > > > > > >> >> (FLIP-36). For YARN, each execute() call could
>>> > spawn a
>>> > > > > new
>>> > > > > > > > Flink
>>> > > > > > > > > > YARN
>>> > > > > > > > > > > >> >> cluster. What about Mesos and Kubernetes?
>>> > > > > > > > > > > >> >>
>>> > > > > > > > > > > >> >> The first open question is where the opinions
>>> > diverge,
>>> > > > I
>>> > > > > > think.
>>> > > > > > > > > The
>>> > > > > > > > > > > >> rest
>>> > > > > > > > > > > >> >> are just open questions and interesting things
>>> > that we
>>> > > > > > need to
>>> > > > > > > > > > > >> consider.
>>> > > > > > > > > > > >> >>
>>> > > > > > > > > > > >> >> Best,
>>> > > > > > > > > > > >> >> Aljoscha
>>> > > > > > > > > > > >> >>
>>> > > > > > > > > > > >> >> [1]
>>> > https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
>>> > > > > > > > > > > >> >>
>>> > > > > > > > > > > >> >> > On 31. Jul 2019, at 15:23, Jeff Zhang <
>>> > > > > zjffdu@gmail.com>
>>> > > > > > > > > wrote:
>>> > > > > > > > > > > >> >> >
>>> > > > > > > > > > > >> >> > Thanks tison for the effort. I left a few
>>> > comments.
>>> > > > > > > > > > > >> >> >
>>> > > > > > > > > > > >> >> >
>>> > > > > > > > > > > >> >> > Zili Chen <wa...@gmail.com> 于2019年7月31日周三
>>> > > > > 下午8:24写道:
>>> > > > > > > > > > > >> >> >
>>> > > > > > > > > > > >> >> >> Hi Flavio,
>>> > > > > > > > > > > >> >> >>
>>> > > > > > > > > > > >> >> >> Thanks for your reply.
>>> > > > > > > > > > > >> >> >>
>>> > > > > > > > > > > >> >> >> Either current impl and in the design,
>>> > ClusterClient
>>> > > > > > > > > > > >> >> >> never takes responsibility for generating
>>> > JobGraph.
>>> > > > > > > > > > > >> >> >> (what you see in current codebase is several
>>> > class
>>> > > > > > methods)
>>> > > > > > > > > > > >> >> >>
>>> > > > > > > > > > > >> >> >> Instead, user describes his program in the main
>>> > > > method
>>> > > > > > > > > > > >> >> >> with ExecutionEnvironment apis and calls
>>> > > > env.compile()
>>> > > > > > > > > > > >> >> >> or env.optimize() to get FlinkPlan and JobGraph
>>> > > > > > > > respectively.
>>> > > > > > > > > > > >> >> >>
>>> > > > > > > > > > > >> >> >> For listing main classes in a jar and choosing one for
>>> > > > > > > > > > > >> >> >> submission, you're now able to customize a CLI to do it.
>>> > > > > > > > > > > >> >> >> Specifically, the path of the jar is passed as an
>>> > > > > > > > > > > >> >> >> argument, and in the customized CLI you list the main
>>> > > > > > > > > > > >> >> >> classes and choose one to submit to the cluster.
>>> > > > > > > > > > > >> >> >>
>>> > > > > > > > > > > >> >> >> Best,
>>> > > > > > > > > > > >> >> >> tison.
>>> > > > > > > > > > > >> >> >>
>>> > > > > > > > > > > >> >> >>
>>> > > > > > > > > > > >> >> >> Flavio Pompermaier <po...@okkam.it>
>>> > > > > 于2019年7月31日周三
>>> > > > > > > > > > 下午8:12写道:
>>> > > > > > > > > > > >> >> >>
>>> > > > > > > > > > > >> >> >>> Just one note on my side: it is not clear to me
>>> > > > > > whether the
>>> > > > > > > > > > > client
>>> > > > > > > > > > > >> >> needs
>>> > > > > > > > > > > >> >> >> to
>>> > > > > > > > > > > >> >> >>> be able to generate a job graph or not.
>>> > > > > > > > > > > >> >> >>> In my opinion, the job jar must reside only
>>> > on the
>>> > > > > > > > > > > >> server/jobManager
>>> > > > > > > > > > > >> >> >> side
>>> > > > > > > > > > > >> >> >>> and the client requires a way to get the job
>>> > graph.
>>> > > > > > > > > > > >> >> >>> If you really want to access to the job graph,
>>> > I'd
>>> > > > > add
>>> > > > > > a
>>> > > > > > > > > > > dedicated
>>> > > > > > > > > > > >> >> method
>>> > > > > > > > > > > >> >> >>> on the ClusterClient. like:
>>> > > > > > > > > > > >> >> >>>
>>> > > > > > > > > > > >> >> >>>   - getJobGraph(jarId, mainClass): JobGraph
>>> > > > > > > > > > > >> >> >>>   - listMainClasses(jarId): List<String>
>>> > > > > > > > > > > >> >> >>>
>>> > > > > > > > > > > >> >> >>> These would require some additions on the job
>>> > > > > > > > > > > >> >> >>> manager endpoint as well... what do you think?
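A minimal sketch of the two proposed methods; only the method names come from the mail, while the JobGraph stub and the fake client are invented for illustration:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the two proposed ClusterClient additions. Only the method
// names follow the proposal; everything else here is a stand-in.
public class ClusterClientAdditions {

    static final class JobGraph {
        final String mainClass;
        JobGraph(String mainClass) { this.mainClass = mainClass; }
    }

    interface ClusterClient {
        JobGraph getJobGraph(String jarId, String mainClass);
        List<String> listMainClasses(String jarId);
    }

    // A fake client standing in for a REST-backed implementation, where
    // the jar resides on the jobmanager side and the client only asks
    // for the compiled job graph.
    static ClusterClient fakeClient() {
        return new ClusterClient() {
            public JobGraph getJobGraph(String jarId, String mainClass) {
                return new JobGraph(mainClass);
            }
            public List<String> listMainClasses(String jarId) {
                return Arrays.asList("org.example.WordCount", "org.example.PageRank");
            }
        };
    }

    public static void main(String[] args) {
        ClusterClient client = fakeClient();
        System.out.println(client.listMainClasses("jar-1"));
        System.out.println(client.getJobGraph("jar-1", "org.example.WordCount").mainClass);
    }
}
```

With this shape, the jar never needs to be on the client machine; the client works against jar ids held by the job manager.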
>>> > > > > > > > > > > >> >> >>>
>>> > > > > > > > > > > >> >> >>> On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <
>>> > > > > > > > > > wander4096@gmail.com
>>> > > > > > > > > > > >
>>> > > > > > > > > > > >> >> wrote:
>>> > > > > > > > > > > >> >> >>>
>>> > > > > > > > > > > >> >> >>>> Hi all,
>>> > > > > > > > > > > >> >> >>>>
>>> > > > > > > > > > > >> >> >>>> Here is a document[1] on client api
>>> > enhancement
>>> > > > from
>>> > > > > > our
>>> > > > > > > > > > > >> perspective.
>>> > > > > > > > > > > >> >> >>>> We have investigated current implementations.
>>> > And
>>> > > > we
>>> > > > > > > > propose
>>> > > > > > > > > > > >> >> >>>>
>>> > > > > > > > > > > >> >> >>>> 1. Unify the implementation of cluster
>>> > deployment
>>> > > > > and
>>> > > > > > job
>>> > > > > > > > > > > >> submission
>>> > > > > > > > > > > >> >> in
>>> > > > > > > > > > > >> >> >>>> Flink.
>>> > > > > > > > > > > >> >> >>>> 2. Provide programmatic interfaces to allow
>>> > > > flexible
>>> > > > > > job
>>> > > > > > > > and
>>> > > > > > > > > > > >> cluster
>>> > > > > > > > > > > >> >> >>>> management.
>>> > > > > > > > > > > >> >> >>>>
>>> > > > > > > > > > > >> >> >>>> The first proposal is aimed at reducing the code
>>> > > > > > > > > > > >> >> >>>> paths of cluster deployment and job submission so
>>> > > > > > > > > > > >> >> >>>> that one can adopt Flink easily. The second
>>> > > > > > > > > > > >> >> >>>> proposal is aimed at providing rich interfaces for
>>> > > > > > > > > > > >> >> >>>> advanced users who want fine-grained control of
>>> > > > > > > > > > > >> >> >>>> these stages.
>>> > > > > > > > > > > >> >> >>>>
>>> > > > > > > > > > > >> >> >>>> Quick reference on open questions:
>>> > > > > > > > > > > >> >> >>>>
>>> > > > > > > > > > > >> >> >>>> 1. Exclude job cluster deployment from the client
>>> > > > > > > > > > > >> >> >>>> side, or redefine the semantics of a job cluster?
>>> > > > > > > > > > > >> >> >>>> Its deployment process is quite different from
>>> > > > > > > > > > > >> >> >>>> session cluster deployment and job submission.
>>> > > > > > > > > > > >> >> >>>>
>>> > > > > > > > > > > >> >> >>>> 2. Maintain the codepaths handling class
>>> > > > > > > > > > > o.a.f.api.common.Program
>>> > > > > > > > > > > >> or
>>> > > > > > > > > > > >> >> >>>> implement customized program handling logic by
>>> > > > > > customized
>>> > > > > > > > > > > >> >> CliFrontend?
>>> > > > > > > > > > > >> >> >>>> See also this thread[2] and the document[1].
>>> > > > > > > > > > > >> >> >>>>
>>> > > > > > > > > > > >> >> >>>> 3. Expose ClusterClient as a public api, or only
>>> > > > > > > > > > > >> >> >>>> expose the api in ExecutionEnvironment and delegate
>>> > > > > > > > > > > >> >> >>>> to ClusterClient? Further, in either case, is it
>>> > > > > > > > > > > >> >> >>>> worth introducing a JobClient, an encapsulation of
>>> > > > > > > > > > > >> >> >>>> ClusterClient that is associated with a specific
>>> > > > > > > > > > > >> >> >>>> job?
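The JobClient idea in open question 3 can be sketched as a handle bound to one job that delegates to a ClusterClient; all names and methods below are illustrative, not an existing API:

```java
// Sketch of the JobClient idea: a per-job handle that delegates to a
// ClusterClient. All names and methods here are illustrative.
public class JobClientSketch {

    interface ClusterClient {
        String getStatus(String jobId);
        void cancel(String jobId);
    }

    // A JobClient is just a ClusterClient pinned to a specific job id,
    // so callers manage "this job" instead of "some job on this cluster".
    static final class JobClient {
        private final ClusterClient cluster;
        private final String jobId;

        JobClient(ClusterClient cluster, String jobId) {
            this.cluster = cluster;
            this.jobId = jobId;
        }

        String status() { return cluster.getStatus(jobId); }
        void cancel() { cluster.cancel(jobId); }
    }

    static String demo() {
        ClusterClient fake = new ClusterClient() {
            public String getStatus(String id) { return "RUNNING:" + id; }
            public void cancel(String id) { /* no-op for the sketch */ }
        };
        return new JobClient(fake, "job-42").status();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A submission call would naturally return such a handle, which sidesteps the question of exposing the full ClusterClient to users.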
>>> > > > > > > > > > > >> >> >>>>
>>> > > > > > > > > > > >> >> >>>> Best,
>>> > > > > > > > > > > >> >> >>>> tison.
>>> > > > > > > > > > > >> >> >>>>
>>> > > > > > > > > > > >> >> >>>> [1]
>>> > > > > > > > > > > >> >> >>>>
>>> > > > > > > > > > > >> >> >>>>
>>> > > > > > > > > > > >> >> >>>
>>> > > > > > > > > > > >> >> >>
>>> > > > > > > > > > > >> >>
>>> > > > > > > > > > > >>
>>> > > > > > > > > > >
>>> > > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
>>> > > > > > > > > > > >> >> >>>> [2]
>>> > > > > > > > > > > >> >> >>>>
>>> > > > > > > > > > > >> >> >>>>
>>> > > > > > > > > > > >> >> >>>
>>> > > > > > > > > > > >> >> >>
>>> > > > > > > > > > > >> >>
>>> > > > > > > > > > > >>
>>> > > > > > > > > > >
>>> > > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
>>> > > > > > > > > > > >> >> >>>>
>>> > > > > > > > > > > >> >> >>>> Jeff Zhang <zj...@gmail.com> 于2019年7月24日周三
>>> > > > > 上午9:19写道:
>>> > > > > > > > > > > >> >> >>>>
>>> > > > > > > > > > > >> >> >>>>> Thanks Stephan, I will follow up this issue
>>> > in
>>> > > > next
>>> > > > > > few
>>> > > > > > > > > > weeks,
>>> > > > > > > > > > > >> and
>>> > > > > > > > > > > >> >> >> will
>>> > > > > > > > > > > >> >> >>>>> refine the design doc. We could discuss more
>>> > > > > details
>>> > > > > > > > after
>>> > > > > > > > > > 1.9
>>> > > > > > > > > > > >> >> >> release.
>>> > > > > > > > > > > >> >> >>>>>
>>> > > > > > > > > > > >> >> >>>>> Stephan Ewen <se...@apache.org>
>>> > 于2019年7月24日周三
>>> > > > > > 上午12:58写道:
>>> > > > > > > > > > > >> >> >>>>>
>>> > > > > > > > > > > >> >> >>>>>> Hi all!
>>> > > > > > > > > > > >> >> >>>>>>
>>> > > > > > > > > > > >> >> >>>>>> This thread has stalled for a bit, which I
>>> > > > > > > > > > > >> >> >>>>>> assume is mostly due to the Flink 1.9 feature
>>> > > > > > > > > > > >> >> >>>>>> freeze and release testing effort.
>>> > > > > > > > > > > >> >> >>>>>>
>>> > > > > > > > > > > >> >> >>>>>> I personally still recognize this issue as
>>> > one
>>> > > > > > important
>>> > > > > > > > > to
>>> > > > > > > > > > be
>>> > > > > > > > > > > >> >> >>> solved.
>>> > > > > > > > > > > >> >> >>>>> I'd
>>> > > > > > > > > > > >> >> >>>>>> be happy to help resume this discussion soon
>>> > > > > (after
>>> > > > > > the
>>> > > > > > > > > 1.9
>>> > > > > > > > > > > >> >> >> release)
>>> > > > > > > > > > > >> >> >>>> and
>>> > > > > > > > > > > >> >> >>>>>> see if we can take some steps towards this in
>>> > > > > > > > > > > >> >> >>>>>> Flink 1.10.
>>> > > > > > > > > > > >> >> >>>>>>
>>> > > > > > > > > > > >> >> >>>>>> Best,
>>> > > > > > > > > > > >> >> >>>>>> Stephan
>>> > > > > > > > > > > >> >> >>>>>>
>>> > > > > > > > > > > >> >> >>>>>>
>>> > > > > > > > > > > >> >> >>>>>>
>>> > > > > > > > > > > >> >> >>>>>> On Mon, Jun 24, 2019 at 10:41 AM Flavio
>>> > > > > Pompermaier
>>> > > > > > <
>>> > > > > > > > > > > >> >> >>>>> pompermaier@okkam.it>
>>> > > > > > > > > > > >> >> >>>>>> wrote:
>>> > > > > > > > > > > >> >> >>>>>>
>>> > > > > > > > > > > >> >> >>>>>>> That's exactly what I suggested a long time
>>> > > > ago:
>>> > > > > > the
>>> > > > > > > > > Flink
>>> > > > > > > > > > > REST
>>> > > > > > > > > > > >> >> >>>> client
>>> > > > > > > > > > > >> >> >>>>>>> should not require any Flink dependency,
>>> > only
>>> > > > > http
>>> > > > > > > > > library
>>> > > > > > > > > > to
>>> > > > > > > > > > > >> >> >> call
>>> > > > > > > > > > > >> >> >>>> the
>>> > > > > > > > > > > >> >> >>>>>> REST
>>> > > > > > > > > > > >> >> >>>>>>> services to submit and monitor a job.
>>> > > > > > > > > > > >> >> >>>>>>> What I suggested also in [1] was to have a
>>> > way
>>> > > > to
>>> > > > > > > > > > > automatically
>>> > > > > > > > > > > >> >> >>>> suggest
>>> > > > > > > > > > > >> >> >>>>>> the
>>> > > > > > > > > > > >> >> >>>>>>> user (via a UI) the available main classes
>>> > and
>>> > > > > > their
>>> > > > > > > > > > required
>>> > > > > > > > > > > >> >> >>>>>>> parameters[2].
>>> > > > > > > > > > > >> >> >>>>>>> Another problem we have with Flink is that
>>> > the
>>> > > > > Rest
>>> > > > > > > > > client
>>> > > > > > > > > > > and
>>> > > > > > > > > > > >> >> >> the
>>> > > > > > > > > > > >> >> >>>> CLI
>>> > > > > > > > > > > >> >> >>>>>> one
>>> > > > > > > > > > > >> >> >>>>>>> behaves differently and we use the CLI
>>> > client
>>> > > > > (via
>>> > > > > > ssh)
>>> > > > > > > > > > > because
>>> > > > > > > > > > > >> >> >> it
>>> > > > > > > > > > > >> >> >>>>> allows
>>> > > > > > > > > > > >> >> >>>>>>> to call some other method after
>>> > env.execute()
>>> > > > [3]
>>> > > > > > (we
>>> > > > > > > > > have
>>> > > > > > > > > > to
>>> > > > > > > > > > > >> >> >> call
>>> > > > > > > > > > > >> >> >>>>>> another
>>> > > > > > > > > > > >> >> >>>>>>> REST service to signal the end of the job).
>>> > > > > > > > > > > >> >> >>>>>>> In this regard, a dedicated interface, like the
>>> > > > > > > > > > > >> >> >>>>>>> JobListener suggested in the previous emails,
>>> > > > > > > > > > > >> >> >>>>>>> would be very helpful (IMHO).
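The JobListener interface mentioned here could look roughly like the following; the interface, its callbacks, and the execution stand-in are all illustrative, not an existing Flink API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the JobListener idea: callbacks fired around execution so a
// downstream project gets notified instead of polling REST endpoints.
public class JobListenerSketch {

    interface JobListener {
        void onJobSubmitted(String jobId);
        void onJobFinished(String jobId, boolean succeeded);
    }

    // Stand-in for env.execute(): submit, run, then notify the listener.
    static void executeWith(JobListener listener) {
        String jobId = "job-1"; // would come from the actual submission
        listener.onJobSubmitted(jobId);
        // ... the job runs here ...
        listener.onJobFinished(jobId, true);
    }

    static List<String> demo() {
        List<String> events = new ArrayList<>();
        executeWith(new JobListener() {
            public void onJobSubmitted(String id) {
                events.add("submitted:" + id);
            }
            public void onJobFinished(String id, boolean ok) {
                events.add("finished:" + id + ":" + ok);
            }
        });
        return events;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [submitted:job-1, finished:job-1:true]
    }
}
```

This would let the "signal the end of the job" step happen in the onJobFinished callback rather than in ad-hoc code after env.execute().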
>>> > > > > > > > > > > >> >> >>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>> [1]
>>> > > > > > https://issues.apache.org/jira/browse/FLINK-10864
>>> > > > > > > > > > > >> >> >>>>>>> [2]
>>> > > > > > https://issues.apache.org/jira/browse/FLINK-10862
>>> > > > > > > > > > > >> >> >>>>>>> [3]
>>> > > > > > https://issues.apache.org/jira/browse/FLINK-10879
>>> > > > > > > > > > > >> >> >>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>> Best,
>>> > > > > > > > > > > >> >> >>>>>>> Flavio
>>> > > > > > > > > > > >> >> >>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>> On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang
>>> > <
>>> > > > > > > > > > zjffdu@gmail.com
>>> > > > > > > > > > > >
>>> > > > > > > > > > > >> >> >>> wrote:
>>> > > > > > > > > > > >> >> >>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>> Hi, Tison,
>>> > > > > > > > > > > >> >> >>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>> Thanks for your comments. Overall I agree with
>>> > > > > > > > > > > >> >> >>>>>>>> you that it is difficult for downstream
>>> > > > > > > > > > > >> >> >>>>>>>> projects to integrate with Flink and that we
>>> > > > > > > > > > > >> >> >>>>>>>> need to refactor the current Flink client api.
>>> > > > > > > > > > > >> >> >>>>>>>> And I agree that CliFrontend should only
>>> > > > parsing
>>> > > > > > > > command
>>> > > > > > > > > > > line
>>> > > > > > > > > > > >> >> >>>>> arguments
>>> > > > > > > > > > > >> >> >>>>>>> and
>>> > > > > > > > > > > >> >> >>>>>>>> then pass them to ExecutionEnvironment.
>>> > It is
>>> > > > > > > > > > > >> >> >>>> ExecutionEnvironment's
>>> > > > > > > > > > > >> >> >>>>>>>> responsibility to compile job, create
>>> > cluster,
>>> > > > > and
>>> > > > > > > > > submit
>>> > > > > > > > > > > job.
>>> > > > > > > > > > > >> >> >>>>>>>> Besides
>>> > > > > > > > > > > >> >> >>>>>>>> that, currently Flink has many
>>> > > > > > > > > > > >> >> >>>>>>>> ExecutionEnvironment implementations, and Flink
>>> > > > > > > > > > > >> >> >>>>>>>> will use a specific one based on the context.
>>> > > > > > > > > > > >> >> >>>>>>>> IMHO, this is not necessary:
>>> > > > > > > > > > > >> >> >>>>>>>> ExecutionEnvironment should be able to do the
>>> > > > > > > > > > > >> >> >>>>>>>> right thing based on the FlinkConf it receives.
>>> > > > > > > > > > > >> >> >>>>>>>> Having too many ExecutionEnvironment
>>> > > > > > > > > > > >> >> >>>>>>>> implementations is another burden for
>>> > > > > > > > > > > >> >> >>>>>>>> downstream-project integration.
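The "one environment driven by FlinkConf" idea can be sketched as a simple dispatch on configuration; the config key and executor names below are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a single ExecutionEnvironment that "does the right thing"
// from the configuration it receives, instead of one subclass per
// deployment mode.
public class EnvFromConf {

    // Pick an executor by config instead of by environment subclass.
    static String chooseExecutor(Map<String, String> conf) {
        String target = conf.getOrDefault("execution.target", "local");
        switch (target) {
            case "yarn-session": return "YarnSessionExecutor";
            case "remote":       return "RemoteExecutor";
            default:             return "LocalExecutor";
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("execution.target", "remote");
        System.out.println(chooseExecutor(conf)); // RemoteExecutor
    }
}
```

Downstream projects would then pass one configuration object instead of choosing among environment subclasses.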
>>> > > > > > > > > > > >> >> >>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>> One thing I'd like to mention is flink's
>>> > scala
>>> > > > > > shell
>>> > > > > > > > and
>>> > > > > > > > > > sql
>>> > > > > > > > > > > >> >> >>>> client,
>>> > > > > > > > > > > >> >> >>>>>>>> although they are sub-modules of flink,
>>> > they
>>> > > > > > could be
>>> > > > > > > > > > > treated
>>> > > > > > > > > > > >> >> >> as
>>> > > > > > > > > > > >> >> >>>>>>> downstream
>>> > > > > > > > > > > >> >> >>>>>>>> projects which use Flink's client api.
>>> > > > > > > > > > > >> >> >>>>>>>> Currently you will find it is not easy for
>>> > > > > > > > > > > >> >> >>>>>>>> them to integrate with Flink, and they share a
>>> > > > > > > > > > > >> >> >>>>>>>> lot of duplicated code. It is another sign
>>> > > > > > > > > > > >> >> >>>>>>>> that we should refactor the Flink client api.
>>> > > > > > > > > > > >> >> >>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>> I believe it is a large and hard change, and I
>>> > > > > > > > > > > >> >> >>>>>>>> am afraid we cannot keep compatibility since
>>> > > > > > > > > > > >> >> >>>>>>>> many of the changes are user-facing.
>>> > > > > > > > > > > >> >> >>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>> Zili Chen <wa...@gmail.com>
>>> > > > 于2019年6月24日周一
>>> > > > > > > > > 下午2:53写道:
>>> > > > > > > > > > > >> >> >>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>> Hi all,
>>> > > > > > > > > > > >> >> >>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>> After a closer look on our client apis,
>>> > I can
>>> > > > > see
>>> > > > > > > > there
>>> > > > > > > > > > are
>>> > > > > > > > > > > >> >> >> two
>>> > > > > > > > > > > >> >> >>>>> major
>>> > > > > > > > > > > >> >> >>>>>>>>> issues to consistency and integration,
>>> > namely
>>> > > > > > > > different
>>> > > > > > > > > > > >> >> >>>> deployment
>>> > > > > > > > > > > >> >> >>>>> of
>>> > > > > > > > > > > >> >> >>>>>>>>> job cluster which couples job graph
>>> > creation
>>> > > > > and
>>> > > > > > > > > cluster
>>> > > > > > > > > > > >> >> >>>>> deployment,
>>> > > > > > > > > > > >> >> >>>>>>>>> and submission via CliFrontend confusing
>>> > > > > control
>>> > > > > > flow
>>> > > > > > > > > of
>>> > > > > > > > > > > job
>>> > > > > > > > > > > >> >> >>>> graph
>>> > > > > > > > > > > >> >> >>>>>>>>> compilation and job submission. I'd like
>>> > to
>>> > > > > > follow
>>> > > > > > > > the
>>> > > > > > > > > > > >> >> >> discuss
>>> > > > > > > > > > > >> >> >>>>> above,
>>> > > > > > > > > > > >> >> >>>>>>>>> mainly the process described by Jeff and
>>> > > > > > Stephan, and
>>> > > > > > > > > > share
>>> > > > > > > > > > > >> >> >> my
>>> > > > > > > > > > > >> >> >>>>>>>>> ideas on these issues.
>>> > > > > > > > > > > >> >> >>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>> 1) CliFrontend confuses the control flow
>>> > of
>>> > > > job
>>> > > > > > > > > > compilation
>>> > > > > > > > > > > >> >> >> and
>>> > > > > > > > > > > >> >> >>>>>>>> submission.
>>> > > > > > > > > > > >> >> >>>>>>>>> Following the process of job submission
>>> > > > Stephan
>>> > > > > > and
>>> > > > > > > > > Jeff
>>> > > > > > > > > > > >> >> >>>> described,
>>> > > > > > > > > > > >> >> >>>>>>>>> execution environment knows all configs
>>> > of
>>> > > > the
>>> > > > > > > > cluster
>>> > > > > > > > > > and
>>> > > > > > > > > > > >> >> >>>>>>> topos/settings
>>> > > > > > > > > > > >> >> >>>>>>>>> of the job. Ideally, in the main method
>>> > of
>>> > > > user
>>> > > > > > > > > program,
>>> > > > > > > > > > it
>>> > > > > > > > > > > >> >> >>> calls
>>> > > > > > > > > > > >> >> >>>>>>>> #execute
>>> > > > > > > > > > > >> >> >>>>>>>>> (or named #submit) and Flink deploys the
>>> > > > > cluster,
>>> > > > > > > > > compile
>>> > > > > > > > > > > the
>>> > > > > > > > > > > >> >> >>> job
>>> > > > > > > > > > > >> >> >>>>>> graph
>>> > > > > > > > > > > >> >> >>>>>>>>> and submit it to the cluster. However,
>>> > > > current
>>> > > > > > > > > > CliFrontend
>>> > > > > > > > > > > >> >> >> does
>>> > > > > > > > > > > >> >> >>>> all
>>> > > > > > > > > > > >> >> >>>>>>> these
>>> > > > > > > > > > > >> >> >>>>>>>>> things inside its #runProgram method,
>>> > which
>>> > > > > > > > introduces
>>> > > > > > > > > a
>>> > > > > > > > > > > lot
>>> > > > > > > > > > > >> >> >> of
>>> > > > > > > > > > > >> >> >>>>>>>> subclasses
>>> > > > > > > > > > > >> >> >>>>>>>>> of (stream) execution environment.
>>> > > > > > > > > > > >> >> >>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>> Actually, it sets up an exec env that
>>> > hijacks
>>> > > > > the
>>> > > > > > > > > > > >> >> >>>>>> #execute/executePlan
>>> > > > > > > > > > > >> >> >>>>>>>>> method, initializes the job graph and
>>> > abort
>>> > > > > > > > execution.
>>> > > > > > > > > > And
>>> > > > > > > > > > > >> >> >> then
>>> > > > > > > > > > > >> >> >>>>>>>>> control flow back to CliFrontend, it
>>> > deploys
>>> > > > > the
>>> > > > > > > > > > cluster(or
>>> > > > > > > > > > > >> >> >>>>> retrieve
>>> > > > > > > > > > > >> >> >>>>>>>>> the client) and submits the job graph.
>>> > This
>>> > > > is
>>> > > > > > quite
>>> > > > > > > > a
>>> > > > > > > > > > > >> >> >> specific
>>> > > > > > > > > > > >> >> >>>>>>> internal
>>> > > > > > > > > > > >> >> >>>>>>>>> process inside Flink and none of
>>> > consistency
>>> > > > to
>>> > > > > > > > > anything.
>>> > > > > > > > > > > >> >> >>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>> 2) Deployment of job cluster couples job
>>> > > > graph
>>> > > > > > > > creation
>>> > > > > > > > > > and
>>> > > > > > > > > > > >> >> >>>> cluster
>>> > > > > > > > > > > >> >> >>>>>>>>> deployment. Abstractly, from user job to
>>> > a
>>> > > > > > concrete
>>> > > > > > > > > > > >> >> >> submission,
>>> > > > > > > > > > > >> >> >>>> it
>>> > > > > > > > > > > >> >> >>>>>>>> requires
>>> > > > > > > > > > > >> >> >>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>     create JobGraph --\
>>> > > > > > > > > > > >> >> >>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>> create ClusterClient -->  submit JobGraph
>>> > > > > > > > > > > >> >> >>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>> such a dependency. ClusterClient was
>>> > created
>>> > > > by
>>> > > > > > > > > deploying
>>> > > > > > > > > > > or
>>> > > > > > > > > > > >> >> >>>>>>> retrieving.
>>> > > > > > > > > > > >> >> >>>>>>>>> JobGraph submission requires a compiled
>>> > > > > JobGraph
>>> > > > > > and
>>> > > > > > > > > > valid
>>> > > > > > > > > > > >> >> >>>>>>> ClusterClient,
>>> > > > > > > > > > > >> >> >>>>>>>>> but the creation of ClusterClient is
>>> > > > abstractly
>>> > > > > > > > > > independent
>>> > > > > > > > > > > >> >> >> of
>>> > > > > > > > > > > >> >> >>>> that
>>> > > > > > > > > > > >> >> >>>>>> of
>>> > > > > > > > > > > >> >> >>>>>>>>> JobGraph. However, in job cluster mode,
>>> > we
>>> > > > > > deploy job
>>> > > > > > > > > > > cluster
>>> > > > > > > > > > > >> >> >>>> with
>>> > > > > > > > > > > >> >> >>>>> a
>>> > > > > > > > > > > >> >> >>>>>>> job
>>> > > > > > > > > > > >> >> >>>>>>>>> graph, which means we use another
>>> > process:
>>> > > > > > > > > > > >> >> >>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>> create JobGraph --> deploy cluster with
>>> > the
>>> > > > > > JobGraph
>>> > > > > > > > > > > >> >> >>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>> Here is another inconsistency, and downstream
>>> > > > > > > > > > > >> >> >>>>>>>>> projects/client apis are forced to handle the
>>> > > > > > > > > > > >> >> >>>>>>>>> different cases with little support from
>>> > > > > > > > > > > >> >> >>>>>>>>> Flink.
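The decoupled flow tison diagrams above (JobGraph creation and ClusterClient creation independent, submission joining the two) can be sketched with minimal stand-in types:

```java
// Sketch of the decoupled flow: the JobGraph and the ClusterClient are
// created independently, and submission simply joins the two. All types
// here are minimal stand-ins for the real ones.
public class DecoupledSubmission {

    static final class JobGraph {
        final String name;
        JobGraph(String name) { this.name = name; }
    }

    @FunctionalInterface
    interface ClusterClient {
        String submit(JobGraph graph); // returns a job id
    }

    static String demo() {
        JobGraph graph = new JobGraph("wordcount");  // compiled from user code
        ClusterClient client = g -> "id-" + g.name;  // deployed or retrieved
        return client.submit(graph);                 // the only coupling point
    }

    public static void main(String[] args) {
        System.out.println(demo()); // id-wordcount
    }
}
```

The job-cluster mode breaks this shape because deployment itself consumes the JobGraph, which is exactly the inconsistency described above.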
>>> > > > > > > > > > > >> >> >>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>> Since we likely reached a consensus on
>>> > > > > > > > > > > >> >> >>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>> 1. all configs gathered by Flink
>>> > > > configuration
>>> > > > > > and
>>> > > > > > > > > passed
>>> > > > > > > > > > > >> >> >>>>>>>>> 2. execution environment knows all
>>> > configs
>>> > > > and
>>> > > > > > > > handles
>>> > > > > > > > > > > >> >> >>>>> execution(both
>>> > > > > > > > > > > >> >> >>>>>>>>> deployment and submission)
>>> > > > > > > > > > > >> >> >>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>> to the issues above I propose eliminating
>>> > > > > > > > > inconsistencies
>>> > > > > > > > > > > by
>>> > > > > > > > > > > >> >> >>>>>> following
>>> > > > > > > > > > > >> >> >>>>>>>>> approach:
>>> > > > > > > > > > > >> >> >>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>> 1) CliFrontend should exactly be a front
>>> > end,
>>> > > > > at
>>> > > > > > > > least
>>> > > > > > > > > > for
>>> > > > > > > > > > > >> >> >>> "run"
>>> > > > > > > > > > > >> >> >>>>>>> command.
>>> > > > > > > > > > > >> >> >>>>>>>>> That means it just gathered and passed
>>> > all
>>> > > > > config
>>> > > > > > > > from
>>> > > > > > > > > > > >> >> >> command
>>> > > > > > > > > > > >> >> >>>> line
>>> > > > > > > > > > > >> >> >>>>>> to
>>> > > > > > > > > > > >> >> >>>>>>>>> the main method of user program.
>>> > Execution
>>> > > > > > > > environment
>>> > > > > > > > > > > knows
>>> > > > > > > > > > > >> >> >>> all
>>> > > > > > > > > > > >> >> >>>>> the
>>> > > > > > > > > > > >> >> >>>>>>> info
>>> > > > > > > > > > > >> >> >>>>>>>>> and with an addition to utils for
>>> > > > > ClusterClient,
>>> > > > > > we
>>> > > > > > > > > > > >> >> >> gracefully
>>> > > > > > > > > > > >> >> >>>> get
>>> > > > > > > > > > > >> >> >>>>> a
>>> > > > > > > > > > > >> >> >>>>>>>>> ClusterClient by deploying or
>>> > retrieving. In
>>> > > > > this
>>> > > > > > > > way,
>>> > > > > > > > > we
>>> > > > > > > > > > > >> >> >> don't
>>> > > > > > > > > > > >> >> >>>>> need
>>> > > > > > > > > > > >> >> >>>>>> to
>>> > > > > > > > > > > >> >> >>>>>>>>> hijack #execute/executePlan methods and
>>> > can
>>> > > > > > remove
>>> > > > > > > > > > various
>>> > > > > > > > > > > >> >> >>>> hacking
>>> > > > > > > > > > > >> >> >>>>>>>>> subclasses of exec env, as well as #run
>>> > > > methods
>>> > > > > > in
>>> > > > > > > > > > > >> >> >>>>> ClusterClient(for
>>> > > > > > > > > > > >> >> >>>>>> an
>>> > > > > > > > > > > >> >> >>>>>>>>> interface-ized ClusterClient). Now the
>>> > > > control
>>> > > > > > flow
>>> > > > > > > > > flows
>>> > > > > > > > > > > >> >> >> from
>>> > > > > > > > > > > >> >> >>>>>>>> CliFrontend
>>> > > > > > > > > > > >> >> >>>>>>>>> to the main method and never returns.
>>> > > > > > > > > > > >> >> >>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>> 2) A job cluster means a cluster for one
>>> > > > > > > > > > > >> >> >>>>>>>>> specific job. From another perspective, it is
>>> > > > > > > > > > > >> >> >>>>>>>>> an ephemeral session. We may decouple the
>>> > > > > > > > > > > >> >> >>>>>>>>> deployment from the compiled job graph by
>>> > > > > > > > > > > >> >> >>>>>>>>> starting a session with an idle timeout and
>>> > > > > > > > > > > >> >> >>>>>>>>> submitting the job afterwards.
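The "job cluster as an ephemeral session" idea amounts to a session with an idle-timeout clock; the following sketch is purely illustrative and passes timestamps explicitly to stay deterministic:

```java
// Sketch of "job cluster as an ephemeral session": the cluster starts
// without a job graph and shuts itself down after an idle timeout.
public class EphemeralSession {

    static final class Session {
        final long idleTimeoutMs;
        long lastActiveMs;

        Session(long idleTimeoutMs, long startMs) {
            this.idleTimeoutMs = idleTimeoutMs;
            this.lastActiveMs = startMs;
        }

        // Submitting a job counts as activity and resets the idle clock.
        void submit(String jobName, long nowMs) {
            lastActiveMs = nowMs;
        }

        boolean expired(long nowMs) {
            return nowMs - lastActiveMs > idleTimeoutMs;
        }
    }

    public static void main(String[] args) {
        Session session = new Session(1000, 0);
        session.submit("job-1", 500);
        System.out.println(session.expired(1200)); // false: still within timeout
        System.out.println(session.expired(2000)); // true: idle past timeout
    }
}
```

Under this model the session-cluster and job-cluster code paths converge: both deploy a cluster first and submit a job graph second.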
>>> > > > > > > > > > > >> >> >>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>> These topics, before we go into more
>>> > details
>>> > > > on
>>> > > > > > > > design
>>> > > > > > > > > or
>>> > > > > > > > > > > >> >> >>>>>>> implementation,
>>> > > > > > > > > > > >> >> >>>>>>>>> are better to be aware and discussed for
>>> > a
>>> > > > > > consensus.
>>> > > > > > > > > > > >> >> >>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>> Best,
>>> > > > > > > > > > > >> >> >>>>>>>>> tison.
>>> > > > > > > > > > > >> >> >>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>> Zili Chen <wa...@gmail.com>
>>> > > > 于2019年6月20日周四
>>> > > > > > > > > 上午3:21写道:
>>> > > > > > > > > > > >> >> >>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>> Hi Jeff,
>>> > > > > > > > > > > >> >> >>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>> Thanks for raising this thread and the
>>> > > > design
>>> > > > > > > > > document!
>>> > > > > > > > > > > >> >> >>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>> As @Thomas Weise mentioned above,
>>> > extending
>>> > > > > > config
>>> > > > > > > > to
>>> > > > > > > > > > > flink
>>> > > > > > > > > > > >> >> >>>>>>>>>> requires far more effort than it should
>>> > be.
>>> > > > > > Another
>>> > > > > > > > > > > example
>>> > > > > > > > > > > >> >> >>>>>>>>>> is we achieve detach mode by introduce
>>> > > > another
>>> > > > > > > > > execution
>>> > > > > > > > > > > >> >> >>>>>>>>>> environment which also hijack #execute
>>> > > > method.
>>> > > > > > > > > > > >> >> >>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>> I agree with your idea that user would
>>> > > > > > configure all
>>> > > > > > > > > > > things
>>> > > > > > > > > > > >> >> >>>>>>>>>> and flink "just" respect it. On this
>>> > topic I
>>> > > > > > think
>>> > > > > > > > the
>>> > > > > > > > > > > >> >> >> unusual
>>> > > > > > > > > > > >> >> >>>>>>>>>> control flow when CliFrontend handle
>>> > "run"
>>> > > > > > command
>>> > > > > > > > is
>>> > > > > > > > > > the
>>> > > > > > > > > > > >> >> >>>> problem.
>>> > > > > > > > > > > >> >> >>>>>>>>>> It handles several configs, mainly about
>>> > > > > cluster
>>> > > > > > > > > > settings,
>>> > > > > > > > > > > >> >> >> and
>>> > > > > > > > > > > >> >> >>>>>>>>>> thus main method of user program is
>>> > unaware
>>> > > > of
>>> > > > > > them.
>>> > > > > > > > > > Also
>>> > > > > > > > > > > it
>>> > > > > > > > > > > >> >> >>>>>> compiles
>>> > > > > > > > > > > >> >> >>>>>>>>>> the app to a job graph by running the main
>>> > > > > > > > > > > >> >> >>>>>>>>>> method with a hijacked exec env, which
>>> > > > > > > > > > > >> >> >>>>>>>>>> constrains the main method further.
>>> > > > > > > > > > > >> >> >>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>> I'd like to write down a few of notes on
>>> > > > > > > > configs/args
>>> > > > > > > > > > pass
>>> > > > > > > > > > > >> >> >> and
>>> > > > > > > > > > > >> >> >>>>>>> respect,
>>> > > > > > > > > > > >> >> >>>>>>>>>> as well as decoupling job compilation
>>> > and
>>> > > > > > > > submission.
>>> > > > > > > > > > > Share
>>> > > > > > > > > > > >> >> >> on
>>> > > > > > > > > > > >> >> >>>>> this
>>> > > > > > > > > > > >> >> >>>>>>>>>> thread later.
>>> > > > > > > > > > > >> >> >>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>> Best,
>>> > > > > > > > > > > >> >> >>>>>>>>>> tison.
>>> > > > > > > > > > > >> >> >>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>> SHI Xiaogang <sh...@gmail.com>
>>> > > > > > 于2019年6月17日周一
>>> > > > > > > > > > > >> >> >> 下午7:29写道:
>>> > > > > > > > > > > >> >> >>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>>> Hi Jeff and Flavio,
>>> > > > > > > > > > > >> >> >>>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>>> Thanks Jeff a lot for proposing the
>>> > design
>>> > > > > > > > document.
>>> > > > > > > > > > > >> >> >>>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>>> We are also working on refactoring
>>> > > > > > ClusterClient to
>>> > > > > > > > > > allow
>>> > > > > > > > > > > >> >> >>>>> flexible
>>> > > > > > > > > > > >> >> >>>>>>> and
>>> > > > > > > > > > > >> >> >>>>>>>>>>> efficient job management in our
>>> > real-time
>>> > > > > > platform.
>>> > > > > > > > > > > >> >> >>>>>>>>>>> We would like to draft a document to
>>> > share
>>> > > > > our
>>> > > > > > > > ideas
>>> > > > > > > > > > with
>>> > > > > > > > > > > >> >> >>> you.
>>> > > > > > > > > > > >> >> >>>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>>> I think it's a good idea to have
>>> > something
>>> > > > > like
>>> > > > > > > > > Apache
>>> > > > > > > > > > > Livy
>>> > > > > > > > > > > >> >> >>> for
>>> > > > > > > > > > > >> >> >>>>>>> Flink,
>>> > > > > > > > > > > >> >> >>>>>>>>>>> and
>>> > > > > > > > > > > >> >> >>>>>>>>>>> the efforts discussed here will take a
>>> > > > great
>>> > > > > > step
>>> > > > > > > > > > forward
>>> > > > > > > > > > > >> >> >> to
>>> > > > > > > > > > > >> >> >>>> it.
>>> > > > > > > > > > > >> >> >>>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>>> Regards,
>>> > > > > > > > > > > >> >> >>>>>>>>>>> Xiaogang
>>> > > > > > > > > > > >> >> >>>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>>> Flavio Pompermaier <
>>> > pompermaier@okkam.it>
>>> > > > > > > > > > 于2019年6月17日周一
>>> > > > > > > > > > > >> >> >>>>> 下午7:13写道:
>>> > > > > > > > > > > >> >> >>>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>>>> Is there any possibility to have
>>> > something
>>> > > > > > like
>>> > > > > > > > > Apache
>>> > > > > > > > > > > >> >> >> Livy
>>> > > > > > > > > > > >> >> >>>> [1]
>>> > > > > > > > > > > >> >> >>>>>>> also
>>> > > > > > > > > > > >> >> >>>>>>>>>>> for
>>> > > > > > > > > > > >> >> >>>>>>>>>>>> Flink in the future?
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>>>> [1] https://livy.apache.org/
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>>>> On Tue, Jun 11, 2019 at 5:23 PM Jeff
>>> > > > Zhang <
>>> > > > > > > > > > > >> >> >>> zjffdu@gmail.com
>>> > > > > > > > > > > >> >> >>>>>
>>> > > > > > > > > > > >> >> >>>>>>> wrote:
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> Any API we expose should not have
>>> > > > > > dependencies
>>> > > > > > > > > on
>>> > > > > > > > > > > >> >> >>> the
>>> > > > > > > > > > > >> >> >>>>>>> runtime
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>> (flink-runtime) package or other
>>> > > > > > implementation
>>> > > > > > > > > > > >> >> >> details.
>>> > > > > > > > > > > >> >> >>> To
>>> > > > > > > > > > > >> >> >>>>> me,
>>> > > > > > > > > > > >> >> >>>>>>>> this
>>> > > > > > > > > > > >> >> >>>>>>>>>>>> means
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>> that the current ClusterClient
>>> > cannot be
>>> > > > > > exposed
>>> > > > > > > > to
>>> > > > > > > > > > > >> >> >> users
>>> > > > > > > > > > > >> >> >>>>>> because
>>> > > > > > > > > > > >> >> >>>>>>>> it
>>> > > > > > > > > > > >> >> >>>>>>>>>>>> uses
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>> quite some classes from the
>>> > optimiser and
>>> > > > > > runtime
>>> > > > > > > > > > > >> >> >>> packages.
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>> We should change ClusterClient from a
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>> class to an interface. ExecutionEnvironment
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>> should only use the ClusterClient
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>> interface, which should be in
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>> flink-clients, while the concrete
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>> implementation class could be in
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>> flink-runtime.
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> What happens when a
>>> > failure/restart in
>>> > > > > the
>>> > > > > > > > > client
>>> > > > > > > > > > > >> >> >>>>> happens?
>>> > > > > > > > > > > >> >> >>>>>>>> There
>>> > > > > > > > > > > >> >> >>>>>>>>>>> need
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>> to be a way of re-establishing the
>>> > > > > > connection to
>>> > > > > > > > > the
>>> > > > > > > > > > > >> >> >> job,
>>> > > > > > > > > > > >> >> >>>> set
>>> > > > > > > > > > > >> >> >>>>>> up
>>> > > > > > > > > > > >> >> >>>>>>>> the
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>> listeners again, etc.
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>> Good point.  First we need to define
>>> > what
>>> > > > > > does
>>> > > > > > > > > > > >> >> >>>>> failure/restart
>>> > > > > > > > > > > >> >> >>>>>> in
>>> > > > > > > > > > > >> >> >>>>>>>> the
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>> client mean. IIUC, that usually mean
>>> > > > > network
>>> > > > > > > > > failure
>>> > > > > > > > > > > >> >> >>> which
>>> > > > > > > > > > > >> >> >>>>> will
>>> > > > > > > > > > > >> >> >>>>>>>>>>> happen in
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>> class RestClient. If my
>>> > understanding is
>>> > > > > > correct,
>>> > > > > > > > > > > >> >> >>>>> restart/retry
>>> > > > > > > > > > > >> >> >>>>>>>>>>> mechanism
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>> should be done in RestClient.
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>>
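The interface/implementation split proposed above could be sketched roughly as follows. This is only an illustration of the idea under discussion; the interface methods, the `InMemoryClusterClient` class, and the string-based job handles are all hypothetical names, not the actual Flink API.

```java
// Would live in flink-clients: no dependency on flink-runtime.
interface ClusterClient {
    String submitJob(String serializedJobGraph); // returns a job id
    String queryJobStatus(String jobId);
    void cancelJob(String jobId);
}

// Would live in flink-runtime. Shown here as a trivial in-memory
// stand-in so the contract is visible without any cluster.
class InMemoryClusterClient implements ClusterClient {
    private final java.util.Map<String, String> status = new java.util.HashMap<>();
    private int counter = 0;

    public String submitJob(String serializedJobGraph) {
        String jobId = "job-" + (++counter);
        status.put(jobId, "RUNNING");
        return jobId;
    }

    public String queryJobStatus(String jobId) {
        return status.getOrDefault(jobId, "UNKNOWN");
    }

    public void cancelJob(String jobId) {
        status.put(jobId, "CANCELED");
    }
}

public class ClusterClientSketch {
    public static void main(String[] args) {
        // Downstream code compiles only against the interface.
        ClusterClient client = new InMemoryClusterClient();
        String jobId = client.submitJob("wordcount-graph");
        System.out.println(jobId + " " + client.queryJobStatus(jobId));
        client.cancelJob(jobId);
        System.out.println(jobId + " " + client.queryJobStatus(jobId));
    }
}
```

The point of the split is that downstream projects (Zeppelin, Beam, sql-client) would depend only on the small interface, so optimiser/runtime classes never leak into their compile-time classpath.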
Aljoscha Krettek <aljoscha@apache.org> wrote on Tue, Jun 11, 2019 at
11:10 PM:

> Some points to consider:
>
> * Any API we expose should not have dependencies on the runtime
> (flink-runtime) package or other implementation details. To me, this
> means that the current ClusterClient cannot be exposed to users
> because it uses quite some classes from the optimiser and runtime
> packages.
>
> * What happens when a failure/restart in the client happens? There
> needs to be a way of re-establishing the connection to the job,
> setting up the listeners again, etc.
>
> Aljoscha
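The failure/restart concern raised above amounts to a client-side retry loop around flaky requests, of the kind that could live inside RestClient. The sketch below is only an illustration of that mechanism; `withRetries` and the exception handling are hypothetical, not Flink's actual RestClient behavior.

```java
import java.util.function.Supplier;

public class RetrySketch {
    // Retry a request up to maxAttempts times before giving up,
    // rethrowing the last failure if all attempts fail.
    static <T> T withRetries(Supplier<T> request, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.get();
            } catch (RuntimeException e) {
                last = e; // e.g. a transient network failure
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated request that fails twice, then succeeds: the loop
        // transparently re-establishes the call.
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new RuntimeException("connection reset");
            return "JOB_STATUS: RUNNING";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Re-registering job listeners after a reconnect would need extra state on top of this (which listeners were attached), which is exactly why the thread argues the mechanism belongs in one place rather than in every caller.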
On 29. May 2019, at 10:17, Jeff Zhang <zjffdu@gmail.com> wrote:

> Sorry folks, the design doc is later than you expected. Here's the
> design doc I drafted; welcome any comments and feedback.
>
> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
Stephan Ewen <se...@apache.org> wrote on Thu, Feb 14, 2019 at 8:43 PM:

> Nice that this discussion is happening.
>
> In the FLIP, we could also revisit the entire role of the environments
> again.
>
> Initially, the idea was:
>   - the environments take care of the specific setup for standalone
>     (no setup needed), yarn, mesos, etc.
>   - the session ones have control over the session. The environment
>     holds the session client.
>   - running a job gives a "control" object for that job. That behavior
>     is the same in all environments.
>
> The actual implementation diverged quite a bit from that. Happy to see
> a discussion about straightening this out a bit more.

On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> Hi folks,
>
> Sorry for the late response. It seems we reached consensus on this; I
> will create a FLIP for this with a more detailed design.

Thomas Weise <th...@apache.org> wrote on Fri, Dec 21, 2018 at 11:43 AM:
> Great to see this discussion seeded! The problems you face with the
> Zeppelin integration are also affecting other downstream projects,
> like Beam.
>
> We just enabled the savepoint restore option in
> RemoteStreamEnvironment [1], and that was more difficult than it
> should be. The main issue is that the environment and the cluster
> client aren't decoupled. Ideally it should be possible to just get the
> matching cluster client from the environment and then control the job
> through it (environment as factory for cluster client). But note that
> the environment classes are part of the public API, and it is not
> straightforward to make larger changes without breaking backward
> compatibility.
>
> ClusterClient currently exposes internal classes like JobGraph and
> StreamGraph. But it should be possible to wrap this with a new public
> API that brings the required job control capabilities for downstream
> projects. Perhaps it is helpful to look at some of the interfaces in
> Beam while thinking about this: [2] for the portable job API and [3]
> for the old asynchronous job control from the Beam Java SDK.
>
> The backward compatibility discussion [4] is also relevant here. A new
> API should shield downstream projects from internals and allow them to
> interoperate with multiple future Flink versions in the same release
> line without forced upgrades.
>
> Thanks,
> Thomas
>
> [1] https://github.com/apache/flink/pull/7249
> [2] https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> [3] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> [4] https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
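A public job-control wrapper of the kind Thomas describes, in the spirit of Beam's PipelineResult, might look like the sketch below. All names here (`JobControl`, `JobState`, `LocalJobControl`) are illustrative assumptions for the sake of the example, not an actual Flink or Beam API.

```java
import java.util.concurrent.CompletableFuture;

enum JobState { RUNNING, FINISHED, CANCELED }

// A stable public surface for job control: no JobGraph, no
// StreamGraph, no runtime classes leak through it.
interface JobControl {
    String jobId();
    JobState state();
    CompletableFuture<JobState> cancel();
}

// A trivial in-memory implementation to make the contract concrete.
class LocalJobControl implements JobControl {
    private final String id;
    private JobState state = JobState.RUNNING;

    LocalJobControl(String id) { this.id = id; }

    public String jobId() { return id; }
    public JobState state() { return state; }

    public CompletableFuture<JobState> cancel() {
        state = JobState.CANCELED;
        return CompletableFuture.completedFuture(state);
    }
}

public class JobControlSketch {
    public static void main(String[] args) {
        JobControl control = new LocalJobControl("job-42");
        System.out.println(control.jobId() + " " + control.state());
        control.cancel().join();
        System.out.println(control.jobId() + " " + control.state());
    }
}
```

Keeping the surface this small is what would let downstream projects interoperate with multiple Flink versions in a release line, since only the wrapper, not its internals, is part of the compatibility contract.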
On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zjffdu@gmail.com> wrote:
>> I'm not so sure whether the user should be able to define where the
>> job runs (in your example Yarn). This is actually independent of the
>> job development and is something which is decided at deployment time.
>
> Users don't need to specify the execution mode programmatically. They
> can also pass the execution mode via the arguments of the flink run
> command, e.g.
>
> bin/flink run -m yarn-cluster ...
> bin/flink run -m local ...
> bin/flink run -m host:port ...
>
> Does this make sense to you?
>
>> To me it makes sense that the ExecutionEnvironment is not directly
>> initialized by the user and is instead context sensitive to how you
>> want to execute your job (Flink CLI vs. IDE, for example).
>
> Right, currently I notice that Flink creates a different
> ContextExecutionEnvironment based on the submission scenario (Flink
> CLI vs. IDE). To me this is kind of a hack, not so straightforward.
> What I suggested above is that Flink should always create the same
> ExecutionEnvironment, but with different configuration, and based on
> that configuration it would create the proper ClusterClient for the
> different behaviors.
Till Rohrmann <trohrmann@apache.org> wrote on Thu, Dec 20, 2018 at
11:18 PM:
> You are probably right that we have code duplication when it comes to
> the creation of the ClusterClient. This should be reduced in the
> future.
>
> I'm not so sure whether the user should be able to define where the
> job runs (in your example Yarn). This is actually independent of the
> job development and is something which is decided at deployment time.
> To me it makes sense that the ExecutionEnvironment is not directly
> initialized by the user and is instead context sensitive to how you
> want to execute your job (Flink CLI vs. IDE, for example). However, I
> agree that the ExecutionEnvironment should give you access to the
> ClusterClient and to the job (maybe in the form of the JobGraph or a
> job plan).
>
> Cheers,
> Till

On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> Hi Till,
>
> Thanks for the feedback. You are right that I expect a better
> programmatic job submission/control API which could be used by
> downstream projects, and it would benefit the Flink ecosystem. When I
> look at the code of the Flink scala-shell and sql-client (I believe
> they are not the core of Flink, but belong to its ecosystem), I find
> a lot of duplicated code for creating a ClusterClient from
> user-provided configuration (the configuration format may differ
> between scala-shell and sql-client) and then using that ClusterClient
> to manipulate jobs. I don't think this is convenient for downstream
> projects. What I expect is that a downstream project only needs to
> provide the necessary configuration info (maybe introducing a class
> FlinkConf), then build an ExecutionEnvironment based on this
> FlinkConf, and the ExecutionEnvironment will create the proper
> ClusterClient. This not only benefits downstream project development
> but is also helpful for their integration tests with Flink. Here's
> one sample code snippet that I would expect:
>
> val conf = new FlinkConf().mode("yarn")
> val env = new ExecutionEnvironment(conf)
> val jobId = env.submit(...)
> val jobStatus = env.getClusterClient().queryJobStatus(jobId)
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
>>> > > > > > env.getClusterClient().cancelJob(jobId)
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
>>> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> What do you think ?
>>>
>>> Till Rohrmann <trohrmann@apache.org> 于2018年12月11日周二 下午6:28写道:
>>>
>>> Hi Jeff,
>>>
>>> what you are proposing is to provide the user with better programmatic
>>> job control. There was actually an effort to achieve this but it has
>>> never been completed [1]. However, there are some improvements in the
>>> code base now. Look for example at the NewClusterClient interface,
>>> which offers a non-blocking job submission. But I agree that we need
>>> to improve Flink in this regard.
>>>
>>> I would not be in favour of exposing all ClusterClient calls via the
>>> ExecutionEnvironment because it would clutter the class and would not
>>> be a good separation of concerns. Instead, one idea could be to
>>> retrieve the current ClusterClient from the ExecutionEnvironment,
>>> which can then be used for cluster and job control. But before we
>>> start an effort here, we need to agree on and capture what
>>> functionality we want to provide.
>>>
>>> Initially, the idea was that we have the ClusterDescriptor describing
>>> how to talk to a cluster manager like Yarn or Mesos. The
>>> ClusterDescriptor can be used for deploying Flink clusters (job and
>>> session) and gives you a ClusterClient. The ClusterClient controls the
>>> cluster (e.g. submitting jobs, listing all running jobs). And then
>>> there was the idea to introduce a JobClient, which you obtain from the
>>> ClusterClient to trigger job-specific operations (e.g. taking a
>>> savepoint, cancelling the job).
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-4272
>>>
>>> Cheers,
>>> Till
>>>
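The ClusterDescriptor / ClusterClient / JobClient layering described above could be sketched roughly as follows. These are hypothetical, illustrative interfaces with a toy in-memory implementation, not Flink's actual API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Per-job control handle, obtained from a ClusterClient.
interface JobClient {
    String jobId();
    void cancel();
}

// Cluster-level control: submit jobs, list running jobs.
interface ClusterClient {
    JobClient submitJob(String jobGraph);   // non-blocking: returns a handle immediately
    List<String> listJobs();
}

// Knows how to talk to a cluster manager and deploy clusters.
interface ClusterDescriptor {
    ClusterClient deploySessionCluster();
}

// Toy in-memory implementation, just to show how the pieces fit together.
class LocalClusterDescriptor implements ClusterDescriptor {
    public ClusterClient deploySessionCluster() {
        List<String> jobs = new ArrayList<>();
        return new ClusterClient() {
            public JobClient submitJob(String jobGraph) {
                String id = UUID.randomUUID().toString();
                jobs.add(id);
                return new JobClient() {
                    public String jobId() { return id; }
                    public void cancel() { jobs.remove(id); }
                };
            }
            public List<String> listJobs() { return jobs; }
        };
    }
}

public class LayeringSketch {
    public static void main(String[] args) {
        ClusterClient client = new LocalClusterDescriptor().deploySessionCluster();
        JobClient job = client.submitJob("word-count");
        System.out.println("running=" + client.listJobs().size());
        job.cancel();
        System.out.println("running=" + client.listJobs().size());
    }
}
```

The point of the layering is that each level owns one concern: deployment (descriptor), cluster control (cluster client), and per-job control (job client).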
>>> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zjffdu@gmail.com> wrote:
>>>
>>> Hi Folks,
>>>
>>> I am trying to integrate flink into apache zeppelin, which is an
>>> interactive notebook, and I hit several issues that are caused by the
>>> flink client api. So I'd like to propose the following changes to the
>>> flink client api.
>>>
>>> 1. Support nonblocking execution. Currently,
>>> ExecutionEnvironment#execute is a blocking method which does 2
>>> things: first submit the job and then wait for the job until it is
>>> finished. I'd like to introduce a nonblocking execution method like
>>> ExecutionEnvironment#submit which only submits the job and then
>>> returns the jobId to the client, and allow the user to query the job
>>> status via the jobId.
>>>
>>> 2. Add a cancel api in
>>> ExecutionEnvironment/StreamExecutionEnvironment. Currently the only
>>> way to cancel a job is via the cli (bin/flink), which is not
>>> convenient for downstream projects to use. So I'd like to add a
>>> cancel api in ExecutionEnvironment.
>>>
>>> 3. Add a savepoint api in
>>> ExecutionEnvironment/StreamExecutionEnvironment. It is similar to the
>>> cancel api; we should use ExecutionEnvironment as the unified api for
>>> third parties to integrate with flink.
>>>
>>> 4. Add a listener for the job execution lifecycle, something like the
>>> following, so that downstream projects can run custom logic in the
>>> lifecycle of a job. E.g. Zeppelin would capture the jobId after the
>>> job is submitted and then use this jobId to cancel it later when
>>> necessary.
>>>
>>> public interface JobListener {
>>>
>>>   void onJobSubmitted(JobID jobId);
>>>
>>>   void onJobExecuted(JobExecutionResult jobResult);
>>>
>>>   void onJobCanceled(JobID jobId);
>>> }
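The Zeppelin use case described above — capture the JobID at submission, cancel later — could use such a listener roughly as follows. This is a hypothetical sketch; JobID, JobExecutionResult, and JobListener are stand-ins mirroring the proposed interface, not Flink classes:

```java
import java.util.concurrent.atomic.AtomicReference;

// Stand-ins for the types named in the proposed interface.
class JobID {
    private final String id;
    JobID(String id) { this.id = id; }
    public String toString() { return id; }
}

class JobExecutionResult { }

interface JobListener {
    void onJobSubmitted(JobID jobId);
    void onJobExecuted(JobExecutionResult jobResult);
    void onJobCanceled(JobID jobId);
}

// A listener that remembers the last submitted job so that a
// "cancel" action in the notebook UI can reach it later.
public class CapturingListener implements JobListener {
    private final AtomicReference<JobID> lastSubmitted = new AtomicReference<>();

    public void onJobSubmitted(JobID jobId) { lastSubmitted.set(jobId); }
    public void onJobExecuted(JobExecutionResult r) { lastSubmitted.set(null); }
    public void onJobCanceled(JobID jobId) { lastSubmitted.compareAndSet(jobId, null); }

    public JobID lastSubmitted() { return lastSubmitted.get(); }

    public static void main(String[] args) {
        CapturingListener l = new CapturingListener();
        l.onJobSubmitted(new JobID("job-1"));
        System.out.println("captured: " + l.lastSubmitted());
    }
}
```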
>>>
>>> 5. Enable sessions in ExecutionEnvironment. Currently this is
>>> disabled, but sessions are very convenient for third parties that
>>> submit jobs continually. I hope flink can enable it again.
>>>
>>> 6. Unify all flink client apis into
>>> ExecutionEnvironment/StreamExecutionEnvironment.
>>>
>>> This is a long-term issue which needs more careful thinking and
>>> design. Currently some features of flink are exposed in
>>> ExecutionEnvironment/StreamExecutionEnvironment, but some are exposed
>>> in the cli instead of the api, like the cancel and savepoint I
>>> mentioned above. I think the root cause is that flink didn't unify
>>> the interaction with flink. Here I list 3 scenarios of flink
>>> operation:
>>>
>>>  - Local job execution. Flink will create a LocalEnvironment and then
>>>    use this LocalEnvironment to create a LocalExecutor for job
>>>    execution.
>>>  - Remote job execution. Flink will create a ClusterClient first,
>>>    then create a ContextEnvironment based on the ClusterClient, and
>>>    then run the job.
>>>  - Job cancelation. Flink will create a ClusterClient first and then
>>>    cancel the job via this ClusterClient.
>>>
>>> As you can see in the above 3 scenarios, Flink didn't use the same
>>> approach (code path) to interact with flink. What I propose is the
>>> following: create the proper LocalEnvironment/RemoteEnvironment
>>> (based on user configuration) --> use this Environment to create the
>>> proper ClusterClient (LocalClusterClient or RestClusterClient) to
>>> interact with Flink (job execution or cancelation).
>>>
>>> This way we can unify the process of local execution and remote
>>> execution. And it is much easier for third parties to integrate with
>>> flink, because ExecutionEnvironment is the unified entry point for
>>> flink. What a third party needs to do is just pass configuration to
>>> ExecutionEnvironment, and ExecutionEnvironment will do the right
>>> thing based on the configuration. The flink cli can also be
>>> considered a flink api consumer: it just passes the configuration to
>>> ExecutionEnvironment and lets ExecutionEnvironment create the proper
>>> ClusterClient, instead of letting the cli create the ClusterClient
>>> directly.
>>>
>>> 6 would involve large code refactoring, so I think we can defer it to
>>> a future release; 1, 2, 3, 4, 5 could be done at once, I believe. Let
>>> me know your comments and feedback, thanks.
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
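The unified flow proposed in the message above — the caller only passes configuration, and the environment decides which client to build — could be sketched as follows. FlinkConf and the client classes here are illustrative stand-ins under the proposal's naming, not Flink's actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal configuration holder, as the proposed FlinkConf.
class FlinkConf {
    private final Map<String, String> settings = new HashMap<>();
    FlinkConf mode(String mode) { settings.put("mode", mode); return this; }
    String get(String key) { return settings.get(key); }
}

interface ClusterClient {
    String submit(String jobGraph);
    void cancel(String jobId);
}

class LocalClusterClient implements ClusterClient {
    public String submit(String jobGraph) { return "local-" + jobGraph; }
    public void cancel(String jobId) { /* no-op in the sketch */ }
}

class RestClusterClient implements ClusterClient {
    public String submit(String jobGraph) { return "remote-" + jobGraph; }
    public void cancel(String jobId) { /* no-op in the sketch */ }
}

class ExecutionEnvironment {
    private final ClusterClient client;

    ExecutionEnvironment(FlinkConf conf) {
        // The environment, not the caller (and not the cli), decides
        // which ClusterClient to build from the configuration.
        this.client = "local".equals(conf.get("mode"))
                ? new LocalClusterClient()
                : new RestClusterClient();
    }

    String submit(String jobGraph) { return client.submit(jobGraph); }
    ClusterClient getClusterClient() { return client; }
}

public class UnifiedFlowSketch {
    public static void main(String[] args) {
        ExecutionEnvironment env = new ExecutionEnvironment(new FlinkConf().mode("yarn"));
        String jobId = env.submit("window-join");
        env.getClusterClient().cancel(jobId);
        System.out.println(jobId);
    }
}
```

Local and remote execution then share one code path; only the configuration differs.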

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Zili Chen <wa...@gmail.com>.
Hi Kostas & Aljoscha,

I noticed that there is a JIRA (FLINK-13946) which could be included
in this refactoring thread. Since we agree on most directions in the
big picture, would it be reasonable to create an umbrella issue for
refactoring the client APIs, with linked subtasks? That would be a
better way to join the forces of our community.

Best,
tison.
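As a side note for readers following the thread: the API shape discussed downthread (an execute() that returns a future of a JobClient handle, so that "attached" vs. "detached" submission becomes the caller's choice of whether to block) can be sketched roughly as below. All names here are illustrative stand-ins, not the actual Flink interfaces.

```java
import java.util.concurrent.CompletableFuture;

/*
 * Rough sketch of the submission API shape discussed in this thread:
 * execute() returns a future of a JobClient handle, so "attached" vs
 * "detached" is just whether the caller blocks on that future.
 * All names below are illustrative only, not the real Flink interfaces.
 */
public class JobClientSketch {

    /** Handle for interacting with a running job. */
    interface JobClient {
        String getJobId();
        CompletableFuture<Void> cancel();
    }

    /** Stand-in for an ExecutionEnvironment wired to some deployer. */
    static class Env {
        CompletableFuture<JobClient> executeAsync() {
            // A real deployer would submit the JobGraph to a cluster here.
            return CompletableFuture.completedFuture(new JobClient() {
                @Override
                public String getJobId() { return "job-0"; }
                @Override
                public CompletableFuture<Void> cancel() {
                    return CompletableFuture.completedFuture(null);
                }
            });
        }
    }

    public static void main(String[] args) throws Exception {
        Env env = new Env();

        // "Attached" style: block until submission completes.
        JobClient client = env.executeAsync().get();
        System.out.println(client.getJobId());

        // "Detached" style: do not block; react when the handle arrives.
        env.executeAsync().thenAccept(c -> c.cancel());
    }
}
```

With this shape, a separate detached option becomes unnecessary, since not waiting on the future gives the same behaviour.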


Zili Chen <wa...@gmail.com> wrote on Sat, Aug 31, 2019 at 12:52 PM:

> Great Kostas! Looking forward to your POC!
>
> Best,
> tison.
>
>
> Jeff Zhang <zj...@gmail.com> wrote on Fri, Aug 30, 2019 at 11:07 PM:
>
>> Awesome, @Kostas Looking forward your POC.
>>
>> Kostas Kloudas <kk...@gmail.com> wrote on Fri, Aug 30, 2019 at 8:33 PM:
>>
>> > Hi all,
>> >
>> > I am just writing here to let you know that I am working on a POC that
>> > tries to refactor the current state of job submission in Flink.
>> > I want to stress out that it introduces NO CHANGES to the current
>> > behaviour of Flink. It just re-arranges things and introduces the
>> > notion of an Executor, which is the entity responsible for taking the
>> > user-code and submitting it for execution.
>> >
>> > Given this, the discussion about the functionality that the JobClient
>> > will expose to the user can go on independently and the same
>> > holds for all the open questions so far.
>> >
>> > I hope I will have some more news to share soon.
>> >
>> > Thanks,
>> > Kostas
>> >
>> > On Mon, Aug 26, 2019 at 4:20 AM Yang Wang <da...@gmail.com>
>> wrote:
>> > >
>> > > Hi Zili,
>> > >
>> > > It makes sense to me that a dedicated cluster is started for a per-job
>> > > cluster and will not accept more jobs.
>> > > Just have a question about the command line.
>> > >
>> > > Currently we could use the following commands to start different
>> > clusters.
>> > > *per-job cluster*
>> > > ./bin/flink run -d -p 5 -ynm perjob-cluster1 -m yarn-cluster
>> > > examples/streaming/WindowJoin.jar
>> > > *session cluster*
>> > > ./bin/flink run -p 5 -ynm session-cluster1 -m yarn-cluster
>> > > examples/streaming/WindowJoin.jar
>> > >
>> > > What will it look like after client enhancement?
>> > >
>> > >
>> > > Best,
>> > > Yang
>> > >
>> > > Zili Chen <wa...@gmail.com> wrote on Fri, Aug 23, 2019 at 10:46 PM:
>> > >
>> > > > Hi Till,
>> > > >
>> > > > Thanks for your update. Nice to hear :-)
>> > > >
>> > > > Best,
>> > > > tison.
>> > > >
>> > > >
>> > > > Till Rohrmann <tr...@apache.org> wrote on Fri, Aug 23, 2019 at 10:39 PM:
>> > > >
>> > > > > Hi Tison,
>> > > > >
>> > > > > just a quick comment concerning the class loading issues when
>> using
>> > the
>> > > > per
>> > > > > job mode. The community wants to change it so that the
>> > > > > StandaloneJobClusterEntryPoint actually uses the user code class
>> > loader
>> > > > > with child first class loading [1]. Hence, I hope that this
>> problem
>> > will
>> > > > be
>> > > > > resolved soon.
>> > > > >
>> > > > > [1] https://issues.apache.org/jira/browse/FLINK-13840
>> > > > >
>> > > > > Cheers,
>> > > > > Till
>> > > > >
>> > > > > On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas <
>> kkloudas@gmail.com>
>> > > > wrote:
>> > > > >
>> > > > > > Hi all,
>> > > > > >
>> > > > > > On the topic of web submission, I agree with Till that it only
>> > seems
>> > > > > > to complicate things.
>> > > > > > It is bad for security, job isolation (anybody can submit/cancel
>> > jobs),
>> > > > > > and its
>> > > > > > implementation complicates some parts of the code. So, if it
>> were
>> > to
>> > > > > > redesign the
>> > > > > > WebUI, maybe this part could be left out. In addition, I would
>> say
>> > > > > > that the ability to cancel
>> > > > > > jobs could also be left out.
>> > > > > >
>> > > > > > Also I would also be in favour of removing the "detached" mode,
>> for
>> > > > > > the reasons mentioned
>> > > > > > above (i.e. because now we will have a future representing the
>> > result
>> > > > > > on which the user
>> > > > > > can choose to wait or not).
>> > > > > >
>> > > > > > Now for the separating job submission and cluster creation, I
>> am in
>> > > > > > favour of keeping both.
>> > > > > > Once again, the reasons are mentioned above by Stephan, Till,
>> > Aljoscha
>> > > > > > and also Zili seems
>> > > > > > to agree. They mainly have to do with security, isolation and
>> ease
>> > of
>> > > > > > resource management
>> > > > > > for the user as he knows that "when my job is done, everything
>> > will be
>> > > > > > cleared up". This is
>> > > > > > also the experience you get when launching a process on your
>> local
>> > OS.
>> > > > > >
>> > > > > > On excluding the per-job mode from returning a JobClient or
>> not, I
>> > > > > > believe that eventually
>> > > > > > it would be nice to allow users to get back a jobClient. The
>> > reason is
>> > > > > > that 1) I cannot
>> > > > > > find any objective reason why the user-experience should
>> diverge,
>> > and
>> > > > > > 2) this will be the
>> > > > > > way that the user will be able to interact with his running job.
>> > > > > > Assuming that the necessary
>> > > > > > ports are open for the REST API to work, then I think that the
>> > > > > > JobClient can run against the
>> > > > > > REST API without problems. If the needed ports are not open,
>> then
>> > we
>> > > > > > are safe to not return
>> > > > > > a JobClient, as the user explicitly chose to close all points of
>> > > > > > communication to his running job.
>> > > > > >
>> > > > > > On the topic of not hijacking the "env.execute()" in order to
>> get
>> > the
>> > > > > > Plan, I definitely agree but
>> > > > > > for the proposal of having a "compile()" method in the env, I
>> would
>> > > > > > like to have a better look at
>> > > > > > the existing code.
>> > > > > >
>> > > > > > Cheers,
>> > > > > > Kostas
>> > > > > >
>> > > > > > On Fri, Aug 23, 2019 at 5:52 AM Zili Chen <wander4096@gmail.com
>> >
>> > > > wrote:
>> > > > > > >
>> > > > > > > Hi Yang,
>> > > > > > >
>> > > > > > > It would be helpful if you check Stephan's last comment,
>> > > > > > > which states that isolation is important.
>> > > > > > >
>> > > > > > > For per-job mode, we run a dedicated cluster(maybe it
>> > > > > > > should have been a couple of JM and TMs during FLIP-6
>> > > > > > > design) for a specific job. Thus the process is prevented
>> > > > > > > from other jobs.
>> > > > > > >
>> > > > > > > In our cases there was a time we suffered from multi
>> > > > > > > jobs submitted by different users and they affected
>> > > > > > > each other so that all ran into an error state. Also,
>> > > > > > > run the client inside the cluster could save client
>> > > > > > > resource at some points.
>> > > > > > >
>> > > > > > > However, we also face several issues as you mentioned,
>> > > > > > > that in per-job mode it always uses parent classloader
>> > > > > > > thus classloading issues occur.
>> > > > > > >
>> > > > > > > BTW, one can make an analogy between session/per-job mode
>> > > > > > > in Flink, and client/cluster mode in Spark.
>> > > > > > >
>> > > > > > > Best,
>> > > > > > > tison.
>> > > > > > >
>> > > > > > >
>> > > > > > > Yang Wang <da...@gmail.com> wrote on Thu, Aug 22, 2019 at 11:25 AM:
>> > > > > > >
>> > > > > > > > From the user's perspective, it is really confusing what
>> > > > > > > > the scope of a per-job cluster is.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > If it means a flink cluster with single job, so that we
>> could
>> > get
>> > > > > > better
>> > > > > > > > isolation.
>> > > > > > > >
>> > > > > > > > Now it does not matter how we deploy the cluster, directly
>> > > > > > deploy(mode1)
>> > > > > > > >
>> > > > > > > > or start a flink cluster and then submit job through cluster
>> > > > > > client(mode2).
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Otherwise, if it just means direct deployment, how should we
>> > > > > > > > name mode2,
>> > > > > > > >
>> > > > > > > > session with job or something else?
>> > > > > > > >
>> > > > > > > > We could also benefit from the mode2. Users could get the
>> same
>> > > > > > isolation
>> > > > > > > > with mode1.
>> > > > > > > >
>> > > > > > > > The user code and dependencies will be loaded by user class
>> > loader
>> > > > > > > >
>> > > > > > > > to avoid class conflict with framework.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Anyway, both of the two submission modes are useful.
>> > > > > > > >
>> > > > > > > > We just need to clarify the concepts.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Best,
>> > > > > > > >
>> > > > > > > > Yang
>> > > > > > > >
>> > > > > > > > Zili Chen <wa...@gmail.com> wrote on Tue, Aug 20, 2019 at 5:58 PM:
>> > > > > > > >
>> > > > > > > > > Thanks for the clarification.
>> > > > > > > > >
>> > > > > > > > > The idea JobDeployer ever came into my mind when I was
>> > muddled
>> > > > with
>> > > > > > > > > how to execute per-job mode and session mode with the same
>> > user
>> > > > > code
>> > > > > > > > > and framework codepath.
>> > > > > > > > >
>> > > > > > > > > With the concept JobDeployer we back to the statement that
>> > > > > > environment
>> > > > > > > > > knows every configs of cluster deployment and job
>> > submission. We
>> > > > > > > > > configure or generate from configuration a specific
>> > JobDeployer
>> > > > in
>> > > > > > > > > environment and then code align on
>> > > > > > > > >
>> > > > > > > > > *JobClient client = env.execute().get();*
>> > > > > > > > >
>> > > > > > > > > which in session mode returned by clusterClient.submitJob
>> > and in
>> > > > > > per-job
>> > > > > > > > > mode returned by clusterDescriptor.deployJobCluster.
>> > > > > > > > >
>> > > > > > > > > Here comes a problem that currently we directly run
>> > > > > ClusterEntrypoint
>> > > > > > > > > with extracted job graph. Follow the JobDeployer way we'd
>> > better
>> > > > > > > > > align entry point of per-job deployment at JobDeployer.
>> > Users run
>> > > > > > > > > their main method or by a Cli(finally call main method) to
>> > deploy
>> > > > > the
>> > > > > > > > > job cluster.
>> > > > > > > > >
>> > > > > > > > > Best,
>> > > > > > > > > tison.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > Stephan Ewen <se...@apache.org> wrote on Tue, Aug 20, 2019 at 4:40 PM:
>> > > > > > > > >
>> > > > > > > > > > Till has made some good comments here.
>> > > > > > > > > >
>> > > > > > > > > > Two things to add:
>> > > > > > > > > >
>> > > > > > > > > >   - The job mode is very nice in the way that it runs
>> the
>> > > > client
>> > > > > > inside
>> > > > > > > > > the
>> > > > > > > > > > cluster (in the same image/process that is the JM) and
>> thus
>> > > > > unifies
>> > > > > > > > both
>> > > > > > > > > > applications and what the Spark world calls the "driver
>> > mode".
>> > > > > > > > > >
>> > > > > > > > > >   - Another thing I would add is that during the FLIP-6
>> > design,
>> > > > > we
>> > > > > > were
>> > > > > > > > > > thinking about setups where Dispatcher and JobManager
>> are
>> > > > > separate
>> > > > > > > > > > processes.
>> > > > > > > > > >     A Yarn or Mesos Dispatcher of a session could run
>> > > > > independently
>> > > > > > > > (even
>> > > > > > > > > > as privileged processes executing no code).
>> > > > > > > > > >     Then the "per-job" mode could still be helpful:
>> > when a
>> > > > > job
>> > > > > > is
>> > > > > > > > > > submitted to the dispatcher, it launches the JM again
>> in a
>> > > > > per-job
>> > > > > > > > mode,
>> > > > > > > > > so
>> > > > > > > > > > that JM and TM processes are bound to the job only. For
>> > higher
>> > > > > > security
>> > > > > > > > > > setups, it is important that processes are not reused
>> > across
>> > > > > jobs.
>> > > > > > > > > >
>> > > > > > > > > > On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <
>> > > > > > trohrmann@apache.org>
>> > > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > I would not be in favour of getting rid of the per-job
>> > mode
>> > > > > > since it
>> > > > > > > > > > > simplifies the process of running Flink jobs
>> > considerably.
>> > > > > > Moreover,
>> > > > > > > > it
>> > > > > > > > > > is
>> > > > > > > > > > > not only well suited for container deployments but
>> also
>> > for
>> > > > > > > > deployments
>> > > > > > > > > > > where you want to guarantee job isolation. For
>> example, a
>> > > > user
>> > > > > > could
>> > > > > > > > > use
>> > > > > > > > > > > the per-job mode on Yarn to execute his job on a
>> separate
>> > > > > > cluster.
>> > > > > > > > > > >
>> > > > > > > > > > > I think that having two notions of cluster deployments
>> > > > (session
>> > > > > > vs.
>> > > > > > > > > > per-job
>> > > > > > > > > > > mode) does not necessarily contradict your ideas for
>> the
>> > > > client
>> > > > > > api
>> > > > > > > > > > > refactoring. For example one could have the following
>> > > > > interfaces:
>> > > > > > > > > > >
>> > > > > > > > > > > - ClusterDeploymentDescriptor: encapsulates the logic
>> > how to
>> > > > > > deploy a
>> > > > > > > > > > > cluster.
>> > > > > > > > > > > - ClusterClient: allows to interact with a cluster
>> > > > > > > > > > > - JobClient: allows to interact with a running job
>> > > > > > > > > > >
>> > > > > > > > > > > Now the ClusterDeploymentDescriptor could have two
>> > methods:
>> > > > > > > > > > >
>> > > > > > > > > > > - ClusterClient deploySessionCluster()
>> > > > > > > > > > > - JobClusterClient/JobClient
>> > deployPerJobCluster(JobGraph)
>> > > > > > > > > > >
>> > > > > > > > > > > where JobClusterClient is either a supertype of
>> > ClusterClient
>> > > > > > which
>> > > > > > > > > does
>> > > > > > > > > > > not give you the functionality to submit jobs or
>> > > > > > deployPerJobCluster
>> > > > > > > > > > > returns directly a JobClient.
>> > > > > > > > > > >
>> > > > > > > > > > > When setting up the ExecutionEnvironment, one would
>> then
>> > not
>> > > > > > provide
>> > > > > > > > a
>> > > > > > > > > > > ClusterClient to submit jobs but a JobDeployer which,
>> > > > depending
>> > > > > > on
>> > > > > > > > the
>> > > > > > > > > > > selected mode, either uses a ClusterClient (session
>> > mode) to
>> > > > > > submit
>> > > > > > > > > jobs
>> > > > > > > > > > or
>> > > > > > > > > > > a ClusterDeploymentDescriptor to deploy per a job mode
>> > > > cluster
>> > > > > > with
>> > > > > > > > the
>> > > > > > > > > > job
>> > > > > > > > > > > to execute.
>> > > > > > > > > > >
>> > > > > > > > > > > These are just some thoughts how one could make it
>> > working
>> > > > > > because I
>> > > > > > > > > > > believe there is some value in using the per job mode
>> > from
>> > > > the
>> > > > > > > > > > > ExecutionEnvironment.
>> > > > > > > > > > >
>> > > > > > > > > > > Concerning the web submission, this is indeed a bit
>> > tricky.
>> > > > > From
>> > > > > > a
>> > > > > > > > > > cluster
>> > > > > > > > > > > management stand point, I would in favour of not
>> > executing
>> > > > user
>> > > > > > code
>> > > > > > > > on
>> > > > > > > > > > the
>> > > > > > > > > > > REST endpoint. Especially when considering security,
>> it
>> > would
>> > > > > be
>> > > > > > good
>> > > > > > > > > to
>> > > > > > > > > > > have a well defined cluster behaviour where it is
>> > explicitly
>> > > > > > stated
>> > > > > > > > > where
>> > > > > > > > > > > user code and, thus, potentially risky code is
>> executed.
>> > > > > Ideally
>> > > > > > we
>> > > > > > > > > limit
>> > > > > > > > > > > it to the TaskExecutor and JobMaster.
>> > > > > > > > > > >
>> > > > > > > > > > > Cheers,
>> > > > > > > > > > > Till
>> > > > > > > > > > >
>> > > > > > > > > > > On Tue, Aug 20, 2019 at 9:40 AM Flavio Pompermaier <
>> > > > > > > > > pompermaier@okkam.it
>> > > > > > > > > > >
>> > > > > > > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > In my opinion the client should not use any
>> > environment to
>> > > > > get
>> > > > > > the
>> > > > > > > > > Job
>> > > > > > > > > > > > graph because the jar should reside ONLY on the
>> cluster
>> > > > (and
>> > > > > > not in
>> > > > > > > > > the
>> > > > > > > > > > > > client classpath otherwise there are always
>> > inconsistencies
>> > > > > > between
>> > > > > > > > > > > client
>> > > > > > > > > > > > and Flink Job manager's classpath).
>> > > > > > > > > > > > In the YARN, Mesos and Kubernetes scenarios you have
>> > the
>> > > > jar
>> > > > > > but
>> > > > > > > > you
>> > > > > > > > > > > could
>> > > > > > > > > > > > start a cluster that has the jar on the Job Manager
>> as
>> > well
>> > > > > > (but
>> > > > > > > > this
>> > > > > > > > > > is
>> > > > > > > > > > > > the only case where I think you can assume that the
>> > client
>> > > > > has
>> > > > > > the
>> > > > > > > > > jar
>> > > > > > > > > > on
>> > > > > > > > > > > > the classpath; in the REST job submission you don't
>> > have
>> > > > any
>> > > > > > > > > > classpath).
>> > > > > > > > > > > >
>> > > > > > > > > > > > Thus, always in my opinion, the JobGraph should be
>> > > > generated
>> > > > > > by the
>> > > > > > > > > Job
>> > > > > > > > > > > > Manager REST API.
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <
>> > > > > > wander4096@gmail.com>
>> > > > > > > > > > wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > >> I would like to involve Till & Stephan here to
>> clarify
>> > > > some
>> > > > > > > > concept
>> > > > > > > > > of
>> > > > > > > > > > > >> per-job mode.
>> > > > > > > > > > > >>
>> > > > > > > > > > > >> The term per-job is one of modes a cluster could
>> run
>> > on.
>> > > > It
>> > > > > is
>> > > > > > > > > mainly
>> > > > > > > > > > > >> aimed
>> > > > > > > > > > > >> at spawn
>> > > > > > > > > > > >> a dedicated cluster for a specific job while the
>> job
>> > could
>> > > > > be
>> > > > > > > > > packaged
>> > > > > > > > > > > >> with
>> > > > > > > > > > > >> Flink
>> > > > > > > > > > > >> itself and thus the cluster initialized with job so
>> > that
>> > > > get
>> > > > > > rid
>> > > > > > > > of
>> > > > > > > > > a
>> > > > > > > > > > > >> separated
>> > > > > > > > > > > >> submission step.
>> > > > > > > > > > > >>
>> > > > > > > > > > > >> This is useful for container deployments where one
>> > create
>> > > > > his
>> > > > > > > > image
>> > > > > > > > > > with
>> > > > > > > > > > > >> the job
>> > > > > > > > > > > >> and then simply deploy the container.
>> > > > > > > > > > > >>
>> > > > > > > > > > > >> However, it is out of client scope since a
>> > > > > > client(ClusterClient
>> > > > > > > > for
>> > > > > > > > > > > >> example) is for
>> > > > > > > > > > > >> communicating with an existing cluster and performing
>> > > > > > > > > > > >> actions.
>> > > > > > > > > > Currently,
>> > > > > > > > > > > >> in
>> > > > > > > > > > > >> per-job
>> > > > > > > > > > > >> mode, we extract the job graph and bundle it into
>> > cluster
>> > > > > > > > deployment
>> > > > > > > > > > and
>> > > > > > > > > > > >> thus no
>> > > > > > > > > > > >> concept of client get involved. It looks like
>> > reasonable
>> > > > to
>> > > > > > > > exclude
>> > > > > > > > > > the
>> > > > > > > > > > > >> deployment
>> > > > > > > > > > > >> of per-job cluster from client api and use
>> dedicated
>> > > > utility
>> > > > > > > > > > > >> classes(deployers) for
>> > > > > > > > > > > >> deployment.
>> > > > > > > > > > > >>
>> > > > > > > > > > > >> > Zili Chen <wa...@gmail.com> wrote on Tue, Aug 20,
>> > > > > > > > > > > >> > 2019 at 12:37 PM:
>> > > > > > > > > > > >>
>> > > > > > > > > > > >> > Hi Aljoscha,
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > Thanks for your reply and participance. The
>> Google
>> > Doc
>> > > > you
>> > > > > > > > linked
>> > > > > > > > > to
>> > > > > > > > > > > >> > requires
>> > > > > > > > > > > >> > permission and I think you could use a share link
>> > > > instead.
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > I agree with that we almost reach a consensus
>> that
>> > > > > > JobClient is
>> > > > > > > > > > > >> necessary
>> > > > > > > > > > > >> > to
>> > > > > > > > > > > >> > interacte with a running Job.
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > Let me check your open questions one by one.
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > 1. Separate cluster creation and job submission
>> for
>> > > > > per-job
>> > > > > > > > mode.
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > As you mentioned here is where the opinions
>> > diverge. In
>> > > > my
>> > > > > > > > > document
>> > > > > > > > > > > >> there
>> > > > > > > > > > > >> > is
>> > > > > > > > > > > >> > an alternative[2] that proposes excluding per-job
>> > > > > deployment
>> > > > > > > > from
>> > > > > > > > > > > client
>> > > > > > > > > > > >> > api
>> > > > > > > > > > > >> > scope and now I find it is more reasonable we do
>> the
>> > > > > > exclusion.
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > When in per-job mode, a dedicated JobCluster is
>> > launched
>> > > > > to
>> > > > > > > > > execute
>> > > > > > > > > > > the
>> > > > > > > > > > > >> > specific job. It is like a Flink Application more
>> > than a
>> > > > > > > > > submission
>> > > > > > > > > > > >> > of Flink Job. Client only takes care of job
>> > submission
>> > > > and
>> > > > > > > > assume
>> > > > > > > > > > > there
>> > > > > > > > > > > >> is
>> > > > > > > > > > > >> > an existing cluster. In this way we are able to
>> > consider
>> > > > > > per-job
>> > > > > > > > > > > issues
>> > > > > > > > > > > >> > individually and JobClusterEntrypoint would be
>> the
>> > > > utility
>> > > > > > class
>> > > > > > > > > for
>> > > > > > > > > > > >> > per-job
>> > > > > > > > > > > >> > deployment.
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > Nevertheless, user program works in both session
>> > mode
>> > > > and
>> > > > > > > > per-job
>> > > > > > > > > > mode
>> > > > > > > > > > > >> > without
>> > > > > > > > > > > >> > necessary to change code. JobClient in per-job
>> mode
>> > is
>> > > > > > returned
>> > > > > > > > > from
>> > > > > > > > > > > >> > env.execute as normal. However, it would be no
>> > longer a
>> > > > > > wrapper
>> > > > > > > > of
>> > > > > > > > > > > >> > RestClusterClient but a wrapper of
>> > PerJobClusterClient
>> > > > > which
>> > > > > > > > > > > >> communicates
>> > > > > > > > > > > >> > to Dispatcher locally.
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > 2. How to deal with plan preview.
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > With env.compile functions users can get
>> JobGraph or
>> > > > > > FlinkPlan
>> > > > > > > > and
>> > > > > > > > > > > thus
>> > > > > > > > > > > >> > they can preview the plan with programming.
>> > Typically it
>> > > > > > looks
>> > > > > > > > > like
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > if (preview configured) {
>> > > > > > > > > > > >> >     FlinkPlan plan = env.compile();
>> > > > > > > > > > > >> >     new JSONDumpGenerator(...).dump(plan);
>> > > > > > > > > > > >> > } else {
>> > > > > > > > > > > >> >     env.execute();
>> > > > > > > > > > > >> > }
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > And `flink info` would be invalid any more.
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > 3. How to deal with Jar Submission at the Web
>> > Frontend.
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > There is one more thread talked on this topic[1].
>> > Apart
>> > > > > from
>> > > > > > > > > > removing
>> > > > > > > > > > > >> > the functions there are two alternatives.
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > One is to introduce an interface has a method
>> > returns
>> > > > > > > > > > > JobGraph/FilnkPlan
>> > > > > > > > > > > >> > and Jar Submission only support main-class
>> > implements
>> > > > this
>> > > > > > > > > > interface.
>> > > > > > > > > > > >> > And then extract the JobGraph/FlinkPlan just by
>> > calling
>> > > > > the
>> > > > > > > > > method.
>> > > > > > > > > > > >> > In this way, it is even possible to consider a
>> > > > separation
>> > > > > > of job
>> > > > > > > > > > > >> creation
>> > > > > > > > > > > >> > and job submission.
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > The other is, as you mentioned, let execute() do
>> the
>> > > > > actual
>> > > > > > > > > > execution.
>> > > > > > > > > > > >> > We won't execute the main method in the
>> WebFrontend
>> > but
>> > > > > > spawn a
>> > > > > > > > > > > process
>> > > > > > > > > > > >> > at WebMonitor side to execute. For return part we
>> > could
>> > > > > > generate
>> > > > > > > > > the
>> > > > > > > > > > > >> > JobID from WebMonitor and pass it to the
>> execution
>> > > > > > environemnt.
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > 4. How to deal with detached mode.
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > I think detached mode is a temporary solution for
>> > > > > > non-blocking
>> > > > > > > > > > > >> submission.
>> > > > > > > > > > > >> > In my document both submission and execution
>> return
>> > a
>> > > > > > > > > > > CompletableFuture
>> > > > > > > > > > > >> and
>> > > > > > > > > > > >> > users control whether or not wait for the
>> result. In
>> > > > this
>> > > > > > point
>> > > > > > > > we
>> > > > > > > > > > > don't
>> > > > > > > > > > > >> > need a detached option but the functionality is
>> > covered.
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > 5. How does per-job mode interact with
>> interactive
>> > > > > > programming.
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > All of YARN, Mesos and Kubernetes scenarios
>> follow
>> > the
>> > > > > > pattern
>> > > > > > > > > > launch
>> > > > > > > > > > > a
>> > > > > > > > > > > >> > JobCluster now. And I don't think there would be
>> > > > > > inconsistency
>> > > > > > > > > > between
>> > > > > > > > > > > >> > different resource management.
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > Best,
>> > > > > > > > > > > >> > tison.
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > [1]
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >>
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> >
>> https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
>> > > > > > > > > > > >> > [2]
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >>
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> >
>> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > Aljoscha Krettek <al...@apache.org>
>> > > > > > > > > > > >> > wrote on Fri, Aug 16, 2019 at 9:20 PM:
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> >> Hi,
>> > > > > > > > > > > >> >>
>> > > > > > > > > > > >> >> I read both Jeffs initial design document and
>> the
>> > newer
>> > > > > > > > document
>> > > > > > > > > by
>> > > > > > > > > > > >> >> Tison. I also finally found the time to collect
>> our
>> > > > > > thoughts on
>> > > > > > > > > the
>> > > > > > > > > > > >> issue,
>> > > > > > > > > > > >> >> I had quite some discussions with Kostas and
>> this
>> > is
>> > > > the
>> > > > > > > > result:
>> > > > > > > > > > [1].
>> > > > > > > > > > > >> >>
>> > > > > > > > > > > >> >> I think overall we agree that this part of the
>> > code is
>> > > > in
>> > > > > > dire
>> > > > > > > > > need
>> > > > > > > > > > > of
>> > > > > > > > > > > >> >> some refactoring/improvements but I think there
>> are
>> > > > still
>> > > > > > some
>> > > > > > > > > open
>> > > > > > > > > > > >> >> questions and some differences in opinion what
>> > those
>> > > > > > > > refactorings
>> > > > > > > > > > > >> should
>> > > > > > > > > > > >> >> look like.
>> > > > > > > > > > > >> >>
>> > > > > > > > > > > >> >> I think the API-side is quite clear, i.e. we
>> need
>> > some
>> > > > > > > > JobClient
>> > > > > > > > > > API
>> > > > > > > > > > > >> that
>> > > > > > > > > > > >> >> allows interacting with a running Job. It could
>> be
>> > > > > > worthwhile
>> > > > > > > > to
>> > > > > > > > > > spin
>> > > > > > > > > > > >> that
>> > > > > > > > > > > >> >> off into a separate FLIP because we can probably
>> > find
>> > > > > > consensus
>> > > > > > > > > on
>> > > > > > > > > > > that
>> > > > > > > > > > > >> >> part more easily.
>> > > > > > > > > > > >> >>
>> > > > > > > > > > > >> >> For the rest, the main open questions from our
>> doc
>> > are
>> > > > > > these:
>> > > > > > > > > > > >> >>
>> > > > > > > > > > > >> >>   - Do we want to separate cluster creation and
>> job
>> > > > > > submission
>> > > > > > > > > for
>> > > > > > > > > > > >> >> per-job mode? In the past, there were conscious
>> > efforts
>> > > > > to
>> > > > > > > > *not*
>> > > > > > > > > > > >> separate
>> > > > > > > > > > > >> >> job submission from cluster creation for per-job
>> > > > clusters
>> > > > > > for
>> > > > > > > > > > Mesos,
>> > > > > > > > > > > >> YARN,
>> > > > > > > > > > > >> >> Kubernets (see StandaloneJobClusterEntryPoint).
>> > Tison
>> > > > > > suggests
>> > > > > > > > in
>> > > > > > > > > > his
>> > > > > > > > > > > >> >> design document to decouple this in order to
>> unify
>> > job
>> > > > > > > > > submission.
>> > > > > > > > > > > >> >>
>> > > > > > > > > > > >> >>   - How to deal with plan preview, which needs
>> to
>> > > > hijack
>> > > > > > > > > execute()
>> > > > > > > > > > > and
>> > > > > > > > > > > >> >> let the outside code catch an exception?
>> > > > > > > > > > > >> >>
>> > > > > > > > > > > >> >>   - How to deal with Jar Submission at the Web
>> > > > Frontend,
>> > > > > > which
>> > > > > > > > > > needs
>> > > > > > > > > > > to
>> > > > > > > > > > > >> >> hijack execute() and let the outside code catch
>> an
>> > > > > > exception?
>> > > > > > > > > > > >> >> CliFrontend.run() “hijacks”
>> > > > > ExecutionEnvironment.execute()
>> > > > > > to
>> > > > > > > > > get a
>> > > > > > > > > > > >> >> JobGraph and then execute that JobGraph
>> manually.
>> > We
>> > > > > could
>> > > > > > get
>> > > > > > > > > > around
>> > > > > > > > > > > >> that
>> > > > > > > > > > > >> >> by letting execute() do the actual execution.
>> One
>> > > > caveat
>> > > > > > for
>> > > > > > > > this
>> > > > > > > > > > is
>> > > > > > > > > > > >> that
>> > > > > > > > > > > >> >> now the main() method doesn’t return (or is
>> forced
>> > to
>> > > > > > return by
>> > > > > > > > > > > >> throwing an
>> > > > > > > > > > > >> >> exception from execute()) which means that for
>> Jar
>> > > > > > Submission
>> > > > > > > > > from
>> > > > > > > > > > > the
>> > > > > > > > > > > >> >> WebFrontend we have a long-running main() method
>> > > > running
>> > > > > > in the
>> > > > > > > > > > > >> >> WebFrontend. This doesn’t sound very good. We
>> > could get
>> > > > > > around
>> > > > > > > > > this
>> > > > > > > > > > > by
>> > > > > > > > > > > >> >> removing the plan preview feature and by
>> removing
>> > Jar
>> > > > > > > > > > > >> Submission/Running.
>> > > > > > > > > > > >> >>
>> > > > > > > > > > > >> >>   - How to deal with detached mode? Right now,
>> > > > > > > > > DetachedEnvironment
>> > > > > > > > > > > will
>> > > > > > > > > > > >> >> execute the job and return immediately. If users
>> > > > control
>> > > > > > when
>> > > > > > > > > they
>> > > > > > > > > > > >> want to
>> > > > > > > > > > > >> >> return, by waiting on the job completion future,
>> > how do
>> > > > > we
>> > > > > > deal
>> > > > > > > > > > with
>> > > > > > > > > > > >> this?
>> > > > > > > > > > > >> >> Do we simply remove the distinction between
>> > > > > > > > > detached/non-detached?
>> > > > > > > > > > > >> >>
>> > > > > > > > > > > >> >>   - How does per-job mode interact with
>> > “interactive
>> > > > > > > > programming”
>> > > > > > > > > > > >> >> (FLIP-36). For YARN, each execute() call could
>> > spawn a
>> > > > > new
>> > > > > > > > Flink
>> > > > > > > > > > YARN
>> > > > > > > > > > > >> >> cluster. What about Mesos and Kubernetes?
>> > > > > > > > > > > >> >>
>> > > > > > > > > > > >> >> The first open question is where the opinions
>> > diverge,
>> > > > I
>> > > > > > think.
>> > > > > > > > > The
>> > > > > > > > > > > >> rest
>> > > > > > > > > > > >> >> are just open questions and interesting things
>> > that we
>> > > > > > need to
>> > > > > > > > > > > >> consider.
>> > > > > > > > > > > >> >>
>> > > > > > > > > > > >> >> Best,
>> > > > > > > > > > > >> >> Aljoscha
>> > > > > > > > > > > >> >>
>> > > > > > > > > > > >> >> [1]
>> > > > > > > > > > > >> >>
>> > > > > > > > > > > >>
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> >
>> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
>> > > > > > > > > > > >> >>
>> > > > > > > > > > > >> >> > On 31. Jul 2019, at 15:23, Jeff Zhang <
>> > > > > zjffdu@gmail.com>
>> > > > > > > > > wrote:
>> > > > > > > > > > > >> >> >
>> > > > > > > > > > > >> >> > Thanks tison for the effort. I left a few
>> > comments.
>> > > > > > > > > > > >> >> >
>> > > > > > > > > > > >> >> >
>> > > > > > > > > > > >> >> > Zili Chen <wa...@gmail.com>
>> 于2019年7月31日周三
>> > > > > 下午8:24写道:
>> > > > > > > > > > > >> >> >
>> > > > > > > > > > > >> >> >> Hi Flavio,
>> > > > > > > > > > > >> >> >>
>> > > > > > > > > > > >> >> >> Thanks for your reply.
>> > > > > > > > > > > >> >> >>
>> > > > > > > > > > > >> >> >> In both the current implementation and the design,
>> > ClusterClient
>> > > > > > > > > > > >> >> >> never takes responsibility for generating
>> > JobGraph.
>> > > > > > > > > > > >> >> >> (what you see in current codebase is several
>> > class
>> > > > > > methods)
>> > > > > > > > > > > >> >> >>
>> > > > > > > > > > > >> >> >> Instead, user describes his program in the
>> main
>> > > > method
>> > > > > > > > > > > >> >> >> with ExecutionEnvironment apis and calls
>> > > > env.compile()
>> > > > > > > > > > > >> >> >> or env.optimize() to get FlinkPlan and
>> JobGraph
>> > > > > > > > respectively.
>> > > > > > > > > > > >> >> >>
>> > > > > > > > > > > >> >> >> For listing main classes in a jar and choose
>> > one for
>> > > > > > > > > > > >> >> >> submission, you're now able to customize a
>> CLI
>> > to do
>> > > > > it.
>> > > > > > > > > > > >> >> >> Specifically, the path of jar is passed as
>> > arguments
>> > > > > and
>> > > > > > > > > > > >> >> >> in the customized CLI you list main classes,
>> > choose
>> > > > > one
>> > > > > > > > > > > >> >> >> to submit to the cluster.
>> > > > > > > > > > > >> >> >>
>> > > > > > > > > > > >> >> >> Best,
>> > > > > > > > > > > >> >> >> tison.
>> > > > > > > > > > > >> >> >>
>> > > > > > > > > > > >> >> >>
>> > > > > > > > > > > >> >> >> Flavio Pompermaier <po...@okkam.it>
>> > > > > 于2019年7月31日周三
>> > > > > > > > > > 下午8:12写道:
>> > > > > > > > > > > >> >> >>
>> > > > > > > > > > > >> >> >>> Just one note on my side: it is not clear
>> to me
>> > > > > > whether the
>> > > > > > > > > > > client
>> > > > > > > > > > > >> >> needs
>> > > > > > > > > > > >> >> >> to
>> > > > > > > > > > > >> >> >>> be able to generate a job graph or not.
>> > > > > > > > > > > >> >> >>> In my opinion, the job jar must reside only
>> > on the
>> > > > > > > > > > > >> server/jobManager
>> > > > > > > > > > > >> >> >> side
>> > > > > > > > > > > >> >> >>> and the client requires a way to get the job
>> > graph.
>> > > > > > > > > > > >> >> >>> If you really want to access the job
>> graph,
>> > I'd
>> > > > > add
>> > > > > > a
>> > > > > > > > > > > dedicated
>> > > > > > > > > > > >> >> method
>> > > > > > > > > > > >> >> >>> on the ClusterClient. like:
>> > > > > > > > > > > >> >> >>>
>> > > > > > > > > > > >> >> >>>   - getJobGraph(jarId, mainClass): JobGraph
>> > > > > > > > > > > >> >> >>>   - listMainClasses(jarId): List<String>
>> > > > > > > > > > > >> >> >>>
>> > > > > > > > > > > >> >> >>> These would require some additions on
>> the
>> > job
>> > > > > > manager
>> > > > > > > > > > > endpoint
>> > > > > > > > > > > >> as
>> > > > > > > > > > > >> >> >>> well... what do you think?
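[Editor's note: for illustration, the two suggested ClusterClient additions could look roughly like the sketch below. Everything here — the `JobGraph` placeholder, the in-memory jar index, the class name — is a hypothetical stand-in for discussion, not actual Flink API.]

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two methods suggested above. None of these
// types mirror real Flink classes.
public class ClusterClientSketch {

    // Placeholder for a compiled job graph.
    public static final class JobGraph {
        public final String mainClass;
        JobGraph(String mainClass) { this.mainClass = mainClass; }
    }

    // Pretend the jar was uploaded earlier and its main classes were
    // indexed on the server side.
    private final Map<String, List<String>> mainClassesByJar;

    public ClusterClientSketch(Map<String, List<String>> index) {
        this.mainClassesByJar = index;
    }

    // listMainClasses(jarId): List<String>
    public List<String> listMainClasses(String jarId) {
        return mainClassesByJar.getOrDefault(jarId, List.of());
    }

    // getJobGraph(jarId, mainClass): JobGraph
    public JobGraph getJobGraph(String jarId, String mainClass) {
        if (!listMainClasses(jarId).contains(mainClass)) {
            throw new IllegalArgumentException("unknown main class: " + mainClass);
        }
        // A real implementation would compile the graph on the jobmanager side.
        return new JobGraph(mainClass);
    }
}
```

A caller would first list the main classes of an uploaded jar, pick one (possibly via a UI), and only then ask for the job graph — which keeps the jar itself on the jobmanager side, as suggested above.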
>> > > > > > > > > > > >> >> >>>
>> > > > > > > > > > > >> >> >>> On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <
>> > > > > > > > > > wander4096@gmail.com
>> > > > > > > > > > > >
>> > > > > > > > > > > >> >> wrote:
>> > > > > > > > > > > >> >> >>>
>> > > > > > > > > > > >> >> >>>> Hi all,
>> > > > > > > > > > > >> >> >>>>
>> > > > > > > > > > > >> >> >>>> Here is a document[1] on client api
>> > enhancement
>> > > > from
>> > > > > > our
>> > > > > > > > > > > >> perspective.
>> > > > > > > > > > > >> >> >>>> We have investigated current
>> implementations.
>> > And
>> > > > we
>> > > > > > > > propose
>> > > > > > > > > > > >> >> >>>>
>> > > > > > > > > > > >> >> >>>> 1. Unify the implementation of cluster
>> > deployment
>> > > > > and
>> > > > > > job
>> > > > > > > > > > > >> submission
>> > > > > > > > > > > >> >> in
>> > > > > > > > > > > >> >> >>>> Flink.
>> > > > > > > > > > > >> >> >>>> 2. Provide programmatic interfaces to allow
>> > > > flexible
>> > > > > > job
>> > > > > > > > and
>> > > > > > > > > > > >> cluster
>> > > > > > > > > > > >> >> >>>> management.
>> > > > > > > > > > > >> >> >>>>
>> > > > > > > > > > > >> >> >>>> The first proposal is aimed at reducing
>> code
>> > paths
>> > > > > of
>> > > > > > > > > cluster
>> > > > > > > > > > > >> >> >> deployment
>> > > > > > > > > > > >> >> >>>> and
>> > > > > > > > > > > >> >> >>>> job submission so that one can adopt Flink
>> in
>> > his
>> > > > > > usage
>> > > > > > > > > > easily.
>> > > > > > > > > > > >> The
>> > > > > > > > > > > >> >> >>> second
>> > > > > > > > > > > >> >> >>>> proposal is aimed at providing rich
>> > interfaces for
>> > > > > > > > advanced
>> > > > > > > > > > > users
>> > > > > > > > > > > >> >> >>>> who want to make accurate control of these
>> > stages.
>> > > > > > > > > > > >> >> >>>>
>> > > > > > > > > > > >> >> >>>> Quick reference on open questions:
>> > > > > > > > > > > >> >> >>>>
>> > > > > > > > > > > >> >> >>>> 1. Exclude job cluster deployment from
>> client
>> > side
>> > > > > or
>> > > > > > > > > redefine
>> > > > > > > > > > > the
>> > > > > > > > > > > >> >> >>> semantic
>> > > > > > > > > > > >> >> >>>> of job cluster? Since it fits in a process
>> > quite
>> > > > > > different
>> > > > > > > > > > from
>> > > > > > > > > > > >> >> session
>> > > > > > > > > > > >> >> >>>> cluster deployment and job submission.
>> > > > > > > > > > > >> >> >>>>
>> > > > > > > > > > > >> >> >>>> 2. Maintain the codepaths handling class
>> > > > > > > > > > > o.a.f.api.common.Program
>> > > > > > > > > > > >> or
>> > > > > > > > > > > >> >> >>>> implement customized program handling
>> logic by
>> > > > > > customized
>> > > > > > > > > > > >> >> CliFrontend?
>> > > > > > > > > > > >> >> >>>> See also this thread[2] and the
>> document[1].
>> > > > > > > > > > > >> >> >>>>
>> > > > > > > > > > > >> >> >>>> 3. Expose ClusterClient as public api or
>> just
>> > > > expose
>> > > > > > api
>> > > > > > > > in
>> > > > > > > > > > > >> >> >>>> ExecutionEnvironment
>> > > > > > > > > > > >> >> >>>> and delegate them to ClusterClient?
>> Further,
>> > in
>> > > > > > either way
>> > > > > > > > > is
>> > > > > > > > > > it
>> > > > > > > > > > > >> >> worth
>> > > > > > > > > > > >> >> >> to
>> > > > > > > > > > > >> >> >>>> introduce a JobClient which is an
>> > encapsulation of
>> > > > > > > > > > ClusterClient
>> > > > > > > > > > > >> that
>> > > > > > > > > > > >> >> >>>> associated to specific job?
>> > > > > > > > > > > >> >> >>>>
>> > > > > > > > > > > >> >> >>>> Best,
>> > > > > > > > > > > >> >> >>>> tison.
>> > > > > > > > > > > >> >> >>>>
>> > > > > > > > > > > >> >> >>>> [1]
>> > > > > > > > > > > >> >> >>>>
>> > > > > > > > > > > >> >> >>>>
>> > > > > > > > > > > >> >> >>>
>> > > > > > > > > > > >> >> >>
>> > > > > > > > > > > >> >>
>> > > > > > > > > > > >>
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> >
>> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
>> > > > > > > > > > > >> >> >>>> [2]
>> > > > > > > > > > > >> >> >>>>
>> > > > > > > > > > > >> >> >>>>
>> > > > > > > > > > > >> >> >>>
>> > > > > > > > > > > >> >> >>
>> > > > > > > > > > > >> >>
>> > > > > > > > > > > >>
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> >
>> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
>> > > > > > > > > > > >> >> >>>>
>> > > > > > > > > > > >> >> >>>> Jeff Zhang <zj...@gmail.com>
>> 于2019年7月24日周三
>> > > > > 上午9:19写道:
>> > > > > > > > > > > >> >> >>>>
>> > > > > > > > > > > >> >> >>>>> Thanks Stephan, I will follow up this
>> issue
>> > in
>> > > > next
>> > > > > > few
>> > > > > > > > > > weeks,
>> > > > > > > > > > > >> and
>> > > > > > > > > > > >> >> >> will
>> > > > > > > > > > > >> >> >>>>> refine the design doc. We could discuss
>> more
>> > > > > details
>> > > > > > > > after
>> > > > > > > > > > 1.9
>> > > > > > > > > > > >> >> >> release.
>> > > > > > > > > > > >> >> >>>>>
>> > > > > > > > > > > >> >> >>>>> Stephan Ewen <se...@apache.org>
>> > 于2019年7月24日周三
>> > > > > > 上午12:58写道:
>> > > > > > > > > > > >> >> >>>>>
>> > > > > > > > > > > >> >> >>>>>> Hi all!
>> > > > > > > > > > > >> >> >>>>>>
>> > > > > > > > > > > >> >> >>>>>> This thread has stalled for a bit, which
>> I
>> > > > assume
>> > > > > > is
>> > > > > > > > > mostly
>> > > > > > > > > > > >> due to
>> > > > > > > > > > > >> >> >>> the
>> > > > > > > > > > > >> >> >>>>>> Flink 1.9 feature freeze and release
>> testing
>> > > > > effort.
>> > > > > > > > > > > >> >> >>>>>>
>> > > > > > > > > > > >> >> >>>>>> I personally still recognize this issue
>> as
>> > one
>> > > > > > important
>> > > > > > > > > to
>> > > > > > > > > > be
>> > > > > > > > > > > >> >> >>> solved.
>> > > > > > > > > > > >> >> >>>>> I'd
>> > > > > > > > > > > >> >> >>>>>> be happy to help resume this discussion
>> soon
>> > > > > (after
>> > > > > > the
>> > > > > > > > > 1.9
>> > > > > > > > > > > >> >> >> release)
>> > > > > > > > > > > >> >> >>>> and
>> > > > > > > > > > > >> >> >>>>>> see if we can do some step towards this
>> in
>> > Flink
>> > > > > > 1.10.
>> > > > > > > > > > > >> >> >>>>>>
>> > > > > > > > > > > >> >> >>>>>> Best,
>> > > > > > > > > > > >> >> >>>>>> Stephan
>> > > > > > > > > > > >> >> >>>>>>
>> > > > > > > > > > > >> >> >>>>>>
>> > > > > > > > > > > >> >> >>>>>>
>> > > > > > > > > > > >> >> >>>>>> On Mon, Jun 24, 2019 at 10:41 AM Flavio
>> > > > > Pompermaier
>> > > > > > <
>> > > > > > > > > > > >> >> >>>>> pompermaier@okkam.it>
>> > > > > > > > > > > >> >> >>>>>> wrote:
>> > > > > > > > > > > >> >> >>>>>>
>> > > > > > > > > > > >> >> >>>>>>> That's exactly what I suggested a long
>> time
>> > > > ago:
>> > > > > > the
>> > > > > > > > > Flink
>> > > > > > > > > > > REST
>> > > > > > > > > > > >> >> >>>> client
>> > > > > > > > > > > >> >> >>>>>>> should not require any Flink dependency,
>> > only
>> > > > > http
>> > > > > > > > > library
>> > > > > > > > > > to
>> > > > > > > > > > > >> >> >> call
>> > > > > > > > > > > >> >> >>>> the
>> > > > > > > > > > > >> >> >>>>>> REST
>> > > > > > > > > > > >> >> >>>>>>> services to submit and monitor a job.
>> > > > > > > > > > > >> >> >>>>>>> What I suggested also in [1] was to
>> have a
>> > way
>> > > > to
>> > > > > > > > > > > automatically
>> > > > > > > > > > > >> >> >>>> suggest
>> > > > > > > > > > > >> >> >>>>>> the
>> > > > > > > > > > > >> >> >>>>>>> user (via a UI) the available main
>> classes
>> > and
>> > > > > > their
>> > > > > > > > > > required
>> > > > > > > > > > > >> >> >>>>>>> parameters[2].
>> > > > > > > > > > > >> >> >>>>>>> Another problem we have with Flink is
>> that
>> > the
>> > > > > Rest
>> > > > > > > > > client
>> > > > > > > > > > > and
>> > > > > > > > > > > >> >> >> the
>> > > > > > > > > > > >> >> >>>> CLI
>> > > > > > > > > > > >> >> >>>>>> one
>> > > > > > > > > > > >> >> >>>>>>> behaves differently and we use the CLI
>> > client
>> > > > > (via
>> > > > > > ssh)
>> > > > > > > > > > > because
>> > > > > > > > > > > >> >> >> it
>> > > > > > > > > > > >> >> >>>>> allows
>> > > > > > > > > > > >> >> >>>>>>> to call some other method after
>> > env.execute()
>> > > > [3]
>> > > > > > (we
>> > > > > > > > > have
>> > > > > > > > > > to
>> > > > > > > > > > > >> >> >> call
>> > > > > > > > > > > >> >> >>>>>> another
>> > > > > > > > > > > >> >> >>>>>>> REST service to signal the end of the
>> job).
>> > > > > > > > > > > >> >> >>>>>>> In this regard, a dedicated interface,
>> > like the
>> > > > > > > > > JobListener
>> > > > > > > > > > > >> >> >>> suggested
>> > > > > > > > > > > >> >> >>>>> in
>> > > > > > > > > > > >> >> >>>>>>> the previous emails, would be very
>> helpful
>> > > > > (IMHO).
>> > > > > > > > > > > >> >> >>>>>>>
>> > > > > > > > > > > >> >> >>>>>>> [1]
>> > > > > > https://issues.apache.org/jira/browse/FLINK-10864
>> > > > > > > > > > > >> >> >>>>>>> [2]
>> > > > > > https://issues.apache.org/jira/browse/FLINK-10862
>> > > > > > > > > > > >> >> >>>>>>> [3]
>> > > > > > https://issues.apache.org/jira/browse/FLINK-10879
>> > > > > > > > > > > >> >> >>>>>>>
>> > > > > > > > > > > >> >> >>>>>>> Best,
>> > > > > > > > > > > >> >> >>>>>>> Flavio
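[Editor's note: the JobListener-style hook suggested in the email above might look roughly like this sketch. The interface name and callback signatures are invented for illustration and are not Flink API.]

```java
// Hypothetical JobListener hook: the environment notifies the caller when
// a job is submitted and when it finishes, so no extra REST call is needed
// to learn that execute() has completed. All names are illustrative.
public class JobListenerSketch {

    public interface JobListener {
        void onJobSubmitted(String jobId);
        void onJobExecuted(String jobId, boolean success);
    }

    // Stand-in for an execution environment that accepts listeners.
    public static final class Env {
        private final java.util.List<JobListener> listeners = new java.util.ArrayList<>();

        public void registerJobListener(JobListener listener) {
            listeners.add(listener);
        }

        // Fake execute(): "runs" the job and fires the callbacks in order.
        public String execute(String jobName) {
            String jobId = jobName + "-0001";
            listeners.forEach(l -> l.onJobSubmitted(jobId));
            listeners.forEach(l -> l.onJobExecuted(jobId, true));
            return jobId;
        }
    }
}
```

With such a hook, the "call another REST service after env.execute()" workaround described above would collapse into a callback registered before execution.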
>> > > > > > > > > > > >> >> >>>>>>>
>> > > > > > > > > > > >> >> >>>>>>> On Mon, Jun 24, 2019 at 9:54 AM Jeff
>> Zhang
>> > <
>> > > > > > > > > > zjffdu@gmail.com
>> > > > > > > > > > > >
>> > > > > > > > > > > >> >> >>> wrote:
>> > > > > > > > > > > >> >> >>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>> Hi, Tison,
>> > > > > > > > > > > >> >> >>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>> Thanks for your comments. Overall I
>> agree
>> > with
>> > > > > you
>> > > > > > > > that
>> > > > > > > > > it
>> > > > > > > > > > > is
>> > > > > > > > > > > >> >> >>>>> difficult
>> > > > > > > > > > > >> >> >>>>>>> for
>> > > > > > > > > > > >> >> >>>>>>>> downstream projects to integrate with
>> > flink
>> > > > and
>> > > > > we
>> > > > > > > > need
>> > > > > > > > > to
>> > > > > > > > > > > >> >> >>> refactor
>> > > > > > > > > > > >> >> >>>>> the
>> > > > > > > > > > > >> >> >>>>>>>> current flink client api.
>> > > > > > > > > > > >> >> >>>>>>>> And I agree that CliFrontend should
>> only
>> > > > parsing
>> > > > > > > > command
>> > > > > > > > > > > line
>> > > > > > > > > > > >> >> >>>>> arguments
>> > > > > > > > > > > >> >> >>>>>>> and
>> > > > > > > > > > > >> >> >>>>>>>> then pass them to ExecutionEnvironment.
>> > It is
>> > > > > > > > > > > >> >> >>>> ExecutionEnvironment's
>> > > > > > > > > > > >> >> >>>>>>>> responsibility to compile job, create
>> > cluster,
>> > > > > and
>> > > > > > > > > submit
>> > > > > > > > > > > job.
>> > > > > > > > > > > >> >> >>>>> Besides
>> > > > > > > > > > > >> >> >>>>>>>> that, Currently flink has many
>> > > > > > ExecutionEnvironment
>> > > > > > > > > > > >> >> >>>> implementations,
>> > > > > > > > > > > >> >> >>>>>> and
>> > > > > > > > > > > >> >> >>>>>>>> flink will use the specific one based
>> on
>> > the
>> > > > > > context.
>> > > > > > > > > > IMHO,
>> > > > > > > > > > > it
>> > > > > > > > > > > >> >> >> is
>> > > > > > > > > > > >> >> >>>> not
>> > > > > > > > > > > >> >> >>>>>>>> necessary, ExecutionEnvironment should
>> be
>> > able
>> > > > > to
>> > > > > > do
>> > > > > > > > the
>> > > > > > > > > > > right
>> > > > > > > > > > > >> >> >>>> thing
>> > > > > > > > > > > >> >> >>>>>>> based
>> > > > > > > > > > > >> >> >>>>>>>> on the FlinkConf it is received. Too
>> many
>> > > > > > > > > > > ExecutionEnvironment
>> > > > > > > > > > > >> >> >>>>>>>> implementation is another burden for
>> > > > downstream
>> > > > > > > > project
>> > > > > > > > > > > >> >> >>>> integration.
>> > > > > > > > > > > >> >> >>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>> One thing I'd like to mention is
>> flink's
>> > scala
>> > > > > > shell
>> > > > > > > > and
>> > > > > > > > > > sql
>> > > > > > > > > > > >> >> >>>> client,
>> > > > > > > > > > > >> >> >>>>>>>> although they are sub-modules of flink,
>> > they
>> > > > > > could be
>> > > > > > > > > > > treated
>> > > > > > > > > > > >> >> >> as
>> > > > > > > > > > > >> >> >>>>>>> downstream
>> > > > > > > > > > > >> >> >>>>>>>> project which use flink's client api.
>> > > > Currently
>> > > > > > you
>> > > > > > > > will
>> > > > > > > > > > > find
>> > > > > > > > > > > >> >> >> it
>> > > > > > > > > > > >> >> >>> is
>> > > > > > > > > > > >> >> >>>>> not
>> > > > > > > > > > > >> >> >>>>>>>> easy for them to integrate with flink,
>> > they
>> > > > > share
>> > > > > > many
>> > > > > > > > > > > >> >> >> duplicated
>> > > > > > > > > > > >> >> >>>>> code.
>> > > > > > > > > > > >> >> >>>>>>> It
>> > > > > > > > > > > >> >> >>>>>>>> is another sign that we should refactor
>> > flink
>> > > > > > client
>> > > > > > > > > api.
>> > > > > > > > > > > >> >> >>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>> I believe it is a large and hard
>> change,
>> > and I
>> > > > > am
>> > > > > > > > afraid
>> > > > > > > > > > we
>> > > > > > > > > > > >> can
>> > > > > > > > > > > >> >> >>> not
>> > > > > > > > > > > >> >> >>>>>> keep
>> > > > > > > > > > > >> >> >>>>>>>> compatibility since many of changes are
>> > user
>> > > > > > facing.
>> > > > > > > > > > > >> >> >>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>> Zili Chen <wa...@gmail.com>
>> > > > 于2019年6月24日周一
>> > > > > > > > > 下午2:53写道:
>> > > > > > > > > > > >> >> >>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>> Hi all,
>> > > > > > > > > > > >> >> >>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>> After a closer look on our client
>> apis,
>> > I can
>> > > > > see
>> > > > > > > > there
>> > > > > > > > > > are
>> > > > > > > > > > > >> >> >> two
>> > > > > > > > > > > >> >> >>>>> major
>> > > > > > > > > > > >> >> >>>>>>>>> issues to consistency and integration,
>> > namely
>> > > > > > > > different
>> > > > > > > > > > > >> >> >>>> deployment
>> > > > > > > > > > > >> >> >>>>> of
>> > > > > > > > > > > >> >> >>>>>>>>> job cluster which couples job graph
>> > creation
>> > > > > and
>> > > > > > > > > cluster
>> > > > > > > > > > > >> >> >>>>> deployment,
>> > > > > > > > > > > >> >> >>>>>>>>> and submission via CliFrontend
>> confusing
>> > > > > control
>> > > > > > flow
>> > > > > > > > > of
>> > > > > > > > > > > job
>> > > > > > > > > > > >> >> >>>> graph
>> > > > > > > > > > > >> >> >>>>>>>>> compilation and job submission. I'd
>> like
>> > to
>> > > > > > follow
>> > > > > > > > the
>> > > > > > > > > > > >> >> >> discuss
>> > > > > > > > > > > >> >> >>>>> above,
>> > > > > > > > > > > >> >> >>>>>>>>> mainly the process described by Jeff
>> and
>> > > > > > Stephan, and
>> > > > > > > > > > share
>> > > > > > > > > > > >> >> >> my
>> > > > > > > > > > > >> >> >>>>>>>>> ideas on these issues.
>> > > > > > > > > > > >> >> >>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>> 1) CliFrontend confuses the control
>> flow
>> > of
>> > > > job
>> > > > > > > > > > compilation
>> > > > > > > > > > > >> >> >> and
>> > > > > > > > > > > >> >> >>>>>>>> submission.
>> > > > > > > > > > > >> >> >>>>>>>>> Following the process of job
>> submission
>> > > > Stephan
>> > > > > > and
>> > > > > > > > > Jeff
>> > > > > > > > > > > >> >> >>>> described,
>> > > > > > > > > > > >> >> >>>>>>>>> execution environment knows all
>> configs
>> > of
>> > > > the
>> > > > > > > > cluster
>> > > > > > > > > > and
>> > > > > > > > > > > >> >> >>>>>>> topos/settings
>> > > > > > > > > > > >> >> >>>>>>>>> of the job. Ideally, in the main
>> method
>> > of
>> > > > user
>> > > > > > > > > program,
>> > > > > > > > > > it
>> > > > > > > > > > > >> >> >>> calls
>> > > > > > > > > > > >> >> >>>>>>>> #execute
>> > > > > > > > > > > >> >> >>>>>>>>> (or named #submit) and Flink deploys
>> the
>> > > > > cluster,
>> > > > > > > > > compile
>> > > > > > > > > > > the
>> > > > > > > > > > > >> >> >>> job
>> > > > > > > > > > > >> >> >>>>>> graph
>> > > > > > > > > > > >> >> >>>>>>>>> and submit it to the cluster. However,
>> > > > current
>> > > > > > > > > > CliFrontend
>> > > > > > > > > > > >> >> >> does
>> > > > > > > > > > > >> >> >>>> all
>> > > > > > > > > > > >> >> >>>>>>> these
>> > > > > > > > > > > >> >> >>>>>>>>> things inside its #runProgram method,
>> > which
>> > > > > > > > introduces
>> > > > > > > > > a
>> > > > > > > > > > > lot
>> > > > > > > > > > > >> >> >> of
>> > > > > > > > > > > >> >> >>>>>>>> subclasses
>> > > > > > > > > > > >> >> >>>>>>>>> of (stream) execution environment.
>> > > > > > > > > > > >> >> >>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>> Actually, it sets up an exec env that
>> > hijacks
>> > > > > the
>> > > > > > > > > > > >> >> >>>>>> #execute/executePlan
>> > > > > > > > > > > >> >> >>>>>>>>> method, initializes the job graph and
>> > abort
>> > > > > > > > execution.
>> > > > > > > > > > And
>> > > > > > > > > > > >> >> >> then
>> > > > > > > > > > > >> >> >>>>>>>>> control flow back to CliFrontend, it
>> > deploys
>> > > > > the
>> > > > > > > > > > cluster(or
>> > > > > > > > > > > >> >> >>>>> retrieve
>> > > > > > > > > > > >> >> >>>>>>>>> the client) and submits the job graph.
>> > This
>> > > > is
>> > > > > > quite
>> > > > > > > > a
>> > > > > > > > > > > >> >> >> specific
>> > > > > > > > > > > >> >> >>>>>>> internal
>> > > > > > > > > > > >> >> >>>>>>>>> process inside Flink and none of
>> > consistency
>> > > > to
>> > > > > > > > > anything.
>> > > > > > > > > > > >> >> >>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>> 2) Deployment of job cluster couples
>> job
>> > > > graph
>> > > > > > > > creation
>> > > > > > > > > > and
>> > > > > > > > > > > >> >> >>>> cluster
>> > > > > > > > > > > >> >> >>>>>>>>> deployment. Abstractly, from user job
>> to
>> > a
>> > > > > > concrete
>> > > > > > > > > > > >> >> >> submission,
>> > > > > > > > > > > >> >> >>>> it
>> > > > > > > > > > > >> >> >>>>>>>> requires
>> > > > > > > > > > > >> >> >>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>>     create JobGraph --\
>> > > > > > > > > > > >> >> >>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>> create ClusterClient -->  submit
>> JobGraph
>> > > > > > > > > > > >> >> >>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>> such a dependency. ClusterClient was
>> > created
>> > > > by
>> > > > > > > > > deploying
>> > > > > > > > > > > or
>> > > > > > > > > > > >> >> >>>>>>> retrieving.
>> > > > > > > > > > > >> >> >>>>>>>>> JobGraph submission requires a
>> compiled
>> > > > > JobGraph
>> > > > > > and
>> > > > > > > > > > valid
>> > > > > > > > > > > >> >> >>>>>>> ClusterClient,
>> > > > > > > > > > > >> >> >>>>>>>>> but the creation of ClusterClient is
>> > > > abstractly
>> > > > > > > > > > independent
>> > > > > > > > > > > >> >> >> of
>> > > > > > > > > > > >> >> >>>> that
>> > > > > > > > > > > >> >> >>>>>> of
>> > > > > > > > > > > >> >> >>>>>>>>> JobGraph. However, in job cluster
>> mode,
>> > we
>> > > > > > deploy job
>> > > > > > > > > > > cluster
>> > > > > > > > > > > >> >> >>>> with
>> > > > > > > > > > > >> >> >>>>> a
>> > > > > > > > > > > >> >> >>>>>>> job
>> > > > > > > > > > > >> >> >>>>>>>>> graph, which means we use another
>> > process:
>> > > > > > > > > > > >> >> >>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>> create JobGraph --> deploy cluster
>> with
>> > the
>> > > > > > JobGraph
>> > > > > > > > > > > >> >> >>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>> Here is another inconsistency and
>> > downstream
>> > > > > > > > > > > projects/client
>> > > > > > > > > > > >> >> >>> apis
>> > > > > > > > > > > >> >> >>>>> are
>> > > > > > > > > > > >> >> >>>>>>>>> forced to handle different cases with
>> > rare
>> > > > > > supports
>> > > > > > > > > from
>> > > > > > > > > > > >> >> >> Flink.
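[Editor's note: the contrast between the two flows described above — session cluster versus job cluster — can be sketched as below. All types are illustrative stand-ins, not Flink's real descriptors or clients.]

```java
// Sketch of the inconsistency described above: session mode deploys a
// cluster first and submits the JobGraph afterwards, while job-cluster
// mode needs the JobGraph at deployment time. All types are illustrative.
public class DeploymentSketch {

    public static final class JobGraph {
        final String name;
        JobGraph(String name) { this.name = name; }
    }

    public static final class ClusterClient {
        public final java.util.List<String> submitted = new java.util.ArrayList<>();
        public void submitJob(JobGraph graph) { submitted.add(graph.name); }
    }

    // Session mode: create the cluster client, then submit an
    // independently compiled job graph.
    public static ClusterClient sessionFlow(JobGraph graph) {
        ClusterClient client = new ClusterClient(); // deploy or retrieve
        client.submitJob(graph);
        return client;
    }

    // Job-cluster mode: the deployment itself consumes the job graph,
    // so there is no separate submission step.
    public static ClusterClient perJobFlow(JobGraph graph) {
        ClusterClient client = new ClusterClient();
        client.submitted.add(graph.name); // graph baked into the deployment
        return client;
    }
}
```

Both flows end in the same state, which is why the email suggests treating a job cluster as an "ephemeral session": decouple deployment from the compiled graph and submit afterwards.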
>> > > > > > > > > > > >> >> >>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>> Since we likely reached a consensus on
>> > > > > > > > > > > >> >> >>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>> 1. all configs gathered by Flink
>> > > > configuration
>> > > > > > and
>> > > > > > > > > passed
>> > > > > > > > > > > >> >> >>>>>>>>> 2. execution environment knows all
>> > configs
>> > > > and
>> > > > > > > > handles
>> > > > > > > > > > > >> >> >>>>> execution(both
>> > > > > > > > > > > >> >> >>>>>>>>> deployment and submission)
>> > > > > > > > > > > >> >> >>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>> to the issues above I propose
>> eliminating
>> > > > > > > > > inconsistencies
>> > > > > > > > > > > by
>> > > > > > > > > > > >> >> >>>>>> following
>> > > > > > > > > > > >> >> >>>>>>>>> approach:
>> > > > > > > > > > > >> >> >>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>> 1) CliFrontend should exactly be a
>> front
>> > end,
>> > > > > at
>> > > > > > > > least
>> > > > > > > > > > for
>> > > > > > > > > > > >> >> >>> "run"
>> > > > > > > > > > > >> >> >>>>>>> command.
>> > > > > > > > > > > >> >> >>>>>>>>> That means it just gathered and passed
>> > all
>> > > > > config
>> > > > > > > > from
>> > > > > > > > > > > >> >> >> command
>> > > > > > > > > > > >> >> >>>> line
>> > > > > > > > > > > >> >> >>>>>> to
>> > > > > > > > > > > >> >> >>>>>>>>> the main method of user program.
>> > Execution
>> > > > > > > > environment
>> > > > > > > > > > > knows
>> > > > > > > > > > > >> >> >>> all
>> > > > > > > > > > > >> >> >>>>> the
>> > > > > > > > > > > >> >> >>>>>>> info
>> > > > > > > > > > > >> >> >>>>>>>>> and with an addition to utils for
>> > > > > ClusterClient,
>> > > > > > we
>> > > > > > > > > > > >> >> >> gracefully
>> > > > > > > > > > > >> >> >>>> get
>> > > > > > > > > > > >> >> >>>>> a
>> > > > > > > > > > > >> >> >>>>>>>>> ClusterClient by deploying or
>> > retrieving. In
>> > > > > this
>> > > > > > > > way,
>> > > > > > > > > we
>> > > > > > > > > > > >> >> >> don't
>> > > > > > > > > > > >> >> >>>>> need
>> > > > > > > > > > > >> >> >>>>>> to
>> > > > > > > > > > > >> >> >>>>>>>>> hijack #execute/executePlan methods
>> and
>> > can
>> > > > > > remove
>> > > > > > > > > > various
>> > > > > > > > > > > >> >> >>>> hacking
>> > > > > > > > > > > >> >> >>>>>>>>> subclasses of exec env, as well as
>> #run
>> > > > methods
>> > > > > > in
>> > > > > > > > > > > >> >> >>>>> ClusterClient(for
>> > > > > > > > > > > >> >> >>>>>> an
>> > > > > > > > > > > >> >> >>>>>>>>> interface-ized ClusterClient). Now the
>> > > > control
>> > > > > > flow
>> > > > > > > > > flows
>> > > > > > > > > > > >> >> >> from
>> > > > > > > > > > > >> >> >>>>>>>> CliFrontend
>> > > > > > > > > > > >> >> >>>>>>>>> to the main method and never returns.
>> > > > > > > > > > > >> >> >>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>> 2) Job cluster means a cluster for the
>> > > > specific
>> > > > > > job.
>> > > > > > > > > From
>> > > > > > > > > > > >> >> >>> another
>> > > > > > > > > > > >> >> >>>>>>>>> perspective, it is an ephemeral
>> session.
>> > We
>> > > > may
>> > > > > > > > > decouple
>> > > > > > > > > > > the
>> > > > > > > > > > > >> >> >>>>>> deployment
>> > > > > > > > > > > >> >> >>>>>>>>> with a compiled job graph, but start a
>> > > > session
>> > > > > > with
>> > > > > > > > > idle
>> > > > > > > > > > > >> >> >>> timeout
>> > > > > > > > > > > >> >> >>>>>>>>> and submit the job following.
>> > > > > > > > > > > >> >> >>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>> These topics, before we go into more
>> > details
>> > > > on
>> > > > > > > > design
>> > > > > > > > > or
>> > > > > > > > > > > >> >> >>>>>>> implementation,
>> > > > > > > > > > > >> >> >>>>>>>>> are better to be aware and discussed
>> for
>> > a
>> > > > > > consensus.
>> > > > > > > > > > > >> >> >>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>> Best,
>> > > > > > > > > > > >> >> >>>>>>>>> tison.
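[Editor's note: the control flow proposed in the email above — the CLI only turns command-line arguments into configuration, while the execution environment owns cluster deployment and job submission — can be sketched as below. Every class name here is a hypothetical stand-in, not Flink code.]

```java
import java.util.HashMap;

// Sketch of the proposed control flow: the CLI only parses arguments into
// configuration; the environment deploys/retrieves the cluster and submits
// the job, driven entirely by that configuration. Names are illustrative.
public class UnifiedFlowSketch {

    public static final class Configuration extends HashMap<String, String> {}

    public static final class Env {
        private final Configuration conf;
        Env(Configuration conf) { this.conf = conf; }

        // execute() handles deployment and submission based on the
        // configuration it was handed; no hijacked subclasses needed.
        public String execute(String jobName) {
            String target = conf.getOrDefault("execution.target", "local");
            return "submitted " + jobName + " to " + target; // stand-in for real submission
        }
    }

    // The "CliFrontend" part: parse args, build the configuration, and
    // hand control to the user program; it never touches the job graph.
    public static String run(String[] args, String jobName) {
        Configuration conf = new Configuration();
        for (String arg : args) { // e.g. "execution.target=yarn-session"
            String[] kv = arg.split("=", 2);
            conf.put(kv[0], kv[1]);
        }
        return new Env(conf).execute(jobName);
    }
}
```

The point of the sketch is the direction of the flow: control goes from the front end into the main method once and never returns, so there is no need to intercept #execute/#executePlan to extract a JobGraph.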
>> > > > > > > > > > > >> >> >>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>> Zili Chen <wa...@gmail.com>
>> > > > 于2019年6月20日周四
>> > > > > > > > > 上午3:21写道:
>> > > > > > > > > > > >> >> >>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>>> Hi Jeff,
>> > > > > > > > > > > >> >> >>>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>>> Thanks for raising this thread and
>> the
>> > > > design
>> > > > > > > > > document!
>> > > > > > > > > > > >> >> >>>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>>> As @Thomas Weise mentioned above,
>> > extending
>> > > > > > config
>> > > > > > > > to
>> > > > > > > > > > > flink
>> > > > > > > > > > > >> >> >>>>>>>>>> requires far more effort than it
>> should
>> > be.
>> > > > > > Another
>> > > > > > > > > > > example
>> > > > > > > > > > > >> >> >>>>>>>>>> is we achieve detach mode by
>> introduce
>> > > > another
>> > > > > > > > > execution
>> > > > > > > > > > > >> >> >>>>>>>>>> environment which also hijack
>> #execute
>> > > > method.
>> > > > > > > > > > > >> >> >>>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>>> I agree with your idea that user
>> would
>> > > > > > configure all
>> > > > > > > > > > > things
>> > > > > > > > > > > >> >> >>>>>>>>>> and flink "just" respect it. On this
>> > topic I
>> > > > > > think
>> > > > > > > > the
>> > > > > > > > > > > >> >> >> unusual
>> > > > > > > > > > > >> >> >>>>>>>>>> control flow when CliFrontend handle
>> > "run"
>> > > > > > command
>> > > > > > > > is
>> > > > > > > > > > the
>> > > > > > > > > > > >> >> >>>> problem.
>> > > > > > > > > > > >> >> >>>>>>>>>> It handles several configs, mainly
>> about
>> > > > > cluster
>> > > > > > > > > > settings,
>> > > > > > > > > > > >> >> >> and
>> > > > > > > > > > > >> >> >>>>>>>>>> thus main method of user program is
>> > unaware
>> > > > of
>> > > > > > them.
>> > > > > > > > > > Also
>> > > > > > > > > > > it
>> > > > > > > > > > > >> >> >>>>>> compiles
>> > > > > > > > > > > >> >> >>>>>>>>>> app to job graph by run the main
>> method
>> > > > with a
>> > > > > > > > > hijacked
>> > > > > > > > > > > exec
>> > > > > > > > > > > >> >> >>>> env,
>> > > > > > > > > > > >> >> >>>>>>>>>> which constrain the main method
>> further.
>> > > > > > > > > > > >> >> >>>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>>> I'd like to write down a few of
>> notes on
>> > > > > > > > configs/args
>> > > > > > > > > > pass
>> > > > > > > > > > > >> >> >> and
>> > > > > > > > > > > >> >> >>>>>>> respect,
>> > > > > > > > > > > >> >> >>>>>>>>>> as well as decoupling job compilation
>> > and
>> > > > > > > > submission.
>> > > > > > > > > > > Share
>> > > > > > > > > > > >> >> >> on
>> > > > > > > > > > > >> >> >>>>> this
>> > > > > > > > > > > >> >> >>>>>>>>>> thread later.
>> > > > > > > > > > > >> >> >>>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>>> Best,
>> > > > > > > > > > > >> >> >>>>>>>>>> tison.
>> > > > > > > > > > > >> >> >>>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>>>
>> > > > > > > > > > > >> >> >>>>>>>>>> SHI Xiaogang <shixiaogangg@gmail.com
>> >
>> > > > > > 于2019年6月17日周一
>> > > > > > > > > > > >> >> >> 下午7:29写道:
>> > > > > > > > > > > >> >> >>>>>>>>>>
> Hi Jeff and Flavio,
>
> Thanks Jeff a lot for proposing the design document.
>
> We are also working on refactoring ClusterClient to allow flexible and
> efficient job management in our real-time platform. We would like to
> draft a document to share our ideas with you.
>
> I think it's a good idea to have something like Apache Livy for Flink,
> and the efforts discussed here will take a great step forward to it.
>
> Regards,
> Xiaogang
>
> Flavio Pompermaier <pompermaier@okkam.it> wrote on Mon, Jun 17, 2019 at 7:13 PM:
>
> Is there any possibility to have something like Apache Livy [1] also for
> Flink in the future?
>
> [1] https://livy.apache.org/
>
> On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zjffdu@gmail.com> wrote:
>
>> Any API we expose should not have dependencies on the runtime
>> (flink-runtime) package or other implementation details. To me, this
>> means that the current ClusterClient cannot be exposed to users
>> because it uses quite some classes from the optimiser and runtime
>> packages.
>
> We should change ClusterClient from a class to an interface.
> ExecutionEnvironment would only use the ClusterClient interface, which
> should live in flink-clients, while the concrete implementation classes
> could be in flink-runtime.
>
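The split proposed above can be sketched as a thin interface plus pluggable implementations. All names below are hypothetical simplifications for illustration, not the real Flink API:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical minimal ClusterClient interface (the real one has a much
// larger surface). It could live in flink-clients, free of runtime types.
interface ClusterClient {
    CompletableFuture<String> submitJob(byte[] serializedJobGraph);
    CompletableFuture<Void> cancelJob(String jobId);
}

// A concrete implementation (e.g. REST-based, YARN-based) could then live
// in flink-runtime without leaking into the public API.
class LocalClusterClient implements ClusterClient {
    private int counter = 0;

    public CompletableFuture<String> submitJob(byte[] serializedJobGraph) {
        return CompletableFuture.completedFuture("job-" + counter++);
    }

    public CompletableFuture<Void> cancelJob(String jobId) {
        return CompletableFuture.completedFuture(null);
    }
}

public class ClusterClientSketch {
    public static void main(String[] args) throws Exception {
        ClusterClient client = new LocalClusterClient();
        System.out.println(client.submitJob(new byte[0]).get()); // prints job-0
    }
}
```

With such a split, downstream projects would compile against flink-clients alone and never see optimiser or runtime classes.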
>> What happens when a failure/restart in the client happens? There needs
>> to be a way of re-establishing the connection to the job, setting up
>> the listeners again, etc.
>
> Good point. First we need to define what failure/restart in the client
> means. IIUC, that usually means a network failure, which will surface in
> the RestClient class. If my understanding is correct, the restart/retry
> mechanism should be implemented in RestClient.
>
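A retry mechanism of the kind discussed here could look roughly like the following generic sketch — bounded retries with exponential backoff on transient I/O failures — rather than the actual RestClient logic:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetrySketch {
    // Retry a call a bounded number of times with exponential backoff,
    // retrying only on (transient) I/O failures.
    static <T> T retry(Callable<T> call, int maxAttempts, long initialBackoffMs)
            throws Exception {
        long backoff = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up after maxAttempts
                }
                Thread.sleep(backoff);
                backoff *= 2;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] failures = {2}; // simulate two transient network failures
        String result = retry(() -> {
            if (failures[0]-- > 0) {
                throw new IOException("connection reset");
            }
            return "reconnected";
        }, 5, 10);
        System.out.println(result); // prints reconnected
    }
}
```

Re-registering listeners after a successful reconnect would sit on top of this, in whatever component owns the listeners.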
> Aljoscha Krettek <aljoscha@apache.org> wrote on Tue, Jun 11, 2019 at 11:10 PM:
>
> Some points to consider:
>
> * Any API we expose should not have dependencies on the runtime
>   (flink-runtime) package or other implementation details. To me, this
>   means that the current ClusterClient cannot be exposed to users
>   because it uses quite some classes from the optimiser and runtime
>   packages.
>
> * What happens when a failure/restart in the client happens? There
>   needs to be a way of re-establishing the connection to the job,
>   setting up the listeners again, etc.
>
> Aljoscha
>
> > On 29. May 2019, at 10:17, Jeff Zhang <zjffdu@gmail.com> wrote:
> >
> > Sorry folks, the design doc is late as you expected. Here's the design
> > doc I drafted, welcome any comments and feedback.
> >
> > https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
>
> Stephan Ewen <se...@apache.org> wrote on Thu, Feb 14, 2019 at 8:43 PM:
>
> Nice that this discussion is happening.
>
> In the FLIP, we could also revisit the entire role of the environments
> again.
>
> Initially, the idea was:
> - the environments take care of the specific setup for standalone (no
>   setup needed), yarn, mesos, etc.
> - the session ones have control over the session. The environment holds
>   the session client.
> - running a job gives a "control" object for that job. That behavior is
>   the same in all environments.
>
> The actual implementation diverged quite a bit from that. Happy to see
> a discussion about straightening this out a bit more.
>
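That original design can be sketched as follows; the types are invented for illustration and are not Flink's actual classes:

```java
import java.util.concurrent.CompletableFuture;

// The "control" object returned for a job — same behaviour everywhere.
interface JobControl {
    String jobId();
    CompletableFuture<Void> cancel();
}

// Each environment encapsulates the deployment-specific setup.
abstract class Environment {
    abstract JobControl execute(Runnable jobDefinition);
}

// Standalone needs no setup; a YarnEnvironment/MesosEnvironment would do
// their own cluster bootstrapping but return the same JobControl type.
class StandaloneEnvironment extends Environment {
    JobControl execute(Runnable jobDefinition) {
        jobDefinition.run();
        return new JobControl() {
            public String jobId() { return "standalone-job"; }
            public CompletableFuture<Void> cancel() {
                return CompletableFuture.completedFuture(null);
            }
        };
    }
}

public class EnvironmentSketch {
    public static void main(String[] args) {
        Environment env = new StandaloneEnvironment();
        JobControl control = env.execute(() -> { /* user program */ });
        System.out.println(control.jobId()); // prints standalone-job
    }
}
```

The key property is that only the environment varies by deployment target; the control object handed back to the caller does not.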
> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zjffdu@gmail.com> wrote:
>
> Hi folks,
>
> Sorry for the late response. It seems we have reached consensus on this;
> I will create a FLIP for this with a more detailed design.
>
> Thomas Weise <th...@apache.org> wrote on Fri, Dec 21, 2018 at 11:43 AM:
>
> Great to see this discussion seeded! The problems you face with the
> Zeppelin integration are also affecting other downstream projects, like
> Beam.
>
> We just enabled the savepoint restore option in RemoteStreamEnvironment
> [1] and that was more difficult than it should be. The main issue is
> that environment and cluster client aren't decoupled. Ideally it should
> be possible to just get the matching cluster client from the
> environment and then control the job through it (environment as factory
> for cluster client). But note that the environment classes are part of
> the public API, and it is not straightforward to make larger changes
> without breaking backward compatibility.
>
> ClusterClient currently exposes internal classes like JobGraph and
> StreamGraph. But it should be possible to wrap this with a new public
> API that brings the required job control capabilities for downstream
> projects. Perhaps it is helpful to look at some of the interfaces in
> Beam while thinking about this: [2] for the portable job API and [3]
> for the old asynchronous job control from the Beam Java SDK.
>
> The backward compatibility discussion [4] is also relevant here. A new
> API should shield downstream projects from internals and allow them to
> interoperate with multiple future Flink versions in the same release
> line without forced upgrades.
>
> Thanks,
> Thomas
>
> [1] https://github.com/apache/flink/pull/7249
> [2] https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> [3] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> [4] https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
>
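For reference, the shape of Beam's asynchronous job control [3] boils down to something like this hypothetical, heavily simplified interface (see the linked file for the real one): the caller sees only job states and control methods, never JobGraph/StreamGraph internals.

```java
import java.time.Duration;

public class JobControlApiSketch {
    // Simplified, Beam-PipelineResult-inspired job control surface.
    enum State { RUNNING, DONE, FAILED, CANCELLED }

    interface JobResult {
        State getState();
        State cancel();
        State waitUntilFinish(Duration timeout);
    }

    // Trivial fake implementation, just to show the contract in use.
    static JobResult fakeFinishedJob() {
        return new JobResult() {
            public State getState() { return State.DONE; }
            public State cancel() { return State.DONE; } // already finished
            public State waitUntilFinish(Duration timeout) { return State.DONE; }
        };
    }

    public static void main(String[] args) {
        JobResult result = fakeFinishedJob();
        System.out.println(result.waitUntilFinish(Duration.ofSeconds(1))); // prints DONE
    }
}
```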
> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zjffdu@gmail.com> wrote:
>
>> I'm not so sure whether the user should be able to define where the
>> job runs (in your example Yarn). This is actually independent of the
>> job development and is something which is decided at deployment time.
>
> Users don't need to specify the execution mode programmatically. They
> can also pass the execution mode via the arguments of the flink run
> command, e.g.
>
> bin/flink run -m yarn-cluster ....
> bin/flink run -m local ...
> bin/flink run -m host:port ...
>
> Does this make sense to you?
>
>> To me it makes sense that the ExecutionEnvironment is not directly
>> initialized by the user and instead context sensitive how you want to
>> execute your job (Flink CLI vs. IDE, for example).
>
> Right, currently I notice that Flink creates a different
> ContextExecutionEnvironment for each submission scenario (Flink CLI vs.
> IDE). To me this is a kind of hack, not so straightforward. What I
> suggested above is that Flink should always create the same
> ExecutionEnvironment, just with different configuration, and based on
> that configuration it would create the proper ClusterClient for the
> different behaviors.
>
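The suggestion above — one environment type, with the client chosen from configuration — could be sketched like this (the config key and types are invented for illustration):

```java
import java.util.Map;

public class ClientFactorySketch {
    interface Client { String describe(); }

    // One code path: the environment reads its configuration and picks the
    // matching client, instead of subclassing the environment per scenario.
    static Client createClient(Map<String, String> config) {
        String target = config.getOrDefault("execution.target", "local");
        switch (target) {
            case "yarn-cluster": return () -> "yarn session client";
            case "local":        return () -> "embedded local client";
            default:             return () -> "rest client for " + target;
        }
    }

    public static void main(String[] args) {
        Client client = createClient(Map.of("execution.target", "yarn-cluster"));
        System.out.println(client.describe()); // prints yarn session client
    }
}
```

A `flink run -m yarn-cluster` invocation would then simply translate the CLI flag into that configuration entry before creating the environment.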
> Till Rohrmann <trohrmann@apache.org> wrote on Thu, Dec 20, 2018 at 11:18 PM:
>
> You are probably right that we have code duplication when it comes to
> the creation of the ClusterClient. This should be reduced in the
> future.
>
> I'm not so sure whether the user should be able to define where the job
> runs (in your example Yarn). This is actually independent of the job
> development and is something which is decided at deployment time. To me
> it makes sense that the ExecutionEnvironment is not directly
> initialized by the user and is instead sensitive to the context in
> which you want to execute your job (Flink CLI vs. IDE, for example).
> However, I agree that the ExecutionEnvironment should give you access
> to the ClusterClient and to the job (maybe in the form of the JobGraph
> or a job plan).
>
> Cheers,
> Till
>
> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zjffdu@gmail.com> wrote:
>
> > Hi Till,
> > Thanks for the feedback. You are right that I expect a better
> > programmatic job submission/control API which could be used by
> > downstream projects, and it would benefit the Flink ecosystem. When I
> > look at the code of the Flink scala-shell and sql-client (I believe
> > they are not the core of Flink, but belong to its ecosystem), I find a
> > lot of duplicated code for creating a ClusterClient from user-provided
> > configuration (the configuration format may differ between scala-shell
> > and sql-client) and then using that ClusterClient to manipulate jobs.
> > I don't think this is convenient for downstream projects. What I
> > expect is that a downstream project only needs to provide the
> > necessary configuration info (maybe by introducing a class FlinkConf),
> > then build an ExecutionEnvironment based on this FlinkConf, and the
> > ExecutionEnvironment will create the proper ClusterClient. This would
> > not only benefit downstream project development but also help their
> > integration tests with Flink. Here's one sample code snippet of what I
> > expect:
> >
> > val conf = new FlinkConf().mode("yarn")
> > val env = new ExecutionEnvironment(conf)
> > val jobId = env.submit(...)
> > val jobStatus = env.getClusterClient().queryJobStatus(jobId)
> > env.getClusterClient().cancelJob(jobId)
> >
> > What do you think?
> >
> > On Tue, Dec 11, 2018 at 6:28 PM, Till Rohrmann <trohrmann@apache.org>
> > wrote:
> >
> > > Hi Jeff,
> > >
> > > what you are proposing is to provide the user with better
> > > programmatic job control. There was actually an effort to achieve
> > > this but it has never been completed [1]. However, there are some
> > > improvements in the code base now. Look for example at the
> > > NewClusterClient interface, which offers a non-blocking job
> > > submission. But I agree that we need to improve Flink in this
> > > regard.
> > >
> > > I would not be in favour of exposing all ClusterClient calls via the
> > > ExecutionEnvironment because it would clutter the class and would
> > > not be a good separation of concerns. Instead, one idea could be to
> > > retrieve the current ClusterClient from the ExecutionEnvironment,
> > > which can then be used for cluster and job control. But before we
> > > start an effort here, we need to agree on and capture what
> > > functionality we want to provide.
> > >
> > > Initially, the idea was that we have the ClusterDescriptor
> > > describing how to talk to a cluster manager like Yarn or Mesos. The
> > > ClusterDescriptor can be used for deploying Flink clusters (job and
> > > session) and gives you a ClusterClient. The ClusterClient controls
> > > the cluster (e.g. submitting jobs, listing all running jobs). And
> > > then there was the idea to introduce a JobClient, which you obtain
> > > from the ClusterClient, to trigger job-specific operations (e.g.
> > > taking a savepoint, cancelling the job).
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-4272
> > >
> > > Cheers,
> > > Till
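[Editor's note: the layering Till describes above (ClusterDescriptor deploys a cluster and yields a ClusterClient; the ClusterClient controls the cluster; a JobClient obtained from it handles job-specific operations) can be sketched as below. This is a minimal in-memory model for illustration only; the interfaces and the InMemoryCluster class are simplified stand-ins, not Flink's actual API.]

```java
import java.util.*;

// Toy model of the described layering: ClusterDescriptor -> ClusterClient
// -> JobClient. All types here are simplified stand-ins.
public class LayeringSketch {

    interface ClusterDescriptor {
        ClusterClient deploySessionCluster();
    }

    interface ClusterClient {
        String submitJob(String jobName);   // returns a job id
        List<String> listJobs();            // cluster-wide control
        JobClient getJobClient(String jobId);
    }

    interface JobClient {                   // job-specific control
        void cancel();
        String status();
    }

    // In-memory implementation, just to make the control flow concrete.
    static class InMemoryCluster implements ClusterClient {
        private final Map<String, String> jobs = new LinkedHashMap<>();
        private int counter = 0;

        public String submitJob(String jobName) {
            String id = "job-" + (++counter);
            jobs.put(id, "RUNNING");
            return id;
        }
        public List<String> listJobs() { return new ArrayList<>(jobs.keySet()); }
        public JobClient getJobClient(String jobId) {
            return new JobClient() {
                public void cancel() { jobs.put(jobId, "CANCELED"); }
                public String status() { return jobs.get(jobId); }
            };
        }
    }

    // Deploy a cluster, submit a job, then control it via its JobClient.
    static String run() {
        ClusterDescriptor descriptor = InMemoryCluster::new;
        ClusterClient cluster = descriptor.deploySessionCluster();
        String jobId = cluster.submitJob("wordcount");
        JobClient job = cluster.getJobClient(jobId);
        String before = job.status();
        job.cancel();
        return jobId + ":" + before + "->" + job.status();
    }

    public static void main(String[] args) {
        System.out.println(run());   // job-1:RUNNING->CANCELED
    }
}
```

Note how nothing job-specific lives on the ClusterClient itself; that is the separation of concerns the message argues for.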
> > > On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zjffdu@gmail.com>
> > > wrote:
> > >
> > > > Hi Folks,
> > > >
> > > > I am trying to integrate Flink into Apache Zeppelin, which is an
> > > > interactive notebook, and I hit several issues that are caused by
> > > > the Flink client API. So I'd like to propose the following changes
> > > > to the Flink client API.
> > > >
> > > > 1. Support nonblocking execution. Currently,
> > > > ExecutionEnvironment#execute is a blocking method which does two
> > > > things: first submit the job, then wait for the job until it is
> > > > finished. I'd like to introduce a nonblocking execution method
> > > > like ExecutionEnvironment#submit which only submits the job and
> > > > then returns the jobId to the client, and allow the user to query
> > > > the job status via the jobId.
> > > >
> > > > 2. Add a cancel API in
> > > > ExecutionEnvironment/StreamExecutionEnvironment. Currently the
> > > > only way to cancel a job is via the cli (bin/flink), which is not
> > > > convenient for downstream projects to use. So I'd like to add a
> > > > cancel API in ExecutionEnvironment.
> > > >
> > > > 3. Add a savepoint API in
> > > > ExecutionEnvironment/StreamExecutionEnvironment. It is similar to
> > > > the cancel API; we should use ExecutionEnvironment as the unified
> > > > API for third parties to integrate with Flink.
> > > >
> > > > 4. Add a listener for the job execution lifecycle, something like
> > > > the following, so that downstream projects can run custom logic in
> > > > the lifecycle of a job. E.g. Zeppelin would capture the jobId
> > > > after the job is submitted and then use this jobId to cancel it
> > > > later when necessary.
> > > >
> > > > public interface JobListener {
> > > >
> > > >   void onJobSubmitted(JobID jobId);
> > > >
> > > >   void onJobExecuted(JobExecutionResult jobResult);
> > > >
> > > >   void onJobCanceled(JobID jobId);
> > > > }
> > > >
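[Editor's note: to make point 4 concrete, here is a self-contained sketch of an environment firing such listener callbacks. The MiniEnv class and the String-based ids are hypothetical stand-ins; only the JobListener shape mirrors the interface proposed above, which uses JobID and JobExecutionResult.]

```java
import java.util.*;

// Sketch of point 4: an environment that notifies registered JobListeners
// at each stage of the job lifecycle. MiniEnv is a hypothetical stand-in.
public class ListenerSketch {

    interface JobListener {
        void onJobSubmitted(String jobId);
        void onJobExecuted(String jobId);
        void onJobCanceled(String jobId);
    }

    static class MiniEnv {
        private final List<JobListener> listeners = new ArrayList<>();
        private int counter = 0;

        void registerJobListener(JobListener l) { listeners.add(l); }

        String submit() {
            String id = "job-" + (++counter);
            listeners.forEach(l -> l.onJobSubmitted(id));
            return id;
        }

        void cancel(String jobId) {
            listeners.forEach(l -> l.onJobCanceled(jobId));
        }
    }

    // Records lifecycle events, the way Zeppelin might capture the jobId
    // on submission and use it for cancellation later.
    static List<String> run() {
        List<String> events = new ArrayList<>();
        MiniEnv env = new MiniEnv();
        env.registerJobListener(new JobListener() {
            public void onJobSubmitted(String id) { events.add("submitted:" + id); }
            public void onJobExecuted(String id)  { events.add("executed:" + id); }
            public void onJobCanceled(String id)  { events.add("canceled:" + id); }
        });
        String jobId = env.submit();
        env.cancel(jobId);
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run());   // [submitted:job-1, canceled:job-1]
    }
}
```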
> > > > 5. Enable sessions in ExecutionEnvironment. Currently this is
> > > > disabled, but sessions are very convenient for third parties that
> > > > submit jobs continually. I hope Flink can enable it again.
> > > >
> > > > 6. Unify all Flink client APIs into
> > > > ExecutionEnvironment/StreamExecutionEnvironment.
> > > >
> > > > This is a long-term issue which needs more careful thinking and
> > > > design. Currently some features of Flink are exposed in
> > > > ExecutionEnvironment/StreamExecutionEnvironment, but some are
> > > > exposed in the cli instead of the API, like the cancel and
> > > > savepoint operations I mentioned above. I think the root cause is
> > > > that Flink didn't unify the interaction with Flink. Here I list 3
> > > > scenarios of Flink operation:
> > > >
> > > >   - Local job execution. Flink will create a LocalEnvironment and
> > > >     then use this LocalEnvironment to create a LocalExecutor for
> > > >     job execution.
> > > >   - Remote job execution. Flink will create a ClusterClient first,
> > > >     then create a ContextEnvironment based on the ClusterClient,
> > > >     and then run the job.
> > > >   - Job cancelation. Flink will create a ClusterClient first and
> > > >     then cancel the job via this ClusterClient.
> > > >
> > > > As you can see in the above 3 scenarios, Flink doesn't use the
> > > > same approach (code path) to interact with Flink. What I propose
> > > > is the following: create the proper
> > > > LocalEnvironment/RemoteEnvironment (based on user configuration)
> > > > --> use this Environment to create the proper ClusterClient
> > > > (LocalClusterClient or RestClusterClient) to interact with Flink
> > > > (job execution or cancelation).
> > > >
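[Editor's note: the unified path proposed above — one configuration, one decision point that picks the client — can be sketched as below. FlinkConf is the hypothetical class from earlier in the thread, and the LocalClusterClient/RestClusterClient classes here are stand-in names, not the real implementations.]

```java
// Sketch of the proposed unified code path: the environment reads one
// configuration and decides which client to create, so local and remote
// execution (and the CLI) share a single code path. All types are stand-ins.
public class UnifiedEntrySketch {

    static class FlinkConf {
        private String mode = "local";
        FlinkConf mode(String m) { this.mode = m; return this; }
        String mode() { return mode; }
    }

    interface ClusterClient { String describe(); }

    static class LocalClusterClient implements ClusterClient {
        public String describe() { return "local"; }
    }

    static class RestClusterClient implements ClusterClient {
        public String describe() { return "rest"; }
    }

    // The single decision point: everything (CLI included) would go through
    // here instead of constructing a ClusterClient by hand.
    static ClusterClient createClusterClient(FlinkConf conf) {
        switch (conf.mode()) {
            case "local":  return new LocalClusterClient();
            case "yarn":   // fall through: remote modes talk to the cluster
            case "remote": return new RestClusterClient();
            default: throw new IllegalArgumentException("unknown mode: " + conf.mode());
        }
    }

    public static void main(String[] args) {
        System.out.println(createClusterClient(new FlinkConf()).describe());
        System.out.println(createClusterClient(new FlinkConf().mode("yarn")).describe());
    }
}
```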
> > > > This way we can unify the process of local execution and remote
> > > > execution, and it is much easier for third parties to integrate
> > > > with Flink, because ExecutionEnvironment is the unified entry
> > > > point for Flink. What a third party needs to do is just pass its
> > > > configuration to the ExecutionEnvironment, and the
> > > > ExecutionEnvironment will do the right thing based on that
> > > > configuration. The Flink cli can also be considered a Flink API
> > > > consumer: it just passes the configuration to the
> > > > ExecutionEnvironment and lets the ExecutionEnvironment create the
> > > > proper ClusterClient, instead of letting the cli create the
> > > > ClusterClient directly.
> > > >
> > > > Point 6 would involve large code refactoring, so I think we can
> > > > defer it to a future release; 1, 2, 3, 4 and 5 could be done at
> > > > once, I believe. Let me know your comments and feedback, thanks.
> > > >
> > > > --
> > > > Best Regards
> > > >
> > > > Jeff Zhang
>> Jeff Zhang
>>
>

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Zili Chen <wa...@gmail.com>.
Great, Kostas! Looking forward to your POC!

Best,
tison.


Jeff Zhang <zj...@gmail.com> 于2019年8月30日周五 下午11:07写道:

> Awesome, @Kostas. Looking forward to your POC.
>
> Kostas Kloudas <kk...@gmail.com> 于2019年8月30日周五 下午8:33写道:
>
> > Hi all,
> >
> > I am just writing here to let you know that I am working on a POC that
> > tries to refactor the current state of job submission in Flink.
> > I want to stress that it introduces NO CHANGES to the current
> > behaviour of Flink. It just re-arranges things and introduces the
> > notion of an Executor, which is the entity responsible for taking the
> > user-code and submitting it for execution.
> >
> > Given this, the discussion about the functionality that the JobClient
> > will expose to the user can go on independently and the same
> > holds for all the open questions so far.
> >
> > I hope I will have some more news to share soon.
> >
> > Thanks,
> > Kostas
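The Executor notion Kostas describes above can be sketched in a few lines. The names below (Executor, JobClient, InMemoryExecutor) are illustrative assumptions for the sake of this discussion, not the actual API the POC will introduce:

```java
// Hypothetical sketch (NOT the actual Flink API) of the "Executor" notion:
// an entity that takes the user's job and submits it for execution,
// independent of how the target cluster is obtained.
import java.util.concurrent.CompletableFuture;

public class ExecutorSketch {

    /** Minimal stand-in for Flink's JobGraph. */
    static final class JobGraph {
        final String name;
        JobGraph(String name) { this.name = name; }
    }

    /** Handle for interacting with a running job (the proposed JobClient). */
    interface JobClient {
        String jobId();
        CompletableFuture<String> getJobExecutionResult();
    }

    /** The Executor is responsible only for submitting user code for execution. */
    interface Executor {
        CompletableFuture<JobClient> execute(JobGraph jobGraph);
    }

    /** Trivial in-memory Executor, standing in for a session or per-job backend. */
    static final class InMemoryExecutor implements Executor {
        @Override
        public CompletableFuture<JobClient> execute(JobGraph jobGraph) {
            JobClient client = new JobClient() {
                public String jobId() { return "job-" + jobGraph.name; }
                public CompletableFuture<String> getJobExecutionResult() {
                    return CompletableFuture.completedFuture("FINISHED");
                }
            };
            return CompletableFuture.completedFuture(client);
        }
    }

    public static void main(String[] args) throws Exception {
        Executor executor = new InMemoryExecutor();
        JobClient client = executor.execute(new JobGraph("wordcount")).get();
        System.out.println(client.jobId() + " " + client.getJobExecutionResult().get());
    }
}
```

The point of the abstraction is that session and per-job backends would both sit behind the same execute() call, which is why the JobClient discussion can proceed independently.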
> >
> > On Mon, Aug 26, 2019 at 4:20 AM Yang Wang <da...@gmail.com> wrote:
> > >
> > > Hi Zili,
> > >
> > > It makes sense to me that a dedicated cluster is started for a per-job
> > > cluster and will not accept more jobs.
> > > Just have a question about the command line.
> > >
> > > Currently we could use the following commands to start different
> > clusters.
> > > *per-job cluster*
> > > ./bin/flink run -d -p 5 -ynm perjob-cluster1 -m yarn-cluster
> > > examples/streaming/WindowJoin.jar
> > > *session cluster*
> > > ./bin/flink run -p 5 -ynm session-cluster1 -m yarn-cluster
> > > examples/streaming/WindowJoin.jar
> > >
> > > What will it look like after client enhancement?
> > >
> > >
> > > Best,
> > > Yang
> > >
> > > Zili Chen <wa...@gmail.com> 于2019年8月23日周五 下午10:46写道:
> > >
> > > > Hi Till,
> > > >
> > > > Thanks for your update. Nice to hear :-)
> > > >
> > > > Best,
> > > > tison.
> > > >
> > > >
> > > > Till Rohrmann <tr...@apache.org> 于2019年8月23日周五 下午10:39写道:
> > > >
> > > > > Hi Tison,
> > > > >
> > > > > just a quick comment concerning the class loading issues when using
> > the
> > > > per
> > > > > job mode. The community wants to change it so that the
> > > > > StandaloneJobClusterEntryPoint actually uses the user code class
> > loader
> > > > > with child first class loading [1]. Hence, I hope that this problem
> > will
> > > > be
> > > > > resolved soon.
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/FLINK-13840
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas <kkloudas@gmail.com
> >
> > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > On the topic of web submission, I agree with Till that it only
> > seems
> > > > > > to complicate things.
> > > > > > It is bad for security, job isolation (anybody can submit/cancel
> > jobs),
> > > > > > and its
> > > > > > implementation complicates some parts of the code. So, if we were
> > to
> > > > > > redesign the
> > > > > > WebUI, maybe this part could be left out. In addition, I would
> say
> > > > > > that the ability to cancel
> > > > > > jobs could also be left out.
> > > > > >
> > > > > > I would also be in favour of removing the "detached" mode,
> for
> > > > > > the reasons mentioned
> > > > > > above (i.e. because now we will have a future representing the
> > result
> > > > > > on which the user
> > > > > > can choose to wait or not).
> > > > > >
> > > > > > Now for separating job submission and cluster creation, I am
> in
> > > > > > favour of keeping both.
> > > > > > Once again, the reasons are mentioned above by Stephan, Till,
> > Aljoscha
> > > > > > and also Zili seems
> > > > > > to agree. They mainly have to do with security, isolation and
> ease
> > of
> > > > > > resource management
> > > > > > for the user as he knows that "when my job is done, everything
> > will be
> > > > > > cleared up". This is
> > > > > > also the experience you get when launching a process on your
> local
> > OS.
> > > > > >
> > > > > > On excluding the per-job mode from returning a JobClient or not,
> I
> > > > > > believe that eventually
> > > > > > it would be nice to allow users to get back a jobClient. The
> > reason is
> > > > > > that 1) I cannot
> > > > > > find any objective reason why the user-experience should diverge,
> > and
> > > > > > 2) this will be the
> > > > > > way that the user will be able to interact with his running job.
> > > > > > Assuming that the necessary
> > > > > > ports are open for the REST API to work, then I think that the
> > > > > > JobClient can run against the
> > > > > > REST API without problems. If the needed ports are not open, then
> > we
> > > > > > are safe to not return
> > > > > > a JobClient, as the user explicitly chose to close all points of
> > > > > > communication to his running job.
> > > > > >
> > > > > > On the topic of not hijacking the "env.execute()" in order to get
> > the
> > > > > > Plan, I definitely agree but
> > > > > > for the proposal of having a "compile()" method in the env, I
> would
> > > > > > like to have a better look at
> > > > > > the existing code.
> > > > > >
> > > > > > Cheers,
> > > > > > Kostas
> > > > > >
> > > > > > On Fri, Aug 23, 2019 at 5:52 AM Zili Chen <wa...@gmail.com>
> > > > wrote:
> > > > > > >
> > > > > > > Hi Yang,
> > > > > > >
> > > > > > > It would be helpful if you check Stephan's last comment,
> > > > > > > which states that isolation is important.
> > > > > > >
> > > > > > > For per-job mode, we run a dedicated cluster(maybe it
> > > > > > > should have been a couple of JM and TMs during FLIP-6
> > > > > > > design) for a specific job. Thus the process is prevented
> > > > > > > from other jobs.
> > > > > > >
> > > > > > > In our case there was a time we suffered from multiple
> > > > > > > jobs submitted by different users that affected
> > > > > > > each other until all ran into an error state. Also,
> > > > > > > running the client inside the cluster could save client
> > > > > > > resources at some points.
> > > > > > >
> > > > > > > However, we also face several issues as you mentioned,
> > > > > > > that in per-job mode it always uses the parent classloader
> > > > > > > thus classloading issues occur.
> > > > > > >
> > > > > > > BTW, one can make an analogy between session/per-job mode
> > > > > > > in Flink and client/cluster mode in Spark.
> > > > > > >
> > > > > > > Best,
> > > > > > > tison.
> > > > > > >
> > > > > > >
> > > > > > > Yang Wang <da...@gmail.com> 于2019年8月22日周四 上午11:25写道:
> > > > > > >
> > > > > > > > From the user's perspective, there is real confusion about the
> > scope
> > > > of
> > > > > > > > per-job cluster.
> > > > > > > >
> > > > > > > >
> > > > > > > > If it means a Flink cluster with a single job, so that we could
> > get
> > > > > > better
> > > > > > > > isolation.
> > > > > > > >
> > > > > > > > Now it does not matter how we deploy the cluster, directly
> > > > > > deploy(mode1)
> > > > > > > >
> > > > > > > > or start a flink cluster and then submit job through cluster
> > > > > > client(mode2).
> > > > > > > >
> > > > > > > >
> > > > > > > > Otherwise, if it just means directly deploy, how should we
> > name the
> > > > > > mode2,
> > > > > > > >
> > > > > > > > session with job or something else?
> > > > > > > >
> > > > > > > > We could also benefit from the mode2. Users could get the
> same
> > > > > > isolation
> > > > > > > > with mode1.
> > > > > > > >
> > > > > > > > The user code and dependencies will be loaded by user class
> > loader
> > > > > > > >
> > > > > > > > to avoid class conflict with framework.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Anyway, both of the two submission modes are useful.
> > > > > > > >
> > > > > > > > We just need to clarify the concepts.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Best,
> > > > > > > >
> > > > > > > > Yang
> > > > > > > >
> > > > > > > > Zili Chen <wa...@gmail.com> 于2019年8月20日周二 下午5:58写道:
> > > > > > > >
> > > > > > > > > Thanks for the clarification.
> > > > > > > > >
> > > > > > > > > The idea of a JobDeployer came into my mind when I was
> > muddled
> > > > with
> > > > > > > > > how to execute per-job mode and session mode with the same
> > user
> > > > > code
> > > > > > > > > and framework codepath.
> > > > > > > > >
> > > > > > > > > With the concept JobDeployer we are back to the statement that
> > > > > > environment
> > > > > > > > > knows every config of cluster deployment and job
> > submission. We
> > > > > > > > > configure or generate from configuration a specific
> > JobDeployer
> > > > in
> > > > > > > > > environment and then code align on
> > > > > > > > >
> > > > > > > > > *JobClient client = env.execute().get();*
> > > > > > > > >
> > > > > > > > > which in session mode is returned by clusterClient.submitJob
> > and in
> > > > > > per-job
> > > > > > > > > mode returned by clusterDescriptor.deployJobCluster.
> > > > > > > > >
> > > > > > > > > Here comes a problem that currently we directly run
> > > > > ClusterEntrypoint
> > > > > > > > > with the extracted job graph. Following the JobDeployer way we'd
> > better
> > > > > > > > > align entry point of per-job deployment at JobDeployer.
> > Users run
> > > > > > > > > their main method or by a Cli(finally call main method) to
> > deploy
> > > > > the
> > > > > > > > > job cluster.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > tison.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Stephan Ewen <se...@apache.org> 于2019年8月20日周二 下午4:40写道:
> > > > > > > > >
> > > > > > > > > > Till has made some good comments here.
> > > > > > > > > >
> > > > > > > > > > Two things to add:
> > > > > > > > > >
> > > > > > > > > >   - The job mode is very nice in the way that it runs the
> > > > client
> > > > > > inside
> > > > > > > > > the
> > > > > > > > > > cluster (in the same image/process that is the JM) and
> thus
> > > > > unifies
> > > > > > > > both
> > > > > > > > > > applications and what the Spark world calls the "driver
> > mode".
> > > > > > > > > >
> > > > > > > > > >   - Another thing I would add is that during the FLIP-6
> > design,
> > > > > we
> > > > > > were
> > > > > > > > > > thinking about setups where Dispatcher and JobManager are
> > > > > separate
> > > > > > > > > > processes.
> > > > > > > > > >     A Yarn or Mesos Dispatcher of a session could run
> > > > > independently
> > > > > > > > (even
> > > > > > > > > > as privileged processes executing no code).
> > > > > > > > > >     Then the "per-job" mode could still be helpful:
> > when a
> > > > > job
> > > > > > is
> > > > > > > > > > submitted to the dispatcher, it launches the JM again in
> a
> > > > > per-job
> > > > > > > > mode,
> > > > > > > > > so
> > > > > > > > > > that JM and TM processes are bound to the job only. For
> > higher
> > > > > > security
> > > > > > > > > > setups, it is important that processes are not reused
> > across
> > > > > jobs.
> > > > > > > > > >
> > > > > > > > > > On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <
> > > > > > trohrmann@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > I would not be in favour of getting rid of the per-job
> > mode
> > > > > > since it
> > > > > > > > > > > simplifies the process of running Flink jobs
> > considerably.
> > > > > > Moreover,
> > > > > > > > it
> > > > > > > > > > is
> > > > > > > > > > > not only well suited for container deployments but also
> > for
> > > > > > > > deployments
> > > > > > > > > > > where you want to guarantee job isolation. For
> example, a
> > > > user
> > > > > > could
> > > > > > > > > use
> > > > > > > > > > > the per-job mode on Yarn to execute his job on a
> separate
> > > > > > cluster.
> > > > > > > > > > >
> > > > > > > > > > > I think that having two notions of cluster deployments
> > > > (session
> > > > > > vs.
> > > > > > > > > > per-job
> > > > > > > > > > > mode) does not necessarily contradict your ideas for
> the
> > > > client
> > > > > > api
> > > > > > > > > > > refactoring. For example one could have the following
> > > > > interfaces:
> > > > > > > > > > >
> > > > > > > > > > > - ClusterDeploymentDescriptor: encapsulates the logic
> > how to
> > > > > > deploy a
> > > > > > > > > > > cluster.
> > > > > > > > > > > - ClusterClient: allows to interact with a cluster
> > > > > > > > > > > - JobClient: allows to interact with a running job
> > > > > > > > > > >
> > > > > > > > > > > Now the ClusterDeploymentDescriptor could have two
> > methods:
> > > > > > > > > > >
> > > > > > > > > > > - ClusterClient deploySessionCluster()
> > > > > > > > > > > - JobClusterClient/JobClient
> > deployPerJobCluster(JobGraph)
> > > > > > > > > > >
> > > > > > > > > > > where JobClusterClient is either a supertype of
> > ClusterClient
> > > > > > which
> > > > > > > > > does
> > > > > > > > > > > not give you the functionality to submit jobs or
> > > > > > deployPerJobCluster
> > > > > > > > > > > returns directly a JobClient.
> > > > > > > > > > >
> > > > > > > > > > > When setting up the ExecutionEnvironment, one would
> then
> > not
> > > > > > provide
> > > > > > > > a
> > > > > > > > > > > ClusterClient to submit jobs but a JobDeployer which,
> > > > depending
> > > > > > on
> > > > > > > > the
> > > > > > > > > > > selected mode, either uses a ClusterClient (session
> > mode) to
> > > > > > submit
> > > > > > > > > jobs
> > > > > > > > > > or
> > > > > > > > > > > a ClusterDeploymentDescriptor to deploy per a job mode
> > > > cluster
> > > > > > with
> > > > > > > > the
> > > > > > > > > > job
> > > > > > > > > > > to execute.
> > > > > > > > > > >
> > > > > > > > > > > These are just some thoughts how one could make it
> > working
> > > > > > because I
> > > > > > > > > > > believe there is some value in using the per job mode
> > from
> > > > the
> > > > > > > > > > > ExecutionEnvironment.
> > > > > > > > > > >
> > > > > > > > > > > Concerning the web submission, this is indeed a bit
> > tricky.
> > > > > From
> > > > > > a
> > > > > > > > > > cluster
> > > > > > > > > > > management standpoint, I would be in favour of not
> > executing
> > > > user
> > > > > > code
> > > > > > > > on
> > > > > > > > > > the
> > > > > > > > > > > REST endpoint. Especially when considering security, it
> > would
> > > > > be
> > > > > > good
> > > > > > > > > to
> > > > > > > > > > > have a well defined cluster behaviour where it is
> > explicitly
> > > > > > stated
> > > > > > > > > where
> > > > > > > > > > > user code and, thus, potentially risky code is
> executed.
> > > > > Ideally
> > > > > > we
> > > > > > > > > limit
> > > > > > > > > > > it to the TaskExecutor and JobMaster.
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Till
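A minimal sketch of the three interfaces Till outlines (ClusterDeploymentDescriptor, ClusterClient, JobClient). The method signatures and the in-memory implementation are assumptions added only to make the sketch self-contained, not Flink's real classes:

```java
// Sketch of the interface split discussed above: cluster deployment,
// cluster interaction and job interaction as three separate concerns.
import java.util.concurrent.CompletableFuture;

public class DeploymentSketch {

    static final class JobGraph {
        final String name;
        JobGraph(String name) { this.name = name; }
    }

    /** Allows interacting with a single running job. */
    interface JobClient {
        String jobId();
        CompletableFuture<Void> cancel();
    }

    /** Allows interacting with a running (session) cluster, e.g. to submit jobs. */
    interface ClusterClient {
        JobClient submitJob(JobGraph jobGraph);
    }

    /** Encapsulates the logic of how to deploy a cluster, in either mode. */
    interface ClusterDeploymentDescriptor {
        ClusterClient deploySessionCluster();
        JobClient deployPerJobCluster(JobGraph jobGraph);
    }

    /** In-memory descriptor, present only to make the sketch executable. */
    static final class InMemoryDescriptor implements ClusterDeploymentDescriptor {
        @Override
        public ClusterClient deploySessionCluster() {
            return jobGraph -> newJobClient("session", jobGraph);
        }

        @Override
        public JobClient deployPerJobCluster(JobGraph jobGraph) {
            return newJobClient("per-job", jobGraph);
        }

        private static JobClient newJobClient(String mode, JobGraph jobGraph) {
            return new JobClient() {
                public String jobId() { return mode + "/" + jobGraph.name; }
                public CompletableFuture<Void> cancel() {
                    return CompletableFuture.completedFuture(null);
                }
            };
        }
    }

    public static void main(String[] args) {
        ClusterDeploymentDescriptor descriptor = new InMemoryDescriptor();

        // Session mode: deploy once, then submit through the ClusterClient.
        ClusterClient session = descriptor.deploySessionCluster();
        System.out.println(session.submitJob(new JobGraph("a")).jobId());

        // Per-job mode: deployment and submission collapse into one step.
        System.out.println(descriptor.deployPerJobCluster(new JobGraph("b")).jobId());
    }
}
```

Note how per-job mode collapses deployment and submission into one call while session mode keeps them separate, which is exactly the distinction being discussed.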
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Aug 20, 2019 at 9:40 AM Flavio Pompermaier <
> > > > > > > > > pompermaier@okkam.it
> > > > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > In my opinion the client should not use any
> > environment to
> > > > > get
> > > > > > the
> > > > > > > > > Job
> > > > > > > > > > > > graph because the jar should reside ONLY on the
> cluster
> > > > (and
> > > > > > not in
> > > > > > > > > the
> > > > > > > > > > > > client classpath otherwise there are always
> > inconsistencies
> > > > > > between
> > > > > > > > > > > client
> > > > > > > > > > > > and Flink Job manager's classpath).
> > > > > > > > > > > > In the YARN, Mesos and Kubernetes scenarios you have
> > the
> > > > jar
> > > > > > but
> > > > > > > > you
> > > > > > > > > > > could
> > > > > > > > > > > > start a cluster that has the jar on the Job Manager
> as
> > well
> > > > > > (but
> > > > > > > > this
> > > > > > > > > > is
> > > > > > > > > > > > the only case where I think you can assume that the
> > client
> > > > > has
> > > > > > the
> > > > > > > > > jar
> > > > > > > > > > on
> > > > > > > > > > > > the classpath..in the REST job submission you don't
> > have
> > > > any
> > > > > > > > > > classpath).
> > > > > > > > > > > >
> > > > > > > > > > > > Thus, always in my opinion, the JobGraph should be
> > > > generated
> > > > > > by the
> > > > > > > > > Job
> > > > > > > > > > > > Manager REST API.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <
> > > > > > wander4096@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >> I would like to involve Till & Stephan here to
> clarify
> > > > some
> > > > > > > > concept
> > > > > > > > > of
> > > > > > > > > > > >> per-job mode.
> > > > > > > > > > > >>
> > > > > > > > > > > >> The term per-job is one of the modes a cluster could run
> > on.
> > > > It
> > > > > is
> > > > > > > > > mainly
> > > > > > > > > > > >> aimed
> > > > > > > > > > > >> at spawning
> > > > > > > > > > > >> a dedicated cluster for a specific job while the job
> > could
> > > > > be
> > > > > > > > > packaged
> > > > > > > > > > > >> with
> > > > > > > > > > > >> Flink
> > > > > > > > > > > >> itself and thus the cluster initialized with job so
> > that
> > > > get
> > > > > > rid
> > > > > > > > of
> > > > > > > > > a
> > > > > > > > > > > >> separated
> > > > > > > > > > > >> submission step.
> > > > > > > > > > > >>
> > > > > > > > > > > >> This is useful for container deployments where one
> > create
> > > > > his
> > > > > > > > image
> > > > > > > > > > with
> > > > > > > > > > > >> the job
> > > > > > > > > > > >> and then simply deploy the container.
> > > > > > > > > > > >>
> > > > > > > > > > > >> However, it is out of client scope since a
> > > > > > client(ClusterClient
> > > > > > > > for
> > > > > > > > > > > >> example) is for
> > > > > > > > > > > >> communicating with an existing cluster and performing
> > > > > actions.
> > > > > > > > > > Currently,
> > > > > > > > > > > >> in
> > > > > > > > > > > >> per-job
> > > > > > > > > > > >> mode, we extract the job graph and bundle it into
> > cluster
> > > > > > > > deployment
> > > > > > > > > > and
> > > > > > > > > > > >> thus no
> > > > > > > > > > > >> concept of a client gets involved. It looks
> > reasonable
> > > > to
> > > > > > > > exclude
> > > > > > > > > > the
> > > > > > > > > > > >> deployment
> > > > > > > > > > > >> of per-job cluster from client api and use dedicated
> > > > utility
> > > > > > > > > > > >> classes(deployers) for
> > > > > > > > > > > >> deployment.
> > > > > > > > > > > >>
> > > > > > > > > > > >> Zili Chen <wa...@gmail.com> 于2019年8月20日周二
> > 下午12:37写道:
> > > > > > > > > > > >>
> > > > > > > > > > > >> > Hi Aljoscha,
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > Thanks for your reply and participance. The Google
> > Doc
> > > > you
> > > > > > > > linked
> > > > > > > > > to
> > > > > > > > > > > >> > requires
> > > > > > > > > > > >> > permission and I think you could use a share link
> > > > instead.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > I agree that we have almost reached a consensus that
> > > > > > JobClient is
> > > > > > > > > > > >> necessary
> > > > > > > > > > > >> > to
> > > > > > > > > > > >> > interact with a running Job.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > Let me check your open questions one by one.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > 1. Separate cluster creation and job submission
> for
> > > > > per-job
> > > > > > > > mode.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > As you mentioned here is where the opinions
> > diverge. In
> > > > my
> > > > > > > > > document
> > > > > > > > > > > >> there
> > > > > > > > > > > >> > is
> > > > > > > > > > > >> > an alternative[2] that proposes excluding per-job
> > > > > deployment
> > > > > > > > from
> > > > > > > > > > > client
> > > > > > > > > > > >> > api
> > > > > > > > > > > >> > scope and now I find it is more reasonable we do
> the
> > > > > > exclusion.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > When in per-job mode, a dedicated JobCluster is
> > launched
> > > > > to
> > > > > > > > > execute
> > > > > > > > > > > the
> > > > > > > > > > > >> > specific job. It is like a Flink Application more
> > than a
> > > > > > > > > submission
> > > > > > > > > > > >> > of Flink Job. Client only takes care of job
> > submission
> > > > and
> > > > > > > > assume
> > > > > > > > > > > there
> > > > > > > > > > > >> is
> > > > > > > > > > > >> > an existing cluster. In this way we are able to
> > consider
> > > > > > per-job
> > > > > > > > > > > issues
> > > > > > > > > > > >> > individually and JobClusterEntrypoint would be the
> > > > utility
> > > > > > class
> > > > > > > > > for
> > > > > > > > > > > >> > per-job
> > > > > > > > > > > >> > deployment.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > Nevertheless, user program works in both session
> > mode
> > > > and
> > > > > > > > per-job
> > > > > > > > > > mode
> > > > > > > > > > > >> > without
> > > > > > > > > > > >> > needing to change code. JobClient in per-job
> mode
> > is
> > > > > > returned
> > > > > > > > > from
> > > > > > > > > > > >> > env.execute as normal. However, it would be no
> > longer a
> > > > > > wrapper
> > > > > > > > of
> > > > > > > > > > > >> > RestClusterClient but a wrapper of
> > PerJobClusterClient
> > > > > which
> > > > > > > > > > > >> communicates
> > > > > > > > > > > >> > to Dispatcher locally.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > 2. How to deal with plan preview.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > With env.compile functions users can get JobGraph
> or
> > > > > > FlinkPlan
> > > > > > > > and
> > > > > > > > > > > thus
> > > > > > > > > > > >> > they can preview the plan programmatically. Typically it
> > Typically it
> > > > > > looks
> > > > > > > > > like
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > if (preview configured) {
> > > > > > > > > > > >> >     FlinkPlan plan = env.compile();
> > > > > > > > > > > >> >     new JSONDumpGenerator(...).dump(plan);
> > > > > > > > > > > >> > } else {
> > > > > > > > > > > >> >     env.execute();
> > > > > > > > > > > >> > }
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > And `flink info` would no longer be needed.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > 3. How to deal with Jar Submission at the Web
> > Frontend.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > There is one more thread discussing this topic[1].
> > Apart
> > > > > from
> > > > > > > > > > removing
> > > > > > > > > > > >> > the functions there are two alternatives.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > One is to introduce an interface whose method
> > returns
> > > > > > > > > > > JobGraph/FlinkPlan
> > > > > > > > > > > >> > and Jar Submission only supports a main-class that
> > implements
> > > > this
> > > > > > > > > > interface.
> > > > > > > > > > > >> > And then extract the JobGraph/FlinkPlan just by
> > calling
> > > > > the
> > > > > > > > > method.
> > > > > > > > > > > >> > In this way, it is even possible to consider a
> > > > separation
> > > > > > of job
> > > > > > > > > > > >> creation
> > > > > > > > > > > >> > and job submission.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > The other is, as you mentioned, let execute() do
> the
> > > > > actual
> > > > > > > > > > execution.
> > > > > > > > > > > >> > We won't execute the main method in the
> WebFrontend
> > but
> > > > > > spawn a
> > > > > > > > > > > process
> > > > > > > > > > > >> > at the WebMonitor side to execute. For the return part we
> > could
> > > > > > generate
> > > > > > > > > the
> > > > > > > > > > > >> > JobID from WebMonitor and pass it to the execution
> > > > > > environment.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > 4. How to deal with detached mode.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > I think detached mode is a temporary solution for
> > > > > > non-blocking
> > > > > > > > > > > >> submission.
> > > > > > > > > > > >> > In my document both submission and execution
> return
> > a
> > > > > > > > > > > CompletableFuture
> > > > > > > > > > > >> and
> > > > > > > > > > > >> > users control whether or not to wait for the result.
> In
> > > > this
> > > > > > point
> > > > > > > > we
> > > > > > > > > > > don't
> > > > > > > > > > > >> > need a detached option but the functionality is
> > covered.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > 5. How does per-job mode interact with interactive
> > > > > > programming.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > All of YARN, Mesos and Kubernetes scenarios follow
> > the
> > > > > > pattern
> > > > > > > > > > launch
> > > > > > > > > > > a
> > > > > > > > > > > >> > JobCluster now. And I don't think there would be
> > > > > > inconsistency
> > > > > > > > > > between
> > > > > > > > > > > >> > different resource management.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > Best,
> > > > > > > > > > > >> > tison.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > [1]
> > > > > > > > > > > >> >
> > > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
> > > > > > > > > > > >> > [2]
> > > > > > > > > > > >> >
> > > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
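tison's point 4 above, that a CompletableFuture-based submission subsumes the detached/attached distinction, can be illustrated with plain java.util.concurrent types (the execute() helper here is a stand-in for the proposed asynchronous env.execute(), not Flink code):

```java
// Illustrates how one asynchronous execute() covers both the old
// "detached" (fire and forget) and "attached" (block for result) modes:
// the caller simply chooses whether to wait on the returned future.
import java.util.concurrent.CompletableFuture;

public class DetachedSketch {

    /** Stand-in for an asynchronous env.execute(): returns immediately. */
    static CompletableFuture<String> execute(String jobName) {
        return CompletableFuture.supplyAsync(() -> jobName + ":FINISHED");
    }

    public static void main(String[] args) throws Exception {
        // "Detached" style: submit and keep going without blocking.
        CompletableFuture<String> fireAndForget = execute("jobA");

        // "Attached" style: submit and block until the job result arrives.
        String attachedResult = execute("jobB").get();
        System.out.println(attachedResult);

        // The detached job can still be joined later if the caller cares.
        System.out.println(fireAndForget.get());
    }
}
```

With this shape there is no need for a dedicated detached option; the functionality is covered by when (or whether) the caller invokes get().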
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > Aljoscha Krettek <al...@apache.org>
> > 于2019年8月16日周五
> > > > > > 下午9:20写道:
> > > > > > > > > > > >> >
> > > > > > > > > > > >> >> Hi,
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >> I read both Jeff's initial design document and the
> > newer
> > > > > > > > document
> > > > > > > > > by
> > > > > > > > > > > >> >> Tison. I also finally found the time to collect
> our
> > > > > > thoughts on
> > > > > > > > > the
> > > > > > > > > > > >> issue,
> > > > > > > > > > > >> >> I had quite some discussions with Kostas and this
> > is
> > > > the
> > > > > > > > result:
> > > > > > > > > > [1].
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >> I think overall we agree that this part of the
> > code is
> > > > in
> > > > > > dire
> > > > > > > > > need
> > > > > > > > > > > of
> > > > > > > > > > > >> >> some refactoring/improvements but I think there
> are
> > > > still
> > > > > > some
> > > > > > > > > open
> > > > > > > > > > > >> >> questions and some differences in opinion what
> > those
> > > > > > > > refactorings
> > > > > > > > > > > >> should
> > > > > > > > > > > >> >> look like.
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >> I think the API-side is quite clear, i.e. we need
> > some
> > > > > > > > JobClient
> > > > > > > > > > API
> > > > > > > > > > > >> that
> > > > > > > > > > > >> >> allows interacting with a running Job. It could
> be
> > > > > > worthwhile
> > > > > > > > to
> > > > > > > > > > spin
> > > > > > > > > > > >> that
> > > > > > > > > > > >> >> off into a separate FLIP because we can probably
> > find
> > > > > > consensus
> > > > > > > > > on
> > > > > > > > > > > that
> > > > > > > > > > > >> >> part more easily.
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >> For the rest, the main open questions from our
> doc
> > are
> > > > > > these:
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >>   - Do we want to separate cluster creation and
> job
> > > > > > submission
> > > > > > > > > for
> > > > > > > > > > > >> >> per-job mode? In the past, there were conscious
> > efforts
> > > > > to
> > > > > > > > *not*
> > > > > > > > > > > >> separate
> > > > > > > > > > > >> >> job submission from cluster creation for per-job
> > > > clusters
> > > > > > for
> > > > > > > > > > Mesos,
> > > > > > > > > > > >> YARN,
> > > > > > > > > > > >> >> Kubernets (see StandaloneJobClusterEntryPoint).
> > Tison
> > > > > > suggests
> > > > > > > > in
> > > > > > > > > > his
> > > > > > > > > > > >> >> design document to decouple this in order to
> unify
> > job
> > > > > > > > > submission.
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >>   - How to deal with plan preview, which needs to
> > > > hijack
> > > > > > > > > execute()
> > > > > > > > > > > and
> > > > > > > > > > > >> >> let the outside code catch an exception?
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >>   - How to deal with Jar Submission at the Web
> > > > Frontend,
> > > > > > which
> > > > > > > > > > needs
> > > > > > > > > > > to
> > > > > > > > > > > >> >> hijack execute() and let the outside code catch
> an
> > > > > > exception?
> > > > > > > > > > > >> >> CliFrontend.run() “hijacks”
> > > > > ExecutionEnvironment.execute()
> > > > > > to
> > > > > > > > > get a
> > > > > > > > > > > >> >> JobGraph and then execute that JobGraph manually.
> > We
> > > > > could
> > > > > > get
> > > > > > > > > > around
> > > > > > > > > > > >> that
> > > > > > > > > > > >> >> by letting execute() do the actual execution. One
> > > > caveat
> > > > > > for
> > > > > > > > this
> > > > > > > > > > is
> > > > > > > > > > > >> that
> > > > > > > > > > > >> >> now the main() method doesn’t return (or is
> forced
> > to
> > > > > > return by
> > > > > > > > > > > >> throwing an
> > > > > > > > > > > >> >> exception from execute()) which means that for
> Jar
> > > > > > Submission
> > > > > > > > > from
> > > > > > > > > > > the
> > > > > > > > > > > >> >> WebFrontend we have a long-running main() method
> > > > running
> > > > > > in the
> > > > > > > > > > > >> >> WebFrontend. This doesn’t sound very good. We
> > could get
> > > > > > around
> > > > > > > > > this
> > > > > > > > > > > by
> > > > > > > > > > > >> >> removing the plan preview feature and by removing
> > Jar
> > > > > > > > > > > >> Submission/Running.
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >>   - How to deal with detached mode? Right now,
> > > > > > > > > > > >> >> DetachedEnvironment will execute the job and return
> > > > > > > > > > > >> >> immediately. If users control when they want to
> > > > > > > > > > > >> >> return, by waiting on the job completion future, how
> > > > > > > > > > > >> >> do we deal with this? Do we simply remove the
> > > > > > > > > > > >> >> distinction between detached/non-detached?
> > > > > > > > > > > >> >>
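The future-based direction raised in the question above can be sketched as below. This is a toy illustration with invented names; Flink itself later converged on a similar executeAsync()/JobClient pair, but nothing here is the actual Flink API.

```java
import java.util.concurrent.CompletableFuture;

public class DetachedModeSketch {
    // Hypothetical handle returned by submission.
    interface JobClient {
        CompletableFuture<String> getJobExecutionResult();
    }

    // Submission returns immediately; the future completes when the job does.
    static JobClient executeAsync(String jobName) {
        CompletableFuture<String> result =
            CompletableFuture.supplyAsync(() -> jobName + ": FINISHED");
        return () -> result;
    }

    public static void main(String[] args) {
        JobClient client = executeAsync("wordcount");
        // "Detached" callers return here; "attached" callers block on the future.
        System.out.println(client.getJobExecutionResult().join());
    }
}
```

With this shape there is no detached/non-detached distinction in the API at all; blocking is purely the caller's choice.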
> > > > > > > > > > > >> >>   - How does per-job mode interact with “interactive
> > > > > > > > > > > >> >> programming” (FLIP-36)? For YARN, each execute() call
> > > > > > > > > > > >> >> could spawn a new Flink YARN cluster. What about
> > > > > > > > > > > >> >> Mesos and Kubernetes?
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >> The first open question is where the opinions
> > > > > > > > > > > >> >> diverge, I think. The rest are just open questions
> > > > > > > > > > > >> >> and interesting things that we need to consider.
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >> Best,
> > > > > > > > > > > >> >> Aljoscha
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >> [1]
> > > > > > > > > > > >> >> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >> > On 31. Jul 2019, at 15:23, Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > > > > > > > > >> >> >
> > > > > > > > > > > >> >> > Thanks tison for the effort. I left a few comments.
> > > > > > > > > > > >> >> >
> > > > > > > > > > > >> >> > Zili Chen <wa...@gmail.com> wrote on Wed, Jul 31, 2019 at 8:24 PM:
> > > > > > > > > > > >> >> >
> > > > > > > > > > > >> >> >> Hi Flavio,
> > > > > > > > > > > >> >> >>
> > > > > > > > > > > >> >> >> Thanks for your reply.
> > > > > > > > > > > >> >> >>
> > > > > > > > > > > >> >> >> In both the current implementation and the design,
> > > > > > > > > > > >> >> >> ClusterClient never takes responsibility for
> > > > > > > > > > > >> >> >> generating the JobGraph. (What you see in the
> > > > > > > > > > > >> >> >> current codebase is several class methods.)
> > > > > > > > > > > >> >> >>
> > > > > > > > > > > >> >> >> Instead, the user describes their program in the
> > > > > > > > > > > >> >> >> main method with ExecutionEnvironment apis and
> > > > > > > > > > > >> >> >> calls env.compile() or env.optimize() to get the
> > > > > > > > > > > >> >> >> FlinkPlan and JobGraph respectively.
> > > > > > > > > > > >> >> >>
> > > > > > > > > > > >> >> >> For listing main classes in a jar and choosing one
> > > > > > > > > > > >> >> >> for submission, you're already able to customize a
> > > > > > > > > > > >> >> >> CLI to do it. Specifically, the path of the jar is
> > > > > > > > > > > >> >> >> passed as an argument, and in the customized CLI
> > > > > > > > > > > >> >> >> you list the main classes and choose one to submit
> > > > > > > > > > > >> >> >> to the cluster.
> > > > > > > > > > > >> >> >>
> > > > > > > > > > > >> >> >> Best,
> > > > > > > > > > > >> >> >> tison.
> > > > > > > > > > > >> >> >>
> > > > > > > > > > > >> >> >>
> > > > > > > > > > > >> >> >> Flavio Pompermaier <po...@okkam.it> wrote on Wed,
> > > > > > > > > > > >> >> >> Jul 31, 2019 at 8:12 PM:
> > > > > > > > > > > >> >> >>
> > > > > > > > > > > >> >> >>> Just one note on my side: it is not clear to me
> > > > > > > > > > > >> >> >>> whether the client needs to be able to generate a
> > > > > > > > > > > >> >> >>> job graph or not.
> > > > > > > > > > > >> >> >>> In my opinion, the job jar must reside only on
> > > > > > > > > > > >> >> >>> the server/jobManager side, and the client
> > > > > > > > > > > >> >> >>> requires a way to get the job graph.
> > > > > > > > > > > >> >> >>> If you really want to access the job graph, I'd
> > > > > > > > > > > >> >> >>> add dedicated methods on the ClusterClient, like:
> > > > > > > > > > > >> >> >>>
> > > > > > > > > > > >> >> >>>   - getJobGraph(jarId, mainClass): JobGraph
> > > > > > > > > > > >> >> >>>   - listMainClasses(jarId): List<String>
> > > > > > > > > > > >> >> >>>
> > > > > > > > > > > >> >> >>> These would require some additions on the job
> > > > > > > > > > > >> >> >>> manager endpoint as well... what do you think?
> > > > > > > > > > > >> >> >>>
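Flavio's proposed additions could look roughly like the following sketch. The two method names come from the email above; everything else (the JarAwareClusterClient name, the toy in-memory implementation, the example class names) is an illustrative stand-in, not any real Flink API.

```java
import java.util.List;

public class ClusterClientSketch {
    interface JobGraph {}  // stand-in for the real JobGraph class

    // Hypothetical extension of ClusterClient as proposed in the thread.
    interface JarAwareClusterClient {
        // Compile one main class of an already-uploaded jar into a JobGraph.
        JobGraph getJobGraph(String jarId, String mainClass);
        // Enumerate candidate entry points found in the uploaded jar.
        List<String> listMainClasses(String jarId);
    }

    // Toy in-memory implementation backing the demo below.
    static JarAwareClusterClient demoClient() {
        return new JarAwareClusterClient() {
            public JobGraph getJobGraph(String jarId, String mainClass) {
                return new JobGraph() {};
            }
            public List<String> listMainClasses(String jarId) {
                return List.of("com.example.WordCount", "com.example.PageRank");
            }
        };
    }

    public static void main(String[] args) {
        JarAwareClusterClient client = demoClient();
        List<String> mains = client.listMainClasses("jar-42");
        System.out.println(mains.size() + " main classes found");
        // A UI could now let the user pick one and fetch its JobGraph.
        System.out.println(client.getJobGraph("jar-42", mains.get(0)) != null);
    }
}
```

The jar stays server-side, as Flavio suggests; the client only ever sees class names and the compiled graph.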
> > > > > > > > > > > >> >> >>> On Wed, Jul 31, 2019 at 12:42 PM Zili Chen
> > > > > > > > > > > >> >> >>> <wander4096@gmail.com> wrote:
> > > > > > > > > > > >> >> >>>
> > > > > > > > > > > >> >> >>>> Hi all,
> > > > > > > > > > > >> >> >>>>
> > > > > > > > > > > >> >> >>>> Here is a document[1] on client api enhancement
> > > > > > > > > > > >> >> >>>> from our perspective. We have investigated the
> > > > > > > > > > > >> >> >>>> current implementations, and we propose:
> > > > > > > > > > > >> >> >>>>
> > > > > > > > > > > >> >> >>>> 1. Unify the implementation of cluster
> > > > > > > > > > > >> >> >>>> deployment and job submission in Flink.
> > > > > > > > > > > >> >> >>>> 2. Provide programmatic interfaces to allow
> > > > > > > > > > > >> >> >>>> flexible job and cluster management.
> > > > > > > > > > > >> >> >>>>
> > > > > > > > > > > >> >> >>>> The first proposal is aimed at reducing the code
> > > > > > > > > > > >> >> >>>> paths of cluster deployment and job submission
> > > > > > > > > > > >> >> >>>> so that one can adopt Flink easily. The second
> > > > > > > > > > > >> >> >>>> proposal is aimed at providing rich interfaces
> > > > > > > > > > > >> >> >>>> for advanced users who want precise control of
> > > > > > > > > > > >> >> >>>> these stages.
> > > > > > > > > > > >> >> >>>>
> > > > > > > > > > > >> >> >>>> Quick reference on open questions:
> > > > > > > > > > > >> >> >>>>
> > > > > > > > > > > >> >> >>>> 1. Exclude job cluster deployment from the
> > > > > > > > > > > >> >> >>>> client side, or redefine the semantics of a job
> > > > > > > > > > > >> >> >>>> cluster? It fits in a process quite different
> > > > > > > > > > > >> >> >>>> from session cluster deployment and job
> > > > > > > > > > > >> >> >>>> submission.
> > > > > > > > > > > >> >> >>>>
> > > > > > > > > > > >> >> >>>> 2. Maintain the codepaths handling the class
> > > > > > > > > > > >> >> >>>> o.a.f.api.common.Program, or implement
> > > > > > > > > > > >> >> >>>> customized program handling logic via a
> > > > > > > > > > > >> >> >>>> customized CliFrontend? See also this thread[2]
> > > > > > > > > > > >> >> >>>> and the document[1].
> > > > > > > > > > > >> >> >>>>
> > > > > > > > > > > >> >> >>>> 3. Expose ClusterClient as a public api, or just
> > > > > > > > > > > >> >> >>>> expose api in ExecutionEnvironment and delegate
> > > > > > > > > > > >> >> >>>> to ClusterClient? Further, in either way, is it
> > > > > > > > > > > >> >> >>>> worthwhile to introduce a JobClient, an
> > > > > > > > > > > >> >> >>>> encapsulation of ClusterClient that is
> > > > > > > > > > > >> >> >>>> associated with a specific job?
> > > > > > > > > > > >> >> >>>>
> > > > > > > > > > > >> >> >>>> Best,
> > > > > > > > > > > >> >> >>>> tison.
> > > > > > > > > > > >> >> >>>>
> > > > > > > > > > > >> >> >>>> [1]
> > > > > > > > > > > >> >> >>>> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
> > > > > > > > > > > >> >> >>>> [2]
> > > > > > > > > > > >> >> >>>> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
> > > > > > > > > > > >> >> >>>>
> > > > > > > > > > > >> >> >>>> Jeff Zhang <zj...@gmail.com> wrote on Wed, Jul
> > > > > > > > > > > >> >> >>>> 24, 2019 at 9:19 AM:
> > > > > > > > > > > >> >> >>>>
> > > > > > > > > > > >> >> >>>>> Thanks Stephan, I will follow up on this issue
> > > > > > > > > > > >> >> >>>>> in the next few weeks and will refine the
> > > > > > > > > > > >> >> >>>>> design doc. We could discuss more details after
> > > > > > > > > > > >> >> >>>>> the 1.9 release.
> > > > > > > > > > > >> >> >>>>>
> > > > > > > > > > > >> >> >>>>> Stephan Ewen <se...@apache.org> wrote on Wed,
> > > > > > > > > > > >> >> >>>>> Jul 24, 2019 at 12:58 AM:
> > > > > > > > > > > >> >> >>>>>
> > > > > > > > > > > >> >> >>>>>> Hi all!
> > > > > > > > > > > >> >> >>>>>>
> > > > > > > > > > > >> >> >>>>>> This thread has stalled for a bit, which I
> > > > > > > > > > > >> >> >>>>>> assume is mostly due to the Flink 1.9 feature
> > > > > > > > > > > >> >> >>>>>> freeze and release testing effort.
> > > > > > > > > > > >> >> >>>>>>
> > > > > > > > > > > >> >> >>>>>> I personally still recognize this issue as an
> > > > > > > > > > > >> >> >>>>>> important one to be solved. I'd be happy to
> > > > > > > > > > > >> >> >>>>>> help resume this discussion soon (after the
> > > > > > > > > > > >> >> >>>>>> 1.9 release) and see if we can take some steps
> > > > > > > > > > > >> >> >>>>>> towards this in Flink 1.10.
> > > > > > > > > > > >> >> >>>>>>
> > > > > > > > > > > >> >> >>>>>> Best,
> > > > > > > > > > > >> >> >>>>>> Stephan
> > > > > > > > > > > >> >> >>>>>>
> > > > > > > > > > > >> >> >>>>>> On Mon, Jun 24, 2019 at 10:41 AM Flavio
> > > > > > > > > > > >> >> >>>>>> Pompermaier <pompermaier@okkam.it> wrote:
> > > > > > > > > > > >> >> >>>>>>
> > > > > > > > > > > >> >> >>>>>>> That's exactly what I suggested a long time
> > > > > > > > > > > >> >> >>>>>>> ago: the Flink REST client should not require
> > > > > > > > > > > >> >> >>>>>>> any Flink dependency, only an HTTP library to
> > > > > > > > > > > >> >> >>>>>>> call the REST services to submit and monitor
> > > > > > > > > > > >> >> >>>>>>> a job.
> > > > > > > > > > > >> >> >>>>>>> What I also suggested in [1] was to have a
> > > > > > > > > > > >> >> >>>>>>> way to automatically suggest to the user (via
> > > > > > > > > > > >> >> >>>>>>> a UI) the available main classes and their
> > > > > > > > > > > >> >> >>>>>>> required parameters [2].
> > > > > > > > > > > >> >> >>>>>>> Another problem we have with Flink is that
> > > > > > > > > > > >> >> >>>>>>> the REST client and the CLI one behave
> > > > > > > > > > > >> >> >>>>>>> differently, and we use the CLI client (via
> > > > > > > > > > > >> >> >>>>>>> ssh) because it allows calling some other
> > > > > > > > > > > >> >> >>>>>>> method after env.execute() [3] (we have to
> > > > > > > > > > > >> >> >>>>>>> call another REST service to signal the end
> > > > > > > > > > > >> >> >>>>>>> of the job).
> > > > > > > > > > > >> >> >>>>>>> In this regard, a dedicated interface, like
> > > > > > > > > > > >> >> >>>>>>> the JobListener suggested in the previous
> > > > > > > > > > > >> >> >>>>>>> emails, would be very helpful (IMHO).
> > > > > > > > > > > >> >> >>>>>>>
> > > > > > > > > > > >> >> >>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-10864
> > > > > > > > > > > >> >> >>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-10862
> > > > > > > > > > > >> >> >>>>>>> [3] https://issues.apache.org/jira/browse/FLINK-10879
> > > > > > > > > > > >> >> >>>>>>>
> > > > > > > > > > > >> >> >>>>>>> Best,
> > > > > > > > > > > >> >> >>>>>>> Flavio
> > > > > > > > > > > >> >> >>>>>>>
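The JobListener idea mentioned in this thread could be shaped roughly as below. This is a toy sketch with invented names and signatures; Flink 1.10 eventually shipped an org.apache.flink.core.execution.JobListener with a different shape, so treat this only as an illustration of the callback pattern being asked for.

```java
import java.util.ArrayList;
import java.util.List;

public class JobListenerSketch {
    // Hypothetical listener notified around job submission and completion.
    interface JobListener {
        void onJobSubmitted(String jobId);
        void onJobExecuted(String jobId, Throwable failureCause); // null = success
    }

    // Toy "environment" that notifies listeners around execute().
    static class Env {
        private final List<JobListener> listeners = new ArrayList<>();
        void registerJobListener(JobListener l) { listeners.add(l); }
        void execute(String jobId) {
            listeners.forEach(l -> l.onJobSubmitted(jobId));
            // ... job runs here ...
            listeners.forEach(l -> l.onJobExecuted(jobId, null));
        }
    }

    static List<String> demo() {
        List<String> events = new ArrayList<>();
        Env env = new Env();
        env.registerJobListener(new JobListener() {
            public void onJobSubmitted(String jobId) { events.add("submitted:" + jobId); }
            public void onJobExecuted(String jobId, Throwable t) { events.add("executed:" + jobId); }
        });
        env.execute("job-1");
        return events;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A downstream project such as the one Flavio describes could use the completion callback to hit its own REST endpoint, instead of wrapping env.execute() over ssh.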
> > > > > > > > > > > >> >> >>>>>>> On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang
> > > > > > > > > > > >> >> >>>>>>> <zjffdu@gmail.com> wrote:
> > > > > > > > > > > >> >> >>>>>>>
> > > > > > > > > > > >> >> >>>>>>>> Hi, Tison,
> > > > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>> Thanks for your comments. Overall I agree
> > > > > > > > > > > >> >> >>>>>>>> with you that it is difficult for downstream
> > > > > > > > > > > >> >> >>>>>>>> projects to integrate with Flink and that we
> > > > > > > > > > > >> >> >>>>>>>> need to refactor the current Flink client
> > > > > > > > > > > >> >> >>>>>>>> api. And I agree that CliFrontend should
> > > > > > > > > > > >> >> >>>>>>>> only parse command line arguments and then
> > > > > > > > > > > >> >> >>>>>>>> pass them to the ExecutionEnvironment. It is
> > > > > > > > > > > >> >> >>>>>>>> the ExecutionEnvironment's responsibility to
> > > > > > > > > > > >> >> >>>>>>>> compile the job, create the cluster, and
> > > > > > > > > > > >> >> >>>>>>>> submit the job. Besides that, Flink
> > > > > > > > > > > >> >> >>>>>>>> currently has many ExecutionEnvironment
> > > > > > > > > > > >> >> >>>>>>>> implementations, and Flink will use the
> > > > > > > > > > > >> >> >>>>>>>> specific one based on the context. IMHO this
> > > > > > > > > > > >> >> >>>>>>>> is not necessary; the ExecutionEnvironment
> > > > > > > > > > > >> >> >>>>>>>> should be able to do the right thing based
> > > > > > > > > > > >> >> >>>>>>>> on the FlinkConf it receives. Too many
> > > > > > > > > > > >> >> >>>>>>>> ExecutionEnvironment implementations are
> > > > > > > > > > > >> >> >>>>>>>> another burden for downstream project
> > > > > > > > > > > >> >> >>>>>>>> integration.
> > > > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>> One thing I'd like to mention is Flink's
> > > > > > > > > > > >> >> >>>>>>>> scala shell and sql client; although they
> > > > > > > > > > > >> >> >>>>>>>> are sub-modules of Flink, they could be
> > > > > > > > > > > >> >> >>>>>>>> treated as downstream projects which use
> > > > > > > > > > > >> >> >>>>>>>> Flink's client api. Currently you will find
> > > > > > > > > > > >> >> >>>>>>>> it is not easy for them to integrate with
> > > > > > > > > > > >> >> >>>>>>>> Flink, and they share much duplicated code.
> > > > > > > > > > > >> >> >>>>>>>> It is another sign that we should refactor
> > > > > > > > > > > >> >> >>>>>>>> the Flink client api.
> > > > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>> I believe it is a large and hard change, and
> > > > > > > > > > > >> >> >>>>>>>> I am afraid we cannot keep compatibility
> > > > > > > > > > > >> >> >>>>>>>> since many of the changes are user facing.
> > > > > > > > > > > >> >> >>>>>>>>
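Jeff's point above, that a single ExecutionEnvironment driven purely by the configuration it receives could replace the many per-target subclasses, can be sketched as below. The configuration key and target names are invented for illustration (later Flink versions use a similar execution.target option, but this is not that code).

```java
import java.util.Map;

public class SingleEnvSketch {
    // One code path; deployment behavior driven entirely by configuration.
    static String describeTarget(Map<String, String> conf) {
        String target = conf.getOrDefault("execution.target", "local");
        switch (target) {
            case "yarn-per-job": return "deploy YARN cluster, then submit";
            case "remote":       return "retrieve session client, then submit";
            default:             return "run in local mini-cluster";
        }
    }

    public static void main(String[] args) {
        // The same environment class does the right thing for any target.
        System.out.println(describeTarget(Map.of("execution.target", "remote")));
    }
}
```

Downstream projects like the scala shell or a notebook then construct one environment from a FlinkConf instead of picking a subclass per deployment mode.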
> > > > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>> Zili Chen <wa...@gmail.com> wrote on Mon,
> > > > > > > > > > > >> >> >>>>>>>> Jun 24, 2019 at 2:53 PM:
> > > > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>> Hi all,
> > > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>> After a closer look at our client apis, I
> > > > > > > > > > > >> >> >>>>>>>>> can see there are two major issues with
> > > > > > > > > > > >> >> >>>>>>>>> consistency and integration, namely the
> > > > > > > > > > > >> >> >>>>>>>>> different deployment of job clusters, which
> > > > > > > > > > > >> >> >>>>>>>>> couples job graph creation and cluster
> > > > > > > > > > > >> >> >>>>>>>>> deployment, and submission via CliFrontend,
> > > > > > > > > > > >> >> >>>>>>>>> which confuses the control flow of job
> > > > > > > > > > > >> >> >>>>>>>>> graph compilation and job submission. I'd
> > > > > > > > > > > >> >> >>>>>>>>> like to follow the discussion above, mainly
> > > > > > > > > > > >> >> >>>>>>>>> the process described by Jeff and Stephan,
> > > > > > > > > > > >> >> >>>>>>>>> and share my ideas on these issues.
> > > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>> 1) CliFrontend confuses the control flow of
> > > > > > > > > > > >> >> >>>>>>>>> job compilation and submission. Following
> > > > > > > > > > > >> >> >>>>>>>>> the process of job submission Stephan and
> > > > > > > > > > > >> >> >>>>>>>>> Jeff described, the execution environment
> > > > > > > > > > > >> >> >>>>>>>>> knows all configs of the cluster and the
> > > > > > > > > > > >> >> >>>>>>>>> topos/settings of the job. Ideally, the
> > > > > > > > > > > >> >> >>>>>>>>> main method of the user program calls
> > > > > > > > > > > >> >> >>>>>>>>> #execute (or, say, #submit), and Flink
> > > > > > > > > > > >> >> >>>>>>>>> deploys the cluster, compiles the job
> > > > > > > > > > > >> >> >>>>>>>>> graph, and submits it to the cluster.
> > > > > > > > > > > >> >> >>>>>>>>> However, the current CliFrontend does all
> > > > > > > > > > > >> >> >>>>>>>>> these things inside its #runProgram method,
> > > > > > > > > > > >> >> >>>>>>>>> which introduces a lot of subclasses of the
> > > > > > > > > > > >> >> >>>>>>>>> (stream) execution environment.
> > > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>> Actually, it sets up an exec env that
> > > > > > > > > > > >> >> >>>>>>>>> hijacks the #execute/executePlan method,
> > > > > > > > > > > >> >> >>>>>>>>> initializes the job graph, and aborts
> > > > > > > > > > > >> >> >>>>>>>>> execution. Control then flows back to
> > > > > > > > > > > >> >> >>>>>>>>> CliFrontend, which deploys the cluster (or
> > > > > > > > > > > >> >> >>>>>>>>> retrieves the client) and submits the job
> > > > > > > > > > > >> >> >>>>>>>>> graph. This is quite a specific internal
> > > > > > > > > > > >> >> >>>>>>>>> process inside Flink, with no consistency
> > > > > > > > > > > >> >> >>>>>>>>> with anything else.
> > > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>> 2) Deployment of a job cluster couples job
> > > > > > > > > > > >> >> >>>>>>>>> graph creation and cluster deployment.
> > > > > > > > > > > >> >> >>>>>>>>> Abstractly, going from a user job to a
> > > > > > > > > > > >> >> >>>>>>>>> concrete submission requires
> > > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>>     create JobGraph --\
> > > > > > > > > > > >> >> >>>>>>>>> create ClusterClient -->  submit JobGraph
> > > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>> such a dependency. The ClusterClient is
> > > > > > > > > > > >> >> >>>>>>>>> created by deploying or retrieving.
> > > > > > > > > > > >> >> >>>>>>>>> JobGraph submission requires a compiled
> > > > > > > > > > > >> >> >>>>>>>>> JobGraph and a valid ClusterClient, but the
> > > > > > > > > > > >> >> >>>>>>>>> creation of the ClusterClient is abstractly
> > > > > > > > > > > >> >> >>>>>>>>> independent of that of the JobGraph.
> > > > > > > > > > > >> >> >>>>>>>>> However, in job cluster mode, we deploy the
> > > > > > > > > > > >> >> >>>>>>>>> job cluster with a job graph, which means
> > > > > > > > > > > >> >> >>>>>>>>> we use another process:
> > > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>> create JobGraph --> deploy cluster with the JobGraph
> > > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>> Here is another inconsistency, and
> > > > > > > > > > > >> >> >>>>>>>>> downstream projects/client apis are forced
> > > > > > > > > > > >> >> >>>>>>>>> to handle different cases with little
> > > > > > > > > > > >> >> >>>>>>>>> support from Flink.
> > > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>> Since we likely reached a consensus on
> > > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>> 1. all configs are gathered into the Flink
> > > > > > > > > > > >> >> >>>>>>>>> configuration and passed along,
> > > > > > > > > > > >> >> >>>>>>>>> 2. the execution environment knows all
> > > > > > > > > > > >> >> >>>>>>>>> configs and handles execution (both
> > > > > > > > > > > >> >> >>>>>>>>> deployment and submission),
> > > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>> for the issues above I propose eliminating
> > > > > > > > > > > >> >> >>>>>>>>> the inconsistencies by the following
> > > > > > > > > > > >> >> >>>>>>>>> approach:
> > > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>> 1) CliFrontend should be exactly a front
> > > > > > > > > > > >> >> >>>>>>>>> end, at least for the "run" command. That
> > > > > > > > > > > >> >> >>>>>>>>> means it just gathers and passes all
> > > > > > > > > > > >> >> >>>>>>>>> configs from the command line to the main
> > > > > > > > > > > >> >> >>>>>>>>> method of the user program. The execution
> > > > > > > > > > > >> >> >>>>>>>>> environment knows all the info, and with an
> > > > > > > > > > > >> >> >>>>>>>>> addition of utils for ClusterClient, we
> > > > > > > > > > > >> >> >>>>>>>>> gracefully get a ClusterClient by deploying
> > > > > > > > > > > >> >> >>>>>>>>> or retrieving. In this way, we don't need
> > > > > > > > > > > >> >> >>>>>>>>> to hijack the #execute/executePlan methods
> > > > > > > > > > > >> >> >>>>>>>>> and can remove various hacking subclasses
> > > > > > > > > > > >> >> >>>>>>>>> of the exec env, as well as the #run
> > > > > > > > > > > >> >> >>>>>>>>> methods in ClusterClient (for an
> > > > > > > > > > > >> >> >>>>>>>>> interface-ized ClusterClient). Now the
> > > > > > > > > > > >> >> >>>>>>>>> control flow goes from CliFrontend to the
> > > > > > > > > > > >> >> >>>>>>>>> main method and never returns.
> > > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>> 2) A job cluster means a cluster for a
> > > > > > > > > > > >> >> >>>>>>>>> specific job. From another perspective, it
> > > > > > > > > > > >> >> >>>>>>>>> is an ephemeral session. We may decouple
> > > > > > > > > > > >> >> >>>>>>>>> the deployment from a compiled job graph,
> > > > > > > > > > > >> >> >>>>>>>>> but start a session with an idle timeout
> > > > > > > > > > > >> >> >>>>>>>>> and submit the job afterwards.
> > > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>> These topics, before we go into more
> > > > > > > > > > > >> >> >>>>>>>>> details on design or implementation, are
> > > > > > > > > > > >> >> >>>>>>>>> better to be made explicit and discussed
> > > > > > > > > > > >> >> >>>>>>>>> toward a consensus.
> > > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>> Best,
> > > > > > > > > > > >> >> >>>>>>>>> tison.
> > > > > > > > > > > >> >> >>>>>>>>>
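tison's proposed flow in this thread, with CliFrontend reduced to argument parsing and the environment owning both deployment and submission, might look schematically like the following. Every name below is invented for illustration; none of it is real Flink code.

```java
import java.util.HashMap;
import java.util.Map;

public class ControlFlowSketch {
    // The environment owns deployment + submission; nothing is hijacked.
    static class Env {
        final Map<String, String> conf;
        Env(Map<String, String> conf) { this.conf = conf; }
        String execute(String jobName) {
            // Deploy or retrieve a ClusterClient based on conf, compile the
            // job graph, submit it, and return a handle/result.
            String mode = conf.getOrDefault("deploy.mode", "session");
            return "submitted " + jobName + " via " + mode + " cluster";
        }
    }

    // "CliFrontend": parse args into conf, invoke main, never hijack execute().
    static String runCommand(String[] args) {
        Map<String, String> conf = new HashMap<>();
        for (String arg : args) {                  // e.g. "deploy.mode=per-job"
            String[] kv = arg.split("=", 2);
            conf.put(kv[0], kv[1]);
        }
        return userMain(conf);  // control flow goes to main and stays there
    }

    // The user program: builds its env from the conf it was handed.
    static String userMain(Map<String, String> conf) {
        Env env = new Env(conf);
        return env.execute("my-job");
    }

    public static void main(String[] args) {
        System.out.println(runCommand(new String[]{"deploy.mode=per-job"}));
    }
}
```

The key property is that the front end never needs to intercept execute(): the user's main method sees the full configuration and the environment does the rest.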
> > > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>> Zili Chen <wa...@gmail.com> wrote on Thu,
> > > > > > > > > > > >> >> >>>>>>>>> Jun 20, 2019 at 3:21 AM:
> > > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>>> Hi Jeff,
> > > > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>>> Thanks for raising this thread and the
> > > > > > > > > > > >> >> >>>>>>>>>> design document!
> > > > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>>> As @Thomas Weise mentioned above,
> > > > > > > > > > > >> >> >>>>>>>>>> extending config to flink requires far
> > > > > > > > > > > >> >> >>>>>>>>>> more effort than it should. Another
> > > > > > > > > > > >> >> >>>>>>>>>> example is that we achieve detach mode by
> > > > > > > > > > > >> >> >>>>>>>>>> introducing another execution environment
> > > > > > > > > > > >> >> >>>>>>>>>> which also hijacks the #execute method.
> > > > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>>> I agree with your idea that the user would
> > > > > > > > > > > >> >> >>>>>>>>>> configure everything and Flink would
> > > > > > > > > > > >> >> >>>>>>>>>> "just" respect it. On this topic I think
> > > > > > > > > > > >> >> >>>>>>>>>> the unusual control flow when CliFrontend
> > > > > > > > > > > >> >> >>>>>>>>>> handles the "run" command is the problem.
> > > > > > > > > > > >> >> >>>>>>>>>> It handles several configs, mainly about
> > > > > > > > > > > >> >> >>>>>>>>>> cluster settings, and thus the main method
> > > > > > > > > > > >> >> >>>>>>>>>> of the user program is unaware of them.
> > > > > > > > > > > >> >> >>>>>>>>>> Also, it compiles the app to a job graph
> > > > > > > > > > > >> >> >>>>>>>>>> by running the main method with a hijacked
> > > > > > > > > > > >> >> >>>>>>>>>> exec env, which constrains the main method
> > > > > > > > > > > >> >> >>>>>>>>>> further.
> > > > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>>> I'd like to write down a few notes on
> > > > > > > > > > > >> >> >>>>>>>>>> passing and respecting configs/args, as
> > > > > > > > > > > >> >> >>>>>>>>>> well as on decoupling job compilation and
> > > > > > > > > > > >> >> >>>>>>>>>> submission, and share them on this thread
> > > > > > > > > > > >> >> >>>>>>>>>> later.
> > > > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>>> Best,
> > > > > > > > > > > >> >> >>>>>>>>>> tison.
> > > > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>>> SHI Xiaogang <sh...@gmail.com> wrote on
> > > > > > > > > > > >> >> >>>>>>>>>> Mon, Jun 17, 2019 at 7:29 PM:
> > > > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>>>> Hi Jeff and Flavio,
> > > > > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>>>> Thanks Jeff a lot for proposing the
> > design
> > > > > > > > document.
> > > > > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>>>> We are also working on refactoring
> > > > > > ClusterClient to
> > > > > > > > > > allow
> > > > > > > > > > > >> >> >>>>> flexible
> > > > > > > > > > > >> >> >>>>>>> and
> > > > > > > > > > > >> >> >>>>>>>>>>> efficient job management in our
> > real-time
> > > > > > platform.
> > > > > > > > > > > >> >> >>>>>>>>>>> We would like to draft a document to
> > share
> > > > > our
> > > > > > > > ideas
> > > > > > > > > > with
> > > > > > > > > > > >> >> >>> you.
> > > > > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>>>> I think it's a good idea to have
> > something
> > > > > like
> > > > > > > > > Apache
> > > > > > > > > > > Livy
> > > > > > > > > > > >> >> >>> for
> > > > > > > > > > > >> >> >>>>>>> Flink,
> > > > > > > > > > > >> >> >>>>>>>>>>> and
> > > > > > > > > > > >> >> >>>>>>>>>>> the efforts discussed here will take
> a
> > > > great
> > > > > > step
> > > > > > > > > > forward
> > > > > > > > > > > >> >> >> to
> > > > > > > > > > > >> >> >>>> it.
> > > > > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>>>> Regards,
> > > > > > > > > > > >> >> >>>>>>>>>>> Xiaogang
> > > > > > > > > > > >> >> >>>>>>>>>>>
> > Flavio Pompermaier <pompermaier@okkam.it> 于2019年6月17日周一 下午7:13写道:
> >
> > > Is there any possibility to have something like Apache Livy [1] also
> > > for Flink in the future?
> > >
> > > [1] https://livy.apache.org/
> > >
> > > On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zjffdu@gmail.com> wrote:
> > >
> > > > > Any API we expose should not have dependencies on the runtime
> > > > > (flink-runtime) package or other implementation details. To me,
> > > > > this means that the current ClusterClient cannot be exposed to
> > > > > users because it uses quite some classes from the optimiser and
> > > > > runtime packages.
> > > >
> > > > We should change ClusterClient from a class to an interface.
> > > > ExecutionEnvironment would only use the ClusterClient interface,
> > > > which should live in flink-clients, while the concrete
> > > > implementation class could be in flink-runtime.
> > > >
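The interface/implementation split proposed above can be sketched roughly as below. All names and signatures here are illustrative stand-ins, not the actual Flink classes; a real ClusterClient deals in JobGraph and richer result types:

```java
import java.util.concurrent.CompletableFuture;

/** Would live in flink-clients: a thin interface with no runtime dependencies. */
interface ClusterClient {
    CompletableFuture<String> submitJob(byte[] serializedJobGraph);
    CompletableFuture<Void> cancelJob(String jobId);
}

/** Would live in flink-runtime: free to depend on internal classes. */
class RestClusterClientImpl implements ClusterClient {
    @Override
    public CompletableFuture<String> submitJob(byte[] serializedJobGraph) {
        // A real implementation would POST the job graph to the REST endpoint.
        return CompletableFuture.completedFuture("job-0000");
    }

    @Override
    public CompletableFuture<Void> cancelJob(String jobId) {
        return CompletableFuture.completedFuture(null);
    }
}
```

With this split, downstream projects compile only against the interface in flink-clients and never see optimiser or runtime types.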
> > > > > What happens when a failure/restart in the client happens? There
> > > > > needs to be a way of re-establishing the connection to the job,
> > > > > setting up the listeners again, etc.
> > > >
> > > > Good point. First we need to define what failure/restart in the
> > > > client means. IIUC, that usually means a network failure, which will
> > > > surface in the RestClient class. If my understanding is correct, the
> > > > restart/retry mechanism should be implemented in RestClient.
> > > >
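A retry-in-the-client mechanism of the kind suggested above could look like this minimal sketch. This is not Flink's actual RestClient logic; the helper, its name, and the backoff policy are all assumptions for illustration:

```java
import java.util.concurrent.Callable;

/**
 * Illustrative retry helper: retries a call with exponential backoff,
 * as a reconnecting REST client might after a transient network failure.
 */
class RetryingCaller {
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long initialBackoffMillis)
            throws Exception {
        long backoff = initialBackoffMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;            // remember the most recent failure
                Thread.sleep(backoff);
                backoff *= 2;        // exponential backoff before the next attempt
            }
        }
        throw last;                  // all attempts exhausted
    }
}
```

Listeners would then be re-registered after each successful reconnect, which is the part that cannot live purely inside the transport layer.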
> > > > Aljoscha Krettek <aljoscha@apache.org> 于2019年6月11日周二 下午11:10写道:
> > > >
> > > > > Some points to consider:
> > > > >
> > > > > * Any API we expose should not have dependencies on the runtime
> > > > > (flink-runtime) package or other implementation details. To me,
> > > > > this means that the current ClusterClient cannot be exposed to
> > > > > users because it uses quite some classes from the optimiser and
> > > > > runtime packages.
> > > > >
> > > > > * What happens when a failure/restart in the client happens? There
> > > > > needs to be a way of re-establishing the connection to the job,
> > > > > setting up the listeners again, etc.
> > > > >
> > > > > Aljoscha
> > > > >
> > > > > > On 29. May 2019, at 10:17, Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > > >
> > > > > > Sorry folks, the design doc is late as you expected. Here's the
> > > > > > design doc I drafted, welcome any comments and feedback.
> > > > > >
> > > > > > https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> > > > > >
> > > > > > Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午8:43写道:
> > > > > >
> > > > > > > Nice that this discussion is happening.
> > > > > > >
> > > > > > > In the FLIP, we could also revisit the entire role of the
> > > > > > > environments again.
> > > > > > >
> > > > > > > Initially, the idea was:
> > > > > > > - the environments take care of the specific setup for
> > > > > > > standalone (no setup needed), yarn, mesos, etc.
> > > > > > > - the session ones have control over the session. The
> > > > > > > environment holds the session client.
> > > > > > > - running a job gives a "control" object for that job. That
> > > > > > > behavior is the same in all environments.
> > > > > > >
> > > > > > > The actual implementation diverged quite a bit from that. Happy
> > > > > > > to see a discussion about straightening this out a bit more.
> > > > > > >
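The original idea Stephan describes — the environment holds the session client, and executing a job hands back a per-job "control" object — can be modeled as a toy sketch. Every name here is made up for illustration; none of these classes exist in Flink:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

/** Per-job control handle, identical across all environment flavors. */
interface JobControl {
    String jobId();
    CompletableFuture<Void> cancel();
}

/** The environment owns the session; running a job yields a JobControl. */
class SessionEnvironment {
    private final AtomicInteger counter = new AtomicInteger();

    JobControl execute(String jobName) {
        String id = jobName + "-" + counter.incrementAndGet();
        return new JobControl() {
            public String jobId() { return id; }
            public CompletableFuture<Void> cancel() {
                // A real handle would talk to the session client here.
                return CompletableFuture.completedFuture(null);
            }
        };
    }
}
```

The point of the design is that only the environment's setup differs per deployment target; the control object behaves the same everywhere.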
> > > > > > > On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi folks,
> > > > > > > >
> > > > > > > > Sorry for the late response. It seems we have reached
> > > > > > > > consensus on this; I will create a FLIP for this with a more
> > > > > > > > detailed design.
> > > > > > > >
> > > > > > > > Thomas Weise <th...@apache.org> 于2018年12月21日周五 上午11:43写道:
> > > > > > > >
> > > > > > > > > Great to see this discussion seeded! The problems you face
> > > > > > > > > with the Zeppelin integration are also affecting other
> > > > > > > > > downstream projects, like Beam.
> > > > > > > > >
> > > > > > > > > We just enabled the savepoint restore option in
> > > > > > > > > RemoteStreamEnvironment [1] and that was more difficult
> > > > > > > > > than it should be. The main issue is that environment and
> > > > > > > > > cluster client aren't decoupled. Ideally it should be
> > > > > > > > > possible to just get the matching cluster client from the
> > > > > > > > > environment and then control the job through it
> > > > > > > > > (environment as factory for cluster client). But note that
> > > > > > > > > the environment classes are part of the public API, and it
> > > > > > > > > is not straightforward to make larger changes without
> > > > > > > > > breaking backward compatibility.
> > > > > > > > >
> > > > > > > > > ClusterClient currently exposes internal classes like
> > > > > > > > > JobGraph and StreamGraph. But it should be possible to
> > > > > > > > > wrap this with a new public API that brings the required
> > > > > > > > > job control capabilities for downstream projects. Perhaps
> > > > > > > > > it is helpful to look at some of the interfaces in Beam
> > > > > > > > > while thinking about this: [2] for the portable job API
> > > > > > > > > and [3] for the old asynchronous job control from the Beam
> > > > > > > > > Java SDK.
> > > > > > > > >
> > > > > > > > > The backward compatibility discussion [4] is also relevant
> > > > > > > > > here. A new API should shield downstream projects from
> > > > > > > > > internals and allow them to interoperate with multiple
> > > > > > > > > future Flink versions in the same release line without
> > > > > > > > > forced upgrades.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Thomas
> > > > > > > > >
> > > > > > > > > [1] https://github.com/apache/flink/pull/7249
> > > > > > > > > [2] https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > > > > > > > > [3] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > > > > > > > > [4] https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> > > > > > > > >
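Beam's asynchronous job control referenced at [3] boils down to a small state-machine interface. The simplified sketch below captures only its shape; the real PipelineResult has more methods (metrics, timed waits), and `FakeResult` is a made-up stand-in for illustration:

```java
/** Simplified shape of an asynchronous job-control handle, PipelineResult-style. */
interface PipelineResultLike {
    enum State { RUNNING, DONE, FAILED, CANCELLED }
    State getState();
    State cancel();
    State waitUntilFinish();
}

/** Toy implementation that finishes immediately when waited on. */
class FakeResult implements PipelineResultLike {
    private State state = State.RUNNING;
    public State getState() { return state; }
    public State cancel() { state = State.CANCELLED; return state; }
    public State waitUntilFinish() {
        if (state == State.RUNNING) state = State.DONE;
        return state;
    }
}
```

An interface of this shape would let downstream projects poll, wait on, or cancel a Flink job without ever touching JobGraph or other internals.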
> > > > > > > > > On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > > I'm not so sure whether the user should be able to
> > > > > > > > > > > define where the job runs (in your example Yarn). This
> > > > > > > > > > > is actually independent of the job development and is
> > > > > > > > > > > something which is decided at deployment time.
> > > > > > > > > >
> > > > > > > > > > The user doesn't need to specify the execution mode
> > > > > > > > > > programmatically. They can also pass the execution mode
> > > > > > > > > > via the arguments of the flink run command, e.g.
> > > > > > > > > >
> > > > > > > > > > bin/flink run -m yarn-cluster ....
> > > > > > > > > > bin/flink run -m local ...
> > > > > > > > > > bin/flink run -m host:port ...
> > > > > > > > > >
> > > > > > > > > > Does this make sense to you?
> > > > > > > > > >
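The three `-m` invocations above imply a small mapping from the argument to a deployment target. The sketch below is only an illustration of that dispatch; Flink's real CLI parsing is more involved, and the returned target names are invented:

```java
/** Illustrative resolution of the -m argument of `bin/flink run`. */
class ExecutionTarget {
    static String resolve(String m) {
        if (m == null) return "context";            // fall back to context/config
        if (m.equals("yarn-cluster")) return "yarn";
        if (m.equals("local")) return "local";
        if (m.contains(":")) return "remote";       // host:port of a JobManager
        throw new IllegalArgumentException("Unknown -m value: " + m);
    }
}
```

The key point of the argument: this choice lives entirely in the launcher, so the user program's main method stays deployment-agnostic.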
> > > > > > > > > > > To me it makes sense that the ExecutionEnvironment is
> > > > > > > > > > > not directly initialized by the user and instead
> > > > > > > > > > > context sensitive how you want to execute your job
> > > > > > > > > > > (Flink CLI vs. IDE, for example).
> > > > > > > > > >
> > > > > > > > > > Right, currently I notice Flink creates a different
> > > > > > > > > > ContextExecutionEnvironment for different submission
> > > > > > > > > > scenarios (Flink CLI vs. IDE). To me this is kind of a
> > > > > > > > > > hack, not so straightforward. What I suggested above is
> > > > > > > > > > that Flink should always create the same
> > > > > > > > > > ExecutionEnvironment, just with different configuration,
> > > > > > > > > > and based on that configuration it would create the
> > > > > > > > > > proper ClusterClient for the different behaviors.
> > > > > > > > > >
> > > > > > > > > > Till Rohrmann <trohrmann@apache.org> 于2018年12月20日周四 下午11:18写道:
> > > > > > > > > >
> > > > > > > > > > > You are probably right that we have code duplication
> > > > > > > > > > > when it comes to the creation of the ClusterClient.
> > > > > > > > > > > This should be reduced in the future.
> > > > > > > > > > >
> > > > > > > > > > > I'm not so sure whether the user should be able to
> > > > > > > > > > > define where the job runs (in your example Yarn). This
> > > > > > > > > > > is actually independent of the job development and is
> > > > > > > > > > > something which is decided at deployment time. To me
> > > > > > > > > > > it makes sense that the ExecutionEnvironment is not
> > > > > > > > > > > directly initialized by the user and instead context
> > > > > > > > > > > sensitive how you want to execute your job (Flink CLI
> > > > > > > > > > > vs. IDE, for example). However, I agree that the
> > > > > > > > > > > ExecutionEnvironment should give you access to the
> > > > > > > > > > > ClusterClient and to the job (maybe in the form of the
> > > > > > > > > > > JobGraph or a job plan).
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Till
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Till,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the feedback. You are right that I expect
> > > > > > > > > > > > a better programmatic job submission/control API
> > > > > > > > > > > > that could be used by downstream projects, and it
> > > > > > > > > > > > would benefit the Flink ecosystem. When I look at
> > > > > > > > > > > > the code of the Flink scala-shell and sql-client (I
> > > > > > > > > > > > believe they are not the core of Flink, but belong
> > > > > > > > > > > > to its ecosystem), I find a lot of duplicated code
> > > > > > > > > > > > for creating a ClusterClient from user-provided
> > > > > > > > > > > > configuration (the configuration format may differ
> > > > > > > > > > > > between scala-shell and sql-client) and then using
> > > > > > > > > > > > that ClusterClient to manipulate jobs. I don't think
> > > > > > > > > > > > this is convenient for downstream projects. What I
> > > > > > > > > > > > expect is that a downstream project only needs to
> > > > > > > > > > > > provide the necessary configuration info (maybe
> > > > > > > > > > > > introducing a class FlinkConf), then build an
> > > > > > > > > > > > ExecutionEnvironment based on this FlinkConf, and
> > > > > > > > > > > > the ExecutionEnvironment will create the proper
> > > > > > > > > > > > ClusterClient.
> > > > > > > > > > > >> >> >>>>>>>>>>> It
> > > > > > > > > > > >> >> >>>>>>>>>>>> not
> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> only
> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> benefit for the downstream
> > > > project
> > > > > > > > > > > >> >> >> development
> > > > > > > > > > > >> >> >>>> but
> > > > > > > > > > > >> >> >>>>>> also
> > > > > > > > > > > >> >> >>>>>>>> be
> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>> helpful
> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> for
> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> their integration test with
> > > > flink.
> > > > > > Here's
> > > > > > > > > one
> > > > > > > > > > > >> >> >>>>> sample
> > > > > > > > > > > >> >> >>>>>>> code
> > > > > > > > > > > >> >> >>>>>>>>>>>> snippet
> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> that
> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> I
> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> expect.
> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> val conf = new
> > > > > > FlinkConf().mode("yarn")
> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> val env = new
> > > > > > ExecutionEnvironment(conf)
> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> val jobId = env.submit(...)
> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> val jobStatus =
> > > > > > > > > > > >> >> >>>>>>>>>>>
> > > > env.getClusterClient().queryJobStatus(jobId)
> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > env.getClusterClient().cancelJob(jobId)
> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> What do you think ?
> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > Till Rohrmann <trohrmann@apache.org> 于2018年12月11日周二 下午6:28写道:
> >
> > > Hi Jeff,
> > >
> > > what you are proposing is to provide the user with better
> > > programmatic job control. There was actually an effort to achieve
> > > this, but it has never been completed [1]. However, there are some
> > > improvements in the code base now. Look for example at the
> > > NewClusterClient interface, which offers non-blocking job
> > > submission. But I agree that we need to improve Flink in this
> > > regard.
> > >
> > > I would not be in favour of exposing all ClusterClient calls via
> > > the ExecutionEnvironment, because it would clutter the class and
> > > would not be a good separation of concerns. Instead, one idea could
> > > be to retrieve the current ClusterClient from the
> > > ExecutionEnvironment, which can then be used for cluster and job
> > > control. But before we start an effort here, we need to agree on
> > > and capture what functionality we want to provide.
> > >
> > > Initially, the idea was that we have the ClusterDescriptor
> > > describing how to talk to a cluster manager like Yarn or Mesos. The
> > > ClusterDescriptor can be used for deploying Flink clusters (job and
> > > session) and gives you a ClusterClient. The ClusterClient controls
> > > the cluster (e.g. submitting jobs, listing all running jobs). And
> > > then there was the idea to introduce a JobClient, which you obtain
> > > from the ClusterClient, to trigger job-specific operations (e.g.
> > > taking a savepoint, cancelling the job).
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-4272
> > >
> > > Cheers,
> > > Till
> > >
> > > On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > >
> > > > Hi Folks,
> > > >
> > > > I am trying to integrate flink into apache zeppelin, which is an
> > > > interactive notebook, and I hit several issues that are caused by
> > > > the flink client api. So I'd like to propose the following changes
> > > > to the flink client api.
> > > >
> > > > 1. Support nonblocking execution. Currently,
> > > > ExecutionEnvironment#execute is a blocking method which does two
> > > > things: first submit the job, and then wait for the job until it
> > > > is finished. I'd like to introduce a nonblocking execution method
> > > > like ExecutionEnvironment#submit, which only submits the job and
> > > > then returns the jobId to the client, and allow the user to query
> > > > the job status via that jobId.
> > > >
> > > > 2. Add a cancel api in
> > > > ExecutionEnvironment/StreamExecutionEnvironment. Currently the
> > > > only way to cancel a job is via the cli (bin/flink), which is not
> > > > convenient for downstream projects to use. So I'd like to add a
> > > > cancel api in ExecutionEnvironment.
> > > >
> > > > 3. Add a savepoint api in
> > > > ExecutionEnvironment/StreamExecutionEnvironment. It is similar to
> > > > the cancel api; we should use ExecutionEnvironment as the unified
> > > > api for third parties to integrate with flink.
> > > >
> > > > 4. Add a listener for the job execution lifecycle, something like
> > > > the following, so that downstream projects can run custom logic in
> > > > the lifecycle of a job. E.g. Zeppelin would capture the jobId
> > > > after the job is submitted and then use this jobId to cancel it
> > > > later when necessary.
> > > >
> > > > public interface JobListener {
> > > >
> > > >   void onJobSubmitted(JobID jobId);
> > > >
> > > >   void onJobExecuted(JobExecutionResult jobResult);
> > > >
> > > >   void onJobCanceled(JobID jobId);
> > > > }
> > > >
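The listener idea above is easy to picture from a downstream project's perspective. The sketch below is illustrative only: JobID and JobExecutionResult are stand-ins for Flink's real classes, and the callbacks are invoked by hand here, whereas in the proposal the environment would invoke them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Stand-ins for Flink's JobID and JobExecutionResult (illustration only).
final class JobID {
    private final String id = UUID.randomUUID().toString();
    @Override public String toString() { return id; }
}

final class JobExecutionResult {
    final JobID jobId;
    JobExecutionResult(JobID jobId) { this.jobId = jobId; }
}

// The listener interface as proposed in this thread.
interface JobListener {
    void onJobSubmitted(JobID jobId);
    void onJobExecuted(JobExecutionResult jobResult);
    void onJobCanceled(JobID jobId);
}

// A downstream integration (e.g. a notebook) that remembers submitted
// jobs so it can cancel them later when the user interrupts execution.
class NotebookJobTracker implements JobListener {
    final List<JobID> runningJobs = new ArrayList<>();

    @Override public void onJobSubmitted(JobID jobId) { runningJobs.add(jobId); }
    @Override public void onJobExecuted(JobExecutionResult r) { runningJobs.remove(r.jobId); }
    @Override public void onJobCanceled(JobID jobId) { runningJobs.remove(jobId); }
}

public class JobListenerSketch {
    public static void main(String[] args) {
        NotebookJobTracker tracker = new NotebookJobTracker();

        // The environment would invoke these callbacks; we simulate it here.
        JobID jobId = new JobID();
        tracker.onJobSubmitted(jobId);
        System.out.println("tracked after submit: " + tracker.runningJobs.size());
        tracker.onJobCanceled(jobId);
        System.out.println("tracked after cancel: " + tracker.runningJobs.size());
    }
}
```

This prints "tracked after submit: 1" and then "tracked after cancel: 0", showing how the jobId captured at submission time is all a host application needs in order to cancel later.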
> > > > 5. Enable sessions in ExecutionEnvironment. Currently this is
> > > > disabled, but a session is very convenient for third parties that
> > > > submit jobs continually. I hope flink can enable it again.
> > > >
> > > > 6. Unify all flink client api into
> > > > ExecutionEnvironment/StreamExecutionEnvironment.
> > > >
> > > > This is a long-term issue which needs more careful thinking and
> > > > design. Currently some features of flink are exposed in
> > > > ExecutionEnvironment/StreamExecutionEnvironment, but some are
> > > > exposed in the cli instead of the api, like the cancel and
> > > > savepoint apis I mentioned above. I think the root cause is that
> > > > flink didn't unify the interaction with flink. Here I list 3
> > > > scenarios of flink operation:
> > > >
> > > >   - Local job execution. Flink will create a LocalEnvironment and
> > > >   then use this LocalEnvironment to create a LocalExecutor for
> > > >   job execution.
> > > >   - Remote job execution. Flink will create a ClusterClient
> > > >   first, then create a ContextEnvironment based on that
> > > >   ClusterClient, and then run the job.
> > > >   - Job cancelation. Flink will create a ClusterClient first and
> > > >   then cancel the job via that ClusterClient.
> > > >
> > > > As you can see in the above 3 scenarios, flink doesn't use the
> > > > same approach (code path) to interact with flink. What I propose
> > > > is the following: create the proper
> > > > LocalEnvironment/RemoteEnvironment (based on user configuration)
> > > > --> use this Environment to create the proper ClusterClient
> > > > (LocalClusterClient or RestClusterClient) to interact with Flink
> > > > (job execution or cancelation).
> > > >
> > > > This way we can unify the process of local execution and remote
> > > > execution, and it becomes much easier for third parties to
> > > > integrate with flink, because ExecutionEnvironment is the unified
> > > > entry point for flink. What a third party needs to do is just
> > > > pass configuration to the ExecutionEnvironment, and the
> > > > ExecutionEnvironment will do the right thing based on that
> > > > configuration. The flink cli can also be considered a flink api
> > > > consumer: it just passes the configuration to the
> > > > ExecutionEnvironment and lets the ExecutionEnvironment create the
> > > > proper ClusterClient, instead of letting the cli create the
> > > > ClusterClient directly.
> > > >
> > > > 6 would involve large code refactoring, so I think we can defer
> > > > it to a future release; 1, 2, 3, 4 and 5 could be done at once, I
> > > > believe. Let me know your comments and feedback, thanks.
> > > >
> > > > --
> > > > Best Regards
> > > >
> > > > Jeff Zhang
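The unification proposed in that original email — configuration goes in, the environment alone decides which client to build — can be sketched in a few lines. All class names below are stand-ins for illustration, not Flink's actual ClusterClient hierarchy, and the configuration key is invented.

```java
import java.util.Map;
import java.util.UUID;

// Stand-in for the ClusterClient hierarchy discussed above.
interface ClusterClient {
    String submitJob(String jobName);
    void cancelJob(String jobId);
}

// Would talk to an in-process mini cluster.
class LocalClusterClient implements ClusterClient {
    public String submitJob(String jobName) { return "local-" + UUID.randomUUID(); }
    public void cancelJob(String jobId) { /* stop the local job */ }
}

// Would talk to a remote cluster over REST.
class RestClusterClient implements ClusterClient {
    public String submitJob(String jobName) { return "remote-" + UUID.randomUUID(); }
    public void cancelJob(String jobId) { /* issue a REST cancel call */ }
}

// Unified entry point: the environment, not the caller, chooses the
// client, purely from configuration.
class UnifiedEnvironment {
    private final ClusterClient client;

    UnifiedEnvironment(Map<String, String> conf) {
        String mode = conf.getOrDefault("execution.mode", "local");
        this.client = mode.equals("local") ? new LocalClusterClient()
                                           : new RestClusterClient();
    }

    ClusterClient getClusterClient() { return client; }
}

public class UnifiedFlowSketch {
    public static void main(String[] args) {
        UnifiedEnvironment env = new UnifiedEnvironment(Map.of("execution.mode", "yarn"));
        String jobId = env.getClusterClient().submitJob("WindowJoin");
        env.getClusterClient().cancelJob(jobId);
    }
}
```

The point of the design is that neither a downstream project nor the cli ever constructs a client directly; both feed configuration to one entry point and get the same code path for local and remote execution.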

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Jeff Zhang <zj...@gmail.com>.
Awesome, @Kostas! Looking forward to your POC.

Kostas Kloudas <kk...@gmail.com> 于2019年8月30日周五 下午8:33写道:

> Hi all,
>
> I am just writing here to let you know that I am working on a POC that
> tries to refactor the current state of job submission in Flink.
> I want to stress that it introduces NO CHANGES to the current
> behaviour of Flink. It just re-arranges things and introduces the
> notion of an Executor, which is the entity responsible for taking the
> user-code and submitting it for execution.
>
> Given this, the discussion about the functionality that the JobClient
> will expose to the user can go on independently and the same
> holds for all the open questions so far.
>
> I hope I will have some more news to share soon.
>
> Thanks,
> Kostas
>
> On Mon, Aug 26, 2019 at 4:20 AM Yang Wang <da...@gmail.com> wrote:
> >
> > Hi Zili,
> >
> > It makes sense to me that a dedicated cluster is started in per-job
> > mode and will not accept more jobs.
> > Just have a question about the command line.
> >
> > Currently we could use the following commands to start different
> clusters.
> > *per-job cluster*
> > ./bin/flink run -d -p 5 -ynm perjob-cluster1 -m yarn-cluster
> > examples/streaming/WindowJoin.jar
> > *session cluster*
> > ./bin/flink run -p 5 -ynm session-cluster1 -m yarn-cluster
> > examples/streaming/WindowJoin.jar
> >
> > What will it look like after client enhancement?
> >
> >
> > Best,
> > Yang
> >
> > Zili Chen <wa...@gmail.com> 于2019年8月23日周五 下午10:46写道:
> >
> > > Hi Till,
> > >
> > > Thanks for your update. Nice to hear :-)
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > Till Rohrmann <tr...@apache.org> 于2019年8月23日周五 下午10:39写道:
> > >
> > > > Hi Tison,
> > > >
> > > > just a quick comment concerning the class loading issues when using
> the
> > > per
> > > > job mode. The community wants to change it so that the
> > > > StandaloneJobClusterEntryPoint actually uses the user code class
> loader
> > > > with child first class loading [1]. Hence, I hope that this problem
> will
> > > be
> > > > resolved soon.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-13840
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas <kk...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > On the topic of web submission, I agree with Till that it only
> seems
> > > > > to complicate things.
> > > > > It is bad for security, job isolation (anybody can submit/cancel
> jobs),
> > > > > and its
> > > > > implementation complicates some parts of the code. So, if we were
> to
> > > > > redesign the
> > > > > WebUI, maybe this part could be left out. In addition, I would say
> > > > > that the ability to cancel
> > > > > jobs could also be left out.
> > > > >
> > > > > I would also be in favour of removing the "detached" mode, for
> > > > > the reasons mentioned
> > > > > above (i.e. because now we will have a future representing the
> result
> > > > > on which the user
> > > > > can choose to wait or not).
> > > > >
> > > > > Now, for separating job submission and cluster creation, I am in
> > > > > favour of keeping both.
> > > > > Once again, the reasons are mentioned above by Stephan, Till,
> Aljoscha
> > > > > and also Zili seems
> > > > > to agree. They mainly have to do with security, isolation and ease
> of
> > > > > resource management
> > > > > for the user as he knows that "when my job is done, everything
> will be
> > > > > cleared up". This is
> > > > > also the experience you get when launching a process on your local
> OS.
> > > > >
> > > > > On excluding the per-job mode from returning a JobClient or not, I
> > > > > believe that eventually
> > > > > it would be nice to allow users to get back a JobClient. The
> reason is
> > > > > that 1) I cannot
> > > > > find any objective reason why the user-experience should diverge,
> and
> > > > > 2) this will be the
> > > > > way that the user will be able to interact with his running job.
> > > > > Assuming that the necessary
> > > > > ports are open for the REST API to work, then I think that the
> > > > > JobClient can run against the
> > > > > REST API without problems. If the needed ports are not open, then
> we
> > > > > are safe to not return
> > > > > a JobClient, as the user explicitly chose to close all points of
> > > > > communication to his running job.
> > > > >
> > > > > On the topic of not hijacking the "env.execute()" in order to get
> the
> > > > > Plan, I definitely agree but
> > > > > for the proposal of having a "compile()" method in the env, I would
> > > > > like to have a better look at
> > > > > the existing code.
> > > > >
> > > > > Cheers,
> > > > > Kostas
> > > > >
> > > > > On Fri, Aug 23, 2019 at 5:52 AM Zili Chen <wa...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > Hi Yang,
> > > > > >
> > > > > > It would be helpful if you check Stephan's last comment,
> > > > > > which states that isolation is important.
> > > > > >
> > > > > > For per-job mode, we run a dedicated cluster (maybe it
> > > > > > should have been a couple of JMs and TMs during the FLIP-6
> > > > > > design) for a specific job. Thus the process is isolated
> > > > > > from other jobs.
> > > > > >
> > > > > > In our case, there was a time when we suffered from multiple
> > > > > > jobs submitted by different users that affected each other,
> > > > > > so that all of them ran into an error state. Also, running
> > > > > > the client inside the cluster can save client resources at
> > > > > > some points.
> > > > > >
> > > > > > However, as you mentioned, we also face several issues:
> > > > > > per-job mode always uses the parent classloader, so
> > > > > > classloading issues occur.
> > > > > >
> > > > > > BTW, one can make an analogy between session/per-job mode
> > > > > > in Flink and client/cluster mode in Spark.
> > > > > >
> > > > > > Best,
> > > > > > tison.
> > > > > >
> > > > > >
> > > > > > Yang Wang <da...@gmail.com> 于2019年8月22日周四 上午11:25写道:
> > > > > >
> > > > > > > From the user's perspective, the scope of a per-job cluster is
> > > > > > > really confusing.
> > > > > > >
> > > > > > >
> > > > > > > If it means a Flink cluster with a single job, then we could get
> > > > > > > better isolation.
> > > > > > >
> > > > > > > Then it does not matter how we deploy the cluster: deploy it
> > > > > > > directly (mode 1), or start a Flink cluster and then submit the
> > > > > > > job through a cluster client (mode 2).
> > > > > > >
> > > > > > >
> > > > > > > Otherwise, if it just means direct deployment, how should we name
> > > > > > > mode 2: "session with job" or something else?
> > > > > > >
> > > > > > > We could also benefit from mode 2. Users could get the same
> > > > > > > isolation as with mode 1.
> > > > > > >
> > > > > > > The user code and dependencies will be loaded by the user class
> > > > > > > loader to avoid class conflicts with the framework.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Anyway, both submission modes are useful.
> > > > > > >
> > > > > > > We just need to clarify the concepts.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Yang
> > > > > > >
> > > > > > > Zili Chen <wa...@gmail.com> 于2019年8月20日周二 下午5:58写道:
> > > > > > >
> > > > > > > > Thanks for the clarification.
> > > > > > > >
> > > > > > > > The JobDeployer idea once came to my mind when I was puzzling
> > > > > > > > over how to execute per-job mode and session mode with the same
> > > > > > > > user code and framework codepath.
> > > > > > > >
> > > > > > > > With the JobDeployer concept we are back to the statement that
> > > > > > > > the environment knows every config for cluster deployment and
> > > > > > > > job submission. We configure, or generate from configuration, a
> > > > > > > > specific JobDeployer in the environment, and then the code
> > > > > > > > aligns on
> > > > > > > >
> > > > > > > > *JobClient client = env.execute().get();*
> > > > > > > >
> > > > > > > > which in session mode is returned by clusterClient.submitJob
> > > > > > > > and in per-job mode by clusterDescriptor.deployJobCluster.
> > > > > > > >
> > > > > > > > Here comes a problem: currently we directly run the
> > > > > > > > ClusterEntrypoint with the extracted job graph. Following the
> > > > > > > > JobDeployer way, we had better align the entry point of per-job
> > > > > > > > deployment at the JobDeployer. Users run their main method, or
> > > > > > > > a CLI (which finally calls the main method), to deploy the
> > > > > > > > job cluster.
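The unified codepath sketched in this thread can be made concrete with a small self-contained example. All names below (JobClient, Env, SessionEnv, PerJobEnv) are illustrative assumptions for this sketch, not actual Flink API:

```java
import java.util.concurrent.CompletableFuture;

// Hedged sketch of the unified codepath discussed above: user code calls
// executeAsync() and gets the same handle type back in both modes. The
// class names here are illustrative assumptions, not actual Flink classes.
public class UnifiedSubmission {

    interface JobClient {
        String jobId();
    }

    abstract static class Env {
        // In session mode the future would be completed by
        // clusterClient.submitJob; in per-job mode by
        // clusterDescriptor.deployJobCluster.
        abstract CompletableFuture<JobClient> executeAsync();
    }

    static class SessionEnv extends Env {
        CompletableFuture<JobClient> executeAsync() {
            return CompletableFuture.completedFuture(() -> "session-job-1");
        }
    }

    static class PerJobEnv extends Env {
        CompletableFuture<JobClient> executeAsync() {
            return CompletableFuture.completedFuture(() -> "per-job-1");
        }
    }

    public static void main(String[] args) throws Exception {
        // Identical user codepath regardless of deployment mode.
        for (Env env : new Env[] {new SessionEnv(), new PerJobEnv()}) {
            JobClient client = env.executeAsync().get();
            System.out.println(client.jobId());
        }
    }
}
```

The point of the sketch is only that the user program never branches on the deployment mode; the mode-specific logic lives behind the environment.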
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > tison.
> > > > > > > >
> > > > > > > >
> > > > > > > > Stephan Ewen <se...@apache.org> 于2019年8月20日周二 下午4:40写道:
> > > > > > > >
> > > > > > > > > Till has made some good comments here.
> > > > > > > > >
> > > > > > > > > Two things to add:
> > > > > > > > >
> > > > > > > > >   - The job mode is very nice in the way that it runs the
> > > client
> > > > > inside
> > > > > > > > the
> > > > > > > > > cluster (in the same image/process that is the JM) and thus
> > > > unifies
> > > > > > > both
> > > > > > > > > applications and what the Spark world calls the "driver
> mode".
> > > > > > > > >
> > > > > > > > >   - Another thing I would add is that during the FLIP-6
> design,
> > > > we
> > > > > were
> > > > > > > > > thinking about setups where Dispatcher and JobManager are
> > > > separate
> > > > > > > > > processes.
> > > > > > > > >     A Yarn or Mesos Dispatcher of a session could run
> > > > independently
> > > > > > > (even
> > > > > > > > > as privileged processes executing no code).
> > > > > > > > >     Then the "per-job" mode could still be helpful:
> when a
> > > > job
> > > > > is
> > > > > > > > > submitted to the dispatcher, it launches the JM again in a
> > > > per-job
> > > > > > > mode,
> > > > > > > > so
> > > > > > > > > that JM and TM processes are bound to the job only. For
> higher
> > > > > security
> > > > > > > > > setups, it is important that processes are not reused
> across
> > > > jobs.
> > > > > > > > >
> > > > > > > > > On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <
> > > > > trohrmann@apache.org>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > I would not be in favour of getting rid of the per-job
> mode
> > > > > since it
> > > > > > > > > > simplifies the process of running Flink jobs
> considerably.
> > > > > Moreover,
> > > > > > > it
> > > > > > > > > is
> > > > > > > > > > not only well suited for container deployments but also
> for
> > > > > > > deployments
> > > > > > > > > > where you want to guarantee job isolation. For example, a
> > > user
> > > > > could
> > > > > > > > use
> > > > > > > > > > the per-job mode on Yarn to execute his job on a separate
> > > > > cluster.
> > > > > > > > > >
> > > > > > > > > > I think that having two notions of cluster deployments
> > > (session
> > > > > vs.
> > > > > > > > > per-job
> > > > > > > > > > mode) does not necessarily contradict your ideas for the
> > > client
> > > > > api
> > > > > > > > > > refactoring. For example one could have the following
> > > > interfaces:
> > > > > > > > > >
> > > > > > > > > > - ClusterDeploymentDescriptor: encapsulates the logic
> how to
> > > > > deploy a
> > > > > > > > > > cluster.
> > > > > > > > > > - ClusterClient: allows to interact with a cluster
> > > > > > > > > > - JobClient: allows to interact with a running job
> > > > > > > > > >
> > > > > > > > > > Now the ClusterDeploymentDescriptor could have two
> methods:
> > > > > > > > > >
> > > > > > > > > > - ClusterClient deploySessionCluster()
> > > > > > > > > > - JobClusterClient/JobClient
> deployPerJobCluster(JobGraph)
> > > > > > > > > >
> > > > > > > > > > where JobClusterClient is either a supertype of
> ClusterClient
> > > > > which
> > > > > > > > does
> > > > > > > > > > not give you the functionality to submit jobs or
> > > > > deployPerJobCluster
> > > > > > > > > > returns directly a JobClient.
> > > > > > > > > >
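The three interfaces proposed above can be sketched as follows. Only the interface names and the two ClusterDeploymentDescriptor methods come from the proposal; the JobGraph stub and the in-memory StubDeployment are assumptions added so the shape is runnable:

```java
// Rough sketch of the proposed deployment interfaces, with a trivial
// in-memory implementation. Everything beyond the interface names and
// the two ClusterDeploymentDescriptor methods is an illustrative
// assumption, not actual Flink API.
public class DeploymentSketch {

    static class JobGraph {
        final String name;
        JobGraph(String name) { this.name = name; }
    }

    interface JobClient {
        String jobId();
    }

    interface ClusterClient {
        JobClient submitJob(JobGraph jobGraph);
    }

    interface ClusterDeploymentDescriptor {
        // Session mode: deploy a cluster, submit jobs to it later.
        ClusterClient deploySessionCluster();
        // Per-job mode: deploy a cluster bound to exactly one job.
        JobClient deployPerJobCluster(JobGraph jobGraph);
    }

    static class StubDeployment implements ClusterDeploymentDescriptor {
        public ClusterClient deploySessionCluster() {
            return jobGraph -> () -> jobGraph.name + "@session";
        }
        public JobClient deployPerJobCluster(JobGraph jobGraph) {
            return () -> jobGraph.name + "@per-job";
        }
    }

    public static void main(String[] args) {
        ClusterDeploymentDescriptor cdd = new StubDeployment();
        System.out.println(cdd.deploySessionCluster()
                .submitJob(new JobGraph("wordcount")).jobId());
        System.out.println(cdd.deployPerJobCluster(new JobGraph("wordcount")).jobId());
    }
}
```

In this shape, the session path yields a ClusterClient that can accept further jobs, while the per-job path skips straight to a JobClient for the single deployed job.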
> > > > > > > > > > When setting up the ExecutionEnvironment, one would then
> not
> > > > > provide
> > > > > > > a
> > > > > > > > > > ClusterClient to submit jobs but a JobDeployer which,
> > > depending
> > > > > on
> > > > > > > the
> > > > > > > > > > selected mode, either uses a ClusterClient (session
> mode) to
> > > > > submit
> > > > > > > > jobs
> > > > > > > > > or
> > > > > > > > > > a ClusterDeploymentDescriptor to deploy a per-job mode
> > > cluster
> > > > > with
> > > > > > > the
> > > > > > > > > job
> > > > > > > > > > to execute.
> > > > > > > > > >
> > > > > > > > > > These are just some thoughts how one could make it
> working
> > > > > because I
> > > > > > > > > > believe there is some value in using the per job mode
> from
> > > the
> > > > > > > > > > ExecutionEnvironment.
> > > > > > > > > >
> > > > > > > > > > Concerning the web submission, this is indeed a bit
> tricky.
> > > > From
> > > > > a
> > > > > > > > > cluster
> > > > > > > > > > management standpoint, I would be in favour of not
> executing
> > > user
> > > > > code
> > > > > > > on
> > > > > > > > > the
> > > > > > > > > > REST endpoint. Especially when considering security, it
> would
> > > > be
> > > > > good
> > > > > > > > to
> > > > > > > > > > have a well defined cluster behaviour where it is
> explicitly
> > > > > stated
> > > > > > > > where
> > > > > > > > > > user code and, thus, potentially risky code is executed.
> > > > Ideally
> > > > > we
> > > > > > > > limit
> > > > > > > > > > it to the TaskExecutor and JobMaster.
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Till
> > > > > > > > > >
> > > > > > > > > > On Tue, Aug 20, 2019 at 9:40 AM Flavio Pompermaier <
> > > > > > > > pompermaier@okkam.it
> > > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > In my opinion the client should not use any
> environment to
> > > > get
> > > > > the
> > > > > > > > Job
> > > > > > > > > > > graph because the jar should reside ONLY on the cluster
> > > (and
> > > > > not in
> > > > > > > > the
> > > > > > > > > > > client classpath otherwise there are always
> inconsistencies
> > > > > between
> > > > > > > > > > client
> > > > > > > > > > > and Flink Job manager's classpath).
> > > > > > > > > > > In the YARN, Mesos and Kubernetes scenarios you have
> the
> > > jar
> > > > > but
> > > > > > > you
> > > > > > > > > > could
> > > > > > > > > > > start a cluster that has the jar on the Job Manager as
> well
> > > > > (but
> > > > > > > this
> > > > > > > > > is
> > > > > > > > > > > the only case where I think you can assume that the
> client
> > > > has
> > > > > the
> > > > > > > > jar
> > > > > > > > > on
> > > > > > > > > > > the classpath..in the REST job submission you don't
> have
> > > any
> > > > > > > > > classpath).
> > > > > > > > > > >
> > > > > > > > > > > Thus, always in my opinion, the JobGraph should be
> > > generated
> > > > > by the
> > > > > > > > Job
> > > > > > > > > > > Manager REST API.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <
> > > > > wander4096@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > >> I would like to involve Till & Stephan here to clarify
> > > some
> > > > > > > concept
> > > > > > > > of
> > > > > > > > > > >> per-job mode.
> > > > > > > > > > >>
> > > > > > > > > > >> The term per-job is one of modes a cluster could run
> on.
> > > It
> > > > is
> > > > > > > > mainly
> > > > > > > > > > >> aimed
> > > > > > > > > > >> at spawn
> > > > > > > > > > >> a dedicated cluster for a specific job while the job
> could
> > > > be
> > > > > > > > packaged
> > > > > > > > > > >> with
> > > > > > > > > > >> Flink
> > > > > > > > > > >> itself and thus the cluster initialized with job so
> that
> > > get
> > > > > rid
> > > > > > > of
> > > > > > > > a
> > > > > > > > > > >> separated
> > > > > > > > > > >> submission step.
> > > > > > > > > > >>
> > > > > > > > > > >> This is useful for container deployments where one
> create
> > > > his
> > > > > > > image
> > > > > > > > > with
> > > > > > > > > > >> the job
> > > > > > > > > > >> and then simply deploy the container.
> > > > > > > > > > >>
> > > > > > > > > > >> However, it is out of client scope since a
> > > > > client(ClusterClient
> > > > > > > for
> > > > > > > > > > >> example) is for
> > > > > > > > > > >> communicate with an existing cluster and performance
> > > > actions.
> > > > > > > > > Currently,
> > > > > > > > > > >> in
> > > > > > > > > > >> per-job
> > > > > > > > > > >> mode, we extract the job graph and bundle it into
> cluster
> > > > > > > deployment
> > > > > > > > > and
> > > > > > > > > > >> thus no
> > > > > > > > > > >> concept of client get involved. It looks like
> reasonable
> > > to
> > > > > > > exclude
> > > > > > > > > the
> > > > > > > > > > >> deployment
> > > > > > > > > > >> of per-job cluster from client api and use dedicated
> > > utility
> > > > > > > > > > >> classes(deployers) for
> > > > > > > > > > >> deployment.
> > > > > > > > > > >>
> > > > > > > > > > >> Zili Chen <wa...@gmail.com> 于2019年8月20日周二
> 下午12:37写道:
> > > > > > > > > > >>
> > > > > > > > > > >> > Hi Aljoscha,
> > > > > > > > > > >> >
> > > > > > > > > > >> > Thanks for your reply and participance. The Google
> Doc
> > > you
> > > > > > > linked
> > > > > > > > to
> > > > > > > > > > >> > requires
> > > > > > > > > > >> > permission and I think you could use a share link
> > > instead.
> > > > > > > > > > >> >
> > > > > > > > > > >> > I agree with that we almost reach a consensus that
> > > > > JobClient is
> > > > > > > > > > >> necessary
> > > > > > > > > > >> > to
> > > > > > > > > > >> > interact with a running Job.
> > > > > > > > > > >> >
> > > > > > > > > > >> > Let me check your open questions one by one.
> > > > > > > > > > >> >
> > > > > > > > > > >> > 1. Separate cluster creation and job submission for
> > > > per-job
> > > > > > > mode.
> > > > > > > > > > >> >
> > > > > > > > > > >> > As you mentioned here is where the opinions
> diverge. In
> > > my
> > > > > > > > document
> > > > > > > > > > >> there
> > > > > > > > > > >> > is
> > > > > > > > > > >> > an alternative[2] that proposes excluding per-job
> > > > deployment
> > > > > > > from
> > > > > > > > > > client
> > > > > > > > > > >> > api
> > > > > > > > > > >> > scope and now I find it is more reasonable we do the
> > > > > exclusion.
> > > > > > > > > > >> >
> > > > > > > > > > >> > When in per-job mode, a dedicated JobCluster is
> launched
> > > > to
> > > > > > > > execute
> > > > > > > > > > the
> > > > > > > > > > >> > specific job. It is like a Flink Application more
> than a
> > > > > > > > submission
> > > > > > > > > > >> > of Flink Job. Client only takes care of job
> submission
> > > and
> > > > > > > assume
> > > > > > > > > > there
> > > > > > > > > > >> is
> > > > > > > > > > >> > an existing cluster. In this way we are able to
> consider
> > > > > per-job
> > > > > > > > > > issues
> > > > > > > > > > >> > individually and JobClusterEntrypoint would be the
> > > utility
> > > > > class
> > > > > > > > for
> > > > > > > > > > >> > per-job
> > > > > > > > > > >> > deployment.
> > > > > > > > > > >> >
> > > > > > > > > > >> > Nevertheless, user program works in both session
> mode
> > > and
> > > > > > > per-job
> > > > > > > > > mode
> > > > > > > > > > >> > without
> > > > > > > > > > >> > needing to change code. JobClient in per-job mode
> is
> > > > > returned
> > > > > > > > from
> > > > > > > > > > >> > env.execute as normal. However, it would be no
> longer a
> > > > > wrapper
> > > > > > > of
> > > > > > > > > > >> > RestClusterClient but a wrapper of
> PerJobClusterClient
> > > > which
> > > > > > > > > > >> communicates
> > > > > > > > > > >> > to Dispatcher locally.
> > > > > > > > > > >> >
> > > > > > > > > > >> > 2. How to deal with plan preview.
> > > > > > > > > > >> >
> > > > > > > > > > >> > With env.compile functions users can get JobGraph or
> > > > > FlinkPlan
> > > > > > > and
> > > > > > > > > > thus
> > > > > > > > > > >> > they can preview the plan with programming.
> Typically it
> > > > > looks
> > > > > > > > like
> > > > > > > > > > >> >
> > > > > > > > > > >> > if (preview configured) {
> > > > > > > > > > >> >     FlinkPlan plan = env.compile();
> > > > > > > > > > >> >     new JSONDumpGenerator(...).dump(plan);
> > > > > > > > > > >> > } else {
> > > > > > > > > > >> >     env.execute();
> > > > > > > > > > >> > }
> > > > > > > > > > >> >
> > > > > > > > > > >> > And `flink info` would be invalid any more.
> > > > > > > > > > >> >
> > > > > > > > > > >> > 3. How to deal with Jar Submission at the Web
> Frontend.
> > > > > > > > > > >> >
> > > > > > > > > > >> > There is one more thread talked on this topic[1].
> Apart
> > > > from
> > > > > > > > > removing
> > > > > > > > > > >> > the functions there are two alternatives.
> > > > > > > > > > >> >
> > > > > > > > > > >> > One is to introduce an interface has a method
> returns
> > > > > > > > > > JobGraph/FilnkPlan
> > > > > > > > > > >> > and Jar Submission only support main-class
> implements
> > > this
> > > > > > > > > interface.
> > > > > > > > > > >> > And then extract the JobGraph/FlinkPlan just by
> calling
> > > > the
> > > > > > > > method.
> > > > > > > > > > >> > In this way, it is even possible to consider a
> > > separation
> > > > > of job
> > > > > > > > > > >> creation
> > > > > > > > > > >> > and job submission.
> > > > > > > > > > >> >
> > > > > > > > > > >> > The other is, as you mentioned, let execute() do the
> > > > actual
> > > > > > > > > execution.
> > > > > > > > > > >> > We won't execute the main method in the WebFrontend
> but
> > > > > spawn a
> > > > > > > > > > process
> > > > > > > > > > >> > at WebMonitor side to execute. For return part we
> could
> > > > > generate
> > > > > > > > the
> > > > > > > > > > >> > JobID from WebMonitor and pass it to the execution
> > > > > environment.
> > > > > > > > > > >> >
> > > > > > > > > > >> > 4. How to deal with detached mode.
> > > > > > > > > > >> >
> > > > > > > > > > >> > I think detached mode is a temporary solution for
> > > > > non-blocking
> > > > > > > > > > >> submission.
> > > > > > > > > > >> > In my document both submission and execution return
> a
> > > > > > > > > > CompletableFuture
> > > > > > > > > > >> and
> > > > > > > > > > >> > users control whether or not wait for the result. In
> > > this
> > > > > point
> > > > > > > we
> > > > > > > > > > don't
> > > > > > > > > > >> > need a detached option but the functionality is
> covered.
> > > > > > > > > > >> >
> > > > > > > > > > >> > 5. How does per-job mode interact with interactive
> > > > > programming.
> > > > > > > > > > >> >
> > > > > > > > > > >> > All of YARN, Mesos and Kubernetes scenarios follow
> the
> > > > > pattern
> > > > > > > > > launch
> > > > > > > > > > a
> > > > > > > > > > >> > JobCluster now. And I don't think there would be
> > > > > inconsistency
> > > > > > > > > between
> > > > > > > > > > >> > different resource management.
> > > > > > > > > > >> >
> > > > > > > > > > >> > Best,
> > > > > > > > > > >> > tison.
> > > > > > > > > > >> >
> > > > > > > > > > >> > [1]
> > > > > > > > > > >> >
> > > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
> > > > > > > > > > >> > [2]
> > > > > > > > > > >> >
> > > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
> > > > > > > > > > >> >
> > > > > > > > > > >> > Aljoscha Krettek <al...@apache.org>
> 于2019年8月16日周五
> > > > > 下午9:20写道:
> > > > > > > > > > >> >
> > > > > > > > > > >> >> Hi,
> > > > > > > > > > >> >>
> > > > > > > > > > >> >> I read both Jeff's initial design document and the
> newer
> > > > > > > document
> > > > > > > > by
> > > > > > > > > > >> >> Tison. I also finally found the time to collect our
> > > > > thoughts on
> > > > > > > > the
> > > > > > > > > > >> issue,
> > > > > > > > > > >> >> I had quite some discussions with Kostas and this
> is
> > > the
> > > > > > > result:
> > > > > > > > > [1].
> > > > > > > > > > >> >>
> > > > > > > > > > >> >> I think overall we agree that this part of the
> code is
> > > in
> > > > > dire
> > > > > > > > need
> > > > > > > > > > of
> > > > > > > > > > >> >> some refactoring/improvements but I think there are
> > > still
> > > > > some
> > > > > > > > open
> > > > > > > > > > >> >> questions and some differences in opinion what
> those
> > > > > > > refactorings
> > > > > > > > > > >> should
> > > > > > > > > > >> >> look like.
> > > > > > > > > > >> >>
> > > > > > > > > > >> >> I think the API-side is quite clear, i.e. we need
> some
> > > > > > > JobClient
> > > > > > > > > API
> > > > > > > > > > >> that
> > > > > > > > > > >> >> allows interacting with a running Job. It could be
> > > > > worthwhile
> > > > > > > to
> > > > > > > > > spin
> > > > > > > > > > >> that
> > > > > > > > > > >> >> off into a separate FLIP because we can probably
> find
> > > > > consensus
> > > > > > > > on
> > > > > > > > > > that
> > > > > > > > > > >> >> part more easily.
> > > > > > > > > > >> >>
> > > > > > > > > > >> >> For the rest, the main open questions from our doc
> are
> > > > > these:
> > > > > > > > > > >> >>
> > > > > > > > > > >> >>   - Do we want to separate cluster creation and job
> > > > > submission
> > > > > > > > for
> > > > > > > > > > >> >> per-job mode? In the past, there were conscious
> efforts
> > > > to
> > > > > > > *not*
> > > > > > > > > > >> separate
> > > > > > > > > > >> >> job submission from cluster creation for per-job
> > > clusters
> > > > > for
> > > > > > > > > Mesos,
> > > > > > > > > > >> YARN,
> > > > > > > > > > >> >> Kubernets (see StandaloneJobClusterEntryPoint).
> Tison
> > > > > suggests
> > > > > > > in
> > > > > > > > > his
> > > > > > > > > > >> >> design document to decouple this in order to unify
> job
> > > > > > > > submission.
> > > > > > > > > > >> >>
> > > > > > > > > > >> >>   - How to deal with plan preview, which needs to
> > > hijack
> > > > > > > > execute()
> > > > > > > > > > and
> > > > > > > > > > >> >> let the outside code catch an exception?
> > > > > > > > > > >> >>
> > > > > > > > > > >> >>   - How to deal with Jar Submission at the Web
> > > Frontend,
> > > > > which
> > > > > > > > > needs
> > > > > > > > > > to
> > > > > > > > > > >> >> hijack execute() and let the outside code catch an
> > > > > exception?
> > > > > > > > > > >> >> CliFrontend.run() “hijacks”
> > > > ExecutionEnvironment.execute()
> > > > > to
> > > > > > > > get a
> > > > > > > > > > >> >> JobGraph and then execute that JobGraph manually.
> We
> > > > could
> > > > > get
> > > > > > > > > around
> > > > > > > > > > >> that
> > > > > > > > > > >> >> by letting execute() do the actual execution. One
> > > caveat
> > > > > for
> > > > > > > this
> > > > > > > > > is
> > > > > > > > > > >> that
> > > > > > > > > > >> >> now the main() method doesn’t return (or is forced
> to
> > > > > return by
> > > > > > > > > > >> throwing an
> > > > > > > > > > >> >> exception from execute()) which means that for Jar
> > > > > Submission
> > > > > > > > from
> > > > > > > > > > the
> > > > > > > > > > >> >> WebFrontend we have a long-running main() method
> > > running
> > > > > in the
> > > > > > > > > > >> >> WebFrontend. This doesn’t sound very good. We
> could get
> > > > > around
> > > > > > > > this
> > > > > > > > > > by
> > > > > > > > > > >> >> removing the plan preview feature and by removing
> Jar
> > > > > > > > > > >> Submission/Running.
> > > > > > > > > > >> >>
> > > > > > > > > > >> >>   - How to deal with detached mode? Right now,
> > > > > > > > DetachedEnvironment
> > > > > > > > > > will
> > > > > > > > > > >> >> execute the job and return immediately. If users
> > > control
> > > > > when
> > > > > > > > they
> > > > > > > > > > >> want to
> > > > > > > > > > >> >> return, by waiting on the job completion future,
> how do
> > > > we
> > > > > deal
> > > > > > > > > with
> > > > > > > > > > >> this?
> > > > > > > > > > >> >> Do we simply remove the distinction between
> > > > > > > > detached/non-detached?
> > > > > > > > > > >> >>
> > > > > > > > > > >> >>   - How does per-job mode interact with
> “interactive
> > > > > > > programming”
> > > > > > > > > > >> >> (FLIP-36). For YARN, each execute() call could
> spawn a
> > > > new
> > > > > > > Flink
> > > > > > > > > YARN
> > > > > > > > > > >> >> cluster. What about Mesos and Kubernetes?
> > > > > > > > > > >> >>
> > > > > > > > > > >> >> The first open question is where the opinions
> diverge,
> > > I
> > > > > think.
> > > > > > > > The
> > > > > > > > > > >> rest
> > > > > > > > > > >> >> are just open questions and interesting things
> that we
> > > > > need to
> > > > > > > > > > >> consider.
> > > > > > > > > > >> >>
> > > > > > > > > > >> >> Best,
> > > > > > > > > > >> >> Aljoscha
> > > > > > > > > > >> >>
> > > > > > > > > > >> >> [1]
> > > > > > > > > > >> >>
> > > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> > > > > > > > > > >> >> <
> > > > > > > > > > >> >>
> > > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> > > > > > > > > > >> >> >
> > > > > > > > > > >> >>
> > > > > > > > > > >> >> > On 31. Jul 2019, at 15:23, Jeff Zhang <
> > > > zjffdu@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > > >> >> >
> > > > > > > > > > >> >> > Thanks tison for the effort. I left a few
> comments.
> > > > > > > > > > >> >> >
> > > > > > > > > > >> >> >
> > > > > > > > > > >> >> > Zili Chen <wa...@gmail.com> 于2019年7月31日周三
> > > > 下午8:24写道:
> > > > > > > > > > >> >> >
> > > > > > > > > > >> >> >> Hi Flavio,
> > > > > > > > > > >> >> >>
> > > > > > > > > > >> >> >> Thanks for your reply.
> > > > > > > > > > >> >> >>
> > > > > > > > > > >> >> >> Either current impl and in the design,
> ClusterClient
> > > > > > > > > > >> >> >> never takes responsibility for generating
> JobGraph.
> > > > > > > > > > >> >> >> (what you see in current codebase is several
> class
> > > > > methods)
> > > > > > > > > > >> >> >>
> > > > > > > > > > >> >> >> Instead, user describes his program in the main
> > > method
> > > > > > > > > > >> >> >> with ExecutionEnvironment apis and calls
> > > env.compile()
> > > > > > > > > > >> >> >> or env.optimize() to get FlinkPlan and JobGraph
> > > > > > > respectively.
> > > > > > > > > > >> >> >>
> > > > > > > > > > >> >> >> For listing main classes in a jar and choose
> one for
> > > > > > > > > > >> >> >> submission, you're now able to customize a CLI
> to do
> > > > it.
> > > > > > > > > > >> >> >> Specifically, the path of jar is passed as
> arguments
> > > > and
> > > > > > > > > > >> >> >> in the customized CLI you list main classes,
> choose
> > > > one
> > > > > > > > > > >> >> >> to submit to the cluster.
> > > > > > > > > > >> >> >>
> > > > > > > > > > >> >> >> Best,
> > > > > > > > > > >> >> >> tison.
> > > > > > > > > > >> >> >>
> > > > > > > > > > >> >> >>
> > > > > > > > > > >> >> >> Flavio Pompermaier <po...@okkam.it>
> > > > 于2019年7月31日周三
> > > > > > > > > 下午8:12写道:
> > > > > > > > > > >> >> >>
> > > > > > > > > > >> >> >>> Just one note on my side: it is not clear to me whether
> > > > > > > > > > >> >> >>> the client needs to be able to generate a job graph or
> > > > > > > > > > >> >> >>> not. In my opinion, the job jar must reside only on the
> > > > > > > > > > >> >> >>> server/jobManager side, and the client requires a way to
> > > > > > > > > > >> >> >>> get the job graph. If you really want access to the job
> > > > > > > > > > >> >> >>> graph, I'd add dedicated methods on the ClusterClient,
> > > > > > > > > > >> >> >>> like:
> > > > > > > > > > >> >> >>>
> > > > > > > > > > >> >> >>>   - getJobGraph(jarId, mainClass): JobGraph
> > > > > > > > > > >> >> >>>   - listMainClasses(jarId): List<String>
> > > > > > > > > > >> >> >>>
> > > > > > > > > > >> >> >>> These would also require some additions on the job manager
> > > > > > > > > > >> >> >>> endpoint. What do you think?
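A minimal sketch of the two methods proposed above. Only getJobGraph/listMainClasses come from the proposal; the stubbed JobGraph, the JarAwareClusterClient name, and the in-memory fake standing in for the JobManager-side jar store are all illustrative assumptions, not Flink API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ClusterClientSketch {

    // Stub for the real JobGraph, which lives in flink-runtime.
    static class JobGraph {
        final String mainClass;
        JobGraph(String mainClass) { this.mainClass = mainClass; }
    }

    // The two proposed additions, expressed as an interface.
    interface JarAwareClusterClient {
        JobGraph getJobGraph(String jarId, String mainClass);
        List<String> listMainClasses(String jarId);
    }

    // In-memory fake in place of a real server-side jar store.
    static class FakeClusterClient implements JarAwareClusterClient {
        private final Map<String, List<String>> jars = new HashMap<>();

        void uploadJar(String jarId, String... mainClasses) {
            jars.put(jarId, Arrays.asList(mainClasses));
        }

        @Override
        public List<String> listMainClasses(String jarId) {
            return jars.getOrDefault(jarId, Collections.emptyList());
        }

        @Override
        public JobGraph getJobGraph(String jarId, String mainClass) {
            if (!listMainClasses(jarId).contains(mainClass)) {
                throw new IllegalArgumentException("unknown main class: " + mainClass);
            }
            return new JobGraph(mainClass);
        }
    }

    public static void main(String[] args) {
        FakeClusterClient client = new FakeClusterClient();
        client.uploadJar("job.jar", "com.example.WordCount", "com.example.PageRank");
        System.out.println(client.listMainClasses("job.jar"));
        System.out.println(client.getJobGraph("job.jar", "com.example.WordCount").mainClass);
    }
}
```

A UI could call listMainClasses first and then getJobGraph with the user's choice, which matches the flow described in the message.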
> > > > > > > > > > >> >> >>>
> > > > > > > > > > >> >> >>> On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <wander4096@gmail.com> wrote:
> > > > > > > > > > >> >> >>>
> > > > > > > > > > >> >> >>>> Hi all,
> > > > > > > > > > >> >> >>>>
> > > > > > > > > > >> >> >>>> Here is a document[1] on client API enhancement from our
> > > > > > > > > > >> >> >>>> perspective. We have investigated the current
> > > > > > > > > > >> >> >>>> implementations, and we propose:
> > > > > > > > > > >> >> >>>>
> > > > > > > > > > >> >> >>>> 1. Unify the implementation of cluster deployment and job
> > > > > > > > > > >> >> >>>> submission in Flink.
> > > > > > > > > > >> >> >>>> 2. Provide programmatic interfaces to allow flexible job
> > > > > > > > > > >> >> >>>> and cluster management.
> > > > > > > > > > >> >> >>>>
> > > > > > > > > > >> >> >>>> The first proposal is aimed at reducing the code paths of
> > > > > > > > > > >> >> >>>> cluster deployment and job submission so that one can
> > > > > > > > > > >> >> >>>> adopt Flink easily. The second proposal is aimed at
> > > > > > > > > > >> >> >>>> providing rich interfaces for advanced users who want
> > > > > > > > > > >> >> >>>> precise control of these stages.
> > > > > > > > > > >> >> >>>>
> > > > > > > > > > >> >> >>>> Quick reference on open questions:
> > > > > > > > > > >> >> >>>>
> > > > > > > > > > >> >> >>>> 1. Exclude job cluster deployment from the client side,
> > > > > > > > > > >> >> >>>> or redefine the semantics of the job cluster? It fits in
> > > > > > > > > > >> >> >>>> a process quite different from session cluster deployment
> > > > > > > > > > >> >> >>>> and job submission.
> > > > > > > > > > >> >> >>>>
> > > > > > > > > > >> >> >>>> 2. Maintain the codepaths handling the class
> > > > > > > > > > >> >> >>>> o.a.f.api.common.Program, or implement customized program
> > > > > > > > > > >> >> >>>> handling logic via a customized CliFrontend? See also
> > > > > > > > > > >> >> >>>> this thread[2] and the document[1].
> > > > > > > > > > >> >> >>>>
> > > > > > > > > > >> >> >>>> 3. Expose ClusterClient as a public API, or just expose
> > > > > > > > > > >> >> >>>> an API in ExecutionEnvironment and delegate it to
> > > > > > > > > > >> >> >>>> ClusterClient? Further, in either way, is it worth
> > > > > > > > > > >> >> >>>> introducing a JobClient which is an encapsulation of
> > > > > > > > > > >> >> >>>> ClusterClient associated with a specific job?
> > > > > > > > > > >> >> >>>>
> > > > > > > > > > >> >> >>>> Best,
> > > > > > > > > > >> >> >>>> tison.
> > > > > > > > > > >> >> >>>>
> > > > > > > > > > >> >> >>>> [1]
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
> > > > > > > > > > >> >> >>>> [2]
> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
> > > > > > > > > > >> >> >>>>
> > > > > > > > > > >> >> >>>> Jeff Zhang <zj...@gmail.com> wrote on Wed, Jul 24, 2019 at 9:19 AM:
> > > > > > > > > > >> >> >>>>
> > > > > > > > > > >> >> >>>>> Thanks Stephan, I will follow up on this issue in the
> > > > > > > > > > >> >> >>>>> next few weeks and will refine the design doc. We could
> > > > > > > > > > >> >> >>>>> discuss more details after the 1.9 release.
> > > > > > > > > > >> >> >>>>>
> > > > > > > > > > >> >> >>>>> Stephan Ewen <se...@apache.org> wrote on Wed, Jul 24, 2019 at 12:58 AM:
> > > > > > > > > > >> >> >>>>>
> > > > > > > > > > >> >> >>>>>> Hi all!
> > > > > > > > > > >> >> >>>>>>
> > > > > > > > > > >> >> >>>>>> This thread has stalled for a bit, which I assume is
> > > > > > > > > > >> >> >>>>>> mostly due to the Flink 1.9 feature freeze and release
> > > > > > > > > > >> >> >>>>>> testing effort.
> > > > > > > > > > >> >> >>>>>>
> > > > > > > > > > >> >> >>>>>> I personally still recognize this issue as an important
> > > > > > > > > > >> >> >>>>>> one to be solved. I'd be happy to help resume this
> > > > > > > > > > >> >> >>>>>> discussion soon (after the 1.9 release) and see if we
> > > > > > > > > > >> >> >>>>>> can take some steps towards this in Flink 1.10.
> > > > > > > > > > >> >> >>>>>>
> > > > > > > > > > >> >> >>>>>> Best,
> > > > > > > > > > >> >> >>>>>> Stephan
> > > > > > > > > > >> >> >>>>>>
> > > > > > > > > > >> >> >>>>>>
> > > > > > > > > > >> >> >>>>>>
> > > > > > > > > > >> >> >>>>>> On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <pompermaier@okkam.it> wrote:
> > > > > > > > > > >> >> >>>>>>
> > > > > > > > > > >> >> >>>>>>> That's exactly what I suggested a long time ago: the
> > > > > > > > > > >> >> >>>>>>> Flink REST client should not require any Flink
> > > > > > > > > > >> >> >>>>>>> dependency, only an HTTP library to call the REST
> > > > > > > > > > >> >> >>>>>>> services to submit and monitor a job.
> > > > > > > > > > >> >> >>>>>>> What I also suggested in [1] was to have a way to
> > > > > > > > > > >> >> >>>>>>> automatically suggest to the user (via a UI) the
> > > > > > > > > > >> >> >>>>>>> available main classes and their required
> > > > > > > > > > >> >> >>>>>>> parameters [2].
> > > > > > > > > > >> >> >>>>>>> Another problem we have with Flink is that the REST
> > > > > > > > > > >> >> >>>>>>> client and the CLI one behave differently, and we use
> > > > > > > > > > >> >> >>>>>>> the CLI client (via ssh) because it allows calling
> > > > > > > > > > >> >> >>>>>>> some other method after env.execute() [3] (we have to
> > > > > > > > > > >> >> >>>>>>> call another REST service to signal the end of the
> > > > > > > > > > >> >> >>>>>>> job).
> > > > > > > > > > >> >> >>>>>>> In this regard, a dedicated interface, like the
> > > > > > > > > > >> >> >>>>>>> JobListener suggested in the previous emails, would be
> > > > > > > > > > >> >> >>>>>>> very helpful (IMHO).
> > > > > > > > > > >> >> >>>>>>>
> > > > > > > > > > >> >> >>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-10864
> > > > > > > > > > >> >> >>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-10862
> > > > > > > > > > >> >> >>>>>>> [3] https://issues.apache.org/jira/browse/FLINK-10879
> > > > > > > > > > >> >> >>>>>>>
> > > > > > > > > > >> >> >>>>>>> Best,
> > > > > > > > > > >> >> >>>>>>> Flavio
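A minimal sketch of the JobListener idea discussed above: callbacks fired around execution would replace the extra REST call after env.execute(). All names here (JobListener, NotifyingEnvironment, the callback signatures) are illustrative assumptions, not Flink's actual API.

```java
import java.util.ArrayList;
import java.util.List;

public class JobListenerSketch {

    // Hypothetical listener interface a downstream project could register.
    interface JobListener {
        void onJobSubmitted(String jobId);
        void onJobFinished(String jobId, boolean succeeded);
    }

    // Stand-in for an ExecutionEnvironment that notifies listeners.
    static class NotifyingEnvironment {
        private final List<JobListener> listeners = new ArrayList<>();

        void registerListener(JobListener l) { listeners.add(l); }

        // Stand-in for env.execute(): "runs" the job and fires callbacks,
        // so callers no longer have to signal job completion themselves.
        void execute(String jobId) {
            listeners.forEach(l -> l.onJobSubmitted(jobId));
            boolean succeeded = true; // pretend the job ran and succeeded
            listeners.forEach(l -> l.onJobFinished(jobId, succeeded));
        }
    }

    public static void main(String[] args) {
        NotifyingEnvironment env = new NotifyingEnvironment();
        env.registerListener(new JobListener() {
            public void onJobSubmitted(String jobId) {
                System.out.println("submitted: " + jobId);
            }
            public void onJobFinished(String jobId, boolean ok) {
                System.out.println("finished: " + jobId + " ok=" + ok);
            }
        });
        env.execute("job-1");
    }
}
```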
> > > > > > > > > > >> >> >>>>>>>
> > > > > > > > > > >> >> >>>>>>> On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > > > > > > > >> >> >>>>>>>
> > > > > > > > > > >> >> >>>>>>>> Hi, Tison,
> > > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > > >> >> >>>>>>>> Thanks for your comments. Overall I agree with you
> > > > > > > > > > >> >> >>>>>>>> that it is difficult for downstream projects to
> > > > > > > > > > >> >> >>>>>>>> integrate with Flink and that we need to refactor the
> > > > > > > > > > >> >> >>>>>>>> current Flink client API.
> > > > > > > > > > >> >> >>>>>>>> And I agree that CliFrontend should only parse
> > > > > > > > > > >> >> >>>>>>>> command line arguments and then pass them to
> > > > > > > > > > >> >> >>>>>>>> ExecutionEnvironment. It is ExecutionEnvironment's
> > > > > > > > > > >> >> >>>>>>>> responsibility to compile the job, create the
> > > > > > > > > > >> >> >>>>>>>> cluster, and submit the job. Besides that, Flink
> > > > > > > > > > >> >> >>>>>>>> currently has many ExecutionEnvironment
> > > > > > > > > > >> >> >>>>>>>> implementations, and Flink will use a specific one
> > > > > > > > > > >> >> >>>>>>>> based on the context. IMHO this is not necessary:
> > > > > > > > > > >> >> >>>>>>>> ExecutionEnvironment should be able to do the right
> > > > > > > > > > >> >> >>>>>>>> thing based on the FlinkConf it receives. Having too
> > > > > > > > > > >> >> >>>>>>>> many ExecutionEnvironment implementations is another
> > > > > > > > > > >> >> >>>>>>>> burden for downstream project integration.
> > > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > > >> >> >>>>>>>> One thing I'd like to mention is Flink's Scala shell
> > > > > > > > > > >> >> >>>>>>>> and SQL client: although they are sub-modules of
> > > > > > > > > > >> >> >>>>>>>> Flink, they could be treated as downstream projects
> > > > > > > > > > >> >> >>>>>>>> which use Flink's client API. Currently you will find
> > > > > > > > > > >> >> >>>>>>>> it is not easy for them to integrate with Flink; they
> > > > > > > > > > >> >> >>>>>>>> share much duplicated code. It is another sign that
> > > > > > > > > > >> >> >>>>>>>> we should refactor the Flink client API.
> > > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > > >> >> >>>>>>>> I believe it is a large and hard change, and I am
> > > > > > > > > > >> >> >>>>>>>> afraid we cannot keep compatibility since many of the
> > > > > > > > > > >> >> >>>>>>>> changes are user facing.
> > > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > > >> >> >>>>>>>> Zili Chen <wa...@gmail.com> wrote on Mon, Jun 24, 2019 at 2:53 PM:
> > > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>> Hi all,
> > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>> After a closer look at our client APIs, I can see
> > > > > > > > > > >> >> >>>>>>>>> two major issues with consistency and integration,
> > > > > > > > > > >> >> >>>>>>>>> namely the deployment of a job cluster, which
> > > > > > > > > > >> >> >>>>>>>>> couples job graph creation and cluster deployment,
> > > > > > > > > > >> >> >>>>>>>>> and submission via CliFrontend, which confuses the
> > > > > > > > > > >> >> >>>>>>>>> control flow of job graph compilation and job
> > > > > > > > > > >> >> >>>>>>>>> submission. I'd like to follow the discussion above,
> > > > > > > > > > >> >> >>>>>>>>> mainly the process described by Jeff and Stephan,
> > > > > > > > > > >> >> >>>>>>>>> and share my ideas on these issues.
> > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>> 1) CliFrontend confuses the control flow of job
> > > > > > > > > > >> >> >>>>>>>>> compilation and submission. Following the process of
> > > > > > > > > > >> >> >>>>>>>>> job submission Stephan and Jeff described, the
> > > > > > > > > > >> >> >>>>>>>>> execution environment knows all configs of the
> > > > > > > > > > >> >> >>>>>>>>> cluster and the topologies/settings of the job.
> > > > > > > > > > >> >> >>>>>>>>> Ideally, the main method of the user program calls
> > > > > > > > > > >> >> >>>>>>>>> #execute (or a method named #submit) and Flink
> > > > > > > > > > >> >> >>>>>>>>> deploys the cluster, compiles the job graph and
> > > > > > > > > > >> >> >>>>>>>>> submits it to the cluster. However, the current
> > > > > > > > > > >> >> >>>>>>>>> CliFrontend does all these things inside its
> > > > > > > > > > >> >> >>>>>>>>> #runProgram method, which introduces a lot of
> > > > > > > > > > >> >> >>>>>>>>> subclasses of the (stream) execution environment.
> > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>> Actually, it sets up an exec env that hijacks the
> > > > > > > > > > >> >> >>>>>>>>> #execute/executePlan method, initializes the job
> > > > > > > > > > >> >> >>>>>>>>> graph and aborts execution. Then control flows back
> > > > > > > > > > >> >> >>>>>>>>> to CliFrontend, which deploys the cluster (or
> > > > > > > > > > >> >> >>>>>>>>> retrieves the client) and submits the job graph.
> > > > > > > > > > >> >> >>>>>>>>> This is quite a specific internal process inside
> > > > > > > > > > >> >> >>>>>>>>> Flink, with no consistency with anything else.
> > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>> 2) Deployment of a job cluster couples job graph
> > > > > > > > > > >> >> >>>>>>>>> creation and cluster deployment. Abstractly, going
> > > > > > > > > > >> >> >>>>>>>>> from a user job to a concrete submission requires
> > > > > > > > > > >> >> >>>>>>>>> the following dependency:
> > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>     create JobGraph --\
> > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>> create ClusterClient -->  submit JobGraph
> > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>> The ClusterClient is created by deploying or
> > > > > > > > > > >> >> >>>>>>>>> retrieving. JobGraph submission requires a compiled
> > > > > > > > > > >> >> >>>>>>>>> JobGraph and a valid ClusterClient, but the creation
> > > > > > > > > > >> >> >>>>>>>>> of the ClusterClient is abstractly independent of
> > > > > > > > > > >> >> >>>>>>>>> that of the JobGraph. However, in job cluster mode,
> > > > > > > > > > >> >> >>>>>>>>> we deploy the job cluster with a job graph, which
> > > > > > > > > > >> >> >>>>>>>>> means we use another process:
> > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>> create JobGraph --> deploy cluster with the JobGraph
> > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>> This is another inconsistency, and downstream
> > > > > > > > > > >> >> >>>>>>>>> projects/client APIs are forced to handle different
> > > > > > > > > > >> >> >>>>>>>>> cases with little support from Flink.
> > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>> Since we likely reached a consensus on
> > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>> 1. all configs are gathered by the Flink
> > > > > > > > > > >> >> >>>>>>>>> configuration and passed on
> > > > > > > > > > >> >> >>>>>>>>> 2. the execution environment knows all configs and
> > > > > > > > > > >> >> >>>>>>>>> handles execution (both deployment and submission)
> > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>> to address the issues above I propose eliminating
> > > > > > > > > > >> >> >>>>>>>>> the inconsistencies by the following approach:
> > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>> 1) CliFrontend should be exactly a front end, at
> > > > > > > > > > >> >> >>>>>>>>> least for the "run" command. That means it just
> > > > > > > > > > >> >> >>>>>>>>> gathers and passes all config from the command line
> > > > > > > > > > >> >> >>>>>>>>> to the main method of the user program. The
> > > > > > > > > > >> >> >>>>>>>>> execution environment knows all the info, and with
> > > > > > > > > > >> >> >>>>>>>>> an additional utility for ClusterClient we
> > > > > > > > > > >> >> >>>>>>>>> gracefully get a ClusterClient by deploying or
> > > > > > > > > > >> >> >>>>>>>>> retrieving. In this way we don't need to hijack the
> > > > > > > > > > >> >> >>>>>>>>> #execute/executePlan methods and can remove the
> > > > > > > > > > >> >> >>>>>>>>> various hacking subclasses of exec env, as well as
> > > > > > > > > > >> >> >>>>>>>>> the #run methods in ClusterClient (for an
> > > > > > > > > > >> >> >>>>>>>>> interface-ized ClusterClient). Now the control flow
> > > > > > > > > > >> >> >>>>>>>>> goes from CliFrontend to the main method and never
> > > > > > > > > > >> >> >>>>>>>>> returns.
> > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>> 2) A job cluster means a cluster for a specific job.
> > > > > > > > > > >> >> >>>>>>>>> From another perspective, it is an ephemeral
> > > > > > > > > > >> >> >>>>>>>>> session. We may decouple the deployment from a
> > > > > > > > > > >> >> >>>>>>>>> compiled job graph, and instead start a session with
> > > > > > > > > > >> >> >>>>>>>>> an idle timeout and submit the job afterwards.
> > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>> These topics are better to be aware of and discussed
> > > > > > > > > > >> >> >>>>>>>>> to reach a consensus before we go into more details
> > > > > > > > > > >> >> >>>>>>>>> on design or implementation.
> > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>> Best,
> > > > > > > > > > >> >> >>>>>>>>> tison.
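The decoupled flow tison describes (create the ClusterClient by deploying or retrieving, independently of JobGraph compilation, then submit) can be sketched with stub types. Everything here is illustrative: the real JobGraph, ClusterClient, and deployment logic live in Flink's runtime and client modules.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class DecoupledFlowSketch {

    // Stub for the compiled job representation.
    static class JobGraph {
        final String name;
        JobGraph(String name) { this.name = name; }
    }

    // Stub client: submission needs only a compiled graph and a live client.
    static class ClusterClient {
        final Map<String, JobGraph> runningJobs = new HashMap<>();

        String submitJob(JobGraph graph) {
            String jobId = UUID.randomUUID().toString();
            runningJobs.put(jobId, graph);
            return jobId;
        }
    }

    // "Deploy" an ephemeral session cluster, independent of any job graph;
    // a job cluster then becomes a session with an idle timeout.
    static ClusterClient deploySessionCluster(long idleTimeoutMs) {
        return new ClusterClient();
    }

    public static void main(String[] args) {
        // 1) create the JobGraph and the ClusterClient independently ...
        JobGraph graph = new JobGraph("word-count");
        ClusterClient client = deploySessionCluster(60_000);
        // 2) ... then submit, identically for session and "job" clusters.
        String jobId = client.submitJob(graph);
        System.out.println("submitted " + jobId);
    }
}
```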
> > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>> Zili Chen <wa...@gmail.com> wrote on Thu, Jun 20, 2019 at 3:21 AM:
> > > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>> Hi Jeff,
> > > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>> Thanks for raising this thread and the design
> > > > > > > > > > >> >> >>>>>>>>>> document!
> > > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>> As @Thomas Weise mentioned above, extending Flink's
> > > > > > > > > > >> >> >>>>>>>>>> config requires far more effort than it should.
> > > > > > > > > > >> >> >>>>>>>>>> Another example is that we achieve detach mode by
> > > > > > > > > > >> >> >>>>>>>>>> introducing another execution environment which
> > > > > > > > > > >> >> >>>>>>>>>> also hijacks the #execute method.
> > > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>> I agree with your idea that the user would
> > > > > > > > > > >> >> >>>>>>>>>> configure everything and Flink "just" respects it.
> > > > > > > > > > >> >> >>>>>>>>>> On this topic I think the unusual control flow when
> > > > > > > > > > >> >> >>>>>>>>>> CliFrontend handles the "run" command is the
> > > > > > > > > > >> >> >>>>>>>>>> problem. It handles several configs, mainly about
> > > > > > > > > > >> >> >>>>>>>>>> cluster settings, and thus the main method of the
> > > > > > > > > > >> >> >>>>>>>>>> user program is unaware of them. Also, it compiles
> > > > > > > > > > >> >> >>>>>>>>>> the app to a job graph by running the main method
> > > > > > > > > > >> >> >>>>>>>>>> with a hijacked exec env, which constrains the main
> > > > > > > > > > >> >> >>>>>>>>>> method further.
> > > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>> I'd like to write down a few notes on passing and
> > > > > > > > > > >> >> >>>>>>>>>> respecting configs/args, as well as on decoupling
> > > > > > > > > > >> >> >>>>>>>>>> job compilation and submission, and share them on
> > > > > > > > > > >> >> >>>>>>>>>> this thread later.
> > > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>> Best,
> > > > > > > > > > >> >> >>>>>>>>>> tison.
> > > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>> SHI Xiaogang <sh...@gmail.com> wrote on Mon, Jun 17, 2019 at 7:29 PM:
> > > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>> Hi Jeff and Flavio,
> > > > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>> Thanks Jeff a lot for proposing the design
> > > > > > > > > > >> >> >>>>>>>>>>> document.
> > > > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>> We are also working on refactoring ClusterClient
> > > > > > > > > > >> >> >>>>>>>>>>> to allow flexible and efficient job management in
> > > > > > > > > > >> >> >>>>>>>>>>> our real-time platform. We would like to draft a
> > > > > > > > > > >> >> >>>>>>>>>>> document to share our ideas with you.
> > > > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>> I think it's a good idea to have something like
> > > > > > > > > > >> >> >>>>>>>>>>> Apache Livy for Flink, and the efforts discussed
> > > > > > > > > > >> >> >>>>>>>>>>> here take a great step towards it.
> > > > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>> Regards,
> > > > > > > > > > >> >> >>>>>>>>>>> Xiaogang
> > > > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>> Flavio Pompermaier <pompermaier@okkam.it> wrote on Mon, Jun 17, 2019 at 7:13 PM:
> > > > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>> Is there any possibility to have something like
> > > > > > > > > > >> >> >>>>>>>>>>>> Apache Livy [1] also for Flink in the future?
> > > > > > > > > > >> >> >>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>> [1] https://livy.apache.org/
> > > > > > > > > > >> >> >>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>> On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > > > > > > > >> >> >>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> Any API we expose should not have
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> dependencies on the runtime (flink-runtime)
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> package or other implementation details. To
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> me, this means that the current ClusterClient
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> cannot be exposed to users because it uses
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> quite some classes from the optimiser and
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> runtime packages.
> > > > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>> We should change ClusterClient from a class to
> > > > > > > > > > >> >> >>>>>>>>>>>>> an interface. ExecutionEnvironment should only
> > > > > > > > > > >> >> >>>>>>>>>>>>> use the ClusterClient interface, which should
> > > > > > > > > > >> >> >>>>>>>>>>>>> live in flink-clients, while the concrete
> > > > > > > > > > >> >> >>>>>>>>>>>>> implementation class could be in flink-runtime.
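A minimal sketch of the class-to-interface split proposed here. The module placement (flink-clients vs. flink-runtime) comes from the email; the method names and the stubbed implementation are illustrative assumptions, not Flink's actual API.

```java
// Would live in flink-clients: no runtime/optimizer dependencies.
interface ClusterClient {
    String submitJob(String serializedJobGraph);
    void cancelJob(String jobId);
}

// Would live in flink-runtime: free to use internal classes; here it only
// fakes a REST round trip so the sketch stays self-contained.
class RestClusterClient implements ClusterClient {
    @Override
    public String submitJob(String serializedJobGraph) {
        // a real implementation would POST the graph to the REST endpoint
        return "job-" + Integer.toHexString(serializedJobGraph.hashCode());
    }

    @Override
    public void cancelJob(String jobId) {
        // a real implementation would call the cancel REST endpoint
    }
}

public class InterfaceSplitSketch {
    public static void main(String[] args) {
        // Callers (e.g. ExecutionEnvironment) depend only on the interface.
        ClusterClient client = new RestClusterClient();
        System.out.println(client.submitJob("graph-bytes"));
    }
}
```

The point of the split is that downstream projects compile against the interface alone, so flink-runtime never leaks into their classpath.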
> > > > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> What happens when a failure/restart in the
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> client happens? There needs to be a way of
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> re-establishing the connection to the job,
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> setting up the listeners again, etc.
> > > > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>> Good point. First we need to define what
> > > > > > > > > > >> >> >>>>>>>>>>>>> failure/restart in the client means. IIUC, that
> > > > > > > > > > >> >> >>>>>>>>>>>>> usually means a network failure, which will
> > > > > > > > > > >> >> >>>>>>>>>>>>> happen in the RestClient class. If my
> > > > > > > > > > >> >> >>>>>>>>>>>>> understanding is correct, the restart/retry
> > > > > > > > > > >> >> >>>>>>>>>>>>> mechanism should be done in RestClient.
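The retry/backoff behaviour suggested for RestClient can be sketched synchronously. The real RestClient is asynchronous; this simplified helper, its name, and its parameters are assumptions for illustration only.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetrySketch {

    // Retries a call with exponential backoff, rethrowing the last failure
    // once maxAttempts is exhausted.
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long initialBackoffMs)
            throws Exception {
        long backoff = initialBackoffMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) { // e.g. a transient network failure
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff);
                    backoff *= 2; // exponential backoff between attempts
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] failures = {2}; // simulate two network failures, then success
        String result = callWithRetry(() -> {
            if (failures[0]-- > 0) {
                throw new IOException("connection refused");
            }
            return "job-accepted";
        }, 5, 1);
        System.out.println(result); // job-accepted
    }
}
```

Re-registering listeners after a successful reconnect, as Aljoscha asks, would sit one layer above this: the client would replay its registered callbacks once the retried call succeeds.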
> > > > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>> Aljoscha Krettek <aljoscha@apache.org> wrote on Tue, Jun 11, 2019 at 11:10 PM:
> > > > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>> Some points to consider:
> > > > > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>> * Any API we expose should not have
> > > > > > > > > > >> >> >>>>>>>>>>>>>> dependencies on the runtime (flink-runtime)
> > > > > > > > > > >> >> >>>>>>>>>>>>>> package or other implementation details. To me,
> > > > > > > > > > >> >> >>>>>>>>>>>>>> this means that the current ClusterClient
> > > > > > > > > > >> >> >>>>>>>>>>>>>> cannot be exposed to users because it uses
> > > > > > > > > > >> >> >>>>>>>>>>>>>> quite some classes from the optimiser and
> > > > > > > > > > >> >> >>>>>>>>>>>>>> runtime packages.
> > > > > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>> * What happens when a failure/restart in the
> > > > > > > > > > >> >> >>>>>>>>>>>>>> client happens? There needs to be a way of
> > > > > > > > > > >> >> >>>>>>>>>>>>>> re-establishing the connection to the job,
> > > > > > > > > > >> >> >>>>>>>>>>>>>> setting up the listeners again, etc.
> > > > > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>> Aljoscha
> > > > > > > > > > >> >> >>>>>>>>>>>>>>
> On 29. May 2019, at 10:17, Jeff Zhang <zjffdu@gmail.com> wrote:
>
> Sorry folks, the design doc is late as you expected. Here's the design
> doc I drafted, welcome any comments and feedback.
>
> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
Stephan Ewen <se...@apache.org> wrote on Thu, Feb 14, 2019 at 8:43 PM:

> Nice that this discussion is happening.
>
> In the FLIP, we could also revisit the entire role of the environments
> again.
>
> Initially, the idea was:
>   - the environments take care of the specific setup for standalone (no
> setup needed), yarn, mesos, etc.
>   - the session ones have control over the session. The environment holds
> the session client.
>   - running a job gives a "control" object for that job. That behavior is
> the same in all environments.
>
> The actual implementation diverged quite a bit from that. Happy to see a
> discussion about straightening this out a bit more.
On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zjffdu@gmail.com> wrote:

> Hi folks,
>
> Sorry for the late response. It seems we have reached consensus on this;
> I will create a FLIP for this with a more detailed design.
Thomas Weise <th...@apache.org> wrote on Fri, Dec 21, 2018 at 11:43 AM:

> Great to see this discussion seeded! The problems you face with the
> Zeppelin integration are also affecting other downstream projects, like
> Beam.
>
> We just enabled the savepoint restore option in RemoteStreamEnvironment [1]
> and that was more difficult than it should be. The main issue is that
> environment and cluster client aren't decoupled. Ideally it should be
> possible to just get the matching cluster client from the environment and
> then control the job through it (environment as factory for cluster
> client). But note that the environment classes are part of the public API,
> and it is not straightforward to make larger changes without breaking
> backward compatibility.
>
> ClusterClient currently exposes internal classes like JobGraph and
> StreamGraph. But it should be possible to wrap this with a new public API
> that brings the required job control capabilities for downstream projects.
> Perhaps it is helpful to look at some of the interfaces in Beam while
> thinking about this: [2] for the portable job API and [3] for the old
> asynchronous job control from the Beam Java SDK.
>
> The backward compatibility discussion [4] is also relevant here. A new API
> should shield downstream projects from internals and allow them to
> interoperate with multiple future Flink versions in the same release line
> without forced upgrades.
>
> Thanks,
> Thomas
>
> [1] https://github.com/apache/flink/pull/7249
> [2] https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> [3] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> [4] https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
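The wrapping Thomas describes — a narrow public API that hides internal classes like JobGraph and StreamGraph — can be sketched as a thin facade over the internal representation. All names below (`JobClient`, `InternalJobGraph`, `JobClientImpl`) are illustrative, not existing Flink classes:

```java
// Narrow public facade: downstream projects see only job-control operations.
interface JobClient {
    String getJobId();
    String getStatus();          // e.g. "RUNNING", "CANCELED"
    void cancel();
}

// Internal-only type standing in for JobGraph/StreamGraph details.
class InternalJobGraph {
    final String jobId;
    String state = "RUNNING";
    InternalJobGraph(String jobId) { this.jobId = jobId; }
}

// Adapter: wraps the internal representation, exposing only the facade,
// so the internals can change without breaking downstream projects.
class JobClientImpl implements JobClient {
    private final InternalJobGraph graph;   // never leaks to callers
    JobClientImpl(InternalJobGraph graph) { this.graph = graph; }
    public String getJobId() { return graph.jobId; }
    public String getStatus() { return graph.state; }
    public void cancel() { graph.state = "CANCELED"; }
}

public class FacadeExample {
    public static void main(String[] args) {
        JobClient client = new JobClientImpl(new InternalJobGraph("job-1"));
        System.out.println(client.getJobId() + ": " + client.getStatus());
        client.cancel();
        System.out.println(client.getJobId() + ": " + client.getStatus());
    }
}
```

Because callers depend only on the interface, the internals can evolve between Flink versions without forcing downstream upgrades — the compatibility property the discussion in [4] asks for.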
On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zjffdu@gmail.com> wrote:

> > I'm not so sure whether the user should be able to define where the job
> > runs (in your example Yarn). This is actually independent of the job
> > development and is something which is decided at deployment time.
>
> Users don't need to specify the execution mode programmatically. They can
> also pass the execution mode as an argument to the flink run command, e.g.
>
> bin/flink run -m yarn-cluster ....
> bin/flink run -m local ...
> bin/flink run -m host:port ...
>
> Does this make sense to you?
>
> > To me it makes sense that the ExecutionEnvironment is not directly
> > initialized by the user and instead context sensitive to how you want to
> > execute your job (Flink CLI vs. IDE, for example).
>
> Right, currently I notice that Flink creates a different
> ContextExecutionEnvironment for different submission scenarios (Flink CLI
> vs. IDE). To me this is kind of a hack, not so straightforward. What I
> suggested above is that Flink should always create the same
> ExecutionEnvironment, but with different configuration, and based on that
> configuration it would create the proper ClusterClient for the different
> behaviors.
Till Rohrmann <trohrmann@apache.org> wrote on Thu, Dec 20, 2018 at 11:18 PM:

> You are probably right that we have code duplication when it comes to the
> creation of the ClusterClient. This should be reduced in the future.
>
> I'm not so sure whether the user should be able to define where the job
> runs (in your example Yarn). This is actually independent of the job
> development and is something which is decided at deployment time. To me it
> makes sense that the ExecutionEnvironment is not directly initialized by
> the user and is instead context sensitive to how you want to execute your
> job (Flink CLI vs. IDE, for example). However, I agree that the
> ExecutionEnvironment should give you access to the ClusterClient and to
> the job (maybe in the form of the JobGraph or a job plan).
>
> Cheers,
> Till
On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zjffdu@gmail.com> wrote:

> Hi Till,
>
> Thanks for the feedback. You are right that I expect a better programmatic
> job submission/control API which could be used by downstream projects, and
> it would benefit the Flink ecosystem. When I look at the code of the Flink
> scala-shell and sql-client (I believe they are not the core of Flink, but
> belong to its ecosystem), I find a lot of duplicated code for creating a
> ClusterClient from user-provided configuration (the configuration format
> may differ between scala-shell and sql-client) and then using that
> ClusterClient to manipulate jobs. I don't think this is convenient for
> downstream projects. What I expect is that a downstream project only needs
> to provide the necessary configuration info (maybe introducing a class
> FlinkConf), then build an ExecutionEnvironment based on this FlinkConf,
> and the ExecutionEnvironment will create the proper ClusterClient. This
> not only benefits downstream project development but is also helpful for
> their integration tests with Flink. Here's one sample code snippet of what
> I expect:
>
> val conf = new FlinkConf().mode("yarn")
> val env = new ExecutionEnvironment(conf)
> val jobId = env.submit(...)
> val jobStatus = env.getClusterClient().queryJobStatus(jobId)
> env.getClusterClient().cancelJob(jobId)
>
> What do you think?
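The proposed flow in the snippet above can be modeled end-to-end with an in-memory stub (Java here, as Flink's core is Java). `FlinkConf`, `queryJobStatus`, and `cancelJob` are the names proposed in this thread, not existing Flink API; the stub only illustrates the intended layering, with the environment — not the user — choosing the client:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Proposed user-facing configuration object (name from this thread).
class FlinkConf {
    private String mode = "local";
    FlinkConf mode(String mode) { this.mode = mode; return this; }
    String getMode() { return mode; }
}

// In-memory stand-in for a real ClusterClient.
class StubClusterClient {
    private final Map<String, String> jobStatus = new HashMap<>();

    String submit(String jobName) {
        String jobId = UUID.randomUUID().toString();
        jobStatus.put(jobId, "RUNNING");   // a real client would talk to the cluster
        return jobId;
    }
    String queryJobStatus(String jobId) { return jobStatus.get(jobId); }
    void cancelJob(String jobId) { jobStatus.put(jobId, "CANCELED"); }
}

// Environment built from FlinkConf; it would pick the proper client for
// conf.getMode() ("yarn", "local", "host:port", ...), here always the stub.
class ExecutionEnvironmentSketch {
    private final String mode;
    private final StubClusterClient client = new StubClusterClient();
    ExecutionEnvironmentSketch(FlinkConf conf) { this.mode = conf.getMode(); }
    String submit(String jobName) { return client.submit(jobName); }
    StubClusterClient getClusterClient() { return client; }
}

public class FlinkConfExample {
    public static void main(String[] args) {
        FlinkConf conf = new FlinkConf().mode("yarn");
        ExecutionEnvironmentSketch env = new ExecutionEnvironmentSketch(conf);
        String jobId = env.submit("wordcount");
        System.out.println(env.getClusterClient().queryJobStatus(jobId)); // RUNNING
        env.getClusterClient().cancelJob(jobId);
        System.out.println(env.getClusterClient().queryJobStatus(jobId)); // CANCELED
    }
}
```

Swapping the stub for a Yarn- or REST-backed client would not change the calling code, which is the point of the proposal.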
Till Rohrmann <trohrmann@apache.org> wrote on Tue, Dec 11, 2018 at 6:28 PM:

> Hi Jeff,
>
> what you are proposing is to provide the user with better programmatic job
> control. There was actually an effort to achieve this but it has never
> been completed [1]. However, there are some improvements in the code base
> now. Look for example at the NewClusterClient interface which offers
> non-blocking job submission. But I agree that we need to improve Flink in
> this regard.
>
> I would not be in favour of exposing all ClusterClient calls via the
> ExecutionEnvironment because it would clutter the class and would not be a
> good separation of concerns. Instead, one idea could be to retrieve the
> current ClusterClient from the ExecutionEnvironment, which can then be
> used for cluster and job control. But before we start an effort here, we
> need to agree on and capture what functionality we want to provide.
>
> Initially, the idea was that we have the ClusterDescriptor describing how
> to talk to a cluster manager like Yarn or Mesos. The ClusterDescriptor can
> be used for deploying Flink clusters (job and session) and gives you a
> ClusterClient. The ClusterClient controls the cluster (e.g. submitting
> jobs, listing all running jobs). And then there was the idea
> > > > > > > > > > >> >> >>>>>>>>>>>> to
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> introduce a
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> JobClient which you obtain
> from
> > > > the
> > > > > > > > > > >> >> >>>> ClusterClient
> > > > > > > > > > >> >> >>>>> to
> > > > > > > > > > >> >> >>>>>>>>>>> trigger
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> job
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> specific
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> operations (e.g. taking a
> > > > savepoint,
> > > > > > > > > > >> >> >>> cancelling
> > > > > > > > > > >> >> >>>>> the
> > > > > > > > > > >> >> >>>>>>>> job).
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> [1]
> > > > > > > > > > >> >> >>>>>>
> > > https://issues.apache.org/jira/browse/FLINK-4272
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> Cheers,
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> Till
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> On Tue, Dec 11, 2018 at
> 10:13 AM
> > > > > Jeff
> > > > > > > > Zhang
> > > > > > > > > > >> >> >> <
> > > > > > > > > > >> >> >>>>>>>>>>> zjffdu@gmail.com
> > > > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> wrote:
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Hi Folks,
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> I am trying to integrate
> flink
> > > > into
> > > > > > > > apache
> > > > > > > > > > >> >> >>>>> zeppelin
> > > > > > > > > > >> >> >>>>>>>>>>> which is
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> an
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> interactive
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> notebook. And I hit several
> > > > issues
> > > > > that
> > > > > > > > is
> > > > > > > > > > >> >> >>>> caused
> > > > > > > > > > >> >> >>>>>> by
> > > > > > > > > > >> >> >>>>>>>>>>> flink
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> client
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> api.
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> So
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> I'd like to proposal the
> > > > following
> > > > > > > > changes
> > > > > > > > > > >> >> >>> for
> > > > > > > > > > >> >> >>>>>> flink
> > > > > > > > > > >> >> >>>>>>>>>>> client
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> api.
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 1. Support nonblocking
> > > execution.
> > > > > > > > > > >> >> >> Currently,
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment#execute
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> is a blocking method which
> > > would
> > > > > do 2
> > > > > > > > > > >> >> >> things,
> > > > > > > > > > >> >> >>>>> first
> > > > > > > > > > >> >> >>>>>>>>>>> submit
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> job
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> and
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> then
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> wait for job until it is
> > > > finished.
> > > > > I'd
> > > > > > > > like
> > > > > > > > > > >> >> >>>>>>> introduce a
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> nonblocking
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> execution method like
> > > > > > > > > > >> >> >>>> ExecutionEnvironment#submit
> > > > > > > > > > >> >> >>>>>>> which
> > > > > > > > > > >> >> >>>>>>>>>>> only
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> submit
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> job
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> and
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> then return jobId to
> client.
> > > And
> > > > > allow
> > > > > > > > user
> > > > > > > > > > >> >> >>> to
> > > > > > > > > > >> >> >>>>>> query
> > > > > > > > > > >> >> >>>>>>>> the
> > > > > > > > > > >> >> >>>>>>>>>>> job
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> status
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> via
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> the
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> jobId.
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 2. Add cancel api in
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >> ExecutionEnvironment/StreamExecutionEnvironment,
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> currently the only way to
> > > cancel
> > > > > job is
> > > > > > > > via
> > > > > > > > > > >> >> >>> cli
> > > > > > > > > > >> >> >>>>>>>>>>> (bin/flink),
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> this
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> is
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> not
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> convenient for downstream
> > > project
> > > > > to
> > > > > > > use
> > > > > > > > > > >> >> >> this
> > > > > > > > > > >> >> >>>>>>> feature.
> > > > > > > > > > >> >> >>>>>>>>>>> So I'd
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> like
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> to
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> add
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> cancel api in
> > > > ExecutionEnvironment
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 3. Add savepoint api in
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>
> ExecutionEnvironment/StreamExecutionEnvironment.
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> It
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> is similar as cancel api,
> we
> > > > > should use
> > > > > > > > > > >> >> >>>>>>>>>>> ExecutionEnvironment
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> as
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> the
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> unified
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> api for third party to
> > > integrate
> > > > > with
> > > > > > > > > > >> >> >> flink.
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 4. Add listener for job
> > > execution
> > > > > > > > > > >> >> >> lifecycle.
> > > > > > > > > > >> >> >>>>>>> Something
> > > > > > > > > > >> >> >>>>>>>>>>> like
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> following,
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> so
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> that downstream project
> can do
> > > > > custom
> > > > > > > > logic
> > > > > > > > > > >> >> >>> in
> > > > > > > > > > >> >> >>>>> the
> > > > > > > > > > >> >> >>>>>>>>>>> lifecycle
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> of
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> job.
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> e.g.
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Zeppelin would capture the
> > > jobId
> > > > > after
> > > > > > > > job
> > > > > > > > > > >> >> >> is
> > > > > > > > > > >> >> >>>>>>> submitted
> > > > > > > > > > >> >> >>>>>>>>>>> and
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> then
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> use
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> this
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> jobId to cancel it later
> when
> > > > > > > necessary.
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> public interface
> JobListener {
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  void onJobSubmitted(JobID
> > > > jobId);
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  void
> > > > > onJobExecuted(JobExecutionResult
> > > > > > > > > > >> >> >>>>> jobResult);
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  void onJobCanceled(JobID
> > > jobId);
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> }
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 5. Enable session in
> > > > > > > > ExecutionEnvironment.
> > > > > > > > > > >> >> >>>>>> Currently
> > > > > > > > > > >> >> >>>>>>> it
> > > > > > > > > > >> >> >>>>>>>>>>> is
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> disabled,
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> but
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> session is very convenient
> for
> > > > > third
> > > > > > > > party
> > > > > > > > > > >> >> >> to
> > > > > > > > > > >> >> >>>>>>>> submitting
> > > > > > > > > > >> >> >>>>>>>>>>> jobs
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> continually.
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> I hope flink can enable it
> > > again.
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 6. Unify all flink client
> api
> > > > into
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>
> ExecutionEnvironment/StreamExecutionEnvironment.
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> This is a long term issue
> which
> > > > > needs
> > > > > > > > more
> > > > > > > > > > >> >> >>>>> careful
> > > > > > > > > > >> >> >>>>>>>>>>> thinking
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> and
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> design.
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Currently some of features
> of
> > > > > flink is
> > > > > > > > > > >> >> >>> exposed
> > > > > > > > > > >> >> >>>> in
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>
> ExecutionEnvironment/StreamExecutionEnvironment,
> > > > > > > > > > >> >> >>>>>> but
> > > > > > > > > > >> >> >>>>>>>>>>> some are
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> exposed
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> in
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> cli instead of api, like
> the
> > > > > cancel and
> > > > > > > > > > >> >> >>>>> savepoint I
> > > > > > > > > > >> >> >>>>>>>>>>> mentioned
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> above.
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> I
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> think the root cause is
> due to
> > > > that
> > > > > > > flink
> > > > > > > > > > >> >> >>>> didn't
> > > > > > > > > > >> >> >>>>>>> unify
> > > > > > > > > > >> >> >>>>>>>>>>> the
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> interaction
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> with
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> flink. Here I list 3
> scenarios
> > > of
> > > > > flink
> > > > > > > > > > >> >> >>>> operation
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Local job execution.
> Flink
> > > > will
> > > > > > > > create
> > > > > > > > > > >> >> >>>>>>>>>>> LocalEnvironment
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> and
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> then
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> use
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  this LocalEnvironment to
> > > create
> > > > > > > > > > >> >> >>> LocalExecutor
> > > > > > > > > > >> >> >>>>> for
> > > > > > > > > > >> >> >>>>>>> job
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> execution.
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Remote job execution.
> Flink
> > > > will
> > > > > > > > create
> > > > > > > > > > >> >> >>>>>>>> ClusterClient
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> first
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> and
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> then
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  create ContextEnvironment
> > > based
> > > > > on the
> > > > > > > > > > >> >> >>>>>>> ClusterClient
> > > > > > > > > > >> >> >>>>>>>>>>> and
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> then
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> run
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> the
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> job.
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Job cancelation. Flink
> will
> > > > > create
> > > > > > > > > > >> >> >>>>>> ClusterClient
> > > > > > > > > > >> >> >>>>>>>>>>> first
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> and
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> then
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> cancel
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  this job via this
> > > ClusterClient.
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> As you can see in the
> above 3
> > > > > > > scenarios.
> > > > > > > > > > >> >> >>> Flink
> > > > > > > > > > >> >> >>>>>> didn't
> > > > > > > > > > >> >> >>>>>>>>>>> use the
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> same
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> approach(code path) to
> interact
> > > > > with
> > > > > > > > flink
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> What I propose is
> following:
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Create the proper
> > > > > > > > > > >> >> >>>>>> LocalEnvironment/RemoteEnvironment
> > > > > > > > > > >> >> >>>>>>>>>>> (based
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> on
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> user
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> configuration) --> Use this
> > > > > Environment
> > > > > > > > to
> > > > > > > > > > >> >> >>>> create
> > > > > > > > > > >> >> >>>>>>>> proper
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> ClusterClient
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> (LocalClusterClient or
> > > > > > > RestClusterClient)
> > > > > > > > > > >> >> >> to
> > > > > > > > > > >> >> >>>>>>>> interactive
> > > > > > > > > > >> >> >>>>>>>>>>> with
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Flink (
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> job
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> execution or cancelation)
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> This way we can unify the
> > > process
> > > > > of
> > > > > > > > local
> > > > > > > > > > >> >> >>>>>> execution
> > > > > > > > > > >> >> >>>>>>>> and
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> remote
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> execution.
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> And it is much easier for
> third
> > > > > party
> > > > > > > to
> > > > > > > > > > >> >> >>>>> integrate
> > > > > > > > > > >> >> >>>>>>> with
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> flink,
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> because
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment is the
> > > > unified
> > > > > > > entry
> > > > > > > > > > >> >> >>> point
> > > > > > > > > > >> >> >>>>> for
> > > > > > > > > > >> >> >>>>>>>>>>> flink.
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> What
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> third
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> party
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> needs to do is just pass
> > > > > configuration
> > > > > > > to
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> ExecutionEnvironment
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> and
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment will
> do
> > > the
> > > > > right
> > > > > > > > > > >> >> >> thing
> > > > > > > > > > >> >> >>>>> based
> > > > > > > > > > >> >> >>>>>> on
> > > > > > > > > > >> >> >>>>>>>> the
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> configuration.
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Flink cli can also be
> > > considered
> > > > as
> > > > > > > flink
> > > > > > > > > > >> >> >> api
> > > > > > > > > > >> >> >>>>>>> consumer.
> > > > > > > > > > >> >> >>>>>>>>>>> it
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> just
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> pass
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> the
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> configuration to
> > > > > ExecutionEnvironment
> > > > > > > and
> > > > > > > > > > >> >> >> let
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> ExecutionEnvironment
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> to
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> create the proper
> ClusterClient
> > > > > instead
> > > > > > > > of
> > > > > > > > > > >> >> >>>>> letting
> > > > > > > > > > >> >> >>>>>>> cli
> > > > > > > > > > >> >> >>>>>>>> to
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> create
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ClusterClient directly.
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 6 would involve large code
> > > > > refactoring,
> > > > > > > > so
> > > > > > > > > > >> >> >> I
> > > > > > > > > > >> >> >>>>> think
> > > > > > > > > > >> >> >>>>>> we
> > > > > > > > > > >> >> >>>>>>>> can
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> defer
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> it
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> for
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> future release, 1,2,3,4,5
> could
> > > > be
> > > > > done
> > > > > > > > at
> > > > > > > > > > >> >> >>>> once I
> > > > > > > > > > >> >> >>>>>>>>>>> believe.
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>> Let
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> me
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> know
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> your
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> comments and feedback,
> thanks
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> --
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Best Regards
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> --
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> Best Regards
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> Jeff Zhang
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> --
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Best Regards
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Jeff Zhang
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> --
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> Best Regards
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>> Jeff Zhang
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>> --
> > > > > > > > > > >> >> >>>>>>>>>>>>>>> Best Regards
> > > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>> Jeff Zhang
> > > > > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>> --
> > > > > > > > > > >> >> >>>>>>>>>>>>> Best Regards
> > > > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>> Jeff Zhang
> > > > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > > >> >> >>>>>>>> --
> > > > > > > > > > >> >> >>>>>>>> Best Regards
> > > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > > >> >> >>>>>>>> Jeff Zhang
> > > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > > >> >> >>>>>>>
> > > > > > > > > > >> >> >>>>>>
> > > > > > > > > > >> >> >>>>>
> > > > > > > > > > >> >> >>>>>
> > > > > > > > > > >> >> >>>>> --
> > > > > > > > > > >> >> >>>>> Best Regards
> > > > > > > > > > >> >> >>>>>
> > > > > > > > > > >> >> >>>>> Jeff Zhang
> > > > > > > > > > >> >> >>>>>
> > > > > > > > > > >> >> >>>>
> > > > > > > > > > >> >> >>>
> > > > > > > > > > >> >> >>
> > > > > > > > > > >> >> >
> > > > > > > > > > >> >> >
> > > > > > > > > > >> >> > --
> > > > > > > > > > >> >> > Best Regards
> > > > > > > > > > >> >> >
> > > > > > > > > > >> >> > Jeff Zhang
> > > > > > > > > > >> >>
> > > > > > > > > > >> >>
> > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > > >
> > > >
> > >
>


-- 
Best Regards

Jeff Zhang
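To make the listener lifecycle in point 4 of the quoted proposal concrete, here is a minimal self-contained sketch. Note that JobID, JobExecutionResult, and the toy environment below are simplified stand-ins for illustration, not the actual Flink classes:

```java
import java.util.ArrayList;
import java.util.List;

public class JobListenerSketch {

    record JobID(String value) {}
    record JobExecutionResult(JobID jobId) {}

    // The lifecycle callbacks proposed in point 4 above.
    interface JobListener {
        void onJobSubmitted(JobID jobId);
        void onJobExecuted(JobExecutionResult result);
        void onJobCanceled(JobID jobId);
    }

    // A toy environment that notifies registered listeners, illustrating
    // how a downstream project like Zeppelin could capture the jobId on
    // submission and use it to cancel the job later.
    static class ListeningEnvironment {
        private final List<JobListener> listeners = new ArrayList<>();

        void registerJobListener(JobListener listener) { listeners.add(listener); }

        JobID submit(String name) {
            JobID id = new JobID(name);
            listeners.forEach(l -> l.onJobSubmitted(id));
            return id;
        }

        void cancel(JobID id) { listeners.forEach(l -> l.onJobCanceled(id)); }
    }

    static List<String> run() {
        List<String> events = new ArrayList<>();
        ListeningEnvironment env = new ListeningEnvironment();
        env.registerJobListener(new JobListener() {
            public void onJobSubmitted(JobID id) { events.add("submitted:" + id.value()); }
            public void onJobExecuted(JobExecutionResult r) { events.add("executed"); }
            public void onJobCanceled(JobID id) { events.add("canceled:" + id.value()); }
        });
        JobID id = env.submit("job-1");   // downstream code captures the id here ...
        env.cancel(id);                   // ... and cancels with it later
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The point of the sketch is that the environment, not the CLI, owns the notification points, which is what makes the jobId observable to embedding applications.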

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Kostas Kloudas <kk...@gmail.com>.
Hi all,

I am just writing here to let you know that I am working on a POC that
tries to refactor the current state of job submission in Flink.
I want to stress that it introduces NO CHANGES to the current
behaviour of Flink. It just re-arranges things and introduces the
notion of an Executor, which is the entity responsible for taking the
user-code and submitting it for execution.

Given this, the discussion about the functionality that the JobClient
will expose to the user can go on independently and the same
holds for all the open questions so far.

I hope I will have some more news to share soon.

Thanks,
Kostas
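To give readers a feel for the Executor notion described above, here is a minimal self-contained sketch. All names below (Pipeline, JobClient, LocalExecutor) are illustrative stand-ins under the assumption that the Executor takes compiled user code and submits it non-blockingly; they are not the actual classes from the POC:

```java
import java.util.concurrent.CompletableFuture;

public class ExecutorSketch {

    // Stand-in for the compiled user program (e.g. a job graph).
    record Pipeline(String name) {}

    // Stand-in for a handle to a running job, as discussed for JobClient.
    record JobClient(String jobId) {}

    // The Executor takes the user code and submits it for execution,
    // returning a future so that submission itself is non-blocking.
    interface Executor {
        CompletableFuture<JobClient> execute(Pipeline pipeline);
    }

    // A trivial "local" executor: completes immediately with a job handle.
    static class LocalExecutor implements Executor {
        @Override
        public CompletableFuture<JobClient> execute(Pipeline pipeline) {
            return CompletableFuture.completedFuture(
                    new JobClient("local-" + pipeline.name()));
        }
    }

    // The environment would pick an Executor from configuration; user code
    // only ever calls execute(), regardless of session or per-job deployment.
    static JobClient run(Executor executor, Pipeline pipeline) throws Exception {
        return executor.execute(pipeline).get();
    }

    public static void main(String[] args) throws Exception {
        JobClient client = run(new LocalExecutor(), new Pipeline("word-count"));
        System.out.println(client.jobId());
    }
}
```

Swapping LocalExecutor for a remote one would leave the calling code unchanged, which is exactly the decoupling the refactoring aims at.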

On Mon, Aug 26, 2019 at 4:20 AM Yang Wang <da...@gmail.com> wrote:
>
> Hi Zili,
>
> It makes sense to me that a dedicated cluster is started for a per-job
> cluster and will not accept more jobs.
> Just have a question about the command line.
>
> Currently we could use the following commands to start different clusters.
> *per-job cluster*
> ./bin/flink run -d -p 5 -ynm perjob-cluster1 -m yarn-cluster
> examples/streaming/WindowJoin.jar
> *session cluster*
> ./bin/flink run -p 5 -ynm session-cluster1 -m yarn-cluster
> examples/streaming/WindowJoin.jar
>
> What will it look like after client enhancement?
>
>
> Best,
> Yang
>
> On Fri, Aug 23, 2019 at 10:46 PM Zili Chen <wa...@gmail.com> wrote:
>
> > Hi Till,
> >
> > Thanks for your update. Nice to hear :-)
> >
> > Best,
> > tison.
> >
> >
> > On Fri, Aug 23, 2019 at 10:39 PM Till Rohrmann <tr...@apache.org> wrote:
> >
> > > Hi Tison,
> > >
> > > just a quick comment concerning the class loading issues when using the
> > per
> > > job mode. The community wants to change it so that the
> > > StandaloneJobClusterEntryPoint actually uses the user code class loader
> > > with child first class loading [1]. Hence, I hope that this problem will
> > be
> > > resolved soon.
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-13840
> > >
> > > Cheers,
> > > Till
> > >
> > > On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas <kk...@gmail.com>
> > > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > On the topic of web submission, I agree with Till that it only seems
> > > > to complicate things.
> > > > It is bad for security, job isolation (anybody can submit/cancel jobs),
> > > > and its
> > > > implementation complicates some parts of the code. So, if we were to
> > > > redesign the
> > > > WebUI, maybe this part could be left out. In addition, I would say
> > > > that the ability to cancel
> > > > jobs could also be left out.
> > > >
> > > > Also, I would be in favour of removing the "detached" mode, for
> > > > the reasons mentioned
> > > > above (i.e. because now we will have a future representing the result
> > > > on which the user
> > > > can choose to wait or not).
> > > >
> > > > Now, for separating job submission and cluster creation, I am in
> > > > favour of keeping both.
> > > > Once again, the reasons are mentioned above by Stephan, Till, Aljoscha
> > > > and also Zili seems
> > > > to agree. They mainly have to do with security, isolation and ease of
> > > > resource management
> > > > for the user as he knows that "when my job is done, everything will be
> > > > cleared up". This is
> > > > also the experience you get when launching a process on your local OS.
> > > >
> > > > On excluding the per-job mode from returning a JobClient or not, I
> > > > believe that eventually
> > > > it would be nice to allow users to get back a jobClient. The reason is
> > > > that 1) I cannot
> > > > find any objective reason why the user-experience should diverge, and
> > > > 2) this will be the
> > > > way that the user will be able to interact with his running job.
> > > > Assuming that the necessary
> > > > ports are open for the REST API to work, then I think that the
> > > > JobClient can run against the
> > > > REST API without problems. If the needed ports are not open, then we
> > > > are safe to not return
> > > > a JobClient, as the user explicitly chose to close all points of
> > > > communication to his running job.
> > > >
> > > > On the topic of not hijacking the "env.execute()" in order to get the
> > > > Plan, I definitely agree but
> > > > for the proposal of having a "compile()" method in the env, I would
> > > > like to have a better look at
> > > > the existing code.
> > > >
> > > > Cheers,
> > > > Kostas
> > > >
> > > > On Fri, Aug 23, 2019 at 5:52 AM Zili Chen <wa...@gmail.com>
> > > > wrote:
> > > > >
> > > > > Hi Yang,
> > > > >
> > > > > It would be helpful if you check Stephan's last comment,
> > > > > which states that isolation is important.
> > > > >
> > > > > For per-job mode, we run a dedicated cluster (maybe it
> > > > > should have been a couple of JMs and TMs during the FLIP-6
> > > > > design) for a specific job. Thus the process is isolated
> > > > > from other jobs.
> > > > >
> > > > > In our case there was a time we suffered from multiple
> > > > > jobs submitted by different users that affected
> > > > > each other so that all ran into an error state. Also,
> > > > > running the client inside the cluster can save client
> > > > > resources at some points.
> > > > >
> > > > > However, we also face several issues, as you mentioned:
> > > > > per-job mode always uses the parent classloader,
> > > > > so classloading issues occur.
> > > > >
> > > > > BTW, one can make an analogy between session/per-job mode
> > > > > in Flink and client/cluster mode in Spark.
> > > > >
> > > > > Best,
> > > > > tison.
> > > > >
> > > > >
> > > > > On Thu, Aug 22, 2019 at 11:25 AM Yang Wang <da...@gmail.com> wrote:
> > > > >
> > > > > > From the user's perspective, the scope of a
> > > > > > per-job cluster is really confusing.
> > > > > >
> > > > > >
> > > > > > If it means a Flink cluster with a single job, then we get better
> > > > > > isolation.
> > > > > >
> > > > > > Now it does not matter how we deploy the cluster: directly
> > > > > > deploy (mode 1),
> > > > > >
> > > > > > or start a Flink cluster and then submit the job through a cluster
> > > > > > client (mode 2).
> > > > > >
> > > > > >
> > > > > > Otherwise, if it just means directly deploy, how should we name mode 2?
> > > > > >
> > > > > > "Session with job" or something else?
> > > > > >
> > > > > > We could also benefit from mode 2. Users could get the same
> > > > > > isolation
> > > > > > as with mode 1.
> > > > > >
> > > > > > The user code and dependencies will be loaded by the user classloader
> > > > > >
> > > > > > to avoid class conflicts with the framework.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Anyway, both of the two submission modes are useful.
> > > > > >
> > > > > > We just need to clarify the concepts.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Yang
> > > > > >
> > > > > > On Tue, Aug 20, 2019 at 5:58 PM Zili Chen <wa...@gmail.com> wrote:
> > > > > >
> > > > > > > Thanks for the clarification.
> > > > > > >
> > > > > > > The idea of a JobDeployer came to my mind when I was puzzling
> > > > > > > over
> > > > > > > how to execute per-job mode and session mode with the same user
> > > > > > > code
> > > > > > > and framework code path.
> > > > > > >
> > > > > > > With the JobDeployer concept we go back to the statement that the
> > > > > > > environment
> > > > > > > knows every config for cluster deployment and job submission. We
> > > > > > > configure, or generate from the configuration, a specific JobDeployer
> > > > > > > in the
> > > > > > > environment, and then the code aligns on
> > > > > > >
> > > > > > > *JobClient client = env.execute().get();*
> > > > > > >
> > > > > > > which in session mode is returned by clusterClient.submitJob and in
> > > > > > > per-job
> > > > > > > mode by clusterDescriptor.deployJobCluster.
> > > > > > >
> > > > > > > Here comes a problem: currently we directly run the
> > > > > > > ClusterEntrypoint
> > > > > > > with the extracted job graph. Following the JobDeployer way, we'd
> > > > > > > better align the entry point of per-job deployment at the
> > > > > > > JobDeployer. Users run their main method, or a CLI (which finally
> > > > > > > calls the main method), to deploy the
> > > > > > > job cluster.
> > > > > > >
> > > > > > > Best,
> > > > > > > tison.
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Aug 20, 2019 at 4:40 PM Stephan Ewen <se...@apache.org> wrote:
> > > > > > >
> > > > > > > > Till has made some good comments here.
> > > > > > > >
> > > > > > > > Two things to add:
> > > > > > > >
> > > > > > > >   - The job mode is very nice in the way that it runs the client
> > > > > > > > inside the cluster (in the same image/process that is the JM) and
> > > > > > > > thus unifies both applications and what the Spark world calls the
> > > > > > > > "driver mode".
> > > > > > > >
> > > > > > > >   - Another thing I would add is that during the FLIP-6 design, we
> > > > > > > > were thinking about setups where Dispatcher and JobManager are
> > > > > > > > separate processes.
> > > > > > > >     A Yarn or Mesos Dispatcher of a session could run independently
> > > > > > > > (even as privileged processes executing no code).
> > > > > > > >     Then the "per-job" mode could still be helpful: when a
> > > job
> > > > is
> > > > > > > > submitted to the dispatcher, it launches the JM again in a
> > > per-job
> > > > > > mode,
> > > > > > > so
> > > > > > > > that JM and TM processes are bound to the job only. For higher
> > > > security
> > > > > > > > setups, it is important that processes are not reused across
> > > jobs.
> > > > > > > >
> > > > > > > > On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <
> > > > trohrmann@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > I would not be in favour of getting rid of the per-job mode
> > > > since it
> > > > > > > > > simplifies the process of running Flink jobs considerably.
> > > > Moreover,
> > > > > > it
> > > > > > > > is
> > > > > > > > > not only well suited for container deployments but also for
> > > > > > deployments
> > > > > > > > > where you want to guarantee job isolation. For example, a
> > user
> > > > could
> > > > > > > use
> > > > > > > > > the per-job mode on Yarn to execute his job on a separate
> > > > cluster.
> > > > > > > > >
> > > > > > > > > I think that having two notions of cluster deployments
> > (session
> > > > vs.
> > > > > > > > per-job
> > > > > > > > > mode) does not necessarily contradict your ideas for the
> > client
> > > > api
> > > > > > > > > refactoring. For example one could have the following
> > > interfaces:
> > > > > > > > >
> > > > > > > > > - ClusterDeploymentDescriptor: encapsulates the logic how to
> > > > deploy a
> > > > > > > > > cluster.
> > > > > > > > > - ClusterClient: allows to interact with a cluster
> > > > > > > > > - JobClient: allows to interact with a running job
> > > > > > > > >
> > > > > > > > > Now the ClusterDeploymentDescriptor could have two methods:
> > > > > > > > >
> > > > > > > > > - ClusterClient deploySessionCluster()
> > > > > > > > > - JobClusterClient/JobClient deployPerJobCluster(JobGraph)
> > > > > > > > >
> > > > > > > > > where JobClusterClient is either a supertype of ClusterClient
> > > > which
> > > > > > > does
> > > > > > > > > not give you the functionality to submit jobs or
> > > > deployPerJobCluster
> > > > > > > > > returns directly a JobClient.
> > > > > > > > >
> > > > > > > > > When setting up the ExecutionEnvironment, one would then not
> > > > provide
> > > > > > a
> > > > > > > > > ClusterClient to submit jobs but a JobDeployer which,
> > depending
> > > > on
> > > > > > the
> > > > > > > > > selected mode, either uses a ClusterClient (session mode) to
> > > > submit
> > > > > > > jobs
> > > > > > > > or
> > > > > > > > > a ClusterDeploymentDescriptor to deploy a per-job mode
> > cluster
> > > > with
> > > > > > the
> > > > > > > > job
> > > > > > > > > to execute.
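[Editor's note] The interfaces sketched in the message above could look roughly like the following. This is a toy, self-contained sketch; the names (`ClusterDeploymentDescriptor`, `ClusterClient`, `JobClient`) come from the discussion, but the signatures and the in-memory implementation are illustrative assumptions, not Flink's actual API.

```java
import java.util.UUID;

public class ClientApiSketch {

    /** Stand-in for a compiled Flink job. */
    public static class JobGraph {
        public final String jobName;
        public JobGraph(String jobName) { this.jobName = jobName; }
    }

    /** Allows interacting with a single running job. */
    public interface JobClient {
        String getJobId();
    }

    /** Allows interacting with an existing (session) cluster. */
    public interface ClusterClient {
        JobClient submitJob(JobGraph jobGraph);
    }

    /** Encapsulates the logic of how to deploy a cluster. */
    public interface ClusterDeploymentDescriptor {
        ClusterClient deploySessionCluster();
        JobClient deployPerJobCluster(JobGraph jobGraph);
    }

    /** Toy in-memory implementation showing how the pieces compose. */
    public static class InMemoryDeployment implements ClusterDeploymentDescriptor {
        @Override
        public ClusterClient deploySessionCluster() {
            // Assign a fresh id once per submitted job.
            return jobGraph -> {
                String id = UUID.randomUUID().toString();
                return () -> id;
            };
        }

        @Override
        public JobClient deployPerJobCluster(JobGraph jobGraph) {
            // In per-job mode, deploying the cluster *is* the submission,
            // so a JobClient is returned directly.
            return deploySessionCluster().submitJob(jobGraph);
        }
    }

    public static void main(String[] args) {
        ClusterDeploymentDescriptor descriptor = new InMemoryDeployment();
        JobClient client = descriptor.deployPerJobCluster(new JobGraph("wordcount"));
        System.out.println("per-job cluster started, job id: " + client.getJobId());
    }
}
```

Either way a user program is written once against `JobClient`, and only the configured deployment path decides whether a session cluster or a dedicated per-job cluster backs it.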
> > > > > > > > >
> > > > > > > > > These are just some thoughts how one could make it working
> > > > because I
> > > > > > > > > believe there is some value in using the per job mode from
> > the
> > > > > > > > > ExecutionEnvironment.
> > > > > > > > >
> > > > > > > > > Concerning the web submission, this is indeed a bit tricky.
> > > From
> > > > a
> > > > > > > > cluster
> > > > > > > > > management standpoint, I would be in favour of not executing
> > user
> > > > code
> > > > > > on
> > > > > > > > the
> > > > > > > > > REST endpoint. Especially when considering security, it would
> > > be
> > > > good
> > > > > > > to
> > > > > > > > > have a well defined cluster behaviour where it is explicitly
> > > > stated
> > > > > > > where
> > > > > > > > > user code and, thus, potentially risky code is executed.
> > > Ideally
> > > > we
> > > > > > > limit
> > > > > > > > > it to the TaskExecutor and JobMaster.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Till
> > > > > > > > >
> > > > > > > > > On Tue, Aug 20, 2019 at 9:40 AM Flavio Pompermaier <
> > > > > > > pompermaier@okkam.it
> > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > In my opinion the client should not use any environment to
> > > get
> > > > the
> > > > > > > Job
> > > > > > > > > > graph because the jar should reside ONLY on the cluster
> > (and
> > > > not in
> > > > > > > the
> > > > > > > > > > client classpath otherwise there are always inconsistencies
> > > > between
> > > > > > > > > client
> > > > > > > > > > and Flink Job manager's classpath).
> > > > > > > > > > In the YARN, Mesos and Kubernetes scenarios you have the
> > jar
> > > > but
> > > > > > you
> > > > > > > > > could
> > > > > > > > > > start a cluster that has the jar on the Job Manager as well
> > > > (but
> > > > > > this
> > > > > > > > is
> > > > > > > > > > the only case where I think you can assume that the client
> > > has
> > > > the
> > > > > > > jar
> > > > > > > > on
> > > > > > > > > > the classpath..in the REST job submission you don't have
> > any
> > > > > > > > classpath).
> > > > > > > > > >
> > > > > > > > > > Thus, always in my opinion, the JobGraph should be
> > generated
> > > > by the
> > > > > > > Job
> > > > > > > > > > Manager REST API.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <
> > > > wander4096@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> I would like to involve Till & Stephan here to clarify
> > some
> > > > > > concept
> > > > > > > of
> > > > > > > > > >> per-job mode.
> > > > > > > > > >>
> > > > > > > > > >> The term per-job is one of modes a cluster could run on.
> > It
> > > is
> > > > > > > mainly
> > > > > > > > > >> aimed
> > > > > > > > > >> at spawning
> > > > > > > > > >> a dedicated cluster for a specific job while the job could
> > > be
> > > > > > > packaged
> > > > > > > > > >> with
> > > > > > > > > >> Flink
> > > > > > > > > >> itself and thus the cluster is initialized with the job,
> > > > > > > > > >> getting rid of a separate submission step.
> > > > > > > > > >>
> > > > > > > > > >> This is useful for container deployments where one creates
> > > > > > > > > >> his image with the job
> > > > > > > > > >> and then simply deploys the container.
> > > > > > > > > >>
> > > > > > > > > >> However, it is out of client scope since a
> > > > client(ClusterClient
> > > > > > for
> > > > > > > > > >> example) is for
> > > > > > > > > >> communicating with an existing cluster and performing
> > > actions.
> > > > > > > > Currently,
> > > > > > > > > >> in
> > > > > > > > > >> per-job
> > > > > > > > > >> mode, we extract the job graph and bundle it into cluster
> > > > > > deployment
> > > > > > > > and
> > > > > > > > > >> thus no
> > > > > > > > > >> concept of client gets involved. It looks reasonable
> > to
> > > > > > exclude
> > > > > > > > the
> > > > > > > > > >> deployment
> > > > > > > > > >> of per-job cluster from client api and use dedicated
> > utility
> > > > > > > > > >> classes(deployers) for
> > > > > > > > > >> deployment.
> > > > > > > > > >>
> > > > > > > > > >> Zili Chen <wa...@gmail.com> 于2019年8月20日周二 下午12:37写道:
> > > > > > > > > >>
> > > > > > > > > >> > Hi Aljoscha,
> > > > > > > > > >> >
> > > > > > > > > >> > Thanks for your reply and participation. The Google Doc
> > you
> > > > > > linked
> > > > > > > to
> > > > > > > > > >> > requires
> > > > > > > > > >> > permission and I think you could use a share link
> > instead.
> > > > > > > > > >> >
> > > > > > > > > >> > I agree with that we almost reach a consensus that
> > > > JobClient is
> > > > > > > > > >> necessary
> > > > > > > > > >> > to
> > > > > > > > > >> > interact with a running Job.
> > > > > > > > > >> >
> > > > > > > > > >> > Let me check your open questions one by one.
> > > > > > > > > >> >
> > > > > > > > > >> > 1. Separate cluster creation and job submission for
> > > per-job
> > > > > > mode.
> > > > > > > > > >> >
> > > > > > > > > >> > As you mentioned here is where the opinions diverge. In
> > my
> > > > > > > document
> > > > > > > > > >> there
> > > > > > > > > >> > is
> > > > > > > > > >> > an alternative[2] that proposes excluding per-job
> > > deployment
> > > > > > from
> > > > > > > > > client
> > > > > > > > > >> > api
> > > > > > > > > >> > scope and now I find it is more reasonable we do the
> > > > exclusion.
> > > > > > > > > >> >
> > > > > > > > > >> > When in per-job mode, a dedicated JobCluster is launched
> > > to
> > > > > > > execute
> > > > > > > > > the
> > > > > > > > > >> > specific job. It is more like a Flink Application than a
> > > > > > > submission
> > > > > > > > > >> > of a Flink Job. The client only takes care of job submission
> > and
> > > > > > assumes
> > > > > > > > > there
> > > > > > > > > >> is
> > > > > > > > > >> > an existing cluster. In this way we are able to consider
> > > > per-job
> > > > > > > > > issues
> > > > > > > > > >> > individually and JobClusterEntrypoint would be the
> > utility
> > > > class
> > > > > > > for
> > > > > > > > > >> > per-job
> > > > > > > > > >> > deployment.
> > > > > > > > > >> >
> > > > > > > > > >> > Nevertheless, user program works in both session mode
> > and
> > > > > > per-job
> > > > > > > > mode
> > > > > > > > > >> > without
> > > > > > > > > >> > necessary to change code. JobClient in per-job mode is
> > > > returned
> > > > > > > from
> > > > > > > > > >> > env.execute as normal. However, it would no longer be a
> > > > wrapper
> > > > > > of
> > > > > > > > > >> > RestClusterClient but a wrapper of PerJobClusterClient
> > > which
> > > > > > > > > >> communicates
> > > > > > > > > >> > to Dispatcher locally.
> > > > > > > > > >> >
> > > > > > > > > >> > 2. How to deal with plan preview.
> > > > > > > > > >> >
> > > > > > > > > >> > With env.compile functions users can get JobGraph or
> > > > FlinkPlan
> > > > > > and
> > > > > > > > > thus
> > > > > > > > > >> > they can preview the plan with programming. Typically it
> > > > looks
> > > > > > > like
> > > > > > > > > >> >
> > > > > > > > > >> > if (preview configured) {
> > > > > > > > > >> >     FlinkPlan plan = env.compile();
> > > > > > > > > >> >     new JSONDumpGenerator(...).dump(plan);
> > > > > > > > > >> > } else {
> > > > > > > > > >> >     env.execute();
> > > > > > > > > >> > }
> > > > > > > > > >> >
> > > > > > > > > >> > And `flink info` would become obsolete.
> > > > > > > > > >> >
> > > > > > > > > >> > 3. How to deal with Jar Submission at the Web Frontend.
> > > > > > > > > >> >
> > > > > > > > > >> > There is one more thread talked on this topic[1]. Apart
> > > from
> > > > > > > > removing
> > > > > > > > > >> > the functions there are two alternatives.
> > > > > > > > > >> >
> > > > > > > > > >> > One is to introduce an interface has a method returns
> > > > > > > > > JobGraph/FlinkPlan
> > > > > > > > > >> > and Jar Submission only support main-class implements
> > this
> > > > > > > > interface.
> > > > > > > > > >> > And then extract the JobGraph/FlinkPlan just by calling
> > > the
> > > > > > > method.
> > > > > > > > > >> > In this way, it is even possible to consider a
> > separation
> > > > of job
> > > > > > > > > >> creation
> > > > > > > > > >> > and job submission.
> > > > > > > > > >> >
> > > > > > > > > >> > The other is, as you mentioned, let execute() do the
> > > actual
> > > > > > > > execution.
> > > > > > > > > >> > We won't execute the main method in the WebFrontend but
> > > > spawn a
> > > > > > > > > process
> > > > > > > > > >> > at the WebMonitor side to execute. As for the return, we could
> > > > generate
> > > > > > > the
> > > > > > > > > >> > JobID from WebMonitor and pass it to the execution
> > > > environment.
> > > > > > > > > >> >
> > > > > > > > > >> > 4. How to deal with detached mode.
> > > > > > > > > >> >
> > > > > > > > > >> > I think detached mode is a temporary solution for
> > > > non-blocking
> > > > > > > > > >> submission.
> > > > > > > > > >> > In my document both submission and execution return a
> > > > > > > > > CompletableFuture
> > > > > > > > > >> and
> > > > > > > > > >> > users control whether or not wait for the result. In
> > this
> > > > point
> > > > > > we
> > > > > > > > > don't
> > > > > > > > > >> > need a detached option but the functionality is covered.
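[Editor's note] The future-based submission described above can be sketched as follows. `executeAsync` and the tiny `JobClient` are illustrative assumptions standing in for the proposed API; the point is that "detached" becomes a caller decision (wait or don't wait on the future) rather than a separate environment or option.

```java
import java.util.concurrent.CompletableFuture;

public class AsyncSubmissionSketch {

    /** Minimal stand-in for a handle to a running job. */
    public static class JobClient {
        public final String jobId;
        public JobClient(String jobId) { this.jobId = jobId; }
    }

    /** Stand-in for env.executeAsync(): submits without blocking. */
    public static CompletableFuture<JobClient> executeAsync(String jobName) {
        return CompletableFuture.supplyAsync(() -> new JobClient(jobName + "-001"));
    }

    public static void main(String[] args) {
        // "Detached" style: keep the future and return immediately.
        CompletableFuture<JobClient> submission = executeAsync("wordcount");

        // "Attached" style: simply wait on the same future.
        JobClient client = submission.join();
        System.out.println("submitted: " + client.jobId);
    }
}
```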
> > > > > > > > > >> >
> > > > > > > > > >> > 5. How does per-job mode interact with interactive
> > > > programming.
> > > > > > > > > >> >
> > > > > > > > > >> > All of YARN, Mesos and Kubernetes scenarios follow the
> > > > pattern of
> > > > > > > > launching
> > > > > > > > > a
> > > > > > > > > >> > JobCluster now. And I don't think there would be
> > > > inconsistency
> > > > > > > > between
> > > > > > > > > >> > different resource management.
> > > > > > > > > >> >
> > > > > > > > > >> > Best,
> > > > > > > > > >> > tison.
> > > > > > > > > >> >
> > > > > > > > > >> > [1]
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> > https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
> > > > > > > > > >> > [2]
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> > https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
> > > > > > > > > >> >
> > > > > > > > > >> > Aljoscha Krettek <al...@apache.org> 于2019年8月16日周五
> > > > 下午9:20写道:
> > > > > > > > > >> >
> > > > > > > > > >> >> Hi,
> > > > > > > > > >> >>
> > > > > > > > > >> >> I read both Jeffs initial design document and the newer
> > > > > > document
> > > > > > > by
> > > > > > > > > >> >> Tison. I also finally found the time to collect our
> > > > thoughts on
> > > > > > > the
> > > > > > > > > >> issue,
> > > > > > > > > >> >> I had quite some discussions with Kostas and this is
> > the
> > > > > > result:
> > > > > > > > [1].
> > > > > > > > > >> >>
> > > > > > > > > >> >> I think overall we agree that this part of the code is
> > in
> > > > dire
> > > > > > > need
> > > > > > > > > of
> > > > > > > > > >> >> some refactoring/improvements but I think there are
> > still
> > > > some
> > > > > > > open
> > > > > > > > > >> >> questions and some differences in opinion what those
> > > > > > refactorings
> > > > > > > > > >> should
> > > > > > > > > >> >> look like.
> > > > > > > > > >> >>
> > > > > > > > > >> >> I think the API-side is quite clear, i.e. we need some
> > > > > > JobClient
> > > > > > > > API
> > > > > > > > > >> that
> > > > > > > > > >> >> allows interacting with a running Job. It could be
> > > > worthwhile
> > > > > > to
> > > > > > > > spin
> > > > > > > > > >> that
> > > > > > > > > >> >> off into a separate FLIP because we can probably find
> > > > consensus
> > > > > > > on
> > > > > > > > > that
> > > > > > > > > >> >> part more easily.
> > > > > > > > > >> >>
> > > > > > > > > >> >> For the rest, the main open questions from our doc are
> > > > these:
> > > > > > > > > >> >>
> > > > > > > > > >> >>   - Do we want to separate cluster creation and job
> > > > submission
> > > > > > > for
> > > > > > > > > >> >> per-job mode? In the past, there were conscious efforts
> > > to
> > > > > > *not*
> > > > > > > > > >> separate
> > > > > > > > > >> >> job submission from cluster creation for per-job
> > clusters
> > > > for
> > > > > > > > Mesos,
> > > > > > > > > >> YARN,
> > > > > > > > > >> >> Kubernetes (see StandaloneJobClusterEntryPoint). Tison
> > > > suggests
> > > > > > in
> > > > > > > > his
> > > > > > > > > >> >> design document to decouple this in order to unify job
> > > > > > > submission.
> > > > > > > > > >> >>
> > > > > > > > > >> >>   - How to deal with plan preview, which needs to
> > hijack
> > > > > > > execute()
> > > > > > > > > and
> > > > > > > > > >> >> let the outside code catch an exception?
> > > > > > > > > >> >>
> > > > > > > > > >> >>   - How to deal with Jar Submission at the Web
> > Frontend,
> > > > which
> > > > > > > > needs
> > > > > > > > > to
> > > > > > > > > >> >> hijack execute() and let the outside code catch an
> > > > exception?
> > > > > > > > > >> >> CliFrontend.run() “hijacks”
> > > ExecutionEnvironment.execute()
> > > > to
> > > > > > > get a
> > > > > > > > > >> >> JobGraph and then execute that JobGraph manually. We
> > > could
> > > > get
> > > > > > > > around
> > > > > > > > > >> that
> > > > > > > > > >> >> by letting execute() do the actual execution. One
> > caveat
> > > > for
> > > > > > this
> > > > > > > > is
> > > > > > > > > >> that
> > > > > > > > > >> >> now the main() method doesn’t return (or is forced to
> > > > return by
> > > > > > > > > >> throwing an
> > > > > > > > > >> >> exception from execute()) which means that for Jar
> > > > Submission
> > > > > > > from
> > > > > > > > > the
> > > > > > > > > >> >> WebFrontend we have a long-running main() method
> > running
> > > > in the
> > > > > > > > > >> >> WebFrontend. This doesn’t sound very good. We could get
> > > > around
> > > > > > > this
> > > > > > > > > by
> > > > > > > > > >> >> removing the plan preview feature and by removing Jar
> > > > > > > > > >> Submission/Running.
> > > > > > > > > >> >>
> > > > > > > > > >> >>   - How to deal with detached mode? Right now,
> > > > > > > DetachedEnvironment
> > > > > > > > > will
> > > > > > > > > >> >> execute the job and return immediately. If users
> > control
> > > > when
> > > > > > > they
> > > > > > > > > >> want to
> > > > > > > > > >> >> return, by waiting on the job completion future, how do
> > > we
> > > > deal
> > > > > > > > with
> > > > > > > > > >> this?
> > > > > > > > > >> >> Do we simply remove the distinction between
> > > > > > > detached/non-detached?
> > > > > > > > > >> >>
> > > > > > > > > >> >>   - How does per-job mode interact with “interactive
> > > > > > programming”
> > > > > > > > > >> >> (FLIP-36). For YARN, each execute() call could spawn a
> > > new
> > > > > > Flink
> > > > > > > > YARN
> > > > > > > > > >> >> cluster. What about Mesos and Kubernetes?
> > > > > > > > > >> >>
> > > > > > > > > >> >> The first open question is where the opinions diverge,
> > I
> > > > think.
> > > > > > > The
> > > > > > > > > >> rest
> > > > > > > > > >> >> are just open questions and interesting things that we
> > > > need to
> > > > > > > > > >> consider.
> > > > > > > > > >> >>
> > > > > > > > > >> >> Best,
> > > > > > > > > >> >> Aljoscha
> > > > > > > > > >> >>
> > > > > > > > > >> >> [1]
> > > > > > > > > >> >>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> > https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> > > > > > > > > >> >> <
> > > > > > > > > >> >>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> > https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> > > > > > > > > >> >> >
> > > > > > > > > >> >>
> > > > > > > > > >> >> > On 31. Jul 2019, at 15:23, Jeff Zhang <
> > > zjffdu@gmail.com>
> > > > > > > wrote:
> > > > > > > > > >> >> >
> > > > > > > > > >> >> > Thanks tison for the effort. I left a few comments.
> > > > > > > > > >> >> >
> > > > > > > > > >> >> >
> > > > > > > > > >> >> > Zili Chen <wa...@gmail.com> 于2019年7月31日周三
> > > 下午8:24写道:
> > > > > > > > > >> >> >
> > > > > > > > > >> >> >> Hi Flavio,
> > > > > > > > > >> >> >>
> > > > > > > > > >> >> >> Thanks for your reply.
> > > > > > > > > >> >> >>
> > > > > > > > > >> >> >> Either current impl and in the design, ClusterClient
> > > > > > > > > >> >> >> never takes responsibility for generating JobGraph.
> > > > > > > > > >> >> >> (what you see in current codebase is several class
> > > > methods)
> > > > > > > > > >> >> >>
> > > > > > > > > >> >> >> Instead, user describes his program in the main
> > method
> > > > > > > > > >> >> >> with ExecutionEnvironment apis and calls
> > env.compile()
> > > > > > > > > >> >> >> or env.optimize() to get FlinkPlan and JobGraph
> > > > > > respectively.
> > > > > > > > > >> >> >>
> > > > > > > > > >> >> >> For listing main classes in a jar and choosing one for
> > > > > > > > > >> >> >> submission, you're now able to customize a CLI to do
> > > it.
> > > > > > > > > >> >> >> Specifically, the path of jar is passed as arguments
> > > and
> > > > > > > > > >> >> >> in the customized CLI you list main classes, choose
> > > one
> > > > > > > > > >> >> >> to submit to the cluster.
> > > > > > > > > >> >> >>
> > > > > > > > > >> >> >> Best,
> > > > > > > > > >> >> >> tison.
> > > > > > > > > >> >> >>
> > > > > > > > > >> >> >>
> > > > > > > > > >> >> >> Flavio Pompermaier <po...@okkam.it>
> > > 于2019年7月31日周三
> > > > > > > > 下午8:12写道:
> > > > > > > > > >> >> >>
> > > > > > > > > >> >> >>> Just one note on my side: it is not clear to me
> > > > whether the
> > > > > > > > > client
> > > > > > > > > >> >> needs
> > > > > > > > > >> >> >> to
> > > > > > > > > >> >> >>> be able to generate a job graph or not.
> > > > > > > > > >> >> >>> In my opinion, the job jar must reside only on the
> > > > > > > > > >> server/jobManager
> > > > > > > > > >> >> >> side
> > > > > > > > > >> >> >>> and the client requires a way to get the job graph.
> > > > > > > > > >> >> >>> If you really want to access to the job graph, I'd
> > > add
> > > > a
> > > > > > > > > dedicated
> > > > > > > > > >> >> method
> > > > > > > > > >> >> >>> on the ClusterClient. like:
> > > > > > > > > >> >> >>>
> > > > > > > > > >> >> >>>   - getJobGraph(jarId, mainClass): JobGraph
> > > > > > > > > >> >> >>>   - listMainClasses(jarId): List<String>
> > > > > > > > > >> >> >>>
> > > > > > > > > >> >> >>> These would require some addition also on the job
> > > > manager
> > > > > > > > > endpoint
> > > > > > > > > >> as
> > > > > > > > > >> >> >>> well..what do you think?
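[Editor's note] The two `ClusterClient` additions proposed above could be sketched like this. The method names, the `jarId` concept, and the fake in-memory backend are all hypothetical; a real version would hit new REST endpoints on the JobManager, where the jar actually resides.

```java
import java.util.Arrays;
import java.util.List;

public class JarAwareClientSketch {

    public interface JarAwareClusterClient {
        /** Ask the cluster to compile a job graph from an uploaded jar. */
        String getJobGraph(String jarId, String mainClass);

        /** Enumerate candidate entry points inside an uploaded jar. */
        List<String> listMainClasses(String jarId);
    }

    /** In-memory fake standing in for a REST-backed implementation. */
    public static class FakeClient implements JarAwareClusterClient {
        @Override
        public String getJobGraph(String jarId, String mainClass) {
            return "JobGraph(" + jarId + ", " + mainClass + ")";
        }

        @Override
        public List<String> listMainClasses(String jarId) {
            return Arrays.asList("com.example.WordCount", "com.example.Etl");
        }
    }

    public static void main(String[] args) {
        JarAwareClusterClient client = new FakeClient();
        // Pick an entry point, then let the server side build the graph.
        String main = client.listMainClasses("jar-42").get(0);
        System.out.println(client.getJobGraph("jar-42", main));
    }
}
```

With this shape, the client never needs the user jar on its own classpath, which addresses the classpath-inconsistency concern raised above.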
> > > > > > > > > >> >> >>>
> > > > > > > > > >> >> >>> On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <
> > > > > > > > wander4096@gmail.com
> > > > > > > > > >
> > > > > > > > > >> >> wrote:
> > > > > > > > > >> >> >>>
> > > > > > > > > >> >> >>>> Hi all,
> > > > > > > > > >> >> >>>>
> > > > > > > > > >> >> >>>> Here is a document[1] on client api enhancement
> > from
> > > > our
> > > > > > > > > >> perspective.
> > > > > > > > > >> >> >>>> We have investigated current implementations. And
> > we
> > > > > > propose
> > > > > > > > > >> >> >>>>
> > > > > > > > > >> >> >>>> 1. Unify the implementation of cluster deployment
> > > and
> > > > job
> > > > > > > > > >> submission
> > > > > > > > > >> >> in
> > > > > > > > > >> >> >>>> Flink.
> > > > > > > > > >> >> >>>> 2. Provide programmatic interfaces to allow
> > flexible
> > > > job
> > > > > > and
> > > > > > > > > >> cluster
> > > > > > > > > >> >> >>>> management.
> > > > > > > > > >> >> >>>>
> > > > > > > > > >> >> >>>> The first proposal is aimed at reducing code paths
> > > of
> > > > > > > cluster
> > > > > > > > > >> >> >> deployment
> > > > > > > > > >> >> >>>> and
> > > > > > > > > >> >> >>>> job submission so that one can adopt Flink in his
> > > > usage
> > > > > > > > easily.
> > > > > > > > > >> The
> > > > > > > > > >> >> >>> second
> > > > > > > > > >> >> >>>> proposal is aimed at providing rich interfaces for
> > > > > > advanced
> > > > > > > > > users
> > > > > > > > > >> >> >>>> who want to make accurate control of these stages.
> > > > > > > > > >> >> >>>>
> > > > > > > > > >> >> >>>> Quick reference on open questions:
> > > > > > > > > >> >> >>>>
> > > > > > > > > >> >> >>>> 1. Exclude job cluster deployment from client side
> > > or
> > > > > > > redefine
> > > > > > > > > the
> > > > > > > > > >> >> >>> semantic
> > > > > > > > > >> >> >>>> of job cluster? Since it fits in a process quite
> > > > different
> > > > > > > > from
> > > > > > > > > >> >> session
> > > > > > > > > >> >> >>>> cluster deployment and job submission.
> > > > > > > > > >> >> >>>>
> > > > > > > > > >> >> >>>> 2. Maintain the codepaths handling class
> > > > > > > > > o.a.f.api.common.Program
> > > > > > > > > >> or
> > > > > > > > > >> >> >>>> implement customized program handling logic by
> > > > customized
> > > > > > > > > >> >> CliFrontend?
> > > > > > > > > >> >> >>>> See also this thread[2] and the document[1].
> > > > > > > > > >> >> >>>>
> > > > > > > > > >> >> >>>> 3. Expose ClusterClient as public api or just
> > expose
> > > > api
> > > > > > in
> > > > > > > > > >> >> >>>> ExecutionEnvironment
> > > > > > > > > >> >> >>>> and delegate them to ClusterClient? Further, in
> > > > either way
> > > > > > > is
> > > > > > > > it
> > > > > > > > > >> >> worth
> > > > > > > > > >> >> >> to
> > > > > > > > > >> >> >>>> introduce a JobClient which is an encapsulation of
> > > > > > > > ClusterClient
> > > > > > > > > >> that
> > > > > > > > > >> >> >>>> associated to specific job?
> > > > > > > > > >> >> >>>>
> > > > > > > > > >> >> >>>> Best,
> > > > > > > > > >> >> >>>> tison.
> > > > > > > > > >> >> >>>>
> > > > > > > > > >> >> >>>> [1]
> > > > > > > > > >> >> >>>>
> > > > > > > > > >> >> >>>>
> > > > > > > > > >> >> >>>
> > > > > > > > > >> >> >>
> > > > > > > > > >> >>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> > https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
> > > > > > > > > >> >> >>>> [2]
> > > > > > > > > >> >> >>>>
> > > > > > > > > >> >> >>>>
> > > > > > > > > >> >> >>>
> > > > > > > > > >> >> >>
> > > > > > > > > >> >>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> > https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
> > > > > > > > > >> >> >>>>
> > > > > > > > > >> >> >>>> Jeff Zhang <zj...@gmail.com> 于2019年7月24日周三
> > > 上午9:19写道:
> > > > > > > > > >> >> >>>>
> > > > > > > > > >> >> >>>>> Thanks Stephan, I will follow up this issue in
> > next
> > > > few
> > > > > > > > weeks,
> > > > > > > > > >> and
> > > > > > > > > >> >> >> will
> > > > > > > > > >> >> >>>>> refine the design doc. We could discuss more
> > > details
> > > > > > after
> > > > > > > > 1.9
> > > > > > > > > >> >> >> release.
> > > > > > > > > >> >> >>>>>
> > > > > > > > > >> >> >>>>> Stephan Ewen <se...@apache.org> 于2019年7月24日周三
> > > > 上午12:58写道:
> > > > > > > > > >> >> >>>>>
> > > > > > > > > >> >> >>>>>> Hi all!
> > > > > > > > > >> >> >>>>>>
> > > > > > > > > >> >> >>>>>> This thread has stalled for a bit, which I
> > assume
> > > > is
> > > > > > > mostly
> > > > > > > > > >> due to
> > > > > > > > > >> >> >>> the
> > > > > > > > > >> >> >>>>>> Flink 1.9 feature freeze and release testing
> > > effort.
> > > > > > > > > >> >> >>>>>>
> > > > > > > > > >> >> >>>>>> I personally still recognize this issue as one
> > > > important
> > > > > > > to
> > > > > > > > be
> > > > > > > > > >> >> >>> solved.
> > > > > > > > > >> >> >>>>> I'd
> > > > > > > > > >> >> >>>>>> be happy to help resume this discussion soon
> > > (after
> > > > the
> > > > > > > 1.9
> > > > > > > > > >> >> >> release)
> > > > > > > > > >> >> >>>> and
> > > > > > > > > >> >> >>>>>> see if we can do some step towards this in Flink
> > > > 1.10.
> > > > > > > > > >> >> >>>>>>
> > > > > > > > > >> >> >>>>>> Best,
> > > > > > > > > >> >> >>>>>> Stephan
> > > > > > > > > >> >> >>>>>>
> > > > > > > > > >> >> >>>>>>
> > > > > > > > > >> >> >>>>>>
> > > > > > > > > >> >> >>>>>> On Mon, Jun 24, 2019 at 10:41 AM Flavio
> > > Pompermaier
> > > > <
> > > > > > > > > >> >> >>>>> pompermaier@okkam.it>
> > > > > > > > > >> >> >>>>>> wrote:
> > > > > > > > > >> >> >>>>>>
> > > > > > > > > >> >> >>>>>>> That's exactly what I suggested a long time
> > ago:
> > > > the
> > > > > > > Flink
> > > > > > > > > REST
> > > > > > > > > >> >> >>>> client
> > > > > > > > > >> >> >>>>>>> should not require any Flink dependency, only
> > > http
> > > > > > > library
> > > > > > > > to
> > > > > > > > > >> >> >> call
> > > > > > > > > >> >> >>>> the
> > > > > > > > > >> >> >>>>>> REST
> > > > > > > > > >> >> >>>>>>> services to submit and monitor a job.
> > > > > > > > > >> >> >>>>>>> What I suggested also in [1] was to have a way
> > to
> > > > > > > > > automatically
> > > > > > > > > >> >> >>>> suggest
> > > > > > > > > >> >> >>>>>> the
> > > > > > > > > >> >> >>>>>>> user (via a UI) the available main classes and
> > > > their
> > > > > > > > required
> > > > > > > > > >> >> >>>>>>> parameters[2].
> > > > > > > > > >> >> >>>>>>> Another problem we have with Flink is that the
> > > Rest
> > > > > > > client
> > > > > > > > > and
> > > > > > > > > >> >> >> the
> > > > > > > > > >> >> >>>> CLI
> > > > > > > > > >> >> >>>>>> one
> > > > > > > > > >> >> >>>>>>> behaves differently and we use the CLI client
> > > (via
> > > > ssh)
> > > > > > > > > because
> > > > > > > > > >> >> >> it
> > > > > > > > > >> >> >>>>> allows
> > > > > > > > > >> >> >>>>>>> to call some other method after env.execute()
> > [3]
> > > > (we
> > > > > > > have
> > > > > > > > to
> > > > > > > > > >> >> >> call
> > > > > > > > > >> >> >>>>>> another
> > > > > > > > > >> >> >>>>>>> REST service to signal the end of the job).
> > > > > > > > > >> >>>>>>> In this regard, a dedicated interface, like the
> > > > > > > JobListener
> > > > > > > > > >> >> >>> suggested
> > > > > > > > > >> >> >>>>> in
> > > > > > > > > >> >> >>>>>>> the previous emails, would be very helpful
> > > (IMHO).
> > > > > > > > > >> >> >>>>>>>
> > > > > > > > > >> >> >>>>>>> [1]
> > > > https://issues.apache.org/jira/browse/FLINK-10864
> > > > > > > > > >> >> >>>>>>> [2]
> > > > https://issues.apache.org/jira/browse/FLINK-10862
> > > > > > > > > >> >> >>>>>>> [3]
> > > > https://issues.apache.org/jira/browse/FLINK-10879
> > > > > > > > > >> >> >>>>>>>
> > > > > > > > > >> >> >>>>>>> Best,
> > > > > > > > > >> >> >>>>>>> Flavio
> > > > > > > > > >> >> >>>>>>>
> > > > > > > > > >> >> >>>>>>> On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <
> > > > > > > > zjffdu@gmail.com
> > > > > > > > > >
> > > > > > > > > >> >> >>> wrote:
> > > > > > > > > >> >> >>>>>>>
> > > > > > > > > >> >> >>>>>>>> Hi, Tison,
> > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > >> >> >>>>>>>> Thanks for your comments. Overall I agree with
> > > you
> > > > > > that
> > > > > > > it
> > > > > > > > > is
> > > > > > > > > >> >> >>>>> difficult
> > > > > > > > > >> >> >>>>>>> for
> > > > > > > > > >> >> >>>>>>>> down stream project to integrate with flink
> > and
> > > we
> > > > > > need
> > > > > > > to
> > > > > > > > > >> >> >>> refactor
> > > > > > > > > >> >> >>>>> the
> > > > > > > > > >> >> >>>>>>>> current flink client api.
> > > > > > > > > >> >> >>>>>>>> And I agree that CliFrontend should only
> > parsing
> > > > > > command
> > > > > > > > > line
> > > > > > > > > >> >> >>>>> arguments
> > > > > > > > > >> >> >>>>>>> and
> > > > > > > > > >> >> >>>>>>>> then pass them to ExecutionEnvironment. It is
> > > > > > > > > >> >> >>>> ExecutionEnvironment's
> > > > > > > > > >> >> >>>>>>>> responsibility to compile job, create cluster,
> > > and
> > > > > > > submit
> > > > > > > > > job.
> > > > > > > > > >> >> >>>>> Besides
> > > > > > > > > >> >> >>>>>>>> that, Currently flink has many
> > > > ExecutionEnvironment
> > > > > > > > > >> >> >>>> implementations,
> > > > > > > > > >> >> >>>>>> and
> > > > > > > > > >> >> >>>>>>>> flink will use the specific one based on the
> > > > context.
> > > > > > > > IMHO,
> > > > > > > > > it
> > > > > > > > > >> >> >> is
> > > > > > > > > >> >> >>>> not
> > > > > > > > > >> >> >>>>>>>> necessary, ExecutionEnvironment should be able
> > > to
> > > > do
> > > > > > the
> > > > > > > > > right
> > > > > > > > > >> >> >>>> thing
> > > > > > > > > >> >> >>>>>>> based
> > > > > > > > > >> >> >>>>>>>> on the FlinkConf it is received. Too many
> > > > > > > > > ExecutionEnvironment
> > > > > > > > > >> >> >>>>>>>> implementations are another burden for
> > downstream
> > > > > > project
> > > > > > > > > >> >> >>>> integration.
> > > > > > > > > >> >> >>>>>>>>
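[Editor's note] Jeff's point, that one ExecutionEnvironment could do the right thing based on the configuration it receives instead of Flink choosing among many environment subclasses, can be sketched roughly like this. All names here (FlinkConf, ExecutionEnvironmentSketch, the execution.target key) are illustrative stand-ins, not Flink's actual classes:

```java
import java.util.Map;

// Minimal sketch: a single environment that dispatches on configuration,
// so callers never need to pick a backend-specific subclass themselves.
final class FlinkConf {
    private final Map<String, String> entries;
    FlinkConf(Map<String, String> entries) { this.entries = entries; }
    String get(String key, String fallback) { return entries.getOrDefault(key, fallback); }
}

class ExecutionEnvironmentSketch {
    private final FlinkConf conf;
    ExecutionEnvironmentSketch(FlinkConf conf) { this.conf = conf; }

    // Decides the target purely from configuration received at construction.
    String execute(String jobName) {
        String mode = conf.get("execution.target", "local");
        switch (mode) {
            case "local":      return "ran " + jobName + " in-process";
            case "yarn":       return "submitted " + jobName + " to a YARN cluster";
            case "standalone": return "submitted " + jobName + " to a standalone cluster";
            default: throw new IllegalArgumentException("unknown execution.target: " + mode);
        }
    }
}
```

A downstream project would then build one FlinkConf from its own settings and hand it over, rather than instantiating a different environment class per deployment target.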
> > > > > > > > > >> >> >>>>>>>> One thing I'd like to mention is flink's scala
> > > > shell
> > > > > > and
> > > > > > > > sql
> > > > > > > > > >> >> >>>> client,
> > > > > > > > > >> >> >>>>>>>> although they are sub-modules of flink, they
> > > > could be
> > > > > > > > > treated
> > > > > > > > > >> >> >> as
> > > > > > > > > >> >> >>>>>>> downstream
> > > > > > > > > >> >> >>>>>>>> projects that use Flink's client API.
> > Currently
> > > > you
> > > > > > will
> > > > > > > > > find
> > > > > > > > > >> >> >> it
> > > > > > > > > >> >> >>> is
> > > > > > > > > >> >> >>>>> not
> > > > > > > > > >> >> >>>>>>>> easy for them to integrate with flink, they
> > > share
> > > > much
> > > > > > > > > >> >> >> duplicated
> > > > > > > > > >> >> >>>>> code.
> > > > > > > > > >> >> >>>>>>> It
> > > > > > > > > >> >> >>>>>>>> is another sign that we should refactor flink
> > > > client
> > > > > > > api.
> > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > >> >> >>>>>>>> I believe it is a large and hard change, and I
> > > am
> > > > > > afraid
> > > > > > > > we
> > > > > > > > > >> can
> > > > > > > > > >> >> >>> not
> > > > > > > > > >> >> >>>>>> keep
> > > > > > > > > >> >> >>>>>>>> compatibility since many of the changes are user
> > > > facing.
> > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > >> >> >>>>>>>> Zili Chen <wa...@gmail.com>
> > 于2019年6月24日周一
> > > > > > > 下午2:53写道:
> > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > >> >> >>>>>>>>> Hi all,
> > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>> After a closer look at our client APIs, I can
> > > see
> > > > > > there
> > > > > > > > are
> > > > > > > > > >> >> >> two
> > > > > > > > > >> >> >>>>> major
> > > > > > > > > >> >> >>>>>>>>> issues with consistency and integration, namely
> > > > > > different
> > > > > > > > > >> >> >>>> deployment
> > > > > > > > > >> >> >>>>> of
> > > > > > > > > >> >> >>>>>>>>> job cluster which couples job graph creation
> > > and
> > > > > > > cluster
> > > > > > > > > >> >> >>>>> deployment,
> > > > > > > > > >> >> >>>>>>>>> and submission via CliFrontend, which confuses the
> > > control
> > > > flow
> > > > > > > of
> > > > > > > > > job
> > > > > > > > > >> >> >>>> graph
> > > > > > > > > >> >> >>>>>>>>> compilation and job submission. I'd like to
> > > > follow
> > > > > > the
> > > > > > > > > >> >> >> discussion
> > > > > > > > > >> >> >>>>> above,
> > > > > > > > > >> >> >>>>>>>>> mainly the process described by Jeff and
> > > > Stephan, and
> > > > > > > > share
> > > > > > > > > >> >> >> my
> > > > > > > > > >> >> >>>>>>>>> ideas on these issues.
> > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>> 1) CliFrontend confuses the control flow of
> > job
> > > > > > > > compilation
> > > > > > > > > >> >> >> and
> > > > > > > > > >> >> >>>>>>>> submission.
> > > > > > > > > >> >> >>>>>>>>> Following the process of job submission
> > Stephan
> > > > and
> > > > > > > Jeff
> > > > > > > > > >> >> >>>> described,
> > > > > > > > > >> >> >>>>>>>>> execution environment knows all configs of
> > the
> > > > > > cluster
> > > > > > > > and
> > > > > > > > > >> >> >>>>>>> topologies/settings
> > > > > > > > > >> >> >>>>>>>>> of the job. Ideally, in the main method of
> > user
> > > > > > > program,
> > > > > > > > it
> > > > > > > > > >> >> >>> calls
> > > > > > > > > >> >> >>>>>>>> #execute
> > > > > > > > > >> >> >>>>>>>>> (or named #submit) and Flink deploys the
> > > cluster,
> > > > > > > compiles
> > > > > > > > > the
> > > > > > > > > >> >> >>> job
> > > > > > > > > >> >> >>>>>> graph
> > > > > > > > > >> >> >>>>>>>>> and submits it to the cluster. However,
> > current
> > > > > > > > CliFrontend
> > > > > > > > > >> >> >> does
> > > > > > > > > >> >> >>>> all
> > > > > > > > > >> >> >>>>>>> these
> > > > > > > > > >> >> >>>>>>>>> things inside its #runProgram method, which
> > > > > > introduces
> > > > > > > a
> > > > > > > > > lot
> > > > > > > > > >> >> >> of
> > > > > > > > > >> >> >>>>>>>> subclasses
> > > > > > > > > >> >> >>>>>>>>> of (stream) execution environment.
> > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>> Actually, it sets up an exec env that hijacks
> > > the
> > > > > > > > > >> >> >>>>>> #execute/executePlan
> > > > > > > > > >> >> >>>>>>>>> method, initializes the job graph and aborts
> > > > > > execution.
> > > > > > > > And
> > > > > > > > > >> >> >> then
> > > > > > > > > >> >> >>>>>>>>> control flows back to CliFrontend, which deploys
> > > the
> > > > > > > > cluster(or
> > > > > > > > > >> >> >>>>> retrieves
> > > > > > > > > >> >> >>>>>>>>> the client) and submits the job graph. This
> > is
> > > > quite
> > > > > > a
> > > > > > > > > >> >> >> specific
> > > > > > > > > >> >> >>>>>>> internal
> > > > > > > > > >> >> >>>>>>>>> process inside Flink, and has no consistency
> > with
> > > > > > > anything.
> > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>> 2) Deployment of job cluster couples job
> > graph
> > > > > > creation
> > > > > > > > and
> > > > > > > > > >> >> >>>> cluster
> > > > > > > > > >> >> >>>>>>>>> deployment. Abstractly, from user job to a
> > > > concrete
> > > > > > > > > >> >> >> submission,
> > > > > > > > > >> >> >>>> it
> > > > > > > > > >> >> >>>>>>>> requires
> > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>     create JobGraph --\
> > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>> create ClusterClient -->  submit JobGraph
> > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>> such a dependency. A ClusterClient is created
> > by
> > > > > > > deploying
> > > > > > > > > or
> > > > > > > > > >> >> >>>>>>> retrieving.
> > > > > > > > > >> >> >>>>>>>>> JobGraph submission requires a compiled
> > > JobGraph
> > > > and
> > > > > > > > valid
> > > > > > > > > >> >> >>>>>>> ClusterClient,
> > > > > > > > > >> >> >>>>>>>>> but the creation of ClusterClient is
> > abstractly
> > > > > > > > independent
> > > > > > > > > >> >> >> of
> > > > > > > > > >> >> >>>> that
> > > > > > > > > >> >> >>>>>> of
> > > > > > > > > >> >> >>>>>>>>> JobGraph. However, in job cluster mode, we
> > > > deploy job
> > > > > > > > > cluster
> > > > > > > > > >> >> >>>> with
> > > > > > > > > >> >> >>>>> a
> > > > > > > > > >> >> >>>>>>> job
> > > > > > > > > >> >> >>>>>>>>> graph, which means we use another process:
> > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>> create JobGraph --> deploy cluster with the
> > > > JobGraph
> > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>> Here is another inconsistency and downstream
> > > > > > > > > projects/client
> > > > > > > > > >> >> >>> apis
> > > > > > > > > >> >> >>>>> are
> > > > > > > > > >> >> >>>>>>>>> forced to handle different cases with little
> > > > support
> > > > > > > from
> > > > > > > > > >> >> >> Flink.
> > > > > > > > > >> >> >>>>>>>>>
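[Editor's note] The two divergent submission paths described above can be sketched as follows. All types here are illustrative stand-ins, not Flink's real JobGraph/ClusterClient classes: in session mode the job graph and the client are created independently and then joined at submit time, while in job-cluster mode the deployment itself consumes the job graph, which is the coupling tison points out:

```java
// Stand-in types for the sketch.
final class JobGraph {
    final String name;
    JobGraph(String name) { this.name = name; }
}

interface ClusterClient {
    String submit(JobGraph graph);
}

class SessionPath {
    // create JobGraph and create ClusterClient independently, then submit.
    static String run(ClusterClient client, JobGraph graph) {
        return client.submit(graph);
    }
}

class JobClusterPath {
    // deploy cluster *with* the JobGraph: there is no independent client step,
    // so downstream callers must special-case this mode.
    static String deployWithJob(JobGraph graph) {
        return "cluster started for " + graph.name;
    }
}
```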
> > > > > > > > > >> >> >>>>>>>>> Since we likely reached a consensus on
> > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>> 1. all configs gathered by Flink
> > configuration
> > > > and
> > > > > > > passed
> > > > > > > > > >> >> >>>>>>>>> 2. execution environment knows all configs
> > and
> > > > > > handles
> > > > > > > > > >> >> >>>>> execution(both
> > > > > > > > > >> >> >>>>>>>>> deployment and submission)
> > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>> to the issues above I propose eliminating
> > > > > > > inconsistencies
> > > > > > > > > by
> > > > > > > > > >> >> >>>>>> following
> > > > > > > > > >> >> >>>>>>>>> approach:
> > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>> 1) CliFrontend should exactly be a front end,
> > > at
> > > > > > least
> > > > > > > > for
> > > > > > > > > >> >> >>> "run"
> > > > > > > > > >> >> >>>>>>> command.
> > > > > > > > > >> >> >>>>>>>>> That means it just gathers and passes all
> > > config
> > > > > > from
> > > > > > > > > >> >> >> command
> > > > > > > > > >> >> >>>> line
> > > > > > > > > >> >> >>>>>> to
> > > > > > > > > >> >> >>>>>>>>> the main method of user program. Execution
> > > > > > environment
> > > > > > > > > knows
> > > > > > > > > >> >> >>> all
> > > > > > > > > >> >> >>>>> the
> > > > > > > > > >> >> >>>>>>> info
> > > > > > > > > >> >> >>>>>>>>> and with an addition to utils for
> > > ClusterClient,
> > > > we
> > > > > > > > > >> >> >> gracefully
> > > > > > > > > >> >> >>>> get
> > > > > > > > > >> >> >>>>> a
> > > > > > > > > >> >> >>>>>>>>> ClusterClient by deploying or retrieving. In
> > > this
> > > > > > way,
> > > > > > > we
> > > > > > > > > >> >> >> don't
> > > > > > > > > >> >> >>>>> need
> > > > > > > > > >> >> >>>>>> to
> > > > > > > > > >> >> >>>>>>>>> hijack #execute/executePlan methods and can
> > > > remove
> > > > > > > > various
> > > > > > > > > >> >> >>>> hacking
> > > > > > > > > >> >> >>>>>>>>> subclasses of exec env, as well as #run
> > methods
> > > > in
> > > > > > > > > >> >> >>>>> ClusterClient(for
> > > > > > > > > >> >> >>>>>> an
> > > > > > > > > >> >> >>>>>>>>> interface-ized ClusterClient). Now the
> > control
> > > > flow
> > > > > > > flows
> > > > > > > > > >> >> >> from
> > > > > > > > > >> >> >>>>>>>> CliFrontend
> > > > > > > > > >> >> >>>>>>>>> to the main method and never returns.
> > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>> 2) Job cluster means a cluster for the
> > specific
> > > > job.
> > > > > > > From
> > > > > > > > > >> >> >>> another
> > > > > > > > > >> >> >>>>>>>>> perspective, it is an ephemeral session. We
> > may
> > > > > > > decouple
> > > > > > > > > the
> > > > > > > > > >> >> >>>>>> deployment
> > > > > > > > > >> >> >>>>>>>>> with a compiled job graph, but start a
> > session
> > > > with
> > > > > > > idle
> > > > > > > > > >> >> >>> timeout
> > > > > > > > > >> >> >>>>>>>>> and submit the job following.
> > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>> These topics, before we go into more details
> > on
> > > > > > design
> > > > > > > or
> > > > > > > > > >> >> >>>>>>> implementation,
> > > > > > > > > >> >> >>>>>>>>> are better to be aware and discussed for a
> > > > consensus.
> > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>> Best,
> > > > > > > > > >> >> >>>>>>>>> tison.
> > > > > > > > > >> >> >>>>>>>>>
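[Editor's note] The control flow tison proposes can be sketched in a few lines. Every name below (ProposedFlow, deployOrRetrieve, the mode key) is an assumption made for illustration, not a real Flink API: the front end only parses arguments into a configuration and invokes the user main; the environment compiles the job, deploys or retrieves a client, and submits, so control never flows back to the front end:

```java
import java.util.HashMap;
import java.util.Map;

class ProposedFlow {
    interface ClusterClient { String submit(String jobGraph); }

    // Util that either retrieves an existing session or deploys an
    // ephemeral one (the "job cluster as ephemeral session" idea).
    static ClusterClient deployOrRetrieve(Map<String, String> conf) {
        boolean session = "session".equals(conf.get("mode"));
        String target = session ? "existing session" : "ephemeral session";
        return jobGraph -> jobGraph + " -> " + target;
    }

    // Plays the role of ExecutionEnvironment#execute: compile, then submit.
    static String environmentExecute(Map<String, String> conf, String program) {
        String jobGraph = "JobGraph(" + program + ")";
        return deployOrRetrieve(conf).submit(jobGraph);
    }

    // Plays the role of CliFrontend "run": only gathers key=value args
    // into a configuration and hands off; it never touches the cluster.
    static String run(String... args) {
        Map<String, String> conf = new HashMap<>();
        for (String arg : args) {
            String[] kv = arg.split("=", 2);
            conf.put(kv[0], kv[1]);
        }
        return environmentExecute(conf, conf.getOrDefault("program", "main"));
    }
}
```

With mode=session the util retrieves an existing session; any other mode deploys an ephemeral one, so both deployment styles go through the same submit path.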
> > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>> Zili Chen <wa...@gmail.com>
> > 于2019年6月20日周四
> > > > > > > 上午3:21写道:
> > > > > > > > > >> >> >>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>> Hi Jeff,
> > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>> Thanks for raising this thread and the
> > design
> > > > > > > document!
> > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>> As @Thomas Weise mentioned above, extending
> > > > config
> > > > > > to
> > > > > > > > > flink
> > > > > > > > > >> >> >>>>>>>>>> requires far more effort than it should.
> > > > Another
> > > > > > > > > example
> > > > > > > > > >> >> >>>>>>>>>> is that we achieve detach mode by introducing
> > another
> > > > > > > execution
> > > > > > > > > >> >> >>>>>>>>>> environment which also hijacks the #execute
> > method.
> > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>> I agree with your idea that users would
> > > > configure all
> > > > > > > > > things
> > > > > > > > > >> >> >>>>>>>>>> and Flink "just" respects it. On this topic I
> > > > think
> > > > > > the
> > > > > > > > > >> >> >> unusual
> > > > > > > > > >> >> >>>>>>>>>> control flow when CliFrontend handles the "run"
> > > > command
> > > > > > is
> > > > > > > > the
> > > > > > > > > >> >> >>>> problem.
> > > > > > > > > >> >> >>>>>>>>>> It handles several configs, mainly about
> > > cluster
> > > > > > > > settings,
> > > > > > > > > >> >> >> and
> > > > > > > > > >> >> >>>>>>>>>> thus the main method of the user program is unaware
> > of
> > > > them.
> > > > > > > > Also
> > > > > > > > > it
> > > > > > > > > >> >> >>>>>> compiles
> > > > > > > > > >> >> >>>>>>>>>> the app to a job graph by running the main method
> > with a
> > > > > > > hijacked
> > > > > > > > > exec
> > > > > > > > > >> >> >>>> env,
> > > > > > > > > >> >> >>>>>>>>>> which constrains the main method further.
> > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>> I'd like to write down a few of notes on
> > > > > > configs/args
> > > > > > > > pass
> > > > > > > > > >> >> >> and
> > > > > > > > > >> >> >>>>>>> respect,
> > > > > > > > > >> >> >>>>>>>>>> as well as decoupling job compilation and
> > > > > > submission.
> > > > > > > > > Share
> > > > > > > > > >> >> >> on
> > > > > > > > > >> >> >>>>> this
> > > > > > > > > >> >> >>>>>>>>>> thread later.
> > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>> Best,
> > > > > > > > > >> >> >>>>>>>>>> tison.
> > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>> SHI Xiaogang <sh...@gmail.com>
> > > > 于2019年6月17日周一
> > > > > > > > > >> >> >> 下午7:29写道:
> > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>> Hi Jeff and Flavio,
> > > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>> Thanks Jeff a lot for proposing the design
> > > > > > document.
> > > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>> We are also working on refactoring
> > > > ClusterClient to
> > > > > > > > allow
> > > > > > > > > >> >> >>>>> flexible
> > > > > > > > > >> >> >>>>>>> and
> > > > > > > > > >> >> >>>>>>>>>>> efficient job management in our real-time
> > > > platform.
> > > > > > > > > >> >> >>>>>>>>>>> We would like to draft a document to share
> > > our
> > > > > > ideas
> > > > > > > > with
> > > > > > > > > >> >> >>> you.
> > > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>> I think it's a good idea to have something
> > > like
> > > > > > > Apache
> > > > > > > > > Livy
> > > > > > > > > >> >> >>> for
> > > > > > > > > >> >> >>>>>>> Flink,
> > > > > > > > > >> >> >>>>>>>>>>> and
> > > > > > > > > >> >> >>>>>>>>>>> the efforts discussed here will take a
> > great
> > > > step
> > > > > > > > forward
> > > > > > > > > >> >> >> to
> > > > > > > > > >> >> >>>> it.
> > > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>> Regards,
> > > > > > > > > >> >> >>>>>>>>>>> Xiaogang
> > > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>> Flavio Pompermaier <po...@okkam.it>
> > > > > > > > 于2019年6月17日周一
> > > > > > > > > >> >> >>>>> 下午7:13写道:
> > > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>> Is there any possibility to have something
> > > > like
> > > > > > > Apache
> > > > > > > > > >> >> >> Livy
> > > > > > > > > >> >> >>>> [1]
> > > > > > > > > >> >> >>>>>>> also
> > > > > > > > > >> >> >>>>>>>>>>> for
> > > > > > > > > >> >> >>>>>>>>>>>> Flink in the future?
> > > > > > > > > >> >> >>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>> [1] https://livy.apache.org/
> > > > > > > > > >> >> >>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>> On Tue, Jun 11, 2019 at 5:23 PM Jeff
> > Zhang <
> > > > > > > > > >> >> >>> zjffdu@gmail.com
> > > > > > > > > >> >> >>>>>
> > > > > > > > > >> >> >>>>>>> wrote:
> > > > > > > > > >> >> >>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>> Any API we expose should not have
> > > > dependencies
> > > > > > > on
> > > > > > > > > >> >> >>> the
> > > > > > > > > >> >> >>>>>>> runtime
> > > > > > > > > >> >> >>>>>>>>>>>>> (flink-runtime) package or other
> > > > implementation
> > > > > > > > > >> >> >> details.
> > > > > > > > > >> >> >>> To
> > > > > > > > > >> >> >>>>> me,
> > > > > > > > > >> >> >>>>>>>> this
> > > > > > > > > >> >> >>>>>>>>>>>> means
> > > > > > > > > >> >> >>>>>>>>>>>>> that the current ClusterClient cannot be
> > > > exposed
> > > > > > to
> > > > > > > > > >> >> >> users
> > > > > > > > > >> >> >>>>>> because
> > > > > > > > > >> >> >>>>>>>> it
> > > > > > > > > >> >> >>>>>>>>>>>> uses
> > > > > > > > > >> >> >>>>>>>>>>>>> quite some classes from the optimiser and
> > > > runtime
> > > > > > > > > >> >> >>> packages.
> > > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>> We should change ClusterClient from a class
> > > to
> > > > > > > > an interface.
> > > > > > > > > >> >> >>>>>>>>>>>>> ExecutionEnvironment only use the
> > interface
> > > > > > > > > >> >> >> ClusterClient
> > > > > > > > > >> >> >>>>> which
> > > > > > > > > >> >> >>>>>>>>>>> should be
> > > > > > > > > >> >> >>>>>>>>>>>>> in flink-clients while the concrete
> > > > > > implementation
> > > > > > > > > >> >> >> class
> > > > > > > > > >> >> >>>>> could
> > > > > > > > > >> >> >>>>>> be
> > > > > > > > > >> >> >>>>>>>> in
> > > > > > > > > >> >> >>>>>>>>>>>>> flink-runtime.
> > > > > > > > > >> >> >>>>>>>>>>>>>
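[Editor's note] The class-to-interface split Jeff suggests might look roughly like this. ClusterClientApi, RestClusterClientImpl, and the module placement in the comments are assumptions for illustration, not Flink's actual code; the point is that the public surface carries no runtime dependencies:

```java
// -- would live in flink-clients (public API, no runtime dependencies) --
interface ClusterClientApi {
    String submitJob(String jobGraph);
    void close();
}

// -- would live in flink-runtime (implementation detail) --
class RestClusterClientImpl implements ClusterClientApi {
    private final String address;
    RestClusterClientImpl(String address) { this.address = address; }
    @Override public String submitJob(String jobGraph) {
        return "submitted " + jobGraph + " via REST to " + address;
    }
    @Override public void close() { /* release connections */ }
}

// Downstream code (e.g. an ExecutionEnvironment) programs only to the
// interface, so the concrete client can change without breaking callers.
class EnvUsingInterface {
    static String execute(ClusterClientApi client, String jobGraph) {
        try {
            return client.submitJob(jobGraph);
        } finally {
            client.close();
        }
    }
}
```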
> > > > > > > > > >> >> >>>>>>>>>>>>>>>> What happens when a failure/restart in
> > > the
> > > > > > > client
> > > > > > > > > >> >> >>>>> happens?
> > > > > > > > > >> >> >>>>>>>> There
> > > > > > > > > >> >> >>>>>>>>>>> needs
> > > > > > > > > >> >> >>>>>>>>>>>>> to be a way of re-establishing the
> > > > connection to
> > > > > > > the
> > > > > > > > > >> >> >> job,
> > > > > > > > > >> >> >>>> set
> > > > > > > > > >> >> >>>>>> up
> > > > > > > > > >> >> >>>>>>>> the
> > > > > > > > > >> >> >>>>>>>>>>>>> listeners again, etc.
> > > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>> Good point.  First we need to define what
> > > > does
> > > > > > > > > >> >> >>>>> failure/restart
> > > > > > > > > >> >> >>>>>> in
> > > > > > > > > >> >> >>>>>>>> the
> > > > > > > > > >> >> >>>>>>>>>>>>> client mean. IIUC, that usually means a
> > > network
> > > > > > > failure
> > > > > > > > > >> >> >>> which
> > > > > > > > > >> >> >>>>> will
> > > > > > > > > >> >> >>>>>>>>>>> happen in
> > > > > > > > > >> >> >>>>>>>>>>>>> class RestClient. If my understanding is
> > > > correct,
> > > > > > > > > >> >> >>>>> the restart/retry
> > > > > > > > > >> >> >>>>>>>>>>> mechanism
> > > > > > > > > >> >> >>>>>>>>>>>>> should be done in RestClient.
> > > > > > > > > >> >> >>>>>>>>>>>>>
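[Editor's note] A client-side retry mechanism of the kind described above, bounded attempts with exponential backoff around a transient network failure, could be sketched like this. This is generic Java, not Flink's actual RestClient:

```java
import java.util.concurrent.Callable;

class RetryingCall {
    // Retries the request up to maxAttempts times, sleeping between attempts
    // and doubling the backoff each time; rethrows the last failure if all
    // attempts are exhausted.
    static <T> T callWithRetry(Callable<T> request, int maxAttempts, long initialBackoffMs)
            throws Exception {
        long backoff = initialBackoffMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.call();
            } catch (Exception e) {   // e.g. a transient connection reset
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff);
                    backoff *= 2;
                }
            }
        }
        throw last;
    }
}
```

A real implementation would retry only failures known to be transient (connection resets, timeouts) and not application errors, but the control flow is the same.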
> > > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>> Aljoscha Krettek <al...@apache.org>
> > > > > > > 于2019年6月11日周二
> > > > > > > > > >> >> >>>>>> 下午11:10写道:
> > > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>> Some points to consider:
> > > > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>> * Any API we expose should not have
> > > > dependencies
> > > > > > > on
> > > > > > > > > >> >> >> the
> > > > > > > > > >> >> >>>>>> runtime
> > > > > > > > > >> >> >>>>>>>>>>>>>> (flink-runtime) package or other
> > > > implementation
> > > > > > > > > >> >> >>> details.
> > > > > > > > > >> >> >>>> To
> > > > > > > > > >> >> >>>>>> me,
> > > > > > > > > >> >> >>>>>>>>>>> this
> > > > > > > > > >> >> >>>>>>>>>>>>> means
> > > > > > > > > >> >> >>>>>>>>>>>>>> that the current ClusterClient cannot be
> > > > exposed
> > > > > > > to
> > > > > > > > > >> >> >>> users
> > > > > > > > > >> >> >>>>>>> because
> > > > > > > > > >> >> >>>>>>>>>>> it
> > > > > > > > > >> >> >>>>>>>>>>>>> uses
> > > > > > > > > >> >> >>>>>>>>>>>>>> quite some classes from the optimiser
> > and
> > > > > > runtime
> > > > > > > > > >> >> >>>> packages.
> > > > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>> * What happens when a failure/restart in
> > > the
> > > > > > > client
> > > > > > > > > >> >> >>>>> happens?
> > > > > > > > > >> >> >>>>>>>> There
> > > > > > > > > >> >> >>>>>>>>>>> needs
> > > > > > > > > >> >> >>>>>>>>>>>>> to
> > > > > > > > > >> >> >>>>>>>>>>>>>> be a way of re-establishing the
> > connection
> > > > to
> > > > > > the
> > > > > > > > > >> >> >> job,
> > > > > > > > > >> >> >>>> set
> > > > > > > > > >> >> >>>>> up
> > > > > > > > > >> >> >>>>>>> the
> > > > > > > > > >> >> >>>>>>>>>>>>> listeners
> > > > > > > > > >> >> >>>>>>>>>>>>>> again, etc.
> > > > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>> Aljoscha
> > > > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>> On 29. May 2019, at 10:17, Jeff Zhang <
> > > > > > > > > >> >> >>>> zjffdu@gmail.com>
> > > > > > > > > >> >> >>>>>>>> wrote:
> > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>> Sorry folks, the design doc is late as
> > > you
> > > > > > > > > >> >> >> expected.
> > > > > > > > > >> >> >>>>> Here's
> > > > > > > > > >> >> >>>>>>> the
> > > > > > > > > >> >> >>>>>>>>>>>> design
> > > > > > > > > >> >> >>>>>>>>>>>>>> doc
> > > > > > > > > >> >> >>>>>>>>>>>>>>> I drafted, welcome any comments and
> > > > feedback.
> > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > >> >> >>>>>>>
> > > > > > > > > >> >> >>>>>>
> > > > > > > > > >> >> >>>>>
> > > > > > > > > >> >> >>>>
> > > > > > > > > >> >> >>>
> > > > > > > > > >> >> >>
> > > > > > > > > >> >>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> > https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org>
> > > > 于2019年2月14日周四
> > > > > > > > > >> >> >>>> 下午8:43写道:
> > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>> Nice that this discussion is
> > happening.
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>> In the FLIP, we could also revisit the
> > > > entire
> > > > > > > role
> > > > > > > > > >> >> >>> of
> > > > > > > > > >> >> >>>>> the
> > > > > > > > > >> >> >>>>>>>>>>>> environments
> > > > > > > > > >> >> >>>>>>>>>>>>>>>> again.
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>> Initially, the idea was:
> > > > > > > > > >> >> >>>>>>>>>>>>>>>> - the environments take care of the
> > > > specific
> > > > > > > > > >> >> >> setup
> > > > > > > > > >> >> >>>> for
> > > > > > > > > >> >> >>>>>>>>>>> standalone
> > > > > > > > > >> >> >>>>>>>>>>>> (no
> > > > > > > > > >> >> >>>>>>>>>>>>>>>> setup needed), yarn, mesos, etc.
> > > > > > > > > >> >> >>>>>>>>>>>>>>>> - the session ones have control over
> > the
> > > > > > > session.
> > > > > > > > > >> >> >>> The
> > > > > > > > > >> >> >>>>>>>>>>> environment
> > > > > > > > > >> >> >>>>>>>>>>>>> holds
> > > > > > > > > >> >> >>>>>>>>>>>>>>>> the session client.
> > > > > > > > > >> >> >>>>>>>>>>>>>>>> - running a job gives a "control"
> > object
> > > > for
> > > > > > > that
> > > > > > > > > >> >> >>>> job.
> > > > > > > > > >> >> >>>>>> That
> > > > > > > > > >> >> >>>>>>>>>>>> behavior
> > > > > > > > > >> >> >>>>>>>>>>>>> is
> > > > > > > > > >> >> >>>>>>>>>>>>>>>> the same in all environments.
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>> The actual implementation diverged
> > quite
> > > > a bit
> > > > > > > > > >> >> >> from
> > > > > > > > > >> >> >>>>> that.
> > > > > > > > > >> >> >>>>>>>> Happy
> > > > > > > > > >> >> >>>>>>>>>>> to
> > > > > > > > > >> >> >>>>>>>>>>>>> see a
> > > > > > > > > >> >> >>>>>>>>>>>>>>>> discussion about straightening this out
> > a
> > > > bit
> > > > > > > more.
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>> On Tue, Feb 12, 2019 at 4:58 AM Jeff
> > > > Zhang <
> > > > > > > > > >> >> >>>>>>> zjffdu@gmail.com>
> > > > > > > > > >> >> >>>>>>>>>>>> wrote:
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>> Hi folks,
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>> Sorry for the late response. It seems we
> > > > reached
> > > > > > > > > >> >> >>> consensus
> > > > > > > > > >> >> >>>> on
> > > > > > > > > >> >> >>>>>>>> this, I
> > > > > > > > > >> >> >>>>>>>>>>>> will
> > > > > > > > > >> >> >>>>>>>>>>>>>>>> create
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>> FLIP for this with more detailed
> > design
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>> Thomas Weise <th...@apache.org>
> > > > 于2018年12月21日周五
> > > > > > > > > >> >> >>>>> 上午11:43写道:
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> Great to see this discussion seeded!
> > > The
> > > > > > > > > >> >> >> problems
> > > > > > > > > >> >> >>>> you
> > > > > > > > > >> >> >>>>>> face
> > > > > > > > > >> >> >>>>>>>>>>> with
> > > > > > > > > >> >> >>>>>>>>>>>> the
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> Zeppelin integration are also
> > > affecting
> > > > > > other
> > > > > > > > > >> >> >>>>> downstream
> > > > > > > > > >> >> >>>>>>>>>>> projects,
> > > > > > > > > >> >> >>>>>>>>>>>>>> like
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> Beam.
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> We just enabled the savepoint
> > restore
> > > > option
> > > > > > > in
> > > > > > > > > >> >> >>>>>>>>>>>>>> RemoteStreamEnvironment
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>> [1]
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> and that was more difficult than it
> > > > should
> > > > > > be.
> > > > > > > > > >> >> >> The
> > > > > > > > > >> >> >>>>> main
> > > > > > > > > >> >> >>>>>>>> issue
> > > > > > > > > >> >> >>>>>>>>>>> is
> > > > > > > > > >> >> >>>>>>>>>>>>> that
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> environment and cluster client
> > aren't
> > > > > > > decoupled.
> > > > > > > > > >> >> >>>>> Ideally
> > > > > > > > > >> >> >>>>>>> it
> > > > > > > > > >> >> >>>>>>>>>>> should
> > > > > > > > > >> >> >>>>>>>>>>>>> be
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> possible to just get the matching
> > > > cluster
> > > > > > > client
> > > > > > > > > >> >> >>>> from
> > > > > > > > > >> >> >>>>>> the
> > > > > > > > > >> >> >>>>>>>>>>>>> environment
> > > > > > > > > >> >> >>>>>>>>>>>>>>>> and
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> then control the job through it
> > > > (environment
> > > > > > > as
> > > > > > > > > >> >> >>>>> factory
> > > > > > > > > >> >> >>>>>>> for
> > > > > > > > > >> >> >>>>>>>>>>>> cluster
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> client). But note that the
> > environment
> > > > > > classes
> > > > > > > > > >> >> >> are
> > > > > > > > > >> >> >>>>> part
> > > > > > > > > >> >> >>>>>> of
> > > > > > > > > >> >> >>>>>>>> the
> > > > > > > > > >> >> >>>>>>>>>>>>> public
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>> API,
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> and it is not straightforward to
> > make
> > > > larger
> > > > > > > > > >> >> >>> changes
> > > > > > > > > >> >> >>>>>>> without
> > > > > > > > > >> >> >>>>>>>>>>>>> breaking
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> backward compatibility.
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> ClusterClient currently exposes
> > > internal
> > > > > > > classes
> > > > > > > > > >> >> >>>> like
> > > > > > > > > >> >> >>>>>>>>>>> JobGraph and
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> StreamGraph. But it should be
> > possible
> > > > to
> > > > > > wrap
> > > > > > > > > >> >> >>> this
> > > > > > > > > >> >> >>>>>> with a
> > > > > > > > > >> >> >>>>>>>> new
> > > > > > > > > >> >> >>>>>>>>>>>>> public
> > > > > > > > > >> >> >>>>>>>>>>>>>>>> API
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> that brings the required job control
> > > > > > > > > >> >> >> capabilities
> > > > > > > > > >> >> >>>> for
> > > > > > > > > >> >> >>>>>>>>>>> downstream
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>> projects.
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> Perhaps it is helpful to look at
> > some
> > > > of the
> > > > > > > > > >> >> >>>>> interfaces
> > > > > > > > > >> >> >>>>>> in
> > > > > > > > > >> >> >>>>>>>>>>> Beam
> > > > > > > > > >> >> >>>>>>>>>>>>> while
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> thinking about this: [2] for the
> > > > portable
> > > > > > job
> > > > > > > > > >> >> >> API
> > > > > > > > > >> >> >>>> and
> > > > > > > > > >> >> >>>>>> [3]
> > > > > > > > > >> >> >>>>>>>> for
> > > > > > > > > >> >> >>>>>>>>>>> the
> > > > > > > > > >> >> >>>>>>>>>>>>> old
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> asynchronous job control from the
> > Beam
> > > > Java
> > > > > > > SDK.
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> The backward compatibility
> > discussion
> > > > [4] is
> > > > > > > > > >> >> >> also
> > > > > > > > > >> >> >>>>>> relevant
> > > > > > > > > >> >> >>>>>>>>>>> here. A
> > > > > > > > > >> >> >>>>>>>>>>>>> new
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>> API
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> should shield downstream projects
> > from
> > > > > > > internals
> > > > > > > > > >> >> >>> and
> > > > > > > > > >> >> >>>>>> allow
> > > > > > > > > >> >> >>>>>>>>>>> them to
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> interoperate with multiple future
> > > Flink
> > > > > > > versions
> > > > > > > > > >> >> >>> in
> > > > > > > > > >> >> >>>>> the
> > > > > > > > > >> >> >>>>>>> same
> > > > > > > > > >> >> >>>>>>>>>>>> release
> > > > > > > > > >> >> >>>>>>>>>>>>>>>> line
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> without forced upgrades.
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> Thanks,
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> Thomas
> [1] https://github.com/apache/flink/pull/7249
> [2]
> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> [3]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> [4]
> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
>
> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zjffdu@gmail.com> wrote:
>
> > > I'm not so sure whether the user should be able to define where the
> > > job runs (in your example Yarn). This is actually independent of the
> > > job development and is something which is decided at deployment time.
> >
> > Users don't need to specify the execution mode programmatically; they
> > can also pass it via the arguments of the flink run command, e.g.
> >
> > bin/flink run -m yarn-cluster ...
> > bin/flink run -m local ...
> > bin/flink run -m host:port ...
> >
> > Does this make sense to you?
> >
> > > To me it makes sense that the ExecutionEnvironment is not directly
> > > initialized by the user and is instead context sensitive to how you
> > > want to execute your job (Flink CLI vs. IDE, for example).
> >
> > Right, currently I notice that Flink creates a different
> > ContextExecutionEnvironment for each submission scenario (Flink CLI vs.
> > IDE). To me this is a hack and not very straightforward. What I
> > suggested above is that Flink should always create the same
> > ExecutionEnvironment, just with different configuration, and based on
> > that configuration it would create the proper ClusterClient for the
> > different behaviors.
> >
> > Till Rohrmann <trohrmann@apache.org> wrote on Thu, Dec 20, 2018 at 11:18 PM:
> >
> > > You are probably right that we have code duplication when it comes to
> > > the creation of the ClusterClient. This should be reduced in the
> > > future.
> > >
> > > I'm not so sure whether the user should be able to define where the
> > > job runs (in your example Yarn). This is actually independent of the
> > > job development and is something which is decided at deployment time.
> > > To me it makes sense that the ExecutionEnvironment is not directly
> > > initialized by the user and is instead context sensitive to how you
> > > want to execute your job (Flink CLI vs. IDE, for example). However, I
> > > agree that the ExecutionEnvironment should give you access to the
> > > ClusterClient and to the job (maybe in the form of the JobGraph or a
> > > job plan).
> > >
> > > Cheers,
> > > Till
> > >
> > > On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > >
> > > > Hi Till,
> > > > Thanks for the feedback. You are right that I expect a better
> > > > programmatic job submission/control API which could be used by
> > > > downstream projects, and it would benefit the Flink ecosystem. When I
> > > > look at the code of the Flink scala-shell and sql-client (I believe
> > > > they are not the core of Flink, but belong to its ecosystem), I find
> > > > a lot of duplicated code for creating a ClusterClient from
> > > > user-provided configuration (the configuration format may differ
> > > > between scala-shell and sql-client), which is then used to manipulate
> > > > jobs. I don't think this is convenient for downstream projects. What
> > > > I expect is that a downstream project only needs to provide the
> > > > necessary configuration info (maybe by introducing a class
> > > > FlinkConf), build an ExecutionEnvironment based on this FlinkConf,
> > > > and the ExecutionEnvironment will create the proper ClusterClient.
> > > > This would not only benefit downstream project development but also
> > > > help their integration tests with Flink. Here's one sample code
> > > > snippet that I expect:
> > > >
> > > > val conf = new FlinkConf().mode("yarn")
> > > > val env = new ExecutionEnvironment(conf)
> > > > val jobId = env.submit(...)
> > > > val jobStatus = env.getClusterClient().queryJobStatus(jobId)
> > > > env.getClusterClient().cancelJob(jobId)
> > > >
> > > > What do you think?
> > > >
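To make the shape of that entry point concrete, here is a minimal self-contained Java sketch of the proposed flow. FlinkConf, ExecutionEnvironment, submit, queryJobStatus and cancelJob are all hypothetical names taken from the snippet above, not existing Flink API, and the client is an in-memory stand-in:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Illustrative sketch only: FlinkConf, ExecutionEnvironment and
// ClusterClient here are in-memory stand-ins, not real Flink classes.
public class FlinkConfSketch {

    // Minimal configuration holder; "mode" would decide which client to build.
    static class FlinkConf {
        private String mode = "local";
        FlinkConf mode(String m) { this.mode = m; return this; }
        String getMode() { return mode; }
    }

    // Stand-in for ClusterClient: tracks job status in memory.
    static class ClusterClient {
        private final Map<String, String> jobs = new HashMap<>();
        String submitJob(String jobName) {
            String jobId = UUID.randomUUID().toString();
            jobs.put(jobId, "RUNNING");
            return jobId;
        }
        String queryJobStatus(String jobId) {
            return jobs.getOrDefault(jobId, "UNKNOWN");
        }
        void cancelJob(String jobId) {
            jobs.put(jobId, "CANCELED");
        }
    }

    // The environment builds the proper client from the configuration,
    // so downstream projects never wire up a ClusterClient themselves.
    static class ExecutionEnvironment {
        private final ClusterClient client;
        ExecutionEnvironment(FlinkConf conf) {
            // A real implementation would choose the client type based on
            // conf.getMode() ("yarn", "local", "host:port", ...).
            this.client = new ClusterClient();
        }
        String submit(String jobName) { return client.submitJob(jobName); }
        ClusterClient getClusterClient() { return client; }
    }

    public static void main(String[] args) {
        ExecutionEnvironment env = new ExecutionEnvironment(new FlinkConf().mode("yarn"));
        String jobId = env.submit("wordcount");
        System.out.println(env.getClusterClient().queryJobStatus(jobId)); // RUNNING
        env.getClusterClient().cancelJob(jobId);
        System.out.println(env.getClusterClient().queryJobStatus(jobId)); // CANCELED
    }
}
```

The point of the sketch is the single entry point: a downstream project passes only configuration in, and the environment owns client creation.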
> > > > Till Rohrmann <trohrmann@apache.org> wrote on Tue, Dec 11, 2018 at 6:28 PM:
> > > >
> > > > > Hi Jeff,
> > > > >
> > > > > What you are proposing is to provide the user with better
> > > > > programmatic job control. There was actually an effort to achieve
> > > > > this but it has never been completed [1]. However, there are some
> > > > > improvements in the code base now. Look for example at the
> > > > > NewClusterClient interface, which offers a non-blocking job
> > > > > submission. But I agree that we need to improve Flink in this
> > > > > regard.
> > > > >
> > > > > I would not be in favour of exposing all ClusterClient calls via
> > > > > the ExecutionEnvironment because it would clutter the class and
> > > > > would not be a good separation of concerns. Instead, one idea could
> > > > > be to retrieve the current ClusterClient from the
> > > > > ExecutionEnvironment, which can then be used for cluster and job
> > > > > control. But before we start an effort here, we need to agree on
> > > > > and capture what functionality we want to provide.
> > > > >
> > > > > Initially, the idea was that we have the ClusterDescriptor
> > > > > describing how to talk to a cluster manager like Yarn or Mesos. The
> > > > > ClusterDescriptor can be used for deploying Flink clusters (job and
> > > > > session) and gives you a ClusterClient. The ClusterClient controls
> > > > > the cluster (e.g. submitting jobs, listing all running jobs). And
> > > > > then there was the idea to introduce a JobClient, which you obtain
> > > > > from the ClusterClient, to trigger job-specific operations (e.g.
> > > > > taking a savepoint, cancelling the job).
> > > > >
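That ClusterDescriptor -> ClusterClient -> JobClient layering could be sketched as follows. These are illustrative minimal interfaces with an in-memory implementation, not Flink's actual classes:

```java
import java.util.UUID;

// Hypothetical sketch of the ClusterDescriptor -> ClusterClient -> JobClient
// layering; the names mirror the discussion but are stand-ins, not the real
// Flink classes.
public class ClientLayersSketch {

    // Knows how to talk to a cluster manager (Yarn, Mesos, standalone, ...).
    interface ClusterDescriptor {
        ClusterClient deploySessionCluster();
    }

    // Controls the cluster: submit jobs, list jobs, ...
    interface ClusterClient {
        JobClient submitJob(String jobName);
    }

    // Controls a single job: savepoints, cancellation, ...
    interface JobClient {
        String jobId();
        void cancel();
        boolean isCancelled();
    }

    // A trivial in-memory implementation to show the flow.
    static class StandaloneDescriptor implements ClusterDescriptor {
        public ClusterClient deploySessionCluster() {
            return jobName -> new JobClient() {
                private final String id = UUID.randomUUID().toString();
                private boolean cancelled = false;
                public String jobId() { return id; }
                public void cancel() { cancelled = true; }
                public boolean isCancelled() { return cancelled; }
            };
        }
    }

    public static void main(String[] args) {
        ClusterClient cluster = new StandaloneDescriptor().deploySessionCluster();
        JobClient job = cluster.submitJob("wordcount");
        System.out.println("submitted " + job.jobId());
        job.cancel();
        System.out.println("cancelled: " + job.isCancelled()); // cancelled: true
    }
}
```

Each layer has a single concern: the descriptor deploys, the cluster client manages the cluster, and the job client manages one job.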
> > > > > [1] https://issues.apache.org/jira/browse/FLINK-4272
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > >
> > > > > > Hi Folks,
> > > > > >
> > > > > > I am trying to integrate Flink into Apache Zeppelin, which is an
> > > > > > interactive notebook, and I hit several issues that are caused by
> > > > > > the Flink client API. So I'd like to propose the following
> > > > > > changes to the Flink client API:
> > > > > >
> > > > > > 1. Support non-blocking execution. Currently,
> > > > > > ExecutionEnvironment#execute is a blocking method which does two
> > > > > > things: it first submits the job and then waits for the job until
> > > > > > it is finished. I'd like to introduce a non-blocking execution
> > > > > > method like ExecutionEnvironment#submit which only submits the
> > > > > > job and then returns the jobId to the client, allowing the user
> > > > > > to query the job status via the jobId.
> > > > > >
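The blocking-vs-non-blocking distinction could be sketched like this. Env, execute, submit and queryJobStatus below are illustrative stand-ins built on plain java.util.concurrent, not actual Flink API:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;

// Hypothetical sketch contrasting a blocking execute() with a non-blocking
// submit() that hands back a jobId immediately.
public class SubmitSketch {

    static class Env {
        private final Map<String, CompletableFuture<String>> jobs = new ConcurrentHashMap<>();

        // Blocking: submit, then wait for the result before returning.
        String execute(Runnable job) {
            String jobId = submit(job);
            try {
                return jobs.get(jobId).get(); // waits until the job finishes
            } catch (InterruptedException | ExecutionException e) {
                throw new RuntimeException(e);
            }
        }

        // Non-blocking: start the job and return a jobId right away.
        String submit(Runnable job) {
            String jobId = UUID.randomUUID().toString();
            jobs.put(jobId, CompletableFuture.supplyAsync(() -> {
                job.run();
                return "FINISHED";
            }));
            return jobId;
        }

        // The client can poll the status later using the jobId.
        String queryJobStatus(String jobId) {
            CompletableFuture<String> f = jobs.get(jobId);
            if (f == null) return "UNKNOWN";
            return f.isDone() ? f.getNow("FINISHED") : "RUNNING";
        }
    }

    public static void main(String[] args) {
        Env env = new Env();
        String jobId = env.submit(() -> { /* user job */ });
        System.out.println("submitted " + jobId);   // returns immediately
        System.out.println(env.execute(() -> { })); // blocks, prints FINISHED
    }
}
```

With submit() the caller keeps control of its own thread, which is exactly what a notebook front end like Zeppelin needs while a job runs.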
> > > > > > 2. Add a cancel API in
> > > > > > ExecutionEnvironment/StreamExecutionEnvironment. Currently the
> > > > > > only way to cancel a job is via the CLI (bin/flink), which is not
> > > > > > convenient for downstream projects to use. So I'd like to add a
> > > > > > cancel API to ExecutionEnvironment.
> > > > > >
> > > > > > 3. Add a savepoint API in
> > > > > > ExecutionEnvironment/StreamExecutionEnvironment. This is similar
> > > > > > to the cancel API; we should use ExecutionEnvironment as the
> > > > > > unified API for third parties to integrate with Flink.
> > > > > >
> > > > > > 4. Add a listener for the job execution lifecycle, something
> > > > > > like the following, so that downstream projects can run custom
> > > > > > logic at each stage of the job lifecycle. E.g. Zeppelin would
> > > > > > capture the jobId after the job is submitted and then use this
> > > > > > jobId to cancel the job later when necessary.
> > > > > >
> > > > > > public interface JobListener {
> > > > > >
> > > > > >     void onJobSubmitted(JobID jobId);
> > > > > >
> > > > > >     void onJobExecuted(JobExecutionResult jobResult);
> > > > > >
> > > > > >     void onJobCanceled(JobID jobId);
> > > > > > }
> > > > > >
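A sketch of how a downstream project such as Zeppelin might use such a listener to capture the jobId for later cancellation. JobId and the listener below are simplified stand-ins for the proposed interface (onJobExecuted is omitted for brevity), and the registration mechanism is hypothetical:

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch: JobId and JobListener are simplified stand-ins for
// the proposed interface, not real Flink types.
public class ListenerSketch {

    static final class JobId {
        final String value;
        JobId(String value) { this.value = value; }
    }

    interface JobListener {
        void onJobSubmitted(JobId jobId);
        void onJobCanceled(JobId jobId);
    }

    // What Zeppelin could register: remember the last submitted jobId so
    // the notebook can cancel that job later.
    static class CapturingListener implements JobListener {
        private final AtomicReference<JobId> lastSubmitted = new AtomicReference<>();
        public void onJobSubmitted(JobId jobId) { lastSubmitted.set(jobId); }
        public void onJobCanceled(JobId jobId) { lastSubmitted.compareAndSet(jobId, null); }
        JobId lastSubmitted() { return lastSubmitted.get(); }
    }

    public static void main(String[] args) {
        CapturingListener listener = new CapturingListener();
        JobId jobId = new JobId("job-42");
        listener.onJobSubmitted(jobId); // the environment would fire this on submit
        System.out.println(listener.lastSubmitted().value); // job-42
        listener.onJobCanceled(jobId);  // and this on cancel
        System.out.println(listener.lastSubmitted() == null); // true
    }
}
```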
> > > > > > 5. Enable sessions in ExecutionEnvironment. Currently this is
> > > > > > disabled, but sessions are very convenient for third parties
> > > > > > that submit jobs continually. I hope Flink can enable it again.
> > > > > >
> > > > > > 6. Unify all Flink client APIs into
> > > > > > ExecutionEnvironment/StreamExecutionEnvironment.
> > > > > >
> > > > > > This is a long-term issue which needs more careful thinking and
> > > > > > design. Currently some Flink features are exposed in
> > > > > > ExecutionEnvironment/StreamExecutionEnvironment, but some are
> > > > > > exposed in the CLI instead of the API, like the cancel and
> > > > > > savepoint operations I mentioned above. I think the root cause
> > > > > > is that Flink didn't unify the interaction with Flink. Here I
> > > > > > list 3 scenarios of Flink operation:
> > > > > >
> > > > > >   - Local job execution. Flink will create a LocalEnvironment
> > > > > >     and then use this LocalEnvironment to create a LocalExecutor
> > > > > >     for job execution.
> > > > > >   - Remote job execution. Flink will create a ClusterClient
> > > > > >     first, then create a ContextEnvironment based on the
> > > > > >     ClusterClient, and then run the job.
> > > > > >   - Job cancellation. Flink will create a ClusterClient first
> > > > > >     and then cancel the job via this ClusterClient.
> > > > > >
> > > > > > As you can see, in the above 3 scenarios Flink doesn't use the
> > > > > > same approach (code path) to interact with Flink. What I propose
> > > > > > is the following: create the proper
> > > > > > LocalEnvironment/RemoteEnvironment (based on user configuration)
> > > > > > --> use this Environment to create the proper ClusterClient
> > > > > > (LocalClusterClient or RestClusterClient) to interact with Flink
> > > > > > (job execution or cancellation).
> > > > > >
> > > > > > This way we can unify the process of local execution and remote
> > > > > > execution. And it is much easier for third parties to integrate
> > > > > > with Flink, because ExecutionEnvironment is the unified entry
> > > > > > point for Flink. What a third party needs to do is just pass
> > > > > > configuration to ExecutionEnvironment
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> and
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment will do
> > the
> > > > right
> > > > > > > > > >> >> >> thing
> > > > > > > > > >> >> >>>>> based
> > > > > > > > > >> >> >>>>>> on
> > > > > > > > > >> >> >>>>>>>> the
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> configuration.
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Flink cli can also be
> > considered
> > > as
> > > > > > flink
> > > > > > > > > >> >> >> api
> > > > > > > > > >> >> >>>>>>> consumer.
> > > > > > > > > >> >> >>>>>>>>>>> it
> > > > > > > > > >> >> >>>>>>>>>>>>>>>> just
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> pass
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> the
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> configuration to
> > > > ExecutionEnvironment
> > > > > > and
> > > > > > > > > >> >> >> let
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> ExecutionEnvironment
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> to
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> create the proper ClusterClient
> > > > instead
> > > > > > > of
> > > > > > > > > >> >> >>>>> letting
> > > > > > > > > >> >> >>>>>>> cli
> > > > > > > > > >> >> >>>>>>>> to
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>> create
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ClusterClient directly.
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 6 would involve large code
> > > > refactoring,
> > > > > > > so
> > > > > > > > > >> >> >> I
> > > > > > > > > >> >> >>>>> think
> > > > > > > > > >> >> >>>>>> we
> > > > > > > > > >> >> >>>>>>>> can
> > > > > > > > > >> >> >>>>>>>>>>>>>>>> defer
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>> it
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> for
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> future release, 1,2,3,4,5 could
> > > be
> > > > done
> > > > > > > at
> > > > > > > > > >> >> >>>> once I
> > > > > > > > > >> >> >>>>>>>>>>> believe.
> > > > > > > > > >> >> >>>>>>>>>>>>>>>> Let
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>> me
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> know
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> your
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> comments and feedback, thanks
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> --
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Best Regards
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> --
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> Best Regards
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> Jeff Zhang
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> --
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Best Regards
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Jeff Zhang
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>> --
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>> Best Regards
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>> Jeff Zhang
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>> --
> > > > > > > > > >> >> >>>>>>>>>>>>>>> Best Regards
> > > > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>> Jeff Zhang
> > > > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>> --
> > > > > > > > > >> >> >>>>>>>>>>>>> Best Regards
> > > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>> Jeff Zhang
> > > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > >> >> >>>>>>>> --
> > > > > > > > > >> >> >>>>>>>> Best Regards
> > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > >> >> >>>>>>>> Jeff Zhang
> > > > > > > > > >> >> >>>>>>>>
> > > > > > > > > >> >> >>>>>>>
> > > > > > > > > >> >> >>>>>>
> > > > > > > > > >> >> >>>>>
> > > > > > > > > >> >> >>>>>
> > > > > > > > > >> >> >>>>> --
> > > > > > > > > >> >> >>>>> Best Regards
> > > > > > > > > >> >> >>>>>
> > > > > > > > > >> >> >>>>> Jeff Zhang
> > > > > > > > > >> >> >>>>>
> > > > > > > > > >> >> >>>>
> > > > > > > > > >> >> >>>
> > > > > > > > > >> >> >>
> > > > > > > > > >> >> >
> > > > > > > > > >> >> >
> > > > > > > > > >> >> > --
> > > > > > > > > >> >> > Best Regards
> > > > > > > > > >> >> >
> > > > > > > > > >> >> > Jeff Zhang
> > > > > > > > > >> >>
> > > > > > > > > >> >>
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > > >
> > >
> >
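The code-path unification proposed in the quoted message can be sketched roughly as follows. This is an illustrative sketch only: the class and method names here (the `ExecutionEnvironment` constructor, `createClusterClient`, the `"execution.target"` key) stand in for whatever the final design chooses, and the real LocalClusterClient/RestClusterClient do far more than this.

```java
import java.util.Map;

// Sketch of the unified code path: the environment, not the caller,
// decides which client to create based on configuration.
interface ClusterClient {
    String submitJob(String jobName);   // returns a job id
    void cancelJob(String jobId);
}

class LocalClusterClient implements ClusterClient {
    public String submitJob(String jobName) { return "local-" + jobName; }
    public void cancelJob(String jobId) { /* stop the local executor */ }
}

class RestClusterClient implements ClusterClient {
    public String submitJob(String jobName) { return "remote-" + jobName; }
    public void cancelJob(String jobId) { /* call the REST cancel endpoint */ }
}

class ExecutionEnvironment {
    private final Map<String, String> config;

    ExecutionEnvironment(Map<String, String> config) { this.config = config; }

    // The environment owns client creation; callers (including the CLI)
    // never construct a ClusterClient themselves.
    ClusterClient createClusterClient() {
        return "local".equals(config.get("execution.target"))
                ? new LocalClusterClient()
                : new RestClusterClient();
    }
}
```

The point is that callers only ever touch the environment; which client gets created is purely a function of configuration, so local and remote execution share one code path.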

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Yang Wang <da...@gmail.com>.
Hi Zili,

It makes sense to me that a dedicated cluster is started for a per-job
cluster and will not accept more jobs.
I just have a question about the command line.

Currently we could use the following commands to start different clusters.
*per-job cluster*
./bin/flink run -d -p 5 -ynm perjob-cluster1 -m yarn-cluster
examples/streaming/WindowJoin.jar
*session cluster*
./bin/flink run -p 5 -ynm session-cluster1 -m yarn-cluster
examples/streaming/WindowJoin.jar

What will it look like after client enhancement?


Best,
Yang

Zili Chen <wa...@gmail.com> wrote on Fri, Aug 23, 2019 at 10:46 PM:

> Hi Till,
>
> Thanks for your update. Nice to hear :-)
>
> Best,
> tison.
>
>
> Till Rohrmann <tr...@apache.org> wrote on Fri, Aug 23, 2019 at 10:39 PM:
>
> > Hi Tison,
> >
> > just a quick comment concerning the class loading issues when using the
> per
> > job mode. The community wants to change it so that the
> > StandaloneJobClusterEntryPoint actually uses the user code class loader
> > with child first class loading [1]. Hence, I hope that this problem will
> be
> > resolved soon.
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-13840
> >
> > Cheers,
> > Till
> >
> > On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas <kk...@gmail.com>
> wrote:
> >
> > > Hi all,
> > >
> > > On the topic of web submission, I agree with Till that it only seems
> > > to complicate things.
> > > It is bad for security, job isolation (anybody can submit/cancel jobs),
> > > and its
> > > implementation complicates some parts of the code. So, if we were to
> > > redesign the
> > > WebUI, maybe this part could be left out. In addition, I would say
> > > that the ability to cancel
> > > jobs could also be left out.
> > >
> > > I would also be in favour of removing the "detached" mode, for
> > > the reasons mentioned
> > > above (i.e. because now we will have a future representing the result
> > > on which the user
> > > can choose to wait or not).
> > >
> > > Now for the separating job submission and cluster creation, I am in
> > > favour of keeping both.
> > > Once again, the reasons are mentioned above by Stephan, Till, Aljoscha
> > > and also Zili seems
> > > to agree. They mainly have to do with security, isolation and ease of
> > > resource management
> > > for the user as he knows that "when my job is done, everything will be
> > > cleared up". This is
> > > also the experience you get when launching a process on your local OS.
> > >
> > > On excluding the per-job mode from returning a JobClient or not, I
> > > believe that eventually
> > > it would be nice to allow users to get back a jobClient. The reason is
> > > that 1) I cannot
> > > find any objective reason why the user-experience should diverge, and
> > > 2) this will be the
> > > way that the user will be able to interact with his running job.
> > > Assuming that the necessary
> > > ports are open for the REST API to work, then I think that the
> > > JobClient can run against the
> > > REST API without problems. If the needed ports are not open, then we
> > > are safe to not return
> > > a JobClient, as the user explicitly chose to close all points of
> > > communication to his running job.
> > >
> > > On the topic of not hijacking the "env.execute()" in order to get the
> > > Plan, I definitely agree but
> > > for the proposal of having a "compile()" method in the env, I would
> > > like to have a better look at
> > > the existing code.
> > >
> > > Cheers,
> > > Kostas
> > >
> > > On Fri, Aug 23, 2019 at 5:52 AM Zili Chen <wa...@gmail.com>
> wrote:
> > > >
> > > > Hi Yang,
> > > >
> > > > It would be helpful if you check Stephan's last comment,
> > > > which states that isolation is important.
> > > >
> > > > For per-job mode, we run a dedicated cluster (maybe it
> > > > should have been a couple of JMs and TMs during the FLIP-6
> > > > design) for a specific job. Thus the process is isolated
> > > > from other jobs.
> > > >
> > > > In our case there was a time we suffered from multiple
> > > > jobs submitted by different users that affected
> > > > each other so that all ran into an error state. Also,
> > > > running the client inside the cluster could save client
> > > > resources at some points.
> > > >
> > > > However, we also face several issues, as you mentioned:
> > > > in per-job mode it always uses the parent classloader,
> > > > thus classloading issues occur.
> > > >
> > > > BTW, one can make an analogy between session/per-job mode
> > > > in Flink and client/cluster mode in Spark.
> > > >
> > > > Best,
> > > > tison.
> > > >
> > > >
> > > > Yang Wang <da...@gmail.com> wrote on Thu, Aug 22, 2019 at 11:25 AM:
> > > >
> > > > > From the user's perspective, it is really confusing what the scope
> > > > > of a per-job cluster is.
> > > > >
> > > > >
> > > > > If it means a Flink cluster with a single job, then we could get
> > > > > better isolation.
> > > > >
> > > > > Now it does not matter how we deploy the cluster: deploy it
> > > > > directly (mode 1), or start a Flink cluster and then submit the
> > > > > job through a cluster client (mode 2).
> > > > >
> > > > >
> > > > > Otherwise, if it just means deploying directly, how should we name
> > > > > mode 2: "session with job", or something else?
> > > > >
> > > > > We could also benefit from mode 2: users could get the same
> > > > > isolation as in mode 1, and the user code and dependencies will be
> > > > > loaded by the user class loader to avoid class conflicts with the
> > > > > framework.
> > > > >
> > > > >
> > > > >
> > > > > Anyway, both submission modes are useful.
> > > > >
> > > > > We just need to clarify the concepts.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Best,
> > > > >
> > > > > Yang
> > > > >
> > > > > Zili Chen <wa...@gmail.com> wrote on Tue, Aug 20, 2019 at 5:58 PM:
> > > > >
> > > > > > Thanks for the clarification.
> > > > > >
> > > > > > The idea of a JobDeployer came into my mind when I was puzzling
> > > > > > over how to execute per-job mode and session mode with the same
> > > > > > user code and framework code path.
> > > > > >
> > > > > > With the JobDeployer concept we come back to the statement that
> > > > > > the environment knows every config for cluster deployment and
> > > > > > job submission. We configure, or generate from configuration, a
> > > > > > specific JobDeployer in the environment, and then the code
> > > > > > aligns on
> > > > > >
> > > > > > *JobClient client = env.execute().get();*
> > > > > >
> > > > > > which in session mode is returned by clusterClient.submitJob and
> > > > > > in per-job mode by clusterDescriptor.deployJobCluster.
> > > > > >
> > > > > > One problem is that currently we directly run the
> > > > > > ClusterEntrypoint with the extracted job graph. Following the
> > > > > > JobDeployer approach, we had better align the entry point of
> > > > > > per-job deployment at the JobDeployer: users run their main
> > > > > > method directly, or via the CLI (which finally calls the main
> > > > > > method), to deploy the job cluster.
> > > > > >
> > > > > > Best,
> > > > > > tison.
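Sketched in code, the alignment tison describes could read roughly as below. All names are illustrative, not existing Flink classes; the JobDeployer idea itself is only a proposal in this thread.

```java
import java.util.concurrent.CompletableFuture;

// Sketch: both session and per-job execution resolve to the same
// env.execute() call site; only the deployer behind it differs.
interface JobClient {
    String jobId();
}

interface JobDeployer {
    CompletableFuture<JobClient> deploy(String jobGraph);
}

class SessionDeployer implements JobDeployer {
    // session mode: submit to an existing cluster via a cluster client
    public CompletableFuture<JobClient> deploy(String jobGraph) {
        JobClient client = () -> "session-job";
        return CompletableFuture.completedFuture(client);
    }
}

class PerJobDeployer implements JobDeployer {
    // per-job mode: spin up a dedicated cluster carrying the job
    public CompletableFuture<JobClient> deploy(String jobGraph) {
        JobClient client = () -> "per-job";
        return CompletableFuture.completedFuture(client);
    }
}

class Environment {
    private final JobDeployer deployer;

    Environment(JobDeployer deployer) { this.deployer = deployer; }

    // identical call site for both modes: the user code never changes
    CompletableFuture<JobClient> execute(String jobGraph) {
        return deployer.deploy(jobGraph);
    }
}
```

User code calls `env.execute(...)` and gets a future of a JobClient either way; the configured deployer is the only moving part.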
> > > > > >
> > > > > >
> > > > > > Stephan Ewen <se...@apache.org> wrote on Tue, Aug 20, 2019 at 4:40 PM:
> > > > > >
> > > > > > > Till has made some good comments here.
> > > > > > >
> > > > > > > Two things to add:
> > > > > > >
> > > > > > >   - The job mode is very nice in the way that it runs the
> client
> > > inside
> > > > > > the
> > > > > > > cluster (in the same image/process that is the JM) and thus
> > unifies
> > > > > both
> > > > > > > applications and what the Spark world calls the "driver mode".
> > > > > > >
> > > > > > >   - Another thing I would add is that during the FLIP-6 design,
> > we
> > > were
> > > > > > > thinking about setups where Dispatcher and JobManager are
> > separate
> > > > > > > processes.
> > > > > > >     A Yarn or Mesos Dispatcher of a session could run
> > independently
> > > > > (even
> > > > > > > as privileged processes executing no code).
> > > > > > >     Then the "per-job" mode could still be helpful: when a job
> > > > > > > is submitted to the dispatcher, it launches the JM again in
> > > > > > > per-job mode, so that JM and TM processes are bound to the job
> > > > > > > only. For higher security setups, it is important that
> > > > > > > processes are not reused across jobs.
> > > > > > >
> > > > > > > On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <
> > > trohrmann@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I would not be in favour of getting rid of the per-job mode
> > > since it
> > > > > > > > simplifies the process of running Flink jobs considerably.
> > > Moreover,
> > > > > it
> > > > > > > is
> > > > > > > > not only well suited for container deployments but also for
> > > > > deployments
> > > > > > > > where you want to guarantee job isolation. For example, a
> user
> > > could
> > > > > > use
> > > > > > > > the per-job mode on Yarn to execute his job on a separate
> > > cluster.
> > > > > > > >
> > > > > > > > I think that having two notions of cluster deployments
> (session
> > > vs.
> > > > > > > per-job
> > > > > > > > mode) does not necessarily contradict your ideas for the
> client
> > > api
> > > > > > > > refactoring. For example one could have the following
> > interfaces:
> > > > > > > >
> > > > > > > > - ClusterDeploymentDescriptor: encapsulates the logic how to
> > > deploy a
> > > > > > > > cluster.
> > > > > > > > - ClusterClient: allows to interact with a cluster
> > > > > > > > - JobClient: allows to interact with a running job
> > > > > > > >
> > > > > > > > Now the ClusterDeploymentDescriptor could have two methods:
> > > > > > > >
> > > > > > > > - ClusterClient deploySessionCluster()
> > > > > > > > - JobClusterClient/JobClient deployPerJobCluster(JobGraph)
> > > > > > > >
> > > > > > > > where JobClusterClient is either a supertype of
> > > > > > > > ClusterClient which does not give you the functionality to
> > > > > > > > submit jobs, or deployPerJobCluster directly returns a
> > > > > > > > JobClient.
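A minimal sketch of those three interfaces, under the assumption that deployPerJobCluster returns a JobClient directly. InMemoryDescriptor is a toy stand-in for a YARN/Mesos/Kubernetes implementation, and the method signatures are illustrative only.

```java
// Sketch of the proposed separation; JobGraph is simplified to a String.
interface JobClient {
    String jobId();
    void cancel();
}

// allows to interact with a cluster, including submitting new jobs
interface ClusterClient {
    JobClient submitJob(String jobGraph);
    void shutdown();
}

// encapsulates how to deploy a cluster
interface ClusterDeploymentDescriptor {
    // session mode: the cluster outlives individual jobs
    ClusterClient deploySessionCluster();

    // per-job mode: the cluster is bound to exactly one job, so the caller
    // gets a JobClient directly and can never submit a second job to it
    JobClient deployPerJobCluster(String jobGraph);
}

// Toy implementation so the contract can be exercised locally.
class InMemoryDescriptor implements ClusterDeploymentDescriptor {
    private int counter = 0;

    private JobClient newJob() {
        final String id = "job-" + counter++;
        return new JobClient() {
            public String jobId() { return id; }
            public void cancel() { /* no-op in the toy implementation */ }
        };
    }

    public ClusterClient deploySessionCluster() {
        return new ClusterClient() {
            public JobClient submitJob(String jobGraph) { return newJob(); }
            public void shutdown() { /* no-op */ }
        };
    }

    public JobClient deployPerJobCluster(String jobGraph) {
        return newJob();
    }
}
```

Note the asymmetry in the return types: only the session path hands back something that can accept further jobs, which encodes the job-isolation guarantee in the type system.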
> > > > > > > >
> > > > > > > > When setting up the ExecutionEnvironment, one would then not
> > > provide
> > > > > a
> > > > > > > > ClusterClient to submit jobs but a JobDeployer which,
> depending
> > > on
> > > > > the
> > > > > > > > selected mode, either uses a ClusterClient (session mode) to
> > > submit
> > > > > > jobs
> > > > > > > or
> > > > > > > > a ClusterDeploymentDescriptor to deploy a per-job mode
> > > > > > > > cluster with the job to execute.
> > > > > > > >
> > > > > > > > These are just some thoughts how one could make it working
> > > because I
> > > > > > > > believe there is some value in using the per job mode from
> the
> > > > > > > > ExecutionEnvironment.
> > > > > > > >
> > > > > > > > Concerning the web submission, this is indeed a bit tricky.
> > From
> > > a
> > > > > > > cluster
> > > > > > > > management stand point, I would in favour of not executing
> user
> > > code
> > > > > on
> > > > > > > the
> > > > > > > > REST endpoint. Especially when considering security, it would
> > be
> > > good
> > > > > > to
> > > > > > > > have a well defined cluster behaviour where it is explicitly
> > > stated
> > > > > > where
> > > > > > > > user code and, thus, potentially risky code is executed.
> > Ideally
> > > we
> > > > > > limit
> > > > > > > > it to the TaskExecutor and JobMaster.
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Till
> > > > > > > >
> > > > > > > > On Tue, Aug 20, 2019 at 9:40 AM Flavio Pompermaier <
> > > > > > pompermaier@okkam.it
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > In my opinion the client should not use any environment to
> > > > > > > > > get the job graph, because the jar should reside ONLY on
> > > > > > > > > the cluster (and not in the client classpath; otherwise
> > > > > > > > > there are always inconsistencies between the client and
> > > > > > > > > the Flink Job Manager's classpath).
> > > > > > > > > In the YARN, Mesos and Kubernetes scenarios you have the
> jar
> > > but
> > > > > you
> > > > > > > > could
> > > > > > > > > start a cluster that has the jar on the Job Manager as
> > > > > > > > > well (but this is the only case where I think you can
> > > > > > > > > assume that the client has the jar on the classpath; in
> > > > > > > > > the REST job submission you don't have any classpath).
> > > > > > > > >
> > > > > > > > > Thus, always in my opinion, the JobGraph should be
> generated
> > > by the
> > > > > > Job
> > > > > > > > > Manager REST API.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <
> > > wander4096@gmail.com>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> I would like to involve Till & Stephan here to clarify
> > > > > > > > >> some concepts of per-job mode.
> > > > > > > > >>
> > > > > > > > >> The term per-job is one of modes a cluster could run on.
> It
> > is
> > > > > > mainly
> > > > > > > > >> aimed
> > > > > > > > >> at spawn
> > > > > > > > >> a dedicated cluster for a specific job while the job could
> > be
> > > > > > packaged
> > > > > > > > >> with
> > > > > > > > >> Flink
> > > > > > > > >> itself and thus the cluster initialized with job so that
> get
> > > rid
> > > > > of
> > > > > > a
> > > > > > > > >> separated
> > > > > > > > >> submission step.
> > > > > > > > >>
> > > > > > > > >> This is useful for container deployments where one create
> > his
> > > > > image
> > > > > > > with
> > > > > > > > >> the job
> > > > > > > > >> and then simply deploy the container.
> > > > > > > > >>
> > > > > > > > >> However, it is out of client scope since a
> > > client(ClusterClient
> > > > > for
> > > > > > > > >> example) is for
> > > > > > > > >> communicate with an existing cluster and performance
> > actions.
> > > > > > > Currently,
> > > > > > > > >> in
> > > > > > > > >> per-job
> > > > > > > > >> mode, we extract the job graph and bundle it into cluster
> > > > > deployment
> > > > > > > and
> > > > > > > > >> thus no
> > > > > > > > >> concept of client get involved. It looks like reasonable
> to
> > > > > exclude
> > > > > > > the
> > > > > > > > >> deployment
> > > > > > > > >> of per-job cluster from client api and use dedicated
> utility
> > > > > > > > >> classes(deployers) for
> > > > > > > > >> deployment.
> > > > > > > > >>
> > > > > > > > >> Zili Chen <wa...@gmail.com> wrote on Tue, Aug 20, 2019 at 12:37 PM:
> > > > > > > > >>
> > > > > > > > >> > Hi Aljoscha,
> > > > > > > > >> >
> > > > > > > > >> > Thanks for your reply and participance. The Google Doc
> you
> > > > > linked
> > > > > > to
> > > > > > > > >> > requires
> > > > > > > > >> > permission and I think you could use a share link
> instead.
> > > > > > > > >> >
> > > > > > > > >> > I agree that we have almost reached a consensus that a
> > > > > > > > >> > JobClient is necessary to interact with a running job.
> > > > > > > > >> >
> > > > > > > > >> > Let me check your open questions one by one.
> > > > > > > > >> >
> > > > > > > > >> > 1. Separate cluster creation and job submission for
> > per-job
> > > > > mode.
> > > > > > > > >> >
> > > > > > > > >> > As you mentioned here is where the opinions diverge. In
> my
> > > > > > document
> > > > > > > > >> there
> > > > > > > > >> > is
> > > > > > > > >> > an alternative[2] that proposes excluding per-job
> > deployment
> > > > > from
> > > > > > > > client
> > > > > > > > >> > api
> > > > > > > > >> > scope and now I find it is more reasonable we do the
> > > exclusion.
> > > > > > > > >> >
> > > > > > > > >> > When in per-job mode, a dedicated JobCluster is launched
> > to
> > > > > > execute
> > > > > > > > the
> > > > > > > > >> > specific job. It is more like a Flink application than
> > > > > > > > >> > a submission of a Flink job. The client only takes care
> > > > > > > > >> > of job submission and assumes there is an existing
> > > > > > > > >> > cluster. In this way we are able to consider per-job
> > > > > > > > >> > issues individually, and JobClusterEntrypoint would be
> > > > > > > > >> > the utility class for per-job deployment.
> > > > > > > > >> >
> > > > > > > > >> > Nevertheless, the user program works in both session
> > > > > > > > >> > mode and per-job mode without needing to change code.
> > > > > > > > >> > The JobClient in per-job mode is returned from
> > > > > > > > >> > env.execute as normal. However, it would no longer be a
> > > > > > > > >> > wrapper of RestClusterClient but a wrapper of
> > > > > > > > >> > PerJobClusterClient, which communicates with the
> > > > > > > > >> > Dispatcher locally.
> > > > > > > > >> >
> > > > > > > > >> > 2. How to deal with plan preview.
> > > > > > > > >> >
> > > > > > > > >> > With env.compile functions, users can get the JobGraph
> > > > > > > > >> > or FlinkPlan and thus preview the plan programmatically.
> > > > > > > > >> > Typically it looks like
> > > > > > > > >> >
> > > > > > > > >> > if (preview configured) {
> > > > > > > > >> >     FlinkPlan plan = env.compile();
> > > > > > > > >> >     new JSONDumpGenerator(...).dump(plan);
> > > > > > > > >> > } else {
> > > > > > > > >> >     env.execute();
> > > > > > > > >> > }
> > > > > > > > >> >
> > > > > > > > >> > And `flink info` would no longer be valid.
> > > > > > > > >> >
> > > > > > > > >> > 3. How to deal with Jar Submission at the Web Frontend.
> > > > > > > > >> >
> > > > > > > > >> > There is one more thread discussing this topic [1].
> > > > > > > > >> > Apart from removing the functions, there are two
> > > > > > > > >> > alternatives.
> > > > > > > > >> >
> > > > > > > > >> > One is to introduce an interface with a method that
> > > > > > > > >> > returns a JobGraph/FlinkPlan, and have Jar Submission
> > > > > > > > >> > only support main classes implementing this interface.
> > > > > > > > >> > Then we extract the JobGraph/FlinkPlan just by calling
> > > > > > > > >> > the method. In this way, it is even possible to consider
> > > > > > > > >> > a separation of job creation and job submission.
> > > > > > > > >> >
> > > > > > > > >> > The other is, as you mentioned, to let execute() do the
> > > > > > > > >> > actual execution. We won't execute the main method in
> > > > > > > > >> > the WebFrontend but spawn a process on the WebMonitor
> > > > > > > > >> > side to execute it. For the return part, we could
> > > > > > > > >> > generate the JobID in the WebMonitor and pass it to the
> > > > > > > > >> > execution environment.
> > > > > > > > >> >
> > > > > > > > >> > 4. How to deal with detached mode.
> > > > > > > > >> >
> > > > > > > > >> > I think detached mode is a temporary solution for
> > > > > > > > >> > non-blocking submission. In my document, both submission
> > > > > > > > >> > and execution return a CompletableFuture, and users
> > > > > > > > >> > control whether or not to wait for the result. At this
> > > > > > > > >> > point we don't need a detached option, but the
> > > > > > > > >> > functionality is covered.
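The idea that "detached" collapses into a choice at the call site can be illustrated with a small sketch. The names (`executeAsync`, the result strings) are made up for the example; this is not claimed to be the actual Flink API.

```java
import java.util.concurrent.CompletableFuture;

class SubmissionDemo {
    // Sketch: submission always returns a future immediately; "detached"
    // vs "attached" is then just whether the caller waits on it.
    static CompletableFuture<String> executeAsync(String jobName) {
        return CompletableFuture.supplyAsync(() -> {
            // stand-in for the job running to completion on the cluster
            return jobName + ":FINISHED";
        });
    }

    static String runAttached(String jobName) {
        // "attached": block until the job result is available
        return executeAsync(jobName).join();
    }

    static CompletableFuture<String> runDetached(String jobName) {
        // "detached": hand the future back and return immediately
        return executeAsync(jobName);
    }
}
```

With this shape there is nothing left for a `--detached` flag to configure: both behaviors fall out of the same returned future.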
> > > > > > > > >> >
> > > > > > > > >> > 5. How does per-job mode interact with interactive
> > > programming.
> > > > > > > > >> >
> > > > > > > > >> > All of the YARN, Mesos and Kubernetes scenarios now
> > > > > > > > >> > follow the pattern of launching a JobCluster. And I
> > > > > > > > >> > don't think there would be inconsistency between the
> > > > > > > > >> > different resource managers.
> > > > > > > > >> >
> > > > > > > > >> > Best,
> > > > > > > > >> > tison.
> > > > > > > > >> >
> > > > > > > > >> > [1]
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> >
> https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
> > > > > > > > >> > [2]
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> >
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
> > > > > > > > >> >
> > > > > > > > >> > Aljoscha Krettek <al...@apache.org> wrote on Fri, Aug 16, 2019 at 9:20 PM:
> > > > > > > > >> >
> > > > > > > > >> >> Hi,
> > > > > > > > >> >>
> > > > > > > > >> >> I read both Jeffs initial design document and the newer
> > > > > document
> > > > > > by
> > > > > > > > >> >> Tison. I also finally found the time to collect our
> > > thoughts on
> > > > > > the
> > > > > > > > >> issue,
> > > > > > > > >> >> I had quite some discussions with Kostas and this is
> the
> > > > > result:
> > > > > > > [1].
> > > > > > > > >> >>
> > > > > > > > >> >> I think overall we agree that this part of the code is
> in
> > > dire
> > > > > > need
> > > > > > > > of
> > > > > > > > >> >> some refactoring/improvements but I think there are
> still
> > > some
> > > > > > open
> > > > > > > > >> >> questions and some differences in opinion what those
> > > > > refactorings
> > > > > > > > >> should
> > > > > > > > >> >> look like.
> > > > > > > > >> >>
> > > > > > > > >> >> I think the API-side is quite clear, i.e. we need some
> > > > > JobClient
> > > > > > > API
> > > > > > > > >> that
> > > > > > > > >> >> allows interacting with a running Job. It could be
> > > worthwhile
> > > > > to
> > > > > > > spin
> > > > > > > > >> that
> > > > > > > > >> >> off into a separate FLIP because we can probably find
> > > consensus
> > > > > > on
> > > > > > > > that
> > > > > > > > >> >> part more easily.
> > > > > > > > >> >>
> > > > > > > > >> >> For the rest, the main open questions from our doc are
> > > these:
> > > > > > > > >> >>
> > > > > > > > >> >>   - Do we want to separate cluster creation and job
> > > submission
> > > > > > for
> > > > > > > > >> >> per-job mode? In the past, there were conscious efforts
> > to
> > > > > *not*
> > > > > > > > >> separate
> > > > > > > > >> >> job submission from cluster creation for per-job
> clusters
> > > for
> > > > > > > Mesos,
> > > > > > > > >> YARN,
> > > > > > > > >> >> Kubernetes (see StandaloneJobClusterEntryPoint). Tison
> > > suggests
> > > > > in
> > > > > > > his
> > > > > > > > >> >> design document to decouple this in order to unify job
> > > > > > submission.
> > > > > > > > >> >>
> > > > > > > > >> >>   - How to deal with plan preview, which needs to
> hijack
> > > > > > execute()
> > > > > > > > and
> > > > > > > > >> >> let the outside code catch an exception?
> > > > > > > > >> >>
> > > > > > > > >> >>   - How to deal with Jar Submission at the Web
> Frontend,
> > > which
> > > > > > > needs
> > > > > > > > to
> > > > > > > > >> >> hijack execute() and let the outside code catch an
> > > exception?
> > > > > > > > >> >> CliFrontend.run() “hijacks”
> > ExecutionEnvironment.execute()
> > > to
> > > > > > get a
> > > > > > > > >> >> JobGraph and then execute that JobGraph manually. We
> > could
> > > get
> > > > > > > around
> > > > > > > > >> that
> > > > > > > > >> >> by letting execute() do the actual execution. One
> caveat
> > > for
> > > > > this
> > > > > > > is
> > > > > > > > >> that
> > > > > > > > >> >> now the main() method doesn’t return (or is forced to
> > > return by
> > > > > > > > >> throwing an
> > > > > > > > >> >> exception from execute()) which means that for Jar
> > > Submission
> > > > > > from
> > > > > > > > the
> > > > > > > > >> >> WebFrontend we have a long-running main() method
> running
> > > in the
> > > > > > > > >> >> WebFrontend. This doesn’t sound very good. We could get
> > > around
> > > > > > this
> > > > > > > > by
> > > > > > > > >> >> removing the plan preview feature and by removing Jar
> > > > > > > > >> Submission/Running.
> > > > > > > > >> >>
> > > > > > > > >> >>   - How to deal with detached mode? Right now,
> > > > > > DetachedEnvironment
> > > > > > > > will
> > > > > > > > >> >> execute the job and return immediately. If users
> control
> > > when
> > > > > > they
> > > > > > > > >> want to
> > > > > > > > >> >> return, by waiting on the job completion future, how do
> > we
> > > deal
> > > > > > > with
> > > > > > > > >> this?
> > > > > > > > >> >> Do we simply remove the distinction between
> > > > > > detached/non-detached?
> > > > > > > > >> >>
> > > > > > > > >> >>   - How does per-job mode interact with “interactive
> > > > > programming”
> > > > > > > > >> >> (FLIP-36). For YARN, each execute() call could spawn a
> > new
> > > > > Flink
> > > > > > > YARN
> > > > > > > > >> >> cluster. What about Mesos and Kubernetes?
> > > > > > > > >> >>
> > > > > > > > >> >> The first open question is where the opinions diverge,
> I
> > > think.
> > > > > > The
> > > > > > > > >> rest
> > > > > > > > >> >> are just open questions and interesting things that we
> > > need to
> > > > > > > > >> consider.
> > > > > > > > >> >>
> > > > > > > > >> >> Best,
> > > > > > > > >> >> Aljoscha
> > > > > > > > >> >>
> > > > > > > > >> >> [1]
> > > > > > > > >> >>
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> >
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> > > > > > > > >> >> <
> > > > > > > > >> >>
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> >
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> > > > > > > > >> >> >
> > > > > > > > >> >>
> > > > > > > > >> >> > On 31. Jul 2019, at 15:23, Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > > > > > >> >> >
> > > > > > > > >> >> > Thanks tison for the effort. I left a few comments.
> > > > > > > > >> >> >
> > > > > > > > >> >> >
> > > > > > > > >> >> > Zili Chen <wa...@gmail.com> wrote on Wed, Jul 31, 2019 at 8:24 PM:
> > > > > > > > >> >> >
> > > > > > > > >> >> >> Hi Flavio,
> > > > > > > > >> >> >>
> > > > > > > > >> >> >> Thanks for your reply.
> > > > > > > > >> >> >>
> > > > > > > > >> >> >> In both the current implementation and the design, ClusterClient
> > > > > > > > >> >> >> never takes responsibility for generating the JobGraph (what you
> > > > > > > > >> >> >> see in the current codebase is several class methods).
> > > > > > > > >> >> >>
> > > > > > > > >> >> >> Instead, the user describes the program in the main method with
> > > > > > > > >> >> >> ExecutionEnvironment APIs and calls env.compile() or
> > > > > > > > >> >> >> env.optimize() to get a FlinkPlan or JobGraph respectively.
> > > > > > > > >> >> >>
> > > > > > > > >> >> >> For listing the main classes in a jar and choosing one for
> > > > > > > > >> >> >> submission, you are already able to customize a CLI to do it.
> > > > > > > > >> >> >> Specifically, the path of the jar is passed as an argument, and
> > > > > > > > >> >> >> in the customized CLI you list the main classes and choose one to
> > > > > > > > >> >> >> submit to the cluster.
> > > > > > > > >> >> >>
> > > > > > > > >> >> >> Best,
> > > > > > > > >> >> >> tison.
> > > > > > > > >> >> >>
> > > > > > > > >> >> >>
> > > > > > > > >> >> >> Flavio Pompermaier <po...@okkam.it> wrote on Wed, Jul 31, 2019 at 8:12 PM:
> > > > > > > > >> >> >>
> > > > > > > > >> >> >>> Just one note on my side: it is not clear to me whether the
> > > > > > > > >> >> >>> client needs to be able to generate a job graph or not.
> > > > > > > > >> >> >>> In my opinion, the job jar must reside only on the
> > > > > > > > >> >> >>> server/jobManager side and the client requires a way to get the
> > > > > > > > >> >> >>> job graph.
> > > > > > > > >> >> >>> If you really want access to the job graph, I'd add dedicated
> > > > > > > > >> >> >>> methods on the ClusterClient, like:
> > > > > > > > >> >> >>>
> > > > > > > > >> >> >>>   - getJobGraph(jarId, mainClass): JobGraph
> > > > > > > > >> >> >>>   - listMainClasses(jarId): List<String>
> > > > > > > > >> >> >>>
> > > > > > > > >> >> >>> These would require some additions on the job manager endpoint
> > > > > > > > >> >> >>> as well.. what do you think?
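The two methods proposed above can be sketched as a small interface. This is a hypothetical illustration of the proposal, not Flink's actual ClusterClient API: the `JarAwareClusterClient` name, the placeholder `JobGraph` type, and the in-memory stub are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two ClusterClient additions proposed above.
// JobGraph here is a placeholder type, not Flink's real runtime JobGraph.
public class ClusterClientSketch {

    static final class JobGraph {
        final String name;
        JobGraph(String name) { this.name = name; }
    }

    interface JarAwareClusterClient {
        // Compile the given main class of an already-uploaded jar on the server side.
        JobGraph getJobGraph(String jarId, String mainClass);
        // List candidate entry points contained in an uploaded jar.
        List<String> listMainClasses(String jarId);
    }

    // Trivial in-memory stub standing in for a REST-backed implementation.
    static final class InMemoryClient implements JarAwareClusterClient {
        private final Map<String, List<String>> jars;
        InMemoryClient(Map<String, List<String>> jars) { this.jars = jars; }
        public JobGraph getJobGraph(String jarId, String mainClass) {
            if (!jars.getOrDefault(jarId, List.of()).contains(mainClass)) {
                throw new IllegalArgumentException("unknown main class: " + mainClass);
            }
            return new JobGraph(jarId + "#" + mainClass);
        }
        public List<String> listMainClasses(String jarId) {
            return jars.getOrDefault(jarId, List.of());
        }
    }

    public static void main(String[] args) {
        JarAwareClusterClient client = new InMemoryClient(
                Map.of("jar-1", Arrays.asList("com.example.WordCount", "com.example.PageRank")));
        System.out.println(client.listMainClasses("jar-1"));
        System.out.println(client.getJobGraph("jar-1", "com.example.WordCount").name);
    }
}
```

A REST-backed implementation would back both methods by the job manager endpoints Flavio mentions, keeping jar contents on the server side only.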
> > > > > > > > >> >> >>>
> > > > > > > > >> >> >>> On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <wander4096@gmail.com> wrote:
> > > > > > > > >> >> >>>
> > > > > > > > >> >> >>>> Hi all,
> > > > > > > > >> >> >>>>
> > > > > > > > >> >> >>>> Here is a document[1] on client API enhancement from our
> > > > > > > > >> >> >>>> perspective. We have investigated the current implementations,
> > > > > > > > >> >> >>>> and we propose:
> > > > > > > > >> >> >>>>
> > > > > > > > >> >> >>>> 1. Unify the implementation of cluster deployment and job
> > > > > > > > >> >> >>>> submission in Flink.
> > > > > > > > >> >> >>>> 2. Provide programmatic interfaces to allow flexible job and
> > > > > > > > >> >> >>>> cluster management.
> > > > > > > > >> >> >>>>
> > > > > > > > >> >> >>>> The first proposal is aimed at reducing the code paths of
> > > > > > > > >> >> >>>> cluster deployment and job submission so that one can adopt
> > > > > > > > >> >> >>>> Flink easily. The second proposal is aimed at providing rich
> > > > > > > > >> >> >>>> interfaces for advanced users who want precise control of these
> > > > > > > > >> >> >>>> stages.
> > > > > > > > >> >> >>>>
> > > > > > > > >> >> >>>> Quick reference on open questions:
> > > > > > > > >> >> >>>>
> > > > > > > > >> >> >>>> 1. Exclude job cluster deployment from the client side, or
> > > > > > > > >> >> >>>> redefine the semantics of a job cluster? It fits in a process
> > > > > > > > >> >> >>>> quite different from session cluster deployment and job
> > > > > > > > >> >> >>>> submission.
> > > > > > > > >> >> >>>>
> > > > > > > > >> >> >>>> 2. Maintain the codepaths handling the class
> > > > > > > > >> >> >>>> o.a.f.api.common.Program, or implement customized program
> > > > > > > > >> >> >>>> handling logic via a customized CliFrontend? See also this
> > > > > > > > >> >> >>>> thread[2] and the document[1].
> > > > > > > > >> >> >>>>
> > > > > > > > >> >> >>>> 3. Expose ClusterClient as a public API, or just expose an API
> > > > > > > > >> >> >>>> in ExecutionEnvironment and delegate to ClusterClient? Further,
> > > > > > > > >> >> >>>> either way, is it worth introducing a JobClient, an
> > > > > > > > >> >> >>>> encapsulation of ClusterClient associated with a specific job?
> > > > > > > > >> >> >>>>
> > > > > > > > >> >> >>>> Best,
> > > > > > > > >> >> >>>> tison.
> > > > > > > > >> >> >>>>
> > > > > > > > >> >> >>>> [1] https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
> > > > > > > > >> >> >>>> [2] https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
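Open question 3 above (a JobClient encapsulating a ClusterClient for one job) can be sketched minimally. All names below are illustrative assumptions for the example, not Flink's actual API: the point is only that a JobClient is a ClusterClient curried with a job id.

```java
// Hypothetical sketch of a JobClient that encapsulates a ClusterClient for one
// specific job. The ClusterClient interface here is a stand-in, not Flink's.
public class JobClientSketch {

    interface ClusterClient {
        String status(String jobId);
        void cancel(String jobId);
    }

    // Downstream code can manage "this job" without carrying the job id around.
    static final class JobClient {
        private final ClusterClient client;
        private final String jobId;
        JobClient(ClusterClient client, String jobId) {
            this.client = client;
            this.jobId = jobId;
        }
        String status() { return client.status(jobId); }
        void cancel() { client.cancel(jobId); }
    }

    public static void main(String[] args) {
        ClusterClient cluster = new ClusterClient() {
            public String status(String jobId) { return "RUNNING"; }
            public void cancel(String jobId) { System.out.println("cancelled " + jobId); }
        };
        JobClient job = new JobClient(cluster, "job-42");
        System.out.println(job.status());
        job.cancel();
    }
}
```

The design choice this illustrates: the cluster-scoped client stays the single place that talks to the cluster, while the job-scoped wrapper is what submission would return to the caller.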
> > > > > > > > >> >> >>>>
> > > > > > > > >> >> >>>> Jeff Zhang <zj...@gmail.com> wrote on Wed, Jul 24, 2019 at 9:19 AM:
> > > > > > > > >> >> >>>>
> > > > > > > > >> >> >>>>> Thanks Stephan, I will follow up on this issue in the next few
> > > > > > > > >> >> >>>>> weeks and will refine the design doc. We could discuss more
> > > > > > > > >> >> >>>>> details after the 1.9 release.
> > > > > > > > >> >> >>>>>
> > > > > > > > >> >> >>>>> Stephan Ewen <se...@apache.org> wrote on Wed, Jul 24, 2019 at 12:58 AM:
> > > > > > > > >> >> >>>>>
> > > > > > > > >> >> >>>>>> Hi all!
> > > > > > > > >> >> >>>>>>
> > > > > > > > >> >> >>>>>> This thread has stalled for a bit, which I assume is mostly
> > > > > > > > >> >> >>>>>> due to the Flink 1.9 feature freeze and release testing effort.
> > > > > > > > >> >> >>>>>>
> > > > > > > > >> >> >>>>>> I personally still recognize this issue as an important one to
> > > > > > > > >> >> >>>>>> solve. I'd be happy to help resume this discussion soon (after
> > > > > > > > >> >> >>>>>> the 1.9 release) and see if we can take some steps toward this
> > > > > > > > >> >> >>>>>> in Flink 1.10.
> > > > > > > > >> >> >>>>>>
> > > > > > > > >> >> >>>>>> Best,
> > > > > > > > >> >> >>>>>> Stephan
> > > > > > > > >> >> >>>>>>
> > > > > > > > >> >> >>>>>>
> > > > > > > > >> >> >>>>>>
> > > > > > > > >> >> >>>>>> On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <pompermaier@okkam.it> wrote:
> > > > > > > > >> >> >>>>>>
> > > > > > > > >> >> >>>>>>> That's exactly what I suggested a long time ago: the Flink
> > > > > > > > >> >> >>>>>>> REST client should not require any Flink dependency, only an
> > > > > > > > >> >> >>>>>>> HTTP library to call the REST services to submit and monitor
> > > > > > > > >> >> >>>>>>> a job.
> > > > > > > > >> >> >>>>>>> What I also suggested in [1] was to have a way to
> > > > > > > > >> >> >>>>>>> automatically suggest to the user (via a UI) the available
> > > > > > > > >> >> >>>>>>> main classes and their required parameters [2].
> > > > > > > > >> >> >>>>>>> Another problem we have with Flink is that the REST client
> > > > > > > > >> >> >>>>>>> and the CLI one behave differently, and we use the CLI client
> > > > > > > > >> >> >>>>>>> (via ssh) because it allows calling some other method after
> > > > > > > > >> >> >>>>>>> env.execute() [3] (we have to call another REST service to
> > > > > > > > >> >> >>>>>>> signal the end of the job).
> > > > > > > > >> >> >>>>>>> In this regard, a dedicated interface, like the JobListener
> > > > > > > > >> >> >>>>>>> suggested in the previous emails, would be very helpful (IMHO).
> > > > > > > > >> >> >>>>>>>
> > > > > > > > >> >> >>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-10864
> > > > > > > > >> >> >>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-10862
> > > > > > > > >> >> >>>>>>> [3] https://issues.apache.org/jira/browse/FLINK-10879
> > > > > > > > >> >> >>>>>>>
> > > > > > > > >> >> >>>>>>> Best,
> > > > > > > > >> >> >>>>>>> Flavio
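The JobListener idea mentioned above can be sketched as a callback hook invoked around submission and completion, so follow-up logic (like the "signal the end of the job" REST call) no longer requires blocking on env.execute(). Everything below is a hypothetical illustration: the interface shape and the `TinyEnv` driver are assumptions, not Flink's actual API.

```java
// Hedged sketch of a JobListener callback: a hook the environment invokes on
// job submission and completion. TinyEnv stands in for an ExecutionEnvironment.
public class JobListenerSketch {

    interface JobListener {
        void onJobSubmitted(String jobId);
        void onJobFinished(String jobId, boolean succeeded);
    }

    static final class TinyEnv {
        private final java.util.List<JobListener> listeners = new java.util.ArrayList<>();
        void registerJobListener(JobListener l) { listeners.add(l); }
        void execute(String jobId, Runnable job) {
            listeners.forEach(l -> l.onJobSubmitted(jobId));
            boolean ok = true;
            try { job.run(); } catch (RuntimeException e) { ok = false; }
            final boolean succeeded = ok;
            listeners.forEach(l -> l.onJobFinished(jobId, succeeded));
        }
    }

    public static void main(String[] args) {
        TinyEnv env = new TinyEnv();
        env.registerJobListener(new JobListener() {
            public void onJobSubmitted(String jobId) { System.out.println("submitted " + jobId); }
            public void onJobFinished(String jobId, boolean ok) {
                // e.g. here one could call the external REST service that
                // signals the end of the job
                System.out.println("finished " + jobId + " ok=" + ok);
            }
        });
        env.execute("job-1", () -> { /* the actual pipeline would run here */ });
    }
}
```

The same callback works identically for the CLI and the REST path, which is what makes it attractive for the divergence Flavio describes.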
> > > > > > > > >> >> >>>>>>>
> > > > > > > > >> >> >>>>>>> On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > > > > > >> >> >>>>>>>
> > > > > > > > >> >> >>>>>>>> Hi Tison,
> > > > > > > > >> >> >>>>>>>>
> > > > > > > > >> >> >>>>>>>> Thanks for your comments. Overall I agree with you that it
> > > > > > > > >> >> >>>>>>>> is difficult for downstream projects to integrate with Flink
> > > > > > > > >> >> >>>>>>>> and that we need to refactor the current Flink client API.
> > > > > > > > >> >> >>>>>>>> I also agree that CliFrontend should only parse command line
> > > > > > > > >> >> >>>>>>>> arguments and then pass them to the ExecutionEnvironment. It
> > > > > > > > >> >> >>>>>>>> is the ExecutionEnvironment's responsibility to compile the
> > > > > > > > >> >> >>>>>>>> job, create the cluster, and submit the job. Besides that,
> > > > > > > > >> >> >>>>>>>> Flink currently has many ExecutionEnvironment
> > > > > > > > >> >> >>>>>>>> implementations, and Flink uses a specific one based on the
> > > > > > > > >> >> >>>>>>>> context. IMHO this is not necessary; the
> > > > > > > > >> >> >>>>>>>> ExecutionEnvironment should be able to do the right thing
> > > > > > > > >> >> >>>>>>>> based on the FlinkConf it receives. Having too many
> > > > > > > > >> >> >>>>>>>> ExecutionEnvironment implementations is another burden for
> > > > > > > > >> >> >>>>>>>> downstream project integration.
> > > > > > > > >> >> >>>>>>>>
> > > > > > > > >> >> >>>>>>>> One thing I'd like to mention is Flink's Scala shell and SQL
> > > > > > > > >> >> >>>>>>>> client: although they are sub-modules of Flink, they could
> > > > > > > > >> >> >>>>>>>> be treated as downstream projects that use Flink's client
> > > > > > > > >> >> >>>>>>>> API. Currently you will find it is not easy for them to
> > > > > > > > >> >> >>>>>>>> integrate with Flink, and they share a lot of duplicated
> > > > > > > > >> >> >>>>>>>> code. That is another sign that we should refactor the Flink
> > > > > > > > >> >> >>>>>>>> client API.
> > > > > > > > >> >> >>>>>>>>
> > > > > > > > >> >> >>>>>>>> I believe this is a large and hard change, and I am afraid
> > > > > > > > >> >> >>>>>>>> we cannot keep compatibility since many of the changes are
> > > > > > > > >> >> >>>>>>>> user-facing.
> > > > > > > > >> >> >>>>>>>>
> > > > > > > > >> >> >>>>>>>>
> > > > > > > > >> >> >>>>>>>>
> > > > > > > > >> >> >>>>>>>> Zili Chen <wa...@gmail.com> wrote on Mon, Jun 24, 2019 at 2:53 PM:
> > > > > > > > >> >> >>>>>>>>
> > > > > > > > >> >> >>>>>>>>
> > > > > > > > >> >> >>>>>>>>> Hi all,
> > > > > > > > >> >> >>>>>>>>>
> > > > > > > > >> >> >>>>>>>>> After a closer look at our client APIs, I can see two major
> > > > > > > > >> >> >>>>>>>>> issues with consistency and integration: the deployment of
> > > > > > > > >> >> >>>>>>>>> a job cluster, which couples job graph creation and cluster
> > > > > > > > >> >> >>>>>>>>> deployment, and submission via CliFrontend, which confuses
> > > > > > > > >> >> >>>>>>>>> the control flow of job graph compilation and job
> > > > > > > > >> >> >>>>>>>>> submission. I'd like to follow the discussion above, mainly
> > > > > > > > >> >> >>>>>>>>> the process described by Jeff and Stephan, and share my
> > > > > > > > >> >> >>>>>>>>> ideas on these issues.
> > > > > > > > >> >> >>>>>>>>>
> > > > > > > > >> >> >>>>>>>>> 1) CliFrontend confuses the control flow of job compilation
> > > > > > > > >> >> >>>>>>>>> and submission. Following the process of job submission
> > > > > > > > >> >> >>>>>>>>> Stephan and Jeff described, the execution environment knows
> > > > > > > > >> >> >>>>>>>>> all configs of the cluster and the topology/settings of the
> > > > > > > > >> >> >>>>>>>>> job. Ideally, the main method of the user program calls
> > > > > > > > >> >> >>>>>>>>> #execute (or, say, #submit), and Flink deploys the cluster,
> > > > > > > > >> >> >>>>>>>>> compiles the job graph and submits it to the cluster.
> > > > > > > > >> >> >>>>>>>>> However, the current CliFrontend does all these things
> > > > > > > > >> >> >>>>>>>>> inside its #runProgram method, which introduces a lot of
> > > > > > > > >> >> >>>>>>>>> subclasses of the (stream) execution environment.
> > > > > > > > >> >> >>>>>>>>>
> > > > > > > > >> >> >>>>>>>>> Actually, it sets up an exec env that hijacks the
> > > > > > > > >> >> >>>>>>>>> #execute/executePlan method, initializes the job graph and
> > > > > > > > >> >> >>>>>>>>> aborts execution. Then control flows back to CliFrontend,
> > > > > > > > >> >> >>>>>>>>> which deploys the cluster (or retrieves the client) and
> > > > > > > > >> >> >>>>>>>>> submits the job graph. This is quite a specific internal
> > > > > > > > >> >> >>>>>>>>> process inside Flink, consistent with nothing else.
> > > > > > > > >> >> >>>>>>>>>
> > > > > > > > >> >> >>>>>>>>> 2) Deployment of a job cluster couples job graph creation
> > > > > > > > >> >> >>>>>>>>> and cluster deployment. Abstractly, from user job to a
> > > > > > > > >> >> >>>>>>>>> concrete submission, it requires
> > > > > > > > >> >> >>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>     create JobGraph --\
> > > > > > > > >> >> >>>>>>>>>
> > > > > > > > >> >> >>>>>>>>> create ClusterClient -->  submit JobGraph
> > > > > > > > >> >> >>>>>>>>>
> > > > > > > > >> >> >>>>>>>>> such a dependency. The ClusterClient is created by deploying
> > > > > > > > >> >> >>>>>>>>> or retrieving. JobGraph submission requires a compiled
> > > > > > > > >> >> >>>>>>>>> JobGraph and a valid ClusterClient, but the creation of the
> > > > > > > > >> >> >>>>>>>>> ClusterClient is abstractly independent of that of the
> > > > > > > > >> >> >>>>>>>>> JobGraph. However, in job cluster mode, we deploy the job
> > > > > > > > >> >> >>>>>>>>> cluster with a job graph, which means we use another process:
> > > > > > > > >> >> >>>>>>>>>
> > > > > > > > >> >> >>>>>>>>> create JobGraph --> deploy cluster with the JobGraph
> > > > > > > > >> >> >>>>>>>>>
> > > > > > > > >> >> >>>>>>>>> Here is another inconsistency, and downstream
> > > > > > > > >> >> >>>>>>>>> projects/client APIs are forced to handle the different
> > > > > > > > >> >> >>>>>>>>> cases with little support from Flink.
> > > > > > > > >> >> >>>>>>>>>
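The decoupled flow argued for here, with job graph creation, cluster client creation (deploy-or-retrieve), and submission as three independent steps, can be sketched minimally. All types below are illustrative stand-ins introduced for this example, not Flink classes.

```java
// Hedged sketch of the decoupled flow: compile, obtain a client, then submit.
public class DecoupledFlowSketch {

    static final class JobGraph {
        final String name;
        JobGraph(String name) { this.name = name; }
    }

    interface ClusterClient {
        String submit(JobGraph graph); // returns a job id
    }

    // Step 2 in isolation: obtain a client by deploying a new cluster or
    // retrieving an existing one; no JobGraph is needed at this point.
    static ClusterClient deployOrRetrieve(boolean existing) {
        String clusterId = existing ? "session-cluster" : "fresh-cluster";
        return graph -> clusterId + "/" + graph.name;
    }

    public static void main(String[] args) {
        JobGraph graph = new JobGraph("wordcount");         // step 1: compile
        ClusterClient client = deployOrRetrieve(true);      // step 2: deploy/retrieve
        System.out.println(client.submit(graph));           // step 3: submit
    }
}
```

Under this shape, a job cluster becomes just an ephemeral session: deployOrRetrieve starts it, and submission follows as a separate step instead of being fused into deployment.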
> > > > > > > > >> >> >>>>>>>>> Since we likely reached a consensus on
> > > > > > > > >> >> >>>>>>>>>
> > > > > > > > >> >> >>>>>>>>> 1. all configs gathered by the Flink configuration and
> > > > > > > > >> >> >>>>>>>>> passed on
> > > > > > > > >> >> >>>>>>>>> 2. the execution environment knows all configs and handles
> > > > > > > > >> >> >>>>>>>>> execution (both deployment and submission)
> > > > > > > > >> >> >>>>>>>>>
> > > > > > > > >> >> >>>>>>>>> I propose eliminating the inconsistencies above by the
> > > > > > > > >> >> >>>>>>>>> following approach:
> > > > > > > > >> >> >>>>>>>>>
> > > > > > > > >> >> >>>>>>>>> 1) CliFrontend should be exactly a front end, at least for
> > > > > > > > >> >> >>>>>>>>> the "run" command. That means it just gathers and passes all
> > > > > > > > >> >> >>>>>>>>> config from the command line to the main method of the user
> > > > > > > > >> >> >>>>>>>>> program. The execution environment knows all the info, and
> > > > > > > > >> >> >>>>>>>>> with an addition of utils for ClusterClient, we gracefully
> > > > > > > > >> >> >>>>>>>>> get a ClusterClient by deploying or retrieving. In this way,
> > > > > > > > >> >> >>>>>>>>> we don't need to hijack the #execute/executePlan methods and
> > > > > > > > >> >> >>>>>>>>> can remove the various hacking subclasses of exec env, as
> > > > > > > > >> >> >>>>>>>>> well as the #run methods in ClusterClient (for an
> > > > > > > > >> >> >>>>>>>>> interface-ized ClusterClient). Now the control flow goes
> > > > > > > > >> >> >>>>>>>>> from CliFrontend to the main method and never returns.
> > > > > > > > >> >> >>>>>>>>>
> > > > > > > > >> >> >>>>>>>>> 2) A job cluster means a cluster for a specific job. From
> > > > > > > > >> >> >>>>>>>>> another perspective, it is an ephemeral session. We may
> > > > > > > > >> >> >>>>>>>>> decouple the deployment from a compiled job graph, but start
> > > > > > > > >> >> >>>>>>>>> a session with an idle timeout and submit the job afterwards.
> > > > > > > > >> >> >>>>>>>>>
> > > > > > > > >> >> >>>>>>>>> These topics are better to be known and discussed for a
> > > > > > > > >> >> >>>>>>>>> consensus before we go into more details on design or
> > > > > > > > >> >> >>>>>>>>> implementation.
> > > > > > > > >> >> >>>>>>>>>
> > > > > > > > >> >> >>>>>>>>> Best,
> > > > > > > > >> >> >>>>>>>>> tison.
> > > > > > > > >> >> >>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>
> > > > > > > > >> >> >>>>>>>>> Zili Chen <wa...@gmail.com> wrote on Thu, Jun 20, 2019 at 3:21 AM:
> > > > > > > > >> >> >>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>> Hi Jeff,
> > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>> Thanks for raising this thread and the design document!
> > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>> As @Thomas Weise mentioned above, extending config to
> > > > > > > > >> >> >>>>>>>>>> Flink requires far more effort than it should. Another
> > > > > > > > >> >> >>>>>>>>>> example is that we achieve detached mode by introducing
> > > > > > > > >> >> >>>>>>>>>> another execution environment which also hijacks the
> > > > > > > > >> >> >>>>>>>>>> #execute method.
> > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>> I agree with your idea that the user configures everything
> > > > > > > > >> >> >>>>>>>>>> and Flink "just" respects it. On this topic I think the
> > > > > > > > >> >> >>>>>>>>>> unusual control flow when CliFrontend handles the "run"
> > > > > > > > >> >> >>>>>>>>>> command is the problem. It handles several configs, mainly
> > > > > > > > >> >> >>>>>>>>>> about cluster settings, so the main method of the user
> > > > > > > > >> >> >>>>>>>>>> program is unaware of them. It also compiles the app to a
> > > > > > > > >> >> >>>>>>>>>> job graph by running the main method with a hijacked exec
> > > > > > > > >> >> >>>>>>>>>> env, which constrains the main method further.
> > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>> I'd like to write down a few notes on how configs/args are
> > > > > > > > >> >> >>>>>>>>>> passed and respected, as well as on decoupling job
> > > > > > > > >> >> >>>>>>>>>> compilation and submission, and share them on this thread
> > > > > > > > >> >> >>>>>>>>>> later.
> > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>> Best,
> > > > > > > > >> >> >>>>>>>>>> tison.
> > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>> SHI Xiaogang <sh...@gmail.com> wrote on Mon, Jun 17, 2019 at 7:29 PM:
> > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>> Hi Jeff and Flavio,
> > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>> Thanks a lot, Jeff, for proposing the design document.
> > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>> We are also working on refactoring ClusterClient to allow
> > > > > > > > >> >> >>>>>>>>>>> flexible and efficient job management in our real-time
> > > > > > > > >> >> >>>>>>>>>>> platform. We would like to draft a document to share our
> > > > > > > > >> >> >>>>>>>>>>> ideas with you.
> > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>> I think it's a good idea to have something like Apache
> > > > > > > > >> >> >>>>>>>>>>> Livy for Flink, and the efforts discussed here will be a
> > > > > > > > >> >> >>>>>>>>>>> great step toward it.
> > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>> Regards,
> > > > > > > > >> >> >>>>>>>>>>> Xiaogang
> > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>> Flavio Pompermaier <po...@okkam.it> wrote on Mon, Jun 17, 2019 at 7:13 PM:
> > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>> Is there any possibility to have something like Apache
> > > > > > > > >> >> >>>>>>>>>>>> Livy [1] also for Flink in the future?
> > > > > > > > >> >> >>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>> [1] https://livy.apache.org/
> > > > > > > > >> >> >>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>> On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > > > > > >> >> >>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>> Any API we expose should not have dependencies on
> > > > > > > > >> >> >>>>>>>>>>>>>>>> the runtime (flink-runtime) package or other
> > > > > > > > >> >> >>>>>>>>>>>>>>>> implementation details. To me, this means that the
> > > > > > > > >> >> >>>>>>>>>>>>>>>> current ClusterClient cannot be exposed to users
> > > > > > > > >> >> >>>>>>>>>>>>>>>> because it uses quite some classes from the
> > > > > > > > >> >> >>>>>>>>>>>>>>>> optimiser and runtime packages.
> > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>> We should change ClusterClient from a class to an
> > > > > > > > >> >> >>>>>>>>>>>>> interface. ExecutionEnvironment would only use the
> > > > > > > > >> >> >>>>>>>>>>>>> ClusterClient interface, which should live in
> > > > > > > > >> >> >>>>>>>>>>>>> flink-clients, while the concrete implementation class
> > > > > > > > >> >> >>>>>>>>>>>>> could be in flink-runtime.
> > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>> What happens when a failure/restart in the client
> > > > > > > > >> >> >>>>>>>>>>>>>>>> happens? There needs to be a way of re-establishing
> > > > > > > > >> >> >>>>>>>>>>>>>>>> the connection to the job, setting up the listeners
> > > > > > > > >> >> >>>>>>>>>>>>>>>> again, etc.
> > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>> Good point. First we need to define what failure/restart
> > > > > > > > >> >> >>>>>>>>>>>>> in the client means. IIUC, that usually means a network
> > > > > > > > >> >> >>>>>>>>>>>>> failure, which happens in the RestClient class. If my
> > > > > > > > >> >> >>>>>>>>>>>>> understanding is correct, the restart/retry mechanism
> > > > > > > > >> >> >>>>>>>>>>>>> should be implemented in RestClient.
> > > > > > > > >> >> >>>>>>>>>>>>>
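The retry mechanism suggested above can be sketched as a small client-side helper that retries a request on transient network failures with exponential backoff. This is only an illustration of the mechanism under stated assumptions (IOException treated as transient), not Flink's actual RestClient implementation.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hedged sketch: retry a request on transient failures with exponential backoff.
public class RetrySketch {

    static <T> T retryOnFailure(Callable<T> request, int maxAttempts, long initialBackoffMs)
            throws Exception {
        long backoff = initialBackoffMs;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.call();
            } catch (IOException e) { // assume IOException means a transient network failure
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff);
                    backoff *= 2; // double the wait between attempts
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fails twice with a simulated network error, then succeeds.
        String result = retryOnFailure(() -> {
            if (++calls[0] < 3) throw new IOException("connection reset");
            return "job-status: RUNNING";
        }, 5, 1);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A real client would additionally re-register listeners after a successful reconnect, which is the part of the problem Aljoscha's question points at.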
> > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>> Aljoscha Krettek <al...@apache.org> wrote on Tue, Jun 11, 2019 at 11:10 PM:
> > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>> Some points to consider:
> > > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>> * Any API we expose should not have dependencies on the
> > > > > > > > >> >> >>>>>>>>>>>>>> runtime (flink-runtime) package or other implementation
> > > > > > > > >> >> >>>>>>>>>>>>>> details. To me, this means that the current ClusterClient
> > > > > > > > >> >> >>>>>>>>>>>>>> cannot be exposed to users because it uses quite some
> > > > > > > > >> >> >>>>>>>>>>>>>> classes from the optimiser and runtime packages.
> > > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>> * What happens when a failure/restart in the client
> > > > > > > > >> >> >>>>>>>>>>>>>> happens? There needs to be a way of re-establishing the
> > > > > > > > >> >> >>>>>>>>>>>>>> connection to the job, setting up the listeners again, etc.
> > > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>> Aljoscha
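One way to handle the second point is to keep listener registrations on the client side so they can be replayed after the connection is re-established. A rough sketch (illustrative Python, hypothetical names):

```python
class JobConnection:
    """Client-side handle to a job that can survive a connection loss.

    Listeners are remembered locally, so after `reconnect()` they are
    re-attached to the fresh channel. All names here are illustrative,
    not an actual Flink API.
    """

    def __init__(self, connect_fn):
        self._connect_fn = connect_fn   # factory that opens a channel to the job
        self._listeners = []
        self._channel = connect_fn()

    def add_listener(self, listener):
        self._listeners.append(listener)
        self._channel.attach(listener)

    def reconnect(self):
        # re-establish the connection, then set up the listeners again
        self._channel = self._connect_fn()
        for listener in self._listeners:
            self._channel.attach(listener)
```

Because the registrations are owned by the handle rather than the transport channel, a reconnect is transparent to the code that registered the listeners.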
> > > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>> On 29. May 2019, at 10:17, Jeff Zhang <
> > > > > > > > >> >> >>>> zjffdu@gmail.com>
> > > > > > > > >> >> >>>>>>>> wrote:
> > > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>> Sorry folks, the design doc is late as
> > you
> > > > > > > > >> >> >> expected.
> > > > > > > > >> >> >>>>> Here's
> > > > > > > > >> >> >>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>>> design
> > > > > > > > >> >> >>>>>>>>>>>>>> doc
> > > > > > > > >> >> >>>>>>>>>>>>>>> I drafted, welcome any comments and
> > > feedback.
> > > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> > > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org>
> > > 于2019年2月14日周四
> > > > > > > > >> >> >>>> 下午8:43写道:
> > > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>> Nice that this discussion is
> happening.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>> In the FLIP, we could also revisit the
> > > entire
> > > > > > role
> > > > > > > > >> >> >>> of
> > > > > > > > >> >> >>>>> the
> > > > > > > > >> >> >>>>>>>>>>>> environments
> > > > > > > > >> >> >>>>>>>>>>>>>>>> again.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>> Initially, the idea was:
> > > > > > > > >> >> >>>>>>>>>>>>>>>> - the environments take care of the
> > > specific
> > > > > > > > >> >> >> setup
> > > > > > > > >> >> >>>> for
> > > > > > > > >> >> >>>>>>>>>>> standalone
> > > > > > > > >> >> >>>>>>>>>>>> (no
> > > > > > > > >> >> >>>>>>>>>>>>>>>> setup needed), yarn, mesos, etc.
> > > > > > > > >> >> >>>>>>>>>>>>>>>> - the session ones have control over
> the
> > > > > > session.
> > > > > > > > >> >> >>> The
> > > > > > > > >> >> >>>>>>>>>>> environment
> > > > > > > > >> >> >>>>>>>>>>>>> holds
> > > > > > > > >> >> >>>>>>>>>>>>>>>> the session client.
> > > > > > > > >> >> >>>>>>>>>>>>>>>> - running a job gives a "control"
> object
> > > for
> > > > > > that
> > > > > > > > >> >> >>>> job.
> > > > > > > > >> >> >>>>>> That
> > > > > > > > >> >> >>>>>>>>>>>> behavior
> > > > > > > > >> >> >>>>>>>>>>>>> is
> > > > > > > > >> >> >>>>>>>>>>>>>>>> the same in all environments.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>> The actual implementation diverged
> quite
> > > a bit
> > > > > > > > >> >> >> from
> > > > > > > > >> >> >>>>> that.
> > > > > > > > >> >> >>>>>>>> Happy
> > > > > > > > >> >> >>>>>>>>>>> to
> > > > > > > > >> >> >>>>>>>>>>>>> see a
> > > > > > > > >> >> >>>>>>>>>>>>>>>> discussion about straightening this out
> a
> > > bit
> > > > > > more.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>> On Tue, Feb 12, 2019 at 4:58 AM Jeff
> > > Zhang <
> > > > > > > > >> >> >>>>>>> zjffdu@gmail.com>
> > > > > > > > >> >> >>>>>>>>>>>> wrote:
> > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> Hi folks,
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> Sorry for the late response. It seems we
> > > reached
> > > > > > > > >> >> >>> consensus
> > > > > > > > >> >> >>>> on
> > > > > > > > >> >> >>>>>>>> this, I
> > > > > > > > >> >> >>>>>>>>>>>> will
> > > > > > > > >> >> >>>>>>>>>>>>>>>> create
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> FLIP for this with more detailed
> design
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> Thomas Weise <th...@apache.org>
> > > 于2018年12月21日周五
> > > > > > > > >> >> >>>>> 上午11:43写道:
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> Great to see this discussion seeded!
> > The
> > > > > > > > >> >> >> problems
> > > > > > > > >> >> >>>> you
> > > > > > > > >> >> >>>>>> face
> > > > > > > > >> >> >>>>>>>>>>> with
> > > > > > > > >> >> >>>>>>>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> Zeppelin integration are also
> > affecting
> > > > > other
> > > > > > > > >> >> >>>>> downstream
> > > > > > > > >> >> >>>>>>>>>>> projects,
> > > > > > > > >> >> >>>>>>>>>>>>>> like
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> Beam.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> We just enabled the savepoint
> restore
> > > option
> > > > > > in
> > > > > > > > >> >> >>>>>>>>>>>>>> RemoteStreamEnvironment
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> [1]
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> and that was more difficult than it
> > > should
> > > > > be.
> > > > > > > > >> >> >> The
> > > > > > > > >> >> >>>>> main
> > > > > > > > >> >> >>>>>>>> issue
> > > > > > > > >> >> >>>>>>>>>>> is
> > > > > > > > >> >> >>>>>>>>>>>>> that
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> environment and cluster client
> aren't
> > > > > > decoupled.
> > > > > > > > >> >> >>>>> Ideally
> > > > > > > > >> >> >>>>>>> it
> > > > > > > > >> >> >>>>>>>>>>> should
> > > > > > > > >> >> >>>>>>>>>>>>> be
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> possible to just get the matching
> > > cluster
> > > > > > client
> > > > > > > > >> >> >>>> from
> > > > > > > > >> >> >>>>>> the
> > > > > > > > >> >> >>>>>>>>>>>>> environment
> > > > > > > > >> >> >>>>>>>>>>>>>>>> and
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> then control the job through it
> > > (environment
> > > > > > as
> > > > > > > > >> >> >>>>> factory
> > > > > > > > >> >> >>>>>>> for
> > > > > > > > >> >> >>>>>>>>>>>> cluster
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> client). But note that the
> environment
> > > > > classes
> > > > > > > > >> >> >> are
> > > > > > > > >> >> >>>>> part
> > > > > > > > >> >> >>>>>> of
> > > > > > > > >> >> >>>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>>>> public
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> API,
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> and it is not straightforward to
> make
> > > larger
> > > > > > > > >> >> >>> changes
> > > > > > > > >> >> >>>>>>> without
> > > > > > > > >> >> >>>>>>>>>>>>> breaking
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> backward compatibility.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> ClusterClient currently exposes
> > internal
> > > > > > classes
> > > > > > > > >> >> >>>> like
> > > > > > > > >> >> >>>>>>>>>>> JobGraph and
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> StreamGraph. But it should be
> possible
> > > to
> > > > > wrap
> > > > > > > > >> >> >>> this
> > > > > > > > >> >> >>>>>> with a
> > > > > > > > >> >> >>>>>>>> new
> > > > > > > > >> >> >>>>>>>>>>>>> public
> > > > > > > > >> >> >>>>>>>>>>>>>>>> API
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> that brings the required job control
> > > > > > > > >> >> >> capabilities
> > > > > > > > >> >> >>>> for
> > > > > > > > >> >> >>>>>>>>>>> downstream
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> projects.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> Perhaps it is helpful to look at
> some
> > > of the
> > > > > > > > >> >> >>>>> interfaces
> > > > > > > > >> >> >>>>>> in
> > > > > > > > >> >> >>>>>>>>>>> Beam
> > > > > > > > >> >> >>>>>>>>>>>>> while
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> thinking about this: [2] for the
> > > portable
> > > > > job
> > > > > > > > >> >> >> API
> > > > > > > > >> >> >>>> and
> > > > > > > > >> >> >>>>>> [3]
> > > > > > > > >> >> >>>>>>>> for
> > > > > > > > >> >> >>>>>>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>>>> old
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> asynchronous job control from the
> Beam
> > > Java
> > > > > > SDK.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> The backward compatibility
> discussion
> > > [4] is
> > > > > > > > >> >> >> also
> > > > > > > > >> >> >>>>>> relevant
> > > > > > > > >> >> >>>>>>>>>>> here. A
> > > > > > > > >> >> >>>>>>>>>>>>> new
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> API
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> should shield downstream projects
> from
> > > > > > internals
> > > > > > > > >> >> >>> and
> > > > > > > > >> >> >>>>>> allow
> > > > > > > > >> >> >>>>>>>>>>> them to
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> interoperate with multiple future
> > Flink
> > > > > > versions
> > > > > > > > >> >> >>> in
> > > > > > > > >> >> >>>>> the
> > > > > > > > >> >> >>>>>>> same
> > > > > > > > >> >> >>>>>>>>>>>> release
> > > > > > > > >> >> >>>>>>>>>>>>>>>> line
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> without forced upgrades.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> Thanks,
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> Thomas
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> [1] https://github.com/apache/flink/pull/7249
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> [2]
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> [3]
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> [4]
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff
> > > Zhang <
> > > > > > > > >> >> >>>>>>>> zjffdu@gmail.com>
> > > > > > > > >> >> >>>>>>>>>>>>> wrote:
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> I'm not so sure whether the user
> > > should
> > > > > be
> > > > > > > > >> >> >>> able
> > > > > > > > >> >> >>>> to
> > > > > > > > >> >> >>>>>>>> define
> > > > > > > > >> >> >>>>>>>>>>>> where
> > > > > > > > >> >> >>>>>>>>>>>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> job
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> runs (in your example Yarn). This
> is
> > > > > actually
> > > > > > > > >> >> >>>>>> independent
> > > > > > > > >> >> >>>>>>>> of
> > > > > > > > >> >> >>>>>>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>>>> job
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> development and is something which
> is
> > > > > decided
> > > > > > > > >> >> >> at
> > > > > > > > >> >> >>>>>>> deployment
> > > > > > > > >> >> >>>>>>>>>>> time.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> User don't need to specify
> execution
> > > mode
> > > > > > > > >> >> >>>>>>> programmatically.
> > > > > > > > >> >> >>>>>>>>>>> They
> > > > > > > > >> >> >>>>>>>>>>>>> can
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> also
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> pass the execution mode from the
> > > arguments
> > > > > in
> > > > > > > > >> >> >>> flink
> > > > > > > > >> >> >>>>> run
> > > > > > > > >> >> >>>>>>>>>>> command.
> > > > > > > > >> >> >>>>>>>>>>>>> e.g.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m yarn-cluster ....
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m local ...
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m host:port ...
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Does this make sense to you ?
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> To me it makes sense that the
> > > > > > > > >> >> >>>> ExecutionEnvironment
> > > > > > > > >> >> >>>>>> is
> > > > > > > > >> >> >>>>>>>> not
> > > > > > > > >> >> >>>>>>>>>>>>>>>> directly
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> initialized by the user and instead
> > > context
> > > > > > > > >> >> >>>> sensitive
> > > > > > > > >> >> >>>>>> how
> > > > > > > > >> >> >>>>>>>> you
> > > > > > > > >> >> >>>>>>>>>>>> want
> > > > > > > > >> >> >>>>>>>>>>>>> to
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> execute your job (Flink CLI vs.
> IDE,
> > > for
> > > > > > > > >> >> >>> example).
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Right, currently I notice Flink
> would
> > > > > create
> > > > > > > > >> >> >>>>> different
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> ContextExecutionEnvironment based
> on
> > > > > > different
> > > > > > > > >> >> >>>>>> submission
> > > > > > > > >> >> >>>>>>>>>>>> scenarios
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> (Flink
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Cli vs IDE). To me this is kind of
> > hack
> > > > > > > > >> >> >> approach,
> > > > > > > > >> >> >>>> not
> > > > > > > > >> >> >>>>>> so
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> straightforward.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> What I suggested above is
> > that
> > > > > flink
> > > > > > > > >> >> >>> should
> > > > > > > > >> >> >>>>>>> always
> > > > > > > > >> >> >>>>>>>>>>> create
> > > > > > > > >> >> >>>>>>>>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> same
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> ExecutionEnvironment but with
> > different
> > > > > > > > >> >> >>>>> configuration,
> > > > > > > > >> >> >>>>>>> and
> > > > > > > > >> >> >>>>>>>>>>> based
> > > > > > > > >> >> >>>>>>>>>>>> on
> > > > > > > > >> >> >>>>>>>>>>>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> configuration it would create the
> > > proper
> > > > > > > > >> >> >>>>> ClusterClient
> > > > > > > > >> >> >>>>>>> for
> > > > > > > > >> >> >>>>>>>>>>>>> different
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> behaviors.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Till Rohrmann <
> trohrmann@apache.org>
> > > > > > > > >> >> >>>> 于2018年12月20日周四
> > > > > > > > >> >> >>>>>>>>>>> 下午11:18写道:
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> You are probably right that we
> have
> > > code
> > > > > > > > >> >> >>>> duplication
> > > > > > > > >> >> >>>>>>> when
> > > > > > > > >> >> >>>>>>>> it
> > > > > > > > >> >> >>>>>>>>>>>> comes
> > > > > > > > >> >> >>>>>>>>>>>>>>>> to
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> creation of the ClusterClient.
> This
> > > should
> > > > > > be
> > > > > > > > >> >> >>>>> reduced
> > > > > > > > >> >> >>>>>> in
> > > > > > > > >> >> >>>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>>>>>>> future.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> I'm not so sure whether the user
> > > should be
> > > > > > > > >> >> >> able
> > > > > > > > >> >> >>> to
> > > > > > > > >> >> >>>>>>> define
> > > > > > > > >> >> >>>>>>>>>>> where
> > > > > > > > >> >> >>>>>>>>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> job
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> runs (in your example Yarn). This
> is
> > > > > > actually
> > > > > > > > >> >> >>>>>>> independent
> > > > > > > > >> >> >>>>>>>>>>> of the
> > > > > > > > >> >> >>>>>>>>>>>>>>>> job
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> development and is something which
> > is
> > > > > > decided
> > > > > > > > >> >> >> at
> > > > > > > > >> >> >>>>>>>> deployment
> > > > > > > > >> >> >>>>>>>>>>>> time.
> > > > > > > > >> >> >>>>>>>>>>>>>>>> To
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> me
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> it
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> makes sense that the
> > > ExecutionEnvironment
> > > > > is
> > > > > > > > >> >> >> not
> > > > > > > > >> >> >>>>>>> directly
> > > > > > > > >> >> >>>>>>>>>>>>>>>> initialized
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> by
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> the user and instead context
> > > sensitive how
> > > > > > you
> > > > > > > > >> >> >>>> want
> > > > > > > > >> >> >>>>> to
> > > > > > > > >> >> >>>>>>>>>>> execute
> > > > > > > > >> >> >>>>>>>>>>>>> your
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> job
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> (Flink CLI vs. IDE, for example).
> > > > > However, I
> > > > > > > > >> >> >>> agree
> > > > > > > > >> >> >>>>>> that
> > > > > > > > >> >> >>>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> ExecutionEnvironment should give
> you
> > > > > access
> > > > > > to
> > > > > > > > >> >> >>> the
> > > > > > > > >> >> >>>>>>>>>>> ClusterClient
> > > > > > > > >> >> >>>>>>>>>>>>>>>> and
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> to
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> job (maybe in the form of the
> > > JobGraph or
> > > > > a
> > > > > > > > >> >> >> job
> > > > > > > > >> >> >>>>> plan).
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> Cheers,
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> Till
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> On Thu, Dec 13, 2018 at 4:36 AM
> Jeff
> > > > > Zhang <
> > > > > > > > >> >> >>>>>>>>>>> zjffdu@gmail.com>
> > > > > > > > >> >> >>>>>>>>>>>>>>>> wrote:
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> Hi Till,
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> Thanks for the feedback. You are
> > > right
> > > > > > that I
> > > > > > > > >> >> >>>>> expect
> > > > > > > > >> >> >>>>>>>> better
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> programmatic
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> job submission/control api which
> > > could be
> > > > > > > > >> >> >> used
> > > > > > > > >> >> >>> by
> > > > > > > > >> >> >>>>>>>>>>> downstream
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> project.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> And
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> it would benefit for the flink
> > > ecosystem.
> > > > > > > > >> >> >> When
> > > > > > > > >> >> >>> I
> > > > > > > > >> >> >>>>> look
> > > > > > > > >> >> >>>>>>> at
> > > > > > > > >> >> >>>>>>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>>> code
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> of
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> flink
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> scala-shell and sql-client (I
> > believe
> > > > > they
> > > > > > > > >> >> >> are
> > > > > > > > >> >> >>>> not
> > > > > > > > >> >> >>>>>> the
> > > > > > > > >> >> >>>>>>>>>>> core of
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> flink,
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> but
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> belong to the ecosystem of
> flink),
> > I
> > > find
> > > > > > > > >> >> >> many
> > > > > > > > >> >> >>>>>>> duplicated
> > > > > > > > >> >> >>>>>>>>>>> code
> > > > > > > > >> >> >>>>>>>>>>>>>>>> for
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> creating
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> ClusterClient from user provided
> > > > > > > > >> >> >> configuration
> > > > > > > > >> >> >>>>>>>>>>> (configuration
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> format
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> may
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> be
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> different from scala-shell and
> > > > > sql-client)
> > > > > > > > >> >> >> and
> > > > > > > > >> >> >>>> then
> > > > > > > > >> >> >>>>>> use
> > > > > > > > >> >> >>>>>>>>>>> that
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> ClusterClient
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> to manipulate jobs. I don't think
> > > this is
> > > > > > > > >> >> >>>>> convenient
> > > > > > > > >> >> >>>>>>> for
> > > > > > > > >> >> >>>>>>>>>>>>>>>> downstream
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> projects. What I expect is that
> > > > > downstream
> > > > > > > > >> >> >>>> project
> > > > > > > > >> >> >>>>>> only
> > > > > > > > >> >> >>>>>>>>>>> needs
> > > > > > > > >> >> >>>>>>>>>>>> to
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> provide
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> necessary configuration info
> (maybe
> > > > > > > > >> >> >> introducing
> > > > > > > > >> >> >>>>> class
> > > > > > > > >> >> >>>>>>>>>>>> FlinkConf),
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> and
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> then
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> build ExecutionEnvironment based
> on
> > > this
> > > > > > > > >> >> >>>> FlinkConf,
> > > > > > > > >> >> >>>>>> and
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment will create
> > the
> > > > > proper
> > > > > > > > >> >> >>>>>>>> ClusterClient.
> > > > > > > > >> >> >>>>>>>>>>> It
> > > > > > > > >> >> >>>>>>>>>>>> not
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> only
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> benefit for the downstream
> project
> > > > > > > > >> >> >> development
> > > > > > > > >> >> >>>> but
> > > > > > > > >> >> >>>>>> also
> > > > > > > > >> >> >>>>>>>> be
> > > > > > > > >> >> >>>>>>>>>>>>>>>> helpful
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> for
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> their integration test with
> flink.
> > > Here's
> > > > > > one
> > > > > > > > >> >> >>>>> sample
> > > > > > > > >> >> >>>>>>> code
> > > > > > > > >> >> >>>>>>>>>>>> snippet
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> that
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> I
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> expect.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> val conf = new FlinkConf().mode("yarn")
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> val env = new ExecutionEnvironment(conf)
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> val jobId = env.submit(...)
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> val jobStatus = env.getClusterClient().queryJobStatus(jobId)
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> env.getClusterClient().cancelJob(jobId)
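The snippet above is Scala; to make the intended interplay concrete, here is a runnable Python model of the same proposal. Every class and method name (FlinkConf, submit, get_cluster_client, ...) follows the proposal in this thread, not any existing Flink API:

```python
class FlinkConf:
    """Holds user-provided configuration, e.g. the execution mode."""
    def __init__(self):
        self.settings = {}

    def mode(self, mode):
        self.settings["mode"] = mode
        return self  # builder style, as in the snippet above


class ClusterClient:
    """Talks to one cluster; created by the environment from the conf."""
    def __init__(self, mode):
        self.mode = mode
        self._status = {}
        self._counter = 0

    def submit_job(self, job):
        self._counter += 1
        job_id = "job-%d" % self._counter
        self._status[job_id] = "RUNNING"
        return job_id

    def query_job_status(self, job_id):
        return self._status[job_id]

    def cancel_job(self, job_id):
        self._status[job_id] = "CANCELED"


class ExecutionEnvironment:
    """Always the same class; behaviour differs only via the conf."""
    def __init__(self, conf):
        # a real implementation would pick a yarn/local/remote client here
        self._client = ClusterClient(conf.settings.get("mode", "local"))

    def submit(self, job):
        return self._client.submit_job(job)

    def get_cluster_client(self):
        return self._client
```

With this shape, `submit` is non-blocking and the downstream project never constructs a ClusterClient itself; the environment derives it from the conf.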
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> What do you think ?
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> Till Rohrmann <
> > trohrmann@apache.org>
> > > > > > > > >> >> >>>>> 于2018年12月11日周二
> > > > > > > > >> >> >>>>>>>>>>> 下午6:28写道:
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> Hi Jeff,
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> what you are proposing is to
> > > provide the
> > > > > > > > >> >> >> user
> > > > > > > > >> >> >>>> with
> > > > > > > > >> >> >>>>>>>> better
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> programmatic
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> job
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> control. There was actually an
> > > effort to
> > > > > > > > >> >> >>> achieve
> > > > > > > > >> >> >>>>>> this
> > > > > > > > >> >> >>>>>>>> but
> > > > > > > > >> >> >>>>>>>>>>> it
> > > > > > > > >> >> >>>>>>>>>>>>>>>> has
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> never
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> been
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> completed [1]. However, there
> are
> > > some
> > > > > > > > >> >> >>>> improvement
> > > > > > > > >> >> >>>>>> in
> > > > > > > > >> >> >>>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>> code
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> base
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> now.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> Look for example at the
> > > NewClusterClient
> > > > > > > > >> >> >>>> interface
> > > > > > > > >> >> >>>>>>> which
> > > > > > > > >> >> >>>>>>>>>>>>>>>> offers a
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> non-blocking job submission.
> But I
> > > agree
> > > > > > > > >> >> >> that
> > > > > > > > >> >> >>> we
> > > > > > > > >> >> >>>>>> need
> > > > > > > > >> >> >>>>>>> to
> > > > > > > > >> >> >>>>>>>>>>>>>>>> improve
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Flink
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> in
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> this regard.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> I would not be in favour if
> > > exposing all
> > > > > > > > >> >> >>>>>> ClusterClient
> > > > > > > > >> >> >>>>>>>>>>> calls
> > > > > > > > >> >> >>>>>>>>>>>>>>>> via
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment because it
> > > would
> > > > > > > > >> >> >> clutter
> > > > > > > > >> >> >>>> the
> > > > > > > > >> >> >>>>>>> class
> > > > > > > > >> >> >>>>>>>>>>> and
> > > > > > > > >> >> >>>>>>>>>>>>>>>> would
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> not
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> be
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> a
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> good separation of concerns.
> > > Instead one
> > > > > > > > >> >> >> idea
> > > > > > > > >> >> >>>>> could
> > > > > > > > >> >> >>>>>> be
> > > > > > > > >> >> >>>>>>>> to
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> retrieve
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> current ClusterClient from the
> > > > > > > > >> >> >>>>> ExecutionEnvironment
> > > > > > > > >> >> >>>>>>>> which
> > > > > > > > >> >> >>>>>>>>>>> can
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> then
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> be
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> used
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> for cluster and job control. But
> > > before
> > > > > we
> > > > > > > > >> >> >>> start
> > > > > > > > >> >> >>>>> an
> > > > > > > > >> >> >>>>>>>> effort
> > > > > > > > >> >> >>>>>>>>>>>>>>>> here,
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> we
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> need
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> to
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> agree and capture what
> > > functionality we
> > > > > > want
> > > > > > > > >> >> >>> to
> > > > > > > > >> >> >>>>>>> provide.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> Initially, the idea was that we
> > > have the
> > > > > > > > >> >> >>>>>>>> ClusterDescriptor
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> describing
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> how
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> to talk to cluster manager like
> > > Yarn or
> > > > > > > > >> >> >> Mesos.
> > > > > > > > >> >> >>>> The
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> ClusterDescriptor
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> can
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> be
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> used for deploying Flink
> clusters
> > > (job
> > > > > and
> > > > > > > > >> >> >>>>> session)
> > > > > > > > >> >> >>>>>>> and
> > > > > > > > >> >> >>>>>>>>>>> gives
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> you a
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> ClusterClient. The ClusterClient
> > > > > controls
> > > > > > > > >> >> >> the
> > > > > > > > >> >> >>>>>> cluster
> > > > > > > > >> >> >>>>>>>>>>> (e.g.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> submitting
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> jobs, listing all running jobs).
> > And
> > > > > then
> > > > > > > > >> >> >>> there
> > > > > > > > >> >> >>>>> was
> > > > > > > > >> >> >>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>> idea
> > > > > > > > >> >> >>>>>>>>>>>> to
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> introduce a
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> JobClient which you obtain from
> > the
> > > > > > > > >> >> >>>> ClusterClient
> > > > > > > > >> >> >>>>> to
> > > > > > > > >> >> >>>>>>>>>>> trigger
> > > > > > > > >> >> >>>>>>>>>>>>>>>> job
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> specific
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> operations (e.g. taking a
> > savepoint,
> > > > > > > > >> >> >>> cancelling
> > > > > > > > >> >> >>>>> the
> > > > > > > > >> >> >>>>>>>> job).
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-4272
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> Cheers,
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> Till
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
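The layering described in the message above can be sketched as follows. This is an illustrative Python model of the proposed split (the descriptor deploys, the cluster client controls the cluster, the job client controls one job); the names mirror the discussion, not a shipped Flink API:

```python
class JobClient:
    """Controls one job: status queries, cancellation, savepoints, ..."""
    def __init__(self, job_id, jobs):
        self.job_id = job_id
        self._jobs = jobs  # in a real API this would be a connection, not a dict

    def status(self):
        return self._jobs[self.job_id]

    def cancel(self):
        self._jobs[self.job_id] = "CANCELED"


class ClusterClient:
    """Controls the cluster: submits jobs, lists running jobs."""
    def __init__(self):
        self._jobs = {}
        self._counter = 0

    def submit_job(self, job):
        self._counter += 1
        job_id = "job-%d" % self._counter
        self._jobs[job_id] = "RUNNING"
        return JobClient(job_id, self._jobs)

    def list_jobs(self):
        return [j for j, s in self._jobs.items() if s == "RUNNING"]


class ClusterDescriptor:
    """Knows how to talk to a cluster manager like YARN or Mesos."""
    def __init__(self, manager):
        self.manager = manager

    def deploy_session_cluster(self):
        return ClusterClient()
```

Each layer only hands out the next one, so job-specific operations never leak into the cluster-level interface and vice versa.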
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> On Tue, Dec 11, 2018 at 10:13 AM
> > > Jeff
> > > > > > Zhang
> > > > > > > > >> >> >> <
> > > > > > > > >> >> >>>>>>>>>>> zjffdu@gmail.com
> > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> wrote:
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Hi Folks,
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> I am trying to integrate flink
> > into
> > > > > > apache
> > > > > > > > >> >> >>>>> zeppelin
> > > > > > > > >> >> >>>>>>>>>>> which is
> > > > > > > > >> >> >>>>>>>>>>>>>>>> an
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> interactive
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> notebook. And I hit several
> > issues
> > > that
> > > > > > is
> > > > > > > > >> >> >>>> caused
> > > > > > > > >> >> >>>>>> by
> > > > > > > > >> >> >>>>>>>>>>> flink
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> client
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> api.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> So
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> I'd like to proposal the
> > following
> > > > > > changes
> > > > > > > > >> >> >>> for
> > > > > > > > >> >> >>>>>> flink
> > > > > > > > >> >> >>>>>>>>>>> client
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> api.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 1. Support nonblocking
> execution.
> > > > > > > > >> >> >> Currently,
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment#execute
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> is a blocking method which
> would
> > > do 2
> > > > > > > > >> >> >> things,
> > > > > > > > >> >> >>>>> first
> > > > > > > > >> >> >>>>>>>>>>> submit
> > > > > > > > >> >> >>>>>>>>>>>>>>>> job
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> and
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> then
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> wait for job until it is
> > finished.
> > > I'd
> > > > > > like
> > > > > > > > >> >> >>>>>>> introduce a
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> nonblocking
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> execution method like
> > > > > > > > >> >> >>>> ExecutionEnvironment#submit
> > > > > > > > >> >> >>>>>>> which
> > > > > > > > >> >> >>>>>>>>>>> only
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> submit
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> job
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> and
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> then return jobId to client.
> And
> > > allow
> > > > > > user
> > > > > > > > >> >> >>> to
> > > > > > > > >> >> >>>>>> query
> > > > > > > > >> >> >>>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>> job
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> status
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> via
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> jobId.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 2. Add cancel api in
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >> ExecutionEnvironment/StreamExecutionEnvironment,
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> currently the only way to
> cancel
> > > job is
> > > > > > via
> > > > > > > > >> >> >>> cli
> > > > > > > > >> >> >>>>>>>>>>> (bin/flink),
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> this
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> is
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> not
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> convenient for downstream
> project
> > > to
> > > > > use
> > > > > > > > >> >> >> this
> > > > > > > > >> >> >>>>>>> feature.
> > > > > > > > >> >> >>>>>>>>>>> So I'd
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> like
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> to
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> add
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> cancel api in
> > ExecutionEnvironment
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 3. Add savepoint api in
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>> ExecutionEnvironment/StreamExecutionEnvironment.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> It
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> is similar as cancel api, we
> > > should use
> > > > > > > > >> >> >>>>>>>>>>> ExecutionEnvironment
> > > > > > > > >> >> >>>>>>>>>>>>>>>> as
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> unified
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> api for third party to
> integrate
> > > with
> > > > > > > > >> >> >> flink.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 4. Add listener for job
> execution
> > > > > > > > >> >> >> lifecycle.
> > > > > > > > >> >> >>>>>>> Something
> > > > > > > > >> >> >>>>>>>>>>> like
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> following,
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> so
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> that downstream project can do
> > > custom
> > > > > > logic
> > > > > > > > >> >> >>> in
> > > > > > > > >> >> >>>>> the
> > > > > > > > >> >> >>>>>>>>>>> lifecycle
> > > > > > > > >> >> >>>>>>>>>>>>>>>> of
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> job.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> e.g.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Zeppelin would capture the
> jobId
> > > after
> > > > > > job
> > > > > > > > >> >> >> is
> > > > > > > > >> >> >>>>>>> submitted
> > > > > > > > >> >> >>>>>>>>>>> and
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> then
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> use
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> this
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> jobId to cancel it later when
> > > > > necessary.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> public interface JobListener {
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  void onJobSubmitted(JobID
> > jobId);
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  void
> > > onJobExecuted(JobExecutionResult
> > > > > > > > >> >> >>>>> jobResult);
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  void onJobCanceled(JobID
> jobId);
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> }
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 5. Enable session in
> > > > > > ExecutionEnvironment.
> > > > > > > > >> >> >>>>>> Currently
> > > > > > > > >> >> >>>>>>> it
> > > > > > > > >> >> >>>>>>>>>>> is
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> disabled,
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> but
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> session is very convenient for
> > > third
> > > > > > party
> > > > > > > > >> >> >> to
> > > > > > > > >> >> >>>>>>>> submitting
> > > > > > > > >> >> >>>>>>>>>>> jobs
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> continually.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> I hope flink can enable it
> again.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 6. Unify all flink client api
> > into
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>> ExecutionEnvironment/StreamExecutionEnvironment.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> This is a long term issue which
> > > needs
> > > > > > more
> > > > > > > > >> >> >>>>> careful
> > > > > > > > >> >> >>>>>>>>>>> thinking
> > > > > > > > >> >> >>>>>>>>>>>>>>>> and
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> design.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Currently some of features of
> > > flink is
> > > > > > > > >> >> >>> exposed
> > > > > > > > >> >> >>>> in
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>> ExecutionEnvironment/StreamExecutionEnvironment,
> > > > > > > > >> >> >>>>>> but
> > > > > > > > >> >> >>>>>>>>>>> some are
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> exposed
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> in
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> cli instead of api, like the
> > > cancel and
> > > > > > > > >> >> >>>>> savepoint I
> > > > > > > > >> >> >>>>>>>>>>> mentioned
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> above.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> I
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> think the root cause is due to
> > that
> > > > > flink
> > > > > > > > >> >> >>>> didn't
> > > > > > > > >> >> >>>>>>> unify
> > > > > > > > >> >> >>>>>>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> interaction
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> with
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> flink. Here I list 3 scenarios
> of
> > > flink
> > > > > > > > >> >> >>>> operation
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Local job execution.  Flink
> > will
> > > > > > create
> > > > > > > > >> >> >>>>>>>>>>> LocalEnvironment
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> and
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> then
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> use
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  this LocalEnvironment to
> create
> > > > > > > > >> >> >>> LocalExecutor
> > > > > > > > >> >> >>>>> for
> > > > > > > > >> >> >>>>>>> job
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> execution.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Remote job execution. Flink
> > will
> > > > > > create
> > > > > > > > >> >> >>>>>>>> ClusterClient
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> first
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> and
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> then
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  create ContextEnvironment
> based
> > > on the
> > > > > > > > >> >> >>>>>>> ClusterClient
> > > > > > > > >> >> >>>>>>>>>>> and
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> then
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> run
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> job.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Job cancelation. Flink will
> > > create
> > > > > > > > >> >> >>>>>> ClusterClient
> > > > > > > > >> >> >>>>>>>>>>> first
> > > > > > > > >> >> >>>>>>>>>>>>>>>> and
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> then
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> cancel
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  this job via this
> ClusterClient.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> As you can see in the above 3
> > > > > scenarios.
> > > > > > > > >> >> >>> Flink
> > > > > > > > >> >> >>>>>> didn't
> > > > > > > > >> >> >>>>>>>>>>> use the
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> same
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> approach(code path) to interact
> > > with
> > > > > > flink
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> What I propose is following:
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Create the proper
> > > > > > > > >> >> >>>>>> LocalEnvironment/RemoteEnvironment
> > > > > > > > >> >> >>>>>>>>>>> (based
> > > > > > > > >> >> >>>>>>>>>>>>>>>> on
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> user
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> configuration) --> Use this
> > > Environment
> > > > > > to
> > > > > > > > >> >> >>>> create
> > > > > > > > >> >> >>>>>>>> proper
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> ClusterClient
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> (LocalClusterClient or
> > > > > RestClusterClient)
> > > > > > > > >> >> >> to
> > > > > > > > >> >> >>>>>>>> interactive
> > > > > > > > >> >> >>>>>>>>>>> with
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Flink (
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> job
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> execution or cancelation)
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> This way we can unify the
> process
> > > of
> > > > > > local
> > > > > > > > >> >> >>>>>> execution
> > > > > > > > >> >> >>>>>>>> and
> > > > > > > > >> >> >>>>>>>>>>>>>>>> remote
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> execution.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> And it is much easier for third
> > > party
> > > > > to
> > > > > > > > >> >> >>>>> integrate
> > > > > > > > >> >> >>>>>>> with
> > > > > > > > >> >> >>>>>>>>>>>>>>>> flink,
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> because
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment is the
> > unified
> > > > > entry
> > > > > > > > >> >> >>> point
> > > > > > > > >> >> >>>>> for
> > > > > > > > >> >> >>>>>>>>>>> flink.
> > > > > > > > >> >> >>>>>>>>>>>>>>>> What
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> third
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> party
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> needs to do is just pass
> > > configuration
> > > > > to
> > > > > > > > >> >> >>>>>>>>>>>>>>>> ExecutionEnvironment
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> and
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment will do
> the
> > > right
> > > > > > > > >> >> >> thing
> > > > > > > > >> >> >>>>> based
> > > > > > > > >> >> >>>>>> on
> > > > > > > > >> >> >>>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> configuration.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Flink cli can also be
> considered
> > as
> > > > > flink
> > > > > > > > >> >> >> api
> > > > > > > > >> >> >>>>>>> consumer.
> > > > > > > > >> >> >>>>>>>>>>> it
> > > > > > > > >> >> >>>>>>>>>>>>>>>> just
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> pass
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> the
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> configuration to
> > > ExecutionEnvironment
> > > > > and
> > > > > > > > >> >> >> let
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> ExecutionEnvironment
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> to
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> create the proper ClusterClient
> > > instead
> > > > > > of
> > > > > > > > >> >> >>>>> letting
> > > > > > > > >> >> >>>>>>> cli
> > > > > > > > >> >> >>>>>>>> to
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> create
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ClusterClient directly.
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 6 would involve large code
> > > refactoring,
> > > > > > so
> > > > > > > > >> >> >> I
> > > > > > > > >> >> >>>>> think
> > > > > > > > >> >> >>>>>> we
> > > > > > > > >> >> >>>>>>>> can
> > > > > > > > >> >> >>>>>>>>>>>>>>>> defer
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>> it
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> for
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> future release, 1,2,3,4,5 could
> > be
> > > done
> > > > > > at
> > > > > > > > >> >> >>>> once I
> > > > > > > > >> >> >>>>>>>>>>> believe.
> > > > > > > > >> >> >>>>>>>>>>>>>>>> Let
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> me
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> know
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> your
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> comments and feedback, thanks
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> --
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Best Regards
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> --
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> Best Regards
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> Jeff Zhang
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> --
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Best Regards
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Jeff Zhang
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> --
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> Best Regards
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>> Jeff Zhang
> > > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>> --
> > > > > > > > >> >> >>>>>>>>>>>>>>> Best Regards
> > > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>> Jeff Zhang
> > > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>> --
> > > > > > > > >> >> >>>>>>>>>>>>> Best Regards
> > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>> Jeff Zhang
> > > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>>>
> > > > > > > > >> >> >>>>>>>>
> > > > > > > > >> >> >>>>>>>> --
> > > > > > > > >> >> >>>>>>>> Best Regards
> > > > > > > > >> >> >>>>>>>>
> > > > > > > > >> >> >>>>>>>> Jeff Zhang
> > > > > > > > >> >> >>>>>>>>
> > > > > > > > >> >> >>>>>>>
> > > > > > > > >> >> >>>>>>
> > > > > > > > >> >> >>>>>
> > > > > > > > >> >> >>>>>
> > > > > > > > >> >> >>>>> --
> > > > > > > > >> >> >>>>> Best Regards
> > > > > > > > >> >> >>>>>
> > > > > > > > >> >> >>>>> Jeff Zhang
> > > > > > > > >> >> >>>>>
> > > > > > > > >> >> >>>>
> > > > > > > > >> >> >>>
> > > > > > > > >> >> >>
> > > > > > > > >> >> >
> > > > > > > > >> >> >
> > > > > > > > >> >> > --
> > > > > > > > >> >> > Best Regards
> > > > > > > > >> >> >
> > > > > > > > >> >> > Jeff Zhang
> > > > > > > > >> >>
> > > > > > > > >> >>
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> > >
> >
>
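[Editor's note: the JobListener proposed in point 4 of the quoted mail can be illustrated with a minimal, self-contained sketch. JobID and JobExecutionResult are replaced by String stand-ins here so the example runs without Flink on the classpath; the real interface would use Flink's types.]

```java
// Simplified stand-in for the proposed JobListener (String instead of
// JobID/JobExecutionResult, for illustration only).
interface JobListener {
    void onJobSubmitted(String jobId);
    void onJobExecuted(String jobResult);
    void onJobCanceled(String jobId);
}

// How a downstream project (e.g. a notebook backend like Zeppelin) might
// use it: capture the job id at submission time so the job can be
// cancelled later when necessary.
class CapturingListener implements JobListener {
    private String capturedJobId;

    @Override
    public void onJobSubmitted(String jobId) { this.capturedJobId = jobId; }

    @Override
    public void onJobExecuted(String jobResult) { /* surface the result */ }

    @Override
    public void onJobCanceled(String jobId) { this.capturedJobId = null; }

    String capturedJobId() { return capturedJobId; }

    public static void main(String[] args) {
        CapturingListener listener = new CapturingListener();
        listener.onJobSubmitted("job-42");
        System.out.println(listener.capturedJobId()); // job-42
    }
}
```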

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Zili Chen <wa...@gmail.com>.
Hi Till,

Thanks for your update. Nice to hear :-)

Best,
tison.


Till Rohrmann <tr...@apache.org> 于2019年8月23日周五 下午10:39写道:

> Hi Tison,
>
> just a quick comment concerning the class loading issues when using the per
> job mode. The community wants to change it so that the
> StandaloneJobClusterEntryPoint actually uses the user code class loader
> with child first class loading [1]. Hence, I hope that this problem will be
> resolved soon.
>
> [1] https://issues.apache.org/jira/browse/FLINK-13840
>
> Cheers,
> Till
>
> On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas <kk...@gmail.com> wrote:
>
> > Hi all,
> >
> > On the topic of web submission, I agree with Till that it only seems
> > to complicate things.
> > It is bad for security, job isolation (anybody can submit/cancel jobs),
> > and its
> > implementation complicates some parts of the code. So, if it were to
> > redesign the
> > WebUI, maybe this part could be left out. In addition, I would say
> > that the ability to cancel
> > jobs could also be left out.
> >
> > I would also be in favour of removing the "detached" mode, for
> > the reasons mentioned
> > above (i.e. because now we will have a future representing the result
> > on which the user
> > can choose to wait or not).
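[Editor's note: the future-based argument above can be sketched in plain Java; `submit()` and its return type are stand-ins for illustration, not Flink API. With a future handle there is no need for a separate "detached" mode: detached submission is simply not waiting.]

```java
import java.util.concurrent.CompletableFuture;

class DetachedOrAttached {
    // Stand-in for job submission returning a future of the result.
    static CompletableFuture<String> submit() {
        return CompletableFuture.supplyAsync(() -> "result");
    }

    public static void main(String[] args) {
        CompletableFuture<String> f = submit();
        // Attached behaviour: block until the result is available.
        System.out.println(f.join());
        // Detached behaviour: keep (or drop) the handle and move on,
        // without ever calling join()/get().
    }
}
```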
> >
> > Now, for separating job submission and cluster creation, I am in
> > favour of keeping both.
> > Once again, the reasons are mentioned above by Stephan, Till, Aljoscha
> > and also Zili seems
> > to agree. They mainly have to do with security, isolation and ease of
> > resource management
> > for the user as he knows that "when my job is done, everything will be
> > cleared up". This is
> > also the experience you get when launching a process on your local OS.
> >
> > On excluding the per-job mode from returning a JobClient or not, I
> > believe that eventually
> > it would be nice to allow users to get back a jobClient. The reason is
> > that 1) I cannot
> > find any objective reason why the user-experience should diverge, and
> > 2) this will be the
> > way that the user will be able to interact with his running job.
> > Assuming that the necessary
> > ports are open for the REST API to work, then I think that the
> > JobClient can run against the
> > REST API without problems. If the needed ports are not open, then we
> > are safe to not return
> > a JobClient, as the user explicitly chose to close all points of
> > communication to his running job.
> >
> > On the topic of not hijacking the "env.execute()" in order to get the
> > Plan, I definitely agree but
> > for the proposal of having a "compile()" method in the env, I would
> > like to have a better look at
> > the existing code.
> >
> > Cheers,
> > Kostas
> >
> > On Fri, Aug 23, 2019 at 5:52 AM Zili Chen <wa...@gmail.com> wrote:
> > >
> > > Hi Yang,
> > >
> > > It would be helpful if you check Stephan's last comment,
> > > which states that isolation is important.
> > >
> > > For per-job mode, we run a dedicated cluster(maybe it
> > > should have been a couple of JM and TMs during FLIP-6
> > > design) for a specific job. Thus the job is isolated
> > > from other jobs.
> > >
> > > In our case there was a time when we suffered from multiple
> > > jobs submitted by different users that affected each other,
> > > so that all of them ran into an error state. Also, running
> > > the client inside the cluster could save client resources at
> > > some points.
> > >
> > > However, we also face several issues as you mentioned,
> > > that in per-job mode it always uses parent classloader
> > > thus classloading issues occur.
> > >
> > > BTW, one can make an analogy between session/per-job mode
> > > in Flink and client/cluster mode in Spark.
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > Yang Wang <da...@gmail.com> 于2019年8月22日周四 上午11:25写道:
> > >
> > > > From the user's perspective, it is really confusing what the scope
> > > > of a per-job cluster is.
> > > >
> > > >
> > > > If it means a Flink cluster with a single job, so that we get better
> > > > isolation, then it does not matter how we deploy the cluster: we can
> > > > directly deploy (mode 1), or start a Flink cluster and then submit
> > > > the job through a cluster client (mode 2).
> > > >
> > > >
> > > > Otherwise, if it just means direct deployment, how should we name
> > > > mode 2: session with job, or something else?
> > > >
> > > > We could also benefit from mode 2. Users could get the same
> > > > isolation as with mode 1. The user code and dependencies will be
> > > > loaded by the user class loader to avoid class conflicts with the
> > > > framework.
> > > >
> > > >
> > > >
> > > > Anyway, both submission modes are useful.
> > > >
> > > > We just need to clarify the concepts.
> > > >
> > > >
> > > >
> > > >
> > > > Best,
> > > >
> > > > Yang
> > > >
> > > > Zili Chen <wa...@gmail.com> 于2019年8月20日周二 下午5:58写道:
> > > >
> > > > > Thanks for the clarification.
> > > > >
> > > > > The JobDeployer idea came into my mind when I was puzzling over
> > > > > how to execute per-job mode and session mode with the same user
> > > > > code and framework code path.
> > > > >
> > > > > With the JobDeployer concept we come back to the statement that
> > > > > the environment knows every config of cluster deployment and job
> > > > > submission. We configure, or generate from configuration, a
> > > > > specific JobDeployer in the environment, and then the code aligns
> > > > > on
> > > > >
> > > > > *JobClient client = env.execute().get();*
> > > > >
> > > > > which in session mode returned by clusterClient.submitJob and in
> > per-job
> > > > > mode returned by clusterDescriptor.deployJobCluster.
> > > > >
> > > > > Here comes a problem: currently we directly run the
> > > > > ClusterEntrypoint with the extracted job graph. Following the
> > > > > JobDeployer way, we had better align the entry point of per-job
> > > > > deployment at the JobDeployer. Users run their main method, or a
> > > > > CLI (which finally calls the main method), to deploy the job
> > > > > cluster.
> > > > >
> > > > > Best,
> > > > > tison.
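[Editor's note: the `JobClient client = env.execute().get();` alignment quoted above can be sketched end to end in plain Java. `JobClient`, the `execute` method, and the two submission paths below are stand-ins taken from the discussion, not actual Flink API.]

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical minimal JobClient from the discussion.
interface JobClient {
    String jobId();
}

class SubmissionSketch {
    // In session mode the future would come from ClusterClient#submitJob;
    // in per-job mode from ClusterDescriptor#deployJobCluster. The caller
    // sees the same CompletableFuture<JobClient> either way, which is the
    // codepath alignment the mail describes.
    static CompletableFuture<JobClient> execute(boolean perJob) {
        String id = perJob ? "per-job-cluster-job" : "session-job";
        return CompletableFuture.completedFuture(() -> id);
    }

    public static void main(String[] args) throws Exception {
        // Same user code for both modes; only configuration differs.
        JobClient client = execute(false).get();
        System.out.println(client.jobId());
    }
}
```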
> > > > >
> > > > >
> > > > > Stephan Ewen <se...@apache.org> 于2019年8月20日周二 下午4:40写道:
> > > > >
> > > > > > Till has made some good comments here.
> > > > > >
> > > > > > Two things to add:
> > > > > >
> > > > > >   - The job mode is very nice in the way that it runs the client
> > inside
> > > > > the
> > > > > > cluster (in the same image/process that is the JM) and thus
> unifies
> > > > both
> > > > > > applications and what the Spark world calls the "driver mode".
> > > > > >
> > > > > >   - Another thing I would add is that during the FLIP-6 design,
> we
> > were
> > > > > > thinking about setups where Dispatcher and JobManager are
> separate
> > > > > > processes.
> > > > > >     A Yarn or Mesos Dispatcher of a session could run
> independently
> > > > (even
> > > > > > as privileged processes executing no code).
> > > > > >     Then the "per-job" mode could still be helpful: when a job
> > > > > > is submitted to the dispatcher, it launches the JM again in
> > > > > > per-job mode, so that JM and TM processes are bound to the job
> > > > > > only. For higher security setups, it is important that processes
> > > > > > are not reused across jobs.
> > > > > >
> > > > > > On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <
> > trohrmann@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > I would not be in favour of getting rid of the per-job mode
> > since it
> > > > > > > simplifies the process of running Flink jobs considerably.
> > Moreover,
> > > > it
> > > > > > is
> > > > > > > not only well suited for container deployments but also for
> > > > deployments
> > > > > > > where you want to guarantee job isolation. For example, a user
> > could
> > > > > use
> > > > > > > the per-job mode on Yarn to execute his job on a separate
> > cluster.
> > > > > > >
> > > > > > > I think that having two notions of cluster deployments (session
> > vs.
> > > > > > per-job
> > > > > > > mode) does not necessarily contradict your ideas for the client
> > api
> > > > > > > refactoring. For example one could have the following
> interfaces:
> > > > > > >
> > > > > > > - ClusterDeploymentDescriptor: encapsulates the logic how to
> > deploy a
> > > > > > > cluster.
> > > > > > > - ClusterClient: allows to interact with a cluster
> > > > > > > - JobClient: allows to interact with a running job
> > > > > > >
> > > > > > > Now the ClusterDeploymentDescriptor could have two methods:
> > > > > > >
> > > > > > > - ClusterClient deploySessionCluster()
> > > > > > > - JobClusterClient/JobClient deployPerJobCluster(JobGraph)
> > > > > > >
> > > > > > > where JobClusterClient is either a supertype of ClusterClient
> > which
> > > > > does
> > > > > > > not give you the functionality to submit jobs or
> > deployPerJobCluster
> > > > > > > returns directly a JobClient.
> > > > > > >
> > > > > > > When setting up the ExecutionEnvironment, one would then not
> > provide
> > > > a
> > > > > > > ClusterClient to submit jobs but a JobDeployer which, depending
> > on
> > > > the
> > > > > > > selected mode, either uses a ClusterClient (session mode) to
> > submit
> > > > > jobs
> > > > > > or
> > > > > > > a ClusterDeploymentDescriptor to deploy a per-job mode cluster
> > > > > > > with the job to execute.
> > > > > > >
> > > > > > > These are just some thoughts on how one could make it work,
> > > > > > > because I believe there is some value in using the per-job
> > > > > > > mode from the ExecutionEnvironment.
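[Editor's note: Till's proposed interface split can be written down as a minimal sketch. All names come from the mail; the signatures and the in-memory implementation are illustrative assumptions, and `JobGraph` is reduced to a String.]

```java
// Hypothetical minimal versions of the three interfaces from the mail.
interface JobClient {
    String jobId(); // interact with a running job
}

interface ClusterClient {
    JobClient submitJob(String jobGraph); // session mode: submit to an existing cluster
}

interface ClusterDeploymentDescriptor {
    ClusterClient deploySessionCluster();           // long-lived cluster, many jobs
    JobClient deployPerJobCluster(String jobGraph); // dedicated cluster for one job
}

// In-memory implementation standing in for a real deployment, to show how
// the two methods diverge: one yields a client you submit to, the other
// yields a handle to the single job directly.
class InMemoryDeployment implements ClusterDeploymentDescriptor {
    @Override
    public ClusterClient deploySessionCluster() {
        return jobGraph -> () -> "session:" + jobGraph;
    }

    @Override
    public JobClient deployPerJobCluster(String jobGraph) {
        return () -> "per-job:" + jobGraph;
    }

    public static void main(String[] args) {
        ClusterDeploymentDescriptor d = new InMemoryDeployment();
        System.out.println(d.deployPerJobCluster("g1").jobId());
    }
}
```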
> > > > > > >
> > > > > > > Concerning the web submission, this is indeed a bit tricky.
> From
> > a
> > > > > > cluster
> > > > > > > management stand point, I would in favour of not executing user
> > code
> > > > on
> > > > > > the
> > > > > > > REST endpoint. Especially when considering security, it would
> be
> > good
> > > > > to
> > > > > > > have a well defined cluster behaviour where it is explicitly
> > stated
> > > > > where
> > > > > > > user code and, thus, potentially risky code is executed.
> Ideally
> > we
> > > > > limit
> > > > > > > it to the TaskExecutor and JobMaster.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > On Tue, Aug 20, 2019 at 9:40 AM Flavio Pompermaier <
> > > > > pompermaier@okkam.it
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > In my opinion the client should not use any environment to
> get
> > the
> > > > > Job
> > > > > > > > graph because the jar should reside ONLY on the cluster (and
> > not in
> > > > > the
> > > > > > > > client classpath otherwise there are always inconsistencies
> > between
> > > > > > > client
> > > > > > > > and Flink Job manager's classpath).
> > > > > > > > In the YARN, Mesos and Kubernetes scenarios you have the jar
> > but
> > > > you
> > > > > > > could
> > > > > > > > start a cluster that has the jar on the Job Manager as well
> > (but
> > > > this
> > > > > > is
> > > > > > > > the only case where I think you can assume that the client
> has
> > the
> > > > > jar
> > > > > > on
> > > > > > > > the classpath... in the REST job submission you don't have
> > > > > > > > any classpath).
> > > > > > > >
> > > > > > > > Thus, always in my opinion, the JobGraph should be generated
> > by the
> > > > > Job
> > > > > > > > Manager REST API.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <
> > wander4096@gmail.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > >> I would like to involve Till & Stephan here to clarify some
> > > > concept
> > > > > of
> > > > > > > >> per-job mode.
> > > > > > > >>
> > > > > > > >> The term per-job is one of the modes a cluster can run in.
> > > > > > > >> It mainly aims at spawning a dedicated cluster for a
> > > > > > > >> specific job, while the job can be packaged with Flink
> > > > > > > >> itself and the cluster thus initialized with the job, so
> > > > > > > >> that we get rid of a separate submission step.
> > > > > > > >>
> > > > > > > >> This is useful for container deployments, where one creates
> > > > > > > >> an image with the job and then simply deploys the container.
> > > > > > > >>
> > > > > > > >> However, it is out of client scope, since a client
> > > > > > > >> (ClusterClient, for example) is for communicating with an
> > > > > > > >> existing cluster and performing actions. Currently, in
> > > > > > > >> per-job mode, we extract the job graph and bundle it into
> > > > > > > >> the cluster deployment, and thus no concept of a client gets
> > > > > > > >> involved. It looks reasonable to exclude the deployment of
> > > > > > > >> the per-job cluster from the client api and use dedicated
> > > > > > > >> utility classes (deployers) for deployment.
> > > > > > > >>
> > > > > > > >> Zili Chen <wa...@gmail.com> 于2019年8月20日周二 下午12:37写道:
> > > > > > > >>
> > > > > > > >> > Hi Aljoscha,
> > > > > > > >> >
> > > > > > > >> > Thanks for your reply and participation. The Google Doc
> > > > > > > >> > you linked to requires permission; I think you could use a
> > > > > > > >> > share link instead.
> > > > > > > >> >
> > > > > > > >> > I agree that we have almost reached a consensus that a
> > > > > > > >> > JobClient is necessary to interact with a running job.
> > > > > > > >> >
> > > > > > > >> > Let me check your open questions one by one.
> > > > > > > >> >
> > > > > > > >> > 1. Separate cluster creation and job submission for
> per-job
> > > > mode.
> > > > > > > >> >
> > > > > > > >> > As you mentioned, here is where the opinions diverge. In
> > > > > > > >> > my document there is an alternative [2] that proposes
> > > > > > > >> > excluding per-job deployment from the client api scope,
> > > > > > > >> > and now I find it more reasonable that we do the
> > > > > > > >> > exclusion.
> > > > > > > >> >
> > > > > > > >> > When in per-job mode, a dedicated JobCluster is launched
> to
> > > > > execute
> > > > > > > the
> > > > > > > >> > specific job. It is like a Flink Application more than a
> > > > > submission
> > > > > > > >> > of a Flink Job. The client only takes care of job submission and
> > > > assumes
> > > > > > > there
> > > > > > > >> is
> > > > > > > >> > an existing cluster. In this way we are able to consider
> > per-job
> > > > > > > issues
> > > > > > > >> > individually and JobClusterEntrypoint would be the utility
> > class
> > > > > for
> > > > > > > >> > per-job
> > > > > > > >> > deployment.
> > > > > > > >> >
> > > > > > > >> > Nevertheless, the user program works in both session mode and
> > > > per-job
> > > > > > mode
> > > > > > > >> > without
> > > > > > > >> > needing to change code. JobClient in per-job mode is
> > returned
> > > > > from
> > > > > > > >> > env.execute as normal. However, it would no longer be a
> > wrapper
> > > > of
> > > > > > > >> > RestClusterClient but a wrapper of PerJobClusterClient
> which
> > > > > > > >> communicates
> > > > > > > >> > to Dispatcher locally.
> > > > > > > >> >
> > > > > > > >> > 2. How to deal with plan preview.
> > > > > > > >> >
> > > > > > > >> > With env.compile functions users can get JobGraph or
> > FlinkPlan
> > > > and
> > > > > > > thus
> > > > > > > >> > they can preview the plan programmatically. Typically it
> > looks
> > > > > like
> > > > > > > >> >
> > > > > > > >> > if (preview configured) {
> > > > > > > >> >     FlinkPlan plan = env.compile();
> > > > > > > >> >     new JSONDumpGenerator(...).dump(plan);
> > > > > > > >> > } else {
> > > > > > > >> >     env.execute();
> > > > > > > >> > }
> > > > > > > >> >
> > > > > > > >> > And `flink info` would no longer be valid.
> > > > > > > >> >
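The preview branch sketched just above can be rendered as a tiny runnable program. FlinkPlan, Env#compile and the JSON dump below are placeholder stand-ins mirroring the names in the snippet, not real Flink classes.

```java
// Runnable rendering of the preview-vs-execute branch from the snippet above.
// FlinkPlan, Env and dumpAsJson are hypothetical placeholders, not Flink API.
public class PreviewSketch {

    static final class FlinkPlan {
        final String description;
        FlinkPlan(String description) { this.description = description; }
    }

    static final class Env {
        // Stands in for the proposed env.compile() that returns the plan
        // without executing the job.
        FlinkPlan compile() { return new FlinkPlan("Source -> Map -> Sink"); }
        void execute() { System.out.println("executing job"); }
    }

    static String dumpAsJson(FlinkPlan plan) {
        return "{\"plan\": \"" + plan.description + "\"}";
    }

    public static void main(String[] args) {
        boolean previewConfigured = args.length > 0 && args[0].equals("--preview");
        Env env = new Env();
        if (previewConfigured) {
            // Preview path: compile only, dump the plan, never execute.
            System.out.println(dumpAsJson(env.compile()));
        } else {
            env.execute();
        }
    }
}
```

With such an explicit compile step, the plan preview no longer needs to hijack execute() and abort via an exception.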
> > > > > > > >> > 3. How to deal with Jar Submission at the Web Frontend.
> > > > > > > >> >
> > > > > > > >> > There is one more thread that discussed this topic[1]. Apart
> from
> > > > > > removing
> > > > > > > >> > the functions there are two alternatives.
> > > > > > > >> >
> > > > > > > >> > One is to introduce an interface with a method that returns
> > > > > > > JobGraph/FlinkPlan
> > > > > > > >> > and have Jar Submission only support main classes implementing this
> > > > > > interface.
> > > > > > > >> > And then extract the JobGraph/FlinkPlan just by calling
> the
> > > > > method.
> > > > > > > >> > In this way, it is even possible to consider a separation
> > of job
> > > > > > > >> creation
> > > > > > > >> > and job submission.
> > > > > > > >> >
> > > > > > > >> > The other is, as you mentioned, let execute() do the
> actual
> > > > > > execution.
> > > > > > > >> > We won't execute the main method in the WebFrontend but
> > spawn a
> > > > > > > process
> > > > > > > >> > at the WebMonitor side to execute it. As for the return value, we could
> > generate
> > > > > the
> > > > > > > >> > JobID from WebMonitor and pass it to the execution
> > environment.
> > > > > > > >> >
> > > > > > > >> > 4. How to deal with detached mode.
> > > > > > > >> >
> > > > > > > >> > I think detached mode is a temporary solution for
> > non-blocking
> > > > > > > >> submission.
> > > > > > > >> > In my document both submission and execution return a
> > > > > > > CompletableFuture
> > > > > > > >> and
> > > > > > > >> > users control whether or not to wait for the result. At this
> > point
> > > > we
> > > > > > > don't
> > > > > > > >> > need a detached option but the functionality is covered.
> > > > > > > >> >
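The non-blocking submission pattern described above can be sketched with plain java.util.concurrent types. JobClient, JobResult and submit() are hypothetical names for illustration only, not an existing Flink API; the point is that one future-returning call covers both attached and detached behavior.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Minimal sketch of the proposal: submit() always returns a future, and the
// caller decides whether to wait, so no separate detached mode is needed.
public class SubmissionSketch {

    static final class JobResult {
        final String jobId;
        JobResult(String jobId) { this.jobId = jobId; }
    }

    static final class JobClient {
        // Stands in for an asynchronous submission to a cluster.
        CompletableFuture<JobResult> submit(String jobName) {
            return CompletableFuture.supplyAsync(() -> new JobResult(jobName + "-0001"));
        }
    }

    public static void main(String[] args) throws Exception {
        JobClient client = new JobClient();

        // "Attached" behavior: block until the job finishes.
        JobResult attached = client.submit("wordcount").get(10, TimeUnit.SECONDS);
        System.out.println("attached result: " + attached.jobId);

        // "Detached" behavior: continue immediately, react via a callback.
        CompletableFuture<JobResult> detached = client.submit("etl")
                .whenComplete((r, t) -> System.out.println("detached done: " + r.jobId));
        detached.join();
    }
}
```

Under this sketch the attached/detached distinction becomes a choice made at the call site rather than a property of the environment.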
> > > > > > > >> > 5. How does per-job mode interact with interactive
> > programming.
> > > > > > > >> >
> > > > > > > >> > All of YARN, Mesos and Kubernetes scenarios follow the
> > pattern
> > > > > > of launching
> > > > > > > a
> > > > > > > >> > JobCluster now. And I don't think there would be
> > inconsistency
> > > > > > between
> > > > > > > >> > different resource managers.
> > > > > > > >> >
> > > > > > > >> > Best,
> > > > > > > >> > tison.
> > > > > > > >> >
> > > > > > > >> > [1]
> > > > > > > >> >
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
> > > > > > > >> > [2]
> > > > > > > >> >
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
> > > > > > > >> >
> > > > > > > >> > Aljoscha Krettek <al...@apache.org> 于2019年8月16日周五
> > 下午9:20写道:
> > > > > > > >> >
> > > > > > > >> >> Hi,
> > > > > > > >> >>
> > > > > > > >> >> I read both Jeffs initial design document and the newer
> > > > document
> > > > > by
> > > > > > > >> >> Tison. I also finally found the time to collect our
> > thoughts on
> > > > > the
> > > > > > > >> issue,
> > > > > > > >> >> I had quite some discussions with Kostas and this is the
> > > > result:
> > > > > > [1].
> > > > > > > >> >>
> > > > > > > >> >> I think overall we agree that this part of the code is in
> > dire
> > > > > need
> > > > > > > of
> > > > > > > >> >> some refactoring/improvements but I think there are still
> > some
> > > > > open
> > > > > > > >> >> questions and some differences in opinion what those
> > > > refactorings
> > > > > > > >> should
> > > > > > > >> >> look like.
> > > > > > > >> >>
> > > > > > > >> >> I think the API-side is quite clear, i.e. we need some
> > > > JobClient
> > > > > > API
> > > > > > > >> that
> > > > > > > >> >> allows interacting with a running Job. It could be
> > worthwhile
> > > > to
> > > > > > spin
> > > > > > > >> that
> > > > > > > >> >> off into a separate FLIP because we can probably find
> > consensus
> > > > > on
> > > > > > > that
> > > > > > > >> >> part more easily.
> > > > > > > >> >>
> > > > > > > >> >> For the rest, the main open questions from our doc are
> > these:
> > > > > > > >> >>
> > > > > > > >> >>   - Do we want to separate cluster creation and job
> > submission
> > > > > for
> > > > > > > >> >> per-job mode? In the past, there were conscious efforts
> to
> > > > *not*
> > > > > > > >> separate
> > > > > > > >> >> job submission from cluster creation for per-job clusters
> > for
> > > > > > Mesos,
> > > > > > > >> YARN,
> > > > > > > >> >> Kubernetes (see StandaloneJobClusterEntryPoint). Tison
> > suggests
> > > > in
> > > > > > his
> > > > > > > >> >> design document to decouple this in order to unify job
> > > > > submission.
> > > > > > > >> >>
> > > > > > > >> >>   - How to deal with plan preview, which needs to hijack
> > > > > execute()
> > > > > > > and
> > > > > > > >> >> let the outside code catch an exception?
> > > > > > > >> >>
> > > > > > > >> >>   - How to deal with Jar Submission at the Web Frontend,
> > which
> > > > > > needs
> > > > > > > to
> > > > > > > >> >> hijack execute() and let the outside code catch an
> > exception?
> > > > > > > >> >> CliFrontend.run() “hijacks”
> ExecutionEnvironment.execute()
> > to
> > > > > get a
> > > > > > > >> >> JobGraph and then execute that JobGraph manually. We
> could
> > get
> > > > > > around
> > > > > > > >> that
> > > > > > > >> >> by letting execute() do the actual execution. One caveat
> > for
> > > > this
> > > > > > is
> > > > > > > >> that
> > > > > > > >> >> now the main() method doesn’t return (or is forced to
> > return by
> > > > > > > >> throwing an
> > > > > > > >> >> exception from execute()) which means that for Jar
> > Submission
> > > > > from
> > > > > > > the
> > > > > > > >> >> WebFrontend we have a long-running main() method running
> > in the
> > > > > > > >> >> WebFrontend. This doesn’t sound very good. We could get
> > around
> > > > > this
> > > > > > > by
> > > > > > > >> >> removing the plan preview feature and by removing Jar
> > > > > > > >> Submission/Running.
> > > > > > > >> >>
> > > > > > > >> >>   - How to deal with detached mode? Right now,
> > > > > DetachedEnvironment
> > > > > > > will
> > > > > > > >> >> execute the job and return immediately. If users control
> > when
> > > > > they
> > > > > > > >> want to
> > > > > > > >> >> return, by waiting on the job completion future, how do
> we
> > deal
> > > > > > with
> > > > > > > >> this?
> > > > > > > >> >> Do we simply remove the distinction between
> > > > > detached/non-detached?
> > > > > > > >> >>
> > > > > > > >> >>   - How does per-job mode interact with “interactive
> > > > programming”
> > > > > > > >> >> (FLIP-36). For YARN, each execute() call could spawn a
> new
> > > > Flink
> > > > > > YARN
> > > > > > > >> >> cluster. What about Mesos and Kubernetes?
> > > > > > > >> >>
> > > > > > > >> >> The first open question is where the opinions diverge, I
> > think.
> > > > > The
> > > > > > > >> rest
> > > > > > > >> >> are just open questions and interesting things that we
> > need to
> > > > > > > >> consider.
> > > > > > > >> >>
> > > > > > > >> >> Best,
> > > > > > > >> >> Aljoscha
> > > > > > > >> >>
> > > > > > > >> >> [1]
> > > > > > > >> >>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> > > > > > > >> >> <
> > > > > > > >> >>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> > > > > > > >> >> >
> > > > > > > >> >>
> > > > > > > >> >> > On 31. Jul 2019, at 15:23, Jeff Zhang <
> zjffdu@gmail.com>
> > > > > wrote:
> > > > > > > >> >> >
> > > > > > > >> >> > Thanks tison for the effort. I left a few comments.
> > > > > > > >> >> >
> > > > > > > >> >> >
> > > > > > > >> >> > Zili Chen <wa...@gmail.com> 于2019年7月31日周三
> 下午8:24写道:
> > > > > > > >> >> >
> > > > > > > >> >> >> Hi Flavio,
> > > > > > > >> >> >>
> > > > > > > >> >> >> Thanks for your reply.
> > > > > > > >> >> >>
> > > > > > > >> >> >> Either current impl and in the design, ClusterClient
> > > > > > > >> >> >> never takes responsibility for generating JobGraph.
> > > > > > > >> >> >> (what you see in current codebase is several class
> > methods)
> > > > > > > >> >> >>
> > > > > > > >> >> >> Instead, user describes his program in the main method
> > > > > > > >> >> >> with ExecutionEnvironment apis and calls env.compile()
> > > > > > > >> >> >> or env.optimize() to get FlinkPlan and JobGraph
> > > > respectively.
> > > > > > > >> >> >>
> > > > > > > >> >> >> For listing main classes in a jar and choosing one for
> > > > > > > >> >> >> submission, you're now able to customize a CLI to do
> it.
> > > > > > > >> >> >> Specifically, the path of the jar is passed as an argument
> and
> > > > > > > >> >> >> in the customized CLI you list main classes, choose
> one
> > > > > > > >> >> >> to submit to the cluster.
> > > > > > > >> >> >>
> > > > > > > >> >> >> Best,
> > > > > > > >> >> >> tison.
> > > > > > > >> >> >>
> > > > > > > >> >> >>
> > > > > > > >> >> >> Flavio Pompermaier <po...@okkam.it>
> 于2019年7月31日周三
> > > > > > 下午8:12写道:
> > > > > > > >> >> >>
> > > > > > > >> >> >>> Just one note on my side: it is not clear to me
> > whether the
> > > > > > > client
> > > > > > > >> >> needs
> > > > > > > >> >> >> to
> > > > > > > >> >> >>> be able to generate a job graph or not.
> > > > > > > >> >> >>> In my opinion, the job jar must reside only on the
> > > > > > > >> server/jobManager
> > > > > > > >> >> >> side
> > > > > > > >> >> >>> and the client requires a way to get the job graph.
> > > > > > > >> >> >>> If you really want to access the job graph, I'd
> add
> > a
> > > > > > > dedicated
> > > > > > > >> >> method
> > > > > > > >> >> >>> on the ClusterClient. like:
> > > > > > > >> >> >>>
> > > > > > > >> >> >>>   - getJobGraph(jarId, mainClass): JobGraph
> > > > > > > >> >> >>>   - listMainClasses(jarId): List<String>
> > > > > > > >> >> >>>
> > > > > > > >> >> >>> These would require some additions on the job
> > manager
> > > > > > > endpoint
> > > > > > > >> as
> > > > > > > >> >> >>> well... what do you think?
> > > > > > > >> >> >>>
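The two methods proposed above can be sketched as a small Java interface with an in-memory stub. The names getJobGraph/listMainClasses come from the mail itself; JobGraph here is a placeholder type and the "jar store" map is purely hypothetical, standing in for the server-side jar storage the thread says would be required.

```java
import java.util.List;
import java.util.Map;

// Sketch of the two proposed ClusterClient additions, with a dummy backend.
public class ClusterClientSketch {

    static final class JobGraph {
        final String mainClass;
        JobGraph(String mainClass) { this.mainClass = mainClass; }
    }

    interface JarAwareClusterClient {
        List<String> listMainClasses(String jarId);
        JobGraph getJobGraph(String jarId, String mainClass);
    }

    // Dummy implementation backed by a static map, standing in for jars
    // that reside only on the server/JobManager side.
    static final class StubClient implements JarAwareClusterClient {
        private final Map<String, List<String>> jars =
                Map.of("jar-1", List.of("com.example.WordCount", "com.example.Etl"));

        @Override
        public List<String> listMainClasses(String jarId) {
            return jars.getOrDefault(jarId, List.of());
        }

        @Override
        public JobGraph getJobGraph(String jarId, String mainClass) {
            if (!listMainClasses(jarId).contains(mainClass)) {
                throw new IllegalArgumentException("unknown main class: " + mainClass);
            }
            return new JobGraph(mainClass);
        }
    }

    public static void main(String[] args) {
        JarAwareClusterClient client = new StubClient();
        System.out.println(client.listMainClasses("jar-1"));
        System.out.println(client.getJobGraph("jar-1", "com.example.WordCount").mainClass);
    }
}
```

A UI could call listMainClasses first and offer the result as a drop-down before requesting the job graph, which is the workflow the mail hints at.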
> > > > > > > >> >> >>> On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <
> > > > > > wander4096@gmail.com
> > > > > > > >
> > > > > > > >> >> wrote:
> > > > > > > >> >> >>>
> > > > > > > >> >> >>>> Hi all,
> > > > > > > >> >> >>>>
> > > > > > > >> >> >>>> Here is a document[1] on client api enhancement from
> > our
> > > > > > > >> perspective.
> > > > > > > >> >> >>>> We have investigated current implementations. And we
> > > > propose
> > > > > > > >> >> >>>>
> > > > > > > >> >> >>>> 1. Unify the implementation of cluster deployment
> and
> > job
> > > > > > > >> submission
> > > > > > > >> >> in
> > > > > > > >> >> >>>> Flink.
> > > > > > > >> >> >>>> 2. Provide programmatic interfaces to allow flexible
> > job
> > > > and
> > > > > > > >> cluster
> > > > > > > >> >> >>>> management.
> > > > > > > >> >> >>>>
> > > > > > > >> >> >>>> The first proposal is aimed at reducing code paths
> of
> > > > > cluster
> > > > > > > >> >> >> deployment
> > > > > > > >> >> >>>> and
> > > > > > > >> >> >>>> job submission so that one can adopt Flink in his
> > usage
> > > > > > easily.
> > > > > > > >> The
> > > > > > > >> >> >>> second
> > > > > > > >> >> >>>> proposal is aimed at providing rich interfaces for
> > > > advanced
> > > > > > > users
> > > > > > > >> >> >>>> who want to make accurate control of these stages.
> > > > > > > >> >> >>>>
> > > > > > > >> >> >>>> Quick reference on open questions:
> > > > > > > >> >> >>>>
> > > > > > > >> >> >>>> 1. Exclude job cluster deployment from client side
> or
> > > > > redefine
> > > > > > > the
> > > > > > > >> >> >>> semantic
> > > > > > > >> >> >>>> of job cluster? Since it fits in a process quite
> > different
> > > > > > from
> > > > > > > >> >> session
> > > > > > > >> >> >>>> cluster deployment and job submission.
> > > > > > > >> >> >>>>
> > > > > > > >> >> >>>> 2. Maintain the codepaths handling class
> > > > > > > o.a.f.api.common.Program
> > > > > > > >> or
> > > > > > > >> >> >>>> implement customized program handling logic by
> > customized
> > > > > > > >> >> CliFrontend?
> > > > > > > >> >> >>>> See also this thread[2] and the document[1].
> > > > > > > >> >> >>>>
> > > > > > > >> >> >>>> 3. Expose ClusterClient as public api or just expose
> > api
> > > > in
> > > > > > > >> >> >>>> ExecutionEnvironment
> > > > > > > >> >> >>>> and delegate them to ClusterClient? Further, in
> > either way
> > > > > is
> > > > > > it
> > > > > > > >> >> worth
> > > > > > > >> >> >> to
> > > > > > > >> >> >>>> introduce a JobClient which is an encapsulation of
> > > > > > ClusterClient
> > > > > > > >> that
> > > > > > > >> >> >>>> associated to specific job?
> > > > > > > >> >> >>>>
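Open question 3 above contrasts exposing ClusterClient directly with wrapping it in a job-scoped JobClient. A minimal sketch of the wrapper option follows; all names (ClusterClient, JobClient, requestStatus, cancel) are hypothetical illustrations of the encapsulation, not existing API.

```java
import java.util.concurrent.CompletableFuture;

// Sketch of a JobClient that wraps a ClusterClient and pins it to one job id,
// so user code never has to touch the cluster-wide client directly.
public class JobClientSketch {

    interface ClusterClient {
        CompletableFuture<String> requestStatus(String jobId);
        CompletableFuture<Void> cancel(String jobId);
    }

    static final class JobClient {
        private final ClusterClient cluster;
        private final String jobId;

        JobClient(ClusterClient cluster, String jobId) {
            this.cluster = cluster;
            this.jobId = jobId;
        }

        // Every call delegates to the underlying client with the bound job id.
        CompletableFuture<String> status() { return cluster.requestStatus(jobId); }
        CompletableFuture<Void> cancel() { return cluster.cancel(jobId); }
    }

    public static void main(String[] args) {
        // Stub cluster client that answers RUNNING for any job.
        ClusterClient stub = new ClusterClient() {
            @Override public CompletableFuture<String> requestStatus(String jobId) {
                return CompletableFuture.completedFuture("RUNNING");
            }
            @Override public CompletableFuture<Void> cancel(String jobId) {
                return CompletableFuture.completedFuture(null);
            }
        };

        JobClient job = new JobClient(stub, "job-42");
        System.out.println(job.status().join());
    }
}
```

The design choice here is that the wrapper carries no extra state beyond the job id, so it stays valid for whichever ClusterClient implementation sits behind it.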
> > > > > > > >> >> >>>> Best,
> > > > > > > >> >> >>>> tison.
> > > > > > > >> >> >>>>
> > > > > > > >> >> >>>> [1]
> > > > > > > >> >> >>>>
> > > > > > > >> >> >>>>
> > > > > > > >> >> >>>
> > > > > > > >> >> >>
> > > > > > > >> >>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
> > > > > > > >> >> >>>> [2]
> > > > > > > >> >> >>>>
> > > > > > > >> >> >>>>
> > > > > > > >> >> >>>
> > > > > > > >> >> >>
> > > > > > > >> >>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
> > > > > > > >> >> >>>>
> > > > > > > >> >> >>>> Jeff Zhang <zj...@gmail.com> 于2019年7月24日周三
> 上午9:19写道:
> > > > > > > >> >> >>>>
> > > > > > > >> >> >>>>> Thanks Stephan, I will follow up on this issue in the next
> > few
> > > > > > weeks,
> > > > > > > >> and
> > > > > > > >> >> >> will
> > > > > > > >> >> >>>>> refine the design doc. We could discuss more
> details
> > > > after
> > > > > > 1.9
> > > > > > > >> >> >> release.
> > > > > > > >> >> >>>>>
> > > > > > > >> >> >>>>> Stephan Ewen <se...@apache.org> 于2019年7月24日周三
> > 上午12:58写道:
> > > > > > > >> >> >>>>>
> > > > > > > >> >> >>>>>> Hi all!
> > > > > > > >> >> >>>>>>
> > > > > > > >> >> >>>>>> This thread has stalled for a bit, which I assume
> > is
> > > > > mostly
> > > > > > > >> due to
> > > > > > > >> >> >>> the
> > > > > > > >> >> >>>>>> Flink 1.9 feature freeze and release testing
> effort.
> > > > > > > >> >> >>>>>>
> > > > > > > >> >> >>>>>> I personally still recognize this issue as one
> > important
> > > > > to
> > > > > > be
> > > > > > > >> >> >>> solved.
> > > > > > > >> >> >>>>> I'd
> > > > > > > >> >> >>>>>> be happy to help resume this discussion soon
> (after
> > the
> > > > > 1.9
> > > > > > > >> >> >> release)
> > > > > > > >> >> >>>> and
> > > > > > > >> >> >>>>>> see if we can do some step towards this in Flink
> > 1.10.
> > > > > > > >> >> >>>>>>
> > > > > > > >> >> >>>>>> Best,
> > > > > > > >> >> >>>>>> Stephan
> > > > > > > >> >> >>>>>>
> > > > > > > >> >> >>>>>>
> > > > > > > >> >> >>>>>>
> > > > > > > >> >> >>>>>> On Mon, Jun 24, 2019 at 10:41 AM Flavio
> Pompermaier
> > <
> > > > > > > >> >> >>>>> pompermaier@okkam.it>
> > > > > > > >> >> >>>>>> wrote:
> > > > > > > >> >> >>>>>>
> > > > > > > >> >> >>>>>>> That's exactly what I suggested a long time ago:
> > the
> > > > > Flink
> > > > > > > REST
> > > > > > > >> >> >>>> client
> > > > > > > >> >> >>>>>>> should not require any Flink dependency, only
> http
> > > > > library
> > > > > > to
> > > > > > > >> >> >> call
> > > > > > > >> >> >>>> the
> > > > > > > >> >> >>>>>> REST
> > > > > > > >> >> >>>>>>> services to submit and monitor a job.
> > > > > > > >> >> >>>>>>> What I suggested also in [1] was to have a way to
> > > > > > > automatically
> > > > > > > >> >>>> suggest to
> > > > > > > >> >> >>>>>> the
> > > > > > > >> >> >>>>>>> user (via a UI) the available main classes and
> > their
> > > > > > required
> > > > > > > >> >> >>>>>>> parameters[2].
> > > > > > > >> >> >>>>>>> Another problem we have with Flink is that the
> Rest
> > > > > client
> > > > > > > and
> > > > > > > >> >> >> the
> > > > > > > >> >> >>>> CLI
> > > > > > > >> >> >>>>>> one
> > > > > > > >> >> >>>>>>> behaves differently and we use the CLI client
> (via
> > ssh)
> > > > > > > because
> > > > > > > >> >> >> it
> > > > > > > >> >> >>>>> allows
> > > > > > > >> >>>>>>> calling some other method after env.execute() [3]
> > (we
> > > > > have
> > > > > > to
> > > > > > > >> >> >> call
> > > > > > > >> >> >>>>>> another
> > > > > > > >> >> >>>>>>> REST service to signal the end of the job).
> > > > > > > >> >>>>>>> In this regard, a dedicated interface, like the
> > > > > JobListener
> > > > > > > >> >> >>> suggested
> > > > > > > >> >> >>>>> in
> > > > > > > >> >> >>>>>>> the previous emails, would be very helpful
> (IMHO).
> > > > > > > >> >> >>>>>>>
> > > > > > > >> >> >>>>>>> [1]
> > https://issues.apache.org/jira/browse/FLINK-10864
> > > > > > > >> >> >>>>>>> [2]
> > https://issues.apache.org/jira/browse/FLINK-10862
> > > > > > > >> >> >>>>>>> [3]
> > https://issues.apache.org/jira/browse/FLINK-10879
> > > > > > > >> >> >>>>>>>
> > > > > > > >> >> >>>>>>> Best,
> > > > > > > >> >> >>>>>>> Flavio
> > > > > > > >> >> >>>>>>>
> > > > > > > >> >> >>>>>>> On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <
> > > > > > zjffdu@gmail.com
> > > > > > > >
> > > > > > > >> >> >>> wrote:
> > > > > > > >> >> >>>>>>>
> > > > > > > >> >> >>>>>>>> Hi, Tison,
> > > > > > > >> >> >>>>>>>>
> > > > > > > >> >> >>>>>>>> Thanks for your comments. Overall I agree with
> you
> > > > that
> > > > > it
> > > > > > > is
> > > > > > > >> >> >>>>> difficult
> > > > > > > >> >> >>>>>>> for
> > > > > > > >> >>>>>>>> downstream projects to integrate with Flink and
> we
> > > > need
> > > > > to
> > > > > > > >> >> >>> refactor
> > > > > > > >> >> >>>>> the
> > > > > > > >> >> >>>>>>>> current flink client api.
> > > > > > > >> >>>>>>>> And I agree that CliFrontend should only parse
> > > > command
> > > > > > > line
> > > > > > > >> >> >>>>> arguments
> > > > > > > >> >> >>>>>>> and
> > > > > > > >> >> >>>>>>>> then pass them to ExecutionEnvironment. It is
> > > > > > > >> >> >>>> ExecutionEnvironment's
> > > > > > > >> >> >>>>>>>> responsibility to compile job, create cluster,
> and
> > > > > submit
> > > > > > > job.
> > > > > > > >> >> >>>>> Besides
> > > > > > > >> >> >>>>>>>> that, Currently flink has many
> > ExecutionEnvironment
> > > > > > > >> >> >>>> implementations,
> > > > > > > >> >> >>>>>> and
> > > > > > > >> >> >>>>>>>> flink will use the specific one based on the
> > context.
> > > > > > IMHO,
> > > > > > > it
> > > > > > > >> >> >> is
> > > > > > > >> >> >>>> not
> > > > > > > >> >> >>>>>>>> necessary, ExecutionEnvironment should be able
> to
> > do
> > > > the
> > > > > > > right
> > > > > > > >> >> >>>> thing
> > > > > > > >> >> >>>>>>> based
> > > > > > > >> >>>>>>>> on the FlinkConf it receives. Too many
> > > > > > > ExecutionEnvironment
> > > > > > > >> >>>>>>>> implementations are another burden for downstream
> > > > project
> > > > > > > >> >> >>>> integration.
> > > > > > > >> >> >>>>>>>>
> > > > > > > >> >> >>>>>>>> One thing I'd like to mention is flink's scala
> > shell
> > > > and
> > > > > > sql
> > > > > > > >> >> >>>> client,
> > > > > > > >> >> >>>>>>>> although they are sub-modules of flink, they
> > could be
> > > > > > > treated
> > > > > > > >> >> >> as
> > > > > > > >> >> >>>>>>> downstream
> > > > > > > >> >> >>>>>>>> project which use flink's client api. Currently
> > you
> > > > will
> > > > > > > find
> > > > > > > >> >> >> it
> > > > > > > >> >> >>> is
> > > > > > > >> >> >>>>> not
> > > > > > > >> >> >>>>>>>> easy for them to integrate with flink, they
> share
> > many
> > > > > > > >> >> >> duplicated
> > > > > > > >> >> >>>>> code.
> > > > > > > >> >> >>>>>>> It
> > > > > > > >> >>>>>>>> is another sign that we should refactor the Flink
> > client
> > > > > api.
> > > > > > > >> >> >>>>>>>>
> > > > > > > >> >> >>>>>>>> I believe it is a large and hard change, and I
> am
> > > > afraid
> > > > > > we
> > > > > > > >> can
> > > > > > > >> >> >>> not
> > > > > > > >> >> >>>>>> keep
> > > > > > > >> >>>>>>>> compatibility since many of the changes are user
> > facing.
> > > > > > > >> >> >>>>>>>>
> > > > > > > >> >> >>>>>>>>
> > > > > > > >> >> >>>>>>>>
> > > > > > > >> >> >>>>>>>> Zili Chen <wa...@gmail.com> 于2019年6月24日周一
> > > > > 下午2:53写道:
> > > > > > > >> >> >>>>>>>>
> > > > > > > >> >> >>>>>>>>> Hi all,
> > > > > > > >> >> >>>>>>>>>
> > > > > > > >> >>>>>>>>> After a closer look at our client apis, I can
> see
> > > > there
> > > > > > are
> > > > > > > >> >> >> two
> > > > > > > >> >> >>>>> major
> > > > > > > >> >>>>>>>>> issues affecting consistency and integration, namely
> > > > different
> > > > > > > >> >> >>>> deployment
> > > > > > > >> >> >>>>> of
> > > > > > > >> >> >>>>>>>>> job cluster which couples job graph creation
> and
> > > > > cluster
> > > > > > > >> >> >>>>> deployment,
> > > > > > > >> >>>>>>>>> and submission via CliFrontend, which confuses the
> control
> > flow
> > > > > of
> > > > > > > job
> > > > > > > >> >> >>>> graph
> > > > > > > >> >> >>>>>>>>> compilation and job submission. I'd like to
> > follow
> > > > the
> > > > > > > >> >> >> discuss
> > > > > > > >> >> >>>>> above,
> > > > > > > >> >> >>>>>>>>> mainly the process described by Jeff and
> > Stephan, and
> > > > > > share
> > > > > > > >> >> >> my
> > > > > > > >> >> >>>>>>>>> ideas on these issues.
> > > > > > > >> >> >>>>>>>>>
> > > > > > > >> >> >>>>>>>>> 1) CliFrontend confuses the control flow of job
> > > > > > compilation
> > > > > > > >> >> >> and
> > > > > > > >> >> >>>>>>>> submission.
> > > > > > > >> >> >>>>>>>>> Following the process of job submission Stephan
> > and
> > > > > Jeff
> > > > > > > >> >> >>>> described,
> > > > > > > >> >> >>>>>>>>> execution environment knows all configs of the
> > > > cluster
> > > > > > and
> > > > > > > >> >> >>>>>>> topos/settings
> > > > > > > >> >> >>>>>>>>> of the job. Ideally, in the main method of user
> > > > > program,
> > > > > > it
> > > > > > > >> >> >>> calls
> > > > > > > >> >> >>>>>>>> #execute
> > > > > > > >> >> >>>>>>>>> (or named #submit) and Flink deploys the
> cluster,
> > > > > compile
> > > > > > > the
> > > > > > > >> >> >>> job
> > > > > > > >> >> >>>>>> graph
> > > > > > > >> >> >>>>>>>>> and submit it to the cluster. However, current
> > > > > > CliFrontend
> > > > > > > >> >> >> does
> > > > > > > >> >> >>>> all
> > > > > > > >> >> >>>>>>> these
> > > > > > > >> >> >>>>>>>>> things inside its #runProgram method, which
> > > > introduces
> > > > > a
> > > > > > > lot
> > > > > > > >> >> >> of
> > > > > > > >> >> >>>>>>>> subclasses
> > > > > > > >> >> >>>>>>>>> of (stream) execution environment.
> > > > > > > >> >> >>>>>>>>>
> > > > > > > >> >> >>>>>>>>> Actually, it sets up an exec env that hijacks
> the
> > > > > > > >> >> >>>>>> #execute/executePlan
> > > > > > > >> >>>>>>>>> method, initializes the job graph and aborts
> > > > execution.
> > > > > > And
> > > > > > > >> >> >> then
> > > > > > > >> >>>>>>>>> control flows back to CliFrontend, which deploys
> the
> > > > > > cluster(or
> > > > > > > >> >>>>> retrieves
> > > > > > > >> >> >>>>>>>>> the client) and submits the job graph. This is
> > quite
> > > > a
> > > > > > > >> >> >> specific
> > > > > > > >> >> >>>>>>> internal
> > > > > > > >> >> >>>>>>>>> process inside Flink and none of consistency to
> > > > > anything.
> > > > > > > >> >> >>>>>>>>>
> > > > > > > >> >> >>>>>>>>> 2) Deployment of job cluster couples job graph
> > > > creation
> > > > > > and
> > > > > > > >> >> >>>> cluster
> > > > > > > >> >> >>>>>>>>> deployment. Abstractly, from user job to a
> > concrete
> > > > > > > >> >> >> submission,
> > > > > > > >> >> >>>> it
> > > > > > > >> >> >>>>>>>> requires
> > > > > > > >> >> >>>>>>>>>
> > > > > > > >> >> >>>>>>>>>     create JobGraph --\
> > > > > > > >> >> >>>>>>>>>
> > > > > > > >> >> >>>>>>>>> create ClusterClient -->  submit JobGraph
> > > > > > > >> >> >>>>>>>>>
> > > > > > > >> >> >>>>>>>>> such a dependency. ClusterClient was created by
> > > > > deploying
> > > > > > > or
> > > > > > > >> >> >>>>>>> retrieving.
> > > > > > > >> >> >>>>>>>>> JobGraph submission requires a compiled
> JobGraph
> > and
> > > > > > valid
> > > > > > > >> >> >>>>>>> ClusterClient,
> > > > > > > >> >> >>>>>>>>> but the creation of ClusterClient is abstractly
> > > > > > independent
> > > > > > > >> >> >> of
> > > > > > > >> >> >>>> that
> > > > > > > >> >> >>>>>> of
> > > > > > > >> >> >>>>>>>>> JobGraph. However, in job cluster mode, we
> > deploy job
> > > > > > > cluster
> > > > > > > >> >> >>>> with
> > > > > > > >> >> >>>>> a
> > > > > > > >> >> >>>>>>> job
> > > > > > > >> >> >>>>>>>>> graph, which means we use another process:
> > > > > > > >> >> >>>>>>>>>
> > > > > > > >> >> >>>>>>>>> create JobGraph --> deploy cluster with the
> > JobGraph
> > > > > > > >> >> >>>>>>>>>
> > > > > > > >> >> >>>>>>>>> Here is another inconsistency and downstream
> > > > > > > projects/client
> > > > > > > >> >> >>> apis
> > > > > > > >> >> >>>>> are
> > > > > > > >> >>>>>>>>> forced to handle different cases with little
> > support
> > > > > from
> > > > > > > >> >> >> Flink.
> > > > > > > >> >> >>>>>>>>>
> > > > > > > >> >> >>>>>>>>> Since we likely reached a consensus on
> > > > > > > >> >> >>>>>>>>>
> > > > > > > >> >> >>>>>>>>> 1. all configs gathered by Flink configuration
> > and
> > > > > passed
> > > > > > > >> >> >>>>>>>>> 2. execution environment knows all configs and
> > > > handles
> > > > > > > >> >> >>>>> execution(both
> > > > > > > >> >> >>>>>>>>> deployment and submission)
> > > > > > > >> >> >>>>>>>>>
> > > > > > > >> >> >>>>>>>>> to the issues above I propose eliminating
> > > > > inconsistencies
> > > > > > > by
> > > > > > > >> >> >>>>>> following
> > > > > > > >> >> >>>>>>>>> approach:
> > > > > > > >> >> >>>>>>>>>
> > > > > > > >> >> >>>>>>>>> 1) CliFrontend should exactly be a front end,
> at
> > > > least
> > > > > > for
> > > > > > > >> >> >>> "run"
> > > > > > > >> >> >>>>>>> command.
> > > > > > > >> >>>>>>>>> That means it just gathers and passes all
> config
> > > > from
> > > > > > > >> >> >> command
> > > > > > > >> >> >>>> line
> > > > > > > >> >> >>>>>> to
> > > > > > > >> >> >>>>>>>>> the main method of user program. Execution
> > > > environment
> > > > > > > knows
> > > > > > > >> >> >>> all
> > > > > > > >> >> >>>>> the
> > > > > > > >> >> >>>>>>> info
> > > > > > > >> >> >>>>>>>>> and with an addition to utils for
> ClusterClient,
> > we
> > > > > > > >> >> >> gracefully
> > > > > > > >> >> >>>> get
> > > > > > > >> >> >>>>> a
> > > > > > > >> >> >>>>>>>>> ClusterClient by deploying or retrieving. In
> this
> > > > way,
> > > > > we
> > > > > > > >> >> >> don't
> > > > > > > >> >> >>>>> need
> > > > > > > >> >> >>>>>> to
> > > > > > > >> >> >>>>>>>>> hijack #execute/executePlan methods and can
> > remove
> > > > > > various
> > > > > > > >> >> >>>> hacking
> > > > > > > >> >> >>>>>>>>> subclasses of exec env, as well as #run methods
> > in
> > > > > > > >> >> >>>>> ClusterClient(for
> > > > > > > >> >> >>>>>> an
> > > > > > > >> >> >>>>>>>>> interface-ized ClusterClient). Now the control
> > flow
> > > > > flows
> > > > > > > >> >> >> from
> > > > > > > >> >> >>>>>>>> CliFrontend
> > > > > > > >> >> >>>>>>>>> to the main method and never returns.
> > > > > > > >> >> >>>>>>>>>
> > > > > > > >> >> >>>>>>>>> 2) Job cluster means a cluster for the specific
> > job.
> > > > > From
> > > > > > > >> >> >>> another
> > > > > > > >> >> >>>>>>>>> perspective, it is an ephemeral session. We may
> > > > > decouple
> > > > > > > the
> > > > > > > >> >> >>>>>> deployment
> > > > > > > >> >>>>>>>>> from a compiled job graph, but start a session
> > with
> > > > > idle
> > > > > > > >> >> >>> timeout
> > > > > > > >> >>>>>>>>> and submit the job afterwards.
> > > > > > > >> >> >>>>>>>>>
> > > > > > > >> >> >>>>>>>>> These topics, before we go into more details on
> > > > design
> > > > > or
> > > > > > > >> >> >>>>>>> implementation,
> > > > > > > >> >>>>>>>>> are better to be aware of and discussed to reach a
> > consensus.
> > > > > > > >> >> >>>>>>>>>
> > > > > > > >> >> >>>>>>>>> Best,
> > > > > > > >> >> >>>>>>>>> tison.
> > > > > > > >> >> >>>>>>>>>
> > > > > > > >> >> >>>>>>>>>
> > > > > > > >> >> >>>>>>>>> Zili Chen <wa...@gmail.com> 于2019年6月20日周四
> > > > > 上午3:21写道:
> > > > > > > >> >> >>>>>>>>>
> > > > > > > >> >> >>>>>>>>>> Hi Jeff,
> > > > > > > >> >> >>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>> Thanks for raising this thread and the design
> > > > > document!
> > > > > > > >> >> >>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>> As @Thomas Weise mentioned above, extending
> > config
> > > > to
> > > > > > > flink
> > > > > > > >> >>>>>>>>>> requires far more effort than it should.
> > Another
> > > > > > > example
> > > > > > > >> >>>>>>>>>> is that we achieve detached mode by introducing another
> > > > > execution
> > > > > > > >> >>>>>>>>>> environment which also hijacks the #execute method.
> > > > > > > >> >> >>>>>>>>>>
> > > > > > > >> >>>>>>>>>> I agree with your idea that the user would
> > configure all
> > > > > > > things
> > > > > > > >> >> >>>>>>>>>> and flink "just" respect it. On this topic I
> > think
> > > > the
> > > > > > > >> >> >> unusual
> > > > > > > >> >> >>>>>>>>>> control flow when CliFrontend handle "run"
> > command
> > > > is
> > > > > > the
> > > > > > > >> >> >>>> problem.
> > > > > > > >> >> >>>>>>>>>> It handles several configs, mainly about
> cluster
> > > > > > settings,
> > > > > > > >> >> >> and
> > > > > > > >> >>>>>>>>>> thus the main method of the user program is unaware of
> > them.
> > > > > > Also
> > > > > > > it
> > > > > > > >> >> >>>>>> compiles
> > > > > > > >> >>>>>>>>>> the app to a job graph by running the main method with a
> > > > > hijacked
> > > > > > > exec
> > > > > > > >> >> >>>> env,
> > > > > > > >> >>>>>>>>>> which constrains the main method further.
> > > > > > > >> >> >>>>>>>>>>
> > > > > > > >> >>>>>>>>>> I'd like to write down a few notes on
> > > > configs/args
> > > > > > pass
> > > > > > > >> >> >> and
> > > > > > > >> >> >>>>>>> respect,
> > > > > > > >> >> >>>>>>>>>> as well as decoupling job compilation and
> > > > submission.
> > > > > > > Share
> > > > > > > >> >> >> on
> > > > > > > >> >> >>>>> this
> > > > > > > >> >> >>>>>>>>>> thread later.
> > > > > > > >> >> >>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>> Best,
> > > > > > > >> >> >>>>>>>>>> tison.
> > > > > > > >> >> >>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>> SHI Xiaogang <sh...@gmail.com>
> > 于2019年6月17日周一
> > > > > > > >> >> >> 下午7:29写道:
> > > > > > > >> >> >>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>> Hi Jeff and Flavio,
> > > > > > > >> >> >>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>> Thanks Jeff a lot for proposing the design
> > > > document.
> > > > > > > >> >> >>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>> We are also working on refactoring
> > ClusterClient to
> > > > > > allow
> > > > > > > >> >> >>>>> flexible
> > > > > > > >> >> >>>>>>> and
> > > > > > > >> >> >>>>>>>>>>> efficient job management in our real-time
> > platform.
> > > > > > > >> >> >>>>>>>>>>> We would like to draft a document to share
> our
> > > > ideas
> > > > > > with
> > > > > > > >> >> >>> you.
> > > > > > > >> >> >>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>> I think it's a good idea to have something
> like
> > > > > Apache
> > > > > > > Livy
> > > > > > > >> >> >>> for
> > > > > > > >> >> >>>>>>> Flink,
> > > > > > > >> >> >>>>>>>>>>> and
> > > > > > > >> >> >>>>>>>>>>> the efforts discussed here will take a great
> > step
> > > > > > forward
> > > > > > > >> >> >> toward
> > > > > > > >> >> >>>> it.
> > > > > > > >> >> >>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>> Regards,
> > > > > > > >> >> >>>>>>>>>>> Xiaogang
> > > > > > > >> >> >>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>> Flavio Pompermaier <po...@okkam.it>
> > > > > > 于2019年6月17日周一
> > > > > > > >> >> >>>>> 下午7:13写道:
> > > > > > > >> >> >>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>> Is there any possibility to have something
> > like
> > > > > Apache
> > > > > > > >> >> >> Livy
> > > > > > > >> >> >>>> [1]
> > > > > > > >> >> >>>>>>> also
> > > > > > > >> >> >>>>>>>>>>> for
> > > > > > > >> >> >>>>>>>>>>>> Flink in the future?
> > > > > > > >> >> >>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>> [1] https://livy.apache.org/
> > > > > > > >> >> >>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>> On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <
> > > > > > > >> >> >>> zjffdu@gmail.com
> > > > > > > >> >> >>>>>
> > > > > > > >> >> >>>>>>> wrote:
> > > > > > > >> >> >>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>> Any API we expose should not have
> > dependencies
> > > > > on
> > > > > > > >> >> >>> the
> > > > > > > >> >> >>>>>>> runtime
> > > > > > > >> >> >>>>>>>>>>>>> (flink-runtime) package or other
> > implementation
> > > > > > > >> >> >> details.
> > > > > > > >> >> >>> To
> > > > > > > >> >> >>>>> me,
> > > > > > > >> >> >>>>>>>> this
> > > > > > > >> >> >>>>>>>>>>>> means
> > > > > > > >> >> >>>>>>>>>>>>> that the current ClusterClient cannot be
> > exposed
> > > > to
> > > > > > > >> >> >> users
> > > > > > > >> >> >>>>>> because
> > > > > > > >> >> >>>>>>>> it
> > > > > > > >> >> >>>>>>>>>>>> uses
> > > > > > > >> >> >>>>>>>>>>>>> quite some classes from the optimiser and
> > runtime
> > > > > > > >> >> >>> packages.
> > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>> We should change ClusterClient from a class
> to
> > > > > > an interface.
> > > > > > > >> >> >>>>>>>>>>>>> ExecutionEnvironment should only use the interface
> > > > > > > >> >> >> ClusterClient
> > > > > > > >> >> >>>>> which
> > > > > > > >> >> >>>>>>>>>>> should be
> > > > > > > >> >> >>>>>>>>>>>>> in flink-clients while the concrete
> > > > implementation
> > > > > > > >> >> >> class
> > > > > > > >> >> >>>>> could
> > > > > > > >> >> >>>>>> be
> > > > > > > >> >> >>>>>>>> in
> > > > > > > >> >> >>>>>>>>>>>>> flink-runtime.
> > > > > > > >> >> >>>>>>>>>>>>>
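The proposed split can be sketched in a few lines. This is a hypothetical mock-up, not the actual Flink API: the interface name is real, but the methods and the in-memory implementation here are illustrative stand-ins for the idea of programming against an interface in flink-clients while the concrete client lives in flink-runtime.

```java
// Hypothetical sketch only -- these are NOT the real Flink classes.
// It illustrates the proposed split: callers depend on the interface
// (which could live in flink-clients), while the concrete
// implementation could live in flink-runtime.
import java.util.HashMap;
import java.util.Map;

interface ClusterClient {
    String submitJob(String jobName);   // returns a job id
    String queryJobStatus(String jobId);
    void cancelJob(String jobId);
}

// Stand-in implementation that just tracks job state in memory.
class InMemoryClusterClient implements ClusterClient {
    private final Map<String, String> jobs = new HashMap<>();

    public String submitJob(String jobName) {
        String id = jobName + "-" + jobs.size();
        jobs.put(id, "RUNNING");
        return id;
    }

    public String queryJobStatus(String jobId) {
        return jobs.getOrDefault(jobId, "UNKNOWN");
    }

    public void cancelJob(String jobId) {
        jobs.put(jobId, "CANCELED");
    }
}

public class ClusterClientSketch {
    public static void main(String[] args) {
        // Callers see only the interface; the implementation is swappable.
        ClusterClient client = new InMemoryClusterClient();
        String id = client.submitJob("wordcount");
        System.out.println(client.queryJobStatus(id)); // RUNNING
        client.cancelJob(id);
        System.out.println(client.queryJobStatus(id)); // CANCELED
    }
}
```

With such a split, an ExecutionEnvironment never needs to name a concrete client class, which is the decoupling asked for in this thread.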
> > > > > > > >> >> >>>>>>>>>>>>>>>> What happens when a failure/restart in
> the
> > > > > client
> > > > > > > >> >> >>>>> happens?
> > > > > > > >> >> >>>>>>>> There
> > > > > > > >> >> >>>>>>>>>>> need
> > > > > > > >> >> >>>>>>>>>>>>> to be a way of re-establishing the
> > connection to
> > > > > the
> > > > > > > >> >> >> job,
> > > > > > > >> >> >>>> set
> > > > > > > >> >> >>>>>> up
> > > > > > > >> >> >>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>> listeners again, etc.
> > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>> Good point. First we need to define what
> > > > > > > >> >> >>>>> failure/restart
> > > > > > > >> >> >>>>>> in
> > > > > > > >> >> >>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>> client means. IIUC, that usually means a
> network
> > > > > failure
> > > > > > > >> >> >>> which
> > > > > > > >> >> >>>>> will
> > > > > > > >> >> >>>>>>>>>>> happen in
> > > > > > > >> >> >>>>>>>>>>>>> class RestClient. If my understanding is
> > correct,
> > > > > > > >> >> >>>>> the restart/retry
> > > > > > > >> >> >>>>>>>>>>> mechanism
> > > > > > > >> >> >>>>>>>>>>>>> should be done in RestClient.
> > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>
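Keeping retries inside the REST layer, as suggested above, can be sketched as follows. This is not Flink's actual RestClient; the helper name, signature, and backoff policy are all assumptions made for illustration.

```java
// Hedged sketch (not the actual RestClient): retry transient network
// failures inside the REST layer, so callers above it never see them.
import java.io.IOException;
import java.util.concurrent.Callable;

public class RestRetrySketch {

    // Retry a call up to maxAttempts times on IOException, with a small
    // linear backoff between attempts; rethrow once attempts are exhausted.
    static <T> T withRetry(int maxAttempts, Callable<T> call) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(10L * attempt); // back off, then try again
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated flaky REST call: fails twice, then succeeds.
        String status = withRetry(5, () -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IOException("connection reset");
            }
            return "RUNNING";
        });
        System.out.println(status + " after " + calls[0] + " attempts");
    }
}
```

A real implementation would also need to re-establish listeners after reconnecting, which is Aljoscha's point below about client failure.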
> > > > > > > >> >> >>>>>>>>>>>>> Aljoscha Krettek <al...@apache.org>
> > > > > 于2019年6月11日周二
> > > > > > > >> >> >>>>>> 下午11:10写道:
> > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>> Some points to consider:
> > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>> * Any API we expose should not have
> > dependencies
> > > > > on
> > > > > > > >> >> >> the
> > > > > > > >> >> >>>>>> runtime
> > > > > > > >> >> >>>>>>>>>>>>>> (flink-runtime) package or other
> > implementation
> > > > > > > >> >> >>> details.
> > > > > > > >> >> >>>> To
> > > > > > > >> >> >>>>>> me,
> > > > > > > >> >> >>>>>>>>>>> this
> > > > > > > >> >> >>>>>>>>>>>>> means
> > > > > > > >> >> >>>>>>>>>>>>>> that the current ClusterClient cannot be
> > exposed
> > > > > to
> > > > > > > >> >> >>> users
> > > > > > > >> >> >>>>>>> because
> > > > > > > >> >> >>>>>>>>>>> it
> > > > > > > >> >> >>>>>>>>>>>>> uses
> > > > > > > >> >> >>>>>>>>>>>>>> quite some classes from the optimiser and
> > > > runtime
> > > > > > > >> >> >>>> packages.
> > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>> * What happens when a failure/restart in
> the
> > > > > client
> > > > > > > >> >> >>>>> happens?
> > > > > > > >> >> >>>>>>>> There
> > > > > > > >> >> >>>>>>>>>>> need
> > > > > > > >> >> >>>>>>>>>>>>> to
> > > > > > > >> >> >>>>>>>>>>>>>> be a way of re-establishing the connection
> > to
> > > > the
> > > > > > > >> >> >> job,
> > > > > > > >> >> >>>> set
> > > > > > > >> >> >>>>> up
> > > > > > > >> >> >>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>> listeners
> > > > > > > >> >> >>>>>>>>>>>>>> again, etc.
> > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>> Aljoscha
> > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>> On 29. May 2019, at 10:17, Jeff Zhang <
> > > > > > > >> >> >>>> zjffdu@gmail.com>
> > > > > > > >> >> >>>>>>>> wrote:
> > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>> Sorry folks, the design doc is late as
> you
> > > > > > > >> >> >> expected.
> > > > > > > >> >> >>>>> Here's
> > > > > > > >> >> >>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>> design
> > > > > > > >> >> >>>>>>>>>>>>>> doc
> > > > > > > >> >> >>>>>>>>>>>>>>> I drafted, welcome any comments and
> > feedback.
> > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>
> > > > > > > >> >> >>>>>>>
> > > > > > > >> >> >>>>>>
> > > > > > > >> >> >>>>>
> > > > > > > >> >> >>>>
> > > > > > > >> >> >>>
> > > > > > > >> >> >>
> > > > > > > >> >>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org>
> > 于2019年2月14日周四
> > > > > > > >> >> >>>> 下午8:43写道:
> > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>> Nice that this discussion is happening.
> > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>> In the FLIP, we could also revisit the
> > entire
> > > > > role
> > > > > > > >> >> >>> of
> > > > > > > >> >> >>>>> the
> > > > > > > >> >> >>>>>>>>>>>> environments
> > > > > > > >> >> >>>>>>>>>>>>>>>> again.
> > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>> Initially, the idea was:
> > > > > > > >> >> >>>>>>>>>>>>>>>> - the environments take care of the
> > specific
> > > > > > > >> >> >> setup
> > > > > > > >> >> >>>> for
> > > > > > > >> >> >>>>>>>>>>> standalone
> > > > > > > >> >> >>>>>>>>>>>> (no
> > > > > > > >> >> >>>>>>>>>>>>>>>> setup needed), yarn, mesos, etc.
> > > > > > > >> >> >>>>>>>>>>>>>>>> - the session ones have control over the
> > > > > session.
> > > > > > > >> >> >>> The
> > > > > > > >> >> >>>>>>>>>>> environment
> > > > > > > >> >> >>>>>>>>>>>>> holds
> > > > > > > >> >> >>>>>>>>>>>>>>>> the session client.
> > > > > > > >> >> >>>>>>>>>>>>>>>> - running a job gives a "control" object
> > for
> > > > > that
> > > > > > > >> >> >>>> job.
> > > > > > > >> >> >>>>>> That
> > > > > > > >> >> >>>>>>>>>>>> behavior
> > > > > > > >> >> >>>>>>>>>>>>> is
> > > > > > > >> >> >>>>>>>>>>>>>>>> the same in all environments.
> > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>> The actual implementation diverged quite
> > a bit
> > > > > > > >> >> >> from
> > > > > > > >> >> >>>>> that.
> > > > > > > >> >> >>>>>>>> Happy
> > > > > > > >> >> >>>>>>>>>>> to
> > > > > > > >> >> >>>>>>>>>>>>> see a
> > > > > > > >> >> >>>>>>>>>>>>>>>> discussion about straightening this out a
> > bit
> > > > > more.
> > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>> On Tue, Feb 12, 2019 at 4:58 AM Jeff
> > Zhang <
> > > > > > > >> >> >>>>>>> zjffdu@gmail.com>
> > > > > > > >> >> >>>>>>>>>>>> wrote:
> > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>> Hi folks,
> > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>> Sorry for the late response. It seems we
> > have reached
> > > > > > > >> >> >>> consensus
> > > > > > > >> >> >>>> on
> > > > > > > >> >> >>>>>>>> this, I
> > > > > > > >> >> >>>>>>>>>>>> will
> > > > > > > >> >> >>>>>>>>>>>>>>>> create
> > > > > > > >> >> >>>>>>>>>>>>>>>>> a FLIP for this with a more detailed design
> > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>> Thomas Weise <th...@apache.org>
> > 于2018年12月21日周五
> > > > > > > >> >> >>>>> 上午11:43写道:
> > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> Great to see this discussion seeded!
> The
> > > > > > > >> >> >> problems
> > > > > > > >> >> >>>> you
> > > > > > > >> >> >>>>>> face
> > > > > > > >> >> >>>>>>>>>>> with
> > > > > > > >> >> >>>>>>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> Zeppelin integration are also
> affecting
> > > > other
> > > > > > > >> >> >>>>> downstream
> > > > > > > >> >> >>>>>>>>>>> projects,
> > > > > > > >> >> >>>>>>>>>>>>>> like
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> Beam.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> We just enabled the savepoint restore
> > option
> > > > > in
> > > > > > > >> >> >>>>>>>>>>>>>> RemoteStreamEnvironment
> > > > > > > >> >> >>>>>>>>>>>>>>>>> [1]
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> and that was more difficult than it
> > should
> > > > be.
> > > > > > > >> >> >> The
> > > > > > > >> >> >>>>> main
> > > > > > > >> >> >>>>>>>> issue
> > > > > > > >> >> >>>>>>>>>>> is
> > > > > > > >> >> >>>>>>>>>>>>> that
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> environment and cluster client aren't
> > > > > decoupled.
> > > > > > > >> >> >>>>> Ideally
> > > > > > > >> >> >>>>>>> it
> > > > > > > >> >> >>>>>>>>>>> should
> > > > > > > >> >> >>>>>>>>>>>>> be
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> possible to just get the matching
> > cluster
> > > > > client
> > > > > > > >> >> >>>> from
> > > > > > > >> >> >>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>> environment
> > > > > > > >> >> >>>>>>>>>>>>>>>> and
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> then control the job through it
> > (environment
> > > > > as
> > > > > > > >> >> >>>>> factory
> > > > > > > >> >> >>>>>>> for
> > > > > > > >> >> >>>>>>>>>>>> cluster
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> client). But note that the environment
> > > > classes
> > > > > > > >> >> >> are
> > > > > > > >> >> >>>>> part
> > > > > > > >> >> >>>>>> of
> > > > > > > >> >> >>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>> public
> > > > > > > >> >> >>>>>>>>>>>>>>>>> API,
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> and it is not straightforward to make
> > larger
> > > > > > > >> >> >>> changes
> > > > > > > >> >> >>>>>>> without
> > > > > > > >> >> >>>>>>>>>>>>> breaking
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> backward compatibility.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> ClusterClient currently exposes
> internal
> > > > > classes
> > > > > > > >> >> >>>> like
> > > > > > > >> >> >>>>>>>>>>> JobGraph and
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> StreamGraph. But it should be possible
> > to
> > > > wrap
> > > > > > > >> >> >>> this
> > > > > > > >> >> >>>>>> with a
> > > > > > > >> >> >>>>>>>> new
> > > > > > > >> >> >>>>>>>>>>>>> public
> > > > > > > >> >> >>>>>>>>>>>>>>>> API
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> that brings the required job control
> > > > > > > >> >> >> capabilities
> > > > > > > >> >> >>>> for
> > > > > > > >> >> >>>>>>>>>>> downstream
> > > > > > > >> >> >>>>>>>>>>>>>>>>> projects.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> Perhaps it is helpful to look at some
> > of the
> > > > > > > >> >> >>>>> interfaces
> > > > > > > >> >> >>>>>> in
> > > > > > > >> >> >>>>>>>>>>> Beam
> > > > > > > >> >> >>>>>>>>>>>>> while
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> thinking about this: [2] for the
> > portable
> > > > job
> > > > > > > >> >> >> API
> > > > > > > >> >> >>>> and
> > > > > > > >> >> >>>>>> [3]
> > > > > > > >> >> >>>>>>>> for
> > > > > > > >> >> >>>>>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>> old
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> asynchronous job control from the Beam
> > Java
> > > > > SDK.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> The backward compatibility discussion
> > [4] is
> > > > > > > >> >> >> also
> > > > > > > >> >> >>>>>> relevant
> > > > > > > >> >> >>>>>>>>>>> here. A
> > > > > > > >> >> >>>>>>>>>>>>> new
> > > > > > > >> >> >>>>>>>>>>>>>>>>> API
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> should shield downstream projects from
> > > > > internals
> > > > > > > >> >> >>> and
> > > > > > > >> >> >>>>>> allow
> > > > > > > >> >> >>>>>>>>>>> them to
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> interoperate with multiple future
> Flink
> > > > > versions
> > > > > > > >> >> >>> in
> > > > > > > >> >> >>>>> the
> > > > > > > >> >> >>>>>>> same
> > > > > > > >> >> >>>>>>>>>>>> release
> > > > > > > >> >> >>>>>>>>>>>>>>>> line
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> without forced upgrades.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> Thanks,
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> Thomas
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> [1]
> > > > https://github.com/apache/flink/pull/7249
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> [2]
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>
> > > > > > > >> >> >>>>>>>
> > > > > > > >> >> >>>>>>
> > > > > > > >> >> >>>>>
> > > > > > > >> >> >>>>
> > > > > > > >> >> >>>
> > > > > > > >> >> >>
> > > > > > > >> >>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> [3]
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>
> > > > > > > >> >> >>>>>>>
> > > > > > > >> >> >>>>>>
> > > > > > > >> >> >>>>>
> > > > > > > >> >> >>>>
> > > > > > > >> >> >>>
> > > > > > > >> >> >>
> > > > > > > >> >>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> [4]
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>
> > > > > > > >> >> >>>>>>>
> > > > > > > >> >> >>>>>>
> > > > > > > >> >> >>>>>
> > > > > > > >> >> >>>>
> > > > > > > >> >> >>>
> > > > > > > >> >> >>
> > > > > > > >> >>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff
> > Zhang <
> > > > > > > >> >> >>>>>>>> zjffdu@gmail.com>
> > > > > > > >> >> >>>>>>>>>>>>> wrote:
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> I'm not so sure whether the user
> > should
> > > > be
> > > > > > > >> >> >>> able
> > > > > > > >> >> >>>> to
> > > > > > > >> >> >>>>>>>> define
> > > > > > > >> >> >>>>>>>>>>>> where
> > > > > > > >> >> >>>>>>>>>>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> job
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> runs (in your example Yarn). This is
> > > > actually
> > > > > > > >> >> >>>>>> independent
> > > > > > > >> >> >>>>>>>> of
> > > > > > > >> >> >>>>>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>> job
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> development and is something which is
> > > > decided
> > > > > > > >> >> >> at
> > > > > > > >> >> >>>>>>> deployment
> > > > > > > >> >> >>>>>>>>>>> time.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Users don't need to specify the execution
> > mode
> > > > > > > >> >> >>>>>>> programmatically.
> > > > > > > >> >> >>>>>>>>>>> They
> > > > > > > >> >> >>>>>>>>>>>>> can
> > > > > > > >> >> >>>>>>>>>>>>>>>>> also
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> pass the execution mode from the
> > arguments
> > > > in
> > > > > > > >> >> >>> flink
> > > > > > > >> >> >>>>> run
> > > > > > > >> >> >>>>>>>>>>> command.
> > > > > > > >> >> >>>>>>>>>>>>> e.g.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m yarn-cluster ....
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m local ...
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m host:port ...
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Does this make sense to you ?
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> To me it makes sense that the
> > > > > > > >> >> >>>> ExecutionEnvironment
> > > > > > > >> >> >>>>>> is
> > > > > > > >> >> >>>>>>>> not
> > > > > > > >> >> >>>>>>>>>>>>>>>> directly
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> initialized by the user and instead
> > context
> > > > > > > >> >> >>>> sensitive
> > > > > > > >> >> >>>>>> how
> > > > > > > >> >> >>>>>>>> you
> > > > > > > >> >> >>>>>>>>>>>> want
> > > > > > > >> >> >>>>>>>>>>>>> to
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> execute your job (Flink CLI vs. IDE,
> > for
> > > > > > > >> >> >>> example).
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Right, currently I notice Flink would
> > > > create
> > > > > > > >> >> >>>>> different
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> ContextExecutionEnvironment based on
> > > > > different
> > > > > > > >> >> >>>>>> submission
> > > > > > > >> >> >>>>>>>>>>>> scenarios
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> (Flink
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Cli vs IDE). To me this is kind of a hacky
> hack
> > > > > > > >> >> >> approach,
> > > > > > > >> >> >>>> not
> > > > > > > >> >> >>>>>> so
> > > > > > > >> >> >>>>>>>>>>>>>>>>> straightforward.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> What I suggested above is
> that
> > > > flink
> > > > > > > >> >> >>> should
> > > > > > > >> >> >>>>>>> always
> > > > > > > >> >> >>>>>>>>>>> create
> > > > > > > >> >> >>>>>>>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> same
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> ExecutionEnvironment but with
> different
> > > > > > > >> >> >>>>> configuration,
> > > > > > > >> >> >>>>>>> and
> > > > > > > >> >> >>>>>>>>>>> based
> > > > > > > >> >> >>>>>>>>>>>> on
> > > > > > > >> >> >>>>>>>>>>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> configuration it would create the
> > proper
> > > > > > > >> >> >>>>> ClusterClient
> > > > > > > >> >> >>>>>>> for
> > > > > > > >> >> >>>>>>>>>>>>> different
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> behaviors.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
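The idea above, one environment type where the configuration alone decides which client is created, can be sketched as a small dispatch. The class and client names here are hypothetical; only the three modes mirror the `bin/flink run -m` examples earlier in the thread.

```java
// Hedged sketch of config-driven client selection. One environment type;
// the configured mode, not the environment subclass, picks the client.
// Client names are illustrative stand-ins, not the actual Flink classes.
public class TargetResolutionSketch {

    // Mirrors: bin/flink run -m yarn-cluster | -m local | -m host:port
    static String clientFor(String mode) {
        switch (mode) {
            case "yarn-cluster":
                return "YarnClusterClient";
            case "local":
                return "MiniClusterClient";
            default:
                // anything else is treated as host:port of a running cluster
                String[] hostPort = mode.split(":");
                return "RestClusterClient(" + hostPort[0] + ":" + hostPort[1] + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(clientFor("yarn-cluster")); // YarnClusterClient
        System.out.println(clientFor("local"));        // MiniClusterClient
        System.out.println(clientFor("host:8081"));    // RestClusterClient(host:8081)
    }
}
```

The point of the sketch is that the user program never branches on the mode; only the factory behind the environment does.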
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Till Rohrmann <tr...@apache.org>
> > > > > > > >> >> >>>> 于2018年12月20日周四
> > > > > > > >> >> >>>>>>>>>>> 下午11:18写道:
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> You are probably right that we have
> > code
> > > > > > > >> >> >>>> duplication
> > > > > > > >> >> >>>>>>> when
> > > > > > > >> >> >>>>>>>> it
> > > > > > > >> >> >>>>>>>>>>>> comes
> > > > > > > >> >> >>>>>>>>>>>>>>>> to
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> creation of the ClusterClient. This
> > should
> > > > > be
> > > > > > > >> >> >>>>> reduced
> > > > > > > >> >> >>>>>> in
> > > > > > > >> >> >>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>>>>> future.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> I'm not so sure whether the user
> > should be
> > > > > > > >> >> >> able
> > > > > > > >> >> >>> to
> > > > > > > >> >> >>>>>>> define
> > > > > > > >> >> >>>>>>>>>>> where
> > > > > > > >> >> >>>>>>>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>>>>>> job
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> runs (in your example Yarn). This is
> > > > > actually
> > > > > > > >> >> >>>>>>> independent
> > > > > > > >> >> >>>>>>>>>>> of the
> > > > > > > >> >> >>>>>>>>>>>>>>>> job
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> development and is something which
> is
> > > > > decided
> > > > > > > >> >> >> at
> > > > > > > >> >> >>>>>>>> deployment
> > > > > > > >> >> >>>>>>>>>>>> time.
> > > > > > > >> >> >>>>>>>>>>>>>>>> To
> > > > > > > >> >> >>>>>>>>>>>>>>>>> me
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> it
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> makes sense that the
> > ExecutionEnvironment
> > > > is
> > > > > > > >> >> >> not
> > > > > > > >> >> >>>>>>> directly
> > > > > > > >> >> >>>>>>>>>>>>>>>> initialized
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> by
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> the user and instead context
> > sensitive how
> > > > > you
> > > > > > > >> >> >>>> want
> > > > > > > >> >> >>>>> to
> > > > > > > >> >> >>>>>>>>>>> execute
> > > > > > > >> >> >>>>>>>>>>>>> your
> > > > > > > >> >> >>>>>>>>>>>>>>>>> job
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> (Flink CLI vs. IDE, for example).
> > > > However, I
> > > > > > > >> >> >>> agree
> > > > > > > >> >> >>>>>> that
> > > > > > > >> >> >>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> ExecutionEnvironment should give you
> > > > access
> > > > > to
> > > > > > > >> >> >>> the
> > > > > > > >> >> >>>>>>>>>>> ClusterClient
> > > > > > > >> >> >>>>>>>>>>>>>>>> and
> > > > > > > >> >> >>>>>>>>>>>>>>>>> to
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> job (maybe in the form of the
> > JobGraph or
> > > > a
> > > > > > > >> >> >> job
> > > > > > > >> >> >>>>> plan).
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> Cheers,
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> Till
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff
> > > > Zhang <
> > > > > > > >> >> >>>>>>>>>>> zjffdu@gmail.com>
> > > > > > > >> >> >>>>>>>>>>>>>>>> wrote:
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> Hi Till,
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> Thanks for the feedback. You are
> > right
> > > > > that I
> > > > > > > >> >> >>>>> expect
> > > > > > > >> >> >>>>>>>> better
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> programmatic
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> job submission/control api which
> > could be
> > > > > > > >> >> >> used
> > > > > > > >> >> >>> by
> > > > > > > >> >> >>>>>>>>>>> downstream
> > > > > > > >> >> >>>>>>>>>>>>>>>>> project.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> And
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> it would benefit for the flink
> > ecosystem.
> > > > > > > >> >> >> When
> > > > > > > >> >> >>> I
> > > > > > > >> >> >>>>> look
> > > > > > > >> >> >>>>>>> at
> > > > > > > >> >> >>>>>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>> code
> > > > > > > >> >> >>>>>>>>>>>>>>>>> of
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> flink
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> scala-shell and sql-client (I
> believe
> > > > they
> > > > > > > >> >> >> are
> > > > > > > >> >> >>>> not
> > > > > > > >> >> >>>>>> the
> > > > > > > >> >> >>>>>>>>>>> core of
> > > > > > > >> >> >>>>>>>>>>>>>>>>> flink,
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> but
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> belong to the ecosystem of flink),
> I
> > find
> > > > > > > >> >> >> many
> > > > > > > >> >> >>>>>>> duplicated
> > > > > > > >> >> >>>>>>>>>>> code
> > > > > > > >> >> >>>>>>>>>>>>>>>> for
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> creating
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> ClusterClient from user provided
> > > > > > > >> >> >> configuration
> > > > > > > >> >> >>>>>>>>>>> (configuration
> > > > > > > >> >> >>>>>>>>>>>>>>>>> format
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> may
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> be
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> different from scala-shell and
> > > > sql-client)
> > > > > > > >> >> >> and
> > > > > > > >> >> >>>> then
> > > > > > > >> >> >>>>>> use
> > > > > > > >> >> >>>>>>>>>>> that
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> ClusterClient
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> to manipulate jobs. I don't think
> > this is
> > > > > > > >> >> >>>>> convenient
> > > > > > > >> >> >>>>>>> for
> > > > > > > >> >> >>>>>>>>>>>>>>>> downstream
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> projects. What I expect is that
> > > > downstream
> > > > > > > >> >> >>>> project
> > > > > > > >> >> >>>>>> only
> > > > > > > >> >> >>>>>>>>>>> needs
> > > > > > > >> >> >>>>>>>>>>>> to
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> provide
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> necessary configuration info (maybe
> > > > > > > >> >> >> introducing
> > > > > > > >> >> >>>>> class
> > > > > > > >> >> >>>>>>>>>>>> FlinkConf),
> > > > > > > >> >> >>>>>>>>>>>>>>>>> and
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> then
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> build ExecutionEnvironment based on
> > this
> > > > > > > >> >> >>>> FlinkConf,
> > > > > > > >> >> >>>>>> and
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment will create
> the
> > > > proper
> > > > > > > >> >> >>>>>>>> ClusterClient.
> > > > > > > >> >> >>>>>>>>>>> It
> > > > > > > >> >> >>>>>>>>>>>> not
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> only
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> benefit for the downstream project
> > > > > > > >> >> >> development
> > > > > > > >> >> >>>> but
> > > > > > > >> >> >>>>>> also
> > > > > > > >> >> >>>>>>>> be
> > > > > > > >> >> >>>>>>>>>>>>>>>> helpful
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> for
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> their integration test with flink.
> > Here's
> > > > > one
> > > > > > > >> >> >>>>> sample
> > > > > > > >> >> >>>>>>> code
> > > > > > > >> >> >>>>>>>>>>>> snippet
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> that
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> I
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> expect.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> val conf = new
> > FlinkConf().mode("yarn")
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> val env = new
> > ExecutionEnvironment(conf)
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> val jobId = env.submit(...)
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> val jobStatus =
> > > > > > > >> >> >>>>>>>>>>> env.getClusterClient().queryJobStatus(jobId)
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > env.getClusterClient().cancelJob(jobId)
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> What do you think ?
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
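The proposed flow in the snippet above (conf builds the environment, the environment hands out the client) can be mocked up end to end. Everything below is a hypothetical stand-in: FlinkConf does not exist in Flink, and the stub client only simulates job state.

```java
// Self-contained stand-in for the API proposed above; FlinkConf and this
// environment are hypothetical, not the actual Flink classes.
import java.util.HashMap;
import java.util.Map;

class FlinkConf {
    private String mode = "local";
    FlinkConf mode(String m) { this.mode = m; return this; }
    String getMode() { return mode; }
}

class StubClusterClient {
    private final Map<String, String> jobs = new HashMap<>();
    String submit() {
        String id = "job-" + jobs.size();
        jobs.put(id, "RUNNING");
        return id;
    }
    String queryJobStatus(String id) { return jobs.getOrDefault(id, "UNKNOWN"); }
    void cancelJob(String id) { jobs.put(id, "CANCELED"); }
}

public class EnvSketch {
    private final StubClusterClient client = new StubClusterClient();

    EnvSketch(FlinkConf conf) {
        // a real implementation would pick a yarn/local/rest client
        // based on conf.getMode() here
    }

    String submit() { return client.submit(); }
    StubClusterClient getClusterClient() { return client; }

    public static void main(String[] args) {
        FlinkConf conf = new FlinkConf().mode("yarn");
        EnvSketch env = new EnvSketch(conf);
        String jobId = env.submit();
        System.out.println(env.getClusterClient().queryJobStatus(jobId)); // RUNNING
        env.getClusterClient().cancelJob(jobId);
        System.out.println(env.getClusterClient().queryJobStatus(jobId)); // CANCELED
    }
}
```

This is exactly the shape of the Scala snippet above: the downstream project supplies only configuration and gets job control back through the environment.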
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> Till Rohrmann <
> trohrmann@apache.org>
> > > > > > > >> >> >>>>> 于2018年12月11日周二
> > > > > > > >> >> >>>>>>>>>>> 下午6:28写道:
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> Hi Jeff,
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> what you are proposing is to
> > provide the
> > > > > > > >> >> >> user
> > > > > > > >> >> >>>> with
> > > > > > > >> >> >>>>>>>> better
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> programmatic
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> job
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> control. There was actually an
> > effort to
> > > > > > > >> >> >>> achieve
> > > > > > > >> >> >>>>>> this
> > > > > > > >> >> >>>>>>>> but
> > > > > > > >> >> >>>>>>>>>>> it
> > > > > > > >> >> >>>>>>>>>>>>>>>> has
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> never
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> been
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> completed [1]. However, there are
> > some
> > > > > > > >> >> >>>> improvements
> > > > > > > >> >> >>>>>> in
> > > > > > > >> >> >>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>> code
> > > > > > > >> >> >>>>>>>>>>>>>>>>> base
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> now.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> Look for example at the
> > NewClusterClient
> > > > > > > >> >> >>>> interface
> > > > > > > >> >> >>>>>>> which
> > > > > > > >> >> >>>>>>>>>>>>>>>> offers a
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> non-blocking job submission. But I
> > agree
> > > > > > > >> >> >> that
> > > > > > > >> >> >>> we
> > > > > > > >> >> >>>>>> need
> > > > > > > >> >> >>>>>>> to
> > > > > > > >> >> >>>>>>>>>>>>>>>> improve
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Flink
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> in
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> this regard.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> I would not be in favour of
> > exposing all
> > > > > > > >> >> >>>>>> ClusterClient
> > > > > > > >> >> >>>>>>>>>>> calls
> > > > > > > >> >> >>>>>>>>>>>>>>>> via
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment because it
> > would
> > > > > > > >> >> >> clutter
> > > > > > > >> >> >>>> the
> > > > > > > >> >> >>>>>>> class
> > > > > > > >> >> >>>>>>>>>>> and
> > > > > > > >> >> >>>>>>>>>>>>>>>> would
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> not
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> be
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> a
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> good separation of concerns.
> > Instead one
> > > > > > > >> >> >> idea
> > > > > > > >> >> >>>>> could
> > > > > > > >> >> >>>>>> be
> > > > > > > >> >> >>>>>>>> to
> > > > > > > >> >> >>>>>>>>>>>>>>>>> retrieve
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> current ClusterClient from the
> > > > > > > >> >> >>>>> ExecutionEnvironment
> > > > > > > >> >> >>>>>>>> which
> > > > > > > >> >> >>>>>>>>>>> can
> > > > > > > >> >> >>>>>>>>>>>>>>>>> then
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> be
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> used
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> for cluster and job control. But
> > before
> > > > we
> > > > > > > >> >> >>> start
> > > > > > > >> >> >>>>> an
> > > > > > > >> >> >>>>>>>> effort
> > > > > > > >> >> >>>>>>>>>>>>>>>> here,
> > > > > > > >> >> >>>>>>>>>>>>>>>>> we
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> need
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> to
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> agree and capture what
> > functionality we
> > > > > want
> > > > > > > >> >> >>> to
> > > > > > > >> >> >>>>>>> provide.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> Initially, the idea was that we
> > have the
> > > > > > > >> >> >>>>>>>> ClusterDescriptor
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> describing
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> how
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> to talk to cluster manager like
> > Yarn or
> > > > > > > >> >> >> Mesos.
> > > > > > > >> >> >>>> The
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> ClusterDescriptor
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> can
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> be
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> used for deploying Flink clusters
> > (job
> > > > and
> > > > > > > >> >> >>>>> session)
> > > > > > > >> >> >>>>>>> and
> > > > > > > >> >> >>>>>>>>>>> gives
> > > > > > > >> >> >>>>>>>>>>>>>>>>> you a
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> ClusterClient. The ClusterClient
> > > > controls
> > > > > > > >> >> >> the
> > > > > > > >> >> >>>>>> cluster
> > > > > > > >> >> >>>>>>>>>>> (e.g.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> submitting
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> jobs, listing all running jobs).
> And
> > > > then
> > > > > > > >> >> >>> there
> > > > > > > >> >> >>>>> was
> > > > > > > >> >> >>>>>>> the
> > > > > > > >> >> >>>>>>>>>>> idea
> > > > > > > >> >> >>>>>>>>>>>> to
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> introduce a
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> JobClient which you obtain from
> the
> > > > > > > >> >> >>>> ClusterClient
> > > > > > > >> >> >>>>> to
> > > > > > > >> >> >>>>>>>>>>> trigger
> > > > > > > >> >> >>>>>>>>>>>>>>>> job
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> specific
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> operations (e.g. taking a
> savepoint,
> > > > > > > >> >> >>> cancelling
> > > > > > > >> >> >>>>> the
> > > > > > > >> >> >>>>>>>> job).
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> [1]
> > > > > > > >> >> >>>>>> https://issues.apache.org/jira/browse/FLINK-4272
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> Cheers,
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> Till
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> On Tue, Dec 11, 2018 at 10:13 AM
> > Jeff
> > > > > Zhang
> > > > > > > >> >> >> <
> > > > > > > >> >> >>>>>>>>>>> zjffdu@gmail.com
> > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> wrote:
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Hi Folks,
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> I am trying to integrate flink
> into
> > > > > apache
> > > > > > > >> >> >>>>> zeppelin
> > > > > > > >> >> >>>>>>>>>>> which is
> > > > > > > >> >> >>>>>>>>>>>>>>>> an
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> interactive
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> notebook. And I hit several
> issues
> > that
> > > > > is
> > > > > > > >> >> >>>> caused
> > > > > > > >> >> >>>>>> by
> > > > > > > >> >> >>>>>>>>>>> flink
> > > > > > > >> >> >>>>>>>>>>>>>>>>> client
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> api.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> So
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> I'd like to proposal the
> following
> > > > > changes
> > > > > > > >> >> >>> for
> > > > > > > >> >> >>>>>> flink
> > > > > > > >> >> >>>>>>>>>>> client
> > > > > > > >> >> >>>>>>>>>>>>>>>>> api.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 1. Support nonblocking execution.
> > > > > > > >> >> >> Currently,
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment#execute
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> is a blocking method which would
> > do 2
> > > > > > > >> >> >> things,
> > > > > > > >> >> >>>>> first
> > > > > > > >> >> >>>>>>>>>>> submit
> > > > > > > >> >> >>>>>>>>>>>>>>>> job
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> and
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> then
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> wait for job until it is
> finished.
> > I'd
> > > > > like
> > > > > > > >> >> >>>>>>> introduce a
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> nonblocking
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> execution method like
> > > > > > > >> >> >>>> ExecutionEnvironment#submit
> > > > > > > >> >> >>>>>>> which
> > > > > > > >> >> >>>>>>>>>>> only
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> submit
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> job
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> and
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> then return jobId to client. And
> > allow
> > > > > user
> > > > > > > >> >> >>> to
> > > > > > > >> >> >>>>>> query
> > > > > > > >> >> >>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>> job
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> status
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> via
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> jobId.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 2. Add cancel api in
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >> ExecutionEnvironment/StreamExecutionEnvironment,
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> currently the only way to cancel
> > job is
> > > > > via
> > > > > > > >> >> >>> cli
> > > > > > > >> >> >>>>>>>>>>> (bin/flink),
> > > > > > > >> >> >>>>>>>>>>>>>>>>> this
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> is
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> not
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> convenient for downstream project
> > to
> > > > use
> > > > > > > >> >> >> this
> > > > > > > >> >> >>>>>>> feature.
> > > > > > > >> >> >>>>>>>>>>> So I'd
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> like
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> to
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> add
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> cancel api in
> ExecutionEnvironment
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 3. Add savepoint api in
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>> ExecutionEnvironment/StreamExecutionEnvironment.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> It
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> is similar as cancel api, we
> > should use
> > > > > > > >> >> >>>>>>>>>>> ExecutionEnvironment
> > > > > > > >> >> >>>>>>>>>>>>>>>> as
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> unified
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> api for third party to integrate
> > with
> > > > > > > >> >> >> flink.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 4. Add listener for job execution
> > > > > > > >> >> >> lifecycle.
> > > > > > > >> >> >>>>>>> Something
> > > > > > > >> >> >>>>>>>>>>> like
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> following,
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> so
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> that downstream project can do
> > custom
> > > > > logic
> > > > > > > >> >> >>> in
> > > > > > > >> >> >>>>> the
> > > > > > > >> >> >>>>>>>>>>> lifecycle
> > > > > > > >> >> >>>>>>>>>>>>>>>> of
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> job.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> e.g.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Zeppelin would capture the jobId
> > after
> > > > > job
> > > > > > > >> >> >> is
> > > > > > > >> >> >>>>>>> submitted
> > > > > > > >> >> >>>>>>>>>>> and
> > > > > > > >> >> >>>>>>>>>>>>>>>>> then
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> use
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> this
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> jobId to cancel it later when
> > > > necessary.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> public interface JobListener {
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  void onJobSubmitted(JobID
> jobId);
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  void
> > onJobExecuted(JobExecutionResult
> > > > > > > >> >> >>>>> jobResult);
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  void onJobCanceled(JobID jobId);
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> }
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 5. Enable session in
> > > > > ExecutionEnvironment.
> > > > > > > >> >> >>>>>> Currently
> > > > > > > >> >> >>>>>>> it
> > > > > > > >> >> >>>>>>>>>>> is
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> disabled,
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> but
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> session is very convenient for
> > third
> > > > > party
> > > > > > > >> >> >> to
> > > > > > > >> >> >>>>>>>> submitting
> > > > > > > >> >> >>>>>>>>>>> jobs
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> continually.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> I hope flink can enable it again.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 6. Unify all flink client api
> into
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>> ExecutionEnvironment/StreamExecutionEnvironment.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> This is a long term issue which
> > needs
> > > > > more
> > > > > > > >> >> >>>>> careful
> > > > > > > >> >> >>>>>>>>>>> thinking
> > > > > > > >> >> >>>>>>>>>>>>>>>> and
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> design.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Currently some of features of
> > flink is
> > > > > > > >> >> >>> exposed
> > > > > > > >> >> >>>> in
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>> ExecutionEnvironment/StreamExecutionEnvironment,
> > > > > > > >> >> >>>>>> but
> > > > > > > >> >> >>>>>>>>>>> some are
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> exposed
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> in
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> cli instead of api, like the
> > cancel and
> > > > > > > >> >> >>>>> savepoint I
> > > > > > > >> >> >>>>>>>>>>> mentioned
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> above.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> I
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> think the root cause is due to
> that
> > > > flink
> > > > > > > >> >> >>>> didn't
> > > > > > > >> >> >>>>>>> unify
> > > > > > > >> >> >>>>>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> interaction
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> with
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> flink. Here I list 3 scenarios of
> > flink
> > > > > > > >> >> >>>> operation
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Local job execution.  Flink
> will
> > > > > create
> > > > > > > >> >> >>>>>>>>>>> LocalEnvironment
> > > > > > > >> >> >>>>>>>>>>>>>>>>> and
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> then
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> use
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  this LocalEnvironment to create
> > > > > > > >> >> >>> LocalExecutor
> > > > > > > >> >> >>>>> for
> > > > > > > >> >> >>>>>>> job
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> execution.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Remote job execution. Flink
> will
> > > > > create
> > > > > > > >> >> >>>>>>>> ClusterClient
> > > > > > > >> >> >>>>>>>>>>>>>>>>> first
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> and
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> then
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  create ContextEnvironment based
> > on the
> > > > > > > >> >> >>>>>>> ClusterClient
> > > > > > > >> >> >>>>>>>>>>> and
> > > > > > > >> >> >>>>>>>>>>>>>>>>> then
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> run
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> job.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Job cancelation. Flink will
> > create
> > > > > > > >> >> >>>>>> ClusterClient
> > > > > > > >> >> >>>>>>>>>>> first
> > > > > > > >> >> >>>>>>>>>>>>>>>> and
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> then
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> cancel
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  this job via this ClusterClient.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> As you can see in the above 3
> > > > scenarios.
> > > > > > > >> >> >>> Flink
> > > > > > > >> >> >>>>>> didn't
> > > > > > > >> >> >>>>>>>>>>> use the
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> same
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> approach(code path) to interact
> > with
> > > > > flink
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> What I propose is following:
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Create the proper
> > > > > > > >> >> >>>>>> LocalEnvironment/RemoteEnvironment
> > > > > > > >> >> >>>>>>>>>>> (based
> > > > > > > >> >> >>>>>>>>>>>>>>>> on
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> user
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> configuration) --> Use this
> > Environment
> > > > > to
> > > > > > > >> >> >>>> create
> > > > > > > >> >> >>>>>>>> proper
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> ClusterClient
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> (LocalClusterClient or
> > > > RestClusterClient)
> > > > > > > >> >> >> to
> > > > > > > >> >> >>>>>>>> interactive
> > > > > > > >> >> >>>>>>>>>>> with
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Flink (
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> job
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> execution or cancelation)
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> This way we can unify the process
> > of
> > > > > local
> > > > > > > >> >> >>>>>> execution
> > > > > > > >> >> >>>>>>>> and
> > > > > > > >> >> >>>>>>>>>>>>>>>> remote
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> execution.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> And it is much easier for third
> > party
> > > > to
> > > > > > > >> >> >>>>> integrate
> > > > > > > >> >> >>>>>>> with
> > > > > > > >> >> >>>>>>>>>>>>>>>> flink,
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> because
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment is the
> unified
> > > > entry
> > > > > > > >> >> >>> point
> > > > > > > >> >> >>>>> for
> > > > > > > >> >> >>>>>>>>>>> flink.
> > > > > > > >> >> >>>>>>>>>>>>>>>> What
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> third
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> party
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> needs to do is just pass
> > configuration
> > > > to
> > > > > > > >> >> >>>>>>>>>>>>>>>> ExecutionEnvironment
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> and
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment will do the
> > right
> > > > > > > >> >> >> thing
> > > > > > > >> >> >>>>> based
> > > > > > > >> >> >>>>>> on
> > > > > > > >> >> >>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> configuration.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Flink cli can also be considered
> as
> > > > flink
> > > > > > > >> >> >> api
> > > > > > > >> >> >>>>>>> consumer.
> > > > > > > >> >> >>>>>>>>>>> it
> > > > > > > >> >> >>>>>>>>>>>>>>>> just
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> pass
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> the
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> configuration to
> > ExecutionEnvironment
> > > > and
> > > > > > > >> >> >> let
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> ExecutionEnvironment
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> to
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> create the proper ClusterClient
> > instead
> > > > > of
> > > > > > > >> >> >>>>> letting
> > > > > > > >> >> >>>>>>> cli
> > > > > > > >> >> >>>>>>>> to
> > > > > > > >> >> >>>>>>>>>>>>>>>>> create
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ClusterClient directly.
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 6 would involve large code
> > refactoring,
> > > > > so
> > > > > > > >> >> >> I
> > > > > > > >> >> >>>>> think
> > > > > > > >> >> >>>>>> we
> > > > > > > >> >> >>>>>>>> can
> > > > > > > >> >> >>>>>>>>>>>>>>>> defer
> > > > > > > >> >> >>>>>>>>>>>>>>>>>> it
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> for
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> future release, 1,2,3,4,5 could
> be
> > done
> > > > > at
> > > > > > > >> >> >>>> once I
> > > > > > > >> >> >>>>>>>>>>> believe.
> > > > > > > >> >> >>>>>>>>>>>>>>>> Let
> > > > > > > >> >> >>>>>>>>>>>>>>>>> me
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>> know
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> your
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> comments and feedback, thanks
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> --
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Best Regards
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> --
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> Best Regards
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> Jeff Zhang
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> --
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Best Regards
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>> Jeff Zhang
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>> --
> > > > > > > >> >> >>>>>>>>>>>>>>>>> Best Regards
> > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>> Jeff Zhang
> > > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>> --
> > > > > > > >> >> >>>>>>>>>>>>>>> Best Regards
> > > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>> Jeff Zhang
> > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>> --
> > > > > > > >> >> >>>>>>>>>>>>> Best Regards
> > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>> Jeff Zhang
> > > > > > > >> >> >>>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>>
> > > > > > > >> >> >>>>>>>>>>
> > > > > > > >> >> >>>>>>>>
> > > > > > > >> >> >>>>>>>> --
> > > > > > > >> >> >>>>>>>> Best Regards
> > > > > > > >> >> >>>>>>>>
> > > > > > > >> >> >>>>>>>> Jeff Zhang
> > > > > > > >> >> >>>>>>>>
> > > > > > > >> >> >>>>>>>
> > > > > > > >> >> >>>>>>
> > > > > > > >> >> >>>>>
> > > > > > > >> >> >>>>>
> > > > > > > >> >> >>>>> --
> > > > > > > >> >> >>>>> Best Regards
> > > > > > > >> >> >>>>>
> > > > > > > >> >> >>>>> Jeff Zhang
> > > > > > > >> >> >>>>>
> > > > > > > >> >> >>>>
> > > > > > > >> >> >>>
> > > > > > > >> >> >>
> > > > > > > >> >> >
> > > > > > > >> >> >
> > > > > > > >> >> > --
> > > > > > > >> >> > Best Regards
> > > > > > > >> >> >
> > > > > > > >> >> > Jeff Zhang
> > > > > > > >> >>
> > > > > > > >> >>
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> >
>
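The JobListener interface Jeff proposes above can be sketched as a small, self-contained toy. Everything below is illustrative: JobID and JobExecutionResult are simplified stand-ins for Flink's classes, and the lifecycle callbacks are fired by a simulated driver rather than a real environment.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of the proposed JobListener lifecycle hooks.
public class JobListenerExample {

    public static final class JobID {
        final String id;
        JobID(String id) { this.id = id; }
        @Override public String toString() { return id; }
    }

    public static final class JobExecutionResult {
        final JobID jobId;
        JobExecutionResult(JobID jobId) { this.jobId = jobId; }
    }

    public interface JobListener {
        void onJobSubmitted(JobID jobId);
        void onJobExecuted(JobExecutionResult jobResult);
        void onJobCanceled(JobID jobId);
    }

    // A downstream project (e.g. a notebook) captures the JobID on
    // submission so it can cancel or monitor the job later.
    public static List<String> run() {
        List<String> events = new ArrayList<>();
        JobListener listener = new JobListener() {
            public void onJobSubmitted(JobID jobId) { events.add("submitted:" + jobId); }
            public void onJobExecuted(JobExecutionResult r) { events.add("executed:" + r.jobId); }
            public void onJobCanceled(JobID jobId) { events.add("canceled:" + jobId); }
        };
        // Simulate the environment driving the lifecycle callbacks.
        JobID jobId = new JobID("job-1");
        listener.onJobSubmitted(jobId);
        listener.onJobExecuted(new JobExecutionResult(jobId));
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

With hooks of this shape, an integration like Zeppelin's reduces to registering a listener instead of parsing client output.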

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Till Rohrmann <tr...@apache.org>.
Hi Tison,

just a quick comment concerning the class loading issues when using the
per-job mode. The community wants to change it so that the
StandaloneJobClusterEntryPoint actually uses the user code class loader
with child-first class loading [1]. Hence, I hope that this problem will be
resolved soon.

[1] https://issues.apache.org/jira/browse/FLINK-13840

Cheers,
Till

On Fri, Aug 23, 2019 at 2:47 PM Kostas Kloudas <kk...@gmail.com> wrote:

> Hi all,
>
> On the topic of web submission, I agree with Till that it only seems
> to complicate things.
> It is bad for security, job isolation (anybody can submit/cancel jobs),
> and its
> implementation complicates some parts of the code. So, if we were to
> redesign the
> WebUI, maybe this part could be left out. In addition, I would say
> that the ability to cancel
> jobs could also be left out.
>
> I would also be in favour of removing the "detached" mode, for
> the reasons mentioned
> above (i.e. because now we will have a future representing the result
> on which the user
> can choose to wait or not).
>
> Now, as for separating job submission and cluster creation, I am in
> favour of keeping both.
> Once again, the reasons are mentioned above by Stephan, Till, Aljoscha
> and also Zili seems
> to agree. They mainly have to do with security, isolation and ease of
> resource management
> for the user as he knows that "when my job is done, everything will be
> cleared up". This is
> also the experience you get when launching a process on your local OS.
>
> On excluding the per-job mode from returning a JobClient or not, I
> believe that eventually
> it would be nice to allow users to get back a JobClient. The reason is
> that 1) I cannot
> find any objective reason why the user-experience should diverge, and
> 2) this will be the
> way that the user will be able to interact with his running job.
> Assuming that the necessary
> ports are open for the REST API to work, then I think that the
> JobClient can run against the
> REST API without problems. If the needed ports are not open, then we
> are safe to not return
> a JobClient, as the user explicitly chose to close all points of
> communication to his running job.
>
> On the topic of not hijacking the "env.execute()" in order to get the
> Plan, I definitely agree but
> for the proposal of having a "compile()" method in the env, I would
> like to have a better look at
> the existing code.
>
> Cheers,
> Kostas
>
> On Fri, Aug 23, 2019 at 5:52 AM Zili Chen <wa...@gmail.com> wrote:
> >
> > Hi Yang,
> >
> > It would be helpful if you check Stephan's last comment,
> > which states that isolation is important.
> >
> > For per-job mode, we run a dedicated cluster (maybe it
> > should have been a couple of JMs and TMs during the FLIP-6
> > design) for a specific job. Thus the process is isolated
> > from other jobs.
> >
> > In our case there was a time we suffered from multiple
> > jobs submitted by different users that affected
> > each other, so that all of them ran into an error state.
> > Also, running the client inside the cluster could save
> > client resources at some point.
> >
> > However, we also face several issues, as you mentioned:
> > in per-job mode the parent classloader is always used,
> > so classloading issues occur.
> >
> > BTW, one can make an analogy between session/per-job mode
> > in Flink and client/cluster mode in Spark.
> >
> > Best,
> > tison.
> >
> >
> > Yang Wang <da...@gmail.com> 于2019年8月22日周四 上午11:25写道:
> >
> > > From the user's perspective, the scope of a per-job cluster is
> > > really confusing.
> > >
> > >
> > > If it means a flink cluster with a single job, then we get better
> > > isolation.
> > >
> > > Now it does not matter how we deploy the cluster: deploy it directly
> > > (mode 1), or start a flink cluster and then submit the job through a
> > > cluster client (mode 2).
> > >
> > >
> > > Otherwise, if it just means direct deployment, how should we name
> > > mode 2:
> > >
> > > session with a job, or something else?
> > >
> > > We could also benefit from mode 2. Users could get the same isolation
> > > as with mode 1.
> > >
> > > The user code and dependencies will be loaded by the user class loader
> > > to avoid class conflicts with the framework.
> > >
> > >
> > >
> > > Anyway, both submission modes are useful.
> > >
> > > We just need to clarify the concepts.
> > >
> > >
> > >
> > >
> > > Best,
> > >
> > > Yang
> > >
> > > Zili Chen <wa...@gmail.com> 于2019年8月20日周二 下午5:58写道:
> > >
> > > > Thanks for the clarification.
> > > >
> > > > The idea of a JobDeployer came into my mind when I was puzzling over
> > > > how to execute per-job mode and session mode with the same user code
> > > > and framework codepath.
> > > >
> > > > With the JobDeployer concept we come back to the statement that the
> > > > environment knows all the configuration for cluster deployment and job
> > > > submission. We configure, or generate from configuration, a specific
> > > > JobDeployer in the environment, and then the code aligns on
> > > >
> > > > *JobClient client = env.execute().get();*
> > > >
> > > > which in session mode is returned by clusterClient.submitJob and in
> > > > per-job mode by clusterDescriptor.deployJobCluster.
> > > >
> > > > Here comes a problem: currently we directly run the ClusterEntrypoint
> > > > with the extracted job graph. Following the JobDeployer approach, we'd
> > > > better align the entry point of per-job deployment at the JobDeployer.
> > > > Users run their main method, or a CLI (which finally calls the main
> > > > method), to deploy the job cluster.
> > > >
> > > > Best,
> > > > tison.
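The JobDeployer idea tison describes above can be sketched as a toy model. All class names and signatures here are illustrative assumptions, not Flink's actual API; the point is that session mode and per-job mode sit behind different deployers while user code aligns on the same `env.execute().get()` shape:

```java
import java.util.concurrent.CompletableFuture;

// Toy model: the environment picks a deployer from configuration, and both
// session and per-job mode end up behind the same "deploy -> JobClient" path.
public class JobDeployerExample {

    public interface JobClient {
        String jobId();
    }

    public interface JobDeployer {
        CompletableFuture<JobClient> deploy(String jobGraph);
    }

    // Session mode: submit through an existing cluster's client.
    public static class SessionDeployer implements JobDeployer {
        public CompletableFuture<JobClient> deploy(String jobGraph) {
            return CompletableFuture.completedFuture(() -> "session-" + jobGraph);
        }
    }

    // Per-job mode: spin up a dedicated cluster for this one job.
    public static class PerJobDeployer implements JobDeployer {
        public CompletableFuture<JobClient> deploy(String jobGraph) {
            return CompletableFuture.completedFuture(() -> "perjob-" + jobGraph);
        }
    }

    // Stand-in for what env.execute() would do internally: pick the deployer
    // from configuration, deploy, and hand back a JobClient.
    public static String execute(String mode, String jobGraph) {
        JobDeployer deployer =
            mode.equals("session") ? new SessionDeployer() : new PerJobDeployer();
        JobClient client = deployer.deploy(jobGraph).join();
        return client.jobId();
    }

    public static void main(String[] args) {
        System.out.println(execute("session", "g1"));
        System.out.println(execute("per-job", "g1"));
    }
}
```

The design choice being illustrated: user code never branches on the deployment mode; only the deployer chosen from configuration differs.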
> > > >
> > > >
> > > > Stephan Ewen <se...@apache.org> 于2019年8月20日周二 下午4:40写道:
> > > >
> > > > > Till has made some good comments here.
> > > > >
> > > > > Two things to add:
> > > > >
> > > > >   - The job mode is very nice in the way that it runs the client
> > > > > inside the cluster (in the same image/process as the JM) and thus
> > > > > unifies both applications and what the Spark world calls the "driver
> > > > > mode".
> > > > >
> > > > >   - Another thing I would add is that during the FLIP-6 design, we
> > > > > were thinking about setups where Dispatcher and JobManager are
> > > > > separate processes.
> > > > >     A Yarn or Mesos Dispatcher of a session could run independently
> > > > > (even as privileged processes executing no code).
> > > > >     Then the "per-job" mode could still be helpful: when a job is
> > > > > submitted to the dispatcher, it launches the JM again in per-job
> > > > > mode, so that JM and TM processes are bound to the job only. For
> > > > > higher security setups, it is important that processes are not reused
> > > > > across jobs.
> > > > >
> > > > > On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <
> trohrmann@apache.org>
> > > > > wrote:
> > > > >
> > > > > > I would not be in favour of getting rid of the per-job mode since it
> > > > > > simplifies the process of running Flink jobs considerably. Moreover,
> > > > > > it is not only well suited for container deployments but also for
> > > > > > deployments where you want to guarantee job isolation. For example,
> > > > > > a user could use the per-job mode on Yarn to execute his job on a
> > > > > > separate cluster.
> > > > > >
> > > > > > I think that having two notions of cluster deployment (session vs.
> > > > > > per-job mode) does not necessarily contradict your ideas for the
> > > > > > client api refactoring. For example, one could have the following
> > > > > > interfaces:
> > > > > >
> > > > > > - ClusterDeploymentDescriptor: encapsulates the logic of how to
> > > > > > deploy a cluster.
> > > > > > - ClusterClient: allows to interact with a cluster
> > > > > > - JobClient: allows to interact with a running job
> > > > > >
> > > > > > Now the ClusterDeploymentDescriptor could have two methods:
> > > > > >
> > > > > > - ClusterClient deploySessionCluster()
> > > > > > - JobClusterClient/JobClient deployPerJobCluster(JobGraph)
> > > > > >
> > > > > > where JobClusterClient is either a supertype of ClusterClient which
> > > > > > does not give you the functionality to submit jobs, or
> > > > > > deployPerJobCluster returns directly a JobClient.
> > > > > >
> > > > > > When setting up the ExecutionEnvironment, one would then not
> > > > > > provide a ClusterClient to submit jobs but a JobDeployer which,
> > > > > > depending on the selected mode, either uses a ClusterClient (session
> > > > > > mode) to submit jobs or a ClusterDeploymentDescriptor to deploy a
> > > > > > per-job mode cluster with the job to execute.
> > > > > >
> > > > > > These are just some thoughts on how one could make it work, because
> > > > > > I believe there is some value in using the per-job mode from the
> > > > > > ExecutionEnvironment.
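Till's proposed interface split could be sketched with an in-memory stand-in. The names mirror the proposal, but the signatures (a String job graph, a status() method, the in-memory bookkeeping) are simplified assumptions for illustration only, not real Flink types:

```java
import java.util.HashMap;
import java.util.Map;

// In-memory sketch of the proposed split: the descriptor deploys clusters,
// a ClusterClient talks to a cluster, a JobClient talks to a single job.
public class DeploymentSketch {

    public interface JobClient {
        String status();
    }

    public interface ClusterClient {
        JobClient submitJob(String jobGraph);
    }

    public interface ClusterDeploymentDescriptor {
        ClusterClient deploySessionCluster();
        JobClient deployPerJobCluster(String jobGraph);
    }

    public static class InMemoryDescriptor implements ClusterDeploymentDescriptor {
        public ClusterClient deploySessionCluster() {
            // The session cluster keeps accepting jobs after deployment.
            Map<String, String> jobs = new HashMap<>();
            return jobGraph -> {
                jobs.put(jobGraph, "RUNNING");
                return () -> jobs.get(jobGraph);
            };
        }

        // Per-job mode bundles deployment and submission into one step and
        // hands back only a JobClient, so no further jobs can be submitted.
        public JobClient deployPerJobCluster(String jobGraph) {
            return () -> "RUNNING";
        }
    }

    public static void main(String[] args) {
        ClusterDeploymentDescriptor d = new InMemoryDescriptor();
        JobClient a = d.deploySessionCluster().submitJob("g1");
        JobClient b = d.deployPerJobCluster("g2");
        System.out.println(a.status() + " " + b.status());
    }
}
```

Note how the type returned by deployPerJobCluster encodes the isolation argument: the caller simply has no handle through which to submit a second job.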
> > > > > >
> > > > > > Concerning the web submission, this is indeed a bit tricky. From a
> > > > > > cluster management standpoint, I would be in favour of not executing
> > > > > > user code on the REST endpoint. Especially when considering
> > > > > > security, it would be good to have a well-defined cluster behaviour
> > > > > > where it is explicitly stated where user code and, thus, potentially
> > > > > > risky code is executed. Ideally we limit it to the TaskExecutor and
> > > > > > JobMaster.
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > > On Tue, Aug 20, 2019 at 9:40 AM Flavio Pompermaier <
> > > > pompermaier@okkam.it
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > In my opinion the client should not use any environment to get the
> > > > > > > Job graph, because the jar should reside ONLY on the cluster (and
> > > > > > > not in the client classpath, otherwise there are always
> > > > > > > inconsistencies between the client's and the Flink Job Manager's
> > > > > > > classpaths).
> > > > > > > In the YARN, Mesos and Kubernetes scenarios you have the jar, but
> > > > > > > you could start a cluster that has the jar on the Job Manager as
> > > > > > > well (and this is the only case where I think you can assume that
> > > > > > > the client has the jar on the classpath; in the REST job
> > > > > > > submission you don't have any classpath).
> > > > > > >
> > > > > > > Thus, again in my opinion, the JobGraph should be generated by the
> > > > > > > Job Manager REST API.
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <
> wander4096@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > >> I would like to involve Till & Stephan here to clarify some
> > > > > > >> concepts of per-job mode.
> > > > > > >>
> > > > > > >> The term per-job refers to one of the modes a cluster can run
> > > > > > >> in. It is mainly aimed at spawning a dedicated cluster for a
> > > > > > >> specific job. The job can be packaged with Flink itself, so the
> > > > > > >> cluster is initialized with the job and a separate submission
> > > > > > >> step is no longer needed.
> > > > > > >>
> > > > > > >> This is useful for container deployments where one creates an
> > > > > > >> image with the job and then simply deploys the container.
> > > > > > >>
> > > > > > >> However, it is out of the client's scope, since a client
> > > > > > >> (ClusterClient, for example) exists to communicate with an
> > > > > > >> existing cluster and perform actions on it. Currently, in
> > > > > > >> per-job mode, we extract the job graph and bundle it into the
> > > > > > >> cluster deployment, so no client concept gets involved. It
> > > > > > >> looks reasonable to exclude the deployment of per-job clusters
> > > > > > >> from the client API and use dedicated utility classes
> > > > > > >> (deployers) for deployment.
> > > > > > >>
> > > > > > >> On Tue, Aug 20, 2019 at 12:37 PM Zili Chen <wa...@gmail.com> wrote:
> > > > > > >>
> > > > > > >> > Hi Aljoscha,
> > > > > > >> >
> > > > > > >> > Thanks for your reply and participation. The Google Doc you
> > > > > > >> > linked to requires permission; I think you could use a share
> > > > > > >> > link instead.
> > > > > > >> >
> > > > > > >> > I agree that we have almost reached a consensus that a
> > > > > > >> > JobClient is necessary to interact with a running job.
> > > > > > >> >
> > > > > > >> > Let me check your open questions one by one.
> > > > > > >> >
> > > > > > >> > 1. Separate cluster creation and job submission for per-job
> > > mode.
> > > > > > >> >
> > > > > > >> > As you mentioned, this is where the opinions diverge. In my
> > > > > > >> > document there is an alternative [2] that proposes excluding
> > > > > > >> > per-job deployment from the client API scope, and I now find
> > > > > > >> > it more reasonable to do the exclusion.
> > > > > > >> >
> > > > > > >> > In per-job mode, a dedicated JobCluster is launched to
> > > > > > >> > execute the specific job. It is more like a Flink application
> > > > > > >> > than a submission of a Flink job. The client only takes care
> > > > > > >> > of job submission and assumes there is an existing cluster.
> > > > > > >> > In this way we can consider per-job issues individually, and
> > > > > > >> > JobClusterEntrypoint would be the utility class for per-job
> > > > > > >> > deployment.
> > > > > > >> >
> > > > > > >> > Nevertheless, the user program works in both session mode and
> > > > > > >> > per-job mode without any need to change code. In per-job mode
> > > > > > >> > a JobClient is returned from env.execute as normal. However,
> > > > > > >> > it would no longer be a wrapper of RestClusterClient but a
> > > > > > >> > wrapper of PerJobClusterClient, which communicates with the
> > > > > > >> > Dispatcher locally.
> > > > > > >> >
> > > > > > >> > 2. How to deal with plan preview.
> > > > > > >> >
> > > > > > >> > With the env.compile functions, users can get the JobGraph or
> > > > > > >> > FlinkPlan and thus preview the plan programmatically.
> > > > > > >> > Typically it looks like
> > > > > > >> >
> > > > > > >> > if (previewConfigured) {
> > > > > > >> >     FlinkPlan plan = env.compile();
> > > > > > >> >     new JSONDumpGenerator(...).dump(plan);
> > > > > > >> > } else {
> > > > > > >> >     env.execute();
> > > > > > >> > }
> > > > > > >> >
> > > > > > >> > And `flink info` would then no longer be needed.
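The preview-vs-execute split described above can be sketched in plain Java. The `Plan`, `Env`, and `PreviewDemo` types below are hypothetical stand-ins for illustration only, not the actual Flink classes:

```java
// Hypothetical stand-ins illustrating the preview-vs-execute split the
// snippet above describes; these are NOT the real Flink classes.
class Plan {
    private final String description;

    Plan(String description) {
        this.description = description;
    }

    String toJson() {
        return "{\"plan\": \"" + description + "\"}";
    }
}

class Env {
    // compile() only builds the plan; it never talks to a cluster.
    Plan compile() {
        return new Plan("source -> map -> sink");
    }

    // execute() would submit the compiled plan to a cluster.
    String execute() {
        return "job-submitted";
    }
}

class PreviewDemo {
    static String run(Env env, boolean previewConfigured) {
        if (previewConfigured) {
            // Plan preview: dump the plan instead of executing the job.
            return env.compile().toJson();
        }
        return env.execute();
    }

    public static void main(String[] args) {
        // prints {"plan": "source -> map -> sink"}
        System.out.println(run(new Env(), true));
    }
}
```

The point of the sketch is that preview becomes an ordinary library call in the user program, rather than a hijacked execute() in the CLI.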
> > > > > > >> >
> > > > > > >> > 3. How to deal with Jar Submission at the Web Frontend.
> > > > > > >> >
> > > > > > >> > There is one more thread that discussed this topic [1]. Apart
> > > > > > >> > from removing the functions, there are two alternatives.
> > > > > > >> >
> > > > > > >> > One is to introduce an interface with a method that returns a
> > > > > > >> > JobGraph/FlinkPlan, and have Jar Submission only support main
> > > > > > >> > classes that implement this interface. The JobGraph/FlinkPlan
> > > > > > >> > is then extracted just by calling that method. In this way,
> > > > > > >> > it is even possible to consider a separation of job creation
> > > > > > >> > and job submission.
> > > > > > >> >
> > > > > > >> > The other is, as you mentioned, to let execute() do the
> > > > > > >> > actual execution. We would not execute the main method in the
> > > > > > >> > WebFrontend but spawn a process on the WebMonitor side to
> > > > > > >> > execute it. As for the return value, we could generate the
> > > > > > >> > JobID in the WebMonitor and pass it to the execution
> > > > > > >> > environment.
> > > > > > >> >
> > > > > > >> > 4. How to deal with detached mode.
> > > > > > >> >
> > > > > > >> > I think detached mode is a temporary solution for
> > > > > > >> > non-blocking submission. In my document both submission and
> > > > > > >> > execution return a CompletableFuture, and users control
> > > > > > >> > whether or not to wait for the result. At that point we don't
> > > > > > >> > need a detached option, but the functionality is still
> > > > > > >> > covered.
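The future-based submission described above could look roughly like the following sketch. `SketchEnv`, `executeAsync`, and `JobResult` are made-up names for discussion, not the actual Flink API:

```java
import java.util.concurrent.CompletableFuture;

// Sketch of non-blocking submission replacing the detached flag. The types
// and method names here (SketchEnv, executeAsync, JobResult) are
// assumptions for illustration, not the actual Flink API.
class JobResult {
    final String jobId;

    JobResult(String jobId) {
        this.jobId = jobId;
    }
}

class SketchEnv {
    // Returns immediately; the future completes when the job finishes.
    CompletableFuture<JobResult> executeAsync(String jobName) {
        return CompletableFuture.supplyAsync(() -> new JobResult(jobName + "-done"));
    }
}

class DetachedDemo {
    public static void main(String[] args) {
        SketchEnv env = new SketchEnv();
        CompletableFuture<JobResult> future = env.executeAsync("wordcount");

        // "Detached" behaviour: keep the future and return right away.
        // "Attached" behaviour: block until the job finishes, as below.
        JobResult result = future.join();
        System.out.println(result.jobId); // prints wordcount-done
    }
}
```

With this shape, attached vs. detached is just a caller-side decision about whether to wait on the future, so no separate DetachedEnvironment is needed.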
> > > > > > >> >
> > > > > > >> > 5. How does per-job mode interact with interactive
> programming.
> > > > > > >> >
> > > > > > >> > The YARN, Mesos and Kubernetes scenarios all follow the
> > > > > > >> > pattern of launching a JobCluster now, and I don't think
> > > > > > >> > there would be inconsistencies between the different resource
> > > > > > >> > managers.
> > > > > > >> >
> > > > > > >> > Best,
> > > > > > >> > tison.
> > > > > > >> >
> > > > > > >> > [1]
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
> > > > > > >> > [2]
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
> > > > > > >> >
> > > > > > >> > On Fri, Aug 16, 2019 at 9:20 PM Aljoscha Krettek
> > > > > > >> > <al...@apache.org> wrote:
> > > > > > >> >
> > > > > > >> >> Hi,
> > > > > > >> >>
> > > > > > >> >> I read both Jeff's initial design document and the newer
> > > document
> > > > by
> > > > > > >> >> Tison. I also finally found the time to collect our
> thoughts on
> > > > the
> > > > > > >> issue,
> > > > > > >> >> I had quite some discussions with Kostas and this is the
> > > result:
> > > > > [1].
> > > > > > >> >>
> > > > > > >> >> I think overall we agree that this part of the code is in
> dire
> > > > need
> > > > > > of
> > > > > > >> >> some refactoring/improvements but I think there are still
> some
> > > > open
> > > > > > >> >> questions and some differences of opinion about what those
> > > > > > >> >> refactorings should
> > > > > > >> >> look like.
> > > > > > >> >>
> > > > > > >> >> I think the API-side is quite clear, i.e. we need some
> > > JobClient
> > > > > API
> > > > > > >> that
> > > > > > >> >> allows interacting with a running Job. It could be
> worthwhile
> > > to
> > > > > spin
> > > > > > >> that
> > > > > > >> >> off into a separate FLIP because we can probably find
> consensus
> > > > on
> > > > > > that
> > > > > > >> >> part more easily.
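As a rough illustration of what such a JobClient API could look like, here is a sketch. The method names are assumptions for discussion, not a settled Flink design:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of a JobClient for interacting with a running job.
// Method names are assumptions for discussion, not the actual Flink API.
interface SketchJobClient {
    String getJobId();
    CompletableFuture<String> getJobStatus();
    CompletableFuture<Void> cancel();
    CompletableFuture<String> triggerSavepoint(String targetDirectory);
}

// Trivial in-memory implementation so the shape can be exercised.
class InMemoryJobClient implements SketchJobClient {
    private final String jobId;
    private String status = "RUNNING";

    InMemoryJobClient(String jobId) {
        this.jobId = jobId;
    }

    @Override
    public String getJobId() {
        return jobId;
    }

    @Override
    public CompletableFuture<String> getJobStatus() {
        return CompletableFuture.completedFuture(status);
    }

    @Override
    public CompletableFuture<Void> cancel() {
        status = "CANCELED";
        return CompletableFuture.completedFuture(null);
    }

    @Override
    public CompletableFuture<String> triggerSavepoint(String targetDirectory) {
        return CompletableFuture.completedFuture(targetDirectory + "/savepoint-" + jobId);
    }
}
```

The interface deliberately says nothing about how the cluster was deployed, which is what lets it work the same way in session and per-job mode.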
> > > > > > >> >>
> > > > > > >> >> For the rest, the main open questions from our doc are
> these:
> > > > > > >> >>
> > > > > > >> >>   - Do we want to separate cluster creation and job
> submission
> > > > for
> > > > > > >> >> per-job mode? In the past, there were conscious efforts to
> > > *not*
> > > > > > >> separate
> > > > > > >> >> job submission from cluster creation for per-job clusters
> for
> > > > > Mesos,
> > > > > > >> YARN,
> > > > > > >> >> Kubernetes (see StandaloneJobClusterEntryPoint). Tison
> suggests
> > > in
> > > > > his
> > > > > > >> >> design document to decouple this in order to unify job
> > > > submission.
> > > > > > >> >>
> > > > > > >> >>   - How to deal with plan preview, which needs to hijack
> > > > execute()
> > > > > > and
> > > > > > >> >> let the outside code catch an exception?
> > > > > > >> >>
> > > > > > >> >>   - How to deal with Jar Submission at the Web Frontend,
> which
> > > > > needs
> > > > > > to
> > > > > > >> >> hijack execute() and let the outside code catch an
> exception?
> > > > > > >> >> CliFrontend.run() “hijacks” ExecutionEnvironment.execute()
> to
> > > > get a
> > > > > > >> >> JobGraph and then execute that JobGraph manually. We could
> get
> > > > > around
> > > > > > >> that
> > > > > > >> >> by letting execute() do the actual execution. One caveat
> for
> > > this
> > > > > is
> > > > > > >> that
> > > > > > >> >> now the main() method doesn’t return (or is forced to
> return by
> > > > > > >> throwing an
> > > > > > >> >> exception from execute()) which means that for Jar
> Submission
> > > > from
> > > > > > the
> > > > > > >> >> WebFrontend we have a long-running main() method running
> in the
> > > > > > >> >> WebFrontend. This doesn’t sound very good. We could get
> around
> > > > this
> > > > > > by
> > > > > > >> >> removing the plan preview feature and by removing Jar
> > > > > > >> Submission/Running.
> > > > > > >> >>
> > > > > > >> >>   - How to deal with detached mode? Right now,
> > > > DetachedEnvironment
> > > > > > will
> > > > > > >> >> execute the job and return immediately. If users control
> when
> > > > they
> > > > > > >> want to
> > > > > > >> >> return, by waiting on the job completion future, how do we
> deal
> > > > > with
> > > > > > >> this?
> > > > > > >> >> Do we simply remove the distinction between
> > > > detached/non-detached?
> > > > > > >> >>
> > > > > > >> >>   - How does per-job mode interact with “interactive
> > > programming”
> > > > > > >> >> (FLIP-36). For YARN, each execute() call could spawn a new
> > > Flink
> > > > > YARN
> > > > > > >> >> cluster. What about Mesos and Kubernetes?
> > > > > > >> >>
> > > > > > >> >> The first open question is where the opinions diverge, I
> think.
> > > > The
> > > > > > >> rest
> > > > > > >> >> are just open questions and interesting things that we
> need to
> > > > > > >> consider.
> > > > > > >> >>
> > > > > > >> >> Best,
> > > > > > >> >> Aljoscha
> > > > > > >> >>
> > > > > > >> >> [1]
> > > > > > >> >>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> > > > > > >> >> <
> > > > > > >> >>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> > > > > > >> >> >
> > > > > > >> >>
> > > > > > >> >> > On 31. Jul 2019, at 15:23, Jeff Zhang <zj...@gmail.com>
> > > > wrote:
> > > > > > >> >> >
> > > > > > >> >> > Thanks tison for the effort. I left a few comments.
> > > > > > >> >> >
> > > > > > >> >> >
> > > > > > >> >> > On Wed, Jul 31, 2019 at 8:24 PM Zili Chen <wa...@gmail.com> wrote:
> > > > > > >> >> >
> > > > > > >> >> >> Hi Flavio,
> > > > > > >> >> >>
> > > > > > >> >> >> Thanks for your reply.
> > > > > > >> >> >>
> > > > > > >> >> >> In both the current implementation and the design,
> > > > > > >> >> >> ClusterClient never takes responsibility for generating
> > > > > > >> >> >> the JobGraph (what you see in the current codebase are
> > > > > > >> >> >> several class methods).
> > > > > > >> >> >>
> > > > > > >> >> >> Instead, the user describes the program in the main method
> > > > > > >> >> >> with ExecutionEnvironment apis and calls env.compile()
> > > > > > >> >> >> or env.optimize() to get FlinkPlan and JobGraph
> > > respectively.
> > > > > > >> >> >>
> > > > > > >> >> >> For listing the main classes in a jar and choosing one for
> > > > > > >> >> >> submission, you're now able to customize a CLI to do it.
> > > > > > >> >> >> Specifically, the path of jar is passed as arguments and
> > > > > > >> >> >> in the customized CLI you list main classes, choose one
> > > > > > >> >> >> to submit to the cluster.
> > > > > > >> >> >>
> > > > > > >> >> >> Best,
> > > > > > >> >> >> tison.
> > > > > > >> >> >>
> > > > > > >> >> >>
> > > > > > >> >> >> On Wed, Jul 31, 2019 at 8:12 PM Flavio Pompermaier
> > > > > > >> >> >> <po...@okkam.it> wrote:
> > > > > > >> >> >>
> > > > > > >> >> >>> Just one note on my side: it is not clear to me
> whether the
> > > > > > client
> > > > > > >> >> needs
> > > > > > >> >> >> to
> > > > > > >> >> >>> be able to generate a job graph or not.
> > > > > > >> >> >>> In my opinion, the job jar must reside only on the
> > > > > > >> server/jobManager
> > > > > > >> >> >> side
> > > > > > >> >> >>> and the client requires a way to get the job graph.
> > > > > > >> >> >>> If you really want to access to the job graph, I'd add
> a
> > > > > > dedicated
> > > > > > >> >> method
> > > > > > >> >> >>> on the ClusterClient. like:
> > > > > > >> >> >>>
> > > > > > >> >> >>>   - getJobGraph(jarId, mainClass): JobGraph
> > > > > > >> >> >>>   - listMainClasses(jarId): List<String>
> > > > > > >> >> >>>
> > > > > > >> >> >>> These would require some additions on the job manager
> > > > > > >> >> >>> endpoint as well... what do you think?
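The two methods proposed above might be sketched as follows. The signatures mirror the proposal, but nothing here is actual Flink API, and the example class names are made up:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the two proposed ClusterClient additions, backed by a toy jar
// registry. The signatures mirror the proposal above; nothing here is
// actual Flink API, and the example class names are made up.
class SketchJobGraph {
    final String mainClass;

    SketchJobGraph(String mainClass) {
        this.mainClass = mainClass;
    }
}

class SketchClusterClient {
    // In a real implementation this information would come from a
    // JobManager REST endpoint that inspects the uploaded jar.
    List<String> listMainClasses(String jarId) {
        return Arrays.asList("com.example.BatchJob", "com.example.StreamingJob");
    }

    SketchJobGraph getJobGraph(String jarId, String mainClass) {
        if (!listMainClasses(jarId).contains(mainClass)) {
            throw new IllegalArgumentException("Unknown main class: " + mainClass);
        }
        return new SketchJobGraph(mainClass);
    }
}
```

In this shape the client never needs the job jar on its own classpath; it only asks the cluster what the jar contains and which graph it compiles to.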
> > > > > > >> >> >>>
> > > > > > >> >> >>> On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <
> > > > > wander4096@gmail.com
> > > > > > >
> > > > > > >> >> wrote:
> > > > > > >> >> >>>
> > > > > > >> >> >>>> Hi all,
> > > > > > >> >> >>>>
> > > > > > >> >> >>>> Here is a document[1] on client api enhancement from
> our
> > > > > > >> perspective.
> > > > > > >> >> >>>> We have investigated current implementations. And we
> > > propose
> > > > > > >> >> >>>>
> > > > > > >> >> >>>> 1. Unify the implementation of cluster deployment and
> job
> > > > > > >> submission
> > > > > > >> >> in
> > > > > > >> >> >>>> Flink.
> > > > > > >> >> >>>> 2. Provide programmatic interfaces to allow flexible
> job
> > > and
> > > > > > >> cluster
> > > > > > >> >> >>>> management.
> > > > > > >> >> >>>>
> > > > > > >> >> >>>> The first proposal is aimed at reducing code paths of
> > > > cluster
> > > > > > >> >> >> deployment
> > > > > > >> >> >>>> and
> > > > > > >> >> >>>> job submission so that one can adopt Flink in his
> usage
> > > > > easily.
> > > > > > >> The
> > > > > > >> >> >>> second
> > > > > > >> >> >>>> proposal is aimed at providing rich interfaces for
> > > advanced
> > > > > > users
> > > > > > >> >> >>>> who want to make accurate control of these stages.
> > > > > > >> >> >>>>
> > > > > > >> >> >>>> Quick reference on open questions:
> > > > > > >> >> >>>>
> > > > > > >> >> >>>> 1. Exclude job cluster deployment from client side or
> > > > redefine
> > > > > > the
> > > > > > >> >> >>> semantic
> > > > > > >> >> >>>> of job cluster? Since it fits in a process quite
> different
> > > > > from
> > > > > > >> >> session
> > > > > > >> >> >>>> cluster deployment and job submission.
> > > > > > >> >> >>>>
> > > > > > >> >> >>>> 2. Maintain the codepaths handling class
> > > > > > o.a.f.api.common.Program
> > > > > > >> or
> > > > > > >> >> >>>> implement customized program handling logic by
> customized
> > > > > > >> >> CliFrontend?
> > > > > > >> >> >>>> See also this thread[2] and the document[1].
> > > > > > >> >> >>>>
> > > > > > >> >> >>>> 3. Expose ClusterClient as public api or just expose
> api
> > > in
> > > > > > >> >> >>>> ExecutionEnvironment
> > > > > > >> >> >>>> and delegate them to ClusterClient? Further, in
> either way
> > > > is
> > > > > it
> > > > > > >> >> worth
> > > > > > >> >> >> to
> > > > > > >> >> >>>> introduce a JobClient which is an encapsulation of
> > > > > ClusterClient
> > > > > > >> that
> > > > > > >> >> >>>> associated to specific job?
> > > > > > >> >> >>>>
> > > > > > >> >> >>>> Best,
> > > > > > >> >> >>>> tison.
> > > > > > >> >> >>>>
> > > > > > >> >> >>>> [1]
> > > > > > >> >> >>>>
> > > > > > >> >> >>>>
> > > > > > >> >> >>>
> > > > > > >> >> >>
> > > > > > >> >>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
> > > > > > >> >> >>>> [2]
> > > > > > >> >> >>>>
> > > > > > >> >> >>>>
> > > > > > >> >> >>>
> > > > > > >> >> >>
> > > > > > >> >>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
> > > > > > >> >> >>>>
> > > > > > >> >> >>>> On Wed, Jul 24, 2019 at 9:19 AM Jeff Zhang <zj...@gmail.com> wrote:
> > > > > > >> >> >>>>
> > > > > > >> >> >>>>> Thanks Stephan, I will follow up this issue in next
> few
> > > > > weeks,
> > > > > > >> and
> > > > > > >> >> >> will
> > > > > > >> >> >>>>> refine the design doc. We could discuss more details
> > > after
> > > > > 1.9
> > > > > > >> >> >> release.
> > > > > > >> >> >>>>>
> > > > > > >> >> >>>>> On Wed, Jul 24, 2019 at 12:58 AM Stephan Ewen
> > > > > > >> >> >>>>> <se...@apache.org> wrote:
> > > > > > >> >> >>>>>
> > > > > > >> >> >>>>>> Hi all!
> > > > > > >> >> >>>>>>
> > > > > > >> >> >>>>>> This thread has stalled for a bit, which I assume
> is
> > > > mostly
> > > > > > >> due to
> > > > > > >> >> >>> the
> > > > > > >> >> >>>>>> Flink 1.9 feature freeze and release testing effort.
> > > > > > >> >> >>>>>>
> > > > > > >> >> >>>>>> I personally still recognize this issue as one
> important
> > > > to
> > > > > be
> > > > > > >> >> >>> solved.
> > > > > > >> >> >>>>> I'd
> > > > > > >> >> >>>>>> be happy to help resume this discussion soon (after
> the
> > > > 1.9
> > > > > > >> >> >> release)
> > > > > > >> >> >>>> and
> > > > > > >> >> >>>>>> see if we can do some step towards this in Flink
> 1.10.
> > > > > > >> >> >>>>>>
> > > > > > >> >> >>>>>> Best,
> > > > > > >> >> >>>>>> Stephan
> > > > > > >> >> >>>>>>
> > > > > > >> >> >>>>>>
> > > > > > >> >> >>>>>>
> > > > > > >> >> >>>>>> On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier
> <
> > > > > > >> >> >>>>> pompermaier@okkam.it>
> > > > > > >> >> >>>>>> wrote:
> > > > > > >> >> >>>>>>
> > > > > > >> >> >>>>>>> That's exactly what I suggested a long time ago:
> the
> > > > Flink
> > > > > > REST
> > > > > > >> >> >>>> client
> > > > > > >> >> >>>>>>> should not require any Flink dependency, only http
> > > > library
> > > > > to
> > > > > > >> >> >> call
> > > > > > >> >> >>>> the
> > > > > > >> >> >>>>>> REST
> > > > > > >> >> >>>>>>> services to submit and monitor a job.
> > > > > > >> >> >>>>>>> What I suggested also in [1] was to have a way to
> > > > > > automatically
> > > > > > >> >> >>>> suggest
> > > > > > >> >> >>>>>> the
> > > > > > >> >> >>>>>>> user (via a UI) the available main classes and
> their
> > > > > required
> > > > > > >> >> >>>>>>> parameters[2].
> > > > > > >> >> >>>>>>> Another problem we have with Flink is that the Rest
> > > > client
> > > > > > and
> > > > > > >> >> >> the
> > > > > > >> >> >>>> CLI
> > > > > > >> >> >>>>>> one
> > > > > > >> >> >>>>>>> behave differently, and we use the CLI client (via
> ssh)
> > > > > > because
> > > > > > >> >> >> it
> > > > > > >> >> >>>>> allows
> > > > > > >> >> >>>>>>> to call some other method after env.execute() [3]
> (we
> > > > have
> > > > > to
> > > > > > >> >> >> call
> > > > > > >> >> >>>>>> another
> > > > > > >> >> >>>>>>> REST service to signal the end of the job).
> > > > > > >> >> >>>>>>> Int his regard, a dedicated interface, like the
> > > > JobListener
> > > > > > >> >> >>> suggested
> > > > > > >> >> >>>>> in
> > > > > > >> >> >>>>>>> the previous emails, would be very helpful (IMHO).
> > > > > > >> >> >>>>>>>
> > > > > > >> >> >>>>>>> [1]
> https://issues.apache.org/jira/browse/FLINK-10864
> > > > > > >> >> >>>>>>> [2]
> https://issues.apache.org/jira/browse/FLINK-10862
> > > > > > >> >> >>>>>>> [3]
> https://issues.apache.org/jira/browse/FLINK-10879
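The JobListener interface suggested in the previous emails might look roughly like this sketch. The interface name and callback signatures are assumptions for discussion, not the actual Flink API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical JobListener sketch: callbacks fired around submission and
// completion, so code that today must run after env.execute() can live in
// a listener instead. Names are assumptions, not the actual Flink API.
interface SketchJobListener {
    void onJobSubmitted(String jobId);
    void onJobExecuted(String jobId, boolean success);
}

class ListeningEnv {
    private final List<SketchJobListener> listeners = new ArrayList<>();

    void registerJobListener(SketchJobListener listener) {
        listeners.add(listener);
    }

    // Toy execute(): notifies listeners before and after the "job" runs.
    void execute(String jobId) {
        for (SketchJobListener l : listeners) {
            l.onJobSubmitted(jobId);
        }
        boolean success = true; // pretend the job ran fine
        for (SketchJobListener l : listeners) {
            l.onJobExecuted(jobId, success);
        }
    }
}
```

A listener like this would also give the REST and CLI clients one shared place to hook "after the job" logic, instead of behaving differently.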
> > > > > > >> >> >>>>>>>
> > > > > > >> >> >>>>>>> Best,
> > > > > > >> >> >>>>>>> Flavio
> > > > > > >> >> >>>>>>>
> > > > > > >> >> >>>>>>> On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <
> > > > > zjffdu@gmail.com
> > > > > > >
> > > > > > >> >> >>> wrote:
> > > > > > >> >> >>>>>>>
> > > > > > >> >> >>>>>>>> Hi, Tison,
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>> Thanks for your comments. Overall I agree with you
> > > that
> > > > it
> > > > > > is
> > > > > > >> >> >>>>> difficult
> > > > > > >> >> >>>>>>> for
> > > > > > >> >> >>>>>>>> downstream projects to integrate with flink and we
> > > need
> > > > to
> > > > > > >> >> >>> refactor
> > > > > > >> >> >>>>> the
> > > > > > >> >> >>>>>>>> current flink client api.
> > > > > > >> >> >>>>>>>> And I agree that CliFrontend should only parse
> > > command
> > > > > > line
> > > > > > >> >> >>>>> arguments
> > > > > > >> >> >>>>>>> and
> > > > > > >> >> >>>>>>>> then pass them to ExecutionEnvironment. It is
> > > > > > >> >> >>>> ExecutionEnvironment's
> > > > > > >> >> >>>>>>>> responsibility to compile job, create cluster, and
> > > > submit
> > > > > > job.
> > > > > > >> >> >>>>> Besides
> > > > > > >> >> >>>>>>>> that, Currently flink has many
> ExecutionEnvironment
> > > > > > >> >> >>>> implementations,
> > > > > > >> >> >>>>>> and
> > > > > > >> >> >>>>>>>> flink will use the specific one based on the
> context.
> > > > > IMHO,
> > > > > > it
> > > > > > >> >> >> is
> > > > > > >> >> >>>> not
> > > > > > >> >> >>>>>>>> necessary, ExecutionEnvironment should be able to
> do
> > > the
> > > > > > right
> > > > > > >> >> >>>> thing
> > > > > > >> >> >>>>>>> based
> > > > > > >> >> >>>>>>>> on the FlinkConf it receives. Too many
> > > > > > ExecutionEnvironment
> > > > > > >> >> >>>>>>>> implementation is another burden for downstream
> > > project
> > > > > > >> >> >>>> integration.
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>> One thing I'd like to mention is flink's scala
> shell
> > > and
> > > > > sql
> > > > > > >> >> >>>> client,
> > > > > > >> >> >>>>>>>> although they are sub-modules of flink, they
> could be
> > > > > > treated
> > > > > > >> >> >> as
> > > > > > >> >> >>>>>>> downstream
> > > > > > >> >> >>>>>>>> project which use flink's client api. Currently
> you
> > > will
> > > > > > find
> > > > > > >> >> >> it
> > > > > > >> >> >>> is
> > > > > > >> >> >>>>> not
> > > > > > >> >> >>>>>>>> easy for them to integrate with flink, they share
> many
> > > > > > >> >> >> duplicated
> > > > > > >> >> >>>>> code.
> > > > > > >> >> >>>>>>> It
> > > > > > >> >> >>>>>>>> is another sign that we should refactor flink
> client
> > > > api.
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>> I believe it is a large and hard change, and I am
> > > afraid
> > > > > we
> > > > > > >> can
> > > > > > >> >> >>> not
> > > > > > >> >> >>>>>> keep
> > > > > > >> >> >>>>>>>> compatibility since many of the changes are
> user-facing.
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>> On Mon, Jun 24, 2019 at 2:53 PM Zili Chen
> > > > > > >> >> >>>>>>>> <wa...@gmail.com> wrote:
> > > > > > >> >> >>>>>>>>
> > > > > > >> >> >>>>>>>>> Hi all,
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> After a closer look at our client APIs, I can see
> > > there
> > > > > are
> > > > > > >> >> >> two
> > > > > > >> >> >>>>> major
> > > > > > >> >> >>>>>>>>> issues to consistency and integration, namely
> > > different
> > > > > > >> >> >>>> deployment
> > > > > > >> >> >>>>> of
> > > > > > >> >> >>>>>>>>> job cluster which couples job graph creation and
> > > > cluster
> > > > > > >> >> >>>>> deployment,
> > > > > > >> >> >>>>>>>>> and submission via CliFrontend confusing control
> flow
> > > > of
> > > > > > job
> > > > > > >> >> >>>> graph
> > > > > > >> >> >>>>>>>>> compilation and job submission. I'd like to
> follow
> > > the
> > > > > > >> >> >> discuss
> > > > > > >> >> >>>>> above,
> > > > > > >> >> >>>>>>>>> mainly the process described by Jeff and
> Stephan, and
> > > > > share
> > > > > > >> >> >> my
> > > > > > >> >> >>>>>>>>> ideas on these issues.
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> 1) CliFrontend confuses the control flow of job
> > > > > compilation
> > > > > > >> >> >> and
> > > > > > >> >> >>>>>>>> submission.
> > > > > > >> >> >>>>>>>>> Following the process of job submission Stephan
> and
> > > > Jeff
> > > > > > >> >> >>>> described,
> > > > > > >> >> >>>>>>>>> execution environment knows all configs of the
> > > cluster
> > > > > and
> > > > > > >> >> >>>>>>> topos/settings
> > > > > > >> >> >>>>>>>>> of the job. Ideally, in the main method of user
> > > > program,
> > > > > it
> > > > > > >> >> >>> calls
> > > > > > >> >> >>>>>>>> #execute
> > > > > > >> >> >>>>>>>>> (or named #submit) and Flink deploys the cluster,
> > > > compile
> > > > > > the
> > > > > > >> >> >>> job
> > > > > > >> >> >>>>>> graph
> > > > > > >> >> >>>>>>>>> and submit it to the cluster. However, current
> > > > > CliFrontend
> > > > > > >> >> >> does
> > > > > > >> >> >>>> all
> > > > > > >> >> >>>>>>> these
> > > > > > >> >> >>>>>>>>> things inside its #runProgram method, which
> > > introduces
> > > > a
> > > > > > lot
> > > > > > >> >> >> of
> > > > > > >> >> >>>>>>>> subclasses
> > > > > > >> >> >>>>>>>>> of (stream) execution environment.
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> Actually, it sets up an exec env that hijacks the
> > > > > > >> >> >>>>>> #execute/executePlan
> > > > > > >> >> >>>>>>>>> method, initializes the job graph and aborts
> > > execution.
> > > > > And
> > > > > > >> >> >> then
> > > > > > >> >> >>>>>>>>> control flow back to CliFrontend, it deploys the
> > > > > cluster(or
> > > > > > >> >> >>>>> retrieve
> > > > > > >> >> >>>>>>>>> the client) and submits the job graph. This is
> quite
> > > a
> > > > > > >> >> >> specific
> > > > > > >> >> >>>>>>> internal
> > > > > > >> >> >>>>>>>>> process inside Flink and none of consistency to
> > > > anything.
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> 2) Deployment of job cluster couples job graph
> > > creation
> > > > > and
> > > > > > >> >> >>>> cluster
> > > > > > >> >> >>>>>>>>> deployment. Abstractly, from user job to a
> concrete
> > > > > > >> >> >> submission,
> > > > > > >> >> >>>> it
> > > > > > >> >> >>>>>>>> requires
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>>     create JobGraph --\
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> create ClusterClient -->  submit JobGraph
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> such a dependency. ClusterClient was created by
> > > > deploying
> > > > > > or
> > > > > > >> >> >>>>>>> retrieving.
> > > > > > >> >> >>>>>>>>> JobGraph submission requires a compiled JobGraph
> and
> > > > > valid
> > > > > > >> >> >>>>>>> ClusterClient,
> > > > > > >> >> >>>>>>>>> but the creation of ClusterClient is abstractly
> > > > > independent
> > > > > > >> >> >> of
> > > > > > >> >> >>>> that
> > > > > > >> >> >>>>>> of
> > > > > > >> >> >>>>>>>>> JobGraph. However, in job cluster mode, we
> deploy job
> > > > > > cluster
> > > > > > >> >> >>>> with
> > > > > > >> >> >>>>> a
> > > > > > >> >> >>>>>>> job
> > > > > > >> >> >>>>>>>>> graph, which means we use another process:
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> create JobGraph --> deploy cluster with the
> JobGraph
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> Here is another inconsistency and downstream
> > > > > > projects/client
> > > > > > >> >> >>> apis
> > > > > > >> >> >>>>> are
> > > > > > >> >> >>>>>>>>> forced to handle different cases with rare
> supports
> > > > from
> > > > > > >> >> >> Flink.
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> Since we likely reached a consensus on
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> 1. all configs gathered by Flink configuration
> and
> > > > passed
> > > > > > >> >> >>>>>>>>> 2. execution environment knows all configs and
> > > handles
> > > > > > >> >> >>>>> execution(both
> > > > > > >> >> >>>>>>>>> deployment and submission)
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> to the issues above I propose eliminating
> > > > inconsistencies
> > > > > > by
> > > > > > >> >> >>>>>> following
> > > > > > >> >> >>>>>>>>> approach:
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> 1) CliFrontend should exactly be a front end, at
> > > least
> > > > > for
> > > > > > >> >> >>> "run"
> > > > > > >> >> >>>>>>> command.
> > > > > > >> >> >>>>>>>>> That means it just gathered and passed all config
> > > from
> > > > > > >> >> >> command
> > > > > > >> >> >>>> line
> > > > > > >> >> >>>>>> to
> > > > > > >> >> >>>>>>>>> the main method of user program. Execution
> > > environment
> > > > > > knows
> > > > > > >> >> >>> all
> > > > > > >> >> >>>>> the
> > > > > > >> >> >>>>>>> info
> > > > > > >> >> >>>>>>>>> and with an addition to utils for ClusterClient,
> we
> > > > > > >> >> >> gracefully
> > > > > > >> >> >>>> get
> > > > > > >> >> >>>>> a
> > > > > > >> >> >>>>>>>>> ClusterClient by deploying or retrieving. In this
> > > way,
> > > > we
> > > > > > >> >> >> don't
> > > > > > >> >> >>>>> need
> > > > > > >> >> >>>>>> to
> > > > > > >> >> >>>>>>>>> hijack #execute/executePlan methods and can
> remove
> > > > > various
> > > > > > >> >> >>>> hacking
> > > > > > >> >> >>>>>>>>> subclasses of exec env, as well as #run methods
> in
> > > > > > >> >> >>>>> ClusterClient(for
> > > > > > >> >> >>>>>> an
> > > > > > >> >> >>>>>>>>> interface-ized ClusterClient). Now the control
> flow
> > > > flows
> > > > > > >> >> >> from
> > > > > > >> >> >>>>>>>> CliFrontend
> > > > > > >> >> >>>>>>>>> to the main method and never returns.
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> 2) Job cluster means a cluster for the specific
> job.
> > > > From
> > > > > > >> >> >>> another
> > > > > > >> >> >>>>>>>>> perspective, it is an ephemeral session. We may
> > > > decouple
> > > > > > the
> > > > > > >> >> >>>>>> deployment
> > > > > > >> >> >>>>>>>>> with a compiled job graph, but start a session
> with
> > > > idle
> > > > > > >> >> >>> timeout
> > > > > > >> >> >>>>>>>>> and submit the job following.
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> These topics, before we go into more details on
> > > design
> > > > or
> > > > > > >> >> >>>>>>> implementation,
> > > > > > >> >> >>>>>>>>> are better to be aware and discussed for a
> consensus.
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> Best,
> > > > > > >> >> >>>>>>>>> tison.
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>> On Thu, Jun 20, 2019 at 3:21 AM Zili Chen
> > > > > > >> >> >>>>>>>>> <wa...@gmail.com> wrote:
> > > > > > >> >> >>>>>>>>>
> > > > > > >> >> >>>>>>>>>> Hi Jeff,
> > > > > > >> >> >>>>>>>>>>
> > > > > > >> >> >>>>>>>>>> Thanks for raising this thread and the design
> > > > document!
> > > > > > >> >> >>>>>>>>>>
> > > > > > >> >> >>>>>>>>>> As @Thomas Weise mentioned above, extending
> config
> > > to
> > > > > > flink
> > > > > > >> >> >>>>>>>>>> requires far more effort than it should be.
> Another
> > > > > > example
> > > > > > >> >> >>>>>>>>>> is we achieve detach mode by introduce another
> > > > execution
> > > > > > >> >> >>>>>>>>>> environment which also hijack #execute method.
> > > > > > >> >> >>>>>>>>>>
> > > > > > >> >> >>>>>>>>>> I agree with your idea that user would
> configure all
> > > > > > things
> > > > > > >> >> >>>>>>>>>> and flink "just" respect it. On this topic I
> think
> > > the
> > > > > > >> >> >> unusual
> > > > > > >> >> >>>>>>>>>> control flow when CliFrontend handle "run"
> command
> > > is
> > > > > the
> > > > > > >> >> >>>> problem.
> > > > > > >> >> >>>>>>>>>> It handles several configs, mainly about cluster
> > > > > settings,
> > > > > > >> >> >> and
> > > > > > >> >> >>>>>>>>>> thus the main method of the user program is unaware of
> them.
> > > > > Also
> > > > > > it
> > > > > > >> >> >>>>>> compiles
> > > > > > >> >> >>>>>>>>>> the app to a job graph by running the main method with a
> > > > hijacked
> > > > > > exec
> > > > > > >> >> >>>> env,
> > > > > > >> >> >>>>>>>>>> which constrains the main method further.
> > > > > > >> >> >>>>>>>>>>
> > > > > > >> >> >>>>>>>>>> I'd like to write down a few notes on how
> > > configs/args
> > > > > are passed
> > > > > > >> >> >> and
> > > > > > >> >> >>>>>>> respected,
> > > > > > >> >> >>>>>>>>>> as well as on decoupling job compilation and
> > > submission.
> > > > > I will share them
> > > > > > >> >> >> on
> > > > > > >> >> >>>>> this
> > > > > > >> >> >>>>>>>>>> thread later.
> > > > > > >> >> >>>>>>>>>>
> > > > > > >> >> >>>>>>>>>> Best,
> > > > > > >> >> >>>>>>>>>> tison.
> > > > > > >> >> >>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>
> > > > > > >> >> >>>>>>>>>> SHI Xiaogang <sh...@gmail.com>
> 于2019年6月17日周一
> > > > > > >> >> >> 下午7:29写道:
> > > > > > >> >> >>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>> Hi Jeff and Flavio,
> > > > > > >> >> >>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>> Thanks Jeff a lot for proposing the design
> > > document.
> > > > > > >> >> >>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>> We are also working on refactoring
> ClusterClient to
> > > > > allow
> > > > > > >> >> >>>>> flexible
> > > > > > >> >> >>>>>>> and
> > > > > > >> >> >>>>>>>>>>> efficient job management in our real-time
> platform.
> > > > > > >> >> >>>>>>>>>>> We would like to draft a document to share our
> > > ideas
> > > > > with
> > > > > > >> >> >>> you.
> > > > > > >> >> >>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>> I think it's a good idea to have something like
> > > > Apache
> > > > > > Livy
> > > > > > >> >> >>> for
> > > > > > >> >> >>>>>>> Flink,
> > > > > > >> >> >>>>>>>>>>> and
> > > > > > >> >> >>>>>>>>>>> the efforts discussed here will take a great
> step
> > > > > toward
> > > > > > >> >> >>>> it.
> > > > > > >> >> >>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>> Regards,
> > > > > > >> >> >>>>>>>>>>> Xiaogang
> > > > > > >> >> >>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>> Flavio Pompermaier <po...@okkam.it>
> > > > > 于2019年6月17日周一
> > > > > > >> >> >>>>> 下午7:13写道:
> > > > > > >> >> >>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>> Is there any possibility to have something
> like
> > > > Apache
> > > > > > >> >> >> Livy
> > > > > > >> >> >>>> [1]
> > > > > > >> >> >>>>>>> also
> > > > > > >> >> >>>>>>>>>>> for
> > > > > > >> >> >>>>>>>>>>>> Flink in the future?
> > > > > > >> >> >>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>> [1] https://livy.apache.org/
> > > > > > >> >> >>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>> On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <
> > > > > > >> >> >>> zjffdu@gmail.com
> > > > > > >> >> >>>>>
> > > > > > >> >> >>>>>>> wrote:
> > > > > > >> >> >>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>> Any API we expose should not have
> dependencies
> > > > on
> > > > > > >> >> >>> the
> > > > > > >> >> >>>>>>> runtime
> > > > > > >> >> >>>>>>>>>>>>> (flink-runtime) package or other
> implementation
> > > > > > >> >> >> details.
> > > > > > >> >> >>> To
> > > > > > >> >> >>>>> me,
> > > > > > >> >> >>>>>>>> this
> > > > > > >> >> >>>>>>>>>>>> means
> > > > > > >> >> >>>>>>>>>>>>> that the current ClusterClient cannot be
> exposed
> > > to
> > > > > > >> >> >> users
> > > > > > >> >> >>>>>> because
> > > > > > >> >> >>>>>>>> it
> > > > > > >> >> >>>>>>>>>>>> uses
> > > > > > >> >> >>>>>>>>>>>>> quite some classes from the optimiser and
> runtime
> > > > > > >> >> >>> packages.
> > > > > > >> >> >>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>> We should change ClusterClient from a class to an
> > > > > interface.
> > > > > > >> >> >>>>>>>>>>>>> ExecutionEnvironment should only use the interface
> > > > > > >> >> >> ClusterClient
> > > > > > >> >> >>>>> which
> > > > > > >> >> >>>>>>>>>>> should be
> > > > > > >> >> >>>>>>>>>>>>> in flink-clients while the concrete
> > > implementation
> > > > > > >> >> >> class
> > > > > > >> >> >>>>> could
> > > > > > >> >> >>>>>> be
> > > > > > >> >> >>>>>>>> in
> > > > > > >> >> >>>>>>>>>>>>> flink-runtime.
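[Editorial sketch, to make the interface/implementation split above concrete. All names and signatures below are hypothetical and illustrative only; they are not Flink's actual API. The idea is that flink-clients (and ExecutionEnvironment) would depend only on a small interface, while a concrete client lives elsewhere:]

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch only -- not Flink's actual API.
// The interface that flink-clients / ExecutionEnvironment would code against.
interface ClusterClient {
    String submitJob(String jobName);      // returns a job id
    String queryJobStatus(String jobId);   // e.g. "RUNNING", "CANCELED"
    void cancelJob(String jobId);
}

// A trivial in-memory implementation standing in for a concrete
// (e.g. REST-based) client that would live in flink-runtime.
class InMemoryClusterClient implements ClusterClient {
    private final Map<String, String> statusByJobId = new HashMap<>();

    @Override
    public String submitJob(String jobName) {
        String jobId = UUID.randomUUID().toString();
        statusByJobId.put(jobId, "RUNNING");
        return jobId;
    }

    @Override
    public String queryJobStatus(String jobId) {
        return statusByJobId.getOrDefault(jobId, "UNKNOWN");
    }

    @Override
    public void cancelJob(String jobId) {
        statusByJobId.put(jobId, "CANCELED");
    }
}
```

With this split, downstream projects would only ever see the interface, and the concrete class could change freely without breaking them.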
> > > > > > >> >> >>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>> What happens when a failure/restart in the
> > > > client
> > > > > > >> >> >>>>> happens?
> > > > > > >> >> >>>>>>>> There
> > > > > > >> >> >>>>>>>>>>> need
> > > > > > >> >> >>>>>>>>>>>>> to be a way of re-establishing the
> connection to
> > > > the
> > > > > > >> >> >> job,
> > > > > > >> >> >>>> set
> > > > > > >> >> >>>>>> up
> > > > > > >> >> >>>>>>>> the
> > > > > > >> >> >>>>>>>>>>>>> listeners again, etc.
> > > > > > >> >> >>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>> Good point. First we need to define what a
> > > > > > >> >> >>>>> failure/restart
> > > > > > >> >> >>>>>> in
> > > > > > >> >> >>>>>>>> the
> > > > > > >> >> >>>>>>>>>>>>> client means. IIUC, that usually means a network
> > > > failure
> > > > > > >> >> >>> which
> > > > > > >> >> >>>>> will
> > > > > > >> >> >>>>>>>>>>> happen in
> > > > > > >> >> >>>>>>>>>>>>> class RestClient. If my understanding is
> correct,
> > > > > > >> >> >>>>> the restart/retry
> > > > > > >> >> >>>>>>>>>>> mechanism
> > > > > > >> >> >>>>>>>>>>>>> should be done in RestClient.
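[Editorial sketch of what such a retry mechanism could look like: a generic retry-with-backoff helper of the kind a REST-based client could use to survive transient network failures. All names are invented for illustration; this is not the actual RestClient code.]

```java
import java.util.concurrent.Callable;

// Illustrative sketch only -- not Flink code.
final class RetrySupport {
    // Runs the action, retrying on failure with a linearly growing pause,
    // and rethrows the last failure if all attempts are exhausted.
    static <T> T callWithRetry(Callable<T> action, int maxAttempts, long initialBackoffMillis)
            throws Exception {
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(initialBackoffMillis * attempt); // linear backoff
                }
            }
        }
        throw lastFailure;
    }
}
```

Keeping this inside the client layer means callers (listeners, job control code) would not have to re-establish anything themselves after a transient failure.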
> > > > > > >> >> >>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>> Aljoscha Krettek <al...@apache.org>
> > > > 于2019年6月11日周二
> > > > > > >> >> >>>>>> 下午11:10写道:
> > > > > > >> >> >>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>> Some points to consider:
> > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>> * Any API we expose should not have
> dependencies
> > > > on
> > > > > > >> >> >> the
> > > > > > >> >> >>>>>> runtime
> > > > > > >> >> >>>>>>>>>>>>>> (flink-runtime) package or other
> implementation
> > > > > > >> >> >>> details.
> > > > > > >> >> >>>> To
> > > > > > >> >> >>>>>> me,
> > > > > > >> >> >>>>>>>>>>> this
> > > > > > >> >> >>>>>>>>>>>>> means
> > > > > > >> >> >>>>>>>>>>>>>> that the current ClusterClient cannot be
> exposed
> > > > to
> > > > > > >> >> >>> users
> > > > > > >> >> >>>>>>> because
> > > > > > >> >> >>>>>>>>>>> it
> > > > > > >> >> >>>>>>>>>>>>> uses
> > > > > > >> >> >>>>>>>>>>>>>> quite some classes from the optimiser and
> > > runtime
> > > > > > >> >> >>>> packages.
> > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>> * What happens when a failure/restart in the
> > > > client
> > > > > > >> >> >>>>> happens?
> > > > > > >> >> >>>>>>>> There
> > > > > > >> >> >>>>>>>>>>> need
> > > > > > >> >> >>>>>>>>>>>>> to
> > > > > > >> >> >>>>>>>>>>>>>> be a way of re-establishing the connection
> to
> > > the
> > > > > > >> >> >> job,
> > > > > > >> >> >>>> set
> > > > > > >> >> >>>>> up
> > > > > > >> >> >>>>>>> the
> > > > > > >> >> >>>>>>>>>>>>> listeners
> > > > > > >> >> >>>>>>>>>>>>>> again, etc.
> > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>> Aljoscha
> > > > > > >> >> >>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>> On 29. May 2019, at 10:17, Jeff Zhang <
> > > > > > >> >> >>>> zjffdu@gmail.com>
> > > > > > >> >> >>>>>>>> wrote:
> > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>> Sorry folks, the design doc is late as you
> > > > > > >> >> >> expected.
> > > > > > >> >> >>>>> Here's
> > > > > > >> >> >>>>>>> the
> > > > > > >> >> >>>>>>>>>>>> design
> > > > > > >> >> >>>>>>>>>>>>>> doc
> > > > > > >> >> >>>>>>>>>>>>>>> I drafted, welcome any comments and
> feedback.
> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org>
> 于2019年2月14日周四
> > > > > > >> >> >>>> 下午8:43写道:
> > > > > > >> >> >>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>> Nice that this discussion is happening.
> > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>> In the FLIP, we could also revisit the
> entire
> > > > role
> > > > > > >> >> >>> of
> > > > > > >> >> >>>>> the
> > > > > > >> >> >>>>>>>>>>>> environments
> > > > > > >> >> >>>>>>>>>>>>>>>> again.
> > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>> Initially, the idea was:
> > > > > > >> >> >>>>>>>>>>>>>>>> - the environments take care of the
> specific
> > > > > > >> >> >> setup
> > > > > > >> >> >>>> for
> > > > > > >> >> >>>>>>>>>>> standalone
> > > > > > >> >> >>>>>>>>>>>> (no
> > > > > > >> >> >>>>>>>>>>>>>>>> setup needed), yarn, mesos, etc.
> > > > > > >> >> >>>>>>>>>>>>>>>> - the session ones have control over the
> > > > session.
> > > > > > >> >> >>> The
> > > > > > >> >> >>>>>>>>>>> environment
> > > > > > >> >> >>>>>>>>>>>>> holds
> > > > > > >> >> >>>>>>>>>>>>>>>> the session client.
> > > > > > >> >> >>>>>>>>>>>>>>>> - running a job gives a "control" object
> for
> > > > that
> > > > > > >> >> >>>> job.
> > > > > > >> >> >>>>>> That
> > > > > > >> >> >>>>>>>>>>>> behavior
> > > > > > >> >> >>>>>>>>>>>>> is
> > > > > > >> >> >>>>>>>>>>>>>>>> the same in all environments.
> > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>> The actual implementation diverged quite
> a bit
> > > > > > >> >> >> from
> > > > > > >> >> >>>>> that.
> > > > > > >> >> >>>>>>>> Happy
> > > > > > >> >> >>>>>>>>>>> to
> > > > > > >> >> >>>>>>>>>>>>> see a
> > > > > > >> >> >>>>>>>>>>>>>>>> discussion about straightening this out a
> bit
> > > > more.
> > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>> On Tue, Feb 12, 2019 at 4:58 AM Jeff
> Zhang <
> > > > > > >> >> >>>>>>> zjffdu@gmail.com>
> > > > > > >> >> >>>>>>>>>>>> wrote:
> > > > > > >> >> >>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>> Hi folks,
> > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>> Sorry for the late response. It seems we
> reached
> > > > > > >> >> >>> consensus
> > > > > > >> >> >>>> on
> > > > > > >> >> >>>>>>>> this, so I
> > > > > > >> >> >>>>>>>>>>>> will
> > > > > > >> >> >>>>>>>>>>>>>>>> create a
> > > > > > >> >> >>>>>>>>>>>>>>>>> FLIP for this with a more detailed design.
> > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>> Thomas Weise <th...@apache.org>
> 于2018年12月21日周五
> > > > > > >> >> >>>>> 上午11:43写道:
> > > > > > >> >> >>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>> Great to see this discussion seeded! The
> > > > > > >> >> >> problems
> > > > > > >> >> >>>> you
> > > > > > >> >> >>>>>> face
> > > > > > >> >> >>>>>>>>>>> with
> > > > > > >> >> >>>>>>>>>>>> the
> > > > > > >> >> >>>>>>>>>>>>>>>>>> Zeppelin integration are also affecting
> > > other
> > > > > > >> >> >>>>> downstream
> > > > > > >> >> >>>>>>>>>>> projects,
> > > > > > >> >> >>>>>>>>>>>>>> like
> > > > > > >> >> >>>>>>>>>>>>>>>>>> Beam.
> > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>> We just enabled the savepoint restore
> option
> > > > in
> > > > > > >> >> >>>>>>>>>>>>>> RemoteStreamEnvironment
> > > > > > >> >> >>>>>>>>>>>>>>>>> [1]
> > > > > > >> >> >>>>>>>>>>>>>>>>>> and that was more difficult than it
> should
> > > be.
> > > > > > >> >> >> The
> > > > > > >> >> >>>>> main
> > > > > > >> >> >>>>>>>> issue
> > > > > > >> >> >>>>>>>>>>> is
> > > > > > >> >> >>>>>>>>>>>>> that
> > > > > > >> >> >>>>>>>>>>>>>>>>>> environment and cluster client aren't
> > > > decoupled.
> > > > > > >> >> >>>>> Ideally
> > > > > > >> >> >>>>>>> it
> > > > > > >> >> >>>>>>>>>>> should
> > > > > > >> >> >>>>>>>>>>>>> be
> > > > > > >> >> >>>>>>>>>>>>>>>>>> possible to just get the matching
> cluster
> > > > client
> > > > > > >> >> >>>> from
> > > > > > >> >> >>>>>> the
> > > > > > >> >> >>>>>>>>>>>>> environment
> > > > > > >> >> >>>>>>>>>>>>>>>> and
> > > > > > >> >> >>>>>>>>>>>>>>>>>> then control the job through it
> (environment
> > > > as
> > > > > > >> >> >>>>> factory
> > > > > > >> >> >>>>>>> for
> > > > > > >> >> >>>>>>>>>>>> cluster
> > > > > > >> >> >>>>>>>>>>>>>>>>>> client). But note that the environment
> > > classes
> > > > > > >> >> >> are
> > > > > > >> >> >>>>> part
> > > > > > >> >> >>>>>> of
> > > > > > >> >> >>>>>>>> the
> > > > > > >> >> >>>>>>>>>>>>> public
> > > > > > >> >> >>>>>>>>>>>>>>>>> API,
> > > > > > >> >> >>>>>>>>>>>>>>>>>> and it is not straightforward to make
> larger
> > > > > > >> >> >>> changes
> > > > > > >> >> >>>>>>> without
> > > > > > >> >> >>>>>>>>>>>>> breaking
> > > > > > >> >> >>>>>>>>>>>>>>>>>> backward compatibility.
> > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>> ClusterClient currently exposes internal
> > > > classes
> > > > > > >> >> >>>> like
> > > > > > >> >> >>>>>>>>>>> JobGraph and
> > > > > > >> >> >>>>>>>>>>>>>>>>>> StreamGraph. But it should be possible
> to
> > > wrap
> > > > > > >> >> >>> this
> > > > > > >> >> >>>>>> with a
> > > > > > >> >> >>>>>>>> new
> > > > > > >> >> >>>>>>>>>>>>> public
> > > > > > >> >> >>>>>>>>>>>>>>>> API
> > > > > > >> >> >>>>>>>>>>>>>>>>>> that brings the required job control
> > > > > > >> >> >> capabilities
> > > > > > >> >> >>>> for
> > > > > > >> >> >>>>>>>>>>> downstream
> > > > > > >> >> >>>>>>>>>>>>>>>>> projects.
> > > > > > >> >> >>>>>>>>>>>>>>>>>> Perhaps it is helpful to look at some
> of the
> > > > > > >> >> >>>>> interfaces
> > > > > > >> >> >>>>>> in
> > > > > > >> >> >>>>>>>>>>> Beam
> > > > > > >> >> >>>>>>>>>>>>> while
> > > > > > >> >> >>>>>>>>>>>>>>>>>> thinking about this: [2] for the
> portable
> > > job
> > > > > > >> >> >> API
> > > > > > >> >> >>>> and
> > > > > > >> >> >>>>>> [3]
> > > > > > >> >> >>>>>>>> for
> > > > > > >> >> >>>>>>>>>>> the
> > > > > > >> >> >>>>>>>>>>>>> old
> > > > > > >> >> >>>>>>>>>>>>>>>>>> asynchronous job control from the Beam
> Java
> > > > SDK.
> > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>> The backward compatibility discussion
> [4] is
> > > > > > >> >> >> also
> > > > > > >> >> >>>>>> relevant
> > > > > > >> >> >>>>>>>>>>> here. A
> > > > > > >> >> >>>>>>>>>>>>> new
> > > > > > >> >> >>>>>>>>>>>>>>>>> API
> > > > > > >> >> >>>>>>>>>>>>>>>>>> should shield downstream projects from
> > > > internals
> > > > > > >> >> >>> and
> > > > > > >> >> >>>>>> allow
> > > > > > >> >> >>>>>>>>>>> them to
> > > > > > >> >> >>>>>>>>>>>>>>>>>> interoperate with multiple future Flink
> > > > versions
> > > > > > >> >> >>> in
> > > > > > >> >> >>>>> the
> > > > > > >> >> >>>>>>> same
> > > > > > >> >> >>>>>>>>>>>> release
> > > > > > >> >> >>>>>>>>>>>>>>>> line
> > > > > > >> >> >>>>>>>>>>>>>>>>>> without forced upgrades.
> > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>> Thanks,
> > > > > > >> >> >>>>>>>>>>>>>>>>>> Thomas
> > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>> [1]
> > > https://github.com/apache/flink/pull/7249
> > > > > > >> >> >>>>>>>>>>>>>>>>>> [2]
> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > > > > > >> >> >>>>>>>>>>>>>>>>>> [3]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > > > > > >> >> >>>>>>>>>>>>>>>>>> [4]
> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff
> Zhang <
> > > > > > >> >> >>>>>>>> zjffdu@gmail.com>
> > > > > > >> >> >>>>>>>>>>>>> wrote:
> > > > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> I'm not so sure whether the user
> should
> > > be
> > > > > > >> >> >>> able
> > > > > > >> >> >>>> to
> > > > > > >> >> >>>>>>>> define
> > > > > > >> >> >>>>>>>>>>>> where
> > > > > > >> >> >>>>>>>>>>>>>>>> the
> > > > > > >> >> >>>>>>>>>>>>>>>>>> job
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> runs (in your example Yarn). This is
> > > actually
> > > > > > >> >> >>>>>> independent
> > > > > > >> >> >>>>>>>> of
> > > > > > >> >> >>>>>>>>>>> the
> > > > > > >> >> >>>>>>>>>>>>> job
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> development and is something which is
> > > decided
> > > > > > >> >> >> at
> > > > > > >> >> >>>>>>> deployment
> > > > > > >> >> >>>>>>>>>>> time.
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> Users don't need to specify the execution
> mode
> > > > > > >> >> >>>>>>> programmatically.
> > > > > > >> >> >>>>>>>>>>> They
> > > > > > >> >> >>>>>>>>>>>>> can
> > > > > > >> >> >>>>>>>>>>>>>>>>> also
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> pass the execution mode from the
> arguments
> > > in
> > > > > > >> >> >>> flink
> > > > > > >> >> >>>>> run
> > > > > > >> >> >>>>>>>>>>> command.
> > > > > > >> >> >>>>>>>>>>>>> e.g.
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m yarn-cluster ....
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m local ...
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m host:port ...
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> Does this make sense to you?
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> To me it makes sense that the
> > > > > > >> >> >>>> ExecutionEnvironment
> > > > > > >> >> >>>>>> is
> > > > > > >> >> >>>>>>>> not
> > > > > > >> >> >>>>>>>>>>>>>>>> directly
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> initialized by the user and instead
> context
> > > > > > >> >> >>>> sensitive
> > > > > > >> >> >>>>>> how
> > > > > > >> >> >>>>>>>> you
> > > > > > >> >> >>>>>>>>>>>> want
> > > > > > >> >> >>>>>>>>>>>>> to
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> execute your job (Flink CLI vs. IDE,
> for
> > > > > > >> >> >>> example).
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> Right, currently I notice that Flink would
> > > create
> > > > > > >> >> >>>>> different
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> ContextExecutionEnvironment based on
> > > > different
> > > > > > >> >> >>>>>> submission
> > > > > > >> >> >>>>>>>>>>>> scenarios
> > > > > > >> >> >>>>>>>>>>>>>>>>>> (Flink
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> Cli vs IDE). To me this is kind of a hacky
> > > > > > >> >> >> approach,
> > > > > > >> >> >>>> not
> > > > > > >> >> >>>>>> so
> > > > > > >> >> >>>>>>>>>>>>>>>>> straightforward.
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> What I suggested above is that
> > > flink
> > > > > > >> >> >>> should
> > > > > > >> >> >>>>>>> always
> > > > > > >> >> >>>>>>>>>>> create
> > > > > > >> >> >>>>>>>>>>>>> the
> > > > > > >> >> >>>>>>>>>>>>>>>>>> same
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> ExecutionEnvironment but with different
> > > > > > >> >> >>>>> configuration,
> > > > > > >> >> >>>>>>> and
> > > > > > >> >> >>>>>>>>>>> based
> > > > > > >> >> >>>>>>>>>>>> on
> > > > > > >> >> >>>>>>>>>>>>>>>> the
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> configuration it would create the
> proper
> > > > > > >> >> >>>>> ClusterClient
> > > > > > >> >> >>>>>>> for
> > > > > > >> >> >>>>>>>>>>>>> different
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> behaviors.
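[Editorial sketch of the idea above: a single environment type whose behavior is driven by configuration, which then selects the proper client kind, instead of per-mode environment subclasses that hijack #execute. Every name here is hypothetical and for illustration only, not Flink's actual API.]

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only -- none of these names are Flink's actual API.
class FlinkConf {
    private final Map<String, String> entries = new HashMap<>();
    FlinkConf set(String key, String value) { entries.put(key, value); return this; }
    String get(String key, String fallback) { return entries.getOrDefault(key, fallback); }
}

class Environment {
    private final FlinkConf conf;
    Environment(FlinkConf conf) { this.conf = conf; }

    // The environment derives the client kind from configuration; callers
    // never have to instantiate a mode-specific environment subclass.
    String clientKind() {
        String mode = conf.get("execution.mode", "local");
        switch (mode) {
            case "yarn":   return "YarnClusterClient";
            case "remote": return "RestClusterClient";
            default:       return "LocalClusterClient";
        }
    }
}
```

The same program could then run unchanged whether it is launched from the CLI, an IDE, or a downstream project; only the configuration handed to the environment differs.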
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> Till Rohrmann <tr...@apache.org>
> > > > > > >> >> >>>> 于2018年12月20日周四
> > > > > > >> >> >>>>>>>>>>> 下午11:18写道:
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>> You are probably right that we have
> code
> > > > > > >> >> >>>> duplication
> > > > > > >> >> >>>>>>> when
> > > > > > >> >> >>>>>>>> it
> > > > > > >> >> >>>>>>>>>>>> comes
> > > > > > >> >> >>>>>>>>>>>>>>>> to
> > > > > > >> >> >>>>>>>>>>>>>>>>>> the
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>> creation of the ClusterClient. This
> should
> > > > be
> > > > > > >> >> >>>>> reduced
> > > > > > >> >> >>>>>> in
> > > > > > >> >> >>>>>>>> the
> > > > > > >> >> >>>>>>>>>>>>>>>> future.
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>> I'm not so sure whether the user
> should be
> > > > > > >> >> >> able
> > > > > > >> >> >>> to
> > > > > > >> >> >>>>>>> define
> > > > > > >> >> >>>>>>>>>>> where
> > > > > > >> >> >>>>>>>>>>>>> the
> > > > > > >> >> >>>>>>>>>>>>>>>>> job
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>> runs (in your example Yarn). This is
> > > > actually
> > > > > > >> >> >>>>>>> independent
> > > > > > >> >> >>>>>>>>>>> of the
> > > > > > >> >> >>>>>>>>>>>>>>>> job
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>> development and is something which is
> > > > decided
> > > > > > >> >> >> at
> > > > > > >> >> >>>>>>>> deployment
> > > > > > >> >> >>>>>>>>>>>> time.
> > > > > > >> >> >>>>>>>>>>>>>>>> To
> > > > > > >> >> >>>>>>>>>>>>>>>>> me
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> it
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>> makes sense that the
> ExecutionEnvironment
> > > is
> > > > > > >> >> >> not
> > > > > > >> >> >>>>>>> directly
> > > > > > >> >> >>>>>>>>>>>>>>>> initialized
> > > > > > >> >> >>>>>>>>>>>>>>>>>> by
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>> the user and instead context
> sensitive how
> > > > you
> > > > > > >> >> >>>> want
> > > > > > >> >> >>>>> to
> > > > > > >> >> >>>>>>>>>>> execute
> > > > > > >> >> >>>>>>>>>>>>> your
> > > > > > >> >> >>>>>>>>>>>>>>>>> job
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>> (Flink CLI vs. IDE, for example).
> > > However, I
> > > > > > >> >> >>> agree
> > > > > > >> >> >>>>>> that
> > > > > > >> >> >>>>>>>> the
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>> ExecutionEnvironment should give you
> > > access
> > > > to
> > > > > > >> >> >>> the
> > > > > > >> >> >>>>>>>>>>> ClusterClient
> > > > > > >> >> >>>>>>>>>>>>>>>> and
> > > > > > >> >> >>>>>>>>>>>>>>>>> to
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> the
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>> job (maybe in the form of the
> JobGraph or
> > > a
> > > > > > >> >> >> job
> > > > > > >> >> >>>>> plan).
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>> Cheers,
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>> Till
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff
> > > Zhang <
> > > > > > >> >> >>>>>>>>>>> zjffdu@gmail.com>
> > > > > > >> >> >>>>>>>>>>>>>>>> wrote:
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> Hi Till,
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> Thanks for the feedback. You are
> right
> > > > that I
> > > > > > >> >> >>>>> expect
> > > > > > >> >> >>>>>>>> better
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> programmatic
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> job submission/control api which
> could be
> > > > > > >> >> >> used
> > > > > > >> >> >>> by
> > > > > > >> >> >>>>>>>>>>> downstream
> > > > > > >> >> >>>>>>>>>>>>>>>>> project.
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> And
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> it would benefit for the flink
> ecosystem.
> > > > > > >> >> >> When
> > > > > > >> >> >>> I
> > > > > > >> >> >>>>> look
> > > > > > >> >> >>>>>>> at
> > > > > > >> >> >>>>>>>>>>> the
> > > > > > >> >> >>>>>>>>>>>> code
> > > > > > >> >> >>>>>>>>>>>>>>>>> of
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>> flink
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> scala-shell and sql-client (I believe
> > > they
> > > > > > >> >> >> are
> > > > > > >> >> >>>> not
> > > > > > >> >> >>>>>> the
> > > > > > >> >> >>>>>>>>>>> core of
> > > > > > >> >> >>>>>>>>>>>>>>>>> flink,
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> but
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> belong to the ecosystem of flink), I
> find
> > > > > > >> >> >> many
> > > > > > >> >> >>>>>>> duplicated
> > > > > > >> >> >>>>>>>>>>> code
> > > > > > >> >> >>>>>>>>>>>>>>>> for
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>> creating
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> ClusterClient from user provided
> > > > > > >> >> >> configuration
> > > > > > >> >> >>>>>>>>>>> (configuration
> > > > > > >> >> >>>>>>>>>>>>>>>>> format
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> may
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>> be
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> different from scala-shell and
> > > sql-client)
> > > > > > >> >> >> and
> > > > > > >> >> >>>> then
> > > > > > >> >> >>>>>> use
> > > > > > >> >> >>>>>>>>>>> that
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>> ClusterClient
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> to manipulate jobs. I don't think
> this is
> > > > > > >> >> >>>>> convenient
> > > > > > >> >> >>>>>>> for
> > > > > > >> >> >>>>>>>>>>>>>>>> downstream
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> projects. What I expect is that
> > > downstream
> > > > > > >> >> >>>> project
> > > > > > >> >> >>>>>> only
> > > > > > >> >> >>>>>>>>>>> needs
> > > > > > >> >> >>>>>>>>>>>> to
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> provide
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> necessary configuration info (maybe
> > > > > > >> >> >> introducing
> > > > > > >> >> >>>>> class
> > > > > > >> >> >>>>>>>>>>>> FlinkConf),
> > > > > > >> >> >>>>>>>>>>>>>>>>> and
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>> then
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> build ExecutionEnvironment based on
> this
> > > > > > >> >> >>>> FlinkConf,
> > > > > > >> >> >>>>>> and
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment will create the
> > > proper
> > > > > > >> >> >>>>>>>> ClusterClient.
> > > > > > >> >> >>>>>>>>>>> It
> > > > > > >> >> >>>>>>>>>>>> not
> > > > > > >> >> >>>>>>>>>>>>>>>>>> only
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> benefits the downstream project
> > > > > > >> >> >> development
> > > > > > >> >> >>>> but
> > > > > > >> >> >>>>>> also
> > > > > > >> >> >>>>>>>> is
> > > > > > >> >> >>>>>>>>>>>>>>>> helpful
> > > > > > >> >> >>>>>>>>>>>>>>>>>> for
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> their integration test with flink.
> Here's
> > > > one
> > > > > > >> >> >>>>> sample
> > > > > > >> >> >>>>>>> code
> > > > > > >> >> >>>>>>>>>>>> snippet
> > > > > > >> >> >>>>>>>>>>>>>>>>>> that
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> I
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> expect.
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> val conf = new
> FlinkConf().mode("yarn")
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> val env = new
> ExecutionEnvironment(conf)
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> val jobId = env.submit(...)
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> val jobStatus =
> > > > > > >> >> >>>>>>>>>>> env.getClusterClient().queryJobStatus(jobId)
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> env.getClusterClient().cancelJob(jobId)
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> What do you think ?
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> Till Rohrmann <tr...@apache.org>
> > > > > > >> >> >>>>> 于2018年12月11日周二
> > > > > > >> >> >>>>>>>>>>> 下午6:28写道:
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> Hi Jeff,
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> what you are proposing is to
> provide the
> > > > > > >> >> >> user
> > > > > > >> >> >>>> with
> > > > > > >> >> >>>>>>>> better
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> programmatic
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> job
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> control. There was actually an
> effort to
> > > > > > >> >> >>> achieve
> > > > > > >> >> >>>>>> this
> > > > > > >> >> >>>>>>>> but
> > > > > > >> >> >>>>>>>>>>> it
> > > > > > >> >> >>>>>>>>>>>>>>>> has
> > > > > > >> >> >>>>>>>>>>>>>>>>>>> never
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>> been
> > > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> completed [1]. However, there are
> some
> improvement in the code base now. Look for example at the
> NewClusterClient interface which offers a non-blocking job submission.
> But I agree that we need to improve Flink in this regard.
>
> I would not be in favour of exposing all ClusterClient calls via the
> ExecutionEnvironment because it would clutter the class and would not
> be a good separation of concerns. Instead, one idea could be to
> retrieve the current ClusterClient from the ExecutionEnvironment,
> which can then be used for cluster and job control. But before we
> start an effort here, we need to agree on and capture what
> functionality we want to provide.
>
> Initially, the idea was that we have the ClusterDescriptor describing
> how to talk to a cluster manager like Yarn or Mesos. The
> ClusterDescriptor can be used for deploying Flink clusters (job and
> session) and gives you a ClusterClient. The ClusterClient controls the
> cluster (e.g. submitting jobs, listing all running jobs). And then
> there was the idea to introduce a JobClient which you obtain from the
> ClusterClient to trigger job-specific operations (e.g. taking a
> savepoint, cancelling the job).
>
> [1] https://issues.apache.org/jira/browse/FLINK-4272
>
> Cheers,
> Till
>
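The layering described above (a ClusterDescriptor deploys clusters, a ClusterClient controls a cluster, a JobClient controls a single job) can be sketched in plain Java. All type names, method signatures, and the in-memory implementation below are illustrative stand-ins, not Flink's actual API:

```java
import java.util.UUID;

// Sketch of the layering discussed above: a descriptor deploys clusters,
// a cluster client controls the cluster, and a job client controls one job.
// All types here are illustrative stand-ins, not Flink's real classes.
interface ClusterDescriptor {
    ClusterClient deploySessionCluster();
}

interface ClusterClient {
    JobClient submitJob(String jobName);   // returns a handle for job control
}

interface JobClient {
    String getJobId();
    void cancel();
    String triggerSavepoint(String targetDirectory);
}

public class ClusterLayeringSketch {
    // Minimal in-memory implementation showing how the pieces compose.
    static ClusterDescriptor localDescriptor() {
        return () -> jobName -> new JobClient() {
            final String jobId = UUID.randomUUID().toString();
            public String getJobId() { return jobId; }
            public void cancel() { /* would call the cluster's REST API */ }
            public String triggerSavepoint(String dir) { return dir + "/" + jobId; }
        };
    }

    public static void main(String[] args) {
        ClusterClient cluster = localDescriptor().deploySessionCluster();
        JobClient job = cluster.submitJob("wordcount");
        System.out.println("submitted " + job.getJobId());
        System.out.println(job.triggerSavepoint("s3://savepoints"));
    }
}
```

The point of the sketch is the ownership chain: job control lives on the JobClient, cluster control on the ClusterClient, and deployment on the descriptor.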
> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zjffdu@gmail.com> wrote:
>
> > Hi Folks,
> >
> > I am trying to integrate flink into apache zeppelin, which is an
> > interactive notebook, and I hit several issues that are caused by the
> > flink client api. So I'd like to propose the following changes for
> > the flink client api.
> >
> > 1. Support nonblocking execution. Currently,
> > ExecutionEnvironment#execute is a blocking method which does 2
> > things: first submit the job and then wait for the job until it is
> > finished. I'd like to introduce a nonblocking execution method like
> > ExecutionEnvironment#submit which only submits the job and then
> > returns the jobId to the client, and allow the user to query the job
> > status via the jobId.
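The proposed split between a blocking execute() and a non-blocking submit() can be sketched with plain-Java futures. The in-memory "submission" below is a stand-in for a real cluster round trip, and JobResult is a simplified placeholder, not Flink's class:

```java
import java.util.concurrent.CompletableFuture;

// Sketch of the blocking execute() vs. non-blocking submit() split proposed
// above. The in-memory "submission" stands in for a real cluster round trip.
public class NonBlockingSubmitSketch {

    static class JobResult {
        final String jobId;
        JobResult(String jobId) { this.jobId = jobId; }
    }

    // submit(): returns immediately; the future completes once the job
    // has been accepted.
    static CompletableFuture<JobResult> submit(String jobName) {
        return CompletableFuture.supplyAsync(() -> {
            // a real client would ship the job graph to the cluster here
            return new JobResult(jobName + "-0001");
        });
    }

    // execute(): today's blocking behaviour, expressed on top of submit()
    static JobResult execute(String jobName) {
        return submit(jobName).join();
    }

    public static void main(String[] args) {
        CompletableFuture<JobResult> pending = submit("streaming-job");
        // the caller is free to do other work here and query status later
        System.out.println("job id: " + pending.join().jobId);
    }
}
```

With this shape the blocking method is just a convenience over the non-blocking one, which is roughly the direction the thread later converges on.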
> > 2. Add a cancel api in
> > ExecutionEnvironment/StreamExecutionEnvironment. Currently the only
> > way to cancel a job is via the cli (bin/flink), which is not
> > convenient for downstream projects to use. So I'd like to add a
> > cancel api in ExecutionEnvironment.
> >
> > 3. Add a savepoint api in
> > ExecutionEnvironment/StreamExecutionEnvironment. It is similar to
> > the cancel api; we should use ExecutionEnvironment as the unified
> > api for third parties to integrate with flink.
> >
> > 4. Add a listener for the job execution lifecycle, something like
> > the following, so that downstream projects can run custom logic in
> > the lifecycle of a job. E.g. Zeppelin would capture the jobId after
> > the job is submitted and then use this jobId to cancel it later when
> > necessary.
> >
> > public interface JobListener {
> >
> >   void onJobSubmitted(JobID jobId);
> >
> >   void onJobExecuted(JobExecutionResult jobResult);
> >
> >   void onJobCanceled(JobID jobId);
> > }
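A rough sketch of how an environment might drive such a listener. JobID, JobExecutionResult, and the environment below are simplified stand-ins for Flink's types, not the real ones:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an environment notifying registered JobListeners at each
// lifecycle phase, as proposed above. All types are simplified stand-ins.
public class JobListenerSketch {

    static class JobID { final String id; JobID(String id) { this.id = id; } }
    static class JobExecutionResult {
        final JobID jobId;
        JobExecutionResult(JobID jobId) { this.jobId = jobId; }
    }

    interface JobListener {
        void onJobSubmitted(JobID jobId);
        void onJobExecuted(JobExecutionResult jobResult);
        void onJobCanceled(JobID jobId);
    }

    // The environment keeps registered listeners and notifies them in order.
    static class Env {
        private final List<JobListener> listeners = new ArrayList<>();
        void registerJobListener(JobListener l) { listeners.add(l); }

        JobExecutionResult execute(String name) {
            JobID id = new JobID(name + "-job");
            listeners.forEach(l -> l.onJobSubmitted(id));      // after submission
            JobExecutionResult result = new JobExecutionResult(id); // job runs...
            listeners.forEach(l -> l.onJobExecuted(result));   // after completion
            return result;
        }
    }

    public static void main(String[] args) {
        Env env = new Env();
        List<String> events = new ArrayList<>();
        env.registerJobListener(new JobListener() {
            public void onJobSubmitted(JobID id) { events.add("submitted:" + id.id); }
            public void onJobExecuted(JobExecutionResult r) { events.add("executed:" + r.jobId.id); }
            public void onJobCanceled(JobID id) { events.add("canceled:" + id.id); }
        });
        env.execute("wordcount");
        System.out.println(events);  // [submitted:wordcount-job, executed:wordcount-job]
    }
}
```

This is exactly the Zeppelin use case from the proposal: capture the jobId in onJobSubmitted and use it later for cancellation.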
> > 5. Enable sessions in ExecutionEnvironment. Currently this is
> > disabled, but sessions are very convenient for third parties that
> > submit jobs continually. I hope flink can enable it again.
> >
> > 6. Unify all flink client api into
> > ExecutionEnvironment/StreamExecutionEnvironment.
> >
> > This is a long term issue which needs more careful thinking and
> > design. Currently some features of flink are exposed in
> > ExecutionEnvironment/StreamExecutionEnvironment, but some are
> > exposed in the cli instead of the api, like the cancel and savepoint
> > I mentioned above. I think the root cause is that flink didn't unify
> > the interaction with flink. Here I list 3 scenarios of flink
> > operation:
> >
> >  - Local job execution. Flink will create a LocalEnvironment and
> >    then use this LocalEnvironment to create a LocalExecutor for job
> >    execution.
> >  - Remote job execution. Flink will create a ClusterClient first,
> >    then create a ContextEnvironment based on the ClusterClient, and
> >    then run the job.
> >  - Job cancelation. Flink will create a ClusterClient first and then
> >    cancel the job via this ClusterClient.
> >
> > As you can see in the above 3 scenarios, Flink didn't use the same
> > approach (code path) to interact with flink. What I propose is the
> > following:
> > Create the proper LocalEnvironment/RemoteEnvironment (based on user
> > configuration) --> use this Environment to create the proper
> > ClusterClient (LocalClusterClient or RestClusterClient) to interact
> > with Flink (job execution or cancelation).
> >
> > This way we can unify the process of local execution and remote
> > execution. And it is much easier for third parties to integrate with
> > flink, because ExecutionEnvironment is the unified entry point for
> > flink. What a third party needs to do is just pass configuration to
> > the ExecutionEnvironment, and the ExecutionEnvironment will do the
> > right thing based on the configuration. The flink cli can also be
> > considered a flink api consumer: it just passes the configuration to
> > the ExecutionEnvironment and lets the ExecutionEnvironment create
> > the proper ClusterClient instead of creating the ClusterClient
> > directly.
> >
> > 6 would involve large code refactoring, so I think we can defer it
> > to a future release; 1, 2, 3, 4, 5 could be done at once, I believe.
> > Let me know your comments and feedback, thanks.
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
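The unified configuration-driven path proposed above (configuration in, the right client out) can be sketched as a small factory. LocalClusterClient and RestClusterClient here are simplified stand-ins, and the config keys are illustrative, not necessarily Flink's actual option names:

```java
import java.util.Map;

// Sketch of the proposed unified path: the environment inspects its
// configuration and creates the matching client. Class names and config
// keys are illustrative stand-ins, not Flink's actual implementations.
public class UnifiedEnvSketch {

    interface ClusterClient { String describe(); }

    static class LocalClusterClient implements ClusterClient {
        public String describe() { return "local"; }
    }

    static class RestClusterClient implements ClusterClient {
        private final String address;
        RestClusterClient(String address) { this.address = address; }
        public String describe() { return "rest@" + address; }
    }

    // The environment is the single entry point; the CLI, Zeppelin, or any
    // downstream project would only pass configuration and never build a
    // ClusterClient itself.
    static ClusterClient clientFor(Map<String, String> config) {
        String target = config.getOrDefault("execution.target", "local");
        switch (target) {
            case "remote":
                return new RestClusterClient(config.get("jobmanager.address"));
            case "local":
            default:
                return new LocalClusterClient();
        }
    }

    public static void main(String[] args) {
        System.out.println(clientFor(Map.of()).describe());
        System.out.println(clientFor(Map.of("execution.target", "remote",
                                            "jobmanager.address", "host:8081")).describe());
    }
}
```

The CLI then becomes just another caller of this factory, which is the unification Jeff argues for.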

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Kostas Kloudas <kk...@gmail.com>.
Hi all,

On the topic of web submission, I agree with Till that it only seems
to complicate things. It is bad for security and job isolation
(anybody can submit/cancel jobs), and its implementation complicates
some parts of the code. So, if we were to redesign the WebUI, maybe
this part could be left out. In addition, I would say that the
ability to cancel jobs could also be left out.

I would also be in favour of removing the "detached" mode, for the
reasons mentioned above (i.e. because now we will have a future
representing the result, on which the user can choose to wait or not).

Now, on separating job submission and cluster creation, I am in
favour of keeping both. Once again, the reasons are mentioned above
by Stephan, Till, and Aljoscha, and Zili also seems to agree. They
mainly have to do with security, isolation and ease of
resource management
for the user as he knows that "when my job is done, everything will be
cleared up". This is
also the experience you get when launching a process on your local OS.

On excluding the per-job mode from returning a JobClient or not, I
believe that eventually it would be nice to allow users to get back a
JobClient. The reason is
that 1) I cannot
find any objective reason why the user-experience should diverge, and
2) this will be the
way that the user will be able to interact with his running job.
Assuming that the necessary
ports are open for the REST API to work, then I think that the
JobClient can run against the
REST API without problems. If the needed ports are not open, then we
are safe to not return
a JobClient, as the user explicitly chose to close all points of
communication to his running job.

On the topic of not hijacking "env.execute()" in order to get the
Plan, I definitely agree; but for the proposal of having a "compile()"
method in the env, I would like to have a better look at the existing
code.

Cheers,
Kostas

On Fri, Aug 23, 2019 at 5:52 AM Zili Chen <wa...@gmail.com> wrote:
>
> Hi Yang,
>
> It would be helpful if you check Stephan's last comment,
> which states that isolation is important.
>
> For per-job mode, we run a dedicated cluster (maybe it
> should have been a couple of JMs and TMs during the
> FLIP-6 design) for a specific job. Thus the process is
> isolated from other jobs.
>
> In our case there was a time we suffered from multiple
> jobs submitted by different users that affected each
> other so that all ran into an error state. Also, running
> the client inside the cluster could save client
> resources at some points.
>
> However, we also face several issues, as you mentioned:
> in per-job mode it always uses the parent classloader,
> and thus classloading issues occur.
>
> BTW, one can make an analogy between the session/per-job
> modes in Flink and the client/cluster modes in Spark.
>
> Best,
> tison.
>
>
> Yang Wang <da...@gmail.com> wrote on Thu, Aug 22, 2019 at 11:25 AM:
>
> > From the user's perspective, it is really confusing what the scope of
> > a per-job cluster is.
> >
> > If it means a flink cluster with a single job, so that we get better
> > isolation, then it does not matter how we deploy the cluster:
> > directly deploy (mode 1), or start a flink cluster and then submit
> > the job through the cluster client (mode 2).
> >
> > Otherwise, if it just means direct deployment, how should we name
> > mode 2: session with job, or something else?
> >
> > We could also benefit from mode 2. Users get the same isolation as
> > with mode 1: the user code and dependencies are loaded by the user
> > classloader to avoid class conflicts with the framework.
> >
> > Anyway, both of the two submission modes are useful.
> > We just need to clarify the concepts.
> >
> >
> >
> >
> > Best,
> >
> > Yang
> >
Zili Chen <wa...@gmail.com> wrote on Tue, Aug 20, 2019 at 5:58 PM:
> >
> > > Thanks for the clarification.
> > >
> > > The idea of a JobDeployer came to my mind when I was puzzling over
> > > how to execute per-job mode and session mode with the same user
> > > code and framework code path.
> > >
> > > With the concept of a JobDeployer we come back to the statement
> > > that the environment knows every config of cluster deployment and
> > > job submission. We configure or generate from configuration a
> > > specific JobDeployer in the environment, and then code aligns on
> > >
> > > *JobClient client = env.execute().get();*
> > >
> > > which in session mode is returned by clusterClient.submitJob and
> > > in per-job mode by clusterDescriptor.deployJobCluster.
> > >
> > > Here comes a problem: currently we directly run the
> > > ClusterEntrypoint with the extracted job graph. Following the
> > > JobDeployer way, we had better align the entry point of per-job
> > > deployment at the JobDeployer. Users run their main method or a
> > > Cli (which finally calls the main method) to deploy the job
> > > cluster.
> > >
> > > Best,
> > > tison.
> > >
> > >
Stephan Ewen <se...@apache.org> wrote on Tue, Aug 20, 2019 at 4:40 PM:
> > >
> > > > Till has made some good comments here.
> > > >
> > > > Two things to add:
> > > >
> > > >   - The job mode is very nice in the way that it runs the client
> > > > inside the cluster (in the same image/process that is the JM) and
> > > > thus unifies both applications and what the Spark world calls the
> > > > "driver mode".
> > > >
> > > >   - Another thing I would add is that during the FLIP-6 design, we were
> > > > thinking about setups where Dispatcher and JobManager are separate
> > > > processes.
> > > >     A Yarn or Mesos Dispatcher of a session could run
> > > > independently (even as privileged processes executing no code).
> > > >     Then the "per-job" mode could still be helpful: when a job is
> > > > submitted to the dispatcher, it launches the JM again in per-job
> > > > mode, so that JM and TM processes are bound to the job only. For
> > > > higher security setups, it is important that processes are not
> > > > reused across jobs.
> > > >
> > > > On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <tr...@apache.org>
> > > > wrote:
> > > >
> > > > > I would not be in favour of getting rid of the per-job mode
> > > > > since it simplifies the process of running Flink jobs
> > > > > considerably. Moreover, it is not only well suited for container
> > > > > deployments but also for deployments where you want to guarantee
> > > > > job isolation. For example, a user could use the per-job mode on
> > > > > Yarn to execute his job on a separate cluster.
> > > > >
> > > > > I think that having two notions of cluster deployments (session
> > > > > vs. per-job mode) does not necessarily contradict your ideas for
> > > > > the client api refactoring. For example, one could have the
> > > > > following interfaces:
> > > > >
> > > > > - ClusterDeploymentDescriptor: encapsulates the logic how to
> > > > >   deploy a cluster.
> > > > > - ClusterClient: allows to interact with a cluster
> > > > > - JobClient: allows to interact with a running job
> > > > >
> > > > > Now the ClusterDeploymentDescriptor could have two methods:
> > > > >
> > > > > - ClusterClient deploySessionCluster()
> > > > > - JobClusterClient/JobClient deployPerJobCluster(JobGraph)
> > > > >
> > > > > where JobClusterClient is either a supertype of ClusterClient
> > > > > which does not give you the functionality to submit jobs, or
> > > > > deployPerJobCluster returns directly a JobClient.
> > > > >
> > > > > When setting up the ExecutionEnvironment, one would then not
> > > > > provide a ClusterClient to submit jobs but a JobDeployer which,
> > > > > depending on the selected mode, either uses a ClusterClient
> > > > > (session mode) to submit jobs or a ClusterDeploymentDescriptor
> > > > > to deploy a per-job mode cluster with the job to execute.
> > > > >
> > > > > These are just some thoughts on how one could make it work,
> > > > > because I believe there is some value in using the per-job mode
> > > > > from the ExecutionEnvironment.
> > > > >
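Till's two-method descriptor above might be sketched like this in plain Java. All names are illustrative placeholders (JobGraph is a stand-in), not Flink's real API:

```java
// Sketch of the interfaces Till outlines: one descriptor, two deployment
// modes. All types are illustrative placeholders, not Flink's real API.
public class DeploymentSketch {

    static class JobGraph {
        final String name;
        JobGraph(String name) { this.name = name; }
    }

    interface JobClient { String getJobId(); }

    interface ClusterClient { JobClient submitJob(JobGraph job); }

    interface ClusterDeploymentDescriptor {
        ClusterClient deploySessionCluster();           // long-lived, many jobs
        JobClient deployPerJobCluster(JobGraph job);    // dedicated, one job
    }

    // In-memory implementation, just to show how the two modes compose.
    static class DummyDescriptor implements ClusterDeploymentDescriptor {
        public ClusterClient deploySessionCluster() {
            return job -> () -> "session/" + job.name;
        }
        public JobClient deployPerJobCluster(JobGraph job) {
            // the cluster starts with the job baked in; no separate submission
            return () -> "perjob/" + job.name;
        }
    }

    public static void main(String[] args) {
        ClusterDeploymentDescriptor d = new DummyDescriptor();
        System.out.println(d.deploySessionCluster().submitJob(new JobGraph("a")).getJobId());
        System.out.println(d.deployPerJobCluster(new JobGraph("b")).getJobId());
    }
}
```

Note how per-job deployment skips the ClusterClient entirely and yields a JobClient directly, which is the asymmetry the thread is debating.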
> > > > > Concerning the web submission, this is indeed a bit tricky.
> > > > > From a cluster management standpoint, I would be in favour of
> > > > > not executing user code on the REST endpoint. Especially when
> > > > > considering security, it would be good to have a well defined
> > > > > cluster behaviour where it is explicitly stated where user code
> > > > > and, thus, potentially risky code is executed. Ideally we limit
> > > > > it to the TaskExecutor and JobMaster.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Tue, Aug 20, 2019 at 9:40 AM Flavio Pompermaier
> > > > > <pompermaier@okkam.it> wrote:
> > > > >
> > > > > > In my opinion the client should not use any environment to
> > > > > > get the JobGraph, because the jar should reside ONLY on the
> > > > > > cluster (and not in the client classpath; otherwise there are
> > > > > > always inconsistencies between the client's and the Flink
> > > > > > JobManager's classpath).
> > > > > > In the YARN, Mesos and Kubernetes scenarios you have the jar,
> > > > > > but you could start a cluster that has the jar on the
> > > > > > JobManager as well (this is the only case where I think you
> > > > > > can assume that the client has the jar on the classpath; in
> > > > > > the REST job submission you don't have any classpath).
> > > > > >
> > > > > > Thus, always in my opinion, the JobGraph should be generated
> > > > > > by the JobManager REST API.
> > > > > >
> > > > > >
> > > > > > On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <wa...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > >> I would like to involve Till & Stephan here to clarify some
> > > > > >> concepts of per-job mode.
> > > > > >>
> > > > > >> The term per-job is one of the modes a cluster could run in.
> > > > > >> It is mainly aimed at spawning a dedicated cluster for a
> > > > > >> specific job, while the job could be packaged with Flink
> > > > > >> itself and thus the cluster is initialized with the job, so
> > > > > >> that we get rid of a separate submission step.
> > > > > >>
> > > > > >> This is useful for container deployments where one creates
> > > > > >> his image with the job and then simply deploys the container.
> > > > > >>
> > > > > >> However, it is out of client scope, since a client
> > > > > >> (ClusterClient for example) is for communicating with an
> > > > > >> existing cluster and performing actions. Currently, in
> > > > > >> per-job mode, we extract the job graph and bundle it into the
> > > > > >> cluster deployment, and thus no concept of client gets
> > > > > >> involved. It looks reasonable to exclude the deployment of a
> > > > > >> per-job cluster from the client api and use dedicated utility
> > > > > >> classes (deployers) for deployment.
> > > > > >>
> > > > > >> Zili Chen <wa...@gmail.com> wrote on Tue, Aug 20, 2019 at 12:37 PM:
> > > > > >>
> > > > > >> > Hi Aljoscha,
> > > > > >> >
> > > > > >> > Thanks for your reply and participation. The Google Doc you
> > > > > >> > linked to requires permission; I think you could use a
> > > > > >> > share link instead.
> > > > > >> >
> > > > > >> > I agree that we have almost reached a consensus that a
> > > > > >> > JobClient is necessary to interact with a running job.
> > > > > >> >
> > > > > >> > Let me check your open questions one by one.
> > > > > >> >
> > > > > >> > 1. Separate cluster creation and job submission for
> > > > > >> > per-job mode.
> > > > > >> >
> > > > > >> > As you mentioned, here is where the opinions diverge. In my
> > > > > >> > document there is an alternative[2] that proposes excluding
> > > > > >> > per-job deployment from the client api scope, and now I
> > > > > >> > find it more reasonable that we do the exclusion.
> > > > > >> >
> > > > > >> > When in per-job mode, a dedicated JobCluster is launched to
> > > > > >> > execute the specific job. It is more like a Flink
> > > > > >> > Application than a submission of a Flink Job. The client
> > > > > >> > only takes care of job submission and assumes there is an
> > > > > >> > existing cluster. In this way we are able to consider
> > > > > >> > per-job issues individually, and JobClusterEntrypoint would
> > > > > >> > be the utility class for per-job deployment.
> > > > > >> >
> > > > > >> > Nevertheless, the user program works in both session mode
> > > > > >> > and per-job mode without the need to change code. The
> > > > > >> > JobClient in per-job mode is returned from env.execute as
> > > > > >> > normal. However, it would no longer be a wrapper of
> > > > > >> > RestClusterClient but a wrapper of PerJobClusterClient
> > > > > >> > which communicates with the Dispatcher locally.
> > > > > >> >
> > > > > >> > 2. How to deal with plan preview.
> > > > > >> >
> > > > > >> > With env.compile functions users can get JobGraph or FlinkPlan
> > and
> > > > > thus
> > > > > >> > they can preview the plan programmatically. Typically it looks
> > > like
> > > > > >> >
> > > > > >> > if (preview configured) {
> > > > > >> >     FlinkPlan plan = env.compile();
> > > > > >> >     new JSONDumpGenerator(...).dump(plan);
> > > > > >> > } else {
> > > > > >> >     env.execute();
> > > > > >> > }
> > > > > >> >
> > > > > >> > And `flink info` would no longer be necessary.
> > > > > >> >
> > > > > >> > 3. How to deal with Jar Submission at the Web Frontend.
> > > > > >> >
> > > > > >> > There is one more thread talked on this topic[1]. Apart from
> > > > removing
> > > > > >> > the functions there are two alternatives.
> > > > > >> >
> > > > > >> > One is to introduce an interface with a method that returns
> > > > > JobGraph/FlinkPlan
> > > > > >> > and Jar Submission would only support main classes implementing this
> > > > interface.
> > > > > >> > And then extract the JobGraph/FlinkPlan just by calling the
> > > method.
> > > > > >> > In this way, it is even possible to consider a separation of job
> > > > > >> creation
> > > > > >> > and job submission.
> > > > > >> >
> > > > > >> > The other is, as you mentioned, let execute() do the actual
> > > > execution.
> > > > > >> > We won't execute the main method in the WebFrontend but spawn a
> > > > > process
> > > > > >> > at WebMonitor side to execute. As for the return value, we could generate
> > > the
> > > > > >> > JobID from WebMonitor and pass it to the execution environment.
> > > > > >> >
> > > > > >> > 4. How to deal with detached mode.
> > > > > >> >
> > > > > >> > I think detached mode is a temporary solution for non-blocking
> > > > > >> submission.
> > > > > >> > In my document both submission and execution return a
> > > > > CompletableFuture
> > > > > >> and
> > > > > >> > users control whether or not to wait for the result. On this point
> > we
> > > > > don't
> > > > > >> > need a detached option but the functionality is covered.
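The future-based shape described here can be sketched in a few lines. This is a self-contained illustration only; `JobHandle`, `JobResult` and `submit` are placeholder names standing in for the proposed JobClient-style API, not existing Flink classes.

```java
import java.util.concurrent.CompletableFuture;

// Placeholder types; names are illustrative, not part of Flink's API.
class JobResult {
    final String jobId;
    JobResult(String jobId) { this.jobId = jobId; }
}

class JobHandle {
    private final String jobId;
    JobHandle(String jobId) { this.jobId = jobId; }
    String getJobId() { return jobId; }
    // Completes once the job reaches a globally terminal state.
    CompletableFuture<JobResult> getJobResult() {
        return CompletableFuture.completedFuture(new JobResult(jobId));
    }
}

public class SubmissionSketch {
    // Submission itself is asynchronous and yields a handle to the running job.
    static CompletableFuture<JobHandle> submit(String jobName) {
        return CompletableFuture.completedFuture(new JobHandle(jobName + "-0001"));
    }

    public static void main(String[] args) throws Exception {
        // "Detached" usage: obtain the handle and return immediately.
        JobHandle handle = submit("wordcount").get();
        // "Attached" usage: additionally block until the job finishes.
        JobResult result = handle.getJobResult().get();
        System.out.println(result.jobId); // wordcount-0001
    }
}
```

With both paths returning futures, the detached/attached distinction becomes a choice made by the caller rather than a mode baked into the environment.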
> > > > > >> >
> > > > > >> > 5. How does per-job mode interact with interactive programming.
> > > > > >> >
> > > > > >> > All of YARN, Mesos and Kubernetes scenarios now follow the pattern
> > > > of launching
> > > > > a
> > > > > >> > JobCluster. And I don't think there would be any inconsistency
> > > > between
> > > > > >> > different resource management systems.
> > > > > >> >
> > > > > >> > Best,
> > > > > >> > tison.
> > > > > >> >
> > > > > >> > [1]
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> > https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
> > > > > >> > [2]
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> > https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
> > > > > >> >
> > > > > >> > Aljoscha Krettek <al...@apache.org> 于2019年8月16日周五 下午9:20写道:
> > > > > >> >
> > > > > >> >> Hi,
> > > > > >> >>
> > > > > >> >> I read both Jeff's initial design document and the newer
> > document
> > > by
> > > > > >> >> Tison. I also finally found the time to collect our thoughts on
> > > the
> > > > > >> issue,
> > > > > >> >> I had quite some discussions with Kostas and this is the
> > result:
> > > > [1].
> > > > > >> >>
> > > > > >> >> I think overall we agree that this part of the code is in dire
> > > need
> > > > > of
> > > > > >> >> some refactoring/improvements but I think there are still some
> > > open
> > > > > >> >> questions and some differences in opinion what those
> > refactorings
> > > > > >> should
> > > > > >> >> look like.
> > > > > >> >>
> > > > > >> >> I think the API-side is quite clear, i.e. we need some
> > JobClient
> > > > API
> > > > > >> that
> > > > > >> >> allows interacting with a running Job. It could be worthwhile
> > to
> > > > spin
> > > > > >> that
> > > > > >> >> off into a separate FLIP because we can probably find consensus
> > > on
> > > > > that
> > > > > >> >> part more easily.
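As a strawman for that separate FLIP, such a JobClient interface might look like the sketch below. The method set (cancellation, savepoint triggering) is an assumption drawn from the operations discussed in this thread, not a settled API, and the tiny in-memory implementation exists only to make the sketch runnable.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical interface; method names are assumptions based on the
// operations discussed in this thread, not a finalized Flink API.
interface JobClient {
    String getJobId();
    CompletableFuture<Void> cancel();
    CompletableFuture<String> triggerSavepoint(String targetDirectory);
}

// A trivial in-memory implementation, just to show how calling code
// would interact with a running job through the interface.
public class JobClientSketch implements JobClient {
    private final String jobId;
    private boolean running = true;

    JobClientSketch(String jobId) { this.jobId = jobId; }

    public String getJobId() { return jobId; }

    public CompletableFuture<Void> cancel() {
        running = false;
        return CompletableFuture.completedFuture(null);
    }

    public CompletableFuture<String> triggerSavepoint(String targetDirectory) {
        return CompletableFuture.completedFuture(targetDirectory + "/savepoint-" + jobId);
    }

    boolean isRunning() { return running; }

    public static void main(String[] args) throws Exception {
        JobClientSketch client = new JobClientSketch("abc123");
        System.out.println(client.triggerSavepoint("s3://savepoints").get());
        client.cancel().get();
    }
}
```

Because every operation returns a future, the same interface serves both blocking and non-blocking callers.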
> > > > > >> >>
> > > > > >> >> For the rest, the main open questions from our doc are these:
> > > > > >> >>
> > > > > >> >>   - Do we want to separate cluster creation and job submission
> > > for
> > > > > >> >> per-job mode? In the past, there were conscious efforts to
> > *not*
> > > > > >> separate
> > > > > >> >> job submission from cluster creation for per-job clusters for
> > > > Mesos,
> > > > > >> YARN,
> > > > > >> >> Kubernetes (see StandaloneJobClusterEntryPoint). Tison suggests
> > in
> > > > his
> > > > > >> >> design document to decouple this in order to unify job
> > > submission.
> > > > > >> >>
> > > > > >> >>   - How to deal with plan preview, which needs to hijack
> > > execute()
> > > > > and
> > > > > >> >> let the outside code catch an exception?
> > > > > >> >>
> > > > > >> >>   - How to deal with Jar Submission at the Web Frontend, which
> > > > needs
> > > > > to
> > > > > >> >> hijack execute() and let the outside code catch an exception?
> > > > > >> >> CliFrontend.run() “hijacks” ExecutionEnvironment.execute() to
> > > get a
> > > > > >> >> JobGraph and then execute that JobGraph manually. We could get
> > > > around
> > > > > >> that
> > > > > >> >> by letting execute() do the actual execution. One caveat for
> > this
> > > > is
> > > > > >> that
> > > > > >> >> now the main() method doesn’t return (or is forced to return by
> > > > > >> throwing an
> > > > > >> >> exception from execute()) which means that for Jar Submission
> > > from
> > > > > the
> > > > > >> >> WebFrontend we have a long-running main() method running in the
> > > > > >> >> WebFrontend. This doesn’t sound very good. We could get around
> > > this
> > > > > by
> > > > > >> >> removing the plan preview feature and by removing Jar
> > > > > >> Submission/Running.
> > > > > >> >>
> > > > > >> >>   - How to deal with detached mode? Right now,
> > > DetachedEnvironment
> > > > > will
> > > > > >> >> execute the job and return immediately. If users control when
> > > they
> > > > > >> want to
> > > > > >> >> return, by waiting on the job completion future, how do we deal
> > > > with
> > > > > >> this?
> > > > > >> >> Do we simply remove the distinction between
> > > detached/non-detached?
> > > > > >> >>
> > > > > >> >>   - How does per-job mode interact with “interactive
> > programming”
> > > > > >> >> (FLIP-36). For YARN, each execute() call could spawn a new
> > Flink
> > > > YARN
> > > > > >> >> cluster. What about Mesos and Kubernetes?
> > > > > >> >>
> > > > > >> >> The first open question is where the opinions diverge, I think.
> > > The
> > > > > >> rest
> > > > > >> >> are just open questions and interesting things that we need to
> > > > > >> consider.
> > > > > >> >>
> > > > > >> >> Best,
> > > > > >> >> Aljoscha
> > > > > >> >>
> > > > > >> >> [1]
> > > > > >> >>
> > > > > >>
> > > > >
> > > >
> > >
> > https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> > > > > >> >> <
> > > > > >> >>
> > > > > >>
> > > > >
> > > >
> > >
> > https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> > > > > >> >> >
> > > > > >> >>
> > > > > >> >> > On 31. Jul 2019, at 15:23, Jeff Zhang <zj...@gmail.com>
> > > wrote:
> > > > > >> >> >
> > > > > >> >> > Thanks tison for the effort. I left a few comments.
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >> > Zili Chen <wa...@gmail.com> 于2019年7月31日周三 下午8:24写道:
> > > > > >> >> >
> > > > > >> >> >> Hi Flavio,
> > > > > >> >> >>
> > > > > >> >> >> Thanks for your reply.
> > > > > >> >> >>
> > > > > >> >> >> Either current impl and in the design, ClusterClient
> > > > > >> >> >> never takes responsibility for generating JobGraph.
> > > > > >> >> >> (what you see in current codebase is several class methods)
> > > > > >> >> >>
> > > > > >> >> >> Instead, user describes his program in the main method
> > > > > >> >> >> with ExecutionEnvironment apis and calls env.compile()
> > > > > >> >> >> or env.optimize() to get FlinkPlan and JobGraph
> > respectively.
> > > > > >> >> >>
> > > > > >> >> >> For listing main classes in a jar and choose one for
> > > > > >> >> >> submission, you're now able to customize a CLI to do it.
> > > > > >> >> >> Specifically, the path of the jar is passed as an argument and
> > > > > >> >> >> in the customized CLI you list main classes, choose one
> > > > > >> >> >> to submit to the cluster.
> > > > > >> >> >>
> > > > > >> >> >> Best,
> > > > > >> >> >> tison.
> > > > > >> >> >>
> > > > > >> >> >>
> > > > > >> >> >> Flavio Pompermaier <po...@okkam.it> 于2019年7月31日周三
> > > > 下午8:12写道:
> > > > > >> >> >>
> > > > > >> >> >>> Just one note on my side: it is not clear to me whether the
> > > > > client
> > > > > >> >> needs
> > > > > >> >> >> to
> > > > > >> >> >>> be able to generate a job graph or not.
> > > > > >> >> >>> In my opinion, the job jar must reside only on the
> > > > > >> server/jobManager
> > > > > >> >> >> side
> > > > > >> >> >>> and the client requires a way to get the job graph.
> > > > > >> >> >>> If you really want to access the job graph, I'd add a
> > > > > dedicated
> > > > > >> >> method
> > > > > >> >> >>> on the ClusterClient. like:
> > > > > >> >> >>>
> > > > > >> >> >>>   - getJobGraph(jarId, mainClass): JobGraph
> > > > > >> >> >>>   - listMainClasses(jarId): List<String>
> > > > > >> >> >>>
> > > > > >> >> >>> These would also require some additions on the job manager
> > > > > endpoint
> > > > > >> as
> > > > > >> >> >>> well. What do you think?
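A minimal sketch of the two proposed methods, with placeholder types so it is self-contained; the signatures, the `jarId` handling and the returned values are illustrative assumptions, not an existing ClusterClient API.

```java
import java.util.Arrays;
import java.util.List;

// Placeholder for Flink's JobGraph, so the sketch is self-contained.
class JobGraph {
    final String mainClass;
    JobGraph(String mainClass) { this.mainClass = mainClass; }
}

// Sketch of the proposed additions; everything here is an assumption
// for illustration, not an existing ClusterClient method.
public class JarIntrospectionSketch {
    // List the main classes contained in an uploaded jar.
    static List<String> listMainClasses(String jarId) {
        return Arrays.asList("org.example.BatchJob", "org.example.StreamingJob");
    }

    // Ask the cluster to compile the job graph server-side.
    static JobGraph getJobGraph(String jarId, String mainClass) {
        return new JobGraph(mainClass);
    }

    public static void main(String[] args) {
        String jarId = "some-jar-id"; // id returned by a jar upload endpoint
        String mainClass = listMainClasses(jarId).get(0);
        System.out.println(getJobGraph(jarId, mainClass).mainClass);
    }
}
```

A UI could then populate a drop-down from `listMainClasses` and only request the job graph for the class the user picks.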
> > > > > >> >> >>>
> > > > > >> >> >>> On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <
> > > > wander4096@gmail.com
> > > > > >
> > > > > >> >> wrote:
> > > > > >> >> >>>
> > > > > >> >> >>>> Hi all,
> > > > > >> >> >>>>
> > > > > >> >> >>>> Here is a document[1] on client api enhancement from our
> > > > > >> perspective.
> > > > > >> >> >>>> We have investigated current implementations. And we
> > propose
> > > > > >> >> >>>>
> > > > > >> >> >>>> 1. Unify the implementation of cluster deployment and job
> > > > > >> submission
> > > > > >> >> in
> > > > > >> >> >>>> Flink.
> > > > > >> >> >>>> 2. Provide programmatic interfaces to allow flexible job
> > and
> > > > > >> cluster
> > > > > >> >> >>>> management.
> > > > > >> >> >>>>
> > > > > >> >> >>>> The first proposal is aimed at reducing code paths of
> > > cluster
> > > > > >> >> >> deployment
> > > > > >> >> >>>> and
> > > > > >> >> >>>> job submission so that one can adopt Flink in his usage
> > > > easily.
> > > > > >> The
> > > > > >> >> >>> second
> > > > > >> >> >>>> proposal is aimed at providing rich interfaces for
> > advanced
> > > > > users
> > > > > >> >> >>>> who want to make accurate control of these stages.
> > > > > >> >> >>>>
> > > > > >> >> >>>> Quick reference on open questions:
> > > > > >> >> >>>>
> > > > > >> >> >>>> 1. Exclude job cluster deployment from client side or
> > > redefine
> > > > > the
> > > > > >> >> >>> semantic
> > > > > >> >> >>>> of job cluster? Since it fits in a process quite different
> > > > from
> > > > > >> >> session
> > > > > >> >> >>>> cluster deployment and job submission.
> > > > > >> >> >>>>
> > > > > >> >> >>>> 2. Maintain the codepaths handling class
> > > > > o.a.f.api.common.Program
> > > > > >> or
> > > > > >> >> >>>> implement customized program handling logic by customized
> > > > > >> >> CliFrontend?
> > > > > >> >> >>>> See also this thread[2] and the document[1].
> > > > > >> >> >>>>
> > > > > >> >> >>>> 3. Expose ClusterClient as public api or just expose api
> > in
> > > > > >> >> >>>> ExecutionEnvironment
> > > > > >> >> >>>> and delegate them to ClusterClient? Further, in either way
> > > is
> > > > it
> > > > > >> >> worth
> > > > > >> >> >> to
> > > > > >> >> >>>> introduce a JobClient which is an encapsulation of
> > > > ClusterClient
> > > > > >> that
> > > > > >> >> >>>> associated to specific job?
> > > > > >> >> >>>>
> > > > > >> >> >>>> Best,
> > > > > >> >> >>>> tison.
> > > > > >> >> >>>>
> > > > > >> >> >>>> [1]
> > > > > >> >> >>>>
> > > > > >> >> >>>>
> > > > > >> >> >>>
> > > > > >> >> >>
> > > > > >> >>
> > > > > >>
> > > > >
> > > >
> > >
> > https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
> > > > > >> >> >>>> [2]
> > > > > >> >> >>>>
> > > > > >> >> >>>>
> > > > > >> >> >>>
> > > > > >> >> >>
> > > > > >> >>
> > > > > >>
> > > > >
> > > >
> > >
> > https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
> > > > > >> >> >>>>
> > > > > >> >> >>>> Jeff Zhang <zj...@gmail.com> 于2019年7月24日周三 上午9:19写道:
> > > > > >> >> >>>>
> > > > > >> >> >>>>> Thanks Stephan, I will follow up this issue in next few
> > > > weeks,
> > > > > >> and
> > > > > >> >> >> will
> > > > > >> >> >>>>> refine the design doc. We could discuss more details
> > after
> > > > 1.9
> > > > > >> >> >> release.
> > > > > >> >> >>>>>
> > > > > >> >> >>>>> Stephan Ewen <se...@apache.org> 于2019年7月24日周三 上午12:58写道:
> > > > > >> >> >>>>>
> > > > > >> >> >>>>>> Hi all!
> > > > > >> >> >>>>>>
> > > > > >> >>>>>> This thread has stalled for a bit, which I assume is
> > > mostly
> > > > > >> due to
> > > > > >> >> >>> the
> > > > > >> >> >>>>>> Flink 1.9 feature freeze and release testing effort.
> > > > > >> >> >>>>>>
> > > > > >> >>>>>> I personally still recognize this issue as an important one
> > > to
> > > > be
> > > > > >> >> >>> solved.
> > > > > >> >> >>>>> I'd
> > > > > >> >> >>>>>> be happy to help resume this discussion soon (after the
> > > 1.9
> > > > > >> >> >> release)
> > > > > >> >> >>>> and
> > > > > >> >>>>>> see if we can take some steps towards this in Flink 1.10.
> > > > > >> >> >>>>>>
> > > > > >> >> >>>>>> Best,
> > > > > >> >> >>>>>> Stephan
> > > > > >> >> >>>>>>
> > > > > >> >> >>>>>>
> > > > > >> >> >>>>>>
> > > > > >> >> >>>>>> On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <
> > > > > >> >> >>>>> pompermaier@okkam.it>
> > > > > >> >> >>>>>> wrote:
> > > > > >> >> >>>>>>
> > > > > >> >> >>>>>>> That's exactly what I suggested a long time ago: the
> > > Flink
> > > > > REST
> > > > > >> >> >>>> client
> > > > > >> >>>>>>> should not require any Flink dependency, only an HTTP
> > > library
> > > > to
> > > > > >> >> >> call
> > > > > >> >> >>>> the
> > > > > >> >> >>>>>> REST
> > > > > >> >> >>>>>>> services to submit and monitor a job.
> > > > > >> >> >>>>>>> What I suggested also in [1] was to have a way to
> > > > > automatically
> > > > > >> >> >>>> suggest
> > > > > >> >>>>>> to the
> > > > > >> >> >>>>>>> user (via a UI) the available main classes and their
> > > > required
> > > > > >> >> >>>>>>> parameters[2].
> > > > > >> >>>>>>> Another problem we have with Flink is that the REST
> > > client
> > > > > and
> > > > > >> >> >> the
> > > > > >> >> >>>> CLI
> > > > > >> >> >>>>>> one
> > > > > >> >>>>>>> behave differently, and we use the CLI client (via ssh)
> > > > > because
> > > > > >> >> >> it
> > > > > >> >> >>>>> allows
> > > > > >> >>>>>>> us to call some other method after env.execute() [3] (we
> > > have
> > > > to
> > > > > >> >> >> call
> > > > > >> >> >>>>>> another
> > > > > >> >> >>>>>>> REST service to signal the end of the job).
> > > > > >> >>>>>>> In this regard, a dedicated interface, like the
> > > JobListener
> > > > > >> >> >>> suggested
> > > > > >> >> >>>>> in
> > > > > >> >> >>>>>>> the previous emails, would be very helpful (IMHO).
> > > > > >> >> >>>>>>>
> > > > > >> >> >>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-10864
> > > > > >> >> >>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-10862
> > > > > >> >> >>>>>>> [3] https://issues.apache.org/jira/browse/FLINK-10879
> > > > > >> >> >>>>>>>
> > > > > >> >> >>>>>>> Best,
> > > > > >> >> >>>>>>> Flavio
> > > > > >> >> >>>>>>>
> > > > > >> >> >>>>>>> On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <
> > > > zjffdu@gmail.com
> > > > > >
> > > > > >> >> >>> wrote:
> > > > > >> >> >>>>>>>
> > > > > >> >> >>>>>>>> Hi, Tison,
> > > > > >> >> >>>>>>>>
> > > > > >> >> >>>>>>>> Thanks for your comments. Overall I agree with you
> > that
> > > it
> > > > > is
> > > > > >> >> >>>>> difficult
> > > > > >> >> >>>>>>> for
> > > > > >> >> >>>>>>>> down stream project to integrate with flink and we
> > need
> > > to
> > > > > >> >> >>> refactor
> > > > > >> >> >>>>> the
> > > > > >> >> >>>>>>>> current flink client api.
> > > > > >> >>>>>>>> And I agree that CliFrontend should only parse
> > command
> > > > > line
> > > > > >> >> >>>>> arguments
> > > > > >> >> >>>>>>> and
> > > > > >> >> >>>>>>>> then pass them to ExecutionEnvironment. It is
> > > > > >> >> >>>> ExecutionEnvironment's
> > > > > >> >>>>>>>> responsibility to compile the job, create the cluster, and
> > > submit
> > > > > the job.
> > > > > >> >> >>>>> Besides
> > > > > >> >> >>>>>>>> that, Currently flink has many ExecutionEnvironment
> > > > > >> >> >>>> implementations,
> > > > > >> >> >>>>>> and
> > > > > >> >> >>>>>>>> flink will use the specific one based on the context.
> > > > IMHO,
> > > > > it
> > > > > >> >> >> is
> > > > > >> >> >>>> not
> > > > > >> >> >>>>>>>> necessary, ExecutionEnvironment should be able to do
> > the
> > > > > right
> > > > > >> >> >>>> thing
> > > > > >> >> >>>>>>> based
> > > > > >> >>>>>>>> on the FlinkConf it receives. Too many
> > > > > ExecutionEnvironment
> > > > > >> >>>>>>>> implementations are another burden for downstream
> > project
> > > > > >> >> >>>> integration.
> > > > > >> >> >>>>>>>>
> > > > > >> >> >>>>>>>> One thing I'd like to mention is flink's scala shell
> > and
> > > > sql
> > > > > >> >> >>>> client,
> > > > > >> >> >>>>>>>> although they are sub-modules of flink, they could be
> > > > > treated
> > > > > >> >> >> as
> > > > > >> >> >>>>>>> downstream
> > > > > >> >>>>>>>> projects which use Flink's client api. Currently you
> > will
> > > > > find
> > > > > >> >> >> it
> > > > > >> >> >>> is
> > > > > >> >> >>>>> not
> > > > > >> >>>>>>>> easy for them to integrate with Flink; they share much
> > > > > >> >> >> duplicated
> > > > > >> >> >>>>> code.
> > > > > >> >> >>>>>>> It
> > > > > >> >> >>>>>>>> is another sign that we should refactor flink client
> > > api.
> > > > > >> >> >>>>>>>>
> > > > > >> >> >>>>>>>> I believe it is a large and hard change, and I am
> > afraid
> > > > we
> > > > > >> can
> > > > > >> >> >>> not
> > > > > >> >> >>>>>> keep
> > > > > >> >>>>>>>> compatibility since many of the changes are user-facing.
> > > > > >> >> >>>>>>>>
> > > > > >> >> >>>>>>>>
> > > > > >> >> >>>>>>>>
> > > > > >> >> >>>>>>>> Zili Chen <wa...@gmail.com> 于2019年6月24日周一
> > > 下午2:53写道:
> > > > > >> >> >>>>>>>>
> > > > > >> >> >>>>>>>>> Hi all,
> > > > > >> >> >>>>>>>>>
> > > > > >> >>>>>>>>> After a closer look at our client APIs, I can see
> > there
> > > > are
> > > > > >> >> >> two
> > > > > >> >> >>>>> major
> > > > > >> >>>>>>>>> issues with consistency and integration, namely
> > different
> > > > > >> >> >>>> deployment
> > > > > >> >> >>>>> of
> > > > > >> >> >>>>>>>>> job cluster which couples job graph creation and
> > > cluster
> > > > > >> >> >>>>> deployment,
> > > > > >> >>>>>>>>> and submission via CliFrontend, which confuses the control flow
> > > of
> > > > > job
> > > > > >> >> >>>> graph
> > > > > >> >> >>>>>>>>> compilation and job submission. I'd like to follow
> > the
> > > > > >> >> discussion
> > > > > >> >> >>>>> above,
> > > > > >> >> >>>>>>>>> mainly the process described by Jeff and Stephan, and
> > > > share
> > > > > >> >> >> my
> > > > > >> >> >>>>>>>>> ideas on these issues.
> > > > > >> >> >>>>>>>>>
> > > > > >> >> >>>>>>>>> 1) CliFrontend confuses the control flow of job
> > > > compilation
> > > > > >> >> >> and
> > > > > >> >> >>>>>>>> submission.
> > > > > >> >> >>>>>>>>> Following the process of job submission Stephan and
> > > Jeff
> > > > > >> >> >>>> described,
> > > > > >> >> >>>>>>>>> execution environment knows all configs of the
> > cluster
> > > > and
> > > > > >> >> >>>>>>> topos/settings
> > > > > >> >> >>>>>>>>> of the job. Ideally, in the main method of user
> > > program,
> > > > it
> > > > > >> >> >>> calls
> > > > > >> >> >>>>>>>> #execute
> > > > > >> >> >>>>>>>>> (or named #submit) and Flink deploys the cluster,
> > > compiles
> > > > > the
> > > > > >> >> >>> job
> > > > > >> >> >>>>>> graph
> > > > > >> >>>>>>>>> and submits it to the cluster. However, the current
> > > > CliFrontend
> > > > > >> >> >> does
> > > > > >> >> >>>> all
> > > > > >> >> >>>>>>> these
> > > > > >> >> >>>>>>>>> things inside its #runProgram method, which
> > introduces
> > > a
> > > > > lot
> > > > > >> >> >> of
> > > > > >> >> >>>>>>>> subclasses
> > > > > >> >> >>>>>>>>> of (stream) execution environment.
> > > > > >> >> >>>>>>>>>
> > > > > >> >> >>>>>>>>> Actually, it sets up an exec env that hijacks the
> > > > > >> >> >>>>>> #execute/executePlan
> > > > > >> >>>>>>>>> method, initializes the job graph and aborts
> > execution.
> > > > And
> > > > > >> >> >> then
> > > > > >> >>>>>>>>> the control flow goes back to CliFrontend; it deploys the
> > > > cluster(or
> > > > > >> >>>>> retrieves
> > > > > >> >> >>>>>>>>> the client) and submits the job graph. This is quite
> > a
> > > > > >> >> >> specific
> > > > > >> >> >>>>>>> internal
> > > > > >> >>>>>>>>> process inside Flink, with no consistency to
> > > anything.
> > > > > >> >> >>>>>>>>>
> > > > > >> >> >>>>>>>>> 2) Deployment of job cluster couples job graph
> > creation
> > > > and
> > > > > >> >> >>>> cluster
> > > > > >> >> >>>>>>>>> deployment. Abstractly, from user job to a concrete
> > > > > >> >> >> submission,
> > > > > >> >> >>>> it
> > > > > >> >> >>>>>>>> requires
> > > > > >> >> >>>>>>>>>
> > > > > >> >> >>>>>>>>>     create JobGraph --\
> > > > > >> >> >>>>>>>>>
> > > > > >> >> >>>>>>>>> create ClusterClient -->  submit JobGraph
> > > > > >> >> >>>>>>>>>
> > > > > >> >> >>>>>>>>> such a dependency. ClusterClient was created by
> > > deploying
> > > > > or
> > > > > >> >> >>>>>>> retrieving.
> > > > > >> >> >>>>>>>>> JobGraph submission requires a compiled JobGraph and
> > > > valid
> > > > > >> >> >>>>>>> ClusterClient,
> > > > > >> >> >>>>>>>>> but the creation of ClusterClient is abstractly
> > > > independent
> > > > > >> >> >> of
> > > > > >> >> >>>> that
> > > > > >> >> >>>>>> of
> > > > > >> >> >>>>>>>>> JobGraph. However, in job cluster mode, we deploy job
> > > > > cluster
> > > > > >> >> >>>> with
> > > > > >> >> >>>>> a
> > > > > >> >> >>>>>>> job
> > > > > >> >> >>>>>>>>> graph, which means we use another process:
> > > > > >> >> >>>>>>>>>
> > > > > >> >> >>>>>>>>> create JobGraph --> deploy cluster with the JobGraph
> > > > > >> >> >>>>>>>>>
> > > > > >> >> >>>>>>>>> Here is another inconsistency and downstream
> > > > > projects/client
> > > > > >> >> >>> apis
> > > > > >> >> >>>>> are
> > > > > >> >> >>>>>>>>> forced to handle different cases with rare supports
> > > from
> > > > > >> >> >> Flink.
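The two dependency graphs above can be contrasted in code. This is a self-contained sketch with placeholder types; the method names (`deploySessionCluster`, `deployJobCluster`) are assumptions for illustration, not existing Flink calls.

```java
// Placeholder types so the sketch is self-contained.
class JobGraph { }

class ClusterClient {
    private final JobGraph submitted;
    ClusterClient() { this(null); }
    ClusterClient(JobGraph preSubmitted) { this.submitted = preSubmitted; }

    // Submit a compiled job graph to an already-running cluster.
    ClusterClient submitJob(JobGraph jobGraph) { return new ClusterClient(jobGraph); }

    boolean hasJob() { return submitted != null; }
}

public class DeploymentSketch {
    // Session mode: cluster creation is independent of any job graph...
    static ClusterClient deploySessionCluster() {
        return new ClusterClient();
    }

    // ...while per-job mode couples deployment to a compiled job graph.
    static ClusterClient deployJobCluster(JobGraph jobGraph) {
        return new ClusterClient(jobGraph);
    }

    public static void main(String[] args) {
        JobGraph jobGraph = new JobGraph();

        // Path 1: create the client by deploying (or retrieving), then submit.
        ClusterClient session = deploySessionCluster().submitJob(jobGraph);

        // Path 2: deployment already needs the job graph.
        ClusterClient perJob = deployJobCluster(jobGraph);

        System.out.println(session.hasJob() + " " + perJob.hasJob());
    }
}
```

The point of the sketch is that downstream code must branch on which of the two paths it is on, which is exactly the inconsistency described above.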
> > > > > >> >> >>>>>>>>>
> > > > > >> >> >>>>>>>>> Since we likely reached a consensus on
> > > > > >> >> >>>>>>>>>
> > > > > >> >> >>>>>>>>> 1. all configs gathered by Flink configuration and
> > > passed
> > > > > >> >> >>>>>>>>> 2. execution environment knows all configs and
> > handles
> > > > > >> >> >>>>> execution(both
> > > > > >> >> >>>>>>>>> deployment and submission)
> > > > > >> >> >>>>>>>>>
> > > > > >> >> >>>>>>>>> to the issues above I propose eliminating
> > > inconsistencies
> > > > > by
> > > > > >> >> >>>>>> following
> > > > > >> >> >>>>>>>>> approach:
> > > > > >> >> >>>>>>>>>
> > > > > >> >> >>>>>>>>> 1) CliFrontend should exactly be a front end, at
> > least
> > > > for
> > > > > >> >> >>> "run"
> > > > > >> >> >>>>>>> command.
> > > > > >> >> >>>>>>>>> That means it just gathered and passed all config
> > from
> > > > > >> >> >> command
> > > > > >> >> >>>> line
> > > > > >> >> >>>>>> to
> > > > > >> >> >>>>>>>>> the main method of user program. Execution
> > environment
> > > > > knows
> > > > > >> >> >>> all
> > > > > >> >> >>>>> the
> > > > > >> >> >>>>>>> info
> > > > > >> >> >>>>>>>>> and with an addition to utils for ClusterClient, we
> > > > > >> >> >> gracefully
> > > > > >> >> >>>> get
> > > > > >> >> >>>>> a
> > > > > >> >> >>>>>>>>> ClusterClient by deploying or retrieving. In this
> > way,
> > > we
> > > > > >> >> >> don't
> > > > > >> >> >>>>> need
> > > > > >> >> >>>>>> to
> > > > > >> >> >>>>>>>>> hijack #execute/executePlan methods and can remove
> > > > various
> > > > > >> >> >>>> hacking
> > > > > >> >> >>>>>>>>> subclasses of exec env, as well as #run methods in
> > > > > >> >> >>>>> ClusterClient(for
> > > > > >> >> >>>>>> an
> > > > > >> >> >>>>>>>>> interface-ized ClusterClient). Now the control flow
> > > flows
> > > > > >> >> >> from
> > > > > >> >> >>>>>>>> CliFrontend
> > > > > >> >> >>>>>>>>> to the main method and never returns.
> > > > > >> >> >>>>>>>>>
> > > > > >> >> >>>>>>>>> 2) Job cluster means a cluster for the specific job.
> > > From
> > > > > >> >> >>> another
> > > > > >> >> >>>>>>>>> perspective, it is an ephemeral session. We may
> > > decouple
> > > > > the
> > > > > >> >> >>>>>> deployment
> > > > > >> >> >>>>>>>>> with a compiled job graph, but start a session with
> > > idle
> > > > > >> >> >>> timeout
> > > > > >> >> >>>>>>>>> and submit the job following.
> > > > > >> >> >>>>>>>>>
> > > > > >> >> >>>>>>>>> These topics, before we go into more details on
> > design
> > > or
> > > > > >> >> >>>>>>> implementation,
> > > > > >> >> >>>>>>>>> are better to be aware and discussed for a consensus.
> > > > > >> >> >>>>>>>>>
> > > > > >> >> >>>>>>>>> Best,
> > > > > >> >> >>>>>>>>> tison.
> > > > > >> >> >>>>>>>>>
> > > > > >> >> >>>>>>>>>
> > > > > >> >> >>>>>>>>> Zili Chen <wa...@gmail.com> 于2019年6月20日周四
> > > 上午3:21写道:
> > > > > >> >> >>>>>>>>>
> > > > > >> >> >>>>>>>>>> Hi Jeff,
> > > > > >> >> >>>>>>>>>>
> > > > > >> >> >>>>>>>>>> Thanks for raising this thread and the design
> > > document!
> > > > > >> >> >>>>>>>>>>
> > > > > >> >> >>>>>>>>>> As @Thomas Weise mentioned above, extending config
> > to
> > > > > flink
> > > > > >> >> >>>>>>>>>> requires far more effort than it should be. Another
> > > > > example
> > > > > >> >> >>>>>>>>>> is we achieve detach mode by introduce another
> > > execution
> > > > > >> >> >>>>>>>>>> environment which also hijack #execute method.
> > > > > >> >> >>>>>>>>>>
> > > > > >> >> >>>>>>>>>> I agree with your idea that user would configure all
> > > > > things
> > > > > >> >> >>>>>>>>>> and flink "just" respect it. On this topic I think
> > the
> > > > > >> >> >> unusual
> > > > > >> >> >>>>>>>>>> control flow when CliFrontend handle "run" command
> > is
> > > > the
> > > > > >> >> >>>> problem.
> > > > > >> >> >>>>>>>>>> It handles several configs, mainly about cluster
> > > > settings,
> > > > > >> >> >> and
> > > > > >> >> >>>>>>>>>> thus main method of user program is unaware of them.
> > > > Also
> > > > > it
> > > > > >> >> >>>>>> compiles
> > > > > >> >> >>>>>>>>>> app to job graph by run the main method with a
> > > hijacked
> > > > > exec
> > > > > >> >> >>>> env,
> > > > > >> >> >>>>>>>>>> which constrain the main method further.
> > > > > >> >> >>>>>>>>>>
> > > > > >> >> >>>>>>>>>> I'd like to write down a few of notes on
> > configs/args
> > > > pass
> > > > > >> >> >> and
> > > > > >> >> >>>>>>> respect,
> > > > > >> >> >>>>>>>>>> as well as decoupling job compilation and
> > submission.
> > > > > Share
> > > > > >> >> >> on
> > > > > >> >> >>>>> this
> > > > > >> >> >>>>>>>>>> thread later.
> > > > > >> >> >>>>>>>>>>
> > > > > >> >> >>>>>>>>>> Best,
> > > > > >> >> >>>>>>>>>> tison.
> > > > > >> >> >>>>>>>>>>
> > > > > >> >> >>>>>>>>>>
> > > > > >> >> >>>>>>>>>> SHI Xiaogang <sh...@gmail.com> 于2019年6月17日周一
> > > > > >> >> >> 下午7:29写道:
> > > > > >> >> >>>>>>>>>>
> > > > > >> >> >>>>>>>>>>> Hi Jeff and Flavio,
> > > > > >> >> >>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>> Thanks Jeff a lot for proposing the design
> > document.
> > > > > >> >> >>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>> We are also working on refactoring ClusterClient to
> > > > allow
> > > > > >> >> >>>>> flexible
> > > > > >> >> >>>>>>> and
> > > > > >> >> >>>>>>>>>>> efficient job management in our real-time platform.
> > > > > >> >> >>>>>>>>>>> We would like to draft a document to share our
> > ideas
> > > > with
> > > > > >> >> >>> you.
> > > > > >> >> >>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>> I think it's a good idea to have something like
> > > Apache
> > > > > Livy
> > > > > >> >> >>> for
> > > > > >> >> >>>>>>> Flink,
> > > > > >> >> >>>>>>>>>>> and
> > > > > >> >> >>>>>>>>>>> the efforts discussed here will take a great step
> > > > forward
> > > > > >> >> >> to
> > > > > >> >> >>>> it.
> > > > > >> >> >>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>> Regards,
> > > > > >> >> >>>>>>>>>>> Xiaogang
> > > > > >> >> >>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>> Flavio Pompermaier <po...@okkam.it>
> > > > 于2019年6月17日周一
> > > > > >> >> >>>>> 下午7:13写道:
> > > > > >> >> >>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>>> Is there any possibility to have something like
> > > Apache
> > > > > >> >> >> Livy
> > > > > >> >> >>>> [1]
> > > > > >> >> >>>>>>> also
> > > > > >> >> >>>>>>>>>>> for
> > > > > >> >> >>>>>>>>>>>> Flink in the future?
> > > > > >> >> >>>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>>> [1] https://livy.apache.org/
> > > > > >> >> >>>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>>> On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <
> > > > > >> >> >>> zjffdu@gmail.com
> > > > > >> >> >>>>>
> > > > > >> >> >>>>>>> wrote:
> > > > > >> >> >>>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>>>>>>> Any API we expose should not have dependencies
> > > on
> > > > > >> >> >>> the
> > > > > >> >> >>>>>>> runtime
> > > > > >> >> >>>>>>>>>>>>> (flink-runtime) package or other implementation
> > > > > >> >> >> details.
> > > > > >> >> >>> To
> > > > > >> >> >>>>> me,
> > > > > >> >> >>>>>>>> this
> > > > > >> >> >>>>>>>>>>>> means
> > > > > >> >> >>>>>>>>>>>>> that the current ClusterClient cannot be exposed
> > to
> > > > > >> >> >> users
> > > > > >> >> >>>>>> because
> > > > > >> >> >>>>>>>> it
> > > > > >> >> >>>>>>>>>>>> uses
> > > > > >> >> >>>>>>>>>>>>> quite some classes from the optimiser and runtime
> > > > > >> >> >>> packages.
> > > > > >> >> >>>>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>>>> We should change ClusterClient from class to
> > > > interface.
> > > > > >> >> >>>>>>>>>>>>> ExecutionEnvironment only uses the interface
> > > > > >> >> >> ClusterClient
> > > > > >> >> >>>>> which
> > > > > >> >> >>>>>>>>>>> should be
> > > > > >> >> >>>>>>>>>>>>> in flink-clients while the concrete
> > implementation
> > > > > >> >> >> class
> > > > > >> >> >>>>> could
> > > > > >> >> >>>>>> be
> > > > > >> >> >>>>>>>> in
> > > > > >> >> >>>>>>>>>>>>> flink-runtime.
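The proposed split might look roughly like this. All types here are self-contained placeholders; the interface shape and the REST-flavoured implementation are assumptions for illustration, not the actual Flink classes.

```java
import java.util.concurrent.CompletableFuture;

// Placeholder for a compiled job, standing in for Flink's JobGraph.
class JobGraph { }

// What flink-clients could expose: a pure interface with no dependency
// on runtime or optimizer classes. Method names are illustrative.
interface ClusterClient {
    CompletableFuture<String> submitJob(JobGraph jobGraph);
}

// What flink-runtime (or a REST-specific module) could provide: the
// concrete implementation hidden behind the interface.
public class RestClusterClientSketch implements ClusterClient {
    public CompletableFuture<String> submitJob(JobGraph jobGraph) {
        // A real implementation would POST the job graph to the cluster
        // and complete the future with the assigned job id.
        return CompletableFuture.completedFuture("job-0001");
    }

    public static void main(String[] args) throws Exception {
        ClusterClient client = new RestClusterClientSketch();
        System.out.println(client.submitJob(new JobGraph()).get()); // job-0001
    }
}
```

With this split, ExecutionEnvironment and downstream projects compile only against the interface, and the heavyweight implementation stays an internal detail.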
> > > > > >> >> >>>>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>>>>>>> What happens when a failure/restart in the
> > > client
> > > > > >> >> >>>>> happens?
> > > > > >> >> >>>>>>>> There
> > > > > >> >> >>>>>>>>>>> need
> > > > > >> >> >>>>>>>>>>>>> to be a way of re-establishing the connection to
> > > the
> > > > > >> >> >> job,
> > > > > >> >> >>>> set
> > > > > >> >> >>>>>> up
> > > > > >> >> >>>>>>>> the
> > > > > >> >> >>>>>>>>>>>>> listeners again, etc.
> > > > > >> >> >>>>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>>>> Good point.  First we need to define what does
> > > > > >> >> >>>>> failure/restart
> > > > > >> >> >>>>>> in
> > > > > >> >> >>>>>>>> the
> > > > > >> >> >>>>>>>>>>>>> client means. IIUC, that usually means a network
> > > failure
> > > > > >> >> >>> which
> > > > > >> >> >>>>> will
> > > > > >> >> >>>>>>>>>>> happen in
> > > > > >> >> >>>>>>>>>>>>> class RestClient. If my understanding is correct,
> > > > > >> >> >>>>> restart/retry
> > > > > >> >> >>>>>>>>>>> mechanism
> > > > > >> >> >>>>>>>>>>>>> should be done in RestClient.
> > > > > >> >> >>>>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>>>>
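To make the proposed class-to-interface split concrete, here is a minimal, hypothetical sketch (none of these names are the actual Flink API): a ClusterClient interface that exposes only simple types, so it could live in flink-clients, with a concrete implementation that could live elsewhere (e.g. flink-runtime).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: a ClusterClient interface that avoids exposing
// optimiser/runtime classes, so it could safely live in flink-clients.
interface ClusterClient {
    String submitJob(String serializedJobGraph); // returns a job id
    String getJobStatus(String jobId);
    void cancelJob(String jobId);
}

// A concrete implementation; in real Flink this could live in flink-runtime.
class InMemoryClusterClient implements ClusterClient {
    private final Map<String, String> jobs = new HashMap<>();
    private int counter = 0;

    @Override
    public String submitJob(String serializedJobGraph) {
        String jobId = "job-" + (++counter);
        jobs.put(jobId, "RUNNING");
        return jobId;
    }

    @Override
    public String getJobStatus(String jobId) {
        return jobs.getOrDefault(jobId, "UNKNOWN");
    }

    @Override
    public void cancelJob(String jobId) {
        if (jobs.containsKey(jobId)) {
            jobs.put(jobId, "CANCELED");
        }
    }
}
```

Downstream code would then depend only on the interface, and the environment could hand out whichever implementation matches the deployment.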
> Aljoscha Krettek <al...@apache.org> wrote on Tue, Jun 11, 2019 at 11:10 PM:
>
> > Some points to consider:
> >
> > * Any API we expose should not have dependencies on the runtime
> > (flink-runtime) package or other implementation details. To me, this
> > means that the current ClusterClient cannot be exposed to users because
> > it uses quite some classes from the optimiser and runtime packages.
> >
> > * What happens when a failure/restart in the client happens? There needs
> > to be a way of re-establishing the connection to the job, setting up the
> > listeners again, etc.
> >
> > Aljoscha
> >
> > On 29. May 2019, at 10:17, Jeff Zhang <zj...@gmail.com> wrote:
> >
> > > Sorry folks, the design doc is late as you expected. Here's the design
> > > doc I drafted, welcome any comments and feedback.
> > >
> > > https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> > >
> > > Stephan Ewen <se...@apache.org> wrote on Thu, Feb 14, 2019 at 8:43 PM:
> > >
> > > > Nice that this discussion is happening.
> > > >
> > > > In the FLIP, we could also revisit the entire role of the
> > > > environments again.
> > > >
> > > > Initially, the idea was:
> > > > - the environments take care of the specific setup for standalone
> > > >   (no setup needed), yarn, mesos, etc.
> > > > - the session ones have control over the session. The environment
> > > >   holds the session client.
> > > > - running a job gives a "control" object for that job. That behavior
> > > >   is the same in all environments.
> > > >
> > > > The actual implementation diverged quite a bit from that. Happy to
> > > > see a discussion about straightening this out a bit more.
> > > >
> > > > On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > >
> > > > > Hi folks,
> > > > >
> > > > > Sorry for the late response. It seems we have reached consensus on
> > > > > this; I will create a FLIP for this with a more detailed design.
> > > > >
> > > > > Thomas Weise <th...@apache.org> wrote on Fri, Dec 21, 2018 at 11:43 AM:
> > > > >
> > > > > > Great to see this discussion seeded! The problems you face with
> > > > > > the Zeppelin integration are also affecting other downstream
> > > > > > projects, like Beam.
> > > > > >
> > > > > > We just enabled the savepoint restore option in
> > > > > > RemoteStreamEnvironment [1] and that was more difficult than it
> > > > > > should be. The main issue is that environment and cluster client
> > > > > > aren't decoupled. Ideally it should be possible to just get the
> > > > > > matching cluster client from the environment and then control the
> > > > > > job through it (environment as factory for cluster client). But
> > > > > > note that the environment classes are part of the public API, and
> > > > > > it is not straightforward to make larger changes without breaking
> > > > > > backward compatibility.
> > > > > >
> > > > > > ClusterClient currently exposes internal classes like JobGraph
> > > > > > and StreamGraph. But it should be possible to wrap this with a
> > > > > > new public API that brings the required job control capabilities
> > > > > > for downstream projects. Perhaps it is helpful to look at some of
> > > > > > the interfaces in Beam while thinking about this: [2] for the
> > > > > > portable job API and [3] for the old asynchronous job control
> > > > > > from the Beam Java SDK.
> > > > > >
> > > > > > The backward compatibility discussion [4] is also relevant here.
> > > > > > A new API should shield downstream projects from internals and
> > > > > > allow them to interoperate with multiple future Flink versions in
> > > > > > the same release line without forced upgrades.
> > > > > >
> > > > > > Thanks,
> > > > > > Thomas
> > > > > >
> > > > > > [1] https://github.com/apache/flink/pull/7249
> > > > > > [2] https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > > > > > [3] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > > > > > [4] https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> > > > > >
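Following Thomas's pointer to Beam's PipelineResult, a downstream-facing job handle wrapping the ClusterClient could look roughly like the sketch below. All names here are illustrative assumptions, not an existing Flink or Beam API; the point is that downstream projects see only a small, stable interface, never JobGraph/StreamGraph.

```java
// Hypothetical sketch of a public job-control handle, loosely modeled on
// Beam's PipelineResult: job id, state, cancel, and blocking wait.
interface JobHandle {
    enum State { RUNNING, DONE, CANCELED, FAILED }
    String getJobId();
    State getState();
    State cancel();
    State waitUntilFinish();
}

// Toy implementation: the job "finishes" as soon as we wait on it.
class LocalJobHandle implements JobHandle {
    private final String jobId;
    private State state = State.RUNNING;

    LocalJobHandle(String jobId) { this.jobId = jobId; }

    public String getJobId() { return jobId; }
    public State getState() { return state; }

    public State cancel() {
        state = State.CANCELED;
        return state;
    }

    public State waitUntilFinish() {
        // A real implementation would poll the cluster; here it completes
        // immediately so the sketch stays self-contained.
        if (state == State.RUNNING) state = State.DONE;
        return state;
    }
}
```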
> > > > > > On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > > >
> > > > > > > > I'm not so sure whether the user should be able to define
> > > > > > > > where the job runs (in your example Yarn). This is actually
> > > > > > > > independent of the job development and is something which is
> > > > > > > > decided at deployment time.
> > > > > > >
> > > > > > > Users don't need to specify the execution mode
> > > > > > > programmatically. They can also pass the execution mode via the
> > > > > > > arguments of the flink run command, e.g.
> > > > > > >
> > > > > > > bin/flink run -m yarn-cluster ...
> > > > > > > bin/flink run -m local ...
> > > > > > > bin/flink run -m host:port ...
> > > > > > >
> > > > > > > Does this make sense to you?
> > > > > > >
> > > > > > > > To me it makes sense that the ExecutionEnvironment is not
> > > > > > > > directly initialized by the user and is instead context
> > > > > > > > sensitive to how you want to execute your job (Flink CLI vs.
> > > > > > > > IDE, for example).
> > > > > > >
> > > > > > > Right. Currently I notice Flink creates different
> > > > > > > ContextExecutionEnvironments based on the submission scenario
> > > > > > > (Flink CLI vs. IDE). To me this is kind of a hack, not very
> > > > > > > straightforward. What I suggested above is that Flink should
> > > > > > > always create the same ExecutionEnvironment, just with
> > > > > > > different configuration, and based on that configuration it
> > > > > > > would create the proper ClusterClient for the different
> > > > > > > behaviors.
> > > > > > >
> > > > > > > Till Rohrmann <tr...@apache.org> wrote on Thu, Dec 20, 2018 at 11:18 PM:
> > > > > > >
> > > > > > > > You are probably right that we have code duplication when it
> > > > > > > > comes to the creation of the ClusterClient. This should be
> > > > > > > > reduced in the future.
> > > > > > > >
> > > > > > > > I'm not so sure whether the user should be able to define
> > > > > > > > where the job runs (in your example Yarn). This is actually
> > > > > > > > independent of the job development and is something which is
> > > > > > > > decided at deployment time. To me it makes sense that the
> > > > > > > > ExecutionEnvironment is not directly initialized by the user
> > > > > > > > and instead context sensitive how you want to execute your
> > > > > > > > job (Flink CLI vs. IDE, for example). However, I agree that
> > > > > > > > the ExecutionEnvironment should give you access to the
> > > > > > > > ClusterClient and to the job (maybe in the form of the
> > > > > > > > JobGraph or a job plan).
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Till
> > > > > > > >
> > > > > > > > On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Till,
> > > > > > > > > Thanks for the feedback. You are right that I expect a
> > > > > > > > > better programmatic job submission/control API which could
> > > > > > > > > be used by downstream projects, and it would benefit the
> > > > > > > > > Flink ecosystem. When I look at the code of the Flink
> > > > > > > > > scala-shell and sql-client (I believe they are not the core
> > > > > > > > > of Flink, but belong to the ecosystem of Flink), I find a
> > > > > > > > > lot of duplicated code for creating a ClusterClient from
> > > > > > > > > user-provided configuration (the configuration format may
> > > > > > > > > differ between scala-shell and sql-client) and then using
> > > > > > > > > that ClusterClient to manipulate jobs. I don't think this
> > > > > > > > > is convenient for downstream projects. What I expect is
> > > > > > > > > that a downstream project only needs to provide the
> > > > > > > > > necessary configuration info (maybe introducing a class
> > > > > > > > > FlinkConf), build an ExecutionEnvironment based on this
> > > > > > > > > FlinkConf, and the ExecutionEnvironment will create the
> > > > > > > > > proper ClusterClient. That would not only benefit
> > > > > > > > > downstream project development but also help their
> > > > > > > > > integration tests with Flink. Here's one sample code
> > > > > > > > > snippet of what I expect:
> > > > > > > > >
> > > > > > > > > val conf = new FlinkConf().mode("yarn")
> > > > > > > > > val env = new ExecutionEnvironment(conf)
> > > > > > > > > val jobId = env.submit(...)
> > > > > > > > > val jobStatus = env.getClusterClient().queryJobStatus(jobId)
> > > > > > > > > env.getClusterClient().cancelJob(jobId)
> > > > > > > > >
> > > > > > > > > What do you think?
> > > > > > > > >
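In Java terms, the config-driven environment Jeff sketches could look roughly like this. Everything below is a hypothetical illustration (FlinkConf, the setting key, and the client class names are invented for the example, not real Flink API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical: a small config object from which the environment derives
// which kind of cluster client to create, instead of hard-coding the mode.
class FlinkConf {
    private final Map<String, String> settings = new HashMap<>();

    FlinkConf mode(String mode) {
        settings.put("execution.mode", mode);
        return this;
    }

    String get(String key, String fallback) {
        return settings.getOrDefault(key, fallback);
    }
}

class ExecutionEnvironmentSketch {
    private final FlinkConf conf;

    ExecutionEnvironmentSketch(FlinkConf conf) {
        this.conf = conf;
    }

    // One environment class; the configuration alone decides the client type.
    String clusterClientType() {
        switch (conf.get("execution.mode", "local")) {
            case "yarn":   return "YarnClusterClient";
            case "remote": return "RestClusterClient";
            default:       return "MiniClusterClient";
        }
    }
}
```

With this shape, scala-shell, sql-client, and Zeppelin would all build the same environment from a config instead of duplicating client-construction code.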
> > > > > > > > > Till Rohrmann <tr...@apache.org> wrote on Tue, Dec 11, 2018 at 6:28 PM:
> > > > > > > > >
> > > > > > > > > > Hi Jeff,
> > > > > > > > > >
> > > > > > > > > > What you are proposing is to provide the user with
> > > > > > > > > > better programmatic job control. There was actually an
> > > > > > > > > > effort to achieve this, but it has never been completed
> > > > > > > > > > [1]. However, there are some improvements in the code
> > > > > > > > > > base now. Look for example at the NewClusterClient
> > > > > > > > > > interface, which offers a non-blocking job submission.
> > > > > > > > > > But I agree that we need to improve Flink in this regard.
> > > > > > > > > >
> > > > > > > > > > I would not be in favour of exposing all ClusterClient
> > > > > > > > > > calls via the ExecutionEnvironment because it would
> > > > > > > > > > clutter the class and would not be a good separation of
> > > > > > > > > > concerns. Instead, one idea could be to retrieve the
> > > > > > > > > > current ClusterClient from the ExecutionEnvironment,
> > > > > > > > > > which can then be used for cluster and job control. But
> > > > > > > > > > before we start an effort here, we need to agree on and
> > > > > > > > > > capture what functionality we want to provide.
> > > > > > > > > >
> > > > > > > > > > Initially, the idea was that we have the
> > > > > > > > > > ClusterDescriptor describing how to talk to a cluster
> > > > > > > > > > manager like Yarn or Mesos. The ClusterDescriptor can be
> > > > > > > > > > used for deploying Flink clusters (job and session) and
> > > > > > > > > > gives you a ClusterClient. The ClusterClient controls the
> > > > > > > > > > cluster (e.g. submitting jobs, listing all running jobs).
> > > > > > > > > > And then there was the idea to introduce a JobClient,
> > > > > > > > > > which you obtain from the ClusterClient, to trigger job
> > > > > > > > > > specific operations (e.g. taking a savepoint, cancelling
> > > > > > > > > > the job).
> > > > > > > > > >
> > > > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-4272
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Till
> > > > > > > > > >
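The three-level layering Till describes (descriptor deploys a cluster, cluster client manages it, job client controls one job) can be sketched as follows. This is an assumed shape with invented names and a toy in-memory implementation, not the real Flink classes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical layering sketch:
// ClusterDescriptor -> SessionClusterClient -> JobClient.
interface JobClient {
    String getJobId();
    void cancel();
}

interface SessionClusterClient {
    JobClient submitJob(String jobName);
    List<String> listJobs();
}

interface ClusterDescriptor {
    SessionClusterClient deploySessionCluster();
}

// Toy descriptor that "deploys" an in-memory session cluster.
class StandaloneDescriptor implements ClusterDescriptor {
    public SessionClusterClient deploySessionCluster() {
        return new SessionClusterClient() {
            private final List<String> jobs = new ArrayList<>();

            public JobClient submitJob(String jobName) {
                final String id = "job-" + jobs.size();
                jobs.add(id);
                return new JobClient() {
                    public String getJobId() { return id; }
                    public void cancel() { jobs.remove(id); }
                };
            }

            public List<String> listJobs() { return jobs; }
        };
    }
}
```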
> > > > > > > > > > On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Folks,
> > > > > > > > > > >
> > > > > > > > > > > I am trying to integrate Flink into Apache Zeppelin,
> > > > > > > > > > > which is an interactive notebook, and I hit several
> > > > > > > > > > > issues that are caused by the Flink client API. So I'd
> > > > > > > > > > > like to propose the following changes to the Flink
> > > > > > > > > > > client API.
> > > > > > > > > > >
> > > > > > > > > > > 1. Support nonblocking execution. Currently,
> > > > > > > > > > > ExecutionEnvironment#execute is a blocking method that
> > > > > > > > > > > does two things: first it submits the job, then it
> > > > > > > > > > > waits for the job until it is finished. I'd like to
> > > > > > > > > > > introduce a nonblocking execution method like
> > > > > > > > > > > ExecutionEnvironment#submit which only submits the job
> > > > > > > > > > > and then returns the jobId to the client, and allow the
> > > > > > > > > > > user to query the job status via the jobId.
> > > > > > > > > > >
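The blocking/nonblocking split in point 1 can be sketched as below. The class and method names are hypothetical stand-ins (a real design would return a proper JobID and result type), but they show how a blocking execute() can be layered on top of a nonblocking submit():

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch contrasting today's blocking execute() with a
// nonblocking submit() that returns a job id immediately.
class EnvironmentSketch {
    private CompletableFuture<String> lastJob;

    // Nonblocking: kick off the job and return a job id right away;
    // the final status can be queried later through the future.
    String submit() {
        lastJob = CompletableFuture.supplyAsync(() -> "FINISHED");
        return "job-42";
    }

    // Blocking execute(), expressible on top of submit(): wait for the
    // job to reach a final state before returning.
    String execute() {
        submit();
        return lastJob.join(); // blocks until the job finishes
    }
}
```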
> > > > > > > > > > > 2. Add a cancel API in
> > > > > > > > > > > ExecutionEnvironment/StreamExecutionEnvironment.
> > > > > > > > > > > Currently the only way to cancel a job is via the CLI
> > > > > > > > > > > (bin/flink), which is not convenient for downstream
> > > > > > > > > > > projects to use. So I'd like to add a cancel API in
> > > > > > > > > > > ExecutionEnvironment.
> > > > > > > > > > >
> > > > > > > > > > > 3. Add a savepoint API in
> > > > > > > > > > > ExecutionEnvironment/StreamExecutionEnvironment. It is
> > > > > > > > > > > similar to the cancel API; we should use
> > > > > > > > > > > ExecutionEnvironment as the unified API for third
> > > > > > > > > > > parties to integrate with Flink.
> > > > > > > > > > >
> > > > > > > > > > > 4. Add a listener for the job execution lifecycle,
> > > > > > > > > > > something like the following, so that downstream
> > > > > > > > > > > projects can run custom logic in the lifecycle of a
> > > > > > > > > > > job. E.g. Zeppelin would capture the jobId after the
> > > > > > > > > > > job is submitted and then use this jobId to cancel it
> > > > > > > > > > > later when necessary.
> > > > > > > > > > >
> > > > > > > > > > > public interface JobListener {
> > > > > > > > > > >
> > > > > > > > > > >   void onJobSubmitted(JobID jobId);
> > > > > > > > > > >
> > > > > > > > > > >   void onJobExecuted(JobExecutionResult jobResult);
> > > > > > > > > > >
> > > > > > > > > > >   void onJobCanceled(JobID jobId);
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 5. Enable session in
> > > ExecutionEnvironment.
> > > > > >> >> >>>>>> Currently
> > > > > >> >> >>>>>>> it
> > > > > >> >> >>>>>>>>>>> is
> > > > > >> >> >>>>>>>>>>>>>>>>>>> disabled,
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>> but
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> session is very convenient for third
> > > party
> > > > > >> >> >> to
> > > > > >> >> >>>>>>>> submitting
> > > > > >> >> >>>>>>>>>>> jobs
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> continually.
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> I hope flink can enable it again.
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 6. Unify all flink client api into
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >> >> >>>> ExecutionEnvironment/StreamExecutionEnvironment.
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> This is a long term issue which needs
> > > more
> > > > > >> >> >>>>> careful
> > > > > >> >> >>>>>>>>>>> thinking
> > > > > >> >> >>>>>>>>>>>>>>>> and
> > > > > >> >> >>>>>>>>>>>>>>>>>>>> design.
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Currently some of features of flink is
> > > > > >> >> >>> exposed
> > > > > >> >> >>>> in
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >> >> >>>> ExecutionEnvironment/StreamExecutionEnvironment,
> > > > > >> >> >>>>>> but
> > > > > >> >> >>>>>>>>>>> some are
> > > > > >> >> >>>>>>>>>>>>>>>>>>> exposed
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>> in
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> cli instead of api, like the cancel and
> > > > > >> >> >>>>> savepoint I
> > > > > >> >> >>>>>>>>>>> mentioned
> > > > > >> >> >>>>>>>>>>>>>>>>>>> above.
> > > > > >> >> >>>>>>>>>>>>>>>>>>>> I
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> think the root cause is due to that
> > flink
> > > > > >> >> >>>> didn't
> > > > > >> >> >>>>>>> unify
> > > > > >> >> >>>>>>>>>>> the
> > > > > >> >> >>>>>>>>>>>>>>>>>>>> interaction
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> with
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> flink. Here I list 3 scenarios of flink
> > > > > >> >> >>>> operation
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Local job execution.  Flink will
> > > create
> > > > > >> >> >>>>>>>>>>> LocalEnvironment
> > > > > >> >> >>>>>>>>>>>>>>>>> and
> > > > > >> >> >>>>>>>>>>>>>>>>>>>> then
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> use
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  this LocalEnvironment to create
> > > > > >> >> >>> LocalExecutor
> > > > > >> >> >>>>> for
> > > > > >> >> >>>>>>> job
> > > > > >> >> >>>>>>>>>>>>>>>>>> execution.
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Remote job execution. Flink will
> > > create
> > > > > >> >> >>>>>>>> ClusterClient
> > > > > >> >> >>>>>>>>>>>>>>>>> first
> > > > > >> >> >>>>>>>>>>>>>>>>>>> and
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>> then
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  create ContextEnvironment based on the
> > > > > >> >> >>>>>>> ClusterClient
> > > > > >> >> >>>>>>>>>>> and
> > > > > >> >> >>>>>>>>>>>>>>>>> then
> > > > > >> >> >>>>>>>>>>>>>>>>>>> run
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>> the
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> job.
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Job cancelation. Flink will create
> > > > > >> >> >>>>>> ClusterClient
> > > > > >> >> >>>>>>>>>>> first
> > > > > >> >> >>>>>>>>>>>>>>>> and
> > > > > >> >> >>>>>>>>>>>>>>>>>>> then
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> cancel
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  this job via this ClusterClient.
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> As you can see in the above 3
> > scenarios.
> > > > > >> >> >>> Flink
> > > > > >> >> >>>>>> didn't
> > > > > >> >> >>>>>>>>>>> use the
> > > > > >> >> >>>>>>>>>>>>>>>>>> same
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> approach(code path) to interact with
> > > flink
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> What I propose is following:
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Create the proper
> > > > > >> >> >>>>>> LocalEnvironment/RemoteEnvironment
> > > > > >> >> >>>>>>>>>>> (based
> > > > > >> >> >>>>>>>>>>>>>>>> on
> > > > > >> >> >>>>>>>>>>>>>>>>>> user
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> configuration) --> Use this Environment
> > > to
> > > > > >> >> >>>> create
> > > > > >> >> >>>>>>>> proper
> > > > > >> >> >>>>>>>>>>>>>>>>>>>> ClusterClient
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> (LocalClusterClient or
> > RestClusterClient)
> > > > > >> >> >> to
> > > > > >> >> >>>>>>>> interactive
> > > > > >> >> >>>>>>>>>>> with
> > > > > >> >> >>>>>>>>>>>>>>>>>>> Flink (
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>> job
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> execution or cancelation)
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> This way we can unify the process of
> > > local
> > > > > >> >> >>>>>> execution
> > > > > >> >> >>>>>>>> and
> > > > > >> >> >>>>>>>>>>>>>>>> remote
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> execution.
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> And it is much easier for third party
> > to
> > > > > >> >> >>>>> integrate
> > > > > >> >> >>>>>>> with
> > > > > >> >> >>>>>>>>>>>>>>>> flink,
> > > > > >> >> >>>>>>>>>>>>>>>>>>>> because
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment is the unified
> > entry
> > > > > >> >> >>> point
> > > > > >> >> >>>>> for
> > > > > >> >> >>>>>>>>>>> flink.
> > > > > >> >> >>>>>>>>>>>>>>>> What
> > > > > >> >> >>>>>>>>>>>>>>>>>>> third
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> party
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> needs to do is just pass configuration
> > to
> > > > > >> >> >>>>>>>>>>>>>>>> ExecutionEnvironment
> > > > > >> >> >>>>>>>>>>>>>>>>>> and
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment will do the right
> > > > > >> >> >> thing
> > > > > >> >> >>>>> based
> > > > > >> >> >>>>>> on
> > > > > >> >> >>>>>>>> the
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>> configuration.
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Flink cli can also be considered as
> > flink
> > > > > >> >> >> api
> > > > > >> >> >>>>>>> consumer.
> > > > > >> >> >>>>>>>>>>> it
> > > > > >> >> >>>>>>>>>>>>>>>> just
> > > > > >> >> >>>>>>>>>>>>>>>>>>> pass
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>> the
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> configuration to ExecutionEnvironment
> > and
> > > > > >> >> >> let
> > > > > >> >> >>>>>>>>>>>>>>>>>> ExecutionEnvironment
> > > > > >> >> >>>>>>>>>>>>>>>>>>> to
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> create the proper ClusterClient instead
> > > of
> > > > > >> >> >>>>> letting
> > > > > >> >> >>>>>>> cli
> > > > > >> >> >>>>>>>> to
> > > > > >> >> >>>>>>>>>>>>>>>>> create
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ClusterClient directly.
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 6 would involve large code refactoring,
> > > so
> > > > > >> >> >> I
> > > > > >> >> >>>>> think
> > > > > >> >> >>>>>> we
> > > > > >> >> >>>>>>>> can
> > > > > >> >> >>>>>>>>>>>>>>>> defer
> > > > > >> >> >>>>>>>>>>>>>>>>>> it
> > > > > >> >> >>>>>>>>>>>>>>>>>>>> for
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> future release, 1,2,3,4,5 could be done
> > > at
> > > > > >> >> >>>> once I
> > > > > >> >> >>>>>>>>>>> believe.
> > > > > >> >> >>>>>>>>>>>>>>>> Let
> > > > > >> >> >>>>>>>>>>>>>>>>> me
> > > > > >> >> >>>>>>>>>>>>>>>>>>>> know
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>> your
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> comments and feedback, thanks
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> --
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Best Regards
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
> > > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
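The JobListener idea quoted above is easy to sketch end to end. The snippet below illustrates how a downstream project such as Zeppelin could capture the jobId on submission and keep it around for a later cancel. The stand-in `JobID`/`JobExecutionResult` types and the way the listener is invoked are assumptions for illustration only, not existing Flink API at the time of this discussion.

```java
import java.util.ArrayList;
import java.util.List;

public class JobListenerSketch {

    // Stand-ins for Flink's JobID and JobExecutionResult, so the sketch
    // is self-contained and compiles without a Flink dependency.
    static final class JobID {
        final String value;
        JobID(String value) { this.value = value; }
    }

    static final class JobExecutionResult {
        final JobID jobId;
        JobExecutionResult(JobID jobId) { this.jobId = jobId; }
    }

    // The interface exactly as proposed in the quoted mail.
    interface JobListener {
        void onJobSubmitted(JobID jobId);
        void onJobExecuted(JobExecutionResult jobResult);
        void onJobCanceled(JobID jobId);
    }

    // A downstream listener that records jobIds on submission so the
    // integrating project can cancel them later when necessary.
    static final class CapturingListener implements JobListener {
        final List<String> submittedJobIds = new ArrayList<>();
        @Override public void onJobSubmitted(JobID jobId) { submittedJobIds.add(jobId.value); }
        @Override public void onJobExecuted(JobExecutionResult jobResult) { }
        @Override public void onJobCanceled(JobID jobId) { }
    }

    public static void main(String[] args) {
        CapturingListener listener = new CapturingListener();
        // In the proposal, the environment would invoke this around execute().
        listener.onJobSubmitted(new JobID("job-42"));
        System.out.println(listener.submittedJobIds.get(0)); // prints job-42
    }
}
```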


Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Zili Chen <wa...@gmail.com>.
Hi Yang,

It would be helpful if you check Stephan's last comment,
which states that isolation is important.

For per-job mode, we run a dedicated cluster (maybe it
should have been a couple of JMs and TMs in the FLIP-6
design) for a specific job. Thus the process is isolated
from other jobs.

In our case there was a time when we suffered from multiple
jobs submitted by different users that affected each other,
so that all of them ran into an error state. Also, running
the client inside the cluster can save client resources at
some points.

However, we also face several issues, as you mentioned: in
per-job mode the parent classloader is always used, thus
classloading issues occur.

BTW, one can make an analogy between the session/per-job
modes in Flink and the client/cluster modes in Spark.

Best,
tison.


Yang Wang <da...@gmail.com> 于2019年8月22日周四 上午11:25写道:

> From the user's perspective, the scope of a per-job cluster is really
> confusing.
>
> If it means a Flink cluster with a single job, so that we could get better
> isolation,
>
> then it does not matter how we deploy the cluster: directly deploy (mode 1)
>
> or start a Flink cluster and then submit the job through a cluster client
> (mode 2).
>
> Otherwise, if it just means directly deploy, how should we name mode 2:
>
> session with job, or something else?
>
> We could also benefit from mode 2. Users could get the same isolation as
> with mode 1.
>
> The user code and dependencies will be loaded by the user classloader
>
> to avoid class conflicts with the framework.
>
> Anyway, both of the two submission modes are useful.
>
> We just need to clarify the concepts.
>
>
>
>
> Best,
>
> Yang
>
> Zili Chen <wa...@gmail.com> 于2019年8月20日周二 下午5:58写道:
>
> > Thanks for the clarification.
> >
> > The idea of a JobDeployer once came into my mind when I was puzzling
> > over how to execute per-job mode and session mode with the same user
> > code and framework code path.
> >
> > With the JobDeployer concept we return to the statement that the
> > environment knows every config for cluster deployment and job
> > submission. We configure, or generate from configuration, a specific
> > JobDeployer in the environment, and then the code aligns on
> >
> > *JobClient client = env.execute().get();*
> >
> > which in session mode is returned by clusterClient.submitJob and in
> > per-job mode by clusterDescriptor.deployJobCluster.
> >
> > Here comes a problem: currently we directly run the ClusterEntrypoint
> > with the extracted job graph. Following the JobDeployer way, we had
> > better align the entry point of per-job deployment at the JobDeployer.
> > Users run their main method, or a Cli (which finally calls the main
> > method), to deploy the job cluster.
> >
> > Best,
> > tison.
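tison's JobDeployer idea can be made concrete in a few lines of Java. The deployer, client and method names below are assumptions drawn from this thread, not existing Flink classes; the only point of the sketch is that user code aligns on one call site regardless of session or per-job mode.

```java
import java.util.concurrent.CompletableFuture;

public class JobDeployerSketch {

    interface JobClient {
        String jobId();
    }

    // One deployer per mode, chosen from configuration by the environment.
    interface JobDeployer {
        CompletableFuture<JobClient> deploy(String jobGraph);
    }

    // Session mode: submit to an existing cluster
    // (would delegate to clusterClient.submitJob).
    static final class SessionDeployer implements JobDeployer {
        public CompletableFuture<JobClient> deploy(String jobGraph) {
            return CompletableFuture.completedFuture(() -> "session-job");
        }
    }

    // Per-job mode: deploy a dedicated cluster bundled with the job
    // (would delegate to clusterDescriptor.deployJobCluster).
    static final class PerJobDeployer implements JobDeployer {
        public CompletableFuture<JobClient> deploy(String jobGraph) {
            return CompletableFuture.completedFuture(() -> "per-job");
        }
    }

    static final class Environment {
        private final JobDeployer deployer;
        Environment(boolean perJob) {
            this.deployer = perJob ? new PerJobDeployer() : new SessionDeployer();
        }
        CompletableFuture<JobClient> execute() {
            return deployer.deploy("job-graph");
        }
    }

    public static void main(String[] args) throws Exception {
        // User code aligns on one call site, whatever the mode:
        JobClient client = new Environment(false).execute().get();
        System.out.println(client.jobId()); // prints session-job
    }
}
```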
> >
> >
> > Stephan Ewen <se...@apache.org> 于2019年8月20日周二 下午4:40写道:
> >
> > > Till has made some good comments here.
> > >
> > > Two things to add:
> > >
> > >   - The job mode is very nice in the way that it runs the client inside
> > the
> > > cluster (in the same image/process that is the JM) and thus unifies
> both
> > > applications and what the Spark world calls the "driver mode".
> > >
> > >   - Another thing I would add is that during the FLIP-6 design, we were
> > > thinking about setups where Dispatcher and JobManager are separate
> > > processes.
> > >     A Yarn or Mesos Dispatcher of a session could run independently
> (even
> > > as privileged processes executing no code).
> > >     Then the "per-job" mode could still be helpful: when a job is
> > > submitted to the dispatcher, it launches the JM again in a per-job
> > > mode, so that JM and TM processes are bound to the job only. For
> > > higher-security setups, it is important that processes are not reused
> > > across jobs.
> > >
> > > On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <tr...@apache.org>
> > > wrote:
> > >
> > > > I would not be in favour of getting rid of the per-job mode since it
> > > > simplifies the process of running Flink jobs considerably. Moreover,
> it
> > > is
> > > > not only well suited for container deployments but also for
> deployments
> > > > where you want to guarantee job isolation. For example, a user could
> > use
> > > > the per-job mode on Yarn to execute his job on a separate cluster.
> > > >
> > > > I think that having two notions of cluster deployments (session vs.
> > > per-job
> > > > mode) does not necessarily contradict your ideas for the client api
> > > > refactoring. For example one could have the following interfaces:
> > > >
> > > > - ClusterDeploymentDescriptor: encapsulates the logic how to deploy a
> > > > cluster.
> > > > - ClusterClient: allows to interact with a cluster
> > > > - JobClient: allows to interact with a running job
> > > >
> > > > Now the ClusterDeploymentDescriptor could have two methods:
> > > >
> > > > - ClusterClient deploySessionCluster()
> > > > - JobClusterClient/JobClient deployPerJobCluster(JobGraph)
> > > >
> > > > where JobClusterClient is either a supertype of ClusterClient which
> > does
> > > > not give you the functionality to submit jobs or deployPerJobCluster
> > > > returns directly a JobClient.
> > > >
> > > > When setting up the ExecutionEnvironment, one would then not provide
> a
> > > > ClusterClient to submit jobs but a JobDeployer which, depending on
> the
> > > > selected mode, either uses a ClusterClient (session mode) to submit
> > jobs
> > > or
> > > > a ClusterDeploymentDescriptor to deploy a per-job mode cluster with
> > > > the job to execute.
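Till's three-interface split can be written down as plain Java for reference. `JobGraph` is a stand-in and every signature here is an assumption derived from his description above, not settled API.

```java
import java.util.concurrent.CompletableFuture;

public class ClientInterfacesSketch {

    static final class JobGraph { }

    // Interact with a single running job.
    interface JobClient {
        CompletableFuture<Void> cancel();
        CompletableFuture<String> triggerSavepoint(String targetDirectory);
    }

    // Interact with a cluster, e.g. submit jobs to a session.
    interface ClusterClient {
        CompletableFuture<JobClient> submitJob(JobGraph jobGraph);
    }

    // Encapsulate how to deploy a cluster, in both flavours. Per-job
    // deployment hands back a JobClient directly, since the cluster
    // exists only for this one job.
    interface ClusterDeploymentDescriptor {
        ClusterClient deploySessionCluster();
        JobClient deployPerJobCluster(JobGraph jobGraph);
    }

    // A trivial in-memory descriptor, just to show the shape of the calls.
    static ClusterDeploymentDescriptor descriptor() {
        return new ClusterDeploymentDescriptor() {
            public ClusterClient deploySessionCluster() {
                return jobGraph -> CompletableFuture.completedFuture(noOpJobClient());
            }
            public JobClient deployPerJobCluster(JobGraph jobGraph) {
                return noOpJobClient();
            }
        };
    }

    static JobClient noOpJobClient() {
        return new JobClient() {
            public CompletableFuture<Void> cancel() {
                return CompletableFuture.completedFuture(null);
            }
            public CompletableFuture<String> triggerSavepoint(String targetDirectory) {
                return CompletableFuture.completedFuture(targetDirectory);
            }
        };
    }

    public static void main(String[] args) {
        JobClient perJob = descriptor().deployPerJobCluster(new JobGraph());
        System.out.println(perJob.triggerSavepoint("savepoints/sp-1").join());
    }
}
```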
> > > >
> > > > These are just some thoughts how one could make it working because I
> > > > believe there is some value in using the per job mode from the
> > > > ExecutionEnvironment.
> > > >
> > > > Concerning the web submission, this is indeed a bit tricky. From a
> > > cluster
> > > > management stand point, I would in favour of not executing user code
> on
> > > the
> > > > REST endpoint. Especially when considering security, it would be good
> > to
> > > > have a well defined cluster behaviour where it is explicitly stated
> > where
> > > > user code and, thus, potentially risky code is executed. Ideally we
> > limit
> > > > it to the TaskExecutor and JobMaster.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Tue, Aug 20, 2019 at 9:40 AM Flavio Pompermaier <
> > pompermaier@okkam.it
> > > >
> > > > wrote:
> > > >
> > > > > In my opinion the client should not use any environment to get the
> > > > > job graph, because the jar should reside ONLY on the cluster (and
> > > > > not on the client classpath; otherwise there are always
> > > > > inconsistencies between the client's and the Flink Job Manager's
> > > > > classpaths).
> > > > > In the YARN, Mesos and Kubernetes scenarios you have the jar, but
> > > > > you could start a cluster that has the jar on the Job Manager as
> > > > > well (and this is the only case where I think you can assume that
> > > > > the client has the jar on the classpath; in the REST job submission
> > > > > you don't have any classpath).
> > > > >
> > > > > Thus, again in my opinion, the JobGraph should be generated by the
> > > > > Job Manager REST API.
> > > > >
> > > > >
> > > > > On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <wa...@gmail.com>
> > > wrote:
> > > > >
> > > > >> I would like to involve Till & Stephan here to clarify some
> concept
> > of
> > > > >> per-job mode.
> > > > >>
> > > > > >> The term per-job is one of the modes a cluster can run in. It
> > > > > >> is mainly aimed at spawning a dedicated cluster for a specific
> > > > > >> job; the job can be packaged with Flink itself, so the cluster
> > > > > >> is initialized with the job and we get rid of a separate
> > > > > >> submission step.
> > > > > >>
> > > > > >> This is useful for container deployments, where one creates an
> > > > > >> image with the job and then simply deploys the container.
> > > > > >>
> > > > > >> However, it is out of the client's scope, since a client
> > > > > >> (ClusterClient for example) is for communicating with an
> > > > > >> existing cluster and performing actions. Currently, in per-job
> > > > > >> mode, we extract the job graph and bundle it into the cluster
> > > > > >> deployment, so no concept of a client is involved. It looks
> > > > > >> reasonable to exclude the deployment of per-job clusters from
> > > > > >> the client api and use dedicated utility classes (deployers)
> > > > > >> for deployment.
> > > > >>
> > > > >> Zili Chen <wa...@gmail.com> 于2019年8月20日周二 下午12:37写道:
> > > > >>
> > > > >> > Hi Aljoscha,
> > > > >> >
> > > > >> > Thanks for your reply and participance. The Google Doc you
> linked
> > to
> > > > >> > requires
> > > > >> > permission and I think you could use a share link instead.
> > > > >> >
> > > > >> > I agree with that we almost reach a consensus that JobClient is
> > > > >> necessary
> > > > >> > to
> > > > > >> > interact with a running Job.
> > > > >> >
> > > > >> > Let me check your open questions one by one.
> > > > >> >
> > > > >> > 1. Separate cluster creation and job submission for per-job
> mode.
> > > > >> >
> > > > >> > As you mentioned here is where the opinions diverge. In my
> > document
> > > > >> there
> > > > >> > is
> > > > >> > an alternative[2] that proposes excluding per-job deployment
> from
> > > > client
> > > > >> > api
> > > > >> > scope and now I find it is more reasonable we do the exclusion.
> > > > >> >
> > > > >> > When in per-job mode, a dedicated JobCluster is launched to
> > execute
> > > > the
> > > > >> > specific job. It is like a Flink Application more than a
> > submission
> > > > >> > of Flink Job. Client only takes care of job submission and
> assume
> > > > there
> > > > >> is
> > > > >> > an existing cluster. In this way we are able to consider per-job
> > > > issues
> > > > >> > individually and JobClusterEntrypoint would be the utility class
> > for
> > > > >> > per-job
> > > > >> > deployment.
> > > > >> >
> > > > >> > Nevertheless, user program works in both session mode and
> per-job
> > > mode
> > > > >> > without
> > > > >> > necessary to change code. JobClient in per-job mode is returned
> > from
> > > > > >> > env.execute as normal. However, it would no longer be a wrapper
> of
> > > > >> > RestClusterClient but a wrapper of PerJobClusterClient which
> > > > >> communicates
> > > > >> > to Dispatcher locally.
> > > > >> >
> > > > >> > 2. How to deal with plan preview.
> > > > >> >
> > > > >> > With env.compile functions users can get JobGraph or FlinkPlan
> and
> > > > thus
> > > > >> > they can preview the plan with programming. Typically it looks
> > like
> > > > >> >
> > > > > >> > if (previewConfigured) {
> > > > >> >     FlinkPlan plan = env.compile();
> > > > >> >     new JSONDumpGenerator(...).dump(plan);
> > > > >> > } else {
> > > > >> >     env.execute();
> > > > >> > }
> > > > >> >
> > > > >> > And `flink info` would be invalid any more.
> > > > >> >
> > > > >> > 3. How to deal with Jar Submission at the Web Frontend.
> > > > >> >
> > > > > >> > There is one more thread on this topic[1]. Apart from removing
> > > > > >> > the functions there are two alternatives.
> > > > >> >
> > > > > >> > One is to introduce an interface that has a method returning a
> > > > > >> > JobGraph/FlinkPlan, and Jar Submission would only support main
> > > > > >> > classes that implement this interface.
> > > > > >> > We then extract the JobGraph/FlinkPlan just by calling the method.
> > > > >> > In this way, it is even possible to consider a separation of job
> > > > >> creation
> > > > >> > and job submission.
> > > > >> >
> > > > >> > The other is, as you mentioned, let execute() do the actual
> > > execution.
> > > > >> > We won't execute the main method in the WebFrontend but spawn a
> > > > process
> > > > >> > at WebMonitor side to execute. For return part we could generate
> > the
> > > > > >> > JobID from the WebMonitor and pass it to the execution environment.
> > > > >> >
> > > > >> > 4. How to deal with detached mode.
> > > > >> >
> > > > >> > I think detached mode is a temporary solution for non-blocking
> > > > >> submission.
> > > > >> > In my document both submission and execution return a
> > > > CompletableFuture
> > > > >> and
> > > > > >> > users control whether or not to wait for the result. With this
> > > > > >> > we don't need a detached option, but the functionality is
> > > > > >> > covered.
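The CompletableFuture-based contract tison describes would subsume the detached/attached distinction: "detached" is simply not waiting on the future. A minimal sketch, with all names assumed for illustration:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class SubmissionFutureSketch {

    static final class JobResult {
        final String jobId;
        JobResult(String jobId) { this.jobId = jobId; }
    }

    // Stand-in for an environment whose execute() returns a future instead
    // of blocking, as proposed in the document referenced in this thread.
    static CompletableFuture<JobResult> execute() {
        return CompletableFuture.supplyAsync(() -> new JobResult("job-1"));
    }

    public static void main(String[] args) throws Exception {
        // "Detached" usage: fire and forget, do not wait on the future.
        CompletableFuture<JobResult> detached = execute();

        // "Attached" usage: explicitly wait for completion.
        JobResult result = execute().get(10, TimeUnit.SECONDS);
        System.out.println(result.jobId); // prints job-1

        detached.get(10, TimeUnit.SECONDS); // drain the detached call too
    }
}
```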
> > > > >> >
> > > > >> > 5. How does per-job mode interact with interactive programming.
> > > > >> >
> > > > > >> > All of the YARN, Mesos and Kubernetes scenarios now follow the
> > > > > >> > pattern of launching a JobCluster. And I don't think there would
> > > > > >> > be inconsistencies between the different resource managers.
> > > > >> >
> > > > >> > Best,
> > > > >> > tison.
> > > > >> >
> > > > >> > [1]
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
> > > > >> > [2]
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
> > > > >> >
> > > > >> > Aljoscha Krettek <al...@apache.org> 于2019年8月16日周五 下午9:20写道:
> > > > >> >
> > > > >> >> Hi,
> > > > >> >>
> > > > >> >> I read both Jeffs initial design document and the newer
> document
> > by
> > > > >> >> Tison. I also finally found the time to collect our thoughts on
> > the
> > > > >> issue,
> > > > >> >> I had quite some discussions with Kostas and this is the
> result:
> > > [1].
> > > > >> >>
> > > > >> >> I think overall we agree that this part of the code is in dire
> > need
> > > > of
> > > > >> >> some refactoring/improvements but I think there are still some
> > open
> > > > >> >> questions and some differences in opinion what those
> refactorings
> > > > >> should
> > > > >> >> look like.
> > > > >> >>
> > > > >> >> I think the API-side is quite clear, i.e. we need some
> JobClient
> > > API
> > > > >> that
> > > > >> >> allows interacting with a running Job. It could be worthwhile
> to
> > > spin
> > > > >> that
> > > > >> >> off into a separate FLIP because we can probably find consensus
> > on
> > > > that
> > > > >> >> part more easily.
> > > > >> >>
> > > > >> >> For the rest, the main open questions from our doc are these:
> > > > >> >>
> > > > >> >>   - Do we want to separate cluster creation and job submission
> > for
> > > > >> >> per-job mode? In the past, there were conscious efforts to
> *not*
> > > > >> separate
> > > > >> >> job submission from cluster creation for per-job clusters for
> > > Mesos,
> > > > >> YARN,
> > > > >> >> Kubernets (see StandaloneJobClusterEntryPoint). Tison suggests
> in
> > > his
> > > > >> >> design document to decouple this in order to unify job
> > submission.
> > > > >> >>
> > > > >> >>   - How to deal with plan preview, which needs to hijack
> > execute()
> > > > and
> > > > >> >> let the outside code catch an exception?
> > > > >> >>
> > > > >> >>   - How to deal with Jar Submission at the Web Frontend, which
> > > needs
> > > > to
> > > > >> >> hijack execute() and let the outside code catch an exception?
> > > > >> >> CliFrontend.run() “hijacks” ExecutionEnvironment.execute() to
> > get a
> > > > >> >> JobGraph and then execute that JobGraph manually. We could get
> > > around
> > > > >> that
> > > > >> >> by letting execute() do the actual execution. One caveat for
> this
> > > is
> > > > >> that
> > > > >> >> now the main() method doesn’t return (or is forced to return by
> > > > >> throwing an
> > > > >> >> exception from execute()) which means that for Jar Submission
> > from
> > > > the
> > > > >> >> WebFrontend we have a long-running main() method running in the
> > > > >> >> WebFrontend. This doesn’t sound very good. We could get around
> > this
> > > > by
> > > > >> >> removing the plan preview feature and by removing Jar
> > > > >> Submission/Running.
> > > > >> >>
> > > > >> >>   - How to deal with detached mode? Right now,
> > DetachedEnvironment
> > > > will
> > > > >> >> execute the job and return immediately. If users control when
> > they
> > > > >> want to
> > > > >> >> return, by waiting on the job completion future, how do we deal
> > > with
> > > > >> this?
> > > > >> >> Do we simply remove the distinction between
> > detached/non-detached?
> > > > >> >>
> > > > >> >>   - How does per-job mode interact with “interactive
> programming”
> > > > >> >> (FLIP-36). For YARN, each execute() call could spawn a new
> Flink
> > > YARN
> > > > >> >> cluster. What about Mesos and Kubernetes?
> > > > >> >>
> > > > >> >> The first open question is where the opinions diverge, I think.
> > The
> > > > >> rest
> > > > >> >> are just open questions and interesting things that we need to
> > > > >> consider.
> > > > >> >>
> > > > >> >> Best,
> > > > >> >> Aljoscha
> > > > >> >>
> > > > >> >> [1]
> > > > >> >>
> > > > >>
> > > >
> > >
> >
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> > > > >> >> <
> > > > >> >>
> > > > >>
> > > >
> > >
> >
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> > > > >> >> >
> > > > >> >>
> > > > >> >> > On 31. Jul 2019, at 15:23, Jeff Zhang <zj...@gmail.com>
> > wrote:
> > > > >> >> >
> > > > >> >> > Thanks tison for the effort. I left a few comments.
> > > > >> >> >
> > > > >> >> >
> > > > >> >> > Zili Chen <wa...@gmail.com> 于2019年7月31日周三 下午8:24写道:
> > > > >> >> >
> > > > >> >> >> Hi Flavio,
> > > > >> >> >>
> > > > >> >> >> Thanks for your reply.
> > > > >> >> >>
> > > > >> >> >> Either current impl and in the design, ClusterClient
> > > > >> >> >> never takes responsibility for generating JobGraph.
> > > > >> >> >> (what you see in current codebase is several class methods)
> > > > >> >> >>
> > > > >> >> >> Instead, user describes his program in the main method
> > > > >> >> >> with ExecutionEnvironment apis and calls env.compile()
> > > > >> >> >> or env.optimize() to get FlinkPlan and JobGraph
> respectively.
> > > > >> >> >>
> > > > > >> >> >> For listing main classes in a jar and choosing one for
> > > > >> >> >> submission, you're now able to customize a CLI to do it.
> > > > >> >> >> Specifically, the path of jar is passed as arguments and
> > > > >> >> >> in the customized CLI you list main classes, choose one
> > > > >> >> >> to submit to the cluster.
> > > > >> >> >>
> > > > >> >> >> Best,
> > > > >> >> >> tison.
> > > > >> >> >>
> > > > >> >> >>
> > > > >> >> >> Flavio Pompermaier <po...@okkam.it> 于2019年7月31日周三
> > > 下午8:12写道:
> > > > >> >> >>
> > > > >> >> >>> Just one note on my side: it is not clear to me whether the
> > > > client
> > > > >> >> needs
> > > > >> >> >> to
> > > > >> >> >>> be able to generate a job graph or not.
> > > > >> >> >>> In my opinion, the job jar must resides only on the
> > > > >> server/jobManager
> > > > >> >> >> side
> > > > >> >> >>> and the client requires a way to get the job graph.
> > > > > >> >> >>> If you really want to access the job graph, I'd add a
> > > > dedicated
> > > > >> >> method
> > > > >> >> >>> on the ClusterClient. like:
> > > > >> >> >>>
> > > > >> >> >>>   - getJobGraph(jarId, mainClass): JobGraph
> > > > >> >> >>>   - listMainClasses(jarId): List<String>
> > > > >> >> >>>
> > > > >> >> >>> These would require some addition also on the job manager
> > > > endpoint
> > > > >> as
> > > > >> >> >>> well..what do you think?
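Flavio's two proposed additions can be written down as a small interface. The signatures simply mirror the two bullet points above; the `jarId` handle (for a jar already uploaded to the cluster) and the example class names are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class ClusterClientAdditionsSketch {

    static final class JobGraph { }

    // Sketch of the two methods proposed above; the server side would back
    // them with matching Job Manager REST endpoints.
    interface ClusterClientAdditions {
        JobGraph getJobGraph(String jarId, String mainClass);
        List<String> listMainClasses(String jarId);
    }

    // Trivial in-memory implementation, just to show the call pattern.
    static ClusterClientAdditions inMemoryClient() {
        return new ClusterClientAdditions() {
            public JobGraph getJobGraph(String jarId, String mainClass) {
                return new JobGraph();
            }
            public List<String> listMainClasses(String jarId) {
                return Arrays.asList("com.example.WordCount", "com.example.Ingest");
            }
        };
    }

    public static void main(String[] args) {
        ClusterClientAdditions client = inMemoryClient();
        System.out.println(client.listMainClasses("jar-1").size()); // prints 2
    }
}
```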
> > > > >> >> >>>
> > > > >> >> >>> On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <
> > > wander4096@gmail.com
> > > > >
> > > > >> >> wrote:
> > > > >> >> >>>
> > > > >> >> >>>> Hi all,
> > > > >> >> >>>>
> > > > >> >> >>>> Here is a document[1] on client api enhancement from our
> > > > >> perspective.
> > > > >> >> >>>> We have investigated current implementations. And we
> propose
> > > > >> >> >>>>
> > > > >> >> >>>> 1. Unify the implementation of cluster deployment and job
> > > > >> submission
> > > > >> >> in
> > > > >> >> >>>> Flink.
> > > > >> >> >>>> 2. Provide programmatic interfaces to allow flexible job
> and
> > > > >> cluster
> > > > >> >> >>>> management.
> > > > >> >> >>>>
> > > > >> >> >>>> The first proposal is aimed at reducing code paths of
> > cluster
> > > > >> >> >> deployment
> > > > >> >> >>>> and
> > > > >> >> >>>> job submission so that one can adopt Flink in his usage
> > > easily.
> > > > >> The
> > > > >> >> >>> second
> > > > >> >> >>>> proposal is aimed at providing rich interfaces for
> advanced
> > > > users
> > > > >> >> >>>> who want to make accurate control of these stages.
> > > > >> >> >>>>
> > > > >> >> >>>> Quick reference on open questions:
> > > > >> >> >>>>
> > > > >> >> >>>> 1. Exclude job cluster deployment from client side or
> > redefine
> > > > the
> > > > >> >> >>> semantic
> > > > >> >> >>>> of job cluster? Since it fits in a process quite different
> > > from
> > > > >> >> session
> > > > >> >> >>>> cluster deployment and job submission.
> > > > >> >> >>>>
> > > > >> >> >>>> 2. Maintain the codepaths handling class
> > > > o.a.f.api.common.Program
> > > > >> or
> > > > >> >> >>>> implement customized program handling logic by customized
> > > > >> >> CliFrontend?
> > > > >> >> >>>> See also this thread[2] and the document[1].
> > > > >> >> >>>>
> > > > >> >> >>>> 3. Expose ClusterClient as public api or just expose api
> in
> > > > >> >> >>>> ExecutionEnvironment
> > > > >> >> >>>> and delegate them to ClusterClient? Further, in either way
> > is
> > > it
> > > > >> >> worth
> > > > >> >> >> to
> > > > >> >> >>>> introduce a JobClient which is an encapsulation of
> > > ClusterClient
> > > > >> that
> > > > >> >> >>>> associated to specific job?
> > > > >> >> >>>>
> > > > >> >> >>>> Best,
> > > > >> >> >>>> tison.
> > > > >> >> >>>>
> > > > >> >> >>>> [1]
> > > > >> >> >>>>
> > > > >> >> >>>>
> > > > >> >> >>>
> > > > >> >> >>
> > > > >> >>
> > > > >>
> > > >
> > >
> >
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
> > > > >> >> >>>> [2]
> > > > >> >> >>>>
> > > > >> >> >>>>
> > > > >> >> >>>
> > > > >> >> >>
> > > > >> >>
> > > > >>
> > > >
> > >
> >
> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
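Open question 3 above mentions a JobClient as an encapsulation of ClusterClient bound to one specific job. A minimal sketch of that idea might look like this; all names are illustrative assumptions, not actual Flink API, and the stub keeps status locally where a real implementation would query the cluster:

```java
// Sketch of the JobClient idea from open question 3: a thin, job-scoped
// wrapper so callers never touch the cluster-wide client directly.
interface JobClient {
    String getJobId();
    String getStatus();
    void cancel();
}

class SimpleJobClient implements JobClient {
    private final String jobId;
    private boolean cancelled;

    SimpleJobClient(String jobId) {
        this.jobId = jobId;
    }

    @Override
    public String getJobId() {
        return jobId;
    }

    @Override
    public String getStatus() {
        // A real implementation would ask the cluster; this stub only
        // tracks local state for illustration.
        return cancelled ? "CANCELED" : "RUNNING";
    }

    @Override
    public void cancel() {
        cancelled = true;
    }
}
```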
> > > > >> >> >>>>
> > > > >> >> >>>> Jeff Zhang <zj...@gmail.com> 于2019年7月24日周三 上午9:19写道:
> > > > >> >> >>>>
> > > > >> >> >>>>> Thanks Stephan, I will follow up this issue in next few
> > > weeks,
> > > > >> and
> > > > >> >> >> will
> > > > >> >> >>>>> refine the design doc. We could discuss more details
> after
> > > 1.9
> > > > >> >> >> release.
> > > > >> >> >>>>>
> > > > >> >> >>>>> Stephan Ewen <se...@apache.org> 于2019年7月24日周三 上午12:58写道:
> > > > >> >> >>>>>
> > > > >> >> >>>>>> Hi all!
> > > > >> >> >>>>>>
> > > > >> >> >>>>>> This thread has stalled for a bit, which I assume is
> > mostly
> > > > >> due to
> > > > >> >> >>> the
> > > > >> >> >>>>>> Flink 1.9 feature freeze and release testing effort.
> > > > >> >> >>>>>>
> > > > >> >> >>>>>> I personally still recognize this issue as one important
> > to
> > > be
> > > > >> >> >>> solved.
> > > > >> >> >>>>> I'd
> > > > >> >> >>>>>> be happy to help resume this discussion soon (after the
> > 1.9
> > > > >> >> >> release)
> > > > >> >> >>>> and
> > > > >> >> >>>>>> see if we can take some steps towards this in Flink 1.10.
> > > > >> >> >>>>>>
> > > > >> >> >>>>>> Best,
> > > > >> >> >>>>>> Stephan
> > > > >> >> >>>>>>
> > > > >> >> >>>>>>
> > > > >> >> >>>>>>
> > > > >> >> >>>>>> On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <
> > > > >> >> >>>>> pompermaier@okkam.it>
> > > > >> >> >>>>>> wrote:
> > > > >> >> >>>>>>
> > > > >> >> >>>>>>> That's exactly what I suggested a long time ago: the
> > Flink
> > > > REST
> > > > >> >> >>>> client
> > > > >> >> >>>>>>> should not require any Flink dependency, only http
> > library
> > > to
> > > > >> >> >> call
> > > > >> >> >>>> the
> > > > >> >> >>>>>> REST
> > > > >> >> >>>>>>> services to submit and monitor a job.
> > > > >> >> >>>>>>> What I suggested also in [1] was to have a way to
> > > > automatically
> > > > >> >> >>>> suggest
> > > > >> >> >>>>>> the
> > > > >> >> >>>>>>> user (via a UI) the available main classes and their
> > > required
> > > > >> >> >>>>>>> parameters[2].
> > > > >> >> >>>>>>> Another problem we have with Flink is that the Rest
> > client
> > > > and
> > > > >> >> >> the
> > > > >> >> >>>> CLI
> > > > >> >> >>>>>> one
> > > > >> >> >>>>>>> behaves differently and we use the CLI client (via ssh)
> > > > because
> > > > >> >> >> it
> > > > >> >> >>>>> allows us
> > > > >> >> >>>>>>> to call some other method after env.execute() [3] (we
> > have
> > > to
> > > > >> >> >> call
> > > > >> >> >>>>>> another
> > > > >> >> >>>>>>> REST service to signal the end of the job).
> > > > >> >> >>>>>>> In this regard, a dedicated interface, like the
> > JobListener
> > > > >> >> >>> suggested
> > > > >> >> >>>>> in
> > > > >> >> >>>>>>> the previous emails, would be very helpful (IMHO).
> > > > >> >> >>>>>>>
> > > > >> >> >>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-10864
> > > > >> >> >>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-10862
> > > > >> >> >>>>>>> [3] https://issues.apache.org/jira/browse/FLINK-10879
> > > > >> >> >>>>>>>
> > > > >> >> >>>>>>> Best,
> > > > >> >> >>>>>>> Flavio
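The JobListener suggested above could be sketched as a pair of callbacks around submission and completion, so a caller can, for example, hit another REST service when the job ends. Names and signatures are assumptions for illustration, not actual Flink API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the JobListener idea: callbacks fired by the client around
// a job's lifecycle.
interface JobListener {
    void onJobSubmitted(String jobId);
    void onJobFinished(String jobId, boolean succeeded);
}

// A listener that just records events, standing in for a REST call-out.
class RecordingListener implements JobListener {
    final List<String> events = new ArrayList<>();

    @Override
    public void onJobSubmitted(String jobId) {
        events.add("submitted:" + jobId);
    }

    @Override
    public void onJobFinished(String jobId, boolean succeeded) {
        events.add("finished:" + jobId + ":" + succeeded);
    }
}
```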
> > > > >> >> >>>>>>>
> > > > >> >> >>>>>>> On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <
> > > zjffdu@gmail.com
> > > > >
> > > > >> >> >>> wrote:
> > > > >> >> >>>>>>>
> > > > >> >> >>>>>>>> Hi, Tison,
> > > > >> >> >>>>>>>>
> > > > >> >> >>>>>>>> Thanks for your comments. Overall I agree with you
> that
> > it
> > > > is
> > > > >> >> >>>>> difficult
> > > > >> >> >>>>>>> for
> > > > >> >> >>>>>>>> downstream projects to integrate with flink and we
> need
> > to
> > > > >> >> >>> refactor
> > > > >> >> >>>>> the
> > > > >> >> >>>>>>>> current flink client api.
> > > > >> >> >>>>>>>> And I agree that CliFrontend should only parsing
> command
> > > > line
> > > > >> >> >>>>> arguments
> > > > >> >> >>>>>>> and
> > > > >> >> >>>>>>>> then pass them to ExecutionEnvironment. It is
> > > > >> >> >>>> ExecutionEnvironment's
> > > > >> >> >>>>>>>> responsibility to compile job, create cluster, and
> > submit
> > > > job.
> > > > >> >> >>>>> Besides
> > > > >> >> >>>>>>>> that, Currently flink has many ExecutionEnvironment
> > > > >> >> >>>> implementations,
> > > > >> >> >>>>>> and
> > > > >> >> >>>>>>>> flink will use the specific one based on the context.
> > > IMHO,
> > > > it
> > > > >> >> >> is
> > > > >> >> >>>> not
> > > > >> >> >>>>>>>> necessary, ExecutionEnvironment should be able to do
> the
> > > > right
> > > > >> >> >>>> thing
> > > > >> >> >>>>>>> based
> > > > >> >> >>>>>>>> on the FlinkConf it is received. Too many
> > > > ExecutionEnvironment
> > > > >> >> >>>>>>>> implementation is another burden for downstream
> project
> > > > >> >> >>>> integration.
> > > > >> >> >>>>>>>>
> > > > >> >> >>>>>>>> One thing I'd like to mention is flink's scala shell
> and
> > > sql
> > > > >> >> >>>> client,
> > > > >> >> >>>>>>>> although they are sub-modules of flink, they could be
> > > > treated
> > > > >> >> >> as
> > > > >> >> >>>>>>> downstream
> > > > >> >> >>>>>>>> project which use flink's client api. Currently you
> will
> > > > find
> > > > >> >> >> it
> > > > >> >> >>> is
> > > > >> >> >>>>> not
> > > > >> >> >>>>>>>> easy for them to integrate with flink; they share
> > > > >> >> >>>>>>>> much duplicated code.
> > > > >> >> >>>>>>> It
> > > > >> >> >>>>>>>> is another sign that we should refactor flink client
> > api.
> > > > >> >> >>>>>>>>
> > > > >> >> >>>>>>>> I believe it is a large and hard change, and I am
> > > > >> >> >>>>>>>> afraid we cannot keep backward compatibility since
> > > > >> >> >>>>>>>> many of the changes are user facing.
> > > > >> >> >>>>>>>>
> > > > >> >> >>>>>>>>
> > > > >> >> >>>>>>>>
> > > > >> >> >>>>>>>> Zili Chen <wa...@gmail.com> 于2019年6月24日周一
> > 下午2:53写道:
> > > > >> >> >>>>>>>>
> > > > >> >> >>>>>>>>> Hi all,
> > > > >> >> >>>>>>>>>
> > > > >> >> >>>>>>>>> After a closer look on our client apis, I can see
> there
> > > are
> > > > >> >> >> two
> > > > >> >> >>>>> major
> > > > >> >> >>>>>>>>> issues to consistency and integration, namely
> different
> > > > >> >> >>>> deployment
> > > > >> >> >>>>> of
> > > > >> >> >>>>>>>>> job cluster which couples job graph creation and
> > cluster
> > > > >> >> >>>>> deployment,
> > > > >> >> >>>>>>>>> and submission via CliFrontend confusing control flow
> > of
> > > > job
> > > > >> >> >>>> graph
> > > > >> >> >>>>>>>>> compilation and job submission. I'd like to follow
> the
> > > > >> >> >> discuss
> > > > >> >> >>>>> above,
> > > > >> >> >>>>>>>>> mainly the process described by Jeff and Stephan, and
> > > share
> > > > >> >> >> my
> > > > >> >> >>>>>>>>> ideas on these issues.
> > > > >> >> >>>>>>>>>
> > > > >> >> >>>>>>>>> 1) CliFrontend confuses the control flow of job
> > > compilation
> > > > >> >> >> and
> > > > >> >> >>>>>>>> submission.
> > > > >> >> >>>>>>>>> Following the process of job submission Stephan and
> > Jeff
> > > > >> >> >>>> described,
> > > > >> >> >>>>>>>>> execution environment knows all configs of the
> cluster
> > > and
> > > > >> >> >>>>>>> topos/settings
> > > > >> >> >>>>>>>>> of the job. Ideally, in the main method of user
> > program,
> > > it
> > > > >> >> >>> calls
> > > > >> >> >>>>>>>> #execute
> > > > >> >> >>>>>>>>> (or named #submit) and Flink deploys the cluster,
> > compile
> > > > the
> > > > >> >> >>> job
> > > > >> >> >>>>>> graph
> > > > >> >> >>>>>>>>> and submit it to the cluster. However, current
> > > CliFrontend
> > > > >> >> >> does
> > > > >> >> >>>> all
> > > > >> >> >>>>>>> these
> > > > >> >> >>>>>>>>> things inside its #runProgram method, which
> introduces
> > a
> > > > lot
> > > > >> >> >> of
> > > > >> >> >>>>>>>> subclasses
> > > > >> >> >>>>>>>>> of (stream) execution environment.
> > > > >> >> >>>>>>>>>
> > > > >> >> >>>>>>>>> Actually, it sets up an exec env that hijacks the
> > > > >> >> >>>>>> #execute/executePlan
> > > > >> >> >>>>>>>>> method, initializes the job graph and abort
> execution.
> > > And
> > > > >> >> >> then
> > > > >> >> >>>>>>>>> control flow back to CliFrontend, it deploys the
> > > cluster(or
> > > > >> >> >>>>> retrieve
> > > > >> >> >>>>>>>>> the client) and submits the job graph. This is quite
> a
> > > > >> >> >> specific
> > > > >> >> >>>>>>> internal
> > > > >> >> >>>>>>>>> process inside Flink, with no consistency to
> > > > >> >> >>>>>>>>> anything.
> > > > >> >> >>>>>>>>>
> > > > >> >> >>>>>>>>> 2) Deployment of job cluster couples job graph
> creation
> > > and
> > > > >> >> >>>> cluster
> > > > >> >> >>>>>>>>> deployment. Abstractly, from user job to a concrete
> > > > >> >> >> submission,
> > > > >> >> >>>> it
> > > > >> >> >>>>>>>> requires
> > > > >> >> >>>>>>>>>
> > > > >> >> >>>>>>>>>     create JobGraph --\
> > > > >> >> >>>>>>>>>
> > > > >> >> >>>>>>>>> create ClusterClient -->  submit JobGraph
> > > > >> >> >>>>>>>>>
> > > > >> >> >>>>>>>>> such a dependency. ClusterClient was created by
> > deploying
> > > > or
> > > > >> >> >>>>>>> retrieving.
> > > > >> >> >>>>>>>>> JobGraph submission requires a compiled JobGraph and
> > > valid
> > > > >> >> >>>>>>> ClusterClient,
> > > > >> >> >>>>>>>>> but the creation of ClusterClient is abstractly
> > > independent
> > > > >> >> >> of
> > > > >> >> >>>> that
> > > > >> >> >>>>>> of
> > > > >> >> >>>>>>>>> JobGraph. However, in job cluster mode, we deploy job
> > > > cluster
> > > > >> >> >>>> with
> > > > >> >> >>>>> a
> > > > >> >> >>>>>>> job
> > > > >> >> >>>>>>>>> graph, which means we use another process:
> > > > >> >> >>>>>>>>>
> > > > >> >> >>>>>>>>> create JobGraph --> deploy cluster with the JobGraph
> > > > >> >> >>>>>>>>>
> > > > >> >> >>>>>>>>> Here is another inconsistency and downstream
> > > > projects/client
> > > > >> >> >>> apis
> > > > >> >> >>>>> are
> > > > >> >> >>>>>>>>> forced to handle different cases with little support
> > from
> > > > >> >> >> Flink.
> > > > >> >> >>>>>>>>>
> > > > >> >> >>>>>>>>> Since we likely reached a consensus on
> > > > >> >> >>>>>>>>>
> > > > >> >> >>>>>>>>> 1. all configs gathered by Flink configuration and
> > passed
> > > > >> >> >>>>>>>>> 2. execution environment knows all configs and
> handles
> > > > >> >> >>>>> execution(both
> > > > >> >> >>>>>>>>> deployment and submission)
> > > > >> >> >>>>>>>>>
> > > > >> >> >>>>>>>>> to the issues above I propose eliminating
> > inconsistencies
> > > > by
> > > > >> >> >>>>>> following
> > > > >> >> >>>>>>>>> approach:
> > > > >> >> >>>>>>>>>
> > > > >> >> >>>>>>>>> 1) CliFrontend should exactly be a front end, at
> least
> > > for
> > > > >> >> >>> "run"
> > > > >> >> >>>>>>> command.
> > > > >> >> >>>>>>>>> That means it just gathered and passed all config
> from
> > > > >> >> >> command
> > > > >> >> >>>> line
> > > > >> >> >>>>>> to
> > > > >> >> >>>>>>>>> the main method of user program. Execution
> environment
> > > > knows
> > > > >> >> >>> all
> > > > >> >> >>>>> the
> > > > >> >> >>>>>>> info
> > > > >> >> >>>>>>>>> and with an addition to utils for ClusterClient, we
> > > > >> >> >> gracefully
> > > > >> >> >>>> get
> > > > >> >> >>>>> a
> > > > >> >> >>>>>>>>> ClusterClient by deploying or retrieving. In this
> way,
> > we
> > > > >> >> >> don't
> > > > >> >> >>>>> need
> > > > >> >> >>>>>> to
> > > > >> >> >>>>>>>>> hijack #execute/executePlan methods and can remove
> > > various
> > > > >> >> >>>> hacking
> > > > >> >> >>>>>>>>> subclasses of exec env, as well as #run methods in
> > > > >> >> >>>>> ClusterClient(for
> > > > >> >> >>>>>> an
> > > > >> >> >>>>>>>>> interface-ized ClusterClient). Now the control flow
> > flows
> > > > >> >> >> from
> > > > >> >> >>>>>>>> CliFrontend
> > > > >> >> >>>>>>>>> to the main method and never returns.
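Proposal 1 above can be sketched as follows: the front end only turns command-line arguments into a configuration and invokes the user's main method; deployment and submission then happen inside the environment the main method creates. The `-m` handling and configuration key are assumptions for illustration only:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Sketch of proposal 1: a front end that parses, delegates, and does
// NOT compile job graphs or deploy clusters itself.
class FrontEndSketch {
    static void run(String[] args, Consumer<Map<String, String>> userMain) {
        Map<String, String> conf = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i++) {
            if (args[i].equals("-m")) {
                conf.put("execution.target", args[i + 1]);
            }
        }
        // Control flows into the user program and never returns here;
        // the environment created inside userMain does the rest.
        userMain.accept(conf);
    }
}
```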
> > > > >> >> >>>>>>>>>
> > > > >> >> >>>>>>>>> 2) Job cluster means a cluster for the specific job.
> > From
> > > > >> >> >>> another
> > > > >> >> >>>>>>>>> perspective, it is an ephemeral session. We may
> > decouple
> > > > the
> > > > >> >> >>>>>> deployment
> > > > >> >> >>>>>>>>> with a compiled job graph, but start a session with
> > idle
> > > > >> >> >>> timeout
> > > > >> >> >>>>>>>>> and submit the job following.
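The "ephemeral session" idea in point 2 can be sketched as a short sequence of steps; all names are simplified stand-ins, not Flink classes:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of proposal 2: treat a job cluster as an ephemeral session --
// deploy without a job graph, submit the job afterwards, and rely on
// an idle timeout to tear the cluster down.
class EphemeralSessionDeployer {
    static List<String> run(String jobGraph, long idleTimeoutMs) {
        List<String> steps = new ArrayList<>();
        // Deployment no longer consumes the job graph...
        steps.add("deploySession(idleTimeout=" + idleTimeoutMs + "ms)");
        // ...so submission uses the same path as a long-lived session.
        steps.add("submit(" + jobGraph + ")");
        return steps;
    }
}
```

The design point is that both cluster modes then share one submission path, removing the second code path described earlier.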
> > > > >> >> >>>>>>>>>
> > > > >> >> >>>>>>>>> These topics, before we go into more details on
> design
> > or
> > > > >> >> >>>>>>> implementation,
> > > > >> >> >>>>>>>>> are better to be aware and discussed for a consensus.
> > > > >> >> >>>>>>>>>
> > > > >> >> >>>>>>>>> Best,
> > > > >> >> >>>>>>>>> tison.
> > > > >> >> >>>>>>>>>
> > > > >> >> >>>>>>>>>
> > > > >> >> >>>>>>>>> Zili Chen <wa...@gmail.com> 于2019年6月20日周四
> > 上午3:21写道:
> > > > >> >> >>>>>>>>>
> > > > >> >> >>>>>>>>>> Hi Jeff,
> > > > >> >> >>>>>>>>>>
> > > > >> >> >>>>>>>>>> Thanks for raising this thread and the design
> > document!
> > > > >> >> >>>>>>>>>>
> > > > >> >> >>>>>>>>>> As @Thomas Weise mentioned above, extending config
> to
> > > > flink
> > > > >> >> >>>>>>>>>> requires far more effort than it should. Another
> > > > example
> > > > >> >> >>>>>>>>>> is that we achieve detach mode by introducing another
> > execution
> > > > >> >> >>>>>>>>>> environment which also hijacks the #execute method.
> > > > >> >> >>>>>>>>>>
> > > > >> >> >>>>>>>>>> I agree with your idea that user would configure all
> > > > things
> > > > >> >> >>>>>>>>>> and flink "just" respect it. On this topic I think
> the
> > > > >> >> >> unusual
> > > > >> >> >>>>>>>>>> control flow when CliFrontend handle "run" command
> is
> > > the
> > > > >> >> >>>> problem.
> > > > >> >> >>>>>>>>>> It handles several configs, mainly about cluster
> > > settings,
> > > > >> >> >> and
> > > > >> >> >>>>>>>>>> thus main method of user program is unaware of them.
> > > Also
> > > > it
> > > > >> >> >>>>>> compiles
> > > > >> >> >>>>>>>>>> app to job graph by run the main method with a
> > hijacked
> > > > exec
> > > > >> >> >>>> env,
> > > > >> >> >>>>>>>>>> which constrains the main method further.
> > > > >> >> >>>>>>>>>>
> > > > >> >> >>>>>>>>>> I'd like to write down a few of notes on
> configs/args
> > > pass
> > > > >> >> >> and
> > > > >> >> >>>>>>> respect,
> > > > >> >> >>>>>>>>>> as well as decoupling job compilation and
> submission.
> > > > Share
> > > > >> >> >> on
> > > > >> >> >>>>> this
> > > > >> >> >>>>>>>>>> thread later.
> > > > >> >> >>>>>>>>>>
> > > > >> >> >>>>>>>>>> Best,
> > > > >> >> >>>>>>>>>> tison.
> > > > >> >> >>>>>>>>>>
> > > > >> >> >>>>>>>>>>
> > > > >> >> >>>>>>>>>> SHI Xiaogang <sh...@gmail.com> 于2019年6月17日周一
> > > > >> >> >> 下午7:29写道:
> > > > >> >> >>>>>>>>>>
> > > > >> >> >>>>>>>>>>> Hi Jeff and Flavio,
> > > > >> >> >>>>>>>>>>>
> > > > >> >> >>>>>>>>>>> Thanks Jeff a lot for proposing the design
> document.
> > > > >> >> >>>>>>>>>>>
> > > > >> >> >>>>>>>>>>> We are also working on refactoring ClusterClient to
> > > allow
> > > > >> >> >>>>> flexible
> > > > >> >> >>>>>>> and
> > > > >> >> >>>>>>>>>>> efficient job management in our real-time platform.
> > > > >> >> >>>>>>>>>>> We would like to draft a document to share our
> ideas
> > > with
> > > > >> >> >>> you.
> > > > >> >> >>>>>>>>>>>
> > > > >> >> >>>>>>>>>>> I think it's a good idea to have something like
> > Apache
> > > > Livy
> > > > >> >> >>> for
> > > > >> >> >>>>>>> Flink,
> > > > >> >> >>>>>>>>>>> and
> > > > >> >> >>>>>>>>>>> the efforts discussed here will take a great step
> > > forward
> > > > >> >> >> to
> > > > >> >> >>>> it.
> > > > >> >> >>>>>>>>>>>
> > > > >> >> >>>>>>>>>>> Regards,
> > > > >> >> >>>>>>>>>>> Xiaogang
> > > > >> >> >>>>>>>>>>>
> > > > >> >> >>>>>>>>>>> Flavio Pompermaier <po...@okkam.it>
> > > 于2019年6月17日周一
> > > > >> >> >>>>> 下午7:13写道:
> > > > >> >> >>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>> Is there any possibility to have something like
> > Apache
> > > > >> >> >> Livy
> > > > >> >> >>>> [1]
> > > > >> >> >>>>>>> also
> > > > >> >> >>>>>>>>>>> for
> > > > >> >> >>>>>>>>>>>> Flink in the future?
> > > > >> >> >>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>> [1] https://livy.apache.org/
> > > > >> >> >>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>> On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <
> > > > >> >> >>> zjffdu@gmail.com
> > > > >> >> >>>>>
> > > > >> >> >>>>>>> wrote:
> > > > >> >> >>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>> Any API we expose should not have dependencies
> > on
> > > > >> >> >>> the
> > > > >> >> >>>>>>> runtime
> > > > >> >> >>>>>>>>>>>>> (flink-runtime) package or other implementation
> > > > >> >> >> details.
> > > > >> >> >>> To
> > > > >> >> >>>>> me,
> > > > >> >> >>>>>>>> this
> > > > >> >> >>>>>>>>>>>> means
> > > > >> >> >>>>>>>>>>>>> that the current ClusterClient cannot be exposed
> to
> > > > >> >> >> users
> > > > >> >> >>>>>> because
> > > > >> >> >>>>>>>> it
> > > > >> >> >>>>>>>>>>>> uses
> > > > >> >> >>>>>>>>>>>>> quite some classes from the optimiser and runtime
> > > > >> >> >>> packages.
> > > > >> >> >>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>> We should change ClusterClient from class to
> > > interface.
> > > > >> >> >>>>>>>>>>>>> ExecutionEnvironment only use the interface
> > > > >> >> >> ClusterClient
> > > > >> >> >>>>> which
> > > > >> >> >>>>>>>>>>> should be
> > > > >> >> >>>>>>>>>>>>> in flink-clients while the concrete
> implementation
> > > > >> >> >> class
> > > > >> >> >>>>> could
> > > > >> >> >>>>>> be
> > > > >> >> >>>>>>>> in
> > > > >> >> >>>>>>>>>>>>> flink-runtime.
> > > > >> >> >>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>> What happens when a failure/restart in the
> > client
> > > > >> >> >>>>> happens?
> > > > >> >> >>>>>>>> There
> > > > >> >> >>>>>>>>>>> need
> > > > >> >> >>>>>>>>>>>>> to be a way of re-establishing the connection to
> > the
> > > > >> >> >> job,
> > > > >> >> >>>> set
> > > > >> >> >>>>>> up
> > > > >> >> >>>>>>>> the
> > > > >> >> >>>>>>>>>>>>> listeners again, etc.
> > > > >> >> >>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>> Good point.  First we need to define what does
> > > > >> >> >>>>> failure/restart
> > > > >> >> >>>>>> in
> > > > >> >> >>>>>>>> the
> > > > >> >> >>>>>>>>>>>>> client mean. IIUC, that usually means a network
> > failure
> > > > >> >> >>> which
> > > > >> >> >>>>> will
> > > > >> >> >>>>>>>>>>> happen in
> > > > >> >> >>>>>>>>>>>>> class RestClient. If my understanding is correct,
> > > > >> >> >>>>> restart/retry
> > > > >> >> >>>>>>>>>>> mechanism
> > > > >> >> >>>>>>>>>>>>> should be done in RestClient.
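The retry idea above could be sketched as a bounded retry wrapper such as might live inside a REST client to ride out transient network failures. This is purely illustrative, not the actual RestClient implementation:

```java
import java.util.function.Supplier;

// Sketch of a bounded retry around a flaky call.
class RetryUtil {
    static <T> T withRetries(int maxAttempts, Supplier<T> call) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed
    }
}
```

A fuller design would also have to re-establish job listeners after reconnecting, which is the harder part raised in the question.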
> > > > >> >> >>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>> Aljoscha Krettek <al...@apache.org>
> > 于2019年6月11日周二
> > > > >> >> >>>>>> 下午11:10写道:
> > > > >> >> >>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>> Some points to consider:
> > > > >> >> >>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>> * Any API we expose should not have dependencies
> > on
> > > > >> >> >> the
> > > > >> >> >>>>>> runtime
> > > > >> >> >>>>>>>>>>>>>> (flink-runtime) package or other implementation
> > > > >> >> >>> details.
> > > > >> >> >>>> To
> > > > >> >> >>>>>> me,
> > > > >> >> >>>>>>>>>>> this
> > > > >> >> >>>>>>>>>>>>> means
> > > > >> >> >>>>>>>>>>>>>> that the current ClusterClient cannot be exposed
> > to
> > > > >> >> >>> users
> > > > >> >> >>>>>>> because
> > > > >> >> >>>>>>>>>>> it
> > > > >> >> >>>>>>>>>>>>> uses
> > > > >> >> >>>>>>>>>>>>>> quite some classes from the optimiser and
> runtime
> > > > >> >> >>>> packages.
> > > > >> >> >>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>> * What happens when a failure/restart in the
> > client
> > > > >> >> >>>>> happens?
> > > > >> >> >>>>>>>> There
> > > > >> >> >>>>>>>>>>> need
> > > > >> >> >>>>>>>>>>>>> to
> > > > >> >> >>>>>>>>>>>>>> be a way of re-establishing the connection to
> the
> > > > >> >> >> job,
> > > > >> >> >>>> set
> > > > >> >> >>>>> up
> > > > >> >> >>>>>>> the
> > > > >> >> >>>>>>>>>>>>> listeners
> > > > >> >> >>>>>>>>>>>>>> again, etc.
> > > > >> >> >>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>> Aljoscha
> > > > >> >> >>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>> On 29. May 2019, at 10:17, Jeff Zhang <
> > > > >> >> >>>> zjffdu@gmail.com>
> > > > >> >> >>>>>>>> wrote:
> > > > >> >> >>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>> Sorry folks, the design doc is late as you
> > > > >> >> >> expected.
> > > > >> >> >>>>> Here's
> > > > >> >> >>>>>>> the
> > > > >> >> >>>>>>>>>>>> design
> > > > >> >> >>>>>>>>>>>>>> doc
> > > > >> >> >>>>>>>>>>>>>>> I drafted, welcome any comments and feedback.
> > > > >> >> >>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>
> > > > >> >> >>>>>>>>
> > > > >> >> >>>>>>>
> > > > >> >> >>>>>>
> > > > >> >> >>>>>
> > > > >> >> >>>>
> > > > >> >> >>>
> > > > >> >> >>
> > > > >> >>
> > > > >>
> > > >
> > >
> >
> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> > > > >> >> >>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org> 于2019年2月14日周四
> > > > >> >> >>>> 下午8:43写道:
> > > > >> >> >>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>> Nice that this discussion is happening.
> > > > >> >> >>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>> In the FLIP, we could also revisit the entire
> > role
> > > > >> >> >>> of
> > > > >> >> >>>>> the
> > > > >> >> >>>>>>>>>>>> environments
> > > > >> >> >>>>>>>>>>>>>>>> again.
> > > > >> >> >>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>> Initially, the idea was:
> > > > >> >> >>>>>>>>>>>>>>>> - the environments take care of the specific
> > > > >> >> >> setup
> > > > >> >> >>>> for
> > > > >> >> >>>>>>>>>>> standalone
> > > > >> >> >>>>>>>>>>>> (no
> > > > >> >> >>>>>>>>>>>>>>>> setup needed), yarn, mesos, etc.
> > > > >> >> >>>>>>>>>>>>>>>> - the session ones have control over the
> > session.
> > > > >> >> >>> The
> > > > >> >> >>>>>>>>>>> environment
> > > > >> >> >>>>>>>>>>>>> holds
> > > > >> >> >>>>>>>>>>>>>>>> the session client.
> > > > >> >> >>>>>>>>>>>>>>>> - running a job gives a "control" object for
> > that
> > > > >> >> >>>> job.
> > > > >> >> >>>>>> That
> > > > >> >> >>>>>>>>>>>> behavior
> > > > >> >> >>>>>>>>>>>>> is
> > > > >> >> >>>>>>>>>>>>>>>> the same in all environments.
> > > > >> >> >>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>> The actual implementation diverged quite a bit
> > > > >> >> >> from
> > > > >> >> >>>>> that.
> > > > >> >> >>>>>>>> Happy
> > > > >> >> >>>>>>>>>>> to
> > > > >> >> >>>>>>>>>>>>> see a
> > > > >> >> >>>>>>>>>>>>>>>> discussion about straightening this out a bit
> > more.
> > > > >> >> >>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <
> > > > >> >> >>>>>>> zjffdu@gmail.com>
> > > > >> >> >>>>>>>>>>>> wrote:
> > > > >> >> >>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>> Hi folks,
> > > > >> >> >>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>> Sorry for late response, It seems we reach
> > > > >> >> >>> consensus
> > > > >> >> >>>> on
> > > > >> >> >>>>>>>> this, I
> > > > >> >> >>>>>>>>>>>> will
> > > > >> >> >>>>>>>>>>>>>>>> create
> > > > >> >> >>>>>>>>>>>>>>>>> FLIP for this with more detailed design
> > > > >> >> >>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>> Thomas Weise <th...@apache.org> 于2018年12月21日周五
> > > > >> >> >>>>> 上午11:43写道:
> > > > >> >> >>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>> Great to see this discussion seeded! The
> > > > >> >> >> problems
> > > > >> >> >>>> you
> > > > >> >> >>>>>> face
> > > > >> >> >>>>>>>>>>> with
> > > > >> >> >>>>>>>>>>>> the
> > > > >> >> >>>>>>>>>>>>>>>>>> Zeppelin integration are also affecting
> other
> > > > >> >> >>>>> downstream
> > > > >> >> >>>>>>>>>>> projects,
> > > > >> >> >>>>>>>>>>>>>> like
> > > > >> >> >>>>>>>>>>>>>>>>>> Beam.
> > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>> We just enabled the savepoint restore option
> > in
> > > > >> >> >>>>>>>>>>>>>> RemoteStreamEnvironment
> > > > >> >> >>>>>>>>>>>>>>>>> [1]
> > > > >> >> >>>>>>>>>>>>>>>>>> and that was more difficult than it should
> be.
> > > > >> >> >> The
> > > > >> >> >>>>> main
> > > > >> >> >>>>>>>> issue
> > > > >> >> >>>>>>>>>>> is
> > > > >> >> >>>>>>>>>>>>> that
> > > > >> >> >>>>>>>>>>>>>>>>>> environment and cluster client aren't
> > decoupled.
> > > > >> >> >>>>> Ideally
> > > > >> >> >>>>>>> it
> > > > >> >> >>>>>>>>>>> should
> > > > >> >> >>>>>>>>>>>>> be
> > > > >> >> >>>>>>>>>>>>>>>>>> possible to just get the matching cluster
> > client
> > > > >> >> >>>> from
> > > > >> >> >>>>>> the
> > > > >> >> >>>>>>>>>>>>> environment
> > > > >> >> >>>>>>>>>>>>>>>> and
> > > > >> >> >>>>>>>>>>>>>>>>>> then control the job through it (environment
> > as
> > > > >> >> >>>>> factory
> > > > >> >> >>>>>>> for
> > > > >> >> >>>>>>>>>>>> cluster
> > > > >> >> >>>>>>>>>>>>>>>>>> client). But note that the environment
> classes
> > > > >> >> >> are
> > > > >> >> >>>>> part
> > > > >> >> >>>>>> of
> > > > >> >> >>>>>>>> the
> > > > >> >> >>>>>>>>>>>>> public
> > > > >> >> >>>>>>>>>>>>>>>>> API,
> > > > >> >> >>>>>>>>>>>>>>>>>> and it is not straightforward to make larger
> > > > >> >> >>> changes
> > > > >> >> >>>>>>> without
> > > > >> >> >>>>>>>>>>>>> breaking
> > > > >> >> >>>>>>>>>>>>>>>>>> backward compatibility.
> > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>> ClusterClient currently exposes internal
> > classes
> > > > >> >> >>>> like
> > > > >> >> >>>>>>>>>>> JobGraph and
> > > > >> >> >>>>>>>>>>>>>>>>>> StreamGraph. But it should be possible to
> wrap
> > > > >> >> >>> this
> > > > >> >> >>>>>> with a
> > > > >> >> >>>>>>>> new
> > > > >> >> >>>>>>>>>>>>> public
> > > > >> >> >>>>>>>>>>>>>>>> API
> > > > >> >> >>>>>>>>>>>>>>>>>> that brings the required job control
> > > > >> >> >> capabilities
> > > > >> >> >>>> for
> > > > >> >> >>>>>>>>>>> downstream
> > > > >> >> >>>>>>>>>>>>>>>>> projects.
> > > > >> >> >>>>>>>>>>>>>>>>>> Perhaps it is helpful to look at some of the
> > > > >> >> >>>>> interfaces
> > > > >> >> >>>>>> in
> > > > >> >> >>>>>>>>>>> Beam
> > > > >> >> >>>>>>>>>>>>> while
> > > > >> >> >>>>>>>>>>>>>>>>>> thinking about this: [2] for the portable
> job
> > > > >> >> >> API
> > > > >> >> >>>> and
> > > > >> >> >>>>>> [3]
> > > > >> >> >>>>>>>> for
> > > > >> >> >>>>>>>>>>> the
> > > > >> >> >>>>>>>>>>>>> old
> > > > >> >> >>>>>>>>>>>>>>>>>> asynchronous job control from the Beam Java
> > SDK.
> > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>> The backward compatibility discussion [4] is
> > > > >> >> >> also
> > > > >> >> >>>>>> relevant
> > > > >> >> >>>>>>>>>>> here. A
> > > > >> >> >>>>>>>>>>>>> new
> > > > >> >> >>>>>>>>>>>>>>>>> API
> > > > >> >> >>>>>>>>>>>>>>>>>> should shield downstream projects from
> > internals
> > > > >> >> >>> and
> > > > >> >> >>>>>> allow
> > > > >> >> >>>>>>>>>>> them to
> > > > >> >> >>>>>>>>>>>>>>>>>> interoperate with multiple future Flink
> > versions
> > > > >> >> >>> in
> > > > >> >> >>>>> the
> > > > >> >> >>>>>>> same
> > > > >> >> >>>>>>>>>>>> release
> > > > >> >> >>>>>>>>>>>>>>>> line
> > > > >> >> >>>>>>>>>>>>>>>>>> without forced upgrades.
> > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>> Thanks,
> > > > >> >> >>>>>>>>>>>>>>>>>> Thomas
> > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>> [1]
> https://github.com/apache/flink/pull/7249
> > > > >> >> >>>>>>>>>>>>>>>>>> [2]
> > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>
> > > > >> >> >>>>>>>>
> > > > >> >> >>>>>>>
> > > > >> >> >>>>>>
> > > > >> >> >>>>>
> > > > >> >> >>>>
> > > > >> >> >>>
> > > > >> >> >>
> > > > >> >>
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > > > >> >> >>>>>>>>>>>>>>>>>> [3]
> > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>
> > > > >> >> >>>>>>>>
> > > > >> >> >>>>>>>
> > > > >> >> >>>>>>
> > > > >> >> >>>>>
> > > > >> >> >>>>
> > > > >> >> >>>
> > > > >> >> >>
> > > > >> >>
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > > > >> >> >>>>>>>>>>>>>>>>>> [4]
> > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>
> > > > >> >> >>>>>>>>
> > > > >> >> >>>>>>>
> > > > >> >> >>>>>>
> > > > >> >> >>>>>
> > > > >> >> >>>>
> > > > >> >> >>>
> > > > >> >> >>
> > > > >> >>
> > > > >>
> > > >
> > >
> >
> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <
> > > > >> >> >>>>>>>> zjffdu@gmail.com>
> > > > >> >> >>>>>>>>>>>>> wrote:
> > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> I'm not so sure whether the user should
> be
> > > > >> >> >>> able
> > > > >> >> >>>> to
> > > > >> >> >>>>>>>> define
> > > > >> >> >>>>>>>>>>>> where
> > > > >> >> >>>>>>>>>>>>>>>> the
> > > > >> >> >>>>>>>>>>>>>>>>>> job
> > > > >> >> >>>>>>>>>>>>>>>>>>> runs (in your example Yarn). This is
> actually
> > > > >> >> >>>>>> independent
> > > > >> >> >>>>>>>> of
> > > > >> >> >>>>>>>>>>> the
> > > > >> >> >>>>>>>>>>>>> job
> > > > >> >> >>>>>>>>>>>>>>>>>>> development and is something which is
> decided
> > > > >> >> >> at
> > > > >> >> >>>>>>> deployment
> > > > >> >> >>>>>>>>>>> time.
> > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>> Users don't need to specify the execution mode
> > > > >> >> >>>>>>> programmatically.
> > > > >> >> >>>>>>>>>>> They
> > > > >> >> >>>>>>>>>>>>> can
> > > > >> >> >>>>>>>>>>>>>>>>> also
> > > > >> >> >>>>>>>>>>>>>>>>>>> pass the execution mode from the arguments
> in
> > > > >> >> >>> flink
> > > > >> >> >>>>> run
> > > > >> >> >>>>>>>>>>> command.
> > > > >> >> >>>>>>>>>>>>> e.g.
> > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m yarn-cluster ....
> > > > >> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m local ...
> > > > >> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m host:port ...
> > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>> Does this make sense to you ?
> > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> To me it makes sense that the
> > > > >> >> >>>> ExecutionEnvironment
> > > > >> >> >>>>>> is
> > > > >> >> >>>>>>>> not
> > > > >> >> >>>>>>>>>>>>>>>> directly
> > > > >> >> >>>>>>>>>>>>>>>>>>> initialized by the user and instead context
> > > > >> >> >>>> sensitive
> > > > >> >> >>>>>> how
> > > > >> >> >>>>>>>> you
> > > > >> >> >>>>>>>>>>>> want
> > > > >> >> >>>>>>>>>>>>> to
> > > > >> >> >>>>>>>>>>>>>>>>>>> execute your job (Flink CLI vs. IDE, for
> > > > >> >> >>> example).
> > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>> Right, currently I notice Flink would
> create
> > > > >> >> >>>>> different
> > > > >> >> >>>>>>>>>>>>>>>>>>> ContextExecutionEnvironment based on
> > different
> > > > >> >> >>>>>> submission
> > > > >> >> >>>>>>>>>>>> scenarios
> > > > >> >> >>>>>>>>>>>>>>>>>> (Flink
> > > > >> >> >>>>>>>>>>>>>>>>>>> Cli vs IDE). To me this is kind of hack
> > > > >> >> >> approach,
> > > > >> >> >>>> not
> > > > >> >> >>>>>> so
> > > > >> >> >>>>>>>>>>>>>>>>> straightforward.
> > > > >> >> >>>>>>>>>>>>>>>>>>> What I suggested above is that is that
> flink
> > > > >> >> >>> should
> > > > >> >> >>>>>>> always
> > > > >> >> >>>>>>>>>>> create
> > > > >> >> >>>>>>>>>>>>> the
> > > > >> >> >>>>>>>>>>>>>>>>>> same
> > > > >> >> >>>>>>>>>>>>>>>>>>> ExecutionEnvironment but with different
> > > > >> >> >>>>> configuration,
> > > > >> >> >>>>>>> and
> > > > >> >> >>>>>>>>>>> based
> > > > >> >> >>>>>>>>>>>> on
> > > > >> >> >>>>>>>>>>>>>>>> the
> > > > >> >> >>>>>>>>>>>>>>>>>>> configuration it would create the proper
> > > > >> >> >>>>> ClusterClient
> > > > >> >> >>>>>>> for
> > > > >> >> >>>>>>>>>>>>> different
> > > > >> >> >>>>>>>>>>>>>>>>>>> behaviors.
> > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>> Till Rohrmann <tr...@apache.org>
> > > > >> >> >>>> 于2018年12月20日周四
> > > > >> >> >>>>>>>>>>> 下午11:18写道:
> > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>> You are probably right that we have code
> > > > >> >> >>>> duplication
> > > > >> >> >>>>>>> when
> > > > >> >> >>>>>>>> it
> > > > >> >> >>>>>>>>>>>> comes
> > > > >> >> >>>>>>>>>>>>>>>> to
> > > > >> >> >>>>>>>>>>>>>>>>>> the
> > > > >> >> >>>>>>>>>>>>>>>>>>>> creation of the ClusterClient. This should
> > be
> > > > >> >> >>>>> reduced
> > > > >> >> >>>>>> in
> > > > >> >> >>>>>>>> the
> > > > >> >> >>>>>>>>>>>>>>>> future.
> > > > >> >> >>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>> I'm not so sure whether the user should be
> > > > >> >> >> able
> > > > >> >> >>> to
> > > > >> >> >>>>>>> define
> > > > >> >> >>>>>>>>>>> where
> > > > >> >> >>>>>>>>>>>>> the
> > > > >> >> >>>>>>>>>>>>>>>>> job
> > > > >> >> >>>>>>>>>>>>>>>>>>>> runs (in your example Yarn). This is
> > actually
> > > > >> >> >>>>>>> independent
> > > > >> >> >>>>>>>>>>> of the
> > > > >> >> >>>>>>>>>>>>>>>> job
> > > > >> >> >>>>>>>>>>>>>>>>>>>> development and is something which is
> > decided
> > > > >> >> >> at
> > > > >> >> >>>>>>>> deployment
> > > > >> >> >>>>>>>>>>>> time.
> > > > >> >> >>>>>>>>>>>>>>>> To
> > > > >> >> >>>>>>>>>>>>>>>>> me
> > > > >> >> >>>>>>>>>>>>>>>>>>> it
> > > > >> >> >>>>>>>>>>>>>>>>>>>> makes sense that the ExecutionEnvironment
> is
> > > > >> >> >> not
> > > > >> >> >>>>>>> directly
> > > > >> >> >>>>>>>>>>>>>>>> initialized
> > > > >> >> >>>>>>>>>>>>>>>>>> by
> > > > >> >> >>>>>>>>>>>>>>>>>>>> the user and instead context sensitive how
> > you
> > > > >> >> >>>> want
> > > > >> >> >>>>> to
> > > > >> >> >>>>>>>>>>> execute
> > > > >> >> >>>>>>>>>>>>> your
> > > > >> >> >>>>>>>>>>>>>>>>> job
> > > > >> >> >>>>>>>>>>>>>>>>>>>> (Flink CLI vs. IDE, for example).
> However, I
> > > > >> >> >>> agree
> > > > >> >> >>>>>> that
> > > > >> >> >>>>>>>> the
> > > > >> >> >>>>>>>>>>>>>>>>>>>> ExecutionEnvironment should give you
> access
> > to
> > > > >> >> >>> the
> > > > >> >> >>>>>>>>>>> ClusterClient
> > > > >> >> >>>>>>>>>>>>>>>> and
> > > > >> >> >>>>>>>>>>>>>>>>> to
> > > > >> >> >>>>>>>>>>>>>>>>>>> the
> > > > >> >> >>>>>>>>>>>>>>>>>>>> job (maybe in the form of the JobGraph or
> a
> > > > >> >> >> job
> > > > >> >> >>>>> plan).
> > > > >> >> >>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>> Cheers,
> > > > >> >> >>>>>>>>>>>>>>>>>>>> Till
> > > > >> >> >>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff
> Zhang <
> > > > >> >> >>>>>>>>>>> zjffdu@gmail.com>
> > > > >> >> >>>>>>>>>>>>>>>> wrote:
> > > > >> >> >>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> Hi Till,
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> Thanks for the feedback. You are right
> > that I
> > > > >> >> >>>>> expect
> > > > >> >> >>>>>>>> better
> > > > >> >> >>>>>>>>>>>>>>>>>>> programmatic
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> job submission/control api which could be
> > > > >> >> >> used
> > > > >> >> >>> by
> > > > >> >> >>>>>>>>>>> downstream
> > > > >> >> >>>>>>>>>>>>>>>>> project.
> > > > >> >> >>>>>>>>>>>>>>>>>>> And
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> it would benefit for the flink ecosystem.
> > > > >> >> >> When
> > > > >> >> >>> I
> > > > >> >> >>>>> look
> > > > >> >> >>>>>>> at
> > > > >> >> >>>>>>>>>>> the
> > > > >> >> >>>>>>>>>>>> code
> > > > >> >> >>>>>>>>>>>>>>>>> of
> > > > >> >> >>>>>>>>>>>>>>>>>>>> flink
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> scala-shell and sql-client (I believe
> they
> > > > >> >> >> are
> > > > >> >> >>>> not
> > > > >> >> >>>>>> the
> > > > >> >> >>>>>>>>>>> core of
> > > > >> >> >>>>>>>>>>>>>>>>> flink,
> > > > >> >> >>>>>>>>>>>>>>>>>>> but
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> belong to the ecosystem of flink), I find
> > > > >> >> >> many
> > > > >> >> >>>>>>> duplicated
> > > > >> >> >>>>>>>>>>> code
> > > > >> >> >>>>>>>>>>>>>>>> for
> > > > >> >> >>>>>>>>>>>>>>>>>>>> creating
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> ClusterClient from user provided
> > > > >> >> >> configuration
> > > > >> >> >>>>>>>>>>> (configuration
> > > > >> >> >>>>>>>>>>>>>>>>> format
> > > > >> >> >>>>>>>>>>>>>>>>>>> may
> > > > >> >> >>>>>>>>>>>>>>>>>>>> be
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> different from scala-shell and
> sql-client)
> > > > >> >> >> and
> > > > >> >> >>>> then
> > > > >> >> >>>>>> use
> > > > >> >> >>>>>>>>>>> that
> > > > >> >> >>>>>>>>>>>>>>>>>>>> ClusterClient
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> to manipulate jobs. I don't think this is
> > > > >> >> >>>>> convenient
> > > > >> >> >>>>>>> for
> > > > >> >> >>>>>>>>>>>>>>>> downstream
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> projects. What I expect is that
> downstream
> > > > >> >> >>>> project
> > > > >> >> >>>>>> only
> > > > >> >> >>>>>>>>>>> needs
> > > > >> >> >>>>>>>>>>>> to
> > > > >> >> >>>>>>>>>>>>>>>>>>> provide
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> necessary configuration info (maybe
> > > > >> >> >> introducing
> > > > >> >> >>>>> class
> > > > >> >> >>>>>>>>>>>> FlinkConf),
> > > > >> >> >>>>>>>>>>>>>>>>> and
> > > > >> >> >>>>>>>>>>>>>>>>>>>> then
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> build ExecutionEnvironment based on this
> > > > >> >> >>>> FlinkConf,
> > > > >> >> >>>>>> and
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment will create the
> proper
> > > > >> >> >>>>>>>> ClusterClient.
> > > > >> >> >>>>>>>>>>> It
> > > > >> >> >>>>>>>>>>>> not
> > > > >> >> >>>>>>>>>>>>>>>>>> only
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> benefit for the downstream project
> > > > >> >> >> development
> > > > >> >> >>>> but
> > > > >> >> >>>>>> also
> > > > >> >> >>>>>>>> be
> > > > >> >> >>>>>>>>>>>>>>>> helpful
> > > > >> >> >>>>>>>>>>>>>>>>>> for
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> their integration test with flink. Here's
> > one
> > > > >> >> >>>>> sample
> > > > >> >> >>>>>>> code
> > > > >> >> >>>>>>>>>>>> snippet
> > > > >> >> >>>>>>>>>>>>>>>>>> that
> > > > >> >> >>>>>>>>>>>>>>>>>>> I
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> expect.
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> val conf = new FlinkConf().mode("yarn")
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> val env = new ExecutionEnvironment(conf)
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> val jobId = env.submit(...)
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> val jobStatus =
> > > > >> >> >>>>>>>>>>> env.getClusterClient().queryJobStatus(jobId)
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> env.getClusterClient().cancelJob(jobId)
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> What do you think ?
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> Till Rohrmann <tr...@apache.org>
> > > > >> >> >>>>> 于2018年12月11日周二
> > > > >> >> >>>>>>>>>>> 下午6:28写道:
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> Hi Jeff,
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> what you are proposing is to provide the
> > > > >> >> >> user
> > > > >> >> >>>> with
> > > > >> >> >>>>>>>> better
> > > > >> >> >>>>>>>>>>>>>>>>>>> programmatic
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> job
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> control. There was actually an effort to
> > > > >> >> >>> achieve
> > > > >> >> >>>>>> this
> > > > >> >> >>>>>>>> but
> > > > >> >> >>>>>>>>>>> it
> > > > >> >> >>>>>>>>>>>>>>>> has
> > > > >> >> >>>>>>>>>>>>>>>>>>> never
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> been
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> completed [1]. However, there are some
> > > > >> >> >>>> improvement
> > > > >> >> >>>>>> in
> > > > >> >> >>>>>>>> the
> > > > >> >> >>>>>>>>>>> code
> > > > >> >> >>>>>>>>>>>>>>>>> base
> > > > >> >> >>>>>>>>>>>>>>>>>>>> now.
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> Look for example at the NewClusterClient
> > > > >> >> >>>> interface
> > > > >> >> >>>>>>> which
> > > > >> >> >>>>>>>>>>>>>>>> offers a
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> non-blocking job submission. But I agree
> > > > >> >> >> that
> > > > >> >> >>> we
> > > > >> >> >>>>>> need
> > > > >> >> >>>>>>> to
> > > > >> >> >>>>>>>>>>>>>>>> improve
> > > > >> >> >>>>>>>>>>>>>>>>>>> Flink
> > > > >> >> >>>>>>>>>>>>>>>>>>>> in
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> this regard.
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> I would not be in favour if exposing all
> > > > >> >> >>>>>> ClusterClient
> > > > >> >> >>>>>>>>>>> calls
> > > > >> >> >>>>>>>>>>>>>>>> via
> > > > >> >> >>>>>>>>>>>>>>>>>> the
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment because it would
> > > > >> >> >> clutter
> > > > >> >> >>>> the
> > > > >> >> >>>>>>> class
> > > > >> >> >>>>>>>>>>> and
> > > > >> >> >>>>>>>>>>>>>>>> would
> > > > >> >> >>>>>>>>>>>>>>>>>> not
> > > > >> >> >>>>>>>>>>>>>>>>>>>> be
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> a
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> good separation of concerns. Instead one
> > > > >> >> >> idea
> > > > >> >> >>>>> could
> > > > >> >> >>>>>> be
> > > > >> >> >>>>>>>> to
> > > > >> >> >>>>>>>>>>>>>>>>> retrieve
> > > > >> >> >>>>>>>>>>>>>>>>>>> the
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> current ClusterClient from the
> > > > >> >> >>>>> ExecutionEnvironment
> > > > >> >> >>>>>>>> which
> > > > >> >> >>>>>>>>>>> can
> > > > >> >> >>>>>>>>>>>>>>>>> then
> > > > >> >> >>>>>>>>>>>>>>>>>> be
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> used
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> for cluster and job control. But before
> we
> > > > >> >> >>> start
> > > > >> >> >>>>> an
> > > > >> >> >>>>>>>> effort
> > > > >> >> >>>>>>>>>>>>>>>> here,
> > > > >> >> >>>>>>>>>>>>>>>>> we
> > > > >> >> >>>>>>>>>>>>>>>>>>>> need
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> to
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> agree and capture what functionality we
> > want
> > > > >> >> >>> to
> > > > >> >> >>>>>>> provide.
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> Initially, the idea was that we have the
> > > > >> >> >>>>>>>> ClusterDescriptor
> > > > >> >> >>>>>>>>>>>>>>>>>> describing
> > > > >> >> >>>>>>>>>>>>>>>>>>>> how
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> to talk to cluster manager like Yarn or
> > > > >> >> >> Mesos.
> > > > >> >> >>>> The
> > > > >> >> >>>>>>>>>>>>>>>>>> ClusterDescriptor
> > > > >> >> >>>>>>>>>>>>>>>>>>>> can
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> be
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> used for deploying Flink clusters (job
> and
> > > > >> >> >>>>> session)
> > > > >> >> >>>>>>> and
> > > > >> >> >>>>>>>>>>> gives
> > > > >> >> >>>>>>>>>>>>>>>>> you a
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> ClusterClient. The ClusterClient
> controls
> > > > >> >> >> the
> > > > >> >> >>>>>> cluster
> > > > >> >> >>>>>>>>>>> (e.g.
> > > > >> >> >>>>>>>>>>>>>>>>>>> submitting
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> jobs, listing all running jobs). And
> then
> > > > >> >> >>> there
> > > > >> >> >>>>> was
> > > > >> >> >>>>>>> the
> > > > >> >> >>>>>>>>>>> idea
> > > > >> >> >>>>>>>>>>>> to
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> introduce a
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> JobClient which you obtain from the
> > > > >> >> >>>> ClusterClient
> > > > >> >> >>>>> to
> > > > >> >> >>>>>>>>>>> trigger
> > > > >> >> >>>>>>>>>>>>>>>> job
> > > > >> >> >>>>>>>>>>>>>>>>>>>> specific
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> operations (e.g. taking a savepoint,
> > > > >> >> >>> cancelling
> > > > >> >> >>>>> the
> > > > >> >> >>>>>>>> job).
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> [1]
> > > > >> >> >>>>>> https://issues.apache.org/jira/browse/FLINK-4272
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> Cheers,
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> Till
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> On Tue, Dec 11, 2018 at 10:13 AM Jeff
> > Zhang
> > > > >> >> >> <
> > > > >> >> >>>>>>>>>>> zjffdu@gmail.com
> > > > >> >> >>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>> wrote:
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Hi Folks,
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> I am trying to integrate flink into
> > apache
> > > > >> >> >>>>> zeppelin
> > > > >> >> >>>>>>>>>>> which is
> > > > >> >> >>>>>>>>>>>>>>>> an
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> interactive
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> notebook. And I hit several issues that
> > is
> > > > >> >> >>>> caused
> > > > >> >> >>>>>> by
> > > > >> >> >>>>>>>>>>> flink
> > > > >> >> >>>>>>>>>>>>>>>>> client
> > > > >> >> >>>>>>>>>>>>>>>>>>>> api.
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> So
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> I'd like to proposal the following
> > changes
> > > > >> >> >>> for
> > > > >> >> >>>>>> flink
> > > > >> >> >>>>>>>>>>> client
> > > > >> >> >>>>>>>>>>>>>>>>> api.
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 1. Support nonblocking execution.
> > > > >> >> >> Currently,
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment#execute
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> is a blocking method which would do 2
> > > > >> >> >> things,
> > > > >> >> >>>>> first
> > > > >> >> >>>>>>>>>>> submit
> > > > >> >> >>>>>>>>>>>>>>>> job
> > > > >> >> >>>>>>>>>>>>>>>>>> and
> > > > >> >> >>>>>>>>>>>>>>>>>>>> then
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> wait for job until it is finished. I'd
> > like
> > > > >> >> >>>>>>> introduce a
> > > > >> >> >>>>>>>>>>>>>>>>>> nonblocking
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> execution method like
> > > > >> >> >>>> ExecutionEnvironment#submit
> > > > >> >> >>>>>>> which
> > > > >> >> >>>>>>>>>>> only
> > > > >> >> >>>>>>>>>>>>>>>>>> submit
> > > > >> >> >>>>>>>>>>>>>>>>>>>> job
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> and
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> then return jobId to client. And allow
> > user
> > > > >> >> >>> to
> > > > >> >> >>>>>> query
> > > > >> >> >>>>>>>> the
> > > > >> >> >>>>>>>>>>> job
> > > > >> >> >>>>>>>>>>>>>>>>>> status
> > > > >> >> >>>>>>>>>>>>>>>>>>>> via
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> the
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> jobId.
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 2. Add cancel api in
> > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > >> >> >> ExecutionEnvironment/StreamExecutionEnvironment,
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> currently the only way to cancel job is
> > via
> > > > >> >> >>> cli
> > > > >> >> >>>>>>>>>>> (bin/flink),
> > > > >> >> >>>>>>>>>>>>>>>>> this
> > > > >> >> >>>>>>>>>>>>>>>>>>> is
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> not
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> convenient for downstream project to
> use
> > > > >> >> >> this
> > > > >> >> >>>>>>> feature.
> > > > >> >> >>>>>>>>>>> So I'd
> > > > >> >> >>>>>>>>>>>>>>>>>> like
> > > > >> >> >>>>>>>>>>>>>>>>>>> to
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> add
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> cancel api in ExecutionEnvironment
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 3. Add savepoint api in
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>> ExecutionEnvironment/StreamExecutionEnvironment.
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> It
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> is similar as cancel api, we should use
> > > > >> >> >>>>>>>>>>> ExecutionEnvironment
> > > > >> >> >>>>>>>>>>>>>>>> as
> > > > >> >> >>>>>>>>>>>>>>>>>> the
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> unified
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> api for third party to integrate with
> > > > >> >> >> flink.
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 4. Add listener for job execution
> > > > >> >> >> lifecycle.
> > > > >> >> >>>>>>> Something
> > > > >> >> >>>>>>>>>>> like
> > > > >> >> >>>>>>>>>>>>>>>>>>>> following,
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> so
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> that downstream project can do custom
> > logic
> > > > >> >> >>> in
> > > > >> >> >>>>> the
> > > > >> >> >>>>>>>>>>> lifecycle
> > > > >> >> >>>>>>>>>>>>>>>> of
> > > > >> >> >>>>>>>>>>>>>>>>>>> job.
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> e.g.
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Zeppelin would capture the jobId after
> > job
> > > > >> >> >> is
> > > > >> >> >>>>>>> submitted
> > > > >> >> >>>>>>>>>>> and
> > > > >> >> >>>>>>>>>>>>>>>>> then
> > > > >> >> >>>>>>>>>>>>>>>>>>> use
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> this
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> jobId to cancel it later when
> necessary.
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> public interface JobListener {
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  void onJobSubmitted(JobID jobId);
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  void onJobExecuted(JobExecutionResult
> > > > >> >> >>>>> jobResult);
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  void onJobCanceled(JobID jobId);
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> }
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 5. Enable session in
> > ExecutionEnvironment.
> > > > >> >> >>>>>> Currently
> > > > >> >> >>>>>>> it
> > > > >> >> >>>>>>>>>>> is
> > > > >> >> >>>>>>>>>>>>>>>>>>> disabled,
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> but
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> session is very convenient for third
> > party
> > > > >> >> >> to
> > > > >> >> >>>>>>>> submitting
> > > > >> >> >>>>>>>>>>> jobs
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> continually.
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> I hope flink can enable it again.
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 6. Unify all flink client api into
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>> ExecutionEnvironment/StreamExecutionEnvironment.
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> This is a long term issue which needs
> > more
> > > > >> >> >>>>> careful
> > > > >> >> >>>>>>>>>>> thinking
> > > > >> >> >>>>>>>>>>>>>>>> and
> > > > >> >> >>>>>>>>>>>>>>>>>>>> design.
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Currently some of features of flink is
> > > > >> >> >>> exposed
> > > > >> >> >>>> in
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>> ExecutionEnvironment/StreamExecutionEnvironment,
> > > > >> >> >>>>>> but
> > > > >> >> >>>>>>>>>>> some are
> > > > >> >> >>>>>>>>>>>>>>>>>>> exposed
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> in
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> cli instead of api, like the cancel and
> > > > >> >> >>>>> savepoint I
> > > > >> >> >>>>>>>>>>> mentioned
> > > > >> >> >>>>>>>>>>>>>>>>>>> above.
> > > > >> >> >>>>>>>>>>>>>>>>>>>> I
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> think the root cause is due to that
> flink
> > > > >> >> >>>> didn't
> > > > >> >> >>>>>>> unify
> > > > >> >> >>>>>>>>>>> the
> > > > >> >> >>>>>>>>>>>>>>>>>>>> interaction
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> with
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> flink. Here I list 3 scenarios of flink
> > > > >> >> >>>> operation
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Local job execution.  Flink will
> > create
> > > > >> >> >>>>>>>>>>> LocalEnvironment
> > > > >> >> >>>>>>>>>>>>>>>>> and
> > > > >> >> >>>>>>>>>>>>>>>>>>>> then
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> use
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  this LocalEnvironment to create
> > > > >> >> >>> LocalExecutor
> > > > >> >> >>>>> for
> > > > >> >> >>>>>>> job
> > > > >> >> >>>>>>>>>>>>>>>>>> execution.
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Remote job execution. Flink will
> > create
> > > > >> >> >>>>>>>> ClusterClient
> > > > >> >> >>>>>>>>>>>>>>>>> first
> > > > >> >> >>>>>>>>>>>>>>>>>>> and
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> then
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  create ContextEnvironment based on the
> > > > >> >> >>>>>>> ClusterClient
> > > > >> >> >>>>>>>>>>> and
> > > > >> >> >>>>>>>>>>>>>>>>> then
> > > > >> >> >>>>>>>>>>>>>>>>>>> run
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> the
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> job.
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Job cancelation. Flink will create
> > > > >> >> >>>>>> ClusterClient
> > > > >> >> >>>>>>>>>>> first
> > > > >> >> >>>>>>>>>>>>>>>> and
> > > > >> >> >>>>>>>>>>>>>>>>>>> then
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> cancel
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  this job via this ClusterClient.
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> As you can see in the above 3
> scenarios.
> > > > >> >> >>> Flink
> > > > >> >> >>>>>> didn't
> > > > >> >> >>>>>>>>>>> use the
> > > > >> >> >>>>>>>>>>>>>>>>>> same
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> approach(code path) to interact with
> > flink
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> What I propose is following:
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Create the proper
> > > > >> >> >>>>>> LocalEnvironment/RemoteEnvironment
> > > > >> >> >>>>>>>>>>> (based
> > > > >> >> >>>>>>>>>>>>>>>> on
> > > > >> >> >>>>>>>>>>>>>>>>>> user
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> configuration) --> Use this Environment
> > to
> > > > >> >> >>>> create
> > > > >> >> >>>>>>>> proper
> > > > >> >> >>>>>>>>>>>>>>>>>>>> ClusterClient
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> (LocalClusterClient or
> RestClusterClient)
> > > > >> >> >> to
> > > > >> >> >>>>>>>> interactive
> > > > >> >> >>>>>>>>>>> with
> > > > >> >> >>>>>>>>>>>>>>>>>>> Flink (
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> job
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> execution or cancelation)
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> This way we can unify the process of
> > local
> > > > >> >> >>>>>> execution
> > > > >> >> >>>>>>>> and
> > > > >> >> >>>>>>>>>>>>>>>> remote
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> execution.
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> And it is much easier for third party
> to
> > > > >> >> >>>>> integrate
> > > > >> >> >>>>>>> with
> > > > >> >> >>>>>>>>>>>>>>>> flink,
> > > > >> >> >>>>>>>>>>>>>>>>>>>> because
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment is the unified
> entry
> > > > >> >> >>> point
> > > > >> >> >>>>> for
> > > > >> >> >>>>>>>>>>> flink.
> > > > >> >> >>>>>>>>>>>>>>>> What
> > > > >> >> >>>>>>>>>>>>>>>>>>> third
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> party
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> needs to do is just pass configuration
> to
> > > > >> >> >>>>>>>>>>>>>>>> ExecutionEnvironment
> > > > >> >> >>>>>>>>>>>>>>>>>> and
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment will do the right
> > > > >> >> >> thing
> > > > >> >> >>>>> based
> > > > >> >> >>>>>> on
> > > > >> >> >>>>>>>> the
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> configuration.
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Flink cli can also be considered as
> flink
> > > > >> >> >> api
> > > > >> >> >>>>>>> consumer.
> > > > >> >> >>>>>>>>>>> it
> > > > >> >> >>>>>>>>>>>>>>>> just
> > > > >> >> >>>>>>>>>>>>>>>>>>> pass
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> the
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> configuration to ExecutionEnvironment
> and
> > > > >> >> >> let
> > > > >> >> >>>>>>>>>>>>>>>>>> ExecutionEnvironment
> > > > >> >> >>>>>>>>>>>>>>>>>>> to
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> create the proper ClusterClient instead
> > of
> > > > >> >> >>>>> letting
> > > > >> >> >>>>>>> cli
> > > > >> >> >>>>>>>> to
> > > > >> >> >>>>>>>>>>>>>>>>> create
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ClusterClient directly.
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 6 would involve large code refactoring,
> > so
> > > > >> >> >> I
> > > > >> >> >>>>> think
> > > > >> >> >>>>>> we
> > > > >> >> >>>>>>>> can
> > > > >> >> >>>>>>>>>>>>>>>> defer
> > > > >> >> >>>>>>>>>>>>>>>>>> it
> > > > >> >> >>>>>>>>>>>>>>>>>>>> for
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> future release, 1,2,3,4,5 could be done
> > at
> > > > >> >> >>>> once I
> > > > >> >> >>>>>>>>>>> believe.
> > > > >> >> >>>>>>>>>>>>>>>> Let
> > > > >> >> >>>>>>>>>>>>>>>>> me
> > > > >> >> >>>>>>>>>>>>>>>>>>>> know
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>> your
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> comments and feedback, thanks
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> --
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Best Regards
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> --
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> Best Regards
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>> Jeff Zhang
> > > > >> >> >>>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>> --
> > > > >> >> >>>>>>>>>>>>>>>>>>> Best Regards
> > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>> Jeff Zhang
> > > > >> >> >>>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>> --
> > > > >> >> >>>>>>>>>>>>>>>>> Best Regards
> > > > >> >> >>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>> Jeff Zhang
> > > > >> >> >>>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>> --
> > > > >> >> >>>>>>>>>>>>>>> Best Regards
> > > > >> >> >>>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>> Jeff Zhang
> > > > >> >> >>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>> --
> > > > >> >> >>>>>>>>>>>>> Best Regards
> > > > >> >> >>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>> Jeff Zhang
> > > > >> >> >>>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>>
> > > > >> >> >>>>>>>>>>>
> > > > >> >> >>>>>>>>>>
> > > > >> >> >>>>>>>>
> > > > >> >> >>>>>>>> --
> > > > >> >> >>>>>>>> Best Regards
> > > > >> >> >>>>>>>>
> > > > >> >> >>>>>>>> Jeff Zhang
> > > > >> >> >>>>>>>>
> > > > >> >> >>>>>>>
> > > > >> >> >>>>>>
> > > > >> >> >>>>>
> > > > >> >> >>>>>
> > > > >> >> >>>>> --
> > > > >> >> >>>>> Best Regards
> > > > >> >> >>>>>
> > > > >> >> >>>>> Jeff Zhang
> > > > >> >> >>>>>
> > > > >> >> >>>>
> > > > >> >> >>>
> > > > >> >> >>
> > > > >> >> >
> > > > >> >> >
> > > > >> >> > --
> > > > >> >> > Best Regards
> > > > >> >> >
> > > > >> >> > Jeff Zhang
> > > > >> >>
> > > > >> >>
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Yang Wang <da...@gmail.com>.
From the user's perspective, the scope of a per-job cluster is really
confusing.

If it means a Flink cluster with a single job, so that we get better
isolation, then it does not matter how we deploy the cluster: deploy it
directly (mode 1), or start a Flink cluster and then submit the job through
the cluster client (mode 2).

Otherwise, if it just means direct deployment, how should we name mode 2?
"Session with a job", or something else?

We could also benefit from mode 2. Users get the same isolation as with
mode 1, since the user code and dependencies are loaded by the user class
loader to avoid class conflicts with the framework.

Anyway, both submission modes are useful. We just need to clarify the
concepts.

Best,

Yang

Zili Chen <wa...@gmail.com> 于2019年8月20日周二 下午5:58写道:

> Thanks for the clarification.
>
> The JobDeployer idea once came to my mind when I was puzzling over how to
> execute per-job mode and session mode with the same user code and
> framework code path.
>
> With the JobDeployer concept we are back to the statement that the
> environment knows all the configuration for cluster deployment and job
> submission. We configure, or generate from the configuration, a specific
> JobDeployer in the environment, and then the code converges on
>
> *JobClient client = env.execute().get();*
>
> which in session mode is returned by clusterClient.submitJob and in
> per-job mode by clusterDescriptor.deployJobCluster.
>
> Here comes a problem: currently we directly run the ClusterEntrypoint
> with the extracted job graph. Following the JobDeployer approach, we had
> better align the entry point of per-job deployment at the JobDeployer:
> users run their main method, or a CLI (which finally calls the main
> method), to deploy the job cluster.
>
> Best,
> tison.
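The "align on env.execute()" idea above can be illustrated with a small sketch. Note this is a hypothetical illustration of the concepts discussed in this thread (JobDeployer, JobClient, Environment are stand-ins, not actual Flink classes): the environment delegates deployment to a mode-specific JobDeployer, so the same user program runs unchanged in session mode and per-job mode.

```java
import java.util.concurrent.CompletableFuture;

// Sketch only: the environment delegates to a mode-specific JobDeployer,
// so user code is identical for session mode and per-job mode.
public class EnvSketch {

    // Handle for interacting with one running job.
    interface JobClient { String jobId(); }

    // One implementation per deployment mode; the environment only sees
    // this interface.
    interface JobDeployer {
        CompletableFuture<JobClient> deploy(String jobGraph);
    }

    static class Environment {
        private final JobDeployer deployer;
        Environment(JobDeployer deployer) { this.deployer = deployer; }

        // The single code path the thread argues for: execution always
        // goes through the configured deployer.
        CompletableFuture<JobClient> execute(String jobGraph) {
            return deployer.deploy(jobGraph);
        }
    }

    public static void main(String[] args) {
        // session mode: deployer wraps an already-running cluster's client
        JobDeployer session =
            g -> CompletableFuture.completedFuture((JobClient) () -> "session-" + g);
        // per-job mode: deployer spins up a dedicated cluster for the job
        JobDeployer perJob =
            g -> CompletableFuture.completedFuture((JobClient) () -> "perjob-" + g);

        // identical user code in both modes:
        JobClient c1 = new Environment(session).execute("wc").join();
        JobClient c2 = new Environment(perJob).execute("wc").join();
        System.out.println(c1.jobId() + " " + c2.jobId());
    }
}
```

The point of the sketch is only that the mode choice lives entirely in which JobDeployer the environment is configured with, never in the user program itself.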
>
>
> Stephan Ewen <se...@apache.org> 于2019年8月20日周二 下午4:40写道:
>
> > Till has made some good comments here.
> >
> > Two things to add:
> >
> >   - The job mode is very nice in the way that it runs the client inside
> > the cluster (in the same image/process that is the JM) and thus unifies
> > both applications and what the Spark world calls the "driver mode".
> >
> >   - Another thing I would add is that during the FLIP-6 design, we were
> > thinking about setups where Dispatcher and JobManager are separate
> > processes.
> >     A Yarn or Mesos Dispatcher of a session could run independently (even
> > as privileged processes executing no code).
> >     Then the "per-job" mode could still be helpful: when a job is
> > submitted to the dispatcher, it launches the JM again in a per-job mode,
> > so that JM and TM processes are bound to the job only. For higher
> > security setups, it is important that processes are not reused across
> > jobs.
> >
> > On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <tr...@apache.org>
> > wrote:
> >
> > > I would not be in favour of getting rid of the per-job mode since it
> > > simplifies the process of running Flink jobs considerably. Moreover, it
> > > is not only well suited for container deployments but also for
> > > deployments where you want to guarantee job isolation. For example, a
> > > user could use the per-job mode on Yarn to execute his job on a
> > > separate cluster.
> > >
> > > I think that having two notions of cluster deployments (session vs.
> > per-job
> > > mode) does not necessarily contradict your ideas for the client api
> > > refactoring. For example one could have the following interfaces:
> > >
> > > - ClusterDeploymentDescriptor: encapsulates the logic how to deploy a
> > > cluster.
> > > - ClusterClient: allows to interact with a cluster
> > > - JobClient: allows to interact with a running job
> > >
> > > Now the ClusterDeploymentDescriptor could have two methods:
> > >
> > > - ClusterClient deploySessionCluster()
> > > - JobClusterClient/JobClient deployPerJobCluster(JobGraph)
> > >
> > > where JobClusterClient is either a supertype of ClusterClient which
> does
> > > not give you the functionality to submit jobs or deployPerJobCluster
> > > returns directly a JobClient.
> > >
> > > When setting up the ExecutionEnvironment, one would then not provide a
> > > ClusterClient to submit jobs but a JobDeployer which, depending on the
> > > selected mode, either uses a ClusterClient (session mode) to submit
> jobs
> > or
> > > a ClusterDeploymentDescriptor to deploy a per-job cluster with the
> > job
> > > to execute.
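Sketched in Java, the interfaces Till outlines above might look as follows. This is only a sketch: the names come from the email, while the JobGraph placeholder and the mode flag are assumptions, not actual Flink classes.

```java
import java.util.concurrent.CompletableFuture;

// Placeholder for the compiled representation of a job (illustrative only).
class JobGraph {
    final String name;
    JobGraph(String name) { this.name = name; }
}

// Allows interacting with a running job.
interface JobClient {
    String getJobId();
    CompletableFuture<Void> getJobResult();
}

// Allows interacting with a cluster; submitting a job yields a JobClient.
interface ClusterClient {
    JobClient submitJob(JobGraph jobGraph);
}

// Encapsulates the logic of how to deploy a cluster, in either mode.
interface ClusterDeploymentDescriptor {
    ClusterClient deploySessionCluster();
    JobClient deployPerJobCluster(JobGraph jobGraph);
}

// A JobDeployer hides the mode from user code: it either submits to an
// existing session cluster or spins up a dedicated per-job cluster.
class JobDeployer {
    private final ClusterDeploymentDescriptor descriptor;
    private final boolean perJobMode;

    JobDeployer(ClusterDeploymentDescriptor descriptor, boolean perJobMode) {
        this.descriptor = descriptor;
        this.perJobMode = perJobMode;
    }

    JobClient deploy(JobGraph jobGraph) {
        return perJobMode
                ? descriptor.deployPerJobCluster(jobGraph)
                : descriptor.deploySessionCluster().submitJob(jobGraph);
    }
}
```

Either way, user code keeps the single `JobClient client = env.execute().get();` shape mentioned earlier in the thread.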
> > >
> > > These are just some thoughts how one could make it working because I
> > > believe there is some value in using the per job mode from the
> > > ExecutionEnvironment.
> > >
> > > Concerning the web submission, this is indeed a bit tricky. From a
> > cluster
> > > management standpoint, I would be in favour of not executing user code on
> > the
> > > REST endpoint. Especially when considering security, it would be good
> to
> > > have a well defined cluster behaviour where it is explicitly stated
> where
> > > user code and, thus, potentially risky code is executed. Ideally we
> limit
> > > it to the TaskExecutor and JobMaster.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Tue, Aug 20, 2019 at 9:40 AM Flavio Pompermaier <
> pompermaier@okkam.it
> > >
> > > wrote:
> > >
> > > > In my opinion the client should not use any environment to get the
> Job
> > > > graph because the jar should reside ONLY on the cluster (and not in
> the
> > > > client classpath otherwise there are always inconsistencies between
> > > client
> > > > and Flink Job manager's classpath).
> > > > In the YARN, Mesos and Kubernetes scenarios you have the jar but you
> > > could
> > > > start a cluster that has the jar on the Job Manager as well (but this
> > is
> > > > the only case where I think you can assume that the client has the
> jar
> > on
> > > > the classpath..in the REST job submission you don't have any
> > classpath).
> > > >
> > > > Thus, always in my opinion, the JobGraph should be generated by the
> Job
> > > > Manager REST API.
> > > >
> > > >
> > > > On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <wa...@gmail.com>
> > wrote:
> > > >
> > > >> I would like to involve Till & Stephan here to clarify some concept
> of
> > > >> per-job mode.
> > > >>
> > > >> The term per-job refers to one of the modes a cluster can run in.
> > > >> It mainly aims at spawning a dedicated cluster for a specific job,
> > > >> while the job can be packaged with Flink itself so that the cluster
> > > >> is initialized with the job, getting rid of a separate submission
> > > >> step.
> > > >>
> > > >> This is useful for container deployments where one creates his image
> > with
> > > >> the job
> > > >> and then simply deploy the container.
> > > >>
> > > >> However, it is out of client scope, since a client (ClusterClient
> > > >> for example) is for communicating with an existing cluster and
> > > >> performing actions. Currently, in per-job mode, we extract the job
> > > >> graph and bundle it into the cluster deployment, and thus no
> > > >> concept of client gets involved. It seems reasonable to exclude the
> > > >> deployment of per-job clusters from the client api and use
> > > >> dedicated utility classes (deployers) for deployment.
> > > >>
> > > >> On Tue, Aug 20, 2019 at 12:37 PM Zili Chen <wa...@gmail.com> wrote:
> > > >>
> > > >> > Hi Aljoscha,
> > > >> >
> > > >> > Thanks for your reply and participation. The Google Doc you
> > > >> > linked to requires permission; I think you could use a share link
> > > >> > instead.
> > > >> >
> > > >> > I agree that we have almost reached a consensus that JobClient
> > > >> > is necessary to interact with a running Job.
> > > >> >
> > > >> > Let me check your open questions one by one.
> > > >> >
> > > >> > 1. Separate cluster creation and job submission for per-job mode.
> > > >> >
> > > >> > As you mentioned, this is where the opinions diverge. In my
> > > >> > document there is an alternative[2] that proposes excluding
> > > >> > per-job deployment from the client api scope, and now I find it
> > > >> > more reasonable to do the exclusion.
> > > >> >
> > > >> > In per-job mode, a dedicated JobCluster is launched to execute
> > > >> > the specific job. It is more like a Flink Application than a
> > > >> > submission of a Flink Job. The client only takes care of job
> > > >> > submission and assumes there is an existing cluster. In this way
> > > >> > we are able to consider per-job issues individually, and
> > > >> > JobClusterEntrypoint would be the utility class for per-job
> > > >> > deployment.
> > > >> >
> > > >> > Nevertheless, the user program works in both session mode and
> > > >> > per-job mode without any need to change code. JobClient in
> > > >> > per-job mode is returned from env.execute as normal. However, it
> > > >> > would no longer be a wrapper of RestClusterClient but a wrapper
> > > >> > of PerJobClusterClient, which communicates with the Dispatcher
> > > >> > locally.
> > > >> >
> > > >> > 2. How to deal with plan preview.
> > > >> >
> > > >> > With env.compile functions users can get JobGraph or FlinkPlan and
> > > thus
> > > >> > they can preview the plan with programming. Typically it looks
> like
> > > >> >
> > > >> > if (preview configured) {
> > > >> >     FlinkPlan plan = env.compile();
> > > >> >     new JSONDumpGenerator(...).dump(plan);
> > > >> > } else {
> > > >> >     env.execute();
> > > >> > }
> > > >> >
> > > >> > And `flink info` would no longer be valid.
> > > >> >
> > > >> > 3. How to deal with Jar Submission at the Web Frontend.
> > > >> >
> > > >> > There is another thread discussing this topic[1]. Apart from
> > > >> > removing the functions, there are two alternatives.
> > > >> >
> > > >> > One is to introduce an interface with a method that returns a
> > > >> > JobGraph/FlinkPlan, and have Jar Submission only support main
> > > >> > classes implementing this interface. We then extract the
> > > >> > JobGraph/FlinkPlan just by calling that method. In this way, it
> > > >> > is even possible to consider a separation of job creation and
> > > >> > job submission.
> > > >> >
> > > >> > The other is, as you mentioned, to let execute() do the actual
> > > >> > execution. We won't execute the main method in the WebFrontend
> > > >> > but spawn a process at the WebMonitor side to execute it. For
> > > >> > the return value, we could generate the JobID at the WebMonitor
> > > >> > and pass it to the execution environment.
> > > >> >
> > > >> > 4. How to deal with detached mode.
> > > >> >
> > > >> > I think detached mode is a temporary solution for non-blocking
> > > >> > submission. In my document, both submission and execution return
> > > >> > a CompletableFuture, and users control whether or not to wait
> > > >> > for the result. At this point we don't need a detached option,
> > > >> > but the functionality is covered.
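A minimal sketch of that idea, with a hypothetical AsyncEnv/executeAsync (assumed names, not an actual Flink API): submission returns a CompletableFuture and the caller decides whether to block, which subsumes the detached/attached distinction.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for a job handle / execution result.
class JobHandle {
    final String jobId;
    JobHandle(String jobId) { this.jobId = jobId; }
}

// Hypothetical environment whose submission is non-blocking by default.
class AsyncEnv {
    CompletableFuture<JobHandle> executeAsync(String jobName) {
        // A real implementation would submit the job to the cluster here;
        // this sketch just completes asynchronously with a fake handle.
        return CompletableFuture.supplyAsync(() -> new JobHandle(jobName + "-0001"));
    }
}
```

"Detached" then simply means not waiting on the returned future, while "attached" means calling join() or get() on it.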
> > > >> >
> > > >> > 5. How does per-job mode interact with interactive programming.
> > > >> >
> > > >> > All of the YARN, Mesos and Kubernetes scenarios now follow the
> > > >> > pattern of launching a JobCluster. And I don't think there would
> > > >> > be inconsistency between different resource managers.
> > > >> >
> > > >> > Best,
> > > >> > tison.
> > > >> >
> > > >> > [1]
> > > >> >
> > > >>
> > >
> >
> https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
> > > >> > [2]
> > > >> >
> > > >>
> > >
> >
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
> > > >> >
> > > >> > On Fri, Aug 16, 2019 at 9:20 PM Aljoscha Krettek <al...@apache.org> wrote:
> > > >> >
> > > >> >> Hi,
> > > >> >>
> > > >> >> I read both Jeffs initial design document and the newer document
> by
> > > >> >> Tison. I also finally found the time to collect our thoughts on
> the
> > > >> issue,
> > > >> >> I had quite some discussions with Kostas and this is the result:
> > [1].
> > > >> >>
> > > >> >> I think overall we agree that this part of the code is in dire
> need
> > > of
> > > >> >> some refactoring/improvements but I think there are still some
> open
> > > >> >> questions and some differences in opinion what those refactorings
> > > >> should
> > > >> >> look like.
> > > >> >>
> > > >> >> I think the API-side is quite clear, i.e. we need some JobClient
> > API
> > > >> that
> > > >> >> allows interacting with a running Job. It could be worthwhile to
> > spin
> > > >> that
> > > >> >> off into a separate FLIP because we can probably find consensus
> on
> > > that
> > > >> >> part more easily.
> > > >> >>
> > > >> >> For the rest, the main open questions from our doc are these:
> > > >> >>
> > > >> >>   - Do we want to separate cluster creation and job submission
> for
> > > >> >> per-job mode? In the past, there were conscious efforts to *not*
> > > >> separate
> > > >> >> job submission from cluster creation for per-job clusters for
> > Mesos,
> > > >> YARN,
> > > >> >> Kubernetes (see StandaloneJobClusterEntryPoint). Tison suggests in
> > his
> > > >> >> design document to decouple this in order to unify job
> submission.
> > > >> >>
> > > >> >>   - How to deal with plan preview, which needs to hijack
> execute()
> > > and
> > > >> >> let the outside code catch an exception?
> > > >> >>
> > > >> >>   - How to deal with Jar Submission at the Web Frontend, which
> > needs
> > > to
> > > >> >> hijack execute() and let the outside code catch an exception?
> > > >> >> CliFrontend.run() “hijacks” ExecutionEnvironment.execute() to
> get a
> > > >> >> JobGraph and then execute that JobGraph manually. We could get
> > around
> > > >> that
> > > >> >> by letting execute() do the actual execution. One caveat for this
> > is
> > > >> that
> > > >> >> now the main() method doesn’t return (or is forced to return by
> > > >> throwing an
> > > >> >> exception from execute()) which means that for Jar Submission
> from
> > > the
> > > >> >> WebFrontend we have a long-running main() method running in the
> > > >> >> WebFrontend. This doesn’t sound very good. We could get around
> this
> > > by
> > > >> >> removing the plan preview feature and by removing Jar
> > > >> Submission/Running.
> > > >> >>
> > > >> >>   - How to deal with detached mode? Right now,
> DetachedEnvironment
> > > will
> > > >> >> execute the job and return immediately. If users control when
> they
> > > >> want to
> > > >> >> return, by waiting on the job completion future, how do we deal
> > with
> > > >> this?
> > > >> >> Do we simply remove the distinction between
> detached/non-detached?
> > > >> >>
> > > >> >>   - How does per-job mode interact with “interactive programming”
> > > >> >> (FLIP-36). For YARN, each execute() call could spawn a new Flink
> > YARN
> > > >> >> cluster. What about Mesos and Kubernetes?
> > > >> >>
> > > >> >> The first open question is where the opinions diverge, I think.
> The
> > > >> rest
> > > >> >> are just open questions and interesting things that we need to
> > > >> consider.
> > > >> >>
> > > >> >> Best,
> > > >> >> Aljoscha
> > > >> >>
> > > >> >> [1]
> > > >> >>
> > > >>
> > >
> >
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> > > >> >> <
> > > >> >>
> > > >>
> > >
> >
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> > > >> >> >
> > > >> >>
> > > >> >> > On 31. Jul 2019, at 15:23, Jeff Zhang <zj...@gmail.com>
> wrote:
> > > >> >> >
> > > >> >> > Thanks tison for the effort. I left a few comments.
> > > >> >> >
> > > >> >> >
> > > >> >> > On Wed, Jul 31, 2019 at 8:24 PM Zili Chen <wa...@gmail.com> wrote:
> > > >> >> >
> > > >> >> >> Hi Flavio,
> > > >> >> >>
> > > >> >> >> Thanks for your reply.
> > > >> >> >>
> > > >> >> >> Either current impl and in the design, ClusterClient
> > > >> >> >> never takes responsibility for generating JobGraph.
> > > >> >> >> (what you see in current codebase is several class methods)
> > > >> >> >>
> > > >> >> >> Instead, user describes his program in the main method
> > > >> >> >> with ExecutionEnvironment apis and calls env.compile()
> > > >> >> >> or env.optimize() to get FlinkPlan and JobGraph respectively.
> > > >> >> >>
> > > >> >> >> For listing main classes in a jar and choose one for
> > > >> >> >> submission, you're now able to customize a CLI to do it.
> > > >> >> >> Specifically, the path of jar is passed as arguments and
> > > >> >> >> in the customized CLI you list main classes, choose one
> > > >> >> >> to submit to the cluster.
> > > >> >> >>
> > > >> >> >> Best,
> > > >> >> >> tison.
> > > >> >> >>
> > > >> >> >>
> > > >> >> >>> On Wed, Jul 31, 2019 at 8:12 PM Flavio Pompermaier <po...@okkam.it> wrote:
> > > >> >> >>
> > > >> >> >>> Just one note on my side: it is not clear to me whether the
> > > client
> > > >> >> needs
> > > >> >> >> to
> > > >> >> >>> be able to generate a job graph or not.
> > > >> >> >>> In my opinion, the job jar must reside only on the
> > > >> >> >>> server/jobManager side, and the client requires a way to
> > > >> >> >>> get the job graph.
> > > >> >> >>> If you really want to access the job graph, I'd add
> > > >> >> >>> dedicated methods on the ClusterClient, like:
> > > >> >> >>>
> > > >> >> >>>   - getJobGraph(jarId, mainClass): JobGraph
> > > >> >> >>>   - listMainClasses(jarId): List<String>
> > > >> >> >>>
> > > >> >> >>> These would require some additions on the job manager
> > > >> >> >>> endpoint as well... what do you think?
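A sketch of the client-side view of these proposed additions (hypothetical names and signatures; the fake implementation only illustrates the call pattern, while a real one would call a JobManager REST endpoint):

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

// Hypothetical client for jar handling on the JobManager side.
interface JarServiceClient {
    // Entry-point classes contained in a jar already uploaded to the cluster.
    List<String> listMainClasses(String jarId);

    // Ask the server side to compile and return the (serialized) job graph.
    byte[] getJobGraph(String jarId, String mainClass);
}

// In-memory fake, purely for illustration.
class FakeJarServiceClient implements JarServiceClient {
    private final Map<String, List<String>> jars = Map.of(
            "jar-42", List.of("org.example.WordCount", "org.example.PageRank"));

    public List<String> listMainClasses(String jarId) {
        return jars.getOrDefault(jarId, List.of());
    }

    public byte[] getJobGraph(String jarId, String mainClass) {
        // A real implementation would return the serialized JobGraph built
        // server-side; here we return a placeholder payload.
        return (jarId + ":" + mainClass).getBytes(StandardCharsets.UTF_8);
    }
}
```

With this shape, the jar never needs to be on the client classpath: the client only holds ids and serialized results.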
> > > >> >> >>>
> > > >> >> >>> On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <
> > wander4096@gmail.com
> > > >
> > > >> >> wrote:
> > > >> >> >>>
> > > >> >> >>>> Hi all,
> > > >> >> >>>>
> > > >> >> >>>> Here is a document[1] on client api enhancement from our
> > > >> perspective.
> > > >> >> >>>> We have investigated current implementations. And we propose
> > > >> >> >>>>
> > > >> >> >>>> 1. Unify the implementation of cluster deployment and job
> > > >> submission
> > > >> >> in
> > > >> >> >>>> Flink.
> > > >> >> >>>> 2. Provide programmatic interfaces to allow flexible job and
> > > >> cluster
> > > >> >> >>>> management.
> > > >> >> >>>>
> > > >> >> >>>> The first proposal is aimed at reducing code paths of
> cluster
> > > >> >> >> deployment
> > > >> >> >>>> and
> > > >> >> >>>> job submission so that one can adopt Flink in his usage
> > easily.
> > > >> The
> > > >> >> >>> second
> > > >> >> >>>> proposal is aimed at providing rich interfaces for advanced
> > > users
> > > >> >> >>>> who want to make accurate control of these stages.
> > > >> >> >>>>
> > > >> >> >>>> Quick reference on open questions:
> > > >> >> >>>>
> > > >> >> >>>> 1. Exclude job cluster deployment from client side or
> redefine
> > > the
> > > >> >> >>> semantic
> > > >> >> >>>> of job cluster? Since it fits in a process quite different
> > from
> > > >> >> session
> > > >> >> >>>> cluster deployment and job submission.
> > > >> >> >>>>
> > > >> >> >>>> 2. Maintain the codepaths handling class
> > > o.a.f.api.common.Program
> > > >> or
> > > >> >> >>>> implement customized program handling logic by customized
> > > >> >> CliFrontend?
> > > >> >> >>>> See also this thread[2] and the document[1].
> > > >> >> >>>>
> > > >> >> >>>> 3. Expose ClusterClient as public api or just expose api in
> > > >> >> >>>> ExecutionEnvironment
> > > >> >> >>>> and delegate them to ClusterClient? Further, in either way
> is
> > it
> > > >> >> worth
> > > >> >> >> to
> > > >> >> >>>> introduce a JobClient which is an encapsulation of
> > ClusterClient
> > > >> that
> > > >> >> >>>> associated to specific job?
> > > >> >> >>>>
> > > >> >> >>>> Best,
> > > >> >> >>>> tison.
> > > >> >> >>>>
> > > >> >> >>>> [1]
> > > >> >> >>>>
> > > >> >> >>>>
> > > >> >> >>>
> > > >> >> >>
> > > >> >>
> > > >>
> > >
> >
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
> > > >> >> >>>> [2]
> > > >> >> >>>>
> > > >> >> >>>>
> > > >> >> >>>
> > > >> >> >>
> > > >> >>
> > > >>
> > >
> >
> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
> > > >> >> >>>>
> > > >> >> >>>> On Wed, Jul 24, 2019 at 9:19 AM Jeff Zhang <zj...@gmail.com> wrote:
> > > >> >> >>>>
> > > >> >> >>>>> Thanks Stephan, I will follow up this issue in next few
> > weeks,
> > > >> and
> > > >> >> >> will
> > > >> >> >>>>> refine the design doc. We could discuss more details after
> > 1.9
> > > >> >> >> release.
> > > >> >> >>>>>
> > > >> >> >>>>> On Wed, Jul 24, 2019 at 12:58 AM Stephan Ewen <se...@apache.org> wrote:
> > > >> >> >>>>>
> > > >> >> >>>>>> Hi all!
> > > >> >> >>>>>>
> > > >> >> >>>>>> This thread has stalled for a bit, which I assume is
> mostly
> > > >> due to
> > > >> >> >>> the
> > > >> >> >>>>>> Flink 1.9 feature freeze and release testing effort.
> > > >> >> >>>>>>
> > > >> >> >>>>>> I personally still recognize this issue as one important
> to
> > be
> > > >> >> >>> solved.
> > > >> >> >>>>> I'd
> > > >> >> >>>>>> be happy to help resume this discussion soon (after the
> 1.9
> > > >> >> >> release)
> > > >> >> >>>> and
> > > >> >> >>>>>> see if we can do some step towards this in Flink 1.10.
> > > >> >> >>>>>>
> > > >> >> >>>>>> Best,
> > > >> >> >>>>>> Stephan
> > > >> >> >>>>>>
> > > >> >> >>>>>>
> > > >> >> >>>>>>
> > > >> >> >>>>>> On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <
> > > >> >> >>>>> pompermaier@okkam.it>
> > > >> >> >>>>>> wrote:
> > > >> >> >>>>>>
> > > >> >> >>>>>>> That's exactly what I suggested a long time ago: the
> Flink
> > > REST
> > > >> >> >>>> client
> > > >> >> >>>>>>> should not require any Flink dependency, only http
> library
> > to
> > > >> >> >> call
> > > >> >> >>>> the
> > > >> >> >>>>>> REST
> > > >> >> >>>>>>> services to submit and monitor a job.
> > > >> >> >>>>>>> What I suggested also in [1] was to have a way to
> > > automatically
> > > >> >> >>>> suggest
> > > >> >> >>>>>> the
> > > >> >> >>>>>>> user (via a UI) the available main classes and their
> > required
> > > >> >> >>>>>>> parameters[2].
> > > >> >> >>>>>>> Another problem we have with Flink is that the Rest
> client
> > > and
> > > >> >> >> the
> > > >> >> >>>> CLI
> > > >> >> >>>>>> one
> > > >> >> >>>>>>> behaves differently and we use the CLI client (via ssh)
> > > because
> > > >> >> >> it
> > > >> >> >>>>> allows
> > > >> >> >>>>>>> to call some other method after env.execute() [3] (we
> have
> > to
> > > >> >> >> call
> > > >> >> >>>>>> another
> > > >> >> >>>>>>> REST service to signal the end of the job).
> > > >> >> >>>>>>> In this regard, a dedicated interface, like the
> JobListener
> > > >> >> >>> suggested
> > > >> >> >>>>> in
> > > >> >> >>>>>>> the previous emails, would be very helpful (IMHO).
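A JobListener along these lines might look like the following sketch (hypothetical names, a sketch only, not an interface Flink ships): code registered on the environment runs after submission and after completion, instead of scripting around the CLI.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface for job lifecycle events.
interface JobListener {
    void onJobSubmitted(String jobId);
    void onJobExecuted(String jobId, boolean success);
}

// Hypothetical environment that notifies listeners around execution.
class ListeningEnv {
    private final List<JobListener> listeners = new ArrayList<>();

    void registerJobListener(JobListener listener) {
        listeners.add(listener);
    }

    void execute(String jobId) {
        listeners.forEach(l -> l.onJobSubmitted(jobId));
        // ... the actual job execution would happen here ...
        listeners.forEach(l -> l.onJobExecuted(jobId, true));
    }
}
```

The "call another REST service when the job ends" step then becomes an onJobExecuted callback instead of code after env.execute().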
> > > >> >> >>>>>>>
> > > >> >> >>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-10864
> > > >> >> >>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-10862
> > > >> >> >>>>>>> [3] https://issues.apache.org/jira/browse/FLINK-10879
> > > >> >> >>>>>>>
> > > >> >> >>>>>>> Best,
> > > >> >> >>>>>>> Flavio
> > > >> >> >>>>>>>
> > > >> >> >>>>>>> On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <
> > zjffdu@gmail.com
> > > >
> > > >> >> >>> wrote:
> > > >> >> >>>>>>>
> > > >> >> >>>>>>>> Hi, Tison,
> > > >> >> >>>>>>>>
> > > >> >> >>>>>>>> Thanks for your comments. Overall I agree with you that
> it
> > > is
> > > >> >> >>>>> difficult
> > > >> >> >>>>>>> for
> > > >> >> >>>>>>>> downstream projects to integrate with flink and we need
> to
> > > >> >> >>> refactor
> > > >> >> >>>>> the
> > > >> >> >>>>>>>> current flink client api.
> > > >> >> >>>>>>>> And I agree that CliFrontend should only parse command
> > > line
> > > >> >> >>>>> arguments
> > > >> >> >>>>>>> and
> > > >> >> >>>>>>>> then pass them to ExecutionEnvironment. It is
> > > >> >> >>>> ExecutionEnvironment's
> > > >> >> >>>>>>>> responsibility to compile job, create cluster, and
> submit
> > > job.
> > > >> >> >>>>> Besides
> > > >> >> >>>>>>>> that, Currently flink has many ExecutionEnvironment
> > > >> >> >>>> implementations,
> > > >> >> >>>>>> and
> > > >> >> >>>>>>>> flink will use the specific one based on the context.
> > IMHO,
> > > it
> > > >> >> >> is
> > > >> >> >>>> not
> > > >> >> >>>>>>>> necessary, ExecutionEnvironment should be able to do the
> > > right
> > > >> >> >>>> thing
> > > >> >> >>>>>>> based
> > > >> >> >>>>>>>> on the FlinkConf it is received. Too many
> > > ExecutionEnvironment
> > > >> >> >>>>>>>> implementation is another burden for downstream project
> > > >> >> >>>> integration.
> > > >> >> >>>>>>>>
> > > >> >> >>>>>>>> One thing I'd like to mention is flink's scala shell and
> > sql
> > > >> >> >>>> client,
> > > >> >> >>>>>>>> although they are sub-modules of flink, they could be
> > > treated
> > > >> >> >> as
> > > >> >> >>>>>>> downstream
> > > >> >> >>>>>>>> project which use flink's client api. Currently you will
> > > find
> > > >> >> >> it
> > > >> >> >>> is
> > > >> >> >>>>> not
> > > >> >> >>>>>>>> easy for them to integrate with flink, they share many
> > > >> >> >> duplicated
> > > >> >> >>>>> code.
> > > >> >> >>>>>>> It
> > > >> >> >>>>>>>> is another sign that we should refactor flink client
> api.
> > > >> >> >>>>>>>>
> > > >> >> >>>>>>>> I believe it is a large and hard change, and I am afraid
> > we
> > > >> can
> > > >> >> >>> not
> > > >> >> >>>>>> keep
> > > >> >> >>>>>>>> compatibility since many of changes are user facing.
> > > >> >> >>>>>>>>
> > > >> >> >>>>>>>>
> > > >> >> >>>>>>>>
> > > >> >> >>>>>>>> On Mon, Jun 24, 2019 at 2:53 PM Zili Chen <wa...@gmail.com> wrote:
> > > >> >> >>>>>>>>
> > > >> >> >>>>>>>>> Hi all,
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>> After a closer look at our client apis, I can see there
> > are
> > > >> >> >> two
> > > >> >> >>>>> major
> > > >> >> >>>>>>>>> issues to consistency and integration, namely different
> > > >> >> >>>> deployment
> > > >> >> >>>>> of
> > > >> >> >>>>>>>>> job cluster which couples job graph creation and
> cluster
> > > >> >> >>>>> deployment,
> > > >> >> >>>>>>>>> and submission via CliFrontend confusing control flow
> of
> > > job
> > > >> >> >>>> graph
> > > >> >> >>>>>>>>> compilation and job submission. I'd like to follow the
> > > >> >> >> discuss
> > > >> >> >>>>> above,
> > > >> >> >>>>>>>>> mainly the process described by Jeff and Stephan, and
> > share
> > > >> >> >> my
> > > >> >> >>>>>>>>> ideas on these issues.
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>> 1) CliFrontend confuses the control flow of job
> > compilation
> > > >> >> >> and
> > > >> >> >>>>>>>> submission.
> > > >> >> >>>>>>>>> Following the process of job submission Stephan and
> Jeff
> > > >> >> >>>> described,
> > > >> >> >>>>>>>>> execution environment knows all configs of the cluster
> > and
> > > >> >> >>>>>>> topos/settings
> > > >> >> >>>>>>>>> of the job. Ideally, in the main method of user
> program,
> > it
> > > >> >> >>> calls
> > > >> >> >>>>>>>> #execute
> > > >> >> >>>>>>>>> (or named #submit) and Flink deploys the cluster,
> compile
> > > the
> > > >> >> >>> job
> > > >> >> >>>>>> graph
> > > >> >> >>>>>>>>> and submit it to the cluster. However, current
> > CliFrontend
> > > >> >> >> does
> > > >> >> >>>> all
> > > >> >> >>>>>>> these
> > > >> >> >>>>>>>>> things inside its #runProgram method, which introduces
> a
> > > lot
> > > >> >> >> of
> > > >> >> >>>>>>>> subclasses
> > > >> >> >>>>>>>>> of (stream) execution environment.
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>> Actually, it sets up an exec env that hijacks the
> > > >> >> >>>>>> #execute/executePlan
> > > >> >> >>>>>>>>> method, initializes the job graph and aborts execution.
> > And
> > > >> >> >> then
> > > >> >> >>>>>>>>> control flow back to CliFrontend, it deploys the
> > cluster(or
> > > >> >> >>>>> retrieve
> > > >> >> >>>>>>>>> the client) and submits the job graph. This is quite a
> > > >> >> >> specific
> > > >> >> >>>>>>> internal
> > > >> >> >>>>>>>>> process inside Flink, with no consistency to anything else.
> anything.
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>> 2) Deployment of job cluster couples job graph creation
> > and
> > > >> >> >>>> cluster
> > > >> >> >>>>>>>>> deployment. Abstractly, from user job to a concrete
> > > >> >> >> submission,
> > > >> >> >>>> it
> > > >> >> >>>>>>>> requires
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>>     create JobGraph --\
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>> create ClusterClient -->  submit JobGraph
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>> such a dependency. ClusterClient was created by
> deploying
> > > or
> > > >> >> >>>>>>> retrieving.
> > > >> >> >>>>>>>>> JobGraph submission requires a compiled JobGraph and
> > valid
> > > >> >> >>>>>>> ClusterClient,
> > > >> >> >>>>>>>>> but the creation of ClusterClient is abstractly
> > independent
> > > >> >> >> of
> > > >> >> >>>> that
> > > >> >> >>>>>> of
> > > >> >> >>>>>>>>> JobGraph. However, in job cluster mode, we deploy job
> > > cluster
> > > >> >> >>>> with
> > > >> >> >>>>> a
> > > >> >> >>>>>>> job
> > > >> >> >>>>>>>>> graph, which means we use another process:
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>> create JobGraph --> deploy cluster with the JobGraph
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>> Here is another inconsistency and downstream
> > > projects/client
> > > >> >> >>> apis
> > > >> >> >>>>> are
> > > >> >> >>>>>>>>> forced to handle different cases with rare supports
> from
> > > >> >> >> Flink.
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>> Since we likely reached a consensus on
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>> 1. all configs gathered by Flink configuration and
> passed
> > > >> >> >>>>>>>>> 2. execution environment knows all configs and handles
> > > >> >> >>>>> execution(both
> > > >> >> >>>>>>>>> deployment and submission)
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>> to the issues above I propose eliminating
> inconsistencies
> > > by
> > > >> >> >>>>>> following
> > > >> >> >>>>>>>>> approach:
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>> 1) CliFrontend should exactly be a front end, at least
> > for
> > > >> >> >>> "run"
> > > >> >> >>>>>>> command.
> > > >> >> >>>>>>>>> That means it just gathers and passes all configs from
> > > >> >> >> command
> > > >> >> >>>> line
> > > >> >> >>>>>> to
> > > >> >> >>>>>>>>> the main method of user program. Execution environment
> > > knows
> > > >> >> >>> all
> > > >> >> >>>>> the
> > > >> >> >>>>>>> info
> > > >> >> >>>>>>>>> and with an addition to utils for ClusterClient, we
> > > >> >> >> gracefully
> > > >> >> >>>> get
> > > >> >> >>>>> a
> > > >> >> >>>>>>>>> ClusterClient by deploying or retrieving. In this way,
> we
> > > >> >> >> don't
> > > >> >> >>>>> need
> > > >> >> >>>>>> to
> > > >> >> >>>>>>>>> hijack #execute/executePlan methods and can remove
> > various
> > > >> >> >>>> hacking
> > > >> >> >>>>>>>>> subclasses of exec env, as well as #run methods in
> > > >> >> >>>>> ClusterClient(for
> > > >> >> >>>>>> an
> > > >> >> >>>>>>>>> interface-ized ClusterClient). Now the control flow
> flows
> > > >> >> >> from
> > > >> >> >>>>>>>> CliFrontend
> > > >> >> >>>>>>>>> to the main method and never returns.
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>> 2) Job cluster means a cluster for the specific job.
> From
> > > >> >> >>> another
> > > >> >> >>>>>>>>> perspective, it is an ephemeral session. We may
> decouple
> > > the
> > > >> >> >>>>>> deployment
> > > >> >> >>>>>>>>> with a compiled job graph, but start a session with
> idle
> > > >> >> >>> timeout
> > > >> >> >>>>>>>>> and submit the job afterwards.
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>> These topics, before we go into more details on design
> or
> > > >> >> >>>>>>> implementation,
> > > >> >> >>>>>>>>> are better to be aware and discussed for a consensus.
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>> Best,
> > > >> >> >>>>>>>>> tison.
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>> On Thu, Jun 20, 2019 at 3:21 AM Zili Chen <wa...@gmail.com> wrote:
> > > >> >> >>>>>>>>>
> > > >> >> >>>>>>>>>> Hi Jeff,
> > > >> >> >>>>>>>>>>
> > > >> >> >>>>>>>>>> Thanks for raising this thread and the design
> document!
> > > >> >> >>>>>>>>>>
> > > >> >> >>>>>>>>>> As @Thomas Weise mentioned above, extending config to
> > > flink
> > > >> >> >>>>>>>>>> requires far more effort than it should. Another
> > > >> >> >>>>>>>>>> example is that we achieve detach mode by introducing
> > > >> >> >>>>>>>>>> another execution environment which also hijacks the
> > > >> >> >>>>>>>>>> #execute method.
> > > >> >> >>>>>>>>>>
> > > >> >> >>>>>>>>>> I agree with your idea that user would configure all
> > > things
> > > >> >> >>>>>>>>>> and flink "just" respect it. On this topic I think the
> > > >> >> >> unusual
> > > >> >> >>>>>>>>>> control flow when CliFrontend handle "run" command is
> > the
> > > >> >> >>>> problem.
> > > >> >> >>>>>>>>>> It handles several configs, mainly about cluster
> > settings,
> > > >> >> >> and
> > > >> >> >>>>>>>>>> thus main method of user program is unaware of them.
> > Also
> > > it
> > > >> >> >>>>>> compiles
> > > >> >> >>>>>>>>>> app to job graph by running the main method with a
> hijacked
> > > exec
> > > >> >> >>>> env,
> > > >> >> >>>>>>>>>> which constrain the main method further.
> > > >> >> >>>>>>>>>>
> > > >> >> >>>>>>>>>> I'd like to write down a few of notes on configs/args
> > pass
> > > >> >> >> and
> > > >> >> >>>>>>> respect,
> > > >> >> >>>>>>>>>> as well as decoupling job compilation and submission.
> > > Share
> > > >> >> >> on
> > > >> >> >>>>> this
> > > >> >> >>>>>>>>>> thread later.
> > > >> >> >>>>>>>>>>
> > > >> >> >>>>>>>>>> Best,
> > > >> >> >>>>>>>>>> tison.
> > > >> >> >>>>>>>>>>
> > > >> >> >>>>>>>>>>
> > > >> >> >>>>>>>>>>> On Mon, Jun 17, 2019 at 7:29 PM SHI Xiaogang <sh...@gmail.com> wrote:
> > > >> >> >>>>>>>>>>
> > > >> >> >>>>>>>>>>> Hi Jeff and Flavio,
> > > >> >> >>>>>>>>>>>
> > > >> >> >>>>>>>>>>> Thanks Jeff a lot for proposing the design document.
> > > >> >> >>>>>>>>>>>
> > > >> >> >>>>>>>>>>> We are also working on refactoring ClusterClient to
> > allow
> > > >> >> >>>>> flexible
> > > >> >> >>>>>>> and
> > > >> >> >>>>>>>>>>> efficient job management in our real-time platform.
> > > >> >> >>>>>>>>>>> We would like to draft a document to share our ideas
> > with
> > > >> >> >>> you.
> > > >> >> >>>>>>>>>>>
> > > >> >> >>>>>>>>>>> I think it's a good idea to have something like
> Apache
> > > Livy
> > > >> >> >>> for
> > > >> >> >>>>>>> Flink,
> > > >> >> >>>>>>>>>>> and
> > > >> >> >>>>>>>>>>> the efforts discussed here will take a great step
> > forward
> > > >> >> >> to
> > > >> >> >>>> it.
> > > >> >> >>>>>>>>>>>
> > > >> >> >>>>>>>>>>> Regards,
> > > >> >> >>>>>>>>>>> Xiaogang
> > > >> >> >>>>>>>>>>>
> > > >> >> >>>>>>>>>>>> On Mon, Jun 17, 2019 at 7:13 PM Flavio Pompermaier <po...@okkam.it> wrote:
> > > >> >> >>>>>>>>>>>
Is there any possibility to have something like Apache Livy [1] also for Flink in the future?

[1] https://livy.apache.org/

On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > >> >> >>>>>>>>>>>>
> Any API we expose should not have dependencies on the runtime (flink-runtime) package or other implementation details. To me, this means that the current ClusterClient cannot be exposed to users because it uses quite some classes from the optimiser and runtime packages.

We should change ClusterClient from a class to an interface. ExecutionEnvironment would use only the ClusterClient interface, which should live in flink-clients, while the concrete implementation classes could stay in flink-runtime.

> What happens when a failure/restart in the client happens? There needs to be a way of re-establishing the connection to the job, setting up the listeners again, etc.

Good point. First we need to define what a failure/restart in the client means. IIUC, that usually means a network failure, which would surface in class RestClient. If my understanding is correct, the restart/retry mechanism should be implemented in RestClient.

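The client-side retry mechanism discussed here for RestClient could look roughly like the following backoff helper. This is a hypothetical illustration in plain Java, not Flink's actual RestClient code; the method name and parameters are invented.

```java
import java.util.function.Supplier;

public class Main {
    // Hypothetical sketch of a client-side retry with exponential backoff,
    // the kind of mechanism the discussion suggests belongs in RestClient.
    public static <T> T retryWithBackoff(Supplier<T> call, int maxAttempts, long initialDelayMs) {
        long delay = initialDelayMs;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();             // e.g. an HTTP request to the JobManager
            } catch (RuntimeException e) {
                last = e;                      // remember the failure and retry
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delay);   // back off before the next attempt
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new RuntimeException("interrupted during retry backoff", ie);
                    }
                    delay *= 2;                // exponential backoff
                }
            }
        }
        throw last;                            // all attempts failed
    }
}
```

A real implementation would also have to re-register listeners after reconnecting, as Aljoscha points out below.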
Aljoscha Krettek <al...@apache.org> wrote on Tue, Jun 11, 2019 at 11:10 PM:
> > > >> >> >>>>>>>>>>>>>
Some points to consider:

* Any API we expose should not have dependencies on the runtime (flink-runtime) package or other implementation details. To me, this means that the current ClusterClient cannot be exposed to users because it uses quite some classes from the optimiser and runtime packages.

* What happens when a failure/restart in the client happens? There needs to be a way of re-establishing the connection to the job, setting up the listeners again, etc.

Aljoscha

On 29. May 2019, at 10:17, Jeff Zhang <zjffdu@gmail.com> wrote:

Sorry folks, the design doc is late as you expected. Here's the design doc I drafted; welcome any comments and feedback.

https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing

Stephan Ewen <se...@apache.org> wrote on Thu, Feb 14, 2019 at 8:43 PM:
> > > >> >> >>>>>>>>>>>>>>>
Nice that this discussion is happening.

In the FLIP, we could also revisit the entire role of the environments again.

Initially, the idea was:
  - the environments take care of the specific setup for standalone (no setup needed), Yarn, Mesos, etc.
  - the session ones have control over the session. The environment holds the session client.
  - running a job gives a "control" object for that job. That behavior is the same in all environments.

The actual implementation diverged quite a bit from that. Happy to see a discussion about straightening this out a bit more.

On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > >> >> >>>>>>>>>>>>>>>>
Hi folks,

Sorry for the late response. It seems we have reached consensus on this; I will create a FLIP with a more detailed design.

Thomas Weise <th...@apache.org> wrote on Fri, Dec 21, 2018 at 11:43 AM:
> > > >> >> >>>>>>>>>>>>>>>>>
Great to see this discussion seeded! The problems you face with the Zeppelin integration are also affecting other downstream projects, like Beam.

We just enabled the savepoint restore option in RemoteStreamEnvironment [1], and that was more difficult than it should be. The main issue is that the environment and the cluster client aren't decoupled. Ideally it should be possible to just get the matching cluster client from the environment and then control the job through it (environment as factory for cluster client). But note that the environment classes are part of the public API, and it is not straightforward to make larger changes without breaking backward compatibility.

ClusterClient currently exposes internal classes like JobGraph and StreamGraph. But it should be possible to wrap this with a new public API that brings the required job control capabilities to downstream projects. Perhaps it is helpful to look at some of the interfaces in Beam while thinking about this: [2] for the portable job API and [3] for the old asynchronous job control from the Beam Java SDK.

The backward compatibility discussion [4] is also relevant here. A new API should shield downstream projects from internals and allow them to interoperate with multiple future Flink versions in the same release line without forced upgrades.

Thanks,
Thomas

[1] https://github.com/apache/flink/pull/7249
[2] https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
[3] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
[4] https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E

On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > >> >> >>>>>>>>>>>>>>>>>>
> I'm not so sure whether the user should be able to define where the job runs (in your example Yarn). This is actually independent of the job development and is something which is decided at deployment time.

Users don't need to specify the execution mode programmatically. They can also pass the execution mode as an argument to the flink run command, e.g.

bin/flink run -m yarn-cluster ....
bin/flink run -m local ...
bin/flink run -m host:port ...

Does this make sense to you?

> To me it makes sense that the ExecutionEnvironment is not directly initialized by the user and is instead context sensitive to how you want to execute your job (Flink CLI vs. IDE, for example).

Right, currently I notice that Flink creates different ContextExecutionEnvironments based on different submission scenarios (Flink CLI vs. IDE). To me this is kind of a hack and not very straightforward. What I suggested above is that Flink should always create the same ExecutionEnvironment but with different configuration, and based on that configuration it would create the proper ClusterClient for the different behaviors.

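To make the "same environment, different configuration" idea concrete, a single entry point that derives the right client from a mode string might look like the following sketch. All types here are hypothetical stand-ins; the real flink run CLI resolves -m targets quite differently.

```java
public class Main {
    // Hypothetical client abstraction, not Flink's real ClusterClient.
    interface ClusterClient { String target(); }

    static final class LocalClient implements ClusterClient {
        public String target() { return "local"; }
    }
    static final class YarnClient implements ClusterClient {
        public String target() { return "yarn-cluster"; }
    }
    static final class RemoteClient implements ClusterClient {
        private final String address;
        RemoteClient(String address) { this.address = address; }
        public String target() { return "remote:" + address; }
    }

    // Mirrors `bin/flink run -m <target>`: one code path, and the execution
    // mode comes purely from configuration.
    static ClusterClient clientFor(String mode) {
        switch (mode) {
            case "local":        return new LocalClient();
            case "yarn-cluster": return new YarnClient();
            default:             return new RemoteClient(mode); // assume host:port
        }
    }
}
```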
Till Rohrmann <tr...@apache.org> wrote on Thu, Dec 20, 2018 at 11:18 PM:
> > > >> >> >>>>>>>>>>>>>>>>>>>
You are probably right that we have code duplication when it comes to the creation of the ClusterClient. This should be reduced in the future.

I'm not so sure whether the user should be able to define where the job runs (in your example Yarn). This is actually independent of the job development and is something which is decided at deployment time. To me it makes sense that the ExecutionEnvironment is not directly initialized by the user and is instead context sensitive to how you want to execute your job (Flink CLI vs. IDE, for example). However, I agree that the ExecutionEnvironment should give you access to the ClusterClient and to the job (maybe in the form of the JobGraph or a job plan).

Cheers,
Till

On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zjffdu@gmail.com> wrote:
> > > >> >> >>>>>>>>>>>>>>>>>>>>
Hi Till,

Thanks for the feedback. You are right that I expect a better programmatic job submission/control API that could be used by downstream projects; it would benefit the whole Flink ecosystem. When I look at the code of the Flink scala-shell and sql-client (I believe they are not the core of Flink, but belong to its ecosystem), I find a lot of duplicated code for creating a ClusterClient from user-provided configuration (the configuration format may differ between scala-shell and sql-client) and then using that ClusterClient to manipulate jobs. I don't think this is convenient for downstream projects. What I expect is that a downstream project only needs to provide the necessary configuration info (maybe by introducing a class FlinkConf), then build an ExecutionEnvironment based on this FlinkConf, and the ExecutionEnvironment will create the proper ClusterClient. This would not only benefit downstream project development but also help their integration tests with Flink. Here's one sample code snippet of what I expect:

val conf = new FlinkConf().mode("yarn")
val env = new ExecutionEnvironment(conf)
val jobId = env.submit(...)
val jobStatus = env.getClusterClient().queryJobStatus(jobId)
env.getClusterClient().cancelJob(jobId)

What do you think?

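The Scala snippet above can be mimicked with a stubbed Java sketch to show the intended flow end to end. FlinkConf, submit, queryJobStatus, and cancelJob are all hypothetical names from the proposal; the implementation below is an in-memory stub, not real job submission.

```java
import java.util.HashMap;
import java.util.Map;

public class Main {
    // Hypothetical configuration holder from the proposal.
    static class FlinkConf {
        String mode = "local";
        FlinkConf mode(String m) { this.mode = m; return this; }
    }

    // In-memory stand-in for the ClusterClient the environment would create.
    static class ClusterClientStub {
        private final Map<String, String> status = new HashMap<>();
        String queryJobStatus(String jobId) { return status.getOrDefault(jobId, "UNKNOWN"); }
        void cancelJob(String jobId) { status.put(jobId, "CANCELED"); }
        void markRunning(String jobId) { status.put(jobId, "RUNNING"); }
    }

    static class ExecutionEnvironmentStub {
        private final FlinkConf conf;
        private final ClusterClientStub client = new ClusterClientStub();
        private int counter = 0;
        ExecutionEnvironmentStub(FlinkConf conf) { this.conf = conf; }
        // Non-blocking submission: return a job id immediately.
        String submit(String jobName) {
            String jobId = conf.mode + "-job-" + (++counter);
            client.markRunning(jobId);
            return jobId;
        }
        ClusterClientStub getClusterClient() { return client; }
    }
}
```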
Till Rohrmann <tr...@apache.org> wrote on Tue, Dec 11, 2018 at 6:28 PM:
> > > >> >> >>>>>>>>>>>>>>>>>>>>>
Hi Jeff,

What you are proposing is to provide the user with better programmatic job control. There was actually an effort to achieve this, but it was never completed [1]. However, there are some improvements in the code base now. Look for example at the NewClusterClient interface, which offers non-blocking job submission. But I agree that we need to improve Flink in this regard.

I would not be in favour of exposing all ClusterClient calls via the ExecutionEnvironment because it would clutter the class and would not be a good separation of concerns. Instead, one idea could be to retrieve the current ClusterClient from the ExecutionEnvironment, which can then be used for cluster and job control. But before we start an effort here, we need to agree on and capture what functionality we want to provide.

Initially, the idea was that we have the ClusterDescriptor describing how to talk to a cluster manager like Yarn or Mesos. The ClusterDescriptor can be used for deploying Flink clusters (job and session) and gives you a ClusterClient. The ClusterClient controls the cluster (e.g. submitting jobs, listing all running jobs). And then there was the idea to introduce a JobClient, which you obtain from the ClusterClient, to trigger job-specific operations (e.g. taking a savepoint, cancelling the job).

[1] https://issues.apache.org/jira/browse/FLINK-4272

Cheers,
Till
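The ClusterDescriptor, ClusterClient, and JobClient layering described above could be sketched as follows. The interfaces only echo the names in the discussion and are not the real Flink types; a trivial in-memory implementation stands in for an actual cluster so the flow can be exercised.

```java
import java.util.HashMap;
import java.util.Map;

public class Main {
    // Job-scoped operations, obtained from the ClusterClient.
    interface JobClient {
        String status();
        void cancel();
    }
    // Cluster-scoped operations (submit jobs, hand out JobClients).
    interface ClusterClient {
        String submitJob(String jobName);
        JobClient getJobClient(String jobId);
    }
    // Knows how to talk to a cluster manager (Yarn, Mesos, ...) and deploy.
    interface ClusterDescriptor {
        ClusterClient deploySessionCluster();
    }

    // Trivial in-memory stand-in so the layering can be exercised.
    static class InMemoryCluster implements ClusterClient {
        private final Map<String, String> jobs = new HashMap<>();
        private int counter = 0;
        public String submitJob(String jobName) {
            String id = "job-" + (++counter);
            jobs.put(id, "RUNNING");
            return id;
        }
        public JobClient getJobClient(String jobId) {
            return new JobClient() {
                public String status() { return jobs.get(jobId); }
                public void cancel() { jobs.put(jobId, "CANCELED"); }
            };
        }
    }
    static class InMemoryDescriptor implements ClusterDescriptor {
        public ClusterClient deploySessionCluster() { return new InMemoryCluster(); }
    }
}
```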

On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zjffdu@gmail.com> wrote:

Hi Folks,

I am trying to integrate Flink into Apache Zeppelin, which is an interactive notebook, and I hit several issues caused by the Flink client API. So I'd like to propose the following changes to the Flink client API.

1. Support non-blocking execution. Currently, ExecutionEnvironment#execute is a blocking method which does two things: first submit the job, then wait for the job until it is finished. I'd like to introduce a non-blocking execution method like ExecutionEnvironment#submit which only submits the job and then returns the jobId to the client, and allow the user to query the job status via the jobId.

2. Add a cancel API in ExecutionEnvironment/StreamExecutionEnvironment. Currently the only way to cancel a job is via the CLI (bin/flink); this is not convenient for downstream projects. So I'd like to add a cancel API in ExecutionEnvironment.

3. Add a savepoint API in ExecutionEnvironment/StreamExecutionEnvironment. This is similar to the cancel API; we should use ExecutionEnvironment as the unified API for third parties to integrate with Flink.

4. Add a listener for the job execution lifecycle, something like the following, so that downstream projects can run custom logic during the lifecycle of a job. E.g. Zeppelin would capture the jobId after the job is submitted and then use this jobId to cancel the job later when necessary.

public interface JobListener {

  void onJobSubmitted(JobID jobId);

  void onJobExecuted(JobExecutionResult jobResult);

  void onJobCanceled(JobID jobId);
}

5. Enable sessions in ExecutionEnvironment. Currently sessions are disabled, but they are very convenient for third parties that submit jobs continually. I hope Flink can enable them again.

6. Unify all Flink client APIs into ExecutionEnvironment/StreamExecutionEnvironment. This is a long-term issue which needs more careful thinking and design. Currently some Flink features are exposed in ExecutionEnvironment/StreamExecutionEnvironment, but some are exposed in the CLI instead of the API, like the cancel and savepoint operations I mentioned above. I think the root cause is that Flink never unified how clients interact with it. Here I list 3 scenarios of Flink operation:
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Local job execution.  Flink will
> create
> > > >> >> >>>>>>>>>>> LocalEnvironment
> > > >> >> >>>>>>>>>>>>>>>>> and
> > > >> >> >>>>>>>>>>>>>>>>>>>> then
> > > >> >> >>>>>>>>>>>>>>>>>>>>>> use
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  this LocalEnvironment to create
> > > >> >> >>> LocalExecutor
> > > >> >> >>>>> for
> > > >> >> >>>>>>> job
> > > >> >> >>>>>>>>>>>>>>>>>> execution.
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Remote job execution. Flink will
> create
> > > >> >> >>>>>>>> ClusterClient
> > > >> >> >>>>>>>>>>>>>>>>> first
> > > >> >> >>>>>>>>>>>>>>>>>>> and
> > > >> >> >>>>>>>>>>>>>>>>>>>>> then
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  create ContextEnvironment based on the
> > > >> >> >>>>>>> ClusterClient
> > > >> >> >>>>>>>>>>> and
> > > >> >> >>>>>>>>>>>>>>>>> then
> > > >> >> >>>>>>>>>>>>>>>>>>> run
> > > >> >> >>>>>>>>>>>>>>>>>>>>> the
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>> job.
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Job cancelation. Flink will create
> > > >> >> >>>>>> ClusterClient
> > > >> >> >>>>>>>>>>> first
> > > >> >> >>>>>>>>>>>>>>>> and
> > > >> >> >>>>>>>>>>>>>>>>>>> then
> > > >> >> >>>>>>>>>>>>>>>>>>>>>> cancel
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>>  this job via this ClusterClient.
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>> As you can see in the above 3 scenarios.
> > > >> >> >>> Flink
> > > >> >> >>>>>> didn't
> > > >> >> >>>>>>>>>>> use the
> > > >> >> >>>>>>>>>>>>>>>>>> same
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>> approach(code path) to interact with
> flink
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>> What I propose is following:
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Create the proper
> > > >> >> >>>>>> LocalEnvironment/RemoteEnvironment
> > > >> >> >>>>>>>>>>> (based
> > > >> >> >>>>>>>>>>>>>>>> on
> > > >> >> >>>>>>>>>>>>>>>>>> user
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>> configuration) --> Use this Environment
> to
> > > >> >> >>>> create
> > > >> >> >>>>>>>> proper
> > > >> >> >>>>>>>>>>>>>>>>>>>> ClusterClient
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>> (LocalClusterClient or RestClusterClient)
> > > >> >> >> to
> > > >> >> >>>>>>>> interactive
> > > >> >> >>>>>>>>>>> with
> > > >> >> >>>>>>>>>>>>>>>>>>> Flink (
> > > >> >> >>>>>>>>>>>>>>>>>>>>> job
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>> execution or cancelation)
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>> This way we can unify the process of
> local
> > > >> >> >>>>>> execution
> > > >> >> >>>>>>>> and
> > > >> >> >>>>>>>>>>>>>>>> remote
> > > >> >> >>>>>>>>>>>>>>>>>>>>>> execution.
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>> And it is much easier for third party to
> > > >> >> >>>>> integrate
> > > >> >> >>>>>>> with
> > > >> >> >>>>>>>>>>>>>>>> flink,
> > > >> >> >>>>>>>>>>>>>>>>>>>> because
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment is the unified entry
> > > >> >> >>> point
> > > >> >> >>>>> for
> > > >> >> >>>>>>>>>>> flink.
> > > >> >> >>>>>>>>>>>>>>>> What
> > > >> >> >>>>>>>>>>>>>>>>>>> third
> > > >> >> >>>>>>>>>>>>>>>>>>>>>> party
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>> needs to do is just pass configuration to
> > > >> >> >>>>>>>>>>>>>>>> ExecutionEnvironment
> > > >> >> >>>>>>>>>>>>>>>>>> and
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment will do the right
> > > >> >> >> thing
> > > >> >> >>>>> based
> > > >> >> >>>>>> on
> > > >> >> >>>>>>>> the
> > > >> >> >>>>>>>>>>>>>>>>>>>>> configuration.
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Flink cli can also be considered as flink
> > > >> >> >> api
> > > >> >> >>>>>>> consumer.
> > > >> >> >>>>>>>>>>> it
> > > >> >> >>>>>>>>>>>>>>>> just
> > > >> >> >>>>>>>>>>>>>>>>>>> pass
> > > >> >> >>>>>>>>>>>>>>>>>>>>> the
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>> configuration to ExecutionEnvironment and
> > > >> >> >> let
> > > >> >> >>>>>>>>>>>>>>>>>> ExecutionEnvironment
> > > >> >> >>>>>>>>>>>>>>>>>>> to
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>> create the proper ClusterClient instead
> of
> > > >> >> >>>>> letting
> > > >> >> >>>>>>> cli
> > > >> >> >>>>>>>> to
> > > >> >> >>>>>>>>>>>>>>>>> create
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>> ClusterClient directly.
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>> 6 would involve large code refactoring,
> so
> > > >> >> >> I
> > > >> >> >>>>> think
> > > >> >> >>>>>> we
> > > >> >> >>>>>>>> can
> > > >> >> >>>>>>>>>>>>>>>> defer
> > > >> >> >>>>>>>>>>>>>>>>>> it
> > > >> >> >>>>>>>>>>>>>>>>>>>> for
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>> future release, 1,2,3,4,5 could be done
> at
> > > >> >> >>>> once I
> > > >> >> >>>>>>>>>>> believe.
> > > >> >> >>>>>>>>>>>>>>>> Let
> > > >> >> >>>>>>>>>>>>>>>>> me
> > > >> >> >>>>>>>>>>>>>>>>>>>> know
> > > >> >> >>>>>>>>>>>>>>>>>>>>>> your
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>> comments and feedback, thanks
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>> --
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Best Regards
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
> > > >> >> >>>>>>>>>>>>>>>>>>>>>>>
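A hypothetical sketch of how the listener proposed in the quoted email could be wired up. All types here are stand-ins for illustration; the register method name and the Env class are assumed, not actual Flink API:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types; the JobListener interface mirrors the proposal above,
// everything else (JobID, Env, registerJobListener) is assumed.
class JobID {}
class JobExecutionResult {}

interface JobListener {
    void onJobSubmitted(JobID jobId);
    void onJobExecuted(JobExecutionResult jobResult);
    void onJobCanceled(JobID jobId);
}

class Env {
    private final List<JobListener> listeners = new ArrayList<>();

    void registerJobListener(JobListener l) { listeners.add(l); }

    void execute() {
        JobID id = new JobID();
        listeners.forEach(l -> l.onJobSubmitted(id));   // fired after submission
        JobExecutionResult r = new JobExecutionResult();
        listeners.forEach(l -> l.onJobExecuted(r));     // fired after completion
    }
}

public class ListenerSketch {
    public static void main(String[] args) {
        Env env = new Env();
        final JobID[] captured = new JobID[1];
        env.registerJobListener(new JobListener() {
            public void onJobSubmitted(JobID jobId) { captured[0] = jobId; }
            public void onJobExecuted(JobExecutionResult r) {}
            public void onJobCanceled(JobID jobId) {}
        });
        env.execute();
        // Zeppelin-style usage: the captured id can later be used to cancel.
        System.out.println(captured[0] != null);
    }
}
```

This is the Zeppelin use case from the email: capture the jobId on submission so a cancel can be issued later through the same client.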

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Zili Chen <wa...@gmail.com>.
Thanks for the clarification.

The idea of a JobDeployer came into my mind when I was puzzling over
how to execute per-job mode and session mode with the same user code
and framework code path.

With the JobDeployer concept we come back to the statement that the
environment knows all the configs for cluster deployment and job
submission. We configure, or generate from the configuration, a specific
JobDeployer in the environment, and then user code aligns on

*JobClient client = env.execute().get();*

which in session mode is returned by clusterClient.submitJob and in
per-job mode by clusterDescriptor.deployJobCluster.

Here comes a problem: currently we directly run the ClusterEntrypoint
with the extracted job graph. Following the JobDeployer way, we had
better align the entry point of per-job deployment on the JobDeployer:
users run their main method, or a CLI (which finally calls the main
method), to deploy the job cluster.
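The idea above can be sketched as follows. This is a minimal illustration only; all class names are stand-ins inferred from the discussion, not actual Flink API:

```java
import java.util.concurrent.CompletableFuture;

// Stand-in types for the discussion; none of these are actual Flink classes.
class JobGraph {}
class JobID {}

// A handle to a running job, as proposed in this thread.
interface JobClient {
    JobID getJobID();
}

// The JobDeployer abstraction: the environment picks one implementation
// from configuration, and user code only ever sees env.execute().get(),
// regardless of mode.
interface JobDeployer {
    CompletableFuture<JobClient> deploy(JobGraph jobGraph);
}

// Session mode: submit to an existing cluster (stands in for
// clusterClient.submitJob).
class SessionJobDeployer implements JobDeployer {
    public CompletableFuture<JobClient> deploy(JobGraph jobGraph) {
        JobID id = new JobID();
        return CompletableFuture.completedFuture(() -> id);
    }
}

// Per-job mode: spin up a dedicated cluster carrying the job (stands in
// for clusterDescriptor.deployJobCluster).
class PerJobDeployer implements JobDeployer {
    public CompletableFuture<JobClient> deploy(JobGraph jobGraph) {
        JobID id = new JobID();
        return CompletableFuture.completedFuture(() -> id);
    }
}

public class DeployerSketch {
    public static void main(String[] args) throws Exception {
        JobDeployer deployer = new SessionJobDeployer(); // chosen from config
        JobClient client = deployer.deploy(new JobGraph()).get();
        System.out.println(client.getJobID() != null);
    }
}
```

The point of the sketch is that both modes expose the same asynchronous `deploy` call, so the framework code path stays identical.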

Best,
tison.


Stephan Ewen <se...@apache.org> 于2019年8月20日周二 下午4:40写道:

> Till has made some good comments here.
>
> Two things to add:
>
>   - The job mode is very nice in the way that it runs the client inside the
> cluster (in the same image/process that is the JM) and thus unifies both
> applications and what the Spark world calls the "driver mode".
>
>   - Another thing I would add is that during the FLIP-6 design, we were
> thinking about setups where Dispatcher and JobManager are separate
> processes.
>     A Yarn or Mesos Dispatcher of a session could run independently (even
> as privileged processes executing no code).
>     Then the "per-job" mode could still be helpful: when a job is
> submitted to the dispatcher, it launches the JM again in per-job mode, so
> that JM and TM processes are bound to the job only. For higher security
> setups, it is important that processes are not reused across jobs.
>
> On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <tr...@apache.org>
> wrote:
>
> > I would not be in favour of getting rid of the per-job mode since it
> > simplifies the process of running Flink jobs considerably. Moreover, it
> is
> > not only well suited for container deployments but also for deployments
> > where you want to guarantee job isolation. For example, a user could use
> > the per-job mode on Yarn to execute his job on a separate cluster.
> >
> > I think that having two notions of cluster deployments (session vs.
> per-job
> > mode) does not necessarily contradict your ideas for the client api
> > refactoring. For example one could have the following interfaces:
> >
> > - ClusterDeploymentDescriptor: encapsulates the logic how to deploy a
> > cluster.
> > - ClusterClient: allows to interact with a cluster
> > - JobClient: allows to interact with a running job
> >
> > Now the ClusterDeploymentDescriptor could have two methods:
> >
> > - ClusterClient deploySessionCluster()
> > - JobClusterClient/JobClient deployPerJobCluster(JobGraph)
> >
> > where JobClusterClient is either a supertype of ClusterClient that does
> > not give you the functionality to submit jobs, or deployPerJobCluster
> > returns a JobClient directly.
> >
> > When setting up the ExecutionEnvironment, one would then not provide a
> > ClusterClient to submit jobs but a JobDeployer which, depending on the
> > selected mode, either uses a ClusterClient (session mode) to submit
> > jobs or a ClusterDeploymentDescriptor to deploy a per-job mode cluster
> > with the job to execute.
> >
> > These are just some thoughts on how one could make it work, because I
> > believe there is some value in using the per-job mode from the
> > ExecutionEnvironment.
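The three interfaces Till describes can be sketched roughly as below. The names come from the email; the bodies are a trivial in-memory stand-in written only to show how the pieces compose, not actual Flink code:

```java
import java.util.concurrent.CompletableFuture;

// Stand-in types, inferred from the discussion.
class JobGraph {}
class JobID {}

interface JobClient {                     // interact with a running job
    JobID getJobID();
}

interface ClusterClient {                 // interact with a cluster
    CompletableFuture<JobClient> submitJob(JobGraph jobGraph);
}

interface ClusterDeploymentDescriptor {   // how to deploy a cluster
    ClusterClient deploySessionCluster();
    JobClient deployPerJobCluster(JobGraph jobGraph);
}

// Trivial dummy implementation, just so the composition is runnable.
class DummyDescriptor implements ClusterDeploymentDescriptor {
    public ClusterClient deploySessionCluster() {
        return jobGraph -> {
            JobClient client = JobID::new;
            return CompletableFuture.completedFuture(client);
        };
    }
    public JobClient deployPerJobCluster(JobGraph jobGraph) {
        return JobID::new;
    }
}

public class DescriptorSketch {
    public static void main(String[] args) throws Exception {
        ClusterDeploymentDescriptor d = new DummyDescriptor();
        // Session mode: deploy once, then submit any number of jobs.
        JobClient a = d.deploySessionCluster().submitJob(new JobGraph()).get();
        // Per-job mode: deployment and submission collapse into one step.
        JobClient b = d.deployPerJobCluster(new JobGraph());
        System.out.println(a.getJobID() != null && b.getJobID() != null);
    }
}
```

Note the asymmetry the thread is discussing: the session path returns a ClusterClient that can still submit, while the per-job path hands back only a job handle.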
> >
> > Concerning the web submission, this is indeed a bit tricky. From a
> > cluster management standpoint, I would be in favour of not executing
> > user code on the REST endpoint. Especially when considering security,
> > it would be good to
> > have a well defined cluster behaviour where it is explicitly stated where
> > user code and, thus, potentially risky code is executed. Ideally we limit
> > it to the TaskExecutor and JobMaster.
> >
> > Cheers,
> > Till
> >
> > On Tue, Aug 20, 2019 at 9:40 AM Flavio Pompermaier <pompermaier@okkam.it
> >
> > wrote:
> >
> > > In my opinion the client should not use any environment to get the
> > > job graph, because the jar should reside ONLY on the cluster (and not
> > > in the client classpath; otherwise there are always inconsistencies
> > > between the client's and the Flink Job Manager's classpath).
> > > In the YARN, Mesos and Kubernetes scenarios you have the jar, and you
> > > could start a cluster that has the jar on the Job Manager as well
> > > (but this is the only case where I think you can assume that the
> > > client has the jar on the classpath; in the REST job submission you
> > > don't have any classpath).
> > >
> > > Thus, always in my opinion, the JobGraph should be generated by the Job
> > > Manager REST API.
> > >
> > >
> > > On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <wa...@gmail.com>
> wrote:
> > >
> > >> I would like to involve Till & Stephan here to clarify some concept of
> > >> per-job mode.
> > >>
> > >> The term per-job refers to one of the modes a cluster can run in. It
> > >> is mainly aimed at spawning a dedicated cluster for a specific job;
> > >> the job can be packaged with Flink itself, so the cluster is
> > >> initialized with the job and we get rid of a separate submission
> > >> step.
> > >>
> > >> This is useful for container deployments, where one creates his image
> > >> with the job and then simply deploys the container.
> > >>
> > >> However, it is out of the client's scope, since a client
> > >> (ClusterClient for example) is for communicating with an existing
> > >> cluster and performing actions. Currently, in per-job mode, we
> > >> extract the job graph and bundle it into the cluster deployment, and
> > >> thus no concept of a client gets involved. It looks reasonable to
> > >> exclude the deployment of the per-job cluster from the client api and
> > >> use dedicated utility classes (deployers) for deployment.
> > >>
> > >> Zili Chen <wa...@gmail.com> 于2019年8月20日周二 下午12:37写道:
> > >>
> > >> > Hi Aljoscha,
> > >> >
> > >> > Thanks for your reply and participation. The Google Doc you linked
> > >> > to requires permission; I think you could use a share link instead.
> > >> >
> > >> > I agree that we have almost reached a consensus that a JobClient is
> > >> > necessary to interact with a running job.
> > >> >
> > >> > Let me check your open questions one by one.
> > >> >
> > >> > 1. Separate cluster creation and job submission for per-job mode.
> > >> >
> > >> > As you mentioned, here is where the opinions diverge. In my
> > >> > document there is an alternative[2] that proposes excluding per-job
> > >> > deployment from the client api scope, and now I find it more
> > >> > reasonable that we do the exclusion.
> > >> >
> > >> > In per-job mode, a dedicated JobCluster is launched to execute the
> > >> > specific job. It is more like a Flink application than a submission
> > >> > of a Flink job. The client only takes care of job submission and
> > >> > assumes there is an existing cluster. In this way we are able to
> > >> > consider per-job issues individually, and JobClusterEntrypoint
> > >> > would be the utility class for per-job deployment.
> > >> >
> > >> > Nevertheless, the user program works in both session mode and
> > >> > per-job mode without any need to change code. The JobClient in
> > >> > per-job mode is returned from env.execute as normal. However, it
> > >> > would no longer be a wrapper of RestClusterClient but a wrapper of
> > >> > PerJobClusterClient, which communicates with the Dispatcher
> > >> > locally.
> > >> >
> > >> > 2. How to deal with plan preview.
> > >> >
> > >> > With the env.compile functions, users can get the JobGraph or
> > >> > FlinkPlan and thus preview the plan programmatically. Typically it
> > >> > looks like
> > >> >
> > >> > if (preview configured) {
> > >> >     FlinkPlan plan = env.compile();
> > >> >     new JSONDumpGenerator(...).dump(plan);
> > >> > } else {
> > >> >     env.execute();
> > >> > }
> > >> >
> > >> > And `flink info` would no longer be needed.
> > >> >
> > >> > 3. How to deal with Jar Submission at the Web Frontend.
> > >> >
> > >> > There is one more thread discussing this topic[1]. Apart from
> > >> > removing the functions, there are two alternatives.
> > >> >
> > >> > One is to introduce an interface with a method that returns a
> > >> > JobGraph/FlinkPlan, and have Jar Submission only support main
> > >> > classes that implement this interface. We then extract the
> > >> > JobGraph/FlinkPlan just by calling the method. In this way, it is
> > >> > even possible to consider a separation of job creation and job
> > >> > submission.
> > >> >
> > >> > The other is, as you mentioned, to let execute() do the actual
> > >> > execution. We won't execute the main method in the WebFrontend but
> > >> > spawn a process at the WebMonitor side to execute it. For the
> > >> > return part, we could generate the JobID in the WebMonitor and pass
> > >> > it to the execution environment.
> > >> >
> > >> > 4. How to deal with detached mode.
> > >> >
> > >> > I think detached mode is a temporary solution for non-blocking
> > >> > submission. In my document, both submission and execution return a
> > >> > CompletableFuture, and users control whether or not to wait for the
> > >> > result. At this point we don't need a detached option, but the
> > >> > functionality is covered.
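The CompletableFuture-based submission described here can be sketched as below. The Env and JobClient types are assumed stand-ins for illustration, not actual Flink API:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in types; not actual Flink classes.
class JobID {}
class JobExecutionResult {}

class JobClient {
    private final JobID jobId = new JobID();
    JobID getJobID() { return jobId; }
    CompletableFuture<JobExecutionResult> getJobExecutionResult() {
        return CompletableFuture.completedFuture(new JobExecutionResult());
    }
}

class Env {
    // Submission itself is asynchronous; no detached/attached flag needed.
    CompletableFuture<JobClient> executeAsync() {
        return CompletableFuture.completedFuture(new JobClient());
    }
}

public class DetachedSketch {
    public static void main(String[] args) throws Exception {
        Env env = new Env();

        // "Detached" usage: fire and forget, keeping only the job id.
        JobID id = env.executeAsync().get().getJobID();

        // "Attached" usage: the same API; we simply choose to wait.
        JobExecutionResult result = env.executeAsync()
            .thenCompose(JobClient::getJobExecutionResult)
            .get();

        System.out.println(id != null && result != null);
    }
}
```

Both call sites use the same submission API; whether the caller blocks is decided by whether it waits on the returned future, which is exactly the point being made about removing the detached option.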
> > >> >
> > >> > 5. How does per-job mode interact with interactive programming.
> > >> >
> > >> > All of the YARN, Mesos and Kubernetes scenarios follow the pattern
> > >> > of launching a JobCluster now, and I don't think there would be
> > >> > inconsistency between the different resource managers.
> > >> >
> > >> > Best,
> > >> > tison.
> > >> >
> > >> > [1]
> > >> >
> > >>
> >
> https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
> > >> > [2]
> > >> >
> > >>
> >
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
> > >> >
> > >> > Aljoscha Krettek <al...@apache.org> 于2019年8月16日周五 下午9:20写道:
> > >> >
> > >> >> Hi,
> > >> >>
> > >> >> I read both Jeff's initial design document and the newer document
> > >> >> by Tison. I also finally found the time to collect our thoughts on
> > >> >> the issue; I had quite some discussions with Kostas and this is
> > >> >> the result: [1].
> > >> >>
> > >> >> I think overall we agree that this part of the code is in dire
> > >> >> need of some refactoring/improvements, but I think there are still
> > >> >> some open questions and some differences of opinion about what
> > >> >> those refactorings should look like.
> > >> >>
> > >> >> I think the API side is quite clear, i.e. we need some JobClient
> > >> >> API that allows interacting with a running job. It could be
> > >> >> worthwhile to spin that off into a separate FLIP, because we can
> > >> >> probably find consensus on that part more easily.
> > >> >>
> > >> >> For the rest, the main open questions from our doc are these:
> > >> >>
> > >> >>   - Do we want to separate cluster creation and job submission for
> > >> >> per-job mode? In the past, there were conscious efforts to *not*
> > >> separate
> > >> >> job submission from cluster creation for per-job clusters for
> Mesos,
> > >> YARN,
> > >> >> Kubernetes (see StandaloneJobClusterEntryPoint). Tison suggests in
> his
> > >> >> design document to decouple this in order to unify job submission.
> > >> >>
> > >> >>   - How to deal with plan preview, which needs to hijack execute()
> > >> >> and let the outside code catch an exception?
> > >> >>
> > >> >>   - How to deal with Jar Submission at the Web Frontend, which
> > >> >> needs to hijack execute() and let the outside code catch an
> > >> >> exception? CliFrontend.run() “hijacks”
> > >> >> ExecutionEnvironment.execute() to get a JobGraph and then execute
> > >> >> that JobGraph manually. We could get around that by letting
> > >> >> execute() do the actual execution. One caveat for this is that now
> > >> >> the main() method doesn’t return (or is forced to return by
> > >> >> throwing an exception from execute()), which means that for Jar
> > >> >> Submission from the WebFrontend we have a long-running main()
> > >> >> method running in the WebFrontend. This doesn’t sound very good.
> > >> >> We could get around this by removing the plan preview feature and
> > >> >> by removing Jar Submission/Running.
> > >> >>
> > >> >>   - How to deal with detached mode? Right now, DetachedEnvironment
> > >> >> will execute the job and return immediately. If users control when
> > >> >> they want to return, by waiting on the job completion future, how
> > >> >> do we deal with this? Do we simply remove the distinction between
> > >> >> detached/non-detached?
> > >> >>
> > >> >>   - How does per-job mode interact with “interactive programming”
> > >> >> (FLIP-36)? For YARN, each execute() call could spawn a new Flink
> > >> >> YARN cluster. What about Mesos and Kubernetes?
> > >> >>
> > >> >> The first open question is where the opinions diverge, I think.
> > >> >> The rest are just open questions and interesting things that we
> > >> >> need to consider.
> > >> >>
> > >> >> Best,
> > >> >> Aljoscha
> > >> >>
> > >> >> [1]
> > >> >>
> > >>
> >
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> > >> >> <
> > >> >>
> > >>
> >
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> > >> >> >
> > >> >>
> > >> >> > On 31. Jul 2019, at 15:23, Jeff Zhang <zj...@gmail.com> wrote:
> > >> >> >
> > >> >> > Thanks tison for the effort. I left a few comments.
> > >> >> >
> > >> >> >
> > >> >> > Zili Chen <wa...@gmail.com> 于2019年7月31日周三 下午8:24写道:
> > >> >> >
> > >> >> >> Hi Flavio,
> > >> >> >>
> > >> >> >> Thanks for your reply.
> > >> >> >>
> > >> >> >> In both the current implementation and in the design, the
> > >> >> >> ClusterClient never takes responsibility for generating the
> > >> >> >> JobGraph. (What you see in the current codebase are several
> > >> >> >> class methods.)
> > >> >> >>
> > >> >> >> Instead, the user describes his program in the main method with
> > >> >> >> the ExecutionEnvironment apis and calls env.compile() or
> > >> >> >> env.optimize() to get the FlinkPlan and JobGraph respectively.
> > >> >> >>
> > >> >> >> For listing the main classes in a jar and choosing one for
> > >> >> >> submission, you're now able to customize a CLI to do it.
> > >> >> >> Specifically, the path of the jar is passed as an argument, and
> > >> >> >> in the customized CLI you list the main classes and choose one
> > >> >> >> to submit to the cluster.
> > >> >> >>
> > >> >> >> Best,
> > >> >> >> tison.
> > >> >> >>
> > >> >> >>
> > >> >> >> Flavio Pompermaier <po...@okkam.it> 于2019年7月31日周三
> 下午8:12写道:
> > >> >> >>
> > >> >> >>> Just one note on my side: it is not clear to me whether the
> > >> >> >>> client needs to be able to generate a job graph or not.
> > >> >> >>> In my opinion, the job jar must reside only on the
> > >> >> >>> server/jobManager side, and the client requires a way to get
> > >> >> >>> the job graph. If you really want access to the job graph, I'd
> > >> >> >>> add dedicated methods on the ClusterClient, like:
> > >> >> >>>
> > >> >> >>>   - getJobGraph(jarId, mainClass): JobGraph
> > >> >> >>>   - listMainClasses(jarId): List<String>
> > >> >> >>>
> > >> >> >>> These would also require some additions on the job manager
> > >> >> >>> endpoint. What do you think?
> > >> >> >>>
> > >> >> >>> On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <
> wander4096@gmail.com
> > >
> > >> >> wrote:
> > >> >> >>>
> > >> >> >>>> Hi all,
> > >> >> >>>>
> > >> >> >>>> Here is a document[1] on client api enhancement from our
> > >> perspective.
> > >> >> >>>> We have investigated current implementations. And we propose
> > >> >> >>>>
> > >> >> >>>> 1. Unify the implementation of cluster deployment and job
> > >> submission
> > >> >> in
> > >> >> >>>> Flink.
> > >> >> >>>> 2. Provide programmatic interfaces to allow flexible job and
> > >> cluster
> > >> >> >>>> management.
> > >> >> >>>>
> > >> >> >>>> The first proposal is aimed at reducing code paths of cluster
> > >> >> >> deployment
> > >> >> >>>> and
> > >> >> >>>> job submission so that one can adopt Flink in his usage
> easily.
> > >> The
> > >> >> >>> second
> > >> >> >>>> proposal is aimed at providing rich interfaces for advanced
> > users
> > >> >> >>>> who want to make accurate control of these stages.
> > >> >> >>>>
> > >> >> >>>> Quick reference on open questions:
> > >> >> >>>>
> > >> >> >>>> 1. Exclude job cluster deployment from client side or redefine
> > the
> > >> >> >>> semantic
> > >> >> >>>> of job cluster? Since it fits in a process quite different
> from
> > >> >> session
> > >> >> >>>> cluster deployment and job submission.
> > >> >> >>>>
> > >> >> >>>> 2. Maintain the codepaths handling class
> > o.a.f.api.common.Program
> > >> or
> > >> >> >>>> implement customized program handling logic by customized
> > >> >> CliFrontend?
> > >> >> >>>> See also this thread[2] and the document[1].
> > >> >> >>>>
> > >> >> >>>> 3. Expose ClusterClient as public api or just expose api in
> > >> >> >>>> ExecutionEnvironment
> > >> >> >>>> and delegate them to ClusterClient? Further, in either way is
> it
> > >> >> worth
> > >> >> >> to
> > >> >> >>>> introduce a JobClient which is an encapsulation of
> ClusterClient
> > >> that
> > >> >> >>>> associated to specific job?
> > >> >> >>>>
> > >> >> >>>> Best,
> > >> >> >>>> tison.
> > >> >> >>>>
> > >> >> >>>> [1]
> > >> >> >>>>
> > >> >> >>>>
> > >> >> >>>
> > >> >> >>
> > >> >>
> > >>
> >
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
> > >> >> >>>> [2]
> > >> >> >>>>
> > >> >> >>>>
> > >> >> >>>
> > >> >> >>
> > >> >>
> > >>
> >
> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
> > >> >> >>>>
> > >> >> >>>> Jeff Zhang <zj...@gmail.com> 于2019年7月24日周三 上午9:19写道:
> > >> >> >>>>
> > >> >> >>>>> Thanks Stephan, I will follow up on this issue in the next
> > >> >> >>>>> few weeks and will refine the design doc. We can discuss more
> > >> >> >>>>> details after the 1.9 release.
> > >> >> >>>>>
> > >> >> >>>>> Stephan Ewen <se...@apache.org> 于2019年7月24日周三 上午12:58写道:
> > >> >> >>>>>
> > >> >> >>>>>> Hi all!
> > >> >> >>>>>>
> > >> >> >>>>>> This thread has stalled for a bit, which I assume is mostly
> > >> >> >>>>>> due to the Flink 1.9 feature freeze and release testing
> > >> >> >>>>>> effort.
> > >> >> >>>>>>
> > >> >> >>>>>> I personally still consider this an important issue to
> > >> >> >>>>>> solve. I'd be happy to help resume this discussion soon
> > >> >> >>>>>> (after the 1.9 release) and see if we can take some steps
> > >> >> >>>>>> towards this in Flink 1.10.
> > >> >> >>>>>>
> > >> >> >>>>>> Best,
> > >> >> >>>>>> Stephan
> > >> >> >>>>>>
> > >> >> >>>>>>
> > >> >> >>>>>>
> > >> >> >>>>>> On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <
> > >> >> >>>>> pompermaier@okkam.it>
> > >> >> >>>>>> wrote:
> > >> >> >>>>>>
> > >> >> >>>>>>> That's exactly what I suggested a long time ago: the Flink
> > >> >> >>>>>>> REST client should not require any Flink dependency, only
> > >> >> >>>>>>> an HTTP library to call the REST services to submit and
> > >> >> >>>>>>> monitor a job.
> > >> >> >>>>>>> What I also suggested in [1] was to have a way to
> > >> >> >>>>>>> automatically suggest to the user (via a UI) the available
> > >> >> >>>>>>> main classes and their required parameters[2].
> > >> >> >>>>>>> Another problem we have with Flink is that the REST client
> > >> >> >>>>>>> and the CLI one behave differently, and we use the CLI
> > >> >> >>>>>>> client (via ssh) because it allows calling some other
> > >> >> >>>>>>> method after env.execute() [3] (we have to call another
> > >> >> >>>>>>> REST service to signal the end of the job).
> > >> >> >>>>>>> In this regard, a dedicated interface, like the JobListener
> > >> >> >>>>>>> suggested in the previous emails, would be very helpful
> > >> >> >>>>>>> (IMHO).
> > >> >> >>>>>>>
> > >> >> >>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-10864
> > >> >> >>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-10862
> > >> >> >>>>>>> [3] https://issues.apache.org/jira/browse/FLINK-10879
> > >> >> >>>>>>>
> > >> >> >>>>>>> Best,
> > >> >> >>>>>>> Flavio
> > >> >> >>>>>>>
> > >> >> >>>>>>> On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <
> zjffdu@gmail.com
> > >
> > >> >> >>> wrote:
> > >> >> >>>>>>>
> > >> >> >>>>>>>> Hi, Tison,
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> Thanks for your comments. Overall I agree with you that it
> > is
> > >> >> >>>>> difficult
> > >> >> >>>>>>> for
> > >> >> >>>>>>>> down stream project to integrate with flink and we need to
> > >> >> >>> refactor
> > >> >> >>>>> the
> > >> >> >>>>>>>> current flink client api.
> > >> >> >>>>>>>> And I agree that CliFrontend should only parsing command
> > line
> > >> >> >>>>> arguments
> > >> >> >>>>>>> and
> > >> >> >>>>>>>> then pass them to ExecutionEnvironment. It is
> > >> >> >>>> ExecutionEnvironment's
> > >> >> >>>>>>>> responsibility to compile job, create cluster, and submit
> > job.
> > >> >> >>>>> Besides
> > >> >> >>>>>>>> that, Currently flink has many ExecutionEnvironment
> > >> >> >>>> implementations,
> > >> >> >>>>>> and
> > >> >> >>>>>>>> flink will use the specific one based on the context.
> IMHO,
> > it
> > >> >> >> is
> > >> >> >>>> not
> > >> >> >>>>>>>> necessary, ExecutionEnvironment should be able to do the
> > right
> > >> >> >>>> thing
> > >> >> >>>>>>> based
> > >> >> >>>>>>>> on the FlinkConf it is received. Too many
> > ExecutionEnvironment
> > >> >> >>>>>>>> implementation is another burden for downstream project
> > >> >> >>>> integration.
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> One thing I'd like to mention is flink's scala shell and
> sql
> > >> >> >>>> client,
> > >> >> >>>>>>>> although they are sub-modules of flink, they could be
> > treated
> > >> >> >> as
> > >> >> >>>>>>> downstream
> > >> >> >>>>>>>> project which use flink's client api. Currently you will
> > find
> > >> >> >> it
> > >> >> >>> is
> > >> >> >>>>> not
> > >> >> >>>>>>>> easy for them to integrate with flink, they share many
> > >> >> >> duplicated
> > >> >> >>>>> code.
> > >> >> >>>>>>> It
> > >> >> >>>>>>>> is another sign that we should refactor flink client api.
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> I believe this is a large and hard change, and I am afraid we
> > >> >> >>>>>>>> cannot keep backward compatibility since many of the changes
> > >> >> >>>>>>>> are user facing.
> > >> >> >>>>>>>>
> > >> >> >>>>>>>>
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> Zili Chen <wa...@gmail.com> wrote on Mon, Jun 24, 2019 at 2:53 PM:
> > >> >> >>>>>>>>
> > >> >> >>>>>>>>> Hi all,
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> After a closer look at our client APIs, I can see two major
> > >> >> >>>>>>>>> issues for consistency and integration: the deployment of a
> > >> >> >>>>>>>>> job cluster, which couples job graph creation with cluster
> > >> >> >>>>>>>>> deployment, and submission via CliFrontend, which confuses the
> > >> >> >>>>>>>>> control flow of job graph compilation and job submission. I'd
> > >> >> >>>>>>>>> like to follow the discussion above, mainly the process
> > >> >> >>>>>>>>> described by Jeff and Stephan, and share my ideas on these
> > >> >> >>>>>>>>> issues.
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> 1) CliFrontend confuses the control flow of job compilation
> > >> >> >>>>>>>>> and submission. Following the submission process Stephan and
> > >> >> >>>>>>>>> Jeff described, the execution environment knows all configs of
> > >> >> >>>>>>>>> the cluster and the topology/settings of the job. Ideally, the
> > >> >> >>>>>>>>> main method of the user program calls #execute (or, say,
> > >> >> >>>>>>>>> #submit), and Flink deploys the cluster, compiles the job
> > >> >> >>>>>>>>> graph and submits it to the cluster. However, the current
> > >> >> >>>>>>>>> CliFrontend does all these things inside its #runProgram
> > >> >> >>>>>>>>> method, which introduces a lot of subclasses of the (stream)
> > >> >> >>>>>>>>> execution environment.
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> Actually, it sets up an exec env that hijacks the
> > >> >> >>>>>>>>> #execute/executePlan method, initializes the job graph and
> > >> >> >>>>>>>>> aborts execution. Control then flows back to CliFrontend,
> > >> >> >>>>>>>>> which deploys the cluster (or retrieves the client) and
> > >> >> >>>>>>>>> submits the job graph. This is quite a specific internal
> > >> >> >>>>>>>>> process inside Flink, consistent with nothing else.
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> 2) Deployment of a job cluster couples job graph creation
> > >> >> >>>>>>>>> with cluster deployment. Abstractly, going from a user job to
> > >> >> >>>>>>>>> a concrete submission requires the following dependency:
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>>     create JobGraph --\
> > >> >> >>>>>>>>>   create ClusterClient -->  submit JobGraph
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> The ClusterClient is created by deploying or retrieving a
> > >> >> >>>>>>>>> cluster. JobGraph submission requires a compiled JobGraph and
> > >> >> >>>>>>>>> a valid ClusterClient, but the creation of the ClusterClient
> > >> >> >>>>>>>>> is abstractly independent of that of the JobGraph. However, in
> > >> >> >>>>>>>>> job cluster mode we deploy the cluster with a job graph, which
> > >> >> >>>>>>>>> means we use another process:
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>>     create JobGraph --> deploy cluster with the JobGraph
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> This is another inconsistency, and downstream projects/client
> > >> >> >>>>>>>>> APIs are forced to handle different cases with little support
> > >> >> >>>>>>>>> from Flink.
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> Since we have likely reached a consensus that
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> 1. all configs are gathered in the Flink configuration and
> > >> >> >>>>>>>>>    passed along, and
> > >> >> >>>>>>>>> 2. the execution environment knows all configs and handles
> > >> >> >>>>>>>>>    execution (both deployment and submission),
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> I propose eliminating the inconsistencies above with the
> > >> >> >>>>>>>>> following approach:
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> 1) CliFrontend should be exactly a front end, at least for
> > >> >> >>>>>>>>> the "run" command. That means it just gathers all config from
> > >> >> >>>>>>>>> the command line and passes it to the main method of the user
> > >> >> >>>>>>>>> program. The execution environment knows all the info, and
> > >> >> >>>>>>>>> with an additional utility for ClusterClient we gracefully get
> > >> >> >>>>>>>>> a ClusterClient by deploying or retrieving. In this way we
> > >> >> >>>>>>>>> don't need to hijack the #execute/executePlan methods and can
> > >> >> >>>>>>>>> remove the various hacky subclasses of the exec env, as well
> > >> >> >>>>>>>>> as the #run methods in ClusterClient (for an interface-ized
> > >> >> >>>>>>>>> ClusterClient). The control flow then goes from CliFrontend to
> > >> >> >>>>>>>>> the main method and never returns.
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> 2) A job cluster is a cluster for one specific job; from
> > >> >> >>>>>>>>> another perspective, it is an ephemeral session. We could
> > >> >> >>>>>>>>> decouple the deployment from the compiled job graph: start a
> > >> >> >>>>>>>>> session with an idle timeout and submit the job afterwards.
> > >> >> >>>>>>>>>
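The "ephemeral session" idea can be sketched as a toy model (EphemeralSession and JobGraph below are illustrative stand-ins, not Flink classes): the job is submitted through the same path a session cluster uses, and the cluster, not the submission flow, decides when to tear itself down.

```scala
// Toy model of "job cluster as ephemeral session" (hypothetical names):
// deployment is decoupled from the compiled job graph; the session shuts
// itself down once it has been idle longer than its timeout.
final case class JobGraph(name: String)

final class EphemeralSession(idleTimeoutMs: Long) {
  private var jobs = List.empty[JobGraph]
  private var lastActivityMs = 0L

  // Same submission path as a normal session cluster.
  def submit(graph: JobGraph, nowMs: Long): Unit = {
    jobs = graph :: jobs
    lastActivityMs = nowMs
  }

  // The session itself owns the shutdown decision.
  def shouldShutDown(nowMs: Long): Boolean =
    jobs.isEmpty || (nowMs - lastActivityMs) > idleTimeoutMs
}
```

This keeps "create JobGraph" and "deploy cluster" on independent paths, matching the session-cluster flow described above.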
> > >> >> >>>>>>>>> Before we go into more detail on design or implementation,
> > >> >> >>>>>>>>> these topics are better raised and discussed so that we can
> > >> >> >>>>>>>>> reach a consensus.
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> Best,
> > >> >> >>>>>>>>> tison.
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> Zili Chen <wa...@gmail.com> wrote on Thu, Jun 20, 2019 at 3:21 AM:
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>>> Hi Jeff,
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>>> Thanks for raising this thread and the design document!
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>>> As @Thomas Weise mentioned above, extending Flink's config
> > >> >> >>>>>>>>>> requires far more effort than it should. Another example is
> > >> >> >>>>>>>>>> that we achieve detach mode by introducing yet another
> > >> >> >>>>>>>>>> execution environment, which also hijacks the #execute
> > >> >> >>>>>>>>>> method.
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>>> I agree with your idea that the user configures everything
> > >> >> >>>>>>>>>> and Flink "just" respects it. On this topic, I think the
> > >> >> >>>>>>>>>> unusual control flow when CliFrontend handles the "run"
> > >> >> >>>>>>>>>> command is the problem. It consumes several configs, mainly
> > >> >> >>>>>>>>>> about cluster settings, so the main method of the user
> > >> >> >>>>>>>>>> program is unaware of them. It also compiles the app to a job
> > >> >> >>>>>>>>>> graph by running the main method with a hijacked exec env,
> > >> >> >>>>>>>>>> which constrains the main method further.
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>>> I'd like to write down a few notes on passing and respecting
> > >> >> >>>>>>>>>> configs/args, as well as on decoupling job compilation from
> > >> >> >>>>>>>>>> submission, and share them on this thread later.
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>>> Best,
> > >> >> >>>>>>>>>> tison.
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>>> SHI Xiaogang <sh...@gmail.com> wrote on Mon, Jun 17, 2019 at 7:29 PM:
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>>>> Hi Jeff and Flavio,
> > >> >> >>>>>>>>>>>
> > >> >> >>>>>>>>>>> Thanks Jeff a lot for proposing the design document.
> > >> >> >>>>>>>>>>>
> > >> >> >>>>>>>>>>> We are also working on refactoring ClusterClient to allow
> > >> >> >>>>>>>>>>> flexible and efficient job management in our real-time
> > >> >> >>>>>>>>>>> platform. We would like to draft a document to share our
> > >> >> >>>>>>>>>>> ideas with you.
> > >> >> >>>>>>>>>>>
> > >> >> >>>>>>>>>>> I think it's a good idea to have something like Apache Livy
> > >> >> >>>>>>>>>>> for Flink, and the efforts discussed here are a great step
> > >> >> >>>>>>>>>>> toward it.
> > >> >> >>>>>>>>>>>
> > >> >> >>>>>>>>>>> Regards,
> > >> >> >>>>>>>>>>> Xiaogang
> > >> >> >>>>>>>>>>>
> > >> >> >>>>>>>>>>>> Flavio Pompermaier <po...@okkam.it> wrote on Mon, Jun 17, 2019 at 7:13 PM:
> > >> >> >>>>>>>>>>>
> > >> >> >>>>>>>>>>>> Is there any possibility to have something like Apache
> > >> >> >> Livy
> > >> >> >>>> [1]
> > >> >> >>>>>>> also
> > >> >> >>>>>>>>>>> for
> > >> >> >>>>>>>>>>>> Flink in the future?
> > >> >> >>>>>>>>>>>>
> > >> >> >>>>>>>>>>>> [1] https://livy.apache.org/
> > >> >> >>>>>>>>>>>>
> > >> >> >>>>>>>>>>>> On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang
> > >> >> >>>>>>>>>>>> <zjffdu@gmail.com> wrote:
> > >> >> >>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>> Any API we expose should not have dependencies on the
> > >> >> >>>>>>>>>>>>>>>> runtime (flink-runtime) package or other implementation
> > >> >> >>>>>>>>>>>>>>>> details. To me, this means that the current
> > >> >> >>>>>>>>>>>>>>>> ClusterClient cannot be exposed to users because it
> > >> >> >>>>>>>>>>>>>>>> uses quite some classes from the optimiser and runtime
> > >> >> >>>>>>>>>>>>>>>> packages.
> > >> >> >>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>> We should change ClusterClient from a class to an
> > >> >> >>>>>>>>>>>>> interface. ExecutionEnvironment would only use the
> > >> >> >>>>>>>>>>>>> ClusterClient interface, which should live in
> > >> >> >>>>>>>>>>>>> flink-clients, while the concrete implementation class
> > >> >> >>>>>>>>>>>>> could be in flink-runtime.
> > >> >> >>>>>>>>>>>>>
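The split could look roughly like this (a sketch only; the trait's methods and the in-memory implementation are hypothetical stand-ins for whatever the real interface would expose): API modules compile against the trait, while a runtime module supplies the concrete class.

```scala
// Sketch of an interface-ized ClusterClient (hypothetical shape, not the
// actual Flink classes). The trait would live in flink-clients; a concrete
// implementation would live in flink-runtime. Here a toy in-memory impl.
trait ClusterClient {
  def submitJob(jobName: String): String   // returns a job id
  def queryJobStatus(jobId: String): String
  def cancelJob(jobId: String): Unit
}

final class InMemoryClusterClient extends ClusterClient {
  private var statuses = Map.empty[String, String]
  private var counter = 0

  def submitJob(jobName: String): String = {
    counter += 1
    val id = s"job-$counter"
    statuses += id -> "RUNNING"
    id
  }
  def queryJobStatus(jobId: String): String =
    statuses.getOrElse(jobId, "UNKNOWN")
  def cancelJob(jobId: String): Unit =
    statuses += jobId -> "CANCELED"
}
```

Downstream code would then depend only on the trait, keeping flink-runtime out of its compile-time classpath.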
> > >> >> >>>>>>>>>>>>>>>> What happens when a failure/restart in the client
> > >> >> >>>>>>>>>>>>>>>> happens? There needs to be a way of re-establishing the
> > >> >> >>>>>>>>>>>>>>>> connection to the job, setting up the listeners again,
> > >> >> >>>>>>>>>>>>>>>> etc.
> > >> >> >>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>> Good point. First we need to define what failure/restart
> > >> >> >>>>>>>>>>>>> in the client means. IIUC, that usually means a network
> > >> >> >>>>>>>>>>>>> failure, which would surface in the RestClient class. If
> > >> >> >>>>>>>>>>>>> my understanding is correct, the restart/retry mechanism
> > >> >> >>>>>>>>>>>>> should be implemented in RestClient.
> > >> >> >>>>>>>>>>>>>
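A minimal sketch of that retry idea (the helper below is an illustration, not the real RestClient code): transient failures are retried inside the client, so callers never see a broken connection for recoverable errors.

```scala
// Hypothetical retry helper as it might live inside a RestClient-like
// component: retry a call up to maxAttempts times, returning the last result.
import scala.util.{Try, Success, Failure}

def withRetry[T](maxAttempts: Int)(call: () => Try[T]): Try[T] = {
  var last: Try[T] = Failure(new IllegalStateException("no attempt made"))
  var attempt = 0
  while (attempt < maxAttempts && last.isFailure) {
    last = call()
    attempt += 1
    // a real client would sleep with (exponential) backoff between attempts
  }
  last
}
```

Whether the retry budget, backoff, and the set of retryable errors are configurable is exactly the kind of detail the design doc would need to pin down.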
> > >> >> >>>>>>>>>>>>> Aljoscha Krettek <al...@apache.org> wrote on Tue, Jun 11, 2019 at 11:10 PM:
> > >> >> >>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>> Some points to consider:
> > >> >> >>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>> * Any API we expose should not have dependencies on the
> > >> >> >>>>>>>>>>>>>> runtime (flink-runtime) package or other implementation
> > >> >> >>>>>>>>>>>>>> details. To me, this means that the current ClusterClient
> > >> >> >>>>>>>>>>>>>> cannot be exposed to users because it uses quite some
> > >> >> >>>>>>>>>>>>>> classes from the optimiser and runtime packages.
> > >> >> >>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>> * What happens when a failure/restart in the client
> > >> >> >>>>>>>>>>>>>> happens? There needs to be a way of re-establishing the
> > >> >> >>>>>>>>>>>>>> connection to the job, setting up the listeners again,
> > >> >> >>>>>>>>>>>>>> etc.
> > >> >> >>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>> Aljoscha
> > >> >> >>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>> On 29. May 2019, at 10:17, Jeff Zhang <zjffdu@gmail.com> wrote:
> > >> >> >>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>> Sorry folks, the design doc is late as you expected.
> > >> >> >>>>>>>>>>>>>>> Here's the design doc I drafted; welcome any comments
> > >> >> >>>>>>>>>>>>>>> and feedback.
> > >> >> >>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>
> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> > >> >> >>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org> wrote on Thu, Feb 14, 2019 at 8:43 PM:
> > >> >> >>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>> Nice that this discussion is happening.
> > >> >> >>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>> In the FLIP, we could also revisit the entire role of
> > >> >> >>>>>>>>>>>>>>>> the environments again.
> > >> >> >>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>> Initially, the idea was:
> > >> >> >>>>>>>>>>>>>>>> - the environments take care of the specific setup for
> > >> >> >>>>>>>>>>>>>>>>   standalone (no setup needed), yarn, mesos, etc.
> > >> >> >>>>>>>>>>>>>>>> - the session ones have control over the session; the
> > >> >> >>>>>>>>>>>>>>>>   environment holds the session client.
> > >> >> >>>>>>>>>>>>>>>> - running a job gives a "control" object for that job;
> > >> >> >>>>>>>>>>>>>>>>   that behavior is the same in all environments.
> > >> >> >>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>> The actual implementation diverged quite a bit from
> > >> >> >>>>>>>>>>>>>>>> that. Happy to see a discussion about straightening
> > >> >> >>>>>>>>>>>>>>>> this out a bit more.
> > >> >> >>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang
> > >> >> >>>>>>>>>>>>>>>> <zjffdu@gmail.com> wrote:
> > >> >> >>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>> Hi folks,
> > >> >> >>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>> Sorry for the late response. It seems we have reached
> > >> >> >>>>>>>>>>>>>>>>> consensus on this; I will create a FLIP with a more
> > >> >> >>>>>>>>>>>>>>>>> detailed design.
> > >> >> >>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>> Thomas Weise <th...@apache.org> wrote on Fri, Dec 21, 2018 at 11:43 AM:
> > >> >> >>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>> Great to see this discussion seeded! The problems you
> > >> >> >>>>>>>>>>>>>>>>>> face with the Zeppelin integration are also affecting
> > >> >> >>>>>>>>>>>>>>>>>> other downstream projects, like Beam.
> > >> >> >>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>> We just enabled the savepoint restore option in
> > >> >> >>>>>>>>>>>>>>>>>> RemoteStreamEnvironment [1], and that was more
> > >> >> >>>>>>>>>>>>>>>>>> difficult than it should be. The main issue is that
> > >> >> >>>>>>>>>>>>>>>>>> the environment and the cluster client aren't
> > >> >> >>>>>>>>>>>>>>>>>> decoupled. Ideally it should be possible to just get
> > >> >> >>>>>>>>>>>>>>>>>> the matching cluster client from the environment and
> > >> >> >>>>>>>>>>>>>>>>>> then control the job through it (the environment as a
> > >> >> >>>>>>>>>>>>>>>>>> factory for the cluster client). But note that the
> > >> >> >>>>>>>>>>>>>>>>>> environment classes are part of the public API, and
> > >> >> >>>>>>>>>>>>>>>>>> it is not straightforward to make larger changes
> > >> >> >>>>>>>>>>>>>>>>>> without breaking backward compatibility.
> > >> >> >>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>> ClusterClient currently exposes internal classes like
> > >> >> >>>>>>>>>>>>>>>>>> JobGraph and StreamGraph. But it should be possible
> > >> >> >>>>>>>>>>>>>>>>>> to wrap this with a new public API that brings the
> > >> >> >>>>>>>>>>>>>>>>>> required job control capabilities to downstream
> > >> >> >>>>>>>>>>>>>>>>>> projects. Perhaps it is helpful to look at some of
> > >> >> >>>>>>>>>>>>>>>>>> the interfaces in Beam while thinking about this: [2]
> > >> >> >>>>>>>>>>>>>>>>>> for the portable job API and [3] for the old
> > >> >> >>>>>>>>>>>>>>>>>> asynchronous job control from the Beam Java SDK.
> > >> >> >>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>> The backward compatibility discussion [4] is also
> > >> >> >>>>>>>>>>>>>>>>>> relevant here. A new API should shield downstream
> > >> >> >>>>>>>>>>>>>>>>>> projects from internals and allow them to
> > >> >> >>>>>>>>>>>>>>>>>> interoperate with multiple future Flink versions in
> > >> >> >>>>>>>>>>>>>>>>>> the same release line without forced upgrades.
> > >> >> >>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>> Thanks,
> > >> >> >>>>>>>>>>>>>>>>>> Thomas
> > >> >> >>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>> [1] https://github.com/apache/flink/pull/7249
> > >> >> >>>>>>>>>>>>>>>>>> [2]
> > >> >> >>>>>>>>>>>>>>>>>>
> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > >> >> >>>>>>>>>>>>>>>>>> [3]
> > >> >> >>>>>>>>>>>>>>>>>>
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > >> >> >>>>>>>>>>>>>>>>>> [4]
> > >> >> >>>>>>>>>>>>>>>>>>
> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> > >> >> >>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang
> > >> >> >>>>>>>>>>>>>>>>>> <zjffdu@gmail.com> wrote:
> > >> >> >>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>> I'm not so sure whether the user should be able
> > >> >> >>>>>>>>>>>>>>>>>>>>>> to define where the job runs (in your example,
> > >> >> >>>>>>>>>>>>>>>>>>>>>> Yarn). This is actually independent of the job
> > >> >> >>>>>>>>>>>>>>>>>>>>>> development and is something which is decided at
> > >> >> >>>>>>>>>>>>>>>>>>>>>> deployment time.
> > >> >> >>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>> Users don't need to specify the execution mode
> > >> >> >>>>>>>>>>>>>>>>>>> programmatically. They can also pass the execution
> > >> >> >>>>>>>>>>>>>>>>>>> mode as an argument of the flink run command, e.g.
> > >> >> >>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m yarn-cluster ....
> > >> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m local ...
> > >> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m host:port ...
> > >> >> >>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>> Does this make sense to you ?
> > >> >> >>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>> To me it makes sense that the
> > >> >> >>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment is not directly initialized
> > >> >> >>>>>>>>>>>>>>>>>>>>>> by the user and is instead context sensitive to
> > >> >> >>>>>>>>>>>>>>>>>>>>>> how you want to execute your job (Flink CLI vs.
> > >> >> >>>>>>>>>>>>>>>>>>>>>> IDE, for example).
> > >> >> >>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>> Right, I notice that Flink currently creates
> > >> >> >>>>>>>>>>>>>>>>>>> different ContextExecutionEnvironments based on the
> > >> >> >>>>>>>>>>>>>>>>>>> submission scenario (Flink CLI vs. IDE). To me this
> > >> >> >>>>>>>>>>>>>>>>>>> is a hacky approach and not straightforward. What I
> > >> >> >>>>>>>>>>>>>>>>>>> suggested above is that Flink should always create
> > >> >> >>>>>>>>>>>>>>>>>>> the same ExecutionEnvironment, just with different
> > >> >> >>>>>>>>>>>>>>>>>>> configuration, and based on that configuration it
> > >> >> >>>>>>>>>>>>>>>>>>> would create the proper ClusterClient for the
> > >> >> >>>>>>>>>>>>>>>>>>> different behaviors.
> > >> >> >>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>> Till Rohrmann <tr...@apache.org> wrote on Thu, Dec 20, 2018 at 11:18 PM:
> > >> >> >>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>> You are probably right that we have code
> > >> >> >>>>>>>>>>>>>>>>>>>> duplication when it comes to the creation of the
> > >> >> >>>>>>>>>>>>>>>>>>>> ClusterClient. This should be reduced in the
> > >> >> >>>>>>>>>>>>>>>>>>>> future.
> > >> >> >>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>> I'm not so sure whether the user should be able to
> > >> >> >>>>>>>>>>>>>>>>>>>> define where the job runs (in your example, Yarn).
> > >> >> >>>>>>>>>>>>>>>>>>>> This is actually independent of the job development
> > >> >> >>>>>>>>>>>>>>>>>>>> and is something which is decided at deployment
> > >> >> >>>>>>>>>>>>>>>>>>>> time. To me it makes sense that the
> > >> >> >>>>>>>>>>>>>>>>>>>> ExecutionEnvironment is not directly initialized by
> > >> >> >>>>>>>>>>>>>>>>>>>> the user and is instead context sensitive to how
> > >> >> >>>>>>>>>>>>>>>>>>>> you want to execute your job (Flink CLI vs. IDE,
> > >> >> >>>>>>>>>>>>>>>>>>>> for example). However, I agree that the
> > >> >> >>>>>>>>>>>>>>>>>>>> ExecutionEnvironment should give you access to the
> > >> >> >>>>>>>>>>>>>>>>>>>> ClusterClient and to the job (maybe in the form of
> > >> >> >>>>>>>>>>>>>>>>>>>> the JobGraph or a job plan).
> > >> >> >>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>> Cheers,
> > >> >> >>>>>>>>>>>>>>>>>>>> Till
> > >> >> >>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang
> > >> >> >>>>>>>>>>>>>>>>>>>> <zjffdu@gmail.com> wrote:
> > >> >> >>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>> Hi Till,
> > >> >> >>>>>>>>>>>>>>>>>>>>> Thanks for the feedback. You are right that I
> > >> >> >>>>>>>>>>>>>>>>>>>>> expect a better programmatic job
> > >> >> >>>>>>>>>>>>>>>>>>>>> submission/control API that could be used by
> > >> >> >>>>>>>>>>>>>>>>>>>>> downstream projects, and it would benefit the
> > >> >> >>>>>>>>>>>>>>>>>>>>> Flink ecosystem. When I look at the code of
> > >> >> >>>>>>>>>>>>>>>>>>>>> Flink's scala-shell and sql-client (I believe they
> > >> >> >>>>>>>>>>>>>>>>>>>>> are not the core of Flink, but belong to its
> > >> >> >>>>>>>>>>>>>>>>>>>>> ecosystem), I find a lot of duplicated code for
> > >> >> >>>>>>>>>>>>>>>>>>>>> creating a ClusterClient from user-provided
> > >> >> >>>>>>>>>>>>>>>>>>>>> configuration (the configuration format may differ
> > >> >> >>>>>>>>>>>>>>>>>>>>> between scala-shell and sql-client) and then using
> > >> >> >>>>>>>>>>>>>>>>>>>>> that ClusterClient to manipulate jobs. I don't
> > >> >> >>>>>>>>>>>>>>>>>>>>> think this is convenient for downstream projects.
> > >> >> >>>>>>>>>>>>>>>>>>>>> What I expect is that a downstream project only
> > >> >> >>>>>>>>>>>>>>>>>>>>> needs to provide the necessary configuration info
> > >> >> >>>>>>>>>>>>>>>>>>>>> (maybe introducing a class FlinkConf), then builds
> > >> >> >>>>>>>>>>>>>>>>>>>>> an ExecutionEnvironment based on this FlinkConf,
> > >> >> >>>>>>>>>>>>>>>>>>>>> and the ExecutionEnvironment creates the proper
> > >> >> >>>>>>>>>>>>>>>>>>>>> ClusterClient. This would not only benefit
> > >> >> >>>>>>>>>>>>>>>>>>>>> downstream project development but also help with
> > >> >> >>>>>>>>>>>>>>>>>>>>> their integration tests against Flink. Here's one
> > >> >> >>>>>>>>>>>>>>>>>>>>> sample code snippet that I expect:
> > >> >> >>>>>>>>>>>>>>>>>>>>> val conf = new FlinkConf().mode("yarn")
> > >> >> >>>>>>>>>>>>>>>>>>>>> val env = new ExecutionEnvironment(conf)
> > >> >> >>>>>>>>>>>>>>>>>>>>> val jobId = env.submit(...)
> > >> >> >>>>>>>>>>>>>>>>>>>>> val jobStatus =
> > >> >> >>>>>>>>>>> env.getClusterClient().queryJobStatus(jobId)
> > >> >> >>>>>>>>>>>>>>>>>>>>> env.getClusterClient().cancelJob(jobId)
> > >> >> >>>>>>>>>>>>>>>>>>>>>
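The snippet above can be fleshed out into a runnable toy of the proposed API shape (FlinkConf, ExecutionEnvironment and ClusterClient below are stand-ins for the proposal, not the real Flink classes): the env is built from a conf and hands out a client for job control.

```scala
// Toy model of the proposed API shape (hypothetical names throughout):
// the environment is constructed from a conf and creates the proper client,
// which the caller can retrieve for job control.
final case class FlinkConf(mode: String)

final class ClusterClient(val mode: String) {
  private var statuses = Map.empty[String, String]
  def submit(job: String): String = { statuses += job -> "RUNNING"; job }
  def queryJobStatus(id: String): String = statuses.getOrElse(id, "UNKNOWN")
  def cancelJob(id: String): Unit = statuses += id -> "CANCELED"
}

final class ExecutionEnvironment(conf: FlinkConf) {
  // The env, not the user, decides which client to create from the conf.
  private val client = new ClusterClient(conf.mode)
  def submit(job: String): String = client.submit(job)
  def getClusterClient: ClusterClient = client
}
```

Usage mirrors the sample snippet: build a conf, build the env, submit, then query and cancel through the client obtained from the env.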
> > >> >> >>>>>>>>>>>>>>>>>>>>> What do you think ?
> > >> >> >>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>> Till Rohrmann <tr...@apache.org> wrote on Tue, Dec 11, 2018 at 6:28 PM:
> > >> >> >>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>> Hi Jeff,
> > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>> what you are proposing is to provide the user
> > >> >> >>>>>>>>>>>>>>>>>>>>>> with better programmatic job control. There was
> > >> >> >>>>>>>>>>>>>>>>>>>>>> actually an effort to achieve this, but it was
> > >> >> >>>>>>>>>>>>>>>>>>>>>> never completed [1]. However, there are some
> > >> >> >>>>>>>>>>>>>>>>>>>>>> improvements in the code base now. Look, for
> > >> >> >>>>>>>>>>>>>>>>>>>>>> example, at the NewClusterClient interface, which
> > >> >> >>>>>>>>>>>>>>>>>>>>>> offers non-blocking job submission. But I agree
> > >> >> >>>>>>>>>>>>>>>>>>>>>> that we need to improve Flink in this regard.
> > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>> I would not be in favour of exposing all
> > >> >> >>>>>>>>>>>>>>>>>>>>>> ClusterClient calls via the ExecutionEnvironment
> > >> >> >>>>>>>>>>>>>>>>>>>>>> because it would clutter the class and would not
> > >> >> >>>>>>>>>>>>>>>>>>>>>> be a good separation of concerns. Instead, one
> > >> >> >>>>>>>>>>>>>>>>>>>>>> idea could be to retrieve the current
> > >> >> >>>>>>>>>>>>>>>>>>>>>> ClusterClient from the ExecutionEnvironment,
> > >> >> >>>>>>>>>>>>>>>>>>>>>> which can then be used for cluster and job
> > >> >> >>>>>>>>>>>>>>>>>>>>>> control. But before we start an effort here, we
> > >> >> >>>>>>>>>>>>>>>>>>>>>> need to agree on and capture what functionality
> > >> >> >>>>>>>>>>>>>>>>>>>>>> we want to provide.
> > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>> Initially, the idea was that we have the
> > >> >> >>>>>>>> ClusterDescriptor
> > >> >> >>>>>>>>>>>>>>>>>> describing
> > >> >> >>>>>>>>>>>>>>>>>>>> how
> > >> >> >>>>>>>>>>>>>>>>>>>>>> to talk to cluster manager like Yarn or
> > >> >> >> Mesos.
> > >> >> >>>> The
> > >> >> >>>>>>>>>>>>>>>>>> ClusterDescriptor
> > >> >> >>>>>>>>>>>>>>>>>>>> can
> > >> >> >>>>>>>>>>>>>>>>>>>>> be
> > >> >> >>>>>>>>>>>>>>>>>>>>>> used for deploying Flink clusters (job and
> > >> >> >>>>> session)
> > >> >> >>>>>>> and
> > >> >> >>>>>>>>>>> gives
> > >> >> >>>>>>>>>>>>>>>>> you a
> > >> >> >>>>>>>>>>>>>>>>>>>>>> ClusterClient. The ClusterClient controls
> > >> >> >> the
> > >> >> >>>>>> cluster
> > >> >> >>>>>>>>>>> (e.g.
> > >> >> >>>>>>>>>>>>>>>>>>> submitting
> > >> >> >>>>>>>>>>>>>>>>>>>>>> jobs, listing all running jobs). And then
> > >> >> >>> there
> > >> >> >>>>> was
> > >> >> >>>>>>> the
> > >> >> >>>>>>>>>>> idea
> > >> >> >>>>>>>>>>>> to
> > >> >> >>>>>>>>>>>>>>>>>>>>> introduce a
> > >> >> >>>>>>>>>>>>>>>>>>>>>> JobClient which you obtain from the
> > >> >> >>>> ClusterClient
> > >> >> >>>>> to
> > >> >> >>>>>>>>>>> trigger
> > >> >> >>>>>>>>>>>>>>>> job
> > >> >> >>>>>>>>>>>>>>>>>>>> specific
> > >> >> >>>>>>>>>>>>>>>>>>>>>> operations (e.g. taking a savepoint,
> > >> >> >>> cancelling
> > >> >> >>>>> the
> > >> >> >>>>>>>> job).
> > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>> [1]
> > >> >> >>>>>> https://issues.apache.org/jira/browse/FLINK-4272
> > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>> Cheers,
> > >> >> >>>>>>>>>>>>>>>>>>>>>> Till
> > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang
> > >> >> >> <
> > >> >> >>>>>>>>>>> zjffdu@gmail.com
> > >> >> >>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>> wrote:
> > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> Hi Folks,
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> I am trying to integrate flink into apache
> > >> >> >>>>> zeppelin
> > >> >> >>>>>>>>>>> which is
> > >> >> >>>>>>>>>>>>>>>> an
> > >> >> >>>>>>>>>>>>>>>>>>>>>> interactive
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> notebook. And I hit several issues that is
> > >> >> >>>> caused
> > >> >> >>>>>> by
> > >> >> >>>>>>>>>>> flink
> > >> >> >>>>>>>>>>>>>>>>> client
> > >> >> >>>>>>>>>>>>>>>>>>>> api.
> > >> >> >>>>>>>>>>>>>>>>>>>>> So
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> I'd like to proposal the following changes
> > >> >> >>> for
> > >> >> >>>>>> flink
> > >> >> >>>>>>>>>>> client
> > >> >> >>>>>>>>>>>>>>>>> api.
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> 1. Support nonblocking execution.
> > >> >> >> Currently,
> > >> >> >>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment#execute
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> is a blocking method which would do 2
> > >> >> >> things,
> > >> >> >>>>> first
> > >> >> >>>>>>>>>>> submit
> > >> >> >>>>>>>>>>>>>>>> job
> > >> >> >>>>>>>>>>>>>>>>>> and
> > >> >> >>>>>>>>>>>>>>>>>>>> then
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> wait for job until it is finished. I'd like
> > >> >> >>>>>>> introduce a
> > >> >> >>>>>>>>>>>>>>>>>> nonblocking
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> execution method like
> > >> >> >>>> ExecutionEnvironment#submit
> > >> >> >>>>>>> which
> > >> >> >>>>>>>>>>> only
> > >> >> >>>>>>>>>>>>>>>>>> submit
> > >> >> >>>>>>>>>>>>>>>>>>>> job
> > >> >> >>>>>>>>>>>>>>>>>>>>>> and
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> then return jobId to client. And allow user
> > >> >> >>> to
> > >> >> >>>>>> query
> > >> >> >>>>>>>> the
> > >> >> >>>>>>>>>>> job
> > >> >> >>>>>>>>>>>>>>>>>> status
> > >> >> >>>>>>>>>>>>>>>>>>>> via
> > >> >> >>>>>>>>>>>>>>>>>>>>>> the
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> jobId.
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> 2. Add cancel api in
> > >> >> >>>>>>>>>>>>>>>>>>>
> > >> >> >> ExecutionEnvironment/StreamExecutionEnvironment,
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> currently the only way to cancel job is via
> > >> >> >>> cli
> > >> >> >>>>>>>>>>> (bin/flink),
> > >> >> >>>>>>>>>>>>>>>>> this
> > >> >> >>>>>>>>>>>>>>>>>>> is
> > >> >> >>>>>>>>>>>>>>>>>>>>> not
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> convenient for downstream project to use
> > >> >> >> this
> > >> >> >>>>>>> feature.
> > >> >> >>>>>>>>>>> So I'd
> > >> >> >>>>>>>>>>>>>>>>>> like
> > >> >> >>>>>>>>>>>>>>>>>>> to
> > >> >> >>>>>>>>>>>>>>>>>>>>> add
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> cancel api in ExecutionEnvironment
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> 3. Add savepoint api in
> > >> >> >>>>>>>>>>>>>>>>>>>>>
> > >> >> >>> ExecutionEnvironment/StreamExecutionEnvironment.
> > >> >> >>>>>>>>>>>>>>>>>>>>>> It
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> is similar as cancel api, we should use
> > >> >> >>>>>>>>>>> ExecutionEnvironment
> > >> >> >>>>>>>>>>>>>>>> as
> > >> >> >>>>>>>>>>>>>>>>>> the
> > >> >> >>>>>>>>>>>>>>>>>>>>>> unified
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> api for third party to integrate with
> > >> >> >> flink.
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> 4. Add listener for job execution
> > >> >> >> lifecycle.
> > >> >> >>>>>>> Something
> > >> >> >>>>>>>>>>> like
> > >> >> >>>>>>>>>>>>>>>>>>>> following,
> > >> >> >>>>>>>>>>>>>>>>>>>>> so
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> that downstream project can do custom logic
> > >> >> >>> in
> > >> >> >>>>> the
> > >> >> >>>>>>>>>>> lifecycle
> > >> >> >>>>>>>>>>>>>>>> of
> > >> >> >>>>>>>>>>>>>>>>>>> job.
> > >> >> >>>>>>>>>>>>>>>>>>>>> e.g.
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> Zeppelin would capture the jobId after job
> > >> >> >> is
> > >> >> >>>>>>> submitted
> > >> >> >>>>>>>>>>> and
> > >> >> >>>>>>>>>>>>>>>>> then
> > >> >> >>>>>>>>>>>>>>>>>>> use
> > >> >> >>>>>>>>>>>>>>>>>>>>> this
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> jobId to cancel it later when necessary.
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> public interface JobListener {
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>  void onJobSubmitted(JobID jobId);
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>  void onJobExecuted(JobExecutionResult
> > >> >> >>>>> jobResult);
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>  void onJobCanceled(JobID jobId);
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> }
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> 5. Enable session in ExecutionEnvironment.
> > >> >> >>>>>> Currently
> > >> >> >>>>>>> it
> > >> >> >>>>>>>>>>> is
> > >> >> >>>>>>>>>>>>>>>>>>> disabled,
> > >> >> >>>>>>>>>>>>>>>>>>>>> but
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> session is very convenient for third party
> > >> >> >> to
> > >> >> >>>>>>>> submitting
> > >> >> >>>>>>>>>>> jobs
> > >> >> >>>>>>>>>>>>>>>>>>>>>> continually.
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> I hope flink can enable it again.
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> 6. Unify all flink client api into
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>> ExecutionEnvironment/StreamExecutionEnvironment.
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> This is a long term issue which needs more
> > >> >> >>>>> careful
> > >> >> >>>>>>>>>>> thinking
> > >> >> >>>>>>>>>>>>>>>> and
> > >> >> >>>>>>>>>>>>>>>>>>>> design.
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> Currently some of features of flink is
> > >> >> >>> exposed
> > >> >> >>>> in
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>> ExecutionEnvironment/StreamExecutionEnvironment,
> > >> >> >>>>>> but
> > >> >> >>>>>>>>>>> some are
> > >> >> >>>>>>>>>>>>>>>>>>> exposed
> > >> >> >>>>>>>>>>>>>>>>>>>>> in
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> cli instead of api, like the cancel and
> > >> >> >>>>> savepoint I
> > >> >> >>>>>>>>>>> mentioned
> > >> >> >>>>>>>>>>>>>>>>>>> above.
> > >> >> >>>>>>>>>>>>>>>>>>>> I
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> think the root cause is due to that flink
> > >> >> >>>> didn't
> > >> >> >>>>>>> unify
> > >> >> >>>>>>>>>>> the
> > >> >> >>>>>>>>>>>>>>>>>>>> interaction
> > >> >> >>>>>>>>>>>>>>>>>>>>>> with
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> flink. Here I list 3 scenarios of flink
> > >> >> >>>> operation
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Local job execution.  Flink will create
> > >> >> >>>>>>>>>>> LocalEnvironment
> > >> >> >>>>>>>>>>>>>>>>> and
> > >> >> >>>>>>>>>>>>>>>>>>>> then
> > >> >> >>>>>>>>>>>>>>>>>>>>>> use
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>  this LocalEnvironment to create
> > >> >> >>> LocalExecutor
> > >> >> >>>>> for
> > >> >> >>>>>>> job
> > >> >> >>>>>>>>>>>>>>>>>> execution.
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Remote job execution. Flink will create
> > >> >> >>>>>>>> ClusterClient
> > >> >> >>>>>>>>>>>>>>>>> first
> > >> >> >>>>>>>>>>>>>>>>>>> and
> > >> >> >>>>>>>>>>>>>>>>>>>>> then
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>  create ContextEnvironment based on the
> > >> >> >>>>>>> ClusterClient
> > >> >> >>>>>>>>>>> and
> > >> >> >>>>>>>>>>>>>>>>> then
> > >> >> >>>>>>>>>>>>>>>>>>> run
> > >> >> >>>>>>>>>>>>>>>>>>>>> the
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> job.
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>  - Job cancelation. Flink will create
> > >> >> >>>>>> ClusterClient
> > >> >> >>>>>>>>>>> first
> > >> >> >>>>>>>>>>>>>>>> and
> > >> >> >>>>>>>>>>>>>>>>>>> then
> > >> >> >>>>>>>>>>>>>>>>>>>>>> cancel
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>  this job via this ClusterClient.
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> As you can see in the above 3 scenarios.
> > >> >> >>> Flink
> > >> >> >>>>>> didn't
> > >> >> >>>>>>>>>>> use the
> > >> >> >>>>>>>>>>>>>>>>>> same
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> approach(code path) to interact with flink
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> What I propose is following:
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> Create the proper
> > >> >> >>>>>> LocalEnvironment/RemoteEnvironment
> > >> >> >>>>>>>>>>> (based
> > >> >> >>>>>>>>>>>>>>>> on
> > >> >> >>>>>>>>>>>>>>>>>> user
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> configuration) --> Use this Environment to
> > >> >> >>>> create
> > >> >> >>>>>>>> proper
> > >> >> >>>>>>>>>>>>>>>>>>>> ClusterClient
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> (LocalClusterClient or RestClusterClient)
> > >> >> >> to
> > >> >> >>>>>>>> interactive
> > >> >> >>>>>>>>>>> with
> > >> >> >>>>>>>>>>>>>>>>>>> Flink (
> > >> >> >>>>>>>>>>>>>>>>>>>>> job
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> execution or cancelation)
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> This way we can unify the process of local
> > >> >> >>>>>> execution
> > >> >> >>>>>>>> and
> > >> >> >>>>>>>>>>>>>>>> remote
> > >> >> >>>>>>>>>>>>>>>>>>>>>> execution.
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> And it is much easier for third party to
> > >> >> >>>>> integrate
> > >> >> >>>>>>> with
> > >> >> >>>>>>>>>>>>>>>> flink,
> > >> >> >>>>>>>>>>>>>>>>>>>> because
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment is the unified entry
> > >> >> >>> point
> > >> >> >>>>> for
> > >> >> >>>>>>>>>>> flink.
> > >> >> >>>>>>>>>>>>>>>> What
> > >> >> >>>>>>>>>>>>>>>>>>> third
> > >> >> >>>>>>>>>>>>>>>>>>>>>> party
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> needs to do is just pass configuration to
> > >> >> >>>>>>>>>>>>>>>> ExecutionEnvironment
> > >> >> >>>>>>>>>>>>>>>>>> and
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment will do the right
> > >> >> >> thing
> > >> >> >>>>> based
> > >> >> >>>>>> on
> > >> >> >>>>>>>> the
> > >> >> >>>>>>>>>>>>>>>>>>>>> configuration.
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> Flink cli can also be considered as flink
> > >> >> >> api
> > >> >> >>>>>>> consumer.
> > >> >> >>>>>>>>>>> it
> > >> >> >>>>>>>>>>>>>>>> just
> > >> >> >>>>>>>>>>>>>>>>>>> pass
> > >> >> >>>>>>>>>>>>>>>>>>>>> the
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> configuration to ExecutionEnvironment and
> > >> >> >> let
> > >> >> >>>>>>>>>>>>>>>>>> ExecutionEnvironment
> > >> >> >>>>>>>>>>>>>>>>>>> to
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> create the proper ClusterClient instead of
> > >> >> >>>>> letting
> > >> >> >>>>>>> cli
> > >> >> >>>>>>>> to
> > >> >> >>>>>>>>>>>>>>>>> create
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> ClusterClient directly.
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> 6 would involve large code refactoring, so
> > >> >> >> I
> > >> >> >>>>> think
> > >> >> >>>>>> we
> > >> >> >>>>>>>> can
> > >> >> >>>>>>>>>>>>>>>> defer
> > >> >> >>>>>>>>>>>>>>>>>> it
> > >> >> >>>>>>>>>>>>>>>>>>>> for
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> future release, 1,2,3,4,5 could be done at
> > >> >> >>>> once I
> > >> >> >>>>>>>>>>> believe.
> > >> >> >>>>>>>>>>>>>>>> Let
> > >> >> >>>>>>>>>>>>>>>>> me
> > >> >> >>>>>>>>>>>>>>>>>>>> know
> > >> >> >>>>>>>>>>>>>>>>>>>>>> your
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> comments and feedback, thanks
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> --
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> Best Regards
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
> > >> >> >>>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>> --
> > >> >> >>>>>>>>>>>>>>>>>>>>> Best Regards
> > >> >> >>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>> Jeff Zhang
> > >> >> >>>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>> --
> > >> >> >>>>>>>>>>>>>>>>>>> Best Regards
> > >> >> >>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>> Jeff Zhang
> > >> >> >>>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>> --
> > >> >> >>>>>>>>>>>>>>>>> Best Regards
> > >> >> >>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>> Jeff Zhang
> > >> >> >>>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>> --
> > >> >> >>>>>>>>>>>>>>> Best Regards
> > >> >> >>>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>> Jeff Zhang
> > >> >> >>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>> --
> > >> >> >>>>>>>>>>>>> Best Regards
> > >> >> >>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>> Jeff Zhang
> > >> >> >>>>>>>>>>>>>
> > >> >> >>>>>>>>>>>>
> > >> >> >>>>>>>>>>>
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> --
> > >> >> >>>>>>>> Best Regards
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> Jeff Zhang
> > >> >> >>>>>>>>
> > >> >> >>>>>>>
> > >> >> >>>>>>
> > >> >> >>>>>
> > >> >> >>>>>
> > >> >> >>>>> --
> > >> >> >>>>> Best Regards
> > >> >> >>>>>
> > >> >> >>>>> Jeff Zhang
> > >> >> >>>>>
> > >> >> >>>>
> > >> >> >>>
> > >> >> >>
> > >> >> >
> > >> >> >
> > >> >> > --
> > >> >> > Best Regards
> > >> >> >
> > >> >> > Jeff Zhang
> > >> >>
> > >> >>
> > >>
> > >
> > >
> >
>
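To make the proposed JobListener concrete, here is a small self-contained sketch of how a downstream project such as Zeppelin could capture the JobID on submission and use it to cancel the job later. The JobID and JobExecutionResult classes below are stand-ins for the Flink types, and the environment calling the listener is only simulated; none of this is the actual Flink API.

```java
import java.util.ArrayList;
import java.util.List;

public class JobListenerSketch {

    // Stand-ins for Flink's JobID / JobExecutionResult.
    static final class JobID {
        final String id;
        JobID(String id) { this.id = id; }
    }

    static final class JobExecutionResult {
        final JobID jobId;
        JobExecutionResult(JobID jobId) { this.jobId = jobId; }
    }

    // The listener interface as proposed in the thread.
    interface JobListener {
        void onJobSubmitted(JobID jobId);
        void onJobExecuted(JobExecutionResult jobResult);
        void onJobCanceled(JobID jobId);
    }

    // A listener that remembers submitted jobs so the caller can cancel them later.
    static final class CapturingListener implements JobListener {
        final List<JobID> submitted = new ArrayList<>();
        public void onJobSubmitted(JobID jobId) { submitted.add(jobId); }
        public void onJobExecuted(JobExecutionResult jobResult) { /* no-op */ }
        public void onJobCanceled(JobID jobId) { submitted.remove(jobId); }
    }

    public static void main(String[] args) {
        CapturingListener listener = new CapturingListener();
        JobID job = new JobID("job-1");
        listener.onJobSubmitted(job);   // the environment would call this on submit
        System.out.println("tracked jobs: " + listener.submitted.size());
        listener.onJobCanceled(job);    // e.g. the user hits "cancel" in the notebook
        System.out.println("tracked jobs: " + listener.submitted.size());
    }
}
```

The point of the callback design is that the environment, not the downstream project, drives the lifecycle; the project only reacts to it.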

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Stephan Ewen <se...@apache.org>.
Till has made some good comments here.

Two things to add:

  - The job mode is very nice in the way that it runs the client inside the
cluster (in the same image/process that is the JM) and thus unifies both
applications and what the Spark world calls the "driver mode".

  - Another thing I would add is that during the FLIP-6 design, we were
thinking about setups where Dispatcher and JobManager are separate
processes.
    A Yarn or Mesos Dispatcher of a session could run independently (even
as privileged processes executing no code).
    Then the "per-job" mode could still be helpful: when a job is
submitted to the dispatcher, it launches the JM in per-job mode, so
that JM and TM processes are bound to the job only. For higher-security
setups, it is important that processes are not reused across jobs.

On Tue, Aug 20, 2019 at 10:27 AM Till Rohrmann <tr...@apache.org> wrote:

> I would not be in favour of getting rid of the per-job mode since it
> simplifies the process of running Flink jobs considerably. Moreover, it is
> not only well suited for container deployments but also for deployments
> where you want to guarantee job isolation. For example, a user could use
> the per-job mode on Yarn to execute his job on a separate cluster.
>
> I think that having two notions of cluster deployments (session vs. per-job
> mode) does not necessarily contradict your ideas for the client api
> refactoring. For example one could have the following interfaces:
>
> - ClusterDeploymentDescriptor: encapsulates the logic how to deploy a
> cluster.
> - ClusterClient: allows to interact with a cluster
> - JobClient: allows to interact with a running job
>
> Now the ClusterDeploymentDescriptor could have two methods:
>
> - ClusterClient deploySessionCluster()
> - JobClusterClient/JobClient deployPerJobCluster(JobGraph)
>
> where JobClusterClient is either a supertype of ClusterClient that does
> not give you the functionality to submit jobs, or deployPerJobCluster
> returns a JobClient directly.
>
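The interface split sketched above can be made concrete with a short, self-contained example. The names and signatures (ClusterDeploymentDescriptor, deploySessionCluster, deployPerJobCluster) follow this email's proposal, but the in-memory "deployment" is purely illustrative and not actual Flink API.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DeploymentSketch {

    static final class JobGraph {
        final String name;
        JobGraph(String name) { this.name = name; }
    }

    /** Interacts with a single running job (savepoints, cancellation, ...). */
    interface JobClient {
        String jobId();
    }

    /** Interacts with a running cluster; in session mode it can submit jobs. */
    interface ClusterClient {
        JobClient submitJob(JobGraph graph);
    }

    /** Encapsulates how to deploy a cluster (Yarn, Mesos, Kubernetes, ...). */
    interface ClusterDeploymentDescriptor {
        ClusterClient deploySessionCluster();
        JobClient deployPerJobCluster(JobGraph graph);
    }

    /** Toy descriptor: "deploys" an in-memory cluster that hands out job ids. */
    static final class InMemoryDescriptor implements ClusterDeploymentDescriptor {
        private final AtomicInteger counter = new AtomicInteger();

        public ClusterClient deploySessionCluster() {
            // Session mode: the returned client can submit many jobs.
            return graph -> () -> graph.name + "-" + counter.incrementAndGet();
        }

        public JobClient deployPerJobCluster(JobGraph graph) {
            // Per-job mode: the cluster is bound to exactly one job from the start,
            // so the caller gets a JobClient and never a job-submitting client.
            return deploySessionCluster().submitJob(graph);
        }
    }

    public static void main(String[] args) {
        ClusterDeploymentDescriptor d = new InMemoryDescriptor();
        JobClient a = d.deploySessionCluster().submitJob(new JobGraph("wordcount"));
        JobClient b = d.deployPerJobCluster(new JobGraph("pagerank"));
        System.out.println(a.jobId());
        System.out.println(b.jobId());
    }
}
```

The type-level distinction does the isolation work: per-job callers only ever hold a JobClient, so they cannot submit further jobs to the dedicated cluster.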
> When setting up the ExecutionEnvironment, one would then not provide a
> ClusterClient to submit jobs but a JobDeployer which, depending on the
> selected mode, either uses a ClusterClient (session mode) to submit jobs or
> a ClusterDeploymentDescriptor to deploy a per-job mode cluster with the job
> to execute.
>
> These are just some thoughts how one could make it working because I
> believe there is some value in using the per job mode from the
> ExecutionEnvironment.
>
> Concerning the web submission, this is indeed a bit tricky. From a cluster
> management standpoint, I would be in favour of not executing user code on the
> REST endpoint. Especially when considering security, it would be good to
> have a well defined cluster behaviour where it is explicitly stated where
> user code and, thus, potentially risky code is executed. Ideally we limit
> it to the TaskExecutor and JobMaster.
>
> Cheers,
> Till
>
> On Tue, Aug 20, 2019 at 9:40 AM Flavio Pompermaier <po...@okkam.it>
> wrote:
>
> > In my opinion the client should not use any environment to get the Job
> > graph because the jar should reside ONLY on the cluster (and not in the
> > client classpath; otherwise there are always inconsistencies between the
> > client and the Flink Job Manager's classpath).
> > In the YARN, Mesos, and Kubernetes scenarios you have the jar, but you
> > could start a cluster that has the jar on the Job Manager as well (this is
> > the only case where I think you can assume that the client has the jar on
> > the classpath; in the REST job submission you don't have any classpath).
> >
> > Thus, always in my opinion, the JobGraph should be generated by the Job
> > Manager REST API.
> >
> >
> > On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <wa...@gmail.com> wrote:
> >
> >> I would like to involve Till & Stephan here to clarify some concept of
> >> per-job mode.
> >>
> >> The term per-job refers to one of the modes a cluster can run in. It is
> >> mainly aimed at spawning a dedicated cluster for a specific job; the job
> >> can be packaged with Flink itself, so that the cluster is initialized with
> >> the job and a separate submission step is avoided.
> >>
> >> This is useful for container deployments, where one creates an image with
> >> the job and then simply deploys the container.
> >>
> >> However, it is out of client scope, since a client (ClusterClient for
> >> example) is meant to communicate with an existing cluster and perform
> >> actions. Currently, in per-job mode, we extract the job graph and bundle
> >> it into the cluster deployment, and thus no concept of a client gets
> >> involved. It seems reasonable to exclude the deployment of a per-job
> >> cluster from the client api and use dedicated utility classes (deployers)
> >> for deployment.
> >>
> >> Zili Chen <wa...@gmail.com> 于2019年8月20日周二 下午12:37写道:
> >>
> >> > Hi Aljoscha,
> >> >
> >> > Thanks for your reply and participation. The Google Doc you linked to
> >> > requires
> >> > permission and I think you could use a share link instead.
> >> >
> >> > I agree that we have almost reached a consensus that a JobClient is
> >> > necessary to interact with a running job.
> >> >
> >> > Let me check your open questions one by one.
> >> >
> >> > 1. Separate cluster creation and job submission for per-job mode.
> >> >
> >> > As you mentioned, this is where the opinions diverge. In my document
> >> > there is an alternative [2] that proposes excluding per-job deployment
> >> > from client api scope, and now I find it more reasonable to do the
> >> > exclusion.
> >> >
> >> > When in per-job mode, a dedicated JobCluster is launched to execute the
> >> > specific job. It is more like a Flink application than a submission of a
> >> > Flink job. The client only takes care of job submission and assumes
> >> > there is an existing cluster. In this way we are able to consider
> >> > per-job issues individually, and JobClusterEntrypoint would be the
> >> > utility class for per-job deployment.
> >> >
> >> > Nevertheless, the user program works in both session mode and per-job
> >> > mode without any need to change code. The JobClient in per-job mode is
> >> > returned from env.execute as usual. However, it would no longer be a
> >> > wrapper of RestClusterClient but a wrapper of PerJobClusterClient, which
> >> > communicates with the Dispatcher locally.
> >> >
> >> > 2. How to deal with plan preview.
> >> >
> >> > With the env.compile functions, users can get the JobGraph or FlinkPlan
> >> > and thus preview the plan programmatically. Typically it looks like
> >> >
> >> > if (preview configured) {
> >> >     FlinkPlan plan = env.compile();
> >> >     new JSONDumpGenerator(...).dump(plan);
> >> > } else {
> >> >     env.execute();
> >> > }
> >> >
> >> > And `flink info` would no longer be necessary.
> >> >
> >> > 3. How to deal with Jar Submission at the Web Frontend.
> >> >
> >> > There is one more thread on this topic [1]. Apart from removing
> >> > the functions, there are two alternatives.
> >> >
> >> > One is to introduce an interface with a method that returns a
> >> > JobGraph/FlinkPlan, and have Jar Submission only support main classes
> >> > that implement this interface. The JobGraph/FlinkPlan is then extracted
> >> > just by calling the method. In this way, it is even possible to consider
> >> > a separation of job creation and job submission.
> >> >
> >> > The other is, as you mentioned, to let execute() do the actual
> >> > execution. We wouldn't execute the main method in the WebFrontend but
> >> > spawn a process on the WebMonitor side to execute it. For the return
> >> > part, we could generate the JobID in the WebMonitor and pass it to the
> >> > execution environment.
> >> >
> >> > 4. How to deal with detached mode.
> >> >
> >> > I think detached mode is a temporary solution for non-blocking
> >> > submission. In my document, both submission and execution return a
> >> > CompletableFuture, and users control whether or not to wait for the
> >> > result. At that point we don't need a detached option, but the
> >> > functionality is covered.
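The future-based submission described in this point can be sketched in a few lines. The submitJob/requestJobResult names and the in-memory "cluster" are purely hypothetical stand-ins; the sketch only shows how "detached" vs. "attached" collapses into whether the caller waits on the returned CompletableFuture.

```java
import java.util.concurrent.CompletableFuture;

public class FutureSubmissionSketch {

    // Pretend cluster: submission completes with a job id.
    static CompletableFuture<String> submitJob(String jobName) {
        return CompletableFuture.supplyAsync(() -> jobName + "-id");
    }

    // Pretend cluster: execution completes with a dummy result for any job id.
    static CompletableFuture<Integer> requestJobResult(String jobId) {
        return CompletableFuture.supplyAsync(() -> 42);
    }

    public static void main(String[] args) {
        // "Detached": fire and forget; keep only the job-id future around.
        CompletableFuture<String> jobId = submitJob("wordcount");

        // "Attached": chain execution onto submission and block for the result.
        int result = jobId.thenCompose(FutureSubmissionSketch::requestJobResult).join();
        System.out.println("result = " + result);
    }
}
```

With this shape there is no separate detached flag anywhere: the same api serves both behaviours, chosen at the call site.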
> >> >
> >> > 5. How does per-job mode interact with interactive programming.
> >> >
> >> > The YARN, Mesos, and Kubernetes scenarios all follow the pattern of
> >> > launching a JobCluster now, and I don't think there would be any
> >> > inconsistency between different resource managers.
> >> >
> >> > Best,
> >> > tison.
> >> >
> >> > [1]
> >> >
> >>
> https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
> >> > [2]
> >> >
> >>
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
> >> >
> >> > Aljoscha Krettek <al...@apache.org> 于2019年8月16日周五 下午9:20写道:
> >> >
> >> >> Hi,
> >> >>
> >> >> I read both Jeff's initial design document and the newer document by
> >> >> Tison. I also finally found the time to collect our thoughts on the
> >> >> issue; I had quite some discussions with Kostas and this is the
> >> >> result: [1].
> >> >>
> >> >> I think overall we agree that this part of the code is in dire need of
> >> >> some refactoring/improvements, but I think there are still some open
> >> >> questions and some differences in opinion on what those refactorings
> >> >> should look like.
> >> >>
> >> >> I think the API-side is quite clear, i.e. we need some JobClient API
> >> >> that allows interacting with a running job. It could be worthwhile to
> >> >> spin that off into a separate FLIP because we can probably find
> >> >> consensus on that part more easily.
> >> >>
> >> >> For the rest, the main open questions from our doc are these:
> >> >>
> >> >>   - Do we want to separate cluster creation and job submission for
> >> >> per-job mode? In the past, there were conscious efforts to *not*
> >> >> separate job submission from cluster creation for per-job clusters for
> >> >> Mesos, YARN, and Kubernetes (see StandaloneJobClusterEntryPoint). Tison
> >> >> suggests in his design document to decouple this in order to unify job
> >> >> submission.
> >> >>
> >> >>   - How to deal with plan preview, which needs to hijack execute()
> >> >> and let the outside code catch an exception?
> >> >>
> >> >>   - How to deal with Jar Submission at the Web Frontend, which needs
> >> >> to hijack execute() and let the outside code catch an exception?
> >> >> CliFrontend.run() “hijacks” ExecutionEnvironment.execute() to get a
> >> >> JobGraph and then executes that JobGraph manually. We could get around
> >> >> that by letting execute() do the actual execution. One caveat is that
> >> >> then the main() method doesn’t return (or is forced to return by
> >> >> throwing an exception from execute()), which means that for Jar
> >> >> Submission from the WebFrontend we would have a long-running main()
> >> >> method running in the WebFrontend. This doesn’t sound very good. We
> >> >> could get around this by removing the plan preview feature and by
> >> >> removing Jar Submission/Running.
> >> >>
> >> >>   - How to deal with detached mode? Right now, DetachedEnvironment
> >> >> will execute the job and return immediately. If users control when they
> >> >> want to return, by waiting on the job completion future, how do we deal
> >> >> with this? Do we simply remove the distinction between
> >> >> detached/non-detached?
> >> >>
> >> >>   - How does per-job mode interact with “interactive programming”
> >> >> (FLIP-36)? For YARN, each execute() call could spawn a new Flink YARN
> >> >> cluster. What about Mesos and Kubernetes?
> >> >>
> >> >> The first open question is where the opinions diverge, I think. The
> >> >> rest are just open questions and interesting things that we need to
> >> >> consider.
> >> >>
> >> >> Best,
> >> >> Aljoscha
> >> >>
> >> >> [1]
> >> >>
> >>
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> >> >> <
> >> >>
> >>
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> >> >> >
> >> >>
> >> >> > On 31. Jul 2019, at 15:23, Jeff Zhang <zj...@gmail.com> wrote:
> >> >> >
> >> >> > Thanks tison for the effort. I left a few comments.
> >> >> >
> >> >> >
> >> >> > Zili Chen <wa...@gmail.com> 于2019年7月31日周三 下午8:24写道:
> >> >> >
> >> >> >> Hi Flavio,
> >> >> >>
> >> >> >> Thanks for your reply.
> >> >> >>
> >> >> In both the current implementation and the design, the ClusterClient
> >> >> never takes responsibility for generating the JobGraph
> >> >> (what you see in the current codebase is several class methods).
> >> >>
> >> >> Instead, the user describes his program in the main method
> >> >> with the ExecutionEnvironment apis and calls env.compile()
> >> >> or env.optimize() to get the FlinkPlan and JobGraph respectively.
> >> >> >>
> >> >> For listing the main classes in a jar and choosing one for
> >> >> submission, you're already able to customize a CLI to do it.
> >> >> Specifically, the path of the jar is passed as an argument, and
> >> >> in the customized CLI you list the main classes and choose one
> >> >> to submit to the cluster.
> >> >> >>
> >> >> >> Best,
> >> >> >> tison.
> >> >> >>
> >> >> >>
> >> >> >> Flavio Pompermaier <po...@okkam.it> 于2019年7月31日周三 下午8:12写道:
> >> >> >>
> >> >>> Just one note on my side: it is not clear to me whether the client
> >> >>> needs to be able to generate a job graph or not.
> >> >>> In my opinion, the job jar must reside only on the server/JobManager
> >> >>> side, and the client requires a way to get the job graph.
> >> >>> If you really want access to the job graph, I'd add a dedicated
> >> >>> method on the ClusterClient, like:
> >> >> >>>
> >> >> >>>   - getJobGraph(jarId, mainClass): JobGraph
> >> >> >>>   - listMainClasses(jarId): List<String>
> >> >> >>>
> >> >>> These would require some additions on the job manager endpoint as
> >> >>> well. What do you think?
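The two client-side calls proposed in this email can be sketched against a toy in-memory backend standing in for the JobManager REST endpoint. getJobGraph and listMainClasses are hypothetical methods from this thread, not real Flink API, and the "server-side" logic here is simulated.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class JarClientSketch {

    static final class JobGraph {
        final String mainClass;
        JobGraph(String mainClass) { this.mainClass = mainClass; }
    }

    /** The proposed client surface: jars live on the cluster, keyed by jarId. */
    interface ClusterClient {
        List<String> listMainClasses(String jarId);
        JobGraph getJobGraph(String jarId, String mainClass);
    }

    /** Pretend the cluster knows which main classes each uploaded jar contains. */
    static ClusterClient inMemoryClient(Map<String, List<String>> jars) {
        return new ClusterClient() {
            public List<String> listMainClasses(String jarId) {
                return jars.getOrDefault(jarId, List.of());
            }
            public JobGraph getJobGraph(String jarId, String mainClass) {
                if (!listMainClasses(jarId).contains(mainClass)) {
                    throw new IllegalArgumentException("unknown main class: " + mainClass);
                }
                // In the proposal, plan compilation would happen server-side here,
                // so the client never needs the job jar on its own classpath.
                return new JobGraph(mainClass);
            }
        };
    }

    public static void main(String[] args) {
        ClusterClient client = inMemoryClient(
                Map.of("jar-1", Arrays.asList("com.example.WordCount", "com.example.PageRank")));
        System.out.println(client.listMainClasses("jar-1"));
        System.out.println(client.getJobGraph("jar-1", "com.example.WordCount").mainClass);
    }
}
```

The design choice this illustrates: because the jar id, not a local jar file, is the handle, the client stays free of any user-code classpath.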
> >> >> >>>
> >> >> >>> On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <wander4096@gmail.com
> >
> >> >> wrote:
> >> >> >>>
> >> >> >>>> Hi all,
> >> >> >>>>
> >> >> >>>> Here is a document[1] on client api enhancement from our
> >> perspective.
> >> >> >>>> We have investigated current implementations. And we propose
> >> >> >>>>
> >> >> >>>> 1. Unify the implementation of cluster deployment and job
> >> submission
> >> >> in
> >> >> >>>> Flink.
> >> >> >>>> 2. Provide programmatic interfaces to allow flexible job and
> >> cluster
> >> >> >>>> management.
> >> >> >>>>
> >> >> >>>> The first proposal is aimed at reducing code paths of cluster
> >> >> >> deployment
> >> >> >>>> and
> >> >> >>>> job submission so that one can adopt Flink easily.
> >> The
> >> >> >>> second
> >> >> >>>> proposal is aimed at providing rich interfaces for advanced
> users
> >> >> >>>> who want to make accurate control of these stages.
> >> >> >>>>
> >> >> >>>> Quick reference on open questions:
> >> >> >>>>
> >> >> >>>> 1. Exclude job cluster deployment from client side or redefine
> the
> >> >> >>> semantic
> >> >> >>>> of job cluster? Since it fits in a process quite different from
> >> >> session
> >> >> >>>> cluster deployment and job submission.
> >> >> >>>>
> >> >> >>>> 2. Maintain the codepaths handling class
> o.a.f.api.common.Program
> >> or
> >> >> >>>> implement customized program handling logic by customized
> >> >> CliFrontend?
> >> >> >>>> See also this thread[2] and the document[1].
> >> >> >>>>
> >> >> >>>> 3. Expose ClusterClient as public api or just expose api in
> >> >> >>>> ExecutionEnvironment
> >> >> >>>> and delegate them to ClusterClient? Further, in either way is it
> >> >> worth
> >> >> >> to
> >> >> >>>> introduce a JobClient which is an encapsulation of ClusterClient
> >> that
> >> >> >>>> associated to specific job?
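
One hypothetical shape for the JobClient mentioned in question 3 is a thin per-job handle that delegates to a ClusterClient. This is purely an illustrative sketch: none of the interfaces below exist in Flink as written, and the names only mirror the discussion.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: these names mirror the discussion and are NOT
// the real Flink interfaces.
interface ClusterClient {
    String submitJob(String jobGraph);   // returns a job id
    String queryJobStatus(String jobId);
    void cancelJob(String jobId);
}

// A JobClient is a per-job view over a ClusterClient: it pins one job id so
// callers can control "their" job without seeing cluster-wide operations.
final class JobClient {
    private final ClusterClient cluster;
    private final String jobId;

    JobClient(ClusterClient cluster, String jobId) {
        this.cluster = cluster;
        this.jobId = jobId;
    }

    String status() { return cluster.queryJobStatus(jobId); }
    void cancel() { cluster.cancelJob(jobId); }
}

public class JobClientSketch {
    // Minimal in-memory stand-in for a cluster, for illustration only.
    static final class FakeCluster implements ClusterClient {
        private final Map<String, String> statusById = new HashMap<>();
        private int counter = 0;

        public String submitJob(String jobGraph) {
            String id = "job-" + (++counter);
            statusById.put(id, "RUNNING");
            return id;
        }
        public String queryJobStatus(String jobId) { return statusById.get(jobId); }
        public void cancelJob(String jobId) { statusById.put(jobId, "CANCELED"); }
    }

    public static void main(String[] args) {
        ClusterClient cluster = new FakeCluster();
        JobClient job = new JobClient(cluster, cluster.submitJob("wordcount-graph"));
        System.out.println(job.status());  // RUNNING
        job.cancel();
        System.out.println(job.status());  // CANCELED
    }
}
```

In this shape, exposing only JobClient (plus whatever factory creates it) would keep ClusterClient free to evolve as an internal abstraction.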
> >> >> >>>>
> >> >> >>>> Best,
> >> >> >>>> tison.
> >> >> >>>>
> >> >> >>>> [1]
> >> >> >>>>
> >> >> >>>>
> >> >> >>>
> >> >> >>
> >> >>
> >>
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
> >> >> >>>> [2]
> >> >> >>>>
> >> >> >>>>
> >> >> >>>
> >> >> >>
> >> >>
> >>
> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
> >> >> >>>>
> >> >> >>>> Jeff Zhang <zj...@gmail.com> 于2019年7月24日周三 上午9:19写道:
> >> >> >>>>
> >> >> >>>>> Thanks Stephan, I will follow up this issue in next few weeks,
> >> and
> >> >> >> will
> >> >> >>>>> refine the design doc. We could discuss more details after 1.9
> >> >> >> release.
> >> >> >>>>>
> >> >> >>>>> Stephan Ewen <se...@apache.org> 于2019年7月24日周三 上午12:58写道:
> >> >> >>>>>
> >> >> >>>>>> Hi all!
> >> >> >>>>>>
> >> >> >>>>>> This thread has stalled for a bit, which I assume is mostly
> >> due to
> >> >> >>> the
> >> >> >>>>>> Flink 1.9 feature freeze and release testing effort.
> >> >> >>>>>>
> >> >> >>>>>> I personally still recognize this issue as one important to be
> >> >> >>> solved.
> >> >> >>>>> I'd
> >> >> >>>>>> be happy to help resume this discussion soon (after the 1.9
> >> >> >> release)
> >> >> >>>> and
> >> >> >>>>>> see if we can take some steps towards this in Flink 1.10.
> >> >> >>>>>>
> >> >> >>>>>> Best,
> >> >> >>>>>> Stephan
> >> >> >>>>>>
> >> >> >>>>>>
> >> >> >>>>>>
> >> >> >>>>>> On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <
> >> >> >>>>> pompermaier@okkam.it>
> >> >> >>>>>> wrote:
> >> >> >>>>>>
> >> >> >>>>>>> That's exactly what I suggested a long time ago: the Flink
> REST
> >> >> >>>> client
> >> >> >>>>>>> should not require any Flink dependency, only http library to
> >> >> >> call
> >> >> >>>> the
> >> >> >>>>>> REST
> >> >> >>>>>>> services to submit and monitor a job.
> >> >> >>>>>>> What I suggested also in [1] was to have a way to
> automatically
> >> >> >>>> suggest
> >> >> >>>>>> the
> >> >> >>>>>>> user (via a UI) the available main classes and their required
> >> >> >>>>>>> parameters[2].
> >> >> >>>>>>> Another problem we have with Flink is that the Rest client
> and
> >> >> >> the
> >> >> >>>> CLI
> >> >> >>>>>> one
> >> >> >>>>>>> behave differently, and we use the CLI client (via ssh)
> because
> >> >> >> it
> >> >> >>>>> allows
> >> >> >>>>>>> calling some other method after env.execute() [3] (we have to
> >> >> >> call
> >> >> >>>>>> another
> >> >> >>>>>>> REST service to signal the end of the job).
> >> >> >>>>>>> In this regard, a dedicated interface, like the JobListener
> >> >> >>> suggested
> >> >> >>>>> in
> >> >> >>>>>>> the previous emails, would be very helpful (IMHO).
> >> >> >>>>>>>
> >> >> >>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-10864
> >> >> >>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-10862
> >> >> >>>>>>> [3] https://issues.apache.org/jira/browse/FLINK-10879
> >> >> >>>>>>>
> >> >> >>>>>>> Best,
> >> >> >>>>>>> Flavio
> >> >> >>>>>>>
> >> >> >>>>>>> On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zjffdu@gmail.com
> >
> >> >> >>> wrote:
> >> >> >>>>>>>
> >> >> >>>>>>>> Hi, Tison,
> >> >> >>>>>>>>
> >> >> >>>>>>>> Thanks for your comments. Overall I agree with you that it
> is
> >> >> >>>>> difficult
> >> >> >>>>>>> for
> >> >> >>>>>>>> downstream projects to integrate with Flink, and we need to
> >> >> >>> refactor
> >> >> >>>>> the
> >> >> >>>>>>>> current flink client api.
> >> >> >>>>>>>> And I agree that CliFrontend should only parsing command
> line
> >> >> >>>>> arguments
> >> >> >>>>>>> and
> >> >> >>>>>>>> then pass them to ExecutionEnvironment. It is
> >> >> >>>> ExecutionEnvironment's
> >> >> >>>>>>>> responsibility to compile the job, create the cluster, and submit the
> job.
> >> >> >>>>> Besides
> >> >> >>>>>>>> that, Currently flink has many ExecutionEnvironment
> >> >> >>>> implementations,
> >> >> >>>>>> and
> >> >> >>>>>>>> flink will use the specific one based on the context. IMHO,
> it
> >> >> >> is
> >> >> >>>> not
> >> >> >>>>>>>> necessary, ExecutionEnvironment should be able to do the
> right
> >> >> >>>> thing
> >> >> >>>>>>> based
> >> >> >>>>>>>> on the FlinkConf it receives. Too many
> ExecutionEnvironment
> >> >> >>>>>>>> implementations are another burden for downstream project
> >> >> >>>> integration.
> >> >> >>>>>>>>
> >> >> >>>>>>>> One thing I'd like to mention is flink's scala shell and sql
> >> >> >>>> client,
> >> >> >>>>>>>> although they are sub-modules of flink, they could be
> treated
> >> >> >> as
> >> >> >>>>>>> downstream
> >> >> >>>>>>>> project which use flink's client api. Currently you will
> find
> >> >> >> it
> >> >> >>> is
> >> >> >>>>> not
> >> >> >>>>>>>> easy for them to integrate with Flink; they share much
> >> >> >> duplicated
> >> >> >>>>> code.
> >> >> >>>>>>> It
> >> >> >>>>>>>> is another sign that we should refactor flink client api.
> >> >> >>>>>>>>
> >> >> >>>>>>>> I believe it is a large and hard change, and I am afraid we
> >> can
> >> >> >>> not
> >> >> >>>>>> keep
> >> >> >>>>>>>> compatibility since many of the changes are user-facing.
> >> >> >>>>>>>>
> >> >> >>>>>>>>
> >> >> >>>>>>>>
> >> >> >>>>>>>> Zili Chen <wa...@gmail.com> 于2019年6月24日周一 下午2:53写道:
> >> >> >>>>>>>>
> >> >> >>>>>>>>> Hi all,
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> After a closer look on our client apis, I can see there are
> >> >> >> two
> >> >> >>>>> major
> >> >> >>>>>>>>> issues to consistency and integration, namely different
> >> >> >>>> deployment
> >> >> >>>>> of
> >> >> >>>>>>>>> job cluster which couples job graph creation and cluster
> >> >> >>>>> deployment,
> >> >> >>>>>>>>> and submission via CliFrontend confusing control flow of
> job
> >> >> >>>> graph
> >> >> >>>>>>>>> compilation and job submission. I'd like to follow the
> >> >> >> discuss
> >> >> >>>>> above,
> >> >> >>>>>>>>> mainly the process described by Jeff and Stephan, and share
> >> >> >> my
> >> >> >>>>>>>>> ideas on these issues.
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> 1) CliFrontend confuses the control flow of job compilation
> >> >> >> and
> >> >> >>>>>>>> submission.
> >> >> >>>>>>>>> Following the process of job submission Stephan and Jeff
> >> >> >>>> described,
> >> >> >>>>>>>>> execution environment knows all configs of the cluster and
> >> >> >>>>>>> topos/settings
> >> >> >>>>>>>>> of the job. Ideally, in the main method of user program, it
> >> >> >>> calls
> >> >> >>>>>>>> #execute
> >> >> >>>>>>>>> (or named #submit) and Flink deploys the cluster, compiles
> the
> >> >> >>> job
> >> >> >>>>>> graph
> >> >> >>>>>>>>> and submits it to the cluster. However, the current CliFrontend
> >> >> >> does
> >> >> >>>> all
> >> >> >>>>>>> these
> >> >> >>>>>>>>> things inside its #runProgram method, which introduces a
> lot
> >> >> >> of
> >> >> >>>>>>>> subclasses
> >> >> >>>>>>>>> of (stream) execution environment.
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> Actually, it sets up an exec env that hijacks the
> >> >> >>>>>> #execute/executePlan
> >> >> >>>>>>>>> method, initializes the job graph and abort execution. And
> >> >> >> then
> >> >> >>>>>>>>> control flow back to CliFrontend, it deploys the cluster(or
> >> >> >>>>> retrieve
> >> >> >>>>>>>>> the client) and submits the job graph. This is quite a
> >> >> >> specific
> >> >> >>>>>>> internal
> >> >> >>>>>>>>> process inside Flink and none of consistency to anything.
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> 2) Deployment of job cluster couples job graph creation and
> >> >> >>>> cluster
> >> >> >>>>>>>>> deployment. Abstractly, from user job to a concrete
> >> >> >> submission,
> >> >> >>>> it
> >> >> >>>>>>>> requires
> >> >> >>>>>>>>>
> >> >> >>>>>>>>>     create JobGraph --\
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> create ClusterClient -->  submit JobGraph
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> such a dependency. ClusterClient was created by deploying
> or
> >> >> >>>>>>> retrieving.
> >> >> >>>>>>>>> JobGraph submission requires a compiled JobGraph and valid
> >> >> >>>>>>> ClusterClient,
> >> >> >>>>>>>>> but the creation of ClusterClient is abstractly independent
> >> >> >> of
> >> >> >>>> that
> >> >> >>>>>> of
> >> >> >>>>>>>>> JobGraph. However, in job cluster mode, we deploy job
> cluster
> >> >> >>>> with
> >> >> >>>>> a
> >> >> >>>>>>> job
> >> >> >>>>>>>>> graph, which means we use another process:
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> create JobGraph --> deploy cluster with the JobGraph
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> Here is another inconsistency and downstream
> projects/client
> >> >> >>> apis
> >> >> >>>>> are
> >> >> >>>>>>>>> forced to handle different cases with rare supports from
> >> >> >> Flink.
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> Since we likely reached a consensus on
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> 1. all configs gathered by Flink configuration and passed
> >> >> >>>>>>>>> 2. execution environment knows all configs and handles
> >> >> >>>>> execution(both
> >> >> >>>>>>>>> deployment and submission)
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> to the issues above I propose eliminating inconsistencies
> by
> >> >> >>>>>> following
> >> >> >>>>>>>>> approach:
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> 1) CliFrontend should exactly be a front end, at least for
> >> >> >>> "run"
> >> >> >>>>>>> command.
> >> >> >>>>>>>>> That means it just gathers and passes all config from the
> >> >> >> command
> >> >> >>>> line
> >> >> >>>>>> to
> >> >> >>>>>>>>> the main method of user program. Execution environment
> knows
> >> >> >>> all
> >> >> >>>>> the
> >> >> >>>>>>> info
> >> >> >>>>>>>>> and with an addition to utils for ClusterClient, we
> >> >> >> gracefully
> >> >> >>>> get
> >> >> >>>>> a
> >> >> >>>>>>>>> ClusterClient by deploying or retrieving. In this way, we
> >> >> >> don't
> >> >> >>>>> need
> >> >> >>>>>> to
> >> >> >>>>>>>>> hijack #execute/executePlan methods and can remove various
> >> >> >>>> hacking
> >> >> >>>>>>>>> subclasses of exec env, as well as #run methods in
> >> >> >>>>> ClusterClient(for
> >> >> >>>>>> an
> >> >> >>>>>>>>> interface-ized ClusterClient). Now the control flow flows
> >> >> >> from
> >> >> >>>>>>>> CliFrontend
> >> >> >>>>>>>>> to the main method and never returns.
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> 2) Job cluster means a cluster for the specific job. From
> >> >> >>> another
> >> >> >>>>>>>>> perspective, it is an ephemeral session. We may decouple
> the
> >> >> >>>>>> deployment
> >> >> >>>>>>>>> with a compiled job graph, but start a session with idle
> >> >> >>> timeout
> >> >> >>>>>>>>> and submit the job following.
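
The "ephemeral session" idea above could be pictured roughly like this. It is an illustrative sketch only: Session, the idle-timeout handling, and every name below are assumptions for the sake of the example, not Flink code.

```java
import java.time.Duration;

// Hypothetical sketch of "a job cluster is an ephemeral session": both modes
// share one deployment + submission path; the job-cluster case is simply a
// session deployed with an idle timeout. Names here are illustrative only.
public class EphemeralSessionSketch {

    static final class Session {
        final Duration idleTimeout;  // Duration.ZERO means a long-running session
        boolean running = true;

        Session(Duration idleTimeout) { this.idleTimeout = idleTimeout; }

        String submit(String jobGraph) { return "submitted:" + jobGraph; }

        // An ephemeral session tears itself down once its only job finishes.
        void jobFinished() {
            if (!idleTimeout.isZero()) {
                running = false;
            }
        }
    }

    public static void main(String[] args) {
        // Session cluster: deployed once, survives across jobs.
        Session session = new Session(Duration.ZERO);
        session.submit("jobA");
        session.jobFinished();
        System.out.println(session.running);   // true

        // "Job cluster": same code path, but the session expires with the job.
        Session ephemeral = new Session(Duration.ofMinutes(1));
        ephemeral.submit("jobB");
        ephemeral.jobFinished();
        System.out.println(ephemeral.running); // false
    }
}
```

The point of the sketch is that job graph creation and cluster deployment stay independent in both modes; only the session's lifetime differs.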
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> These topics, before we go into more details on design or
> >> >> >>>>>>> implementation,
> >> >> >>>>>>>>> are better to be aware and discussed for a consensus.
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> Best,
> >> >> >>>>>>>>> tison.
> >> >> >>>>>>>>>
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> Zili Chen <wa...@gmail.com> 于2019年6月20日周四 上午3:21写道:
> >> >> >>>>>>>>>
> >> >> >>>>>>>>>> Hi Jeff,
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>> Thanks for raising this thread and the design document!
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>> As @Thomas Weise mentioned above, extending config to
> flink
> >> >> >>>>>>>>>> requires far more effort than it should. Another
> example
> >> >> >>>>>>>>>> is that we achieve detached mode by introducing another execution
> >> >> >>>>>>>>>> environment, which also hijacks the #execute method.
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>> I agree with your idea that user would configure all
> things
> >> >> >>>>>>>>>> and flink "just" respect it. On this topic I think the
> >> >> >> unusual
> >> >> >>>>>>>>>> control flow when CliFrontend handles the "run" command is the
> >> >> >>>> problem.
> >> >> >>>>>>>>>> It handles several configs, mainly about cluster settings,
> >> >> >> and
> >> >> >>>>>>>>>> thus the main method of the user program is unaware of them. Also it
> it
> >> >> >>>>>> compiles
> >> >> >>>>>>>>>> app to job graph by run the main method with a hijacked
> exec
> >> >> >>>> env,
> >> >> >>>>>>>>>> which constrain the main method further.
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>> I'd like to write down a few of notes on configs/args pass
> >> >> >> and
> >> >> >>>>>>> respect,
> >> >> >>>>>>>>>> as well as decoupling job compilation and submission.
> Share
> >> >> >> on
> >> >> >>>>> this
> >> >> >>>>>>>>>> thread later.
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>> Best,
> >> >> >>>>>>>>>> tison.
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>> SHI Xiaogang <sh...@gmail.com> 于2019年6月17日周一
> >> >> >> 下午7:29写道:
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>>> Hi Jeff and Flavio,
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>> Thanks Jeff a lot for proposing the design document.
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>> We are also working on refactoring ClusterClient to allow
> >> >> >>>>> flexible
> >> >> >>>>>>> and
> >> >> >>>>>>>>>>> efficient job management in our real-time platform.
> >> >> >>>>>>>>>>> We would like to draft a document to share our ideas with
> >> >> >>> you.
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>> I think it's a good idea to have something like Apache
> Livy
> >> >> >>> for
> >> >> >>>>>>> Flink,
> >> >> >>>>>>>>>>> and
> >> >> >>>>>>>>>>> the efforts discussed here will take a great step forward
> >> >> >> to
> >> >> >>>> it.
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>> Regards,
> >> >> >>>>>>>>>>> Xiaogang
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>> Flavio Pompermaier <po...@okkam.it> 于2019年6月17日周一
> >> >> >>>>> 下午7:13写道:
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>>>>> Is there any possibility to have something like Apache
> >> >> >> Livy
> >> >> >>>> [1]
> >> >> >>>>>>> also
> >> >> >>>>>>>>>>> for
> >> >> >>>>>>>>>>>> Flink in the future?
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>> [1] https://livy.apache.org/
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>> On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <
> >> >> >>> zjffdu@gmail.com
> >> >> >>>>>
> >> >> >>>>>>> wrote:
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>> Any API we expose should not have dependencies on
> >> >> >>> the
> >> >> >>>>>>> runtime
> >> >> >>>>>>>>>>>>> (flink-runtime) package or other implementation
> >> >> >> details.
> >> >> >>> To
> >> >> >>>>> me,
> >> >> >>>>>>>> this
> >> >> >>>>>>>>>>>> means
> >> >> >>>>>>>>>>>>> that the current ClusterClient cannot be exposed to
> >> >> >> users
> >> >> >>>>>> because
> >> >> >>>>>>>> it
> >> >> >>>>>>>>>>>> uses
> >> >> >>>>>>>>>>>>> quite some classes from the optimiser and runtime
> >> >> >>> packages.
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>> We should change ClusterClient from a class to an interface.
> >> >> >>>>>>>>>>>>> ExecutionEnvironment should only use the interface
> >> >> >> ClusterClient
> >> >> >>>>> which
> >> >> >>>>>>>>>>> should be
> >> >> >>>>>>>>>>>>> in flink-clients while the concrete implementation
> >> >> >> class
> >> >> >>>>> could
> >> >> >>>>>> be
> >> >> >>>>>>>> in
> >> >> >>>>>>>>>>>>> flink-runtime.
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>> What happens when a failure/restart in the client
> >> >> >>>>> happens?
> >> >> >>>>>>>> There
> >> >> >>>>>>>>>>> need
> >> >> >>>>>>>>>>>>> to be a way of re-establishing the connection to the
> >> >> >> job,
> >> >> >>>> set
> >> >> >>>>>> up
> >> >> >>>>>>>> the
> >> >> >>>>>>>>>>>>> listeners again, etc.
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>> Good point.  First we need to define what does
> >> >> >>>>> failure/restart
> >> >> >>>>>> in
> >> >> >>>>>>>> the
> >> >> >>>>>>>>>>>>> client mean. IIUC, that usually means a network failure
> >> >> >>> which
> >> >> >>>>> will
> >> >> >>>>>>>>>>> happen in
> >> >> >>>>>>>>>>>>> class RestClient. If my understanding is correct,
> >> >> >>>>> restart/retry
> >> >> >>>>>>>>>>> mechanism
> >> >> >>>>>>>>>>>>> should be done in RestClient.
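
The retry-on-transient-failure behaviour discussed here could look roughly like the following. This is not the actual RestClient implementation, just a minimal sketch of bounded retries with linear backoff; the method and parameter names are assumptions.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical sketch: a REST call is retried on IOException with a bounded
// attempt count and linear backoff. Illustrative only, not Flink's RestClient.
public class RetrySketch {

    static <T> T withRetry(Callable<T> call, int maxAttempts, long backoffMs) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {  // transient network failure: retry
                last = e;
                Thread.sleep(backoffMs * attempt);
            }
        }
        throw last;  // attempts exhausted
    }

    public static void main(String[] args) throws Exception {
        final int[] failuresLeft = {2};  // simulate two transient failures
        String result = withRetry(() -> {
            if (failuresLeft[0]-- > 0) {
                throw new IOException("connection reset");
            }
            return "job-accepted";
        }, 5, 1);
        System.out.println(result);  // job-accepted
    }
}
```

Doing this inside the low-level client keeps downstream callers (and listeners) unaware of transient reconnects.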
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>> Aljoscha Krettek <al...@apache.org> 于2019年6月11日周二
> >> >> >>>>>> 下午11:10写道:
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>> Some points to consider:
> >> >> >>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>> * Any API we expose should not have dependencies on
> >> >> >> the
> >> >> >>>>>> runtime
> >> >> >>>>>>>>>>>>>> (flink-runtime) package or other implementation
> >> >> >>> details.
> >> >> >>>> To
> >> >> >>>>>> me,
> >> >> >>>>>>>>>>> this
> >> >> >>>>>>>>>>>>> means
> >> >> >>>>>>>>>>>>>> that the current ClusterClient cannot be exposed to
> >> >> >>> users
> >> >> >>>>>>> because
> >> >> >>>>>>>>>>> it
> >> >> >>>>>>>>>>>>> uses
> >> >> >>>>>>>>>>>>>> quite some classes from the optimiser and runtime
> >> >> >>>> packages.
> >> >> >>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>> * What happens when a failure/restart in the client
> >> >> >>>>> happens?
> >> >> >>>>>>>> There
> >> >> >>>>>>>>>>> need
> >> >> >>>>>>>>>>>>> to
> >> >> >>>>>>>>>>>>>> be a way of re-establishing the connection to the
> >> >> >> job,
> >> >> >>>> set
> >> >> >>>>> up
> >> >> >>>>>>> the
> >> >> >>>>>>>>>>>>> listeners
> >> >> >>>>>>>>>>>>>> again, etc.
> >> >> >>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>> Aljoscha
> >> >> >>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>> On 29. May 2019, at 10:17, Jeff Zhang <
> >> >> >>>> zjffdu@gmail.com>
> >> >> >>>>>>>> wrote:
> >> >> >>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>> Sorry folks, the design doc is late as you
> >> >> >> expected.
> >> >> >>>>> Here's
> >> >> >>>>>>> the
> >> >> >>>>>>>>>>>> design
> >> >> >>>>>>>>>>>>>> doc
> >> >> >>>>>>>>>>>>>>> I drafted, welcome any comments and feedback.
> >> >> >>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>
> >> >> >>>>>>>
> >> >> >>>>>>
> >> >> >>>>>
> >> >> >>>>
> >> >> >>>
> >> >> >>
> >> >>
> >>
> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> >> >> >>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org> 于2019年2月14日周四
> >> >> >>>> 下午8:43写道:
> >> >> >>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>> Nice that this discussion is happening.
> >> >> >>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>> In the FLIP, we could also revisit the entire role
> >> >> >>> of
> >> >> >>>>> the
> >> >> >>>>>>>>>>>> environments
> >> >> >>>>>>>>>>>>>>>> again.
> >> >> >>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>> Initially, the idea was:
> >> >> >>>>>>>>>>>>>>>> - the environments take care of the specific
> >> >> >> setup
> >> >> >>>> for
> >> >> >>>>>>>>>>> standalone
> >> >> >>>>>>>>>>>> (no
> >> >> >>>>>>>>>>>>>>>> setup needed), yarn, mesos, etc.
> >> >> >>>>>>>>>>>>>>>> - the session ones have control over the session.
> >> >> >>> The
> >> >> >>>>>>>>>>> environment
> >> >> >>>>>>>>>>>>> holds
> >> >> >>>>>>>>>>>>>>>> the session client.
> >> >> >>>>>>>>>>>>>>>> - running a job gives a "control" object for that
> >> >> >>>> job.
> >> >> >>>>>> That
> >> >> >>>>>>>>>>>> behavior
> >> >> >>>>>>>>>>>>> is
> >> >> >>>>>>>>>>>>>>>> the same in all environments.
> >> >> >>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>> The actual implementation diverged quite a bit
> >> >> >> from
> >> >> >>>>> that.
> >> >> >>>>>>>> Happy
> >> >> >>>>>>>>>>> to
> >> >> >>>>>>>>>>>>> see a
> >> >> >>>>>>>>>>>>>>>> discussion about straightening this out a bit more.
> >> >> >>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <
> >> >> >>>>>>> zjffdu@gmail.com>
> >> >> >>>>>>>>>>>> wrote:
> >> >> >>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>> Hi folks,
> >> >> >>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>> Sorry for late response, It seems we reach
> >> >> >>> consensus
> >> >> >>>> on
> >> >> >>>>>>>> this, I
> >> >> >>>>>>>>>>>> will
> >> >> >>>>>>>>>>>>>>>> create
> >> >> >>>>>>>>>>>>>>>>> FLIP for this with more detailed design
> >> >> >>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>> Thomas Weise <th...@apache.org> 于2018年12月21日周五
> >> >> >>>>> 上午11:43写道:
> >> >> >>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>> Great to see this discussion seeded! The
> >> >> >> problems
> >> >> >>>> you
> >> >> >>>>>> face
> >> >> >>>>>>>>>>> with
> >> >> >>>>>>>>>>>> the
> >> >> >>>>>>>>>>>>>>>>>> Zeppelin integration are also affecting other
> >> >> >>>>> downstream
> >> >> >>>>>>>>>>> projects,
> >> >> >>>>>>>>>>>>>> like
> >> >> >>>>>>>>>>>>>>>>>> Beam.
> >> >> >>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>> We just enabled the savepoint restore option in
> >> >> >>>>>>>>>>>>>> RemoteStreamEnvironment
> >> >> >>>>>>>>>>>>>>>>> [1]
> >> >> >>>>>>>>>>>>>>>>>> and that was more difficult than it should be.
> >> >> >> The
> >> >> >>>>> main
> >> >> >>>>>>>> issue
> >> >> >>>>>>>>>>> is
> >> >> >>>>>>>>>>>>> that
> >> >> >>>>>>>>>>>>>>>>>> environment and cluster client aren't decoupled.
> >> >> >>>>> Ideally
> >> >> >>>>>>> it
> >> >> >>>>>>>>>>> should
> >> >> >>>>>>>>>>>>> be
> >> >> >>>>>>>>>>>>>>>>>> possible to just get the matching cluster client
> >> >> >>>> from
> >> >> >>>>>> the
> >> >> >>>>>>>>>>>>> environment
> >> >> >>>>>>>>>>>>>>>> and
> >> >> >>>>>>>>>>>>>>>>>> then control the job through it (environment as
> >> >> >>>>> factory
> >> >> >>>>>>> for
> >> >> >>>>>>>>>>>> cluster
> >> >> >>>>>>>>>>>>>>>>>> client). But note that the environment classes
> >> >> >> are
> >> >> >>>>> part
> >> >> >>>>>> of
> >> >> >>>>>>>> the
> >> >> >>>>>>>>>>>>> public
> >> >> >>>>>>>>>>>>>>>>> API,
> >> >> >>>>>>>>>>>>>>>>>> and it is not straightforward to make larger
> >> >> >>> changes
> >> >> >>>>>>> without
> >> >> >>>>>>>>>>>>> breaking
> >> >> >>>>>>>>>>>>>>>>>> backward compatibility.
> >> >> >>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>> ClusterClient currently exposes internal classes
> >> >> >>>> like
> >> >> >>>>>>>>>>> JobGraph and
> >> >> >>>>>>>>>>>>>>>>>> StreamGraph. But it should be possible to wrap
> >> >> >>> this
> >> >> >>>>>> with a
> >> >> >>>>>>>> new
> >> >> >>>>>>>>>>>>> public
> >> >> >>>>>>>>>>>>>>>> API
> >> >> >>>>>>>>>>>>>>>>>> that brings the required job control
> >> >> >> capabilities
> >> >> >>>> for
> >> >> >>>>>>>>>>> downstream
> >> >> >>>>>>>>>>>>>>>>> projects.
> >> >> >>>>>>>>>>>>>>>>>> Perhaps it is helpful to look at some of the
> >> >> >>>>> interfaces
> >> >> >>>>>> in
> >> >> >>>>>>>>>>> Beam
> >> >> >>>>>>>>>>>>> while
> >> >> >>>>>>>>>>>>>>>>>> thinking about this: [2] for the portable job
> >> >> >> API
> >> >> >>>> and
> >> >> >>>>>> [3]
> >> >> >>>>>>>> for
> >> >> >>>>>>>>>>> the
> >> >> >>>>>>>>>>>>> old
> >> >> >>>>>>>>>>>>>>>>>> asynchronous job control from the Beam Java SDK.
> >> >> >>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>> The backward compatibility discussion [4] is
> >> >> >> also
> >> >> >>>>>> relevant
> >> >> >>>>>>>>>>> here. A
> >> >> >>>>>>>>>>>>> new
> >> >> >>>>>>>>>>>>>>>>> API
> >> >> >>>>>>>>>>>>>>>>>> should shield downstream projects from internals
> >> >> >>> and
> >> >> >>>>>> allow
> >> >> >>>>>>>>>>> them to
> >> >> >>>>>>>>>>>>>>>>>> interoperate with multiple future Flink versions
> >> >> >>> in
> >> >> >>>>> the
> >> >> >>>>>>> same
> >> >> >>>>>>>>>>>> release
> >> >> >>>>>>>>>>>>>>>> line
> >> >> >>>>>>>>>>>>>>>>>> without forced upgrades.
> >> >> >>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>> Thanks,
> >> >> >>>>>>>>>>>>>>>>>> Thomas
> >> >> >>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>> [1] https://github.com/apache/flink/pull/7249
> >> >> >>>>>>>>>>>>>>>>>> [2]
> >> >> >>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>
> >> >> >>>>>>>
> >> >> >>>>>>
> >> >> >>>>>
> >> >> >>>>
> >> >> >>>
> >> >> >>
> >> >>
> >>
> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> >> >> >>>>>>>>>>>>>>>>>> [3]
> >> >> >>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>
> >> >> >>>>>>>
> >> >> >>>>>>
> >> >> >>>>>
> >> >> >>>>
> >> >> >>>
> >> >> >>
> >> >>
> >>
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> >> >> >>>>>>>>>>>>>>>>>> [4]
> >> >> >>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>
> >> >> >>>>>>>>>>>
> >> >> >>>>>>>>
> >> >> >>>>>>>
> >> >> >>>>>>
> >> >> >>>>>
> >> >> >>>>
> >> >> >>>
> >> >> >>
> >> >>
> >>
> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> >> >> >>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <
> >> >> >>>>>>>> zjffdu@gmail.com>
> >> >> >>>>>>>>>>>>> wrote:
> >> >> >>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>>>>>> I'm not so sure whether the user should be
> >> >> >>> able
> >> >> >>>> to
> >> >> >>>>>>>> define
> >> >> >>>>>>>>>>>> where
> >> >> >>>>>>>>>>>>>>>> the
> >> >> >>>>>>>>>>>>>>>>>> job
> >> >> >>>>>>>>>>>>>>>>>>> runs (in your example Yarn). This is actually
> >> >> >>>>>> independent
> >> >> >>>>>>>> of
> >> >> >>>>>>>>>>> the
> >> >> >>>>>>>>>>>>> job
> >> >> >>>>>>>>>>>>>>>>>>> development and is something which is decided
> >> >> >> at
> >> >> >>>>>>> deployment
> >> >> >>>>>>>>>>> time.
> >> >> >>>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>>> User don't need to specify execution mode
> >> >> >>>>>>> programmatically.
> >> >> >>>>>>>>>>> They
> >> >> >>>>>>>>>>>>> can
> >> >> >>>>>>>>>>>>>>>>> also
> >> >> >>>>>>>>>>>>>>>>>>> pass the execution mode from the arguments in
> >> >> >>> flink
> >> >> >>>>> run
> >> >> >>>>>>>>>>> command.
> >> >> >>>>>>>>>>>>> e.g.
> >> >> >>>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m yarn-cluster ....
> >> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m local ...
> >> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m host:port ...
> >> >> >>>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>>> Does this make sense to you ?
> >> >> >>>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>>>>>> To me it makes sense that the
> >> >> >>>> ExecutionEnvironment
> >> >> >>>>>> is
> >> >> >>>>>>>> not
> >> >> >>>>>>>>>>>>>>>> directly
> >> >> >>>>>>>>>>>>>>>>>>> initialized by the user and instead context
> >> >> >>>> sensitive
> >> >> >>>>>> how
> >> >> >>>>>>>> you
> >> >> >>>>>>>>>>>> want
> >> >> >>>>>>>>>>>>> to
> >> >> >>>>>>>>>>>>>>>>>>> execute your job (Flink CLI vs. IDE, for
> >> >> >>> example).
> >> >> >>>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>>> Right, currently I notice Flink would create
> >> >> >>>>> different
> >> >> >>>>>>>>>>>>>>>>>>> ContextExecutionEnvironment based on different
> >> >> >>>>>> submission
> >> >> >>>>>>>>>>>> scenarios
> >> >> >>>>>>>>>>>>>>>>>> (Flink
> >> >> >>>>>>>>>>>>>>>>>>> Cli vs IDE). To me this is kind of hack
> >> >> >> approach,
> >> >> >>>> not
> >> >> >>>>>> so
> >> >> >>>>>>>>>>>>>>>>> straightforward.
> >> >> >>>>>>>>>>>>>>>>>>> What I suggested above is that Flink
> >> >> >>> should
> >> >> >>>>>>> always
> >> >> >>>>>>>>>>> create
> >> >> >>>>>>>>>>>>> the
> >> >> >>>>>>>>>>>>>>>>>> same
> >> >> >>>>>>>>>>>>>>>>>>> ExecutionEnvironment but with different
> >> >> >>>>> configuration,
> >> >> >>>>>>> and
> >> >> >>>>>>>>>>> based
> >> >> >>>>>>>>>>>> on
> >> >> >>>>>>>>>>>>>>>> the
> >> >> >>>>>>>>>>>>>>>>>>> configuration it would create the proper
> >> >> >>>>> ClusterClient
> >> >> >>>>>>> for
> >> >> >>>>>>>>>>>>> different
> >> >> >>>>>>>>>>>>>>>>>>> behaviors.
> >> >> >>>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>>> Till Rohrmann <tr...@apache.org>
> >> >> >>>> 于2018年12月20日周四
> >> >> >>>>>>>>>>> 下午11:18写道:
> >> >> >>>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>>>> You are probably right that we have code
> >> >> >>>> duplication
> >> >> >>>>>>> when
> >> >> >>>>>>>> it
> >> >> >>>>>>>>>>>> comes
> >> >> >>>>>>>>>>>>>>>> to
> >> >> >>>>>>>>>>>>>>>>>> the
> >> >> >>>>>>>>>>>>>>>>>>>> creation of the ClusterClient. This should be
> >> >> >>>>> reduced
> >> >> >>>>>> in
> >> >> >>>>>>>> the
> >> >> >>>>>>>>>>>>>>>> future.
> >> >> >>>>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>>>> I'm not so sure whether the user should be
> >> >> >> able
> >> >> >>> to
> >> >> >>>>>>> define
> >> >> >>>>>>>>>>> where
> >> >> >>>>>>>>>>>>> the
> >> >> >>>>>>>>>>>>>>>>> job
> >> >> >>>>>>>>>>>>>>>>>>>> runs (in your example Yarn). This is actually
> >> >> >>>>>>> independent
> >> >> >>>>>>>>>>> of the
> >> >> >>>>>>>>>>>>>>>> job
> >> >> >>>>>>>>>>>>>>>>>>>> development and is something which is decided
> >> >> >> at
> >> >> >>>>>>>> deployment
> >> >> >>>>>>>>>>>> time.
> >> >> >>>>>>>>>>>>>>>> To
> >> >> >>>>>>>>>>>>>>>>> me
> >> >> >>>>>>>>>>>>>>>>>>> it
> >> >> >>>>>>>>>>>>>>>>>>>> makes sense that the ExecutionEnvironment is
> >> >> >> not
> >> >> >>>>>>> directly
> >> >> >>>>>>>>>>>>>>>> initialized
> >> >> >>>>>>>>>>>>>>>>>> by
> >> >> >>>>>>>>>>>>>>>>>>>> the user and instead context sensitive how you
> >> >> >>>> want
> >> >> >>>>> to
> >> >> >>>>>>>>>>> execute
> >> >> >>>>>>>>>>>>> your
> >> >> >>>>>>>>>>>>>>>>> job
> >> >> >>>>>>>>>>>>>>>>>>>> (Flink CLI vs. IDE, for example). However, I
> >> >> >>> agree
> >> >> >>>>>> that
> >> >> >>>>>>>> the
> >> >> >>>>>>>>>>>>>>>>>>>> ExecutionEnvironment should give you access to
> >> >> >>> the
> >> >> >>>>>>>>>>> ClusterClient
> >> >> >>>>>>>>>>>>>>>> and
> >> >> >>>>>>>>>>>>>>>>> to
> >> >> >>>>>>>>>>>>>>>>>>> the
> >> >> >>>>>>>>>>>>>>>>>>>> job (maybe in the form of the JobGraph or a
> >> >> >> job
> >> >> >>>>> plan).
> >> >> >>>>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>>>> Cheers,
> >> >> >>>>>>>>>>>>>>>>>>>> Till
> >> >> >>>>>>>>>>>>>>>>>>>>
> >> >> >>>>>>>>>>>>>>>>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <
> Jeff Zhang <zjffdu@gmail.com> wrote:
>
> > Hi Till,
> > Thanks for the feedback. You are right that I expect a better
> > programmatic job submission/control api which could be used by
> > downstream projects, and it would benefit the flink ecosystem. When I
> > look at the code of the flink scala-shell and sql-client (I believe
> > they are not the core of flink, but belong to the ecosystem of flink),
> > I find a lot of duplicated code for creating a ClusterClient from
> > user-provided configuration (the configuration format may differ
> > between scala-shell and sql-client) and then using that ClusterClient
> > to manipulate jobs. I don't think this is convenient for downstream
> > projects. What I expect is that a downstream project only needs to
> > provide the necessary configuration info (maybe introducing a class
> > FlinkConf), then build an ExecutionEnvironment based on this
> > FlinkConf, and the ExecutionEnvironment will create the proper
> > ClusterClient. This would not only benefit downstream project
> > development but also be helpful for their integration tests with
> > flink. Here's one sample code snippet that I expect:
> >
> > val conf = new FlinkConf().mode("yarn")
> > val env = new ExecutionEnvironment(conf)
> > val jobId = env.submit(...)
> > val jobStatus = env.getClusterClient().queryJobStatus(jobId)
> > env.getClusterClient().cancelJob(jobId)
> >
> > What do you think?
> >
> > Till Rohrmann <tr...@apache.org> wrote on Tue, Dec 11, 2018 at
> > 6:28 PM:
> >
> > > Hi Jeff,
> > >
> > > what you are proposing is to provide the user with better
> > > programmatic job control. There was actually an effort to achieve
> > > this but it has never been completed [1]. However, there are some
> > > improvements in the code base now. Look for example at the
> > > NewClusterClient interface, which offers a non-blocking job
> > > submission. But I agree that we need to improve Flink in this
> > > regard.
> > >
> > > I would not be in favour of exposing all ClusterClient calls via the
> > > ExecutionEnvironment because it would clutter the class and would
> > > not be a good separation of concerns. Instead, one idea could be to
> > > retrieve the current ClusterClient from the ExecutionEnvironment,
> > > which can then be used for cluster and job control. But before we
> > > start an effort here, we need to agree on and capture what
> > > functionality we want to provide.
> > >
> > > Initially, the idea was that we have the ClusterDescriptor
> > > describing how to talk to a cluster manager like Yarn or Mesos. The
> > > ClusterDescriptor can be used for deploying Flink clusters (job and
> > > session) and gives you a ClusterClient. The ClusterClient controls
> > > the cluster (e.g. submitting jobs, listing all running jobs). And
> > > then there was the idea to introduce a JobClient, which you obtain
> > > from the ClusterClient, to trigger job-specific operations (e.g.
> > > taking a savepoint, cancelling the job).
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-4272
> > >
> > > Cheers,
> > > Till
> > >
> > > On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zjffdu@gmail.com>
> > > wrote:
> > >
> > > > Hi Folks,
> > > >
> > > > I am trying to integrate flink into apache zeppelin, which is an
> > > > interactive notebook, and I hit several issues that are caused by
> > > > the flink client api. So I'd like to propose the following
> > > > changes to the flink client api.
> > > >
> > > > 1. Support non-blocking execution. Currently,
> > > > ExecutionEnvironment#execute is a blocking method which does two
> > > > things: first submit the job, then wait for the job until it is
> > > > finished. I'd like to introduce a non-blocking execution method
> > > > like ExecutionEnvironment#submit which only submits the job and
> > > > then returns the jobId to the client, allowing the user to query
> > > > the job status via the jobId.
> > > >
> > > > 2. Add a cancel api in
> > > > ExecutionEnvironment/StreamExecutionEnvironment. Currently the
> > > > only way to cancel a job is via the cli (bin/flink), which is not
> > > > convenient for downstream projects. So I'd like to add a cancel
> > > > api in ExecutionEnvironment.
> > > >
> > > > 3. Add a savepoint api in
> > > > ExecutionEnvironment/StreamExecutionEnvironment. It is similar to
> > > > the cancel api; we should use ExecutionEnvironment as the unified
> > > > api for third parties to integrate with flink.
> > > >
> > > > 4. Add a listener for the job execution lifecycle, something like
> > > > the following, so that downstream projects can run custom logic
> > > > in the lifecycle of a job. E.g. Zeppelin would capture the jobId
> > > > after the job is submitted and then use this jobId to cancel the
> > > > job later when necessary.
> > > >
> > > > public interface JobListener {
> > > >
> > > >  void onJobSubmitted(JobID jobId);
> > > >
> > > >  void onJobExecuted(JobExecutionResult jobResult);
> > > >
> > > >  void onJobCanceled(JobID jobId);
> > > > }
> > > >
> > > > 5. Enable sessions in ExecutionEnvironment. Currently the session
> > > > feature is disabled, but sessions are very convenient for third
> > > > parties submitting jobs continually. I hope flink can enable it
> > > > again.
> > > >
> > > > 6. Unify all flink client apis into
> > > > ExecutionEnvironment/StreamExecutionEnvironment.
> > > >
> > > > This is a long-term issue which needs more careful thinking and
> > > > design. Currently some features of flink are exposed in
> > > > ExecutionEnvironment/StreamExecutionEnvironment, but some are
> > > > exposed in the cli instead of the api, like the cancel and
> > > > savepoint operations I mentioned above. I think the root cause is
> > > > that flink didn't unify the interaction with flink. Here I list 3
> > > > scenarios of flink operation:
> > > >
> > > >  - Local job execution. Flink will create a LocalEnvironment and
> > > >    then use this LocalEnvironment to create a LocalExecutor for
> > > >    job execution.
> > > >  - Remote job execution. Flink will create a ClusterClient first,
> > > >    then create a ContextEnvironment based on the ClusterClient,
> > > >    and then run the job.
> > > >  - Job cancellation. Flink will create a ClusterClient first and
> > > >    then cancel the job via this ClusterClient.
> > > >
> > > > As you can see in the above 3 scenarios, flink doesn't use the
> > > > same approach (code path) to interact with flink. What I propose
> > > > is the following:
> > > > Create the proper LocalEnvironment/RemoteEnvironment (based on
> > > > user configuration) --> use this environment to create the proper
> > > > ClusterClient (LocalClusterClient or RestClusterClient) to
> > > > interact with Flink (job execution or cancellation).
> > > >
> > > > This way we can unify the process of local execution and remote
> > > > execution. And it becomes much easier for third parties to
> > > > integrate with flink, because ExecutionEnvironment is the unified
> > > > entry point for flink. What a third party needs to do is just
> > > > pass configuration to ExecutionEnvironment, and
> > > > ExecutionEnvironment will do the right thing based on the
> > > > configuration. The flink cli can also be considered a flink api
> > > > consumer: it just passes the configuration to
> > > > ExecutionEnvironment and lets ExecutionEnvironment create the
> > > > proper ClusterClient, instead of letting the cli create the
> > > > ClusterClient directly.
> > > >
> > > > 6 would involve large code refactoring, so I think we can defer
> > > > it to a future release; 1, 2, 3, 4 and 5 could be done at once, I
> > > > believe. Let me know your comments and feedback, thanks.
> > > >
> > > > --
> > > > Best Regards
> > > >
> > > > Jeff Zhang
> >
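[Editor's note] The JobListener hook proposed in the quoted thread above could be used by a notebook frontend roughly as follows. This is a sketch, not Flink's actual API: the JobListener interface mirrors point 4 of the proposal, while the JobID/JobExecutionResult stand-ins and the capture helper are purely illustrative.

```java
import java.util.concurrent.atomic.AtomicReference;

public class JobListenerSketch {
    // Stand-ins for Flink's JobID and JobExecutionResult.
    static class JobID {
        final String id;
        JobID(String id) { this.id = id; }
    }
    static class JobExecutionResult { }

    // The listener interface from point 4 of the proposal above.
    interface JobListener {
        void onJobSubmitted(JobID jobId);
        void onJobExecuted(JobExecutionResult jobResult);
        void onJobCanceled(JobID jobId);
    }

    // Returns the job id a frontend would capture on submission, so it
    // can cancel the job later (e.g. when a user interrupts a paragraph).
    static String capturedJobId(String submittedId) {
        AtomicReference<String> captured = new AtomicReference<>();
        JobListener listener = new JobListener() {
            public void onJobSubmitted(JobID jobId) { captured.set(jobId.id); }
            public void onJobExecuted(JobExecutionResult jobResult) { }
            public void onJobCanceled(JobID jobId) { }
        };
        // Simulated submission: the environment would invoke the listener.
        listener.onJobSubmitted(new JobID(submittedId));
        return captured.get();
    }

    public static void main(String[] args) {
        System.out.println("captured: " + capturedJobId("a1b2c3"));
    }
}
```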

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Till Rohrmann <tr...@apache.org>.
I would not be in favour of getting rid of the per-job mode since it
simplifies the process of running Flink jobs considerably. Moreover, it is
not only well suited for container deployments but also for deployments
where you want to guarantee job isolation. For example, a user could use
the per-job mode on Yarn to execute his job on a separate cluster.

I think that having two notions of cluster deployments (session vs. per-job
mode) does not necessarily contradict your ideas for the client api
refactoring. For example one could have the following interfaces:

- ClusterDeploymentDescriptor: encapsulates the logic how to deploy a
cluster.
- ClusterClient: allows to interact with a cluster
- JobClient: allows to interact with a running job

Now the ClusterDeploymentDescriptor could have two methods:

- ClusterClient deploySessionCluster()
- JobClusterClient/JobClient deployPerJobCluster(JobGraph)

where JobClusterClient is either a supertype of ClusterClient which does
not offer the functionality to submit jobs, or deployPerJobCluster
directly returns a JobClient.

When setting up the ExecutionEnvironment, one would then not provide a
ClusterClient to submit jobs but a JobDeployer which, depending on the
selected mode, either uses a ClusterClient (session mode) to submit jobs or
a ClusterDeploymentDescriptor to deploy a per-job-mode cluster with the job
to execute.

These are just some thoughts on how one could make it work, because I
believe there is some value in using the per-job mode from the
ExecutionEnvironment.
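[Editor's note] The interface split Till describes above could be sketched roughly as follows. The interface and method names follow the discussion, while the in-memory stubs (JobGraph stand-in, InMemoryDescriptor) are purely illustrative and not Flink's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class ClientApiSketch {
    // Stand-in for Flink's JobGraph.
    static class JobGraph {
        final String name;
        JobGraph(String name) { this.name = name; }
    }

    // Allows interaction with a single running job.
    interface JobClient {
        String getJobId();
        void cancel();
    }

    // Allows interaction with a cluster (submit jobs, list jobs, ...).
    interface ClusterClient {
        JobClient submitJob(JobGraph jobGraph);
        List<String> listJobs();
    }

    // Encapsulates the logic of how to deploy a cluster; offers both
    // session-mode and per-job-mode deployment, as sketched above.
    interface ClusterDeploymentDescriptor {
        ClusterClient deploySessionCluster();
        JobClient deployPerJobCluster(JobGraph jobGraph);
    }

    // Minimal in-memory implementation, just to show how the pieces relate.
    static class InMemoryDescriptor implements ClusterDeploymentDescriptor {
        public ClusterClient deploySessionCluster() {
            List<String> jobs = new ArrayList<>();
            return new ClusterClient() {
                public JobClient submitJob(JobGraph g) {
                    jobs.add(g.name);
                    return jobClientFor(g);
                }
                public List<String> listJobs() { return jobs; }
            };
        }
        public JobClient deployPerJobCluster(JobGraph g) {
            // Per-job mode: the cluster comes up with exactly this job, so
            // the caller only ever sees a JobClient, not a ClusterClient.
            return jobClientFor(g);
        }
        private JobClient jobClientFor(JobGraph g) {
            return new JobClient() {
                public String getJobId() { return "job-" + g.name; }
                public void cancel() { /* no-op in this sketch */ }
            };
        }
    }

    public static void main(String[] args) {
        ClusterDeploymentDescriptor descriptor = new InMemoryDescriptor();

        // Session mode: deploy once, then submit through the ClusterClient.
        ClusterClient session = descriptor.deploySessionCluster();
        JobClient a = session.submitJob(new JobGraph("wordcount"));

        // Per-job mode: deployment and submission collapse into one step.
        JobClient b = descriptor.deployPerJobCluster(new JobGraph("pagerank"));

        System.out.println(a.getJobId() + " / " + b.getJobId());
    }
}
```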

Concerning the web submission, this is indeed a bit tricky. From a cluster
management standpoint, I would be in favour of not executing user code on
the REST endpoint. Especially when considering security, it would be good
to have well-defined cluster behaviour where it is explicitly stated where
user code and, thus, potentially risky code is executed. Ideally we limit
it to the TaskExecutor and JobMaster.

Cheers,
Till

On Tue, Aug 20, 2019 at 9:40 AM Flavio Pompermaier <po...@okkam.it>
wrote:

> In my opinion the client should not use any environment to get the job
> graph, because the jar should reside ONLY on the cluster (and not on the
> client classpath; otherwise there are always inconsistencies between the
> client's and the Flink Job Manager's classpaths).
> In the YARN, Mesos and Kubernetes scenarios you have the jar, but you could
> also start a cluster that has the jar on the Job Manager (this is the only
> case where I think you can assume that the client has the jar on the
> classpath... in the REST job submission you don't have any classpath).
>
> Thus, always in my opinion, the JobGraph should be generated by the Job
> Manager REST API.
>
>
> On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <wa...@gmail.com> wrote:
>
>> I would like to involve Till & Stephan here to clarify some concepts of
>> per-job mode.
>>
>> The term per-job refers to one of the modes a cluster can run in. It is
>> mainly aimed at spawning a dedicated cluster for a specific job: the job
>> can be packaged with Flink itself so that the cluster is initialized with
>> the job, getting rid of a separate submission step.
>>
>> This is useful for container deployments, where one creates an image with
>> the job and then simply deploys the container.
>>
>> However, it is out of the client's scope, since a client (ClusterClient,
>> for example) is for communicating with an existing cluster and performing
>> actions on it. Currently, in per-job mode, we extract the job graph and
>> bundle it into the cluster deployment, so no concept of a client is
>> involved. It therefore looks reasonable to exclude the deployment of
>> per-job clusters from the client api and use dedicated utility classes
>> (deployers) for deployment.
>>
>> Zili Chen <wa...@gmail.com> wrote on Tue, Aug 20, 2019 at 12:37 PM:
>>
>> > Hi Aljoscha,
>> >
>> > Thanks for your reply and participation. The Google Doc you linked to
>> > requires permission; I think you could use a share link instead.
>> >
>> > I agree that we have almost reached a consensus that a JobClient is
>> > necessary to interact with a running job.
>> >
>> > Let me check your open questions one by one.
>> >
>> > 1. Separate cluster creation and job submission for per-job mode.
>> >
>> > As you mentioned, here is where the opinions diverge. In my document
>> > there is an alternative[2] that proposes excluding per-job deployment
>> > from the client api scope, and I now find it more reasonable to do the
>> > exclusion.
>> >
>> > In per-job mode, a dedicated JobCluster is launched to execute the
>> > specific job. It is more like a Flink application than a submission of
>> > a Flink job. The client only takes care of job submission and assumes
>> > there is an existing cluster. In this way we can consider per-job
>> > issues individually, and JobClusterEntrypoint would be the utility
>> > class for per-job deployment.
>> >
>> > Nevertheless, a user program works in both session mode and per-job
>> > mode without code changes. The JobClient in per-job mode is returned
>> > from env.execute as normal. However, it would no longer be a wrapper
>> > of RestClusterClient but a wrapper of PerJobClusterClient, which
>> > communicates with the Dispatcher locally.
>> >
>> > 2. How to deal with plan preview.
>> >
>> > With env.compile functions, users can get a JobGraph or FlinkPlan and
>> > thus preview the plan programmatically. Typically it looks like
>> >
>> > if (preview configured) {
>> >     FlinkPlan plan = env.compile();
>> >     new JSONDumpGenerator(...).dump(plan);
>> > } else {
>> >     env.execute();
>> > }
>> >
>> > And `flink info` would become obsolete.
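[Editor's note] The preview branch sketched above can be made concrete with a small stand-alone sketch. Note that env.compile(), FlinkPlan, and JSONDumpGenerator here are illustrative stand-ins; the real classes live in Flink's client/optimizer modules and look different.

```java
public class PlanPreviewSketch {
    // Stand-in for a compiled plan; the real FlinkPlan is far richer.
    static class FlinkPlan {
        final String description;
        FlinkPlan(String description) { this.description = description; }
    }

    // Stand-in for env.compile(): build the plan without executing it.
    static FlinkPlan compile() {
        return new FlinkPlan("source -> map -> sink");
    }

    // Stand-in for JSONDumpGenerator: render the plan for preview.
    static String dumpAsJson(FlinkPlan plan) {
        return "{\"plan\": \"" + plan.description + "\"}";
    }

    public static void main(String[] args) {
        boolean previewConfigured = true;
        if (previewConfigured) {
            // Preview path: compile and dump, no job is submitted.
            System.out.println(dumpAsJson(compile()));
        } else {
            // Execution path: env.execute() would run the job instead.
        }
    }
}
```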
>> >
>> > 3. How to deal with Jar Submission at the Web Frontend.
>> >
>> > There is one more thread discussing this topic[1]. Apart from removing
>> > the functions, there are two alternatives.
>> >
>> > One is to introduce an interface with a method that returns a
>> > JobGraph/FlinkPlan, and have Jar Submission only support main classes
>> > that implement this interface. The JobGraph/FlinkPlan is then
>> > extracted just by calling the method. In this way, it is even possible
>> > to consider separating job creation from job submission.
>> >
>> > The other is, as you mentioned, to let execute() do the actual
>> > execution. We wouldn't execute the main method in the WebFrontend but
>> > spawn a process on the WebMonitor side to execute it. As for the
>> > return value, we could generate the JobID in the WebMonitor and pass
>> > it to the execution environment.
>> >
>> > 4. How to deal with detached mode.
>> >
>> > I think detached mode is a temporary solution for non-blocking
>> > submission. In my document, both submission and execution return a
>> > CompletableFuture, and users control whether or not to wait for the
>> > result. From this point of view we don't need a detached option, but
>> > the functionality is still covered.
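[Editor's note] The idea that a future-returning submission subsumes the detached/attached distinction can be shown with a small sketch. The submit method and the JobHandle result type below are illustrative stand-ins, not Flink's actual api.

```java
import java.util.concurrent.CompletableFuture;

public class DetachedModeSketch {
    // Stand-in for a job handle; the real JobClient/JobExecutionResult
    // would carry much more information.
    static class JobHandle {
        final String jobId;
        JobHandle(String jobId) { this.jobId = jobId; }
    }

    // Non-blocking submission: returns immediately with a future.
    static CompletableFuture<JobHandle> submit(String jobName) {
        return CompletableFuture.supplyAsync(() -> new JobHandle("job-" + jobName));
    }

    public static void main(String[] args) {
        CompletableFuture<JobHandle> pending = submit("wordcount");

        // "Detached" behaviour: keep the future and return immediately.
        // "Attached" behaviour: block on the future until completion.
        JobHandle handle = pending.join();
        System.out.println(handle.jobId); // prints "job-wordcount"
    }
}
```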
>> >
>> > 5. How does per-job mode interact with interactive programming?
>> >
>> > The YARN, Mesos and Kubernetes scenarios all follow the pattern of
>> > launching a JobCluster now, and I don't think there would be any
>> > inconsistency between the different resource managers.
>> >
>> > Best,
>> > tison.
>> >
>> > [1]
>> > https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
>> > [2]
>> > https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
>> >
>> > Aljoscha Krettek <al...@apache.org> wrote on Fri, Aug 16, 2019 at 9:20 PM:
>> >
>> >> Hi,
>> >>
>> >> I read both Jeff's initial design document and the newer document by
>> >> Tison. I also finally found the time to collect our thoughts on the
>> >> issue; I had quite some discussions with Kostas, and this is the
>> >> result: [1].
>> >>
>> >> I think overall we agree that this part of the code is in dire need
>> >> of some refactoring/improvements, but I think there are still some
>> >> open questions and some differences of opinion about what those
>> >> refactorings should look like.
>> >>
>> >> I think the API side is quite clear, i.e. we need some JobClient API
>> >> that allows interacting with a running job. It could be worthwhile to
>> >> spin that off into a separate FLIP, because we can probably find
>> >> consensus on that part more easily.
>> >>
>> >> For the rest, the main open questions from our doc are these:
>> >>
>> >>   - Do we want to separate cluster creation and job submission for
>> >> per-job mode? In the past, there were conscious efforts to *not*
>> >> separate job submission from cluster creation for per-job clusters on
>> >> Mesos, YARN, and Kubernetes (see StandaloneJobClusterEntryPoint).
>> >> Tison suggests in his design document to decouple these in order to
>> >> unify job submission.
>> >>
>> >>   - How to deal with plan preview, which needs to hijack execute() and
>> >> let the outside code catch an exception?
>> >>
>> >>   - How to deal with Jar Submission at the Web Frontend, which needs
>> >> to hijack execute() and let the outside code catch an exception?
>> >> CliFrontend.run() “hijacks” ExecutionEnvironment.execute() to get a
>> >> JobGraph and then execute that JobGraph manually. We could get around
>> >> that by letting execute() do the actual execution. One caveat for
>> >> this is that now the main() method doesn’t return (or is forced to
>> >> return by throwing an exception from execute()), which means that for
>> >> Jar Submission from the WebFrontend we have a long-running main()
>> >> method running in the WebFrontend. This doesn’t sound very good. We
>> >> could get around this by removing the plan preview feature and by
>> >> removing Jar Submission/Running.
>> >>
>> >>   - How to deal with detached mode? Right now, DetachedEnvironment
>> >> will execute the job and return immediately. If users control when
>> >> they want to return, by waiting on the job completion future, how do
>> >> we deal with this? Do we simply remove the distinction between
>> >> detached and non-detached?
>> >>
>> >>   - How does per-job mode interact with “interactive programming”
>> >> (FLIP-36). For YARN, each execute() call could spawn a new Flink YARN
>> >> cluster. What about Mesos and Kubernetes?
>> >>
>> >> The first open question is where the opinions diverge, I think. The
>> >> rest are just open questions and interesting things that we need to
>> >> consider.
>> >>
>> >> Best,
>> >> Aljoscha
>> >>
>> >> [1]
>> >> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
>> >>
>> >> > On 31. Jul 2019, at 15:23, Jeff Zhang <zj...@gmail.com> wrote:
>> >> >
>> >> > Thanks tison for the effort. I left a few comments.
>> >> >
>> >> >
>> >> > Zili Chen <wa...@gmail.com> wrote on Wed, Jul 31, 2019 at 8:24 PM:
>> >> >
>> >> >> Hi Flavio,
>> >> >>
>> >> >> Thanks for your reply.
>> >> >>
>> >> >> In both the current implementation and the design, ClusterClient
>> >> >> never takes responsibility for generating the JobGraph
>> >> >> (what you see in the current codebase are several class methods).
>> >> >> Instead, the user describes his program in the main method
>> >> >> with ExecutionEnvironment apis and calls env.compile()
>> >> >> or env.optimize() to get the FlinkPlan and JobGraph respectively.
>> >> >>
>> >> >> As for listing the main classes in a jar and choosing one for
>> >> >> submission, you are already able to customize a CLI to do it.
>> >> >> Specifically, the path of the jar is passed as an argument, and
>> >> >> in the customized CLI you list the main classes and choose one
>> >> >> to submit to the cluster.
>> >> >>
>> >> >> Best,
>> >> >> tison.
>> >> >>
>> >> >>
>> >> >> Flavio Pompermaier <po...@okkam.it> wrote on Wed, Jul 31, 2019 at 8:12 PM:
>> >> >>
>> >> >>> Just one note on my side: it is not clear to me whether the client
>> >> >>> needs to be able to generate a job graph or not.
>> >> >>> In my opinion, the job jar must reside only on the
>> >> >>> server/jobManager side,
>> >> >>> and the client requires a way to get the job graph.
>> >> >>> If you really want access to the job graph, I'd add dedicated
>> >> >>> methods on the ClusterClient, like:
>> >> >>>
>> >> >>>   - getJobGraph(jarId, mainClass): JobGraph
>> >> >>>   - listMainClasses(jarId): List<String>
>> >> >>>
>> >> >>> These would also require some additions on the job manager
>> >> >>> endpoint as well... what do you think?
>> >> >>>
>> >> >>> On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <wa...@gmail.com>
>> >> wrote:
>> >> >>>
>> >> >>>> Hi all,
>> >> >>>>
>> >> >>>> Here is a document[1] on client api enhancement from our
>> >> >>>> perspective. We have investigated the current implementations,
>> >> >>>> and we propose:
>> >> >>>>
>> >> >>>> 1. Unify the implementation of cluster deployment and job
>> >> >>>> submission in Flink.
>> >> >>>> 2. Provide programmatic interfaces to allow flexible job and
>> >> >>>> cluster management.
>> >> >>>>
>> >> >>>> The first proposal is aimed at reducing the code paths of cluster
>> >> >>>> deployment and job submission so that one can adopt Flink easily.
>> >> >>>> The second proposal is aimed at providing rich interfaces for
>> >> >>>> advanced users who want fine-grained control of these stages.
>> >> >>>>
>> >> >>>> Quick reference on open questions:
>> >> >>>>
>> >> >>>> 1. Exclude job cluster deployment from the client side, or
>> >> >>>> redefine the semantics of a job cluster? It fits in a process
>> >> >>>> quite different from session cluster deployment and job
>> >> >>>> submission.
>> >> >>>>
>> >> >>>> 2. Maintain the codepaths handling class o.a.f.api.common.Program
>> or
>> >> >>>> implement customized program handling logic by customized
>> >> CliFrontend?
>> >> >>>> See also this thread[2] and the document[1].
>> >> >>>>
>> >> >>>> 3. Expose ClusterClient as a public api, or just expose the api in
>> >> >>>> ExecutionEnvironment and delegate to ClusterClient? Further, either
>> >> >>>> way, is it worth introducing a JobClient, an encapsulation of
>> >> >>>> ClusterClient associated with a specific job?
>> >> >>>>
>> >> >>>> Best,
>> >> >>>> tison.
>> >> >>>>
>> >> >>>> [1]
>> >> >>>>
>> >> >>>>
>> >> >>>
>> >> >>
>> >>
>> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
>> >> >>>> [2]
>> >> >>>>
>> >> >>>>
>> >> >>>
>> >> >>
>> >>
>> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
>> >> >>>>
>> >> >>>> Jeff Zhang <zj...@gmail.com> 于2019年7月24日周三 上午9:19写道:
>> >> >>>>
>> >> >>>>> Thanks Stephan, I will follow up on this issue in the next few
>> >> >>>>> weeks and will refine the design doc. We can discuss more details
>> >> >>>>> after the 1.9 release.
>> >> >>>>>
>> >> >>>>> Stephan Ewen <se...@apache.org> 于2019年7月24日周三 上午12:58写道:
>> >> >>>>>
>> >> >>>>>> Hi all!
>> >> >>>>>>
>> >> >>>>>> This thread has stalled for a bit, which I assume is mostly due
>> >> >>>>>> to the Flink 1.9 feature freeze and release testing effort.
>> >> >>>>>>
>> >> >>>>>> I personally still recognize this issue as one important to be
>> >> >>> solved.
>> >> >>>>> I'd
>> >> >>>>>> be happy to help resume this discussion soon (after the 1.9
>> >> >> release)
>> >> >>>> and
>> >> >>>>>> see if we can do some step towards this in Flink 1.10.
>> >> >>>>>>
>> >> >>>>>> Best,
>> >> >>>>>> Stephan
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <
>> >> >>>>> pompermaier@okkam.it>
>> >> >>>>>> wrote:
>> >> >>>>>>
>> >> >>>>>>> That's exactly what I suggested a long time ago: the Flink REST
>> >> >>>>>>> client should not require any Flink dependency, only an http
>> >> >>>>>>> library to call the REST services to submit and monitor a job.
>> >> >>>>>>> What I also suggested in [1] was to have a way to automatically
>> >> >>>>>>> suggest to the user (via a UI) the available main classes and
>> >> >>>>>>> their required parameters [2].
>> >> >>>>>>> Another problem we have with Flink is that the REST client and
>> >> >>>>>>> the CLI one behave differently, and we use the CLI client (via
>> >> >>>>>>> ssh) because it allows calling other methods after env.execute()
>> >> >>>>>>> [3] (we have to call another REST service to signal the end of
>> >> >>>>>>> the job).
>> >> >>>>>>> In this regard, a dedicated interface, like the JobListener
>> >> >>>>>>> suggested in the previous emails, would be very helpful (IMHO).
>> >> >>>>>>>
>> >> >>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-10864
>> >> >>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-10862
>> >> >>>>>>> [3] https://issues.apache.org/jira/browse/FLINK-10879
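A dedicated listener of the kind Flavio mentions could look like the sketch below. The interface, its method names, and the `Env` stand-in are hypothetical, not the actual FLINK-10879 proposal.

```java
import java.util.ArrayList;
import java.util.List;

public class JobListenerSketch {

    // Hypothetical callback interface: notified around job execution so that
    // callers no longer need code after env.execute() to signal job completion.
    public interface JobListener {
        void onJobSubmitted(String jobId);
        void onJobFinished(String jobId, boolean success);
    }

    // Minimal environment stand-in that drives the callbacks.
    public static class Env {
        private final List<JobListener> listeners = new ArrayList<>();

        public void registerJobListener(JobListener l) { listeners.add(l); }

        public String execute() {
            String jobId = "job-1"; // a real env would obtain this from the cluster
            listeners.forEach(l -> l.onJobSubmitted(jobId));
            // ... job runs ...
            listeners.forEach(l -> l.onJobFinished(jobId, true));
            return jobId;
        }
    }

    public static void main(String[] args) {
        Env env = new Env();
        env.registerJobListener(new JobListener() {
            public void onJobSubmitted(String jobId) {
                System.out.println("submitted " + jobId);
            }
            public void onJobFinished(String jobId, boolean success) {
                // e.g. call the external REST service that signals the end of the job
                System.out.println("finished " + jobId + " success=" + success);
            }
        });
        env.execute();
    }
}
```

The point is that post-execution actions move into a callback the environment owns, so the CLI and REST paths can behave identically.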
>> >> >>>>>>>
>> >> >>>>>>> Best,
>> >> >>>>>>> Flavio
>> >> >>>>>>>
>> >> >>>>>>> On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zj...@gmail.com>
>> >> >>> wrote:
>> >> >>>>>>>
>> >> >>>>>>>> Hi, Tison,
>> >> >>>>>>>>
>> >> >>>>>>>> Thanks for your comments. Overall I agree with you that it is
>> >> >>>>>>>> difficult for downstream projects to integrate with flink and
>> >> >>>>>>>> we need to refactor the current flink client api.
>> >> >>>>>>>> And I agree that CliFrontend should only parse command line
>> >> >>>>>>>> arguments and then pass them to ExecutionEnvironment. It is
>> >> >>>>>>>> ExecutionEnvironment's responsibility to compile the job,
>> >> >>>>>>>> create the cluster, and submit the job. Besides that, flink
>> >> >>>>>>>> currently has many ExecutionEnvironment implementations, and
>> >> >>>>>>>> flink will use a specific one based on the context. IMHO, that
>> >> >>>>>>>> is not necessary: ExecutionEnvironment should be able to do the
>> >> >>>>>>>> right thing based on the FlinkConf it receives. Having too many
>> >> >>>>>>>> ExecutionEnvironment implementations is another burden for
>> >> >>>>>>>> downstream project integration.
>> >> >>>>>>>>
>> >> >>>>>>>> One thing I'd like to mention is flink's scala shell and sql
>> >> >>>>>>>> client: although they are sub-modules of flink, they could be
>> >> >>>>>>>> treated as downstream projects which use flink's client api.
>> >> >>>>>>>> Currently you will find it is not easy for them to integrate
>> >> >>>>>>>> with flink, and they share a lot of duplicated code. It is
>> >> >>>>>>>> another sign that we should refactor the flink client api.
>> >> >>>>>>>>
>> >> >>>>>>>> I believe it is a large and hard change, and I am afraid we
>> >> >>>>>>>> cannot keep compatibility since many of the changes are
>> >> >>>>>>>> user-facing.
>> >> >>>>>>>>
>> >> >>>>>>>>
>> >> >>>>>>>>
>> >> >>>>>>>> Zili Chen <wa...@gmail.com> 于2019年6月24日周一 下午2:53写道:
>> >> >>>>>>>>
>> >> >>>>>>>>> Hi all,
>> >> >>>>>>>>>
>> >> >>>>>>>>> After a closer look at our client apis, I can see there are
>> >> >>>>>>>>> two major issues with consistency and integration: the
>> >> >>>>>>>>> deployment of a job cluster, which couples job graph creation
>> >> >>>>>>>>> and cluster deployment, and submission via CliFrontend, which
>> >> >>>>>>>>> confuses the control flow of job graph compilation and job
>> >> >>>>>>>>> submission. I'd like to follow the discussion above, mainly
>> >> >>>>>>>>> the process described by Jeff and Stephan, and share my ideas
>> >> >>>>>>>>> on these issues.
>> >> >>>>>>>>>
>> >> >>>>>>>>> 1) CliFrontend confuses the control flow of job compilation
>> >> >>>>>>>>> and submission. Following the process of job submission
>> >> >>>>>>>>> Stephan and Jeff described, the execution environment knows
>> >> >>>>>>>>> all configs of the cluster and the topology/settings of the
>> >> >>>>>>>>> job. Ideally, the main method of the user program calls
>> >> >>>>>>>>> #execute (or a method named #submit), and Flink deploys the
>> >> >>>>>>>>> cluster, compiles the job graph and submits it to the cluster.
>> >> >>>>>>>>> However, the current CliFrontend does all these things inside
>> >> >>>>>>>>> its #runProgram method, which introduces a lot of subclasses
>> >> >>>>>>>>> of the (stream) execution environment.
>> >> >>>>>>>>>
>> >> >>>>>>>>> Actually, it sets up an exec env that hijacks the
>> >> >>>>>>>>> #execute/executePlan method, initializes the job graph and
>> >> >>>>>>>>> aborts execution. Control then flows back to CliFrontend,
>> >> >>>>>>>>> which deploys the cluster (or retrieves the client) and
>> >> >>>>>>>>> submits the job graph. This is quite a specific internal
>> >> >>>>>>>>> process inside Flink and consistent with nothing else.
>> >> >>>>>>>>>
>> >> >>>>>>>>> 2) Deployment of a job cluster couples job graph creation and
>> >> >>>>>>>>> cluster deployment. Abstractly, going from a user job to a
>> >> >>>>>>>>> concrete submission requires the following dependency:
>> >> >>>>>>>>>
>> >> >>>>>>>>>     create JobGraph --\
>> >> >>>>>>>>>
>> >> >>>>>>>>> create ClusterClient -->  submit JobGraph
>> >> >>>>>>>>>
>> >> >>>>>>>>> The ClusterClient is created by deploying or retrieving.
>> >> >>>>>>>>> JobGraph submission requires a compiled JobGraph and a valid
>> >> >>>>>>>>> ClusterClient, but the creation of the ClusterClient is
>> >> >>>>>>>>> abstractly independent of that of the JobGraph. However, in
>> >> >>>>>>>>> job cluster mode, we deploy the job cluster with a job graph,
>> >> >>>>>>>>> which means we use another process:
>> >> >>>>>>>>>
>> >> >>>>>>>>> create JobGraph --> deploy cluster with the JobGraph
>> >> >>>>>>>>>
>> >> >>>>>>>>> This is another inconsistency, and downstream projects/client
>> >> >>>>>>>>> apis are forced to handle the different cases with little
>> >> >>>>>>>>> support from Flink.
>> >> >>>>>>>>>
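The two processes can be contrasted in code. All types below are stand-ins chosen to illustrate the shapes of the two flows; none of the signatures are Flink's.

```java
public class DeploymentFlowsSketch {

    static class JobGraph {
        final String name;
        JobGraph(String name) { this.name = name; }
    }

    static class ClusterClient {
        String submit(JobGraph g) { return "submitted:" + g.name; }
    }

    static ClusterClient deployOrRetrieve() { return new ClusterClient(); }

    // Session flow: client creation is independent of the job graph.
    static String sessionFlow(JobGraph graph) {
        ClusterClient client = deployOrRetrieve(); // create ClusterClient first
        return client.submit(graph);               // then submit the JobGraph
    }

    // Job-cluster flow today: the deployment itself consumes the graph,
    // coupling the two steps that are abstractly independent.
    static String jobClusterFlow(JobGraph graph) {
        return "deployed-with:" + graph.name;      // deploy the cluster WITH the JobGraph
    }

    public static void main(String[] args) {
        JobGraph g = new JobGraph("wordcount");
        System.out.println(sessionFlow(g));
        System.out.println(jobClusterFlow(g));
    }
}
```

A downstream project that wants one code path must today branch on which of these two shapes applies, which is exactly the inconsistency described above.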
>> >> >>>>>>>>> Since we likely reached a consensus on
>> >> >>>>>>>>>
>> >> >>>>>>>>> 1. all configs being gathered by the Flink configuration and
>> >> >>>>>>>>> passed on,
>> >> >>>>>>>>> 2. the execution environment knowing all configs and handling
>> >> >>>>>>>>> execution (both deployment and submission),
>> >> >>>>>>>>>
>> >> >>>>>>>>> I propose eliminating the inconsistencies with the following
>> >> >>>>>>>>> approach:
>> >> >>>>>>>>>
>> >> >>>>>>>>> 1) CliFrontend should be exactly a front end, at least for the
>> >> >>>>>>>>> "run" command. That means it just gathers and passes all
>> >> >>>>>>>>> configs from the command line to the main method of the user
>> >> >>>>>>>>> program. The execution environment knows all the info, and
>> >> >>>>>>>>> with an additional utility for ClusterClient, we gracefully
>> >> >>>>>>>>> get a ClusterClient by deploying or retrieving. In this way,
>> >> >>>>>>>>> we don't need to hijack the #execute/executePlan methods and
>> >> >>>>>>>>> can remove the various hacky subclasses of the exec env, as
>> >> >>>>>>>>> well as the #run methods in ClusterClient (for an
>> >> >>>>>>>>> interface-ized ClusterClient). The control flow then goes from
>> >> >>>>>>>>> CliFrontend to the main method and never returns.
>> >> >>>>>>>>>
>> >> >>>>>>>>> 2) A job cluster means a cluster for a specific job. From
>> >> >>>>>>>>> another perspective, it is an ephemeral session. We may
>> >> >>>>>>>>> decouple the deployment from a compiled job graph, instead
>> >> >>>>>>>>> starting a session with an idle timeout and submitting the
>> >> >>>>>>>>> job afterwards.
>> >> >>>>>>>>>
>> >> >>>>>>>>> Before we go into more details on design or implementation,
>> >> >>>>>>>>> these topics are better raised and discussed to reach a
>> >> >>>>>>>>> consensus.
>> >> >>>>>>>>>
>> >> >>>>>>>>> Best,
>> >> >>>>>>>>> tison.
>> >> >>>>>>>>>
>> >> >>>>>>>>>
>> >> >>>>>>>>> Zili Chen <wa...@gmail.com> 于2019年6月20日周四 上午3:21写道:
>> >> >>>>>>>>>
>> >> >>>>>>>>>> Hi Jeff,
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> Thanks for raising this thread and the design document!
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> As @Thomas Weise mentioned above, extending config in flink
>> >> >>>>>>>>>> requires far more effort than it should. Another example is
>> >> >>>>>>>>>> that we achieve detached mode by introducing another
>> >> >>>>>>>>>> execution environment which also hijacks the #execute method.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> I agree with your idea that the user would configure
>> >> >>>>>>>>>> everything and flink would "just" respect it. On this topic I
>> >> >>>>>>>>>> think the unusual control flow when CliFrontend handles the
>> >> >>>>>>>>>> "run" command is the problem. It handles several configs,
>> >> >>>>>>>>>> mainly about cluster settings, and thus the main method of
>> >> >>>>>>>>>> the user program is unaware of them. It also compiles the app
>> >> >>>>>>>>>> to a job graph by running the main method with a hijacked
>> >> >>>>>>>>>> exec env, which constrains the main method further.
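The "hijacked exec env" pattern referred to here works roughly like the following sketch: an environment override captures the plan and throws to abort the user's main method. The class and method names are illustrative, not Flink's actual internals.

```java
public class HijackedEnvSketch {

    // Thrown by the hijacked environment to abort the user's main method
    // as soon as the plan has been captured.
    static class ProgramAbortException extends RuntimeException {}

    static class GraphCapturingEnv {
        String capturedPlan;

        // In the real pattern this would override #execute/#executePlan.
        void execute(String plan) {
            capturedPlan = plan;
            throw new ProgramAbortException();
        }
    }

    // What the front end effectively does: run user main, catch the abort,
    // and walk away with the compiled graph -- main never runs to completion.
    static String compileViaHijack(GraphCapturingEnv env, Runnable userMain) {
        try {
            userMain.run();
        } catch (ProgramAbortException expected) {
            // expected: execution was aborted right after plan capture
        }
        return env.capturedPlan;
    }

    public static void main(String[] args) {
        GraphCapturingEnv env = new GraphCapturingEnv();
        String plan = compileViaHijack(env, () -> {
            // user program: builds a topology, then calls execute
            env.execute("wordcount-plan");
            System.out.println("never reached"); // code after execute is skipped
        });
        System.out.println("captured: " + plan);
    }
}
```

This makes concrete why the pattern constrains user programs: any code after `execute()` silently never runs when the environment is hijacked.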
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> I'd like to write down a few notes on how configs/args are
>> >> >>>>>>>>>> passed and respected, as well as on decoupling job
>> >> >>>>>>>>>> compilation and submission, and share them on this thread
>> >> >>>>>>>>>> later.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> Best,
>> >> >>>>>>>>>> tison.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> SHI Xiaogang <sh...@gmail.com> 于2019年6月17日周一
>> >> >> 下午7:29写道:
>> >> >>>>>>>>>>
>> >> >>>>>>>>>>> Hi Jeff and Flavio,
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Thanks Jeff a lot for proposing the design document.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> We are also working on refactoring ClusterClient to allow
>> >> >>>>> flexible
>> >> >>>>>>> and
>> >> >>>>>>>>>>> efficient job management in our real-time platform.
>> >> >>>>>>>>>>> We would like to draft a document to share our ideas with
>> >> >>> you.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> I think it's a good idea to have something like Apache Livy
>> >> >>> for
>> >> >>>>>>> Flink,
>> >> >>>>>>>>>>> and
>> >> >>>>>>>>>>> the efforts discussed here will take a great step forward
>> >> >> to
>> >> >>>> it.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Regards,
>> >> >>>>>>>>>>> Xiaogang
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Flavio Pompermaier <po...@okkam.it> 于2019年6月17日周一
>> >> >>>>> 下午7:13写道:
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>>> Is there any possibility to have something like Apache
>> >> >> Livy
>> >> >>>> [1]
>> >> >>>>>>> also
>> >> >>>>>>>>>>> for
>> >> >>>>>>>>>>>> Flink in the future?
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> [1] https://livy.apache.org/
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <
>> >> >>> zjffdu@gmail.com
>> >> >>>>>
>> >> >>>>>>> wrote:
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>> Any API we expose should not have dependencies on
>> >> >>> the
>> >> >>>>>>> runtime
>> >> >>>>>>>>>>>>> (flink-runtime) package or other implementation
>> >> >> details.
>> >> >>> To
>> >> >>>>> me,
>> >> >>>>>>>> this
>> >> >>>>>>>>>>>> means
>> >> >>>>>>>>>>>>> that the current ClusterClient cannot be exposed to
>> >> >> users
>> >> >>>>>> because
>> >> >>>>>>>> it
>> >> >>>>>>>>>>>> uses
>> >> >>>>>>>>>>>>> quite some classes from the optimiser and runtime
>> >> >>> packages.
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> We should change ClusterClient from a class to an
>> >> >>>>>>>>>>>>> interface. ExecutionEnvironment would only use the
>> >> >>>>>>>>>>>>> ClusterClient interface, which should be in flink-clients,
>> >> >>>>>>>>>>>>> while the concrete implementation class could be in
>> >> >>>>>>>>>>>>> flink-runtime.
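Sketched, that split might look as follows. The module placement mirrors the suggestion above, but every method signature here is illustrative only, not Flink API.

```java
import java.util.concurrent.CompletableFuture;

public class ClientSplitSketch {

    static class JobGraph {}

    // Would live in flink-clients: a dependency-light interface that
    // ExecutionEnvironment (and downstream projects) program against.
    interface ClusterClient {
        CompletableFuture<String> submitJob(JobGraph jobGraph);
        CompletableFuture<String> getJobStatus(String jobId);
        CompletableFuture<Void> cancelJob(String jobId);
    }

    // Would live in flink-runtime: the concrete implementation, free to
    // depend on runtime/optimizer internals the interface must not expose.
    static class RestClusterClientImpl implements ClusterClient {
        public CompletableFuture<String> submitJob(JobGraph jobGraph) {
            return CompletableFuture.completedFuture("job-1");
        }
        public CompletableFuture<String> getJobStatus(String jobId) {
            return CompletableFuture.completedFuture("RUNNING");
        }
        public CompletableFuture<Void> cancelJob(String jobId) {
            return CompletableFuture.completedFuture(null);
        }
    }

    public static void main(String[] args) throws Exception {
        ClusterClient client = new RestClusterClientImpl(); // chosen at runtime
        String jobId = client.submitJob(new JobGraph()).get();
        System.out.println(jobId + " -> " + client.getJobStatus(jobId).get());
    }
}
```

The async return types also line up with the later point about non-blocking submission: callers can compose on the futures without holding a thread.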
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>> What happens when a failure/restart in the client
>> >> >>>>> happens?
>> >> >>>>>>>> There
>> >> >>>>>>>>>>> need
>> >> >>>>>>>>>>>>> to be a way of re-establishing the connection to the
>> >> >> job,
>> >> >>>> set
>> >> >>>>>> up
>> >> >>>>>>>> the
>> >> >>>>>>>>>>>>> listeners again, etc.
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> Good point. First we need to define what failure/restart
>> >> >>>>>>>>>>>>> in the client means. IIUC, that usually means a network
>> >> >>>>>>>>>>>>> failure, which will surface in the RestClient class. If my
>> >> >>>>>>>>>>>>> understanding is correct, the restart/retry mechanism
>> >> >>>>>>>>>>>>> should be implemented in RestClient.
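A bounded retry-with-backoff of the kind hinted at here could be wrapped around the REST calls roughly like this. This is a generic sketch, not RestClient's actual mechanism; a real implementation would also need to re-establish job listeners after reconnecting, as Aljoscha notes.

```java
import java.util.concurrent.Callable;

public class RetrySketch {

    // Retry a call up to maxAttempts times, doubling the delay each time.
    // A real client would likely retry only on connection-level errors,
    // not on application errors such as "job not found".
    public static <T> T withRetry(Callable<T> call, int maxAttempts, long baseDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        long delay = baseDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // back off before the next attempt
                    delay *= 2;
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] failures = {2}; // simulate two transient network failures
        String result = withRetry(() -> {
            if (failures[0]-- > 0) {
                throw new java.io.IOException("connection reset");
            }
            return "job-status: RUNNING";
        }, 5, 10);
        System.out.println(result);
    }
}
```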
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>> Aljoscha Krettek <al...@apache.org> 于2019年6月11日周二
>> >> >>>>>> 下午11:10写道:
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>> Some points to consider:
>> >> >>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>> * Any API we expose should not have dependencies on
>> >> >> the
>> >> >>>>>> runtime
>> >> >>>>>>>>>>>>>> (flink-runtime) package or other implementation
>> >> >>> details.
>> >> >>>> To
>> >> >>>>>> me,
>> >> >>>>>>>>>>> this
>> >> >>>>>>>>>>>>> means
>> >> >>>>>>>>>>>>>> that the current ClusterClient cannot be exposed to
>> >> >>> users
>> >> >>>>>>> because
>> >> >>>>>>>>>>> it
>> >> >>>>>>>>>>>>> uses
>> >> >>>>>>>>>>>>>> quite some classes from the optimiser and runtime
>> >> >>>> packages.
>> >> >>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>> * What happens when a failure/restart in the client
>> >> >>>>> happens?
>> >> >>>>>>>> There
>> >> >>>>>>>>>>> need
>> >> >>>>>>>>>>>>> to
>> >> >>>>>>>>>>>>>> be a way of re-establishing the connection to the
>> >> >> job,
>> >> >>>> set
>> >> >>>>> up
>> >> >>>>>>> the
>> >> >>>>>>>>>>>>> listeners
>> >> >>>>>>>>>>>>>> again, etc.
>> >> >>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>> Aljoscha
>> >> >>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>> On 29. May 2019, at 10:17, Jeff Zhang <
>> >> >>>> zjffdu@gmail.com>
>> >> >>>>>>>> wrote:
>> >> >>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>> Sorry folks, the design doc is late as you
>> >> >> expected.
>> >> >>>>> Here's
>> >> >>>>>>> the
>> >> >>>>>>>>>>>> design
>> >> >>>>>>>>>>>>>> doc
>> >> >>>>>>>>>>>>>>> I drafted, welcome any comments and feedback.
>> >> >>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>
>> >> >>>>>>>>
>> >> >>>>>>>
>> >> >>>>>>
>> >> >>>>>
>> >> >>>>
>> >> >>>
>> >> >>
>> >>
>> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
>> >> >>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org> 于2019年2月14日周四
>> >> >>>> 下午8:43写道:
>> >> >>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>> Nice that this discussion is happening.
>> >> >>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>> In the FLIP, we could also revisit the entire role
>> >> >>> of
>> >> >>>>> the
>> >> >>>>>>>>>>>> environments
>> >> >>>>>>>>>>>>>>>> again.
>> >> >>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>> Initially, the idea was:
>> >> >>>>>>>>>>>>>>>> - the environments take care of the specific
>> >> >> setup
>> >> >>>> for
>> >> >>>>>>>>>>> standalone
>> >> >>>>>>>>>>>> (no
>> >> >>>>>>>>>>>>>>>> setup needed), yarn, mesos, etc.
>> >> >>>>>>>>>>>>>>>> - the session ones have control over the session.
>> >> >>> The
>> >> >>>>>>>>>>> environment
>> >> >>>>>>>>>>>>> holds
>> >> >>>>>>>>>>>>>>>> the session client.
>> >> >>>>>>>>>>>>>>>> - running a job gives a "control" object for that
>> >> >>>> job.
>> >> >>>>>> That
>> >> >>>>>>>>>>>> behavior
>> >> >>>>>>>>>>>>> is
>> >> >>>>>>>>>>>>>>>> the same in all environments.
>> >> >>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>> The actual implementation diverged quite a bit from
>> >> >>>>>>>>>>>>>>>> that. Happy to see a discussion about straightening
>> >> >>>>>>>>>>>>>>>> this out a bit more.
>> >> >>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <
>> >> >>>>>>> zjffdu@gmail.com>
>> >> >>>>>>>>>>>> wrote:
>> >> >>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>> Hi folks,
>> >> >>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>> Sorry for late response, It seems we reach
>> >> >>> consensus
>> >> >>>> on
>> >> >>>>>>>> this, I
>> >> >>>>>>>>>>>> will
>> >> >>>>>>>>>>>>>>>> create
>> >> >>>>>>>>>>>>>>>>> FLIP for this with more detailed design
>> >> >>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>> Thomas Weise <th...@apache.org> 于2018年12月21日周五
>> >> >>>>> 上午11:43写道:
>> >> >>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>> Great to see this discussion seeded! The
>> >> >> problems
>> >> >>>> you
>> >> >>>>>> face
>> >> >>>>>>>>>>> with
>> >> >>>>>>>>>>>> the
>> >> >>>>>>>>>>>>>>>>>> Zeppelin integration are also affecting other
>> >> >>>>> downstream
>> >> >>>>>>>>>>> projects,
>> >> >>>>>>>>>>>>>> like
>> >> >>>>>>>>>>>>>>>>>> Beam.
>> >> >>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>> We just enabled the savepoint restore option in
>> >> >>>>>>>>>>>>>> RemoteStreamEnvironment
>> >> >>>>>>>>>>>>>>>>> [1]
>> >> >>>>>>>>>>>>>>>>>> and that was more difficult than it should be.
>> >> >> The
>> >> >>>>> main
>> >> >>>>>>>> issue
>> >> >>>>>>>>>>> is
>> >> >>>>>>>>>>>>> that
>> >> >>>>>>>>>>>>>>>>>> environment and cluster client aren't decoupled.
>> >> >>>>> Ideally
>> >> >>>>>>> it
>> >> >>>>>>>>>>> should
>> >> >>>>>>>>>>>>> be
>> >> >>>>>>>>>>>>>>>>>> possible to just get the matching cluster client
>> >> >>>> from
>> >> >>>>>> the
>> >> >>>>>>>>>>>>> environment
>> >> >>>>>>>>>>>>>>>> and
>> >> >>>>>>>>>>>>>>>>>> then control the job through it (environment as
>> >> >>>>> factory
>> >> >>>>>>> for
>> >> >>>>>>>>>>>> cluster
>> >> >>>>>>>>>>>>>>>>>> client). But note that the environment classes
>> >> >> are
>> >> >>>>> part
>> >> >>>>>> of
>> >> >>>>>>>> the
>> >> >>>>>>>>>>>>> public
>> >> >>>>>>>>>>>>>>>>> API,
>> >> >>>>>>>>>>>>>>>>>> and it is not straightforward to make larger
>> >> >>> changes
>> >> >>>>>>> without
>> >> >>>>>>>>>>>>> breaking
>> >> >>>>>>>>>>>>>>>>>> backward compatibility.
>> >> >>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>> ClusterClient currently exposes internal classes
>> >> >>>> like
>> >> >>>>>>>>>>> JobGraph and
>> >> >>>>>>>>>>>>>>>>>> StreamGraph. But it should be possible to wrap
>> >> >>> this
>> >> >>>>>> with a
>> >> >>>>>>>> new
>> >> >>>>>>>>>>>>> public
>> >> >>>>>>>>>>>>>>>> API
>> >> >>>>>>>>>>>>>>>>>> that brings the required job control
>> >> >> capabilities
>> >> >>>> for
>> >> >>>>>>>>>>> downstream
>> >> >>>>>>>>>>>>>>>>> projects.
>> >> >>>>>>>>>>>>>>>>>> Perhaps it is helpful to look at some of the
>> >> >>>>> interfaces
>> >> >>>>>> in
>> >> >>>>>>>>>>> Beam
>> >> >>>>>>>>>>>>> while
>> >> >>>>>>>>>>>>>>>>>> thinking about this: [2] for the portable job
>> >> >> API
>> >> >>>> and
>> >> >>>>>> [3]
>> >> >>>>>>>> for
>> >> >>>>>>>>>>> the
>> >> >>>>>>>>>>>>> old
>> >> >>>>>>>>>>>>>>>>>> asynchronous job control from the Beam Java SDK.
>> >> >>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>> The backward compatibility discussion [4] is
>> >> >> also
>> >> >>>>>> relevant
>> >> >>>>>>>>>>> here. A
>> >> >>>>>>>>>>>>> new
>> >> >>>>>>>>>>>>>>>>> API
>> >> >>>>>>>>>>>>>>>>>> should shield downstream projects from internals
>> >> >>> and
>> >> >>>>>> allow
>> >> >>>>>>>>>>> them to
>> >> >>>>>>>>>>>>>>>>>> interoperate with multiple future Flink versions
>> >> >>> in
>> >> >>>>> the
>> >> >>>>>>> same
>> >> >>>>>>>>>>>> release
>> >> >>>>>>>>>>>>>>>> line
>> >> >>>>>>>>>>>>>>>>>> without forced upgrades.
>> >> >>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>> Thanks,
>> >> >>>>>>>>>>>>>>>>>> Thomas
>> >> >>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>> [1] https://github.com/apache/flink/pull/7249
>> >> >>>>>>>>>>>>>>>>>> [2]
>> >> >>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>
>> >> >>>>>>>>
>> >> >>>>>>>
>> >> >>>>>>
>> >> >>>>>
>> >> >>>>
>> >> >>>
>> >> >>
>> >>
>> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
>> >> >>>>>>>>>>>>>>>>>> [3]
>> >> >>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>
>> >> >>>>>>>>
>> >> >>>>>>>
>> >> >>>>>>
>> >> >>>>>
>> >> >>>>
>> >> >>>
>> >> >>
>> >>
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
>> >> >>>>>>>>>>>>>>>>>> [4]
>> >> >>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>
>> >> >>>>>>>>
>> >> >>>>>>>
>> >> >>>>>>
>> >> >>>>>
>> >> >>>>
>> >> >>>
>> >> >>
>> >>
>> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
>> >> >>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <
>> >> >>>>>>>> zjffdu@gmail.com>
>> >> >>>>>>>>>>>>> wrote:
>> >> >>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>> I'm not so sure whether the user should be
>> >> >>> able
>> >> >>>> to
>> >> >>>>>>>> define
>> >> >>>>>>>>>>>> where
>> >> >>>>>>>>>>>>>>>> the
>> >> >>>>>>>>>>>>>>>>>> job
>> >> >>>>>>>>>>>>>>>>>>> runs (in your example Yarn). This is actually
>> >> >>>>>> independent
>> >> >>>>>>>> of
>> >> >>>>>>>>>>> the
>> >> >>>>>>>>>>>>> job
>> >> >>>>>>>>>>>>>>>>>>> development and is something which is decided
>> >> >> at
>> >> >>>>>>> deployment
>> >> >>>>>>>>>>> time.
>> >> >>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>> Users don't need to specify the execution mode
>> >> >>>>>>>>>>>>>>>>>>> programmatically. They can also pass the execution
>> >> >>>>>>>>>>>>>>>>>>> mode via the arguments of the flink run command,
>> >> >>>>>>>>>>>>>>>>>>> e.g.
>> >> >>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m yarn-cluster ....
>> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m local ...
>> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m host:port ...
>> >> >>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>> Does this make sense to you ?
>> >> >>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>> To me it makes sense that the
>> >> >>>> ExecutionEnvironment
>> >> >>>>>> is
>> >> >>>>>>>> not
>> >> >>>>>>>>>>>>>>>> directly
>> >> >>>>>>>>>>>>>>>>>>> initialized by the user and instead context
>> >> >>>> sensitive
>> >> >>>>>> how
>> >> >>>>>>>> you
>> >> >>>>>>>>>>>> want
>> >> >>>>>>>>>>>>> to
>> >> >>>>>>>>>>>>>>>>>>> execute your job (Flink CLI vs. IDE, for
>> >> >>> example).
>> >> >>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>> Right, currently I notice Flink creates a different
>> >> >>>>>>>>>>>>>>>>>>> ContextExecutionEnvironment based on the submission
>> >> >>>>>>>>>>>>>>>>>>> scenario (Flink CLI vs IDE). To me this is kind of a
>> >> >>>>>>>>>>>>>>>>>>> hack, not so straightforward. What I suggested above
>> >> >>>>>>>>>>>>>>>>>>> is that flink should always create the same
>> >> >>>>>>>>>>>>>>>>>>> ExecutionEnvironment but with different
>> >> >>>>>>>>>>>>>>>>>>> configuration, and based on that configuration it
>> >> >>>>>>>>>>>>>>>>>>> would create the proper ClusterClient for the
>> >> >>>>>>>>>>>>>>>>>>> different behaviors.
>> >> >>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>> Till Rohrmann <tr...@apache.org>
>> >> >>>> 于2018年12月20日周四
>> >> >>>>>>>>>>> 下午11:18写道:
>> >> >>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>> You are probably right that we have code
>> >> >>>> duplication
>> >> >>>>>>> when
>> >> >>>>>>>> it
>> >> >>>>>>>>>>>> comes
>> >> >>>>>>>>>>>>>>>> to
>> >> >>>>>>>>>>>>>>>>>> the
>> >> >>>>>>>>>>>>>>>>>>>> creation of the ClusterClient. This should be
>> >> >>>>> reduced
>> >> >>>>>> in
>> >> >>>>>>>> the
>> >> >>>>>>>>>>>>>>>> future.
>> >> >>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>> I'm not so sure whether the user should be
>> >> >> able
>> >> >>> to
>> >> >>>>>>> define
>> >> >>>>>>>>>>> where
>> >> >>>>>>>>>>>>> the
>> >> >>>>>>>>>>>>>>>>> job
>> >> >>>>>>>>>>>>>>>>>>>> runs (in your example Yarn). This is actually
>> >> >>>>>>> independent
>> >> >>>>>>>>>>> of the
>> >> >>>>>>>>>>>>>>>> job
>> >> >>>>>>>>>>>>>>>>>>>> development and is something which is decided
>> >> >> at
>> >> >>>>>>>> deployment
>> >> >>>>>>>>>>>> time.
>> >> >>>>>>>>>>>>>>>> To
>> >> >>>>>>>>>>>>>>>>> me
>> >> >>>>>>>>>>>>>>>>>>> it
>> >> >>>>>>>>>>>>>>>>>>>> makes sense that the ExecutionEnvironment is
>> >> >> not
>> >> >>>>>>> directly
>> >> >>>>>>>>>>>>>>>> initialized
>> >> >>>>>>>>>>>>>>>>>> by
>> >> >>>>>>>>>>>>>>>>>>>> the user and instead context sensitive how you
>> >> >>>> want
>> >> >>>>> to
>> >> >>>>>>>>>>> execute
>> >> >>>>>>>>>>>>> your
>> >> >>>>>>>>>>>>>>>>> job
>> >> >>>>>>>>>>>>>>>>>>>> (Flink CLI vs. IDE, for example). However, I
>> >> >>> agree
>> >> >>>>>> that
>> >> >>>>>>>> the
>> >> >>>>>>>>>>>>>>>>>>>> ExecutionEnvironment should give you access to
>> >> >>> the
>> >> >>>>>>>>>>> ClusterClient
>> >> >>>>>>>>>>>>>>>> and
>> >> >>>>>>>>>>>>>>>>> to
>> >> >>>>>>>>>>>>>>>>>>> the
>> >> >>>>>>>>>>>>>>>>>>>> job (maybe in the form of the JobGraph or a
>> >> >> job
>> >> >>>>> plan).
>> >> >>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>> Cheers,
>> >> >>>>>>>>>>>>>>>>>>>> Till
>> >> >>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <
>> >> >>>>>>>>>>> zjffdu@gmail.com>
>> >> >>>>>>>>>>>>>>>> wrote:
>> >> >>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>> Hi Till,
>> >> >>>>>>>>>>>>>>>>>>>>> Thanks for the feedback. You are right that I
>> >> >>>>>>>>>>>>>>>>>>>>> expect a better programmatic job
>> >> >>>>>>>>>>>>>>>>>>>>> submission/control api which could be used by
>> >> >>>>>>>>>>>>>>>>>>>>> downstream projects, and it would benefit the
>> >> >>>>>>>>>>>>>>>>>>>>> flink ecosystem. When I look at the code of the
>> >> >>>>>>>>>>>>>>>>>>>>> flink scala-shell and sql-client (I believe they
>> >> >>>>>>>>>>>>>>>>>>>>> are not the core of flink, but belong to the
>> >> >>>>>>>>>>>>>>>>>>>>> flink ecosystem), I find a lot of duplicated code
>> >> >>>>>>>>>>>>>>>>>>>>> for creating a ClusterClient from user-provided
>> >> >>>>>>>>>>>>>>>>>>>>> configuration (the configuration format may
>> >> >>>>>>>>>>>>>>>>>>>>> differ between scala-shell and sql-client) and
>> >> >>>>>>>>>>>>>>>>>>>>> then using that ClusterClient to manipulate jobs.
>> >> >>>>>>>>>>>>>>>>>>>>> I don't think this is convenient for downstream
>> >> >>>>>>>>>>>>>>>>>>>>> projects. What I expect is that a downstream
>> >> >>>>>>>>>>>>>>>>>>>>> project only needs to provide the necessary
>> >> >>>>>>>>>>>>>>>>>>>>> configuration info (maybe introducing a class
>> >> >>>>>>>>>>>>>>>>>>>>> FlinkConf), then build an ExecutionEnvironment
>> >> >>>>>>>>>>>>>>>>>>>>> based on this FlinkConf, and the
>> >> >>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment will create the proper
>> >> >>>>>>>>>>>>>>>>>>>>> ClusterClient. This would not only benefit
>> >> >>>>>>>>>>>>>>>>>>>>> downstream project development but also help
>> >> >>>>>>>>>>>>>>>>>>>>> their integration tests with flink. Here's one
>> >> >>>>>>>>>>>>>>>>>>>>> sample code snippet that I expect.
>> >> >>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>> val conf = new FlinkConf().mode("yarn")
>> >> >>>>>>>>>>>>>>>>>>>>> val env = new ExecutionEnvironment(conf)
>> >> >>>>>>>>>>>>>>>>>>>>> val jobId = env.submit(...)
>> >> >>>>>>>>>>>>>>>>>>>>> val jobStatus =
>> >> >>>>>>>>>>> env.getClusterClient().queryJobStatus(jobId)
>> >> >>>>>>>>>>>>>>>>>>>>> env.getClusterClient().cancelJob(jobId)
>> >> >>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>> What do you think ?
>> >> >>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>> Till Rohrmann <tr...@apache.org>
>> >> >>>>> 于2018年12月11日周二
>> >> >>>>>>>>>>> 下午6:28写道:
>> >> >>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>> Hi Jeff,
>> >> >>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>> what you are proposing is to provide the
>> >> >> user
>> >> >>>> with
>> >> >>>>>>>> better
>> >> >>>>>>>>>>>>>>>>>>> programmatic
>> >> >>>>>>>>>>>>>>>>>>>>> job
>> >> >>>>>>>>>>>>>>>>>>>>>> control. There was actually an effort to
>> >> >>> achieve
>> >> >>>>>> this
>> >> >>>>>>>> but
>> >> >>>>>>>>>>> it
>> >> >>>>>>>>>>>>>>>> has
>> >> >>>>>>>>>>>>>>>>>>> never
>> >> >>>>>>>>>>>>>>>>>>>>> been
>> >> >>>>>>>>>>>>>>>>>>>>>> completed [1]. However, there are some
>> >> >>>> improvement
>> >> >>>>>> in
>> >> >>>>>>>> the
>> >> >>>>>>>>>>> code
>> >> >>>>>>>>>>>>>>>>> base
>> >> >>>>>>>>>>>>>>>>>>>> now.
>> >> >>>>>>>>>>>>>>>>>>>>>> Look for example at the NewClusterClient
>> >> >>>> interface
>> >> >>>>>>> which
>> >> >>>>>>>>>>>>>>>> offers a
>> >> >>>>>>>>>>>>>>>>>>>>>> non-blocking job submission. But I agree
>> >> >> that
>> >> >>> we
>> >> >>>>>> need
>> >> >>>>>>> to
>> >> >>>>>>>>>>>>>>>> improve
>> >> >>>>>>>>>>>>>>>>>>> Flink
>> >> >>>>>>>>>>>>>>>>>>>> in
>> >> >>>>>>>>>>>>>>>>>>>>>> this regard.
>> >> >>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>> I would not be in favour if exposing all
>> >> >>>>>> ClusterClient
>> >> >>>>>>>>>>> calls
>> >> >>>>>>>>>>>>>>>> via
>> >> >>>>>>>>>>>>>>>>>> the
>> >> >>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment because it would
>> >> >> clutter
>> >> >>>> the
>> >> >>>>>>> class
>> >> >>>>>>>>>>> and
>> >> >>>>>>>>>>>>>>>> would
>> >> >>>>>>>>>>>>>>>>>> not
>> >> >>>>>>>>>>>>>>>>>>>> be
>> >> >>>>>>>>>>>>>>>>>>>>> a
>> >> >>>>>>>>>>>>>>>>>>>>>> good separation of concerns. Instead one
>> >> >> idea
>> >> >>>>> could
>> >> >>>>>> be
>> >> >>>>>>>> to
>> >> >>>>>>>>>>>>>>>>> retrieve
>> >> >>>>>>>>>>>>>>>>>>> the
>> >> >>>>>>>>>>>>>>>>>>>>>> current ClusterClient from the
>> >> >>>>> ExecutionEnvironment
>> >> >>>>>>>> which
>> >> >>>>>>>>>>> can
>> >> >>>>>>>>>>>>>>>>> then
>> >> >>>>>>>>>>>>>>>>>> be
>> >> >>>>>>>>>>>>>>>>>>>>> used
>> >> >>>>>>>>>>>>>>>>>>>>>> for cluster and job control. But before we
>> >> >>> start
>> >> >>>>> an
>> >> >>>>>>>> effort
>> >> >>>>>>>>>>>>>>>> here,
>> >> >>>>>>>>>>>>>>>>> we
>> >> >>>>>>>>>>>>>>>>>>>> need
>> >> >>>>>>>>>>>>>>>>>>>>> to
>> >> >>>>>>>>>>>>>>>>>>>>>> agree and capture what functionality we want
>> >> >>> to
>> >> >>>>>>> provide.
>> >> >>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>> Initially, the idea was that we have the
>> >> >>>>>>>> ClusterDescriptor
>> >> >>>>>>>>>>>>>>>>>> describing
>> >> >>>>>>>>>>>>>>>>>>>> how
>> >> >>>>>>>>>>>>>>>>>>>>>> to talk to cluster manager like Yarn or
>> >> >> Mesos.
>> >> >>>> The
>> >> >>>>>>>>>>>>>>>>>> ClusterDescriptor
>> >> >>>>>>>>>>>>>>>>>>>> can
>> >> >>>>>>>>>>>>>>>>>>>>> be
>> >> >>>>>>>>>>>>>>>>>>>>>> used for deploying Flink clusters (job and
>> >> >>>>> session)
>> >> >>>>>>> and
>> >> >>>>>>>>>>> gives
>> >> >>>>>>>>>>>>>>>>> you a
>> >> >>>>>>>>>>>>>>>>>>>>>> ClusterClient. The ClusterClient controls
>> >> >> the
>> >> >>>>>> cluster
>> >> >>>>>>>>>>> (e.g.
>> >> >>>>>>>>>>>>>>>>>>> submitting
>> >> >>>>>>>>>>>>>>>>>>>>>> jobs, listing all running jobs). And then
>> >> >>> there
>> >> >>>>> was
>> >> >>>>>>> the
>> >> >>>>>>>>>>> idea
>> >> >>>>>>>>>>>> to
>> >> >>>>>>>>>>>>>>>>>>>>> introduce a
>> >> >>>>>>>>>>>>>>>>>>>>>> JobClient which you obtain from the
>> >> >>>> ClusterClient
>> >> >>>>> to
>> >> >>>>>>>>>>> trigger
>> >> >>>>>>>>>>>>>>>> job
>> >> >>>>>>>>>>>>>>>>>>>> specific
>> >> >>>>>>>>>>>>>>>>>>>>>> operations (e.g. taking a savepoint,
>> >> >>> cancelling
>> >> >>>>> the
>> >> >>>>>>>> job).
>> >> >>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>> [1]
>> >> >>>>>> https://issues.apache.org/jira/browse/FLINK-4272
>> >> >>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>> Cheers,
>> >> >>>>>>>>>>>>>>>>>>>>>> Till
>> >> >>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang
>> >> >> <
>> >> >>>>>>>>>>> zjffdu@gmail.com
>> >> >>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>> wrote:
>> >> >>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>>> Hi Folks,
>> >> >>>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>>> I am trying to integrate flink into apache
>> >> >>>>> zeppelin
>> >> >>>>>>>>>>> which is
>> >> >>>>>>>>>>>>>>>> an
>> >> >>>>>>>>>>>>>>>>>>>>>> interactive
>> >> >>>>>>>>>>>>>>>>>>>>>>> notebook. And I hit several issues that is
>> >> >>>> caused
>> >> >>>>>> by
>> >> >>>>>>>>>>> flink
>> >> >>>>>>>>>>>>>>>>> client
>> >> >>>>>>>>>>>>>>>>>>>> api.
>> >> >>>>>>>>>>>>>>>>>>>>> So
>> >> >>>>>>>>>>>>>>>>>>>>>>> I'd like to proposal the following changes
>> >> >>> for
>> >> >>>>>> flink
>> >> >>>>>>>>>>> client
>> >> >>>>>>>>>>>>>>>>> api.
>> >> >>>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>>> 1. Support nonblocking execution.
>> >> >> Currently,
>> >> >>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment#execute
>> >> >>>>>>>>>>>>>>>>>>>>>>> is a blocking method which would do 2
>> >> >> things,
>> >> >>>>> first
>> >> >>>>>>>>>>> submit
>> >> >>>>>>>>>>>>>>>> job
>> >> >>>>>>>>>>>>>>>>>> and
>> >> >>>>>>>>>>>>>>>>>>>> then
>> >> >>>>>>>>>>>>>>>>>>>>>>> wait for job until it is finished. I'd like
>> >> >>>>>>> introduce a
>> >> >>>>>>>>>>>>>>>>>> nonblocking
>> >> >>>>>>>>>>>>>>>>>>>>>>> execution method like
>> >> >>>> ExecutionEnvironment#submit
>> >> >>>>>>> which
>> >> >>>>>>>>>>> only
>> >> >>>>>>>>>>>>>>>>>> submit
>> >> >>>>>>>>>>>>>>>>>>>> job
>> >> >>>>>>>>>>>>>>>>>>>>>> and
>> >> >>>>>>>>>>>>>>>>>>>>>>> then return jobId to client. And allow user
>> >> >>> to
>> >> >>>>>> query
>> >> >>>>>>>> the
>> >> >>>>>>>>>>> job
>> >> >>>>>>>>>>>>>>>>>> status
>> >> >>>>>>>>>>>>>>>>>>>> via
>> >> >>>>>>>>>>>>>>>>>>>>>> the
>> >> >>>>>>>>>>>>>>>>>>>>>>> jobId.
>> >> >>>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>>> 2. Add cancel api in
>> >> >>>>>>>>>>>>>>>>>>>
>> >> >> ExecutionEnvironment/StreamExecutionEnvironment,
>> >> >>>>>>>>>>>>>>>>>>>>>>> currently the only way to cancel job is via
>> >> >>> cli
>> >> >>>>>>>>>>> (bin/flink),
>> >> >>>>>>>>>>>>>>>>> this
>> >> >>>>>>>>>>>>>>>>>>> is
>> >> >>>>>>>>>>>>>>>>>>>>> not
>> >> >>>>>>>>>>>>>>>>>>>>>>> convenient for downstream project to use
>> >> >> this
>> >> >>>>>>> feature.
>> >> >>>>>>>>>>> So I'd
>> >> >>>>>>>>>>>>>>>>>> like
>> >> >>>>>>>>>>>>>>>>>>> to
>> >> >>>>>>>>>>>>>>>>>>>>> add
>> >> >>>>>>>>>>>>>>>>>>>>>>> cancel api in ExecutionEnvironment
>> >> >>>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>>> 3. Add savepoint api in
>> >> >>>>>>>>>>>>>>>>>>>>>
>> >> >>> ExecutionEnvironment/StreamExecutionEnvironment.
>> >> >>>>>>>>>>>>>>>>>>>>>> It
>> >> >>>>>>>>>>>>>>>>>>>>>>> is similar as cancel api, we should use
>> >> >>>>>>>>>>> ExecutionEnvironment
>> >> >>>>>>>>>>>>>>>> as
>> >> >>>>>>>>>>>>>>>>>> the
>> >> >>>>>>>>>>>>>>>>>>>>>> unified
>> >> >>>>>>>>>>>>>>>>>>>>>>> api for third party to integrate with
>> >> >> flink.
>> >> >>>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>>> 4. Add listener for job execution
>> >> >> lifecycle.
>> >> >>>>>>> Something
>> >> >>>>>>>>>>> like
>> >> >>>>>>>>>>>>>>>>>>>> following,
>> >> >>>>>>>>>>>>>>>>>>>>> so
>> >> >>>>>>>>>>>>>>>>>>>>>>> that downstream project can do custom logic
>> >> >>> in
>> >> >>>>> the
>> >> >>>>>>>>>>> lifecycle
>> >> >>>>>>>>>>>>>>>> of
>> >> >>>>>>>>>>>>>>>>>>> job.
>> >> >>>>>>>>>>>>>>>>>>>>> e.g.
>> >> >>>>>>>>>>>>>>>>>>>>>>> Zeppelin would capture the jobId after job
>> >> >> is
>> >> >>>>>>> submitted
>> >> >>>>>>>>>>> and
>> >> >>>>>>>>>>>>>>>>> then
>> >> >>>>>>>>>>>>>>>>>>> use
>> >> >>>>>>>>>>>>>>>>>>>>> this
>> >> >>>>>>>>>>>>>>>>>>>>>>> jobId to cancel it later when necessary.
>> >> >>>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>>> public interface JobListener {
>> >> >>>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>>>  void onJobSubmitted(JobID jobId);
>> >> >>>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>>>  void onJobExecuted(JobExecutionResult
>> >> >>>>> jobResult);
>> >> >>>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>>>  void onJobCanceled(JobID jobId);
>> >> >>>>>>>>>>>>>>>>>>>>>>> }
>> >> >>>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>>> 5. Enable session in ExecutionEnvironment.
>> >> >>>>>> Currently
>> >> >>>>>>> it
>> >> >>>>>>>>>>> is
>> >> >>>>>>>>>>>>>>>>>>> disabled,
>> >> >>>>>>>>>>>>>>>>>>>>> but
>> >> >>>>>>>>>>>>>>>>>>>>>>> session is very convenient for third party
>> >> >> to
>> >> >>>>>>>> submitting
>> >> >>>>>>>>>>> jobs
>> >> >>>>>>>>>>>>>>>>>>>>>> continually.
>> >> >>>>>>>>>>>>>>>>>>>>>>> I hope flink can enable it again.
>> >> >>>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>>> 6. Unify all flink client api into
>> >> >>>>>>>>>>>>>>>>>>>>>>>
>> >> >>>> ExecutionEnvironment/StreamExecutionEnvironment.
>> >> >>>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>>> This is a long term issue which needs more
>> >> >>>>> careful
>> >> >>>>>>>>>>> thinking
>> >> >>>>>>>>>>>>>>>> and
>> >> >>>>>>>>>>>>>>>>>>>> design.
>> >> >>>>>>>>>>>>>>>>>>>>>>> Currently some of features of flink is
>> >> >>> exposed
>> >> >>>> in
>> >> >>>>>>>>>>>>>>>>>>>>>>>
>> >> >>>> ExecutionEnvironment/StreamExecutionEnvironment,
>> >> >>>>>> but
>> >> >>>>>>>>>>> some are
>> >> >>>>>>>>>>>>>>>>>>> exposed
>> >> >>>>>>>>>>>>>>>>>>>>> in
>> >> >>>>>>>>>>>>>>>>>>>>>>> cli instead of api, like the cancel and
>> >> >>>>> savepoint I
>> >> >>>>>>>>>>> mentioned
>> >> >>>>>>>>>>>>>>>>>>> above.
>> >> >>>>>>>>>>>>>>>>>>>> I
>> >> >>>>>>>>>>>>>>>>>>>>>>> think the root cause is due to that flink
>> >> >>>> didn't
>> >> >>>>>>> unify
>> >> >>>>>>>>>>> the
>> >> >>>>>>>>>>>>>>>>>>>> interaction
>> >> >>>>>>>>>>>>>>>>>>>>>> with
>> >> >>>>>>>>>>>>>>>>>>>>>>> flink. Here I list 3 scenarios of flink
>> >> >>>> operation
>> >> >>>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>>>  - Local job execution.  Flink will create
>> >> >>>>>>>>>>> LocalEnvironment
>> >> >>>>>>>>>>>>>>>>> and
>> >> >>>>>>>>>>>>>>>>>>>> then
>> >> >>>>>>>>>>>>>>>>>>>>>> use
>> >> >>>>>>>>>>>>>>>>>>>>>>>  this LocalEnvironment to create
>> >> >>> LocalExecutor
>> >> >>>>> for
>> >> >>>>>>> job
>> >> >>>>>>>>>>>>>>>>>> execution.
>> >> >>>>>>>>>>>>>>>>>>>>>>>  - Remote job execution. Flink will create
>> >> >>>>>>>> ClusterClient
>> >> >>>>>>>>>>>>>>>>> first
>> >> >>>>>>>>>>>>>>>>>>> and
>> >> >>>>>>>>>>>>>>>>>>>>> then
>> >> >>>>>>>>>>>>>>>>>>>>>>>  create ContextEnvironment based on the
>> >> >>>>>>> ClusterClient
>> >> >>>>>>>>>>> and
>> >> >>>>>>>>>>>>>>>>> then
>> >> >>>>>>>>>>>>>>>>>>> run
>> >> >>>>>>>>>>>>>>>>>>>>> the
>> >> >>>>>>>>>>>>>>>>>>>>>>> job.
>> >> >>>>>>>>>>>>>>>>>>>>>>>  - Job cancelation. Flink will create
>> >> >>>>>> ClusterClient
>> >> >>>>>>>>>>> first
>> >> >>>>>>>>>>>>>>>> and
>> >> >>>>>>>>>>>>>>>>>>> then
>> >> >>>>>>>>>>>>>>>>>>>>>> cancel
>> >> >>>>>>>>>>>>>>>>>>>>>>>  this job via this ClusterClient.
>> >> >>>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>>> As you can see in the above 3 scenarios.
>> >> >>> Flink
>> >> >>>>>> didn't
>> >> >>>>>>>>>>> use the
>> >> >>>>>>>>>>>>>>>>>> same
>> >> >>>>>>>>>>>>>>>>>>>>>>> approach(code path) to interact with flink
>> >> >>>>>>>>>>>>>>>>>>>>>>> What I propose is following:
>> >> >>>>>>>>>>>>>>>>>>>>>>> Create the proper
>> >> >>>>>> LocalEnvironment/RemoteEnvironment
>> >> >>>>>>>>>>> (based
>> >> >>>>>>>>>>>>>>>> on
>> >> >>>>>>>>>>>>>>>>>> user
>> >> >>>>>>>>>>>>>>>>>>>>>>> configuration) --> Use this Environment to
>> >> >>>> create
>> >> >>>>>>>> proper
>> >> >>>>>>>>>>>>>>>>>>>> ClusterClient
>> >> >>>>>>>>>>>>>>>>>>>>>>> (LocalClusterClient or RestClusterClient)
>> >> >> to
>> >> >>>>>>>> interactive
>> >> >>>>>>>>>>> with
>> >> >>>>>>>>>>>>>>>>>>> Flink (
>> >> >>>>>>>>>>>>>>>>>>>>> job
>> >> >>>>>>>>>>>>>>>>>>>>>>> execution or cancelation)
>> >> >>>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>>> This way we can unify the process of local
>> >> >>>>>> execution
>> >> >>>>>>>> and
>> >> >>>>>>>>>>>>>>>> remote
>> >> >>>>>>>>>>>>>>>>>>>>>> execution.
>> >> >>>>>>>>>>>>>>>>>>>>>>> And it is much easier for third party to
>> >> >>>>> integrate
>> >> >>>>>>> with
>> >> >>>>>>>>>>>>>>>> flink,
>> >> >>>>>>>>>>>>>>>>>>>> because
>> >> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment is the unified entry
>> >> >>> point
>> >> >>>>> for
>> >> >>>>>>>>>>> flink.
>> >> >>>>>>>>>>>>>>>> What
>> >> >>>>>>>>>>>>>>>>>>> third
>> >> >>>>>>>>>>>>>>>>>>>>>> party
>> >> >>>>>>>>>>>>>>>>>>>>>>> needs to do is just pass configuration to
>> >> >>>>>>>>>>>>>>>> ExecutionEnvironment
>> >> >>>>>>>>>>>>>>>>>> and
>> >> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment will do the right
>> >> >> thing
>> >> >>>>> based
>> >> >>>>>> on
>> >> >>>>>>>> the
>> >> >>>>>>>>>>>>>>>>>>>>> configuration.
>> >> >>>>>>>>>>>>>>>>>>>>>>> Flink cli can also be considered as flink
>> >> >> api
>> >> >>>>>>> consumer.
>> >> >>>>>>>>>>> it
>> >> >>>>>>>>>>>>>>>> just
>> >> >>>>>>>>>>>>>>>>>>> pass
>> >> >>>>>>>>>>>>>>>>>>>>> the
>> >> >>>>>>>>>>>>>>>>>>>>>>> configuration to ExecutionEnvironment and
>> >> >> let
>> >> >>>>>>>>>>>>>>>>>> ExecutionEnvironment
>> >> >>>>>>>>>>>>>>>>>>> to
>> >> >>>>>>>>>>>>>>>>>>>>>>> create the proper ClusterClient instead of
>> >> >>>>> letting
>> >> >>>>>>> cli
>> >> >>>>>>>> to
>> >> >>>>>>>>>>>>>>>>> create
>> >> >>>>>>>>>>>>>>>>>>>>>>> ClusterClient directly.
>> >> >>>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>>> 6 would involve large code refactoring, so
>> >> >> I
>> >> >>>>> think
>> >> >>>>>> we
>> >> >>>>>>>> can
>> >> >>>>>>>>>>>>>>>> defer
>> >> >>>>>>>>>>>>>>>>>> it
>> >> >>>>>>>>>>>>>>>>>>>> for
>> >> >>>>>>>>>>>>>>>>>>>>>>> future release, 1,2,3,4,5 could be done at
>> >> >>>> once I
>> >> >>>>>>>>>>> believe.
>> >> >>>>>>>>>>>>>>>> Let
>> >> >>>>>>>>>>>>>>>>> me
>> >> >>>>>>>>>>>>>>>>>>>> know
>> >> >>>>>>>>>>>>>>>>>>>>>> your
>> >> >>>>>>>>>>>>>>>>>>>>>>> comments and feedback, thanks
>> >> >>>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>>> --
>> >> >>>>>>>>>>>>>>>>>>>>>>> Best Regards
>> >> >>>>>>>>>>>>>>>>>>>>>>>
>> >> >>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang

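The JobListener callback proposed in the quoted mail above can be exercised end-to-end with a small sketch. The types here are deliberately simplified so the example is self-contained: String stands in for Flink's JobID, and MockEnvironment with its addJobListener hook is a hypothetical stand-in for ExecutionEnvironment, not an actual Flink API.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified form of the proposed listener (String instead of JobID).
interface JobListener {
    void onJobSubmitted(String jobId);
    void onJobCanceled(String jobId);
}

// Hypothetical environment that notifies listeners around submit/cancel.
class MockEnvironment {
    private final List<JobListener> listeners = new ArrayList<>();
    private int counter = 0;

    void addJobListener(JobListener listener) { listeners.add(listener); }

    String submit() {
        String jobId = "job-" + (++counter);
        listeners.forEach(l -> l.onJobSubmitted(jobId));
        return jobId;
    }

    void cancel(String jobId) {
        listeners.forEach(l -> l.onJobCanceled(jobId));
    }
}

public class JobListenerSketch {
    public static void main(String[] args) {
        MockEnvironment env = new MockEnvironment();
        List<String> events = new ArrayList<>();
        env.addJobListener(new JobListener() {
            public void onJobSubmitted(String jobId) { events.add("submitted:" + jobId); }
            public void onJobCanceled(String jobId) { events.add("canceled:" + jobId); }
        });
        String jobId = env.submit();  // Zeppelin-style: capture the id...
        env.cancel(jobId);            // ...and cancel later when necessary
        System.out.println(events);   // [submitted:job-1, canceled:job-1]
    }
}
```

This mirrors the Zeppelin use case from the mail: the notebook captures the jobId in onJobSubmitted and can issue a cancel from the same environment later.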
Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Flavio Pompermaier <po...@okkam.it>.
In my opinion the client should not use any environment to get the job
graph, because the jar should reside ONLY on the cluster (and not on the
client classpath, otherwise there are always inconsistencies between the
client's and the Flink Job Manager's classpaths).
In the YARN, Mesos and Kubernetes scenarios you have the jar, and you could
start a cluster that has the jar on the Job Manager as well (but this is
the only case where I think you can assume that the client has the jar on
the classpath... in the REST job submission you don't have any classpath).

Thus, always in my opinion, the JobGraph should be generated by the Job
Manager REST API.


On Tue, Aug 20, 2019 at 9:00 AM Zili Chen <wa...@gmail.com> wrote:

> I would like to involve Till & Stephan here to clarify some concepts of
> per-job mode.
>
> The term per-job is one of the modes a cluster can run in. It is mainly
> aimed at spawning a dedicated cluster for a specific job; the job can be
> packaged with Flink itself so that the cluster is initialized with the job,
> getting rid of a separate submission step.
>
> This is useful for container deployments where one creates an image with
> the job and then simply deploys the container.
>
> However, it is out of client scope, since a client (ClusterClient for
> example) is for communicating with an existing cluster and performing
> actions. Currently, in per-job mode, we extract the job graph and bundle it
> into the cluster deployment, and thus no client concept gets involved. It
> looks reasonable to exclude the deployment of the per-job cluster from the
> client api and use dedicated utility classes (deployers) for deployment.
>
> On Tue, Aug 20, 2019 at 12:37 PM, Zili Chen <wa...@gmail.com> wrote:
>
> > Hi Aljoscha,
> >
> > Thanks for your reply and participation. The Google Doc you linked to
> > requires permission; I think you could use a share link instead.
> >
> > I agree that we have almost reached a consensus that a JobClient is
> > necessary to interact with a running Job.
> >
> > Let me check your open questions one by one.
> >
> > 1. Separate cluster creation and job submission for per-job mode.
> >
> > As you mentioned, here is where the opinions diverge. In my document there
> > is an alternative[2] that proposes excluding per-job deployment from the
> > client api scope, and now I find it more reasonable to do the exclusion.
> >
> > When in per-job mode, a dedicated JobCluster is launched to execute the
> > specific job. It is more like a Flink Application than a submission of a
> > Flink Job. The client only takes care of job submission and assumes there
> > is an existing cluster. In this way we are able to consider per-job issues
> > individually, and JobClusterEntrypoint would be the utility class for
> > per-job deployment.
> >
> > Nevertheless, the user program works in both session mode and per-job mode
> > without needing to change code. The JobClient in per-job mode is returned
> > from env.execute as normal. However, it would no longer be a wrapper of
> > RestClusterClient but a wrapper of PerJobClusterClient which communicates
> > with the Dispatcher locally.
> >
> > 2. How to deal with plan preview.
> >
> > With env.compile functions, users can get the JobGraph or FlinkPlan and
> > thus preview the plan programmatically. Typically it looks like
> >
> > if (preview configured) {
> >     FlinkPlan plan = env.compile();
> >     new JSONDumpGenerator(...).dump(plan);
> > } else {
> >     env.execute();
> > }
> >
> > And `flink info` would no longer be valid.
> >
> > 3. How to deal with Jar Submission at the Web Frontend.
> >
> > There is one more thread discussing this topic[1]. Apart from removing
> > the functions, there are two alternatives.
> >
> > One is to introduce an interface with a method that returns a
> > JobGraph/FlinkPlan, and have Jar Submission only support main classes
> > implementing this interface. Then we extract the JobGraph/FlinkPlan just
> > by calling that method. In this way, it is even possible to consider a
> > separation of job creation and job submission.
> >
> > The other is, as you mentioned, to let execute() do the actual execution.
> > We won't execute the main method in the WebFrontend but spawn a process
> > at the WebMonitor side to execute it. For the return part, we could
> > generate the JobID at the WebMonitor and pass it to the execution
> > environment.
> >
> > 4. How to deal with detached mode.
> >
> > I think detached mode is a temporary solution for non-blocking
> > submission. In my document, both submission and execution return a
> > CompletableFuture, and users control whether or not to wait for the
> > result. In this case we don't need a detached option, but the
> > functionality is covered.
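The future-based scheme described here — submission returns immediately, and the caller opts in to blocking — can be mimicked with plain JDK types. Note that submitJob and SimpleJobClient below are illustrative stand-ins, not Flink APIs; a real JobClient would wrap a ClusterClient talking to a cluster.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class FutureSubmissionSketch {

    // Stand-in for a job handle; holds the id and a future for the result.
    static class SimpleJobClient {
        final String jobId;
        final CompletableFuture<String> result;
        SimpleJobClient(String jobId, CompletableFuture<String> result) {
            this.jobId = jobId;
            this.result = result;
        }
    }

    // Non-blocking "submission": returns immediately with a handle whose
    // result future completes when the (simulated) job finishes.
    static CompletableFuture<SimpleJobClient> submitJob(String name) {
        CompletableFuture<String> result = CompletableFuture.supplyAsync(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(50); // pretend the job runs
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "FINISHED";
        });
        return CompletableFuture.completedFuture(
                new SimpleJobClient(name + "-id", result));
    }

    public static void main(String[] args) throws Exception {
        SimpleJobClient client = submitJob("wordcount").join(); // returns right away
        // A "detached" caller would stop here; an attached caller blocks by choice:
        String status = client.result.get(5, TimeUnit.SECONDS);
        System.out.println(client.jobId + " " + status); // wordcount-id FINISHED
    }
}
```

The design point from the mail is visible here: there is no detached/attached mode switch in the API itself; the same handle serves both styles, and blocking is the caller's decision.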
> >
> > 5. How does per-job mode interact with interactive programming.
> >
> > All of the YARN, Mesos and Kubernetes scenarios now follow the pattern of
> > launching a JobCluster. And I don't think there would be inconsistency
> > between different resource managers.
> >
> > Best,
> > tison.
> >
> > [1]
> >
> https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
> > [2]
> >
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
> >
> > On Fri, Aug 16, 2019 at 9:20 PM, Aljoscha Krettek <al...@apache.org> wrote:
> >
> >> Hi,
> >>
> >> I read both Jeff's initial design document and the newer document by
> >> Tison. I also finally found the time to collect our thoughts on the
> issue,
> >> I had quite some discussions with Kostas and this is the result: [1].
> >>
> >> I think overall we agree that this part of the code is in dire need of
> >> some refactoring/improvements but I think there are still some open
> >> questions and some differences in opinion what those refactorings should
> >> look like.
> >>
> >> I think the API-side is quite clear, i.e. we need some JobClient API
> that
> >> allows interacting with a running Job. It could be worthwhile to spin
> that
> >> off into a separate FLIP because we can probably find consensus on that
> >> part more easily.
> >>
> >> For the rest, the main open questions from our doc are these:
> >>
> >>   - Do we want to separate cluster creation and job submission for
> >> per-job mode? In the past, there were conscious efforts to *not*
> separate
> >> job submission from cluster creation for per-job clusters for Mesos,
> YARN,
> >> Kubernetes (see StandaloneJobClusterEntryPoint). Tison suggests in his
> >> design document to decouple this in order to unify job submission.
> >>
> >>   - How to deal with plan preview, which needs to hijack execute() and
> >> let the outside code catch an exception?
> >>
> >>   - How to deal with Jar Submission at the Web Frontend, which needs to
> >> hijack execute() and let the outside code catch an exception?
> >> CliFrontend.run() “hijacks” ExecutionEnvironment.execute() to get a
> >> JobGraph and then execute that JobGraph manually. We could get around
> that
> >> by letting execute() do the actual execution. One caveat for this is
> that
> >> now the main() method doesn’t return (or is forced to return by
> throwing an
> >> exception from execute()) which means that for Jar Submission from the
> >> WebFrontend we have a long-running main() method running in the
> >> WebFrontend. This doesn’t sound very good. We could get around this by
> >> removing the plan preview feature and by removing Jar
> Submission/Running.
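The "hijacking" pattern Aljoscha describes — execute() aborting the user's main() with a control-flow exception so the framework can catch it and recover the compiled plan without running the job — can be sketched generically. PlanCaptureException and CapturingEnvironment are hypothetical names for illustration; they are not Flink's actual internals.

```java
// Control-flow exception carrying the "compiled plan" out of main().
class PlanCaptureException extends RuntimeException {
    final String capturedPlan;
    PlanCaptureException(String plan) { this.capturedPlan = plan; }
}

// Environment whose execute() compiles but never runs the job.
class CapturingEnvironment {
    String execute() {
        // Instead of executing, abort main() with the plan attached.
        throw new PlanCaptureException("{\"nodes\": [\"source\", \"sink\"]}");
    }
}

public class HijackSketch {
    // Stand-in for a user main() that never expects to be interrupted.
    static void userMain(CapturingEnvironment env) {
        env.execute();
        System.out.println("never reached");
    }

    public static void main(String[] args) {
        try {
            userMain(new CapturingEnvironment());
        } catch (PlanCaptureException e) {
            // The framework recovers the plan without executing the job.
            System.out.println(e.capturedPlan);
        }
    }
}
```

The drawback discussed in the thread is visible in miniature: the user's main() is cut short by an exception it never threw, which is exactly why letting execute() do the real execution would make plan preview and Jar Submission awkward.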
> >>
> >>   - How to deal with detached mode? Right now, DetachedEnvironment will
> >> execute the job and return immediately. If users control when they want
> to
> >> return, by waiting on the job completion future, how do we deal with
> this?
> >> Do we simply remove the distinction between detached/non-detached?
> >>
> >>   - How does per-job mode interact with “interactive programming”
> >> (FLIP-36). For YARN, each execute() call could spawn a new Flink YARN
> >> cluster. What about Mesos and Kubernetes?
> >>
> >> The first open question is where the opinions diverge, I think. The rest
> >> are just open questions and interesting things that we need to consider.
> >>
> >> Best,
> >> Aljoscha
> >>
> >> [1]
> >>
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> >>
> >> > On 31. Jul 2019, at 15:23, Jeff Zhang <zj...@gmail.com> wrote:
> >> >
> >> > Thanks tison for the effort. I left a few comments.
> >> >
> >> >
> >> > On Wed, Jul 31, 2019 at 8:24 PM, Zili Chen <wa...@gmail.com> wrote:
> >> >
> >> >> Hi Flavio,
> >> >>
> >> >> Thanks for your reply.
> >> >>
> >> In both the current implementation and the design, ClusterClient
> >> never takes responsibility for generating the JobGraph.
> >> (What you see in the current codebase is several class methods.)
> >>
> >> Instead, the user describes the program in the main method
> >> with the ExecutionEnvironment apis and calls env.compile()
> >> or env.optimize() to get the FlinkPlan or JobGraph respectively.
> >>
> >> For listing the main classes in a jar and choosing one for
> >> submission, you are already able to customize a CLI to do it.
> >> Specifically, the path of the jar is passed as an argument, and
> >> in the customized CLI you list the main classes and choose one
> >> to submit to the cluster.
> >> >>
> >> >> Best,
> >> >> tison.
> >> >>
> >> >>
> >> >> On Wed, Jul 31, 2019 at 8:12 PM, Flavio Pompermaier <po...@okkam.it> wrote:
> >> >>
> >>> Just one note on my side: it is not clear to me whether the client
> >>> needs to be able to generate a job graph or not.
> >>> In my opinion, the job jar must reside only on the server/jobManager
> >>> side, and the client requires a way to get the job graph.
> >>> If you really want access to the job graph, I'd add dedicated
> >>> methods on the ClusterClient, like:
> >>>
> >>>   - getJobGraph(jarId, mainClass): JobGraph
> >>>   - listMainClasses(jarId): List<String>
> >>>
> >>> These would require some additions on the job manager endpoints as
> >>> well... what do you think?
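Flavio's two proposed methods could be drafted as an interface like the following. To be clear about assumptions: neither method exists on Flink's ClusterClient, JarAwareClusterClient is a hypothetical name, and JobGraphStub is a placeholder for Flink's JobGraph so the sketch compiles standalone.

```java
import java.util.Arrays;
import java.util.List;

// Placeholder for Flink's JobGraph, so the sketch is self-contained.
class JobGraphStub {
    final String jarId;
    final String mainClass;
    JobGraphStub(String jarId, String mainClass) {
        this.jarId = jarId;
        this.mainClass = mainClass;
    }
}

// Hypothetical client-side view of the proposal: the jar stays on the
// cluster and the client only refers to it by id, as suggested above.
interface JarAwareClusterClient {
    JobGraphStub getJobGraph(String jarId, String mainClass);
    List<String> listMainClasses(String jarId);
}

public class JarAwareClientSketch implements JarAwareClusterClient {
    @Override
    public JobGraphStub getJobGraph(String jarId, String mainClass) {
        // A real implementation would ask the JobManager REST API to
        // compile the graph from the already-uploaded jar.
        return new JobGraphStub(jarId, mainClass);
    }

    @Override
    public List<String> listMainClasses(String jarId) {
        // A real implementation would scan the jar server-side; the names
        // returned here are dummy values for the sketch.
        return Arrays.asList("org.example.WordCount", "org.example.PageRank");
    }

    public static void main(String[] args) {
        JarAwareClusterClient client = new JarAwareClientSketch();
        String chosen = client.listMainClasses("jar-42").get(0);
        System.out.println(client.getJobGraph("jar-42", chosen).mainClass);
    }
}
```

This keeps the jar and the graph compilation on the cluster side, which is the point of the proposal: the client never needs the job's classes on its own classpath.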
> >> >>>
> >> >>> On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <wa...@gmail.com>
> >> wrote:
> >> >>>
> >> >>>> Hi all,
> >> >>>>
> >> >>>> Here is a document[1] on client api enhancement from our
> perspective.
> >> >>>> We have investigated current implementations. And we propose
> >> >>>>
> >> >>>> 1. Unify the implementation of cluster deployment and job
> submission
> >> in
> >> >>>> Flink.
> >> >>>> 2. Provide programmatic interfaces to allow flexible job and
> cluster
> >> >>>> management.
> >> >>>>
> >> >>>> The first proposal is aimed at reducing code paths of cluster
> >> >> deployment
> >> >>>> and
> >> >>>> job submission so that one can adopt Flink in his usage easily. The
> >> >>> second
> >> >>>> proposal is aimed at providing rich interfaces for advanced users
> >> >>>> who want precise control of these stages.
> >> >>>>
> >> >>>> Quick reference on open questions:
> >> >>>>
> >> >>>> 1. Exclude job cluster deployment from client side or redefine the
> >> >>> semantic
> >> >>>> of job cluster? Since it fits in a process quite different from
> >> session
> >> >>>> cluster deployment and job submission.
> >> >>>>
> >> >>>> 2. Maintain the codepaths handling class o.a.f.api.common.Program
> or
> >> >>>> implement customized program handling logic by customized
> >> CliFrontend?
> >> >>>> See also this thread[2] and the document[1].
> >> >>>>
> >> >>>> 3. Expose ClusterClient as public api or just expose api in
> >> >>>> ExecutionEnvironment
> >> >>>> and delegate them to ClusterClient? Further, in either way is it
> >> worth
> >> >> to
> >> >>>> introduce a JobClient which is an encapsulation of ClusterClient
> that
> >> >>>> is associated with a specific job?
> >> >>>>
> >> >>>> Best,
> >> >>>> tison.
> >> >>>>
> >> >>>> [1]
> >> >>>>
> >> >>>>
> >> >>>
> >> >>
> >>
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
> >> >>>> [2]
> >> >>>>
> >> >>>>
> >> >>>
> >> >>
> >>
> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
> >> >>>>
> >> >>>> On Wed, Jul 24, 2019 at 9:19 AM, Jeff Zhang <zj...@gmail.com> wrote:
> >> >>>>
> >> >>>>> Thanks Stephan, I will follow up on this issue in the next few
> >> >>>>> weeks and will refine the design doc. We can discuss more details
> >> >>>>> after the 1.9 release.
> >> >>>>>
> >> >>>>> On Wed, Jul 24, 2019 at 12:58 AM, Stephan Ewen <se...@apache.org> wrote:
> >> >>>>>
> >> >>>>>> Hi all!
> >> >>>>>>
> >> >>>>>> This thread has stalled for a bit, which I assume is mostly due
> >> >>>>>> to the Flink 1.9 feature freeze and release testing effort.
> >> >>>>>>
> >> >>>>>> I personally still recognize this issue as an important one to
> >> >>>>>> be solved. I'd be happy to help resume this discussion soon
> >> >>>>>> (after the 1.9 release) and see if we can make some steps
> >> >>>>>> towards this in Flink 1.10.
> >> >>>>>>
> >> >>>>>> Best,
> >> >>>>>> Stephan
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <
> >> >>>>> pompermaier@okkam.it>
> >> >>>>>> wrote:
> >> >>>>>>
> >> >>>>>>> That's exactly what I suggested a long time ago: the Flink REST
> >> >>>>>>> client should not require any Flink dependency, only an http
> >> >>>>>>> library to call the REST services to submit and monitor a job.
> >> >>>>>>> What I also suggested in [1] was to have a way to automatically
> >> >>>>>>> suggest to the user (via a UI) the available main classes and
> >> >>>>>>> their required parameters [2].
> >> >>>>>>> Another problem we have with Flink is that the REST client and
> >> >>>>>>> the CLI one behave differently, and we use the CLI client (via
> >> >>>>>>> ssh) because it allows calling another method after
> >> >>>>>>> env.execute() [3] (we have to call another REST service to
> >> >>>>>>> signal the end of the job).
> >> >>>>>>> In this regard, a dedicated interface, like the JobListener
> >> >>>>>>> suggested in the previous emails, would be very helpful (IMHO).
> >> >>>>>>>
> >> >>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-10864
> >> >>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-10862
> >> >>>>>>> [3] https://issues.apache.org/jira/browse/FLINK-10879
> >> >>>>>>>
> >> >>>>>>> Best,
> >> >>>>>>> Flavio
> >> >>>>>>>
> >> >>>>>>> On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zj...@gmail.com>
> >> >>> wrote:
> >> >>>>>>>
> >> >>>>>>>> Hi, Tison,
> >> >>>>>>>>
> >> >>>>>>>> Thanks for your comments. Overall I agree with you that it is
> >> >>>>> difficult
> >> >>>>>>> for
> >> >>>>>>>> down stream project to integrate with flink and we need to
> >> >>> refactor
> >> >>>>> the
> >> >>>>>>>> current flink client api.
> >> >>>>>>>> And I agree that CliFrontend should only parsing command line
> >> >>>>> arguments
> >> >>>>>>> and
> >> >>>>>>>> then pass them to ExecutionEnvironment. It is
> >> >>>> ExecutionEnvironment's
> >> >>>>>>>> responsibility to compile job, create cluster, and submit job.
> >> >>>>> Besides
> >> >>>>>>>> that, Currently flink has many ExecutionEnvironment
> >> >>>> implementations,
> >> >>>>>> and
> >> >>>>>>>> flink will use the specific one based on the context. IMHO, it
> >> >> is
> >> >>>> not
> >> >>>>>>>> necessary, ExecutionEnvironment should be able to do the right
> >> >>>> thing
> >> >>>>>>> based
> >> >>>>>>>> on the FlinkConf it is received. Too many ExecutionEnvironment
> >> >>>>>>>> implementation is another burden for downstream project
> >> >>>> integration.
> >> >>>>>>>>
> >> >>>>>>>> One thing I'd like to mention is flink's scala shell and sql
> >> >>>>>>>> client: although they are sub-modules of flink, they could be
> >> >>>>>>>> treated as downstream projects which use flink's client api.
> >> >>>>>>>> Currently you will find it is not easy for them to integrate
> >> >>>>>>>> with flink, and they share a lot of duplicated code. It is
> >> >>>>>>>> another sign that we should refactor the flink client api.
> >> >>>>>>>>
> >> >>>>>>>> I believe it is a large and hard change, and I am afraid we
> >> >>>>>>>> cannot keep compatibility since many of the changes are
> >> >>>>>>>> user-facing.
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> Zili Chen <wa...@gmail.com> 于2019年6月24日周一 下午2:53写道:
> >> >>>>>>>>
> >> >>>>>>>>> Hi all,
> >> >>>>>>>>>
> >> >>>>>>>>> After a closer look at our client apis, I can see there are
> >> >>>>>>>>> two major issues affecting consistency and integration: the
> >> >>>>>>>>> deployment of a job cluster, which couples job graph creation
> >> >>>>>>>>> and cluster deployment, and submission via CliFrontend, which
> >> >>>>>>>>> confuses the control flow of job graph compilation and job
> >> >>>>>>>>> submission. I'd like to follow the discussion above, mainly the
> >> >>>>>>>>> process described by Jeff and Stephan, and share my ideas on
> >> >>>>>>>>> these issues.
> >> >>>>>>>>>
> >> >>>>>>>>> 1) CliFrontend confuses the control flow of job compilation
> >> >>>>>>>>> and submission. Following the process of job submission that
> >> >>>>>>>>> Stephan and Jeff described, the execution environment knows all
> >> >>>>>>>>> configs of the cluster and the topology/settings of the job.
> >> >>>>>>>>> Ideally, in the main method of the user program, it calls
> >> >>>>>>>>> #execute (or a method named #submit) and Flink deploys the
> >> >>>>>>>>> cluster, compiles the job graph, and submits it to the cluster.
> >> >>>>>>>>> However, the current CliFrontend does all these things inside
> >> >>>>>>>>> its #runProgram method, which introduces a lot of subclasses of
> >> >>>>>>>>> the (stream) execution environment.
> >> >>>>>>>>>
> >> >>>>>>>>> Actually, it sets up an exec env that hijacks the
> >> >>>>>>>>> #execute/executePlan method, initializes the job graph and
> >> >>>>>>>>> aborts execution. Control then flows back to CliFrontend, which
> >> >>>>>>>>> deploys the cluster (or retrieves the client) and submits the
> >> >>>>>>>>> job graph. This is a very specific internal process inside
> >> >>>>>>>>> Flink, consistent with nothing else.
> >> >>>>>>>>>
> >> >>>>>>>>> 2) Deployment of job cluster couples job graph creation and
> >> >>>> cluster
> >> >>>>>>>>> deployment. Abstractly, from user job to a concrete
> >> >> submission,
> >> >>>> it
> >> >>>>>>>> requires
> >> >>>>>>>>>
> >> >>>>>>>>>     create JobGraph --\
> >> >>>>>>>>>
> >> >>>>>>>>> create ClusterClient -->  submit JobGraph
> >> >>>>>>>>>
> >> >>>>>>>>> such a dependency. The ClusterClient is created by deploying
> >> >>>>>>>>> or retrieving. JobGraph submission requires a compiled JobGraph
> >> >>>>>>>>> and a valid ClusterClient, but the creation of the
> >> >>>>>>>>> ClusterClient is abstractly independent of that of the
> >> >>>>>>>>> JobGraph. However, in job cluster mode, we deploy the job
> >> >>>>>>>>> cluster with a job graph, which means we use another process:
> >> >>>>>>>>>
> >> >>>>>>>>> create JobGraph --> deploy cluster with the JobGraph
> >> >>>>>>>>>
> >> >>>>>>>>> Here is another inconsistency, and downstream projects/client
> >> >>>>>>>>> apis are forced to handle the different cases with little
> >> >>>>>>>>> support from Flink.
> >> >>>>>>>>>
> >> >>>>>>>>> Since we likely reached a consensus on
> >> >>>>>>>>>
> >> >>>>>>>>> 1. all configs gathered by Flink configuration and passed
> >> >>>>>>>>> 2. execution environment knows all configs and handles
> >> >>>>> execution(both
> >> >>>>>>>>> deployment and submission)
> >> >>>>>>>>>
> >> >>>>>>>>> for the issues above I propose eliminating the inconsistencies
> >> >>>>>>>>> by the following approach:
> >> >>>>>>>>>
> >> >>>>>>>>> 1) CliFrontend should be exactly a front end, at least for the
> >> >>>>>>>>> "run" command. That means it just gathers all configs from the
> >> >>>>>>>>> command line and passes them to the main method of the user
> >> >>>>>>>>> program. The execution environment knows all the info, and with
> >> >>>>>>>>> an added utility for ClusterClient we gracefully get a
> >> >>>>>>>>> ClusterClient by deploying or retrieving. In this way, we don't
> >> >>>>>>>>> need to hijack the #execute/executePlan methods and can remove
> >> >>>>>>>>> the various hacking subclasses of exec env, as well as the #run
> >> >>>>>>>>> methods in ClusterClient (for an interface-ized ClusterClient).
> >> >>>>>>>>> Now the control flow goes from CliFrontend to the main method
> >> >>>>>>>>> and never returns.
> >> >>>>>>>>>
> >> >>>>>>>>> 2) A job cluster means a cluster for a specific job. From
> >> >>>>>>>>> another perspective, it is an ephemeral session. We may
> >> >>>>>>>>> decouple the deployment from a compiled job graph, and instead
> >> >>>>>>>>> start a session with an idle timeout and submit the job
> >> >>>>>>>>> afterwards.
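As an illustration of the "job cluster as an ephemeral session" idea above, here is a tiny sketch: deploy a session with no job graph attached, submit the one job afterwards, and tear the session down once it goes idle. All names are assumptions for discussion, and a real idle timeout would of course be a delayed check rather than an immediate one.

```java
// Toy model of an ephemeral session; not real Flink code.
public class EphemeralSessionDemo {
    static class Session {
        private boolean running = true;
        private int activeJobs = 0;

        // Submission is decoupled from deployment: the session already exists.
        String submit(String jobGraph) { activeJobs++; return "job-0"; }

        void jobFinished() { activeJobs--; maybeShutdown(); }

        // A real implementation would wait out an idle timeout here.
        private void maybeShutdown() { if (activeJobs == 0) running = false; }

        boolean isRunning() { return running; }
    }

    public static void main(String[] args) {
        Session session = new Session();         // deploy cluster, no job graph needed
        String jobId = session.submit("graph");  // submit the job afterwards
        session.jobFinished();                   // job done -> session tears down
        System.out.println(jobId + " done, session running: " + session.isRunning());
    }
}
```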
> >> >>>>>>>>>
> >> >>>>>>>>> Before we go into more details on design or implementation,
> >> >>>>>>>>> these topics are better brought to everyone's attention and
> >> >>>>>>>>> discussed toward a consensus.
> >> >>>>>>>>>
> >> >>>>>>>>> Best,
> >> >>>>>>>>> tison.
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>> Zili Chen <wa...@gmail.com> 于2019年6月20日周四 上午3:21写道:
> >> >>>>>>>>>
> >> >>>>>>>>>> Hi Jeff,
> >> >>>>>>>>>>
> >> >>>>>>>>>> Thanks for raising this thread and the design document!
> >> >>>>>>>>>>
> >> >>>>>>>>>> As @Thomas Weise mentioned above, extending config handling
> >> >>>>>>>>>> in flink requires far more effort than it should. Another
> >> >>>>>>>>>> example is that we achieve detached mode by introducing
> >> >>>>>>>>>> another execution environment which also hijacks the #execute
> >> >>>>>>>>>> method.
> >> >>>>>>>>>>
> >> >>>>>>>>>> I agree with your idea that the user would configure all
> >> >>>>>>>>>> things and flink would "just" respect it. On this topic I
> >> >>>>>>>>>> think the unusual control flow when CliFrontend handles the
> >> >>>>>>>>>> "run" command is the problem. It handles several configs,
> >> >>>>>>>>>> mainly about cluster settings, and thus the main method of the
> >> >>>>>>>>>> user program is unaware of them. Also it compiles the app to a
> >> >>>>>>>>>> job graph by running the main method with a hijacked exec env,
> >> >>>>>>>>>> which constrains the main method further.
> >> >>>>>>>>>>
> >> >>>>>>>>>> I'd like to write down a few notes on how configs/args are
> >> >>>>>>>>>> passed and respected, as well as on decoupling job compilation
> >> >>>>>>>>>> and submission, and share them on this thread later.
> >> >>>>>>>>>>
> >> >>>>>>>>>> Best,
> >> >>>>>>>>>> tison.
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> SHI Xiaogang <sh...@gmail.com> 于2019年6月17日周一
> >> >> 下午7:29写道:
> >> >>>>>>>>>>
> >> >>>>>>>>>>> Hi Jeff and Flavio,
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Thanks Jeff a lot for proposing the design document.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> We are also working on refactoring ClusterClient to allow
> >> >>>>> flexible
> >> >>>>>>> and
> >> >>>>>>>>>>> efficient job management in our real-time platform.
> >> >>>>>>>>>>> We would like to draft a document to share our ideas with
> >> >>> you.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> I think it's a good idea to have something like Apache Livy
> >> >>>>>>>>>>> for Flink, and the efforts discussed here are a great step
> >> >>>>>>>>>>> toward it.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Regards,
> >> >>>>>>>>>>> Xiaogang
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Flavio Pompermaier <po...@okkam.it> 于2019年6月17日周一
> >> >>>>> 下午7:13写道:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>> Is there any possibility to have something like Apache
> >> >> Livy
> >> >>>> [1]
> >> >>>>>>> also
> >> >>>>>>>>>>> for
> >> >>>>>>>>>>>> Flink in the future?
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> [1] https://livy.apache.org/
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <
> >> >>> zjffdu@gmail.com
> >> >>>>>
> >> >>>>>>> wrote:
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>> Any API we expose should not have dependencies on
> >> >>> the
> >> >>>>>>> runtime
> >> >>>>>>>>>>>>> (flink-runtime) package or other implementation
> >> >> details.
> >> >>> To
> >> >>>>> me,
> >> >>>>>>>> this
> >> >>>>>>>>>>>> means
> >> >>>>>>>>>>>>> that the current ClusterClient cannot be exposed to
> >> >> users
> >> >>>>>> because
> >> >>>>>>>> it
> >> >>>>>>>>>>>> uses
> >> >>>>>>>>>>>>> quite some classes from the optimiser and runtime
> >> >>> packages.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> We should change ClusterClient from a class to an
> >> >>>>>>>>>>>>> interface. ExecutionEnvironment would only use the
> >> >>>>>>>>>>>>> ClusterClient interface, which should live in
> >> >>>>>>>>>>>>> flink-clients, while the concrete implementation class
> >> >>>>>>>>>>>>> could be in flink-runtime.
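A rough sketch of what that split could look like: an interface with no runtime/optimiser dependencies (the flink-clients side), plus a concrete implementation behind it (the flink-runtime side). The method names and the trivial in-memory implementation below are illustrative assumptions, not the real ClusterClient API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical interface-ized ClusterClient; exposes no runtime internals.
interface ClusterClient {
    CompletableFuture<String> submitJob(String jobGraph);  // returns a job id
    CompletableFuture<String> getJobStatus(String jobId);
    CompletableFuture<Void> cancelJob(String jobId);
}

// Trivial in-memory implementation standing in for a REST/YARN-backed one.
class LocalClusterClient implements ClusterClient {
    private final Map<String, String> jobs = new HashMap<>();

    public CompletableFuture<String> submitJob(String jobGraph) {
        String jobId = "job-" + jobs.size();
        jobs.put(jobId, "RUNNING");
        return CompletableFuture.completedFuture(jobId);
    }
    public CompletableFuture<String> getJobStatus(String jobId) {
        return CompletableFuture.completedFuture(jobs.getOrDefault(jobId, "UNKNOWN"));
    }
    public CompletableFuture<Void> cancelJob(String jobId) {
        jobs.put(jobId, "CANCELED");
        return CompletableFuture.completedFuture(null);
    }
}

public class ClusterClientDemo {
    public static void main(String[] args) {
        ClusterClient client = new LocalClusterClient();  // code against the interface only
        String jobId = client.submitJob("graph").join();
        System.out.println(client.getJobStatus(jobId).join()); // RUNNING
        client.cancelJob(jobId).join();
        System.out.println(client.getJobStatus(jobId).join()); // CANCELED
    }
}
```

Downstream projects would then depend only on the interface, while the concrete class can keep using optimiser/runtime internals.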
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>> What happens when a failure/restart in the client
> >> >>>>> happens?
> >> >>>>>>>> There
> >> >>>>>>>>>>> need
> >> >>>>>>>>>>>>> to be a way of re-establishing the connection to the
> >> >> job,
> >> >>>> set
> >> >>>>>> up
> >> >>>>>>>> the
> >> >>>>>>>>>>>>> listeners again, etc.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Good point. First we need to define what failure/restart
> >> >>>>>>>>>>>>> in the client means. IIUC, that usually means a network
> >> >>>>>>>>>>>>> failure, which will surface in the RestClient class. If my
> >> >>>>>>>>>>>>> understanding is correct, the restart/retry mechanism
> >> >>>>>>>>>>>>> should be done in the RestClient.
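For illustration, the retry mechanism being discussed could be as simple as a bounded-retry wrapper of the following shape (this is a generic sketch of the idea, not Flink's actual RestClient code; a real client would also back off between attempts and re-register listeners after reconnecting):

```java
import java.util.function.Supplier;

// Generic bounded-retry helper of the kind a REST client could use to
// re-establish a connection after transient network failures.
public class RetryDemo {
    static <T> T retry(Supplier<T> action, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                // a real client would back off here before re-connecting
            }
        }
        if (last == null) throw new IllegalArgumentException("maxAttempts must be >= 1");
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails twice, then succeeds -- simulating a flaky connection.
        String result = retry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("connection reset");
            return "reconnected";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```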
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Aljoscha Krettek <al...@apache.org> 于2019年6月11日周二
> >> >>>>>> 下午11:10写道:
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>>> Some points to consider:
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>> * Any API we expose should not have dependencies on
> >> >> the
> >> >>>>>> runtime
> >> >>>>>>>>>>>>>> (flink-runtime) package or other implementation
> >> >>> details.
> >> >>>> To
> >> >>>>>> me,
> >> >>>>>>>>>>> this
> >> >>>>>>>>>>>>> means
> >> >>>>>>>>>>>>>> that the current ClusterClient cannot be exposed to
> >> >>> users
> >> >>>>>>> because
> >> >>>>>>>>>>> it
> >> >>>>>>>>>>>>> uses
> >> >>>>>>>>>>>>>> quite some classes from the optimiser and runtime
> >> >>>> packages.
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>> * What happens when a failure/restart in the client
> >> >>>>> happens?
> >> >>>>>>>> There
> >> >>>>>>>>>>> need
> >> >>>>>>>>>>>>> to
> >> >>>>>>>>>>>>>> be a way of re-establishing the connection to the
> >> >> job,
> >> >>>> set
> >> >>>>> up
> >> >>>>>>> the
> >> >>>>>>>>>>>>> listeners
> >> >>>>>>>>>>>>>> again, etc.
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>> Aljoscha
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>> On 29. May 2019, at 10:17, Jeff Zhang <
> >> >>>> zjffdu@gmail.com>
> >> >>>>>>>> wrote:
> >> >>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>> Sorry folks, the design doc is late as you
> >> >> expected.
> >> >>>>> Here's
> >> >>>>>>> the
> >> >>>>>>>>>>>> design
> >> >>>>>>>>>>>>>> doc
> >> >>>>>>>>>>>>>>> I drafted, welcome any comments and feedback.
> >> >>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >>
> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> >> >>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org> 于2019年2月14日周四
> >> >>>> 下午8:43写道:
> >> >>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>> Nice that this discussion is happening.
> >> >>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>> In the FLIP, we could also revisit the entire role
> >> >>> of
> >> >>>>> the
> >> >>>>>>>>>>>> environments
> >> >>>>>>>>>>>>>>>> again.
> >> >>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>> Initially, the idea was:
> >> >>>>>>>>>>>>>>>> - the environments take care of the specific
> >> >> setup
> >> >>>> for
> >> >>>>>>>>>>> standalone
> >> >>>>>>>>>>>> (no
> >> >>>>>>>>>>>>>>>> setup needed), yarn, mesos, etc.
> >> >>>>>>>>>>>>>>>> - the session ones have control over the session.
> >> >>> The
> >> >>>>>>>>>>> environment
> >> >>>>>>>>>>>>> holds
> >> >>>>>>>>>>>>>>>> the session client.
> >> >>>>>>>>>>>>>>>> - running a job gives a "control" object for that
> >> >>>> job.
> >> >>>>>> That
> >> >>>>>>>>>>>> behavior
> >> >>>>>>>>>>>>> is
> >> >>>>>>>>>>>>>>>> the same in all environments.
> >> >>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>> The actual implementation diverged quite a bit
> >> >> from
> >> >>>>> that.
> >> >>>>>>>> Happy
> >> >>>>>>>>>>> to
> >> >>>>>>>>>>>>> see a
> >> >>>>>>>>>>>>>>>> discussion about straightening this out a bit more.
> >> >>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <
> >> >>>>>>> zjffdu@gmail.com>
> >> >>>>>>>>>>>> wrote:
> >> >>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>> Hi folks,
> >> >>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>> Sorry for the late response. It seems we have reached
> >> >>>>>>>>>>>>>>>>> consensus on this; I will create a FLIP for this with a
> >> >>>>>>>>>>>>>>>>> more detailed design.
> >> >>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>> Thomas Weise <th...@apache.org> 于2018年12月21日周五
> >> >>>>> 上午11:43写道:
> >> >>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>> Great to see this discussion seeded! The
> >> >> problems
> >> >>>> you
> >> >>>>>> face
> >> >>>>>>>>>>> with
> >> >>>>>>>>>>>> the
> >> >>>>>>>>>>>>>>>>>> Zeppelin integration are also affecting other
> >> >>>>> downstream
> >> >>>>>>>>>>> projects,
> >> >>>>>>>>>>>>>> like
> >> >>>>>>>>>>>>>>>>>> Beam.
> >> >>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>> We just enabled the savepoint restore option in
> >> >>>>>>>>>>>>>> RemoteStreamEnvironment
> >> >>>>>>>>>>>>>>>>> [1]
> >> >>>>>>>>>>>>>>>>>> and that was more difficult than it should be.
> >> >> The
> >> >>>>> main
> >> >>>>>>>> issue
> >> >>>>>>>>>>> is
> >> >>>>>>>>>>>>> that
> >> >>>>>>>>>>>>>>>>>> environment and cluster client aren't decoupled.
> >> >>>>> Ideally
> >> >>>>>>> it
> >> >>>>>>>>>>> should
> >> >>>>>>>>>>>>> be
> >> >>>>>>>>>>>>>>>>>> possible to just get the matching cluster client
> >> >>>> from
> >> >>>>>> the
> >> >>>>>>>>>>>>> environment
> >> >>>>>>>>>>>>>>>> and
> >> >>>>>>>>>>>>>>>>>> then control the job through it (environment as
> >> >>>>> factory
> >> >>>>>>> for
> >> >>>>>>>>>>>> cluster
> >> >>>>>>>>>>>>>>>>>> client). But note that the environment classes
> >> >> are
> >> >>>>> part
> >> >>>>>> of
> >> >>>>>>>> the
> >> >>>>>>>>>>>>> public
> >> >>>>>>>>>>>>>>>>> API,
> >> >>>>>>>>>>>>>>>>>> and it is not straightforward to make larger
> >> >>> changes
> >> >>>>>>> without
> >> >>>>>>>>>>>>> breaking
> >> >>>>>>>>>>>>>>>>>> backward compatibility.
> >> >>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>> ClusterClient currently exposes internal classes
> >> >>>> like
> >> >>>>>>>>>>> JobGraph and
> >> >>>>>>>>>>>>>>>>>> StreamGraph. But it should be possible to wrap
> >> >>> this
> >> >>>>>> with a
> >> >>>>>>>> new
> >> >>>>>>>>>>>>> public
> >> >>>>>>>>>>>>>>>> API
> >> >>>>>>>>>>>>>>>>>> that brings the required job control
> >> >> capabilities
> >> >>>> for
> >> >>>>>>>>>>> downstream
> >> >>>>>>>>>>>>>>>>> projects.
> >> >>>>>>>>>>>>>>>>>> Perhaps it is helpful to look at some of the
> >> >>>>> interfaces
> >> >>>>>> in
> >> >>>>>>>>>>> Beam
> >> >>>>>>>>>>>>> while
> >> >>>>>>>>>>>>>>>>>> thinking about this: [2] for the portable job
> >> >> API
> >> >>>> and
> >> >>>>>> [3]
> >> >>>>>>>> for
> >> >>>>>>>>>>> the
> >> >>>>>>>>>>>>> old
> >> >>>>>>>>>>>>>>>>>> asynchronous job control from the Beam Java SDK.
> >> >>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>> The backward compatibility discussion [4] is
> >> >> also
> >> >>>>>> relevant
> >> >>>>>>>>>>> here. A
> >> >>>>>>>>>>>>> new
> >> >>>>>>>>>>>>>>>>> API
> >> >>>>>>>>>>>>>>>>>> should shield downstream projects from internals
> >> >>> and
> >> >>>>>> allow
> >> >>>>>>>>>>> them to
> >> >>>>>>>>>>>>>>>>>> interoperate with multiple future Flink versions
> >> >>> in
> >> >>>>> the
> >> >>>>>>> same
> >> >>>>>>>>>>>> release
> >> >>>>>>>>>>>>>>>> line
> >> >>>>>>>>>>>>>>>>>> without forced upgrades.
> >> >>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>> Thanks,
> >> >>>>>>>>>>>>>>>>>> Thomas
> >> >>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>> [1] https://github.com/apache/flink/pull/7249
> >> >>>>>>>>>>>>>>>>>> [2]
> >> >>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >>
> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> >> >>>>>>>>>>>>>>>>>> [3]
> >> >>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >>
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> >> >>>>>>>>>>>>>>>>>> [4]
> >> >>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >>
> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> >> >>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <
> >> >>>>>>>> zjffdu@gmail.com>
> >> >>>>>>>>>>>>> wrote:
> >> >>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>> I'm not so sure whether the user should be
> >> >>> able
> >> >>>> to
> >> >>>>>>>> define
> >> >>>>>>>>>>>> where
> >> >>>>>>>>>>>>>>>> the
> >> >>>>>>>>>>>>>>>>>> job
> >> >>>>>>>>>>>>>>>>>>> runs (in your example Yarn). This is actually
> >> >>>>>> independent
> >> >>>>>>>> of
> >> >>>>>>>>>>> the
> >> >>>>>>>>>>>>> job
> >> >>>>>>>>>>>>>>>>>>> development and is something which is decided
> >> >> at
> >> >>>>>>> deployment
> >> >>>>>>>>>>> time.
> >> >>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>> Users don't need to specify the execution mode
> >> >>>>>>>>>>>>>>>>>>> programmatically. They can also pass the execution
> >> >>>>>>>>>>>>>>>>>>> mode as an argument to the flink run command, e.g.
> >> >>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m yarn-cluster ....
> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m local ...
> >> >>>>>>>>>>>>>>>>>>> bin/flink run -m host:port ...
> >> >>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>> Does this make sense to you ?
> >> >>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>> To me it makes sense that the
> >> >>>> ExecutionEnvironment
> >> >>>>>> is
> >> >>>>>>>> not
> >> >>>>>>>>>>>>>>>> directly
> >> >>>>>>>>>>>>>>>>>>> initialized by the user and instead context
> >> >>>> sensitive
> >> >>>>>> how
> >> >>>>>>>> you
> >> >>>>>>>>>>>> want
> >> >>>>>>>>>>>>> to
> >> >>>>>>>>>>>>>>>>>>> execute your job (Flink CLI vs. IDE, for
> >> >>> example).
> >> >>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>> Right, currently I notice Flink would create a
> >> >>>>>>>>>>>>>>>>>>> different ContextExecutionEnvironment based on the
> >> >>>>>>>>>>>>>>>>>>> submission scenario (Flink CLI vs IDE). To me this is
> >> >>>>>>>>>>>>>>>>>>> kind of a hack, not very straightforward. What I
> >> >>>>>>>>>>>>>>>>>>> suggested above is that flink should always create
> >> >>>>>>>>>>>>>>>>>>> the same ExecutionEnvironment but with a different
> >> >>>>>>>>>>>>>>>>>>> configuration, and based on that configuration it
> >> >>>>>>>>>>>>>>>>>>> would create the proper ClusterClient for the
> >> >>>>>>>>>>>>>>>>>>> different behaviors.
> >> >>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>> Till Rohrmann <tr...@apache.org>
> >> >>>> 于2018年12月20日周四
> >> >>>>>>>>>>> 下午11:18写道:
> >> >>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>> You are probably right that we have code
> >> >>>> duplication
> >> >>>>>>> when
> >> >>>>>>>> it
> >> >>>>>>>>>>>> comes
> >> >>>>>>>>>>>>>>>> to
> >> >>>>>>>>>>>>>>>>>> the
> >> >>>>>>>>>>>>>>>>>>>> creation of the ClusterClient. This should be
> >> >>>>> reduced
> >> >>>>>> in
> >> >>>>>>>> the
> >> >>>>>>>>>>>>>>>> future.
> >> >>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>> I'm not so sure whether the user should be
> >> >> able
> >> >>> to
> >> >>>>>>> define
> >> >>>>>>>>>>> where
> >> >>>>>>>>>>>>> the
> >> >>>>>>>>>>>>>>>>> job
> >> >>>>>>>>>>>>>>>>>>>> runs (in your example Yarn). This is actually
> >> >>>>>>> independent
> >> >>>>>>>>>>> of the
> >> >>>>>>>>>>>>>>>> job
> >> >>>>>>>>>>>>>>>>>>>> development and is something which is decided
> >> >> at
> >> >>>>>>>> deployment
> >> >>>>>>>>>>>> time.
> >> >>>>>>>>>>>>>>>> To
> >> >>>>>>>>>>>>>>>>> me
> >> >>>>>>>>>>>>>>>>>>> it
> >> >>>>>>>>>>>>>>>>>>>> makes sense that the ExecutionEnvironment is
> >> >> not
> >> >>>>>>> directly
> >> >>>>>>>>>>>>>>>> initialized
> >> >>>>>>>>>>>>>>>>>> by
> >> >>>>>>>>>>>>>>>>>>>> the user and instead context sensitive how you
> >> >>>> want
> >> >>>>> to
> >> >>>>>>>>>>> execute
> >> >>>>>>>>>>>>> your
> >> >>>>>>>>>>>>>>>>> job
> >> >>>>>>>>>>>>>>>>>>>> (Flink CLI vs. IDE, for example). However, I
> >> >>> agree
> >> >>>>>> that
> >> >>>>>>>> the
> >> >>>>>>>>>>>>>>>>>>>> ExecutionEnvironment should give you access to
> >> >>> the
> >> >>>>>>>>>>> ClusterClient
> >> >>>>>>>>>>>>>>>> and
> >> >>>>>>>>>>>>>>>>> to
> >> >>>>>>>>>>>>>>>>>>> the
> >> >>>>>>>>>>>>>>>>>>>> job (maybe in the form of the JobGraph or a
> >> >> job
> >> >>>>> plan).
> >> >>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>> Cheers,
> >> >>>>>>>>>>>>>>>>>>>> Till
> >> >>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <
> >> >>>>>>>>>>> zjffdu@gmail.com>
> >> >>>>>>>>>>>>>>>> wrote:
> >> >>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>> Hi Till,
> >> >>>>>>>>>>>>>>>>>>>>> Thanks for the feedback. You are right that I
> >> >>>>>>>>>>>>>>>>>>>>> expect a better programmatic job
> >> >>>>>>>>>>>>>>>>>>>>> submission/control api which could be used by
> >> >>>>>>>>>>>>>>>>>>>>> downstream projects, and it would benefit the
> >> >>>>>>>>>>>>>>>>>>>>> flink ecosystem. When I look at the code of the
> >> >>>>>>>>>>>>>>>>>>>>> flink scala-shell and sql-client (I believe they
> >> >>>>>>>>>>>>>>>>>>>>> are not the core of flink, but belong to the
> >> >>>>>>>>>>>>>>>>>>>>> ecosystem of flink), I find a lot of duplicated
> >> >>>>>>>>>>>>>>>>>>>>> code for creating a ClusterClient from
> >> >>>>>>>>>>>>>>>>>>>>> user-provided configuration (the configuration
> >> >>>>>>>>>>>>>>>>>>>>> format may differ between scala-shell and
> >> >>>>>>>>>>>>>>>>>>>>> sql-client) and then using that ClusterClient to
> >> >>>>>>>>>>>>>>>>>>>>> manipulate jobs. I don't think this is convenient
> >> >>>>>>>>>>>>>>>>>>>>> for downstream projects. What I expect is that a
> >> >>>>>>>>>>>>>>>>>>>>> downstream project only needs to provide the
> >> >>>>>>>>>>>>>>>>>>>>> necessary configuration info (maybe introducing a
> >> >>>>>>>>>>>>>>>>>>>>> class FlinkConf), build an ExecutionEnvironment
> >> >>>>>>>>>>>>>>>>>>>>> based on this FlinkConf, and the
> >> >>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment will create the proper
> >> >>>>>>>>>>>>>>>>>>>>> ClusterClient. This would not only benefit
> >> >>>>>>>>>>>>>>>>>>>>> downstream project development but also help their
> >> >>>>>>>>>>>>>>>>>>>>> integration tests with flink. Here's one sample
> >> >>>>>>>>>>>>>>>>>>>>> code snippet of what I expect:
> >> >>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>> val conf = new FlinkConf().mode("yarn")
> >> >>>>>>>>>>>>>>>>>>>>> val env = new ExecutionEnvironment(conf)
> >> >>>>>>>>>>>>>>>>>>>>> val jobId = env.submit(...)
> >> >>>>>>>>>>>>>>>>>>>>> val jobStatus =
> >> >>>>>>>>>>> env.getClusterClient().queryJobStatus(jobId)
> >> >>>>>>>>>>>>>>>>>>>>> env.getClusterClient().cancelJob(jobId)
> >> >>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>> What do you think ?
> >> >>>>>>>>>>>>>>>>>>>>>
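A slightly expanded sketch of how such a FlinkConf-driven environment might pick its client, in Java for illustration. Everything here (FlinkConf, the mode values, the client names) is an assumption taken from the snippet above and the surrounding discussion, not real Flink API; the point is just that one environment dispatches on configuration rather than many environment subclasses dispatching on context.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical configuration holder, following the FlinkConf idea.
class FlinkConf {
    private final Map<String, String> settings = new HashMap<>();
    FlinkConf mode(String mode) { settings.put("mode", mode); return this; }
    String get(String key) { return settings.get(key); }
}

public class FlinkConfDemo {
    // One environment, one dispatch point: the mode alone decides which
    // kind of cluster client would be created.
    static String clientFor(FlinkConf conf) {
        switch (conf.get("mode")) {
            case "yarn":  return "YarnClusterClient";
            case "local": return "MiniClusterClient";
            default:      return "RestClusterClient";  // e.g. a host:port mode
        }
    }

    public static void main(String[] args) {
        FlinkConf conf = new FlinkConf().mode("yarn");
        System.out.println(clientFor(conf)); // YarnClusterClient
    }
}
```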
> >> >>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>> Till Rohrmann <tr...@apache.org>
> >> >>>>> 于2018年12月11日周二
> >> >>>>>>>>>>> 下午6:28写道:
> >> >>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>> Hi Jeff,
> >> >>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>> what you are proposing is to provide the
> >> >> user
> >> >>>> with
> >> >>>>>>>> better
> >> >>>>>>>>>>>>>>>>>>> programmatic
> >> >>>>>>>>>>>>>>>>>>>>> job
> >> >>>>>>>>>>>>>>>>>>>>>> control. There was actually an effort to
> >> >>> achieve
> >> >>>>>> this
> >> >>>>>>>> but
> >> >>>>>>>>>>> it
> >> >>>>>>>>>>>>>>>> has
> >> >>>>>>>>>>>>>>>>>>> never
> >> >>>>>>>>>>>>>>>>>>>>> been
> >> >>>>>>>>>>>>>>>>>>>>>> completed [1]. However, there are some
> >> >>>> improvement
> >> >>>>>> in
> >> >>>>>>>> the
> >> >>>>>>>>>>> code
> >> >>>>>>>>>>>>>>>>> base
> >> >>>>>>>>>>>>>>>>>>>> now.
> >> >>>>>>>>>>>>>>>>>>>>>> Look for example at the NewClusterClient
> >> >>>> interface
> >> >>>>>>> which
> >> >>>>>>>>>>>>>>>> offers a
> >> >>>>>>>>>>>>>>>>>>>>>> non-blocking job submission. But I agree
> >> >> that
> >> >>> we
> >> >>>>>> need
> >> >>>>>>> to
> >> >>>>>>>>>>>>>>>> improve
> >> >>>>>>>>>>>>>>>>>>> Flink
> >> >>>>>>>>>>>>>>>>>>>> in
> >> >>>>>>>>>>>>>>>>>>>>>> this regard.
> >> >>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>> I would not be in favour if exposing all
> >> >>>>>> ClusterClient
> >> >>>>>>>>>>> calls
> >> >>>>>>>>>>>>>>>> via
> >> >>>>>>>>>>>>>>>>>> the
> >> >>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment because it would
> >> >> clutter
> >> >>>> the
> >> >>>>>>> class
> >> >>>>>>>>>>> and
> >> >>>>>>>>>>>>>>>> would
> >> >>>>>>>>>>>>>>>>>> not
> >> >>>>>>>>>>>>>>>>>>>> be
> >> >>>>>>>>>>>>>>>>>>>>> a
> >> >>>>>>>>>>>>>>>>>>>>>> good separation of concerns. Instead one
> >> >> idea
> >> >>>>> could
> >> >>>>>> be
> >> >>>>>>>> to
> >> >>>>>>>>>>>>>>>>> retrieve
> >> >>>>>>>>>>>>>>>>>>> the
> >> >>>>>>>>>>>>>>>>>>>>>> current ClusterClient from the
> >> >>>>> ExecutionEnvironment
> >> >>>>>>>> which
> >> >>>>>>>>>>> can
> >> >>>>>>>>>>>>>>>>> then
> >> >>>>>>>>>>>>>>>>>> be
> >> >>>>>>>>>>>>>>>>>>>>> used
> >> >>>>>>>>>>>>>>>>>>>>>> for cluster and job control. But before we
> >> >>> start
> >> >>>>> an
> >> >>>>>>>> effort
> >> >>>>>>>>>>>>>>>> here,
> >> >>>>>>>>>>>>>>>>> we
> >> >>>>>>>>>>>>>>>>>>>> need
> >> >>>>>>>>>>>>>>>>>>>>> to
> >> >>>>>>>>>>>>>>>>>>>>>> agree and capture what functionality we want
> >> >>> to
> >> >>>>>>> provide.
> >> >>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>> Initially, the idea was that we have the
> >> >>>>>>>> ClusterDescriptor
> >> >>>>>>>>>>>>>>>>>> describing
> >> >>>>>>>>>>>>>>>>>>>> how
> >> >>>>>>>>>>>>>>>>>>>>>> to talk to cluster manager like Yarn or
> >> >> Mesos.
> >> >>>> The
> >> >>>>>>>>>>>>>>>>>> ClusterDescriptor
> >> >>>>>>>>>>>>>>>>>>>> can
> >> >>>>>>>>>>>>>>>>>>>>> be
> >> >>>>>>>>>>>>>>>>>>>>>> used for deploying Flink clusters (job and
> >> >>>>> session)
> >> >>>>>>> and
> >> >>>>>>>>>>> gives
> >> >>>>>>>>>>>>>>>>> you a
> >> >>>>>>>>>>>>>>>>>>>>>> ClusterClient. The ClusterClient controls
> >> >> the
> >> >>>>>> cluster
> >> >>>>>>>>>>> (e.g.
> >> >>>>>>>>>>>>>>>>>>> submitting
> >> >>>>>>>>>>>>>>>>>>>>>> jobs, listing all running jobs). And then
> >> >>> there
> >> >>>>> was
> >> >>>>>>> the
> >> >>>>>>>>>>> idea
> >> >>>>>>>>>>>> to
> >> >>>>>>>>>>>>>>>>>>>>> introduce a
> >> >>>>>>>>>>>>>>>>>>>>>> JobClient which you obtain from the
> >> >>>> ClusterClient
> >> >>>>> to
> >> >>>>>>>>>>> trigger
> >> >>>>>>>>>>>>>>>> job
> >> >>>>>>>>>>>>>>>>>>>> specific
> >> >>>>>>>>>>>>>>>>>>>>>> operations (e.g. taking a savepoint,
> >> >>> cancelling
> >> >>>>> the
> >> >>>>>>>> job).
> >> >>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>> [1]
> >> >>>>>> https://issues.apache.org/jira/browse/FLINK-4272
> >> >>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>> Cheers,
> >> >>>>>>>>>>>>>>>>>>>>>> Till
> >> >>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang
> >> >> <
> >> >>>>>>>>>>> zjffdu@gmail.com
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>> wrote:
> >> >>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>>> Hi Folks,
> >> >>>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>>> I am trying to integrate flink into apache
> >> >>>>> zeppelin
> >> >>>>>>>>>>> which is
> >> >>>>>>>>>>>>>>>> an
> >> >>>>>>>>>>>>>>>>>>>>>> interactive
> >> >>>>>>>>>>>>>>>>>>>>>>> notebook. And I hit several issues that is
> >> >>>> caused
> >> >>>>>> by
> >> >>>>>>>>>>> flink
> >> >>>>>>>>>>>>>>>>> client
> >> >>>>>>>>>>>>>>>>>>>> api.
> >> >>>>>>>>>>>>>>>>>>>>> So
> >> >>>>>>>>>>>>>>>>>>>>>>> I'd like to proposal the following changes
> >> >>> for
> >> >>>>>> flink
> >> >>>>>>>>>>> client
> >> >>>>>>>>>>>>>>>>> api.
> >> >>>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>>> 1. Support nonblocking execution. Currently,
> >> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment#execute is a blocking method
> >> >>>>>>>>>>>>>>>>>>>>>>> which does 2 things: first submit the job, then
> >> >>>>>>>>>>>>>>>>>>>>>>> wait for the job until it is finished. I'd like
> >> >>>>>>>>>>>>>>>>>>>>>>> to introduce a nonblocking execution method like
> >> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment#submit which only submits
> >> >>>>>>>>>>>>>>>>>>>>>>> the job and then returns the jobId to the client,
> >> >>>>>>>>>>>>>>>>>>>>>>> allowing the user to query the job status via the
> >> >>>>>>>>>>>>>>>>>>>>>>> jobId.
> >> >>>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>>> 2. Add cancel api in
> >> >>>>>>>>>>>>>>>>>>>
> >> >> ExecutionEnvironment/StreamExecutionEnvironment,
> >> >>>>>>>>>>>>>>>>>>>>>>> currently the only way to cancel job is via
> >> >>> cli
> >> >>>>>>>>>>> (bin/flink),
> >> >>>>>>>>>>>>>>>>> this
> >> >>>>>>>>>>>>>>>>>>> is
> >> >>>>>>>>>>>>>>>>>>>>> not
> >> >>>>>>>>>>>>>>>>>>>>>>> convenient for downstream project to use
> >> >> this
> >> >>>>>>> feature.
> >> >>>>>>>>>>> So I'd
> >> >>>>>>>>>>>>>>>>>> like
> >> >>>>>>>>>>>>>>>>>>> to
> >> >>>>>>>>>>>>>>>>>>>>> add
> >> >>>>>>>>>>>>>>>>>>>>>>> cancel api in ExecutionEnvironment
> >> >>>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>>> 3. Add savepoint api in
> >> >>>>>>>>>>>>>>>>>>>>>
> >> >>> ExecutionEnvironment/StreamExecutionEnvironment.
> >> >>>>>>>>>>>>>>>>>>>>>> It
> >> >>>>>>>>>>>>>>>>>>>>>>> is similar as cancel api, we should use
> >> >>>>>>>>>>> ExecutionEnvironment
> >> >>>>>>>>>>>>>>>> as
> >> >>>>>>>>>>>>>>>>>> the
> >> >>>>>>>>>>>>>>>>>>>>>> unified
> >> >>>>>>>>>>>>>>>>>>>>>>> api for third party to integrate with
> >> >> flink.
> >> >>>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>>> 4. Add listener for job execution
> >> >> lifecycle.
> >> >>>>>>> Something
> >> >>>>>>>>>>> like
> >> >>>>>>>>>>>>>>>>>>>> following,
> >> >>>>>>>>>>>>>>>>>>>>> so
> >> >>>>>>>>>>>>>>>>>>>>>>> that downstream project can do custom logic
> >> >>> in
> >> >>>>> the
> >> >>>>>>>>>>> lifecycle
> >> >>>>>>>>>>>>>>>> of
> >> >>>>>>>>>>>>>>>>>>> job.
> >> >>>>>>>>>>>>>>>>>>>>> e.g.
> >> >>>>>>>>>>>>>>>>>>>>>>> Zeppelin would capture the jobId after job
> >> >> is
> >> >>>>>>> submitted
> >> >>>>>>>>>>> and
> >> >>>>>>>>>>>>>>>>> then
> >> >>>>>>>>>>>>>>>>>>> use
> >> >>>>>>>>>>>>>>>>>>>>> this
> >> >>>>>>>>>>>>>>>>>>>>>>> jobId to cancel it later when necessary.
> >> >>>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>>> public interface JobListener {
> >> >>>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>>>  void onJobSubmitted(JobID jobId);
> >> >>>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>>>  void onJobExecuted(JobExecutionResult
> >> >>>>> jobResult);
> >> >>>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>>>  void onJobCanceled(JobID jobId);
> >> >>>>>>>>>>>>>>>>>>>>>>> }
> >> >>>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>>> 5. Enable session in ExecutionEnvironment.
> >> >>>>>> Currently
> >> >>>>>>> it
> >> >>>>>>>>>>> is
> >> >>>>>>>>>>>>>>>>>>> disabled,
> >> >>>>>>>>>>>>>>>>>>>>> but
> >> >>>>>>>>>>>>>>>>>>>>>>> session is very convenient for third party
> >> >> to
> >> >>>>>>>> submitting
> >> >>>>>>>>>>> jobs
> >> >>>>>>>>>>>>>>>>>>>>>> continually.
> >> >>>>>>>>>>>>>>>>>>>>>>> I hope flink can enable it again.
> >> >>>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>>> 6. Unify all flink client api into
> >> >>>>>>>>>>>>>>>>>>>>>>>
> >> >>>> ExecutionEnvironment/StreamExecutionEnvironment.
> >> >>>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>>> This is a long term issue which needs more
> >> >>>>> careful
> >> >>>>>>>>>>> thinking
> >> >>>>>>>>>>>>>>>> and
> >> >>>>>>>>>>>>>>>>>>>> design.
> >> >>>>>>>>>>>>>>>>>>>>>>> Currently some of features of flink is
> >> >>> exposed
> >> >>>> in
> >> >>>>>>>>>>>>>>>>>>>>>>>
> >> >>>> ExecutionEnvironment/StreamExecutionEnvironment,
> >> >>>>>> but
> >> >>>>>>>>>>> some are
> >> >>>>>>>>>>>>>>>>>>> exposed
> >> >>>>>>>>>>>>>>>>>>>>> in
> >> >>>>>>>>>>>>>>>>>>>>>>> cli instead of api, like the cancel and
> >> >>>>> savepoint I
> >> >>>>>>>>>>> mentioned
> >> >>>>>>>>>>>>>>>>>>> above.
> >> >>>>>>>>>>>>>>>>>>>> I
> >> >>>>>>>>>>>>>>>>>>>>>>> think the root cause is due to that flink
> >> >>>> didn't
> >> >>>>>>> unify
> >> >>>>>>>>>>> the
> >> >>>>>>>>>>>>>>>>>>>> interaction
> >> >>>>>>>>>>>>>>>>>>>>>> with
> >> >>>>>>>>>>>>>>>>>>>>>>> flink. Here I list 3 scenarios of flink
> >> >>>> operation
> >> >>>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>>>  - Local job execution.  Flink will create
> >> >>>>>>>>>>> LocalEnvironment
> >> >>>>>>>>>>>>>>>>> and
> >> >>>>>>>>>>>>>>>>>>>> then
> >> >>>>>>>>>>>>>>>>>>>>>> use
> >> >>>>>>>>>>>>>>>>>>>>>>>  this LocalEnvironment to create
> >> >>> LocalExecutor
> >> >>>>> for
> >> >>>>>>> job
> >> >>>>>>>>>>>>>>>>>> execution.
> >> >>>>>>>>>>>>>>>>>>>>>>>  - Remote job execution. Flink will create
> >> >>>>>>>> ClusterClient
> >> >>>>>>>>>>>>>>>>> first
> >> >>>>>>>>>>>>>>>>>>> and
> >> >>>>>>>>>>>>>>>>>>>>> then
> >> >>>>>>>>>>>>>>>>>>>>>>>  create ContextEnvironment based on the
> >> >>>>>>> ClusterClient
> >> >>>>>>>>>>> and
> >> >>>>>>>>>>>>>>>>> then
> >> >>>>>>>>>>>>>>>>>>> run
> >> >>>>>>>>>>>>>>>>>>>>> the
> >> >>>>>>>>>>>>>>>>>>>>>>> job.
> >> >>>>>>>>>>>>>>>>>>>>>>>  - Job cancelation. Flink will create
> >> >>>>>> ClusterClient
> >> >>>>>>>>>>> first
> >> >>>>>>>>>>>>>>>> and
> >> >>>>>>>>>>>>>>>>>>> then
> >> >>>>>>>>>>>>>>>>>>>>>> cancel
> >> >>>>>>>>>>>>>>>>>>>>>>>  this job via this ClusterClient.
> >> >>>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>>> As you can see in the above 3 scenarios.
> >> >>> Flink
> >> >>>>>> didn't
> >> >>>>>>>>>>> use the
> >> >>>>>>>>>>>>>>>>>> same
> >> >>>>>>>>>>>>>>>>>>>>>>> approach(code path) to interact with flink
> >> >>>>>>>>>>>>>>>>>>>>>>> What I propose is following:
> >> >>>>>>>>>>>>>>>>>>>>>>> Create the proper
> >> >>>>>> LocalEnvironment/RemoteEnvironment
> >> >>>>>>>>>>> (based
> >> >>>>>>>>>>>>>>>> on
> >> >>>>>>>>>>>>>>>>>> user
> >> >>>>>>>>>>>>>>>>>>>>>>> configuration) --> Use this Environment to
> >> >>>> create
> >> >>>>>>>> proper
> >> >>>>>>>>>>>>>>>>>>>> ClusterClient
> >> >>>>>>>>>>>>>>>>>>>>>>> (LocalClusterClient or RestClusterClient)
> >> >> to
> >> >>>>>>>> interactive
> >> >>>>>>>>>>> with
> >> >>>>>>>>>>>>>>>>>>> Flink (
> >> >>>>>>>>>>>>>>>>>>>>> job
> >> >>>>>>>>>>>>>>>>>>>>>>> execution or cancelation)
> >> >>>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>>> This way we can unify the process of local
> >> >>>>>> execution
> >> >>>>>>>> and
> >> >>>>>>>>>>>>>>>> remote
> >> >>>>>>>>>>>>>>>>>>>>>> execution.
> >> >>>>>>>>>>>>>>>>>>>>>>> And it is much easier for third party to
> >> >>>>> integrate
> >> >>>>>>> with
> >> >>>>>>>>>>>>>>>> flink,
> >> >>>>>>>>>>>>>>>>>>>> because
> >> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment is the unified entry
> >> >>> point
> >> >>>>> for
> >> >>>>>>>>>>> flink.
> >> >>>>>>>>>>>>>>>> What
> >> >>>>>>>>>>>>>>>>>>> third
> >> >>>>>>>>>>>>>>>>>>>>>> party
> >> >>>>>>>>>>>>>>>>>>>>>>> needs to do is just pass configuration to
> >> >>>>>>>>>>>>>>>> ExecutionEnvironment
> >> >>>>>>>>>>>>>>>>>> and
> >> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment will do the right
> >> >> thing
> >> >>>>> based
> >> >>>>>> on
> >> >>>>>>>> the
> >> >>>>>>>>>>>>>>>>>>>>> configuration.
> >> >>>>>>>>>>>>>>>>>>>>>>> Flink cli can also be considered as flink
> >> >> api
> >> >>>>>>> consumer.
> >> >>>>>>>>>>> it
> >> >>>>>>>>>>>>>>>> just
> >> >>>>>>>>>>>>>>>>>>> pass
> >> >>>>>>>>>>>>>>>>>>>>> the
> >> >>>>>>>>>>>>>>>>>>>>>>> configuration to ExecutionEnvironment and
> >> >> let
> >> >>>>>>>>>>>>>>>>>> ExecutionEnvironment
> >> >>>>>>>>>>>>>>>>>>> to
> >> >>>>>>>>>>>>>>>>>>>>>>> create the proper ClusterClient instead of
> >> >>>>> letting
> >> >>>>>>> cli
> >> >>>>>>>> to
> >> >>>>>>>>>>>>>>>>> create
> >> >>>>>>>>>>>>>>>>>>>>>>> ClusterClient directly.
> >> >>>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>>> 6 would involve large code refactoring, so
> >> >> I
> >> >>>>> think
> >> >>>>>> we
> >> >>>>>>>> can
> >> >>>>>>>>>>>>>>>> defer
> >> >>>>>>>>>>>>>>>>>> it
> >> >>>>>>>>>>>>>>>>>>>> for
> >> >>>>>>>>>>>>>>>>>>>>>>> future release, 1,2,3,4,5 could be done at
> >> >>>> once I
> >> >>>>>>>>>>> believe.
> >> >>>>>>>>>>>>>>>> Let
> >> >>>>>>>>>>>>>>>>> me
> >> >>>>>>>>>>>>>>>>>>>> know
> >> >>>>>>>>>>>>>>>>>>>>>> your
> >> >>>>>>>>>>>>>>>>>>>>>>> comments and feedback, thanks
> >> >>>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>>> --
> >> >>>>>>>>>>>>>>>>>>>>>>> Best Regards
> >> >>>>>>>>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
> >> >>>>>>>>>>>>>>>>>>>>>>>

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Zili Chen <wa...@gmail.com>.
I would like to involve Till & Stephan here to clarify some concepts of
per-job mode.

The term per-job refers to one of the modes a cluster can run in. It is
mainly aimed at spawning a dedicated cluster for a specific job; the job can
be packaged with Flink itself, and the cluster is then initialized with the
job, so that we get rid of a separate submission step.

This is useful for container deployments, where one creates an image with
the job and then simply deploys the container.

However, it is out of client scope, since a client (ClusterClient for
example) is for communicating with an existing cluster and performing
actions. Currently, in per-job mode, we extract the job graph and bundle it
into the cluster deployment, and thus no concept of a client gets involved.
It looks reasonable to exclude the deployment of per-job clusters from the
client api and use dedicated utility classes (deployers) for deployment.
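To make the client-scope idea concrete, here is a minimal sketch of what such an api could look like once per-job deployment is excluded: the client only talks to an already-existing cluster and submits job graphs, and submission is non-blocking. All class and method names below are illustrative stand-ins, not Flink's actual API.

```java
import java.util.concurrent.CompletableFuture;

// Illustrative sketch only; these types are stand-ins, not Flink's real API.
public class ClientSketch {

    // Stand-in for the compiled JobGraph.
    static class JobGraph {
        final String jobId;
        JobGraph(String jobId) { this.jobId = jobId; }
    }

    // The client talks to an *existing* cluster; it never deploys one.
    interface ClusterClient {
        CompletableFuture<String> submitJob(JobGraph jobGraph);
    }

    // A trivial in-memory implementation for demonstration.
    static class LocalStubClient implements ClusterClient {
        @Override
        public CompletableFuture<String> submitJob(JobGraph jobGraph) {
            // A real client would send the graph to the cluster's REST endpoint.
            return CompletableFuture.completedFuture(jobGraph.jobId);
        }
    }

    public static String run() throws Exception {
        ClusterClient client = new LocalStubClient();
        // Submission is non-blocking; the caller decides whether to wait.
        CompletableFuture<String> submission = client.submitJob(new JobGraph("job-42"));
        return submission.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

The point of the sketch is the shape of the api: deployment never appears on the client side, and the caller decides whether to block on the returned future.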

Zili Chen <wa...@gmail.com> 于2019年8月20日周二 下午12:37写道:

> Hi Aljoscha,
>
> Thanks for your reply and participation. The Google Doc you linked to
> requires permission; I think you could use a share link instead.
>
> I agree that we have almost reached a consensus that a JobClient is
> necessary to interact with a running Job.
>
> Let me check your open questions one by one.
>
> 1. Separate cluster creation and job submission for per-job mode.
>
> As you mentioned, this is where the opinions diverge. In my document there
> is an alternative[2] that proposes excluding per-job deployment from the
> client api scope, and now I find it more reasonable to do the exclusion.
>
> When in per-job mode, a dedicated JobCluster is launched to execute the
> specific job. It is more like a Flink Application than a submission of a
> Flink Job. The client only takes care of job submission and assumes there
> is an existing cluster. In this way we are able to consider per-job issues
> individually, and JobClusterEntrypoint would be the utility class for
> per-job deployment.
>
> Nevertheless, the user program works in both session mode and per-job mode
> without any need to change code. JobClient in per-job mode is returned
> from env.execute as normal. However, it would no longer be a wrapper of
> RestClusterClient but a wrapper of PerJobClusterClient, which communicates
> with the Dispatcher locally.
>
> 2. How to deal with plan preview.
>
> With the env.compile functions users can get the JobGraph or FlinkPlan and
> thus preview the plan programmatically. Typically it looks like
>
> if (preview configured) {
>     FlinkPlan plan = env.compile();
>     new JSONDumpGenerator(...).dump(plan);
> } else {
>     env.execute();
> }
>
> And `flink info` would no longer be valid.
>
> 3. How to deal with Jar Submission at the Web Frontend.
>
> There is another thread discussing this topic[1]. Apart from removing
> these functions, there are two alternatives.
>
> One is to introduce an interface with a method that returns a
> JobGraph/FlinkPlan, and have Jar Submission only support main classes that
> implement this interface. Then we extract the JobGraph/FlinkPlan just by
> calling that method. In this way, it is even possible to consider a
> separation of job creation and job submission.
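A minimal sketch of this first alternative (all names are made up for illustration; Flink does not currently ship such an interface): the WebFrontend would detect the interface on the main class and call its method instead of executing main().

```java
// Illustrative sketch of alternative one; none of these names exist in Flink.
public class JarSubmissionSketch {

    // Stand-in for the compiled plan.
    static class JobGraph {
        final String name;
        JobGraph(String name) { this.name = name; }
    }

    // Main classes implementing this interface hand the WebFrontend a
    // JobGraph directly, so job creation is separated from submission.
    interface JobGraphProvider {
        JobGraph getJobGraph(String[] args);
    }

    // Example user program.
    static class MyProgram implements JobGraphProvider {
        @Override
        public JobGraph getJobGraph(String[] args) {
            return new JobGraph("wordcount");
        }
    }

    public static String extract() {
        // The WebFrontend would instantiate the main class reflectively;
        // here we construct it directly for brevity.
        JobGraphProvider provider = new MyProgram();
        return provider.getJobGraph(new String[0]).name;
    }

    public static void main(String[] args) {
        System.out.println(extract());
    }
}
```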
>
> The other is, as you mentioned, to let execute() do the actual execution.
> We won't execute the main method in the WebFrontend but spawn a process
> at the WebMonitor side to execute it. For the return part, we could
> generate the JobID at the WebMonitor and pass it to the execution
> environment.
>
> 4. How to deal with detached mode.
>
> I think detached mode is a temporary solution for non-blocking submission.
> In my document, both submission and execution return a CompletableFuture,
> and users control whether or not to wait for the result. At this point we
> don't need a detached option, but the functionality is covered.
>
> 5. How does per-job mode interact with interactive programming.
>
> All of the YARN, Mesos and Kubernetes scenarios now follow the pattern of
> launching a JobCluster, and I don't think there would be inconsistency
> between the different resource managers.
>
> Best,
> tison.
>
> [1]
> https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
> [2]
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs
>
> Aljoscha Krettek <al...@apache.org> 于2019年8月16日周五 下午9:20写道:
>
>> Hi,
>>
>> I read both Jeff's initial design document and the newer document by
>> Tison. I also finally found the time to collect our thoughts on the issue,
>> I had quite some discussions with Kostas and this is the result: [1].
>>
>> I think overall we agree that this part of the code is in dire need of
>> some refactoring/improvements but I think there are still some open
>> questions and some differences in opinion what those refactorings should
>> look like.
>>
>> I think the API-side is quite clear, i.e. we need some JobClient API that
>> allows interacting with a running Job. It could be worthwhile to spin that
>> off into a separate FLIP because we can probably find consensus on that
>> part more easily.
>>
>> For the rest, the main open questions from our doc are these:
>>
>>   - Do we want to separate cluster creation and job submission for
>> per-job mode? In the past, there were conscious efforts to *not* separate
>> job submission from cluster creation for per-job clusters for Mesos, YARN,
>> Kubernets (see StandaloneJobClusterEntryPoint). Tison suggests in his
>> design document to decouple this in order to unify job submission.
>>
>>   - How to deal with plan preview, which needs to hijack execute() and
>> let the outside code catch an exception?
>>
>>   - How to deal with Jar Submission at the Web Frontend, which needs to
>> hijack execute() and let the outside code catch an exception?
>> CliFrontend.run() “hijacks” ExecutionEnvironment.execute() to get a
>> JobGraph and then execute that JobGraph manually. We could get around that
>> by letting execute() do the actual execution. One caveat for this is that
>> now the main() method doesn’t return (or is forced to return by throwing an
>> exception from execute()) which means that for Jar Submission from the
>> WebFrontend we have a long-running main() method running in the
>> WebFrontend. This doesn’t sound very good. We could get around this by
>> removing the plan preview feature and by removing Jar Submission/Running.
>>
>>   - How to deal with detached mode? Right now, DetachedEnvironment will
>> execute the job and return immediately. If users control when they want to
>> return, by waiting on the job completion future, how do we deal with this?
>> Do we simply remove the distinction between detached/non-detached?
>>
>>   - How does per-job mode interact with “interactive programming”
>> (FLIP-36). For YARN, each execute() call could spawn a new Flink YARN
>> cluster. What about Mesos and Kubernetes?
>>
>> The first open question is where the opinions diverge, I think. The rest
>> are just open questions and interesting things that we need to consider.
>>
>> Best,
>> Aljoscha
>>
>> [1]
>> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
>>
>> > On 31. Jul 2019, at 15:23, Jeff Zhang <zj...@gmail.com> wrote:
>> >
>> > Thanks tison for the effort. I left a few comments.
>> >
>> >
>> > Zili Chen <wa...@gmail.com> 于2019年7月31日周三 下午8:24写道:
>> >
>> >> Hi Flavio,
>> >>
>> >> Thanks for your reply.
>> >>
>> >> In both the current implementation and the design, ClusterClient
>> >> never takes responsibility for generating the JobGraph.
>> >> (What you see in the current codebase is several class methods.)
>> >>
>> >> Instead, the user describes the program in the main method
>> >> with the ExecutionEnvironment apis and calls env.compile()
>> >> or env.optimize() to get the FlinkPlan and JobGraph respectively.
>> >>
>> >> For listing the main classes in a jar and choosing one for
>> >> submission, you're already able to customize a CLI to do it.
>> >> Specifically, the path of the jar is passed as an argument, and
>> >> in the customized CLI you list the main classes and choose one
>> >> to submit to the cluster.
>> >>
>> >> Best,
>> >> tison.
>> >>
>> >>
>> >> Flavio Pompermaier <po...@okkam.it> 于2019年7月31日周三 下午8:12写道:
>> >>
>> >>> Just one note on my side: it is not clear to me whether the client
>> >>> needs to be able to generate a job graph or not.
>> >>> In my opinion, the job jar must reside only on the server/jobManager
>> >>> side, and the client requires a way to get the job graph.
>> >>> If you really want access to the job graph, I'd add dedicated methods
>> >>> on the ClusterClient, like:
>> >>>
>> >>>   - getJobGraph(jarId, mainClass): JobGraph
>> >>>   - listMainClasses(jarId): List<String>
>> >>>
>> >>> These would also require some additions on the job manager endpoints
>> >>> as well. What do you think?
>> >>>
>> >>> On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <wa...@gmail.com>
>> wrote:
>> >>>
>> >>>> Hi all,
>> >>>>
>> >>>> Here is a document[1] on client api enhancement from our perspective.
>> >>>> We have investigated current implementations. And we propose
>> >>>>
>> >>>> 1. Unify the implementation of cluster deployment and job submission
>> in
>> >>>> Flink.
>> >>>> 2. Provide programmatic interfaces to allow flexible job and cluster
>> >>>> management.
>> >>>>
>> >>>> The first proposal is aimed at reducing the code paths of cluster
>> >>>> deployment and job submission so that one can adopt Flink easily. The
>> >>>> second proposal is aimed at providing rich interfaces for advanced
>> >>>> users who want fine-grained control of these stages.
>> >>>>
>> >>>> Quick reference on open questions:
>> >>>>
>> >>>> 1. Exclude job cluster deployment from the client side, or redefine
>> >>>> the semantics of a job cluster? It fits in a process quite different
>> >>>> from session cluster deployment and job submission.
>> >>>>
>> >>>> 2. Maintain the codepaths handling class o.a.f.api.common.Program or
>> >>>> implement customized program handling logic by customized
>> CliFrontend?
>> >>>> See also this thread[2] and the document[1].
>> >>>>
>> >>>> 3. Expose ClusterClient as a public api, or just expose the api in
>> >>>> ExecutionEnvironment and delegate to ClusterClient? Further, either
>> >>>> way, is it worth introducing a JobClient, which is an encapsulation
>> >>>> of ClusterClient associated with a specific job?
>> >>>>
>> >>>> Best,
>> >>>> tison.
>> >>>>
>> >>>> [1]
>> >>>>
>> >>>>
>> >>>
>> >>
>> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
>> >>>> [2]
>> >>>>
>> >>>>
>> >>>
>> >>
>> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
>> >>>>
>> >>>> Jeff Zhang <zj...@gmail.com> 于2019年7月24日周三 上午9:19写道:
>> >>>>
>> >>>>> Thanks Stephan, I will follow up on this issue in the next few
>> >>>>> weeks, and will refine the design doc. We can discuss more details
>> >>>>> after the 1.9 release.
>> >>>>>
>> >>>>> Stephan Ewen <se...@apache.org> 于2019年7月24日周三 上午12:58写道:
>> >>>>>
>> >>>>>> Hi all!
>> >>>>>>
>> >>>>>> This thread has stalled for a bit, which I assume is mostly due to
>> >>>>>> the Flink 1.9 feature freeze and release testing effort.
>> >>>>>>
>> >>>>>> I personally still recognize this issue as an important one to
>> >>>>>> solve. I'd be happy to help resume this discussion soon (after the
>> >>>>>> 1.9 release) and see if we can take some steps towards this in
>> >>>>>> Flink 1.10.
>> >>>>>>
>> >>>>>> Best,
>> >>>>>> Stephan
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <
>> >>>>> pompermaier@okkam.it>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> That's exactly what I suggested a long time ago: the Flink REST
>> >>>>>>> client should not require any Flink dependency, only an HTTP
>> >>>>>>> library to call the REST services to submit and monitor a job.
>> >>>>>>> What I also suggested in [1] was to have a way to automatically
>> >>>>>>> suggest to the user (via a UI) the available main classes and
>> >>>>>>> their required parameters[2].
>> >>>>>>> Another problem we have with Flink is that the REST client and
>> >>>>>>> the CLI one behave differently, and we use the CLI client (via
>> >>>>>>> ssh) because it allows calling some other method after
>> >>>>>>> env.execute() [3] (we have to call another REST service to signal
>> >>>>>>> the end of the job).
>> >>>>>>> In this regard, a dedicated interface, like the JobListener
>> >>>>>>> suggested in the previous emails, would be very helpful (IMHO).
>> >>>>>>>
>> >>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-10864
>> >>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-10862
>> >>>>>>> [3] https://issues.apache.org/jira/browse/FLINK-10879
>> >>>>>>>
>> >>>>>>> Best,
>> >>>>>>> Flavio
>> >>>>>>>
>> >>>>>>> On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zj...@gmail.com>
>> >>> wrote:
>> >>>>>>>
>> >>>>>>>> Hi, Tison,
>> >>>>>>>>
>> >>>>>>>> Thanks for your comments. Overall I agree with you that it is
>> >>>>>>>> difficult for downstream projects to integrate with flink and
>> >>>>>>>> that we need to refactor the current flink client api.
>> >>>>>>>> And I agree that CliFrontend should only parse command line
>> >>>>>>>> arguments and then pass them to ExecutionEnvironment. It is
>> >>>>>>>> ExecutionEnvironment's responsibility to compile the job, create
>> >>>>>>>> the cluster, and submit the job. Besides that, flink currently
>> >>>>>>>> has many ExecutionEnvironment implementations, and flink will
>> >>>>>>>> use a specific one based on the context. IMHO, this is not
>> >>>>>>>> necessary; ExecutionEnvironment should be able to do the right
>> >>>>>>>> thing based on the FlinkConf it receives. Having too many
>> >>>>>>>> ExecutionEnvironment implementations is another burden for
>> >>>>>>>> downstream project integration.
>> >>>>>>>>
>> >>>>>>>> One thing I'd like to mention is flink's scala shell and sql
>> >>>>>>>> client: although they are sub-modules of flink, they could be
>> >>>>>>>> treated as downstream projects which use flink's client api.
>> >>>>>>>> Currently you will find it is not easy for them to integrate
>> >>>>>>>> with flink, and they share a lot of duplicated code. It is
>> >>>>>>>> another sign that we should refactor the flink client api.
>> >>>>>>>>
>> >>>>>>>> I believe it is a large and hard change, and I am afraid we
>> >>>>>>>> cannot keep compatibility, since many of the changes are
>> >>>>>>>> user-facing.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Zili Chen <wa...@gmail.com> 于2019年6月24日周一 下午2:53写道:
>> >>>>>>>>
>> >>>>>>>>> Hi all,
>> >>>>>>>>>
>> >>>>>>>>> After a closer look at our client apis, I can see two major
>> >>>>>>>>> issues with consistency and integration: the deployment of a
>> >>>>>>>>> job cluster, which couples job graph creation and cluster
>> >>>>>>>>> deployment, and submission via CliFrontend, which confuses the
>> >>>>>>>>> control flow of job graph compilation and job submission. I'd
>> >>>>>>>>> like to follow the discussion above, mainly the process
>> >>>>>>>>> described by Jeff and Stephan, and share my ideas on these
>> >>>>>>>>> issues.
>> >>>>>>>>>
>> >>>>>>>>> 1) CliFrontend confuses the control flow of job compilation and
>> >>>>>>>>> submission. Following the process of job submission Stephan and
>> >>>>>>>>> Jeff described, the execution environment knows all configs of
>> >>>>>>>>> the cluster and the topology/settings of the job. Ideally, in
>> >>>>>>>>> the main method of the user program, it calls #execute (or a
>> >>>>>>>>> method named #submit), and Flink deploys the cluster, compiles
>> >>>>>>>>> the job graph and submits it to the cluster. However, the
>> >>>>>>>>> current CliFrontend does all these things inside its
>> >>>>>>>>> #runProgram method, which introduces a lot of subclasses of the
>> >>>>>>>>> (stream) execution environment.
>> >>>>>>>>>
>> >>>>>>>>> Actually, it sets up an exec env that hijacks the
>> >>>>>>>>> #execute/executePlan methods, initializes the job graph and
>> >>>>>>>>> aborts execution. Then control flows back to CliFrontend, which
>> >>>>>>>>> deploys the cluster (or retrieves the client) and submits the
>> >>>>>>>>> job graph. This is quite a specific internal process inside
>> >>>>>>>>> Flink and is consistent with nothing else.
>> >>>>>>>>>
>> >>>>>>>>> 2) Deployment of a job cluster couples job graph creation and
>> >>>>>>>>> cluster deployment. Abstractly, going from a user job to a
>> >>>>>>>>> concrete submission requires the following dependency:
>> >>>>>>>>>
>> >>>>>>>>>     create JobGraph --\
>> >>>>>>>>> create ClusterClient -->  submit JobGraph
>> >>>>>>>>>
>> >>>>>>>>> The ClusterClient is created by deploying or retrieving.
>> >>>>>>>>> JobGraph submission requires a compiled JobGraph and a valid
>> >>>>>>>>> ClusterClient, but the creation of the ClusterClient is
>> >>>>>>>>> abstractly independent of that of the JobGraph. However, in job
>> >>>>>>>>> cluster mode, we deploy the job cluster with a job graph, which
>> >>>>>>>>> means we use another process:
>> >>>>>>>>>
>> >>>>>>>>> create JobGraph --> deploy cluster with the JobGraph
>> >>>>>>>>>
>> >>>>>>>>> This is another inconsistency, and downstream projects/client
>> >>>>>>>>> apis are forced to handle different cases with little support
>> >>>>>>>>> from Flink.
>> >>>>>>>>>
>> >>>>>>>>> Since we likely reached a consensus on
>> >>>>>>>>>
>> >>>>>>>>> 1. all configs being gathered in the Flink configuration and
>> >>>>>>>>> passed on
>> >>>>>>>>> 2. the execution environment knowing all configs and handling
>> >>>>>>>>> execution (both deployment and submission)
>> >>>>>>>>>
>> >>>>>>>>> for the issues above I propose eliminating the inconsistencies
>> >>>>>>>>> with the following approach:
>> >>>>>>>>>
>> >>>>>>>>> 1) CliFrontend should be exactly a front end, at least for the
>> >>>>>>>>> "run" command. That means it just gathers and passes all
>> >>>>>>>>> configs from the command line to the main method of the user
>> >>>>>>>>> program. The execution environment knows all the info, and with
>> >>>>>>>>> an added utility for ClusterClient, we gracefully get a
>> >>>>>>>>> ClusterClient by deploying or retrieving. In this way we don't
>> >>>>>>>>> need to hijack the #execute/executePlan methods and can remove
>> >>>>>>>>> the various hacky subclasses of exec env, as well as the #run
>> >>>>>>>>> methods in ClusterClient (for an interface-ized ClusterClient).
>> >>>>>>>>> Now the control flow goes from CliFrontend to the main method
>> >>>>>>>>> and never returns.
>> >>>>>>>>>
>> >>>>>>>>> 2) A job cluster means a cluster for a specific job. From
>> >>>>>>>>> another perspective, it is an ephemeral session. We may
>> >>>>>>>>> decouple the deployment from a compiled job graph, start a
>> >>>>>>>>> session with an idle timeout, and submit the job afterwards.
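The ephemeral-session idea can be sketched as a toy model (all names are illustrative; a real implementation would tie this to the cluster lifecycle and a real idle timeout rather than a simple counter):

```java
// Toy sketch of a job cluster as an "ephemeral session": deploy without a
// job graph, submit afterwards, and shut down once idle. Names are made up.
public class EphemeralSessionSketch {

    static class Session {
        private boolean running = true;
        private int pendingJobs = 0;

        String submit(String jobGraph) {
            if (!running) throw new IllegalStateException("session closed");
            pendingJobs++;
            return "submitted:" + jobGraph;
        }

        void jobFinished() {
            pendingJobs--;
            // Idle-timeout stand-in: close as soon as no jobs remain.
            if (pendingJobs == 0) running = false;
        }

        boolean isRunning() { return running; }
    }

    static String demo() {
        // Deployment is decoupled from any job graph...
        Session session = new Session();
        // ...and submission follows as a separate step.
        String receipt = session.submit("my-job-graph");
        session.jobFinished();
        return receipt + ":" + session.isRunning();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```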
>> >>>>>>>>>
>> >>>>>>>>> These topics are better surfaced and discussed for a consensus
>> >>>>>>>>> before we go into more details on design or implementation.
>> >>>>>>>>>
>> >>>>>>>>> Best,
>> >>>>>>>>> tison.
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> Zili Chen <wa...@gmail.com> 于2019年6月20日周四 上午3:21写道:
>> >>>>>>>>>
>> >>>>>>>>>> Hi Jeff,
>> >>>>>>>>>>
>> >>>>>>>>>> Thanks for raising this thread and the design document!
>> >>>>>>>>>>
>> >>>>>>>>>> As @Thomas Weise mentioned above, extending config to flink
>> >>>>>>>>>> requires far more effort than it should. Another example is
>> >>>>>>>>>> that we achieve detached mode by introducing another execution
>> >>>>>>>>>> environment which also hijacks the #execute method.
>> >>>>>>>>>>
>> >>>>>>>>>> I agree with your idea that the user configures all things
>> >>>>>>>>>> and flink "just" respects it. On this topic I think the
>> >>>>>>>>>> unusual control flow when CliFrontend handles the "run"
>> >>>>>>>>>> command is the problem. It handles several configs, mainly
>> >>>>>>>>>> about cluster settings, and thus the main method of the user
>> >>>>>>>>>> program is unaware of them. It also compiles the app to a job
>> >>>>>>>>>> graph by running the main method with a hijacked exec env,
>> >>>>>>>>>> which constrains the main method further.
>> >>>>>>>>>>
>> >>>>>>>>>> I'd like to write down a few notes on passing and respecting
>> >>>>>>>>>> configs/args, as well as on decoupling job compilation and
>> >>>>>>>>>> submission, and share them on this thread later.
>> >>>>>>>>>>
>> >>>>>>>>>> Best,
>> >>>>>>>>>> tison.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> SHI Xiaogang <sh...@gmail.com> 于2019年6月17日周一
>> >> 下午7:29写道:
>> >>>>>>>>>>
>> >>>>>>>>>>> Hi Jeff and Flavio,
>> >>>>>>>>>>>
>> >>>>>>>>>>> Thanks Jeff a lot for proposing the design document.
>> >>>>>>>>>>>
>> >>>>>>>>>>> We are also working on refactoring ClusterClient to allow
>> >>>>> flexible
>> >>>>>>> and
>> >>>>>>>>>>> efficient job management in our real-time platform.
>> >>>>>>>>>>> We would like to draft a document to share our ideas with
>> >>> you.
>> >>>>>>>>>>>
>> >>>>>>>>>>> I think it's a good idea to have something like Apache Livy
>> >>>>>>>>>>> for Flink, and the efforts discussed here are a great step
>> >>>>>>>>>>> toward it.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Regards,
>> >>>>>>>>>>> Xiaogang
>> >>>>>>>>>>>
>> >>>>>>>>>>> Flavio Pompermaier <po...@okkam.it> 于2019年6月17日周一
>> >>>>> 下午7:13写道:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Is there any possibility to have something like Apache
>> >> Livy
>> >>>> [1]
>> >>>>>>> also
>> >>>>>>>>>>> for
>> >>>>>>>>>>>> Flink in the future?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> [1] https://livy.apache.org/
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <
>> >>> zjffdu@gmail.com
>> >>>>>
>> >>>>>>> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Any API we expose should not have dependencies on
>> >>> the
>> >>>>>>> runtime
>> >>>>>>>>>>>>> (flink-runtime) package or other implementation
>> >> details.
>> >>> To
>> >>>>> me,
>> >>>>>>>> this
>> >>>>>>>>>>>> means
>> >>>>>>>>>>>>> that the current ClusterClient cannot be exposed to
>> >> users
>> >>>>>> because
>> >>>>>>>> it
>> >>>>>>>>>>>> uses
>> >>>>>>>>>>>>> quite some classes from the optimiser and runtime
>> >>> packages.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> We should change ClusterClient from class to interface.
>> >>>>>>>>>>>>> ExecutionEnvironment only use the interface
>> >> ClusterClient
>> >>>>> which
>> >>>>>>>>>>> should be
>> >>>>>>>>>>>>> in flink-clients while the concrete implementation
>> >> class
>> >>>>> could
>> >>>>>> be
>> >>>>>>>> in
>> >>>>>>>>>>>>> flink-runtime.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> What happens when a failure/restart in the client
>> >>>>> happens?
>> >>>>>>>> There
>> >>>>>>>>>>> need
>> >>>>>>>>>>>>> to be a way of re-establishing the connection to the
>> >> job,
>> >>>> set
>> >>>>>> up
>> >>>>>>>> the
>> >>>>>>>>>>>>> listeners again, etc.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Good point.  First we need to define what does
>> >>>>> failure/restart
>> >>>>>> in
>> >>>>>>>> the
>> >>>>>>>>>>>>> client mean. IIUC, that usually mean network failure
>> >>> which
>> >>>>> will
>> >>>>>>>>>>> happen in
>> >>>>>>>>>>>>> class RestClient. If my understanding is correct,
>> >>>>> restart/retry
>> >>>>>>>>>>> mechanism
>> >>>>>>>>>>>>> should be done in RestClient.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Aljoscha Krettek <al...@apache.org> 于2019年6月11日周二
>> >>>>>> 下午11:10写道:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Some points to consider:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> * Any API we expose should not have dependencies on
>> >> the
>> >>>>>> runtime
>> >>>>>>>>>>>>>> (flink-runtime) package or other implementation
>> >>> details.
>> >>>> To
>> >>>>>> me,
>> >>>>>>>>>>> this
>> >>>>>>>>>>>>> means
>> >>>>>>>>>>>>>> that the current ClusterClient cannot be exposed to
>> >>> users
>> >>>>>>> because
>> >>>>>>>>>>> it
>> >>>>>>>>>>>>> uses
>> >>>>>>>>>>>>>> quite some classes from the optimiser and runtime
>> >>>> packages.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> * What happens when a failure/restart in the client
>> >>>>> happens?
>> >>>>>>>> There
>> >>>>>>>>>>> need
>> >>>>>>>>>>>>> to
>> >>>>>>>>>>>>>> be a way of re-establishing the connection to the
>> >> job,
>> >>>> set
>> >>>>> up
>> >>>>>>> the
>> >>>>>>>>>>>>> listeners
>> >>>>>>>>>>>>>> again, etc.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Aljoscha
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On 29. May 2019, at 10:17, Jeff Zhang <
>> >>>> zjffdu@gmail.com>
>> >>>>>>>> wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Sorry folks, the design doc is late as you
>> >> expected.
>> >>>>> Here's
>> >>>>>>> the
>> >>>>>>>>>>>> design
>> >>>>>>>>>>>>>> doc
>> >>>>>>>>>>>>>>> I drafted, welcome any comments and feedback.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org> 于2019年2月14日周四
>> >>>> 下午8:43写道:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Nice that this discussion is happening.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> In the FLIP, we could also revisit the entire role
>> >>> of
>> >>>>> the
>> >>>>>>>>>>>> environments
>> >>>>>>>>>>>>>>>> again.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Initially, the idea was:
>> >>>>>>>>>>>>>>>> - the environments take care of the specific
>> >> setup
>> >>>> for
>> >>>>>>>>>>> standalone
>> >>>>>>>>>>>> (no
>> >>>>>>>>>>>>>>>> setup needed), yarn, mesos, etc.
>> >>>>>>>>>>>>>>>> - the session ones have control over the session.
>> >>> The
>> >>>>>>>>>>> environment
>> >>>>>>>>>>>>> holds
>> >>>>>>>>>>>>>>>> the session client.
>> >>>>>>>>>>>>>>>> - running a job gives a "control" object for that
>> >>>> job.
>> >>>>>> That
>> >>>>>>>>>>>> behavior
>> >>>>>>>>>>>>> is
>> >>>>>>>>>>>>>>>> the same in all environments.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> The actual implementation diverged quite a bit
>> >> from
>> >>>>> that.
>> >>>>>>>> Happy
>> >>>>>>>>>>> to
>> >>>>>>>>>>>>> see a
>> >>>>>>>>>>>>>>>> discussion about straitening this out a bit more.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <
>> >>>>>>> zjffdu@gmail.com>
>> >>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Hi folks,
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Sorry for late response, It seems we reach
>> >>> consensus
>> >>>> on
>> >>>>>>>> this, I
>> >>>>>>>>>>>> will
>> >>>>>>>>>>>>>>>> create
>> >>>>>>>>>>>>>>>>> FLIP for this with more detailed design
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Thomas Weise <th...@apache.org> 于2018年12月21日周五
>> >>>>> 上午11:43写道:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Great to see this discussion seeded! The
>> >> problems
>> >>>> you
>> >>>>>> face
>> >>>>>>>>>>> with
>> >>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>> Zeppelin integration are also affecting other
>> >>>>> downstream
>> >>>>>>>>>>> projects,
>> >>>>>>>>>>>>>> like
>> >>>>>>>>>>>>>>>>>> Beam.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> We just enabled the savepoint restore option in
>> >>>>>>>>>>>>>> RemoteStreamEnvironment
>> >>>>>>>>>>>>>>>>> [1]
>> >>>>>>>>>>>>>>>>>> and that was more difficult than it should be.
>> >> The
>> >>>>> main
>> >>>>>>>> issue
>> >>>>>>>>>>> is
>> >>>>>>>>>>>>> that
>> >>>>>>>>>>>>>>>>>> environment and cluster client aren't decoupled.
>> >>>>> Ideally
>> >>>>>>> it
>> >>>>>>>>>>> should
>> >>>>>>>>>>>>> be
>> >>>>>>>>>>>>>>>>>> possible to just get the matching cluster client
>> >>>> from
>> >>>>>> the
>> >>>>>>>>>>>>> environment
>> >>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>> then control the job through it (environment as
>> >>>>> factory
>> >>>>>>> for
>> >>>>>>>>>>>> cluster
>> >>>>>>>>>>>>>>>>>> client). But note that the environment classes
>> >> are
>> >>>>> part
>> >>>>>> of
>> >>>>>>>> the
>> >>>>>>>>>>>>> public
>> >>>>>>>>>>>>>>>>> API,
>> >>>>>>>>>>>>>>>>>> and it is not straightforward to make larger
>> >>> changes
>> >>>>>>> without
>> >>>>>>>>>>>>> breaking
>> >>>>>>>>>>>>>>>>>> backward compatibility.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> ClusterClient currently exposes internal classes
>> >>>> like
>> >>>>>>>>>>> JobGraph and
>> >>>>>>>>>>>>>>>>>> StreamGraph. But it should be possible to wrap
>> >>> this
>> >>>>>> with a
>> >>>>>>>> new
>> >>>>>>>>>>>>> public
>> >>>>>>>>>>>>>>>> API
>> >>>>>>>>>>>>>>>>>> that brings the required job control
>> >> capabilities
>> >>>> for
>> >>>>>>>>>>> downstream
>> >>>>>>>>>>>>>>>>> projects.
>> >>>>>>>>>>>>>>>>>> Perhaps it is helpful to look at some of the
>> >>>>> interfaces
>> >>>>>> in
>> >>>>>>>>>>> Beam
>> >>>>>>>>>>>>> while
>> >>>>>>>>>>>>>>>>>> thinking about this: [2] for the portable job
>> >> API
>> >>>> and
>> >>>>>> [3]
>> >>>>>>>> for
>> >>>>>>>>>>> the
>> >>>>>>>>>>>>> old
>> >>>>>>>>>>>>>>>>>> asynchronous job control from the Beam Java SDK.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> The backward compatibility discussion [4] is
>> >> also
>> >>>>>> relevant
>> >>>>>>>>>>> here. A
>> >>>>>>>>>>>>> new
>> >>>>>>>>>>>>>>>>> API
>> >>>>>>>>>>>>>>>>>> should shield downstream projects from internals
>> >>> and
>> >>>>>> allow
>> >>>>>>>>>>> them to
>> >>>>>>>>>>>>>>>>>> interoperate with multiple future Flink versions
>> >>> in
>> >>>>> the
>> >>>>>>> same
>> >>>>>>>>>>>> release
>> >>>>>>>>>>>>>>>> line
>> >>>>>>>>>>>>>>>>>> without forced upgrades.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>>>>>>> Thomas
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> [1] https://github.com/apache/flink/pull/7249
>> >>>>>>>>>>>>>>>>>> [2]
>> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
>> >>>>>>>>>>>>>>>>>> [3]
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
>> >>>>>>>>>>>>>>>>>> [4]
>> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <
>> >>>>>>>> zjffdu@gmail.com>
>> >>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> I'm not so sure whether the user should be
>> >>> able
>> >>>> to
>> >>>>>>>> define
>> >>>>>>>>>>>> where
>> >>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>> job
>> >>>>>>>>>>>>>>>>>>> runs (in your example Yarn). This is actually
>> >>>>>> independent
>> >>>>>>>> of
>> >>>>>>>>>>> the
>> >>>>>>>>>>>>> job
>> >>>>>>>>>>>>>>>>>>> development and is something which is decided
>> >> at
>> >>>>>>> deployment
>> >>>>>>>>>>> time.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> User don't need to specify execution mode
>> >>>>>>> programmatically.
>> >>>>>>>>>>> They
>> >>>>>>>>>>>>> can
>> >>>>>>>>>>>>>>>>> also
>> >>>>>>>>>>>>>>>>>>> pass the execution mode from the arguments in
>> >>> flink
>> >>>>> run
>> >>>>>>>>>>> command.
>> >>>>>>>>>>>>> e.g.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> bin/flink run -m yarn-cluster ....
>> >>>>>>>>>>>>>>>>>>> bin/flink run -m local ...
>> >>>>>>>>>>>>>>>>>>> bin/flink run -m host:port ...
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Does this make sense to you ?
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> To me it makes sense that the
>> >>>> ExecutionEnvironment
>> >>>>>> is
>> >>>>>>>> not
>> >>>>>>>>>>>>>>>> directly
>> >>>>>>>>>>>>>>>>>>> initialized by the user and instead context
>> >>>> sensitive
>> >>>>>> how
>> >>>>>>>> you
>> >>>>>>>>>>>> want
>> >>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>>> execute your job (Flink CLI vs. IDE, for
>> >>> example).
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Right, currently I notice Flink would create
>> >>>>> different
>> >>>>>>>>>>>>>>>>>>> ContextExecutionEnvironment based on different
>> >>>>>> submission
>> >>>>>>>>>>>> scenarios
>> >>>>>>>>>>>>>>>>>> (Flink
>> >>>>>>>>>>>>>>>>>>> Cli vs IDE). To me this is kind of hack
>> >> approach,
>> >>>> not
>> >>>>>> so
>> >>>>>>>>>>>>>>>>> straightforward.
>> >>>>>>>>>>>>>>>>>>> What I suggested above is that is that flink
>> >>> should
>> >>>>>>> always
>> >>>>>>>>>>> create
>> >>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>> same
>> >>>>>>>>>>>>>>>>>>> ExecutionEnvironment but with different
>> >>>>> configuration,
>> >>>>>>> and
>> >>>>>>>>>>> based
>> >>>>>>>>>>>> on
>> >>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>> configuration it would create the proper
>> >>>>> ClusterClient
>> >>>>>>> for
>> >>>>>>>>>>>>> different
>> >>>>>>>>>>>>>>>>>>> behaviors.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Till Rohrmann <tr...@apache.org>
>> >>>> 于2018年12月20日周四
>> >>>>>>>>>>> 下午11:18写道:
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> You are probably right that we have code
>> >>>> duplication
>> >>>>>>> when
>> >>>>>>>> it
>> >>>>>>>>>>>> comes
>> >>>>>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>>> creation of the ClusterClient. This should be
>> >>>>> reduced
>> >>>>>> in
>> >>>>>>>> the
>> >>>>>>>>>>>>>>>> future.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> I'm not so sure whether the user should be
>> >> able
>> >>> to
>> >>>>>>> define
>> >>>>>>>>>>> where
>> >>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>> job
>> >>>>>>>>>>>>>>>>>>>> runs (in your example Yarn). This is actually
>> >>>>>>> independent
>> >>>>>>>>>>> of the
>> >>>>>>>>>>>>>>>> job
>> >>>>>>>>>>>>>>>>>>>> development and is something which is decided
>> >> at
>> >>>>>>>> deployment
>> >>>>>>>>>>>> time.
>> >>>>>>>>>>>>>>>> To
>> >>>>>>>>>>>>>>>>> me
>> >>>>>>>>>>>>>>>>>>> it
>> >>>>>>>>>>>>>>>>>>>> makes sense that the ExecutionEnvironment is
>> >> not
>> >>>>>>> directly
>> >>>>>>>>>>>>>>>> initialized
>> >>>>>>>>>>>>>>>>>> by
>> >>>>>>>>>>>>>>>>>>>> the user and instead context sensitive how you
>> >>>> want
>> >>>>> to
>> >>>>>>>>>>> execute
>> >>>>>>>>>>>>> your
>> >>>>>>>>>>>>>>>>> job
>> >>>>>>>>>>>>>>>>>>>> (Flink CLI vs. IDE, for example). However, I
>> >>> agree
>> >>>>>> that
>> >>>>>>>> the
>> >>>>>>>>>>>>>>>>>>>> ExecutionEnvironment should give you access to
>> >>> the
>> >>>>>>>>>>> ClusterClient
>> >>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>>> job (maybe in the form of the JobGraph or a
>> >> job
>> >>>>> plan).
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Cheers,
>> >>>>>>>>>>>>>>>>>>>> Till
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <
>> >>>>>>>>>>> zjffdu@gmail.com>
>> >>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> Hi Till,
>> >>>>>>>>>>>>>>>>>>>>> Thanks for the feedback. You are right that I
>> >>>>> expect
>> >>>>>>>> better
>> >>>>>>>>>>>>>>>>>>> programmatic
>> >>>>>>>>>>>>>>>>>>>>> job submission/control api which could be
>> >> used
>> >>> by
>> >>>>>>>>>>> downstream
>> >>>>>>>>>>>>>>>>> project.
>> >>>>>>>>>>>>>>>>>>> And
>> >>>>>>>>>>>>>>>>>>>>> it would benefit for the flink ecosystem.
>> >> When
>> >>> I
>> >>>>> look
>> >>>>>>> at
>> >>>>>>>>>>> the
>> >>>>>>>>>>>> code
>> >>>>>>>>>>>>>>>>> of
>> >>>>>>>>>>>>>>>>>>>> flink
>> >>>>>>>>>>>>>>>>>>>>> scala-shell and sql-client (I believe they
>> >> are
>> >>>> not
>> >>>>>> the
>> >>>>>>>>>>> core of
>> >>>>>>>>>>>>>>>>> flink,
>> >>>>>>>>>>>>>>>>>>> but
>> >>>>>>>>>>>>>>>>>>>>> belong to the ecosystem of flink), I find
>> >> many
>> >>>>>>> duplicated
>> >>>>>>>>>>> code
>> >>>>>>>>>>>>>>>> for
>> >>>>>>>>>>>>>>>>>>>> creating
>> >>>>>>>>>>>>>>>>>>>>> ClusterClient from user provided
>> >> configuration
>> >>>>>>>>>>> (configuration
>> >>>>>>>>>>>>>>>>> format
>> >>>>>>>>>>>>>>>>>>> may
>> >>>>>>>>>>>>>>>>>>>> be
>> >>>>>>>>>>>>>>>>>>>>> different from scala-shell and sql-client)
>> >> and
>> >>>> then
>> >>>>>> use
>> >>>>>>>>>>> that
>> >>>>>>>>>>>>>>>>>>>> ClusterClient
>> >>>>>>>>>>>>>>>>>>>>> to manipulate jobs. I don't think this is
>> >>>>> convenient
>> >>>>>>> for
>> >>>>>>>>>>>>>>>> downstream
>> >>>>>>>>>>>>>>>>>>>>> projects. What I expect is that downstream
>> >>>> project
>> >>>>>> only
>> >>>>>>>>>>> needs
>> >>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>>> provide
>> >>>>>>>>>>>>>>>>>>>>> necessary configuration info (maybe
>> >> introducing
>> >>>>> class
>> >>>>>>>>>>>> FlinkConf),
>> >>>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>>> then
>> >>>>>>>>>>>>>>>>>>>>> build ExecutionEnvironment based on this
>> >>>> FlinkConf,
>> >>>>>> and
>> >>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment will create the proper
>> >>>>>>>> ClusterClient.
>> >>>>>>>>>>> It
>> >>>>>>>>>>>> not
>> >>>>>>>>>>>>>>>>>> only
>> >>>>>>>>>>>>>>>>>>>>> benefit for the downstream project
>> >> development
>> >>>> but
>> >>>>>> also
>> >>>>>>>> be
>> >>>>>>>>>>>>>>>> helpful
>> >>>>>>>>>>>>>>>>>> for
>> >>>>>>>>>>>>>>>>>>>>> their integration test with flink. Here's one
>> >>>>> sample
>> >>>>>>> code
>> >>>>>>>>>>>> snippet
>> >>>>>>>>>>>>>>>>>> that
>> >>>>>>>>>>>>>>>>>>> I
>> >>>>>>>>>>>>>>>>>>>>> expect.
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> val conf = new FlinkConf().mode("yarn")
>> >>>>>>>>>>>>>>>>>>>>> val env = new ExecutionEnvironment(conf)
>> >>>>>>>>>>>>>>>>>>>>> val jobId = env.submit(...)
>> >>>>>>>>>>>>>>>>>>>>> val jobStatus =
>> >>>>>>>>>>> env.getClusterClient().queryJobStatus(jobId)
>> >>>>>>>>>>>>>>>>>>>>> env.getClusterClient().cancelJob(jobId)
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> What do you think ?
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> Till Rohrmann <tr...@apache.org>
>> >>>>> 于2018年12月11日周二
>> >>>>>>>>>>> 下午6:28写道:
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> Hi Jeff,
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> what you are proposing is to provide the
>> >> user
>> >>>> with
>> >>>>>>>> better
>> >>>>>>>>>>>>>>>>>>> programmatic
>> >>>>>>>>>>>>>>>>>>>>> job
>> >>>>>>>>>>>>>>>>>>>>>> control. There was actually an effort to
>> >>> achieve
>> >>>>>> this
>> >>>>>>>> but
>> >>>>>>>>>>> it
>> >>>>>>>>>>>>>>>> has
>> >>>>>>>>>>>>>>>>>>> never
>> >>>>>>>>>>>>>>>>>>>>> been
>> >>>>>>>>>>>>>>>>>>>>>> completed [1]. However, there are some
>> >>>> improvement
>> >>>>>> in
>> >>>>>>>> the
>> >>>>>>>>>>> code
>> >>>>>>>>>>>>>>>>> base
>> >>>>>>>>>>>>>>>>>>>> now.
>> >>>>>>>>>>>>>>>>>>>>>> Look for example at the NewClusterClient
>> >>>> interface
>> >>>>>>> which
>> >>>>>>>>>>>>>>>> offers a
>> >>>>>>>>>>>>>>>>>>>>>> non-blocking job submission. But I agree
>> >> that
>> >>> we
>> >>>>>> need
>> >>>>>>> to
>> >>>>>>>>>>>>>>>> improve
>> >>>>>>>>>>>>>>>>>>> Flink
>> >>>>>>>>>>>>>>>>>>>> in
>> >>>>>>>>>>>>>>>>>>>>>> this regard.
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> I would not be in favour if exposing all
>> >>>>>> ClusterClient
>> >>>>>>>>>>> calls
>> >>>>>>>>>>>>>>>> via
>> >>>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment because it would
>> >> clutter
>> >>>> the
>> >>>>>>> class
>> >>>>>>>>>>> and
>> >>>>>>>>>>>>>>>> would
>> >>>>>>>>>>>>>>>>>> not
>> >>>>>>>>>>>>>>>>>>>> be
>> >>>>>>>>>>>>>>>>>>>>> a
>> >>>>>>>>>>>>>>>>>>>>>> good separation of concerns. Instead one
>> >> idea
>> >>>>> could
>> >>>>>> be
>> >>>>>>>> to
>> >>>>>>>>>>>>>>>>> retrieve
>> >>>>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>>>>> current ClusterClient from the
>> >>>>> ExecutionEnvironment
>> >>>>>>>> which
>> >>>>>>>>>>> can
>> >>>>>>>>>>>>>>>>> then
>> >>>>>>>>>>>>>>>>>> be
>> >>>>>>>>>>>>>>>>>>>>> used
>> >>>>>>>>>>>>>>>>>>>>>> for cluster and job control. But before we
>> >>> start
>> >>>>> an
>> >>>>>>>> effort
>> >>>>>>>>>>>>>>>> here,
>> >>>>>>>>>>>>>>>>> we
>> >>>>>>>>>>>>>>>>>>>> need
>> >>>>>>>>>>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>>>>>> agree and capture what functionality we want
>> >>> to
>> >>>>>>> provide.
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> Initially, the idea was that we have the
>> >>>>>>>> ClusterDescriptor
>> >>>>>>>>>>>>>>>>>> describing
>> >>>>>>>>>>>>>>>>>>>> how
>> >>>>>>>>>>>>>>>>>>>>>> to talk to cluster manager like Yarn or
>> >> Mesos.
>> >>>> The
>> >>>>>>>>>>>>>>>>>> ClusterDescriptor
>> >>>>>>>>>>>>>>>>>>>> can
>> >>>>>>>>>>>>>>>>>>>>> be
>> >>>>>>>>>>>>>>>>>>>>>> used for deploying Flink clusters (job and
>> >>>>> session)
>> >>>>>>> and
>> >>>>>>>>>>> gives
>> >>>>>>>>>>>>>>>>> you a
>> >>>>>>>>>>>>>>>>>>>>>> ClusterClient. The ClusterClient controls
>> >> the
>> >>>>>> cluster
>> >>>>>>>>>>> (e.g.
>> >>>>>>>>>>>>>>>>>>> submitting
>> >>>>>>>>>>>>>>>>>>>>>> jobs, listing all running jobs). And then
>> >>> there
>> >>>>> was
>> >>>>>>> the
>> >>>>>>>>>>> idea
>> >>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>>>>> introduce a
>> >>>>>>>>>>>>>>>>>>>>>> JobClient which you obtain from the
>> >>>> ClusterClient
>> >>>>> to
>> >>>>>>>>>>> trigger
>> >>>>>>>>>>>>>>>> job
>> >>>>>>>>>>>>>>>>>>>> specific
>> >>>>>>>>>>>>>>>>>>>>>> operations (e.g. taking a savepoint,
>> >>> cancelling
>> >>>>> the
>> >>>>>>>> job).
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> [1]
>> >>>>>> https://issues.apache.org/jira/browse/FLINK-4272
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> Cheers,
>> >>>>>>>>>>>>>>>>>>>>>> Till
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang
>> >> <
>> >>>>>>>>>>> zjffdu@gmail.com
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Hi Folks,
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> I am trying to integrate flink into apache
>> >>>>> zeppelin
>> >>>>>>>>>>> which is
>> >>>>>>>>>>>>>>>> an
>> >>>>>>>>>>>>>>>>>>>>>> interactive
>> >>>>>>>>>>>>>>>>>>>>>>> notebook. And I hit several issues that is
>> >>>> caused
>> >>>>>> by
>> >>>>>>>>>>> flink
>> >>>>>>>>>>>>>>>>> client
>> >>>>>>>>>>>>>>>>>>>> api.
>> >>>>>>>>>>>>>>>>>>>>> So
>> >>>>>>>>>>>>>>>>>>>>>>> I'd like to proposal the following changes
>> >>> for
>> >>>>>> flink
>> >>>>>>>>>>> client
>> >>>>>>>>>>>>>>>>> api.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> 1. Support nonblocking execution.
>> >> Currently,
>> >>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment#execute
>> >>>>>>>>>>>>>>>>>>>>>>> is a blocking method which would do 2
>> >> things,
>> >>>>> first
>> >>>>>>>>>>> submit
>> >>>>>>>>>>>>>>>> job
>> >>>>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>>> then
>> >>>>>>>>>>>>>>>>>>>>>>> wait for job until it is finished. I'd like
>> >>>>>>> introduce a
>> >>>>>>>>>>>>>>>>>> nonblocking
>> >>>>>>>>>>>>>>>>>>>>>>> execution method like
>> >>>> ExecutionEnvironment#submit
>> >>>>>>> which
>> >>>>>>>>>>> only
>> >>>>>>>>>>>>>>>>>> submit
>> >>>>>>>>>>>>>>>>>>>> job
>> >>>>>>>>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>>>>>> then return jobId to client. And allow user
>> >>> to
>> >>>>>> query
>> >>>>>>>> the
>> >>>>>>>>>>> job
>> >>>>>>>>>>>>>>>>>> status
>> >>>>>>>>>>>>>>>>>>>> via
>> >>>>>>>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>>>>>> jobId.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> 2. Add cancel api in
>> >>>>>>>>>>>>>>>>>>>
>> >> ExecutionEnvironment/StreamExecutionEnvironment,
>> >>>>>>>>>>>>>>>>>>>>>>> currently the only way to cancel job is via
>> >>> cli
>> >>>>>>>>>>> (bin/flink),
>> >>>>>>>>>>>>>>>>> this
>> >>>>>>>>>>>>>>>>>>> is
>> >>>>>>>>>>>>>>>>>>>>> not
>> >>>>>>>>>>>>>>>>>>>>>>> convenient for downstream project to use
>> >> this
>> >>>>>>> feature.
>> >>>>>>>>>>> So I'd
>> >>>>>>>>>>>>>>>>>> like
>> >>>>>>>>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>>>>> add
>> >>>>>>>>>>>>>>>>>>>>>>> cancel api in ExecutionEnvironment
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> 3. Add savepoint api in
>> >>>>>>>>>>>>>>>>>>>>>
>> >>> ExecutionEnvironment/StreamExecutionEnvironment.
>> >>>>>>>>>>>>>>>>>>>>>> It
>> >>>>>>>>>>>>>>>>>>>>>>> is similar as cancel api, we should use
>> >>>>>>>>>>> ExecutionEnvironment
>> >>>>>>>>>>>>>>>> as
>> >>>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>>>>> unified
>> >>>>>>>>>>>>>>>>>>>>>>> api for third party to integrate with
>> >> flink.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> 4. Add listener for job execution
>> >> lifecycle.
>> >>>>>>> Something
>> >>>>>>>>>>> like
>> >>>>>>>>>>>>>>>>>>>> following,
>> >>>>>>>>>>>>>>>>>>>>> so
>> >>>>>>>>>>>>>>>>>>>>>>> that downstream project can do custom logic
>> >>> in
>> >>>>> the
>> >>>>>>>>>>> lifecycle
>> >>>>>>>>>>>>>>>> of
>> >>>>>>>>>>>>>>>>>>> job.
>> >>>>>>>>>>>>>>>>>>>>> e.g.
>> >>>>>>>>>>>>>>>>>>>>>>> Zeppelin would capture the jobId after job
>> >> is
>> >>>>>>> submitted
>> >>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>> then
>> >>>>>>>>>>>>>>>>>>> use
>> >>>>>>>>>>>>>>>>>>>>> this
>> >>>>>>>>>>>>>>>>>>>>>>> jobId to cancel it later when necessary.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> public interface JobListener {
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>  void onJobSubmitted(JobID jobId);
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>  void onJobExecuted(JobExecutionResult
>> >>>>> jobResult);
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>  void onJobCanceled(JobID jobId);
>> >>>>>>>>>>>>>>>>>>>>>>> }
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> 5. Enable session in ExecutionEnvironment.
>> >>>>>> Currently
>> >>>>>>> it
>> >>>>>>>>>>> is
>> >>>>>>>>>>>>>>>>>>> disabled,
>> >>>>>>>>>>>>>>>>>>>>> but
>> >>>>>>>>>>>>>>>>>>>>>>> session is very convenient for third party
>> >> to
>> >>>>>>>> submitting
>> >>>>>>>>>>> jobs
>> >>>>>>>>>>>>>>>>>>>>>> continually.
>> >>>>>>>>>>>>>>>>>>>>>>> I hope flink can enable it again.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> 6. Unify all flink client api into
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>> ExecutionEnvironment/StreamExecutionEnvironment.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> This is a long term issue which needs more
>> >>>>> careful
>> >>>>>>>>>>> thinking
>> >>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>>> design.
>> >>>>>>>>>>>>>>>>>>>>>>> Currently some of features of flink is
>> >>> exposed
>> >>>> in
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>> ExecutionEnvironment/StreamExecutionEnvironment,
>> >>>>>> but
>> >>>>>>>>>>> some are
>> >>>>>>>>>>>>>>>>>>> exposed
>> >>>>>>>>>>>>>>>>>>>>> in
>> >>>>>>>>>>>>>>>>>>>>>>> cli instead of api, like the cancel and
>> >>>>> savepoint I
>> >>>>>>>>>>> mentioned
>> >>>>>>>>>>>>>>>>>>> above.
>> >>>>>>>>>>>>>>>>>>>> I
>> >>>>>>>>>>>>>>>>>>>>>>> think the root cause is due to that flink
>> >>>> didn't
>> >>>>>>> unify
>> >>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>>> interaction
>> >>>>>>>>>>>>>>>>>>>>>> with
>> >>>>>>>>>>>>>>>>>>>>>>> flink. Here I list 3 scenarios of flink
>> >>>> operation
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>  - Local job execution.  Flink will create
>> >>>>>>>>>>> LocalEnvironment
>> >>>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>>> then
>> >>>>>>>>>>>>>>>>>>>>>> use
>> >>>>>>>>>>>>>>>>>>>>>>>  this LocalEnvironment to create
>> >>> LocalExecutor
>> >>>>> for
>> >>>>>>> job
>> >>>>>>>>>>>>>>>>>> execution.
>> >>>>>>>>>>>>>>>>>>>>>>>  - Remote job execution. Flink will create
>> >>>>>>>> ClusterClient
>> >>>>>>>>>>>>>>>>> first
>> >>>>>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>>>> then
>> >>>>>>>>>>>>>>>>>>>>>>>  create ContextEnvironment based on the
>> >>>>>>> ClusterClient
>> >>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>> then
>> >>>>>>>>>>>>>>>>>>> run
>> >>>>>>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>>>>>> job.
>> >>>>>>>>>>>>>>>>>>>>>>>  - Job cancelation. Flink will create
>> >>>>>> ClusterClient
>> >>>>>>>>>>> first
>> >>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>> then
>> >>>>>>>>>>>>>>>>>>>>>> cancel
>> >>>>>>>>>>>>>>>>>>>>>>>  this job via this ClusterClient.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> As you can see in the above 3 scenarios.
>> >>> Flink
>> >>>>>> didn't
>> >>>>>>>>>>> use the
>> >>>>>>>>>>>>>>>>>> same
>> >>>>>>>>>>>>>>>>>>>>>>> approach(code path) to interact with flink
>> >>>>>>>>>>>>>>>>>>>>>>> What I propose is following:
>> >>>>>>>>>>>>>>>>>>>>>>> Create the proper
>> >>>>>> LocalEnvironment/RemoteEnvironment
>> >>>>>>>>>>> (based
>> >>>>>>>>>>>>>>>> on
>> >>>>>>>>>>>>>>>>>> user
>> >>>>>>>>>>>>>>>>>>>>>>> configuration) --> Use this Environment to
>> >>>> create
>> >>>>>>>> proper
>> >>>>>>>>>>>>>>>>>>>> ClusterClient
>> >>>>>>>>>>>>>>>>>>>>>>> (LocalClusterClient or RestClusterClient)
>> >> to
>> >>>>>>>> interactive
>> >>>>>>>>>>> with
>> >>>>>>>>>>>>>>>>>>> Flink (
>> >>>>>>>>>>>>>>>>>>>>> job
>> >>>>>>>>>>>>>>>>>>>>>>> execution or cancelation)
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> This way we can unify the process of local
>> >>>>>> execution
>> >>>>>>>> and
>> >>>>>>>>>>>>>>>> remote
>> >>>>>>>>>>>>>>>>>>>>>> execution.
>> >>>>>>>>>>>>>>>>>>>>>>> And it is much easier for third party to
>> >>>>> integrate
>> >>>>>>> with
>> >>>>>>>>>>>>>>>> flink,
>> >>>>>>>>>>>>>>>>>>>> because
>> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment is the unified entry
>> >>> point
>> >>>>> for
>> >>>>>>>>>>> flink.
>> >>>>>>>>>>>>>>>> What
>> >>>>>>>>>>>>>>>>>>> third
>> >>>>>>>>>>>>>>>>>>>>>> party
>> >>>>>>>>>>>>>>>>>>>>>>> needs to do is just pass configuration to
>> >>>>>>>>>>>>>>>> ExecutionEnvironment
>> >>>>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment will do the right
>> >> thing
>> >>>>> based
>> >>>>>> on
>> >>>>>>>> the
>> >>>>>>>>>>>>>>>>>>>>> configuration.
>> >>>>>>>>>>>>>>>>>>>>>>> Flink cli can also be considered as flink
>> >> api
>> >>>>>>> consumer.
>> >>>>>>>>>>> it
>> >>>>>>>>>>>>>>>> just
>> >>>>>>>>>>>>>>>>>>> pass
>> >>>>>>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>>>>>> configuration to ExecutionEnvironment and
>> >> let
>> >>>>>>>>>>>>>>>>>> ExecutionEnvironment
>> >>>>>>>>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>>>>>>> create the proper ClusterClient instead of
>> >>>>> letting
>> >>>>>>> cli
>> >>>>>>>> to
>> >>>>>>>>>>>>>>>>> create
>> >>>>>>>>>>>>>>>>>>>>>>> ClusterClient directly.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> 6 would involve large code refactoring, so
>> >> I
>> >>>>> think
>> >>>>>> we
>> >>>>>>>> can
>> >>>>>>>>>>>>>>>> defer
>> >>>>>>>>>>>>>>>>>> it
>> >>>>>>>>>>>>>>>>>>>> for
>> >>>>>>>>>>>>>>>>>>>>>>> future release, 1,2,3,4,5 could be done at
>> >>>> once I
>> >>>>>>>>>>> believe.
>> >>>>>>>>>>>>>>>> Let
>> >>>>>>>>>>>>>>>>> me
>> >>>>>>>>>>>>>>>>>>>> know
>> >>>>>>>>>>>>>>>>>>>>>> your
>> >>>>>>>>>>>>>>>>>>>>>>> comments and feedback, thanks
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> --
>> >>>>>>>>>>>>>>>>>>>>>>> Best Regards
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> --
>> >>>>>>>>>>>>>>>>>>>>> Best Regards
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> --
>> >>>>>>>>>>>>>>>>>>> Best Regards
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Jeff Zhang
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> --
>> >>>>>>>>>>>>>>>>> Best Regards
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Jeff Zhang
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> --
>> >>>>>>>>>>>>>>> Best Regards
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Jeff Zhang
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> --
>> >>>>>>>>>>>>> Best Regards
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Jeff Zhang
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>
>> >>>>>>>> --
>> >>>>>>>> Best Regards
>> >>>>>>>>
>> >>>>>>>> Jeff Zhang
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Best Regards
>> >>>>>
>> >>>>> Jeff Zhang
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >
>> >
>> > --
>> > Best Regards
>> >
>> > Jeff Zhang
>>
>>

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Zili Chen <wa...@gmail.com>.
Hi Aljoscha,

Thanks for your reply and participation. The Google Doc you linked to
requires permission; I think you could use a share link instead.

I agree that we have almost reached a consensus that a JobClient is necessary
to interact with a running job.

Let me address your open questions one by one.

1. Separate cluster creation and job submission for per-job mode.

As you mentioned, this is where opinions diverge. My document describes an
alternative[2] that proposes excluding per-job deployment from the client API
scope, and I now find the exclusion more reasonable.

In per-job mode, a dedicated JobCluster is launched to execute the
specific job. It behaves more like a Flink application than a submission
of a Flink job. The client only takes care of job submission and assumes
there is an existing cluster. In this way we can consider per-job issues
individually, and JobClusterEntrypoint would be the utility class for
per-job deployment.

Nevertheless, the user program works in both session mode and per-job mode
without code changes. The JobClient in per-job mode is returned from
env.execute as normal. However, it would no longer be a wrapper of
RestClusterClient but a wrapper of PerJobClusterClient, which communicates
with the Dispatcher locally.
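To make that shape concrete, here is a minimal sketch of the idea. All names
here (JobClient, RestClusterClient, PerJobClusterClient) and the String
stand-in for JobGraph are assumptions drawn from this proposal, not existing
Flink API:

```java
import java.util.concurrent.CompletableFuture;

// Proposal-level sketch: env.execute() hands back the same JobClient type in
// both modes; only the backing ClusterClient implementation differs.
interface ClusterClient {
    CompletableFuture<String> submitJob(String jobGraph);
}

// Session mode: submits to an existing cluster, e.g. over REST.
class RestClusterClient implements ClusterClient {
    public CompletableFuture<String> submitJob(String jobGraph) {
        return CompletableFuture.completedFuture("REST-submitted: " + jobGraph);
    }
}

// Per-job mode: submits to the local Dispatcher of the dedicated JobCluster.
class PerJobClusterClient implements ClusterClient {
    public CompletableFuture<String> submitJob(String jobGraph) {
        return CompletableFuture.completedFuture("locally-submitted: " + jobGraph);
    }
}

// The uniform handle the user sees in either mode.
class JobClient {
    private final ClusterClient client;

    JobClient(ClusterClient client) {
        this.client = client;
    }

    CompletableFuture<String> submit(String jobGraph) {
        return client.submitJob(jobGraph);
    }
}

public class JobClientSketch {
    public static void main(String[] args) {
        // User code is identical; only the injected client changes per mode.
        JobClient sessionMode = new JobClient(new RestClusterClient());
        JobClient perJobMode = new JobClient(new PerJobClusterClient());
        System.out.println(sessionMode.submit("wordcount").join());
        System.out.println(perJobMode.submit("wordcount").join());
    }
}
```

The point of the sketch is only that the user-facing type stays the same
while the transport behind it is swapped per deployment mode.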

2. How to deal with plan preview.

With the env.compile functions, users can get the JobGraph or FlinkPlan and
thus preview the plan programmatically. Typically it looks like:

if (previewConfigured) {
    FlinkPlan plan = env.compile();          // compile only, do not execute
    new JSONDumpGenerator(...).dump(plan);   // render the plan, e.g. as JSON
} else {
    env.execute();
}

With that, the `flink info` command would no longer be needed.

3. How to deal with Jar Submission at the Web Frontend.

There is one more thread discussing this topic [1]. Apart from removing
the functions, there are two alternatives.

One is to introduce an interface with a method that returns the
JobGraph/FlinkPlan, and have Jar Submission only support main classes that
implement this interface. The JobGraph/FlinkPlan is then extracted simply
by calling the method. In this way, it is even possible to consider a
separation of job creation and job submission.

The other is, as you mentioned, to let execute() do the actual execution.
We would not execute the main method in the WebFrontend but spawn a process
on the WebMonitor side to execute it. For the return value, we could
generate the JobID on the WebMonitor and pass it to the execution
environment.
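A minimal sketch of the first alternative. The interface name
JobGraphProvider and the String stand-in for JobGraph/FlinkPlan are
assumptions for illustration only, not existing Flink API:

```java
// Proposal-level sketch: a main class implements this interface so the
// WebFrontend can obtain the job graph through a plain method call instead
// of hijacking execute().
interface JobGraphProvider {
    String getJobGraph();
}

// Example user program: job creation is now separate from job submission.
class WordCountProgram implements JobGraphProvider {
    @Override
    public String getJobGraph() {
        // A real program would compile its plan here, e.g. via env.compile().
        return "jobGraph(WordCount)";
    }
}

public class JarSubmissionSketch {
    // What the WebFrontend would do on Jar Submission: extract, then submit.
    static String extractJobGraph(JobGraphProvider program) {
        return program.getJobGraph();
    }

    public static void main(String[] args) {
        System.out.println(extractJobGraph(new WordCountProgram()));
    }
}
```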

4. How to deal with detached mode.

I think detached mode is a temporary solution for non-blocking submission.
In my document, both submission and execution return a CompletableFuture and
users control whether or not to wait for the result. With that, we no longer
need a detached option, but the functionality is still covered.
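A small sketch of how a future-based submission covers both behaviours; the
method names here are assumptions, not existing Flink API:

```java
import java.util.concurrent.CompletableFuture;

// Proposal-level sketch: submission returns a future, so "detached" vs
// "attached" is simply whether the caller waits on it.
public class DetachedModeSketch {

    static CompletableFuture<String> submit(String jobName) {
        // Simulate asynchronous execution of the submitted job.
        return CompletableFuture.supplyAsync(() -> jobName + ": FINISHED");
    }

    public static void main(String[] args) {
        CompletableFuture<String> result = submit("wordcount");

        // Detached behaviour: return here without touching the future.
        // Attached behaviour: block until the job result is available.
        System.out.println(result.join());
    }
}
```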

5. How does per-job mode interact with interactive programming.

All of the YARN, Mesos and Kubernetes scenarios now follow the pattern of
launching a JobCluster. I don't think there would be any inconsistency
between the different resource managers.

Best,
tison.

[1]
https://lists.apache.org/x/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
[2]
https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?disco=AAAADZaGGfs

Aljoscha Krettek <al...@apache.org> wrote on Fri, Aug 16, 2019 at 9:20 PM:

> Hi,
>
> I read both Jeff's initial design document and the newer document by Tison.
> I also finally found the time to collect our thoughts on the issue, I had
> quite some discussions with Kostas and this is the result: [1].
>
> I think overall we agree that this part of the code is in dire need of
> some refactoring/improvements but I think there are still some open
> questions and some differences in opinion what those refactorings should
> look like.
>
> I think the API-side is quite clear, i.e. we need some JobClient API that
> allows interacting with a running Job. It could be worthwhile to spin that
> off into a separate FLIP because we can probably find consensus on that
> part more easily.
>
> For the rest, the main open questions from our doc are these:
>
>   - Do we want to separate cluster creation and job submission for per-job
> mode? In the past, there were conscious efforts to *not* separate job
> submission from cluster creation for per-job clusters for Mesos, YARN,
> Kubernetes (see StandaloneJobClusterEntryPoint). Tison suggests in his
> design document to decouple this in order to unify job submission.
>
>   - How to deal with plan preview, which needs to hijack execute() and let
> the outside code catch an exception?
>
>   - How to deal with Jar Submission at the Web Frontend, which needs to
> hijack execute() and let the outside code catch an exception?
> CliFrontend.run() “hijacks” ExecutionEnvironment.execute() to get a
> JobGraph and then execute that JobGraph manually. We could get around that
> by letting execute() do the actual execution. One caveat for this is that
> now the main() method doesn’t return (or is forced to return by throwing an
> exception from execute()) which means that for Jar Submission from the
> WebFrontend we have a long-running main() method running in the
> WebFrontend. This doesn’t sound very good. We could get around this by
> removing the plan preview feature and by removing Jar Submission/Running.
>
>   - How to deal with detached mode? Right now, DetachedEnvironment will
> execute the job and return immediately. If users control when they want to
> return, by waiting on the job completion future, how do we deal with this?
> Do we simply remove the distinction between detached/non-detached?
>
>   - How does per-job mode interact with “interactive programming”
> (FLIP-36). For YARN, each execute() call could spawn a new Flink YARN
> cluster. What about Mesos and Kubernetes?
>
> The first open question is where the opinions diverge, I think. The rest
> are just open questions and interesting things that we need to consider.
>
> Best,
> Aljoscha
>
> [1]
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> <
> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix
> >
>
> > On 31. Jul 2019, at 15:23, Jeff Zhang <zj...@gmail.com> wrote:
> >
> > Thanks tison for the effort. I left a few comments.
> >
> >
> > Zili Chen <wa...@gmail.com> wrote on Wed, Jul 31, 2019 at 8:24 PM:
> >
> >> Hi Flavio,
> >>
> >> Thanks for your reply.
> >>
> >> Either current impl and in the design, ClusterClient
> >> never takes responsibility for generating JobGraph.
> >> (what you see in current codebase is several class methods)
> >>
> >> Instead, user describes his program in the main method
> >> with ExecutionEnvironment apis and calls env.compile()
> >> or env.optimize() to get FlinkPlan and JobGraph respectively.
> >>
> >> For listing main classes in a jar and choose one for
> >> submission, you're now able to customize a CLI to do it.
> >> Specifically, the path of jar is passed as arguments and
> >> in the customized CLI you list main classes, choose one
> >> to submit to the cluster.
> >>
> >> Best,
> >> tison.
> >>
> >>
> >> Flavio Pompermaier <po...@okkam.it> wrote on Wed, Jul 31, 2019 at 8:12 PM:
> >>
> >>> Just one note on my side: it is not clear to me whether the client
> needs
> >> to
> >>> be able to generate a job graph or not.
> >>> In my opinion, the job jar must resides only on the server/jobManager
> >> side
> >>> and the client requires a way to get the job graph.
> >>> If you really want to access to the job graph, I'd add a dedicated
> method
> >>> on the ClusterClient. like:
> >>>
> >>>   - getJobGraph(jarId, mainClass): JobGraph
> >>>   - listMainClasses(jarId): List<String>
> >>>
> >>> These would require some addition also on the job manager endpoint as
> >>> well..what do you think?
> >>>
> >>> On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <wa...@gmail.com>
> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> Here is a document[1] on client api enhancement from our perspective.
> >>>> We have investigated current implementations. And we propose
> >>>>
> >>>> 1. Unify the implementation of cluster deployment and job submission
> in
> >>>> Flink.
> >>>> 2. Provide programmatic interfaces to allow flexible job and cluster
> >>>> management.
> >>>>
> >>>> The first proposal is aimed at reducing code paths of cluster
> >> deployment
> >>>> and
> >>>> job submission so that one can adopt Flink in his usage easily. The
> >>> second
> >>>> proposal is aimed at providing rich interfaces for advanced users
> >>>> who want to make accurate control of these stages.
> >>>>
> >>>> Quick reference on open questions:
> >>>>
> >>>> 1. Exclude job cluster deployment from client side or redefine the
> >>> semantic
> >>>> of job cluster? Since it fits in a process quite different from
> session
> >>>> cluster deployment and job submission.
> >>>>
> >>>> 2. Maintain the codepaths handling class o.a.f.api.common.Program or
> >>>> implement customized program handling logic by customized CliFrontend?
> >>>> See also this thread[2] and the document[1].
> >>>>
> >>>> 3. Expose ClusterClient as public api or just expose api in
> >>>> ExecutionEnvironment
> >>>> and delegate them to ClusterClient? Further, in either way is it worth
> >> to
> >>>> introduce a JobClient which is an encapsulation of ClusterClient that
> >>>> associated to specific job?
> >>>>
> >>>> Best,
> >>>> tison.
> >>>>
> >>>> [1]
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
> >>>> [2]
> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
> >>>>
> >>>> Jeff Zhang <zj...@gmail.com> wrote on Wed, Jul 24, 2019 at 9:19 AM:
> >>>>
> >>>>> Thanks Stephan, I will follow up this issue in next few weeks, and
> >> will
> >>>>> refine the design doc. We could discuss more details after 1.9
> >> release.
> >>>>>
> >>>>> Stephan Ewen <se...@apache.org> wrote on Wed, Jul 24, 2019 at 12:58 AM:
> >>>>>
> >>>>>> Hi all!
> >>>>>>
> >>>>>> This thread has stalled for a bit, which I assume ist mostly due to
> >>> the
> >>>>>> Flink 1.9 feature freeze and release testing effort.
> >>>>>>
> >>>>>> I personally still recognize this issue as one important to be
> >>> solved.
> >>>>> I'd
> >>>>>> be happy to help resume this discussion soon (after the 1.9
> >> release)
> >>>> and
> >>>>>> see if we can do some step towards this in Flink 1.10.
> >>>>>>
> >>>>>> Best,
> >>>>>> Stephan
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <
> >>>>> pompermaier@okkam.it>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> That's exactly what I suggested a long time ago: the Flink REST
> >>>> client
> >>>>>>> should not require any Flink dependency, only http library to
> >> call
> >>>> the
> >>>>>> REST
> >>>>>>> services to submit and monitor a job.
> >>>>>>> What I suggested also in [1] was to have a way to automatically
> >>>> suggest
> >>>>>> the
> >>>>>>> user (via a UI) the available main classes and their required
> >>>>>>> parameters[2].
> >>>>>>> Another problem we have with Flink is that the Rest client and
> >> the
> >>>> CLI
> >>>>>> one
> >>>>>>> behaves differently and we use the CLI client (via ssh) because
> >> it
> >>>>> allows
> >>>>>>> to call some other method after env.execute() [3] (we have to
> >> call
> >>>>>> another
> >>>>>>> REST service to signal the end of the job).
> >>>>>>>> In this regard, a dedicated interface, like the JobListener
> >>> suggested
> >>>>> in
> >>>>>>> the previous emails, would be very helpful (IMHO).
> >>>>>>>
> >>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-10864
> >>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-10862
> >>>>>>> [3] https://issues.apache.org/jira/browse/FLINK-10879
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Flavio
> >>>>>>>
> >>>>>>> On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zj...@gmail.com>
> >>> wrote:
> >>>>>>>
> >>>>>>>> Hi, Tison,
> >>>>>>>>
> >>>>>>>> Thanks for your comments. Overall I agree with you that it is
> >>>>> difficult
> >>>>>>> for
> >>>>>>>> down stream project to integrate with flink and we need to
> >>> refactor
> >>>>> the
> >>>>>>>> current flink client api.
> >>>>>>>> And I agree that CliFrontend should only parsing command line
> >>>>> arguments
> >>>>>>> and
> >>>>>>>> then pass them to ExecutionEnvironment. It is
> >>>> ExecutionEnvironment's
> >>>>>>>> responsibility to compile job, create cluster, and submit job.
> >>>>> Besides
> >>>>>>>> that, Currently flink has many ExecutionEnvironment
> >>>> implementations,
> >>>>>> and
> >>>>>>>> flink will use the specific one based on the context. IMHO, it
> >> is
> >>>> not
> >>>>>>>> necessary, ExecutionEnvironment should be able to do the right
> >>>> thing
> >>>>>>> based
> >>>>>>>> on the FlinkConf it is received. Too many ExecutionEnvironment
> >>>>>>>> implementation is another burden for downstream project
> >>>> integration.
> >>>>>>>>
> >>>>>>>> One thing I'd like to mention is flink's scala shell and sql
> >>>> client,
> >>>>>>>> although they are sub-modules of flink, they could be treated
> >> as
> >>>>>>> downstream
> >>>>>>>> project which use flink's client api. Currently you will find
> >> it
> >>> is
> >>>>> not
> >>>>>>>> easy for them to integrate with flink, they share many
> >> duplicated
> >>>>> code.
> >>>>>>> It
> >>>>>>>> is another sign that we should refactor flink client api.
> >>>>>>>>
> >>>>>>>> I believe it is a large and hard change, and I am afraid we can
> >>> not
> >>>>>> keep
> >>>>>>>> compatibility since many of changes are user facing.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Zili Chen <wa...@gmail.com> wrote on Mon, Jun 24, 2019 at 2:53 PM:
> >>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> After a closer look on our client apis, I can see there are
> >> two
> >>>>> major
> >>>>>>>>> issues to consistency and integration, namely different
> >>>> deployment
> >>>>> of
> >>>>>>>>> job cluster which couples job graph creation and cluster
> >>>>> deployment,
> >>>>>>>>> and submission via CliFrontend confusing control flow of job
> >>>> graph
> >>>>>>>>> compilation and job submission. I'd like to follow the
> >> discuss
> >>>>> above,
> >>>>>>>>> mainly the process described by Jeff and Stephan, and share
> >> my
> >>>>>>>>> ideas on these issues.
> >>>>>>>>>
> >>>>>>>>> 1) CliFrontend confuses the control flow of job compilation
> >> and
> >>>>>>>> submission.
> >>>>>>>>> Following the process of job submission Stephan and Jeff
> >>>> described,
> >>>>>>>>> execution environment knows all configs of the cluster and
> >>>>>>> topos/settings
> >>>>>>>>> of the job. Ideally, in the main method of user program, it
> >>> calls
> >>>>>>>> #execute
> >>>>>>>>> (or named #submit) and Flink deploys the cluster, compile the
> >>> job
> >>>>>> graph
> >>>>>>>>> and submit it to the cluster. However, current CliFrontend
> >> does
> >>>> all
> >>>>>>> these
> >>>>>>>>> things inside its #runProgram method, which introduces a lot
> >> of
> >>>>>>>> subclasses
> >>>>>>>>> of (stream) execution environment.
> >>>>>>>>>
> >>>>>>>>> Actually, it sets up an exec env that hijacks the
> >>>>>> #execute/executePlan
> >>>>>>>>> method, initializes the job graph and abort execution. And
> >> then
> >>>>>>>>> control flow back to CliFrontend, it deploys the cluster(or
> >>>>> retrieve
> >>>>>>>>> the client) and submits the job graph. This is quite a
> >> specific
> >>>>>>> internal
> >>>>>>>>> process inside Flink and none of consistency to anything.
> >>>>>>>>>
> >>>>>>>>> 2) Deployment of job cluster couples job graph creation and
> >>>> cluster
> >>>>>>>>> deployment. Abstractly, from user job to a concrete
> >> submission,
> >>>> it
> >>>>>>>> requires
> >>>>>>>>>
> >>>>>>>>>     create JobGraph --\
> >>>>>>>>>
> >>>>>>>>> create ClusterClient -->  submit JobGraph
> >>>>>>>>>
> >>>>>>>>> such a dependency. ClusterClient was created by deploying or
> >>>>>>> retrieving.
> >>>>>>>>> JobGraph submission requires a compiled JobGraph and valid
> >>>>>>> ClusterClient,
> >>>>>>>>> but the creation of ClusterClient is abstractly independent
> >> of
> >>>> that
> >>>>>> of
> >>>>>>>>> JobGraph. However, in job cluster mode, we deploy job cluster
> >>>> with
> >>>>> a
> >>>>>>> job
> >>>>>>>>> graph, which means we use another process:
> >>>>>>>>>
> >>>>>>>>> create JobGraph --> deploy cluster with the JobGraph
> >>>>>>>>>
> >>>>>>>>> Here is another inconsistency and downstream projects/client
> >>> apis
> >>>>> are
> >>>>>>>>> forced to handle different cases with rare supports from
> >> Flink.
> >>>>>>>>>
> >>>>>>>>> Since we likely reached a consensus on
> >>>>>>>>>
> >>>>>>>>> 1. all configs gathered by Flink configuration and passed
> >>>>>>>>> 2. execution environment knows all configs and handles
> >>>>> execution(both
> >>>>>>>>> deployment and submission)
> >>>>>>>>>
> >>>>>>>>> to the issues above I propose eliminating inconsistencies by
> >>>>>> following
> >>>>>>>>> approach:
> >>>>>>>>>
> >>>>>>>>> 1) CliFrontend should exactly be a front end, at least for
> >>> "run"
> >>>>>>> command.
> >>>>>>>>> That means it just gathered and passed all config from
> >> command
> >>>> line
> >>>>>> to
> >>>>>>>>> the main method of user program. Execution environment knows
> >>> all
> >>>>> the
> >>>>>>> info
> >>>>>>>>> and with an addition to utils for ClusterClient, we
> >> gracefully
> >>>> get
> >>>>> a
> >>>>>>>>> ClusterClient by deploying or retrieving. In this way, we
> >> don't
> >>>>> need
> >>>>>> to
> >>>>>>>>> hijack #execute/executePlan methods and can remove various
> >>>> hacking
> >>>>>>>>> subclasses of exec env, as well as #run methods in
> >>>>> ClusterClient(for
> >>>>>> an
> >>>>>>>>> interface-ized ClusterClient). Now the control flow flows
> >> from
> >>>>>>>> CliFrontend
> >>>>>>>>> to the main method and never returns.
> >>>>>>>>>
> >>>>>>>>> 2) Job cluster means a cluster for the specific job. From
> >>> another
> >>>>>>>>> perspective, it is an ephemeral session. We may decouple the
> >>>>>> deployment
> >>>>>>>>> with a compiled job graph, but start a session with idle
> >>> timeout
> >>>>>>>>> and submit the job following.
> >>>>>>>>>
> >>>>>>>>> These topics, before we go into more details on design or
> >>>>>>> implementation,
> >>>>>>>>> are better to be aware and discussed for a consensus.
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> tison.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Zili Chen <wa...@gmail.com> wrote on Thu, Jun 20, 2019 at 3:21 AM:
> >>>>>>>>>
> >>>>>>>>>> Hi Jeff,
> >>>>>>>>>>
> >>>>>>>>>> Thanks for raising this thread and the design document!
> >>>>>>>>>>
> >>>>>>>>>> As @Thomas Weise mentioned above, extending config to flink
> >>>>>>>>>> requires far more effort than it should be. Another example
> >>>>>>>>>> is we achieve detach mode by introduce another execution
> >>>>>>>>>> environment which also hijack #execute method.
> >>>>>>>>>>
> >>>>>>>>>> I agree with your idea that user would configure all things
> >>>>>>>>>> and flink "just" respect it. On this topic I think the
> >> unusual
> >>>>>>>>>> control flow when CliFrontend handle "run" command is the
> >>>> problem.
> >>>>>>>>>> It handles several configs, mainly about cluster settings,
> >> and
> >>>>>>>>>> thus main method of user program is unaware of them. Also it
> >>>>>> compiles
> >>>>>>>>>> app to job graph by run the main method with a hijacked exec
> >>>> env,
> >>>>>>>>>> which constrain the main method further.
> >>>>>>>>>>
> >>>>>>>>>> I'd like to write down a few of notes on configs/args pass
> >> and
> >>>>>>> respect,
> >>>>>>>>>> as well as decoupling job compilation and submission. Share
> >> on
> >>>>> this
> >>>>>>>>>> thread later.
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> tison.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> SHI Xiaogang <sh...@gmail.com> wrote on Mon, Jun 17, 2019 at 7:29 PM:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Jeff and Flavio,
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks Jeff a lot for proposing the design document.
> >>>>>>>>>>>
> >>>>>>>>>>> We are also working on refactoring ClusterClient to allow
> >>>>> flexible
> >>>>>>> and
> >>>>>>>>>>> efficient job management in our real-time platform.
> >>>>>>>>>>> We would like to draft a document to share our ideas with
> >>> you.
> >>>>>>>>>>>
> >>>>>>>>>>> I think it's a good idea to have something like Apache Livy
> >>> for
> >>>>>>> Flink,
> >>>>>>>>>>> and
> >>>>>>>>>>> the efforts discussed here will take a great step forward
> >> to
> >>>> it.
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Xiaogang
> >>>>>>>>>>>
> >>>>>>>>>>> Flavio Pompermaier <po...@okkam.it> wrote on Mon, Jun 17, 2019 at 7:13 PM:
> >>>>>>>>>>>
> >>>>>>>>>>>> Is there any possibility to have something like Apache
> >> Livy
> >>>> [1]
> >>>>>>> also
> >>>>>>>>>>> for
> >>>>>>>>>>>> Flink in the future?
> >>>>>>>>>>>>
> >>>>>>>>>>>> [1] https://livy.apache.org/
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <
> >>> zjffdu@gmail.com
> >>>>>
> >>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Any API we expose should not have dependencies on
> >>> the
> >>>>>>> runtime
> >>>>>>>>>>>>> (flink-runtime) package or other implementation
> >> details.
> >>> To
> >>>>> me,
> >>>>>>>> this
> >>>>>>>>>>>> means
> >>>>>>>>>>>>> that the current ClusterClient cannot be exposed to
> >> users
> >>>>>> because
> >>>>>>>> it
> >>>>>>>>>>>> uses
> >>>>>>>>>>>>> quite some classes from the optimiser and runtime
> >>> packages.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> We should change ClusterClient from class to interface.
> >>>>>>>>>>>>> ExecutionEnvironment only use the interface
> >> ClusterClient
> >>>>> which
> >>>>>>>>>>> should be
> >>>>>>>>>>>>> in flink-clients while the concrete implementation
> >> class
> >>>>> could
> >>>>>> be
> >>>>>>>> in
> >>>>>>>>>>>>> flink-runtime.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> What happens when a failure/restart in the client
> >>>>> happens?
> >>>>>>>> There
> >>>>>>>>>>> need
> >>>>>>>>>>>>> to be a way of re-establishing the connection to the
> >> job,
> >>>> set
> >>>>>> up
> >>>>>>>> the
> >>>>>>>>>>>>> listeners again, etc.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Good point.  First we need to define what does
> >>>>> failure/restart
> >>>>>> in
> >>>>>>>> the
> >>>>>>>>>>>>> client mean. IIUC, that usually mean network failure
> >>> which
> >>>>> will
> >>>>>>>>>>> happen in
> >>>>>>>>>>>>> class RestClient. If my understanding is correct,
> >>>>> restart/retry
> >>>>>>>>>>> mechanism
> >>>>>>>>>>>>> should be done in RestClient.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Aljoscha Krettek <al...@apache.org> wrote on Tue, Jun 11, 2019 at 11:10 PM:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Some points to consider:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> * Any API we expose should not have dependencies on
> >> the
> >>>>>> runtime
> >>>>>>>>>>>>>> (flink-runtime) package or other implementation
> >>> details.
> >>>> To
> >>>>>> me,
> >>>>>>>>>>> this
> >>>>>>>>>>>>> means
> >>>>>>>>>>>>>> that the current ClusterClient cannot be exposed to
> >>> users
> >>>>>>> because
> >>>>>>>>>>> it
> >>>>>>>>>>>>> uses
> >>>>>>>>>>>>>> quite some classes from the optimiser and runtime
> >>>> packages.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> * What happens when a failure/restart in the client
> >>>>> happens?
> >>>>>>>> There
> >>>>>>>>>>> need
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>> be a way of re-establishing the connection to the
> >> job,
> >>>> set
> >>>>> up
> >>>>>>> the
> >>>>>>>>>>>>> listeners
> >>>>>>>>>>>>>> again, etc.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Aljoscha
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 29. May 2019, at 10:17, Jeff Zhang <
> >>>> zjffdu@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Sorry folks, the design doc is late as you
> >> expected.
> >>>>> Here's
> >>>>>>> the
> >>>>>>>>>>>> design
> >>>>>>>>>>>>>> doc
> >>>>>>>>>>>>>>> I drafted, welcome any comments and feedback.
> >>>>>>>>>>>>>>>
> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org> wrote on Thu, Feb 14, 2019 at 8:43 PM:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Nice that this discussion is happening.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> In the FLIP, we could also revisit the entire role
> >>> of
> >>>>> the
> >>>>>>>>>>>> environments
> >>>>>>>>>>>>>>>> again.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Initially, the idea was:
> >>>>>>>>>>>>>>>> - the environments take care of the specific
> >> setup
> >>>> for
> >>>>>>>>>>> standalone
> >>>>>>>>>>>> (no
> >>>>>>>>>>>>>>>> setup needed), yarn, mesos, etc.
> >>>>>>>>>>>>>>>> - the session ones have control over the session.
> >>> The
> >>>>>>>>>>> environment
> >>>>>>>>>>>>> holds
> >>>>>>>>>>>>>>>> the session client.
> >>>>>>>>>>>>>>>> - running a job gives a "control" object for that
> >>>> job.
> >>>>>> That
> >>>>>>>>>>>> behavior
> >>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>> the same in all environments.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The actual implementation diverged quite a bit
> >> from
> >>>>> that.
> >>>>>>>> Happy
> >>>>>>>>>>> to
> >>>>>>>>>>>>> see a
> >>>>>>>>>>>>>>>> discussion about straightening this out a bit more.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <
> >>>>>>> zjffdu@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi folks,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Sorry for late response, It seems we reach
> >>> consensus
> >>>> on
> >>>>>>>> this, I
> >>>>>>>>>>>> will
> >>>>>>>>>>>>>>>> create
> >>>>>>>>>>>>>>>>> FLIP for this with more detailed design
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thomas Weise <th...@apache.org> wrote on Fri, Dec 21, 2018 at 11:43 AM:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Great to see this discussion seeded! The
> >> problems
> >>>> you
> >>>>>> face
> >>>>>>>>>>> with
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> Zeppelin integration are also affecting other
> >>>>> downstream
> >>>>>>>>>>> projects,
> >>>>>>>>>>>>>> like
> >>>>>>>>>>>>>>>>>> Beam.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> We just enabled the savepoint restore option in
> >>>>>>>>>>>>>> RemoteStreamEnvironment
> >>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>> and that was more difficult than it should be.
> >> The
> >>>>> main
> >>>>>>>> issue
> >>>>>>>>>>> is
> >>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>> environment and cluster client aren't decoupled.
> >>>>> Ideally
> >>>>>>> it
> >>>>>>>>>>> should
> >>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>> possible to just get the matching cluster client
> >>>> from
> >>>>>> the
> >>>>>>>>>>>>> environment
> >>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>> then control the job through it (environment as
> >>>>> factory
> >>>>>>> for
> >>>>>>>>>>>> cluster
> >>>>>>>>>>>>>>>>>> client). But note that the environment classes
> >> are
> >>>>> part
> >>>>>> of
> >>>>>>>> the
> >>>>>>>>>>>>> public
> >>>>>>>>>>>>>>>>> API,
> >>>>>>>>>>>>>>>>>> and it is not straightforward to make larger
> >>> changes
> >>>>>>> without
> >>>>>>>>>>>>> breaking
> >>>>>>>>>>>>>>>>>> backward compatibility.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> ClusterClient currently exposes internal classes
> >>>> like
> >>>>>>>>>>> JobGraph and
> >>>>>>>>>>>>>>>>>> StreamGraph. But it should be possible to wrap
> >>> this
> >>>>>> with a
> >>>>>>>> new
> >>>>>>>>>>>>> public
> >>>>>>>>>>>>>>>> API
> >>>>>>>>>>>>>>>>>> that brings the required job control
> >> capabilities
> >>>> for
> >>>>>>>>>>> downstream
> >>>>>>>>>>>>>>>>> projects.
> >>>>>>>>>>>>>>>>>> Perhaps it is helpful to look at some of the
> >>>>> interfaces
> >>>>>> in
> >>>>>>>>>>> Beam
> >>>>>>>>>>>>> while
> >>>>>>>>>>>>>>>>>> thinking about this: [2] for the portable job
> >> API
> >>>> and
> >>>>>> [3]
> >>>>>>>> for
> >>>>>>>>>>> the
> >>>>>>>>>>>>> old
> >>>>>>>>>>>>>>>>>> asynchronous job control from the Beam Java SDK.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> The backward compatibility discussion [4] is
> >> also
> >>>>>> relevant
> >>>>>>>>>>> here. A
> >>>>>>>>>>>>> new
> >>>>>>>>>>>>>>>>> API
> >>>>>>>>>>>>>>>>>> should shield downstream projects from internals
> >>> and
> >>>>>> allow
> >>>>>>>>>>> them to
> >>>>>>>>>>>>>>>>>> interoperate with multiple future Flink versions
> >>> in
> >>>>> the
> >>>>>>> same
> >>>>>>>>>>>> release
> >>>>>>>>>>>>>>>> line
> >>>>>>>>>>>>>>>>>> without forced upgrades.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>> Thomas
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> [1] https://github.com/apache/flink/pull/7249
> >>>>>>>>>>>>>>>>>> [2]
> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> >>>>>>>>>>>>>>>>>> [3]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> >>>>>>>>>>>>>>>>>> [4]
> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <
> >>>>>>>> zjffdu@gmail.com>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I'm not so sure whether the user should be
> >>> able
> >>>> to
> >>>>>>>> define
> >>>>>>>>>>>> where
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> job
> >>>>>>>>>>>>>>>>>>> runs (in your example Yarn). This is actually
> >>>>>> independent
> >>>>>>>> of
> >>>>>>>>>>> the
> >>>>>>>>>>>>> job
> >>>>>>>>>>>>>>>>>>> development and is something which is decided
> >> at
> >>>>>>> deployment
> >>>>>>>>>>> time.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> User don't need to specify execution mode
> >>>>>>> programmatically.
> >>>>>>>>>>> They
> >>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>> also
> >>>>>>>>>>>>>>>>>>> pass the execution mode from the arguments in
> >>> flink
> >>>>> run
> >>>>>>>>>>> command.
> >>>>>>>>>>>>> e.g.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> bin/flink run -m yarn-cluster ....
> >>>>>>>>>>>>>>>>>>> bin/flink run -m local ...
> >>>>>>>>>>>>>>>>>>> bin/flink run -m host:port ...
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Does this make sense to you ?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> To me it makes sense that the
> >>>> ExecutionEnvironment
> >>>>>> is
> >>>>>>>> not
> >>>>>>>>>>>>>>>> directly
> >>>>>>>>>>>>>>>>>>> initialized by the user and instead context
> >>>> sensitive
> >>>>>> how
> >>>>>>>> you
> >>>>>>>>>>>> want
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> execute your job (Flink CLI vs. IDE, for
> >>> example).
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Right, currently I notice Flink would create
> >>>>> different
> >>>>>>>>>>>>>>>>>>> ContextExecutionEnvironment based on different
> >>>>>> submission
> >>>>>>>>>>>> scenarios
> >>>>>>>>>>>>>>>>>> (Flink
> >>>>>>>>>>>>>>>>>>> Cli vs IDE). To me this is kind of hack
> >> approach,
> >>>> not
> >>>>>> so
> >>>>>>>>>>>>>>>>> straightforward.
> >>>>>>>>>>>>>>>>>>> What I suggested above is that is that flink
> >>> should
> >>>>>>> always
> >>>>>>>>>>> create
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> same
> >>>>>>>>>>>>>>>>>>> ExecutionEnvironment but with different
> >>>>> configuration,
> >>>>>>> and
> >>>>>>>>>>> based
> >>>>>>>>>>>> on
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> configuration it would create the proper
> >>>>> ClusterClient
> >>>>>>> for
> >>>>>>>>>>>>> different
> >>>>>>>>>>>>>>>>>>> behaviors.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Till Rohrmann <tr...@apache.org> wrote on Thu, Dec 20, 2018 at 11:18 PM:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> You are probably right that we have code
> >>>> duplication
> >>>>>>> when
> >>>>>>>> it
> >>>>>>>>>>>> comes
> >>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>> creation of the ClusterClient. This should be
> >>>>> reduced
> >>>>>> in
> >>>>>>>> the
> >>>>>>>>>>>>>>>> future.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I'm not so sure whether the user should be
> >> able
> >>> to
> >>>>>>> define
> >>>>>>>>>>> where
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> job
> >>>>>>>>>>>>>>>>>>>> runs (in your example Yarn). This is actually
> >>>>>>> independent
> >>>>>>>>>>> of the
> >>>>>>>>>>>>>>>> job
> >>>>>>>>>>>>>>>>>>>> development and is something which is decided
> >> at
> >>>>>>>> deployment
> >>>>>>>>>>>> time.
> >>>>>>>>>>>>>>>> To
> >>>>>>>>>>>>>>>>> me
> >>>>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>> makes sense that the ExecutionEnvironment is
> >> not
> >>>>>>> directly
> >>>>>>>>>>>>>>>> initialized
> >>>>>>>>>>>>>>>>>> by
> >>>>>>>>>>>>>>>>>>>> the user and instead context sensitive how you
> >>>> want
> >>>>> to
> >>>>>>>>>>> execute
> >>>>>>>>>>>>> your
> >>>>>>>>>>>>>>>>> job
> >>>>>>>>>>>>>>>>>>>> (Flink CLI vs. IDE, for example). However, I
> >>> agree
> >>>>>> that
> >>>>>>>> the
> >>>>>>>>>>>>>>>>>>>> ExecutionEnvironment should give you access to
> >>> the
> >>>>>>>>>>> ClusterClient
> >>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>> job (maybe in the form of the JobGraph or a
> >> job
> >>>>> plan).
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>> Till
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <
> >>>>>>>>>>> zjffdu@gmail.com>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Hi Till,
> >>>>>>>>>>>>>>>>>>>>> Thanks for the feedback. You are right that I
> >>>>> expect
> >>>>>>>> better
> >>>>>>>>>>>>>>>>>>> programmatic
> >>>>>>>>>>>>>>>>>>>>> job submission/control api which could be
> >> used
> >>> by
> >>>>>>>>>>> downstream
> >>>>>>>>>>>>>>>>> project.
> >>>>>>>>>>>>>>>>>>> And
> >>>>>>>>>>>>>>>>>>>>> it would benefit for the flink ecosystem.
> >> When
> >>> I
> >>>>> look
> >>>>>>> at
> >>>>>>>>>>> the
> >>>>>>>>>>>> code
> >>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>> flink
> >>>>>>>>>>>>>>>>>>>>> scala-shell and sql-client (I believe they
> >> are
> >>>> not
> >>>>>> the
> >>>>>>>>>>> core of
> >>>>>>>>>>>>>>>>> flink,
> >>>>>>>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>>>>> belong to the ecosystem of flink), I find
> >> many
> >>>>>>> duplicated
> >>>>>>>>>>> code
> >>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>> creating
> >>>>>>>>>>>>>>>>>>>>> ClusterClient from user provided
> >> configuration
> >>>>>>>>>>> (configuration
> >>>>>>>>>>>>>>>>> format
> >>>>>>>>>>>>>>>>>>> may
> >>>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>> different from scala-shell and sql-client)
> >> and
> >>>> then
> >>>>>> use
> >>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>> ClusterClient
> >>>>>>>>>>>>>>>>>>>>> to manipulate jobs. I don't think this is
> >>>>> convenient
> >>>>>>> for
> >>>>>>>>>>>>>>>> downstream
> >>>>>>>>>>>>>>>>>>>>> projects. What I expect is that downstream
> >>>> project
> >>>>>> only
> >>>>>>>>>>> needs
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> provide
> >>>>>>>>>>>>>>>>>>>>> necessary configuration info (maybe
> >> introducing
> >>>>> class
> >>>>>>>>>>>> FlinkConf),
> >>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>> then
> >>>>>>>>>>>>>>>>>>>>> build ExecutionEnvironment based on this
> >>>> FlinkConf,
> >>>>>> and
> >>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment will create the proper
> >>>>>>>> ClusterClient.
> >>>>>>>>>>> It
> >>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>> only
> >>>>>>>>>>>>>>>>>>>>> benefit for the downstream project
> >> development
> >>>> but
> >>>>>> also
> >>>>>>>> be
> >>>>>>>>>>>>>>>> helpful
> >>>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>> their integration test with flink. Here's one
> >>>>> sample
> >>>>>>> code
> >>>>>>>>>>>> snippet
> >>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>> I
> >>>>>>>>>>>>>>>>>>>>> expect.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> val conf = new FlinkConf().mode("yarn")
> >>>>>>>>>>>>>>>>>>>>> val env = new ExecutionEnvironment(conf)
> >>>>>>>>>>>>>>>>>>>>> val jobId = env.submit(...)
> >>>>>>>>>>>>>>>>>>>>> val jobStatus =
> >>>>>>>>>>> env.getClusterClient().queryJobStatus(jobId)
> >>>>>>>>>>>>>>>>>>>>> env.getClusterClient().cancelJob(jobId)
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> What do you think ?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Till Rohrmann <tr...@apache.org>
> >>>>> 于2018年12月11日周二
> >>>>>>>>>>> 下午6:28写道:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Hi Jeff,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> what you are proposing is to provide the
> >> user
> >>>> with
> >>>>>>>> better
> >>>>>>>>>>>>>>>>>>> programmatic
> >>>>>>>>>>>>>>>>>>>>> job
> >>>>>>>>>>>>>>>>>>>>>> control. There was actually an effort to
> >>> achieve
> >>>>>> this
> >>>>>>>> but
> >>>>>>>>>>> it
> >>>>>>>>>>>>>>>> has
> >>>>>>>>>>>>>>>>>>> never
> >>>>>>>>>>>>>>>>>>>>> been
> >>>>>>>>>>>>>>>>>>>>>> completed [1]. However, there are some
> >>>> improvement
> >>>>>> in
> >>>>>>>> the
> >>>>>>>>>>> code
> >>>>>>>>>>>>>>>>> base
> >>>>>>>>>>>>>>>>>>>> now.
> >>>>>>>>>>>>>>>>>>>>>> Look for example at the NewClusterClient
> >>>> interface
> >>>>>>> which
> >>>>>>>>>>>>>>>> offers a
> >>>>>>>>>>>>>>>>>>>>>> non-blocking job submission. But I agree
> >> that
> >>> we
> >>>>>> need
> >>>>>>> to
> >>>>>>>>>>>>>>>> improve
> >>>>>>>>>>>>>>>>>>> Flink
> >>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>> this regard.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I would not be in favour if exposing all
> >>>>>> ClusterClient
> >>>>>>>>>>> calls
> >>>>>>>>>>>>>>>> via
> >>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment because it would
> >> clutter
> >>>> the
> >>>>>>> class
> >>>>>>>>>>> and
> >>>>>>>>>>>>>>>> would
> >>>>>>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>>> good separation of concerns. Instead one
> >> idea
> >>>>> could
> >>>>>> be
> >>>>>>>> to
> >>>>>>>>>>>>>>>>> retrieve
> >>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> current ClusterClient from the
> >>>>> ExecutionEnvironment
> >>>>>>>> which
> >>>>>>>>>>> can
> >>>>>>>>>>>>>>>>> then
> >>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>> used
> >>>>>>>>>>>>>>>>>>>>>> for cluster and job control. But before we
> >>> start
> >>>>> an
> >>>>>>>> effort
> >>>>>>>>>>>>>>>> here,
> >>>>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>>> need
> >>>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>> agree and capture what functionality we want
> >>> to
> >>>>>>> provide.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Initially, the idea was that we have the
> >>>>>>>> ClusterDescriptor
> >>>>>>>>>>>>>>>>>> describing
> >>>>>>>>>>>>>>>>>>>> how
> >>>>>>>>>>>>>>>>>>>>>> to talk to cluster manager like Yarn or
> >> Mesos.
> >>>> The
> >>>>>>>>>>>>>>>>>> ClusterDescriptor
> >>>>>>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>> used for deploying Flink clusters (job and
> >>>>> session)
> >>>>>>> and
> >>>>>>>>>>> gives
> >>>>>>>>>>>>>>>>> you a
> >>>>>>>>>>>>>>>>>>>>>> ClusterClient. The ClusterClient controls
> >> the
> >>>>>> cluster
> >>>>>>>>>>> (e.g.
> >>>>>>>>>>>>>>>>>>> submitting
> >>>>>>>>>>>>>>>>>>>>>> jobs, listing all running jobs). And then
> >>> there
> >>>>> was
> >>>>>>> the
> >>>>>>>>>>> idea
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>> introduce a
> >>>>>>>>>>>>>>>>>>>>>> JobClient which you obtain from the
> >>>> ClusterClient
> >>>>> to
> >>>>>>>>>>> trigger
> >>>>>>>>>>>>>>>> job
> >>>>>>>>>>>>>>>>>>>> specific
> >>>>>>>>>>>>>>>>>>>>>> operations (e.g. taking a savepoint,
> >>> cancelling
> >>>>> the
> >>>>>>>> job).
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>> https://issues.apache.org/jira/browse/FLINK-4272
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>>>> Till
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang
> >> <
> >>>>>>>>>>> zjffdu@gmail.com
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Hi Folks,
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> I am trying to integrate flink into apache
> >>>>> zeppelin
> >>>>>>>>>>> which is
> >>>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>>>>>>> interactive
> >>>>>>>>>>>>>>>>>>>>>>> notebook. And I hit several issues that is
> >>>> caused
> >>>>>> by
> >>>>>>>>>>> flink
> >>>>>>>>>>>>>>>>> client
> >>>>>>>>>>>>>>>>>>>> api.
> >>>>>>>>>>>>>>>>>>>>> So
> >>>>>>>>>>>>>>>>>>>>>>> I'd like to proposal the following changes
> >>> for
> >>>>>> flink
> >>>>>>>>>>> client
> >>>>>>>>>>>>>>>>> api.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> 1. Support nonblocking execution.
> >> Currently,
> >>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment#execute
> >>>>>>>>>>>>>>>>>>>>>>> is a blocking method which would do 2
> >> things,
> >>>>> first
> >>>>>>>>>>> submit
> >>>>>>>>>>>>>>>> job
> >>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>> then
> >>>>>>>>>>>>>>>>>>>>>>> wait for job until it is finished. I'd like
> >>>>>>> introduce a
> >>>>>>>>>>>>>>>>>> nonblocking
> >>>>>>>>>>>>>>>>>>>>>>> execution method like
> >>>> ExecutionEnvironment#submit
> >>>>>>> which
> >>>>>>>>>>> only
> >>>>>>>>>>>>>>>>>> submit
> >>>>>>>>>>>>>>>>>>>> job
> >>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>> then return jobId to client. And allow user
> >>> to
> >>>>>> query
> >>>>>>>> the
> >>>>>>>>>>> job
> >>>>>>>>>>>>>>>>>> status
> >>>>>>>>>>>>>>>>>>>> via
> >>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>> jobId.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> 2. Add cancel api in
> >>>>>>>>>>>>>>>>>>>
> >> ExecutionEnvironment/StreamExecutionEnvironment,
> >>>>>>>>>>>>>>>>>>>>>>> currently the only way to cancel job is via
> >>> cli
> >>>>>>>>>>> (bin/flink),
> >>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>>>>>>> convenient for downstream project to use
> >> this
> >>>>>>> feature.
> >>>>>>>>>>> So I'd
> >>>>>>>>>>>>>>>>>> like
> >>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>> add
> >>>>>>>>>>>>>>>>>>>>>>> cancel api in ExecutionEnvironment
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> 3. Add savepoint api in
> >>>>>>>>>>>>>>>>>>>>>
> >>> ExecutionEnvironment/StreamExecutionEnvironment.
> >>>>>>>>>>>>>>>>>>>>>> It
> >>>>>>>>>>>>>>>>>>>>>>> is similar as cancel api, we should use
> >>>>>>>>>>> ExecutionEnvironment
> >>>>>>>>>>>>>>>> as
> >>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> unified
> >>>>>>>>>>>>>>>>>>>>>>> api for third party to integrate with
> >> flink.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> 4. Add listener for job execution
> >> lifecycle.
> >>>>>>> Something
> >>>>>>>>>>> like
> >>>>>>>>>>>>>>>>>>>> following,
> >>>>>>>>>>>>>>>>>>>>> so
> >>>>>>>>>>>>>>>>>>>>>>> that downstream project can do custom logic
> >>> in
> >>>>> the
> >>>>>>>>>>> lifecycle
> >>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>> job.
> >>>>>>>>>>>>>>>>>>>>> e.g.
> >>>>>>>>>>>>>>>>>>>>>>> Zeppelin would capture the jobId after job
> >> is
> >>>>>>> submitted
> >>>>>>>>>>> and
> >>>>>>>>>>>>>>>>> then
> >>>>>>>>>>>>>>>>>>> use
> >>>>>>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>>>>> jobId to cancel it later when necessary.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> public interface JobListener {
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>  void onJobSubmitted(JobID jobId);
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>  void onJobExecuted(JobExecutionResult
> >>>>> jobResult);
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>  void onJobCanceled(JobID jobId);
> >>>>>>>>>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> 5. Enable session in ExecutionEnvironment.
> >>>>>> Currently
> >>>>>>> it
> >>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>> disabled,
> >>>>>>>>>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>>>>>>> session is very convenient for third party
> >> to
> >>>>>>>> submitting
> >>>>>>>>>>> jobs
> >>>>>>>>>>>>>>>>>>>>>> continually.
> >>>>>>>>>>>>>>>>>>>>>>> I hope flink can enable it again.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> 6. Unify all flink client api into
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>> ExecutionEnvironment/StreamExecutionEnvironment.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> This is a long term issue which needs more
> >>>>> careful
> >>>>>>>>>>> thinking
> >>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>> design.
> >>>>>>>>>>>>>>>>>>>>>>> Currently some of features of flink is
> >>> exposed
> >>>> in
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>> ExecutionEnvironment/StreamExecutionEnvironment,
> >>>>>> but
> >>>>>>>>>>> some are
> >>>>>>>>>>>>>>>>>>> exposed
> >>>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>> cli instead of api, like the cancel and
> >>>>> savepoint I
> >>>>>>>>>>> mentioned
> >>>>>>>>>>>>>>>>>>> above.
> >>>>>>>>>>>>>>>>>>>> I
> >>>>>>>>>>>>>>>>>>>>>>> think the root cause is due to that flink
> >>>> didn't
> >>>>>>> unify
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>> interaction
> >>>>>>>>>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>>>>>> flink. Here I list 3 scenarios of flink
> >>>> operation
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>  - Local job execution.  Flink will create
> >>>>>>>>>>> LocalEnvironment
> >>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>> then
> >>>>>>>>>>>>>>>>>>>>>> use
> >>>>>>>>>>>>>>>>>>>>>>>  this LocalEnvironment to create
> >>> LocalExecutor
> >>>>> for
> >>>>>>> job
> >>>>>>>>>>>>>>>>>> execution.
> >>>>>>>>>>>>>>>>>>>>>>>  - Remote job execution. Flink will create
> >>>>>>>> ClusterClient
> >>>>>>>>>>>>>>>>> first
> >>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>> then
> >>>>>>>>>>>>>>>>>>>>>>>  create ContextEnvironment based on the
> >>>>>>> ClusterClient
> >>>>>>>>>>> and
> >>>>>>>>>>>>>>>>> then
> >>>>>>>>>>>>>>>>>>> run
> >>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>> job.
> >>>>>>>>>>>>>>>>>>>>>>>  - Job cancelation. Flink will create
> >>>>>> ClusterClient
> >>>>>>>>>>> first
> >>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> then
> >>>>>>>>>>>>>>>>>>>>>> cancel
> >>>>>>>>>>>>>>>>>>>>>>>  this job via this ClusterClient.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> As you can see in the above 3 scenarios.
> >>> Flink
> >>>>>> didn't
> >>>>>>>>>>> use the
> >>>>>>>>>>>>>>>>>> same
> >>>>>>>>>>>>>>>>>>>>>>> approach(code path) to interact with flink
> >>>>>>>>>>>>>>>>>>>>>>> What I propose is following:
> >>>>>>>>>>>>>>>>>>>>>>> Create the proper
> >>>>>> LocalEnvironment/RemoteEnvironment
> >>>>>>>>>>> (based
> >>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>> user
> >>>>>>>>>>>>>>>>>>>>>>> configuration) --> Use this Environment to
> >>>> create
> >>>>>>>> proper
> >>>>>>>>>>>>>>>>>>>> ClusterClient
> >>>>>>>>>>>>>>>>>>>>>>> (LocalClusterClient or RestClusterClient)
> >> to
> >>>>>>>> interactive
> >>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>> Flink (
> >>>>>>>>>>>>>>>>>>>>> job
> >>>>>>>>>>>>>>>>>>>>>>> execution or cancelation)
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> This way we can unify the process of local
> >>>>>> execution
> >>>>>>>> and
> >>>>>>>>>>>>>>>> remote
> >>>>>>>>>>>>>>>>>>>>>> execution.
> >>>>>>>>>>>>>>>>>>>>>>> And it is much easier for third party to
> >>>>> integrate
> >>>>>>> with
> >>>>>>>>>>>>>>>> flink,
> >>>>>>>>>>>>>>>>>>>> because
> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment is the unified entry
> >>> point
> >>>>> for
> >>>>>>>>>>> flink.
> >>>>>>>>>>>>>>>> What
> >>>>>>>>>>>>>>>>>>> third
> >>>>>>>>>>>>>>>>>>>>>> party
> >>>>>>>>>>>>>>>>>>>>>>> needs to do is just pass configuration to
> >>>>>>>>>>>>>>>> ExecutionEnvironment
> >>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment will do the right
> >> thing
> >>>>> based
> >>>>>> on
> >>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> configuration.
> >>>>>>>>>>>>>>>>>>>>>>> Flink cli can also be considered as flink
> >> api
> >>>>>>> consumer.
> >>>>>>>>>>> it
> >>>>>>>>>>>>>>>> just
> >>>>>>>>>>>>>>>>>>> pass
> >>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>> configuration to ExecutionEnvironment and
> >> let
> >>>>>>>>>>>>>>>>>> ExecutionEnvironment
> >>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>> create the proper ClusterClient instead of
> >>>>> letting
> >>>>>>> cli
> >>>>>>>> to
> >>>>>>>>>>>>>>>>> create
> >>>>>>>>>>>>>>>>>>>>>>> ClusterClient directly.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> 6 would involve large code refactoring, so
> >> I
> >>>>> think
> >>>>>> we
> >>>>>>>> can
> >>>>>>>>>>>>>>>> defer
> >>>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>>>> future release, 1,2,3,4,5 could be done at
> >>>> once I
> >>>>>>>>>>> believe.
> >>>>>>>>>>>>>>>> Let
> >>>>>>>>>>>>>>>>> me
> >>>>>>>>>>>>>>>>>>>> know
> >>>>>>>>>>>>>>>>>>>>>> your
> >>>>>>>>>>>>>>>>>>>>>>> comments and feedback, thanks
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>>>> Best Regards
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>> Best Regards
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Jeff Zhang
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>> Best Regards
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Jeff Zhang
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>> Best Regards
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Jeff Zhang
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>> Best Regards
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Jeff Zhang
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>> Best Regards
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Jeff Zhang
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Best Regards
> >>>>>>>>
> >>>>>>>> Jeff Zhang
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Best Regards
> >>>>>
> >>>>> Jeff Zhang
> >>>>>
> >>>>
> >>>
> >>
> >
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
>
>

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Aljoscha Krettek <al...@apache.org>.
Hi,

I read both Jeff's initial design document and the newer document by Tison. I also finally found the time to collect our thoughts on the issue; after quite some discussions with Kostas, this is the result: [1].

I think overall we agree that this part of the code is in dire need of some refactoring/improvements, but there are still some open questions and some differences in opinion about what those refactorings should look like.

I think the API-side is quite clear, i.e. we need some JobClient API that allows interacting with a running Job. It could be worthwhile to spin that off into a separate FLIP because we can probably find consensus on that part more easily.
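To make the JobClient idea concrete, here is a minimal sketch of what such an API could look like. All names (JobClient, getJobStatus, cancel, stopWithSavepoint) are illustrative, not an actual Flink interface; every operation returns a future so callers decide whether to block, and a trivial in-memory stub stands in for a running job.

```java
import java.util.concurrent.CompletableFuture;

public class JobClientSketch {

    // Hypothetical job-scoped client API; method names are illustrative.
    interface JobClient {
        String getJobId();
        CompletableFuture<String> getJobStatus();               // e.g. "RUNNING"
        CompletableFuture<Void> cancel();
        CompletableFuture<String> stopWithSavepoint(String targetDirectory);
    }

    // In-memory stub: shows how downstream code would interact with a
    // running job without ever touching a ClusterClient directly.
    static JobClient stub(String jobId) {
        return new JobClient() {
            private volatile String status = "RUNNING";
            public String getJobId() { return jobId; }
            public CompletableFuture<String> getJobStatus() {
                return CompletableFuture.completedFuture(status);
            }
            public CompletableFuture<Void> cancel() {
                status = "CANCELED";
                return CompletableFuture.completedFuture(null);
            }
            public CompletableFuture<String> stopWithSavepoint(String dir) {
                status = "FINISHED";
                return CompletableFuture.completedFuture(dir + "/savepoint-" + jobId);
            }
        };
    }

    public static void main(String[] args) throws Exception {
        JobClient client = stub("a1b2c3");
        System.out.println(client.getJobStatus().get());  // RUNNING
        client.cancel().get();
        System.out.println(client.getJobStatus().get());  // CANCELED
    }
}
```

The point of the sketch is the shape, not the implementation: downstream projects like Zeppelin or Beam program against the small job-scoped interface and never see how the cluster connection is established.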

For the rest, the main open questions from our doc are these:

  - Do we want to separate cluster creation and job submission for per-job mode? In the past, there were conscious efforts to *not* separate job submission from cluster creation for per-job clusters on Mesos, YARN, and Kubernetes (see StandaloneJobClusterEntryPoint). Tison suggests in his design document decoupling this in order to unify job submission.

  - How to deal with plan preview, which needs to hijack execute() and let the outside code catch an exception?

  - How to deal with Jar Submission at the Web Frontend, which needs to hijack execute() and let the outside code catch an exception? CliFrontend.run() “hijacks” ExecutionEnvironment.execute() to get a JobGraph and then execute that JobGraph manually. We could get around that by letting execute() do the actual execution. One caveat for this is that now the main() method doesn’t return (or is forced to return by throwing an exception from execute()) which means that for Jar Submission from the WebFrontend we have a long-running main() method running in the WebFrontend. This doesn’t sound very good. We could get around this by removing the plan preview feature and by removing Jar Submission/Running.

  - How to deal with detached mode? Right now, DetachedEnvironment will execute the job and return immediately. If users control when they want to return, by waiting on the job completion future, how do we deal with this? Do we simply remove the distinction between detached/non-detached?

  - How does per-job mode interact with “interactive programming” (FLIP-36). For YARN, each execute() call could spawn a new Flink YARN cluster. What about Mesos and Kubernetes?
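Taking the detached-mode question above as an example: one way to remove the distinction entirely is to make submission always return a future, so "detached" vs. "attached" becomes the caller's choice rather than two environment subclasses. A minimal sketch under that assumption (submitJob and JobResult are illustrative stand-ins, not Flink API):

```java
import java.util.concurrent.CompletableFuture;

public class DetachedModeSketch {

    static final class JobResult {
        final String jobId;
        final boolean success;
        JobResult(String jobId, boolean success) {
            this.jobId = jobId;
            this.success = success;
        }
    }

    // Stand-in for an asynchronous submission: returns immediately
    // with a handle, while the job "runs" in the background.
    static CompletableFuture<JobResult> submitJob(String jobId) {
        return CompletableFuture.supplyAsync(() -> new JobResult(jobId, true));
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<JobResult> submission = submitJob("job-42");

        // "Detached": keep only the handle and return to the caller
        // immediately -- we simply never block on the future.

        // "Attached": explicitly wait for job completion.
        JobResult result = submission.get();
        System.out.println(result.jobId + " success=" + result.success);
    }
}
```

With this shape, DetachedEnvironment would not need to exist: the same submission path serves both behaviors.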

The first open question is where the opinions diverge, I think. The rest are just open questions and interesting things that we need to consider.

Best,
Aljoscha

[1] https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix <https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.na7k0ad88tix>

> On 31. Jul 2019, at 15:23, Jeff Zhang <zj...@gmail.com> wrote:
> 
> Thanks tison for the effort. I left a few comments.
> 
> 
> Zili Chen <wa...@gmail.com> 于2019年7月31日周三 下午8:24写道:
> 
>> Hi Flavio,
>> 
>> Thanks for your reply.
>> 
>> In both the current implementation and the design, the ClusterClient
>> never takes responsibility for generating the JobGraph
>> (what you see in the current codebase are just a few static helper methods).
>> 
>> Instead, the user describes the program in the main method
>> with ExecutionEnvironment APIs and calls env.compile()
>> or env.optimize() to get the FlinkPlan and JobGraph respectively.
>> 
>> For listing the main classes in a jar and choosing one for
>> submission, you can already customize a CLI to do it.
>> Specifically, the path of the jar is passed as an argument, and
>> in the customized CLI you list the main classes and choose one
>> to submit to the cluster.
>> 
>> Best,
>> tison.
>> 
>> 
>> Flavio Pompermaier <po...@okkam.it> 于2019年7月31日周三 下午8:12写道:
>> 
>>> Just one note on my side: it is not clear to me whether the client needs
>>> to be able to generate a job graph or not.
>>> In my opinion, the job jar must reside only on the server/jobManager
>>> side, and the client requires a way to get the job graph.
>>> If you really want access to the job graph, I'd add a dedicated method
>>> on the ClusterClient, like:
>>> 
>>>   - getJobGraph(jarId, mainClass): JobGraph
>>>   - listMainClasses(jarId): List<String>
>>> 
>>> These would require some additions on the job manager endpoint as
>>> well. What do you think?
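A minimal sketch of what those two proposed methods could look like, with an in-memory stub in place of a real JobManager endpoint. Everything here is illustrative (these methods do not exist on Flink's ClusterClient), and the JobGraph is elided to a String for brevity:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class JarAwareClientSketch {

    // Sketch of the proposed additions: the jar lives on the server side
    // and is referenced by id; the client only queries what can be run.
    interface JarAwareClusterClient {
        List<String> listMainClasses(String jarId);
        String getJobGraph(String jarId, String mainClass);
    }

    static JarAwareClusterClient stub() {
        final Map<String, List<String>> mains =
            Map.of("jar-1", Arrays.asList("com.acme.BatchJob", "com.acme.StreamJob"));
        return new JarAwareClusterClient() {
            public List<String> listMainClasses(String jarId) {
                return mains.getOrDefault(jarId, List.of());
            }
            public String getJobGraph(String jarId, String mainClass) {
                if (!listMainClasses(jarId).contains(mainClass)) {
                    throw new IllegalArgumentException("unknown main class: " + mainClass);
                }
                return "JobGraph(" + jarId + ", " + mainClass + ")";
            }
        };
    }

    public static void main(String[] args) {
        JarAwareClusterClient client = stub();
        System.out.println(client.listMainClasses("jar-1"));
        System.out.println(client.getJobGraph("jar-1", "com.acme.BatchJob"));
    }
}
```

The design choice worth noting is that the client never needs the jar's bytes locally; it only ever exchanges identifiers with the cluster, which is what makes a UI-driven main-class picker feasible.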
>>> 
>>> On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <wa...@gmail.com> wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> Here is a document[1] on client api enhancement from our perspective.
>>>> We have investigated current implementations. And we propose
>>>> 
>>>> 1. Unify the implementation of cluster deployment and job submission in
>>>> Flink.
>>>> 2. Provide programmatic interfaces to allow flexible job and cluster
>>>> management.
>>>> 
>>>> The first proposal is aimed at reducing code paths of cluster
>> deployment
>>>> and
>>>> job submission so that one can adopt Flink in his usage easily. The
>>> second
>>>> proposal is aimed at providing rich interfaces for advanced users
>>>> who want to make accurate control of these stages.
>>>> 
>>>> Quick reference on open questions:
>>>> 
>>>> 1. Exclude job cluster deployment from client side or redefine the
>>> semantic
>>>> of job cluster? Since it fits in a process quite different from session
>>>> cluster deployment and job submission.
>>>> 
>>>> 2. Maintain the codepaths handling class o.a.f.api.common.Program or
>>>> implement customized program handling logic by customized CliFrontend?
>>>> See also this thread[2] and the document[1].
>>>> 
>>>> 3. Expose ClusterClient as public api or just expose api in
>>>> ExecutionEnvironment
>>>> and delegate them to ClusterClient? Further, in either way is it worth
>> to
>>>> introduce a JobClient which is an encapsulation of ClusterClient that
>>>> associated to specific job?
>>>> 
>>>> Best,
>>>> tison.
>>>> 
>>>> [1]
>>>> 
>>>> 
>>> 
>> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
>>>> [2]
>>>> 
>>>> 
>>> 
>> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
>>>> 
>>>> Jeff Zhang <zj...@gmail.com> 于2019年7月24日周三 上午9:19写道:
>>>> 
>>>>> Thanks Stephan, I will follow up on this issue in the next few weeks and
>>>>> will refine the design doc. We can discuss more details after the 1.9
>>>>> release.
>>>>> 
>>>>> Stephan Ewen <se...@apache.org> 于2019年7月24日周三 上午12:58写道:
>>>>> 
>>>>>> Hi all!
>>>>>> 
>>>>>> This thread has stalled for a bit, which I assume is mostly due to
>>>>>> the Flink 1.9 feature freeze and release testing effort.
>>>>>> 
>>>>>> I personally still recognize this issue as an important one to solve.
>>>>>> I'd be happy to help resume this discussion soon (after the 1.9 release)
>>>>>> and see if we can make some steps towards this in Flink 1.10.
>>>>>> 
>>>>>> Best,
>>>>>> Stephan
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <
>>>>> pompermaier@okkam.it>
>>>>>> wrote:
>>>>>> 
>>>>>>> That's exactly what I suggested a long time ago: the Flink REST client
>>>>>>> should not require any Flink dependency, only an HTTP library to call
>>>>>>> the REST services to submit and monitor a job.
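A dependency-free client along those lines only needs to build HTTP requests against the JobManager's REST endpoint. The sketch below constructs such requests with JDK 11's built-in HttpClient types; no network I/O happens here (actually sending the requests requires a running cluster), and the endpoint paths, while following Flink's documented REST API, should be verified against the docs for your Flink version:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class RestEndpoints {

    // Endpoint paths as documented for Flink's REST API; verify against
    // your version's documentation before relying on them.
    static URI runJar(String base, String jarId) {
        return URI.create(base + "/jars/" + jarId + "/run");
    }

    static URI job(String base, String jobId) {
        return URI.create(base + "/jobs/" + jobId);
    }

    public static void main(String[] args) {
        String base = "http://localhost:8081/v1";

        // Submit a previously uploaded jar (POST on the jar's run resource) ...
        HttpRequest submit = HttpRequest.newBuilder(runJar(base, "abc.jar"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();

        // ... and cancel a running job (PATCH on the job resource).
        HttpRequest cancel = HttpRequest.newBuilder(job(base, "job-42"))
                .method("PATCH", HttpRequest.BodyPublishers.noBody())
                .build();

        System.out.println(submit.method() + " " + submit.uri());
        System.out.println(cancel.method() + " " + cancel.uri());
    }
}
```

Sending the built requests is one `HttpClient.newHttpClient().send(...)` call; nothing Flink-specific is needed on the classpath, which is exactly the point being made above.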
>>>>>>> What I also suggested in [1] was a way to automatically suggest to the
>>>>>>> user (via a UI) the available main classes and their required
>>>>>>> parameters [2].
>>>>>>> Another problem we have with Flink is that the REST client and the CLI
>>>>>>> one behave differently, and we use the CLI client (via ssh) because it
>>>>>>> allows calling some other method after env.execute() [3] (we have to
>>>>>>> call another REST service to signal the end of the job).
>>>>>>> In this regard, a dedicated interface, like the JobListener suggested
>>>>>>> in the previous emails, would be very helpful (IMHO).
>>>>>>> 
>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-10864
>>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-10862
>>>>>>> [3] https://issues.apache.org/jira/browse/FLINK-10879
>>>>>>> 
>>>>>>> Best,
>>>>>>> Flavio
>>>>>>> 
>>>>>>> On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zj...@gmail.com>
>>> wrote:
>>>>>>> 
>>>>>>>> Hi, Tison,
>>>>>>>> 
>>>>>>>> Thanks for your comments. Overall I agree with you that it is
>>>>>>>> difficult for downstream projects to integrate with Flink and that we
>>>>>>>> need to refactor the current Flink client API.
>>>>>>>> And I agree that CliFrontend should only parse command line arguments
>>>>>>>> and then pass them to ExecutionEnvironment. It is
>>>>>>>> ExecutionEnvironment's responsibility to compile the job, create the
>>>>>>>> cluster, and submit the job. Besides that, Flink currently has many
>>>>>>>> ExecutionEnvironment implementations, and it will use a specific one
>>>>>>>> based on the context. IMHO this is not necessary: ExecutionEnvironment
>>>>>>>> should be able to do the right thing based on the FlinkConf it
>>>>>>>> receives. Having so many ExecutionEnvironment implementations is
>>>>>>>> another burden for downstream project integration.
>>>>>>>> 
>>>>>>>> One thing I'd like to mention is Flink's Scala shell and SQL client:
>>>>>>>> although they are sub-modules of Flink, they could be treated as
>>>>>>>> downstream projects which use Flink's client API. Currently you will
>>>>>>>> find it is not easy for them to integrate with Flink, and they share a
>>>>>>>> lot of duplicated code. It is another sign that we should refactor the
>>>>>>>> Flink client API.
>>>>>>>> 
>>>>>>>> I believe it is a large and hard change, and I am afraid we cannot
>>>>>>>> keep compatibility since many of the changes are user facing.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Zili Chen <wa...@gmail.com> 于2019年6月24日周一 下午2:53写道:
>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> 
>>>>>>>>> After a closer look at our client APIs, I can see two major issues
>>>>>>>>> with consistency and integration: the different deployment of job
>>>>>>>>> clusters, which couples job graph creation with cluster deployment,
>>>>>>>>> and submission via CliFrontend, which confuses the control flow of
>>>>>>>>> job graph compilation and job submission. I'd like to follow the
>>>>>>>>> discussion above, mainly the process described by Jeff and Stephan,
>>>>>>>>> and share my ideas on these issues.
>>>>>>>>> 
>>>>>>>>> 1) CliFrontend confuses the control flow of job compilation and
>>>>>>>>> submission. Following the process of job submission that Stephan and
>>>>>>>>> Jeff described, the execution environment knows all configs of the
>>>>>>>>> cluster and the topology/settings of the job. Ideally, the main
>>>>>>>>> method of the user program calls #execute (or, say, #submit) and
>>>>>>>>> Flink deploys the cluster, compiles the job graph and submits it to
>>>>>>>>> the cluster. However, the current CliFrontend does all these things
>>>>>>>>> inside its #runProgram method, which introduces a lot of subclasses
>>>>>>>>> of the (stream) execution environment.
>>>>>>>>>
>>>>>>>>> Actually, it sets up an exec env that hijacks the
>>>>>>>>> #execute/#executePlan methods, initializes the job graph and aborts
>>>>>>>>> execution. Then control flows back to CliFrontend, which deploys the
>>>>>>>>> cluster (or retrieves the client) and submits the job graph. This is
>>>>>>>>> quite a specific internal process inside Flink, consistent with
>>>>>>>>> nothing else.
>>>>>>>>> 
>>>>>>>>> 2) Deployment of a job cluster couples job graph creation and
>>>>>>>>> cluster deployment. Abstractly, going from a user job to a concrete
>>>>>>>>> submission requires
>>>>>>>>>
>>>>>>>>>     create JobGraph ------\
>>>>>>>>>                           +--> submit JobGraph
>>>>>>>>>     create ClusterClient -/
>>>>>>>>>
>>>>>>>>> such a dependency. The ClusterClient is created by deploying or
>>>>>>>>> retrieving. JobGraph submission requires a compiled JobGraph and a
>>>>>>>>> valid ClusterClient, but the creation of the ClusterClient is
>>>>>>>>> abstractly independent of that of the JobGraph. However, in job
>>>>>>>>> cluster mode we deploy the job cluster with a job graph, which means
>>>>>>>>> we use another process:
>>>>>>>>>
>>>>>>>>>     create JobGraph --> deploy cluster with the JobGraph
>>>>>>>>>
>>>>>>>>> Here is another inconsistency, and downstream projects/client APIs
>>>>>>>>> are forced to handle the different cases with little support from
>>>>>>>>> Flink.
>>>>>>>>> 
>>>>>>>>> Since we have likely reached a consensus on
>>>>>>>>>
>>>>>>>>> 1. all configs being gathered by the Flink configuration and passed on
>>>>>>>>> 2. the execution environment knowing all configs and handling
>>>>>>>>>    execution (both deployment and submission)
>>>>>>>>>
>>>>>>>>> for the issues above I propose eliminating the inconsistencies by
>>>>>>>>> the following approach:
>>>>>>>>> 
>>>>>>>>> 1) CliFrontend should be exactly a front end, at least for the "run"
>>>>>>>>> command. That means it just gathers all config from the command line
>>>>>>>>> and passes it to the main method of the user program. The execution
>>>>>>>>> environment knows all the info and, with an added utility for
>>>>>>>>> ClusterClient, we gracefully get a ClusterClient by deploying or
>>>>>>>>> retrieving. In this way, we don't need to hijack the
>>>>>>>>> #execute/#executePlan methods and can remove the various hacky
>>>>>>>>> subclasses of the exec env, as well as the #run methods in
>>>>>>>>> ClusterClient (for an interface-ized ClusterClient). Now the control
>>>>>>>>> flow goes from CliFrontend to the main method and never returns.
>>>>>>>>> 
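The control flow described in (1) could be sketched roughly as follows. All names here (CliFrontendSketch, parseArgs) are placeholders for illustration, not existing Flink APIs; the point is only that the CLI parses arguments into a configuration and hands everything to the user's main method, which then builds the right ClusterClient.

```java
import java.util.HashMap;
import java.util.Map;

public class CliFrontendSketch {

    // Parse "-key value" pairs into a flat configuration map.
    static Map<String, String> parseArgs(String[] args) {
        Map<String, String> conf = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            conf.put(args[i].replaceFirst("^-+", ""), args[i + 1]);
        }
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> conf =
            parseArgs(new String[] {"-m", "yarn-cluster", "-p", "4"});
        // Control would now flow into the user's main method and never
        // return to the CLI; the environment reads `conf` and decides
        // whether to deploy a new cluster or retrieve an existing one.
        System.out.println("mode=" + conf.get("m") + " parallelism=" + conf.get("p"));
    }
}
```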
>>>>>>>>> 2) Job cluster means a cluster for the specific job. From
>>> another
>>>>>>>>> perspective, it is an ephemeral session. We may decouple the
>>>>>>>>> deployment from the compiled job graph: start a session with an
>>>>>>>>> idle timeout and submit the job afterwards.
>>>>>>>>> 
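The "ephemeral session" reading of job cluster mode might look like the sketch below. JobGraph and EphemeralSession are local stand-ins, not the real Flink classes; the sketch only illustrates decoupling deployment from the compiled graph.

```java
import java.time.Duration;
import java.util.UUID;

// Placeholder for a compiled job graph; not the real Flink JobGraph.
final class JobGraph {
    final String name;
    JobGraph(String name) { this.name = name; }
}

// An "ephemeral session": deployed without a job graph, torn down when
// closed (or when the idle timeout fires on the cluster side).
final class EphemeralSession implements AutoCloseable {
    private final Duration idleTimeout;
    EphemeralSession(Duration idleTimeout) { this.idleTimeout = idleTimeout; }

    String submit(JobGraph graph) {
        // A real implementation would submit via the cluster's REST endpoint.
        return "job-" + UUID.randomUUID();
    }

    @Override
    public void close() { /* shut the cluster down */ }
}

public class JobClusterAsSession {
    public static void main(String[] args) {
        JobGraph graph = new JobGraph("wordcount");               // step 1: compile
        try (EphemeralSession session = new EphemeralSession(Duration.ofMinutes(5))) {
            String jobId = session.submit(graph);                 // step 2: submit
            System.out.println("submitted " + jobId);
        }
    }
}
```

With this shape, the two-step process (deploy, then submit) is the same for session and job clusters.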
>>>>>>>>> These topics, before we go into more details on design or
>>>>>>> implementation,
>>>>>>>>> are better to be aware and discussed for a consensus.
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> tison.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Zili Chen <wa...@gmail.com> 于2019年6月20日周四 上午3:21写道:
>>>>>>>>> 
>>>>>>>>>> Hi Jeff,
>>>>>>>>>> 
>>>>>>>>>> Thanks for raising this thread and the design document!
>>>>>>>>>> 
>>>>>>>>>> As @Thomas Weise mentioned above, extending config to flink
>>>>>>>>>> requires far more effort than it should. Another example is
>>>>>>>>>> that we achieve detached mode by introducing another execution
>>>>>>>>>> environment which also hijacks the #execute method.
>>>>>>>>>> 
>>>>>>>>>> I agree with your idea that user would configure all things
>>>>>>>>>> and flink "just" respect it. On this topic I think the
>> unusual
>>>>>>>>>> control flow when CliFrontend handles the "run" command is the
>>>> problem.
>>>>>>>>>> It handles several configs, mainly about cluster settings,
>> and
>>>>>>>>>> thus the main method of the user program is unaware of them. Also it
>>>>>>>>>> compiles the app to a job graph by running the main method with a
>>>>>>>>>> hijacked exec env, which constrains the main method further.
>>>>>>>>>> 
>>>>>>>>>> I'd like to write down a few notes on how configs/args are
>>>>>>>>>> passed and respected, as well as on decoupling job compilation
>>>>>>>>>> and submission, and share them on this thread later.
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> tison.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> SHI Xiaogang <sh...@gmail.com> 于2019年6月17日周一
>> 下午7:29写道:
>>>>>>>>>> 
>>>>>>>>>>> Hi Jeff and Flavio,
>>>>>>>>>>> 
>>>>>>>>>>> Thanks Jeff a lot for proposing the design document.
>>>>>>>>>>> 
>>>>>>>>>>> We are also working on refactoring ClusterClient to allow
>>>>> flexible
>>>>>>> and
>>>>>>>>>>> efficient job management in our real-time platform.
>>>>>>>>>>> We would like to draft a document to share our ideas with
>>> you.
>>>>>>>>>>> 
>>>>>>>>>>> I think it's a good idea to have something like Apache Livy
>>> for
>>>>>>> Flink,
>>>>>>>>>>> and
>>>>>>>>>>> the efforts discussed here will take a great step forward
>> to
>>>> it.
>>>>>>>>>>> 
>>>>>>>>>>> Regards,
>>>>>>>>>>> Xiaogang
>>>>>>>>>>> 
>>>>>>>>>>> Flavio Pompermaier <po...@okkam.it> 于2019年6月17日周一
>>>>> 下午7:13写道:
>>>>>>>>>>> 
>>>>>>>>>>>> Is there any possibility to have something like Apache
>> Livy
>>>> [1]
>>>>>>> also
>>>>>>>>>>> for
>>>>>>>>>>>> Flink in the future?
>>>>>>>>>>>> 
>>>>>>>>>>>> [1] https://livy.apache.org/
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <
>>> zjffdu@gmail.com
>>>>> 
>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Any API we expose should not have dependencies on
>>> the
>>>>>>> runtime
>>>>>>>>>>>>> (flink-runtime) package or other implementation
>> details.
>>> To
>>>>> me,
>>>>>>>> this
>>>>>>>>>>>> means
>>>>>>>>>>>>> that the current ClusterClient cannot be exposed to
>> users
>>>>>> because
>>>>>>>> it
>>>>>>>>>>>> uses
>>>>>>>>>>>>> quite some classes from the optimiser and runtime
>>> packages.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> We should change ClusterClient from class to interface.
>>>>>>>>>>>>> ExecutionEnvironment only use the interface
>> ClusterClient
>>>>> which
>>>>>>>>>>> should be
>>>>>>>>>>>>> in flink-clients while the concrete implementation
>> class
>>>>> could
>>>>>> be
>>>>>>>> in
>>>>>>>>>>>>> flink-runtime.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> What happens when a failure/restart in the client
>>>>> happens?
>>>>>>>> There
>>>>>>>>>>> need
>>>>>>>>>>>>> to be a way of re-establishing the connection to the
>> job,
>>>> set
>>>>>> up
>>>>>>>> the
>>>>>>>>>>>>> listeners again, etc.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Good point. First we need to define what failure/restart
>>>>>>>>>>>>> in the client means. IIUC, that usually means a network
>>>>>>>>>>>>> failure
>>> which
>>>>> will
>>>>>>>>>>> happen in
>>>>>>>>>>>>> class RestClient. If my understanding is correct,
>>>>> restart/retry
>>>>>>>>>>> mechanism
>>>>>>>>>>>>> should be done in RestClient.
>>>>>>>>>>>>> 
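A retry mechanism of the kind suggested for RestClient could look roughly like this generic helper. It is only an illustration of where such logic could live, not Flink's actual RestClient code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public final class Retry {

    // Retry `call` up to maxAttempts times, backing off linearly between tries.
    public static <T> T withRetries(Callable<T> call, int maxAttempts, long backoffMillis)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {       // e.g. a transient network failure
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger attempts = new AtomicInteger();
        // Fails twice, then succeeds -- stands in for a flaky REST call.
        String result = withRetries(() -> {
            if (attempts.incrementAndGet() < 3) {
                throw new RuntimeException("connection refused");
            }
            return "connected";
        }, 5, 10);
        System.out.println(result + " after " + attempts.get() + " attempts");
    }
}
```

Re-establishing listeners after a reconnect would still need extra state on top of this, which is the harder part of the question above.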
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Aljoscha Krettek <al...@apache.org> 于2019年6月11日周二
>>>>>> 下午11:10写道:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Some points to consider:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> * Any API we expose should not have dependencies on
>> the
>>>>>> runtime
>>>>>>>>>>>>>> (flink-runtime) package or other implementation
>>> details.
>>>> To
>>>>>> me,
>>>>>>>>>>> this
>>>>>>>>>>>>> means
>>>>>>>>>>>>>> that the current ClusterClient cannot be exposed to
>>> users
>>>>>>> because
>>>>>>>>>>> it
>>>>>>>>>>>>> uses
>>>>>>>>>>>>>> quite some classes from the optimiser and runtime
>>>> packages.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> * What happens when a failure/restart in the client
>>>>> happens?
>>>>>>>> There
>>>>>>>>>>> need
>>>>>>>>>>>>> to
>>>>>>>>>>>>>> be a way of re-establishing the connection to the
>> job,
>>>> set
>>>>> up
>>>>>>> the
>>>>>>>>>>>>> listeners
>>>>>>>>>>>>>> again, etc.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Aljoscha
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 29. May 2019, at 10:17, Jeff Zhang <
>>>> zjffdu@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Sorry folks, the design doc is late as you
>> expected.
>>>>> Here's
>>>>>>> the
>>>>>>>>>>>> design
>>>>>>>>>>>>>> doc
>>>>>>>>>>>>>>> I drafted, welcome any comments and feedback.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Stephan Ewen <se...@apache.org> 于2019年2月14日周四
>>>> 下午8:43写道:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Nice that this discussion is happening.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> In the FLIP, we could also revisit the entire role
>>> of
>>>>> the
>>>>>>>>>>>> environments
>>>>>>>>>>>>>>>> again.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Initially, the idea was:
>>>>>>>>>>>>>>>> - the environments take care of the specific
>> setup
>>>> for
>>>>>>>>>>> standalone
>>>>>>>>>>>> (no
>>>>>>>>>>>>>>>> setup needed), yarn, mesos, etc.
>>>>>>>>>>>>>>>> - the session ones have control over the session.
>>> The
>>>>>>>>>>> environment
>>>>>>>>>>>>> holds
>>>>>>>>>>>>>>>> the session client.
>>>>>>>>>>>>>>>> - running a job gives a "control" object for that
>>>> job.
>>>>>> That
>>>>>>>>>>>> behavior
>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>> the same in all environments.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The actual implementation diverged quite a bit
>> from
>>>>> that.
>>>>>>>> Happy
>>>>>>>>>>> to
>>>>>>>>>>>>> see a
>>>>>>>>>>>>>>>> discussion about straightening this out a bit more.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <
>>>>>>> zjffdu@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Sorry for late response, It seems we reach
>>> consensus
>>>> on
>>>>>>>> this, I
>>>>>>>>>>>> will
>>>>>>>>>>>>>>>> create
>>>>>>>>>>>>>>>>> FLIP for this with more detailed design
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thomas Weise <th...@apache.org> 于2018年12月21日周五
>>>>> 上午11:43写道:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Great to see this discussion seeded! The
>> problems
>>>> you
>>>>>> face
>>>>>>>>>>> with
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> Zeppelin integration are also affecting other
>>>>> downstream
>>>>>>>>>>> projects,
>>>>>>>>>>>>>> like
>>>>>>>>>>>>>>>>>> Beam.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> We just enabled the savepoint restore option in
>>>>>>>>>>>>>> RemoteStreamEnvironment
>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>> and that was more difficult than it should be.
>> The
>>>>> main
>>>>>>>> issue
>>>>>>>>>>> is
>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>> environment and cluster client aren't decoupled.
>>>>> Ideally
>>>>>>> it
>>>>>>>>>>> should
>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>> possible to just get the matching cluster client
>>>> from
>>>>>> the
>>>>>>>>>>>>> environment
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>> then control the job through it (environment as
>>>>> factory
>>>>>>> for
>>>>>>>>>>>> cluster
>>>>>>>>>>>>>>>>>> client). But note that the environment classes
>> are
>>>>> part
>>>>>> of
>>>>>>>> the
>>>>>>>>>>>>> public
>>>>>>>>>>>>>>>>> API,
>>>>>>>>>>>>>>>>>> and it is not straightforward to make larger
>>> changes
>>>>>>> without
>>>>>>>>>>>>> breaking
>>>>>>>>>>>>>>>>>> backward compatibility.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> ClusterClient currently exposes internal classes
>>>> like
>>>>>>>>>>> JobGraph and
>>>>>>>>>>>>>>>>>> StreamGraph. But it should be possible to wrap
>>> this
>>>>>> with a
>>>>>>>> new
>>>>>>>>>>>>> public
>>>>>>>>>>>>>>>> API
>>>>>>>>>>>>>>>>>> that brings the required job control
>> capabilities
>>>> for
>>>>>>>>>>> downstream
>>>>>>>>>>>>>>>>> projects.
>>>>>>>>>>>>>>>>>> Perhaps it is helpful to look at some of the
>>>>> interfaces
>>>>>> in
>>>>>>>>>>> Beam
>>>>>>>>>>>>> while
>>>>>>>>>>>>>>>>>> thinking about this: [2] for the portable job
>> API
>>>> and
>>>>>> [3]
>>>>>>>> for
>>>>>>>>>>> the
>>>>>>>>>>>>> old
>>>>>>>>>>>>>>>>>> asynchronous job control from the Beam Java SDK.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> The backward compatibility discussion [4] is
>> also
>>>>>> relevant
>>>>>>>>>>> here. A
>>>>>>>>>>>>> new
>>>>>>>>>>>>>>>>> API
>>>>>>>>>>>>>>>>>> should shield downstream projects from internals
>>> and
>>>>>> allow
>>>>>>>>>>> them to
>>>>>>>>>>>>>>>>>> interoperate with multiple future Flink versions
>>> in
>>>>> the
>>>>>>> same
>>>>>>>>>>>> release
>>>>>>>>>>>>>>>> line
>>>>>>>>>>>>>>>>>> without forced upgrades.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Thomas
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> [1] https://github.com/apache/flink/pull/7249
>>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
>>>>>>>>>>>>>>>>>> [3]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
>>>>>>>>>>>>>>>>>> [4]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <
>>>>>>>> zjffdu@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> I'm not so sure whether the user should be
>>> able
>>>> to
>>>>>>>> define
>>>>>>>>>>>> where
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>>> runs (in your example Yarn). This is actually
>>>>>> independent
>>>>>>>> of
>>>>>>>>>>> the
>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>>> development and is something which is decided
>> at
>>>>>>> deployment
>>>>>>>>>>> time.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Users don't need to specify the execution mode
>>>>>>> programmatically.
>>>>>>>>>>> They
>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>> also
>>>>>>>>>>>>>>>>>>> pass the execution mode from the arguments in
>>> flink
>>>>> run
>>>>>>>>>>> command.
>>>>>>>>>>>>> e.g.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> bin/flink run -m yarn-cluster ....
>>>>>>>>>>>>>>>>>>> bin/flink run -m local ...
>>>>>>>>>>>>>>>>>>> bin/flink run -m host:port ...
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Does this make sense to you ?
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> To me it makes sense that the
>>>> ExecutionEnvironment
>>>>>> is
>>>>>>>> not
>>>>>>>>>>>>>>>> directly
>>>>>>>>>>>>>>>>>>> initialized by the user and instead context
>>>> sensitive
>>>>>> how
>>>>>>>> you
>>>>>>>>>>>> want
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> execute your job (Flink CLI vs. IDE, for
>>> example).
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Right, currently I notice Flink would create
>>>>> different
>>>>>>>>>>>>>>>>>>> ContextExecutionEnvironment based on different
>>>>>> submission
>>>>>>>>>>>> scenarios
>>>>>>>>>>>>>>>>>> (Flink
>>>>>>>>>>>>>>>>>>> Cli vs IDE). To me this is kind of hack
>> approach,
>>>> not
>>>>>> so
>>>>>>>>>>>>>>>>> straightforward.
>>>>>>>>>>>>>>>>>>> What I suggested above is that Flink
>>> should
>>>>>>> always
>>>>>>>>>>> create
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> same
>>>>>>>>>>>>>>>>>>> ExecutionEnvironment but with different
>>>>> configuration,
>>>>>>> and
>>>>>>>>>>> based
>>>>>>>>>>>> on
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> configuration it would create the proper
>>>>> ClusterClient
>>>>>>> for
>>>>>>>>>>>>> different
>>>>>>>>>>>>>>>>>>> behaviors.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Till Rohrmann <tr...@apache.org>
>>>> 于2018年12月20日周四
>>>>>>>>>>> 下午11:18写道:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> You are probably right that we have code
>>>> duplication
>>>>>>> when
>>>>>>>> it
>>>>>>>>>>>> comes
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> creation of the ClusterClient. This should be
>>>>> reduced
>>>>>> in
>>>>>>>> the
>>>>>>>>>>>>>>>> future.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I'm not so sure whether the user should be
>> able
>>> to
>>>>>>> define
>>>>>>>>>>> where
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>>>> runs (in your example Yarn). This is actually
>>>>>>> independent
>>>>>>>>>>> of the
>>>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>>>> development and is something which is decided
>> at
>>>>>>>> deployment
>>>>>>>>>>>> time.
>>>>>>>>>>>>>>>> To
>>>>>>>>>>>>>>>>> me
>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>> makes sense that the ExecutionEnvironment is
>> not
>>>>>>> directly
>>>>>>>>>>>>>>>> initialized
>>>>>>>>>>>>>>>>>> by
>>>>>>>>>>>>>>>>>>>> the user and instead context sensitive how you
>>>> want
>>>>> to
>>>>>>>>>>> execute
>>>>>>>>>>>>> your
>>>>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>>>> (Flink CLI vs. IDE, for example). However, I
>>> agree
>>>>>> that
>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment should give you access to
>>> the
>>>>>>>>>>> ClusterClient
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> job (maybe in the form of the JobGraph or a
>> job
>>>>> plan).
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>> Till
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <
>>>>>>>>>>> zjffdu@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Hi Till,
>>>>>>>>>>>>>>>>>>>>> Thanks for the feedback. You are right that I
>>>>> expect
>>>>>>>> better
>>>>>>>>>>>>>>>>>>> programmatic
>>>>>>>>>>>>>>>>>>>>> job submission/control api which could be
>> used
>>> by
>>>>>>>>>>> downstream
>>>>>>>>>>>>>>>>> project.
>>>>>>>>>>>>>>>>>>> And
>>>>>>>>>>>>>>>>>>>>> it would benefit for the flink ecosystem.
>> When
>>> I
>>>>> look
>>>>>>> at
>>>>>>>>>>> the
>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>> flink
>>>>>>>>>>>>>>>>>>>>> scala-shell and sql-client (I believe they
>> are
>>>> not
>>>>>> the
>>>>>>>>>>> core of
>>>>>>>>>>>>>>>>> flink,
>>>>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>> belong to the ecosystem of flink), I find
>> many
>>>>>>> duplicated
>>>>>>>>>>> code
>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>> creating
>>>>>>>>>>>>>>>>>>>>> ClusterClient from user provided
>> configuration
>>>>>>>>>>> (configuration
>>>>>>>>>>>>>>>>> format
>>>>>>>>>>>>>>>>>>> may
>>>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>> different from scala-shell and sql-client)
>> and
>>>> then
>>>>>> use
>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>> ClusterClient
>>>>>>>>>>>>>>>>>>>>> to manipulate jobs. I don't think this is
>>>>> convenient
>>>>>>> for
>>>>>>>>>>>>>>>> downstream
>>>>>>>>>>>>>>>>>>>>> projects. What I expect is that downstream
>>>> project
>>>>>> only
>>>>>>>>>>> needs
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> provide
>>>>>>>>>>>>>>>>>>>>> necessary configuration info (maybe
>> introducing
>>>>> class
>>>>>>>>>>>> FlinkConf),
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>> then
>>>>>>>>>>>>>>>>>>>>> build ExecutionEnvironment based on this
>>>> FlinkConf,
>>>>>> and
>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment will create the proper
>>>>>>>> ClusterClient.
>>>>>>>>>>> It
>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>>>>>>> benefit for the downstream project
>> development
>>>> but
>>>>>> also
>>>>>>>> be
>>>>>>>>>>>>>>>> helpful
>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>> their integration test with flink. Here's one
>>>>> sample
>>>>>>> code
>>>>>>>>>>>> snippet
>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>>>>>>> expect.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> val conf = new FlinkConf().mode("yarn")
>>>>>>>>>>>>>>>>>>>>> val env = new ExecutionEnvironment(conf)
>>>>>>>>>>>>>>>>>>>>> val jobId = env.submit(...)
>>>>>>>>>>>>>>>>>>>>> val jobStatus = env.getClusterClient().queryJobStatus(jobId)
>>>>>>>>>>>>>>>>>>>>> env.getClusterClient().cancelJob(jobId)
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> What do you think ?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Till Rohrmann <tr...@apache.org>
>>>>> 于2018年12月11日周二
>>>>>>>>>>> 下午6:28写道:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Hi Jeff,
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> what you are proposing is to provide the
>> user
>>>> with
>>>>>>>> better
>>>>>>>>>>>>>>>>>>> programmatic
>>>>>>>>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>>>>>> control. There was actually an effort to
>>> achieve
>>>>>> this
>>>>>>>> but
>>>>>>>>>>> it
>>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>>>>> never
>>>>>>>>>>>>>>>>>>>>> been
>>>>>>>>>>>>>>>>>>>>>> completed [1]. However, there are some
>>>> improvement
>>>>>> in
>>>>>>>> the
>>>>>>>>>>> code
>>>>>>>>>>>>>>>>> base
>>>>>>>>>>>>>>>>>>>> now.
>>>>>>>>>>>>>>>>>>>>>> Look for example at the NewClusterClient
>>>> interface
>>>>>>> which
>>>>>>>>>>>>>>>> offers a
>>>>>>>>>>>>>>>>>>>>>> non-blocking job submission. But I agree
>> that
>>> we
>>>>>> need
>>>>>>> to
>>>>>>>>>>>>>>>> improve
>>>>>>>>>>>>>>>>>>> Flink
>>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>> this regard.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> I would not be in favour if exposing all
>>>>>> ClusterClient
>>>>>>>>>>> calls
>>>>>>>>>>>>>>>> via
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment because it would
>> clutter
>>>> the
>>>>>>> class
>>>>>>>>>>> and
>>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>> good separation of concerns. Instead one
>> idea
>>>>> could
>>>>>> be
>>>>>>>> to
>>>>>>>>>>>>>>>>> retrieve
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> current ClusterClient from the
>>>>> ExecutionEnvironment
>>>>>>>> which
>>>>>>>>>>> can
>>>>>>>>>>>>>>>>> then
>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>> used
>>>>>>>>>>>>>>>>>>>>>> for cluster and job control. But before we
>>> start
>>>>> an
>>>>>>>> effort
>>>>>>>>>>>>>>>> here,
>>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>> need
>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>> agree and capture what functionality we want
>>> to
>>>>>>> provide.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Initially, the idea was that we have the
>>>>>>>> ClusterDescriptor
>>>>>>>>>>>>>>>>>> describing
>>>>>>>>>>>>>>>>>>>> how
>>>>>>>>>>>>>>>>>>>>>> to talk to cluster manager like Yarn or
>> Mesos.
>>>> The
>>>>>>>>>>>>>>>>>> ClusterDescriptor
>>>>>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>> used for deploying Flink clusters (job and
>>>>> session)
>>>>>>> and
>>>>>>>>>>> gives
>>>>>>>>>>>>>>>>> you a
>>>>>>>>>>>>>>>>>>>>>> ClusterClient. The ClusterClient controls
>> the
>>>>>> cluster
>>>>>>>>>>> (e.g.
>>>>>>>>>>>>>>>>>>> submitting
>>>>>>>>>>>>>>>>>>>>>> jobs, listing all running jobs). And then
>>> there
>>>>> was
>>>>>>> the
>>>>>>>>>>> idea
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>> introduce a
>>>>>>>>>>>>>>>>>>>>>> JobClient which you obtain from the
>>>> ClusterClient
>>>>> to
>>>>>>>>>>> trigger
>>>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>>>> specific
>>>>>>>>>>>>>>>>>>>>>> operations (e.g. taking a savepoint,
>>> cancelling
>>>>> the
>>>>>>>> job).
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>> https://issues.apache.org/jira/browse/FLINK-4272
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>> Till
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang
>> <
>>>>>>>>>>> zjffdu@gmail.com
>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Hi Folks,
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> I am trying to integrate flink into apache
>>>>> zeppelin
>>>>>>>>>>> which is
>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>> interactive
>>>>>>>>>>>>>>>>>>>>>>> notebook. And I hit several issues that is
>>>> caused
>>>>>> by
>>>>>>>>>>> flink
>>>>>>>>>>>>>>>>> client
>>>>>>>>>>>>>>>>>>>> api.
>>>>>>>>>>>>>>>>>>>>> So
>>>>>>>>>>>>>>>>>>>>>>> I'd like to proposal the following changes
>>> for
>>>>>> flink
>>>>>>>>>>> client
>>>>>>>>>>>>>>>>> api.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 1. Support nonblocking execution.
>> Currently,
>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment#execute
>>>>>>>>>>>>>>>>>>>>>>> is a blocking method which would do 2
>> things,
>>>>> first
>>>>>>>>>>> submit
>>>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>> then
>>>>>>>>>>>>>>>>>>>>>>> wait for job until it is finished. I'd like
>>>>>>> introduce a
>>>>>>>>>>>>>>>>>> nonblocking
>>>>>>>>>>>>>>>>>>>>>>> execution method like
>>>> ExecutionEnvironment#submit
>>>>>>> which
>>>>>>>>>>> only
>>>>>>>>>>>>>>>>>> submit
>>>>>>>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>> then return jobId to client. And allow user
>>> to
>>>>>> query
>>>>>>>> the
>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>> status
>>>>>>>>>>>>>>>>>>>> via
>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> jobId.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 2. Add cancel api in
>>>>>>>>>>>>>>>>>>> 
>> ExecutionEnvironment/StreamExecutionEnvironment,
>>>>>>>>>>>>>>>>>>>>>>> currently the only way to cancel job is via
>>> cli
>>>>>>>>>>> (bin/flink),
>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>>>> convenient for downstream project to use
>> this
>>>>>>> feature.
>>>>>>>>>>> So I'd
>>>>>>>>>>>>>>>>>> like
>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>> add
>>>>>>>>>>>>>>>>>>>>>>> cancel api in ExecutionEnvironment
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 3. Add savepoint api in
>>>>>>>>>>>>>>>>>>>>> 
>>> ExecutionEnvironment/StreamExecutionEnvironment.
>>>>>>>>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>>>>>>>>>> is similar as cancel api, we should use
>>>>>>>>>>> ExecutionEnvironment
>>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> unified
>>>>>>>>>>>>>>>>>>>>>>> api for third party to integrate with
>> flink.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 4. Add listener for job execution
>> lifecycle.
>>>>>>> Something
>>>>>>>>>>> like
>>>>>>>>>>>>>>>>>>>> following,
>>>>>>>>>>>>>>>>>>>>> so
>>>>>>>>>>>>>>>>>>>>>>> that downstream project can do custom logic
>>> in
>>>>> the
>>>>>>>>>>> lifecycle
>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>> job.
>>>>>>>>>>>>>>>>>>>>> e.g.
>>>>>>>>>>>>>>>>>>>>>>> Zeppelin would capture the jobId after job
>> is
>>>>>>> submitted
>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> then
>>>>>>>>>>>>>>>>>>> use
>>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>> jobId to cancel it later when necessary.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> public interface JobListener {
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>  void onJobSubmitted(JobID jobId);
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>  void onJobExecuted(JobExecutionResult jobResult);
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>  void onJobCanceled(JobID jobId);
>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>> 
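For illustration, here is a sketch of how a downstream project such as Zeppelin might use the proposed listener to capture the job id for later cancellation. JobListener is stubbed locally with String ids so the sketch is self-contained; it is not an existing Flink API.

```java
import java.util.concurrent.atomic.AtomicReference;

// Local stub mirroring the interface proposed above; JobID and
// JobExecutionResult are simplified to String.
interface JobListener {
    void onJobSubmitted(String jobId);
    void onJobExecuted(String jobResult);
    void onJobCanceled(String jobId);
}

public class CaptureJobId {
    public static void main(String[] args) {
        AtomicReference<String> lastJobId = new AtomicReference<>();

        JobListener listener = new JobListener() {
            public void onJobSubmitted(String jobId) { lastJobId.set(jobId); }
            public void onJobExecuted(String jobResult) { /* e.g. render the result */ }
            public void onJobCanceled(String jobId) { lastJobId.compareAndSet(jobId, null); }
        };

        // The environment would invoke the listener around submission;
        // simulated here with a fixed id.
        listener.onJobSubmitted("job-42");
        System.out.println("captured " + lastJobId.get());
    }
}
```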
>>>>>>>>>>>>>>>>>>>>>>> 5. Enable session in ExecutionEnvironment.
>>>>>> Currently
>>>>>>> it
>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>> disabled,
>>>>>>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>>>> session is very convenient for third party
>> to
>>>>>>>> submitting
>>>>>>>>>>> jobs
>>>>>>>>>>>>>>>>>>>>>> continually.
>>>>>>>>>>>>>>>>>>>>>>> I hope flink can enable it again.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 6. Unify all flink client api into
>>>>>>>>>>>>>>>>>>>>>>> 
>>>> ExecutionEnvironment/StreamExecutionEnvironment.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> This is a long term issue which needs more
>>>>> careful
>>>>>>>>>>> thinking
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>> design.
>>>>>>>>>>>>>>>>>>>>>>> Currently some of features of flink is
>>> exposed
>>>> in
>>>>>>>>>>>>>>>>>>>>>>> 
>>>> ExecutionEnvironment/StreamExecutionEnvironment,
>>>>>> but
>>>>>>>>>>> some are
>>>>>>>>>>>>>>>>>>> exposed
>>>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>>> cli instead of api, like the cancel and
>>>>> savepoint I
>>>>>>>>>>> mentioned
>>>>>>>>>>>>>>>>>>> above.
>>>>>>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>>>>>>>>> think the root cause is due to that flink
>>>> didn't
>>>>>>> unify
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> interaction
>>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>> flink. Here I list 3 scenarios of flink
>>>> operation
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>  - Local job execution.  Flink will create
>>>>>>>>>>> LocalEnvironment
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>> then
>>>>>>>>>>>>>>>>>>>>>> use
>>>>>>>>>>>>>>>>>>>>>>>  this LocalEnvironment to create
>>> LocalExecutor
>>>>> for
>>>>>>> job
>>>>>>>>>>>>>>>>>> execution.
>>>>>>>>>>>>>>>>>>>>>>>  - Remote job execution. Flink will create
>>>>>>>> ClusterClient
>>>>>>>>>>>>>>>>> first
>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>> then
>>>>>>>>>>>>>>>>>>>>>>>  create ContextEnvironment based on the
>>>>>>> ClusterClient
>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> then
>>>>>>>>>>>>>>>>>>> run
>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> job.
>>>>>>>>>>>>>>>>>>>>>>>  - Job cancelation. Flink will create
>>>>>> ClusterClient
>>>>>>>>>>> first
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>> then
>>>>>>>>>>>>>>>>>>>>>> cancel
>>>>>>>>>>>>>>>>>>>>>>>  this job via this ClusterClient.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> As you can see in the above 3 scenarios.
>>> Flink
>>>>>> didn't
>>>>>>>>>>> use the
>>>>>>>>>>>>>>>>>> same
>>>>>>>>>>>>>>>>>>>>>>> approach(code path) to interact with flink
>>>>>>>>>>>>>>>>>>>>>>> What I propose is following:
>>>>>>>>>>>>>>>>>>>>>>> Create the proper
>>>>>> LocalEnvironment/RemoteEnvironment
>>>>>>>>>>> (based
>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>>>>>>> configuration) --> Use this Environment to
>>>> create
>>>>>>>> proper
>>>>>>>>>>>>>>>>>>>> ClusterClient
>>>>>>>>>>>>>>>>>>>>>>> (LocalClusterClient or RestClusterClient)
>> to
>>>>>>>> interactive
>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>> Flink (
>>>>>>>>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>>>>>>> execution or cancelation)
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> This way we can unify the process of local
>>>>>> execution
>>>>>>>> and
>>>>>>>>>>>>>>>> remote
>>>>>>>>>>>>>>>>>>>>>> execution.
>>>>>>>>>>>>>>>>>>>>>>> And it is much easier for third party to
>>>>> integrate
>>>>>>> with
>>>>>>>>>>>>>>>> flink,
>>>>>>>>>>>>>>>>>>>> because
>>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment is the unified entry
>>> point
>>>>> for
>>>>>>>>>>> flink.
>>>>>>>>>>>>>>>> What
>>>>>>>>>>>>>>>>>>> third
>>>>>>>>>>>>>>>>>>>>>> party
>>>>>>>>>>>>>>>>>>>>>>> needs to do is just pass configuration to
>>>>>>>>>>>>>>>> ExecutionEnvironment
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>> ExecutionEnvironment will do the right
>> thing
>>>>> based
>>>>>> on
>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> configuration.
>>>>>>>>>>>>>>>>>>>>>>> Flink cli can also be considered as flink
>> api
>>>>>>> consumer.
>>>>>>>>>>> it
>>>>>>>>>>>>>>>> just
>>>>>>>>>>>>>>>>>>> pass
>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> configuration to ExecutionEnvironment and
>> let
>>>>>>>>>>>>>>>>>> ExecutionEnvironment
>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>> create the proper ClusterClient instead of
>>>>> letting
>>>>>>> cli
>>>>>>>> to
>>>>>>>>>>>>>>>>> create
>>>>>>>>>>>>>>>>>>>>>>> ClusterClient directly.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 6 would involve large code refactoring, so
>> I
>>>>> think
>>>>>> we
>>>>>>>> can
>>>>>>>>>>>>>>>> defer
>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>> future release, 1,2,3,4,5 could be done at
>>>> once I
>>>>>>>>>>> believe.
>>>>>>>>>>>>>>>> Let
>>>>>>>>>>>>>>>>> me
>>>>>>>>>>>>>>>>>>>> know
>>>>>>>>>>>>>>>>>>>>>> your
>>>>>>>>>>>>>>>>>>>>>>> comments and feedback, thanks
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Jeff Zhang
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Jeff Zhang
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Jeff Zhang
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Jeff Zhang
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Jeff Zhang
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Best Regards
>>>>>>>> 
>>>>>>>> Jeff Zhang
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Best Regards
>>>>> 
>>>>> Jeff Zhang
>>>>> 
>>>> 
>>> 
>> 
> 
> 
> -- 
> Best Regards
> 
> Jeff Zhang


Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Jeff Zhang <zj...@gmail.com>.
Thanks tison for the effort. I left a few comments.


Zili Chen <wa...@gmail.com> 于2019年7月31日周三 下午8:24写道:

> Hi Flavio,
>
> Thanks for your reply.
>
> In both the current implementation and the design, the ClusterClient
> never takes responsibility for generating the JobGraph.
> (What you see in the current codebase are several class methods.)
>
> Instead, the user describes the program in the main method
> with ExecutionEnvironment APIs and calls env.compile()
> or env.optimize() to get the FlinkPlan or JobGraph respectively.
>
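A stubbed sketch of that flow, assuming the proposed env.compile()/env.optimize() calls (Env, FlinkPlan, and JobGraph here are local placeholders, not Flink types):

```java
// Stubs for the artifacts named above; not real Flink classes.
final class FlinkPlan { }
final class JobGraph { }

// A stub environment exposing the two proposed calls.
final class Env {
    FlinkPlan compile() { return new FlinkPlan(); }   // logical plan
    JobGraph optimize() { return new JobGraph(); }    // submittable graph
}

public class CompileVsSubmit {
    public static void main(String[] args) {
        Env env = new Env();
        // ... describe the program with environment APIs ...
        FlinkPlan plan = env.compile();    // inspect the plan without a cluster
        JobGraph graph = env.optimize();   // hand this to a ClusterClient to submit
        System.out.println((plan != null && graph != null) ? "compiled" : "failed");
    }
}
```

The point is that obtaining the compiled artifacts is separate from submitting them.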
> For listing the main classes in a jar and choosing one for
> submission, you're already able to customize a CLI to do it.
> Specifically, the path of the jar is passed as an argument, and
> in the customized CLI you list the main classes and choose one
> to submit to the cluster.
>
> Best,
> tison.
>
>
> Flavio Pompermaier <po...@okkam.it> 于2019年7月31日周三 下午8:12写道:
>
> > Just one note on my side: it is not clear to me whether the client needs
> > to be able to generate a job graph or not.
> > In my opinion, the job jar must reside only on the server/jobManager
> > side, and the client requires a way to get the job graph.
> > If you really want access to the job graph, I'd add dedicated methods
> > on the ClusterClient, like:
> >
> >    - getJobGraph(jarId, mainClass): JobGraph
> >    - listMainClasses(jarId): List<String>
> >
> > These would also require some additions on the job manager endpoint...
> > what do you think?
> >
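As a rough illustration, the two methods proposed above could sit behind an interface like the following minimal sketch, with an in-memory stub standing in for a REST-backed implementation. The names (JarClusterClient, JobGraphRef, StubJarClusterClient) and signatures are hypothetical, not existing Flink API:

```java
import java.util.*;

// Hypothetical shape of the proposed additions; not real Flink API.
interface JarClusterClient {
    // Would ask the JobManager to compile the job graph server-side.
    JobGraphRef getJobGraph(String jarId, String mainClass);
    // Would ask the JobManager to scan the jar for candidate entry points.
    List<String> listMainClasses(String jarId);
}

// Opaque handle standing in for the runtime JobGraph class.
final class JobGraphRef {
    final String jarId;
    final String mainClass;
    JobGraphRef(String jarId, String mainClass) {
        this.jarId = jarId;
        this.mainClass = mainClass;
    }
}

// Trivial in-memory stub showing how a REST-backed implementation would behave.
class StubJarClusterClient implements JarClusterClient {
    private final Map<String, List<String>> mainClassesByJar = new HashMap<>();

    void registerJar(String jarId, List<String> mainClasses) {
        mainClassesByJar.put(jarId, mainClasses);
    }

    public List<String> listMainClasses(String jarId) {
        return mainClassesByJar.getOrDefault(jarId, Collections.emptyList());
    }

    public JobGraphRef getJobGraph(String jarId, String mainClass) {
        if (!listMainClasses(jarId).contains(mainClass)) {
            throw new IllegalArgumentException("unknown main class: " + mainClass);
        }
        return new JobGraphRef(jarId, mainClass);
    }
}
```

The point of the sketch is that the client only ever handles opaque handles; compiling the actual job graph stays on the server side.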
> > On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <wa...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > Here is a document[1] on client api enhancement from our perspective.
> > > We have investigated current implementations. And we propose
> > >
> > > 1. Unify the implementation of cluster deployment and job submission in
> > > Flink.
> > > 2. Provide programmatic interfaces to allow flexible job and cluster
> > > management.
> > >
> > > The first proposal is aimed at reducing the code paths of cluster
> > > deployment and job submission so that one can adopt Flink easily. The
> > > second proposal is aimed at providing rich interfaces for advanced
> > > users who want fine-grained control over these stages.
> > >
> > > Quick reference on open questions:
> > >
> > > 1. Exclude job cluster deployment from the client side, or redefine the
> > > semantics of the job cluster? It fits in a process quite different from
> > > session cluster deployment and job submission.
> > >
> > > 2. Maintain the code paths handling the class o.a.f.api.common.Program, or
> > > implement customized program-handling logic via a customized CliFrontend?
> > > See also this thread[2] and the document[1].
> > >
> > > 3. Expose ClusterClient as a public API, or just expose the API in
> > > ExecutionEnvironment and delegate it to ClusterClient? Further, in
> > > either case, is it worth introducing a JobClient, an encapsulation of
> > > ClusterClient that is associated with a specific job?
> > >
> > > Best,
> > > tison.
> > >
> > > [1]
> > >
> > >
> >
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
> > > [2]
> > >
> > >
> >
> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
> > >
> > > Jeff Zhang <zj...@gmail.com> 于2019年7月24日周三 上午9:19写道:
> > >
> > > > Thanks Stephan, I will follow up on this issue in the next few weeks,
> > > > and will refine the design doc. We can discuss more details after the
> > > > 1.9 release.
> > > >
> > > > Stephan Ewen <se...@apache.org> 于2019年7月24日周三 上午12:58写道:
> > > >
> > > > > Hi all!
> > > > >
> > > > > This thread has stalled for a bit, which I assume is mostly due to
> > > > > the Flink 1.9 feature freeze and release testing effort.
> > > > >
> > > > > I personally still recognize this issue as an important one to be
> > > > > solved. I'd be happy to help resume this discussion soon (after the
> > > > > 1.9 release) and see if we can take some steps towards this in
> > > > > Flink 1.10.
> > > > >
> > > > > Best,
> > > > > Stephan
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <
> > > > pompermaier@okkam.it>
> > > > > wrote:
> > > > >
> > > > > > That's exactly what I suggested a long time ago: the Flink REST
> > > > > > client should not require any Flink dependency, only an HTTP
> > > > > > library to call the REST services to submit and monitor a job.
> > > > > > What I also suggested in [1] was a way to automatically suggest
> > > > > > to the user (via a UI) the available main classes and their
> > > > > > required parameters[2].
> > > > > > Another problem we have with Flink is that the REST client and
> > > > > > the CLI client behave differently, and we use the CLI client
> > > > > > (via ssh) because it allows calling other methods after
> > > > > > env.execute() [3] (we have to call another REST service to
> > > > > > signal the end of the job).
> > > > > > In this regard, a dedicated interface, like the JobListener
> > > > > > suggested in the previous emails, would be very helpful (IMHO).
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/FLINK-10864
> > > > > > [2] https://issues.apache.org/jira/browse/FLINK-10862
> > > > > > [3] https://issues.apache.org/jira/browse/FLINK-10879
> > > > > >
> > > > > > Best,
> > > > > > Flavio
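A minimal sketch of what such a JobListener could look like, with a toy environment that notifies listeners around execute(). The interface shape and the ListeningEnvironment class are illustrative stand-ins for the idea discussed in this thread, not an actual Flink proposal:

```java
import java.util.*;

// Hypothetical listener interface: callbacks around job submission
// and completion, e.g. to notify an external REST service.
interface JobListener {
    void onJobSubmitted(String jobId);
    void onJobExecuted(String jobId, boolean success);
}

// Toy environment stand-in that fires the callbacks around execute().
class ListeningEnvironment {
    private final List<JobListener> listeners = new ArrayList<>();

    void registerJobListener(JobListener l) {
        listeners.add(l);
    }

    String execute(Runnable job) {
        String jobId = UUID.randomUUID().toString();
        listeners.forEach(l -> l.onJobSubmitted(jobId));
        boolean success = true;
        try {
            job.run();
        } catch (RuntimeException e) {
            success = false;
        }
        final boolean ok = success;
        // This is where "signal the end of the job" would happen.
        listeners.forEach(l -> l.onJobExecuted(jobId, ok));
        return jobId;
    }
}
```

With such a hook, the post-execute REST call described above would live in a listener instead of relying on code running after env.execute() in the CLI client.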
> > > > > >
> > > > > > On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zj...@gmail.com>
> > wrote:
> > > > > >
> > > > > > > Hi, Tison,
> > > > > > >
> > > > > > > Thanks for your comments. Overall I agree with you that it is
> > > > > > > difficult for downstream projects to integrate with Flink, and
> > > > > > > we need to refactor the current Flink client API.
> > > > > > > And I agree that CliFrontend should only parse command-line
> > > > > > > arguments and then pass them to the ExecutionEnvironment. It is
> > > > > > > the ExecutionEnvironment's responsibility to compile the job,
> > > > > > > create the cluster, and submit the job. Besides that, Flink
> > > > > > > currently has many ExecutionEnvironment implementations, and
> > > > > > > Flink will use a specific one based on the context. IMHO, this
> > > > > > > is unnecessary; the ExecutionEnvironment should be able to do
> > > > > > > the right thing based on the FlinkConf it receives. The many
> > > > > > > ExecutionEnvironment implementations are another burden for
> > > > > > > downstream project integration.
> > > > > > >
> > > > > > > One thing I'd like to mention is Flink's Scala shell and SQL
> > > > > > > client: although they are sub-modules of Flink, they could be
> > > > > > > treated as downstream projects which use Flink's client API.
> > > > > > > Currently you will find it is not easy for them to integrate
> > > > > > > with Flink; they share a lot of duplicated code. It is another
> > > > > > > sign that we should refactor the Flink client API.
> > > > > > >
> > > > > > > I believe it is a large and hard change, and I am afraid we
> > > > > > > cannot keep compatibility since many of the changes are
> > > > > > > user-facing.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Zili Chen <wa...@gmail.com> 于2019年6月24日周一 下午2:53写道:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > After a closer look at our client APIs, I can see two major
> > > > > > > > issues with consistency and integration: the deployment of a
> > > > > > > > job cluster, which couples job graph creation and cluster
> > > > > > > > deployment, and submission via CliFrontend, which confuses
> > > > > > > > the control flow of job graph compilation and job submission.
> > > > > > > > I'd like to follow the discussion above, mainly the process
> > > > > > > > described by Jeff and Stephan, and share my ideas on these
> > > > > > > > issues.
> > > > > > > >
> > > > > > > > 1) CliFrontend confuses the control flow of job compilation
> > > > > > > > and submission. Following the job submission process Stephan
> > > > > > > > and Jeff described, the execution environment knows all the
> > > > > > > > configs of the cluster and the topology/settings of the job.
> > > > > > > > Ideally, in the main method of the user program, it calls
> > > > > > > > #execute (or, say, #submit), and Flink deploys the cluster,
> > > > > > > > compiles the job graph, and submits it to the cluster.
> > > > > > > > However, the current CliFrontend does all these things inside
> > > > > > > > its #runProgram method, which introduces a lot of subclasses
> > > > > > > > of the (stream) execution environment.
> > > > > > > >
> > > > > > > > Actually, it sets up an exec env that hijacks the
> > > > > > > > #execute/executePlan methods, initializes the job graph, and
> > > > > > > > aborts execution. Then control flows back to CliFrontend,
> > > > > > > > which deploys the cluster (or retrieves the client) and
> > > > > > > > submits the job graph. This is quite a specific internal
> > > > > > > > process inside Flink and is consistent with nothing else.
> > > > > > > >
> > > > > > > > 2) Deployment of a job cluster couples job graph creation and
> > > > > > > > cluster deployment. Abstractly, going from a user job to a
> > > > > > > > concrete submission requires
> > > > > > > >
> > > > > > > >      create JobGraph --\
> > > > > > > >
> > > > > > > > create ClusterClient -->  submit JobGraph
> > > > > > > >
> > > > > > > > such a dependency. The ClusterClient is created by deploying
> > > > > > > > or retrieving. JobGraph submission requires a compiled
> > > > > > > > JobGraph and a valid ClusterClient, but the creation of the
> > > > > > > > ClusterClient is abstractly independent of that of the
> > > > > > > > JobGraph. However, in job cluster mode, we deploy the job
> > > > > > > > cluster with a job graph, which means we use another process:
> > > > > > > >
> > > > > > > > create JobGraph --> deploy cluster with the JobGraph
> > > > > > > >
> > > > > > > > This is another inconsistency, and downstream projects/client
> > > > > > > > APIs are forced to handle different cases with little support
> > > > > > > > from Flink.
> > > > > > > >
> > > > > > > > Since we have likely reached a consensus that
> > > > > > > >
> > > > > > > > 1. all configs are gathered by the Flink configuration and
> > > > > > > > passed on
> > > > > > > > 2. the execution environment knows all configs and handles
> > > > > > > > execution (both deployment and submission)
> > > > > > > >
> > > > > > > > I propose eliminating the inconsistencies above by the
> > > > > > > > following approach:
> > > > > > > >
> > > > > > > > 1) CliFrontend should be exactly a front end, at least for
> > > > > > > > the "run" command. That means it just gathers and passes all
> > > > > > > > config from the command line to the main method of the user
> > > > > > > > program. The execution environment knows all the info, and
> > > > > > > > with added utilities for ClusterClient, we gracefully get a
> > > > > > > > ClusterClient by deploying or retrieving. In this way, we
> > > > > > > > don't need to hijack the #execute/executePlan methods and can
> > > > > > > > remove the various hacking subclasses of the exec env, as
> > > > > > > > well as the #run methods in ClusterClient (for an
> > > > > > > > interface-ized ClusterClient). Now the control flow goes from
> > > > > > > > CliFrontend to the main method and never returns.
> > > > > > > >
> > > > > > > > 2) A job cluster means a cluster for a specific job. From
> > > > > > > > another perspective, it is an ephemeral session. We may
> > > > > > > > decouple the deployment from the compiled job graph, and
> > > > > > > > instead start a session with an idle timeout and submit the
> > > > > > > > job afterwards.
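The "ephemeral session" reading of a job cluster could be sketched roughly like this; SessionSketch, its method names, and the timestamp-based bookkeeping are illustrative stand-ins, not Flink code:

```java
import java.util.*;

// Sketch of a job cluster modeled as an ephemeral session: the job graph
// is submitted to an already-deployed session, and the session shuts
// itself down once it has been idle past a configured timeout.
final class SessionSketch {
    private final long idleTimeoutMs;
    private final Set<String> runningJobs = new HashSet<>();
    private long idleSinceMs;  // -1 while at least one job is running

    SessionSketch(long idleTimeoutMs, long nowMs) {
        this.idleTimeoutMs = idleTimeoutMs;
        this.idleSinceMs = nowMs;  // a fresh session starts out idle
    }

    // Submission is decoupled from deployment: the session already exists.
    String submit(String jobGraphName) {
        runningJobs.add(jobGraphName);
        idleSinceMs = -1;
        return jobGraphName + "-id";
    }

    void finish(String jobGraphName, long nowMs) {
        runningJobs.remove(jobGraphName);
        if (runningJobs.isEmpty()) {
            idleSinceMs = nowMs;
        }
    }

    // The "ephemeral" part: shut down once idle for the full timeout.
    boolean shouldShutDown(long nowMs) {
        return idleSinceMs >= 0 && nowMs - idleSinceMs >= idleTimeoutMs;
    }
}
```

Under this model, the same create-JobGraph / create-client / submit path works for both session and per-job deployments; only the session's lifetime differs.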
> > > > > > > >
> > > > > > > > These topics, before we go into more detail on design or
> > > > > > > > implementation, are better brought up and discussed so that
> > > > > > > > we can reach a consensus.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > tison.
> > > > > > > >
> > > > > > > >
> > > > > > > > Zili Chen <wa...@gmail.com> 于2019年6月20日周四 上午3:21写道:
> > > > > > > >
> > > > > > > >> Hi Jeff,
> > > > > > > >>
> > > > > > > >> Thanks for raising this thread and the design document!
> > > > > > > >>
> > > > > > > >> As @Thomas Weise mentioned above, extending config in Flink
> > > > > > > >> requires far more effort than it should. Another example is
> > > > > > > >> that we achieve detached mode by introducing another
> > > > > > > >> execution environment, which also hijacks the #execute
> > > > > > > >> method.
> > > > > > > >>
> > > > > > > >> I agree with your idea that the user would configure
> > > > > > > >> everything and Flink would "just" respect it. On this topic
> > > > > > > >> I think the unusual control flow when CliFrontend handles
> > > > > > > >> the "run" command is the problem. It handles several
> > > > > > > >> configs, mainly about cluster settings, and thus the main
> > > > > > > >> method of the user program is unaware of them. Also, it
> > > > > > > >> compiles the app to a job graph by running the main method
> > > > > > > >> with a hijacked exec env, which constrains the main method
> > > > > > > >> further.
> > > > > > > >>
> > > > > > > >> I'd like to write down a few notes on how configs/args are
> > > > > > > >> passed and respected, as well as on decoupling job
> > > > > > > >> compilation and submission, and share them on this thread
> > > > > > > >> later.
> > > > > > > >>
> > > > > > > >> Best,
> > > > > > > >> tison.
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> SHI Xiaogang <sh...@gmail.com> 于2019年6月17日周一
> 下午7:29写道:
> > > > > > > >>
> > > > > > > >>> Hi Jeff and Flavio,
> > > > > > > >>>
> > > > > > > >>> Thanks Jeff a lot for proposing the design document.
> > > > > > > >>>
> > > > > > > >>> We are also working on refactoring ClusterClient to allow
> > > > flexible
> > > > > > and
> > > > > > > >>> efficient job management in our real-time platform.
> > > > > > > >>> We would like to draft a document to share our ideas with
> > you.
> > > > > > > >>>
> > > > > > > >>> I think it's a good idea to have something like Apache Livy
> > for
> > > > > > Flink,
> > > > > > > >>> and
> > > > > > > >>> the efforts discussed here will take a great step forward
> to
> > > it.
> > > > > > > >>>
> > > > > > > >>> Regards,
> > > > > > > >>> Xiaogang
> > > > > > > >>>
> > > > > > > >>> Flavio Pompermaier <po...@okkam.it> 于2019年6月17日周一
> > > > 下午7:13写道:
> > > > > > > >>>
> > > > > > > >>> > Is there any possibility to have something like Apache
> Livy
> > > [1]
> > > > > > also
> > > > > > > >>> for
> > > > > > > >>> > Flink in the future?
> > > > > > > >>> >
> > > > > > > >>> > [1] https://livy.apache.org/
> > > > > > > >>> >
> > > > > > > >>> > On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <
> > zjffdu@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > >>> >
> > > > > > > >>> > > >>>  Any API we expose should not have dependencies on
> > the
> > > > > > runtime
> > > > > > > >>> > > (flink-runtime) package or other implementation
> details.
> > To
> > > > me,
> > > > > > > this
> > > > > > > >>> > means
> > > > > > > >>> > > that the current ClusterClient cannot be exposed to
> users
> > > > > because
> > > > > > > it
> > > > > > > >>> >  uses
> > > > > > > >>> > > quite some classes from the optimiser and runtime
> > packages.
> > > > > > > >>> > >
> > > > > > > >>> > > We should change ClusterClient from a class to an
> > > > > > > >>> > > interface. ExecutionEnvironment should only use the
> > > > > > > >>> > > ClusterClient interface, which should live in
> > > > > > > >>> > > flink-clients, while the concrete implementation
> > > > > > > >>> > > classes could be in flink-runtime.
> > > > > > > >>> > >
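A minimal sketch of that interface split, with an in-memory stub standing in for a flink-runtime implementation; the names and signatures below are simplified stand-ins, not the real Flink API:

```java
import java.util.*;

// Hypothetical interface that would live in flink-clients; the
// ExecutionEnvironment would depend only on this.
interface ClusterClientSketch {
    String submitJob(String jobGraphName);  // returns a job id
    String queryJobStatus(String jobId);
    void cancelJob(String jobId);
}

// Concrete implementations (Yarn, REST, MiniCluster, ...) would live in
// flink-runtime; here just an in-memory stub for illustration.
class InMemoryClusterClient implements ClusterClientSketch {
    private final Map<String, String> statusById = new HashMap<>();

    public String submitJob(String jobGraphName) {
        String jobId = jobGraphName + "-1";
        statusById.put(jobId, "RUNNING");
        return jobId;
    }

    public String queryJobStatus(String jobId) {
        return statusById.getOrDefault(jobId, "UNKNOWN");
    }

    public void cancelJob(String jobId) {
        statusById.computeIfPresent(jobId, (id, s) -> "CANCELED");
    }
}
```

The key property is that nothing in the interface leaks optimizer or runtime types, which is exactly the dependency concern raised below.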
> > > > > > > >>> > > >>> What happens when a failure/restart in the client
> > > > happens?
> > > > > > > There
> > > > > > > >>> need
> > > > > > > >>> > > to be a way of re-establishing the connection to the
> job,
> > > set
> > > > > up
> > > > > > > the
> > > > > > > >>> > > listeners again, etc.
> > > > > > > >>> > >
> > > > > > > >>> > > Good point. First we need to define what
> > > > > > > >>> > > failure/restart in the client means. IIUC, that
> > > > > > > >>> > > usually means a network failure, which will happen in
> > > > > > > >>> > > the RestClient class. If my understanding is correct,
> > > > > > > >>> > > the restart/retry mechanism should be done in
> > > > > > > >>> > > RestClient.
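A generic sketch of such a retry mechanism (exponential backoff around a callable); this illustrates the idea only and is not Flink's actual RestClient code:

```java
import java.util.concurrent.Callable;

// Retry helper: re-runs a call on failure with doubling backoff, and
// rethrows the last failure once the attempt budget is exhausted.
final class Retry {
    private Retry() {}

    static <T> T withRetry(Callable<T> call, int maxAttempts, long initialBackoffMs)
            throws Exception {
        long backoff = initialBackoffMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {  // a real client would narrow this to network errors
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff);
                    backoff *= 2;  // exponential backoff between attempts
                }
            }
        }
        throw last;
    }
}
```

Placed inside the REST layer, such a wrapper would make transient network failures invisible to listeners registered on the job, which is the re-establishment concern raised below.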
> > > > > > > >>> > >
> > > > > > > >>> > >
> > > > > > > >>> > >
> > > > > > > >>> > >
> > > > > > > >>> > >
> > > > > > > >>> > > Aljoscha Krettek <al...@apache.org> 于2019年6月11日周二
> > > > > 下午11:10写道:
> > > > > > > >>> > >
> > > > > > > >>> > > > Some points to consider:
> > > > > > > >>> > > >
> > > > > > > >>> > > > * Any API we expose should not have dependencies on
> the
> > > > > runtime
> > > > > > > >>> > > > (flink-runtime) package or other implementation
> > details.
> > > To
> > > > > me,
> > > > > > > >>> this
> > > > > > > >>> > > means
> > > > > > > >>> > > > that the current ClusterClient cannot be exposed to
> > users
> > > > > > because
> > > > > > > >>> it
> > > > > > > >>> > >  uses
> > > > > > > >>> > > > quite some classes from the optimiser and runtime
> > > packages.
> > > > > > > >>> > > >
> > > > > > > >>> > > > * What happens when a failure/restart in the client
> > > > happens?
> > > > > > > There
> > > > > > > >>> need
> > > > > > > >>> > > to
> > > > > > > >>> > > > be a way of re-establishing the connection to the
> job,
> > > set
> > > > up
> > > > > > the
> > > > > > > >>> > > listeners
> > > > > > > >>> > > > again, etc.
> > > > > > > >>> > > >
> > > > > > > >>> > > > Aljoscha
> > > > > > > >>> > > >
> > > > > > > >>> > > > > On 29. May 2019, at 10:17, Jeff Zhang <
> > > zjffdu@gmail.com>
> > > > > > > wrote:
> > > > > > > >>> > > > >
> > > > > > > >>> > > > > Sorry folks, the design doc is late as you
> expected.
> > > > Here's
> > > > > > the
> > > > > > > >>> > design
> > > > > > > >>> > > > doc
> > > > > > > >>> > > > > I drafted, welcome any comments and feedback.
> > > > > > > >>> > > > >
> > > > > > > >>> > > > >
> > > > > > > >>> > > >
> > > > > > > >>> > >
> > > > > > > >>> >
> > > > > > > >>>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> > > > > > > >>> > > > >
> > > > > > > >>> > > > >
> > > > > > > >>> > > > >
> > > > > > > >>> > > > > Stephan Ewen <se...@apache.org> 于2019年2月14日周四
> > > 下午8:43写道:
> > > > > > > >>> > > > >
> > > > > > > >>> > > > >> Nice that this discussion is happening.
> > > > > > > >>> > > > >>
> > > > > > > >>> > > > >> In the FLIP, we could also revisit the entire role
> > of
> > > > the
> > > > > > > >>> > environments
> > > > > > > >>> > > > >> again.
> > > > > > > >>> > > > >>
> > > > > > > >>> > > > >> Initially, the idea was:
> > > > > > > >>> > > > >>  - the environments take care of the specific
> setup
> > > for
> > > > > > > >>> standalone
> > > > > > > >>> > (no
> > > > > > > >>> > > > >> setup needed), yarn, mesos, etc.
> > > > > > > >>> > > > >>  - the session ones have control over the session.
> > The
> > > > > > > >>> environment
> > > > > > > >>> > > holds
> > > > > > > >>> > > > >> the session client.
> > > > > > > >>> > > > >>  - running a job gives a "control" object for that
> > > job.
> > > > > That
> > > > > > > >>> > behavior
> > > > > > > >>> > > is
> > > > > > > >>> > > > >> the same in all environments.
> > > > > > > >>> > > > >>
> > > > > > > >>> > > > >> The actual implementation diverged quite a bit
> from
> > > > that.
> > > > > > > Happy
> > > > > > > >>> to
> > > > > > > >>> > > see a
> > > > > > > >>> > > > >> discussion about straightening this out a bit more.
> > > > > > > >>> > > > >>
> > > > > > > >>> > > > >>
> > > > > > > >>> > > > >> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <
> > > > > > zjffdu@gmail.com>
> > > > > > > >>> > wrote:
> > > > > > > >>> > > > >>
> > > > > > > >>> > > > >>> Hi folks,
> > > > > > > >>> > > > >>>
> > > > > > > >>> > > > >>> Sorry for late response, It seems we reach
> > consensus
> > > on
> > > > > > > this, I
> > > > > > > >>> > will
> > > > > > > >>> > > > >> create
> > > > > > > >>> > > > >>> FLIP for this with more detailed design
> > > > > > > >>> > > > >>>
> > > > > > > >>> > > > >>>
> > > > > > > >>> > > > >>> Thomas Weise <th...@apache.org> 于2018年12月21日周五
> > > > 上午11:43写道:
> > > > > > > >>> > > > >>>
> > > > > > > >>> > > > >>>> Great to see this discussion seeded! The
> problems
> > > you
> > > > > face
> > > > > > > >>> with
> > > > > > > >>> > the
> > > > > > > >>> > > > >>>> Zeppelin integration are also affecting other
> > > > downstream
> > > > > > > >>> projects,
> > > > > > > >>> > > > like
> > > > > > > >>> > > > >>>> Beam.
> > > > > > > >>> > > > >>>>
> > > > > > > >>> > > > >>>> We just enabled the savepoint restore option in
> > > > > > > >>> > > > RemoteStreamEnvironment
> > > > > > > >>> > > > >>> [1]
> > > > > > > >>> > > > >>>> and that was more difficult than it should be.
> The
> > > > main
> > > > > > > issue
> > > > > > > >>> is
> > > > > > > >>> > > that
> > > > > > > >>> > > > >>>> environment and cluster client aren't decoupled.
> > > > Ideally
> > > > > > it
> > > > > > > >>> should
> > > > > > > >>> > > be
> > > > > > > >>> > > > >>>> possible to just get the matching cluster client
> > > from
> > > > > the
> > > > > > > >>> > > environment
> > > > > > > >>> > > > >> and
> > > > > > > >>> > > > >>>> then control the job through it (environment as
> > > > factory
> > > > > > for
> > > > > > > >>> > cluster
> > > > > > > >>> > > > >>>> client). But note that the environment classes
> are
> > > > part
> > > > > of
> > > > > > > the
> > > > > > > >>> > > public
> > > > > > > >>> > > > >>> API,
> > > > > > > >>> > > > >>>> and it is not straightforward to make larger
> > changes
> > > > > > without
> > > > > > > >>> > > breaking
> > > > > > > >>> > > > >>>> backward compatibility.
> > > > > > > >>> > > > >>>>
> > > > > > > >>> > > > >>>> ClusterClient currently exposes internal classes
> > > like
> > > > > > > >>> JobGraph and
> > > > > > > >>> > > > >>>> StreamGraph. But it should be possible to wrap
> > this
> > > > > with a
> > > > > > > new
> > > > > > > >>> > > public
> > > > > > > >>> > > > >> API
> > > > > > > >>> > > > >>>> that brings the required job control
> capabilities
> > > for
> > > > > > > >>> downstream
> > > > > > > >>> > > > >>> projects.
> > > > > > > >>> > > > >>>> Perhaps it is helpful to look at some of the
> > > > interfaces
> > > > > in
> > > > > > > >>> Beam
> > > > > > > >>> > > while
> > > > > > > >>> > > > >>>> thinking about this: [2] for the portable job
> API
> > > and
> > > > > [3]
> > > > > > > for
> > > > > > > >>> the
> > > > > > > >>> > > old
> > > > > > > >>> > > > >>>> asynchronous job control from the Beam Java SDK.
> > > > > > > >>> > > > >>>>
> > > > > > > >>> > > > >>>> The backward compatibility discussion [4] is
> also
> > > > > relevant
> > > > > > > >>> here. A
> > > > > > > >>> > > new
> > > > > > > >>> > > > >>> API
> > > > > > > >>> > > > >>>> should shield downstream projects from internals
> > and
> > > > > allow
> > > > > > > >>> them to
> > > > > > > >>> > > > >>>> interoperate with multiple future Flink versions
> > in
> > > > the
> > > > > > same
> > > > > > > >>> > release
> > > > > > > >>> > > > >> line
> > > > > > > >>> > > > >>>> without forced upgrades.
> > > > > > > >>> > > > >>>>
> > > > > > > >>> > > > >>>> Thanks,
> > > > > > > >>> > > > >>>> Thomas
> > > > > > > >>> > > > >>>>
> > > > > > > >>> > > > >>>> [1] https://github.com/apache/flink/pull/7249
> > > > > > > >>> > > > >>>> [2]
> > > > > > > >>> > > > >>>>
> > > > > > > >>> > > > >>>>
> > > > > > > >>> > > > >>>
> > > > > > > >>> > > > >>
> > > > > > > >>> > > >
> > > > > > > >>> > >
> > > > > > > >>> >
> > > > > > > >>>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > > > > > > >>> > > > >>>> [3]
> > > > > > > >>> > > > >>>>
> > > > > > > >>> > > > >>>>
> > > > > > > >>> > > > >>>
> > > > > > > >>> > > > >>
> > > > > > > >>> > > >
> > > > > > > >>> > >
> > > > > > > >>> >
> > > > > > > >>>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > > > > > > >>> > > > >>>> [4]
> > > > > > > >>> > > > >>>>
> > > > > > > >>> > > > >>>>
> > > > > > > >>> > > > >>>
> > > > > > > >>> > > > >>
> > > > > > > >>> > > >
> > > > > > > >>> > >
> > > > > > > >>> >
> > > > > > > >>>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> > > > > > > >>> > > > >>>>
> > > > > > > >>> > > > >>>>
> > > > > > > >>> > > > >>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <
> > > > > > > zjffdu@gmail.com>
> > > > > > > >>> > > wrote:
> > > > > > > >>> > > > >>>>
> > > > > > > >>> > > > >>>>>>>> I'm not so sure whether the user should be
> > able
> > > to
> > > > > > > define
> > > > > > > >>> > where
> > > > > > > >>> > > > >> the
> > > > > > > >>> > > > >>>> job
> > > > > > > >>> > > > >>>>> runs (in your example Yarn). This is actually
> > > > > independent
> > > > > > > of
> > > > > > > >>> the
> > > > > > > >>> > > job
> > > > > > > >>> > > > >>>>> development and is something which is decided
> at
> > > > > > deployment
> > > > > > > >>> time.
> > > > > > > >>> > > > >>>>>
> > > > > > > >>> > > > >>>>> Users don't need to specify the execution mode
> > > > > > > >>> > > > >>>>> programmatically. They can also pass the
> > > > > > > >>> > > > >>>>> execution mode as an argument to the flink run
> > > > > > > >>> > > > >>>>> command, e.g.
> > > > > > > >>> > > > >>>>>
> > > > > > > >>> > > > >>>>> bin/flink run -m yarn-cluster ....
> > > > > > > >>> > > > >>>>> bin/flink run -m local ...
> > > > > > > >>> > > > >>>>> bin/flink run -m host:port ...
> > > > > > > >>> > > > >>>>>
> > > > > > > >>> > > > >>>>> Does this make sense to you ?
> > > > > > > >>> > > > >>>>>
> > > > > > > >>> > > > >>>>>>>> To me it makes sense that the
> > > ExecutionEnvironment
> > > > > is
> > > > > > > not
> > > > > > > >>> > > > >> directly
> > > > > > > >>> > > > >>>>> initialized by the user and instead context
> > > sensitive
> > > > > how
> > > > > > > you
> > > > > > > >>> > want
> > > > > > > >>> > > to
> > > > > > > >>> > > > >>>>> execute your job (Flink CLI vs. IDE, for
> > example).
> > > > > > > >>> > > > >>>>>
> > > > > > > >>> > > > >>>>> Right, currently I notice Flink would create a
> > > > > > > >>> > > > >>>>> different ContextExecutionEnvironment based on
> > > > > > > >>> > > > >>>>> the submission scenario (Flink CLI vs IDE). To
> > > > > > > >>> > > > >>>>> me this is a rather hacky approach, not so
> > > > > > > >>> > > > >>>>> straightforward. What I suggested above is that
> > > > > > > >>> > > > >>>>> Flink should always create the same
> > > > > > > >>> > > > >>>>> ExecutionEnvironment, but with different
> > > > > > > >>> > > > >>>>> configuration; based on that configuration it
> > > > > > > >>> > > > >>>>> would create the proper ClusterClient for the
> > > > > > > >>> > > > >>>>> different behaviors.
> > > > > > > >>> > > > >>>>>
> > > > > > > >>> > > > >>>>>
> > > > > > > >>> > > > >>>>>
> > > > > > > >>> > > > >>>>>
> > > > > > > >>> > > > >>>>>
> > > > > > > >>> > > > >>>>>
> > > > > > > >>> > > > >>>>>
> > > > > > > >>> > > > >>>>> Till Rohrmann <tr...@apache.org>
> > > 于2018年12月20日周四
> > > > > > > >>> 下午11:18写道:
> > > > > > > >>> > > > >>>>>
> > > > > > > >>> > > > >>>>>> You are probably right that we have code
> > > duplication
> > > > > > when
> > > > > > > it
> > > > > > > >>> > comes
> > > > > > > >>> > > > >> to
> > > > > > > >>> > > > >>>> the
> > > > > > > >>> > > > >>>>>> creation of the ClusterClient. This should be
> > > > reduced
> > > > > in
> > > > > > > the
> > > > > > > >>> > > > >> future.
> > > > > > > >>> > > > >>>>>>
> > > > > > > >>> > > > >>>>>> I'm not so sure whether the user should be
> able
> > to
> > > > > > define
> > > > > > > >>> where
> > > > > > > >>> > > the
> > > > > > > >>> > > > >>> job
> > > > > > > >>> > > > >>>>>> runs (in your example Yarn). This is actually
> > > > > > independent
> > > > > > > >>> of the
> > > > > > > >>> > > > >> job
> > > > > > > >>> > > > >>>>>> development and is something which is decided
> at
> > > > > > > deployment
> > > > > > > >>> > time.
> > > > > > > >>> > > > >> To
> > > > > > > >>> > > > >>> me
> > > > > > > >>> > > > >>>>> it
> > > > > > > >>> > > > >>>>>> makes sense that the ExecutionEnvironment is
> not
> > > > > > directly
> > > > > > > >>> > > > >> initialized
> > > > > > > >>> > > > >>>> by
> > > > > > > >>> > > > >>>>>> the user and instead context sensitive how you
> > > want
> > > > to
> > > > > > > >>> execute
> > > > > > > >>> > > your
> > > > > > > >>> > > > >>> job
> > > > > > > >>> > > > >>>>>> (Flink CLI vs. IDE, for example). However, I
> > agree
> > > > > that
> > > > > > > the
> > > > > > > >>> > > > >>>>>> ExecutionEnvironment should give you access to
> > the
> > > > > > > >>> ClusterClient
> > > > > > > >>> > > > >> and
> > > > > > > >>> > > > >>> to
> > > > > > > >>> > > > >>>>> the
> > > > > > > >>> > > > >>>>>> job (maybe in the form of the JobGraph or a
> job
> > > > plan).
> > > > > > > >>> > > > >>>>>>
> > > > > > > >>> > > > >>>>>> Cheers,
> > > > > > > >>> > > > >>>>>> Till
> > > > > > > >>> > > > >>>>>>
> > > > > > > >>> > > > >>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <
> > > > > > > >>> zjffdu@gmail.com>
> > > > > > > >>> > > > >> wrote:
> > > > > > > >>> > > > >>>>>>
> > > > > > > >>> > > > >>>>>>> Hi Till,
> > > > > > > >>> > > > >>>>>>> Thanks for the feedback. You are right that I
> > > > expect
> > > > > > > better
> > > > > > > >>> > > > >>>>> programmatic
> job submission/control api which could be used by downstream projects,
> and it would benefit the Flink ecosystem. When I look at the code of the
> Flink scala-shell and sql-client (I believe they are not the core of
> Flink, but belong to its ecosystem), I find a lot of duplicated code for
> creating a ClusterClient from user-provided configuration (the
> configuration format may even differ between scala-shell and sql-client)
> and then using that ClusterClient to manipulate jobs. I don't think this
> is convenient for downstream projects. What I expect is that a downstream
> project only needs to provide the necessary configuration info (maybe by
> introducing a class FlinkConf), then build an ExecutionEnvironment based
> on this FlinkConf, and the ExecutionEnvironment will create the proper
> ClusterClient. This would not only benefit downstream project development
> but also help their integration tests with Flink. Here's one sample code
> snippet that I expect:
>
> val conf = new FlinkConf().mode("yarn")
> val env = new ExecutionEnvironment(conf)
> val jobId = env.submit(...)
> val jobStatus = env.getClusterClient().queryJobStatus(jobId)
> env.getClusterClient().cancelJob(jobId)
>
> What do you think?
>
> Till Rohrmann <tr...@apache.org> wrote on Tue, Dec 11, 2018 at 6:28 PM:
>
> Hi Jeff,
>
> what you are proposing is to provide the user with better programmatic
> job control. There was actually an effort to achieve this but it has
> never been completed [1]. However, there are some improvements in the
> code base now. Look for example at the NewClusterClient interface, which
> offers a non-blocking job submission. But I agree that we need to improve
> Flink in this regard.
>
> I would not be in favour of exposing all ClusterClient calls via the
> ExecutionEnvironment because it would clutter the class and would not be
> a good separation of concerns. Instead, one idea could be to retrieve the
> current ClusterClient from the ExecutionEnvironment, which can then be
> used for cluster and job control. But before we start an effort here, we
> need to agree on and capture what functionality we want to provide.
>
> Initially, the idea was that we have the ClusterDescriptor describing how
> to talk to a cluster manager like Yarn or Mesos. The ClusterDescriptor
> can be used for deploying Flink clusters (job and session) and gives you
> a ClusterClient. The ClusterClient controls the cluster (e.g. submitting
> jobs, listing all running jobs). And then there was the idea to introduce
> a JobClient, which you obtain from the ClusterClient to trigger
> job-specific operations (e.g. taking a savepoint, cancelling the job).
>
> [1] https://issues.apache.org/jira/browse/FLINK-4272
>
> Cheers,
> Till
>
> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zjffdu@gmail.com> wrote:
>
> Hi Folks,
>
> I am trying to integrate Flink into Apache Zeppelin, which is an
> interactive notebook, and I hit several issues that are caused by the
> Flink client API. So I'd like to propose the following changes to the
> Flink client API.
>
> 1. Support non-blocking execution. Currently,
> ExecutionEnvironment#execute is a blocking method which does two things:
> first submit the job and then wait for the job until it is finished. I'd
> like to introduce a non-blocking execution method like
> ExecutionEnvironment#submit which only submits the job and then returns
> the jobId to the client, and allow the user to query the job status via
> that jobId.
>
> 2. Add a cancel API in ExecutionEnvironment/StreamExecutionEnvironment.
> Currently the only way to cancel a job is via the CLI (bin/flink), which
> is not convenient for downstream projects. So I'd like to add a cancel
> API in ExecutionEnvironment.
>
> 3. Add a savepoint API in
> ExecutionEnvironment/StreamExecutionEnvironment. It is similar to the
> cancel API; we should use ExecutionEnvironment as the unified API for
> third parties to integrate with Flink.
>
> 4. Add a listener for the job execution lifecycle, something like the
> following, so that downstream projects can run custom logic in the
> lifecycle of a job. E.g. Zeppelin would capture the jobId after the job
> is submitted and then use this jobId to cancel it later when necessary.
>
> public interface JobListener {
>
>   void onJobSubmitted(JobID jobId);
>
>   void onJobExecuted(JobExecutionResult jobResult);
>
>   void onJobCanceled(JobID jobId);
> }
>
> 5. Enable sessions in ExecutionEnvironment. Currently this is disabled,
> but sessions are very convenient for third parties that submit jobs
> continually. I hope Flink can enable it again.
>
> 6. Unify all Flink client APIs into
> ExecutionEnvironment/StreamExecutionEnvironment.
>
> This is a long-term issue which needs more careful thinking and design.
> Currently some features of Flink are exposed in
> ExecutionEnvironment/StreamExecutionEnvironment, but some are exposed in
> the CLI instead of the API, like the cancel and savepoint operations I
> mentioned above. I think the root cause is that Flink didn't unify the
> interaction with Flink. Here I list 3 scenarios of Flink operation:
>
>   - Local job execution. Flink will create a LocalEnvironment and then
>     use this LocalEnvironment to create a LocalExecutor for job
>     execution.
>   - Remote job execution. Flink will create a ClusterClient first, then
>     create a ContextEnvironment based on the ClusterClient and then run
>     the job.
>   - Job cancellation. Flink will create a ClusterClient first and then
>     cancel the job via this ClusterClient.
>
> As you can see in the above 3 scenarios, Flink doesn't use the same
> approach (code path) to interact with Flink. What I propose is the
> following: create the proper LocalEnvironment/RemoteEnvironment (based on
> user configuration) --> use this Environment to create the proper
> ClusterClient (LocalClusterClient or RestClusterClient) to interact with
> Flink (job execution or cancellation).
>
> This way we can unify the process of local execution and remote
> execution. And it is much easier for third parties to integrate with
> Flink, because ExecutionEnvironment is the unified entry point for Flink.
> What a third party needs to do is just pass configuration to
> ExecutionEnvironment, and ExecutionEnvironment will do the right thing
> based on the configuration. The Flink CLI can also be considered a
> consumer of the Flink API: it just passes the configuration to
> ExecutionEnvironment and lets ExecutionEnvironment create the proper
> ClusterClient, instead of the CLI creating the ClusterClient directly.
>
> 6 would involve a large code refactoring, so I think we can defer it to a
> future release; 1, 2, 3, 4, 5 could be done at once, I believe. Let me
> know your comments and feedback, thanks.


-- 
Best Regards

Jeff Zhang

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Zili Chen <wa...@gmail.com>.
Hi Flavio,

Thanks for your reply.

In both the current implementation and the design, the ClusterClient
never takes responsibility for generating the JobGraph
(what you see in the current codebase are several class methods).

Instead, the user describes the program in the main method
with ExecutionEnvironment APIs and calls env.compile()
or env.optimize() to get the FlinkPlan or the JobGraph, respectively.

For listing the main classes in a jar and choosing one for
submission, you're already able to customize a CLI to do it.
Specifically, the path of the jar is passed as an argument, and
in the customized CLI you list the main classes and choose one
to submit to the cluster.
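
A minimal, self-contained sketch of the jar-listing step mentioned here, assuming nothing from Flink: it only enumerates `.class` entries of a jar (detecting which of them actually define a main method would additionally require bytecode inspection). `JarClassLister` and `sampleJar` are illustrative names, not real APIs.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

public class JarClassLister {

    // Lists fully-qualified class names found in a jar file.
    static List<String> listClasses(File jar) {
        List<String> classes = new ArrayList<>();
        try (JarFile jf = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jf.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".class")) {
                    classes.add(name.substring(0, name.length() - ".class".length())
                                    .replace('/', '.'));
                }
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return classes;
    }

    // Builds a tiny throwaway jar so the demo is self-contained.
    static File sampleJar() {
        try {
            File jar = File.createTempFile("demo", ".jar");
            jar.deleteOnExit();
            try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar))) {
                out.putNextEntry(new JarEntry("com/example/WordCount.class"));
                out.closeEntry();
            }
            return jar;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(listClasses(sampleJar())); // prints [com.example.WordCount]
    }
}
```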

Best,
tison.


Flavio Pompermaier <po...@okkam.it> wrote on Wed, Jul 31, 2019 at 8:12 PM:

> Just one note on my side: it is not clear to me whether the client needs to
> be able to generate a job graph or not.
> In my opinion, the job jar must reside only on the server/JobManager side
> and the client requires a way to get the job graph.
> If you really want access to the job graph, I'd add dedicated methods on
> the ClusterClient, like:
>
>    - getJobGraph(jarId, mainClass): JobGraph
>    - listMainClasses(jarId): List<String>
>
> These would require some additions on the JobManager endpoint as well.
> What do you think?
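
A rough sketch of what the two proposed methods could look like, with a toy in-memory jar store standing in for the JobManager side. The method names follow the proposal above; neither exists in Flink's actual ClusterClient or REST API today.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the proposed endpoints; these are NOT existing Flink APIs.
// A hard-coded in-memory "jar store" stands in for the real server side.
public class JarIntrospectionDemo {

    // Would scan the uploaded jar on the JobManager side; hard-coded here.
    static List<String> listMainClasses(String jarId) {
        if ("jar-1".equals(jarId)) {
            return Arrays.asList("com.example.WordCount", "com.example.PageRank");
        }
        return List.of();
    }

    // Would run the chosen main class server-side to produce a JobGraph;
    // here it just returns an opaque handle.
    static String getJobGraph(String jarId, String mainClass) {
        if (!listMainClasses(jarId).contains(mainClass)) {
            throw new IllegalArgumentException("unknown main class: " + mainClass);
        }
        return "jobgraph:" + jarId + ":" + mainClass;
    }

    public static void main(String[] args) {
        System.out.println(listMainClasses("jar-1"));
        System.out.println(getJobGraph("jar-1", "com.example.WordCount"));
    }
}
```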
>
> On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <wa...@gmail.com> wrote:
>
> > Hi all,
> >
> > Here is a document [1] on client API enhancement from our perspective.
> > We have investigated the current implementations, and we propose:
> >
> > 1. Unify the implementation of cluster deployment and job submission in
> > Flink.
> > 2. Provide programmatic interfaces to allow flexible job and cluster
> > management.
> >
> > The first proposal is aimed at reducing the code paths of cluster
> > deployment and job submission so that one can adopt Flink easily. The
> > second proposal is aimed at providing rich interfaces for advanced users
> > who want precise control of these stages.
> >
> > Quick reference on open questions:
> >
> > 1. Exclude job cluster deployment from the client side, or redefine the
> > semantics of a job cluster? It fits into a process quite different from
> > session cluster deployment and job submission.
> >
> > 2. Maintain the code paths handling the class o.a.f.api.common.Program,
> > or implement customized program-handling logic via a customized
> > CliFrontend?
> > See also this thread[2] and the document[1].
> >
> > 3. Expose ClusterClient as a public API, or only expose APIs on
> > ExecutionEnvironment and delegate them to ClusterClient? Further, either
> > way, is it worth introducing a JobClient, an encapsulation of
> > ClusterClient that is associated with a specific job?
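
One way the JobClient idea could look, as a hedged sketch: a thin, job-scoped facade that pins a ClusterClient to one job id, so callers never pass the id around. All names here are illustrative, not Flink's actual API.

```java
import java.util.HashMap;
import java.util.Map;

public class JobClientSketch {

    // Minimal stand-in for the (interface-ized) cluster client.
    interface ClusterClient {
        String queryJobStatus(String jobId);
        void cancelJob(String jobId);
    }

    // Job-scoped facade: delegates to the cluster client with a fixed job id.
    static final class JobClient {
        private final ClusterClient client;
        private final String jobId;

        JobClient(ClusterClient client, String jobId) {
            this.client = client;
            this.jobId = jobId;
        }

        String status() { return client.queryJobStatus(jobId); }
        void cancel()   { client.cancelJob(jobId); }
    }

    // Runs the toy scenario and returns the final job status.
    static String runDemo() {
        Map<String, String> jobs = new HashMap<>();
        jobs.put("job-42", "RUNNING");
        ClusterClient cc = new ClusterClient() {
            public String queryJobStatus(String jobId) { return jobs.get(jobId); }
            public void cancelJob(String jobId) { jobs.put(jobId, "CANCELED"); }
        };
        JobClient job = new JobClient(cc, "job-42");
        job.cancel();
        return job.status();
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints CANCELED
    }
}
```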
> >
> > Best,
> > tison.
> >
> > [1]
> >
> >
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
> > [2]
> >
> >
> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
> >
> > Jeff Zhang <zj...@gmail.com> wrote on Wed, Jul 24, 2019 at 9:19 AM:
> >
> > > Thanks Stephan, I will follow up on this issue in the next few weeks
> > > and will refine the design doc. We can discuss more details after the
> > > 1.9 release.
> > >
> > > Stephan Ewen <se...@apache.org> wrote on Wed, Jul 24, 2019 at 12:58 AM:
> > >
> > > > Hi all!
> > > >
> > > > This thread has stalled for a bit, which I assume is mostly due to
> > > > the Flink 1.9 feature freeze and release testing effort.
> > > >
> > > > I personally still recognize this issue as an important one to solve.
> > > > I'd be happy to help resume this discussion soon (after the 1.9
> > > > release) and see if we can take some steps towards it in Flink 1.10.
> > > >
> > > > Best,
> > > > Stephan
> > > >
> > > >
> > > >
> > > > On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <pompermaier@okkam.it> wrote:
> > > >
> > > > > That's exactly what I suggested a long time ago: the Flink REST
> > > > > client should not require any Flink dependency, only an HTTP library
> > > > > to call the REST services to submit and monitor a job.
> > > > > What I suggested also in [1] was to have a way to automatically
> > > > > suggest to the user (via a UI) the available main classes and their
> > > > > required parameters [2].
> > > > > Another problem we have with Flink is that the REST client and the
> > > > > CLI one behave differently, and we use the CLI client (via ssh)
> > > > > because it allows calling some other method after env.execute() [3]
> > > > > (we have to call another REST service to signal the end of the job).
> > > > > In this regard, a dedicated interface, like the JobListener suggested
> > > > > in the previous emails, would be very helpful (IMHO).
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/FLINK-10864
> > > > > [2] https://issues.apache.org/jira/browse/FLINK-10862
> > > > > [3] https://issues.apache.org/jira/browse/FLINK-10879
> > > > >
> > > > > Best,
> > > > > Flavio
> > > > >
> > > > > On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zj...@gmail.com> wrote:
> > > > >
> > > > > > Hi, Tison,
> > > > > >
> > > > > > Thanks for your comments. Overall I agree with you that it is
> > > > > > difficult for downstream projects to integrate with Flink and that
> > > > > > we need to refactor the current Flink client API.
> > > > > > And I agree that CliFrontend should only parse command line
> > > > > > arguments and then pass them to the ExecutionEnvironment. It is the
> > > > > > ExecutionEnvironment's responsibility to compile the job, create
> > > > > > the cluster, and submit the job. Besides that, Flink currently has
> > > > > > many ExecutionEnvironment implementations and will use a specific
> > > > > > one based on the context. IMHO, that is not necessary; the
> > > > > > ExecutionEnvironment should be able to do the right thing based on
> > > > > > the FlinkConf it receives. Too many ExecutionEnvironment
> > > > > > implementations are another burden for downstream project
> > > > > > integration.
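
The "one environment, driven by configuration" idea can be sketched as a single dispatch point that replaces per-mode environment subclasses. The FlinkConf-style modes and the executor names below are illustrative, not Flink's actual classes.

```java
// Sketch only: one environment type picks the right executor from the
// configuration, instead of one environment subclass per deployment mode.
public class ConfDrivenEnvDemo {

    // A single dispatch point replaces LocalEnvironment / RemoteEnvironment /
    // ContextEnvironment subclassing. Mode and executor names are made up.
    static String chooseExecutor(String mode) {
        switch (mode) {
            case "local":  return "LocalExecutor";
            case "remote": return "RestClusterExecutor";
            case "yarn":   return "YarnSessionExecutor";
            default: throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        System.out.println(chooseExecutor("yarn")); // prints YarnSessionExecutor
    }
}
```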
> > > > > >
> > > > > > One thing I'd like to mention is Flink's scala-shell and
> > > > > > sql-client: although they are sub-modules of Flink, they could be
> > > > > > treated as downstream projects which use Flink's client API.
> > > > > > Currently you will find it is not easy for them to integrate with
> > > > > > Flink; they share a lot of duplicated code. It is another sign that
> > > > > > we should refactor the Flink client API.
> > > > > >
> > > > > > I believe it is a large and hard change, and I am afraid we cannot
> > > > > > keep compatibility, since many of the changes are user-facing.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Zili Chen <wa...@gmail.com> wrote on Mon, Jun 24, 2019 at 2:53 PM:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > After a closer look at our client APIs, I can see there are two
> > > > > > > major issues for consistency and integration, namely the
> > > > > > > deployment of a job cluster, which couples job graph creation and
> > > > > > > cluster deployment, and submission via CliFrontend, which
> > > > > > > confuses the control flow of job graph compilation and job
> > > > > > > submission. I'd like to follow the discussion above, mainly the
> > > > > > > process described by Jeff and Stephan, and share my ideas on
> > > > > > > these issues.
> > > > > > >
> > > > > > > 1) CliFrontend confuses the control flow of job compilation and
> > > > > > > submission. Following the process of job submission Stephan and
> > > > > > > Jeff described, the execution environment knows all configs of
> > > > > > > the cluster and the topologies/settings of the job. Ideally, the
> > > > > > > main method of the user program calls #execute (or, say, #submit)
> > > > > > > and Flink deploys the cluster, compiles the job graph and submits
> > > > > > > it to the cluster. However, the current CliFrontend does all
> > > > > > > these things inside its #runProgram method, which introduces a
> > > > > > > lot of subclasses of the (stream) execution environment.
> > > > > > >
> > > > > > > Actually, it sets up an exec env that hijacks the
> > > > > > > #execute/#executePlan method, initializes the job graph and
> > > > > > > aborts execution. Then control flows back to CliFrontend, which
> > > > > > > deploys the cluster (or retrieves the client) and submits the job
> > > > > > > graph. This is quite a specific internal process inside Flink,
> > > > > > > consistent with nothing else.
> > > > > > >
> > > > > > > 2) Deployment of a job cluster couples job graph creation and
> > > > > > > cluster deployment. Abstractly, going from a user job to a
> > > > > > > concrete submission requires
> > > > > > >
> > > > > > >      create JobGraph --\
> > > > > > >
> > > > > > > create ClusterClient -->  submit JobGraph
> > > > > > >
> > > > > > > such a dependency. The ClusterClient is created by deploying or
> > > > > > > retrieving. JobGraph submission requires a compiled JobGraph and
> > > > > > > a valid ClusterClient, but the creation of the ClusterClient is
> > > > > > > abstractly independent of that of the JobGraph. However, in job
> > > > > > > cluster mode, we deploy the job cluster with a job graph, which
> > > > > > > means we use another process:
> > > > > > >
> > > > > > > create JobGraph --> deploy cluster with the JobGraph
> > > > > > >
> > > > > > > Here is another inconsistency, and downstream projects/client
> > > > > > > APIs are forced to handle different cases with little support
> > > > > > > from Flink.
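
The decoupled dependency described here can be modelled in a few lines: job graph creation and cluster client creation are independent steps, and submission joins them. All type names are illustrative, not Flink's actual classes.

```java
// Toy model of the decoupled flow: the two creation steps don't depend on
// each other, so session and job cluster submission can share one code path.
public class DecoupledSubmission {

    static final class JobGraph {
        final String name;
        JobGraph(String name) { this.name = name; }
    }

    interface ClusterClient {
        String submit(JobGraph graph);
    }

    // Stands in for compiling the user program into a job graph.
    static JobGraph createJobGraph(String program) {
        return new JobGraph(program);
    }

    // Deploying a new cluster or retrieving an existing one both yield the
    // same client type, so submission code stays identical.
    static ClusterClient deployOrRetrieveCluster() {
        return graph -> "submitted:" + graph.name;
    }

    public static void main(String[] args) {
        JobGraph graph = createJobGraph("wordcount");      // independent step 1
        ClusterClient client = deployOrRetrieveCluster();  // independent step 2
        System.out.println(client.submit(graph));          // join: submit
    }
}
```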
> > > > > > >
> > > > > > > Since we likely reached a consensus on
> > > > > > >
> > > > > > > 1. all configs being gathered into the Flink configuration and
> > > > > > > passed on, and
> > > > > > > 2. the execution environment knowing all configs and handling
> > > > > > > execution (both deployment and submission),
> > > > > > >
> > > > > > > I propose eliminating the inconsistencies above with the
> > > > > > > following approach:
> > > > > > >
> > > > > > > 1) CliFrontend should be exactly a front end, at least for the
> > > > > > > "run" command. That means it just gathers all config from the
> > > > > > > command line and passes it to the main method of the user
> > > > > > > program. The execution environment knows all the info, and with
> > > > > > > an added utility for ClusterClient we gracefully get a
> > > > > > > ClusterClient by deploying or retrieving. In this way, we don't
> > > > > > > need to hijack the #execute/#executePlan methods and can remove
> > > > > > > the various hacking subclasses of the exec env, as well as the
> > > > > > > #run methods in ClusterClient (for an interface-ized
> > > > > > > ClusterClient). Now the control flow goes from CliFrontend to
> > > > > > > the main method and never returns.
> > > > > > >
> > > > > > > 2) A job cluster means a cluster for a specific job. From
> > > > > > > another perspective, it is an ephemeral session. We may decouple
> > > > > > > the deployment from a compiled job graph, but start a session
> > > > > > > with an idle timeout and submit the job afterwards.
> > > > > > >
> > > > > > > These topics are better to be aware of and discussed for
> > > > > > > consensus before we go into more detail on design or
> > > > > > > implementation.
> > > > > > >
> > > > > > > Best,
> > > > > > > tison.
> > > > > > >
> > > > > > >
> > > > > > > Zili Chen <wa...@gmail.com> wrote on Thu, Jun 20, 2019 at 3:21 AM:
> > > > > > >
> > > > > > >> Hi Jeff,
> > > > > > >>
> > > > > > >> Thanks for raising this thread and the design document!
> > > > > > >>
> > > > > > >> As @Thomas Weise mentioned above, extending config to Flink
> > > > > > >> requires far more effort than it should. Another example is that
> > > > > > >> we achieve detached mode by introducing another execution
> > > > > > >> environment which also hijacks the #execute method.
> > > > > > >>
> > > > > > >> I agree with your idea that the user configures all things and
> > > > > > >> Flink "just" respects it. On this topic, I think the unusual
> > > > > > >> control flow when CliFrontend handles the "run" command is the
> > > > > > >> problem. It handles several configs, mainly about cluster
> > > > > > >> settings, and thus the main method of the user program is
> > > > > > >> unaware of them. Also it compiles the app to a job graph by
> > > > > > >> running the main method with a hijacked exec env, which
> > > > > > >> constrains the main method further.
> > > > > > >>
> > > > > > >> I'd like to write down a few notes on how configs/args are
> > > > > > >> passed and respected, as well as on decoupling job compilation
> > > > > > >> and submission, and share them on this thread later.
> > > > > > >>
> > > > > > >> Best,
> > > > > > >> tison.
> > > > > > >>
> > > > > > >>
> > > > > > >> SHI Xiaogang <sh...@gmail.com> wrote on Mon, Jun 17, 2019 at 7:29 PM:
> > > > > > >>
> > > > > > >>> Hi Jeff and Flavio,
> > > > > > >>>
> > > > > > >>> Thanks Jeff a lot for proposing the design document.
> > > > > > >>>
> > > > > > >>> We are also working on refactoring ClusterClient to allow
> > > > > > >>> flexible and efficient job management in our real-time
> > > > > > >>> platform. We would like to draft a document to share our ideas
> > > > > > >>> with you.
> > > > > > >>>
> > > > > > >>> I think it's a good idea to have something like Apache Livy for
> > > > > > >>> Flink, and the efforts discussed here take a great step towards
> > > > > > >>> it.
> > > > > > >>>
> > > > > > >>> Regards,
> > > > > > >>> Xiaogang
> > > > > > >>>
> > > > > > >>> Flavio Pompermaier <po...@okkam.it> wrote on Mon, Jun 17, 2019 at 7:13 PM:
> > > > > > >>>
> > > > > > >>> > Is there any possibility to have something like Apache Livy
> > > > > > >>> > [1] also for Flink in the future?
> > > > > > >>> >
> > > > > > >>> > [1] https://livy.apache.org/
> > > > > > >>> >
> > > > > > >>> > On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <
> zjffdu@gmail.com
> > >
> > > > > wrote:
> > > > > > >>> >
> > > > > > >>> > > >>>  Any API we expose should not have dependencies on
> the
> > > > > runtime
> > > > > > >>> > > (flink-runtime) package or other implementation details.
> To
> > > me,
> > > > > > this
> > > > > > >>> > means
> > > > > > >>> > > that the current ClusterClient cannot be exposed to users
> > > > because
> > > > > > it
> > > > > > >>> >  uses
> > > > > > >>> > > quite some classes from the optimiser and runtime
> packages.
> > > > > > >>> > >
> > > > > > >>> > > We should change ClusterClient from a class to an interface.
> > > > > > >>> > > ExecutionEnvironment would only use the ClusterClient
> > > > > > >>> > > interface, which should be in flink-clients, while the
> > > > > > >>> > > concrete implementation class could be in flink-runtime.
> > > > > > >>> > >
> > > > > > >>> > > >>> What happens when a failure/restart in the client
> > > happens?
> > > > > > There
> > > > > > >>> need
> > > > > > >>> > > to be a way of re-establishing the connection to the job,
> > set
> > > > up
> > > > > > the
> > > > > > >>> > > listeners again, etc.
> > > > > > >>> > >
> > > > > > >>> > > Good point. First we need to define what failure/restart
> > > > > > >>> > > in the client means. IIUC, that usually means a network
> > > > > > >>> > > failure, which will happen in class RestClient. If my
> > > > > > >>> > > understanding is correct, the restart/retry mechanism
> > > > > > >>> > > should be done in RestClient.
> > > > > > >>> > >
> > > > > > >>> > >
> > > > > > >>> > >
> > > > > > >>> > >
> > > > > > >>> > >
> > > > > > >>> > > Aljoscha Krettek <al...@apache.org> 于2019年6月11日周二
> > > > 下午11:10写道:
> > > > > > >>> > >
> > > > > > >>> > > > Some points to consider:
> > > > > > >>> > > >
> > > > > > >>> > > > * Any API we expose should not have dependencies on the
> > > > runtime
> > > > > > >>> > > > (flink-runtime) package or other implementation
> details.
> > To
> > > > me,
> > > > > > >>> this
> > > > > > >>> > > means
> > > > > > >>> > > > that the current ClusterClient cannot be exposed to
> users
> > > > > because
> > > > > > >>> it
> > > > > > >>> > >  uses
> > > > > > >>> > > > quite some classes from the optimiser and runtime
> > packages.
> > > > > > >>> > > >
> > > > > > >>> > > > * What happens when a failure/restart in the client
> > > happens?
> > > > > > There
> > > > > > >>> need
> > > > > > >>> > > to
> > > > > > >>> > > > be a way of re-establishing the connection to the job,
> > set
> > > up
> > > > > the
> > > > > > >>> > > listeners
> > > > > > >>> > > > again, etc.
> > > > > > >>> > > >
> > > > > > >>> > > > Aljoscha
> > > > > > >>> > > >
> > > > > > >>> > > > > On 29. May 2019, at 10:17, Jeff Zhang <
> > zjffdu@gmail.com>
> > > > > > wrote:
> > > > > > >>> > > > >
> > > > > > >>> > > > > Sorry folks, the design doc is late as you expected.
> > > Here's
> > > > > the
> > > > > > >>> > design
> > > > > > >>> > > > doc
> > > > > > >>> > > > > I drafted, welcome any comments and feedback.
> > > > > > >>> > > > >
> > > > > > >>> > > > >
> > > > > > >>> > > >
> > > > > > >>> > >
> > > > > > >>> >
> > > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> > > > > > >>> > > > >
> > > > > > >>> > > > >
> > > > > > >>> > > > >
> > > > > > >>> > > > > Stephan Ewen <se...@apache.org> 于2019年2月14日周四
> > 下午8:43写道:
> > > > > > >>> > > > >
> > > > > > >>> > > > >> Nice that this discussion is happening.
> > > > > > >>> > > > >>
> > > > > > >>> > > > >> In the FLIP, we could also revisit the entire role
> of
> > > the
> > > > > > >>> > environments
> > > > > > >>> > > > >> again.
> > > > > > >>> > > > >>
> > > > > > >>> > > > >> Initially, the idea was:
> > > > > > >>> > > > >>  - the environments take care of the specific setup
> > for
> > > > > > >>> standalone
> > > > > > >>> > (no
> > > > > > >>> > > > >> setup needed), yarn, mesos, etc.
> > > > > > >>> > > > >>  - the session ones have control over the session.
> The
> > > > > > >>> environment
> > > > > > >>> > > holds
> > > > > > >>> > > > >> the session client.
> > > > > > >>> > > > >>  - running a job gives a "control" object for that
> > job.
> > > > That
> > > > > > >>> > behavior
> > > > > > >>> > > is
> > > > > > >>> > > > >> the same in all environments.
> > > > > > >>> > > > >>
> > > > > > >>> > > > >> The actual implementation diverged quite a bit from
> > > that.
> > > > > > Happy
> > > > > > >>> to
> > > > > > >>> > > see a
> > > > > > >>> > > > >> discussion about straightening this out a bit more.
> > > > > > >>> > > > >>
> > > > > > >>> > > > >>
> > > > > > >>> > > > >> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <
> > > > > zjffdu@gmail.com>
> > > > > > >>> > wrote:
> > > > > > >>> > > > >>
> > > > > > >>> > > > >>> Hi folks,
> > > > > > >>> > > > >>>
> > > > > > >>> > > > >>> Sorry for the late response. It seems we have reached
> consensus
> > on
> > > > > > this, I
> > > > > > >>> > will
> > > > > > >>> > > > >> create
> > > > > > >>> > > > >>> FLIP for this with more detailed design
> > > > > > >>> > > > >>>
> > > > > > >>> > > > >>>
> > > > > > >>> > > > >>> Thomas Weise <th...@apache.org> 于2018年12月21日周五
> > > 上午11:43写道:
> > > > > > >>> > > > >>>
> > > > > > >>> > > > >>>> Great to see this discussion seeded! The problems
> > you
> > > > face
> > > > > > >>> with
> > > > > > >>> > the
> > > > > > >>> > > > >>>> Zeppelin integration are also affecting other
> > > downstream
> > > > > > >>> projects,
> > > > > > >>> > > > like
> > > > > > >>> > > > >>>> Beam.
> > > > > > >>> > > > >>>>
> > > > > > >>> > > > >>>> We just enabled the savepoint restore option in
> > > > > > >>> > > > RemoteStreamEnvironment
> > > > > > >>> > > > >>> [1]
> > > > > > >>> > > > >>>> and that was more difficult than it should be. The
> > > main
> > > > > > issue
> > > > > > >>> is
> > > > > > >>> > > that
> > > > > > >>> > > > >>>> environment and cluster client aren't decoupled.
> > > Ideally
> > > > > it
> > > > > > >>> should
> > > > > > >>> > > be
> > > > > > >>> > > > >>>> possible to just get the matching cluster client
> > from
> > > > the
> > > > > > >>> > > environment
> > > > > > >>> > > > >> and
> > > > > > >>> > > > >>>> then control the job through it (environment as
> > > factory
> > > > > for
> > > > > > >>> > cluster
> > > > > > >>> > > > >>>> client). But note that the environment classes are
> > > part
> > > > of
> > > > > > the
> > > > > > >>> > > public
> > > > > > >>> > > > >>> API,
> > > > > > >>> > > > >>>> and it is not straightforward to make larger
> changes
> > > > > without
> > > > > > >>> > > breaking
> > > > > > >>> > > > >>>> backward compatibility.
> > > > > > >>> > > > >>>>
> > > > > > >>> > > > >>>> ClusterClient currently exposes internal classes
> > like
> > > > > > >>> JobGraph and
> > > > > > >>> > > > >>>> StreamGraph. But it should be possible to wrap
> this
> > > > with a
> > > > > > new
> > > > > > >>> > > public
> > > > > > >>> > > > >> API
> > > > > > >>> > > > >>>> that brings the required job control capabilities
> > for
> > > > > > >>> downstream
> > > > > > >>> > > > >>> projects.
> > > > > > >>> > > > >>>> Perhaps it is helpful to look at some of the
> > > interfaces
> > > > in
> > > > > > >>> Beam
> > > > > > >>> > > while
> > > > > > >>> > > > >>>> thinking about this: [2] for the portable job API
> > and
> > > > [3]
> > > > > > for
> > > > > > >>> the
> > > > > > >>> > > old
> > > > > > >>> > > > >>>> asynchronous job control from the Beam Java SDK.
> > > > > > >>> > > > >>>>
> > > > > > >>> > > > >>>> The backward compatibility discussion [4] is also
> > > > relevant
> > > > > > >>> here. A
> > > > > > >>> > > new
> > > > > > >>> > > > >>> API
> > > > > > >>> > > > >>>> should shield downstream projects from internals
> and
> > > > allow
> > > > > > >>> them to
> > > > > > >>> > > > >>>> interoperate with multiple future Flink versions
> in
> > > the
> > > > > same
> > > > > > >>> > release
> > > > > > >>> > > > >> line
> > > > > > >>> > > > >>>> without forced upgrades.
> > > > > > >>> > > > >>>>
> > > > > > >>> > > > >>>> Thanks,
> > > > > > >>> > > > >>>> Thomas
> > > > > > >>> > > > >>>>
> > > > > > >>> > > > >>>> [1] https://github.com/apache/flink/pull/7249
> > > > > > >>> > > > >>>> [2]
> > > > > > >>> > > > >>>>
> > > > > > >>> > > > >>>>
> > > > > > >>> > > > >>>
> > > > > > >>> > > > >>
> > > > > > >>> > > >
> > > > > > >>> > >
> > > > > > >>> >
> > > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > > > > > >>> > > > >>>> [3]
> > > > > > >>> > > > >>>>
> > > > > > >>> > > > >>>>
> > > > > > >>> > > > >>>
> > > > > > >>> > > > >>
> > > > > > >>> > > >
> > > > > > >>> > >
> > > > > > >>> >
> > > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > > > > > >>> > > > >>>> [4]
> > > > > > >>> > > > >>>>
> > > > > > >>> > > > >>>>
> > > > > > >>> > > > >>>
> > > > > > >>> > > > >>
> > > > > > >>> > > >
> > > > > > >>> > >
> > > > > > >>> >
> > > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> > > > > > >>> > > > >>>>
> > > > > > >>> > > > >>>>
> > > > > > >>> > > > >>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <
> > > > > > zjffdu@gmail.com>
> > > > > > >>> > > wrote:
> > > > > > >>> > > > >>>>
> > > > > > >>> > > > >>>>>>>> I'm not so sure whether the user should be
> able
> > to
> > > > > > define
> > > > > > >>> > where
> > > > > > >>> > > > >> the
> > > > > > >>> > > > >>>> job
> > > > > > >>> > > > >>>>> runs (in your example Yarn). This is actually
> > > > independent
> > > > > > of
> > > > > > >>> the
> > > > > > >>> > > job
> > > > > > >>> > > > >>>>> development and is something which is decided at
> > > > > deployment
> > > > > > >>> time.
> > > > > > >>> > > > >>>>>
> > > > > > >>> > > > >>>>> Users don't need to specify the execution mode
> > > > > programmatically.
> > > > > > >>> They
> > > > > > >>> > > can
> > > > > > >>> > > > >>> also
> > > > > > >>> > > > >>>>> pass the execution mode from the arguments in
> flink
> > > run
> > > > > > >>> command.
> > > > > > >>> > > e.g.
> > > > > > >>> > > > >>>>>
> > > > > > >>> > > > >>>>> bin/flink run -m yarn-cluster ....
> > > > > > >>> > > > >>>>> bin/flink run -m local ...
> > > > > > >>> > > > >>>>> bin/flink run -m host:port ...
> > > > > > >>> > > > >>>>>
> > > > > > >>> > > > >>>>> Does this make sense to you ?
> > > > > > >>> > > > >>>>>
> > > > > > >>> > > > >>>>>>>> To me it makes sense that the
> > ExecutionEnvironment
> > > > is
> > > > > > not
> > > > > > >>> > > > >> directly
> > > > > > >>> > > > >>>>> initialized by the user and instead context
> > sensitive
> > > > how
> > > > > > you
> > > > > > >>> > want
> > > > > > >>> > > to
> > > > > > >>> > > > >>>>> execute your job (Flink CLI vs. IDE, for
> example).
> > > > > > >>> > > > >>>>>
> > > > > > >>> > > > >>>>> Right, currently I notice Flink would create
> > > different
> > > > > > >>> > > > >>>>> ContextExecutionEnvironment based on different
> > > > submission
> > > > > > >>> > scenarios
> > > > > > >>> > > > >>>> (Flink
> > > > > > >>> > > > >>>>> Cli vs IDE). To me this is kind of a hack approach,
> > not
> > > > so
> > > > > > >>> > > > >>> straightforward.
> > > > > > >>> > > > >>>>> What I suggested above is that Flink
> should
> > > > > always
> > > > > > >>> create
> > > > > > >>> > > the
> > > > > > >>> > > > >>>> same
> > > > > > >>> > > > >>>>> ExecutionEnvironment but with different
> > > configuration,
> > > > > and
> > > > > > >>> based
> > > > > > >>> > on
> > > > > > >>> > > > >> the
> > > > > > >>> > > > >>>>> configuration it would create the proper
> > > ClusterClient
> > > > > for
> > > > > > >>> > > different
> > > > > > >>> > > > >>>>> behaviors.
> > > > > > >>> > > > >>>>>
> > > > > > >>> > > > >>>>>
> > > > > > >>> > > > >>>>>
> > > > > > >>> > > > >>>>>
> > > > > > >>> > > > >>>>>
> > > > > > >>> > > > >>>>>
> > > > > > >>> > > > >>>>>
> > > > > > >>> > > > >>>>> Till Rohrmann <tr...@apache.org>
> > 于2018年12月20日周四
> > > > > > >>> 下午11:18写道:
> > > > > > >>> > > > >>>>>
> > > > > > >>> > > > >>>>>> You are probably right that we have code
> > duplication
> > > > > when
> > > > > > it
> > > > > > >>> > comes
> > > > > > >>> > > > >> to
> > > > > > >>> > > > >>>> the
> > > > > > >>> > > > >>>>>> creation of the ClusterClient. This should be
> > > reduced
> > > > in
> > > > > > the
> > > > > > >>> > > > >> future.
> > > > > > >>> > > > >>>>>>
> > > > > > >>> > > > >>>>>> I'm not so sure whether the user should be able
> to
> > > > > define
> > > > > > >>> where
> > > > > > >>> > > the
> > > > > > >>> > > > >>> job
> > > > > > >>> > > > >>>>>> runs (in your example Yarn). This is actually
> > > > > independent
> > > > > > >>> of the
> > > > > > >>> > > > >> job
> > > > > > >>> > > > >>>>>> development and is something which is decided at
> > > > > > deployment
> > > > > > >>> > time.
> > > > > > >>> > > > >> To
> > > > > > >>> > > > >>> me
> > > > > > >>> > > > >>>>> it
> > > > > > >>> > > > >>>>>> makes sense that the ExecutionEnvironment is not
> > > > > directly
> > > > > > >>> > > > >> initialized
> > > > > > >>> > > > >>>> by
> > > > > > >>> > > > >>>>>> the user and instead context sensitive how you
> > want
> > > to
> > > > > > >>> execute
> > > > > > >>> > > your
> > > > > > >>> > > > >>> job
> > > > > > >>> > > > >>>>>> (Flink CLI vs. IDE, for example). However, I
> agree
> > > > that
> > > > > > the
> > > > > > >>> > > > >>>>>> ExecutionEnvironment should give you access to
> the
> > > > > > >>> ClusterClient
> > > > > > >>> > > > >> and
> > > > > > >>> > > > >>> to
> > > > > > >>> > > > >>>>> the
> > > > > > >>> > > > >>>>>> job (maybe in the form of the JobGraph or a job
> > > plan).
> > > > > > >>> > > > >>>>>>
> > > > > > >>> > > > >>>>>> Cheers,
> > > > > > >>> > > > >>>>>> Till
> > > > > > >>> > > > >>>>>>
> > > > > > >>> > > > >>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <
> > > > > > >>> zjffdu@gmail.com>
> > > > > > >>> > > > >> wrote:
> > > > > > >>> > > > >>>>>>
> > > > > > >>> > > > >>>>>>> Hi Till,
> > > > > > >>> > > > >>>>>>> Thanks for the feedback. You are right that I
> > > expect
> > > > > > better
> > > > > > >>> > > > >>>>> programmatic
> > > > > > >>> > > > >>>>>>> job submission/control api which could be used
> by
> > > > > > >>> downstream
> > > > > > >>> > > > >>> project.
> > > > > > >>> > > > >>>>> And
> > > > > > >>> > > > >>>>>>> it would benefit for the flink ecosystem. When
> I
> > > look
> > > > > at
> > > > > > >>> the
> > > > > > >>> > code
> > > > > > >>> > > > >>> of
> > > > > > >>> > > > >>>>>> flink
> > > > > > >>> > > > >>>>>>> scala-shell and sql-client (I believe they are
> > not
> > > > the
> > > > > > >>> core of
> > > > > > >>> > > > >>> flink,
> > > > > > >>> > > > >>>>> but
> > > > > > >>> > > > >>>>>>> belong to the ecosystem of flink), I find many
> > > > > duplicated
> > > > > > >>> code
> > > > > > >>> > > > >> for
> > > > > > >>> > > > >>>>>> creating
> > > > > > >>> > > > >>>>>>> ClusterClient from user provided configuration
> > > > > > >>> (configuration
> > > > > > >>> > > > >>> format
> > > > > > >>> > > > >>>>> may
> > > > > > >>> > > > >>>>>> be
> > > > > > >>> > > > >>>>>>> different from scala-shell and sql-client) and
> > then
> > > > use
> > > > > > >>> that
> > > > > > >>> > > > >>>>>> ClusterClient
> > > > > > >>> > > > >>>>>>> to manipulate jobs. I don't think this is
> > > convenient
> > > > > for
> > > > > > >>> > > > >> downstream
> > > > > > >>> > > > >>>>>>> projects. What I expect is that downstream
> > project
> > > > only
> > > > > > >>> needs
> > > > > > >>> > to
> > > > > > >>> > > > >>>>> provide
> > > > > > >>> > > > >>>>>>> necessary configuration info (maybe introducing
> > > class
> > > > > > >>> > FlinkConf),
> > > > > > >>> > > > >>> and
> > > > > > >>> > > > >>>>>> then
> > > > > > >>> > > > >>>>>>> build ExecutionEnvironment based on this
> > FlinkConf,
> > > > and
> > > > > > >>> > > > >>>>>>> ExecutionEnvironment will create the proper
> > > > > > ClusterClient.
> > > > > > >>> It
> > > > > > >>> > not
> > > > > > >>> > > > >>>> only
> > > > > > >>> > > > >>>>>>> benefit for the downstream project development
> > but
> > > > also
> > > > > > be
> > > > > > >>> > > > >> helpful
> > > > > > >>> > > > >>>> for
> > > > > > >>> > > > >>>>>>> their integration test with flink. Here's one
> > > sample
> > > > > code
> > > > > > >>> > snippet
> > > > > > >>> > > > >>>> that
> > > > > > >>> > > > >>>>> I
> > > > > > >>> > > > >>>>>>> expect.
> > > > > > >>> > > > >>>>>>>
> > > > > > >>> > > > >>>>>>> val conf = new FlinkConf().mode("yarn")
> > > > > > >>> > > > >>>>>>> val env = new ExecutionEnvironment(conf)
> > > > > > >>> > > > >>>>>>> val jobId = env.submit(...)
> > > > > > >>> > > > >>>>>>> val jobStatus =
> > > > > > >>> env.getClusterClient().queryJobStatus(jobId)
> > > > > > >>> > > > >>>>>>> env.getClusterClient().cancelJob(jobId)
> > > > > > >>> > > > >>>>>>>
> > > > > > >>> > > > >>>>>>> What do you think ?
> > > > > > >>> > > > >>>>>>>
> > > > > > >>> > > > >>>>>>>
> > > > > > >>> > > > >>>>>>>
> > > > > > >>> > > > >>>>>>>
> > > > > > >>> > > > >>>>>>> Till Rohrmann <tr...@apache.org>
> > > 于2018年12月11日周二
> > > > > > >>> 下午6:28写道:
> > > > > > >>> > > > >>>>>>>
> > > > > > >>> > > > >>>>>>>> Hi Jeff,
> > > > > > >>> > > > >>>>>>>>
> > > > > > >>> > > > >>>>>>>> what you are proposing is to provide the user
> > with
> > > > > > better
> > > > > > >>> > > > >>>>> programmatic
> > > > > > >>> > > > >>>>>>> job
> > > > > > >>> > > > >>>>>>>> control. There was actually an effort to
> achieve
> > > > this
> > > > > > but
> > > > > > >>> it
> > > > > > >>> > > > >> has
> > > > > > >>> > > > >>>>> never
> > > > > > >>> > > > >>>>>>> been
> > > > > > >>> > > > >>>>>>>> completed [1]. However, there are some
> > improvement
> > > > in
> > > > > > the
> > > > > > >>> code
> > > > > > >>> > > > >>> base
> > > > > > >>> > > > >>>>>> now.
> > > > > > >>> > > > >>>>>>>> Look for example at the NewClusterClient
> > interface
> > > > > which
> > > > > > >>> > > > >> offers a
> > > > > > >>> > > > >>>>>>>> non-blocking job submission. But I agree that
> we
> > > > need
> > > > > to
> > > > > > >>> > > > >> improve
> > > > > > >>> > > > >>>>> Flink
> > > > > > >>> > > > >>>>>> in
> > > > > > >>> > > > >>>>>>>> this regard.
> > > > > > >>> > > > >>>>>>>>
> > > > > > >>> > > > >>>>>>>> I would not be in favour if exposing all
> > > > ClusterClient
> > > > > > >>> calls
> > > > > > >>> > > > >> via
> > > > > > >>> > > > >>>> the
> > > > > > >>> > > > >>>>>>>> ExecutionEnvironment because it would clutter
> > the
> > > > > class
> > > > > > >>> and
> > > > > > >>> > > > >> would
> > > > > > >>> > > > >>>> not
> > > > > > >>> > > > >>>>>> be
> > > > > > >>> > > > >>>>>>> a
> > > > > > >>> > > > >>>>>>>> good separation of concerns. Instead one idea
> > > could
> > > > be
> > > > > > to
> > > > > > >>> > > > >>> retrieve
> > > > > > >>> > > > >>>>> the
> > > > > > >>> > > > >>>>>>>> current ClusterClient from the
> > > ExecutionEnvironment
> > > > > > which
> > > > > > >>> can
> > > > > > >>> > > > >>> then
> > > > > > >>> > > > >>>> be
> > > > > > >>> > > > >>>>>>> used
> > > > > > >>> > > > >>>>>>>> for cluster and job control. But before we
> start
> > > an
> > > > > > effort
> > > > > > >>> > > > >> here,
> > > > > > >>> > > > >>> we
> > > > > > >>> > > > >>>>>> need
> > > > > > >>> > > > >>>>>>> to
> > > > > > >>> > > > >>>>>>>> agree and capture what functionality we want
> to
> > > > > provide.
> > > > > > >>> > > > >>>>>>>>
> > > > > > >>> > > > >>>>>>>> Initially, the idea was that we have the
> > > > > > ClusterDescriptor
> > > > > > >>> > > > >>>> describing
> > > > > > >>> > > > >>>>>> how
> > > > > > >>> > > > >>>>>>>> to talk to cluster manager like Yarn or Mesos.
> > The
> > > > > > >>> > > > >>>> ClusterDescriptor
> > > > > > >>> > > > >>>>>> can
> > > > > > >>> > > > >>>>>>> be
> > > > > > >>> > > > >>>>>>>> used for deploying Flink clusters (job and
> > > session)
> > > > > and
> > > > > > >>> gives
> > > > > > >>> > > > >>> you a
> > > > > > >>> > > > >>>>>>>> ClusterClient. The ClusterClient controls the
> > > > cluster
> > > > > > >>> (e.g.
> > > > > > >>> > > > >>>>> submitting
> > > > > > >>> > > > >>>>>>>> jobs, listing all running jobs). And then
> there
> > > was
> > > > > the
> > > > > > >>> idea
> > > > > > >>> > to
> > > > > > >>> > > > >>>>>>> introduce a
> > > > > > >>> > > > >>>>>>>> JobClient which you obtain from the
> > ClusterClient
> > > to
> > > > > > >>> trigger
> > > > > > >>> > > > >> job
> > > > > > >>> > > > >>>>>> specific
> > > > > > >>> > > > >>>>>>>> operations (e.g. taking a savepoint,
> cancelling
> > > the
> > > > > > job).
> > > > > > >>> > > > >>>>>>>>
> > > > > > >>> > > > >>>>>>>> [1]
> > > > https://issues.apache.org/jira/browse/FLINK-4272
> > > > > > >>> > > > >>>>>>>>
> > > > > > >>> > > > >>>>>>>> Cheers,
> > > > > > >>> > > > >>>>>>>> Till
> > > > > > >>> > > > >>>>>>>>
> > > > > > >>> > > > >>>>>>>> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <
> > > > > > >>> zjffdu@gmail.com
> > > > > > >>> > >
> > > > > > >>> > > > >>>>> wrote:
> > > > > > >>> > > > >>>>>>>>
> > > > > > >>> > > > >>>>>>>>> Hi Folks,
> > > > > > >>> > > > >>>>>>>>>
> > > > > > >>> > > > >>>>>>>>> I am trying to integrate flink into apache
> > > zeppelin
> > > > > > >>> which is
> > > > > > >>> > > > >> an
> > > > > > >>> > > > >>>>>>>> interactive
> > > > > > >>> > > > >>>>>>>>> notebook. And I hit several issues that is
> > caused
> > > > by
> > > > > > >>> flink
> > > > > > >>> > > > >>> client
> > > > > > >>> > > > >>>>>> api.
> > > > > > >>> > > > >>>>>>> So
> > > > > > >>> > > > >>>>>>>>> I'd like to proposal the following changes
> for
> > > > flink
> > > > > > >>> client
> > > > > > >>> > > > >>> api.
> > > > > > >>> > > > >>>>>>>>>
> > > > > > >>> > > > >>>>>>>>> 1. Support nonblocking execution. Currently,
> > > > > > >>> > > > >>>>>>> ExecutionEnvironment#execute
> > > > > > >>> > > > >>>>>>>>> is a blocking method which would do 2 things,
> > > first
> > > > > > >>> submit
> > > > > > >>> > > > >> job
> > > > > > >>> > > > >>>> and
> > > > > > >>> > > > >>>>>> then
> > > > > > >>> > > > >>>>>>>>> wait for job until it is finished. I'd like
> > > > > introduce a
> > > > > > >>> > > > >>>> nonblocking
> > > > > > >>> > > > >>>>>>>>> execution method like
> > ExecutionEnvironment#submit
> > > > > which
> > > > > > >>> only
> > > > > > >>> > > > >>>> submit
> > > > > > >>> > > > >>>>>> job
> > > > > > >>> > > > >>>>>>>> and
> > > > > > >>> > > > >>>>>>>>> then return jobId to client. And allow user
> to
> > > > query
> > > > > > the
> > > > > > >>> job
> > > > > > >>> > > > >>>> status
> > > > > > >>> > > > >>>>>> via
> > > > > > >>> > > > >>>>>>>> the
> > > > > > >>> > > > >>>>>>>>> jobId.
> > > > > > >>> > > > >>>>>>>>>
> > > > > > >>> > > > >>>>>>>>> 2. Add cancel api in
> > > > > > >>> > > > >>>>> ExecutionEnvironment/StreamExecutionEnvironment,
> > > > > > >>> > > > >>>>>>>>> currently the only way to cancel job is via
> cli
> > > > > > >>> (bin/flink),
> > > > > > >>> > > > >>> this
> > > > > > >>> > > > >>>>> is
> > > > > > >>> > > > >>>>>>> not
> > > > > > >>> > > > >>>>>>>>> convenient for downstream project to use this
> > > > > feature.
> > > > > > >>> So I'd
> > > > > > >>> > > > >>>> like
> > > > > > >>> > > > >>>>> to
> > > > > > >>> > > > >>>>>>> add
> > > > > > >>> > > > >>>>>>>>> cancel api in ExecutionEnvironment
> > > > > > >>> > > > >>>>>>>>>
> > > > > > >>> > > > >>>>>>>>> 3. Add savepoint api in
> > > > > > >>> > > > >>>>>>>
> ExecutionEnvironment/StreamExecutionEnvironment.
> > > > > > >>> > > > >>>>>>>> It
> > > > > > >>> > > > >>>>>>>>> is similar as cancel api, we should use
> > > > > > >>> ExecutionEnvironment
> > > > > > >>> > > > >> as
> > > > > > >>> > > > >>>> the
> > > > > > >>> > > > >>>>>>>> unified
> > > > > > >>> > > > >>>>>>>>> api for third party to integrate with flink.
> > > > > > >>> > > > >>>>>>>>>
> > > > > > >>> > > > >>>>>>>>> 4. Add listener for job execution lifecycle.
> > > > > Something
> > > > > > >>> like
> > > > > > >>> > > > >>>>>> following,
> > > > > > >>> > > > >>>>>>> so
> > > > > > >>> > > > >>>>>>>>> that downstream project can do custom logic
> in
> > > the
> > > > > > >>> lifecycle
> > > > > > >>> > > > >> of
> > > > > > >>> > > > >>>>> job.
> > > > > > >>> > > > >>>>>>> e.g.
> > > > > > >>> > > > >>>>>>>>> Zeppelin would capture the jobId after job is
> > > > > submitted
> > > > > > >>> and
> > > > > > >>> > > > >>> then
> > > > > > >>> > > > >>>>> use
> > > > > > >>> > > > >>>>>>> this
> > > > > > >>> > > > >>>>>>>>> jobId to cancel it later when necessary.
> > > > > > >>> > > > >>>>>>>>>
> > > > > > >>> > > > >>>>>>>>> public interface JobListener {
> > > > > > >>> > > > >>>>>>>>>
> > > > > > >>> > > > >>>>>>>>>   void onJobSubmitted(JobID jobId);
> > > > > > >>> > > > >>>>>>>>>
> > > > > > >>> > > > >>>>>>>>>   void onJobExecuted(JobExecutionResult
> > > jobResult);
> > > > > > >>> > > > >>>>>>>>>
> > > > > > >>> > > > >>>>>>>>>   void onJobCanceled(JobID jobId);
> > > > > > >>> > > > >>>>>>>>> }
> > > > > > >>> > > > >>>>>>>>>
> > > > > > >>> > > > >>>>>>>>> 5. Enable session in ExecutionEnvironment.
> > > > Currently
> > > > > it
> > > > > > >>> is
> > > > > > >>> > > > >>>>> disabled,
> > > > > > >>> > > > >>>>>>> but
> > > > > > >>> > > > >>>>>>>> session is very convenient for third parties
> > > > > > submitting
> > > > > > >>> jobs
> > > > > > >>> > > > >>>>>>>> continuously.
> > > > > > >>> > > > >>>>>>>>> I hope flink can enable it again.
> > > > > > >>> > > > >>>>>>>>>
> > > > > > >>> > > > >>>>>>>>> 6. Unify all flink client api into
> > > > > > >>> > > > >>>>>>>>>
> > ExecutionEnvironment/StreamExecutionEnvironment.
> > > > > > >>> > > > >>>>>>>>>
> > > > > > >>> > > > >>>>>>>>> This is a long term issue which needs more
> > > careful
> > > > > > >>> thinking
> > > > > > >>> > > > >> and
> > > > > > >>> > > > >>>>>> design.
> > > > > > >>> > > > >>>>>>>>> Currently some of features of flink is
> exposed
> > in
> > > > > > >>> > > > >>>>>>>>>
> > ExecutionEnvironment/StreamExecutionEnvironment,
> > > > but
> > > > > > >>> some are
> > > > > > >>> > > > >>>>> exposed
> > > > > > >>> > > > >>>>>>> in
> > > > > > >>> > > > >>>>>>>>> cli instead of api, like the cancel and
> > > savepoint I
> > > > > > >>> mentioned
> > > > > > >>> > > > >>>>> above.
> > > > > > >>> > > > >>>>>> I
> > > > > > >>> > > > >>>>>>>>> think the root cause is due to that flink
> > didn't
> > > > > unify
> > > > > > >>> the
> > > > > > >>> > > > >>>>>> interaction
> > > > > > >>> > > > >>>>>>>> with
> > > > > > >>> > > > >>>>>>>>> flink. Here I list 3 scenarios of flink
> > operation
> > > > > > >>> > > > >>>>>>>>>
> > > > > > >>> > > > >>>>>>>>>   - Local job execution.  Flink will create
> > > > > > >>> LocalEnvironment
> > > > > > >>> > > > >>> and
> > > > > > >>> > > > >>>>>> then
> > > > > > >>> > > > >>>>>>>> use
> > > > > > >>> > > > >>>>>>>>>   this LocalEnvironment to create
> LocalExecutor
> > > for
> > > > > job
> > > > > > >>> > > > >>>> execution.
> > > > > > >>> > > > >>>>>>>>>   - Remote job execution. Flink will create
> > > > > > ClusterClient
> > > > > > >>> > > > >>> first
> > > > > > >>> > > > >>>>> and
> > > > > > >>> > > > >>>>>>> then
> > > > > > >>> > > > >>>>>>>>>   create ContextEnvironment based on the
> > > > > ClusterClient
> > > > > > >>> and
> > > > > > >>> > > > >>> then
> > > > > > >>> > > > >>>>> run
> > > > > > >>> > > > >>>>>>> the
> > > > > > >>> > > > >>>>>>>>> job.
> > > > > > >>> > > > >>>>>>>>>   - Job cancelation. Flink will create
> > > > ClusterClient
> > > > > > >>> first
> > > > > > >>> > > > >> and
> > > > > > >>> > > > >>>>> then
> > > > > > >>> > > > >>>>>>>> cancel
> > > > > > >>> > > > >>>>>>>>>   this job via this ClusterClient.
> > > > > > >>> > > > >>>>>>>>>
> > > > > > >>> > > > >>>>>>>>> As you can see in the above 3 scenarios,
> Flink
> > > > didn't
> > > > > > >>> use the
> > > > > > >>> > > > >>>> same
> > > > > > >>> > > > >>>>>>>>> approach (code path) to interact with Flink.
> > > > > > >>> > > > >>>>>>>>> What I propose is the following:
> > > > > > >>> > > > >>>>>>>>> Create the proper
> > > > LocalEnvironment/RemoteEnvironment
> > > > > > >>> (based
> > > > > > >>> > > > >> on
> > > > > > >>> > > > >>>> user
> > > > > > >>> > > > >>>>>>>>> configuration) --> Use this Environment to
> > create
> > > > > > proper
> > > > > > >>> > > > >>>>>> ClusterClient
> > > > > > >>> > > > >>>>>>>>> (LocalClusterClient or RestClusterClient) to
> > > > > > interactive
> > > > > > >>> with
> > > > > > >>> > > > >>>>> Flink (
> > > > > > >>> > > > >>>>>>> job
> > > > > > >>> > > > >>>>>>>>> execution or cancelation)
> > > > > > >>> > > > >>>>>>>>>
> > > > > > >>> > > > >>>>>>>>> This way we can unify the process of local
> > > > execution
> > > > > > and
> > > > > > >>> > > > >> remote
> > > > > > >>> > > > >>>>>>>> execution.
> > > > > > >>> > > > >>>>>>>>> And it is much easier for third party to
> > > integrate
> > > > > with
> > > > > > >>> > > > >> flink,
> > > > > > >>> > > > >>>>>> because
> > > > > > >>> > > > >>>>>>>>> ExecutionEnvironment is the unified entry
> point
> > > for
> > > > > > >>> flink.
> > > > > > >>> > > > >> What
> > > > > > >>> > > > >>>>> third
> > > > > > >>> > > > >>>>>>>> party
> > > > > > >>> > > > >>>>>>>>> needs to do is just pass configuration to
> > > > > > >>> > > > >> ExecutionEnvironment
> > > > > > >>> > > > >>>> and
> > > > > > >>> > > > >>>>>>>>> ExecutionEnvironment will do the right thing
> > > based
> > > > on
> > > > > > the
> > > > > > >>> > > > >>>>>>> configuration.
> > > > > > >>> > > > >>>>>>>>> Flink cli can also be considered as a flink api
> > > > > consumer.
> > > > > > >>> it
> > > > > > >>> > > > >> just
> > > > > > >>> > > > >>>>> pass
> > > > > > >>> > > > >>>>>>> the
> > > > > > >>> > > > >>>>>>>>> configuration to ExecutionEnvironment and let
> > > > > > >>> > > > >>>> ExecutionEnvironment
> > > > > > >>> > > > >>>>> to
> > > > > > >>> > > > >>>>>>>>> create the proper ClusterClient instead of
> > > letting
> > > > > cli
> > > > > > to
> > > > > > >>> > > > >>> create
> > > > > > >>> > > > >>>>>>>>> ClusterClient directly.
> > > > > > >>> > > > >>>>>>>>>
> > > > > > >>> > > > >>>>>>>>>
> > > > > > >>> > > > >>>>>>>>> 6 would involve large code refactoring, so I
> > > think
> > > > we
> > > > > > can
> > > > > > >>> > > > >> defer
> > > > > > >>> > > > >>>> it
> > > > > > >>> > > > >>>>>> for
> > > > > > >>> > > > >>>>>>>>> future release, 1,2,3,4,5 could be done at
> > once I
> > > > > > >>> believe.
> > > > > > >>> > > > >> Let
> > > > > > >>> > > > >>> me
> > > > > > >>> > > > >>>>>> know
> > > > > > >>> > > > >>>>>>>> your
> > > > > > >>> > > > >>>>>>>>> comments and feedback, thanks
> > > > > > >>> > > > >>>>>>>>>
> > > > > > >>> > > > >>>>>>>>>
> > > > > > >>> > > > >>>>>>>>>
> > > > > > >>> > > > >>>>>>>>> --
> > > > > > >>> > > > >>>>>>>>> Best Regards
> > > > > > >>> > > > >>>>>>>>>
> > > > > > >>> > > > >>>>>>>>> Jeff Zhang
> > > > > > >>> > > > >>>>>>>>>

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Flavio Pompermaier <po...@okkam.it>.
Just one note on my side: it is not clear to me whether the client needs to
be able to generate a job graph or not.
In my opinion, the job jar should reside only on the server/JobManager side,
and the client needs a way to fetch the job graph from there.
If you really want access to the job graph, I'd add dedicated methods
on the ClusterClient, like:

   - getJobGraph(jarId, mainClass): JobGraph
   - listMainClasses(jarId): List<String>

These would also require some additions on the JobManager endpoints.
What do you think?
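To make the proposal above concrete, here is a minimal sketch of what those two methods could look like. This is an illustration only: the interface name, the stub implementation, and the string-based `JobGraph` placeholder are assumptions for the sketch, not actual Flink API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical shape of the two proposed ClusterClient additions.
interface JobGraphAwareClusterClient {
    // Ask the JobManager to compile the job graph server-side from an
    // already-uploaded jar, so the client never needs the job jar locally.
    String getJobGraph(String jarId, String mainClass);

    // Let a UI suggest entry points by listing main classes found in the jar.
    List<String> listMainClasses(String jarId);
}

// In-memory stub standing in for the REST plumbing to the JobManager.
class StubClusterClient implements JobGraphAwareClusterClient {
    public String getJobGraph(String jarId, String mainClass) {
        return "JobGraph(" + jarId + ", " + mainClass + ")";
    }
    public List<String> listMainClasses(String jarId) {
        return Arrays.asList("com.example.WordCount", "com.example.Pipeline");
    }
}

public class ClusterClientSketch {
    public static void main(String[] args) {
        JobGraphAwareClusterClient client = new StubClusterClient();
        for (String main : client.listMainClasses("jar-42")) {
            System.out.println(client.getJobGraph("jar-42", main));
        }
    }
}
```

With such methods, a UI could first call `listMainClasses` to populate a drop-down and then request the job graph for the chosen entry point, all without shipping the jar to the client.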

On Wed, Jul 31, 2019 at 12:42 PM Zili Chen <wa...@gmail.com> wrote:

> Hi all,
>
> Here is a document[1] on client api enhancement from our perspective.
> We have investigated current implementations. And we propose
>
> 1. Unify the implementation of cluster deployment and job submission in
> Flink.
> 2. Provide programmatic interfaces to allow flexible job and cluster
> management.
>
> The first proposal is aimed at reducing the code paths of cluster
> deployment and job submission so that one can adopt Flink easily. The
> second proposal is aimed at providing rich interfaces for advanced users
> who want fine-grained control over these stages.
>
> Quick reference on open questions:
>
> 1. Exclude job cluster deployment from the client side, or redefine the
> semantics of a job cluster? Its process is quite different from session
> cluster deployment and job submission.
>
> 2. Maintain the codepaths handling class o.a.f.api.common.Program or
> implement customized program handling logic by customized CliFrontend?
> See also this thread[2] and the document[1].
>
> 3. Expose ClusterClient as a public API, or only expose the API on
> ExecutionEnvironment and delegate to ClusterClient? Further, in either
> case, is it worthwhile to introduce a JobClient, an encapsulation of
> ClusterClient that is associated with a specific job?
>
> Best,
> tison.
>
> [1]
>
> https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
> [2]
>
> https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
>
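[Editor's note: open question 3 above can be sketched roughly as follows. This is a hedged illustration of the "JobClient wraps ClusterClient" idea only; the interface names, method signatures, and the stub are assumptions, not the actual Flink interfaces under discussion.]

```java
import java.util.concurrent.CompletableFuture;

// Minimal stand-in for the job-management surface of a ClusterClient.
interface SketchClusterClient {
    CompletableFuture<String> requestJobStatus(String jobId);
    CompletableFuture<Void> cancel(String jobId);
}

// A JobClient pins a ClusterClient to one job id, so callers get
// job-scoped operations and never juggle job ids themselves.
class JobClient {
    private final SketchClusterClient cluster;
    private final String jobId;

    JobClient(SketchClusterClient cluster, String jobId) {
        this.cluster = cluster;
        this.jobId = jobId;
    }

    CompletableFuture<String> status() { return cluster.requestJobStatus(jobId); }
    CompletableFuture<Void> cancel()   { return cluster.cancel(jobId); }
}

public class JobClientSketch {
    public static void main(String[] args) {
        // In-memory stub standing in for a real REST-backed client.
        SketchClusterClient stub = new SketchClusterClient() {
            public CompletableFuture<String> requestJobStatus(String jobId) {
                return CompletableFuture.completedFuture("RUNNING");
            }
            public CompletableFuture<Void> cancel(String jobId) {
                return CompletableFuture.completedFuture(null);
            }
        };
        JobClient job = new JobClient(stub, "job-1");
        System.out.println(job.status().join()); // prints RUNNING (stub value)
    }
}
```

Under this shape, ClusterClient stays cluster-scoped (deploy, list jobs) while JobClient carries the job-scoped operations (status, savepoint, cancel), which matches the separation of concerns raised earlier in the thread.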
> Jeff Zhang <zj...@gmail.com> 于2019年7月24日周三 上午9:19写道:
>
> > Thanks Stephan, I will follow up on this issue in the next few weeks and
> > will refine the design doc. We could discuss more details after the 1.9
> > release.
> >
> > Stephan Ewen <se...@apache.org> 于2019年7月24日周三 上午12:58写道:
> >
> > > Hi all!
> > >
> > > This thread has stalled for a bit, which I assume is mostly due to the
> > > Flink 1.9 feature freeze and release testing effort.
> > >
> > > I personally still recognize this issue as one important to be solved.
> > I'd
> > > be happy to help resume this discussion soon (after the 1.9 release)
> and
> > > see if we can take some steps towards this in Flink 1.10.
> > >
> > > Best,
> > > Stephan
> > >
> > >
> > >
> > > On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <
> > pompermaier@okkam.it>
> > > wrote:
> > >
> > > > That's exactly what I suggested a long time ago: the Flink REST
> > > > client should not require any Flink dependency, only an HTTP library
> > > > to call the REST services to submit and monitor a job.
> > > > What I also suggested in [1] was to have a way to automatically
> > > > suggest to the user (via a UI) the available main classes and their
> > > > required parameters [2].
> > > > Another problem we have with Flink is that the REST client and the
> > > > CLI one behave differently, and we use the CLI client (via ssh)
> > > > because it allows calling some other method after env.execute() [3]
> > > > (we have to call another REST service to signal the end of the job).
> > > > In this regard, a dedicated interface, like the JobListener suggested
> > > > in the previous emails, would be very helpful (IMHO).
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-10864
> > > > [2] https://issues.apache.org/jira/browse/FLINK-10862
> > > > [3] https://issues.apache.org/jira/browse/FLINK-10879
> > > >
> > > > Best,
> > > > Flavio
> > > >
> > > > On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zj...@gmail.com> wrote:
> > > >
> > > > > Hi, Tison,
> > > > >
> > > > > Thanks for your comments. Overall I agree with you that it is
> > > > > difficult for downstream projects to integrate with Flink and
> > > > > that we need to refactor the current Flink client API.
> > > > > And I agree that CliFrontend should only parse command line
> > > > > arguments and then pass them to ExecutionEnvironment. It is
> > > > > ExecutionEnvironment's responsibility to compile the job, create
> > > > > the cluster, and submit the job. Besides that, Flink currently
> > > > > has many ExecutionEnvironment implementations and picks the
> > > > > specific one based on the context. IMHO, that is not necessary;
> > > > > ExecutionEnvironment should be able to do the right thing based
> > > > > on the FlinkConf it receives. Having too many
> > > > > ExecutionEnvironment implementations is another burden for
> > > > > downstream project integration.
> > > > >
> > > > > One thing I'd like to mention is Flink's Scala shell and SQL
> > > > > client: although they are sub-modules of Flink, they could be
> > > > > treated as downstream projects that use Flink's client API.
> > > > > Currently it is not easy for them to integrate with Flink, and
> > > > > they share much duplicated code. That is another sign that we
> > > > > should refactor the Flink client API.
> > > > >
> > > > > I believe it is a large and hard change, and I am afraid we
> > > > > cannot keep compatibility since many of the changes are
> > > > > user-facing.
> > > > >
> > > > >
> > > > >
> > > > > Zili Chen <wa...@gmail.com> 于2019年6月24日周一 下午2:53写道:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > After a closer look at our client APIs, I can see two major
> > > > > > issues with consistency and integration: the deployment of a
> > > > > > job cluster, which couples job graph creation with cluster
> > > > > > deployment, and submission via CliFrontend, which confuses the
> > > > > > control flow of job graph compilation and job submission. I'd
> > > > > > like to follow the discussion above, mainly the process
> > > > > > described by Jeff and Stephan, and share my ideas on these
> > > > > > issues.
> > > > > >
> > > > > > 1) CliFrontend confuses the control flow of job compilation and
> > > > > > submission. Following the submission process Stephan and Jeff
> > > > > > described, the execution environment knows all configs of the
> > > > > > cluster and the topology/settings of the job. Ideally, the main
> > > > > > method of the user program calls #execute (or a method named
> > > > > > #submit), and Flink deploys the cluster, compiles the job
> > > > > > graph, and submits it to the cluster. However, the current
> > > > > > CliFrontend does all these things inside its #runProgram
> > > > > > method, which introduces a lot of subclasses of the (stream)
> > > > > > execution environment.
> > > > > >
> > > > > > Actually, it sets up an exec env that hijacks the
> > > > > > #execute/executePlan method, initializes the job graph, and
> > > > > > aborts execution. Then control flows back to CliFrontend, which
> > > > > > deploys the cluster (or retrieves the client) and submits the
> > > > > > job graph. This is quite a specific internal process inside
> > > > > > Flink, consistent with nothing else.
> > > > > >
> > > > > > 2) Deployment of a job cluster couples job graph creation and
> > > > > > cluster deployment. Abstractly, going from a user job to a
> > > > > > concrete submission requires
> > > > > >
> > > > > >      create JobGraph --\
> > > > > >
> > > > > > create ClusterClient -->  submit JobGraph
> > > > > >
> > > > > > such a dependency. A ClusterClient is created by deploying or
> > > > > > retrieving. JobGraph submission requires a compiled JobGraph
> > > > > > and a valid ClusterClient, but the creation of the
> > > > > > ClusterClient is abstractly independent of that of the
> > > > > > JobGraph. However, in job cluster mode, we deploy the job
> > > > > > cluster with a job graph, which means we use another process:
> > > > > >
> > > > > > create JobGraph --> deploy cluster with the JobGraph
> > > > > >
> > > > > > Here is another inconsistency, and downstream projects/client
> > > > > > APIs are forced to handle different cases with little support
> > > > > > from Flink.
> > > > > >
> > > > > > Since we likely reached a consensus on
> > > > > >
> > > > > > 1. all configs gathered by Flink configuration and passed
> > > > > > 2. execution environment knows all configs and handles
> > execution(both
> > > > > > deployment and submission)
> > > > > >
> > > > > > for the issues above I propose eliminating the inconsistencies
> > > > > > with the following approach:
> > > > > >
> > > > > > 1) CliFrontend should be exactly a front end, at least for the
> > > > > > "run" command. That means it just gathers all config from the
> > > > > > command line and passes it to the main method of the user
> > > > > > program. The execution environment knows all the info and, with
> > > > > > additional utils for ClusterClient, we gracefully get a
> > > > > > ClusterClient by deploying or retrieving. In this way, we don't
> > > > > > need to hijack the #execute/executePlan methods and can remove
> > > > > > the various hacky subclasses of exec env, as well as the #run
> > > > > > methods in ClusterClient (for an interface-ized ClusterClient).
> > > > > > Now the control flow goes from CliFrontend to the main method
> > > > > > and never returns.
> > > > > >
> > > > > > 2) A job cluster means a cluster for a specific job. From
> > > > > > another perspective, it is an ephemeral session. We may
> > > > > > decouple the deployment from the compiled job graph, but start
> > > > > > a session with an idle timeout and submit the job afterwards.
> > > > > >
> > > > > > These topics, before we go into more details on design or
> > > > > > implementation, are better to be aware of and discussed toward
> > > > > > a consensus.
> > > > > >
> > > > > > Best,
> > > > > > tison.
> > > > > >
> > > > > >
> > > > > > Zili Chen <wa...@gmail.com> 于2019年6月20日周四 上午3:21写道:
> > > > > >
> > > > > >> Hi Jeff,
> > > > > >>
> > > > > >> Thanks for raising this thread and the design document!
> > > > > >>
> > > > > >> As @Thomas Weise mentioned above, extending config to Flink
> > > > > >> requires far more effort than it should. Another example is
> > > > > >> that we achieve detach mode by introducing another execution
> > > > > >> environment, which also hijacks the #execute method.
> > > > > >>
> > > > > >> I agree with your idea that the user configures everything
> > > > > >> and Flink "just" respects it. On this topic I think the
> > > > > >> unusual control flow when CliFrontend handles the "run"
> > > > > >> command is the problem. It handles several configs, mainly
> > > > > >> cluster settings, and thus the main method of the user program
> > > > > >> is unaware of them. It also compiles the app to a job graph by
> > > > > >> running the main method with a hijacked exec env, which
> > > > > >> constrains the main method further.
> > > > > >>
> > > > > >> I'd like to write down a few notes on how configs/args are
> > > > > >> passed and respected, as well as on decoupling job compilation
> > > > > >> and submission, and share them on this thread later.
> > > > > >>
> > > > > >> Best,
> > > > > >> tison.
> > > > > >>
> > > > > >>
> > > > > >> SHI Xiaogang <sh...@gmail.com> 于2019年6月17日周一 下午7:29写道:
> > > > > >>
> > > > > >>> Hi Jeff and Flavio,
> > > > > >>>
> > > > > >>> Thanks Jeff a lot for proposing the design document.
> > > > > >>>
> > > > > >>> We are also working on refactoring ClusterClient to allow
> > flexible
> > > > and
> > > > > >>> efficient job management in our real-time platform.
> > > > > >>> We would like to draft a document to share our ideas with you.
> > > > > >>>
> > > > > >>> I think it's a good idea to have something like Apache Livy for
> > > > Flink,
> > > > > >>> and
> > > > > >>> the efforts discussed here will take a great step forward to
> it.
> > > > > >>>
> > > > > >>> Regards,
> > > > > >>> Xiaogang
> > > > > >>>
> > > > > >>> Flavio Pompermaier <po...@okkam.it> 于2019年6月17日周一
> > 下午7:13写道:
> > > > > >>>
> > > > > >>> > Is there any possibility to have something like Apache Livy
> [1]
> > > > also
> > > > > >>> for
> > > > > >>> > Flink in the future?
> > > > > >>> >
> > > > > >>> > [1] https://livy.apache.org/
> > > > > >>> >
> > > > > >>> > On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zjffdu@gmail.com
> >
> > > > wrote:
> > > > > >>> >
> > > > > >>> > > >>>  Any API we expose should not have dependencies on the
> > > > runtime
> > > > > >>> > > (flink-runtime) package or other implementation details. To
> > me,
> > > > > this
> > > > > >>> > means
> > > > > >>> > > that the current ClusterClient cannot be exposed to users
> > > because
> > > > > it
> > > > > >>> >  uses
> > > > > >>> > > quite some classes from the optimiser and runtime packages.
> > > > > >>> > >
> > > > > >>> > > We should change ClusterClient from class to interface.
> > > > > >>> > > ExecutionEnvironment only use the interface ClusterClient
> > which
> > > > > >>> should be
> > > > > >>> > > in flink-clients while the concrete implementation class
> > could
> > > be
> > > > > in
> > > > > >>> > > flink-runtime.
> > > > > >>> > >
> > > > > >>> > > >>> What happens when a failure/restart in the client
> > happens?
> > > > > There
> > > > > >>> need
> > > > > >>> > > to be a way of re-establishing the connection to the job,
> set
> > > up
> > > > > the
> > > > > >>> > > listeners again, etc.
> > > > > >>> > >
> > > > > >>> > > Good point. First we need to define what failure/restart
> > > > > >>> > > in the client means. IIUC, that usually means a network
> > > > > >>> > > failure, which will happen in the RestClient class. If my
> > > > > >>> > > understanding is correct, the restart/retry mechanism
> > > > > >>> > > should be done in RestClient.
> > > > > >>> > >
> > > > > >>> > >
> > > > > >>> > >
> > > > > >>> > >
> > > > > >>> > >
> > > > > >>> > > Aljoscha Krettek <al...@apache.org> 于2019年6月11日周二
> > > 下午11:10写道:
> > > > > >>> > >
> > > > > >>> > > > Some points to consider:
> > > > > >>> > > >
> > > > > >>> > > > * Any API we expose should not have dependencies on the
> > > runtime
> > > > > >>> > > > (flink-runtime) package or other implementation details.
> To
> > > me,
> > > > > >>> this
> > > > > >>> > > means
> > > > > >>> > > > that the current ClusterClient cannot be exposed to users
> > > > because
> > > > > >>> it
> > > > > >>> > >  uses
> > > > > >>> > > > quite some classes from the optimiser and runtime
> packages.
> > > > > >>> > > >
> > > > > >>> > > > * What happens when a failure/restart in the client
> > happens?
> > > > > There
> > > > > >>> need
> > > > > >>> > > to
> > > > > >>> > > > be a way of re-establishing the connection to the job,
> set
> > up
> > > > the
> > > > > >>> > > listeners
> > > > > >>> > > > again, etc.
> > > > > >>> > > >
> > > > > >>> > > > Aljoscha
> > > > > >>> > > >
> > > > > >>> > > > > On 29. May 2019, at 10:17, Jeff Zhang <
> zjffdu@gmail.com>
> > > > > wrote:
> > > > > >>> > > > >
> > > > > >>> > > > > Sorry folks, the design doc is late as you expected.
> > Here's
> > > > the
> > > > > >>> > design
> > > > > >>> > > > doc
> > > > > >>> > > > > I drafted, welcome any comments and feedback.
> > > > > >>> > > > >
> > > > > >>> > > > >
> > > > > >>> > > >
> > > > > >>> > >
> > > > > >>> >
> > > > > >>>
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> > > > > >>> > > > >
> > > > > >>> > > > >
> > > > > >>> > > > >
> > > > > >>> > > > > Stephan Ewen <se...@apache.org> 于2019年2月14日周四
> 下午8:43写道:
> > > > > >>> > > > >
> > > > > >>> > > > >> Nice that this discussion is happening.
> > > > > >>> > > > >>
> > > > > >>> > > > >> In the FLIP, we could also revisit the entire role of
> > the
> > > > > >>> > environments
> > > > > >>> > > > >> again.
> > > > > >>> > > > >>
> > > > > >>> > > > >> Initially, the idea was:
> > > > > >>> > > > >>  - the environments take care of the specific setup
> for
> > > > > >>> standalone
> > > > > >>> > (no
> > > > > >>> > > > >> setup needed), yarn, mesos, etc.
> > > > > >>> > > > >>  - the session ones have control over the session. The
> > > > > >>> environment
> > > > > >>> > > holds
> > > > > >>> > > > >> the session client.
> > > > > >>> > > > >>  - running a job gives a "control" object for that
> job.
> > > That
> > > > > >>> > behavior
> > > > > >>> > > is
> > > > > >>> > > > >> the same in all environments.
> > > > > >>> > > > >>
> > > > > >>> > > > >> The actual implementation diverged quite a bit from
> > > > > >>> > > > >> that. Happy to see a
> > > > > >>> > > > >> discussion about straightening this out a bit more.
> > > > > >>> > > > >>
> > > > > >>> > > > >>
> > > > > >>> > > > >> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <
> > > > zjffdu@gmail.com>
> > > > > >>> > wrote:
> > > > > >>> > > > >>
> > > > > >>> > > > >>> Hi folks,
> > > > > >>> > > > >>>
> > > > > >>> > > > >>> Sorry for late response, It seems we reach consensus
> on
> > > > > this, I
> > > > > >>> > will
> > > > > >>> > > > >> create
> > > > > >>> > > > >>> FLIP for this with more detailed design
> > > > > >>> > > > >>>
> > > > > >>> > > > >>>
> > > > > >>> > > > >>> Thomas Weise <th...@apache.org> 于2018年12月21日周五
> > 上午11:43写道:
> > > > > >>> > > > >>>
> > > > > >>> > > > >>>> Great to see this discussion seeded! The problems
> you
> > > face
> > > > > >>> with
> > > > > >>> > the
> > > > > >>> > > > >>>> Zeppelin integration are also affecting other
> > downstream
> > > > > >>> projects,
> > > > > >>> > > > like
> > > > > >>> > > > >>>> Beam.
> > > > > >>> > > > >>>>
> > > > > >>> > > > >>>> We just enabled the savepoint restore option in
> > > > > >>> > > > RemoteStreamEnvironment
> > > > > >>> > > > >>> [1]
> > > > > >>> > > > >>>> and that was more difficult than it should be. The
> > main
> > > > > issue
> > > > > >>> is
> > > > > >>> > > that
> > > > > >>> > > > >>>> environment and cluster client aren't decoupled.
> > Ideally
> > > > it
> > > > > >>> should
> > > > > >>> > > be
> > > > > >>> > > > >>>> possible to just get the matching cluster client
> from
> > > the
> > > > > >>> > > environment
> > > > > >>> > > > >> and
> > > > > >>> > > > >>>> then control the job through it (environment as
> > factory
> > > > for
> > > > > >>> > cluster
> > > > > >>> > > > >>>> client). But note that the environment classes are
> > part
> > > of
> > > > > the
> > > > > >>> > > public
> > > > > >>> > > > >>> API,
> > > > > >>> > > > >>>> and it is not straightforward to make larger changes
> > > > without
> > > > > >>> > > breaking
> > > > > >>> > > > >>>> backward compatibility.
> > > > > >>> > > > >>>>
> > > > > >>> > > > >>>> ClusterClient currently exposes internal classes
> like
> > > > > >>> JobGraph and
> > > > > >>> > > > >>>> StreamGraph. But it should be possible to wrap this
> > > with a
> > > > > new
> > > > > >>> > > public
> > > > > >>> > > > >> API
> > > > > >>> > > > >>>> that brings the required job control capabilities
> for
> > > > > >>> downstream
> > > > > >>> > > > >>> projects.
> > > > > >>> > > > >>>> Perhaps it is helpful to look at some of the
> > interfaces
> > > in
> > > > > >>> Beam
> > > > > >>> > > while
> > > > > >>> > > > >>>> thinking about this: [2] for the portable job API
> and
> > > [3]
> > > > > for
> > > > > >>> the
> > > > > >>> > > old
> > > > > >>> > > > >>>> asynchronous job control from the Beam Java SDK.
> > > > > >>> > > > >>>>
> > > > > >>> > > > >>>> The backward compatibility discussion [4] is also
> > > relevant
> > > > > >>> here. A
> > > > > >>> > > new
> > > > > >>> > > > >>> API
> > > > > >>> > > > >>>> should shield downstream projects from internals and
> > > allow
> > > > > >>> them to
> > > > > >>> > > > >>>> interoperate with multiple future Flink versions in
> > the
> > > > same
> > > > > >>> > release
> > > > > >>> > > > >> line
> > > > > >>> > > > >>>> without forced upgrades.
> > > > > >>> > > > >>>>
> > > > > >>> > > > >>>> Thanks,
> > > > > >>> > > > >>>> Thomas
> > > > > >>> > > > >>>>
> > > > > >>> > > > >>>> [1] https://github.com/apache/flink/pull/7249
> > > > > >>> > > > >>>> [2]
> > > > > >>> > > > >>>>
> > > > > >>> > > > >>>>
> > > > > >>> > > > >>>
> > > > > >>> > > > >>
> > > > > >>> > > >
> > > > > >>> > >
> > > > > >>> >
> > > > > >>>
> > > > >
> > > >
> > >
> >
> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > > > > >>> > > > >>>> [3]
> > > > > >>> > > > >>>>
> > > > > >>> > > > >>>>
> > > > > >>> > > > >>>
> > > > > >>> > > > >>
> > > > > >>> > > >
> > > > > >>> > >
> > > > > >>> >
> > > > > >>>
> > > > >
> > > >
> > >
> >
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > > > > >>> > > > >>>> [4]
> > > > > >>> > > > >>>>
> > > > > >>> > > > >>>>
> > > > > >>> > > > >>>
> > > > > >>> > > > >>
> > > > > >>> > > >
> > > > > >>> > >
> > > > > >>> >
> > > > > >>>
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> > > > > >>> > > > >>>>
> > > > > >>> > > > >>>>
> > > > > >>> > > > >>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <
> > > > > zjffdu@gmail.com>
> > > > > >>> > > wrote:
> > > > > >>> > > > >>>>
> > > > > >>> > > > >>>>>>>> I'm not so sure whether the user should be able
> to
> > > > > define
> > > > > >>> > where
> > > > > >>> > > > >> the
> > > > > >>> > > > >>>> job
> > > > > >>> > > > >>>>> runs (in your example Yarn). This is actually
> > > independent
> > > > > of
> > > > > >>> the
> > > > > >>> > > job
> > > > > >>> > > > >>>>> development and is something which is decided at
> > > > deployment
> > > > > >>> time.
> > > > > >>> > > > >>>>>
> > > > > >>> > > > >>>>> Users don't need to specify the execution mode
> > > > > >>> > > > >>>>> programmatically. They can also
> > > > > >>> > > > >>>>> pass the execution mode from the arguments in flink
> > run
> > > > > >>> command.
> > > > > >>> > > e.g.
> > > > > >>> > > > >>>>>
> > > > > >>> > > > >>>>> bin/flink run -m yarn-cluster ....
> > > > > >>> > > > >>>>> bin/flink run -m local ...
> > > > > >>> > > > >>>>> bin/flink run -m host:port ...
> > > > > >>> > > > >>>>>
> > > > > >>> > > > >>>>> Does this make sense to you ?
> > > > > >>> > > > >>>>>
> > > > > >>> > > > >>>>>>>> To me it makes sense that the
> ExecutionEnvironment
> > > is
> > > > > not
> > > > > >>> > > > >> directly
> > > > > >>> > > > >>>>> initialized by the user and instead context
> sensitive
> > > how
> > > > > you
> > > > > >>> > want
> > > > > >>> > > to
> > > > > >>> > > > >>>>> execute your job (Flink CLI vs. IDE, for example).
> > > > > >>> > > > >>>>>
> > > > > >>> > > > >>>>> Right, currently I notice Flink would create
> > different
> > > > > >>> > > > >>>>> ContextExecutionEnvironment based on different
> > > submission
> > > > > >>> > scenarios
> > > > > >>> > > > >>>> (Flink
> > > > > >>> > > > >>>>> Cli vs IDE). To me this is kind of hack approach,
> not
> > > so
> > > > > >>> > > > >>> straightforward.
> > > > > >>> > > > >>>>> What I suggested above is that is that flink should
> > > > always
> > > > > >>> create
> > > > > >>> > > the
> > > > > >>> > > > >>>> same
> > > > > >>> > > > >>>>> ExecutionEnvironment but with different
> > configuration,
> > > > and
> > > > > >>> based
> > > > > >>> > on
> > > > > >>> > > > >> the
> > > > > >>> > > > >>>>> configuration it would create the proper
> > ClusterClient
> > > > for
> > > > > >>> > > different
> > > > > >>> > > > >>>>> behaviors.
> > > > > >>> > > > >>>>>
> > > > > >>> > > > >>>>>
> > > > > >>> > > > >>>>>
> > > > > >>> > > > >>>>>
> > > > > >>> > > > >>>>>
> > > > > >>> > > > >>>>>
> > > > > >>> > > > >>>>>
> > > > > >>> > > > >>>>> Till Rohrmann <tr...@apache.org>
> 于2018年12月20日周四
> > > > > >>> 下午11:18写道:
> > > > > >>> > > > >>>>>
> > > > > >>> > > > >>>>>> You are probably right that we have code
> duplication
> > > > when
> > > > > it
> > > > > >>> > comes
> > > > > >>> > > > >> to
> > > > > >>> > > > >>>> the
> > > > > >>> > > > >>>>>> creation of the ClusterClient. This should be
> > reduced
> > > in
> > > > > the
> > > > > >>> > > > >> future.
> > > > > >>> > > > >>>>>>
> > > > > >>> > > > >>>>>> I'm not so sure whether the user should be able to
> > > > define
> > > > > >>> where
> > > > > >>> > > the
> > > > > >>> > > > >>> job
> > > > > >>> > > > >>>>>> runs (in your example Yarn). This is actually
> > > > independent
> > > > > >>> of the
> > > > > >>> > > > >> job
> > > > > >>> > > > >>>>>> development and is something which is decided at
> > > > > deployment
> > > > > >>> > time.
> > > > > >>> > > > >> To
> > > > > >>> > > > >>> me
> > > > > >>> > > > >>>>> it
> > > > > >>> > > > >>>>>> makes sense that the ExecutionEnvironment is not
> > > > directly
> > > > > >>> > > > >> initialized
> > > > > >>> > > > >>>> by
> > > > > >>> > > > >>>>>> the user and instead context sensitive how you
> want
> > to
> > > > > >>> execute
> > > > > >>> > > your
> > > > > >>> > > > >>> job
> > > > > >>> > > > >>>>>> (Flink CLI vs. IDE, for example). However, I agree
> > > that
> > > > > the
> > > > > >>> > > > >>>>>> ExecutionEnvironment should give you access to the
> > > > > >>> ClusterClient
> > > > > >>> > > > >> and
> > > > > >>> > > > >>> to
> > > > > >>> > > > >>>>> the
> > > > > >>> > > > >>>>>> job (maybe in the form of the JobGraph or a job
> > plan).
> > > > > >>> > > > >>>>>>
> > > > > >>> > > > >>>>>> Cheers,
> > > > > >>> > > > >>>>>> Till
> > > > > >>> > > > >>>>>>
> > > > > >>> > > > >>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <
> > > > > >>> zjffdu@gmail.com>
> > > > > >>> > > > >> wrote:
> > > > > >>> > > > >>>>>>
> > > > > >>> > > > >>>>>>> Hi Till,
> > > > > >>> > > > >>>>>>> Thanks for the feedback. You are right that I
> > expect
> > > > > better
> > > > > >>> > > > >>>>> programmatic
> > > > > >>> > > > >>>>>>> job submission/control api which could be used by
> > > > > >>> downstream
> > > > > >>> > > > >>> project.
> > > > > >>> > > > >>>>> And
> > > > > >>> > > > >>>>>>> it would benefit for the flink ecosystem. When I
> > look
> > > > at
> > > > > >>> the
> > > > > >>> > code
> > > > > >>> > > > >>> of
> > > > > >>> > > > >>>>>> flink
> > > > > >>> > > > >>>>>>> scala-shell and sql-client (I believe they are
> not
> > > the
> > > > > >>> core of
> > > > > >>> > > > >>> flink,
> > > > > >>> > > > >>>>> but
> > > > > >>> > > > >>>>>>> belong to the ecosystem of flink), I find many
> > > > duplicated
> > > > > >>> code
> > > > > >>> > > > >> for
> > > > > >>> > > > >>>>>> creating
> > > > > >>> > > > >>>>>>> ClusterClient from user provided configuration
> > > > > >>> (configuration
> > > > > >>> > > > >>> format
> > > > > >>> > > > >>>>> may
> > > > > >>> > > > >>>>>> be
> > > > > >>> > > > >>>>>>> different from scala-shell and sql-client) and
> then
> > > use
> > > > > >>> that
> > > > > >>> > > > >>>>>> ClusterClient
> > > > > >>> > > > >>>>>>> to manipulate jobs. I don't think this is
> > convenient
> > > > for
> > > > > >>> > > > >> downstream
> > > > > >>> > > > >>>>>>> projects. What I expect is that downstream
> project
> > > only
> > > > > >>> needs
> > > > > >>> > to
> > > > > >>> > > > >>>>> provide
> > > > > >>> > > > >>>>>>> necessary configuration info (maybe introducing
> > class
> > > > > >>> > FlinkConf),
> > > > > >>> > > > >>> and
> > > > > >>> > > > >>>>>> then
> > > > > >>> > > > >>>>>>> build ExecutionEnvironment based on this
> FlinkConf,
> > > and
> > > > > >>> > > > >>>>>>> ExecutionEnvironment will create the proper
> > > > > ClusterClient.
> > > > > >>> It
> > > > > >>> > not
> > > > > >>> > > > >>>> only
> > > > > >>> > > > >>>>>>> benefit for the downstream project development
> but
> > > also
> > > > > be
> > > > > >>> > > > >> helpful
> > > > > >>> > > > >>>> for
> > > > > >>> > > > >>>>>>> their integration test with flink. Here's one
> > sample
> > > > code
> > > > > >>> > snippet
> > > > > >>> > > > >>>> that
> > > > > >>> > > > >>>>> I
> > > > > >>> > > > >>>>>>> expect.
> > > > > >>> > > > >>>>>>>
> > > > > >>> > > > >>>>>>> val conf = new FlinkConf().mode("yarn")
> > > > > >>> > > > >>>>>>> val env = new ExecutionEnvironment(conf)
> > > > > >>> > > > >>>>>>> val jobId = env.submit(...)
> > > > > >>> > > > >>>>>>> val jobStatus =
> > > > > >>> env.getClusterClient().queryJobStatus(jobId)
> > > > > >>> > > > >>>>>>> env.getClusterClient().cancelJob(jobId)
> > > > > >>> > > > >>>>>>>
> > > > > >>> > > > >>>>>>> What do you think ?
> > > > > >>> > > > >>>>>>>
> > > > > >>> > > > >>>>>>>
> > > > > >>> > > > >>>>>>>
> > > > > >>> > > > >>>>>>>
> > > > > >>> > > > >>>>>>> Till Rohrmann <tr...@apache.org>
> > 于2018年12月11日周二
> > > > > >>> 下午6:28写道:
> > > > > >>> > > > >>>>>>>
> > > > > >>> > > > >>>>>>>> Hi Jeff,
> > > > > >>> > > > >>>>>>>>
> > > > > >>> > > > >>>>>>>> what you are proposing is to provide the user
> with
> > > > > better
> > > > > >>> > > > >>>>> programmatic
> > > > > >>> > > > >>>>>>> job
> > > > > >>> > > > >>>>>>>> control. There was actually an effort to achieve
> > > this
> > > > > but
> > > > > >>> it
> > > > > >>> > > > >> has
> > > > > >>> > > > >>>>> never
> > > > > >>> > > > >>>>>>> been
> > > > > >>> > > > >>>>>>>> completed [1]. However, there are some
> improvement
> > > in
> > > > > the
> > > > > >>> code
> > > > > >>> > > > >>> base
> > > > > >>> > > > >>>>>> now.
> > > > > >>> > > > >>>>>>>> Look for example at the NewClusterClient
> interface
> > > > which
> > > > > >>> > > > >> offers a
> > > > > >>> > > > >>>>>>>> non-blocking job submission. But I agree that we
> > > need
> > > > to
> > > > > >>> > > > >> improve
> > > > > >>> > > > >>>>> Flink
> > > > > >>> > > > >>>>>> in
> > > > > >>> > > > >>>>>>>> this regard.
> > > > > >>> > > > >>>>>>>>
> > > > > >>> > > > >>>>>>>> I would not be in favour if exposing all
> > > ClusterClient
> > > > > >>> calls
> > > > > >>> > > > >> via
> > > > > >>> > > > >>>> the
> > > > > >>> > > > >>>>>>>> ExecutionEnvironment because it would clutter
> the
> > > > class
> > > > > >>> and
> > > > > >>> > > > >> would
> > > > > >>> > > > >>>> not
> > > > > >>> > > > >>>>>> be
> > > > > >>> > > > >>>>>>> a
> > > > > >>> > > > >>>>>>>> good separation of concerns. Instead one idea
> > could
> > > be
> > > > > to
> > > > > >>> > > > >>> retrieve
> > > > > >>> > > > >>>>> the
> > > > > >>> > > > >>>>>>>> current ClusterClient from the
> > ExecutionEnvironment
> > > > > which
> > > > > >>> can
> > > > > >>> > > > >>> then
> > > > > >>> > > > >>>> be
> > > > > >>> > > > >>>>>>> used
> > > > > >>> > > > >>>>>>>> for cluster and job control. But before we start
> > an
> > > > > effort
> > > > > >>> > > > >> here,
> > > > > >>> > > > >>> we
> > > > > >>> > > > >>>>>> need
> > > > > >>> > > > >>>>>>> to
> > > > > >>> > > > >>>>>>>> agree and capture what functionality we want to
> > > > provide.
> > > > > >>> > > > >>>>>>>>
> > > > > >>> > > > >>>>>>>> Initially, the idea was that we have the
> > > > > ClusterDescriptor
> > > > > >>> > > > >>>> describing
> > > > > >>> > > > >>>>>> how
> > > > > >>> > > > >>>>>>>> to talk to cluster manager like Yarn or Mesos.
> The
> > > > > >>> > > > >>>> ClusterDescriptor
> > > > > >>> > > > >>>>>> can
> > > > > >>> > > > >>>>>>> be
> > > > > >>> > > > >>>>>>>> used for deploying Flink clusters (job and
> > session)
> > > > and
> > > > > >>> gives
> > > > > >>> > > > >>> you a
> > > > > >>> > > > >>>>>>>> ClusterClient. The ClusterClient controls the
> > > cluster
> > > > > >>> (e.g.
> > > > > >>> > > > >>>>> submitting
> > > > > >>> > > > >>>>>>>> jobs, listing all running jobs). And then there
> > was
> > > > the
> > > > > >>> idea
> > > > > >>> > to
> > > > > >>> > > > >>>>>>> introduce a
> > > > > >>> > > > >>>>>>>> JobClient which you obtain from the
> ClusterClient
> > to
> > > > > >>> trigger
> > > > > >>> > > > >> job
> > > > > >>> > > > >>>>>> specific
> > > > > >>> > > > >>>>>>>> operations (e.g. taking a savepoint, cancelling
> > the
> > > > > job).
> > > > > >>> > > > >>>>>>>>
> > > > > >>> > > > >>>>>>>> [1]
> > > https://issues.apache.org/jira/browse/FLINK-4272
> > > > > >>> > > > >>>>>>>>
> > > > > >>> > > > >>>>>>>> Cheers,
> > > > > >>> > > > >>>>>>>> Till
> > > > > >>> > > > >>>>>>>>
> > > > > >>> > > > >>>>>>>> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <
> > > > > >>> zjffdu@gmail.com
> > > > > >>> > >
> > > > > >>> > > > >>>>> wrote:
> > > > > >>> > > > >>>>>>>>
> > > > > >>> > > > >>>>>>>>> Hi Folks,
> > > > > >>> > > > >>>>>>>>>
> > > > > >>> > > > >>>>>>>>> I am trying to integrate flink into apache
> > zeppelin
> > > > > >>> which is
> > > > > >>> > > > >> an
> > > > > >>> > > > >>>>>>>> interactive
> > > > > >>> > > > >>>>>>>>> notebook. And I hit several issues that are
> caused
> > > by
> > > > > >>> flink
> > > > > >>> > > > >>> client
> > > > > >>> > > > >>>>>> api.
> > > > > >>> > > > >>>>>>> So
> > > > > >>> > > > >>>>>>>>> I'd like to propose the following changes for
> > > flink
> > > > > >>> client
> > > > > >>> > > > >>> api.
> > > > > >>> > > > >>>>>>>>>
> > > > > >>> > > > >>>>>>>>> 1. Support nonblocking execution. Currently,
> > > > > >>> > > > >>>>>>> ExecutionEnvironment#execute
> > > > > >>> > > > >>>>>>>>> is a blocking method which would do 2 things,
> > first
> > > > > >>> submit
> > > > > >>> > > > >> job
> > > > > >>> > > > >>>> and
> > > > > >>> > > > >>>>>> then
> > > > > >>> > > > >>>>>>>>> wait for job until it is finished. I'd like to
> > > > introduce a
> > > > > >>> > > > >>>> nonblocking
> > > > > >>> > > > >>>>>>>>> execution method like
> ExecutionEnvironment#submit
> > > > which
> > > > > >>> only
> > > > > >>> > > > >>>> submit
> > > > > >>> > > > >>>>>> job
> > > > > >>> > > > >>>>>>>> and
> > > > > >>> > > > >>>>>>>>> then return the jobId to the client, and allow the user to
> > > query
> > > > > the
> > > > > >>> job
> > > > > >>> > > > >>>> status
> > > > > >>> > > > >>>>>> via
> > > > > >>> > > > >>>>>>>> the
> > > > > >>> > > > >>>>>>>>> jobId.
> > > > > >>> > > > >>>>>>>>>
> > > > > >>> > > > >>>>>>>>> 2. Add cancel api in
> > > > > >>> > > > >>>>> ExecutionEnvironment/StreamExecutionEnvironment,
> > > > > >>> > > > >>>>>>>>> currently the only way to cancel job is via cli
> > > > > >>> (bin/flink),
> > > > > >>> > > > >>> this
> > > > > >>> > > > >>>>> is
> > > > > >>> > > > >>>>>>> not
> > > > > >>> > > > >>>>>>>>> convenient for downstream projects to use this
> > > > feature.
> > > > > >>> So I'd
> > > > > >>> > > > >>>> like
> > > > > >>> > > > >>>>> to
> > > > > >>> > > > >>>>>>> add
> > > > > >>> > > > >>>>>>>>> cancel api in ExecutionEnvironment
> > > > > >>> > > > >>>>>>>>>
> > > > > >>> > > > >>>>>>>>> 3. Add savepoint api in
> > > > > >>> > > > >>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
> > > > > >>> > > > >>>>>>>> It
> > > > > >>> > > > >>>>>>>>> is similar as cancel api, we should use
> > > > > >>> ExecutionEnvironment
> > > > > >>> > > > >> as
> > > > > >>> > > > >>>> the
> > > > > >>> > > > >>>>>>>> unified
> > > > > >>> > > > >>>>>>>>> api for third party to integrate with flink.
> > > > > >>> > > > >>>>>>>>>
> > > > > >>> > > > >>>>>>>>> 4. Add listener for job execution lifecycle.
> > > > Something
> > > > > >>> like
> > > > > >>> > > > >>>>>> following,
> > > > > >>> > > > >>>>>>> so
> > > > > >>> > > > >>>>>>>>> that downstream projects can do custom logic in
> > the
> > > > > >>> lifecycle
> > > > > >>> > > > >> of
> > > > > >>> > > > >>>>> job.
> > > > > >>> > > > >>>>>>> e.g.
> > > > > >>> > > > >>>>>>>>> Zeppelin would capture the jobId after job is
> > > > submitted
> > > > > >>> and
> > > > > >>> > > > >>> then
> > > > > >>> > > > >>>>> use
> > > > > >>> > > > >>>>>>> this
> > > > > >>> > > > >>>>>>>>> jobId to cancel it later when necessary.
> > > > > >>> > > > >>>>>>>>>
> > > > > >>> > > > >>>>>>>>> public interface JobListener {
> > > > > >>> > > > >>>>>>>>>
> > > > > >>> > > > >>>>>>>>>   void onJobSubmitted(JobID jobId);
> > > > > >>> > > > >>>>>>>>>
> > > > > >>> > > > >>>>>>>>>   void onJobExecuted(JobExecutionResult
> > jobResult);
> > > > > >>> > > > >>>>>>>>>
> > > > > >>> > > > >>>>>>>>>   void onJobCanceled(JobID jobId);
> > > > > >>> > > > >>>>>>>>> }
> > > > > >>> > > > >>>>>>>>>
> > > > > >>> > > > >>>>>>>>> 5. Enable session in ExecutionEnvironment.
> > > Currently
> > > > it
> > > > > >>> is
> > > > > >>> > > > >>>>> disabled,
> > > > > >>> > > > >>>>>>> but
> > > > > >>> > > > >>>>>>>>> session is very convenient for third parties to
> > > > > submit
> > > > > >>> jobs
> > > > > >>> > > > >>>>>>>> continually.
> > > > > >>> > > > >>>>>>>>> I hope flink can enable it again.
> > > > > >>> > > > >>>>>>>>>
> > > > > >>> > > > >>>>>>>>> 6. Unify all flink client api into
> > > > > >>> > > > >>>>>>>>>
> ExecutionEnvironment/StreamExecutionEnvironment.
> > > > > >>> > > > >>>>>>>>>
> > > > > >>> > > > >>>>>>>>> This is a long term issue which needs more
> > careful
> > > > > >>> thinking
> > > > > >>> > > > >> and
> > > > > >>> > > > >>>>>> design.
> > > > > >>> > > > >>>>>>>>> Currently some features of flink are exposed
> in
> > > > > >>> > > > >>>>>>>>>
> ExecutionEnvironment/StreamExecutionEnvironment,
> > > but
> > > > > >>> some are
> > > > > >>> > > > >>>>> exposed
> > > > > >>> > > > >>>>>>> in
> > > > > >>> > > > >>>>>>>>> cli instead of api, like the cancel and
> > savepoint I
> > > > > >>> mentioned
> > > > > >>> > > > >>>>> above.
> > > > > >>> > > > >>>>>> I
> > > > > >>> > > > >>>>>>>>> think the root cause is that flink
> didn't
> > > > unify
> > > > > >>> the
> > > > > >>> > > > >>>>>> interaction
> > > > > >>> > > > >>>>>>>> with
> > > > > >>> > > > >>>>>>>>> flink. Here I list 3 scenarios of flink
> operation
> > > > > >>> > > > >>>>>>>>>
> > > > > >>> > > > >>>>>>>>>   - Local job execution.  Flink will create
> > > > > >>> LocalEnvironment
> > > > > >>> > > > >>> and
> > > > > >>> > > > >>>>>> then
> > > > > >>> > > > >>>>>>>> use
> > > > > >>> > > > >>>>>>>>>   this LocalEnvironment to create LocalExecutor
> > for
> > > > job
> > > > > >>> > > > >>>> execution.
> > > > > >>> > > > >>>>>>>>>   - Remote job execution. Flink will create
> > > > > ClusterClient
> > > > > >>> > > > >>> first
> > > > > >>> > > > >>>>> and
> > > > > >>> > > > >>>>>>> then
> > > > > >>> > > > >>>>>>>>>   create ContextEnvironment based on the
> > > > ClusterClient
> > > > > >>> and
> > > > > >>> > > > >>> then
> > > > > >>> > > > >>>>> run
> > > > > >>> > > > >>>>>>> the
> > > > > >>> > > > >>>>>>>>> job.
> > > > > >>> > > > >>>>>>>>>   - Job cancelation. Flink will create
> > > ClusterClient
> > > > > >>> first
> > > > > >>> > > > >> and
> > > > > >>> > > > >>>>> then
> > > > > >>> > > > >>>>>>>> cancel
> > > > > >>> > > > >>>>>>>>>   this job via this ClusterClient.
> > > > > >>> > > > >>>>>>>>>
> > > > > >>> > > > >>>>>>>>> As you can see in the above 3 scenarios. Flink
> > > didn't
> > > > > >>> use the
> > > > > >>> > > > >>>> same
> > > > > >>> > > > >>>>>>>>> approach (code path) to interact with flink.
> > > > > >>> > > > >>>>>>>>> What I propose is the following:
> > > > > >>> > > > >>>>>>>>> Create the proper
> > > LocalEnvironment/RemoteEnvironment
> > > > > >>> (based
> > > > > >>> > > > >> on
> > > > > >>> > > > >>>> user
> > > > > >>> > > > >>>>>>>>> configuration) --> Use this Environment to
> create
> > > > > proper
> > > > > >>> > > > >>>>>> ClusterClient
> > > > > >>> > > > >>>>>>>>> (LocalClusterClient or RestClusterClient) to
> > > > > interact
> > > > > >>> with
> > > > > >>> > > > >>>>> Flink (
> > > > > >>> > > > >>>>>>> job
> > > > > >>> > > > >>>>>>>>> execution or cancelation)
> > > > > >>> > > > >>>>>>>>>
> > > > > >>> > > > >>>>>>>>> This way we can unify the process of local
> > > execution
> > > > > and
> > > > > >>> > > > >> remote
> > > > > >>> > > > >>>>>>>> execution.
> > > > > >>> > > > >>>>>>>>> And it is much easier for third parties to
> > integrate
> > > > with
> > > > > >>> > > > >> flink,
> > > > > >>> > > > >>>>>> because
> > > > > >>> > > > >>>>>>>>> ExecutionEnvironment is the unified entry point
> > for
> > > > > >>> flink.
> > > > > >>> > > > >> What
> > > > > >>> > > > >>>>> third
> > > > > >>> > > > >>>>>>>> party
> > > > > >>> > > > >>>>>>>>> needs to do is just pass configuration to
> > > > > >>> > > > >> ExecutionEnvironment
> > > > > >>> > > > >>>> and
> > > > > >>> > > > >>>>>>>>> ExecutionEnvironment will do the right thing
> > based
> > > on
> > > > > the
> > > > > >>> > > > >>>>>>> configuration.
> > > > > >>> > > > >>>>>>>>> Flink cli can also be considered a flink api
> > > > consumer.
> > > > > >>> it
> > > > > >>> > > > >> just
> > > > > >>> > > > >>>>> pass
> > > > > >>> > > > >>>>>>> the
> > > > > >>> > > > >>>>>>>>> configuration to ExecutionEnvironment and let
> > > > > >>> > > > >>>> ExecutionEnvironment
> > > > > >>> > > > >>>>> to
> > > > > >>> > > > >>>>>>>>> create the proper ClusterClient instead of
> > letting
> > > > cli
> > > > > to
> > > > > >>> > > > >>> create
> > > > > >>> > > > >>>>>>>>> ClusterClient directly.
> > > > > >>> > > > >>>>>>>>>
> > > > > >>> > > > >>>>>>>>>
> > > > > >>> > > > >>>>>>>>> 6 would involve large code refactoring, so I
> > think
> > > we
> > > > > can
> > > > > >>> > > > >> defer
> > > > > >>> > > > >>>> it
> > > > > >>> > > > >>>>>> for
> > > > > >>> > > > >>>>>>>>> future release, 1,2,3,4,5 could be done at
> once I
> > > > > >>> believe.
> > > > > >>> > > > >> Let
> > > > > >>> > > > >>> me
> > > > > >>> > > > >>>>>> know
> > > > > >>> > > > >>>>>>>> your
> > > > > >>> > > > >>>>>>>>> comments and feedback, thanks
> > > > > >>> > > > >>>>>>>>>
> > > > > >>> > > > >>>>>>>>>
> > > > > >>> > > > >>>>>>>>>
> > > > > >>> > > > >>>>>>>>> --
> > > > > >>> > > > >>>>>>>>> Best Regards
> > > > > >>> > > > >>>>>>>>>
> > > > > >>> > > > >>>>>>>>> Jeff Zhang
> > > > > >>> > > > >>>>>>>>>
> > > > > >>> > > > >>>>>>>>
> > > > > >>> > > > >>>>>>>
> > > > > >>> > > > >>>>>>>
> > > > > >>> > > > >>>>>>> --
> > > > > >>> > > > >>>>>>> Best Regards
> > > > > >>> > > > >>>>>>>
> > > > > >>> > > > >>>>>>> Jeff Zhang
> > > > > >>> > > > >>>>>>>
> > > > > >>> > > > >>>>>>
> > > > > >>> > > > >>>>>
> > > > > >>> > > > >>>>>
> > > > > >>> > > > >>>>> --
> > > > > >>> > > > >>>>> Best Regards
> > > > > >>> > > > >>>>>
> > > > > >>> > > > >>>>> Jeff Zhang
> > > > > >>> > > > >>>>>
> > > > > >>> > > > >>>>
> > > > > >>> > > > >>>
> > > > > >>> > > > >>>
> > > > > >>> > > > >>> --
> > > > > >>> > > > >>> Best Regards
> > > > > >>> > > > >>>
> > > > > >>> > > > >>> Jeff Zhang
> > > > > >>> > > > >>>
> > > > > >>> > > > >>
> > > > > >>> > > > >
> > > > > >>> > > > >
> > > > > >>> > > > > --
> > > > > >>> > > > > Best Regards
> > > > > >>> > > > >
> > > > > >>> > > > > Jeff Zhang
> > > > > >>> > > >
> > > > > >>> > > >
> > > > > >>> > >
> > > > > >>> > > --
> > > > > >>> > > Best Regards
> > > > > >>> > >
> > > > > >>> > > Jeff Zhang
> > > > > >>> > >
> > > > > >>> >
> > > > > >>>
> > > > > >>
> > > > >
> > > > > --
> > > > > Best Regards
> > > > >
> > > > > Jeff Zhang
> > > > >
> > > >
> > >
> >
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
> >
>

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Zili Chen <wa...@gmail.com>.
Hi all,

Here is a document [1] on client api enhancement from our perspective.
We have investigated the current implementations, and we propose:

1. Unify the implementation of cluster deployment and job submission in
Flink.
2. Provide programmatic interfaces to allow flexible job and cluster
management.

The first proposal aims to reduce the code paths of cluster deployment and
job submission so that downstream projects can adopt Flink easily. The second
proposal aims to provide rich interfaces for advanced users who want
fine-grained control of these stages.
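
To make the first proposal concrete, here is a minimal sketch of what a
single code path for deployment plus submission could look like. All type
and method names below (ClusterClient, JobClient, deployOrRetrieve,
submitJob) are assumptions for illustration only, not actual Flink API:

```java
// Sketch of one unified flow: deploy (or retrieve) a cluster, then submit.
// An in-memory stand-in replaces a real cluster so the flow is runnable.
public class Main {
    interface ClusterClient {
        JobClient submitJob(String jobGraph); // stands in for a compiled JobGraph
    }
    interface JobClient {
        String jobId();
    }
    // One entry point for local, standalone, or YARN targets; the target
    // string plays the role of the Flink configuration.
    static ClusterClient deployOrRetrieve(String target) {
        return jobGraph -> () -> "job-for-" + jobGraph;
    }
    public static void main(String[] args) {
        ClusterClient client = deployOrRetrieve("yarn-session");
        JobClient job = client.submitJob("wordcount");
        System.out.println(job.jobId());
    }
}
```

The point of the sketch is that the same two calls serve every deployment
target; only the configuration handed to deployOrRetrieve differs.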

Quick reference on open questions:

1. Exclude job cluster deployment from the client side, or redefine the
semantics of a job cluster? It follows a process quite different from session
cluster deployment and job submission.

2. Maintain the code paths handling the class o.a.f.api.common.Program, or
implement customized program handling logic via a customized CliFrontend?
See also this thread [2] and the document [1].

3. Expose ClusterClient as a public api, or expose the api only on
ExecutionEnvironment and delegate it to ClusterClient? Further, in either
case, is it worth introducing a JobClient, an encapsulation of ClusterClient
that is associated with a specific job?
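
A sketch of the delegation option in this question, under the assumption
that ClusterClient stays internal and only a per-job facade is public; all
names below are made up for illustration, not actual Flink API:

```java
// ClusterClient stays internal; the environment hands back a job-scoped
// JobClient that delegates to it.
public class Main {
    // Internal client; a real implementation would wrap the REST endpoints.
    static class ClusterClient {
        String submit(String jobGraph) { return "job-1"; }
        String triggerSavepoint(String jobId) { return "savepoint-of-" + jobId; }
        void cancel(String jobId) { }
    }
    // Job-scoped facade: the only handle users would see.
    static class JobClient {
        private final ClusterClient client;
        private final String jobId;
        JobClient(ClusterClient client, String jobId) {
            this.client = client;
            this.jobId = jobId;
        }
        String savepoint() { return client.triggerSavepoint(jobId); }
        void cancel() { client.cancel(jobId); }
    }
    // The environment exposes execute() and returns a JobClient, never the
    // ClusterClient itself.
    static JobClient execute(ClusterClient client, String jobGraph) {
        return new JobClient(client, client.submit(jobGraph));
    }
    public static void main(String[] args) {
        JobClient job = execute(new ClusterClient(), "graph");
        System.out.println(job.savepoint());
        job.cancel();
    }
}
```

This keeps the public surface small and leaves ClusterClient free to change
without breaking downstream projects.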

Best,
tison.

[1]
https://docs.google.com/document/d/1UWJE7eYWiMuZewBKS0YmdVO2LUTqXPd6-pbOCof9ddY/edit?usp=sharing
[2]
https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
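
As a postscript, the nonblocking submit and the JobListener callback
proposed earlier in this thread could fit together as sketched below; Env,
submit, and the listener wiring are assumptions, not existing Flink classes:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a nonblocking submit that returns the job id immediately and
// notifies registered listeners, so a caller like Zeppelin can capture the
// id and cancel the job later.
public class Main {
    interface JobListener {
        void onJobSubmitted(String jobId);
    }
    static class Env {
        private final List<JobListener> listeners = new ArrayList<>();
        void addJobListener(JobListener l) { listeners.add(l); }
        String submit(String jobGraph) {
            String jobId = "job-42"; // a real impl would get this from the cluster
            listeners.forEach(l -> l.onJobSubmitted(jobId));
            return jobId;
        }
    }
    public static void main(String[] args) {
        Env env = new Env();
        env.addJobListener(id -> System.out.println("submitted " + id));
        System.out.println("returned " + env.submit("graph"));
    }
}
```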

Jeff Zhang <zj...@gmail.com> 于2019年7月24日周三 上午9:19写道:

> Thanks Stephan, I will follow up on this issue in the next few weeks, and will
> refine the design doc. We can discuss more details after the 1.9 release.
>
> Stephan Ewen <se...@apache.org> 于2019年7月24日周三 上午12:58写道:
>
> > Hi all!
> >
> > This thread has stalled for a bit, which I assume is mostly due to the
> > Flink 1.9 feature freeze and release testing effort.
> >
> > I personally still recognize this issue as an important one to solve. I'd
> > be happy to help resume this discussion soon (after the 1.9 release) and
> > see if we can take a step towards this in Flink 1.10.
> >
> > Best,
> > Stephan
> >
> >
> >
> > On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <
> pompermaier@okkam.it>
> > wrote:
> >
> > > That's exactly what I suggested a long time ago: the Flink REST client
> > > should not require any Flink dependency, only an HTTP library to call the
> > > REST services to submit and monitor a job.
> > > What I also suggested in [1] was to have a way to automatically suggest
> > > to the user (via a UI) the available main classes and their required
> > > parameters [2].
> > > Another problem we have with Flink is that the REST client and the CLI
> > > one behave differently, and we use the CLI client (via ssh) because it
> > > allows calling some other method after env.execute() [3] (we have to call
> > > another REST service to signal the end of the job).
> > > In this regard, a dedicated interface, like the JobListener suggested
> > > in the previous emails, would be very helpful (IMHO).
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-10864
> > > [2] https://issues.apache.org/jira/browse/FLINK-10862
> > > [3] https://issues.apache.org/jira/browse/FLINK-10879
> > >
> > > Best,
> > > Flavio
> > >
> > > On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zj...@gmail.com> wrote:
> > >
> > > > Hi, Tison,
> > > >
> > > > Thanks for your comments. Overall I agree with you that it is difficult
> > > > for downstream projects to integrate with flink and that we need to
> > > > refactor the current flink client api.
> > > > And I agree that CliFrontend should only parse command line
> arguments
> > > and
> > > > then pass them to ExecutionEnvironment. It is ExecutionEnvironment's
> > > > responsibility to compile job, create cluster, and submit job.
> Besides
> > > > that, currently flink has many ExecutionEnvironment implementations,
> > and
> > > > flink will use the specific one based on the context. IMHO, it is not
> > > > necessary, ExecutionEnvironment should be able to do the right thing
> > > based
> > > > on the FlinkConf it receives. Too many ExecutionEnvironment
> > > > implementations are another burden for downstream project integration.
> > > >
> > > > One thing I'd like to mention is flink's scala shell and sql client,
> > > > although they are sub-modules of flink, they could be treated as
> > > downstream
> > > > projects which use flink's client api. Currently you will find it is
> not
> > > > easy for them to integrate with flink; they share much duplicated
> code.
> > > It
> > > > is another sign that we should refactor flink client api.
> > > >
> > > > I believe it is a large and hard change, and I am afraid we cannot
> > keep
> > > > compatibility since many of the changes are user-facing.
> > > >
> > > >
> > > >
> > > > Zili Chen <wa...@gmail.com> 于2019年6月24日周一 下午2:53写道:
> > > >
> > > > > Hi all,
> > > > >
> > > > > After a closer look at our client apis, I can see there are two
> major
> > > > > issues to consistency and integration, namely different deployment
> of
> > > > > job cluster which couples job graph creation and cluster
> deployment,
> > > > > and submission via CliFrontend confusing control flow of job graph
> > > > > compilation and job submission. I'd like to follow the discussion
> above,
> > > > > mainly the process described by Jeff and Stephan, and share my
> > > > > ideas on these issues.
> > > > >
> > > > > 1) CliFrontend confuses the control flow of job compilation and
> > > > submission.
> > > > > Following the process of job submission Stephan and Jeff described,
> > > > > execution environment knows all configs of the cluster and
> > > topos/settings
> > > > > of the job. Ideally, in the main method of user program, it calls
> > > > #execute
> > > > > (or named #submit) and Flink deploys the cluster, compile the job
> > graph
> > > > > and submit it to the cluster. However, current CliFrontend does all
> > > these
> > > > > things inside its #runProgram method, which introduces a lot of
> > > > subclasses
> > > > > of (stream) execution environment.
> > > > >
> > > > > Actually, it sets up an exec env that hijacks the
> > #execute/executePlan
> > > > > method, initializes the job graph and aborts execution. And then
> > > > > the control flow goes back to CliFrontend: it deploys the cluster (or
> retrieve
> > > > > the client) and submits the job graph. This is quite a specific
> > > internal
> > > > > process inside Flink, with no consistency to anything else.
> > > > >
> > > > > 2) Deployment of job cluster couples job graph creation and cluster
> > > > > deployment. Abstractly, from user job to a concrete submission, it
> > > > requires
> > > > >
> > > > >      create JobGraph --\
> > > > >
> > > > > create ClusterClient -->  submit JobGraph
> > > > >
> > > > > such a dependency. ClusterClient was created by deploying or
> > > retrieving.
> > > > > JobGraph submission requires a compiled JobGraph and valid
> > > ClusterClient,
> > > > > but the creation of ClusterClient is abstractly independent of that
> > of
> > > > > JobGraph. However, in job cluster mode, we deploy job cluster with
> a
> > > job
> > > > > graph, which means we use another process:
> > > > >
> > > > > create JobGraph --> deploy cluster with the JobGraph
> > > > >
> > > > > Here is another inconsistency and downstream projects/client apis
> are
> > > > > forced to handle different cases with little support from Flink.
> > > > >
> > > > > Since we likely reached a consensus on
> > > > >
> > > > > 1. all configs gathered by Flink configuration and passed
> > > > > 2. execution environment knows all configs and handles
> execution(both
> > > > > deployment and submission)
> > > > >
> > > > > to the issues above I propose eliminating inconsistencies by
> > following
> > > > > approach:
> > > > >
> > > > > 1) CliFrontend should exactly be a front end, at least for "run"
> > > command.
> > > > > That means it just gathers and passes all config from the command line
> > to
> > > > > the main method of user program. Execution environment knows all
> the
> > > info
> > > > > and with an addition to utils for ClusterClient, we gracefully get
> a
> > > > > ClusterClient by deploying or retrieving. In this way, we don't
> need
> > to
> > > > > hijack #execute/executePlan methods and can remove various hacking
> > > > > subclasses of exec env, as well as #run methods in
> ClusterClient(for
> > an
> > > > > interface-ized ClusterClient). Now the control flow flows from
> > > > CliFrontend
> > > > > to the main method and never returns.
> > > > >
> > > > > 2) Job cluster means a cluster for the specific job. From another
> > > > > perspective, it is an ephemeral session. We may decouple the
> > deployment
> > > > > from a compiled job graph, but start a session with an idle timeout
> > > > > and submit the job afterwards.
> > > > >
> > > > > These topics, before we go into more details on design or
> > > implementation,
> > > > > should be acknowledged and discussed to reach a consensus.
> > > > >
> > > > > Best,
> > > > > tison.
> > > > >
> > > > >
> > > > > Zili Chen <wa...@gmail.com> 于2019年6月20日周四 上午3:21写道:
> > > > >
> > > > >> Hi Jeff,
> > > > >>
> > > > >> Thanks for raising this thread and the design document!
> > > > >>
> > > > >> As @Thomas Weise mentioned above, extending config to flink
> > > > >> requires far more effort than it should. Another example
> > > > >> is that we achieve detached mode by introducing another execution
> > > > >> environment which also hijacks the #execute method.
> > > > >>
> > > > >> I agree with your idea that the user would configure all things
> > > > >> and flink would "just" respect them. On this topic I think the unusual
> > > > >> control flow when CliFrontend handles the "run" command is the problem.
> > > > >> It handles several configs, mainly about cluster settings, and
> > > > >> thus the main method of the user program is unaware of them. Also it
> > compiles
> > > > >> app to a job graph by running the main method with a hijacked exec env,
> > > > >> which constrains the main method further.
> > > > >>
> > > > >> I'd like to write down a few notes on configs/args passing and
> > > respect,
> > > > >> as well as decoupling job compilation and submission. Share on
> this
> > > > >> thread later.
> > > > >>
> > > > >> Best,
> > > > >> tison.
> > > > >>
> > > > >>
> > > > >> SHI Xiaogang <sh...@gmail.com> 于2019年6月17日周一 下午7:29写道:
> > > > >>
> > > > >>> Hi Jeff and Flavio,
> > > > >>>
> > > > >>> Thanks Jeff a lot for proposing the design document.
> > > > >>>
> > > > >>> We are also working on refactoring ClusterClient to allow
> flexible
> > > and
> > > > >>> efficient job management in our real-time platform.
> > > > >>> We would like to draft a document to share our ideas with you.
> > > > >>>
> > > > >>> I think it's a good idea to have something like Apache Livy for
> > > Flink,
> > > > >>> and
> > > > >>> the efforts discussed here take a great step towards it.
> > > > >>>
> > > > >>> Regards,
> > > > >>> Xiaogang
> > > > >>>
> > > > >>> Flavio Pompermaier <po...@okkam.it> 于2019年6月17日周一
> 下午7:13写道:
> > > > >>>
> > > > >>> > Is there any possibility to have something like Apache Livy [1]
> > > also
> > > > >>> for
> > > > >>> > Flink in the future?
> > > > >>> >
> > > > >>> > [1] https://livy.apache.org/
> > > > >>> >
> > > > >>> > On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zj...@gmail.com>
> > > wrote:
> > > > >>> >
> > > > >>> > > >>>  Any API we expose should not have dependencies on the
> > > runtime
> > > > >>> > > (flink-runtime) package or other implementation details. To
> me,
> > > > this
> > > > >>> > means
> > > > >>> > > that the current ClusterClient cannot be exposed to users
> > because
> > > > it
> > > > >>> >  uses
> > > > >>> > > quite some classes from the optimiser and runtime packages.
> > > > >>> > >
> > > > >>> > > We should change ClusterClient from a class to an interface.
> > > > >>> > > ExecutionEnvironment only use the interface ClusterClient
> which
> > > > >>> should be
> > > > >>> > > in flink-clients while the concrete implementation class
> could
> > be
> > > > in
> > > > >>> > > flink-runtime.
> > > > >>> > >
> > > > >>> > > >>> What happens when a failure/restart in the client
> happens?
> > > > There
> > > > >>> need
> > > > >>> > > to be a way of re-establishing the connection to the job, set
> > up
> > > > the
> > > > >>> > > listeners again, etc.
> > > > >>> > >
> > > > >>> > > Good point.  First we need to define what does
> failure/restart
> > in
> > > > the
> > > > >>> > > client mean. IIUC, that usually means a network failure, which
> will
> > > > >>> happen in
> > > > >>> > > class RestClient. If my understanding is correct,
> restart/retry
> > > > >>> mechanism
> > > > >>> > > should be done in RestClient.
> > > > >>> > >
> > > > >>> > >
> > > > >>> > >
> > > > >>> > >
> > > > >>> > >
> > > > >>> > > Aljoscha Krettek <al...@apache.org> 于2019年6月11日周二
> > 下午11:10写道:
> > > > >>> > >
> > > > >>> > > > Some points to consider:
> > > > >>> > > >
> > > > >>> > > > * Any API we expose should not have dependencies on the
> > runtime
> > > > >>> > > > (flink-runtime) package or other implementation details. To
> > me,
> > > > >>> this
> > > > >>> > > means
> > > > >>> > > > that the current ClusterClient cannot be exposed to users
> > > because
> > > > >>> it
> > > > >>> > >  uses
> > > > >>> > > > quite some classes from the optimiser and runtime packages.
> > > > >>> > > >
> > > > >>> > > > * What happens when a failure/restart in the client
> happens?
> > > > There
> > > > >>> need
> > > > >>> > > to
> > > > >>> > > > be a way of re-establishing the connection to the job, set
> up
> > > the
> > > > >>> > > listeners
> > > > >>> > > > again, etc.
> > > > >>> > > >
> > > > >>> > > > Aljoscha
> > > > >>> > > >
> > > > >>> > > > > On 29. May 2019, at 10:17, Jeff Zhang <zj...@gmail.com>
> > > > wrote:
> > > > >>> > > > >
> > > > >>> > > > > Sorry folks, the design doc is late as you expected.
> Here's
> > > the
> > > > >>> > design
> > > > >>> > > > doc
> > > > >>> > > > > I drafted, welcome any comments and feedback.
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > >
> > > > >>> > >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > > Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午8:43写道:
> > > > >>> > > > >
> > > > >>> > > > >> Nice that this discussion is happening.
> > > > >>> > > > >>
> > > > >>> > > > >> In the FLIP, we could also revisit the entire role of
> the
> > > > >>> > environments
> > > > >>> > > > >> again.
> > > > >>> > > > >>
> > > > >>> > > > >> Initially, the idea was:
> > > > >>> > > > >>  - the environments take care of the specific setup for
> > > > >>> standalone
> > > > >>> > (no
> > > > >>> > > > >> setup needed), yarn, mesos, etc.
> > > > >>> > > > >>  - the session ones have control over the session. The
> > > > >>> environment
> > > > >>> > > holds
> > > > >>> > > > >> the session client.
> > > > >>> > > > >>  - running a job gives a "control" object for that job.
> > That
> > > > >>> > behavior
> > > > >>> > > is
> > > > >>> > > > >> the same in all environments.
> > > > >>> > > > >>
> > > > >>> > > > >> The actual implementation diverged quite a bit from
> that.
> > > > Happy
> > > > >>> to
> > > > >>> > > see a
> > > > > >>> > > > >> discussion about straightening this out a bit more.
> > > > >>> > > > >>
> > > > >>> > > > >>
> > > > >>> > > > >> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <
> > > zjffdu@gmail.com>
> > > > >>> > wrote:
> > > > >>> > > > >>
> > > > >>> > > > >>> Hi folks,
> > > > >>> > > > >>>
> > > > >>> > > > >>> Sorry for late response, It seems we reach consensus on
> > > > this, I
> > > > >>> > will
> > > > >>> > > > >> create
> > > > >>> > > > >>> FLIP for this with more detailed design
> > > > >>> > > > >>>
> > > > >>> > > > >>>
> > > > >>> > > > >>> Thomas Weise <th...@apache.org> 于2018年12月21日周五
> 上午11:43写道:
> > > > >>> > > > >>>
> > > > >>> > > > >>>> Great to see this discussion seeded! The problems you
> > face
> > > > >>> with
> > > > >>> > the
> > > > >>> > > > >>>> Zeppelin integration are also affecting other
> downstream
> > > > >>> projects,
> > > > >>> > > > like
> > > > >>> > > > >>>> Beam.
> > > > >>> > > > >>>>
> > > > >>> > > > >>>> We just enabled the savepoint restore option in
> > > > >>> > > > RemoteStreamEnvironment
> > > > >>> > > > >>> [1]
> > > > >>> > > > >>>> and that was more difficult than it should be. The
> main
> > > > issue
> > > > >>> is
> > > > >>> > > that
> > > > >>> > > > >>>> environment and cluster client aren't decoupled.
> Ideally
> > > it
> > > > >>> should
> > > > >>> > > be
> > > > >>> > > > >>>> possible to just get the matching cluster client from
> > the
> > > > >>> > > environment
> > > > >>> > > > >> and
> > > > >>> > > > >>>> then control the job through it (environment as
> factory
> > > for
> > > > >>> > cluster
> > > > >>> > > > >>>> client). But note that the environment classes are
> part
> > of
> > > > the
> > > > >>> > > public
> > > > >>> > > > >>> API,
> > > > >>> > > > >>>> and it is not straightforward to make larger changes
> > > without
> > > > >>> > > breaking
> > > > >>> > > > >>>> backward compatibility.
> > > > >>> > > > >>>>
> > > > >>> > > > >>>> ClusterClient currently exposes internal classes like
> > > > >>> JobGraph and
> > > > >>> > > > >>>> StreamGraph. But it should be possible to wrap this
> > with a
> > > > new
> > > > >>> > > public
> > > > >>> > > > >> API
> > > > >>> > > > >>>> that brings the required job control capabilities for
> > > > >>> downstream
> > > > >>> > > > >>> projects.
> > > > >>> > > > >>>> Perhaps it is helpful to look at some of the
> interfaces
> > in
> > > > >>> Beam
> > > > >>> > > while
> > > > >>> > > > >>>> thinking about this: [2] for the portable job API and
> > [3]
> > > > for
> > > > >>> the
> > > > >>> > > old
> > > > >>> > > > >>>> asynchronous job control from the Beam Java SDK.
> > > > >>> > > > >>>>
> > > > >>> > > > >>>> The backward compatibility discussion [4] is also
> > relevant
> > > > >>> here. A
> > > > >>> > > new
> > > > >>> > > > >>> API
> > > > >>> > > > >>>> should shield downstream projects from internals and
> > allow
> > > > >>> them to
> > > > >>> > > > >>>> interoperate with multiple future Flink versions in
> the
> > > same
> > > > >>> > release
> > > > >>> > > > >> line
> > > > >>> > > > >>>> without forced upgrades.
> > > > >>> > > > >>>>
> > > > >>> > > > >>>> Thanks,
> > > > >>> > > > >>>> Thomas
> > > > >>> > > > >>>>
> > > > >>> > > > >>>> [1] https://github.com/apache/flink/pull/7249
> > > > >>> > > > >>>> [2]
> > > > >>> > > > >>>>
> > > > >>> > > > >>>>
> > > > >>> > > > >>>
> > > > >>> > > > >>
> > > > >>> > > >
> > > > >>> > >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > > > >>> > > > >>>> [3]
> > > > >>> > > > >>>>
> > > > >>> > > > >>>>
> > > > >>> > > > >>>
> > > > >>> > > > >>
> > > > >>> > > >
> > > > >>> > >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > > > >>> > > > >>>> [4]
> > > > >>> > > > >>>>
> > > > >>> > > > >>>>
> > > > >>> > > > >>>
> > > > >>> > > > >>
> > > > >>> > > >
> > > > >>> > >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> > > > >>> > > > >>>>
> > > > >>> > > > >>>>
> > > > >>> > > > >>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <
> > > > zjffdu@gmail.com>
> > > > >>> > > wrote:
> > > > >>> > > > >>>>
> > > > >>> > > > >>>>>>>> I'm not so sure whether the user should be able to
> > > > define
> > > > >>> > where
> > > > >>> > > > >> the
> > > > >>> > > > >>>> job
> > > > >>> > > > >>>>> runs (in your example Yarn). This is actually
> > independent
> > > > of
> > > > >>> the
> > > > >>> > > job
> > > > >>> > > > >>>>> development and is something which is decided at
> > > deployment
> > > > >>> time.
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>> Users don't need to specify the execution mode
> > > programmatically.
> > > > >>> They
> > > > >>> > > can
> > > > >>> > > > >>> also
> > > > >>> > > > >>>>> pass the execution mode via the arguments of the flink
> run
> > > > >>> command.
> > > > >>> > > e.g.
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>> bin/flink run -m yarn-cluster ....
> > > > >>> > > > >>>>> bin/flink run -m local ...
> > > > >>> > > > >>>>> bin/flink run -m host:port ...
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>> Does this make sense to you ?
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>>>>> To me it makes sense that the ExecutionEnvironment
> > is
> > > > not
> > > > >>> > > > >> directly
> > > > >>> > > > >>>>> initialized by the user and instead context sensitive
> > how
> > > > you
> > > > >>> > want
> > > > >>> > > to
> > > > >>> > > > >>>>> execute your job (Flink CLI vs. IDE, for example).
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>> Right, currently I notice Flink would create
> different
> > > > >>> > > > >>>>> ContextExecutionEnvironment based on different
> > submission
> > > > >>> > scenarios
> > > > >>> > > > >>>> (Flink
> > > > >>> > > > >>>>> Cli vs IDE). To me this is kind of a hack approach, not
> > so
> > > > >>> > > > >>> straightforward.
> > > > >>> > > > >>>>> What I suggested above is that flink should
> > > always
> > > > >>> create
> > > > >>> > > the
> > > > >>> > > > >>>> same
> > > > >>> > > > >>>>> ExecutionEnvironment but with different
> configuration,
> > > and
> > > > >>> based
> > > > >>> > on
> > > > >>> > > > >> the
> > > > >>> > > > >>>>> configuration it would create the proper
> ClusterClient
> > > for
> > > > >>> > > different
> > > > >>> > > > >>>>> behaviors.
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>> Till Rohrmann <tr...@apache.org> wrote on Thu, Dec 20, 2018 at 11:18 PM:
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>>> You are probably right that we have code duplication
> > > when
> > > > it
> > > > >>> > comes
> > > > >>> > > > >> to
> > > > >>> > > > >>>> the
> > > > >>> > > > >>>>>> creation of the ClusterClient. This should be
> reduced
> > in
> > > > the
> > > > >>> > > > >> future.
> > > > >>> > > > >>>>>>
> > > > >>> > > > >>>>>> I'm not so sure whether the user should be able to
> > > define
> > > > >>> where
> > > > >>> > > the
> > > > >>> > > > >>> job
> > > > >>> > > > >>>>>> runs (in your example Yarn). This is actually
> > > independent
> > > > >>> of the
> > > > >>> > > > >> job
> > > > >>> > > > >>>>>> development and is something which is decided at
> > > > deployment
> > > > >>> > time.
> > > > >>> > > > >> To
> > > > >>> > > > >>> me
> > > > >>> > > > >>>>> it
> > > > >>> > > > >>>>>> makes sense that the ExecutionEnvironment is not
> > > directly
> > > > >>> > > > >> initialized
> > > > >>> > > > >>>> by
> > > > >>> > > > >>>>>> the user and instead context sensitive how you want
> to
> > > > >>> execute
> > > > >>> > > your
> > > > >>> > > > >>> job
> > > > >>> > > > >>>>>> (Flink CLI vs. IDE, for example). However, I agree
> > that
> > > > the
> > > > >>> > > > >>>>>> ExecutionEnvironment should give you access to the
> > > > >>> ClusterClient
> > > > >>> > > > >> and
> > > > >>> > > > >>> to
> > > > >>> > > > >>>>> the
> > > > >>> > > > >>>>>> job (maybe in the form of the JobGraph or a job
> plan).
> > > > >>> > > > >>>>>>
> > > > >>> > > > >>>>>> Cheers,
> > > > >>> > > > >>>>>> Till
> > > > >>> > > > >>>>>>
> > > > >>> > > > >>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <
> > > > >>> zjffdu@gmail.com>
> > > > >>> > > > >> wrote:
> > > > >>> > > > >>>>>>
> > > > >>> > > > >>>>>>> Hi Till,
> > > > >>> > > > >>>>>>> Thanks for the feedback. You are right that I
> expect
> > > > better
> > > > >>> > > > >>>>> programmatic
> > > > >>> > > > >>>>>>> job submission/control api which could be used by
> > > > >>> downstream
> > > > >>> > > > >>> project.
> > > > >>> > > > >>>>> And
> > > > >>> > > > >>>>>>> it would benefit for the flink ecosystem. When I
> look
> > > at
> > > > >>> the
> > > > >>> > code
> > > > >>> > > > >>> of
> > > > >>> > > > >>>>>> flink
> > > > >>> > > > >>>>>>> scala-shell and sql-client (I believe they are not
> > the
> > > > >>> core of
> > > > >>> > > > >>> flink,
> > > > >>> > > > >>>>> but
> > > > >>> > > > >>>>>>> belong to the ecosystem of flink), I find many
> > > duplicated
> > > > >>> code
> > > > >>> > > > >> for
> > > > >>> > > > >>>>>> creating
> > > > >>> > > > >>>>>>> ClusterClient from user provided configuration
> > > > >>> (configuration
> > > > >>> > > > >>> format
> > > > >>> > > > >>>>> may
> > > > >>> > > > >>>>>> be
> > > > >>> > > > >>>>>>> different from scala-shell and sql-client) and then
> > use
> > > > >>> that
> > > > >>> > > > >>>>>> ClusterClient
> > > > >>> > > > >>>>>>> to manipulate jobs. I don't think this is
> convenient
> > > for
> > > > >>> > > > >> downstream
> > > > >>> > > > >>>>>>> projects. What I expect is that downstream project
> > only
> > > > >>> needs
> > > > >>> > to
> > > > >>> > > > >>>>> provide
> > > > >>> > > > >>>>>>> necessary configuration info (maybe introducing
> class
> > > > >>> > FlinkConf),
> > > > >>> > > > >>> and
> > > > >>> > > > >>>>>> then
> > > > >>> > > > >>>>>>> build ExecutionEnvironment based on this FlinkConf,
> > and
> > > > >>> > > > >>>>>>> ExecutionEnvironment will create the proper
> > > > ClusterClient.
> > > > >>> It
> > > > >>> > not
> > > > >>> > > > >>>> only
> > > > >>> > > > >>>>>>> benefit for the downstream project development but
> > also
> > > > be
> > > > >>> > > > >> helpful
> > > > >>> > > > >>>> for
> > > > >>> > > > >>>>>>> their integration test with flink. Here's one
> sample
> > > code
> > > > >>> > snippet
> > > > >>> > > > >>>> that
> > > > >>> > > > >>>>> I
> > > > >>> > > > >>>>>>> expect.
> > > > >>> > > > >>>>>>>
> > > > >>> > > > >>>>>>> val conf = new FlinkConf().mode("yarn")
> > > > >>> > > > >>>>>>> val env = new ExecutionEnvironment(conf)
> > > > >>> > > > >>>>>>> val jobId = env.submit(...)
> > > > >>> > > > >>>>>>> val jobStatus =
> > > > >>> env.getClusterClient().queryJobStatus(jobId)
> > > > >>> > > > >>>>>>> env.getClusterClient().cancelJob(jobId)
> > > > >>> > > > >>>>>>>
> > > > >>> > > > >>>>>>> What do you think ?
> > > > >>> > > > >>>>>>>
> > > > >>> > > > >>>>>>>
> > > > >>> > > > >>>>>>>
> > > > >>> > > > >>>>>>>
> > > > >>> > > > >>>>>>> Till Rohrmann <tr...@apache.org> wrote on Tue, Dec 11, 2018 at 6:28 PM:
> > > > >>> > > > >>>>>>>
> > > > >>> > > > >>>>>>>> Hi Jeff,
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>> what you are proposing is to provide the user with
> > > > better
> > > > >>> > > > >>>>> programmatic
> > > > >>> > > > >>>>>>> job
> > > > >>> > > > >>>>>>>> control. There was actually an effort to achieve
> > this
> > > > but
> > > > >>> it
> > > > >>> > > > >> has
> > > > >>> > > > >>>>> never
> > > > >>> > > > >>>>>>> been
> > > > >>> > > > >>>>>>>> completed [1]. However, there are some improvement
> > in
> > > > the
> > > > >>> code
> > > > >>> > > > >>> base
> > > > >>> > > > >>>>>> now.
> > > > >>> > > > >>>>>>>> Look for example at the NewClusterClient interface
> > > which
> > > > >>> > > > >> offers a
> > > > >>> > > > >>>>>>>> non-blocking job submission. But I agree that we
> > need
> > > to
> > > > >>> > > > >> improve
> > > > >>> > > > >>>>> Flink
> > > > >>> > > > >>>>>> in
> > > > >>> > > > >>>>>>>> this regard.
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>> I would not be in favour if exposing all
> > ClusterClient
> > > > >>> calls
> > > > >>> > > > >> via
> > > > >>> > > > >>>> the
> > > > >>> > > > >>>>>>>> ExecutionEnvironment because it would clutter the
> > > class
> > > > >>> and
> > > > >>> > > > >> would
> > > > >>> > > > >>>> not
> > > > >>> > > > >>>>>> be
> > > > >>> > > > >>>>>>> a
> > > > >>> > > > >>>>>>>> good separation of concerns. Instead one idea
> could
> > be
> > > > to
> > > > >>> > > > >>> retrieve
> > > > >>> > > > >>>>> the
> > > > >>> > > > >>>>>>>> current ClusterClient from the
> ExecutionEnvironment
> > > > which
> > > > >>> can
> > > > >>> > > > >>> then
> > > > >>> > > > >>>> be
> > > > >>> > > > >>>>>>> used
> > > > >>> > > > >>>>>>>> for cluster and job control. But before we start
> an
> > > > effort
> > > > >>> > > > >> here,
> > > > >>> > > > >>> we
> > > > >>> > > > >>>>>> need
> > > > >>> > > > >>>>>>> to
> > > > >>> > > > >>>>>>>> agree and capture what functionality we want to
> > > provide.
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>> Initially, the idea was that we have the
> > > > ClusterDescriptor
> > > > >>> > > > >>>> describing
> > > > >>> > > > >>>>>> how
> > > > >>> > > > >>>>>>>> to talk to cluster manager like Yarn or Mesos. The
> > > > >>> > > > >>>> ClusterDescriptor
> > > > >>> > > > >>>>>> can
> > > > >>> > > > >>>>>>> be
> > > > >>> > > > >>>>>>>> used for deploying Flink clusters (job and
> session)
> > > and
> > > > >>> gives
> > > > >>> > > > >>> you a
> > > > >>> > > > >>>>>>>> ClusterClient. The ClusterClient controls the
> > cluster
> > > > >>> (e.g.
> > > > >>> > > > >>>>> submitting
> > > > >>> > > > >>>>>>>> jobs, listing all running jobs). And then there
> was
> > > the
> > > > >>> idea
> > > > >>> > to
> > > > >>> > > > >>>>>>> introduce a
> > > > >>> > > > >>>>>>>> JobClient which you obtain from the ClusterClient
> to
> > > > >>> trigger
> > > > >>> > > > >> job
> > > > >>> > > > >>>>>> specific
> > > > >>> > > > >>>>>>>> operations (e.g. taking a savepoint, cancelling
> the
> > > > job).
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>> [1]
> > https://issues.apache.org/jira/browse/FLINK-4272
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>> Cheers,
> > > > >>> > > > >>>>>>>> Till
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <
> > > > >>> zjffdu@gmail.com
> > > > >>> > >
> > > > >>> > > > >>>>> wrote:
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>>> Hi Folks,
> > > > >>> > > > >>>>>>>>>
> > > > >>> > > > >>>>>>>>> I am trying to integrate flink into apache
> zeppelin
> > > > >>> which is
> > > > >>> > > > >> an
> > > > >>> > > > >>>>>>>> interactive
> > > > >>> > > > >>>>>>>>> notebook. And I hit several issues that is caused
> > by
> > > > >>> flink
> > > > >>> > > > >>> client
> > > > >>> > > > >>>>>> api.
> > > > >>> > > > >>>>>>> So
> > > > >>> > > > >>>>>>>>> I'd like to proposal the following changes for
> > flink
> > > > >>> client
> > > > >>> > > > >>> api.
> > > > >>> > > > >>>>>>>>>
> > > > >>> > > > >>>>>>>>> 1. Support nonblocking execution. Currently,
> > > > >>> > > > >>>>>>> ExecutionEnvironment#execute
> > > > >>> > > > >>>>>>>>> is a blocking method which would do 2 things,
> first
> > > > >>> submit
> > > > >>> > > > >> job
> > > > >>> > > > >>>> and
> > > > >>> > > > >>>>>> then
> > > > >>> > > > >>>>>>>>> wait for the job until it is finished. I'd like to
> > > introduce a
> > > > >>> > > > >>>> nonblocking
> > > > >>> > > > >>>>>>>>> execution method like ExecutionEnvironment#submit
> > > which
> > > > >>> only
> > > > >>> > > > >>>> submit
> > > > >>> > > > >>>>>> job
> > > > >>> > > > >>>>>>>> and
> > > > >>> > > > >>>>>>>>> then return jobId to client. And allow user to
> > query
> > > > the
> > > > >>> job
> > > > >>> > > > >>>> status
> > > > >>> > > > >>>>>> via
> > > > >>> > > > >>>>>>>> the
> > > > >>> > > > >>>>>>>>> jobId.
> > > > >>> > > > >>>>>>>>>
> > > > >>> > > > >>>>>>>>> 2. Add cancel api in
> > > > >>> > > > >>>>> ExecutionEnvironment/StreamExecutionEnvironment,
> > > > >>> > > > >>>>>>>>> currently the only way to cancel job is via cli
> > > > >>> (bin/flink),
> > > > >>> > > > >>> this
> > > > >>> > > > >>>>> is
> > > > >>> > > > >>>>>>> not
> > > > >>> > > > >>>>>>>>> convenient for downstream project to use this
> > > feature.
> > > > >>> So I'd
> > > > >>> > > > >>>> like
> > > > >>> > > > >>>>> to
> > > > >>> > > > >>>>>>> add
> > > > >>> > > > >>>>>>>>> cancel api in ExecutionEnvironment
> > > > >>> > > > >>>>>>>>>
> > > > >>> > > > >>>>>>>>> 3. Add savepoint api in
> > > > >>> > > > >>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
> > > > >>> > > > >>>>>>>> It
> > > > >>> > > > >>>>>>>>> is similar as cancel api, we should use
> > > > >>> ExecutionEnvironment
> > > > >>> > > > >> as
> > > > >>> > > > >>>> the
> > > > >>> > > > >>>>>>>> unified
> > > > >>> > > > >>>>>>>>> api for third party to integrate with flink.
> > > > >>> > > > >>>>>>>>>
> > > > >>> > > > >>>>>>>>> 4. Add listener for job execution lifecycle.
> > > Something
> > > > >>> like
> > > > >>> > > > >>>>>> following,
> > > > >>> > > > >>>>>>> so
> > > > >>> > > > >>>>>>>>> that downstream project can do custom logic in
> the
> > > > >>> lifecycle
> > > > >>> > > > >> of
> > > > >>> > > > >>>>> job.
> > > > >>> > > > >>>>>>> e.g.
> > > > >>> > > > >>>>>>>>> Zeppelin would capture the jobId after job is
> > > submitted
> > > > >>> and
> > > > >>> > > > >>> then
> > > > >>> > > > >>>>> use
> > > > >>> > > > >>>>>>> this
> > > > >>> > > > >>>>>>>>> jobId to cancel it later when necessary.
> > > > >>> > > > >>>>>>>>>
> > > > >>> > > > >>>>>>>>> public interface JobListener {
> > > > >>> > > > >>>>>>>>>
> > > > >>> > > > >>>>>>>>>   void onJobSubmitted(JobID jobId);
> > > > >>> > > > >>>>>>>>>
> > > > >>> > > > >>>>>>>>>   void onJobExecuted(JobExecutionResult
> jobResult);
> > > > >>> > > > >>>>>>>>>
> > > > >>> > > > >>>>>>>>>   void onJobCanceled(JobID jobId);
> > > > >>> > > > >>>>>>>>> }
> > > > >>> > > > >>>>>>>>>
> > > > >>> > > > >>>>>>>>> 5. Enable session in ExecutionEnvironment.
> > Currently
> > > it
> > > > >>> is
> > > > >>> > > > >>>>> disabled,
> > > > >>> > > > >>>>>>> but
> > > > >>> > > > >>>>>>>>> session is very convenient for third parties when
> > > > submitting
> > > > >>> jobs
> > > > >>> > > > >>>>>>>> continually.
> > > > >>> > > > >>>>>>>>> I hope flink can enable it again.
> > > > >>> > > > >>>>>>>>>
> > > > >>> > > > >>>>>>>>> 6. Unify all flink client api into
> > > > >>> > > > >>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
> > > > >>> > > > >>>>>>>>>
> > > > >>> > > > >>>>>>>>> This is a long term issue which needs more
> careful
> > > > >>> thinking
> > > > >>> > > > >> and
> > > > >>> > > > >>>>>> design.
> > > > >>> > > > >>>>>>>>> Currently some of features of flink is exposed in
> > > > >>> > > > >>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment,
> > but
> > > > >>> some are
> > > > >>> > > > >>>>> exposed
> > > > >>> > > > >>>>>>> in
> > > > >>> > > > >>>>>>>>> cli instead of api, like the cancel and
> savepoint I
> > > > >>> mentioned
> > > > >>> > > > >>>>> above.
> > > > >>> > > > >>>>>> I
> > > > >>> > > > >>>>>>>>> think the root cause is due to that flink didn't
> > > unify
> > > > >>> the
> > > > >>> > > > >>>>>> interaction
> > > > >>> > > > >>>>>>>> with
> > > > >>> > > > >>>>>>>>> flink. Here I list 3 scenarios of flink operation
> > > > >>> > > > >>>>>>>>>
> > > > >>> > > > >>>>>>>>>   - Local job execution.  Flink will create
> > > > >>> LocalEnvironment
> > > > >>> > > > >>> and
> > > > >>> > > > >>>>>> then
> > > > >>> > > > >>>>>>>> use
> > > > >>> > > > >>>>>>>>>   this LocalEnvironment to create LocalExecutor
> for
> > > job
> > > > >>> > > > >>>> execution.
> > > > >>> > > > >>>>>>>>>   - Remote job execution. Flink will create
> > > > ClusterClient
> > > > >>> > > > >>> first
> > > > >>> > > > >>>>> and
> > > > >>> > > > >>>>>>> then
> > > > >>> > > > >>>>>>>>>   create ContextEnvironment based on the
> > > ClusterClient
> > > > >>> and
> > > > >>> > > > >>> then
> > > > >>> > > > >>>>> run
> > > > >>> > > > >>>>>>> the
> > > > >>> > > > >>>>>>>>> job.
> > > > >>> > > > >>>>>>>>>   - Job cancelation. Flink will create
> > ClusterClient
> > > > >>> first
> > > > >>> > > > >> and
> > > > >>> > > > >>>>> then
> > > > >>> > > > >>>>>>>> cancel
> > > > >>> > > > >>>>>>>>>   this job via this ClusterClient.
> > > > >>> > > > >>>>>>>>>
> > > > >>> > > > >>>>>>>>> As you can see in the above 3 scenarios. Flink
> > didn't
> > > > >>> use the
> > > > >>> > > > >>>> same
> > > > >>> > > > >>>>>>>>> approach(code path) to interact with flink
> > > > >>> > > > >>>>>>>>> What I propose is following:
> > > > >>> > > > >>>>>>>>> Create the proper
> > LocalEnvironment/RemoteEnvironment
> > > > >>> (based
> > > > >>> > > > >> on
> > > > >>> > > > >>>> user
> > > > >>> > > > >>>>>>>>> configuration) --> Use this Environment to create
> > > > proper
> > > > >>> > > > >>>>>> ClusterClient
> > > > >>> > > > >>>>>>>>> (LocalClusterClient or RestClusterClient) to
> > > > interactive
> > > > >>> with
> > > > >>> > > > >>>>> Flink (
> > > > >>> > > > >>>>>>> job
> > > > >>> > > > >>>>>>>>> execution or cancelation)
> > > > >>> > > > >>>>>>>>>
> > > > >>> > > > >>>>>>>>> This way we can unify the process of local
> > execution
> > > > and
> > > > >>> > > > >> remote
> > > > >>> > > > >>>>>>>> execution.
> > > > >>> > > > >>>>>>>>> And it is much easier for third party to
> integrate
> > > with
> > > > >>> > > > >> flink,
> > > > >>> > > > >>>>>> because
> > > > >>> > > > >>>>>>>>> ExecutionEnvironment is the unified entry point
> for
> > > > >>> flink.
> > > > >>> > > > >> What
> > > > >>> > > > >>>>> third
> > > > >>> > > > >>>>>>>> party
> > > > >>> > > > >>>>>>>>> needs to do is just pass configuration to
> > > > >>> > > > >> ExecutionEnvironment
> > > > >>> > > > >>>> and
> > > > >>> > > > >>>>>>>>> ExecutionEnvironment will do the right thing
> based
> > on
> > > > the
> > > > >>> > > > >>>>>>> configuration.
> > > > >>> > > > >>>>>>>>> Flink cli can also be considered as flink api
> > > consumer.
> > > > >>> it
> > > > >>> > > > >> just
> > > > >>> > > > >>>>> pass
> > > > >>> > > > >>>>>>> the
> > > > >>> > > > >>>>>>>>> configuration to ExecutionEnvironment and let
> > > > >>> > > > >>>> ExecutionEnvironment
> > > > >>> > > > >>>>> to
> > > > >>> > > > >>>>>>>>> create the proper ClusterClient instead of
> letting
> > > cli
> > > > to
> > > > >>> > > > >>> create
> > > > >>> > > > >>>>>>>>> ClusterClient directly.
> > > > >>> > > > >>>>>>>>>
> > > > >>> > > > >>>>>>>>>
> > > > >>> > > > >>>>>>>>> 6 would involve large code refactoring, so I
> think
> > we
> > > > can
> > > > >>> > > > >> defer
> > > > >>> > > > >>>> it
> > > > >>> > > > >>>>>> for
> > > > >>> > > > >>>>>>>>> future release, 1,2,3,4,5 could be done at once I
> > > > >>> believe.
> > > > >>> > > > >> Let
> > > > >>> > > > >>> me
> > > > >>> > > > >>>>>> know
> > > > >>> > > > >>>>>>>> your
> > > > >>> > > > >>>>>>>>> comments and feedback, thanks
> > > > >>> > > > >>>>>>>>>
> > > > >>> > > > >>>>>>>>>
> > > > >>> > > > >>>>>>>>>
> > > > >>> > > > >>>>>>>>> --
> > > > >>> > > > >>>>>>>>> Best Regards
> > > > >>> > > > >>>>>>>>>
> > > > >>> > > > >>>>>>>>> Jeff Zhang
> > > > >>> > > > >>>>>>>>>
> > > > >>> > > > >>>>>>>>
> > > > >>> > > > >>>>>>>
> > > > >>> > > > >>>>>>>
> > > > >>> > > > >>>>>>> --
> > > > >>> > > > >>>>>>> Best Regards
> > > > >>> > > > >>>>>>>
> > > > >>> > > > >>>>>>> Jeff Zhang
> > > > >>> > > > >>>>>>>
> > > > >>> > > > >>>>>>
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>> --
> > > > >>> > > > >>>>> Best Regards
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>> Jeff Zhang
> > > > >>> > > > >>>>>
> > > > >>> > > > >>>>
> > > > >>> > > > >>>
> > > > >>> > > > >>>
> > > > >>> > > > >>> --
> > > > >>> > > > >>> Best Regards
> > > > >>> > > > >>>
> > > > >>> > > > >>> Jeff Zhang
> > > > >>> > > > >>>
> > > > >>> > > > >>
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > > --
> > > > >>> > > > > Best Regards
> > > > >>> > > > >
> > > > >>> > > > > Jeff Zhang
> > > > >>> > > >
> > > > >>> > > >
> > > > >>> > >
> > > > >>> > > --
> > > > >>> > > Best Regards
> > > > >>> > >
> > > > >>> > > Jeff Zhang
> > > > >>> > >
> > > > >>> >
> > > > >>>
> > > > >>
> > > >
> > > > --
> > > > Best Regards
> > > >
> > > > Jeff Zhang
> > > >
> > >
> >
>
>
> --
> Best Regards
>
> Jeff Zhang
>

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Jeff Zhang <zj...@gmail.com>.
Thanks Stephan, I will follow up on this issue in the next few weeks and will
refine the design doc. We can discuss more details after the 1.9 release.

Stephan Ewen <se...@apache.org> wrote on Wed, Jul 24, 2019 at 12:58 AM:

> Hi all!
>
> This thread has stalled for a bit, which I assume is mostly due to the
> Flink 1.9 feature freeze and release testing effort.
>
> I personally still recognize this issue as an important one to solve. I'd
> be happy to help resume this discussion soon (after the 1.9 release) and
> see if we can take some steps towards this in Flink 1.10.
>
> Best,
> Stephan
>
>
>
> On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <po...@okkam.it>
> wrote:
>
> > That's exactly what I suggested a long time ago: the Flink REST client
> > should not require any Flink dependency, only an HTTP library to call the
> REST
> > services to submit and monitor a job.
> > What I suggested also in [1] was to have a way to automatically suggest to
> the
> > user (via a UI) the available main classes and their required
> > parameters[2].
> > Another problem we have with Flink is that the Rest client and the CLI
> one
> > behave differently, and we use the CLI client (via ssh) because it allows us
> > to call some other method after env.execute() [3] (we have to call
> another
> > REST service to signal the end of the job).
> > In this regard, a dedicated interface, like the JobListener suggested in
> > the previous emails, would be very helpful (IMHO).
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-10864
> > [2] https://issues.apache.org/jira/browse/FLINK-10862
> > [3] https://issues.apache.org/jira/browse/FLINK-10879
> >
> > Best,
> > Flavio
> >
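[Editorial aside: Flavio's dependency-free REST client could be sketched roughly as below. The endpoint paths follow Flink's documented REST API, but the class and method names (`RestJobApi`, `extractState`) are illustrative, not an existing Flink API; the sketch only builds the request URIs, so any plain HTTP library can execute them.]

```java
import java.net.URI;

// Sketch of a dependency-free REST "client": it only constructs the HTTP
// requests for Flink's REST API; executing them is left to any HTTP library.
class RestJobApi {
    private final String base;

    RestJobApi(String host, int port) {
        this.base = "http://" + host + ":" + port + "/v1";
    }

    URI submitJar(String jarId) {      // POST: runs a previously uploaded jar
        return URI.create(base + "/jars/" + jarId + "/run");
    }

    URI jobStatus(String jobId) {      // GET: returns JSON containing "state"
        return URI.create(base + "/jobs/" + jobId);
    }

    URI cancelJob(String jobId) {      // PATCH: cancels the running job
        return URI.create(base + "/jobs/" + jobId + "?mode=cancel");
    }

    // Minimal extraction of the "state" field from a status response,
    // so simple monitoring needs no JSON library either.
    static String extractState(String statusJson) {
        int i = statusJson.indexOf("\"state\"");
        if (i < 0) return null;
        int start = statusJson.indexOf('"', statusJson.indexOf(':', i) + 1) + 1;
        return statusJson.substring(start, statusJson.indexOf('"', start));
    }
}
```

A caller would pair these URIs with the HTTP verb noted in the comments and poll `jobStatus` until the state is terminal.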
> > On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zj...@gmail.com> wrote:
> >
> > > Hi, Tison,
> > >
> > > Thanks for your comments. Overall I agree with you that it is difficult
> > for
> > > downstream projects to integrate with flink and we need to refactor the
> > > current flink client api.
> > > And I agree that CliFrontend should only parse command line arguments
> > and
> > > then pass them to ExecutionEnvironment. It is ExecutionEnvironment's
> > > responsibility to compile the job, create the cluster, and submit the job. Besides
> > > that, currently flink has many ExecutionEnvironment implementations,
> and
> > > flink will use the specific one based on the context. IMHO, it is not
> > > necessary, ExecutionEnvironment should be able to do the right thing
> > based
> > > on the FlinkConf it receives. Too many ExecutionEnvironment
> > > implementations are another burden for downstream project integration.
> > >
> > > One thing I'd like to mention is flink's scala shell and sql client,
> > > although they are sub-modules of flink, they could be treated as
> > downstream
> > > projects which use flink's client api. Currently you will find it is not
> > > easy for them to integrate with flink; they share a lot of duplicated code.
> > It
> > > is another sign that we should refactor flink client api.
> > >
> > > I believe it is a large and hard change, and I am afraid we cannot
> keep
> > > compatibility since many of the changes are user-facing.
> > >
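[Editorial aside: Jeff's FlinkConf idea could look roughly like the sketch below. Every name here (`FlinkConf`, `UnifiedEnvironment`, the mode strings) is hypothetical, meant only to illustrate a single environment type choosing its client from configuration instead of one environment subclass per submission mode.]

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical configuration holder; not an existing Flink class.
class FlinkConf {
    private final Map<String, String> settings = new HashMap<>();

    FlinkConf mode(String mode) { settings.put("execution.mode", mode); return this; }

    String mode() { return settings.getOrDefault("execution.mode", "local"); }
}

// Small stand-in for the client abstraction the environment would use.
interface ClusterClient {
    String submit(String jobName);
}

// One environment type; the conf alone decides the client implementation.
class UnifiedEnvironment {
    private final ClusterClient client;

    UnifiedEnvironment(FlinkConf conf) {
        switch (conf.mode()) {
            case "yarn":  this.client = job -> "yarn-job:" + job; break;
            case "local": this.client = job -> "local-job:" + job; break;
            default:      this.client = job -> "remote-job:" + job; break;
        }
    }

    String submit(String jobName) { return client.submit(jobName); }
}
```

The downstream project then only ever constructs `new UnifiedEnvironment(conf)`; no context-dependent environment subclass is needed.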
> > >
> > >
> > > Zili Chen <wa...@gmail.com> wrote on Mon, Jun 24, 2019 at 2:53 PM:
> > >
> > > > Hi all,
> > > >
> > > > After a closer look at our client apis, I can see there are two major
> > > > issues to consistency and integration, namely different deployment of
> > > > job cluster which couples job graph creation and cluster deployment,
> > > > and submission via CliFrontend confusing control flow of job graph
> > > > compilation and job submission. I'd like to follow the discuss above,
> > > > mainly the process described by Jeff and Stephan, and share my
> > > > ideas on these issues.
> > > >
> > > > 1) CliFrontend confuses the control flow of job compilation and
> > > submission.
> > > > Following the process of job submission Stephan and Jeff described,
> > > > execution environment knows all configs of the cluster and
> > topos/settings
> > > > of the job. Ideally, in the main method of user program, it calls
> > > #execute
> > > > (or named #submit) and Flink deploys the cluster, compiles the job
> graph
> > > > and submits it to the cluster. However, the current CliFrontend does all
> > these
> > > > things inside its #runProgram method, which introduces a lot of
> > > subclasses
> > > > of (stream) execution environment.
> > > >
> > > > Actually, it sets up an exec env that hijacks the
> #execute/executePlan
> > > > method, initializes the job graph and aborts execution. And then
> > > > control flow goes back to CliFrontend, which deploys the cluster (or retrieves
> > > > the client) and submits the job graph. This is quite a specific
> > internal
> > > > process inside Flink, with no consistency to anything else.
> > > >
> > > > 2) Deployment of job cluster couples job graph creation and cluster
> > > > deployment. Abstractly, from user job to a concrete submission, it
> > > requires
> > > >
> > > >      create JobGraph --\
> > > >
> > > > create ClusterClient -->  submit JobGraph
> > > >
> > > > such a dependency. ClusterClient was created by deploying or
> > retrieving.
> > > > JobGraph submission requires a compiled JobGraph and valid
> > ClusterClient,
> > > > but the creation of ClusterClient is abstractly independent of that
> of
> > > > JobGraph. However, in job cluster mode, we deploy job cluster with a
> > job
> > > > graph, which means we use another process:
> > > >
> > > > create JobGraph --> deploy cluster with the JobGraph
> > > >
> > > > Here is another inconsistency and downstream projects/client apis are
> > > > forced to handle different cases with little support from Flink.
> > > >
> > > > Since we likely reached a consensus on
> > > >
> > > > 1. all configs gathered by Flink configuration and passed
> > > > 2. execution environment knows all configs and handles execution(both
> > > > deployment and submission)
> > > >
> > > > to the issues above I propose eliminating the inconsistencies by the
> following
> > > > approach:
> > > >
> > > > 1) CliFrontend should exactly be a front end, at least for "run"
> > command.
> > > > That means it just gathers and passes all config from the command line
> to
> > > > the main method of user program. Execution environment knows all the
> > info
> > > > and with an addition to utils for ClusterClient, we gracefully get a
> > > > ClusterClient by deploying or retrieving. In this way, we don't need
> to
> > > > hijack #execute/executePlan methods and can remove various hacking
> > > > subclasses of exec env, as well as #run methods in ClusterClient (for
> an
> > > > interface-ized ClusterClient). Now the control flow flows from
> > > CliFrontend
> > > > to the main method and never returns.
> > > >
> > > > 2) Job cluster means a cluster for the specific job. From another
> > > > perspective, it is an ephemeral session. We may decouple the
> deployment
> > > > from the compiled job graph: start a session with an idle timeout
> > > > and submit the job afterwards.
> > > >
> > > > These topics, before we go into more details on design or
> > implementation,
> > > > are better to be surfaced and discussed to reach a consensus.
> > > >
> > > > Best,
> > > > tison.
> > > >
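[Editorial aside: tison's point 1) could be sketched as below. The front end only turns command line arguments into configuration and hands them over; it never compiles or submits the job itself. The class name and the option-to-config mapping are illustrative, not Flink's real `CliFrontend`.]

```java
import java.util.HashMap;
import java.util.Map;

// A front end that is exactly a front end: parse "run" arguments into a
// plain configuration map and pass it on; deployment and submission are
// left to the execution environment inside the user's main method.
class FrontEnd {
    // e.g. "run -m yarn-cluster -p 4 job.jar" → config map
    static Map<String, String> parseRunArgs(String[] args) {
        Map<String, String> config = new HashMap<>();
        for (int i = 0; i < args.length - 1; i++) {
            if ("-m".equals(args[i])) config.put("jobmanager.address", args[i + 1]);
            if ("-p".equals(args[i])) config.put("parallelism.default", args[i + 1]);
        }
        return config;
    }
}
```

With this split, control flows from the front end into the user main and never returns, matching the proposal above.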
> > > >
> > > > Zili Chen <wa...@gmail.com> wrote on Thu, Jun 20, 2019 at 3:21 AM:
> > > >
> > > >> Hi Jeff,
> > > >>
> > > >> Thanks for raising this thread and the design document!
> > > >>
> > > >> As @Thomas Weise mentioned above, extending config to flink
> > > >> requires far more effort than it should. Another example
> > > >> is that we achieve detach mode by introducing another execution
> > > >> environment which also hijacks the #execute method.
> > > >>
> > > >> I agree with your idea that the user would configure all things
> > > >> and flink would "just" respect them. On this topic I think the unusual
> > > >> control flow when CliFrontend handles the "run" command is the problem.
> > > >> It handles several configs, mainly about cluster settings, and
> > > >> thus the main method of the user program is unaware of them. Also it
> compiles
> > > >> the app to a job graph by running the main method with a hijacked exec env,
> > > >> which constrains the main method further.
> > > >>
> > > >> I'd like to write down a few notes on how configs/args are passed
> > and respected,
> > > >> as well as on decoupling job compilation and submission. I'll share
> > > >> them on this thread later.
> > > >>
> > > >> Best,
> > > >> tison.
> > > >>
> > > >>
> > > >> SHI Xiaogang <sh...@gmail.com> wrote on Mon, Jun 17, 2019 at 7:29 PM:
> > > >>
> > > >>> Hi Jeff and Flavio,
> > > >>>
> > > >>> Thanks Jeff a lot for proposing the design document.
> > > >>>
> > > >>> We are also working on refactoring ClusterClient to allow flexible
> > and
> > > >>> efficient job management in our real-time platform.
> > > >>> We would like to draft a document to share our ideas with you.
> > > >>>
> > > >>> I think it's a good idea to have something like Apache Livy for
> > Flink,
> > > >>> and
> > > >>> the efforts discussed here will be a great step towards it.
> > > >>>
> > > >>> Regards,
> > > >>> Xiaogang
> > > >>>
> > > >>> Flavio Pompermaier <po...@okkam.it> wrote on Mon, Jun 17, 2019 at 7:13 PM:
> > > >>>
> > > >>> > Is there any possibility to have something like Apache Livy [1]
> > also
> > > >>> for
> > > >>> > Flink in the future?
> > > >>> >
> > > >>> > [1] https://livy.apache.org/
> > > >>> >
> > > >>> > On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zj...@gmail.com>
> > wrote:
> > > >>> >
> > > >>> > > >>>  Any API we expose should not have dependencies on the
> > runtime
> > > >>> > > (flink-runtime) package or other implementation details. To me,
> > > this
> > > >>> > means
> > > >>> > > that the current ClusterClient cannot be exposed to users
> because
> > > it
> > > >>> >  uses
> > > >>> > > quite some classes from the optimiser and runtime packages.
> > > >>> > >
> > > >>> > > We should change ClusterClient from class to interface.
> > > >>> > > ExecutionEnvironment would only use the ClusterClient interface, which
> > > >>> should be
> > > >>> > > in flink-clients while the concrete implementation class could
> be
> > > in
> > > >>> > > flink-runtime.
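[Editorial aside: the interface-ized ClusterClient could be sketched like this. All names (`JobClusterClient`, `StubRestClusterClient`, `Env`) are illustrative stubs, not Flink's real classes; the point is that the environment in flink-clients depends only on the interface, while the concrete implementation lives in a runtime module.]

```java
// Client-facing interface: would live in flink-clients.
interface JobClusterClient {
    String submitJob(String jobGraphName);
    String getJobStatus(String jobId);
}

// Concrete implementation: would live in flink-runtime; the environment
// never references this class directly.
class StubRestClusterClient implements JobClusterClient {
    public String submitJob(String jobGraphName) { return "jid-" + jobGraphName; }
    public String getJobStatus(String jobId) { return "RUNNING"; }
}

// The environment is programmed purely against the interface.
class Env {
    private final JobClusterClient client;

    Env(JobClusterClient client) { this.client = client; }

    String execute(String jobGraphName) { return client.submitJob(jobGraphName); }
}
```

Since `Env` only sees `JobClusterClient`, downstream projects (and tests) can swap the runtime implementation for a stub without touching optimiser or runtime packages.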
> > > >>> > >
> > > >>> > > >>> What happens when a failure/restart in the client happens?
> > > There
> > > >>> need
> > > >>> > > to be a way of re-establishing the connection to the job, set
> up
> > > the
> > > >>> > > listeners again, etc.
> > > >>> > >
> > > >>> > > Good point. First we need to define what failure/restart
> in
> > > the
> > > >>> > > client means. IIUC, that usually means a network failure, which will
> > > >>> happen in
> > > >>> > > class RestClient. If my understanding is correct, restart/retry
> > > >>> mechanism
> > > >>> > > should be done in RestClient.
> > > >>> > >
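[Editorial aside: the retry mechanism suggested for RestClient could be sketched as below, retrying with exponential backoff before giving up. This is illustrative only; Flink's actual RestClient has its own retry handling, and `Retrier` is not a real Flink class.]

```java
import java.util.function.Supplier;

// Generic retry helper for transient failures such as dropped connections.
class Retrier {
    static <T> T withRetry(Supplier<T> call, int maxAttempts, long initialBackoffMs) {
        long backoff = initialBackoffMs;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;                 // assume transient; retry after backoff
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoff);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new RuntimeException(ie);
                    }
                    backoff *= 2;         // exponential backoff between attempts
                }
            }
        }
        throw last;                       // all attempts failed
    }
}
```

A REST call would be wrapped as `Retrier.withRetry(() -> httpGet(statusUri), 5, 100)`, re-establishing the connection transparently on transient errors.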
> > > >>> > >
> > > >>> > >
> > > >>> > >
> > > >>> > >
> > > >>> > > Aljoscha Krettek <al...@apache.org> wrote on Tue, Jun 11, 2019 at 11:10 PM:
> > > >>> > >
> > > >>> > > > Some points to consider:
> > > >>> > > >
> > > >>> > > > * Any API we expose should not have dependencies on the
> runtime
> > > >>> > > > (flink-runtime) package or other implementation details. To
> me,
> > > >>> this
> > > >>> > > means
> > > >>> > > > that the current ClusterClient cannot be exposed to users
> > because
> > > >>> it
> > > >>> > >  uses
> > > >>> > > > quite some classes from the optimiser and runtime packages.
> > > >>> > > >
> > > >>> > > > * What happens when a failure/restart in the client happens?
> > > There
> > > >>> need
> > > >>> > > to
> > > >>> > > > be a way of re-establishing the connection to the job, set up
> > the
> > > >>> > > listeners
> > > >>> > > > again, etc.
> > > >>> > > >
> > > >>> > > > Aljoscha
> > > >>> > > >
> > > >>> > > > > On 29. May 2019, at 10:17, Jeff Zhang <zj...@gmail.com>
> > > wrote:
> > > >>> > > > >
> > > >>> > > > > Sorry folks, the design doc is late as you expected. Here's
> > the
> > > >>> > design
> > > >>> > > > doc
> > > >>> > > > > I drafted, welcome any comments and feedback.
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>>
> > >
> >
> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > > Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午8:43写道:
> > > >>> > > > >
> > > >>> > > > >> Nice that this discussion is happening.
> > > >>> > > > >>
> > > >>> > > > >> In the FLIP, we could also revisit the entire role of the
> > > >>> > environments
> > > >>> > > > >> again.
> > > >>> > > > >>
> > > >>> > > > >> Initially, the idea was:
> > > >>> > > > >>  - the environments take care of the specific setup for
> > > >>> standalone
> > > >>> > (no
> > > >>> > > > >> setup needed), yarn, mesos, etc.
> > > >>> > > > >>  - the session ones have control over the session. The
> > > >>> environment
> > > >>> > > holds
> > > >>> > > > >> the session client.
> > > >>> > > > >>  - running a job gives a "control" object for that job.
> That
> > > >>> > behavior
> > > >>> > > is
> > > >>> > > > >> the same in all environments.
> > > >>> > > > >>
> > > >>> > > > >> The actual implementation diverged quite a bit from that.
> > > Happy
> > > >>> to
> > > >>> > > see a
> > > >>> > > > >> discussion about straightening this out a bit more.
> > > >>> > > > >>
> > > >>> > > > >>
> > > >>> > > > >> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <
> > zjffdu@gmail.com>
> > > >>> > wrote:
> > > >>> > > > >>
> > > >>> > > > >>> Hi folks,
> > > >>> > > > >>>
> > > >>> > > > >>> Sorry for the late response. It seems we have reached
> > > >>> > > > >>> consensus on this; I will create a FLIP with a more
> > > >>> > > > >>> detailed design.
> > > >>> > > > >>>
> > > >>> > > > >>>
> > > >>> > > > >>> Thomas Weise <th...@apache.org> 于2018年12月21日周五 上午11:43写道:
> > > >>> > > > >>>
> > > >>> > > > >>>> Great to see this discussion seeded! The problems you
> face
> > > >>> with
> > > >>> > the
> > > >>> > > > >>>> Zeppelin integration are also affecting other downstream
> > > >>> projects,
> > > >>> > > > like
> > > >>> > > > >>>> Beam.
> > > >>> > > > >>>>
> > > >>> > > > >>>> We just enabled the savepoint restore option in
> > > >>> > > > RemoteStreamEnvironment
> > > >>> > > > >>> [1]
> > > >>> > > > >>>> and that was more difficult than it should be. The main
> > > issue
> > > >>> is
> > > >>> > > that
> > > >>> > > > >>>> environment and cluster client aren't decoupled. Ideally
> > it
> > > >>> should
> > > >>> > > be
> > > >>> > > > >>>> possible to just get the matching cluster client from
> the
> > > >>> > > environment
> > > >>> > > > >> and
> > > >>> > > > >>>> then control the job through it (environment as factory
> > for
> > > >>> > cluster
> > > >>> > > > >>>> client). But note that the environment classes are part
> of
> > > the
> > > >>> > > public
> > > >>> > > > >>> API,
> > > >>> > > > >>>> and it is not straightforward to make larger changes
> > without
> > > >>> > > breaking
> > > >>> > > > >>>> backward compatibility.
> > > >>> > > > >>>>
> > > >>> > > > >>>> ClusterClient currently exposes internal classes like
> > > >>> JobGraph and
> > > >>> > > > >>>> StreamGraph. But it should be possible to wrap this
> with a
> > > new
> > > >>> > > public
> > > >>> > > > >> API
> > > >>> > > > >>>> that brings the required job control capabilities for
> > > >>> downstream
> > > >>> > > > >>> projects.
> > > >>> > > > >>>> Perhaps it is helpful to look at some of the interfaces
> in
> > > >>> Beam
> > > >>> > > while
> > > >>> > > > >>>> thinking about this: [2] for the portable job API and
> [3]
> > > for
> > > >>> the
> > > >>> > > old
> > > >>> > > > >>>> asynchronous job control from the Beam Java SDK.
> > > >>> > > > >>>>
> > > >>> > > > >>>> The backward compatibility discussion [4] is also
> relevant
> > > >>> here. A
> > > >>> > > new
> > > >>> > > > >>> API
> > > >>> > > > >>>> should shield downstream projects from internals and
> allow
> > > >>> them to
> > > >>> > > > >>>> interoperate with multiple future Flink versions in the
> > same
> > > >>> > release
> > > >>> > > > >> line
> > > >>> > > > >>>> without forced upgrades.
> > > >>> > > > >>>>
> > > >>> > > > >>>> Thanks,
> > > >>> > > > >>>> Thomas
> > > >>> > > > >>>>
> > > >>> > > > >>>> [1] https://github.com/apache/flink/pull/7249
> > > >>> > > > >>>> [2]
> > > >>> > > > >>>>
> > > >>> > > > >>>>
> > > >>> > > > >>>
> > > >>> > > > >>
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>>
> > >
> >
> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > > >>> > > > >>>> [3]
> > > >>> > > > >>>>
> > > >>> > > > >>>>
> > > >>> > > > >>>
> > > >>> > > > >>
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>>
> > >
> >
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > > >>> > > > >>>> [4]
> > > >>> > > > >>>>
> > > >>> > > > >>>>
> > > >>> > > > >>>
> > > >>> > > > >>
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>>
> > >
> >
> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> > > >>> > > > >>>>
> > > >>> > > > >>>>
> > > >>> > > > >>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <
> > > zjffdu@gmail.com>
> > > >>> > > wrote:
> > > >>> > > > >>>>
> > > >>> > > > >>>>>>>> I'm not so sure whether the user should be able to
> > > define
> > > >>> > where
> > > >>> > > > >> the
> > > >>> > > > >>>> job
> > > >>> > > > >>>>> runs (in your example Yarn). This is actually
> independent
> > > of
> > > >>> the
> > > >>> > > job
> > > >>> > > > >>>>> development and is something which is decided at
> > deployment
> > > >>> time.
> > > >>> > > > >>>>>
> > > >>> > > > >>>>> Users don't need to specify the execution mode
> > > >>> > > > >>>>> programmatically. They can also pass the execution mode as
> > > >>> > > > >>>>> an argument to the flink run command, e.g.
> > > >>> > > > >>>>>
> > > >>> > > > >>>>> bin/flink run -m yarn-cluster ....
> > > >>> > > > >>>>> bin/flink run -m local ...
> > > >>> > > > >>>>> bin/flink run -m host:port ...
> > > >>> > > > >>>>>
> > > >>> > > > >>>>> Does this make sense to you ?
> > > >>> > > > >>>>>
> > > >>> > > > >>>>>>>> To me it makes sense that the ExecutionEnvironment
> is
> > > not
> > > >>> > > > >> directly
> > > >>> > > > >>>>> initialized by the user and instead context sensitive
> how
> > > you
> > > >>> > want
> > > >>> > > to
> > > >>> > > > >>>>> execute your job (Flink CLI vs. IDE, for example).
> > > >>> > > > >>>>>
> > > >>> > > > >>>>> Right, currently I notice Flink would create different
> > > >>> > > > >>>>> ContextExecutionEnvironment based on different
> submission
> > > >>> > scenarios
> > > >>> > > > >>>> (Flink
> > > >>> > > > >>>>> CLI vs IDE). To me this is a kind of hacky approach, not
> > > >>> > > > >>>>> very straightforward.
> > > >>> > > > >>>>> What I suggested above is that Flink should always create
> > > >>> > > > >>>>> the same
> > > >>> > > > >>>>> ExecutionEnvironment but with different configuration,
> > and
> > > >>> based
> > > >>> > on
> > > >>> > > > >> the
> > > >>> > > > >>>>> configuration it would create the proper ClusterClient
> > for
> > > >>> > > different
> > > >>> > > > >>>>> behaviors.
> > > >>> > > > >>>>>
> > > >>> > > > >>>>>
> > > >>> > > > >>>>>
> > > >>> > > > >>>>>
> > > >>> > > > >>>>>
> > > >>> > > > >>>>>
> > > >>> > > > >>>>>
> > > >>> > > > >>>>> Till Rohrmann <tr...@apache.org> 于2018年12月20日周四
> > > >>> 下午11:18写道:
> > > >>> > > > >>>>>
> > > >>> > > > >>>>>> You are probably right that we have code duplication
> > when
> > > it
> > > >>> > comes
> > > >>> > > > >> to
> > > >>> > > > >>>> the
> > > >>> > > > >>>>>> creation of the ClusterClient. This should be reduced
> in
> > > the
> > > >>> > > > >> future.
> > > >>> > > > >>>>>>
> > > >>> > > > >>>>>> I'm not so sure whether the user should be able to
> > define
> > > >>> where
> > > >>> > > the
> > > >>> > > > >>> job
> > > >>> > > > >>>>>> runs (in your example Yarn). This is actually
> > independent
> > > >>> of the
> > > >>> > > > >> job
> > > >>> > > > >>>>>> development and is something which is decided at
> > > deployment
> > > >>> > time.
> > > >>> > > > >> To
> > > >>> > > > >>> me
> > > >>> > > > >>>>> it
> > > >>> > > > >>>>>> makes sense that the ExecutionEnvironment is not
> > directly
> > > >>> > > > >> initialized
> > > >>> > > > >>>> by
> > > >>> > > > >>>>>> the user and instead context sensitive how you want to
> > > >>> execute
> > > >>> > > your
> > > >>> > > > >>> job
> > > >>> > > > >>>>>> (Flink CLI vs. IDE, for example). However, I agree
> that
> > > the
> > > >>> > > > >>>>>> ExecutionEnvironment should give you access to the
> > > >>> ClusterClient
> > > >>> > > > >> and
> > > >>> > > > >>> to
> > > >>> > > > >>>>> the
> > > >>> > > > >>>>>> job (maybe in the form of the JobGraph or a job plan).
> > > >>> > > > >>>>>>
> > > >>> > > > >>>>>> Cheers,
> > > >>> > > > >>>>>> Till
> > > >>> > > > >>>>>>
> > > >>> > > > >>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <
> > > >>> zjffdu@gmail.com>
> > > >>> > > > >> wrote:
> > > >>> > > > >>>>>>
> > > >>> > > > >>>>>>> Hi Till,
> > > >>> > > > >>>>>>> Thanks for the feedback. You are right that I expect
> > > better
> > > >>> > > > >>>>> programmatic
> > > >>> > > > >>>>>>> job submission/control api which could be used by
> > > >>> downstream
> > > >>> > > > >>> project.
> > > >>> > > > >>>>> And
> > > >>> > > > >>>>>>> it would benefit for the flink ecosystem. When I look
> > at
> > > >>> the
> > > >>> > code
> > > >>> > > > >>> of
> > > >>> > > > >>>>>> flink
> > > >>> > > > >>>>>>> scala-shell and sql-client (I believe they are not
> the
> > > >>> core of
> > > >>> > > > >>> flink,
> > > >>> > > > >>>>> but
> > > >>> > > > >>>>>>> belong to the ecosystem of flink), I find many
> > duplicated
> > > >>> code
> > > >>> > > > >> for
> > > >>> > > > >>>>>> creating
> > > >>> > > > >>>>>>> ClusterClient from user provided configuration
> > > >>> (configuration
> > > >>> > > > >>> format
> > > >>> > > > >>>>> may
> > > >>> > > > >>>>>> be
> > > >>> > > > >>>>>>> different from scala-shell and sql-client) and then
> use
> > > >>> that
> > > >>> > > > >>>>>> ClusterClient
> > > >>> > > > >>>>>>> to manipulate jobs. I don't think this is convenient
> > for
> > > >>> > > > >> downstream
> > > >>> > > > >>>>>>> projects. What I expect is that downstream project
> only
> > > >>> needs
> > > >>> > to
> > > >>> > > > >>>>> provide
> > > >>> > > > >>>>>>> necessary configuration info (maybe introducing class
> > > >>> > FlinkConf),
> > > >>> > > > >>> and
> > > >>> > > > >>>>>> then
> > > >>> > > > >>>>>>> build ExecutionEnvironment based on this FlinkConf,
> and
> > > >>> > > > >>>>>>> ExecutionEnvironment will create the proper
> > > ClusterClient.
> > > >>> It
> > > >>> > not
> > > >>> > > > >>>> only
> > > >>> > > > >>>>>>> benefit for the downstream project development but
> also
> > > be
> > > >>> > > > >> helpful
> > > >>> > > > >>>> for
> > > >>> > > > >>>>>>> their integration test with flink. Here's one sample
> > code
> > > >>> > snippet
> > > >>> > > > >>>> that
> > > >>> > > > >>>>> I
> > > >>> > > > >>>>>>> expect.
> > > >>> > > > >>>>>>>
> > > >>> > > > >>>>>>> val conf = new FlinkConf().mode("yarn")
> > > >>> > > > >>>>>>> val env = new ExecutionEnvironment(conf)
> > > >>> > > > >>>>>>> val jobId = env.submit(...)
> > > >>> > > > >>>>>>> val jobStatus =
> > > >>> env.getClusterClient().queryJobStatus(jobId)
> > > >>> > > > >>>>>>> env.getClusterClient().cancelJob(jobId)
> > > >>> > > > >>>>>>>
> > > >>> > > > >>>>>>> What do you think ?
> > > >>> > > > >>>>>>>
> > > >>> > > > >>>>>>>
> > > >>> > > > >>>>>>>
> > > >>> > > > >>>>>>>
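[Editor's note: the FlinkConf-driven flow Jeff sketches above can be illustrated with a toy example. `FlinkConf`, this `ExecutionEnvironment` constructor, and the client descriptions below are all hypothetical; the point is only that the environment, not the caller, selects the proper client from configuration.]

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical configuration holder, loosely modeled on the proposed FlinkConf. */
class FlinkConf {
    private final Map<String, String> settings = new HashMap<>();
    FlinkConf mode(String mode) { settings.put("execution.mode", mode); return this; }
    String get(String key, String fallback) { return settings.getOrDefault(key, fallback); }
}

/** Stand-in for the proposed ClusterClient interface. */
interface ClusterClient { String describe(); }

/** The environment decides which client to create based purely on the config. */
class ExecutionEnvironment {
    private final ClusterClient client;

    ExecutionEnvironment(FlinkConf conf) {
        switch (conf.get("execution.mode", "local")) {
            case "yarn":   client = () -> "yarn-cluster client"; break;
            case "remote": client = () -> "rest client"; break;
            default:       client = () -> "local client"; break;
        }
    }

    ClusterClient getClusterClient() { return client; }
}

public class Demo {
    public static void main(String[] args) {
        // Downstream code only supplies configuration; it never picks a client itself.
        ExecutionEnvironment env = new ExecutionEnvironment(new FlinkConf().mode("yarn"));
        System.out.println(env.getClusterClient().describe());   // prints "yarn-cluster client"
    }
}
```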
> > > >>> > > > >>>>>>> Till Rohrmann <tr...@apache.org> 于2018年12月11日周二
> > > >>> 下午6:28写道:
> > > >>> > > > >>>>>>>
> > > >>> > > > >>>>>>>> Hi Jeff,
> > > >>> > > > >>>>>>>>
> > > >>> > > > >>>>>>>> what you are proposing is to provide the user with
> > > better
> > > >>> > > > >>>>> programmatic
> > > >>> > > > >>>>>>> job
> > > >>> > > > >>>>>>>> control. There was actually an effort to achieve
> this
> > > but
> > > >>> it
> > > >>> > > > >> has
> > > >>> > > > >>>>> never
> > > >>> > > > >>>>>>> been
> > > >>> > > > >>>>>>>> completed [1]. However, there are some improvements in
> > > >>> > > > >>>>>>>> the code base now.
> > > >>> > > > >>>>>>>> Look for example at the NewClusterClient interface
> > which
> > > >>> > > > >> offers a
> > > >>> > > > >>>>>>>> non-blocking job submission. But I agree that we
> need
> > to
> > > >>> > > > >> improve
> > > >>> > > > >>>>> Flink
> > > >>> > > > >>>>>> in
> > > >>> > > > >>>>>>>> this regard.
> > > >>> > > > >>>>>>>>
> > > >>> > > > >>>>>>>> I would not be in favour of exposing all
> ClusterClient
> > > >>> calls
> > > >>> > > > >> via
> > > >>> > > > >>>> the
> > > >>> > > > >>>>>>>> ExecutionEnvironment because it would clutter the
> > class
> > > >>> and
> > > >>> > > > >> would
> > > >>> > > > >>>> not
> > > >>> > > > >>>>>> be
> > > >>> > > > >>>>>>> a
> > > >>> > > > >>>>>>>> good separation of concerns. Instead one idea could
> be
> > > to
> > > >>> > > > >>> retrieve
> > > >>> > > > >>>>> the
> > > >>> > > > >>>>>>>> current ClusterClient from the ExecutionEnvironment
> > > which
> > > >>> can
> > > >>> > > > >>> then
> > > >>> > > > >>>> be
> > > >>> > > > >>>>>>> used
> > > >>> > > > >>>>>>>> for cluster and job control. But before we start an
> > > effort
> > > >>> > > > >> here,
> > > >>> > > > >>> we
> > > >>> > > > >>>>>> need
> > > >>> > > > >>>>>>> to
> > > >>> > > > >>>>>>>> agree and capture what functionality we want to
> > provide.
> > > >>> > > > >>>>>>>>
> > > >>> > > > >>>>>>>> Initially, the idea was that we have the
> > > ClusterDescriptor
> > > >>> > > > >>>> describing
> > > >>> > > > >>>>>> how
> > > >>> > > > >>>>>>>> to talk to cluster manager like Yarn or Mesos. The
> > > >>> > > > >>>> ClusterDescriptor
> > > >>> > > > >>>>>> can
> > > >>> > > > >>>>>>> be
> > > >>> > > > >>>>>>>> used for deploying Flink clusters (job and session)
> > and
> > > >>> gives
> > > >>> > > > >>> you a
> > > >>> > > > >>>>>>>> ClusterClient. The ClusterClient controls the
> cluster
> > > >>> (e.g.
> > > >>> > > > >>>>> submitting
> > > >>> > > > >>>>>>>> jobs, listing all running jobs). And then there was
> > the
> > > >>> idea
> > > >>> > to
> > > >>> > > > >>>>>>> introduce a
> > > >>> > > > >>>>>>>> JobClient which you obtain from the ClusterClient to
> > > >>> trigger
> > > >>> > > > >> job
> > > >>> > > > >>>>>> specific
> > > >>> > > > >>>>>>>> operations (e.g. taking a savepoint, cancelling the
> > > job).
> > > >>> > > > >>>>>>>>
> > > >>> > > > >>>>>>>> [1]
> https://issues.apache.org/jira/browse/FLINK-4272
> > > >>> > > > >>>>>>>>
> > > >>> > > > >>>>>>>> Cheers,
> > > >>> > > > >>>>>>>> Till
> > > >>> > > > >>>>>>>>
> > > >>> > > > >>>>>>>> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <
> > > >>> zjffdu@gmail.com
> > > >>> > >
> > > >>> > > > >>>>> wrote:
> > > >>> > > > >>>>>>>>
> > > >>> > > > >>>>>>>>> Hi Folks,
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> I am trying to integrate flink into apache zeppelin
> > > >>> which is
> > > >>> > > > >> an
> > > >>> > > > >>>>>>>> interactive
> > > >>> > > > >>>>>>>>> notebook. And I hit several issues that are caused by
> > > >>> > > > >>>>>>>>> the Flink client API.
> > > >>> > > > >>>>>>> So
> > > >>> > > > >>>>>>>>> I'd like to propose the following changes to the Flink
> > > >>> > > > >>>>>>>>> client API.
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> 1. Support nonblocking execution. Currently,
> > > >>> > > > >>>>>>> ExecutionEnvironment#execute
> > > >>> > > > >>>>>>>>> is a blocking method which does 2 things: first submit
> > > >>> > > > >>>>>>>>> the job and then wait for the job until it is finished.
> > > >>> > > > >>>>>>>>> I'd like to introduce a nonblocking
> > > >>> > > > >>>>>>>>> execution method like ExecutionEnvironment#submit
> > which
> > > >>> only
> > > >>> > > > >>>> submit
> > > >>> > > > >>>>>> job
> > > >>> > > > >>>>>>>> and
> > > >>> > > > >>>>>>>>> then return jobId to client. And allow user to
> query
> > > the
> > > >>> job
> > > >>> > > > >>>> status
> > > >>> > > > >>>>>> via
> > > >>> > > > >>>>>>>> the
> > > >>> > > > >>>>>>>>> jobId.
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> 2. Add cancel api in
> > > >>> > > > >>>>> ExecutionEnvironment/StreamExecutionEnvironment,
> > > >>> > > > >>>>>>>>> currently the only way to cancel job is via cli
> > > >>> (bin/flink),
> > > >>> > > > >>> this
> > > >>> > > > >>>>> is
> > > >>> > > > >>>>>>> not
> > > >>> > > > >>>>>>>>> convenient for downstream project to use this
> > feature.
> > > >>> So I'd
> > > >>> > > > >>>> like
> > > >>> > > > >>>>> to
> > > >>> > > > >>>>>>> add
> > > >>> > > > >>>>>>>>> cancel api in ExecutionEnvironment
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> 3. Add savepoint api in
> > > >>> > > > >>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
> > > >>> > > > >>>>>>>> It
> > > >>> > > > >>>>>>>>> is similar as cancel api, we should use
> > > >>> ExecutionEnvironment
> > > >>> > > > >> as
> > > >>> > > > >>>> the
> > > >>> > > > >>>>>>>> unified
> > > >>> > > > >>>>>>>>> api for third party to integrate with flink.
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> 4. Add listener for job execution lifecycle.
> > Something
> > > >>> like
> > > >>> > > > >>>>>> following,
> > > >>> > > > >>>>>>> so
> > > >>> > > > >>>>>>>>> that downstream project can do custom logic in the
> > > >>> lifecycle
> > > >>> > > > >> of
> > > >>> > > > >>>>> job.
> > > >>> > > > >>>>>>> e.g.
> > > >>> > > > >>>>>>>>> Zeppelin would capture the jobId after job is
> > submitted
> > > >>> and
> > > >>> > > > >>> then
> > > >>> > > > >>>>> use
> > > >>> > > > >>>>>>> this
> > > >>> > > > >>>>>>>>> jobId to cancel it later when necessary.
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> public interface JobListener {
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>>   void onJobSubmitted(JobID jobId);
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>>   void onJobExecuted(JobExecutionResult jobResult);
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>>   void onJobCanceled(JobID jobId);
> > > >>> > > > >>>>>>>>> }
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> 5. Enable session in ExecutionEnvironment.
> Currently
> > it
> > > >>> is
> > > >>> > > > >>>>> disabled,
> > > >>> > > > >>>>>>> but
> > > >>> > > > >>>>>>>>> session is very convenient for third parties to submit
> > > >>> > > > >>>>>>>>> jobs continually.
> > > >>> > > > >>>>>>>>> I hope flink can enable it again.
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> 6. Unify all flink client api into
> > > >>> > > > >>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> This is a long term issue which needs more careful
> > > >>> thinking
> > > >>> > > > >> and
> > > >>> > > > >>>>>> design.
> > > >>> > > > >>>>>>>>> Currently some of features of flink is exposed in
> > > >>> > > > >>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment,
> but
> > > >>> some are
> > > >>> > > > >>>>> exposed
> > > >>> > > > >>>>>>> in
> > > >>> > > > >>>>>>>>> cli instead of api, like the cancel and savepoint I
> > > >>> mentioned
> > > >>> > > > >>>>> above.
> > > >>> > > > >>>>>> I
> > > >>> > > > >>>>>>>>> think the root cause is due to that flink didn't
> > unify
> > > >>> the
> > > >>> > > > >>>>>> interaction
> > > >>> > > > >>>>>>>> with
> > > >>> > > > >>>>>>>>> flink. Here I list 3 scenarios of flink operation
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>>   - Local job execution.  Flink will create
> > > >>> LocalEnvironment
> > > >>> > > > >>> and
> > > >>> > > > >>>>>> then
> > > >>> > > > >>>>>>>> use
> > > >>> > > > >>>>>>>>>   this LocalEnvironment to create LocalExecutor for
> > job
> > > >>> > > > >>>> execution.
> > > >>> > > > >>>>>>>>>   - Remote job execution. Flink will create
> > > ClusterClient
> > > >>> > > > >>> first
> > > >>> > > > >>>>> and
> > > >>> > > > >>>>>>> then
> > > >>> > > > >>>>>>>>>   create ContextEnvironment based on the
> > ClusterClient
> > > >>> and
> > > >>> > > > >>> then
> > > >>> > > > >>>>> run
> > > >>> > > > >>>>>>> the
> > > >>> > > > >>>>>>>>> job.
> > > >>> > > > >>>>>>>>>   - Job cancelation. Flink will create
> ClusterClient
> > > >>> first
> > > >>> > > > >> and
> > > >>> > > > >>>>> then
> > > >>> > > > >>>>>>>> cancel
> > > >>> > > > >>>>>>>>>   this job via this ClusterClient.
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> As you can see in the above 3 scenarios, Flink didn't
> > > >>> > > > >>>>>>>>> use the same approach (code path) to interact with
> > > >>> > > > >>>>>>>>> Flink.
> > > >>> > > > >>>>>>>>> What I propose is following:
> > > >>> > > > >>>>>>>>> Create the proper
> LocalEnvironment/RemoteEnvironment
> > > >>> (based
> > > >>> > > > >> on
> > > >>> > > > >>>> user
> > > >>> > > > >>>>>>>>> configuration) --> Use this Environment to create
> > > proper
> > > >>> > > > >>>>>> ClusterClient
> > > >>> > > > >>>>>>>>> (LocalClusterClient or RestClusterClient) to
> > > interactive
> > > >>> with
> > > >>> > > > >>>>> Flink (
> > > >>> > > > >>>>>>> job
> > > >>> > > > >>>>>>>>> execution or cancelation)
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> This way we can unify the process of local
> execution
> > > and
> > > >>> > > > >> remote
> > > >>> > > > >>>>>>>> execution.
> > > >>> > > > >>>>>>>>> And it is much easier for third parties to integrate
> > > >>> > > > >>>>>>>>> with Flink, because
> > > >>> > > > >>>>>>>>> ExecutionEnvironment is the unified entry point for
> > > >>> flink.
> > > >>> > > > >> What
> > > >>> > > > >>>>> third
> > > >>> > > > >>>>>>>> party
> > > >>> > > > >>>>>>>>> needs to do is just pass configuration to
> > > >>> > > > >> ExecutionEnvironment
> > > >>> > > > >>>> and
> > > >>> > > > >>>>>>>>> ExecutionEnvironment will do the right thing based
> on
> > > the
> > > >>> > > > >>>>>>> configuration.
> > > >>> > > > >>>>>>>>> The Flink CLI can also be considered a Flink API
> > > >>> > > > >>>>>>>>> consumer. It just passes the configuration to
> > > >>> > > > >>>>>>>>> ExecutionEnvironment and lets ExecutionEnvironment
> > > >>> > > > >>>>>>>>> create the proper ClusterClient instead of letting the
> > > >>> > > > >>>>>>>>> CLI create the ClusterClient directly.
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> Item 6 would involve large code refactoring, so I think
> > > >>> > > > >>>>>>>>> we can defer it to a future release; items 1-5 could be
> > > >>> > > > >>>>>>>>> done at once, I believe. Let me know your comments and
> > > >>> > > > >>>>>>>>> feedback, thanks.
> > > >>> > > > >>>>>>>>>
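[Editor's note: items 1 (non-blocking submit) and 4 (JobListener callbacks) above fit together naturally. The sketch below is hypothetical — it uses plain `String` job ids and invented class names rather than Flink's real `JobID`/`JobExecutionResult` — and only shows how a downstream project like Zeppelin could capture the job id at submit time and use it to cancel later.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/** Simplified version of the proposed listener, with String ids for brevity. */
interface JobListener {
    void onJobSubmitted(String jobId);
    void onJobCanceled(String jobId);
}

/** Toy environment: submit() returns immediately with a job id, and
 *  registered listeners are notified on submit and cancel. */
class SketchEnvironment {
    private final List<JobListener> listeners = new ArrayList<>();

    void addJobListener(JobListener l) { listeners.add(l); }

    String submit() {
        String jobId = UUID.randomUUID().toString();   // non-blocking: hand back the id
        listeners.forEach(l -> l.onJobSubmitted(jobId));
        return jobId;
    }

    void cancel(String jobId) {
        listeners.forEach(l -> l.onJobCanceled(jobId));
    }
}

public class ListenerDemo {
    public static void main(String[] args) {
        SketchEnvironment env = new SketchEnvironment();
        List<String> events = new ArrayList<>();
        env.addJobListener(new JobListener() {
            public void onJobSubmitted(String jobId) { events.add("submitted:" + jobId); }
            public void onJobCanceled(String jobId) { events.add("canceled:" + jobId); }
        });
        String jobId = env.submit();   // Zeppelin-style code captures the id here...
        env.cancel(jobId);             // ...and uses it later to cancel
        System.out.println(events.size() + " events recorded");   // prints "2 events recorded"
    }
}
```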
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> --
> > > >>> > > > >>>>>>>>> Best Regards
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>> Jeff Zhang
> > > >>> > > > >>>>>>>>>
> > > >>> > > > >>>>>>>>
> > > >>> > > > >>>>>>>
> > > >>> > > > >>>>>>>
> > > >>> > > > >>>>>>> --
> > > >>> > > > >>>>>>> Best Regards
> > > >>> > > > >>>>>>>
> > > >>> > > > >>>>>>> Jeff Zhang
> > > >>> > > > >>>>>>>
> > > >>> > > > >>>>>>
> > > >>> > > > >>>>>
> > > >>> > > > >>>>>
> > > >>> > > > >>>>> --
> > > >>> > > > >>>>> Best Regards
> > > >>> > > > >>>>>
> > > >>> > > > >>>>> Jeff Zhang
> > > >>> > > > >>>>>
> > > >>> > > > >>>>
> > > >>> > > > >>>
> > > >>> > > > >>>
> > > >>> > > > >>> --
> > > >>> > > > >>> Best Regards
> > > >>> > > > >>>
> > > >>> > > > >>> Jeff Zhang
> > > >>> > > > >>>
> > > >>> > > > >>
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > > --
> > > >>> > > > > Best Regards
> > > >>> > > > >
> > > >>> > > > > Jeff Zhang
> > > >>> > > >
> > > >>> > > >
> > > >>> > >
> > > >>> > > --
> > > >>> > > Best Regards
> > > >>> > >
> > > >>> > > Jeff Zhang
> > > >>> > >
> > > >>> >
> > > >>>
> > > >>
> > >
> > > --
> > > Best Regards
> > >
> > > Jeff Zhang
> > >
> >
>


-- 
Best Regards

Jeff Zhang

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Stephan Ewen <se...@apache.org>.
Hi all!

This thread has stalled for a bit, which I assume is mostly due to the
Flink 1.9 feature freeze and release testing effort.

I personally still recognize this issue as an important one to solve. I'd
be happy to help resume this discussion soon (after the 1.9 release) and
see if we can take some steps towards this in Flink 1.10.

Best,
Stephan



On Mon, Jun 24, 2019 at 10:41 AM Flavio Pompermaier <po...@okkam.it>
wrote:

> That's exactly what I suggested a long time ago: the Flink REST client
> should not require any Flink dependency, only an HTTP library to call the REST
> services to submit and monitor a job.
> What I suggested also in [1] was to have a way to automatically suggest the
> user (via a UI) the available main classes and their required
> parameters[2].
> Another problem we have with Flink is that the REST client and the CLI one
> behave differently, and we use the CLI client (via ssh) because it allows us
> to call some other method after env.execute() [3] (we have to call another
> REST service to signal the end of the job).
> In this regard, a dedicated interface, like the JobListener suggested in
> the previous emails, would be very helpful (IMHO).
>
> [1] https://issues.apache.org/jira/browse/FLINK-10864
> [2] https://issues.apache.org/jira/browse/FLINK-10862
> [3] https://issues.apache.org/jira/browse/FLINK-10879
>
> Best,
> Flavio
>
> On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zj...@gmail.com> wrote:
>
> > Hi, Tison,
> >
> > Thanks for your comments. Overall I agree with you that it is difficult
> > for downstream projects to integrate with Flink and that we need to
> > refactor the current Flink client API.
> > And I agree that CliFrontend should only parsing command line arguments
> and
> > then pass them to ExecutionEnvironment. It is ExecutionEnvironment's
> > responsibility to compile job, create cluster, and submit job. Besides
> > that, Flink currently has many ExecutionEnvironment implementations, and
> > Flink will use a specific one based on the context. IMHO, this is not
> > necessary; ExecutionEnvironment should be able to do the right thing based
> > on the FlinkConf it receives. Too many ExecutionEnvironment
> > implementations are another burden for downstream project integration.
> >
> > One thing I'd like to mention is flink's scala shell and sql client,
> > although they are sub-modules of flink, they could be treated as
> downstream
> > projects which use Flink's client API. Currently you will find it is not
> > easy for them to integrate with Flink; they share much duplicated code.
> > It is another sign that we should refactor the Flink client API.
> >
> > I believe this is a large and hard change, and I am afraid we cannot keep
> > compatibility since many of the changes are user-facing.
> >
> >
> >
> > Zili Chen <wa...@gmail.com> 于2019年6月24日周一 下午2:53写道:
> >
> > > Hi all,
> > >
> > > After a closer look at our client APIs, I can see there are two major
> > > issues with consistency and integration, namely a different deployment of
> > > job cluster which couples job graph creation and cluster deployment,
> > > and submission via CliFrontend confusing control flow of job graph
> > > compilation and job submission. I'd like to follow the discuss above,
> > > mainly the process described by Jeff and Stephan, and share my
> > > ideas on these issues.
> > >
> > > 1) CliFrontend confuses the control flow of job compilation and
> > submission.
> > > Following the process of job submission Stephan and Jeff described,
> > > execution environment knows all configs of the cluster and
> topos/settings
> > > of the job. Ideally, in the main method of the user program, it calls
> > > #execute (or a method named #submit) and Flink deploys the cluster,
> > > compiles the job graph, and submits it to the cluster. However, the
> > > current CliFrontend does all
> these
> > > things inside its #runProgram method, which introduces a lot of
> > subclasses
> > > of (stream) execution environment.
> > >
> > > Actually, it sets up an exec env that hijacks the #execute/executePlan
> > > method, initializes the job graph, and aborts execution. Then control
> > > flows back to CliFrontend, which deploys the cluster (or retrieves
> > > the client) and submits the job graph. This is quite a specific internal
> > > process inside Flink and is not consistent with anything else.
> > >
> > > 2) Deployment of job cluster couples job graph creation and cluster
> > > deployment. Abstractly, from user job to a concrete submission, it
> > requires
> > >
> > >      create JobGraph --\
> > >
> > > create ClusterClient -->  submit JobGraph
> > >
> > > such a dependency. ClusterClient was created by deploying or
> retrieving.
> > > JobGraph submission requires a compiled JobGraph and valid
> ClusterClient,
> > > but the creation of ClusterClient is abstractly independent of that of
> > > JobGraph. However, in job cluster mode, we deploy a job cluster with a
> > > job graph, which means we use another process:
> > >
> > > create JobGraph --> deploy cluster with the JobGraph
> > >
> > > Here is another inconsistency, and downstream projects/client APIs are
> > > forced to handle different cases with little support from Flink.
> > >
> > > Since we likely reached a consensus on
> > >
> > > 1. all configs gathered by Flink configuration and passed
> > > 2. execution environment knows all configs and handles execution(both
> > > deployment and submission)
> > >
> > > to the issues above, I propose eliminating the inconsistencies by the
> > > following approach:
> > >
> > > 1) CliFrontend should exactly be a front end, at least for "run"
> command.
> > > That means it just gathers and passes all config from the command line to
> > > the main method of user program. Execution environment knows all the
> info
> > > and, with an added utility for ClusterClient, we can gracefully get a
> > > ClusterClient by deploying or retrieving. In this way, we don't need to
> > > hijack the #execute/executePlan methods and can remove various hacky
> > > subclasses of exec env, as well as the #run methods in ClusterClient (for
> > > an interface-ized ClusterClient). Now the control flow flows from
> > CliFrontend
> > > to the main method and never returns.
> > >
> > > 2) A job cluster means a cluster for a specific job. From another
> > > perspective, it is an ephemeral session. We may decouple the deployment
> > > from the compiled job graph, and instead start a session with an idle
> > > timeout and submit the job afterwards.
> > >
> > > Before we go into more detail on design or implementation, it is better
> > > to be aware of these topics and discuss them to reach a consensus.
> > >
> > > Best,
> > > tison.
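[Editor's note: Tison's dependency diagram above — ClusterClient creation independent of JobGraph creation, with submission joining the two — can be sketched as a deploy-or-retrieve utility. All names below are hypothetical; this only illustrates the decoupling being argued for.]

```java
import java.util.Optional;

/** Stand-in for a compiled job graph. */
interface JobGraph { String name(); }

/** Stand-in for a cluster client; submit() joins a client with a job graph. */
interface Client { String submit(JobGraph g); }

class ClientUtils {
    /** Retrieve the client of an already-running cluster if one exists,
     *  otherwise deploy a new cluster. Note that nothing here depends on a
     *  JobGraph: client creation and job-graph creation stay independent. */
    static Client deployOrRetrieve(Optional<Client> running) {
        return running.orElseGet(() -> g -> "deployed-and-submitted:" + g.name());
    }
}

public class SubmitDemo {
    public static void main(String[] args) {
        JobGraph graph = () -> "wordcount";                       // compiled independently
        Client client = ClientUtils.deployOrRetrieve(Optional.empty());
        System.out.println(client.submit(graph));                 // prints "deployed-and-submitted:wordcount"
    }
}
```

Under this split, "job cluster" becomes just the deploy branch of `deployOrRetrieve`, an ephemeral session that the job is then submitted to.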
> > >
> > >
> > > Zili Chen <wa...@gmail.com> 于2019年6月20日周四 上午3:21写道:
> > >
> > >> Hi Jeff,
> > >>
> > >> Thanks for raising this thread and the design document!
> > >>
> > >> As @Thomas Weise mentioned above, extending the config in Flink
> > >> requires far more effort than it should. Another example is that
> > >> we achieve detached mode by introducing another execution
> > >> environment which also hijacks the #execute method.
> > >>
> > >> I agree with your idea that the user configures everything
> > >> and Flink "just" respects it. On this topic I think the unusual
> > >> control flow when CliFrontend handles the "run" command is the
> > >> problem. It handles several configs, mainly about cluster settings,
> > >> and thus the main method of the user program is unaware of them. It
> > >> also compiles the app to a job graph by running the main method with
> > >> a hijacked exec env, which constrains the main method further.
> > >>
> > >> I'd like to write down a few notes on passing and respecting
> > >> configs/args, as well as on decoupling job compilation and
> > >> submission. I will share them on this thread later.
> > >>
> > >> Best,
> > >> tison.
> > >>
> > >>
> > >> SHI Xiaogang <sh...@gmail.com> 于2019年6月17日周一 下午7:29写道:
> > >>
> > >>> Hi Jeff and Flavio,
> > >>>
> > >>> Thanks Jeff a lot for proposing the design document.
> > >>>
> > >>> We are also working on refactoring ClusterClient to allow flexible
> and
> > >>> efficient job management in our real-time platform.
> > >>> We would like to draft a document to share our ideas with you.
> > >>>
> > >>> I think it's a good idea to have something like Apache Livy for
> Flink,
> > >>> and
> > >>> the efforts discussed here will take a great step forward to it.
> > >>>
> > >>> Regards,
> > >>> Xiaogang
> > >>>
> > >>> Flavio Pompermaier <po...@okkam.it> 于2019年6月17日周一 下午7:13写道:
> > >>>
> > >>> > Is there any possibility to have something like Apache Livy [1]
> also
> > >>> for
> > >>> > Flink in the future?
> > >>> >
> > >>> > [1] https://livy.apache.org/
> > >>> >
> > >>> > On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zj...@gmail.com>
> wrote:
> > >>> >
> > >>> > > >>>  Any API we expose should not have dependencies on the
> runtime
> > >>> > > (flink-runtime) package or other implementation details. To me,
> > this
> > >>> > means
> > >>> > > that the current ClusterClient cannot be exposed to users because
> > it
> > >>> >  uses
> > >>> > > quite some classes from the optimiser and runtime packages.
> > >>> > >
> > >>> > > We should change ClusterClient from class to interface.
> > >>> > > ExecutionEnvironment should only use the ClusterClient
> > >>> > > interface, which should live in flink-clients, while the
> > >>> > > concrete implementation class could be in flink-runtime.
> > >>> > >
> > >>> > > >>> What happens when a failure/restart in the client happens?
> > There
> > >>> need
> > >>> > > to be a way of re-establishing the connection to the job, set up
> > the
> > >>> > > listeners again, etc.
> > >>> > >
> > >>> > > Good point. First we need to define what a failure/restart in
> > >>> > > the client means. IIUC, that usually means a network failure,
> > >>> > > which would happen in the RestClient class. If my understanding
> > >>> > > is correct, the restart/retry mechanism should be implemented
> > >>> > > in RestClient.
> > >>> > >
> > >>> > >
> > >>> > >
> > >>> > >
> > >>> > >
> > >>> > > Aljoscha Krettek <al...@apache.org> 于2019年6月11日周二 下午11:10写道:
> > >>> > >
> > >>> > > > Some points to consider:
> > >>> > > >
> > >>> > > > * Any API we expose should not have dependencies on the runtime
> > >>> > > > (flink-runtime) package or other implementation details. To me,
> > >>> this
> > >>> > > means
> > >>> > > > that the current ClusterClient cannot be exposed to users
> because
> > >>> it
> > >>> > >  uses
> > >>> > > > quite some classes from the optimiser and runtime packages.
> > >>> > > >
> > >>> > > > * What happens when a failure/restart in the client happens?
> > There
> > >>> need
> > >>> > > to
> > >>> > > > be a way of re-establishing the connection to the job, set up
> the
> > >>> > > listeners
> > >>> > > > again, etc.
> > >>> > > >
> > >>> > > > Aljoscha
> > >>> > > >
> > >>> > > > > On 29. May 2019, at 10:17, Jeff Zhang <zj...@gmail.com>
> > wrote:
> > >>> > > > >
> > >>> > > > > Sorry folks, the design doc is late as you expected. Here's
> the
> > >>> > design
> > >>> > > > doc
> > >>> > > > > I drafted, welcome any comments and feedback.
> > >>> > > > >
> > >>> > > > >
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> >
> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> > >>> > > > >
> > >>> > > > >
> > >>> > > > >
> > >>> > > > > Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午8:43写道:
> > >>> > > > >
> > >>> > > > >> Nice that this discussion is happening.
> > >>> > > > >>
> > >>> > > > >> In the FLIP, we could also revisit the entire role of the
> > >>> > environments
> > >>> > > > >> again.
> > >>> > > > >>
> > >>> > > > >> Initially, the idea was:
> > >>> > > > >>  - the environments take care of the specific setup for
> > >>> standalone
> > >>> > (no
> > >>> > > > >> setup needed), yarn, mesos, etc.
> > >>> > > > >>  - the session ones have control over the session. The
> > >>> environment
> > >>> > > holds
> > >>> > > > >> the session client.
> > >>> > > > >>  - running a job gives a "control" object for that job. That
> > >>> > behavior
> > >>> > > is
> > >>> > > > >> the same in all environments.
> > >>> > > > >>
> > >>> > > > >> The actual implementation diverged quite a bit from that.
> > Happy
> > >>> to
> > >>> > > see a
> > >>> > > > >> discussion about straightening this out a bit more.
> > >>> > > > >>
> > >>> > > > >>
> > >>> > > > >> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <
> zjffdu@gmail.com>
> > >>> > wrote:
> > >>> > > > >>
> > >>> > > > >>> Hi folks,
> > >>> > > > >>>
> > >>> > > > >>> Sorry for late response, It seems we reach consensus on
> > this, I
> > >>> > will
> > >>> > > > >> create
> > >>> > > > >>> FLIP for this with more detailed design
> > >>> > > > >>>
> > >>> > > > >>>
> > >>> > > > >>> Thomas Weise <th...@apache.org> 于2018年12月21日周五 上午11:43写道:
> > >>> > > > >>>
> > >>> > > > >>>> Great to see this discussion seeded! The problems you face
> > >>> with
> > >>> > the
> > >>> > > > >>>> Zeppelin integration are also affecting other downstream
> > >>> projects,
> > >>> > > > like
> > >>> > > > >>>> Beam.
> > >>> > > > >>>>
> > >>> > > > >>>> We just enabled the savepoint restore option in
> > >>> > > > RemoteStreamEnvironment
> > >>> > > > >>> [1]
> > >>> > > > >>>> and that was more difficult than it should be. The main
> > issue
> > >>> is
> > >>> > > that
> > >>> > > > >>>> environment and cluster client aren't decoupled. Ideally
> it
> > >>> should
> > >>> > > be
> > >>> > > > >>>> possible to just get the matching cluster client from the
> > >>> > > environment
> > >>> > > > >> and
> > >>> > > > >>>> then control the job through it (environment as factory
> for
> > >>> > cluster
> > >>> > > > >>>> client). But note that the environment classes are part of
> > the
> > >>> > > public
> > >>> > > > >>> API,
> > >>> > > > >>>> and it is not straightforward to make larger changes
> without
> > >>> > > breaking
> > >>> > > > >>>> backward compatibility.
> > >>> > > > >>>>
> > >>> > > > >>>> ClusterClient currently exposes internal classes like
> > >>> JobGraph and
> > >>> > > > >>>> StreamGraph. But it should be possible to wrap this with a
> > new
> > >>> > > public
> > >>> > > > >> API
> > >>> > > > >>>> that brings the required job control capabilities for
> > >>> downstream
> > >>> > > > >>> projects.
> > >>> > > > >>>> Perhaps it is helpful to look at some of the interfaces in
> > >>> Beam
> > >>> > > while
> > >>> > > > >>>> thinking about this: [2] for the portable job API and [3]
> > for
> > >>> the
> > >>> > > old
> > >>> > > > >>>> asynchronous job control from the Beam Java SDK.
> > >>> > > > >>>>
> > >>> > > > >>>> The backward compatibility discussion [4] is also relevant
> > >>> here. A
> > >>> > > new
> > >>> > > > >>> API
> > >>> > > > >>>> should shield downstream projects from internals and allow
> > >>> them to
> > >>> > > > >>>> interoperate with multiple future Flink versions in the
> same
> > >>> > release
> > >>> > > > >> line
> > >>> > > > >>>> without forced upgrades.
> > >>> > > > >>>>
> > >>> > > > >>>> Thanks,
> > >>> > > > >>>> Thomas
> > >>> > > > >>>>
> > >>> > > > >>>> [1] https://github.com/apache/flink/pull/7249
> > >>> > > > >>>> [2]
> > >>> > > > >>>>
> > >>> > > > >>>>
> > >>> > > > >>>
> > >>> > > > >>
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> >
> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > >>> > > > >>>> [3]
> > >>> > > > >>>>
> > >>> > > > >>>>
> > >>> > > > >>>
> > >>> > > > >>
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> >
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > >>> > > > >>>> [4]
> > >>> > > > >>>>
> > >>> > > > >>>>
> > >>> > > > >>>
> > >>> > > > >>
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> >
> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> > >>> > > > >>>>
> > >>> > > > >>>>
> > >>> > > > >>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <
> > zjffdu@gmail.com>
> > >>> > > wrote:
> > >>> > > > >>>>
> > >>> > > > >>>>>>>> I'm not so sure whether the user should be able to
> > define
> > >>> > where
> > >>> > > > >> the
> > >>> > > > >>>> job
> > >>> > > > >>>>> runs (in your example Yarn). This is actually independent
> > of
> > >>> the
> > >>> > > job
> > >>> > > > >>>>> development and is something which is decided at
> deployment
> > >>> time.
> > >>> > > > >>>>>
> > >>> > > > >>>>> User don't need to specify execution mode
> programmatically.
> > >>> They
> > >>> > > can
> > >>> > > > >>> also
> > >>> > > > >>>>> pass the execution mode from the arguments in flink run
> > >>> command.
> > >>> > > e.g.
> > >>> > > > >>>>>
> > >>> > > > >>>>> bin/flink run -m yarn-cluster ....
> > >>> > > > >>>>> bin/flink run -m local ...
> > >>> > > > >>>>> bin/flink run -m host:port ...
> > >>> > > > >>>>>
> > >>> > > > >>>>> Does this make sense to you ?
> > >>> > > > >>>>>
> > >>> > > > >>>>>>>> To me it makes sense that the ExecutionEnvironment is
> > not
> > >>> > > > >> directly
> > >>> > > > >>>>> initialized by the user and instead context sensitive how
> > you
> > >>> > want
> > >>> > > to
> > >>> > > > >>>>> execute your job (Flink CLI vs. IDE, for example).
> > >>> > > > >>>>>
> > >>> > > > >>>>> Right, currently I notice Flink would create different
> > >>> > > > >>>>> ContextExecutionEnvironment based on different submission
> > >>> > scenarios
> > >>> > > > >>>> (Flink
> > >>> > > > >>>>> Cli vs IDE). To me this is kind of hack approach, not so
> > >>> > > > >>> straightforward.
> > >>> > > > >>>>> What I suggested above is that is that flink should
> always
> > >>> create
> > >>> > > the
> > >>> > > > >>>> same
> > >>> > > > >>>>> ExecutionEnvironment but with different configuration,
> and
> > >>> based
> > >>> > on
> > >>> > > > >> the
> > >>> > > > >>>>> configuration it would create the proper ClusterClient
> for
> > >>> > > different
> > >>> > > > >>>>> behaviors.
> > >>> > > > >>>>>
> > >>> > > > >>>>>
> > >>> > > > >>>>>
> > >>> > > > >>>>>
> > >>> > > > >>>>>
> > >>> > > > >>>>>
> > >>> > > > >>>>>
> > >>> > > > >>>>> Till Rohrmann <tr...@apache.org> 于2018年12月20日周四
> > >>> 下午11:18写道:
> > >>> > > > >>>>>
> > >>> > > > >>>>>> You are probably right that we have code duplication
> when
> > it
> > >>> > comes
> > >>> > > > >> to
> > >>> > > > >>>> the
> > >>> > > > >>>>>> creation of the ClusterClient. This should be reduced in
> > the
> > >>> > > > >> future.
> > >>> > > > >>>>>>
> > >>> > > > >>>>>> I'm not so sure whether the user should be able to
> define
> > >>> where
> > >>> > > the
> > >>> > > > >>> job
> > >>> > > > >>>>>> runs (in your example Yarn). This is actually
> independent
> > >>> of the
> > >>> > > > >> job
> > >>> > > > >>>>>> development and is something which is decided at
> > deployment
> > >>> > time.
> > >>> > > > >> To
> > >>> > > > >>> me
> > >>> > > > >>>>> it
> > >>> > > > >>>>>> makes sense that the ExecutionEnvironment is not
> directly
> > >>> > > > >> initialized
> > >>> > > > >>>> by
> > >>> > > > >>>>>> the user and instead context sensitive how you want to
> > >>> execute
> > >>> > > your
> > >>> > > > >>> job
> > >>> > > > >>>>>> (Flink CLI vs. IDE, for example). However, I agree that
> > the
> > >>> > > > >>>>>> ExecutionEnvironment should give you access to the
> > >>> ClusterClient
> > >>> > > > >> and
> > >>> > > > >>> to
> > >>> > > > >>>>> the
> > >>> > > > >>>>>> job (maybe in the form of the JobGraph or a job plan).
> > >>> > > > >>>>>>
> > >>> > > > >>>>>> Cheers,
> > >>> > > > >>>>>> Till
> > >>> > > > >>>>>>
> > >>> > > > >>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <
> > >>> zjffdu@gmail.com>
> > >>> > > > >> wrote:
> > >>> > > > >>>>>>
> > >>> > > > >>>>>>> Hi Till,
> > >>> > > > >>>>>>> Thanks for the feedback. You are right that I expect
> > better
> > >>> > > > >>>>> programmatic
> > >>> > > > >>>>>>> job submission/control api which could be used by
> > >>> downstream
> > >>> > > > >>> project.
> > >>> > > > >>>>> And
> > >>> > > > >>>>>>> it would benefit for the flink ecosystem. When I look
> at
> > >>> the
> > >>> > code
> > >>> > > > >>> of
> > >>> > > > >>>>>> flink
> > >>> > > > >>>>>>> scala-shell and sql-client (I believe they are not the
> > >>> core of
> > >>> > > > >>> flink,
> > >>> > > > >>>>> but
> > >>> > > > >>>>>>> belong to the ecosystem of flink), I find many
> duplicated
> > >>> code
> > >>> > > > >> for
> > >>> > > > >>>>>> creating
> > >>> > > > >>>>>>> ClusterClient from user provided configuration
> > >>> (configuration
> > >>> > > > >>> format
> > >>> > > > >>>>> may
> > >>> > > > >>>>>> be
> > >>> > > > >>>>>>> different from scala-shell and sql-client) and then use
> > >>> that
> > >>> > > > >>>>>> ClusterClient
> > >>> > > > >>>>>>> to manipulate jobs. I don't think this is convenient
> for
> > >>> > > > >> downstream
> > >>> > > > >>>>>>> projects. What I expect is that downstream project only
> > >>> needs
> > >>> > to
> > >>> > > > >>>>> provide
> > >>> > > > >>>>>>> necessary configuration info (maybe introducing class
> > >>> > FlinkConf),
> > >>> > > > >>> and
> > >>> > > > >>>>>> then
> > >>> > > > >>>>>>> build ExecutionEnvironment based on this FlinkConf, and
> > >>> > > > >>>>>>> ExecutionEnvironment will create the proper
> > ClusterClient.
> > >>> It
> > >>> > not
> > >>> > > > >>>> only
> > >>> > > > >>>>>>> benefit for the downstream project development but also
> > be
> > >>> > > > >> helpful
> > >>> > > > >>>> for
> > >>> > > > >>>>>>> their integration test with flink. Here's one sample
> code
> > >>> > snippet
> > >>> > > > >>>> that
> > >>> > > > >>>>> I
> > >>> > > > >>>>>>> expect.
> > >>> > > > >>>>>>>
> > >>> > > > >>>>>>> val conf = new FlinkConf().mode("yarn")
> > >>> > > > >>>>>>> val env = new ExecutionEnvironment(conf)
> > >>> > > > >>>>>>> val jobId = env.submit(...)
> > >>> > > > >>>>>>> val jobStatus =
> > >>> env.getClusterClient().queryJobStatus(jobId)
> > >>> > > > >>>>>>> env.getClusterClient().cancelJob(jobId)
> > >>> > > > >>>>>>>
> > >>> > > > >>>>>>> What do you think ?
> > >>> > > > >>>>>>>
> > >>> > > > >>>>>>>
> > >>> > > > >>>>>>>
> > >>> > > > >>>>>>>
> > >>> > > > >>>>>>> Till Rohrmann <tr...@apache.org> 于2018年12月11日周二
> > >>> 下午6:28写道:
> > >>> > > > >>>>>>>
> > >>> > > > >>>>>>>> Hi Jeff,
> > >>> > > > >>>>>>>>
> > >>> > > > >>>>>>>> what you are proposing is to provide the user with
> > better
> > >>> > > > >>>>> programmatic
> > >>> > > > >>>>>>> job
> > >>> > > > >>>>>>>> control. There was actually an effort to achieve this
> > but
> > >>> it
> > >>> > > > >> has
> > >>> > > > >>>>> never
> > >>> > > > >>>>>>> been
> > >>> > > > >>>>>>>> completed [1]. However, there are some improvement in
> > the
> > >>> code
> > >>> > > > >>> base
> > >>> > > > >>>>>> now.
> > >>> > > > >>>>>>>> Look for example at the NewClusterClient interface
> which
> > >>> > > > >> offers a
> > >>> > > > >>>>>>>> non-blocking job submission. But I agree that we need
> to
> > >>> > > > >> improve
> > >>> > > > >>>>> Flink
> > >>> > > > >>>>>> in
> > >>> > > > >>>>>>>> this regard.
> > >>> > > > >>>>>>>>
> > >>> > > > >>>>>>>> I would not be in favour if exposing all ClusterClient
> > >>> calls
> > >>> > > > >> via
> > >>> > > > >>>> the
> > >>> > > > >>>>>>>> ExecutionEnvironment because it would clutter the
> class
> > >>> and
> > >>> > > > >> would
> > >>> > > > >>>> not
> > >>> > > > >>>>>> be
> > >>> > > > >>>>>>> a
> > >>> > > > >>>>>>>> good separation of concerns. Instead one idea could be
> > to
> > >>> > > > >>> retrieve
> > >>> > > > >>>>> the
> > >>> > > > >>>>>>>> current ClusterClient from the ExecutionEnvironment
> > which
> > >>> can
> > >>> > > > >>> then
> > >>> > > > >>>> be
> > >>> > > > >>>>>>> used
> > >>> > > > >>>>>>>> for cluster and job control. But before we start an
> > effort
> > >>> > > > >> here,
> > >>> > > > >>> we
> > >>> > > > >>>>>> need
> > >>> > > > >>>>>>> to
> > >>> > > > >>>>>>>> agree and capture what functionality we want to
> provide.
> > >>> > > > >>>>>>>>
> > >>> > > > >>>>>>>> Initially, the idea was that we have the
> > ClusterDescriptor
> > >>> > > > >>>> describing
> > >>> > > > >>>>>> how
> > >>> > > > >>>>>>>> to talk to cluster manager like Yarn or Mesos. The
> > >>> > > > >>>> ClusterDescriptor
> > >>> > > > >>>>>> can
> > >>> > > > >>>>>>> be
> > >>> > > > >>>>>>>> used for deploying Flink clusters (job and session)
> and
> > >>> gives
> > >>> > > > >>> you a
> > >>> > > > >>>>>>>> ClusterClient. The ClusterClient controls the cluster
> > >>> (e.g.
> > >>> > > > >>>>> submitting
> > >>> > > > >>>>>>>> jobs, listing all running jobs). And then there was
> the
> > >>> idea
> > >>> > to
> > >>> > > > >>>>>>> introduce a
> > >>> > > > >>>>>>>> JobClient which you obtain from the ClusterClient to
> > >>> trigger
> > >>> > > > >> job
> > >>> > > > >>>>>> specific
> > >>> > > > >>>>>>>> operations (e.g. taking a savepoint, cancelling the
> > job).
> > >>> > > > >>>>>>>>
> > >>> > > > >>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-4272
> > >>> > > > >>>>>>>>
> > >>> > > > >>>>>>>> Cheers,
> > >>> > > > >>>>>>>> Till
> > >>> > > > >>>>>>>>
> > >>> > > > >>>>>>>> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <
> > >>> zjffdu@gmail.com
> > >>> > >
> > >>> > > > >>>>> wrote:
> > >>> > > > >>>>>>>>
> > >>> > > > >>>>>>>>> Hi Folks,
> > >>> > > > >>>>>>>>>
> > >>> > > > >>>>>>>>> I am trying to integrate flink into apache zeppelin
> > >>> which is
> > >>> > > > >> an
> > >>> > > > >>>>>>>> interactive
> > >>> > > > >>>>>>>>> notebook. And I hit several issues that is caused by
> > >>> flink
> > >>> > > > >>> client
> > >>> > > > >>>>>> api.
> > >>> > > > >>>>>>> So
> > >>> > > > >>>>>>>>> I'd like to proposal the following changes for flink
> > >>> client
> > >>> > > > >>> api.
> > >>> > > > >>>>>>>>>
> > >>> > > > >>>>>>>>> 1. Support nonblocking execution. Currently,
> > >>> > > > >>>>>>> ExecutionEnvironment#execute
> > >>> > > > >>>>>>>>> is a blocking method which would do 2 things, first
> > >>> submit
> > >>> > > > >> job
> > >>> > > > >>>> and
> > >>> > > > >>>>>> then
> > >>> > > > >>>>>>>>> wait for job until it is finished. I'd like
> introduce a
> > >>> > > > >>>> nonblocking
> > >>> > > > >>>>>>>>> execution method like ExecutionEnvironment#submit
> which
> > >>> only
> > >>> > > > >>>> submit
> > >>> > > > >>>>>> job
> > >>> > > > >>>>>>>> and
> > >>> > > > >>>>>>>>> then return jobId to client. And allow user to query
> > the
> > >>> job
> > >>> > > > >>>> status
> > >>> > > > >>>>>> via
> > >>> > > > >>>>>>>> the
> > >>> > > > >>>>>>>>> jobId.
> > >>> > > > >>>>>>>>>
> > >>> > > > >>>>>>>>> 2. Add cancel api in
> > >>> > > > >>>>> ExecutionEnvironment/StreamExecutionEnvironment,
> > >>> > > > >>>>>>>>> currently the only way to cancel job is via cli
> > >>> (bin/flink),
> > >>> > > > >>> this
> > >>> > > > >>>>> is
> > >>> > > > >>>>>>> not
> > >>> > > > >>>>>>>>> convenient for downstream project to use this
> feature.
> > >>> So I'd
> > >>> > > > >>>> like
> > >>> > > > >>>>> to
> > >>> > > > >>>>>>> add
> > >>> > > > >>>>>>>>> cancel api in ExecutionEnvironment
> > >>> > > > >>>>>>>>>
> > >>> > > > >>>>>>>>> 3. Add savepoint api in
> > >>> > > > >>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
> > >>> > > > >>>>>>>> It
> > >>> > > > >>>>>>>>> is similar as cancel api, we should use
> > >>> ExecutionEnvironment
> > >>> > > > >> as
> > >>> > > > >>>> the
> > >>> > > > >>>>>>>> unified
> > >>> > > > >>>>>>>>> api for third party to integrate with flink.
> > >>> > > > >>>>>>>>>
> > >>> > > > >>>>>>>>> 4. Add listener for job execution lifecycle.
> Something
> > >>> like
> > >>> > > > >>>>>> following,
> > >>> > > > >>>>>>> so
> > >>> > > > >>>>>>>>> that downstream project can do custom logic in the
> > >>> lifecycle
> > >>> > > > >> of
> > >>> > > > >>>>> job.
> > >>> > > > >>>>>>> e.g.
> > >>> > > > >>>>>>>>> Zeppelin would capture the jobId after job is
> submitted
> > >>> and
> > >>> > > > >>> then
> > >>> > > > >>>>> use
> > >>> > > > >>>>>>> this
> > >>> > > > >>>>>>>>> jobId to cancel it later when necessary.
> > >>> > > > >>>>>>>>>
> > >>> > > > >>>>>>>>> public interface JobListener {
> > >>> > > > >>>>>>>>>
> > >>> > > > >>>>>>>>>   void onJobSubmitted(JobID jobId);
> > >>> > > > >>>>>>>>>
> > >>> > > > >>>>>>>>>   void onJobExecuted(JobExecutionResult jobResult);
> > >>> > > > >>>>>>>>>
> > >>> > > > >>>>>>>>>   void onJobCanceled(JobID jobId);
> > >>> > > > >>>>>>>>> }
> > >>> > > > >>>>>>>>>
> > >>> > > > >>>>>>>>> 5. Enable session in ExecutionEnvironment. Currently
> it
> > >>> is
> > >>> > > > >>>>> disabled,
> > >>> > > > >>>>>>> but
> > >>> > > > >>>>>>>>> session is very convenient for third party to
> > submitting
> > >>> jobs
> > >>> > > > >>>>>>>> continually.
> > >>> > > > >>>>>>>>> I hope flink can enable it again.
> > >>> > > > >>>>>>>>>
> > >>> > > > >>>>>>>>> 6. Unify all flink client api into
> > >>> > > > >>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
> > >>> > > > >>>>>>>>>
> > >>> > > > >>>>>>>>> This is a long term issue which needs more careful
> > >>> thinking
> > >>> > > > >> and
> > >>> > > > >>>>>> design.
> > >>> > > > >>>>>>>>> Currently some of features of flink is exposed in
> > >>> > > > >>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment, but
> > >>> some are
> > >>> > > > >>>>> exposed
> > >>> > > > >>>>>>> in
> > >>> > > > >>>>>>>>> cli instead of api, like the cancel and savepoint I
> > >>> mentioned
> > >>> > > > >>>>> above.
> > >>> > > > >>>>>> I
> > >>> > > > >>>>>>>>> think the root cause is due to that flink didn't
> unify
> > >>> the
> > >>> > > > >>>>>> interaction
> > >>> > > > >>>>>>>> with
> > >>> > > > >>>>>>>>> flink. Here I list 3 scenarios of flink operation
> > >>> > > > >>>>>>>>>
> > >>> > > > >>>>>>>>>   - Local job execution.  Flink will create
> > >>> LocalEnvironment
> > >>> > > > >>> and
> > >>> > > > >>>>>> then
> > >>> > > > >>>>>>>> use
> > >>> > > > >>>>>>>>>   this LocalEnvironment to create LocalExecutor for
> job
> > >>> > > > >>>> execution.
> > >>> > > > >>>>>>>>>   - Remote job execution. Flink will create
> > ClusterClient
> > >>> > > > >>> first
> > >>> > > > >>>>> and
> > >>> > > > >>>>>>> then
> > >>> > > > >>>>>>>>>   create ContextEnvironment based on the
> ClusterClient
> > >>> and
> > >>> > > > >>> then
> > >>> > > > >>>>> run
> > >>> > > > >>>>>>> the
> > >>> > > > >>>>>>>>> job.
> > >>> > > > >>>>>>>>>   - Job cancelation. Flink will create ClusterClient
> > >>> first
> > >>> > > > >> and
> > >>> > > > >>>>> then
> > >>> > > > >>>>>>>> cancel
> > >>> > > > >>>>>>>>>   this job via this ClusterClient.
> > >>> > > > >>>>>>>>>
> > >>> > > > >>>>>>>>> As you can see in the above 3 scenarios. Flink didn't
> > >>> use the
> > >>> > > > >>>> same
> > >>> > > > >>>>>>>>> approach(code path) to interact with flink
> > >>> > > > >>>>>>>>> What I propose is following:
> > >>> > > > >>>>>>>>> Create the proper LocalEnvironment/RemoteEnvironment
> > >>> (based
> > >>> > > > >> on
> > >>> > > > >>>> user
> > >>> > > > >>>>>>>>> configuration) --> Use this Environment to create
> > proper
> > >>> > > > >>>>>> ClusterClient
> > >>> > > > >>>>>>>>> (LocalClusterClient or RestClusterClient) to
> > interactive
> > >>> with
> > >>> > > > >>>>> Flink (
> > >>> > > > >>>>>>> job
> > >>> > > > >>>>>>>>> execution or cancelation)
> > >>> > > > >>>>>>>>>
> > >>> > > > >>>>>>>>> This way we can unify the process of local execution
> > and
> > >>> > > > >> remote
> > >>> > > > >>>>>>>> execution.
> > >>> > > > >>>>>>>>> And it is much easier for third party to integrate
> with
> > >>> > > > >> flink,
> > >>> > > > >>>>>> because
> > >>> > > > >>>>>>>>> ExecutionEnvironment is the unified entry point for
> > >>> flink.
> > >>> > > > >> What
> > >>> > > > >>>>> third
> > >>> > > > >>>>>>>> party
> > >>> > > > >>>>>>>>> needs to do is just pass configuration to
> > >>> > > > >> ExecutionEnvironment
> > >>> > > > >>>> and
> > >>> > > > >>>>>>>>> ExecutionEnvironment will do the right thing based on
> > the
> > >>> > > > >>>>>>> configuration.
> > >>> > > > >>>>>>>>> Flink cli can also be considered as flink api
> consumer.
> > >>> it
> > >>> > > > >> just
> > >>> > > > >>>>> pass
> > >>> > > > >>>>>>> the
> > >>> > > > >>>>>>>>> configuration to ExecutionEnvironment and let
> > >>> > > > >>>> ExecutionEnvironment
> > >>> > > > >>>>> to
> > >>> > > > >>>>>>>>> create the proper ClusterClient instead of letting
> cli
> > to
> > >>> > > > >>> create
> > >>> > > > >>>>>>>>> ClusterClient directly.
> > >>> > > > >>>>>>>>>
> > >>> > > > >>>>>>>>>
> > >>> > > > >>>>>>>>> 6 would involve large code refactoring, so I think we
> > can
> > >>> > > > >> defer
> > >>> > > > >>>> it
> > >>> > > > >>>>>> for
> > >>> > > > >>>>>>>>> future release, 1,2,3,4,5 could be done at once I
> > >>> believe.
> > >>> > > > >> Let
> > >>> > > > >>> me
> > >>> > > > >>>>>> know
> > >>> > > > >>>>>>>> your
> > >>> > > > >>>>>>>>> comments and feedback, thanks
> > >>> > > > >>>>>>>>>
> > >>> > > > >>>>>>>>>
> > >>> > > > >>>>>>>>>
> > >>> > > > >>>>>>>>> --
> > >>> > > > >>>>>>>>> Best Regards
> > >>> > > > >>>>>>>>>
> > >>> > > > >>>>>>>>> Jeff Zhang
> > >>> > > > >>>>>>>>>
> > >>> > > > >>>>>>>>
> > >>> > > > >>>>>>>
> > >>> > > > >>>>>>>
> > >>> > > > >>>>>>> --
> > >>> > > > >>>>>>> Best Regards
> > >>> > > > >>>>>>>
> > >>> > > > >>>>>>> Jeff Zhang
> > >>> > > > >>>>>>>
> > >>> > > > >>>>>>
> > >>> > > > >>>>>
> > >>> > > > >>>>>
> > >>> > > > >>>>> --
> > >>> > > > >>>>> Best Regards
> > >>> > > > >>>>>
> > >>> > > > >>>>> Jeff Zhang
> > >>> > > > >>>>>
> > >>> > > > >>>>
> > >>> > > > >>>
> > >>> > > > >>>
> > >>> > > > >>> --
> > >>> > > > >>> Best Regards
> > >>> > > > >>>
> > >>> > > > >>> Jeff Zhang
> > >>> > > > >>>
> > >>> > > > >>
> > >>> > > > >
> > >>> > > > >
> > >>> > > > > --
> > >>> > > > > Best Regards
> > >>> > > > >
> > >>> > > > > Jeff Zhang
> > >>> > > >
> > >>> > > >
> > >>> > >
> > >>> > > --
> > >>> > > Best Regards
> > >>> > >
> > >>> > > Jeff Zhang
> > >>> > >
> > >>> >
> > >>>
> > >>
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
> >
>

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Flavio Pompermaier <po...@okkam.it>.
That's exactly what I suggested a long time ago: the Flink REST client
should not require any Flink dependency, only an HTTP library to call the
REST services that submit and monitor a job.
What I also suggested in [1] was a way to automatically suggest to the
user (via a UI) the available main classes and their required parameters [2].
Another problem we have with Flink is that the REST client and the CLI one
behave differently, and we use the CLI client (via ssh) because it allows us
to call other methods after env.execute() [3] (we have to call another
REST service to signal the end of the job).
In this regard, a dedicated interface, like the JobListener suggested in
the previous emails, would be very helpful (IMHO).

[1] https://issues.apache.org/jira/browse/FLINK-10864
[2] https://issues.apache.org/jira/browse/FLINK-10862
[3] https://issues.apache.org/jira/browse/FLINK-10879

Best,
Flavio
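Flavio's point, that submission and monitoring need nothing beyond plain HTTP, can be sketched with only the JDK. The endpoint paths below (`/jars/:jarid/run`, `/jobs/:jobid`, `?mode=cancel`) follow Flink's monitoring REST API but should be verified against the target Flink version; the class and method names themselves are illustrative, not an existing API.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Sketch of a dependency-free Flink REST "client": plain JDK HTTP only,
 * no flink-* jars on the classpath. Error handling is omitted for brevity.
 */
class BareRestClient {
    private final String baseUrl;
    private final HttpClient http = HttpClient.newHttpClient();

    BareRestClient(String baseUrl) {
        // Normalize so the URI builders below can always append "/...".
        this.baseUrl = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
    }

    /** URI that runs a previously uploaded jar (POST in real use). */
    URI runJarUri(String jarId) {
        return URI.create(baseUrl + "/jars/" + jarId + "/run");
    }

    /** URI that returns the status of a job (GET). */
    URI jobStatusUri(String jobId) {
        return URI.create(baseUrl + "/jobs/" + jobId);
    }

    /** URI that cancels a job (PATCH with mode=cancel in real use). */
    URI cancelUri(String jobId) {
        return URI.create(baseUrl + "/jobs/" + jobId + "?mode=cancel");
    }

    /** Issues the status request and returns the raw JSON body. */
    String fetchJobStatus(String jobId) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(jobStatusUri(jobId)).GET().build();
        return http.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

A monitoring loop would then poll fetchJobStatus until the reported state is terminal, which is exactly what [3] asks for without going through the CLI.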

On Mon, Jun 24, 2019 at 9:54 AM Jeff Zhang <zj...@gmail.com> wrote:

> Hi, Tison,
>
> Thanks for your comments. Overall I agree with you that it is difficult
> for downstream projects to integrate with Flink and that we need to
> refactor the current Flink client API.
> And I agree that CliFrontend should only parse command-line arguments and
> then pass them to ExecutionEnvironment. It is ExecutionEnvironment's
> responsibility to compile the job, create the cluster, and submit the job.
> Besides that, Flink currently has many ExecutionEnvironment
> implementations, and Flink uses a specific one based on the context.
> IMHO, this is not necessary: ExecutionEnvironment should be able to do
> the right thing based on the FlinkConf it receives. The many
> ExecutionEnvironment implementations are another burden for downstream
> project integration.
>
> One thing I'd like to mention is Flink's Scala shell and SQL client:
> although they are sub-modules of Flink, they could be treated as
> downstream projects that use Flink's client API. Currently it is not easy
> for them to integrate with Flink, and they share a lot of duplicated
> code. It is another sign that we should refactor the Flink client API.
>
> I believe this is a large and hard change, and I am afraid we cannot keep
> compatibility since many of the changes are user-facing.
>
>
>
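The FlinkConf-driven entry point described above, and sketched earlier in this thread as `val conf = new FlinkConf().mode("yarn")`, could look roughly as follows. Every name here (FlinkConf, UnifiedExecutionEnvironment, the shape of the ClusterClient interface) is hypothetical, taken from the proposal rather than from any released Flink API, and the in-memory client exists only to make the control flow concrete.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical config holder from the proposal: the single source of truth. */
class FlinkConf {
    private final Map<String, String> settings = new HashMap<>();
    FlinkConf mode(String mode) { settings.put("execution.mode", mode); return this; }
    String mode() { return settings.getOrDefault("execution.mode", "local"); }
}

/** Stand-in for the proposed interface-ized ClusterClient. */
interface ClusterClient {
    String submitJob(String jobGraph);
    String queryJobStatus(String jobId);
    void cancelJob(String jobId);
}

/** The env picks the right client from the conf; callers never branch on mode. */
class UnifiedExecutionEnvironment {
    private final ClusterClient client;

    UnifiedExecutionEnvironment(FlinkConf conf) {
        // A real implementation would deploy or retrieve a YARN/local/remote
        // cluster based on conf.mode(); this in-memory client just records
        // submissions so the flow can be exercised.
        this.client = new ClusterClient() {
            private final Map<String, String> jobs = new HashMap<>();
            public String submitJob(String jobGraph) {
                String id = UUID.randomUUID().toString();
                jobs.put(id, "RUNNING");
                return id;
            }
            public String queryJobStatus(String jobId) {
                return jobs.getOrDefault(jobId, "UNKNOWN");
            }
            public void cancelJob(String jobId) { jobs.put(jobId, "CANCELED"); }
        };
    }

    /** Non-blocking submit: returns a job id instead of waiting for the result. */
    String submit(String jobGraph) { return client.submitJob(jobGraph); }

    ClusterClient getClusterClient() { return client; }
}
```

With this shape, a downstream project (Zeppelin, the SQL client, a scheduler) only ever constructs a FlinkConf and an environment; cancel and status queries go through the client the environment hands back, never through the CLI.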
> Zili Chen <wa...@gmail.com> 于2019年6月24日周一 下午2:53写道:
>
> > Hi all,
> >
> > After a closer look at our client APIs, I can see two major issues with
> > consistency and integration: the different deployment of a job cluster,
> > which couples job graph creation and cluster deployment, and submission
> > via CliFrontend, which confuses the control flow of job graph
> > compilation and job submission. I'd like to follow the discussion
> > above, mainly the process described by Jeff and Stephan, and share my
> > ideas on these issues.
> >
> > 1) CliFrontend confuses the control flow of job compilation and
> > submission. Following the process of job submission that Stephan and
> > Jeff described, the execution environment knows all configs of the
> > cluster and the topology/settings of the job. Ideally, the main method
> > of the user program calls #execute (or a method named #submit) and
> > Flink deploys the cluster, compiles the job graph and submits it to the
> > cluster. However, the current CliFrontend does all these things inside
> > its #runProgram method, which introduces a lot of subclasses of the
> > (stream) execution environment.
> >
> > Actually, it sets up an exec env that hijacks the #execute/executePlan
> > method, initializes the job graph and aborts execution. Then control
> > flow goes back to CliFrontend, which deploys the cluster (or retrieves
> > the client) and submits the job graph. This is quite a specific
> > internal process inside Flink, consistent with nothing else.
> >
> > 2) Deployment of a job cluster couples job graph creation and cluster
> > deployment. Abstractly, going from a user job to a concrete submission
> > requires a dependency like
> >
> >      create JobGraph --\
> >
> > create ClusterClient -->  submit JobGraph
> >
> > A ClusterClient is created by deploying or retrieving. JobGraph
> > submission requires a compiled JobGraph and a valid ClusterClient, but
> > the creation of the ClusterClient is abstractly independent of that of
> > the JobGraph. However, in job cluster mode we deploy the job cluster
> > with a job graph, which means we use another process:
> >
> > create JobGraph --> deploy cluster with the JobGraph
> >
> > This is another inconsistency, and downstream projects/client APIs are
> > forced to handle different cases with little support from Flink.
> >
> > Since we likely reached a consensus on
> >
> > 1. all configs are gathered into the Flink configuration and passed on
> > 2. the execution environment knows all configs and handles execution (both
> > deployment and submission)
> >
> > I propose eliminating the inconsistencies in the issues above with the
> > following approach:
> >
> > 1) CliFrontend should be exactly a front end, at least for the "run"
> > command. That means it just gathers and passes all config from the command
> > line to the main method of the user program. The execution environment
> > knows all the info and, with an added utility for ClusterClient, we
> > gracefully get a ClusterClient by deploying or retrieving. In this way, we
> > don't need to hijack the #execute/#executePlan methods and can remove
> > various hacky subclasses of the exec env, as well as the #run methods in
> > ClusterClient (for an interface-ized ClusterClient). Now control flows
> > from CliFrontend to the main method and never returns.
> >
> > 2) A job cluster means a cluster for a specific job. From another
> > perspective, it is an ephemeral session. We may decouple the deployment
> > from a compiled job graph by starting a session with an idle timeout
> > and submitting the job afterwards.
> >
> > These topics, before we go into more details on design or implementation,
> > are better made explicit and discussed for a consensus.
> >
> > Best,
> > tison.
> >
> >
> > Zili Chen <wa...@gmail.com> 于2019年6月20日周四 上午3:21写道:
> >
> >> Hi Jeff,
> >>
> >> Thanks for raising this thread and the design document!
> >>
> >> As @Thomas Weise mentioned above, extending config to flink
> >> requires far more effort than it should. Another example
> >> is that we achieve detached mode by introducing another execution
> >> environment which also hijacks the #execute method.
> >>
> >> I agree with your idea that the user would configure all things
> >> and flink would "just" respect it. On this topic I think the unusual
> >> control flow when CliFrontend handles the "run" command is the problem.
> >> It handles several configs, mainly about cluster settings, and
> >> thus the main method of the user program is unaware of them. Also it
> >> compiles the app to a job graph by running the main method with a
> >> hijacked exec env, which constrains the main method further.
> >>
> >> I'd like to write down a few notes on config/args passing and handling,
> >> as well as on decoupling job compilation and submission, and share them
> >> on this thread later.
> >>
> >> Best,
> >> tison.
> >>
> >>
> >> SHI Xiaogang <sh...@gmail.com> 于2019年6月17日周一 下午7:29写道:
> >>
> >>> Hi Jeff and Flavio,
> >>>
> >>> Thanks Jeff a lot for proposing the design document.
> >>>
> >>> We are also working on refactoring ClusterClient to allow flexible and
> >>> efficient job management in our real-time platform.
> >>> We would like to draft a document to share our ideas with you.
> >>>
> >>> I think it's a good idea to have something like Apache Livy for Flink,
> >>> and
> >>> the efforts discussed here will take a great step forward to it.
> >>>
> >>> Regards,
> >>> Xiaogang
> >>>
> >>> Flavio Pompermaier <po...@okkam.it> 于2019年6月17日周一 下午7:13写道:
> >>>
> >>> > Is there any possibility to have something like Apache Livy [1] also
> >>> for
> >>> > Flink in the future?
> >>> >
> >>> > [1] https://livy.apache.org/
> >>> >
> >>> > On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zj...@gmail.com> wrote:
> >>> >
> >>> > > >>>  Any API we expose should not have dependencies on the runtime
> >>> > > (flink-runtime) package or other implementation details. To me,
> this
> >>> > means
> >>> > > that the current ClusterClient cannot be exposed to users because
> it
> >>> >  uses
> >>> > > quite some classes from the optimiser and runtime packages.
> >>> > >
> >>> > > We should change ClusterClient from a class to an interface.
> >>> > > ExecutionEnvironment would only use the ClusterClient interface,
> >>> > > which should live in flink-clients, while the concrete
> >>> > > implementation class could be in flink-runtime.
> >>> > >
> >>> > > >>> What happens when a failure/restart in the client happens?
> There
> >>> need
> >>> > > to be a way of re-establishing the connection to the job, set up
> the
> >>> > > listeners again, etc.
> >>> > >
> >>> > > Good point. First we need to define what failure/restart in the
> >>> > > client means. IIUC, that usually means a network failure, which
> >>> > > will occur in the RestClient class. If my understanding is correct,
> >>> > > the restart/retry mechanism should be implemented in RestClient.
> >>> > >
> >>> > >
> >>> > >
> >>> > >
> >>> > >
> >>> > > Aljoscha Krettek <al...@apache.org> 于2019年6月11日周二 下午11:10写道:
> >>> > >
> >>> > > > Some points to consider:
> >>> > > >
> >>> > > > * Any API we expose should not have dependencies on the runtime
> >>> > > > (flink-runtime) package or other implementation details. To me,
> >>> this
> >>> > > means
> >>> > > > that the current ClusterClient cannot be exposed to users because
> >>> it
> >>> > >  uses
> >>> > > > quite some classes from the optimiser and runtime packages.
> >>> > > >
> >>> > > > * What happens when a failure/restart in the client happens?
> There
> >>> need
> >>> > > to
> >>> > > > be a way of re-establishing the connection to the job, set up the
> >>> > > listeners
> >>> > > > again, etc.
> >>> > > >
> >>> > > > Aljoscha
> >>> > > >
> >>> > > > > On 29. May 2019, at 10:17, Jeff Zhang <zj...@gmail.com>
> wrote:
> >>> > > > >
> >>> > > > > Sorry folks, the design doc is late as you expected. Here's the
> >>> > design
> >>> > > > doc
> >>> > > > > I drafted, welcome any comments and feedback.
> >>> > > > >
> >>> > > > >
> >>> > > >
> >>> > >
> >>> >
> >>>
> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> >>> > > > >
> >>> > > > >
> >>> > > > >
> >>> > > > > Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午8:43写道:
> >>> > > > >
> >>> > > > >> Nice that this discussion is happening.
> >>> > > > >>
> >>> > > > >> In the FLIP, we could also revisit the entire role of the
> >>> > environments
> >>> > > > >> again.
> >>> > > > >>
> >>> > > > >> Initially, the idea was:
> >>> > > > >>  - the environments take care of the specific setup for
> >>> standalone
> >>> > (no
> >>> > > > >> setup needed), yarn, mesos, etc.
> >>> > > > >>  - the session ones have control over the session. The
> >>> environment
> >>> > > holds
> >>> > > > >> the session client.
> >>> > > > >>  - running a job gives a "control" object for that job. That
> >>> > behavior
> >>> > > is
> >>> > > > >> the same in all environments.
> >>> > > > >>
> >>> > > > >> The actual implementation diverged quite a bit from that.
> Happy
> >>> to
> >>> > > see a
> >>> > > > >> discussion about straightening this out a bit more.
> >>> > > > >>
> >>> > > > >>
> >>> > > > >> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zj...@gmail.com>
> >>> > wrote:
> >>> > > > >>
> >>> > > > >>> Hi folks,
> >>> > > > >>>
> >>> > > > >>> Sorry for late response, It seems we reach consensus on
> this, I
> >>> > will
> >>> > > > >> create
> >>> > > > >>> FLIP for this with more detailed design
> >>> > > > >>>
> >>> > > > >>>
> >>> > > > >>> Thomas Weise <th...@apache.org> 于2018年12月21日周五 上午11:43写道:
> >>> > > > >>>
> >>> > > > >>>> Great to see this discussion seeded! The problems you face
> >>> with
> >>> > the
> >>> > > > >>>> Zeppelin integration are also affecting other downstream
> >>> projects,
> >>> > > > like
> >>> > > > >>>> Beam.
> >>> > > > >>>>
> >>> > > > >>>> We just enabled the savepoint restore option in
> >>> > > > RemoteStreamEnvironment
> >>> > > > >>> [1]
> >>> > > > >>>> and that was more difficult than it should be. The main
> issue
> >>> is
> >>> > > that
> >>> > > > >>>> environment and cluster client aren't decoupled. Ideally it
> >>> should
> >>> > > be
> >>> > > > >>>> possible to just get the matching cluster client from the
> >>> > > environment
> >>> > > > >> and
> >>> > > > >>>> then control the job through it (environment as factory for
> >>> > cluster
> >>> > > > >>>> client). But note that the environment classes are part of
> the
> >>> > > public
> >>> > > > >>> API,
> >>> > > > >>>> and it is not straightforward to make larger changes without
> >>> > > breaking
> >>> > > > >>>> backward compatibility.
> >>> > > > >>>>
> >>> > > > >>>> ClusterClient currently exposes internal classes like
> >>> JobGraph and
> >>> > > > >>>> StreamGraph. But it should be possible to wrap this with a
> new
> >>> > > public
> >>> > > > >> API
> >>> > > > >>>> that brings the required job control capabilities for
> >>> downstream
> >>> > > > >>> projects.
> >>> > > > >>>> Perhaps it is helpful to look at some of the interfaces in
> >>> Beam
> >>> > > while
> >>> > > > >>>> thinking about this: [2] for the portable job API and [3]
> for
> >>> the
> >>> > > old
> >>> > > > >>>> asynchronous job control from the Beam Java SDK.
> >>> > > > >>>>
> >>> > > > >>>> The backward compatibility discussion [4] is also relevant
> >>> here. A
> >>> > > new
> >>> > > > >>> API
> >>> > > > >>>> should shield downstream projects from internals and allow
> >>> them to
> >>> > > > >>>> interoperate with multiple future Flink versions in the same
> >>> > release
> >>> > > > >> line
> >>> > > > >>>> without forced upgrades.
> >>> > > > >>>>
> >>> > > > >>>> Thanks,
> >>> > > > >>>> Thomas
> >>> > > > >>>>
> >>> > > > >>>> [1] https://github.com/apache/flink/pull/7249
> >>> > > > >>>> [2]
> >>> > > > >>>>
> >>> > > > >>>>
> >>> > > > >>>
> >>> > > > >>
> >>> > > >
> >>> > >
> >>> >
> >>>
> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> >>> > > > >>>> [3]
> >>> > > > >>>>
> >>> > > > >>>>
> >>> > > > >>>
> >>> > > > >>
> >>> > > >
> >>> > >
> >>> >
> >>>
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> >>> > > > >>>> [4]
> >>> > > > >>>>
> >>> > > > >>>>
> >>> > > > >>>
> >>> > > > >>
> >>> > > >
> >>> > >
> >>> >
> >>>
> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> >>> > > > >>>>
> >>> > > > >>>>
> >>> > > > >>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <
> zjffdu@gmail.com>
> >>> > > wrote:
> >>> > > > >>>>
> >>> > > > >>>>>>>> I'm not so sure whether the user should be able to
> define
> >>> > where
> >>> > > > >> the
> >>> > > > >>>> job
> >>> > > > >>>>> runs (in your example Yarn). This is actually independent
> of
> >>> the
> >>> > > job
> >>> > > > >>>>> development and is something which is decided at deployment
> >>> time.
> >>> > > > >>>>>
> >>> > > > >>>>> User don't need to specify execution mode programmatically.
> >>> They
> >>> > > can
> >>> > > > >>> also
> >>> > > > >>>>> pass the execution mode from the arguments in flink run
> >>> command.
> >>> > > e.g.
> >>> > > > >>>>>
> >>> > > > >>>>> bin/flink run -m yarn-cluster ....
> >>> > > > >>>>> bin/flink run -m local ...
> >>> > > > >>>>> bin/flink run -m host:port ...
> >>> > > > >>>>>
> >>> > > > >>>>> Does this make sense to you ?
> >>> > > > >>>>>
> >>> > > > >>>>>>>> To me it makes sense that the ExecutionEnvironment is
> not
> >>> > > > >> directly
> >>> > > > >>>>> initialized by the user and instead context sensitive how
> you
> >>> > want
> >>> > > to
> >>> > > > >>>>> execute your job (Flink CLI vs. IDE, for example).
> >>> > > > >>>>>
> >>> > > > >>>>> Right, currently I notice Flink would create different
> >>> > > > >>>>> ContextExecutionEnvironment based on different submission
> >>> > scenarios
> >>> > > > >>>> (Flink
> >>> > > > >>>>> Cli vs IDE). To me this is kind of hack approach, not so
> >>> > > > >>> straightforward.
> >>> > > > >>>>> What I suggested above is that is that flink should always
> >>> create
> >>> > > the
> >>> > > > >>>> same
> >>> > > > >>>>> ExecutionEnvironment but with different configuration, and
> >>> based
> >>> > on
> >>> > > > >> the
> >>> > > > >>>>> configuration it would create the proper ClusterClient for
> >>> > > different
> >>> > > > >>>>> behaviors.
> >>> > > > >>>>>
> >>> > > > >>>>>
> >>> > > > >>>>>
> >>> > > > >>>>>
> >>> > > > >>>>>
> >>> > > > >>>>>
> >>> > > > >>>>>
> >>> > > > >>>>> Till Rohrmann <tr...@apache.org> 于2018年12月20日周四
> >>> 下午11:18写道:
> >>> > > > >>>>>
> >>> > > > >>>>>> You are probably right that we have code duplication when
> it
> >>> > comes
> >>> > > > >> to
> >>> > > > >>>> the
> >>> > > > >>>>>> creation of the ClusterClient. This should be reduced in
> the
> >>> > > > >> future.
> >>> > > > >>>>>>
> >>> > > > >>>>>> I'm not so sure whether the user should be able to define
> >>> where
> >>> > > the
> >>> > > > >>> job
> >>> > > > >>>>>> runs (in your example Yarn). This is actually independent
> >>> of the
> >>> > > > >> job
> >>> > > > >>>>>> development and is something which is decided at
> deployment
> >>> > time.
> >>> > > > >> To
> >>> > > > >>> me
> >>> > > > >>>>> it
> >>> > > > >>>>>> makes sense that the ExecutionEnvironment is not directly
> >>> > > > >> initialized
> >>> > > > >>>> by
> >>> > > > >>>>>> the user and instead context sensitive how you want to
> >>> execute
> >>> > > your
> >>> > > > >>> job
> >>> > > > >>>>>> (Flink CLI vs. IDE, for example). However, I agree that
> the
> >>> > > > >>>>>> ExecutionEnvironment should give you access to the
> >>> ClusterClient
> >>> > > > >> and
> >>> > > > >>> to
> >>> > > > >>>>> the
> >>> > > > >>>>>> job (maybe in the form of the JobGraph or a job plan).
> >>> > > > >>>>>>
> >>> > > > >>>>>> Cheers,
> >>> > > > >>>>>> Till
> >>> > > > >>>>>>
> >>> > > > >>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <
> >>> zjffdu@gmail.com>
> >>> > > > >> wrote:
> >>> > > > >>>>>>
> >>> > > > >>>>>>> Hi Till,
> >>> > > > >>>>>>> Thanks for the feedback. You are right that I expect
> better
> >>> > > > >>>>> programmatic
> >>> > > > >>>>>>> job submission/control api which could be used by
> >>> downstream
> >>> > > > >>> project.
> >>> > > > >>>>> And
> >>> > > > >>>>>>> it would benefit for the flink ecosystem. When I look at
> >>> the
> >>> > code
> >>> > > > >>> of
> >>> > > > >>>>>> flink
> >>> > > > >>>>>>> scala-shell and sql-client (I believe they are not the
> >>> core of
> >>> > > > >>> flink,
> >>> > > > >>>>> but
> >>> > > > >>>>>>> belong to the ecosystem of flink), I find many duplicated
> >>> code
> >>> > > > >> for
> >>> > > > >>>>>> creating
> >>> > > > >>>>>>> ClusterClient from user provided configuration
> >>> (configuration
> >>> > > > >>> format
> >>> > > > >>>>> may
> >>> > > > >>>>>> be
> >>> > > > >>>>>>> different from scala-shell and sql-client) and then use
> >>> that
> >>> > > > >>>>>> ClusterClient
> >>> > > > >>>>>>> to manipulate jobs. I don't think this is convenient for
> >>> > > > >> downstream
> >>> > > > >>>>>>> projects. What I expect is that downstream project only
> >>> needs
> >>> > to
> >>> > > > >>>>> provide
> >>> > > > >>>>>>> necessary configuration info (maybe introducing class
> >>> > FlinkConf),
> >>> > > > >>> and
> >>> > > > >>>>>> then
> >>> > > > >>>>>>> build ExecutionEnvironment based on this FlinkConf, and
> >>> > > > >>>>>>> ExecutionEnvironment will create the proper
> ClusterClient.
> >>> It
> >>> > not
> >>> > > > >>>> only
> >>> > > > >>>>>>> benefit for the downstream project development but also
> be
> >>> > > > >> helpful
> >>> > > > >>>> for
> >>> > > > >>>>>>> their integration test with flink. Here's one sample code
> >>> > snippet
> >>> > > > >>>> that
> >>> > > > >>>>> I
> >>> > > > >>>>>>> expect.
> >>> > > > >>>>>>>
> >>> > > > >>>>>>> val conf = new FlinkConf().mode("yarn")
> >>> > > > >>>>>>> val env = new ExecutionEnvironment(conf)
> >>> > > > >>>>>>> val jobId = env.submit(...)
> >>> > > > >>>>>>> val jobStatus =
> >>> env.getClusterClient().queryJobStatus(jobId)
> >>> > > > >>>>>>> env.getClusterClient().cancelJob(jobId)
> >>> > > > >>>>>>>
> >>> > > > >>>>>>> What do you think ?
> >>> > > > >>>>>>>
> >>> > > > >>>>>>>
> >>> > > > >>>>>>>
> >>> > > > >>>>>>>
> >>> > > > >>>>>>> Till Rohrmann <tr...@apache.org> 于2018年12月11日周二
> >>> 下午6:28写道:
> >>> > > > >>>>>>>
> >>> > > > >>>>>>>> Hi Jeff,
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>> what you are proposing is to provide the user with
> better
> >>> > > > >>>>> programmatic
> >>> > > > >>>>>>> job
> >>> > > > >>>>>>>> control. There was actually an effort to achieve this
> but
> >>> it
> >>> > > > >> has
> >>> > > > >>>>> never
> >>> > > > >>>>>>> been
> >>> > > > >>>>>>>> completed [1]. However, there are some improvement in
> the
> >>> code
> >>> > > > >>> base
> >>> > > > >>>>>> now.
> >>> > > > >>>>>>>> Look for example at the NewClusterClient interface which
> >>> > > > >> offers a
> >>> > > > >>>>>>>> non-blocking job submission. But I agree that we need to
> >>> > > > >> improve
> >>> > > > >>>>> Flink
> >>> > > > >>>>>> in
> >>> > > > >>>>>>>> this regard.
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>> I would not be in favour if exposing all ClusterClient
> >>> calls
> >>> > > > >> via
> >>> > > > >>>> the
> >>> > > > >>>>>>>> ExecutionEnvironment because it would clutter the class
> >>> and
> >>> > > > >> would
> >>> > > > >>>> not
> >>> > > > >>>>>> be
> >>> > > > >>>>>>> a
> >>> > > > >>>>>>>> good separation of concerns. Instead one idea could be
> to
> >>> > > > >>> retrieve
> >>> > > > >>>>> the
> >>> > > > >>>>>>>> current ClusterClient from the ExecutionEnvironment
> which
> >>> can
> >>> > > > >>> then
> >>> > > > >>>> be
> >>> > > > >>>>>>> used
> >>> > > > >>>>>>>> for cluster and job control. But before we start an
> effort
> >>> > > > >> here,
> >>> > > > >>> we
> >>> > > > >>>>>> need
> >>> > > > >>>>>>> to
> >>> > > > >>>>>>>> agree and capture what functionality we want to provide.
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>> Initially, the idea was that we have the
> ClusterDescriptor
> >>> > > > >>>> describing
> >>> > > > >>>>>> how
> >>> > > > >>>>>>>> to talk to cluster manager like Yarn or Mesos. The
> >>> > > > >>>> ClusterDescriptor
> >>> > > > >>>>>> can
> >>> > > > >>>>>>> be
> >>> > > > >>>>>>>> used for deploying Flink clusters (job and session) and
> >>> gives
> >>> > > > >>> you a
> >>> > > > >>>>>>>> ClusterClient. The ClusterClient controls the cluster
> >>> (e.g.
> >>> > > > >>>>> submitting
> >>> > > > >>>>>>>> jobs, listing all running jobs). And then there was the
> >>> idea
> >>> > to
> >>> > > > >>>>>>> introduce a
> >>> > > > >>>>>>>> JobClient which you obtain from the ClusterClient to
> >>> trigger
> >>> > > > >> job
> >>> > > > >>>>>> specific
> >>> > > > >>>>>>>> operations (e.g. taking a savepoint, cancelling the
> job).
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-4272
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>> Cheers,
> >>> > > > >>>>>>>> Till
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <
> >>> zjffdu@gmail.com
> >>> > >
> >>> > > > >>>>> wrote:
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>>> Hi Folks,
> >>> > > > >>>>>>>>>
> >>> > > > >>>>>>>>> I am trying to integrate flink into apache zeppelin
> >>> which is
> >>> > > > >> an
> >>> > > > >>>>>>>> interactive
> >>> > > > >>>>>>>>> notebook. And I hit several issues that is caused by
> >>> flink
> >>> > > > >>> client
> >>> > > > >>>>>> api.
> >>> > > > >>>>>>> So
> >>> > > > >>>>>>>>> I'd like to proposal the following changes for flink
> >>> client
> >>> > > > >>> api.
> >>> > > > >>>>>>>>>
> >>> > > > >>>>>>>>> 1. Support nonblocking execution. Currently,
> >>> > > > >>>>>>> ExecutionEnvironment#execute
> >>> > > > >>>>>>>>> is a blocking method which would do 2 things, first
> >>> submit
> >>> > > > >> job
> >>> > > > >>>> and
> >>> > > > >>>>>> then
> >>> > > > >>>>>>>>> wait for job until it is finished. I'd like introduce a
> >>> > > > >>>> nonblocking
> >>> > > > >>>>>>>>> execution method like ExecutionEnvironment#submit which
> >>> only
> >>> > > > >>>> submit
> >>> > > > >>>>>> job
> >>> > > > >>>>>>>> and
> >>> > > > >>>>>>>>> then return jobId to client. And allow user to query
> the
> >>> job
> >>> > > > >>>> status
> >>> > > > >>>>>> via
> >>> > > > >>>>>>>> the
> >>> > > > >>>>>>>>> jobId.
> >>> > > > >>>>>>>>>
> >>> > > > >>>>>>>>> 2. Add cancel api in
> >>> > > > >>>>> ExecutionEnvironment/StreamExecutionEnvironment,
> >>> > > > >>>>>>>>> currently the only way to cancel job is via cli
> >>> (bin/flink),
> >>> > > > >>> this
> >>> > > > >>>>> is
> >>> > > > >>>>>>> not
> >>> > > > >>>>>>>>> convenient for downstream project to use this feature.
> >>> So I'd
> >>> > > > >>>> like
> >>> > > > >>>>> to
> >>> > > > >>>>>>> add
> >>> > > > >>>>>>>>> cancel api in ExecutionEnvironment
> >>> > > > >>>>>>>>>
> >>> > > > >>>>>>>>> 3. Add savepoint api in
> >>> > > > >>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
> >>> > > > >>>>>>>> It
> >>> > > > >>>>>>>>> is similar as cancel api, we should use
> >>> ExecutionEnvironment
> >>> > > > >> as
> >>> > > > >>>> the
> >>> > > > >>>>>>>> unified
> >>> > > > >>>>>>>>> api for third party to integrate with flink.
> >>> > > > >>>>>>>>>
> >>> > > > >>>>>>>>> 4. Add listener for job execution lifecycle. Something
> >>> like
> >>> > > > >>>>>> following,
> >>> > > > >>>>>>> so
> >>> > > > >>>>>>>>> that downstream project can do custom logic in the
> >>> lifecycle
> >>> > > > >> of
> >>> > > > >>>>> job.
> >>> > > > >>>>>>> e.g.
> >>> > > > >>>>>>>>> Zeppelin would capture the jobId after job is submitted
> >>> and
> >>> > > > >>> then
> >>> > > > >>>>> use
> >>> > > > >>>>>>> this
> >>> > > > >>>>>>>>> jobId to cancel it later when necessary.
> >>> > > > >>>>>>>>>
> >>> > > > >>>>>>>>> public interface JobListener {
> >>> > > > >>>>>>>>>
> >>> > > > >>>>>>>>>   void onJobSubmitted(JobID jobId);
> >>> > > > >>>>>>>>>
> >>> > > > >>>>>>>>>   void onJobExecuted(JobExecutionResult jobResult);
> >>> > > > >>>>>>>>>
> >>> > > > >>>>>>>>>   void onJobCanceled(JobID jobId);
> >>> > > > >>>>>>>>> }
> >>> > > > >>>>>>>>>
> >>> > > > >>>>>>>>> 5. Enable session in ExecutionEnvironment. Currently it
> >>> is
> >>> > > > >>>>> disabled,
> >>> > > > >>>>>>> but
> >>> > > > >>>>>>>>> session is very convenient for third party to
> submitting
> >>> jobs
> >>> > > > >>>>>>>> continually.
> >>> > > > >>>>>>>>> I hope flink can enable it again.
> >>> > > > >>>>>>>>>
> >>> > > > >>>>>>>>> 6. Unify all flink client api into
> >>> > > > >>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
> >>> > > > >>>>>>>>>
> >>> > > > >>>>>>>>> This is a long term issue which needs more careful
> >>> thinking
> >>> > > > >> and
> >>> > > > >>>>>> design.
> >>> > > > >>>>>>>>> Currently some of features of flink is exposed in
> >>> > > > >>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment, but
> >>> some are
> >>> > > > >>>>> exposed
> >>> > > > >>>>>>> in
> >>> > > > >>>>>>>>> cli instead of api, like the cancel and savepoint I
> >>> mentioned
> >>> > > > >>>>> above.
> >>> > > > >>>>>> I
> >>> > > > >>>>>>>>> think the root cause is due to that flink didn't unify
> >>> the
> >>> > > > >>>>>> interaction
> >>> > > > >>>>>>>> with
> >>> > > > >>>>>>>>> flink. Here I list 3 scenarios of flink operation
> >>> > > > >>>>>>>>>
> >>> > > > >>>>>>>>>   - Local job execution.  Flink will create
> >>> LocalEnvironment
> >>> > > > >>> and
> >>> > > > >>>>>> then
> >>> > > > >>>>>>>> use
> >>> > > > >>>>>>>>>   this LocalEnvironment to create LocalExecutor for job
> >>> > > > >>>> execution.
> >>> > > > >>>>>>>>>   - Remote job execution. Flink will create
> ClusterClient
> >>> > > > >>> first
> >>> > > > >>>>> and
> >>> > > > >>>>>>> then
> >>> > > > >>>>>>>>>   create ContextEnvironment based on the ClusterClient
> >>> and
> >>> > > > >>> then
> >>> > > > >>>>> run
> >>> > > > >>>>>>> the
> >>> > > > >>>>>>>>> job.
> >>> > > > >>>>>>>>>   - Job cancelation. Flink will create ClusterClient
> >>> first
> >>> > > > >> and
> >>> > > > >>>>> then
> >>> > > > >>>>>>>> cancel
> >>> > > > >>>>>>>>>   this job via this ClusterClient.
> >>> > > > >>>>>>>>>
> >>> > > > >>>>>>>>> As you can see in the above 3 scenarios. Flink didn't
> >>> use the
> >>> > > > >>>> same
> >>> > > > >>>>>>>>> approach(code path) to interact with flink
> >>> > > > >>>>>>>>> What I propose is following:
> >>> > > > >>>>>>>>> Create the proper LocalEnvironment/RemoteEnvironment
> >>> (based
> >>> > > > >> on
> >>> > > > >>>> user
> >>> > > > >>>>>>>>> configuration) --> Use this Environment to create
> proper
> >>> > > > >>>>>> ClusterClient
> >>> > > > >>>>>>>>> (LocalClusterClient or RestClusterClient) to
> interactive
> >>> with
> >>> > > > >>>>> Flink (
> >>> > > > >>>>>>> job
> >>> > > > >>>>>>>>> execution or cancelation)
> >>> > > > >>>>>>>>>
> >>> > > > >>>>>>>>> This way we can unify the process of local execution
> and
> >>> > > > >> remote
> >>> > > > >>>>>>>> execution.
> >>> > > > >>>>>>>>> And it is much easier for third party to integrate with
> >>> > > > >> flink,
> >>> > > > >>>>>> because
> >>> > > > >>>>>>>>> ExecutionEnvironment is the unified entry point for
> >>> flink.
> >>> > > > >> What
> >>> > > > >>>>> third
> >>> > > > >>>>>>>> party
> >>> > > > >>>>>>>>> needs to do is just pass configuration to
> >>> > > > >> ExecutionEnvironment
> >>> > > > >>>> and
> >>> > > > >>>>>>>>> ExecutionEnvironment will do the right thing based on
> the
> >>> > > > >>>>>>> configuration.
> >>> > > > >>>>>>>>> Flink cli can also be considered as flink api consumer.
> >>> it
> >>> > > > >> just
> >>> > > > >>>>> pass
> >>> > > > >>>>>>> the
> >>> > > > >>>>>>>>> configuration to ExecutionEnvironment and let
> >>> > > > >>>> ExecutionEnvironment
> >>> > > > >>>>> to
> >>> > > > >>>>>>>>> create the proper ClusterClient instead of letting cli
> to
> >>> > > > >>> create
> >>> > > > >>>>>>>>> ClusterClient directly.
> >>> > > > >>>>>>>>>
> >>> > > > >>>>>>>>>
> >>> > > > >>>>>>>>> 6 would involve large code refactoring, so I think we
> can
> >>> > > > >> defer
> >>> > > > >>>> it
> >>> > > > >>>>>> for
> >>> > > > >>>>>>>>> future release, 1,2,3,4,5 could be done at once I
> >>> believe.
> >>> > > > >> Let
> >>> > > > >>> me
> >>> > > > >>>>>> know
> >>> > > > >>>>>>>> your
> >>> > > > >>>>>>>>> comments and feedback, thanks
> >>> > > > >>>>>>>>>
> >>> > > > >>>>>>>>>
> >>> > > > >>>>>>>>>
> >>> > > > >>>>>>>>> --
> >>> > > > >>>>>>>>> Best Regards
> >>> > > > >>>>>>>>>
> >>> > > > >>>>>>>>> Jeff Zhang
> >>> > > > >>>>>>>>>
> >>> > > > >>>>>>>>
> >>> > > > >>>>>>>
> >>> > > > >>>>>>>
> >>> > > > >>>>>>> --
> >>> > > > >>>>>>> Best Regards
> >>> > > > >>>>>>>
> >>> > > > >>>>>>> Jeff Zhang
> >>> > > > >>>>>>>
> >>> > > > >>>>>>
> >>> > > > >>>>>
> >>> > > > >>>>>
> >>> > > > >>>>> --
> >>> > > > >>>>> Best Regards
> >>> > > > >>>>>
> >>> > > > >>>>> Jeff Zhang
> >>> > > > >>>>>
> >>> > > > >>>>
> >>> > > > >>>
> >>> > > > >>>
> >>> > > > >>> --
> >>> > > > >>> Best Regards
> >>> > > > >>>
> >>> > > > >>> Jeff Zhang
> >>> > > > >>>
> >>> > > > >>
> >>> > > > >
> >>> > > > >
> >>> > > > > --
> >>> > > > > Best Regards
> >>> > > > >
> >>> > > > > Jeff Zhang
> >>> > > >
> >>> > > >
> >>> > >
> >>> > > --
> >>> > > Best Regards
> >>> > >
> >>> > > Jeff Zhang
> >>> > >
> >>> >
> >>>
> >>
>
> --
> Best Regards
>
> Jeff Zhang
>

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Jeff Zhang <zj...@gmail.com>.
Hi, Tison,

Thanks for your comments. Overall I agree with you that it is difficult for
downstream projects to integrate with flink and that we need to refactor the
current flink client api.
And I agree that CliFrontend should only parse command line arguments and
then pass them to ExecutionEnvironment. It is ExecutionEnvironment's
responsibility to compile the job, create the cluster, and submit the job.
Besides that, flink currently has many ExecutionEnvironment implementations,
and flink will use a specific one based on the context. IMHO, this is not
necessary: ExecutionEnvironment should be able to do the right thing based
on the FlinkConf it receives. Having so many ExecutionEnvironment
implementations is another burden for downstream project integration.
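
To sketch what I mean, here is a minimal illustration in Java. FlinkConf, the
"execution.mode" key, and the client names below are all made-up stand-ins for
this discussion, not existing Flink API:

```java
// Sketch only: FlinkConf, the "execution.mode" key, and the client names
// are illustrative stand-ins, not existing Flink classes.
import java.util.HashMap;
import java.util.Map;

public class UnifiedEnvSketch {

    /** Hypothetical single holder for every setting the job needs. */
    static class FlinkConf {
        final Map<String, String> settings = new HashMap<>();
        FlinkConf set(String key, String value) { settings.put(key, value); return this; }
        String get(String key, String def) { return settings.getOrDefault(key, def); }
    }

    /** One ExecutionEnvironment; its behavior is driven purely by the conf. */
    static class ExecutionEnvironment {
        private final FlinkConf conf;
        ExecutionEnvironment(FlinkConf conf) { this.conf = conf; }

        // Pick the right client from configuration instead of switching to a
        // different environment subclass per deployment mode.
        String clusterClientFor() {
            switch (conf.get("execution.mode", "local")) {
                case "yarn":   return "YarnClusterClient";
                case "remote": return "RestClusterClient";
                default:       return "LocalClusterClient";
            }
        }
    }

    public static void main(String[] args) {
        FlinkConf conf = new FlinkConf().set("execution.mode", "yarn");
        System.out.println(new ExecutionEnvironment(conf).clusterClientFor());
    }
}
```

The point is only that the same environment class would serve every mode, so a
downstream project like Zeppelin passes a conf and never needs to know about
mode-specific subclasses.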

One thing I'd like to mention is flink's scala shell and sql client:
although they are sub-modules of flink, they could be treated as downstream
projects which use flink's client api. Currently you will find it is not
easy for them to integrate with flink, and they share a lot of duplicated
code. It is another sign that we should refactor the flink client api.

I believe it is a large and hard change, and I am afraid we cannot keep
compatibility since many of the changes are user-facing.



Zili Chen <wa...@gmail.com> 于2019年6月24日周一 下午2:53写道:

> Hi all,
>
> After a closer look at our client apis, I can see there are two major
> issues affecting consistency and integration, namely the deployment of a
> job cluster, which couples job graph creation and cluster deployment,
> and submission via CliFrontend, which confuses the control flow of job
> graph compilation and job submission. I'd like to follow the discussion
> above, mainly the process described by Jeff and Stephan, and share my
> ideas on these issues.
>
> 1) CliFrontend confuses the control flow of job compilation and submission.
> Following the process of job submission Stephan and Jeff described, the
> execution environment knows all configs of the cluster and topos/settings
> of the job. Ideally, in the main method of the user program, it calls
> #execute (or, say, #submit) and Flink deploys the cluster, compiles the
> job graph and submits it to the cluster. However, the current CliFrontend
> does all these things inside its #runProgram method, which introduces a
> lot of subclasses of the (stream) execution environment.
>
> Actually, it sets up an exec env that hijacks the #execute/#executePlan
> method, initializes the job graph and aborts execution. Then control
> flows back to CliFrontend, which deploys the cluster (or retrieves
> the client) and submits the job graph. This is quite a specific internal
> process inside Flink and is not consistent with anything else.
>
> 2) Deployment of a job cluster couples job graph creation and cluster
> deployment. Abstractly, from a user job to a concrete submission, it
> requires
>
>      create JobGraph --\
>
> create ClusterClient -->  submit JobGraph
>
> such a dependency. The ClusterClient is created by deploying or retrieving.
> JobGraph submission requires a compiled JobGraph and a valid ClusterClient,
> but the creation of the ClusterClient is conceptually independent of that
> of the JobGraph. However, in job cluster mode, we deploy the job cluster
> with a job graph, which means we use another process:
>
> create JobGraph --> deploy cluster with the JobGraph
>
> Here is another inconsistency, and downstream projects/client apis are
> forced to handle different cases with little support from Flink.
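
To make the two divergent flows concrete, a tiny Java sketch with stand-in
types (the JobGraph and ClusterClient below are simplified stubs for this
thread, not the real Flink classes):

```java
// Illustration only: JobGraph and ClusterClient here are tiny stand-ins used
// to contrast the two submission flows; they are not the real Flink classes.
public class SubmissionFlows {

    static class JobGraph {
        final String name;
        JobGraph(String name) { this.name = name; }
    }

    static class ClusterClient {
        String submit(JobGraph graph) { return "submitted:" + graph.name; }
    }

    // Session mode: client creation is independent of job graph creation,
    // and the compiled graph is submitted afterwards.
    static String sessionMode() {
        ClusterClient client = new ClusterClient(); // deploy or retrieve
        JobGraph graph = new JobGraph("wordcount"); // compile separately
        return client.submit(graph);
    }

    // Job-cluster mode: the deployment itself consumes the job graph, which
    // couples the two steps and forces callers to special-case this path.
    static String jobClusterMode() {
        JobGraph graph = new JobGraph("wordcount");
        return "deployed-with:" + graph.name; // cluster is started from the graph
    }

    public static void main(String[] args) {
        System.out.println(sessionMode());
        System.out.println(jobClusterMode());
    }
}
```

A downstream project has to implement both branches today, which is exactly
the inconsistency described above.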
>
> Since we likely reached a consensus on
>
> 1. all configs are gathered into the Flink configuration and passed on
> 2. the execution environment knows all configs and handles execution (both
> deployment and submission)
>
> I propose eliminating the inconsistencies in the issues above with the
> following approach:
>
> 1) CliFrontend should be exactly a front end, at least for the "run"
> command. That means it just gathers and passes all config from the command
> line to the main method of the user program. The execution environment
> knows all the info and, with an added utility for ClusterClient, we
> gracefully get a ClusterClient by deploying or retrieving. In this way, we
> don't need to hijack the #execute/#executePlan methods and can remove
> various hacky subclasses of the exec env, as well as the #run methods in
> ClusterClient (for an interface-ized ClusterClient). Now control flows
> from CliFrontend to the main method and never returns.
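
A rough sketch of that control flow; every name below is made up for
illustration, nothing here is existing Flink API:

```java
// Sketch of the proposed control flow: the front end only parses arguments
// into configuration and hands off to the user's main method; it never builds
// the job graph itself. All names are made up for illustration.
import java.util.HashMap;
import java.util.Map;

public class FrontEndSketch {

    // "-m yarn-cluster -p 4" -> {m=yarn-cluster, p=4}
    static Map<String, String> parse(String[] args) {
        Map<String, String> conf = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            conf.put(args[i].replaceFirst("^-+", ""), args[i + 1]);
        }
        return conf;
    }

    // Stand-in for the user program's main: it receives the full config and is
    // itself responsible for deploying/retrieving the cluster and submitting.
    static String userMain(Map<String, String> conf) {
        return "execute-on:" + conf.getOrDefault("m", "local");
    }

    public static void main(String[] args) {
        Map<String, String> conf = parse(new String[] {"-m", "yarn-cluster", "-p", "4"});
        // Control passes from the front end into userMain and never comes back
        // here to do any compilation or submission work of its own.
        System.out.println(userMain(conf));
    }
}
```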
>
> 2) A job cluster means a cluster for a specific job. From another
> perspective, it is an ephemeral session. We may decouple the deployment
> from a compiled job graph by starting a session with an idle timeout
> and submitting the job afterwards.
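
The "ephemeral session" idea could look roughly like this; the Session type
and the timeout mechanics below are simulated purely for illustration:

```java
// Sketch of "job cluster as ephemeral session": deploy a session with an idle
// timeout, submit the single job afterwards, and let the cluster shut itself
// down once idle. Everything here is a simulation, not real Flink behavior.
public class EphemeralSessionSketch {

    static class Session {
        final long idleTimeoutMillis;
        long lastActivity;
        boolean running = true;

        Session(long idleTimeoutMillis) {
            this.idleTimeoutMillis = idleTimeoutMillis;
            this.lastActivity = 0;
        }

        String submit(String jobName, long now) {
            lastActivity = now;
            return "submitted:" + jobName;
        }

        // Called periodically; shuts the session down once idle long enough.
        void tick(long now) {
            if (now - lastActivity > idleTimeoutMillis) running = false;
        }
    }

    public static void main(String[] args) {
        Session session = new Session(10_000); // deploy, no job graph needed yet
        session.submit("wordcount", 1_000);    // submit the compiled graph afterwards
        session.tick(5_000);                   // still within the idle timeout
        boolean aliveDuringJob = session.running;
        session.tick(20_000);                  // idle past the timeout -> shut down
        System.out.println(aliveDuringJob + " " + session.running);
    }
}
```

This keeps the session-mode submission path for job clusters too, so the
coupling between graph creation and deployment disappears.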
>
> These topics, before we go into more details on design or implementation,
> are better made explicit and discussed for a consensus.
>
> Best,
> tison.
>
>
> Zili Chen <wa...@gmail.com> 于2019年6月20日周四 上午3:21写道:
>
>> Hi Jeff,
>>
>> Thanks for raising this thread and the design document!
>>
>> As @Thomas Weise mentioned above, extending config to Flink
>> requires far more effort than it should. Another example is that
>> we achieve detached mode by introducing another execution
>> environment which also hijacks the #execute method.
>>
>> I agree with your idea that the user configures everything and
>> Flink "just" respects it. On this topic I think the unusual
>> control flow when CliFrontend handles the "run" command is the
>> problem. It handles several configs, mainly about cluster settings,
>> so the main method of the user program is unaware of them. Also, it
>> compiles the app to a job graph by running the main method with a
>> hijacked exec env, which constrains the main method further.
>>
>> I'd like to write down a few notes on passing and respecting
>> configs/args, as well as on decoupling job compilation and
>> submission. I will share them on this thread later.
>>
>> Best,
>> tison.
>>
>>
>> SHI Xiaogang <sh...@gmail.com> 于2019年6月17日周一 下午7:29写道:
>>
>>> Hi Jeff and Flavio,
>>>
>>> Thanks Jeff a lot for proposing the design document.
>>>
>>> We are also working on refactoring ClusterClient to allow flexible and
>>> efficient job management in our real-time platform.
>>> We would like to draft a document to share our ideas with you.
>>>
>>> I think it's a good idea to have something like Apache Livy for Flink,
>>> and
>>> the efforts discussed here will take a great step forward to it.
>>>
>>> Regards,
>>> Xiaogang
>>>
>>> Flavio Pompermaier <po...@okkam.it> 于2019年6月17日周一 下午7:13写道:
>>>
>>> > Is there any possibility to have something like Apache Livy [1] also
>>> for
>>> > Flink in the future?
>>> >
>>> > [1] https://livy.apache.org/
>>> >
>>> > On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zj...@gmail.com> wrote:
>>> >
>>> > > >>>  Any API we expose should not have dependencies on the runtime
>>> > > (flink-runtime) package or other implementation details. To me, this
>>> > means
>>> > > that the current ClusterClient cannot be exposed to users because it
>>> >  uses
>>> > > quite some classes from the optimiser and runtime packages.
>>> > >
>>> > > We should change ClusterClient from class to interface.
>>> > > ExecutionEnvironment only use the interface ClusterClient which
>>> should be
>>> > > in flink-clients while the concrete implementation class could be in
>>> > > flink-runtime.
>>> > >
>>> > > >>> What happens when a failure/restart in the client happens? There
>>> need
>>> > > to be a way of re-establishing the connection to the job, set up the
>>> > > listeners again, etc.
>>> > >
>>> > > Good point.  First we need to define what does failure/restart in the
>>> > > client mean. IIUC, that usually mean network failure which will
>>> happen in
>>> > > class RestClient. If my understanding is correct, restart/retry
>>> mechanism
>>> > > should be done in RestClient.
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > Aljoscha Krettek <al...@apache.org> 于2019年6月11日周二 下午11:10写道:
>>> > >
>>> > > > Some points to consider:
>>> > > >
>>> > > > * Any API we expose should not have dependencies on the runtime
>>> > > > (flink-runtime) package or other implementation details. To me,
>>> this
>>> > > means
>>> > > > that the current ClusterClient cannot be exposed to users because
>>> it
>>> > >  uses
>>> > > > quite some classes from the optimiser and runtime packages.
>>> > > >
>>> > > > * What happens when a failure/restart in the client happens? There
>>> need
>>> > > to
>>> > > > be a way of re-establishing the connection to the job, set up the
>>> > > listeners
>>> > > > again, etc.
>>> > > >
>>> > > > Aljoscha
>>> > > >
>>> > > > > On 29. May 2019, at 10:17, Jeff Zhang <zj...@gmail.com> wrote:
>>> > > > >
>>> > > > > Sorry folks, the design doc is late as you expected. Here's the
>>> > design
>>> > > > doc
>>> > > > > I drafted, welcome any comments and feedback.
>>> > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午8:43写道:
>>> > > > >
>>> > > > >> Nice that this discussion is happening.
>>> > > > >>
>>> > > > >> In the FLIP, we could also revisit the entire role of the
>>> > environments
>>> > > > >> again.
>>> > > > >>
>>> > > > >> Initially, the idea was:
>>> > > > >>  - the environments take care of the specific setup for
>>> standalone
>>> > (no
>>> > > > >> setup needed), yarn, mesos, etc.
>>> > > > >>  - the session ones have control over the session. The
>>> environment
>>> > > holds
>>> > > > >> the session client.
>>> > > > >>  - running a job gives a "control" object for that job. That
>>> > behavior
>>> > > is
>>> > > > >> the same in all environments.
>>> > > > >>
>>> > > > >> The actual implementation diverged quite a bit from that. Happy
>>> to
>>> > > see a
>>> > > > >> discussion about straightening this out a bit more.
>>> > > > >>
>>> > > > >>
>>> > > > >> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zj...@gmail.com>
>>> > wrote:
>>> > > > >>
>>> > > > >>> Hi folks,
>>> > > > >>>
>>> > > > >>> Sorry for late response, It seems we reach consensus on this, I
>>> > will
>>> > > > >> create
>>> > > > >>> FLIP for this with more detailed design
>>> > > > >>>
>>> > > > >>>
>>> > > > >>> Thomas Weise <th...@apache.org> 于2018年12月21日周五 上午11:43写道:
>>> > > > >>>
>>> > > > >>>> Great to see this discussion seeded! The problems you face
>>> with
>>> > the
>>> > > > >>>> Zeppelin integration are also affecting other downstream
>>> projects,
>>> > > > like
>>> > > > >>>> Beam.
>>> > > > >>>>
>>> > > > >>>> We just enabled the savepoint restore option in
>>> > > > RemoteStreamEnvironment
>>> > > > >>> [1]
>>> > > > >>>> and that was more difficult than it should be. The main issue
>>> is
>>> > > that
>>> > > > >>>> environment and cluster client aren't decoupled. Ideally it
>>> should
>>> > > be
>>> > > > >>>> possible to just get the matching cluster client from the
>>> > > environment
>>> > > > >> and
>>> > > > >>>> then control the job through it (environment as factory for
>>> > cluster
>>> > > > >>>> client). But note that the environment classes are part of the
>>> > > public
>>> > > > >>> API,
>>> > > > >>>> and it is not straightforward to make larger changes without
>>> > > breaking
>>> > > > >>>> backward compatibility.
>>> > > > >>>>
>>> > > > >>>> ClusterClient currently exposes internal classes like
>>> JobGraph and
>>> > > > >>>> StreamGraph. But it should be possible to wrap this with a new
>>> > > public
>>> > > > >> API
>>> > > > >>>> that brings the required job control capabilities for
>>> downstream
>>> > > > >>> projects.
>>> > > > >>>> Perhaps it is helpful to look at some of the interfaces in
>>> Beam
>>> > > while
>>> > > > >>>> thinking about this: [2] for the portable job API and [3] for
>>> the
>>> > > old
>>> > > > >>>> asynchronous job control from the Beam Java SDK.
>>> > > > >>>>
>>> > > > >>>> The backward compatibility discussion [4] is also relevant
>>> here. A
>>> > > new
>>> > > > >>> API
>>> > > > >>>> should shield downstream projects from internals and allow
>>> them to
>>> > > > >>>> interoperate with multiple future Flink versions in the same
>>> > release
>>> > > > >> line
>>> > > > >>>> without forced upgrades.
>>> > > > >>>>
>>> > > > >>>> Thanks,
>>> > > > >>>> Thomas
>>> > > > >>>>
>>> > > > >>>> [1] https://github.com/apache/flink/pull/7249
>>> > > > >>>> [2]
>>> > > > >>>>
>>> > > > >>>>
>>> > > > >>>
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
>>> > > > >>>> [3]
>>> > > > >>>>
>>> > > > >>>>
>>> > > > >>>
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
>>> > > > >>>> [4]
>>> > > > >>>>
>>> > > > >>>>
>>> > > > >>>
>>> > > > >>
>>> > > >
>>> > >
>>> >
>>> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
>>> > > > >>>>
>>> > > > >>>>
>>> > > > >>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zj...@gmail.com>
>>> > > wrote:
>>> > > > >>>>
>>> > > > >>>>>>>> I'm not so sure whether the user should be able to define
>>> > where
>>> > > > >> the
>>> > > > >>>> job
>>> > > > >>>>> runs (in your example Yarn). This is actually independent of
>>> the
>>> > > job
>>> > > > >>>>> development and is something which is decided at deployment
>>> time.
>>> > > > >>>>>
>>> > > > >>>>> User don't need to specify execution mode programmatically.
>>> They
>>> > > can
>>> > > > >>> also
>>> > > > >>>>> pass the execution mode from the arguments in flink run
>>> command.
>>> > > e.g.
>>> > > > >>>>>
>>> > > > >>>>> bin/flink run -m yarn-cluster ....
>>> > > > >>>>> bin/flink run -m local ...
>>> > > > >>>>> bin/flink run -m host:port ...
>>> > > > >>>>>
>>> > > > >>>>> Does this make sense to you ?
>>> > > > >>>>>
>>> > > > >>>>>>>> To me it makes sense that the ExecutionEnvironment is not
>>> > > > >> directly
>>> > > > >>>>> initialized by the user and instead context sensitive how you
>>> > want
>>> > > to
>>> > > > >>>>> execute your job (Flink CLI vs. IDE, for example).
>>> > > > >>>>>
>>> > > > >>>>> Right, currently I notice Flink would create different
>>> > > > >>>>> ContextExecutionEnvironment based on different submission
>>> > scenarios
>>> > > > >>>> (Flink
>>> > > > >>>>> Cli vs IDE). To me this is kind of hack approach, not so
>>> > > > >>> straightforward.
>>> > > > >>>>> What I suggested above is that is that flink should always
>>> create
>>> > > the
>>> > > > >>>> same
>>> > > > >>>>> ExecutionEnvironment but with different configuration, and
>>> based
>>> > on
>>> > > > >> the
>>> > > > >>>>> configuration it would create the proper ClusterClient for
>>> > > different
>>> > > > >>>>> behaviors.
>>> > > > >>>>>
>>> > > > >>>>>
>>> > > > >>>>>
>>> > > > >>>>>
>>> > > > >>>>>
>>> > > > >>>>>
>>> > > > >>>>>
>>> > > > >>>>> Till Rohrmann <tr...@apache.org> 于2018年12月20日周四
>>> 下午11:18写道:
>>> > > > >>>>>
>>> > > > >>>>>> You are probably right that we have code duplication when it
>>> > comes
>>> > > > >> to
>>> > > > >>>> the
>>> > > > >>>>>> creation of the ClusterClient. This should be reduced in the
>>> > > > >> future.
>>> > > > >>>>>>
>>> > > > >>>>>> I'm not so sure whether the user should be able to define
>>> where
>>> > > the
>>> > > > >>> job
>>> > > > >>>>>> runs (in your example Yarn). This is actually independent
>>> of the
>>> > > > >> job
>>> > > > >>>>>> development and is something which is decided at deployment
>>> > time.
>>> > > > >> To
>>> > > > >>> me
>>> > > > >>>>> it
>>> > > > >>>>>> makes sense that the ExecutionEnvironment is not directly
>>> > > > >> initialized
>>> > > > >>>> by
>>> > > > >>>>>> the user and instead context sensitive how you want to
>>> execute
>>> > > your
>>> > > > >>> job
>>> > > > >>>>>> (Flink CLI vs. IDE, for example). However, I agree that the
>>> > > > >>>>>> ExecutionEnvironment should give you access to the
>>> ClusterClient
>>> > > > >> and
>>> > > > >>> to
>>> > > > >>>>> the
>>> > > > >>>>>> job (maybe in the form of the JobGraph or a job plan).
>>> > > > >>>>>>
>>> > > > >>>>>> Cheers,
>>> > > > >>>>>> Till
>>> > > > >>>>>>
>>> > > > >>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <
>>> zjffdu@gmail.com>
>>> > > > >> wrote:
>>> > > > >>>>>>
>>> > > > >>>>>>> Hi Till,
>>> > > > >>>>>>> Thanks for the feedback. You are right that I expect better
>>> > > > >>>>> programmatic
>>> > > > >>>>>>> job submission/control api which could be used by
>>> downstream
>>> > > > >>> project.
>>> > > > >>>>> And
>>> > > > >>>>>>> it would benefit for the flink ecosystem. When I look at
>>> the
>>> > code
>>> > > > >>> of
>>> > > > >>>>>> flink
>>> > > > >>>>>>> scala-shell and sql-client (I believe they are not the
>>> core of
>>> > > > >>> flink,
>>> > > > >>>>> but
>>> > > > >>>>>>> belong to the ecosystem of flink), I find many duplicated
>>> code
>>> > > > >> for
>>> > > > >>>>>> creating
>>> > > > >>>>>>> ClusterClient from user provided configuration
>>> (configuration
>>> > > > >>> format
>>> > > > >>>>> may
>>> > > > >>>>>> be
>>> > > > >>>>>>> different from scala-shell and sql-client) and then use
>>> that
>>> > > > >>>>>> ClusterClient
>>> > > > >>>>>>> to manipulate jobs. I don't think this is convenient for
>>> > > > >> downstream
>>> > > > >>>>>>> projects. What I expect is that downstream project only
>>> needs
>>> > to
>>> > > > >>>>> provide
>>> > > > >>>>>>> necessary configuration info (maybe introducing class
>>> > FlinkConf),
>>> > > > >>> and
>>> > > > >>>>>> then
>>> > > > >>>>>>> build ExecutionEnvironment based on this FlinkConf, and
>>> > > > >>>>>>> ExecutionEnvironment will create the proper ClusterClient.
>>> It
>>> > not
>>> > > > >>>> only
>>> > > > >>>>>>> benefit for the downstream project development but also be
>>> > > > >> helpful
>>> > > > >>>> for
>>> > > > >>>>>>> their integration test with flink. Here's one sample code
>>> > snippet
>>> > > > >>>> that
>>> > > > >>>>> I
>>> > > > >>>>>>> expect.
>>> > > > >>>>>>>
>>> > > > >>>>>>> val conf = new FlinkConf().mode("yarn")
>>> > > > >>>>>>> val env = new ExecutionEnvironment(conf)
>>> > > > >>>>>>> val jobId = env.submit(...)
>>> > > > >>>>>>> val jobStatus =
>>> env.getClusterClient().queryJobStatus(jobId)
>>> > > > >>>>>>> env.getClusterClient().cancelJob(jobId)
>>> > > > >>>>>>>
>>> > > > >>>>>>> What do you think ?
>>> > > > >>>>>>>
>>> > > > >>>>>>>
>>> > > > >>>>>>>
>>> > > > >>>>>>>
>>> > > > >>>>>>> Till Rohrmann <tr...@apache.org> 于2018年12月11日周二
>>> 下午6:28写道:
>>> > > > >>>>>>>
>>> > > > >>>>>>>> Hi Jeff,
>>> > > > >>>>>>>>
>>> > > > >>>>>>>> what you are proposing is to provide the user with better
>>> > > > >>>>> programmatic
>>> > > > >>>>>>> job
>>> > > > >>>>>>>> control. There was actually an effort to achieve this but
>>> it
>>> > > > >> has
>>> > > > >>>>> never
>>> > > > >>>>>>> been
>>> > > > >>>>>>>> completed [1]. However, there are some improvement in the
>>> code
>>> > > > >>> base
>>> > > > >>>>>> now.
>>> > > > >>>>>>>> Look for example at the NewClusterClient interface which
>>> > > > >> offers a
>>> > > > >>>>>>>> non-blocking job submission. But I agree that we need to
>>> > > > >> improve
>>> > > > >>>>> Flink
>>> > > > >>>>>> in
>>> > > > >>>>>>>> this regard.
>>> > > > >>>>>>>>
>>> > > > >>>>>>>> I would not be in favour if exposing all ClusterClient
>>> calls
>>> > > > >> via
>>> > > > >>>> the
>>> > > > >>>>>>>> ExecutionEnvironment because it would clutter the class
>>> and
>>> > > > >> would
>>> > > > >>>> not
>>> > > > >>>>>> be
>>> > > > >>>>>>> a
>>> > > > >>>>>>>> good separation of concerns. Instead one idea could be to
>>> > > > >>> retrieve
>>> > > > >>>>> the
>>> > > > >>>>>>>> current ClusterClient from the ExecutionEnvironment which
>>> can
>>> > > > >>> then
>>> > > > >>>> be
>>> > > > >>>>>>> used
>>> > > > >>>>>>>> for cluster and job control. But before we start an effort
>>> > > > >> here,
>>> > > > >>> we
>>> > > > >>>>>> need
>>> > > > >>>>>>> to
>>> > > > >>>>>>>> agree and capture what functionality we want to provide.
>>> > > > >>>>>>>>
>>> > > > >>>>>>>> Initially, the idea was that we have the ClusterDescriptor
>>> > > > >>>> describing
>>> > > > >>>>>> how
>>> > > > >>>>>>>> to talk to cluster manager like Yarn or Mesos. The
>>> > > > >>>> ClusterDescriptor
>>> > > > >>>>>> can
>>> > > > >>>>>>> be
>>> > > > >>>>>>>> used for deploying Flink clusters (job and session) and
>>> gives
>>> > > > >>> you a
>>> > > > >>>>>>>> ClusterClient. The ClusterClient controls the cluster
>>> (e.g.
>>> > > > >>>>> submitting
>>> > > > >>>>>>>> jobs, listing all running jobs). And then there was the
>>> idea
>>> > to
>>> > > > >>>>>>> introduce a
>>> > > > >>>>>>>> JobClient which you obtain from the ClusterClient to
>>> trigger
>>> > > > >> job
>>> > > > >>>>>> specific
>>> > > > >>>>>>>> operations (e.g. taking a savepoint, cancelling the job).
>>> > > > >>>>>>>>
>>> > > > >>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-4272
>>> > > > >>>>>>>>
>>> > > > >>>>>>>> Cheers,
>>> > > > >>>>>>>> Till
>>> > > > >>>>>>>>
>>> > > > >>>>>>>> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <
>>> zjffdu@gmail.com
>>> > >
>>> > > > >>>>> wrote:
>>> > > > >>>>>>>>
>>> > > > >>>>>>>>> Hi Folks,
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> I am trying to integrate flink into apache zeppelin
>>> which is
>>> > > > >> an
>>> > > > >>>>>>>> interactive
>>> > > > >>>>>>>>> notebook. And I hit several issues that is caused by
>>> flink
>>> > > > >>> client
>>> > > > >>>>>> api.
>>> > > > >>>>>>> So
>>> > > > >>>>>>>>> I'd like to proposal the following changes for flink
>>> client
>>> > > > >>> api.
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> 1. Support nonblocking execution. Currently,
>>> > > > >>>>>>> ExecutionEnvironment#execute
>>> > > > >>>>>>>>> is a blocking method which would do 2 things, first
>>> submit
>>> > > > >> job
>>> > > > >>>> and
>>> > > > >>>>>> then
>>> > > > >>>>>>>>> wait for job until it is finished. I'd like introduce a
>>> > > > >>>> nonblocking
>>> > > > >>>>>>>>> execution method like ExecutionEnvironment#submit which
>>> only
>>> > > > >>>> submit
>>> > > > >>>>>> job
>>> > > > >>>>>>>> and
>>> > > > >>>>>>>>> then return jobId to client. And allow user to query the
>>> job
>>> > > > >>>> status
>>> > > > >>>>>> via
>>> > > > >>>>>>>> the
>>> > > > >>>>>>>>> jobId.
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> 2. Add cancel api in
>>> > > > >>>>> ExecutionEnvironment/StreamExecutionEnvironment,
>>> > > > >>>>>>>>> currently the only way to cancel job is via cli
>>> (bin/flink),
>>> > > > >>> this
>>> > > > >>>>> is
>>> > > > >>>>>>> not
>>> > > > >>>>>>>>> convenient for downstream project to use this feature.
>>> So I'd
>>> > > > >>>> like
>>> > > > >>>>> to
>>> > > > >>>>>>> add
>>> > > > >>>>>>>>> cancel api in ExecutionEnvironment
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> 3. Add savepoint api in
>>> > > > >>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
>>> > > > >>>>>>>> It
>>> > > > >>>>>>>>> is similar as cancel api, we should use
>>> ExecutionEnvironment
>>> > > > >> as
>>> > > > >>>> the
>>> > > > >>>>>>>> unified
>>> > > > >>>>>>>>> api for third party to integrate with flink.
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> 4. Add listener for job execution lifecycle. Something
>>> like
>>> > > > >>>>>> following,
>>> > > > >>>>>>> so
>>> > > > >>>>>>>>> that downstream project can do custom logic in the
>>> lifecycle
>>> > > > >> of
>>> > > > >>>>> job.
>>> > > > >>>>>>> e.g.
>>> > > > >>>>>>>>> Zeppelin would capture the jobId after job is submitted
>>> and
>>> > > > >>> then
>>> > > > >>>>> use
>>> > > > >>>>>>> this
>>> > > > >>>>>>>>> jobId to cancel it later when necessary.
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> public interface JobListener {
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>>   void onJobSubmitted(JobID jobId);
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>>   void onJobExecuted(JobExecutionResult jobResult);
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>>   void onJobCanceled(JobID jobId);
>>> > > > >>>>>>>>> }
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> 5. Enable session in ExecutionEnvironment. Currently it
>>> is
>>> > > > >>>>> disabled,
>>> > > > >>>>>>> but
>>> > > > >>>>>>>>> session is very convenient for third party to submitting
>>> jobs
>>> > > > >>>>>>>> continually.
>>> > > > >>>>>>>>> I hope flink can enable it again.
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> 6. Unify all flink client api into
>>> > > > >>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> This is a long term issue which needs more careful
>>> thinking
>>> > > > >> and
>>> > > > >>>>>> design.
>>> > > > >>>>>>>>> Currently some of features of flink is exposed in
>>> > > > >>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment, but
>>> some are
>>> > > > >>>>> exposed
>>> > > > >>>>>>> in
>>> > > > >>>>>>>>> cli instead of api, like the cancel and savepoint I
>>> mentioned
>>> > > > >>>>> above.
>>> > > > >>>>>> I
>>> > > > >>>>>>>>> think the root cause is due to that flink didn't unify
>>> the
>>> > > > >>>>>> interaction
>>> > > > >>>>>>>> with
>>> > > > >>>>>>>>> flink. Here I list 3 scenarios of flink operation
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>>   - Local job execution.  Flink will create
>>> LocalEnvironment
>>> > > > >>> and
>>> > > > >>>>>> then
>>> > > > >>>>>>>> use
>>> > > > >>>>>>>>>   this LocalEnvironment to create LocalExecutor for job
>>> > > > >>>> execution.
>>> > > > >>>>>>>>>   - Remote job execution. Flink will create ClusterClient
>>> > > > >>> first
>>> > > > >>>>> and
>>> > > > >>>>>>> then
>>> > > > >>>>>>>>>   create ContextEnvironment based on the ClusterClient
>>> and
>>> > > > >>> then
>>> > > > >>>>> run
>>> > > > >>>>>>> the
>>> > > > >>>>>>>>> job.
>>> > > > >>>>>>>>>   - Job cancelation. Flink will create ClusterClient
>>> first
>>> > > > >> and
>>> > > > >>>>> then
>>> > > > >>>>>>>> cancel
>>> > > > >>>>>>>>>   this job via this ClusterClient.
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> As you can see in the above 3 scenarios. Flink didn't
>>> use the
>>> > > > >>>> same
>>> > > > >>>>>>>>> approach(code path) to interact with flink
>>> > > > >>>>>>>>> What I propose is following:
>>> > > > >>>>>>>>> Create the proper LocalEnvironment/RemoteEnvironment
>>> (based
>>> > > > >> on
>>> > > > >>>> user
>>> > > > >>>>>>>>> configuration) --> Use this Environment to create proper
>>> > > > >>>>>> ClusterClient
>>> > > > >>>>>>>>> (LocalClusterClient or RestClusterClient) to interactive
>>> with
>>> > > > >>>>> Flink (
>>> > > > >>>>>>> job
>>> > > > >>>>>>>>> execution or cancelation)
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> This way we can unify the process of local execution and
>>> > > > >> remote
>>> > > > >>>>>>>> execution.
>>> > > > >>>>>>>>> And it is much easier for third party to integrate with
>>> > > > >> flink,
>>> > > > >>>>>> because
>>> > > > >>>>>>>>> ExecutionEnvironment is the unified entry point for
>>> flink.
>>> > > > >> What
>>> > > > >>>>> third
>>> > > > >>>>>>>> party
>>> > > > >>>>>>>>> needs to do is just pass configuration to
>>> > > > >> ExecutionEnvironment
>>> > > > >>>> and
>>> > > > >>>>>>>>> ExecutionEnvironment will do the right thing based on the
>>> > > > >>>>>>> configuration.
>>> > > > >>>>>>>>> Flink cli can also be considered as flink api consumer.
>>> it
>>> > > > >> just
>>> > > > >>>>> pass
>>> > > > >>>>>>> the
>>> > > > >>>>>>>>> configuration to ExecutionEnvironment and let
>>> > > > >>>> ExecutionEnvironment
>>> > > > >>>>> to
>>> > > > >>>>>>>>> create the proper ClusterClient instead of letting cli to
>>> > > > >>> create
>>> > > > >>>>>>>>> ClusterClient directly.
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> 6 would involve large code refactoring, so I think we can
>>> > > > >> defer
>>> > > > >>>> it
>>> > > > >>>>>> for
>>> > > > >>>>>>>>> future release, 1,2,3,4,5 could be done at once I
>>> believe.
>>> > > > >> Let
>>> > > > >>> me
>>> > > > >>>>>> know
>>> > > > >>>>>>>> your
>>> > > > >>>>>>>>> comments and feedback, thanks
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> --
>>> > > > >>>>>>>>> Best Regards
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>> Jeff Zhang
>>> > > > >>>>>>>>>
>>> > > > >>>>>>>>
>>> > > > >>>>>>>
>>> > > > >>>>>>>
>>> > > > >>>>>>> --
>>> > > > >>>>>>> Best Regards
>>> > > > >>>>>>>
>>> > > > >>>>>>> Jeff Zhang
>>> > > > >>>>>>>
>>> > > > >>>>>>
>>> > > > >>>>>
>>> > > > >>>>>
>>> > > > >>>>> --
>>> > > > >>>>> Best Regards
>>> > > > >>>>>
>>> > > > >>>>> Jeff Zhang
>>> > > > >>>>>
>>> > > > >>>>
>>> > > > >>>
>>> > > > >>>
>>> > > > >>> --
>>> > > > >>> Best Regards
>>> > > > >>>
>>> > > > >>> Jeff Zhang
>>> > > > >>>
>>> > > > >>
>>> > > > >
>>> > > > >
>>> > > > > --
>>> > > > > Best Regards
>>> > > > >
>>> > > > > Jeff Zhang
>>> > > >
>>> > > >
>>> > >
>>> > > --
>>> > > Best Regards
>>> > >
>>> > > Jeff Zhang
>>> > >
>>> >
>>>
>>

-- 
Best Regards

Jeff Zhang

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Zili Chen <wa...@gmail.com>.
Hi all,

After a closer look at our client APIs, I see two major issues with
consistency and integration: the deployment of a job cluster, which
couples job graph creation with cluster deployment, and submission via
CliFrontend, which confuses the control flow of job graph compilation and
job submission. I'd like to follow the discussion above, mainly the
process described by Jeff and Stephan, and share my ideas on these issues.

1) CliFrontend confuses the control flow of job compilation and submission.
Following the submission process Stephan and Jeff described, the execution
environment knows all configs of the cluster and the topology/settings
of the job. Ideally, the main method of the user program calls #execute
(or, say, #submit), and Flink deploys the cluster, compiles the job graph,
and submits it to the cluster. However, the current CliFrontend does all
these things inside its #runProgram method, which introduces a lot of
subclasses of the (stream) execution environment.

Actually, it sets up an exec env that hijacks the #execute/#executePlan
method, initializes the job graph, and aborts execution. Control then
flows back to CliFrontend, which deploys the cluster (or retrieves the
client) and submits the job graph. This is quite a specific internal
process inside Flink, consistent with nothing else.

2) Deployment of a job cluster couples job graph creation and cluster
deployment. Abstractly, going from a user job to a concrete submission
requires the following dependency:

     create JobGraph --\

create ClusterClient -->  submit JobGraph

A ClusterClient is created by deploying a cluster or retrieving an
existing one. JobGraph submission requires a compiled JobGraph and a valid
ClusterClient, but the creation of the ClusterClient is abstractly
independent of that of the JobGraph. However, in job cluster mode we
deploy the job cluster with a job graph, which means we use another
process:

create JobGraph --> deploy cluster with the JobGraph

This is another inconsistency, and downstream projects/client APIs are
forced to handle different cases with little support from Flink.

Since we have likely reached a consensus that

1. all configs are gathered into the Flink configuration and passed along, and
2. the execution environment knows all configs and handles execution (both
deployment and submission),

I propose eliminating the inconsistencies above with the following
approach:

1) CliFrontend should be exactly a front end, at least for the "run"
command. That means it just gathers all configs from the command line and
passes them to the main method of the user program. The execution
environment then knows all the info, and with some added utilities for
ClusterClient, we gracefully get a ClusterClient by deploying or
retrieving. In this way, we don't need to hijack the #execute/#executePlan
methods and can remove the various hacky subclasses of the exec env, as
well as the #run methods in ClusterClient (toward an interface-ized
ClusterClient). The control flow then goes from CliFrontend to the main
method and never returns.

2) A job cluster is a cluster for a specific job. From another
perspective, it is an ephemeral session. We could decouple the deployment
from the compiled job graph: start a session with an idle timeout and
submit the job afterwards.
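To make the proposed shape of the control flow concrete, here is a minimal
self-contained sketch. Names such as UnifiedEnvironment,
deployOrRetrieveClient, the "execution.target" config key, and the idle
timeout handling are all hypothetical, chosen only to illustrate the idea:

```java
// Illustrative sketch of the proposed unified flow: the environment reads
// the configuration, obtains a ClusterClient by deploying or retrieving,
// and submits the compiled job graph. All names are hypothetical.
import java.util.Map;

interface ClusterClient {
    String submitJobGraph(String jobGraph);
}

// Long-running session cluster, retrieved rather than deployed.
class SessionClusterClient implements ClusterClient {
    public String submitJobGraph(String jobGraph) {
        return "submitted-to-session:" + jobGraph;
    }
}

// Proposal 2: the "job cluster" modeled as an ephemeral session with an
// idle timeout, deployed without a job graph; the job is submitted after.
class EphemeralSessionClient implements ClusterClient {
    final long idleTimeoutMs;
    EphemeralSessionClient(long idleTimeoutMs) { this.idleTimeoutMs = idleTimeoutMs; }
    public String submitJobGraph(String jobGraph) {
        return "submitted-to-ephemeral:" + jobGraph;
    }
}

class UnifiedEnvironment {
    private final Map<String, String> config;
    UnifiedEnvironment(Map<String, String> config) { this.config = config; }

    // Deploy a new cluster or retrieve an existing one, based only on config.
    public ClusterClient deployOrRetrieveClient() {
        if ("job".equals(config.get("execution.target"))) {
            return new EphemeralSessionClient(60_000L);
        }
        return new SessionClusterClient();
    }

    // Proposal 1: execute() owns the whole flow; the front end only
    // gathers configs and never takes control back.
    public String execute(String jobName) {
        String jobGraph = "JobGraph(" + jobName + ")"; // compile
        return deployOrRetrieveClient().submitJobGraph(jobGraph);
    }
}

public class UnifiedFlowSketch {
    public static void main(String[] args) {
        var session = new UnifiedEnvironment(Map.of("execution.target", "session"));
        var job = new UnifiedEnvironment(Map.of("execution.target", "job"));
        System.out.println(session.execute("wordcount"));
        System.out.println(job.execute("wordcount"));
    }
}
```

The point of the sketch is that session execution and job-cluster execution
go through one code path, differing only in configuration.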

These topics are better surfaced and discussed for a consensus before we
go into more detail on design or implementation.

Best,
tison.


Zili Chen <wa...@gmail.com> 于2019年6月20日周四 上午3:21写道:

> Hi Jeff,
>
> Thanks for raising this thread and the design document!
>
> As @Thomas Weise mentioned above, extending config to Flink
> requires far more effort than it should. Another example is that
> we achieve detached mode by introducing another execution
> environment which also hijacks the #execute method.
>
> I agree with your idea that the user configures everything and
> Flink "just" respects it. On this topic I think the unusual
> control flow when CliFrontend handles the "run" command is the
> problem. It handles several configs, mainly about cluster settings,
> so the main method of the user program is unaware of them. Also, it
> compiles the app to a job graph by running the main method with a
> hijacked exec env, which constrains the main method further.
>
> I'd like to write down a few notes on passing and respecting
> configs/args, as well as on decoupling job compilation and
> submission. I will share them on this thread later.
>
> Best,
> tison.
>
>
> SHI Xiaogang <sh...@gmail.com> 于2019年6月17日周一 下午7:29写道:
>
>> Hi Jeff and Flavio,
>>
>> Thanks Jeff a lot for proposing the design document.
>>
>> We are also working on refactoring ClusterClient to allow flexible and
>> efficient job management in our real-time platform.
>> We would like to draft a document to share our ideas with you.
>>
>> I think it's a good idea to have something like Apache Livy for Flink, and
>> the efforts discussed here will take a great step forward to it.
>>
>> Regards,
>> Xiaogang
>>
>> Flavio Pompermaier <po...@okkam.it> 于2019年6月17日周一 下午7:13写道:
>>
>> > Is there any possibility to have something like Apache Livy [1] also for
>> > Flink in the future?
>> >
>> > [1] https://livy.apache.org/
>> >
>> > On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zj...@gmail.com> wrote:
>> >
>> > > >>>  Any API we expose should not have dependencies on the runtime
>> > > (flink-runtime) package or other implementation details. To me, this
>> > means
>> > > that the current ClusterClient cannot be exposed to users because it
>> >  uses
>> > > quite some classes from the optimiser and runtime packages.
>> > >
>> > > We should change ClusterClient from class to interface.
>> > > ExecutionEnvironment only use the interface ClusterClient which
>> should be
>> > > in flink-clients while the concrete implementation class could be in
>> > > flink-runtime.
>> > >
>> > > >>> What happens when a failure/restart in the client happens? There
>> need
>> > > to be a way of re-establishing the connection to the job, set up the
>> > > listeners again, etc.
>> > >
>> > > Good point.  First we need to define what does failure/restart in the
>> > > client mean. IIUC, that usually mean network failure which will
>> happen in
>> > > class RestClient. If my understanding is correct, restart/retry
>> mechanism
>> > > should be done in RestClient.
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > Aljoscha Krettek <al...@apache.org> 于2019年6月11日周二 下午11:10写道:
>> > >
>> > > > Some points to consider:
>> > > >
>> > > > * Any API we expose should not have dependencies on the runtime
>> > > > (flink-runtime) package or other implementation details. To me, this
>> > > means
>> > > > that the current ClusterClient cannot be exposed to users because it
>> > >  uses
>> > > > quite some classes from the optimiser and runtime packages.
>> > > >
>> > > > * What happens when a failure/restart in the client happens? There
>> need
>> > > to
>> > > > be a way of re-establishing the connection to the job, set up the
>> > > listeners
>> > > > again, etc.
>> > > >
>> > > > Aljoscha
>> > > >
>> > > > > On 29. May 2019, at 10:17, Jeff Zhang <zj...@gmail.com> wrote:
>> > > > >
>> > > > > Sorry folks, the design doc is late as you expected. Here's the
>> > design
>> > > > doc
>> > > > > I drafted, welcome any comments and feedback.
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
>> > > > >
>> > > > >
>> > > > >
>> > > > > Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午8:43写道:
>> > > > >
>> > > > >> Nice that this discussion is happening.
>> > > > >>
>> > > > >> In the FLIP, we could also revisit the entire role of the
>> > environments
>> > > > >> again.
>> > > > >>
>> > > > >> Initially, the idea was:
>> > > > >>  - the environments take care of the specific setup for
>> standalone
>> > (no
>> > > > >> setup needed), yarn, mesos, etc.
>> > > > >>  - the session ones have control over the session. The
>> environment
>> > > holds
>> > > > >> the session client.
>> > > > >>  - running a job gives a "control" object for that job. That
>> > behavior
>> > > is
>> > > > >> the same in all environments.
>> > > > >>
>> > > > >> The actual implementation diverged quite a bit from that. Happy
>> to
>> > > see a
>> > > > >> discussion about straightening this out a bit more.
>> > > > >>
>> > > > >>
>> > > > >> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zj...@gmail.com>
>> > wrote:
>> > > > >>
>> > > > >>> Hi folks,
>> > > > >>>
>> > > > >>> Sorry for late response, It seems we reach consensus on this, I
>> > will
>> > > > >> create
>> > > > >>> FLIP for this with more detailed design
>> > > > >>>
>> > > > >>>
>> > > > >>> Thomas Weise <th...@apache.org> 于2018年12月21日周五 上午11:43写道:
>> > > > >>>
>> > > > >>>> Great to see this discussion seeded! The problems you face with
>> > the
>> > > > >>>> Zeppelin integration are also affecting other downstream
>> projects,
>> > > > like
>> > > > >>>> Beam.
>> > > > >>>>
>> > > > >>>> We just enabled the savepoint restore option in
>> > > > RemoteStreamEnvironment
>> > > > >>> [1]
>> > > > >>>> and that was more difficult than it should be. The main issue
>> is
>> > > that
>> > > > >>>> environment and cluster client aren't decoupled. Ideally it
>> should
>> > > be
>> > > > >>>> possible to just get the matching cluster client from the
>> > > environment
>> > > > >> and
>> > > > >>>> then control the job through it (environment as factory for
>> > cluster
>> > > > >>>> client). But note that the environment classes are part of the
>> > > public
>> > > > >>> API,
>> > > > >>>> and it is not straightforward to make larger changes without
>> > > breaking
>> > > > >>>> backward compatibility.
>> > > > >>>>
>> > > > >>>> ClusterClient currently exposes internal classes like JobGraph
>> and
>> > > > >>>> StreamGraph. But it should be possible to wrap this with a new
>> > > public
>> > > > >> API
>> > > > >>>> that brings the required job control capabilities for
>> downstream
>> > > > >>> projects.
>> > > > >>>> Perhaps it is helpful to look at some of the interfaces in Beam
>> > > while
>> > > > >>>> thinking about this: [2] for the portable job API and [3] for
>> the
>> > > old
>> > > > >>>> asynchronous job control from the Beam Java SDK.
>> > > > >>>>
>> > > > >>>> The backward compatibility discussion [4] is also relevant
>> here. A
>> > > new
>> > > > >>> API
>> > > > >>>> should shield downstream projects from internals and allow
>> them to
>> > > > >>>> interoperate with multiple future Flink versions in the same
>> > release
>> > > > >> line
>> > > > >>>> without forced upgrades.
>> > > > >>>>
>> > > > >>>> Thanks,
>> > > > >>>> Thomas
>> > > > >>>>
>> > > > >>>> [1] https://github.com/apache/flink/pull/7249
>> > > > >>>> [2]
>> > > > >>>>
>> > > > >>>>
>> > > > >>>
>> > > > >>
>> > > >
>> > >
>> >
>> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
>> > > > >>>> [3]
>> > > > >>>>
>> > > > >>>>
>> > > > >>>
>> > > > >>
>> > > >
>> > >
>> >
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
>> > > > >>>> [4]
>> > > > >>>>
>> > > > >>>>
>> > > > >>>
>> > > > >>
>> > > >
>> > >
>> >
>> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
>> > > > >>>>
>> > > > >>>>
>> > > > >>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zj...@gmail.com>
>> > > wrote:
>> > > > >>>>
>> > > > >>>>>>>> I'm not so sure whether the user should be able to define
>> > where
>> > > > >> the
>> > > > >>>> job
>> > > > >>>>> runs (in your example Yarn). This is actually independent of
>> the
>> > > job
>> > > > >>>>> development and is something which is decided at deployment
>> time.
>> > > > >>>>>
>> > > > >>>>> User don't need to specify execution mode programmatically.
>> They
>> > > can
>> > > > >>> also
>> > > > >>>>> pass the execution mode from the arguments in flink run
>> command.
>> > > e.g.
>> > > > >>>>>
>> > > > >>>>> bin/flink run -m yarn-cluster ....
>> > > > >>>>> bin/flink run -m local ...
>> > > > >>>>> bin/flink run -m host:port ...
>> > > > >>>>>
>> > > > >>>>> Does this make sense to you ?
>> > > > >>>>>
>> > > > >>>>>>>> To me it makes sense that the ExecutionEnvironment is not
>> > > > >> directly
>> > > > >>>>> initialized by the user and instead context sensitive how you
>> > want
>> > > to
>> > > > >>>>> execute your job (Flink CLI vs. IDE, for example).
>> > > > >>>>>
>> > > > >>>>> Right, currently I notice Flink would create different
>> > > > >>>>> ContextExecutionEnvironment based on different submission
>> > scenarios
>> > > > >>>> (Flink
>> > > > >>>>> Cli vs IDE). To me this is kind of hack approach, not so
>> > > > >>> straightforward.
>> > > > >>>>> What I suggested above is that is that flink should always
>> create
>> > > the
>> > > > >>>> same
>> > > > >>>>> ExecutionEnvironment but with different configuration, and
>> based
>> > on
>> > > > >> the
>> > > > >>>>> configuration it would create the proper ClusterClient for
>> > > different
>> > > > >>>>> behaviors.
>> > > > >>>>>
>> > > > >>>>>
>> > > > >>>>>
>> > > > >>>>>
>> > > > >>>>>
>> > > > >>>>>
>> > > > >>>>>
>> > > > >>>>> Till Rohrmann <tr...@apache.org> 于2018年12月20日周四
>> 下午11:18写道:
>> > > > >>>>>
>> > > > >>>>>> You are probably right that we have code duplication when it
>> > comes
>> > > > >> to
>> > > > >>>> the
>> > > > >>>>>> creation of the ClusterClient. This should be reduced in the
>> > > > >> future.
>> > > > >>>>>>
>> > > > >>>>>> I'm not so sure whether the user should be able to define
>> where
>> > > the
>> > > > >>> job
>> > > > >>>>>> runs (in your example Yarn). This is actually independent of
>> the
>> > > > >> job
>> > > > >>>>>> development and is something which is decided at deployment
>> > time.
>> > > > >> To
>> > > > >>> me
>> > > > >>>>> it
>> > > > >>>>>> makes sense that the ExecutionEnvironment is not directly
>> > > > >> initialized
>> > > > >>>> by
>> > > > >>>>>> the user and instead context sensitive how you want to
>> execute
>> > > your
>> > > > >>> job
>> > > > >>>>>> (Flink CLI vs. IDE, for example). However, I agree that the
>> > > > >>>>>> ExecutionEnvironment should give you access to the
>> ClusterClient
>> > > > >> and
>> > > > >>> to
>> > > > >>>>> the
>> > > > >>>>>> job (maybe in the form of the JobGraph or a job plan).
>> > > > >>>>>>
>> > > > >>>>>> Cheers,
>> > > > >>>>>> Till
>> > > > >>>>>>
>> > > > >>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zjffdu@gmail.com
>> >
>> > > > >> wrote:
>> > > > >>>>>>
>> > > > >>>>>>> Hi Till,
>> > > > >>>>>>> Thanks for the feedback. You are right that I expect better
>> > > > >>>>> programmatic
>> > > > >>>>>>> job submission/control api which could be used by downstream
>> > > > >>> project.
>> > > > >>>>> And
>> > > > >>>>>>> it would benefit for the flink ecosystem. When I look at the
>> > code
>> > > > >>> of
>> > > > >>>>>> flink
>> > > > >>>>>>> scala-shell and sql-client (I believe they are not the core
>> of
>> > > > >>> flink,
>> > > > >>>>> but
>> > > > >>>>>>> belong to the ecosystem of flink), I find many duplicated
>> code
>> > > > >> for
>> > > > >>>>>> creating
>> > > > >>>>>>> ClusterClient from user provided configuration
>> (configuration
>> > > > >>> format
>> > > > >>>>> may
>> > > > >>>>>> be
>> > > > >>>>>>> different from scala-shell and sql-client) and then use that
>> > > > >>>>>> ClusterClient
>> > > > >>>>>>> to manipulate jobs. I don't think this is convenient for
>> > > > >> downstream
>> > > > >>>>>>> projects. What I expect is that downstream project only
>> needs
>> > to
>> > > > >>>>> provide
>> > > > >>>>>>> necessary configuration info (maybe introducing class
>> > FlinkConf),
>> > > > >>> and
>> > > > >>>>>> then
>> > > > >>>>>>> build ExecutionEnvironment based on this FlinkConf, and
>> > > > >>>>>>> ExecutionEnvironment will create the proper ClusterClient.
>> It
>> > not
>> > > > >>>> only
>> > > > >>>>>>> benefit for the downstream project development but also be
>> > > > >> helpful
>> > > > >>>> for
>> > > > >>>>>>> their integration test with flink. Here's one sample code
>> > snippet
>> > > > >>>> that
>> > > > >>>>> I
>> > > > >>>>>>> expect.
>> > > > >>>>>>>
>> > > > >>>>>>> val conf = new FlinkConf().mode("yarn")
>> > > > >>>>>>> val env = new ExecutionEnvironment(conf)
>> > > > >>>>>>> val jobId = env.submit(...)
>> > > > >>>>>>> val jobStatus = env.getClusterClient().queryJobStatus(jobId)
>> > > > >>>>>>> env.getClusterClient().cancelJob(jobId)
>> > > > >>>>>>>
>> > > > >>>>>>> What do you think ?
>> > > > >>>>>>>
>> > > > >>>>>>>
>> > > > >>>>>>>
>> > > > >>>>>>>
>> > > > >>>>>>> Till Rohrmann <tr...@apache.org> 于2018年12月11日周二
>> 下午6:28写道:
>> > > > >>>>>>>
>> > > > >>>>>>>> Hi Jeff,
>> > > > >>>>>>>>
>> > > > >>>>>>>> what you are proposing is to provide the user with better
>> > > > >>>>> programmatic
>> > > > >>>>>>> job
>> > > > >>>>>>>> control. There was actually an effort to achieve this but
>> it
>> > > > >> has
>> > > > >>>>> never
>> > > > >>>>>>> been
>> > > > >>>>>>>> completed [1]. However, there are some improvement in the
>> code
>> > > > >>> base
>> > > > >>>>>> now.
>> > > > >>>>>>>> Look for example at the NewClusterClient interface which
>> > > > >> offers a
>> > > > >>>>>>>> non-blocking job submission. But I agree that we need to
>> > > > >> improve
>> > > > >>>>> Flink
>> > > > >>>>>> in
>> > > > >>>>>>>> this regard.
>> > > > >>>>>>>>
>> > > > >>>>>>>> I would not be in favour if exposing all ClusterClient
>> calls
>> > > > >> via
>> > > > >>>> the
>> > > > >>>>>>>> ExecutionEnvironment because it would clutter the class and
>> > > > >> would
>> > > > >>>> not
>> > > > >>>>>> be
>> > > > >>>>>>> a
>> > > > >>>>>>>> good separation of concerns. Instead one idea could be to
>> > > > >>> retrieve
>> > > > >>>>> the
>> > > > >>>>>>>> current ClusterClient from the ExecutionEnvironment which
>> can
>> > > > >>> then
>> > > > >>>> be
>> > > > >>>>>>> used
>> > > > >>>>>>>> for cluster and job control. But before we start an effort
>> > > > >> here,
>> > > > >>> we
>> > > > >>>>>> need
>> > > > >>>>>>> to
>> > > > >>>>>>>> agree and capture what functionality we want to provide.
>> > > > >>>>>>>>
>> > > > >>>>>>>> Initially, the idea was that we have the ClusterDescriptor
>> > > > >>>> describing
>> > > > >>>>>> how
>> > > > >>>>>>>> to talk to cluster manager like Yarn or Mesos. The
>> > > > >>>> ClusterDescriptor
>> > > > >>>>>> can
>> > > > >>>>>>> be
>> > > > >>>>>>>> used for deploying Flink clusters (job and session) and
>> gives
>> > > > >>> you a
>> > > > >>>>>>>> ClusterClient. The ClusterClient controls the cluster (e.g.
>> > > > >>>>> submitting
>> > > > >>>>>>>> jobs, listing all running jobs). And then there was the
>> idea
>> > to
>> > > > >>>>>>> introduce a
>> > > > >>>>>>>> JobClient which you obtain from the ClusterClient to
>> trigger
>> > > > >> job
>> > > > >>>>>> specific
>> > > > >>>>>>>> operations (e.g. taking a savepoint, cancelling the job).
>> > > > >>>>>>>>
>> > > > >>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-4272
>> > > > >>>>>>>>
>> > > > >>>>>>>> Cheers,
>> > > > >>>>>>>> Till
>> > > > >>>>>>>>
>> > > > >>>>>>>> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <
>> zjffdu@gmail.com
>> > >
>> > > > >>>>> wrote:
>> > > > >>>>>>>>
>> > > > >>>>>>>>> Hi Folks,
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> I am trying to integrate flink into apache zeppelin which
>> is
>> > > > >> an
>> > > > >>>>>>>> interactive
>> > > > >>>>>>>>> notebook. And I hit several issues that is caused by flink
>> > > > >>> client
>> > > > >>>>>> api.
>> > > > >>>>>>> So
>> > > > >>>>>>>>> I'd like to proposal the following changes for flink
>> client
>> > > > >>> api.
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> 1. Support nonblocking execution. Currently,
>> > > > >>>>>>> ExecutionEnvironment#execute
>> > > > >>>>>>>>> is a blocking method which would do 2 things, first submit
>> > > > >> job
>> > > > >>>> and
>> > > > >>>>>> then
>> > > > >>>>>>>>> wait for job until it is finished. I'd like introduce a
>> > > > >>>> nonblocking
>> > > > >>>>>>>>> execution method like ExecutionEnvironment#submit which
>> only
>> > > > >>>> submit
>> > > > >>>>>> job
>> > > > >>>>>>>> and
>> > > > >>>>>>>>> then return jobId to client. And allow user to query the
>> job
>> > > > >>>> status
>> > > > >>>>>> via
>> > > > >>>>>>>> the
>> > > > >>>>>>>>> jobId.
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> 2. Add cancel api in
>> > > > >>>>> ExecutionEnvironment/StreamExecutionEnvironment,
>> > > > >>>>>>>>> currently the only way to cancel job is via cli
>> (bin/flink),
>> > > > >>> this
>> > > > >>>>> is
>> > > > >>>>>>> not
>> > > > >>>>>>>>> convenient for downstream project to use this feature. So
>> I'd
>> > > > >>>> like
>> > > > >>>>> to
>> > > > >>>>>>> add
>> > > > >>>>>>>>> cancel api in ExecutionEnvironment
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> 3. Add savepoint api in
>> > > > >>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
>> > > > >>>>>>>> It
>> > > > >>>>>>>>> is similar as cancel api, we should use
>> ExecutionEnvironment
>> > > > >> as
>> > > > >>>> the
>> > > > >>>>>>>> unified
>> > > > >>>>>>>>> api for third party to integrate with flink.
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> 4. Add listener for job execution lifecycle. Something
>> like
>> > > > >>>>>> following,
>> > > > >>>>>>> so
>> > > > >>>>>>>>> that downstream project can do custom logic in the
>> lifecycle
>> > > > >> of
>> > > > >>>>> job.
>> > > > >>>>>>> e.g.
>> > > > >>>>>>>>> Zeppelin would capture the jobId after job is submitted
>> and
>> > > > >>> then
>> > > > >>>>> use
>> > > > >>>>>>> this
>> > > > >>>>>>>>> jobId to cancel it later when necessary.
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> public interface JobListener {
>> > > > >>>>>>>>>
>> > > > >>>>>>>>>   void onJobSubmitted(JobID jobId);
>> > > > >>>>>>>>>
>> > > > >>>>>>>>>   void onJobExecuted(JobExecutionResult jobResult);
>> > > > >>>>>>>>>
>> > > > >>>>>>>>>   void onJobCanceled(JobID jobId);
>> > > > >>>>>>>>> }
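[Editor's note: a rough sketch of how an execution environment could drive such listener callbacks. All class and method names below are illustrative stand-ins, not actual Flink API; job ids are simplified to strings.]

```java
// Hypothetical sketch: an execution environment dispatching JobListener
// callbacks around submission and cancelation, so a downstream project
// (e.g. a notebook) can capture the job id and cancel the job later.
import java.util.ArrayList;
import java.util.List;

public class JobListenerSketch {

    interface JobListener {
        void onJobSubmitted(String jobId);
        void onJobCanceled(String jobId);
    }

    static final class Env {
        private final List<JobListener> listeners = new ArrayList<>();

        void addJobListener(JobListener l) { listeners.add(l); }

        String submit() {
            String jobId = "job-1"; // a real env would get this from the cluster
            listeners.forEach(l -> l.onJobSubmitted(jobId));
            return jobId;
        }

        void cancel(String jobId) {
            listeners.forEach(l -> l.onJobCanceled(jobId));
        }
    }

    public static void main(String[] args) {
        Env env = new Env();
        final String[] captured = new String[1];
        env.addJobListener(new JobListener() {
            public void onJobSubmitted(String jobId) { captured[0] = jobId; }
            public void onJobCanceled(String jobId) { System.out.println("canceled " + jobId); }
        });
        env.submit();
        env.cancel(captured[0]); // cancel later using the captured id
    }
}
```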
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> 5. Enable session in ExecutionEnvironment. Currently it is
>> > > > >>>>> disabled,
>> > > > >>>>>>> but
>> > > > >>>>>>>>> session is very convenient for third party to submitting
>> jobs
>> > > > >>>>>>>> continually.
>> > > > >>>>>>>>> I hope flink can enable it again.
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> 6. Unify all flink client api into
>> > > > >>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> This is a long term issue which needs more careful
>> thinking
>> > > > >> and
>> > > > >>>>>> design.
>> > > > >>>>>>>>> Currently some of features of flink is exposed in
>> > > > >>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment, but some
>> are
>> > > > >>>>> exposed
>> > > > >>>>>>> in
>> > > > >>>>>>>>> cli instead of api, like the cancel and savepoint I
>> mentioned
>> > > > >>>>> above.
>> > > > >>>>>> I
>> > > > >>>>>>>>> think the root cause is due to that flink didn't unify the
>> > > > >>>>>> interaction
>> > > > >>>>>>>> with
>> > > > >>>>>>>>> flink. Here I list 3 scenarios of flink operation
>> > > > >>>>>>>>>
>> > > > >>>>>>>>>   - Local job execution.  Flink will create
>> LocalEnvironment
>> > > > >>> and
>> > > > >>>>>> then
>> > > > >>>>>>>> use
>> > > > >>>>>>>>>   this LocalEnvironment to create LocalExecutor for job
>> > > > >>>> execution.
>> > > > >>>>>>>>>   - Remote job execution. Flink will create ClusterClient
>> > > > >>> first
>> > > > >>>>> and
>> > > > >>>>>>> then
>> > > > >>>>>>>>>   create ContextEnvironment based on the ClusterClient and
>> > > > >>> then
>> > > > >>>>> run
>> > > > >>>>>>> the
>> > > > >>>>>>>>> job.
>> > > > >>>>>>>>>   - Job cancelation. Flink will create ClusterClient first
>> > > > >> and
>> > > > >>>>> then
>> > > > >>>>>>>> cancel
>> > > > >>>>>>>>>   this job via this ClusterClient.
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> As you can see in the above 3 scenarios. Flink didn't use
>> the
>> > > > >>>> same
>> > > > >>>>>>>>> approach(code path) to interact with flink
>> > > > >>>>>>>>> What I propose is following:
>> > > > >>>>>>>>> Create the proper LocalEnvironment/RemoteEnvironment
>> (based
>> > > > >> on
>> > > > >>>> user
>> > > > >>>>>>>>> configuration) --> Use this Environment to create proper
>> > > > >>>>>> ClusterClient
>> > > > >>>>>>>>> (LocalClusterClient or RestClusterClient) to interactive
>> with
>> > > > >>>>> Flink (
>> > > > >>>>>>> job
>> > > > >>>>>>>>> execution or cancelation)
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> This way we can unify the process of local execution and
>> > > > >> remote
>> > > > >>>>>>>> execution.
>> > > > >>>>>>>>> And it is much easier for third party to integrate with
>> > > > >> flink,
>> > > > >>>>>> because
>> > > > >>>>>>>>> ExecutionEnvironment is the unified entry point for flink.
>> > > > >> What
>> > > > >>>>> third
>> > > > >>>>>>>> party
>> > > > >>>>>>>>> needs to do is just pass configuration to
>> > > > >> ExecutionEnvironment
>> > > > >>>> and
>> > > > >>>>>>>>> ExecutionEnvironment will do the right thing based on the
>> > > > >>>>>>> configuration.
>> > > > >>>>>>>>> Flink cli can also be considered as flink api consumer. it
>> > > > >> just
>> > > > >>>>> pass
>> > > > >>>>>>> the
>> > > > >>>>>>>>> configuration to ExecutionEnvironment and let
>> > > > >>>> ExecutionEnvironment
>> > > > >>>>> to
>> > > > >>>>>>>>> create the proper ClusterClient instead of letting cli to
>> > > > >>> create
>> > > > >>>>>>>>> ClusterClient directly.
>> > > > >>>>>>>>>
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> 6 would involve large code refactoring, so I think we can
>> > > > >> defer
>> > > > >>>> it
>> > > > >>>>>> for
>> > > > >>>>>>>>> future release, 1,2,3,4,5 could be done at once I believe.
>> > > > >> Let
>> > > > >>> me
>> > > > >>>>>> know
>> > > > >>>>>>>> your
>> > > > >>>>>>>>> comments and feedback, thanks
>> > > > >>>>>>>>>
>> > > > >>>>>>>>>
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> --
>> > > > >>>>>>>>> Best Regards
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> Jeff Zhang
>> > > > >>>>>>>>>
>> > > > >>>>>>>>
>> > > > >>>>>>>
>> > > > >>>>>>>
>> > > > >>>>>>> --
>> > > > >>>>>>> Best Regards
>> > > > >>>>>>>
>> > > > >>>>>>> Jeff Zhang
>> > > > >>>>>>>
>> > > > >>>>>>
>> > > > >>>>>
>> > > > >>>>>
>> > > > >>>>> --
>> > > > >>>>> Best Regards
>> > > > >>>>>
>> > > > >>>>> Jeff Zhang
>> > > > >>>>>
>> > > > >>>>
>> > > > >>>
>> > > > >>>
>> > > > >>> --
>> > > > >>> Best Regards
>> > > > >>>
>> > > > >>> Jeff Zhang
>> > > > >>>
>> > > > >>
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Best Regards
>> > > > >
>> > > > > Jeff Zhang
>> > > >
>> > > >
>> > >
>> > > --
>> > > Best Regards
>> > >
>> > > Jeff Zhang
>> > >
>> >
>>
>

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Zili Chen <wa...@gmail.com>.
Hi Jeff,

Thanks for raising this thread and the design document!

As @Thomas Weise mentioned above, extending configuration in Flink
requires far more effort than it should. Another example is that we
achieve detached mode by introducing another execution environment,
which also hijacks the #execute method.

I agree with your idea that the user should configure everything and
Flink should "just" respect it. On this topic, I think the unusual
control flow when CliFrontend handles the "run" command is the problem.
It handles several configs, mainly cluster settings, so the main method
of the user program is unaware of them. It also compiles the application
to a JobGraph by running the main method with a hijacked execution
environment, which constrains the main method further.

I'd like to write down a few notes on how configs/args are passed and
respected, as well as on decoupling job compilation from submission. I
will share them on this thread later.
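[Editor's note: for illustration, such a decoupling could look roughly like
the sketch below. The JobGraph stand-in and the compile/submit split are
hypothetical, not existing Flink API.]

```java
// Hypothetical sketch: compile the user program into a job graph without
// any cluster interaction, then submit the compiled graph separately and
// non-blockingly, so no execution environment needs to be hijacked.
import java.util.UUID;

public class DecoupledSubmissionSketch {

    /** Stand-in for a compiled job graph. */
    static final class JobGraph {
        final String name;
        JobGraph(String name) { this.name = name; }
    }

    /** Step 1: compilation only — no cluster involved. */
    static JobGraph compile(String programName) {
        return new JobGraph(programName);
    }

    /** Step 2: submission returns a job id immediately (non-blocking). */
    static String submit(JobGraph graph) {
        // A real submitter would talk to the cluster; here we just mint an id.
        return graph.name + "-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        JobGraph graph = compile("wordcount");
        String jobId = submit(graph);
        System.out.println("submitted job: " + jobId);
    }
}
```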

Best,
tison.


SHI Xiaogang <sh...@gmail.com> 于2019年6月17日周一 下午7:29写道:

> Hi Jeff and Flavio,
>
> Thanks Jeff a lot for proposing the design document.
>
> We are also working on refactoring ClusterClient to allow flexible and
> efficient job management in our real-time platform.
> We would like to draft a document to share our ideas with you.
>
> I think it's a good idea to have something like Apache Livy for Flink, and
> the efforts discussed here will take a great step forward to it.
>
> Regards,
> Xiaogang
>
> Flavio Pompermaier <po...@okkam.it> 于2019年6月17日周一 下午7:13写道:
>
> > Is there any possibility to have something like Apache Livy [1] also for
> > Flink in the future?
> >
> > [1] https://livy.apache.org/
> >
> > On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zj...@gmail.com> wrote:
> >
> > > >>>  Any API we expose should not have dependencies on the runtime
> > > (flink-runtime) package or other implementation details. To me, this
> > means
> > > that the current ClusterClient cannot be exposed to users because it
> >  uses
> > > quite some classes from the optimiser and runtime packages.
> > >
> > > We should change ClusterClient from a class to an interface.
> > > ExecutionEnvironment would use only the ClusterClient interface, which
> > > should live in flink-clients, while the concrete implementation class
> > > could live in flink-runtime.
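[Editor's note: a minimal sketch of what such an interface split could look
like. Method names and the in-memory implementation are illustrative only,
not the actual Flink ClusterClient API.]

```java
// Hypothetical sketch: a small ClusterClient interface that downstream
// projects depend on, with the concrete implementation kept elsewhere
// (e.g. a REST-based one in flink-runtime). Job ids simplified to strings.
public class ClusterClientSketch {

    /** The public-facing interface a downstream project would program against. */
    interface ClusterClient {
        String submitJob(String jobName);      // returns a job id
        String queryJobStatus(String jobId);   // e.g. "RUNNING", "CANCELED"
        void cancelJob(String jobId);
    }

    /** Trivial in-memory implementation standing in for a REST-based one. */
    static final class LocalClusterClient implements ClusterClient {
        private final java.util.Map<String, String> status = new java.util.HashMap<>();

        public String submitJob(String jobName) {
            String id = jobName + "-1";
            status.put(id, "RUNNING");
            return id;
        }

        public String queryJobStatus(String jobId) {
            return status.getOrDefault(jobId, "UNKNOWN");
        }

        public void cancelJob(String jobId) {
            status.put(jobId, "CANCELED");
        }
    }

    public static void main(String[] args) {
        ClusterClient client = new LocalClusterClient();
        String id = client.submitJob("wordcount");
        System.out.println(client.queryJobStatus(id));
        client.cancelJob(id);
        System.out.println(client.queryJobStatus(id));
    }
}
```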
> > >
> > > >>> What happens when a failure/restart in the client happens? There
> need
> > > to be a way of re-establishing the connection to the job, set up the
> > > listeners again, etc.
> > >
> > > Good point. First we need to define what failure/restart in the
> > > client means. IIUC, that usually means a network failure, which will
> > > happen in class RestClient. If my understanding is correct, the
> > > restart/retry mechanism should be done in RestClient.
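[Editor's note: a retry mechanism of the kind discussed could be sketched as
below. This is a generic retry-with-backoff loop under the assumption that a
transient network failure surfaces as an exception; it is not actual
RestClient code.]

```java
// Hypothetical sketch: retry a call (e.g. re-establishing a connection to a
// job) up to maxAttempts times, sleeping with a linearly growing backoff
// between failed attempts, and rethrowing the last failure if all fail.
import java.util.concurrent.Callable;

public class RetrySketch {

    static <T> T retry(Callable<T> call, int maxAttempts, long backoffMillis) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                Thread.sleep(backoffMillis * attempt); // linear backoff
            }
        }
        throw last; // all attempts exhausted
    }

    public static void main(String[] args) throws Exception {
        int[] remainingFailures = {2}; // fail the first two attempts, then succeed
        String result = retry(() -> {
            if (remainingFailures[0]-- > 0) throw new java.io.IOException("connection refused");
            return "job status: RUNNING";
        }, 5, 10);
        System.out.println(result);
    }
}
```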
> > >
> > >
> > >
> > >
> > >
> > > Aljoscha Krettek <al...@apache.org> 于2019年6月11日周二 下午11:10写道:
> > >
> > > > Some points to consider:
> > > >
> > > > * Any API we expose should not have dependencies on the runtime
> > > > (flink-runtime) package or other implementation details. To me, this
> > > means
> > > > that the current ClusterClient cannot be exposed to users because it
> > >  uses
> > > > quite some classes from the optimiser and runtime packages.
> > > >
> > > > * What happens when a failure/restart in the client happens? There
> need
> > > to
> > > > be a way of re-establishing the connection to the job, set up the
> > > listeners
> > > > again, etc.
> > > >
> > > > Aljoscha
> > > >
> > > > > On 29. May 2019, at 10:17, Jeff Zhang <zj...@gmail.com> wrote:
> > > > >
> > > > > Sorry folks, the design doc is late as you expected. Here's the
> > design
> > > > doc
> > > > > I drafted, welcome any comments and feedback.
> > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> > > > >
> > > > >
> > > > >
> > > > > Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午8:43写道:
> > > > >
> > > > >> Nice that this discussion is happening.
> > > > >>
> > > > >> In the FLIP, we could also revisit the entire role of the
> > environments
> > > > >> again.
> > > > >>
> > > > >> Initially, the idea was:
> > > > >>  - the environments take care of the specific setup for standalone
> > (no
> > > > >> setup needed), yarn, mesos, etc.
> > > > >>  - the session ones have control over the session. The environment
> > > holds
> > > > >> the session client.
> > > > >>  - running a job gives a "control" object for that job. That
> > behavior
> > > is
> > > > >> the same in all environments.
> > > > >>
> > > > >> The actual implementation diverged quite a bit from that. Happy to
> > > see a
> > > > >> discussion about straightening this out a bit more.
> > > > >>
> > > > >>
> > > > >> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zj...@gmail.com>
> > wrote:
> > > > >>
> > > > >>> Hi folks,
> > > > >>>
> > > > >>> Sorry for late response, It seems we reach consensus on this, I
> > will
> > > > >> create
> > > > >>> FLIP for this with more detailed design
> > > > >>>
> > > > >>>
> > > > >>> Thomas Weise <th...@apache.org> 于2018年12月21日周五 上午11:43写道:
> > > > >>>
> > > > >>>> Great to see this discussion seeded! The problems you face with
> > the
> > > > >>>> Zeppelin integration are also affecting other downstream
> projects,
> > > > like
> > > > >>>> Beam.
> > > > >>>>
> > > > >>>> We just enabled the savepoint restore option in
> > > > RemoteStreamEnvironment
> > > > >>> [1]
> > > > >>>> and that was more difficult than it should be. The main issue is
> > > that
> > > > >>>> environment and cluster client aren't decoupled. Ideally it
> should
> > > be
> > > > >>>> possible to just get the matching cluster client from the
> > > environment
> > > > >> and
> > > > >>>> then control the job through it (environment as factory for
> > cluster
> > > > >>>> client). But note that the environment classes are part of the
> > > public
> > > > >>> API,
> > > > >>>> and it is not straightforward to make larger changes without
> > > breaking
> > > > >>>> backward compatibility.
> > > > >>>>
> > > > >>>> ClusterClient currently exposes internal classes like JobGraph
> and
> > > > >>>> StreamGraph. But it should be possible to wrap this with a new
> > > public
> > > > >> API
> > > > >>>> that brings the required job control capabilities for downstream
> > > > >>> projects.
> > > > >>>> Perhaps it is helpful to look at some of the interfaces in Beam
> > > while
> > > > >>>> thinking about this: [2] for the portable job API and [3] for
> the
> > > old
> > > > >>>> asynchronous job control from the Beam Java SDK.
> > > > >>>>
> > > > >>>> The backward compatibility discussion [4] is also relevant
> here. A
> > > new
> > > > >>> API
> > > > >>>> should shield downstream projects from internals and allow them
> to
> > > > >>>> interoperate with multiple future Flink versions in the same
> > release
> > > > >> line
> > > > >>>> without forced upgrades.
> > > > >>>>
> > > > >>>> Thanks,
> > > > >>>> Thomas
> > > > >>>>
> > > > >>>> [1] https://github.com/apache/flink/pull/7249
> > > > >>>> [2]
> > > > >>>>
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > > > >>>> [3]
> > > > >>>>
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > > > >>>> [4]
> > > > >>>>
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> > > > >>>>
> > > > >>>>
> > > > >>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zj...@gmail.com>
> > > wrote:
> > > > >>>>
> > > > >>>>>>>> I'm not so sure whether the user should be able to define
> > where
> > > > >> the
> > > > >>>> job
> > > > >>>>> runs (in your example Yarn). This is actually independent of
> the
> > > job
> > > > >>>>> development and is something which is decided at deployment
> time.
> > > > >>>>>
> > > > >>>>> User don't need to specify execution mode programmatically.
> They
> > > can
> > > > >>> also
> > > > >>>>> pass the execution mode from the arguments in flink run
> command.
> > > e.g.
> > > > >>>>>
> > > > >>>>> bin/flink run -m yarn-cluster ....
> > > > >>>>> bin/flink run -m local ...
> > > > >>>>> bin/flink run -m host:port ...
> > > > >>>>>
> > > > >>>>> Does this make sense to you ?
> > > > >>>>>
> > > > >>>>>>>> To me it makes sense that the ExecutionEnvironment is not
> > > > >> directly
> > > > >>>>> initialized by the user and instead context sensitive how you
> > want
> > > to
> > > > >>>>> execute your job (Flink CLI vs. IDE, for example).
> > > > >>>>>
> > > > >>>>> Right, currently I notice Flink would create different
> > > > >>>>> ContextExecutionEnvironment based on different submission
> > scenarios
> > > > >>>> (Flink
> > > > >>>>> Cli vs IDE). To me this is kind of hack approach, not so
> > > > >>> straightforward.
> > > > >>>>> What I suggested above is that is that flink should always
> create
> > > the
> > > > >>>> same
> > > > >>>>> ExecutionEnvironment but with different configuration, and
> based
> > on
> > > > >> the
> > > > >>>>> configuration it would create the proper ClusterClient for
> > > different
> > > > >>>>> behaviors.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> Till Rohrmann <tr...@apache.org> 于2018年12月20日周四 下午11:18写道:
> > > > >>>>>
> > > > >>>>>> You are probably right that we have code duplication when it
> > comes
> > > > >> to
> > > > >>>> the
> > > > >>>>>> creation of the ClusterClient. This should be reduced in the
> > > > >> future.
> > > > >>>>>>
> > > > >>>>>> I'm not so sure whether the user should be able to define
> where
> > > the
> > > > >>> job
> > > > >>>>>> runs (in your example Yarn). This is actually independent of
> the
> > > > >> job
> > > > >>>>>> development and is something which is decided at deployment
> > time.
> > > > >> To
> > > > >>> me
> > > > >>>>> it
> > > > >>>>>> makes sense that the ExecutionEnvironment is not directly
> > > > >> initialized
> > > > >>>> by
> > > > >>>>>> the user and instead context sensitive how you want to execute
> > > your
> > > > >>> job
> > > > >>>>>> (Flink CLI vs. IDE, for example). However, I agree that the
> > > > >>>>>> ExecutionEnvironment should give you access to the
> ClusterClient
> > > > >> and
> > > > >>> to
> > > > >>>>> the
> > > > >>>>>> job (maybe in the form of the JobGraph or a job plan).
> > > > >>>>>>
> > > > >>>>>> Cheers,
> > > > >>>>>> Till
> > > > >>>>>>
> > > > >>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zj...@gmail.com>
> > > > >> wrote:
> > > > >>>>>>
> > > > >>>>>>> Hi Till,
> > > > >>>>>>> Thanks for the feedback. You are right that I expect better
> > > > >>>>> programmatic
> > > > >>>>>>> job submission/control api which could be used by downstream
> > > > >>> project.
> > > > >>>>> And
> > > > >>>>>>> it would benefit for the flink ecosystem. When I look at the
> > code
> > > > >>> of
> > > > >>>>>> flink
> > > > >>>>>>> scala-shell and sql-client (I believe they are not the core
> of
> > > > >>> flink,
> > > > >>>>> but
> > > > >>>>>>> belong to the ecosystem of flink), I find many duplicated
> code
> > > > >> for
> > > > >>>>>> creating
> > > > >>>>>>> ClusterClient from user provided configuration (configuration
> > > > >>> format
> > > > >>>>> may
> > > > >>>>>> be
> > > > >>>>>>> different from scala-shell and sql-client) and then use that
> > > > >>>>>> ClusterClient
> > > > >>>>>>> to manipulate jobs. I don't think this is convenient for
> > > > >> downstream
> > > > >>>>>>> projects. What I expect is that downstream project only needs
> > to
> > > > >>>>> provide
> > > > >>>>>>> necessary configuration info (maybe introducing class
> > FlinkConf),
> > > > >>> and
> > > > >>>>>> then
> > > > >>>>>>> build ExecutionEnvironment based on this FlinkConf, and
> > > > >>>>>>> ExecutionEnvironment will create the proper ClusterClient. It
> > not
> > > > >>>> only
> > > > >>>>>>> benefit for the downstream project development but also be
> > > > >> helpful
> > > > >>>> for
> > > > >>>>>>> their integration test with flink. Here's one sample code
> > snippet
> > > > >>>> that
> > > > >>>>> I
> > > > >>>>>>> expect.
> > > > >>>>>>>
> > > > >>>>>>> val conf = new FlinkConf().mode("yarn")
> > > > >>>>>>> val env = new ExecutionEnvironment(conf)
> > > > >>>>>>> val jobId = env.submit(...)
> > > > >>>>>>> val jobStatus = env.getClusterClient().queryJobStatus(jobId)
> > > > >>>>>>> env.getClusterClient().cancelJob(jobId)
> > > > >>>>>>>
> > > > >>>>>>> What do you think ?
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> Till Rohrmann <tr...@apache.org> 于2018年12月11日周二
> 下午6:28写道:
> > > > >>>>>>>
> > > > >>>>>>>> Hi Jeff,
> > > > >>>>>>>>
> > > > >>>>>>>> what you are proposing is to provide the user with better
> > > > >>>>> programmatic
> > > > >>>>>>> job
> > > > >>>>>>>> control. There was actually an effort to achieve this but it
> > > > >>>>>>>> has never been completed [1]. However, there are some
> > > > >>>>>>>> improvements in the code base now.
> > > > >>>>>>>> Look for example at the NewClusterClient interface which
> > > > >> offers a
> > > > >>>>>>>> non-blocking job submission. But I agree that we need to
> > > > >> improve
> > > > >>>>> Flink
> > > > >>>>>> in
> > > > >>>>>>>> this regard.
> > > > >>>>>>>>
> > > > >>>>>>>> I would not be in favour of exposing all ClusterClient calls
> > > > >>>>>>>> via the
> > > > >>>>>>>> ExecutionEnvironment because it would clutter the class and
> > > > >> would
> > > > >>>> not
> > > > >>>>>> be
> > > > >>>>>>> a
> > > > >>>>>>>> good separation of concerns. Instead one idea could be to
> > > > >>> retrieve
> > > > >>>>> the
> > > > >>>>>>>> current ClusterClient from the ExecutionEnvironment which
> can
> > > > >>> then
> > > > >>>> be
> > > > >>>>>>> used
> > > > >>>>>>>> for cluster and job control. But before we start an effort
> > > > >> here,
> > > > >>> we
> > > > >>>>>> need
> > > > >>>>>>> to
> > > > >>>>>>>> agree and capture what functionality we want to provide.
> > > > >>>>>>>>
> > > > >>>>>>>> Initially, the idea was that we have the ClusterDescriptor
> > > > >>>>>>>> describing how to talk to a cluster manager like Yarn or
> > > > >>>>>>>> Mesos. The ClusterDescriptor can be
> > > > >>>>>>>> used for deploying Flink clusters (job and session) and
> gives
> > > > >>> you a
> > > > >>>>>>>> ClusterClient. The ClusterClient controls the cluster (e.g.
> > > > >>>>> submitting
> > > > >>>>>>>> jobs, listing all running jobs). And then there was the idea
> > to
> > > > >>>>>>> introduce a
> > > > >>>>>>>> JobClient which you obtain from the ClusterClient to trigger
> > > > >> job
> > > > >>>>>> specific
> > > > >>>>>>>> operations (e.g. taking a savepoint, cancelling the job).
> > > > >>>>>>>>
> > > > >>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-4272
> > > > >>>>>>>>
> > > > >>>>>>>> Cheers,
> > > > >>>>>>>> Till
> > > > >>>>>>>>
> > > > >>>>>>>> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <
> zjffdu@gmail.com
> > >
> > > > >>>>> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>> Hi Folks,
> > > > >>>>>>>>>
> > > > >>>>>>>>> I am trying to integrate flink into apache zeppelin which
> is
> > > > >> an
> > > > >>>>>>>> interactive
> > > > >>>>>>>>> notebook. And I hit several issues that are caused by the
> > > > >>>>>>>>> flink client api. So I'd like to propose the following
> > > > >>>>>>>>> changes to the flink client api.
> > > > >>>>>>>>>
> > > > >>>>>>>>> 1. Support nonblocking execution. Currently,
> > > > >>>>>>> ExecutionEnvironment#execute
> > > > >>>>>>>>> is a blocking method which would do 2 things: first submit
> > > > >>>>>>>>> the job and then wait for the job until it is finished. I'd
> > > > >>>>>>>>> like to introduce a nonblocking execution method like
> > > > >>>>>>>>> ExecutionEnvironment#submit which only submits the job and
> > > > >>>>>>>>> then returns the jobId to the client, and allows the user to
> > > > >>>>>>>>> query the job status via the jobId.
> > > > >>>>>>>>>
> > > > >>>>>>>>> 2. Add cancel api in
> > > > >>>>> ExecutionEnvironment/StreamExecutionEnvironment,
> > > > >>>>>>>>> currently the only way to cancel a job is via the cli
> > > > >>>>>>>>> (bin/flink), which is not convenient for downstream projects.
> > > > >>>>>>>>> So I'd like to add a cancel api in ExecutionEnvironment.
> > > > >>>>>>>>>
> > > > >>>>>>>>> 3. Add savepoint api in
> > > > >>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
> > > > >>>>>>>>> It is similar to the cancel api; we should use
> > > > >>>>>>>>> ExecutionEnvironment as the unified api for third parties to
> > > > >>>>>>>>> integrate with flink.
> > > > >>>>>>>>>
> > > > >>>>>>>>> 4. Add a listener for the job execution lifecycle. Something
> > > > >>>>>>>>> like the following, so that downstream projects can run
> > > > >>>>>>>>> custom logic in the lifecycle of a job, e.g. Zeppelin would
> > > > >>>>>>>>> capture the jobId after the job is submitted and then use
> > > > >>>>>>>>> this jobId to cancel it later when necessary.
> > > > >>>>>>>>>
> > > > >>>>>>>>> public interface JobListener {
> > > > >>>>>>>>>
> > > > >>>>>>>>>   void onJobSubmitted(JobID jobId);
> > > > >>>>>>>>>
> > > > >>>>>>>>>   void onJobExecuted(JobExecutionResult jobResult);
> > > > >>>>>>>>>
> > > > >>>>>>>>>   void onJobCanceled(JobID jobId);
> > > > >>>>>>>>> }
> > > > >>>>>>>>>
> > > > >>>>>>>>> 5. Enable sessions in ExecutionEnvironment. Currently they
> > > > >>>>>>>>> are disabled, but sessions are very convenient for third
> > > > >>>>>>>>> parties submitting jobs continually. I hope flink can enable
> > > > >>>>>>>>> them again.
> > > > >>>>>>>>>
> > > > >>>>>>>>> 6. Unify all flink client api into
> > > > >>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
> > > > >>>>>>>>>
> > > > >>>>>>>>> This is a long term issue which needs more careful thinking
> > > > >>>>>>>>> and design. Currently some features of flink are exposed in
> > > > >>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment, but some are
> > > > >>>>>>>>> exposed in the cli instead of the api, like the cancel and
> > > > >>>>>>>>> savepoint operations I mentioned above. I think the root
> > > > >>>>>>>>> cause is that flink didn't unify the interaction with flink.
> > > > >>>>>>>>> Here I list 3 scenarios of flink operation:
> > > > >>>>>>>>>
> > > > >>>>>>>>>   - Local job execution.  Flink will create
> LocalEnvironment
> > > > >>> and
> > > > >>>>>> then
> > > > >>>>>>>> use
> > > > >>>>>>>>>   this LocalEnvironment to create LocalExecutor for job
> > > > >>>> execution.
> > > > >>>>>>>>>   - Remote job execution. Flink will create ClusterClient
> > > > >>> first
> > > > >>>>> and
> > > > >>>>>>> then
> > > > >>>>>>>>>   create ContextEnvironment based on the ClusterClient and
> > > > >>> then
> > > > >>>>> run
> > > > >>>>>>> the
> > > > >>>>>>>>> job.
> > > > >>>>>>>>>   - Job cancelation. Flink will create ClusterClient first
> > > > >> and
> > > > >>>>> then
> > > > >>>>>>>> cancel
> > > > >>>>>>>>>   this job via this ClusterClient.
> > > > >>>>>>>>>
> > > > >>>>>>>>> As you can see in the above 3 scenarios, Flink didn't use
> > > > >>>>>>>>> the same approach (code path) to interact with flink. What I
> > > > >>>>>>>>> propose is the following: create the proper
> > > > >>>>>>>>> LocalEnvironment/RemoteEnvironment (based on user
> > > > >>>>>>>>> configuration) --> use this Environment to create the proper
> > > > >>>>>>>>> ClusterClient (LocalClusterClient or RestClusterClient) to
> > > > >>>>>>>>> interact with Flink (job execution or cancelation).
> > > > >>>>>>>>>
> > > > >>>>>>>>> This way we can unify the process of local execution and
> > > > >>>>>>>>> remote execution. And it is much easier for third parties to
> > > > >>>>>>>>> integrate with flink, because ExecutionEnvironment is the
> > > > >>>>>>>>> unified entry point for flink. What a third party needs to
> > > > >>>>>>>>> do is just pass configuration to ExecutionEnvironment, and
> > > > >>>>>>>>> ExecutionEnvironment will do the right thing based on the
> > > > >>>>>>>>> configuration. The Flink cli can also be considered a flink
> > > > >>>>>>>>> api consumer: it just passes the configuration to
> > > > >>>>>>>>> ExecutionEnvironment and lets ExecutionEnvironment create
> > > > >>>>>>>>> the proper ClusterClient instead of creating the
> > > > >>>>>>>>> ClusterClient directly.
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>> 6 would involve a large code refactoring, so I think we can
> > > > >>>>>>>>> defer it to a future release; 1, 2, 3, 4, 5 could be done at
> > > > >>>>>>>>> once, I believe. Let me know your comments and feedback,
> > > > >>>>>>>>> thanks.
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>> --
> > > > >>>>>>>>> Best Regards
> > > > >>>>>>>>>
> > > > >>>>>>>>> Jeff Zhang
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> --
> > > > >>>>>>> Best Regards
> > > > >>>>>>>
> > > > >>>>>>> Jeff Zhang
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> --
> > > > >>>>> Best Regards
> > > > >>>>>
> > > > >>>>> Jeff Zhang
> > > > >>>>>
> > > > >>>>
> > > > >>>
> > > > >>>
> > > > >>> --
> > > > >>> Best Regards
> > > > >>>
> > > > >>> Jeff Zhang
> > > > >>>
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > Best Regards
> > > > >
> > > > > Jeff Zhang
> > > >
> > > >
> > >
> > > --
> > > Best Regards
> > >
> > > Jeff Zhang
> > >
> >
>
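[Editor's note] The FlinkConf / non-blocking submit / JobListener ideas proposed in the
thread above can be sketched as a small, self-contained mock. All class and method names
here (FlinkConf, ExecutionEnvironment#submit, JobListener) come from the proposal and are
hypothetical — none of this is existing Flink API; it only illustrates the proposed shape:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed client API; not actual Flink classes.
public class ClientApiSketch {

    // Proposal item 4: lifecycle listener for downstream projects.
    interface JobListener {
        void onJobSubmitted(String jobId);
        void onJobCanceled(String jobId);
    }

    // Proposal: a single configuration object the downstream project fills in.
    static class FlinkConf {
        private String mode = "local";
        FlinkConf mode(String m) { this.mode = m; return this; }
        String getMode() { return mode; }
    }

    static class ExecutionEnvironment {
        private final FlinkConf conf;
        private final List<JobListener> listeners = new ArrayList<>();
        private int counter = 0;

        ExecutionEnvironment(FlinkConf conf) { this.conf = conf; }

        void addJobListener(JobListener l) { listeners.add(l); }

        // Proposal item 1: non-blocking submit that returns a job id
        // immediately instead of waiting for the job to finish.
        String submit() {
            String jobId = conf.getMode() + "-job-" + (++counter);
            for (JobListener l : listeners) l.onJobSubmitted(jobId);
            return jobId;
        }

        // Proposal item 2: cancel exposed on the environment itself.
        void cancel(String jobId) {
            for (JobListener l : listeners) l.onJobCanceled(jobId);
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        ExecutionEnvironment env =
                new ExecutionEnvironment(new FlinkConf().mode("yarn"));
        env.addJobListener(new JobListener() {
            public void onJobSubmitted(String jobId) { events.add("submitted:" + jobId); }
            public void onJobCanceled(String jobId) { events.add("canceled:" + jobId); }
        });
        String jobId = env.submit();
        env.cancel(jobId);
        System.out.println(events);  // [submitted:yarn-job-1, canceled:yarn-job-1]
    }
}
```

This mirrors the Zeppelin use case described above: the listener captures the jobId at
submission time and can use it later to cancel the job.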

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by SHI Xiaogang <sh...@gmail.com>.
Hi Jeff and Flavio,

Thanks Jeff a lot for proposing the design document.

We are also working on refactoring ClusterClient to allow flexible and
efficient job management in our real-time platform.
We would like to draft a document to share our ideas with you.

I think it's a good idea to have something like Apache Livy for Flink, and
the efforts discussed here are a great step toward it.

Regards,
Xiaogang

Flavio Pompermaier <po...@okkam.it> wrote on Mon, Jun 17, 2019 at 7:13 PM:

> Is there any possibility to have something like Apache Livy [1] also for
> Flink in the future?
>
> [1] https://livy.apache.org/
>
> On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zj...@gmail.com> wrote:
>
> > >>>  Any API we expose should not have dependencies on the runtime
> > (flink-runtime) package or other implementation details. To me, this
> means
> > that the current ClusterClient cannot be exposed to users because it
>  uses
> > quite some classes from the optimiser and runtime packages.
> >
> > We should change ClusterClient from a class to an interface.
> > ExecutionEnvironment should only use the ClusterClient interface, which
> > should be in flink-clients, while the concrete implementation class
> > could be in flink-runtime.
> >
> > >>> What happens when a failure/restart in the client happens? There need
> > to be a way of re-establishing the connection to the job, set up the
> > listeners again, etc.
> >
> > Good point. First we need to define what failure/restart in the client
> > means. IIUC, that usually means a network failure, which will happen in
> > the RestClient class. If my understanding is correct, the restart/retry
> > mechanism should be done in RestClient.
> >
> >
> >
> >
> >
> > Aljoscha Krettek <al...@apache.org> wrote on Tue, Jun 11, 2019 at 11:10 PM:
> >
> > > Some points to consider:
> > >
> > > * Any API we expose should not have dependencies on the runtime
> > > (flink-runtime) package or other implementation details. To me, this
> > means
> > > that the current ClusterClient cannot be exposed to users because it
> >  uses
> > > quite some classes from the optimiser and runtime packages.
> > >
> > > * What happens when a failure/restart in the client happens? There
> > > needs to be a way of re-establishing the connection to the job, set
> > > up the listeners again, etc.
> > >
> > > Aljoscha
> > >
> > > >>> Thomas Weise <th...@apache.org> wrote on Fri, Dec 21, 2018 at 11:43 AM:
> > > >>>
> > > >>>> Great to see this discussion seeded! The problems you face with
> the
> > > >>>> Zeppelin integration are also affecting other downstream projects,
> > > like
> > > >>>> Beam.
> > > >>>>
> > > >>>> We just enabled the savepoint restore option in
> > > RemoteStreamEnvironment
> > > >>> [1]
> > > >>>> and that was more difficult than it should be. The main issue is
> > that
> > > >>>> environment and cluster client aren't decoupled. Ideally it should
> > be
> > > >>>> possible to just get the matching cluster client from the
> > environment
> > > >> and
> > > >>>> then control the job through it (environment as factory for
> cluster
> > > >>>> client). But note that the environment classes are part of the
> > public
> > > >>> API,
> > > >>>> and it is not straightforward to make larger changes without
> > breaking
> > > >>>> backward compatibility.
> > > >>>>
> > > >>>> ClusterClient currently exposes internal classes like JobGraph and
> > > >>>> StreamGraph. But it should be possible to wrap this with a new
> > public
> > > >> API
> > > >>>> that brings the required job control capabilities for downstream
> > > >>> projects.
> > > >>>> Perhaps it is helpful to look at some of the interfaces in Beam
> > while
> > > >>>> thinking about this: [2] for the portable job API and [3] for the
> > old
> > > >>>> asynchronous job control from the Beam Java SDK.
> > > >>>>
> > > >>>> The backward compatibility discussion [4] is also relevant here. A
> > new
> > > >>> API
> > > >>>> should shield downstream projects from internals and allow them to
> > > >>>> interoperate with multiple future Flink versions in the same
> release
> > > >> line
> > > >>>> without forced upgrades.
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Thomas
> > > >>>>
> > > >>>> [1] https://github.com/apache/flink/pull/7249
> > > >>>> [2]
> > > >>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> >
> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > > >>>> [3]
> > > >>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> >
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > > >>>> [4]
> > > >>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> >
> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> > > >>>>
> > > >>>>
> > > >>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zj...@gmail.com>
> > wrote:
> > > >>>>
> > > >>>>>>>> I'm not so sure whether the user should be able to define
> where
> > > >> the
> > > >>>> job
> > > >>>>> runs (in your example Yarn). This is actually independent of the
> > job
> > > >>>>> development and is something which is decided at deployment time.
> > > >>>>>
> > > >>>>> Users don't need to specify the execution mode programmatically.
> > > >>>>> They can also pass the execution mode via the arguments of the
> > > >>>>> flink run command, e.g.
> > > >>>>>
> > > >>>>> bin/flink run -m yarn-cluster ....
> > > >>>>> bin/flink run -m local ...
> > > >>>>> bin/flink run -m host:port ...
> > > >>>>>
> > > >>>>> Does this make sense to you ?
> > > >>>>>
> > > >>>>>>>> To me it makes sense that the ExecutionEnvironment is not
> > > >> directly
> > > >>>>> initialized by the user and instead context sensitive how you
> want
> > to
> > > >>>>> execute your job (Flink CLI vs. IDE, for example).
> > > >>>>>
> > > >>>>> Right, currently I notice Flink would create different
> > > >>>>> ContextExecutionEnvironments based on different submission
> > > >>>>> scenarios (Flink Cli vs IDE). To me this is kind of a hack, not
> > > >>>>> so straightforward. What I suggested above is that flink should
> > > >>>>> always create the same ExecutionEnvironment but with different
> > > >>>>> configuration, and based on the configuration it would create
> > > >>>>> the proper ClusterClient for different behaviors.
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>> Till Rohrmann <tr...@apache.org> wrote on Thu, Dec 20, 2018 at 11:18 PM:
> > > >>>>>
> > > >>>>>> You are probably right that we have code duplication when it
> comes
> > > >> to
> > > >>>> the
> > > >>>>>> creation of the ClusterClient. This should be reduced in the
> > > >> future.
> > > >>>>>>
> > > >>>>>> I'm not so sure whether the user should be able to define where
> > the
> > > >>> job
> > > >>>>>> runs (in your example Yarn). This is actually independent of the
> > > >> job
> > > >>>>>> development and is something which is decided at deployment
> > > >>>>>> time. To me it makes sense that the ExecutionEnvironment is not
> > > >>>>>> directly initialized by the user but is instead context
> > > >>>>>> sensitive to how you want to execute your job (Flink CLI vs.
> > > >>>>>> IDE, for example).
> > > >>>>>> ExecutionEnvironment should give you access to the ClusterClient
> > > >> and
> > > >>> to
> > > >>>>> the
> > > >>>>>> job (maybe in the form of the JobGraph or a job plan).
> > > >>>>>>
> > > >>>>>> Cheers,
> > > >>>>>> Till
> > > >>>>>>
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
> >
>
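
[Editor's note] As a rough illustration of the point above — that the restart/retry
mechanism belongs in the REST layer — here is a generic retry-with-backoff helper. This
is not Flink's actual RestClient API; it only sketches the plain retry loop such a client
could wrap around each REST call:

```java
import java.util.concurrent.Callable;

// Generic retry helper, illustrating re-establishment of a flaky connection.
public class RetrySketch {

    // Try the call up to maxAttempts times, sleeping between attempts
    // (linear backoff); rethrow the last failure if all attempts fail.
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long backoffMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;  // remember the failure; retry unless out of attempts
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs * attempt);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] failuresLeft = {2};  // simulate two transient connection failures
        String status = callWithRetry(() -> {
            if (failuresLeft[0]-- > 0) {
                throw new RuntimeException("connection reset");
            }
            return "RUNNING";
        }, 5, 1L);
        System.out.println("job status: " + status);  // job status: RUNNING
    }
}
```

A real client would additionally restrict retries to failures it knows are transient and
re-register any job listeners after the connection is re-established.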

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Flavio Pompermaier <po...@okkam.it>.
Is there any possibility to have something like Apache Livy [1] also for
Flink in the future?

[1] https://livy.apache.org/

On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang <zj...@gmail.com> wrote:

> >>>  Any API we expose should not have dependencies on the runtime
> (flink-runtime) package or other implementation details. To me, this means
> that the current ClusterClient cannot be exposed to users because it   uses
> quite some classes from the optimiser and runtime packages.
>
> We should change ClusterClient from a class to an interface.
> ExecutionEnvironment should only use the ClusterClient interface, which
> should be in flink-clients, while the concrete implementation class could
> be in flink-runtime.
>
> >>> What happens when a failure/restart in the client happens? There need
> to be a way of re-establishing the connection to the job, set up the
> listeners again, etc.
>
> Good point. First we need to define what failure/restart in the client
> means. IIUC, that usually means a network failure, which will happen in
> the RestClient class. If my understanding is correct, the restart/retry
> mechanism should be done in RestClient.
>
>
>
>
>
> Aljoscha Krettek <al...@apache.org> wrote on Tue, Jun 11, 2019 at 11:10 PM:
>
> > Some points to consider:
> >
> > * Any API we expose should not have dependencies on the runtime
> > (flink-runtime) package or other implementation details. To me, this
> means
> > that the current ClusterClient cannot be exposed to users because it
>  uses
> > quite some classes from the optimiser and runtime packages.
> >
> > * What happens when a failure/restart in the client happens? There
> > needs to be a way of re-establishing the connection to the job, set up
> > the listeners again, etc.
> >
> > Aljoscha
> >
> > > On 29. May 2019, at 10:17, Jeff Zhang <zj...@gmail.com> wrote:
> > >
> > > Sorry folks, the design doc is late as you expected. Here's the design
> > doc
> > > I drafted, welcome any comments and feedback.
> > >
> > >
> >
> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> > >
> > >
> > >
> > > Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午8:43写道:
> > >
> > >> Nice that this discussion is happening.
> > >>
> > >> In the FLIP, we could also revisit the entire role of the environments
> > >> again.
> > >>
> > >> Initially, the idea was:
> > >>  - the environments take care of the specific setup for standalone (no
> > >> setup needed), yarn, mesos, etc.
> > >>  - the session ones have control over the session. The environment
> holds
> > >> the session client.
> > >>  - running a job gives a "control" object for that job. That behavior
> is
> > >> the same in all environments.
> > >>
> > >> The actual implementation diverged quite a bit from that. Happy to
> see a
> > >> discussion about straightening this out a bit more.
> > >>
> > >>
> > >> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zj...@gmail.com> wrote:
> > >>
> > >>> Hi folks,
> > >>>
> > >>> Sorry for late response, It seems we reach consensus on this, I will
> > >> create
> > >>> FLIP for this with more detailed design
> > >>>
> > >>>
> > >>> Thomas Weise <th...@apache.org> 于2018年12月21日周五 上午11:43写道:
> > >>>
> > >>>> Great to see this discussion seeded! The problems you face with the
> > >>>> Zeppelin integration are also affecting other downstream projects,
> > like
> > >>>> Beam.
> > >>>>
> > >>>> We just enabled the savepoint restore option in
> > RemoteStreamEnvironment
> > >>> [1]
> > >>>> and that was more difficult than it should be. The main issue is
> that
> > >>>> environment and cluster client aren't decoupled. Ideally it should
> be
> > >>>> possible to just get the matching cluster client from the
> environment
> > >> and
> > >>>> then control the job through it (environment as factory for cluster
> > >>>> client). But note that the environment classes are part of the
> public
> > >>> API,
> > >>>> and it is not straightforward to make larger changes without
> breaking
> > >>>> backward compatibility.
> > >>>>
> > >>>> ClusterClient currently exposes internal classes like JobGraph and
> > >>>> StreamGraph. But it should be possible to wrap this with a new
> public
> > >> API
> > >>>> that brings the required job control capabilities for downstream
> > >>> projects.
> > >>>> Perhaps it is helpful to look at some of the interfaces in Beam
> while
> > >>>> thinking about this: [2] for the portable job API and [3] for the
> old
> > >>>> asynchronous job control from the Beam Java SDK.
> > >>>>
> > >>>> The backward compatibility discussion [4] is also relevant here. A
> new
> > >>> API
> > >>>> should shield downstream projects from internals and allow them to
> > >>>> interoperate with multiple future Flink versions in the same release
> > >> line
> > >>>> without forced upgrades.
> > >>>>
> > >>>> Thanks,
> > >>>> Thomas
> > >>>>
> > >>>> [1] https://github.com/apache/flink/pull/7249
> > >>>> [2]
> > >>>>
> > >>>>
> > >>>
> > >>
> >
> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > >>>> [3]
> > >>>>
> > >>>>
> > >>>
> > >>
> >
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > >>>> [4]
> > >>>>
> > >>>>
> > >>>
> > >>
> >
> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> > >>>>
> > >>>>
> > >>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zj...@gmail.com>
> wrote:
> > >>>>
> > >>>>>>>> I'm not so sure whether the user should be able to define where
> > >> the
> > >>>> job
> > >>>>> runs (in your example Yarn). This is actually independent of the
> job
> > >>>>> development and is something which is decided at deployment time.
> > >>>>>
> > >>>>> User don't need to specify execution mode programmatically. They
> can
> > >>> also
> > >>>>> pass the execution mode from the arguments in flink run command.
> e.g.
> > >>>>>
> > >>>>> bin/flink run -m yarn-cluster ....
> > >>>>> bin/flink run -m local ...
> > >>>>> bin/flink run -m host:port ...
> > >>>>>
> > >>>>> Does this make sense to you ?
> > >>>>>
> > >>>>>>>> To me it makes sense that the ExecutionEnvironment is not
> > >> directly
> > >>>>> initialized by the user and instead context sensitive how you want
> to
> > >>>>> execute your job (Flink CLI vs. IDE, for example).
> > >>>>>
> > >>>>> Right, currently I notice Flink would create different
> > >>>>> ContextExecutionEnvironment based on different submission scenarios
> > >>>> (Flink
> > >>>>> Cli vs IDE). To me this is kind of hack approach, not so
> > >>> straightforward.
> > >>>>> What I suggested above is that flink should always create
> the
> > >>>> same
> > >>>>> ExecutionEnvironment but with different configuration, and based on
> > >> the
> > >>>>> configuration it would create the proper ClusterClient for
> different
> > >>>>> behaviors.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> Till Rohrmann <tr...@apache.org> 于2018年12月20日周四 下午11:18写道:
> > >>>>>
> > >>>>>> You are probably right that we have code duplication when it comes
> > >> to
> > >>>> the
> > >>>>>> creation of the ClusterClient. This should be reduced in the
> > >> future.
> > >>>>>>
> > >>>>>> I'm not so sure whether the user should be able to define where
> the
> > >>> job
> > >>>>>> runs (in your example Yarn). This is actually independent of the
> > >> job
> > >>>>>> development and is something which is decided at deployment time.
> > >> To
> > >>> me
> > >>>>> it
> > >>>>>> makes sense that the ExecutionEnvironment is not directly
> > >> initialized
> > >>>> by
> > >>>>>> the user and instead context sensitive how you want to execute
> your
> > >>> job
> > >>>>>> (Flink CLI vs. IDE, for example). However, I agree that the
> > >>>>>> ExecutionEnvironment should give you access to the ClusterClient
> > >> and
> > >>> to
> > >>>>> the
> > >>>>>> job (maybe in the form of the JobGraph or a job plan).
> > >>>>>>
> > >>>>>> Cheers,
> > >>>>>> Till
> > >>>>>>
> > >>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zj...@gmail.com>
> > >> wrote:
> > >>>>>>
> > >>>>>>> Hi Till,
> > >>>>>>> Thanks for the feedback. You are right that I expect better
> > >>>>> programmatic
> > >>>>>>> job submission/control api which could be used by downstream
> > >>> project.
> > >>>>> And
> > >>>>>>> it would benefit for the flink ecosystem. When I look at the code
> > >>> of
> > >>>>>> flink
> > >>>>>>> scala-shell and sql-client (I believe they are not the core of
> > >>> flink,
> > >>>>> but
> > >>>>>>> belong to the ecosystem of flink), I find many duplicated code
> > >> for
> > >>>>>> creating
> > >>>>>>> ClusterClient from user provided configuration (configuration
> > >>> format
> > >>>>> may
> > >>>>>> be
> > >>>>>>> different from scala-shell and sql-client) and then use that
> > >>>>>> ClusterClient
> > >>>>>>> to manipulate jobs. I don't think this is convenient for
> > >> downstream
> > >>>>>>> projects. What I expect is that downstream project only needs to
> > >>>>> provide
> > >>>>>>> necessary configuration info (maybe introducing class FlinkConf),
> > >>> and
> > >>>>>> then
> > >>>>>>> build ExecutionEnvironment based on this FlinkConf, and
> > >>>>>>> ExecutionEnvironment will create the proper ClusterClient. It not
> > >>>> only
> > >>>>>>> benefit for the downstream project development but also be
> > >> helpful
> > >>>> for
> > >>>>>>> their integration test with flink. Here's one sample code snippet
> > >>>> that
> > >>>>> I
> > >>>>>>> expect.
> > >>>>>>>
> > >>>>>>> val conf = new FlinkConf().mode("yarn")
> > >>>>>>> val env = new ExecutionEnvironment(conf)
> > >>>>>>> val jobId = env.submit(...)
> > >>>>>>> val jobStatus = env.getClusterClient().queryJobStatus(jobId)
> > >>>>>>> env.getClusterClient().cancelJob(jobId)
> > >>>>>>>
> > >>>>>>> What do you think ?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Till Rohrmann <tr...@apache.org> 于2018年12月11日周二 下午6:28写道:
> > >>>>>>>
> > >>>>>>>> Hi Jeff,
> > >>>>>>>>
> > >>>>>>>> what you are proposing is to provide the user with better
> > >>>>> programmatic
> > >>>>>>> job
> > >>>>>>>> control. There was actually an effort to achieve this but it
> > >> has
> > >>>>> never
> > >>>>>>> been
> > >>>>>>>> completed [1]. However, there are some improvement in the code
> > >>> base
> > >>>>>> now.
> > >>>>>>>> Look for example at the NewClusterClient interface which
> > >> offers a
> > >>>>>>>> non-blocking job submission. But I agree that we need to
> > >> improve
> > >>>>> Flink
> > >>>>>> in
> > >>>>>>>> this regard.
> > >>>>>>>>
> > >>>>>>>> I would not be in favour if exposing all ClusterClient calls
> > >> via
> > >>>> the
> > >>>>>>>> ExecutionEnvironment because it would clutter the class and
> > >> would
> > >>>> not
> > >>>>>> be
> > >>>>>>> a
> > >>>>>>>> good separation of concerns. Instead one idea could be to
> > >>> retrieve
> > >>>>> the
> > >>>>>>>> current ClusterClient from the ExecutionEnvironment which can
> > >>> then
> > >>>> be
> > >>>>>>> used
> > >>>>>>>> for cluster and job control. But before we start an effort
> > >> here,
> > >>> we
> > >>>>>> need
> > >>>>>>> to
> > >>>>>>>> agree and capture what functionality we want to provide.
> > >>>>>>>>
> > >>>>>>>> Initially, the idea was that we have the ClusterDescriptor
> > >>>> describing
> > >>>>>> how
> > >>>>>>>> to talk to cluster manager like Yarn or Mesos. The
> > >>>> ClusterDescriptor
> > >>>>>> can
> > >>>>>>> be
> > >>>>>>>> used for deploying Flink clusters (job and session) and gives
> > >>> you a
> > >>>>>>>> ClusterClient. The ClusterClient controls the cluster (e.g.
> > >>>>> submitting
> > >>>>>>>> jobs, listing all running jobs). And then there was the idea to
> > >>>>>>> introduce a
> > >>>>>>>> JobClient which you obtain from the ClusterClient to trigger
> > >> job
> > >>>>>> specific
> > >>>>>>>> operations (e.g. taking a savepoint, cancelling the job).
> > >>>>>>>>
> > >>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-4272
> > >>>>>>>>
> > >>>>>>>> Cheers,
> > >>>>>>>> Till
> > >>>>>>>>
> > >>>>>>>> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zj...@gmail.com>
> > >>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi Folks,
> > >>>>>>>>>
> > >>>>>>>>> I am trying to integrate flink into apache zeppelin which is
> > >> an
> > >>>>>>>> interactive
> > >>>>>>>>> notebook. And I hit several issues that is caused by flink
> > >>> client
> > >>>>>> api.
> > >>>>>>> So
> > >>>>>>>>> I'd like to proposal the following changes for flink client
> > >>> api.
> > >>>>>>>>>
> > >>>>>>>>> 1. Support nonblocking execution. Currently,
> > >>>>>>> ExecutionEnvironment#execute
> > >>>>>>>>> is a blocking method which would do 2 things, first submit
> > >> job
> > >>>> and
> > >>>>>> then
> > >>>>>>>>> wait for job until it is finished. I'd like introduce a
> > >>>> nonblocking
> > >>>>>>>>> execution method like ExecutionEnvironment#submit which only
> > >>>> submit
> > >>>>>> job
> > >>>>>>>> and
> > >>>>>>>>> then return jobId to client. And allow user to query the job
> > >>>> status
> > >>>>>> via
> > >>>>>>>> the
> > >>>>>>>>> jobId.
> > >>>>>>>>>
> > >>>>>>>>> 2. Add cancel api in
> > >>>>> ExecutionEnvironment/StreamExecutionEnvironment,
> > >>>>>>>>> currently the only way to cancel job is via cli (bin/flink),
> > >>> this
> > >>>>> is
> > >>>>>>> not
> > >>>>>>>>> convenient for downstream project to use this feature. So I'd
> > >>>> like
> > >>>>> to
> > >>>>>>> add
> > >>>>>>>>> cancel api in ExecutionEnvironment
> > >>>>>>>>>
> > >>>>>>>>> 3. Add savepoint api in
> > >>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
> > >>>>>>>> It
> > >>>>>>>>> is similar as cancel api, we should use ExecutionEnvironment
> > >> as
> > >>>> the
> > >>>>>>>> unified
> > >>>>>>>>> api for third party to integrate with flink.
> > >>>>>>>>>
> > >>>>>>>>> 4. Add listener for job execution lifecycle. Something like
> > >>>>>> following,
> > >>>>>>> so
> > >>>>>>>>> that downstream project can do custom logic in the lifecycle
> > >> of
> > >>>>> job.
> > >>>>>>> e.g.
> > >>>>>>>>> Zeppelin would capture the jobId after job is submitted and
> > >>> then
> > >>>>> use
> > >>>>>>> this
> > >>>>>>>>> jobId to cancel it later when necessary.
> > >>>>>>>>>
> > >>>>>>>>> public interface JobListener {
> > >>>>>>>>>
> > >>>>>>>>>   void onJobSubmitted(JobID jobId);
> > >>>>>>>>>
> > >>>>>>>>>   void onJobExecuted(JobExecutionResult jobResult);
> > >>>>>>>>>
> > >>>>>>>>>   void onJobCanceled(JobID jobId);
> > >>>>>>>>> }
> > >>>>>>>>>
> > >>>>>>>>> 5. Enable session in ExecutionEnvironment. Currently it is
> > >>>>> disabled,
> > >>>>>>> but
> > >>>>>>>>> session is very convenient for third party to submitting jobs
> > >>>>>>>> continually.
> > >>>>>>>>> I hope flink can enable it again.
> > >>>>>>>>>
> > >>>>>>>>> 6. Unify all flink client api into
> > >>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
> > >>>>>>>>>
> > >>>>>>>>> This is a long term issue which needs more careful thinking
> > >> and
> > >>>>>> design.
> > >>>>>>>>> Currently some of features of flink is exposed in
> > >>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment, but some are
> > >>>>> exposed
> > >>>>>>> in
> > >>>>>>>>> cli instead of api, like the cancel and savepoint I mentioned
> > >>>>> above.
> > >>>>>> I
> > >>>>>>>>> think the root cause is due to that flink didn't unify the
> > >>>>>> interaction
> > >>>>>>>> with
> > >>>>>>>>> flink. Here I list 3 scenarios of flink operation
> > >>>>>>>>>
> > >>>>>>>>>   - Local job execution.  Flink will create LocalEnvironment
> > >>> and
> > >>>>>> then
> > >>>>>>>> use
> > >>>>>>>>>   this LocalEnvironment to create LocalExecutor for job
> > >>>> execution.
> > >>>>>>>>>   - Remote job execution. Flink will create ClusterClient
> > >>> first
> > >>>>> and
> > >>>>>>> then
> > >>>>>>>>>   create ContextEnvironment based on the ClusterClient and
> > >>> then
> > >>>>> run
> > >>>>>>> the
> > >>>>>>>>> job.
> > >>>>>>>>>   - Job cancelation. Flink will create ClusterClient first
> > >> and
> > >>>>> then
> > >>>>>>>> cancel
> > >>>>>>>>>   this job via this ClusterClient.
> > >>>>>>>>>
> > >>>>>>>>> As you can see in the above 3 scenarios. Flink didn't use the
> > >>>> same
> > >>>>>>>>> approach(code path) to interact with flink
> > >>>>>>>>> What I propose is following:
> > >>>>>>>>> Create the proper LocalEnvironment/RemoteEnvironment (based
> > >> on
> > >>>> user
> > >>>>>>>>> configuration) --> Use this Environment to create proper
> > >>>>>> ClusterClient
> > >>>>>>>>> (LocalClusterClient or RestClusterClient) to interactive with
> > >>>>> Flink (
> > >>>>>>> job
> > >>>>>>>>> execution or cancelation)
> > >>>>>>>>>
> > >>>>>>>>> This way we can unify the process of local execution and
> > >> remote
> > >>>>>>>> execution.
> > >>>>>>>>> And it is much easier for third party to integrate with
> > >> flink,
> > >>>>>> because
> > >>>>>>>>> ExecutionEnvironment is the unified entry point for flink.
> > >> What
> > >>>>> third
> > >>>>>>>> party
> > >>>>>>>>> needs to do is just pass configuration to
> > >> ExecutionEnvironment
> > >>>> and
> > >>>>>>>>> ExecutionEnvironment will do the right thing based on the
> > >>>>>>> configuration.
> > >>>>>>>>> Flink cli can also be considered as flink api consumer. it
> > >> just
> > >>>>> pass
> > >>>>>>> the
> > >>>>>>>>> configuration to ExecutionEnvironment and let
> > >>>> ExecutionEnvironment
> > >>>>> to
> > >>>>>>>>> create the proper ClusterClient instead of letting cli to
> > >>> create
> > >>>>>>>>> ClusterClient directly.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> 6 would involve large code refactoring, so I think we can
> > >> defer
> > >>>> it
> > >>>>>> for
> > >>>>>>>>> future release, 1,2,3,4,5 could be done at once I believe.
> > >> Let
> > >>> me
> > >>>>>> know
> > >>>>>>>> your
> > >>>>>>>>> comments and feedback, thanks
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> --
> > >>>>>>>>> Best Regards
> > >>>>>>>>>
> > >>>>>>>>> Jeff Zhang
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>> Best Regards
> > >>>>>>>
> > >>>>>>> Jeff Zhang
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Best Regards
> > >>>>>
> > >>>>> Jeff Zhang
> > >>>>>
> > >>>>
> > >>>
> > >>>
> > >>> --
> > >>> Best Regards
> > >>>
> > >>> Jeff Zhang
> > >>>
> > >>
> > >
> > >
> > > --
> > > Best Regards
> > >
> > > Jeff Zhang
> >
> >
>
> --
> Best Regards
>
> Jeff Zhang
>

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Jeff Zhang <zj...@gmail.com>.
>>>  Any API we expose should not have dependencies on the runtime
(flink-runtime) package or other implementation details. To me, this means
that the current ClusterClient cannot be exposed to users because it uses
quite some classes from the optimiser and runtime packages.

We should change ClusterClient from a class to an interface.
ExecutionEnvironment would then only use the ClusterClient interface, which
should live in flink-clients, while the concrete implementation class could
be in flink-runtime.
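Concretely, the split could look something like the sketch below. The method names, the serialized-job-graph representation, and the in-memory implementation are all illustrative stand-ins, not Flink's actual API:

```java
import java.util.concurrent.CompletableFuture;

public class ClusterClientSketch {

    // Hypothetical runtime-free interface that would live in flink-clients.
    // It deliberately uses only JDK types, so downstream projects compiling
    // against it never see flink-runtime or optimizer classes.
    interface ClusterClient {
        CompletableFuture<String> submitJob(byte[] serializedJobGraph);
        CompletableFuture<String> getJobStatus(String jobId);
        CompletableFuture<Void> cancelJob(String jobId);
    }

    // Stand-in for a concrete implementation (e.g. a REST-based client)
    // that would live behind the interface, in flink-runtime.
    static class InMemoryClusterClient implements ClusterClient {
        private String status = "CREATED";

        public CompletableFuture<String> submitJob(byte[] serializedJobGraph) {
            status = "RUNNING";
            return CompletableFuture.completedFuture("job-1");
        }

        public CompletableFuture<String> getJobStatus(String jobId) {
            return CompletableFuture.completedFuture(status);
        }

        public CompletableFuture<Void> cancelJob(String jobId) {
            status = "CANCELED";
            return CompletableFuture.completedFuture(null);
        }
    }

    public static String demo() {
        ClusterClient client = new InMemoryClusterClient();
        String jobId = client.submitJob(new byte[0]).join();
        client.cancelJob(jobId).join();
        return jobId + ":" + client.getJobStatus(jobId).join();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Returning CompletableFuture also gives the non-blocking submission from point 1 of the original proposal for free.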

>>> What happens when a failure/restart in the client happens? There need
to be a way of re-establishing the connection to the job, set up the
listeners again, etc.

Good point. First we need to define what failure/restart in the client
means. IIUC, that usually means a network failure, which will surface in
the RestClient class. If my understanding is correct, the restart/retry
mechanism should be implemented in RestClient.
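As a rough illustration of such a mechanism, here is a generic retry helper with exponential backoff of the kind a RestClient could apply to transient network failures. callWithRetry and its parameters are made up for this sketch and are not existing Flink code:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of a retry policy with exponential backoff,
// as a RestClient might apply to transient network failures.
public class RetrySketch {

    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long initialBackoffMillis) {
        long backoff = initialBackoffMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e; // treat as transient (e.g. connection reset): back off, retry
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoff);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                    }
                    backoff *= 2; // double the wait between attempts
                }
            }
        }
        throw new RuntimeException("all " + maxAttempts + " attempts failed", last);
    }

    public static String demo() {
        AtomicInteger attempts = new AtomicInteger();
        // Fails twice, then succeeds, simulating a transient network error.
        return callWithRetry(() -> {
            if (attempts.incrementAndGet() < 3) {
                throw new java.io.IOException("connection reset");
            }
            return "reconnected after " + attempts.get() + " attempts";
        }, 5, 1L);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Beyond plain retries, re-establishing a connection to a running job would also mean re-registering any listeners, which is the harder part of Aljoscha's question.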





Aljoscha Krettek <al...@apache.org> 于2019年6月11日周二 下午11:10写道:

> Some points to consider:
>
> * Any API we expose should not have dependencies on the runtime
> (flink-runtime) package or other implementation details. To me, this means
> that the current ClusterClient cannot be exposed to users because it uses
> quite some classes from the optimiser and runtime packages.
>
> * What happens when a failure/restart in the client happens? There need to
> be a way of re-establishing the connection to the job, set up the listeners
> again, etc.
>
> Aljoscha
>
> > On 29. May 2019, at 10:17, Jeff Zhang <zj...@gmail.com> wrote:
> >
> > Sorry folks, the design doc is late as you expected. Here's the design
> doc
> > I drafted, welcome any comments and feedback.
> >
> >
> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> >
> >
> >
> > Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午8:43写道:
> >
> >> Nice that this discussion is happening.
> >>
> >> In the FLIP, we could also revisit the entire role of the environments
> >> again.
> >>
> >> Initially, the idea was:
> >>  - the environments take care of the specific setup for standalone (no
> >> setup needed), yarn, mesos, etc.
> >>  - the session ones have control over the session. The environment holds
> >> the session client.
> >>  - running a job gives a "control" object for that job. That behavior is
> >> the same in all environments.
> >>
> >> The actual implementation diverged quite a bit from that. Happy to see a
> >> discussion about straightening this out a bit more.
> >>
> >>
> >> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zj...@gmail.com> wrote:
> >>
> >>> Hi folks,
> >>>
> >>> Sorry for late response, It seems we reach consensus on this, I will
> >> create
> >>> FLIP for this with more detailed design
> >>>
> >>>
> >>> Thomas Weise <th...@apache.org> 于2018年12月21日周五 上午11:43写道:
> >>>
> >>>> Great to see this discussion seeded! The problems you face with the
> >>>> Zeppelin integration are also affecting other downstream projects,
> like
> >>>> Beam.
> >>>>
> >>>> We just enabled the savepoint restore option in
> RemoteStreamEnvironment
> >>> [1]
> >>>> and that was more difficult than it should be. The main issue is that
> >>>> environment and cluster client aren't decoupled. Ideally it should be
> >>>> possible to just get the matching cluster client from the environment
> >> and
> >>>> then control the job through it (environment as factory for cluster
> >>>> client). But note that the environment classes are part of the public
> >>> API,
> >>>> and it is not straightforward to make larger changes without breaking
> >>>> backward compatibility.
> >>>>
> >>>> ClusterClient currently exposes internal classes like JobGraph and
> >>>> StreamGraph. But it should be possible to wrap this with a new public
> >> API
> >>>> that brings the required job control capabilities for downstream
> >>> projects.
> >>>> Perhaps it is helpful to look at some of the interfaces in Beam while
> >>>> thinking about this: [2] for the portable job API and [3] for the old
> >>>> asynchronous job control from the Beam Java SDK.
> >>>>
> >>>> The backward compatibility discussion [4] is also relevant here. A new
> >>> API
> >>>> should shield downstream projects from internals and allow them to
> >>>> interoperate with multiple future Flink versions in the same release
> >> line
> >>>> without forced upgrades.
> >>>>
> >>>> Thanks,
> >>>> Thomas
> >>>>
> >>>> [1] https://github.com/apache/flink/pull/7249
> >>>> [2]
> >>>>
> >>>>
> >>>
> >>
> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> >>>> [3]
> >>>>
> >>>>
> >>>
> >>
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> >>>> [4]
> >>>>
> >>>>
> >>>
> >>
> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> >>>>
> >>>>
> >>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zj...@gmail.com> wrote:
> >>>>
> >>>>>>>> I'm not so sure whether the user should be able to define where
> >> the
> >>>> job
> >>>>> runs (in your example Yarn). This is actually independent of the job
> >>>>> development and is something which is decided at deployment time.
> >>>>>
> >>>>> User don't need to specify execution mode programmatically. They can
> >>> also
> >>>>> pass the execution mode from the arguments in flink run command. e.g.
> >>>>>
> >>>>> bin/flink run -m yarn-cluster ....
> >>>>> bin/flink run -m local ...
> >>>>> bin/flink run -m host:port ...
> >>>>>
> >>>>> Does this make sense to you ?
> >>>>>
> >>>>>>>> To me it makes sense that the ExecutionEnvironment is not
> >> directly
> >>>>> initialized by the user and instead context sensitive how you want to
> >>>>> execute your job (Flink CLI vs. IDE, for example).
> >>>>>
> >>>>> Right, currently I notice Flink would create different
> >>>>> ContextExecutionEnvironment based on different submission scenarios
> >>>> (Flink
> >>>>> Cli vs IDE). To me this is kind of hack approach, not so
> >>> straightforward.
> >>>>> What I suggested above is that flink should always create the
> >>>> same
> >>>>> ExecutionEnvironment but with different configuration, and based on
> >> the
> >>>>> configuration it would create the proper ClusterClient for different
> >>>>> behaviors.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> Till Rohrmann <tr...@apache.org> 于2018年12月20日周四 下午11:18写道:
> >>>>>
> >>>>>> You are probably right that we have code duplication when it comes
> >> to
> >>>> the
> >>>>>> creation of the ClusterClient. This should be reduced in the
> >> future.
> >>>>>>
> >>>>>> I'm not so sure whether the user should be able to define where the
> >>> job
> >>>>>> runs (in your example Yarn). This is actually independent of the
> >> job
> >>>>>> development and is something which is decided at deployment time.
> >> To
> >>> me
> >>>>> it
> >>>>>> makes sense that the ExecutionEnvironment is not directly
> >> initialized
> >>>> by
> >>>>>> the user and instead context sensitive how you want to execute your
> >>> job
> >>>>>> (Flink CLI vs. IDE, for example). However, I agree that the
> >>>>>> ExecutionEnvironment should give you access to the ClusterClient
> >> and
> >>> to
> >>>>> the
> >>>>>> job (maybe in the form of the JobGraph or a job plan).
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Till
> >>>>>>
> >>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zj...@gmail.com>
> >> wrote:
> >>>>>>
> >>>>>>> Hi Till,
> >>>>>>> Thanks for the feedback. You are right that I expect better
> >>>>> programmatic
> >>>>>>> job submission/control api which could be used by downstream
> >>> project.
> >>>>> And
> >>>>>>> it would benefit for the flink ecosystem. When I look at the code
> >>> of
> >>>>>> flink
> >>>>>>> scala-shell and sql-client (I believe they are not the core of
> >>> flink,
> >>>>> but
> >>>>>>> belong to the ecosystem of flink), I find many duplicated code
> >> for
> >>>>>> creating
> >>>>>>> ClusterClient from user provided configuration (configuration
> >>> format
> >>>>> may
> >>>>>> be
> >>>>>>> different from scala-shell and sql-client) and then use that
> >>>>>> ClusterClient
> >>>>>>> to manipulate jobs. I don't think this is convenient for
> >> downstream
> >>>>>>> projects. What I expect is that downstream project only needs to
> >>>>> provide
> >>>>>>> necessary configuration info (maybe introducing class FlinkConf),
> >>> and
> >>>>>> then
> >>>>>>> build ExecutionEnvironment based on this FlinkConf, and
> >>>>>>> ExecutionEnvironment will create the proper ClusterClient. It not
> >>>> only
> >>>>>>> benefit for the downstream project development but also be
> >> helpful
> >>>> for
> >>>>>>> their integration test with flink. Here's one sample code snippet
> >>>> that
> >>>>> I
> >>>>>>> expect.
> >>>>>>>
> >>>>>>> val conf = new FlinkConf().mode("yarn")
> >>>>>>> val env = new ExecutionEnvironment(conf)
> >>>>>>> val jobId = env.submit(...)
> >>>>>>> val jobStatus = env.getClusterClient().queryJobStatus(jobId)
> >>>>>>> env.getClusterClient().cancelJob(jobId)
> >>>>>>>
> >>>>>>> What do you think ?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Till Rohrmann <tr...@apache.org> 于2018年12月11日周二 下午6:28写道:
> >>>>>>>
> >>>>>>>> Hi Jeff,
> >>>>>>>>
> >>>>>>>> what you are proposing is to provide the user with better
> >>>>> programmatic
> >>>>>>> job
> >>>>>>>> control. There was actually an effort to achieve this but it
> >> has
> >>>>> never
> >>>>>>> been
> >>>>>>>> completed [1]. However, there are some improvement in the code
> >>> base
> >>>>>> now.
> >>>>>>>> Look for example at the NewClusterClient interface which
> >> offers a
> >>>>>>>> non-blocking job submission. But I agree that we need to
> >> improve
> >>>>> Flink
> >>>>>> in
> >>>>>>>> this regard.
> >>>>>>>>
> >>>>>>>> I would not be in favour if exposing all ClusterClient calls
> >> via
> >>>> the
> >>>>>>>> ExecutionEnvironment because it would clutter the class and
> >> would
> >>>> not
> >>>>>> be
> >>>>>>> a
> >>>>>>>> good separation of concerns. Instead one idea could be to
> >>> retrieve
> >>>>> the
> >>>>>>>> current ClusterClient from the ExecutionEnvironment which can
> >>> then
> >>>> be
> >>>>>>> used
> >>>>>>>> for cluster and job control. But before we start an effort
> >> here,
> >>> we
> >>>>>> need
> >>>>>>> to
> >>>>>>>> agree and capture what functionality we want to provide.
> >>>>>>>>
> >>>>>>>> Initially, the idea was that we have the ClusterDescriptor
> >>>> describing
> >>>>>> how
> >>>>>>>> to talk to cluster manager like Yarn or Mesos. The
> >>>> ClusterDescriptor
> >>>>>> can
> >>>>>>> be
> >>>>>>>> used for deploying Flink clusters (job and session) and gives
> >>> you a
> >>>>>>>> ClusterClient. The ClusterClient controls the cluster (e.g.
> >>>>> submitting
> >>>>>>>> jobs, listing all running jobs). And then there was the idea to
> >>>>>>> introduce a
> >>>>>>>> JobClient which you obtain from the ClusterClient to trigger
> >> job
> >>>>>> specific
> >>>>>>>> operations (e.g. taking a savepoint, cancelling the job).
> >>>>>>>>
> >>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-4272
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Till
> >>>>>>>>
> >>>>>>>> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zj...@gmail.com>
> >>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Folks,
> >>>>>>>>>
> >>>>>>>>> I am trying to integrate flink into apache zeppelin which is
> >> an
> >>>>>>>> interactive
> >>>>>>>>> notebook. And I hit several issues that is caused by flink
> >>> client
> >>>>>> api.
> >>>>>>> So
> >>>>>>>>> I'd like to proposal the following changes for flink client
> >>> api.
> >>>>>>>>>
> >>>>>>>>> 1. Support nonblocking execution. Currently,
> >>>>>>> ExecutionEnvironment#execute
> >>>>>>>>> is a blocking method which would do 2 things, first submit
> >> job
> >>>> and
> >>>>>> then
> >>>>>>>>> wait for job until it is finished. I'd like introduce a
> >>>> nonblocking
> >>>>>>>>> execution method like ExecutionEnvironment#submit which only
> >>>> submit
> >>>>>> job
> >>>>>>>> and
> >>>>>>>>> then return jobId to client. And allow user to query the job
> >>>> status
> >>>>>> via
> >>>>>>>> the
> >>>>>>>>> jobId.
> >>>>>>>>>
> >>>>>>>>> 2. Add cancel api in
> >>>>> ExecutionEnvironment/StreamExecutionEnvironment,
> >>>>>>>>> currently the only way to cancel job is via cli (bin/flink),
> >>> this
> >>>>> is
> >>>>>>> not
> >>>>>>>>> convenient for downstream project to use this feature. So I'd
> >>>> like
> >>>>> to
> >>>>>>> add
> >>>>>>>>> cancel api in ExecutionEnvironment
> >>>>>>>>>
> >>>>>>>>> 3. Add savepoint api in
> >>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
> >>>>>>>> It
> >>>>>>>>> is similar as cancel api, we should use ExecutionEnvironment
> >> as
> >>>> the
> >>>>>>>> unified
> >>>>>>>>> api for third party to integrate with flink.
> >>>>>>>>>
> >>>>>>>>> 4. Add listener for job execution lifecycle. Something like
> >>>>>> following,
> >>>>>>> so
> >>>>>>>>> that downstream project can do custom logic in the lifecycle
> >> of
> >>>>> job.
> >>>>>>> e.g.
> >>>>>>>>> Zeppelin would capture the jobId after job is submitted and
> >>> then
> >>>>> use
> >>>>>>> this
> >>>>>>>>> jobId to cancel it later when necessary.
> >>>>>>>>>
> >>>>>>>>> public interface JobListener {
> >>>>>>>>>
> >>>>>>>>>   void onJobSubmitted(JobID jobId);
> >>>>>>>>>
> >>>>>>>>>   void onJobExecuted(JobExecutionResult jobResult);
> >>>>>>>>>
> >>>>>>>>>   void onJobCanceled(JobID jobId);
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> 5. Enable session in ExecutionEnvironment. Currently it is
> >>>>> disabled,
> >>>>>>> but
> >>>>>>>>> session is very convenient for third party to submitting jobs
> >>>>>>>> continually.
> >>>>>>>>> I hope flink can enable it again.
> >>>>>>>>>
> >>>>>>>>> 6. Unify all flink client api into
> >>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
> >>>>>>>>>
> >>>>>>>>> This is a long term issue which needs more careful thinking
> >> and
> >>>>>> design.
> >>>>>>>>> Currently some of features of flink is exposed in
> >>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment, but some are
> >>>>> exposed
> >>>>>>> in
> >>>>>>>>> cli instead of api, like the cancel and savepoint I mentioned
> >>>>> above.
> >>>>>> I
> >>>>>>>>> think the root cause is due to that flink didn't unify the
> >>>>>> interaction
> >>>>>>>> with
> >>>>>>>>> flink. Here I list 3 scenarios of flink operation
> >>>>>>>>>
> >>>>>>>>>   - Local job execution.  Flink will create LocalEnvironment
> >>> and
> >>>>>> then
> >>>>>>>> use
> >>>>>>>>>   this LocalEnvironment to create LocalExecutor for job
> >>>> execution.
> >>>>>>>>>   - Remote job execution. Flink will create ClusterClient
> >>> first
> >>>>> and
> >>>>>>> then
> >>>>>>>>>   create ContextEnvironment based on the ClusterClient and
> >>> then
> >>>>> run
> >>>>>>> the
> >>>>>>>>> job.
> >>>>>>>>>   - Job cancelation. Flink will create ClusterClient first
> >> and
> >>>>> then
> >>>>>>>> cancel
> >>>>>>>>>   this job via this ClusterClient.
> >>>>>>>>>
> >>>>>>>>> As you can see in the above 3 scenarios. Flink didn't use the
> >>>> same
> >>>>>>>>> approach(code path) to interact with flink
> >>>>>>>>> What I propose is following:
> >>>>>>>>> Create the proper LocalEnvironment/RemoteEnvironment (based
> >> on
> >>>> user
> >>>>>>>>> configuration) --> Use this Environment to create proper
> >>>>>> ClusterClient
> >>>>>>>>> (LocalClusterClient or RestClusterClient) to interact with
> >>>>> Flink (
> >>>>>>> job
> >>>>>>>>> execution or cancellation)
> >>>>>>>>>
> >>>>>>>>> This way we can unify the process of local execution and
> >> remote
> >>>>>>>> execution.
> >>>>>>>>> And it is much easier for third parties to integrate with
> >> flink,
> >>>>>> because
> >>>>>>>>> ExecutionEnvironment is the unified entry point for flink.
> >> What
> >>>>> third
> >>>>>>>> party
> >>>>>>>>> needs to do is just pass configuration to
> >> ExecutionEnvironment
> >>>> and
> >>>>>>>>> ExecutionEnvironment will do the right thing based on the
> >>>>>>> configuration.
> >>>>>>>>> The Flink CLI can also be considered a Flink API consumer: it
> >> just
> >>>>> passes
> >>>>>>> the
> >>>>>>>>> configuration to ExecutionEnvironment and lets
> >>>> ExecutionEnvironment
> >>>>>>>>> create the proper ClusterClient instead of letting the cli
> >>> create
> >>>>>>>>> ClusterClient directly.
> >>>>>>>>>
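[Editor's sketch] The unified code path proposed above, where the configuration rather than the caller decides which client is built, can be sketched as a small factory. All class names here are illustrative stand-ins, not Flink's real LocalClusterClient/RestClusterClient.

```java
// Minimal stand-ins for the two client kinds named in the proposal.
interface ClusterClient {
    String target();
}

class LocalClusterClient implements ClusterClient {
    public String target() { return "local"; }
}

class RestClusterClient implements ClusterClient {
    private final String address;

    RestClusterClient(String address) { this.address = address; }

    public String target() { return "rest://" + address; }
}

// One entry point: the configured mode, not the caller, decides which client is built,
// mirroring "Environment -> proper ClusterClient" from the proposal.
final class ClientFactory {
    static ClusterClient fromMode(String mode) {
        if ("local".equals(mode)) {
            return new LocalClusterClient();
        }
        // anything else is treated as a remote address (e.g. "host:8081")
        return new RestClusterClient(mode);
    }
}
```

With this shape, the CLI, a scala-shell, or a notebook all go through the same factory and differ only in the configuration they pass in.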
> >>>>>>>>>
> >>>>>>>>> Item 6 would involve large code refactoring, so I think we can
> >> defer
> >>>> it
> >>>>>> to a
> >>>>>>>>> future release; items 1-5 could be done at once, I believe.
> >> Let
> >>> me
> >>>>>> know
> >>>>>>>> your
> >>>>>>>>> comments and feedback, thanks
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Best Regards
> >>>>>>>>>
> >>>>>>>>> Jeff Zhang
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Best Regards
> >>>>>>>
> >>>>>>> Jeff Zhang
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Best Regards
> >>>>>
> >>>>> Jeff Zhang
> >>>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Best Regards
> >>>
> >>> Jeff Zhang
> >>>
> >>
> >
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
>
>

-- 
Best Regards

Jeff Zhang

Re: [DISCUSS] Flink client api enhancement for downstream project

Posted by Aljoscha Krettek <al...@apache.org>.
Some points to consider:

* Any API we expose should not have dependencies on the runtime (flink-runtime) package or other implementation details. To me, this means that the current ClusterClient cannot be exposed to users, because it uses quite a few classes from the optimiser and runtime packages.

* What happens when the client fails or restarts? There needs to be a way of re-establishing the connection to the job, setting up the listeners again, etc.
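[Editor's sketch] One way to satisfy both constraints is a thin job handle that carries no runtime types and can be re-obtained from the job id alone, so a restarted client simply reattaches. Every name below is hypothetical; this is a toy model, not Flink code.

```java
import java.util.HashMap;
import java.util.Map;

// Client-facing handle: only plain types, no flink-runtime dependencies.
interface JobHandle {
    String jobId();

    String status();

    void cancel();
}

// Gateway the client talks to; job state lives on the cluster side,
// so a handle holds no state that could be lost on a client restart.
class ClusterGateway {
    private final Map<String, String> statusById = new HashMap<>();

    String submit() {
        String jobId = "job-" + statusById.size();
        statusById.put(jobId, "RUNNING");
        return jobId;
    }

    // Re-establishing the connection after a client failure is just attach(jobId);
    // listeners would be re-registered on the fresh handle.
    JobHandle attach(String jobId) {
        return new JobHandle() {
            public String jobId() { return jobId; }

            public String status() { return statusById.get(jobId); }

            public void cancel() { statusById.put(jobId, "CANCELED"); }
        };
    }
}
```

The design point is that `attach` needs nothing but the id, so re-establishing listeners after a restart is the caller's responsibility and requires no recovered client-side state.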

Aljoscha

> On 29. May 2019, at 10:17, Jeff Zhang <zj...@gmail.com> wrote:
> 
> Sorry folks, the design doc is late as you expected. Here's the design doc
> I drafted, welcome any comments and feedback.
> 
> https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/edit?usp=sharing
> 
> 
> 
> Stephan Ewen <se...@apache.org> 于2019年2月14日周四 下午8:43写道:
> 
>> Nice that this discussion is happening.
>> 
>> In the FLIP, we could also revisit the entire role of the environments
>> again.
>> 
>> Initially, the idea was:
>>  - the environments take care of the specific setup for standalone (no
>> setup needed), yarn, mesos, etc.
>>  - the session ones have control over the session. The environment holds
>> the session client.
>>  - running a job gives a "control" object for that job. That behavior is
>> the same in all environments.
>> 
>> The actual implementation diverged quite a bit from that. Happy to see a
>> discussion about straightening this out a bit more.
>> 
>> 
>> On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang <zj...@gmail.com> wrote:
>> 
>>> Hi folks,
>>> 
>>> Sorry for late response, It seems we reach consensus on this, I will
>> create
>>> FLIP for this with more detailed design
>>> 
>>> 
>>> Thomas Weise <th...@apache.org> 于2018年12月21日周五 上午11:43写道:
>>> 
>>>> Great to see this discussion seeded! The problems you face with the
>>>> Zeppelin integration are also affecting other downstream projects, like
>>>> Beam.
>>>> 
>>>> We just enabled the savepoint restore option in RemoteStreamEnvironment
>>> [1]
>>>> and that was more difficult than it should be. The main issue is that
>>>> environment and cluster client aren't decoupled. Ideally it should be
>>>> possible to just get the matching cluster client from the environment
>> and
>>>> then control the job through it (environment as factory for cluster
>>>> client). But note that the environment classes are part of the public
>>> API,
>>>> and it is not straightforward to make larger changes without breaking
>>>> backward compatibility.
>>>> 
>>>> ClusterClient currently exposes internal classes like JobGraph and
>>>> StreamGraph. But it should be possible to wrap this with a new public
>> API
>>>> that brings the required job control capabilities for downstream
>>> projects.
>>>> Perhaps it is helpful to look at some of the interfaces in Beam while
>>>> thinking about this: [2] for the portable job API and [3] for the old
>>>> asynchronous job control from the Beam Java SDK.
>>>> 
>>>> The backward compatibility discussion [4] is also relevant here. A new
>>> API
>>>> should shield downstream projects from internals and allow them to
>>>> interoperate with multiple future Flink versions in the same release
>> line
>>>> without forced upgrades.
>>>> 
>>>> Thanks,
>>>> Thomas
>>>> 
>>>> [1] https://github.com/apache/flink/pull/7249
>>>> [2]
>>>> 
>>>> 
>>> 
>> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
>>>> [3]
>>>> 
>>>> 
>>> 
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
>>>> [4]
>>>> 
>>>> 
>>> 
>> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
>>>> 
>>>> 
>>>> On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang <zj...@gmail.com> wrote:
>>>> 
>>>>>>>> I'm not so sure whether the user should be able to define where
>> the
>>>> job
>>>>> runs (in your example Yarn). This is actually independent of the job
>>>>> development and is something which is decided at deployment time.
>>>>> 
> >>>>> Users don't need to specify the execution mode programmatically. They can
>>> also
>>>>> pass the execution mode from the arguments in flink run command. e.g.
>>>>> 
>>>>> bin/flink run -m yarn-cluster ....
>>>>> bin/flink run -m local ...
>>>>> bin/flink run -m host:port ...
>>>>> 
>>>>> Does this make sense to you ?
>>>>> 
>>>>>>>> To me it makes sense that the ExecutionEnvironment is not
>> directly
> >>>>> initialized by the user but is instead context-sensitive to how you want to
>>>>> execute your job (Flink CLI vs. IDE, for example).
>>>>> 
>>>>> Right, currently I notice Flink would create different
>>>>> ContextExecutionEnvironment based on different submission scenarios
>>>> (Flink
> >>>>> Cli vs IDE). To me this is kind of a hacky approach, not so
>>> straightforward.
> >>>>> What I suggested above is that Flink should always create the
>>>> same
>>>>> ExecutionEnvironment but with different configuration, and based on
>> the
>>>>> configuration it would create the proper ClusterClient for different
>>>>> behaviors.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> Till Rohrmann <tr...@apache.org> 于2018年12月20日周四 下午11:18写道:
>>>>> 
>>>>>> You are probably right that we have code duplication when it comes
>> to
>>>> the
>>>>>> creation of the ClusterClient. This should be reduced in the
>> future.
>>>>>> 
>>>>>> I'm not so sure whether the user should be able to define where the
>>> job
>>>>>> runs (in your example Yarn). This is actually independent of the
>> job
>>>>>> development and is something which is decided at deployment time.
>> To
>>> me
>>>>> it
>>>>>> makes sense that the ExecutionEnvironment is not directly
>> initialized
>>>> by
> >>>>>> the user but is instead context-sensitive to how you want to execute your
>>> job
>>>>>> (Flink CLI vs. IDE, for example). However, I agree that the
>>>>>> ExecutionEnvironment should give you access to the ClusterClient
>> and
>>> to
>>>>> the
>>>>>> job (maybe in the form of the JobGraph or a job plan).
>>>>>> 
>>>>>> Cheers,
>>>>>> Till
>>>>>> 
>>>>>> On Thu, Dec 13, 2018 at 4:36 AM Jeff Zhang <zj...@gmail.com>
>> wrote:
>>>>>> 
>>>>>>> Hi Till,
>>>>>>> Thanks for the feedback. You are right that I expect better
>>>>> programmatic
>>>>>>> job submission/control api which could be used by downstream
>>> project.
>>>>> And
>>>>>>> it would benefit for the flink ecosystem. When I look at the code
>>> of
>>>>>> flink
>>>>>>> scala-shell and sql-client (I believe they are not the core of
>>> flink,
>>>>> but
> >>>>>>> belong to the ecosystem of flink), I find a lot of duplicated code
>> for
>>>>>> creating
>>>>>>> ClusterClient from user provided configuration (configuration
>>> format
>>>>> may
>>>>>> be
>>>>>>> different from scala-shell and sql-client) and then use that
>>>>>> ClusterClient
>>>>>>> to manipulate jobs. I don't think this is convenient for
>> downstream
>>>>>>> projects. What I expect is that downstream project only needs to
>>>>> provide
>>>>>>> necessary configuration info (maybe introducing class FlinkConf),
>>> and
>>>>>> then
>>>>>>> build ExecutionEnvironment based on this FlinkConf, and
> >>>>>>> ExecutionEnvironment will create the proper ClusterClient. It would not
> >>>> only
> >>>>>>> benefit downstream project development but also be
> >> helpful
> >>>> for
> >>>>>>> their integration tests with flink. Here's one sample code snippet
>>>> that
>>>>> I
>>>>>>> expect.
>>>>>>> 
>>>>>>> val conf = new FlinkConf().mode("yarn")
>>>>>>> val env = new ExecutionEnvironment(conf)
>>>>>>> val jobId = env.submit(...)
>>>>>>> val jobStatus = env.getClusterClient().queryJobStatus(jobId)
>>>>>>> env.getClusterClient().cancelJob(jobId)
>>>>>>> 
>>>>>>> What do you think ?
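[Editor's sketch] The Scala pseudocode above can be rendered as a runnable Java sketch. `FlinkConf`, `SimpleClusterClient`, the status enum, and the environment below are all illustrative stand-ins invented for this example, not existing Flink classes.

```java
import java.util.HashMap;
import java.util.Map;

enum JobStatus { RUNNING, CANCELED }

// Illustrative stand-in for the proposed FlinkConf.
class FlinkConf {
    private String mode = "local";

    FlinkConf mode(String mode) { this.mode = mode; return this; }

    String mode() { return mode; }
}

class SimpleClusterClient {
    private final Map<String, JobStatus> jobs = new HashMap<>();

    String submit() {
        String jobId = "job-" + jobs.size();
        jobs.put(jobId, JobStatus.RUNNING);
        return jobId;
    }

    JobStatus queryJobStatus(String jobId) { return jobs.get(jobId); }

    void cancelJob(String jobId) { jobs.put(jobId, JobStatus.CANCELED); }
}

// The environment is built from the conf and exposes the client it created,
// matching the "env.getClusterClient()" shape of the snippet above.
class ConfiguredEnvironment {
    private final SimpleClusterClient client;

    ConfiguredEnvironment(FlinkConf conf) {
        this.client = new SimpleClusterClient(); // a real version would dispatch on conf.mode()
    }

    String submit() { return client.submit(); }

    SimpleClusterClient getClusterClient() { return client; }
}
```

The integration-test benefit mentioned above follows directly: a downstream project can construct the environment from a conf in a test and drive submit/query/cancel without touching a CLI.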
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Till Rohrmann <tr...@apache.org> 于2018年12月11日周二 下午6:28写道:
>>>>>>> 
>>>>>>>> Hi Jeff,
>>>>>>>> 
>>>>>>>> what you are proposing is to provide the user with better
>>>>> programmatic
>>>>>>> job
>>>>>>>> control. There was actually an effort to achieve this but it
>> has
>>>>> never
>>>>>>> been
> >>>>>>>> completed [1]. However, there are some improvements in the code
>>> base
>>>>>> now.
>>>>>>>> Look for example at the NewClusterClient interface which
>> offers a
>>>>>>>> non-blocking job submission. But I agree that we need to
>> improve
>>>>> Flink
>>>>>> in
>>>>>>>> this regard.
>>>>>>>> 
>>>>>>>> I would not be in favour if exposing all ClusterClient calls
>> via
>>>> the
>>>>>>>> ExecutionEnvironment because it would clutter the class and
>> would
>>>> not
>>>>>> be
>>>>>>> a
>>>>>>>> good separation of concerns. Instead one idea could be to
>>> retrieve
>>>>> the
>>>>>>>> current ClusterClient from the ExecutionEnvironment which can
>>> then
>>>> be
>>>>>>> used
>>>>>>>> for cluster and job control. But before we start an effort
>> here,
>>> we
>>>>>> need
>>>>>>> to
>>>>>>>> agree and capture what functionality we want to provide.
>>>>>>>> 
>>>>>>>> Initially, the idea was that we have the ClusterDescriptor
>>>> describing
>>>>>> how
>>>>>>>> to talk to cluster manager like Yarn or Mesos. The
>>>> ClusterDescriptor
>>>>>> can
>>>>>>> be
>>>>>>>> used for deploying Flink clusters (job and session) and gives
>>> you a
>>>>>>>> ClusterClient. The ClusterClient controls the cluster (e.g.
>>>>> submitting
>>>>>>>> jobs, listing all running jobs). And then there was the idea to
>>>>>>> introduce a
>>>>>>>> JobClient which you obtain from the ClusterClient to trigger
>> job
>>>>>> specific
>>>>>>>> operations (e.g. taking a savepoint, cancelling the job).
>>>>>>>> 
>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-4272
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> Till
>>>>>>>> 
>>>>>>>> On Tue, Dec 11, 2018 at 10:13 AM Jeff Zhang <zj...@gmail.com>
>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi Folks,
>>>>>>>>> 
>>>>>>>>> I am trying to integrate flink into apache zeppelin which is
>> an
>>>>>>>> interactive
>>>>>>>>> notebook. And I hit several issues that is caused by flink
>>> client
>>>>>> api.
>>>>>>> So
>>>>>>>>> I'd like to proposal the following changes for flink client
>>> api.
>>>>>>>>> 
>>>>>>>>> 1. Support nonblocking execution. Currently,
>>>>>>> ExecutionEnvironment#execute
>>>>>>>>> is a blocking method which would do 2 things, first submit
>> job
>>>> and
>>>>>> then
> >>>>>>>>> wait for the job until it is finished. I'd like to introduce a
>>>> nonblocking
>>>>>>>>> execution method like ExecutionEnvironment#submit which only
>>>> submit
>>>>>> job
>>>>>>>> and
>>>>>>>>> then return jobId to client. And allow user to query the job
>>>> status
>>>>>> via
>>>>>>>> the
>>>>>>>>> jobId.
>>>>>>>>> 
>>>>>>>>> 2. Add cancel api in
>>>>> ExecutionEnvironment/StreamExecutionEnvironment,
> >>>>>>>>> currently the only way to cancel a job is via the cli (bin/flink),
>>> this
>>>>> is
>>>>>>> not
>>>>>>>>> convenient for downstream project to use this feature. So I'd
>>>> like
>>>>> to
>>>>>>> add
>>>>>>>>> cancel api in ExecutionEnvironment
>>>>>>>>> 
>>>>>>>>> 3. Add savepoint api in
>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
>>>>>>>> It
> >>>>>>>>> is similar to the cancel api; we should use ExecutionEnvironment
>> as
>>>> the
>>>>>>>> unified
>>>>>>>>> api for third party to integrate with flink.
>>>>>>>>> 
>>>>>>>>> 4. Add listener for job execution lifecycle. Something like
> >>>>>> the following,
>>>>>>> so
>>>>>>>>> that downstream project can do custom logic in the lifecycle
>> of
>>>>> job.
>>>>>>> e.g.
>>>>>>>>> Zeppelin would capture the jobId after job is submitted and
>>> then
>>>>> use
>>>>>>> this
>>>>>>>>> jobId to cancel it later when necessary.
>>>>>>>>> 
>>>>>>>>> public interface JobListener {
>>>>>>>>> 
>>>>>>>>>   void onJobSubmitted(JobID jobId);
>>>>>>>>> 
>>>>>>>>>   void onJobExecuted(JobExecutionResult jobResult);
>>>>>>>>> 
>>>>>>>>>   void onJobCanceled(JobID jobId);
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> 5. Enable session in ExecutionEnvironment. Currently it is
>>>>> disabled,
>>>>>>> but
>>>>>>>>> session is very convenient for third party to submitting jobs
>>>>>>>> continually.
>>>>>>>>> I hope flink can enable it again.
>>>>>>>>> 
>>>>>>>>> 6. Unify all flink client api into
>>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment.
>>>>>>>>> 
>>>>>>>>> This is a long term issue which needs more careful thinking
>> and
>>>>>> design.
>>>>>>>>> Currently some of features of flink is exposed in
>>>>>>>>> ExecutionEnvironment/StreamExecutionEnvironment, but some are
>>>>> exposed
>>>>>>> in
>>>>>>>>> cli instead of api, like the cancel and savepoint I mentioned
>>>>> above.
>>>>>> I
>>>>>>>>> think the root cause is due to that flink didn't unify the
>>>>>> interaction
>>>>>>>> with
>>>>>>>>> flink. Here I list 3 scenarios of flink operation
>>>>>>>>> 
>>>>>>>>>   - Local job execution.  Flink will create LocalEnvironment
>>> and
>>>>>> then
>>>>>>>> use
>>>>>>>>>   this LocalEnvironment to create LocalExecutor for job
>>>> execution.
>>>>>>>>>   - Remote job execution. Flink will create ClusterClient
>>> first
>>>>> and
>>>>>>> then
>>>>>>>>>   create ContextEnvironment based on the ClusterClient and
>>> then
>>>>> run
>>>>>>> the
>>>>>>>>> job.
>>>>>>>>>   - Job cancelation. Flink will create ClusterClient first
>> and
>>>>> then
>>>>>>>> cancel
>>>>>>>>>   this job via this ClusterClient.
>>>>>>>>> 
>>>>>>>>> As you can see in the above 3 scenarios. Flink didn't use the
>>>> same
>>>>>>>>> approach(code path) to interact with flink
>>>>>>>>> What I propose is following:
>>>>>>>>> Create the proper LocalEnvironment/RemoteEnvironment (based
>> on
>>>> user
>>>>>>>>> configuration) --> Use this Environment to create proper
>>>>>> ClusterClient
>>>>>>>>> (LocalClusterClient or RestClusterClient) to interactive with
>>>>> Flink (
>>>>>>> job
>>>>>>>>> execution or cancelation)
>>>>>>>>> 
>>>>>>>>> This way we can unify the process of local execution and
>> remote
>>>>>>>> execution.
>>>>>>>>> And it is much easier for third party to integrate with
>> flink,
>>>>>> because
>>>>>>>>> ExecutionEnvironment is the unified entry point for flink.
>> What
>>>>> third
>>>>>>>> party
>>>>>>>>> needs to do is just pass configuration to
>> ExecutionEnvironment
>>>> and
>>>>>>>>> ExecutionEnvironment will do the right thing based on the
>>>>>>> configuration.
>>>>>>>>> Flink cli can also be considered as flink api consumer. it
>> just
>>>>> pass
>>>>>>> the
>>>>>>>>> configuration to ExecutionEnvironment and let
>>>> ExecutionEnvironment
>>>>> to
>>>>>>>>> create the proper ClusterClient instead of letting cli to
>>> create
>>>>>>>>> ClusterClient directly.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 6 would involve large code refactoring, so I think we can
>> defer
>>>> it
>>>>>> for
>>>>>>>>> future release, 1,2,3,4,5 could be done at once I believe.
>> Let
>>> me
>>>>>> know
>>>>>>>> your
>>>>>>>>> comments and feedback, thanks
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Best Regards
>>>>>>>>> 
>>>>>>>>> Jeff Zhang
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Best Regards
>>>>>>> 
>>>>>>> Jeff Zhang
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Best Regards
>>>>> 
>>>>> Jeff Zhang
>>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Best Regards
>>> 
>>> Jeff Zhang
>>> 
>> 
> 
> 
> -- 
> Best Regards
> 
> Jeff Zhang