Posted to issues@flink.apache.org by "Leo Zhang (Jira)" <ji...@apache.org> on 2019/12/11 01:02:42 UTC

[jira] [Updated] (FLINK-14319) Register user jar files in {Stream}ExecutionEnvironment

     [ https://issues.apache.org/jira/browse/FLINK-14319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leo Zhang updated FLINK-14319:
------------------------------
    Description: 
 I see some use cases in which people want to implement applications based on loading external jars, not just SQL applications but also streaming ones. The related API proposals have been raised in sub-task FLINK-14055 under task FLINK-10232 (Add a SQL DDL).

To support sub-task FLINK-14055, we need a separate new task for the \{Stream}ExecutionEnvironment::registerUserJarFile() interface, which will be addressed in this issue.
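
For illustration, here is a minimal usage sketch of the proposed interface; _registerUserJarFile_ is the method proposed in this issue, not an existing API, and the jar path is a placeholder:
{code:java}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RegisterUserJarExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Proposed API: ship an external jar (e.g. one containing UDF
        // classes) together with the job. The path is a placeholder.
        env.registerUserJarFile("/path/to/external-udfs.jar");

        env.fromElements("a", "b", "c").print();
        env.execute("job-with-external-jar");
    }
}
{code}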

Here is the plan.

*Design*
 * Add interface 
{code:java}
 void registerUserJarFile(String jarFile)
{code}
  into _StreamExecutionEnvironment_ (in module flink-streaming-java). The affected classes are _StreamGraph_, _StreamGraphGenerator_, and _StreamingJobGraphGenerator_, which need to support getting and setting a list of user jars; all of them are in module flink-streaming-java. A sketch of the intended plumbing follows this list.

 
 * Add interface 
{code:java}
void registerUserJarFile(String jarFile)
{code}
  into _ExecutionEnvironment_ (in module flink-java). The affected class is _Plan_, in module flink-core, which needs to support getting and setting a list of user jars.
 
 * Add interface 
{code:java}
void addUserJars(List<Path> userJars, JobGraph jobGraph)
{code}
  into _JobGraphGenerator_ and call it within the method _JobGraph compileJobGraph(OptimizedPlan program, JobID jobId)_ so that user jars can be shipped with the user's program and submitted to the cluster. _JobGraphGenerator_ is in module flink-optimizer (see the sketch after this list).
 * Add interface 
{code:java}
void registerUserJarFile(String jarFile)
{code}
  into \{Stream}ExecutionEnvironment (in modules flink-scala and flink-streaming-scala), delegating registration to the wrapped _javaEnv_.
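
To make the intended plumbing concrete, here is a hedged, self-contained sketch. In the actual change the registration method would live in \{Stream}ExecutionEnvironment and _addUserJars_ in _JobGraphGenerator_; the _userJars_ field and the class wrapping the two methods are illustrative assumptions, while _JobGraph#addJar_ is an existing method.
{code:java}
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.core.fs.Path;
import org.apache.flink.runtime.jobgraph.JobGraph;

// Illustrative sketch only: in the real change these methods are split
// across {Stream}ExecutionEnvironment and JobGraphGenerator.
public class UserJarRegistrationSketch {

    // Assumed field holding the registered jars until graph generation.
    private final List<Path> userJars = new ArrayList<>();

    // Proposed in this issue: record a user jar so it travels with the job.
    public void registerUserJarFile(String jarFile) {
        userJars.add(new Path(jarFile));
    }

    // Proposed in this issue: attach the recorded jars to the JobGraph,
    // e.g. from within compileJobGraph(...), so they are shipped to the
    // cluster. JobGraph#addJar is an existing method.
    public void addUserJars(List<Path> userJars, JobGraph jobGraph) {
        for (Path jar : userJars) {
            jobGraph.addJar(jar);
        }
    }
}
{code}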

*Testing*
 * One test case for adding local user jars in both streaming and batch jobs. We need to package the test classes into a jar before testing; for this purpose, we can add a goal bound to the process-test-classes phase in the pom file. The affected module is flink-tests. A sketch of this test follows this list.
 * Another test case for adding user jars stored in HDFS, following the same idea as the previous one. The affected module is flink-fs-tests.
 * Note that the Python API is not included in this issue, just as with registering cached files. However, we still need to modify some Python test cases to avoid build errors caused by the new methods being declared in Java but missing from Python. The affected files are _flink-python/pyflink/dataset/tests/test_execution_environment_completeness.py_ and _flink-python/pyflink/datastream/tests/test_stream_execution_environment_completeness.py_.
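
A hedged sketch of the local-jar test idea; the class name, jar path, and job body are hypothetical, and the test jar is assumed to have been built during the process-test-classes phase:
{code:java}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.junit.Test;

public class RegisterUserJarFileITCase {

    @Test
    public void testRegisterLocalUserJar() throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Jar produced from the test classes during process-test-classes;
        // the path is a placeholder.
        env.registerUserJarFile("target/test-user-code.jar");

        // A pipeline whose UDF lives in the registered jar would go here;
        // executing the job verifies that the jar is shipped along with it.
        env.fromElements(1, 2, 3)
                .map(x -> x * 2)
                .print();

        env.execute("register-user-jar-test");
    }
}
{code}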

  was:
 I see some use cases in which people want to implement their own SQL applications based on loading external jars. The related API proposals have been raised in task FLINK-10232 (Add a SQL DDL), and the related sub-task FLINK-14055 is unresolved and still open.

I feel it's better to split task FLINK-14055 into two goals: one for the DDL, and a new task for the \{Stream}ExecutionEnvironment::registerUserJarFile() interface, which will be addressed in this issue.

Here is the plan.

*Design*
 * Add _void_ _registerUserJarFile(String jarFile)_ into _StreamExecutionEnvironment_ (in module flink-streaming-java). The affected classes are _StreamGraph_, _StreamGraphGenerator_, and _StreamingJobGraphGenerator_, which need to support getting and setting a list of user jars; all of them are in module flink-streaming-java.
 * Add _void_ _registerUserJarFile(String jarFile)_ into _ExecutionEnvironment_ (in module flink-java). The affected class is _Plan_, in module flink-core, which needs to support getting and setting a list of user jars.
 * Add _void addUserJars(List<Path> userJars, JobGraph jobGraph)_ into _JobGraphGenerator_ and call it within the method _JobGraph compileJobGraph(OptimizedPlan program, JobID jobId)_ so that user jars can be shipped with the user's program and submitted to the cluster. _JobGraphGenerator_ is in module flink-optimizer.
 * Add _void_ _registerUserJarFile(String jarFile)_ into \{Stream}ExecutionEnvironment (in modules flink-scala and flink-streaming-scala), delegating registration to the wrapped _javaEnv_.

*Testing*
 * One test case for adding local user jars in both streaming and batch jobs. We need to package the test classes into a jar before testing; for this purpose, we can add a goal bound to the process-test-classes phase in the pom file. The affected module is flink-tests.
 * Another test case for adding user jars stored in HDFS, following the same idea as the previous one. The affected module is flink-fs-tests.
 * Note that the Python API is not included in this issue, just as with registering cached files. However, we still need to modify some Python test cases to avoid build errors caused by the new methods being declared in Java but missing from Python. The affected files are _flink-python/pyflink/dataset/tests/test_execution_environment_completeness.py_ and _flink-python/pyflink/datastream/tests/test_stream_execution_environment_completeness.py_.


> Register user jar files in {Stream}ExecutionEnvironment 
> --------------------------------------------------------
>
>                 Key: FLINK-14319
>                 URL: https://issues.apache.org/jira/browse/FLINK-14319
>             Project: Flink
>          Issue Type: New Feature
>          Components: API / DataSet, API / DataStream
>            Reporter: Leo Zhang
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)