Posted to issues@flink.apache.org by "jiangxin369 (via GitHub)" <gi...@apache.org> on 2023/04/23 11:50:40 UTC

[GitHub] [flink-ml] jiangxin369 opened a new pull request, #234: [FLINK-31887] Upgrade Flink version of Flink ML to 1.16.1

jiangxin369 opened a new pull request, #234:
URL: https://github.com/apache/flink-ml/pull/234

   
   ## What is the purpose of the change
   
   Upgrade Flink version of Flink ML to 1.16.1
   
   ## Brief change log
   
   Upgrade the Flink version of Flink ML to 1.16.1. The main compatibility-related changes are:
     - 1.16 adds a `createdKVStates` field to KeyedStateBackend to store states. It must be cleared in per-round iterations; otherwise tests such as `PerRoundOperatorStateTest#testStateIsolationWithHashMapKeyedStateBackend` fail.
     - 1.16 removes the `getNonChainedOutputs` API from `StreamConfig`, which was used when creating wrapped operator configs.
     - 1.16 changes some interfaces of `OperatorCoordinator`, so `HeadOperatorCoordinator` has to be re-implemented.
     - The `setXXX` methods in `StreamConfig` are now lazy, so serialization must be triggered eagerly when creating wrapped operator configs.
     - 1.16 no longer supports `table.as("colA, colB")`; use `table.as("colA", "colB")` instead.
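
For the last item, call sites that still hold a single comma-separated field string can be migrated by splitting it into individual names first. The following is a minimal sketch, not code from this pull request: `FieldNames` and `split` are hypothetical names, and the comment about passing the result on assumes the `as(String, String...)` overload of the Table API.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of Flink or Flink ML. */
final class FieldNames {
    private FieldNames() {}

    /** Splits a legacy field list such as "colA, colB" into trimmed, non-empty names. */
    static String[] split(String commaSeparated) {
        return Arrays.stream(commaSeparated.split(","))
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] names = split("colA, colB");
        // Assumption: with Flink on the classpath, the names could then be passed as
        // table.as(names[0], Arrays.copyOfRange(names, 1, names.length)).
        System.out.println(Arrays.toString(names));
    }
}
```

Where the column names are literals, writing `table.as("colA", "colB")` directly is simpler; the helper only matters when the combined string comes from configuration or user input.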
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (no)
     - If yes, how is the feature documented? (not applicable)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-ml] jiangxin369 commented on pull request #234: [FLINK-31887] Upgrade Flink version of Flink ML to 1.16.1

Posted by "jiangxin369 (via GitHub)" <gi...@apache.org>.
jiangxin369 commented on PR #234:
URL: https://github.com/apache/flink-ml/pull/234#issuecomment-1521110530

   @lindong28 Could you have another look?




[GitHub] [flink-ml] jiangxin369 commented on pull request #234: [FLINK-31887] Upgrade Flink version of Flink ML to 1.16.1

Posted by "jiangxin369 (via GitHub)" <gi...@apache.org>.
jiangxin369 commented on PR #234:
URL: https://github.com/apache/flink-ml/pull/234#issuecomment-1519307669

   @lindong28 Could you have a look at this PR?




[GitHub] [flink-ml] lindong28 merged pull request #234: [FLINK-31887] Upgrade Flink version of Flink ML to 1.16.1

Posted by "lindong28 (via GitHub)" <gi...@apache.org>.
lindong28 merged PR #234:
URL: https://github.com/apache/flink-ml/pull/234




[GitHub] [flink-ml] lindong28 commented on a diff in pull request #234: [FLINK-31887] Upgrade Flink version of Flink ML to 1.16.1

Posted by "lindong28 (via GitHub)" <gi...@apache.org>.
lindong28 commented on code in PR #234:
URL: https://github.com/apache/flink-ml/pull/234#discussion_r1174844283


##########
flink-ml-iteration/src/test/java/org/apache/flink/iteration/operator/HeadOperatorTest.java:
##########
@@ -318,6 +322,10 @@ public void testPostponeGloballyAlignedEventsAfterSnapshot() throws Exception {
                                                     OperatorUtils.getUniqueSenderId(operatorId, 0)),
                                             0)),
                             new ArrayList<>(harness.getOutput()));
+                    // TODO There might be a potential bug makes the testing hang, which could be

Review Comment:
   The root cause is that `RecordingHeadOperatorFactory.latestHeadOperator` is not properly closed. The issue can be fixed by adding the following code to `HeadOperatorTest#createHarnessAndRun`:
   
   ```
   try {
       return runnable.apply(harness);
   } finally {
       RecordingHeadOperatorFactory.latestHeadOperator.close();
   }
   ```
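
The effect of the `try`/`finally` suggestion can be seen in isolation with a small stand-alone sketch. Everything here is a stand-in chosen for illustration: `RecordingResource` plays the role of the recorded head operator and `runAndClose` the role of `createHarnessAndRun`; neither is taken from the Flink ML code base.

```java
import java.util.function.Function;

final class CloseInFinally {
    /** Hypothetical stand-in for the operator that the test factory records. */
    static final class RecordingResource implements AutoCloseable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Runs the body and closes the resource whether the body returns or throws. */
    static <T> T runAndClose(RecordingResource resource, Function<RecordingResource, T> body) {
        try {
            return body.apply(resource);
        } finally {
            resource.close();
        }
    }

    public static void main(String[] args) {
        RecordingResource ok = new RecordingResource();
        String result = runAndClose(ok, r -> "done");
        System.out.println(result + ", closed=" + ok.closed);

        RecordingResource failing = new RecordingResource();
        try {
            runAndClose(failing, r -> {
                throw new IllegalStateException("boom");
            });
        } catch (IllegalStateException e) {
            // The finally block has already run by the time the exception arrives here.
            System.out.println("closed after failure=" + failing.closed);
        }
    }
}
```

The point of the pattern is that cleanup no longer depends on the test body finishing normally, which is what a leaked operator in a hanging test would need.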
   
   
   



##########
flink-ml-iteration/src/main/java/org/apache/flink/iteration/operator/OperatorUtils.java:
##########
@@ -136,31 +135,20 @@ public static StreamConfig createWrappedOperatorConfig(StreamConfig config, Clas
         wrappedConfig.setTypeSerializerOut(
                 ((IterationRecordSerializer<?>) typeSerializerOut).getInnerSerializer());
 
-        Stream.concat(
-                        config.getChainedOutputs(cl).stream(),
-                        config.getNonChainedOutputs(cl).stream())
+        config.getChainedOutputs(cl)
                 .forEach(
-                        edge -> {
-                            OutputTag<?> outputTag = edge.getOutputTag();
-                            if (outputTag != null) {
-                                TypeSerializer<?> typeSerializerSideOut =
-                                        config.getTypeSerializerSideOut(outputTag, cl);
-                                checkState(
-                                        typeSerializerSideOut instanceof IterationRecordSerializer,
-                                        "The serializer of side output with tag[%s] should be IterationRecordSerializer but it is %s.",
-                                        outputTag,
-                                        typeSerializerSideOut);
-                                wrappedConfig.setTypeSerializerSideOut(
-                                        new OutputTag<>(
-                                                outputTag.getId(),
-                                                ((IterationRecordTypeInfo<?>)
-                                                                outputTag.getTypeInfo())
-                                                        .getInnerTypeInfo()),
-                                        ((IterationRecordSerializer) typeSerializerSideOut)
-                                                .getInnerSerializer());
-                            }
+                        chainedOutput -> {
+                            OutputTag<?> outputTag = chainedOutput.getOutputTag();
+                            setTypeSerializerSideOut(outputTag, config, wrappedConfig, cl);
+                        });
+        config.getOperatorNonChainedOutputs(cl)

Review Comment:
   Would it be simpler to keep the previous style and do something like this?
   
   ```
   Stream.concat(
           config.getChainedOutputs(cl).stream(),
           config.getOperatorNonChainedOutputs(cl).stream())
           .forEach(
                   output -> {
                       OutputTag<?> outputTag = output.getOutputTag();
                       setTypeSerializerSideOut(outputTag, config, wrappedConfig, cl);
                   }
           );
   ```





[GitHub] [flink-ml] jiangxin369 commented on a diff in pull request #234: [FLINK-31887] Upgrade Flink version of Flink ML to 1.16.1

Posted by "jiangxin369 (via GitHub)" <gi...@apache.org>.
jiangxin369 commented on code in PR #234:
URL: https://github.com/apache/flink-ml/pull/234#discussion_r1175973954


##########
flink-ml-iteration/src/main/java/org/apache/flink/iteration/operator/OperatorUtils.java:
##########
@@ -136,31 +135,20 @@ public static StreamConfig createWrappedOperatorConfig(StreamConfig config, Clas
         wrappedConfig.setTypeSerializerOut(
                 ((IterationRecordSerializer<?>) typeSerializerOut).getInnerSerializer());
 
-        Stream.concat(
-                        config.getChainedOutputs(cl).stream(),
-                        config.getNonChainedOutputs(cl).stream())
+        config.getChainedOutputs(cl)
                 .forEach(
-                        edge -> {
-                            OutputTag<?> outputTag = edge.getOutputTag();
-                            if (outputTag != null) {
-                                TypeSerializer<?> typeSerializerSideOut =
-                                        config.getTypeSerializerSideOut(outputTag, cl);
-                                checkState(
-                                        typeSerializerSideOut instanceof IterationRecordSerializer,
-                                        "The serializer of side output with tag[%s] should be IterationRecordSerializer but it is %s.",
-                                        outputTag,
-                                        typeSerializerSideOut);
-                                wrappedConfig.setTypeSerializerSideOut(
-                                        new OutputTag<>(
-                                                outputTag.getId(),
-                                                ((IterationRecordTypeInfo<?>)
-                                                                outputTag.getTypeInfo())
-                                                        .getInnerTypeInfo()),
-                                        ((IterationRecordSerializer) typeSerializerSideOut)
-                                                .getInnerSerializer());
-                            }
+                        chainedOutput -> {
+                            OutputTag<?> outputTag = chainedOutput.getOutputTag();
+                            setTypeSerializerSideOut(outputTag, config, wrappedConfig, cl);
+                        });
+        config.getOperatorNonChainedOutputs(cl)

Review Comment:
   I separated these two parts because `config.getChainedOutputs(cl)` returns `List<StreamEdge>` while `config.getOperatorNonChainedOutputs(cl)` returns `List<NonChainedOutput>`. The element types differ, so the two lists are not suitable for iterating in the same loop.
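
One possible middle ground between the two positions is to map each list to its output tag before concatenating, so the combined stream has a single element type. The sketch below only illustrates that idea: `ChainedEdge` and `NonChained` are simplified stand-ins for `StreamEdge` and `NonChainedOutput`, tags are plain strings rather than `OutputTag<?>`, and the null filter mirrors the `outputTag != null` check in the removed code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class SideOutputTags {
    /** Simplified stand-in for a chained output edge; NOT Flink's StreamEdge. */
    static final class ChainedEdge {
        private final String outputTag;

        ChainedEdge(String outputTag) {
            this.outputTag = outputTag;
        }

        String getOutputTag() {
            return outputTag;
        }
    }

    /** Simplified stand-in for a non-chained output; NOT Flink's NonChainedOutput. */
    static final class NonChained {
        private final String outputTag;

        NonChained(String outputTag) {
            this.outputTag = outputTag;
        }

        String getOutputTag() {
            return outputTag;
        }
    }

    /** Maps both lists to their tags first, then concatenates the two tag streams. */
    static List<String> collectTags(List<ChainedEdge> chained, List<NonChained> nonChained) {
        return Stream.concat(
                        chained.stream().map(ChainedEdge::getOutputTag),
                        nonChained.stream().map(NonChained::getOutputTag))
                .filter(Objects::nonNull) // main outputs carry no side-output tag
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tags =
                collectTags(
                        Arrays.asList(new ChainedEdge("late"), new ChainedEdge(null)),
                        Arrays.asList(new NonChained("errors")));
        System.out.println(tags);
    }
}
```

Whether this reads better than two separate loops that share a helper is a judgment call; both avoid iterating over a stream of mixed element types.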





[GitHub] [flink-ml] lindong28 commented on pull request #234: [FLINK-31887] Upgrade Flink version of Flink ML to 1.16.1

Posted by "lindong28 (via GitHub)" <gi...@apache.org>.
lindong28 commented on PR #234:
URL: https://github.com/apache/flink-ml/pull/234#issuecomment-1521123703

   Thanks for the update. LGTM.

