You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/01/13 13:36:19 UTC

[GitHub] [flink] twalthr opened a new pull request #18352: [FLINK-15585][table] Improve function identifier string in plan digest

twalthr opened a new pull request #18352:
URL: https://github.com/apache/flink/pull/18352


   ## What is the purpose of the change
   
   This improves the function identifier string for anonymous/inline functions. The new identifier has the following characteristics:
   
   - If a function is not stateful, the identifier contains only the class name. No custom hash.
   - The anonymous/inline function nature is visible via `*...*` similar to structured types.
   - We use the pure class name to find classes easily (previously the information about inner classes was lost via `$` instead of `.`).
   
   
   ## Brief change log
   
   - Improved `functionIdentifier()`
   - Updates plans.
   
   ## Verifying this change
   
   This change added tests and can be verified as follows: `UserFunctionHelperTest`
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: yes
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
     - If yes, how is the feature documented? JavaDocs
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18352: [FLINK-15585][table] Improve function identifier string in plan digest

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18352:
URL: https://github.com/apache/flink/pull/18352#issuecomment-1012981301


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "aab8d5779881c1c1575c9c15e263e28fce50a215",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29419",
       "triggerID" : "aab8d5779881c1c1575c9c15e263e28fce50a215",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * aab8d5779881c1c1575c9c15e263e28fce50a215 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29419) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] twalthr commented on a change in pull request #18352: [FLINK-15585][table] Improve function identifier string in plan digest

Posted by GitBox <gi...@apache.org>.
twalthr commented on a change in pull request #18352:
URL: https://github.com/apache/flink/pull/18352#discussion_r784160659



##########
File path: flink-table/flink-table-common/src/test/java/org/apache/flink/table/functions/UserDefinedFunctionHelperTest.java
##########
@@ -143,34 +162,27 @@ private void testErrorMessage() {
 
         @Nullable String expectedErrorMessage;
 
-        TestSpec(Class<? extends UserDefinedFunction> functionClass) {
+        TestSpec(
+                @Nullable Class<? extends UserDefinedFunction> functionClass,
+                @Nullable UserDefinedFunction functionInstance,
+                @Nullable CatalogFunction catalogFunction) {
             this.functionClass = functionClass;
-            this.functionInstance = null;
-            this.catalogFunction = null;
-        }
-
-        TestSpec(UserDefinedFunction functionInstance) {
-            this.functionClass = null;
             this.functionInstance = functionInstance;
-            this.catalogFunction = null;
-        }
-
-        TestSpec(CatalogFunction catalogFunction) {
-            this.functionClass = null;
-            this.functionInstance = null;
             this.catalogFunction = catalogFunction;
         }
 
         static TestSpec forClass(Class<? extends UserDefinedFunction> function) {
-            return new TestSpec(function);
+            return new TestSpec(function, null, null);
         }
 
         static TestSpec forInstance(UserDefinedFunction function) {
-            return new TestSpec(function);
+            return new TestSpec(null, function, null);
         }
 
-        static TestSpec forCatalogFunction(String className, FunctionLanguage language) {
-            return new TestSpec(new CatalogFunctionMock(className, language));
+        static TestSpec forCatalogFunction(
+                String className,
+                @SuppressWarnings("SameParameterValue") FunctionLanguage language) {

Review comment:
       You are right. I will simply always choose Java for now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18352: [FLINK-15585][table] Improve function identifier string in plan digest

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18352:
URL: https://github.com/apache/flink/pull/18352#issuecomment-1012981301


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "aab8d5779881c1c1575c9c15e263e28fce50a215",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29419",
       "triggerID" : "aab8d5779881c1c1575c9c15e263e28fce50a215",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb360bd849dc4f333a89afc6c84e2aaeb8ed004d",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29546",
       "triggerID" : "cb360bd849dc4f333a89afc6c84e2aaeb8ed004d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9a722a403d015ecb0d161d377ee6421f1ada63b7",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29570",
       "triggerID" : "9a722a403d015ecb0d161d377ee6421f1ada63b7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9a722a403d015ecb0d161d377ee6421f1ada63b7 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29570) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] twalthr closed pull request #18352: [FLINK-15585][table] Improve function identifier string in plan digest

Posted by GitBox <gi...@apache.org>.
twalthr closed pull request #18352:
URL: https://github.com/apache/flink/pull/18352


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18352: [FLINK-15585][table] Improve function identifier string in plan digest

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18352:
URL: https://github.com/apache/flink/pull/18352#issuecomment-1012981301


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "aab8d5779881c1c1575c9c15e263e28fce50a215",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29419",
       "triggerID" : "aab8d5779881c1c1575c9c15e263e28fce50a215",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb360bd849dc4f333a89afc6c84e2aaeb8ed004d",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29546",
       "triggerID" : "cb360bd849dc4f333a89afc6c84e2aaeb8ed004d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9a722a403d015ecb0d161d377ee6421f1ada63b7",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29570",
       "triggerID" : "9a722a403d015ecb0d161d377ee6421f1ada63b7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * aab8d5779881c1c1575c9c15e263e28fce50a215 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29419) 
   * cb360bd849dc4f333a89afc6c84e2aaeb8ed004d Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29546) 
   * 9a722a403d015ecb0d161d377ee6421f1ada63b7 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29570) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18352: [FLINK-15585][table] Improve function identifier string in plan digest

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18352:
URL: https://github.com/apache/flink/pull/18352#issuecomment-1012981301


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "aab8d5779881c1c1575c9c15e263e28fce50a215",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29419",
       "triggerID" : "aab8d5779881c1c1575c9c15e263e28fce50a215",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb360bd849dc4f333a89afc6c84e2aaeb8ed004d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "cb360bd849dc4f333a89afc6c84e2aaeb8ed004d",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * aab8d5779881c1c1575c9c15e263e28fce50a215 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29419) 
   * cb360bd849dc4f333a89afc6c84e2aaeb8ed004d UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #18352: [FLINK-15585][table] Improve function identifier string in plan digest

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #18352:
URL: https://github.com/apache/flink/pull/18352#issuecomment-1012146727


   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit 78aed1235a199cd9e47fc5d87aeeeffec5e17ffc (Thu Jan 13 13:40:59 UTC 2022)
   
    ✅no warnings
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.<details>
    The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18352: [FLINK-15585][table] Improve function identifier string in plan digest

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18352:
URL: https://github.com/apache/flink/pull/18352#issuecomment-1012981301


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "aab8d5779881c1c1575c9c15e263e28fce50a215",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29419",
       "triggerID" : "aab8d5779881c1c1575c9c15e263e28fce50a215",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb360bd849dc4f333a89afc6c84e2aaeb8ed004d",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29546",
       "triggerID" : "cb360bd849dc4f333a89afc6c84e2aaeb8ed004d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9a722a403d015ecb0d161d377ee6421f1ada63b7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "9a722a403d015ecb0d161d377ee6421f1ada63b7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * aab8d5779881c1c1575c9c15e263e28fce50a215 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29419) 
   * cb360bd849dc4f333a89afc6c84e2aaeb8ed004d Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29546) 
   * 9a722a403d015ecb0d161d377ee6421f1ada63b7 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18352: [FLINK-15585][table] Improve function identifier string in plan digest

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18352:
URL: https://github.com/apache/flink/pull/18352#issuecomment-1012981301


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "aab8d5779881c1c1575c9c15e263e28fce50a215",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29419",
       "triggerID" : "aab8d5779881c1c1575c9c15e263e28fce50a215",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb360bd849dc4f333a89afc6c84e2aaeb8ed004d",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29546",
       "triggerID" : "cb360bd849dc4f333a89afc6c84e2aaeb8ed004d",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * aab8d5779881c1c1575c9c15e263e28fce50a215 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29419) 
   * cb360bd849dc4f333a89afc6c84e2aaeb8ed004d Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29546) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] twalthr commented on a change in pull request #18352: [FLINK-15585][table] Improve function identifier string in plan digest

Posted by GitBox <gi...@apache.org>.
twalthr commented on a change in pull request #18352:
URL: https://github.com/apache/flink/pull/18352#discussion_r784159595



##########
File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/UserDefinedFunctionHelper.java
##########
@@ -243,6 +244,36 @@ public static void prepareInstance(ReadableConfig config, UserDefinedFunction fu
         cleanFunction(config, function);
     }
 
+    /**
+     * Returns whether a {@link UserDefinedFunction} can be easily serialized and identified by only
+     * a fully qualified class name. It must have a default constructor and no serializable fields.
+     */
+    public static boolean isClassNameSerializable(UserDefinedFunction function) {
+        final Class<?> functionClass = function.getClass();
+        if (!InstantiationUtil.hasPublicNullaryConstructor(functionClass)) {
+            // function must be parameterized
+            return false;
+        }
+        Class<?> currentClass = functionClass;
+        while (!currentClass.equals(UserDefinedFunction.class)) {
+            for (Field field : currentClass.getDeclaredFields()) {
+                if (!Modifier.isTransient(field.getModifiers())
+                        && !Modifier.isStatic(field.getModifiers())) {
+                    // function seems to be stateful
+                    return false;
+                }
+            }
+            currentClass = currentClass.getSuperclass();
+        }

Review comment:
       Even with a marker interface you need perform this check. User-defined function might come in various flavors. And the more interfaces, the harder it gets to get the job done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] slinkydeveloper commented on a change in pull request #18352: [FLINK-15585][table] Improve function identifier string in plan digest

Posted by GitBox <gi...@apache.org>.
slinkydeveloper commented on a change in pull request #18352:
URL: https://github.com/apache/flink/pull/18352#discussion_r784638813



##########
File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/UserDefinedFunctionHelper.java
##########
@@ -243,6 +244,36 @@ public static void prepareInstance(ReadableConfig config, UserDefinedFunction fu
         cleanFunction(config, function);
     }
 
+    /**
+     * Returns whether a {@link UserDefinedFunction} can be easily serialized and identified by only
+     * a fully qualified class name. It must have a default constructor and no serializable fields.
+     */
+    public static boolean isClassNameSerializable(UserDefinedFunction function) {
+        final Class<?> functionClass = function.getClass();
+        if (!InstantiationUtil.hasPublicNullaryConstructor(functionClass)) {
+            // function must be parameterized
+            return false;
+        }
+        Class<?> currentClass = functionClass;
+        while (!currentClass.equals(UserDefinedFunction.class)) {
+            for (Field field : currentClass.getDeclaredFields()) {
+                if (!Modifier.isTransient(field.getModifiers())
+                        && !Modifier.isStatic(field.getModifiers())) {
+                    // function seems to be stateful
+                    return false;
+                }
+            }
+            currentClass = currentClass.getSuperclass();
+        }

Review comment:
       It is, but on the other hand this is very magic and very hard as a behavior to predict for the users. I would rather have this kind of logic enabled behind a flag and default to "if you wanna implement a stateful function, mark it as such with this interface"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] slinkydeveloper commented on a change in pull request #18352: [FLINK-15585][table] Improve function identifier string in plan digest

Posted by GitBox <gi...@apache.org>.
slinkydeveloper commented on a change in pull request #18352:
URL: https://github.com/apache/flink/pull/18352#discussion_r784660032



##########
File path: docs/content/docs/dev/table/functions/udfs.md
##########
@@ -240,7 +240,9 @@ An implementation class must extend from one of the available base classes (e.g.
 
 The class must be declared `public`, not `abstract`, and should be globally accessible. Thus, non-static inner or anonymous classes are not allowed.
 
-For storing a user-defined function in a persistent catalog, the class must have a default constructor and must be instantiable during runtime.
+For storing a user-defined function in a persistent catalog, the class must have a default constructor
+and must be instantiable during runtime. Anonymous functions in Table API can only be persisted if the
+function is not stateful (i.e. no non-transient fields).

Review comment:
       and non static.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] twalthr commented on a change in pull request #18352: [FLINK-15585][table] Improve function identifier string in plan digest

Posted by GitBox <gi...@apache.org>.
twalthr commented on a change in pull request #18352:
URL: https://github.com/apache/flink/pull/18352#discussion_r784653224



##########
File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/UserDefinedFunctionHelper.java
##########
@@ -243,6 +244,36 @@ public static void prepareInstance(ReadableConfig config, UserDefinedFunction fu
         cleanFunction(config, function);
     }
 
+    /**
+     * Returns whether a {@link UserDefinedFunction} can be easily serialized and identified by only
+     * a fully qualified class name. It must have a default constructor and no serializable fields.
+     */
+    public static boolean isClassNameSerializable(UserDefinedFunction function) {
+        final Class<?> functionClass = function.getClass();
+        if (!InstantiationUtil.hasPublicNullaryConstructor(functionClass)) {
+            // function must be parameterized
+            return false;
+        }
+        Class<?> currentClass = functionClass;
+        while (!currentClass.equals(UserDefinedFunction.class)) {
+            for (Field field : currentClass.getDeclaredFields()) {
+                if (!Modifier.isTransient(field.getModifiers())
+                        && !Modifier.isStatic(field.getModifiers())) {
+                    // function seems to be stateful
+                    return false;
+                }
+            }
+            currentClass = currentClass.getSuperclass();
+        }

Review comment:
       I'm also against magic but in this case the magic is minimal. It only influences a plan digest and makes it more deterministic. Most users don't parameterize their functions anyway and/or register it under a name which makes them non-anonymous anyway.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18352: [FLINK-15585][table] Improve function identifier string in plan digest

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18352:
URL: https://github.com/apache/flink/pull/18352#issuecomment-1012981301


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "aab8d5779881c1c1575c9c15e263e28fce50a215",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29419",
       "triggerID" : "aab8d5779881c1c1575c9c15e263e28fce50a215",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * aab8d5779881c1c1575c9c15e263e28fce50a215 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29419) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] twalthr commented on a change in pull request #18352: [FLINK-15585][table] Improve function identifier string in plan digest

Posted by GitBox <gi...@apache.org>.
twalthr commented on a change in pull request #18352:
URL: https://github.com/apache/flink/pull/18352#discussion_r784159595



##########
File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/UserDefinedFunctionHelper.java
##########
@@ -243,6 +244,36 @@ public static void prepareInstance(ReadableConfig config, UserDefinedFunction fu
         cleanFunction(config, function);
     }
 
+    /**
+     * Returns whether a {@link UserDefinedFunction} can be easily serialized and identified by only
+     * a fully qualified class name. It must have a default constructor and no serializable fields.
+     */
+    public static boolean isClassNameSerializable(UserDefinedFunction function) {
+        final Class<?> functionClass = function.getClass();
+        if (!InstantiationUtil.hasPublicNullaryConstructor(functionClass)) {
+            // function must be parameterized
+            return false;
+        }
+        Class<?> currentClass = functionClass;
+        while (!currentClass.equals(UserDefinedFunction.class)) {
+            for (Field field : currentClass.getDeclaredFields()) {
+                if (!Modifier.isTransient(field.getModifiers())
+                        && !Modifier.isStatic(field.getModifiers())) {
+                    // function seems to be stateful
+                    return false;
+                }
+            }
+            currentClass = currentClass.getSuperclass();
+        }

Review comment:
       Even with a marker interface you need to perform this check. User-defined function might come in various flavors. And the more interfaces, the harder it gets to get the job done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] slinkydeveloper commented on a change in pull request #18352: [FLINK-15585][table] Improve function identifier string in plan digest

Posted by GitBox <gi...@apache.org>.
slinkydeveloper commented on a change in pull request #18352:
URL: https://github.com/apache/flink/pull/18352#discussion_r784104335



##########
File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/UserDefinedFunctionHelper.java
##########
@@ -243,6 +244,36 @@ public static void prepareInstance(ReadableConfig config, UserDefinedFunction fu
         cleanFunction(config, function);
     }
 
+    /**
+     * Returns whether a {@link UserDefinedFunction} can be easily serialized and identified by only
+     * a fully qualified class name. It must have a default constructor and no serializable fields.
+     */
+    public static boolean isClassNameSerializable(UserDefinedFunction function) {
+        final Class<?> functionClass = function.getClass();
+        if (!InstantiationUtil.hasPublicNullaryConstructor(functionClass)) {
+            // function must be parameterized
+            return false;
+        }
+        Class<?> currentClass = functionClass;
+        while (!currentClass.equals(UserDefinedFunction.class)) {
+            for (Field field : currentClass.getDeclaredFields()) {
+                if (!Modifier.isTransient(field.getModifiers())
+                        && !Modifier.isStatic(field.getModifiers())) {
+                    // function seems to be stateful
+                    return false;
+                }
+            }
+            currentClass = currentClass.getSuperclass();
+        }

Review comment:
       That sounds like a very magic way to find out whether a function is stateful or not. Don't we have a marker interface for that?

##########
File path: flink-table/flink-table-common/src/test/java/org/apache/flink/table/functions/UserDefinedFunctionHelperTest.java
##########
@@ -143,34 +162,27 @@ private void testErrorMessage() {
 
         @Nullable String expectedErrorMessage;
 
-        TestSpec(Class<? extends UserDefinedFunction> functionClass) {
+        TestSpec(
+                @Nullable Class<? extends UserDefinedFunction> functionClass,
+                @Nullable UserDefinedFunction functionInstance,
+                @Nullable CatalogFunction catalogFunction) {
             this.functionClass = functionClass;
-            this.functionInstance = null;
-            this.catalogFunction = null;
-        }
-
-        TestSpec(UserDefinedFunction functionInstance) {
-            this.functionClass = null;
             this.functionInstance = functionInstance;
-            this.catalogFunction = null;
-        }
-
-        TestSpec(CatalogFunction catalogFunction) {
-            this.functionClass = null;
-            this.functionInstance = null;
             this.catalogFunction = catalogFunction;
         }
 
         static TestSpec forClass(Class<? extends UserDefinedFunction> function) {
-            return new TestSpec(function);
+            return new TestSpec(function, null, null);
         }
 
         static TestSpec forInstance(UserDefinedFunction function) {
-            return new TestSpec(function);
+            return new TestSpec(null, function, null);
         }
 
-        static TestSpec forCatalogFunction(String className, FunctionLanguage language) {
-            return new TestSpec(new CatalogFunctionMock(className, language));
+        static TestSpec forCatalogFunction(
+                String className,
+                @SuppressWarnings("SameParameterValue") FunctionLanguage language) {

Review comment:
       What warning is this one?

##########
File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/UserDefinedFunction.java
##########
@@ -58,9 +60,13 @@
 
     /** Returns a unique, serialized representation for this function. */
     public final String functionIdentifier() {
+        final String className = getClass().getName();
+        if (isClassNameSerializable(this)) {
+            return className;
+        }
         final String md5 =
                 EncodingUtils.hex(EncodingUtils.md5(EncodingUtils.encodeObjectToString(this)));
-        return getClass().getName().replace('.', '$').concat("$").concat(md5);
+        return className.concat("$").concat(md5);

Review comment:
       Does this logic works when the function is implemented as a scala `object`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] twalthr commented on a change in pull request #18352: [FLINK-15585][table] Improve function identifier string in plan digest

Posted by GitBox <gi...@apache.org>.
twalthr commented on a change in pull request #18352:
URL: https://github.com/apache/flink/pull/18352#discussion_r784649660



##########
File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/UserDefinedFunction.java
##########
@@ -58,9 +60,13 @@
 
     /** Returns a unique, serialized representation for this function. */
     public final String functionIdentifier() {
+        final String className = getClass().getName();
+        if (isClassNameSerializable(this)) {
+            return className;
+        }
         final String md5 =
                 EncodingUtils.hex(EncodingUtils.md5(EncodingUtils.encodeObjectToString(this)));
-        return getClass().getName().replace('.', '$').concat("$").concat(md5);
+        return className.concat("$").concat(md5);

Review comment:
       Yes, you can also find them in the updated plans with a `$` at the end. E.g.:
   `org.apache.flink.table.planner.plan.batch.table.CalcTest$giveMeCaseClass$`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] twalthr commented on pull request #18352: [FLINK-15585][table] Improve function identifier string in plan digest

Posted by GitBox <gi...@apache.org>.
twalthr commented on pull request #18352:
URL: https://github.com/apache/flink/pull/18352#issuecomment-1012916947


   @slinkydeveloper I addressed the test feedback. Please have another look.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] twalthr commented on a change in pull request #18352: [FLINK-15585][table] Improve function identifier string in plan digest

Posted by GitBox <gi...@apache.org>.
twalthr commented on a change in pull request #18352:
URL: https://github.com/apache/flink/pull/18352#discussion_r784680356



##########
File path: docs/content/docs/dev/table/functions/udfs.md
##########
@@ -240,7 +240,9 @@ An implementation class must extend from one of the available base classes (e.g.
 
 The class must be declared `public`, not `abstract`, and should be globally accessible. Thus, non-static inner or anonymous classes are not allowed.
 
-For storing a user-defined function in a persistent catalog, the class must have a default constructor and must be instantiable during runtime.
+For storing a user-defined function in a persistent catalog, the class must have a default constructor
+and must be instantiable during runtime. Anonymous functions in Table API can only be persisted if the
+function is not stateful (i.e. no non-transient fields).

Review comment:
       updated




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #18352: [FLINK-15585][table] Improve function identifier string in plan digest

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #18352:
URL: https://github.com/apache/flink/pull/18352#issuecomment-1012981301


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "aab8d5779881c1c1575c9c15e263e28fce50a215",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "aab8d5779881c1c1575c9c15e263e28fce50a215",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * aab8d5779881c1c1575c9c15e263e28fce50a215 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #18352: [FLINK-15585][table] Improve function identifier string in plan digest

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #18352:
URL: https://github.com/apache/flink/pull/18352#issuecomment-1012981301


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "aab8d5779881c1c1575c9c15e263e28fce50a215",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29419",
       "triggerID" : "aab8d5779881c1c1575c9c15e263e28fce50a215",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cb360bd849dc4f333a89afc6c84e2aaeb8ed004d",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29546",
       "triggerID" : "cb360bd849dc4f333a89afc6c84e2aaeb8ed004d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9a722a403d015ecb0d161d377ee6421f1ada63b7",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29570",
       "triggerID" : "9a722a403d015ecb0d161d377ee6421f1ada63b7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * cb360bd849dc4f333a89afc6c84e2aaeb8ed004d Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29546) 
   * 9a722a403d015ecb0d161d377ee6421f1ada63b7 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=29570) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org