You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Daniel Becker (Code Review)" <ge...@cloudera.org> on 2022/01/17 15:57:08 UTC

[Impala-ASF-CR] IMPALA-10997: Refactor Java Hive UDF code.

Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/18020 )

Change subject: IMPALA-10997: Refactor Java Hive UDF code.
......................................................................


Patch Set 10:

(44 comments)

Thanks, this will make it easier to read Java UDF code.

http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveJavaClass.java
File fe/src/main/java/org/apache/impala/hive/executor/HiveJavaClass.java:

http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveJavaClass.java@37
PS10, Line 37: /**
             :  * Generate a Class object based on a class name and jar. Throws
             :  * CatalogException if class could not be retrieved from the Jar.
             :  */
I think this refers to one of the functions, it is not a class comment. That said, a comment describing the whole class would be great.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveJavaClass.java@53
PS10, Line 53: persistLoader_
We only use this in the constructor so it doesn't need to be a member.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveJavaClass.java@55
PS10, Line 55: isClassLoaderClosed
Nit: this should also and in a _, so 'isClassLoaderClosed_'.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveJavaClass.java@75
PS10, Line 75: synchronized
Can you add a comment why this method is synchronized? To me it seems what it does wasn't synchronized before this change.

Could you also describe in the class comment how this class should and how it should not be used in a multithreaded environment?


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveJavaClass.java@100
PS10, Line 100: localJarPath
This parameter is not used in the function.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveJavaClass.java@100
PS10, Line 100: getUDFClass
I think 'loadUDFClass' would be a better name. We already have a getter with the name 'getUDFClass' and this function does a different thing.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunction.java
File fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunction.java:

http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunction.java@47
PS10, Line 47: g
Nit: capital G.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunction.java@52
PS10, Line 52: class
Nit: not class but function.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunctionFactory.java
File fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunctionFactory.java:

http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunctionFactory.java@36
PS10, Line 36: Function
Just curious, is there a reason this is not a 'ScalarFunction', especially as in the implementation a precondition requires it to be that?


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunctionFactoryImpl.java
File fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunctionFactoryImpl.java:

http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunctionFactoryImpl.java@81
PS10, Line 81: e
Nit: *


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveLegacyJavaFunction.java
File fe/src/main/java/org/apache/impala/hive/executor/HiveLegacyJavaFunction.java:

http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveLegacyJavaFunction.java@113
PS10, Line 113: getUDFInstance
Could rename to instantiateUDFInstance to distinguish it from the getter 'getUDFInstance'.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveLegacyJavaFunction.java@138
PS10, Line 138: function
"function" doesn't seem to be a variable name or other identifier in this context.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveLegacyJavaFunction.java@184
PS10, Line 184:   private boolean methodMatches(Method m, Type retType,
Could be static.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutor.java
File fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutor.java:

http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutor.java@40
PS10, Line 40: Hive has two ways in which UDFs are supported.
Could you mention the what the two ways are and which of them we support in Impala, in what subclass?


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutor.java@42
PS10, Line 42: @SuppressWarnings("restriction")
Just curious, why do we need to suppress this?


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutor.java@61
PS10, Line 61: This
Nit: these.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutor.java@72
PS10, Line 72: outBufferCapacity_
'outBufferStringCapacity_' would be better. Now it may be confusing as it could also refer to 'outputBufferPtr_'.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutor.java@76
PS10, Line 76: Object
If it is instantiated as a Writable array, can't the type here be Writable[] instead of Object[]?


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutor.java@110
PS10, Line 110: input
Isn't it 'inputObjects_'?


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutor.java@124
PS10, Line 124:   // Sets the result object 'obj' into the outputBufferPtr_
Could mention what the function returns.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutor.java@300
PS10, Line 300:   abstract protected void closeDerived();
Could add a comment to document what it should do.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutor.java@302
PS10, Line 302:   abstract protected Object evaluateDerived(JavaUdfDataType[] argTypes,
Could add a comment to document what it should do.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutor.java@305
PS10, Line 305:   abstract public Method getMethod();
Could add a comment to document what it should do.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorLegacy.java
File fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorLegacy.java:

http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorLegacy.java@43
PS10, Line 43: inputArgs_
This is only used in 'evaluateDerived', so this could be a local variable there.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorLegacy.java@46
PS10, Line 46: UdfExecutor
Nit: HiveUdfExecutorLegacy.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfExecutorLegacy.java@69
PS10, Line 69: Returns 0 if the udf returned NULL. (the result is a ptr so this is okay).
If the result is a pointer, the return type should reflect that and not be Object; but maybe the comment is off?


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/JavaUdfDataType.java
File fe/src/main/java/org/apache/impala/hive/executor/JavaUdfDataType.java:

http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/JavaUdfDataType.java@66
PS10, Line 66: getPrimitiveType
For completeness, I would add a getter for 'description_' too. I know 'toString' returns that but still it feels better to me.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/JavaUdfDataType.java@150
PS10, Line 150: if (javaType == JavaUdfDataType.INVALID_TYPE) continue;
I think it would be nicer if we returned false before the loop if 't' was the INVALID, rather than using continue.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java
File fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java:

http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java@39
PS10, Line 39: public class UdfExecutor {
Could you explain how this class can be used in a multithreaded environment? We have synchronization in 'close'.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java@49
PS10, Line 49:   public UdfExecutor(byte[] thriftParams) throws ImpalaException {
Is it intentional that you removed the function comment? I think there should be one explaining what it does.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java@81
PS10, Line 81: synchronized
Nit: is there any benefit to not making the method synchronized instead?


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java@95
PS10, Line 95: public long evaluateForTesting(Object... args) throws ImpalaRuntimeException {
             :     return hiveUdfExecutor_.evaluateForTesting(args);
             :   }
             : 
             :   /**
             :    * Evalutes the UDF with 'args' as the input to the UDF. This is exposed
             :    * for testing and not the version of evaluate() the backend uses.
             :    */
             :   public long evaluateForTesting() throws ImpalaRuntimeException {
             :     return hiveUdfExecutor_.evaluate();
             :   }
I'm a bit confused about these two 'evaluateForTesting()' functions. The second has no args and does what 'evaluate()' does, so it may not be needed; also, the comment probably belongs to the first one.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/util/FunctionUtils.java
File fe/src/main/java/org/apache/impala/util/FunctionUtils.java:

http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/main/java/org/apache/impala/util/FunctionUtils.java@75
PS10, Line 75:  
Nit: one more space.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/test/java/org/apache/impala/hive/executor/HiveLegacyJavaFunctionTest.java
File fe/src/test/java/org/apache/impala/hive/executor/HiveLegacyJavaFunctionTest.java:

http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/test/java/org/apache/impala/hive/executor/HiveLegacyJavaFunctionTest.java@70
PS10, Line 70:     public boolean matches(ScalarFunction fn) {
Nit: some fields are referenced through this, others not (db and fnName); it would be better to use one style in the function.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/test/java/org/apache/impala/hive/executor/HiveLegacyJavaFunctionTest.java@100
PS10, Line 100: functions
Nit: for me it would be clearer if it was named 'expectedFunctions'.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/test/java/org/apache/impala/hive/executor/HiveLegacyJavaFunctionTest.java@110
PS10, Line 110: /**
              :      * Check if there are any expected signatures that weren't extracted.
              :      */
              :     public void checkMissing(List<ScalarFunction> scalarFns) {
              :       if (scalarFns.size() == functions.size()) {
              :         return;
              :       }
              :       for (ExpectedFunction expectedFunction : functions) {
              :         boolean found = false;
              :         for (ScalarFunction scalarFn : scalarFns) {
              :           if (expectedFunction.matches(scalarFn)) {
              :             found = true;
              :             break;
              :           }
              :         }
              :         if (!found) {
              :           Assert.fail("Expected function not extracted: " + expectedFunction.retType +
              :               " " + expectedFunction.className + "(" +
              :               Joiner.on(",").join(expectedFunction.paramTypes) + ")");
              :         }
              :       }
              :     }
              : 
              :     /**
              :      * Check if the given extracted signature was expected.
              :      */
              :     public void checkContains(ScalarFunction fn) {
              :       for (ExpectedFunction func : functions) {
              :         if (func.matches(fn)) {
              :           return;
              :         }
              :       }
              :       Assert.fail("Extraction failed for " + fn.getFunctionName() +
              :           ", actual function not found: " + fn.getReturnType() + " " +
              :           fn.signatureString());
              :     }
              :   }
We could find missing and unexpected extracted functions in the same method. For example we could take the loop in 'checkMissing()' and when a scalarFn is found to match with an  expectedFunction, we could remove the scalarFn from scalarFns and after the loop, if there is anything left in scalarFns, that is an unexpected extracted function.
(We could of course clone 'scalarFns' at the beginning of the method if we don't want to modify the argument.)


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/test/java/org/apache/impala/hive/executor/HiveLegacyJavaFunctionTest.java@150
PS10, Line 150: ExpectedFunctions expectedFuncs;
              :     Class<?> udfClass;
Nit: where these are not reused, they could be initialised here and not following declaration. Same for the next test.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/test/java/org/apache/impala/hive/executor/HiveLegacyJavaFunctionTest.java@166
PS10, Line 166: extractFailTest
It would be good if the name reflected why extraction is expected to fail, just like in the case of 'extractFailNoEvaluateMethodsTest'.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/test/java/org/apache/impala/hive/executor/HiveLegacyJavaFunctionTest.java@226
PS10, Line 226: "Db"
These string constants could be extracted because they are also used in the ExpectedFunction constructor. But if you think that's too much hassle, you could leave them this way, the tests would fail if they were modified in one place but not the other, so we should notice it.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/test/java/org/apache/impala/hive/executor/TestHiveJavaFunction.java
File fe/src/test/java/org/apache/impala/hive/executor/TestHiveJavaFunction.java:

http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/test/java/org/apache/impala/hive/executor/TestHiveJavaFunction.java@27
PS10, Line 27: TestHiveJavaFunction
As it is only used in connection with TestHiveJavaFunctionFactory, which is only used in 'fe/src/test/java/org/apache/impala/testutil/PlannerTestCaseLoader.java', they both could be converted to static inner classes there so that files don't proliferate in this directory.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/test/java/org/apache/impala/hive/executor/TestHiveJavaFunctionFactory.java
File fe/src/test/java/org/apache/impala/hive/executor/TestHiveJavaFunctionFactory.java:

http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/test/java/org/apache/impala/hive/executor/TestHiveJavaFunctionFactory.java@25
PS10, Line 25: TestHiveJavaFunctionFactory
As it is only used in 'fe/src/test/java/org/apache/impala/testutil/PlannerTestCaseLoader.java', it could be converted to a static inner class there so that files don't proliferate in this directory.


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
File fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java:

http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java@a269
PS10, Line 269: 
              : 
              : 
Is this no longer a possibility?


http://gerrit.cloudera.org:8080/#/c/18020/10/fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java@255
PS10, Line 255: Validates that c.evaluate(args) == retValue.
I know this is not related to this change but if we are here anyway, I think it should be 'c.evaluate(args) == expectedValue', not retValue.

Also, the comment could mention that validation is only done if 'validate' is true.


http://gerrit.cloudera.org:8080/#/c/18020/10/tests/query_test/test_udfs.py
File tests/query_test/test_udfs.py:

http://gerrit.cloudera.org:8080/#/c/18020/10/tests/query_test/test_udfs.py@443
PS10, Line 443:     ex = self.execute_query_expect_failure(self.client, create_fn_stmt)
Do I understand it correctly that after this commit, invalid symbols are detected at UDF creation time and not UDF invokation time?

In this case, the comment on L428 should be updated; also, 'query' (L439) is no longer needed.



-- 
To view, visit http://gerrit.cloudera.org:8080/18020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc9572e15fbed1876412159b99dddd3fb4d37174
Gerrit-Change-Number: 18020
Gerrit-PatchSet: 10
Gerrit-Owner: Steve Carlin <sc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 17 Jan 2022 15:57:08 +0000
Gerrit-HasComments: Yes