You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/12/27 13:47:08 UTC

[GitHub] [spark] HeartSaVioR opened a new pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

HeartSaVioR opened a new pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025
 
 
   ### What changes were proposed in this pull request?
   
   This patch is based on #23921 but revised to be simpler, as well as adds UT to test the behavior.
   
   Spark loads new JARs for `ADD JAR` and `CREATE FUNCTION ... USING JAR` into jar classloader in shared state, and changes current thread's context classloader to jar classloader as many parts of remaining codes rely on current thread's context classloader.
   
   This would work if the further queries will run in same thread and there's no change on context classloader for the thread, but once the context classloader of current thread is switched back by various reason, Spark fails to create instance of class for the function.
   
   This bug mostly affects spark-shell, as spark-shell will roll back current thread's context classloader at every prompt. But it may also affects the case of job-server, where the queries may be running in multiple threads.
   
   This patch fixes the issue via switching the context classloader to the classloader which loads the class. Hopefully FunctionBuilder created by `makeFunctionBuilder` has the information of Class as a part of closure, hence the Class itself can be provided regardless of current thread's context classloader.
   
   ### Why are the changes needed?
   
   Without this patch, end users cannot execute Hive UDF using JAR twice in spark-shell.
   
   ### Does this PR introduce any user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New UT.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569682868
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115953/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r399612923
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
 ##########
 @@ -66,49 +66,52 @@ private[sql] class HiveSessionCatalog(
       name: String,
       clazz: Class[_],
       input: Seq[Expression]): Expression = {
-
-    Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
-      var udfExpr: Option[Expression] = None
-      try {
-        // When we instantiate hive UDF wrapper class, we may throw exception if the input
-        // expressions don't satisfy the hive UDF, such as type mismatch, input number
-        // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
-        if (classOf[UDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[AbstractGenericUDAFResolver].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[UDAF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(
-            name,
-            new HiveFunctionWrapper(clazz.getName),
-            input,
-            isUDAFBridgeRequired = true))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDTF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.asInstanceOf[HiveGenericUDTF].elementSchema // Force it to check data types.
+    // Current thread context classloader may not be the one loaded the class. Need to switch
+    // context classloader to initialize instance properly.
+    Utils.withContextClassLoader(clazz.getClassLoader) {
+      Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
+        var udfExpr: Option[Expression] = None
+        try {
+          // When we instantiate hive UDF wrapper class, we may throw exception if the input
+          // expressions don't satisfy the hive UDF, such as type mismatch, input number
+          // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
+          if (classOf[UDF].isAssignableFrom(clazz)) {
+            udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
+            udfExpr.get.dataType // Force it to check input data types.
 
 Review comment:
   I figured out above code doesn't give error - HiveFunctionWrapper stores `instance` which is copied in `makeCopy()` - so once the instance is created it doesn't seems to require changing classloader.
   
   That said, below code gives error:
   
   ```
   // uses classloader which loads clazz
   val udf = HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input)
   // make sure HiveFunctionWrapper.createFunction is not called here
   
   val newUdf = udf.makeCopy(udf.productIterator.map(_.asInstanceOf[AnyRef]).toArray)
   // change classloader which doesn't load clazz
   newUdf.dataType
   ``` 
   
   we force call `.dataType` after creating HiveXXXUDF, so if my understanding is correct it won't matter.
   
   Could you please check whether my observation is correct, or please let me know if I'm missing something?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r361667169
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
 ##########
 @@ -70,6 +70,17 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
   import hiveContext._
   import spark.implicits._
 
+  private var threadContextClassLoader: ClassLoader = _
+
+  override protected def beforeEach(): Unit = {
 
 Review comment:
   This is needed to make a new UT pass, as some tests change the current thread's context classloader to jar classloader.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
maropu commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569406181
 
 
   sorry, but I don't have a smart idea, either... cc: @HyukjinKwon @cloud-fan 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569467521
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115896/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569506748
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115913/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r361785548
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
 ##########
 @@ -66,49 +66,52 @@ private[sql] class HiveSessionCatalog(
       name: String,
       clazz: Class[_],
       input: Seq[Expression]): Expression = {
-
-    Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
-      var udfExpr: Option[Expression] = None
-      try {
-        // When we instantiate hive UDF wrapper class, we may throw exception if the input
-        // expressions don't satisfy the hive UDF, such as type mismatch, input number
-        // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
-        if (classOf[UDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[AbstractGenericUDAFResolver].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[UDAF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(
-            name,
-            new HiveFunctionWrapper(clazz.getName),
-            input,
-            isUDAFBridgeRequired = true))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDTF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.asInstanceOf[HiveGenericUDTF].elementSchema // Force it to check data types.
+    // Current thread context classloader may not be the one loaded the class. Need to switch
+    // context classloader to initialize instance properly.
+    Utils.withContextClassLoader(clazz.getClassLoader) {
 
 Review comment:
   We don't need to check in the current class loader first?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r361905710
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
 ##########
 @@ -2491,4 +2492,47 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       }
     }
   }
+
+  test("SPARK-26560 Spark should be able to run Hive UDF using jar regardless of " +
 
 Review comment:
   Second run of query execution will fail. It succeeds on first run of query execution, because the actual loading for jar happens on first reference (lazy loading). CREATE FUNCTION doesn't seem to load the jar at that time.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569387153
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-570139894
 
 
   Thanks all for reviewing and merging!
   
   > @HeartSaVioR can you open another PR for 2.4?
   
   Sure, I'll submit a PR for 2.4 as well. Thanks!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569292096
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115861/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r361907231
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
 ##########
 @@ -2491,4 +2492,47 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       }
     }
   }
+
+  test("SPARK-26560 Spark should be able to run Hive UDF using jar regardless of " +
 
 Review comment:
   FYI: without the patch, this test throws below exception and fails:
   
   ```
   No handler for UDF/UDAF/UDTF 'org.apache.hadoop.hive.contrib.udtf.example.GenericUDTFCount3': java.lang.ClassNotFoundException: org.apache.hadoop.hive.contrib.udtf.example.GenericUDTFCount3
   Please make sure your function overrides `public StructObjectInspector initialize(ObjectInspector[] args)`.; line 1 pos 7
   org.apache.spark.sql.AnalysisException: No handler for UDF/UDAF/UDTF 'org.apache.hadoop.hive.contrib.udtf.example.GenericUDTFCount3': java.lang.ClassNotFoundException: org.apache.hadoop.hive.contrib.udtf.example.GenericUDTFCount3
   Please make sure your function overrides `public StructObjectInspector initialize(ObjectInspector[] args)`.; line 1 pos 7
   	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
   	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
   	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
   	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
   	at org.apache.spark.sql.hive.HiveShim$HiveFunctionWrapper.createFunction(HiveShim.scala:245)
   	at org.apache.spark.sql.hive.HiveGenericUDTF.function$lzycompute(hiveUDFs.scala:207)
   	at org.apache.spark.sql.hive.HiveGenericUDTF.function(hiveUDFs.scala:206)
   	at org.apache.spark.sql.hive.HiveGenericUDTF.outputInspector$lzycompute(hiveUDFs.scala:216)
   	at org.apache.spark.sql.hive.HiveGenericUDTF.outputInspector(hiveUDFs.scala:216)
   	at org.apache.spark.sql.hive.HiveGenericUDTF.elementSchema$lzycompute(hiveUDFs.scala:224)
   	at org.apache.spark.sql.hive.HiveGenericUDTF.elementSchema(hiveUDFs.scala:224)
   	at org.apache.spark.sql.hive.HiveSessionCatalog.$anonfun$makeFunctionExpression$2(HiveSessionCatalog.scala:96)
   	at scala.util.Failure.getOrElse(Try.scala:222)
   	at org.apache.spark.sql.hive.HiveSessionCatalog.makeFunctionExpression(HiveSessionCatalog.scala:72)
   	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.$anonfun$makeFunctionBuilder$1(SessionCatalog.scala:1247)
   	at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:121)
   	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupFunction(SessionCatalog.scala:1415)
   	at org.apache.spark.sql.hive.HiveSessionCatalog.super$lookupFunction(HiveSessionCatalog.scala:135)
   	at org.apache.spark.sql.hive.HiveSessionCatalog.$anonfun$lookupFunction0$2(HiveSessionCatalog.scala:135)
   	at scala.util.Try$.apply(Try.scala:213)
   	at org.apache.spark.sql.hive.HiveSessionCatalog.lookupFunction0(HiveSessionCatalog.scala:135)
   	at org.apache.spark.sql.hive.HiveSessionCatalog.lookupFunction(HiveSessionCatalog.scala:128)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$15$$anonfun$applyOrElse$96.$anonfun$applyOrElse$99(Analyzer.scala:1739)
   	at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:53)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$15$$anonfun$applyOrElse$96.applyOrElse(Analyzer.scala:1739)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$15$$anonfun$applyOrElse$96.applyOrElse(Analyzer.scala:1722)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:286)
   	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:286)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:291)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:376)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:214)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:374)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:327)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:291)
   	at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsDown$1(QueryPlan.scala:87)
   	at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:109)
   	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
   	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:109)
   	at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:120)
   	at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:125)
   	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
   	at scala.collection.immutable.List.foreach(List.scala:392)
   	at scala.collection.TraversableLike.map(TraversableLike.scala:238)
   	at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
   	at scala.collection.immutable.List.map(List.scala:298)
   	at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:125)
   	at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:130)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:214)
   	at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:130)
   	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDown(QueryPlan.scala:87)
   	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressions(QueryPlan.scala:78)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$15.applyOrElse(Analyzer.scala:1722)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$15.applyOrElse(Analyzer.scala:1720)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$3(AnalysisHelper.scala:90)
   	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$1(AnalysisHelper.scala:90)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp(AnalysisHelper.scala:86)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp$(AnalysisHelper.scala:84)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$.apply(Analyzer.scala:1720)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$.apply(Analyzer.scala:1719)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:130)
   	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
   	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
   	at scala.collection.immutable.List.foldLeft(List.scala:89)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:127)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:119)
   	at scala.collection.immutable.List.foreach(List.scala:392)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:119)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:168)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:162)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:122)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:98)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
   	at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:98)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:146)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:145)
   	at org.apache.spark.sql.hive.test.TestHiveQueryExecution.analyzed$lzycompute(TestHive.scala:606)
   	at org.apache.spark.sql.hive.test.TestHiveQueryExecution.analyzed(TestHive.scala:589)
   	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:58)
   	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:87)
   	at org.apache.spark.sql.hive.test.TestHiveSparkSession.sql(TestHive.scala:238)
   	at org.apache.spark.sql.test.SQLTestUtilsBase.$anonfun$sql$1(SQLTestUtils.scala:216)
   	at org.apache.spark.sql.hive.execution.SQLQuerySuite.$anonfun$new$488(SQLQuerySuite.scala:2533)
   	at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:139)
   	at org.apache.spark.sql.hive.execution.SQLQuerySuite.$anonfun$new$486(SQLQuerySuite.scala:2534)
   	at org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction(SQLTestUtils.scala:239)
   	at org.apache.spark.sql.test.SQLTestUtilsBase.withUserDefinedFunction$(SQLTestUtils.scala:237)
   	at org.apache.spark.sql.hive.execution.SQLQuerySuite.withUserDefinedFunction(SQLQuerySuite.scala:70)
   	at org.apache.spark.sql.hive.execution.SQLQuerySuite.$anonfun$new$485(SQLQuerySuite.scala:2501)
   	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
   	at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:221)
   	at org.apache.spark.sql.hive.execution.SQLQuerySuite.$anonfun$new$484(SQLQuerySuite.scala:2501)
   	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
   	at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
   	at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
   	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
   	at org.scalatest.Transformer.apply(Transformer.scala:22)
   	at org.scalatest.Transformer.apply(Transformer.scala:20)
   	at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
   	at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:149)
   	at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
   	at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
   	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286)
   	at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
   	at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
   	at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:56)
   	at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221)
   	at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214)
   	at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:56)
   	at org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229)
   	at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393)
   	at scala.collection.immutable.List.foreach(List.scala:392)
   	at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381)
   	at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376)
   	at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458)
   	at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229)
   	at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228)
   	at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
   	at org.scalatest.Suite.run(Suite.scala:1124)
   	at org.scalatest.Suite.run$(Suite.scala:1106)
   	at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
   	at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233)
   	at org.scalatest.SuperEngine.runImpl(Engine.scala:518)
   	at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233)
   	at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232)
   	at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:56)
   	at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
   	at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
   	at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
   	at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:56)
   	at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
   	at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13(Runner.scala:1349)
   	at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13$adapted(Runner.scala:1343)
   	at scala.collection.immutable.List.foreach(List.scala:392)
   	at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1343)
   	at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24(Runner.scala:1033)
   	at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24$adapted(Runner.scala:1011)
   	at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1509)
   	at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1011)
   	at org.scalatest.tools.Runner$.run(Runner.scala:850)
   	at org.scalatest.tools.Runner.run(Runner.scala)
   	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:133)
   	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:27)
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569682861
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r361947939
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
 ##########
 @@ -2491,4 +2492,47 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       }
     }
   }
+
+  test("SPARK-26560 Spark should be able to run Hive UDF using jar regardless of " +
+    "current thread context classloader") {
+    // force to use Spark classloader as other test (even in other test suites) may change the
+    // current thread's context classloader to jar classloader
+    Utils.withContextClassLoader(Utils.getSparkClassLoader) {
+      withUserDefinedFunction("udtf_count3" -> false) {
+        val sparkClassLoader = Thread.currentThread().getContextClassLoader
+
+        // This jar file should not be placed to the classpath; GenericUDTFCount3 is slightly
+        // modified version of GenericUDTFCount2 in hive/contrib, which emits the count for
+        // three times.
+        val jarPath = "src/test/noclasspath/TestUDTF-spark-26560.jar"
+        val jarURL = s"file://${System.getProperty("user.dir")}/$jarPath"
+
+        sql(
+          s"""
+             |CREATE FUNCTION udtf_count3
+             |AS 'org.apache.hadoop.hive.contrib.udtf.example.GenericUDTFCount3'
+             |USING JAR '$jarURL'
+          """.stripMargin)
+
+        assert(Thread.currentThread().getContextClassLoader eq sparkClassLoader)
+
+        // JAR will be loaded at first usage, and it will change the current thread's
+        // context classloader to jar classloader in sharedState.
+        // See SessionState.addJar for details.
+        checkAnswer(
+          sql("SELECT udtf_count3(a) FROM (SELECT 1 AS a FROM src LIMIT 3) t"),
+          Row(3) :: Row(3) :: Row(3) :: Nil)
+
+        assert(Thread.currentThread().getContextClassLoader ne sparkClassLoader)
+        assert(Thread.currentThread().getContextClassLoader eq
+          spark.sqlContext.sharedState.jarClassLoader)
+
+        // Roll back to the original classloader and run query again.
+        Thread.currentThread().setContextClassLoader(sparkClassLoader)
 
 Review comment:
   got it. Can we add some comments to explain it?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569462794
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569274177
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r361905472
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
 ##########
 @@ -2491,4 +2492,47 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       }
     }
   }
+
+  test("SPARK-26560 Spark should be able to run Hive UDF using jar regardless of " +
 
 Review comment:
   without this patch, which step will we fail in this test?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r361917338
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
 ##########
 @@ -2491,4 +2492,47 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       }
     }
   }
+
+  test("SPARK-26560 Spark should be able to run Hive UDF using jar regardless of " +
+    "current thread context classloader") {
+    // force to use Spark classloader as other test (even in other test suites) may change the
+    // current thread's context classloader to jar classloader
+    Utils.withContextClassLoader(Utils.getSparkClassLoader) {
+      withUserDefinedFunction("udtf_count3" -> false) {
+        val sparkClassLoader = Thread.currentThread().getContextClassLoader
+
+        // This jar file should not be placed to the classpath; GenericUDTFCount3 is slightly
+        // modified version of GenericUDTFCount2 in hive/contrib, which emits the count for
+        // three times.
+        val jarPath = "src/test/noclasspath/TestUDTF-spark-26560.jar"
+        val jarURL = s"file://${System.getProperty("user.dir")}/$jarPath"
+
+        sql(
+          s"""
+             |CREATE FUNCTION udtf_count3
+             |AS 'org.apache.hadoop.hive.contrib.udtf.example.GenericUDTFCount3'
+             |USING JAR '$jarURL'
+          """.stripMargin)
+
+        assert(Thread.currentThread().getContextClassLoader eq sparkClassLoader)
+
+        // JAR will be loaded at first usage, and it will change the current thread's
+        // context classloader to jar classloader in sharedState.
+        // See SessionState.addJar for details.
+        checkAnswer(
+          sql("SELECT udtf_count3(a) FROM (SELECT 1 AS a FROM src LIMIT 3) t"),
+          Row(3) :: Row(3) :: Row(3) :: Nil)
+
+        assert(Thread.currentThread().getContextClassLoader ne sparkClassLoader)
+        assert(Thread.currentThread().getContextClassLoader eq
+          spark.sqlContext.sharedState.jarClassLoader)
+
+        // Roll back to the original classloader and run query again.
+        Thread.currentThread().setContextClassLoader(sparkClassLoader)
 
 Review comment:
   Yes, because the jar is loaded in jarClassloader. (commenting this line would make the test pass.)
   
   If we don't change the thread context classloader manually, current thread context classloader might be still likely jarClassloader, but it can be changed (one example is spark-shell which will roll back context classloader per prompt if I understand correctly). Here we mimic the behavior of `spark-shell`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r400725557
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
 ##########
 @@ -66,49 +66,52 @@ private[sql] class HiveSessionCatalog(
       name: String,
       clazz: Class[_],
       input: Seq[Expression]): Expression = {
-
-    Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
-      var udfExpr: Option[Expression] = None
-      try {
-        // When we instantiate hive UDF wrapper class, we may throw exception if the input
-        // expressions don't satisfy the hive UDF, such as type mismatch, input number
-        // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
-        if (classOf[UDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[AbstractGenericUDAFResolver].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[UDAF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(
-            name,
-            new HiveFunctionWrapper(clazz.getName),
-            input,
-            isUDAFBridgeRequired = true))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDTF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.asInstanceOf[HiveGenericUDTF].elementSchema // Force it to check data types.
+    // Current thread context classloader may not be the one loaded the class. Need to switch
+    // context classloader to initialize instance properly.
+    Utils.withContextClassLoader(clazz.getClassLoader) {
+      Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
+        var udfExpr: Option[Expression] = None
+        try {
+          // When we instantiate hive UDF wrapper class, we may throw exception if the input
+          // expressions don't satisfy the hive UDF, such as type mismatch, input number
+          // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
+          if (classOf[UDF].isAssignableFrom(clazz)) {
+            udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
+            udfExpr.get.dataType // Force it to check input data types.
 
 Review comment:
   #28079

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569292093
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569495909
 
 
   ```
   17:55:15.408 ERROR org.apache.spark.SparkContext: Failed to add <a href='file:///home/jenkins/workspace/SparkPullRequestBuilder/sql/hive/src/test/noclasspath/TestUDTF.jar'>file:///home/jenkins/workspace/SparkPullRequestBuilder/sql/hive/src/test/noclasspath/TestUDTF.jar</a> to Spark environment
   java.lang.IllegalArgumentException: requirement failed: File TestUDTF.jar was already registered with a different path (old path = /home/jenkins/workspace/SparkPullRequestBuilder/sql/hive/target/scala-2.12/test-classes/TestUDTF.jar, new path = /home/jenkins/workspace/SparkPullRequestBuilder/sql/hive/src/test/noclasspath/TestUDTF.jar
           at scala.Predef$.require(Predef.scala:281)
           at org.apache.spark.rpc.netty.NettyStreamManager.addJar(NettyStreamManager.scala:79)
           at org.apache.spark.SparkContext.addLocalJarFile$1(SparkContext.scala:1853)
           at org.apache.spark.SparkContext.addJar(SparkContext.scala:1901)
           at org.apache.spark.sql.internal.SessionResourceLoader.addJar(SessionState.scala:166)
           at org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:118)
           at org.apache.spark.sql.internal.SessionResourceLoader.loadResource(SessionState.scala:149)
           at org.apache.spark.sql.catalyst.catalog.SessionCatalog.$anonfun$loadFunctionResources$1(SessionCatalog.scala:1287)
           at org.apache.spark.sql.catalyst.catalog.SessionCatalog.$anonfun$loadFunctionResources$1$adapted(SessionCatalog.scala:1287)
           at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
           at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
           at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
           at org.apache.spark.sql.catalyst.catalog.SessionCatalog.loadFunctionResources(SessionCatalog.scala:1287)
           at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupFunction(SessionCatalog.scala:1428)
           at org.apache.spark.sql.hive.HiveSessionCatalog.super$lookupFunction(HiveSessionCatalog.scala:135)
           at org.apache.spark.sql.hive.HiveSessionCatalog.$anonfun$lookupFunction0$2(HiveSessionCatalog.scala:135)
           at scala.util.Try$.apply(Try.scala:213)
           at org.apache.spark.sql.hive.HiveSessionCatalog.lookupFunction0(HiveSessionCatalog.scala:135)
           at org.apache.spark.sql.hive.HiveSessionCatalog.lookupFunction(HiveSessionCatalog.scala:121)
           at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$15$$anonfun$applyOrElse$96.$anonfun$applyOrElse$99(Analyzer.scala:1739)
           at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:53)
           at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$15$$anonfun$applyOrElse$96.applyOrElse(Analyzer.scala:1739)
           at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$15$$anonfun$applyOrElse$96.applyOrElse(Analyzer.scala:1722)
           at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:286)
           at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
           at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:286)
           at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:291)
           at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:376)
   ...
   ```
   
   It doesn't seem to allow same name - I'll just add `spark-26560` after the jar name.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r361917338
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
 ##########
 @@ -2491,4 +2492,47 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       }
     }
   }
+
+  test("SPARK-26560 Spark should be able to run Hive UDF using jar regardless of " +
+    "current thread context classloader") {
+    // force to use Spark classloader as other test (even in other test suites) may change the
+    // current thread's context classloader to jar classloader
+    Utils.withContextClassLoader(Utils.getSparkClassLoader) {
+      withUserDefinedFunction("udtf_count3" -> false) {
+        val sparkClassLoader = Thread.currentThread().getContextClassLoader
+
+        // This jar file should not be placed to the classpath; GenericUDTFCount3 is slightly
+        // modified version of GenericUDTFCount2 in hive/contrib, which emits the count for
+        // three times.
+        val jarPath = "src/test/noclasspath/TestUDTF-spark-26560.jar"
+        val jarURL = s"file://${System.getProperty("user.dir")}/$jarPath"
+
+        sql(
+          s"""
+             |CREATE FUNCTION udtf_count3
+             |AS 'org.apache.hadoop.hive.contrib.udtf.example.GenericUDTFCount3'
+             |USING JAR '$jarURL'
+          """.stripMargin)
+
+        assert(Thread.currentThread().getContextClassLoader eq sparkClassLoader)
+
+        // JAR will be loaded at first usage, and it will change the current thread's
+        // context classloader to jar classloader in sharedState.
+        // See SessionState.addJar for details.
+        checkAnswer(
+          sql("SELECT udtf_count3(a) FROM (SELECT 1 AS a FROM src LIMIT 3) t"),
+          Row(3) :: Row(3) :: Row(3) :: Nil)
+
+        assert(Thread.currentThread().getContextClassLoader ne sparkClassLoader)
+        assert(Thread.currentThread().getContextClassLoader eq
+          spark.sqlContext.sharedState.jarClassLoader)
+
+        // Roll back to the original classloader and run query again.
+        Thread.currentThread().setContextClassLoader(sparkClassLoader)
 
 Review comment:
   Yes, because the jar is loaded in jarClassloader.
   
   If we don't change the thread context classloader manually, current thread context classloader might be still likely jarClassloader, but it can be changed (one example is spark-shell which will roll back context classloader per prompt if I understand correctly). Here we mimic the behavior of `spark-shell`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569682868
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115953/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r361958725
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
 ##########
 @@ -2491,4 +2492,47 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       }
     }
   }
+
+  test("SPARK-26560 Spark should be able to run Hive UDF using jar regardless of " +
+    "current thread context classloader") {
+    // force to use Spark classloader as other test (even in other test suites) may change the
+    // current thread's context classloader to jar classloader
+    Utils.withContextClassLoader(Utils.getSparkClassLoader) {
+      withUserDefinedFunction("udtf_count3" -> false) {
+        val sparkClassLoader = Thread.currentThread().getContextClassLoader
+
+        // This jar file should not be placed to the classpath; GenericUDTFCount3 is slightly
+        // modified version of GenericUDTFCount2 in hive/contrib, which emits the count for
+        // three times.
+        val jarPath = "src/test/noclasspath/TestUDTF-spark-26560.jar"
+        val jarURL = s"file://${System.getProperty("user.dir")}/$jarPath"
+
+        sql(
+          s"""
+             |CREATE FUNCTION udtf_count3
+             |AS 'org.apache.hadoop.hive.contrib.udtf.example.GenericUDTFCount3'
+             |USING JAR '$jarURL'
+          """.stripMargin)
+
+        assert(Thread.currentThread().getContextClassLoader eq sparkClassLoader)
+
+        // JAR will be loaded at first usage, and it will change the current thread's
+        // context classloader to jar classloader in sharedState.
+        // See SessionState.addJar for details.
+        checkAnswer(
+          sql("SELECT udtf_count3(a) FROM (SELECT 1 AS a FROM src LIMIT 3) t"),
+          Row(3) :: Row(3) :: Row(3) :: Nil)
+
+        assert(Thread.currentThread().getContextClassLoader ne sparkClassLoader)
+        assert(Thread.currentThread().getContextClassLoader eq
+          spark.sqlContext.sharedState.jarClassLoader)
+
+        // Roll back to the original classloader and run query again.
+        Thread.currentThread().setContextClassLoader(sparkClassLoader)
 
 Review comment:
   Sure! Will update it.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569378971
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20671/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r397307022
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
 ##########
 @@ -66,49 +66,52 @@ private[sql] class HiveSessionCatalog(
       name: String,
       clazz: Class[_],
       input: Seq[Expression]): Expression = {
-
-    Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
-      var udfExpr: Option[Expression] = None
-      try {
-        // When we instantiate hive UDF wrapper class, we may throw exception if the input
-        // expressions don't satisfy the hive UDF, such as type mismatch, input number
-        // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
-        if (classOf[UDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[AbstractGenericUDAFResolver].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[UDAF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(
-            name,
-            new HiveFunctionWrapper(clazz.getName),
-            input,
-            isUDAFBridgeRequired = true))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDTF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.asInstanceOf[HiveGenericUDTF].elementSchema // Force it to check data types.
+    // Current thread context classloader may not be the one loaded the class. Need to switch
+    // context classloader to initialize instance properly.
+    Utils.withContextClassLoader(clazz.getClassLoader) {
+      Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
+        var udfExpr: Option[Expression] = None
+        try {
+          // When we instantiate hive UDF wrapper class, we may throw exception if the input
+          // expressions don't satisfy the hive UDF, such as type mismatch, input number
+          // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
+          if (classOf[UDF].isAssignableFrom(clazz)) {
+            udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
+            udfExpr.get.dataType // Force it to check input data types.
 
 Review comment:
   Found a potential problem: here we call `HiveSimpleUDF.dateType` (which is a lazy val), to force to load the class with the corrected class loader.
   
   However, if the expression gets transformed later, which copies `HiveSimpleUDF`, then calling  `HiveSimpleUDF.dataType` will re-trigger the class loading, and at that time there is no guarantee that the corrected classloader is used.
   
   I think we should materialize the loaded class in `HiveSimpleUDF`.
   
   @HeartSaVioR can you take a look?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569651483
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569467520
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569462795
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20686/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569496584
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20703/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569462795
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20686/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569387154
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115880/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r361667169
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
 ##########
 @@ -70,6 +70,17 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
   import hiveContext._
   import spark.implicits._
 
+  private var threadContextClassLoader: ClassLoader = _
+
+  override protected def beforeEach(): Unit = {
 
 Review comment:
   beforeEach and afterEach are needed to make a new UT pass, as some tests change the current thread's context classloader to jar classloader.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r361789163
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
 ##########
 @@ -66,49 +66,52 @@ private[sql] class HiveSessionCatalog(
       name: String,
       clazz: Class[_],
       input: Seq[Expression]): Expression = {
-
-    Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
-      var udfExpr: Option[Expression] = None
-      try {
-        // When we instantiate hive UDF wrapper class, we may throw exception if the input
-        // expressions don't satisfy the hive UDF, such as type mismatch, input number
-        // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
-        if (classOf[UDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[AbstractGenericUDAFResolver].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[UDAF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(
-            name,
-            new HiveFunctionWrapper(clazz.getName),
-            input,
-            isUDAFBridgeRequired = true))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDTF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.asInstanceOf[HiveGenericUDTF].elementSchema // Force it to check data types.
+    // Current thread context classloader may not be the one loaded the class. Need to switch
+    // context classloader to initialize instance properly.
+    Utils.withContextClassLoader(clazz.getClassLoader) {
 
 Review comment:
   I just thought that, since the current master search classes in `IMainTranslatingClassLoader` first, its better to keep the current behaivour then add a class loader if it fails. But, its ok to wait for other reviewr's comments.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569496456
 
 
   **[Test build #115913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115913/testReport)** for PR 27025 at commit [`988061b`](https://github.com/apache/spark/commit/988061bf878b5c5685a0a0eb9fdabe05e1b4eaa8).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569496456
 
 
   **[Test build #115913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115913/testReport)** for PR 27025 at commit [`988061b`](https://github.com/apache/spark/commit/988061bf878b5c5685a0a0eb9fdabe05e1b4eaa8).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569463115
 
 
   **[Test build #115896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115896/testReport)** for PR 27025 at commit [`418e027`](https://github.com/apache/spark/commit/418e027a345875c879b90408699c0b987929cc2c).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569462794
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569467520
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569405037
 
 
   > sql/hive/src/test/TestUDTF-dynamicload.jar is a weird path..., can we avoid this?
   
   Hmm... the jar shouldn't be in classpath. What about `sql/hive/src/test/data/TestUDTF.jar` and leaving README.txt in `sql/hive/src/test/data` to explain this path shouldn't be in classpath?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r397591651
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
 ##########
 @@ -66,49 +66,52 @@ private[sql] class HiveSessionCatalog(
       name: String,
       clazz: Class[_],
       input: Seq[Expression]): Expression = {
-
-    Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
-      var udfExpr: Option[Expression] = None
-      try {
-        // When we instantiate hive UDF wrapper class, we may throw exception if the input
-        // expressions don't satisfy the hive UDF, such as type mismatch, input number
-        // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
-        if (classOf[UDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[AbstractGenericUDAFResolver].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[UDAF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(
-            name,
-            new HiveFunctionWrapper(clazz.getName),
-            input,
-            isUDAFBridgeRequired = true))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDTF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.asInstanceOf[HiveGenericUDTF].elementSchema // Force it to check data types.
+    // Current thread context classloader may not be the one loaded the class. Need to switch
+    // context classloader to initialize instance properly.
+    Utils.withContextClassLoader(clazz.getClassLoader) {
+      Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
+        var udfExpr: Option[Expression] = None
+        try {
+          // When we instantiate hive UDF wrapper class, we may throw exception if the input
+          // expressions don't satisfy the hive UDF, such as type mismatch, input number
+          // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
+          if (classOf[UDF].isAssignableFrom(clazz)) {
+            udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
+            udfExpr.get.dataType // Force it to check input data types.
 
 Review comment:
   Thanks for pinging me.
   
   Could you please confirm my understanding? Actually my knowledge to resolve this issue came from debugging (like, reverse-engineering) so I'm not sure I get it 100%.
   
   If my understanding is correct, this seems to be the simple reproducer - could you please confirm I understand correctly?
   
   ```
   // uses classloader which loads clazz
   val udf = HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input)
   udf.dataType
   val newUdf = udf.makeCopy(Array.empty)
   // change classloader which doesn't load clazz
   newUdf.dataType
   ```
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569650943
 
 
   **[Test build #115953 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115953/testReport)** for PR 27025 at commit [`39a2171`](https://github.com/apache/spark/commit/39a2171d361f38b54640c07ccb8990aa4c204238).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569274182
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20654/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569292093
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569379417
 
 
   **[Test build #115880 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115880/testReport)** for PR 27025 at commit [`b801ac5`](https://github.com/apache/spark/commit/b801ac57c04d6d8384f735b79dfaefa5bee77ba2).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569506612
 
 
   **[Test build #115913 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115913/testReport)** for PR 27025 at commit [`988061b`](https://github.com/apache/spark/commit/988061bf878b5c5685a0a0eb9fdabe05e1b4eaa8).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r361785557
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
 ##########
 @@ -2491,4 +2492,47 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       }
     }
   }
+
+  test("SPARK-26560 Spark should be able to run Hive UDF using jar regardless of " +
+    "current thread context classloader") {
+    // force to use Spark classloader as other test (even in other test suites) may change the
+    // current thread's context classloader to jar classloader
+    Utils.withContextClassLoader(Utils.getSparkClassLoader) {
+      withUserDefinedFunction("udtf_count3" -> false) {
+        val sparkClassLoader = Thread.currentThread().getContextClassLoader
+
+        // This jar file should not be placed to the classpath; GenericUDTFCount3 is slightly
+        // modified version of GenericUDTFCount2 in hive/contrib, which emits the count for
+        // three times.
+        val jarPath = "src/test/TestUDTF-dynamicload.jar"
+        val jarURL = s"file://${System.getProperty("user.dir")}/$jarPath"
+
+        sql(
+          s"""
+             |CREATE FUNCTION udtf_count3
+             |AS 'org.apache.hadoop.hive.contrib.udtf.example.GenericUDTFCount3'
+             |USING JAR '$jarURL'
+      """.stripMargin)
 
 Review comment:
   nit: indent

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569387153
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569506744
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569273885
 
 
   **[Test build #115861 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115861/testReport)** for PR 27025 at commit [`02188e0`](https://github.com/apache/spark/commit/02188e07a9d74ef7422a23db4351e49faa0006b2).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r399612923
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
 ##########
 @@ -66,49 +66,52 @@ private[sql] class HiveSessionCatalog(
       name: String,
       clazz: Class[_],
       input: Seq[Expression]): Expression = {
-
-    Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
-      var udfExpr: Option[Expression] = None
-      try {
-        // When we instantiate hive UDF wrapper class, we may throw exception if the input
-        // expressions don't satisfy the hive UDF, such as type mismatch, input number
-        // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
-        if (classOf[UDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[AbstractGenericUDAFResolver].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[UDAF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(
-            name,
-            new HiveFunctionWrapper(clazz.getName),
-            input,
-            isUDAFBridgeRequired = true))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDTF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.asInstanceOf[HiveGenericUDTF].elementSchema // Force it to check data types.
+    // Current thread context classloader may not be the one loaded the class. Need to switch
+    // context classloader to initialize instance properly.
+    Utils.withContextClassLoader(clazz.getClassLoader) {
+      Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
+        var udfExpr: Option[Expression] = None
+        try {
+          // When we instantiate hive UDF wrapper class, we may throw exception if the input
+          // expressions don't satisfy the hive UDF, such as type mismatch, input number
+          // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
+          if (classOf[UDF].isAssignableFrom(clazz)) {
+            udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
+            udfExpr.get.dataType // Force it to check input data types.
 
 Review comment:
   I figured out above code doesn't give error - HiveFunctionWrapper stores `instance` which is copied in `makeCopy()` - so once the instance is created it doesn't seems to require changing classloader.
   
   That said, below code gives error:
   
   ```
   // uses classloader which loads clazz
   val udf = HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input)
   // make sure HiveFunctionWrapper.createFunction is not called here
   
   val newUdf = udf.makeCopy(udf.productIterator.map(_.asInstanceOf[AnyRef]).toArray)
   // change classloader which doesn't load clazz
   newUdf.dataType
   ``` 
   
   we force call `.dataType` after creating HiveXXXUDF, so if my understanding is correct it won't be matter.
   
   Could you please check whether my observation is correct, or please let me know if I'm missing something?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569379417
 
 
   **[Test build #115880 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115880/testReport)** for PR 27025 at commit [`b801ac5`](https://github.com/apache/spark/commit/b801ac57c04d6d8384f735b79dfaefa5bee77ba2).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569467504
 
 
   **[Test build #115896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115896/testReport)** for PR 27025 at commit [`418e027`](https://github.com/apache/spark/commit/418e027a345875c879b90408699c0b987929cc2c).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569273885
 
 
   **[Test build #115861 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115861/testReport)** for PR 27025 at commit [`02188e0`](https://github.com/apache/spark/commit/02188e07a9d74ef7422a23db4351e49faa0006b2).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r361917338
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
 ##########
 @@ -2491,4 +2492,47 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       }
     }
   }
+
+  test("SPARK-26560 Spark should be able to run Hive UDF using jar regardless of " +
+    "current thread context classloader") {
+    // force to use Spark classloader as other test (even in other test suites) may change the
+    // current thread's context classloader to jar classloader
+    Utils.withContextClassLoader(Utils.getSparkClassLoader) {
+      withUserDefinedFunction("udtf_count3" -> false) {
+        val sparkClassLoader = Thread.currentThread().getContextClassLoader
+
+        // This jar file should not be placed to the classpath; GenericUDTFCount3 is slightly
+        // modified version of GenericUDTFCount2 in hive/contrib, which emits the count for
+        // three times.
+        val jarPath = "src/test/noclasspath/TestUDTF-spark-26560.jar"
+        val jarURL = s"file://${System.getProperty("user.dir")}/$jarPath"
+
+        sql(
+          s"""
+             |CREATE FUNCTION udtf_count3
+             |AS 'org.apache.hadoop.hive.contrib.udtf.example.GenericUDTFCount3'
+             |USING JAR '$jarURL'
+          """.stripMargin)
+
+        assert(Thread.currentThread().getContextClassLoader eq sparkClassLoader)
+
+        // JAR will be loaded at first usage, and it will change the current thread's
+        // context classloader to jar classloader in sharedState.
+        // See SessionState.addJar for details.
+        checkAnswer(
+          sql("SELECT udtf_count3(a) FROM (SELECT 1 AS a FROM src LIMIT 3) t"),
+          Row(3) :: Row(3) :: Row(3) :: Nil)
+
+        assert(Thread.currentThread().getContextClassLoader ne sparkClassLoader)
+        assert(Thread.currentThread().getContextClassLoader eq
+          spark.sqlContext.sharedState.jarClassLoader)
+
+        // Roll back to the original classloader and run query again.
+        Thread.currentThread().setContextClassLoader(sparkClassLoader)
 
 Review comment:
   Yes, because the jar is loaded in jarClassloader.
   
   If we don't change the thread context classloader manually, it might be still likely jarClassloader, but it can be changed (one example is spark-shell which will roll back context classloader per prompt if I understand correctly). Here we mimic the behavior of `spark-shell`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r399923532
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
 ##########
 @@ -66,49 +66,52 @@ private[sql] class HiveSessionCatalog(
       name: String,
       clazz: Class[_],
       input: Seq[Expression]): Expression = {
-
-    Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
-      var udfExpr: Option[Expression] = None
-      try {
-        // When we instantiate hive UDF wrapper class, we may throw exception if the input
-        // expressions don't satisfy the hive UDF, such as type mismatch, input number
-        // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
-        if (classOf[UDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[AbstractGenericUDAFResolver].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[UDAF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(
-            name,
-            new HiveFunctionWrapper(clazz.getName),
-            input,
-            isUDAFBridgeRequired = true))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDTF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.asInstanceOf[HiveGenericUDTF].elementSchema // Force it to check data types.
+    // Current thread context classloader may not be the one loaded the class. Need to switch
+    // context classloader to initialize instance properly.
+    Utils.withContextClassLoader(clazz.getClassLoader) {
+      Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
+        var udfExpr: Option[Expression] = None
+        try {
+          // When we instantiate hive UDF wrapper class, we may throw exception if the input
+          // expressions don't satisfy the hive UDF, such as type mismatch, input number
+          // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
+          if (classOf[UDF].isAssignableFrom(clazz)) {
+            udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
+            udfExpr.get.dataType // Force it to check input data types.
 
 Review comment:
   OK let me put my findings: If you look at `HiveFunctionWrapper.createFunction`, it says we don't cache the instance for Simple UDF
   ```
       def createFunction[UDFType <: AnyRef](): UDFType = {
         if (instance != null) {
           instance.asInstanceOf[UDFType]
         } else {
           val func = Utils.getContextOrSparkClassLoader
             .loadClass(functionClassName).newInstance.asInstanceOf[UDFType]
           if (!func.isInstanceOf[UDF]) {
             // We cache the function if it's no the Simple UDF,
             // as we always have to create new instance for Simple UDF
             instance = func
           }
           func
         }
       }
   ```
   I don't know the history but I assume "we always have to create new instance for Simple UDF" is correct. I think what we can do is to cache the loaded `Class` as well as the instance.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569274177
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r362028687
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
 ##########
 @@ -66,49 +66,52 @@ private[sql] class HiveSessionCatalog(
       name: String,
       clazz: Class[_],
       input: Seq[Expression]): Expression = {
-
-    Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
-      var udfExpr: Option[Expression] = None
-      try {
-        // When we instantiate hive UDF wrapper class, we may throw exception if the input
-        // expressions don't satisfy the hive UDF, such as type mismatch, input number
-        // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
-        if (classOf[UDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[AbstractGenericUDAFResolver].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[UDAF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(
-            name,
-            new HiveFunctionWrapper(clazz.getName),
-            input,
-            isUDAFBridgeRequired = true))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDTF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.asInstanceOf[HiveGenericUDTF].elementSchema // Force it to check data types.
+    // Current thread context classloader may not be the one loaded the class. Need to switch
+    // context classloader to initialize instance properly.
+    Utils.withContextClassLoader(clazz.getClassLoader) {
 
 Review comment:
   If the class is from classpath (not loaded from addJar), it would be spark ClassLoader instead of jarClassLoader, though jarClassLoader may be able to load it as it contains Spark classloader. So just changing to jarClassLoader may work in most cases, but this would also work for the classloader which dynamically loads the classes, as we're using classloader which "loaded" the class we want to instantiate.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
maropu commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569400674
 
 
   `sql/hive/src/test/TestUDTF-dynamicload.jar` is a weird path..., can we avoid this?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r361820154
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
 ##########
 @@ -66,49 +66,52 @@ private[sql] class HiveSessionCatalog(
       name: String,
       clazz: Class[_],
       input: Seq[Expression]): Expression = {
-
-    Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
-      var udfExpr: Option[Expression] = None
-      try {
-        // When we instantiate hive UDF wrapper class, we may throw exception if the input
-        // expressions don't satisfy the hive UDF, such as type mismatch, input number
-        // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
-        if (classOf[UDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[AbstractGenericUDAFResolver].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[UDAF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(
-            name,
-            new HiveFunctionWrapper(clazz.getName),
-            input,
-            isUDAFBridgeRequired = true))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDTF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.asInstanceOf[HiveGenericUDTF].elementSchema // Force it to check data types.
+    // Current thread context classloader may not be the one loaded the class. Need to switch
+    // context classloader to initialize instance properly.
+    Utils.withContextClassLoader(clazz.getClassLoader) {
 
 Review comment:
   I can't imagine clazz is loaded from other than one of three cases - 1. available in classpath 2. class dynamically loaded (spark-shell, and more?) 3. JAR dynamically loaded - and if I understand correctly, Spark classpath (and Hive dependencies as well if Hive is enabled) is available for the classloader which loads the clazz for all three cases. This means clazz is the only one we need to make sure the current context classloader can load it.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR edited a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569495909
 
 
   ```
   17:55:15.408 ERROR org.apache.spark.SparkContext: Failed to add <a href='file:///home/jenkins/workspace/SparkPullRequestBuilder/sql/hive/src/test/noclasspath/TestUDTF.jar'>file:///home/jenkins/workspace/SparkPullRequestBuilder/sql/hive/src/test/noclasspath/TestUDTF.jar</a> to Spark environment
   java.lang.IllegalArgumentException: requirement failed: File TestUDTF.jar was already registered with a different path (old path = /home/jenkins/workspace/SparkPullRequestBuilder/sql/hive/target/scala-2.12/test-classes/TestUDTF.jar, new path = /home/jenkins/workspace/SparkPullRequestBuilder/sql/hive/src/test/noclasspath/TestUDTF.jar
           at scala.Predef$.require(Predef.scala:281)
           at org.apache.spark.rpc.netty.NettyStreamManager.addJar(NettyStreamManager.scala:79)
           at org.apache.spark.SparkContext.addLocalJarFile$1(SparkContext.scala:1853)
           at org.apache.spark.SparkContext.addJar(SparkContext.scala:1901)
           at org.apache.spark.sql.internal.SessionResourceLoader.addJar(SessionState.scala:166)
           at org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:118)
           at org.apache.spark.sql.internal.SessionResourceLoader.loadResource(SessionState.scala:149)
           at org.apache.spark.sql.catalyst.catalog.SessionCatalog.$anonfun$loadFunctionResources$1(SessionCatalog.scala:1287)
           at org.apache.spark.sql.catalyst.catalog.SessionCatalog.$anonfun$loadFunctionResources$1$adapted(SessionCatalog.scala:1287)
           at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
           at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
           at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
           at org.apache.spark.sql.catalyst.catalog.SessionCatalog.loadFunctionResources(SessionCatalog.scala:1287)
           at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupFunction(SessionCatalog.scala:1428)
           at org.apache.spark.sql.hive.HiveSessionCatalog.super$lookupFunction(HiveSessionCatalog.scala:135)
           at org.apache.spark.sql.hive.HiveSessionCatalog.$anonfun$lookupFunction0$2(HiveSessionCatalog.scala:135)
           at scala.util.Try$.apply(Try.scala:213)
           at org.apache.spark.sql.hive.HiveSessionCatalog.lookupFunction0(HiveSessionCatalog.scala:135)
           at org.apache.spark.sql.hive.HiveSessionCatalog.lookupFunction(HiveSessionCatalog.scala:121)
           at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$15$$anonfun$applyOrElse$96.$anonfun$applyOrElse$99(Analyzer.scala:1739)
           at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:53)
           at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$15$$anonfun$applyOrElse$96.applyOrElse(Analyzer.scala:1739)
           at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$15$$anonfun$applyOrElse$96.applyOrElse(Analyzer.scala:1722)
           at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:286)
           at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
           at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:286)
           at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:291)
           at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:376)
   ...
   ```
   
   It doesn't seem to allow same name - I'll just rename the jar to `TestUDTF-spark-26560.jar` after the jar name.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569651483
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r361917338
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
 ##########
 @@ -2491,4 +2492,47 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       }
     }
   }
+
+  test("SPARK-26560 Spark should be able to run Hive UDF using jar regardless of " +
+    "current thread context classloader") {
+    // force to use Spark classloader as other test (even in other test suites) may change the
+    // current thread's context classloader to jar classloader
+    Utils.withContextClassLoader(Utils.getSparkClassLoader) {
+      withUserDefinedFunction("udtf_count3" -> false) {
+        val sparkClassLoader = Thread.currentThread().getContextClassLoader
+
+        // This jar file should not be placed to the classpath; GenericUDTFCount3 is slightly
+        // modified version of GenericUDTFCount2 in hive/contrib, which emits the count for
+        // three times.
+        val jarPath = "src/test/noclasspath/TestUDTF-spark-26560.jar"
+        val jarURL = s"file://${System.getProperty("user.dir")}/$jarPath"
+
+        sql(
+          s"""
+             |CREATE FUNCTION udtf_count3
+             |AS 'org.apache.hadoop.hive.contrib.udtf.example.GenericUDTFCount3'
+             |USING JAR '$jarURL'
+          """.stripMargin)
+
+        assert(Thread.currentThread().getContextClassLoader eq sparkClassLoader)
+
+        // JAR will be loaded at first usage, and it will change the current thread's
+        // context classloader to jar classloader in sharedState.
+        // See SessionState.addJar for details.
+        checkAnswer(
+          sql("SELECT udtf_count3(a) FROM (SELECT 1 AS a FROM src LIMIT 3) t"),
+          Row(3) :: Row(3) :: Row(3) :: Nil)
+
+        assert(Thread.currentThread().getContextClassLoader ne sparkClassLoader)
+        assert(Thread.currentThread().getContextClassLoader eq
+          spark.sqlContext.sharedState.jarClassLoader)
+
+        // Roll back to the original classloader and run query again.
+        Thread.currentThread().setContextClassLoader(sparkClassLoader)
 
 Review comment:
   Yes, because the jar is loaded in jarClassloader.
   
   If we don't change the thread context classloader, it might be likely jarClassloader, but it can be changed (one example is spark-shell which will roll back context classloader per prompt if I understand correctly). Here we mimic the behavior of `spark-shell`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569277051
 
 
   The ideal fix would be not changing current thread's context classloader in addJar, and still make things work. I guess there're many places relying on context classloader, so that requires broader changes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569650943
 
 
   **[Test build #115953 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115953/testReport)** for PR 27025 at commit [`39a2171`](https://github.com/apache/spark/commit/39a2171d361f38b54640c07ccb8990aa4c204238).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r361960918
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
 ##########
 @@ -2491,4 +2492,47 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       }
     }
   }
+
+  test("SPARK-26560 Spark should be able to run Hive UDF using jar regardless of " +
+    "current thread context classloader") {
+    // force to use Spark classloader as other test (even in other test suites) may change the
+    // current thread's context classloader to jar classloader
+    Utils.withContextClassLoader(Utils.getSparkClassLoader) {
+      withUserDefinedFunction("udtf_count3" -> false) {
+        val sparkClassLoader = Thread.currentThread().getContextClassLoader
+
+        // This jar file should not be placed to the classpath; GenericUDTFCount3 is slightly
+        // modified version of GenericUDTFCount2 in hive/contrib, which emits the count for
+        // three times.
+        val jarPath = "src/test/noclasspath/TestUDTF-spark-26560.jar"
+        val jarURL = s"file://${System.getProperty("user.dir")}/$jarPath"
+
+        sql(
+          s"""
+             |CREATE FUNCTION udtf_count3
+             |AS 'org.apache.hadoop.hive.contrib.udtf.example.GenericUDTFCount3'
+             |USING JAR '$jarURL'
+          """.stripMargin)
+
+        assert(Thread.currentThread().getContextClassLoader eq sparkClassLoader)
+
+        // JAR will be loaded at first usage, and it will change the current thread's
+        // context classloader to jar classloader in sharedState.
+        // See SessionState.addJar for details.
+        checkAnswer(
+          sql("SELECT udtf_count3(a) FROM (SELECT 1 AS a FROM src LIMIT 3) t"),
+          Row(3) :: Row(3) :: Row(3) :: Nil)
+
+        assert(Thread.currentThread().getContextClassLoader ne sparkClassLoader)
+        assert(Thread.currentThread().getContextClassLoader eq
+          spark.sqlContext.sharedState.jarClassLoader)
+
+        // Roll back to the original classloader and run query again.
+        Thread.currentThread().setContextClassLoader(sparkClassLoader)
 
 Review comment:
   Just updated.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569467521
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115896/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r361788588
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
 ##########
 @@ -66,49 +66,52 @@ private[sql] class HiveSessionCatalog(
       name: String,
       clazz: Class[_],
       input: Seq[Expression]): Expression = {
-
-    Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
-      var udfExpr: Option[Expression] = None
-      try {
-        // When we instantiate hive UDF wrapper class, we may throw exception if the input
-        // expressions don't satisfy the hive UDF, such as type mismatch, input number
-        // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
-        if (classOf[UDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[AbstractGenericUDAFResolver].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[UDAF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(
-            name,
-            new HiveFunctionWrapper(clazz.getName),
-            input,
-            isUDAFBridgeRequired = true))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDTF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.asInstanceOf[HiveGenericUDTF].elementSchema // Force it to check data types.
+    // Current thread context classloader may not be the one loaded the class. Need to switch
+    // context classloader to initialize instance properly.
+    Utils.withContextClassLoader(clazz.getClassLoader) {
 
 Review comment:
   I've sought the existing usages of `Utils.withContextClassLoader` and they don't check the current classloader and just call it. Is this in critical path?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569378969
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569274182
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20654/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569378971
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20671/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r361914679
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
 ##########
 @@ -2491,4 +2492,47 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       }
     }
   }
+
+  test("SPARK-26560 Spark should be able to run Hive UDF using jar regardless of " +
+    "current thread context classloader") {
+    // force to use Spark classloader as other test (even in other test suites) may change the
+    // current thread's context classloader to jar classloader
+    Utils.withContextClassLoader(Utils.getSparkClassLoader) {
+      withUserDefinedFunction("udtf_count3" -> false) {
+        val sparkClassLoader = Thread.currentThread().getContextClassLoader
+
+        // This jar file should not be placed to the classpath; GenericUDTFCount3 is slightly
+        // modified version of GenericUDTFCount2 in hive/contrib, which emits the count for
+        // three times.
+        val jarPath = "src/test/noclasspath/TestUDTF-spark-26560.jar"
+        val jarURL = s"file://${System.getProperty("user.dir")}/$jarPath"
+
+        sql(
+          s"""
+             |CREATE FUNCTION udtf_count3
+             |AS 'org.apache.hadoop.hive.contrib.udtf.example.GenericUDTFCount3'
+             |USING JAR '$jarURL'
+          """.stripMargin)
+
+        assert(Thread.currentThread().getContextClassLoader eq sparkClassLoader)
+
+        // JAR will be loaded at first usage, and it will change the current thread's
+        // context classloader to jar classloader in sharedState.
+        // See SessionState.addJar for details.
+        checkAnswer(
+          sql("SELECT udtf_count3(a) FROM (SELECT 1 AS a FROM src LIMIT 3) t"),
+          Row(3) :: Row(3) :: Row(3) :: Nil)
+
+        assert(Thread.currentThread().getContextClassLoader ne sparkClassLoader)
+        assert(Thread.currentThread().getContextClassLoader eq
+          spark.sqlContext.sharedState.jarClassLoader)
+
+        // Roll back to the original classloader and run query again.
+        Thread.currentThread().setContextClassLoader(sparkClassLoader)
 
 Review comment:
   Do you mean even if we set back the class loader, this test still fails without this patch?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569506744
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569463115
 
 
   **[Test build #115896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115896/testReport)** for PR 27025 at commit [`418e027`](https://github.com/apache/spark/commit/418e027a345875c879b90408699c0b987929cc2c).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r397597269
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
 ##########
 @@ -66,49 +66,52 @@ private[sql] class HiveSessionCatalog(
       name: String,
       clazz: Class[_],
       input: Seq[Expression]): Expression = {
-
-    Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
-      var udfExpr: Option[Expression] = None
-      try {
-        // When we instantiate hive UDF wrapper class, we may throw exception if the input
-        // expressions don't satisfy the hive UDF, such as type mismatch, input number
-        // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
-        if (classOf[UDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[AbstractGenericUDAFResolver].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[UDAF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(
-            name,
-            new HiveFunctionWrapper(clazz.getName),
-            input,
-            isUDAFBridgeRequired = true))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDTF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.asInstanceOf[HiveGenericUDTF].elementSchema // Force it to check data types.
+    // Current thread context classloader may not be the one loaded the class. Need to switch
+    // context classloader to initialize instance properly.
+    Utils.withContextClassLoader(clazz.getClassLoader) {
+      Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
+        var udfExpr: Option[Expression] = None
+        try {
+          // When we instantiate hive UDF wrapper class, we may throw exception if the input
+          // expressions don't satisfy the hive UDF, such as type mismatch, input number
+          // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
+          if (classOf[UDF].isAssignableFrom(clazz)) {
+            udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
+            udfExpr.get.dataType // Force it to check input data types.
 
 Review comment:
   yup, `HiveSimpleUDF` needs to load class when `dataType` is first called. So even if we load the class here in `HiveSessionCatalog`, but once `HiveSimpleUDF` is copied during transformation, it needs to load class again.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569496584
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20703/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r399612923
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
 ##########
 @@ -66,49 +66,52 @@ private[sql] class HiveSessionCatalog(
       name: String,
       clazz: Class[_],
       input: Seq[Expression]): Expression = {
-
-    Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
-      var udfExpr: Option[Expression] = None
-      try {
-        // When we instantiate hive UDF wrapper class, we may throw exception if the input
-        // expressions don't satisfy the hive UDF, such as type mismatch, input number
-        // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
-        if (classOf[UDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[AbstractGenericUDAFResolver].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[UDAF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(
-            name,
-            new HiveFunctionWrapper(clazz.getName),
-            input,
-            isUDAFBridgeRequired = true))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDTF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.asInstanceOf[HiveGenericUDTF].elementSchema // Force it to check data types.
+    // Current thread context classloader may not be the one loaded the class. Need to switch
+    // context classloader to initialize instance properly.
+    Utils.withContextClassLoader(clazz.getClassLoader) {
+      Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
+        var udfExpr: Option[Expression] = None
+        try {
+          // When we instantiate hive UDF wrapper class, we may throw exception if the input
+          // expressions don't satisfy the hive UDF, such as type mismatch, input number
+          // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
+          if (classOf[UDF].isAssignableFrom(clazz)) {
+            udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
+            udfExpr.get.dataType // Force it to check input data types.
 
 Review comment:
   I figured out above code doesn't give error - HiveFunctionWrapper stores `instance` which is copied in `makeCopy()` - so once the instance is created it doesn't seems to require changing classloader.
   
   That said, below code gives error:
   
   ```
   // uses classloader which loads clazz
   val udf = HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input)
   // make sure HiveFunctionWrapper.createFunction is not called here
   
   // change classloader which doesn't load clazz
   val newUdf = udf.makeCopy(udf.productIterator.map(_.asInstanceOf[AnyRef]).toArray)
   newUdf.dataType
   ``` 
   
   Interestingly, like below, if we do makeCopy with classloader which loads clazz, it also doesn't give any error:
   
   ```
   // uses classloader which loads clazz
   val udf = HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input)
   // make sure HiveFunctionWrapper.createFunction is not called here
   
   val newUdf = udf.makeCopy(udf.productIterator.map(_.asInstanceOf[AnyRef]).toArray)
   // change classloader which doesn't load clazz
   newUdf.dataType
   ```
   
   we force call `.dataType` after creating HiveXXXUDF, so if my understanding is correct it won't matter.
   
   Could you please check whether my observation is correct, or please let me know if I'm missing something?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569496581
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r361917338
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
 ##########
 @@ -2491,4 +2492,47 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       }
     }
   }
+
+  test("SPARK-26560 Spark should be able to run Hive UDF using jar regardless of " +
+    "current thread context classloader") {
+    // force to use Spark classloader as other test (even in other test suites) may change the
+    // current thread's context classloader to jar classloader
+    Utils.withContextClassLoader(Utils.getSparkClassLoader) {
+      withUserDefinedFunction("udtf_count3" -> false) {
+        val sparkClassLoader = Thread.currentThread().getContextClassLoader
+
+        // This jar file should not be placed to the classpath; GenericUDTFCount3 is slightly
+        // modified version of GenericUDTFCount2 in hive/contrib, which emits the count for
+        // three times.
+        val jarPath = "src/test/noclasspath/TestUDTF-spark-26560.jar"
+        val jarURL = s"file://${System.getProperty("user.dir")}/$jarPath"
+
+        sql(
+          s"""
+             |CREATE FUNCTION udtf_count3
+             |AS 'org.apache.hadoop.hive.contrib.udtf.example.GenericUDTFCount3'
+             |USING JAR '$jarURL'
+          """.stripMargin)
+
+        assert(Thread.currentThread().getContextClassLoader eq sparkClassLoader)
+
+        // JAR will be loaded at first usage, and it will change the current thread's
+        // context classloader to jar classloader in sharedState.
+        // See SessionState.addJar for details.
+        checkAnswer(
+          sql("SELECT udtf_count3(a) FROM (SELECT 1 AS a FROM src LIMIT 3) t"),
+          Row(3) :: Row(3) :: Row(3) :: Nil)
+
+        assert(Thread.currentThread().getContextClassLoader ne sparkClassLoader)
+        assert(Thread.currentThread().getContextClassLoader eq
+          spark.sqlContext.sharedState.jarClassLoader)
+
+        // Roll back to the original classloader and run query again.
+        Thread.currentThread().setContextClassLoader(sparkClassLoader)
 
 Review comment:
   Yes, because the jar is loaded in jarClassloader. (Not "even if". It's needed to fail the test: commenting this line would make the test pass.)
   
   If we don't change the thread context classloader manually, current thread context classloader might be still likely jarClassloader, but it can be changed (one example is spark-shell which will roll back context classloader per prompt if I understand correctly). Here we mimic the behavior of `spark-shell`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569682861
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569496581
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-570138151
 
 
   thanks, merging to master!
   
   @HeartSaVioR can you open another PR for 2.4?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569682587
 
 
   **[Test build #115953 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115953/testReport)** for PR 27025 at commit [`39a2171`](https://github.com/apache/spark/commit/39a2171d361f38b54640c07ccb8990aa4c204238).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r362005817
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
 ##########
 @@ -66,49 +66,52 @@ private[sql] class HiveSessionCatalog(
       name: String,
       clazz: Class[_],
       input: Seq[Expression]): Expression = {
-
-    Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
-      var udfExpr: Option[Expression] = None
-      try {
-        // When we instantiate hive UDF wrapper class, we may throw exception if the input
-        // expressions don't satisfy the hive UDF, such as type mismatch, input number
-        // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
-        if (classOf[UDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[AbstractGenericUDAFResolver].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[UDAF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(
-            name,
-            new HiveFunctionWrapper(clazz.getName),
-            input,
-            isUDAFBridgeRequired = true))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDTF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.asInstanceOf[HiveGenericUDTF].elementSchema // Force it to check data types.
+    // Current thread context classloader may not be the one loaded the class. Need to switch
+    // context classloader to initialize instance properly.
+    Utils.withContextClassLoader(clazz.getClassLoader) {
 
 Review comment:
   is it guaranteed that `clazz.getClassLoader` is the `sharedState.jarClassLoader`?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan closed pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r399617719
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
 ##########
 @@ -66,49 +66,52 @@ private[sql] class HiveSessionCatalog(
       name: String,
       clazz: Class[_],
       input: Seq[Expression]): Expression = {
-
-    Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
-      var udfExpr: Option[Expression] = None
-      try {
-        // When we instantiate hive UDF wrapper class, we may throw exception if the input
-        // expressions don't satisfy the hive UDF, such as type mismatch, input number
-        // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
-        if (classOf[UDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[AbstractGenericUDAFResolver].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[UDAF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(
-            name,
-            new HiveFunctionWrapper(clazz.getName),
-            input,
-            isUDAFBridgeRequired = true))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDTF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.asInstanceOf[HiveGenericUDTF].elementSchema // Force it to check data types.
+    // Current thread context classloader may not be the one loaded the class. Need to switch
+    // context classloader to initialize instance properly.
+    Utils.withContextClassLoader(clazz.getClassLoader) {
+      Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
+        var udfExpr: Option[Expression] = None
+        try {
+          // When we instantiate hive UDF wrapper class, we may throw exception if the input
+          // expressions don't satisfy the hive UDF, such as type mismatch, input number
+          // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
+          if (classOf[UDF].isAssignableFrom(clazz)) {
+            udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
+            udfExpr.get.dataType // Force it to check input data types.
 
 Review comment:
   The experimental UT code I used is below (added to SQLQuerySuite.scala) :
   
   ```
   test("SPARK-26560 ...experimenting Wenchen's comment...") {
       // force to use Spark classloader as other test (even in other test suites) may change the
       // current thread's context classloader to jar classloader
       Utils.withContextClassLoader(Utils.getSparkClassLoader) {
         withUserDefinedFunction("udtf_count3" -> false) {
           val sparkClassLoader = Thread.currentThread().getContextClassLoader
   
           // This jar file should not be placed to the classpath; GenericUDTFCount3 is slightly
           // modified version of GenericUDTFCount2 in hive/contrib, which emits the count for
           // three times.
           val jarPath = "src/test/noclasspath/TestUDTF-spark-26560.jar"
           val jarURL = s"file://${System.getProperty("user.dir")}/$jarPath"
   
           val className = "org.apache.hadoop.hive.contrib.udtf.example.GenericUDTFCount3"
           sql(
             s"""
                |CREATE FUNCTION udtf_count3
                |AS '$className'
                |USING JAR '$jarURL'
             """.stripMargin)
   
           assert(Thread.currentThread().getContextClassLoader eq sparkClassLoader)
   
           // JAR will be loaded at first usage, and it will change the current thread's
           // context classloader to jar classloader in sharedState.
           // See SessionState.addJar for details.
           sql("SELECT udtf_count3(a) FROM (SELECT 1 AS a FROM src LIMIT 3) t")
   
           assert(Thread.currentThread().getContextClassLoader ne sparkClassLoader)
           assert(Thread.currentThread().getContextClassLoader eq
             spark.sqlContext.sharedState.jarClassLoader)
   
           // uses classloader which loads clazz
           val name = "default.udtf_count3"
   
           val input = Array(AttributeReference("a", IntegerType, nullable = false)())
           val udf = HiveGenericUDTF(name, new HiveFunctionWrapper(className), input)
           // FIXME: uncommenting below line will lead test passing
   //        udf.dataType
   
           // Roll back to the original classloader and run query again. Without this line, the test
           // would pass, as thread's context classloader is changed to jar classloader. But thread
           // context classloader can be changed from others as well which would fail the query; one
           // example is spark-shell, which thread context classloader rolls back automatically. This
           // mimics the behavior of spark-shell.
           Thread.currentThread().setContextClassLoader(sparkClassLoader)
   
           // FIXME: doing this "within" the context classloader which loads the UDF class will
           //   lead test passing even we comment out udf.dataType
           val newUdf = udf.makeCopy(udf.productIterator.map(_.asInstanceOf[AnyRef]).toArray)
   
           newUdf.dataType
         }
       }
     }
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569292003
 
 
   **[Test build #115861 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115861/testReport)** for PR 27025 at commit [`02188e0`](https://github.com/apache/spark/commit/02188e07a9d74ef7422a23db4351e49faa0006b2).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569651493
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20746/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#discussion_r400618527
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
 ##########
 @@ -66,49 +66,52 @@ private[sql] class HiveSessionCatalog(
       name: String,
       clazz: Class[_],
       input: Seq[Expression]): Expression = {
-
-    Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
-      var udfExpr: Option[Expression] = None
-      try {
-        // When we instantiate hive UDF wrapper class, we may throw exception if the input
-        // expressions don't satisfy the hive UDF, such as type mismatch, input number
-        // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
-        if (classOf[UDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[AbstractGenericUDAFResolver].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[UDAF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveUDAFFunction(
-            name,
-            new HiveFunctionWrapper(clazz.getName),
-            input,
-            isUDAFBridgeRequired = true))
-          udfExpr.get.dataType // Force it to check input data types.
-        } else if (classOf[GenericUDTF].isAssignableFrom(clazz)) {
-          udfExpr = Some(HiveGenericUDTF(name, new HiveFunctionWrapper(clazz.getName), input))
-          udfExpr.get.asInstanceOf[HiveGenericUDTF].elementSchema // Force it to check data types.
+    // Current thread context classloader may not be the one loaded the class. Need to switch
+    // context classloader to initialize instance properly.
+    Utils.withContextClassLoader(clazz.getClassLoader) {
+      Try(super.makeFunctionExpression(name, clazz, input)).getOrElse {
+        var udfExpr: Option[Expression] = None
+        try {
+          // When we instantiate hive UDF wrapper class, we may throw exception if the input
+          // expressions don't satisfy the hive UDF, such as type mismatch, input number
+          // mismatch, etc. Here we catch the exception and throw AnalysisException instead.
+          if (classOf[UDF].isAssignableFrom(clazz)) {
+            udfExpr = Some(HiveSimpleUDF(name, new HiveFunctionWrapper(clazz.getName), input))
+            udfExpr.get.dataType // Force it to check input data types.
 
 Review comment:
   Oh OK. I missed the case we don't cache the function. Thanks for the pointer!
   I'll try to reproduce the finding, and fix it without touching assumption.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569292096
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115861/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569506748
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115913/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569651493
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20746/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569387154
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115880/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569378969
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569387082
 
 
   **[Test build #115880 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115880/testReport)** for PR 27025 at commit [`b801ac5`](https://github.com/apache/spark/commit/b801ac57c04d6d8384f735b79dfaefa5bee77ba2).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27025: [SPARK-26560][SQL] Spark should be able to run Hive UDF using jar regardless of current thread context classloader
URL: https://github.com/apache/spark/pull/27025#issuecomment-569281209
 
 
   I mentioned Hive UDF as I haven't heard about creating Spark UDF function using jar. Please let me know if that's not the case.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org