You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2016/03/29 14:33:33 UTC

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/12036

    [SPARK-14123][SQL] Implement function related DDL commands

    ## What changes were proposed in this pull request?
    JIRA: https://issues.apache.org/jira/browse/SPARK-14123
    
    This PR adds the support of native DDL commands to create and drop (temporary) functions.
    
    For temporary function, these DDL commands call `FunctionRegistry` related APIs. This PR modifies current `FunctionRegistry` and `HiveFunctionRegistry` APIs to have proper temporary function support.
    
    For permanent function, we simply calls `Catalog` APIs.
    
    ## How was this patch tested?
    Existing tests.
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 native-ddl-function

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12036.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12036
    
----
commit 7d001843f6a161298702aaf20079f1611b18141b
Author: Liang-Chi Hsieh <si...@tw.ibm.com>
Date:   2016-03-27T13:15:54Z

    init import.

commit 9af70af9cd2dfdac0fa36b2a1440dc14b88d9ddc
Author: Liang-Chi Hsieh <si...@tw.ibm.com>
Date:   2016-03-29T04:14:40Z

    Merge remote-tracking branch 'upstream/master' into native-ddl-function

commit 35ad7ae14ebff693a8e15dd231eb69be7061402f
Author: Liang-Chi Hsieh <si...@tw.ibm.com>
Date:   2016-03-29T12:10:19Z

    Fix hive temp function feature.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203257996
  
    **[Test build #54490 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54490/consoleFull)** for PR 12036 at commit [`314c4db`](https://github.com/apache/spark/commit/314c4db851da5e832fb995f5538e66d4c1ddc89a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class CatalogFunction(`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-204203643
  
    **[Test build #54680 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54680/consoleFull)** for PR 12036 at commit [`05709f0`](https://github.com/apache/spark/commit/05709f09cfa5fb171e926c839929135b1cda112e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57831326
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -86,13 +87,50 @@ case class DescribeDatabase(
         extended: Boolean)(sql: String)
       extends NativeDDLCommand(sql) with Logging
     
    +/**
    + * The DDL command that creates a function.
    + * alias: the class name that implements the created function.
    + * resources: Jars, files, or archives which need to be added to the environment when the function
    + *            is referenced for the first time by a session.
    + * isTemp: indicates if it is a temporary function.
    + */
     case class CreateFunction(
         databaseName: Option[String],
         functionName: String,
         alias: String,
         resources: Seq[(String, String)],
         isTemp: Boolean)(sql: String)
    -  extends NativeDDLCommand(sql) with Logging
    +  extends NativeDDLCommand(sql) with Logging {
    +
    +  override def run(sqlContext: SQLContext): Seq[Row] = {
    +    val func = FunctionIdentifier(functionName, databaseName)
    +    val catalogFunc = CatalogFunction(func, alias)
    +    if (isTemp) {
    +      // Set `ignoreIfExists` to false, so if the temporary function already exists,
    +      // an exception will be thrown.
    +      val (info, builder) =
    +        sqlContext.sessionState.functionRegistry.getFunctionBuilderAndInfo(functionName, alias)
    +      sqlContext.sessionState.catalog.createTempFunction(functionName, info, builder, false)
    +    } else {
    +      // Check if the function to create is already existing. If so, throw exception.
    +      var funcExisting: Boolean = true
    +      try {
    +        if (sqlContext.sessionState.catalog.getFunction(func) == null) {
    +          funcExisting = false
    +        }
    +      } catch {
    +        case _: NoSuchFunctionException => funcExisting = false
    +      }
    --- End diff --
    
    yah. Got it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57963712
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala ---
    @@ -55,61 +57,133 @@ private[hive] class HiveFunctionRegistry(
         }
       }
     
    -  override def lookupFunction(name: String, children: Seq[Expression]): Expression = {
    -    Try(underlying.lookupFunction(name, children)).getOrElse {
    -      // We only look it up to see if it exists, but do not include it in the HiveUDF since it is
    -      // not always serializable.
    -      val functionInfo: FunctionInfo =
    -        Option(getFunctionInfo(name.toLowerCase)).getOrElse(
    -          throw new AnalysisException(s"undefined function $name"))
    +  def loadHivePermanentFunction(name: String): Option[CatalogFunction] = {
    +    val databaseName = sessionStage.catalog.getCurrentDatabase
    +    val func = FunctionIdentifier(name, Option(databaseName))
    +    val catalogFunc =
    +      if (sessionStage.catalog.listFunctions(databaseName, name).size != 0) {
    +        Some(sessionStage.catalog.getFunction(func))
    +      } else {
    +        None
    +      }
    +    catalogFunc.map(_.resources.foreach { resource =>
    +      resource._1.toLowerCase match {
    +        case "jar" => sessionStage.ctx.addJar(resource._2)
    +        case _ =>
    +          sessionStage.ctx.runSqlHive(s"ADD FILE ${resource._2}")
    +          sessionStage.ctx.sparkContext.addFile(resource._2)
    +      }
    +    })
    +    catalogFunc
    +  }
     
    -      val functionClassName = functionInfo.getFunctionClass.getName
    +  override def makeFunctionBuilderAndInfo(
    +      name: String,
    +      functionClassName: String): (ExpressionInfo, FunctionBuilder) = {
    +    val hiveUDFWrapper = new HiveFunctionWrapper(functionClassName)
    +    val hiveUDFClass = hiveUDFWrapper.createFunction().getClass
    +    val info = new ExpressionInfo(functionClassName, name)
    +    val builder = makeHiveUDFBuilder(name, functionClassName, hiveUDFClass, hiveUDFWrapper)
    +    (info, builder)
    +  }
     
    -      // When we instantiate hive UDF wrapper class, we may throw exception if the input expressions
    -      // don't satisfy the hive UDF, such as type mismatch, input number mismatch, etc. Here we
    -      // catch the exception and throw AnalysisException instead.
    +  /**
    +   * Generates a Spark FunctionBuilder for a Hive UDF which is specified by a given classname.
    +   */
    +  def makeHiveUDFBuilder(
    +      name: String,
    +      functionClassName: String,
    +      hiveUDFClass: Class[_],
    +      hiveUDFWrapper: HiveFunctionWrapper): FunctionBuilder = {
    +    val builder = (children: Seq[Expression]) => {
           try {
    -        if (classOf[GenericUDFMacro].isAssignableFrom(functionInfo.getFunctionClass)) {
    +        if (classOf[GenericUDFMacro].isAssignableFrom(hiveUDFClass)) {
               val udf = HiveGenericUDF(
    -            name, new HiveFunctionWrapper(functionClassName, functionInfo.getGenericUDF), children)
    -          udf.dataType // Force it to check input data types.
    +            name, hiveUDFWrapper, children)
    +          if (udf.resolved) {
    +            udf.dataType // Force it to check input data types.
    +          }
               udf
    -        } else if (classOf[UDF].isAssignableFrom(functionInfo.getFunctionClass)) {
    -          val udf = HiveSimpleUDF(name, new HiveFunctionWrapper(functionClassName), children)
    -          udf.dataType // Force it to check input data types.
    +        } else if (classOf[UDF].isAssignableFrom(hiveUDFClass)) {
    +          val udf = HiveSimpleUDF(name, hiveUDFWrapper, children)
    +          if (udf.resolved) {
    +            udf.dataType // Force it to check input data types.
    +          }
               udf
    -        } else if (classOf[GenericUDF].isAssignableFrom(functionInfo.getFunctionClass)) {
    -          val udf = HiveGenericUDF(name, new HiveFunctionWrapper(functionClassName), children)
    -          udf.dataType // Force it to check input data types.
    +        } else if (classOf[GenericUDF].isAssignableFrom(hiveUDFClass)) {
    +          val udf = HiveGenericUDF(name, hiveUDFWrapper, children)
    +          if (udf.resolved) {
    +            udf.dataType // Force it to check input data types.
    +          }
               udf
             } else if (
    -          classOf[AbstractGenericUDAFResolver].isAssignableFrom(functionInfo.getFunctionClass)) {
    -          val udaf = HiveUDAFFunction(name, new HiveFunctionWrapper(functionClassName), children)
    -          udaf.dataType // Force it to check input data types.
    +          classOf[AbstractGenericUDAFResolver].isAssignableFrom(hiveUDFClass)) {
    +          val udaf = HiveUDAFFunction(name, hiveUDFWrapper, children)
    +          if (udaf.resolved) {
    +            udaf.dataType // Force it to check input data types.
    +          }
               udaf
    -        } else if (classOf[UDAF].isAssignableFrom(functionInfo.getFunctionClass)) {
    +        } else if (classOf[UDAF].isAssignableFrom(hiveUDFClass)) {
               val udaf = HiveUDAFFunction(
    -            name, new HiveFunctionWrapper(functionClassName), children, isUDAFBridgeRequired = true)
    -          udaf.dataType  // Force it to check input data types.
    +            name, hiveUDFWrapper, children, isUDAFBridgeRequired = true)
    +          if (udaf.resolved) {
    +            udaf.dataType  // Force it to check input data types.
    +          }
               udaf
    -        } else if (classOf[GenericUDTF].isAssignableFrom(functionInfo.getFunctionClass)) {
    -          val udtf = HiveGenericUDTF(name, new HiveFunctionWrapper(functionClassName), children)
    -          udtf.elementTypes // Force it to check input data types.
    +        } else if (classOf[GenericUDTF].isAssignableFrom(hiveUDFClass)) {
    +          val udtf = HiveGenericUDTF(name, hiveUDFWrapper, children)
    +          if (udtf.resolved) {
    +            udtf.elementTypes // Force it to check input data types.
    +          }
               udtf
             } else {
    -          throw new AnalysisException(s"No handler for udf ${functionInfo.getFunctionClass}")
    +          throw new AnalysisException(s"No handler for udf ${hiveUDFClass}")
             }
           } catch {
             case analysisException: AnalysisException =>
    -          // If the exception is an AnalysisException, just throw it.
               throw analysisException
             case throwable: Throwable =>
    -          // If there is any other error, we throw an AnalysisException.
    -          val errorMessage = s"No handler for Hive udf ${functionInfo.getFunctionClass} " +
    +          val errorMessage = s"No handler for Hive udf ${hiveUDFClass} " +
                 s"because: ${throwable.getMessage}."
               throw new AnalysisException(errorMessage)
           }
         }
    +    builder
    +  }
    +
    +  override def lookupFunction(name: String, children: Seq[Expression]): Expression = {
    +    val builder = underlying.lookupFunctionBuilder(name)
    +    if (builder.isDefined) {
    +      builder.get(children)
    +    } else {
    +      // We only look it up to see if it exists, but do not include it in the HiveUDF since it is
    +      // not always serializable.
    +      val optFunctionInfo = Option(getFunctionInfo(name.toLowerCase))
    +      if (optFunctionInfo.isEmpty) {
    +        val catalogFunc = loadHivePermanentFunction(name).getOrElse(
    --- End diff --
    
    @viirya can you explain what we're doing here? If `getFunctionInfo` is null, why will session catalog know about the function? In the old code we just threw an exception if `getFunctionInfo` is null. Why do we try to do something else here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-202896694
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203354843
  
    The current codebase is not enough to correctly support Hive permanent function. I just updated this to support it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57960505
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -175,13 +176,49 @@ case class DescribeDatabase(
       }
     }
     
    +/**
    + * The DDL command that creates a function.
    + * alias: the class name that implements the created function.
    + * resources: Jars, files, or archives which need to be added to the environment when the function
    + *            is referenced for the first time by a session.
    + * isTemp: indicates if it is a temporary function.
    + */
     case class CreateFunction(
         databaseName: Option[String],
         functionName: String,
         alias: String,
         resources: Seq[(String, String)],
         isTemp: Boolean)(sql: String)
    -  extends NativeDDLCommand(sql) with Logging
    +  extends NativeDDLCommand(sql) with Logging {
    +
    +  override def run(sqlContext: SQLContext): Seq[Row] = {
    +    val func = FunctionIdentifier(functionName, databaseName)
    +    val catalogFunc = CatalogFunction(func, alias, resources)
    +    if (isTemp) {
    +      // Set `ignoreIfExists` to false, so if the temporary function already exists,
    +      // an exception will be thrown.
    +      val (info, builder) =
    +        sqlContext.sessionState.functionRegistry.makeFunctionBuilderAndInfo(functionName, alias)
    +      sqlContext.sessionState.catalog.createTempFunction(functionName, info, builder, false)
    +    } else {
    +      // Check if the function to create is already existing. If so, throw exception.
    +      var funcExisting: Boolean =
    --- End diff --
    
    (and then you can use the same method in `DropFunction` as well)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203159161
  
    @viirya Thanks for the clarification. That said, I think the interfaces here can be simplified significantly. For instance, we don't need to expose `ExpressionInfo` since we can just get that from `FunctionBuilder`, then we don't need to change anything in `FunctionRegistry`.
    
    I did a first part of the clean up in #12051. With those changes, I think implementing the function DDLs will be pretty straightforward. The only additional thing that needs to be implemented is adding the resource list to `CatalogFunction`, but that's also simple to do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203900130
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54617/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203850041
  
    **[Test build #54612 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54612/consoleFull)** for PR 12036 at commit [`2cab41c`](https://github.com/apache/spark/commit/2cab41cdc254764ed976842475ce2c2547b0ddcc).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57960777
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionState.scala ---
    @@ -29,7 +29,7 @@ import org.apache.spark.sql.internal.{SessionState, SQLConf}
     /**
      * A class that holds all session-specific state in a given [[HiveContext]].
      */
    -private[hive] class HiveSessionState(ctx: HiveContext) extends SessionState(ctx) {
    +private[hive] class HiveSessionState(val ctx: HiveContext) extends SessionState(ctx) {
    --- End diff --
    
    please don't do this; if someone needs the context then we should explicitly pass in the context itself instead of getting it through the session state.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57960666
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -193,7 +230,28 @@ case class DropFunction(
         functionName: String,
         ifExists: Boolean,
         isTemp: Boolean)(sql: String)
    -  extends NativeDDLCommand(sql) with Logging
    +  extends NativeDDLCommand(sql) with Logging {
    +
    +  override def run(sqlContext: SQLContext): Seq[Row] = {
    +    if (isTemp) {
    +      require(databaseName.isEmpty,
    +        "attempted to drop a temporary function while specifying a database")
    +      sqlContext.sessionState.catalog.dropTempFunction(functionName, ifExists)
    +    } else {
    +      val func = FunctionIdentifier(functionName, databaseName)
    +      if (!ifExists) {
    +        // getFunction can throw NoSuchFunctionException itself, or return null.
    +        val existingFunc = sqlContext.sessionState.catalog.getFunction(func)
    +        if (existingFunc == null) {
    +          val dbName = databaseName.getOrElse(sqlContext.sessionState.catalog.getCurrentDatabase)
    +          throw new NoSuchFunctionException(dbName, functionName)
    --- End diff --
    
    we should just be consistent and throw `AnalysisException` here. Once you added `functionExists` to the session catalog then you can just check on that instead of trying to get the function here yourself.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203250531
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203394806
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203228057
  
    **[Test build #54487 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54487/consoleFull)** for PR 12036 at commit [`e05b108`](https://github.com/apache/spark/commit/e05b108f9aeedf5d8e3fb5c7145940619811a858).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r58018075
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala ---
    @@ -55,61 +57,133 @@ private[hive] class HiveFunctionRegistry(
         }
       }
     
    -  override def lookupFunction(name: String, children: Seq[Expression]): Expression = {
    -    Try(underlying.lookupFunction(name, children)).getOrElse {
    -      // We only look it up to see if it exists, but do not include it in the HiveUDF since it is
    -      // not always serializable.
    -      val functionInfo: FunctionInfo =
    -        Option(getFunctionInfo(name.toLowerCase)).getOrElse(
    -          throw new AnalysisException(s"undefined function $name"))
    +  def loadHivePermanentFunction(name: String): Option[CatalogFunction] = {
    +    val databaseName = sessionStage.catalog.getCurrentDatabase
    +    val func = FunctionIdentifier(name, Option(databaseName))
    +    val catalogFunc =
    +      if (sessionStage.catalog.listFunctions(databaseName, name).size != 0) {
    +        Some(sessionStage.catalog.getFunction(func))
    +      } else {
    +        None
    +      }
    +    catalogFunc.map(_.resources.foreach { resource =>
    +      resource._1.toLowerCase match {
    +        case "jar" => sessionStage.ctx.addJar(resource._2)
    +        case _ =>
    +          sessionStage.ctx.runSqlHive(s"ADD FILE ${resource._2}")
    +          sessionStage.ctx.sparkContext.addFile(resource._2)
    +      }
    +    })
    +    catalogFunc
    +  }
     
    -      val functionClassName = functionInfo.getFunctionClass.getName
    +  override def makeFunctionBuilderAndInfo(
    +      name: String,
    +      functionClassName: String): (ExpressionInfo, FunctionBuilder) = {
    +    val hiveUDFWrapper = new HiveFunctionWrapper(functionClassName)
    +    val hiveUDFClass = hiveUDFWrapper.createFunction().getClass
    +    val info = new ExpressionInfo(functionClassName, name)
    +    val builder = makeHiveUDFBuilder(name, functionClassName, hiveUDFClass, hiveUDFWrapper)
    +    (info, builder)
    +  }
     
    -      // When we instantiate hive UDF wrapper class, we may throw exception if the input expressions
    -      // don't satisfy the hive UDF, such as type mismatch, input number mismatch, etc. Here we
    -      // catch the exception and throw AnalysisException instead.
    +  /**
    +   * Generates a Spark FunctionBuilder for a Hive UDF which is specified by a given classname.
    +   */
    +  def makeHiveUDFBuilder(
    +      name: String,
    +      functionClassName: String,
    +      hiveUDFClass: Class[_],
    +      hiveUDFWrapper: HiveFunctionWrapper): FunctionBuilder = {
    +    val builder = (children: Seq[Expression]) => {
           try {
    -        if (classOf[GenericUDFMacro].isAssignableFrom(functionInfo.getFunctionClass)) {
    +        if (classOf[GenericUDFMacro].isAssignableFrom(hiveUDFClass)) {
               val udf = HiveGenericUDF(
    -            name, new HiveFunctionWrapper(functionClassName, functionInfo.getGenericUDF), children)
    -          udf.dataType // Force it to check input data types.
    +            name, hiveUDFWrapper, children)
    +          if (udf.resolved) {
    +            udf.dataType // Force it to check input data types.
    +          }
               udf
    -        } else if (classOf[UDF].isAssignableFrom(functionInfo.getFunctionClass)) {
    -          val udf = HiveSimpleUDF(name, new HiveFunctionWrapper(functionClassName), children)
    -          udf.dataType // Force it to check input data types.
    +        } else if (classOf[UDF].isAssignableFrom(hiveUDFClass)) {
    +          val udf = HiveSimpleUDF(name, hiveUDFWrapper, children)
    +          if (udf.resolved) {
    +            udf.dataType // Force it to check input data types.
    +          }
               udf
    -        } else if (classOf[GenericUDF].isAssignableFrom(functionInfo.getFunctionClass)) {
    -          val udf = HiveGenericUDF(name, new HiveFunctionWrapper(functionClassName), children)
    -          udf.dataType // Force it to check input data types.
    +        } else if (classOf[GenericUDF].isAssignableFrom(hiveUDFClass)) {
    +          val udf = HiveGenericUDF(name, hiveUDFWrapper, children)
    +          if (udf.resolved) {
    +            udf.dataType // Force it to check input data types.
    +          }
               udf
             } else if (
    -          classOf[AbstractGenericUDAFResolver].isAssignableFrom(functionInfo.getFunctionClass)) {
    -          val udaf = HiveUDAFFunction(name, new HiveFunctionWrapper(functionClassName), children)
    -          udaf.dataType // Force it to check input data types.
    +          classOf[AbstractGenericUDAFResolver].isAssignableFrom(hiveUDFClass)) {
    +          val udaf = HiveUDAFFunction(name, hiveUDFWrapper, children)
    +          if (udaf.resolved) {
    +            udaf.dataType // Force it to check input data types.
    +          }
               udaf
    -        } else if (classOf[UDAF].isAssignableFrom(functionInfo.getFunctionClass)) {
    +        } else if (classOf[UDAF].isAssignableFrom(hiveUDFClass)) {
               val udaf = HiveUDAFFunction(
    -            name, new HiveFunctionWrapper(functionClassName), children, isUDAFBridgeRequired = true)
    -          udaf.dataType  // Force it to check input data types.
    +            name, hiveUDFWrapper, children, isUDAFBridgeRequired = true)
    +          if (udaf.resolved) {
    +            udaf.dataType  // Force it to check input data types.
    +          }
               udaf
    -        } else if (classOf[GenericUDTF].isAssignableFrom(functionInfo.getFunctionClass)) {
    -          val udtf = HiveGenericUDTF(name, new HiveFunctionWrapper(functionClassName), children)
    -          udtf.elementTypes // Force it to check input data types.
    +        } else if (classOf[GenericUDTF].isAssignableFrom(hiveUDFClass)) {
    +          val udtf = HiveGenericUDTF(name, hiveUDFWrapper, children)
    +          if (udtf.resolved) {
    +            udtf.elementTypes // Force it to check input data types.
    +          }
               udtf
             } else {
    -          throw new AnalysisException(s"No handler for udf ${functionInfo.getFunctionClass}")
    +          throw new AnalysisException(s"No handler for udf ${hiveUDFClass}")
             }
           } catch {
             case analysisException: AnalysisException =>
    -          // If the exception is an AnalysisException, just throw it.
               throw analysisException
             case throwable: Throwable =>
    -          // If there is any other error, we throw an AnalysisException.
    -          val errorMessage = s"No handler for Hive udf ${functionInfo.getFunctionClass} " +
    +          val errorMessage = s"No handler for Hive udf ${hiveUDFClass} " +
                 s"because: ${throwable.getMessage}."
               throw new AnalysisException(errorMessage)
           }
         }
    +    builder
    +  }
    +
    +  override def lookupFunction(name: String, children: Seq[Expression]): Expression = {
    +    val builder = underlying.lookupFunctionBuilder(name)
    +    if (builder.isDefined) {
    +      builder.get(children)
    +    } else {
    +      // We only look it up to see if it exists, but do not include it in the HiveUDF since it is
    +      // not always serializable.
    +      val optFunctionInfo = Option(getFunctionInfo(name.toLowerCase))
    +      if (optFunctionInfo.isEmpty) {
    +        val catalogFunc = loadHivePermanentFunction(name).getOrElse(
    --- End diff --
    
    Without this, the functions registered with `CREATE FUNCTION` DDL command can not be used. I've added a test in `SQLQuerySuite`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57958798
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -175,13 +176,49 @@ case class DescribeDatabase(
       }
     }
     
    +/**
    + * The DDL command that creates a function.
    + * alias: the class name that implements the created function.
    + * resources: Jars, files, or archives which need to be added to the environment when the function
    + *            is referenced for the first time by a session.
    + * isTemp: indicates if it is a temporary function.
    + */
     case class CreateFunction(
         databaseName: Option[String],
         functionName: String,
         alias: String,
         resources: Seq[(String, String)],
         isTemp: Boolean)(sql: String)
    -  extends NativeDDLCommand(sql) with Logging
    +  extends NativeDDLCommand(sql) with Logging {
    +
    +  override def run(sqlContext: SQLContext): Seq[Row] = {
    +    val func = FunctionIdentifier(functionName, databaseName)
    +    val catalogFunc = CatalogFunction(func, alias, resources)
    +    if (isTemp) {
    +      // Set `ignoreIfExists` to false, so if the temporary function already exists,
    +      // an exception will be thrown.
    +      val (info, builder) =
    +        sqlContext.sessionState.functionRegistry.makeFunctionBuilderAndInfo(functionName, alias)
    +      sqlContext.sessionState.catalog.createTempFunction(functionName, info, builder, false)
    +    } else {
    +      // Check if the function to create is already existing. If so, throw exception.
    +      var funcExisting: Boolean =
    --- End diff --
    
    `val funcExists`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r58017522
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala ---
    @@ -55,61 +57,133 @@ private[hive] class HiveFunctionRegistry(
         }
       }
     
    -  override def lookupFunction(name: String, children: Seq[Expression]): Expression = {
    -    Try(underlying.lookupFunction(name, children)).getOrElse {
    -      // We only look it up to see if it exists, but do not include it in the HiveUDF since it is
    -      // not always serializable.
    -      val functionInfo: FunctionInfo =
    -        Option(getFunctionInfo(name.toLowerCase)).getOrElse(
    -          throw new AnalysisException(s"undefined function $name"))
    +  def loadHivePermanentFunction(name: String): Option[CatalogFunction] = {
    +    val databaseName = sessionStage.catalog.getCurrentDatabase
    +    val func = FunctionIdentifier(name, Option(databaseName))
    +    val catalogFunc =
    +      if (sessionStage.catalog.listFunctions(databaseName, name).size != 0) {
    +        Some(sessionStage.catalog.getFunction(func))
    +      } else {
    +        None
    +      }
    +    catalogFunc.map(_.resources.foreach { resource =>
    +      resource._1.toLowerCase match {
    +        case "jar" => sessionStage.ctx.addJar(resource._2)
    +        case _ =>
    +          sessionStage.ctx.runSqlHive(s"ADD FILE ${resource._2}")
    +          sessionStage.ctx.sparkContext.addFile(resource._2)
    +      }
    +    })
    +    catalogFunc
    +  }
     
    -      val functionClassName = functionInfo.getFunctionClass.getName
    +  override def makeFunctionBuilderAndInfo(
    +      name: String,
    +      functionClassName: String): (ExpressionInfo, FunctionBuilder) = {
    +    val hiveUDFWrapper = new HiveFunctionWrapper(functionClassName)
    +    val hiveUDFClass = hiveUDFWrapper.createFunction().getClass
    +    val info = new ExpressionInfo(functionClassName, name)
    +    val builder = makeHiveUDFBuilder(name, functionClassName, hiveUDFClass, hiveUDFWrapper)
    +    (info, builder)
    +  }
     
    -      // When we instantiate hive UDF wrapper class, we may throw exception if the input expressions
    -      // don't satisfy the hive UDF, such as type mismatch, input number mismatch, etc. Here we
    -      // catch the exception and throw AnalysisException instead.
    +  /**
    +   * Generates a Spark FunctionBuilder for a Hive UDF which is specified by a given classname.
    +   */
    +  def makeHiveUDFBuilder(
    +      name: String,
    +      functionClassName: String,
    +      hiveUDFClass: Class[_],
    +      hiveUDFWrapper: HiveFunctionWrapper): FunctionBuilder = {
    +    val builder = (children: Seq[Expression]) => {
           try {
    -        if (classOf[GenericUDFMacro].isAssignableFrom(functionInfo.getFunctionClass)) {
    +        if (classOf[GenericUDFMacro].isAssignableFrom(hiveUDFClass)) {
               val udf = HiveGenericUDF(
    -            name, new HiveFunctionWrapper(functionClassName, functionInfo.getGenericUDF), children)
    -          udf.dataType // Force it to check input data types.
    +            name, hiveUDFWrapper, children)
    +          if (udf.resolved) {
    +            udf.dataType // Force it to check input data types.
    +          }
               udf
    -        } else if (classOf[UDF].isAssignableFrom(functionInfo.getFunctionClass)) {
    -          val udf = HiveSimpleUDF(name, new HiveFunctionWrapper(functionClassName), children)
    -          udf.dataType // Force it to check input data types.
    +        } else if (classOf[UDF].isAssignableFrom(hiveUDFClass)) {
    +          val udf = HiveSimpleUDF(name, hiveUDFWrapper, children)
    +          if (udf.resolved) {
    +            udf.dataType // Force it to check input data types.
    +          }
               udf
    -        } else if (classOf[GenericUDF].isAssignableFrom(functionInfo.getFunctionClass)) {
    -          val udf = HiveGenericUDF(name, new HiveFunctionWrapper(functionClassName), children)
    -          udf.dataType // Force it to check input data types.
    +        } else if (classOf[GenericUDF].isAssignableFrom(hiveUDFClass)) {
    +          val udf = HiveGenericUDF(name, hiveUDFWrapper, children)
    +          if (udf.resolved) {
    +            udf.dataType // Force it to check input data types.
    +          }
               udf
             } else if (
    -          classOf[AbstractGenericUDAFResolver].isAssignableFrom(functionInfo.getFunctionClass)) {
    -          val udaf = HiveUDAFFunction(name, new HiveFunctionWrapper(functionClassName), children)
    -          udaf.dataType // Force it to check input data types.
    +          classOf[AbstractGenericUDAFResolver].isAssignableFrom(hiveUDFClass)) {
    +          val udaf = HiveUDAFFunction(name, hiveUDFWrapper, children)
    +          if (udaf.resolved) {
    +            udaf.dataType // Force it to check input data types.
    +          }
               udaf
    -        } else if (classOf[UDAF].isAssignableFrom(functionInfo.getFunctionClass)) {
    +        } else if (classOf[UDAF].isAssignableFrom(hiveUDFClass)) {
               val udaf = HiveUDAFFunction(
    -            name, new HiveFunctionWrapper(functionClassName), children, isUDAFBridgeRequired = true)
    -          udaf.dataType  // Force it to check input data types.
    +            name, hiveUDFWrapper, children, isUDAFBridgeRequired = true)
    +          if (udaf.resolved) {
    +            udaf.dataType  // Force it to check input data types.
    +          }
               udaf
    -        } else if (classOf[GenericUDTF].isAssignableFrom(functionInfo.getFunctionClass)) {
    -          val udtf = HiveGenericUDTF(name, new HiveFunctionWrapper(functionClassName), children)
    -          udtf.elementTypes // Force it to check input data types.
    +        } else if (classOf[GenericUDTF].isAssignableFrom(hiveUDFClass)) {
    +          val udtf = HiveGenericUDTF(name, hiveUDFWrapper, children)
    +          if (udtf.resolved) {
    +            udtf.elementTypes // Force it to check input data types.
    +          }
               udtf
             } else {
    -          throw new AnalysisException(s"No handler for udf ${functionInfo.getFunctionClass}")
    +          throw new AnalysisException(s"No handler for udf ${hiveUDFClass}")
             }
           } catch {
             case analysisException: AnalysisException =>
    -          // If the exception is an AnalysisException, just throw it.
               throw analysisException
             case throwable: Throwable =>
    -          // If there is any other error, we throw an AnalysisException.
    -          val errorMessage = s"No handler for Hive udf ${functionInfo.getFunctionClass} " +
    +          val errorMessage = s"No handler for Hive udf ${hiveUDFClass} " +
                 s"because: ${throwable.getMessage}."
               throw new AnalysisException(errorMessage)
           }
         }
    +    builder
    +  }
    +
    +  override def lookupFunction(name: String, children: Seq[Expression]): Expression = {
    +    val builder = underlying.lookupFunctionBuilder(name)
    +    if (builder.isDefined) {
    +      builder.get(children)
    +    } else {
    +      // We only look it up to see if it exists, but do not include it in the HiveUDF since it is
    +      // not always serializable.
    +      val optFunctionInfo = Option(getFunctionInfo(name.toLowerCase))
    +      if (optFunctionInfo.isEmpty) {
    +        val catalogFunc = loadHivePermanentFunction(name).getOrElse(
    --- End diff --
    
    Because we main hive permanent functions in Spark too, when `getFunctionInfo` returns null, we can get `CatalogFunction` through catalog and load necessary jars, files. Then we create `FunctionBuilder` as the same as temporary function.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-204217824
  
    @yhuai OK. Please let me know if any changes are not clear to you. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57959557
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -193,7 +230,28 @@ case class DropFunction(
         functionName: String,
         ifExists: Boolean,
         isTemp: Boolean)(sql: String)
    -  extends NativeDDLCommand(sql) with Logging
    +  extends NativeDDLCommand(sql) with Logging {
    --- End diff --
    
    Same here, this should not be `NativeDDLCommand` anymore. Then you don't need to pass in the `sql` string.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57958583
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -175,13 +176,49 @@ case class DescribeDatabase(
       }
     }
     
    +/**
    + * The DDL command that creates a function.
    + * alias: the class name that implements the created function.
    + * resources: Jars, files, or archives which need to be added to the environment when the function
    + *            is referenced for the first time by a session.
    + * isTemp: indicates if it is a temporary function.
    + */
     case class CreateFunction(
         databaseName: Option[String],
         functionName: String,
         alias: String,
         resources: Seq[(String, String)],
         isTemp: Boolean)(sql: String)
    -  extends NativeDDLCommand(sql) with Logging
    +  extends NativeDDLCommand(sql) with Logging {
    +
    +  override def run(sqlContext: SQLContext): Seq[Row] = {
    +    val func = FunctionIdentifier(functionName, databaseName)
    +    val catalogFunc = CatalogFunction(func, alias, resources)
    +    if (isTemp) {
    +      // Set `ignoreIfExists` to false, so if the temporary function already exists,
    +      // an exception will be thrown.
    +      val (info, builder) =
    +        sqlContext.sessionState.functionRegistry.makeFunctionBuilderAndInfo(functionName, alias)
    +      sqlContext.sessionState.catalog.createTempFunction(functionName, info, builder, false)
    --- End diff --
    
    please name parameter `ignoreIfExists = false`, then you don't need the comment up there


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-202961179
  
    **[Test build #54442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54442/consoleFull)** for PR 12036 at commit [`b8dda84`](https://github.com/apache/spark/commit/b8dda845dde32202c60db668270e73a0d3513309).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57943265
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -52,6 +53,15 @@ trait FunctionRegistry {
       /** Drop a function and return whether the function existed. */
       def dropFunction(name: String): Boolean
     
    +  /* Return the FunctionBuilder and ExpressionInfo for the specified function name and classname. */
    +  def makeFunctionBuilderAndInfo(
    +    name: String,
    +    functionClassName: String): (ExpressionInfo, FunctionBuilder) = {
    +    val clazz = Utils.getContextOrSparkClassLoader.loadClass(functionClassName)
    +    val (_, (info, builder)) =
    +      FunctionRegistry.expression(name, clazz.asInstanceOf[Class[Expression]])
    --- End diff --
    
    We shouldn't expose this to the user. The reflection hacks we do down there are intended to be for internal use only so we don't want the user to do something like:
    ```
    CREATE TEMPORARY FUNCTION my_func AS com.x.y.z.MyFunc
    ```
    where `MyFunc` has the special `Expression` constructors that we expect.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57944651
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala ---
    @@ -55,61 +57,133 @@ private[hive] class HiveFunctionRegistry(
         }
       }
     
    -  override def lookupFunction(name: String, children: Seq[Expression]): Expression = {
    -    Try(underlying.lookupFunction(name, children)).getOrElse {
    -      // We only look it up to see if it exists, but do not include it in the HiveUDF since it is
    -      // not always serializable.
    -      val functionInfo: FunctionInfo =
    -        Option(getFunctionInfo(name.toLowerCase)).getOrElse(
    -          throw new AnalysisException(s"undefined function $name"))
    +  def loadHivePermanentFunction(name: String): Option[CatalogFunction] = {
    +    val databaseName = sessionStage.catalog.getCurrentDatabase
    +    val func = FunctionIdentifier(name, Option(databaseName))
    +    val catalogFunc =
    +      if (sessionStage.catalog.listFunctions(databaseName, name).size != 0) {
    +        Some(sessionStage.catalog.getFunction(func))
    +      } else {
    +        None
    +      }
    +    catalogFunc.map(_.resources.foreach { resource =>
    +      resource._1.toLowerCase match {
    +        case "jar" => sessionStage.ctx.addJar(resource._2)
    +        case _ =>
    +          sessionStage.ctx.runSqlHive(s"ADD FILE ${resource._2}")
    +          sessionStage.ctx.sparkContext.addFile(resource._2)
    +      }
    +    })
    +    catalogFunc
    +  }
     
    -      val functionClassName = functionInfo.getFunctionClass.getName
    +  override def makeFunctionBuilderAndInfo(
    +      name: String,
    +      functionClassName: String): (ExpressionInfo, FunctionBuilder) = {
    +    val hiveUDFWrapper = new HiveFunctionWrapper(functionClassName)
    +    val hiveUDFClass = hiveUDFWrapper.createFunction().getClass
    +    val info = new ExpressionInfo(functionClassName, name)
    +    val builder = makeHiveUDFBuilder(name, functionClassName, hiveUDFClass, hiveUDFWrapper)
    +    (info, builder)
    +  }
     
    -      // When we instantiate hive UDF wrapper class, we may throw exception if the input expressions
    -      // don't satisfy the hive UDF, such as type mismatch, input number mismatch, etc. Here we
    -      // catch the exception and throw AnalysisException instead.
    +  /**
    +   * Generates a Spark FunctionBuilder for a Hive UDF which is specified by a given classname.
    --- End diff --
    
    nit: Instead of `Spark FunctionBuilder` just do `[[FunctionBuilder]]`. It's pretty clear we're talking about Spark here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203166453
  
    > So I think our works are duplicate in some places. Do you want me to wait for #12051 to merge first and then rebase against it? Thanks!
    
    That would be best.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya closed the pull request at:

    https://github.com/apache/spark/pull/12036


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-202948702
  
    The failed tests are due to describe function DDL command. Let me fix it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203394808
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54511/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-202874429
  
    **[Test build #54434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54434/consoleFull)** for PR 12036 at commit [`35ad7ae`](https://github.com/apache/spark/commit/35ad7ae14ebff693a8e15dd231eb69be7061402f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-204208195
  
    @viirya Thank you for working on it. I think the overall approach makes sense. Looks like we can adjust how we organize SessionState, SessionCatalog, and FunctionRegistry to improve the structure of the code (@andrewor14 also mentioned about it). I'd like to play with it based on this PR. Would you mind to hold off your future changes until I send out commits based on your current branch? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203111573
  
    @viirya I took a look at the existing code in `HiveFunctionRegistry` and I think there's some cleanup we need to do before we can implement these DDLs. In particular, temporary functions should not go through Hive (SPARK-14253). Instead, we should just create a function builder ourselves every time we get `CREATE TEMPORARY FUNCTION` and use that to call `sessionState.catalog.createTempFunction`. This ensures all temporary functions, even the ones extending Hive UDF classes, are maintained only in Spark. THEN it will be much easier to implement the DDLs in a clean way.
    
    I would recommend putting this task on hold in the mean time.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203236224
  
    **[Test build #54490 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54490/consoleFull)** for PR 12036 at commit [`314c4db`](https://github.com/apache/spark/commit/314c4db851da5e832fb995f5538e66d4c1ddc89a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203242724
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57770133
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -86,13 +87,50 @@ case class DescribeDatabase(
         extended: Boolean)(sql: String)
       extends NativeDDLCommand(sql) with Logging
     
    +/**
    + * The DDL command that creates a function.
    + * alias: the class name that implements the created function.
    + * resources: Jars, files, or archives which need to be added to the environment when the function
    + *            is referenced for the first time by a session.
    + * isTemp: indicates if it is a temporary function.
    + */
     case class CreateFunction(
         databaseName: Option[String],
         functionName: String,
         alias: String,
         resources: Seq[(String, String)],
         isTemp: Boolean)(sql: String)
    -  extends NativeDDLCommand(sql) with Logging
    +  extends NativeDDLCommand(sql) with Logging {
    +
    +  override def run(sqlContext: SQLContext): Seq[Row] = {
    +    val func = FunctionIdentifier(functionName, databaseName)
    +    val catalogFunc = CatalogFunction(func, alias)
    +    if (isTemp) {
    +      // Set `ignoreIfExists` to false, so if the temporary function already exists,
    +      // an exception will be thrown.
    +      val (info, builder) =
    +        sqlContext.sessionState.functionRegistry.getFunctionBuilderAndInfo(functionName, alias)
    +      sqlContext.sessionState.catalog.createTempFunction(functionName, info, builder, false)
    +    } else {
    +      // Check if the function to create is already existing. If so, throw exception.
    +      var funcExisting: Boolean = true
    +      try {
    +        if (sqlContext.sessionState.catalog.getFunction(func) == null) {
    +          funcExisting = false
    +        }
    +      } catch {
    +        case _: NoSuchFunctionException => funcExisting = false
    +      }
    --- End diff --
    
    this is pretty round about way to write it. You can simplify this:
    ```
    val funcExists =
      try {
        sqlContext.sessionState.catalog.getFunction(func) != null
      } catch {
        case _: NoSuchFunctionException => false
      }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203242726
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54486/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57945132
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -611,6 +615,9 @@ private[hive] class HiveClientImpl(
           .asInstanceOf[Class[_ <: org.apache.hadoop.hive.ql.io.HiveOutputFormat[_, _]]]
     
       private def toHiveFunction(f: CatalogFunction, db: String): HiveFunction = {
    +    val resourceUris = f.resources.map { resource =>
    --- End diff --
    
    `f.resources.map { case (resourceType, resourcePath) =>`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r58021704
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -557,7 +558,10 @@ private[hive] class HiveClientImpl(
       override def getFunctionOption(
           db: String,
           name: String): Option[CatalogFunction] = withHiveState {
    -    Option(client.getFunction(db, name)).map(fromHiveFunction)
    +    client.getFunction(db, name)
    +    Try {
    +      Option(client.getFunction(db, name)).map(fromHiveFunction)
    --- End diff --
    
    yes. it is for local test. I should remove it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203855898
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57944391
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala ---
    @@ -44,7 +45,8 @@ import org.apache.spark.sql.types._
     
     private[hive] class HiveFunctionRegistry(
         underlying: analysis.FunctionRegistry,
    -    executionHive: HiveClientImpl)
    +    executionHive: HiveClientImpl,
    +    sessionStage: HiveSessionState)
    --- End diff --
    
    sessionState


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57963958
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala ---
    @@ -55,61 +57,133 @@ private[hive] class HiveFunctionRegistry(
         }
       }
     
    -  override def lookupFunction(name: String, children: Seq[Expression]): Expression = {
    -    Try(underlying.lookupFunction(name, children)).getOrElse {
    -      // We only look it up to see if it exists, but do not include it in the HiveUDF since it is
    -      // not always serializable.
    -      val functionInfo: FunctionInfo =
    -        Option(getFunctionInfo(name.toLowerCase)).getOrElse(
    -          throw new AnalysisException(s"undefined function $name"))
    +  def loadHivePermanentFunction(name: String): Option[CatalogFunction] = {
    +    val databaseName = sessionStage.catalog.getCurrentDatabase
    +    val func = FunctionIdentifier(name, Option(databaseName))
    +    val catalogFunc =
    +      if (sessionStage.catalog.listFunctions(databaseName, name).size != 0) {
    +        Some(sessionStage.catalog.getFunction(func))
    +      } else {
    +        None
    +      }
    +    catalogFunc.map(_.resources.foreach { resource =>
    +      resource._1.toLowerCase match {
    +        case "jar" => sessionStage.ctx.addJar(resource._2)
    +        case _ =>
    +          sessionStage.ctx.runSqlHive(s"ADD FILE ${resource._2}")
    +          sessionStage.ctx.sparkContext.addFile(resource._2)
    +      }
    +    })
    +    catalogFunc
    +  }
     
    -      val functionClassName = functionInfo.getFunctionClass.getName
    +  override def makeFunctionBuilderAndInfo(
    +      name: String,
    +      functionClassName: String): (ExpressionInfo, FunctionBuilder) = {
    +    val hiveUDFWrapper = new HiveFunctionWrapper(functionClassName)
    +    val hiveUDFClass = hiveUDFWrapper.createFunction().getClass
    +    val info = new ExpressionInfo(functionClassName, name)
    +    val builder = makeHiveUDFBuilder(name, functionClassName, hiveUDFClass, hiveUDFWrapper)
    +    (info, builder)
    +  }
     
    -      // When we instantiate hive UDF wrapper class, we may throw exception if the input expressions
    -      // don't satisfy the hive UDF, such as type mismatch, input number mismatch, etc. Here we
    -      // catch the exception and throw AnalysisException instead.
    +  /**
    +   * Generates a Spark FunctionBuilder for a Hive UDF which is specified by a given classname.
    +   */
    +  def makeHiveUDFBuilder(
    +      name: String,
    +      functionClassName: String,
    +      hiveUDFClass: Class[_],
    +      hiveUDFWrapper: HiveFunctionWrapper): FunctionBuilder = {
    +    val builder = (children: Seq[Expression]) => {
           try {
    -        if (classOf[GenericUDFMacro].isAssignableFrom(functionInfo.getFunctionClass)) {
    +        if (classOf[GenericUDFMacro].isAssignableFrom(hiveUDFClass)) {
               val udf = HiveGenericUDF(
    -            name, new HiveFunctionWrapper(functionClassName, functionInfo.getGenericUDF), children)
    -          udf.dataType // Force it to check input data types.
    +            name, hiveUDFWrapper, children)
    +          if (udf.resolved) {
    +            udf.dataType // Force it to check input data types.
    +          }
               udf
    -        } else if (classOf[UDF].isAssignableFrom(functionInfo.getFunctionClass)) {
    -          val udf = HiveSimpleUDF(name, new HiveFunctionWrapper(functionClassName), children)
    -          udf.dataType // Force it to check input data types.
    +        } else if (classOf[UDF].isAssignableFrom(hiveUDFClass)) {
    +          val udf = HiveSimpleUDF(name, hiveUDFWrapper, children)
    +          if (udf.resolved) {
    +            udf.dataType // Force it to check input data types.
    +          }
               udf
    -        } else if (classOf[GenericUDF].isAssignableFrom(functionInfo.getFunctionClass)) {
    -          val udf = HiveGenericUDF(name, new HiveFunctionWrapper(functionClassName), children)
    -          udf.dataType // Force it to check input data types.
    +        } else if (classOf[GenericUDF].isAssignableFrom(hiveUDFClass)) {
    +          val udf = HiveGenericUDF(name, hiveUDFWrapper, children)
    +          if (udf.resolved) {
    +            udf.dataType // Force it to check input data types.
    +          }
               udf
             } else if (
    -          classOf[AbstractGenericUDAFResolver].isAssignableFrom(functionInfo.getFunctionClass)) {
    -          val udaf = HiveUDAFFunction(name, new HiveFunctionWrapper(functionClassName), children)
    -          udaf.dataType // Force it to check input data types.
    +          classOf[AbstractGenericUDAFResolver].isAssignableFrom(hiveUDFClass)) {
    +          val udaf = HiveUDAFFunction(name, hiveUDFWrapper, children)
    +          if (udaf.resolved) {
    +            udaf.dataType // Force it to check input data types.
    +          }
               udaf
    -        } else if (classOf[UDAF].isAssignableFrom(functionInfo.getFunctionClass)) {
    +        } else if (classOf[UDAF].isAssignableFrom(hiveUDFClass)) {
               val udaf = HiveUDAFFunction(
    -            name, new HiveFunctionWrapper(functionClassName), children, isUDAFBridgeRequired = true)
    -          udaf.dataType  // Force it to check input data types.
    +            name, hiveUDFWrapper, children, isUDAFBridgeRequired = true)
    +          if (udaf.resolved) {
    +            udaf.dataType  // Force it to check input data types.
    +          }
               udaf
    -        } else if (classOf[GenericUDTF].isAssignableFrom(functionInfo.getFunctionClass)) {
    -          val udtf = HiveGenericUDTF(name, new HiveFunctionWrapper(functionClassName), children)
    -          udtf.elementTypes // Force it to check input data types.
    +        } else if (classOf[GenericUDTF].isAssignableFrom(hiveUDFClass)) {
    +          val udtf = HiveGenericUDTF(name, hiveUDFWrapper, children)
    +          if (udtf.resolved) {
    +            udtf.elementTypes // Force it to check input data types.
    +          }
               udtf
             } else {
    -          throw new AnalysisException(s"No handler for udf ${functionInfo.getFunctionClass}")
    +          throw new AnalysisException(s"No handler for udf ${hiveUDFClass}")
             }
           } catch {
             case analysisException: AnalysisException =>
    -          // If the exception is an AnalysisException, just throw it.
               throw analysisException
             case throwable: Throwable =>
    -          // If there is any other error, we throw an AnalysisException.
    -          val errorMessage = s"No handler for Hive udf ${functionInfo.getFunctionClass} " +
    +          val errorMessage = s"No handler for Hive udf ${hiveUDFClass} " +
                 s"because: ${throwable.getMessage}."
               throw new AnalysisException(errorMessage)
           }
         }
    +    builder
    +  }
    +
    +  override def lookupFunction(name: String, children: Seq[Expression]): Expression = {
    +    val builder = underlying.lookupFunctionBuilder(name)
    --- End diff --
    
    why not just
    ```
    underlying.lookupFunctionBuilder(name).map { f => f(children) }.getOrElse {
      ...
    }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57769684
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionState.scala ---
    @@ -71,7 +71,7 @@ private[hive] class HiveSessionState(ctx: HiveContext) extends SessionState(ctx)
       /**
        * Parser for HiveQl query texts.
        */
    -  override lazy val sqlParser: ParserInterface = HiveSqlParser
    +  override lazy val sqlParser: ParserInterface = HiveSqlParser(this)
    --- End diff --
    
    don't pass in the `SessionState` here if you're just gonna use the function registry. In general it's better to pass in narrower things so we don't have a tight coupling across classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203258597
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54490/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57943872
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -67,9 +77,14 @@ class SimpleFunctionRegistry extends FunctionRegistry {
       }
     
       override def lookupFunction(name: String, children: Seq[Expression]): Expression = {
    +    val builder = functionBuilders.get(name)
    +    if (builder.isEmpty) {
    +      throw new AnalysisException(s"undefined function $name")
    +    }
         val func = synchronized {
    -      functionBuilders.get(name).map(_._2).getOrElse {
    -        throw new AnalysisException(s"undefined function $name")
    +      Try(builder.map(_._2)) match {
    --- End diff --
    
    What exception could `Option.map` possibly throw? This method doesn't need to change at all and the old code is more concise.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203168856
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203638477
  
    I just did a close review and there were a few design choices that I would like to discuss further. The biggest thing is I'm not sure if it's the responsibility of `HiveFunctionRegistry.lookupFunction` to load the hive jars as well. Right now we need to pass in `HiveContext` and `SessionCatalog` into `HiveFunctionRegistry`, which seems like a messy design because it couples these few classes too closely. I think a better place to load the hive jars will be in `HiveSessionCatalog` itself, which already takes in a `HiveContext`.
    
    @yhuai What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203870738
  
    **[Test build #54617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54617/consoleFull)** for PR 12036 at commit [`65d9dbd`](https://github.com/apache/spark/commit/65d9dbdfbb26c0a1e917767aae08f73df4f9763c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203168858
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54471/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57966366
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala ---
    @@ -55,61 +57,133 @@ private[hive] class HiveFunctionRegistry(
         }
       }
     
    -  override def lookupFunction(name: String, children: Seq[Expression]): Expression = {
    -    Try(underlying.lookupFunction(name, children)).getOrElse {
    -      // We only look it up to see if it exists, but do not include it in the HiveUDF since it is
    -      // not always serializable.
    -      val functionInfo: FunctionInfo =
    -        Option(getFunctionInfo(name.toLowerCase)).getOrElse(
    -          throw new AnalysisException(s"undefined function $name"))
    +  def loadHivePermanentFunction(name: String): Option[CatalogFunction] = {
    +    val databaseName = sessionStage.catalog.getCurrentDatabase
    +    val func = FunctionIdentifier(name, Option(databaseName))
    +    val catalogFunc =
    +      if (sessionStage.catalog.listFunctions(databaseName, name).size != 0) {
    --- End diff --
    
    also, I'm not sure if this is the responsibility of `HiveFunctionRegistry`. Let's talk about that in the main thread.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203152551
  
    **[Test build #54471 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54471/consoleFull)** for PR 12036 at commit [`ee957db`](https://github.com/apache/spark/commit/ee957db463300c0b3f9e5194d9400d598b29509d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-202874801
  
    **[Test build #54434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54434/consoleFull)** for PR 12036 at commit [`35ad7ae`](https://github.com/apache/spark/commit/35ad7ae14ebff693a8e15dd231eb69be7061402f).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203159944
  
    My previous commit gets a test failure due to the incorrect `ExpressionInfo`. The current method in `FunctionRegistry` to retrieve `FunctionBuilder` is not working. It directly gets canonical name from `FunctionBuilder`. However, by doing this, you would not get the correct class name of your original function class. So I change it to pass `ExpressionInfo`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57769787
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -86,13 +87,50 @@ case class DescribeDatabase(
         extended: Boolean)(sql: String)
       extends NativeDDLCommand(sql) with Logging
     
    +/**
    + * The DDL command that creates a function.
    + * alias: the class name that implements the created function.
    + * resources: Jars, files, or archives which need to be added to the environment when the function
    + *            is referenced for the first time by a session.
    + * isTemp: indicates if it is a temporary function.
    + */
     case class CreateFunction(
         databaseName: Option[String],
         functionName: String,
         alias: String,
         resources: Seq[(String, String)],
    --- End diff --
    
    resources is not used here? I think we need to change `CatalogFunction` to include resources so we can propagate that to Hive properly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57944863
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -557,7 +558,10 @@ private[hive] class HiveClientImpl(
       override def getFunctionOption(
           db: String,
           name: String): Option[CatalogFunction] = withHiveState {
    -    Option(client.getFunction(db, name)).map(fromHiveFunction)
    +    client.getFunction(db, name)
    +    Try {
    +      Option(client.getFunction(db, name)).map(fromHiveFunction)
    --- End diff --
    
    is this a mistake? Why do we call `getFunction` twice?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203899215
  
    **[Test build #54617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54617/consoleFull)** for PR 12036 at commit [`65d9dbd`](https://github.com/apache/spark/commit/65d9dbdfbb26c0a1e917767aae08f73df4f9763c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203574057
  
    @viirya Sure, let's focus on this patch then. I looked at the latest changes and recognized the similarities, but I still found many interfaces introduced in this patch overly complicated. I will comment what I mean more specifically inline.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-202885944
  
    **[Test build #54437 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54437/consoleFull)** for PR 12036 at commit [`cb29f0f`](https://github.com/apache/spark/commit/cb29f0fb9d66b3bf3774dd9f76a82af365c39deb).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57817228
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -337,13 +352,12 @@ object FunctionRegistry {
         fr
       }
     
    -  /** See usage above. */
    -  def expression[T <: Expression](name: String)
    -      (implicit tag: ClassTag[T]): (String, (ExpressionInfo, FunctionBuilder)) = {
    -
    +  def expression[T <: Expression](
    --- End diff --
    
    We create `FunctionBuilder` here for temporary functions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203210735
  
    **[Test build #54486 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54486/consoleFull)** for PR 12036 at commit [`6b76980`](https://github.com/apache/spark/commit/6b769801108d64711fad899e48d08b53903a5f3f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-202921726
  
    **[Test build #54439 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54439/consoleFull)** for PR 12036 at commit [`77848c9`](https://github.com/apache/spark/commit/77848c910a78165416e7c069b25eaaee1b9c5a93).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class HiveSqlParser(sessionState: HiveSessionState) extends AbstractSqlParser `
      * `class HiveSqlAstBuilder(sessionState: HiveSessionState) extends SparkSqlAstBuilder `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203855901
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54612/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57815967
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionState.scala ---
    @@ -71,7 +71,7 @@ private[hive] class HiveSessionState(ctx: HiveContext) extends SessionState(ctx)
       /**
        * Parser for HiveQl query texts.
        */
    -  override lazy val sqlParser: ParserInterface = HiveSqlParser
    +  override lazy val sqlParser: ParserInterface = HiveSqlParser(this)
    --- End diff --
    
    ok. got it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203396981
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54512/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203855868
  
    **[Test build #54612 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54612/consoleFull)** for PR 12036 at commit [`2cab41c`](https://github.com/apache/spark/commit/2cab41cdc254764ed976842475ce2c2547b0ddcc).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Resource(resourceType: String, path: String)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57960922
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala ---
    @@ -55,61 +57,133 @@ private[hive] class HiveFunctionRegistry(
         }
       }
     
    -  override def lookupFunction(name: String, children: Seq[Expression]): Expression = {
    -    Try(underlying.lookupFunction(name, children)).getOrElse {
    -      // We only look it up to see if it exists, but do not include it in the HiveUDF since it is
    -      // not always serializable.
    -      val functionInfo: FunctionInfo =
    -        Option(getFunctionInfo(name.toLowerCase)).getOrElse(
    -          throw new AnalysisException(s"undefined function $name"))
    +  def loadHivePermanentFunction(name: String): Option[CatalogFunction] = {
    +    val databaseName = sessionStage.catalog.getCurrentDatabase
    +    val func = FunctionIdentifier(name, Option(databaseName))
    +    val catalogFunc =
    +      if (sessionStage.catalog.listFunctions(databaseName, name).size != 0) {
    --- End diff --
    
    Looks like here you can use the `functionExists` method I suggested above too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57958471
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -175,13 +176,49 @@ case class DescribeDatabase(
       }
     }
     
    +/**
    + * The DDL command that creates a function.
    + * alias: the class name that implements the created function.
    + * resources: Jars, files, or archives which need to be added to the environment when the function
    + *            is referenced for the first time by a session.
    + * isTemp: indicates if it is a temporary function.
    + */
     case class CreateFunction(
         databaseName: Option[String],
         functionName: String,
         alias: String,
         resources: Seq[(String, String)],
         isTemp: Boolean)(sql: String)
    -  extends NativeDDLCommand(sql) with Logging
    +  extends NativeDDLCommand(sql) with Logging {
    --- End diff --
    
    this should extend `RunnableCommand` now right? Also we don't need the Logging.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-202962334
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54442/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57945206
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -619,12 +626,21 @@ private[hive] class HiveClientImpl(
           PrincipalType.USER,
           (System.currentTimeMillis / 1000).toInt,
           FunctionType.JAVA,
    -      List.empty[ResourceUri].asJava)
    +      resourceUris.asJava)
       }
     
       private def fromHiveFunction(hf: HiveFunction): CatalogFunction = {
         val name = FunctionIdentifier(hf.getFunctionName, Option(hf.getDbName))
    -    new CatalogFunction(name, hf.getClassName)
    +    val resources = hf.getResourceUris.asScala.map { uri =>
    +      val resourceType = uri.getResourceType() match {
    +        case ResourceType.ARCHIVE => "archive"
    +        case ResourceType.FILE => "file"
    +        case ResourceType.JAR => "jar"
    +        case r => throw new SparkException(s"Unknown resource type: $r")
    --- End diff --
    
    should this be an `AnalysisException`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203153077
  
    @andrewor14 Thanks for comment. But I think you might miss reading the code changes in this PR.
    
    This PR already did what you said, i.e., to avoid registering temporary functions in Hive. Temporary functions are not go through Hive after this PR and we create function builder ourselves in `HiveFunctionRegistry` here.
    
    So the temporary functions extending Hive UDF classes, are maintained in Spark with this PR.
    
    I would hope we don't duplicate our works. Thanks!



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203161583
  
    @andrewor14 I quickly go though #12051. You simplify the existing codes. Basically I did the similar thing here to make a method to create `FunctionBuilder` (i.e., `makeFunctionBuilder` in #12051).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203900123
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203258596
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203355991
  
    **[Test build #54511 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54511/consoleFull)** for PR 12036 at commit [`c370c47`](https://github.com/apache/spark/commit/c370c479b388606226b91e19da791f328014559e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-202962331
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-202896600
  
    **[Test build #54437 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54437/consoleFull)** for PR 12036 at commit [`cb29f0f`](https://github.com/apache/spark/commit/cb29f0fb9d66b3bf3774dd9f76a82af365c39deb).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57959159
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -175,13 +176,49 @@ case class DescribeDatabase(
       }
     }
     
    +/**
    + * The DDL command that creates a function.
    + * alias: the class name that implements the created function.
    + * resources: Jars, files, or archives which need to be added to the environment when the function
    + *            is referenced for the first time by a session.
    + * isTemp: indicates if it is a temporary function.
    + */
     case class CreateFunction(
         databaseName: Option[String],
         functionName: String,
         alias: String,
         resources: Seq[(String, String)],
         isTemp: Boolean)(sql: String)
    -  extends NativeDDLCommand(sql) with Logging
    +  extends NativeDDLCommand(sql) with Logging {
    +
    +  override def run(sqlContext: SQLContext): Seq[Row] = {
    +    val func = FunctionIdentifier(functionName, databaseName)
    +    val catalogFunc = CatalogFunction(func, alias, resources)
    +    if (isTemp) {
    +      // Set `ignoreIfExists` to false, so if the temporary function already exists,
    +      // an exception will be thrown.
    +      val (info, builder) =
    +        sqlContext.sessionState.functionRegistry.makeFunctionBuilderAndInfo(functionName, alias)
    +      sqlContext.sessionState.catalog.createTempFunction(functionName, info, builder, false)
    +    } else {
    +      // Check if the function to create is already existing. If so, throw exception.
    +      var funcExisting: Boolean =
    --- End diff --
    
    actually I would push this into the session catalog itself and implement a `functionExists` method there


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-204204182
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54680/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203396979
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-202921814
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54439/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-202874812
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54434/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57802797
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -86,13 +87,50 @@ case class DescribeDatabase(
         extended: Boolean)(sql: String)
       extends NativeDDLCommand(sql) with Logging
     
    +/**
    + * The DDL command that creates a function.
    + * alias: the class name that implements the created function.
    + * resources: Jars, files, or archives which need to be added to the environment when the function
    + *            is referenced for the first time by a session.
    + * isTemp: indicates if it is a temporary function.
    + */
     case class CreateFunction(
         databaseName: Option[String],
         functionName: String,
         alias: String,
         resources: Seq[(String, String)],
         isTemp: Boolean)(sql: String)
    -  extends NativeDDLCommand(sql) with Logging
    +  extends NativeDDLCommand(sql) with Logging {
    +
    +  override def run(sqlContext: SQLContext): Seq[Row] = {
    +    val func = FunctionIdentifier(functionName, databaseName)
    +    val catalogFunc = CatalogFunction(func, alias)
    +    if (isTemp) {
    +      // Set `ignoreIfExists` to false, so if the temporary function already exists,
    +      // an exception will be thrown.
    +      val (info, builder) =
    +        sqlContext.sessionState.functionRegistry.getFunctionBuilderAndInfo(functionName, alias)
    +      sqlContext.sessionState.catalog.createTempFunction(functionName, info, builder, false)
    --- End diff --
    
    This is very confusing; We're supposed to be creating a new temporary function but somehow we just get it from the function registry assuming it's already there. It seems that `getFunctionBuilderAndInfo` doesn't actually "get" the function, but also creates a new one and registers it. This doesn't seem like the right way to approach this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57815957
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -86,13 +87,50 @@ case class DescribeDatabase(
         extended: Boolean)(sql: String)
       extends NativeDDLCommand(sql) with Logging
     
    +/**
    + * The DDL command that creates a function.
    + * alias: the class name that implements the created function.
    + * resources: Jars, files, or archives which need to be added to the environment when the function
    + *            is referenced for the first time by a session.
    + * isTemp: indicates if it is a temporary function.
    + */
     case class CreateFunction(
         databaseName: Option[String],
         functionName: String,
         alias: String,
         resources: Seq[(String, String)],
    --- End diff --
    
    yah, I should put the resources into it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57964175
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ---
    @@ -55,6 +55,7 @@ class HiveQuerySuite extends HiveComparisonTest with BeforeAndAfter {
         TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
         // Add Locale setting
         Locale.setDefault(Locale.US)
    +    sql("DROP TEMPORARY FUNCTION IF EXISTS udtf_count2")
    --- End diff --
    
    shouldn't we do this in `afterAll`? If there's some leftover test state from other test suites then we should fix that instead.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-204204179
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57963347
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala ---
    @@ -55,61 +57,133 @@ private[hive] class HiveFunctionRegistry(
         }
       }
     
    -  override def lookupFunction(name: String, children: Seq[Expression]): Expression = {
    -    Try(underlying.lookupFunction(name, children)).getOrElse {
    -      // We only look it up to see if it exists, but do not include it in the HiveUDF since it is
    -      // not always serializable.
    -      val functionInfo: FunctionInfo =
    -        Option(getFunctionInfo(name.toLowerCase)).getOrElse(
    -          throw new AnalysisException(s"undefined function $name"))
    +  def loadHivePermanentFunction(name: String): Option[CatalogFunction] = {
    +    val databaseName = sessionStage.catalog.getCurrentDatabase
    +    val func = FunctionIdentifier(name, Option(databaseName))
    +    val catalogFunc =
    +      if (sessionStage.catalog.listFunctions(databaseName, name).size != 0) {
    +        Some(sessionStage.catalog.getFunction(func))
    +      } else {
    +        None
    +      }
    +    catalogFunc.map(_.resources.foreach { resource =>
    +      resource._1.toLowerCase match {
    +        case "jar" => sessionStage.ctx.addJar(resource._2)
    +        case _ =>
    +          sessionStage.ctx.runSqlHive(s"ADD FILE ${resource._2}")
    +          sessionStage.ctx.sparkContext.addFile(resource._2)
    +      }
    +    })
    +    catalogFunc
    +  }
     
    -      val functionClassName = functionInfo.getFunctionClass.getName
    +  override def makeFunctionBuilderAndInfo(
    +      name: String,
    +      functionClassName: String): (ExpressionInfo, FunctionBuilder) = {
    +    val hiveUDFWrapper = new HiveFunctionWrapper(functionClassName)
    +    val hiveUDFClass = hiveUDFWrapper.createFunction().getClass
    +    val info = new ExpressionInfo(functionClassName, name)
    +    val builder = makeHiveUDFBuilder(name, functionClassName, hiveUDFClass, hiveUDFWrapper)
    +    (info, builder)
    +  }
     
    -      // When we instantiate hive UDF wrapper class, we may throw exception if the input expressions
    -      // don't satisfy the hive UDF, such as type mismatch, input number mismatch, etc. Here we
    -      // catch the exception and throw AnalysisException instead.
    +  /**
    +   * Generates a Spark FunctionBuilder for a Hive UDF which is specified by a given classname.
    +   */
    +  def makeHiveUDFBuilder(
    +      name: String,
    +      functionClassName: String,
    +      hiveUDFClass: Class[_],
    +      hiveUDFWrapper: HiveFunctionWrapper): FunctionBuilder = {
    +    val builder = (children: Seq[Expression]) => {
           try {
    -        if (classOf[GenericUDFMacro].isAssignableFrom(functionInfo.getFunctionClass)) {
    +        if (classOf[GenericUDFMacro].isAssignableFrom(hiveUDFClass)) {
               val udf = HiveGenericUDF(
    -            name, new HiveFunctionWrapper(functionClassName, functionInfo.getGenericUDF), children)
    -          udf.dataType // Force it to check input data types.
    +            name, hiveUDFWrapper, children)
    +          if (udf.resolved) {
    +            udf.dataType // Force it to check input data types.
    +          }
               udf
    -        } else if (classOf[UDF].isAssignableFrom(functionInfo.getFunctionClass)) {
    -          val udf = HiveSimpleUDF(name, new HiveFunctionWrapper(functionClassName), children)
    -          udf.dataType // Force it to check input data types.
    +        } else if (classOf[UDF].isAssignableFrom(hiveUDFClass)) {
    +          val udf = HiveSimpleUDF(name, hiveUDFWrapper, children)
    +          if (udf.resolved) {
    +            udf.dataType // Force it to check input data types.
    +          }
               udf
    -        } else if (classOf[GenericUDF].isAssignableFrom(functionInfo.getFunctionClass)) {
    -          val udf = HiveGenericUDF(name, new HiveFunctionWrapper(functionClassName), children)
    -          udf.dataType // Force it to check input data types.
    +        } else if (classOf[GenericUDF].isAssignableFrom(hiveUDFClass)) {
    +          val udf = HiveGenericUDF(name, hiveUDFWrapper, children)
    +          if (udf.resolved) {
    +            udf.dataType // Force it to check input data types.
    +          }
               udf
             } else if (
    -          classOf[AbstractGenericUDAFResolver].isAssignableFrom(functionInfo.getFunctionClass)) {
    -          val udaf = HiveUDAFFunction(name, new HiveFunctionWrapper(functionClassName), children)
    -          udaf.dataType // Force it to check input data types.
    +          classOf[AbstractGenericUDAFResolver].isAssignableFrom(hiveUDFClass)) {
    +          val udaf = HiveUDAFFunction(name, hiveUDFWrapper, children)
    +          if (udaf.resolved) {
    +            udaf.dataType // Force it to check input data types.
    +          }
               udaf
    -        } else if (classOf[UDAF].isAssignableFrom(functionInfo.getFunctionClass)) {
    +        } else if (classOf[UDAF].isAssignableFrom(hiveUDFClass)) {
               val udaf = HiveUDAFFunction(
    -            name, new HiveFunctionWrapper(functionClassName), children, isUDAFBridgeRequired = true)
    -          udaf.dataType  // Force it to check input data types.
    +            name, hiveUDFWrapper, children, isUDAFBridgeRequired = true)
    +          if (udaf.resolved) {
    +            udaf.dataType  // Force it to check input data types.
    +          }
               udaf
    -        } else if (classOf[GenericUDTF].isAssignableFrom(functionInfo.getFunctionClass)) {
    -          val udtf = HiveGenericUDTF(name, new HiveFunctionWrapper(functionClassName), children)
    -          udtf.elementTypes // Force it to check input data types.
    +        } else if (classOf[GenericUDTF].isAssignableFrom(hiveUDFClass)) {
    +          val udtf = HiveGenericUDTF(name, hiveUDFWrapper, children)
    +          if (udtf.resolved) {
    +            udtf.elementTypes // Force it to check input data types.
    +          }
               udtf
             } else {
    -          throw new AnalysisException(s"No handler for udf ${functionInfo.getFunctionClass}")
    +          throw new AnalysisException(s"No handler for udf ${hiveUDFClass}")
             }
           } catch {
             case analysisException: AnalysisException =>
    -          // If the exception is an AnalysisException, just throw it.
               throw analysisException
             case throwable: Throwable =>
    -          // If there is any other error, we throw an AnalysisException.
    -          val errorMessage = s"No handler for Hive udf ${functionInfo.getFunctionClass} " +
    +          val errorMessage = s"No handler for Hive udf ${hiveUDFClass} " +
                 s"because: ${throwable.getMessage}."
               throw new AnalysisException(errorMessage)
           }
         }
    +    builder
    +  }
    +
    +  override def lookupFunction(name: String, children: Seq[Expression]): Expression = {
    +    val builder = underlying.lookupFunctionBuilder(name)
    +    if (builder.isDefined) {
    +      builder.get(children)
    +    } else {
    +      // We only look it up to see if it exists, but do not include it in the HiveUDF since it is
    +      // not always serializable.
    +      val optFunctionInfo = Option(getFunctionInfo(name.toLowerCase))
    +      if (optFunctionInfo.isEmpty) {
    +        val catalogFunc = loadHivePermanentFunction(name).getOrElse(
    +            throw new AnalysisException(s"undefined function $name"))
    +
    +        val functionClassName = catalogFunc.className
    +        val (_, builder) = makeFunctionBuilderAndInfo(name, functionClassName)
    +        builder(children)
    +      } else {
    +        val functionInfo = optFunctionInfo.get
    --- End diff --
    
    What's the point of using an option if you're just calling `get` anyway? Just do:
    ```
    val functionInfo = getFunctionInfo(...)
    val builder: FunctionBuilder =
      if (functionInfo != null) {
        val functionClassName = functionInfo.getFunctionClass.getName
        ...
      } else {
        // loadHivePermanentFunction ...
      }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57793017
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala ---
    @@ -21,27 +21,28 @@ import scala.collection.JavaConverters._
     import org.antlr.v4.runtime.{ParserRuleContext, Token}
     import org.apache.hadoop.hive.conf.HiveConf
     import org.apache.hadoop.hive.conf.HiveConf.ConfVars
    -import org.apache.hadoop.hive.ql.exec.FunctionRegistry
    +import org.apache.hadoop.hive.ql.exec.{FunctionRegistry => HiveFunctionRegistry}
     import org.apache.hadoop.hive.ql.parse.EximUtil
     import org.apache.hadoop.hive.ql.session.SessionState
     import org.apache.hadoop.hive.serde.serdeConstants
     import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe
     
    +import org.apache.spark.sql.catalyst.analysis.FunctionRegistry
     import org.apache.spark.sql.catalyst.catalog.{CatalogColumn, CatalogStorageFormat, CatalogTable, CatalogTableType}
     import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.parser.ng._
     import org.apache.spark.sql.catalyst.parser.ng.SqlBaseParser._
     import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
     import org.apache.spark.sql.execution.SparkSqlAstBuilder
    -import org.apache.spark.sql.hive.{CreateTableAsSelect => CTAS, CreateViewAsSelect => CreateView}
    +import org.apache.spark.sql.hive.{CreateTableAsSelect => CTAS, CreateViewAsSelect => CreateView, HiveSessionState}
     import org.apache.spark.sql.hive.{HiveGenericUDTF, HiveSerDe}
     import org.apache.spark.sql.hive.HiveShim.HiveFunctionWrapper
     
     /**
      * Concrete parser for HiveQl statements.
      */
    -object HiveSqlParser extends AbstractSqlParser {
    -  val astBuilder = new HiveSqlAstBuilder
    +case class HiveSqlParser(sessionState: HiveSessionState) extends AbstractSqlParser {
    --- End diff --
    
    I would just make it a real class instead of a case class


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203250405
  
    **[Test build #54487 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54487/consoleFull)** for PR 12036 at commit [`e05b108`](https://github.com/apache/spark/commit/e05b108f9aeedf5d8e3fb5c7145940619811a858).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class HiveSqlParser(functionRegistry: FunctionRegistry) extends AbstractSqlParser `
      * `class HiveSqlAstBuilder(functionRegistry: FunctionRegistry) extends SparkSqlAstBuilder `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203250532
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54487/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203168700
  
    **[Test build #54471 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54471/consoleFull)** for PR 12036 at commit [`ee957db`](https://github.com/apache/spark/commit/ee957db463300c0b3f9e5194d9400d598b29509d).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203362601
  
    **[Test build #54512 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54512/consoleFull)** for PR 12036 at commit [`acf9299`](https://github.com/apache/spark/commit/acf9299a79684dbb063efa035323726c753c682d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57815890
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
    @@ -86,13 +87,50 @@ case class DescribeDatabase(
         extended: Boolean)(sql: String)
       extends NativeDDLCommand(sql) with Logging
     
    +/**
    + * The DDL command that creates a function.
    + * alias: the class name that implements the created function.
    + * resources: Jars, files, or archives which need to be added to the environment when the function
    + *            is referenced for the first time by a session.
    + * isTemp: indicates if it is a temporary function.
    + */
     case class CreateFunction(
         databaseName: Option[String],
         functionName: String,
         alias: String,
         resources: Seq[(String, String)],
         isTemp: Boolean)(sql: String)
    -  extends NativeDDLCommand(sql) with Logging
    +  extends NativeDDLCommand(sql) with Logging {
    +
    +  override def run(sqlContext: SQLContext): Seq[Row] = {
    +    val func = FunctionIdentifier(functionName, databaseName)
    +    val catalogFunc = CatalogFunction(func, alias)
    +    if (isTemp) {
    +      // Set `ignoreIfExists` to false, so if the temporary function already exists,
    +      // an exception will be thrown.
    +      val (info, builder) =
    +        sqlContext.sessionState.functionRegistry.getFunctionBuilderAndInfo(functionName, alias)
    +      sqlContext.sessionState.catalog.createTempFunction(functionName, info, builder, false)
    --- End diff --
    
    So rename it to `genFunctionBuilderAndInfo`? Actually it is going to generate a function builder and expression info for us in order to create temporary function. But I think it is just the naming problem to make you confusing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57964432
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
    @@ -83,6 +84,31 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
         checkAnswer(
           sql("SELECT udtf_count2(a) FROM (SELECT 1 AS a FROM src LIMIT 3) t"),
           Row(3) :: Row(3) :: Nil)
    +
    +    sql("DROP TEMPORARY FUNCTION udtf_count2")
    +  }
    +
    +  test("permanent UDTF") {
    +    sql(
    +      s"""
    +        |CREATE FUNCTION udtf_count_temp
    +        |AS 'org.apache.spark.sql.hive.execution.GenericUDTFCount2'
    +        |USING JAR '${hiveContext.getHiveFile("TestUDTF.jar").getCanonicalPath()}'
    +      """.stripMargin)
    +
    +    checkAnswer(
    +      sql("SELECT key, cc FROM src LATERAL VIEW udtf_count_temp(value) dd AS cc"),
    +      Row(97, 500) :: Row(97, 500) :: Nil)
    +
    +    checkAnswer(
    +      sql("SELECT udtf_count_temp(a) FROM (SELECT 1 AS a FROM src LIMIT 3) t"),
    +      Row(3) :: Row(3) :: Nil)
    +
    +    sql("DROP FUNCTION udtf_count_temp")
    --- End diff --
    
    same here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57943772
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -52,6 +53,15 @@ trait FunctionRegistry {
       /** Drop a function and return whether the function existed. */
       def dropFunction(name: String): Boolean
     
    +  /* Return the FunctionBuilder and ExpressionInfo for the specified function name and classname. */
    +  def makeFunctionBuilderAndInfo(
    +    name: String,
    +    functionClassName: String): (ExpressionInfo, FunctionBuilder) = {
    +    val clazz = Utils.getContextOrSparkClassLoader.loadClass(functionClassName)
    +    val (_, (info, builder)) =
    +      FunctionRegistry.expression(name, clazz.asInstanceOf[Class[Expression]])
    --- End diff --
    
    Instead I would just throw unsupported operation exception for now (e.g. what I do in #12051). This was never supported in the first place so we don't have to implement it now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-202921808
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57961773
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala ---
    @@ -55,61 +57,133 @@ private[hive] class HiveFunctionRegistry(
         }
       }
     
    -  override def lookupFunction(name: String, children: Seq[Expression]): Expression = {
    -    Try(underlying.lookupFunction(name, children)).getOrElse {
    -      // We only look it up to see if it exists, but do not include it in the HiveUDF since it is
    -      // not always serializable.
    -      val functionInfo: FunctionInfo =
    -        Option(getFunctionInfo(name.toLowerCase)).getOrElse(
    -          throw new AnalysisException(s"undefined function $name"))
    +  def loadHivePermanentFunction(name: String): Option[CatalogFunction] = {
    +    val databaseName = sessionStage.catalog.getCurrentDatabase
    +    val func = FunctionIdentifier(name, Option(databaseName))
    +    val catalogFunc =
    +      if (sessionStage.catalog.listFunctions(databaseName, name).size != 0) {
    +        Some(sessionStage.catalog.getFunction(func))
    +      } else {
    +        None
    +      }
    +    catalogFunc.map(_.resources.foreach { resource =>
    +      resource._1.toLowerCase match {
    +        case "jar" => sessionStage.ctx.addJar(resource._2)
    +        case _ =>
    +          sessionStage.ctx.runSqlHive(s"ADD FILE ${resource._2}")
    +          sessionStage.ctx.sparkContext.addFile(resource._2)
    +      }
    +    })
    +    catalogFunc
    +  }
     
    -      val functionClassName = functionInfo.getFunctionClass.getName
    +  override def makeFunctionBuilderAndInfo(
    +      name: String,
    +      functionClassName: String): (ExpressionInfo, FunctionBuilder) = {
    +    val hiveUDFWrapper = new HiveFunctionWrapper(functionClassName)
    +    val hiveUDFClass = hiveUDFWrapper.createFunction().getClass
    +    val info = new ExpressionInfo(functionClassName, name)
    +    val builder = makeHiveUDFBuilder(name, functionClassName, hiveUDFClass, hiveUDFWrapper)
    +    (info, builder)
    +  }
     
    -      // When we instantiate hive UDF wrapper class, we may throw exception if the input expressions
    -      // don't satisfy the hive UDF, such as type mismatch, input number mismatch, etc. Here we
    -      // catch the exception and throw AnalysisException instead.
    +  /**
    +   * Generates a Spark FunctionBuilder for a Hive UDF which is specified by a given classname.
    +   */
    +  def makeHiveUDFBuilder(
    --- End diff --
    
    private, also just call this `makeFunctionBuilder`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-202896697
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54437/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203242329
  
    **[Test build #54486 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54486/consoleFull)** for PR 12036 at commit [`6b76980`](https://github.com/apache/spark/commit/6b769801108d64711fad899e48d08b53903a5f3f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class HadoopFileLinesReader(file: PartitionedFile, conf: Configuration) extends Iterator[Text] `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-202962322
  
    **[Test build #54442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54442/consoleFull)** for PR 12036 at commit [`b8dda84`](https://github.com/apache/spark/commit/b8dda845dde32202c60db668270e73a0d3513309).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57944538
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala ---
    @@ -55,61 +57,133 @@ private[hive] class HiveFunctionRegistry(
         }
       }
     
    -  override def lookupFunction(name: String, children: Seq[Expression]): Expression = {
    -    Try(underlying.lookupFunction(name, children)).getOrElse {
    -      // We only look it up to see if it exists, but do not include it in the HiveUDF since it is
    -      // not always serializable.
    -      val functionInfo: FunctionInfo =
    -        Option(getFunctionInfo(name.toLowerCase)).getOrElse(
    -          throw new AnalysisException(s"undefined function $name"))
    +  def loadHivePermanentFunction(name: String): Option[CatalogFunction] = {
    --- End diff --
    
    Make this private. Also please add some java doc to explain what it's doing. Even 1 sentence is better than nothing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57815906
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveSqlParser.scala ---
    @@ -21,27 +21,28 @@ import scala.collection.JavaConverters._
     import org.antlr.v4.runtime.{ParserRuleContext, Token}
     import org.apache.hadoop.hive.conf.HiveConf
     import org.apache.hadoop.hive.conf.HiveConf.ConfVars
    -import org.apache.hadoop.hive.ql.exec.FunctionRegistry
    +import org.apache.hadoop.hive.ql.exec.{FunctionRegistry => HiveFunctionRegistry}
     import org.apache.hadoop.hive.ql.parse.EximUtil
     import org.apache.hadoop.hive.ql.session.SessionState
     import org.apache.hadoop.hive.serde.serdeConstants
     import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe
     
    +import org.apache.spark.sql.catalyst.analysis.FunctionRegistry
     import org.apache.spark.sql.catalyst.catalog.{CatalogColumn, CatalogStorageFormat, CatalogTable, CatalogTableType}
     import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.parser.ng._
     import org.apache.spark.sql.catalyst.parser.ng.SqlBaseParser._
     import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
     import org.apache.spark.sql.execution.SparkSqlAstBuilder
    -import org.apache.spark.sql.hive.{CreateTableAsSelect => CTAS, CreateViewAsSelect => CreateView}
    +import org.apache.spark.sql.hive.{CreateTableAsSelect => CTAS, CreateViewAsSelect => CreateView, HiveSessionState}
     import org.apache.spark.sql.hive.{HiveGenericUDTF, HiveSerDe}
     import org.apache.spark.sql.hive.HiveShim.HiveFunctionWrapper
     
     /**
      * Concrete parser for HiveQl statements.
      */
    -object HiveSqlParser extends AbstractSqlParser {
    -  val astBuilder = new HiveSqlAstBuilder
    +case class HiveSqlParser(sessionState: HiveSessionState) extends AbstractSqlParser {
    --- End diff --
    
    ok.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203215691
  
    @andrewor14 I go through #12051 again. Your main change is in `HiveFunctionRegistry`. I compare our changes in this place and they are nearly doing the same thing, excepts for few differences.
    
    However, in order to make the temporary function work in Spark, instead of passing through Hive, there are many places needed to take care as I did in this PR.
    
    I think to duplicate our works is not making sense. As your PR just begins with few changes. I think my changes in this PR are functional and ready for review. I will suggest we focus on this change.
    
    What do you think?
    
    cc @yhuai @rxin 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-202874808
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-202909853
  
    **[Test build #54439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54439/consoleFull)** for PR 12036 at commit [`77848c9`](https://github.com/apache/spark/commit/77848c910a78165416e7c069b25eaaee1b9c5a93).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r58006735
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -67,9 +77,14 @@ class SimpleFunctionRegistry extends FunctionRegistry {
       }
     
       override def lookupFunction(name: String, children: Seq[Expression]): Expression = {
    +    val builder = functionBuilders.get(name)
    +    if (builder.isEmpty) {
    +      throw new AnalysisException(s"undefined function $name")
    +    }
         val func = synchronized {
    -      functionBuilders.get(name).map(_._2).getOrElse {
    -        throw new AnalysisException(s"undefined function $name")
    +      Try(builder.map(_._2)) match {
    --- End diff --
    
    Hive's exception will be thrown. I faced the exception thrown in working on the changes. One jenkins test checks the error message. We will miss the actual exception and message and only get AnalysisException with `undefined function` here. I think it is proper to show real error message.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203396724
  
    **[Test build #54512 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54512/consoleFull)** for PR 12036 at commit [`acf9299`](https://github.com/apache/spark/commit/acf9299a79684dbb063efa035323726c753c682d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57944268
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala ---
    @@ -55,61 +57,133 @@ private[hive] class HiveFunctionRegistry(
         }
       }
     
    -  override def lookupFunction(name: String, children: Seq[Expression]): Expression = {
    -    Try(underlying.lookupFunction(name, children)).getOrElse {
    -      // We only look it up to see if it exists, but do not include it in the HiveUDF since it is
    -      // not always serializable.
    -      val functionInfo: FunctionInfo =
    -        Option(getFunctionInfo(name.toLowerCase)).getOrElse(
    -          throw new AnalysisException(s"undefined function $name"))
    +  def loadHivePermanentFunction(name: String): Option[CatalogFunction] = {
    +    val databaseName = sessionStage.catalog.getCurrentDatabase
    +    val func = FunctionIdentifier(name, Option(databaseName))
    +    val catalogFunc =
    +      if (sessionStage.catalog.listFunctions(databaseName, name).size != 0) {
    +        Some(sessionStage.catalog.getFunction(func))
    +      } else {
    +        None
    +      }
    +    catalogFunc.map(_.resources.foreach { resource =>
    +      resource._1.toLowerCase match {
    +        case "jar" => sessionStage.ctx.addJar(resource._2)
    +        case _ =>
    +          sessionStage.ctx.runSqlHive(s"ADD FILE ${resource._2}")
    +          sessionStage.ctx.sparkContext.addFile(resource._2)
    +      }
    +    })
    +    catalogFunc
    +  }
     
    -      val functionClassName = functionInfo.getFunctionClass.getName
    +  override def makeFunctionBuilderAndInfo(
    +      name: String,
    +      functionClassName: String): (ExpressionInfo, FunctionBuilder) = {
    +    val hiveUDFWrapper = new HiveFunctionWrapper(functionClassName)
    +    val hiveUDFClass = hiveUDFWrapper.createFunction().getClass
    +    val info = new ExpressionInfo(functionClassName, name)
    --- End diff --
    
    This is the whole reason why you have `makeFunctionBuilderAndInfo`. The signature here is clunky and unnecessary. We can just pass `functionClassName` to the `SessionCatalog` and do that there ourselves when we create a temporary function. Then we don't need all these tuple return types and this can be just `makeFunctionBuilder`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57964405
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
    @@ -83,6 +84,31 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
         checkAnswer(
           sql("SELECT udtf_count2(a) FROM (SELECT 1 AS a FROM src LIMIT 3) t"),
           Row(3) :: Row(3) :: Nil)
    +
    +    sql("DROP TEMPORARY FUNCTION udtf_count2")
    --- End diff --
    
    you need to do this in a `try finally` block.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57817425
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -52,6 +53,15 @@ trait FunctionRegistry {
       /** Drop a function and return whether the function existed. */
       def dropFunction(name: String): Boolean
     
    +  /** Get the builder of the specified function name and class name. */
    +  def getFunctionBuilderAndInfo(
    --- End diff --
    
    By passing function name and class name, this method will return `FunctionBuilder` and `ExpressionInfo` which are used to create temporary function through `Catalog` API.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r57944995
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -557,7 +558,10 @@ private[hive] class HiveClientImpl(
       override def getFunctionOption(
           db: String,
           name: String): Option[CatalogFunction] = withHiveState {
    -    Option(client.getFunction(db, name)).map(fromHiveFunction)
    +    client.getFunction(db, name)
    +    Try {
    +      Option(client.getFunction(db, name)).map(fromHiveFunction)
    --- End diff --
    
    also we shouldn't wrap this around a `Try`, which would swallow exceptions. This should return `None` only if the function doesn't exist, not when it throws exceptions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203394551
  
    **[Test build #54511 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54511/consoleFull)** for PR 12036 at commit [`c370c47`](https://github.com/apache/spark/commit/c370c479b388606226b91e19da791f328014559e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203776390
  
    @andrewor14 Thanks for reviewing this. The current changes might need further refactoring. As you said, `HiveFunctionRegistry` now refers to `HiveContext` and `SessionCatalog`. It is a messy one but I  just want it to be functional and prove this approach work. I think I would do refactoring today to make it more clear. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12036#discussion_r58020732
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ---
    @@ -55,6 +55,7 @@ class HiveQuerySuite extends HiveComparisonTest with BeforeAndAfter {
         TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
         // Add Locale setting
         Locale.setDefault(Locale.US)
    +    sql("DROP TEMPORARY FUNCTION IF EXISTS udtf_count2")
    --- End diff --
    
    ya. I think it is because `SQLQuerySuite` which creates `udtf_count2` but never drops it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/12036#issuecomment-203162748
  
    @andrewor14 So I think our works are duplicate in some places. Do you want me to wait for #12051 to merge first and then rebase against it? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org