Posted to issues@spark.apache.org by "Santiago M. Mola (JIRA)" <ji...@apache.org> on 2015/05/26 19:17:18 UTC

[jira] [Commented] (SPARK-4867) UDF clean up

    [ https://issues.apache.org/jira/browse/SPARK-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559423#comment-14559423 ] 

Santiago M. Mola commented on SPARK-4867:
-----------------------------------------

Maybe this issue can be split into smaller tasks? Many built-in functions can be removed from the parser quite easily by registering them in the FunctionRegistry. I am already doing this for a number of fixed-arity functions.

I'm using some helper functions to create FunctionBuilders for Expression subclasses for use with the FunctionRegistry. The main helper looks like this:

{code}
  /** Creates an ExpressionBuilder for a fixed-arity Expression subclass via reflection. */
  def expression[T <: Expression](arity: Int)(implicit tag: ClassTag[T]): ExpressionBuilder = {
    // Look up the constructor taking `arity` Expression arguments.
    val argTypes = Seq.fill(arity)(classOf[Expression])
    val constructor = tag.runtimeClass.getDeclaredConstructor(argTypes: _*)
    (expressions: Seq[Expression]) => {
      if (expressions.size != arity) {
        throw new IllegalArgumentException(
          s"Invalid number of arguments: ${expressions.size} (must be equal to $arity)")
      }
      constructor.newInstance(expressions: _*).asInstanceOf[Expression]
    }
  }
{code}

and can be used like this:

{code}
functionRegistry.registerFunction("MY_FUNCTION", expression[MyFunction](arity = 1))
{code}
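For illustration only, the reflection mechanics can be exercised outside of Spark with a minimal stand-in for the Expression hierarchy (all class names below are hypothetical, not Catalyst's actual types):

{code}
import scala.reflect.ClassTag

// Hypothetical stand-ins for Catalyst's Expression hierarchy, for illustration only.
trait Expression
case class Literal(value: Any) extends Expression
case class Upper(child: Expression) extends Expression

object ExpressionDemo {
  type ExpressionBuilder = Seq[Expression] => Expression

  def expression[T <: Expression](arity: Int)(implicit tag: ClassTag[T]): ExpressionBuilder = {
    // Find the constructor that takes `arity` Expression arguments.
    val argTypes = Seq.fill(arity)(classOf[Expression])
    val constructor = tag.runtimeClass.getDeclaredConstructor(argTypes: _*)
    (expressions: Seq[Expression]) => {
      require(expressions.size == arity,
        s"Invalid number of arguments: ${expressions.size} (must be equal to $arity)")
      constructor.newInstance(expressions: _*).asInstanceOf[Expression]
    }
  }
}
{code}

So `ExpressionDemo.expression[Upper](arity = 1)` returns a builder that, given `Seq(Literal("hello"))`, reflectively constructs `Upper(Literal("hello"))`, and rejects any other argument count.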

If this approach looks like what is needed, I can extend it to support expressions with a variable number of parameters. Also, with some syntactic sugar we could provide a function that works this way:

{code}
functionRegistry.registerFunction[MyFunction]
// Registers the builder produced by expression[MyFunction] under the name "MY_FUNCTION",
// derived from the class name via a camelcase -> underscore-separated conversion.
{code}
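The name-derivation half of that sugar could be sketched as follows, assuming a simple camel-case to underscore conversion (a real implementation would need a policy for acronyms and digits):

{code}
object NameDemo {
  // Hypothetical sketch: derive "MY_FUNCTION" from a class name like "MyFunction"
  // by inserting an underscore at each lowercase/digit-to-uppercase boundary.
  def defaultFunctionName(className: String): String =
    className.replaceAll("([a-z0-9])([A-Z])", "$1_$2").toUpperCase
}
{code}

With this, `registerFunction[MyFunction]` could delegate to `registerFunction(defaultFunctionName(tag.runtimeClass.getSimpleName), expression[MyFunction](...))`.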

How does this sound?


> UDF clean up
> ------------
>
>                 Key: SPARK-4867
>                 URL: https://issues.apache.org/jira/browse/SPARK-4867
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>            Priority: Blocker
>
> Right now our support and internal implementation of many functions have a few issues.  Specifically:
>  - UDFs don't know their input types and thus don't do type coercion.
>  - We hard-code a number of built-in functions into the parser.  This is bad because in SQL it creates new reserved words for things that aren't actually keywords.  It also means that for each function we need to add support to both SQLContext and HiveContext separately.
> For this JIRA I propose we do the following:
>  - Change the interfaces for registerFunction and ScalaUdf to include types for the input arguments as well as the output type.
>  - Add a rule to analysis that does type coercion for UDFs.
>  - Add a parse rule for functions to SQLParser.
>  - Rewrite all the UDFs that are currently hacked into the various parsers using this new functionality.
> Depending on how big this refactoring becomes we could split parts 1&2 from part 3 above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org