Posted to issues@spark.apache.org by "CHONGGUANG LIU (JIRA)" <ji...@apache.org> on 2018/06/17 16:13:00 UTC

[jira] [Updated] (SPARK-24574) improve array_contains function of the sql component to deal with Column type

     [ https://issues.apache.org/jira/browse/SPARK-24574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

CHONGGUANG LIU updated SPARK-24574:
-----------------------------------
    Description: 
Hello all,
 
I ran into a use case in a project using Spark SQL and would like to share some thoughts about the function array_contains.
 
Say I have a DataFrame with two columns: column A of type "array of string" and column B of type "string". I want to determine whether the value of column B is contained in the array in column A, without using a UDF, of course.
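For concreteness, a minimal DataFrame of that shape could be built like this (the column names are illustrative, and a SparkSession named spark is assumed):

import spark.implicits._

val df = Seq(
  (Seq("a", "b", "c"), "b"),
  (Seq("x", "y"), "z")
).toDF("ColumnA", "ColumnB")
// ColumnA: array<string>, ColumnB: string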
The function array_contains comes to mind naturally:
 
 
def array_contains(column: Column, value: Any): Column = withExpr {
  ArrayContains(column.expr, Literal(value))
}
 
However, the function wraps value in a Literal. When value is column B, Literal(value) receives a Column, which yields a runtime exception: RuntimeException("Unsupported literal type " + v.getClass + " " + v).
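In other words, a call like the following compiles (value is typed as Any) but fails as soon as the function is invoked, with roughly this message:

import org.apache.spark.sql.functions.{array_contains, col}

// Throws: java.lang.RuntimeException: Unsupported literal type
//         class org.apache.spark.sql.Column ColumnB
df.select(array_contains(col("ColumnA"), col("ColumnB")))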
 
After discussing it with colleagues, we found a solution that avoids a UDF:
new Column(ArrayContains(col("ColumnA").expr, col("ColumnB").expr))
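For completeness, here is a self-contained sketch of that workaround with the imports it needs (note that it reaches into the internal Catalyst expression API, which is not part of the stable public surface):

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.catalyst.expressions.ArrayContains

// Build the Column by hand from the Catalyst ArrayContains expression.
val contained = new Column(ArrayContains(col("ColumnA").expr, col("ColumnB").expr))
df.select(contained).show()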
 
Building on this solution, I propose making the function itself a bit more capable, like this:
def array_contains(column: Column, value: Any): Column = withExpr {
  value match {
    // If the value is itself a Column, compare against that column's expression.
    case c: Column => ArrayContains(column.expr, c.expr)
    // Otherwise keep the existing behavior and wrap the value in a Literal.
    case _ => ArrayContains(column.expr, Literal(value))
  }
}
 
It pattern-matches on value: if value is a Column, its underlying expression is used directly; otherwise the function behaves exactly as it does today, so existing callers are unaffected.
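With that change, the original use case becomes a one-liner (assuming the patched array_contains is in scope):

// true where the array in ColumnA contains the value of ColumnB
df.select(array_contains(col("ColumnA"), col("ColumnB")).as("contained")).show()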
 
 

> improve array_contains function of the sql component to deal with Column type
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-24574
>                 URL: https://issues.apache.org/jira/browse/SPARK-24574
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: CHONGGUANG LIU
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
