Posted to user@spark.apache.org by filthysocks <js...@uos.de> on 2016/10/20 08:54:04 UTC

Where condition on columns of Arrays no longer works in Spark 2

I have a column in a DataFrame that contains arrays, and I want to filter
for equality. It works fine in Spark 1.6 but not in 2.0.

In Spark 1.6.2:

import org.apache.spark.sql.SQLContext

case class DataTest(lists: Seq[Int])

val sql = new SQLContext(sc)
val data = sql.createDataFrame(sc.parallelize(Seq(
  DataTest(Seq(1)),
  DataTest(Seq(4, 5, 6))
)))
data.registerTempTable("uiae")
sql.sql(s"SELECT lists FROM uiae WHERE lists=Array(1)").collect().foreach(println)

returns: [WrappedArray(1)]

In Spark 2.0.0:

import spark.implicits._

case class DataTest(lists: Seq[Int])

val data = Seq(DataTest(Seq(1)), DataTest(Seq(4, 5, 6))).toDS()
data.createOrReplaceTempView("uiae")
spark.sql(s"SELECT lists FROM uiae WHERE lists=Array(1)").collect().foreach(println)

returns: nothing

Is that a bug? Or is it just done differently in Spark 2?
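For what it's worth, filtering through the typed Dataset API should still work as a workaround, since a Scala predicate like `data.filter(_.lists == Seq(1))` compares the sequences element-wise instead of going through the SQL array comparison. A minimal sketch outside Spark (plain collections, same `DataTest` case class as above, so no cluster is needed to see the predicate's behavior):

```scala
// Sketch: the predicate one would pass to Dataset.filter in Spark 2,
// e.g. data.filter(_.lists == Seq(1)), relies on ordinary Scala Seq
// equality, which compares length and elements.
case class DataTest(lists: Seq[Int])

object SeqEqualityDemo {
  def main(args: Array[String]): Unit = {
    val rows = Seq(DataTest(Seq(1)), DataTest(Seq(4, 5, 6)))

    // Same predicate, applied to a plain collection:
    val matched = rows.filter(_.lists == Seq(1))

    println(matched) // List(DataTest(List(1)))
  }
}
```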



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Where-condition-on-columns-of-Arrays-does-no-longer-work-in-spark-2-tp27926.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Where condition on columns of Arrays no longer works in Spark 2

Posted by Cheng Lian <li...@gmail.com>.
Thanks for reporting! It's a bug, just filed a ticket to track it:

https://issues.apache.org/jira/browse/SPARK-18053

Cheng


On 10/20/16 1:54 AM, filthysocks wrote:
> I have a column in a DataFrame that contains arrays, and I want to filter
> for equality. It works fine in Spark 1.6 but not in 2.0.
>
> In Spark 1.6.2:
> import org.apache.spark.sql.SQLContext
>
> case class DataTest(lists: Seq[Int])
>
> val sql = new SQLContext(sc)
> val data = sql.createDataFrame(sc.parallelize(Seq(
>   DataTest(Seq(1)),
>   DataTest(Seq(4, 5, 6))
> )))
> data.registerTempTable("uiae")
> sql.sql(s"SELECT lists FROM uiae WHERE lists=Array(1)").collect().foreach(println)
> returns: [WrappedArray(1)]
>
> In Spark 2.0.0:
> import spark.implicits._
>
> case class DataTest(lists: Seq[Int])
> val data = Seq(DataTest(Seq(1)),DataTest(Seq(4,5,6))).toDS()
>
> data.createOrReplaceTempView("uiae")
> spark.sql(s"SELECT lists FROM uiae WHERE lists=Array(1)").collect().foreach(println)
> returns: nothing
>
> Is that a bug? Or is it just done differently in Spark 2?