
Posted to issues@spark.apache.org by "kevin yu (JIRA)" <ji...@apache.org> on 2016/01/20 16:28:39 UTC

[jira] [Commented] (SPARK-12911) Cacheing a dataframe causes array comparisons to fail (in filter / where) after 1.6

    [ https://issues.apache.org/jira/browse/SPARK-12911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108715#comment-15108715 ] 

kevin yu commented on SPARK-12911:
----------------------------------

I will look into this. Thanks.
Kevin

> Cacheing a dataframe causes array comparisons to fail (in filter / where) after 1.6
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-12911
>                 URL: https://issues.apache.org/jira/browse/SPARK-12911
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>         Environment: OSX 10.11.1, Scala 2.11.7, Spark 1.6.0
>            Reporter: Jesse English
>
> When running a *where* operation on a DataFrame and testing an array-typed column for equality, Spark 1.6 returns no matches if the DataFrame has been cached. If it has not been cached, the results are as expected.
> This appears to be related to the underlying unsafe array data types.
> {code:title=test.scala|borderStyle=solid}
> // Imports needed by the test below. The test assumes a ScalaTest suite with a SparkContext `sc` in scope.
> import org.apache.spark.rdd.RDD
> import org.apache.spark.sql.{Row, SQLContext}
> import org.apache.spark.sql.functions.{array, lit}
> import org.apache.spark.sql.types.{DataTypes, LongType, StringType, StructField, StructType}
> test("test array comparison") {
>     val vectors: Vector[Row] =  Vector(
>       Row.fromTuple("id_1" -> Array(0L, 2L)),
>       Row.fromTuple("id_2" -> Array(0L, 5L)),
>       Row.fromTuple("id_3" -> Array(0L, 9L)),
>       Row.fromTuple("id_4" -> Array(1L, 0L)),
>       Row.fromTuple("id_5" -> Array(1L, 8L)),
>       Row.fromTuple("id_6" -> Array(2L, 4L)),
>       Row.fromTuple("id_7" -> Array(5L, 6L)),
>       Row.fromTuple("id_8" -> Array(6L, 2L)),
>       Row.fromTuple("id_9" -> Array(7L, 0L))
>     )
>     val data: RDD[Row] = sc.parallelize(vectors, 3)
>     val schema = StructType(
>       StructField("id", StringType, false) ::
>         StructField("point", DataTypes.createArrayType(LongType, false), false) ::
>         Nil
>     )
>     val sqlContext = new SQLContext(sc)
>     val dataframe = sqlContext.createDataFrame(data, schema)
>     val targetPoint: Array[Long] = Array(0L, 9L)
>     // Caching triggers the error (without caching there is no error)
>     dataframe.cache()
>     // This is the line that fails with
>     // java.util.NoSuchElementException: next on empty iterator
>     // even though a matching row (id_3) exists
>     val targetRow = dataframe.where(dataframe("point") === array(targetPoint.map(value => lit(value)): _*)).first()
>     assert(targetRow != null)
> }
> {code}
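
The cached versus uncached behaviour described in the report can also be checked side by side. The following is only a sketch under stated assumptions: it assumes Spark 1.6 on the classpath and a local master, and the object name `CachedArrayFilterCheck` and the two-row dataset are hypothetical, chosen for illustration. It runs the same array-equality filter against an uncached and a cached DataFrame and prints both match counts, without asserting what either should be.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.functions.{array, lit}
import org.apache.spark.sql.types.{ArrayType, LongType, StringType, StructField, StructType}

// Hypothetical standalone check, not part of the original report.
object CachedArrayFilterCheck {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master is enough to exercise the code path.
    val conf = new SparkConf().setAppName("SPARK-12911-check").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    val schema = StructType(Seq(
      StructField("id", StringType, nullable = false),
      StructField("point", ArrayType(LongType, containsNull = false), nullable = false)
    ))

    // Two illustrative rows; only id_3 equals the target point.
    val rows = sc.parallelize(Seq(
      Row("id_1", Array(0L, 2L)),
      Row("id_3", Array(0L, 9L))
    ))

    // Literal array column to compare against, built as in the report.
    val target = array(Seq(0L, 9L).map(value => lit(value)): _*)

    // Same filter on an uncached DataFrame...
    val uncached = sqlContext.createDataFrame(rows, schema)
    val uncachedCount = uncached.where(uncached("point") === target).count()

    // ...and on a cached DataFrame built from the same rows.
    val cached = sqlContext.createDataFrame(rows, schema).cache()
    val cachedCount = cached.where(cached("point") === target).count()

    println(s"matches without cache: $uncachedCount")
    println(s"matches with cache:    $cachedCount")

    sc.stop()
  }
}
```

If the two counts differ on a given Spark version, that version shows the behaviour described in this issue.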


