Posted to issues@spark.apache.org by "Jesse English (JIRA)" <ji...@apache.org> on 2016/01/19 22:23:39 UTC

[jira] [Created] (SPARK-12911) Caching a dataframe causes array comparisons to fail (in filter / where) after 1.6

Jesse English created SPARK-12911:
-------------------------------------

             Summary: Caching a dataframe causes array comparisons to fail (in filter / where) after 1.6
                 Key: SPARK-12911
                 URL: https://issues.apache.org/jira/browse/SPARK-12911
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.6.0
         Environment: OSX 10.11.1, Scala 2.11.7, Spark 1.6.0
            Reporter: Jesse English


When a *where* operation on a dataframe tests an array-typed column for equality, Spark 1.6.0 returns no matching rows if the dataframe has been cached.  If the dataframe has not been cached, the results are as expected.

This appears to be related to the underlying unsafe array data types.

{code:title=test.scala|borderStyle=solid}
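// Imports needed by this snippet; `sc` is assumed to be the SparkContext
// supplied by the enclosing test suite.
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.functions.{array, lit}
import org.apache.spark.sql.types.{DataTypes, LongType, StringType, StructField, StructType}

test("test array comparison") {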

    val vectors: Vector[Row] =  Vector(
      Row.fromTuple("id_1" -> Array(0L, 2L)),
      Row.fromTuple("id_2" -> Array(0L, 5L)),
      Row.fromTuple("id_3" -> Array(0L, 9L)),
      Row.fromTuple("id_4" -> Array(1L, 0L)),
      Row.fromTuple("id_5" -> Array(1L, 8L)),
      Row.fromTuple("id_6" -> Array(2L, 4L)),
      Row.fromTuple("id_7" -> Array(5L, 6L)),
      Row.fromTuple("id_8" -> Array(6L, 2L)),
      Row.fromTuple("id_9" -> Array(7L, 0L))
    )
    val data: RDD[Row] = sc.parallelize(vectors, 3)

    val schema = StructType(
      StructField("id", StringType, false) ::
        StructField("point", DataTypes.createArrayType(LongType, false), false) ::
        Nil
    )

    val sqlContext = new SQLContext(sc)
    val dataframe = sqlContext.createDataFrame(data, schema)

    val targetPoint:Array[Long] = Array(0L,9L)

    // Caching triggers the failure; without cache() the filter matches as expected.
    dataframe.cache()

    // This is the line that fails with
    // java.util.NoSuchElementException: next on empty iterator,
    // even though the row "id_3" holds the matching point (0, 9).
    val targetRow = dataframe
      .where(dataframe("point") === array(targetPoint.map(value => lit(value)): _*))
      .first()

    assert(targetRow != null)
  }
{code}
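
The contrast between the cached and uncached cases can also be checked with a smaller sketch that runs the same filter before and after cache() and returns both row counts. The helper name matchCounts and the two-row data set are hypothetical and exist only for illustration. Based on the behaviour described above, the first count should be 1 and the second is reported to be 0 on 1.6.0; the second value has not been verified on other versions.

{code:title=cache_contrast.scala|borderStyle=solid}
import org.apache.spark.SparkContext
import org.apache.spark.sql.{Column, Row, SQLContext}
import org.apache.spark.sql.functions.{array, lit}
import org.apache.spark.sql.types.{ArrayType, LongType, StringType, StructField, StructType}

// Hypothetical helper: returns (matches before cache(), matches after cache()).
def matchCounts(sc: SparkContext): (Long, Long) = {
  val sqlContext = new SQLContext(sc)

  val schema = StructType(Seq(
    StructField("id", StringType, nullable = false),
    StructField("point", ArrayType(LongType, containsNull = false), nullable = false)
  ))

  // Two illustrative rows; only "id_3" equals the target point.
  val rows = sc.parallelize(Seq(
    Row("id_2", Array(0L, 5L)),
    Row("id_3", Array(0L, 9L))
  ))
  val df = sqlContext.createDataFrame(rows, schema)

  // Array literal built the same way as in the test above.
  val target: Column = array(Seq(0L, 9L).map(v => lit(v)): _*)

  // Run the filter once before the dataframe is marked as cached ...
  val uncached = df.where(df("point") === target).count()

  // ... and once after, so the second query reads from the cached data.
  df.cache()
  val cached = df.where(df("point") === target).count()

  (uncached, cached)
}
{code}

Calling matchCounts(sc) from the same suite and comparing the two values shows whether a given build is affected: equal counts mean the cached comparison behaves like the uncached one.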


