Posted to issues@spark.apache.org by "Shubham Chopra (JIRA)" <ji...@apache.org> on 2017/07/07 20:16:00 UTC

[jira] [Created] (SPARK-21344) BinaryType comparison does signed byte array comparison

Shubham Chopra created SPARK-21344:
--------------------------------------

             Summary: BinaryType comparison does signed byte array comparison
                 Key: SPARK-21344
                 URL: https://issues.apache.org/jira/browse/SPARK-21344
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.1
            Reporter: Shubham Chopra


BinaryType used by Spark SQL defines ordering using signed byte comparisons. Because any byte with its high bit set compares as negative, byte arrays that should sort after smaller values instead sort before them, so range filters over binary columns return wrong results. The following snippet reproduces the error:

{code:scala}
import org.apache.spark.sql.functions.col

case class TestRecord(col0: Array[Byte])

// Encode a Long as its 8-byte big-endian representation.
def convertToBytes(i: Long): Array[Byte] = {
  val bb = java.nio.ByteBuffer.allocate(8)
  bb.putLong(i)
  bb.array
}

def test = {
  val sql = spark.sqlContext
  import sql.implicits._
  val timestamp = 1498772083037L
  val data = (timestamp to timestamp + 1000L).map(i => TestRecord(convertToBytes(i)))
  val testDF = sc.parallelize(data).toDF
  val filter1 = testDF.filter(col("col0") >= convertToBytes(timestamp) && col("col0") < convertToBytes(timestamp + 50L))
  val filter2 = testDF.filter(col("col0") >= convertToBytes(timestamp + 50L) && col("col0") < convertToBytes(timestamp + 100L))
  val filter3 = testDF.filter(col("col0") >= convertToBytes(timestamp) && col("col0") < convertToBytes(timestamp + 100L))
  // With unsigned (lexicographic) byte ordering all three assertions
  // should hold; with signed ordering some filters return wrong counts.
  assert(filter1.count == 50)
  assert(filter2.count == 50)
  assert(filter3.count == 100)
}
{code}
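The root cause can be shown without Spark at all. Once the low-order byte of the big-endian encoding crosses 0x7F it wraps to a negative signed value, so a signed element-wise comparison inverts the intended order. A minimal standalone sketch of the two orderings (the signed variant mirrors the behavior reported here; the unsigned variant uses {{java.lang.Byte.toUnsignedInt}}, which is standard JDK 8 API):

{code:scala}
// Signed element-wise comparison: bytes widen to Int as signed values.
def compareSigned(a: Array[Byte], b: Array[Byte]): Int = {
  val len = math.min(a.length, b.length)
  var i = 0
  while (i < len) {
    val res = a(i) - b(i)
    if (res != 0) return res
    i += 1
  }
  a.length - b.length
}

// Unsigned (lexicographic) comparison: the ordering a range filter expects.
def compareUnsigned(a: Array[Byte], b: Array[Byte]): Int = {
  val len = math.min(a.length, b.length)
  var i = 0
  while (i < len) {
    val res = java.lang.Byte.toUnsignedInt(a(i)) - java.lang.Byte.toUnsignedInt(b(i))
    if (res != 0) return res
    i += 1
  }
  a.length - b.length
}

val x = Array(0x7f.toByte)  // 127 in both interpretations
val y = Array(0x80.toByte)  // 128 unsigned, but -128 signed
assert(compareSigned(x, y) > 0)    // signed: 127 > -128, wrong order
assert(compareUnsigned(x, y) < 0)  // unsigned: 127 < 128, expected order
{code}

This is exactly what happens in the repro above: as the timestamp increments past a value whose low byte is 0x7F, the next encoded array sorts *before* it under signed comparison, so the range filters split the rows incorrectly.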






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org