Posted to issues@spark.apache.org by "Shubham Chopra (JIRA)" <ji...@apache.org> on 2017/07/07 20:16:00 UTC
[jira] [Created] (SPARK-21344) BinaryType comparison does signed byte array comparison
Shubham Chopra created SPARK-21344:
--------------------------------------
Summary: BinaryType comparison does signed byte array comparison
Key: SPARK-21344
URL: https://issues.apache.org/jira/browse/SPARK-21344
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.1.1
Reporter: Shubham Chopra
BinaryType used by Spark SQL defines ordering using signed byte comparisons. Because a byte with its high bit set compares as negative, big-endian encodings of non-negative values can sort out of order, which leads to unexpected filter results. The following snippet demonstrates the error (the assertions fail):
{code:scala}
case class TestRecord(col0: Array[Byte])

// Encode a Long as an 8-byte big-endian array.
def convertToBytes(i: Long): Array[Byte] = {
  val bb = java.nio.ByteBuffer.allocate(8)
  bb.putLong(i)
  bb.array
}

def test = {
  val sql = spark.sqlContext
  import sql.implicits._
  val timestamp = 1498772083037L
  val data = (timestamp to timestamp + 1000L).map(i => TestRecord(convertToBytes(i)))
  val testDF = sc.parallelize(data).toDF

  // Three half-open ranges over the encoded values; together the first
  // two should cover exactly the same rows as the third.
  val filter1 = testDF.filter(col("col0") >= convertToBytes(timestamp) && col("col0") < convertToBytes(timestamp + 50L))
  val filter2 = testDF.filter(col("col0") >= convertToBytes(timestamp + 50L) && col("col0") < convertToBytes(timestamp + 100L))
  val filter3 = testDF.filter(col("col0") >= convertToBytes(timestamp) && col("col0") < convertToBytes(timestamp + 100L))

  // These assertions fail: signed byte ordering misorders any range
  // whose encoded values cross a byte with the high bit set (>= 0x80).
  assert(filter1.count == 50)
  assert(filter2.count == 50)
  assert(filter3.count == 100)
}
{code}
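To isolate the ordering problem outside of Spark, here is a minimal standalone sketch (hypothetical helper names, not Spark internals) contrasting a signed lexicographic byte-array comparison, as described above, with an unsigned one that agrees with the big-endian encoding of the underlying longs:

```scala
// Sketch: signed vs. unsigned lexicographic comparison of byte arrays.
object SignedVsUnsigned {
  // Compare treating each byte as signed (-128..127), like the
  // behavior this issue reports.
  def compareSigned(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val c = java.lang.Byte.compare(a(i), b(i))
      if (c != 0) return c
      i += 1
    }
    a.length - b.length
  }

  // Compare treating each byte as unsigned (0..255), which matches
  // the numeric order of big-endian-encoded non-negative longs.
  def compareUnsigned(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val c = java.lang.Integer.compare(a(i) & 0xff, b(i) & 0xff)
      if (c != 0) return c
      i += 1
    }
    a.length - b.length
  }

  def toBytes(i: Long): Array[Byte] = {
    val bb = java.nio.ByteBuffer.allocate(8)
    bb.putLong(i)
    bb.array
  }

  def main(args: Array[String]): Unit = {
    // 127L ends in byte 0x7F (+127 signed); 128L ends in 0x80 (-128 signed).
    val x = toBytes(127L)
    val y = toBytes(128L)
    println(compareSigned(x, y) > 0)   // true: signed order claims 127 > 128
    println(compareUnsigned(x, y) < 0) // true: unsigned order matches the longs
  }
}
```

The crossover happens at every byte value 0x80-0xFF, so any range filter whose encoded endpoints straddle such a byte, as the timestamps above do, returns the wrong rows under signed comparison.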
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)