Posted to issues@spark.apache.org by "xsys (Jira)" <ji...@apache.org> on 2022/10/03 00:43:00 UTC

[jira] [Created] (SPARK-40637) DataFrame can correctly encode BINARY type but SparkSQL cannot

xsys created SPARK-40637:
----------------------------

             Summary: DataFrame can correctly encode BINARY type but SparkSQL cannot
                 Key: SPARK-40637
                 URL: https://issues.apache.org/jira/browse/SPARK-40637
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.1
            Reporter: xsys


h3. Describe the bug

Storing a BINARY value (e.g. {{BigInt("1").toByteArray}} / {{X'01'}}) through a DataFrame in {{spark-shell}} displays {{[01]}}. However, the value does not encode correctly when it is inserted into a BINARY column of a table via {{spark-sql}}.
h3. To Reproduce

On Spark 3.2.1 (commit {{4f25b3f712}}), using {{spark-shell}}:

 
{code:java}
$SPARK_HOME/bin/spark-shell{code}
 

Execute the following:
{code:java}
scala> import org.apache.spark.sql.Row
scala> import org.apache.spark.sql.types._
scala> val rdd = sc.parallelize(Seq(Row(BigInt("1").toByteArray)))
rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = ParallelCollectionRDD[356] at parallelize at <console>:28
scala> val schema = new StructType().add(StructField("c1", BinaryType, true))
schema: org.apache.spark.sql.types.StructType = StructType(StructField(c1,BinaryType,true))
scala> val df = spark.createDataFrame(rdd, schema)
df: org.apache.spark.sql.DataFrame = [c1: binary]
scala> df.show(false)
+----+
|c1  |
+----+
|[01]|
+----+
{code}
Using {{spark-sql}}:
{code:java}
$SPARK_HOME/bin/spark-sql{code}
Execute the following. The query reports one fetched row, but the output appears empty:
{code:java}
spark-sql> create table binary_vals(c1 BINARY) stored as ORC;
spark-sql> insert into binary_vals select X'01';
spark-sql> select * from binary_vals;
Time taken: 0.077 seconds, Fetched 1 row(s)
{code}
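One possible reading of the empty-looking output, not confirmed in this report, is that the two interfaces format the same byte differently: a hex rendering gives {{[01]}}, while decoding the byte as text gives a non-printable control character. The plain Scala sketch below only illustrates that difference. The helper names {{asHex}} and {{asText}} are hypothetical and are not claimed to be what Spark uses internally.
{code:scala}
import java.nio.charset.StandardCharsets

// The same single byte used in the spark-shell reproduction above.
val bytes: Array[Byte] = BigInt("1").toByteArray

// Hypothetical helper: render each byte as two hex digits inside brackets.
def asHex(b: Array[Byte]): String =
  b.map(x => "%02X".format(x)).mkString("[", " ", "]")

// Hypothetical helper: decode the raw bytes as UTF-8 text.
def asText(b: Array[Byte]): String =
  new String(b, StandardCharsets.UTF_8)

println(asHex(bytes))                       // prints [01]
println(asText(bytes).length)               // prints 1
println(asText(bytes).exists(_.isControl))  // prints true
{code}
If this is what happens, the row would hold a value that a terminal simply does not show, which would need to be checked against the bytes actually stored in the table.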
 
h3. Expected behavior

We expect the two Spark interfaces ({{spark-sql}} & {{spark-shell}}) to behave consistently for the same data type ({{BINARY}}) & input ({{BigInt("1").toByteArray}} / {{X'01'}}) combination.
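
To compare the two interfaces independently of how each one formats binary values, the stored bytes could be inspected with the built-in {{hex}} and {{length}} SQL functions. This is a minimal sketch for {{spark-shell}}. It assumes the {{binary_vals}} table created in the {{spark-sql}} steps is visible to the shell session (for example, both were started from the same directory and share the same metastore), which this report does not state.
{code:scala}
// Run inside spark-shell, where `spark` is the predefined SparkSession.
// Assumption: binary_vals from the spark-sql steps is visible to this session.
val stored = spark.sql(
  "SELECT hex(c1) AS c1_hex, length(c1) AS c1_len FROM binary_vals")

// c1_hex is the hexadecimal form of the stored bytes, c1_len the byte count.
stored.show(false)
{code}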


