Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2016/07/25 09:50:20 UTC

[jira] [Commented] (SPARK-16704) Union does not work for column with array byte

    [ https://issues.apache.org/jira/browse/SPARK-16704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391620#comment-15391620 ] 

Dongjoon Hyun commented on SPARK-16704:
---------------------------------------

Hi, [~jiunnjye].
It seems you are reporting on Spark 1.6.0. Could you test that on 1.6.2 or 2.0.0? It seems to work for me in the current master branch, as shown below.
{code}
scala> import java.nio.charset.StandardCharsets
scala> Seq("12".getBytes(StandardCharsets.UTF_8)).toDF("a").write.parquet("/tmp/t1")
scala> Seq("34".getBytes(StandardCharsets.UTF_8)).toDF("b").write.parquet("/tmp/t2")
scala> val df1 = spark.read.parquet("/tmp/t1")
df1: org.apache.spark.sql.DataFrame = [a: binary]
scala> val df2 = spark.read.parquet("/tmp/t2")
df2: org.apache.spark.sql.DataFrame = [b: binary]
scala> df1.createOrReplaceTempView("binary1")
scala> df2.createOrReplaceTempView("binary2")
scala> sql("SELECT a FROM binary1 UNION SELECT b FROM binary2").show()
+-------+
|      a|
+-------+
|[33 34]|
|[31 32]|
+-------+
{code}
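For reference, the DataFrame API goes through the same analysis and should resolve as well; union matches columns by position, and SQL UNION additionally deduplicates. A rough equivalent, continuing the session above (just a sketch, not re-run separately):
{code}
scala> // continuing from df1 and df2 above; union matches columns by position,
scala> // so the binary columns line up even though their names differ
scala> df1.union(df2).distinct().show()
{code}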

If this is not your scenario, please let me know. Also, it would be great if you could provide some sample code.
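For example, a self-contained snippet along these lines in spark-shell would be enough. (This is only a sketch against the 1.6 APIs, i.e. sqlContext and registerTempTable, with placeholder paths; I have not run it on 1.6.0 myself.)
{code}
scala> import java.nio.charset.StandardCharsets
scala> // the property mentioned in the report
scala> sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")
scala> // write two one-row parquet tables with a binary column each
scala> Seq(Tuple1("12".getBytes(StandardCharsets.UTF_8))).toDF("a").write.parquet("/tmp/t1")
scala> Seq(Tuple1("34".getBytes(StandardCharsets.UTF_8))).toDF("b").write.parquet("/tmp/t2")
scala> sqlContext.read.parquet("/tmp/t1").registerTempTable("binary1")
scala> sqlContext.read.parquet("/tmp/t2").registerTempTable("binary2")
scala> // the failing query from the report
scala> sqlContext.sql("SELECT a FROM binary1 UNION SELECT b FROM binary2").show()
{code}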

> Union does not work for column with array byte 
> -----------------------------------------------
>
>                 Key: SPARK-16704
>                 URL: https://issues.apache.org/jira/browse/SPARK-16704
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Ng Jiunn Jye
>
> When unioning two queries on columns with an array-of-bytes (binary) datatype, the Spark query fails with an exception.
> Example:
> select binaryColumn from tableA
>  union
> select binaryColumn from tableB
> Note that the Spark property "spark.sql.parquet.binaryAsString" is set to true.
> org.apache.spark.sql.AnalysisException: unresolved operator 'Union;
>         at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38) ~[iop-spark-client.spark-catalyst_2.11-1.6.0.jar:1.6.0]
>         at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44) ~[iop-spark-client.spark-catalyst_2.11-1.6.0.jar:1.6.0]
>         at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:203) ~[iop-spark-client.spark-catalyst_2.11-1.6.0.jar:1.6.0]
>         at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50) ~[iop-spark-client.spark-catalyst_2.11-1.6.0.jar:1.6.0]
>         at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:105) ~[iop-spark-client.spark-catalyst_2.11-1.6.0.jar:1.6.0]
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:104) ~[iop-spark-client.spark-catalyst_2.11-1.6.0.jar:1.6.0]
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:104) ~[iop-spark-client.spark-catalyst_2.11-1.6.0.jar:1.6.0]
>         at scala.collection.immutable.List.foreach(List.scala:381) ~[org.scala-lang.scala-library-2.11.8.jar:na]
>         at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:104) ~[iop-spark-client.spark-catalyst_2.11-1.6.0.jar:1.6.0]
>         at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50) ~[iop-spark-client.spark-catalyst_2.11-1.6.0.jar:1.6.0]
>         at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:44) ~[iop-spark-client.spark-catalyst_2.11-1.6.0.jar:1.6.0]
>         at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34) ~[iop-spark-client.spark-sql_2.11-1.6.0.jar:1.6.0]
>         at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133) ~[iop-spark-client.spark-sql_2.11-1.6.0.jar:1.6.0]
>         at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52) ~[iop-spark-client.spark-sql_2.11-1.6.0.jar:1.6.0]
>         at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817) ~[iop-spark-client.spark-sql_2.11-1.6.0.jar:1.6.0]


