Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2017/10/12 18:46:01 UTC

[jira] [Created] (SPARK-22267) Spark SQL incorrectly reads ORC file when column order is different

Dongjoon Hyun created SPARK-22267:
-------------------------------------

             Summary: Spark SQL incorrectly reads ORC file when column order is different
                 Key: SPARK-22267
                 URL: https://issues.apache.org/jira/browse/SPARK-22267
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0, 2.1.0, 2.0.2, 1.6.3
            Reporter: Dongjoon Hyun


For a long time, Apache Spark SQL has returned incorrect results when the column order of the physical ORC files differs from the metastore schema: the Hive ORC reader maps columns by position rather than by name, so values end up under the wrong column names.

{code}
scala> Seq(1 -> 2).toDF("c1", "c2").write.format("parquet").mode("overwrite").save("/tmp/p")
scala> Seq(1 -> 2).toDF("c1", "c2").write.format("orc").mode("overwrite").save("/tmp/o")
scala> sql("CREATE EXTERNAL TABLE p(c2 INT, c1 INT) STORED AS parquet LOCATION '/tmp/p'")
scala> sql("CREATE EXTERNAL TABLE o(c2 INT, c1 INT) STORED AS orc LOCATION '/tmp/o'")
scala> spark.table("p").show  // Parquet is good.
+---+---+
| c2| c1|
+---+---+
|  2|  1|
+---+---+
scala> spark.table("o").show    // This is wrong.
+---+---+
| c2| c1|
+---+---+
|  1|  2|
+---+---+
scala> spark.read.orc("/tmp/o").show  // Reading the ORC files directly is correct.
+---+---+
| c1| c2|
+---+---+
|  1|  2|
+---+---+
{code}
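
Until this is fixed, one workaround is to bypass the metastore table and read the ORC files directly, then project the columns by name into the order the table declares. This is only a sketch based on the reproduction above (same /tmp/o path and c1/c2 columns); since spark.read.orc already resolves the file schema correctly, the select is an ordinary name-based projection.

{code}
scala> // Workaround sketch: read the files directly (correct per the
scala> // reproduction above) and project the metastore column order by name.
scala> spark.read.orc("/tmp/o").select("c2", "c1").show
+---+---+
| c2| c1|
+---+---+
|  2|  1|
+---+---+
{code}

Setting spark.sql.hive.convertMetastoreOrc=true (so Hive ORC tables go through Spark's data source ORC reader instead of the Hive SerDe) might also avoid the positional mapping, but that is an assumption and has not been verified on the affected versions.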


