You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by dongjoon-hyun <gi...@git.apache.org> on 2017/10/11 03:54:01 UTC

[GitHub] spark pull request #19470: [SPARK-18355][SQL] Use Spark schema to read ORC t...

GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/19470

    [SPARK-18355][SQL] Use Spark schema to read ORC table instead of ORC file schema

    ## What changes were proposed in this pull request?
    
    Before Hive 2.0, ORC File schema has invalid column names like `_col1` and `col2`. This PR ignores ORC File schema and use Spark schema.
    
    ## How was this patch tested?
    
    Pass the newly added test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-18355

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19470.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19470
    
----
commit d11ce09d07f36f39c4a8a92c4f0ff0f19a35a724
Author: Dongjoon Hyun <do...@apache.org>
Date:   2017-10-11T02:12:31Z

    [SPARK-18355][SQL] Use Spark schema to read ORC table instead of ORC file schema

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19470#discussion_r144465575
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
    @@ -2050,4 +2050,60 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
           }
         }
       }
    +
    +  test("SPARK-18355 Use Spark schema to read ORC table instead of ORC file schema") {
    --- End diff --
    
    since it depends on the CONVERT_METASTORE_XXX conf, maybe also test parquet.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use ...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19470#discussion_r144466637
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
    @@ -2050,4 +2050,60 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
           }
         }
       }
    +
    +  test("SPARK-18355 Use Spark schema to read ORC table instead of ORC file schema") {
    --- End diff --
    
    Yep. I'll add `parquet`, too.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-18355][SQL] Use Spark schema to read ORC table in...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82620/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-18355][SQL] Use Spark schema to read...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    **[Test build #82701 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82701/testReport)** for PR 19470 at commit [`8ac1acf`](https://github.com/apache/spark/commit/8ac1acfad74eba27a7123e04ca14682bff59a20b).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-18355][SQL] Use Spark schema to read...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82701/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    Retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19470: [SPARK-14387][SPARK-18355][SQL] Use Spark schema ...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19470#discussion_r144387023
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
    @@ -2050,4 +2050,80 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
           }
         }
       }
    +
    +  test("SPARK-18355 Use Spark schema to read ORC table instead of ORC file schema") {
    +    val client = spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog].client
    +
    +    Seq("true", "false").foreach { value =>
    +      withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> value) {
    +        withTempDatabase { db =>
    +          client.runSqlHive(
    +            s"""
    +               |CREATE TABLE $db.t(
    +               |  click_id string,
    +               |  search_id string,
    +               |  uid bigint)
    +               |PARTITIONED BY (
    +               |  ts string,
    +               |  hour string)
    +               |STORED AS ORC
    +             """.stripMargin)
    +
    +          client.runSqlHive(
    +            s"""
    +               |INSERT INTO TABLE $db.t
    +               |PARTITION (ts = '98765', hour = '01')
    +               |VALUES (12, 2, 12345)
    +             """.stripMargin
    +          )
    +
    +          checkAnswer(
    +            sql(s"SELECT * FROM $db.t"),
    +            Row("12", "2", 12345, "98765", "01"))
    +
    +          client.runSqlHive(s"ALTER TABLE $db.t ADD COLUMNS (dummy string)")
    +
    +          checkAnswer(
    +            sql(s"SELECT click_id, search_id FROM $db.t"),
    +            Row("12", "2"))
    +
    +          checkAnswer(
    +            sql(s"SELECT search_id, click_id FROM $db.t"),
    +            Row("2", "12"))
    +
    +          checkAnswer(
    +            sql(s"SELECT search_id FROM $db.t"),
    +            Row("2"))
    +
    +          checkAnswer(
    +            sql(s"SELECT dummy, click_id FROM $db.t"),
    +            Row(null, "12"))
    +
    +          checkAnswer(
    +            sql(s"SELECT * FROM $db.t"),
    +            Row("12", "2", 12345, null, "98765", "01"))
    +        }
    +      }
    +    }
    +  }
    +
    +  // This test case is added to prevent regression.
    +  test("SPARK-22267 Spark SQL incorrectly reads ORC files when column order is different") {
    --- End diff --
    
    This is added to prevent regression according to your request, @gatorsmile ~


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-18355][SQL] Use Spark schema to read...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    Hi, @gatorsmile and @cloud-fan .
    Could you review this PR?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19470: [SPARK-14387][SPARK-18355][SQL] Use Spark schema ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19470#discussion_r144457235
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
    @@ -138,8 +138,7 @@ class OrcFileFormat extends FileFormat with DataSourceRegister with Serializable
           if (maybePhysicalSchema.isEmpty) {
    --- End diff --
    
    nit
    ```
    val isEmptyFile = OrcFileOperator.readSchema(Seq(file.filePath), Some(conf)).isEmpty
    if (isEmptyFile) {
      ...
    } else ...
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-18355][SQL] Use Spark schema to read...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    Hi, @gatorsmile .
    - **For mismatched column orders**: [SPARK-22267](https://issues.apache.org/jira/browse/SPARK-22267) shows the current status of ORC.
    - **For mismatched column types**: Parquet also does not support.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-18355][SQL] Use Spark schema to read...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    Ya, that was my question, too.
    - What kind of difference does Spark support, especially in ORC? Apache Spark only supports HiveFileFormat so far, not old OrcFileFormat.
    - In addition, there is no Schema Merging. Randomly (usually the bigging ORC file?), the first correct ORC file schema is used now. For old ORC cases, those are meaningless like `_colX`. For me, HiveMetastore schema is the only valid one in Apache Spark.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    thanks, merging to master/2.2!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use ...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19470


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19470: [SPARK-14387][SPARK-18355][SQL] Use Spark schema ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19470#discussion_r144457977
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
    @@ -2050,4 +2050,80 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
           }
         }
       }
    +
    +  test("SPARK-18355 Use Spark schema to read ORC table instead of ORC file schema") {
    +    val client = spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog].client
    +
    +    Seq("true", "false").foreach { value =>
    +      withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> value) {
    +        withTempDatabase { db =>
    +          client.runSqlHive(
    +            s"""
    +               |CREATE TABLE $db.t(
    +               |  click_id string,
    +               |  search_id string,
    +               |  uid bigint)
    +               |PARTITIONED BY (
    +               |  ts string,
    +               |  hour string)
    +               |STORED AS ORC
    +             """.stripMargin)
    +
    +          client.runSqlHive(
    +            s"""
    +               |INSERT INTO TABLE $db.t
    +               |PARTITION (ts = '98765', hour = '01')
    +               |VALUES (12, 2, 12345)
    +             """.stripMargin
    +          )
    +
    +          checkAnswer(
    +            sql(s"SELECT * FROM $db.t"),
    +            Row("12", "2", 12345, "98765", "01"))
    +
    +          client.runSqlHive(s"ALTER TABLE $db.t ADD COLUMNS (dummy string)")
    +
    +          checkAnswer(
    +            sql(s"SELECT click_id, search_id FROM $db.t"),
    +            Row("12", "2"))
    +
    +          checkAnswer(
    +            sql(s"SELECT search_id, click_id FROM $db.t"),
    +            Row("2", "12"))
    +
    +          checkAnswer(
    +            sql(s"SELECT search_id FROM $db.t"),
    +            Row("2"))
    +
    +          checkAnswer(
    +            sql(s"SELECT dummy, click_id FROM $db.t"),
    +            Row(null, "12"))
    +
    +          checkAnswer(
    +            sql(s"SELECT * FROM $db.t"),
    --- End diff --
    
    ditto


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-18355][SQL] Use Spark schema to read ORC table in...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    **[Test build #82620 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82620/testReport)** for PR 19470 at commit [`d11ce09`](https://github.com/apache/spark/commit/d11ce09d07f36f39c4a8a92c4f0ff0f19a35a724).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19470#discussion_r144469314
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
    @@ -2050,4 +2050,64 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
           }
         }
       }
    +
    +  Seq("orc", "parquet").foreach { format =>
    +    test(s"SPARK-18355 Read data from a hive table with a new column - $format") {
    +      val client = spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog].client
    +
    +      Seq("true", "false").foreach { value =>
    +        withSQLConf(
    +          HiveUtils.CONVERT_METASTORE_ORC.key -> value,
    +          HiveUtils.CONVERT_METASTORE_PARQUET.key -> value) {
    --- End diff --
    
    As you separate orc and parquet to two test in fact, maybe you just need to test against one config at one time, i.e., orc -> HiveUtils.CONVERT_METASTORE_ORC, parquet -> HiveUtils.CONVERT_METASTORE_PARQUET.key.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82718/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-18355][SQL] Use Spark schema to read...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19470: [SPARK-14387][SPARK-18355][SQL] Use Spark schema ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19470#discussion_r144457932
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
    @@ -2050,4 +2050,80 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
           }
         }
       }
    +
    +  test("SPARK-18355 Use Spark schema to read ORC table instead of ORC file schema") {
    +    val client = spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog].client
    +
    +    Seq("true", "false").foreach { value =>
    +      withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> value) {
    +        withTempDatabase { db =>
    +          client.runSqlHive(
    +            s"""
    +               |CREATE TABLE $db.t(
    +               |  click_id string,
    +               |  search_id string,
    +               |  uid bigint)
    +               |PARTITIONED BY (
    +               |  ts string,
    +               |  hour string)
    +               |STORED AS ORC
    +             """.stripMargin)
    +
    +          client.runSqlHive(
    +            s"""
    +               |INSERT INTO TABLE $db.t
    +               |PARTITION (ts = '98765', hour = '01')
    +               |VALUES (12, 2, 12345)
    +             """.stripMargin
    +          )
    +
    +          checkAnswer(
    +            sql(s"SELECT * FROM $db.t"),
    --- End diff --
    
    please list all columns here instead of `*`, to make the test more clear


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-18355][SQL] Use Spark schema to read...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    I remember we previously hit multiple issues due to the schema difference between the actual orc-file schema and the metastore schema. Just ensure it still exists. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    Thank you so much, @cloud-fan , @gatorsmile , and @viirya !


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    **[Test build #82724 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82724/testReport)** for PR 19470 at commit [`8e7fe9b`](https://github.com/apache/spark/commit/8e7fe9b9e3fc6121686caad45dcfdb4ff08f0c4a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19470: [SPARK-14387][SPARK-18355][SQL] Use Spark schema ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19470#discussion_r144457796
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
    @@ -272,25 +272,35 @@ private[orc] object OrcRelation extends HiveInspectors {
       def unwrapOrcStructs(
           conf: Configuration,
           dataSchema: StructType,
    +      requiredSchema: StructType,
           maybeStructOI: Option[StructObjectInspector],
           iterator: Iterator[Writable]): Iterator[InternalRow] = {
         val deserializer = new OrcSerde
    -    val mutableRow = new SpecificInternalRow(dataSchema.map(_.dataType))
    -    val unsafeProjection = UnsafeProjection.create(dataSchema)
    +    val mutableRow = new SpecificInternalRow(requiredSchema.map(_.dataType))
    +    val unsafeProjection = UnsafeProjection.create(requiredSchema)
     
         def unwrap(oi: StructObjectInspector): Iterator[InternalRow] = {
    -      val (fieldRefs, fieldOrdinals) = dataSchema.zipWithIndex.map {
    -        case (field, ordinal) => oi.getStructFieldRef(field.name) -> ordinal
    +      val (fieldRefs, fieldOrdinals) = requiredSchema.zipWithIndex.map {
    +        case (field, ordinal) =>
    +          var ref = oi.getStructFieldRef(field.name)
    +          if (ref == null) {
    +            val maybeIndex = dataSchema.getFieldIndex(field.name)
    +            if (maybeIndex.isDefined) {
    +              ref = oi.getStructFieldRef("_col" + maybeIndex.get)
    +            }
    +          }
    +          ref -> ordinal
           }.unzip
     
    -      val unwrappers = fieldRefs.map(unwrapperFor)
    +      val unwrappers = fieldRefs.map(r => if (r == null) null else unwrapperFor(r))
     
           iterator.map { value =>
             val raw = deserializer.deserialize(value)
             var i = 0
             val length = fieldRefs.length
             while (i < length) {
    -          val fieldValue = oi.getStructFieldData(raw, fieldRefs(i))
    +          val fieldRef = fieldRefs(i)
    +          val fieldValue = if (fieldRef == null) null else oi.getStructFieldData(raw, fieldRefs(i))
               if (fieldValue == null) {
    --- End diff --
    
    nit:
    ```
    if (fieldRef == null) {
      row.setNull...
    } else {
      val fieldValue = ...
      ...
    }
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    BTW, @cloud-fan . Could you review #18460 , too? I think we need your final approval. :)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82724/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-18355][SQL] Use Spark schema to read...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    LGTM except some minor comments


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-18355][SQL] Use Spark schema to read...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    @cloud-fan . Thank you so much for review!
    I updated the PR except one: If`fieldValue` is `null`, we also use `setNull` again in `else`. So, the current one is simpler.
    ```scala
    if (fieldRef == null) {
      row.setNull...
    } else {
      val fieldValue = ...
      ...
    }
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-18355][SQL] Use Spark schema to read ORC table in...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    **[Test build #82620 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82620/testReport)** for PR 19470 at commit [`d11ce09`](https://github.com/apache/spark/commit/d11ce09d07f36f39c4a8a92c4f0ff0f19a35a724).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82720/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19470: [SPARK-14387][SPARK-18355][SQL] Use Spark schema ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19470#discussion_r144457357
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
    @@ -138,8 +138,7 @@ class OrcFileFormat extends FileFormat with DataSourceRegister with Serializable
           if (maybePhysicalSchema.isEmpty) {
             Iterator.empty
           } else {
    -        val physicalSchema = maybePhysicalSchema.get
    -        OrcRelation.setRequiredColumns(conf, physicalSchema, requiredSchema)
    +        OrcRelation.setRequiredColumns(conf, dataSchema, requiredSchema)
    --- End diff --
    
    does it work? seems here we lie to the orc reader about the physical schema.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-18355][SQL] Use Spark schema to read...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    **[Test build #82701 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82701/testReport)** for PR 19470 at commit [`8ac1acf`](https://github.com/apache/spark/commit/8ac1acfad74eba27a7123e04ca14682bff59a20b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-18355][SQL] Use Spark schema to read ORC table in...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19470: [SPARK-14387][SPARK-18355][SQL] Use Spark schema ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19470#discussion_r144458108
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
    @@ -2050,4 +2050,80 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
           }
         }
       }
    +
    +  test("SPARK-18355 Use Spark schema to read ORC table instead of ORC file schema") {
    +    val client = spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog].client
    +
    +    Seq("true", "false").foreach { value =>
    +      withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> value) {
    +        withTempDatabase { db =>
    +          client.runSqlHive(
    +            s"""
    +               |CREATE TABLE $db.t(
    +               |  click_id string,
    +               |  search_id string,
    +               |  uid bigint)
    +               |PARTITIONED BY (
    +               |  ts string,
    +               |  hour string)
    +               |STORED AS ORC
    +             """.stripMargin)
    +
    +          client.runSqlHive(
    +            s"""
    +               |INSERT INTO TABLE $db.t
    +               |PARTITION (ts = '98765', hour = '01')
    +               |VALUES (12, 2, 12345)
    +             """.stripMargin
    +          )
    +
    +          checkAnswer(
    +            sql(s"SELECT * FROM $db.t"),
    +            Row("12", "2", 12345, "98765", "01"))
    +
    +          client.runSqlHive(s"ALTER TABLE $db.t ADD COLUMNS (dummy string)")
    +
    +          checkAnswer(
    +            sql(s"SELECT click_id, search_id FROM $db.t"),
    +            Row("12", "2"))
    +
    +          checkAnswer(
    +            sql(s"SELECT search_id, click_id FROM $db.t"),
    +            Row("2", "12"))
    +
    +          checkAnswer(
    +            sql(s"SELECT search_id FROM $db.t"),
    +            Row("2"))
    +
    +          checkAnswer(
    +            sql(s"SELECT dummy, click_id FROM $db.t"),
    +            Row(null, "12"))
    +
    +          checkAnswer(
    +            sql(s"SELECT * FROM $db.t"),
    +            Row("12", "2", 12345, null, "98765", "01"))
    +        }
    +      }
    +    }
    +  }
    +
    +  // This test case is added to prevent regression.
    +  test("SPARK-22267 Spark SQL incorrectly reads ORC files when column order is different") {
    --- End diff --
    
    it's weird to have a test verifying a bug, I think it's good enough to have a JIRA tracking this bug.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    LGTM, pending jenkins


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-18355][SQL] Use Spark schema to read...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    To be clear, I'll file another JIRA about ORC status on mismatched column orders.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-18355][SQL] Use Spark schema to read...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    Could you create test cases with the different schemas between files and hive metastore.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    One minor comment doesn't affect this. LGTM.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    **[Test build #82724 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82724/testReport)** for PR 19470 at commit [`8e7fe9b`](https://github.com/apache/spark/commit/8e7fe9b9e3fc6121686caad45dcfdb4ff08f0c4a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    **[Test build #82720 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82720/testReport)** for PR 19470 at commit [`8e7fe9b`](https://github.com/apache/spark/commit/8e7fe9b9e3fc6121686caad45dcfdb4ff08f0c4a).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    **[Test build #82718 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82718/testReport)** for PR 19470 at commit [`ef2123e`](https://github.com/apache/spark/commit/ef2123ecc516fce6feb2a4abe051b4ef862c51a0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-18355][SQL] Use Spark schema to read...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    Thank you for review, @gatorsmile .
    Sure. I assume that you want to check the regression here. Could you tell me the degree of difference?
    
    Here, this PR is focusing on missing-columns scenario after `ADD COLUMNS`. This is a typical use cases by customers.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    **[Test build #82720 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82720/testReport)** for PR 19470 at commit [`8e7fe9b`](https://github.com/apache/spark/commit/8e7fe9b9e3fc6121686caad45dcfdb4ff08f0c4a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19470#discussion_r144465324
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
    @@ -2050,4 +2050,60 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
           }
         }
       }
    +
    +  test("SPARK-18355 Use Spark schema to read ORC table instead of ORC file schema") {
    --- End diff --
    
    Improve the test case for checking the other formats?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-18355][SQL] Use Spark schema to read...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    **[Test build #82718 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82718/testReport)** for PR 19470 at commit [`ef2123e`](https://github.com/apache/spark/commit/ef2123ecc516fce6feb2a4abe051b4ef862c51a0).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19470: [SPARK-14387][SPARK-18355][SQL] Use Spark schema ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19470#discussion_r144457479
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
    @@ -272,25 +272,35 @@ private[orc] object OrcRelation extends HiveInspectors {
       def unwrapOrcStructs(
           conf: Configuration,
           dataSchema: StructType,
    +      requiredSchema: StructType,
           maybeStructOI: Option[StructObjectInspector],
           iterator: Iterator[Writable]): Iterator[InternalRow] = {
         val deserializer = new OrcSerde
    -    val mutableRow = new SpecificInternalRow(dataSchema.map(_.dataType))
    -    val unsafeProjection = UnsafeProjection.create(dataSchema)
    +    val mutableRow = new SpecificInternalRow(requiredSchema.map(_.dataType))
    +    val unsafeProjection = UnsafeProjection.create(requiredSchema)
     
         def unwrap(oi: StructObjectInspector): Iterator[InternalRow] = {
    -      val (fieldRefs, fieldOrdinals) = dataSchema.zipWithIndex.map {
    -        case (field, ordinal) => oi.getStructFieldRef(field.name) -> ordinal
    +      val (fieldRefs, fieldOrdinals) = requiredSchema.zipWithIndex.map {
    +        case (field, ordinal) =>
    +          var ref = oi.getStructFieldRef(field.name)
    +          if (ref == null) {
    +            val maybeIndex = dataSchema.getFieldIndex(field.name)
    --- End diff --
    
    the `requiredSchema` is guaranteed to be contained in the `dataSchema`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19470: [SPARK-14387][SPARK-18355][SQL] Use Spark schema ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19470#discussion_r144457643
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
    @@ -138,8 +138,7 @@ class OrcFileFormat extends FileFormat with DataSourceRegister with Serializable
           if (maybePhysicalSchema.isEmpty) {
             Iterator.empty
           } else {
    -        val physicalSchema = maybePhysicalSchema.get
    -        OrcRelation.setRequiredColumns(conf, physicalSchema, requiredSchema)
    +        OrcRelation.setRequiredColumns(conf, dataSchema, requiredSchema)
    --- End diff --
    
    oh i see, we only need to pass the required column indices to orc reader.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use ...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19470#discussion_r144472506
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
    @@ -2050,4 +2050,64 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
           }
         }
       }
    +
    +  Seq("orc", "parquet").foreach { format =>
    +    test(s"SPARK-18355 Read data from a hive table with a new column - $format") {
    +      val client = spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog].client
    +
    +      Seq("true", "false").foreach { value =>
    +        withSQLConf(
    +          HiveUtils.CONVERT_METASTORE_ORC.key -> value,
    +          HiveUtils.CONVERT_METASTORE_PARQUET.key -> value) {
    --- End diff --
    
    Thank you for review, @viirya . For that, yes, we can, but that will be a little-bit overkill.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    Now, it's passed again. :)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    LGTM too.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19470: [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark s...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/19470
  
    R failure seems to be irrelevant.
    ```
    [error] running /home/jenkins/workspace/SparkPullRequestBuilder/R/run-tests.sh ; process was terminated by signal 9
    Attempting to post to Github...
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org