Posted to reviews@spark.apache.org by "pan3793 (via GitHub)" <gi...@apache.org> on 2023/08/04 04:14:32 UTC

[GitHub] [spark] pan3793 opened a new pull request, #42336: [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension

pan3793 opened a new pull request, #42336:
URL: https://github.com/apache/spark/pull/42336

   
   ### What changes were proposed in this pull request?
   Add file extensions for Parquet/ORC files written using Hive Serde, to keep behavior consistent with Spark DataSource implementation.
   
   ### Why are the changes needed?
   
   ```
   bin/spark-sql \
     --conf spark.sql.hive.convertMetastoreOrc=false \
     --conf spark.sql.hive.convertMetastoreParquet=false
   
   CREATE DATABASE test;
   USE test;
   
   CREATE TABLE hive_parquet (id INT, name STRING) STORED AS parquet;
   CREATE TABLE ds_parquet   (id INT, name STRING) USING parquet;
   CREATE TABLE hive_orc     (id INT, name STRING) STORED AS orc;
   CREATE TABLE ds_orc       (id INT, name STRING) USING orc;
   
   INSERT OVERWRITE hive_parquet VALUES(1, 'one');
   INSERT OVERWRITE ds_parquet   VALUES(1, 'one');
   INSERT OVERWRITE hive_orc     VALUES(1, 'one');
   INSERT OVERWRITE ds_orc       VALUES(1, 'one');
   ```
   
   ```
   ➜  test.db tree
   .
   ├── ds_orc
   │   ├── _SUCCESS
   │   └── part-00000-583d43f5-e57b-40e3-b2fd-a140ccb90a8e-c000.snappy.orc
   ├── ds_parquet
   │   ├── _SUCCESS
   │   └── part-00000-740e0249-c090-4240-90ed-b4e170dd8899-c000.snappy.parquet
   ├── hive_orc
   │   └── part-00000-5a481e57-caf3-471c-9cf3-0ec26e94e7a3-c000
   └── hive_parquet
       └── part-00000-27470d2e-4a74-41a5-bcc0-3b5bc872a233-c000
   
   5 directories, 6 files
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   
   ```
   bin/spark-sql \
     --conf spark.sql.hive.convertMetastoreOrc=false \
     --conf spark.sql.hive.convertMetastoreParquet=false
   
   CREATE DATABASE spark_44669;
   USE spark_44669;
   
   CREATE TABLE hive_parquet (id INT, name STRING) STORED AS parquet;
   CREATE TABLE ds_parquet   (id INT, name STRING) USING parquet;
   CREATE TABLE hive_orc     (id INT, name STRING) STORED AS orc;
   CREATE TABLE ds_orc       (id INT, name STRING) USING orc;
   
   INSERT OVERWRITE hive_parquet VALUES(1, 'one');
   INSERT OVERWRITE ds_parquet   VALUES(1, 'one');
   INSERT OVERWRITE hive_orc     VALUES(1, 'one');
   INSERT OVERWRITE ds_orc       VALUES(1, 'one');
   ```
   
   ```
   ➜  spark_44669.db tree
   .
   ├── ds_orc
   │   ├── _SUCCESS
   │   └── part-00000-3d48ba2e-61df-4259-9e20-202d5530965b-c000.snappy.orc
   ├── ds_parquet
   │   ├── _SUCCESS
   │   └── part-00000-86b05b4b-d49a-40ec-898d-bfa7fb527e81-c000.snappy.parquet
   ├── hive_orc
   │   └── part-00000-8294f959-4fce-446d-b336-2200aea12324-c000.orc
   └── hive_parquet
       └── part-00000-68b20c75-17ed-45cf-a5ea-043a8e10e432-c000.snappy.parquet
   
   5 directories, 6 files
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] pan3793 commented on pull request #42336: [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.

pan3793 commented on PR #42336:
URL: https://github.com/apache/spark/pull/42336#issuecomment-1665860195

   Let me supply my use case to help the reviewer evaluate the benefit of this change.
   
   Internally, most of our Spark jobs write Parquet/ORC files using Hive Serde (for compatibility in environments where multiple compute engines are used together). Recently, we have been working on promoting the zstd compression algorithm across the data warehouse, and one blocker is that we cannot identify the compression algorithm of Parquet/ORC files without reading the files.
   
   For large HDFS clusters, which may store over a billion files, opening each file and reading its metadata to check the compression algorithm is quite costly. With this change, we can dump all file names from the NameNode and simply check the file names to find out.
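
As a rough illustration of the name-based check described above, the sketch below classifies a file from its name alone, without opening it. The function name `classify_file` and the labels `unspecified` and `unknown` are invented for this example. It assumes file names follow the `part-<task>-<uuid>-c000[.<codec>].<format>` pattern shown in the PR description.

```shell
# Hypothetical helper (not part of this PR): print "<format> <codec>" for a
# data file, using only its name.
classify_file() {
  name=${1##*/}                  # strip any directory prefix
  case "$name" in
    *.parquet) format=parquet ;;
    *.orc)     format=orc ;;
    *)         echo "unknown unknown"; return 0 ;;  # no recognizable extension
  esac
  stem=${name%.*}                # drop the format extension
  case "$stem" in
    *.*) codec=${stem##*.} ;;    # text after the last remaining dot
    *)   codec=unspecified ;;    # name carries a format but no codec
  esac
  echo "$format $codec"
}
```

Given a listing of paths on standard input (how that listing is produced is outside this sketch), something like `while IFS= read -r p; do classify_file "$p"; done | sort | uniq -c` would tally files per format and codec. Files written without an extension, as Hive Serde output is today, all land in the `unknown unknown` bucket.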



[GitHub] [spark] melihsozdinler commented on a diff in pull request #42336: [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension

Posted by "melihsozdinler (via GitHub)" <gi...@apache.org>.

melihsozdinler commented on code in PR #42336:
URL: https://github.com/apache/spark/pull/42336#discussion_r1285819479


##########
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala:
##########
@@ -122,6 +130,23 @@ class HiveFileFormat(fileSinkConf: FileSinkDesc)
       case _ => true
     }
   }
+
+  private def configureFileExtension(conf: Configuration): Unit = {
+    fileSinkConf.getTableInfo.getOutputFileFormatClassName match {
+      case "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"
+        if SQLConf.get.getConf(HiveUtils.HIVE_PARQUET_FILE_EXTENSION_ENABLED) =>
+        conf.set("hive.output.file.extension",
+          CodecConfig.from(new JobConf(conf)).getCodec.getExtension + ".parquet")

Review Comment:
   `.parquet` and `.orc` are repeated in many places as magic strings; classes like OrcUtils and ParquetUtils could provide the file extension through a new function.




[GitHub] [spark] pan3793 commented on pull request #42336: [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.

pan3793 commented on PR #42336:
URL: https://github.com/apache/spark/pull/42336#issuecomment-1667169264

   @wangyum I see your point: the table property takes higher priority than the Spark session configuration. But that does not fully solve the problem.
   
   The zstd promotion is rolling out gradually. Usually we only change the table properties so that future data is affected and don't touch the existing data at first, and it's hard to get file-level statistics for the whole cluster, as explained above. Let's say we change the property of table `X` on July 1st: all files written after that use zstd, all earlier files use snappy, and none of the file names contain the compression algorithm.
   
   This PR gives users an easy way to observe that. I believe it will benefit users who want to promote a new compression algorithm while writing data via Hive Serde.



[GitHub] [spark] wangyum commented on pull request #42336: [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension

Posted by "wangyum (via GitHub)" <gi...@apache.org>.

wangyum commented on PR #42336:
URL: https://github.com/apache/spark/pull/42336#issuecomment-1665294859

   Hive also doesn't add a file extension:
   ```sql
   hive> CREATE TABLE hive_orc     (id INT, name STRING) STORED AS orc;
   hive> insert into hive_orc   select 1, 'one';
   hive> dfs -ls file:/tmp/yumwang/hivelocal/warehouse/hive_orc;
   Found 1 items
   -rwxr-xr-x   1 yumwang wheel        283 2023-08-04 17:11 file:///tmp/yumwang/hivelocal/warehouse/hive_orc/000000_0
   ```



[GitHub] [spark] dongjoon-hyun commented on pull request #42336: [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.

dongjoon-hyun commented on PR #42336:
URL: https://github.com/apache/spark/pull/42336#issuecomment-1665806432

   To be honest, I believe this PR doesn't offer enough benefit (reason) to change Apache Spark's existing behavior, @pan3793.



[GitHub] [spark] yaooqinn commented on pull request #42336: [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.

yaooqinn commented on PR #42336:
URL: https://github.com/apache/spark/pull/42336#issuecomment-1665340358

   IIUC, Hive Serde differs from datasource tables naturally and is more likely to follow the rules on the Hive side.



[GitHub] [spark] pan3793 commented on pull request #42336: [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.

pan3793 commented on PR #42336:
URL: https://github.com/apache/spark/pull/42336#issuecomment-1665362216

   @wangyum @yaooqinn I agree that we should follow the Hive behavior as much as possible. Meanwhile, Spark also aims to reduce the differences between DS and Hive. As you can see, the file name pattern is not the same as Hive's but resembles DS's.
   
   - written via Spark `hive_orc/part-00000-5a481e57-caf3-471c-9cf3-0ec26e94e7a3-c000`
   - written via Hive `hive_orc/000000_0`
   
   WDYT about adding a configuration for this feature, disabled by default?
   
   For the Parquet/ORC formats, the file name does not affect decoding, since the compression information is part of the metadata stored in the file content.
   
   Given that DS's file names make it much easier for administrators to identify the format and compression codec, I would like Spark to have the same ability for Hive Serde.
   



[GitHub] [spark] pan3793 commented on pull request #42336: [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.

pan3793 commented on PR #42336:
URL: https://github.com/apache/spark/pull/42336#issuecomment-1665801271

   @dongjoon-hyun I understand your concerns, but as shown above, the file name currently written via Spark's Hive Serde path is not the same as what Hive produces; it's half like DS and half like Hive. So I guess the original author intended to reduce the difference between DS and Hive?
   
   Given that, do you think we can accept this change and make this feature configurable?



[GitHub] [spark] melihsozdinler commented on a diff in pull request #42336: [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension

Posted by "melihsozdinler (via GitHub)" <gi...@apache.org>.

melihsozdinler commented on code in PR #42336:
URL: https://github.com/apache/spark/pull/42336#discussion_r1286682752


##########
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala:
##########
@@ -122,6 +130,23 @@ class HiveFileFormat(fileSinkConf: FileSinkDesc)
       case _ => true
     }
   }
+
+  private def configureFileExtension(conf: Configuration): Unit = {
+    fileSinkConf.getTableInfo.getOutputFileFormatClassName match {
+      case "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"
+        if SQLConf.get.getConf(HiveUtils.HIVE_PARQUET_FILE_EXTENSION_ENABLED) =>
+        conf.set("hive.output.file.extension",
+          CodecConfig.from(new JobConf(conf)).getCodec.getExtension + ".parquet")

Review Comment:
   thanks




Re: [PR] [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] closed pull request #42336: [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension
URL: https://github.com/apache/spark/pull/42336



[GitHub] [spark] dongjoon-hyun commented on pull request #42336: [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.

dongjoon-hyun commented on PR #42336:
URL: https://github.com/apache/spark/pull/42336#issuecomment-1665767868

   I also agree with @wangyum and @yaooqinn . IIRC, we keep the existing behavior for that reason exactly.



[GitHub] [spark] pan3793 commented on a diff in pull request #42336: [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.

pan3793 commented on code in PR #42336:
URL: https://github.com/apache/spark/pull/42336#discussion_r1286544346


##########
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala:
##########
@@ -122,6 +130,23 @@ class HiveFileFormat(fileSinkConf: FileSinkDesc)
       case _ => true
     }
   }
+
+  private def configureFileExtension(conf: Configuration): Unit = {
+    fileSinkConf.getTableInfo.getOutputFileFormatClassName match {
+      case "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"
+        if SQLConf.get.getConf(HiveUtils.HIVE_PARQUET_FILE_EXTENSION_ENABLED) =>
+        conf.set("hive.output.file.extension",
+          CodecConfig.from(new JobConf(conf)).getCodec.getExtension + ".parquet")

Review Comment:
   @melihsozdinler thanks for the suggestion, I'd love to do such refactoring as long as the community intends to accept this PR.




[GitHub] [spark] wangyum commented on pull request #42336: [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension

Posted by "wangyum (via GitHub)" <gi...@apache.org>.

wangyum commented on PR #42336:
URL: https://github.com/apache/spark/pull/42336#issuecomment-1666333752

   Could you create the table with table properties?
   ```sql
   CREATE TABLE hive_orc (id INT, name STRING) STORED AS orc 
   TBLPROPERTIES('orc.compress'='zstd');
   ```



[GitHub] [spark] pan3793 commented on pull request #42336: [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.

pan3793 commented on PR #42336:
URL: https://github.com/apache/spark/pull/42336#issuecomment-1664948047

   cc @wangyum @ulysses-you @yaooqinn 



Re: [PR] [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] commented on PR #42336:
URL: https://github.com/apache/spark/pull/42336#issuecomment-1815534282

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

