Posted to commits@spark.apache.org by li...@apache.org on 2019/04/19 15:59:25 UTC
[spark] branch master updated: [SPARK-27176][FOLLOW-UP][SQL]
Upgrade Hive parquet to 1.10.1 for hadoop-3.2
This is an automated email from the ASF dual-hosted git repository.
lixiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 777b450 [SPARK-27176][FOLLOW-UP][SQL] Upgrade Hive parquet to 1.10.1 for hadoop-3.2
777b450 is described below
commit 777b4502b206b7240c6655d3c0b0a9ce08f6a09c
Author: Yuming Wang <yu...@ebay.com>
AuthorDate: Fri Apr 19 08:59:08 2019 -0700
[SPARK-27176][FOLLOW-UP][SQL] Upgrade Hive parquet to 1.10.1 for hadoop-3.2
## What changes were proposed in this pull request?
When we compile and test with Hadoop 3.2, we hit the following two issues:
1. `JobSummaryLevel` is not a member of object `org.apache.parquet.hadoop.ParquetOutputFormat`. Fixed by [PARQUET-381](https://issues.apache.org/jira/browse/PARQUET-381) (Parquet 1.9.0)
2. `java.lang.NoSuchFieldError: BROTLI`
at `org.apache.parquet.hadoop.metadata.CompressionCodecName.<clinit>(CompressionCodecName.java:31)`. Fixed by [PARQUET-1143](https://issues.apache.org/jira/browse/PARQUET-1143) (Parquet 1.10.0)
The reason is that the `parquet-hadoop-bundle-1.8.1.jar` conflicts with Parquet 1.10.1.
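To make the failure mode concrete, here is a minimal, self-contained sketch (illustrative only, not part of this commit): code compiled against Parquet 1.10.x references the `BROTLI` constant of `CompressionCodecName`, and if the 1.8.1 bundle shadows that class at runtime the constant is missing. `CodecProbe` and `hasEnumConstant` are hypothetical names, and a JDK enum stands in for Parquet's enum so the example runs without Parquet on the classpath:

```java
import java.util.Arrays;

public class CodecProbe {
    /** True if the enum class loaded at runtime defines the named constant. */
    static boolean hasEnumConstant(Class<? extends Enum<?>> cls, String name) {
        return Arrays.stream(cls.getEnumConstants())
                     .anyMatch(c -> c.name().equals(name));
    }

    public static void main(String[] args) {
        // With a pre-1.10 Parquet winning the classpath race, the analogous
        // check on CompressionCodecName would report false for "BROTLI",
        // and a direct field reference would throw NoSuchFieldError.
        System.out.println(hasEnumConstant(java.time.DayOfWeek.class, "MONDAY")); // true
        System.out.println(hasEnumConstant(java.time.DayOfWeek.class, "BROTLI")); // false
    }
}
```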
I think it would be safe to upgrade Hive's Parquet to 1.10.1 to work around this issue.
This is what Hive did when upgrading Parquet from 1.8.1 to 1.10.0: [HIVE-17000](https://issues.apache.org/jira/browse/HIVE-17000) and [HIVE-19464](https://issues.apache.org/jira/browse/HIVE-19464). We can see that all changes are related to vectorization, and vectorization is disabled by default: see [HIVE-14826](https://issues.apache.org/jira/browse/HIVE-14826) and [HiveConf.java#L2723](https://github.com/apache/hive/blob/rel/release-2.3.4/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2723).
This PR removes [parquet-hadoop-bundle-1.8.1.jar](https://github.com/apache/parquet-mr/tree/master/parquet-hadoop-bundle), so the Hive SerDe will use [parquet-common-1.10.1.jar, parquet-column-1.10.1.jar and parquet-hadoop-1.10.1.jar](https://github.com/apache/spark/blob/master/dev/deps/spark-deps-hadoop-3.2#L185-L189).
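As an illustration of the mechanism (not part of the commit): `hive.parquet.scope` is an ordinary Maven property, so a build inheriting this pom could in principle put the bundle back on the runtime classpath by overriding it. A hypothetical child-pom fragment:

```xml
<!-- Hypothetical override in a child pom: restore compile scope for
     Hive's parquet-hadoop-bundle (this commit defaults the property
     to "provided" when Hive 2.3 is used). -->
<properties>
  <hive.parquet.scope>compile</hive.parquet.scope>
</properties>
```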
## How was this patch tested?
1. manual tests
2. [upgrade Hive Parquet to 1.10.1 and run Hadoop 3.2 test on jenkins](https://github.com/apache/spark/pull/24044#commits-pushed-0c3f962)
Closes #24346 from wangyum/SPARK-27176.
Authored-by: Yuming Wang <yu...@ebay.com>
Signed-off-by: gatorsmile <ga...@gmail.com>
---
dev/deps/spark-deps-hadoop-3.2 | 1 -
pom.xml | 8 +++++---
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/dev/deps/spark-deps-hadoop-3.2 b/dev/deps/spark-deps-hadoop-3.2
index a45f02d..8b3bd79 100644
--- a/dev/deps/spark-deps-hadoop-3.2
+++ b/dev/deps/spark-deps-hadoop-3.2
@@ -187,7 +187,6 @@ parquet-common-1.10.1.jar
parquet-encoding-1.10.1.jar
parquet-format-2.4.0.jar
parquet-hadoop-1.10.1.jar
-parquet-hadoop-bundle-1.8.1.jar
parquet-jackson-1.10.1.jar
protobuf-java-2.5.0.jar
py4j-0.10.8.1.jar
diff --git a/pom.xml b/pom.xml
index fce4cbd..5879a76 100644
--- a/pom.xml
+++ b/pom.xml
@@ -221,6 +221,7 @@
-->
<hadoop.deps.scope>compile</hadoop.deps.scope>
<hive.deps.scope>compile</hive.deps.scope>
+ <hive.parquet.scope>${hive.deps.scope}</hive.parquet.scope>
<orc.deps.scope>compile</orc.deps.scope>
<parquet.deps.scope>compile</parquet.deps.scope>
<parquet.test.deps.scope>test</parquet.test.deps.scope>
@@ -2004,7 +2005,7 @@
<groupId>${hive.parquet.group}</groupId>
<artifactId>parquet-hadoop-bundle</artifactId>
<version>${hive.parquet.version}</version>
- <scope>compile</scope>
+ <scope>${hive.parquet.scope}</scope>
</dependency>
<dependency>
<groupId>org.codehaus.janino</groupId>
@@ -2818,8 +2819,9 @@
<hive.classifier>core</hive.classifier>
<hive.version>${hive23.version}</hive.version>
<hive.version.short>2.3.4</hive.version.short>
- <hive.parquet.group>org.apache.parquet</hive.parquet.group>
- <hive.parquet.version>1.8.1</hive.parquet.version>
+ <!-- Do not need parquet-hadoop-bundle because we already have
+ parquet-common, parquet-column and parquet-hadoop -->
+ <hive.parquet.scope>provided</hive.parquet.scope>
<orc.classifier></orc.classifier>
<datanucleus-core.version>4.1.17</datanucleus-core.version>
</properties>
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org