Posted to commits@orc.apache.org by do...@apache.org on 2021/08/06 20:37:35 UTC
[orc] 02/02: ORC-912: Exclude Spark transitive avro/parquet
dependency from Spark benchmark (#818)
This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-1.7
in repository https://gitbox.apache.org/repos/asf/orc.git
commit 0d59352e551ca80b81d8683c5a36b1b69c3ac46c
Author: Dongjoon Hyun <do...@apache.org>
AuthorDate: Thu Aug 5 21:27:25 2021 -0700
ORC-912: Exclude Spark transitive avro/parquet dependency from Spark benchmark (#818)
### What changes were proposed in this pull request?
This PR aims to exclude Spark's transitive avro/parquet dependencies and some duplicate META-INF files from the Spark benchmark uber jar.
### Why are the changes needed?
The Spark benchmark already excludes Spark's transitive ORC dependencies so that it measures the performance of the ORC code in our branches.
In the same way, we should exclude Spark's transitive Parquet/Avro dependencies and the duplicated META-INF files.
### How was this patch tested?
Manual.
```
// Run on the benchmark data prepared according to the README.md
$ java -jar spark/target/orc-benchmarks-spark-1.8.0-SNAPSHOT.jar spark data
# JMH version: 1.20
# VM version: JDK 1.8.0_292, VM 25.292-b09
# VM invoker: ***
# VM options: -server -Xms256m -Xmx2g -Dbench.root.dir=/Users/dongjoon/data/orc
# Warmup: 2 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.apache.orc.bench.spark.SparkBenchmark.fullRead
# Parameters: (compression = none, dataset = taxi, format = orc)
# Run progress: 0.00% complete, ETA 01:34:30
# Fork: 1 of 1
# Warmup Iteration 1: [WARN ] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Records: 22773249
Invocations: 1
io: 81
Bytes: 1333069106
10393911.120 us/op
...
```
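Beyond running the benchmark, one way to spot-check the effect of the exclusions is to list the entries of the built uber jar and look for paths that should no longer be there. The following is only a minimal sketch, not part of this patch: `find_entries` is a hypothetical helper, and the path prefixes in the usage comment assume that `avro-mapred` and `parquet-column` ship their classes under `org/apache/avro/mapred/` and `org/apache/parquet/column/`.

```python
import zipfile


def find_entries(jar_path, prefixes):
    """Return the sorted jar entry names that start with any of the given prefixes.

    A jar is a zip archive, so the standard zipfile module can list it.
    An empty result means no entry under those prefixes was packaged.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name for name in jar.namelist() if name.startswith(tuple(prefixes))
        )


# Example usage (the prefixes are assumptions, see the note above):
#   find_entries(
#       "spark/target/orc-benchmarks-spark-1.8.0-SNAPSHOT.jar",
#       ["org/apache/avro/mapred/", "org/apache/parquet/column/"],
#   )
```

The helper only reports what is inside the jar. It does not tell which dependency contributed an entry, so a non-empty result would still need to be traced with the Maven dependency tree.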
(cherry picked from commit e4e5324f1aef8ac30381586066fe85ae95e00954)
Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
java/bench/pom.xml | 8 ++++++++
java/bench/spark/pom.xml | 13 +++++--------
2 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/java/bench/pom.xml b/java/bench/pom.xml
index c73cfef..41e8749 100644
--- a/java/bench/pom.xml
+++ b/java/bench/pom.xml
@@ -371,6 +371,10 @@
<version>${spark.version}</version>
<exclusions>
<exclusion>
+ <groupId>org.apache.avro</groupId>
+ <artifactId>avro-mapred</artifactId>
+ </exclusion>
+ <exclusion>
<groupId>org.glassfish.hk2.external</groupId>
<artifactId>aopalliance-repackaged</artifactId>
</exclusion>
@@ -401,6 +405,10 @@
<groupId>org.apache.orc</groupId>
<artifactId>orc-mapreduce</artifactId>
</exclusion>
+ <exclusion>
+ <groupId>org.apache.parquet</groupId>
+ <artifactId>parquet-column</artifactId>
+ </exclusion>
</exclusions>
</dependency>
<dependency>
diff --git a/java/bench/spark/pom.xml b/java/bench/spark/pom.xml
index 8bed284..4ab651e 100644
--- a/java/bench/spark/pom.xml
+++ b/java/bench/spark/pom.xml
@@ -173,15 +173,12 @@
<shadedClassifierName>shaded</shadedClassifierName>
<filters>
<filter>
- <artifact>org.codehaus.janino:janino</artifact>
- <excludes>
- <exclude>META-INF/DUMMY.SF</exclude>
- <exclude>META-INF/DUMMY.DSA</exclude>
- </excludes>
- </filter>
- <filter>
- <artifact>org.codehaus.janino:commons-compiler</artifact>
+ <artifact>*:*</artifact>
<excludes>
+ <exclude>META-INF/MANIFEST.MF</exclude>
+ <exclude>META-INF/DEPENDENCIES</exclude>
+ <exclude>META-INF/LICENSE</exclude>
+ <exclude>META-INF/NOTICE</exclude>
<exclude>META-INF/DUMMY.SF</exclude>
<exclude>META-INF/DUMMY.DSA</exclude>
</excludes>