Posted to commits@orc.apache.org by wi...@apache.org on 2021/08/06 04:27:31 UTC

[orc] branch main updated: ORC-912: Exclude Spark transitive avro/parquet dependency from Spark benchmark (#818)

This is an automated email from the ASF dual-hosted git repository.

william pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/orc.git


The following commit(s) were added to refs/heads/main by this push:
     new e4e5324  ORC-912: Exclude Spark transitive avro/parquet dependency from Spark benchmark (#818)
e4e5324 is described below

commit e4e5324f1aef8ac30381586066fe85ae95e00954
Author: Dongjoon Hyun <do...@apache.org>
AuthorDate: Thu Aug 5 21:27:25 2021 -0700

    ORC-912: Exclude Spark transitive avro/parquet dependency from Spark benchmark (#818)
    
    ### What changes were proposed in this pull request?
    
    This PR aims to exclude Spark's transitive avro/parquet dependencies and some duplicate META-INF files from the Spark benchmark uber jar.
    
    ### Why are the changes needed?
    
    The Spark benchmark excludes Spark's transitive ORC dependencies in order to measure the performance of our branches.
    In the same way, we should exclude Spark's transitive Parquet/Avro dependencies and some duplicated files.
    
    ### How was this patch tested?
    
    Manual.
    
    ```
    // Run on the benchmark data prepared according to the README.md
    $ java -jar spark/target/orc-benchmarks-spark-1.8.0-SNAPSHOT.jar spark data
    # JMH version: 1.20
    # VM version: JDK 1.8.0_292, VM 25.292-b09
    # VM invoker: ***
    # VM options: -server -Xms256m -Xmx2g -Dbench.root.dir=/Users/dongjoon/data/orc
    # Warmup: 2 iterations, 10 s each
    # Measurement: 5 iterations, 10 s each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: org.apache.orc.bench.spark.SparkBenchmark.fullRead
    # Parameters: (compression = none, dataset = taxi, format = orc)
    
    # Run progress: 0.00% complete, ETA 01:34:30
    # Fork: 1 of 1
    # Warmup Iteration   1: [WARN ] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    
    Records: 22773249
    Invocations: 1
    io: 81
    Bytes: 1333069106
    10393911.120 us/op
    ...
    ```
---
 java/bench/pom.xml       |  8 ++++++++
 java/bench/spark/pom.xml | 13 +++++--------
 2 files changed, 13 insertions(+), 8 deletions(-)
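
The commit was tested manually by running the benchmark. As a complementary check, the contents of the shaded jar can be inspected for classes from the excluded artifacts. The following is a minimal sketch and not part of the ORC build: the helper name `find_unwanted_entries` is hypothetical, and the package prefixes are assumptions about where `avro-mapred` and `parquet-column` keep their classes.

```python
import zipfile

# Assumed package prefixes for the artifacts excluded in this commit
# (avro-mapred and parquet-column). Adjust them if the artifacts differ.
UNWANTED_PREFIXES = (
    "org/apache/avro/mapred/",
    "org/apache/parquet/column/",
)


def find_unwanted_entries(jar_path, prefixes=UNWANTED_PREFIXES):
    """Return the sorted jar entries whose path starts with any given prefix.

    Hypothetical helper: a jar is a zip archive, so the standard zipfile
    module is enough to list its entries without unpacking them.
    """
    prefixes = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name for name in jar.namelist() if name.startswith(prefixes)
        )
```

Calling `find_unwanted_entries("spark/target/orc-benchmarks-spark-1.8.0-SNAPSHOT.jar")` after a build lists any matching entries. An empty list suggests the exclusions took effect for the assumed prefixes. It does not rule out Avro or Parquet classes pulled in through other artifacts.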

diff --git a/java/bench/pom.xml b/java/bench/pom.xml
index a383c3d..0a54c4b 100644
--- a/java/bench/pom.xml
+++ b/java/bench/pom.xml
@@ -371,6 +371,10 @@
         <version>${spark.version}</version>
 	<exclusions>
 	  <exclusion>
+	    <groupId>org.apache.avro</groupId>
+	    <artifactId>avro-mapred</artifactId>
+	  </exclusion>
+	  <exclusion>
 	    <groupId>org.glassfish.hk2.external</groupId>
 	    <artifactId>aopalliance-repackaged</artifactId>
 	  </exclusion>
@@ -401,6 +405,10 @@
             <groupId>org.apache.orc</groupId>
             <artifactId>orc-mapreduce</artifactId>
           </exclusion>
+          <exclusion>
+            <groupId>org.apache.parquet</groupId>
+            <artifactId>parquet-column</artifactId>
+          </exclusion>
         </exclusions>
       </dependency>
       <dependency>
diff --git a/java/bench/spark/pom.xml b/java/bench/spark/pom.xml
index 02e94b6..40dfa69 100644
--- a/java/bench/spark/pom.xml
+++ b/java/bench/spark/pom.xml
@@ -173,15 +173,12 @@
               <shadedClassifierName>shaded</shadedClassifierName>
               <filters>
                 <filter>
-                  <artifact>org.codehaus.janino:janino</artifact>
-                  <excludes>
-                    <exclude>META-INF/DUMMY.SF</exclude>
-                    <exclude>META-INF/DUMMY.DSA</exclude>
-                  </excludes>
-                </filter>
-                <filter>
-                  <artifact>org.codehaus.janino:commons-compiler</artifact>
+                  <artifact>*:*</artifact>
                   <excludes>
+                    <exclude>META-INF/MANIFEST.MF</exclude>
+                    <exclude>META-INF/DEPENDENCIES</exclude>
+                    <exclude>META-INF/LICENSE</exclude>
+                    <exclude>META-INF/NOTICE</exclude>
                     <exclude>META-INF/DUMMY.SF</exclude>
                     <exclude>META-INF/DUMMY.DSA</exclude>
                   </excludes>