
Posted to reviews@spark.apache.org by maropu <gi...@git.apache.org> on 2018/05/10 04:48:03 UTC

[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

GitHub user maropu opened a pull request:

    https://github.com/apache/spark/pull/21288

    [SPARK-24206][SQL] Improve FilterPushdownBenchmark benchmark code

    ## What changes were proposed in this pull request?
    This PR adds benchmark code to `FilterPushdownBenchmark` for string pushdown and updates the performance results.
    
    ## How was this patch tested?
    N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark UpdateParquetBenchmark

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21288.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21288
    
----
commit 223bf2008abfe5fd41c3b5e741dc525ab3864977
Author: Takeshi Yamamuro <ya...@...>
Date:   2018-05-03T00:17:21Z

    Fix

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r190382044
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
    @@ -0,0 +1,437 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import java.io.File
    +
    +import scala.util.{Random, Try}
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.sql.{DataFrame, SparkSession}
    +import org.apache.spark.sql.functions.monotonically_increasing_id
    +import org.apache.spark.sql.internal.SQLConf
    +import org.apache.spark.util.{Benchmark, Utils}
    +
    +
    +/**
    + * Benchmark to measure read performance with Filter pushdown.
    + * To run this:
    + *  spark-submit --class <this class> <spark sql test jar>
    + */
    +object FilterPushdownBenchmark {
    +  val conf = new SparkConf()
    +    .setAppName("FilterPushdownBenchmark")
    +    .setIfMissing("spark.master", "local[1]")
    +    .setIfMissing("spark.driver.memory", "3g")
    +    .setIfMissing("spark.executor.memory", "3g")
    +    .setIfMissing("orc.compression", "snappy")
    +    .setIfMissing("spark.sql.parquet.compression.codec", "snappy")
    +
    +  private val spark = SparkSession.builder().config(conf).getOrCreate()
    +
    +  def withTempPath(f: File => Unit): Unit = {
    +    val path = Utils.createTempDir()
    +    path.delete()
    +    try f(path) finally Utils.deleteRecursively(path)
    +  }
    +
    +  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
    +    try f finally tableNames.foreach(spark.catalog.dropTempView)
    +  }
    +
    +  def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
    +    val (keys, values) = pairs.unzip
    +    val currentValues = keys.map(key => Try(spark.conf.get(key)).toOption)
    +    (keys, values).zipped.foreach(spark.conf.set)
    +    try f finally {
    +      keys.zip(currentValues).foreach {
    +        case (key, Some(value)) => spark.conf.set(key, value)
    +        case (key, None) => spark.conf.unset(key)
    +      }
    +    }
    +  }
    +
    +  private def prepareTable(
    +      dir: File, numRows: Int, width: Int, useStringForValue: Boolean): Unit = {
    +    import spark.implicits._
    +    val selectExpr = (1 to width).map(i => s"CAST(value AS STRING) c$i")
    +    val valueCol = if (useStringForValue) {
    +      monotonically_increasing_id().cast("string")
    +    } else {
    +      monotonically_increasing_id()
    +    }
    +    val df = spark.range(numRows).map(_ => Random.nextLong).selectExpr(selectExpr: _*)
    +      .withColumn("value", valueCol)
    +      .sort("value")
    +
    +    saveAsOrcTable(df, dir.getCanonicalPath + "/orc")
    +    saveAsParquetTable(df, dir.getCanonicalPath + "/parquet")
    +  }
    +
    +  private def prepareStringDictTable(
    +      dir: File, numRows: Int, numDistinctValues: Int, width: Int): Unit = {
    +    val selectExpr = (0 to width).map {
    +      case 0 => s"CAST(id % $numDistinctValues AS STRING) AS value"
    +      case i => s"CAST(rand() AS STRING) c$i"
    +    }
    +    val df = spark.range(numRows).selectExpr(selectExpr: _*).sort("value")
    +
    +    saveAsOrcTable(df, dir.getCanonicalPath + "/orc")
    +    saveAsParquetTable(df, dir.getCanonicalPath + "/parquet")
    +  }
    +
    +  private def saveAsOrcTable(df: DataFrame, dir: String): Unit = {
    +    df.write.mode("overwrite").orc(dir)
    +    spark.read.orc(dir).createOrReplaceTempView("orcTable")
    +  }
    +
    +  private def saveAsParquetTable(df: DataFrame, dir: String): Unit = {
    +    df.write.mode("overwrite").parquet(dir)
    +    spark.read.parquet(dir).createOrReplaceTempView("parquetTable")
    +  }
    +
    +  def filterPushDownBenchmark(
    +      values: Int,
    +      title: String,
    +      whereExpr: String,
    +      selectExpr: String = "*"): Unit = {
    +    val benchmark = new Benchmark(title, values, minNumIters = 5)
    +
    +    Seq(false, true).foreach { pushDownEnabled =>
    +      val name = s"Parquet Vectorized ${if (pushDownEnabled) s"(Pushdown)" else ""}"
    +      benchmark.addCase(name) { _ =>
    +        withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> s"$pushDownEnabled") {
    +          spark.sql(s"SELECT $selectExpr FROM parquetTable WHERE $whereExpr").collect()
    +        }
    +      }
    +    }
    +
    +    Seq(false, true).foreach { pushDownEnabled =>
    +      val name = s"Native ORC Vectorized ${if (pushDownEnabled) s"(Pushdown)" else ""}"
    +      benchmark.addCase(name) { _ =>
    +        withSQLConf(SQLConf.ORC_FILTER_PUSHDOWN_ENABLED.key -> s"$pushDownEnabled") {
    +          spark.sql(s"SELECT $selectExpr FROM orcTable WHERE $whereExpr").collect()
    +        }
    +      }
    +    }
    +
    +    /*
    +    Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
    +    Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8452 / 8504          1.9         537.3       1.0X
    +    Parquet Vectorized (Pushdown)                  274 /  281         57.3          17.4      30.8X
    +    Native ORC Vectorized                         8167 / 8185          1.9         519.3       1.0X
    +    Native ORC Vectorized (Pushdown)               365 /  379         43.1          23.2      23.1X
    +
    +
    +    Select 0 string row
    +    ('7864320' < value < '7864320'):         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8532 / 8564          1.8         542.4       1.0X
    +    Parquet Vectorized (Pushdown)                  366 /  386         43.0          23.3      23.3X
    +    Native ORC Vectorized                         8289 / 8300          1.9         527.0       1.0X
    +    Native ORC Vectorized (Pushdown)               378 /  385         41.6          24.0      22.6X
    +
    +
    +    Select 1 string row (value = '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8547 / 8564          1.8         543.4       1.0X
    +    Parquet Vectorized (Pushdown)                  351 /  356         44.9          22.3      24.4X
    +    Native ORC Vectorized                         8310 / 8323          1.9         528.3       1.0X
    +    Native ORC Vectorized (Pushdown)               370 /  375         42.5          23.5      23.1X
    +
    +
    +    Select 1 string row
    +    (value <=> '7864320'):                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8537 / 8563          1.8         542.8       1.0X
    +    Parquet Vectorized (Pushdown)                  310 /  319         50.7          19.7      27.5X
    +    Native ORC Vectorized                         8316 / 8335          1.9         528.7       1.0X
    +    Native ORC Vectorized (Pushdown)               364 /  367         43.2          23.1      23.5X
    +
    +
    +    Select 1 string row
    +    ('7864320' <= value <= '7864320'):       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8594 / 8607          1.8         546.4       1.0X
    +    Parquet Vectorized (Pushdown)                  370 /  374         42.5          23.5      23.2X
    +    Native ORC Vectorized                         8350 / 8358          1.9         530.9       1.0X
    +    Native ORC Vectorized (Pushdown)               371 /  374         42.4          23.6      23.2X
    +
    +
    +    Select all string rows
    +    (value IS NOT NULL):                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          19601 / 19625          0.8        1246.2       1.0X
    +    Parquet Vectorized (Pushdown)               19698 / 19703          0.8        1252.3       1.0X
    +    Native ORC Vectorized                       19435 / 19470          0.8        1235.6       1.0X
    +    Native ORC Vectorized (Pushdown)            19568 / 19590          0.8        1244.1       1.0X
    +
    +
    +    Select 0 int row (value IS NULL):        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7815 / 7824          2.0         496.9       1.0X
    +    Parquet Vectorized (Pushdown)                  245 /  251         64.2          15.6      31.9X
    +    Native ORC Vectorized                         7436 / 7460          2.1         472.8       1.1X
    +    Native ORC Vectorized (Pushdown)               344 /  351         45.7          21.9      22.7X
    +
    +
    +    Select 0 int row
    +    (7864320 < value < 7864320):             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7792 / 7807          2.0         495.4       1.0X
    +    Parquet Vectorized (Pushdown)                  349 /  353         45.1          22.2      22.3X
    +    Native ORC Vectorized                         7451 / 7465          2.1         473.7       1.0X
    +    Native ORC Vectorized (Pushdown)               365 /  368         43.0          23.2      21.3X
    +
    +
    +    Select 1 int row (value = 7864320):      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7836 / 7872          2.0         498.2       1.0X
    +    Parquet Vectorized (Pushdown)                  322 /  327         48.8          20.5      24.3X
    +    Native ORC Vectorized                         7533 / 7540          2.1         478.9       1.0X
    +    Native ORC Vectorized (Pushdown)               358 /  363         43.9          22.8      21.9X
    +
    +
    +    Select 1 int row (value <=> 7864320):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7855 / 7870          2.0         499.4       1.0X
    +    Parquet Vectorized (Pushdown)                  286 /  297         54.9          18.2      27.4X
    +    Native ORC Vectorized                         7511 / 7557          2.1         477.5       1.0X
    +    Native ORC Vectorized (Pushdown)               358 /  361         43.9          22.8      21.9X
    +
    +
    +    Select 1 int row
    +    (7864320 <= value <= 7864320):           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7851 / 7870          2.0         499.2       1.0X
    +    Parquet Vectorized (Pushdown)                  345 /  347         45.6          21.9      22.8X
    +    Native ORC Vectorized                         7543 / 7554          2.1         479.6       1.0X
    +    Native ORC Vectorized (Pushdown)               364 /  374         43.2          23.1      21.6X
    +
    +
    +    Select 1 int row
    +    (7864319 < value < 7864321):             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7837 / 7840          2.0         498.2       1.0X
    +    Parquet Vectorized (Pushdown)                  338 /  339         46.6          21.5      23.2X
    +    Native ORC Vectorized                         7524 / 7541          2.1         478.3       1.0X
    +    Native ORC Vectorized (Pushdown)               361 /  364         43.6          22.9      21.7X
    +
    +
    +    Select 10% int rows (value < 1572864):   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8864 / 8900          1.8         563.5       1.0X
    +    Parquet Vectorized (Pushdown)                 2088 / 2095          7.5         132.7       4.2X
    +    Native ORC Vectorized                         8562 / 8579          1.8         544.3       1.0X
    +    Native ORC Vectorized (Pushdown)              2127 / 2131          7.4         135.2       4.2X
    +
    +
    +    Select 50% int rows (value < 7864320):   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          12671 / 12684          1.2         805.6       1.0X
    +    Parquet Vectorized (Pushdown)                 9032 / 9041          1.7         574.2       1.4X
    +    Native ORC Vectorized                       12388 / 12411          1.3         787.6       1.0X
    +    Native ORC Vectorized (Pushdown)              8873 / 8884          1.8         564.1       1.4X
    +
    +
    +    Select 90% int rows (value < 14155776):  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          16481 / 16495          1.0        1047.8       1.0X
    +    Parquet Vectorized (Pushdown)               15906 / 15919          1.0        1011.3       1.0X
    +    Native ORC Vectorized                       16224 / 16254          1.0        1031.5       1.0X
    +    Native ORC Vectorized (Pushdown)            15632 / 15661          1.0         993.9       1.1X
    +
    +
    +    Select all int rows (value IS NOT NULL): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          17341 / 17354          0.9        1102.5       1.0X
    +    Parquet Vectorized (Pushdown)               17463 / 17481          0.9        1110.2       1.0X
    +    Native ORC Vectorized                       17073 / 17089          0.9        1085.4       1.0X
    +    Native ORC Vectorized (Pushdown)            17194 / 17232          0.9        1093.2       1.0X
    +
    +
    +    Select all int rows (value > -1):        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          17452 / 17467          0.9        1109.6       1.0X
    +    Parquet Vectorized (Pushdown)               17613 / 17630          0.9        1119.8       1.0X
    +    Native ORC Vectorized                       17259 / 17271          0.9        1097.3       1.0X
    +    Native ORC Vectorized (Pushdown)            17385 / 17429          0.9        1105.3       1.0X
    +
    +
    +    Select all int rows (value != -1):       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          17363 / 17372          0.9        1103.9       1.0X
    +    Parquet Vectorized (Pushdown)               17526 / 17535          0.9        1114.2       1.0X
    +    Native ORC Vectorized                       17052 / 17089          0.9        1084.2       1.0X
    +    Native ORC Vectorized (Pushdown)            17209 / 17229          0.9        1094.1       1.0X
    +
    +
    +    Select 0 distinct string row
    +    (value IS NULL):                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7697 / 7751          2.0         489.4       1.0X
    +    Parquet Vectorized (Pushdown)                  264 /  284         59.5          16.8      29.1X
    +    Native ORC Vectorized                         6942 / 6970          2.3         441.4       1.1X
    +    Native ORC Vectorized (Pushdown)               372 /  381         42.3          23.7      20.7X
    +
    +
    +    Select 0 distinct string row
    +    ('100' < value < '100'):                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7983 / 8018          2.0         507.5       1.0X
    +    Parquet Vectorized (Pushdown)                  334 /  337         47.0          21.3      23.9X
    +    Native ORC Vectorized                         7307 / 7313          2.2         464.5       1.1X
    +    Native ORC Vectorized (Pushdown)               363 /  371         43.3          23.1      22.0X
    +
    +
    +    Select 1 distinct string row
    +    (value = '100'):                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7882 / 7915          2.0         501.1       1.0X
    +    Parquet Vectorized (Pushdown)                  504 /  522         31.2          32.1      15.6X
    +    Native ORC Vectorized                         7143 / 7155          2.2         454.1       1.1X
    +    Native ORC Vectorized (Pushdown)               555 /  573         28.4          35.3      14.2X
    +
    +
    +    Select 1 distinct string row
    +    (value <=> '100'):                       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7898 / 7912          2.0         502.1       1.0X
    +    Parquet Vectorized (Pushdown)                  470 /  481         33.5          29.9      16.8X
    +    Native ORC Vectorized                         7135 / 7149          2.2         453.6       1.1X
    +    Native ORC Vectorized (Pushdown)               552 /  557         28.5          35.1      14.3X
    +
    +
    +    Select 1 distinct string row
    +    ('100' <= value <= '100'):               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8189 / 8213          1.9         520.7       1.0X
    +    Parquet Vectorized (Pushdown)                  527 /  534         29.9          33.5      15.5X
    +    Native ORC Vectorized                         7477 / 7498          2.1         475.3       1.1X
    +    Native ORC Vectorized (Pushdown)               558 /  566         28.2          35.5      14.7X
    +
    +
    +    Select all distinct string rows
    +    (value IS NOT NULL):                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          19462 / 19476          0.8        1237.4       1.0X
    +    Parquet Vectorized (Pushdown)               19570 / 19582          0.8        1244.2       1.0X
    +    Native ORC Vectorized                       18577 / 18604          0.8        1181.1       1.0X
    +    Native ORC Vectorized (Pushdown)            18701 / 18742          0.8        1189.0       1.0X
    +    */
    +    benchmark.run()
    +  }
    +
    +  private def runIntBenchmark(numRows: Int, width: Int, mid: Int): Unit = {
    +    Seq("value IS NULL", s"$mid < value AND value < $mid").foreach { whereExpr =>
    +      val title = s"Select 0 int row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    Seq(
    +      s"value = $mid",
    +      s"value <=> $mid",
    +      s"$mid <= value AND value <= $mid",
    +      s"${mid - 1} < value AND value < ${mid + 1}"
    +    ).foreach { whereExpr =>
    +      val title = s"Select 1 int row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    val selectExpr = (1 to width).map(i => s"MAX(c$i)").mkString("", ",", ", MAX(value)")
    +
    +    Seq(10, 50, 90).foreach { percent =>
    +      filterPushDownBenchmark(
    +        numRows,
    +        s"Select $percent% int rows (value < ${numRows * percent / 100})",
    +        s"value < ${numRows * percent / 100}",
    +        selectExpr
    +      )
    +    }
    +
    +    Seq("value IS NOT NULL", "value > -1", "value != -1").foreach { whereExpr =>
    +      filterPushDownBenchmark(
    +        numRows,
    +        s"Select all int rows ($whereExpr)",
    +        whereExpr,
    +        selectExpr)
    +    }
    +  }
    +
    +  private def runStringBenchmark(
    +      numRows: Int, width: Int, searchValue: Int, colType: String): Unit = {
    +    Seq("value IS NULL", s"'$searchValue' < value AND value < '$searchValue'")
    +        .foreach { whereExpr =>
    +      val title = s"Select 0 $colType row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    Seq(
    +      s"value = '$searchValue'",
    +      s"value <=> '$searchValue'",
    +      s"'$searchValue' <= value AND value <= '$searchValue'"
    +    ).foreach { whereExpr =>
    +      val title = s"Select 1 $colType row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    val selectExpr = (1 to width).map(i => s"MAX(c$i)").mkString("", ",", ", MAX(value)")
    +
    +    Seq("value IS NOT NULL").foreach { whereExpr =>
    +      filterPushDownBenchmark(
    +        numRows,
    +        s"Select all $colType rows ($whereExpr)",
    +        whereExpr,
    +        selectExpr)
    +    }
    +  }
    +
    +  def main(args: Array[String]): Unit = {
    +    val numRows = 1024 * 1024 * 15
    +    val width = 5
    +
    +    // Pushdown for many distinct value case
    +    withTempPath { dir =>
    +      val mid = numRows / 2
    +
    +      withTempTable("orcTable", "parquetTable") {
    +        Seq(true, false).foreach { useStringForValue =>
    +          prepareTable(dir, numRows, width, useStringForValue)
    +          if (useStringForValue) {
    +            runStringBenchmark(numRows, width, mid, "string")
    +          } else {
    +            runIntBenchmark(numRows, width, mid)
    +          }
    +        }
    +      }
    +    }
    +
    +    // Pushdown for few distinct value case (use dictionary encoding)
    --- End diff --
    
    Let us add a comment and also change the conf?


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    I've checked the metrics and found that GC happened in the case of `--driver-memory 3g`.
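
One way to confirm whether garbage collection occurs during a measured run is to read the JVM's collector counters before and after the block. The following is a minimal sketch, assuming the standard `java.lang.management` MXBeans; `GcProbe`, `gcTotals`, and `withGcStats` are hypothetical names for illustration and are not part of Spark or of this patch.

```scala
import java.lang.management.ManagementFactory

// Hypothetical helper (not part of Spark): reports how many GC cycles ran,
// and how much time they took, while a block of code executed.
object GcProbe {
  /** Sums (collection count, collection time in ms) over all JVM collectors. */
  def gcTotals(): (Long, Long) = {
    var count = 0L
    var timeMs = 0L
    val it = ManagementFactory.getGarbageCollectorMXBeans.iterator()
    while (it.hasNext) {
      val bean = it.next()
      // Both getters return -1 when the value is undefined for a collector.
      count += math.max(bean.getCollectionCount, 0L)
      timeMs += math.max(bean.getCollectionTime, 0L)
    }
    (count, timeMs)
  }

  /** Runs `body` and returns (result, GC cycles during body, GC time in ms during body). */
  def withGcStats[T](body: => T): (T, Long, Long) = {
    val (countBefore, timeBefore) = gcTotals()
    val result = body
    val (countAfter, timeAfter) = gcTotals()
    (result, countAfter - countBefore, timeAfter - timeBefore)
  }

  def main(args: Array[String]): Unit = {
    // Allocate 200 short-lived 1 MiB arrays to create some GC pressure.
    val (bytes, gcCount, gcTimeMs) = withGcStats {
      (1 to 200).map(_ => new Array[Byte](1 << 20).length.toLong).sum
    }
    println(s"allocated=$bytes bytes, gcCount=$gcCount, gcTimeMs=$gcTimeMs")
  }
}
```

Wrapping a benchmark case in such a probe under different `--driver-memory` settings would show whether the GC counters move during the timed section. The counters are process-wide, so collections triggered by other threads are included.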


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4112/


---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r191650406
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
    @@ -131,211 +132,214 @@ object FilterPushdownBenchmark {
         }
     
         /*
    +    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.26-46.32.amzn1.x86_64
         Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
         Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         ------------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            8452 / 8504          1.9         537.3       1.0X
    -    Parquet Vectorized (Pushdown)                  274 /  281         57.3          17.4      30.8X
    -    Native ORC Vectorized                         8167 / 8185          1.9         519.3       1.0X
    -    Native ORC Vectorized (Pushdown)               365 /  379         43.1          23.2      23.1X
    +    Parquet Vectorized                            2961 / 3123          5.3         188.3       1.0X
    +    Parquet Vectorized (Pushdown)                 3057 / 3121          5.1         194.4       1.0X
    --- End diff --
    
    How about 2.3?
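
For reading the tables in the diff: the derived columns appear to follow from the best time and the row count set in `main` (`1024 * 1024 * 15`). The sketch below reproduces that arithmetic under this assumption. `BenchmarkColumns` and its methods are hypothetical names, not Spark APIs. Because the tables print milliseconds rounded to integers, recomputed values can differ from the printed ones in the last digit.

```scala
import java.util.Locale

// Hypothetical helper (not part of Spark): recomputes the derived columns of
// the benchmark tables from the "Best" time in milliseconds.
object BenchmarkColumns {
  // Row count used in the benchmark's main method: 1024 * 1024 * 15.
  val numRows: Long = 1024L * 1024L * 15L

  /** Rate(M/s): millions of rows processed per second. */
  def rateMPerSec(bestMs: Double): Double = numRows / (bestMs * 1000.0)

  /** Per Row(ns): nanoseconds spent per row. */
  def perRowNs(bestMs: Double): Double = bestMs * 1e6 / numRows

  /** Relative: speedup against the first (baseline) case of the table. */
  def relative(baselineBestMs: Double, bestMs: Double): Double = baselineBestMs / bestMs

  /** Formats with one decimal place, independent of the default locale. */
  def fmt(x: Double): String = "%.1f".formatLocal(Locale.ROOT, x)

  def main(args: Array[String]): Unit = {
    // Best times (ms) of the two updated Parquet rows quoted in the diff.
    val baseline = 2961.0
    val pushdown = 3057.0
    Seq("Parquet Vectorized" -> baseline, "Parquet Vectorized (Pushdown)" -> pushdown)
      .foreach { case (name, bestMs) =>
        println(s"$name: rate=${fmt(rateMPerSec(bestMs))} M/s, " +
          s"perRow=${fmt(perRowNs(bestMs))} ns, " +
          s"relative=${fmt(relative(baseline, bestMs))}X")
      }
  }
}
```

For the two updated Parquet rows this gives 5.3 M/s and 188.3 ns for the baseline, and 5.1 M/s, 194.4 ns, and 1.0X for the pushdown case, matching the values in the diff.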


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #90440 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90440/testReport)** for PR 21288 at commit [`223bf20`](https://github.com/apache/spark/commit/223bf2008abfe5fd41c3b5e741dc525ab3864977).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #91795 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91795/testReport)** for PR 21288 at commit [`d41e689`](https://github.com/apache/spark/commit/d41e68914e00a7ba6734b3fdbe839b130fbbd42e).


---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r189638131
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
    @@ -0,0 +1,437 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import java.io.File
    +
    +import scala.util.{Random, Try}
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.sql.{DataFrame, SparkSession}
    +import org.apache.spark.sql.functions.monotonically_increasing_id
    +import org.apache.spark.sql.internal.SQLConf
    +import org.apache.spark.util.{Benchmark, Utils}
    +
    +
    +/**
    + * Benchmark to measure read performance with Filter pushdown.
    + * To run this:
    + *  spark-submit --class <this class> <spark sql test jar>
    + */
    +object FilterPushdownBenchmark {
    +  val conf = new SparkConf()
    +    .setAppName("FilterPushdownBenchmark")
    +    .setIfMissing("spark.master", "local[1]")
    +    .setIfMissing("spark.driver.memory", "3g")
    +    .setIfMissing("spark.executor.memory", "3g")
    +    .setIfMissing("orc.compression", "snappy")
    +    .setIfMissing("spark.sql.parquet.compression.codec", "snappy")
    +
    +  private val spark = SparkSession.builder().config(conf).getOrCreate()
    +
    +  def withTempPath(f: File => Unit): Unit = {
    +    val path = Utils.createTempDir()
    +    path.delete()
    +    try f(path) finally Utils.deleteRecursively(path)
    +  }
    +
    +  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
    +    try f finally tableNames.foreach(spark.catalog.dropTempView)
    +  }
    +
    +  def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
    +    val (keys, values) = pairs.unzip
    +    val currentValues = keys.map(key => Try(spark.conf.get(key)).toOption)
    +    (keys, values).zipped.foreach(spark.conf.set)
    +    try f finally {
    +      keys.zip(currentValues).foreach {
    +        case (key, Some(value)) => spark.conf.set(key, value)
    +        case (key, None) => spark.conf.unset(key)
    +      }
    +    }
    +  }
    +
    +  private def prepareTable(
    +      dir: File, numRows: Int, width: Int, useStringForValue: Boolean): Unit = {
    +    import spark.implicits._
    +    val selectExpr = (1 to width).map(i => s"CAST(value AS STRING) c$i")
    +    val valueCol = if (useStringForValue) {
    +      monotonically_increasing_id().cast("string")
    +    } else {
    +      monotonically_increasing_id()
    +    }
    +    val df = spark.range(numRows).map(_ => Random.nextLong).selectExpr(selectExpr: _*)
    +      .withColumn("value", valueCol)
    +      .sort("value")
    +
    +    saveAsOrcTable(df, dir.getCanonicalPath + "/orc")
    +    saveAsParquetTable(df, dir.getCanonicalPath + "/parquet")
    +  }
    +
    +  private def prepareStringDictTable(
    +      dir: File, numRows: Int, numDistinctValues: Int, width: Int): Unit = {
    +    val selectExpr = (0 to width).map {
    +      case 0 => s"CAST(id % $numDistinctValues AS STRING) AS value"
    +      case i => s"CAST(rand() AS STRING) c$i"
    +    }
    +    val df = spark.range(numRows).selectExpr(selectExpr: _*).sort("value")
    +
    +    saveAsOrcTable(df, dir.getCanonicalPath + "/orc")
    +    saveAsParquetTable(df, dir.getCanonicalPath + "/parquet")
    +  }
    +
    +  private def saveAsOrcTable(df: DataFrame, dir: String): Unit = {
    +    df.write.mode("overwrite").orc(dir)
    +    spark.read.orc(dir).createOrReplaceTempView("orcTable")
    +  }
    +
    +  private def saveAsParquetTable(df: DataFrame, dir: String): Unit = {
    +    df.write.mode("overwrite").parquet(dir)
    +    spark.read.parquet(dir).createOrReplaceTempView("parquetTable")
    +  }
    +
    +  def filterPushDownBenchmark(
    +      values: Int,
    +      title: String,
    +      whereExpr: String,
    +      selectExpr: String = "*"): Unit = {
    +    val benchmark = new Benchmark(title, values, minNumIters = 5)
    +
    +    Seq(false, true).foreach { pushDownEnabled =>
    +      val name = s"Parquet Vectorized ${if (pushDownEnabled) s"(Pushdown)" else ""}"
    +      benchmark.addCase(name) { _ =>
    +        withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> s"$pushDownEnabled") {
    +          spark.sql(s"SELECT $selectExpr FROM parquetTable WHERE $whereExpr").collect()
    +        }
    +      }
    +    }
    +
    +    Seq(false, true).foreach { pushDownEnabled =>
    +      val name = s"Native ORC Vectorized ${if (pushDownEnabled) s"(Pushdown)" else ""}"
    +      benchmark.addCase(name) { _ =>
    +        withSQLConf(SQLConf.ORC_FILTER_PUSHDOWN_ENABLED.key -> s"$pushDownEnabled") {
    +          spark.sql(s"SELECT $selectExpr FROM orcTable WHERE $whereExpr").collect()
    +        }
    +      }
    +    }
    +
    +    /*
    +    Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
    +    Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8452 / 8504          1.9         537.3       1.0X
    +    Parquet Vectorized (Pushdown)                  274 /  281         57.3          17.4      30.8X
    +    Native ORC Vectorized                         8167 / 8185          1.9         519.3       1.0X
    +    Native ORC Vectorized (Pushdown)               365 /  379         43.1          23.2      23.1X
    +
    +
    +    Select 0 string row
    +    ('7864320' < value < '7864320'):         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8532 / 8564          1.8         542.4       1.0X
    +    Parquet Vectorized (Pushdown)                  366 /  386         43.0          23.3      23.3X
    +    Native ORC Vectorized                         8289 / 8300          1.9         527.0       1.0X
    +    Native ORC Vectorized (Pushdown)               378 /  385         41.6          24.0      22.6X
    +
    +
    +    Select 1 string row (value = '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8547 / 8564          1.8         543.4       1.0X
    +    Parquet Vectorized (Pushdown)                  351 /  356         44.9          22.3      24.4X
    +    Native ORC Vectorized                         8310 / 8323          1.9         528.3       1.0X
    +    Native ORC Vectorized (Pushdown)               370 /  375         42.5          23.5      23.1X
    +
    +
    +    Select 1 string row
    +    (value <=> '7864320'):                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8537 / 8563          1.8         542.8       1.0X
    +    Parquet Vectorized (Pushdown)                  310 /  319         50.7          19.7      27.5X
    +    Native ORC Vectorized                         8316 / 8335          1.9         528.7       1.0X
    +    Native ORC Vectorized (Pushdown)               364 /  367         43.2          23.1      23.5X
    +
    +
    +    Select 1 string row
    +    ('7864320' <= value <= '7864320'):       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8594 / 8607          1.8         546.4       1.0X
    +    Parquet Vectorized (Pushdown)                  370 /  374         42.5          23.5      23.2X
    +    Native ORC Vectorized                         8350 / 8358          1.9         530.9       1.0X
    +    Native ORC Vectorized (Pushdown)               371 /  374         42.4          23.6      23.2X
    +
    +
    +    Select all string rows
    +    (value IS NOT NULL):                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          19601 / 19625          0.8        1246.2       1.0X
    +    Parquet Vectorized (Pushdown)               19698 / 19703          0.8        1252.3       1.0X
    +    Native ORC Vectorized                       19435 / 19470          0.8        1235.6       1.0X
    +    Native ORC Vectorized (Pushdown)            19568 / 19590          0.8        1244.1       1.0X
    +
    +
    +    Select 0 int row (value IS NULL):        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7815 / 7824          2.0         496.9       1.0X
    +    Parquet Vectorized (Pushdown)                  245 /  251         64.2          15.6      31.9X
    +    Native ORC Vectorized                         7436 / 7460          2.1         472.8       1.1X
    +    Native ORC Vectorized (Pushdown)               344 /  351         45.7          21.9      22.7X
    +
    +
    +    Select 0 int row
    +    (7864320 < value < 7864320):             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7792 / 7807          2.0         495.4       1.0X
    +    Parquet Vectorized (Pushdown)                  349 /  353         45.1          22.2      22.3X
    +    Native ORC Vectorized                         7451 / 7465          2.1         473.7       1.0X
    +    Native ORC Vectorized (Pushdown)               365 /  368         43.0          23.2      21.3X
    +
    +
    +    Select 1 int row (value = 7864320):      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7836 / 7872          2.0         498.2       1.0X
    +    Parquet Vectorized (Pushdown)                  322 /  327         48.8          20.5      24.3X
    +    Native ORC Vectorized                         7533 / 7540          2.1         478.9       1.0X
    +    Native ORC Vectorized (Pushdown)               358 /  363         43.9          22.8      21.9X
    +
    +
    +    Select 1 int row (value <=> 7864320):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7855 / 7870          2.0         499.4       1.0X
    +    Parquet Vectorized (Pushdown)                  286 /  297         54.9          18.2      27.4X
    +    Native ORC Vectorized                         7511 / 7557          2.1         477.5       1.0X
    +    Native ORC Vectorized (Pushdown)               358 /  361         43.9          22.8      21.9X
    +
    +
    +    Select 1 int row
    +    (7864320 <= value <= 7864320):           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7851 / 7870          2.0         499.2       1.0X
    +    Parquet Vectorized (Pushdown)                  345 /  347         45.6          21.9      22.8X
    +    Native ORC Vectorized                         7543 / 7554          2.1         479.6       1.0X
    +    Native ORC Vectorized (Pushdown)               364 /  374         43.2          23.1      21.6X
    +
    +
    +    Select 1 int row
    +    (7864319 < value < 7864321):             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7837 / 7840          2.0         498.2       1.0X
    +    Parquet Vectorized (Pushdown)                  338 /  339         46.6          21.5      23.2X
    +    Native ORC Vectorized                         7524 / 7541          2.1         478.3       1.0X
    +    Native ORC Vectorized (Pushdown)               361 /  364         43.6          22.9      21.7X
    +
    +
    +    Select 10% int rows (value < 1572864):   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8864 / 8900          1.8         563.5       1.0X
    +    Parquet Vectorized (Pushdown)                 2088 / 2095          7.5         132.7       4.2X
    +    Native ORC Vectorized                         8562 / 8579          1.8         544.3       1.0X
    +    Native ORC Vectorized (Pushdown)              2127 / 2131          7.4         135.2       4.2X
    +
    +
    +    Select 50% int rows (value < 7864320):   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          12671 / 12684          1.2         805.6       1.0X
    +    Parquet Vectorized (Pushdown)                 9032 / 9041          1.7         574.2       1.4X
    +    Native ORC Vectorized                       12388 / 12411          1.3         787.6       1.0X
    +    Native ORC Vectorized (Pushdown)              8873 / 8884          1.8         564.1       1.4X
    +
    +
    +    Select 90% int rows (value < 14155776):  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          16481 / 16495          1.0        1047.8       1.0X
    +    Parquet Vectorized (Pushdown)               15906 / 15919          1.0        1011.3       1.0X
    +    Native ORC Vectorized                       16224 / 16254          1.0        1031.5       1.0X
    +    Native ORC Vectorized (Pushdown)            15632 / 15661          1.0         993.9       1.1X
    +
    +
    +    Select all int rows (value IS NOT NULL): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          17341 / 17354          0.9        1102.5       1.0X
    +    Parquet Vectorized (Pushdown)               17463 / 17481          0.9        1110.2       1.0X
    +    Native ORC Vectorized                       17073 / 17089          0.9        1085.4       1.0X
    +    Native ORC Vectorized (Pushdown)            17194 / 17232          0.9        1093.2       1.0X
    +
    +
    +    Select all int rows (value > -1):        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          17452 / 17467          0.9        1109.6       1.0X
    +    Parquet Vectorized (Pushdown)               17613 / 17630          0.9        1119.8       1.0X
    +    Native ORC Vectorized                       17259 / 17271          0.9        1097.3       1.0X
    +    Native ORC Vectorized (Pushdown)            17385 / 17429          0.9        1105.3       1.0X
    +
    +
    +    Select all int rows (value != -1):       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          17363 / 17372          0.9        1103.9       1.0X
    +    Parquet Vectorized (Pushdown)               17526 / 17535          0.9        1114.2       1.0X
    +    Native ORC Vectorized                       17052 / 17089          0.9        1084.2       1.0X
    +    Native ORC Vectorized (Pushdown)            17209 / 17229          0.9        1094.1       1.0X
    +
    +
    +    Select 0 distinct string row
    +    (value IS NULL):                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7697 / 7751          2.0         489.4       1.0X
    +    Parquet Vectorized (Pushdown)                  264 /  284         59.5          16.8      29.1X
    +    Native ORC Vectorized                         6942 / 6970          2.3         441.4       1.1X
    +    Native ORC Vectorized (Pushdown)               372 /  381         42.3          23.7      20.7X
    +
    +
    +    Select 0 distinct string row
    +    ('100' < value < '100'):                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7983 / 8018          2.0         507.5       1.0X
    +    Parquet Vectorized (Pushdown)                  334 /  337         47.0          21.3      23.9X
    +    Native ORC Vectorized                         7307 / 7313          2.2         464.5       1.1X
    +    Native ORC Vectorized (Pushdown)               363 /  371         43.3          23.1      22.0X
    +
    +
    +    Select 1 distinct string row
    +    (value = '100'):                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7882 / 7915          2.0         501.1       1.0X
    +    Parquet Vectorized (Pushdown)                  504 /  522         31.2          32.1      15.6X
    +    Native ORC Vectorized                         7143 / 7155          2.2         454.1       1.1X
    +    Native ORC Vectorized (Pushdown)               555 /  573         28.4          35.3      14.2X
    +
    +
    +    Select 1 distinct string row
    +    (value <=> '100'):                       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7898 / 7912          2.0         502.1       1.0X
    +    Parquet Vectorized (Pushdown)                  470 /  481         33.5          29.9      16.8X
    +    Native ORC Vectorized                         7135 / 7149          2.2         453.6       1.1X
    +    Native ORC Vectorized (Pushdown)               552 /  557         28.5          35.1      14.3X
    +
    +
    +    Select 1 distinct string row
    +    ('100' <= value <= '100'):               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8189 / 8213          1.9         520.7       1.0X
    +    Parquet Vectorized (Pushdown)                  527 /  534         29.9          33.5      15.5X
    +    Native ORC Vectorized                         7477 / 7498          2.1         475.3       1.1X
    +    Native ORC Vectorized (Pushdown)               558 /  566         28.2          35.5      14.7X
    +
    +
    +    Select all distinct string rows
    +    (value IS NOT NULL):                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          19462 / 19476          0.8        1237.4       1.0X
    +    Parquet Vectorized (Pushdown)               19570 / 19582          0.8        1244.2       1.0X
    +    Native ORC Vectorized                       18577 / 18604          0.8        1181.1       1.0X
    +    Native ORC Vectorized (Pushdown)            18701 / 18742          0.8        1189.0       1.0X
    +    */
    +    benchmark.run()
    +  }
    +
    +  private def runIntBenchmark(numRows: Int, width: Int, mid: Int): Unit = {
    +    Seq("value IS NULL", s"$mid < value AND value < $mid").foreach { whereExpr =>
    +      val title = s"Select 0 int row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    Seq(
    +      s"value = $mid",
    +      s"value <=> $mid",
    +      s"$mid <= value AND value <= $mid",
    +      s"${mid - 1} < value AND value < ${mid + 1}"
    +    ).foreach { whereExpr =>
    +      val title = s"Select 1 int row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    val selectExpr = (1 to width).map(i => s"MAX(c$i)").mkString("", ",", ", MAX(value)")
    +
    +    Seq(10, 50, 90).foreach { percent =>
    +      filterPushDownBenchmark(
    +        numRows,
    +        s"Select $percent% int rows (value < ${numRows * percent / 100})",
    +        s"value < ${numRows * percent / 100}",
    +        selectExpr
    +      )
    +    }
    +
    +    Seq("value IS NOT NULL", "value > -1", "value != -1").foreach { whereExpr =>
    +      filterPushDownBenchmark(
    +        numRows,
    +        s"Select all int rows ($whereExpr)",
    +        whereExpr,
    +        selectExpr)
    +    }
    +  }
    +
    +  private def runStringBenchmark(
    +      numRows: Int, width: Int, searchValue: Int, colType: String): Unit = {
    +    Seq("value IS NULL", s"'$searchValue' < value AND value < '$searchValue'")
    +        .foreach { whereExpr =>
    +      val title = s"Select 0 $colType row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    Seq(
    +      s"value = '$searchValue'",
    +      s"value <=> '$searchValue'",
    +      s"'$searchValue' <= value AND value <= '$searchValue'"
    +    ).foreach { whereExpr =>
    +      val title = s"Select 1 $colType row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    val selectExpr = (1 to width).map(i => s"MAX(c$i)").mkString("", ",", ", MAX(value)")
    +
    +    Seq("value IS NOT NULL").foreach { whereExpr =>
    +      filterPushDownBenchmark(
    +        numRows,
    +        s"Select all $colType rows ($whereExpr)",
    +        whereExpr,
    +        selectExpr)
    +    }
    +  }
    +
    +  def main(args: Array[String]): Unit = {
    +    val numRows = 1024 * 1024 * 15
    +    val width = 5
    +
    +    // Pushdown for many distinct value case
    +    withTempPath { dir =>
    +      val mid = numRows / 2
    +
    +      withTempTable("orcTable", "patquetTable") {
    +        Seq(true, false).foreach { useStringForValue =>
    +          prepareTable(dir, numRows, width, useStringForValue)
    +          if (useStringForValue) {
    +            runStringBenchmark(numRows, width, mid, "string")
    +          } else {
    +            runIntBenchmark(numRows, width, mid)
    +          }
    +        }
    +      }
    +    }
    +
    +    // Pushdown for few distinct value case (use dictionary encoding)
    --- End diff --
    
    So far, in the Apache Spark project, we have been testing with only **default** configurations. `snappy` is the only exception, because it is Spark's default compression codec and it makes the Parquet/ORC comparison easier to interpret.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #90441 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90441/testReport)** for PR 21288 at commit [`8f60902`](https://github.com/apache/spark/commit/8f609023174c9f97bddc46bebe98f4ce3caf08c5).


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    retest this please


---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r189780637
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
    @@ -0,0 +1,437 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import java.io.File
    +
    +import scala.util.{Random, Try}
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.sql.{DataFrame, SparkSession}
    +import org.apache.spark.sql.functions.monotonically_increasing_id
    +import org.apache.spark.sql.internal.SQLConf
    +import org.apache.spark.util.{Benchmark, Utils}
    +
    +
    +/**
    + * Benchmark to measure read performance with Filter pushdown.
    + * To run this:
    + *  spark-submit --class <this class> <spark sql test jar>
    + */
    +object FilterPushdownBenchmark {
    +  val conf = new SparkConf()
    +    .setAppName("FilterPushdownBenchmark")
    +    .setIfMissing("spark.master", "local[1]")
    +    .setIfMissing("spark.driver.memory", "3g")
    +    .setIfMissing("spark.executor.memory", "3g")
    +    .setIfMissing("orc.compression", "snappy")
    +    .setIfMissing("spark.sql.parquet.compression.codec", "snappy")
    +
    +  private val spark = SparkSession.builder().config(conf).getOrCreate()
    +
    +  def withTempPath(f: File => Unit): Unit = {
    +    val path = Utils.createTempDir()
    +    path.delete()
    +    try f(path) finally Utils.deleteRecursively(path)
    +  }
    +
    +  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
    +    try f finally tableNames.foreach(spark.catalog.dropTempView)
    +  }
    +
    +  def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
    +    val (keys, values) = pairs.unzip
    +    val currentValues = keys.map(key => Try(spark.conf.get(key)).toOption)
    +    (keys, values).zipped.foreach(spark.conf.set)
    +    try f finally {
    +      keys.zip(currentValues).foreach {
    +        case (key, Some(value)) => spark.conf.set(key, value)
    +        case (key, None) => spark.conf.unset(key)
    +      }
    +    }
    +  }
    +
    +  private def prepareTable(
    +      dir: File, numRows: Int, width: Int, useStringForValue: Boolean): Unit = {
    +    import spark.implicits._
    +    val selectExpr = (1 to width).map(i => s"CAST(value AS STRING) c$i")
    +    val valueCol = if (useStringForValue) {
    +      monotonically_increasing_id().cast("string")
    +    } else {
    +      monotonically_increasing_id()
    +    }
    +    val df = spark.range(numRows).map(_ => Random.nextLong).selectExpr(selectExpr: _*)
    +      .withColumn("value", valueCol)
    +      .sort("value")
    +
    +    saveAsOrcTable(df, dir.getCanonicalPath + "/orc")
    +    saveAsParquetTable(df, dir.getCanonicalPath + "/parquet")
    +  }
    +
    +  private def prepareStringDictTable(
    +      dir: File, numRows: Int, numDistinctValues: Int, width: Int): Unit = {
    +    val selectExpr = (0 to width).map {
    +      case 0 => s"CAST(id % $numDistinctValues AS STRING) AS value"
    +      case i => s"CAST(rand() AS STRING) c$i"
    +    }
    +    val df = spark.range(numRows).selectExpr(selectExpr: _*).sort("value")
    +
    +    saveAsOrcTable(df, dir.getCanonicalPath + "/orc")
    +    saveAsParquetTable(df, dir.getCanonicalPath + "/parquet")
    +  }
    +
    +  private def saveAsOrcTable(df: DataFrame, dir: String): Unit = {
    +    df.write.mode("overwrite").orc(dir)
    +    spark.read.orc(dir).createOrReplaceTempView("orcTable")
    +  }
    +
    +  private def saveAsParquetTable(df: DataFrame, dir: String): Unit = {
    +    df.write.mode("overwrite").parquet(dir)
    +    spark.read.parquet(dir).createOrReplaceTempView("parquetTable")
    +  }
    +
    +  def filterPushDownBenchmark(
    +      values: Int,
    +      title: String,
    +      whereExpr: String,
    +      selectExpr: String = "*"): Unit = {
    +    val benchmark = new Benchmark(title, values, minNumIters = 5)
    +
    +    Seq(false, true).foreach { pushDownEnabled =>
    +      val name = s"Parquet Vectorized ${if (pushDownEnabled) s"(Pushdown)" else ""}"
    +      benchmark.addCase(name) { _ =>
    +        withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> s"$pushDownEnabled") {
    +          spark.sql(s"SELECT $selectExpr FROM parquetTable WHERE $whereExpr").collect()
    +        }
    +      }
    +    }
    +
    +    Seq(false, true).foreach { pushDownEnabled =>
    +      val name = s"Native ORC Vectorized ${if (pushDownEnabled) s"(Pushdown)" else ""}"
    +      benchmark.addCase(name) { _ =>
    +        withSQLConf(SQLConf.ORC_FILTER_PUSHDOWN_ENABLED.key -> s"$pushDownEnabled") {
    +          spark.sql(s"SELECT $selectExpr FROM orcTable WHERE $whereExpr").collect()
    +        }
    +      }
    +    }
    +
    +    /*
    +    Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
    +    Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8452 / 8504          1.9         537.3       1.0X
    +    Parquet Vectorized (Pushdown)                  274 /  281         57.3          17.4      30.8X
    +    Native ORC Vectorized                         8167 / 8185          1.9         519.3       1.0X
    +    Native ORC Vectorized (Pushdown)               365 /  379         43.1          23.2      23.1X
    +
    +
    +    Select 0 string row
    +    ('7864320' < value < '7864320'):         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8532 / 8564          1.8         542.4       1.0X
    +    Parquet Vectorized (Pushdown)                  366 /  386         43.0          23.3      23.3X
    +    Native ORC Vectorized                         8289 / 8300          1.9         527.0       1.0X
    +    Native ORC Vectorized (Pushdown)               378 /  385         41.6          24.0      22.6X
    +
    +
    +    Select 1 string row (value = '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8547 / 8564          1.8         543.4       1.0X
    +    Parquet Vectorized (Pushdown)                  351 /  356         44.9          22.3      24.4X
    +    Native ORC Vectorized                         8310 / 8323          1.9         528.3       1.0X
    +    Native ORC Vectorized (Pushdown)               370 /  375         42.5          23.5      23.1X
    +
    +
    +    Select 1 string row
    +    (value <=> '7864320'):                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8537 / 8563          1.8         542.8       1.0X
    +    Parquet Vectorized (Pushdown)                  310 /  319         50.7          19.7      27.5X
    +    Native ORC Vectorized                         8316 / 8335          1.9         528.7       1.0X
    +    Native ORC Vectorized (Pushdown)               364 /  367         43.2          23.1      23.5X
    +
    +
    +    Select 1 string row
    +    ('7864320' <= value <= '7864320'):       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8594 / 8607          1.8         546.4       1.0X
    +    Parquet Vectorized (Pushdown)                  370 /  374         42.5          23.5      23.2X
    +    Native ORC Vectorized                         8350 / 8358          1.9         530.9       1.0X
    +    Native ORC Vectorized (Pushdown)               371 /  374         42.4          23.6      23.2X
    +
    +
    +    Select all string rows
    +    (value IS NOT NULL):                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          19601 / 19625          0.8        1246.2       1.0X
    +    Parquet Vectorized (Pushdown)               19698 / 19703          0.8        1252.3       1.0X
    +    Native ORC Vectorized                       19435 / 19470          0.8        1235.6       1.0X
    +    Native ORC Vectorized (Pushdown)            19568 / 19590          0.8        1244.1       1.0X
    +
    +
    +    Select 0 int row (value IS NULL):        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7815 / 7824          2.0         496.9       1.0X
    +    Parquet Vectorized (Pushdown)                  245 /  251         64.2          15.6      31.9X
    +    Native ORC Vectorized                         7436 / 7460          2.1         472.8       1.1X
    +    Native ORC Vectorized (Pushdown)               344 /  351         45.7          21.9      22.7X
    +
    +
    +    Select 0 int row
    +    (7864320 < value < 7864320):             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7792 / 7807          2.0         495.4       1.0X
    +    Parquet Vectorized (Pushdown)                  349 /  353         45.1          22.2      22.3X
    +    Native ORC Vectorized                         7451 / 7465          2.1         473.7       1.0X
    +    Native ORC Vectorized (Pushdown)               365 /  368         43.0          23.2      21.3X
    +
    +
    +    Select 1 int row (value = 7864320):      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7836 / 7872          2.0         498.2       1.0X
    +    Parquet Vectorized (Pushdown)                  322 /  327         48.8          20.5      24.3X
    +    Native ORC Vectorized                         7533 / 7540          2.1         478.9       1.0X
    +    Native ORC Vectorized (Pushdown)               358 /  363         43.9          22.8      21.9X
    +
    +
    +    Select 1 int row (value <=> 7864320):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7855 / 7870          2.0         499.4       1.0X
    +    Parquet Vectorized (Pushdown)                  286 /  297         54.9          18.2      27.4X
    +    Native ORC Vectorized                         7511 / 7557          2.1         477.5       1.0X
    +    Native ORC Vectorized (Pushdown)               358 /  361         43.9          22.8      21.9X
    +
    +
    +    Select 1 int row
    +    (7864320 <= value <= 7864320):           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7851 / 7870          2.0         499.2       1.0X
    +    Parquet Vectorized (Pushdown)                  345 /  347         45.6          21.9      22.8X
    +    Native ORC Vectorized                         7543 / 7554          2.1         479.6       1.0X
    +    Native ORC Vectorized (Pushdown)               364 /  374         43.2          23.1      21.6X
    +
    +
    +    Select 1 int row
    +    (7864319 < value < 7864321):             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7837 / 7840          2.0         498.2       1.0X
    +    Parquet Vectorized (Pushdown)                  338 /  339         46.6          21.5      23.2X
    +    Native ORC Vectorized                         7524 / 7541          2.1         478.3       1.0X
    +    Native ORC Vectorized (Pushdown)               361 /  364         43.6          22.9      21.7X
    +
    +
    +    Select 10% int rows (value < 1572864):   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8864 / 8900          1.8         563.5       1.0X
    +    Parquet Vectorized (Pushdown)                 2088 / 2095          7.5         132.7       4.2X
    +    Native ORC Vectorized                         8562 / 8579          1.8         544.3       1.0X
    +    Native ORC Vectorized (Pushdown)              2127 / 2131          7.4         135.2       4.2X
    +
    +
    +    Select 50% int rows (value < 7864320):   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          12671 / 12684          1.2         805.6       1.0X
    +    Parquet Vectorized (Pushdown)                 9032 / 9041          1.7         574.2       1.4X
    +    Native ORC Vectorized                       12388 / 12411          1.3         787.6       1.0X
    +    Native ORC Vectorized (Pushdown)              8873 / 8884          1.8         564.1       1.4X
    +
    +
    +    Select 90% int rows (value < 14155776):  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          16481 / 16495          1.0        1047.8       1.0X
    +    Parquet Vectorized (Pushdown)               15906 / 15919          1.0        1011.3       1.0X
    +    Native ORC Vectorized                       16224 / 16254          1.0        1031.5       1.0X
    +    Native ORC Vectorized (Pushdown)            15632 / 15661          1.0         993.9       1.1X
    +
    +
    +    Select all int rows (value IS NOT NULL): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          17341 / 17354          0.9        1102.5       1.0X
    +    Parquet Vectorized (Pushdown)               17463 / 17481          0.9        1110.2       1.0X
    +    Native ORC Vectorized                       17073 / 17089          0.9        1085.4       1.0X
    +    Native ORC Vectorized (Pushdown)            17194 / 17232          0.9        1093.2       1.0X
    +
    +
    +    Select all int rows (value > -1):        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          17452 / 17467          0.9        1109.6       1.0X
    +    Parquet Vectorized (Pushdown)               17613 / 17630          0.9        1119.8       1.0X
    +    Native ORC Vectorized                       17259 / 17271          0.9        1097.3       1.0X
    +    Native ORC Vectorized (Pushdown)            17385 / 17429          0.9        1105.3       1.0X
    +
    +
    +    Select all int rows (value != -1):       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          17363 / 17372          0.9        1103.9       1.0X
    +    Parquet Vectorized (Pushdown)               17526 / 17535          0.9        1114.2       1.0X
    +    Native ORC Vectorized                       17052 / 17089          0.9        1084.2       1.0X
    +    Native ORC Vectorized (Pushdown)            17209 / 17229          0.9        1094.1       1.0X
    +
    +
    +    Select 0 distinct string row
    +    (value IS NULL):                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7697 / 7751          2.0         489.4       1.0X
    +    Parquet Vectorized (Pushdown)                  264 /  284         59.5          16.8      29.1X
    +    Native ORC Vectorized                         6942 / 6970          2.3         441.4       1.1X
    +    Native ORC Vectorized (Pushdown)               372 /  381         42.3          23.7      20.7X
    +
    +
    +    Select 0 distinct string row
    +    ('100' < value < '100'):                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7983 / 8018          2.0         507.5       1.0X
    +    Parquet Vectorized (Pushdown)                  334 /  337         47.0          21.3      23.9X
    +    Native ORC Vectorized                         7307 / 7313          2.2         464.5       1.1X
    +    Native ORC Vectorized (Pushdown)               363 /  371         43.3          23.1      22.0X
    +
    +
    +    Select 1 distinct string row
    +    (value = '100'):                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7882 / 7915          2.0         501.1       1.0X
    +    Parquet Vectorized (Pushdown)                  504 /  522         31.2          32.1      15.6X
    +    Native ORC Vectorized                         7143 / 7155          2.2         454.1       1.1X
    +    Native ORC Vectorized (Pushdown)               555 /  573         28.4          35.3      14.2X
    +
    +
    +    Select 1 distinct string row
    +    (value <=> '100'):                       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7898 / 7912          2.0         502.1       1.0X
    +    Parquet Vectorized (Pushdown)                  470 /  481         33.5          29.9      16.8X
    +    Native ORC Vectorized                         7135 / 7149          2.2         453.6       1.1X
    +    Native ORC Vectorized (Pushdown)               552 /  557         28.5          35.1      14.3X
    +
    +
    +    Select 1 distinct string row
    +    ('100' <= value <= '100'):               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8189 / 8213          1.9         520.7       1.0X
    +    Parquet Vectorized (Pushdown)                  527 /  534         29.9          33.5      15.5X
    +    Native ORC Vectorized                         7477 / 7498          2.1         475.3       1.1X
    +    Native ORC Vectorized (Pushdown)               558 /  566         28.2          35.5      14.7X
    +
    +
    +    Select all distinct string rows
    +    (value IS NOT NULL):                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          19462 / 19476          0.8        1237.4       1.0X
    +    Parquet Vectorized (Pushdown)               19570 / 19582          0.8        1244.2       1.0X
    +    Native ORC Vectorized                       18577 / 18604          0.8        1181.1       1.0X
    +    Native ORC Vectorized (Pushdown)            18701 / 18742          0.8        1189.0       1.0X
    +    */
    +    benchmark.run()
    +  }
    +
    +  private def runIntBenchmark(numRows: Int, width: Int, mid: Int): Unit = {
    +    Seq("value IS NULL", s"$mid < value AND value < $mid").foreach { whereExpr =>
    +      val title = s"Select 0 int row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    Seq(
    +      s"value = $mid",
    +      s"value <=> $mid",
    +      s"$mid <= value AND value <= $mid",
    +      s"${mid - 1} < value AND value < ${mid + 1}"
    +    ).foreach { whereExpr =>
    +      val title = s"Select 1 int row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    val selectExpr = (1 to width).map(i => s"MAX(c$i)").mkString("", ",", ", MAX(value)")
    +
    +    Seq(10, 50, 90).foreach { percent =>
    +      filterPushDownBenchmark(
    +        numRows,
    +        s"Select $percent% int rows (value < ${numRows * percent / 100})",
    +        s"value < ${numRows * percent / 100}",
    +        selectExpr
    +      )
    +    }
    +
    +    Seq("value IS NOT NULL", "value > -1", "value != -1").foreach { whereExpr =>
    +      filterPushDownBenchmark(
    +        numRows,
    +        s"Select all int rows ($whereExpr)",
    +        whereExpr,
    +        selectExpr)
    +    }
    +  }
    +
    +  private def runStringBenchmark(
    +      numRows: Int, width: Int, searchValue: Int, colType: String): Unit = {
    +    Seq("value IS NULL", s"'$searchValue' < value AND value < '$searchValue'")
    +        .foreach { whereExpr =>
    +      val title = s"Select 0 $colType row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    Seq(
    +      s"value = '$searchValue'",
    +      s"value <=> '$searchValue'",
    +      s"'$searchValue' <= value AND value <= '$searchValue'"
    +    ).foreach { whereExpr =>
    +      val title = s"Select 1 $colType row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    val selectExpr = (1 to width).map(i => s"MAX(c$i)").mkString("", ",", ", MAX(value)")
    +
    +    Seq("value IS NOT NULL").foreach { whereExpr =>
    +      filterPushDownBenchmark(
    +        numRows,
    +        s"Select all $colType rows ($whereExpr)",
    +        whereExpr,
    +        selectExpr)
    +    }
    +  }
    +
    +  def main(args: Array[String]): Unit = {
    +    val numRows = 1024 * 1024 * 15
    +    val width = 5
    +
    +    // Pushdown for many distinct value case
    +    withTempPath { dir =>
    +      val mid = numRows / 2
    +
    +      withTempTable("orcTable", "parquetTable") {
    +        Seq(true, false).foreach { useStringForValue =>
    +          prepareTable(dir, numRows, width, useStringForValue)
    +          if (useStringForValue) {
    +            runStringBenchmark(numRows, width, mid, "string")
    +          } else {
    +            runIntBenchmark(numRows, width, mid)
    +          }
    +        }
    +      }
    +    }
    +
    +    // Pushdown for few distinct value case (use dictionary encoding)
    --- End diff --
    
    I feel it'd be better to set 1.0 at the option for safety, too.
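
The `withSQLConf` helper in the diff above saves the current value of each key, applies the new values, and then restores (or unsets) them in a `finally` block. A minimal sketch of that save/set/restore pattern follows. To keep it runnable without Spark, it uses a plain mutable map in place of `spark.conf`; the names `ConfRestoreSketch` and `withConf` and the keys are hypothetical and are not part of the PR.

```scala
import scala.collection.mutable

// Hypothetical stand-in for spark.conf: a plain mutable map, so the sketch runs without Spark.
object ConfRestoreSketch {
  val conf: mutable.Map[String, String] = mutable.Map.empty

  // Set the given pairs, run f, then put every key back to its previous state.
  def withConf[T](pairs: (String, String)*)(f: => T): T = {
    // Remember the old value of each key (None if the key was not set).
    val previous = pairs.map { case (key, _) => key -> conf.get(key) }
    pairs.foreach { case (key, value) => conf(key) = value }
    try f finally {
      previous.foreach {
        case (key, Some(value)) => conf(key) = value // restore the old value
        case (key, None) => conf.remove(key)         // key did not exist before
      }
    }
  }

  def main(args: Array[String]): Unit = {
    conf("pushdown.enabled") = "false"
    val seen = withConf("pushdown.enabled" -> "true", "tmp.key" -> "1") {
      conf("pushdown.enabled")
    }
    println(s"inside=$seen after=${conf("pushdown.enabled")} tmpKeyPresent=${conf.contains("tmp.key")}")
  }
}
```

Because the restore runs in `finally`, a benchmark case that throws should still leave the settings as it found them, which matters when pushdown-enabled and pushdown-disabled cases share one session.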


---

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #91795 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91795/testReport)** for PR 21288 at commit [`d41e689`](https://github.com/apache/spark/commit/d41e68914e00a7ba6734b3fdbe839b130fbbd42e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    retest this please


---

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test FAILed.


---

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3418/
    Test PASSed.


---

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    LGTM
    
    Thanks! Merged to master.


---

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    @gatorsmile and @maropu . I really appreciate this effort. Thanks.
    
    Since this is a cloud benchmark, I have one recommendation. Can we use `r3.xlarge` for all benchmarks **consistently**? As we know, it's difficult to compare results from different machines.
    
    There are three reasons.
    
    1. `r3.xlarge` is cheaper than `m4.2xlarge`.
    2. The previous benchmark results came from a MacBook (SSD). `r3.xlarge` also provides SSD storage.
    3. `r3.xlarge` is used at [Databricks TPCDS benchmark](https://databricks.com/blog/2017/07/12/benchmarking-big-data-sql-platforms-in-the-cloud.html), too.
    
    The following is the result on `r3.xlarge`. I launched the machine, built this PR on the latest master, and ran `bin/spark-submit --master local[1] --driver-memory 10G --conf spark.ui.enabled=false --class org.apache.spark.sql.execution.benchmark.FilterPushdownBenchmark sql/core/target/scala-2.11/spark-sql_2.11-2.0-SNAPSHOT-tests.jar`. (There is no Hadoop installation. I guess that is also the case for @maropu.)
    
    ```
    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 3.10.0-693.5.2.el7.x86_64
    Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
    Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                            9133 / 9275          1.7         580.6       1.0X
    Parquet Vectorized (Pushdown)                   85 /  100        185.2           5.4     107.6X
    Native ORC Vectorized                         8760 / 8843          1.8         556.9       1.0X
    Native ORC Vectorized (Pushdown)               115 /  130        136.4           7.3      79.2X
    
    
    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 3.10.0-693.5.2.el7.x86_64
    Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
    Select 0 string row ('7864320' < value < '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                            9254 / 9276          1.7         588.4       1.0X
    Parquet Vectorized (Pushdown)                  912 /  922         17.2          58.0      10.1X
    Native ORC Vectorized                         8966 / 9013          1.8         570.1       1.0X
    Native ORC Vectorized (Pushdown)               254 /  276         61.8          16.2      36.4X
    
    
    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 3.10.0-693.5.2.el7.x86_64
    Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
    Select 1 string row (value = '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                            9106 / 9136          1.7         578.9       1.0X
    Parquet Vectorized (Pushdown)                  897 /  910         17.5          57.0      10.2X
    Native ORC Vectorized                         8846 / 8889          1.8         562.4       1.0X
    Native ORC Vectorized (Pushdown)               254 /  267         61.9          16.2      35.8X
    
    
    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 3.10.0-693.5.2.el7.x86_64
    Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
    Select 1 string row (value <=> '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                            9095 / 9124          1.7         578.3       1.0X
    Parquet Vectorized (Pushdown)                  891 /  899         17.7          56.6      10.2X
    Native ORC Vectorized                         8853 / 8941          1.8         562.8       1.0X
    Native ORC Vectorized (Pushdown)               246 /  254         64.0          15.6      37.0X
    
    
    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 3.10.0-693.5.2.el7.x86_64
    Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
    Select 1 string row ('7864320' <= value <= '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                            9236 / 9273          1.7         587.2       1.0X
    Parquet Vectorized (Pushdown)                  902 /  910         17.4          57.4      10.2X
    Native ORC Vectorized                         8944 / 8965          1.8         568.6       1.0X
    Native ORC Vectorized (Pushdown)               248 /  262         63.4          15.8      37.2X
    
    
    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 3.10.0-693.5.2.el7.x86_64
    Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
    Select all string rows (value IS NOT NULL): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                          20309 / 20381          0.8        1291.2       1.0X
    Parquet Vectorized (Pushdown)               20437 / 20477          0.8        1299.3       1.0X
    Native ORC Vectorized                       24929 / 24999          0.6        1585.0       0.8X
    Native ORC Vectorized (Pushdown)            24918 / 25040          0.6        1584.3       0.8X
    ```
    
    As you can see, the result is more consistent with the previous one and differs from the one in this PR. Actually, I was reluctant to say this, but we had better have a standard way to generate benchmark results on the cloud. If possible, I'd like to use `r3.xlarge`.
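
For readers comparing tables across machines, the derived columns can be checked by hand. The sketch below assumes the usual relationships, inferred from the numbers rather than from the benchmark utility's source: Rate(M/s) is rows divided by best time, Per Row(ns) is best time divided by rows, and Relative is the first case's best time divided by the current case's best time. The object and method names are hypothetical. Results match the printed tables only approximately, presumably because the tables show times rounded to whole milliseconds.

```scala
object BenchmarkColumnsSketch {
  // Millions of rows processed per second, given the best time in milliseconds.
  def rateMillionsPerSec(numRows: Long, bestMs: Double): Double =
    numRows / (bestMs * 1000.0)

  // Nanoseconds spent per row.
  def perRowNs(numRows: Long, bestMs: Double): Double =
    bestMs * 1e6 / numRows

  // Speed-up relative to the first (baseline) case in the same table.
  def relative(baselineBestMs: Double, bestMs: Double): Double =
    baselineBestMs / bestMs

  def main(args: Array[String]): Unit = {
    val numRows = 1024L * 1024 * 15 // 15,728,640 rows, as in the benchmark's main()
    // Best times (ms) from the r3.xlarge "Select 0 string row (value IS NULL)" table.
    val baseline = 9133.0 // Parquet Vectorized
    val pushdown = 85.0   // Parquet Vectorized (Pushdown)
    println(f"rate:     ${rateMillionsPerSec(numRows, baseline)}%.1f M/s")
    println(f"per row:  ${perRowNs(numRows, baseline)}%.1f ns")
    println(f"relative: ${relative(baseline, pushdown)}%.1fX")
  }
}
```

Since Relative is a ratio against each table's own baseline, it is the column that stays most comparable when the absolute timings come from different instance types.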


---

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91795/


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #91228 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91228/testReport)** for PR 21288 at commit [`d41e689`](https://github.com/apache/spark/commit/d41e68914e00a7ba6734b3fdbe839b130fbbd42e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #91219 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91219/testReport)** for PR 21288 at commit [`2c0d5cb`](https://github.com/apache/spark/commit/2c0d5cbf51268540653543b96de135a6923c6cef).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r191650442
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
    @@ -131,211 +132,214 @@ object FilterPushdownBenchmark {
         }
     
         /*
    +    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.26-46.32.amzn1.x86_64
         Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
         Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         ------------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            8452 / 8504          1.9         537.3       1.0X
    -    Parquet Vectorized (Pushdown)                  274 /  281         57.3          17.4      30.8X
    -    Native ORC Vectorized                         8167 / 8185          1.9         519.3       1.0X
    -    Native ORC Vectorized (Pushdown)               365 /  379         43.1          23.2      23.1X
    +    Parquet Vectorized                            2961 / 3123          5.3         188.3       1.0X
    +    Parquet Vectorized (Pushdown)                 3057 / 3121          5.1         194.4       1.0X
    --- End diff --
    
    Is it a regression?


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Yep. Thank you for moving this forward, @maropu!


---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r189158065
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FilterPushdownBenchmark.scala ---
    @@ -105,138 +128,306 @@ object FilterPushdownBenchmark {
         }
     
         /*
    -    Java HotSpot(TM) 64-Bit Server VM 1.8.0_152-b16 on Mac OS X 10.13.2
    -    Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
    -
    -    Select 0 row (id IS NULL):              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7882 / 7957          2.0         501.1       1.0X
    -    Parquet Vectorized (Pushdown)                   55 /   60        285.2           3.5     142.9X
    -    Native ORC Vectorized                         5592 / 5627          2.8         355.5       1.4X
    -    Native ORC Vectorized (Pushdown)                66 /   70        237.2           4.2     118.9X
    -
    -    Select 0 row (7864320 < id < 7864320):  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7884 / 7909          2.0         501.2       1.0X
    -    Parquet Vectorized (Pushdown)                  739 /  752         21.3          47.0      10.7X
    -    Native ORC Vectorized                         5614 / 5646          2.8         356.9       1.4X
    -    Native ORC Vectorized (Pushdown)                81 /   83        195.2           5.1      97.8X
    -
    -    Select 1 row (id = 7864320):            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7905 / 8027          2.0         502.6       1.0X
    -    Parquet Vectorized (Pushdown)                  740 /  766         21.2          47.1      10.7X
    -    Native ORC Vectorized                         5684 / 5738          2.8         361.4       1.4X
    -    Native ORC Vectorized (Pushdown)                78 /   81        202.4           4.9     101.7X
    -
    -    Select 1 row (id <=> 7864320):          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7928 / 7993          2.0         504.1       1.0X
    -    Parquet Vectorized (Pushdown)                  747 /  772         21.0          47.5      10.6X
    -    Native ORC Vectorized                         5728 / 5753          2.7         364.2       1.4X
    -    Native ORC Vectorized (Pushdown)                76 /   78        207.9           4.8     104.8X
    -
    -    Select 1 row (7864320 <= id <= 7864320):Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7939 / 8021          2.0         504.8       1.0X
    -    Parquet Vectorized (Pushdown)                  746 /  770         21.1          47.4      10.6X
    -    Native ORC Vectorized                         5690 / 5734          2.8         361.7       1.4X
    -    Native ORC Vectorized (Pushdown)                76 /   79        206.7           4.8     104.3X
    -
    -    Select 1 row (7864319 < id < 7864321):  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7972 / 8019          2.0         506.9       1.0X
    -    Parquet Vectorized (Pushdown)                  742 /  764         21.2          47.2      10.7X
    -    Native ORC Vectorized                         5704 / 5743          2.8         362.6       1.4X
    -    Native ORC Vectorized (Pushdown)                76 /   78        207.9           4.8     105.4X
    -
    -    Select 10% rows (id < 1572864):         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            8733 / 8808          1.8         555.2       1.0X
    -    Parquet Vectorized (Pushdown)                 2213 / 2267          7.1         140.7       3.9X
    -    Native ORC Vectorized                         6420 / 6463          2.4         408.2       1.4X
    -    Native ORC Vectorized (Pushdown)              1313 / 1331         12.0          83.5       6.7X
    -
    -    Select 50% rows (id < 7864320):         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                          11518 / 11591          1.4         732.3       1.0X
    -    Parquet Vectorized (Pushdown)                 7962 / 7991          2.0         506.2       1.4X
    -    Native ORC Vectorized                         8927 / 8985          1.8         567.6       1.3X
    -    Native ORC Vectorized (Pushdown)              6102 / 6160          2.6         387.9       1.9X
    -
    -    Select 90% rows (id < 14155776):        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                          14255 / 14389          1.1         906.3       1.0X
    -    Parquet Vectorized (Pushdown)               13564 / 13594          1.2         862.4       1.1X
    -    Native ORC Vectorized                       11442 / 11608          1.4         727.5       1.2X
    -    Native ORC Vectorized (Pushdown)            10991 / 11029          1.4         698.8       1.3X
    -
    -    Select all rows (id IS NOT NULL):       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                          14917 / 14938          1.1         948.4       1.0X
    -    Parquet Vectorized (Pushdown)               14910 / 14964          1.1         948.0       1.0X
    -    Native ORC Vectorized                       11986 / 12069          1.3         762.0       1.2X
    -    Native ORC Vectorized (Pushdown)            12037 / 12123          1.3         765.3       1.2X
    -
    -    Select all rows (id > -1):              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                          14951 / 14976          1.1         950.6       1.0X
    -    Parquet Vectorized (Pushdown)               14934 / 15016          1.1         949.5       1.0X
    -    Native ORC Vectorized                       12000 / 12156          1.3         763.0       1.2X
    -    Native ORC Vectorized (Pushdown)            12079 / 12113          1.3         767.9       1.2X
    -
    -    Select all rows (id != -1):             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                          14930 / 14972          1.1         949.3       1.0X
    -    Parquet Vectorized (Pushdown)               15015 / 15047          1.0         954.6       1.0X
    -    Native ORC Vectorized                       12090 / 12259          1.3         768.7       1.2X
    -    Native ORC Vectorized (Pushdown)            12021 / 12096          1.3         764.2       1.2X
    +    Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
    --- End diff --
    
    Thanks!


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #91946 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91946/testReport)** for PR 21288 at commit [`4a9cec9`](https://github.com/apache/spark/commit/4a9cec91f9446161d4dde0cac20ccdccb9a112e7).


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #91210 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91210/testReport)** for PR 21288 at commit [`b7859ed`](https://github.com/apache/spark/commit/b7859ed0905ce3e0476e5d327f65798acc7aba8c).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #90440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90440/testReport)** for PR 21288 at commit [`223bf20`](https://github.com/apache/spark/commit/223bf2008abfe5fd41c3b5e741dc525ab3864977).


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/191/


---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r191620766
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
    @@ -131,211 +132,214 @@ object FilterPushdownBenchmark {
         }
     
         /*
    +    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.26-46.32.amzn1.x86_64
         Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
         Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         ------------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            8452 / 8504          1.9         537.3       1.0X
    -    Parquet Vectorized (Pushdown)                  274 /  281         57.3          17.4      30.8X
    -    Native ORC Vectorized                         8167 / 8185          1.9         519.3       1.0X
    -    Native ORC Vectorized (Pushdown)               365 /  379         43.1          23.2      23.1X
    +    Parquet Vectorized                            2961 / 3123          5.3         188.3       1.0X
    +    Parquet Vectorized (Pushdown)                 3057 / 3121          5.1         194.4       1.0X
    --- End diff --
    
    That might be, but the change seems too big for that... I probably made some mistakes in the last benchmark runs (though I haven't found the cause yet).
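
To put numbers on the gap discussed here, the following minimal sketch compares the pushdown speedup in the removed (`-`) and added (`+`) Parquet lines of the quoted diff. Only the four best times come from the diff; the object and method names are hypothetical.

```scala
// Sketch: compare the pushdown speedup in the old and new Parquet results.
object PushdownSpeedup {
  // Best times in ms, copied from the quoted diff.
  val oldNoPushdownMs = 8452.0
  val oldPushdownMs = 274.0
  val newNoPushdownMs = 2961.0
  val newPushdownMs = 3057.0

  // How many times faster the pushdown case is than the non-pushdown case.
  def speedup(noPushdownMs: Double, pushdownMs: Double): Double =
    noPushdownMs / pushdownMs

  def main(args: Array[String]): Unit = {
    println(f"old run: pushdown speedup ${speedup(oldNoPushdownMs, oldPushdownMs)}%.1fX")
    println(f"new run: pushdown speedup ${speedup(newNoPushdownMs, newPushdownMs)}%.1fX")
    println(f"no-pushdown best time, old/new: ${oldNoPushdownMs / newNoPushdownMs}%.1fX")
  }
}
```

Two things changed at once between the runs: the pushdown benefit disappeared, and the non-pushdown baseline itself became several times faster. That is consistent with the suspicion that the two runs were not set up the same way.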


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3095/


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Yea, I agree; we had better run the benchmarks on the same machine.
    I'll re-run the benchmark on `r3.xlarge` to check whether I can get the same result.
    
    >  There is no hadoop installation. I guess @maropu also does
    Yea, I had no Hadoop installation either.
    



---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r191280132
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
    @@ -131,211 +132,214 @@ object FilterPushdownBenchmark {
         }
     
         /*
    +    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.26-46.32.amzn1.x86_64
         Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
         Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         ------------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            8452 / 8504          1.9         537.3       1.0X
    -    Parquet Vectorized (Pushdown)                  274 /  281         57.3          17.4      30.8X
    -    Native ORC Vectorized                         8167 / 8185          1.9         519.3       1.0X
    -    Native ORC Vectorized (Pushdown)               365 /  379         43.1          23.2      23.1X
    +    Parquet Vectorized                            2961 / 3123          5.3         188.3       1.0X
    +    Parquet Vectorized (Pushdown)                 3057 / 3121          5.1         194.4       1.0X
    --- End diff --
    
    The difference is huge. What happened?


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #91815 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91815/testReport)** for PR 21288 at commit [`fa53156`](https://github.com/apache/spark/commit/fa53156599812adc94f089b8c163224fb2e4935f).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3191/


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r189635143
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
    @@ -0,0 +1,437 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import java.io.File
    +
    +import scala.util.{Random, Try}
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.sql.{DataFrame, SparkSession}
    +import org.apache.spark.sql.functions.monotonically_increasing_id
    +import org.apache.spark.sql.internal.SQLConf
    +import org.apache.spark.util.{Benchmark, Utils}
    +
    +
    +/**
    + * Benchmark to measure read performance with Filter pushdown.
    + * To run this:
    + *  spark-submit --class <this class> <spark sql test jar>
    + */
    +object FilterPushdownBenchmark {
    +  val conf = new SparkConf()
    +    .setAppName("FilterPushdownBenchmark")
    +    .setIfMissing("spark.master", "local[1]")
    +    .setIfMissing("spark.driver.memory", "3g")
    +    .setIfMissing("spark.executor.memory", "3g")
    +    .setIfMissing("orc.compression", "snappy")
    +    .setIfMissing("spark.sql.parquet.compression.codec", "snappy")
    +
    +  private val spark = SparkSession.builder().config(conf).getOrCreate()
    +
    +  def withTempPath(f: File => Unit): Unit = {
    +    val path = Utils.createTempDir()
    +    path.delete()
    +    try f(path) finally Utils.deleteRecursively(path)
    +  }
    +
    +  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
    +    try f finally tableNames.foreach(spark.catalog.dropTempView)
    +  }
    +
    +  def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
    +    val (keys, values) = pairs.unzip
    +    val currentValues = keys.map(key => Try(spark.conf.get(key)).toOption)
    +    (keys, values).zipped.foreach(spark.conf.set)
    +    try f finally {
    +      keys.zip(currentValues).foreach {
    +        case (key, Some(value)) => spark.conf.set(key, value)
    +        case (key, None) => spark.conf.unset(key)
    +      }
    +    }
    +  }
    +
    +  private def prepareTable(
    +      dir: File, numRows: Int, width: Int, useStringForValue: Boolean): Unit = {
    +    import spark.implicits._
    +    val selectExpr = (1 to width).map(i => s"CAST(value AS STRING) c$i")
    +    val valueCol = if (useStringForValue) {
    +      monotonically_increasing_id().cast("string")
    +    } else {
    +      monotonically_increasing_id()
    +    }
    +    val df = spark.range(numRows).map(_ => Random.nextLong).selectExpr(selectExpr: _*)
    +      .withColumn("value", valueCol)
    +      .sort("value")
    +
    +    saveAsOrcTable(df, dir.getCanonicalPath + "/orc")
    +    saveAsParquetTable(df, dir.getCanonicalPath + "/parquet")
    +  }
    +
    +  private def prepareStringDictTable(
    +      dir: File, numRows: Int, numDistinctValues: Int, width: Int): Unit = {
    +    val selectExpr = (0 to width).map {
    +      case 0 => s"CAST(id % $numDistinctValues AS STRING) AS value"
    +      case i => s"CAST(rand() AS STRING) c$i"
    +    }
    +    val df = spark.range(numRows).selectExpr(selectExpr: _*).sort("value")
    +
    +    saveAsOrcTable(df, dir.getCanonicalPath + "/orc")
    +    saveAsParquetTable(df, dir.getCanonicalPath + "/parquet")
    +  }
    +
    +  private def saveAsOrcTable(df: DataFrame, dir: String): Unit = {
    +    df.write.mode("overwrite").orc(dir)
    +    spark.read.orc(dir).createOrReplaceTempView("orcTable")
    +  }
    +
    +  private def saveAsParquetTable(df: DataFrame, dir: String): Unit = {
    +    df.write.mode("overwrite").parquet(dir)
    +    spark.read.parquet(dir).createOrReplaceTempView("parquetTable")
    +  }
    +
    +  def filterPushDownBenchmark(
    +      values: Int,
    +      title: String,
    +      whereExpr: String,
    +      selectExpr: String = "*"): Unit = {
    +    val benchmark = new Benchmark(title, values, minNumIters = 5)
    +
    +    Seq(false, true).foreach { pushDownEnabled =>
    +      val name = s"Parquet Vectorized ${if (pushDownEnabled) s"(Pushdown)" else ""}"
    +      benchmark.addCase(name) { _ =>
    +        withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> s"$pushDownEnabled") {
    +          spark.sql(s"SELECT $selectExpr FROM parquetTable WHERE $whereExpr").collect()
    +        }
    +      }
    +    }
    +
    +    Seq(false, true).foreach { pushDownEnabled =>
    +      val name = s"Native ORC Vectorized ${if (pushDownEnabled) s"(Pushdown)" else ""}"
    +      benchmark.addCase(name) { _ =>
    +        withSQLConf(SQLConf.ORC_FILTER_PUSHDOWN_ENABLED.key -> s"$pushDownEnabled") {
    +          spark.sql(s"SELECT $selectExpr FROM orcTable WHERE $whereExpr").collect()
    +        }
    +      }
    +    }
    +
    +    /*
    +    Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
    +    Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8452 / 8504          1.9         537.3       1.0X
    +    Parquet Vectorized (Pushdown)                  274 /  281         57.3          17.4      30.8X
    +    Native ORC Vectorized                         8167 / 8185          1.9         519.3       1.0X
    +    Native ORC Vectorized (Pushdown)               365 /  379         43.1          23.2      23.1X
    +
    +
    +    Select 0 string row
    +    ('7864320' < value < '7864320'):         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8532 / 8564          1.8         542.4       1.0X
    +    Parquet Vectorized (Pushdown)                  366 /  386         43.0          23.3      23.3X
    +    Native ORC Vectorized                         8289 / 8300          1.9         527.0       1.0X
    +    Native ORC Vectorized (Pushdown)               378 /  385         41.6          24.0      22.6X
    +
    +
    +    Select 1 string row (value = '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8547 / 8564          1.8         543.4       1.0X
    +    Parquet Vectorized (Pushdown)                  351 /  356         44.9          22.3      24.4X
    +    Native ORC Vectorized                         8310 / 8323          1.9         528.3       1.0X
    +    Native ORC Vectorized (Pushdown)               370 /  375         42.5          23.5      23.1X
    +
    +
    +    Select 1 string row
    +    (value <=> '7864320'):                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8537 / 8563          1.8         542.8       1.0X
    +    Parquet Vectorized (Pushdown)                  310 /  319         50.7          19.7      27.5X
    +    Native ORC Vectorized                         8316 / 8335          1.9         528.7       1.0X
    +    Native ORC Vectorized (Pushdown)               364 /  367         43.2          23.1      23.5X
    +
    +
    +    Select 1 string row
    +    ('7864320' <= value <= '7864320'):       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8594 / 8607          1.8         546.4       1.0X
    +    Parquet Vectorized (Pushdown)                  370 /  374         42.5          23.5      23.2X
    +    Native ORC Vectorized                         8350 / 8358          1.9         530.9       1.0X
    +    Native ORC Vectorized (Pushdown)               371 /  374         42.4          23.6      23.2X
    +
    +
    +    Select all string rows
    +    (value IS NOT NULL):                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          19601 / 19625          0.8        1246.2       1.0X
    +    Parquet Vectorized (Pushdown)               19698 / 19703          0.8        1252.3       1.0X
    +    Native ORC Vectorized                       19435 / 19470          0.8        1235.6       1.0X
    +    Native ORC Vectorized (Pushdown)            19568 / 19590          0.8        1244.1       1.0X
    +
    +
    +    Select 0 int row (value IS NULL):        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7815 / 7824          2.0         496.9       1.0X
    +    Parquet Vectorized (Pushdown)                  245 /  251         64.2          15.6      31.9X
    +    Native ORC Vectorized                         7436 / 7460          2.1         472.8       1.1X
    +    Native ORC Vectorized (Pushdown)               344 /  351         45.7          21.9      22.7X
    +
    +
    +    Select 0 int row
    +    (7864320 < value < 7864320):             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7792 / 7807          2.0         495.4       1.0X
    +    Parquet Vectorized (Pushdown)                  349 /  353         45.1          22.2      22.3X
    +    Native ORC Vectorized                         7451 / 7465          2.1         473.7       1.0X
    +    Native ORC Vectorized (Pushdown)               365 /  368         43.0          23.2      21.3X
    +
    +
    +    Select 1 int row (value = 7864320):      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7836 / 7872          2.0         498.2       1.0X
    +    Parquet Vectorized (Pushdown)                  322 /  327         48.8          20.5      24.3X
    +    Native ORC Vectorized                         7533 / 7540          2.1         478.9       1.0X
    +    Native ORC Vectorized (Pushdown)               358 /  363         43.9          22.8      21.9X
    +
    +
    +    Select 1 int row (value <=> 7864320):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7855 / 7870          2.0         499.4       1.0X
    +    Parquet Vectorized (Pushdown)                  286 /  297         54.9          18.2      27.4X
    +    Native ORC Vectorized                         7511 / 7557          2.1         477.5       1.0X
    +    Native ORC Vectorized (Pushdown)               358 /  361         43.9          22.8      21.9X
    +
    +
    +    Select 1 int row
    +    (7864320 <= value <= 7864320):           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7851 / 7870          2.0         499.2       1.0X
    +    Parquet Vectorized (Pushdown)                  345 /  347         45.6          21.9      22.8X
    +    Native ORC Vectorized                         7543 / 7554          2.1         479.6       1.0X
    +    Native ORC Vectorized (Pushdown)               364 /  374         43.2          23.1      21.6X
    +
    +
    +    Select 1 int row
    +    (7864319 < value < 7864321):             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7837 / 7840          2.0         498.2       1.0X
    +    Parquet Vectorized (Pushdown)                  338 /  339         46.6          21.5      23.2X
    +    Native ORC Vectorized                         7524 / 7541          2.1         478.3       1.0X
    +    Native ORC Vectorized (Pushdown)               361 /  364         43.6          22.9      21.7X
    +
    +
    +    Select 10% int rows (value < 1572864):   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8864 / 8900          1.8         563.5       1.0X
    +    Parquet Vectorized (Pushdown)                 2088 / 2095          7.5         132.7       4.2X
    +    Native ORC Vectorized                         8562 / 8579          1.8         544.3       1.0X
    +    Native ORC Vectorized (Pushdown)              2127 / 2131          7.4         135.2       4.2X
    +
    +
    +    Select 50% int rows (value < 7864320):   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          12671 / 12684          1.2         805.6       1.0X
    +    Parquet Vectorized (Pushdown)                 9032 / 9041          1.7         574.2       1.4X
    +    Native ORC Vectorized                       12388 / 12411          1.3         787.6       1.0X
    +    Native ORC Vectorized (Pushdown)              8873 / 8884          1.8         564.1       1.4X
    +
    +
    +    Select 90% int rows (value < 14155776):  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          16481 / 16495          1.0        1047.8       1.0X
    +    Parquet Vectorized (Pushdown)               15906 / 15919          1.0        1011.3       1.0X
    +    Native ORC Vectorized                       16224 / 16254          1.0        1031.5       1.0X
    +    Native ORC Vectorized (Pushdown)            15632 / 15661          1.0         993.9       1.1X
    +
    +
    +    Select all int rows (value IS NOT NULL): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          17341 / 17354          0.9        1102.5       1.0X
    +    Parquet Vectorized (Pushdown)               17463 / 17481          0.9        1110.2       1.0X
    +    Native ORC Vectorized                       17073 / 17089          0.9        1085.4       1.0X
    +    Native ORC Vectorized (Pushdown)            17194 / 17232          0.9        1093.2       1.0X
    +
    +
    +    Select all int rows (value > -1):        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          17452 / 17467          0.9        1109.6       1.0X
    +    Parquet Vectorized (Pushdown)               17613 / 17630          0.9        1119.8       1.0X
    +    Native ORC Vectorized                       17259 / 17271          0.9        1097.3       1.0X
    +    Native ORC Vectorized (Pushdown)            17385 / 17429          0.9        1105.3       1.0X
    +
    +
    +    Select all int rows (value != -1):       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          17363 / 17372          0.9        1103.9       1.0X
    +    Parquet Vectorized (Pushdown)               17526 / 17535          0.9        1114.2       1.0X
    +    Native ORC Vectorized                       17052 / 17089          0.9        1084.2       1.0X
    +    Native ORC Vectorized (Pushdown)            17209 / 17229          0.9        1094.1       1.0X
    +
    +
    +    Select 0 distinct string row
    +    (value IS NULL):                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7697 / 7751          2.0         489.4       1.0X
    +    Parquet Vectorized (Pushdown)                  264 /  284         59.5          16.8      29.1X
    +    Native ORC Vectorized                         6942 / 6970          2.3         441.4       1.1X
    +    Native ORC Vectorized (Pushdown)               372 /  381         42.3          23.7      20.7X
    +
    +
    +    Select 0 distinct string row
    +    ('100' < value < '100'):                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7983 / 8018          2.0         507.5       1.0X
    +    Parquet Vectorized (Pushdown)                  334 /  337         47.0          21.3      23.9X
    +    Native ORC Vectorized                         7307 / 7313          2.2         464.5       1.1X
    +    Native ORC Vectorized (Pushdown)               363 /  371         43.3          23.1      22.0X
    +
    +
    +    Select 1 distinct string row
    +    (value = '100'):                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7882 / 7915          2.0         501.1       1.0X
    +    Parquet Vectorized (Pushdown)                  504 /  522         31.2          32.1      15.6X
    +    Native ORC Vectorized                         7143 / 7155          2.2         454.1       1.1X
    +    Native ORC Vectorized (Pushdown)               555 /  573         28.4          35.3      14.2X
    +
    +
    +    Select 1 distinct string row
    +    (value <=> '100'):                       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7898 / 7912          2.0         502.1       1.0X
    +    Parquet Vectorized (Pushdown)                  470 /  481         33.5          29.9      16.8X
    +    Native ORC Vectorized                         7135 / 7149          2.2         453.6       1.1X
    +    Native ORC Vectorized (Pushdown)               552 /  557         28.5          35.1      14.3X
    +
    +
    +    Select 1 distinct string row
    +    ('100' <= value <= '100'):               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8189 / 8213          1.9         520.7       1.0X
    +    Parquet Vectorized (Pushdown)                  527 /  534         29.9          33.5      15.5X
    +    Native ORC Vectorized                         7477 / 7498          2.1         475.3       1.1X
    +    Native ORC Vectorized (Pushdown)               558 /  566         28.2          35.5      14.7X
    +
    +
    +    Select all distinct string rows
    +    (value IS NOT NULL):                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          19462 / 19476          0.8        1237.4       1.0X
    +    Parquet Vectorized (Pushdown)               19570 / 19582          0.8        1244.2       1.0X
    +    Native ORC Vectorized                       18577 / 18604          0.8        1181.1       1.0X
    +    Native ORC Vectorized (Pushdown)            18701 / 18742          0.8        1189.0       1.0X
    +    */
    +    benchmark.run()
    +  }
    +
    +  private def runIntBenchmark(numRows: Int, width: Int, mid: Int): Unit = {
    +    Seq("value IS NULL", s"$mid < value AND value < $mid").foreach { whereExpr =>
    +      val title = s"Select 0 int row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    Seq(
    +      s"value = $mid",
    +      s"value <=> $mid",
    +      s"$mid <= value AND value <= $mid",
    +      s"${mid - 1} < value AND value < ${mid + 1}"
    +    ).foreach { whereExpr =>
    +      val title = s"Select 1 int row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    val selectExpr = (1 to width).map(i => s"MAX(c$i)").mkString("", ",", ", MAX(value)")
    +
    +    Seq(10, 50, 90).foreach { percent =>
    +      filterPushDownBenchmark(
    +        numRows,
    +        s"Select $percent% int rows (value < ${numRows * percent / 100})",
    +        s"value < ${numRows * percent / 100}",
    +        selectExpr
    +      )
    +    }
    +
    +    Seq("value IS NOT NULL", "value > -1", "value != -1").foreach { whereExpr =>
    +      filterPushDownBenchmark(
    +        numRows,
    +        s"Select all int rows ($whereExpr)",
    +        whereExpr,
    +        selectExpr)
    +    }
    +  }
    +
    +  private def runStringBenchmark(
    +      numRows: Int, width: Int, searchValue: Int, colType: String): Unit = {
    +    Seq("value IS NULL", s"'$searchValue' < value AND value < '$searchValue'")
    +        .foreach { whereExpr =>
    +      val title = s"Select 0 $colType row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    Seq(
    +      s"value = '$searchValue'",
    +      s"value <=> '$searchValue'",
    +      s"'$searchValue' <= value AND value <= '$searchValue'"
    +    ).foreach { whereExpr =>
    +      val title = s"Select 1 $colType row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    val selectExpr = (1 to width).map(i => s"MAX(c$i)").mkString("", ",", ", MAX(value)")
    +
    +    Seq("value IS NOT NULL").foreach { whereExpr =>
    +      filterPushDownBenchmark(
    +        numRows,
    +        s"Select all $colType rows ($whereExpr)",
    +        whereExpr,
    +        selectExpr)
    +    }
    +  }
    +
    +  def main(args: Array[String]): Unit = {
    +    val numRows = 1024 * 1024 * 15
    +    val width = 5
    +
    +    // Pushdown for many distinct value case
    +    withTempPath { dir =>
    +      val mid = numRows / 2
    +
    +      withTempTable("orcTable", "parquetTable") {
    +        Seq(true, false).foreach { useStringForValue =>
    +          prepareTable(dir, numRows, width, useStringForValue)
    +          if (useStringForValue) {
    +            runStringBenchmark(numRows, width, mid, "string")
    +          } else {
    +            runIntBenchmark(numRows, width, mid)
    +          }
    +        }
    +      }
    +    }
    +
    +    // Pushdown for few distinct value case (use dictionary encoding)
    --- End diff --
    
    ORC has a conf called `orc.dictionary.key.threshold`. Do we need to set it here? cc @dongjoon-hyun
    ```
      DICTIONARY_KEY_SIZE_THRESHOLD("orc.dictionary.key.threshold",
          "hive.exec.orc.dictionary.key.size.threshold",
          0.8,
          "If the number of distinct keys in a dictionary is greater than this\n" +
              "fraction of the total number of non-null rows, turn off \n" +
              "dictionary encoding.  Use 1 to always use dictionary encoding.")
    ```
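
The rule quoted in the description above can be sketched in plain Scala to see how it affects the two benchmark cases. This is only an illustration of the documented rule, not ORC's actual writer logic. The names `keepsDictionary` and `OrcDictionaryThresholdSketch` are hypothetical, and so are the assumptions that every value is unique in the many-distinct case and that the few-distinct case has 200 distinct values.

```scala
object OrcDictionaryThresholdSketch {
  // Documented rule: dictionary encoding is turned off when the number of
  // distinct keys exceeds `threshold` times the number of non-null rows.
  def keepsDictionary(distinctKeys: Long, nonNullRows: Long, threshold: Double = 0.8): Boolean =
    distinctKeys <= threshold * nonNullRows

  def main(args: Array[String]): Unit = {
    val numRows = 1024L * 1024 * 15 // same row count as the benchmark's main()

    // Many-distinct case (assumed: every value unique), default threshold 0.8.
    println(keepsDictionary(numRows, numRows)) // false

    // Same data with the threshold set to 1, i.e. "always use dictionary encoding".
    println(keepsDictionary(numRows, numRows, threshold = 1.0)) // true

    // Few-distinct case (assumed: 200 distinct values), default threshold.
    println(keepsDictionary(200L, numRows)) // true
  }
}
```

Under these assumptions the default threshold only matters for the many-distinct table. The few-distinct table would stay dictionary-encoded either way, so setting the conf mainly makes that intent explicit.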


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90441/
    Test FAILed.


---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r191283013
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
    @@ -131,211 +132,214 @@ object FilterPushdownBenchmark {
         }
     
         /*
    +    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.26-46.32.amzn1.x86_64
         Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
         Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         ------------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            8452 / 8504          1.9         537.3       1.0X
    -    Parquet Vectorized (Pushdown)                  274 /  281         57.3          17.4      30.8X
    -    Native ORC Vectorized                         8167 / 8185          1.9         519.3       1.0X
    -    Native ORC Vectorized (Pushdown)               365 /  379         43.1          23.2      23.1X
    +    Parquet Vectorized                            2961 / 3123          5.3         188.3       1.0X
    +    Parquet Vectorized (Pushdown)                 3057 / 3121          5.1         194.4       1.0X
    --- End diff --
    
    Yeah, I think so, but I'm not sure. I ran the benchmark multiple times, but I couldn't reproduce the old performance values...
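
To make the gap in the diff above concrete, the `Relative` column can be approximated as the baseline's best time divided by each case's best time. The sketch below applies that to the old and new "Parquet Vectorized" rows quoted in the diff. The helper name `relative` is hypothetical, and the real benchmark utility presumably works from finer-grained timings than the rounded milliseconds printed in the table, so small rounding differences are expected.

```scala
object RelativeSpeedupSketch {
  // Speedup of a case over the baseline, both given as best times in ms.
  def relative(baselineBestMs: Double, caseBestMs: Double): Double =
    baselineBestMs / caseBestMs

  def main(args: Array[String]): Unit = {
    // Old results: 8452 ms without pushdown, 274 ms with pushdown.
    val before = relative(8452, 274)
    // New results: 2961 ms without pushdown, 3057 ms with pushdown.
    val after = relative(2961, 3057)
    println(s"before: $before, after: $after")
  }
}
```

With the old numbers the ratio is roughly 30.8, matching the removed row. With the new numbers it is just under 1, which the table rounds to 1.0X: pushdown no longer shows any benefit for this case.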


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #91228 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91228/testReport)** for PR 21288 at commit [`d41e689`](https://github.com/apache/spark/commit/d41e68914e00a7ba6734b3fdbe839b130fbbd42e).


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    @maropu Could you fix the style? 
    
    BTW, based on the latest result, Parquet is generally faster than ORC. cc @dongjoon-hyun @rdblue 


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91228/
    Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3405/
    Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90454/
    Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #91211 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91211/testReport)** for PR 21288 at commit [`2c0d5cb`](https://github.com/apache/spark/commit/2c0d5cbf51268540653543b96de135a6923c6cef).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91857/
    Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Thanks for checking! By the way, `DataSourceReadBenchmark` has the same issue (the `spark.master` setup), so is it OK to fix that in a follow-up?
    https://github.com/apache/spark/compare/master...maropu:FixDataSourceReadBenchmark
    Also, I updated the benchmark results using `r3.xlarge`.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4082/
    Test PASSed.


---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21288


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3096/
    Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #91857 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91857/testReport)** for PR 21288 at commit [`d3dd504`](https://github.com/apache/spark/commit/d3dd50463c2b91ae8800dbcc811dcc52880a02ca).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4014/
    Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3628/
    Test PASSed.


---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r195940979
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
    @@ -0,0 +1,442 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import java.io.File
    +
    +import scala.util.{Random, Try}
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.sql.{DataFrame, SparkSession}
    +import org.apache.spark.sql.functions.monotonically_increasing_id
    +import org.apache.spark.sql.internal.SQLConf
    +import org.apache.spark.util.{Benchmark, Utils}
    +
    +
    +/**
    + * Benchmark to measure read performance with Filter pushdown.
    + * To run this:
    + *  spark-submit --class <this class> <spark sql test jar>
    + */
    +object FilterPushdownBenchmark {
    +  val conf = new SparkConf()
    +    .setAppName("FilterPushdownBenchmark")
    +    // Since `spark.master` always exists, overrides this value
    +    .set("spark.master", "local[1]")
    --- End diff --
    
    Could you update `m4.2xlarge` in the PR description and add `spark.master` at line 34, too?


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    ok


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #90878 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90878/testReport)** for PR 21288 at commit [`39e5a50`](https://github.com/apache/spark/commit/39e5a507fe22cade6bed0613eefbccab15cf45ff).


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91946/
    Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3627/
    Test FAILed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test FAILed.


---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r189490682
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FilterPushdownBenchmark.scala ---
    @@ -32,14 +32,14 @@ import org.apache.spark.util.{Benchmark, Utils}
      */
     object FilterPushdownBenchmark {
       val conf = new SparkConf()
    -  conf.set("orc.compression", "snappy")
    -  conf.set("spark.sql.parquet.compression.codec", "snappy")
    +    .setMaster("local[1]")
    --- End diff --
    
    ok


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    retest this please


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r189175140
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FilterPushdownBenchmark.scala ---
    @@ -32,14 +32,14 @@ import org.apache.spark.util.{Benchmark, Utils}
      */
     object FilterPushdownBenchmark {
       val conf = new SparkConf()
    -  conf.set("orc.compression", "snappy")
    -  conf.set("spark.sql.parquet.compression.codec", "snappy")
    +    .setMaster("local[1]")
    --- End diff --
    
    I think you can use `.setIfMissing("spark.master", "local[1]")` instead;
    that way we could perhaps run this on different backends too.
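
The difference between the two approaches discussed in this review can be sketched as follows. This is only an illustration, assuming spark-core is on the classpath; the object `MasterSettingSketch`, its helper methods, and the `submitted` parameter (standing in for whatever `spark-submit --master` would supply) are hypothetical names, not part of the patch.

```scala
import org.apache.spark.SparkConf

// Sketch only: contrasts `set` with `setIfMissing` for `spark.master`.
// `submitted` stands in for the value spark-submit would pass via --master.
object MasterSettingSketch {
  // loadDefaults = false keeps JVM system properties out of the picture,
  // so the outcome depends only on what is set below.
  private def base(submitted: Option[String]): SparkConf = {
    val conf = new SparkConf(false).setAppName("FilterPushdownBenchmark")
    submitted.foreach(m => conf.set("spark.master", m))
    conf
  }

  /** `set` always overwrites, so a submitted master is ignored. */
  def forced(submitted: Option[String]): String =
    base(submitted).set("spark.master", "local[1]").get("spark.master")

  /** `setIfMissing` only fills in a default when nothing was submitted. */
  def defaulted(submitted: Option[String]): String =
    base(submitted).setIfMissing("spark.master", "local[1]").get("spark.master")

  def main(args: Array[String]): Unit = {
    println(forced(Some("local[*]")))    // master chosen by `set`
    println(defaulted(Some("local[*]"))) // master chosen by the submitter
    println(defaulted(None))             // falls back to the default
  }
}
```

With `set`, the benchmark is pinned to a single local thread regardless of the command line, which keeps published numbers comparable. With `setIfMissing`, the command line wins, which is what would allow runs on other backends.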


---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r195949711
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
    @@ -0,0 +1,442 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import java.io.File
    +
    +import scala.util.{Random, Try}
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.sql.{DataFrame, SparkSession}
    +import org.apache.spark.sql.functions.monotonically_increasing_id
    +import org.apache.spark.sql.internal.SQLConf
    +import org.apache.spark.util.{Benchmark, Utils}
    +
    +
    +/**
    + * Benchmark to measure read performance with Filter pushdown.
    + * To run this:
    + *  spark-submit --class <this class> <spark sql test jar>
    + */
    +object FilterPushdownBenchmark {
    +  val conf = new SparkConf()
    +    .setAppName("FilterPushdownBenchmark")
    +    // Since `spark.master` always exists, overrides this value
    +    .set("spark.master", "local[1]")
    --- End diff --
    
    I'm afraid other developers might misunderstand how to use this.
    ```
    spark-submit --master local[1] --class <this class> <spark sql test jar>
    spark-submit --master local[*] --class <this class> <spark sql test jar>
    ```
    In both cases, the benchmark always uses `local[1]`. Or are you suggesting something different?



---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #90883 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90883/testReport)** for PR 21288 at commit [`39e5a50`](https://github.com/apache/spark/commit/39e5a507fe22cade6bed0613eefbccab15cf45ff).


---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r187822729
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FilterPushdownBenchmark.scala ---
    @@ -32,14 +32,14 @@ import org.apache.spark.util.{Benchmark, Utils}
      */
     object FilterPushdownBenchmark {
       val conf = new SparkConf()
    -  conf.set("orc.compression", "snappy")
    -  conf.set("spark.sql.parquet.compression.codec", "snappy")
    +    .setMaster("local[1]")
    +    .setAppName("FilterPushdownBenchmark")
    +    .set("spark.driver.memory", "3g")
    --- End diff --
    
    Aha, ok. Looks good to me.
    I just added this to match other benchmark code, e.g., `TPCDSQueryBenchmark`.
    If that's fine, I'll fix the other places in a follow-up.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #90441 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90441/testReport)** for PR 21288 at commit [`8f60902`](https://github.com/apache/spark/commit/8f609023174c9f97bddc46bebe98f4ce3caf08c5).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test FAILed.


---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r195946683
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
    @@ -0,0 +1,442 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import java.io.File
    +
    +import scala.util.{Random, Try}
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.sql.{DataFrame, SparkSession}
    +import org.apache.spark.sql.functions.monotonically_increasing_id
    +import org.apache.spark.sql.internal.SQLConf
    +import org.apache.spark.util.{Benchmark, Utils}
    +
    +
    +/**
    + * Benchmark to measure read performance with Filter pushdown.
    + * To run this:
    + *  spark-submit --class <this class> <spark sql test jar>
    + */
    +object FilterPushdownBenchmark {
    +  val conf = new SparkConf()
    +    .setAppName("FilterPushdownBenchmark")
    +    // Since `spark.master` always exists, overrides this value
    +    .set("spark.master", "local[1]")
    --- End diff --
    
    btw, I updated the description. Thanks!


---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r189120527
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FilterPushdownBenchmark.scala ---
    @@ -105,138 +128,306 @@ object FilterPushdownBenchmark {
         }
     
         /*
    -    Java HotSpot(TM) 64-Bit Server VM 1.8.0_152-b16 on Mac OS X 10.13.2
    -    Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
    -
    -    Select 0 row (id IS NULL):              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7882 / 7957          2.0         501.1       1.0X
    -    Parquet Vectorized (Pushdown)                   55 /   60        285.2           3.5     142.9X
    -    Native ORC Vectorized                         5592 / 5627          2.8         355.5       1.4X
    -    Native ORC Vectorized (Pushdown)                66 /   70        237.2           4.2     118.9X
    -
    -    Select 0 row (7864320 < id < 7864320):  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7884 / 7909          2.0         501.2       1.0X
    -    Parquet Vectorized (Pushdown)                  739 /  752         21.3          47.0      10.7X
    -    Native ORC Vectorized                         5614 / 5646          2.8         356.9       1.4X
    -    Native ORC Vectorized (Pushdown)                81 /   83        195.2           5.1      97.8X
    -
    -    Select 1 row (id = 7864320):            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7905 / 8027          2.0         502.6       1.0X
    -    Parquet Vectorized (Pushdown)                  740 /  766         21.2          47.1      10.7X
    -    Native ORC Vectorized                         5684 / 5738          2.8         361.4       1.4X
    -    Native ORC Vectorized (Pushdown)                78 /   81        202.4           4.9     101.7X
    -
    -    Select 1 row (id <=> 7864320):          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7928 / 7993          2.0         504.1       1.0X
    -    Parquet Vectorized (Pushdown)                  747 /  772         21.0          47.5      10.6X
    -    Native ORC Vectorized                         5728 / 5753          2.7         364.2       1.4X
    -    Native ORC Vectorized (Pushdown)                76 /   78        207.9           4.8     104.8X
    -
    -    Select 1 row (7864320 <= id <= 7864320):Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7939 / 8021          2.0         504.8       1.0X
    -    Parquet Vectorized (Pushdown)                  746 /  770         21.1          47.4      10.6X
    -    Native ORC Vectorized                         5690 / 5734          2.8         361.7       1.4X
    -    Native ORC Vectorized (Pushdown)                76 /   79        206.7           4.8     104.3X
    -
    -    Select 1 row (7864319 < id < 7864321):  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7972 / 8019          2.0         506.9       1.0X
    -    Parquet Vectorized (Pushdown)                  742 /  764         21.2          47.2      10.7X
    -    Native ORC Vectorized                         5704 / 5743          2.8         362.6       1.4X
    -    Native ORC Vectorized (Pushdown)                76 /   78        207.9           4.8     105.4X
    -
    -    Select 10% rows (id < 1572864):         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            8733 / 8808          1.8         555.2       1.0X
    -    Parquet Vectorized (Pushdown)                 2213 / 2267          7.1         140.7       3.9X
    -    Native ORC Vectorized                         6420 / 6463          2.4         408.2       1.4X
    -    Native ORC Vectorized (Pushdown)              1313 / 1331         12.0          83.5       6.7X
    -
    -    Select 50% rows (id < 7864320):         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                          11518 / 11591          1.4         732.3       1.0X
    -    Parquet Vectorized (Pushdown)                 7962 / 7991          2.0         506.2       1.4X
    -    Native ORC Vectorized                         8927 / 8985          1.8         567.6       1.3X
    -    Native ORC Vectorized (Pushdown)              6102 / 6160          2.6         387.9       1.9X
    -
    -    Select 90% rows (id < 14155776):        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                          14255 / 14389          1.1         906.3       1.0X
    -    Parquet Vectorized (Pushdown)               13564 / 13594          1.2         862.4       1.1X
    -    Native ORC Vectorized                       11442 / 11608          1.4         727.5       1.2X
    -    Native ORC Vectorized (Pushdown)            10991 / 11029          1.4         698.8       1.3X
    -
    -    Select all rows (id IS NOT NULL):       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                          14917 / 14938          1.1         948.4       1.0X
    -    Parquet Vectorized (Pushdown)               14910 / 14964          1.1         948.0       1.0X
    -    Native ORC Vectorized                       11986 / 12069          1.3         762.0       1.2X
    -    Native ORC Vectorized (Pushdown)            12037 / 12123          1.3         765.3       1.2X
    -
    -    Select all rows (id > -1):              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                          14951 / 14976          1.1         950.6       1.0X
    -    Parquet Vectorized (Pushdown)               14934 / 15016          1.1         949.5       1.0X
    -    Native ORC Vectorized                       12000 / 12156          1.3         763.0       1.2X
    -    Native ORC Vectorized (Pushdown)            12079 / 12113          1.3         767.9       1.2X
    -
    -    Select all rows (id != -1):             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                          14930 / 14972          1.1         949.3       1.0X
    -    Parquet Vectorized (Pushdown)               15015 / 15047          1.0         954.6       1.0X
    -    Native ORC Vectorized                       12090 / 12259          1.3         768.7       1.2X
    -    Native ORC Vectorized (Pushdown)            12021 / 12096          1.3         764.2       1.2X
    +    Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
    --- End diff --
    
    ok, I used `m4.2xlarge`.
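
For readers comparing result tables across machines, the derived columns in the quoted output can be related to the best time with a little arithmetic. The sketch below is an assumption-laden illustration: the row count of 15 * 1024 * 1024 is inferred from the "Select 50% rows (id < 7864320)" case, and the object and method names are hypothetical.

```scala
// Sketch: how the derived columns of one benchmark row relate to the best time.
// Assumption: the table holds 15 * 1024 * 1024 rows, inferred from the
// "Select 50% rows (id < 7864320)" case in the quoted results.
object BenchmarkColumnsSketch {
  val numRows: Long = 15L * 1024 * 1024

  /** Nanoseconds spent per row for a run that took `bestMs` milliseconds. */
  def perRowNs(bestMs: Double): Double = bestMs * 1e6 / numRows

  /** Throughput in millions of rows per second. */
  def rateMillionsPerSec(bestMs: Double): Double = numRows / (bestMs / 1e3) / 1e6

  /** Speed-up of a run relative to the baseline (first) row of the table. */
  def relative(baselineMs: Double, bestMs: Double): Double = baselineMs / bestMs

  def main(args: Array[String]): Unit = {
    val baseline = 7882.0 // Parquet Vectorized, "id IS NULL", best time in ms
    val pushdown = 55.0   // Parquet Vectorized (Pushdown), same case
    println(f"per row:  ${perRowNs(baseline)}%.1f ns")
    println(f"rate:     ${rateMillionsPerSec(baseline)}%.1f M/s")
    println(f"relative: ${relative(baseline, pushdown)}%.1fX")
  }
}
```

The relative figure computed from the printed, rounded times will differ slightly from the table's value (142.9X), presumably because the table is derived from unrounded timings.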


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #90904 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90904/testReport)** for PR 21288 at commit [`39e5a50`](https://github.com/apache/spark/commit/39e5a507fe22cade6bed0613eefbccab15cf45ff).


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    retest this please


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90878/
    Test FAILed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    One more thing: I prefer MacBook performance tests, because the cost of EC2 is always a barrier for developers.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #91219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91219/testReport)** for PR 21288 at commit [`2c0d5cb`](https://github.com/apache/spark/commit/2c0d5cbf51268540653543b96de135a6923c6cef).


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #91821 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91821/testReport)** for PR 21288 at commit [`d3dd504`](https://github.com/apache/spark/commit/d3dd50463c2b91ae8800dbcc811dcc52880a02ca).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #90454 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90454/testReport)** for PR 21288 at commit [`8f60902`](https://github.com/apache/spark/commit/8f609023174c9f97bddc46bebe98f4ce3caf08c5).


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #91914 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91914/testReport)** for PR 21288 at commit [`4a9cec9`](https://github.com/apache/spark/commit/4a9cec91f9446161d4dde0cac20ccdccb9a112e7).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #91211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91211/testReport)** for PR 21288 at commit [`2c0d5cb`](https://github.com/apache/spark/commit/2c0d5cbf51268540653543b96de135a6923c6cef).


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #91815 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91815/testReport)** for PR 21288 at commit [`fa53156`](https://github.com/apache/spark/commit/fa53156599812adc94f089b8c163224fb2e4935f).


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4011/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3102/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90440/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/125/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #91914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91914/testReport)** for PR 21288 at commit [`4a9cec9`](https://github.com/apache/spark/commit/4a9cec91f9446161d4dde0cac20ccdccb9a112e7).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91815/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r187764083
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FilterPushdownBenchmark.scala ---
    @@ -32,14 +32,14 @@ import org.apache.spark.util.{Benchmark, Utils}
      */
     object FilterPushdownBenchmark {
       val conf = new SparkConf()
    -  conf.set("orc.compression", "snappy")
    -  conf.set("spark.sql.parquet.compression.codec", "snappy")
    +    .setMaster("local[1]")
    +    .setAppName("FilterPushdownBenchmark")
    +    .set("spark.driver.memory", "3g")
    --- End diff --
    
    Could these, and the master, be changed to `setIfMissing()`? It would be great if they could be set via config.
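
As a rough illustration of the suggestion above, here is a minimal sketch of how `setIfMissing` differs from `set` on a `SparkConf`. The object name `SetIfMissingSketch` and the helper `build` are hypothetical and exist only for this example; it assumes `spark-core` is on the classpath and passes `loadDefaults = false` so that JVM system properties do not influence the result.

```scala
import org.apache.spark.SparkConf

object SetIfMissingSketch {
  // Hypothetical helper: builds a conf the way the benchmark might,
  // optionally with a master value that a user supplied beforehand.
  def build(userMaster: Option[String]): SparkConf = {
    // loadDefaults = false: ignore spark.* system properties for a deterministic demo.
    val conf = new SparkConf(false)
    userMaster.foreach(m => conf.set("spark.master", m))
    conf
      .setAppName("FilterPushdownBenchmark")
      // Only applied when the key has no value yet.
      .setIfMissing("spark.master", "local[1]")
      .setIfMissing("spark.driver.memory", "3g")
  }

  def main(args: Array[String]): Unit = {
    println(build(None).get("spark.master"))             // default is used
    println(build(Some("local[4]")).get("spark.master")) // user value is kept
  }
}
```

With this pattern the benchmark keeps a sensible default but does not silently discard a value supplied by the person running it.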


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/147/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r195946600
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
    @@ -0,0 +1,442 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import java.io.File
    +
    +import scala.util.{Random, Try}
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.sql.{DataFrame, SparkSession}
    +import org.apache.spark.sql.functions.monotonically_increasing_id
    +import org.apache.spark.sql.internal.SQLConf
    +import org.apache.spark.util.{Benchmark, Utils}
    +
    +
    +/**
    + * Benchmark to measure read performance with Filter pushdown.
    + * To run this:
    + *  spark-submit --class <this class> <spark sql test jar>
    + */
    +object FilterPushdownBenchmark {
    +  val conf = new SparkConf()
    +    .setAppName("FilterPushdownBenchmark")
    +    // Since `spark.master` always exists, overrides this value
    +    .set("spark.master", "local[1]")
    --- End diff --
    
    With the current PR, we cannot set `spark.master` through command-line options. Are you suggesting that we drop `.set("spark.master", "local[1]")` and always pass `spark.master` as an option when running this benchmark?
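
The trade-off in this question can be sketched as follows. This is an illustration only: it assumes the launcher exposes the master to the driver as the `spark.master` JVM system property, which the default `SparkConf()` constructor picks up, and it simulates that with `System.setProperty`. The object name `MasterOverrideSketch` and the method `compare` are hypothetical; `spark-core` is assumed to be on the classpath.

```scala
import org.apache.spark.SparkConf

object MasterOverrideSketch {
  // Returns (master seen with setIfMissing, master seen with set).
  def compare(): (String, String) = {
    // Stand-in for a launcher passing a master such as local[4] to the driver.
    System.setProperty("spark.master", "local[4]")
    try {
      // new SparkConf() loads spark.* system properties, so the key already exists.
      val respectful = new SparkConf().setIfMissing("spark.master", "local[1]")
      // set() overwrites whatever was loaded.
      val forced = new SparkConf().set("spark.master", "local[1]")
      (respectful.get("spark.master"), forced.get("spark.master"))
    } finally {
      System.clearProperty("spark.master")
    }
  }

  def main(args: Array[String]): Unit = {
    val (withSetIfMissing, withSet) = compare()
    println(s"setIfMissing keeps: $withSetIfMissing")
    println(s"set forces:         $withSet")
  }
}
```

Under that assumption, `setIfMissing` never takes effect when a master is always supplied, while `set` pins the benchmark to a single thread at the cost of ignoring the command line.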


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #90878 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90878/testReport)** for PR 21288 at commit [`39e5a50`](https://github.com/apache/spark/commit/39e5a507fe22cade6bed0613eefbccab15cf45ff).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #90904 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90904/testReport)** for PR 21288 at commit [`39e5a50`](https://github.com/apache/spark/commit/39e5a507fe22cade6bed0613eefbccab15cf45ff).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r195304544
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
    @@ -131,211 +132,214 @@ object FilterPushdownBenchmark {
         }
     
         /*
    +    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.26-46.32.amzn1.x86_64
         Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
         Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         ------------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            8452 / 8504          1.9         537.3       1.0X
    -    Parquet Vectorized (Pushdown)                  274 /  281         57.3          17.4      30.8X
    -    Native ORC Vectorized                         8167 / 8185          1.9         519.3       1.0X
    -    Native ORC Vectorized (Pushdown)               365 /  379         43.1          23.2      23.1X
    +    Parquet Vectorized                            2961 / 3123          5.3         188.3       1.0X
    +    Parquet Vectorized (Pushdown)                 3057 / 3121          5.1         194.4       1.0X
    --- End diff --
    
    The result in v2.3.1: https://gist.github.com/maropu/88627246b7143ede5ab73c7183ab2128
    
    This is not a regression; I probably ran the benchmark on the wrong branch or commit.
    I re-ran the benchmark on the current master and updated the PR.
    
    How to run: I created a new `m4.2xlarge` instance, fetched this PR, rebased it onto master, and ran the benchmark.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91219/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Sure


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91821/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90571/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #90883 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90883/testReport)** for PR 21288 at commit [`39e5a50`](https://github.com/apache/spark/commit/39e5a507fe22cade6bed0613eefbccab15cf45ff).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r195262751
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
    @@ -131,211 +132,214 @@ object FilterPushdownBenchmark {
         }
     
         /*
    +    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.26-46.32.amzn1.x86_64
         Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
         Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         ------------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            8452 / 8504          1.9         537.3       1.0X
    -    Parquet Vectorized (Pushdown)                  274 /  281         57.3          17.4      30.8X
    -    Native ORC Vectorized                         8167 / 8185          1.9         519.3       1.0X
    -    Native ORC Vectorized (Pushdown)               365 /  379         43.1          23.2      23.1X
    +    Parquet Vectorized                            2961 / 3123          5.3         188.3       1.0X
    +    Parquet Vectorized (Pushdown)                 3057 / 3121          5.1         194.4       1.0X
    --- End diff --
    
    I have time today, so I'll check v2.3.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3409/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #91857 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91857/testReport)** for PR 21288 at commit [`d3dd504`](https://github.com/apache/spark/commit/d3dd50463c2b91ae8800dbcc811dcc52880a02ca).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    I found out why the performance numbers changed so much in https://github.com/apache/spark/pull/21288#discussion_r191280132: [the commit](https://github.com/apache/spark/pull/21288/commits/39e5a507fe22cade6bed0613eefbccab15cf45ff) wrongly set `spark.master` to `local[*]` instead of `local[1]`.
    
    ```
    // Performance results on r3.xlarge 
    
    // --master local[1] --driver-memory 10G --conf spark.ui.enabled=false
    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
    Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
    Select 0 string row ('7864320' < value < '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                            9292 / 9315          1.7         590.8       1.0X
    Parquet Vectorized (Pushdown)                  921 /  933         17.1          58.6      10.1X
    Native ORC Vectorized                         9001 / 9021          1.7         572.3       1.0X
    Native ORC Vectorized (Pushdown)               257 /  265         61.2          16.3      36.2X
    
    Select 1 string row (value = '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                            9151 / 9162          1.7         581.8       1.0X
    Parquet Vectorized (Pushdown)                  902 /  917         17.4          57.3      10.1X
    Native ORC Vectorized                         8870 / 8882          1.8         564.0       1.0X
    Native ORC Vectorized (Pushdown)               254 /  268         61.9          16.1      36.0X
    ...
    
    
    // --master local[*] --driver-memory 10G --conf spark.ui.enabled=false
    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
    Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
    Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                            3959 / 4067          4.0         251.7       1.0X
    Parquet Vectorized (Pushdown)                  202 /  245         77.7          12.9      19.6X
    Native ORC Vectorized                         3973 / 4055          4.0         252.6       1.0X
    Native ORC Vectorized (Pushdown)               286 /  345         55.0          18.2      13.8X
    
    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
    Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
    Select 0 string row ('7864320' < value < '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                            3985 / 4022          3.9         253.4       1.0X
    Parquet Vectorized (Pushdown)                  249 /  274         63.3          15.8      16.0X
    Native ORC Vectorized                         4066 / 4122          3.9         258.5       1.0X
    Native ORC Vectorized (Pushdown)               257 /  310         61.3          16.3      15.5X
    ```
    
    I'll fix the bug and update the results in a follow-up PR. Sorry, all.
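
To make the effect of the master setting on the reported numbers concrete, here is a small worked example in plain Scala. It assumes the `Relative` column is the best time of the first case divided by the best time of each case; that reading is inferred from the figures quoted above, not taken from the benchmark source. The best times are copied from the two `'7864320' < value < '7864320'` tables; the object and method names are hypothetical.

```scala
object RelativeColumnSketch {
  // (case name, best time in ms) from the --master local[1] run.
  val local1: Seq[(String, Double)] = Seq(
    "Parquet Vectorized" -> 9292.0,
    "Parquet Vectorized (Pushdown)" -> 921.0,
    "Native ORC Vectorized" -> 9001.0,
    "Native ORC Vectorized (Pushdown)" -> 257.0)

  // Same query from the --master local[*] run.
  val localStar: Seq[(String, Double)] = Seq(
    "Parquet Vectorized" -> 3985.0,
    "Parquet Vectorized (Pushdown)" -> 249.0,
    "Native ORC Vectorized" -> 4066.0,
    "Native ORC Vectorized (Pushdown)" -> 257.0)

  // Round to one decimal place, as in the printed tables.
  def round1(x: Double): Double = math.round(x * 10) / 10.0

  // Assumed definition: baseline (first case) best time / this case's best time.
  def relative(cases: Seq[(String, Double)]): Seq[(String, Double)] = {
    val baseline = cases.head._2
    cases.map { case (name, best) => name -> round1(baseline / best) }
  }

  def main(args: Array[String]): Unit = {
    println("local[1]:")
    relative(local1).foreach { case (n, r) => println(s"  $n: ${r}X") }
    println("local[*]:")
    relative(localStar).foreach { case (n, r) => println(s"  $n: ${r}X") }
  }
}
```

The unfiltered Parquet scan drops from 9292 ms to 3985 ms when all cores are used, so speedup ratios measured under `local[*]` are not comparable with those recorded under `local[1]`.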


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3635/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r189639582
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
    @@ -0,0 +1,437 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import java.io.File
    +
    +import scala.util.{Random, Try}
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.sql.{DataFrame, SparkSession}
    +import org.apache.spark.sql.functions.monotonically_increasing_id
    +import org.apache.spark.sql.internal.SQLConf
    +import org.apache.spark.util.{Benchmark, Utils}
    +
    +
    +/**
    + * Benchmark to measure read performance with Filter pushdown.
    + * To run this:
    + *  spark-submit --class <this class> <spark sql test jar>
    + */
    +object FilterPushdownBenchmark {
    +  val conf = new SparkConf()
    +    .setAppName("FilterPushdownBenchmark")
    +    .setIfMissing("spark.master", "local[1]")
    +    .setIfMissing("spark.driver.memory", "3g")
    +    .setIfMissing("spark.executor.memory", "3g")
    +    .setIfMissing("orc.compression", "snappy")
    +    .setIfMissing("spark.sql.parquet.compression.codec", "snappy")
    +
    +  private val spark = SparkSession.builder().config(conf).getOrCreate()
    +
    +  def withTempPath(f: File => Unit): Unit = {
    +    val path = Utils.createTempDir()
    +    path.delete()
    +    try f(path) finally Utils.deleteRecursively(path)
    +  }
    +
    +  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
    +    try f finally tableNames.foreach(spark.catalog.dropTempView)
    +  }
    +
    +  def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
    +    val (keys, values) = pairs.unzip
    +    val currentValues = keys.map(key => Try(spark.conf.get(key)).toOption)
    +    (keys, values).zipped.foreach(spark.conf.set)
    +    try f finally {
    +      keys.zip(currentValues).foreach {
    +        case (key, Some(value)) => spark.conf.set(key, value)
    +        case (key, None) => spark.conf.unset(key)
    +      }
    +    }
    +  }
    +
    +  private def prepareTable(
    +      dir: File, numRows: Int, width: Int, useStringForValue: Boolean): Unit = {
    +    import spark.implicits._
    +    val selectExpr = (1 to width).map(i => s"CAST(value AS STRING) c$i")
    +    val valueCol = if (useStringForValue) {
    +      monotonically_increasing_id().cast("string")
    +    } else {
    +      monotonically_increasing_id()
    +    }
    +    val df = spark.range(numRows).map(_ => Random.nextLong).selectExpr(selectExpr: _*)
    +      .withColumn("value", valueCol)
    +      .sort("value")
    +
    +    saveAsOrcTable(df, dir.getCanonicalPath + "/orc")
    +    saveAsParquetTable(df, dir.getCanonicalPath + "/parquet")
    +  }
    +
    +  private def prepareStringDictTable(
    +      dir: File, numRows: Int, numDistinctValues: Int, width: Int): Unit = {
    +    val selectExpr = (0 to width).map {
    +      case 0 => s"CAST(id % $numDistinctValues AS STRING) AS value"
    +      case i => s"CAST(rand() AS STRING) c$i"
    +    }
    +    val df = spark.range(numRows).selectExpr(selectExpr: _*).sort("value")
    +
    +    saveAsOrcTable(df, dir.getCanonicalPath + "/orc")
    +    saveAsParquetTable(df, dir.getCanonicalPath + "/parquet")
    +  }
    +
    +  private def saveAsOrcTable(df: DataFrame, dir: String): Unit = {
    +    df.write.mode("overwrite").orc(dir)
    +    spark.read.orc(dir).createOrReplaceTempView("orcTable")
    +  }
    +
    +  private def saveAsParquetTable(df: DataFrame, dir: String): Unit = {
    +    df.write.mode("overwrite").parquet(dir)
    +    spark.read.parquet(dir).createOrReplaceTempView("parquetTable")
    +  }
    +
    +  def filterPushDownBenchmark(
    +      values: Int,
    +      title: String,
    +      whereExpr: String,
    +      selectExpr: String = "*"): Unit = {
    +    val benchmark = new Benchmark(title, values, minNumIters = 5)
    +
    +    Seq(false, true).foreach { pushDownEnabled =>
    +      val name = s"Parquet Vectorized ${if (pushDownEnabled) s"(Pushdown)" else ""}"
    +      benchmark.addCase(name) { _ =>
    +        withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> s"$pushDownEnabled") {
    +          spark.sql(s"SELECT $selectExpr FROM parquetTable WHERE $whereExpr").collect()
    +        }
    +      }
    +    }
    +
    +    Seq(false, true).foreach { pushDownEnabled =>
    +      val name = s"Native ORC Vectorized ${if (pushDownEnabled) s"(Pushdown)" else ""}"
    +      benchmark.addCase(name) { _ =>
    +        withSQLConf(SQLConf.ORC_FILTER_PUSHDOWN_ENABLED.key -> s"$pushDownEnabled") {
    +          spark.sql(s"SELECT $selectExpr FROM orcTable WHERE $whereExpr").collect()
    +        }
    +      }
    +    }
    +
    +    /*
    +    Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
    +    Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8452 / 8504          1.9         537.3       1.0X
    +    Parquet Vectorized (Pushdown)                  274 /  281         57.3          17.4      30.8X
    +    Native ORC Vectorized                         8167 / 8185          1.9         519.3       1.0X
    +    Native ORC Vectorized (Pushdown)               365 /  379         43.1          23.2      23.1X
    +
    +
    +    Select 0 string row
    +    ('7864320' < value < '7864320'):         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8532 / 8564          1.8         542.4       1.0X
    +    Parquet Vectorized (Pushdown)                  366 /  386         43.0          23.3      23.3X
    +    Native ORC Vectorized                         8289 / 8300          1.9         527.0       1.0X
    +    Native ORC Vectorized (Pushdown)               378 /  385         41.6          24.0      22.6X
    +
    +
    +    Select 1 string row (value = '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8547 / 8564          1.8         543.4       1.0X
    +    Parquet Vectorized (Pushdown)                  351 /  356         44.9          22.3      24.4X
    +    Native ORC Vectorized                         8310 / 8323          1.9         528.3       1.0X
    +    Native ORC Vectorized (Pushdown)               370 /  375         42.5          23.5      23.1X
    +
    +
    +    Select 1 string row
    +    (value <=> '7864320'):                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8537 / 8563          1.8         542.8       1.0X
    +    Parquet Vectorized (Pushdown)                  310 /  319         50.7          19.7      27.5X
    +    Native ORC Vectorized                         8316 / 8335          1.9         528.7       1.0X
    +    Native ORC Vectorized (Pushdown)               364 /  367         43.2          23.1      23.5X
    +
    +
    +    Select 1 string row
    +    ('7864320' <= value <= '7864320'):       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8594 / 8607          1.8         546.4       1.0X
    +    Parquet Vectorized (Pushdown)                  370 /  374         42.5          23.5      23.2X
    +    Native ORC Vectorized                         8350 / 8358          1.9         530.9       1.0X
    +    Native ORC Vectorized (Pushdown)               371 /  374         42.4          23.6      23.2X
    +
    +
    +    Select all string rows
    +    (value IS NOT NULL):                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          19601 / 19625          0.8        1246.2       1.0X
    +    Parquet Vectorized (Pushdown)               19698 / 19703          0.8        1252.3       1.0X
    +    Native ORC Vectorized                       19435 / 19470          0.8        1235.6       1.0X
    +    Native ORC Vectorized (Pushdown)            19568 / 19590          0.8        1244.1       1.0X
    +
    +
    +    Select 0 int row (value IS NULL):        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7815 / 7824          2.0         496.9       1.0X
    +    Parquet Vectorized (Pushdown)                  245 /  251         64.2          15.6      31.9X
    +    Native ORC Vectorized                         7436 / 7460          2.1         472.8       1.1X
    +    Native ORC Vectorized (Pushdown)               344 /  351         45.7          21.9      22.7X
    +
    +
    +    Select 0 int row
    +    (7864320 < value < 7864320):             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7792 / 7807          2.0         495.4       1.0X
    +    Parquet Vectorized (Pushdown)                  349 /  353         45.1          22.2      22.3X
    +    Native ORC Vectorized                         7451 / 7465          2.1         473.7       1.0X
    +    Native ORC Vectorized (Pushdown)               365 /  368         43.0          23.2      21.3X
    +
    +
    +    Select 1 int row (value = 7864320):      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7836 / 7872          2.0         498.2       1.0X
    +    Parquet Vectorized (Pushdown)                  322 /  327         48.8          20.5      24.3X
    +    Native ORC Vectorized                         7533 / 7540          2.1         478.9       1.0X
    +    Native ORC Vectorized (Pushdown)               358 /  363         43.9          22.8      21.9X
    +
    +
    +    Select 1 int row (value <=> 7864320):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7855 / 7870          2.0         499.4       1.0X
    +    Parquet Vectorized (Pushdown)                  286 /  297         54.9          18.2      27.4X
    +    Native ORC Vectorized                         7511 / 7557          2.1         477.5       1.0X
    +    Native ORC Vectorized (Pushdown)               358 /  361         43.9          22.8      21.9X
    +
    +
    +    Select 1 int row
    +    (7864320 <= value <= 7864320):           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7851 / 7870          2.0         499.2       1.0X
    +    Parquet Vectorized (Pushdown)                  345 /  347         45.6          21.9      22.8X
    +    Native ORC Vectorized                         7543 / 7554          2.1         479.6       1.0X
    +    Native ORC Vectorized (Pushdown)               364 /  374         43.2          23.1      21.6X
    +
    +
    +    Select 1 int row
    +    (7864319 < value < 7864321):             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7837 / 7840          2.0         498.2       1.0X
    +    Parquet Vectorized (Pushdown)                  338 /  339         46.6          21.5      23.2X
    +    Native ORC Vectorized                         7524 / 7541          2.1         478.3       1.0X
    +    Native ORC Vectorized (Pushdown)               361 /  364         43.6          22.9      21.7X
    +
    +
    +    Select 10% int rows (value < 1572864):   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8864 / 8900          1.8         563.5       1.0X
    +    Parquet Vectorized (Pushdown)                 2088 / 2095          7.5         132.7       4.2X
    +    Native ORC Vectorized                         8562 / 8579          1.8         544.3       1.0X
    +    Native ORC Vectorized (Pushdown)              2127 / 2131          7.4         135.2       4.2X
    +
    +
    +    Select 50% int rows (value < 7864320):   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          12671 / 12684          1.2         805.6       1.0X
    +    Parquet Vectorized (Pushdown)                 9032 / 9041          1.7         574.2       1.4X
    +    Native ORC Vectorized                       12388 / 12411          1.3         787.6       1.0X
    +    Native ORC Vectorized (Pushdown)              8873 / 8884          1.8         564.1       1.4X
    +
    +
    +    Select 90% int rows (value < 14155776):  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          16481 / 16495          1.0        1047.8       1.0X
    +    Parquet Vectorized (Pushdown)               15906 / 15919          1.0        1011.3       1.0X
    +    Native ORC Vectorized                       16224 / 16254          1.0        1031.5       1.0X
    +    Native ORC Vectorized (Pushdown)            15632 / 15661          1.0         993.9       1.1X
    +
    +
    +    Select all int rows (value IS NOT NULL): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          17341 / 17354          0.9        1102.5       1.0X
    +    Parquet Vectorized (Pushdown)               17463 / 17481          0.9        1110.2       1.0X
    +    Native ORC Vectorized                       17073 / 17089          0.9        1085.4       1.0X
    +    Native ORC Vectorized (Pushdown)            17194 / 17232          0.9        1093.2       1.0X
    +
    +
    +    Select all int rows (value > -1):        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          17452 / 17467          0.9        1109.6       1.0X
    +    Parquet Vectorized (Pushdown)               17613 / 17630          0.9        1119.8       1.0X
    +    Native ORC Vectorized                       17259 / 17271          0.9        1097.3       1.0X
    +    Native ORC Vectorized (Pushdown)            17385 / 17429          0.9        1105.3       1.0X
    +
    +
    +    Select all int rows (value != -1):       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          17363 / 17372          0.9        1103.9       1.0X
    +    Parquet Vectorized (Pushdown)               17526 / 17535          0.9        1114.2       1.0X
    +    Native ORC Vectorized                       17052 / 17089          0.9        1084.2       1.0X
    +    Native ORC Vectorized (Pushdown)            17209 / 17229          0.9        1094.1       1.0X
    +
    +
    +    Select 0 distinct string row
    +    (value IS NULL):                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7697 / 7751          2.0         489.4       1.0X
    +    Parquet Vectorized (Pushdown)                  264 /  284         59.5          16.8      29.1X
    +    Native ORC Vectorized                         6942 / 6970          2.3         441.4       1.1X
    +    Native ORC Vectorized (Pushdown)               372 /  381         42.3          23.7      20.7X
    +
    +
    +    Select 0 distinct string row
    +    ('100' < value < '100'):                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7983 / 8018          2.0         507.5       1.0X
    +    Parquet Vectorized (Pushdown)                  334 /  337         47.0          21.3      23.9X
    +    Native ORC Vectorized                         7307 / 7313          2.2         464.5       1.1X
    +    Native ORC Vectorized (Pushdown)               363 /  371         43.3          23.1      22.0X
    +
    +
    +    Select 1 distinct string row
    +    (value = '100'):                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7882 / 7915          2.0         501.1       1.0X
    +    Parquet Vectorized (Pushdown)                  504 /  522         31.2          32.1      15.6X
    +    Native ORC Vectorized                         7143 / 7155          2.2         454.1       1.1X
    +    Native ORC Vectorized (Pushdown)               555 /  573         28.4          35.3      14.2X
    +
    +
    +    Select 1 distinct string row
    +    (value <=> '100'):                       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7898 / 7912          2.0         502.1       1.0X
    +    Parquet Vectorized (Pushdown)                  470 /  481         33.5          29.9      16.8X
    +    Native ORC Vectorized                         7135 / 7149          2.2         453.6       1.1X
    +    Native ORC Vectorized (Pushdown)               552 /  557         28.5          35.1      14.3X
    +
    +
    +    Select 1 distinct string row
    +    ('100' <= value <= '100'):               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8189 / 8213          1.9         520.7       1.0X
    +    Parquet Vectorized (Pushdown)                  527 /  534         29.9          33.5      15.5X
    +    Native ORC Vectorized                         7477 / 7498          2.1         475.3       1.1X
    +    Native ORC Vectorized (Pushdown)               558 /  566         28.2          35.5      14.7X
    +
    +
    +    Select all distinct string rows
    +    (value IS NOT NULL):                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          19462 / 19476          0.8        1237.4       1.0X
    +    Parquet Vectorized (Pushdown)               19570 / 19582          0.8        1244.2       1.0X
    +    Native ORC Vectorized                       18577 / 18604          0.8        1181.1       1.0X
    +    Native ORC Vectorized (Pushdown)            18701 / 18742          0.8        1189.0       1.0X
    +    */
    +    benchmark.run()
    +  }
    +
    +  private def runIntBenchmark(numRows: Int, width: Int, mid: Int): Unit = {
    +    Seq("value IS NULL", s"$mid < value AND value < $mid").foreach { whereExpr =>
    +      val title = s"Select 0 int row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    Seq(
    +      s"value = $mid",
    +      s"value <=> $mid",
    +      s"$mid <= value AND value <= $mid",
    +      s"${mid - 1} < value AND value < ${mid + 1}"
    +    ).foreach { whereExpr =>
    +      val title = s"Select 1 int row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    val selectExpr = (1 to width).map(i => s"MAX(c$i)").mkString("", ",", ", MAX(value)")
    +
    +    Seq(10, 50, 90).foreach { percent =>
    +      filterPushDownBenchmark(
    +        numRows,
    +        s"Select $percent% int rows (value < ${numRows * percent / 100})",
    +        s"value < ${numRows * percent / 100}",
    +        selectExpr
    +      )
    +    }
    +
    +    Seq("value IS NOT NULL", "value > -1", "value != -1").foreach { whereExpr =>
    +      filterPushDownBenchmark(
    +        numRows,
    +        s"Select all int rows ($whereExpr)",
    +        whereExpr,
    +        selectExpr)
    +    }
    +  }
    +
    +  private def runStringBenchmark(
    +      numRows: Int, width: Int, searchValue: Int, colType: String): Unit = {
    +    Seq("value IS NULL", s"'$searchValue' < value AND value < '$searchValue'")
    +        .foreach { whereExpr =>
    +      val title = s"Select 0 $colType row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    Seq(
    +      s"value = '$searchValue'",
    +      s"value <=> '$searchValue'",
    +      s"'$searchValue' <= value AND value <= '$searchValue'"
    +    ).foreach { whereExpr =>
    +      val title = s"Select 1 $colType row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    val selectExpr = (1 to width).map(i => s"MAX(c$i)").mkString("", ",", ", MAX(value)")
    +
    +    Seq("value IS NOT NULL").foreach { whereExpr =>
    +      filterPushDownBenchmark(
    +        numRows,
    +        s"Select all $colType rows ($whereExpr)",
    +        whereExpr,
    +        selectExpr)
    +    }
    +  }
    +
    +  def main(args: Array[String]): Unit = {
    +    val numRows = 1024 * 1024 * 15
    +    val width = 5
    +
    +    // Pushdown for many distinct value case
    +    withTempPath { dir =>
    +      val mid = numRows / 2
    +
    +      withTempTable("orcTable", "parquetTable") {
    +        Seq(true, false).foreach { useStringForValue =>
    +          prepareTable(dir, numRows, width, useStringForValue)
    +          if (useStringForValue) {
    +            runStringBenchmark(numRows, width, mid, "string")
    +          } else {
    +            runIntBenchmark(numRows, width, mid)
    +          }
    +        }
    +      }
    +    }
    +
    +    // Pushdown for few distinct value case (use dictionary encoding)
    --- End diff --
    
    The current data fits within the threshold. My concern is that the comment becomes invalid if the underlying files do not actually use dictionary encoding. Even if we do not change the format, we still need to update the comment.
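
The concern above can be checked roughly before writing any files. The sketch below is an illustration rather than part of the patch. It assumes the threshold in question is Parquet's dictionary page size limit (`parquet.dictionary.page.size`, 1 MB by default) and ORC's `orc.dictionary.key.threshold` (0.8 by default); both defaults should be verified against the versions in use. The object and method names are hypothetical, and the value of 200 distinct values in the usage note is made up for the example. The dictionary entries mirror the `id % numDistinctValues` strings generated by `prepareStringDictTable`.

```scala
object DictionaryFitSketch {
  // Assumed defaults; check them against the Parquet and ORC versions in use.
  val ParquetDictPageSizeBytes: Long = 1024L * 1024L // parquet.dictionary.page.size
  val OrcDictKeyThreshold: Double = 0.8              // orc.dictionary.key.threshold

  /**
   * Rough size of a plain-encoded string dictionary holding the values
   * "0", "1", ..., (numDistinctValues - 1): a 4-byte length prefix plus
   * the UTF-8 bytes of each entry. This is an estimate, not the writer's
   * exact accounting.
   */
  def estimatedDictBytes(numDistinctValues: Int): Long =
    (0 until numDistinctValues)
      .map(v => 4L + v.toString.getBytes("UTF-8").length)
      .sum

  /** True if the estimated dictionary stays within the Parquet page size limit. */
  def fitsParquet(numDistinctValues: Int): Boolean =
    estimatedDictBytes(numDistinctValues) <= ParquetDictPageSizeBytes

  /** True if the distinct-to-total ratio stays within the ORC key threshold. */
  def fitsOrc(numRows: Long, numDistinctValues: Int): Boolean =
    numDistinctValues.toDouble / numRows <= OrcDictKeyThreshold
}
```

For example, `DictionaryFitSketch.fitsParquet(200)` and `DictionaryFitSketch.fitsOrc(1024L * 1024L * 15L, 200)` both hold under these assumptions. The estimate only suggests that dictionary encoding is plausible; confirming it requires inspecting the column encodings recorded in the written files.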


---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r189018667
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FilterPushdownBenchmark.scala ---
    @@ -105,138 +128,306 @@ object FilterPushdownBenchmark {
         }
     
         /*
    -    Java HotSpot(TM) 64-Bit Server VM 1.8.0_152-b16 on Mac OS X 10.13.2
    -    Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
    -
    -    Select 0 row (id IS NULL):              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7882 / 7957          2.0         501.1       1.0X
    -    Parquet Vectorized (Pushdown)                   55 /   60        285.2           3.5     142.9X
    -    Native ORC Vectorized                         5592 / 5627          2.8         355.5       1.4X
    -    Native ORC Vectorized (Pushdown)                66 /   70        237.2           4.2     118.9X
    -
    -    Select 0 row (7864320 < id < 7864320):  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7884 / 7909          2.0         501.2       1.0X
    -    Parquet Vectorized (Pushdown)                  739 /  752         21.3          47.0      10.7X
    -    Native ORC Vectorized                         5614 / 5646          2.8         356.9       1.4X
    -    Native ORC Vectorized (Pushdown)                81 /   83        195.2           5.1      97.8X
    -
    -    Select 1 row (id = 7864320):            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7905 / 8027          2.0         502.6       1.0X
    -    Parquet Vectorized (Pushdown)                  740 /  766         21.2          47.1      10.7X
    -    Native ORC Vectorized                         5684 / 5738          2.8         361.4       1.4X
    -    Native ORC Vectorized (Pushdown)                78 /   81        202.4           4.9     101.7X
    -
    -    Select 1 row (id <=> 7864320):          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7928 / 7993          2.0         504.1       1.0X
    -    Parquet Vectorized (Pushdown)                  747 /  772         21.0          47.5      10.6X
    -    Native ORC Vectorized                         5728 / 5753          2.7         364.2       1.4X
    -    Native ORC Vectorized (Pushdown)                76 /   78        207.9           4.8     104.8X
    -
    -    Select 1 row (7864320 <= id <= 7864320):Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7939 / 8021          2.0         504.8       1.0X
    -    Parquet Vectorized (Pushdown)                  746 /  770         21.1          47.4      10.6X
    -    Native ORC Vectorized                         5690 / 5734          2.8         361.7       1.4X
    -    Native ORC Vectorized (Pushdown)                76 /   79        206.7           4.8     104.3X
    -
    -    Select 1 row (7864319 < id < 7864321):  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7972 / 8019          2.0         506.9       1.0X
    -    Parquet Vectorized (Pushdown)                  742 /  764         21.2          47.2      10.7X
    -    Native ORC Vectorized                         5704 / 5743          2.8         362.6       1.4X
    -    Native ORC Vectorized (Pushdown)                76 /   78        207.9           4.8     105.4X
    -
    -    Select 10% rows (id < 1572864):         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            8733 / 8808          1.8         555.2       1.0X
    -    Parquet Vectorized (Pushdown)                 2213 / 2267          7.1         140.7       3.9X
    -    Native ORC Vectorized                         6420 / 6463          2.4         408.2       1.4X
    -    Native ORC Vectorized (Pushdown)              1313 / 1331         12.0          83.5       6.7X
    -
    -    Select 50% rows (id < 7864320):         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                          11518 / 11591          1.4         732.3       1.0X
    -    Parquet Vectorized (Pushdown)                 7962 / 7991          2.0         506.2       1.4X
    -    Native ORC Vectorized                         8927 / 8985          1.8         567.6       1.3X
    -    Native ORC Vectorized (Pushdown)              6102 / 6160          2.6         387.9       1.9X
    -
    -    Select 90% rows (id < 14155776):        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                          14255 / 14389          1.1         906.3       1.0X
    -    Parquet Vectorized (Pushdown)               13564 / 13594          1.2         862.4       1.1X
    -    Native ORC Vectorized                       11442 / 11608          1.4         727.5       1.2X
    -    Native ORC Vectorized (Pushdown)            10991 / 11029          1.4         698.8       1.3X
    -
    -    Select all rows (id IS NOT NULL):       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                          14917 / 14938          1.1         948.4       1.0X
    -    Parquet Vectorized (Pushdown)               14910 / 14964          1.1         948.0       1.0X
    -    Native ORC Vectorized                       11986 / 12069          1.3         762.0       1.2X
    -    Native ORC Vectorized (Pushdown)            12037 / 12123          1.3         765.3       1.2X
    -
    -    Select all rows (id > -1):              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                          14951 / 14976          1.1         950.6       1.0X
    -    Parquet Vectorized (Pushdown)               14934 / 15016          1.1         949.5       1.0X
    -    Native ORC Vectorized                       12000 / 12156          1.3         763.0       1.2X
    -    Native ORC Vectorized (Pushdown)            12079 / 12113          1.3         767.9       1.2X
    -
    -    Select all rows (id != -1):             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                          14930 / 14972          1.1         949.3       1.0X
    -    Parquet Vectorized (Pushdown)               15015 / 15047          1.0         954.6       1.0X
    -    Native ORC Vectorized                       12090 / 12259          1.3         768.7       1.2X
    -    Native ORC Vectorized (Pushdown)            12021 / 12096          1.3         764.2       1.2X
    +    Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
    +    Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8452 / 8504          1.9         537.3       1.0X
    +    Parquet Vectorized (Pushdown)                  274 /  281         57.3          17.4      30.8X
    --- End diff --
    
    Hi, @maropu.
    Thank you for updating the results with the new Parquet 1.10.
    Could you elaborate in the PR description on your EC2 environment and the steps you ran?
    I'm trying to reproduce this, but on my Mac the results don't show the same pattern.
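
When comparing runs across machines, two things help: recording the JVM and OS alongside the numbers, and comparing the derived columns rather than raw times. The sketch below is an illustration with hypothetical names (`BenchmarkCompare` and its methods are not part of the patch). It assumes, consistent with the figures quoted above, that `Relative` is the best time of the first case divided by the best time of the current case, that `Rate(M/s)` is rows divided by best time, and that `Per Row(ns)` is best time divided by rows.

```scala
object BenchmarkCompare {
  /** One-line description of the JVM and OS, similar to the header of the old results. */
  def jvmAndOs: String = {
    val vmName = System.getProperty("java.vm.name")
    val runtimeVersion = System.getProperty("java.runtime.version")
    val osName = System.getProperty("os.name")
    val osVersion = System.getProperty("os.version")
    s"$vmName $runtimeVersion on $osName $osVersion"
  }

  /** Speedup of a case relative to the first (baseline) case, from best times in ms. */
  def relative(baselineBestMs: Double, bestMs: Double): Double =
    baselineBestMs / bestMs

  /** Throughput in millions of rows per second. */
  def rateMPerSec(bestMs: Double, numRows: Long): Double =
    numRows / (bestMs / 1000.0) / 1e6

  /** Average time per row in nanoseconds. */
  def perRowNs(bestMs: Double, numRows: Long): Double =
    bestMs * 1e6 / numRows

  def main(args: Array[String]): Unit = {
    val numRows = 1024L * 1024L * 15L // same row count as the benchmark's main()
    println(jvmAndOs)
    // Best times taken from the "Select 0 string row (value IS NULL)" table.
    println(f"relative: ${relative(8452, 274)}%.1fX")
    println(f"rate:     ${rateMPerSec(8452, numRows)}%.1f M/s")
    println(f"per row:  ${perRowNs(8452, numRows)}%.1f ns")
  }
}
```

Because the reported best times are rounded to whole milliseconds, values recomputed this way can differ from the table in the last digit.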


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #90571 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90571/testReport)** for PR 21288 at commit [`4520044`](https://github.com/apache/spark/commit/4520044d3be40ba8bf963a151db2dd9769c0f59a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r191109472
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
    @@ -0,0 +1,437 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import java.io.File
    +
    +import scala.util.{Random, Try}
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.sql.{DataFrame, SparkSession}
    +import org.apache.spark.sql.functions.monotonically_increasing_id
    +import org.apache.spark.sql.internal.SQLConf
    +import org.apache.spark.util.{Benchmark, Utils}
    +
    +
    +/**
    + * Benchmark to measure read performance with Filter pushdown.
    + * To run this:
    + *  spark-submit --class <this class> <spark sql test jar>
    + */
    +object FilterPushdownBenchmark {
    +  val conf = new SparkConf()
    +    .setAppName("FilterPushdownBenchmark")
    +    .setIfMissing("spark.master", "local[1]")
    +    .setIfMissing("spark.driver.memory", "3g")
    +    .setIfMissing("spark.executor.memory", "3g")
    +    .setIfMissing("orc.compression", "snappy")
    +    .setIfMissing("spark.sql.parquet.compression.codec", "snappy")
    +
    +  private val spark = SparkSession.builder().config(conf).getOrCreate()
    +
    +  def withTempPath(f: File => Unit): Unit = {
    +    val path = Utils.createTempDir()
    +    path.delete()
    +    try f(path) finally Utils.deleteRecursively(path)
    +  }
    +
    +  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
    +    try f finally tableNames.foreach(spark.catalog.dropTempView)
    +  }
    +
    +  def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
    +    val (keys, values) = pairs.unzip
    +    val currentValues = keys.map(key => Try(spark.conf.get(key)).toOption)
    +    (keys, values).zipped.foreach(spark.conf.set)
    +    try f finally {
    +      keys.zip(currentValues).foreach {
    +        case (key, Some(value)) => spark.conf.set(key, value)
    +        case (key, None) => spark.conf.unset(key)
    +      }
    +    }
    +  }
    +
    +  private def prepareTable(
    +      dir: File, numRows: Int, width: Int, useStringForValue: Boolean): Unit = {
    +    import spark.implicits._
    +    val selectExpr = (1 to width).map(i => s"CAST(value AS STRING) c$i")
    +    val valueCol = if (useStringForValue) {
    +      monotonically_increasing_id().cast("string")
    +    } else {
    +      monotonically_increasing_id()
    +    }
    +    val df = spark.range(numRows).map(_ => Random.nextLong).selectExpr(selectExpr: _*)
    +      .withColumn("value", valueCol)
    +      .sort("value")
    +
    +    saveAsOrcTable(df, dir.getCanonicalPath + "/orc")
    +    saveAsParquetTable(df, dir.getCanonicalPath + "/parquet")
    +  }
    +
    +  private def prepareStringDictTable(
    +      dir: File, numRows: Int, numDistinctValues: Int, width: Int): Unit = {
    +    val selectExpr = (0 to width).map {
    +      case 0 => s"CAST(id % $numDistinctValues AS STRING) AS value"
    +      case i => s"CAST(rand() AS STRING) c$i"
    +    }
    +    val df = spark.range(numRows).selectExpr(selectExpr: _*).sort("value")
    +
    +    saveAsOrcTable(df, dir.getCanonicalPath + "/orc")
    +    saveAsParquetTable(df, dir.getCanonicalPath + "/parquet")
    +  }
    +
    +  private def saveAsOrcTable(df: DataFrame, dir: String): Unit = {
    +    df.write.mode("overwrite").orc(dir)
    +    spark.read.orc(dir).createOrReplaceTempView("orcTable")
    +  }
    +
    +  private def saveAsParquetTable(df: DataFrame, dir: String): Unit = {
    +    df.write.mode("overwrite").parquet(dir)
    +    spark.read.parquet(dir).createOrReplaceTempView("parquetTable")
    +  }
    +
    +  def filterPushDownBenchmark(
    +      values: Int,
    +      title: String,
    +      whereExpr: String,
    +      selectExpr: String = "*"): Unit = {
    +    val benchmark = new Benchmark(title, values, minNumIters = 5)
    +
    +    Seq(false, true).foreach { pushDownEnabled =>
    +      val name = s"Parquet Vectorized ${if (pushDownEnabled) s"(Pushdown)" else ""}"
    +      benchmark.addCase(name) { _ =>
    +        withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> s"$pushDownEnabled") {
    +          spark.sql(s"SELECT $selectExpr FROM parquetTable WHERE $whereExpr").collect()
    +        }
    +      }
    +    }
    +
    +    Seq(false, true).foreach { pushDownEnabled =>
    +      val name = s"Native ORC Vectorized ${if (pushDownEnabled) s"(Pushdown)" else ""}"
    +      benchmark.addCase(name) { _ =>
    +        withSQLConf(SQLConf.ORC_FILTER_PUSHDOWN_ENABLED.key -> s"$pushDownEnabled") {
    +          spark.sql(s"SELECT $selectExpr FROM orcTable WHERE $whereExpr").collect()
    +        }
    +      }
    +    }
    +
    +    /*
    +    Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
    +    Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8452 / 8504          1.9         537.3       1.0X
    +    Parquet Vectorized (Pushdown)                  274 /  281         57.3          17.4      30.8X
    +    Native ORC Vectorized                         8167 / 8185          1.9         519.3       1.0X
    +    Native ORC Vectorized (Pushdown)               365 /  379         43.1          23.2      23.1X
    +
    +
    +    Select 0 string row
    +    ('7864320' < value < '7864320'):         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8532 / 8564          1.8         542.4       1.0X
    +    Parquet Vectorized (Pushdown)                  366 /  386         43.0          23.3      23.3X
    +    Native ORC Vectorized                         8289 / 8300          1.9         527.0       1.0X
    +    Native ORC Vectorized (Pushdown)               378 /  385         41.6          24.0      22.6X
    +
    +
    +    Select 1 string row (value = '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8547 / 8564          1.8         543.4       1.0X
    +    Parquet Vectorized (Pushdown)                  351 /  356         44.9          22.3      24.4X
    +    Native ORC Vectorized                         8310 / 8323          1.9         528.3       1.0X
    +    Native ORC Vectorized (Pushdown)               370 /  375         42.5          23.5      23.1X
    +
    +
    +    Select 1 string row
    +    (value <=> '7864320'):                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8537 / 8563          1.8         542.8       1.0X
    +    Parquet Vectorized (Pushdown)                  310 /  319         50.7          19.7      27.5X
    +    Native ORC Vectorized                         8316 / 8335          1.9         528.7       1.0X
    +    Native ORC Vectorized (Pushdown)               364 /  367         43.2          23.1      23.5X
    +
    +
    +    Select 1 string row
    +    ('7864320' <= value <= '7864320'):       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8594 / 8607          1.8         546.4       1.0X
    +    Parquet Vectorized (Pushdown)                  370 /  374         42.5          23.5      23.2X
    +    Native ORC Vectorized                         8350 / 8358          1.9         530.9       1.0X
    +    Native ORC Vectorized (Pushdown)               371 /  374         42.4          23.6      23.2X
    +
    +
    +    Select all string rows
    +    (value IS NOT NULL):                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          19601 / 19625          0.8        1246.2       1.0X
    +    Parquet Vectorized (Pushdown)               19698 / 19703          0.8        1252.3       1.0X
    +    Native ORC Vectorized                       19435 / 19470          0.8        1235.6       1.0X
    +    Native ORC Vectorized (Pushdown)            19568 / 19590          0.8        1244.1       1.0X
    +
    +
    +    Select 0 int row (value IS NULL):        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7815 / 7824          2.0         496.9       1.0X
    +    Parquet Vectorized (Pushdown)                  245 /  251         64.2          15.6      31.9X
    +    Native ORC Vectorized                         7436 / 7460          2.1         472.8       1.1X
    +    Native ORC Vectorized (Pushdown)               344 /  351         45.7          21.9      22.7X
    +
    +
    +    Select 0 int row
    +    (7864320 < value < 7864320):             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7792 / 7807          2.0         495.4       1.0X
    +    Parquet Vectorized (Pushdown)                  349 /  353         45.1          22.2      22.3X
    +    Native ORC Vectorized                         7451 / 7465          2.1         473.7       1.0X
    +    Native ORC Vectorized (Pushdown)               365 /  368         43.0          23.2      21.3X
    +
    +
    +    Select 1 int row (value = 7864320):      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7836 / 7872          2.0         498.2       1.0X
    +    Parquet Vectorized (Pushdown)                  322 /  327         48.8          20.5      24.3X
    +    Native ORC Vectorized                         7533 / 7540          2.1         478.9       1.0X
    +    Native ORC Vectorized (Pushdown)               358 /  363         43.9          22.8      21.9X
    +
    +
    +    Select 1 int row (value <=> 7864320):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7855 / 7870          2.0         499.4       1.0X
    +    Parquet Vectorized (Pushdown)                  286 /  297         54.9          18.2      27.4X
    +    Native ORC Vectorized                         7511 / 7557          2.1         477.5       1.0X
    +    Native ORC Vectorized (Pushdown)               358 /  361         43.9          22.8      21.9X
    +
    +
    +    Select 1 int row
    +    (7864320 <= value <= 7864320):           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7851 / 7870          2.0         499.2       1.0X
    +    Parquet Vectorized (Pushdown)                  345 /  347         45.6          21.9      22.8X
    +    Native ORC Vectorized                         7543 / 7554          2.1         479.6       1.0X
    +    Native ORC Vectorized (Pushdown)               364 /  374         43.2          23.1      21.6X
    +
    +
    +    Select 1 int row
    +    (7864319 < value < 7864321):             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7837 / 7840          2.0         498.2       1.0X
    +    Parquet Vectorized (Pushdown)                  338 /  339         46.6          21.5      23.2X
    +    Native ORC Vectorized                         7524 / 7541          2.1         478.3       1.0X
    +    Native ORC Vectorized (Pushdown)               361 /  364         43.6          22.9      21.7X
    +
    +
    +    Select 10% int rows (value < 1572864):   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8864 / 8900          1.8         563.5       1.0X
    +    Parquet Vectorized (Pushdown)                 2088 / 2095          7.5         132.7       4.2X
    +    Native ORC Vectorized                         8562 / 8579          1.8         544.3       1.0X
    +    Native ORC Vectorized (Pushdown)              2127 / 2131          7.4         135.2       4.2X
    +
    +
    +    Select 50% int rows (value < 7864320):   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          12671 / 12684          1.2         805.6       1.0X
    +    Parquet Vectorized (Pushdown)                 9032 / 9041          1.7         574.2       1.4X
    +    Native ORC Vectorized                       12388 / 12411          1.3         787.6       1.0X
    +    Native ORC Vectorized (Pushdown)              8873 / 8884          1.8         564.1       1.4X
    +
    +
    +    Select 90% int rows (value < 14155776):  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          16481 / 16495          1.0        1047.8       1.0X
    +    Parquet Vectorized (Pushdown)               15906 / 15919          1.0        1011.3       1.0X
    +    Native ORC Vectorized                       16224 / 16254          1.0        1031.5       1.0X
    +    Native ORC Vectorized (Pushdown)            15632 / 15661          1.0         993.9       1.1X
    +
    +
    +    Select all int rows (value IS NOT NULL): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          17341 / 17354          0.9        1102.5       1.0X
    +    Parquet Vectorized (Pushdown)               17463 / 17481          0.9        1110.2       1.0X
    +    Native ORC Vectorized                       17073 / 17089          0.9        1085.4       1.0X
    +    Native ORC Vectorized (Pushdown)            17194 / 17232          0.9        1093.2       1.0X
    +
    +
    +    Select all int rows (value > -1):        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          17452 / 17467          0.9        1109.6       1.0X
    +    Parquet Vectorized (Pushdown)               17613 / 17630          0.9        1119.8       1.0X
    +    Native ORC Vectorized                       17259 / 17271          0.9        1097.3       1.0X
    +    Native ORC Vectorized (Pushdown)            17385 / 17429          0.9        1105.3       1.0X
    +
    +
    +    Select all int rows (value != -1):       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          17363 / 17372          0.9        1103.9       1.0X
    +    Parquet Vectorized (Pushdown)               17526 / 17535          0.9        1114.2       1.0X
    +    Native ORC Vectorized                       17052 / 17089          0.9        1084.2       1.0X
    +    Native ORC Vectorized (Pushdown)            17209 / 17229          0.9        1094.1       1.0X
    +
    +
    +    Select 0 distinct string row
    +    (value IS NULL):                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7697 / 7751          2.0         489.4       1.0X
    +    Parquet Vectorized (Pushdown)                  264 /  284         59.5          16.8      29.1X
    +    Native ORC Vectorized                         6942 / 6970          2.3         441.4       1.1X
    +    Native ORC Vectorized (Pushdown)               372 /  381         42.3          23.7      20.7X
    +
    +
    +    Select 0 distinct string row
    +    ('100' < value < '100'):                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7983 / 8018          2.0         507.5       1.0X
    +    Parquet Vectorized (Pushdown)                  334 /  337         47.0          21.3      23.9X
    +    Native ORC Vectorized                         7307 / 7313          2.2         464.5       1.1X
    +    Native ORC Vectorized (Pushdown)               363 /  371         43.3          23.1      22.0X
    +
    +
    +    Select 1 distinct string row
    +    (value = '100'):                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7882 / 7915          2.0         501.1       1.0X
    +    Parquet Vectorized (Pushdown)                  504 /  522         31.2          32.1      15.6X
    +    Native ORC Vectorized                         7143 / 7155          2.2         454.1       1.1X
    +    Native ORC Vectorized (Pushdown)               555 /  573         28.4          35.3      14.2X
    +
    +
    +    Select 1 distinct string row
    +    (value <=> '100'):                       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            7898 / 7912          2.0         502.1       1.0X
    +    Parquet Vectorized (Pushdown)                  470 /  481         33.5          29.9      16.8X
    +    Native ORC Vectorized                         7135 / 7149          2.2         453.6       1.1X
    +    Native ORC Vectorized (Pushdown)               552 /  557         28.5          35.1      14.3X
    +
    +
    +    Select 1 distinct string row
    +    ('100' <= value <= '100'):               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                            8189 / 8213          1.9         520.7       1.0X
    +    Parquet Vectorized (Pushdown)                  527 /  534         29.9          33.5      15.5X
    +    Native ORC Vectorized                         7477 / 7498          2.1         475.3       1.1X
    +    Native ORC Vectorized (Pushdown)               558 /  566         28.2          35.5      14.7X
    +
    +
    +    Select all distinct string rows
    +    (value IS NOT NULL):                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    Parquet Vectorized                          19462 / 19476          0.8        1237.4       1.0X
    +    Parquet Vectorized (Pushdown)               19570 / 19582          0.8        1244.2       1.0X
    +    Native ORC Vectorized                       18577 / 18604          0.8        1181.1       1.0X
    +    Native ORC Vectorized (Pushdown)            18701 / 18742          0.8        1189.0       1.0X
    +    */
    +    benchmark.run()
    +  }
    +
    +  private def runIntBenchmark(numRows: Int, width: Int, mid: Int): Unit = {
    +    Seq("value IS NULL", s"$mid < value AND value < $mid").foreach { whereExpr =>
    +      val title = s"Select 0 int row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    Seq(
    +      s"value = $mid",
    +      s"value <=> $mid",
    +      s"$mid <= value AND value <= $mid",
    +      s"${mid - 1} < value AND value < ${mid + 1}"
    +    ).foreach { whereExpr =>
    +      val title = s"Select 1 int row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    val selectExpr = (1 to width).map(i => s"MAX(c$i)").mkString("", ",", ", MAX(value)")
    +
    +    Seq(10, 50, 90).foreach { percent =>
    +      filterPushDownBenchmark(
    +        numRows,
    +        s"Select $percent% int rows (value < ${numRows * percent / 100})",
    +        s"value < ${numRows * percent / 100}",
    +        selectExpr
    +      )
    +    }
    +
    +    Seq("value IS NOT NULL", "value > -1", "value != -1").foreach { whereExpr =>
    +      filterPushDownBenchmark(
    +        numRows,
    +        s"Select all int rows ($whereExpr)",
    +        whereExpr,
    +        selectExpr)
    +    }
    +  }
    +
    +  private def runStringBenchmark(
    +      numRows: Int, width: Int, searchValue: Int, colType: String): Unit = {
    +    Seq("value IS NULL", s"'$searchValue' < value AND value < '$searchValue'")
    +        .foreach { whereExpr =>
    +      val title = s"Select 0 $colType row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    Seq(
    +      s"value = '$searchValue'",
    +      s"value <=> '$searchValue'",
    +      s"'$searchValue' <= value AND value <= '$searchValue'"
    +    ).foreach { whereExpr =>
    +      val title = s"Select 1 $colType row ($whereExpr)".replace("value AND value", "value")
    +      filterPushDownBenchmark(numRows, title, whereExpr)
    +    }
    +
    +    val selectExpr = (1 to width).map(i => s"MAX(c$i)").mkString("", ",", ", MAX(value)")
    +
    +    Seq("value IS NOT NULL").foreach { whereExpr =>
    +      filterPushDownBenchmark(
    +        numRows,
    +        s"Select all $colType rows ($whereExpr)",
    +        whereExpr,
    +        selectExpr)
    +    }
    +  }
    +
    +  def main(args: Array[String]): Unit = {
    +    val numRows = 1024 * 1024 * 15
    +    val width = 5
    +
    +    // Pushdown for many distinct value case
    +    withTempPath { dir =>
    +      val mid = numRows / 2
    +
    +      withTempTable("orcTable", "parquetTable") {
    +        Seq(true, false).foreach { useStringForValue =>
    +          prepareTable(dir, numRows, width, useStringForValue)
    +          if (useStringForValue) {
    +            runStringBenchmark(numRows, width, mid, "string")
    +          } else {
    +            runIntBenchmark(numRows, width, mid)
    +          }
    +        }
    +      }
    +    }
    +
    +    // Pushdown for few distinct value case (use dictionary encoding)
    --- End diff --
    
    ok
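
The `withSQLConf` helper in the quoted diff saves the current value of each key, applies the new one, and restores the earlier state in a `finally` block. A minimal, Spark-free sketch of that pattern follows; `ConfScope`, `withConf`, and the plain mutable map standing in for `spark.conf` are hypothetical names used only for illustration.

```scala
import scala.collection.mutable

object ConfScope {
  // Hypothetical stand-in for spark.conf: a plain mutable map.
  val conf: mutable.Map[String, String] = mutable.Map.empty

  /** Apply `pairs` while `body` runs, then restore whatever was set before. */
  def withConf[A](pairs: (String, String)*)(body: => A): A = {
    // Remember the prior value of every key (None if the key was unset).
    val previous = pairs.map { case (key, _) => key -> conf.get(key) }
    pairs.foreach { case (key, value) => conf(key) = value }
    try body finally previous.foreach {
      case (key, Some(old)) => conf(key) = old    // put the old value back
      case (key, None)      => conf.remove(key)   // key did not exist before
    }
  }
}
```

Because the restore step runs in `finally`, a benchmark case that throws still leaves the pushdown flags as it found them, so the next case starts from a known state.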


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91210/
    Test FAILed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #91821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91821/testReport)** for PR 21288 at commit [`d3dd504`](https://github.com/apache/spark/commit/d3dd50463c2b91ae8800dbcc811dcc52880a02ca).


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #91946 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91946/testReport)** for PR 21288 at commit [`4a9cec9`](https://github.com/apache/spark/commit/4a9cec91f9446161d4dde0cac20ccdccb9a112e7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r191610297
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
    @@ -131,211 +132,214 @@ object FilterPushdownBenchmark {
         }
     
         /*
    +    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.26-46.32.amzn1.x86_64
         Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
         Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         ------------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            8452 / 8504          1.9         537.3       1.0X
    -    Parquet Vectorized (Pushdown)                  274 /  281         57.3          17.4      30.8X
    -    Native ORC Vectorized                         8167 / 8185          1.9         519.3       1.0X
    -    Native ORC Vectorized (Pushdown)               365 /  379         43.1          23.2      23.1X
    +    Parquet Vectorized                            2961 / 3123          5.3         188.3       1.0X
    +    Parquet Vectorized (Pushdown)                 3057 / 3121          5.1         194.4       1.0X
    --- End diff --
    
    I have not tried it yet, but is it related to the recent change we made in the parquet reader?
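
For reading the table rows in the diff above, here is a minimal sketch of how the columns relate, assuming each one is derived from the best time and from the `numRows = 1024 * 1024 * 15` set in the benchmark's `main`. `BenchmarkColumns` and its method names are hypothetical. The inputs are the rounded milliseconds printed in the table, so the last digit can differ from the printed value.

```scala
import java.util.Locale

object BenchmarkColumns {
  // Rows scanned per case, taken from main() in the quoted benchmark.
  val numRows: Long = 1024L * 1024L * 15L

  /** Rate(M/s): millions of rows per second at the best time. */
  def rateMPerSec(bestMs: Double): Double = numRows / (bestMs / 1000.0) / 1e6

  /** Per Row(ns): nanoseconds per row at the best time. */
  def perRowNs(bestMs: Double): Double = bestMs * 1e6 / numRows

  /** Relative: baseline best time divided by this case's best time. */
  def relative(baselineBestMs: Double, bestMs: Double): Double =
    baselineBestMs / bestMs

  /** One decimal place, independent of the machine's locale. */
  def fmt(x: Double): String = "%.1f".formatLocal(Locale.ROOT, x)
}
```

With the old numbers, `relative(8452, 274)` formats as 30.8; with the new ones, `relative(2961, 3057)` formats as 1.0. That is the change being questioned: the pushdown case for `value IS NULL` no longer shows any speed-up over the baseline.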


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #90454 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90454/testReport)** for PR 21288 at commit [`8f60902`](https://github.com/apache/spark/commit/8f609023174c9f97bddc46bebe98f4ce3caf08c5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #91210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91210/testReport)** for PR 21288 at commit [`b7859ed`](https://github.com/apache/spark/commit/b7859ed0905ce3e0476e5d327f65798acc7aba8c).


---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r195948346
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
    @@ -0,0 +1,442 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import java.io.File
    +
    +import scala.util.{Random, Try}
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.sql.{DataFrame, SparkSession}
    +import org.apache.spark.sql.functions.monotonically_increasing_id
    +import org.apache.spark.sql.internal.SQLConf
    +import org.apache.spark.util.{Benchmark, Utils}
    +
    +
    +/**
    + * Benchmark to measure read performance with Filter pushdown.
    + * To run this:
    + *  spark-submit --class <this class> <spark sql test jar>
    + */
    +object FilterPushdownBenchmark {
    +  val conf = new SparkConf()
    +    .setAppName("FilterPushdownBenchmark")
    +    // Since `spark.master` always exists, overrides this value
    +    .set("spark.master", "local[1]")
    --- End diff --
    
    What I mean is adding `--master local[1]` at line 34, too.
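
    For illustration, one reading of this suggestion is sketched below. The object name `FilterPushdownBenchmarkSketch` is hypothetical, and the doc comment is only an assumption about how the updated run instructions might look; the `SparkConf` part is copied from the diff above. Spark must be on the classpath.

    ```scala
    import org.apache.spark.SparkConf

    /**
     * Benchmark to measure read performance with Filter pushdown.
     * To run this:
     *  spark-submit --class <this class> --master local[1] <spark sql test jar>
     */
    object FilterPushdownBenchmarkSketch {
      val conf = new SparkConf()
        .setAppName("FilterPushdownBenchmark")
        // Kept from the diff: a value set directly on SparkConf takes precedence
        // over the one passed via spark-submit, so the benchmark stays on local[1].
        .set("spark.master", "local[1]")
    }
    ```

    With both in place, the documented command and the value forced in code agree, so readers of the run instructions see the master that is actually used.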


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4037/
    Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90883/
    Test FAILed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90904/
    Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/220/
    Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/122/
    Test PASSed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    @dongjoon-hyun I got the same results under the same conditions (enough memory), but with `--driver-memory 3g` (less memory) I got slightly different results:
    ```
    // --driver-memory=3g (default)
    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
    Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
    Select 0 string row ('7864320' < value < '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                          10084 / 10154          1.6         641.1       1.0X
    Parquet Vectorized (Pushdown)                  967 / 1008         16.3          61.5      10.4X
    Native ORC Vectorized                       11088 / 11116          1.4         705.0       0.9X
    Native ORC Vectorized (Pushdown)               270 /  278         58.2          17.2      37.3X
    
    Select 1 string row (value = '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                          10032 / 10085          1.6         637.8       1.0X
    Parquet Vectorized (Pushdown)                  959 /  998         16.4          61.0      10.5X
    Native ORC Vectorized                       11104 / 11128          1.4         706.0       0.9X
    Native ORC Vectorized (Pushdown)               259 /  277         60.6          16.5      38.7X
    ...
    
    
    // --driver-memory=10g
    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.33-51.37.amzn1.x86_64
    Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
    Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                            9201 / 9300          1.7         585.0       1.0X
    Parquet Vectorized (Pushdown)                   89 /  105        176.3           5.7     103.1X
    Native ORC Vectorized                         8886 / 8898          1.8         564.9       1.0X
    Native ORC Vectorized (Pushdown)               110 /  128        143.4           7.0      83.9X
    
    Select 0 string row ('7864320' < value < '7864320'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                            9336 / 9357          1.7         593.6       1.0X
    Parquet Vectorized (Pushdown)                  927 /  937         17.0          58.9      10.1X
    Native ORC Vectorized                         9026 / 9041          1.7         573.9       1.0X
    Native ORC Vectorized (Pushdown)               257 /  272         61.1          16.4      36.3X
    ...
    ```
    Does Parquet have a smaller memory footprint? I'm currently looking into this (I updated the results for the case with enough memory).
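
    As a reading aid for the tables above, here is a minimal sketch of how the last three columns appear to follow from the best time. It assumes the table has 15 * 1024 * 1024 rows (inferred from the "Select 50% rows (id < 7864320)" case, not stated in this comment) and that the first line of each table is the baseline for the `Relative` column. `BenchmarkColumns`, `derive` and `fmt` are hypothetical names, not part of Spark.

    ```scala
    import java.util.Locale

    object BenchmarkColumns {
      // Assumption: row count inferred from the 50% selectivity case (7864320 * 2).
      val NumRows: Long = 15L * 1024 * 1024

      final case class Columns(rateMPerSec: Double, perRowNs: Double, relative: Double)

      /** Derives Rate(M/s), Per Row(ns) and Relative from best times in milliseconds. */
      def derive(bestMs: Double, baselineBestMs: Double, numRows: Long = NumRows): Columns = {
        // Rows per microsecond is the same number as million rows per second.
        val rate = numRows / (bestMs * 1000.0)
        Columns(rate, 1000.0 / rate, baselineBestMs / bestMs)
      }

      /** Formats with one decimal place, independent of the default locale. */
      def fmt(x: Double): String = "%.1f".formatLocal(Locale.ROOT, x)

      def main(args: Array[String]): Unit = {
        // "Parquet Vectorized (Pushdown)" against "Parquet Vectorized" in the 3g run.
        val c = derive(bestMs = 967, baselineBestMs = 10084)
        println(s"${fmt(c.rateMPerSec)} M/s, ${fmt(c.perRowNs)} ns/row, ${fmt(c.relative)}X")
      }
    }
    ```

    For the 967 ms row this reproduces the reported 16.3, 61.5 and 10.4X. The printed times are rounded to whole milliseconds, so recomputing other rows this way can differ from the table in the last digit.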


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    **[Test build #90571 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90571/testReport)** for PR 21288 at commit [`4520044`](https://github.com/apache/spark/commit/4520044d3be40ba8bf963a151db2dd9769c0f59a).


---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r195305634
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
    @@ -131,211 +132,214 @@ object FilterPushdownBenchmark {
         }
     
         /*
    +    OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.26-46.32.amzn1.x86_64
         Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
         Select 0 string row (value IS NULL):     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         ------------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            8452 / 8504          1.9         537.3       1.0X
    -    Parquet Vectorized (Pushdown)                  274 /  281         57.3          17.4      30.8X
    -    Native ORC Vectorized                         8167 / 8185          1.9         519.3       1.0X
    -    Native ORC Vectorized (Pushdown)               365 /  379         43.1          23.2      23.1X
    +    Parquet Vectorized                            2961 / 3123          5.3         188.3       1.0X
    +    Parquet Vectorized (Pushdown)                 3057 / 3121          5.1         194.4       1.0X
    --- End diff --
    
    Thank you for updating, @maropu .


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91211/
    Test FAILed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91914/
    Test FAILed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3644/
    Test PASSed.


---


[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21288#discussion_r189019277
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FilterPushdownBenchmark.scala ---
    @@ -105,138 +128,306 @@ object FilterPushdownBenchmark {
         }
     
         /*
    -    Java HotSpot(TM) 64-Bit Server VM 1.8.0_152-b16 on Mac OS X 10.13.2
    -    Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
    -
    -    Select 0 row (id IS NULL):              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7882 / 7957          2.0         501.1       1.0X
    -    Parquet Vectorized (Pushdown)                   55 /   60        285.2           3.5     142.9X
    -    Native ORC Vectorized                         5592 / 5627          2.8         355.5       1.4X
    -    Native ORC Vectorized (Pushdown)                66 /   70        237.2           4.2     118.9X
    -
    -    Select 0 row (7864320 < id < 7864320):  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7884 / 7909          2.0         501.2       1.0X
    -    Parquet Vectorized (Pushdown)                  739 /  752         21.3          47.0      10.7X
    -    Native ORC Vectorized                         5614 / 5646          2.8         356.9       1.4X
    -    Native ORC Vectorized (Pushdown)                81 /   83        195.2           5.1      97.8X
    -
    -    Select 1 row (id = 7864320):            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7905 / 8027          2.0         502.6       1.0X
    -    Parquet Vectorized (Pushdown)                  740 /  766         21.2          47.1      10.7X
    -    Native ORC Vectorized                         5684 / 5738          2.8         361.4       1.4X
    -    Native ORC Vectorized (Pushdown)                78 /   81        202.4           4.9     101.7X
    -
    -    Select 1 row (id <=> 7864320):          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7928 / 7993          2.0         504.1       1.0X
    -    Parquet Vectorized (Pushdown)                  747 /  772         21.0          47.5      10.6X
    -    Native ORC Vectorized                         5728 / 5753          2.7         364.2       1.4X
    -    Native ORC Vectorized (Pushdown)                76 /   78        207.9           4.8     104.8X
    -
    -    Select 1 row (7864320 <= id <= 7864320):Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7939 / 8021          2.0         504.8       1.0X
    -    Parquet Vectorized (Pushdown)                  746 /  770         21.1          47.4      10.6X
    -    Native ORC Vectorized                         5690 / 5734          2.8         361.7       1.4X
    -    Native ORC Vectorized (Pushdown)                76 /   79        206.7           4.8     104.3X
    -
    -    Select 1 row (7864319 < id < 7864321):  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            7972 / 8019          2.0         506.9       1.0X
    -    Parquet Vectorized (Pushdown)                  742 /  764         21.2          47.2      10.7X
    -    Native ORC Vectorized                         5704 / 5743          2.8         362.6       1.4X
    -    Native ORC Vectorized (Pushdown)                76 /   78        207.9           4.8     105.4X
    -
    -    Select 10% rows (id < 1572864):         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                            8733 / 8808          1.8         555.2       1.0X
    -    Parquet Vectorized (Pushdown)                 2213 / 2267          7.1         140.7       3.9X
    -    Native ORC Vectorized                         6420 / 6463          2.4         408.2       1.4X
    -    Native ORC Vectorized (Pushdown)              1313 / 1331         12.0          83.5       6.7X
    -
    -    Select 50% rows (id < 7864320):         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                          11518 / 11591          1.4         732.3       1.0X
    -    Parquet Vectorized (Pushdown)                 7962 / 7991          2.0         506.2       1.4X
    -    Native ORC Vectorized                         8927 / 8985          1.8         567.6       1.3X
    -    Native ORC Vectorized (Pushdown)              6102 / 6160          2.6         387.9       1.9X
    -
    -    Select 90% rows (id < 14155776):        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                          14255 / 14389          1.1         906.3       1.0X
    -    Parquet Vectorized (Pushdown)               13564 / 13594          1.2         862.4       1.1X
    -    Native ORC Vectorized                       11442 / 11608          1.4         727.5       1.2X
    -    Native ORC Vectorized (Pushdown)            10991 / 11029          1.4         698.8       1.3X
    -
    -    Select all rows (id IS NOT NULL):       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                          14917 / 14938          1.1         948.4       1.0X
    -    Parquet Vectorized (Pushdown)               14910 / 14964          1.1         948.0       1.0X
    -    Native ORC Vectorized                       11986 / 12069          1.3         762.0       1.2X
    -    Native ORC Vectorized (Pushdown)            12037 / 12123          1.3         765.3       1.2X
    -
    -    Select all rows (id > -1):              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                          14951 / 14976          1.1         950.6       1.0X
    -    Parquet Vectorized (Pushdown)               14934 / 15016          1.1         949.5       1.0X
    -    Native ORC Vectorized                       12000 / 12156          1.3         763.0       1.2X
    -    Native ORC Vectorized (Pushdown)            12079 / 12113          1.3         767.9       1.2X
    -
    -    Select all rows (id != -1):             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -----------------------------------------------------------------------------------------------
    -    Parquet Vectorized                          14930 / 14972          1.1         949.3       1.0X
    -    Parquet Vectorized (Pushdown)               15015 / 15047          1.0         954.6       1.0X
    -    Native ORC Vectorized                       12090 / 12259          1.3         768.7       1.2X
    -    Native ORC Vectorized (Pushdown)            12021 / 12096          1.3         764.2       1.2X
    +    Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
    --- End diff --
    
    Hi, @maropu. Thank you for updating this with the new Parquet 1.10. BTW, could you describe the EC2 environment in more detail in the PR description? I want to reproduce this.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21288
  
    Merged build finished. Test PASSed.


---
