Posted to commits@spark.apache.org by do...@apache.org on 2019/01/15 14:26:20 UTC

[spark] branch master updated: [SPARK-26203][SQL][TEST] Benchmark performance of In and InSet expressions

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new b45ff02  [SPARK-26203][SQL][TEST] Benchmark performance of In and InSet expressions
b45ff02 is described below

commit b45ff02e77da013c878574e70d14faf09e1aed39
Author: Anton Okolnychyi <ao...@apple.com>
AuthorDate: Tue Jan 15 07:25:50 2019 -0700

    [SPARK-26203][SQL][TEST] Benchmark performance of In and InSet expressions
    
    ## What changes were proposed in this pull request?
    
    This PR contains benchmarks for `In` and `InSet` expressions. They cover literals of different data types and will help us to decide where to integrate the switch-based logic for bytes/shorts/ints.
    
    As discussed in [PR-23171](https://github.com/apache/spark/pull/23171), one potential approach is to convert `In` to `InSet` if all elements are literals, independently of data types and the number of elements. According to the results of this PR, we might want to keep the threshold for the number of elements. The if-else approach might be faster for some data types on a small number of elements (structs? arrays? small decimals?).
    
    ### byte / short / int / long
    
    Unless the number of items is really big, `InSet` is slower than `In` because of autoboxing.
    
    Interestingly, `In` scales worse on bytes/shorts than on ints/longs. For example, `InSet` starts to match the performance on around 50 bytes/shorts while this does not happen on the same number of ints/longs. This is a bit strange as shorts/bytes (e.g., `(byte) 1`, `(short) 2`) are represented as ints in the bytecode.
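    
    To make the autoboxing cost concrete, here is a minimal Java sketch of the two evaluation styles. It is only an illustration under stated assumptions: the class `InVersusInSet` and its methods are hypothetical and are not Spark's generated code. `inChain` mimics the if-else style of `In` on primitives, while `inSet` mimics a hash-set lookup that has to box the value first.
    
    ```java
    import java.util.HashSet;
    import java.util.Set;
    
    // Hypothetical illustration only; this is not Spark's generated code.
    final class InVersusInSet {
        // Candidates stored as boxed objects, as a generic hash set requires.
        private static final Set<Object> CANDIDATES = new HashSet<>();
    
        static {
            for (int i = 1; i <= 5; i++) {
                CANDIDATES.add(i); // each int is autoboxed to an Integer
            }
        }
    
        // In-style: a chain of primitive comparisons, no boxing and no hashing.
        static boolean inChain(int value) {
            return value == 1 || value == 2 || value == 3 || value == 4 || value == 5;
        }
    
        // InSet-style: the primitive is boxed before hashCode/equals are called.
        static boolean inSet(int value) {
            return CANDIDATES.contains(value);
        }
    
        public static void main(String[] args) {
            System.out.println(inChain(3) + " " + inSet(3));
            System.out.println(inChain(9) + " " + inSet(9));
        }
    }
    ```
    
    The chain costs grow with the number of candidates, while the set lookup pays a roughly constant price for boxing and hashing, which is consistent with `InSet` only winning once the list is long enough.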
    
    ### float / double
    
    Use cases on floats/doubles also suffer from autoboxing. Therefore, `In` outperforms `InSet` on 10 elements.
    
    Similarly to shorts/bytes, `In` scales worse on floats/doubles than on ints/longs because the equality condition is more complicated (e.g., `(java.lang.Float.isNaN(filter_valueArg_0) && java.lang.Float.isNaN(9.0F)) || filter_valueArg_0 == 9.0F`).
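    
    The extra work in that condition can be sketched in plain Java. The class `NaNAwareEquality` and its method names are hypothetical; only the shape of the NaN-aware check follows the condition quoted above.
    
    ```java
    // Hypothetical illustration of why float/double equality costs more per element.
    final class NaNAwareEquality {
        // Plain primitive comparison: NaN is never equal to anything, including itself.
        static boolean plainEquals(float a, float b) {
            return a == b;
        }
    
        // NaN-aware comparison: two extra isNaN checks per candidate value.
        static boolean nanAwareEquals(float a, float b) {
            return (Float.isNaN(a) && Float.isNaN(b)) || a == b;
        }
    
        public static void main(String[] args) {
            System.out.println(plainEquals(Float.NaN, Float.NaN));
            System.out.println(nanAwareEquals(Float.NaN, Float.NaN));
            System.out.println(nanAwareEquals(9.0F, 9.0F));
        }
    }
    ```
    
    With an if-else chain, this longer condition is repeated once per literal, so the per-element cost is higher than for a single `==` on ints/longs.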
    
    ### decimal
    
    The reason why we have separate benchmarks for small and large decimals is that Spark might use longs to represent decimals in some cases.
    
    If this optimization happens, then `equals` will be nothing more than a comparison of longs. If this does not happen, Spark will create an instance of `scala.BigDecimal` and use it for comparisons. The latter is more expensive.
    
    `Decimal$hashCode` will always use `scala.BigDecimal$hashCode` even if the number is small enough to fit into a long variable. As a consequence, we see that use cases on small decimals are faster with `In` as they are using long comparisons under the hood. Large decimal values are always faster with `InSet`.
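    
    The asymmetry between comparing and hashing can be sketched as follows. This is a simplified model under stated assumptions, not Spark's `Decimal`: the class `TinyDecimal`, its fields, and its methods are hypothetical, and `java.math.BigDecimal` stands in for `scala.BigDecimal`.
    
    ```java
    import java.math.BigDecimal;
    
    // Hypothetical model of a decimal with a long-backed fast path.
    final class TinyDecimal {
        private final long unscaled;   // used when the value fits into a long
        private final int scale;       // number of digits after the decimal point
        private final BigDecimal big;  // used otherwise; null on the fast path
    
        private TinyDecimal(long unscaled, int scale, BigDecimal big) {
            this.unscaled = unscaled;
            this.scale = scale;
            this.big = big;
        }
    
        static TinyDecimal ofUnscaled(long unscaled, int scale) {
            return new TinyDecimal(unscaled, scale, null);
        }
    
        static TinyDecimal ofBig(BigDecimal value) {
            return new TinyDecimal(0L, value.scale(), value);
        }
    
        BigDecimal toBigDecimal() {
            return big != null ? big : BigDecimal.valueOf(unscaled, scale);
        }
    
        // Equality: a plain long comparison when both sides are on the fast path.
        boolean sameValue(TinyDecimal other) {
            if (big == null && other.big == null && scale == other.scale) {
                return unscaled == other.unscaled;
            }
            return toBigDecimal().compareTo(other.toBigDecimal()) == 0;
        }
    
        // Hashing: always goes through BigDecimal, even for small values.
        int hash() {
            return toBigDecimal().stripTrailingZeros().hashCode();
        }
    
        public static void main(String[] args) {
            TinyDecimal a = TinyDecimal.ofUnscaled(1250L, 2); // 12.50
            TinyDecimal b = TinyDecimal.ofBig(new BigDecimal("12.5"));
            System.out.println(a.sameValue(b) + " " + (a.hash() == b.hash()));
        }
    }
    ```
    
    In this model an if-else chain over small values only ever takes the cheap branch of `sameValue`, whereas a hash-set lookup always pays for `hash()`, which matches the direction of the small-decimal results.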
    
    ### string
    
    `UTF8String$equals` is not cheap. Therefore, `In` does not outperform `InSet` by as much as in the previous use cases.
    
    ### timestamp / date
    
    Under the hood, timestamp/date values will be represented as long/int values. So, `In` allows us to avoid autoboxing.
    
    ### array
    
    Arrays behave as expected. `In` is faster on 5 elements while `InSet` is faster on 15 elements. The benchmarks use `UnsafeArrayData`.
    
    ### struct
    
    `InSet` is always faster than `In` for structs. These benchmarks use `GenericInternalRow`.
    
    Closes #23291 from aokolnychyi/spark-26203.
    
    Lead-authored-by: Anton Okolnychyi <ao...@apple.com>
    Co-authored-by: Dongjoon Hyun <do...@apache.org>
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
 .../benchmarks/InExpressionBenchmark-results.txt   | 551 +++++++++++++++++++++
 .../benchmark/InExpressionBenchmark.scala          | 214 ++++++++
 2 files changed, 765 insertions(+)

diff --git a/sql/core/benchmarks/InExpressionBenchmark-results.txt b/sql/core/benchmarks/InExpressionBenchmark-results.txt
new file mode 100644
index 0000000..d2adbde
--- /dev/null
+++ b/sql/core/benchmarks/InExpressionBenchmark-results.txt
@@ -0,0 +1,551 @@
+================================================================================================
+In Expression Benchmark
+================================================================================================
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+5 bytes:                                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  101 /  138         98.7          10.1       1.0X
+InSet expression                               125 /  136         79.7          12.5       0.8X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+10 bytes:                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  101 /  111         99.3          10.1       1.0X
+InSet expression                               126 /  133         79.6          12.6       0.8X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+25 bytes:                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  176 /  183         56.9          17.6       1.0X
+InSet expression                               174 /  184         57.4          17.4       1.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+50 bytes:                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  299 /  312         33.5          29.9       1.0X
+InSet expression                               243 /  246         41.2          24.3       1.2X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+100 bytes:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  512 /  518         19.5          51.2       1.0X
+InSet expression                               388 /  400         25.8          38.8       1.3X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+200 bytes:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  854 /  866         11.7          85.4       1.0X
+InSet expression                               686 /  694         14.6          68.6       1.2X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+5 shorts:                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                   59 /   62        169.6           5.9       1.0X
+InSet expression                               163 /  168         61.3          16.3       0.4X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+10 shorts:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                   76 /   78        132.0           7.6       1.0X
+InSet expression                               182 /  186         54.9          18.2       0.4X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+25 shorts:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  126 /  128         79.4          12.6       1.0X
+InSet expression                               190 /  193         52.7          19.0       0.7X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+50 shorts:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  227 /  227         44.1          22.7       1.0X
+InSet expression                               232 /  235         43.1          23.2       1.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+100 shorts:                              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  408 /  414         24.5          40.8       1.0X
+InSet expression                               203 /  209         49.3          20.3       2.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+200 shorts:                              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  762 /  765         13.1          76.2       1.0X
+InSet expression                               192 /  196         52.1          19.2       4.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+5 ints:                                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                   53 /   57        187.3           5.3       1.0X
+InSet expression                               156 /  160         63.9          15.6       0.3X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+10 ints:                                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                   70 /   74        142.4           7.0       1.0X
+InSet expression                               170 /  176         58.9          17.0       0.4X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+25 ints:                                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  125 /  126         80.2          12.5       1.0X
+InSet expression                               174 /  179         57.4          17.4       0.7X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+50 ints:                                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  220 /  222         45.5          22.0       1.0X
+InSet expression                               215 /  221         46.6          21.5       1.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+100 ints:                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  404 /  407         24.8          40.4       1.0X
+InSet expression                               189 /  192         53.0          18.9       2.1X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+200 ints:                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  760 /  764         13.2          76.0       1.0X
+InSet expression                               176 /  179         56.8          17.6       4.3X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+5 longs:                                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                   50 /   52        200.3           5.0       1.0X
+InSet expression                               147 /  151         68.1          14.7       0.3X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+10 longs:                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                   65 /   66        154.8           6.5       1.0X
+InSet expression                               162 /  166         61.6          16.2       0.4X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+25 longs:                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  117 /  119         85.1          11.7       1.0X
+InSet expression                               170 /  175         58.8          17.0       0.7X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+50 longs:                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  207 /  208         48.3          20.7       1.0X
+InSet expression                               211 /  214         47.4          21.1       1.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+100 longs:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  387 /  389         25.9          38.7       1.0X
+InSet expression                               185 /  187         54.2          18.5       2.1X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+200 longs:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  742 /  744         13.5          74.2       1.0X
+InSet expression                               172 /  173         58.3          17.2       4.3X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+5 floats:                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                   88 /   91        113.0           8.8       1.0X
+InSet expression                               170 /  171         58.9          17.0       0.5X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+10 floats:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  129 /  132         77.5          12.9       1.0X
+InSet expression                               188 /  189         53.2          18.8       0.7X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+25 floats:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  243 /  244         41.2          24.3       1.0X
+InSet expression                               192 /  194         52.2          19.2       1.3X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+50 floats:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  421 /  424         23.7          42.1       1.0X
+InSet expression                               237 /  240         42.2          23.7       1.8X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+100 floats:                              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  775 /  777         12.9          77.5       1.0X
+InSet expression                               205 /  209         48.8          20.5       3.8X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+200 floats:                              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                 3052 / 3151          3.3         305.2       1.0X
+InSet expression                               197 /  199         50.8          19.7      15.5X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+5 doubles:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                   82 /   84        121.6           8.2       1.0X
+InSet expression                               167 /  169         60.0          16.7       0.5X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+10 doubles:                              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  124 /  131         80.3          12.4       1.0X
+InSet expression                               186 /  187         53.9          18.6       0.7X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+25 doubles:                              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  237 /  239         42.1          23.7       1.0X
+InSet expression                               193 /  194         51.8          19.3       1.2X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+50 doubles:                              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  416 /  418         24.0          41.6       1.0X
+InSet expression                               239 /  241         41.8          23.9       1.7X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+100 doubles:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  771 /  774         13.0          77.1       1.0X
+InSet expression                               204 /  207         49.1          20.4       3.8X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+200 doubles:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                 3755 / 3801          2.7         375.5       1.0X
+InSet expression                               194 /  197         51.5          19.4      19.3X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+5 small decimals:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                   46 /   49         21.6          46.4       1.0X
+InSet expression                               136 /  141          7.4         135.7       0.3X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+10 small decimals:                       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                   57 /   61         17.5          57.1       1.0X
+InSet expression                               137 /  140          7.3         137.2       0.4X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+25 small decimals:                       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                   89 /   92         11.2          89.4       1.0X
+InSet expression                               139 /  141          7.2         138.7       0.6X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+50 small decimals:                       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  170 /  171          5.9         169.5       1.0X
+InSet expression                               146 /  148          6.9         145.8       1.2X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+100 small decimals:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  429 /  431          2.3         429.2       1.0X
+InSet expression                               145 /  148          6.9         144.9       3.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+200 small decimals:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  995 / 1207          1.0         995.0       1.0X
+InSet expression                               154 /  158          6.5         154.1       6.5X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+5 large decimals:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  268 /  307          3.7         268.3       1.0X
+InSet expression                               171 /  176          5.8         171.1       1.6X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+10 large decimals:                       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  455 /  458          2.2         455.2       1.0X
+InSet expression                               173 /  176          5.8         173.1       2.6X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+25 large decimals:                       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                 1095 / 1099          0.9        1095.2       1.0X
+InSet expression                               179 /  183          5.6         178.7       6.1X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+50 large decimals:                       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                 2099 / 2110          0.5        2098.6       1.0X
+InSet expression                               183 /  187          5.5         183.2      11.5X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+100 large decimals:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                 3885 / 3911          0.3        3885.4       1.0X
+InSet expression                               207 /  223          4.8         206.6      18.8X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+200 large decimals:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                 7759 / 7867          0.1        7759.2       1.0X
+InSet expression                               214 /  217          4.7         214.4      36.2X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+5 strings:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  126 /  127          7.9         126.0       1.0X
+InSet expression                               139 /  142          7.2         139.0       0.9X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+10 strings:                              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  128 /  132          7.8         128.2       1.0X
+InSet expression                               142 /  144          7.0         142.0       0.9X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+25 strings:                              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  151 /  153          6.6         150.9       1.0X
+InSet expression                               150 /  152          6.7         150.1       1.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+50 strings:                              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  238 /  240          4.2         238.5       1.0X
+InSet expression                               152 /  154          6.6         152.4       1.6X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+100 strings:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  431 /  432          2.3         431.2       1.0X
+InSet expression                               149 /  151          6.7         148.8       2.9X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+200 strings:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  822 / 1060          1.2         821.7       1.0X
+InSet expression                               153 /  162          6.5         152.9       5.4X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+5 timestamps:                            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                   42 /   44        240.5           4.2       1.0X
+InSet expression                               158 /  161         63.5          15.8       0.3X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+10 timestamps:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                   57 /   59        174.5           5.7       1.0X
+InSet expression                               173 /  176         57.8          17.3       0.3X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+25 timestamps:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  110 /  113         91.1          11.0       1.0X
+InSet expression                               223 /  226         44.9          22.3       0.5X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+50 timestamps:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  190 /  193         52.6          19.0       1.0X
+InSet expression                               238 /  240         42.1          23.8       0.8X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+100 timestamps:                          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  355 /  367         28.2          35.5       1.0X
+InSet expression                               221 /  222         45.2          22.1       1.6X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+200 timestamps:                          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  692 /  694         14.5          69.2       1.0X
+InSet expression                               220 /  222         45.4          22.0       3.1X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+5 dates:                                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  143 /  145         70.0          14.3       1.0X
+InSet expression                               264 /  269         37.9          26.4       0.5X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+10 dates:                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  141 /  142         71.1          14.1       1.0X
+InSet expression                               268 /  269         37.3          26.8       0.5X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+25 dates:                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  196 /  196         51.1          19.6       1.0X
+InSet expression                               277 /  282         36.1          27.7       0.7X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+50 dates:                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  338 /  351         29.5          33.8       1.0X
+InSet expression                               287 /  290         34.9          28.7       1.2X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+100 dates:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  541 /  542         18.5          54.1       1.0X
+InSet expression                               299 /  300         33.5          29.9       1.8X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+200 dates:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  883 /  900         11.3          88.3       1.0X
+InSet expression                               296 /  298         33.8          29.6       3.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+5 arrays:                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                   51 /   53         19.6          51.0       1.0X
+InSet expression                                96 /   97         10.5          95.7       0.5X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+10 arrays:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                   77 /   79         13.1          76.6       1.0X
+InSet expression                                96 /   98         10.4          96.0       0.8X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+25 arrays:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  275 /  276          3.6         274.6       1.0X
+InSet expression                               119 /  121          8.4         119.1       2.3X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+50 arrays:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  592 /  663          1.7         592.1       1.0X
+InSet expression                               164 /  172          6.1         164.3       3.6X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+100 arrays:                              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                 2555 / 2733          0.4        2554.7       1.0X
+InSet expression                               194 /  198          5.2         193.9      13.2X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+200 arrays:                              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                 9215 / 9778          0.1        9214.8       1.0X
+InSet expression                               253 /  256          3.9         253.2      36.4X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+5 structs:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                   46 /   47         22.0          45.5       1.0X
+InSet expression                               157 /  162          6.4         156.5       0.3X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+10 structs:                              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                   61 /   63         16.5          60.7       1.0X
+InSet expression                               158 /  161          6.3         158.2       0.4X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+25 structs:                              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  133 /  140          7.5         132.8       1.0X
+InSet expression                               199 /  202          5.0         198.8       0.7X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+50 structs:                              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                  369 /  372          2.7         369.1       1.0X
+InSet expression                               283 /  294          3.5         282.7       1.3X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+100 structs:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                 1570 / 1731          0.6        1569.8       1.0X
+InSet expression                               332 /  334          3.0         332.0       4.7X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+200 structs:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------
+In expression                                 6332 / 6794          0.2        6331.8       1.0X
+InSet expression                               441 /  444          2.3         440.9      14.4X
+
+
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/InExpressionBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/InExpressionBenchmark.scala
new file mode 100644
index 0000000..cf4a34b
--- /dev/null
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/InExpressionBenchmark.scala
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.DataFrame
+import org.apache.spark.sql.functions.{array, struct}
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types._
+
+/**
+ * A benchmark that compares the performance of different ways to evaluate SQL IN expressions.
+ *
+ * Specifically, this class compares the if-based approach, which might iterate through all items
+ * inside the IN value list, to other options with better worst-case time complexities (e.g., sets).
+ *
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
+ *   2. build/sbt "sql/test:runMain <this class>"
+ *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *      Results will be written to "benchmarks/InExpressionBenchmark-results.txt".
+ * }}}
+ */
+object InExpressionBenchmark extends SqlBasedBenchmark {
+
+  import spark.implicits._
+
+  private def runByteBenchmark(numItems: Int, numRows: Long, minNumIters: Int): Unit = {
+    val name = s"$numItems bytes"
+    val values = (Byte.MinValue until Byte.MinValue + numItems).map(v => s"${v}Y")
+    val df = spark.range(0, numRows).select($"id".cast(ByteType))
+    runBenchmark(name, df, values, numRows, minNumIters)
+  }
+
+  private def runShortBenchmark(numItems: Int, numRows: Long, minNumIters: Int): Unit = {
+    val name = s"$numItems shorts"
+    val values = (1 to numItems).map(v => s"${v}S")
+    val df = spark.range(0, numRows).select($"id".cast(ShortType))
+    runBenchmark(name, df, values, numRows, minNumIters)
+  }
+
+  private def runIntBenchmark(numItems: Int, numRows: Long, minNumIters: Int): Unit = {
+    val name = s"$numItems ints"
+    val values = 1 to numItems
+    val df = spark.range(0, numRows).select($"id".cast(IntegerType))
+    runBenchmark(name, df, values, numRows, minNumIters)
+  }
+
+  private def runLongBenchmark(numItems: Int, numRows: Long, minNumIters: Int): Unit = {
+    val name = s"$numItems longs"
+    val values = (1 to numItems).map(v => s"${v}L")
+    val df = spark.range(0, numRows).toDF("id")
+    runBenchmark(name, df, values, numRows, minNumIters)
+  }
+
+  private def runFloatBenchmark(numItems: Int, numRows: Long, minNumIters: Int): Unit = {
+    val name = s"$numItems floats"
+    val values = (1 to numItems).map(v => s"CAST($v AS float)")
+    val df = spark.range(0, numRows).select($"id".cast(FloatType))
+    runBenchmark(name, df, values, numRows, minNumIters)
+  }
+
+  private def runDoubleBenchmark(numItems: Int, numRows: Long, minNumIters: Int): Unit = {
+    val name = s"$numItems doubles"
+    val values = (1 to numItems).map(v => s"$v.0D")
+    val df = spark.range(0, numRows).select($"id".cast(DoubleType))
+    runBenchmark(name, df, values, numRows, minNumIters)
+  }
+
+  private def runSmallDecimalBenchmark(numItems: Int, numRows: Long, minNumIters: Int): Unit = {
+    val name = s"$numItems small decimals"
+    val values = (1 to numItems).map(v => s"CAST($v AS decimal(12, 1))")
+    val df = spark.range(0, numRows).select($"id".cast(DecimalType(12, 1)))
+    runBenchmark(name, df, values, numRows, minNumIters)
+  }
+
+  private def runLargeDecimalBenchmark(numItems: Int, numRows: Long, minNumIters: Int): Unit = {
+    val name = s"$numItems large decimals"
+    val values = (1 to numItems).map(v => s"9223372036854775812.10539$v")
+    val df = spark.range(0, numRows).select($"id".cast(DecimalType(30, 7)))
+    runBenchmark(name, df, values, numRows, minNumIters)
+  }
+
+  private def runStringBenchmark(numItems: Int, numRows: Long, minNumIters: Int): Unit = {
+    val name = s"$numItems strings"
+    val values = (1 to numItems).map(n => s"'$n'")
+    val df = spark.range(0, numRows).select($"id".cast(StringType))
+    runBenchmark(name, df, values, numRows, minNumIters)
+  }
+
+  private def runTimestampBenchmark(numItems: Int, numRows: Long, minNumIters: Int): Unit = {
+    val name = s"$numItems timestamps"
+    val values = (1 to numItems).map(m => s"CAST('1970-01-01 01:00:00.$m' AS timestamp)")
+    val df = spark.range(0, numRows).select($"id".cast(TimestampType))
+    runBenchmark(name, df, values, numRows, minNumIters)
+  }
+
+  private def runDateBenchmark(numItems: Int, numRows: Long, minNumIters: Int): Unit = {
+    val name = s"$numItems dates"
+    val values = (1 to numItems).map(n => 1970 + n).map(y => s"CAST('$y-01-01' AS date)")
+    val df = spark.range(0, numRows).select($"id".cast(TimestampType).cast(DateType))
+    runBenchmark(name, df, values, numRows, minNumIters)
+  }
+
+  private def runArrayBenchmark(numItems: Int, numRows: Long, minNumIters: Int): Unit = {
+    val name = s"$numItems arrays"
+    val values = (1 to numItems).map(i => s"array($i)")
+    val df = spark.range(0, numRows).select(array($"id").as("id"))
+    runBenchmark(name, df, values, numRows, minNumIters)
+  }
+
+  private def runStructBenchmark(numItems: Int, numRows: Long, minNumIters: Int): Unit = {
+    val name = s"$numItems structs"
+    val values = (1 to numItems).map(i => s"struct($i)")
+    val df = spark.range(0, numRows).select(struct($"id".as("col1")).as("id"))
+    runBenchmark(name, df, values, numRows, minNumIters)
+  }
+
+  private def runBenchmark(
+      name: String,
+      df: DataFrame,
+      values: Seq[Any],
+      numRows: Long,
+      minNumIters: Int): Unit = {
+
+    val benchmark = new Benchmark(name, numRows, minNumIters, output = output)
+
+    df.createOrReplaceTempView("t")
+
+    def testClosure(): Unit = {
+      val df = spark.sql(s"SELECT * FROM t WHERE id IN (${values.mkString(",")})")
+      df.queryExecution.toRdd.foreach(_ => ())
+    }
+
+    benchmark.addCase("In expression") { _ =>
+      withSQLConf(SQLConf.OPTIMIZER_INSET_CONVERSION_THRESHOLD.key -> values.size.toString) {
+        testClosure()
+      }
+    }
+
+    benchmark.addCase("InSet expression") { _ =>
+      withSQLConf(SQLConf.OPTIMIZER_INSET_CONVERSION_THRESHOLD.key -> "1") {
+        testClosure()
+      }
+    }
+
+    benchmark.run()
+  }
+
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+    val numItemsSeq = Seq(5, 10, 25, 50, 100, 200)
+    val largeNumRows = 10000000
+    val smallNumRows = 1000000
+    val minNumIters = 5
+
+    runBenchmark("In Expression Benchmark") {
+      numItemsSeq.foreach { numItems =>
+        runByteBenchmark(numItems, largeNumRows, minNumIters)
+      }
+      numItemsSeq.foreach { numItems =>
+        runShortBenchmark(numItems, largeNumRows, minNumIters)
+      }
+      numItemsSeq.foreach { numItems =>
+        runIntBenchmark(numItems, largeNumRows, minNumIters)
+      }
+      numItemsSeq.foreach { numItems =>
+        runLongBenchmark(numItems, largeNumRows, minNumIters)
+      }
+      numItemsSeq.foreach { numItems =>
+        runFloatBenchmark(numItems, largeNumRows, minNumIters)
+      }
+      numItemsSeq.foreach { numItems =>
+        runDoubleBenchmark(numItems, largeNumRows, minNumIters)
+      }
+      numItemsSeq.foreach { numItems =>
+        runSmallDecimalBenchmark(numItems, smallNumRows, minNumIters)
+      }
+      numItemsSeq.foreach { numItems =>
+        runLargeDecimalBenchmark(numItems, smallNumRows, minNumIters)
+      }
+      numItemsSeq.foreach { numItems =>
+        runStringBenchmark(numItems, smallNumRows, minNumIters)
+      }
+      numItemsSeq.foreach { numItems =>
+        runTimestampBenchmark(numItems, largeNumRows, minNumIters)
+      }
+      numItemsSeq.foreach { numItems =>
+        runDateBenchmark(numItems, largeNumRows, minNumIters)
+      }
+      numItemsSeq.foreach { numItems =>
+        runArrayBenchmark(numItems, smallNumRows, minNumIters)
+      }
+      numItemsSeq.foreach { numItems =>
+        runStructBenchmark(numItems, smallNumRows, minNumIters)
+      }
+    }
+  }
+}
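
The scaladoc above contrasts the if-based approach, which may walk every item in the IN value list, with set-based lookups. As a rough illustration only, the two strategies can be sketched in plain Scala. `InVersusInSetSketch`, `inLinear`, and `inHashSet` are hypothetical names for this note and are not part of Spark or of this commit; the sketch assumes the set-based path stores its literals in a generic `Set[Any]`, so primitive values get boxed on lookup.

```scala
// Illustrative stand-ins only; this is not Spark's generated code.
object InVersusInSetSketch {

  // If-based strategy: compare the value with each literal in turn.
  // Worst case touches every item, but works on unboxed primitives.
  def inLinear(value: Long, items: Array[Long]): Boolean = {
    var i = 0
    while (i < items.length) {
      if (items(i) == value) return true
      i += 1
    }
    false
  }

  // Set-based strategy: a single hash lookup regardless of list size.
  // Passing a Long into Set[Any] boxes it to java.lang.Long first.
  def inHashSet(value: Long, items: Set[Any]): Boolean = items.contains(value)

  def main(args: Array[String]): Unit = {
    val literals = (1L to 200L).toArray
    val literalSet = literals.toSet[Any]

    // Both strategies must agree on membership for every probe.
    val probes = Seq(0L, 1L, 100L, 200L, 201L)
    probes.foreach { p =>
      assert(inLinear(p, literals) == inHashSet(p, literalSet))
    }
    println("linear scan and hash lookup agree on all probes")
  }
}
```

The linear scan does a handful of cheap primitive comparisons when the list is short, while the hash lookup pays a fixed cost for hashing and boxing. This is one plausible reading of why the tables above show `In` ahead on 5 to 25 items and `InSet` ahead on 100 to 200 items.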

