Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:14:13 UTC

[jira] [Resolved] (SPARK-21314) ByteArrayMethods.arrayEquals could use some optimizations

     [ https://issues.apache.org/jira/browse/SPARK-21314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-21314.
----------------------------------
    Resolution: Incomplete

> ByteArrayMethods.arrayEquals could use some optimizations
> ---------------------------------------------------------
>
>                 Key: SPARK-21314
>                 URL: https://issues.apache.org/jira/browse/SPARK-21314
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Sumedh Wale
>            Priority: Minor
>              Labels: bulk-closed, performance
>
> ByteArrayMethods.arrayEquals is invoked frequently in queries, especially for UTF8String comparisons. In profiles it is a major contributor to execution time for many kinds of queries on string values, including simple filters, so improving it would benefit a wide range of queries.
> The current implementation:
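> {code}
>   public static boolean arrayEquals(
>       Object leftBase, long leftOffset, Object rightBase, long rightOffset, final long length) {
>     int i = 0;
>     // Compare 8 bytes at a time while at least 8 bytes remain.
>     while (i <= length - 8) {
>       if (Platform.getLong(leftBase, leftOffset + i) !=
>           Platform.getLong(rightBase, rightOffset + i)) {
>         return false;
>       }
>       i += 8;
>     }
>     // Compare the remaining 0 to 7 bytes one at a time.
>     while (i < length) {
>       if (Platform.getByte(leftBase, leftOffset + i) !=
>           Platform.getByte(rightBase, rightOffset + i)) {
>         return false;
>       }
>       i += 1;
>     }
>     return true;
>   }
> {code}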
> It can be optimized in two ways:
> a) in the tail, when at least four bytes remain, compare them with a single getInt, which is much faster than four single-byte comparisons;
> b) advance the left and right offsets directly instead of adding "i" to each of them on every iteration.
> These two changes give the following numbers for 15-byte strings:
> {noformat}
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Linux 4.4.0-21-generic
> Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
> compare arrayEquals:                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> arrayEquals                                   1230 / 1255         81.3          12.3       1.0X
> arrayEquals2                                   830 /  846        120.4           8.3       1.5X
> {noformat}
> The gains range from 1.2X to 1.6X depending on the string size.
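The issue does not include the code behind the "arrayEquals2" row, so the following is only a minimal, hypothetical sketch of suggestions (a) and (b), not the reporter's patch. To stay self-contained it reads words through java.nio.ByteBuffer instead of Spark's internal Platform class and works on plain byte[] arrays with int offsets; the class name ArrayEqualsSketch is invented for illustration, and the sketch is not tuned for benchmarking.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the two proposed optimizations; not Spark's code.
// ByteBuffer stands in for Spark's Platform.getLong/getInt/getByte.
final class ArrayEqualsSketch {
  private ArrayEqualsSketch() {}

  static boolean arrayEquals2(byte[] left, int leftOffset,
                              byte[] right, int rightOffset, int length) {
    // Byte order is irrelevant for equality: two words are equal exactly
    // when all of their bytes are equal, so the default order is fine.
    ByteBuffer l = ByteBuffer.wrap(left);
    ByteBuffer r = ByteBuffer.wrap(right);
    // (b) Advance both offsets directly instead of computing offset + i.
    int lo = leftOffset;
    int ro = rightOffset;
    final int leftEnd = leftOffset + length;
    // Compare 8 bytes at a time while at least 8 bytes remain.
    while (leftEnd - lo >= 8) {
      if (l.getLong(lo) != r.getLong(ro)) {
        return false;
      }
      lo += 8;
      ro += 8;
    }
    // (a) If at least 4 bytes remain, compare them with one getInt
    // instead of four single-byte comparisons.
    if (leftEnd - lo >= 4) {
      if (l.getInt(lo) != r.getInt(ro)) {
        return false;
      }
      lo += 4;
      ro += 4;
    }
    // Compare the last 0 to 3 bytes one at a time.
    while (lo < leftEnd) {
      if (left[lo] != right[ro]) {
        return false;
      }
      lo++;
      ro++;
    }
    return true;
  }
}
```

For a 15-byte string, the original loop performs one 8-byte comparison followed by seven single-byte comparisons, while this sketch performs one 8-byte, one 4-byte and three single-byte comparisons. This reduction in tail work is where suggestion (a) is expected to help.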



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
