Posted to issues@spark.apache.org by "Sumedh Wale (JIRA)" <ji...@apache.org> on 2017/07/05 09:34:00 UTC

[jira] [Created] (SPARK-21314) ByteArrayMethods.arrayEquals could use some optimizations

Sumedh Wale created SPARK-21314:
-----------------------------------

             Summary: ByteArrayMethods.arrayEquals could use some optimizations
                 Key: SPARK-21314
                 URL: https://issues.apache.org/jira/browse/SPARK-21314
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.0, 2.0.0
            Reporter: Sumedh Wale
            Priority: Minor


ByteArrayMethods.arrayEquals is invoked frequently in queries, especially for UTF8String comparisons. It shows up as a major contributor to the cost of many kinds of queries involving string values, including simple filters, so improving it would help a wide range of queries.

The current implementation:
{code}
    int i = 0;
    while (i <= length - 8) {
      if (Platform.getLong(leftBase, leftOffset + i) !=
          Platform.getLong(rightBase, rightOffset + i)) {
        return false;
      }
      i += 8;
    }
    while (i < length) {
      if (Platform.getByte(leftBase, leftOffset + i) !=
          Platform.getByte(rightBase, rightOffset + i)) {
        return false;
      }
      i += 1;
    }
{code}

can be optimized in two ways:

a) Use a single getInt comparison for the remaining bytes when at least four are left, which is much faster than four separate byte comparisons.

b) Advance the left and right offsets directly instead of adding "i" to each of them on every loop iteration.

With the above changes, a benchmark on 15-byte strings gives numbers like the following:

{noformat}
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Linux 4.4.0-21-generic
Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
compare arrayEquals:                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
arrayEquals                                   1230 / 1255         81.3          12.3       1.0X
arrayEquals2                                   830 /  846        120.4           8.3       1.5X
{noformat}

The gains vary from 1.2X to 1.6X for different sizes.
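
For illustration, changes (a) and (b) might be sketched as follows. This is not the arrayEquals2 that produced the numbers above, and it is not Spark's code: the class name ArrayEqualsSketch is hypothetical, it takes plain byte[] arguments, and java.nio.ByteBuffer stands in for Platform.getLong/getInt/getByte so that the sketch runs without Spark on the classpath.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch of the two proposed changes; not Spark's ByteArrayMethods. */
final class ArrayEqualsSketch {

  private ArrayEqualsSketch() {}

  /**
   * Returns true if the length bytes of left starting at leftOffset equal
   * the length bytes of right starting at rightOffset.
   */
  static boolean arrayEquals(
      byte[] left, int leftOffset, byte[] right, int rightOffset, int length) {
    // Assumption: ByteBuffer replaces Platform's base-object/offset accessors here.
    ByteBuffer l = ByteBuffer.wrap(left).order(ByteOrder.nativeOrder());
    ByteBuffer r = ByteBuffer.wrap(right).order(ByteOrder.nativeOrder());

    // (b) Advance both offsets directly; no "offset + i" is computed per iteration.
    int endOffset = leftOffset + length;
    int lastLongOffset = endOffset - 8;
    while (leftOffset <= lastLongOffset) {
      if (l.getLong(leftOffset) != r.getLong(rightOffset)) {
        return false;
      }
      leftOffset += 8;
      rightOffset += 8;
    }

    // (a) If at least four bytes remain, compare them with one 4-byte read.
    if (leftOffset <= endOffset - 4) {
      if (l.getInt(leftOffset) != r.getInt(rightOffset)) {
        return false;
      }
      leftOffset += 4;
      rightOffset += 4;
    }

    // At most three bytes are left for the byte-wise loop.
    while (leftOffset < endOffset) {
      if (left[leftOffset] != right[rightOffset]) {
        return false;
      }
      leftOffset += 1;
      rightOffset += 1;
    }
    return true;
  }
}
```

For a 15-byte input this performs one 8-byte comparison, one 4-byte comparison, and three single-byte comparisons, instead of one 8-byte comparison followed by seven single-byte comparisons in the current loop.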


