Posted to issues@spark.apache.org by "Sumedh Wale (JIRA)" <ji...@apache.org> on 2017/07/05 09:34:00 UTC
[jira] [Created] (SPARK-21314) ByteArrayMethods.arrayEquals could use some optimizations
Sumedh Wale created SPARK-21314:
-----------------------------------
Summary: ByteArrayMethods.arrayEquals could use some optimizations
Key: SPARK-21314
URL: https://issues.apache.org/jira/browse/SPARK-21314
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.1.0, 2.0.0
Reporter: Sumedh Wale
Priority: Minor
ByteArrayMethods.arrayEquals is invoked frequently during query execution, especially for UTF8String comparisons. It is a major contributor to execution time in many kinds of queries on string values, such as simple filters, so improving it would benefit a wide range of queries.
The current implementation:
{code}
int i = 0;
// Compare 8 bytes at a time while at least 8 bytes remain.
while (i <= length - 8) {
  if (Platform.getLong(leftBase, leftOffset + i) !=
      Platform.getLong(rightBase, rightOffset + i)) {
    return false;
  }
  i += 8;
}
// Compare the remaining 0-7 bytes one byte at a time.
while (i < length) {
  if (Platform.getByte(leftBase, leftOffset + i) !=
      Platform.getByte(rightBase, rightOffset + i)) {
    return false;
  }
  i += 1;
}
{code}
This can be optimized in two ways:
a) when at least 4 bytes remain after the 8-byte loop, compare them with a single getInt, which is much faster than four single-byte comparisons
b) advance the left and right offsets directly instead of adding "i" to each offset on every iteration
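Both changes can be sketched in a self-contained form as below. This is a hypothetical illustration, not the code behind the "arrayEquals2" benchmark row: to stay runnable outside Spark, it assumes plain byte[] inputs with int offsets and reads through java.nio.ByteBuffer absolute getters instead of Spark's Platform.getLong/getInt/getByte on (base, offset) pairs. Byte order does not matter here because only equality is tested.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public final class ArrayEqualsSketch {

  // Hypothetical stand-in for the optimized arrayEquals: byte[] + int offsets
  // replace Spark's (Object base, long offset) addressing via Platform.
  public static boolean arrayEquals2(byte[] left, int leftOffset,
                                     byte[] right, int rightOffset, int length) {
    ByteBuffer l = ByteBuffer.wrap(left);
    ByteBuffer r = ByteBuffer.wrap(right);
    int leftEnd = leftOffset + length;

    // (b) Advance both offsets directly instead of computing offset + i.
    // Compare 8 bytes at a time while at least 8 bytes remain.
    while (leftOffset <= leftEnd - 8) {
      if (l.getLong(leftOffset) != r.getLong(rightOffset)) {
        return false;
      }
      leftOffset += 8;
      rightOffset += 8;
    }
    // (a) At most 7 bytes remain: one 4-byte comparison replaces four byte comparisons.
    if (leftOffset <= leftEnd - 4) {
      if (l.getInt(leftOffset) != r.getInt(rightOffset)) {
        return false;
      }
      leftOffset += 4;
      rightOffset += 4;
    }
    // At most 3 bytes remain: compare them one at a time.
    while (leftOffset < leftEnd) {
      if (l.get(leftOffset) != r.get(rightOffset)) {
        return false;
      }
      leftOffset += 1;
      rightOffset += 1;
    }
    return true;
  }

  public static void main(String[] args) {
    byte[] a = "abcdefghijklmno".getBytes(StandardCharsets.US_ASCII); // 15 bytes
    byte[] b = "abcdefghijklmno".getBytes(StandardCharsets.US_ASCII);
    byte[] c = "abcdefghijklmnp".getBytes(StandardCharsets.US_ASCII);
    System.out.println(arrayEquals2(a, 0, b, 0, a.length)); // prints true
    System.out.println(arrayEquals2(a, 0, c, 0, a.length)); // prints false (last byte differs)
  }
}
```

For a 15-byte string the original loops perform 1 long comparison plus 7 byte comparisons (8 in total), while the sketch performs 1 long, 1 int and 3 byte comparisons (5 in total), since 15 = 8 + 4 + 3.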
With these changes, a benchmark on 15-byte strings gives the following numbers:
{noformat}
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Linux 4.4.0-21-generic
Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
compare arrayEquals: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
arrayEquals 1230 / 1255 81.3 12.3 1.0X
arrayEquals2 830 / 846 120.4 8.3 1.5X
{noformat}
The gains range from 1.2X to 1.6X depending on the string size.