Posted to issues@spark.apache.org by "WoudyGao (JIRA)" <ji...@apache.org> on 2019/04/28 10:52:00 UTC
[jira] [Created] (SPARK-27586) Improve binary comparison: replace Scala's for-comprehension if statements with while loop
WoudyGao created SPARK-27586:
--------------------------------
Summary: Improve binary comparison: replace Scala's for-comprehension if statements with while loop
Key: SPARK-27586
URL: https://issues.apache.org/jira/browse/SPARK-27586
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.2
Environment: benchmark env:
* Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
* Linux 4.4.0-33.bm.1-amd64
* java version "1.8.0_131"
* Scala 2.11.8
* perf version 4.4.0
Run:
40,000,000 comparisons of 32-byte binary values
Reporter: WoudyGao
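I found that the CPU cost of TypeUtils.compareBinary is noticeable when handling some large Parquet files.
After some profiling with perf, I found:
The "for-comprehension with if statements" version executes roughly 13x as many instructions as an equivalent while loop (about 90.6 billion versus 6.9 billion in the perf output below).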
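For illustration, a minimal sketch of the two loop styles being compared. The names CompareBinarySketch, compareWithFor and compareWithWhile are hypothetical, and this is not claimed to be the exact TypeUtils.compareBinary source; it only assumes an unsigned, byte-by-byte lexicographic comparison of two byte arrays.

```scala
object CompareBinarySketch {

  // For-comprehension with an `if` guard. In Scala 2 this desugars to
  // (0 until x.length).withFilter(i => i < y.length).foreach { i => ... },
  // so each element goes through closure calls, and the `return` inside
  // the closure is a non-local return implemented by throwing an exception.
  def compareWithFor(x: Array[Byte], y: Array[Byte]): Int = {
    for (i <- 0 until x.length; if i < y.length) {
      val res = (x(i) & 0xff) - (y(i) & 0xff) // compare bytes as unsigned
      if (res != 0) return res
    }
    x.length - y.length // common prefix is equal: shorter array sorts first
  }

  // Same comparison written as a plain while loop over the common prefix:
  // no Range, no closures, and an ordinary local return.
  def compareWithWhile(x: Array[Byte], y: Array[Byte]): Int = {
    val limit = math.min(x.length, y.length)
    var i = 0
    while (i < limit) {
      val res = (x(i) & 0xff) - (y(i) & 0xff) // compare bytes as unsigned
      if (res != 0) return res
      i += 1
    }
    x.length - y.length
  }
}
```

Both methods should return the same result for any pair of inputs; only the generated bytecode differs, which is what the perf numbers measure.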
*'while-loop' version perf:*
{{ 886.687949 task-clock (msec) # 1.257 CPUs utilized}}
{{ 3,089 context-switches # 0.003 M/sec}}
{{ 265 cpu-migrations # 0.299 K/sec}}
{{ 12,227 page-faults # 0.014 M/sec}}
{{ 2,209,183,920 cycles # 2.492 GHz}}
{{ <not supported> stalled-cycles-frontend}}
{{ <not supported> stalled-cycles-backend}}
{{ 6,865,836,114 instructions # 3.11 insns per cycle}}
{{ 1,568,910,228 branches # 1769.405 M/sec}}
{{ 9,172,613 branch-misses # 0.58% of all branches}}
{{ 0.705671157 seconds time elapsed}}
*TypeUtils.compareBinary perf:*
{{ 16347.242313 task-clock (msec) # 1.233 CPUs utilized}}
{{ 8,370 context-switches # 0.512 K/sec}}
{{ 481 cpu-migrations # 0.029 K/sec}}
{{ 536,671 page-faults # 0.033 M/sec}}
{{ 40,857,347,119 cycles # 2.499 GHz}}
{{ <not supported> stalled-cycles-frontend}}
{{ <not supported> stalled-cycles-backend}}
{{ 90,606,381,612 instructions # 2.22 insns per cycle}}
{{ 18,107,867,151 branches # 1107.702 M/sec}}
{{ 12,880,296 branch-misses # 0.07% of all branches}}
{{ 13.257617118 seconds time elapsed}}
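The ratios between the two runs can be worked out directly from the counters above. The snippet below is only a convenience sketch; the object and value names (PerfRatios and so on) are hypothetical, and the numbers are copied from the two perf outputs.

```scala
object PerfRatios {
  // Counter values copied from the 'while-loop' perf output.
  val whileInstructions = 6865836114L
  val whileCycles       = 2209183920L
  val whileSeconds      = 0.705671157

  // Counter values copied from the TypeUtils.compareBinary perf output.
  val forInstructions = 90606381612L
  val forCycles       = 40857347119L
  val forSeconds      = 13.257617118

  // Ratio of the for-comprehension run to the while-loop run.
  val instructionRatio: Double = forInstructions.toDouble / whileInstructions.toDouble
  val cycleRatio: Double       = forCycles.toDouble / whileCycles.toDouble
  val elapsedRatio: Double     = forSeconds / whileSeconds

  def main(args: Array[String]): Unit = {
    println(f"instructions: $instructionRatio%.1fx")
    println(f"cycles:       $cycleRatio%.1fx")
    println(f"elapsed time: $elapsedRatio%.1fx")
  }
}
```

With these inputs the instruction count comes out at roughly 13x, and cycles and elapsed time at roughly 18x to 19x, in favor of the while loop.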