Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2019/05/03 03:33:00 UTC
[jira] [Resolved] (SPARK-27586) Improve binary comparison: replace
Scala's for-comprehension if statements with while loop
[ https://issues.apache.org/jira/browse/SPARK-27586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun resolved SPARK-27586.
-----------------------------------
Resolution: Fixed
Assignee: WoudyGao
Fix Version/s: 3.0.0
This is resolved via https://github.com/apache/spark/pull/24494
> Improve binary comparison: replace Scala's for-comprehension if statements with while loop
> ------------------------------------------------------------------------------------------
>
> Key: SPARK-27586
> URL: https://issues.apache.org/jira/browse/SPARK-27586
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.2
> Environment: benchmark env:
> * Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
> * Linux 4.4.0-33.bm.1-amd64
> * java version "1.8.0_131"
> * Scala 2.11.8
> * perf version 4.4.0
> Run:
> 40,000,000 comparisons of 32-byte binary values
>
> Reporter: WoudyGao
> Assignee: WoudyGao
> Priority: Minor
> Fix For: 3.0.0
>
>
> I found that the CPU cost of TypeUtils.compareBinary is noticeable when handling some big Parquet files.
> After profiling with perf, I found that
> the for-comprehension with an "if" guard executes ≈15X as many instructions as an equivalent while loop.
>
> *'while-loop' version perf:*
>
> {{ 886.687949 task-clock (msec) # 1.257 CPUs utilized}}
> {{ 3,089 context-switches # 0.003 M/sec}}
> {{ 265 cpu-migrations # 0.299 K/sec}}
> {{ 12,227 page-faults # 0.014 M/sec}}
> {{ 2,209,183,920 cycles # 2.492 GHz}}
> {{ <not supported> stalled-cycles-frontend}}
> {{ <not supported> stalled-cycles-backend}}
> {{ 6,865,836,114 instructions # 3.11 insns per cycle}}
> {{ 1,568,910,228 branches # 1769.405 M/sec}}
> {{ 9,172,613 branch-misses # 0.58% of all branches}}
>
> {{ 0.705671157 seconds time elapsed}}
>
> *TypeUtils.compareBinary perf:*
> {{ 16347.242313 task-clock (msec) # 1.233 CPUs utilized}}
> {{ 8,370 context-switches # 0.512 K/sec}}
> {{ 481 cpu-migrations # 0.029 K/sec}}
> {{ 536,671 page-faults # 0.033 M/sec}}
> {{ 40,857,347,119 cycles # 2.499 GHz}}
> {{ <not supported> stalled-cycles-frontend}}
> {{ <not supported> stalled-cycles-backend}}
> {{ 90,606,381,612 instructions # 2.22 insns per cycle}}
> {{ 18,107,867,151 branches # 1107.702 M/sec}}
> {{ 12,880,296 branch-misses # 0.07% of all branches}}
>
> {{ 13.257617118 seconds time elapsed}}
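
For readers who want to see the two loop shapes being compared, here is a minimal sketch in Scala. It is an illustration of the pattern named in the issue title, not necessarily the code that was benchmarked or merged in the pull request; the object name BinaryCompareSketch and the method names compareBinaryFor and compareBinaryWhile are hypothetical. Both methods compare byte arrays as unsigned values, lexicographically, with a shorter array ordered before a longer one that shares its prefix.

```scala
// Hypothetical names; a sketch of the two loop styles, not Spark's actual source.
object BinaryCompareSketch {

  // For-comprehension with an `if` guard. The compiler desugars this into
  // Range.withFilter(...).foreach(...) with closures, and the early `return`
  // becomes a non-local return implemented by throwing a control exception.
  def compareBinaryFor(x: Array[Byte], y: Array[Byte]): Int = {
    for (i <- 0 until x.length; if i < y.length) {
      val res = (x(i) & 0xff) - (y(i) & 0xff) // compare bytes as unsigned
      if (res != 0) return res
    }
    x.length - y.length
  }

  // Plain while loop: no Range, no closures, and an ordinary local return.
  def compareBinaryWhile(x: Array[Byte], y: Array[Byte]): Int = {
    val limit = math.min(x.length, y.length)
    var i = 0
    while (i < limit) {
      val res = (x(i) & 0xff) - (y(i) & 0xff) // compare bytes as unsigned
      if (res != 0) return res
      i += 1
    }
    x.length - y.length
  }
}
```

The two methods are intended to return the same result for any pair of inputs; the difference is only in how much work the loop machinery does per element, which is what the instruction counts above measure.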