Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2019/04/30 06:34:00 UTC

[jira] [Assigned] (SPARK-27586) Improve binary comparison: replace Scala's for-comprehension if statements with while loop

     [ https://issues.apache.org/jira/browse/SPARK-27586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-27586:
------------------------------------

    Assignee: Apache Spark

> Improve binary comparison: replace Scala's for-comprehension if statements with while loop
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-27586
>                 URL: https://issues.apache.org/jira/browse/SPARK-27586
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.2
>         Environment: benchmark env:
>  * Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
>  * Linux 4.4.0-33.bm.1-amd64
>  * java version "1.8.0_131"
>  * Scala 2.11.8
>  * perf version 4.4.0
> Run:
> 40,000,000 comparisons of 32-byte binary values
>  
>            Reporter: WoudyGao
>            Assignee: Apache Spark
>            Priority: Minor
>
> I found that the CPU cost of TypeUtils.compareBinary is noticeable when handling some large Parquet files.
> After profiling with perf, I found that
> the for-comprehension with an if guard executes ≈15X as many instructions as an equivalent while loop.
>  
> *'while-loop' version perf:*
>   
>  {{        886.687949      task-clock (msec)         #    1.257 CPUs utilized}}
>  {{             3,089      context-switches          #    0.003 M/sec}}
>  {{               265      cpu-migrations            #    0.299 K/sec}}
>  {{            12,227      page-faults               #    0.014 M/sec}}
>  {{     2,209,183,920      cycles                    #    2.492 GHz}}
>  {{   <not supported>      stalled-cycles-frontend}}
>  {{   <not supported>      stalled-cycles-backend}}
>  {{     6,865,836,114      instructions              #    3.11  insns per cycle}}
>  {{     1,568,910,228      branches                  # 1769.405 M/sec}}
>  {{         9,172,613      branch-misses             #    0.58% of all branches}}
>   
>  {{       0.705671157 seconds time elapsed}}
>   
> *TypeUtils.compareBinary perf:*
>  {{      16347.242313      task-clock (msec)         #    1.233 CPUs utilized}}
>  {{             8,370      context-switches          #    0.512 K/sec}}
>  {{               481      cpu-migrations            #    0.029 K/sec}}
>  {{           536,671      page-faults               #    0.033 M/sec}}
>  {{    40,857,347,119      cycles                    #    2.499 GHz}}
>  {{   <not supported>      stalled-cycles-frontend}}
>  {{   <not supported>      stalled-cycles-backend}}
>  {{    90,606,381,612      instructions              #    2.22  insns per cycle}}
>  {{    18,107,867,151      branches                  # 1107.702 M/sec}}
>  {{        12,880,296      branch-misses             #    0.07% of all branches}}
>   
>  {{      13.257617118 seconds time elapsed}}
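
For readers unfamiliar with the pattern, the following Scala sketch contrasts the two loop styles named in the issue title. It is an illustration only, not the Spark source and not the reporter's benchmark: the object name BinaryCompareSketch and the method names compareWithFor and compareWithWhile are hypothetical, and Scala 2 is assumed. Both methods compare byte arrays as unsigned values and fall back to the length difference when one array is a prefix of the other.

```scala
// Hypothetical sketch (Scala 2 assumed); names are illustrative, not Spark's API.
object BinaryCompareSketch {

  // For-comprehension with an `if` guard. In Scala 2 this desugars to
  // (0 until x.length).withFilter(...).foreach(...), so each iteration goes
  // through closures, and the early `return` becomes a non-local return.
  def compareWithFor(x: Array[Byte], y: Array[Byte]): Int = {
    for (i <- 0 until x.length; if i < y.length) {
      // Mask with 0xff so bytes are compared as unsigned values (0..255).
      val res = (x(i) & 0xff) - (y(i) & 0xff)
      if (res != 0) return res
    }
    x.length - y.length
  }

  // Plain while loop over the shared prefix: no Range, no closures.
  def compareWithWhile(x: Array[Byte], y: Array[Byte]): Int = {
    val limit = math.min(x.length, y.length)
    var i = 0
    while (i < limit) {
      val res = (x(i) & 0xff) - (y(i) & 0xff)
      if (res != 0) return res
      i += 1
    }
    x.length - y.length
  }

  def main(args: Array[String]): Unit = {
    val a = Array[Byte](1, 2, 3)
    val b = Array[Byte](1, 2, 4)
    // Both variants should agree on every input.
    println(compareWithFor(a, b) == compareWithWhile(a, b))
  }
}
```

The two methods are meant to return the same result for any pair of arrays; only the loop machinery differs, which is the part the perf counters above are measuring.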


