Posted to issues@spark.apache.org by "WoudyGao (JIRA)" <ji...@apache.org> on 2019/04/28 10:52:00 UTC

[jira] [Created] (SPARK-27586) Improve binary comparison: replace Scala's for-comprehension if statements with while loop

WoudyGao created SPARK-27586:
--------------------------------

             Summary: Improve binary comparison: replace Scala's for-comprehension if statements with while loop
                 Key: SPARK-27586
                 URL: https://issues.apache.org/jira/browse/SPARK-27586
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.2
         Environment: benchmark env:
 * Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
 * Linux 4.4.0-33.bm.1-amd64
 * java version "1.8.0_131"
 * Scala 2.11.8
 * perf version 4.4.0

Run:

40,000,000 comparisons of 32-byte binary values

 
            Reporter: WoudyGao


I found that the CPU cost of TypeUtils.compareBinary is noticeable when handling some large Parquet files.

After some profiling with perf, I found that the for-comprehension with an if guard executes ≈15X as many instructions as an equivalent while loop.
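The two loop styles being compared can be sketched as follows. This is a minimal illustration, not necessarily the exact Spark source: the object name `BinaryCompareSketch` and the method names `compareWithFor` and `compareWithWhile` are hypothetical. In Scala 2.11, a `for` over a `Range` with an `if` guard desugars to `withFilter(...).foreach(...)` with closures, and a `return` inside it becomes a non-local return, which is a plausible source of the extra instructions.

```scala
object BinaryCompareSketch {
  // For-comprehension with a guard: desugars to
  // (0 until x.length).withFilter(i => i < y.length).foreach(i => ...),
  // and the early `return` is a non-local return out of the closure.
  def compareWithFor(x: Array[Byte], y: Array[Byte]): Int = {
    for (i <- 0 until x.length; if i < y.length) {
      // Compare bytes as unsigned values (0..255).
      val res = (x(i) & 0xff) - (y(i) & 0xff)
      if (res != 0) return res
    }
    // All shared bytes are equal: the shorter array sorts first.
    x.length - y.length
  }

  // While loop: plain index arithmetic, no closures, ordinary local return.
  def compareWithWhile(x: Array[Byte], y: Array[Byte]): Int = {
    val limit = math.min(x.length, y.length)
    var i = 0
    while (i < limit) {
      val res = (x(i) & 0xff) - (y(i) & 0xff)
      if (res != 0) return res
      i += 1
    }
    x.length - y.length
  }
}
```

Both methods are intended to return the same result for any pair of inputs; only the generated bytecode differs.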

 

*'while-loop' version perf:*
 
{{        886.687949      task-clock (msec)         #    1.257 CPUs utilized}}
{{             3,089      context-switches          #    0.003 M/sec}}
{{               265      cpu-migrations            #    0.299 K/sec}}
{{            12,227      page-faults               #    0.014 M/sec}}
{{     2,209,183,920      cycles                    #    2.492 GHz}}
{{   <not supported>      stalled-cycles-frontend}}
{{   <not supported>      stalled-cycles-backend}}
{{     6,865,836,114      instructions              #    3.11  insns per cycle}}
{{     1,568,910,228      branches                  # 1769.405 M/sec}}
{{         9,172,613      branch-misses             #    0.58% of all branches}}
 
{{       0.705671157 seconds time elapsed}}
 

*TypeUtils.compareBinary (for-comprehension version) perf:*
{{      16347.242313      task-clock (msec)         #    1.233 CPUs utilized}}
{{             8,370      context-switches          #    0.512 K/sec}}
{{               481      cpu-migrations            #    0.029 K/sec}}
{{           536,671      page-faults               #    0.033 M/sec}}
{{    40,857,347,119      cycles                    #    2.499 GHz}}
{{   <not supported>      stalled-cycles-frontend}}
{{   <not supported>      stalled-cycles-backend}}
{{    90,606,381,612      instructions              #    2.22  insns per cycle}}
{{    18,107,867,151      branches                  # 1107.702 M/sec}}
{{        12,880,296      branch-misses             #    0.07% of all branches}}
 
{{      13.257617118 seconds time elapsed}}
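For reference, the ratios implied by the two perf outputs above can be computed directly. The sketch below copies the counters from this report; the object name `PerfRatioSketch` and its field names are hypothetical. These are whole-process counters, so JVM startup is presumably included in both runs and the ratios are only a rough guide. They work out to roughly 13x the instructions, 18x the cycles, and 19x the elapsed time.

```scala
object PerfRatioSketch {
  // Counters copied from the 'while-loop' perf output.
  val whileInstructions = 6865836114L
  val whileCycles = 2209183920L
  val whileElapsedSec = 0.705671157

  // Counters copied from the TypeUtils.compareBinary perf output.
  val forInstructions = 90606381612L
  val forCycles = 40857347119L
  val forElapsedSec = 13.257617118

  // How many times larger the slow measurement is than the fast one.
  def ratio(slow: Double, fast: Double): Double = slow / fast

  def main(args: Array[String]): Unit = {
    val instr = ratio(forInstructions.toDouble, whileInstructions.toDouble)
    val cycles = ratio(forCycles.toDouble, whileCycles.toDouble)
    val elapsed = ratio(forElapsedSec, whileElapsedSec)
    println(f"instructions: $instr%.1fx")
    println(f"cycles:       $cycles%.1fx")
    println(f"elapsed:      $elapsed%.1fx")
  }
}
```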


