Posted to common-issues@hadoop.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2018/06/20 19:41:00 UTC

[jira] [Commented] (HADOOP-15551) Avoid use of Java8 streams in Configuration.addTags

    [ https://issues.apache.org/jira/browse/HADOOP-15551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16518526#comment-16518526 ] 

Todd Lipcon commented on HADOOP-15551:
--------------------------------------

perf stat results for a simple Java program that instantiates and uses a single Configuration:
{code}
without patch:

       1220.803922      task-clock (msec)         #    2.075 CPUs utilized            ( +-  0.97% )
             2,038      context-switches          #    0.002 M/sec                    ( +-  0.52% )
                39      cpu-migrations            #    0.032 K/sec                    ( +-  5.07% )
            22,468      page-faults               #    0.018 M/sec                    ( +-  0.41% )
     3,992,441,054      cycles                    #    3.270 GHz                      ( +-  0.78% )
     4,458,310,856      instructions              #    1.12  insn per cycle           ( +-  0.71% )
       833,135,256      branches                  #  682.448 M/sec                    ( +-  0.70% )
        34,458,171      branch-misses             #    4.14% of all branches          ( +-  0.76% )

       0.588308028 seconds time elapsed                                          ( +-  1.80% )

with patch:

       1158.420617      task-clock (msec)         #    2.106 CPUs utilized            ( +-  0.80% )
             1,998      context-switches          #    0.002 M/sec                    ( +-  0.93% )
                40      cpu-migrations            #    0.035 K/sec                    ( +-  9.65% )
            22,025      page-faults               #    0.019 M/sec                    ( +-  0.45% )
     3,957,999,054      cycles                    #    3.417 GHz                      ( +-  0.89% )
     4,468,617,304      instructions              #    1.13  insn per cycle           ( +-  0.71% )
       834,835,030      branches                  #  720.667 M/sec                    ( +-  0.72% )
        34,494,708      branch-misses             #    4.13% of all branches          ( +-  0.67% )

       0.550146256 seconds time elapsed                                          ( +-  0.92% )
{code}

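(i.e., this silly change saves roughly 50-60ms of CPU: task-clock drops from 1220.8 to 1158.4 msec, and elapsed time from 0.588 to 0.550 seconds)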

> Avoid use of Java8 streams in Configuration.addTags
> ---------------------------------------------------
>
>                 Key: HADOOP-15551
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15551
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: performance
>    Affects Versions: 3.2
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Major
>         Attachments: hadoop-15551.txt
>
>
> Configuration.addTags oddly uses Arrays.stream instead of a more conventional mechanism. When profiling a simple program that uses Configuration, I found that addTags took tens of milliseconds of CPU to do very little work the first time it is called, accounting for ~8% of total profiler samples in my program.
> {code}
> [9] 4.52% 253 self: 0.00% 0 java/lang/invoke/MethodHandleNatives.linkCallSite
> [9] 3.71% 208 self: 0.00% 0 java/lang/invoke/MethodHandleNatives.linkMethodHandleConstant
> {code}
> I don't know much about the implementation details of Streams, but they seem to be meant more for cases such as very large arrays. Switching to a normal Set.addAll() call eliminates this from the profile.
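
For readers without the attachment at hand, the kind of change described above can be sketched roughly as follows. This is a minimal illustration, not the actual Hadoop source or the contents of hadoop-15551.txt: the class name AddTagsSketch, both method names, and the sample tag strings are hypothetical. The stream variant passes a lambda, which javac compiles to an invokedynamic call site that the JVM links the first time it runs; that one-time linkage is likely what the MethodHandleNatives frames in the profile reflect.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class AddTagsSketch {

    // Stream-based variant (hypothetical stand-in for the original code).
    // The lambda below becomes an invokedynamic call site, which the JVM
    // has to link on first execution.
    static void addTagsWithStream(Set<String> tags, String commaSeparated) {
        Arrays.stream(commaSeparated.split(",")).forEach(tag -> tags.add(tag));
    }

    // Conventional variant: no lambda and no stream pipeline, only a
    // plain collection call.
    static void addTagsWithAddAll(Set<String> tags, String commaSeparated) {
        tags.addAll(Arrays.asList(commaSeparated.split(",")));
    }

    public static void main(String[] args) {
        Set<String> viaStream = new TreeSet<>();
        Set<String> viaAddAll = new TreeSet<>();

        // Hypothetical tag values; duplicates collapse in both variants.
        addTagsWithStream(viaStream, "alpha,beta,alpha");
        addTagsWithAddAll(viaAddAll, "alpha,beta,alpha");

        // Both variants produce the same set contents.
        System.out.println(viaStream.equals(viaAddAll)); // prints "true"
    }
}
```

Both methods leave the set in the same state, so the swap only affects how the work gets done, not the result.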


