Posted to common-issues@hadoop.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2018/06/20 19:41:00 UTC
[jira] [Commented] (HADOOP-15551) Avoid use of Java8 streams in Configuration.addTags
[ https://issues.apache.org/jira/browse/HADOOP-15551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16518526#comment-16518526 ]
Todd Lipcon commented on HADOOP-15551:
--------------------------------------
Perf results for a simple Java program that instantiates and uses a single Configuration:
{code}
without patch:
1220.803922 task-clock (msec) # 2.075 CPUs utilized ( +- 0.97% )
2,038 context-switches # 0.002 M/sec ( +- 0.52% )
39 cpu-migrations # 0.032 K/sec ( +- 5.07% )
22,468 page-faults # 0.018 M/sec ( +- 0.41% )
3,992,441,054 cycles # 3.270 GHz ( +- 0.78% )
4,458,310,856 instructions # 1.12 insn per cycle ( +- 0.71% )
833,135,256 branches # 682.448 M/sec ( +- 0.70% )
34,458,171 branch-misses # 4.14% of all branches ( +- 0.76% )
0.588308028 seconds time elapsed ( +- 1.80% )
with patch:
1158.420617 task-clock (msec) # 2.106 CPUs utilized ( +- 0.80% )
1,998 context-switches # 0.002 M/sec ( +- 0.93% )
40 cpu-migrations # 0.035 K/sec ( +- 9.65% )
22,025 page-faults # 0.019 M/sec ( +- 0.45% )
3,957,999,054 cycles # 3.417 GHz ( +- 0.89% )
4,468,617,304 instructions # 1.13 insn per cycle ( +- 0.71% )
834,835,030 branches # 720.667 M/sec ( +- 0.72% )
34,494,708 branch-misses # 4.13% of all branches ( +- 0.67% )
0.550146256 seconds time elapsed ( +- 0.92% )
{code}
(i.e. this silly change saves ~50ms of CPU)
> Avoid use of Java8 streams in Configuration.addTags
> ---------------------------------------------------
>
> Key: HADOOP-15551
> URL: https://issues.apache.org/jira/browse/HADOOP-15551
> Project: Hadoop Common
> Issue Type: Improvement
> Components: performance
> Affects Versions: 3.2
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Major
> Attachments: hadoop-15551.txt
>
>
> Configuration.addTags oddly uses Arrays.stream instead of a more conventional mechanism. When profiling a simple program that uses Configuration, I found that addTags was taking tens of millis of CPU to do very little work the first time it's called, accounting for ~8% of total profiler samples in my program.
> {code}
> [9] 4.52% 253 self: 0.00% 0 java/lang/invoke/MethodHandleNatives.linkCallSite
> [9] 3.71% 208 self: 0.00% 0 java/lang/invoke/MethodHandleNatives.linkMethodHandleConstant
> {code}
> I don't know much about the implementation details of Streams, but they seem meant more for cases with very large arrays. Switching to a plain Set.addAll() call eliminates this from the profile.
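The difference between the two approaches can be sketched as follows. This is a hypothetical, self-contained illustration rather than the actual Hadoop code or patch: the class `TagParsing`, both method names, and the comma-separated input are assumptions. The stream version passes a lambda, which the compiler turns into an `invokedynamic` call site that the JVM links the first time it runs. The `MethodHandleNatives` frames in the profile are consistent with that one-time linkage. The plain `Set.addAll()` version has no lambda to link.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not the actual Hadoop code or patch. Both methods add
// the tags from a comma-separated property value to a target set.
public class TagParsing {

  // Stream-based variant, in the spirit of the original addTags. On first
  // execution the JVM must link this lambda's invokedynamic call site and
  // load the stream classes; later calls skip that one-time work.
  static void addTagsWithStream(Set<String> target, String csv) {
    Arrays.stream(csv.split(",")).forEach(tag -> target.add(tag));
  }

  // Plain-collections variant matching the described fix: a normal
  // Set.addAll() call with no lambda, so there is no call site to link.
  static void addTagsWithAddAll(Set<String> target, String csv) {
    target.addAll(Arrays.asList(csv.split(",")));
  }

  // Times the first call of one variant. Run each mode in a fresh JVM
  // ("java TagParsing stream", "java TagParsing addAll") so that neither
  // variant warms up the other. A single nanoTime sample is only a rough guide.
  public static void main(String[] args) {
    boolean useStream = args.length > 0 && args[0].equals("stream");
    String csv = "HDFS,YARN,REQUIRED"; // illustrative tag values only
    Set<String> tags = new HashSet<>();
    long start = System.nanoTime();
    if (useStream) {
      addTagsWithStream(tags, csv);
    } else {
      addTagsWithAddAll(tags, csv);
    }
    long micros = (System.nanoTime() - start) / 1_000;
    System.out.println((useStream ? "stream" : "addAll")
        + " first call: " + micros + " us, tags=" + tags);
  }
}
```

Both variants produce the same set, so the change affects only cost. The repeated `perf stat` runs in the comment above remain the more trustworthy measurement; this harness only isolates the first-call effect.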