Posted to yarn-issues@hadoop.apache.org by "Eric Yang (JIRA)" <ji...@apache.org> on 2018/05/21 18:58:00 UTC

[jira] [Comment Edited] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6

    [ https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482905#comment-16482905 ] 

Eric Yang edited comment on YARN-8326 at 5/21/18 6:57 PM:
----------------------------------------------------------

This appears to have been introduced by YARN-5662, which turned the container monitor on (from false to true).  The monitor is used by opportunistic container scheduling and preemption to gather container statistics for scheduling decisions.  You can disable it with:

{code}
    <property>
      <name>yarn.nodemanager.container-monitor.enabled</name>
      <value>false</value>
    </property>
{code}

Or reduce the stats collection interval from 3 seconds to 300 milliseconds (uses more system resources, but gives faster scheduling):

{code}
    <property>
      <name>yarn.nodemanager.container-monitor.interval-ms</name>
      <value>300</value>
    </property>
{code}
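
Before changing either property, it may help to check what a given yarn-site.xml already sets. The following is a minimal sketch using only the Python standard library; the two property names come from this comment, while the helper name read_monitor_settings and the SAMPLE text are made up for illustration. It reports only what the file contains and does not guess at defaults.

```python
import xml.etree.ElementTree as ET

# Property names are the two mentioned in this comment.
MONITOR_KEYS = (
    "yarn.nodemanager.container-monitor.enabled",
    "yarn.nodemanager.container-monitor.interval-ms",
)


def read_monitor_settings(xml_text):
    """Return {property name: value, or None if the file does not set it}."""
    root = ET.fromstring(xml_text.strip())
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in MONITOR_KEYS:
            found[name] = (prop.findtext("value") or "").strip()
    return {key: found.get(key) for key in MONITOR_KEYS}


# Hypothetical sample; in practice read the text of your own yarn-site.xml.
SAMPLE = """
<configuration>
  <property>
    <name>yarn.nodemanager.container-monitor.interval-ms</name>
    <value>3000</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    for key, value in read_monitor_settings(SAMPLE).items():
        print(key, "=", value if value is not None else "(not set in this file)")
```

A property reported as not set falls back to whatever default the running Hadoop version ships with, so the sketch only tells you whether the file overrides it.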

It might be possible to optimize the timer in the work done in YARN-2883.  The queuing and scheduling of containers are based on information from the monitoring thread.  If it takes several seconds for that information to become available before the next container is scheduled, this can introduce an artificial delay when launching containers rapidly.  The timer value cannot be smaller than a certain value; otherwise monitoring and container forking will both tax CPU resources too much.  If your workloads take less time than container scheduling/launch, you might need to revisit the design to launch fewer containers and run more work in each container.  [~hlhuang@us.ibm.com] Can you confirm whether those settings change the benchmark result?
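
To illustrate why the polling interval matters for short-lived containers, here is a toy model. It is an assumption-laden sketch, not how the NodeManager actually schedules: containers run strictly one after another, each does a fixed amount of work, and the next one can start only at the first monitor tick at or after the previous one finishes. The function names and the 1000 ms work time are hypothetical.

```python
def wait_until_next_tick(event_ms, interval_ms):
    """Milliseconds from an event to the first poll tick at or after it."""
    # Integer ceiling division avoids floating-point rounding.
    next_tick = -(-event_ms // interval_ms) * interval_ms
    return next_tick - event_ms


def total_polling_delay(work_ms, containers, interval_ms):
    """Time spent purely waiting for ticks in the toy sequential model."""
    clock = 0
    waited = 0
    for _ in range(containers):
        clock += work_ms  # the container does its work
        delay = wait_until_next_tick(clock, interval_ms)
        waited += delay   # idle until the poller notices
        clock += delay
    return waited


if __name__ == "__main__":
    for interval in (3000, 300):
        waited = total_polling_delay(work_ms=1000, containers=4, interval_ms=interval)
        print(f"{interval} ms interval: {waited} ms waiting")
    # 3000 ms interval: 8000 ms waiting
    # 300 ms interval: 800 ms waiting
```

In this model the waiting time shrinks with the interval, while the number of polls per second grows by the same factor, which is the CPU trade-off described above.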



> Yarn 3.0 seems runs slower than Yarn 2.6
> ----------------------------------------
>
>                 Key: YARN-8326
>                 URL: https://issues.apache.org/jira/browse/YARN-8326
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.0.0
>         Environment: This is the yarn-site.xml for 3.0. 
>  
> <configuration>
> <property>
>  <name>hadoop.registry.dns.bind-port</name>
>  <value>5353</value>
>  </property>
> <property>
>  <name>hadoop.registry.dns.domain-name</name>
>  <value>hwx.site</value>
>  </property>
> <property>
>  <name>hadoop.registry.dns.enabled</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>hadoop.registry.dns.zone-mask</name>
>  <value>255.255.255.0</value>
>  </property>
> <property>
>  <name>hadoop.registry.dns.zone-subnet</name>
>  <value>172.17.0.0</value>
>  </property>
> <property>
>  <name>manage.include.files</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.acl.enable</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.admin.acl</name>
>  <value>yarn</value>
>  </property>
> <property>
>  <name>yarn.client.nodemanager-connect.max-wait-ms</name>
>  <value>60000</value>
>  </property>
> <property>
>  <name>yarn.client.nodemanager-connect.retry-interval-ms</name>
>  <value>10000</value>
>  </property>
> <property>
>  <name>yarn.http.policy</name>
>  <value>HTTP_ONLY</value>
>  </property>
> <property>
>  <name>yarn.log-aggregation-enable</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.log-aggregation.retain-seconds</name>
>  <value>2592000</value>
>  </property>
> <property>
>  <name>yarn.log.server.url</name>
>  <value>http://xxxxxx:19888/jobhistory/logs</value>
>  </property>
> <property>
>  <name>yarn.log.server.web-service.url</name>
>  <value>http://xxxxxx:8188/ws/v1/applicationhistory</value>
>  </property>
> <property>
>  <name>yarn.node-labels.enabled</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.node-labels.fs-store.retry-policy-spec</name>
>  <value>2000, 500</value>
>  </property>
> <property>
>  <name>yarn.node-labels.fs-store.root-dir</name>
>  <value>/system/yarn/node-labels</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.address</name>
>  <value>0.0.0.0:45454</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.admin-env</name>
>  <value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.aux-services</name>
>  <value>mapreduce_shuffle,spark2_shuffle,timeline_collector</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
>  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.aux-services.spark2_shuffle.class</name>
>  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.aux-services.spark2_shuffle.classpath</name>
>  <value>/usr/spark2/aux/*</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
>  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.aux-services.timeline_collector.class</name>
>  <value>org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.bind-host</name>
>  <value>0.0.0.0</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.container-executor.class</name>
>  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.container-metrics.unregister-delay-ms</name>
>  <value>60000</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.container-monitor.interval-ms</name>
>  <value>3000</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.delete.debug-delay-sec</name>
>  <value>0</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
>  <value>90</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name>
>  <value>1000</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
>  <value>0.25</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.health-checker.interval-ms</name>
>  <value>135000</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.health-checker.script.timeout-ms</name>
>  <value>60000</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.linux-container-executor.group</name>
>  <value>hadoop</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.local-dirs</name>
>  <value>/hadoop/yarn/local</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.log-aggregation.compression-type</name>
>  <value>gz</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.log-aggregation.debug-enabled</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.log-aggregation.num-log-files-per-app</name>
>  <value>30</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
>  <value>3600</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.log-dirs</name>
>  <value>/hadoop/yarn/log</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.log.retain-seconds</name>
>  <value>604800</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.pmem-check-enabled</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.recovery.dir</name>
>  <value>/var/log/hadoop-yarn/nodemanager/recovery-state</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.recovery.enabled</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.recovery.supervised</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.remote-app-log-dir</name>
>  <value>/app-logs</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
>  <value>logs</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.resource-plugins</name>
>  <value></value>
>  </property>
> <property>
>  <name>yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices</name>
>  <value>auto</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.resource-plugins.gpu.docker-plugin</name>
>  <value>nvidia-docker-v1</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.resource-plugins.gpu.docker-plugin.nvidiadocker-v1.endpoint</name>
>  <value>http://localhost:3476/v1.0/docker/cli</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables</name>
>  <value></value>
>  </property>
> <property>
>  <name>yarn.nodemanager.resource.cpu-vcores</name>
>  <value>6</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.resource.memory-mb</name>
>  <value>12288</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.resource.percentage-physical-cpu-limit</name>
>  <value>80</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
>  <value>default,docker</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.runtime.linux.docker.allowed-container-networks</name>
>  <value>host,none,bridge</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.runtime.linux.docker.capabilities</name>
>  <value>
>  CHOWN,DAC_OVERRIDE,FSETID,FOWNER,MKNOD,NET_RAW,SETGID,SETUID,SETFCAP,
>  SETPCAP,NET_BIND_SERVICE,SYS_CHROOT,KILL,AUDIT_WRITE</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.runtime.linux.docker.default-container-network</name>
>  <value>host</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.runtime.linux.docker.privileged-containers.acl</name>
>  <value></value>
>  </property>
> <property>
>  <name>yarn.nodemanager.runtime.linux.docker.privileged-containers.allowed</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.vmem-check-enabled</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.vmem-pmem-ratio</name>
>  <value>2.1</value>
>  </property>
> <property>
>  <name>yarn.nodemanager.webapp.cross-origin.enabled</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.address</name>
>  <value>xxxxxxx:8050</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.admin.address</name>
>  <value>xxxxxx:8141</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.am.max-attempts</name>
>  <value>2</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.bind-host</name>
>  <value>0.0.0.0</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.connect.max-wait.ms</name>
>  <value>900000</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.connect.retry-interval.ms</name>
>  <value>30000</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.fs.state-store.retry-policy-spec</name>
>  <value>2000, 500</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.fs.state-store.uri</name>
>  <value> </value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.ha.enabled</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.hostname</name>
>  <value>xxxxxxxx</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval</name>
>  <value>15000</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor</name>
>  <value>1</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round</name>
>  <value>0.25</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.nodes.exclude-path</name>
>  <value>/etc/hadoop/conf/yarn.exclude</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.recovery.enabled</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.resource-tracker.address</name>
>  <value>xxxxxxx:8025</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.scheduler.address</name>
>  <value>xxxxxxxx:8030</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.scheduler.class</name>
>  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.state-store.max-completed-applications</name>
>  <value>${yarn.resourcemanager.max-completed-applications}</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.store.class</name>
>  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.system-metrics-publisher.dispatcher.pool-size</name>
>  <value>10</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.webapp.address</name>
>  <value>xxxxxx:8088</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.webapp.cross-origin.enabled</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.webapp.https.address</name>
>  <value>wxxxxxx:8090</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms</name>
>  <value>10000</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.zk-acl</name>
>  <value>world:anyone:rwcda</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.zk-address</name>
>  <value>xxxxxx:2181,xxxxxx:2181,xxxxxx:2181</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.zk-num-retries</name>
>  <value>1000</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.zk-retry-interval-ms</name>
>  <value>1000</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.zk-state-store.parent-path</name>
>  <value>/rmstore</value>
>  </property>
> <property>
>  <name>yarn.resourcemanager.zk-timeout-ms</name>
>  <value>10000</value>
>  </property>
> <property>
>  <name>yarn.rm.system-metricspublisher.emit-container-events</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.scheduler.capacity.ordering-policy.priority-utilization.underutilized-preemption.enabled</name>
>  <value>false</value>
>  </property>
> <property>
>  <name>yarn.scheduler.maximum-allocation-mb</name>
>  <value>12288</value>
>  </property>
> <property>
>  <name>yarn.scheduler.maximum-allocation-vcores</name>
>  <value>6</value>
>  </property>
> <property>
>  <name>yarn.scheduler.minimum-allocation-mb</name>
>  <value>64</value>
>  </property>
> <property>
>  <name>yarn.scheduler.minimum-allocation-vcores</name>
>  <value>1</value>
>  </property>
> <property>
>  <name>yarn.service.framework.path</name>
>  <value>/yarn/service-dep.tar.gz</value>
>  </property>
> <property>
>  <name>yarn.system-metricspublisher.enabled</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.address</name>
>  <value>xxxxxx:10200</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.bind-host</name>
>  <value>0.0.0.0</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.client.max-retries</name>
>  <value>30</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.client.retry-interval-ms</name>
>  <value>1000</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.enabled</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.entity-group-fs-store.active-dir</name>
>  <value>/ats/active/</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.entity-group-fs-store.app-cache-size</name>
>  <value>10</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.entity-group-fs-store.cleaner-interval-seconds</name>
>  <value>3600</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.entity-group-fs-store.done-dir</name>
>  <value>/ats/done/</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.entity-group-fs-store.group-id-plugin-classes</name>
>  <value></value>
>  </property>
> <property>
>  <name>yarn.timeline-service.entity-group-fs-store.group-id-plugin-classpath</name>
>  <value></value>
>  </property>
> <property>
>  <name>yarn.timeline-service.entity-group-fs-store.retain-seconds</name>
>  <value>604800</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.entity-group-fs-store.scan-interval-seconds</name>
>  <value>60</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.entity-group-fs-store.summary-store</name>
>  <value>org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.generic-application-history.store-class</name>
>  <value>org.apache.hadoop.yarn.server.applicationhistoryservice.NullApplicationHistoryStore</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.hbase-schema.prefix</name>
>  <value>prod.</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.hbase.configuration.file</name>
>  <value>file:///etc/yarn-hbase/conf/hbase-site.xml</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.hbase.coprocessor.jar.hdfs.location</name>
>  <value>file:///hadoop-yarn-client/timelineservice/hadoop-yarn-server-timelineservice-hbase-coprocessor.jar</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.http-authentication.simple.anonymous.allowed</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.http-authentication.type</name>
>  <value>simple</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.http-cross-origin.enabled</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.leveldb-state-store.path</name>
>  <value>/hadoop/yarn/timeline</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.leveldb-timeline-store.path</name>
>  <value>/hadoop/yarn/timeline</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.leveldb-timeline-store.read-cache-size</name>
>  <value>104857600</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.leveldb-timeline-store.start-time-read-cache-size</name>
>  <value>10000</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.leveldb-timeline-store.start-time-write-cache-size</name>
>  <value>10000</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms</name>
>  <value>300000</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.reader.webapp.address</name>
>  <value>xxxxxx:8198</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.reader.webapp.https.address</name>
>  <value>xxxxxx:8199</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.recovery.enabled</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.state-store-class</name>
>  <value>org.apache.hadoop.yarn.server.timeline.recovery.LeveldbTimelineStateStore</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.store-class</name>
>  <value>org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.ttl-enable</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.ttl-ms</name>
>  <value>2678400000</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.version</name>
>  <value>2.0</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.versions</name>
>  <value>1.5f,2.0f</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.webapp.address</name>
>  <value>xxxxxx:8188</value>
>  </property>
> <property>
>  <name>yarn.timeline-service.webapp.https.address</name>
>  <value>xxxxxx:8190</value>
>  </property>
> <property>
>  <name>yarn.webapp.api-service.enable</name>
>  <value>true</value>
>  </property>
> <property>
>  <name>yarn.webapp.ui2.enable</name>
>  <value>true</value>
>  </property>
> </configuration>
>            Reporter: Hsin-Liang Huang
>            Priority: Major
>         Attachments: image-2018-05-18-15-20-33-839.png, image-2018-05-18-15-22-30-948.png
>
>
> Hi, I am running test cases on Yarn 2.6 and Yarn 3.0 and found that performance is about twice as slow on Yarn 3.0, and it gets even slower as we acquire more containers. I compared the node manager logs on 2.6 and 3.0. Here is what I found.
> On 2.6, this is the life cycle of a specific container; from beginning to end it takes about 8 seconds (9:53:50 to 9:53:58).
> !image-2018-05-18-15-20-33-839.png!
> On 3.0, the life cycle of a specific container looks like this; it takes 20 seconds to finish the same job (9:51:44 to 9:52:04).
> !image-2018-05-18-15-22-30-948.png!
> It seems that 3.0 spends an extra 5 seconds in monitor.ContainersMonitorImpl (marked in red), which doesn't happen in 2.6. Also, after the job is done and the container is exiting, 3.0 takes 5 seconds (9:51:59 to 9:52:04), whereas 2.6 takes less than 1/2 of that time (9:53:56 to 9:53:58).
> Since we are running the same unit test cases and usually acquire more than 4 containers, all these extra seconds add up to a huge performance issue. On 2.6, the unit tests run for 7 hours, while on 3.0 the same unit tests run for 11 hours. I was told this delay might be caused by Hadoop's new monitoring system, Timeline Service v2. Could someone take a look at this? Thanks for any help on this!
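
For reference, the figures quoted in the description can be checked with a few lines of arithmetic. This is only a sketch over the timestamps and run times given in the report; the helper name seconds_between is made up for illustration, and no new measurements are implied.

```python
from datetime import datetime


def seconds_between(start, end, fmt="%H:%M:%S"):
    """Elapsed seconds between two same-day wall-clock timestamps."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# Timestamps and run times as quoted in the issue description.
life_26 = seconds_between("9:53:50", "9:53:58")  # container life cycle on 2.6
life_30 = seconds_between("9:51:44", "9:52:04")  # container life cycle on 3.0
exit_26 = seconds_between("9:53:56", "9:53:58")  # container exit on 2.6
exit_30 = seconds_between("9:51:59", "9:52:04")  # container exit on 3.0
suite_hours_26, suite_hours_30 = 7, 11

if __name__ == "__main__":
    print(f"life cycle: {life_26:.0f}s vs {life_30:.0f}s, ratio {life_30 / life_26:.2f}")
    print(f"exit: {exit_26:.0f}s vs {exit_30:.0f}s")
    print(f"suite ratio: {suite_hours_30 / suite_hours_26:.2f}")
    # life cycle: 8s vs 20s, ratio 2.50
    # exit: 2s vs 5s
    # suite ratio: 1.57
```

The per-container slowdown (2.5x) is larger than the whole-suite slowdown (about 1.57x), which is consistent with only part of the suite's time being spent in container start-up and teardown.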


