Posted to user@hive.apache.org by Salih Şen <sl...@gmail.com> on 2018/10/16 18:07:03 UTC
Hive LLAP low Vcore allocation

Hi,
We are trying to improve LLAP performance on our cluster, but we've
noticed that even though the LLAP daemon containers get the configured
memory, they get only 1 vcore per container.
We are running 10 LLAP daemons using Slider. No other containers are
running on the nodes that host the LLAP daemons; those nodes show 0 memory
available but 43 vcores sitting idle.

I can see the following lines in the Slider logs, so I suspect
SliderAppMaster requests only 1 vcore per container from YARN:

2018-10-16 18:38:42,503 [AmExecutor-006] INFO  appmaster.SliderAppMaster -
Registered service under /users/hive/services/org-apache-slider/llap0;
absolute path /registry/users/hive/services/org-apache-slider/llap0
2018-10-16 18:38:42,510 [AmExecutor-006] INFO  state.AppState - Reviewing
RoleStatus{name='LLAP', group=LLAP, key=1, desired=10, actual=0,
requested=0, releasing=0, failed=0, startFailed=0, started=0, completed=0,
totalRequested=0, preempted=0, nodeFailed=0, failedRecently=0,
limitsExceeded=0, resourceRequirements=<memory:445440, vCores:1>,
isAntiAffinePlacement=false, failureMessage='',
providerRole=ProviderRole{name='LLAP', group=LLAP, id=1, placementPolicy=0,
nodeFailureThreshold=3, placementTimeoutSeconds=30,
labelExpression='null'}, failedContainers=[],
healthThresholdMonitorEnabled=true} :
2018-10-16 18:38:42,510 [AmExecutor-006] INFO  state.AppState - LLAP:
Asking for 10 more nodes(s) for a total of 10
2018-10-16 18:38:42,512 [AmExecutor-006] INFO  state.AppState - Container
ask is Capability[<memory:445440, vCores:1>]Priority[1073741825] and label
= null
2018-10-16 18:38:42,513 [AmExecutor-006] INFO  state.AppState - Container
ask is Capability[<memory:445440, vCores:1>]Priority[1073741825] and label
= null
2018-10-16 18:38:42,513 [AmExecutor-006] INFO  state.AppState - Container
ask is Capability[<memory:445440, vCores:1>]Priority[1073741825] and label
= null
2018-10-16 18:38:42,513 [AmExecutor-006] INFO  state.AppState - Container
ask is Capability[<memory:445440, vCores:1>]Priority[1073741825] and label
= null
2018-10-16 18:38:42,513 [AmExecutor-006] INFO  state.AppState - Container
ask is Capability[<memory:445440, vCores:1>]Priority[1073741825] and label
= null
2018-10-16 18:38:42,513 [AmExecutor-006] INFO  state.AppState - Container
ask is Capability[<memory:445440, vCores:1>]Priority[1073741825] and label
= null
2018-10-16 18:38:42,513 [AmExecutor-006] INFO  state.AppState - Container
ask is Capability[<memory:445440, vCores:1>]Priority[1073741825] and label
= null
2018-10-16 18:38:42,513 [AmExecutor-006] INFO  state.AppState - Container
ask is Capability[<memory:445440, vCores:1>]Priority[1073741825] and label
= null
2018-10-16 18:38:42,514 [AmExecutor-006] INFO  state.AppState - Container
ask is Capability[<memory:445440, vCores:1>]Priority[1073741825] and label
= null
2018-10-16 18:38:42,514 [AmExecutor-006] INFO  state.AppState - Container
ask is Capability[<memory:445440, vCores:1>]Priority[1073741825] and label
= null
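
To tally what is actually being requested, the "Container ask" entries can
be counted with a short script. This is only a sketch: the regex assumes the
log format shown above, and the function name count_container_asks is
hypothetical, not part of Slider.

```python
import re
from collections import Counter

# Matches the capability part of a Slider "Container ask" log entry,
# e.g. Capability[<memory:445440, vCores:1>]
ASK_RE = re.compile(r"Capability\[<memory:(\d+),\s*vCores:(\d+)>\]")


def count_container_asks(log_text):
    """Return a Counter mapping (memory, vcores) to the number of asks."""
    return Counter((int(mem), int(vc)) for mem, vc in ASK_RE.findall(log_text))


# One entry copied from the log above (wrapped the same way as in the mail).
sample = (
    "2018-10-16 18:38:42,512 [AmExecutor-006] INFO  state.AppState - Container\n"
    "ask is Capability[<memory:445440, vCores:1>]Priority[1073741825] and label\n"
    "= null\n"
)

for (memory, vcores), n in count_container_asks(sample).items():
    print(f"{n} ask(s) for memory={memory}, vCores={vcores}")
```

Run against the full log excerpt, this should report ten asks, all with
vCores=1.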

Here is the configuration output from the same log file that might be
relevant:

 "credentials" : { },
  "components" : {
    "LLAP" : {
      "yarn.container.health.threshold.init.delay.secs" : "400",
      "yarn.role.priority" : "1",
      "yarn.component.instances" : "10",
      "yarn.memory" : "445440",
      "yarn.resource.normalization.enabled" : "false",
      "yarn.container.health.threshold.window.secs" : "300",
      "yarn.component.placement.policy" : "0",
      "yarn.container.health.threshold.percent" : "80"
    },
    "slider-appmaster" : {
      "yarn.vcores" : "1",
      "yarn.component.instances" : "1",
      "yarn.memory" : "1024"
    }
  }
},
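
Note that the LLAP component above has no "yarn.vcores" entry, while
slider-appmaster does. A minimal sketch for listing components without an
explicit "yarn.vcores" is below. The function name
components_without_vcores is hypothetical, the config is a trimmed copy of
the fragment above wrapped in braces so it parses as JSON, and whether the
missing key is what leads to the vCores:1 request is our assumption, not
something we have confirmed.

```python
import json

# Trimmed copy of the "components" section from the log, wrapped in
# braces so that it is valid JSON on its own.
config_text = """
{
  "components" : {
    "LLAP" : {
      "yarn.role.priority" : "1",
      "yarn.component.instances" : "10",
      "yarn.memory" : "445440"
    },
    "slider-appmaster" : {
      "yarn.vcores" : "1",
      "yarn.component.instances" : "1",
      "yarn.memory" : "1024"
    }
  }
}
"""


def components_without_vcores(config):
    """Return the sorted names of components lacking a "yarn.vcores" key."""
    return sorted(
        name
        for name, options in config["components"].items()
        if "yarn.vcores" not in options
    )


missing = components_without_vcores(json.loads(config_text))
print(missing)
```

For the fragment above this lists only the LLAP component.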

yarn.nodemanager.resource.cpu-vcores,
yarn.scheduler.maximum-allocation-vcores,
hive.llap.daemon.vcpus.per.instance,
hive.llap.daemon.num.executors are all set to 44.

We can confirm on the LLAP daemon web UI that 44 executors are running per
instance.

We are using HDP 2.7.3.2.6.4.0-91 with YARN 2.7.3, Hive 1.2.1000, Slider
0.92.0.

Any ideas on how to utilize more CPU with the LLAP daemons?

Thanks.