Posted to user@storm.apache.org by Ethan Li <et...@gmail.com> on 2020/03/05 23:22:10 UTC

Re: Storm 1.2.1 - Excessive workerbeats causing long GC and thus disconnecting zookeeper

So you are seeing 65MB on the UI. The UI only shows assigned memory, not actual memory usage.

As I mentioned earlier, -Xmx%HEAP-MEM%m in worker.childopts is designed to be replaced with the total memory assigned to the worker (https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/daemon/supervisor/BasicContainer.java#L392). Your config worker.childopts: "-Xmx2048m -XX:+PrintGCDetails ..." will give every worker a 2GB max heap, but it will not show up on the UI as 2GB because the UI does not read "-Xmx2048m" from worker.childopts.
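
For example, the stock worker.childopts that ships with 1.x uses that placeholder; a minimal sketch (the flags after -Xmx are only illustrative):

worker.childopts: "-Xmx%HEAP-MEM%m -XX:+PrintGCDetails -Xloggc:artifacts/gc.log"

With that form, a worker the scheduler assigns, say, 512MB is launched with
-Xmx512m, so the actual heap limit and the number shown on the UI agree.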

The assigned memory shown on the UI is the sum of the memory assigned to all the executors in that worker. How much memory a worker gets assigned therefore depends on how it is scheduled and which executors end up in it.

For example, by default every instance/executor is configured with 128MB of memory (https://github.com/apache/storm/blob/1.x-branch/conf/defaults.yaml#L276). If 4 executors are scheduled in one worker, the assigned memory for that worker is 512MB.
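
Concretely, with the 1.x default that link points at, the arithmetic looks like
this (a sketch; the comments are added for illustration):

topology.component.resources.onheap.memory.mb: 128.0    # per executor
# 4 executors in one worker  ->  4 x 128MB = 512MB assigned memory for that worker

Raising that value, globally or per component, raises the assigned figure the UI
reports accordingly.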


Hope that helps. 


> On Feb 17, 2020, at 8:55 AM, Narasimhan Chengalvarayan <na...@gmail.com> wrote:
> 
> Hi Ethan Li,
> 
> 
> Sorry for the late reply. Please find the output below, which shows
> -Xmx2048m for the worker heap. But in the Storm UI we see the allocated
> memory as 65MB for each worker.
> 
> java -server -Dlogging.sensitivity=S3 -Dlogfile.name=worker.log
> -Dstorm.home=/opt/storm/apache-storm-1.2.1
> -Dworkers.artifacts=/var/log/storm/workers-artifacts
> -Dstorm.id=Topology_334348-43-1580365369
> -Dworker.id=f1e3e060-0b32-4ecd-8c34-c486258264a4 -Dworker.port=6707
> -Dstorm.log.dir=/var/log/storm
> -Dlog4j.configurationFile=/opt/storm/apache-storm-1.2.1/log4j2/worker.xml
> -DLog4jContextSelector=org.apache.logging.log4j.core.selector.BasicContextSelector
> -Dstorm.local.dir=/var/log/storm/tmp -Xmx2048m -XX:+PrintGCDetails
> -Xloggc:artifacts/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
> -XX:GCLogFileSize=1M -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=artifacts/heapdump
> -Djava.library.path=/var/log/storm/tmp/supervisor/stormdist/Topology_334348-43-1580365369/resources/Linux-amd64:/var/log/storm/tmp/supervisor/stormdist/Topology_334348-43-1580365369/resources:/usr/local/lib:/opt/local/lib:/usr/lib
> -Dstorm.conf.file= -Dstorm.options=
> -Djava.io.tmpdir=/var/log/storm/tmp/workers/f1e3e060-0b32-4ecd-8c34-c486258264a4/tmp
> -cp /opt/storm/apache-storm-1.2.1/lib/*:/opt/storm/apache-storm-1.2.1/extlib/*:/opt/storm/apache-storm-1.2.1/conf:/var/log/storm/tmp/supervisor/stormdist/Topology_334348-43-1580365369/stormjar.jar
> org.apache.storm.daemon.worker Topology_334348-43-1580365369
> 7fe05c2b-ebcf-491b-a8cc-2565834b5988 6707
> f1e3e060-0b32-4ecd-8c34-c486258264a4
> 
> On Tue, 4 Feb 2020 at 04:36, Ethan Li <et...@gmail.com> wrote:
>> 
>> This is where the worker launch command is composed:
>> 
>> https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/daemon/supervisor/BasicContainer.java#L653-L671
>> 
>> Since your worker.childopts is set, and topology.worker.childopts is empty,
>> 
>> 
>> worker.childopts: "-Xmx2048m -XX:+PrintGCDetails
>> -Xloggc:artifacts/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
>> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
>> -XX:GCLogFileSize=1M -XX:+HeapDumpOnOutOfMemoryError
>> -XX:HeapDumpPath=artifacts/heapdump”
>> 
>> 
>> The command to launch the worker process should have -Xmx2048m.
>> 
>> I don't see why it would be 65MB. And what do you mean by "coming as 65MB only"? Is only 65MB committed, or is the max heap only 65MB?
>> 
>> Could you submit the topology and show the result of "ps -aux | grep --ignore-case worker"? This will show you the JVM parameters of the worker process.
>> 
>> 
>> (BTW, -Xmx%HEAP-MEM%m in worker.childopts is designed to be replaced with the worker's assigned memory: https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/daemon/supervisor/BasicContainer.java#L392)
>> 
>> 
>> 
>> 
>> On Jan 30, 2020, at 2:12 AM, Narasimhan Chengalvarayan <na...@gmail.com> wrote:
>> 
>> Hi Ethan,
>> 
>> 
>> Please find the configuration detail
>> 
>> **********************************************
>> 
>> #Licensed to the Apache Software Foundation (ASF) under one
>> # or more contributor license agreements.  See the NOTICE file
>> # distributed with this work for additional information
>> # regarding copyright ownership.  The ASF licenses this file
>> # to you under the Apache License, Version 2.0 (the
>> # "License"); you may not use this file except in compliance
>> # with the License.  You may obtain a copy of the License at
>> #
>> # http://www.apache.org/licenses/LICENSE-2.0
>> #
>> # Unless required by applicable law or agreed to in writing, software
>> # distributed under the License is distributed on an "AS IS" BASIS,
>> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> # See the License for the specific language governing permissions and
>> # limitations under the License.
>> 
>> ########### These MUST be filled in for a storm configuration
>> storm.zookeeper.servers:
>>    - "ZK1"
>>    - "ZK2"
>>    - "ZK3"
>> 
>> nimbus.seeds: ["host1","host2"]
>> ui.port : 8081
>> storm.log.dir: "/var/log/storm"
>> storm.local.dir: "/var/log/storm/tmp"
>> supervisor.slots.ports:
>> - 6700
>> - 6701
>> - 6702
>> - 6703
>> - 6704
>> - 6705
>> - 6706
>> - 6707
>> - 6708
>> - 6709
>> - 6710
>> - 6711
>> - 6712
>> - 6713
>> - 6714
>> - 6715
>> - 6716
>> - 6717
>> worker.heap.memory.mb: 1639
>> topology.worker.max.heap.size.mb: 1639
>> worker.childopts: "-Xmx2048m -XX:+PrintGCDetails
>> -Xloggc:artifacts/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
>> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
>> -XX:GCLogFileSize=1M -XX:+HeapDumpOnOutOfMemoryError
>> -XX:HeapDumpPath=artifacts/heapdump"
>> worker.gc.childopts: ""
>> 
>> topology.min.replication.count: 2
>> #
>> #
>> # ##### These may optionally be filled in:
>> #
>> ## List of custom serializations
>> # topology.kryo.register:
>> #     - org.mycompany.MyType
>> #     - org.mycompany.MyType2: org.mycompany.MyType2Serializer
>> #
>> ## List of custom kryo decorators
>> # topology.kryo.decorators:
>> #     - org.mycompany.MyDecorator
>> #
>> ## Locations of the drpc servers
>> # drpc.servers:
>> #     - "server1"
>> #     - "server2"
>> 
>> ## Metrics Consumers
>> # topology.metrics.consumer.register:
>> #   - class: "org.apache.storm.metric.LoggingMetricsConsumer"
>> #     parallelism.hint: 1
>> #   - class: "org.mycompany.MyMetricsConsumer"
>> #     parallelism.hint: 1
>> #     argument:
>> #       - endpoint: "metrics-collector.mycompany.org"
>> 
>> *********************************************************************************************
>> 
>> 
>> On Thu, 30 Jan 2020 at 03:07, Ethan Li <et...@gmail.com> wrote:
>> 
>> 
>> I am not sure. Can you provide your configs?
>> 
>> 
>> 
>> On Jan 28, 2020, at 6:33 PM, Narasimhan Chengalvarayan <na...@gmail.com> wrote:
>> 
>> Hi Team,
>> 
>> Do you have any idea? In Apache Storm 1.1.0 we had set the worker size to
>> 2GB. After upgrading to 1.2.1, it shows as only 65MB. Please help us: do we
>> need a different configuration setting for Storm 1.2.1, or is this a bug?
>> 
>> On Mon, 27 Jan 2020 at 16:44, Narasimhan Chengalvarayan
>> <na...@gmail.com> wrote:
>> 
>> 
>> Hi Team,
>> 
>> In Storm 1.2.1, the worker memory is showing as 65MB, but we have set the
>> worker memory to 2GB.
>> 
>> On Fri, 24 Jan 2020 at 01:25, Ethan Li <et...@gmail.com> wrote:
>> 
>> 
>> 
>> 1) What is stored in Workerbeats znode?
>> 
>> 
>> Workers periodically send heartbeats to ZooKeeper under the workerbeats znode.
>> 
>> 2) Which settings control the frequency of workerbeats updates?
>> 
>> 
>> 
>> https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/Config.java#L1534-L1539
>> task.heartbeat.frequency.secs, which defaults to 3 seconds.
>> 
>> 3) What will be the impact if the frequency is reduced?
>> 
>> 
>> Nimbus reads the worker status from the workerbeats znode to know whether the executors on the workers are alive.
>> https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/Config.java#L595-L601
>> If a heartbeat is older than nimbus.task.timeout.secs (default 30), Nimbus will consider that executor dead and try to reschedule it.
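>> 
>> For example, you could lengthen the heartbeat interval and keep the nimbus
>> timeout well above it in storm.yaml; the values below are only illustrative:
>> 
>> task.heartbeat.frequency.secs: 6    # default 3; halves the workerbeats writes
>> nimbus.task.timeout.secs: 60        # default 30; keep it several times the interval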
>> 
>> To reduce the load on ZooKeeper, a Pacemaker component was introduced: https://github.com/apache/storm/blob/master/docs/Pacemaker.md
>> You might want to use it too.
>> 
>> Thanks
>> 
>> 
>> On Dec 10, 2019, at 4:36 PM, Surajeet Dev <su...@gmail.com> wrote:
>> 
>> We upgraded Storm to version 1.2.1, and since then we have been consistently observing ZooKeeper session timeouts.
>> 
>> On analysis, we observed a high frequency of updates on the workerbeats znode, with data up to 50KB in size. This causes garbage collection pauses lasting more than 15 seconds, resulting in ZooKeeper session timeouts.
>> 
>> I understand that increasing the session timeout will alleviate the issue, but we have already done that twice.
>> 
>> My questions are:
>> 
>> 1) What is stored in the workerbeats znode?
>> 2) Which settings control the frequency of workerbeats updates?
>> 3) What will be the impact if the frequency is reduced?
>> 
>> 
>> 
>> 
>> 
>> --
>> Thanks
>> C.Narasimhan
>> 09739123245
> 
> 
> -- 
> Thanks
> C.Narasimhan
> 09739123245