Posted to user@storm.apache.org by Igor Kuzmenko <f1...@gmail.com> on 2016/03/27 20:26:57 UTC

Topology code distribution takes too much time

Hello, I'm using Hortonworks Data Platform v2.3.4 with the included Storm 0.10.
After deploying my topology with the "*storm jar*" command, it takes about 3
minutes to distribute the code. The network is 10 Gb/s and the topology jar is
about 150 MB (at that speed it should transfer in roughly a tenth of a second),
and the cluster has only 2 nodes with supervisors on them, so I assume this
process shouldn't take much time. Actually, it didn't with Hortonworks platform
2.3.0; I can't understand what has happened since then. Here's the change log
<http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_HDP_RelNotes/content/patch_storm.html>
for HDP; I hope you can help.
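
For reference, here is roughly what the "storm jar" command ends up doing on
the client side: it uploads the jar to Nimbus and submits the topology. This
is only a minimal sketch using the stock TestWordSpout that ships with
storm-core; the real topology and its components are different:

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.testing.TestWordSpout;
import backtype.storm.topology.TopologyBuilder;

public class SubmitSketch {
    public static void main(String[] args) throws Exception {
        // Build a trivial topology; the real one has its own spouts/bolts.
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("word-spout", new TestWordSpout(), 1);

        Config conf = new Config();
        conf.setNumWorkers(2);          // matches "topology.workers" 2 below
        conf.setMaxSpoutPending(10000); // matches "topology.max.spout.pending"
        conf.setMessageTimeoutSecs(30); // matches "topology.message.timeout.secs"

        // Uploads the jar to Nimbus's inbox, then submits the topology.
        StormSubmitter.submitTopology("submit-sketch", conf,
                builder.createTopology());
    }
}

(Run it the usual way, e.g. "storm jar mytopology.jar SubmitSketch", so the
client picks up the cluster config and the jar location.)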

Here are the logs.
Nimbus downloads the jar pretty fast (150 MB in under two seconds):
*2016-02-25 17:10:13.040 b.s.d.nimbus [INFO] Uploading file from client to
/opt/hadoop/storm/nimbus/inbox/stormjar-7352b097-8829-4268-a81f-29d820c0f311.jar*
*2016-02-25 17:10:14.915 b.s.d.nimbus [INFO] Finished uploading file from
client:
/opt/hadoop/storm/nimbus/inbox/stormjar-7352b097-8829-4268-a81f-29d820c0f311.jar*
2016-02-25 17:10:14.928 b.s.d.nimbus [INFO] [req 22] Access from:
principal: op:submitTopology
2016-02-25 17:10:14.933 b.s.d.nimbus [INFO] Received topology submission
for CdrPerformanceTestTopology-phoenix-bolt-4 with conf
{"topology.max.task.parallelism" nil, "hbase.conf" {"hbase.rootdir" "hdfs://
sorm-master02.msk.mts.ru:8020/apps/hbase/data"},
"topology.submitter.principal" "", "topology.acker.executors" nil,
"topology.workers" 2, "topology.message.timeout.secs" 30, "topology.debug"
true, "topology.max.spout.pending" 10000, "storm.zookeeper.superACL" nil,
"topology.users" (), "topology.submitter.user" "", "topology.kryo.register"
{"org.apache.avro.util.Utf8" nil,
"ru.mts.sorm.schemes.avro.cdr.CbossCdrRecord" nil},
"topology.kryo.decorators" (), "storm.id"
"CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414", "topology.name"
"CdrPerformanceTestTopology-phoenix-bolt-4"}
2016-02-25 17:10:14.936 b.s.d.nimbus [INFO] nimbus file
location:/opt/hadoop/storm/nimbus/stormdist/CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414
2016-02-25 17:10:15.160 b.s.c.LocalFileSystemCodeDistributor [INFO] Created
meta file
/opt/hadoop/storm/nimbus/stormdist/CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414/storm-code-distributor.meta
upload successful.
2016-02-25 17:10:15.164 b.s.zookeeper [INFO] Node already in queue for
leader lock.
2016-02-25 17:10:15.165 b.s.d.nimbus [INFO] desired replication count of 1
not achieved but we have hit the max wait time 60 so moving on with
replication count = 1
2016-02-25 17:10:15.169 b.s.d.nimbus [INFO] Activating
CdrPerformanceTestTopology-phoenix-bolt-4:
CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414
2016-02-25 17:10:15.186 b.s.s.EvenScheduler [INFO] Available slots:
(["edef7b4d-2358-41c6-8a03-a7638d0f68c6" 6700]
["edef7b4d-2358-41c6-8a03-a7638d0f68c6" 6701]
["0a6e4de5-dd83-4307-b7e6-1db60c4c0898" 6700]
["0a6e4de5-dd83-4307-b7e6-1db60c4c0898" 6701])
2016-02-25 17:10:15.189 b.s.d.nimbus [INFO] Setting new assignment for
topology id CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414:
#backtype.storm.daemon.common.Assignment{:master-code-dir
"/opt/hadoop/storm/nimbus/stormdist/CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414",
:node->host {"edef7b4d-2358-41c6-8a03-a7638d0f68c6" "sorm-data06.msk.mts.ru",
"0a6e4de5-dd83-4307-b7e6-1db60c4c0898" "sorm-data07.msk.mts.ru"},
:executor->node+port {[8 8] ["0a6e4de5-dd83-4307-b7e6-1db60c4c0898" 6700],
[2 2] ["0a6e4de5-dd83-4307-b7e6-1db60c4c0898" 6700], [7 7]
["edef7b4d-2358-41c6-8a03-a7638d0f68c6" 6700], [3 3]
["edef7b4d-2358-41c6-8a03-a7638d0f68c6" 6700], [1 1]
["edef7b4d-2358-41c6-8a03-a7638d0f68c6" 6700], [6 6]
["0a6e4de5-dd83-4307-b7e6-1db60c4c0898" 6700], [9 9]
["edef7b4d-2358-41c6-8a03-a7638d0f68c6" 6700], [11 11]
["edef7b4d-2358-41c6-8a03-a7638d0f68c6" 6700], [5 5]
["edef7b4d-2358-41c6-8a03-a7638d0f68c6" 6700], [10 10]
["0a6e4de5-dd83-4307-b7e6-1db60c4c0898" 6700], [4 4]
["0a6e4de5-dd83-4307-b7e6-1db60c4c0898" 6700]}, :executor->start-time-secs
{[8 8] 1456409415, [2 2] 1456409415, [7 7] 1456409415, [3 3] 1456409415, [1
1] 1456409415, [6 6] 1456409415, [9 9] 1456409415, [11 11] 1456409415, [5
5] 1456409415, [10 10] 1456409415, [4 4] 1456409415}}

But the supervisor gets stuck somewhere:
*2016-02-25 17:10:15.198 b.s.d.supervisor [INFO] Downloading code for storm
id* CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414 from
/opt/hadoop/storm/nimbus/stormdist/CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414
2016-02-25 17:10:15.199 b.s.u.StormBoundedExponentialBackoffRetry [INFO]
The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
2016-02-25 17:10:15.205 b.s.u.StormBoundedExponentialBackoffRetry [INFO]
The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
2016-02-25 17:10:17.405 b.s.c.LocalFileSystemCodeDistributor [INFO]
Attempting to download meta file
/opt/hadoop/storm/nimbus/stormdist/CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414/stormjar.jar
from remote sorm-master02.msk.mts.ru:6627
2016-02-25 *17:10:17.406* b.s.u.StormBoundedExponentialBackoffRetry [INFO]
The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
2016-02-25 *17:13:29.002* b.s.c.LocalFileSystemCodeDistributor [INFO]
Attempting to download meta file
/opt/hadoop/storm/nimbus/stormdist/CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414/stormconf.ser
from remote sorm-master02.msk.mts.ru:6627
2016-02-25 17:13:29.003 b.s.u.StormBoundedExponentialBackoffRetry [INFO]
The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
2016-02-25 17:13:31.148 b.s.c.LocalFileSystemCodeDistributor [INFO]
Attempting to download meta file
/opt/hadoop/storm/nimbus/stormdist/CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414/stormcode.ser
from remote sorm-master02.msk.mts.ru:6627
2016-02-25 17:13:31.148 b.s.u.StormBoundedExponentialBackoffRetry [INFO]
The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
*2016-02-25 17:13:34.124 b.s.d.supervisor [INFO] Finished downloading code
for storm id* CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414 from
/opt/hadoop/storm/nimbus/stormdist/CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414
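
Note the gap falls entirely inside the stormjar.jar download (17:10:17 to
17:13:29). For what it's worth, the StormBoundedExponentialBackoffRetry lines
describe a retry policy roughly like the sketch below (an illustration of the
logged parameters, not Storm's actual source); even its worst case only sleeps
2+4+8+16+32 = 62 seconds, so retries alone wouldn't explain the full 3 minutes
unless the transfer itself is slow:

public class BoundedBackoffSketch {
    public static void main(String[] args) throws InterruptedException {
        final long baseSleepMs = 2000; // baseSleepTimeMs [2000] from the logs
        final long maxSleepMs = 60000; // maxSleepTimeMs [60000]
        final int maxRetries = 5;      // maxRetries [5]

        for (int retry = 0; retry < maxRetries; retry++) {
            // Double the delay on each retry, capped at maxSleepMs.
            long sleepMs = Math.min(maxSleepMs, baseSleepMs * (1L << retry));
            System.out.printf("retry %d: sleeping %d ms%n", retry, sleepMs);
            Thread.sleep(sleepMs);
            // ... re-attempt the download here ...
        }
    }
}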

Re: Topology code distribution takes too much time

Posted by Igor Kuzmenko <f1...@gmail.com>.
Thanks for the reply, guys.
While deploying the topology, CPU load is around zero. Once the topology is
deployed and starts to work, it rises up to 25%, so it's not CPU overload or a
cgroup. The network can transfer 150 MB in less than a second; I checked it
with scp. While the topology is deploying there's no major network activity; I
watched it with Ambari.
The JVM settings are the Storm defaults, except the worker Xmx is increased to
2048m (the default is 768m). No OutOfMemory or other exceptions were thrown in
the process.

I doubt it's a hardware issue, because everything worked fine with the old
version of HDP.

On Sun, Mar 27, 2016 at 11:22 PM, Erik Weathers <ew...@groupon.com>
wrote:

> [quoted text snipped]

Re: Topology code distribution takes too much time

Posted by Erik Weathers <ew...@groupon.com>.
Igor, *maybe* you have the supervisor in a cgroup that is limiting the CPU
for the process?  (Don't know anything about the HDP packaged storm)

- Erik

On Sun, Mar 27, 2016 at 11:30 AM, Andrey Dudin <do...@gmail.com> wrote:

> [quoted text snipped]

Re: Topology code distribution takes too much time

Posted by Andrey Dudin <do...@gmail.com>.
Hi Igor.
Try taking a thread dump and looking at it. I think you can find the problem
in the dump.

Also look at the LAN, CPU, and RAM load. Maybe your CPU is overloaded, or the
RAM or LAN.
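
The usual way is just "jstack <pid>" against the supervisor process, but a
dump can also be taken from inside a JVM; here is a minimal in-process sketch
using the standard management API:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpSketch {
    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Dump every live thread with stack traces plus monitor and
        // ownable-synchronizer information.
        for (ThreadInfo info : bean.dumpAllThreads(true, true)) {
            System.out.print(info);
        }
    }
}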

2016-03-27 21:26 GMT+03:00 Igor Kuzmenko <f1...@gmail.com>:

> [quoted text snipped]

-- 
Regards, Andrey Dudin