Posted to issues@storm.apache.org by "Jungtaek Lim (JIRA)" <ji...@apache.org> on 2018/01/08 00:41:00 UTC
[jira] [Resolved] (STORM-2879) Supervisor collapse continuously when there is a expired assignment for overdue storm
[ https://issues.apache.org/jira/browse/STORM-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jungtaek Lim resolved STORM-2879.
---------------------------------
Resolution: Fixed
Fix Version/s: 1.0.6, 1.1.2, 1.2.0
Thanks [~danny0405], I merged this into master and the 1.x version lines.
> Supervisor collapse continuously when there is a expired assignment for overdue storm
> -------------------------------------------------------------------------------------
>
> Key: STORM-2879
> URL: https://issues.apache.org/jira/browse/STORM-2879
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core, storm-server
> Affects Versions: 2.0.0, 1.x
> Reporter: Yuzhao Chen
> Assignee: Yuzhao Chen
> Priority: Critical
> Labels: patch, pull-request-available
> Fix For: 2.0.0, 1.2.0, 1.1.2, 1.0.6
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> Currently, when a topology is reassigned or killed in a cluster, the supervisor deletes four files for the overdue (expired) topology:
> - storm-code
> - storm-ser
> - storm-jar
> - LocalAssignment
> Slot.java
>     static DynamicState cleanupCurrentContainer(DynamicState dynamicState, StaticState staticState, MachineState nextState) throws Exception {
>         assert(dynamicState.container != null);
>         assert(dynamicState.currentAssignment != null);
>         assert(dynamicState.container.areAllProcessesDead());
>
>         dynamicState.container.cleanUp();
>         staticState.localizer.releaseSlotFor(dynamicState.currentAssignment, staticState.port);
>         DynamicState ret = dynamicState.withCurrentAssignment(null, null);
>         if (nextState != null) {
>             ret = ret.withState(nextState);
>         }
>         return ret;
>     }
> However, this cleanup is not transactional: if an exception occurs while deleting storm-code/ser/jar, an overdue local assignment is left on disk.
> When the supervisor then restarts after that exception, the slots are initialized and the containers are recovered from the LocalAssignments. The blob store client tries to fetch the files from Nimbus (the master) but gets a KeyNotFoundException, and the supervisor collapses again.
> This happens continuously, and the supervisor never recovers until we clean up all the local assignments manually.
> This is the supervisor log with the stack traces:
> 2017-12-27 14:15:04.434 o.a.s.l.AsyncLocalizer [INFO] Cleaning up unused topologies in /opt/meituan/storm/data/supervisor/stormdist
> 2017-12-27 14:15:04.434 o.a.s.d.s.AdvancedFSOps [INFO] Deleting path /opt/meituan/storm/data/supervisor/stormdist/app_dpsr_realtime_shop_vane_allcates-14-1513685785
> 2017-12-27 14:15:04.445 o.a.s.d.s.Slot [INFO] STATE EMPTY msInState: 109 -> WAITING_FOR_BASIC_LOCALIZATION msInState: 1
> 2017-12-27 14:15:04.471 o.a.s.d.s.Supervisor [INFO] Starting supervisor with id 255d3fed-f3ee-4c7e-8a08-b693c9a6a072 at host gq-data-rt48.gq.sankuai.com.
> 2017-12-27 14:15:04.502 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormjar.jar from blob store.
> org.apache.storm.generated.KeyNotFoundException: null
> at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:123) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
> at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
> 2017-12-27 14:15:04.611 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormjar.jar from blob store.
> org.apache.storm.generated.KeyNotFoundException: null
> at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:123) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
> at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
> 2017-12-27 14:15:04.718 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormcode.ser from blob store.
> org.apache.storm.generated.KeyNotFoundException: null
> at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:124) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
> at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
> 2017-12-27 14:15:04.825 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormcode.ser from blob store.
> org.apache.storm.generated.KeyNotFoundException: null
> at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:124) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
> at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
> 2017-12-27 14:15:04.932 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormconf.ser from blob store.
> org.apache.storm.generated.KeyNotFoundException: null
> at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:125) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
> at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
> 2017-12-27 14:15:05.039 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormconf.ser from blob store.
> org.apache.storm.generated.KeyNotFoundException: null
> at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:125) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
> at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
> 2017-12-27 14:15:05.140 o.a.s.u.Utils [INFO] Could not extract resources from /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormjar.jar
> 2017-12-27 14:15:05.142 o.a.s.d.s.Slot [INFO] STATE WAITING_FOR_BASIC_LOCALIZATION msInState: 697 -> WAITING_FOR_BLOB_LOCALIZATION msInState: 0
> 2017-12-27 14:15:05.142 o.a.s.l.AsyncLocalizer [WARN] Caught Exception While Downloading (rethrowing)...
> java.io.FileNotFoundException: File '/opt/meituan/storm/data/supervisor/stormdist/app_dpsr_realtime_shop_vane_allcates-14-1513685785/stormconf.ser' does not exist
> at org.apache.storm.shade.org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:292) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.shade.org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1815) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.utils.ConfigUtils.readSupervisorStormConfGivenPath(ConfigUtils.java:264) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.utils.ConfigUtils.readSupervisorStormConfImpl(ConfigUtils.java:376) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.utils.ConfigUtils.readSupervisorStormConf(ConfigUtils.java:370) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.localizer.AsyncLocalizer$DownloadBlobs.call(AsyncLocalizer.java:226) ~[storm-core-1.1.2-mt001.jar:?]
> at org.apache.storm.localizer.AsyncLocalizer$DownloadBlobs.call(AsyncLocalizer.java:213) ~[storm-core-1.1.2-mt001.jar:?]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
> at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
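
The crash loop described in the quoted report (a leftover local assignment whose blobs are no longer available from Nimbus) can be illustrated with a small, self-contained Java sketch. This is not the patch that was merged for STORM-2879 and it uses no Storm APIs: `StaleAssignmentSketch`, `BlobSource`, `BlobMissingException` (a stand-in for `KeyNotFoundException`), `cleanUp`, `recover`, and the `local-assignment` file name are all hypothetical names chosen for illustration. It assumes two mitigations: forget the assignment record before deleting the topology files, and treat a missing blob during recovery as a sign that the assignment is stale rather than as a fatal error.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical sketch only; none of these types are Storm classes. */
public class StaleAssignmentSketch {

    /** Stand-in for Storm's KeyNotFoundException (assumption). */
    static class BlobMissingException extends Exception {
        BlobMissingException(String key) { super("blob not found: " + key); }
    }

    /** Stand-in for the blob store client that talks to Nimbus (assumption). */
    interface BlobSource {
        byte[] fetch(String key) throws BlobMissingException;
    }

    // File names as they appear in the log above.
    static final String[] BLOBS = {"stormjar.jar", "stormcode.ser", "stormconf.ser"};

    /**
     * Delete the assignment record first, then the topology files.
     * If a later delete fails, only orphaned files remain; no stale
     * assignment survives to be recovered on the next start.
     */
    static void cleanUp(Path assignmentFile, Path distDir) throws IOException {
        Files.deleteIfExists(assignmentFile);
        for (String name : BLOBS) {
            Files.deleteIfExists(distDir.resolve(name));
        }
    }

    /**
     * Recover a slot from a persisted assignment. If any blob is gone from
     * the source, discard the assignment and leave the slot empty instead
     * of propagating the error.
     */
    static Optional<String> recover(Path assignmentFile, Path distDir, BlobSource source)
            throws IOException {
        if (!Files.exists(assignmentFile)) {
            return Optional.empty();
        }
        String topologyId =
                new String(Files.readAllBytes(assignmentFile), StandardCharsets.UTF_8).trim();
        try {
            for (String name : BLOBS) {
                Path target = distDir.resolve(name);
                if (!Files.exists(target)) {
                    Files.write(target, source.fetch(topologyId + "-" + name));
                }
            }
            return Optional.of(topologyId);
        } catch (BlobMissingException e) {
            // The topology no longer exists upstream: drop the stale state.
            cleanUp(assignmentFile, distDir);
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("supervisor-sketch");
        Path distDir = Files.createDirectories(root.resolve("stormdist"));
        Path assignment = root.resolve("local-assignment");
        Files.write(assignment, "topo-1".getBytes(StandardCharsets.UTF_8));

        // Simulate Nimbus having already removed the topology's blobs.
        BlobSource emptyNimbus = key -> { throw new BlobMissingException(key); };

        System.out.println(recover(assignment, distDir, emptyNimbus).isPresent()); // false
        System.out.println(Files.exists(assignment));                              // false
    }
}
```

With either mitigation in place, a restart after a failed cleanup ends with an empty slot instead of another crash, so no manual removal of local assignments is needed in this simplified model.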
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)