Posted to issues@storm.apache.org by "zhiwei (JIRA)" <ji...@apache.org> on 2017/08/21 03:39:00 UTC

[jira] [Commented] (STORM-2697) Failed to cleanup worker when GET worker-user failed

    [ https://issues.apache.org/jira/browse/STORM-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16134671#comment-16134671 ] 

zhiwei commented on STORM-2697:
-------------------------------

(defn try-cleanup-worker [conf id user]
  (try
    (if (.exists (File. (worker-root conf id)))
      (do
        (if (conf SUPERVISOR-RUN-WORKER-AS-USER)
          (rmr-as-user conf id user (worker-root conf id))
          (do
            (rmr (worker-heartbeats-root conf id))
            ;; this avoids a race condition with worker or subprocess writing pid around same time
            (rmr (worker-pids-root conf id))
            (rmr (worker-root conf id))))
        (remove-worker-user! conf id)
        (remove-dead-worker id)))
    (catch IOException e
      (log-warn-error e "Failed to cleanup worker " id ". Will retry later"))
    (catch RuntimeException e
      (log-warn-error e "Failed to cleanup worker " id ". Will retry later"))
    (catch java.io.FileNotFoundException e
      (log-message (.getMessage e)))))


The call (rmr-as-user conf id user (worker-root conf id)) fails and throws an exception (the IllegalArgumentException shown in the log below).

When that happens, the cleanup should fall back to a forced removal:

(do
  (rmr (worker-heartbeats-root conf id))
  ;; this avoids a race condition with worker or subprocess writing pid around same time
  (rmr (worker-pids-root conf id))
  (rmr (worker-root conf id)))
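
A minimal, self-contained sketch of that fallback, not the actual Storm patch. The names delete-recursively! and cleanup-with-fallback! are hypothetical; delete-recursively! only stands in for Storm's rmr, and the as-user removal is passed in as a function so the sketch runs without a supervisor.

```clojure
(import '[java.io File]
        '[java.nio.file Files])

;; Hypothetical stand-in for Storm's rmr: deletes f and everything below it.
;; Symbolic links are removed without descending into their targets.
(defn delete-recursively! [^File f]
  (when (and (.isDirectory f)
             (not (Files/isSymbolicLink (.toPath f))))
    (doseq [child (.listFiles f)]
      (delete-recursively! child)))
  (.delete f))

;; Hypothetical wrapper: try the as-user removal first; if it throws
;; (for example because the worker user could not be read), fall back
;; to a plain recursive delete of the same path.
(defn cleanup-with-fallback! [remove-as-user! ^String path]
  (try
    (remove-as-user! path)
    :as-user
    (catch Exception e
      (println "as-user removal failed:" (.getMessage e) "- forcing removal")
      (let [root (File. path)]
        (delete-recursively! root)
        ;; Report honestly whether the forced removal actually worked.
        (if (.exists root) :failed :forced)))))
```

Note that the plain delete runs with the supervisor's own permissions, so it can still leave files behind when they belong to another user. In that case the sketch returns :failed and the existing "Will retry later" path would still be needed.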



> Failed to cleanup worker when GET worker-user failed
> ----------------------------------------------------
>
>                 Key: STORM-2697
>                 URL: https://issues.apache.org/jira/browse/STORM-2697
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 0.10.0, 0.10.1, 0.10.2, 1.1.2
>            Reporter: zhiwei
>
> "2017-08-15 11:25:53,554" | INFO  | [Thread-4] | Shutting down and clearing state for id f5906569-41db-4c7f-9048-b3c551603fb4. Current supervisor time: 1502767553. State: :not-started, Heartbeat: nil | backtype.storm.daemon.supervisor (NO_SOURCE_FILE:0) 
> "2017-08-15 11:25:53,554" | INFO  | [Thread-4] | Shutting down 136d9652-7b8b-4e3d-8d45-33d72dfe1462:f5906569-41db-4c7f-9048-b3c551603fb4 | backtype.storm.daemon.supervisor (NO_SOURCE_FILE:0) 
> "2017-08-15 11:25:53,555" | INFO  | [Thread-4] | GET worker-user f5906569-41db-4c7f-9048-b3c551603fb4 | backtype.storm.config (NO_SOURCE_FILE:0) 
> "2017-08-15 11:25:53,555" | WARN  | [Thread-4] | Failed to get worker user for f5906569-41db-4c7f-9048-b3c551603fb4. #<FileNotFoundException java.io.FileNotFoundException: /var/streaming_data/stormdir/workers-users/f5906569-41db-4c7f-9048-b3c551603fb4 (No such file or directory)> | backtype.storm.config (NO_SOURCE_FILE:0) 
> "2017-08-15 11:25:53,555" | WARN  | [Thread-4] | Failed to cleanup worker f5906569-41db-4c7f-9048-b3c551603fb4. Will retry later #<IllegalArgumentException java.lang.IllegalArgumentException: User cannot be blank when calling worker-launcher.> | backtype.storm.daemon.supervisor (NO_SOURCE_FILE:0) 
> "2017-08-15 11:25:53,555" | INFO  | [Thread-4] | Shut down 136d9652-7b8b-4e3d-8d45-33d72dfe1462:f5906569-41db-4c7f-9048-b3c551603fb4 | backtype.storm.daemon.supervisor (NO_SOURCE_FILE:0) 


