Posted to dev@knox.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/01/02 18:22:00 UTC

[jira] [Work logged] (KNOX-2157) Knox should check if it's actually up&running

     [ https://issues.apache.org/jira/browse/KNOX-2157?focusedWorklogId=365444&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-365444 ]

ASF GitHub Bot logged work on KNOX-2157:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Jan/20 18:21
            Start Date: 02/Jan/20 18:21
    Worklog Time Spent: 10m 
      Work Description: smolnar82 commented on pull request #230: KNOX-2157 - Verifying the server's state in addition to PID check at gateway start
URL: https://github.com/apache/knox/pull/230
 
 
   ## What changes were proposed in this pull request?
   
   In addition to the existing PID check, the startup script now verifies that the Jetty server is in the `STARTED` state. This is achieved by implementing Jetty's `LifeCycle.Listener` interface and writing the current status (STARTING, STARTED, FAILURE, STOPPING, STOPPED) to the `$DATA_DIR/gatewayServer.status` file, which is overwritten on every status change. The startup script returns 0 when this file contains `STARTED` and 1 otherwise.
   
   Additionally, two new command-line options are introduced to the `start` command:
   * `--test-gateway-retry-attempts`: indicates the number of tries the startup script should execute before it fails. Defaults to 5.
   * `--test-gateway-retry-sleep`: the amount of time that the test process will wait or sleep before a retry is issued. Defaults to 2s.
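   
   For illustration only, the status check with retries could look roughly like the sketch below. This is not the code from the patch: the function name `wait_for_started` and its positional arguments are hypothetical, and it assumes the status file holds a single status word on its first line.
   
   ```shell
   #!/usr/bin/env bash
   # Hypothetical sketch; names are illustrative and not taken from the Knox scripts.
   
   # Poll a status file until it contains STARTED or the attempts run out.
   #   $1: path to the status file
   #   $2: number of attempts (default 5, like --test-gateway-retry-attempts)
   #   $3: seconds to sleep between attempts (default 2, like --test-gateway-retry-sleep)
   # Returns 0 if STARTED was seen, 1 otherwise.
   wait_for_started() {
     local status_file="$1"
     local attempts="${2:-5}"
     local sleep_secs="${3:-2}"
     local i status
   
     for (( i = 1; i <= attempts; i++ )); do
       if [[ -f "$status_file" ]]; then
         # Read the first line only; tolerate a missing trailing newline.
         IFS= read -r status < "$status_file" || true
         if [[ "$status" == "STARTED" ]]; then
           return 0
         fi
       fi
       # No need to sleep after the final attempt.
       if (( i < attempts )); then
         sleep "$sleep_secs"
       fi
     done
     return 1
   }
   ```
   
   A caller could then run something like `wait_for_started "$DATA_DIR/gatewayServer.status" 5 2` and use the return code as the exit status of `start`.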
   
   **Note**: currently, the `stop` command simply kills the Java process, and there is no hook in the Java code to handle a graceful shutdown. Because of this, the `STOPPING` and `STOPPED` statuses are never written to the status file. To avoid future errors, I instead added a command to the shell script that deletes the status file on stop.
   
   ## How was this patch tested?
   
   Executed a full build:
   ```
   $ mvn clean -Dshellcheck=true -T1C verify -Prelease,package
   ...
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD SUCCESS
   [INFO] ------------------------------------------------------------------------
   [INFO] Total time: 18:24 min (Wall Clock)
   [INFO] Finished at: 2020-01-02T18:50:47+01:00
   [INFO] Final Memory: 410M/1798M
   [INFO] ------------------------------------------------------------------------
   ```
   
   Tested manually:
   * confirmed that the new status file is created when the server is started and removed when it is stopped
   * tried the `start` command with different options (`--printEnv`, `--test-gateway-retry-attempts`, `--test-gateway-retry-sleep`)
   * made sure (using debug messages in the shell scripts) that the status check took place (in my environment it needed 3 retries)
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 365444)
    Remaining Estimate: 0h
            Time Spent: 10m

> Knox should check if it's actually up&running
> ---------------------------------------------
>
>                 Key: KNOX-2157
>                 URL: https://issues.apache.org/jira/browse/KNOX-2157
>             Project: Apache Knox
>          Issue Type: New Feature
>          Components: Server
>    Affects Versions: 1.1.0, 1.2.0, 1.3.0
>            Reporter: Sandor Molnar
>            Assignee: Sandor Molnar
>            Priority: Major
>             Fix For: 1.4.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> As of now, Knox returns a success code as long as the process has been created. There should be another way to check whether the server is actually up&running and is capable of serving incoming requests.
>  My proposal is:
>  * the Knox startup script should be modified to run a basic Admin API check if {{--test-gateway-url}} is defined in the startup command. If it is blank or undefined, we fall back to the existing PID-based check
>  * two more optional arguments will be defined for this feature:
>  ** {{--test-gateway-retry-attempts}}: indicates the number of tries the startup script should execute before it fails. Defaults to 5.
>  ** {{--test-gateway-retry-sleep}}: the amount of time that the test process will wait or sleep before a retry is issued. Defaults to 2s.
> The new-style check will use {{curl}} and will return success if {{$GATEWAY_TEST_URL/gateway/admin/api/v1/version/}} returns an HTTP response with code 200. Otherwise, the startup script should return an error code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)