Posted to notifications@skywalking.apache.org by GitBox <gi...@apache.org> on 2019/01/25 07:32:32 UTC

[GitHub] Liu-XinYuan opened a new issue #2206: Loss of metadata causes the server to fail to start properly

URL: https://github.com/apache/incubator-skywalking/issues/2206
 
 
   Please answer these questions before submitting your issue.
   
   - Why do you submit this issue?
   - [ ] Question or discussion
   - [ ] Bug
   - [ ] Requirement
   - [x] Feature or performance improvement
   
   
   ##  Requirement or improvement
   ### Background
   When users upgrade SkyWalking, the data models of the old and new versions may be inconsistent, which prevents the server from starting. In that case, users may resort to clearing the database, and the registration data is lost. The UI can then no longer display the metrics reported by clients whose registration information is missing.
   Under the existing mechanism, the user has to restart a client to make it re-register, but restarting a business system because of a problem in the monitoring system is not acceptable.
   So we need a mechanism that re-registers clients without restarting the business system.
   
   ### Ideas
   When registration data is lost on the server side, there are two possible compensation measures:
   * Push the registration data cached by the client to the server again. However, the IDs in the cached registration data may already have been assigned to other, newly registered clients, and resolving such conflicts is costly.
   * Reset the registration data of the affected client. The key questions for this approach are how to identify the affected client and how to send the reset command to it.
   
   ### Key issues
   #### Unique identification
   At present, the client automatically generates a globally unique agentUUID as the identifier of the client instance. However, operations staff cannot use this ID to locate a specific client. Therefore, a client instance name attribute should be added to the startup file and startup parameters, specified manually by the user at deployment time.
   Because the recovery function is optional rather than essential, the automatic generation of the globally unique agentUUID is retained. The generated agentUUID is overridden only when the user specifies a client instance name in the startup file or startup parameters.
   To avoid modifying the 5.x protocol, which would force the probes for other languages to upgrade along with it, the instance name attribute is added only to the heartbeat interface of the 6.x protocol.
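
   A minimal sketch of the identifier logic described above, assuming a Java agent. The class `InstanceIdentity` and the property key `agent.instance_name` are hypothetical names used for illustration, and letting a startup parameter take precedence over the startup file is an assumption, not something this issue specifies. When no name is configured, the sketch falls back to a random UUID, matching the existing behavior.

   ```java
   import java.util.Properties;
   import java.util.UUID;

   /** Hypothetical helper that decides which identifier the agent reports for its instance. */
   final class InstanceIdentity {
       // Hypothetical key, read from the startup file or passed as a -D startup parameter.
       static final String INSTANCE_NAME_KEY = "agent.instance_name";

       private InstanceIdentity() {
       }

       /**
        * Returns the user-specified instance name if one is configured; otherwise generates
        * a random UUID, as the agent does today. In this sketch a startup parameter
        * takes precedence over the startup file (an assumption).
        */
       static String resolve(Properties startupFile, Properties startupParams) {
           String name = startupParams.getProperty(INSTANCE_NAME_KEY);
           if (isBlank(name)) {
               name = startupFile.getProperty(INSTANCE_NAME_KEY);
           }
           return isBlank(name) ? UUID.randomUUID().toString() : name.trim();
       }

       private static boolean isBlank(String s) {
           return s == null || s.trim().isEmpty();
       }
   }
   ```

   An agent could call `InstanceIdentity.resolve(fileProperties, System.getProperties())` once at startup, so `-Dagent.instance_name=order-service-01` on the command line would override the value in the startup file.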
   
   #### Problem detection
   A client whose registration data is missing is not aware of it; only the server can detect the problem from the data the client reports. Detecting it through the trace interface would cost too much performance, because trace details are large. Therefore, the instance heartbeat interface is used to discover the affected client. However, the heartbeat currently reports only the instance ID, so to give operators a friendly prompt, this interface must be extended with the instance name attribute.
   The server checks the instance ID and the instance name together and writes the affected instance's information to the error log, so that operators can identify the problematic instance.
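
   A sketch of the server-side check, assuming the 6.x heartbeat carries both the instance ID and the new instance name. `HeartbeatChecker`, `Heartbeat`, and the in-memory map standing in for the registration storage are hypothetical; a real server would query its storage instead. Checking the name as well as the ID also catches the case where a lost ID has since been assigned to another client.

   ```java
   import java.util.Map;
   import java.util.Objects;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.logging.Logger;

   /** Hypothetical server-side check of heartbeats against registration data. */
   final class HeartbeatChecker {
       private static final Logger LOG = Logger.getLogger(HeartbeatChecker.class.getName());

       /** Hypothetical 6.x heartbeat payload: the instance ID plus the newly added instance name. */
       static final class Heartbeat {
           final int instanceId;
           final String instanceName;

           Heartbeat(int instanceId, String instanceName) {
               this.instanceId = instanceId;
               this.instanceName = instanceName;
           }
       }

       // instanceId -> registered instance name; stands in for the server's registration storage.
       private final Map<Integer, String> registry = new ConcurrentHashMap<>();

       void register(int instanceId, String instanceName) {
           registry.put(instanceId, instanceName);
       }

       /**
        * Returns true if the heartbeat matches a registered instance. Otherwise logs an error
        * naming the instance, so operators know which client needs its registration reset.
        */
       boolean check(Heartbeat hb) {
           String registeredName = registry.get(hb.instanceId);
           if (registeredName == null) {
               // The ID is unknown: registration data was lost (e.g. the database was cleared).
               LOG.severe("No registration data for instanceId=" + hb.instanceId
                       + ", instanceName=" + hb.instanceName + "; reset this client's registration.");
               return false;
           }
           if (!Objects.equals(registeredName, hb.instanceName)) {
               // The ID now belongs to another instance, e.g. re-assigned after the data loss.
               LOG.severe("instanceId=" + hb.instanceId + " is registered to '" + registeredName
                       + "' but the heartbeat came from '" + hb.instanceName
                       + "'; reset this client's registration.");
               return false;
           }
           return true;
       }
   }
   ```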
   
   #### Issuing the directive
   Because this is a rarely used function, the instruction does not need to be sent to the client through the server; instead, the operator logs in to the client host directly and issues the instruction to reset the registration data there.
   For security reasons, the client must not open a network port to receive instructions, so the instruction is issued through a file that the client scans and listens to.
   To give the operator feedback after the command is issued, the client updates the status information in the file to report the result of the reset command.
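
   A minimal sketch of the file-based directive on the client side, assuming a hypothetical properties file with `command` and `status` keys (for example `command=RESET`). `ResetCommandWatcher` and the `resetRegistration` hook, which would clear the cached IDs and re-register, are illustrative names rather than existing SkyWalking APIs. The sketch polls the file periodically; `java.nio.file.WatchService` would be an alternative way to listen for changes.

   ```java
   import java.io.IOException;
   import java.io.Reader;
   import java.io.Writer;
   import java.nio.charset.StandardCharsets;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.util.Properties;
   import java.util.concurrent.Executors;
   import java.util.concurrent.ScheduledExecutorService;
   import java.util.concurrent.TimeUnit;

   /** Hypothetical client-side watcher for a reset directive written to a local file. */
   final class ResetCommandWatcher {
       static final String COMMAND_KEY = "command";
       static final String STATUS_KEY = "status";

       private final Path commandFile;
       private final Runnable resetRegistration; // hypothetical hook: clear cached IDs, re-register

       ResetCommandWatcher(Path commandFile, Runnable resetRegistration) {
           this.commandFile = commandFile;
           this.resetRegistration = resetRegistration;
       }

       /** One scan: runs a pending RESET once and writes the outcome back. Returns true if it ran. */
       boolean scanOnce() throws IOException {
           if (!Files.isRegularFile(commandFile)) {
               return false;
           }
           Properties props = new Properties();
           try (Reader in = Files.newBufferedReader(commandFile, StandardCharsets.UTF_8)) {
               props.load(in);
           }
           // A missing status counts as pending; DONE or FAILED means the command was already handled.
           boolean pending = "RESET".equalsIgnoreCase(props.getProperty(COMMAND_KEY, "").trim())
                   && "PENDING".equalsIgnoreCase(props.getProperty(STATUS_KEY, "PENDING").trim());
           if (!pending) {
               return false;
           }
           try {
               resetRegistration.run();
               props.setProperty(STATUS_KEY, "DONE");
           } catch (RuntimeException e) {
               props.setProperty(STATUS_KEY, "FAILED: " + e.getMessage());
           }
           // Report the result to the operator by rewriting the same file.
           try (Writer out = Files.newBufferedWriter(commandFile, StandardCharsets.UTF_8)) {
               props.store(out, "updated by agent");
           }
           return true;
       }

       /** Scans the file at a fixed delay on a daemon thread so the business process is not kept alive. */
       ScheduledExecutorService start(long periodSeconds) {
           ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
               Thread t = new Thread(r, "reset-command-watcher");
               t.setDaemon(true);
               return t;
           });
           scheduler.scheduleWithFixedDelay(() -> {
               try {
                   scanOnce();
               } catch (IOException | RuntimeException e) {
                   // Swallow so a bad read/write does not cancel future scans.
               }
           }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
           return scheduler;
       }
   }
   ```

   With this sketch, an operator would write `command=RESET` into the file on the client host; after the next scan the agent rewrites the file with `status=DONE` (or `FAILED: ...`), and later scans ignore it until the operator issues a new command.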
