Posted to notifications@skywalking.apache.org by GitBox <gi...@apache.org> on 2021/05/07 06:21:50 UTC

[GitHub] [skywalking] Ax1an commented on pull request #6863: Fix possible NullPointerException in agent's ES plugin.

Ax1an commented on pull request #6863:
URL: https://github.com/apache/skywalking/pull/6863#issuecomment-834100555


   **Why does NPE1 happen?**
   
   ![image](https://user-images.githubusercontent.com/28091237/117406098-285e4800-af3f-11eb-91fa-c7589ca6e589.png)
   
   First, note that SkyWalking sets its dynamic fields when it enhances the `org.elasticsearch.client.transport.TransportClientNodesService.execute` method.
   
   Second, the exception is thrown in `org.apache.skywalking.apm.plugin.elasticsearch.v6.interceptor.AdapterActionFutureActionGetMethodsInterceptor`, the interceptor that enhances the `org.elasticsearch.action.support.AdapterActionFuture.actionGet` method.
   
   Seeing this exception, I first suspected that `isTrace(objInst)` in `beforeMethod` returned false, so `createLocalSpan` was never called, and that the instance method `actionGet` of `org.elasticsearch.action.support.AdapterActionFuture` then threw an exception during its normal execution, so the interceptor's exception handling ran without a span.
   
   ![image](https://user-images.githubusercontent.com/28091237/117406150-4035cc00-af3f-11eb-927a-12929994d5f1.png)
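
   To make the suspected failure mode concrete, here is a minimal, self-contained Java sketch. Every name in it (`InterceptorNpeSketch`, `Span`, `ACTIVE`, `handleMethodExceptionUnguarded`, `handleMethodExceptionGuarded`) is hypothetical and only models the control flow described above; it is not SkyWalking's interceptor API. It assumes the exception handler reads a per-thread active span that `beforeMethod` only creates when tracing applies.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   /**
    * Hypothetical, simplified model of an interceptor whose beforeMethod only
    * creates a span when tracing applies. These types are NOT SkyWalking's
    * real API; they only illustrate the control flow.
    */
   public class InterceptorNpeSketch {

       /** Stand-in for a tracing span (hypothetical). */
       static final class Span {
           final List<Throwable> errors = new ArrayList<>();

           void log(Throwable t) {
               errors.add(t);
           }
       }

       /** Per-thread "active span", standing in for the agent's tracing context. */
       static final ThreadLocal<Span> ACTIVE = new ThreadLocal<>();

       /** Models beforeMethod: a span is created only when isTrace(...) is true. */
       static void beforeMethod(boolean isTrace) {
           if (isTrace) {
               ACTIVE.set(new Span());
           }
       }

       /** Unguarded handler: assumes beforeMethod created a span. */
       static void handleMethodExceptionUnguarded(Throwable t) {
           ACTIVE.get().log(t); // throws NullPointerException when no span exists
       }

       /** Guarded handler: skips span bookkeeping when there is no span. */
       static void handleMethodExceptionGuarded(Throwable t) {
           Span span = ACTIVE.get();
           if (span != null) {
               span.log(t);
           }
       }

       public static void main(String[] args) {
           Throwable original = new IllegalStateException("connection failure");

           beforeMethod(false); // isTrace(objInst) returned false, so no span
           try {
               handleMethodExceptionUnguarded(original);
           } catch (NullPointerException npe) {
               System.out.println("unguarded: NPE hides " + original.getMessage());
           }

           handleMethodExceptionGuarded(original); // does not throw
           System.out.println("guarded: no NPE");
           ACTIVE.remove();
       }
   }
   ```

   The guarded variant shows one way such a handler could avoid hiding the original exception; it is an illustration, not necessarily the change made in this PR.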
   
   Using Arthas to watch the `org.elasticsearch.action.support.AdapterActionFuture.actionGet()` method, I captured the following exception:
   
   ```
   watch org.elasticsearch.action.support.AdapterActionFuture actionGet "{throwExp}" -e -x 2 -n 2
   
   ts=2021-04-30 17:40:18; [cost=5.923075ms] result=@ArrayList[
       ConnectTransportException[[][10.111.233.68:9300] general node connection failure]; nested: IllegalStateException[Received message from unsupported version: [6.3.2] minimal compatible version is: [6.8.0]];
   	at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener$1.onFailure(TcpTransport.java:957)
   	at org.elasticsearch.transport.TransportHandshaker$HandshakeResponseHandler.handleResponse(TransportHandshaker.java:138)
   	at org.elasticsearch.transport.TransportHandshaker$HandshakeResponseHandler.handleResponse(TransportHandshaker.java:115)
   	at org.elasticsearch.transport.InboundHandler$1.doRun(InboundHandler.java:224)
   	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
   	at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:193)
   	at org.elasticsearch.transport.InboundHandler.handleResponse(InboundHandler.java:216)
   	at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:141)
   	at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:105)
   	at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:660)
   	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:62)
   	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359)
   	... 32 more
   ,
   ]
   ```
   
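   After asking the colleagues involved, I found that they were calling a 6.3.2 ES cluster with the 7.2.1 ES SDK. This matches the error above: the node reports version 6.3.2, which is below the minimal compatible version of 6.8.0.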
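
   As a rough illustration of that handshake rule, the following Java sketch compares the node version against the minimal compatible version, using the values from the exception message. `VersionCheckSketch` and its methods are hypothetical; Elasticsearch performs this check with its own version logic, which is not reproduced here.

   ```java
   import java.util.Arrays;

   /**
    * Hypothetical illustration of the handshake check reported in the log:
    * a node below the client's minimal compatible version is rejected.
    * This is NOT Elasticsearch's implementation; it only reproduces the comparison.
    */
   public class VersionCheckSketch {

       /** Parses "major.minor.patch" into its numeric components. */
       static int[] parse(String version) {
           return Arrays.stream(version.split("\\."))
                   .mapToInt(Integer::parseInt)
                   .toArray();
       }

       /** Compares dotted versions component by component, numerically. */
       static int compare(String a, String b) {
           int[] x = parse(a);
           int[] y = parse(b);
           for (int i = 0; i < Math.min(x.length, y.length); i++) {
               if (x[i] != y[i]) {
                   return Integer.compare(x[i], y[i]);
               }
           }
           return Integer.compare(x.length, y.length);
       }

       /** True if the remote node meets the minimal compatible version. */
       static boolean isCompatible(String remote, String minimalCompatible) {
           return compare(remote, minimalCompatible) >= 0;
       }

       public static void main(String[] args) {
           // Values taken from the exception message captured with Arthas.
           System.out.println(isCompatible("6.3.2", "6.8.0")); // false
       }
   }
   ```

   Since 6.3.2 is below 6.8.0, the check fails, which corresponds to the `ConnectTransportException` captured with Arthas.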
   
   Why does `isTrace(objInst)` in `beforeMethod` return false?
   
   I found that SkyWalking's dynamic fields are set when the `org.elasticsearch.client.transport.TransportClientNodesService.execute()` method runs.
   
   ![image](https://user-images.githubusercontent.com/28091237/117406197-53489c00-af3f-11eb-8038-7fbe23af0e46.png)
   
   However, `org.elasticsearch.client.transport.TransportClientNodesService.execute()` is not always called before `org.elasticsearch.action.support.AdapterActionFuture.actionGet()`.
   
   The NPE occurs on the following execution path:
   
   ![image](https://user-images.githubusercontent.com/28091237/117406251-65c2d580-af3f-11eb-8221-40d479473fe2.png)
   
   `ScheduledNodeSampler` periodically sniffs the cluster nodes without going through `org.elasticsearch.client.transport.TransportClientNodesService.execute()`, so the dynamic fields are never set before `actionGet()` runs.
   
   The NPE does not occur on this execution path:
   
   ![image](https://user-images.githubusercontent.com/28091237/117406295-75dab500-af3f-11eb-9460-62cfdd1dbceb.png)
   
   The `ActionRequestBuilder` source code shows that on this path `org.elasticsearch.action.support.AdapterActionFuture.actionGet()` runs after `org.elasticsearch.client.transport.TransportClientNodesService.execute()`, so the dynamic fields have already been set.
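
   The two paths can be modeled with a small, hypothetical Java sketch. It assumes, as described above, that the field checked by `isTrace(objInst)` is only populated when `execute()` is intercepted. `ExecutionPathSketch`, `EnhancedFuture`, `interceptedExecute`, and the value `"trace-info"` are illustrative stand-ins, not Elasticsearch or SkyWalking code.

   ```java
   /**
    * Hypothetical model of the two execution paths. The "dynamic field" mimics
    * the per-instance field an agent attaches to enhanced objects; all names
    * here are illustrative stand-ins, not Elasticsearch or SkyWalking code.
    */
   public class ExecutionPathSketch {

       /** Stand-in for an enhanced future instance carrying a dynamic field. */
       static final class EnhancedFuture {
           private Object dynamicField; // stays null until execute() is intercepted

           Object getDynamicField() {
               return dynamicField;
           }

           void setDynamicField(Object value) {
               this.dynamicField = value;
           }
       }

       /** Stand-in for the intercepted execute(): populates the dynamic field. */
       static void interceptedExecute(EnhancedFuture future) {
           future.setDynamicField("trace-info");
       }

       /** Stand-in for isTrace(objInst): true only if the field was populated. */
       static boolean isTrace(EnhancedFuture future) {
           return future.getDynamicField() != null;
       }

       /** ActionRequestBuilder-like path: execute() runs before actionGet(). */
       static boolean requestBuilderPath() {
           EnhancedFuture future = new EnhancedFuture();
           interceptedExecute(future);
           return isTrace(future); // the check made when actionGet() is intercepted
       }

       /** ScheduledNodeSampler-like path: actionGet() is reached without execute(). */
       static boolean nodeSamplerPath() {
           EnhancedFuture future = new EnhancedFuture();
           return isTrace(future);
       }

       public static void main(String[] args) {
           System.out.println("request builder path traced: " + requestBuilderPath()); // true
           System.out.println("node sampler path traced: " + nodeSamplerPath()); // false
       }
   }
   ```

   On the sampler-like path `isTrace` returns false, which is the situation in which `createLocalSpan` is skipped.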
   
   So this exception has nothing to do with SkyWalking; it is caused by developers operating ES with an incompatible SDK version.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org