Posted to notifications@skywalking.apache.org by GitBox <gi...@apache.org> on 2019/05/12 00:19:41 UTC

[GitHub] [skywalking] wu-sheng opened a new issue #2653: Improve Zipkin<->SkyWalking integration

URL: https://github.com/apache/skywalking/issues/2653
 
 
   Today's analysis process really can, in a simple scenario, transfer Zipkin spans into a SkyWalking trace. We demoed it before, such as
   
   - https://twitter.com/adrianfcole/status/1007625470886567936
   
   I welcome a deeper discussion on improving this feature. But please note that, as an APM, we have less flexibility in span tags and in the spans of each trace. Meaning, if you want SkyWalking topology, metrics, and alarms, you need raw data that is logically the same.
   
   Right now, the major issues keeping the Zipkin receiver from being production ready are:
   1. Cluster mode is not supported. Zipkin sends traces span by span, but the analysis needs the whole trace to build SkyWalking segments; one segment means all spans of one trace in one process (or thread). However, `ZipkinSkyWalkingTransfer#doTransfer` uses `CacheFactory.INSTANCE`, which uses `CaffeineSpanCache` to cache spans with a timeout mechanism, and `CaffeineSpanCache` is an in-memory cache. In production and cluster mode, this should be a distributed cache, and each trace's timeout should be controlled by one OAP instance; for example, always let the OAP instance that received the root span process the timeout and analyze the whole trace. See the sketch after this list.
   2. `CaffeineSpanCache#onRemoval` is the timeout-based entry point that assumes the whole trace has been received, at which point analysis can begin in `Zipkin2SkyWalkingTransfer`. The transfer process in `SegmentBuilder#build` includes complex steps, which I am not sure are ready and suitable for everyone.
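   For illustration, here is a minimal sketch of that timeout-driven mechanism, assuming `zipkin2.Span` and the Caffeine cache library; `InMemorySpanCache` and `analyzeWholeTrace` are hypothetical names, standing in for `CaffeineSpanCache` and the hand-off into `Zipkin2SkyWalkingTransfer`. It shows why the current design is single-node only: the trace buffer and its expiry both live in one JVM.

   ```java
   import com.github.benmanes.caffeine.cache.Cache;
   import com.github.benmanes.caffeine.cache.Caffeine;
   import com.github.benmanes.caffeine.cache.RemovalCause;
   import zipkin2.Span;

   import java.time.Duration;
   import java.util.List;
   import java.util.concurrent.CopyOnWriteArrayList;

   // Single-node span buffer: collect spans per trace id, and assume a trace
   // is complete once no new span has arrived inside the timeout window.
   public class InMemorySpanCache {

       private final Cache<String, List<Span>> traces = Caffeine.newBuilder()
               // Treat 10s of silence as "the whole trace has arrived".
               .expireAfterWrite(Duration.ofSeconds(10))
               .removalListener((String traceId, List<Span> spans, RemovalCause cause) -> {
                   if (cause == RemovalCause.EXPIRED && spans != null) {
                       analyzeWholeTrace(traceId, spans);
                   }
               })
               .build();

       public void addSpan(Span span) {
           List<Span> spans = traces.get(span.traceId(), id -> new CopyOnWriteArrayList<>());
           spans.add(span);
           traces.put(span.traceId(), spans); // re-put resets the expire-after-write timer
       }

       // Hypothetical hand-off; the real receiver groups these spans into
       // segments via Zipkin2SkyWalkingTransfer and SegmentBuilder#build.
       private void analyzeWholeTrace(String traceId, List<Span> spans) {
           System.out.println("trace " + traceId + " finished with " + spans.size() + " spans");
       }
   }
   ```

   Note that Caffeine expires entries lazily, so a real implementation also needs a `Scheduler` or periodic `cleanUp()` calls for the timeout to fire promptly. And since `traces` is plain JVM memory, a span routed to another OAP instance lands in a different buffer; that is exactly the cluster-mode gap.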
   
   I think you have read https://github.com/SkyAPM/zipkin-skywalking . Actually, that document was written by me and @adriancole back when we were building this feature as step one.
   
   So, you need to fix the above two issues.
   
   Also, when we moved to SkyWalking 6.x, we gained a better solution than analyzing the whole trace.
   
   The following suggestion should give better performance in trace analysis, but it requires you to understand SkyWalking more deeply.
   When `needAnalysis=false` today, the trace/spans are saved in storage already, and query is supported. So, what we have left are topology and metrics, and these two are actually the same thing: **build scopes**. You could read `MultiScopesSpanListener#parseEntry`, `MultiScopesSpanListener#parseExit` and `MultiScopesSpanListener#build`, especially the `#build` method. If you have read the SkyWalking OAL document, you should know that the SkyWalking analysis core receives the scopes.
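   To make "the analysis core receives the scopes" concrete, here is a rough, hypothetical sketch of that flow. None of these class names are SkyWalking's real ones; they only mirror the role of `MultiScopesSpanListener#build`, which hands source objects, one per scope, to the core.

   ```java
   // Simplified stand-ins, not real SkyWalking classes: a span listener does
   // not write metrics itself, it emits one "source" object per scope, and
   // the OAL-generated analysis consumes them.
   interface Source {
       String scopeName();
   }

   // One possible source: an endpoint scope carrying latency and status.
   record EndpointSource(String endpoint, String service, int latencyMs, boolean success)
           implements Source {
       public String scopeName() { return "Endpoint"; }
   }

   interface SourceReceiver {
       void receive(Source source);
   }

   class SpanListenerSketch {
       private final SourceReceiver receiver;

       SpanListenerSketch(SourceReceiver receiver) {
           this.receiver = receiver;
       }

       // Analogous to MultiScopesSpanListener#build: after parseEntry/parseExit
       // have collected the span data, emit the source objects for each scope.
       void build(String endpoint, String service, int latencyMs, boolean success) {
           receiver.receive(new EndpointSource(endpoint, service, latencyMs, success));
       }
   }
   ```

   The point is the direction of the data: listeners parse spans and emit sources, and the metrics and topology fall out of the OAL rules that consume them.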
   In the Zipkin receiver, we could process only the spans having a `remoteEndpoint`, matching each client span <-> server span instead of building the whole trace, and build the service, service instance, endpoint, and relationship scopes. This could significantly reduce the memory cost of the distributed cache in issue (1), but you still need that cache.
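   Below is a minimal sketch of that matching step, again using `zipkin2.Span`. `ServiceRelation` is a simplified stand-in for the real relation sources, and the pairing assumes Zipkin's shared-span model, where the client span and server span of one RPC report the same trace id and span id; a production version also needs parent/child id matching and a timeout to evict unmatched halves.

   ```java
   import zipkin2.Span;

   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;

   // Build relationship scopes from pairs of spans instead of whole traces:
   // only spans carrying a remoteEndpoint matter for topology.
   public class RemoteEndpointMatcher {

       // Simplified stand-in for a relation source handed to the analysis core.
       public record ServiceRelation(String sourceService, String destService) { }

       // Unmatched halves, keyed by traceId/spanId. In a cluster, this map is
       // the (much smaller) distributed cache mentioned in issue 1.
       private final Map<String, Span> pending = new ConcurrentHashMap<>();

       /** Returns a relation once both sides of one RPC are seen, else null. */
       public ServiceRelation accept(Span span) {
           if (span.remoteEndpoint() == null || span.kind() == null) {
               return null; // local/internal spans do not contribute to topology
           }
           String key = span.traceId() + "/" + span.id();
           Span peer = pending.remove(key);
           if (peer == null) {
               pending.put(key, span); // wait for the other side; expire in practice
               return null;
           }
           Span client = span.kind() == Span.Kind.CLIENT ? span : peer;
           Span server = span.kind() == Span.Kind.SERVER ? span : peer;
           return new ServiceRelation(client.localServiceName(), server.localServiceName());
       }
   }
   ```

   Only unmatched halves get buffered, and only for spans carrying a `remoteEndpoint`, which is where the memory saving over whole-trace caching comes from.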
