Posted to notifications@skywalking.apache.org by GitBox <gi...@apache.org> on 2018/12/13 20:27:20 UTC

[GitHub] wu-sheng opened a new issue #2044: [Proposal] Make Zipkin and Jaeger formats analysis product available

URL: https://github.com/apache/incubator-skywalking/issues/2044
 
 
   Now we have a Zipkin receiver (experimental), which can process data and reformat it into the SkyWalking native format. I call it experimental because it is not available in cluster mode and its performance is poor.
   
   Thanks to @peng-yongsheng and the other committers who delivered the new core of SkyWalking v6, I have some new solutions.
   
   The basic principles are:
   1. Deliver trace and span storage based on the native implementations of Zipkin or Jaeger.
   1. Metrics at the service, service instance, and endpoint scopes come from spans directly; no need to wait.
   1. Service relationships are based on spans carrying RPC info, kept in a Redis (or other) cache; set a timer to read and try to match. Dispatch as the relation scope when both sides are ready.
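   The relation-matching step above can be sketched as follows. This is a hypothetical illustration, not actual SkyWalking code: a plain in-memory map stands in for the Redis cache, and a simple string id is assumed to correlate the client and server sides of one RPC.
   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.Optional;

   // Hypothetical sketch of the proposed relation matching: the client-side
   // and server-side spans of the same RPC arrive independently; each half
   // is cached (Redis in the proposal, a Map here) until its peer shows up,
   // then the pair is dispatched as a service relation.
   public class RelationMatcher {
       // key: an RPC correlation id (e.g. traceId + spanId);
       // value: the service name of whichever side arrived first
       private final Map<String, String> pendingClientSide = new HashMap<>();
       private final Map<String, String> pendingServerSide = new HashMap<>();

       /** Returns "client->server" once both sides of the RPC are seen. */
       public Optional<String> onClientSpan(String rpcId, String clientService) {
           String server = pendingServerSide.remove(rpcId);
           if (server != null) {
               return Optional.of(clientService + "->" + server);
           }
           pendingClientSide.put(rpcId, clientService);
           return Optional.empty();
       }

       public Optional<String> onServerSpan(String rpcId, String serverService) {
           String client = pendingClientSide.remove(rpcId);
           if (client != null) {
               return Optional.of(client + "->" + serverService);
           }
           pendingServerSide.put(rpcId, serverService);
           return Optional.empty();
       }
   }
   ```
   In the proposal the timer would periodically sweep this cache and evict half-sides whose peer never arrives.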
   
   Thanks to the modularization and scan-based extension mechanism, I could provide:
   1. A new storage implementation for all metrics and alarms.
   1. A `storage-ext` module with implementations for Zipkin or Jaeger, which take charge of span-based trace persistence.
   1. New query implementations for Zipkin or Jaeger.
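   The module split above could look roughly like the following. The interface name and methods are illustrative assumptions, not actual SkyWalking v6 APIs: the core storage keeps metrics, alarms, and records, while a per-format `storage-ext` module owns span persistence in the tracer's native encoding.
   ```java
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   // Hypothetical extension point for span-based trace persistence;
   // a real module would back this with Zipkin's or Jaeger's own schema.
   interface TraceStorageExt {
       void persistSpan(String traceId, byte[] nativeEncodedSpan);
       List<byte[]> queryTrace(String traceId);
   }

   // Minimal in-memory implementation standing in for a Zipkin- or
   // Jaeger-native backend, just to show the contract.
   class InMemoryTraceStorage implements TraceStorageExt {
       private final Map<String, List<byte[]>> traces = new HashMap<>();

       public void persistSpan(String traceId, byte[] nativeEncodedSpan) {
           traces.computeIfAbsent(traceId, k -> new ArrayList<>())
                 .add(nativeEncodedSpan);
       }

       public List<byte[]> queryTrace(String traceId) {
           return traces.getOrDefault(traceId, Collections.emptyList());
       }
   }
   ```
   The query implementations in point 3 would then read through the same extension instead of the SkyWalking segment model.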
   
   The benefits of doing this are:
   1. Cluster mode is easier to support. (Important for production readiness)
   1. Lower cache requirements in the backend and Redis. (Important for production readiness)
   1. We don't need to depend on the SkyWalking native span/segment data structure to run the analysis for Zipkin and Jaeger. (Reduces the chance of hitting bugs in the format adaptors)
   1. Trace storage can follow the native Zipkin and Jaeger schemas.
   1. Storage is still switchable; the old code for metrics, alarms, and records is still useful.
   
   
   The ideal result could be shown in `application.yml`, which should look like this:
   ```yaml
   storage:
     elasticsearch:
       nameSpace: ${SW_NAMESPACE:""}
       clusterNodes: ${SW_STORAGE_ES_CLUSTER_NODES:localhost:9200}
       indexShardsNumber: ${SW_STORAGE_ES_INDEX_SHARDS_NUMBER:2}
       indexReplicasNumber: ${SW_STORAGE_ES_INDEX_REPLICAS_NUMBER:0}
       # Batch process setting, refer to https://www.elastic.co/guide/en/elasticsearch/client/java-api/5.5/java-docs-bulk-processor.html
       bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:2000} # Execute the bulk every 2000 requests
       bulkSize: ${SW_STORAGE_ES_BULK_SIZE:20} # flush the bulk every 20mb
       flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:10} # flush the bulk every 10 seconds whatever the number of requests
       concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of concurrent requests
   storage-jaeger-ext:
     default:
   receiver_jaeger:
     default:
       parameters: xxxx
   query:
     graphql-jaeger:
       path: ${SW_QUERY_GRAPHQL_PATH:/graphql}
   ```
   
   I hope I haven't missed anything at a high level. @peng-yongsheng @JaredTan95 @YunaiV @candyleer Welcome to join the discussion.
   
   
