Posted to notifications@skywalking.apache.org by GitBox <gi...@apache.org> on 2020/07/03 07:23:27 UTC

[GitHub] [skywalking] EvanLjp opened a new issue #5023: trace query performance

EvanLjp opened a new issue #5023:
URL: https://github.com/apache/skywalking/issues/5023


   Please answer these questions before submitting your issue.
   
   - Why do you submit this issue?
   - [ ] Question or discussion
   The trace query is a range query on timeBucket.
   If the trace segment data is kept for 7 days
   and the amount of data we receive every day is huge,
   the query performance will be very poor.
   
   The reason:
   The SkyWalking trace query only considers common coding style rather than query performance in ES.
   
   How to improve:
   Suppose a query from the front end has a time bucket between 20200627000238 and 20200627000438.
   
   If the index is split by day,
   
   the data must be contained in the 20200627 index, so we only need to query that index rather than all indexes.
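
As a rough illustration of the idea above, the sketch below maps a timeBucket range to the day-level indexes that cover it. It assumes second-precision time buckets (yyyyMMddHHmmss, as in the example) and a hypothetical index naming scheme of `segment-` plus the day; neither the prefix nor the function name comes from the SkyWalking code base.

```python
from datetime import datetime, timedelta
from typing import List


def daily_indexes(start_bucket: int, end_bucket: int, prefix: str = "segment-") -> List[str]:
    """Return the day-suffixed index names covering [start_bucket, end_bucket].

    The "segment-" prefix is a placeholder, not SkyWalking's actual naming.
    """
    fmt = "%Y%m%d%H%M%S"
    start_day = datetime.strptime(str(start_bucket), fmt).date()
    end_day = datetime.strptime(str(end_bucket), fmt).date()
    if end_day < start_day:
        raise ValueError("end_bucket is earlier than start_bucket")
    span = (end_day - start_day).days
    # One index per calendar day touched by the range, oldest first.
    return [prefix + (start_day + timedelta(days=i)).strftime("%Y%m%d") for i in range(span + 1)]


print(daily_indexes(20200627000238, 20200627000438))
```

For the range in the example only one index name is produced; a range that crosses midnight produces one name per day it touches.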
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking] EvanLjp edited a comment on issue #5023: trace query performance

Posted by GitBox <gi...@apache.org>.
EvanLjp edited a comment on issue #5023:
URL: https://github.com/apache/skywalking/issues/5023#issuecomment-655671523


   _Basically, you need to consider when you do trace_id query, there is no timestamp._
   
   Why not add a time key for the backend?
   
   I read the [PR](https://github.com/apache/skywalking/pull/4863), and I don't think parsing the time from the traceId is a good idea; consider, for example, the traceId in the nginx Lua agent.
   So my advice:
   1. Normal query: search only the indexes whose time falls within [queryStartTime - 1h, queryEndTime + 1h].
   2. Add a time key when querying by traceID.
   3. If advice 2 is not implemented, a short-term plan: with 7 segment indexes, from 7.1 to 7.7, we can search each index in turn from 7.7 back to 7.1. If one returns hits, we also search the previous index (in case the trace crosses a day boundary).
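
A minimal sketch of advice 1 and advice 3 from the list above, under stated assumptions: `padded_range`, `find_trace`, and the `search` callback are hypothetical names introduced for illustration, and `search(index, trace_id)` stands in for whatever call actually queries a single index. It is not taken from the SkyWalking code base.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterable, List, Sequence, Tuple

PAD = timedelta(hours=1)  # the one-hour margin from advice 1


def padded_range(query_start: datetime, query_end: datetime) -> Tuple[datetime, datetime]:
    """Advice 1: widen the query window by one hour on each side."""
    return query_start - PAD, query_end + PAD


def find_trace(trace_id: str,
               indexes_newest_first: Sequence[str],
               search: Callable[[str, str], Iterable[str]]) -> List[str]:
    """Advice 3: scan day indexes from newest to oldest and stop at the first hit.

    On a hit, the next older index is searched as well, in case the trace
    crosses a day boundary.
    """
    for pos, index in enumerate(indexes_newest_first):
        hits = list(search(index, trace_id))
        if hits:
            if pos + 1 < len(indexes_newest_first):
                hits.extend(search(indexes_newest_first[pos + 1], trace_id))
            return hits
    return []
```

The sequential scan trades latency for fewer index hits; it touches at most one index beyond the first one that matches, but in the worst case it still visits every index one after another.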
   





[GitHub] [skywalking] wu-sheng commented on issue #5023: trace query performance

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #5023:
URL: https://github.com/apache/skywalking/issues/5023#issuecomment-653404661


   Yes, logically, it is. But the change wouldn't be that easy. Someone tried to do this in #4863, but that work has stopped. Basically, you need to consider when you do trace_id query, there is no timestamp.





[GitHub] [skywalking] wu-sheng commented on issue #5023: trace query performance

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #5023:
URL: https://github.com/apache/skywalking/issues/5023#issuecomment-661860674


   > If you want to optimize the query when timestamps exist, I think it is possible.
   
   I think we should.
   
   > shard_num = (hash(_routing) + hash(_id) % routing_partition_size) % num_primary_shards
   
   What is `_id`? Trace id? Why do you want to do this? Query specific shard(s) in the trace_id query case? 





[GitHub] [skywalking] EvanLjp commented on issue #5023: trace query performance

Posted by GitBox <gi...@apache.org>.
EvanLjp commented on issue #5023:
URL: https://github.com/apache/skywalking/issues/5023#issuecomment-661856066


   @wu-sheng I have a new idea to speed up ES queries by traceId:
   1. When writing, we can compute the shard number by a rule such as:
      shard_num = (hash(_routing) + hash(_id) % routing_partition_size) % num_primary_shards
   2. When reading, we can compute which shards hold the data and fetch data only from those shards.
   
   What do you think about this idea?
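
To make the formula above concrete, here is a small sketch that evaluates it exactly as quoted in this thread and derives the set of shards a reader would need to query for one routing value. `zlib.crc32` is only a deterministic stand-in for the hash; Elasticsearch uses its own hash function internally, so the shard numbers here will not match a real cluster. The function names are hypothetical, and using the trace id as the routing value is an assumption based on the discussion.

```python
import zlib
from typing import List


def stable_hash(value: str) -> int:
    """Deterministic stand-in hash; NOT the hash Elasticsearch uses."""
    return zlib.crc32(value.encode("utf-8"))


def shard_num(routing: str, doc_id: str,
              routing_partition_size: int, num_primary_shards: int) -> int:
    """Evaluate the formula as quoted in the thread."""
    return (stable_hash(routing)
            + stable_hash(doc_id) % routing_partition_size) % num_primary_shards


def candidate_shards(routing: str,
                     routing_partition_size: int, num_primary_shards: int) -> List[int]:
    """Shards that can hold any document written with this routing value."""
    base = stable_hash(routing)
    return sorted({(base + k) % num_primary_shards
                   for k in range(routing_partition_size)})


# Example: route by trace id (an assumption), 5 primary shards, partition size 2.
print(candidate_shards("trace-1", 2, 5))
```

With a partition size of 1 every document for a routing value lands on a single shard; larger partition sizes spread the documents over more shards, and a reader has to query all of the candidates.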





[GitHub] [skywalking] EvanLjp commented on issue #5023: trace query performance

Posted by GitBox <gi...@apache.org>.
EvanLjp commented on issue #5023:
URL: https://github.com/apache/skywalking/issues/5023#issuecomment-661874512


   Here are some docs: https://zhuanlan.zhihu.com/p/94604871





[GitHub] [skywalking] EvanLjp commented on issue #5023: trace query performance

Posted by GitBox <gi...@apache.org>.
EvanLjp commented on issue #5023:
URL: https://github.com/apache/skywalking/issues/5023#issuecomment-653405413


   Okay, I would like to read the issue and try to solve it later.





[GitHub] [skywalking] wu-sheng commented on issue #5023: trace query performance

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #5023:
URL: https://github.com/apache/skywalking/issues/5023#issuecomment-674320641


   This has been fixed in a different way in #5132. Queries with a time range can now land on the specific indexes directly.





[GitHub] [skywalking] wu-sheng commented on issue #5023: trace query performance

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #5023:
URL: https://github.com/apache/skywalking/issues/5023#issuecomment-655815846


   > i read thePR ,i don't think parse time from traceId is a good idea.for example the traceId in nginx lua agent.
   
   No, that is not a good idea.
   
   > Basically, you need to consider when you do trace_id query, there is no timestamp.
   
   Because finding the timestamp may be a problem. When the trace id is injected into logs, or even into code or a database, there may be no timestamp alongside it. The trace query page also supports `?traceId=xxx` to get the result directly.
   
   > if the advice2 is not implement, Short term plan: if have 7 segment index, from 7.1 to 7.7 ,we can search every index from 7.7 to 7.1 .if hits some data,we also to search the prefix one (when cross day) .
   
   This is the same as using an alias query, which is better because it queries the indexes in parallel.
   
   -------
   Basically, in the `trace_id` query case, I think we still need the current query mode. If you want to optimize the query when timestamps exist, I think it is possible.





[GitHub] [skywalking] EvanLjp commented on issue #5023: trace query performance

Posted by GitBox <gi...@apache.org>.
EvanLjp commented on issue #5023:
URL: https://github.com/apache/skywalking/issues/5023#issuecomment-661864207


   > I think we should.
   
   Already done. Please read the PR.
   
   
   
   > What is `_id`? Trace id? Why do you want to do this? Query specific shard(s) in the trace_id query case?
   
   Maybe this is a way to optimize the performance of queries by traceId.





[GitHub] [skywalking] EvanLjp edited a comment on issue #5023: trace query performance

Posted by GitBox <gi...@apache.org>.
EvanLjp edited a comment on issue #5023:
URL: https://github.com/apache/skywalking/issues/5023#issuecomment-655671523


   _Basically, you need to consider when you do trace_id query, there is no timestamp._
   
   Why not add a time key for the backend?
   
   I read the [PR](https://github.com/apache/skywalking/pull/4863), and I don't think parsing the time from the traceId is a good idea; consider, for example, the traceId in the nginx Lua agent.
   So my advice:
   1. Trace query with a time range: search only the indexes whose time falls within [queryStartTime - 1h, queryEndTime + 1h].
   2. Trace query without a time range: add a time key when querying by traceID.
   3. If advice 2 is not implemented, a short-term plan: with 7 segment indexes, from 7.1 to 7.7, we can search each index in turn from 7.7 back to 7.1. If one returns hits, we also search the previous index (in case the trace crosses a day boundary).
   





[GitHub] [skywalking] wu-sheng closed issue #5023: trace query performance

Posted by GitBox <gi...@apache.org>.
wu-sheng closed issue #5023:
URL: https://github.com/apache/skywalking/issues/5023


   





[GitHub] [skywalking] EvanLjp commented on issue #5023: trace query performance

Posted by GitBox <gi...@apache.org>.
EvanLjp commented on issue #5023:
URL: https://github.com/apache/skywalking/issues/5023#issuecomment-655671523


   _Basically, you need to consider when you do trace_id query, there is no timestamp._
   
   Why not add a time key for the backend?
   
   I read the [PR](https://github.com/apache/skywalking/pull/4863), and I don't think parsing the time from the traceId is a good idea; consider, for example, the traceId in the nginx Lua agent.
   So my advice:
   1. Normal query: search only the indexes whose time falls within [queryStartTime - 1h, queryEndTime + 1h].
   2. Add a time key when querying by traceID.
   3. If advice 2 is not implemented, a short-term plan: with 7 segment indexes, from 7.1 to 7.7, we can search each index in turn from 7.7 back to 7.1. If one returns hits, we also search the previous index (in case the trace crosses a day boundary).
   




