You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@skywalking.apache.org by GitBox <gi...@apache.org> on 2022/11/23 01:23:54 UTC

[GitHub] [skywalking] wu-sheng opened a new issue, #10008: [Bug] Incorrect Timestamp used in BanyanDB

wu-sheng opened a new issue, #10008:
URL: https://github.com/apache/skywalking/issues/10008

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/skywalking/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Apache SkyWalking Component
   
   BanyanDB (apache/skywalking-banyandb)
   
   ### What happened
   
   BanyanDB used the measure/stream receiving timestamp as its timestamp.
   When in the query stage, there are several cases, in which you can't read the data correctly.
   1. OAP has 20-30s delay to build bulk data, so when do T1-T10 metrics query, you always can't T10 metrics correctly. Because the latest T10 value is updated in T10 + N seconds
   2. Similar to <1>, the traces and logs have a delay in persistent. So, you always can't get the result correctly. And more importantly, the UI is showing the start time of the traces/logs, you would see traces and logs not in the query condition. This would be treated as bugs.
   
   ### What you expected to happen
   
   We should use OAP persistent kernel to assign the timestamp for persistent, rather than using a timestamp by default.
   
   ### How to reproduce
   
   Generate a trace/metric/log, and put them into the BanyanDB several minutes later, but query in the trace running duration.
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [Bug] Incorrect Timestamp used in BanyanDB [skywalking]

Posted by "hanahmily (via GitHub)" <gi...@apache.org>.
hanahmily closed issue #10008: [Bug] Incorrect Timestamp used in BanyanDB
URL: https://github.com/apache/skywalking/issues/10008


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] wu-sheng commented on issue #10008: [Bug] Incorrect Timestamp used in BanyanDB

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #10008:
URL: https://github.com/apache/skywalking/issues/10008#issuecomment-1324591092

   There are at least two conclusions based on today's discussion
   1. Timestamp should be added as a field of Record in the OAP. All record implementation should provide an accurate timestamp. @wankai123 
   2. BanyanDB server should maintain a snowflake-style ID generator. A record ID should constitute by `record name` + `timestamp`(not time bucket) + server-generated suffix. The server-generated suffix should be held for every sharding ID in every several seconds/minutes(time window). 
   
   The reason for setting the rolling time bucket is to avoid conflict when a new record carries an old timestamp, and the suffix ID just rolls back to 0.
   
   - Trace ID-1, Service A, Timestamp-1, suffix-0
   - Trace ID-2, Service A, Timestamp-2, suffix-1
   - Trace ID-3, Service A, Timestamp-3, suffix-2
   ...
   - Trace ID-1000, Service A, Timestamp-1000, suffix-1000
   - Trace ID-1001, Service A, Timestamp-1, suffix-0
   
   `Trace ID-1` would be overridden by `Trace ID-1001` unexpectedly. The suffix generator should make sure it would not roll back to zero in the given time window. Also, the time window should not be too small to make too much payload to hold many generators.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [Bug] Incorrect Timestamp used in BanyanDB [skywalking]

Posted by "wu-sheng (via GitHub)" <gi...@apache.org>.
wu-sheng commented on issue #10008:
URL: https://github.com/apache/skywalking/issues/10008#issuecomment-1876143253

   @hanahmily I think it is fine to pick the latest data for a specific data point. Is this solution majorly for measure data? The repeatable write for a certain data point should only happen for measure data, right?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] hanahmily commented on issue #10008: [Bug] Incorrect Timestamp used in BanyanDB

Posted by GitBox <gi...@apache.org>.
hanahmily commented on issue #10008:
URL: https://github.com/apache/skywalking/issues/10008#issuecomment-1324476863

   BanyanDB leverage the below timestamp fields defined in the writing spec to identify the data.
   
   1. The metrics use a "time bucket" to set this field
   2. Trace segment's timestamp should be picked up.
   
   We need more investigations on this issue. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [Bug] Incorrect Timestamp used in BanyanDB [skywalking]

Posted by "hanahmily (via GitHub)" <gi...@apache.org>.
hanahmily commented on issue #10008:
URL: https://github.com/apache/skywalking/issues/10008#issuecomment-1876168241

   > @hanahmily I think it is fine to pick the latest data for a specific data point. Is this solution majorly for measure data? The repeatable write for a certain data point should only happen for measure data, right?
   
   Exactly. We only deduplicate data for the measure for now. FWIK, the repeatable data only happens in the measure data flushing in the same time bucket. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [Bug] Incorrect Timestamp used in BanyanDB [skywalking]

Posted by "hanahmily (via GitHub)" <gi...@apache.org>.
hanahmily commented on issue #10008:
URL: https://github.com/apache/skywalking/issues/10008#issuecomment-1876054109

   Now, the column-based engine has fixed the problem mentioned differently:
   
   The Measure engine now assigns a version number to each part, so every bulk flush of OAP will be identified by this version. The version number will be larger for T10 than T1. In case of any duplicate data points, T10 will take precedence and T1 will be ignored. The deduplication process takes place during the querying and merging process.
   
   The stream engine now has the capability to store duplicated data. If two elements with identical timestamps are present in the same series, they will both be saved and can be retrieved by the client. Currently, we do not support any deduplication technology (although some other database systems do offer this feature during querying). However, we may consider adding support for it in the future.
   
   @wu-sheng Would you review the solution?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [Bug] Incorrect Timestamp used in BanyanDB [skywalking]

Posted by "hanahmily (via GitHub)" <gi...@apache.org>.
hanahmily commented on issue #10008:
URL: https://github.com/apache/skywalking/issues/10008#issuecomment-1962265770

   Since the column-based engine is merged, I closed this issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org