Posted to notifications@skywalking.apache.org by GitBox <gi...@apache.org> on 2021/04/22 03:21:16 UTC

[GitHub] [skywalking] zhyyu opened a new issue #6803: service cpm may not accurate when deploy multi L2 aggregators with enableDatabaseSession on

zhyyu opened a new issue #6803:
URL: https://github.com/apache/skywalking/issues/6803


   Please answer these questions before submitting your issue.
   
   - Why do you submit this issue?
   - [x] Question or discussion
   - [ ] Bug
   - [ ] Requirement
   - [ ] Feature or performance improvement
   
   ___
   
   ### Question
   
   - What do you want to know?
     - I reviewed the code and found that if two L2 aggregators call `MetricsPersistentWorker#flushDataToStorage` at the same time, the final cpm data may be inaccurate. Example steps with two L2 aggregators, backends A and B:
   
   ```mermaid
   sequenceDiagram
   
   participant L2_A
   participant L2_A_context_cache
   participant L2_B
   participant L2_B_context_cache
   participant persistent_layer
   
   L2_A ->> persistent_layer: MetricsPersistentWorker.loadFromStorage return metrics (demoservice value 2)
   L2_A ->> L2_A_context_cache: load persistent metrics to cache
   
   L2_B ->> persistent_layer: MetricsPersistentWorker.loadFromStorage return metrics (demoservice value 2)
   L2_B ->> L2_B_context_cache: load persistent metrics to cache
   
   L2_A ->> L2_A_context_cache: get cache metrics
   L2_A ->> L2_A: combine current metrics(value 2) with cache metrics, calculate it (value 4) 4 = 2 + 2
   L2_A ->> persistent_layer: update metrics to (demoservice value 4)
   
   L2_B ->> L2_B_context_cache: get cache metrics
   L2_B ->> L2_B: combine current metrics(value 2) with cache metrics, calculate it (value 4) 4 = 2 + 2
   L2_B ->> persistent_layer: update metrics to (demoservice value 4)
   
   L2_A ->> L2_A_context_cache: get cache metrics
   L2_A ->> L2_A: combine current metrics(value 3) with cache metrics, calculate it (value 7) 7 = 4 + 3
   L2_A ->> persistent_layer: update metrics to (demoservice value 7)
   
   L2_B ->> L2_B_context_cache: get cache metrics
   L2_B ->> L2_B: combine current metrics(value 3) with cache metrics, calculate it (value 7) 7 = 4 + 3
   L2_B ->> persistent_layer: update metrics to (demoservice value 7)
   ```
   
   - In the steps above, when `PersistenceTimer.extractDataAndSave` runs interleaved on L2_A and L2_B with enableDatabaseSession = true, the final result is 7, but the accurate result should be 12 = 2 (initial) + (2 + 3) from L2_A + (2 + 3) from L2_B.
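
The interleaving described above can be replayed in a few lines. This is only a minimal sketch under stated assumptions: `LostUpdateSketch`, `L2Node`, `loadFromStorage` and `flush` are hypothetical stand-ins rather than SkyWalking classes, and a plain `HashMap` plays the role of the persistent layer. Each node reads the stored value once, keeps it in a local cache, and overwrites storage with its own merged value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of the interleaving above; none of these names are SkyWalking APIs.
class LostUpdateSketch {

    // Stand-in for one L2 aggregator with a session-style cache in front of storage.
    static final class L2Node {
        private final Map<String, Long> cache = new HashMap<>();
        private final Map<String, Long> storage;

        L2Node(Map<String, Long> storage) {
            this.storage = storage;
        }

        // Read the persisted value once and keep it in the local cache.
        void loadFromStorage(String id) {
            cache.put(id, storage.getOrDefault(id, 0L));
        }

        // Merge the new increment with the cached value, then overwrite storage.
        void flush(String id, long increment) {
            long merged = cache.get(id) + increment;
            cache.put(id, merged);
            storage.put(id, merged);
        }
    }

    // Replays the sequence from the diagram and returns the final persisted value.
    static long replayInterleaved() {
        Map<String, Long> storage = new HashMap<>();
        storage.put("demoservice", 2L);
        L2Node a = new L2Node(storage);
        L2Node b = new L2Node(storage);

        a.loadFromStorage("demoservice"); // A caches 2
        b.loadFromStorage("demoservice"); // B caches 2
        a.flush("demoservice", 2);        // A writes 4
        b.flush("demoservice", 2);        // B writes 4, so one increment is lost
        a.flush("demoservice", 3);        // A writes 7 from its stale cache
        b.flush("demoservice", 3);        // B writes 7 from its stale cache
        return storage.get("demoservice");
    }

    // Sum that a single writer would have produced from the same increments.
    static long expectedTotal() {
        return 2 + (2 + 3) + (2 + 3);
    }

    public static void main(String[] args) {
        System.out.println("persisted=" + replayInterleaved() + ", expected=" + expectedTotal());
    }
}
```

In this sketch the persisted value ends at 7 while the increments sum to 12, which matches the numbers in the diagram. The loss only appears when both nodes write the same metric id.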
   
   ___
   
   ### Requirement or improvement
   
   - Please describe your requirements or improvement suggestions.
     - If I deploy multiple backends in mixed role (multiple L2 aggregators), is it right to turn enableDatabaseSession off to avoid inaccurate service cpm?
     - Turning enableDatabaseSession off is only a best-effort mitigation: when multiple L2 aggregators query the persisted metrics at the same time, the cpm will still be inaccurate (like code that is not thread-safe).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking] zhyyu commented on issue #6803: service cpm may not accurate when deploy multi L2 aggregators with enableDatabaseSession on

Posted by GitBox <gi...@apache.org>.
zhyyu commented on issue #6803:
URL: https://github.com/apache/skywalking/issues/6803#issuecomment-826600536


   Hi, I finally found out why a specific metrics record is persisted by only one specific L2 OAP node. When `RemoteSenderService#send` is called, the L1 OAP hashes the metrics data (for example, EndpointCpm is hashed by timeBucket and entityId). The hash result is deterministic, so L1 chooses the same L2 OAP node every time.
   
   Because each metrics record is persisted by only one L2 OAP node, there is no race condition.
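
The routing idea described above can be sketched as follows. This is an illustration under assumptions, not the actual selection logic behind `RemoteSenderService#send`: `HashRoutingSketch` and `selectNode` are hypothetical names, and `Objects.hash` with a modulo over the node count stands in for whatever hashing the OAP really uses. The point is only that the same (timeBucket, entityId) pair always maps to the same node index.

```java
import java.util.Objects;

// Hypothetical sketch of hash-based routing; not the SkyWalking implementation.
class HashRoutingSketch {

    // Maps a metrics key to an index into the list of L2 nodes.
    static int selectNode(long timeBucket, String entityId, int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        int hash = Objects.hash(timeBucket, entityId);
        // floorMod keeps the index non-negative even when the hash is negative.
        return Math.floorMod(hash, nodeCount);
    }

    public static void main(String[] args) {
        int first = selectNode(202104220321L, "demoservice", 2);
        int second = selectNode(202104220321L, "demoservice", 2);
        System.out.println("same node on repeated calls: " + (first == second));
    }
}
```

In this sketch the guarantee holds only if every caller uses the same node count and the same node ordering, which is consistent with the remark elsewhere in this thread that two writers for one entity indicate a cluster management failure.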





[GitHub] [skywalking] zhyyu edited a comment on issue #6803: service cpm may not accurate when deploy multi L2 aggregators with enableDatabaseSession on

Posted by GitBox <gi...@apache.org>.
zhyyu edited a comment on issue #6803:
URL: https://github.com/apache/skywalking/issues/6803#issuecomment-824513838


   > If one entity under the same metric name is found on 2 OAP nodes, your cluster management has failed.
   > There should be no dirty write, as there is no race condition.
   
   So in order to get cluster management right, should I deploy just one machine in charge of L2 aggregation?
   
   https://skywalking.apache.org/docs/main/latest/en/setup/backend/advanced-deployment/





[GitHub] [skywalking] wu-sheng commented on issue #6803: service cpm may not accurate when deploy multi L2 aggregators with enableDatabaseSession on

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #6803:
URL: https://github.com/apache/skywalking/issues/6803#issuecomment-824509866


   If one entity under the same metric name is found on 2 OAP nodes, your cluster management has failed.
   There should be no dirty write, as there is no race condition.





[GitHub] [skywalking] wu-sheng closed issue #6803: service cpm may not accurate when deploy multi L2 aggregators with enableDatabaseSession on

Posted by GitBox <gi...@apache.org>.
wu-sheng closed issue #6803:
URL: https://github.com/apache/skywalking/issues/6803


   





[GitHub] [skywalking] wu-sheng commented on issue #6803: service cpm may not accurate when deploy multi L2 aggregators with enableDatabaseSession on

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #6803:
URL: https://github.com/apache/skywalking/issues/6803#issuecomment-824557813


   > So in order to get cluster management right, should I deploy just one machine in charge of L2 aggregation?
   
   No. The cause is that your cluster mode is not deployed correctly. You may be using `0.0.0.0` as the IP of every OAP node, or something similar.
   
   Try the health check: https://skywalking.apache.org/docs/main/latest/en/setup/backend/backend-health-check/





[GitHub] [skywalking] zhyyu commented on issue #6803: service cpm may not accurate when deploy multi L2 aggregators with enableDatabaseSession on

Posted by GitBox <gi...@apache.org>.
zhyyu commented on issue #6803:
URL: https://github.com/apache/skywalking/issues/6803#issuecomment-824513838


   > If one entity under the same metric name is found on 2 OAP nodes, your cluster management has failed.
   > There should be no dirty write, as there is no race condition.
   
   So in order to get cluster management right, should I deploy just one machine in charge of L2 aggregation?

