You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@skywalking.apache.org by GitBox <gi...@apache.org> on 2022/10/14 09:52:52 UTC
[GitHub] [skywalking] yangyiweigege opened a new issue, #9788: [Bug] Alarm Rule Warn is not accurate
yangyiweigege opened a new issue, #9788:
URL: https://github.com/apache/skywalking/issues/9788
### Search before asking
- [X] I had searched in the [issues](https://github.com/apache/skywalking/issues?q=is%3Aissue) and found no similar issues.
### Apache SkyWalking Component
OAP server (apache/skywalking)
### What happened
when i use the Alarm Rule to monitor metrics,i find the alarm rule is not accurately.
### What you expected to happen
hope zhe alarm rule is accurately.
### How to reproduce
all
### Anything else
when i find the alarm rule is not accurately, i read the source code and add logs in org.apache.skywalking.oap.server.core.alarm.provider.RunningRule.Window#isMatch like this
![image](https://user-images.githubusercontent.com/23202824/195814238-9ab2fef0-ca27-42fc-9748-51b50d166f82.png)
i find the same bucket metrics in the windows.
so i analysis the code of org.apache.skywalking.oap.server.core.alarm.provider.RunningRule.Window#add,
![image](https://user-images.githubusercontent.com/23202824/195815975-95dddc2b-8ac4-4a93-9a62-10af3d20326e.png)
the endtime in this code will have seconds data, such as:20221014174301,Assuming that:
endtime is 20221014174301
two metrics bucket is [202210141743, 202210141744]
the code minutes will all return 0, and set the tow metrics in the same index.
```
int minutes = Minutes.minutesBetween(timeBucket, this.endTime).getMinutes();
this.values.set(values.size() - minutes - 1, metrics);
```
so it will cause the windows data error.
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [skywalking] yangyiweigege commented on issue #9788: [Bug] Alarm Rule Warn is not accurate
Posted by GitBox <gi...@apache.org>.
yangyiweigege commented on issue #9788:
URL: https://github.com/apache/skywalking/issues/9788#issuecomment-1278992963
>
yes, i add the logs in org.apache.skywalking.oap.server.core.alarm.provider.RunningRule.Window#isMatch, print all metrics buckets in windows, sometimes i can see the log like this:
![75F43DB2-52F7-455B-8990-6E6C378D1D68](https://user-images.githubusercontent.com/23202824/195855593-d0de7e77-25ec-4089-898c-ad8fb59d1985.png)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [skywalking] wu-sheng commented on issue #9788: [Bug] Alarm Rule Warn is not accurate
Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #9788:
URL: https://github.com/apache/skywalking/issues/9788#issuecomment-1278897079
All time buckets for alerting and moving windows should be in minutes actually.
I am not sure why this happens, could you explain more?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [skywalking] wu-sheng commented on issue #9788: [Bug] Alarm Rule Warn is not accurate
Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #9788:
URL: https://github.com/apache/skywalking/issues/9788#issuecomment-1279182456
OK, sorry for misleading, I just wanted to mention using withMinute for better readability. You could check whether we have with millisecond.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [skywalking] wu-sheng commented on issue #9788: [Bug] Alarm Rule Warn is not accurate
Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #9788:
URL: https://github.com/apache/skywalking/issues/9788#issuecomment-1279007730
I would like to recommend you change in another way but with similar logic,
`AlarmCore#start`(L58), set `LocalDateTime checkTime = LocalDateTime.now();`'s second to `0`. Because generally, Line 75(`lastExecuteTime = checkTime.minusSeconds(checkTime.getSecondOfMinute());`) has executed the same logic.
And we could keep this in the same place.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [skywalking] wu-sheng commented on issue #9788: [Bug] Alarm Rule Warn is not accurate
Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #9788:
URL: https://github.com/apache/skywalking/issues/9788#issuecomment-1278892063
I am not 100% following, when would the `endTime` be in the second?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [skywalking] yangyiweigege commented on issue #9788: [Bug] Alarm Rule Warn is not accurate
Posted by GitBox <gi...@apache.org>.
yangyiweigege commented on issue #9788:
URL: https://github.com/apache/skywalking/issues/9788#issuecomment-1279033475
> Are you going to submit a pull request to fix?
yes。 i will subimit a pull request tommorrow.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [skywalking] yangyiweigege commented on issue #9788: [Bug] Alarm Rule Warn is not accurate
Posted by GitBox <gi...@apache.org>.
yangyiweigege commented on issue #9788:
URL: https://github.com/apache/skywalking/issues/9788#issuecomment-1279159577
> yes , this is better for read. but one point should be noticed,we must remove the second and millisSecond both
>
> ```java
> final LocalDateTime target = targetTime.withSecondOfMinute(0).withMillisOfSecond(0);
> windows.values().forEach(window -> window.moveTo(target));
> ```
>
> <img alt="image" width="995" src="https://user-images.githubusercontent.com/23202824/195883672-c458a7cd-9504-4eb0-b09c-09d46e9baaa5.png">
if only remove seconds, this code also have bug:
```java
int minutes = Minutes.minutesBetween(endTime, current).getMinutes();
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [skywalking] wu-sheng commented on issue #9788: [Bug] Alarm Rule Warn is not accurate
Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #9788:
URL: https://github.com/apache/skywalking/issues/9788#issuecomment-1279077772
> I would like to recommend you change in another way but with similar logic,
>
> `AlarmCore#start`(L58), set `LocalDateTime checkTime = LocalDateTime.now();`'s second to `0`. Because generally, Line 75(`lastExecuteTime = checkTime.minusSeconds(checkTime.getSecondOfMinute());`) has executed the same logic. And we could keep this in the same place.
Sorry, this fix is wrong. I think yours is better, but I think using `withSecondOfMinute` is better for read?
```java
public void moveTo(LocalDateTime targetTime) {
final LocalDateTime target = targetTime.withSecondOfMinute(0);
windows.values().forEach(window -> window.moveTo(target));
}
```
Meanwhile, AlarmCore#start(L75) should use the `withSecondOfMinute(0)` too.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [skywalking] yangyiweigege commented on issue #9788: [Bug] Alarm Rule Warn is not accurate
Posted by GitBox <gi...@apache.org>.
yangyiweigege commented on issue #9788:
URL: https://github.com/apache/skywalking/issues/9788#issuecomment-1278968310
> All time buckets for alerting and moving windows should be in minutes actually. I am not sure why this happens, could you explain more?
endTime is changed in org.apache.skywalking.oap.server.core.alarm.provider.RunningRule.Window#moveTo ,and moveTo is called by two method,one is org.apache.skywalking.oap.server.core.alarm.provider.RunningRule.Window#add, another is
org.apache.skywalking.oap.server.core.alarm.provider.RunningRule#moveTo.
when org.apache.skywalking.oap.server.core.alarm.provider.RunningRule#moveTo is called by schedule task, the endTime will be updated,and the value contains seconds,like 20221014174301, code in org.apache.skywalking.oap.server.core.alarm.provider.AlarmCore#start:
<img width="1248" alt="image" src="https://user-images.githubusercontent.com/23202824/195851375-f6c1d0a1-4b59-4593-a9af-12c3a34026e9.png">
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [skywalking] wu-sheng commented on issue #9788: [Bug] Alarm Rule Warn is not accurate
Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #9788:
URL: https://github.com/apache/skywalking/issues/9788#issuecomment-1278976211
OK, I think I know the point. The timestamp includes the second, but the bucket slot doesn't.
Due to our calculation(`int minutes = Minutes.minutesBetween(endTime, current).getMinutes();`) is involved the timestamp(in second), the duration(in minute) could be mis-calculated to get a wrong slot to save the value, such as
From`11:40:14` to `11:44:01`, we expected `4` minutes, but due to the second involved, the value should be `3`, then we got the wrong slot.
Could you confirm the case?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [skywalking] wu-sheng closed issue #9788: [Bug] Alarm Rule Warn is not accurate
Posted by GitBox <gi...@apache.org>.
wu-sheng closed issue #9788: [Bug] Alarm Rule Warn is not accurate
URL: https://github.com/apache/skywalking/issues/9788
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [skywalking] yangyiweigege commented on issue #9788: [Bug] Alarm Rule Warn is not accurate
Posted by GitBox <gi...@apache.org>.
yangyiweigege commented on issue #9788:
URL: https://github.com/apache/skywalking/issues/9788#issuecomment-1278998600
> I think we just need to set the second to `0` before `moveTo`.
yes i have change
> I think we just need to set the second to `0` before `moveTo`.
yes ,i have changed the code in org.apache.skywalking.oap.server.core.alarm.provider.RunningRule#moveTo in my local code,like this:
```
public void moveTo(LocalDateTime targetTime) {
// 去掉秒,去掉毫秒,时间取整到分钟
LocalDateTime targetTimeMinute = targetTime.minusSeconds(targetTime.getSecondOfMinute()).minusMillis(targetTime.getMillisOfSecond());
windows.values().forEach(window -> window.moveTo(targetTimeMinute));
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [skywalking] wu-sheng commented on issue #9788: [Bug] Alarm Rule Warn is not accurate
Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #9788:
URL: https://github.com/apache/skywalking/issues/9788#issuecomment-1278994163
I think we just need to set the second to `0` before `moveTo`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [skywalking] wu-sheng commented on issue #9788: [Bug] Alarm Rule Warn is not accurate
Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #9788:
URL: https://github.com/apache/skywalking/issues/9788#issuecomment-1279028958
Are you going to submit a pull request to fix?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [skywalking] yangyiweigege commented on issue #9788: [Bug] Alarm Rule Warn is not accurate
Posted by GitBox <gi...@apache.org>.
yangyiweigege commented on issue #9788:
URL: https://github.com/apache/skywalking/issues/9788#issuecomment-1279156166
yes , this is better for read. but one point should be noticed,we must remove the second and millisSecond both
```java
final LocalDateTime target = targetTime.withSecondOfMinute(0).withMillisOfSecond(0);
windows.values().forEach(window -> window.moveTo(target));
```
<img width="995" alt="image" src="https://user-images.githubusercontent.com/23202824/195883672-c458a7cd-9504-4eb0-b09c-09d46e9baaa5.png">
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org