You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/07/23 03:44:40 UTC

[GitHub] [incubator-pinot] suvodeep-pyne opened a new pull request #5740: [TE] Added a backfill start date for Anomaly Detection

suvodeep-pyne opened a new pull request #5740:
URL: https://github.com/apache/incubator-pinot/pull/5740


   ## Description
   When an alert is created, it automatically triggers a YAML Onboarding Task
   which detects all anomalies in the past 30 days. This change makes the
   setting configurable by providing a backfill start timestamp.
   
   Not providing the value or simply providing a dubious entry would result
   in the system defaulting to the 30 day lookback.
   
   This change also updates the alert template UI to make it easy for a new
   user to try this config
   
   There are no backward incompatible changes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] suvodeep-pyne commented on a change in pull request #5740: [TE] Added a backfill start date for Anomaly Detection

Posted by GitBox <gi...@apache.org>.
suvodeep-pyne commented on a change in pull request #5740:
URL: https://github.com/apache/incubator-pinot/pull/5740#discussion_r462001207



##########
File path: thirdeye/thirdeye-pinot/src/main/java/org/apache/pinot/thirdeye/detection/yaml/YamlResource.java
##########
@@ -236,10 +236,17 @@ private void createYamlOnboardingTask(long configId, long tuningWindowStart, lon
     info.setTuningWindowStart(tuningWindowStart);
     info.setTuningWindowEnd(tuningWindowEnd);
     info.setEnd(System.currentTimeMillis());
-    info.setStart(info.getEnd() - ONBOARDING_REPLAY_LOOKBACK);
+
+    long lastTimestamp = detectionConfig.getLastTimestamp();
+    // If no value is present, set the default lookback
+    if (lastTimestamp < 0) {

Review comment:
       Hey @akshayrai 
   During `org.apache.pinot.thirdeye.detection.yaml.translator.DetectionConfigTranslator#generateDetectionConfig`, the default is always set to `-1` in which case, it is set to `System.currentTimeMillis()`. ref: `org/apache/pinot/thirdeye/detection/yaml/YamlResource.java:242`

##########
File path: thirdeye/thirdeye-pinot/src/main/java/org/apache/pinot/thirdeye/detection/yaml/YamlResource.java
##########
@@ -236,10 +236,17 @@ private void createYamlOnboardingTask(long configId, long tuningWindowStart, lon
     info.setTuningWindowStart(tuningWindowStart);
     info.setTuningWindowEnd(tuningWindowEnd);
     info.setEnd(System.currentTimeMillis());
-    info.setStart(info.getEnd() - ONBOARDING_REPLAY_LOOKBACK);
+
+    long lastTimestamp = detectionConfig.getLastTimestamp();
+    // If no value is present, set the default lookback
+    if (lastTimestamp < 0) {

Review comment:
       Hey @akshayrai 
   During `org.apache.pinot.thirdeye.detection.yaml.translator.DetectionConfigTranslator#generateDetectionConfig`, the default is always set to `-1` in which case, it is set to 30 days before current time. ref: `org/apache/pinot/thirdeye/detection/yaml/YamlResource.java:242`
   
   I can simplify that a bit but then the `YamlResource` default needs to accessed from the translator which I found to be a bit confusing.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] suvodeep-pyne commented on a change in pull request #5740: [TE] Added a backfill start date for Anomaly Detection

Posted by GitBox <gi...@apache.org>.
suvodeep-pyne commented on a change in pull request #5740:
URL: https://github.com/apache/incubator-pinot/pull/5740#discussion_r459684283



##########
File path: thirdeye/thirdeye-pinot/src/main/java/org/apache/pinot/thirdeye/detection/yaml/translator/DetectionConfigTranslator.java
##########
@@ -189,6 +190,7 @@ private DetectionConfigDTO generateDetectionConfig(Map<String, Object> yamlConfi
     config.setCron(cron);
     config.setActive(MapUtils.getBooleanValue(yamlConfigMap, PROP_ACTIVE, true));
     config.setYaml(yamlConfig);
+    config.setBackfillStart(parseTimeStampLong(yamlConfigMap.get(PROP_BACKFILL_START)));

Review comment:
       I found `org.apache.pinot.thirdeye.detection.ConfigUtils#getLongs(java.util.Collection<java.lang.Number>)` but that takes a list and has a slightly different behavior which would require me to have a separate method anyway.
   
   Are you aware of any other utility? Please do share if you find something.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] akshayrai merged pull request #5740: [TE] Added a backfill start date for Anomaly Detection

Posted by GitBox <gi...@apache.org>.
akshayrai merged pull request #5740:
URL: https://github.com/apache/incubator-pinot/pull/5740


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] suvodeep-pyne commented on a change in pull request #5740: [TE] Added a backfill start date for Anomaly Detection

Posted by GitBox <gi...@apache.org>.
suvodeep-pyne commented on a change in pull request #5740:
URL: https://github.com/apache/incubator-pinot/pull/5740#discussion_r459702739



##########
File path: thirdeye/thirdeye-frontend/app/utils/yaml-tools.js
##########
@@ -12,12 +12,11 @@ dataset: dataset_to_which_this_metric_belongs
 rules:
 - detection:
   - name: detection_rule_1
-    type: ALGORITHM             # Configure the detection type here. See doc for more details.
+    type: PERCENTAGE_RULE       # Configure the detection type here. See doc for more details.

Review comment:
       @harleyjj @akshayrai 
   We could keep the default setting for now if it helps and come up with a plan. I did this now since I thought that have a working default setting would be easier for a user to get started.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] suvodeep-pyne commented on pull request #5740: [TE] Added a backfill start date for Anomaly Detection

Posted by GitBox <gi...@apache.org>.
suvodeep-pyne commented on pull request #5740:
URL: https://github.com/apache/incubator-pinot/pull/5740#issuecomment-663334111


   Hi @akshayrai 
   
   I replaced the `backfillStart` with the already existing `lastTimestamp`. This value is used as a checkpoint and is already being persisted in the db so this seems to be a better fit.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] akshayrai commented on a change in pull request #5740: [TE] Added a backfill start date for Anomaly Detection

Posted by GitBox <gi...@apache.org>.
akshayrai commented on a change in pull request #5740:
URL: https://github.com/apache/incubator-pinot/pull/5740#discussion_r461792659



##########
File path: thirdeye/thirdeye-frontend/app/utils/yaml-tools.js
##########
@@ -29,6 +28,20 @@ rules:
     type: DATA_SLA              # Alert if data is missing.
     params:
       sla: 3_DAYS               # Data is missing for 3 days since last availability
+
+# You can mention a cron to allow this detection to be run periodically
+# For example, the cron below would execute every 5 minutes.
+# cron: "0 0/5 * 1/1 * ? *"
+
+# Detections in past data
+# You can use the lastTimestamp parameter to denote how far back you want the anomalies
+# to be detected. This value acts like a checkpoint or high watermark. 
+# A valid entry is a non negative timestamp value in millis.
+# 
+# - By default, (if not mentioned), the lookback is 30 days
+# - A value of 0 (zero) implies complete backfill till the start of data. As shown below.

Review comment:
       Couple of concerns with backfilling all the data:
   1. Backfilling all the data can take quite some time, especially if there are years worth of data and possibly even fail due to timeout and other issues.
   2. There isn't much benefit of backfilling older data. Users usually don't care about historical anomalies/issues, at least not older than a month.
   3. The goal of the 30 day default backfill is to quickly show some stats and recent anomalies as soon as the alert is created. Otherwise, the alert page will look empty and users have to revisit after some time.
   4. Typically, users tend to create an alert using the basic settings and update it later on. When they update,  anomalies created using the initial setting will no longer make sense.
   
   We would increase the default limit from 30 days to say 90 days if that is more reasonable?

##########
File path: thirdeye/thirdeye-frontend/app/utils/yaml-tools.js
##########
@@ -29,6 +28,20 @@ rules:
     type: DATA_SLA              # Alert if data is missing.
     params:
       sla: 3_DAYS               # Data is missing for 3 days since last availability
+
+# You can mention a cron to allow this detection to be run periodically
+# For example, the cron below would execute every 5 minutes.
+# cron: "0 0/5 * 1/1 * ? *"
+
+# Detections in past data
+# You can use the lastTimestamp parameter to denote how far back you want the anomalies
+# to be detected. This value acts like a checkpoint or high watermark. 
+# A valid entry is a non negative timestamp value in millis.
+# 
+# - By default, (if not mentioned), the lookback is 30 days
+# - A value of 0 (zero) implies complete backfill till the start of data. As shown below.

Review comment:
       Couple of concerns with backfilling all the data:
   1. Backfilling all the data can take quite some time, especially if there are years worth of data and possibly even fail due to timeout and other issues.
   2. There isn't much benefit of backfilling older data. Users usually don't care about historical anomalies/issues, at least not older than a month.
   3. The goal of the 30 day default backfill is to quickly show some stats and recent anomalies as soon as the alert is created. Otherwise, the alert page will look empty and users have to revisit after some time.
   4. Typically, users tend to create an alert using the basic settings and update it later on. When they update,  anomalies created using the initial setting will no longer make sense.
   
   We could increase the default limit from 30 days to say 90 days if that is more reasonable?

##########
File path: thirdeye/thirdeye-pinot/src/main/java/org/apache/pinot/thirdeye/detection/yaml/YamlResource.java
##########
@@ -236,10 +236,17 @@ private void createYamlOnboardingTask(long configId, long tuningWindowStart, lon
     info.setTuningWindowStart(tuningWindowStart);
     info.setTuningWindowEnd(tuningWindowEnd);
     info.setEnd(System.currentTimeMillis());
-    info.setStart(info.getEnd() - ONBOARDING_REPLAY_LOOKBACK);
+
+    long lastTimestamp = detectionConfig.getLastTimestamp();
+    // If no value is present, set the default lookback
+    if (lastTimestamp < 0) {

Review comment:
       Since the default value of lastTimestamp will be 0 (when not set), we should probably change this condition here to not trigger detection on the entire dataset.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] harleyjj commented on a change in pull request #5740: [TE] Added a backfill start date for Anomaly Detection

Posted by GitBox <gi...@apache.org>.
harleyjj commented on a change in pull request #5740:
URL: https://github.com/apache/incubator-pinot/pull/5740#discussion_r459688843



##########
File path: thirdeye/thirdeye-frontend/app/utils/yaml-tools.js
##########
@@ -12,12 +12,11 @@ dataset: dataset_to_which_this_metric_belongs
 rules:
 - detection:
   - name: detection_rule_1
-    type: ALGORITHM             # Configure the detection type here. See doc for more details.
+    type: PERCENTAGE_RULE       # Configure the detection type here. See doc for more details.

Review comment:
       I think we can move these templates into the environment.js file [here](https://github.com/apache/incubator-pinot/blob/master/thirdeye/thirdeye-frontend/config/environment.js) and then different users can configure their own default templates for detection and subscription group yaml configurations.  




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] akshayrai commented on a change in pull request #5740: [TE] Added a backfill start date for Anomaly Detection

Posted by GitBox <gi...@apache.org>.
akshayrai commented on a change in pull request #5740:
URL: https://github.com/apache/incubator-pinot/pull/5740#discussion_r460318266



##########
File path: thirdeye/thirdeye-frontend/app/utils/yaml-tools.js
##########
@@ -29,6 +28,20 @@ rules:
     type: DATA_SLA              # Alert if data is missing.
     params:
       sla: 3_DAYS               # Data is missing for 3 days since last availability
+
+# You can mention a cron to allow this detection to be run periodically
+# For example, the cron below would execute every 5 minutes.
+# cron: "0 0/5 * 1/1 * ? *"
+
+# Backfill detections
+# You can use the backfillStart parameter to denote how far back you want the anomalies
+# to be detected.
+# A valid entry is a timestamp value in millis.
+# 
+# - By default, (if not mentioned), the lookback is 30 days
+# - A value of 0 (zero) implies complete backfill till the start of data. As shown below.
+# backfillStart: 0 
+

Review comment:
       I understand. We too discussed this idea in the past, but the issue with this approach is that - There are several basic alert and subscription parameters which we might want to add to the template and this will simply keep growing over time making the config bigger and harder to read.
   
   To help with this problem, the alternative we have taken is,
   a. A link is provided in the create alert UI  (at the top) that should point to the documentation on the alert configs. (I can work on OS'ing this wiki if it isn't already)
   b. Eventually, we should have a full form UI through which all the basic features should be surfaced. The yaml config is more for advanced users who can refer to a doc and update the configs as required.
   
   So, it would be better to keep the template simple and put the additional settings in the doc.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] akshayrai commented on a change in pull request #5740: [TE] Added a backfill start date for Anomaly Detection

Posted by GitBox <gi...@apache.org>.
akshayrai commented on a change in pull request #5740:
URL: https://github.com/apache/incubator-pinot/pull/5740#discussion_r460318552



##########
File path: thirdeye/thirdeye-frontend/app/utils/yaml-tools.js
##########
@@ -29,6 +28,20 @@ rules:
     type: DATA_SLA              # Alert if data is missing.
     params:
       sla: 3_DAYS               # Data is missing for 3 days since last availability
+
+# You can mention a cron to allow this detection to be run periodically
+# For example, the cron below would execute every 5 minutes.
+# cron: "0 0/5 * 1/1 * ? *"
+
+# Backfill detections
+# You can use the backfillStart parameter to denote how far back you want the anomalies
+# to be detected.
+# A valid entry is a timestamp value in millis.
+# 
+# - By default, (if not mentioned), the lookback is 30 days
+# - A value of 0 (zero) implies complete backfill till the start of data. As shown below.
+# backfillStart: 0 

Review comment:
       Couple of issues/suggestions
   1.  There is an api through which you can trigger a replay or backfill. You might want to consider exposing this api to the user. It is better to keep the alert creation independent of backfill. 30 days is hard-coded just to populate some metrics on the UI for users to play around.
   2. If you haven't explored yet, there is a preview experience through which users can trigger detection on older data. However, this is adhoc and there is no option of saving the results. You can consider adding the option of persisting the results!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] akshayrai commented on a change in pull request #5740: [TE] Added a backfill start date for Anomaly Detection

Posted by GitBox <gi...@apache.org>.
akshayrai commented on a change in pull request #5740:
URL: https://github.com/apache/incubator-pinot/pull/5740#discussion_r460318552



##########
File path: thirdeye/thirdeye-frontend/app/utils/yaml-tools.js
##########
@@ -29,6 +28,20 @@ rules:
     type: DATA_SLA              # Alert if data is missing.
     params:
       sla: 3_DAYS               # Data is missing for 3 days since last availability
+
+# You can mention a cron to allow this detection to be run periodically
+# For example, the cron below would execute every 5 minutes.
+# cron: "0 0/5 * 1/1 * ? *"
+
+# Backfill detections
+# You can use the backfillStart parameter to denote how far back you want the anomalies
+# to be detected.
+# A valid entry is a timestamp value in millis.
+# 
+# - By default, (if not mentioned), the lookback is 30 days
+# - A value of 0 (zero) implies complete backfill till the start of data. As shown below.
+# backfillStart: 0 

Review comment:
       Couple of issues/suggestions
   1.  There is an api through which you can trigger a replay or backfill. You might want to consider exposing this api to the user. It is better to keep the alert creation independent of backfill. 30 days is hard-coded just to populate some metrics on the UI for users to play around.
   2. If you haven't explored yet, there is a preview experience through which users can trigger detection on older data. However, this is adhoc and there is no option of saving the results.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] akshayrai commented on a change in pull request #5740: [TE] Added a backfill start date for Anomaly Detection

Posted by GitBox <gi...@apache.org>.
akshayrai commented on a change in pull request #5740:
URL: https://github.com/apache/incubator-pinot/pull/5740#discussion_r459665554



##########
File path: thirdeye/thirdeye-frontend/app/utils/yaml-tools.js
##########
@@ -29,6 +28,20 @@ rules:
     type: DATA_SLA              # Alert if data is missing.
     params:
       sla: 3_DAYS               # Data is missing for 3 days since last availability
+
+# You can mention a cron to allow this detection to be run periodically
+# For example, the cron below would execute every 5 minutes.
+# cron: "0 0/5 * 1/1 * ? *"
+
+# Backfill detections
+# You can use the backfillStart parameter to denote how far back you want the anomalies
+# to be detected.
+# A valid entry is a timestamp value in millis.
+# 
+# - By default, (if not mentioned), the lookback is 30 days
+# - A value of 0 (zero) implies complete backfill till the start of data. As shown below.
+# backfillStart: 0 

Review comment:
       Can you shed some light on the need for a custom backfill configuration? Since this is a one time setting, I am not very confident of putting it in the detection config.
   

##########
File path: thirdeye/thirdeye-pinot/src/main/java/org/apache/pinot/thirdeye/detection/yaml/translator/DetectionConfigTranslator.java
##########
@@ -189,6 +190,7 @@ private DetectionConfigDTO generateDetectionConfig(Map<String, Object> yamlConfi
     config.setCron(cron);
     config.setActive(MapUtils.getBooleanValue(yamlConfigMap, PROP_ACTIVE, true));
     config.setYaml(yamlConfig);
+    config.setBackfillStart(parseTimeStampLong(yamlConfigMap.get(PROP_BACKFILL_START)));

Review comment:
       I think we already have a utility for this. Can you try MapUtils or ConfigUtils?

##########
File path: thirdeye/thirdeye-frontend/app/utils/yaml-tools.js
##########
@@ -29,6 +28,20 @@ rules:
     type: DATA_SLA              # Alert if data is missing.
     params:
       sla: 3_DAYS               # Data is missing for 3 days since last availability
+
+# You can mention a cron to allow this detection to be run periodically
+# For example, the cron below would execute every 5 minutes.
+# cron: "0 0/5 * 1/1 * ? *"
+
+# Backfill detections
+# You can use the backfillStart parameter to denote how far back you want the anomalies
+# to be detected.
+# A valid entry is a timestamp value in millis.
+# 
+# - By default, (if not mentioned), the lookback is 30 days
+# - A value of 0 (zero) implies complete backfill till the start of data. As shown below.
+# backfillStart: 0 
+

Review comment:
       I'd prefer keeping the template simple and put all the additional configurations in the user guide. Regular users need not know about these parameters.

##########
File path: thirdeye/thirdeye-frontend/app/utils/yaml-tools.js
##########
@@ -29,6 +28,20 @@ rules:
     type: DATA_SLA              # Alert if data is missing.
     params:
       sla: 3_DAYS               # Data is missing for 3 days since last availability
+
+# You can mention a cron to allow this detection to be run periodically
+# For example, the cron below would execute every 5 minutes.
+# cron: "0 0/5 * 1/1 * ? *"
+
+# Backfill detections

Review comment:
       The term backfill is a bit overloaded. It could also mean rerunning the detection which is not what is intended.

##########
File path: thirdeye/thirdeye-frontend/app/utils/yaml-tools.js
##########
@@ -12,12 +12,11 @@ dataset: dataset_to_which_this_metric_belongs
 rules:
 - detection:
   - name: detection_rule_1
-    type: ALGORITHM             # Configure the detection type here. See doc for more details.
+    type: PERCENTAGE_RULE       # Configure the detection type here. See doc for more details.

Review comment:
       + @harleyjj 
   Can we customize and configure the default template? At LinkedIn, we are relying on ALGORITHM as the default detection. If we change this, we need to have an alternative to plug a different default setting.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] suvodeep-pyne commented on pull request #5740: [TE] Added a backfill start date for Anomaly Detection

Posted by GitBox <gi...@apache.org>.
suvodeep-pyne commented on pull request #5740:
URL: https://github.com/apache/incubator-pinot/pull/5740#issuecomment-665395239


   Resolving older comments as per conversation: All yaml template changes have been removed. And the `lastTimestamp` has been exposed as is with a default value of 30 days prior to current time.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] akshayrai commented on a change in pull request #5740: [TE] Added a backfill start date for Anomaly Detection

Posted by GitBox <gi...@apache.org>.
akshayrai commented on a change in pull request #5740:
URL: https://github.com/apache/incubator-pinot/pull/5740#discussion_r460318786



##########
File path: thirdeye/thirdeye-frontend/app/utils/yaml-tools.js
##########
@@ -12,12 +12,11 @@ dataset: dataset_to_which_this_metric_belongs
 rules:
 - detection:
   - name: detection_rule_1
-    type: ALGORITHM             # Configure the detection type here. See doc for more details.
+    type: PERCENTAGE_RULE       # Configure the detection type here. See doc for more details.

Review comment:
       Yes, we should pull this out into a config. @suvodeep-pyne, feel free to update this or we can do it at a later stage.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] akshayrai commented on a change in pull request #5740: [TE] Added a backfill start date for Anomaly Detection

Posted by GitBox <gi...@apache.org>.
akshayrai commented on a change in pull request #5740:
URL: https://github.com/apache/incubator-pinot/pull/5740#discussion_r460319108



##########
File path: thirdeye/thirdeye-frontend/app/utils/yaml-tools.js
##########
@@ -29,6 +28,20 @@ rules:
     type: DATA_SLA              # Alert if data is missing.
     params:
       sla: 3_DAYS               # Data is missing for 3 days since last availability
+
+# You can mention a cron to allow this detection to be run periodically
+# For example, the cron below would execute every 5 minutes.
+# cron: "0 0/5 * 1/1 * ? *"
+
+# Backfill detections

Review comment:
       There are apis for each one of them. I will consider documenting it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] suvodeep-pyne commented on a change in pull request #5740: [TE] Added a backfill start date for Anomaly Detection

Posted by GitBox <gi...@apache.org>.
suvodeep-pyne commented on a change in pull request #5740:
URL: https://github.com/apache/incubator-pinot/pull/5740#discussion_r459680850



##########
File path: thirdeye/thirdeye-frontend/app/utils/yaml-tools.js
##########
@@ -29,6 +28,20 @@ rules:
     type: DATA_SLA              # Alert if data is missing.
     params:
       sla: 3_DAYS               # Data is missing for 3 days since last availability
+
+# You can mention a cron to allow this detection to be run periodically
+# For example, the cron below would execute every 5 minutes.
+# cron: "0 0/5 * 1/1 * ? *"
+
+# Backfill detections
+# You can use the backfillStart parameter to denote how far back you want the anomalies
+# to be detected.
+# A valid entry is a timestamp value in millis.
+# 
+# - By default, (if not mentioned), the lookback is 30 days
+# - A value of 0 (zero) implies complete backfill till the start of data. As shown below.
+# backfillStart: 0 
+

Review comment:
       Hey @akshayrai 
   
   I understand your point. The problem is that it is very hard for a regular user to get started on ThirdEye. These configurations are buried in the doc somewhere on a FAQ or you have to go through the code. I didn't know that there is a capability to add cron until someone else told me.
   
   I think a YAML interface may be good for advanced users who can have configurations saved in files/elsewhere. However for a regular user,  it might help to just have a simple UI with text boxes and dropdowns etc which have hints/prepopulated and are easy to use without the need for documentation.
   
   I added to the template here so that the user doesn't need to go find information elsewhere but is able to see it right within the app. I agree this makes the template more complex and that's the reason why I pushed it down at the bottom. Let me know what you think and I can revert the template changes.
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] suvodeep-pyne commented on a change in pull request #5740: [TE] Added a backfill start date for Anomaly Detection

Posted by GitBox <gi...@apache.org>.
suvodeep-pyne commented on a change in pull request #5740:
URL: https://github.com/apache/incubator-pinot/pull/5740#discussion_r459681531



##########
File path: thirdeye/thirdeye-frontend/app/utils/yaml-tools.js
##########
@@ -12,12 +12,11 @@ dataset: dataset_to_which_this_metric_belongs
 rules:
 - detection:
   - name: detection_rule_1
-    type: ALGORITHM             # Configure the detection type here. See doc for more details.
+    type: PERCENTAGE_RULE       # Configure the detection type here. See doc for more details.

Review comment:
       Hey @akshayrai 
   
   `ALGORITHM` doesn't seem to work in open source. It says type not found. I changed it so that we can have a default that users can use out of the box.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] suvodeep-pyne commented on a change in pull request #5740: [TE] Added a backfill start date for Anomaly Detection

Posted by GitBox <gi...@apache.org>.
suvodeep-pyne commented on a change in pull request #5740:
URL: https://github.com/apache/incubator-pinot/pull/5740#discussion_r459686686



##########
File path: thirdeye/thirdeye-frontend/app/utils/yaml-tools.js
##########
@@ -29,6 +28,20 @@ rules:
     type: DATA_SLA              # Alert if data is missing.
     params:
       sla: 3_DAYS               # Data is missing for 3 days since last availability
+
+# You can mention a cron to allow this detection to be run periodically
+# For example, the cron below would execute every 5 minutes.
+# cron: "0 0/5 * 1/1 * ? *"
+
+# Backfill detections

Review comment:
       Yes. That is a side effect but it happens today anyway. Whenever, you create an alert, you run the detection for the past month every time. This parameter just exposes the ability to configure the same.
   
   But I totally agree on having separate atomic APIs in the longer term for creating alerts, doing detections and saving them. Currently the preview runs detections but does not save them. Creating an alert also fires detections but doesn't allow configuring the detection. Also, it doesn't seem to be possible to create a new alert without triggering a detectionjob.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] suvodeep-pyne commented on a change in pull request #5740: [TE] Added a backfill start date for Anomaly Detection

Posted by GitBox <gi...@apache.org>.
suvodeep-pyne commented on a change in pull request #5740:
URL: https://github.com/apache/incubator-pinot/pull/5740#discussion_r459691900



##########
File path: thirdeye/thirdeye-frontend/app/utils/yaml-tools.js
##########
@@ -29,6 +28,20 @@ rules:
     type: DATA_SLA              # Alert if data is missing.
     params:
       sla: 3_DAYS               # Data is missing for 3 days since last availability
+
+# You can mention a cron to allow this detection to be run periodically
+# For example, the cron below would execute every 5 minutes.
+# cron: "0 0/5 * 1/1 * ? *"
+
+# Backfill detections
+# You can use the backfillStart parameter to denote how far back you want the anomalies
+# to be detected.
+# A valid entry is a timestamp value in millis.
+# 
+# - By default, (if not mentioned), the lookback is 30 days
+# - A value of 0 (zero) implies complete backfill till the start of data. As shown below.
+# backfillStart: 0 

Review comment:
       Agreed. But I found this to be the easiest way of having this capability without a larger change. 
   
   So, here is the use case: I uploaded a new dataset and I want to see the anomalies. I was unable to find a way to do this. Creating an alert would trigger detections for the past month but not before that. It is not configurable. And I couldn't find a way to just fire a detection from the UI.
   
   This parameter allows the user to have the ability to use a default or choose a start time at the creation of the alert. I agree with you that creating an alert rule and running detection can be separated. This change sort of addresses the immediate requirement that allows users to upload their datasets and run detections on past data. Let me know what you think.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org