Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2022/06/21 05:34:35 UTC

[GitHub] [dolphinscheduler] EricGao888 opened a new issue, #10525: [Feature][Metrics] Increase granularity of metrics

EricGao888 opened a new issue, #10525:
URL: https://github.com/apache/dolphinscheduler/issues/10525

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar feature requirement.
   
   
   ### Description
   
   * At present, DolphinScheduler only provides metrics with constant names and tags, so we cannot monitor the behavior of a specific workflow or task.
   * We should provide metrics with constant names but dynamic tags to increase granularity.
   
   ### Use case
   
   * Some workflows / tasks in production are vital and may cause great loss to users if they fail, so we need metrics to monitor these VIP tasks / workflows.
   
   ### Related issues
   
   related: #9324 
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] EricGao888 commented on issue #10525: [Feature][Metrics] Increase granularity of metrics

Posted by GitBox <gi...@apache.org>.
EricGao888 commented on issue #10525:
URL: https://github.com/apache/dolphinscheduler/issues/10525#issuecomment-1169954936

   ![image](https://user-images.githubusercontent.com/34905992/176442655-a58497d5-be2f-4a06-88a4-977f8e08c547.png)
   I just got a reply from the author of `Micrometer`. Letting users pick a few VIP workflows to receive extra detailed metrics would definitely be much better than simply generating detailed metrics for all workflows / tasks.




[GitHub] [dolphinscheduler] EricGao888 commented on issue #10525: [Feature][Metrics] Increase granularity of metrics

Posted by "EricGao888 (via GitHub)" <gi...@apache.org>.
EricGao888 commented on issue #10525:
URL: https://github.com/apache/dolphinscheduler/issues/10525#issuecomment-1447648891

   related: #13552
   




[GitHub] [dolphinscheduler] EricGao888 commented on issue #10525: [Feature][Metrics] Increase granularity of metrics

Posted by GitBox <gi...@apache.org>.
EricGao888 commented on issue #10525:
URL: https://github.com/apache/dolphinscheduler/issues/10525#issuecomment-1168487016

   BTW, the term `dynamic metrics` as I use it here refers to metrics instantiated at runtime with a pre-defined name and generated tags (e.g. task_id, workflow_id). I hope this doesn't cause confusion.




[GitHub] [dolphinscheduler] EricGao888 closed issue #10525: [Feature][Metrics] Increase granularity of metrics

Posted by GitBox <gi...@apache.org>.
EricGao888 closed issue #10525: [Feature][Metrics] Increase granularity of metrics
URL: https://github.com/apache/dolphinscheduler/issues/10525




[GitHub] [dolphinscheduler] EricGao888 commented on issue #10525: [Feature][Metrics] Increase granularity of metrics

Posted by GitBox <gi...@apache.org>.
EricGao888 commented on issue #10525:
URL: https://github.com/apache/dolphinscheduler/issues/10525#issuecomment-1166944789

   It seems `micrometer` will not automatically manage those dynamically generated metrics for us.
   
   First, we can use a constant name with dynamic tags (e.g. workflow_id) to generate metrics dynamically.
   
   Second, I plan to add a class `DynamicMetricsManager` that provides a method to instantiate metrics at runtime. `DynamicMetricsManager` uses a `ConcurrentHashMap` to store those dynamic metrics.
   
   Third, when a workflow / task instance finishes, `DynamicMetricsManager` removes the related metrics from the `ConcurrentHashMap` and puts them into a `Caffeine` cache with `expireAfterAccess` set larger than the `Prometheus` pull interval, to manage the life cycle of those dynamic metrics.
   
   WDYT? @ruanwenjun  
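
The lifecycle proposed above can be sketched without any external dependency. The following is a minimal illustration, not DolphinScheduler or Micrometer code: a `LongAdder` stands in for a Micrometer counter, a second map with a last-access timestamp stands in for the Caffeine cache and its `expireAfterAccess` policy, and the method names (`increment`, `finish`, `read`, `evictExpired`), the injected clock and the `-1` sentinel are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;

// Hypothetical sketch: JDK types stand in for Micrometer meters and the Caffeine cache.
final class DynamicMetricsManager {
    // A retired counter plus the time (in ms) it was last retired or read.
    private static final class Retired {
        final LongAdder counter;
        volatile long lastAccess;

        Retired(LongAdder counter, long lastAccess) {
            this.counter = counter;
            this.lastAccess = lastAccess;
        }
    }

    // Counters of workflow / task instances that are still running.
    private final Map<String, LongAdder> active = new ConcurrentHashMap<>();
    // Counters of finished instances, kept until they have been idle for ttlMillis.
    private final Map<String, Retired> retired = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;

    DynamicMetricsManager(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Constant metric name plus one dynamic tag identify a series.
    private static String key(String name, String tagKey, String tagValue) {
        return name + "{" + tagKey + "=" + tagValue + "}";
    }

    // Creates the counter on first use, then increments it.
    void increment(String name, String tagKey, String tagValue) {
        active.computeIfAbsent(key(name, tagKey, tagValue), k -> new LongAdder()).increment();
    }

    // Moves the counter of a finished instance from the active map to the expiring map.
    void finish(String name, String tagKey, String tagValue) {
        String k = key(name, tagKey, tagValue);
        LongAdder counter = active.remove(k);
        if (counter != null) {
            retired.put(k, new Retired(counter, clock.getAsLong()));
        }
    }

    // Read path used by a scrape; returns -1 when the series is unknown or expired.
    long read(String name, String tagKey, String tagValue) {
        String k = key(name, tagKey, tagValue);
        LongAdder live = active.get(k);
        if (live != null) {
            return live.sum();
        }
        Retired r = retired.get(k);
        if (r == null) {
            return -1;
        }
        long now = clock.getAsLong();
        if (now - r.lastAccess > ttlMillis) {
            retired.remove(k);
            return -1;
        }
        r.lastAccess = now; // a read refreshes the entry, as expire-after-access does
        return r.counter.sum();
    }

    // Drops every retired series that has been idle for longer than the TTL.
    void evictExpired() {
        long now = clock.getAsLong();
        retired.values().removeIf(r -> now - r.lastAccess > ttlMillis);
    }

    int size() {
        return active.size() + retired.size();
    }
}
```

With a TTL longer than the scrape interval, the final value of a finished task stays readable for at least one more scrape before it is dropped, which is the property the proposal relies on.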




[GitHub] [dolphinscheduler] kezhenxu94 commented on issue #10525: [Feature][Metrics] Increase granularity of metrics

Posted by GitBox <gi...@apache.org>.
kezhenxu94 commented on issue #10525:
URL: https://github.com/apache/dolphinscheduler/issues/10525#issuecomment-1168460039

   Putting id as tag doesn't require dynamic metrics, no?




[GitHub] [dolphinscheduler] EricGao888 commented on issue #10525: [Feature][Metrics] Increase granularity of metrics

Posted by GitBox <gi...@apache.org>.
EricGao888 commented on issue #10525:
URL: https://github.com/apache/dolphinscheduler/issues/10525#issuecomment-1168478839

   > Putting id as tag doesn't require dynamic metrics, no?
   
   Things seem a little different in `Micrometer` to me. If it is possible to add tags at runtime, it would be much easier. However, I haven't figured out a way to add dynamic tags to a metric instance at runtime. Please correct me if I'm missing something. Thanks! https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/Counter.java




[GitHub] [dolphinscheduler] EricGao888 commented on issue #10525: [Feature][Metrics] Increase granularity of metrics

Posted by GitBox <gi...@apache.org>.
EricGao888 commented on issue #10525:
URL: https://github.com/apache/dolphinscheduler/issues/10525#issuecomment-1168482596

   ![image](https://user-images.githubusercontent.com/34905992/176146627-f53677b5-6696-425c-b680-5a94bcc09078.png)
   
   https://github.com/micrometer-metrics/micrometer/blob/4de0d5f70dbd05dc1d66b48012399e90b22a6005/micrometer-core/src/main/java/io/micrometer/core/instrument/Counter.java#L122-L132
   
   Not sure if there is a way to do the trick more gracefully.




[GitHub] [dolphinscheduler] EricGao888 commented on issue #10525: [Feature][Metrics] Increase granularity of metrics

Posted by GitBox <gi...@apache.org>.
EricGao888 commented on issue #10525:
URL: https://github.com/apache/dolphinscheduler/issues/10525#issuecomment-1166953380

   > It seems `micrometer` will not automatically manage those dynamically generated metrics for us.
   > 
   > First, we can use a constant name with dynamic tags (e.g. workflow_id) to generate metrics dynamically.
   > 
   > Second, I plan to add a class `DynamicMetricsManager` that provides a method to instantiate metrics at runtime. `DynamicMetricsManager` uses a `ConcurrentHashMap` to store those dynamic metrics.
   > 
   > Third, when a workflow / task instance finishes, `DynamicMetricsManager` removes the related metrics from the `ConcurrentHashMap` and puts them into a `Caffeine` cache with `expireAfterAccess` set larger than the `Prometheus` pull interval, to manage the life cycle of those dynamic metrics and avoid memory explosion.
   > 
   > WDYT? @ruanwenjun
   
   @kezhenxu94 Any suggestions would be much appreciated. : )




[GitHub] [dolphinscheduler] EricGao888 commented on issue #10525: [Feature][Metrics] Increase granularity of metrics

Posted by GitBox <gi...@apache.org>.
EricGao888 commented on issue #10525:
URL: https://github.com/apache/dolphinscheduler/issues/10525#issuecomment-1168450914

   > > Hi @EricGao888, what kind of dynamic metrics do we have? To fully use these metrics we need to display and configure them in Grafana. If we don't know the metrics in advance, how do we configure and use them? Can you give some examples where we need dynamic metrics?
   > 
   > @kezhenxu94 Thanks for the reply!
   > 
   > Consider the following scenario: a user A works for a big financial company and chooses DS to schedule ETL tasks. Some of these tasks are not that important, but some are crucial and closely tied to the financial business. Therefore, A wants metrics to monitor these specific VIP tasks to avoid breaching SLAs for their customers, for example task running duration, retry count, failover count, etc.
   > 
   > To make these dynamic metrics easy to configure, we could name them ds.dynamic.task.<task_id>.duration. Since A does not need dynamic metrics for every task, just the VIP ones, A could configure those metrics in Grafana manually.
   > 
   > But indeed, there is a concern about how users would obtain the task_id / workflow_id.
   
   Typo here: <task_id>s shouldn't be used in metric names; they should go in tags, as described in the `Description` section and in comment https://github.com/apache/dolphinscheduler/issues/10525#issuecomment-1168318015
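
The difference between an ID in the metric name and an ID in a tag can be illustrated with a small sketch. It assumes a hypothetical helper, `TaggedSeries.render`, that prints a series in a Prometheus-like `name{label="value"}` form; the metric name `ds_task_duration` and the tag keys are examples only, and label values are not escaped here.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper: renders one constant metric name with its tags.
final class TaggedSeries {
    private TaggedSeries() {
    }

    static String render(String name, Map<String, String> tags) {
        // Sort tags by key so the same tag set always renders identically.
        String labels = new TreeMap<>(tags).entrySet().stream()
                .map(e -> e.getKey() + "=\"" + e.getValue() + "\"")
                .collect(Collectors.joining(","));
        return name + "{" + labels + "}";
    }
}
```

Because the name stays constant, a dashboard panel can be built once on `ds_task_duration` and then narrowed to a VIP task by filtering on `task_id`, instead of requiring a new panel for every `ds.dynamic.task.<task_id>.duration` name.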




[GitHub] [dolphinscheduler] kezhenxu94 commented on issue #10525: [Feature][Metrics] Increase granularity of metrics

Posted by GitBox <gi...@apache.org>.
kezhenxu94 commented on issue #10525:
URL: https://github.com/apache/dolphinscheduler/issues/10525#issuecomment-1168292164

   Hi @EricGao888, what kind of dynamic metrics do we have? To fully use these metrics we need to display and configure them in Grafana. If we don't know the metrics in advance, how do we configure and use them? Can you give some examples where we need dynamic metrics?




[GitHub] [dolphinscheduler] EricGao888 commented on issue #10525: [Feature][Metrics] Increase granularity of metrics

Posted by GitBox <gi...@apache.org>.
EricGao888 commented on issue #10525:
URL: https://github.com/apache/dolphinscheduler/issues/10525#issuecomment-1168333310

   > > > Hi @EricGao888, what kind of dynamic metrics do we have? To fully use these metrics we need to display and configure them in Grafana. If we don't know the metrics in advance, how do we configure and use them? Can you give some examples where we need dynamic metrics?
   > > 
   > > 
   > > @kezhenxu94 Thanks for the reply!
   > > Consider the following scenario: a user A works for a big financial company and chooses DS to schedule ETL tasks. Some of these tasks are not that important, but some are crucial and closely tied to the financial business. Therefore, A wants metrics to monitor these specific VIP tasks to avoid breaching SLAs for their customers, for example task running duration, retry count, failover count, etc.
   > > To make these dynamic metrics easy to configure, we could name them ds.dynamic.task.<task_id>.duration. Since A does not need dynamic metrics for every task, just the VIP ones, A could configure those metrics in Grafana manually.
   > > But indeed, there is a concern about how users would obtain the task_id / workflow_id.
   > 
   > Hi, the task ID can be added as a tag, and users can filter by tag when configuring Grafana. Also, is the task ID constant across executions? If it changes, it's impractical to configure these constantly changing IDs.
   
   Thanks for bringing up this point. In several other scheduling systems, the task instance ID changes between runs while the task ID remains constant. I still need to double-check how this works in DS and whether it is possible to track the task instances of a specific task.




[GitHub] [dolphinscheduler] EricGao888 commented on issue #10525: [Feature][Metrics] Increase granularity of metrics

Posted by GitBox <gi...@apache.org>.
EricGao888 commented on issue #10525:
URL: https://github.com/apache/dolphinscheduler/issues/10525#issuecomment-1168313138

   > Hi @EricGao888, what kind of dynamic metrics do we have? To fully use these metrics we need to display and configure them in Grafana. If we don't know the metrics in advance, how do we configure and use them? Can you give some examples where we need dynamic metrics?
   
   @kezhenxu94 Thanks for the reply!
   
   Consider the following scenario: a user A works for a big financial company and chooses DS to schedule ETL tasks. Some of these tasks are not that important, but some are crucial and closely tied to the financial business. Therefore, A wants metrics to monitor these specific VIP tasks to avoid breaching SLAs for their customers, for example task running duration, retry count, failover count, etc.
   
   To make these dynamic metrics easy to configure, we could name them ds.dynamic.task.<task_id>.duration. Since A does not need dynamic metrics for every task, just the VIP ones, A could configure those metrics in Grafana manually.
   
   But indeed, there is a concern about how users would obtain the task_id / workflow_id.




[GitHub] [dolphinscheduler] EricGao888 commented on issue #10525: [Feature][Metrics] Increase granularity of metrics

Posted by GitBox <gi...@apache.org>.
EricGao888 commented on issue #10525:
URL: https://github.com/apache/dolphinscheduler/issues/10525#issuecomment-1169747224

   > It seems `micrometer` will not automatically manage those dynamically generated metrics for us.
   > 
   > First, we can use a constant name with dynamic tags (e.g. workflow_id) to generate metrics dynamically.
   > 
   > Second, I plan to add a class `DynamicMetricsManager` that provides a method to instantiate metrics at runtime. `DynamicMetricsManager` uses a `ConcurrentHashMap` to store those dynamic metrics.
   > 
   > Third, when a workflow / task instance finishes, `DynamicMetricsManager` removes the related metrics from the `ConcurrentHashMap` and puts them into a `Caffeine` cache with `expireAfterAccess` set larger than the `Prometheus` pull interval, to manage the life cycle of those dynamic metrics and avoid memory explosion.
   > 
   > WDYT? @ruanwenjun
   
   I think there could be a better solution.
   
   We could let users choose on the UI whether to monitor a specific workflow. DS would generate the related metrics for a workflow only if users opt in. This is more reasonable because users definitely do not need to monitor every workflow / task in detail.
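
The opt-in idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not DolphinScheduler code: the class `OptInWorkflowMetrics`, its methods, and the choice of a failure counter as the recorded metric are all hypothetical, and a `LongAdder` stands in for a real meter.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: per-workflow metrics exist only for workflows a user opted in.
final class OptInWorkflowMetrics {
    // IDs of the workflows that users marked for detailed monitoring.
    private final Set<Long> monitored = ConcurrentHashMap.newKeySet();
    // One failure counter per monitored workflow, created lazily.
    private final Map<Long, LongAdder> failures = new ConcurrentHashMap<>();

    void enable(long workflowId) {
        monitored.add(workflowId);
    }

    // Opting out also drops the series, so memory stays bounded by the opt-in set.
    void disable(long workflowId) {
        monitored.remove(workflowId);
        failures.remove(workflowId);
    }

    // Events for workflows that are not monitored are ignored.
    void recordFailure(long workflowId) {
        if (monitored.contains(workflowId)) {
            failures.computeIfAbsent(workflowId, id -> new LongAdder()).increment();
        }
    }

    long failureCount(long workflowId) {
        LongAdder counter = failures.get(workflowId);
        return counter == null ? 0L : counter.sum();
    }

    // Number of per-workflow series currently held.
    int seriesCount() {
        return failures.size();
    }
}
```

The number of series is then bounded by the number of workflows users chose to monitor rather than by the number of workflows that ever ran.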




[GitHub] [dolphinscheduler] kezhenxu94 commented on issue #10525: [Feature][Metrics] Increase granularity of metrics

Posted by GitBox <gi...@apache.org>.
kezhenxu94 commented on issue #10525:
URL: https://github.com/apache/dolphinscheduler/issues/10525#issuecomment-1168318015

   > > Hi @EricGao888, what kind of dynamic metrics do we have? To fully use these metrics we need to display and configure them in Grafana. If we don't know the metrics in advance, how do we configure and use them? Can you give some examples where we need dynamic metrics?
   > 
   > @kezhenxu94 Thanks for the reply!
   > 
   > Consider the following scenario: a user A works for a big financial company and chooses DS to schedule ETL tasks. Some of these tasks are not that important, but some are crucial and closely tied to the financial business. Therefore, A wants metrics to monitor these specific VIP tasks to avoid breaching SLAs for their customers, for example task running duration, retry count, failover count, etc.
   > 
   > To make these dynamic metrics easy to configure, we could name them ds.dynamic.task.<task_id>.duration. Since A does not need dynamic metrics for every task, just the VIP ones, A could configure those metrics in Grafana manually.
   > 
   > But indeed, there is a concern about how users would obtain the task_id / workflow_id.
   
   Hi, the task ID can be added as a tag, and users can filter by tag when configuring Grafana. Also, is the task ID constant across executions? If it changes, it's impractical to configure these constantly changing IDs.




[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #10525: [Feature][Metrics] Increase granularity of metrics

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #10525:
URL: https://github.com/apache/dolphinscheduler/issues/10525#issuecomment-1161285760

   Thank you for your feedback. We have received your issue; please wait patiently for a reply.
   * So that we can understand your request as soon as possible, please provide detailed information, the version, or pictures.
   * If you haven't received a reply for a long time, you can [join our slack](https://s.apache.org/dolphinscheduler-slack) and send your question to the `#troubleshooting` channel.




[GitHub] [dolphinscheduler] EricGao888 commented on issue #10525: [Feature][Metrics] Increase granularity of metrics

Posted by GitBox <gi...@apache.org>.
EricGao888 commented on issue #10525:
URL: https://github.com/apache/dolphinscheduler/issues/10525#issuecomment-1350546517

   Not sure whether we need this, so I'm closing it at this moment. Will reopen it if needed in the future, thanks : )

