You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "Shlomi Cohen (Jira)" <ji...@apache.org> on 2020/01/03 13:18:00 UTC

[jira] [Commented] (AIRFLOW-6439) Sagemaker training operator does not consider metric_definition

    [ https://issues.apache.org/jira/browse/AIRFLOW-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007476#comment-17007476 ] 

Shlomi Cohen commented on AIRFLOW-6439:
---------------------------------------

Ok , i have managed to workaround this by Inheriting the SagemakerTrainingOperator and 

overriding the configurations in execute() method. like this 
{code:java}
// prepare base config from airflow.py
input_config = training_config(estimator=estimator, **task_config_params)  
//add metrics 
input_config["AlgorithmSpecification"]["MetricDefinitions"] = [
    {
        "Name": "execution_time",
        "Regex": "Execution Time - ([0-9.]*)"
    }
]{code}
 

> Sagemaker training operator does not consider metric_definition
> ---------------------------------------------------------------
>
>                 Key: AIRFLOW-6439
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6439
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: contrib
>    Affects Versions: 1.10.5
>            Reporter: Shlomi Cohen
>            Priority: Major
>
> Hi
> we have been using all Sagemaker operators in Airflow with great success.
> we now wanted to see some metrics in Cloudwatch for our training jobs, 
> i have tried to modify the estimator.metric_definition according to Sagemaker documentation , i even set the deprecated flag of sending metrics to True.
> {code:java}
> estimator.enable_cloudwatch_metrics = True
> estimator.metric_definitions = [
>     {
>         "Name": "execution_time",
>         "Regex": "Execution Time - ([0-9.]*)"
>     }
> ]
> {code}
> But this didn't yield any metrics in job definition as seen in Sagemaker training jobs console.
> while trying to debug the code , i saw that for Tuning jobs this block of code does consider metrics but  for training it does not. 
> {code:java}
> if tuner.metric_definitions is not None:
>     tune_config["TrainingJobDefinition"]["AlgorithmSpecification"][
>         "MetricDefinitions"
>     ] = tuner.metric_definitions
> {code}
> According to Sagemaker documentation - an Estimator should be able to get metric_definitions and that would cause the metrics to be reported to CloudWatch.
> here is the docs.
> [https://sagemaker.readthedocs.io/en/stable/estimators.html#sagemaker.estimator.EstimatorBase]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)