You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2023/01/10 18:41:05 UTC

[GitHub] [airflow] vandonr-amz opened a new pull request, #28837: new operator to create a sagemaker experiment

vandonr-amz opened a new pull request, #28837:
URL: https://github.com/apache/airflow/pull/28837

   official doc for reference: https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateExperiment.html
   
   An experiment is kind of a tag, or a directory, to regroup sagemaker jobs that belong together, and makes it easier to compare their outputs.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] vandonr-amz commented on a diff in pull request #28837: new operator to create a sagemaker experiment

Posted by GitBox <gi...@apache.org>.
vandonr-amz commented on code in PR #28837:
URL: https://github.com/apache/airflow/pull/28837#discussion_r1067297176


##########
tests/providers/amazon/aws/operators/test_sagemaker_base.py:
##########
@@ -40,3 +45,19 @@ def test_default_integer_fields(self):
         self.sagemaker.preprocess_config()
 
         assert self.sagemaker.integer_fields == EXPECTED_INTEGER_FIELDS
+
+
+class TestSageMakerExperimentOperator:
+    @patch("airflow.providers.amazon.aws.hooks.sagemaker.SageMakerHook.conn", new_callable=mock.PropertyMock)
+    def test_create_experiment(self, conn_mock):
+        conn_mock().create_experiment.return_value = {"ExperimentArn": "abcdef"}
+
+        op = SageMakerCreateExperimentOperator(
+            name="the name", description="the desc", tags={"the": "tags"}, task_id="whatever"

Review Comment:
   is this supposed to be replaced by the task ID ? It doesn't work for me when I run it locally (even after fixing the transform to happen in the execute rather than in the ctor)...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] josh-fell merged pull request #28837: new operator to create a sagemaker experiment

Posted by GitBox <gi...@apache.org>.
josh-fell merged PR #28837:
URL: https://github.com/apache/airflow/pull/28837


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] josh-fell commented on a diff in pull request #28837: new operator to create a sagemaker experiment

Posted by GitBox <gi...@apache.org>.
josh-fell commented on code in PR #28837:
URL: https://github.com/apache/airflow/pull/28837#discussion_r1067342839


##########
airflow/providers/amazon/aws/operators/sagemaker.py:
##########
@@ -1049,3 +1050,53 @@ def execute(self, context: Context) -> dict | None:
             self.check_interval,
         )
         return best
+
+
+class SageMakerCreateExperimentOperator(SageMakerBaseOperator):
+    """
+    Creates a SageMaker experiment, to be then associated to jobs etc.
+
+    .. seealso::
+        For more information on how to use this operator, take a look at the guide:
+        :ref:`howto/operator:SageMakerCreateExperimentOperator`
+
+    :param name: name of the experiment, must be unique within the AWS account
+    :param description: description of the experiment, optional
+    :param tags: tags to attach to the experiment, optional
+    :param aws_conn_id: The AWS connection ID to use.
+
+    :returns: the ARN of the experiment created, though experiments are referred to by name
+    """
+
+    template_fields: Sequence[str] = (
+        "name",
+        "description",
+        "tags",
+    )
+
+    def __init__(
+        self,
+        *,
+        name: str,
+        description: str | None = None,
+        tags: dict | None = None,
+        aws_conn_id: str = DEFAULT_CONN_ID,
+        **kwargs,
+    ):
+        super().__init__(config={}, aws_conn_id=aws_conn_id, **kwargs)
+        self.name = name
+        self.description = description
+        self.tags = tags or []

Review Comment:
   ```suggestion
           self.tags = tags or {}
   ```
   Otherwise the calculation for `tag_set` will fail if no `tags` are provided, and this will align with the typing of `tags` too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] josh-fell commented on a diff in pull request #28837: new operator to create a sagemaker experiment

Posted by GitBox <gi...@apache.org>.
josh-fell commented on code in PR #28837:
URL: https://github.com/apache/airflow/pull/28837#discussion_r1066521820


##########
airflow/providers/amazon/aws/operators/sagemaker.py:
##########
@@ -1049,3 +1050,54 @@ def execute(self, context: Context) -> dict | None:
             self.check_interval,
         )
         return best
+
+
+class SageMakerCreateExperimentOperator(SageMakerBaseOperator):
+    """
+    Creates a SageMaker experiment, to be then associated to jobs etc.
+
+    .. seealso::
+        For more information on how to use this operator, take a look at the guide:
+        :ref:`howto/operator:SageMakerCreateExperimentOperator`
+
+    :param name: name of the experiment, must be unique within the AWS account
+    :param description: description of the experiment, optional
+    :param tags: tags to attach to the experiment, optional
+
+    :returns: the ARN of the experiment created, though experiments are referred to by name
+    """
+
+    template_fields: Sequence[str] = (
+        "name",
+        "description",
+        "tags",
+    )
+
+    def __init__(
+        self,
+        *,
+        name: str,
+        description: str | None = None,
+        tags: dict | None = None,
+        aws_conn_id: str = DEFAULT_CONN_ID,
+        **kwargs,
+    ):
+        super().__init__(config={}, aws_conn_id=aws_conn_id, **kwargs)
+        self.name = name
+        self.description = description
+        if tags:
+            self.tags_set = [{"Key": kvp[0], "Value": kvp[1]} for kvp in tags.items()]
+        else:
+            self.tags_set = []

Review Comment:
   There is a possibility of unexpected behavior here with `tags` being a template field. Template field values are rendered _right before_ `execute()` is called but after an operator is initialized. So a user could specify 
   ```python
   SageMakerCreateExperimentOperator(..., tag={"my_key": "{{ var.value.my_tag }}")
   ```
   when calling the operator, but `tag_set` would be literally `[{"Key": "my_key", "Value": "{{ var.value.my_tag }}"}]` instead of whatever `{{ var.value.my_tag }}` would eventually render to be because the templated parts of `tag` haven't been rendered yet.
   
   IMO the easiest thing to do is move this code block under `execute()` and not worry about this edge case.



##########
airflow/providers/amazon/aws/operators/sagemaker.py:
##########
@@ -1049,3 +1050,54 @@ def execute(self, context: Context) -> dict | None:
             self.check_interval,
         )
         return best
+
+
+class SageMakerCreateExperimentOperator(SageMakerBaseOperator):
+    """
+    Creates a SageMaker experiment, to be then associated to jobs etc.
+
+    .. seealso::
+        For more information on how to use this operator, take a look at the guide:
+        :ref:`howto/operator:SageMakerCreateExperimentOperator`
+
+    :param name: name of the experiment, must be unique within the AWS account
+    :param description: description of the experiment, optional
+    :param tags: tags to attach to the experiment, optional
+
+    :returns: the ARN of the experiment created, though experiments are referred to by name
+    """
+
+    template_fields: Sequence[str] = (
+        "name",
+        "description",
+        "tags",
+    )
+
+    def __init__(
+        self,
+        *,
+        name: str,
+        description: str | None = None,
+        tags: dict | None = None,
+        aws_conn_id: str = DEFAULT_CONN_ID,

Review Comment:
   I think it would be useful if this param was included in the docstring as well. In the Python API doc for the operator, users explicitly then know this param is available should they not want to use the default connection ID.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] vandonr-amz commented on a diff in pull request #28837: new operator to create a sagemaker experiment

Posted by GitBox <gi...@apache.org>.
vandonr-amz commented on code in PR #28837:
URL: https://github.com/apache/airflow/pull/28837#discussion_r1067292233


##########
airflow/providers/amazon/aws/operators/sagemaker.py:
##########
@@ -1049,3 +1050,54 @@ def execute(self, context: Context) -> dict | None:
             self.check_interval,
         )
         return best
+
+
+class SageMakerCreateExperimentOperator(SageMakerBaseOperator):
+    """
+    Creates a SageMaker experiment, to be then associated to jobs etc.
+
+    .. seealso::
+        For more information on how to use this operator, take a look at the guide:
+        :ref:`howto/operator:SageMakerCreateExperimentOperator`
+
+    :param name: name of the experiment, must be unique within the AWS account
+    :param description: description of the experiment, optional
+    :param tags: tags to attach to the experiment, optional
+
+    :returns: the ARN of the experiment created, though experiments are referred to by name
+    """
+
+    template_fields: Sequence[str] = (
+        "name",
+        "description",
+        "tags",
+    )
+
+    def __init__(
+        self,
+        *,
+        name: str,
+        description: str | None = None,
+        tags: dict | None = None,
+        aws_conn_id: str = DEFAULT_CONN_ID,
+        **kwargs,
+    ):
+        super().__init__(config={}, aws_conn_id=aws_conn_id, **kwargs)
+        self.name = name
+        self.description = description
+        if tags:
+            self.tags_set = [{"Key": kvp[0], "Value": kvp[1]} for kvp in tags.items()]
+        else:
+            self.tags_set = []

Review Comment:
   ah good point, I'll change that (and in the other PR as you mentioned)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] vandonr-amz commented on a diff in pull request #28837: new operator to create a sagemaker experiment

Posted by GitBox <gi...@apache.org>.
vandonr-amz commented on code in PR #28837:
URL: https://github.com/apache/airflow/pull/28837#discussion_r1067380331


##########
airflow/providers/amazon/aws/operators/sagemaker.py:
##########
@@ -1049,3 +1050,53 @@ def execute(self, context: Context) -> dict | None:
             self.check_interval,
         )
         return best
+
+
+class SageMakerCreateExperimentOperator(SageMakerBaseOperator):
+    """
+    Creates a SageMaker experiment, to be then associated to jobs etc.
+
+    .. seealso::
+        For more information on how to use this operator, take a look at the guide:
+        :ref:`howto/operator:SageMakerCreateExperimentOperator`
+
+    :param name: name of the experiment, must be unique within the AWS account
+    :param description: description of the experiment, optional
+    :param tags: tags to attach to the experiment, optional
+    :param aws_conn_id: The AWS connection ID to use.
+
+    :returns: the ARN of the experiment created, though experiments are referred to by name
+    """
+
+    template_fields: Sequence[str] = (
+        "name",
+        "description",
+        "tags",
+    )
+
+    def __init__(
+        self,
+        *,
+        name: str,
+        description: str | None = None,
+        tags: dict | None = None,
+        aws_conn_id: str = DEFAULT_CONN_ID,
+        **kwargs,
+    ):
+        super().__init__(config={}, aws_conn_id=aws_conn_id, **kwargs)
+        self.name = name
+        self.description = description
+        self.tags = tags or []

Review Comment:
   yeah right my mistake



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ashb commented on a diff in pull request #28837: new operator to create a sagemaker experiment

Posted by GitBox <gi...@apache.org>.
ashb commented on code in PR #28837:
URL: https://github.com/apache/airflow/pull/28837#discussion_r1066886052


##########
airflow/providers/amazon/aws/operators/sagemaker.py:
##########
@@ -1049,3 +1050,54 @@ def execute(self, context: Context) -> dict | None:
             self.check_interval,
         )
         return best
+
+
+class SageMakerCreateExperimentOperator(SageMakerBaseOperator):
+    """
+    Creates a SageMaker experiment, to be then associated to jobs etc.
+
+    .. seealso::
+        For more information on how to use this operator, take a look at the guide:
+        :ref:`howto/operator:SageMakerCreateExperimentOperator`
+
+    :param name: name of the experiment, must be unique within the AWS account
+    :param description: description of the experiment, optional
+    :param tags: tags to attach to the experiment, optional
+
+    :returns: the ARN of the experiment created, though experiments are referred to by name
+    """
+
+    template_fields: Sequence[str] = (
+        "name",
+        "description",
+        "tags",
+    )
+
+    def __init__(
+        self,
+        *,
+        name: str,
+        description: str | None = None,
+        tags: dict | None = None,
+        aws_conn_id: str = DEFAULT_CONN_ID,
+        **kwargs,
+    ):
+        super().__init__(config={}, aws_conn_id=aws_conn_id, **kwargs)
+        self.name = name
+        self.description = description
+        if tags:
+            self.tags_set = [{"Key": kvp[0], "Value": kvp[1]} for kvp in tags.items()]
+        else:
+            self.tags_set = []

Review Comment:
   Yup, this'll need fixing (and in your other PR too) to do the "formatting" in execute() instead.



##########
tests/providers/amazon/aws/operators/test_sagemaker_base.py:
##########
@@ -40,3 +45,19 @@ def test_default_integer_fields(self):
         self.sagemaker.preprocess_config()
 
         assert self.sagemaker.integer_fields == EXPECTED_INTEGER_FIELDS
+
+
+class TestSageMakerExperimentOperator:
+    @patch("airflow.providers.amazon.aws.hooks.sagemaker.SageMakerHook.conn", new_callable=mock.PropertyMock)
+    def test_create_experiment(self, conn_mock):
+        conn_mock().create_experiment.return_value = {"ExperimentArn": "abcdef"}
+
+        op = SageMakerCreateExperimentOperator(
+            name="the name", description="the desc", tags={"the": "tags"}, task_id="whatever"

Review Comment:
   ```suggestion
               name="the name", description="the desc", tags={"the": "{{ task.task_id }}"}, task_id="whatever"
   ```



##########
tests/providers/amazon/aws/operators/test_sagemaker_base.py:
##########
@@ -40,3 +45,19 @@ def test_default_integer_fields(self):
         self.sagemaker.preprocess_config()
 
         assert self.sagemaker.integer_fields == EXPECTED_INTEGER_FIELDS
+
+
+class TestSageMakerExperimentOperator:
+    @patch("airflow.providers.amazon.aws.hooks.sagemaker.SageMakerHook.conn", new_callable=mock.PropertyMock)
+    def test_create_experiment(self, conn_mock):
+        conn_mock().create_experiment.return_value = {"ExperimentArn": "abcdef"}
+
+        op = SageMakerCreateExperimentOperator(
+            name="the name", description="the desc", tags={"the": "tags"}, task_id="whatever"
+        )
+        ret = op.execute(None)
+
+        assert ret == "abcdef"
+        conn_mock().create_experiment.assert_called_once_with(
+            ExperimentName="the name", Description="the desc", Tags=[{"Key": "the", "Value": "tags"}]

Review Comment:
   ```suggestion
               ExperimentName="the name", Description="the desc", Tags=[{"Key": "the", "Value": "the name"}]
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org