You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "dimberman (via GitHub)" <gi...@apache.org> on 2023/02/13 04:53:46 UTC

[GitHub] [airflow] dimberman commented on a diff in pull request #29245: fix code checking job names in sagemaker

dimberman commented on code in PR #29245:
URL: https://github.com/apache/airflow/pull/29245#discussion_r1103994385


##########
airflow/providers/amazon/aws/operators/sagemaker.py:
##########
@@ -106,6 +108,41 @@ def _create_integer_fields(self) -> None:
         """
         self.integer_fields = []
 
+    def _get_unique_job_name(
+        self, proposed_name: str, fail_if_exists: bool, describe_func: Callable[[str], Any]
+    ) -> str:
+        """
+        Returns the proposed name if it doesn't already exist, otherwise returns it with a random suffix.
+
+        :param proposed_name: Base name.
+        :param fail_if_exists: Will throw an error if a job with that name already exists
+            instead of finding a new name.
+        :param describe_func: The `describe_` function for that kind of job.
+            We use it as an O(1) way to check if a job exists.
+        """
+        job_name = proposed_name
+        while self._check_if_job_exists(job_name, describe_func):
+            # this while should loop only once in most cases, just setting it this way to regenerate a name
+            # in case there is a random number collision.
+            if fail_if_exists:
+                raise AirflowException(f"A SageMaker job with name {job_name} already exists.")
+            else:
+                job_name = f"{proposed_name}-{random.randint(0, 999999999):09}"

Review Comment:
   Is there a reason we're using a randomized name here? Perhaps something more programmatic could be more reproducible?



##########
airflow/providers/amazon/aws/operators/sagemaker.py:
##########
@@ -679,11 +697,11 @@ def __init__(
         self.check_interval = check_interval
         self.max_ingestion_time = max_ingestion_time
         self.check_if_job_exists = check_if_job_exists
-        if action_if_job_exists in ("increment", "fail"):
+        if action_if_job_exists in {"random", "increment", "fail"}:

Review Comment:
   I think that since we're adding in a deprecation warning that should be sufficient as users will have proper logs as to why they are getting unexpected naming.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org