You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "vincbeck (via GitHub)" <gi...@apache.org> on 2023/02/16 21:58:11 UTC

[GitHub] [airflow] vincbeck opened a new pull request, #29581: AWS system test sagemaker-endpoint: archive logs

vincbeck opened a new pull request, #29581:
URL: https://github.com/apache/airflow/pull/29581

   When creating a new Sagemaker endpoint in the system test `example_sagemaker-endpoint`, Sagemaker itself creates a new log group. For some reasons, Sagemaker keeps pushing logs for some time (some seconds) even after all the resources created in this system test are deleted (log group included). As a consequence, a new log group gets created every-time the system test is executed without being removed. In order to stop accumulating log groups if you execute this system test frequently (lie our CI), setting a retention policy of 1 day for this given log group


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #29581: AWS system test sagemaker-endpoint: archive logs

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on code in PR #29581:
URL: https://github.com/apache/airflow/pull/29581#discussion_r1111026449


##########
tests/system/providers/amazon/aws/example_sagemaker_endpoint.py:
##########
@@ -88,6 +88,11 @@ def delete_endpoint(endpoint_name):
     boto3.client("sagemaker").delete_endpoint(EndpointName=endpoint_name)
 
 
+@task(trigger_rule=TriggerRule.ALL_DONE)
+def archive_logs(log_group_name):
+    boto3.client("logs").put_retention_policy(logGroupName=log_group_name, retentionInDays=1)

Review Comment:
   Oh... got you. That a bit strange Sagemaker behaviour that doesn't allow specify name of Log Group and it create only [predefined](https://docs.aws.amazon.com/sagemaker/latest/dg/logging-cloudwatch.html)?
   
   So in real environment `/aws/sagemaker/Endpoints/[EndpointName]` in the end could hit limit of Log Group per region per account (looong-long time) or keep enormous number of  Log Groups in Cloudwatch 🙄 
   Because even if Log Streams would removed by retention period the Log Group still exists in account.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis merged pull request #29581: AWS system test sagemaker-endpoint: archive logs

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis merged PR #29581:
URL: https://github.com/apache/airflow/pull/29581


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #29581: AWS system test sagemaker-endpoint: archive logs

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on code in PR #29581:
URL: https://github.com/apache/airflow/pull/29581#discussion_r1110107924


##########
tests/system/providers/amazon/aws/example_sagemaker_endpoint.py:
##########
@@ -88,6 +88,11 @@ def delete_endpoint(endpoint_name):
     boto3.client("sagemaker").delete_endpoint(EndpointName=endpoint_name)
 
 
+@task(trigger_rule=TriggerRule.ALL_DONE)
+def archive_logs(log_group_name):
+    boto3.client("logs").put_retention_policy(logGroupName=log_group_name, retentionInDays=1)

Review Comment:
   Finally found https://github.com/apache/airflow/pull/28819#discussion_r1069787616 previous discussion



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] vincbeck commented on a diff in pull request #29581: AWS system test sagemaker-endpoint: archive logs

Posted by "vincbeck (via GitHub)" <gi...@apache.org>.
vincbeck commented on code in PR #29581:
URL: https://github.com/apache/airflow/pull/29581#discussion_r1113261654


##########
tests/system/providers/amazon/aws/example_sagemaker_endpoint.py:
##########
@@ -88,6 +88,11 @@ def delete_endpoint(endpoint_name):
     boto3.client("sagemaker").delete_endpoint(EndpointName=endpoint_name)
 
 
+@task(trigger_rule=TriggerRule.ALL_DONE)
+def archive_logs(log_group_name):
+    boto3.client("logs").put_retention_policy(logGroupName=log_group_name, retentionInDays=1)

Review Comment:
   Correct! I also do not like making this change but it is definitely a bug on Sagemaker side but ...  ¯\_(ツ)_/¯
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] vincbeck commented on a diff in pull request #29581: AWS system test sagemaker-endpoint: archive logs

Posted by "vincbeck (via GitHub)" <gi...@apache.org>.
vincbeck commented on code in PR #29581:
URL: https://github.com/apache/airflow/pull/29581#discussion_r1113261654


##########
tests/system/providers/amazon/aws/example_sagemaker_endpoint.py:
##########
@@ -88,6 +88,11 @@ def delete_endpoint(endpoint_name):
     boto3.client("sagemaker").delete_endpoint(EndpointName=endpoint_name)
 
 
+@task(trigger_rule=TriggerRule.ALL_DONE)
+def archive_logs(log_group_name):
+    boto3.client("logs").put_retention_policy(logGroupName=log_group_name, retentionInDays=1)

Review Comment:
   Correct! I do not like making this change as well since it is definitely a bug on Sagemaker side but ...  ¯\_(ツ)_/¯
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] vincbeck commented on a diff in pull request #29581: AWS system test sagemaker-endpoint: archive logs

Posted by "vincbeck (via GitHub)" <gi...@apache.org>.
vincbeck commented on code in PR #29581:
URL: https://github.com/apache/airflow/pull/29581#discussion_r1110293535


##########
tests/system/providers/amazon/aws/example_sagemaker_endpoint.py:
##########
@@ -88,6 +88,11 @@ def delete_endpoint(endpoint_name):
     boto3.client("sagemaker").delete_endpoint(EndpointName=endpoint_name)
 
 
+@task(trigger_rule=TriggerRule.ALL_DONE)
+def archive_logs(log_group_name):
+    boto3.client("logs").put_retention_policy(logGroupName=log_group_name, retentionInDays=1)

Review Comment:
   The only issue here is `_purge_logs` do not work because of the reason explained in the description. Even though the log group is removed, it gets recreated because some logs are still being pushed after the system test ends. The solution I can think of is to add a parameter `successful_test_action` (or a better name) with the default value being `_purge_logs`. That way, for this test, we could pass `archive_logs` as value of `successful_test_action`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #29581: AWS system test sagemaker-endpoint: archive logs

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on code in PR #29581:
URL: https://github.com/apache/airflow/pull/29581#discussion_r1109696047


##########
tests/system/providers/amazon/aws/example_sagemaker_endpoint.py:
##########
@@ -88,6 +88,11 @@ def delete_endpoint(endpoint_name):
     boto3.client("sagemaker").delete_endpoint(EndpointName=endpoint_name)
 
 
+@task(trigger_rule=TriggerRule.ALL_DONE)
+def archive_logs(log_group_name):
+    boto3.client("logs").put_retention_policy(logGroupName=log_group_name, retentionInDays=1)

Review Comment:
   Is 1 day would be enough? In some previous PR which also related to retention period for CloudWatch logs in system tests (can't find it quick) we decide as far a i remember 14 days
   
   More expensive it write Gigabytes/Terabytes of logs rather than store it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] vincbeck commented on a diff in pull request #29581: AWS system test sagemaker-endpoint: archive logs

Posted by "vincbeck (via GitHub)" <gi...@apache.org>.
vincbeck commented on code in PR #29581:
URL: https://github.com/apache/airflow/pull/29581#discussion_r1110012488


##########
tests/system/providers/amazon/aws/example_sagemaker_endpoint.py:
##########
@@ -88,6 +88,11 @@ def delete_endpoint(endpoint_name):
     boto3.client("sagemaker").delete_endpoint(EndpointName=endpoint_name)
 
 
+@task(trigger_rule=TriggerRule.ALL_DONE)
+def archive_logs(log_group_name):
+    boto3.client("logs").put_retention_policy(logGroupName=log_group_name, retentionInDays=1)

Review Comment:
   Good question. My thinking was, we currently remove the log group so let's put the smallest retention period possible (that is one day). Your question brings a project/idea we were thinking working on, which is to keeps logs of resources which have been created in a failed system test. That way, it could simplify investigation upon failure. But I think we should come up with a solution for all system tests. That should be another PR to me



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org