You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/02/10 17:47:39 UTC

[GitHub] [airflow] dstandish opened a new pull request #21500: S3KeySensor should parse bucket and key outside of poke

dstandish opened a new pull request #21500:
URL: https://github.com/apache/airflow/pull/21500


   Rather than relying on the poke method to parse the attrs we should either do it in `__init__` or in a cached property.
   
   Since it's a pretty insignifigant computation let's  just do it in `__init__`.
   
   And anyway, if the params are bad then best to get a warning before deploying your code.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] josh-fell commented on a change in pull request #21500: S3KeySensor to use S3Hook url parser

Posted by GitBox <gi...@apache.org>.
josh-fell commented on a change in pull request #21500:
URL: https://github.com/apache/airflow/pull/21500#discussion_r804755557



##########
File path: airflow/providers/amazon/aws/sensors/s3.py
##########
@@ -77,31 +77,24 @@ def __init__(
         **kwargs,
     ):
         super().__init__(**kwargs)
-
         self.bucket_name = bucket_name
         self.bucket_key = bucket_key
         self.wildcard_match = wildcard_match
         self.aws_conn_id = aws_conn_id
         self.verify = verify
         self.hook: Optional[S3Hook] = None
 
-    def poke(self, context: 'Context'):
-
+    def _resolve_bucket_and_key(self):
+        """If key is URI, parse bucket"""
         if self.bucket_name is None:
-            parsed_url = urlparse(self.bucket_key)
-            if parsed_url.netloc == '':
-                raise AirflowException('If key is a relative path from root, please provide a bucket_name')
-            self.bucket_name = parsed_url.netloc
-            self.bucket_key = parsed_url.path.lstrip('/')
+            self.bucket_name, self.bucket_key = S3Hook.parse_s3_url(self.bucket_key)
         else:
             parsed_url = urlparse(self.bucket_key)
             if parsed_url.scheme != '' or parsed_url.netloc != '':
-                raise AirflowException(
-                    'If bucket_name is provided, bucket_key'
-                    ' should be relative path from root'
-                    ' level, rather than a full s3:// url'
-                )
+                raise AirflowException('If bucket_name provided, bucket_key must be relative path, not URI.')

Review comment:
       ```suggestion
                   raise AirflowException('If bucket_name is provided, bucket_key must be a relative path, not a URI.')
   ```

##########
File path: airflow/providers/amazon/aws/sensors/s3.py
##########
@@ -77,31 +77,24 @@ def __init__(
         **kwargs,
     ):
         super().__init__(**kwargs)
-
         self.bucket_name = bucket_name
         self.bucket_key = bucket_key
         self.wildcard_match = wildcard_match
         self.aws_conn_id = aws_conn_id
         self.verify = verify
         self.hook: Optional[S3Hook] = None
 
-    def poke(self, context: 'Context'):
-
+    def _resolve_bucket_and_key(self):
+        """If key is URI, parse bucket"""
         if self.bucket_name is None:
-            parsed_url = urlparse(self.bucket_key)
-            if parsed_url.netloc == '':
-                raise AirflowException('If key is a relative path from root, please provide a bucket_name')
-            self.bucket_name = parsed_url.netloc
-            self.bucket_key = parsed_url.path.lstrip('/')
+            self.bucket_name, self.bucket_key = S3Hook.parse_s3_url(self.bucket_key)
         else:
             parsed_url = urlparse(self.bucket_key)

Review comment:
       Doesn't have to happen in this PR but I wonder if `S3Hook.parse_s3_url()` could be augmented with some sort of configurable check between validating a URI vs. a relative path? Centralize some of this logic there without have `urlparse()` is different places. Anyway, just a thought.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #21500: S3KeySensor to use S3Hook url parser

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #21500:
URL: https://github.com/apache/airflow/pull/21500#discussion_r804817967



##########
File path: airflow/providers/amazon/aws/sensors/s3.py
##########
@@ -77,31 +77,24 @@ def __init__(
         **kwargs,
     ):
         super().__init__(**kwargs)
-
         self.bucket_name = bucket_name
         self.bucket_key = bucket_key
         self.wildcard_match = wildcard_match
         self.aws_conn_id = aws_conn_id
         self.verify = verify
         self.hook: Optional[S3Hook] = None
 
-    def poke(self, context: 'Context'):
-
+    def _resolve_bucket_and_key(self):
+        """If key is URI, parse bucket"""
         if self.bucket_name is None:
-            parsed_url = urlparse(self.bucket_key)
-            if parsed_url.netloc == '':
-                raise AirflowException('If key is a relative path from root, please provide a bucket_name')
-            self.bucket_name = parsed_url.netloc
-            self.bucket_key = parsed_url.path.lstrip('/')
+            self.bucket_name, self.bucket_key = S3Hook.parse_s3_url(self.bucket_key)
         else:
             parsed_url = urlparse(self.bucket_key)
             if parsed_url.scheme != '' or parsed_url.netloc != '':
-                raise AirflowException(
-                    'If bucket_name is provided, bucket_key'
-                    ' should be relative path from root'
-                    ' level, rather than a full s3:// url'
-                )
+                raise AirflowException('If bucket_name provided, bucket_key must be relative path, not URI.')

Review comment:
       i was trying to keep it on one line :) lemme see if that works




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] josh-fell commented on pull request #21500: S3KeySensor should parse bucket and key outside of poke

Posted by GitBox <gi...@apache.org>.
josh-fell commented on pull request #21500:
URL: https://github.com/apache/airflow/pull/21500#issuecomment-1035347722


   Parsing `bucket_key` in the `__init__()` won't work if the value is an `XComArg` or a Jinja string. We've seen several issues, namely AttributeErrors, when some sort of string parsing is used on templated fields in `__init__()` since the value doesn't get rendered until the `execute()` method. Check out #14682 as an example.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #21500: S3KeySensor to use S3Hook url parser

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #21500:
URL: https://github.com/apache/airflow/pull/21500


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] josh-fell edited a comment on pull request #21500: S3KeySensor should parse bucket and key outside of poke

Posted by GitBox <gi...@apache.org>.
josh-fell edited a comment on pull request #21500:
URL: https://github.com/apache/airflow/pull/21500#issuecomment-1035357422


   FWIW, datatype checks on templated fields within `__init__()` have their issues as well. See #20347.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #21500: S3KeySensor should parse bucket and key outside of poke

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #21500:
URL: https://github.com/apache/airflow/pull/21500#issuecomment-1035362954


   > Parsing `bucket_key` in the `__init__()` won't work if the value is an `XComArg` or a Jinja string. We've seen several issues, namely AttributeErrors, when some sort of string parsing is used on templated fields in `__init__()` since the value doesn't get rendered until the `execute()` method. Check out #14682 as an example.
   
   darnit, good catch... maybe i'll go cached prop route


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] josh-fell commented on pull request #21500: S3KeySensor should parse bucket and key outside of poke

Posted by GitBox <gi...@apache.org>.
josh-fell commented on pull request #21500:
URL: https://github.com/apache/airflow/pull/21500#issuecomment-1035357422


   FWIW, datatype checks on templated fields have their issues within `__init__()` as well. See #20347.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org