Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/05 03:51:13 UTC

[GitHub] [airflow] coopergillan opened a new pull request #10164: Add type annotations to S3 hook module

coopergillan opened a new pull request #10164:
URL: https://github.com/apache/airflow/pull/10164


   ### What
   
   Add type annotations to the `S3Hook` module.
   
   Import `S3Transfer` object from `boto3` as well as `BytesIO` to add type annotations.
   
   ### Why
   
   Related: #9708
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #10164: Add type annotations to S3 hook module

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10164:
URL: https://github.com/apache/airflow/pull/10164#discussion_r465704305



##########
File path: airflow/providers/amazon/aws/hooks/s3.py
##########
@@ -104,11 +106,11 @@ class S3Hook(AwsBaseHook):
         :class:`~airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`
     """
 
-    def __init__(self, *args, **kwargs):
-        super().__init__(client_type='s3', *args, **kwargs)
+    def __init__(self, *args: str, **kwargs: str):

Review comment:
       ```suggestion
       def __init__(self, *args, **kwargs):
   ```
   We do not add type hints to those (under the hood they are just a `tuple` and a `dict`)
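   For context on this convention: annotating `*args`/`**kwargs` constrains each *element*, not the containers themselves, so `*args: str` would require every positional argument to be a string. A minimal illustration (function names are made up for this sketch):

   ```python
   from typing import Any


   def takes_strings(*args: str, **kwargs: str) -> None:
       # `*args: str` means every positional argument must be a str
       # (args itself is a tuple of str); likewise every keyword value
       # must be a str (kwargs is a dict mapping str to str).
       pass


   def passes_through(*args: Any, **kwargs: Any) -> None:
       # For pass-through signatures whose values genuinely vary,
       # Any (or no annotation at all) is the usual choice.
       pass
   ```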







[GitHub] [airflow] coopergillan commented on a change in pull request #10164: Add type annotations to S3 hook module

Posted by GitBox <gi...@apache.org>.
coopergillan commented on a change in pull request #10164:
URL: https://github.com/apache/airflow/pull/10164#discussion_r465715691



##########
File path: airflow/providers/amazon/aws/hooks/s3.py
##########
@@ -104,11 +106,11 @@ class S3Hook(AwsBaseHook):
         :class:`~airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`
     """
 
-    def __init__(self, *args, **kwargs):
-        super().__init__(client_type='s3', *args, **kwargs)
+    def __init__(self, *args: str, **kwargs: str):

Review comment:
       Got it, that makes sense. Any ideas on how to get the type hint to work here? ~Okay to ignore like I did in https://github.com/apache/airflow/pull/10164/commits/573d9a6d73cf565967598b806d411a4a84b6e567?~ Or are we content to just leave it be?







[GitHub] [airflow] coopergillan commented on pull request #10164: Add type annotations to S3 hook module

Posted by GitBox <gi...@apache.org>.
coopergillan commented on pull request #10164:
URL: https://github.com/apache/airflow/pull/10164#issuecomment-669980018


   @turbaszek - just made a few changes and rebased on `apache/master`. Let me know if anything else is needed here. I'll keep an eye on the builds also.





[GitHub] [airflow] turbaszek commented on a change in pull request #10164: Add type annotations to S3 hook module

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10164:
URL: https://github.com/apache/airflow/pull/10164#discussion_r465706171



##########
File path: airflow/providers/amazon/aws/hooks/s3.py
##########
@@ -198,8 +205,12 @@ def check_for_prefix(self, prefix, delimiter, bucket_name=None):
         return False if plist is None else prefix in plist
 
     @provide_bucket_name
-    def list_prefixes(self, bucket_name=None, prefix='', delimiter='',
-                      page_size=None, max_items=None):
+    def list_prefixes(self,
+                      bucket_name: Optional[str] = None,
+                      prefix: str = '',
+                      delimiter: str = '',
+                      page_size: Optional[int] = None,
+                      max_items: Optional[int] = None) -> Optional[list]:

Review comment:
       I think we should avoid strings as default values 







[GitHub] [airflow] coopergillan commented on a change in pull request #10164: Add type annotations to S3 hook module

Posted by GitBox <gi...@apache.org>.
coopergillan commented on a change in pull request #10164:
URL: https://github.com/apache/airflow/pull/10164#discussion_r465715292



##########
File path: airflow/providers/amazon/aws/hooks/s3.py
##########
@@ -198,8 +205,12 @@ def check_for_prefix(self, prefix, delimiter, bucket_name=None):
         return False if plist is None else prefix in plist
 
     @provide_bucket_name
-    def list_prefixes(self, bucket_name=None, prefix='', delimiter='',
-                      page_size=None, max_items=None):
+    def list_prefixes(self,
+                      bucket_name: Optional[str] = None,
+                      prefix: str = '',
+                      delimiter: str = '',
+                      page_size: Optional[int] = None,
+                      max_items: Optional[int] = None) -> Optional[list]:

Review comment:
       I think it makes more sense if the idea behind `Optional[str]` is that it can be a `str` or `None`, yep.







[GitHub] [airflow] coopergillan commented on a change in pull request #10164: Add type annotations to S3 hook module

Posted by GitBox <gi...@apache.org>.
coopergillan commented on a change in pull request #10164:
URL: https://github.com/apache/airflow/pull/10164#discussion_r465724280



##########
File path: airflow/providers/amazon/aws/hooks/s3.py
##########
@@ -104,11 +106,11 @@ class S3Hook(AwsBaseHook):
         :class:`~airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`
     """
 
-    def __init__(self, *args, **kwargs):
-        super().__init__(client_type='s3', *args, **kwargs)
+    def __init__(self, *args: str, **kwargs: str):

Review comment:
       Looking at some other examples, it seems like you can only get the type hinting to work if and when there are other arguments. Also kinda makes me wonder if we need the `def __init__` at all or if we could just call the `super` right away.







[GitHub] [airflow] ccage-simp commented on a change in pull request #10164: Add type annotations to S3 hook module

Posted by GitBox <gi...@apache.org>.
ccage-simp commented on a change in pull request #10164:
URL: https://github.com/apache/airflow/pull/10164#discussion_r659780809



##########
File path: airflow/providers/amazon/aws/hooks/s3.py
##########
@@ -304,7 +323,7 @@ def check_for_key(self, key, bucket_name=None):
 
     @provide_bucket_name
     @unify_bucket_name_and_key
-    def get_key(self, key, bucket_name=None):
+    def get_key(self, key: str, bucket_name: Optional[str] = None) -> S3Transfer:

Review comment:
       Thanks, Cooper. Saying it's giving PyCharm fits might have been hyperbole on my part. It's showing an error message due to the confusion over types:
   
   > Unresolved attribute reference 'download_fileobj' for class 'S3Transfer'
   
   When I step into the get_key function it's showing rtype as Object:
   
   >  :rtype: boto3.s3.Object
   







[GitHub] [airflow] turbaszek commented on a change in pull request #10164: Add type annotations to S3 hook module

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #10164:
URL: https://github.com/apache/airflow/pull/10164#discussion_r465705578



##########
File path: airflow/providers/amazon/aws/hooks/s3.py
##########
@@ -198,8 +205,12 @@ def check_for_prefix(self, prefix, delimiter, bucket_name=None):
         return False if plist is None else prefix in plist
 
     @provide_bucket_name
-    def list_prefixes(self, bucket_name=None, prefix='', delimiter='',
-                      page_size=None, max_items=None):
+    def list_prefixes(self,
+                      bucket_name: Optional[str] = None,
+                      prefix: str = '',
+                      delimiter: str = '',
+                      page_size: Optional[int] = None,
+                      max_items: Optional[int] = None) -> Optional[list]:

Review comment:
       ```suggestion
                         prefix: Optional[str] = None,
                         delimiter: Optional[str] = None,
                         page_size: Optional[int] = None,
                         max_items: Optional[int] = None) -> Optional[list]:
              prefix = prefix or ""
              delimiter = delimiter or ""
   ```
   WDYT?
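   The pattern being suggested, sketched as a standalone function (the name is hypothetical, only to show the normalization step in isolation):

   ```python
   from typing import Optional, Tuple


   def normalize_prefixes(
       prefix: Optional[str] = None,
       delimiter: Optional[str] = None,
   ) -> Tuple[str, str]:
       # Normalize None to the empty string up front, as the suggestion
       # does, so the rest of the body can assume plain str values.
       prefix = prefix or ""
       delimiter = delimiter or ""
       return prefix, delimiter
   ```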







[GitHub] [airflow] coopergillan commented on pull request #10164: Add type annotations to S3 hook module

Posted by GitBox <gi...@apache.org>.
coopergillan commented on pull request #10164:
URL: https://github.com/apache/airflow/pull/10164#issuecomment-668968891


   When I ran the `mypy` report for this module using the command from #9708 I kept getting 25 out of 26 checks. I can't figure out which check is missing:
   
   ```bash
   $ MP_DIR=$(mktemp -d); mypy --linecount-report ${MP_DIR} airflow/providers/amazon/aws/hooks; \
   cat "${MP_DIR}/linecount.txt" | grep providers.amazon.aws.hooks.s3 | grep -v example_dags | \
   awk '$4 != 0 {print 100.00 * ($3/$4), $3, $4, $5}'| sort -g
   ```
   
   However, it looks like when I add annotations to `__init__`:
   
   ```python
   def __init__(self, *args: str, **kwargs: str):
       super().__init__(client_type='s3', *args, **kwargs)
   ```
   
   I can get to 26/26, but get this error:
   
   ```bash
   airflow/providers/amazon/aws/hooks/s3.py:110: error: "__init__" of "AwsBaseHook" gets multiple
   values for keyword argument "client_type"
               super().__init__(client_type='s3', *args, **kwargs)
               ^
   ```
   
   Any ideas what might be going on there? I'm guessing it doesn't like something about how `client_type` is not specified in the args, but that seems like it should be fine due to the `super`.
   
   Either way, I just pushed up 573d9a6d73cf565967598b806d411a4a84b6e567 to add these and ignore the type check. That gets us to 26/26. I would be happy to revert this if we can explain better what's going on here.
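   One common workaround for that mypy error (shown here against a stand-in base class, not the actual `AwsBaseHook`) is to inject `client_type` through `kwargs` instead of passing it as a keyword before `*args`:

   ```python
   from typing import Any


   class StubBaseHook:
       """Stand-in for AwsBaseHook, only so the sketch is self-contained."""

       def __init__(self, client_type: str = "", **kwargs: Any) -> None:
           self.client_type = client_type


   class StubS3Hook(StubBaseHook):
       def __init__(self, *args: Any, **kwargs: Any) -> None:
           # super().__init__(client_type='s3', *args, **kwargs) makes mypy
           # worry that *args could also bind client_type positionally.
           # Routing the value through kwargs sidesteps that complaint.
           kwargs["client_type"] = "s3"
           super().__init__(*args, **kwargs)
   ```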






[GitHub] [airflow] coopergillan commented on a change in pull request #10164: Add type annotations to S3 hook module

Posted by GitBox <gi...@apache.org>.
coopergillan commented on a change in pull request #10164:
URL: https://github.com/apache/airflow/pull/10164#discussion_r654863829



##########
File path: airflow/providers/amazon/aws/hooks/s3.py
##########
@@ -304,7 +323,7 @@ def check_for_key(self, key, bucket_name=None):
 
     @provide_bucket_name
     @unify_bucket_name_and_key
-    def get_key(self, key, bucket_name=None):
+    def get_key(self, key: str, bucket_name: Optional[str] = None) -> S3Transfer:

Review comment:
       Just dropped a comment over in #11297 as well: https://github.com/apache/airflow/issues/11297#issuecomment-864488490










[GitHub] [airflow] turbaszek merged pull request #10164: Add type annotations to S3 hook module

Posted by GitBox <gi...@apache.org>.
turbaszek merged pull request #10164:
URL: https://github.com/apache/airflow/pull/10164


   





[GitHub] [airflow] coopergillan commented on a change in pull request #10164: Add type annotations to S3 hook module

Posted by GitBox <gi...@apache.org>.
coopergillan commented on a change in pull request #10164:
URL: https://github.com/apache/airflow/pull/10164#discussion_r654863578



##########
File path: airflow/providers/amazon/aws/hooks/s3.py
##########
@@ -304,7 +323,7 @@ def check_for_key(self, key, bucket_name=None):
 
     @provide_bucket_name
     @unify_bucket_name_and_key
-    def get_key(self, key, bucket_name=None):
+    def get_key(self, key: str, bucket_name: Optional[str] = None) -> S3Transfer:

Review comment:
       Interesting. Could you share the output you are seeing from pycharm? I cannot figure out where the `boto3.s3.Object` is defined. I have been through the `boto3`, `botocore` and `airflow` code as well as [some docs]() but I have not been able to.
   
   I think we need the [`boto3-type-annotations`](https://pypi.org/project/boto3-type-annotations/) to solve this issue. [Check out this Stack Overflow question also](https://stackoverflow.com/questions/52087307/adding-type-hinting-to-functions-that-return-boto3-objects).
   
   This is something `@mik-laj` once brought up in https://github.com/apache/airflow/issues/11297 it looks like. I don't have time to tackle it right now, but I could in the near future.







[GitHub] [airflow] ashb commented on a change in pull request #10164: Add type annotations to S3 hook module

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #10164:
URL: https://github.com/apache/airflow/pull/10164#discussion_r660614869



##########
File path: airflow/providers/amazon/aws/hooks/s3.py
##########
@@ -304,7 +323,7 @@ def check_for_key(self, key, bucket_name=None):
 
     @provide_bucket_name
     @unify_bucket_name_and_key
-    def get_key(self, key, bucket_name=None):
+    def get_key(self, key: str, bucket_name: Optional[str] = None) -> S3Transfer:

Review comment:
       https://pypi.org/project/boto3-stubs/ might be a helpful project here.
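   With boto3-stubs (distributed per service, e.g. as `mypy-boto3-s3`), the return type could be expressed against the stubbed resource classes; the module and class names below come from those stub packages, which are needed only at type-check time, never at runtime:

   ```python
   from __future__ import annotations

   from typing import TYPE_CHECKING, Optional

   if TYPE_CHECKING:
       # Resolved only when the mypy-boto3-s3 stub package is installed;
       # with `from __future__ import annotations` the name is never
       # evaluated at runtime, so there is no runtime dependency.
       from mypy_boto3_s3.service_resource import Object


   def get_key(key: str, bucket_name: Optional[str] = None) -> Object:
       """Sketch of the annotated signature only; the body is elided."""
       raise NotImplementedError
   ```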







[GitHub] [airflow] coopergillan commented on a change in pull request #10164: Add type annotations to S3 hook module

Posted by GitBox <gi...@apache.org>.
coopergillan commented on a change in pull request #10164:
URL: https://github.com/apache/airflow/pull/10164#discussion_r465725720



##########
File path: airflow/providers/amazon/aws/hooks/s3.py
##########
@@ -104,11 +106,11 @@ class S3Hook(AwsBaseHook):
         :class:`~airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`
     """
 
-    def __init__(self, *args, **kwargs):
-        super().__init__(client_type='s3', *args, **kwargs)
+    def __init__(self, *args: str, **kwargs: str):

Review comment:
       Going to just revert `573d9a6`.







[GitHub] [airflow] coopergillan edited a comment on pull request #10164: Add type annotations to S3 hook module

Posted by GitBox <gi...@apache.org>.
coopergillan edited a comment on pull request #10164:
URL: https://github.com/apache/airflow/pull/10164#issuecomment-668968891


   When I ran the `mypy` report for this module using the command from #9708 I kept getting 25 out of 26 checks. I ~can't~ couldn't figure out which check ~is~ was missing:
   
   ```bash
   $ MP_DIR=$(mktemp -d); mypy --linecount-report ${MP_DIR} airflow/providers/amazon/aws/hooks; \
   cat "${MP_DIR}/linecount.txt" | grep providers.amazon.aws.hooks.s3 | grep -v example_dags | \
   awk '$4 != 0 {print 100.00 * ($3/$4), $3, $4, $5}'| sort -g
   ```
   
   However, it looks like when I add annotations to `__init__`:
   
   ```python
   def __init__(self, *args: str, **kwargs: str):
       super().__init__(client_type='s3', *args, **kwargs)
   ```
   
   I can get to 26/26, but get this error:
   
   ```bash
   airflow/providers/amazon/aws/hooks/s3.py:110: error: "__init__" of "AwsBaseHook" gets multiple
   values for keyword argument "client_type"
               super().__init__(client_type='s3', *args, **kwargs)
               ^
   ```
   
   I'm guessing it doesn't like something about how `client_type` is not specified in the args, but that seems like it should be fine due to the `super`.
   
   Either way, I just pushed up 573d9a6d73cf565967598b806d411a4a84b6e567 to add these and ignore the type check. That gets us to 26/26. That said, any ideas what might be going on here?
   
   I would be happy to revert and change this if we can explain better what's going on or if we are perfectly content to have 25/26 methods covered, which I'm guessing we are =)
   
   





[GitHub] [airflow] coopergillan commented on pull request #10164: Add type annotations to S3 hook module

Posted by GitBox <gi...@apache.org>.
coopergillan commented on pull request #10164:
URL: https://github.com/apache/airflow/pull/10164#issuecomment-669163522


   Can't tell if the CI failure is related or not.

