Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2023/01/03 22:41:38 UTC

[GitHub] [airflow] dstandish opened a new pull request, #28710: Apply "unify bucket and key" before "provide bucket"

dstandish opened a new pull request, #28710:
URL: https://github.com/apache/airflow/pull/28710

   Previously, if the user provided a full key, it could be overwritten by the conn bucket; now that the ordering of the decorators is fixed, this won't happen.
   
   This fix doesn't seem to break backcompat, because previously you'd get `call(Bucket='bucket', Key='s3://other-bucket/file.txt')`, which should fail anyway.
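
The ordering issue can be sketched with simplified stand-ins for the two decorators (the names mirror the real ones in airflow/providers/amazon/aws/hooks/s3.py, but the bodies below are illustrative, not the actual implementations):

```python
import functools
from urllib.parse import urlsplit

# Simplified stand-ins for the real decorators; bodies are illustrative only.

def provide_bucket_name(func):
    """Fall back to the connection's bucket when none is passed."""
    @functools.wraps(func)
    def wrapper(*args, bucket_name=None, key=None):
        if bucket_name is None:
            bucket_name = "conn_bucket"  # stand-in for the connection schema
        return func(*args, bucket_name=bucket_name, key=key)
    return wrapper

def unify_bucket_name_and_key(func):
    """Split a full s3:// key into its bucket and relative key."""
    @functools.wraps(func)
    def wrapper(*args, bucket_name=None, key=None):
        if bucket_name is None and key.startswith("s3://"):
            parts = urlsplit(key)
            bucket_name, key = parts.netloc, parts.path.lstrip("/")
        return func(*args, bucket_name=bucket_name, key=key)
    return wrapper

@unify_bucket_name_and_key  # new order: the full key is split before any default applies
@provide_bucket_name
def fixed(bucket_name=None, key=None):
    return bucket_name, key

@provide_bucket_name  # old order: the conn bucket wins and the full key is left intact
@unify_bucket_name_and_key
def broken(bucket_name=None, key=None):
    return bucket_name, key

assert fixed(key="s3://key_bucket/key.txt") == ("key_bucket", "key.txt")
assert broken(key="s3://key_bucket/key.txt") == ("conn_bucket", "s3://key_bucket/key.txt")
```

With the old order, `provide_bucket_name` runs first and fills in the connection bucket before the full key has been split, which is exactly the overwrite described above.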


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] dstandish commented on pull request #28710: Apply "unify bucket and key" before "provide bucket"

dstandish commented on PR #28710:
URL: https://github.com/apache/airflow/pull/28710#issuecomment-1373423090

   ok i think i have gotten the test issue fixed now... it was the result of reloading the s3 module.  after that, mocks don't work.
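
That failure mode is easy to reproduce in isolation; a minimal sketch using the stdlib `json` module as a stand-in for the reloaded s3 module (reloading re-executes the module and rebinds its attributes, which discards any patch applied to them):

```python
import importlib
import json
from unittest import mock

# Patching replaces an attribute on the already-imported module object.
with mock.patch.object(json, "dumps", return_value="patched"):
    assert json.dumps({}) == "patched"
    # Reloading re-executes the module and rebinds json.dumps,
    # so the mock installed above is silently discarded.
    importlib.reload(json)
    assert json.dumps({}) == "{}"  # back to the real function
```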
   
   separately i had a chance to look at the change @feluelle and i reverted, in small part cus it was failing :) but mainly because...
   
   this is sort of why i objected to doing it that way.
   with your way, we see a bunch of test values but we don't really know what they are there for.  sure, you could use the param class and add an ID, but that is a lot more noise, and it risks the params differing from the description (id).
   meanwhile, my params are, to borrow a phrase, "descriptive and meaningful phrases", e.g. 
   `"unify", "no_conn", "no_bucket", "full_key"`
   this scenario covers the case unify first, with no connection (or no schema in connection), and no bucket provided as kwarg, and with a full key.
   meanwhile, with yours we see
   ```None, {"key": "s3://key_bucket/key.txt"}```
   if we look at this, how are we to know what it's testing?  and when we look at all the params together, how do we know if we have missed a case? 
   with mine, we just need to verify that we have all combinations of the 4 binary choices. when using the values directly, they don't quite combine that way.
   and, this is also related to why i didn't want to make them booleans, which @o-nikolas suggested.  by using the string values, we get good test names. test names that clearly indicate the scenario tested.
   anyway, not that these other approaches aren't fine, just explaining my choices here.
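
The completeness check described here can even be made mechanical; a small sketch (token names taken from the quoted ids) comparing against the full cross product of the 4 binary choices:

```python
from itertools import product

# Build the full cross product of the 4 binary choices; a test suite whose
# ids match this set exactly has covered every scenario exactly once.
ids = {
    "-".join(combo)
    for combo in product(
        ("unify", "provide"),
        ("no_conn", "with_conn"),
        ("no_bucket", "with_bucket"),
        ("full_key", "rel_key"),
    )
}
assert len(ids) == 16
assert "unify-no_conn-no_bucket-full_key" in ids
```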
   
   thanks
   
   




[GitHub] [airflow] feluelle commented on a diff in pull request #28710: Apply "unify bucket and key" before "provide bucket"

feluelle commented on code in PR #28710:
URL: https://github.com/apache/airflow/pull/28710#discussion_r1061456165


##########
airflow/providers/amazon/aws/hooks/s3.py:
##########
@@ -38,17 +40,22 @@
 from botocore.exceptions import ClientError
 
 from airflow.exceptions import AirflowException
+from airflow.providers.amazon.aws.exceptions import S3HookUriParseFailure
 from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook
 from airflow.utils.helpers import chunks
 
 T = TypeVar("T", bound=Callable)
 
+logger = logging.getLogger(__name__)
+
 
 def provide_bucket_name(func: T) -> T:
     """
     Function decorator that provides a bucket name taken from the connection
     in case no bucket name has been passed to the function.
     """
+    if getattr(func, "_unify_bucket_name_and_key_wrapped", False) is True:

Review Comment:
   ```suggestion
       if hasattr(func, "_unify_bucket_name_and_key_wrapped"):
   ```
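
For illustration, a stand-alone sketch of the marker-attribute pattern this check relies on (decorator bodies simplified; only the marker name and the warning text mirror the real module):

```python
import functools

order_warnings = []  # stand-in for the logger used in the real code

def unify_bucket_name_and_key(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    wrapper._unify_bucket_name_and_key_wrapped = True  # marker for the outer decorator
    return wrapper

def provide_bucket_name(func):
    # The marker is only ever set to True, so `hasattr(...)` and
    # `getattr(..., False) is True` are equivalent checks here.
    if hasattr(func, "_unify_bucket_name_and_key_wrapped"):
        order_warnings.append("`unify_bucket_name_and_key` should wrap `provide_bucket_name`.")
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@unify_bucket_name_and_key  # correct order: no warning is recorded
@provide_bucket_name
def good(key=None):
    return key

assert order_warnings == []

@provide_bucket_name  # wrong order: provide sees the marker and warns
@unify_bucket_name_and_key
def bad(key=None):
    return key

assert len(order_warnings) == 1
```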



##########
tests/providers/amazon/aws/hooks/test_s3.py:
##########
@@ -825,3 +826,143 @@ def test_delete_bucket_tagging_with_no_tags(self):
 
         with pytest.raises(ClientError, match=r".*NoSuchTagSet.*"):
             hook.get_bucket_tagging(bucket_name="new_bucket")
+
+
+@patch("airflow.hooks.base.BaseHook.get_connection")
+@pytest.mark.parametrize(
+    "expected",
+    [
+        # full key
+        # no conn - no bucket - full key
+        param(["key_bucket", "key.txt"], id="unify-no_conn-no_bucket-full_key"),
+        param(["key_bucket", "key.txt"], id="provide-no_conn-no_bucket-full_key"),
+        # no conn - with bucket - full key
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="unify-no_conn-with_bucket-full_key"),
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="provide-no_conn-with_bucket-full_key"),
+        # with conn - no bucket - full key
+        param(["conn_bucket", "s3://key_bucket/key.txt"], id="provide-with_conn-no_bucket-full_key"),
+        param(["key_bucket", "key.txt"], id="unify-with_conn-no_bucket-full_key"),
+        # with conn - with bucket - full key
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="provide-with_conn-with_bucket-full_key"),
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="unify-with_conn-with_bucket-full_key"),
+        # rel key
+        # no conn - no bucket - rel key
+        param([None, "key.txt"], id="unify-no_conn-no_bucket-rel_key"),
+        param([None, "key.txt"], id="provide-no_conn-no_bucket-rel_key"),
+        # no conn - with bucket - rel key
+        param(["kwargs_bucket", "key.txt"], id="unify-no_conn-with_bucket-rel_key"),
+        param(["kwargs_bucket", "key.txt"], id="provide-no_conn-with_bucket-rel_key"),
+        # with conn - no bucket - rel key
+        param(["conn_bucket", "key.txt"], id="provide-with_conn-no_bucket-rel_key"),
+        param(["conn_bucket", "key.txt"], id="unify-with_conn-no_bucket-rel_key"),
+        # with conn - with bucket - rel key
+        param(["kwargs_bucket", "key.txt"], id="provide-with_conn-with_bucket-rel_key"),
+        param(["kwargs_bucket", "key.txt"], id="unify-with_conn-with_bucket-rel_key"),
+    ],
+)
+def test_unify_and_provide_bucket_name_combination(mock_base, expected, request, caplog):
+    """
+    Verify what is the outcome when the unify_bucket_name_and_key and provide_bucket_name
+    decorators are combined.
+    """
+    tokens = request.node.callspec.id.split("-")
+    assert len(tokens) == 4
+    if "with_conn" in tokens:
+        c = Connection(schema="conn_bucket")
+    else:
+        c = Connection(schema=None)
+    key = "key.txt" if "rel_key" in tokens else "s3://key_bucket/key.txt"
+    if "with_bucket" in tokens:
+        kwargs = {"bucket_name": "kwargs_bucket", "key": key}
+    else:
+        kwargs = {"key": key}
+
+    mock_base.return_value = c
+    if "unify" in tokens:  # unify to be processed before provide

Review Comment:
   Why are we deciding based on param id? Can't we have the different Connection schema, kwargs and so on in the params?



##########
tests/providers/amazon/aws/hooks/test_s3.py:
##########
@@ -825,3 +826,143 @@ def test_delete_bucket_tagging_with_no_tags(self):
 
         with pytest.raises(ClientError, match=r".*NoSuchTagSet.*"):
             hook.get_bucket_tagging(bucket_name="new_bucket")
+
+
+@patch("airflow.hooks.base.BaseHook.get_connection")
+@pytest.mark.parametrize(
+    "expected",
+    [
+        # full key
+        # no conn - no bucket - full key
+        param(["key_bucket", "key.txt"], id="unify-no_conn-no_bucket-full_key"),
+        param(["key_bucket", "key.txt"], id="provide-no_conn-no_bucket-full_key"),
+        # no conn - with bucket - full key
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="unify-no_conn-with_bucket-full_key"),
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="provide-no_conn-with_bucket-full_key"),
+        # with conn - no bucket - full key
+        param(["conn_bucket", "s3://key_bucket/key.txt"], id="provide-with_conn-no_bucket-full_key"),
+        param(["key_bucket", "key.txt"], id="unify-with_conn-no_bucket-full_key"),
+        # with conn - with bucket - full key
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="provide-with_conn-with_bucket-full_key"),
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="unify-with_conn-with_bucket-full_key"),
+        # rel key
+        # no conn - no bucket - rel key
+        param([None, "key.txt"], id="unify-no_conn-no_bucket-rel_key"),
+        param([None, "key.txt"], id="provide-no_conn-no_bucket-rel_key"),
+        # no conn - with bucket - rel key
+        param(["kwargs_bucket", "key.txt"], id="unify-no_conn-with_bucket-rel_key"),
+        param(["kwargs_bucket", "key.txt"], id="provide-no_conn-with_bucket-rel_key"),
+        # with conn - no bucket - rel key
+        param(["conn_bucket", "key.txt"], id="provide-with_conn-no_bucket-rel_key"),
+        param(["conn_bucket", "key.txt"], id="unify-with_conn-no_bucket-rel_key"),
+        # with conn - with bucket - rel key
+        param(["kwargs_bucket", "key.txt"], id="provide-with_conn-with_bucket-rel_key"),
+        param(["kwargs_bucket", "key.txt"], id="unify-with_conn-with_bucket-rel_key"),
+    ],
+)
+def test_unify_and_provide_bucket_name_combination(mock_base, expected, request, caplog):
+    """
+    Verify what is the outcome when the unify_bucket_name_and_key and provide_bucket_name
+    decorators are combined.
+    """
+    tokens = request.node.callspec.id.split("-")
+    assert len(tokens) == 4
+    if "with_conn" in tokens:
+        c = Connection(schema="conn_bucket")
+    else:
+        c = Connection(schema=None)
+    key = "key.txt" if "rel_key" in tokens else "s3://key_bucket/key.txt"
+    if "with_bucket" in tokens:
+        kwargs = {"bucket_name": "kwargs_bucket", "key": key}
+    else:
+        kwargs = {"key": key}
+
+    mock_base.return_value = c
+    if "unify" in tokens:  # unify to be processed before provide
+
+        class MyHook(S3Hook):
+            @unify_bucket_name_and_key
+            @provide_bucket_name
+            def do_something(self, bucket_name=None, key=None):
+                return bucket_name, key
+
+    else:
+
+        with caplog.at_level("WARNING"):
+
+            class MyHook(S3Hook):
+                @provide_bucket_name
+                @unify_bucket_name_and_key
+                def do_something(self, bucket_name=None, key=None):
+                    return bucket_name, key
+
+        assert caplog.records[0].message == "`unify_bucket_name_and_key` should wrap `provide_bucket_name`."
+    hook = MyHook()
+    if expected == "__fail__":
+        with pytest.raises(Exception, match='Please provide a bucket name using a valid format: "key.txt"'):
+            hook.do_something(**kwargs)
+    else:
+        assert list(hook.do_something(**kwargs)) == expected

Review Comment:
   Ideally, we should not have if/else branches doing different kind of assertions.





[GitHub] [airflow] potiuk commented on pull request #28710: Apply "unify bucket and key" before "provide bucket"

potiuk commented on PR #28710:
URL: https://github.com/apache/airflow/pull/28710#issuecomment-1372083302

   Glad to see we got to common conclusions even after some initial struggles in communication :) 
   
   BTW. 
   
   > I think writing tests is not about being DRY. It is much more important that they are readable and maintainable.
   
   Yep. 100% agree. DRY is important but for tests DAMP is **importanter** :D :D 
   
   https://enterprisecraftsmanship.com/posts/dry-damp-unit-tests
   




[GitHub] [airflow] dstandish commented on a diff in pull request #28710: Apply "unify bucket and key" before "provide bucket"

dstandish commented on code in PR #28710:
URL: https://github.com/apache/airflow/pull/28710#discussion_r1061153019


##########
airflow/providers/amazon/aws/exceptions.py:
##########
@@ -42,3 +44,7 @@ def __init__(self, failures: list, message: str):
 
     def __reduce__(self):
         return EcsOperatorError, (self.failures, self.message)
+
+
+class S3HookUriParseFailure(AirflowException):
+    """When parse_s3_url fails to parse URL, this error is thrown."""

Review Comment:
   it could, but i chose to do this because it was previously raising AirflowException... basically... backcompat... anything that was previously catching it will still catch.  if we change it, it won't
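
That compatibility argument can be sketched stand-alone (`AirflowException` is stubbed here so the snippet runs without Airflow installed):

```python
class AirflowException(Exception):
    """Stand-in for airflow.exceptions.AirflowException."""

class S3HookUriParseFailure(AirflowException):
    """When parse_s3_url fails to parse URL, this error is thrown."""

caught = False
try:
    raise S3HookUriParseFailure('Please provide a bucket name using a valid format: "key.txt"')
except AirflowException:  # pre-existing handlers like this keep working
    caught = True
assert caught

# Had the new class inherited plain Exception instead, this handler would
# no longer match and callers would have to update their except clauses.
```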





[GitHub] [airflow] dstandish commented on pull request #28710: Apply "unify bucket and key" before "provide bucket"

dstandish commented on PR #28710:
URL: https://github.com/apache/airflow/pull/28710#issuecomment-1371446257

   ok @o-nikolas and @feluelle i have updated these tests to move the "tokens" to individual params.  it's not quite "putting the test values in the tuple", which, if i understood correctly, is what yall were thinking of (and what i more or less strenuously object to doing, myself anyway) but hopefully it's "orthodox enough" to not raise anyone's hackles (i kid) while still being relatively compact and readable.  πŸ₯³ 




[GitHub] [airflow] dstandish commented on a diff in pull request #28710: Apply "unify bucket and key" before "provide bucket"

dstandish commented on code in PR #28710:
URL: https://github.com/apache/airflow/pull/28710#discussion_r1061865510


##########
tests/providers/amazon/aws/hooks/test_s3.py:
##########
@@ -825,3 +826,143 @@ def test_delete_bucket_tagging_with_no_tags(self):
 
         with pytest.raises(ClientError, match=r".*NoSuchTagSet.*"):
             hook.get_bucket_tagging(bucket_name="new_bucket")
+
+
+@patch("airflow.hooks.base.BaseHook.get_connection")
+@pytest.mark.parametrize(
+    "expected",
+    [
+        # full key
+        # no conn - no bucket - full key
+        param(["key_bucket", "key.txt"], id="unify-no_conn-no_bucket-full_key"),
+        param(["key_bucket", "key.txt"], id="provide-no_conn-no_bucket-full_key"),
+        # no conn - with bucket - full key
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="unify-no_conn-with_bucket-full_key"),
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="provide-no_conn-with_bucket-full_key"),
+        # with conn - no bucket - full key
+        param(["conn_bucket", "s3://key_bucket/key.txt"], id="provide-with_conn-no_bucket-full_key"),
+        param(["key_bucket", "key.txt"], id="unify-with_conn-no_bucket-full_key"),
+        # with conn - with bucket - full key
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="provide-with_conn-with_bucket-full_key"),
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="unify-with_conn-with_bucket-full_key"),
+        # rel key
+        # no conn - no bucket - rel key
+        param([None, "key.txt"], id="unify-no_conn-no_bucket-rel_key"),
+        param([None, "key.txt"], id="provide-no_conn-no_bucket-rel_key"),
+        # no conn - with bucket - rel key
+        param(["kwargs_bucket", "key.txt"], id="unify-no_conn-with_bucket-rel_key"),
+        param(["kwargs_bucket", "key.txt"], id="provide-no_conn-with_bucket-rel_key"),
+        # with conn - no bucket - rel key
+        param(["conn_bucket", "key.txt"], id="provide-with_conn-no_bucket-rel_key"),
+        param(["conn_bucket", "key.txt"], id="unify-with_conn-no_bucket-rel_key"),
+        # with conn - with bucket - rel key
+        param(["kwargs_bucket", "key.txt"], id="provide-with_conn-with_bucket-rel_key"),
+        param(["kwargs_bucket", "key.txt"], id="unify-with_conn-with_bucket-rel_key"),
+    ],
+)
+def test_unify_and_provide_bucket_name_combination(mock_base, expected, request, caplog):
+    """
+    Verify what is the outcome when the unify_bucket_name_and_key and provide_bucket_name
+    decorators are combined.
+    """
+    tokens = request.node.callspec.id.split("-")
+    assert len(tokens) == 4
+    if "with_conn" in tokens:
+        c = Connection(schema="conn_bucket")
+    else:
+        c = Connection(schema=None)
+    key = "key.txt" if "rel_key" in tokens else "s3://key_bucket/key.txt"
+    if "with_bucket" in tokens:
+        kwargs = {"bucket_name": "kwargs_bucket", "key": key}
+    else:
+        kwargs = {"key": key}
+
+    mock_base.return_value = c
+    if "unify" in tokens:  # unify to be processed before provide
+
+        class MyHook(S3Hook):
+            @unify_bucket_name_and_key
+            @provide_bucket_name
+            def do_something(self, bucket_name=None, key=None):
+                return bucket_name, key
+
+    else:
+
+        with caplog.at_level("WARNING"):
+
+            class MyHook(S3Hook):
+                @provide_bucket_name
+                @unify_bucket_name_and_key
+                def do_something(self, bucket_name=None, key=None):
+                    return bucket_name, key
+
+        assert caplog.records[0].message == "`unify_bucket_name_and_key` should wrap `provide_bucket_name`."
+    hook = MyHook()
+    if expected == "__fail__":

Review Comment:
   good catch, it's a leftover





[GitHub] [airflow] feluelle commented on pull request #28710: Apply "unify bucket and key" before "provide bucket"

feluelle commented on PR #28710:
URL: https://github.com/apache/airflow/pull/28710#issuecomment-1373638571

   > sure you could use param class and add an ID but that is a lot more noise.
   
   Not to me, but okay.
   
   Instead of
   ```python
   # full key
   # no conn - no bucket - full key
   (None, {"key": "s3://key_bucket/key.txt"}, ["key_bucket", "key.txt"]),
   ```
   you would do:
   ```python
   param(None, {"key": "s3://key_bucket/key.txt"}, ["key_bucket", "key.txt"], id="unify-no_conn-no_bucket-full_key"),
   ```
   
   and you get more descriptive output in pytest.
   
   > and it risks that the params differ from the description (id).
   
   Isn't this the same as with comments (which we have above each line)? And comments can also be valuable. You just need to make sure to update them accordingly.




[GitHub] [airflow] dstandish commented on a diff in pull request #28710: Apply "unify bucket and key" before "provide bucket"

dstandish commented on code in PR #28710:
URL: https://github.com/apache/airflow/pull/28710#discussion_r1061869329


##########
tests/providers/amazon/aws/hooks/test_s3.py:
##########
@@ -825,3 +826,143 @@ def test_delete_bucket_tagging_with_no_tags(self):
 
         with pytest.raises(ClientError, match=r".*NoSuchTagSet.*"):
             hook.get_bucket_tagging(bucket_name="new_bucket")
+
+
+@patch("airflow.hooks.base.BaseHook.get_connection")
+@pytest.mark.parametrize(
+    "expected",
+    [
+        # full key
+        # no conn - no bucket - full key
+        param(["key_bucket", "key.txt"], id="unify-no_conn-no_bucket-full_key"),
+        param(["key_bucket", "key.txt"], id="provide-no_conn-no_bucket-full_key"),
+        # no conn - with bucket - full key
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="unify-no_conn-with_bucket-full_key"),
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="provide-no_conn-with_bucket-full_key"),
+        # with conn - no bucket - full key
+        param(["conn_bucket", "s3://key_bucket/key.txt"], id="provide-with_conn-no_bucket-full_key"),
+        param(["key_bucket", "key.txt"], id="unify-with_conn-no_bucket-full_key"),
+        # with conn - with bucket - full key
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="provide-with_conn-with_bucket-full_key"),
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="unify-with_conn-with_bucket-full_key"),
+        # rel key
+        # no conn - no bucket - rel key
+        param([None, "key.txt"], id="unify-no_conn-no_bucket-rel_key"),
+        param([None, "key.txt"], id="provide-no_conn-no_bucket-rel_key"),
+        # no conn - with bucket - rel key
+        param(["kwargs_bucket", "key.txt"], id="unify-no_conn-with_bucket-rel_key"),
+        param(["kwargs_bucket", "key.txt"], id="provide-no_conn-with_bucket-rel_key"),
+        # with conn - no bucket - rel key
+        param(["conn_bucket", "key.txt"], id="provide-with_conn-no_bucket-rel_key"),
+        param(["conn_bucket", "key.txt"], id="unify-with_conn-no_bucket-rel_key"),
+        # with conn - with bucket - rel key
+        param(["kwargs_bucket", "key.txt"], id="provide-with_conn-with_bucket-rel_key"),
+        param(["kwargs_bucket", "key.txt"], id="unify-with_conn-with_bucket-rel_key"),
+    ],
+)
+def test_unify_and_provide_bucket_name_combination(mock_base, expected, request, caplog):
+    """
+    Verify what is the outcome when the unify_bucket_name_and_key and provide_bucket_name
+    decorators are combined.
+    """
+    tokens = request.node.callspec.id.split("-")
+    assert len(tokens) == 4
+    if "with_conn" in tokens:
+        c = Connection(schema="conn_bucket")
+    else:
+        c = Connection(schema=None)
+    key = "key.txt" if "rel_key" in tokens else "s3://key_bucket/key.txt"
+    if "with_bucket" in tokens:
+        kwargs = {"bucket_name": "kwargs_bucket", "key": key}
+    else:
+        kwargs = {"key": key}
+
+    mock_base.return_value = c
+    if "unify" in tokens:  # unify to be processed before provide

Review Comment:
   yes this was raised in previous pr.  it is unorthodox i concede.  but much more readable compared with the alternative IMO.  if you actually try to do it the "standard way" i think you might agree.





[GitHub] [airflow] dstandish commented on pull request #28710: Apply "unify bucket and key" before "provide bucket"

dstandish commented on PR #28710:
URL: https://github.com/apache/airflow/pull/28710#issuecomment-1372919043

   > Thanks @dstandish. I appreciate that. :)
   > What I actually meant is [729e275](https://github.com/apache/airflow/commit/729e27506d245002fb33078f0e1afbe30d84b6bb). Feel free to revert if you don't like it, but this is how I would do it.
   
   By all means, gold-plate this to your heart's content 😜
   
   Now if we can figure out why the tests are failing....
   
   Meanwhile I'm in the mountains and checked out for a few days 🌧️




[GitHub] [airflow] uranusjr commented on a diff in pull request #28710: Apply "unify bucket and key" before "provide bucket"

uranusjr commented on code in PR #28710:
URL: https://github.com/apache/airflow/pull/28710#discussion_r1061147800


##########
airflow/providers/amazon/aws/exceptions.py:
##########
@@ -42,3 +44,7 @@ def __init__(self, failures: list, message: str):
 
     def __reduce__(self):
         return EcsOperatorError, (self.failures, self.message)
+
+
+class S3HookUriParseFailure(AirflowException):
+    """When parse_s3_url fails to parse URL, this error is thrown."""

Review Comment:
   What if this inherits Exception instead of AirflowException?





[GitHub] [airflow] uranusjr commented on a diff in pull request #28710: Apply "unify bucket and key" before "provide bucket"

uranusjr commented on code in PR #28710:
URL: https://github.com/apache/airflow/pull/28710#discussion_r1061159268


##########
airflow/providers/amazon/aws/exceptions.py:
##########
@@ -42,3 +44,7 @@ def __init__(self, failures: list, message: str):
 
     def __reduce__(self):
         return EcsOperatorError, (self.failures, self.message)
+
+
+class S3HookUriParseFailure(AirflowException):
+    """When parse_s3_url fails to parse URL, this error is thrown."""

Review Comment:
   OK for compatibility I think it’s fine.





[GitHub] [airflow] o-nikolas commented on a diff in pull request #28710: Apply "unify bucket and key" before "provide bucket"

o-nikolas commented on code in PR #28710:
URL: https://github.com/apache/airflow/pull/28710#discussion_r1061955523


##########
tests/providers/amazon/aws/hooks/test_s3.py:
##########
@@ -884,66 +883,75 @@ def do_something(self, bucket_name=None, key=None):
 
     else:
 
-        class MyHook(S3Hook):
-            @provide_bucket_name
-            @unify_bucket_name_and_key
-            def do_something(self, bucket_name=None, key=None):
-                return bucket_name, key
+        with caplog.at_level("WARNING"):
 
+            class MyHook(S3Hook):
+                @provide_bucket_name
+                @unify_bucket_name_and_key
+                def do_something(self, bucket_name=None, key=None):
+                    return bucket_name, key
+
+        assert caplog.records[0].message == "`unify_bucket_name_and_key` should wrap `provide_bucket_name`."
     hook = MyHook()
-    if expected == "__fail__":
-        with pytest.raises(Exception, match='Please provide a bucket name using a valid format: "key.txt"'):
-            hook.do_something(**kwargs)
-    else:
-        assert list(hook.do_something(**kwargs)) == expected
+    assert list(hook.do_something(**kwargs)) == expected
 
 
 @pytest.mark.parametrize(
-    "expected",
+    "has_conn, has_bucket, key_kind, expected",
     [
         # full key
         # no conn - no bucket - full key
-        param(["key_bucket", "key.txt"], id="no_conn-no_bucket-full_key"),
+        ("no_conn", "no_bucket", "full_key", ["key_bucket", "key.txt"]),
         # no conn - with bucket - full key
-        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="no_conn-with_bucket-full_key"),
+        ("no_conn", "with_bucket", "full_key", ["kwargs_bucket", "s3://key_bucket/key.txt"]),
         # with conn - no bucket - full key
-        param(["conn_bucket", "s3://key_bucket/key.txt"], id="with_conn-no_bucket-full_key"),
+        ("with_conn", "no_bucket", "full_key", ["key_bucket", "key.txt"]),
         # with conn - with bucket - full key
-        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="with_conn-with_bucket-full_key"),
+        ("with_conn", "with_bucket", "full_key", ["kwargs_bucket", "s3://key_bucket/key.txt"]),
         # rel key
         # no conn - no bucket - rel key
-        param("__fail__", id="no_conn-no_bucket-rel_key"),
+        ("no_conn", "no_bucket", "rel_key", [None, "key.txt"]),
         # no conn - with bucket - rel key
-        param(["kwargs_bucket", "key.txt"], id="no_conn-with_bucket-rel_key"),
+        ("no_conn", "with_bucket", "rel_key", ["kwargs_bucket", "key.txt"]),
         # with conn - no bucket - rel key
-        param(["conn_bucket", "key.txt"], id="with_conn-no_bucket-rel_key"),
+        ("with_conn", "no_bucket", "rel_key", ["conn_bucket", "key.txt"]),
         # with conn - with bucket - rel key
-        param(["kwargs_bucket", "key.txt"], id="with_conn-with_bucket-rel_key"),
+        ("with_conn", "with_bucket", "rel_key", ["kwargs_bucket", "key.txt"]),
     ],
 )
 @patch("airflow.hooks.base.BaseHook.get_connection")
-def test_s3_head_object_decorated_behavior(mock_conn, request, expected):
-    tokens = request.node.callspec.id.split("-")
-    assert len(tokens) == 3
-    if "with_conn" in tokens:
+def test_s3_head_object_decorated_behavior(mock_conn, has_conn, has_bucket, key_kind, expected):
+    if has_conn == "with_conn":

Review Comment:
   Thanks for making these changes @dstandish! It's very much appreciated :pray: and is indeed what I was describing before. The only other improvement is that you can actually just make many of the params here simply booleans instead of strings. This would allow you to get rid of the string comparisons in the conditionals of the test body.
   
   But of course feel free to do this or not :)
   
   P.S. I tried, but unfortunately cannot, add a full suggested change for the above because the snippet includes deleted lines 927/928 and the GitHub UI does not allow that, and I didn't want to include a partial one so as to avoid confusion.
   





[GitHub] [airflow] dstandish merged pull request #28710: Apply "unify bucket and key" before "provide bucket"

dstandish merged PR #28710:
URL: https://github.com/apache/airflow/pull/28710




[GitHub] [airflow] dstandish commented on a diff in pull request #28710: Apply "unify bucket and key" before "provide bucket"

Posted by GitBox <gi...@apache.org>.
dstandish commented on code in PR #28710:
URL: https://github.com/apache/airflow/pull/28710#discussion_r1061868471


##########
tests/providers/amazon/aws/hooks/test_s3.py:
##########
@@ -825,3 +826,143 @@ def test_delete_bucket_tagging_with_no_tags(self):
 
         with pytest.raises(ClientError, match=r".*NoSuchTagSet.*"):
             hook.get_bucket_tagging(bucket_name="new_bucket")
+
+
+@patch("airflow.hooks.base.BaseHook.get_connection")
+@pytest.mark.parametrize(
+    "expected",
+    [
+        # full key
+        # no conn - no bucket - full key
+        param(["key_bucket", "key.txt"], id="unify-no_conn-no_bucket-full_key"),
+        param(["key_bucket", "key.txt"], id="provide-no_conn-no_bucket-full_key"),
+        # no conn - with bucket - full key
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="unify-no_conn-with_bucket-full_key"),
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="provide-no_conn-with_bucket-full_key"),
+        # with conn - no bucket - full key
+        param(["conn_bucket", "s3://key_bucket/key.txt"], id="provide-with_conn-no_bucket-full_key"),
+        param(["key_bucket", "key.txt"], id="unify-with_conn-no_bucket-full_key"),
+        # with conn - with bucket - full key
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="provide-with_conn-with_bucket-full_key"),
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="unify-with_conn-with_bucket-full_key"),
+        # rel key
+        # no conn - no bucket - rel key
+        param([None, "key.txt"], id="unify-no_conn-no_bucket-rel_key"),
+        param([None, "key.txt"], id="provide-no_conn-no_bucket-rel_key"),
+        # no conn - with bucket - rel key
+        param(["kwargs_bucket", "key.txt"], id="unify-no_conn-with_bucket-rel_key"),
+        param(["kwargs_bucket", "key.txt"], id="provide-no_conn-with_bucket-rel_key"),
+        # with conn - no bucket - rel key
+        param(["conn_bucket", "key.txt"], id="provide-with_conn-no_bucket-rel_key"),
+        param(["conn_bucket", "key.txt"], id="unify-with_conn-no_bucket-rel_key"),
+        # with conn - with bucket - rel key
+        param(["kwargs_bucket", "key.txt"], id="provide-with_conn-with_bucket-rel_key"),
+        param(["kwargs_bucket", "key.txt"], id="unify-with_conn-with_bucket-rel_key"),
+    ],
+)
+def test_unify_and_provide_bucket_name_combination(mock_base, expected, request, caplog):
+    """
+    Verify what is the outcome when the unify_bucket_name_and_key and provide_bucket_name
+    decorators are combined.
+    """
+    tokens = request.node.callspec.id.split("-")
+    assert len(tokens) == 4
+    if "with_conn" in tokens:
+        c = Connection(schema="conn_bucket")
+    else:
+        c = Connection(schema=None)
+    key = "key.txt" if "rel_key" in tokens else "s3://key_bucket/key.txt"
+    if "with_bucket" in tokens:
+        kwargs = {"bucket_name": "kwargs_bucket", "key": key}
+    else:
+        kwargs = {"key": key}
+
+    mock_base.return_value = c
+    if "unify" in tokens:  # unify to be processed before provide
+
+        class MyHook(S3Hook):
+            @unify_bucket_name_and_key
+            @provide_bucket_name
+            def do_something(self, bucket_name=None, key=None):
+                return bucket_name, key
+
+    else:
+
+        with caplog.at_level("WARNING"):
+
+            class MyHook(S3Hook):
+                @provide_bucket_name
+                @unify_bucket_name_and_key
+                def do_something(self, bucket_name=None, key=None):
+                    return bucket_name, key
+
+        assert caplog.records[0].message == "`unify_bucket_name_and_key` should wrap `provide_bucket_name`."
+    hook = MyHook()
+    if expected == "__fail__":
+        with pytest.raises(Exception, match='Please provide a bucket name using a valid format: "key.txt"'):
+            hook.do_something(**kwargs)
+    else:
+        assert list(hook.do_something(**kwargs)) == expected

Review Comment:
   you know, i think sometimes it's helpful to have conditional logic in parameterized tests: "when this, raise; when this, don't". i think it's not a terribly uncommon pattern. and in python, errors are raised, not returned, so it's harder to accomplish with parameterization alone. e.g. if it were golang, where errors are returned as values, maybe you'd just stick the expected error in the param tuple. but in python, when there's an exc, nothing is returned, so i think in practice you kind of need this kind of thing. but maybe you have a better way. in any case, with this change there are no failures, so i've removed this conditional.
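   One common pytest idiom for folding the raise/no-raise dimension into the params (a general pattern, not what this PR ships; `parse_key` below is a toy stand-in) is to pass the expected context manager itself as a parameter:

```python
import contextlib

import pytest


def parse_key(key: str) -> str:
    # Toy stand-in for the code under test.
    if not key.startswith("s3://"):
        raise ValueError(f"not a full s3 url: {key!r}")
    return key


@pytest.mark.parametrize(
    "key, expectation",
    [
        ("s3://bucket/key.txt", contextlib.nullcontext()),
        ("key.txt", pytest.raises(ValueError)),
    ],
)
def test_parse_key(key, expectation):
    # Entering the parametrized context manager covers both the raising and
    # the non-raising case without an if/else in the test body.
    with expectation:
        parse_key(key)
```

   This keeps the expectation in the param tuple, at the cost of a slightly less obvious test body.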





[GitHub] [airflow] feluelle commented on pull request #28710: Apply "unify bucket and key" before "provide bucket"

Posted by GitBox <gi...@apache.org>.
feluelle commented on PR #28710:
URL: https://github.com/apache/airflow/pull/28710#issuecomment-1372079164

   Thanks @dstandish. I appreciate that. :) 
   What I actually meant is 729e275. Feel free to revert if you don't like it, but this is how I would do it.
   
   I think writing tests is not about being DRY. It is much more important that they are readable and maintainable.




[GitHub] [airflow] dstandish commented on pull request #28710: Apply "unify bucket and key" before "provide bucket"

Posted by GitBox <gi...@apache.org>.
dstandish commented on PR #28710:
URL: https://github.com/apache/airflow/pull/28710#issuecomment-1374013804

   > Isn't this the same as with comments (we have above each line)? And comments also can be valuable. You just need to make sure to update them accordingly.
   
   yes absolutely it is, and this is a good reminder to remove them, which i've now done. i also rearranged things so that the truth table is built in the normal, sane way, and unify vs provide remain innermost (since those are what we are actually comparing here). now it's much easier to see that we have everything covered, and we no longer need the comments.
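   The truth-table ordering being described can be pictured with `itertools.product` (an illustration of the ordering only, not the actual test code): the last factor varies fastest, so putting unify-vs-provide last keeps it innermost.

```python
import itertools

# Every combination appears exactly once, in a predictable order, with the
# unify-vs-provide dimension (the thing actually being compared) innermost.
cases = list(
    itertools.product(
        ["no_conn", "with_conn"],
        ["no_bucket", "with_bucket"],
        ["rel_key", "full_key"],
        ["provide", "unify"],
    )
)
assert len(cases) == 16  # 2 * 2 * 2 * 2, so coverage is easy to verify
assert cases[0] == ("no_conn", "no_bucket", "rel_key", "provide")
assert cases[1] == ("no_conn", "no_bucket", "rel_key", "unify")
```

   With the table built this way, adjacent cases differ only in the decorator order, which is the comparison the test is making.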




[GitHub] [airflow] dstandish commented on a diff in pull request #28710: Apply "unify bucket and key" before "provide bucket"

Posted by GitBox <gi...@apache.org>.
dstandish commented on code in PR #28710:
URL: https://github.com/apache/airflow/pull/28710#discussion_r1061157842


##########
airflow/providers/amazon/aws/exceptions.py:
##########
@@ -42,3 +44,7 @@ def __init__(self, failures: list, message: str):
 
     def __reduce__(self):
         return EcsOperatorError, (self.failures, self.message)
+
+
+class S3HookUriParseFailure(AirflowException):
+    """When parse_s3_url fails to parse URL, this error is thrown."""

Review Comment:
   do you think this is right?





[GitHub] [airflow] feluelle commented on a diff in pull request #28710: Apply "unify bucket and key" before "provide bucket"

Posted by GitBox <gi...@apache.org>.
feluelle commented on code in PR #28710:
URL: https://github.com/apache/airflow/pull/28710#discussion_r1061466287


##########
tests/providers/amazon/aws/hooks/test_s3.py:
##########
@@ -825,3 +826,143 @@ def test_delete_bucket_tagging_with_no_tags(self):
 
         with pytest.raises(ClientError, match=r".*NoSuchTagSet.*"):
             hook.get_bucket_tagging(bucket_name="new_bucket")
+
+
+@patch("airflow.hooks.base.BaseHook.get_connection")
+@pytest.mark.parametrize(
+    "expected",
+    [
+        # full key
+        # no conn - no bucket - full key
+        param(["key_bucket", "key.txt"], id="unify-no_conn-no_bucket-full_key"),
+        param(["key_bucket", "key.txt"], id="provide-no_conn-no_bucket-full_key"),
+        # no conn - with bucket - full key
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="unify-no_conn-with_bucket-full_key"),
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="provide-no_conn-with_bucket-full_key"),
+        # with conn - no bucket - full key
+        param(["conn_bucket", "s3://key_bucket/key.txt"], id="provide-with_conn-no_bucket-full_key"),
+        param(["key_bucket", "key.txt"], id="unify-with_conn-no_bucket-full_key"),
+        # with conn - with bucket - full key
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="provide-with_conn-with_bucket-full_key"),
+        param(["kwargs_bucket", "s3://key_bucket/key.txt"], id="unify-with_conn-with_bucket-full_key"),
+        # rel key
+        # no conn - no bucket - rel key
+        param([None, "key.txt"], id="unify-no_conn-no_bucket-rel_key"),
+        param([None, "key.txt"], id="provide-no_conn-no_bucket-rel_key"),
+        # no conn - with bucket - rel key
+        param(["kwargs_bucket", "key.txt"], id="unify-no_conn-with_bucket-rel_key"),
+        param(["kwargs_bucket", "key.txt"], id="provide-no_conn-with_bucket-rel_key"),
+        # with conn - no bucket - rel key
+        param(["conn_bucket", "key.txt"], id="provide-with_conn-no_bucket-rel_key"),
+        param(["conn_bucket", "key.txt"], id="unify-with_conn-no_bucket-rel_key"),
+        # with conn - with bucket - rel key
+        param(["kwargs_bucket", "key.txt"], id="provide-with_conn-with_bucket-rel_key"),
+        param(["kwargs_bucket", "key.txt"], id="unify-with_conn-with_bucket-rel_key"),
+    ],
+)
+def test_unify_and_provide_bucket_name_combination(mock_base, expected, request, caplog):
+    """
+    Verify what is the outcome when the unify_bucket_name_and_key and provide_bucket_name
+    decorators are combined.
+    """
+    tokens = request.node.callspec.id.split("-")
+    assert len(tokens) == 4
+    if "with_conn" in tokens:
+        c = Connection(schema="conn_bucket")
+    else:
+        c = Connection(schema=None)
+    key = "key.txt" if "rel_key" in tokens else "s3://key_bucket/key.txt"
+    if "with_bucket" in tokens:
+        kwargs = {"bucket_name": "kwargs_bucket", "key": key}
+    else:
+        kwargs = {"key": key}
+
+    mock_base.return_value = c
+    if "unify" in tokens:  # unify to be processed before provide
+
+        class MyHook(S3Hook):
+            @unify_bucket_name_and_key
+            @provide_bucket_name
+            def do_something(self, bucket_name=None, key=None):
+                return bucket_name, key
+
+    else:
+
+        with caplog.at_level("WARNING"):
+
+            class MyHook(S3Hook):
+                @provide_bucket_name
+                @unify_bucket_name_and_key
+                def do_something(self, bucket_name=None, key=None):
+                    return bucket_name, key
+
+        assert caplog.records[0].message == "`unify_bucket_name_and_key` should wrap `provide_bucket_name`."
+    hook = MyHook()
+    if expected == "__fail__":

Review Comment:
   Where is this defined? πŸ€” 





[GitHub] [airflow] o-nikolas commented on pull request #28710: Apply "unify bucket and key" before "provide bucket"

Posted by GitBox <gi...@apache.org>.
o-nikolas commented on PR #28710:
URL: https://github.com/apache/airflow/pull/28710#issuecomment-1371533722

   > ok @o-nikolas and @feluelle i have updated these tests to move the "tokens" to individual params. it's not quite "putting the test values in the tuple", which, if i understood correctly, is what yall were thinking of (and what i more or less strenuously object to doing, myself anyway)
   
   Thanks @dstandish! And this was actually exactly what I was intending in the previous CR, apologies if we miscommunicated there :pray:
   I only left one comment for a way to optimize it further (using booleans as the param values instead of strings for everything but `expected`). Feel free to do so or not :smiley:  
   
   

