You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/03/10 18:12:58 UTC

[GitHub] [airflow] ngaranko opened a new pull request #14701: HttpHook. Use request factory and respect defaults

ngaranko opened a new pull request #14701:
URL: https://github.com/apache/airflow/pull/14701


   Use Request's `session.request` factory for HTTP request initiation, this will use environment variables and sensible defaults for requests.
   Also use `verify` option only if it is provided to `run` method, as requests library already defaults to `True`.
   
   ---
   
   Backstory for this PR:
    Our organization uses firewalls and custom SSL certificates to communicate between systems, this can be achieved via `CURL_CA_BUNDLE` and `REQUESTS_CA_BUNDLE` environment variables.
   Requests library takes both into account and uses them as default value for `verify` option when sending request to remote system.
   
    Current implementation is setting `verify` to True, which overwrites defaults and as results requests can not be made due to SSL verification issues. This PR is fixing the problem.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ngaranko commented on pull request #14701: HttpHook. Use request factory and respect defaults

Posted by GitBox <gi...@apache.org>.
ngaranko commented on pull request #14701:
URL: https://github.com/apache/airflow/pull/14701#issuecomment-825006141


   @ashb Corrected code and added explanation to why new empty method is needed inside `FakeSession`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] marcosmarxm commented on pull request #14701: HttpHook. Use request factory and respect defaults

Posted by GitBox <gi...@apache.org>.
marcosmarxm commented on pull request #14701:
URL: https://github.com/apache/airflow/pull/14701#issuecomment-799582362


   @ngaranko can you take a look in the failed tests? All providers using HttpHook are failing.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #14701: HttpHook. Use request factory and respect defaults

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #14701:
URL: https://github.com/apache/airflow/pull/14701#issuecomment-827652670


   The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest master or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] gkoller commented on pull request #14701: HttpHook. Use request factory and respect defaults

Posted by GitBox <gi...@apache.org>.
gkoller commented on pull request #14701:
URL: https://github.com/apache/airflow/pull/14701#issuecomment-795955198


   The "why" is often omitted in our commit messages. Making it hard to determine why certain things were done. Your "backstory to this PR" perfectly documents the "why" hence should definitely be included in the commit message.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ngaranko commented on a change in pull request #14701: HttpHook. Use request factory and respect defaults

Posted by GitBox <gi...@apache.org>.
ngaranko commented on a change in pull request #14701:
URL: https://github.com/apache/airflow/pull/14701#discussion_r592324754



##########
File path: airflow/providers/http/hooks/http.py
##########
@@ -126,15 +126,20 @@ def run(
         else:
             url = (self.base_url or '') + (endpoint or '')
 
+        request_parameters = dict(method=self.method, url=url, headers=headers)
+        request_parameters.update(request_kwargs)
+
         if self.method == 'GET':
             # GET uses params
-            req = requests.Request(self.method, url, params=data, headers=headers, **request_kwargs)
+            request_parameters["params"] = data
         elif self.method == 'HEAD':
             # HEAD doesn't use params
-            req = requests.Request(self.method, url, headers=headers, **request_kwargs)
+            pass

Review comment:
       HEAD requests are still made, it's just they do no require data to be added into  payload, hence the `pass`.
   
   Request instance is created later via `session.request` factory.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ngaranko commented on a change in pull request #14701: HttpHook. Use request factory and respect defaults

Posted by GitBox <gi...@apache.org>.
ngaranko commented on a change in pull request #14701:
URL: https://github.com/apache/airflow/pull/14701#discussion_r618571663



##########
File path: airflow/providers/http/hooks/http.py
##########
@@ -126,15 +126,20 @@ def run(
         else:
             url = (self.base_url or '') + (endpoint or '')
 
+        request_parameters = dict(method=self.method.upper(), url=url, headers=headers)
+        request_parameters.update(request_kwargs)
+
         if self.method == 'GET':
             # GET uses params
-            req = requests.Request(self.method, url, params=data, headers=headers, **request_kwargs)
+            request_parameters["params"] = data
         elif self.method == 'HEAD':
             # HEAD doesn't use params
-            req = requests.Request(self.method, url, headers=headers, **request_kwargs)
+            pass
         else:
             # Others use data
-            req = requests.Request(self.method, url, data=data, headers=headers, **request_kwargs)
+            request_parameters["data"] = data
+
+        req = requests.Request(**request_parameters)

Review comment:
       Thanks, that was garbage after latest changes. Removed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #14701: HttpHook. Use request factory and respect defaults

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #14701:
URL: https://github.com/apache/airflow/pull/14701#discussion_r621264285



##########
File path: tests/providers/http/sensors/test_http.py
##########
@@ -201,6 +201,9 @@ def prepare_request(self, request):
             self.response._content += ('/' + request.params['date']).encode('ascii', 'ignore')
         return self.response
 
+    def merge_environment_settings(self, _url, **kwargs):
+        return kwargs
+

Review comment:
       Oh this in test code! I missed that first time round :D 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #14701: HttpHook. Use request factory and respect defaults

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #14701:
URL: https://github.com/apache/airflow/pull/14701#issuecomment-831986000


   Re-triggering CI.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] marcosmarxm commented on a change in pull request #14701: HttpHook. Use request factory and respect defaults

Posted by GitBox <gi...@apache.org>.
marcosmarxm commented on a change in pull request #14701:
URL: https://github.com/apache/airflow/pull/14701#discussion_r594507916



##########
File path: airflow/providers/http/hooks/http.py
##########
@@ -176,15 +181,20 @@ def run_and_check(
         """
         extra_options = extra_options or {}
 
+        request_options = {}
+        if "verify" in extra_options:
+            # Overwrite verify only if it is needed
+            request_options["verify"] = extra_options["verify"]
+
         try:
             response = session.send(
                 prepped_request,
                 stream=extra_options.get("stream", False),
-                verify=extra_options.get("verify", True),
                 proxies=extra_options.get("proxies", {}),
                 cert=extra_options.get("cert"),
                 timeout=extra_options.get("timeout"),
                 allow_redirects=extra_options.get("allow_redirects", True),
+                **request_options,

Review comment:
       The default value in the requests Session library to **verify** is `True` and you can change the value in your **hook.run** execution to `False` using the `extra_options={"verify": False}`. If you dont sent this parameter from Airflow in the `send` function request will assume `True` anyway... right?
   
   https://requests.readthedocs.io/en/master/_modules/requests/sessions/#Session.send




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ngaranko commented on a change in pull request #14701: HttpHook. Use request factory and respect defaults

Posted by GitBox <gi...@apache.org>.
ngaranko commented on a change in pull request #14701:
URL: https://github.com/apache/airflow/pull/14701#discussion_r594529324



##########
File path: airflow/providers/http/hooks/http.py
##########
@@ -176,15 +181,20 @@ def run_and_check(
         """
         extra_options = extra_options or {}
 
+        request_options = {}
+        if "verify" in extra_options:
+            # Overwrite verify only if it is needed
+            request_options["verify"] = extra_options["verify"]
+
         try:
             response = session.send(
                 prepped_request,
                 stream=extra_options.get("stream", False),
-                verify=extra_options.get("verify", True),
                 proxies=extra_options.get("proxies", {}),
                 cert=extra_options.get("cert"),
                 timeout=extra_options.get("timeout"),
                 allow_redirects=extra_options.get("allow_redirects", True),
+                **request_options,

Review comment:
       Hey @marcosmarxm thanks for looking into this!
   
   Default `verify=True` is correct only for convenience methods of requests library (e.g. `requests.get`, `requests.post` etc), as they are using session factory and building correct `session.verify` value inside `Session.request`:
    https://requests.readthedocs.io/en/master/_modules/requests/sessions/#Session.merge_environment_settings
   
   I've updated hook and replaced `requests.Request` with `session.request` as it does more and behaves more like expected from `requests.post`.
   
   Issue of `session.send` is that if we say `session.send(..., verify=True)` it will overwrite whatever is set by `merge_environment_settings` and will forcefully set `verify=True` (thanks to `kwargs.setdefault('verify', self.verify)` in `Session.send` code) and this breaks SSL certificate configuration provided by OS, same goes for `verify=False`.
   
   > you can change the value in your hook.run
   
   I can not, as we're not using `hook.run` directly, rather via other hooks, such as slack hook etc.
   
   Hope my comment clears out reasoning behind this change.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ngaranko commented on a change in pull request #14701: HttpHook. Use request factory and respect defaults

Posted by GitBox <gi...@apache.org>.
ngaranko commented on a change in pull request #14701:
URL: https://github.com/apache/airflow/pull/14701#discussion_r618551768



##########
File path: tests/providers/http/sensors/test_http.py
##########
@@ -201,6 +201,9 @@ def prepare_request(self, request):
             self.response._content += ('/' + request.params['date']).encode('ascii', 'ignore')
         return self.response
 
+    def merge_environment_settings(self, _url, **kwargs):
+        return kwargs
+

Review comment:
       This is test mock for `requests.Session.merge_environment_settings` code, which is called in https://github.com/apache/airflow/pull/14701/files#diff-b644517c333b1a654b85edd76607b296990c14f203b8521091458a46fa228359R184
   
   It is needed to make tests work with `FakeSession` class instead of regular session.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on a change in pull request #14701: HttpHook. Use request factory and respect defaults

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #14701:
URL: https://github.com/apache/airflow/pull/14701#discussion_r592277937



##########
File path: airflow/providers/http/hooks/http.py
##########
@@ -126,15 +126,20 @@ def run(
         else:
             url = (self.base_url or '') + (endpoint or '')
 
+        request_parameters = dict(method=self.method, url=url, headers=headers)
+        request_parameters.update(request_kwargs)
+
         if self.method == 'GET':
             # GET uses params
-            req = requests.Request(self.method, url, params=data, headers=headers, **request_kwargs)
+            request_parameters["params"] = data
         elif self.method == 'HEAD':
             # HEAD doesn't use params
-            req = requests.Request(self.method, url, headers=headers, **request_kwargs)
+            pass

Review comment:
       Head requests can be used to get the content length of a download without actually downloading the data. I'm thinking we should not have `pass` here, what do you think?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb merged pull request #14701: HttpHook. Use request factory and respect defaults

Posted by GitBox <gi...@apache.org>.
ashb merged pull request #14701:
URL: https://github.com/apache/airflow/pull/14701


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ngaranko commented on pull request #14701: HttpHook. Use request factory and respect defaults

Posted by GitBox <gi...@apache.org>.
ngaranko commented on pull request #14701:
URL: https://github.com/apache/airflow/pull/14701#issuecomment-799677049


   @marcosmarxm fixed tests and updated code. Also added extra tests to illustrate issue.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] gkoller edited a comment on pull request #14701: HttpHook. Use request factory and respect defaults

Posted by GitBox <gi...@apache.org>.
gkoller edited a comment on pull request #14701:
URL: https://github.com/apache/airflow/pull/14701#issuecomment-795955198


   Your "backstory to this PR" perfectly documents the "why" hence should definitely be included in the commit message.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on pull request #14701: HttpHook. Use request factory and respect defaults

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on pull request #14701:
URL: https://github.com/apache/airflow/pull/14701#issuecomment-832071797


   Awesome work, congrats on your first merged pull request!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #14701: HttpHook. Use request factory and respect defaults

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #14701:
URL: https://github.com/apache/airflow/pull/14701#discussion_r618538491



##########
File path: tests/providers/http/sensors/test_http.py
##########
@@ -201,6 +201,9 @@ def prepare_request(self, request):
             self.response._content += ('/' + request.params['date']).encode('ascii', 'ignore')
         return self.response
 
+    def merge_environment_settings(self, _url, **kwargs):
+        return kwargs
+

Review comment:
       What's this for?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb closed pull request #14701: HttpHook. Use request factory and respect defaults

Posted by GitBox <gi...@apache.org>.
ashb closed pull request #14701:
URL: https://github.com/apache/airflow/pull/14701


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #14701: HttpHook. Use request factory and respect defaults

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #14701:
URL: https://github.com/apache/airflow/pull/14701#discussion_r618537627



##########
File path: airflow/providers/http/hooks/http.py
##########
@@ -126,15 +126,20 @@ def run(
         else:
             url = (self.base_url or '') + (endpoint or '')
 
+        request_parameters = dict(method=self.method.upper(), url=url, headers=headers)
+        request_parameters.update(request_kwargs)
+
         if self.method == 'GET':
             # GET uses params
-            req = requests.Request(self.method, url, params=data, headers=headers, **request_kwargs)
+            request_parameters["params"] = data
         elif self.method == 'HEAD':
             # HEAD doesn't use params
-            req = requests.Request(self.method, url, headers=headers, **request_kwargs)
+            pass
         else:
             # Others use data
-            req = requests.Request(self.method, url, data=data, headers=headers, **request_kwargs)
+            request_parameters["data"] = data
+
+        req = requests.Request(**request_parameters)

Review comment:
       This block appears to change the code to have no difference in behaviour, unless I've missed something
   
   Can you remove hunk please?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org