You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/09 00:29:01 UTC

[GitHub] [airflow] kubatyszko opened a new pull request #10256: fixes http hook using schema field from airflow.models.connection.Con…

kubatyszko opened a new pull request #10256:
URL: https://github.com/apache/airflow/pull/10256


   …nection, this one parses schema field from parts AFTER the / , while the connection schema is returned within conn_type field. This PR modifies http hook to use conn.conn_type instead of conn.schema
   
   closes: #10255
   related: #ISSUE
   
   
   What happened:
   
   HTTP hook defaults to http connection schema, and I could never get it to accurately use the https that I had set in the connection URI.
   This may only happen using secrets backend, since the connections set in airflow UI have their fields mapped directly.
   
   What you expected to happen: https://host.com should yield connection schema to be https
   
   How to reproduce it:
   Pull airflow from master, create a sample DAG using http hook, with connection being set to https, with the connection string pulled from some secrets manager (in my case I used AWS secrets manager)
   
   Anything else we need to know:
   This issue seems to be limited to using secrets backend for connection information, since connections configured via airflow UI have their "schema" field mapped directly.
   
   More information
   
           if self.http_conn_id:
               conn = self.get_connection(self.http_conn_id)
               if conn.host and "://" in conn.host:
                   self.base_url = conn.host
               else:
                   # schema defaults to HTTP
   ****                schema = conn.schema if conn.schema else "http"****
   ****                schema = conn.conn_type if conn.conn_type else "http"****
                   host = conn.host if conn.host else ""
                   self.base_url = schema + "://" + host
   
   code snippet from airflow.models.connection with highlights:
   
       def _parse_from_uri(self, uri: str):
           uri_parts = urlparse(uri)
           conn_type = uri_parts.scheme
           if conn_type == 'postgresql':
               conn_type = 'postgres'
           elif '-' in conn_type:
               conn_type = conn_type.replace('-', '_')
   ****        self.conn_type = conn_type ****
   ****        self.host = _parse_netloc_to_hostname(uri_parts) ****
   ****        quoted_schema = uri_parts.path[1:]
   ****        self.schema = unquote(quoted_schema) if quoted_schema else quoted_schema ****
           self.login = unquote(uri_parts.username) \
   Quick verification of my approach:
   
   urlparse("https://host.com:443")
   ParseResult(scheme='https', netloc='host.com:443', path='', params='', query='', fragment='')
   >>>urlparse("https://host.com:443/r").path[1:].
   'r'
   
   
   
   Kuba Tyszko, FiniteState.
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj edited a comment on pull request #10256: fixes http hook using schema field from airflow.models.connection.Con…

Posted by GitBox <gi...@apache.org>.
mik-laj edited a comment on pull request #10256:
URL: https://github.com/apache/airflow/pull/10256#issuecomment-670990333


   Can you add unit tests to avoid regression?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kubatyszko commented on pull request #10256: fixes http hook using schema field from airflow.models.connection.Con…

Posted by GitBox <gi...@apache.org>.
kubatyszko commented on pull request #10256:
URL: https://github.com/apache/airflow/pull/10256#issuecomment-671083935


   I'm reconsider my solution to the issue.
   I think the conn_type always needs to be "http" - meaning protocol, while the scheme is supposed to reflect http or https.
   With connection stored as a secret in AWS, and a url like "https://foo.bar/SCHEME" - the scheme would have been populated from the SCHEME portion, which is unintuitive and doesn't reflect the URI format at all.
   Going to investigate further whether it's something that needs fixing in the http hook or secrets backend.
   
   Also, with my change, this test is failing now:
   
   ```
   ______________________________________________________ TestHttpHook.test_https_connection ______________________________________________________
   
   self = <tests.providers.http.hooks.test_http.TestHttpHook testMethod=test_https_connection>
   mock_get_connection = <MagicMock name='get_connection' id='140430833565936'>
   
       @mock.patch('airflow.providers.http.hooks.http.HttpHook.get_connection')
       def test_https_connection(self, mock_get_connection):
           conn = Connection(conn_id='http_default', conn_type='http',
                             host='localhost', schema='https')
           mock_get_connection.return_value = conn
           hook = HttpHook()
           hook.get_conn({})
   >       self.assertEqual(hook.base_url, 'https://localhost')
   E       AssertionError: 'http://localhost' != 'https://localhost'
   E       - http://localhost
   E       + https://localhost
   E       ?     +
   
   test_http.py:305: AssertionError
   ```
   (it succeeds without my change).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on pull request #10256: fixes http hook using schema field from airflow.models.connection.Con…

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on pull request #10256:
URL: https://github.com/apache/airflow/pull/10256#issuecomment-670988577


   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, pylint and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/master/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/master/docs/howto/custom-operator.rst) Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/master/BREEZE.rst) for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better 🚀.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://apache-airflow-slack.herokuapp.com/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #10256: fixes http hook using schema field from airflow.models.connection.Con…

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #10256:
URL: https://github.com/apache/airflow/pull/10256#issuecomment-671114988


   > And yes, that prior issue seems related, but it was closed (seems to have been resolved), and yet it did not work for me...
   
   This ticket has been closed because it was migrated to Github, but the comments made there are valuable and there is also a description of how to save the HTTP connection configuration as a URI.
   https://github.com/apache/airflow/issues/7881


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj edited a comment on pull request #10256: fixes http hook using schema field from airflow.models.connection.Con…

Posted by GitBox <gi...@apache.org>.
mik-laj edited a comment on pull request #10256:
URL: https://github.com/apache/airflow/pull/10256#issuecomment-671086076


   @kubatyszko  Can you give the value you have in the Secret backet? Do you realize that in the backend secret you should store the connection representation as a URI, not a URL?
   
   Have you tried to create a connection using the Web UI and then view the URI representations of the Connection using the CLI? 
   https://airflow.readthedocs.io/en/latest/howto/connection/index.html#connection-uri-format
   
   https://issues.apache.org/jira/browse/AIRFLOW-2910 Is it related?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #10256: fixes http hook using schema field from airflow.models.connection.Con…

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #10256:
URL: https://github.com/apache/airflow/pull/10256#issuecomment-671086076


   @kubatyszko  Can you give the value you have in the Secret backet? 
   
   Have you tried to create a connection using the Web UI and then view the URI representations of the Connection using the CLI?
   https://airflow.readthedocs.io/en/latest/howto/connection/index.html#connection-uri-format
   
   https://issues.apache.org/jira/browse/AIRFLOW-2910 Is it related?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kubatyszko commented on pull request #10256: fixes http hook using schema field from airflow.models.connection.Con…

Posted by GitBox <gi...@apache.org>.
kubatyszko commented on pull request #10256:
URL: https://github.com/apache/airflow/pull/10256#issuecomment-671111509


   In my backend, I simply have: https://domain.com:443 (the 443 doesn't matter here).
   For the http hook to parse it correctly using airflow.models.connection, I would have to set it to: http://domain.com:443/https - where https would be interpreted as the schema field, not path element (path passed via URL is actually grabbed using httpoperator's endpoint field).
   
   The schema field in URI specs you mentioned, make sense for databases, where you could specify it as say mysql://host.com:3636/dbschema, but for HTTP it doesn't.
   
   As far as HTTP hook goes, the leading "http" or "https" should be parsed both as schema and conn_type, where in conn_type it always replaces https with http. but schema should contain the true protocol to use to reconstruct the URL passed to the server.
   
   And yes, that prior issue seems related, but it was closed (seems to have been resolved), and yet it did not work for me...


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kubatyszko commented on pull request #10256: fixes http hook using schema field from airflow.models.connection.Con…

Posted by GitBox <gi...@apache.org>.
kubatyszko commented on pull request #10256:
URL: https://github.com/apache/airflow/pull/10256#issuecomment-671153598


   Yeah, I just concluded the same:
   
   ```
   >>> c = Connection(uri='https://server.com:443/foo/bar')
   >>> print(c.schema)
   foo/bar
   >>> print(c.conn_type)
   https
   >>> print(c.get_uri())
   https://server.com:443/foo%2Fbar
   
   
   root@41009fd10af0:/opt/airflow# airflow connections get http_default
   [2020-08-10 04:06:27,818] {base_hook.py:68} INFO - Using connection to: id: http_default. Host: https://www.httpbin.org/, Port: None, Schema: None, Login: None, Password: None, extra: None
   Conn ID: http_default
   Conn Type: http
   Extra: {}
   Host: https://www.httpbin.org/
   Is Encrypted: false
   Is Extra Encrypted: false
   Port: null
   URI: http://https%3A%2F%2Fwww.httpbin.org%2F
   
   root@41009fd10af0:/opt/airflow#
   ```
   
   Even though I can make it work this way, it's extremely unintuitive, and does not match ANY URI representation to my knowledge, I checked. URI would be basically exactly the same as URL.
   
   Thanks
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kubatyszko commented on pull request #10256: fixes http hook using schema field from airflow.models.connection.Con…

Posted by GitBox <gi...@apache.org>.
kubatyszko commented on pull request #10256:
URL: https://github.com/apache/airflow/pull/10256#issuecomment-670992363


   > Can you add unit tests to avoid regression?
   
   I'll try to look at the tests soon, I'd like to wait for the CI runs to complete, it seems that many unrelated ones are failing.
   Also, since my issue is isolated to only secrets backends, and there ARE tests provided, this must mean that the tests are either flawed or incomplete.
   The postgres backend for connection/variables etc is working fine an my code doesn't touch that.
   I may need some guidance with local development and test suite.
   
   Thanks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #10256: fixes http hook using schema field from airflow.models.connection.Con…

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #10256:
URL: https://github.com/apache/airflow/pull/10256#issuecomment-686822977


   @kubatyszko Do you need help writing documentation about your findings? I am very happy to help with the review.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj edited a comment on pull request #10256: fixes http hook using schema field from airflow.models.connection.Con…

Posted by GitBox <gi...@apache.org>.
mik-laj edited a comment on pull request #10256:
URL: https://github.com/apache/airflow/pull/10256#issuecomment-671114988


   > In my backend, I simply have: https://domain.com:443 (the 443 doesn't matter here).
   
   This does not look like a valid connection configuration URI although it does appear as a valid URL, but not a valid connection configuration.
   ```console
   $ AIRFLOW_CONN_A='https://domain.com:443' airflow connections get A
   Conn ID: A
   Conn Type: https
   Extra: {}
   Host: domain.com
   Is Encrypted: null
   Is Extra Encrypted: null
   Port: 443
   URI: https://domain.com:443
   ```
   In your case, this connection can be expressed using a URI: 
   a) ``http://https%3A%2F%2Fdomain.com:443``
   ```console
   $ AIRFLOW_CONN_A='http://domain.com:443/https' airflow connections get A
   Conn Id: A
   Conn Type: http
   Extra: {}
   Host: domain.com
   Id: null
   Is Encrypted: null
   Is Extra Encrypted: null
   Login: null
   Password: null
   Port: 443
   Schema: https
   URI: http://domain.com:443/https
   ````
   or
   b) ``http://https%3A%2F%2Fdomain.com:443``
   ```
   AIRFLOW_CONN_A='http://https%3A%2F%2Fdomain.com:443' airflow connections get A
   Schema: , Login: None, Password: None, extra: None
   Conn Id: A
   Conn Type: http
   Extra: {}
   Host: https://domain.com
   Id: null
   Is Encrypted: null
   Is Extra Encrypted: null
   Login: null
   Password: null
   Port: 443
   Schema: ''
   URI: http://https%3A%2F%2Fdomain.com:443
   ```
   I agree that this is not what users can expect, but we have a limited ability to change it in a backward-compatible manner.
   
   
   > And yes, that prior issue seems related, but it was closed (seems to have been resolved), and yet it did not work for me...
   
   This ticket has been closed because it was migrated to Github, but the comments made there are valuable and there is also a description of how to save the HTTP connection configuration as a URI.
   https://github.com/apache/airflow/issues/7881


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kubatyszko commented on pull request #10256: fixes http hook using schema field from airflow.models.connection.Con…

Posted by GitBox <gi...@apache.org>.
kubatyszko commented on pull request #10256:
URL: https://github.com/apache/airflow/pull/10256#issuecomment-671172136


   Yes, I am able to use http hooks without any modifications, even though I'm really not happy with having to use ```http://https%3A%2F%2F``` (and it really must be escaped, the https part cannot be https://).
   I'll look at the docs and add my findings.
   Keep an eye out for a PR to the openfaas provider, I have an "invoke_async" function coming soon.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #10256: fixes http hook using schema field from airflow.models.connection.Con…

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #10256:
URL: https://github.com/apache/airflow/pull/10256#issuecomment-670990333


   Can you add system tests to avoid regression?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj edited a comment on pull request #10256: fixes http hook using schema field from airflow.models.connection.Con…

Posted by GitBox <gi...@apache.org>.
mik-laj edited a comment on pull request #10256:
URL: https://github.com/apache/airflow/pull/10256#issuecomment-671114988


   > In my backend, I simply have: https://domain.com:443 (the 443 doesn't matter here).
   
   This does not look like a valid connection configuration URI although it does appear as a valid URL, but not a valid connection configuration.
   ```console
   $ AIRFLOW_CONN_A='https://domain.com:443' airflow connections get A
   Conn ID: A
   Conn Type: https
   Extra: {}
   Host: domain.com
   Is Encrypted: null
   Is Extra Encrypted: null
   Port: 443
   URI: https://domain.com:443
   ```
   In your case, this connection can be expressed using a URI: 
   a) ``http://https%3A%2F%2Fdomain.com:443``
   ```console
   $ AIRFLOW_CONN_A='http://domain.com:443/https' airflow connections get A
   Conn Id: A
   Conn Type: http
   Extra: {}
   Host: domain.com
   Id: null
   Is Encrypted: null
   Is Extra Encrypted: null
   Login: null
   Password: null
   Port: 443
   Schema: https
   URI: http://domain.com:443/https
   ````
   or
   b) ``http://https%3A%2F%2Fdomain.com:443``
   ```
   $ AIRFLOW_CONN_A='http://https%3A%2F%2Fdomain.com:443' airflow connections get A
   Schema: , Login: None, Password: None, extra: None
   Conn Id: A
   Conn Type: http
   Extra: {}
   Host: https://domain.com
   Id: null
   Is Encrypted: null
   Is Extra Encrypted: null
   Login: null
   Password: null
   Port: 443
   Schema: ''
   URI: http://https%3A%2F%2Fdomain.com:443
   ```
   I agree that this is not what users can expect, but we have a limited ability to change it in a backward-compatible manner.
   
   
   > And yes, that prior issue seems related, but it was closed (seems to have been resolved), and yet it did not work for me...
   
   This ticket has been closed because it was migrated to Github, but the comments made there are valuable and there is also a description of how to save the HTTP connection configuration as a URI.
   https://github.com/apache/airflow/issues/7881


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #10256: fixes http hook using schema field from airflow.models.connection.Con…

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #10256:
URL: https://github.com/apache/airflow/pull/10256#issuecomment-671168474


   Hope your problem is solved now.  Can you describe our findings in the documentation?  I will be happy to help with the review of the change and its merging.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] stale[bot] closed pull request #10256: fixes http hook using schema field from airflow.models.connection.Con…

Posted by GitBox <gi...@apache.org>.
stale[bot] closed pull request #10256:
URL: https://github.com/apache/airflow/pull/10256


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #10256: fixes http hook using schema field from airflow.models.connection.Con…

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #10256:
URL: https://github.com/apache/airflow/pull/10256#issuecomment-671040183


   Happy to help with the System Tests as well if needed -> just ping us in  "#system-tests"  channel in Airflow's slack. The plans are that we fully automate system tests execution before we release 2.0.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org