Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/06/21 01:41:17 UTC

[GitHub] [airflow] dwreeves opened a new issue, #24572: Snowflake Provider connection documentation is misleading

dwreeves opened a new issue, #24572:
URL: https://github.com/apache/airflow/issues/24572

   ### What do you see as an issue?
   
   Relevant page: https://airflow.apache.org/docs/apache-airflow-providers-snowflake/stable/connections/snowflake.html
   
   ## Behavior in the Airflow package
   
   The `SnowflakeHook` object in Airflow behaves oddly compared to some other database hooks like Postgres (so extra clarity in the documentation is beneficial).
   
   Most notably, the `SnowflakeHook` does _not_ make use of either the `host` or the `port` of the `Connection` object it consumes. It is completely pointless to specify these two fields.
   
   When constructing the URL in a runtime context, `snowflake.sqlalchemy.URL` is used for parsing. `URL()` allows for either `account` or `host` to be specified as kwargs. Either one of these 2 kwargs will correspond with what we'd conventionally call the host in a typical URL's anatomy. However, because `SnowflakeHook` never parses `host`, any `host` defined in the Connection object would never get this far into the parsing.
   
   ## Issue with the documentation
   
   Right now the documentation does not make clear that it is completely pointless to specify the `host`. The documentation correctly omits the port, but says that the host is optional. It does not warn the user about this field never being consumed at all by the `SnowflakeHook` ([source here](https://github.com/apache/airflow/blob/main/airflow/providers/snowflake/hooks/snowflake.py)).
   
   This can lead to some confusion, especially because the Snowflake URI consumed by `SQLAlchemy` (which many people using Snowflake will be familiar with) uses either the "account" or the "host" as its host. So a user coming from SQLAlchemy may think it is fine to put the account in the "host" field and skip filling in the "account" inside the extras (after all, it's "extra"), but that doesn't work.
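
To make the difference concrete, here is a minimal sketch using only the Python standard library. It is not Airflow's own parser, and `split_connection_uri` is a hypothetical helper written for this illustration. It shows where the account ends up for a SQLAlchemy-style URI (account in the host position) versus a URI with a blank host and the account passed as a query parameter. Since `SnowflakeHook` never reads `host`, only the second form puts the account where the hook can see it.

```python
from urllib.parse import parse_qs, urlsplit


def split_connection_uri(uri):
    """Return the host slot and the query parameters of a URI.

    Hypothetical helper for illustration only; Airflow's Connection
    class does its own parsing.
    """
    parts = urlsplit(uri)
    # parse_qs returns lists of values; keep the first value per key.
    extras = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return parts.hostname, extras


# SQLAlchemy-style habit: the account sits in the host position.
host_style = "snowflake://user:password@my-account/db-schema?warehouse=snow-warehouse"
# Blank host, account passed as a query parameter (an "extra").
extra_style = "snowflake://user:password@/db-schema?account=my-account&warehouse=snow-warehouse"

host_a, extras_a = split_connection_uri(host_style)
host_b, extras_b = split_connection_uri(extra_style)

print("host-style: ", host_a, "| account in extras:", "account" in extras_a)
print("extra-style:", host_b, "| account in extras:", "account" in extras_b)
```

In the first URI the account is only available as the host, which the hook ignores; in the second it is available as an extra.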
   
   I would argue that if it is correct to omit the `port` in the documentation (which it is), then `host` should also be excluded.
   
   Furthermore, the documentation reinforces this confusion with the last few lines, where an environment variable example connection is defined that uses a host.
   
   Finally, the documentation says "When specifying the connection in environment variable you should specify it using URI syntax", which is no longer true as of Airflow 2.3.0, where a connection in an environment variable can also be defined as JSON.
   
   
   ### Solving the problem
   
   I have 3 proposals for how the documentation should be updated to better reflect how the `SnowflakeHook` actually works.
   
   1. The `Host` option should not be listed as part of the "Configuring the Connection" section.
   
   2. The example URI should remove the host. The new example URI would look like this: `snowflake://user:password@/db-schema?account=account&database=snow-db&region=us-east&warehouse=snow-warehouse`. This URI with a blank host works fine; you can test this yourself:
   
      ```python
      from airflow.models.connection import Connection
      
      c = Connection(conn_id="foo", uri="snowflake://user:password@/db-schema?account=account&database=snow-db&region=us-east&warehouse=snow-warehouse")
      print(c.host)
      print(c.extra_dejson)
      ```
   
   3. An example should be provided of a valid Snowflake connection defined using JSON. This example would not only stand on its own as an environment variable connection valid for 2.3.0, but it would also highlight some of the idiosyncrasies of how Airflow defines connections to Snowflake. It would also be valuable as a reference for the AWS `SecretsManagerBackend` when `full_url_mode` is set to `False`.
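
As a rough sketch of what proposal 3 could look like, the snippet below builds a JSON connection with no `host` or `port` and stores it in an environment variable. The top-level key names (`conn_type`, `login`, `password`, `schema`, `extra`) follow Airflow's JSON connection format as I understand it for 2.3.0, and the connection id `MY_SNOWFLAKE` is a hypothetical name; the values mirror the example URI above. Please verify the key names against the Airflow and provider versions you run.

```python
import json
import os

# Connection fields, mirroring the example URI from proposal 2.
# Note there is no "host" or "port": SnowflakeHook does not read them.
connection = {
    "conn_type": "snowflake",
    "login": "user",
    "password": "password",
    "schema": "db-schema",
    "extra": {
        # The account lives in the extras, not in the host field.
        "account": "account",
        "database": "snow-db",
        "region": "us-east",
        "warehouse": "snow-warehouse",
    },
}

# Airflow looks up connections in variables named AIRFLOW_CONN_<CONN_ID>;
# "MY_SNOWFLAKE" is a hypothetical connection id chosen for this sketch.
env_var_name = "AIRFLOW_CONN_MY_SNOWFLAKE"
os.environ[env_var_name] = json.dumps(connection)

print(f"{env_var_name}={os.environ[env_var_name]}")
```

The same dictionary could serve as a starting point for a secrets backend entry, though the exact key names expected there may differ and should be checked separately.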
   
   ### Anything else
   
   I wasn't sure whether to label this issue as a provider issue or documentation issue; I saw templates for either but not both.
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] boring-cyborg[bot] commented on issue #24572: Snowflake Provider connection documentation is misleading

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #24572:
URL: https://github.com/apache/airflow/issues/24572#issuecomment-1161066762

   Thanks for opening your first issue here! Be sure to follow the issue template!
   




[GitHub] [airflow] josh-fell commented on issue #24572: Snowflake Provider connection documentation is misleading

Posted by GitBox <gi...@apache.org>.
josh-fell commented on issue #24572:
URL: https://github.com/apache/airflow/issues/24572#issuecomment-1161802337

   @dwreeves Excellent points. I see you already submitted a PR, thank you kindly, but assigning to you for good measure.




[GitHub] [airflow] mik-laj closed issue #24572: Snowflake Provider connection documentation is misleading

Posted by GitBox <gi...@apache.org>.
mik-laj closed issue #24572: Snowflake Provider connection documentation is misleading
URL: https://github.com/apache/airflow/issues/24572

