You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/13 15:05:49 UTC

[GitHub] [airflow] potiuk opened a new pull request #13654: Switches to latest version of snowflake connector

potiuk opened a new pull request #13654:
URL: https://github.com/apache/airflow/pull/13654


   This should allow us to release a new version of snowflake
   provider that is not interacting with other providers via
   monkeypatching of SSL classes.
   
   Fixes #12881
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13654: Switches to latest version of snowflake connector

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13654:
URL: https://github.com/apache/airflow/pull/13654#issuecomment-759584913


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13654: Switches to latest version of snowflake connector

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13654:
URL: https://github.com/apache/airflow/pull/13654#issuecomment-761628791


   Absolutely! 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13654: Switches to latest version of snowflake connector

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13654:
URL: https://github.com/apache/airflow/pull/13654#issuecomment-759705199


   [The Workflow run](https://github.com/apache/airflow/actions/runs/483621608) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #13654: Switches to latest version of snowflake connector

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #13654:
URL: https://github.com/apache/airflow/pull/13654


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13654: Switches to latest version of snowflake connector

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13654:
URL: https://github.com/apache/airflow/pull/13654#issuecomment-760273973


   [The Workflow run](https://github.com/apache/airflow/actions/runs/485486567) is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13654: Switches to latest version of snowflake connector

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13654:
URL: https://github.com/apache/airflow/pull/13654#discussion_r557388154



##########
File path: Dockerfile.ci
##########
@@ -286,22 +286,23 @@ RUN echo "Pip no cache dir: ${PIP_NO_CACHE_DIR}"
 
 RUN pip install --upgrade "pip==${AIRFLOW_PIP_VERSION}"
 
-# Increase the value here to force reinstalling Apache Airflow pip dependencies
-ARG PIP_DEPENDENCIES_EPOCH_NUMBER="5"
-ENV PIP_DEPENDENCIES_EPOCH_NUMBER=${PIP_DEPENDENCIES_EPOCH_NUMBER}
-
 # Only copy install_airflow_from_latest_master.sh to not invalidate cache on other script changes
 COPY scripts/docker/install_airflow_from_latest_master.sh /scripts/docker/install_airflow_from_latest_master.sh
 # fix permission issue in Azure DevOps when running the script
 RUN chmod a+x /scripts/docker/install_airflow_from_latest_master.sh
 
+ARG UPGRADE_TO_NEWER_DEPENDENCIES="false"
+ENV UPGRADE_TO_NEWER_DEPENDENCIES=${UPGRADE_TO_NEWER_DEPENDENCIES}
+
 # In case of CI builds we want to pre-install master version of airflow dependencies so that
 # We do not have to always reinstall it from the scratch.
-# This can be reinstalled from latest master by increasing PIP_DEPENDENCIES_EPOCH_NUMBER.
 # And is automatically reinstalled from the scratch every time patch release of python gets released
 # The Airflow (and providers in case INSTALL_PROVIDERS_FROM_SOURCES is "false")
-# are uninstalled, only dependencies remain
-RUN if [[ ${AIRFLOW_PRE_CACHED_PIP_PACKAGES} == "true" ]]; then \
+# are uninstalled, only dependencies remain.
+# the cache is only used when "upgrade to newer dependencies" is not set to automatically
+# account for removed dependencies (we do not install them in the first place)
+RUN if [[ ${AIRFLOW_PRE_CACHED_PIP_PACKAGES} == "true" && \
+          ${UPGRADE_TO_NEWER_DEPENDENCIES} != "true" ]]; then \

Review comment:
       This will default to True right? So will use cache and L329 will take care of installing it if setup.py or setup.cfg changes?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #13654: Switches to latest version of snowflake connector

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #13654:
URL: https://github.com/apache/airflow/pull/13654#discussion_r557400091



##########
File path: Dockerfile.ci
##########
@@ -286,22 +286,23 @@ RUN echo "Pip no cache dir: ${PIP_NO_CACHE_DIR}"
 
 RUN pip install --upgrade "pip==${AIRFLOW_PIP_VERSION}"
 
-# Increase the value here to force reinstalling Apache Airflow pip dependencies
-ARG PIP_DEPENDENCIES_EPOCH_NUMBER="5"
-ENV PIP_DEPENDENCIES_EPOCH_NUMBER=${PIP_DEPENDENCIES_EPOCH_NUMBER}
-
 # Only copy install_airflow_from_latest_master.sh to not invalidate cache on other script changes
 COPY scripts/docker/install_airflow_from_latest_master.sh /scripts/docker/install_airflow_from_latest_master.sh
 # fix permission issue in Azure DevOps when running the script
 RUN chmod a+x /scripts/docker/install_airflow_from_latest_master.sh
 
+ARG UPGRADE_TO_NEWER_DEPENDENCIES="false"
+ENV UPGRADE_TO_NEWER_DEPENDENCIES=${UPGRADE_TO_NEWER_DEPENDENCIES}
+
 # In case of CI builds we want to pre-install master version of airflow dependencies so that
 # We do not have to always reinstall it from the scratch.
-# This can be reinstalled from latest master by increasing PIP_DEPENDENCIES_EPOCH_NUMBER.
 # And is automatically reinstalled from the scratch every time patch release of python gets released
 # The Airflow (and providers in case INSTALL_PROVIDERS_FROM_SOURCES is "false")
-# are uninstalled, only dependencies remain
-RUN if [[ ${AIRFLOW_PRE_CACHED_PIP_PACKAGES} == "true" ]]; then \
+# are uninstalled, only dependencies remain.
+# the cache is only used when "upgrade to newer dependencies" is not set to automatically
+# account for removed dependencies (we do not install them in the first place)
+RUN if [[ ${AIRFLOW_PRE_CACHED_PIP_PACKAGES} == "true" && \
+          ${UPGRADE_TO_NEWER_DEPENDENCIES} != "true" ]]; then \

Review comment:
       UPGRADE_TO_NEWER_DEPENDENCIES is "false" by default. 
   
   It is set in the 'selective_checks" in CI to "true" (as of few days) whenever setup.py or setup.cfg changes (https://github.com/apache/airflow/blob/e4b8ee63b04a25feb21a5766b1cc997aca9951a9/scripts/ci/selective_ci_checks.sh#L325)
   
   This means that in CI, by default the "install_airflow_from_latest_master.sh" is used (and airflow is installed from cache first). 
   
   When either of the two setup files change, UPGRADE_TO_LATEST_DEPENDENCIES is set and this line is skipped, so no preinstalled packages - they are all installed from scratch (which will take a bit longer but it is 'clean' state - so anything that disappears (comparing to master) is not installed.
   
   
   The situation is different when you build image locally - when you change setup.py, setup.cfg and build the image. the cache is still used (UPGRADE_TO_NEWER_DEPENDENCIES) is "false"). 
   
   This way you avoid rebuilding all of the dependencies when you add new dependency (it takes ~ 10 minutes) to install all deps from the scratch.
   
   You can still trigger the same behavior as in CI by adding `--upgrade-to-newer-dependencies` flag in breeze when building the image (it simply sets UPGRADE_TO_NEWER_DEPENDENCIES). 
   
   However, good that I explained it -  I just realised that the comparision should be == "false"  because we are using commit_hash as the "truthy" value in selective checks! Fixing it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #13654: Switches to latest version of snowflake connector

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #13654:
URL: https://github.com/apache/airflow/pull/13654#issuecomment-761624937


   Should this go on 2.0.1 @potiuk ?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13654: Switches to latest version of snowflake connector

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13654:
URL: https://github.com/apache/airflow/pull/13654#issuecomment-759960378


   @kaxil @ashb  @mik-laj  @ephraimbuddy  @turbaszek - when fixing the 'azure-storage' problems connected (they will be finaly solved in #12188 which I rebased on top of this one) I also simplified upgrades of setup.py so they will also automatically work in case any change in setup.py results in REMOVING dependency (as is in the case of azure-storage). 
   
   Basically - no more EPOCH changes are needed when you remove dependencies.  Whenever setup.py/setup.cfg is changed all installation will skip automatically (on CI) the "cache" build and it will install everything from the scratch - that will take much longer than usual (3-4 minutes) but it should handle every case - including the removal.
   
   I did it as part of the 'snowflake' fix because also here I want to reinstall dependencies from scratch.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13654: Switches to latest version of snowflake connector

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13654:
URL: https://github.com/apache/airflow/pull/13654#discussion_r556604413



##########
File path: setup.py
##########
@@ -428,14 +428,10 @@ def get_sphinx_theme_version() -> str:
     # once it is merged, we can move those two back to `azure` extra.
     'azure-storage-blob',
     'azure-storage-common',
-    # snowflake is not compatible with latest version.
-    # This library monkey patches the requests library, so SSL is broken globally.
-    # See: https://github.com/snowflakedb/snowflake-connector-python/issues/324
-    'requests<2.24.0',
-    # Newest version drop support for old version of azure-storage-blob
-    # Until #12188 is solved at least we need to limit maximum version.
-    # https://github.com/apache/airflow/pull/12188
-    'snowflake-connector-python>=1.5.2,<=2.3.6',
+    # Snowflake conector > 2.3.8 is needed because it has vendored urrllib3 and requests libraries which

Review comment:
       ```suggestion
       # Snowflake conector > 2.3.8 is needed because it has vendored urllib3 and requests libraries which
   ```

##########
File path: setup.py
##########
@@ -428,14 +428,10 @@ def get_sphinx_theme_version() -> str:
     # once it is merged, we can move those two back to `azure` extra.
     'azure-storage-blob',
     'azure-storage-common',
-    # snowflake is not compatible with latest version.
-    # This library monkey patches the requests library, so SSL is broken globally.
-    # See: https://github.com/snowflakedb/snowflake-connector-python/issues/324
-    'requests<2.24.0',
-    # Newest version drop support for old version of azure-storage-blob
-    # Until #12188 is solved at least we need to limit maximum version.
-    # https://github.com/apache/airflow/pull/12188
-    'snowflake-connector-python>=1.5.2,<=2.3.6',
+    # Snowflake conector > 2.3.8 is needed because it has vendored urrllib3 and requests libraries which

Review comment:
       ```suggestion
       # Snowflake connector > 2.3.8 is needed because it has vendored urllib3 and requests libraries which
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13654: Switches to latest version of snowflake connector

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13654:
URL: https://github.com/apache/airflow/pull/13654#issuecomment-760197633


   [The Workflow run](https://github.com/apache/airflow/actions/runs/485417468) is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] noelmcloughlin commented on a change in pull request #13654: Switches to latest version of snowflake connector

Posted by GitBox <gi...@apache.org>.
noelmcloughlin commented on a change in pull request #13654:
URL: https://github.com/apache/airflow/pull/13654#discussion_r556722430



##########
File path: Dockerfile
##########
@@ -262,6 +262,7 @@ ENV UPGRADE_TO_NEWER_DEPENDENCIES=${UPGRADE_TO_NEWER_DEPENDENCIES}
 # Those are additional constraints that are needed for some extras but we do not want to
 # Force them on the main Airflow package.
 # * urllib3 - required to keep boto3 happy
+# chardet<4 - required to keep snowflake happy

Review comment:
       missing "*"




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13654: Switches to latest version of snowflake connector

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13654:
URL: https://github.com/apache/airflow/pull/13654#issuecomment-761551018


   Random test failure only. Merging!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #13654: Switches to latest version of snowflake connector

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #13654:
URL: https://github.com/apache/airflow/pull/13654#discussion_r556723630



##########
File path: Dockerfile
##########
@@ -262,6 +262,7 @@ ENV UPGRADE_TO_NEWER_DEPENDENCIES=${UPGRADE_TO_NEWER_DEPENDENCIES}
 # Those are additional constraints that are needed for some extras but we do not want to
 # Force them on the main Airflow package.
 # * urllib3 - required to keep boto3 happy
+# chardet<4 - required to keep snowflake happy

Review comment:
       Will add it before merge :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org