You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/12/25 15:10:54 UTC

[GitHub] [airflow] mik-laj opened a new pull request #13313: Warns politely, do not force run a long operation

mik-laj opened a new pull request #13313:
URL: https://github.com/apache/airflow/pull/13313


   I personally (as probably most of the developers in this project) do not deal with the development of Javascript code, but `entrypoint_ci.sh` forces us to recompile resources even when we don't need it. This takes a lot of time and is frustrating. This can happen very often as even a small change forces you to recompile assets.  This is especially problematic when working with two branches that have two different versions of JS files. Then, each restart of the environment means that we have to wait until the files are regenerated.
   
   Since most developers don't need to keep these files in good shape, I didn't decide to add an additional question, but did give a warning with yellow text.
   
   Additionally, I fixed one problem. When we run the environment with the use of a pre-baked image, e.g. on CI, or when using the "--github-image-id" option, we do not want to recompile assets. Unfortunately, the finished images did not contain checksums, because `Dockerfile` files run `compile_assets.sh` scripts, which did not generate checksums. I moved the checksum generation to the compile_assets.sh file, so now the pre-baked image will have checksums and won't regenerate assets.
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13313: Warns politely, do not force run a long operation

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13313:
URL: https://github.com/apache/airflow/pull/13313#issuecomment-752992362


   Can you rebase again @mik-laj -> it was failing early due to the failing tests in master.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13313: Warns politely, do not force run a long operation

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13313:
URL: https://github.com/apache/airflow/pull/13313#issuecomment-752956837


   > I've added a new panel that compiles all files that have changed and then monitors for any new changes.
   :heart: 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #13313: Warns politely, do not force run a long operation

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #13313:
URL: https://github.com/apache/airflow/pull/13313#discussion_r550491680



##########
File path: Dockerfile.ci
##########
@@ -345,7 +345,7 @@ COPY airflow/www/webpack.config.js ${AIRFLOW_SOURCES}/airflow/www/
 COPY airflow/www/static ${AIRFLOW_SOURCES}/airflow/www/static/

Review comment:
       You must copy the airflow/www/compile_assets.sh script here as well.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #13313: Warns politely, do not force run a long operation

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #13313:
URL: https://github.com/apache/airflow/pull/13313#issuecomment-752956837


   > I've added a new panel that compiles all files that have changed and then monitors for any new changes.
   
   :heart: 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13313: Warns politely, do not force run a long operation

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13313:
URL: https://github.com/apache/airflow/pull/13313#issuecomment-753007599


   Failed on batch errors which are fixed in master.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #13313: Warns politely, do not force run a long operation

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #13313:
URL: https://github.com/apache/airflow/pull/13313#discussion_r548962702



##########
File path: airflow/www/compile_assets.sh
##########
@@ -20,10 +20,15 @@ set -e
 
 cd "$( dirname "${BASH_SOURCE[0]}" )"
 
+MD5SUM_FILE="static/dist/sum.md5"
+readonly MD5SUM_FILE
+
 # first bump up package.json manually, commit and tag
 if [[ -d ./static/dist ]]; then
   rm -f ./static/dist/*
 fi
 
 yarn install --frozen-lockfile
 yarn run build
+
+find package.json yarn.lock static/css static/js -type f | sort | xargs md5sum > "${MD5SUM_FILE}"

Review comment:
       One more thing. In order this to work properly, also `airflow/www/compile_assets.sh` should be run as step in `Dockerfile.ci`
   
   https://github.com/apache/airflow/blob/6f246b0d54ccaf733b7c5951a8955adda6719acb/Dockerfile.ci#L348
   
   In the Dockefile.ci we this in two steps (this is an optimisation of image rebuilding)
   
   1) first we install node_modules (in case .lock/ or package.json changed): https://github.com/apache/airflow/blob/6f246b0d54ccaf733b7c5951a8955adda6719acb/Dockerfile.ci#L300
   
   2) then we update all latest setup.py dependencies in case setup.py changed https://github.com/apache/airflow/blob/6f246b0d54ccaf733b7c5951a8955adda6719acb/Dockerfile.ci#L322
   
   3) only then we build assets (only if any www files changed): https://github.com/apache/airflow/blob/6f246b0d54ccaf733b7c5951a8955adda6719acb/Dockerfile.ci#L348
   
   However there is a difference here: 
   
   1) we use `yarn run prod` in the Dockerfile.ci
   2) we use `yarn build` in the compile_assets.sh
   
   There are two things to fix here:
   
   1) We should decide if we should use `build` or `prod' and use the same in both cases (AFAIK prod is optimized version of assets) (@ryanahamilton WDYT ?). I'd vote for `build` as it produces un-obfuscated code if that's the case..
   
   2) We should calculate and store the md5sum also in case of Dockerfile.ci build (because otherwise it will be always rebuilt).
   
   Do you want to do that @mik-laj ? Or should I do that myself?
   
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #13313: Warns politely, do not force run a long operation

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #13313:
URL: https://github.com/apache/airflow/pull/13313#discussion_r548962702



##########
File path: airflow/www/compile_assets.sh
##########
@@ -20,10 +20,15 @@ set -e
 
 cd "$( dirname "${BASH_SOURCE[0]}" )"
 
+MD5SUM_FILE="static/dist/sum.md5"
+readonly MD5SUM_FILE
+
 # first bump up package.json manually, commit and tag
 if [[ -d ./static/dist ]]; then
   rm -f ./static/dist/*
 fi
 
 yarn install --frozen-lockfile
 yarn run build
+
+find package.json yarn.lock static/css static/js -type f | sort | xargs md5sum > "${MD5SUM_FILE}"

Review comment:
       One more thing. In order this to work properly, also `airflow/www/compile_assets.sh` should be run as step in `Dockerfile.ci`
   
   https://github.com/apache/airflow/blob/6f246b0d54ccaf733b7c5951a8955adda6719acb/Dockerfile.ci#L348
   
   In the Dockefile.ci we this in two steps (this is an optimisation of image rebuilding)
   
   1) first we install node_modules (in case .lock/ or package.json changed): https://github.com/apache/airflow/blob/6f246b0d54ccaf733b7c5951a8955adda6719acb/Dockerfile.ci#L300
   
   2) then we update all latest setup.py dependencies in case setup.py changed https://github.com/apache/airflow/blob/6f246b0d54ccaf733b7c5951a8955adda6719acb/Dockerfile.ci#L322
   
   3) only then we build assets (only if any www files changed): https://github.com/apache/airflow/blob/6f246b0d54ccaf733b7c5951a8955adda6719acb/Dockerfile.ci#L348
   
   However there is a difference here: 
   
   1) we use `yarn run prod` in the Dockerfile.ci
   2) we use `yarn build` in the compile_assets.sh
   
   There are two things to fix here:
   
   1) We should decide if we should use `build` or `prod` and use the same in both cases (AFAIK prod is optimized version of assets) (@ryanahamilton WDYT ?). I'd vote for `build` as it produces un-obfuscated code if that's the case..
   
   2) We should calculate and store the md5sum also in case of Dockerfile.ci build (because otherwise it will be always rebuilt).
   
   Do you want to do that @mik-laj ? Or should I do that myself?
   
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13313: Warns politely, do not force run a long operation

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13313:
URL: https://github.com/apache/airflow/pull/13313#issuecomment-752959565


   [The Workflow run](https://github.com/apache/airflow/actions/runs/454561854) is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #13313: Warns politely, do not force run a long operation

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #13313:
URL: https://github.com/apache/airflow/pull/13313#discussion_r548962702



##########
File path: airflow/www/compile_assets.sh
##########
@@ -20,10 +20,15 @@ set -e
 
 cd "$( dirname "${BASH_SOURCE[0]}" )"
 
+MD5SUM_FILE="static/dist/sum.md5"
+readonly MD5SUM_FILE
+
 # first bump up package.json manually, commit and tag
 if [[ -d ./static/dist ]]; then
   rm -f ./static/dist/*
 fi
 
 yarn install --frozen-lockfile
 yarn run build
+
+find package.json yarn.lock static/css static/js -type f | sort | xargs md5sum > "${MD5SUM_FILE}"

Review comment:
       One more thing. In order this to work properly, also `airflow/www/compile_assets.sh` should be run as step in `Dockerfile.ci`
   
   https://github.com/apache/airflow/blob/6f246b0d54ccaf733b7c5951a8955adda6719acb/Dockerfile.ci#L348
   
   In the Dockefile.ci we this in two steps (this is an optimisation of image rebuilding)
   
   1) first we install node_modules (in case .lock/ or package.json changed): https://github.com/apache/airflow/blob/6f246b0d54ccaf733b7c5951a8955adda6719acb/Dockerfile.ci#L300
   
   2) then we update all latest setup.py dependencies in case setup.py changed https://github.com/apache/airflow/blob/6f246b0d54ccaf733b7c5951a8955adda6719acb/Dockerfile.ci#L322
   
   3) only then we build assets (only if any www files changed): https://github.com/apache/airflow/blob/6f246b0d54ccaf733b7c5951a8955adda6719acb/Dockerfile.ci#L348
   
   However there is a difference here: 
   
   1) we use `yarn run prod` in the Dockerfile.ci
   2) we use `yarn build` in the compile_assets.sh
   
   There are two things to fix here:
   
   1) We should decide if we should use `build` or `prod` and use the same in both cases (AFAIK prod is optimized version of assets) (@ryanahamilton WDYT ?). I'd vote for `build` as it produces un-obfuscated code if that's the case..
   
   2) We should calculate and store the md5sum also in case of Dockerfile.ci build (because otherwise it will be always rebuilt first time).
   
   Do you want to do that @mik-laj ? Or should I do that myself?
   
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #13313: Warns politely, do not force run a long operation

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #13313:
URL: https://github.com/apache/airflow/pull/13313


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #13313: Warns politely, do not force run a long operation

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #13313:
URL: https://github.com/apache/airflow/pull/13313#issuecomment-752955409


   > I think we should stil recompile automatically when user runs 'start-airflow' command. In such case there is no chance to recompile the assets manually.
   
   I've added a new panel that compiles all files that have changed and then monitors for any new changes.
   
   > I think this is much better approach than warning.
   
   ```
   time yarn run build
   real	0m22.657s
   ```
   On my computer, it still takes 20 seconds. In my opinion, this time is wasted, especially when I frequently change branches and these files will recompile when I switch branches.  I suspect most developers will feel similar as we don't have too many JS developers.  When there are more JS developers, we can try to improve their workflow, but for now I would rather make life easier for a larger group.
   
   > We should decide if we should use build or prod and use the same in both cases (AFAIK prod is optimized version of assets) (@ryanahamilton WDYT ?). I'd vote for build as it produces un-obfuscated code if that's the case..
   
   I also think it is worth having access to non-optimized files.
   
   > We should calculate and store the md5sum also in case of Dockerfile.ci build (because otherwise it will be always rebuilt first time).
   
   I used the `airflow/www/compile_assets.sh` script in Dockerfile, so now the checksums are also generated.
   
   > One more thought @mik-laj . Maybe we should also split node_modules vs. compiling assets? Right now we are doing both which is probably not the best idea and this is main reason why it takes so long on Mac mounted filesystem is slow and it scans many files while doing yarn install - that's why for me on Linux it takes seconds and for anyone working on Mac it might take minutes (that's why I have no noticed that this is a problem).
   
   In my opinion, this has very big benefits. This way, one script prepares all JS files and solves any Javascript issues. I don't have to worry if I should use `yarn install` or` yarn run build` in a given situation. In our community, where most developers don't use these tools every day, this has a huge benefit. I just run this script and it doesn't care anymore.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #13313: Warns politely, do not force run a long operation

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #13313:
URL: https://github.com/apache/airflow/pull/13313#discussion_r548963109



##########
File path: airflow/www/compile_assets.sh
##########
@@ -20,10 +20,15 @@ set -e
 
 cd "$( dirname "${BASH_SOURCE[0]}" )"
 
+MD5SUM_FILE="static/dist/sum.md5"
+readonly MD5SUM_FILE
+
 # first bump up package.json manually, commit and tag
 if [[ -d ./static/dist ]]; then
   rm -f ./static/dist/*
 fi
 
 yarn install --frozen-lockfile
 yarn run build
+
+find package.json yarn.lock static/css static/js -type f | sort | xargs md5sum > "${MD5SUM_FILE}"

Review comment:
       One more thought @mik-laj . Maybe we should also split node_modules vs. compiling assets? Right now we are doing both which  is probably not the best idea and this is main reason why it takes so long on Mac mounted filesystem is slow and it scans many files while doing `yarn install` - that's why for me on Linux it takes seconds and for anyone working on Mac it might take minutes (that's why I have no noticed that this is a problem).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13313: Warns politely, do not force run a long operation

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13313:
URL: https://github.com/apache/airflow/pull/13313#issuecomment-752956860


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #13313: Warns politely, do not force run a long operation

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #13313:
URL: https://github.com/apache/airflow/pull/13313#discussion_r548962702



##########
File path: airflow/www/compile_assets.sh
##########
@@ -20,10 +20,15 @@ set -e
 
 cd "$( dirname "${BASH_SOURCE[0]}" )"
 
+MD5SUM_FILE="static/dist/sum.md5"
+readonly MD5SUM_FILE
+
 # first bump up package.json manually, commit and tag
 if [[ -d ./static/dist ]]; then
   rm -f ./static/dist/*
 fi
 
 yarn install --frozen-lockfile
 yarn run build
+
+find package.json yarn.lock static/css static/js -type f | sort | xargs md5sum > "${MD5SUM_FILE}"

Review comment:
       One more thing. In order this to work properly, also `airflow/www/compile_assets.sh` should be run as step in `Dockerfile.ci`
   
   https://github.com/apache/airflow/blob/6f246b0d54ccaf733b7c5951a8955adda6719acb/Dockerfile.ci#L348
   
   In the Dockefile.ci we this in two steps (this is an optimisation of image rebuilding)
   
   1) first we install node_modules (in case .lock/ or package.json changed): https://github.com/apache/airflow/blob/6f246b0d54ccaf733b7c5951a8955adda6719acb/Dockerfile.ci#L300
   
   2) then we update all latest setup.py dependencies in case setup.py changed https://github.com/apache/airflow/blob/6f246b0d54ccaf733b7c5951a8955adda6719acb/Dockerfile.ci#L322
   
   3) only then we build assets (only if any www files changed): https://github.com/apache/airflow/blob/6f246b0d54ccaf733b7c5951a8955adda6719acb/Dockerfile.ci#L348
   
   However there is a difference here: 
   
   1) we use `yarn prod` in the Dockerfile.ci
   2) we use `yarn build` in the compile_assets.sh
   
   There are two things to fix here:
   
   1) We should decide if we should use `build` or `prod' and use the same in both cases (AFAIK prod is optimized version of assets) (@ryanahamilton WDYT ?). I'd vote for `build` as it produces un-obfuscated code if that's the case..
   
   2) We should calculate and store the md5sum also in case of Dockerfile.ci build (because otherwise it will be always rebuilt).
   
   Do you want to do that @mik-laj ? Or should I do that myself?
   
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #13313: Warns politely, do not force run a long operation

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #13313:
URL: https://github.com/apache/airflow/pull/13313#issuecomment-751295493


   Another solution to the problem you had:
   
   MOUNT_LOCAL_SOURCES="false" when github image id is used:
   
   https://github.com/apache/airflow/blob/master/breeze#L1085
   
   + moving calculation of the sum to the "compile_assets.sh"
   
   This way md5sum will be part of the image and it will not change (because sources will not be mounted by default when you use github image.
   
   I think this is much better approach than warning.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #13313: Warns politely, do not force run a long operation

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #13313:
URL: https://github.com/apache/airflow/pull/13313#discussion_r548963109



##########
File path: airflow/www/compile_assets.sh
##########
@@ -20,10 +20,15 @@ set -e
 
 cd "$( dirname "${BASH_SOURCE[0]}" )"
 
+MD5SUM_FILE="static/dist/sum.md5"
+readonly MD5SUM_FILE
+
 # first bump up package.json manually, commit and tag
 if [[ -d ./static/dist ]]; then
   rm -f ./static/dist/*
 fi
 
 yarn install --frozen-lockfile
 yarn run build
+
+find package.json yarn.lock static/css static/js -type f | sort | xargs md5sum > "${MD5SUM_FILE}"

Review comment:
       One more thought @mik-laj . Maybe we should also split node_modules vs. compiling assets? Right now we are doing both which  is probably not the best idea and this is main reason why it takes so long on Mac mounted filesystem is slow and it scans many files while doing `yarn install` - that's why for me on Linux it takes seconds and for anyone working on Mac it might take many minutes (that's why I have no noticed that this is a problem).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org