You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/12/14 22:16:54 UTC

[GitHub] [airflow] pingzh opened a new pull request, #28367: mkdirs should set mode correctly

pingzh opened a new pull request, #28367:
URL: https://github.com/apache/airflow/pull/28367

   With Path.mkdirs, if parents is true, any missing parents of this path are created as needed; they are created with the default permissions **without** taking mode into account
   
   see: https://docs.python.org/3/library/pathlib.html#pathlib.Path.mkdir
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of an existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #28367: mkdirs should set mode correctly

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #28367:
URL: https://github.com/apache/airflow/pull/28367#issuecomment-1355637320

   +1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pingzh closed pull request #28367: mkdirs should set mode correctly

Posted by GitBox <gi...@apache.org>.
pingzh closed pull request #28367: mkdirs should set mode correctly
URL: https://github.com/apache/airflow/pull/28367


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #28367: mkdirs should set mode correctly

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #28367:
URL: https://github.com/apache/airflow/pull/28367#issuecomment-1352726713

   In additional `airflow.utils.file.mkdirs` does not use in airflow and community providers.
   So any changes here does not improve anything.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pingzh commented on pull request #28367: mkdirs should set mode correctly

Posted by GitBox <gi...@apache.org>.
pingzh commented on PR #28367:
URL: https://github.com/apache/airflow/pull/28367#issuecomment-1358242305

   close this in favor of https://github.com/apache/airflow/pull/28477


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #28367: mkdirs should set mode correctly

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #28367:
URL: https://github.com/apache/airflow/pull/28367#issuecomment-1353758880

   This implementation completely ignored umask for subdirectories, see:
   
   ```python
   import os
   from tempfile import TemporaryDirectory
   from pathlib import Path
   from airflow.utils.file import mkdirs
   
   
   def cur_umask() -> int:
       tmp = os.umask(0o022)
       os.umask(tmp)
       return oct(tmp)
   
   
   def sample():
       print(f" Current umask: {cur_umask()} ".center(72, "="))
       with TemporaryDirectory() as tmpdir:
           sub_path = Path(tmpdir) / "child1"
           path = sub_path / "child2"
   
           mkdirs(path, 0o770)
   
           print(f"{sub_path} mode {oct(os.stat(sub_path).st_mode)}")
           print(f"{path} mode {oct(os.stat(path).st_mode)}")
   
   sample()
   # child1 mode 0o40777
   # child2 mode 0o40770
   
   os.umask(0)
   sample()
   # child1 mode 0o40777
   # child2 mode 0o40770
   
   os.umask(0o002)
   sample()
   # child1 mode 0o40777
   # child2 mode 0o40770
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #28367: mkdirs should set mode correctly

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #28367:
URL: https://github.com/apache/airflow/pull/28367#issuecomment-1355515252

   Yep. If this issue still exists better to fix them. It could be a situation that initial issue solved but comments still exists.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pingzh commented on pull request #28367: mkdirs should set mode correctly

Posted by GitBox <gi...@apache.org>.
pingzh commented on PR #28367:
URL: https://github.com/apache/airflow/pull/28367#issuecomment-1353692433

   > In additional `airflow.utils.file.mkdirs` does not use in airflow and community providers. So any changes here does not improve anything.
   
   @Taragolis it is true, it is not used. it was replaced by `Path(directory).mkdir(mode=0o777, parents=True, exist_ok=True)`, eg. https://github.com/apache/airflow/blob/b263dbcb0f84fd9029591d1447a7c843cb970f15/airflow/utils/log/file_task_handler.py#L345
   
   However, it is incorrect since the `0o777` won't be correctly set, see my PR description.
   
   @uranusjr the main gap here is the about the dir mode, see this code block:
   
   ```
   import os
   from tempfile import TemporaryDirectory
   from uuid import uuid4
   from pathlib import Path
   
   with TemporaryDirectory() as tmpdir:
       path = os.path.join(tmpdir, str(uuid4()), str(uuid4()))
       Path(path).mkdir(mode=0o777, parents=True, exist_ok=True)
       mode = oct(os.stat(path).st_mode)[-3:]
       print('mode is', mode) 
      assert mode == "777"
   ```
   
   the `mode` is `755` not `777`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pingzh commented on pull request #28367: mkdirs should set mode correctly

Posted by GitBox <gi...@apache.org>.
pingzh commented on PR #28367:
URL: https://github.com/apache/airflow/pull/28367#issuecomment-1353827189

   @Taragolis thanks for the code sample, it boils down to the expectation that when users use this method `mkdirs`. 
   
   To me, it is a little bit confusing if i want a folder with `0o777`, but it factors in the umask and gives me a folder that is not `0o777`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pingzh commented on pull request #28367: mkdirs should set mode correctly

Posted by GitBox <gi...@apache.org>.
pingzh commented on PR #28367:
URL: https://github.com/apache/airflow/pull/28367#issuecomment-1354056644

   @Taragolis @potiuk it looks like it's better just to update the `file_task_handler.py` given the fact that `mkdir` with mode isn't widely used.  Let me know your thoughts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pingzh commented on pull request #28367: mkdirs should set mode correctly

Posted by GitBox <gi...@apache.org>.
pingzh commented on PR #28367:
URL: https://github.com/apache/airflow/pull/28367#issuecomment-1353698604

   > It may not be a good idea to reuse `mkdirs`. Its docstring specifically says _If directory already exists, this is a no-op_ and the new implementation is not compatible with the previous behaviour.
   
   
   @uranusjr  i think the new implementation is no-op when the dir already exists. it is actually from the previous implementation, e.g. 1.10.15
   
   https://github.com/apache/airflow/blob/5786dcdc392f7a2649f398353a0beebef01c428e/airflow/utils/file.py#L55-L73


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #28367: mkdirs should set mode correctly

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #28367:
URL: https://github.com/apache/airflow/pull/28367#issuecomment-1353867228

   I think there is far too much of confusion here in general.
   
   I'd rather say having "mkdir" which behaves different than Python and POSIX standard is confusing.
   
   This was a decision made by Python creators and while we might argue with it, it has a good reason and it is explained in the docs of Python very clearly and explicitly https://docs.python.org/3/library/pathlib.html#pathlib.Path.mkdir - so you might say a lot of users will be confused if aiflow's mkdir behaves differently.
   
   > If parents is true, any missing parents of this path are created as needed; they are created with the default permissions without taking mode into account (mimicking the POSIX mkdir -p command).
   
   I'd say having a different behaviour will only add to confusion (umask behaviour is sufficiently complex without it).  I would also say that if we want to change behaviour of this in specific places and we have a good reason for it (do we?) undeprecating a method that have been deprecated > 2 years ago https://github.com/apache/airflow/pull/10117 (which would add even more confusion) is a bad idea. Now - we want to add a new mkdirs util method that not only creates parents by default (contrary to the system lib, but also does it differently).  That's super-confusing, and there is even no attempt to explain it in the docs of hte newly added method.
   
   If we want to propose a change in behaviour of some pleces where currently Path.mkdir(parent=True) is used, then it should be a new feature added rather than silently changing it back. This ship has sailed.  And father than adding similarly named function to utils, it should have different name and signature to make it clear it is not similar to mkdirs behaviour.
   
   But - this PR does not do that, so I am rather sceptical if we want to add it at all because it seems there is no intention to use it ?  If we want to just add a new util file which is not used in airflow anyway why do we want to add it at all? 
   
   If it's not used, there is no reason to add it. YAGNI.
   
   If it "just" needs to be used elsewhere, outside of Airflow (I guess AirBnB uses it), it should be added as a library there, not in Airflow. 
   
   We ere **just now** discussing about being very explicit what Public Airflow Interface is https://github.com/apache/airflow/pull/28300 and the idea is there to be very explicit about what is and treat all the rest. Adding - effectively (after old one is deprecated for 2.5 years) a new util API which is unused makes very little sense to me. We only want to expose as external interface things that are "extremely" important to be exposed. We do not want others relying on random utils and small things in Airflow, because it makes it more difficult to maintain Airflow in the future.
   
   Are there any particular uses of it @pingzh in Airflow that you want to add now or soon (and if so - why it's not part of this PR)? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on pull request #28367: mkdirs should set mode correctly

Posted by GitBox <gi...@apache.org>.
uranusjr commented on PR #28367:
URL: https://github.com/apache/airflow/pull/28367#issuecomment-1352656353

   It may not be a good idea to reuse `mkdirs`. Its docstring specifically says _If directory already exists, this is a no-op_ and the new implementation is not compatible with the previous behaviour.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pingzh commented on pull request #28367: mkdirs should set mode correctly

Posted by GitBox <gi...@apache.org>.
pingzh commented on PR #28367:
URL: https://github.com/apache/airflow/pull/28367#issuecomment-1353935068

   @potiuk thanks for chiming me. I agree with that:
   
   > And rather than adding similarly named function to utils, it should have different name and signature to make it clear it is not similar to mkdirs behaviour.
   
   A better method name with very clear expectation is very important.
   
   
   as for the usage of this method, i was planning to update places that use ` Path(directory).mkdir(mode=0o777, parents=True, exist_ok=True) `. I should have called it out. as I don't want to make those changes, before we agree on the expected behavior of `mkdirs`. I noticed that while I was testing airflow 2 in Airbnb, but this method isn't directly used by our data pipelines)
   
   For example, the places that I plan to change is the 
   https://github.com/apache/airflow/blob/b263dbcb0f84fd9029591d1447a7c843cb970f15/airflow/utils/log/file_task_handler.py#L324-L346
   
   To me, the mode of `directory` has to be `0o777` otherwise other users cannot write logs.
   
   
   
   
   
   
   
   
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #28367: mkdirs should set mode correctly

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #28367:
URL: https://github.com/apache/airflow/pull/28367#issuecomment-1354013019

   This is the only one place in project where we tried to set `0o777` and it related only to SubDag, which nowadays deprecated due to tons of limitations.
   
   And fix (it it actually required) could be `Path(directory).chmod(0o777)` or `os.chmod(directory, 0o777)` after directory created


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org