You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/12/19 20:15:01 UTC

[GitHub] [airflow] pingzh opened a new pull request, #28477: Ensure correct log dir in file task handler

pingzh opened a new pull request, #28477:
URL: https://github.com/apache/airflow/pull/28477

   since Path.mkdir combines with the process’ umask value to determine the file mode and access flags, thus the newly created folder isn't 0o777
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of an existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #28477: Ensure correct log dir in file task handler

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #28477:
URL: https://github.com/apache/airflow/pull/28477#discussion_r1055190200


##########
airflow/utils/log/file_task_handler.py:
##########
@@ -318,6 +319,34 @@ def read(self, task_instance, try_number=None, metadata=None):
 
         return logs, metadata_array
 
+    def _prepare_log_folder(self, directory: str):
+        """
+        To handle log writing when tasks are impersonated, the log files need to
+        be writable by the user that runs the Airflow command and the user
+        that is impersonated. This is mainly to handle corner cases with the
+        SubDagOperator. When the SubDagOperator is run, all of the operators
+        run under the impersonated user and create appropriate log files
+        as the impersonated user. However, if the user manually runs tasks
+        of the SubDagOperator through the UI, then the log files are created
+        by the user that runs the Airflow command. For example, the Airflow
+        run command may be run by the `airflow_sudoable` user, but the Airflow
+        tasks may be run by the `airflow` user. If the log files are not
+        writable by both users, then it's possible that re-running a task
+        via the UI (or vice versa) results in a permission error as the task
+        tries to write to a log file created by the other user.
+
+        Create the log file and give it group writable permissions
+        TODO(aoen): Make log dirs and logs globally readable for now since the SubDag
+        operator is not compatible with impersonation (e.g. if a Celery executor is used
+        for a SubDag operator and the SubDag operator has a different owner than the
+        parent DAG)
+        """
+        mode = 0o777
+        Path(directory).mkdir(mode=mode, parents=True, exist_ok=True)
+        current_mode = stat.S_IMODE(os.stat(directory).st_mode)
+        if current_mode != mode:
+            os.chmod(directory, mode)

Review Comment:
   If `directory` is a `Path`, this can simply be
   
   ```python
   directory.mkdir(mode=mode, parents=True, exist_ok=True)
   if directory.stat().st_mode != mode:
       directory.chmod(mode)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pingzh commented on a diff in pull request #28477: Ensure correct log dir in file task handler

Posted by GitBox <gi...@apache.org>.
pingzh commented on code in PR #28477:
URL: https://github.com/apache/airflow/pull/28477#discussion_r1060859760


##########
airflow/utils/log/file_task_handler.py:
##########
@@ -318,6 +318,34 @@ def read(self, task_instance, try_number=None, metadata=None):
 
         return logs, metadata_array
 
+    def _prepare_log_folder(self, directory: str):
+        """
+        To handle log writing when tasks are impersonated, the log files need to
+        be writable by the user that runs the Airflow command and the user
+        that is impersonated. This is mainly to handle corner cases with the
+        SubDagOperator. When the SubDagOperator is run, all of the operators
+        run under the impersonated user and create appropriate log files
+        as the impersonated user. However, if the user manually runs tasks
+        of the SubDagOperator through the UI, then the log files are created
+        by the user that runs the Airflow command. For example, the Airflow
+        run command may be run by the `airflow_sudoable` user, but the Airflow
+        tasks may be run by the `airflow` user. If the log files are not
+        writable by both users, then it's possible that re-running a task
+        via the UI (or vice versa) results in a permission error as the task
+        tries to write to a log file created by the other user.
+
+        Create the log file and give it group writable permissions
+        TODO(aoen): Make log dirs and logs globally readable for now since the SubDag
+        operator is not compatible with impersonation (e.g. if a Celery executor is used
+        for a SubDag operator and the SubDag operator has a different owner than the
+        parent DAG)
+        """

Review Comment:
   thanks. do you mean D400? i dont find D440. https://github.com/apache/airflow/issues/10742



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pingzh merged pull request #28477: Ensure correct log dir in file task handler

Posted by GitBox <gi...@apache.org>.
pingzh merged PR #28477:
URL: https://github.com/apache/airflow/pull/28477


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #28477: Ensure correct log dir in file task handler

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #28477:
URL: https://github.com/apache/airflow/pull/28477#discussion_r1061223496


##########
airflow/utils/log/file_task_handler.py:
##########
@@ -318,6 +318,34 @@ def read(self, task_instance, try_number=None, metadata=None):
 
         return logs, metadata_array
 
+    def _prepare_log_folder(self, directory: str):
+        """
+        To handle log writing when tasks are impersonated, the log files need to
+        be writable by the user that runs the Airflow command and the user
+        that is impersonated. This is mainly to handle corner cases with the
+        SubDagOperator. When the SubDagOperator is run, all of the operators
+        run under the impersonated user and create appropriate log files
+        as the impersonated user. However, if the user manually runs tasks
+        of the SubDagOperator through the UI, then the log files are created
+        by the user that runs the Airflow command. For example, the Airflow
+        run command may be run by the `airflow_sudoable` user, but the Airflow
+        tasks may be run by the `airflow` user. If the log files are not
+        writable by both users, then it's possible that re-running a task
+        via the UI (or vice versa) results in a permission error as the task
+        tries to write to a log file created by the other user.
+
+        Create the log file and give it group writable permissions
+        TODO(aoen): Make log dirs and logs globally readable for now since the SubDag
+        operator is not compatible with impersonation (e.g. if a Celery executor is used
+        for a SubDag operator and the SubDag operator has a different owner than the
+        parent DAG)
+        """

Review Comment:
   Sorry I meant D400.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #28477: Ensure correct log dir in file task handler

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #28477:
URL: https://github.com/apache/airflow/pull/28477#discussion_r1057532538


##########
airflow/utils/log/file_task_handler.py:
##########
@@ -340,13 +368,7 @@ def _init_file(self, ti):
         # tries to write to a log file created by the other user.
         relative_path = self._render_filename(ti, ti.try_number)
         full_path = os.path.join(self.local_base, relative_path)
-        directory = os.path.dirname(full_path)
-        # Create the log file and give it group writable permissions
-        # TODO(aoen): Make log dirs and logs globally readable for now since the SubDag
-        # operator is not compatible with impersonation (e.g. if a Celery executor is used
-        # for a SubDag operator and the SubDag operator has a different owner than the
-        # parent DAG)
-        Path(directory).mkdir(mode=0o777, parents=True, exist_ok=True)
+        self._prepare_log_folder(os.path.dirname(full_path))

Review Comment:
   Since this is a private function call, we could do this instead to streamline the path manipulation logic:
   
   ```python
   self._prepare_log_folder(Path(full_path).parent)
   
   def _prepare_log_folder(self, directory: Path) -> None:
       mode = 0o777
       try:
           directory.mkdir(mode=mode, parents=True)
       except FileExistsError:
           directory.chmod(mode)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pingzh commented on a diff in pull request #28477: Ensure correct log dir in file task handler

Posted by GitBox <gi...@apache.org>.
pingzh commented on code in PR #28477:
URL: https://github.com/apache/airflow/pull/28477#discussion_r1060867282


##########
airflow/utils/log/file_task_handler.py:
##########
@@ -340,13 +368,7 @@ def _init_file(self, ti):
         # tries to write to a log file created by the other user.
         relative_path = self._render_filename(ti, ti.try_number)
         full_path = os.path.join(self.local_base, relative_path)
-        directory = os.path.dirname(full_path)
-        # Create the log file and give it group writable permissions
-        # TODO(aoen): Make log dirs and logs globally readable for now since the SubDag
-        # operator is not compatible with impersonation (e.g. if a Celery executor is used
-        # for a SubDag operator and the SubDag operator has a different owner than the
-        # parent DAG)
-        Path(directory).mkdir(mode=0o777, parents=True, exist_ok=True)
+        self._prepare_log_folder(os.path.dirname(full_path))

Review Comment:
   good idea to promote the usage of `Path`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pingzh commented on a diff in pull request #28477: Ensure correct log dir in file task handler

Posted by GitBox <gi...@apache.org>.
pingzh commented on code in PR #28477:
URL: https://github.com/apache/airflow/pull/28477#discussion_r1055962697


##########
airflow/utils/log/file_task_handler.py:
##########
@@ -318,6 +319,34 @@ def read(self, task_instance, try_number=None, metadata=None):
 
         return logs, metadata_array
 
+    def _prepare_log_folder(self, directory: str):
+        """
+        To handle log writing when tasks are impersonated, the log files need to
+        be writable by the user that runs the Airflow command and the user
+        that is impersonated. This is mainly to handle corner cases with the
+        SubDagOperator. When the SubDagOperator is run, all of the operators
+        run under the impersonated user and create appropriate log files
+        as the impersonated user. However, if the user manually runs tasks
+        of the SubDagOperator through the UI, then the log files are created
+        by the user that runs the Airflow command. For example, the Airflow
+        run command may be run by the `airflow_sudoable` user, but the Airflow
+        tasks may be run by the `airflow` user. If the log files are not
+        writable by both users, then it's possible that re-running a task
+        via the UI (or vice versa) results in a permission error as the task
+        tries to write to a log file created by the other user.
+
+        Create the log file and give it group writable permissions
+        TODO(aoen): Make log dirs and logs globally readable for now since the SubDag
+        operator is not compatible with impersonation (e.g. if a Celery executor is used
+        for a SubDag operator and the SubDag operator has a different owner than the
+        parent DAG)
+        """
+        mode = 0o777
+        Path(directory).mkdir(mode=mode, parents=True, exist_ok=True)
+        current_mode = stat.S_IMODE(os.stat(directory).st_mode)
+        if current_mode != mode:
+            os.chmod(directory, mode)

Review Comment:
   good call. updated @uranusjr ptal



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #28477: Ensure correct log dir in file task handler

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #28477:
URL: https://github.com/apache/airflow/pull/28477#discussion_r1057533683


##########
airflow/utils/log/file_task_handler.py:
##########
@@ -318,6 +318,34 @@ def read(self, task_instance, try_number=None, metadata=None):
 
         return logs, metadata_array
 
+    def _prepare_log_folder(self, directory: str):
+        """
+        To handle log writing when tasks are impersonated, the log files need to
+        be writable by the user that runs the Airflow command and the user
+        that is impersonated. This is mainly to handle corner cases with the
+        SubDagOperator. When the SubDagOperator is run, all of the operators
+        run under the impersonated user and create appropriate log files
+        as the impersonated user. However, if the user manually runs tasks
+        of the SubDagOperator through the UI, then the log files are created
+        by the user that runs the Airflow command. For example, the Airflow
+        run command may be run by the `airflow_sudoable` user, but the Airflow
+        tasks may be run by the `airflow` user. If the log files are not
+        writable by both users, then it's possible that re-running a task
+        via the UI (or vice versa) results in a permission error as the task
+        tries to write to a log file created by the other user.
+
+        Create the log file and give it group writable permissions
+        TODO(aoen): Make log dirs and logs globally readable for now since the SubDag
+        operator is not compatible with impersonation (e.g. if a Celery executor is used
+        for a SubDag operator and the SubDag operator has a different owner than the
+        parent DAG)
+        """

Review Comment:
   We’re in the process of converting the code base to apply D440, so it’d be nice if you could rewrite this docstring to conform to that as well.
   
   See #10742



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org