Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/12/28 04:38:22 UTC

[GitHub] [airflow] uranusjr commented on a diff in pull request #28256: Include full path to Python files under zip path while clearing import errors.

uranusjr commented on code in PR #28256:
URL: https://github.com/apache/airflow/pull/28256#discussion_r1058051361


##########
airflow/dag_processing/manager.py:
##########
@@ -777,8 +777,9 @@ def clear_nonexistent_import_errors(self, session):
         :param session: session for ORM operations
         """
         query = session.query(errors.ImportError)
-        if self._file_paths:
-            query = query.filter(~errors.ImportError.filename.in_(self._file_paths))
+        files = list_py_file_paths(self._dag_directory, include_examples=False, include_zip_paths=True)

Review Comment:
   I think there are two possible alternatives. One is to introduce a new attribute on DagFileProcessorManager that stores the “full” paths, so we can use it instead of `_file_paths` here. The other is to introduce a new column on ImportError that stores the filesystem path (i.e. the path to the zip file) so we can filter it against `_file_paths`.
   
   The root issue here is that both `_file_paths` and `ImportError.filename` essentially have a double meaning: each can represent either an actual filesystem entry (the path to an actual file) or a Python code loading target (a path for the interpreter). Right now `_file_paths` is a list of filesystem entries, while `ImportError.filename` is a code target, and trying to compare them is fundamentally not going to work.
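
   To make the mismatch concrete, here is a minimal standalone sketch. It does not use Airflow's code: the paths, the `filesystem_entry` helper, and the rule "an ancestor ending in `.zip` is the archive on disk" are all assumptions made for illustration. It shows how a plain membership test flags an error inside a zip as stale even though the zip still exists, and how mapping the code target back to a filesystem entry avoids that.

```python
from pathlib import PurePosixPath


def filesystem_entry(code_target: str) -> str:
    """Map a code-loading target to the on-disk entry that contains it.

    Hypothetical helper: assumes any ancestor whose name ends in ".zip"
    is an archive on disk, not a directory that happens to use that name.
    """
    for parent in PurePosixPath(code_target).parents:
        if parent.suffix == ".zip":
            return str(parent)
    # No zip ancestor: the target is itself a filesystem entry.
    return code_target


# Illustrative data only. file_paths plays the role of `_file_paths`
# (filesystem entries); the other list plays `ImportError.filename`.
file_paths = ["/dags/plain_dag.py", "/dags/bundle.zip"]
import_error_filenames = [
    "/dags/plain_dag.py",
    "/dags/bundle.zip/pkg/broken_dag.py",
    "/dags/deleted_dag.py",
]

# Naive comparison: the zip-internal file looks "nonexistent".
naive_stale = [f for f in import_error_filenames if f not in file_paths]

# Comparing like with like: only the deleted file is stale.
normalized_stale = [
    f for f in import_error_filenames if filesystem_entry(f) not in file_paths
]

print(naive_stale)
print(normalized_stale)
```

   Either alternative above amounts to making this comparison like-for-like, whether by storing the full paths on the manager or by storing the filesystem path alongside the error.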


