You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/12/02 13:32:13 UTC

[GitHub] [airflow] tirkarthi opened a new issue, #28068: ImportError database entries get cleared for zip files due to filepath mismatch

tirkarthi opened a new issue, #28068:
URL: https://github.com/apache/airflow/issues/28068

   ### Apache Airflow version
   
   main (development)
   
   ### What happened
   
   ImportError table stores filename with full path to the Python file inside zip file. But during dag processing when the files are listed the python files inside for zipfiles are not listed and it returns just the zip files which makes the dag processor think the file is not present anymore and deletes the ImportError entry. Below `self._file_paths` doesn't have the full path to zip files.
   
   I will propose a patch for this so that the zip file paths are present during query time only since the processor itself doesn't need full paths,
   
   https://github.com/apache/airflow/blob/ada91b686508218752fee176d29d63334364a7f2/airflow/dag_processing/manager.py#L772-L783
   
   https://github.com/apache/airflow/blob/ada91b686508218752fee176d29d63334364a7f2/airflow/utils/file.py#L281
   
   ```
   airflow_db=> select filename from import_error;
                            filename                          
   -----------------------------------------------------------
    /home/karthikeyan/airflow/dags/error_dag.zip/error_dag.py
   ```
   
   ### What you think should happen instead
   
   The ImportError should not be deleted as long as the zip file is present.
   
   ### How to reproduce
   
   Test case
   
   ```patch
   diff --git a/tests/dag_processing/test_manager.py b/tests/dag_processing/test_manager.py
   index 7df0e78840..f62bfee2cd 100644
   --- a/tests/dag_processing/test_manager.py
   +++ b/tests/dag_processing/test_manager.py
   @@ -180,6 +180,51 @@ class TestDagFileProcessorManager:
            child_pipe.close()
            parent_pipe.close()
    
   +    @conf_vars({("core", "dagbag_import_error_tracebacks"): "False"})
   +    def test_zipfile_import_error_not_cleared(self, tmpdir):
   +        from zipfile import ZipFile
   +
   +        UNPARSEABLE_DAG_FILE_CONTENTS = "airflow dag"
   +        TEMP_DAG_FILENAME = "temp_dag.py"
   +
   +        zip_filename = os.path.join(tmpdir, "test_zip.zip")
   +        invalid_dag_filename = os.path.join(zip_filename, TEMP_DAG_FILENAME)
   +        with ZipFile(zip_filename, "w") as zip_file:
   +            zip_file.writestr(TEMP_DAG_FILENAME, "an invalid airflow DAG")
   +
   +        child_pipe, parent_pipe = multiprocessing.Pipe()
   +
   +        async_mode = "sqlite" not in conf.get("database", "sql_alchemy_conn")
   +        manager = DagFileProcessorManager(
   +            dag_directory=tmpdir,
   +            max_runs=1,
   +            processor_timeout=timedelta(days=365),
   +            signal_conn=child_pipe,
   +            dag_ids=[],
   +            pickle_dags=False,
   +            async_mode=async_mode,
   +        )
   +
   +        with create_session() as session:
   +            self.run_processor_manager_one_loop(manager, parent_pipe)
   +            import_errors = session.query(errors.ImportError).all()
   +            assert len(import_errors) == 1
   +
   +            manager.clear_nonexistent_import_errors(session)
   +            import_errors = session.query(errors.ImportError).all()
   +            assert len(import_errors) == 1
   +
   +            os.remove(zip_filename)
   +            manager.clear_nonexistent_import_errors(session)
   +            import_errors = session.query(errors.ImportError).all()
   +            assert len(import_errors) == 0
   +
   +            session.rollback()
   +
   +        child_pipe.close()
   +        parent_pipe.close()
   +
   +
        @conf_vars({("core", "load_examples"): "False"})
        def test_max_runs_when_no_files(self):
   ```
   
   ### Operating System
   
   Ubuntu
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Virtualenv installation
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] ImportError database entries get cleared for zip files due to filepath mismatch [airflow]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #28068: ImportError database entries get cleared for zip files due to filepath mismatch
URL: https://github.com/apache/airflow/issues/28068


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] ImportError database entries get cleared for zip files due to filepath mismatch [airflow]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #28068:
URL: https://github.com/apache/airflow/issues/28068#issuecomment-1840120979

   This issue has been automatically marked as stale because it has been open for 365 days without any activity. There has been several Airflow releases since last activity on this issue. Kindly asking to recheck the report against latest Airflow version and let us know if the issue is reproducible. The issue will be closed in next 30 days if no further activity occurs from the issue author.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #28068: ImportError database entries get cleared for zip files due to filepath mismatch

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #28068:
URL: https://github.com/apache/airflow/issues/28068#issuecomment-1336646697

   nice!. looking forward to a pr.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] ImportError database entries get cleared for zip files due to filepath mismatch [airflow]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #28068:
URL: https://github.com/apache/airflow/issues/28068#issuecomment-1878221437

   This issue has been closed because it has not received response from the issue author.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org