Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/02/01 22:29:06 UTC

[GitHub] [airflow] Gollum999 opened a new issue #21259: Custom Timetables must be imported relative to $PLUGINS_FOLDER

Gollum999 opened a new issue #21259:
URL: https://github.com/apache/airflow/issues/21259


   ### Apache Airflow version
   
   2.2.3 (latest released)
   
   ### What happened
   
   I wrote a custom Timetable following the [example](https://airflow.apache.org/docs/apache-airflow/stable/howto/timetable.html). `airflow plugins` reports that the plugin is registered correctly, and running the DAG script on the command line reports no errors. But once the webserver attempts to load the DAG, it complains:
   ```
   Broken DAG: [/home/tsanders/airflow/dags/test_airflow2.py] Traceback (most recent call last):
     File "/opt/conda/envs/airflow/lib/python3.9/site-packages/airflow/serialization/serialized_objects.py", line 269, in serialize_to_json
       serialized_object[key] = _encode_timetable(value)
     File "/opt/conda/envs/airflow/lib/python3.9/site-packages/airflow/serialization/serialized_objects.py", line 150, in _encode_timetable
       raise _TimetableNotRegistered(importable_string)
   airflow.serialization.serialized_objects._TimetableNotRegistered: Timetable class 'tb.airflow.plugins.live_timetable.LiveCronTimetable' is not registered
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
     File "/opt/conda/envs/airflow/lib/python3.9/site-packages/airflow/serialization/serialized_objects.py", line 935, in to_dict
       json_dict = {"__version": cls.SERIALIZER_VERSION, "dag": cls.serialize_dag(var)}
     File "/opt/conda/envs/airflow/lib/python3.9/site-packages/airflow/serialization/serialized_objects.py", line 847, in serialize_dag
       raise SerializationError(f'Failed to serialize DAG {dag.dag_id!r}: {e}')
   airflow.exceptions.SerializationError: Failed to serialize DAG 'test_airflow2_7': Timetable class 'tb.airflow.plugins.live_timetable.LiveCronTimetable' is not registered
   ```
   
   ### What you expected to happen
   
   The DAG should import successfully in all contexts.
   
   ### How to reproduce
   
   I eventually realized that the problem only arises when the Timetable is imported relative to a directory that is not the `$PLUGINS_FOLDER`.  In my case, I have changed `core.plugins_folder` to another directory that also happens to be on my `PYTHONPATH`.
   
   Steps to reproduce:
   ```
   $ mkdir ~/airflow_plugins
   $ export PYTHONPATH=~:$PYTHONPATH
   $ export AIRFLOW__CORE__PLUGINS_FOLDER=~/airflow_plugins
   
   $ cat ~/airflow_plugins/custom_timetable.py
   from airflow.plugins_manager import AirflowPlugin
   from airflow.timetables.interval import DeltaDataIntervalTimetable
   
   class CustomTimetable(DeltaDataIntervalTimetable):
       pass
   
   class CustomTimetablePlugin(AirflowPlugin):
       name = "custom_timetable_plugin"
       timetables = [CustomTimetable]
   
   $ cat ~/airflow/dags/dag.py
   #!/usr/bin/env python3
   
   from datetime import datetime, timedelta
   import sys
   
   from airflow import DAG, plugins_manager
   from airflow.operators.dummy import DummyOperator
   
   from airflow_plugins.custom_timetable import CustomTimetable as Bad
   from custom_timetable import CustomTimetable as Good
   
   plugins_manager.initialize_timetables_plugins()
   print(sys.path)
   print(plugins_manager.as_importable_string(Bad))
   print(plugins_manager.as_importable_string(Good))
   print(plugins_manager.timetable_classes)
   
   with DAG(
           'timetable_example',
           timetable=Bad(timedelta(hours=1)),  # this breaks
           # timetable=Good(timedelta(hours=1)),  # this doesn't
   ) as dag:
       task = DummyOperator(task_id='dummy', start_date=datetime.today())
   
   $ airflow plugins
   name                    | source                             
   ========================+====================================
   custom_timetable_plugin | $PLUGINS_FOLDER/custom_timetable.py
   
   $ ~/airflow/dags/dag.py
   ['/home/tsanders/airflow/dags', '/home/tsanders', '/opt/conda/envs/airflow/lib/python39.zip', '/opt/conda/envs/airflow/lib/python3.9', '/opt/conda/envs/airflow/lib/python3.9/lib-dynload', '/home/tsanders/.local/lib/python3.9/site-packages', '/opt/conda/envs/airflow/lib/python3.9/site-packages', '/home/tsanders/airflow/config', '/home/tsanders/airflow_plugins']
   airflow_plugins.custom_timetable.CustomTimetable
   custom_timetable.CustomTimetable
   {'custom_timetable.CustomTimetable': <class 'custom_timetable.CustomTimetable'>}
   
   # Load the web UI here; the 'timetable_example' DAG will be broken or not depending on which import was used
   ```
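
The output above suggests why serialization fails: the registry printed on the last line is keyed by `custom_timetable.CustomTimetable`, while the class used in the DAG reports itself as `airflow_plugins.custom_timetable.CustomTimetable`. A minimal sketch of that mismatch follows, assuming the lookup behaves like a plain dictionary keyed by `module.qualname`. The helper `importable_string`, the factory `make_class`, and the `registry` dict are hypothetical stand-ins for illustration, not Airflow's actual code.

```python
def importable_string(cls):
    # Hypothetical stand-in: build a "module.qualname" string for a class.
    return f"{cls.__module__}.{cls.__qualname__}"


def make_class(module_name):
    # Simulate the same source file being imported under a given module name.
    return type("CustomTimetable", (), {"__module__": module_name})


# Imported relative to the plugins folder.
good = make_class("custom_timetable")
# Imported through another sys.path entry that contains the plugins folder.
bad = make_class("airflow_plugins.custom_timetable")

# The plugin loader only ever sees the first spelling.
registry = {importable_string(good): good}

good_found = importable_string(good) in registry
bad_found = importable_string(bad) in registry
print(good_found, bad_found)
```

Only the name under which the plugin loader imported the class is present in the registry, so the second lookup misses even though both classes come from the same file.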
   
   ### Operating System
   
   CentOS 7.4
   
   ### Versions of Apache Airflow Providers
   
   N/A
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   Currently testing with `standalone` mode.
   
   ### Anything else
   
   Possibly related to #19869.
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] bmoon4 commented on issue #21259: Custom Timetables must be imported relative to $PLUGINS_FOLDER

Posted by GitBox <gi...@apache.org>.
bmoon4 commented on issue #21259:
URL: https://github.com/apache/airflow/issues/21259#issuecomment-1027565795


   
   I think the sample code is wrong.
   
   The asterisk (`*`) is missing from the parameter list of `infer_manual_data_interval`.
   
   https://airflow.apache.org/docs/apache-airflow/stable/howto/timetable.html#define-scheduling-logic
   ![image](https://user-images.githubusercontent.com/28991527/152092702-38f9febe-9553-4f26-9e57-53d57f424a01.png)
   
   
   
   https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/timetables/base/index.html#airflow.timetables.base.Timetable.infer_manual_data_interval
   ![image](https://user-images.githubusercontent.com/28991527/152092728-eac3953c-50fd-487c-a1d1-3acdb4425069.png)
   
   






[GitHub] [airflow] bmoon4 commented on issue #21259: Custom Timetables must be imported relative to $PLUGINS_FOLDER

Posted by GitBox <gi...@apache.org>.
bmoon4 commented on issue #21259:
URL: https://github.com/apache/airflow/issues/21259#issuecomment-1027651920


   Check this out https://stackoverflow.com/questions/69732193/airflow-2-2-timetable-for-schedule-always-with-error-timetable-not-registered/70948018





[GitHub] [airflow] uranusjr edited a comment on issue #21259: Custom Timetables must be imported relative to $PLUGINS_FOLDER

Posted by GitBox <gi...@apache.org>.
uranusjr edited a comment on issue #21259:
URL: https://github.com/apache/airflow/issues/21259#issuecomment-1027620661


   The asterisk is optional in this case. It means `run_after` can only be passed as a keyword argument, while the example code allows the caller to pass it as _either_ positional or keyword. But since Airflow only ever calls the function with keyword arguments, both syntaxes work. The example code is not wrong.
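
To make the difference concrete, here is a small sketch of the two signature styles. The function names and return values are hypothetical and only illustrate the calling convention; they are not the real `Timetable` methods.

```python
def infer_positional_or_keyword(run_after):
    # Style without the asterisk: positional or keyword.
    return ("interval for", run_after)


def infer_keyword_only(*, run_after):
    # Style with the asterisk: keyword only.
    return ("interval for", run_after)


# A keyword call works with either signature.
a = infer_positional_or_keyword(run_after="2022-02-01")
b = infer_keyword_only(run_after="2022-02-01")

# Only a positional call tells the two apart.
try:
    infer_keyword_only("2022-02-01")
    positional_rejected = False
except TypeError:
    positional_rejected = True

print(a == b, positional_rejected)
```

A caller that always passes `run_after=` by keyword cannot tell the two definitions apart.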
   
   As for the absolute import issue, unfortunately this is a (somewhat obscure and annoying) problem in Python. If the same directory is reachable through more than one `sys.path` entry, names imported through different entries have different identities.
   
   ```console
   $ tree
   .
   `-- plugins
       |-- __init__.py
       `-- mod.py
   $ cat plugins/mod.py
   class A:
       pass
   $ PYTHONPATH=plugins python -q  # Simulate how Airflow loads the plugin directory.
   >>> from plugins.mod import A as AFromAbsolute
   >>> from mod import A as AFromPlugin
   >>> AFromAbsolute == AFromPlugin
   False
   ```
   
   Because the timetable is loaded from the plugin, you must import it relative to the plugin directory, or from somewhere unrelated to the plugin directory hierarchy, such as `site-packages`. There are _probably_ hacks we could use to work around this, but this kind of “working above” the Python language rules does not seem like a good idea to me.
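
The console session above can also be reproduced as a self-contained script. This is a sketch under stated assumptions: the names `demo_plugins` and `demo_mod` are made up to avoid clashing with real modules, and the temporary directory stands in for a plugins folder that is reachable through two `sys.path` entries.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    pkg = Path(root) / "demo_plugins"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "demo_mod.py").write_text("class A:\n    pass\n")

    # Make the same file reachable two ways:
    # as demo_plugins.demo_mod (via root) and as demo_mod (via the package dir).
    sys.path[:0] = [root, str(pkg)]
    importlib.invalidate_caches()
    try:
        from_absolute = importlib.import_module("demo_plugins.demo_mod").A
        from_plugin = importlib.import_module("demo_mod").A
    finally:
        # Undo the sys.path and sys.modules changes made for the demo.
        sys.path.remove(root)
        sys.path.remove(str(pkg))
        for name in ("demo_plugins", "demo_plugins.demo_mod", "demo_mod"):
            sys.modules.pop(name, None)

# Python executed the file twice, once per module name, so there are two classes.
same_object = from_absolute is from_plugin
print(from_absolute.__module__, from_plugin.__module__, same_object)
```

Each module name gets its own entry in `sys.modules`, so the file is executed once per name and the two `A` classes are unrelated objects.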







[GitHub] [airflow] boring-cyborg[bot] commented on issue #21259: Custom Timetables must be imported relative to $PLUGINS_FOLDER

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #21259:
URL: https://github.com/apache/airflow/issues/21259#issuecomment-1027350749


   Thanks for opening your first issue here! Be sure to follow the issue template!
   






[GitHub] [airflow] bmoon4 commented on issue #21259: Custom Timetables must be imported relative to $PLUGINS_FOLDER

Posted by GitBox <gi...@apache.org>.
bmoon4 commented on issue #21259:
URL: https://github.com/apache/airflow/issues/21259#issuecomment-1027653593


   I had the same problem a couple of weeks ago. I found this https://stackoverflow.com/questions/69732193/airflow-2-2-timetable-for-schedule-always-with-error-timetable-not-registered/70948018 and the third answer helped me fix it.

