Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/07/21 11:54:52 UTC

[GitHub] [airflow] GBAAM opened a new issue #17132: 2.1.2 (py 3.6 and 3.7) Docker Image broken

GBAAM opened a new issue #17132:
URL: https://github.com/apache/airflow/issues/17132


   The Docker image for 2.1.2 (default Python 3.6, https://airflow.apache.org/docs/apache-airflow/2.1.2/docker-compose.yaml) is missing a few Python packages:
   
   - `celery`
   - `redis`
   - `python-psycopg2`
   - `psycopg2-binary`
   - `apache-airflow` (at the individual worker level)
   
   Unfortunately, installing them individually failed due to some Python 3.6 incompatibilities.
   
   At the same time, we tried using the Python 3.8 image `airflow:2.1.2-python3.8`, injecting the installation of both `redis` and `apache-airflow[celery]` into a custom image. While this solved a few problems, it generated a set of new ones.
   
   When running airflow-init via docker-compose, we are presented with a new set of errors:
      
   
   > 2021-07-21 11:30:32.803 UTC [140] ERROR: column dag.last_parsed_time does not exist at character 166
   > 2021-07-21 11:30:32.803 UTC [140] STATEMENT: SELECT dag.dag_id AS dag_dag_id, dag.root_dag_id AS dag_root_dag_id, dag.is_paused AS dag_is_paused, dag.is_subdag AS dag_is_subdag, dag.is_active AS dag_is_active, dag.last_parsed_time AS dag_last_parsed_time, dag.last_pickled AS dag_last_pickled, dag.last_expired AS dag_last_expired, dag.scheduler_lock AS dag_scheduler_lock, dag.pickle_id AS dag_pickle_id, dag.fileloc AS dag_fileloc, dag.owners AS dag_owners, dag.description AS dag_description, dag.default_view AS dag_default_view, dag.schedule_interval AS dag_schedule_interval, dag.concurrency AS dag_concurrency, dag.has_task_concurrency_limits AS dag_has_task_concurrency_limits, dag.next_dagrun AS dag_next_dagrun, dag.next_dagrun_create_after AS dag_next_dagrun_create_after 
   > 
   > 2021-07-21 11:30:22.511 UTC [106] STATEMENT: INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) VALUES ('2021-07-21T11:30:22.493764+00:00'::timestamptz, NULL, NULL, 'cli_scheduler', NULL, 'root', '{"host_name": "47eb5dd6ccc3", "full_command": "[''/home/airflow/.local/bin/airflow'', ''scheduler'']"}') RETURNING log.id
   > 
   >  2021-07-21 11:30:22.517 UTC [106] ERROR: relation "job" does not exist at character 13
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk closed issue #17132: 2.1.2 (py 3.6 and 3.8) Docker Image broken

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #17132:
URL: https://github.com/apache/airflow/issues/17132


   





[GitHub] [airflow] potiuk commented on issue #17132: 2.1.2 (py 3.6 and 3.8) Docker Image broken

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #17132:
URL: https://github.com/apache/airflow/issues/17132#issuecomment-884319302


   Also be aware that the image has an entrypoint which sets a number of variables and performs a number of checks. Your confusion might also come from the fact that you used your own entrypoint and did not run the original one. If you have your own entrypoint, you should make sure you exec into the image's original entrypoint after you do your own setup.
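   The advice above can be sketched as a wrapper script (a sketch, not the official pattern: `/opt/custom/my-setup.sh` is a hypothetical placeholder for your own steps, and `/entrypoint` is the entrypoint script shipped in the `apache/airflow` production image):
   
   ```
   #!/usr/bin/env bash
   # Hypothetical custom entrypoint: run your own setup first...
   /opt/custom/my-setup.sh
   
   # ...then exec into the image's original entrypoint so its
   # environment setup and checks still run.
   exec /entrypoint "$@"
   ```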





[GitHub] [airflow] boring-cyborg[bot] commented on issue #17132: 2.1.2 (py 3.6 and 3.7) Docker Image broken

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #17132:
URL: https://github.com/apache/airflow/issues/17132#issuecomment-884131226


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] potiuk edited a comment on issue #17132: 2.1.2 (py 3.6 and 3.8) Docker Image broken

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #17132:
URL: https://github.com/apache/airflow/issues/17132#issuecomment-884315771


   Not really. All the packages are there:
   
   ```
   docker run -it apache/airflow:2.1.2 bash
   airflow@af2bf98c9084:/opt/airflow$ pip freeze  | grep celery
   apache-airflow-providers-celery==2.0.0
   celery==4.4.7
   airflow@af2bf98c9084:/opt/airflow$ pip freeze  | grep redis
   apache-airflow-providers-redis==2.0.0
   google-cloud-redis==2.1.1
   redis==3.5.3
   airflow@af2bf98c9084:/opt/airflow$ pip freeze  | grep psycopg2
   psycopg2-binary==2.9.1
   ```
   
   The `libpq-dev` dependency should not be there. The Airflow production image is highly optimized for runtime size, so it does not contain any `-dev` dependencies (nor does it contain `build-essential`). Those are only needed in the "build" stage of the image build and are not installed in the "final" image, where only runtime dependencies are added. That makes the image half the size it would be if all dev dependencies and build essentials were included.
   
   If you want to add your own custom dependencies, you can build your own image by following the instructions at https://airflow.apache.org/docs/docker-stack/build.html. If you need `build-essential` and `-dev` dependencies, you should take the "custom image" route rather than the "extend the image" one.
   
   I also recommend taking a look at my talk, where I explain how and why it is done: https://www.youtube.com/watch?v=wDr3Y7q2XoI
   
   Note that in the production image all packages are installed in `${HOME}/.local` (i.e. with the `--user` flag) so that you can more easily extend the image without switching to the root user. Perhaps that is the root cause of your impression that the packages are not installed.
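   An "extend the image" Dockerfile along those lines might look like the sketch below (versions and package choices are illustrative, taken from the packages discussed in this issue; when a binary wheel such as `psycopg2-binary` suffices, no `-dev` packages or `build-essential` are needed):
   
   ```
   FROM apache/airflow:2.1.2-python3.8
   # Runs as the airflow user; packages land in ${HOME}/.local,
   # matching how the production image installs everything.
   RUN pip install --no-cache-dir \
       "apache-airflow[celery]==2.1.2" \
       redis \
       psycopg2-binary
   ```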
   
   





[GitHub] [airflow] GBAAM commented on issue #17132: 2.1.2 (py 3.6 and 3.8) Docker Image broken

Posted by GitBox <gi...@apache.org>.
GBAAM commented on issue #17132:
URL: https://github.com/apache/airflow/issues/17132#issuecomment-884188597


   Addition: in the 3.8 image, the dependency `libpq-dev` is also missing





[GitHub] [airflow] potiuk commented on issue #17132: 2.1.2 (py 3.6 and 3.8) Docker Image broken

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #17132:
URL: https://github.com/apache/airflow/issues/17132#issuecomment-884315771


   Not really. All the packages are there:
   
   ```
   docker run -it apache/airflow:2.1.2 bash
   airflow@af2bf98c9084:/opt/airflow$ pip | grep celery 
   airflow@af2bf98c9084:/opt/airflow$ pipdeptree | grep celery 
   bash: pipdeptree: command not found
   airflow@af2bf98c9084:/opt/airflow$ pip freeze  | grep celery
   apache-airflow-providers-celery==2.0.0
   celery==4.4.7
   airflow@af2bf98c9084:/opt/airflow$ pip freeze  | grep redis
   apache-airflow-providers-redis==2.0.0
   google-cloud-redis==2.1.1
   redis==3.5.3
   airflow@af2bf98c9084:/opt/airflow$ pip freeze  | grep psycopg2
   psycopg2-binary==2.9.1
   ```
   
   The `libpq-dev` dependency should not be there. The Airflow production image is highly optimized for runtime size, so it does not contain any `-dev` dependencies (nor does it contain `build-essential`). Those are only needed in the "build" stage of the image build and are not installed in the "final" image, where only runtime dependencies are added. That makes the image half the size it would be if all dev dependencies and build essentials were included.
   
   If you want to add your own custom dependencies, you can build your own image by following the instructions at https://airflow.apache.org/docs/docker-stack/build.html. If you need `build-essential` and `-dev` dependencies, you should take the "custom image" route rather than the "extend the image" one.
   
   I also recommend taking a look at my talk, where I explain how and why it is done: https://www.youtube.com/watch?v=wDr3Y7q2XoI
   
   Note that in the production image all packages are installed in `${HOME}/.local` (i.e. with the `--user` flag) so that you can more easily extend the image without switching to the root user. Perhaps that is the root cause of your impression that the packages are not installed.
   
   

