Posted to commits@airflow.apache.org by "t oo (Jira)" <ji...@apache.org> on 2019/09/04 23:06:00 UTC

[jira] [Updated] (AIRFLOW-5355) 1.10.4 upgrade issues - No module named kubernetes (but i'm using localexecutor)

     [ https://issues.apache.org/jira/browse/AIRFLOW-5355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

t oo updated AIRFLOW-5355:
--------------------------
    Description: 
I upgraded from 1.10.3 to 1.10.4 just now, but on the main page of the UI, when I click the refresh button next to my dag it shows the error 'Broken DAG: [x.py] No module named kubernetes'.

I have debug logging enabled but have no idea why it is trying to import kubernetes.

My install steps:

pip install cryptography mysqlclient ldap3 gunicorn[gevent]
pip install kubernetes
pip install apache-airflow-1.10.4-bin.tar.gz
pip install apache-airflow-1.10.4-bin.tar.gz[kubernetes]
airflow initdb
airflow upgradedb

My only dag has:

import datetime as dt
import glob
import json
import logging
import os
import re
import subprocess
from airflow import DAG
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator
from airflow.operators.python_operator import PythonOperator, BranchPythonOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.hooks.base_hook import BaseHook
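
For context, a minimal sketch of how these imports are typically wired together, reusing the imports above; the dag_id, schedule and application path are hypothetical, not from the actual dag:

default_args = {'owner': 'airflow', 'start_date': dt.datetime(2019, 8, 1)}

with DAG(dag_id='example_spark_dag', default_args=default_args,
         schedule_interval='@daily') as dag:
    start = DummyOperator(task_id='start')
    submit = SparkSubmitOperator(
        task_id='spark_submit_job',
        application='/path/to/job.py',   # hypothetical spark job
        conn_id='spark_default',         # connection to the standalone master
    )
    start >> submit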

 

 

Environment: airflow 1.10.4, LocalExecutor, python2, t3.large EC2, Spark standalone scheduler.

It worked fine in 1.10.3.

Some things that may be relevant:
 # When I query the dag table in the mysql metastore, some dags have last_expired populated with a timestamp but some have last_expired empty (a quick check is sketched after this list).
 # I am doing a blue-green deploy, so I have one EC2 running 1.10.3 and one EC2 running 1.10.4, but both EC2s talk to a common mysql metastore.
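
The quick check referenced above, using the mysqlclient driver from the install steps; host and credentials below are placeholders:

import MySQLdb  # provided by the mysqlclient package

conn = MySQLdb.connect(host='metastore-host', user='airflow',
                       passwd='***', db='airflow')  # placeholder credentials
cur = conn.cursor()
cur.execute("SELECT dag_id, last_expired FROM dag")
for dag_id, last_expired in cur.fetchall():
    print('%s %s' % (dag_id, last_expired))
conn.close()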

 

UPDATE: As an awful workaround I commented out all references to kube*/pod in many .py files under site-packages!
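
Before resorting to that, a quick diagnostic to confirm whether the kubernetes client is importable from the exact interpreter the webserver/scheduler uses (just a sketch, not an Airflow API):

import importlib

try:
    importlib.import_module('kubernetes')
    print('kubernetes client is importable')
except ImportError as exc:
    print('kubernetes client missing: %s' % exc)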

 

 

some other small issues:

a) The tutorial.py dag is in the UI; how do I remove it? When I click delete it says tutorial.py not found. UPDATE: deleting the dag from the CLI removed this (see the command below).
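
The CLI removal in 1.10.x is along these lines, assuming the example's dag_id is literally 'tutorial':

airflow delete_dag tutorial

Setting load_examples = False under [core] in airflow.cfg also stops the bundled example dags from being parsed in the first place.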

b) [2019-08-30 04:12:57,714] {scheduler_job.py:924} WARNING - Tasks using non-existent pool '' will not be scheduled appears in the logs. UPDATE: this was due to pool=None in the spark_submit task; once that was removed it could run (see the sketch below).
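
For anyone hitting the same warning, drop the kwarg rather than passing None; a sketch, with hypothetical task_id and application:

submit = SparkSubmitOperator(
    task_id='spark_submit_job',
    application='/path/to/job.py',
    # pool=None,       # an explicit None seems to be looked up as a pool named ''
    # pool='my_pool',  # either omit the kwarg or name a pool that actually exists
)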

c) This is in the logs. UPDATE: [https://github.com/apache/airflow/pull/5330#issuecomment-526919369] mentions this is expected.

airflow-scheduler.log-[2019-08-30 09:05:38,451] {settings.py:327} DEBUG - Failed to import airflow_local_settings.
airflow-scheduler.log-Traceback (most recent call last):
airflow-scheduler.log- File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/settings.py", line 315, in import_local_settings
airflow-scheduler.log- import airflow_local_settings
airflow-scheduler.log:ImportError: No module named airflow_local_settings
airflow-scheduler.log-[2019-08-30 09:05:38,452] {logging_config.py:59} DEBUG - Unable to load custom logging, using default config instead
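
This DEBUG line only means no airflow_local_settings module was found, which is the normal case. If you want it gone, even a near-empty module on the scheduler's PYTHONPATH (for example $AIRFLOW_HOME/config/airflow_local_settings.py) is enough; a sketch, with an optional no-op policy hook:

# airflow_local_settings.py
def policy(task):
    # optional cluster policy hook; Airflow calls it per task when it is defined
    pass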

> 1.10.4 upgrade issues - No module named kubernetes (but i'm using localexecutor)
> --------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5355
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5355
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: ui
>    Affects Versions: 1.10.4
>            Reporter: t oo
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.2#803003)