Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/08/11 11:41:02 UTC

[GitHub] [airflow] jgr-trackunit commented on issue #25640: Status of testing Providers that were prepared on August 10, 2022

jgr-trackunit commented on issue #25640:
URL: https://github.com/apache/airflow/issues/25640#issuecomment-1211875172

   Hi!
   
   I've found an issue with the `databricks` provider; at first glance it looks like it's related to: https://github.com/apache/airflow/pull/25115
   
   @alexott I think this might be interesting for you.
   
   More info below:
   ```
   [2022-08-11, 11:10:43 UTC] {{standard_task_runner.py:91}} ERROR - Failed to execute job 34817 for task xxxxx
   Traceback (most recent call last):
     File "/usr/local/lib/python3.7/site-packages/airflow/task/task_runner/standard_task_runner.py", line 85, in _start_by_fork
       args.func(args, dag=self.dag)
     File "/usr/local/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 48, in command
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.7/site-packages/airflow/utils/cli.py", line 92, in wrapper
       return f(*args, **kwargs)
     File "/usr/local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 292, in task_run
       _run_task_by_selected_method(args, dag, ti)
     File "/usr/local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 107, in _run_task_by_selected_method
       _run_raw_task(args, ti)
     File "/usr/local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 184, in _run_raw_task
       error_file=args.error_file,
     File "/usr/local/lib/python3.7/site-packages/airflow/utils/session.py", line 70, in wrapper
       return func(*args, session=session, **kwargs)
     File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1332, in _run_raw_task
       self._execute_task_with_callbacks(context)
     File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1458, in _execute_task_with_callbacks
       result = self._execute_task(context, self.task)
     File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1514, in _execute_task
       result = execute_callable(context=context)
     File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/providers/databricks/operators/databricks.py", line 374, in execute
       self.run_id = self._hook.submit_run(self.json)
     File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/providers/databricks/hooks/databricks.py", line 152, in submit_run
       response = self._do_api_call(SUBMIT_RUN_ENDPOINT, json)
     File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/providers/databricks/hooks/databricks_base.py", line 493, in _do_api_call
       headers = {**self.user_agent_header, **aad_headers}
     File "/usr/local/lib/python3.7/site-packages/cached_property.py", line 36, in __get__
       value = obj.__dict__[self.func.__name__] = self.func(obj)
     File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/providers/databricks/hooks/databricks_base.py", line 136, in user_agent_header
       return {'user-agent': self.user_agent_value}
     File "/usr/local/lib/python3.7/site-packages/cached_property.py", line 36, in __get__
       value = obj.__dict__[self.func.__name__] = self.func(obj)
     File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/providers/databricks/hooks/databricks_base.py", line 144, in user_agent_value
       if provider.is_source:
   AttributeError: 'ProviderInfo' object has no attribute 'is_source'
   ```
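   The traceback ends in `provider.is_source` raising an `AttributeError`, which suggests the installed provider package expects a `ProviderInfo` field that the Airflow core on MWAA 2.2.2 does not define yet. A minimal sketch of that failure mode (using a hypothetical `OldProviderInfo` namedtuple as a stand-in for the real class, and `getattr` as one possible defensive lookup):
   ```python
   from collections import namedtuple

   # Hypothetical stand-in for Airflow's ProviderInfo: an older core
   # version defines it without the ``is_source`` field that newer
   # databricks provider code reads in ``user_agent_value``.
   OldProviderInfo = namedtuple("OldProviderInfo", ["version", "provider_info"])

   provider = OldProviderInfo("3.2.0", {})

   # Direct attribute access reproduces the reported AttributeError:
   try:
       provider.is_source
   except AttributeError as exc:
       print(exc)  # 'OldProviderInfo' object has no attribute 'is_source'

   # A defensive lookup avoids the crash when the field may be absent:
   is_source = getattr(provider, "is_source", False)
   print(is_source)  # False
   ```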
   
   I ran it on `MWAA == 2.2.2` with the configuration below:
   ```
   new_cluster = {
       "autoscale": {"min_workers": 1, "max_workers": 2},
       "cluster_name": "",
       "spark_version": get_spark_version(),
       "spark_conf": Variable.get("SPARK_CONF", deserialize_json=True, default_var="{}"),
       "aws_attributes": {
           "first_on_demand": 1,
           "availability": "SPOT_WITH_FALLBACK",
           "zone_id": "auto",
           "instance_profile_arn": Variable.get("E2_INSTANCE_PROFILE_ARN", default_var=""),
           "spot_bid_price_percent": 100,
       },
       "enable_elastic_disk": True,
       "node_type_id": "r5a.xlarge",
       "ssh_public_keys": [],
       "custom_tags": {"Application": "databricks", "env": env, "AnalyticsTask": "task name"},
       "spark_env_vars": {},
       "cluster_source": "JOB",
       "init_scripts": [],
   }
   
   with DAG(
       dag_id="dag id",
       description="desc",
       default_args=default_args,
       schedule_interval="0 2 * * *",  # Every night at 02:00
       catchup=False,
       max_active_runs=1,
       concurrency=1,
       is_paused_upon_creation=dag_is_paused_upon_creation,
   ) as dag:
   
       task = DatabricksSubmitRunOperator(
           task_id="task-name",
           databricks_conn_id="connection-name",
           new_cluster=new_cluster,
           notebook_task="notebook task",
           timeout_seconds=3600 * 4,  # 4 hours
           polling_period_seconds=30,
           retries=1,
       )
   ```
   
   Let me know if you need more details.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org