Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/05/05 20:22:04 UTC

[GitHub] [airflow] boittega opened a new issue #8723: Spark JDBC Hook fails if spark_conf is not specified

boittega opened a new issue #8723:
URL: https://github.com/apache/airflow/issues/8723


   **Apache Airflow version**: 1.10.10
   
   **What happened**:
   
   In SparkJDBCHook, the `spark_conf` parameter defaults to `None`; if it is left unset, submitting the job raises the following error:
   ```
   Traceback (most recent call last):
     File "/Users/rbottega/Documents/airflow_latest/env/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
       result = task_copy.execute(context=context)
     File "/Users/rbottega/Documents/airflow_latest/env/lib/python3.7/site-packages/airflow/contrib/operators/spark_jdbc_operator.py", line 211, in execute
       self._hook.submit_jdbc_job()
     File "/Users/rbottega/Documents/airflow_latest/env/lib/python3.7/site-packages/airflow/contrib/hooks/spark_jdbc_hook.py", line 243, in submit_jdbc_job
       "/spark_jdbc_script.py")
     File "/Users/rbottega/Documents/airflow_latest/env/lib/python3.7/site-packages/airflow/contrib/hooks/spark_submit_hook.py", line 383, in submit
       spark_submit_cmd = self._build_spark_submit_command(application)
     File "/Users/rbottega/Documents/airflow_latest/env/lib/python3.7/site-packages/airflow/contrib/hooks/spark_submit_hook.py", line 254, in _build_spark_submit_command
       for key in self._conf:
   TypeError: 'NoneType' object is not iterable
   ```
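   
   The failure reduces to iterating over `None`; a minimal snippet outside Airflow shows the same error (the loop mirrors the `for key in self._conf` line in `_build_spark_submit_command`):
   ```
   # Reproduces the TypeError in isolation: when spark_conf is omitted, the
   # hook passes None through, and the spark-submit command builder iterates it.
   conf = None
   
   for key in conf:  # same pattern as SparkSubmitHook._build_spark_submit_command
       print(key)
   # TypeError: 'NoneType' object is not iterable
   ```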
   
   **What you expected to happen**:
   
   Following the same behaviour as SparkSubmitHook, `spark_conf` should default to an empty dict (`{}`):
   ```
   self._conf = conf or {}
   ```
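   
   A sketch of how that guard could look in the hook's constructor (the class name below is an illustrative stand-in, not the actual hook code):
   ```
   class SparkJDBCHookSketch:
       """Hypothetical stand-in for SparkJDBCHook, showing only the guard."""
   
       def __init__(self, spark_conf=None):
           # Fall back to an empty dict so iterating self._conf later is a
           # harmless no-op instead of a TypeError.
           self._conf = spark_conf or {}
   
   hook = SparkJDBCHookSketch()
   assert hook._conf == {}  # safe to iterate even when spark_conf is omitted
   ```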
   
   **How to reproduce it**:
   Create a DAG with a SparkJDBCOperator task and do not specify the `spark_conf` parameter:
   ```
   from airflow.contrib.operators.spark_jdbc_operator import SparkJDBCOperator
   
   spark_to_jdbc_job = SparkJDBCOperator(
       cmd_type='spark_to_jdbc',
       jdbc_table="foo",
       spark_jars="${SPARK_HOME}/jars/postgresql-42.2.12.jar",
       jdbc_driver="org.postgresql.Driver",
       metastore_table="bar",
       save_mode="append",
       task_id="spark_to_jdbc_job"
   )
   ```
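   
   Until the default changes, passing an empty dict explicitly should avoid the crash (an untested workaround, inferred from the traceback):
   ```
   # Same task as above, with spark_conf passed explicitly as a workaround.
   spark_to_jdbc_job = SparkJDBCOperator(
       cmd_type='spark_to_jdbc',
       spark_conf={},  # explicit empty dict avoids iterating over None
       jdbc_table="foo",
       spark_jars="${SPARK_HOME}/jars/postgresql-42.2.12.jar",
       jdbc_driver="org.postgresql.Driver",
       metastore_table="bar",
       save_mode="append",
       task_id="spark_to_jdbc_job"
   )
   ```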
   
   
   **Anything else we need to know**:
   
   I am happy to implement this change.

