You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "Skyler Lehan (JIRA)" <ji...@apache.org> on 2019/07/22 15:46:00 UTC

[jira] [Created] (AIRFLOW-5017) Cannot Parse Environment Variable Connection for Spark on K8s

Skyler Lehan created AIRFLOW-5017:
-------------------------------------

             Summary: Cannot Parse Environment Variable Connection for Spark on K8s
                 Key: AIRFLOW-5017
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5017
             Project: Apache Airflow
          Issue Type: Bug
          Components: models
    Affects Versions: 1.10.3
            Reporter: Skyler Lehan


Currently, if you create a Spark on Kubernetes based URLĀ [connection with environment variables]([https://airflow.apache.org/howto/connection/index.html#creating-a-connection-with-environment-variables]) for the Spark Submit operator, urllib is unable to parse it correctly due to the double scheme that Spark expects (k8s://https://<k8s cluster hostname>:<k8s cluster port>). 

To test this, first set an environment variable for the connection:
{code:java}
$ export AIRFLOW_CONN_SPARK_K8S=k8s://https://localhost:8080{code}
Add the following example DAG:
{code:java}
from datetime import datetime, timedelta
from airflow import DAG
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

default_args = {
  'owner': 'airflow',
  'depends_on_past': False,
  'start_date': datetime(2019, 7, 19),
  'retries': 0
}

dag = DAG('spark_submit_k8s', default_args=default_args, schedule_interval=timedelta(days=1))

run_example_jar = SparkSubmitOperator(
  task_id='spark_submit_example',
  application='file:///usr/local/spark-2.4.2/examples/jars/spark-examples_2.11-  2.4.2.jar',
  java_class='org.apache.spark.examples.SparkPi',
  conn_id='spark_k8s',
  dag=dag
)

{code}
This fails to parse as per the logs:
{code:java}
[2019-07-22 15:05:58,925] {logging_mixin.py:95} INFO - [2019-07-22 15:05:58,925] {base_hook.py:83} INFO - Using connection to: id: spark_k8s. Host: https, Port: None, Schema: /localhost:8080, Login: None, Password: None, extra: {}
{code}
Because of the preceding "k8s://" urllib fails to parse this correctly.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)