Posted to commits@airflow.apache.org by "Skyler Lehan (JIRA)" <ji...@apache.org> on 2019/07/22 15:50:00 UTC

[jira] [Updated] (AIRFLOW-5017) Cannot Parse Environment Variable Connection for Spark on K8s

     [ https://issues.apache.org/jira/browse/AIRFLOW-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Skyler Lehan updated AIRFLOW-5017:
----------------------------------
    Description: 
Currently, if you create a Spark on Kubernetes [connection with environment variables|https://airflow.apache.org/howto/connection/index.html#creating-a-connection-with-environment-variables] for the SparkSubmitOperator, urllib is unable to parse the connection URI correctly because of the double scheme that Spark expects for Kubernetes masters (k8s://https://<k8s cluster hostname>:<k8s cluster port>).
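For reference, the bad parse is reproducible with urllib alone, outside of Airflow. A quick sketch (plain Python, not Airflow code) shows where the pieces of the URI end up, which matches the Host/Schema values in the task log below:
{code:python}
from urllib.parse import urlparse

# The double-scheme master URI that Spark expects for Kubernetes
uri = "k8s://https://localhost:8080"

parts = urlparse(uri)
print(parts.scheme)    # 'k8s'
print(parts.netloc)    # 'https:'            -> why the log shows Host: https
print(parts.path)      # '//localhost:8080'  -> why the log shows Schema: /localhost:8080
print(parts.hostname)  # 'https'
print(parts.port)      # None
{code}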

To test this, first set an environment variable for the connection:
{code:bash}
$ export AIRFLOW_CONN_SPARK_K8S=k8s://https://localhost:8080{code}
Add the following example DAG:
{code:python}
from datetime import datetime, timedelta
from airflow import DAG
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

default_args = {
  'owner': 'airflow',
  'depends_on_past': False,
  'start_date': datetime(2019, 7, 19),
  'retries': 0
}

dag = DAG('spark_submit_k8s', default_args=default_args, schedule_interval=timedelta(days=1))

run_example_jar = SparkSubmitOperator(
  task_id='spark_submit_example',
  application='file:///usr/local/spark-2.4.2/examples/jars/spark-examples_2.11-2.4.2.jar',
  java_class='org.apache.spark.examples.SparkPi',
  conn_id='spark_k8s',
  dag=dag
)

{code}
This fails to parse as per the logs:
{code}
[2019-07-22 15:05:58,925] {logging_mixin.py:95} INFO - [2019-07-22 15:05:58,925] {base_hook.py:83} INFO - Using connection to: id: spark_k8s. Host: https, Port: None, Schema: /localhost:8080, Login: None, Password: None, extra: {}
{code}
Because of the leading "k8s://", urllib fails to parse the connection URI correctly: the second scheme ends up as the host, and the real host and port are pushed into the schema field.
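Not necessarily the right fix, but as an illustration of one possible direction: stripping the "k8s://" prefix before handing the URI to urllib makes the host and port come out as expected, and the prefix could be reattached when building the Spark master. The helper below is purely hypothetical, not existing Airflow code:
{code:python}
from urllib.parse import urlparse

def parse_spark_k8s_uri(uri):
    """Hypothetical helper: handle the double scheme k8s://https://<host>:<port>
    by stripping the k8s:// prefix before parsing."""
    prefix = "k8s://"
    if uri.startswith(prefix):
        inner = urlparse(uri[len(prefix):])
        # Reattach the prefix so the Spark master keeps its k8s:// form
        return {"host": prefix + inner.scheme + "://" + inner.hostname,
                "port": inner.port}
    parts = urlparse(uri)
    return {"host": parts.hostname, "port": parts.port}

print(parse_spark_k8s_uri("k8s://https://localhost:8080"))
# {'host': 'k8s://https://localhost', 'port': 8080}
{code}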


> Cannot Parse Environment Variable Connection for Spark on K8s
> -------------------------------------------------------------
>
>                 Key: AIRFLOW-5017
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5017
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: models
>    Affects Versions: 1.10.3
>            Reporter: Skyler Lehan
>            Priority: Critical



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)