Posted to users@airflow.apache.org by Russell Jurney <ru...@gmail.com> on 2020/07/17 05:41:13 UTC

How do I spark-submit a Spark job to a master [EMR] through an SSH tunnel?

For starters: I am familiar with all the parts involved. I have created an
SSH connection, a tunnel from that connection, and a connection to the Spark
master that doesn't use SSH (so it can't connect). I have also looked at the
myriad ways to interact with Spark in Airflow, both in contrib and the main
package.
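
Concretely, the pieces I have look roughly like this (connection ids, hosts,
ports, and the job path are placeholders for my setup; this is a sketch, not
code I've verified):

    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.hooks.ssh_hook import SSHHook
    from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

    dag = DAG("emr_spark_submit", start_date=datetime(2020, 7, 1), schedule_interval=None)

    # SSH connection to the EMR master node, defined in the Airflow UI as "ssh_emr"
    ssh_hook = SSHHook(ssh_conn_id="ssh_emr")

    # The tunnel only exists as a hook call inside a task, not as a connection type:
    # ssh_hook.get_tunnel(remote_port=7077, remote_host="localhost", local_port=7077)

    # The Spark connection "spark_emr" points straight at the master, so from the
    # Airflow workers this operator can't reach it without a tunnel:
    submit = SparkSubmitOperator(
        task_id="submit_job",
        conn_id="spark_emr",
        application="/path/to/my_job.py",
        dag=dag,
    )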

*What I can't find a single discussion about is: how do I submit a Spark
job to a Spark master through an SSH tunnel?*

SSH tunnels are created in DAGs via the SSH hook rather than being modeled as
connections (which seems like a bad design decision), so I can't find a way to
define a connection to the Spark master that actually goes through a tunnel.
The spark-submit operators have no parameter for using an SSH tunnel either,
so I am stuck.
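
The closest thing I can imagine is gluing it together by hand inside a
PythonOperator callable, something like the sketch below (again, connection
ids, ports, and the job path are placeholders, and I haven't verified this
works):

    import subprocess

    from airflow.contrib.hooks.ssh_hook import SSHHook

    def submit_through_tunnel():
        ssh_hook = SSHHook(ssh_conn_id="ssh_emr")
        # Forward a local port to the Spark master port on the EMR master node
        with ssh_hook.get_tunnel(remote_port=7077, remote_host="localhost", local_port=7077):
            # Point spark-submit at the local end of the tunnel instead of the real master
            subprocess.check_call([
                "spark-submit",
                "--master", "spark://localhost:7077",
                "/path/to/my_job.py",
            ])

But that bypasses SparkSubmitOperator and its connection handling entirely,
which is what makes me think I'm missing something.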

Thanks,
Russell Jurney @rjurney <http://twitter.com/rjurney>
russell.jurney@gmail.com LI <http://linkedin.com/in/russelljurney> FB
<http://facebook.com/jurney> datasyndrome.com