Posted to user@spark.apache.org by "Kelly, Jonathan" <jo...@amazon.com> on 2015/03/18 01:15:16 UTC

Using Spark with a SOCKS proxy

I'm trying to figure out how I might be able to use Spark with a SOCKS proxy.  That is, my dream is to be able to write code in my IDE, then run it without much trouble on a remote cluster that is accessible only via a SOCKS proxy between the local development machine and the master node of the cluster (ignoring, for now, any dependencies that would need to be transferred--assume it's a very simple app with no dependencies that aren't already part of the Spark classpath on the cluster).  This is possible with Hadoop by setting hadoop.rpc.socket.factory.class.default to org.apache.hadoop.net.SocksSocketFactory and hadoop.socks.server to localhost:<port on which a SOCKS proxy has been opened via "ssh -D" to the master node>.  However, I can't find anything like this for Spark, and I've seen only a few mentions of it on the user list and on Stack Overflow, with no real answers.  (See links below.)
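
For concreteness, the Hadoop settings I'm referring to can also be set programmatically; a minimal sketch (the port is just whatever was passed to "ssh -D"):

    import org.apache.hadoop.conf.Configuration

    // Route Hadoop RPC over the local SOCKS proxy opened with
    // "ssh -D 2600 <master node public name>". Port 2600 is only an
    // example; use whichever port the proxy is actually listening on.
    val hadoopConf = new Configuration()
    hadoopConf.set("hadoop.rpc.socket.factory.class.default",
      "org.apache.hadoop.net.SocksSocketFactory")
    hadoopConf.set("hadoop.socks.server", "localhost:2600")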

I thought I might be able to use the JVM's -DsocksProxyHost and -DsocksProxyPort system properties, but that does not seem to work either.  That is, if I start a SOCKS proxy to my master node using something like "ssh -D 2600 <master node public name>" and then run a simple Spark app that calls SparkConf.setMaster("spark://<master node private IP>:7077"), passing in JVM args of "-DsocksProxyHost=localhost -DsocksProxyPort=2600", the driver hangs for a while before finally giving up ("Application has been killed. Reason: All masters are unresponsive! Giving up.").  It seems like it is not even attempting to use the SOCKS proxy.  Do -DsocksProxyHost/-DsocksProxyPort just not work for Spark?
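
Here's roughly what I'm running, reduced to a minimal sketch (the master address is a placeholder, and the driver JVM is launched with -DsocksProxyHost=localhost -DsocksProxyPort=2600):

    import org.apache.spark.{SparkConf, SparkContext}

    // Minimal driver app that only needs to reach the standalone master.
    // <master node private IP> is a placeholder for the real address.
    val conf = new SparkConf()
      .setAppName("SocksProxyTest")
      .setMaster("spark://<master node private IP>:7077")
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 10).sum()) // trivial job, no extra deps
    sc.stop()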

http://stackoverflow.com/questions/28047000/connect-to-spark-through-a-socks-proxy (unanswered similar question from somebody else about a month ago)
https://issues.apache.org/jira/browse/SPARK-5004 (unresolved, somewhat related JIRA from a few months ago)

Thanks,
Jonathan

Re: Using Spark with a SOCKS proxy

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Did you try ssh tunneling instead of SOCKS?
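
Something along these lines, i.e. forwarding the master port directly over ssh (an untested sketch, reusing the placeholders from your mail):

    # Forward local port 7077 to the Spark master's port through ssh,
    # then point the driver at spark://localhost:7077 instead.
    ssh -N -L 7077:<master node private IP>:7077 <master node public name>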

Thanks
Best Regards

On Wed, Mar 18, 2015 at 5:45 AM, Kelly, Jonathan <jo...@amazon.com>
wrote:

>  I'm trying to figure out how I might be able to use Spark with a SOCKS
> proxy.  That is, my dream is to be able to write code in my IDE then run it
> without much trouble on a remote cluster, accessible only via a SOCKS proxy
> between the local development machine and the master node of the
> cluster (ignoring, for now, any dependencies that would need to be
> transferred--assume it's a very simple app with no dependencies that aren't
> part of the Spark classpath on the cluster).  This is possible with Hadoop
> by setting hadoop.rpc.socket.factory.class.default to
> org.apache.hadoop.net.SocksSocketFactory and hadoop.socks.server to
> localhost:<port on which a SOCKS proxy has been opened via "ssh -D" to the
> master node>.  However, I can't seem to find anything like this for Spark,
> and I only see very few mentions of it on the user list and on
> stackoverflow, with no real answers.  (See links below.)
>
>  I thought I might be able to use the JVM's -DsocksProxyHost and
> -DsocksProxyPort system properties, but it still does not seem to work.
> That is, if I start a SOCKS proxy to my master node using something like
> "ssh -D 2600 <master node public name>" then run a simple Spark app that
> calls SparkConf.setMaster("spark://<master node private IP>:7077"), passing
> in JVM args of "-DsocksProxyHost=localhost -DsocksProxyPort=2600", the
> driver hangs for a while before finally giving up ("Application has been
> killed. Reason: All masters are unresponsive! Giving up.").  It seems like
> it is not even attempting to use the SOCKS proxy.  Do
> -DsocksProxyHost/-DsocksProxyPort not even work for Spark?
>
>
> http://stackoverflow.com/questions/28047000/connect-to-spark-through-a-socks-proxy (unanswered
> similar question from somebody else about a month ago)
> https://issues.apache.org/jira/browse/SPARK-5004 (unresolved, somewhat
> related JIRA from a few months ago)
>
>  Thanks,
>  Jonathan
>
