Posted to user@spark.apache.org by Scott Clasen <sc...@gmail.com> on 2014/05/05 18:38:21 UTC
Comprehensive Port Configuration reference?
Is there somewhere documented how one would go about configuring every open
port a spark application needs?
This seems like one of the main things that makes running Spark hard in
places like EC2 where you aren't using the canned Spark scripts.
Starting an app looks like you'll see ports open for
BlockManager
MapOutputTracker
FileServer
WebUI
Local port to get callbacks from the Mesos master...
What else?
How do I configure all of these?
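Most of these endpoints can be pinned to fixed ports through Spark configuration properties, which makes firewall rules tractable. A sketch of a conf/spark-defaults.conf using Spark 1.x-era property names (the port numbers here are arbitrary examples, and exact property names vary by release, so verify against your version's documentation):

```properties
# Fixed ports for a Spark application (example values, Spark 1.x names)
spark.driver.port        51000   # driver <-> executors / cluster manager
spark.fileserver.port    51100   # driver's HTTP file server (jars/files)
spark.broadcast.port     51200   # driver's HTTP broadcast server
spark.blockManager.port  51400   # BlockManager on driver and executors
spark.ui.port            4040    # application web UI
```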
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Comprehensive-Port-Configuration-reference-tp5384.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Comprehensive Port Configuration reference?
Posted by Andrew Ash <an...@andrewash.com>.
Hi Jacob,
The port configuration docs that we worked on together are now available
at:
http://spark.apache.org/docs/latest/spark-standalone.html#configuring-ports-for-network-security
Thanks for the help!
Andrew
On Wed, May 28, 2014 at 3:21 PM, Jacob Eisinger <je...@us.ibm.com> wrote:
> Howdy Andrew,
>
> This is a standalone cluster. And, yes, if my understanding of Spark
> terminology is correct, you are correct about the port ownerships.
>
>
> Jacob
>
> Jacob D. Eisinger
> IBM Emerging Technologies
> jeising@us.ibm.com - (512) 286-6075
>
>
>
> From: Andrew Ash <an...@andrewash.com>
> To: user@spark.apache.org
> Date: 05/28/2014 05:18 PM
>
> Subject: Re: Comprehensive Port Configuration reference?
> ------------------------------
>
>
>
> Hmm, those do look like 4 listening ports to me. PID 3404 is an executor
> and PID 4762 is a worker? This is a standalone cluster?
>
>
> On Wed, May 28, 2014 at 8:22 AM, Jacob Eisinger <jeising@us.ibm.com> wrote:
>
> Howdy Andrew,
>
> Here is what I ran before an application context was created (other
> services have been deleted):
>
> # netstat -l -t tcp -p --numeric-ports
> Active Internet connections (only servers)
>
> Proto Recv-Q Send-Q Local Address        Foreign Address   State    PID/Program name
> tcp6       0      0 10.90.17.100:8888    :::*              LISTEN   4762/java
> tcp6       0      0 :::8081              :::*              LISTEN   4762/java
>
> And, then while the application context is up:
> # netstat -l -t tcp -p --numeric-ports
> Active Internet connections (only servers)
>
> Proto Recv-Q Send-Q Local Address        Foreign Address   State    PID/Program name
> tcp6       0      0 10.90.17.100:8888    :::*              LISTEN   4762/java
> tcp6       0      0 :::57286             :::*              LISTEN   3404/java
> tcp6       0      0 10.90.17.100:38118   :::*              LISTEN   3404/java
> tcp6       0      0 10.90.17.100:35530   :::*              LISTEN   3404/java
> tcp6       0      0 :::60235             :::*              LISTEN   3404/java
> tcp6       0      0 :::8081              :::*              LISTEN   4762/java
>
> My understanding is that this says four ports are open. Are 57286 and
> 60235 not being used?
>
>
> Jacob
>
> Jacob D. Eisinger
> IBM Emerging Technologies
> jeising@us.ibm.com - (512) 286-6075
>
>
>
> From: Andrew Ash <andrew@andrewash.com>
> To: user@spark.apache.org
> Date: 05/25/2014 06:25 PM
>
> Subject: Re: Comprehensive Port Configuration reference?
> ------------------------------
>
>
>
> Hi Jacob,
>
> The config option spark.history.ui.port is new for 1.0. The problem
> that the History server solves is that in non-Standalone cluster deployment
> modes (Mesos and YARN) there is no long-lived Spark Master that can store
> logs and statistics about an application after it finishes. The History
> server is the UI that renders logged data from applications after they complete.
>
> Read more here: https://issues.apache.org/jira/browse/SPARK-1276 and
> https://github.com/apache/spark/pull/204
>
> As far as the two vs four dynamic ports, are those all listening
> ports? I did observe 4 ports in use, but only two of them were listening.
> The other two were the random ports used for responses on outbound
> connections, the source port of the (srcIP, srcPort, dstIP, dstPort) tuple
> that uniquely identifies a TCP socket.
>
>
> http://unix.stackexchange.com/questions/75011/how-does-the-server-find-out-what-client-port-to-send-to
>
> Thanks for taking a look through!
>
> I also realized that I had a couple of mistakes with the 0.9 to 1.0
> transition, so I've documented those as well in the updated PR.
>
> Cheers!
> Andrew
>
>
>
> On Fri, May 23, 2014 at 2:43 PM, Jacob Eisinger <jeising@us.ibm.com> wrote:
> Howdy Andrew,
>
> I noticed you have a configuration item that we were not aware of:
> spark.history.ui.port. Is that new for 1.0?
>
> Also, we noticed that the Workers and the Drivers were opening up
> four dynamic ports per application context. It looks like you were seeing
> two.
>
> Everything else looks like it aligns!
> Jacob
>
>
>
> Jacob D. Eisinger
> IBM Emerging Technologies
> jeising@us.ibm.com - (512) 286-6075
>
>
> From: Andrew Ash <andrew@andrewash.com>
> To: user@spark.apache.org
> Date: 05/23/2014 10:30 AM
> Subject: Re: Comprehensive Port Configuration reference?
>
> ------------------------------
>
>
>
> Hi everyone,
>
> I've also been interested in better understanding what ports are
> used where and the direction the network connections go. I've observed a
> running cluster and read through code, and came up with the below
> documentation addition.
>
> https://github.com/apache/spark/pull/856
>
> Scott and Jacob -- it sounds like you two have pulled together some
> of this yourselves for writing firewall rules. Would you mind taking a
> look at this pull request and confirming that it matches your observations?
> Wrong documentation is worse than no documentation, so I'd like to make
> sure this is right.
>
> Cheers,
> Andrew
>
>
> On Wed, May 7, 2014 at 10:19 AM, Mark Baker <distobj@acm.org> wrote:
> On Tue, May 6, 2014 at 9:09 AM, Jacob Eisinger <jeising@us.ibm.com> wrote:
> > In a nutshell, Spark opens up a couple of well-known ports. And then
> the workers and the shell open up dynamic ports for each job. These
> dynamic ports make securing the Spark network difficult.
>
> Indeed.
>
> Judging by the frequency with which this topic arises, this is a
> concern for many (myself included).
>
> I couldn't find anything in JIRA about it, but I'm curious to know
> whether the Spark team considers this a problem in need of a fix?
>
> Mark.
>
>
>
>
Re: Comprehensive Port Configuration reference?
Posted by Jacob Eisinger <je...@us.ibm.com>.
Howdy Andrew,
This is a standalone cluster. And, yes, if my understanding of Spark
terminology is correct, you are correct about the port ownerships.
Jacob
Jacob D. Eisinger
IBM Emerging Technologies
jeising@us.ibm.com - (512) 286-6075
From: Andrew Ash <an...@andrewash.com>
To: user@spark.apache.org
Date: 05/28/2014 05:18 PM
Subject: Re: Comprehensive Port Configuration reference?
Hmm, those do look like 4 listening ports to me. PID 3404 is an executor
and PID 4762 is a worker? This is a standalone cluster?
On Wed, May 28, 2014 at 8:22 AM, Jacob Eisinger <je...@us.ibm.com> wrote:
Howdy Andrew,
Here is what I ran before an application context was created (other
services have been deleted):
# netstat -l -t tcp -p --numeric-ports
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address        Foreign Address   State    PID/Program name
tcp6       0      0 10.90.17.100:8888    :::*              LISTEN   4762/java
tcp6       0      0 :::8081              :::*              LISTEN   4762/java
And, then while the application context is up:
# netstat -l -t tcp -p --numeric-ports
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address        Foreign Address   State    PID/Program name
tcp6       0      0 10.90.17.100:8888    :::*              LISTEN   4762/java
tcp6       0      0 :::57286             :::*              LISTEN   3404/java
tcp6       0      0 10.90.17.100:38118   :::*              LISTEN   3404/java
tcp6       0      0 10.90.17.100:35530   :::*              LISTEN   3404/java
tcp6       0      0 :::60235             :::*              LISTEN   3404/java
tcp6       0      0 :::8081              :::*              LISTEN   4762/java
My understanding is that this says four ports are open. Are 57286 and
60235 not being used?
Jacob
Jacob D. Eisinger
IBM Emerging Technologies
jeising@us.ibm.com - (512) 286-6075
From: Andrew Ash <an...@andrewash.com>
To: user@spark.apache.org
Date: 05/25/2014 06:25 PM
Subject: Re: Comprehensive Port Configuration reference?
Hi Jacob,
The config option spark.history.ui.port is new for 1.0. The problem that
the History server solves is that in non-Standalone cluster deployment modes
(Mesos and YARN) there is no long-lived Spark Master that can store logs
and statistics about an application after it finishes. The History server is
the UI that renders logged data from applications after they complete.
Read more here: https://issues.apache.org/jira/browse/SPARK-1276 and
https://github.com/apache/spark/pull/204
As far as the two vs four dynamic ports, are those all listening ports?
I did observe 4 ports in use, but only two of them were listening. The
other two were the random ports used for responses on outbound
connections, the source port of the (srcIP, srcPort, dstIP, dstPort)
tuple that uniquely identifies a TCP socket.
http://unix.stackexchange.com/questions/75011/how-does-the-server-find-out-what-client-port-to-send-to
Thanks for taking a look through!
I also realized that I had a couple of mistakes with the 0.9 to 1.0
transition, so I've documented those as well in the updated PR.
Cheers!
Andrew
On Fri, May 23, 2014 at 2:43 PM, Jacob Eisinger <je...@us.ibm.com>
wrote:
Howdy Andrew,
I noticed you have a configuration item that we were not aware of:
spark.history.ui.port. Is that new for 1.0?
Also, we noticed that the Workers and the Drivers were opening up
four dynamic ports per application context. It looks like you were
seeing two.
Everything else looks like it aligns!
Jacob
Jacob D. Eisinger
IBM Emerging Technologies
jeising@us.ibm.com - (512) 286-6075
From: Andrew Ash <an...@andrewash.com>
To: user@spark.apache.org
Date: 05/23/2014 10:30 AM
Subject: Re: Comprehensive Port Configuration reference?
Hi everyone,
I've also been interested in better understanding what ports are
used where and the direction the network connections go. I've
observed a running cluster and read through code, and came up with
the below documentation addition.
https://github.com/apache/spark/pull/856
Scott and Jacob -- it sounds like you two have pulled together some
of this yourselves for writing firewall rules. Would you mind
taking a look at this pull request and confirming that it matches
your observations? Wrong documentation is worse than no
documentation, so I'd like to make sure this is right.
Cheers,
Andrew
On Wed, May 7, 2014 at 10:19 AM, Mark Baker <di...@acm.org>
wrote:
On Tue, May 6, 2014 at 9:09 AM, Jacob Eisinger <jeising@us.ibm.com> wrote:
> In a nutshell, Spark opens up a couple of well-known ports. And then
> the workers and the shell open up dynamic ports for each job. These
> dynamic ports make securing the Spark network difficult.
Indeed.
Judging by the frequency with which this topic arises, this is a
concern for many (myself included).
I couldn't find anything in JIRA about it, but I'm curious to know
whether the Spark team considers this a problem in need of a fix?
Mark.
Re: Comprehensive Port Configuration reference?
Posted by Andrew Ash <an...@andrewash.com>.
Hmm, those do look like 4 listening ports to me. PID 3404 is an executor
and PID 4762 is a worker? This is a standalone cluster?
On Wed, May 28, 2014 at 8:22 AM, Jacob Eisinger <je...@us.ibm.com> wrote:
> Howdy Andrew,
>
> Here is what I ran before an application context was created (other
> services have been deleted):
>
> # netstat -l -t tcp -p --numeric-ports
> Active Internet connections (only servers)
>
> Proto Recv-Q Send-Q Local Address        Foreign Address   State    PID/Program name
> tcp6       0      0 10.90.17.100:8888    :::*              LISTEN   4762/java
> tcp6       0      0 :::8081              :::*              LISTEN   4762/java
>
>
> And, then while the application context is up:
>
> # netstat -l -t tcp -p --numeric-ports
> Active Internet connections (only servers)
>
> Proto Recv-Q Send-Q Local Address        Foreign Address   State    PID/Program name
> tcp6       0      0 10.90.17.100:8888    :::*              LISTEN   4762/java
> tcp6       0      0 :::57286             :::*              LISTEN   3404/java
> tcp6       0      0 10.90.17.100:38118   :::*              LISTEN   3404/java
> tcp6       0      0 10.90.17.100:35530   :::*              LISTEN   3404/java
> tcp6       0      0 :::60235             :::*              LISTEN   3404/java
> tcp6       0      0 :::8081              :::*              LISTEN   4762/java
>
>
> My understanding is that this says four ports are open. Are 57286 and
> 60235 not being used?
>
>
> Jacob
>
> Jacob D. Eisinger
> IBM Emerging Technologies
> jeising@us.ibm.com - (512) 286-6075
>
>
>
> From: Andrew Ash <an...@andrewash.com>
> To: user@spark.apache.org
> Date: 05/25/2014 06:25 PM
>
> Subject: Re: Comprehensive Port Configuration reference?
> ------------------------------
>
>
>
> Hi Jacob,
>
> The config option spark.history.ui.port is new for 1.0. The problem that
> History server solves is that in non-Standalone cluster deployment modes
> (Mesos and YARN) there is no long-lived Spark Master that can store logs
> and statistics about an application after it finishes. History server is
> the UI that renders logged data from applications after they complete.
>
> Read more here: https://issues.apache.org/jira/browse/SPARK-1276
> and https://github.com/apache/spark/pull/204
>
> As far as the two vs four dynamic ports, are those all listening ports? I
> did observe 4 ports in use, but only two of them were listening. The other
> two were the random ports used for responses on outbound connections, the
> source port of the (srcIP, srcPort, dstIP, dstPort) tuple that uniquely
> identifies a TCP socket.
>
>
> http://unix.stackexchange.com/questions/75011/how-does-the-server-find-out-what-client-port-to-send-to
>
> Thanks for taking a look through!
>
> I also realized that I had a couple of mistakes with the 0.9 to 1.0
> transition, so I've documented those as well in the updated PR.
>
> Cheers!
> Andrew
>
>
>
> On Fri, May 23, 2014 at 2:43 PM, Jacob Eisinger <j...@us.ibm.com>
> wrote:
>
> Howdy Andrew,
>
> I noticed you have a configuration item that we were not aware of:
> spark.history.ui.port. Is that new for 1.0?
>
> Also, we noticed that the Workers and the Drivers were opening up four
> dynamic ports per application context. It looks like you were seeing two.
>
> Everything else looks like it aligns!
> Jacob
>
>
>
> Jacob D. Eisinger
> IBM Emerging Technologies
> jeising@us.ibm.com - (512) 286-6075
>
>
> From: Andrew Ash <andrew@andrewash.com>
> To: user@spark.apache.org
> Date: 05/23/2014 10:30 AM
> Subject: Re: Comprehensive Port Configuration reference?
> ------------------------------
>
>
>
> Hi everyone,
>
> I've also been interested in better understanding what ports are used
> where and the direction the network connections go. I've observed a
> running cluster and read through code, and came up with the below
> documentation addition.
>
> https://github.com/apache/spark/pull/856
>
> Scott and Jacob -- it sounds like you two have pulled together some of
> this yourselves for writing firewall rules. Would you mind taking a look
> at this pull request and confirming that it matches your observations?
> Wrong documentation is worse than no documentation, so I'd like to make
> sure this is right.
>
> Cheers,
> Andrew
>
>
> On Wed, May 7, 2014 at 10:19 AM, Mark Baker <d...@acm.org>
> wrote:
> On Tue, May 6, 2014 at 9:09 AM, Jacob Eisinger <j...@us.ibm.com>
> wrote:
> > In a nutshell, Spark opens up a couple of well-known ports. And then
> the workers and the shell open up dynamic ports for each job. These
> dynamic ports make securing the Spark network difficult.
>
> Indeed.
>
> Judging by the frequency with which this topic arises, this is a
> concern for many (myself included).
>
> I couldn't find anything in JIRA about it, but I'm curious to know
> whether the Spark team considers this a problem in need of a fix?
>
> Mark.
>
>
>
>
Re: Comprehensive Port Configuration reference?
Posted by Jacob Eisinger <je...@us.ibm.com>.
Howdy Andrew,
Here is what I ran before an application context was created (other
services have been deleted):
# netstat -l -t tcp -p --numeric-ports
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address        Foreign Address   State    PID/Program name
tcp6       0      0 10.90.17.100:8888    :::*              LISTEN   4762/java
tcp6       0      0 :::8081              :::*              LISTEN   4762/java
And, then while the application context is up:
# netstat -l -t tcp -p --numeric-ports
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address        Foreign Address   State    PID/Program name
tcp6       0      0 10.90.17.100:8888    :::*              LISTEN   4762/java
tcp6       0      0 :::57286             :::*              LISTEN   3404/java
tcp6       0      0 10.90.17.100:38118   :::*              LISTEN   3404/java
tcp6       0      0 10.90.17.100:35530   :::*              LISTEN   3404/java
tcp6       0      0 :::60235             :::*              LISTEN   3404/java
tcp6       0      0 :::8081              :::*              LISTEN   4762/java
My understanding is that this says four ports are open. Are 57286 and 60235
not being used?
Jacob
Jacob D. Eisinger
IBM Emerging Technologies
jeising@us.ibm.com - (512) 286-6075
From: Andrew Ash <an...@andrewash.com>
To: user@spark.apache.org
Date: 05/25/2014 06:25 PM
Subject: Re: Comprehensive Port Configuration reference?
Hi Jacob,
The config option spark.history.ui.port is new for 1.0. The problem that
History server solves is that in non-Standalone cluster deployment modes
(Mesos and YARN) there is no long-lived Spark Master that can store logs
and statistics about an application after it finishes. History server is
the UI that renders logged data from applications after they complete.
Read more here: https://issues.apache.org/jira/browse/SPARK-1276 and
https://github.com/apache/spark/pull/204
As far as the two vs four dynamic ports, are those all listening ports? I
did observe 4 ports in use, but only two of them were listening. The other
two were the random ports used for responses on outbound connections, the
source port of the (srcIP, srcPort, dstIP, dstPort) tuple that uniquely
identifies a TCP socket.
http://unix.stackexchange.com/questions/75011/how-does-the-server-find-out-what-client-port-to-send-to
Thanks for taking a look through!
I also realized that I had a couple of mistakes with the 0.9 to 1.0
transition, so I've documented those as well in the updated PR.
Cheers!
Andrew
On Fri, May 23, 2014 at 2:43 PM, Jacob Eisinger <je...@us.ibm.com> wrote:
Howdy Andrew,
I noticed you have a configuration item that we were not aware of:
spark.history.ui.port. Is that new for 1.0?
Also, we noticed that the Workers and the Drivers were opening up four
dynamic ports per application context. It looks like you were seeing
two.
Everything else looks like it aligns!
Jacob
Jacob D. Eisinger
IBM Emerging Technologies
jeising@us.ibm.com - (512) 286-6075
From: Andrew Ash <an...@andrewash.com>
To: user@spark.apache.org
Date: 05/23/2014 10:30 AM
Subject: Re: Comprehensive Port Configuration reference?
Hi everyone,
I've also been interested in better understanding what ports are used
where and the direction the network connections go. I've observed a
running cluster and read through code, and came up with the below
documentation addition.
https://github.com/apache/spark/pull/856
Scott and Jacob -- it sounds like you two have pulled together some of
this yourselves for writing firewall rules. Would you mind taking a look
at this pull request and confirming that it matches your observations?
Wrong documentation is worse than no documentation, so I'd like to make
sure this is right.
Cheers,
Andrew
On Wed, May 7, 2014 at 10:19 AM, Mark Baker <di...@acm.org> wrote:
On Tue, May 6, 2014 at 9:09 AM, Jacob Eisinger <je...@us.ibm.com>
wrote:
> In a nutshell, Spark opens up a couple of well-known ports. And then
the workers and the shell open up dynamic ports for each job. These
dynamic ports make securing the Spark network difficult.
Indeed.
Judging by the frequency with which this topic arises, this is a
concern for many (myself included).
I couldn't find anything in JIRA about it, but I'm curious to know
whether the Spark team considers this a problem in need of a fix?
Mark.
Re: Comprehensive Port Configuration reference?
Posted by Andrew Ash <an...@andrewash.com>.
Hi Jacob,
The config option spark.history.ui.port is new for 1.0. The problem that
History server solves is that in non-Standalone cluster deployment modes
(Mesos and YARN) there is no long-lived Spark Master that can store logs
and statistics about an application after it finishes. History server is
the UI that renders logged data from applications after they complete.
Read more here: https://issues.apache.org/jira/browse/SPARK-1276 and
https://github.com/apache/spark/pull/204
As far as the two vs four dynamic ports, are those all listening ports? I
did observe 4 ports in use, but only two of them were listening. The other
two were the random ports used for responses on outbound connections, the
source port of the (srcIP, srcPort, dstIP, dstPort) tuple that uniquely
identifies a TCP socket.
http://unix.stackexchange.com/questions/75011/how-does-the-server-find-out-what-client-port-to-send-to
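The ephemeral source-port behavior described above can be seen with a couple of plain sockets; a minimal, Spark-independent sketch:

```python
import socket

# A listening "server" socket on an OS-assigned port (bind to port 0).
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
dst_port = srv.getsockname()[1]

# An outbound "client" connection: the OS picks a random ephemeral
# source port for it -- this port shows up in netstat but is not listening.
cli = socket.socket()
cli.connect(("127.0.0.1", dst_port))
src_ip, src_port = cli.getsockname()

# The (srcIP, srcPort, dstIP, dstPort) tuple uniquely identifies the socket.
print((src_ip, src_port, "127.0.0.1", dst_port))

cli.close()
srv.close()
```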
Thanks for taking a look through!
I also realized that I had a couple of mistakes with the 0.9 to 1.0
transition, so I've documented those as well in the updated PR.
Cheers!
Andrew
On Fri, May 23, 2014 at 2:43 PM, Jacob Eisinger <je...@us.ibm.com> wrote:
> Howdy Andrew,
>
> I noticed you have a configuration item that we were not aware of:
> spark.history.ui.port. Is that new for 1.0?
>
> Also, we noticed that the Workers and the Drivers were opening up four
> dynamic ports per application context. It looks like you were seeing two.
>
> Everything else looks like it aligns!
> Jacob
>
>
> Jacob D. Eisinger
> IBM Emerging Technologies
> jeising@us.ibm.com - (512) 286-6075
>
>
> From: Andrew Ash <an...@andrewash.com>
> To: user@spark.apache.org
> Date: 05/23/2014 10:30 AM
> Subject: Re: Comprehensive Port Configuration reference?
> ------------------------------
>
>
>
> Hi everyone,
>
> I've also been interested in better understanding what ports are used
> where and the direction the network connections go. I've observed a
> running cluster and read through code, and came up with the below
> documentation addition.
>
> https://github.com/apache/spark/pull/856
>
> Scott and Jacob -- it sounds like you two have pulled together some of
> this yourselves for writing firewall rules. Would you mind taking a look
> at this pull request and confirming that it matches your observations?
> Wrong documentation is worse than no documentation, so I'd like to make
> sure this is right.
>
> Cheers,
> Andrew
>
>
> On Wed, May 7, 2014 at 10:19 AM, Mark Baker <d...@acm.org>
> wrote:
>
> On Tue, May 6, 2014 at 9:09 AM, Jacob Eisinger <j...@us.ibm.com>
> wrote:
> > In a nutshell, Spark opens up a couple of well-known ports. And then
> the workers and the shell open up dynamic ports for each job. These
> dynamic ports make securing the Spark network difficult.
>
> Indeed.
>
> Judging by the frequency with which this topic arises, this is a
> concern for many (myself included).
>
> I couldn't find anything in JIRA about it, but I'm curious to know
> whether the Spark team considers this a problem in need of a fix?
>
> Mark.
>
>
>
Re: Comprehensive Port Configuration reference?
Posted by Jacob Eisinger <je...@us.ibm.com>.
Howdy Andrew,
I noticed you have a configuration item that we were not aware of:
spark.history.ui.port. Is that new for 1.0?
Also, we noticed that the Workers and the Drivers were opening up four
dynamic ports per application context. It looks like you were seeing two.
Everything else looks like it aligns!
Jacob
Jacob D. Eisinger
IBM Emerging Technologies
jeising@us.ibm.com - (512) 286-6075
From: Andrew Ash <an...@andrewash.com>
To: user@spark.apache.org
Date: 05/23/2014 10:30 AM
Subject: Re: Comprehensive Port Configuration reference?
Hi everyone,
I've also been interested in better understanding what ports are used where
and the direction the network connections go. I've observed a running
cluster and read through code, and came up with the below documentation
addition.
https://github.com/apache/spark/pull/856
Scott and Jacob -- it sounds like you two have pulled together some of this
yourselves for writing firewall rules. Would you mind taking a look at
this pull request and confirming that it matches your observations? Wrong
documentation is worse than no documentation, so I'd like to make sure this
is right.
Cheers,
Andrew
On Wed, May 7, 2014 at 10:19 AM, Mark Baker <di...@acm.org> wrote:
On Tue, May 6, 2014 at 9:09 AM, Jacob Eisinger <je...@us.ibm.com>
wrote:
> In a nutshell, Spark opens up a couple of well-known ports. And then
the workers and the shell open up dynamic ports for each job. These
dynamic ports make securing the Spark network difficult.
Indeed.
Judging by the frequency with which this topic arises, this is a
concern for many (myself included).
I couldn't find anything in JIRA about it, but I'm curious to know
whether the Spark team considers this a problem in need of a fix?
Mark.
Re: Comprehensive Port Configuration reference?
Posted by Andrew Ash <an...@andrewash.com>.
Hi everyone,
I've also been interested in better understanding what ports are used where
and the direction the network connections go. I've observed a running
cluster and read through code, and came up with the below documentation
addition.
https://github.com/apache/spark/pull/856
Scott and Jacob -- it sounds like you two have pulled together some of this
yourselves for writing firewall rules. Would you mind taking a look at
this pull request and confirming that it matches your observations? Wrong
documentation is worse than no documentation, so I'd like to make sure this
is right.
Cheers,
Andrew
On Wed, May 7, 2014 at 10:19 AM, Mark Baker <di...@acm.org> wrote:
> On Tue, May 6, 2014 at 9:09 AM, Jacob Eisinger <je...@us.ibm.com> wrote:
> > In a nutshell, Spark opens up a couple of well-known ports. And then
> the workers and the shell open up dynamic ports for each job. These
> dynamic ports make securing the Spark network difficult.
>
> Indeed.
>
> Judging by the frequency with which this topic arises, this is a
> concern for many (myself included).
>
> I couldn't find anything in JIRA about it, but I'm curious to know
> whether the Spark team considers this a problem in need of a fix?
>
> Mark.
>
unsubscribe
Posted by eric perler <er...@hotmail.com>.
unsubscribe
Re: Comprehensive Port Configuration reference?
Posted by Mark Baker <di...@acm.org>.
On Tue, May 6, 2014 at 9:09 AM, Jacob Eisinger <je...@us.ibm.com> wrote:
> In a nutshell, Spark opens up a couple of well-known ports. And then the workers and the shell open up dynamic ports for each job. These dynamic ports make securing the Spark network difficult.
Indeed.
Judging by the frequency with which this topic arises, this is a
concern for many (myself included).
I couldn't find anything in JIRA about it, but I'm curious to know
whether the Spark team considers this a problem in need of a fix?
Mark.
Re: Comprehensive Port Configuration reference?
Posted by Jacob Eisinger <je...@us.ibm.com>.
Howdy Scott,
Please see the discussions about securing the Spark network [1] [2].
In a nutshell, Spark opens up a couple of well-known ports. And then the
workers and the shell open up dynamic ports for each job. These dynamic
ports make securing the Spark network difficult.
Jacob
[1]
http://apache-spark-user-list.1001560.n3.nabble.com/Securing-Spark-s-Network-td4832.html
[2]
http://apache-spark-user-list.1001560.n3.nabble.com/spark-shell-driver-interacting-with-Workers-in-YARN-mode-firewall-blocking-communication-td5237.html
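When writing firewall rules around these well-known and dynamic ports, a quick TCP reachability probe helps confirm what a rule actually exposes. A minimal sketch (the host and port in the example are placeholders; 7077 is only the conventional standalone Master default):

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or filtered by a firewall.
        return False

# Example: probe the standalone Master's conventional default port.
print(port_open("localhost", 7077))
```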
Jacob D. Eisinger
IBM Emerging Technologies
jeising@us.ibm.com - (512) 286-6075
From: Scott Clasen <sc...@gmail.com>
To: user@spark.incubator.apache.org
Date: 05/05/2014 11:39 AM
Subject: Comprehensive Port Configuration reference?
Is there somewhere documented how one would go about configuring every open
port a spark application needs?
This seems like one of the main things that makes running Spark hard in
places like EC2 where you aren't using the canned Spark scripts.
Starting an app looks like you'll see ports open for
BlockManager
MapOutputTracker
FileServer
WebUI
Local port to get callbacks from the Mesos master...
What else?
How do I configure all of these?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Comprehensive-Port-Configuration-reference-tp5384.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.