Posted to user@spark.apache.org by Andrew Ash <an...@andrewash.com> on 2014/06/09 08:26:25 UTC

Re: Comprehensive Port Configuration reference?

Hi Jacob,

The port configuration docs that we worked on together are now available
at:
http://spark.apache.org/docs/latest/spark-standalone.html#configuring-ports-for-network-security

Thanks for the help!

Andrew


On Wed, May 28, 2014 at 3:21 PM, Jacob Eisinger <je...@us.ibm.com> wrote:

> Howdy Andrew,
>
> This is a standalone cluster.  And, yes, if my understanding of Spark
> terminology is correct, you are correct about the port ownerships.
>
>
> Jacob
>
> Jacob D. Eisinger
> IBM Emerging Technologies
> jeising@us.ibm.com - (512) 286-6075
>
>
>
> From: Andrew Ash <an...@andrewash.com>
> To: user@spark.apache.org
> Date: 05/28/2014 05:18 PM
>
> Subject: Re: Comprehensive Port Configuration reference?
> ------------------------------
>
>
>
> Hmm, those do look like 4 listening ports to me.  PID 3404 is an executor
> and PID 4762 is a worker?  This is a standalone cluster?
>
>
> On Wed, May 28, 2014 at 8:22 AM, Jacob Eisinger <jeising@us.ibm.com> wrote:
>
>    Howdy Andrew,
>
>    Here is what I ran before an application context was created (other
>    services have been deleted):
>
>       # netstat -l -t tcp -p --numeric-ports
>       Active Internet connections (only servers)
>       Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
>       tcp6       0      0 10.90.17.100:8888       :::*                    LISTEN      4762/java
>       tcp6       0      0 :::8081                 :::*                    LISTEN      4762/java
>
>    And, then while the application context is up:
>       # netstat -l -t tcp -p --numeric-ports
>       Active Internet connections (only servers)
>       Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
>       tcp6       0      0 10.90.17.100:8888       :::*                    LISTEN      4762/java
>       tcp6       0      0 :::57286                :::*                    LISTEN      3404/java
>       tcp6       0      0 10.90.17.100:38118      :::*                    LISTEN      3404/java
>       tcp6       0      0 10.90.17.100:35530      :::*                    LISTEN      3404/java
>       tcp6       0      0 :::60235                :::*                    LISTEN      3404/java
>       tcp6       0      0 :::8081                 :::*                    LISTEN      4762/java
>
>    My understanding is that this says four ports are open.  Are 57286 and
>    60235 not being used?
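
[Editor's sketch, not part of the original message] One way to answer the "how many ports" question is to group the LISTEN rows by PID. The snippet below is a minimal sketch that hard-codes the six rows reported in the netstat output above; the function name listening_ports_by_pid is hypothetical, and the parsing assumes the seven-column layout that netstat printed here.

```python
from collections import defaultdict

# LISTEN rows copied from the netstat output in the message above
# (taken while the application context was up).
NETSTAT_OUTPUT = """\
tcp6       0      0 10.90.17.100:8888       :::*                    LISTEN      4762/java
tcp6       0      0 :::57286                :::*                    LISTEN      3404/java
tcp6       0      0 10.90.17.100:38118      :::*                    LISTEN      3404/java
tcp6       0      0 10.90.17.100:35530      :::*                    LISTEN      3404/java
tcp6       0      0 :::60235                :::*                    LISTEN      3404/java
tcp6       0      0 :::8081                 :::*                    LISTEN      4762/java
"""


def listening_ports_by_pid(netstat_text):
    """Map each PID to the sorted list of ports it is listening on.

    Hypothetical helper: assumes seven whitespace-separated columns
    (proto, recv-q, send-q, local, foreign, state, pid/program).
    """
    ports = defaultdict(list)
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) != 7 or fields[5] != "LISTEN":
            continue  # skip headers and non-listening sockets
        # The port is whatever follows the last colon of the local address,
        # which works for both "10.90.17.100:8888" and ":::8081".
        port = int(fields[3].rsplit(":", 1)[1])
        pid = int(fields[6].split("/", 1)[0])
        ports[pid].append(port)
    return {pid: sorted(found) for pid, found in ports.items()}


by_pid = listening_ports_by_pid(NETSTAT_OUTPUT)
for pid, found in sorted(by_pid.items()):
    print(pid, found)
```

With the rows above, PID 3404 accounts for four listening ports and PID 4762 for two, which matches the count discussed in the replies.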
>
>
>    Jacob
>
>    Jacob D. Eisinger
>    IBM Emerging Technologies
>    jeising@us.ibm.com - (512) 286-6075
>
>
>
>    From: Andrew Ash <andrew@andrewash.com>
>    To: user@spark.apache.org
>    Date: 05/25/2014 06:25 PM
>
>    Subject: Re: Comprehensive Port Configuration reference?
>    ------------------------------
>
>
>
>    Hi Jacob,
>
>    The config option spark.history.ui.port is new for 1.0.  The problem
>    that the History server solves is that in non-Standalone cluster
>    deployment modes (Mesos and YARN) there is no long-lived Spark Master that
>    can store logs and statistics about an application after it finishes.  The
>    History server is the UI that renders logged data from applications after
>    they complete.
>
>    Read more here: https://issues.apache.org/jira/browse/SPARK-1276 and
>    https://github.com/apache/spark/pull/204
>
>    As for the two vs. four dynamic ports, are those all listening
>    ports?  I did observe 4 ports in use, but only two of them were listening.
>    The other two were the random ports used for responses on outbound
>    connections, the source port of the (srcIP, srcPort, dstIP, dstPort) tuple
>    that uniquely identifies a TCP connection.
>
>
>    http://unix.stackexchange.com/questions/75011/how-does-the-server-find-out-what-client-port-to-send-to
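
[Editor's sketch, not part of the original message] The distinction between a listening port and the source port of an outbound connection can be seen with two sockets on the loopback interface. This is a minimal illustration using only Python's standard socket module; it has nothing Spark-specific in it, and the variable names are chosen here for illustration.

```python
import socket

# Server side: a single listening socket. Binding to port 0 asks the OS
# to pick any free port, much like a dynamically assigned service port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
listen_addr = server.getsockname()

# Client side: connect without binding first, so the OS assigns an
# ephemeral source port for this one outbound connection.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(listen_addr)
accepted, peer_addr = server.accept()

src_ip, src_port = client.getsockname()
dst_ip, dst_port = client.getpeername()

print("listening port:", listen_addr[1])
print("connection tuple:", (src_ip, src_port, dst_ip, dst_port))

for sock in (accepted, client, server):
    sock.close()
```

Only the server's port would appear in `netstat -l`. The client's source port is in use for the lifetime of the connection but is not listening, so a firewall rule for inbound traffic does not need to open it.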
>
>    Thanks for taking a look through!
>
>    I also realized that I had made a couple of mistakes with the 0.9 to 1.0
>    transition, so I have now documented those appropriately as well in the
>    updated PR.
>
>    Cheers!
>    Andrew
>
>
>
>    On Fri, May 23, 2014 at 2:43 PM, Jacob Eisinger <jeising@us.ibm.com> wrote:
>       Howdy Andrew,
>
>       I noticed you have a configuration item that we were not aware of:
>       spark.history.ui.port.  Is that new for 1.0?
>
>       Also, we noticed that the Workers and the Drivers were opening up
>       four dynamic ports per application context.  It looks like you were seeing
>       two.
>
>       Everything else looks like it aligns!
>       Jacob
>
>
>
>       Jacob D. Eisinger
>       IBM Emerging Technologies
>       jeising@us.ibm.com - (512) 286-6075
>
>
>       From: Andrew Ash <andrew@andrewash.com>
>       To: user@spark.apache.org
>       Date: 05/23/2014 10:30 AM
>       Subject: Re: Comprehensive Port Configuration reference?
>
>       ------------------------------
>
>
>
>       Hi everyone,
>
>       I've also been interested in better understanding what ports are
>       used where and the direction the network connections go.  I've observed a
>       running cluster and read through code, and came up with the below
>       documentation addition.
>
>       https://github.com/apache/spark/pull/856
>
>       Scott and Jacob -- it sounds like you two have pulled together some
>       of this yourselves for writing firewall rules.  Would you mind taking a
>       look at this pull request and confirming that it matches your observations?
>        Wrong documentation is worse than no documentation, so I'd like to make
>       sure this is right.
>
>       Cheers,
>       Andrew
>
>
>       On Wed, May 7, 2014 at 10:19 AM, Mark Baker <distobj@acm.org> wrote:
>          On Tue, May 6, 2014 at 9:09 AM, Jacob Eisinger <jeising@us.ibm.com> wrote:
>          > In a nutshell, Spark opens up a couple of well-known ports.  And
>          > then the workers and the shell open up dynamic ports for each job.
>          > These dynamic ports make securing the Spark network difficult.
>
>          Indeed.
>
>          Judging by the frequency with which this topic arises, this is a
>          concern for many (myself included).
>
>          I couldn't find anything in JIRA about it, but I'm curious to know
>          whether the Spark team considers this a problem in need of a fix?
>
>          Mark.
>
>
>
>