Posted to reviews@spark.apache.org by pwendell <gi...@git.apache.org> on 2014/05/26 07:30:54 UTC

[GitHub] spark pull request: Organize configuration options

GitHub user pwendell opened a pull request:

    https://github.com/apache/spark/pull/880

    Organize configuration options

    This PR improves and organizes the configuration options page
    and makes a few other changes to the config docs. See a preview here:
    http://people.apache.org/~pwendell/config-improvements/configuration.html
    
    The biggest changes are:
    1. The configs for the standalone master/workers were moved to the
    standalone page and out of the general config doc.
    2. SPARK_LOCAL_DIRS, which was previously missing, is now covered in the
    standalone docs.
    3. Expanded discussion of injecting configs with spark-submit, including an
    example (a short sketch also follows the category list below).
    4. Config options were organized into the following categories:
    - Runtime Environment
    - Shuffle Behavior
    - Spark UI
    - Compression and Serialization
    - Execution Behavior
    - Networking
    - Scheduling
    - Security
    - Spark Streaming
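
    A rough Scala sketch of point 3, using a hypothetical app name and a
    placeholder property: settings made explicitly on a SparkConf take
    precedence, while anything left unset is picked up from whatever
    spark-submit and spark-defaults.conf inject at launch time.

        // Sketch only; assumes launch via spark-submit, which supplies the
        // master URL and any defaults from spark-defaults.conf.
        import org.apache.spark.{SparkConf, SparkContext}

        object ConfigDemo {
          def main(args: Array[String]): Unit = {
            val conf = new SparkConf()                // picks up spark.* properties set by spark-submit
              .setAppName("ConfigDemo")               // placeholder name
              .set("spark.executor.memory", "2g")     // explicit setting: wins over injected defaults
            val sc = new SparkContext(conf)
            println(sc.getConf.toDebugString)         // prints the effective configuration
            sc.stop()
          }
        }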


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pwendell/spark config-cleanup

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/880.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #880
    
----
commit 204b2480028a1a4256ed248f4dbf689b60723ac3
Author: Patrick Wendell <pw...@gmail.com>
Date:   2014-05-25T03:05:19Z

    Small fixes

commit 4af9e07494b4de99e1e099ff9c04a74fa3f02951
Author: Patrick Wendell <pw...@gmail.com>
Date:   2014-05-25T03:25:58Z

    Adding SPARK_LOCAL_DIRS docs

commit 2d719efd9f68563119be1f527e97a19df4aa7485
Author: Patrick Wendell <pw...@gmail.com>
Date:   2014-05-25T03:26:30Z

    Small fix

commit 29b54461e07557d66cfa7128f6c222106ce5a5e8
Author: Patrick Wendell <pw...@gmail.com>
Date:   2014-05-25T03:56:05Z

    Better discussion of spark-submit in configuration docs

commit 592e94ac20f4d209c9e2334875f33d811f5e1a64
Author: Patrick Wendell <pw...@gmail.com>
Date:   2014-05-25T07:28:10Z

    Stash

commit 54b184d4a3c10386fd73cf8b8d0db7800d4ac560
Author: Patrick Wendell <pw...@gmail.com>
Date:   2014-05-25T04:40:10Z

    Adding standalone configs to the standalone page

commit f7e79bc42c1635686c3af01eef147dae92de2529
Author: Patrick Wendell <pw...@gmail.com>
Date:   2014-05-26T04:43:11Z

    Re-organizing config options.
    
    This uses the following categories:
    - Runtime Environment
    - Shuffle Behavior
    - Spark UI
    - Compression and Serialization
    - Execution Behavior
    - Networking
    - Scheduling
    - Security
    - Spark Streaming

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Organize configuration docs

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/880#discussion_r13158307
  
    --- Diff: docs/configuration.md ---
    @@ -201,54 +282,41 @@ Apart from these, the following properties are also available, and may be useful
       </td>
     </tr>
     <tr>
    -  <td><code>spark.ui.filters</code></td>
    -  <td>None</td>
    +  <td><code>spark.ui.killEnabled</code></td>
    +  <td>true</td>
       <td>
    -    Comma separated list of filter class names to apply to the Spark web ui. The filter should be a
    -    standard javax servlet Filter. Parameters to each filter can also be specified by setting a
    -    java system property of spark.&lt;class name of filter&gt;.params='param1=value1,param2=value2'
    -    (e.g. -Dspark.ui.filters=com.test.filter1 -Dspark.com.test.filter1.params='param1=foo,param2=testing')
    +    Allows stages and corresponding jobs to be killed from the web ui.
       </td>
     </tr>
     <tr>
    -  <td><code>spark.ui.acls.enable</code></td>
    +  <td><code>spark.eventLog.enabled</code></td>
       <td>false</td>
       <td>
    -    Whether spark web ui acls should are enabled. If enabled, this checks to see if the user has
    -    access permissions to view the web ui. See <code>spark.ui.view.acls</code> for more details.
    -    Also note this requires the user to be known, if the user comes across as null no checks
    -    are done. Filters can be used to authenticate and set the user.
    -  </td>
    -</tr>
    -<tr>
    -  <td><code>spark.ui.view.acls</code></td>
    -  <td>Empty</td>
    -  <td>
    -    Comma separated list of users that have view access to the spark web ui. By default only the
    -    user that started the Spark job has view access.
    -  </td>
    -</tr>
    -<tr>
    -  <td><code>spark.ui.killEnabled</code></td>
    -  <td>true</td>
    -  <td>
    -    Allows stages and corresponding jobs to be killed from the web ui.
    +    Whether to log spark events, useful for reconstructing the Web UI after the application has
    +    finished.
       </td>
     </tr>
     <tr>
    -  <td><code>spark.shuffle.compress</code></td>
    -  <td>true</td>
    +  <td><code>spark.eventLog.compress</code></td>
    +  <td>false</td>
       <td>
    -    Whether to compress map output files. Generally a good idea.
    +    Whether to compress logged events, if <code>spark.eventLog.enabled</code> is true.
    --- End diff --
    
    It doesn't do application-aware compression; it just uses a standard stream compression algorithm.
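
    A rough Scala sketch (hypothetical app name, default event log directory
    assumed) of turning both options on from application code:

        import org.apache.spark.{SparkConf, SparkContext}

        object EventLogDemo {
          def main(args: Array[String]): Unit = {
            val conf = new SparkConf()
              .setMaster("local[2]")                  // local test; normally supplied by spark-submit
              .setAppName("EventLogDemo")             // placeholder name
              .set("spark.eventLog.enabled", "true")  // log Spark events for later UI reconstruction
              .set("spark.eventLog.compress", "true") // compress the log with the standard stream codec
            val sc = new SparkContext(conf)
            sc.parallelize(1 to 1000).count()         // any job; its events are written to the log
            sc.stop()
          }
        }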


---

[GitHub] spark pull request: Organize configuration docs

Posted by ash211 <gi...@git.apache.org>.
Github user ash211 commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44158903
  
    Hi Patrick, I just added some documentation for the ports that Spark uses in the pull request below, which was merged this afternoon. Can you make sure that info doesn't get lost in this update?
    
    Thanks very much for the docs updates!
    
    https://github.com/apache/spark/pull/856


---

[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44474644
  
     Merged build triggered. 


---

[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44160220
  
    Merged build started. 


---

[GitHub] spark pull request: Organize configuration docs

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44471697
  
    Made a few other small comments, but it looks good otherwise; feel free to merge it once they're fixed.


---

[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44474651
  
    Merged build started. 


---

[GitHub] spark pull request: Organize configuration docs

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/880#discussion_r13110044
  
    --- Diff: docs/index.md ---
    @@ -5,7 +5,7 @@ title: Spark Overview
     
     Apache Spark is a fast and general-purpose cluster computing system.
     It provides high-level APIs in [Scala](scala-programming-guide.html), [Java](java-programming-guide.html), and [Python](python-programming-guide.html) that make parallel jobs easy to write, and an optimized engine that supports general computation graphs.
    -It also supports a rich set of higher-level tools including [Shark](http://shark.cs.berkeley.edu) (Hive on Spark), [MLlib](mllib-guide.html) for machine learning, [GraphX](graphx-programming-guide.html) for graph processing, and [Spark Streaming](streaming-programming-guide.html).
    +It also supports a rich set of higher-level tools including [Spark SQL](sql-programming-guide.html) (SQL on Spark), [MLlib](mllib-guide.html) for machine learning, [GraphX](graphx-programming-guide.html) for graph processing, and [Spark Streaming](streaming-programming-guide.html).
    --- End diff --
    
    I made a similar change in #896, so it's probably best to remove this one.


---

[GitHub] spark pull request: Organize configuration docs

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/880#discussion_r13159214
  
    --- Diff: docs/configuration.md ---
    @@ -260,59 +328,44 @@ Apart from these, the following properties are also available, and may be useful
       <td><code>spark.rdd.compress</code></td>
       <td>false</td>
       <td>
    -    Whether to compress serialized RDD partitions (e.g. for <code>StorageLevel.MEMORY_ONLY_SER</code>).
    -    Can save substantial space at the cost of some extra CPU time.
    +    Whether to compress serialized RDD partitions (e.g. for
    +    <code>StorageLevel.MEMORY_ONLY_SER</code>). Can save substantial space at the cost of some
    +    extra CPU time.
       </td>
     </tr>
     <tr>
       <td><code>spark.io.compression.codec</code></td>
       <td>org.apache.spark.io.<br />LZFCompressionCodec</td>
       <td>
    -    The codec used to compress internal data such as RDD partitions and shuffle outputs. By default,
    -    Spark provides two codecs: <code>org.apache.spark.io.LZFCompressionCodec</code> and
    -    <code>org.apache.spark.io.SnappyCompressionCodec</code>.
    +    The codec used to compress internal data such as RDD partitions and shuffle outputs.
    +    By default, Spark provides two codecs: <code>org.apache.spark.io.LZFCompressionCodec</code>
    +    and <code>org.apache.spark.io.SnappyCompressionCodec</code>.
       </td>
     </tr>
     <tr>
       <td><code>spark.io.compression.snappy.block.size</code></td>
       <td>32768</td>
       <td>
    -    Block size (in bytes) used in Snappy compression, in the case when Snappy compression codec is
    -    used.
    +    Block size (in bytes) used in Snappy compression, in the case when Snappy compression codec
    +    is used.
       </td>
     </tr>
     <tr>
    -  <td><code>spark.scheduler.mode</code></td>
    -  <td>FIFO</td>
    -  <td>
    -    The <a href="job-scheduling.html#scheduling-within-an-application">scheduling mode</a> between
    -    jobs submitted to the same SparkContext. Can be set to <code>FAIR</code>
    -    to use fair sharing instead of queueing jobs one after another. Useful for
    -    multi-user services.
    -  </td>
    -</tr>
    -<tr>
    -  <td><code>spark.scheduler.revive.interval</code></td>
    -  <td>1000</td>
    -  <td>
    -    The interval length for the scheduler to revive the worker resource offers to run tasks. (in
    -    milliseconds)
    -  </td>
    -</tr>
    -<tr>
    -  <td><code>spark.reducer.maxMbInFlight</code></td>
    -  <td>48</td>
    +  <td><code>spark.closure.serializer</code></td>
    +  <td>org.apache.spark.serializer.<br />JavaSerializer</td>
       <td>
    -    Maximum size (in megabytes) of map outputs to fetch simultaneously from each reduce task. Since
    -    each output requires us to create a buffer to receive it, this represents a fixed memory
    -    overhead per reduce task, so keep it small unless you have a large amount of memory.
    +    Serializer class to use for closures. Currently only the Java serializer is supported.
    --- End diff --
    
    That's a good question :P I think this used to be changeable, but that was reverted. I'm not totally sure about the history of this, so I'm going to leave it, but it's worth seeing if we should just remove it.


---

[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44474264
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15266/


---

[GitHub] spark pull request: Organize configuration docs

Posted by ash211 <gi...@git.apache.org>.
Github user ash211 commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44198430
  
    Awesome! Thanks for navigating that merge. I'll take a look at what you've
    got later today but don't block on me for committing. I'll do a post-commit
    review if it's already in when I take a look.


---

[GitHub] spark pull request: Organize configuration docs

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44160359
  
    @ash211 Okay, I think this is up-to-date with your changes. I did move your security stuff over to the standalone cluster page, since almost every entry was only relevant to standalone mode. I also might have dropped a few cases where you improved some in-code indentation... I'll try to find every last one, but the diff was very complicated.


---

[GitHub] spark pull request: Organize configuration docs

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/880#discussion_r13159489
  
    --- Diff: docs/configuration.md ---
    @@ -705,42 +720,69 @@ Apart from these, the following properties are also available, and may be useful
       </td>
     </tr>
     <tr>
    -  <td><code>spark.task.cpus</code></td>
    -  <td>1</td>
    +  <td><code>spark.ui.filters</code></td>
    +  <td>None</td>
       <td>
    -    Number of cores to allocate for each task.
    +    Comma separated list of filter class names to apply to the Spark web ui. The filter should be a
    --- End diff --
    
    UI


---

[GitHub] spark pull request: Organize configuration docs

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/880#discussion_r13110259
  
    --- Diff: docs/configuration.md ---
    @@ -601,91 +626,59 @@ Apart from these, the following properties are also available, and may be useful
       </td>
     </tr>
     <tr>
    -  <td><code>spark.logConf</code></td>
    -  <td>false</td>
    -  <td>
    -    Whether to log the supplied SparkConf as INFO at start of spark context.
    -  </td>
    -</tr>
    -<tr>
    -  <td><code>spark.eventLog.enabled</code></td>
    -  <td>false</td>
    -  <td>
    -    Whether to log spark events, useful for reconstructing the Web UI after the application has
    -    finished.
    -  </td>
    -</tr>
    -<tr>
    -  <td><code>spark.eventLog.compress</code></td>
    -  <td>false</td>
    -  <td>
    -    Whether to compress logged events, if <code>spark.eventLog.enabled</code> is true.
    -  </td>
    -</tr>
    -<tr>
    -  <td><code>spark.eventLog.dir</code></td>
    -  <td>file:///tmp/spark-events</td>
    -  <td>
    -    Base directory in which spark events are logged, if <code>spark.eventLog.enabled</code> is true.
    -    Within this base directory, Spark creates a sub-directory for each application, and logs the
    -    events specific to the application in this directory.
    -  </td>
    -</tr>
    -<tr>
    -  <td><code>spark.deploy.spreadOut</code></td>
    -  <td>true</td>
    +  <td><code>spark.locality.wait</code></td>
    +  <td>3000</td>
       <td>
    -    Whether the standalone cluster manager should spread applications out across nodes or try to
    -    consolidate them onto as few nodes as possible. Spreading out is usually better for data
    -    locality in HDFS, but consolidating is more efficient for compute-intensive workloads. <br/>
    -    <b>Note:</b> this setting needs to be configured in the standalone cluster master, not in
    -    individual applications; you can set it through <code>SPARK_MASTER_OPTS</code> in
    -    <code>spark-env.sh</code>.
    +    Number of milliseconds to wait to launch a data-local task before giving up and launching it
    +    on a less-local node. The same wait will be used to step through multiple locality levels
    +    (process-local, node-local, rack-local and then any). It is also possible to customize the
    +    waiting time for each level by setting <code>spark.locality.wait.node</code>, etc.
    +    You should increase this setting if your tasks are long and see poor locality, but the
    +    default usually works well.
       </td>
     </tr>
     <tr>
    -  <td><code>spark.deploy.defaultCores</code></td>
    -  <td>(infinite)</td>
    +  <td><code>spark.locality.wait.process</code></td>
    +  <td>spark.locality.wait</td>
       <td>
    -    Default number of cores to give to applications in Spark's standalone mode if they don't set
    -    <code>spark.cores.max</code>. If not set, applications always get all available cores unless
    -    they configure <code>spark.cores.max</code> themselves.  Set this lower on a shared cluster to
    -    prevent users from grabbing the whole cluster by default. <br/> <b>Note:</b> this setting needs
    -    to be configured in the standalone cluster master, not in individual applications; you can set
    -    it through <code>SPARK_MASTER_OPTS</code> in <code>spark-env.sh</code>.
    +    Customize the locality wait for process locality. This affects tasks that attempt to access
    +    cached data in a particular executor process.
       </td>
     </tr>
     <tr>
    -  <td><code>spark.files.overwrite</code></td>
    -  <td>false</td>
    +  <td><code>spark.locality.wait.node</code></td>
    +  <td>spark.locality.wait</td>
       <td>
    -    Whether to overwrite files added through SparkContext.addFile() when the target file exists and
    -    its contents do not match those of the source.
    +    Customize the locality wait for node locality. For example, you can set this to 0 to skip
    +    node locality and search immediately for rack locality (if your cluster has rack information).
       </td>
     </tr>
     <tr>
    -  <td><code>spark.files.fetchTimeout</code></td>
    -  <td>false</td>
    +  <td><code>spark.locality.wait.rack</code></td>
    +  <td>spark.locality.wait</td>
       <td>
    -    Communication timeout to use when fetching files added through SparkContext.addFile() from
    -    the driver.
    +    Customize the locality wait for rack locality.
       </td>
     </tr>
     <tr>
    -  <td><code>spark.files.userClassPathFirst</code></td>
    -  <td>false</td>
    +  <td><code>spark.scheduler.revive.interval</code></td>
    +  <td>1000</td>
       <td>
    -    (Experimental) Whether to give user-added jars precedence over Spark's own jars when
    -    loading classes in Executors. This feature can be used to mitigate conflicts between
    -    Spark's dependencies and user dependencies. It is currently an experimental feature.
    +    The interval length for the scheduler to revive the worker resource offers to run tasks.
    +    (in milliseconds)
       </td>
     </tr>
    +</table>
    +
    +#### Security
    +<table class="table">
    +<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
     <tr>
       <td><code>spark.authenticate</code></td>
       <td>false</td>
       <td>
    -    Whether spark authenticates its internal connections. See <code>spark.authenticate.secret</code>
    -    if not running on Yarn.
    +    Whether spark authenticates its internal connections. See
    +    <code>spark.authenticate.secret</code> if not running on Yarn.
    --- End diff --
    
    Capitalize YARN and Spark throughout this doc


---

[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44158456
  
     Build triggered. 


---

[GitHub] spark pull request: Organize configuration docs

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/880#discussion_r13110228
  
    --- Diff: docs/configuration.md ---
    @@ -151,41 +230,43 @@ Apart from these, the following properties are also available, and may be useful
       </td>
     </tr>
     <tr>
    -  <td><code>spark.storage.memoryMapThreshold</code></td>
    -  <td>8192</td>
    +  <td><code>spark.shuffle.compress</code></td>
    +  <td>true</td>
       <td>
    -    Size of a block, in bytes, above which Spark memory maps when reading a block from disk.
    -    This prevents Spark from memory mapping very small blocks. In general, memory
    -    mapping has high overhead for blocks close to or below the page size of the operating system.
    +    Whether to compress map output files. Generally a good idea.
       </td>
     </tr>
     <tr>
    -  <td><code>spark.tachyonStore.baseDir</code></td>
    -  <td>System.getProperty("java.io.tmpdir")</td>
    +  <td><code>spark.shuffle.file.buffer.kb</code></td>
    +  <td>100</td>
       <td>
    -    Directories of the Tachyon File System that store RDDs. The Tachyon file system's URL is set by
    -    <code>spark.tachyonStore.url</code>.  It can also be a comma-separated list of multiple
    -    directories on Tachyon file system.
    +    Size of the in-memory buffer for each shuffle file output stream, in kilobytes. These buffers
    +    reduce the number of disk seeks and system calls made in creating intermediate shuffle files.
       </td>
     </tr>
     <tr>
    -  <td><code>spark.tachyonStore.url</code></td>
    -  <td>tachyon://localhost:19998</td>
    +  <td><code>spark.storage.memoryMapThreshold</code></td>
    --- End diff --
    
    This might make more sense under Execution Behavior


---

[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44160024
  
    Build finished. All automated tests passed.


---

[GitHub] spark pull request: Organize configuration docs

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/880#discussion_r13110123
  
    --- Diff: docs/configuration.md ---
    @@ -51,17 +66,32 @@ appear. For all other configuration properties, you can assume the default value
     
     ## All Configuration Properties
    --- End diff --
    
    Maybe rename this to Available Properties


---

[GitHub] spark pull request: Organize configuration docs

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/880


---

[GitHub] spark pull request: Organize configuration docs

Posted by ash211 <gi...@git.apache.org>.
Github user ash211 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/880#discussion_r13115497
  
    --- Diff: docs/configuration.md ---
    @@ -260,59 +328,44 @@ Apart from these, the following properties are also available, and may be useful
       <td><code>spark.rdd.compress</code></td>
       <td>false</td>
       <td>
    -    Whether to compress serialized RDD partitions (e.g. for <code>StorageLevel.MEMORY_ONLY_SER</code>).
    -    Can save substantial space at the cost of some extra CPU time.
    +    Whether to compress serialized RDD partitions (e.g. for
    +    <code>StorageLevel.MEMORY_ONLY_SER</code>). Can save substantial space at the cost of some
    +    extra CPU time.
       </td>
     </tr>
     <tr>
       <td><code>spark.io.compression.codec</code></td>
       <td>org.apache.spark.io.<br />LZFCompressionCodec</td>
       <td>
    -    The codec used to compress internal data such as RDD partitions and shuffle outputs. By default,
    -    Spark provides two codecs: <code>org.apache.spark.io.LZFCompressionCodec</code> and
    -    <code>org.apache.spark.io.SnappyCompressionCodec</code>.
    +    The codec used to compress internal data such as RDD partitions and shuffle outputs.
    +    By default, Spark provides two codecs: <code>org.apache.spark.io.LZFCompressionCodec</code>
    +    and <code>org.apache.spark.io.SnappyCompressionCodec</code>.
       </td>
     </tr>
     <tr>
       <td><code>spark.io.compression.snappy.block.size</code></td>
       <td>32768</td>
       <td>
    -    Block size (in bytes) used in Snappy compression, in the case when Snappy compression codec is
    -    used.
    +    Block size (in bytes) used in Snappy compression, in the case when Snappy compression codec
    +    is used.
       </td>
     </tr>
     <tr>
    -  <td><code>spark.scheduler.mode</code></td>
    -  <td>FIFO</td>
    -  <td>
    -    The <a href="job-scheduling.html#scheduling-within-an-application">scheduling mode</a> between
    -    jobs submitted to the same SparkContext. Can be set to <code>FAIR</code>
    -    to use fair sharing instead of queueing jobs one after another. Useful for
    -    multi-user services.
    -  </td>
    -</tr>
    -<tr>
    -  <td><code>spark.scheduler.revive.interval</code></td>
    -  <td>1000</td>
    -  <td>
    -    The interval length for the scheduler to revive the worker resource offers to run tasks. (in
    -    milliseconds)
    -  </td>
    -</tr>
    -<tr>
    -  <td><code>spark.reducer.maxMbInFlight</code></td>
    -  <td>48</td>
    +  <td><code>spark.closure.serializer</code></td>
    +  <td>org.apache.spark.serializer.<br />JavaSerializer</td>
       <td>
    -    Maximum size (in megabytes) of map outputs to fetch simultaneously from each reduce task. Since
    -    each output requires us to create a buffer to receive it, this represents a fixed memory
    -    overhead per reduce task, so keep it small unless you have a large amount of memory.
    +    Serializer class to use for closures. Currently only the Java serializer is supported.
    --- End diff --
    
    Why bother having a configuration option if you can't change it?


---

[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44160434
  
     Merged build triggered. 


---

[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44470648
  
     Merged build triggered. 


---

[GitHub] spark pull request: Organize configuration docs

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44159245
  
    Hey Andrew, I actually meant to rebase on master before starting this precisely because of #856, but now I see that I missed it. I'll go back and make sure I include all your changes.


---

[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44158461
  
    Build started. 


---

[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44163612
  
    Merged build finished. All automated tests passed.


---

[GitHub] spark pull request: Organize configuration docs

Posted by ash211 <gi...@git.apache.org>.
Github user ash211 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/880#discussion_r13115427
  
    --- Diff: docs/configuration.md ---
    @@ -151,41 +230,43 @@ Apart from these, the following properties are also available, and may be useful
       </td>
     </tr>
     <tr>
    -  <td><code>spark.storage.memoryMapThreshold</code></td>
    -  <td>8192</td>
    +  <td><code>spark.shuffle.compress</code></td>
    +  <td>true</td>
       <td>
    -    Size of a block, in bytes, above which Spark memory maps when reading a block from disk.
    -    This prevents Spark from memory mapping very small blocks. In general, memory
    -    mapping has high overhead for blocks close to or below the page size of the operating system.
    +    Whether to compress map output files. Generally a good idea.
    --- End diff --
    
    Putting the compression algorithm here would be good. I think this is controlled by a setting somewhere?
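
    (For what it's worth, the codec appears to be the one named by
    spark.io.compression.codec, which this doc describes as compressing
    "internal data such as RDD partitions and shuffle outputs"; LZF is the
    default, Snappy the alternative.) A rough Scala sketch with placeholder
    names, switching the codec used for map output compression:

        import org.apache.spark.{SparkConf, SparkContext}
        import org.apache.spark.SparkContext._        // pair-RDD operations such as groupByKey

        object ShuffleCompressionDemo {
          def main(args: Array[String]): Unit = {
            val conf = new SparkConf()
              .setMaster("local[2]")                  // local test only
              .setAppName("ShuffleCompressionDemo")   // placeholder name
              .set("spark.shuffle.compress", "true")  // already the default
              .set("spark.io.compression.codec",      // selects the algorithm used for that compression
                   "org.apache.spark.io.SnappyCompressionCodec")
            val sc = new SparkContext(conf)
            val pairs = sc.parallelize(1 to 10000).map(x => (x % 10, x))
            pairs.groupByKey().count()                // forces a shuffle, writing compressed map output
            sc.stop()
          }
        }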


---

[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44163613
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15205/


---

[GitHub] spark pull request: Organize configuration docs

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/880#discussion_r13110289
  
    --- Diff: docs/configuration.md ---
    @@ -705,40 +698,69 @@ Apart from these, the following properties are also available, and may be useful
       </td>
     </tr>
     <tr>
    -  <td><code>spark.task.cpus</code></td>
    -  <td>1</td>
    +  <td><code>spark.ui.filters</code></td>
    +  <td>None</td>
       <td>
    -    Number of cores to allocate for each task.
    +    Comma separated list of filter class names to apply to the Spark web ui. The filter should be a
    +    standard javax servlet Filter. Parameters to each filter can also be specified by setting a
    +    java system property of spark.&lt;class name of filter&gt;.params='param1=value1,param2=value2'
    +    (e.g. -Dspark.ui.filters=com.test.filter1
    +    -Dspark.com.test.filter1.params='param1=foo,param2=testing')
       </td>
     </tr>
     <tr>
    -  <td><code>spark.executor.extraJavaOptions</code></td>
    -  <td>(none)</td>
    +  <td><code>spark.ui.acls.enable</code></td>
    +  <td>false</td>
       <td>
    -    A string of extra JVM options to pass to executors. For instance, GC settings or other
    -    logging. Note that it is illegal to set Spark properties or heap size settings with this 
    -    option. Spark properties should be set using a SparkConf object or the 
    -    spark-defaults.conf file used with the spark-submit script. Heap size settings can be set
    -    with spark.executor.memory.
    +    Whether spark web ui acls should are enabled. If enabled, this checks to see if the user has
    +    access permissions to view the web ui. See <code>spark.ui.view.acls</code> for more details.
    +    Also note this requires the user to be known, if the user comes across as null no checks
    +    are done. Filters can be used to authenticate and set the user.
       </td>
     </tr>
     <tr>
    -  <td><code>spark.executor.extraClassPath</code></td>
    -  <td>(none)</td>
    +  <td><code>spark.ui.view.acls</code></td>
    +  <td>Empty</td>
       <td>
    -    Extra classpath entries to append to the classpath of executors. This exists primarily
    -    for backwards-compatibility with older versions of Spark. Users typically should not need
    -    to set this option.
    +    Comma separated list of users that have view access to the spark web ui. By default only the
    +    user that started the Spark job has view access.
       </td>
     </tr>
    +</table>
    +
    +#### Spark Streaming
    +<table class="table">
    +<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
     <tr>
    -  <td><code>spark.executor.extraLibraryPath</code></td>
    -  <td>(none)</td>
    +  <td><code>spark.cleaner.ttl</code></td>
    --- End diff --
    
    This is not really a streaming-specific setting. I'd just move it into Execution Behavior, since it's also useful for other long-running apps.


---

[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44161382
  
     Merged build triggered. 


---

[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44160217
  
     Merged build triggered. 


---

[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44160239
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15202/


---

[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44477622
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15267/


---

[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44161392
  
    Merged build started. 


---

[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44160025
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15201/


---

[GitHub] spark pull request: Organize configuration docs

Posted by ash211 <gi...@git.apache.org>.
Github user ash211 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/880#discussion_r13115420
  
    --- Diff: docs/configuration.md ---
    @@ -94,49 +127,95 @@ there are at least five properties that you will commonly want to control:
         comma-separated list of multiple directories on different disks.
     
         NOTE: In Spark 1.0 and later this will be overriden by SPARK_LOCAL_DIRS (Standalone, Mesos) or
    -    LOCAL_DIRS (YARN) envrionment variables set by the cluster manager.
    +    LOCAL_DIRS (YARN) environment variables set by the cluster manager.
       </td>
     </tr>
     <tr>
    -  <td><code>spark.cores.max</code></td>
    -  <td>(not set)</td>
    +  <td><code>spark.logConf</code></td>
    +  <td>false</td>
       <td>
    -    When running on a <a href="spark-standalone.html">standalone deploy cluster</a> or a
    -    <a href="running-on-mesos.html#mesos-run-modes">Mesos cluster in "coarse-grained"
    -    sharing mode</a>, the maximum amount of CPU cores to request for the application from
    -    across the cluster (not from each machine). If not set, the default will be
    -    <code>spark.deploy.defaultCores</code> on Spark's standalone cluster manager, or
    -    infinite (all available cores) on Mesos.
    +    Logs the effective SparkConf as INFO when a SparkContext is started.
       </td>
     </tr>
     </table>
     
    -
     Apart from these, the following properties are also available, and may be useful in some situations:
     
    +#### Runtime Environment
     <table class="table">
     <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
     <tr>
    -  <td><code>spark.default.parallelism</code></td>
    +  <td><code>spark.executor.memory</code></td>
    +  <td>512m</td>
       <td>
    -    <ul>
    -      <li>Local mode: number of cores on the local machine</li>
    -      <li>Mesos fine grained mode: 8</li>
    -      <li>Others: total number of cores on all executor nodes or 2, whichever is larger</li>
    -    </ul>
    +    Amount of memory to use per executor process, in the same format as JVM memory strings
    +    (e.g. <code>512m</code>, <code>2g</code>).
       </td>
    +</tr>
    +<tr>
    +  <td><code>spark.executor.extraJavaOptions</code></td>
    +  <td>(none)</td>
       <td>
    -    Default number of tasks to use across the cluster for distributed shuffle operations
    -    (<code>groupByKey</code>, <code>reduceByKey</code>, etc) when not set by user.
    +    A string of extra JVM options to pass to executors. For instance, GC settings or other
    +    logging. Note that it is illegal to set Spark properties or heap size settings with this
    +    option. Spark properties should be set using a SparkConf object or the
    +    spark-defaults.conf file used with the spark-submit script. Heap size settings can be set
    +    with spark.executor.memory.
       </td>
     </tr>
     <tr>
    -  <td><code>spark.storage.memoryFraction</code></td>
    -  <td>0.6</td>
    +  <td><code>spark.executor.extraClassPath</code></td>
    +  <td>(none)</td>
       <td>
    -    Fraction of Java heap to use for Spark's memory cache. This should not be larger than the "old"
    -    generation of objects in the JVM, which by default is given 0.6 of the heap, but you can increase
    -    it if you configure your own old generation size.
    +    Extra classpath entries to append to the classpath of executors. This exists primarily
    +    for backwards-compatibility with older versions of Spark. Users typically should not need
    +    to set this option.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.executor.extraLibraryPath</code></td>
    +  <td>(none)</td>
    +  <td>
    +    Set a special library path to use when launching executor JVM's.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.files.userClassPathFirst</code></td>
    +  <td>false</td>
    +  <td>
    +    (Experimental) Whether to give user-added jars precedence over Spark's own jars when
    +    loading classes in Executors. This feature can be used to mitigate conflicts between
    +    Spark's dependencies and user dependencies. It is currently an experimental feature.
    +  </td>
    +</tr>
    +</table>
    +
    +#### Shuffle Behavior
    +<table class="table">
    +<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
    +<tr>
    +  <td><code>spark.shuffle.consolidateFiles</code></td>
    +  <td>false</td>
    +  <td>
    +    If set to "true", consolidates intermediate files created during a shuffle. Creating fewer
    +    files can improve filesystem performance for shuffles with large numbers of reduce tasks. It
    +    is recommended to set this to "true" when using ext4 or xfs filesystems. On ext3, this option
    +    might degrade performance on machines with many (>8) cores due to filesystem limitations.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.shuffle.spill</code></td>
    +  <td>true</td>
    +  <td>
    +    If set to "true", limits the amount of memory used during reduces by spilling data out to disk.
    +    This spilling threshold is specified by <code>spark.shuffle.memoryFraction</code>.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.shuffle.spill.compress</code></td>
    +  <td>true</td>
    +  <td>
    +    Whether to compress data spilled during shuffles.
    --- End diff --
    
    What compression algorithm is used here?


---

[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44158624
  
    Build started. 


---

[GitHub] spark pull request: Organize configuration docs

Posted by ash211 <gi...@git.apache.org>.
Github user ash211 commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44365341
  
    Thanks for the further docs organization, @pwendell!


---

[GitHub] spark pull request: Organize configuration docs

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/880#discussion_r13159367
  
    --- Diff: docs/configuration.md ---
    @@ -705,42 +720,69 @@ Apart from these, the following properties are also available, and may be useful
       </td>
     </tr>
     <tr>
    -  <td><code>spark.task.cpus</code></td>
    -  <td>1</td>
    +  <td><code>spark.ui.filters</code></td>
    +  <td>None</td>
       <td>
    -    Number of cores to allocate for each task.
    +    Comma separated list of filter class names to apply to the Spark web ui. The filter should be a
    +    standard <a href="http://docs.oracle.com/javaee/6/api/javax/servlet/Filter.html">
    +    javax servlet Filter</a>. Parameters to each filter can also be specified by setting a
    +    java system property of spark.&lt;class name of filter&gt;.params='param1=value1,param2=value2'
    +    (e.g. -Dspark.ui.filters=com.test.filter1
    +    -Dspark.com.test.filter1.params='param1=foo,param2=testing')
       </td>
     </tr>
     <tr>
    -  <td><code>spark.executor.extraJavaOptions</code></td>
    -  <td>(none)</td>
    +  <td><code>spark.ui.acls.enable</code></td>
    +  <td>false</td>
       <td>
    -    A string of extra JVM options to pass to executors. For instance, GC settings or other
    -    logging. Note that it is illegal to set Spark properties or heap size settings with this 
    -    option. Spark properties should be set using a SparkConf object or the 
    -    spark-defaults.conf file used with the spark-submit script. Heap size settings can be set
    -    with spark.executor.memory.
    +    Whether Spark web ui acls should are enabled. If enabled, this checks to see if the user has
    +    access permissions to view the web ui. See <code>spark.ui.view.acls</code> for more details.
    +    Also note this requires the user to be known, if the user comes across as null no checks
    +    are done. Filters can be used to authenticate and set the user.
       </td>
     </tr>
     <tr>
    -  <td><code>spark.executor.extraClassPath</code></td>
    -  <td>(none)</td>
    +  <td><code>spark.ui.view.acls</code></td>
    +  <td>Empty</td>
       <td>
    -    Extra classpath entries to append to the classpath of executors. This exists primarily
    -    for backwards-compatibility with older versions of Spark. Users typically should not need
    -    to set this option.
    +    Comma separated list of users that have view access to the Spark web ui. By default only the
    +    user that started the Spark job has view access.
       </td>
     </tr>
    +</table>
    +
    +#### Spark Streaming
    +<table class="table">
    +<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
     <tr>
    -  <td><code>spark.executor.extraLibraryPath</code></td>
    -  <td>(none)</td>
    +  <td><code>spark.streaming.blockInterval</code></td>
    +  <td>200</td>
       <td>
    -    Set a special library path to use when launching executor JVM's.
    +    Interval (milliseconds) at which data received by Spark Streaming receivers is coalesced
    +    into blocks of data before storing them in Spark.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.streaming.unpersist</code></td>
    +  <td>true</td>
    +  <td>
    +    Force RDDs generated and persisted by Spark Streaming to be automatically unpersisted from
    +    Spark's memory. The raw input data received by Spark Streaming is also automatically cleared.
    +    Setting this to false will allow the raw data and persisted RDDs to be accessible outside the
    +    streaming application as they will not be cleared automatically. But it comes at the cost of
    +    higher memory usage in Spark.
       </td>
     </tr>
    -
     </table>
     
    +#### Cluster Managers (YARN, Mesos, Standalone)
    +Each cluster manager in Spark has additional configuration options. Configurations 
    +can be found on the pages for each mode:
    +
    + * [Yarn](running-on-yarn.html#configuration)
    --- End diff --
    
    Should say YARN
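
    For reference, a minimal Scala sketch of how the Spark Streaming properties documented above might be set; this is not from the PR, and the values are illustrative only, not recommendations.

        import org.apache.spark.SparkConf

        // Illustrative values: coalesce received data into 500 ms blocks instead
        // of the default 200 ms, and keep raw input and persisted RDDs around
        // rather than clearing them automatically (at the cost of more memory).
        val conf = new SparkConf()
          .setAppName("StreamingConfigSketch")            // hypothetical app name
          .set("spark.streaming.blockInterval", "500")    // milliseconds
          .set("spark.streaming.unpersist", "false")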



[GitHub] spark pull request: Organize configuration docs

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44474754
  
    Okay I'm going to merge this. Thanks Matei!



[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44162519
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: Organize configuration docs

Posted by ash211 <gi...@git.apache.org>.
Github user ash211 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/880#discussion_r13115464
  
    --- Diff: docs/configuration.md ---
    @@ -201,54 +282,41 @@ Apart from these, the following properties are also available, and may be useful
       </td>
     </tr>
     <tr>
    -  <td><code>spark.ui.filters</code></td>
    -  <td>None</td>
    +  <td><code>spark.ui.killEnabled</code></td>
    +  <td>true</td>
       <td>
    -    Comma separated list of filter class names to apply to the Spark web ui. The filter should be a
    -    standard javax servlet Filter. Parameters to each filter can also be specified by setting a
    -    java system property of spark.&lt;class name of filter&gt;.params='param1=value1,param2=value2'
    -    (e.g. -Dspark.ui.filters=com.test.filter1 -Dspark.com.test.filter1.params='param1=foo,param2=testing')
    +    Allows stages and corresponding jobs to be killed from the web ui.
       </td>
     </tr>
     <tr>
    -  <td><code>spark.ui.acls.enable</code></td>
    +  <td><code>spark.eventLog.enabled</code></td>
       <td>false</td>
       <td>
    -    Whether spark web ui acls should are enabled. If enabled, this checks to see if the user has
    -    access permissions to view the web ui. See <code>spark.ui.view.acls</code> for more details.
    -    Also note this requires the user to be known, if the user comes across as null no checks
    -    are done. Filters can be used to authenticate and set the user.
    -  </td>
    -</tr>
    -<tr>
    -  <td><code>spark.ui.view.acls</code></td>
    -  <td>Empty</td>
    -  <td>
    -    Comma separated list of users that have view access to the spark web ui. By default only the
    -    user that started the Spark job has view access.
    -  </td>
    -</tr>
    -<tr>
    -  <td><code>spark.ui.killEnabled</code></td>
    -  <td>true</td>
    -  <td>
    -    Allows stages and corresponding jobs to be killed from the web ui.
    +    Whether to log spark events, useful for reconstructing the Web UI after the application has
    +    finished.
       </td>
     </tr>
     <tr>
    -  <td><code>spark.shuffle.compress</code></td>
    -  <td>true</td>
    +  <td><code>spark.eventLog.compress</code></td>
    +  <td>false</td>
       <td>
    -    Whether to compress map output files. Generally a good idea.
    +    Whether to compress logged events, if <code>spark.eventLog.enabled</code> is true.
       </td>
     </tr>
     <tr>
    -  <td><code>spark.shuffle.spill.compress</code></td>
    -  <td>true</td>
    +  <td><code>spark.eventLog.dir</code></td>
    +  <td>file:///tmp/spark-events</td>
    --- End diff --
    
    The `file:///` URI implies to me that I could put HDFS or S3 URIs here.  Is that allowed?
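
    For reference, a sketch of enabling event logging with the documented local-filesystem default; whether an hdfs:// or s3 URI is accepted for spark.eventLog.dir is exactly the question above, so no such URI is assumed here.

        import org.apache.spark.SparkConf

        // Sketch only: turn on event logging to the documented default location.
        // spark.eventLog.compress applies only when spark.eventLog.enabled is true.
        val conf = new SparkConf()
          .set("spark.eventLog.enabled", "true")
          .set("spark.eventLog.dir", "file:///tmp/spark-events")
          .set("spark.eventLog.compress", "true")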



[GitHub] spark pull request: Organize configuration docs

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44159535
  
    @ash211 Hey, by the way, are all of the changes in your PR just formatting-related, other than the environment variable sections and the table of contents? I noticed you changed the line breaks in a bunch of places. Was this just for line width, or did you modify content as well?



[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44160444
  
    Merged build started. 



[GitHub] spark pull request: Organize configuration docs

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/880#discussion_r13159409
  
    --- Diff: docs/configuration.md ---
    @@ -774,104 +816,16 @@ The following variables can be set in `spark-env.sh`:
       </tr>
     </table>
     
    -In addition to the above, there are also options for setting up the Spark [standalone cluster
    -scripts](spark-standalone.html#cluster-launch-scripts), such as number of cores to use on each
    -machine and maximum memory.
    +In addition to the above, there are also options for setting up the Spark
    +[standalone cluster scripts](spark-standalone.html#cluster-launch-scripts), such as number of cores
    +to use on each machine and maximum memory.
     
    -Since `spark-env.sh` is a shell script, some of these can be set programmatically -- for example,
    -you might compute `SPARK_LOCAL_IP` by looking up the IP of a specific network interface.
    +Since `spark-env.sh` is a shell script, some of these can be set programmatically -- for example, you might
    +compute `SPARK_LOCAL_IP` by looking up the IP of a specific network interface.
     
     # Configuring Logging
     
     Spark uses [log4j](http://logging.apache.org/log4j/) for logging. You can configure it by adding a
     `log4j.properties` file in the `conf` directory. One way to start is to copy the existing
     `log4j.properties.template` located there.
    -
    -# Configuring Ports for Network Security
    -
    -Spark makes heavy use of the network, and some environments have strict requirements for using tight
    -firewall settings.  Below are the primary ports that Spark uses for its communication and how to
    -configure those ports.
    -
    -<table class="table">
    -  <tr>
    -    <th>From</th><th>To</th><th>Default Port</th><th>Purpose</th><th>Configuration
    -    Setting</th><th>Notes</th>
    -  </tr>
    -  <!-- Web UIs -->
    -  <tr>
    -    <td>Browser</td>
    -    <td>Standalone Cluster Master</td>
    -    <td>8080</td>
    -    <td>Web UI</td>
    -    <td><code>master.ui.port</code></td>
    -    <td>Jetty-based</td>
    -  </tr>
    -  <tr>
    -    <td>Browser</td>
    -    <td>Worker</td>
    -    <td>8081</td>
    -    <td>Web UI</td>
    -    <td><code>worker.ui.port</code></td>
    -    <td>Jetty-based</td>
    -  </tr>
    -  <tr>
    -    <td>Browser</td>
    -    <td>Driver</td>
    -    <td>4040</td>
    -    <td>Web UI</td>
    -    <td><code>spark.ui.port</code></td>
    -    <td>Jetty-based</td>
    -  </tr>
    -  <tr>
    -    <td>Browser</td>
    -    <td>History Server</td>
    -    <td>18080</td>
    -    <td>Web UI</td>
    -    <td><code>spark.history.ui.port</code></td>
    -    <td>Jetty-based</td>
    -  </tr>
    -
    -  <!-- Cluster interactions -->
    -  <tr>
    -    <td>Application</td>
    -    <td>Standalone Cluster Master</td>
    -    <td>7077</td>
    -    <td>Submit job to cluster</td>
    -    <td><code>spark.driver.port</code></td>
    -    <td>Akka-based.  Set to "0" to choose a port randomly</td>
    -  </tr>
    -  <tr>
    -    <td>Worker</td>
    -    <td>Standalone Cluster Master</td>
    -    <td>7077</td>
    -    <td>Join cluster</td>
    -    <td><code>spark.driver.port</code></td>
    -    <td>Akka-based.  Set to "0" to choose a port randomly</td>
    -  </tr>
    -  <tr>
    -    <td>Application</td>
    -    <td>Worker</td>
    -    <td>(random)</td>
    -    <td>Join cluster</td>
    -    <td><code>SPARK_WORKER_PORT</code> (standalone cluster)</td>
    -    <td>Akka-based</td>
    -  </tr>
    -
    -  <!-- Other misc stuff -->
    -  <tr>
    -    <td>Driver and other Workers</td>
    -    <td>Worker</td>
    -    <td>(random)</td>
    -    <td>
    -      <ul>
    -        <li>File server for file and jars</li>
    -        <li>Http Broadcast</li>
    -        <li>Class file server (Spark Shell only)</li>
    -      </ul>
    -    </td>
    -    <td>None</td>
    -    <td>Jetty-based.  Each of these services starts on a random port that cannot be configured</td>
    -  </tr>
    -
     </table>
    --- End diff --
    
    You need to delete this `</table>` too.
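
    As an aside, a sketch of pinning the configurable driver-side ports named in the table above; the port numbers are illustrative choices for a locked-down firewall, not values from the table.

        import org.apache.spark.SparkConf

        // Uses the property names from the ports table above.
        val conf = new SparkConf()
          .set("spark.ui.port", "4041")        // driver web UI (Jetty-based, default 4040)
          .set("spark.driver.port", "51000")   // Akka-based; "0" chooses a port randomly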



[GitHub] spark pull request: Organize configuration docs

Posted by ash211 <gi...@git.apache.org>.
Github user ash211 commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44198224
  
    It was just line-wrap changes other than the network ports addition and TOC
    sections. I probably should have made the formatting a separate PR, but I figured
    I was already mucking around, so I just put those in as well.
    On May 25, 2014 11:01 PM, "Patrick Wendell" <no...@github.com>
    wrote:
    
    > @ash211 <https://github.com/ash211> Hey by the way, are all of the
    > changes in your PR just formatting related other than the environment
    > varaible sections and the table of contents? I noticed you changed the
    > linebreaks around in a bunch of places, was this just for line width or did
    > you modify content as well?
    >
    > —
    > Reply to this email directly or view it on GitHub<https://github.com/apache/spark/pull/880#issuecomment-44159535>
    > .
    >



[GitHub] spark pull request: Organize configuration docs

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/880#discussion_r13110156
  
    --- Diff: docs/configuration.md ---
    @@ -51,17 +66,32 @@ appear. For all other configuration properties, you can assume the default value
     
     ## All Configuration Properties
     
    -Most of the properties that control internal settings have reasonable default values. However,
    -there are at least five properties that you will commonly want to control:
    +Most of the properties that control internal settings have reasonable default values. Some
    +of the most common options to set are:
     
     <table class="table">
     <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
     <tr>
    +  <td><strong><code>spark.app.name</code></strong></td>
    +  <td>(none)</td>
    +  <td>
    +    The name of your application. This will appear in the UI and in log data.
    +  </td>
    +</tr>
    +<tr>
    +  <td><strong><code>spark.master</code></strong></td>
    +  <td>(none)</td>
    +  <td>
    +    The cluster manager to connect to. See the list of
    +    <a href="scala-programming-guide.html#master-urls"> allowed master URL's</a>.
    +  </td>
    +</tr>
    +<tr>
    --- End diff --
    
    Why did you make these bold? I don't think they should be. If you want, you can add (required).
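
    For reference, a minimal sketch of setting these two common options programmatically; the app name and master URL below are examples, not values from the PR.

        import org.apache.spark.{SparkConf, SparkContext}

        val conf = new SparkConf()
          .setMaster("local[4]")            // sets spark.master to a local master URL
          .setAppName("ConfigDocsExample")  // sets spark.app.name (hypothetical name)
        val sc = new SparkContext(conf)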



[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44474263
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: Organize configuration docs

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/880#discussion_r13159518
  
    --- Diff: docs/configuration.md ---
    @@ -705,42 +720,69 @@ Apart from these, the following properties are also available, and may be useful
       </td>
     </tr>
     <tr>
    -  <td><code>spark.task.cpus</code></td>
    -  <td>1</td>
    +  <td><code>spark.ui.filters</code></td>
    +  <td>None</td>
       <td>
    -    Number of cores to allocate for each task.
    +    Comma separated list of filter class names to apply to the Spark web ui. The filter should be a
    +    standard <a href="http://docs.oracle.com/javaee/6/api/javax/servlet/Filter.html">
    +    javax servlet Filter</a>. Parameters to each filter can also be specified by setting a
    +    java system property of spark.&lt;class name of filter&gt;.params='param1=value1,param2=value2'
    +    (e.g. -Dspark.ui.filters=com.test.filter1
    +    -Dspark.com.test.filter1.params='param1=foo,param2=testing')
    --- End diff --
    
    These look weird now; they might be better with a `<br />` before each line.
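
    For reference, the same filter example expressed programmatically rather than with -D flags; this is a sketch only, and it assumes the properties are set before the SparkContext (and hence the web UI) is created. com.test.filter1 is the placeholder class name from the snippet, not a real filter.

        // Mirrors -Dspark.ui.filters=... and -Dspark.com.test.filter1.params=...
        System.setProperty("spark.ui.filters", "com.test.filter1")
        System.setProperty("spark.com.test.filter1.params", "param1=foo,param2=testing")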



[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44162520
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15204/



[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44160656
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15203/



[GitHub] spark pull request: Organize configuration docs

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/880#discussion_r13110341
  
    --- Diff: docs/spark-standalone.md ---
    @@ -144,6 +152,72 @@ You can optionally configure the cluster further by setting environment variable
     
     **Note:** The launch scripts do not currently support Windows. To run a Spark cluster on Windows, start the master and workers by hand.
     
    +SPARK_MASTER_OPTS supports the following system properties:
    +
    +<table class="table">
    +<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
    +<tr>
    +  <td>spark.deploy.spreadOut</td>
    --- End diff --
    
    Add <code> around these as well to make them match the formatting in configuration.md



[GitHub] spark pull request: Organize configuration docs

Posted by ash211 <gi...@git.apache.org>.
Github user ash211 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/880#discussion_r13115487
  
    --- Diff: docs/configuration.md ---
    @@ -260,59 +328,44 @@ Apart from these, the following properties are also available, and may be useful
       <td><code>spark.rdd.compress</code></td>
       <td>false</td>
       <td>
    -    Whether to compress serialized RDD partitions (e.g. for <code>StorageLevel.MEMORY_ONLY_SER</code>).
    -    Can save substantial space at the cost of some extra CPU time.
    +    Whether to compress serialized RDD partitions (e.g. for
    +    <code>StorageLevel.MEMORY_ONLY_SER</code>). Can save substantial space at the cost of some
    +    extra CPU time.
       </td>
     </tr>
     <tr>
       <td><code>spark.io.compression.codec</code></td>
       <td>org.apache.spark.io.<br />LZFCompressionCodec</td>
       <td>
    -    The codec used to compress internal data such as RDD partitions and shuffle outputs. By default,
    -    Spark provides two codecs: <code>org.apache.spark.io.LZFCompressionCodec</code> and
    -    <code>org.apache.spark.io.SnappyCompressionCodec</code>.
    +    The codec used to compress internal data such as RDD partitions and shuffle outputs.
    +    By default, Spark provides two codecs: <code>org.apache.spark.io.LZFCompressionCodec</code>
    +    and <code>org.apache.spark.io.SnappyCompressionCodec</code>.
    --- End diff --
    
    Any guidance on when to use Snappy instead of the default LZF?
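
    For reference, a sketch of switching to the Snappy codec named in this row; no claim is made here about when it is preferable, which is the open question.

        import org.apache.spark.SparkConf

        val conf = new SparkConf()
          .set("spark.io.compression.codec",
               "org.apache.spark.io.SnappyCompressionCodec")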



[GitHub] spark pull request: Organize configuration docs

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/880#discussion_r13110273
  
    --- Diff: docs/configuration.md ---
    @@ -705,40 +698,69 @@ Apart from these, the following properties are also available, and may be useful
       </td>
     </tr>
     <tr>
    -  <td><code>spark.task.cpus</code></td>
    -  <td>1</td>
    +  <td><code>spark.ui.filters</code></td>
    +  <td>None</td>
       <td>
    -    Number of cores to allocate for each task.
    +    Comma separated list of filter class names to apply to the Spark web ui. The filter should be a
    +    standard javax servlet Filter. Parameters to each filter can also be specified by setting a
    --- End diff --
    
    Link "javax servlet filter" to http://docs.oracle.com/javaee/6/api/javax/servlet/Filter.html; also capitalize UI above



[GitHub] spark pull request: Organize configuration docs

Posted by ash211 <gi...@git.apache.org>.
Github user ash211 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/880#discussion_r13115445
  
    --- Diff: docs/configuration.md ---
    @@ -201,54 +282,41 @@ Apart from these, the following properties are also available, and may be useful
       </td>
     </tr>
     <tr>
    -  <td><code>spark.ui.filters</code></td>
    -  <td>None</td>
    +  <td><code>spark.ui.killEnabled</code></td>
    +  <td>true</td>
       <td>
    -    Comma separated list of filter class names to apply to the Spark web ui. The filter should be a
    -    standard javax servlet Filter. Parameters to each filter can also be specified by setting a
    -    java system property of spark.&lt;class name of filter&gt;.params='param1=value1,param2=value2'
    -    (e.g. -Dspark.ui.filters=com.test.filter1 -Dspark.com.test.filter1.params='param1=foo,param2=testing')
    +    Allows stages and corresponding jobs to be killed from the web ui.
       </td>
     </tr>
     <tr>
    -  <td><code>spark.ui.acls.enable</code></td>
    +  <td><code>spark.eventLog.enabled</code></td>
       <td>false</td>
       <td>
    -    Whether spark web ui acls should are enabled. If enabled, this checks to see if the user has
    -    access permissions to view the web ui. See <code>spark.ui.view.acls</code> for more details.
    -    Also note this requires the user to be known, if the user comes across as null no checks
    -    are done. Filters can be used to authenticate and set the user.
    -  </td>
    -</tr>
    -<tr>
    -  <td><code>spark.ui.view.acls</code></td>
    -  <td>Empty</td>
    -  <td>
    -    Comma separated list of users that have view access to the spark web ui. By default only the
    -    user that started the Spark job has view access.
    -  </td>
    -</tr>
    -<tr>
    -  <td><code>spark.ui.killEnabled</code></td>
    -  <td>true</td>
    -  <td>
    -    Allows stages and corresponding jobs to be killed from the web ui.
    +    Whether to log spark events, useful for reconstructing the Web UI after the application has
    +    finished.
       </td>
     </tr>
     <tr>
    -  <td><code>spark.shuffle.compress</code></td>
    -  <td>true</td>
    +  <td><code>spark.eventLog.compress</code></td>
    +  <td>false</td>
       <td>
    -    Whether to compress map output files. Generally a good idea.
    +    Whether to compress logged events, if <code>spark.eventLog.enabled</code> is true.
    --- End diff --
    
    Does this mean the events are compressed with a lossless compression algorithm, or does it compress events by writing aggregates ("event X occurred 10 times" vs. "eventX; eventX; eventX; eventX; ...")?



[GitHub] spark pull request: Organize configuration docs

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44351463
  
    Oh, one other thing: the "all config options" list should also have a section on deployment-mode-specific options that just links to the docs for each deployment mode. This is just for completeness -- someone might come to this page and otherwise not find the topic they're interested in.



[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44470659
  
    Merged build started. 



[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44160238
  
    Build finished. All automated tests passed.



[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44160655
  
    Merged build finished. 



[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44158617
  
     Build triggered. 



[GitHub] spark pull request: Organize configuration docs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44477621
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: Organize configuration docs

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/880#issuecomment-44351250
  
    Hey Patrick, this looks pretty good organization-wise. I made some comments on the actual docs. One other thing: the beginning of configuration.md is very abrupt, saying just "Spark provides several locations to configure the system:". It made more sense in 0.9 (see http://spark.apache.org/docs/latest/configuration.html) -- it looks like some text was lost in earlier changes on master. Can you bring back that text?

