Posted to reviews@spark.apache.org by zsxwing <gi...@git.apache.org> on 2015/12/23 20:14:04 UTC

[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Update Strea...

GitHub user zsxwing opened a pull request:

    https://github.com/apache/spark/pull/10453

    [SPARK-12507][Streaming][Document]Update Streaming configurations for 1.6

    /cc @tdas @brkyvz 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark streaming-conf

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10453.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10453
    
----
commit 42951371259cc1ef1dd39f1e6a2ebb5867326704
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2015-12-23T19:11:56Z

    Update Streaming configurations for 1.6

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10453#discussion_r48559015
  
    --- Diff: docs/configuration.md ---
    @@ -1600,6 +1600,33 @@ Apart from these, the following properties are also available, and may be useful
         How many batches the Spark Streaming UI and status APIs remember before garbage collecting.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.streaming.driver.writeAheadLog.closeFileAfterWrite</code></td>
    +  <td>false</td>
    +  <td>
    +    Whether to close the file after writing a write ahead log record in driver. Because S3 doesn't
    +    support flushing of data, when using S3 for checkpointing, you should enable it to achieve read
    +    after write consistency.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.streaming.receiver.writeAheadLog.closeFileAfterWrite</code></td>
    +  <td>false</td>
    +  <td>
    +    Whether to close the file after writing a write ahead log record in receivers. Because S3
    --- End diff --
    
    same thing here: `on the receivers` instead of `in receivers`
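
For context, a minimal Scala sketch of how the two properties shown in this diff would be set when checkpointing to S3; it is not part of the patch, and the application name, master, and checkpoint path are placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Close the WAL file after every write on both the driver and the receivers,
    // as the documentation above recommends for S3-backed checkpointing.
    val conf = new SparkConf()
      .setAppName("S3CheckpointedApp")   // placeholder name
      .setMaster("local[2]")             // placeholder master; receivers need >= 2 cores
      .set("spark.streaming.driver.writeAheadLog.closeFileAfterWrite", "true")
      .set("spark.streaming.receiver.writeAheadLog.closeFileAfterWrite", "true")

    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("s3a://my-bucket/checkpoints")   // placeholder bucket/path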



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-169837009
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-169817911
  
    **[Test build #48971 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48971/consoleFull)** for PR 10453 at commit [`4d55b03`](https://github.com/apache/spark/commit/4d55b03af0c6cfb73833c8fe86fb7bf97f7c2c38).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10453#discussion_r48559444
  
    --- Diff: docs/configuration.md ---
    @@ -1600,6 +1600,33 @@ Apart from these, the following properties are also available, and may be useful
         How many batches the Spark Streaming UI and status APIs remember before garbage collecting.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.streaming.driver.writeAheadLog.closeFileAfterWrite</code></td>
    +  <td>false</td>
    +  <td>
    +    Whether to close the file after writing a write ahead log record in driver. Because S3 doesn't
    +    support flushing of data, when using S3 for checkpointing, you should enable it to achieve read
    +    after write consistency.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.streaming.receiver.writeAheadLog.closeFileAfterWrite</code></td>
    +  <td>false</td>
    +  <td>
    +    Whether to close the file after writing a write ahead log record in receivers. Because S3
    +    doesn't support flushing of data, when using S3 for checkpointing, you should enable it to
    +    achieve read after write consistency.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.streaming.driver.writeAheadLog.allowBatching</code></td>
    +  <td>false</td>
    --- End diff --
    
    for me: the default value is `true`.
    
    That's why I want to expose this one since the behavior is different from 1.5.0.
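
As a side note (not part of the patch), an application that wants the pre-1.6 write path back would have to turn batching off explicitly, since the shipped default is `true` as noted above; a minimal Scala sketch:

    import org.apache.spark.SparkConf

    // Opt out of driver-side WAL batching, restoring the 1.5.x behavior.
    val conf = new SparkConf()
      .set("spark.streaming.driver.writeAheadLog.allowBatching", "false")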



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10453#discussion_r48559275
  
    --- Diff: docs/configuration.md ---
    @@ -1600,6 +1600,33 @@ Apart from these, the following properties are also available, and may be useful
         How many batches the Spark Streaming UI and status APIs remember before garbage collecting.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.streaming.driver.writeAheadLog.closeFileAfterWrite</code></td>
    +  <td>false</td>
    +  <td>
    +    Whether to close the file after writing a write ahead log record in driver. Because S3 doesn't
    +    support flushing of data, when using S3 for checkpointing, you should enable it to achieve read
    +    after write consistency.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.streaming.receiver.writeAheadLog.closeFileAfterWrite</code></td>
    +  <td>false</td>
    +  <td>
    +    Whether to close the file after writing a write ahead log record in receivers. Because S3
    +    doesn't support flushing of data, when using S3 for checkpointing, you should enable it to
    +    achieve read after write consistency.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.streaming.driver.writeAheadLog.allowBatching</code></td>
    +  <td>false</td>
    +  <td>
    +    Whether to batch write ahead logs in driver to write. When using S3 for checkpointing, write
    +    operations in driver usually take too long. Enable batching write ahead logs will improve
    +    the performance of writing.
    --- End diff --
    
    I'd say `will improve the performance of write operations`



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-169836865
  
    **[Test build #48980 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48980/consoleFull)** for PR 10453 at commit [`28a750d`](https://github.com/apache/spark/commit/28a750d61c058e537a8ca44babb3ff0f4b54f3b3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-168094523
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48516/
    Test PASSed.



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Update Strea...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-167752921
  
    Let's improve the title of items like this. "Update x" is never descriptive



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10453#discussion_r49127139
  
    --- Diff: docs/streaming-programming-guide.md ---
    @@ -2029,6 +2029,11 @@ If the data is being received by the receivers faster than what can be processed
     you can limit the rate by setting the [configuration parameter](configuration.html#spark-streaming)
    --- End diff --
    
    Can you remove this section completely? 
    - Remove the rate stuff as it has already been covered earlier. 
    - Put the closeFileAfterWrite stuff in the WAL sections above. Keeps things in context.



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10453#discussion_r48559339
  
    --- Diff: docs/streaming-programming-guide.md ---
    @@ -2029,6 +2029,11 @@ If the data is being received by the receivers faster than what can be processed
     you can limit the rate by setting the [configuration parameter](configuration.html#spark-streaming)
     `spark.streaming.receiver.maxRate`.
     
    +If using S3 for checkpointing, please remember to enable `spark.streaming.driver.writeAheadLog.closeFileAfterWrite`
    +and `spark.streaming.receiver.writeAheadLog.closeFileAfterWrite`. You can also enable
    +`spark.streaming.driver.writeAheadLog.allowBatching` to improve the performance of writing write
    +ahead logs in driver. See [Spark Streaming Configuration](configuration.html#spark-streaming) or more details.
    --- End diff --
    
    `on the driver` and `for more details`



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-167973270
  
    LGTM



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10453#discussion_r49137355
  
    --- Diff: docs/streaming-programming-guide.md ---
    @@ -1985,7 +1985,11 @@ To run a Spark Streaming applications, you need to have the following.
       to increase aggregate throughput. Additionally, it is recommended that the replication of the
       received data within Spark be disabled when the write ahead log is enabled as the log is already
       stored in a replicated storage system. This can be done by setting the storage level for the
    -  input stream to `StorageLevel.MEMORY_AND_DISK_SER`.
    +  input stream to `StorageLevel.MEMORY_AND_DISK_SER`. While using S3 (or any file system that
    +  does not support flushing) for Write Ahead Logs, please remember to enable
    --- End diff --
    
    nit: Write Ahead Logs is not in caps in this text. so please be consistent.
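
For context, a minimal Scala sketch of the setup described in this diff (write ahead log enabled, replication of received data turned off via a non-replicated storage level); it is not part of the patch, and the host, port, and application name are placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("WALExample")   // placeholder name
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")
      // On S3, or any other file system that does not support flushing,
      // also enable the closeFileAfterWrite properties discussed in this PR.
      .set("spark.streaming.receiver.writeAheadLog.closeFileAfterWrite", "true")

    val ssc = new StreamingContext(conf, Seconds(10))

    // MEMORY_AND_DISK_SER keeps a single serialized copy; durability comes
    // from the write ahead log stored in the replicated file system.
    val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)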



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-169818271
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10453#discussion_r49125309
  
    --- Diff: docs/configuration.md ---
    @@ -1600,6 +1600,34 @@ Apart from these, the following properties are also available, and may be useful
         How many batches the Spark Streaming UI and status APIs remember before garbage collecting.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.streaming.driver.writeAheadLog.closeFileAfterWrite</code></td>
    +  <td>false</td>
    +  <td>
    +    Whether to close the file after writing a write ahead log record on the driver. Because S3 doesn't
    +    support flushing of data, when using S3 for checkpointing, you should enable it to achieve read
    +    after write consistency.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.streaming.receiver.writeAheadLog.closeFileAfterWrite</code></td>
    +  <td>false</td>
    +  <td>
    +    Whether to close the file after writing a write ahead log record on the receivers. Because S3
    --- End diff --
    
    Because file systems like S3



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-168092271
  
    **[Test build #48516 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48516/consoleFull)** for PR 10453 at commit [`7d9b038`](https://github.com/apache/spark/commit/7d9b0389ddc2b03b259f9f2fa6b657b93cd5f3ea).



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10453#discussion_r49127420
  
    --- Diff: docs/configuration.md ---
    @@ -1600,6 +1600,34 @@ Apart from these, the following properties are also available, and may be useful
         How many batches the Spark Streaming UI and status APIs remember before garbage collecting.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.streaming.driver.writeAheadLog.closeFileAfterWrite</code></td>
    +  <td>false</td>
    +  <td>
    +    Whether to close the file after writing a write ahead log record on the driver. Because S3 doesn't
    +    support flushing of data, when using S3 for checkpointing, you should enable it to achieve read
    +    after write consistency.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.streaming.receiver.writeAheadLog.closeFileAfterWrite</code></td>
    +  <td>false</td>
    +  <td>
    +    Whether to close the file after writing a write ahead log record on the receivers. Because S3
    +    doesn't support flushing of data, when using S3 for checkpointing, you should enable it to
    +    achieve read after write consistency.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.streaming.driver.writeAheadLog.allowBatching</code></td>
    +  <td>true</td>
    +  <td>
    +    Whether to batch write ahead logs on the driver to write. When using S3 for checkpointing, write
    +    operations on the driver usually take too long. Enabling batching write ahead logs will improve
    --- End diff --
    
    Whether to batch writes to the metadata WAL in the driver. This is useful for improving performance for file systems like S3 where the write latency is high, and/or in scenarios with a large number of receivers.



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-169837010
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48980/
    Test PASSed.



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-169831376
  
    just one more comment. then LGTM.



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-169818273
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48971/
    Test PASSed.



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-167903860
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48436/
    Test PASSed.



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10453



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-167902303
  
    @BenFradet Addressed. Thanks for your review.



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-169813237
  
    **[Test build #48971 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48971/consoleFull)** for PR 10453 at commit [`4d55b03`](https://github.com/apache/spark/commit/4d55b03af0c6cfb73833c8fe86fb7bf97f7c2c38).



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-167903857
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-167973342
  
    Maybe we can also include that `allowBatching` is not just helpful when `closeFileAfterWrite` is enabled, but is also very helpful to scale to a large number of receivers 50+ for example.
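
A hedged Scala sketch of the many-receiver scenario mentioned here; the stream count, socket source, and port range are illustrative only:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.dstream.DStream

    val conf = new SparkConf()
      .setAppName("ManyReceivers")   // placeholder name
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("s3a://my-bucket/checkpoints")   // placeholder; driver-side log lives under the checkpoint dir

    // With the driver WAL enabled, metadata for every received block is logged
    // on the driver; batching those writes (allowBatching, default true in 1.6)
    // helps the driver keep up when there are many receivers.
    val numReceivers = 50
    val streams = (0 until numReceivers).map(i => ssc.socketTextStream("localhost", 9000 + i))
    val unified: DStream[String] = ssc.union(streams)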



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10453#discussion_r49125847
  
    --- Diff: docs/streaming-programming-guide.md ---
    @@ -2029,6 +2029,11 @@ If the data is being received by the receivers faster than what can be processed
     you can limit the rate by setting the [configuration parameter](configuration.html#spark-streaming)
     `spark.streaming.receiver.maxRate`.
     
    +If using S3 for checkpointing, please remember to enable `spark.streaming.driver.writeAheadLog.closeFileAfterWrite`
    --- End diff --
    
    "If" --> "while"
    checkpointing --> Write Ahead Logs



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10453#discussion_r48558980
  
    --- Diff: docs/configuration.md ---
    @@ -1600,6 +1600,33 @@ Apart from these, the following properties are also available, and may be useful
         How many batches the Spark Streaming UI and status APIs remember before garbage collecting.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.streaming.driver.writeAheadLog.closeFileAfterWrite</code></td>
    +  <td>false</td>
    +  <td>
    +    Whether to close the file after writing a write ahead log record in driver. Because S3 doesn't
    --- End diff --
    
    I'd say `on the driver` instead of `in driver`.



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-168094522
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-167848381
  
    I have a few comments on phrasing but otherwise it lgtm



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10453#discussion_r48559086
  
    --- Diff: docs/configuration.md ---
    @@ -1600,6 +1600,33 @@ Apart from these, the following properties are also available, and may be useful
         How many batches the Spark Streaming UI and status APIs remember before garbage collecting.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.streaming.driver.writeAheadLog.closeFileAfterWrite</code></td>
    +  <td>false</td>
    +  <td>
    +    Whether to close the file after writing a write ahead log record in driver. Because S3 doesn't
    +    support flushing of data, when using S3 for checkpointing, you should enable it to achieve read
    +    after write consistency.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.streaming.receiver.writeAheadLog.closeFileAfterWrite</code></td>
    +  <td>false</td>
    +  <td>
    +    Whether to close the file after writing a write ahead log record in receivers. Because S3
    +    doesn't support flushing of data, when using S3 for checkpointing, you should enable it to
    +    achieve read after write consistency.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.streaming.driver.writeAheadLog.allowBatching</code></td>
    +  <td>false</td>
    +  <td>
    +    Whether to batch write ahead logs in driver to write. When using S3 for checkpointing, write
    --- End diff --
    
    Here, I'd say `on the driver` instead of `in driver to write`.



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-169862067
  
    LGTM. Merging this to master and 1.6. Thanks!



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-168194624
  
    @zsxwing Thanks! LGTM



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-168091405
  
    > Maybe we can also include that allowBatching is not just helpful when closeFileAfterWrite is enabled, but is also very helpful to scale to a large number of receivers 50+ for example.
    
    I guess `enabled` should be `disabled`?



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Update Strea...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-166976157
  
    **[Test build #48251 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48251/consoleFull)** for PR 10453 at commit [`4295137`](https://github.com/apache/spark/commit/42951371259cc1ef1dd39f1e6a2ebb5867326704).



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Update Strea...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-166978301
  
    **[Test build #48251 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48251/consoleFull)** for PR 10453 at commit [`4295137`](https://github.com/apache/spark/commit/42951371259cc1ef1dd39f1e6a2ebb5867326704).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-169833334
  
    **[Test build #48980 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48980/consoleFull)** for PR 10453 at commit [`28a750d`](https://github.com/apache/spark/commit/28a750d61c058e537a8ca44babb3ff0f4b54f3b3).



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-167902755
  
    **[Test build #48436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48436/consoleFull)** for PR 10453 at commit [`bce7a29`](https://github.com/apache/spark/commit/bce7a29de2966024103258031eeecb369e6d45b4).



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10453#discussion_r49125736
  
    --- Diff: docs/configuration.md ---
    @@ -1600,6 +1600,34 @@ Apart from these, the following properties are also available, and may be useful
         How many batches the Spark Streaming UI and status APIs remember before garbage collecting.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.streaming.driver.writeAheadLog.closeFileAfterWrite</code></td>
    +  <td>false</td>
    +  <td>
    +    Whether to close the file after writing a write ahead log record on the driver. Because S3 doesn't
    --- End diff --
    
    Rewrite similar to below, replacing driver with receiver.



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10453#discussion_r48559209
  
    --- Diff: docs/configuration.md ---
    @@ -1600,6 +1600,33 @@ Apart from these, the following properties are also available, and may be useful
         How many batches the Spark Streaming UI and status APIs remember before garbage collecting.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.streaming.driver.writeAheadLog.closeFileAfterWrite</code></td>
    +  <td>false</td>
    +  <td>
    +    Whether to close the file after writing a write ahead log record in driver. Because S3 doesn't
    +    support flushing of data, when using S3 for checkpointing, you should enable it to achieve read
    +    after write consistency.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.streaming.receiver.writeAheadLog.closeFileAfterWrite</code></td>
    +  <td>false</td>
    +  <td>
    +    Whether to close the file after writing a write ahead log record in receivers. Because S3
    +    doesn't support flushing of data, when using S3 for checkpointing, you should enable it to
    +    achieve read after write consistency.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.streaming.driver.writeAheadLog.allowBatching</code></td>
    +  <td>false</td>
    +  <td>
    +    Whether to batch write ahead logs in driver to write. When using S3 for checkpointing, write
    +    operations in driver usually take too long. Enable batching write ahead logs will improve
    --- End diff --
    
    same: `on the`
    and `Enabling` instead of `Enable`



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-167903798
  
    **[Test build #48436 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48436/consoleFull)** for PR 10453 at commit [`bce7a29`](https://github.com/apache/spark/commit/bce7a29de2966024103258031eeecb369e6d45b4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Update Strea...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-166978434
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Update Strea...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-166978436
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48251/
    Test PASSed.



[GitHub] spark pull request: [SPARK-12507][Streaming][Document]Expose close...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10453#issuecomment-168094430
  
    **[Test build #48516 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48516/consoleFull)** for PR 10453 at commit [`7d9b038`](https://github.com/apache/spark/commit/7d9b0389ddc2b03b259f9f2fa6b657b93cd5f3ea).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.

