You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by HeartSaVioR <gi...@git.apache.org> on 2018/08/26 21:22:48 UTC

[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...

GitHub user HeartSaVioR opened a pull request:

    https://github.com/apache/spark/pull/22238

    [SPARK-25245][DOCS][SS] Explain regarding limiting modification on "spark.sql.shuffle.partitions" for structured streaming

    ## What changes were proposed in this pull request?
    
    This patch adds explanation of `why "spark.sql.shuffle.partitions" keeps unchanged in structured streaming`, which couple of users already wondered and some of them even thought it as a bug.
    
    This patch would help other end users to know about such behavior before they find by theirselves and being wondered.
    
    ## How was this patch tested?
    
    No need to test because this is a simple addition on guide doc with markdown editor.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HeartSaVioR/spark SPARK-25245

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22238.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22238
    
----
commit ec24f2941021d34058d7af3d3f103094582fd79d
Author: Jungtaek Lim <ka...@...>
Date:   2018-08-26T21:08:04Z

    SPARK-25245 Explain regarding limiting modification on "spark.sql.shuffle.partitions" for structured streaming

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22238#discussion_r213176029
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -2812,7 +2812,18 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f
     
     # Additional Information
     
    -**Further Reading**
    +## Configuration Options For Structured Streaming
    +
    +This section is for configurations which are only available for structured streaming, or they behave differently with batch query.
    +
    +- spark.sql.shuffle.partitions
    --- End diff --
    
    We do have it in `sql-programming-guide.md`. Shall we add some info there for now?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22238#discussion_r213264786
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -266,7 +266,9 @@ object SQLConf {
         .createWithDefault(Long.MaxValue)
     
       val SHUFFLE_PARTITIONS = buildConf("spark.sql.shuffle.partitions")
    -    .doc("The default number of partitions to use when shuffling data for joins or aggregations.")
    +    .doc("The default number of partitions to use when shuffling data for joins or aggregations. " +
    +      "Note: For structured streaming, this configuration cannot be changed between query " +
    --- End diff --
    
    s/cannot be/must not be/


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22238
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22238#discussion_r213177623
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -2812,7 +2812,18 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f
     
     # Additional Information
     
    -**Further Reading**
    +## Configuration Options For Structured Streaming
    +
    +This section is for configurations which are only available for structured streaming, or they behave differently with batch query.
    +
    +- spark.sql.shuffle.partitions
    --- End diff --
    
    IMHO, if something goes wrong with structured streaming, end users would try to review structured streaming guide doc, rather than sql programming guide doc. Could we wait for hearing more voices on this?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22238
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22238#discussion_r213181165
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -2812,6 +2812,19 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f
     
     # Additional Information
     
    +**Notes**
    +
    +- There're couple of configurations which are not modifiable once you run the query. If you really want to make changes for these configurations, you have to discard checkpoint and start a new query.
    +  - `spark.sql.shuffle.partitions`
    +    - This is due to the physical partitioning of state: state is partitioned via applying hash function to key, hence the number of partitions for state should be unchanged.
    +    - If you want to run less tasks for stateful operations, `coalesce` would help with avoiding unnecessary repartitioning.
    +      - e.g. `df.groupBy("time").count().coalesce(10)` reduces the number of tasks by 10, whereas `spark.sql.shuffle.partitions` may be bigger.
    +      - After `coalesce`, the number of (reduced) tasks will be kept unless another shuffle happens.
    +  - `spark.sql.streaming.stateStore.providerClass`
    --- End diff --
    
    Ah, okay, so there are more instances to describe here. If so, im okay.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22238
  
    **[Test build #95326 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95326/testReport)** for PR 22238 at commit [`e2ee43d`](https://github.com/apache/spark/commit/e2ee43da2f9bf4fb95c938764ee3584bbae06c1b).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/22238
  
    Also adding @tdas @zsxwing @jose-torres to cc.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22238#discussion_r213267329
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -868,7 +870,9 @@ object SQLConf {
           .internal()
           .doc(
             "The class used to manage state data in stateful streaming queries. This class must " +
    -          "be a subclass of StateStoreProvider, and must have a zero-arg constructor.")
    +          "be a subclass of StateStoreProvider, and must have a zero-arg constructor. " +
    +          "Note: For structured streaming, this configuration cannot be changed between query " +
    --- End diff --
    
    Same here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22238
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95321/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22238#discussion_r213180977
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -2812,7 +2812,18 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f
     
     # Additional Information
     
    -**Further Reading**
    +## Configuration Options For Structured Streaming
    +
    +This section is for configurations which are only available for structured streaming, or they behave differently with batch query.
    +
    +- spark.sql.shuffle.partitions
    --- End diff --
    
    I was thinking adding this information somewhere API or configuration only. For instance, notes like https://github.com/apache/spark/pull/19617. 
    
    > lots of wondering around SO and user mailing list,
    
    I don't object to note these stuff but usually the site has only key points for some features or configurations. 
    
    If there are more instance to describe specifically for structured streaming (where the same SQL configurations could lead to some confusions), I am fine with adding this. If not or less sure for now, I would add them into API's doc or configuration's doc.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22238
  
    **[Test build #95321 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95321/testReport)** for PR 22238 at commit [`138cc63`](https://github.com/apache/spark/commit/138cc63e639b60fb7e803097654816ad6c19c95f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22238#discussion_r213005430
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -2812,6 +2812,12 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f
     
     # Additional Information
     
    +**Gotchas**
    --- End diff --
    
    hmmmmm .. @HeartSaVioR how about leaving them in codes or API somewhere as `note`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22238#discussion_r213178020
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -2812,7 +2812,18 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f
     
     # Additional Information
     
    -**Further Reading**
    +## Configuration Options For Structured Streaming
    +
    +This section is for configurations which are only available for structured streaming, or they behave differently with batch query.
    +
    +- spark.sql.shuffle.partitions
    --- End diff --
    
    What I am worried is about adding a new section, which is quite unusual. Usually we go for it when multiple instances are detected later.
    
    Are there more instance to describe here specifically for structured streaming?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22238#discussion_r213264912
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -868,7 +870,9 @@ object SQLConf {
           .internal()
           .doc(
             "The class used to manage state data in stateful streaming queries. This class must " +
    -          "be a subclass of StateStoreProvider, and must have a zero-arg constructor.")
    +          "be a subclass of StateStoreProvider, and must have a zero-arg constructor. " +
    +          "Note: For structured streaming, this configuration cannot be changed between query " +
    --- End diff --
    
    s/cannot be/must not be/


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22238#discussion_r213179259
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -2812,7 +2812,18 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f
     
     # Additional Information
     
    -**Further Reading**
    +## Configuration Options For Structured Streaming
    +
    +This section is for configurations which are only available for structured streaming, or they behave differently with batch query.
    +
    +- spark.sql.shuffle.partitions
    --- End diff --
    
    I can revert adding a new section if you meant adding `##` on it. While gotcha looks more like funny, I will change it to `**Notes**`.
    
    The rationalization on adding this to doc is, this restriction had been making lots of wondering around SO and user mailing list, as well as even a patch for fixing this. So all of end users who use structured streaming would be nice to see it at least once, even they skim the doc, so that they can remember and revisit the doc once they get stuck on this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22238
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...

Posted by arunmahadevan <gi...@git.apache.org>.
Github user arunmahadevan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22238#discussion_r213049895
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -2812,6 +2812,12 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f
     
     # Additional Information
     
    +**Gotchas**
    --- End diff --
    
    IMO, It would be better to keep it here as well as in the code, we may not be able to surface it in the right api docs and chance for users to ignore it.
    
    @HeartSaVioR, may be add an example here to illustrate how to use the coalesce?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22238#discussion_r213123711
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -2812,6 +2812,12 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f
     
     # Additional Information
     
    +**Gotchas**
    --- End diff --
    
    I was going to add the explanation to `doc()` of `spark.sql.shuffle.partitions`, but looks like what we explained in `doc()` would not be published automatically. (Please correct me if I'm missing here.) SQLConf is even not exposed to scaladoc. That's why I'm adding this to structured streaming guide doc. Actually I think most of end users only take a look at this doc for structured streaming, and we can't (and shouldn't) expect end users to take a look at source code to find it.
    
    But also actually I didn't notice that `spark.sql.shuffle.partitions` is explained in `sql-programming-guide.md` but I also think we need to explain all configs here if they work differently with batch query. `spark.sql.shuffle.partitions` is the case. 
    
    Btw, `Gotchas` looks like funny though. Maybe having section would be better. Maybe like `## Other Configuration Options` in `sql-programming-guide.md`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22238
  
    **[Test build #95268 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95268/testReport)** for PR 22238 at commit [`ec24f29`](https://github.com/apache/spark/commit/ec24f2941021d34058d7af3d3f103094582fd79d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22238
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22238
  
    **[Test build #95326 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95326/testReport)** for PR 22238 at commit [`e2ee43d`](https://github.com/apache/spark/commit/e2ee43da2f9bf4fb95c938764ee3584bbae06c1b).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22238
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95268/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22238
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22238#discussion_r213181040
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -2812,6 +2812,19 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f
     
     # Additional Information
     
    +**Notes**
    --- End diff --
    
    I was thinking adding this information somewhere API or configuration only. For instance, notes like https://github.com/apache/spark/pull/19617. 
    
    > lots of wondering around SO and user mailing list,
    
    I don't object to note these stuff but usually the site has only key points for some features or configurations. 
    
    If there are more instance to describe specifically for structured streaming (where the same SQL configurations could lead to some confusions), I am fine with adding this. If not or less sure for now, I would add them into API's doc or configuration's doc.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22238#discussion_r213267299
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -266,7 +266,9 @@ object SQLConf {
         .createWithDefault(Long.MaxValue)
     
       val SHUFFLE_PARTITIONS = buildConf("spark.sql.shuffle.partitions")
    -    .doc("The default number of partitions to use when shuffling data for joins or aggregations.")
    +    .doc("The default number of partitions to use when shuffling data for joins or aggregations. " +
    +      "Note: For structured streaming, this configuration cannot be changed between query " +
    --- End diff --
    
    The sentence is borrowed from existing one: https://github.com/apache/spark/blob/8198ea50192cad615071beb5510c73aa9e9178f4/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L978-L979


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22238
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95326/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22238#discussion_r213063267
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -2812,6 +2812,12 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f
     
     # Additional Information
     
    +**Gotchas**
    +
    +- For structured streaming, modifying "spark.sql.shuffle.partitions" is restricted once you run the query.
    +  - This is because state is partitioned via key, hence number of partitions for state should be unchanged.
    +  - If you want to run less tasks for stateful operations, `coalesce` would help with avoiding unnecessary repartitioning. Please note that it will also affect downstream operators.
    --- End diff --
    
    An example of how to use `coalesce` operator with stateful streaming query would be superb.
    
    I'd also appreciate if you added what type of downstream operators are affected and how.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22238
  
    **[Test build #95268 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95268/testReport)** for PR 22238 at commit [`ec24f29`](https://github.com/apache/spark/commit/ec24f2941021d34058d7af3d3f103094582fd79d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22238#discussion_r213129120
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -2812,6 +2812,12 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f
     
     # Additional Information
     
    +**Gotchas**
    +
    +- For structured streaming, modifying "spark.sql.shuffle.partitions" is restricted once you run the query.
    +  - This is because state is partitioned via key, hence number of partitions for state should be unchanged.
    +  - If you want to run less tasks for stateful operations, `coalesce` would help with avoiding unnecessary repartitioning. Please note that it will also affect downstream operators.
    --- End diff --
    
    It just means that the number of partitions in stateful operations' output will be same as parameter for `coalesce`, and the number of partitions will be kept unless another shuffle happens. It is implicitly same as `spark.sql.shuffle.partitions`, which default value is 200.
    
    I'll add the code, but not sure we need to have the code per language like Scala / Java / Python tabs since they will be same.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22238
  
    **[Test build #95321 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95321/testReport)** for PR 22238 at commit [`138cc63`](https://github.com/apache/spark/commit/138cc63e639b60fb7e803097654816ad6c19c95f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org