Posted to reviews@spark.apache.org by jjthomas <gi...@git.apache.org> on 2016/07/13 18:34:36 UTC

[GitHub] spark pull request #14183: updated structured streaming guide

GitHub user jjthomas opened a pull request:

    https://github.com/apache/spark/pull/14183

    updated structured streaming guide

    ## What changes were proposed in this pull request?
    
    Updated structured streaming programming guide with new windowed example.
    
    
    ## How was this patch tested?
    
    Docs
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jjthomas/spark ss_docs_update

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14183.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14183
    
----
commit cd18bda94b9be008bda6c10edf6917d82913caee
Author: James Thomas <ja...@gmail.com>
Date:   2016-07-13T18:31:48Z

    updated structured streaming guide

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #14183: [SPARK-16114] [SQL] updated structured streaming ...

Posted by tdas <gi...@git.apache.org>.

Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14183#discussion_r70700875
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -626,52 +626,48 @@ The result tables would look something like the following.
     
     ![Window Operations](img/structured-streaming-window.png)
     
    -Since this windowing is similar to grouping, in code, you can use `groupBy()` and `window()` operations to express windowed aggregations.
    +Since this windowing is similar to grouping, in code, you can use `groupBy()` and `window()` operations to express windowed aggregations. You can see the full code for the below examples in
    +[Scala]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredNetworkWordCountWindowed.scala)/
    +[Java]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCountWindowed.java)/
    +[Python]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/python/sql/streaming/structured_network_wordcount_windowed.py).
     
     <div class="codetabs">
     <div data-lang="scala"  markdown="1">
     
     {% highlight scala %}
    -// Number of events in every 1 minute time windows
    -df.groupBy(window(df.col("time"), "1 minute"))
    -  .count()
    +import spark.implicits._
     
    +val words = ... // streaming DataFrame of schema { timestamp: Timestamp, word: String }
     
    -// Average number of events for each device type in every 1 minute time windows
    -df.groupBy(
    -     df.col("type"),
    -     window(df.col("time"), "1 minute"))
    -  .avg("signal")
    +// Group the data by window and word and compute the count of each group
    +val windowedCounts = words.groupBy(
    +  window($"timestamp", "10 minutes", "5 minutes"), $"word"
    +).count().orderBy("window")
     {% endhighlight %}
     
     </div>
     <div data-lang="java"  markdown="1">
     
     {% highlight java %}
    -import static org.apache.spark.sql.functions.window;
    -
    -// Number of events in every 1 minute time windows
    -df.groupBy(window(df.col("time"), "1 minute"))
    -  .count();
    -
    -// Average number of events for each device type in every 1 minute time windows
    -df.groupBy(
    -     df.col("type"),
    -     window(df.col("time"), "1 minute"))
    -  .avg("signal");
    +Dataset<Row> words = ... // streaming DataFrame of schema { timestamp: Timestamp, word: String }
     
    +// Group the data by window and word and compute the count of each group
    +Dataset<Row> windowedCounts = words.groupBy(
    +  functions.window(words.col("timestamp"), "10 minutes", "5 minutes"),
    +  words.col("word")
    +).count().orderBy("window");
     {% endhighlight %}
     
     </div>
     <div data-lang="python"  markdown="1">
     {% highlight python %}
    -from pyspark.sql.functions import window
    -
    -# Number of events in every 1 minute time windows
    -df.groupBy(window("time", "1 minute")).count()
    +words = ... # streaming DataFrame of schema { timestamp: Timestamp, word: String }
     
    -# Average number of events for each device type in every 1 minute time windows
    -df.groupBy("type", window("time", "1 minute")).avg("signal")
    +# Group the data by window and word and compute the count of each group
    +windowedCounts = words.groupBy(
    +    window(words.timestamp, '10 minutes', '5 minutes'),
    +    words.word
    +).count().orderBy('window')
    --- End diff --
    
    orderBy is not important.
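
For reference, the `window(words.timestamp, '10 minutes', '5 minutes')` call in the diff above describes 10-minute windows that slide every 5 minutes, so each event falls into two overlapping windows. The following is a minimal plain-Python sketch of that assignment, not Spark's implementation. The helper name `sliding_windows` and the sample timestamp are hypothetical, and aligning window starts to the Unix epoch is an assumption about the default behaviour.

```python
from datetime import datetime, timedelta


def sliding_windows(ts, window, slide):
    """Return the [start, end) windows of length `window`, spaced `slide` apart, that contain `ts`."""
    epoch = datetime(1970, 1, 1)
    # Latest window start at or before ts, assuming starts are aligned to the epoch.
    start = ts - ((ts - epoch) % slide)
    windows = []
    # Walk backwards one slide at a time while the window still covers ts (end is exclusive).
    while start + window > ts:
        windows.append((start, start + window))
        start -= slide
    return sorted(windows)


# Hypothetical event time, used only for illustration.
event = datetime(2016, 7, 13, 12, 7)
for start, end in sliding_windows(event, timedelta(minutes=10), timedelta(minutes=5)):
    print(start.time(), "-", end.time())
```

Grouping by the window column then amounts to emitting one row per (window, word) pair for every window an event belongs to and counting those rows.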



[GitHub] spark issue #14183: [SPARK-16114] [SQL] updated structured streaming guide

Posted by tdas <gi...@git.apache.org>.

Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/14183
  
    ok to test



[GitHub] spark issue #14183: [SPARK-16114] [SQL] updated structured streaming guide

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14183
  
    Can one of the admins verify this patch?



[GitHub] spark issue #14183: [SPARK-16114] [SQL] updated structured streaming guide

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14183
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #14183: [SPARK-16114] [SQL] updated structured streaming guide

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14183
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62277/
    Test PASSed.



[GitHub] spark pull request #14183: [SPARK-16114] [SQL] updated structured streaming ...

Posted by tdas <gi...@git.apache.org>.

Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14183#discussion_r70687380
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -626,52 +626,95 @@ The result tables would look something like the following.
     
     ![Window Operations](img/structured-streaming-window.png)
     
    -Since this windowing is similar to grouping, in code, you can use `groupBy()` and `window()` operations to express windowed aggregations.
    +Since this windowing is similar to grouping, in code, you can use `groupBy()` and `window()` operations to express windowed aggregations. You can see the full code for the below examples in
    +[Scala]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredNetworkWordCountWindowed.scala)/
    +[Java]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCountWindowed.java)/
    +[Python]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/python/sql/streaming/structured_network_wordcount_windowed.py).
     
     <div class="codetabs">
     <div data-lang="scala"  markdown="1">
     
     {% highlight scala %}
    -// Number of events in every 1 minute time windows
    -df.groupBy(window(df.col("time"), "1 minute"))
    -  .count()
    -
    +import spark.implicits._
     
    -// Average number of events for each device type in every 1 minute time windows
    -df.groupBy(
    -     df.col("type"),
    -     window(df.col("time"), "1 minute"))
    -  .avg("signal")
    +// Create DataFrame representing the stream of input lines from connection to host:port
    +val lines = spark.readStream
    +  .format("socket")
    +  .option("host", "localhost")
    +  .option("port", 9999)
    +  .option("includeTimestamp", true)
    +  .load().as[(String, Timestamp)]
    +
    +// Split the lines into words, retaining timestamps
    +val words = lines.flatMap(line =>
    +  line._1.split(" ").map(word => (word, line._2))
    +).toDF("word", "timestamp")
    +
    +// Group the data by window and word and compute the count of each group
    +val windowedCounts = words.groupBy(
    --- End diff --
    
    I took another look at the built doc and imagined what this would look like; it would be very verbose. Since the nearest example in the doc (Basic Operations - Selection, Projection, Aggregation) uses device data and already has all the boilerplate code to define the DeviceData class, etc., let's not change the code snippet to the exact one in the example.
    
    
    Can you revert all the code snippet changes and make just one change to the Scala snippet?
    - Change df.col("...") to use $"..."
    - Add import spark.implicits._
    
    




[GitHub] spark pull request #14183: [SPARK-16114] [SQL] updated structured streaming ...

Posted by tdas <gi...@git.apache.org>.

Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14183#discussion_r70700752
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -626,52 +626,48 @@ The result tables would look something like the following.
     
     ![Window Operations](img/structured-streaming-window.png)
     
    -Since this windowing is similar to grouping, in code, you can use `groupBy()` and `window()` operations to express windowed aggregations.
    +Since this windowing is similar to grouping, in code, you can use `groupBy()` and `window()` operations to express windowed aggregations. You can see the full code for the below examples in
    +[Scala]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredNetworkWordCountWindowed.scala)/
    +[Java]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCountWindowed.java)/
    +[Python]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/python/sql/streaming/structured_network_wordcount_windowed.py).
     
     <div class="codetabs">
     <div data-lang="scala"  markdown="1">
     
     {% highlight scala %}
    -// Number of events in every 1 minute time windows
    -df.groupBy(window(df.col("time"), "1 minute"))
    -  .count()
    +import spark.implicits._
     
    +val words = ... // streaming DataFrame of schema { timestamp: Timestamp, word: String }
     
    -// Average number of events for each device type in every 1 minute time windows
    -df.groupBy(
    -     df.col("type"),
    -     window(df.col("time"), "1 minute"))
    -  .avg("signal")
    +// Group the data by window and word and compute the count of each group
    +val windowedCounts = words.groupBy(
    +  window($"timestamp", "10 minutes", "5 minutes"), $"word"
    --- End diff --
    
    Put word on the next line, to be consistent with the other examples.



[GitHub] spark issue #14183: [SPARK-16114] [SQL] updated structured streaming guide

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14183
  
    **[Test build #62276 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62276/consoleFull)** for PR 14183 at commit [`4342efb`](https://github.com/apache/spark/commit/4342efb394e3410a12bd3bba310d11be963d0604).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #14183: [SPARK-16114] [SQL] updated structured streaming guide

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14183
  
    **[Test build #62277 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62277/consoleFull)** for PR 14183 at commit [`77c4a6e`](https://github.com/apache/spark/commit/77c4a6e42aa983952e53cfb12beba988beb7f9cc).



[GitHub] spark issue #14183: [SPARK-16114] [SQL] updated structured streaming guide

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14183
  
    **[Test build #62277 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62277/consoleFull)** for PR 14183 at commit [`77c4a6e`](https://github.com/apache/spark/commit/77c4a6e42aa983952e53cfb12beba988beb7f9cc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #14183: [SPARK-16114] [SQL] updated structured streaming guide

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14183
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #14183: [SPARK-16114] [SQL] updated structured streaming guide

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14183
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62276/
    Test PASSed.



[GitHub] spark pull request #14183: [SPARK-16114] [SQL] updated structured streaming ...

Posted by tdas <gi...@git.apache.org>.

Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14183#discussion_r70700857
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -626,52 +626,48 @@ The result tables would look something like the following.
     
     ![Window Operations](img/structured-streaming-window.png)
     
    -Since this windowing is similar to grouping, in code, you can use `groupBy()` and `window()` operations to express windowed aggregations.
    +Since this windowing is similar to grouping, in code, you can use `groupBy()` and `window()` operations to express windowed aggregations. You can see the full code for the below examples in
    +[Scala]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredNetworkWordCountWindowed.scala)/
    +[Java]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCountWindowed.java)/
    +[Python]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/python/sql/streaming/structured_network_wordcount_windowed.py).
     
     <div class="codetabs">
     <div data-lang="scala"  markdown="1">
     
     {% highlight scala %}
    -// Number of events in every 1 minute time windows
    -df.groupBy(window(df.col("time"), "1 minute"))
    -  .count()
    +import spark.implicits._
     
    +val words = ... // streaming DataFrame of schema { timestamp: Timestamp, word: String }
     
    -// Average number of events for each device type in every 1 minute time windows
    -df.groupBy(
    -     df.col("type"),
    -     window(df.col("time"), "1 minute"))
    -  .avg("signal")
    +// Group the data by window and word and compute the count of each group
    +val windowedCounts = words.groupBy(
    +  window($"timestamp", "10 minutes", "5 minutes"), $"word"
    +).count().orderBy("window")
     {% endhighlight %}
     
     </div>
     <div data-lang="java"  markdown="1">
     
     {% highlight java %}
    -import static org.apache.spark.sql.functions.window;
    -
    -// Number of events in every 1 minute time windows
    -df.groupBy(window(df.col("time"), "1 minute"))
    -  .count();
    -
    -// Average number of events for each device type in every 1 minute time windows
    -df.groupBy(
    -     df.col("type"),
    -     window(df.col("time"), "1 minute"))
    -  .avg("signal");
    +Dataset<Row> words = ... // streaming DataFrame of schema { timestamp: Timestamp, word: String }
     
    +// Group the data by window and word and compute the count of each group
    +Dataset<Row> windowedCounts = words.groupBy(
    +  functions.window(words.col("timestamp"), "10 minutes", "5 minutes"),
    +  words.col("word")
    +).count().orderBy("window");
    --- End diff --
    
    orderBy is not important.



[GitHub] spark pull request #14183: [SPARK-16114] [SQL] updated structured streaming ...

Posted by tdas <gi...@git.apache.org>.

Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14183#discussion_r70700829
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -626,52 +626,48 @@ The result tables would look something like the following.
     
     ![Window Operations](img/structured-streaming-window.png)
     
    -Since this windowing is similar to grouping, in code, you can use `groupBy()` and `window()` operations to express windowed aggregations.
    +Since this windowing is similar to grouping, in code, you can use `groupBy()` and `window()` operations to express windowed aggregations. You can see the full code for the below examples in
    +[Scala]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredNetworkWordCountWindowed.scala)/
    +[Java]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCountWindowed.java)/
    +[Python]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/python/sql/streaming/structured_network_wordcount_windowed.py).
     
     <div class="codetabs">
     <div data-lang="scala"  markdown="1">
     
     {% highlight scala %}
    -// Number of events in every 1 minute time windows
    -df.groupBy(window(df.col("time"), "1 minute"))
    -  .count()
    +import spark.implicits._
     
    +val words = ... // streaming DataFrame of schema { timestamp: Timestamp, word: String }
     
    -// Average number of events for each device type in every 1 minute time windows
    -df.groupBy(
    -     df.col("type"),
    -     window(df.col("time"), "1 minute"))
    -  .avg("signal")
    +// Group the data by window and word and compute the count of each group
    +val windowedCounts = words.groupBy(
    +  window($"timestamp", "10 minutes", "5 minutes"), $"word"
    +).count().orderBy("window")
    --- End diff --
    
    orderBy("window") is not essential. It was only for pretty-printing in the example.
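
To illustrate the point, here is a rough plain-Python analogy (not Spark code) of why the ordering step is cosmetic: sorting the grouped counts changes only the order in which they are displayed, not the groups or their values. The sample rows and window labels below are hypothetical.

```python
from collections import Counter

# Hypothetical (window_start, word) pairs, as if each event had already
# been assigned to the windows it belongs to.
rows = [
    ("12:05", "dog"),
    ("12:00", "cat"),
    ("12:05", "cat"),
    ("12:00", "cat"),
    ("12:00", "dog"),
]

# Analogue of groupBy(window, word).count(): one count per (window, word) group.
counts = Counter(rows)

# Analogue of orderBy("window"): the same groups and counts, arranged by window for display.
ordered = sorted(counts.items(), key=lambda item: item[0][0])

for (window_start, word), n in ordered:
    print(window_start, word, n)
```

Dropping the sort leaves `counts` unchanged, which is why the documentation snippet can omit it.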



[GitHub] spark issue #14183: [SPARK-16114] [SQL] updated structured streaming guide

Posted by tdas <gi...@git.apache.org>.

Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/14183
  
    LGTM.



[GitHub] spark pull request #14183: [SPARK-16114] [SQL] updated structured streaming ...

Posted by tdas <gi...@git.apache.org>.

Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14183#discussion_r70687523
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -626,52 +626,95 @@ The result tables would look something like the following.
     
     ![Window Operations](img/structured-streaming-window.png)
     
    -Since this windowing is similar to grouping, in code, you can use `groupBy()` and `window()` operations to express windowed aggregations.
    +Since this windowing is similar to grouping, in code, you can use `groupBy()` and `window()` operations to express windowed aggregations. You can see the full code for the below examples in
    +[Scala]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredNetworkWordCountWindowed.scala)/
    +[Java]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCountWindowed.java)/
    +[Python]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/python/sql/streaming/structured_network_wordcount_windowed.py).
    --- End diff --
    
    These changes are good. Do not revert them based on what I have said below.



[GitHub] spark issue #14183: [SPARK-16114] [SQL] updated structured streaming guide

Posted by tdas <gi...@git.apache.org>.

Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/14183
  
    Merging to master and 2.0. Tests don't matter here.



[GitHub] spark pull request #14183: [SPARK-16114] [SQL] updated structured streaming ...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/14183



[GitHub] spark issue #14183: [SPARK-16114] [SQL] updated structured streaming guide

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14183
  
    **[Test build #62276 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62276/consoleFull)** for PR 14183 at commit [`4342efb`](https://github.com/apache/spark/commit/4342efb394e3410a12bd3bba310d11be963d0604).

