Posted to reviews@spark.apache.org by ramkumarvenkat <gi...@git.apache.org> on 2017/02/23 08:36:48 UTC

[GitHub] spark pull request #17037: [MINOR][DOCS] Fix few typos in structured streami...

GitHub user ramkumarvenkat opened a pull request:

    https://github.com/apache/spark/pull/17037

    [MINOR][DOCS] Fix few typos in structured streaming doc

    ## What changes were proposed in this pull request?
    
    Fixes minor typos, such as `even-time` (changed to `event-time`), and a couple of grammatical errors.
    
    ## How was this patch tested?
    
    N/A, since this is a doc fix. I did run a Jekyll build locally, though.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ramkumarvenkat/spark doc-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17037.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17037
    
----
commit e3292740b0c18b6960ff8cdf2ac439a8c9bed2b2
Author: Ramkumar Venkataraman <rv...@paypal.com>
Date:   2017-02-23T08:27:38Z

    [MINOR][DOCS] Fix few typos in structured streaming doc

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17037: [MINOR][DOCS] Fix few typos in structured streami...

Posted by ramkumarvenkat <gi...@git.apache.org>.
Github user ramkumarvenkat commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17037#discussion_r102876076
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -647,7 +647,7 @@ df.groupBy("deviceType").count()
     </div>
     
     ### Window Operations on Event Time
    -Aggregations over a sliding event-time window are straightforward with Structured Streaming. The key idea to understand about window-based aggregations are very similar to grouped aggregations. In a grouped aggregation, aggregate values (e.g. counts) are maintained for each unique value in the user-specified grouping column. In case of window-based aggregations, aggregate values are maintained for each window the event-time of a row falls into. Let's understand this with an illustration. 
    +Aggregations over a sliding event-time window are straightforward with Structured Streaming. The key idea to understand window-based aggregations is very similar to grouped aggregations. In a grouped aggregation, aggregate values (e.g. counts) are maintained for each unique value in the user-specified grouping column. In case of window-based aggregations, aggregate values are maintained for each window the event-time of a row falls into. Let's understand this with an illustration. 
    --- End diff --
    
    Fixed this as well


[GitHub] spark pull request #17037: [MINOR][DOCS] Fix few typos in structured streami...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17037#discussion_r102670532
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -392,7 +392,7 @@ data, thus relieving the users from reasoning about it. As an example, let’s
     see how this model handles event-time based processing and late arriving data.
     
     ## Handling Event-time and Late Data
    -Event-time is the time embedded in the data itself. For many applications, you may want to operate on this event-time. For example, if you want to get the number of events generated by IoT devices every minute, then you probably want to use the time when the data was generated (that is, event-time in the data), rather than the time Spark receives them. This event-time is very naturally expressed in this model -- each event from the devices is a row in the table, and event-time is a column value in the row. This allows window-based aggregations (e.g. number of events every minute) to be just a special type of grouping and aggregation on the even-time column -- each time window is a group and each row can belong to multiple windows/groups. Therefore, such event-time-window-based aggregation queries can be defined consistently on both a static dataset (e.g. from collected device events logs) as well as on a data stream, making the life of the user much easier.
    --- End diff --
    
    Can you point out what changed here? GitHub doesn't seem to be showing the difference as clearly as in the other diffs.


[GitHub] spark pull request #17037: [MINOR][DOCS] Fix few typos in structured streami...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17037#discussion_r102726661
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -647,7 +647,7 @@ df.groupBy("deviceType").count()
     </div>
     
     ### Window Operations on Event Time
    -Aggregations over a sliding event-time window are straightforward with Structured Streaming. The key idea to understand about window-based aggregations are very similar to grouped aggregations. In a grouped aggregation, aggregate values (e.g. counts) are maintained for each unique value in the user-specified grouping column. In case of window-based aggregations, aggregate values are maintained for each window the event-time of a row falls into. Let's understand this with an illustration. 
    +Aggregations over a sliding event-time window are straightforward with Structured Streaming. The key idea to understand window-based aggregations is very similar to grouped aggregations. In a grouped aggregation, aggregate values (e.g. counts) are maintained for each unique value in the user-specified grouping column. In case of window-based aggregations, aggregate values are maintained for each window the event-time of a row falls into. Let's understand this with an illustration. 
    --- End diff --
    
    This still needs a fix -- I would just say "Window-based aggregations are very similar to grouped aggregations"


[GitHub] spark issue #17037: [MINOR][DOCS] Fix few typos in structured streaming doc

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/17037
  
    Merged to master


[GitHub] spark issue #17037: [MINOR][DOCS] Fix few typos in structured streaming doc

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17037
  
    **[Test build #3583 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3583/testReport)** for PR 17037 at commit [`ac24bd6`](https://github.com/apache/spark/commit/ac24bd6bd2e053a7699ca109e13b8a66c466a88a).


[GitHub] spark issue #17037: [MINOR][DOCS] Fix few typos in structured streaming doc

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17037
  
    Can one of the admins verify this patch?


[GitHub] spark pull request #17037: [MINOR][DOCS] Fix few typos in structured streami...

Posted by ramkumarvenkat <gi...@git.apache.org>.
Github user ramkumarvenkat commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17037#discussion_r102671200
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -392,7 +392,7 @@ data, thus relieving the users from reasoning about it. As an example, let’s
     see how this model handles event-time based processing and late arriving data.
     
     ## Handling Event-time and Late Data
    -Event-time is the time embedded in the data itself. For many applications, you may want to operate on this event-time. For example, if you want to get the number of events generated by IoT devices every minute, then you probably want to use the time when the data was generated (that is, event-time in the data), rather than the time Spark receives them. This event-time is very naturally expressed in this model -- each event from the devices is a row in the table, and event-time is a column value in the row. This allows window-based aggregations (e.g. number of events every minute) to be just a special type of grouping and aggregation on the even-time column -- each time window is a group and each row can belong to multiple windows/groups. Therefore, such event-time-window-based aggregation queries can be defined consistently on both a static dataset (e.g. from collected device events logs) as well as on a data stream, making the life of the user much easier.
    --- End diff --
    
    and aggregation on the `even-time column`
    
    is changed to 
    
    and aggregation on the `event-time column`
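
    To make the corrected "event-time column" wording concrete, here is a minimal plain-Python sketch (not Spark's implementation) of how the timestamp embedded in a row maps to sliding windows, so that one row can belong to several windows/groups. The 10-minute window, 5-minute slide, epoch-aligned window starts, the sample timestamps, and the `sliding_windows` helper are all illustrative assumptions; in Spark itself this grouping is expressed with the `window()` function applied to the event-time column.

    ```python
    from datetime import datetime, timedelta

    # Assumption: window starts are aligned to multiples of the slide interval,
    # counted from the Unix epoch (no custom start offset).
    EPOCH = datetime(1970, 1, 1)


    def sliding_windows(timestamp, length, slide):
        """Return every sliding window [start, start + length) that contains timestamp."""
        # Latest aligned window start at or before the timestamp.
        start = timestamp - (timestamp - EPOCH) % slide
        windows = []
        # Step back one slide at a time while the window still covers the timestamp.
        while start > timestamp - length:
            windows.append((start, start + length))
            start -= slide
        return sorted(windows)


    LENGTH = timedelta(minutes=10)
    SLIDE = timedelta(minutes=5)

    # Hypothetical IoT reading: generated at 12:07, but received by the job at 12:20.
    generated_at = datetime(2017, 2, 23, 12, 7)
    received_at = datetime(2017, 2, 23, 12, 20)

    # Grouping on the event-time column uses the timestamp embedded in the data...
    by_event_time = sliding_windows(generated_at, LENGTH, SLIDE)
    # ...whereas grouping on arrival time would place the same row in other windows.
    by_arrival_time = sliding_windows(received_at, LENGTH, SLIDE)

    for start, end in by_event_time:
        print(start.strftime("%H:%M"), "-", end.strftime("%H:%M"))
    ```

    With these assumptions the reading falls into the 12:00-12:10 and 12:05-12:15 windows by event time, even though it arrived late.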


[GitHub] spark pull request #17037: [MINOR][DOCS] Fix few typos in structured streami...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17037#discussion_r102830753
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -647,7 +647,7 @@ df.groupBy("deviceType").count()
     </div>
     
     ### Window Operations on Event Time
    -Aggregations over a sliding event-time window are straightforward with Structured Streaming. The key idea to understand about window-based aggregations are very similar to grouped aggregations. In a grouped aggregation, aggregate values (e.g. counts) are maintained for each unique value in the user-specified grouping column. In case of window-based aggregations, aggregate values are maintained for each window the event-time of a row falls into. Let's understand this with an illustration. 
    +Aggregations over a sliding event-time window are straightforward with Structured Streaming. The key idea to understand window-based aggregations is very similar to grouped aggregations. In a grouped aggregation, aggregate values (e.g. counts) are maintained for each unique value in the user-specified grouping column. In case of window-based aggregations, aggregate values are maintained for each window the event-time of a row falls into. Let's understand this with an illustration. 
    --- End diff --
    
    agreed. 
    "The key idea to understand is that window-based aggregations are very similar to grouped aggregations."
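
    The agreed wording, that window-based aggregations are very similar to grouped aggregations, can be illustrated with a short plain-Python sketch (an illustration only, not Spark's code): the same counting helper serves both cases, and only the function that maps a row to its group key(s) changes. The column names `deviceType` and `eventMinute`, the whole-minute timestamps, and the 10-minute window sliding every 5 minutes are hypothetical.

    ```python
    from collections import Counter


    def aggregate_counts(rows, keys_of):
        """Count rows per group; keys_of(row) returns every group the row belongs to."""
        counts = Counter()
        for row in rows:
            for key in keys_of(row):
                counts[key] += 1
        return counts


    def device_type_key(row):
        # Grouped aggregation: exactly one group per row (its deviceType value).
        return [row["deviceType"]]


    def sliding_window_keys(row, length=10, slide=5):
        # Window-based aggregation: one group per window the event time falls into.
        # Times are whole minutes; windows are [start, start + length) with starts
        # at multiples of `slide` (assumed alignment, no start-time offset).
        t = row["eventMinute"]
        start = t - t % slide
        keys = []
        while start > t - length:
            keys.append((start, start + length))
            start -= slide
        return keys


    rows = [
        {"deviceType": "sensor", "eventMinute": 7},
        {"deviceType": "camera", "eventMinute": 12},
        {"deviceType": "sensor", "eventMinute": 13},
    ]

    by_type = aggregate_counts(rows, device_type_key)
    by_window = aggregate_counts(rows, sliding_window_keys)
    print(by_type)
    print(by_window)
    ```

    Here each row contributes to one `deviceType` group but to two overlapping windows, which is the only real difference between the two kinds of aggregation.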


[GitHub] spark pull request #17037: [MINOR][DOCS] Fix few typos in structured streami...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17037


[GitHub] spark issue #17037: [MINOR][DOCS] Fix few typos in structured streaming doc

Posted by ramkumarvenkat <gi...@git.apache.org>.
Github user ramkumarvenkat commented on the issue:

    https://github.com/apache/spark/pull/17037
  
    @srowen @tdas Can you guys please look into this small doc fix?


[GitHub] spark issue #17037: [MINOR][DOCS] Fix few typos in structured streaming doc

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17037
  
    **[Test build #3583 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3583/testReport)** for PR 17037 at commit [`ac24bd6`](https://github.com/apache/spark/commit/ac24bd6bd2e053a7699ca109e13b8a66c466a88a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.

