Posted to reviews@spark.apache.org by jjthomas <gi...@git.apache.org> on 2016/06/21 21:07:15 UTC

[GitHub] spark pull request #13816: [SPARK-16114] [SQL] structured streaming network ...

GitHub user jjthomas opened a pull request:

    https://github.com/apache/spark/pull/13816

    [SPARK-16114] [SQL] structured streaming network word count examples

    ## What changes were proposed in this pull request?
    
    Network word count example for structured streaming
    
    
    ## How was this patch tested?
    
    Run locally
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jjthomas/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13816.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13816
    
----
commit 38b5497ef17b0c1f1cf2a8c5731832bed06d2fc8
Author: James Thomas <ja...@jamess-macbook-pro.local>
Date:   2016-06-21T21:00:05Z

    structured streaming network word count examples

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    **[Test build #61051 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61051/consoleFull)** for PR 13816 at commit [`46ac930`](https://github.com/apache/spark/commit/46ac930296cce78d47ff832c9940cf4d017224a2).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #13816: [SPARK-16114] [SQL] structured streaming network ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13816#discussion_r67961394
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql.streaming;
    +
    +
    +import org.apache.spark.sql.*;
    +import org.apache.spark.sql.streaming.OutputMode;
    +
    +import java.util.regex.Pattern;
    +
    +/**
    + * Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
    + *
    + * Usage: JavaStructuredNetworkWordCount <hostname> <port> <checkpoint dir>
    + * <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
    + *
    + * To run this on your local machine, you need to first run a Netcat server
    + *    `$ nc -lk 9999`
    + * and then run the example
    + *    `$ bin/run-example org.apache.spark.examples.sql.streaming.JavaStructuredNetworkWordCount
    + *    localhost 9999 <checkpoint dir>`
    + */
    +public class JavaStructuredNetworkWordCount {
    +  private static final Pattern SPACE = Pattern.compile(" ");
    +
    +  public static void main(String[] args) throws Exception {
    +    if (args.length < 3) {
    +      System.err.println("Usage: JavaNetworkWordCount <hostname> <port> <checkpoint dir>");
    +      System.exit(1);
    +    }
    +
    +    SparkSession spark = SparkSession
    +      .builder()
    +      .appName("JavaStructuredNetworkWordCount")
    +      .getOrCreate();
    +
    +    Dataset<String> df = spark.readStream().format("socket").option("host", args[0])
    +      .option("port", args[1]).load().as(Encoders.STRING());
    +
    +    Dataset<String> words = df.select(functions.explode(functions.split(df.col("value"), " ")).alias("word"))
    +      .as(Encoders.STRING());
    +
    +    Dataset<Row> wordCounts = words.groupBy("word").count();
    +
    +    wordCounts.writeStream()
    +      .outputMode(OutputMode.Complete())
    +      .format("console")
    +      .option("checkpointLocation", args[2])
    +      .start()
    +      .awaitTermination();
    +
    +    spark.stop();
    --- End diff --
    
    This is never reached while the query is running, because awaitTermination() blocks. So it is not needed.




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    **[Test build #3127 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3127/consoleFull)** for PR 13816 at commit [`80fee20`](https://github.com/apache/spark/commit/80fee206beccc38b7b1e92e47a20ec2525e02b31).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    **[Test build #60981 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60981/consoleFull)** for PR 13816 at commit [`18c83b1`](https://github.com/apache/spark/commit/18c83b1550fb69a63dd547e3bd4d030b649c9031).




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    **[Test build #61413 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61413/consoleFull)** for PR 13816 at commit [`6ab4453`](https://github.com/apache/spark/commit/6ab4453ec6a27e3ff7bdb03b1b7282385168a6a1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #13816: [SPARK-16114] [SQL] structured streaming network ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13816#discussion_r67980191
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredNetworkWordCount.scala ---
    @@ -0,0 +1,80 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.sql.streaming
    +
    +import org.apache.spark.sql.{functions, SparkSession}
    +import org.apache.spark.sql.streaming.OutputMode
    +
    +
    +/**
    + * Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
    + *
    + * Usage: StructuredNetworkWordCount <hostname> <port> <checkpoint dir>
    + * <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
    + *
    + * To run this on your local machine, you need to first run a Netcat server
    + *    `$ nc -lk 9999`
    + * and then run the example
    + *    `$ bin/run-example org.apache.spark.examples.sql.streaming.StructuredNetworkWordCount
    + *    localhost 9999 <checkpoint dir>`
    + */
    +object StructuredNetworkWordCount {
    +  def main(args: Array[String]) {
    +    if (args.length < 3) {
    +      System.err.println("Usage: StructuredNetworkWordCount <hostname> <port> <checkpoint dir>")
    +      System.exit(1)
    +    }
    +
    +    val host = args(0)
    +    val port = args(1).toInt
    +    val checkpointDir = args(2)
    +
    +    val spark = SparkSession
    +      .builder
    +      .appName("StructuredNetworkWordCount")
    +      .getOrCreate()
    +
    +    import spark.implicits._
    +
    +    // input lines (may be multiple words on each line)
    +    val lines = spark.readStream
    +      .format("socket")
    +      .option("host", host)
    +      .option("port", port)
    +      .load().as[String]
    +
    +    // input words
    +    val words = lines.select(
    +      functions.explode(
    --- End diff --
    
    You can import functions._ and simplify the code further.




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60970/
    Test FAILed.




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61421/
    Test PASSed.




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61071/
    Test PASSed.




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    **[Test build #61421 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61421/consoleFull)** for PR 13816 at commit [`a8c3fec`](https://github.com/apache/spark/commit/a8c3fecdf63fc2df93d1287a2e8c6ddebc5cbe61).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    **[Test build #61413 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61413/consoleFull)** for PR 13816 at commit [`6ab4453`](https://github.com/apache/spark/commit/6ab4453ec6a27e3ff7bdb03b1b7282385168a6a1).




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    **[Test build #61051 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61051/consoleFull)** for PR 13816 at commit [`46ac930`](https://github.com/apache/spark/commit/46ac930296cce78d47ff832c9940cf4d017224a2).




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    **[Test build #60981 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60981/consoleFull)** for PR 13816 at commit [`18c83b1`](https://github.com/apache/spark/commit/18c83b1550fb69a63dd547e3bd4d030b649c9031).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public final class JavaStructuredNetworkWordCount `




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61389/
    Test PASSed.




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61413/
    Test PASSed.




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61054/
    Test FAILed.




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #13816: [SPARK-16114] [SQL] structured streaming network ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13816#discussion_r68712291
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/streaming/NetworkEventTimeWindow.scala ---
    @@ -0,0 +1,84 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.sql.streaming
    +
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.functions._
    +import org.apache.spark.sql.types.TimestampType
    +
    +/**
    + * Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
    + *
    + * Usage: EventTimeWindowExample <hostname> <port> <checkpoint dir>
    + * <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
    + *
    + * To run this on your local machine, you need to first run a Netcat server
    + *    `$ nc -lk 9999`
    + * and then run the example
    + *    `$ bin/run-example org.apache.spark.examples.sql.streaming.EventTimeWindowExample
    + *    localhost 9999 <checkpoint dir>`
    + */
    +object NetworkEventTimeWindow {
    --- End diff --
    
    Just rename to EventTimeWindow.




[GitHub] spark pull request #13816: [SPARK-16114] [SQL] structured streaming network ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13816#discussion_r67961040
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql.streaming;
    +
    +
    +import org.apache.spark.sql.*;
    +import org.apache.spark.sql.streaming.OutputMode;
    +
    +import java.util.regex.Pattern;
    +
    +/**
    + * Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
    + *
    + * Usage: JavaStructuredNetworkWordCount <hostname> <port> <checkpoint dir>
    + * <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
    + *
    + * To run this on your local machine, you need to first run a Netcat server
    + *    `$ nc -lk 9999`
    + * and then run the example
    + *    `$ bin/run-example org.apache.spark.examples.sql.streaming.JavaStructuredNetworkWordCount
    + *    localhost 9999 <checkpoint dir>`
    + */
    +public class JavaStructuredNetworkWordCount {
    +  private static final Pattern SPACE = Pattern.compile(" ");
    +
    +  public static void main(String[] args) throws Exception {
    +    if (args.length < 3) {
    +      System.err.println("Usage: JavaNetworkWordCount <hostname> <port> <checkpoint dir>");
    +      System.exit(1);
    +    }
    +
    +    SparkSession spark = SparkSession
    +      .builder()
    +      .appName("JavaStructuredNetworkWordCount")
    +      .getOrCreate();
    +
    +    Dataset<String> df = spark.readStream().format("socket").option("host", args[0])
    +      .option("port", args[1]).load().as(Encoders.STRING());
    +
    +    Dataset<String> words = df.select(functions.explode(functions.split(df.col("value"), " ")).alias("word"))
    --- End diff --
    
    Same as above.
    
    Also, add comments on each line.
    Is it a Dataset of String or of Row?
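
On the String-versus-Row question, going only by the types declared in the quoted diff: the socket input is held as a `Dataset<String>` of lines, the exploded words are declared as a `Dataset<String>` again, and the grouped counts are declared as a `Dataset<Row>`. The sketch below is a plain-Java model of those three stages on an in-memory list. It deliberately does not use Spark, the class and method names (`WordCountModel`, `toWords`, `countWords`) are hypothetical, and edge cases such as empty tokens may behave differently from Spark's `split`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical, Spark-free model of the word count pipeline in the diff.
class WordCountModel {

  // Lines -> words: split each line on a single space and flatten the result,
  // which mirrors explode(split(value, " ")) producing one word per element.
  static List<String> toWords(List<String> lines) {
    return lines.stream()
        .flatMap(line -> Arrays.stream(line.split(" ")))
        .collect(Collectors.toList());
  }

  // Words -> (word, count): group identical words and count each group,
  // which mirrors groupBy("word").count(). A TreeMap keeps keys sorted
  // so the printed result is deterministic.
  static Map<String, Long> countWords(List<String> words) {
    return words.stream()
        .collect(Collectors.groupingBy(
            Function.identity(), TreeMap::new, Collectors.counting()));
  }

  public static void main(String[] args) {
    // Two input lines, as they might arrive from the Netcat server.
    List<String> lines = Arrays.asList("apache spark", "spark streaming");
    System.out.println(countWords(toWords(lines)));
  }
}
```

Running `main` prints `{apache=1, spark=2, streaming=1}`. In this model the word stage is a list of plain strings and only the final stage pairs each word with a count, which is the same split between String-typed and Row-typed stages that the declarations in the diff suggest.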




[GitHub] spark pull request #13816: [SPARK-16114] [SQL] structured streaming network ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13816#discussion_r68712408
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/streaming/NetworkEventTimeWindow.scala ---
    @@ -0,0 +1,84 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.sql.streaming
    +
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.functions._
    +import org.apache.spark.sql.types.TimestampType
    +
    +/**
    + * Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
    --- End diff --
    
    This description does not mention windowing in any way.
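
The file name suggests counts grouped by event-time window rather than a single running total, which is what the description would need to say. A minimal plain-Python sketch of that idea follows, assuming tumbling (non-overlapping) windows. The example under review may well use a different window shape, and `windowed_word_counts` is a hypothetical name, not part of the patch.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def windowed_word_counts(
    events: Iterable[Tuple[int, str]], window_seconds: int
) -> Dict[Tuple[int, str], int]:
    """Count words per tumbling window.

    Each event is (event_time_in_seconds, word). Each result key is
    (window_start_in_seconds, word).
    """
    counts: Counter = Counter()
    for event_time, word in events:
        # Align the event time down to the start of its window.
        window_start = (event_time // window_seconds) * window_seconds
        counts[(window_start, word)] += 1
    return dict(counts)
```

With 10-second windows, two occurrences of the same word at seconds 1 and 4 fall into the window starting at 0, while an occurrence at second 11 is counted separately in the window starting at 10.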


[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    **[Test build #61421 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61421/consoleFull)** for PR 13816 at commit [`a8c3fec`](https://github.com/apache/spark/commit/a8c3fecdf63fc2df93d1287a2e8c6ddebc5cbe61).


[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    **[Test build #60970 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60970/consoleFull)** for PR 13816 at commit [`38b5497`](https://github.com/apache/spark/commit/38b5497ef17b0c1f1cf2a8c5731832bed06d2fc8).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class JavaStructuredNetworkWordCount `


[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    **[Test build #61071 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61071/consoleFull)** for PR 13816 at commit [`f7aec9d`](https://github.com/apache/spark/commit/f7aec9d1256790070dc8122b8e70c8855f2de8f4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    test this again


[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61310/
    Test PASSed.


[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    **[Test build #61389 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61389/consoleFull)** for PR 13816 at commit [`fb491c6`](https://github.com/apache/spark/commit/fb491c6237380d6419c63c7b24b73c574bee843d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #13816: [SPARK-16114] [SQL] structured streaming network ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13816#discussion_r67961140
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql.streaming;
    +
    +
    +import org.apache.spark.sql.*;
    +import org.apache.spark.sql.streaming.OutputMode;
    +
    +import java.util.regex.Pattern;
    +
    +/**
    + * Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
    + *
    + * Usage: JavaStructuredNetworkWordCount <hostname> <port> <checkpoint dir>
    + * <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
    + *
    + * To run this on your local machine, you need to first run a Netcat server
    + *    `$ nc -lk 9999`
    + * and then run the example
    + *    `$ bin/run-example org.apache.spark.examples.sql.streaming.JavaStructuredNetworkWordCount
    + *    localhost 9999 <checkpoint dir>`
    + */
    +public class JavaStructuredNetworkWordCount {
    +  private static final Pattern SPACE = Pattern.compile(" ");
    +
    +  public static void main(String[] args) throws Exception {
    +    if (args.length < 3) {
    +      System.err.println("Usage: JavaNetworkWordCount <hostname> <port> <checkpoint dir>");
    +      System.exit(1);
    +    }
    +
    +    SparkSession spark = SparkSession
    +      .builder()
    +      .appName("JavaStructuredNetworkWordCount")
    +      .getOrCreate();
    +
    +    Dataset<String> df = spark.readStream().format("socket").option("host", args[0])
    +      .option("port", args[1]).load().as(Encoders.STRING());
    +
    +    Dataset<String> words = df.select(functions.explode(functions.split(df.col("value"), " ")).alias("word"))
    +      .as(Encoders.STRING());
    +
    +    Dataset<Row> wordCounts = words.groupBy("word").count();
    +
    +    wordCounts.writeStream()
    +      .outputMode(OutputMode.Complete())
    --- End diff --
    
    You could use `.outputMode("complete")`. I think that is easier.
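
For readers unfamiliar with the mode being named here: in complete output mode, the whole updated result table is written to the sink on every trigger, not only the rows that changed. A small plain-Python simulation of that behaviour for a running word count is sketched below. It is not Spark code, and `complete_mode_outputs` is a hypothetical helper used only for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def complete_mode_outputs(batches: Iterable[List[str]]) -> Iterator[Dict[str, int]]:
    """Yield the full running word counts after each batch of words."""
    running: Counter = Counter()
    for batch in batches:
        # Fold the new batch into the running totals.
        running.update(batch)
        # "complete": emit the entire result table, including unchanged rows.
        yield dict(running)
```

Feeding the batches `["a", "b"]` and then `["a"]` produces two outputs, and the second still contains the unchanged count for `"b"` alongside the updated count for `"a"`.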


[GitHub] spark pull request #13816: [SPARK-16114] [SQL] structured streaming network ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13816


[GitHub] spark pull request #13816: [SPARK-16114] [SQL] structured streaming network ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13816#discussion_r67980730
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java ---
    @@ -0,0 +1,83 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql.streaming;
    +
    +
    +import org.apache.spark.sql.*;
    +import org.apache.spark.sql.streaming.OutputMode;
    +import org.apache.spark.sql.streaming.StreamingQuery;
    +
    +import java.util.regex.Pattern;
    +
    +/**
    + * Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
    + *
    + * Usage: JavaStructuredNetworkWordCount <hostname> <port> <checkpoint dir>
    + * <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
    + *
    + * To run this on your local machine, you need to first run a Netcat server
    + *    `$ nc -lk 9999`
    + * and then run the example
    + *    `$ bin/run-example org.apache.spark.examples.sql.streaming.JavaStructuredNetworkWordCount
    + *    localhost 9999 <checkpoint dir>`
    + */
    +public final class JavaStructuredNetworkWordCount {
    +  private static final Pattern SPACE = Pattern.compile(" ");
    +
    +  public static void main(String[] args) throws Exception {
    +    if (args.length < 3) {
    +      System.err.println("Usage: JavaNetworkWordCount <hostname> <port> <checkpoint dir>");
    +      System.exit(1);
    +    }
    +
    +    String host = args[0];
    +    int port = Integer.parseInt(args[1]);
    +    String checkpointDir = args[2];
    +
    +    SparkSession spark = SparkSession
    +      .builder()
    +      .appName("JavaStructuredNetworkWordCount")
    +      .getOrCreate();
    +
    +    // input lines (may be multiple words on each line)
    +    Dataset<String> lines = spark
    +      .readStream()
    +      .format("socket")
    +      .option("host", host)
    +      .option("port", port)
    +      .load()
    +      .as(Encoders.STRING());
    +
    +    // input words
    +    Dataset<String> words = lines.select(
    +        functions.explode(
    +          functions.split(lines.col("value"), " ")
    +        ).alias("word")
    +      ).as(Encoders.STRING());
    +
    +    // the count for each distinct word
    +    Dataset<Row> wordCounts = words.groupBy("word").count();
    +
    +    StreamingQuery query = wordCounts.writeStream()
    --- End diff --
    
    // Start running the query that prints the running counts to the console


[GitHub] spark pull request #13816: [SPARK-16114] [SQL] structured streaming network ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13816#discussion_r67980047
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java ---
    @@ -0,0 +1,83 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql.streaming;
    +
    +
    +import org.apache.spark.sql.*;
    +import org.apache.spark.sql.streaming.OutputMode;
    +import org.apache.spark.sql.streaming.StreamingQuery;
    +
    +import java.util.regex.Pattern;
    +
    +/**
    + * Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
    + *
    + * Usage: JavaStructuredNetworkWordCount <hostname> <port> <checkpoint dir>
    + * <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
    + *
    + * To run this on your local machine, you need to first run a Netcat server
    + *    `$ nc -lk 9999`
    + * and then run the example
    + *    `$ bin/run-example org.apache.spark.examples.sql.streaming.JavaStructuredNetworkWordCount
    + *    localhost 9999 <checkpoint dir>`
    + */
    +public final class JavaStructuredNetworkWordCount {
    +  private static final Pattern SPACE = Pattern.compile(" ");
    +
    +  public static void main(String[] args) throws Exception {
    +    if (args.length < 3) {
    +      System.err.println("Usage: JavaNetworkWordCount <hostname> <port> <checkpoint dir>");
    +      System.exit(1);
    +    }
    +
    +    String host = args[0];
    +    int port = Integer.parseInt(args[1]);
    +    String checkpointDir = args[2];
    +
    +    SparkSession spark = SparkSession
    +      .builder()
    +      .appName("JavaStructuredNetworkWordCount")
    +      .getOrCreate();
    +
    +    // input lines (may be multiple words on each line)
    +    Dataset<String> lines = spark
    --- End diff --
    
    You don't need to convert to Dataset[String] using `as`, since you are not using the typed groupByKey.


[GitHub] spark pull request #13816: [SPARK-16114] [SQL] structured streaming network ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13816#discussion_r68859907
  
    --- Diff: examples/src/main/python/sql/streaming/structured_network_wordcount.py ---
    @@ -0,0 +1,76 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +"""
    + Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
    + Usage: structured_network_wordcount.py <hostname> <port>
    +   <hostname> and <port> describe the TCP server that Structured Streaming
    +   would connect to receive data.
    +
    + To run this on your local machine, you need to first run a Netcat server
    +    `$ nc -lk 9999`
    + and then run the example
    +    `$ bin/spark-submit examples/src/main/python/sql/streaming/structured_network_wordcount.py
    +    localhost 9999`
    +"""
    +from __future__ import print_function
    +
    +import sys
    +
    +from pyspark.sql import SparkSession
    +from pyspark.sql.functions import explode
    +from pyspark.sql.functions import split
    +
    +if __name__ == "__main__":
    +    if len(sys.argv) != 3:
    +        print("Usage: network_wordcount.py <hostname> <port>", file=sys.stderr)
    --- End diff --
    
    The usage message is wrong: it prints `network_wordcount.py`, but the script is `structured_network_wordcount.py`.
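
One way to keep a usage line from drifting away from the file name is to derive it from the name the script was invoked with. The sketch below is one possible approach, not what the patch does; `usage_message` is a hypothetical helper, and the `<hostname> <port>` arguments are taken from the docstring in the diff above.

```python
import os
from typing import List


def usage_message(argv: List[str]) -> str:
    """Build the usage line from the name the script was invoked with."""
    # argv[0] is the script path; keep only its final component.
    script_name = os.path.basename(argv[0])
    return "Usage: %s <hostname> <port>" % script_name
```

In the script itself this would be called as `print(usage_message(sys.argv), file=sys.stderr)` inside the existing argument-count check.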


[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    Can one of the admins verify this patch?


[GitHub] spark pull request #13816: [SPARK-16114] [SQL] structured streaming network ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13816#discussion_r67980738
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java ---
    @@ -0,0 +1,83 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql.streaming;
    +
    +
    +import org.apache.spark.sql.*;
    +import org.apache.spark.sql.streaming.OutputMode;
    +import org.apache.spark.sql.streaming.StreamingQuery;
    +
    +import java.util.regex.Pattern;
    +
    +/**
    + * Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
    + *
    + * Usage: JavaStructuredNetworkWordCount <hostname> <port> <checkpoint dir>
    + * <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
    + *
    + * To run this on your local machine, you need to first run a Netcat server
    + *    `$ nc -lk 9999`
    + * and then run the example
    + *    `$ bin/run-example org.apache.spark.examples.sql.streaming.JavaStructuredNetworkWordCount
    + *    localhost 9999 <checkpoint dir>`
    + */
    +public final class JavaStructuredNetworkWordCount {
    +  private static final Pattern SPACE = Pattern.compile(" ");
    +
    +  public static void main(String[] args) throws Exception {
    +    if (args.length < 3) {
    +      System.err.println("Usage: JavaNetworkWordCount <hostname> <port> <checkpoint dir>");
    +      System.exit(1);
    +    }
    +
    +    String host = args[0];
    +    int port = Integer.parseInt(args[1]);
    +    String checkpointDir = args[2];
    +
    +    SparkSession spark = SparkSession
    +      .builder()
    +      .appName("JavaStructuredNetworkWordCount")
    +      .getOrCreate();
    +
    +    // input lines (may be multiple words on each line)
    +    Dataset<String> lines = spark
    +      .readStream()
    +      .format("socket")
    +      .option("host", host)
    +      .option("port", port)
    +      .load()
    +      .as(Encoders.STRING());
    +
    +    // input words
    +    Dataset<String> words = lines.select(
    +        functions.explode(
    +          functions.split(lines.col("value"), " ")
    +        ).alias("word")
    +      ).as(Encoders.STRING());
    +
    +    // the count for each distinct word
    --- End diff --
    
    // Generate running word count


[GitHub] spark pull request #13816: [SPARK-16114] [SQL] structured streaming network ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13816#discussion_r67960868
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql.streaming;
    +
    +
    +import org.apache.spark.sql.*;
    +import org.apache.spark.sql.streaming.OutputMode;
    +
    +import java.util.regex.Pattern;
    +
    +/**
    + * Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
    + *
    + * Usage: JavaStructuredNetworkWordCount <hostname> <port> <checkpoint dir>
    + * <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
    + *
    + * To run this on your local machine, you need to first run a Netcat server
    + *    `$ nc -lk 9999`
    + * and then run the example
    + *    `$ bin/run-example org.apache.spark.examples.sql.streaming.JavaStructuredNetworkWordCount
    + *    localhost 9999 <checkpoint dir>`
    + */
    +public class JavaStructuredNetworkWordCount {
    +  private static final Pattern SPACE = Pattern.compile(" ");
    +
    +  public static void main(String[] args) throws Exception {
    +    if (args.length < 3) {
    +      System.err.println("Usage: JavaNetworkWordCount <hostname> <port> <checkpoint dir>");
    +      System.exit(1);
    +    }
    +
    +    SparkSession spark = SparkSession
    +      .builder()
    +      .appName("JavaStructuredNetworkWordCount")
    +      .getOrCreate();
    +
    +    Dataset<String> df = spark.readStream().format("socket").option("host", args[0])
    --- End diff --
    
    Put each chained call on its own line so this is easier to read:
    ```
    spark
       .readStream()
       .format("socket")
       .option(...)
    ....
    ```


[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    **[Test build #3128 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3128/consoleFull)** for PR 13816 at commit [`80fee20`](https://github.com/apache/spark/commit/80fee206beccc38b7b1e92e47a20ec2525e02b31).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    **[Test build #61071 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61071/consoleFull)** for PR 13816 at commit [`f7aec9d`](https://github.com/apache/spark/commit/f7aec9d1256790070dc8122b8e70c8855f2de8f4).


[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    **[Test build #61054 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61054/consoleFull)** for PR 13816 at commit [`80fee20`](https://github.com/apache/spark/commit/80fee206beccc38b7b1e92e47a20ec2525e02b31).


[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    **[Test build #3127 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3127/consoleFull)** for PR 13816 at commit [`80fee20`](https://github.com/apache/spark/commit/80fee206beccc38b7b1e92e47a20ec2525e02b31).


[GitHub] spark pull request #13816: [SPARK-16114] [SQL] structured streaming network ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13816#discussion_r67980520
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java ---
    @@ -0,0 +1,83 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql.streaming;
    +
    +
    +import org.apache.spark.sql.*;
    +import org.apache.spark.sql.streaming.OutputMode;
    +import org.apache.spark.sql.streaming.StreamingQuery;
    +
    +import java.util.regex.Pattern;
    +
    +/**
    + * Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
    + *
    + * Usage: JavaStructuredNetworkWordCount <hostname> <port> <checkpoint dir>
    + * <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
    + *
    + * To run this on your local machine, you need to first run a Netcat server
    + *    `$ nc -lk 9999`
    + * and then run the example
    + *    `$ bin/run-example org.apache.spark.examples.sql.streaming.JavaStructuredNetworkWordCount
    + *    localhost 9999 <checkpoint dir>`
    + */
    +public final class JavaStructuredNetworkWordCount {
    +  private static final Pattern SPACE = Pattern.compile(" ");
    +
    +  public static void main(String[] args) throws Exception {
    +    if (args.length < 3) {
    +      System.err.println("Usage: JavaNetworkWordCount <hostname> <port> <checkpoint dir>");
    +      System.exit(1);
    +    }
    +
    +    String host = args[0];
    +    int port = Integer.parseInt(args[1]);
    +    String checkpointDir = args[2];
    +
    +    SparkSession spark = SparkSession
    +      .builder()
    +      .appName("JavaStructuredNetworkWordCount")
    +      .getOrCreate();
    +
    +    // input lines (may be multiple words on each line)
    +    Dataset<String> lines = spark
    +      .readStream()
    +      .format("socket")
    +      .option("host", host)
    +      .option("port", port)
    +      .load()
    +      .as(Encoders.STRING());
    +
    +    // input words
    --- End diff --
    
    // Split the lines into words
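
To make the intent of that comment concrete, here is what "split the lines into words, then count them" amounts to on a small in-memory batch. This is a plain-Java analogy using only the standard library, not the Spark Dataset code from the patch; `BatchWordCount` and `count` are hypothetical names introduced for illustration, and skipping empty tokens is an assumption of this sketch rather than something the patch does.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not part of the patch: counts words in a batch of lines.
final class BatchWordCount {
  // Splits each line on single spaces and tallies every non-empty token.
  static Map<String, Long> count(List<String> lines) {
    // TreeMap keeps the words sorted, which makes the result easy to read.
    Map<String, Long> counts = new TreeMap<>();
    for (String line : lines) {
      for (String word : line.split(" ")) {
        if (word.isEmpty()) {
          continue; // assumption of this sketch: ignore tokens produced by repeated spaces
        }
        counts.merge(word, 1L, Long::sum);
      }
    }
    return counts;
  }
}
```

The streaming example does the same two steps (split, then group and count), but over an unbounded stream of lines rather than a fixed list.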


[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60981/
    Test FAILed.


[GitHub] spark pull request #13816: [SPARK-16114] [SQL] structured streaming network ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13816#discussion_r67962119
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql.streaming;
    +
    +
    +import org.apache.spark.sql.*;
    +import org.apache.spark.sql.streaming.OutputMode;
    +
    +import java.util.regex.Pattern;
    +
    +/**
    + * Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
    + *
    + * Usage: JavaStructuredNetworkWordCount <hostname> <port> <checkpoint dir>
    + * <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
    + *
    + * To run this on your local machine, you need to first run a Netcat server
    + *    `$ nc -lk 9999`
    + * and then run the example
    + *    `$ bin/run-example org.apache.spark.examples.sql.streaming.JavaStructuredNetworkWordCount
    + *    localhost 9999 <checkpoint dir>`
    + */
    +public class JavaStructuredNetworkWordCount {
    +  private static final Pattern SPACE = Pattern.compile(" ");
    +
    +  public static void main(String[] args) throws Exception {
    +    if (args.length < 3) {
    +      System.err.println("Usage: JavaNetworkWordCount <hostname> <port> <checkpoint dir>");
    +      System.exit(1);
    +    }
    +
    --- End diff --
    
    It would be good to sanitize the inputs a little before using them, so that common errors are not confusing. For example, the port should be converted to an int before it is used.
    
    Also, defining well-named variables early in the code makes it easier to read.
    
    ```
    val host = args(0)
    val port = args(1).toInt
    val checkpointDir = args(2)
    ```
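
Since the file under review is Java while the snippet above is Scala, a Java rendering of the same idea might look like the sketch below. The `WordCountArgs` holder, its `parse` method, and the port range check are hypothetical additions for illustration, not part of the patch; the sketch assumes the three positional arguments from the usage string and uses only the Java standard library.

```java
// Hypothetical helper, not part of the patch: validates the positional arguments up front.
final class WordCountArgs {
  final String host;
  final int port;
  final String checkpointDir;

  private WordCountArgs(String host, int port, String checkpointDir) {
    this.host = host;
    this.port = port;
    this.checkpointDir = checkpointDir;
  }

  // Parses <hostname> <port> <checkpoint dir>, failing early with a readable message.
  static WordCountArgs parse(String[] args) {
    if (args == null || args.length < 3) {
      throw new IllegalArgumentException(
        "Usage: JavaStructuredNetworkWordCount <hostname> <port> <checkpoint dir>");
    }
    int port;
    try {
      port = Integer.parseInt(args[1]);
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("<port> must be an integer, got: " + args[1], e);
    }
    // Assumption of this sketch: reject values outside the valid TCP port range.
    if (port < 1 || port > 65535) {
      throw new IllegalArgumentException("<port> must be between 1 and 65535, got: " + port);
    }
    return new WordCountArgs(args[0], port, args[2]);
  }
}
```

With something like this in place, the rest of `main` can refer to `parsed.host`, `parsed.port`, and `parsed.checkpointDir` instead of indexing into `args`, and a mistyped port fails before any Spark session is created.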


[GitHub] spark pull request #13816: [SPARK-16114] [SQL] structured streaming network ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13816#discussion_r67962188
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredNetworkWordCount.scala ---
    @@ -0,0 +1,70 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.sql.streaming
    +
    +import org.apache.spark.sql.{functions, SparkSession}
    +import org.apache.spark.sql.streaming.OutputMode
    +
    +
    +/**
    + * Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
    + *
    + * Usage: StructuredNetworkWordCount <hostname> <port> <checkpoint dir>
    + * <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
    + *
    + * To run this on your local machine, you need to first run a Netcat server
    + *    `$ nc -lk 9999`
    + * and then run the example
    + *    `$ bin/run-example org.apache.spark.examples.sql.streaming.StructuredNetworkWordCount
    + *    localhost 9999 <checkpoint dir>`
    + */
    +object StructuredNetworkWordCount {
    +  def main(args: Array[String]) {
    +    if (args.length < 3) {
    +      System.err.println("Usage: StructuredNetworkWordCount <hostname> <port> <checkpoint dir>")
    +      System.exit(1)
    +    }
    +
    +    val spark = SparkSession
    +      .builder
    +      .appName("StructuredNetworkWordCount")
    +      .getOrCreate()
    +
    +    import spark.implicits._
    +
    +    val df = spark.readStream
    --- End diff --
    
    df --> lines


[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    LGTM. Merging this to master and 2.0. Thanks @jjthomas


[GitHub] spark pull request #13816: [SPARK-16114] [SQL] structured streaming network ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13816#discussion_r68860400
  
    --- Diff: examples/src/main/python/sql/streaming/structured_network_wordcount.py ---
    @@ -0,0 +1,76 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +"""
    + Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
    + Usage: structured_network_wordcount.py <hostname> <port>
    +   <hostname> and <port> describe the TCP server that Structured Streaming
    +   would connect to receive data.
    +
    + To run this on your local machine, you need to first run a Netcat server
    +    `$ nc -lk 9999`
    + and then run the example
    +    `$ bin/spark-submit examples/src/main/python/sql/streaming/structured_network_wordcount.py
    +    localhost 9999`
    +"""
    +from __future__ import print_function
    +
    +import sys
    +
    +from pyspark.sql import SparkSession
    +from pyspark.sql.functions import explode
    +from pyspark.sql.functions import split
    +
    +if __name__ == "__main__":
    +    if len(sys.argv) != 3:
    --- End diff --
    
    nvm. my bad.


[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by jjthomas <gi...@git.apache.org>.
Github user jjthomas commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    Responded to comments


[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    **[Test build #61054 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61054/consoleFull)** for PR 13816 at commit [`80fee20`](https://github.com/apache/spark/commit/80fee206beccc38b7b1e92e47a20ec2525e02b31).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #13816: [SPARK-16114] [SQL] structured streaming network ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13816#discussion_r67961339
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql.streaming;
    +
    +
    +import org.apache.spark.sql.*;
    +import org.apache.spark.sql.streaming.OutputMode;
    +
    +import java.util.regex.Pattern;
    +
    +/**
    + * Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
    + *
    + * Usage: JavaStructuredNetworkWordCount <hostname> <port> <checkpoint dir>
    + * <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
    + *
    + * To run this on your local machine, you need to first run a Netcat server
    + *    `$ nc -lk 9999`
    + * and then run the example
    + *    `$ bin/run-example org.apache.spark.examples.sql.streaming.JavaStructuredNetworkWordCount
    + *    localhost 9999 <checkpoint dir>`
    + */
    +public class JavaStructuredNetworkWordCount {
    +  private static final Pattern SPACE = Pattern.compile(" ");
    +
    +  public static void main(String[] args) throws Exception {
    +    if (args.length < 3) {
    +      System.err.println("Usage: JavaNetworkWordCount <hostname> <port> <checkpoint dir>");
    +      System.exit(1);
    +    }
    +
    +    SparkSession spark = SparkSession
    +      .builder()
    +      .appName("JavaStructuredNetworkWordCount")
    +      .getOrCreate();
    +
    +    Dataset<String> df = spark.readStream().format("socket").option("host", args[0])
    +      .option("port", args[1]).load().as(Encoders.STRING());
    +
    +    Dataset<String> words = df.select(functions.explode(functions.split(df.col("value"), " ")).alias("word"))
    +      .as(Encoders.STRING());
    +
    +    Dataset<Row> wordCounts = words.groupBy("word").count();
    +
    +    wordCounts.writeStream()
    +      .outputMode(OutputMode.Complete())
    +      .format("console")
    +      .option("checkpointLocation", args[2])
    +      .start()
    +      .awaitTermination();
    --- End diff --
    
    Assign this to a `query` variable to highlight the fact that it creates a StreamingQuery, and then call awaitTermination on it separately.


[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    **[Test build #60970 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60970/consoleFull)** for PR 13816 at commit [`38b5497`](https://github.com/apache/spark/commit/38b5497ef17b0c1f1cf2a8c5731832bed06d2fc8).


[GitHub] spark pull request #13816: [SPARK-16114] [SQL] structured streaming network ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13816#discussion_r67980502
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java ---
    @@ -0,0 +1,83 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql.streaming;
    +
    +
    +import org.apache.spark.sql.*;
    +import org.apache.spark.sql.streaming.OutputMode;
    +import org.apache.spark.sql.streaming.StreamingQuery;
    +
    +import java.util.regex.Pattern;
    +
    +/**
    + * Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
    + *
    + * Usage: JavaStructuredNetworkWordCount <hostname> <port> <checkpoint dir>
    + * <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
    + *
    + * To run this on your local machine, you need to first run a Netcat server
    + *    `$ nc -lk 9999`
    + * and then run the example
    + *    `$ bin/run-example org.apache.spark.examples.sql.streaming.JavaStructuredNetworkWordCount
    + *    localhost 9999 <checkpoint dir>`
    + */
    +public final class JavaStructuredNetworkWordCount {
    +  private static final Pattern SPACE = Pattern.compile(" ");
    +
    +  public static void main(String[] args) throws Exception {
    +    if (args.length < 3) {
    +      System.err.println("Usage: JavaNetworkWordCount <hostname> <port> <checkpoint dir>");
    +      System.exit(1);
    +    }
    +
    +    String host = args[0];
    +    int port = Integer.parseInt(args[1]);
    +    String checkpointDir = args[2];
    +
    +    SparkSession spark = SparkSession
    +      .builder()
    +      .appName("JavaStructuredNetworkWordCount")
    +      .getOrCreate();
    +
    +    // input lines (may be multiple words on each line)
    --- End diff --
    
    // Create DataFrame representing the stream of input lines from connection to host:port 


[GitHub] spark pull request #13816: [SPARK-16114] [SQL] structured streaming network ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13816#discussion_r68712352
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java ---
    @@ -0,0 +1,78 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql.streaming;
    +
    +import org.apache.spark.sql.*;
    +import org.apache.spark.sql.streaming.StreamingQuery;
    +
    +/**
    + * Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
    + *
    + * Usage: JavaStructuredNetworkWordCount <hostname> <port> <checkpoint dir>
    + * <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
    + *
    + * To run this on your local machine, you need to first run a Netcat server
    + *    `$ nc -lk 9999`
    + * and then run the example
    + *    `$ bin/run-example org.apache.spark.examples.sql.streaming.JavaStructuredNetworkWordCount
    --- End diff --
    
    I think you can just do `$ bin/run-example sql.streaming.JavaStructuredNetworkWordCount`. Verify that, and if it works, please change it.




[GitHub] spark pull request #13816: [SPARK-16114] [SQL] structured streaming network ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13816#discussion_r68712530
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/streaming/NetworkEventTimeWindow.scala ---
    @@ -0,0 +1,84 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.sql.streaming
    +
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.functions._
    +import org.apache.spark.sql.types.TimestampType
    +
    +/**
    + * Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
    + *
    + * Usage: EventTimeWindowExample <hostname> <port> <checkpoint dir>
    + * <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
    + *
    + * To run this on your local machine, you need to first run a Netcat server
    + *    `$ nc -lk 9999`
    + * and then run the example
    + *    `$ bin/run-example org.apache.spark.examples.sql.streaming.EventTimeWindowExample
    + *    localhost 9999 <checkpoint dir>`
    + */
    +object NetworkEventTimeWindow {
    +
    +  def main(args: Array[String]) {
    +    if (args.length < 3) {
    +      System.err.println("Usage: EventTimeWindowExample <hostname> <port> <checkpoint dir>")
    +      System.exit(1)
    +    }
    +
    +    val host = args(0)
    +    val port = args(1).toInt
    +    val checkpointDir = args(2)
    +
    +    val spark = SparkSession
    +      .builder
    +      .appName("EventTimeWindowExample")
    +      .getOrCreate()
    +
    +    import spark.implicits._
    +
    +    // Create DataFrame representing the stream of input lines from connection to host:port
    +    val lines = spark.readStream
    --- End diff --
    
    What is the format of the data that the user is expected to enter?





[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    **[Test build #61310 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61310/consoleFull)** for PR 13816 at commit [`c3b16a2`](https://github.com/apache/spark/commit/c3b16a28e91328c5f6e6fb8efd5e3ee9b3cde1ed).




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    **[Test build #3128 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3128/consoleFull)** for PR 13816 at commit [`80fee20`](https://github.com/apache/spark/commit/80fee206beccc38b7b1e92e47a20ec2525e02b31).




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    **[Test build #61389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61389/consoleFull)** for PR 13816 at commit [`fb491c6`](https://github.com/apache/spark/commit/fb491c6237380d6419c63c7b24b73c574bee843d).




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    **[Test build #61310 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61310/consoleFull)** for PR 13816 at commit [`c3b16a2`](https://github.com/apache/spark/commit/c3b16a28e91328c5f6e6fb8efd5e3ee9b3cde1ed).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61051/




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    ok to test.




[GitHub] spark pull request #13816: [SPARK-16114] [SQL] structured streaming network ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13816#discussion_r68859955
  
    --- Diff: examples/src/main/python/sql/streaming/structured_network_wordcount.py ---
    @@ -0,0 +1,76 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +"""
    + Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
    + Usage: structured_network_wordcount.py <hostname> <port>
    +   <hostname> and <port> describe the TCP server that Structured Streaming
    +   would connect to receive data.
    +
    + To run this on your local machine, you need to first run a Netcat server
    +    `$ nc -lk 9999`
    + and then run the example
    +    `$ bin/spark-submit examples/src/main/python/sql/streaming/structured_network_wordcount.py
    +    localhost 9999`
    +"""
    +from __future__ import print_function
    +
    +import sys
    +
    +from pyspark.sql import SparkSession
    +from pyspark.sql.functions import explode
    +from pyspark.sql.functions import split
    +
    +if __name__ == "__main__":
    +    if len(sys.argv) != 3:
    --- End diff --
    
    number of arguments should be 2
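
    For context, a minimal sketch of how the argument count relates to the check in the diff, assuming the standard Python behavior where `sys.argv[0]` holds the script name. The helper name `parse_args` is hypothetical and is not part of the pull request; it only illustrates that two user-supplied arguments (`<hostname>` and `<port>`) correspond to an argv list of length 3.

```python
def parse_args(argv):
    """Return (host, port) from an argv-style list, or raise ValueError.

    Hypothetical helper for illustration only. argv[0] is the script
    name, so two user-supplied arguments mean len(argv) == 3.
    """
    if len(argv) != 3:
        raise ValueError(
            "Usage: structured_network_wordcount.py <hostname> <port>")
    host = argv[1]
    port = int(argv[2])  # the port arrives as a string on the command line
    return host, port
```

    In a script this would typically be called as `parse_args(sys.argv)`, printing the usage message and exiting when `ValueError` is raised.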




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13816: [SPARK-16114] [SQL] structured streaming network word co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13816
  
    Merged build finished. Test PASSed.

