Posted to reviews@spark.apache.org by ScrapCodes <gi...@git.apache.org> on 2016/07/07 09:10:55 UTC

[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

GitHub user ScrapCodes opened a pull request:

    https://github.com/apache/spark/pull/14087

    [SPARK-16411][SQL][STREAMING] Add textFile to Structured Streaming.

    ## What changes were proposed in this pull request?
    
    Adds the `textFile` API, which already exists in `DataFrameReader`, to `DataStreamReader`; it serves the same purpose.
    
    ## How was this patch tested?
    
    Added a corresponding test case.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ScrapCodes/spark textFile

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14087.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14087
    
----
commit ac822323f35122b99c6aa4d9fce5874160266909
Author: Prashant Sharma <pr...@in.ibm.com>
Date:   2016-07-07T06:46:03Z

    [SPARK-16411][SQL][STREAMING] Add textFile to Structured Streaming.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    **[Test build #61908 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61908/consoleFull)** for PR 14087 at commit [`ac82232`](https://github.com/apache/spark/commit/ac822323f35122b99c6aa4d9fce5874160266909).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    **[Test build #66303 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66303/consoleFull)** for PR 14087 at commit [`25dfd09`](https://github.com/apache/spark/commit/25dfd09e194734f5d257041296c29dd79de81d1c).




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    **[Test build #66261 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66261/consoleFull)** for PR 14087 at commit [`1c43ffa`](https://github.com/apache/spark/commit/1c43ffaee570a83b6253f58f3a4c4f67823ac5f5).




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    **[Test build #66376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66376/consoleFull)** for PR 14087 at commit [`ecdf653`](https://github.com/apache/spark/commit/ecdf6539c8c19da3f019601309993fde634d6c22).




[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

Posted by ScrapCodes <gi...@git.apache.org>.
Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14087#discussion_r81689922
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
    @@ -311,6 +311,37 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
       @Experimental
       def text(path: String): DataFrame = format("text").load(path)
     
    +  /**
    +   * Loads text file(s) and returns a [[Dataset]] of String. The underlying schema of the Dataset
    --- End diff --
    
    I am happy to be corrected; I just followed the convention used here. Since this class has no vararg method for its other APIs, I was hesitant to add one myself.




[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14087#discussion_r69985195
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
    @@ -281,6 +281,31 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
       @Experimental
       def text(path: String): DataFrame = format("text").load(path)
     
    +  /**
    +   * Loads text files and returns a [[Dataset]] of String. The underlying schema of the Dataset
    +   * contains a single string column named "value".
    +   *
    +   * If the directory structure of the text files contains partitioning information, those are
    +   * ignored in the resulting Dataset. To include partitioning information as columns, use `text`.
    +   *
    +   * Each line in the text files is a new element in the resulting Dataset. For example:
    +   * {{{
    +   *   // Scala:
    +   *   spark.read.textFile("/path/to/spark/README.md")
    --- End diff --
    
    s/read/readStream?




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61976/
    Test PASSed.




[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

Posted by jodersky <gi...@git.apache.org>.
Github user jodersky commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14087#discussion_r81669340
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
    @@ -311,6 +311,37 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
       @Experimental
       def text(path: String): DataFrame = format("text").load(path)
     
    +  /**
    +   * Loads text file(s) and returns a [[Dataset]] of String. The underlying schema of the Dataset
    --- End diff --
    
    Should "text files" be plural here? The API would be more intuitive if it copied the non-streaming equivalent, with a vararg method that accepts multiple parameters.




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    Thanks, merging to master.




[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14087#discussion_r69984678
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
    @@ -281,6 +281,31 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
       @Experimental
       def text(path: String): DataFrame = format("text").load(path)
     
    +  /**
    +   * Loads text files and returns a [[Dataset]] of String. The underlying schema of the Dataset
    +   * contains a single string column named "value".
    +   *
    +   * If the directory structure of the text files contains partitioning information, those are
    +   * ignored in the resulting Dataset. To include partitioning information as columns, use `text`.
    +   *
    +   * Each line in the text files is a new element in the resulting Dataset. For example:
    --- End diff --
    
    s/element/record?




[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

Posted by ScrapCodes <gi...@git.apache.org>.
Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14087#discussion_r70024634
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
    @@ -281,6 +281,31 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
       @Experimental
       def text(path: String): DataFrame = format("text").load(path)
     
    +  /**
    +   * Loads text files and returns a [[Dataset]] of String. The underlying schema of the Dataset
    --- End diff --
    
    Thanks :)




[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

Posted by ScrapCodes <gi...@git.apache.org>.
Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14087#discussion_r81689547
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
    @@ -21,13 +21,13 @@ import scala.collection.JavaConverters._
     
     import org.apache.spark.annotation.Experimental
     import org.apache.spark.internal.Logging
    -import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
    +import org.apache.spark.sql.{AnalysisException, DataFrame, Dataset, SparkSession}
     import org.apache.spark.sql.execution.datasources.DataSource
     import org.apache.spark.sql.execution.streaming.StreamingRelation
     import org.apache.spark.sql.types.StructType
     
     /**
    - * Interface used to load a streaming [[Dataset]] from external storage systems (e.g. file systems,
    + * Class used to load a streaming [[Dataset]] from external storage systems (e.g. file systems,
    --- End diff --
    
    Understood, thanks for the correction!




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    **[Test build #61908 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61908/consoleFull)** for PR 14087 at commit [`ac82232`](https://github.com/apache/spark/commit/ac822323f35122b99c6aa4d9fce5874160266909).




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by ScrapCodes <gi...@git.apache.org>.
Github user ScrapCodes commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    @marmbrus Do you think this is useful?




[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14087#discussion_r70121449
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
    @@ -281,6 +281,31 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
       @Experimental
       def text(path: String): DataFrame = format("text").load(path)
     
    +  /**
    +   * Loads text files and returns a [[Dataset]] of String. The underlying schema of the Dataset
    +   * contains a single string column named "value".
    +   *
    +   * If the directory structure of the text files contains partitioning information, those are
    +   * ignored in the resulting Dataset. To include partitioning information as columns, use `text`.
    +   *
    +   * Each line in the text files is a new element in the resulting Dataset. For example:
    +   * {{{
    +   *   // Scala:
    +   *   spark.read.textFile("/path/to/spark/README.md")
    +   *
    +   *   // Java:
    +   *   spark.read().textFile("/path/to/spark/README.md")
    +   * }}}
    +   *
    +   * @param path input path
    +   * @since 2.0.0
    +   */
    +  def textFile(path: String): Dataset[String] = {
    +    if (userSpecifiedSchema.nonEmpty) {
    +      throw new AnalysisException("User specified schema not supported with `textFile`")
    --- End diff --
    
    Since this check is presumably copied from the similar function in DataFrameReader, we should probably keep the exception the same as DataFrameReader (so either update it too or leave this as is).
    Also, in the SQL code base we use "User specified" 24 times and "User-specified" 5 times.
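
The schema check in the quoted diff can be exercised with a small sketch. It assumes a Spark release that already includes `DataStreamReader.textFile` and keeps the check shown in the diff; the object name, method name, and temporary directory are hypothetical and exist only for illustration.

```scala
import java.nio.file.Files
import scala.util.{Failure, Try}

import org.apache.spark.sql.{AnalysisException, Dataset, SparkSession}
import org.apache.spark.sql.types.{StringType, StructType}

// Hypothetical sketch object; not part of the pull request.
object TextFileSchemaCheckSketch {

  // Sets a schema on the streaming reader and then calls textFile.
  // Per the diff above, textFile is expected to reject a user-specified schema.
  def attemptWithSchema(spark: SparkSession, dir: String): Try[Dataset[String]] = {
    val schema = new StructType().add("value", StringType)
    Try(spark.readStream.schema(schema).textFile(dir))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("textFile-schema-check")
      .getOrCreate()

    // Assumed scratch directory; any existing directory would do.
    val dir = Files.createTempDirectory("textfile-schema").toString

    attemptWithSchema(spark, dir) match {
      case Failure(e: AnalysisException) => println(s"rejected: ${e.getMessage}")
      case other                         => println(s"unexpected outcome: $other")
    }
    spark.stop()
  }
}
```

The exact message text may differ between Spark versions, so the sketch matches on the exception type rather than on the string being discussed here.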




[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14087#discussion_r69985100
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
    @@ -281,6 +281,31 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
       @Experimental
       def text(path: String): DataFrame = format("text").load(path)
     
    +  /**
    +   * Loads text files and returns a [[Dataset]] of String. The underlying schema of the Dataset
    +   * contains a single string column named "value".
    +   *
    +   * If the directory structure of the text files contains partitioning information, those are
    +   * ignored in the resulting Dataset. To include partitioning information as columns, use `text`.
    +   *
    +   * Each line in the text files is a new element in the resulting Dataset. For example:
    +   * {{{
    +   *   // Scala:
    +   *   spark.read.textFile("/path/to/spark/README.md")
    +   *
    +   *   // Java:
    +   *   spark.read().textFile("/path/to/spark/README.md")
    +   * }}}
    +   *
    +   * @param path input path
    +   * @since 2.0.0
    +   */
    +  def textFile(path: String): Dataset[String] = {
    +    if (userSpecifiedSchema.nonEmpty) {
    +      throw new AnalysisException("User specified schema not supported with `textFile`")
    +    }
    +    text(path).select("value").as[String](sparkSession.implicits.newStringEncoder)
    --- End diff --
    
    I'm surprised that `sparkSession.implicits.newStringEncoder` has to be passed explicitly here. Why is `sparkSession.implicits._` not imported instead?
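
For readers unfamiliar with the two styles being compared, here is a minimal sketch of passing an encoder explicitly versus resolving it through an implicits import. It uses a batch DataFrame and `Encoders.STRING` as stand-ins (both are assumptions for illustration; the diff itself passes `sparkSession.implicits.newStringEncoder`), and the object and method names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Encoders, SparkSession}

// Hypothetical sketch object; not part of the pull request.
object ExplicitEncoderSketch {

  // Style 1: hand the encoder to `as` explicitly, so no implicits import is needed.
  def explicitly(df: DataFrame): Dataset[String] =
    df.as[String](Encoders.STRING)

  // Style 2: import the session's implicits and let the compiler supply the encoder.
  def implicitly(spark: SparkSession, df: DataFrame): Dataset[String] = {
    import spark.implicits._
    df.as[String]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("encoder-sketch")
      .getOrCreate()

    // A single string column named "value", mirroring the schema described in the doc comment.
    val df = spark.range(3).selectExpr("cast(id as string) as value")

    val a = explicitly(df).collect().toSeq.sorted
    val b = implicitly(spark, df).collect().toSeq.sorted
    println(s"same contents: ${a == b}")

    spark.stop()
  }
}
```

Both forms should yield the same Dataset contents; the difference is only in how the `Encoder[String]` reaches `as`.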




[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

Posted by ScrapCodes <gi...@git.apache.org>.
Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14087#discussion_r70043723
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
    @@ -281,6 +281,31 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
       @Experimental
       def text(path: String): DataFrame = format("text").load(path)
     
    +  /**
    +   * Loads text files and returns a [[Dataset]] of String. The underlying schema of the Dataset
    --- End diff --
    
    No, actually it should be "text files".




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66498/
    Test PASSed.




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66261/
    Test PASSed.




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    **[Test build #61976 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61976/consoleFull)** for PR 14087 at commit [`0c76ef9`](https://github.com/apache/spark/commit/0c76ef9d5c7f9218c82cea2cdcc5da50b58c16d6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

Posted by ScrapCodes <gi...@git.apache.org>.
Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14087#discussion_r82371339
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
    @@ -378,6 +378,24 @@ class FileStreamSourceSuite extends FileStreamSourceTest {
         }
       }
     
    +  test("read from textfile") {
    +    withTempDirs { case (src, tmp) =>
    +      val textStream = spark.readStream.textFile(src.getCanonicalPath)
    +      val filtered = textStream.filter($"value" contains "keep")
    --- End diff --
    
    updated it. :)




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by ScrapCodes <gi...@git.apache.org>.
Github user ScrapCodes commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    @tdas Ping!




[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14087#discussion_r82293680
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
    @@ -311,6 +311,37 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
       @Experimental
       def text(path: String): DataFrame = format("text").load(path)
     
    +  /**
    +   * Loads text file(s) and returns a [[Dataset]] of String. The underlying schema of the Dataset
    --- End diff --
    
    It might be weird to add varargs, since the streaming case would always be to watch a directory (not list a bunch of files). I think it's fine to leave it out for now.
    
    This is existing, but it's a little odd that the methods in this file talk about `loading files` rather than `watching directories of files and processing them as they appear`.
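
The "watch a directory" usage described above can be sketched as follows, loosely modelled on the `read from textfile` test quoted elsewhere in this thread. It assumes a Spark release that includes `DataStreamReader.textFile`; the object name, the `kept_lines` query name, the file name, and the temporary directory are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical sketch object; not part of the pull request.
object WatchTextDirectorySketch {

  // Streams every line of every file found in `watched`, keeps the lines
  // containing "keep", and returns what has been processed so far.
  def keptLines(spark: SparkSession, watched: Path): Seq[String] = {
    import spark.implicits._

    // The argument is a directory to watch, not a list of individual files.
    val lines: Dataset[String] = spark.readStream.textFile(watched.toString)
    val kept = lines.filter($"value".contains("keep"))

    // The in-memory sink exposes the results as a table named after the query.
    val query = kept.writeStream
      .format("memory")
      .queryName("kept_lines")
      .outputMode("append")
      .start()

    try {
      query.processAllAvailable() // block until the files present so far are consumed
      spark.table("kept_lines").as[String].collect().toSeq
    } finally {
      query.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("watch-text-directory")
      .getOrCreate()

    // Assumed scratch directory with one file written before the query starts.
    val watched = Files.createTempDirectory("watched-text")
    Files.write(
      watched.resolve("batch-1.txt"),
      "drop1\nkeep2\nkeep3\n".getBytes(StandardCharsets.UTF_8))

    keptLines(spark, watched).foreach(println)
    spark.stop()
  }
}
```

In a long-running job the query would normally be left running with `awaitTermination()` so that files arriving later are processed as they appear; the sketch stops it early only to stay self-contained.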




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    **[Test build #61976 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61976/consoleFull)** for PR 14087 at commit [`0c76ef9`](https://github.com/apache/spark/commit/0c76ef9d5c7f9218c82cea2cdcc5da50b58c16d6).




[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14087#discussion_r69985212
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
    @@ -281,6 +281,31 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
       @Experimental
       def text(path: String): DataFrame = format("text").load(path)
     
    +  /**
    +   * Loads text files and returns a [[Dataset]] of String. The underlying schema of the Dataset
    +   * contains a single string column named "value".
    +   *
    +   * If the directory structure of the text files contains partitioning information, those are
    +   * ignored in the resulting Dataset. To include partitioning information as columns, use `text`.
    +   *
    +   * Each line in the text files is a new element in the resulting Dataset. For example:
    +   * {{{
    +   *   // Scala:
    +   *   spark.read.textFile("/path/to/spark/README.md")
    +   *
    +   *   // Java:
    +   *   spark.read().textFile("/path/to/spark/README.md")
    --- End diff --
    
    s/read/readStream?




[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

Posted by ScrapCodes <gi...@git.apache.org>.
Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14087#discussion_r70024651
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
    @@ -281,6 +281,31 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
       @Experimental
       def text(path: String): DataFrame = format("text").load(path)
     
    +  /**
    +   * Loads text files and returns a [[Dataset]] of String. The underlying schema of the Dataset
    +   * contains a single string column named "value".
    +   *
    +   * If the directory structure of the text files contains partitioning information, those are
    +   * ignored in the resulting Dataset. To include partitioning information as columns, use `text`.
    +   *
    +   * Each line in the text files is a new element in the resulting Dataset. For example:
    --- End diff --
    
    This is okay, I think. Not sure what others think.




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66376/
    Test PASSed.




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    **[Test build #66303 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66303/consoleFull)** for PR 14087 at commit [`25dfd09`](https://github.com/apache/spark/commit/25dfd09e194734f5d257041296c29dd79de81d1c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

Posted by ScrapCodes <gi...@git.apache.org>.
Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14087#discussion_r70045502
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
    @@ -281,6 +281,31 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
       @Experimental
       def text(path: String): DataFrame = format("text").load(path)
     
    +  /**
    +   * Loads text files and returns a [[Dataset]] of String. The underlying schema of the Dataset
    +   * contains a single string column named "value".
    +   *
    +   * If the directory structure of the text files contains partitioning information, those are
    +   * ignored in the resulting Dataset. To include partitioning information as columns, use `text`.
    +   *
    +   * Each line in the text files is a new element in the resulting Dataset. For example:
    +   * {{{
    +   *   // Scala:
    +   *   spark.read.textFile("/path/to/spark/README.md")
    +   *
    +   *   // Java:
    +   *   spark.read().textFile("/path/to/spark/README.md")
    +   * }}}
    +   *
    +   * @param path input path
    +   * @since 2.0.0
    +   */
    +  def textFile(path: String): Dataset[String] = {
    +    if (userSpecifiedSchema.nonEmpty) {
    +      throw new AnalysisException("User specified schema not supported with `textFile`")
    +    }
    +    text(path).select("value").as[String](sparkSession.implicits.newStringEncoder)
    --- End diff --
    
    If I remember correctly, in the Spark codebase we prefer to pass the implicit explicitly rather than rely on an import.
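
The preference described above can be illustrated without Spark. The following is a minimal sketch, assuming a hypothetical `Codec` type class that stands in for Spark's `Encoder`; `Codec`, `Codecs`, and `ExplicitImplicitSketch` are illustrative names and are not part of Spark. It contrasts resolving the instance through an import with naming it at the call site, as the `textFile` implementation does with `sparkSession.implicits.newStringEncoder`.

```scala
// Hypothetical type class standing in for Spark's Encoder; not Spark API.
trait Codec[A] { def name: String }

object Codecs {
  implicit val stringCodec: Codec[String] = new Codec[String] { val name = "string" }
}

object ExplicitImplicitSketch {
  // Like Dataset.as[U], this method receives its type-class instance implicitly.
  def as[A](raw: Seq[Any])(implicit codec: Codec[A]): String =
    s"${raw.size} rows decoded with ${codec.name} codec"

  // Implicit resolution: an import is needed to bring the instance into scope.
  def viaImport: String = {
    import Codecs._
    as[String](Seq("a", "b"))
  }

  // Explicit passing: the instance is named at the call site, so no import is needed
  // and a reader can see exactly which instance is used.
  def viaExplicit: String = as[String](Seq("a", "b"))(Codecs.stringCodec)
}
```

Both calls produce the same result; the explicit form only makes the chosen instance visible at the call site.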




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66303/
    Test PASSed.




[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14087#discussion_r71025786
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
    @@ -331,6 +331,24 @@ class FileStreamSourceSuite extends FileStreamSourceTest {
         }
       }
     
    +  test("read from textfile") {
    +    withTempDirs { case (src, tmp) =>
    +      val textStream = spark.readStream.textFile(src.getCanonicalPath)
    +      val filtered = textStream.filter($"value" contains "keep")
    +
    +      testStream(filtered)(
    +        AddTextFileData("drop1\nkeep2\nkeep3", src, tmp),
    +        CheckAnswer("keep2", "keep3"),
    +        StopStream,
    +        AddTextFileData("drop4\nkeep5\nkeep6", src, tmp),
    +        StartStream(),
    --- End diff --
    
    Stopping takes no parameters; when starting, you might choose to change the trigger interval.
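
The difference between `StopStream` and `StartStream()` comes down to a Scala idiom: an action with no parameters can be a singleton object, while an action with optional parameters is a case class with default arguments. The sketch below uses simplified stand-ins under that assumption; the `triggerIntervalMs` parameter and the `describe` helper are hypothetical and do not reproduce the definitions in Spark's test harness.

```scala
// Simplified, hypothetical stand-ins for the test-harness actions discussed above.
sealed trait StreamAction

// Stopping needs no configuration, so a singleton object suffices: `StopStream`.
case object StopStream extends StreamAction

// Starting may be configured; the default argument is what makes `StartStream()` valid.
final case class StartStream(triggerIntervalMs: Long = 0L) extends StreamAction

object StreamActionSketch {
  def describe(action: StreamAction): String = action match {
    case StopStream      => "stop"
    case StartStream(ms) => s"start (trigger every $ms ms)"
  }
}
```

With these definitions, `StartStream()` uses the default interval and `StartStream(500L)` overrides it, whereas `StopStream` is written without parentheses because it is a value, not a constructor call.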




[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/14087




[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14087#discussion_r81610590
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
    @@ -21,13 +21,13 @@ import scala.collection.JavaConverters._
     
     import org.apache.spark.annotation.Experimental
     import org.apache.spark.internal.Logging
    -import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
    +import org.apache.spark.sql.{AnalysisException, DataFrame, Dataset, SparkSession}
     import org.apache.spark.sql.execution.datasources.DataSource
     import org.apache.spark.sql.execution.streaming.StreamingRelation
     import org.apache.spark.sql.types.StructType
     
     /**
    - * Interface used to load a streaming [[Dataset]] from external storage systems (e.g. file systems,
    + * Class used to load a streaming [[Dataset]] from external storage systems (e.g. file systems,
    --- End diff --
    
    This change seems unrelated and takes us out of sync with the batch version.  I don't think "interface" here means a JVM interface, but rather the `interface` in API.




[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14087#discussion_r69985379
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
    @@ -331,6 +331,24 @@ class FileStreamSourceSuite extends FileStreamSourceTest {
         }
       }
     
    +  test("read from textfile") {
    +    withTempDirs { case (src, tmp) =>
    +      val textStream = spark.readStream.textFile(src.getCanonicalPath)
    +      val filtered = textStream.filter($"value" contains "keep")
    +
    +      testStream(filtered)(
    +        AddTextFileData("drop1\nkeep2\nkeep3", src, tmp),
    +        CheckAnswer("keep2", "keep3"),
    +        StopStream,
    +        AddTextFileData("drop4\nkeep5\nkeep6", src, tmp),
    +        StartStream(),
    --- End diff --
    
    Just wondering why `()` are here while not for `StopStream`?




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    /cc @tdas 




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    **[Test build #66498 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66498/consoleFull)** for PR 14087 at commit [`867394d`](https://github.com/apache/spark/commit/867394d4ca3c8bed58fab9efa3ecaead41f2099e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

Posted by ScrapCodes <gi...@git.apache.org>.
Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14087#discussion_r70044030
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
    @@ -331,6 +331,24 @@ class FileStreamSourceSuite extends FileStreamSourceTest {
         }
       }
     
    +  test("read from textfile") {
    +    withTempDirs { case (src, tmp) =>
    +      val textStream = spark.readStream.textFile(src.getCanonicalPath)
    +      val filtered = textStream.filter($"value" contains "keep")
    +
    +      testStream(filtered)(
    +        AddTextFileData("drop1\nkeep2\nkeep3", src, tmp),
    +        CheckAnswer("keep2", "keep3"),
    +        StopStream,
    +        AddTextFileData("drop4\nkeep5\nkeep6", src, tmp),
    +        StartStream(),
    --- End diff --
    
    This is a fair question, but it was a choice already made by others.




[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14087#discussion_r69984805
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
    @@ -281,6 +281,31 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
       @Experimental
       def text(path: String): DataFrame = format("text").load(path)
     
    +  /**
    +   * Loads text files and returns a [[Dataset]] of String. The underlying schema of the Dataset
    +   * contains a single string column named "value".
    +   *
    +   * If the directory structure of the text files contains partitioning information, those are
    +   * ignored in the resulting Dataset. To include partitioning information as columns, use `text`.
    +   *
    +   * Each line in the text files is a new element in the resulting Dataset. For example:
    +   * {{{
    +   *   // Scala:
    +   *   spark.read.textFile("/path/to/spark/README.md")
    +   *
    +   *   // Java:
    +   *   spark.read().textFile("/path/to/spark/README.md")
    +   * }}}
    +   *
    +   * @param path input path
    +   * @since 2.0.0
    +   */
    +  def textFile(path: String): Dataset[String] = {
    +    if (userSpecifiedSchema.nonEmpty) {
    +      throw new AnalysisException("User specified schema not supported with `textFile`")
    --- End diff --
    
    user-specified




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    **[Test build #66376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66376/consoleFull)** for PR 14087 at commit [`ecdf653`](https://github.com/apache/spark/commit/ecdf6539c8c19da3f019601309993fde634d6c22).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    **[Test build #66498 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66498/consoleFull)** for PR 14087 at commit [`867394d`](https://github.com/apache/spark/commit/867394d4ca3c8bed58fab9efa3ecaead41f2099e).




[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14087#discussion_r69984584
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
    @@ -281,6 +281,31 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
       @Experimental
       def text(path: String): DataFrame = format("text").load(path)
     
    +  /**
    +   * Loads text files and returns a [[Dataset]] of String. The underlying schema of the Dataset
    --- End diff --
    
    a text file?




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    **[Test build #66261 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66261/consoleFull)** for PR 14087 at commit [`1c43ffa`](https://github.com/apache/spark/commit/1c43ffaee570a83b6253f58f3a4c4f67823ac5f5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61908/
    Test PASSed.




[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

Posted by ScrapCodes <gi...@git.apache.org>.
Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14087#discussion_r70043361
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
    @@ -281,6 +281,31 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
       @Experimental
       def text(path: String): DataFrame = format("text").load(path)
     
    +  /**
    +   * Loads text files and returns a [[Dataset]] of String. The underlying schema of the Dataset
    +   * contains a single string column named "value".
    +   *
    +   * If the directory structure of the text files contains partitioning information, those are
    +   * ignored in the resulting Dataset. To include partitioning information as columns, use `text`.
    +   *
    +   * Each line in the text files is a new element in the resulting Dataset. For example:
    +   * {{{
    +   *   // Scala:
    +   *   spark.read.textFile("/path/to/spark/README.md")
    +   *
    +   *   // Java:
    +   *   spark.read().textFile("/path/to/spark/README.md")
    --- End diff --
    
    Correct!




[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14087
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14087#discussion_r82293331
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
    @@ -378,6 +378,24 @@ class FileStreamSourceSuite extends FileStreamSourceTest {
         }
       }
     
    +  test("read from textfile") {
    +    withTempDirs { case (src, tmp) =>
    +      val textStream = spark.readStream.textFile(src.getCanonicalPath)
    +      val filtered = textStream.filter($"value" contains "keep")
    --- End diff --
    
    One last comment.  I'd use the typed API here since that is the whole point of `textFile` vs `text`.
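
A typed version of the filter might look like the sketch below. It assumes Spark SQL is on the classpath and that `spark.readStream.textFile` is available as proposed in this pull request; `TypedTextFileSketch`, `keepLine`, and `inputDir` are hypothetical names chosen for illustration, and this is not the code that was merged.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object TypedTextFileSketch {
  // Plain predicate on String; the typed filter below applies it to each line.
  def keepLine(line: String): Boolean = line.contains("keep")

  // `inputDir` is a hypothetical directory that the stream watches for new files.
  def keptLines(spark: SparkSession, inputDir: String): Dataset[String] = {
    val lines: Dataset[String] = spark.readStream.textFile(inputDir)
    // Typed filter: a Scala function on String, rather than a Column expression
    // such as $"value" contains "keep".
    lines.filter(line => keepLine(line))
  }
}
```

Because `textFile` returns a `Dataset[String]`, the predicate operates on each line directly and is checked by the compiler, which is the distinction from `text` that the comment above points to.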

