Posted to reviews@spark.apache.org by zsxwing <gi...@git.apache.org> on 2016/04/08 01:31:41 UTC

[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

GitHub user zsxwing opened a pull request:

    https://github.com/apache/spark/pull/12247

    [SPARK-14474][SQL]Move FileSource offset log into checkpointLocation

    ## What changes were proposed in this pull request?
    
    Now that we have a single location for storing checkpointed state, this PR just propagates the checkpoint location into FileStreamSource so that we don't have one random log off on its own.
    
    ## How was this patch tested?
    
    test("metadataPath should be in checkpointLocation")

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark file-source-log-location

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12247.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12247
    
----
commit d161f3adb978dc4ed519eb3318731ac05c247f5b
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2016-04-07T22:27:12Z

    Move FileSource offset log into checkpointLocation

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-207152941
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-208593200
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55537/
    Test PASSed.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-208576061
  
    **[Test build #55539 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55539/consoleFull)** for PR 12247 at commit [`a761692`](https://github.com/apache/spark/commit/a761692ed8eb752989fd03f6ec4a0d71a11880d8).




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-207216132
  
    **[Test build #55308 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55308/consoleFull)** for PR 12247 at commit [`d161f3a`](https://github.com/apache/spark/commit/d161f3adb978dc4ed519eb3318731ac05c247f5b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12247#discussion_r59297156
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
    @@ -129,8 +129,17 @@ trait SchemaRelationProvider {
      * Implemented by objects that can produce a streaming [[Source]] for a specific format or system.
      */
     trait StreamSourceProvider {
    +
    +  /** Returns the name and schema of the source that can be used to continually read data. */
    +  def sourceSchema(
    +      sqlContext: SQLContext,
    +      schema: Option[StructType],
    +      providerName: String,
    +      parameters: Map[String, String]): (String, StructType)
    +
       def createSource(
           sqlContext: SQLContext,
    +      sourceId: Long,
    --- End diff --
    
    I thought the goal was to have all the data in the same location.  With this API everyone needs to duplicate the checkpoint location resolution logic.
    
    Note that if you want a unique identifier the path also qualifies.
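
    
A minimal sketch of the idea in this comment, assuming the engine (rather than each provider) resolves one metadata directory per source under the query's checkpoint location. The object name `MetadataPaths`, the method `forSources`, and the `sources/<index>` layout are hypothetical and are not taken from the patch; the point is only that a resolved path is already unique per source, so no separate `sourceId` needs to be passed.

```scala
// Hypothetical helper, not Spark's actual API: the engine resolves the
// location once, so providers never repeat the resolution logic.
object MetadataPaths {
  // Assumed layout for illustration: <checkpointLocation>/sources/<index>
  def forSources(checkpointLocation: String, numSources: Int): Seq[String] = {
    // Drop a trailing slash so the joined path has no empty segment.
    val root = checkpointLocation.stripSuffix("/")
    (0 until numSources).map(i => s"$root/sources/$i")
  }
}
```

Each source would then receive its own path as a plain string and could write its offset log there without knowing how the checkpoint location was chosen.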




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-208651028
  
    **[Test build #55548 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55548/consoleFull)** for PR 12247 at commit [`4cb1608`](https://github.com/apache/spark/commit/4cb16085590de943aea9972274f7f2d114125653).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12247#discussion_r59419309
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
    @@ -129,8 +129,17 @@ trait SchemaRelationProvider {
      * Implemented by objects that can produce a streaming [[Source]] for a specific format or system.
      */
     trait StreamSourceProvider {
    +
    +  /** Returns the name and schema of the source that can be used to continually read data. */
    +  def sourceSchema(
    +      sqlContext: SQLContext,
    +      schema: Option[StructType],
    +      providerName: String,
    +      parameters: Map[String, String]): (String, StructType)
    +
       def createSource(
           sqlContext: SQLContext,
    +      metadataPath: String,
    --- End diff --
    
    This is called `metadataPath` to avoid confusion with `checkpointLocation`, since they are not the same path.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-208607307
  
    **[Test build #55539 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55539/consoleFull)** for PR 12247 at commit [`a761692`](https://github.com/apache/spark/commit/a761692ed8eb752989fd03f6ec4a0d71a11880d8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-207216360
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12247#discussion_r59297483
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
    @@ -129,8 +129,17 @@ trait SchemaRelationProvider {
      * Implemented by objects that can produce a streaming [[Source]] for a specific format or system.
      */
     trait StreamSourceProvider {
    +
    +  /** Returns the name and schema of the source that can be used to continually read data. */
    +  def sourceSchema(
    +      sqlContext: SQLContext,
    +      schema: Option[StructType],
    +      providerName: String,
    +      parameters: Map[String, String]): (String, StructType)
    +
       def createSource(
           sqlContext: SQLContext,
    +      sourceId: Long,
    --- End diff --
    
    Makes sense. I will update it.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12247#discussion_r59254285
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
    @@ -341,6 +347,33 @@ class FileStreamSourceSuite extends FileStreamSourceTest with SharedSQLContext {
         Utils.deleteRecursively(tmp)
       }
     
    +  test("metadataPath should be in checkpointLocation") {
    --- End diff --
    
    Could we just test this in DataFrameReaderWriterSuite?  This seems kind of integration heavy.  It would be good to test that multiple sources get different ids too.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-208616091
  
    **[Test build #55548 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55548/consoleFull)** for PR 12247 at commit [`4cb1608`](https://github.com/apache/spark/commit/4cb16085590de943aea9972274f7f2d114125653).




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-208593199
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-208607918
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12247




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12247#discussion_r59282240
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
    @@ -341,6 +347,33 @@ class FileStreamSourceSuite extends FileStreamSourceTest with SharedSQLContext {
         Utils.deleteRecursively(tmp)
       }
     
    +  test("metadataPath should be in checkpointLocation") {
    --- End diff --
    
    What are you really testing here?  That it's not just blindly ignoring the parameter that is passed to it?  Given the amount of reflection you are adding here, it seems likely that the cost of maintaining this test outweighs its utility.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-208651263
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55548/
    Test PASSed.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-208607923
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55539/
    Test PASSed.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12247#discussion_r59255827
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -123,8 +123,16 @@ case class DataSource(
         }
       }
     
    -  /** Returns a source that can be used to continually read data. */
    -  def createSource(): Source = {
    +  /**
    +   * Returns a source that can be used to continually read data.
    +   *
    +   * Before running a real query (e.g., df.explain), `sourceId` and `checkpointLocation` is None
    +   * as they are unknown. [[ContinuousQueryManager]] should set `sourceId` and `checkpointLocation`
    +   * before starting a query.
    +   */
    +  def createSource(
    +      sourceId: Option[Long] = None,
    +      checkpointLocation: Option[String] = None): Source = {
    --- End diff --
    
    Yeah, and we also don't really need to create a source there (we only need to know the schema).  Perhaps getting the schema should be separated from getting the source (like we do in FileFormat).
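
    
A rough sketch of the separation suggested here, assuming a provider exposes a cheap schema lookup apart from source creation. `Schema`, `Source`, `SketchSourceProvider`, and `SingleColumnProvider` are stand-in names invented for illustration so the example runs without Spark; they only mimic the shape of the `sourceSchema` / `createSource` pair shown in the diffs in this thread and are not the real signatures.

```scala
// Stand-in types so the sketch compiles without Spark on the classpath.
// They are hypothetical and only mimic the shape of the API in the diff.
final case class Schema(fieldNames: Seq[String])

trait Source {
  def schema: Schema
}

trait SketchSourceProvider {
  /** Cheap call: resolves the source name and schema without any metadata path. */
  def sourceSchema(
      userSchema: Option[Schema],
      parameters: Map[String, String]): (String, Schema)

  /** Called only when a query starts, once the metadata path is known. */
  def createSource(
      metadataPath: String,
      userSchema: Option[Schema],
      parameters: Map[String, String]): Source
}

object SingleColumnProvider extends SketchSourceProvider {
  private val defaultSchema = Schema(Seq("value"))

  override def sourceSchema(
      userSchema: Option[Schema],
      parameters: Map[String, String]): (String, Schema) =
    ("single-column", userSchema.getOrElse(defaultSchema))

  override def createSource(
      metadataPath: String,
      userSchema: Option[Schema],
      parameters: Map[String, String]): Source = {
    // Reuse the schema resolution so both calls agree on the result.
    val resolved = sourceSchema(userSchema, parameters)._2
    new Source {
      override val schema: Schema = resolved
      override def toString: String = s"Source(metadataPath=$metadataPath)"
    }
  }
}
```

With this split, a reader-side call that only needs the schema never has to supply placeholder values for information that is known only when the query starts.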




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-207216362
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55308/
    Test PASSed.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-208591596
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55536/
    Test PASSed.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12247#discussion_r59303566
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
    @@ -341,6 +347,33 @@ class FileStreamSourceSuite extends FileStreamSourceTest with SharedSQLContext {
         Utils.deleteRecursively(tmp)
       }
     
    +  test("metadataPath should be in checkpointLocation") {
    --- End diff --
    
    I removed this test as now `metadataPath` is for all `Source`s.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-208559378
  
    Updated




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12247#discussion_r59295994
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
    @@ -129,8 +129,17 @@ trait SchemaRelationProvider {
      * Implemented by objects that can produce a streaming [[Source]] for a specific format or system.
      */
     trait StreamSourceProvider {
    +
    +  /** Returns the name and schema of the source that can be used to continually read data. */
    +  def sourceSchema(
    +      sqlContext: SQLContext,
    +      schema: Option[StructType],
    +      providerName: String,
    +      parameters: Map[String, String]): (String, StructType)
    +
       def createSource(
           sqlContext: SQLContext,
    +      sourceId: Long,
    --- End diff --
    
    Why are we passing the `sourceId` instead of the location?




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12247#discussion_r59280899
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
    @@ -341,6 +347,33 @@ class FileStreamSourceSuite extends FileStreamSourceTest with SharedSQLContext {
         Utils.deleteRecursively(tmp)
       }
     
    +  test("metadataPath should be in checkpointLocation") {
    --- End diff --
    
    `metadataPath` is only for `FileStreamSource`, so I think this test belongs in `FileStreamSourceSuite`.
    
    I added a test for source ids in `DataFrameReaderWriterSuite`.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-207195658
  
    retest this please




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-208592951
  
    **[Test build #55537 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55537/consoleFull)** for PR 12247 at commit [`7a818a9`](https://github.com/apache/spark/commit/7a818a9500b8f73abc8a3ef441093c3ae65e0cef).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12247#discussion_r59254687
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -123,8 +123,16 @@ case class DataSource(
         }
       }
     
    -  /** Returns a source that can be used to continually read data. */
    -  def createSource(): Source = {
    +  /**
    +   * Returns a source that can be used to continually read data.
    +   *
    +   * Before running a real query (e.g., df.explain), `sourceId` and `checkpointLocation` is None
    +   * as they are unknown. [[ContinuousQueryManager]] should set `sourceId` and `checkpointLocation`
    +   * before starting a query.
    +   */
    +  def createSource(
    +      sourceId: Option[Long] = None,
    +      checkpointLocation: Option[String] = None): Source = {
    --- End diff --
    
    `sourceId` and `checkpointLocation` are set via `DataFrameWriter`. When `createSource` is called in `DataFrameReader`, they are not known yet.
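
A minimal sketch of the optional-parameter shape described in this reply, not the Spark implementation. `SketchDataSource`, `SketchSource`, and the `sources/<id>` path layout are assumptions made for illustration; only the `createSource` parameter names come from the diff above.

```scala
// Hypothetical stand-in for a streaming source; not a Spark class.
final case class SketchSource(metadataPath: Option[String])

object SketchDataSource {
  // Both parameters default to None so a caller that does not know them yet
  // (the reader side, in the discussion above) can still obtain a source.
  def createSource(
      sourceId: Option[Long] = None,
      checkpointLocation: Option[String] = None): SketchSource = {
    // The metadata path is defined only when both values are present.
    // The "sources/<id>" layout is an assumption of this sketch.
    val metadataPath = for {
      location <- checkpointLocation
      id <- sourceId
    } yield s"$location/sources/$id"
    SketchSource(metadataPath)
  }
}
```

A reader-side call would be `SketchDataSource.createSource()`, and a writer-side call would pass both values, for example `SketchDataSource.createSource(Some(0L), Some("/tmp/cp"))`. The cost of this shape is that the source can exist without a metadata path, which is the concern raised elsewhere in this thread.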




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12247#discussion_r59256046
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala ---
    @@ -67,12 +62,33 @@ class FileStreamSource(
       }
     
       /**
    +   * Set the metadata path. This method should be called before using [[FileStreamSource]].
    +   */
    +  def setMetadataPath(metadataPath: String): Unit = {
    --- End diff --
    
    Sure, but if you find yourself hacking around the fact that we don't know some information at some point in the control flow, and it's making the implementation a lot more complicated, then we need to rethink the control flow.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12247#discussion_r59255253
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala ---
    @@ -67,12 +62,33 @@ class FileStreamSource(
       }
     
       /**
    +   * Set the metadata path. This method should be called before using [[FileStreamSource]].
    +   */
    +  def setMetadataPath(metadataPath: String): Unit = {
    --- End diff --
    
    > I'd really prefer to avoid the pattern of having an initialization that is separate from the constructor.
    
    Same as above. We don't know `metadataPath` when `DataSource.createSource` is called in `DataFrameReader`.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-207152790
  
    **[Test build #55270 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55270/consoleFull)** for PR 12247 at commit [`d161f3a`](https://github.com/apache/spark/commit/d161f3adb978dc4ed519eb3318731ac05c247f5b).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-209024772
  
    Thanks, merging to master.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-208651259
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-208559400
  
    **[Test build #55537 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55537/consoleFull)** for PR 12247 at commit [`7a818a9`](https://github.com/apache/spark/commit/7a818a9500b8f73abc8a3ef441093c3ae65e0cef).




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-207516450
  
    cc @marmbrus 




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-207136914
  
    **[Test build #55270 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55270/consoleFull)** for PR 12247 at commit [`d161f3a`](https://github.com/apache/spark/commit/d161f3adb978dc4ed519eb3318731ac05c247f5b).




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-208591339
  
    **[Test build #55536 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55536/consoleFull)** for PR 12247 at commit [`61fe406`](https://github.com/apache/spark/commit/61fe40674dfa1a3b1dc9f586b54d5a9993a1d67e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12247#discussion_r59253837
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -123,8 +123,16 @@ case class DataSource(
         }
       }
     
    -  /** Returns a source that can be used to continually read data. */
    -  def createSource(): Source = {
    +  /**
    +   * Returns a source that can be used to continually read data.
    +   *
    +   * Before running a real query (e.g., df.explain), `sourceId` and `checkpointLocation` is None
    +   * as they are unknown. [[ContinuousQueryManager]] should set `sourceId` and `checkpointLocation`
    +   * before starting a query.
    +   */
    +  def createSource(
    +      sourceId: Option[Long] = None,
    +      checkpointLocation: Option[String] = None): Source = {
    --- End diff --
    
    Why are these optional?




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12247#discussion_r59253976
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/MemorySinkSuite.scala ---
    @@ -59,7 +59,7 @@ class MemorySinkSuite extends StreamTest with SharedSQLContext {
       }
     
       test("error if attempting to resume specific checkpoint") {
    -    val location = Utils.createTempDir("steaming.checkpoint").getCanonicalPath
    +    val location = Utils.createTempDir(namePrefix = "steaming.checkpoint").getCanonicalPath
    --- End diff --
    
    Why this change?




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-208591595
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12247#discussion_r59254961
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/MemorySinkSuite.scala ---
    @@ -59,7 +59,7 @@ class MemorySinkSuite extends StreamTest with SharedSQLContext {
       }
     
       test("error if attempting to resume specific checkpoint") {
    -    val location = Utils.createTempDir("steaming.checkpoint").getCanonicalPath
    +    val location = Utils.createTempDir(namePrefix = "steaming.checkpoint").getCanonicalPath
    --- End diff --
    
    > Why this change?
    
    To avoid creating `steaming.checkpoint` in the sql folder. Without this change, I have to clean my repo after running this test.
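
A small sketch of why the named argument matters, assuming (as this reply implies) that the helper's first positional parameter is the parent directory rather than the name prefix. `TempDirSketch.createTempDir` is a hypothetical stand-in written for this example; it is not `Utils.createTempDir`.

```scala
import java.io.File
import java.nio.file.Files

object TempDirSketch {
  // Hypothetical helper: `root` is the parent directory, `namePrefix` only
  // affects the name of the directory created under it.
  def createTempDir(
      root: String = System.getProperty("java.io.tmpdir"),
      namePrefix: String = "spark"): File = {
    val parent = new File(root)
    parent.mkdirs() // create the parent if it does not exist yet
    Files.createTempDirectory(parent.toPath, namePrefix + "-").toFile
  }
}

// Named argument: the directory is created under the system temp dir.
val checkpointDir = TempDirSketch.createTempDir(namePrefix = "steaming.checkpoint")
```

With a positional call such as `TempDirSketch.createTempDir("steaming.checkpoint")`, the string would bind to `root`, so a `steaming.checkpoint` directory would be created relative to the current working directory, which matches the stray folder described above.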




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-207196573
  
    **[Test build #55308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55308/consoleFull)** for PR 12247 at commit [`d161f3a`](https://github.com/apache/spark/commit/d161f3adb978dc4ed519eb3318731ac05c247f5b).




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-208558013
  
    **[Test build #55536 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55536/consoleFull)** for PR 12247 at commit [`61fe406`](https://github.com/apache/spark/commit/61fe40674dfa1a3b1dc9f586b54d5a9993a1d67e).




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12247#issuecomment-207152943
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55270/
    Test FAILed.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12247#discussion_r59296806
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
    @@ -129,8 +129,17 @@ trait SchemaRelationProvider {
      * Implemented by objects that can produce a streaming [[Source]] for a specific format or system.
      */
     trait StreamSourceProvider {
    +
    +  /** Returns the name and schema of the source that can be used to continually read data. */
    +  def sourceSchema(
    +      sqlContext: SQLContext,
    +      schema: Option[StructType],
    +      providerName: String,
    +      parameters: Map[String, String]): (String, StructType)
    +
       def createSource(
           sqlContext: SQLContext,
    +      sourceId: Long,
    --- End diff --
    
    I think some sources may not need a location. Instead, they just need an id to distinguish them.




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12247#discussion_r59253946
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala ---
    @@ -67,12 +62,33 @@ class FileStreamSource(
       }
     
       /**
    +   * Set the metadata path. This method should be called before using [[FileStreamSource]].
    +   */
    +  def setMetadataPath(metadataPath: String): Unit = {
    --- End diff --
    
    I'd really prefer to avoid the pattern of having an initialization that is separate from the constructor.
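
To make the objection concrete, here is a hedged sketch contrasting the two patterns. Both classes are hypothetical and written only for this comparison; neither is the actual `FileStreamSource`.

```scala
// Setter style: the object can be constructed in an unusable state, so every
// use of the path needs a runtime check.
class SetterStyleSource {
  private var metadataPath: Option[String] = None

  def setMetadataPath(path: String): Unit = {
    metadataPath = Some(path)
  }

  def logDir: String =
    metadataPath.getOrElse(
      throw new IllegalStateException("setMetadataPath must be called first"))
}

// Constructor style: the path is required up front, so an instance without a
// path cannot exist and no runtime check is needed.
class ConstructorStyleSource(val metadataPath: String) {
  def logDir: String = metadataPath
}
```

The constructor style moves the "is the path known yet?" question to the call site that builds the source, which is why it may require creating the source later in the control flow.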




[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12247#discussion_r59286840
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
    @@ -341,6 +347,33 @@ class FileStreamSourceSuite extends FileStreamSourceTest with SharedSQLContext {
         Utils.deleteRecursively(tmp)
       }
     
    +  test("metadataPath should be in checkpointLocation") {
    --- End diff --
    
    I want to check the FileStreamSource.metadataPath value. Let me just make it public to avoid the reflection.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org