You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/09/20 09:52:17 UTC

[GitHub] spark pull request #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options sh...

GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/22489

    [SPARK-25425][SQL][BACKPORT-2.3] Extra options should override session options in DataSource V2

    ## What changes were proposed in this pull request?
    
    In the PR, I propose overriding session options by extra options in DataSource V2. Extra options are more specific and set via `.option()`, and should overwrite more generic session options. 
    
    ## How was this patch tested?
    
    Added tests for read and write paths.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 session-options-2.3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22489.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22489
    
----
commit 0cc174160bf04bc97de62c438d571b27224c53a0
Author: Maxim Gekk <ma...@...>
Date:   2018-09-16T00:24:11Z

    [SPARK-25425][SQL] Extra options should override session options in DataSource V2
    
    In the PR, I propose overriding session options by extra options in DataSource V2. Extra options are more specific and set via `.option()`, and should overwrite more generic session options. Entries from seconds map overwrites entries with the same key from the first map, for example:
    ```Scala
    scala> Map("option" -> false) ++ Map("option" -> true)
    res0: scala.collection.immutable.Map[String,Boolean] = Map(option -> true)
    ```
    
    Added a test for checking which option is propagated to a data source in `load()`.
    
    Closes #22413 from MaxGekk/session-options.
    
    Lead-authored-by: Maxim Gekk <ma...@databricks.com>
    Co-authored-by: Dongjoon Hyun <do...@apache.org>
    Co-authored-by: Maxim Gekk <ma...@gmail.com>
    Signed-off-by: Dongjoon Hyun <do...@apache.org>

commit 744c5b8b0c8fbc7caa6d03902ed323df5c73986b
Author: Maxim Gekk <ma...@...>
Date:   2018-09-19T19:40:25Z

    Adding missing import after merge to 2.4

commit c87a6de4a1de82514fad590c68f35fbfaf54d825
Author: Maxim Gekk <ma...@...>
Date:   2018-09-19T20:20:21Z

    Fix merge

commit f8b5aa6b3d1fc9e953bcfef8f23aeb6f3ab7818d
Author: Maxim Gekk <ma...@...>
Date:   2018-09-20T09:47:57Z

    Adding options to Reader

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22489
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96592/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22489
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22489
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options sh...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk closed the pull request at:

    https://github.com/apache/spark/pull/22489


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22489
  
    **[Test build #96592 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96592/testReport)** for PR 22489 at commit [`f8b5aa6`](https://github.com/apache/spark/commit/f8b5aa6b3d1fc9e953bcfef8f23aeb6f3ab7818d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class SimpleDataSourceV2Reader(options: DataSourceOptions) extends DataSourceReader `
      * `class SimpleDataSourceV2 extends DataSourceV2 with ReadSupport `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22489
  
    **[Test build #96341 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96341/testReport)** for PR 22489 at commit [`f8b5aa6`](https://github.com/apache/spark/commit/f8b5aa6b3d1fc9e953bcfef8f23aeb6f3ab7818d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22489
  
    **[Test build #96341 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96341/testReport)** for PR 22489 at commit [`f8b5aa6`](https://github.com/apache/spark/commit/f8b5aa6b3d1fc9e953bcfef8f23aeb6f3ab7818d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class SimpleDataSourceV2Reader(options: DataSourceOptions) extends DataSourceReader `
      * `class SimpleDataSourceV2 extends DataSourceV2 with ReadSupport `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22489
  
    Retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22489
  
    @dongjoon-hyun Please, take a look at this PR. This is the backport of https://github.com/apache/spark/pull/22413 to Spark 2.3.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22489
  
    I've considered this for 2.3.3 since 2.3.2 RC6 vote was already started. For now, I'm waiting the result of vote.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22489
  
    +1, LGTM. Merged to `branch-2.3`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22489
  
    **[Test build #96592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96592/testReport)** for PR 22489 at commit [`f8b5aa6`](https://github.com/apache/spark/commit/f8b5aa6b3d1fc9e953bcfef8f23aeb6f3ab7818d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22489
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96341/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22489
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22489
  
    @dongjoon-hyun @gatorsmile Can the fix be included into the upcoming minor release of 2.3?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22489
  
    Thank you, @MaxGekk . Since this lands on `branch-2.3`, could you close this PR?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22489
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options sh...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22489#discussion_r220439848
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala ---
    @@ -315,19 +316,52 @@ class DataSourceV2Suite extends QueryTest with SharedSQLContext {
         checkCanonicalizedOutput(df, 2)
         checkCanonicalizedOutput(df.select('i), 1)
       }
    -}
    -
    -class SimpleDataSourceV2 extends DataSourceV2 with ReadSupport {
     
    -  class Reader extends DataSourceReader {
    -    override def readSchema(): StructType = new StructType().add("i", "int").add("j", "int")
    +  test("SPARK-25425: extra options should override sessions options during reading") {
    +    val prefix = "spark.datasource.userDefinedDataSource."
    +    val optionName = "optionA"
    +    withSQLConf(prefix + optionName -> "true") {
    +      val df = spark
    +        .read
    +        .option(optionName, false)
    +        .format(classOf[DataSourceV2WithSessionConfig].getName).load()
    +      val options = df.queryExecution.optimizedPlan.collectFirst {
    +        case DataSourceV2Relation(_, SimpleDataSourceV2Reader(options)) => options
    +      }
    +      assert(options.get.getBoolean(optionName, true) == false)
    +    }
    +  }
     
    -    override def createDataReaderFactories(): JList[DataReaderFactory[Row]] = {
    -      java.util.Arrays.asList(new SimpleDataReaderFactory(0, 5), new SimpleDataReaderFactory(5, 10))
    +  test("SPARK-25425: extra options should override sessions options during writing") {
    +    withTempPath { path =>
    +      val sessionPath = path.getCanonicalPath
    +      withSQLConf("spark.datasource.simpleWritableDataSource.path" -> sessionPath) {
    +        withTempPath { file =>
    +          val optionPath = file.getCanonicalPath
    +          val format = classOf[SimpleWritableDataSource].getName
    +
    +          val df = Seq((1L, 2L)).toDF("i", "j")
    +          df.write.format(format).option("path", optionPath).save()
    +          assert(!new File(sessionPath).exists)
    +          checkAnswer(spark.read.format(format).option("path", optionPath).load(), df)
    +        }
    +      }
         }
       }
    +}
     
    -  override def createReader(options: DataSourceOptions): DataSourceReader = new Reader
    +case class SimpleDataSourceV2Reader(options: DataSourceOptions) extends DataSourceReader {
    --- End diff --
    
    Thank you for this non-trivial backporting, @MaxGekk !


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org