You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/09/20 09:52:17 UTC
[GitHub] spark pull request #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options sh...
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/22489
[SPARK-25425][SQL][BACKPORT-2.3] Extra options should override session options in DataSource V2
## What changes were proposed in this pull request?
In the PR, I propose overriding session options by extra options in DataSource V2. Extra options are more specific and set via `.option()`, and should overwrite more generic session options.
## How was this patch tested?
Added tests for read and write paths.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 session-options-2.3
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22489.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22489
----
commit 0cc174160bf04bc97de62c438d571b27224c53a0
Author: Maxim Gekk <ma...@...>
Date: 2018-09-16T00:24:11Z
[SPARK-25425][SQL] Extra options should override session options in DataSource V2
In the PR, I propose overriding session options by extra options in DataSource V2. Extra options are more specific and set via `.option()`, and should overwrite more generic session options. Entries from seconds map overwrites entries with the same key from the first map, for example:
```Scala
scala> Map("option" -> false) ++ Map("option" -> true)
res0: scala.collection.immutable.Map[String,Boolean] = Map(option -> true)
```
Added a test for checking which option is propagated to a data source in `load()`.
Closes #22413 from MaxGekk/session-options.
Lead-authored-by: Maxim Gekk <ma...@databricks.com>
Co-authored-by: Dongjoon Hyun <do...@apache.org>
Co-authored-by: Maxim Gekk <ma...@gmail.com>
Signed-off-by: Dongjoon Hyun <do...@apache.org>
commit 744c5b8b0c8fbc7caa6d03902ed323df5c73986b
Author: Maxim Gekk <ma...@...>
Date: 2018-09-19T19:40:25Z
Adding missing import after merge to 2.4
commit c87a6de4a1de82514fad590c68f35fbfaf54d825
Author: Maxim Gekk <ma...@...>
Date: 2018-09-19T20:20:21Z
Fix merge
commit f8b5aa6b3d1fc9e953bcfef8f23aeb6f3ab7818d
Author: Maxim Gekk <ma...@...>
Date: 2018-09-20T09:47:57Z
Adding options to Reader
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22489
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96592/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22489
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22489
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options sh...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk closed the pull request at:
https://github.com/apache/spark/pull/22489
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22489
**[Test build #96592 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96592/testReport)** for PR 22489 at commit [`f8b5aa6`](https://github.com/apache/spark/commit/f8b5aa6b3d1fc9e953bcfef8f23aeb6f3ab7818d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `case class SimpleDataSourceV2Reader(options: DataSourceOptions) extends DataSourceReader `
* `class SimpleDataSourceV2 extends DataSourceV2 with ReadSupport `
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22489
**[Test build #96341 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96341/testReport)** for PR 22489 at commit [`f8b5aa6`](https://github.com/apache/spark/commit/f8b5aa6b3d1fc9e953bcfef8f23aeb6f3ab7818d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22489
**[Test build #96341 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96341/testReport)** for PR 22489 at commit [`f8b5aa6`](https://github.com/apache/spark/commit/f8b5aa6b3d1fc9e953bcfef8f23aeb6f3ab7818d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `case class SimpleDataSourceV2Reader(options: DataSourceOptions) extends DataSourceReader `
* `class SimpleDataSourceV2 extends DataSourceV2 with ReadSupport `
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22489
Retest this please.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/22489
@dongjoon-hyun Please, take a look at this PR. This is the backport of https://github.com/apache/spark/pull/22413 to Spark 2.3.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22489
I've considered this for 2.3.3 since 2.3.2 RC6 vote was already started. For now, I'm waiting the result of vote.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22489
+1, LGTM. Merged to `branch-2.3`.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22489
**[Test build #96592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96592/testReport)** for PR 22489 at commit [`f8b5aa6`](https://github.com/apache/spark/commit/f8b5aa6b3d1fc9e953bcfef8f23aeb6f3ab7818d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22489
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96341/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22489
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/22489
@dongjoon-hyun @gatorsmile Can the fix be included into the upcoming minor release of 2.3?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22489
Thank you, @MaxGekk . Since this lands on `branch-2.3`, could you close this PR?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options should ov...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22489
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22489: [SPARK-25425][SQL][BACKPORT-2.3] Extra options sh...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22489#discussion_r220439848
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala ---
@@ -315,19 +316,52 @@ class DataSourceV2Suite extends QueryTest with SharedSQLContext {
checkCanonicalizedOutput(df, 2)
checkCanonicalizedOutput(df.select('i), 1)
}
-}
-
-class SimpleDataSourceV2 extends DataSourceV2 with ReadSupport {
- class Reader extends DataSourceReader {
- override def readSchema(): StructType = new StructType().add("i", "int").add("j", "int")
+ test("SPARK-25425: extra options should override sessions options during reading") {
+ val prefix = "spark.datasource.userDefinedDataSource."
+ val optionName = "optionA"
+ withSQLConf(prefix + optionName -> "true") {
+ val df = spark
+ .read
+ .option(optionName, false)
+ .format(classOf[DataSourceV2WithSessionConfig].getName).load()
+ val options = df.queryExecution.optimizedPlan.collectFirst {
+ case DataSourceV2Relation(_, SimpleDataSourceV2Reader(options)) => options
+ }
+ assert(options.get.getBoolean(optionName, true) == false)
+ }
+ }
- override def createDataReaderFactories(): JList[DataReaderFactory[Row]] = {
- java.util.Arrays.asList(new SimpleDataReaderFactory(0, 5), new SimpleDataReaderFactory(5, 10))
+ test("SPARK-25425: extra options should override sessions options during writing") {
+ withTempPath { path =>
+ val sessionPath = path.getCanonicalPath
+ withSQLConf("spark.datasource.simpleWritableDataSource.path" -> sessionPath) {
+ withTempPath { file =>
+ val optionPath = file.getCanonicalPath
+ val format = classOf[SimpleWritableDataSource].getName
+
+ val df = Seq((1L, 2L)).toDF("i", "j")
+ df.write.format(format).option("path", optionPath).save()
+ assert(!new File(sessionPath).exists)
+ checkAnswer(spark.read.format(format).option("path", optionPath).load(), df)
+ }
+ }
}
}
+}
- override def createReader(options: DataSourceOptions): DataSourceReader = new Reader
+case class SimpleDataSourceV2Reader(options: DataSourceOptions) extends DataSourceReader {
--- End diff --
Thank you for this non-trivial backporting, @MaxGekk !
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org