You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/12/09 08:41:11 UTC
[GitHub] spark pull request #23266: [SPARK-26313][SQL] move read related methods from...
GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/23266
[SPARK-26313][SQL] move read related methods from Table to read related mix-in traits
## What changes were proposed in this pull request?
As discussed in https://github.com/apache/spark/pull/23208/files#r239684490 , we should put read related methods in read related mix-in traits like `SupportsBatchRead`, to support write-only table.
In the `Append` operator, we should skip schema validation if the table is write-only. This will be done when we finish the write API refactor in https://github.com/apache/spark/pull/23208/
## How was this patch tested?
existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark ds-read
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/23266.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #23266
----
commit da520cc6e7daa36c7d7fabdb0a08d4b4341250b9
Author: Wenchen Fan <we...@...>
Date: 2018-12-09T08:32:51Z
move read related methods from Table to read related mix-in traits
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23266: [SPARK-26313][SQL] move read related methods from Table ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23266
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99886/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23266: [SPARK-26313][SQL] move read related methods from...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/23266#discussion_r240029373
--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java ---
@@ -20,14 +20,27 @@
import org.apache.spark.annotation.Evolving;
import org.apache.spark.sql.sources.v2.reader.Scan;
import org.apache.spark.sql.sources.v2.reader.ScanBuilder;
+import org.apache.spark.sql.types.StructType;
/**
- * An empty mix-in interface for {@link Table}, to indicate this table supports batch scan.
- * <p>
- * If a {@link Table} implements this interface, its {@link Table#newScanBuilder(DataSourceOptions)}
- * must return a {@link ScanBuilder} that builds {@link Scan} with {@link Scan#toBatch()}
- * implemented.
- * </p>
+ * A mix-in interface for {@link Table} to provide data reading ability of batch processing.
*/
@Evolving
-public interface SupportsBatchRead extends Table { }
+public interface SupportsBatchRead extends Table {
+
+ /**
+ * Returns the schema of this table.
+ */
+ StructType schema();
--- End diff --
I'm not sure about this. Maybe it's ok to leave `schema` in `Table`, and asks write-only table to report schema as empty.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23266: [SPARK-26313][SQL] move read related methods from Table ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23266
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5902/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23266: [SPARK-26313][SQL] move read related methods from Table ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23266
**[Test build #99886 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99886/testReport)** for PR 23266 at commit [`da520cc`](https://github.com/apache/spark/commit/da520cc6e7daa36c7d7fabdb0a08d4b4341250b9).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23266: [SPARK-26313][SQL] move read related methods from...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/23266#discussion_r240074711
--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java ---
@@ -20,14 +20,27 @@
import org.apache.spark.annotation.Evolving;
import org.apache.spark.sql.sources.v2.reader.Scan;
import org.apache.spark.sql.sources.v2.reader.ScanBuilder;
+import org.apache.spark.sql.types.StructType;
/**
- * An empty mix-in interface for {@link Table}, to indicate this table supports batch scan.
- * <p>
- * If a {@link Table} implements this interface, its {@link Table#newScanBuilder(DataSourceOptions)}
- * must return a {@link ScanBuilder} that builds {@link Scan} with {@link Scan#toBatch()}
- * implemented.
- * </p>
+ * A mix-in interface for {@link Table} to provide data reading ability of batch processing.
*/
@Evolving
-public interface SupportsBatchRead extends Table { }
+public interface SupportsBatchRead extends Table {
+
+ /**
+ * Returns the schema of this table.
+ */
+ StructType schema();
--- End diff --
To me, +1 for the current change.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23266: [SPARK-26313][SQL] move read related methods from Table ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23266
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23266: [SPARK-26313][SQL] move read related methods from Table ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23266
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23266: [SPARK-26313][SQL] move read related methods from Table ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23266
**[Test build #99886 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99886/testReport)** for PR 23266 at commit [`da520cc`](https://github.com/apache/spark/commit/da520cc6e7daa36c7d7fabdb0a08d4b4341250b9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23266: [SPARK-26313][SQL] move read related methods from...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/23266#discussion_r240053406
--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java ---
@@ -18,9 +18,6 @@
package org.apache.spark.sql.sources.v2;
import org.apache.spark.annotation.Evolving;
-import org.apache.spark.sql.sources.v2.reader.Scan;
-import org.apache.spark.sql.sources.v2.reader.ScanBuilder;
-import org.apache.spark.sql.types.StructType;
/**
* An interface representing a logical structured data set of a data source. For example, the
--- End diff --
According to this update, maybe `a logical structured data set` -> `a logical named data set`?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23266: [SPARK-26313][SQL] move read related methods from...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/23266#discussion_r240073831
--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java ---
@@ -20,14 +20,27 @@
import org.apache.spark.annotation.Evolving;
import org.apache.spark.sql.sources.v2.reader.Scan;
import org.apache.spark.sql.sources.v2.reader.ScanBuilder;
+import org.apache.spark.sql.types.StructType;
/**
- * An empty mix-in interface for {@link Table}, to indicate this table supports batch scan.
- * <p>
- * If a {@link Table} implements this interface, its {@link Table#newScanBuilder(DataSourceOptions)}
- * must return a {@link ScanBuilder} that builds {@link Scan} with {@link Scan#toBatch()}
- * implemented.
- * </p>
+ * A mix-in interface for {@link Table} to provide data reading ability of batch processing.
*/
@Evolving
-public interface SupportsBatchRead extends Table { }
+public interface SupportsBatchRead extends Table {
+
+ /**
+ * Returns the schema of this table.
+ */
+ StructType schema();
+
+ /**
+ * Returns a {@link ScanBuilder} which can be used to build a {@link Scan} later. The built
+ * {@link Scan} must implement {@link Scan#toBatch()}. Spark will call this method for each data
+ * scanning query.
+ * <p>
+ * The builder can take some query specific information to do operators pushdown, and keep these
+ * information in the created {@link Scan}.
+ * </p>
+ */
+ ScanBuilder newScanBuilder(DataSourceOptions options);
--- End diff --
+1
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23266: [SPARK-26313][SQL] move read related methods from Table ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/23266
cc @rdblue @HyukjinKwon @gatorsmile
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org