You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/12/09 08:41:11 UTC

[GitHub] spark pull request #23266: [SPARK-26313][SQL] move read related methods from...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/23266

    [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits

    ## What changes were proposed in this pull request?
    
    As discussed in https://github.com/apache/spark/pull/23208/files#r239684490 , we should put read related methods in read related mix-in traits like `SupportsBatchRead`, to support write-only table.
    
    In the `Append` operator, we should skip schema validation if the table is write-only. This will be done when we finish the write API refactor in https://github.com/apache/spark/pull/23208/
    
    ## How was this patch tested?
    
    existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark ds-read

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23266.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23266
    
----
commit da520cc6e7daa36c7d7fabdb0a08d4b4341250b9
Author: Wenchen Fan <we...@...>
Date:   2018-12-09T08:32:51Z

    move read related methods from Table to read related mix-in traits

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23266: [SPARK-26313][SQL] move read related methods from Table ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23266
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99886/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23266: [SPARK-26313][SQL] move read related methods from...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23266#discussion_r240029373
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java ---
    @@ -20,14 +20,27 @@
     import org.apache.spark.annotation.Evolving;
     import org.apache.spark.sql.sources.v2.reader.Scan;
     import org.apache.spark.sql.sources.v2.reader.ScanBuilder;
    +import org.apache.spark.sql.types.StructType;
     
     /**
    - * An empty mix-in interface for {@link Table}, to indicate this table supports batch scan.
    - * <p>
    - * If a {@link Table} implements this interface, its {@link Table#newScanBuilder(DataSourceOptions)}
    - * must return a {@link ScanBuilder} that builds {@link Scan} with {@link Scan#toBatch()}
    - * implemented.
    - * </p>
    + * A mix-in interface for {@link Table} to provide data reading ability of batch processing.
      */
     @Evolving
    -public interface SupportsBatchRead extends Table { }
    +public interface SupportsBatchRead extends Table {
    +
    +  /**
    +   * Returns the schema of this table.
    +   */
    +  StructType schema();
    --- End diff --
    
    I'm not sure about this. Maybe it's ok to leave `schema` in `Table`, and asks write-only table to report schema as empty.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23266: [SPARK-26313][SQL] move read related methods from Table ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23266
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5902/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23266: [SPARK-26313][SQL] move read related methods from Table ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23266
  
    **[Test build #99886 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99886/testReport)** for PR 23266 at commit [`da520cc`](https://github.com/apache/spark/commit/da520cc6e7daa36c7d7fabdb0a08d4b4341250b9).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23266: [SPARK-26313][SQL] move read related methods from...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23266#discussion_r240074711
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java ---
    @@ -20,14 +20,27 @@
     import org.apache.spark.annotation.Evolving;
     import org.apache.spark.sql.sources.v2.reader.Scan;
     import org.apache.spark.sql.sources.v2.reader.ScanBuilder;
    +import org.apache.spark.sql.types.StructType;
     
     /**
    - * An empty mix-in interface for {@link Table}, to indicate this table supports batch scan.
    - * <p>
    - * If a {@link Table} implements this interface, its {@link Table#newScanBuilder(DataSourceOptions)}
    - * must return a {@link ScanBuilder} that builds {@link Scan} with {@link Scan#toBatch()}
    - * implemented.
    - * </p>
    + * A mix-in interface for {@link Table} to provide data reading ability of batch processing.
      */
     @Evolving
    -public interface SupportsBatchRead extends Table { }
    +public interface SupportsBatchRead extends Table {
    +
    +  /**
    +   * Returns the schema of this table.
    +   */
    +  StructType schema();
    --- End diff --
    
    To me, +1 for the current change.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23266: [SPARK-26313][SQL] move read related methods from Table ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23266
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23266: [SPARK-26313][SQL] move read related methods from Table ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23266
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23266: [SPARK-26313][SQL] move read related methods from Table ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23266
  
    **[Test build #99886 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99886/testReport)** for PR 23266 at commit [`da520cc`](https://github.com/apache/spark/commit/da520cc6e7daa36c7d7fabdb0a08d4b4341250b9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23266: [SPARK-26313][SQL] move read related methods from...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23266#discussion_r240053406
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java ---
    @@ -18,9 +18,6 @@
     package org.apache.spark.sql.sources.v2;
     
     import org.apache.spark.annotation.Evolving;
    -import org.apache.spark.sql.sources.v2.reader.Scan;
    -import org.apache.spark.sql.sources.v2.reader.ScanBuilder;
    -import org.apache.spark.sql.types.StructType;
     
     /**
      * An interface representing a logical structured data set of a data source. For example, the
    --- End diff --
    
    According to this update, maybe `a logical structured data set` -> `a logical named data set`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23266: [SPARK-26313][SQL] move read related methods from...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23266#discussion_r240073831
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java ---
    @@ -20,14 +20,27 @@
     import org.apache.spark.annotation.Evolving;
     import org.apache.spark.sql.sources.v2.reader.Scan;
     import org.apache.spark.sql.sources.v2.reader.ScanBuilder;
    +import org.apache.spark.sql.types.StructType;
     
     /**
    - * An empty mix-in interface for {@link Table}, to indicate this table supports batch scan.
    - * <p>
    - * If a {@link Table} implements this interface, its {@link Table#newScanBuilder(DataSourceOptions)}
    - * must return a {@link ScanBuilder} that builds {@link Scan} with {@link Scan#toBatch()}
    - * implemented.
    - * </p>
    + * A mix-in interface for {@link Table} to provide data reading ability of batch processing.
      */
     @Evolving
    -public interface SupportsBatchRead extends Table { }
    +public interface SupportsBatchRead extends Table {
    +
    +  /**
    +   * Returns the schema of this table.
    +   */
    +  StructType schema();
    +
    +  /**
    +   * Returns a {@link ScanBuilder} which can be used to build a {@link Scan} later. The built
    +   * {@link Scan} must implement {@link Scan#toBatch()}. Spark will call this method for each data
    +   * scanning query.
    +   * <p>
    +   * The builder can take some query specific information to do operators pushdown, and keep these
    +   * information in the created {@link Scan}.
    +   * </p>
    +   */
    +  ScanBuilder newScanBuilder(DataSourceOptions options);
    --- End diff --
    
    +1


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23266: [SPARK-26313][SQL] move read related methods from Table ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23266
  
    cc @rdblue @HyukjinKwon @gatorsmile 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org