Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/01/04 15:23:01 UTC

[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/20153

    [SPARK-22392][SQL] data source v2 columnar batch reader

    ## What changes were proposed in this pull request?
    
    This PR adds a new Data Source V2 mix-in interface that allows a data source to return `ColumnarBatch`es during the scan.
    
    ## How was this patch tested?
    
    New tests, including a Java batch data source (`JavaBatchDataSourceV2`) that exercises the columnar scan path.
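
    For illustration only, here is a minimal Scala sketch of a reader that opts into the new mix-in, based on the `SupportsScanColumnarBatch` interface quoted in the review comments below. The class name, schema, and fallback condition are hypothetical; a real source would return actual `ReadTask[ColumnarBatch]` instances:

    ```scala
    import java.util.{Collections, List => JList}

    import org.apache.spark.sql.sources.v2.reader.{ReadTask, SupportsScanColumnarBatch}
    import org.apache.spark.sql.types.{IntegerType, StructType}
    import org.apache.spark.sql.vectorized.ColumnarBatch

    class ExampleColumnarReader extends SupportsScanColumnarBatch {

      override def readSchema(): StructType = new StructType().add("i", IntegerType)

      // A real implementation returns tasks that produce ColumnarBatches; the task
      // internals are source specific, so this sketch returns an empty list.
      override def createBatchReadTasks(): JList[ReadTask[ColumnarBatch]] =
        Collections.emptyList[ReadTask[ColumnarBatch]]()

      // The "safety door": return false (and also override createReadTasks) to fall
      // back to the row-based scan, e.g. when a column type is not supported.
      override def enableBatchRead(): Boolean =
        readSchema().fields.forall(_.dataType == IntegerType)
    }
    ```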

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark columnar-reader

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20153.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20153
    
----

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20153#discussion_r161265463
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala ---
    @@ -17,21 +17,24 @@
     
     package org.apache.spark.sql.execution
     
    -import org.apache.spark.sql.catalyst.expressions.UnsafeRow
    +import org.apache.spark.sql.catalyst.expressions.{BoundReference, UnsafeRow}
     import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
     import org.apache.spark.sql.execution.metric.SQLMetrics
     import org.apache.spark.sql.types.DataType
     import org.apache.spark.sql.vectorized.{ColumnarBatch, ColumnVector}
     
     
     /**
    - * Helper trait for abstracting scan functionality using
    - * [[ColumnarBatch]]es.
    + * Helper trait for abstracting scan functionality using [[ColumnarBatch]]es.
      */
     private[sql] trait ColumnarBatchScan extends CodegenSupport {
     
       def vectorTypes: Option[Seq[String]] = None
     
    +  protected def supportsBatch: Boolean = true
    --- End diff --
    
    Add a comment to explain `supportsBatch`?
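
    For example, the kind of Scaladoc being asked for might read roughly like this (a suggestion only, not necessarily the wording that ended up in the patch):

    ```scala
    /**
     * Whether this scan produces [[ColumnarBatch]]es directly. When false, the generated
     * code reads the input one row at a time instead (see `produceRows`).
     */
    protected def supportsBatch: Boolean = true
    ```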


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    **[Test build #86054 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86054/testReport)** for PR 20153 at commit [`4a6a725`](https://github.com/apache/spark/commit/4a6a725acffdc24f7c00302c1a0081c93f6acdd8).


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86054/
    Test FAILed.


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    I will look at this tomorrow.


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    **[Test build #85683 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85683/testReport)** for PR 20153 at commit [`df89a83`](https://github.com/apache/spark/commit/df89a833fb3db3726f45e8a5982b0006f231fd98).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class DataSourceRDDPartition[T : ClassTag](val index: Int, val readTask: ReadTask[T])`
      * `class DataSourceRDD[T: ClassTag](`


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85968/
    Test PASSed.


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    also cc @rxin


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    **[Test build #86160 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86160/testReport)** for PR 20153 at commit [`d666110`](https://github.com/apache/spark/commit/d6661104f314c88ff84057fd4830e7a5fbe964d9).


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86080/
    Test PASSed.


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    **[Test build #86080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86080/testReport)** for PR 20153 at commit [`d666110`](https://github.com/apache/spark/commit/d6661104f314c88ff84057fd4830e7a5fbe964d9).


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    Thanks! Merged to master/2.3


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    **[Test build #85968 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85968/testReport)** for PR 20153 at commit [`b8a700d`](https://github.com/apache/spark/commit/b8a700d87d3708bae34054a00ad5d489280e5852).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20153#discussion_r160477447
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsScanColumnarBatch.java ---
    @@ -0,0 +1,51 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.reader;
    +
    +import java.util.List;
    +
    +import org.apache.spark.annotation.InterfaceStability;
    +import org.apache.spark.sql.Row;
    +import org.apache.spark.sql.vectorized.ColumnarBatch;
    +
    +/**
    + * A mix-in interface for {@link DataSourceV2Reader}. Data source readers can implement this
    + * interface to output {@link ColumnarBatch} and make the scan faster.
    + */
    +@InterfaceStability.Evolving
    +public interface SupportsScanColumnarBatch extends DataSourceV2Reader {
    +  @Override
    +  default List<ReadTask<Row>> createReadTasks() {
    +    throw new IllegalStateException(
    +      "createReadTasks should not be called with SupportsScanColumnarBatch.");
    +  }
    +
    +  /**
    +   * Similar to {@link DataSourceV2Reader#createReadTasks()}, but returns columnar data in batches.
    +   */
    +  List<ReadTask<ColumnarBatch>> createBatchReadTasks();
    +
    +  /**
    +   * A safety door for columnar batch reader. It's possible that the implementation can only support
    +   * some certain columns with certain types. Users can overwrite this method and
    +   * {@link #createReadTasks()} to fallback to normal read path under some conditions.
    +   */
    +  default boolean enableBatchRead() {
    --- End diff --
    
    `enableColumnarRead()` or `enableColumnarBatchRead()`?


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    **[Test build #86160 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86160/testReport)** for PR 20153 at commit [`d666110`](https://github.com/apache/spark/commit/d6661104f314c88ff84057fd4830e7a5fbe964d9).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20153#discussion_r160746791
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsScanColumnarBatch.java ---
    @@ -0,0 +1,51 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.reader;
    +
    +import java.util.List;
    +
    +import org.apache.spark.annotation.InterfaceStability;
    +import org.apache.spark.sql.Row;
    +import org.apache.spark.sql.vectorized.ColumnarBatch;
    +
    +/**
    + * A mix-in interface for {@link DataSourceV2Reader}. Data source readers can implement this
    + * interface to output {@link ColumnarBatch} and make the scan faster.
    --- End diff --
    
    We need to explain the precedence of `SupportsScanColumnarBatch` and `SupportsScanUnsafeRow`.


---



[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20153#discussion_r161034623
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala ---
    @@ -37,40 +35,58 @@ import org.apache.spark.sql.types.StructType
      */
     case class DataSourceV2ScanExec(
         fullOutput: Seq[AttributeReference],
    -    @transient reader: DataSourceV2Reader) extends LeafExecNode with DataSourceReaderHolder {
    +    @transient reader: DataSourceV2Reader)
    +  extends LeafExecNode with DataSourceReaderHolder with ColumnarBatchScan {
     
       override def canEqual(other: Any): Boolean = other.isInstanceOf[DataSourceV2ScanExec]
     
    -  override def references: AttributeSet = AttributeSet.empty
    +  override def producedAttributes: AttributeSet = AttributeSet(fullOutput)
     
    -  override lazy val metrics = Map(
    -    "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"))
    +  private lazy val readTasks: java.util.List[ReadTask[UnsafeRow]] = reader match {
    +    case r: SupportsScanUnsafeRow => r.createUnsafeRowReadTasks()
    +    case _ =>
    +      reader.createReadTasks().asScala.map {
    +        new RowToUnsafeRowReadTask(_, reader.readSchema()): ReadTask[UnsafeRow]
    +      }.asJava
    +  }
     
    -  override protected def doExecute(): RDD[InternalRow] = {
    -    val readTasks: java.util.List[ReadTask[UnsafeRow]] = reader match {
    -      case r: SupportsScanUnsafeRow => r.createUnsafeRowReadTasks()
    -      case _ =>
    -        reader.createReadTasks().asScala.map {
    -          new RowToUnsafeRowReadTask(_, reader.readSchema()): ReadTask[UnsafeRow]
    -        }.asJava
    -    }
    +  private lazy val inputRDD: RDD[InternalRow] = reader match {
    +    case r: SupportsScanColumnarBatch if r.enableBatchRead() =>
    +      assert(!reader.isInstanceOf[ContinuousReader],
    +        "continuous stream reader does not support columnar read yet.")
    +      new DataSourceRDD(sparkContext, r.createBatchReadTasks()).asInstanceOf[RDD[InternalRow]]
    +
    +    case _ =>
    --- End diff --
    
    We can combine the nested case clauses with the outer match, like:
    ```
    reader match {
        case r: SupportsScanColumnarBatch if r.enableBatchRead() => ......
        case _: ContinuousReader => ......
        case _ => ......
    ```


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    retest this please


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    **[Test build #86054 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86054/testReport)** for PR 20153 at commit [`4a6a725`](https://github.com/apache/spark/commit/4a6a725acffdc24f7c00302c1a0081c93f6acdd8).
     * This patch **fails from timeout after a configured wait of `300m`**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85712/
    Test PASSed.


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    cc @gatorsmile @kiszk @jiangxb1987 @viirya 


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    retest this please


---



[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20153#discussion_r161362308
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsScanColumnarBatch.java ---
    @@ -0,0 +1,51 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.reader;
    +
    +import java.util.List;
    +
    +import org.apache.spark.annotation.InterfaceStability;
    +import org.apache.spark.sql.Row;
    +import org.apache.spark.sql.vectorized.ColumnarBatch;
    +
    +/**
    + * A mix-in interface for {@link DataSourceV2Reader}. Data source readers can implement this
    + * interface to output {@link ColumnarBatch} and make the scan faster.
    + */
    +@InterfaceStability.Evolving
    +public interface SupportsScanColumnarBatch extends DataSourceV2Reader {
    +  @Override
    +  default List<ReadTask<Row>> createReadTasks() {
    +    throw new IllegalStateException(
    +      "createReadTasks should not be called with SupportsScanColumnarBatch.");
    +  }
    +
    +  /**
    +   * Similar to {@link DataSourceV2Reader#createReadTasks()}, but returns columnar data in batches.
    +   */
    +  List<ReadTask<ColumnarBatch>> createBatchReadTasks();
    +
    +  /**
    +   * A safety door for columnar batch reader. It's possible that the implementation can only support
    +   * some certain columns with certain types. Users can overwrite this method and
    +   * {@link #createReadTasks()} to fallback to normal read path under some conditions.
    +   */
    +  default boolean enableBatchRead() {
    --- End diff --
    
    I see. It would be good to clarify it in the comment.
    For example, would this be accurate: `A safety door for [[ColumnarBatch]] reader.`?


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85683/
    Test FAILed.


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    LGTM


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    Actually it was not merged; let me merge it to master/2.3.


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    Is `ColumnarBatchScan` still an appropriate name? If `supportsBatch` is false, the trait handles a row-based scan, not a columnar or batch one.


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    **[Test build #85712 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85712/testReport)** for PR 20153 at commit [`a019886`](https://github.com/apache/spark/commit/a01988624d0cde682aa820e59c89019812c3ef73).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class DataSourceRDDPartition[T : ClassTag](val index: Int, val readTask: ReadTask[T])`
      * `class DataSourceRDD[T: ClassTag](`


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    **[Test build #85968 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85968/testReport)** for PR 20153 at commit [`b8a700d`](https://github.com/apache/spark/commit/b8a700d87d3708bae34054a00ad5d489280e5852).


---



[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20153#discussion_r160472490
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala ---
    @@ -137,4 +147,25 @@ private[sql] trait ColumnarBatchScan extends CodegenSupport {
          """.stripMargin
       }
     
    +  private def produceRows(ctx: CodegenContext, input: String): String = {
    +    val numOutputRows = metricTerm(ctx, "numOutputRows")
    +    val row = ctx.freshName("row")
    +
    +    ctx.INPUT_ROW = row
    +    ctx.currentVars = null
    +    // Always provide `outputVars`, so that the framework can help us build unsafe row if the input
    +    // row is not unsafe row, i.e. `needsUnsafeRowConversion` is true.
    +    val outputVars = output.zipWithIndex.map{ case (a, i) =>
    --- End diff --
    
    nit: `map {`


---



[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20153#discussion_r160958116
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsScanColumnarBatch.java ---
    @@ -0,0 +1,51 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.reader;
    +
    +import java.util.List;
    +
    +import org.apache.spark.annotation.InterfaceStability;
    +import org.apache.spark.sql.Row;
    +import org.apache.spark.sql.vectorized.ColumnarBatch;
    +
    +/**
    + * A mix-in interface for {@link DataSourceV2Reader}. Data source readers can implement this
    + * interface to output {@link ColumnarBatch} and make the scan faster.
    + */
    +@InterfaceStability.Evolving
    +public interface SupportsScanColumnarBatch extends DataSourceV2Reader {
    +  @Override
    +  default List<ReadTask<Row>> createReadTasks() {
    +    throw new IllegalStateException(
    +      "createReadTasks should not be called with SupportsScanColumnarBatch.");
    +  }
    +
    +  /**
    +   * Similar to {@link DataSourceV2Reader#createReadTasks()}, but returns columnar data in batches.
    +   */
    +  List<ReadTask<ColumnarBatch>> createBatchReadTasks();
    +
    +  /**
    +   * A safety door for columnar batch reader. It's possible that the implementation can only support
    +   * some certain columns with certain types. Users can overwrite this method and
    +   * {@link #createReadTasks()} to fallback to normal read path under some conditions.
    +   */
    +  default boolean enableBatchRead() {
    --- End diff --
    
    Yeah, you can interpret it that way (reading data from columnar storage vs. row storage), but we can also interpret it as reading a batch of records at a time vs. one record at a time.


---



[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20153#discussion_r161362565
  
    --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/sources/v2/JavaBatchDataSourceV2.java ---
    @@ -0,0 +1,112 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package test.org.apache.spark.sql.sources.v2;
    +
    +import java.io.IOException;
    +import java.util.List;
    +
    +import org.apache.spark.sql.execution.vectorized.OnHeapColumnVector;
    +import org.apache.spark.sql.sources.v2.DataSourceV2;
    +import org.apache.spark.sql.sources.v2.DataSourceV2Options;
    +import org.apache.spark.sql.sources.v2.ReadSupport;
    +import org.apache.spark.sql.sources.v2.reader.*;
    +import org.apache.spark.sql.types.DataTypes;
    +import org.apache.spark.sql.types.StructType;
    +import org.apache.spark.sql.vectorized.ColumnVector;
    +import org.apache.spark.sql.vectorized.ColumnarBatch;
    +
    +
    +public class JavaBatchDataSourceV2 implements DataSourceV2, ReadSupport {
    +
    +  class Reader implements DataSourceV2Reader, SupportsScanColumnarBatch {
    --- End diff --
    
    This is the convention. If we implement many mix-in interfaces, it's better to write
    ```
    MyReader extends DataSourceV2Reader, XXX, YYY, ZZZ ...
    ```


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    **[Test build #86080 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86080/testReport)** for PR 20153 at commit [`d666110`](https://github.com/apache/spark/commit/d6661104f314c88ff84057fd4830e7a5fbe964d9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20153#discussion_r161362201
  
    --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/sources/v2/JavaBatchDataSourceV2.java ---
    @@ -0,0 +1,112 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package test.org.apache.spark.sql.sources.v2;
    +
    +import java.io.IOException;
    +import java.util.List;
    +
    +import org.apache.spark.sql.execution.vectorized.OnHeapColumnVector;
    +import org.apache.spark.sql.sources.v2.DataSourceV2;
    +import org.apache.spark.sql.sources.v2.DataSourceV2Options;
    +import org.apache.spark.sql.sources.v2.ReadSupport;
    +import org.apache.spark.sql.sources.v2.reader.*;
    +import org.apache.spark.sql.types.DataTypes;
    +import org.apache.spark.sql.types.StructType;
    +import org.apache.spark.sql.vectorized.ColumnVector;
    +import org.apache.spark.sql.vectorized.ColumnarBatch;
    +
    +
    +public class JavaBatchDataSourceV2 implements DataSourceV2, ReadSupport {
    +
    +  class Reader implements DataSourceV2Reader, SupportsScanColumnarBatch {
    --- End diff --
    
    Doesn't `SupportsScanColumnarBatch` already extend `DataSourceV2Reader`?


---



[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20153#discussion_r160877601
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsScanColumnarBatch.java ---
    @@ -0,0 +1,51 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.reader;
    +
    +import java.util.List;
    +
    +import org.apache.spark.annotation.InterfaceStability;
    +import org.apache.spark.sql.Row;
    +import org.apache.spark.sql.vectorized.ColumnarBatch;
    +
    +/**
    + * A mix-in interface for {@link DataSourceV2Reader}. Data source readers can implement this
    + * interface to output {@link ColumnarBatch} and make the scan faster.
    + */
    +@InterfaceStability.Evolving
    +public interface SupportsScanColumnarBatch extends DataSourceV2Reader {
    +  @Override
    +  default List<ReadTask<Row>> createReadTasks() {
    +    throw new IllegalStateException(
    +      "createReadTasks should not be called with SupportsScanColumnarBatch.");
    --- End diff --
    
    `createReadTasks not supported by default within SupportsScanColumnarBatch.`, since we allow users to fall back to the normal read path.


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20153#discussion_r160764573
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsScanColumnarBatch.java ---
    @@ -0,0 +1,51 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.reader;
    +
    +import java.util.List;
    +
    +import org.apache.spark.annotation.InterfaceStability;
    +import org.apache.spark.sql.Row;
    +import org.apache.spark.sql.vectorized.ColumnarBatch;
    +
    +/**
    + * A mix-in interface for {@link DataSourceV2Reader}. Data source readers can implement this
    + * interface to output {@link ColumnarBatch} and make the scan faster.
    + */
    +@InterfaceStability.Evolving
    +public interface SupportsScanColumnarBatch extends DataSourceV2Reader {
    +  @Override
    +  default List<ReadTask<Row>> createReadTasks() {
    +    throw new IllegalStateException(
    +      "createReadTasks should not be called with SupportsScanColumnarBatch.");
    +  }
    +
    +  /**
    +   * Similar to {@link DataSourceV2Reader#createReadTasks()}, but returns columnar data in batches.
    +   */
    +  List<ReadTask<ColumnarBatch>> createBatchReadTasks();
    +
    +  /**
    +   * A safety door for columnar batch reader. It's possible that the implementation can only support
    +   * some certain columns with certain types. Users can overwrite this method and
    +   * {@link #createReadTasks()} to fallback to normal read path under some conditions.
    +   */
    +  default boolean enableBatchRead() {
    --- End diff --
    
    If it controls batch mode vs. non-batch mode, I agree.

    IIUC, this value indicates whether we read data from column-oriented storage (e.g. `ColumnVector`) or row-oriented storage (e.g. `UnsafeRow`). I feel that it is not really a batch mode.


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    **[Test build #86072 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86072/testReport)** for PR 20153 at commit [`4a6a725`](https://github.com/apache/spark/commit/4a6a725acffdc24f7c00302c1a0081c93f6acdd8).


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86072/
    Test PASSed.


---



[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20153#discussion_r160880016
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala ---
    @@ -37,40 +35,58 @@ import org.apache.spark.sql.types.StructType
      */
     case class DataSourceV2ScanExec(
         fullOutput: Seq[AttributeReference],
    -    @transient reader: DataSourceV2Reader) extends LeafExecNode with DataSourceReaderHolder {
    +    @transient reader: DataSourceV2Reader)
    +  extends LeafExecNode with DataSourceReaderHolder with ColumnarBatchScan {
     
       override def canEqual(other: Any): Boolean = other.isInstanceOf[DataSourceV2ScanExec]
     
    -  override def references: AttributeSet = AttributeSet.empty
    -
    -  override lazy val metrics = Map(
    -    "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"))
    -
    -  override protected def doExecute(): RDD[InternalRow] = {
    -    val readTasks: java.util.List[ReadTask[UnsafeRow]] = reader match {
    -      case r: SupportsScanUnsafeRow => r.createUnsafeRowReadTasks()
    -      case _ =>
    -        reader.createReadTasks().asScala.map {
    -          new RowToUnsafeRowReadTask(_, reader.readSchema()): ReadTask[UnsafeRow]
    -        }.asJava
    -    }
    +  override def producedAttributes: AttributeSet = AttributeSet(fullOutput)
    +
    +  private lazy val inputRDD: RDD[InternalRow] = reader match {
    +    case r: SupportsScanColumnarBatch if r.enableBatchRead() =>
    +      assert(!reader.isInstanceOf[ContinuousReader],
    +        "continuous stream reader does not support columnar read yet.")
    +      new DataSourceRDD(sparkContext, r.createBatchReadTasks()).asInstanceOf[RDD[InternalRow]]
    +
    +    case _ =>
    +      val readTasks: java.util.List[ReadTask[UnsafeRow]] = reader match {
    +        case r: SupportsScanUnsafeRow => r.createUnsafeRowReadTasks()
    +        case _ =>
    +          reader.createReadTasks().asScala.map {
    +            new RowToUnsafeRowReadTask(_, reader.readSchema()): ReadTask[UnsafeRow]
    +          }.asJava
    +      }
    +
    +      reader match {
    --- End diff --
    
    This looks a bit messy. Can we move `readTasks` out as a lazy val? Then we may have:
    ```
    private lazy val readTasks = ......
    private lazy val inputRDD: RDD[InternalRow] = reader match {
        case r: SupportsScanColumnarBatch if r.enableBatchRead() => ......
        case _: ContinuousReader => ......
        case _ => ......
    }
    ```
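
    A rough sketch of that restructuring inside `DataSourceV2ScanExec`, reusing the task-creation code from the diff above (the `ContinuousReader` branch is left as a placeholder since its body is not shown in this thread, and wrapping `readTasks` in a `DataSourceRDD` for the row path is an assumption that mirrors the batch path):

    ```scala
    private lazy val readTasks: java.util.List[ReadTask[UnsafeRow]] = reader match {
      case r: SupportsScanUnsafeRow => r.createUnsafeRowReadTasks()
      case _ =>
        reader.createReadTasks().asScala.map {
          new RowToUnsafeRowReadTask(_, reader.readSchema()): ReadTask[UnsafeRow]
        }.asJava
    }

    private lazy val inputRDD: RDD[InternalRow] = reader match {
      case r: SupportsScanColumnarBatch if r.enableBatchRead() =>
        new DataSourceRDD(sparkContext, r.createBatchReadTasks()).asInstanceOf[RDD[InternalRow]]
      case _: ContinuousReader =>
        ??? // continuous streaming read path, not shown in this thread
      case _ =>
        new DataSourceRDD(sparkContext, readTasks).asInstanceOf[RDD[InternalRow]]
    }
    ```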


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    **[Test build #85712 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85712/testReport)** for PR 20153 at commit [`a019886`](https://github.com/apache/spark/commit/a01988624d0cde682aa820e59c89019812c3ef73).


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20153#discussion_r161362205
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
    @@ -90,14 +92,56 @@ case class InMemoryTableScanExec(
         columnarBatch
       }
     
    -  override def inputRDDs(): Seq[RDD[InternalRow]] = {
    -    assert(supportCodegen)
    +  private lazy val inputRDD: RDD[InternalRow] = {
         val buffers = filteredCachedBatches()
    -    // HACK ALERT: This is actually an RDD[ColumnarBatch].
    -    // We're taking advantage of Scala's type erasure here to pass these batches along.
    -    Seq(buffers.map(createAndDecompressColumn(_)).asInstanceOf[RDD[InternalRow]])
    +    if (supportsBatch) {
    +      // HACK ALERT: This is actually an RDD[ColumnarBatch].
    +      // We're taking advantage of Scala's type erasure here to pass these batches along.
    +      buffers.map(createAndDecompressColumn).asInstanceOf[RDD[InternalRow]]
    +    } else {
    +      val numOutputRows = longMetric("numOutputRows")
    +
    +      if (enableAccumulators) {
    --- End diff --
    
    +1


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20153#discussion_r159678205
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
    @@ -230,43 +274,10 @@ case class InMemoryTableScanExec(
       }
     
       protected override def doExecute(): RDD[InternalRow] = {
    -    val numOutputRows = longMetric("numOutputRows")
    --- End diff --
    
    This is moved to `inputRDD`


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    **[Test build #86072 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86072/testReport)** for PR 20153 at commit [`4a6a725`](https://github.com/apache/spark/commit/4a6a725acffdc24f7c00302c1a0081c93f6acdd8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86160/
    Test FAILed.


---



[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20153#discussion_r161361844
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsScanColumnarBatch.java ---
    @@ -0,0 +1,51 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.reader;
    +
    +import java.util.List;
    +
    +import org.apache.spark.annotation.InterfaceStability;
    +import org.apache.spark.sql.Row;
    +import org.apache.spark.sql.vectorized.ColumnarBatch;
    +
    +/**
    + * A mix-in interface for {@link DataSourceV2Reader}. Data source readers can implement this
    + * interface to output {@link ColumnarBatch} and make the scan faster.
    + */
    +@InterfaceStability.Evolving
    +public interface SupportsScanColumnarBatch extends DataSourceV2Reader {
    +  @Override
    +  default List<ReadTask<Row>> createReadTasks() {
    +    throw new IllegalStateException(
    +      "createReadTasks not supported by default within SupportsScanColumnarBatch.");
    +  }
    +
    +  /**
    +   * Similar to {@link DataSourceV2Reader#createReadTasks()}, but returns columnar data in batches.
    +   */
    +  List<ReadTask<ColumnarBatch>> createBatchReadTasks();
    +
    +  /**
    +   * A safety door for columnar batch reader. It's possible that the implementation can only support
    +   * some certain columns with certain types. Users can overwrite this method and
    +   * {@link #createReadTasks()} to fallback to normal read path under some conditions.
    +   */
    +  default boolean enableBatchRead() {
    --- End diff --
    
    I feel it is hard to tell from the documentation whether this method is meant to enable batch reading, or to report whether this reader supports batch reading.
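
    For illustration, a minimal sketch of the fallback contract described above,
    written against the interfaces shown in this diff (the reader class and the
    type whitelist are assumptions, not part of this PR):

        import java.util.{List => JList}
        import scala.collection.JavaConverters._

        import org.apache.spark.sql.Row
        import org.apache.spark.sql.sources.v2.reader.{ReadTask, SupportsScanColumnarBatch}
        import org.apache.spark.sql.types._
        import org.apache.spark.sql.vectorized.ColumnarBatch

        // Hypothetical reader that can only vectorize a few primitive types.
        class ExampleReader(schema: StructType) extends SupportsScanColumnarBatch {
          private val vectorizable: Set[DataType] = Set(IntegerType, LongType, DoubleType)

          override def readSchema(): StructType = schema

          // Reports whether this particular scan can run in batch mode; the planner
          // checks it before deciding which create*ReadTasks method to call.
          override def enableBatchRead(): Boolean =
            schema.fields.forall(f => vectorizable.contains(f.dataType))

          // Batch path, used only when enableBatchRead() returns true. A real reader
          // would return one task per partition instead of an empty list.
          override def createBatchReadTasks(): JList[ReadTask[ColumnarBatch]] =
            Seq.empty[ReadTask[ColumnarBatch]].asJava

          // Row-based fallback, used only when enableBatchRead() returns false.
          override def createReadTasks(): JList[ReadTask[Row]] =
            Seq.empty[ReadTask[Row]].asJava
        }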


---



[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20153#discussion_r159678110
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ---
    @@ -346,33 +348,6 @@ case class FileSourceScanExec(
     
       override val nodeNamePrefix: String = "File"
     
    -  override protected def doProduce(ctx: CodegenContext): String = {
    --- End diff --
    
    This is moved to `ColumnarBatchScan.produceRows`


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    retest this please


---



[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20153#discussion_r160477594
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala ---
    @@ -17,21 +17,24 @@
     
     package org.apache.spark.sql.execution
     
    -import org.apache.spark.sql.catalyst.expressions.UnsafeRow
    +import org.apache.spark.sql.catalyst.expressions.{BoundReference, UnsafeRow}
     import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
     import org.apache.spark.sql.execution.metric.SQLMetrics
     import org.apache.spark.sql.types.DataType
     import org.apache.spark.sql.vectorized.{ColumnarBatch, ColumnVector}
     
     
     /**
    - * Helper trait for abstracting scan functionality using
    - * [[ColumnarBatch]]es.
    + * Helper trait for abstracting scan functionality using [[ColumnarBatch]]es.
      */
     private[sql] trait ColumnarBatchScan extends CodegenSupport {
     
       def vectorTypes: Option[Seq[String]] = None
     
    +  protected def supportsBatch: Boolean = true
    --- End diff --
    
    `supportColumnar()` or `supportColumnarBatch()`?


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    **[Test build #86042 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86042/testReport)** for PR 20153 at commit [`4a6a725`](https://github.com/apache/spark/commit/4a6a725acffdc24f7c00302c1a0081c93f6acdd8).


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    **[Test build #85683 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85683/testReport)** for PR 20153 at commit [`df89a83`](https://github.com/apache/spark/commit/df89a833fb3db3726f45e8a5982b0006f231fd98).


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20153#discussion_r161252949
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
    @@ -90,14 +92,56 @@ case class InMemoryTableScanExec(
         columnarBatch
       }
     
    -  override def inputRDDs(): Seq[RDD[InternalRow]] = {
    -    assert(supportCodegen)
    +  private lazy val inputRDD: RDD[InternalRow] = {
         val buffers = filteredCachedBatches()
    -    // HACK ALERT: This is actually an RDD[ColumnarBatch].
    -    // We're taking advantage of Scala's type erasure here to pass these batches along.
    -    Seq(buffers.map(createAndDecompressColumn(_)).asInstanceOf[RDD[InternalRow]])
    +    if (supportsBatch) {
    +      // HACK ALERT: This is actually an RDD[ColumnarBatch].
    +      // We're taking advantage of Scala's type erasure here to pass these batches along.
    +      buffers.map(createAndDecompressColumn).asInstanceOf[RDD[InternalRow]]
    +    } else {
    +      val numOutputRows = longMetric("numOutputRows")
    +
    +      if (enableAccumulators) {
    --- End diff --
    
    This conf is really confusing... Maybe rename it to `enableAccumulatorsForTestingOnly`?
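
    Side note on the HACK ALERT in the diff above: a standalone sketch (not code
    from this PR) of why that cast compiles and survives at runtime.

        import org.apache.spark.rdd.RDD
        import org.apache.spark.sql.catalyst.InternalRow
        import org.apache.spark.sql.vectorized.ColumnarBatch

        // JVM generics are erased, so an RDD[ColumnarBatch] can be smuggled through a
        // signature that expects RDD[InternalRow]; nothing fails until a consumer
        // treats an element as an InternalRow instead of casting it back to a batch.
        def smuggleBatches(batches: RDD[ColumnarBatch]): RDD[InternalRow] =
          batches.asInstanceOf[RDD[InternalRow]]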


---



[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20153#discussion_r160747322
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsScanColumnarBatch.java ---
    @@ -0,0 +1,51 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.reader;
    +
    +import java.util.List;
    +
    +import org.apache.spark.annotation.InterfaceStability;
    +import org.apache.spark.sql.Row;
    +import org.apache.spark.sql.vectorized.ColumnarBatch;
    +
    +/**
    + * A mix-in interface for {@link DataSourceV2Reader}. Data source readers can implement this
    + * interface to output {@link ColumnarBatch} and make the scan faster.
    + */
    +@InterfaceStability.Evolving
    +public interface SupportsScanColumnarBatch extends DataSourceV2Reader {
    +  @Override
    +  default List<ReadTask<Row>> createReadTasks() {
    +    throw new IllegalStateException(
    +      "createReadTasks should not be called with SupportsScanColumnarBatch.");
    +  }
    +
    +  /**
    +   * Similar to {@link DataSourceV2Reader#createReadTasks()}, but returns columnar data in batches.
    +   */
    +  List<ReadTask<ColumnarBatch>> createBatchReadTasks();
    +
    +  /**
    +   * A safety door for columnar batch reader. It's possible that the implementation can only support
    +   * some certain columns with certain types. Users can overwrite this method and
    +   * {@link #createReadTasks()} to fallback to normal read path under some conditions.
    +   */
    +  default boolean enableBatchRead() {
    --- End diff --
    
    This name is more general, and it looks fine to me. If we ever support another batch read mode, we can add an extra function to further identify which batch mode is in use.


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    **[Test build #86163 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86163/testReport)** for PR 20153 at commit [`d666110`](https://github.com/apache/spark/commit/d6661104f314c88ff84057fd4830e7a5fbe964d9).


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86042/
    Test FAILed.


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    **[Test build #86163 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86163/testReport)** for PR 20153 at commit [`d666110`](https://github.com/apache/spark/commit/d6661104f314c88ff84057fd4830e7a5fbe964d9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86163/
    Test PASSed.


---



[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20153#discussion_r159678434
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala ---
    @@ -37,40 +35,58 @@ import org.apache.spark.sql.types.StructType
      */
     case class DataSourceV2ScanExec(
         fullOutput: Seq[AttributeReference],
    -    @transient reader: DataSourceV2Reader) extends LeafExecNode with DataSourceReaderHolder {
    +    @transient reader: DataSourceV2Reader)
    +  extends LeafExecNode with DataSourceReaderHolder with ColumnarBatchScan {
     
       override def canEqual(other: Any): Boolean = other.isInstanceOf[DataSourceV2ScanExec]
     
    -  override def references: AttributeSet = AttributeSet.empty
    -
    -  override lazy val metrics = Map(
    -    "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"))
    -
    -  override protected def doExecute(): RDD[InternalRow] = {
    -    val readTasks: java.util.List[ReadTask[UnsafeRow]] = reader match {
    -      case r: SupportsScanUnsafeRow => r.createUnsafeRowReadTasks()
    -      case _ =>
    -        reader.createReadTasks().asScala.map {
    -          new RowToUnsafeRowReadTask(_, reader.readSchema()): ReadTask[UnsafeRow]
    -        }.asJava
    -    }
    +  override def producedAttributes: AttributeSet = AttributeSet(fullOutput)
    +
    +  private lazy val inputRDD: RDD[InternalRow] = reader match {
    +    case r: SupportsScanColumnarBatch if r.enableBatchRead() =>
    +      assert(!reader.isInstanceOf[ContinuousReader],
    +        "continuous stream reader does not support columnar read yet.")
    +      new DataSourceRDD(sparkContext, r.createBatchReadTasks()).asInstanceOf[RDD[InternalRow]]
    --- End diff --
    
    cc @zsxwing can streaming technically support a columnar batch reader?


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    retest this please


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    I think `ColumnarBatchScan` is fine; `SupportsScanColumnarBatch` also has an `enableBatchRead` to fall back on.


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    **[Test build #86042 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86042/testReport)** for PR 20153 at commit [`4a6a725`](https://github.com/apache/spark/commit/4a6a725acffdc24f7c00302c1a0081c93f6acdd8).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20153: [SPARK-22392][SQL] data source v2 columnar batch reader

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20153
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20153


---



[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20153#discussion_r161257419
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala ---
    @@ -37,40 +35,58 @@ import org.apache.spark.sql.types.StructType
      */
     case class DataSourceV2ScanExec(
         fullOutput: Seq[AttributeReference],
    -    @transient reader: DataSourceV2Reader) extends LeafExecNode with DataSourceReaderHolder {
    +    @transient reader: DataSourceV2Reader)
    +  extends LeafExecNode with DataSourceReaderHolder with ColumnarBatchScan {
     
       override def canEqual(other: Any): Boolean = other.isInstanceOf[DataSourceV2ScanExec]
     
    -  override def references: AttributeSet = AttributeSet.empty
    +  override def producedAttributes: AttributeSet = AttributeSet(fullOutput)
    --- End diff --
    
    +1


---
