You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2016/03/29 23:48:35 UTC

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/12047

    [SPARK-14224] [SPARK-14223] [SQL] fix RowEncoder and parquet reader for wide table

    ## What changes were proposed in this pull request?
    
    1) fix the RowEncoder for wide table (many columns) by splitting the generate code into multiple functions.
    2) Disable the returning columnar batch in parquet reader if there are many columns.
    3) Added a internal config for maximum number of fields (nested) columns supported by whole stage codegen.
    
    ## How was this patch tested?
    
    Add a tests for table with 1000 columns.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark many_columns

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12047.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12047
    
----
commit 872ecf5341138f075fcc5e8c0caedbd0d1d0e8fd
Author: Davies Liu <da...@databricks.com>
Date:   2016-03-29T21:42:54Z

    fix RowEncoder and parquet reader for wide table

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-206071969
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55065/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12047#discussion_r58290214
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
    @@ -175,10 +175,11 @@ private[sql] case class DataSourceScan(
       }
     
       private def canProcessBatches(): Boolean = {
    +    lazy val conf = SQLContext.getActive().get.conf
         relation match {
           case r: HadoopFsRelation if r.fileFormat.isInstanceOf[ParquetSource] &&
    -        SQLContext.getActive().get.conf.getConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED) &&
    -        SQLContext.getActive().get.conf.getConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED) =>
    +        conf.parquetVectorizedReaderEnabled &&
    +        conf.wholeStageEnabled && schema.length <= conf.wholeStageMaxNumFields =>
    --- End diff --
    
    why do we need to disable batching if num fields is large?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-204613748
  
    **[Test build #54745 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54745/consoleFull)** for PR 12047 at commit [`6eb13f3`](https://github.com/apache/spark/commit/6eb13f3de05f17cd7de2faa6f6eff32b8cb058cc).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12047#discussion_r58290233
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
    @@ -175,10 +175,11 @@ private[sql] case class DataSourceScan(
       }
     
       private def canProcessBatches(): Boolean = {
    +    lazy val conf = SQLContext.getActive().get.conf
         relation match {
           case r: HadoopFsRelation if r.fileFormat.isInstanceOf[ParquetSource] &&
    -        SQLContext.getActive().get.conf.getConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED) &&
    -        SQLContext.getActive().get.conf.getConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED) =>
    +        conf.parquetVectorizedReaderEnabled &&
    +        conf.wholeStageEnabled && schema.length <= conf.wholeStageMaxNumFields =>
    --- End diff --
    
    is it because we have to synchronize this with whole stage codegen because we don't have great planner support for this? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12047#discussion_r58290224
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala ---
    @@ -434,12 +435,20 @@ case class CollapseCodegenStages(conf: SQLConf) extends Rule[SparkPlan] {
         case _ => true
       }
     
    +  private def numOfNestedFields(dataType: DataType): Int = dataType match {
    +    case dt: StructType => dt.fields.map(f => numOfNestedFields(f.dataType)).sum
    +    case m: MapType => numOfNestedFields(m.keyType) + numOfNestedFields(m.valueType)
    +    case a: ArrayType => numOfNestedFields(a.elementType) + 1
    +    case u: UserDefinedType[_] => numOfNestedFields(u.sqlType)
    +    case _ => 1
    +  }
    +
       private def supportCodegen(plan: SparkPlan): Boolean = plan match {
         case plan: CodegenSupport if plan.supportCodegen =>
           val willFallback = plan.expressions.exists(_.find(e => !supportCodegen(e)).isDefined)
           // the generated code will be huge if there are too many columns
    -      val haveManyColumns = plan.output.length > 200
    -      !willFallback && !haveManyColumns
    +      val haveManyFields = numOfNestedFields(plan.schema) > conf.wholeStageMaxNumFields
    --- End diff --
    
    haveTooManyFields


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12047#discussion_r58745460
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala ---
    @@ -408,6 +411,10 @@ private[sql] class DefaultSource
         val parquetFilterPushDown = sqlContext.conf.parquetFilterPushDown
         val assumeBinaryIsString = sqlContext.conf.isParquetBinaryAsString
         val assumeInt96IsTimestamp = sqlContext.conf.isParquetINT96AsTimestamp
    +    // TODO: this could be wrong iff:
    --- End diff --
    
    What's up with this TODO?  Does this matter if we get rid of buildInternalScan?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-206525330
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-206597892
  
    Since the last commit is tiny, merging this into master to unblock others.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-206476011
  
    This is a huge improvement.   A few minor comments, otherwise LGTM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-205531284
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54890/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-204624960
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-205462039
  
    **[Test build #54882 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54882/consoleFull)** for PR 12047 at commit [`caae5c9`](https://github.com/apache/spark/commit/caae5c9a64efa3ebe9c59146cabc2587d8e3f405).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12047#discussion_r58792643
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala ---
    @@ -434,12 +435,20 @@ case class CollapseCodegenStages(conf: SQLConf) extends Rule[SparkPlan] {
         case _ => true
       }
     
    +  private def numOfNestedFields(dataType: DataType): Int = dataType match {
    +    case dt: StructType => dt.fields.map(f => numOfNestedFields(f.dataType)).sum
    +    case m: MapType => numOfNestedFields(m.keyType) + numOfNestedFields(m.valueType)
    +    case a: ArrayType => numOfNestedFields(a.elementType) + 1
    --- End diff --
    
    Considering generate a UnsafeRow, ArrayType itself will need some code to assemble the elements, co counting it as as `1`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-204603297
  
    **[Test build #54737 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54737/consoleFull)** for PR 12047 at commit [`5f8e009`](https://github.com/apache/spark/commit/5f8e00991cb7f7c7fb157a599ce0042b2aa936c3).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12047#discussion_r58745859
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala ---
    @@ -279,7 +279,8 @@ class FileSourceStrategySuite extends QueryTest with SharedSQLContext with Predi
       /** Plans the query and calls the provided validation function with the planned partitioning. */
       def checkScan(df: DataFrame)(func: Seq[FilePartition] => Unit): Unit = {
         val fileScan = df.queryExecution.executedPlan.collect {
    -      case DataSourceScan(_, scan: FileScanRDD, _, _) => scan
    +      case scan: DataSourceScan if scan.rdd.isInstanceOf[FileScanRDD] =>
    +        scan.rdd.asInstanceOf[FileScanRDD]
    --- End diff --
    
    Why this change?  Pattern matching seems cleaner to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12047#discussion_r58750142
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala ---
    @@ -279,7 +279,8 @@ class FileSourceStrategySuite extends QueryTest with SharedSQLContext with Predi
       /** Plans the query and calls the provided validation function with the planned partitioning. */
       def checkScan(df: DataFrame)(func: Seq[FilePartition] => Unit): Unit = {
         val fileScan = df.queryExecution.executedPlan.collect {
    -      case DataSourceScan(_, scan: FileScanRDD, _, _) => scan
    +      case scan: DataSourceScan if scan.rdd.isInstanceOf[FileScanRDD] =>
    +        scan.rdd.asInstanceOf[FileScanRDD]
    --- End diff --
    
    DataSourceScan is a trait right now


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-206590169
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-205549973
  
    @marmbrus Did another round of refactor, does this match the things in your mind?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-205463805
  
    **[Test build #54883 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54883/consoleFull)** for PR 12047 at commit [`ec46638`](https://github.com/apache/spark/commit/ec4663884cf86397514e35910b1e30ada193aa69).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-204624403
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54744/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12047#discussion_r58290183
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java ---
    @@ -121,6 +109,8 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptCont
           throws IOException, InterruptedException, UnsupportedOperationException {
         super.initialize(inputSplit, taskAttemptContext);
         initializeInternal();
    +    Configuration conf = ContextUtil.getConfiguration(taskAttemptContext);
    +    returnColumnarBatch = conf.getBoolean("returning.batch", false);
    --- End diff --
    
    where is this returning.batch set?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-204197995
  
    **[Test build #2720 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2720/consoleFull)** for PR 12047 at commit [`872ecf5`](https://github.com/apache/spark/commit/872ecf5341138f075fcc5e8c0caedbd0d1d0e8fd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-205516234
  
    **[Test build #54882 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54882/consoleFull)** for PR 12047 at commit [`caae5c9`](https://github.com/apache/spark/commit/caae5c9a64efa3ebe9c59146cabc2587d8e3f405).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12047#discussion_r58793202
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala ---
    @@ -434,12 +435,20 @@ case class CollapseCodegenStages(conf: SQLConf) extends Rule[SparkPlan] {
         case _ => true
       }
     
    +  private def numOfNestedFields(dataType: DataType): Int = dataType match {
    +    case dt: StructType => dt.fields.map(f => numOfNestedFields(f.dataType)).sum
    +    case m: MapType => numOfNestedFields(m.keyType) + numOfNestedFields(m.valueType)
    +    case a: ArrayType => numOfNestedFields(a.elementType) + 1
    --- End diff --
    
    we should rename this function then. it is really not about the number of fields? It's some estimation of local variables in codegen.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-204624961
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54745/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-205551227
  
    Yeah, this looks much cleaner.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-203150999
  
    **[Test build #54463 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54463/consoleFull)** for PR 12047 at commit [`872ecf5`](https://github.com/apache/spark/commit/872ecf5341138f075fcc5e8c0caedbd0d1d0e8fd).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-206454486
  
    @marmbrus Is this PR good to go?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12047#discussion_r58745999
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -396,6 +396,12 @@ object SQLConf {
         .booleanConf
         .createWithDefault(true)
     
    +  val WHOLESTAGE_MAX_NUM_FIELDS = SQLConfigBuilder("spark.sql.codegen.maxFields")
    +    .internal()
    +    .doc("The maxinum number of fields (nested) could be supported by whole stage codegen")
    --- End diff --
    
    Nit: The maximum number of fields (including nested fields) that will be supported before deactivating whole-stage codegen.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-205531279
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12047#discussion_r58415966
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
    @@ -175,10 +175,11 @@ private[sql] case class DataSourceScan(
       }
     
       private def canProcessBatches(): Boolean = {
    +    lazy val conf = SQLContext.getActive().get.conf
         relation match {
           case r: HadoopFsRelation if r.fileFormat.isInstanceOf[ParquetSource] &&
    -        SQLContext.getActive().get.conf.getConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED) &&
    -        SQLContext.getActive().get.conf.getConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED) =>
    +        conf.parquetVectorizedReaderEnabled &&
    +        conf.wholeStageEnabled && schema.length <= conf.wholeStageMaxNumFields =>
    --- End diff --
    
    We really need to move all of this logic into the planner.  I don't think it would be that hard after the refactoring we did.
    
    I think main complication is that the parquet reader only supports a subset of possible schemata.  So you have to branch based both on the format but also arbitrary functions of the schema.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-205513109
  
    **[Test build #54883 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54883/consoleFull)** for PR 12047 at commit [`ec46638`](https://github.com/apache/spark/commit/ec4663884cf86397514e35910b1e30ada193aa69).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12047#discussion_r58789868
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala ---
    @@ -434,12 +435,20 @@ case class CollapseCodegenStages(conf: SQLConf) extends Rule[SparkPlan] {
         case _ => true
       }
     
    +  private def numOfNestedFields(dataType: DataType): Int = dataType match {
    +    case dt: StructType => dt.fields.map(f => numOfNestedFields(f.dataType)).sum
    +    case m: MapType => numOfNestedFields(m.keyType) + numOfNestedFields(m.valueType)
    +    case a: ArrayType => numOfNestedFields(a.elementType) + 1
    --- End diff --
    
    why + 1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-205530013
  
    **[Test build #54890 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54890/consoleFull)** for PR 12047 at commit [`f2baae5`](https://github.com/apache/spark/commit/f2baae5c21a6cafa3d8b23910fc4738d9ebeefad).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-205516421
  
    Build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-206581934
  
    **[Test build #2761 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2761/consoleFull)** for PR 12047 at commit [`2dc2b75`](https://github.com/apache/spark/commit/2dc2b752e46af40c816cef40dca248cd4a42552f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12047#discussion_r58789578
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java ---
    @@ -121,6 +109,7 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptCont
           throws IOException, InterruptedException, UnsupportedOperationException {
         super.initialize(inputSplit, taskAttemptContext);
         initializeInternal();
    +    Configuration conf = ContextUtil.getConfiguration(taskAttemptContext);
    --- End diff --
    
    what does this do?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12047#discussion_r58746557
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala ---
    @@ -347,32 +358,24 @@ private[sql] class DefaultSource
           val attemptId = new TaskAttemptID(new TaskID(new JobID(), TaskType.MAP, 0), 0)
           val hadoopAttemptContext = new TaskAttemptContextImpl(broadcastedConf.value.value, attemptId)
     
    -      val parquetReader = try {
    -        if (!enableVectorizedParquetReader) sys.error("Vectorized reader turned off.")
    +      val parquetReader = if (enableVectorizedParquetReader) {
             val vectorizedReader = new VectorizedParquetRecordReader()
             vectorizedReader.initialize(split, hadoopAttemptContext)
             logDebug(s"Appending $partitionSchema ${file.partitionValues}")
             vectorizedReader.initBatch(partitionSchema, file.partitionValues)
    -        // Whole stage codegen (PhysicalRDD) is able to deal with batches directly
             // TODO: fix column appending
    -        if (enableWholestageCodegen) {
    -          logDebug(s"Enabling batch returning")
    -          vectorizedReader.enableReturningBatches()
    -        }
             vectorizedReader
    -      } catch {
    -        case NonFatal(e) =>
    -          logDebug(s"Falling back to parquet-mr: $e", e)
    -          val reader = pushed match {
    -            case Some(filter) =>
    -              new ParquetRecordReader[InternalRow](
    -                new CatalystReadSupport,
    -                FilterCompat.get(filter, null))
    -            case _ =>
    -              new ParquetRecordReader[InternalRow](new CatalystReadSupport)
    -          }
    -          reader.initialize(split, hadoopAttemptContext)
    -          reader
    +      } else {
    --- End diff --
    
    I think we lost the log messages that tell us that we are falling back on a different parquet reader.
    
    I love that we are getting rid of the try/catch!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-204612666
  
    **[Test build #54744 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54744/consoleFull)** for PR 12047 at commit [`e8fb619`](https://github.com/apache/spark/commit/e8fb6198376d37f1fa0678c359d44b5fd099af54).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-206525056
  
    **[Test build #55130 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55130/consoleFull)** for PR 12047 at commit [`2dc2b75`](https://github.com/apache/spark/commit/2dc2b752e46af40c816cef40dca248cd4a42552f).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-206071968
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12047#discussion_r58745744
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
    @@ -469,6 +469,13 @@ trait FileFormat {
           options: Map[String, String]): RDD[InternalRow]
     
       /**
    +   * Returns whether this format support returning columnar batch or not.
    --- End diff --
    
    Maybe a TODO, that we should just have different traits for the different formats.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-205513507
  
    Build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-206491987
  
    **[Test build #55129 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55129/consoleFull)** for PR 12047 at commit [`4b44e22`](https://github.com/apache/spark/commit/4b44e22412a5ec89b7029b3f23b690f61cfc6007).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-203126253
  
    cc @nongli @rxin @marmbrus 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12047#discussion_r58745638
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala ---
    @@ -306,6 +315,10 @@ private[sql] class DefaultSource
           SQLConf.PARQUET_INT96_AS_TIMESTAMP.key,
           sqlContext.conf.getConf(SQLConf.PARQUET_INT96_AS_TIMESTAMP))
     
    +    // Whole stage codegen (PhysicalRDD) is able to deal with batches directly
    +    parquetConf.setBoolean(ParquetRelation.RETURNING_BATCH,
    +      supportBatch(sqlContext, StructType(partitionSchema.fields ++ dataSchema.fields)))
    --- End diff --
    
    In general, I'd prefer if all new code passes options explicitly to the closure instead of through the conf.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-206525336
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55130/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-205524261
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54887/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12047#discussion_r58750566
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala ---
    @@ -408,6 +411,10 @@ private[sql] class DefaultSource
         val parquetFilterPushDown = sqlContext.conf.parquetFilterPushDown
         val assumeBinaryIsString = sqlContext.conf.isParquetBinaryAsString
         val assumeInt96IsTimestamp = sqlContext.conf.isParquetINT96AsTimestamp
    +    // TODO: this could be wrong iff:
    --- End diff --
    
    Once we get rid of buildInternalScan(), all of these will be deleted.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-205513511
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54883/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-205478456
  
    **[Test build #54887 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54887/consoleFull)** for PR 12047 at commit [`d8f6e4d`](https://github.com/apache/spark/commit/d8f6e4d625fa5183417a42a8767d058e0d78457a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-205479802
  
    **[Test build #54890 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54890/consoleFull)** for PR 12047 at commit [`f2baae5`](https://github.com/apache/spark/commit/f2baae5c21a6cafa3d8b23910fc4738d9ebeefad).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-204603114
  
    **[Test build #54737 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54737/consoleFull)** for PR 12047 at commit [`5f8e009`](https://github.com/apache/spark/commit/5f8e00991cb7f7c7fb157a599ce0042b2aa936c3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-205524257
  
    Build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-206532804
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-206532808
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55129/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12047#discussion_r58746322
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala ---
    @@ -347,32 +358,24 @@ private[sql] class DefaultSource
           val attemptId = new TaskAttemptID(new TaskID(new JobID(), TaskType.MAP, 0), 0)
           val hadoopAttemptContext = new TaskAttemptContextImpl(broadcastedConf.value.value, attemptId)
     
    -      val parquetReader = try {
    -        if (!enableVectorizedParquetReader) sys.error("Vectorized reader turned off.")
    +      val parquetReader = if (enableVectorizedParquetReader) {
             val vectorizedReader = new VectorizedParquetRecordReader()
             vectorizedReader.initialize(split, hadoopAttemptContext)
             logDebug(s"Appending $partitionSchema ${file.partitionValues}")
             vectorizedReader.initBatch(partitionSchema, file.partitionValues)
    -        // Whole stage codegen (PhysicalRDD) is able to deal with batches directly
             // TODO: fix column appending
    --- End diff --
    
    I think this TODO is stale.  We are appending the columns on the line above.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-204624402
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12047#discussion_r57809576
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java ---
    @@ -224,9 +224,13 @@ public ColumnarBatch resultBatch() {
     
       /**
        * Can be called before any rows are returned to enable returning columnar batches directly.
    +   *
    +   * @param maxColumns The maximum number of fields supported by whole stage codegen.
        */
    -  public void enableReturningBatches() {
    -    returnColumnarBatch = true;
    +  public boolean tryEnableReturningBatches(int maxColumns) {
    --- End diff --
    
    Yes, we should do this in planner, by checking the schema, which requires some cleanup in parquet reader, currently which still use runtime exception to do the switch. cc @nongli @sameeragarwal 
    
    Should we wait for that?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-203127055
  
    **[Test build #54463 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54463/consoleFull)** for PR 12047 at commit [`872ecf5`](https://github.com/apache/spark/commit/872ecf5341138f075fcc5e8c0caedbd0d1d0e8fd).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-205523846
  
    **[Test build #54887 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54887/consoleFull)** for PR 12047 at commit [`d8f6e4d`](https://github.com/apache/spark/commit/d8f6e4d625fa5183417a42a8767d058e0d78457a).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-206530742
  
    **[Test build #2761 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2761/consoleFull)** for PR 12047 at commit [`2dc2b75`](https://github.com/apache/spark/commit/2dc2b752e46af40c816cef40dca248cd4a42552f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-204624308
  
    **[Test build #54744 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54744/consoleFull)** for PR 12047 at commit [`e8fb619`](https://github.com/apache/spark/commit/e8fb6198376d37f1fa0678c359d44b5fd099af54).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-204603302
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54737/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12047#discussion_r58793691
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala ---
    @@ -434,12 +435,20 @@ case class CollapseCodegenStages(conf: SQLConf) extends Rule[SparkPlan] {
         case _ => true
       }
     
    +  private def numOfNestedFields(dataType: DataType): Int = dataType match {
    +    case dt: StructType => dt.fields.map(f => numOfNestedFields(f.dataType)).sum
    +    case m: MapType => numOfNestedFields(m.keyType) + numOfNestedFields(m.valueType)
    +    case a: ArrayType => numOfNestedFields(a.elementType) + 1
    --- End diff --
    
    Removed this `+1`, make it consistent with StructType and MapType. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-206495529
  
    **[Test build #55130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55130/consoleFull)** for PR 12047 at commit [`2dc2b75`](https://github.com/apache/spark/commit/2dc2b752e46af40c816cef40dca248cd4a42552f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12047#discussion_r57807341
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java ---
    @@ -224,9 +224,13 @@ public ColumnarBatch resultBatch() {
     
       /**
        * Can be called before any rows are returned to enable returning columnar batches directly.
    +   *
    +   * @param maxColumns The maximum number of fields supported by whole stage codegen.
        */
    -  public void enableReturningBatches() {
    -    returnColumnarBatch = true;
    +  public boolean tryEnableReturningBatches(int maxColumns) {
    --- End diff --
    
    Why are we pushing a config value into the reader so that it can switch?  Can't we just put this logic in the query planner where the decision to call `enableReturningBatches` is.  I think its clearer if there is a single place where we decide to use column batches or not.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12047


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-205516424
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54882/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-206048632
  
    **[Test build #55065 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55065/consoleFull)** for PR 12047 at commit [`79c2ad5`](https://github.com/apache/spark/commit/79c2ad5f72546517fc40bd31e5b32eeafc97d304).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-204624890
  
    **[Test build #54745 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54745/consoleFull)** for PR 12047 at commit [`6eb13f3`](https://github.com/apache/spark/commit/6eb13f3de05f17cd7de2faa6f6eff32b8cb058cc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12047#discussion_r58745404
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala ---
    @@ -481,6 +497,17 @@ private[sql] class DefaultSource
       }
     }
     
    +/**
    + * The ParquetInputFormat that create VectorizedParquetRecordReader.
    + */
    +final class VectorizedParquetInputFormat extends ParquetInputFormat[InternalRow] {
    --- End diff --
    
    @liancheng @cloud-fan lets make sure we delete this when you remove `buildInternalScan`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-203151376
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54463/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12047#discussion_r58792173
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java ---
    @@ -121,6 +109,7 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptCont
           throws IOException, InterruptedException, UnsupportedOperationException {
         super.initialize(inputSplit, taskAttemptContext);
         initializeInternal();
    +    Configuration conf = ContextUtil.getConfiguration(taskAttemptContext);
    --- End diff --
    
    Not needed, will remove


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-204603300
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-206493175
  
    @marmbrus Since we will remove buildInternalScan soon, removed the returning batch support for that case to simplify this patch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-206071710
  
    **[Test build #55065 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55065/consoleFull)** for PR 12047 at commit [`79c2ad5`](https://github.com/apache/spark/commit/79c2ad5f72546517fc40bd31e5b32eeafc97d304).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SPARK-14310] [SQL...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-206532385
  
    **[Test build #55129 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55129/consoleFull)** for PR 12047 at commit [`4b44e22`](https://github.com/apache/spark/commit/4b44e22412a5ec89b7029b3f23b690f61cfc6007).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12047#issuecomment-203151372
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org