Posted to reviews@spark.apache.org by marmbrus <gi...@git.apache.org> on 2014/08/07 01:40:20 UTC

[GitHub] spark pull request: [SPARK-2406][SQL] Initial support for using Pa...

GitHub user marmbrus opened a pull request:

    https://github.com/apache/spark/pull/1819

    [SPARK-2406][SQL] Initial support for using ParquetTableScan to read HiveMetaStore tables.

    This PR adds an experimental flag `spark.sql.hive.convertMetastoreParquet` that, when true, causes the planner to detect tables that use Hive's Parquet SerDe and instead plan them using Spark SQL's native `ParquetTableScan`.
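    For illustration, a minimal usage sketch (the application name and the
    table name `some_parquet_table` are hypothetical) that enables the flag on
    a HiveContext and queries a Parquet-backed metastore table, so that
    eligible scans plan through the native `ParquetTableScan`:

        import org.apache.spark.{SparkConf, SparkContext}
        import org.apache.spark.sql.hive.HiveContext

        object ConvertMetastoreParquetExample {
          def main(args: Array[String]): Unit = {
            val sc = new SparkContext(
              new SparkConf().setAppName("parquet-conversion").setMaster("local"))
            val hiveContext = new HiveContext(sc)

            // Off by default; "true" switches eligible scans to the native path.
            hiveContext.setConf("spark.sql.hive.convertMetastoreParquet", "true")

            // Any metastore table whose SerDe class name contains "Parquet"
            // qualifies for the conversion.
            hiveContext.sql("SELECT * FROM some_parquet_table")
              .collect()
              .foreach(println)

            sc.stop()
          }
        }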

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark parquetMetastore

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1819.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1819
    
----
commit 212d5cd2bda5f0a5c9899923c7257ea99e9077bc
Author: Michael Armbrust <mi...@databricks.com>
Date:   2014-08-06T23:25:12Z

    Initial support for using ParquetTableScan to read HiveMetaStore tables.

----



[GitHub] spark pull request: [WIP][SPARK-2406][SQL] Initial support for usi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1819#issuecomment-52406347
  
    [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18678/consoleFull) for PR 1819 at commit [`570fd9e`](https://github.com/apache/spark/commit/570fd9eb6a27b0febe174e2d64cbfee27327a278).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class OutputFaker(output: Seq[Attribute], child: SparkPlan) extends SparkPlan `
      * `    implicit class LogicalPlanHacks(s: SchemaRDD) `
      * `    implicit class PhysicalPlanHacks(originalPlan: SparkPlan) `




[GitHub] spark pull request: [SPARK-2406][SQL] Initial support for using Pa...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1819#discussion_r15913747
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
    @@ -78,6 +78,14 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
       // Change the default SQL dialect to HiveQL
       override private[spark] def dialect: String = getConf(SQLConf.DIALECT, "hiveql")
     
    +  /**
    +   * When true, enables an experimental feature where metastore tables that use the parquet SerDe
    +   * are automatically converted to use the Spark SQL parquet table scan, instead of the Hive
    +   * SerDe.
    +   */
    +  private[spark] def convertMetastoreParquet: Boolean =
    +    getConf("spark.sql.hive.convertMetastoreParquet", "false") == "true"
    --- End diff --
    
    Sounds like a job for `HiveConf extends SQLConf`! After all, there's nothing better than confusing users trying to use `org.apache.hadoop.hive.conf`!



[GitHub] spark pull request: [SPARK-2406][SQL] Initial support for using Pa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1819#issuecomment-51417477
  
    QA results for PR 1819:
     - This patch FAILED unit tests.
     - This patch merges cleanly.
     - This patch adds the following public classes (experimental):
       - case class OutputFaker(output: Seq[Attribute], child: SparkPlan) extends SparkPlan {
       - implicit class LogicalPlanHacks(s: SchemaRDD) {
       - implicit class PhysicalPlanHacks(s: SparkPlan) {

    For more information see test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18078/consoleFull



[GitHub] spark pull request: [SPARK-2406][SQL] Initial support for using Pa...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1819#issuecomment-52548411
  
    This only failed the Thrift server tests. I'm going to merge this into master and 1.1.



[GitHub] spark pull request: [SPARK-2406][SQL] Initial support for using Pa...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1819#issuecomment-52538922
  
    Jenkins, test this please.



[GitHub] spark pull request: [WIP][SPARK-2406][SQL] Initial support for usi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1819#issuecomment-52405112
  
    [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18681/consoleFull) for PR 1819 at commit [`41ebc5f`](https://github.com/apache/spark/commit/41ebc5f912093fdf7b21808ce19da1bae514435e).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class Serializer `
      * `abstract class SerializerInstance `
      * `abstract class SerializationStream `
      * `abstract class DeserializationStream `
      * `class ShuffleBlockManager(blockManager: BlockManager,`
      * `case class OutputFaker(output: Seq[Attribute], child: SparkPlan) extends SparkPlan `
      * `    implicit class LogicalPlanHacks(s: SchemaRDD) `
      * `    implicit class PhysicalPlanHacks(originalPlan: SparkPlan) `
      * `class FakeParquetSerDe extends SerDe `




[GitHub] spark pull request: [SPARK-2406][SQL] Initial support for using Pa...

Posted by concretevitamin <gi...@git.apache.org>.
Github user concretevitamin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1819#discussion_r15913270
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
    @@ -78,6 +78,14 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
       // Change the default SQL dialect to HiveQL
       override private[spark] def dialect: String = getConf(SQLConf.DIALECT, "hiveql")
     
    +  /**
    +   * When true, enables an experimental feature where metastore tables that use the parquet SerDe
    +   * are automatically converted to use the Spark SQL parquet table scan, instead of the Hive
    +   * SerDe.
    +   */
    +  private[spark] def convertMetastoreParquet: Boolean =
    +    getConf("spark.sql.hive.convertMetastoreParquet", "false") == "true"
    --- End diff --
    
    I am going to test this PR soon. In the meantime, would it make sense to put this only in `SQLConf` (along with a field for the key string in the singleton object), making that class the central place that stores SQL configs?
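    A sketch of that layout (names and structure are assumed for illustration,
    not actual Spark code): the key string lives as a constant in the `SQLConf`
    singleton object, and the getter reads it through the shared conf machinery.

        import java.util.concurrent.ConcurrentHashMap

        // Companion object holds the key string constants.
        object SQLConf {
          val CONVERT_METASTORE_PARQUET = "spark.sql.hive.convertMetastoreParquet"
        }

        trait SQLConf {
          private val settings = new ConcurrentHashMap[String, String]()

          def setConf(key: String, value: String): Unit = settings.put(key, value)

          def getConf(key: String, defaultValue: String): String = {
            val v = settings.get(key)
            if (v == null) defaultValue else v
          }

          // Centralized accessor, keyed by the constant above.
          def convertMetastoreParquet: Boolean =
            getConf(SQLConf.CONVERT_METASTORE_PARQUET, "false") == "true"
        }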



[GitHub] spark pull request: [WIP][SPARK-2406][SQL] Initial support for usi...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1819#discussion_r16326189
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/parquet/ParquetMetastoreSuite.scala ---
    @@ -0,0 +1,138 @@
    +
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.parquet
    +
    +import java.io.File
    +
    +import org.apache.spark.sql.hive.execution.HiveTableScan
    +import org.scalatest.BeforeAndAfterAll
    +
    +import scala.reflect.ClassTag
    +
    +import org.apache.spark.sql.{SQLConf, QueryTest}
    +import org.apache.spark.sql.execution.{BroadcastHashJoin, ShuffledHashJoin}
    +import org.apache.spark.sql.hive.test.TestHive
    +import org.apache.spark.sql.hive.test.TestHive._
    +
    +case class ParquetData(intField: Int, stringField: String)
    +
    +/**
    + * Tests for our SerDe -> Native parquet scan conversion.
    + */
    +class ParquetMetastoreSuite extends QueryTest with BeforeAndAfterAll {
    +
    +  override def beforeAll(): Unit = {
    +    setConf("spark.sql.hive.convertMetastoreParquet", "true")
    +  }
    +
    +  override def afterAll(): Unit = {
    +    setConf("spark.sql.hive.convertMetastoreParquet", "false")
    +  }
    +
    +  val partitionedTableDir = File.createTempFile("parquettests", "sparksql")
    +  partitionedTableDir.delete()
    +  partitionedTableDir.mkdir()
    +
    +  (1 to 10).foreach { p =>
    +    val partDir = new File(partitionedTableDir, s"p=$p")
    +    sparkContext.makeRDD(1 to 10)
    +      .map(i => ParquetData(i, s"part-$p"))
    +      .saveAsParquetFile(partDir.getCanonicalPath)
    +  }
    +
    +  sql(s"""
    --- End diff --
    
    I think we are okay as long as we don't use createQueryTest anywhere, since it runs `reset()`. I can try to move the DDL into each test to be safe, though.



[GitHub] spark pull request: [WIP][SPARK-2406][SQL] Initial support for usi...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1819#discussion_r15978899
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
    @@ -32,6 +38,113 @@ private[hive] trait HiveStrategies {
     
       val hiveContext: HiveContext
     
    +  /**
    +   * :: Experimental ::
    +   * Finds table scans that would use the Hive SerDe and replaces them with our own native parquet
    +   * table scan operator.
    +   *
    +   * TODO: Much of this logic is duplicated in HiveTableScan.  Ideally we would do some refactoring
    +   * but since this is after the code freeze for 1.1 all logic is here to minimize disruption.
    +   */
    +  @Experimental
    +  object ParquetConversion extends Strategy {
    +    implicit class LogicalPlanHacks(s: SchemaRDD) {
    +      def lowerCase =
    +        new SchemaRDD(s.sqlContext, LowerCaseSchema(s.logicalPlan))
    +    }
    +
    +    implicit class PhysicalPlanHacks(s: SparkPlan) {
    +      def fakeOutput(newOutput: Seq[Attribute]) = OutputFaker(newOutput, s)
    +    }
    +
    +    def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    +      case PhysicalOperation(projectList, predicates, relation: MetastoreRelation)
    +          if relation.tableDesc.getSerdeClassName.contains("Parquet") &&
    +             hiveContext.convertMetastoreParquet =>
    +
    +        // Filter out all predicates that only deal with partition keys
    +        val partitionKeyIds = relation.partitionKeys.map(_.exprId).toSet
    +        val (pruningPredicates, otherPredicates) = predicates.partition {
    +          _.references.map(_.exprId).subsetOf(partitionKeyIds)
    +        }
    +
    +        // We are going to throw the predicates and projection back at the whole optimization
    +        // sequence so lets unresolve all the attributes, allowing them to be rebound to the
    +        // matching parquet attributes.
    +        val unresolvedOtherPredicates = otherPredicates.map(_ transform {
    +          case a: AttributeReference => UnresolvedAttribute(a.name)
    +        }).reduceOption(And).getOrElse(Literal(true))
    +
    +        val unresolvedProjection = projectList.map(_ transform {
    +          // Handle non-partitioning columns
    +          case a: AttributeReference if !partitionKeyIds.contains(a.exprId) => UnresolvedAttribute(a.name)
    +        })
    +
    +        if (relation.hiveQlTable.isPartitioned) {
    +          val rawPredicate = pruningPredicates.reduceOption(And).getOrElse(Literal(true))
    +          // Translate the predicate so that it automatically casts the input values to the correct
    +          // data types during evaluation
    +          val castedPredicate = rawPredicate transform {
    +            case a: AttributeReference =>
    +              val idx = relation.partitionKeys.indexWhere(a.exprId == _.exprId)
    +              val key = relation.partitionKeys(idx)
    +              Cast(BoundReference(idx, StringType, nullable = true), key.dataType)
    +          }
    +
    +          val inputData = new GenericMutableRow(relation.partitionKeys.size)
    +          val pruningCondition =
    +            if(codegenEnabled) {
    +              GeneratePredicate(castedPredicate)
    +            } else {
    +              InterpretedPredicate(castedPredicate)
    +            }
    +
    +          val partitions = relation.hiveQlPartitions.filter { part =>
    +            val partitionValues = part.getValues
    +            var i = 0
    +            while (i < partitionValues.size()) {
    +              inputData(i) = partitionValues(i)
    +              i += 1
    +            }
    +            pruningCondition(inputData)
    +          }
    +
    +          org.apache.spark.sql.execution.Union(
    +            partitions.par.map { p =>
    +              val partValues = p.getValues()
    +              val internalProjection = unresolvedProjection.map(_ transform {
    +                // Handle partitioning columns
    +                case a: AttributeReference if partitionKeyIds.contains(a.exprId) => {
    +                  val idx = relation.partitionKeys.indexWhere(a.exprId == _.exprId)
    +                  val key = relation.partitionKeys(idx)
    +
    +                  Alias(Cast(Literal(partValues.get(idx), StringType), key.dataType), a.name)()
    +                }
    +              })
    +
    +              hiveContext
    --- End diff --
    
    Will that cause performance issues if there are lots of partitions?



[GitHub] spark pull request: [SPARK-2406][SQL] Initial support for using Pa...

Posted by concretevitamin <gi...@git.apache.org>.
Github user concretevitamin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1819#discussion_r15914067
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
    @@ -78,6 +78,14 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
       // Change the default SQL dialect to HiveQL
       override private[spark] def dialect: String = getConf(SQLConf.DIALECT, "hiveql")
     
    +  /**
    +   * When true, enables an experimental feature where metastore tables that use the parquet SerDe
    +   * are automatically converted to use the Spark SQL parquet table scan, instead of the Hive
    +   * SerDe.
    +   */
    +  private[spark] def convertMetastoreParquet: Boolean =
    +    getConf("spark.sql.hive.convertMetastoreParquet", "false") == "true"
    --- End diff --
    
    When in doubt, make up longer names: `SQLConfigOpts`, `HiveConfigOpts`. But this may only become relevant in the future and should not block this PR.



[GitHub] spark pull request: [SPARK-2406][SQL] Initial support for using Pa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1819#issuecomment-51559638
  
    QA tests have started for PR 1819. This patch merges cleanly.
    View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18179/consoleFull



[GitHub] spark pull request: [SPARK-2406][SQL] Initial support for using Pa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1819#issuecomment-52539263
  
    [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18772/consoleFull) for PR 1819 at commit [`1620079`](https://github.com/apache/spark/commit/162007913f962910d40be7f03a39cf2541ab8dcc).
     * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-2406][SQL] Initial support for using Pa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1819#issuecomment-51416510
  
    QA tests have started for PR 1819. This patch merges cleanly.
    View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18086/consoleFull



[GitHub] spark pull request: [WIP][SPARK-2406][SQL] Initial support for usi...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1819#discussion_r15978931
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
    @@ -32,6 +38,113 @@ private[hive] trait HiveStrategies {
     
       val hiveContext: HiveContext
     
    +  /**
    +   * :: Experimental ::
    +   * Finds table scans that would use the Hive SerDe and replaces them with our own native parquet
    +   * table scan operator.
    +   *
    +   * TODO: Much of this logic is duplicated in HiveTableScan.  Ideally we would do some refactoring
    +   * but since this is after the code freeze for 1.1 all logic is here to minimize disruption.
    +   */
    +  @Experimental
    +  object ParquetConversion extends Strategy {
    +    implicit class LogicalPlanHacks(s: SchemaRDD) {
    +      def lowerCase =
    +        new SchemaRDD(s.sqlContext, LowerCaseSchema(s.logicalPlan))
    +    }
    +
    +    implicit class PhysicalPlanHacks(s: SparkPlan) {
    +      def fakeOutput(newOutput: Seq[Attribute]) = OutputFaker(newOutput, s)
    +    }
    +
    +    def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    +      case PhysicalOperation(projectList, predicates, relation: MetastoreRelation)
    +          if relation.tableDesc.getSerdeClassName.contains("Parquet") &&
    +             hiveContext.convertMetastoreParquet =>
    +
    +        // Filter out all predicates that only deal with partition keys
    +        val partitionKeyIds = relation.partitionKeys.map(_.exprId).toSet
    +        val (pruningPredicates, otherPredicates) = predicates.partition {
    +          _.references.map(_.exprId).subsetOf(partitionKeyIds)
    +        }
    +
    +        // We are going to throw the predicates and projection back at the whole optimization
    +        // sequence so lets unresolve all the attributes, allowing them to be rebound to the
    +        // matching parquet attributes.
    +        val unresolvedOtherPredicates = otherPredicates.map(_ transform {
    +          case a: AttributeReference => UnresolvedAttribute(a.name)
    +        }).reduceOption(And).getOrElse(Literal(true))
    +
    +        val unresolvedProjection = projectList.map(_ transform {
    +          // Handle non-partitioning columns
    +          case a: AttributeReference if !partitionKeyIds.contains(a.exprId) => UnresolvedAttribute(a.name)
    +        })
    +
    +        if (relation.hiveQlTable.isPartitioned) {
    +          val rawPredicate = pruningPredicates.reduceOption(And).getOrElse(Literal(true))
    +          // Translate the predicate so that it automatically casts the input values to the correct
    +          // data types during evaluation
    +          val castedPredicate = rawPredicate transform {
    +            case a: AttributeReference =>
    +              val idx = relation.partitionKeys.indexWhere(a.exprId == _.exprId)
    +              val key = relation.partitionKeys(idx)
    +              Cast(BoundReference(idx, StringType, nullable = true), key.dataType)
    +          }
    +
    +          val inputData = new GenericMutableRow(relation.partitionKeys.size)
    +          val pruningCondition =
    +            if(codegenEnabled) {
    +              GeneratePredicate(castedPredicate)
    +            } else {
    +              InterpretedPredicate(castedPredicate)
    +            }
    +
    +          val partitions = relation.hiveQlPartitions.filter { part =>
    +            val partitionValues = part.getValues
    +            var i = 0
    +            while (i < partitionValues.size()) {
    +              inputData(i) = partitionValues(i)
    +              i += 1
    +            }
    +            pruningCondition(inputData)
    +          }
    +
    +          org.apache.spark.sql.execution.Union(
    +            partitions.par.map { p =>
    +              val partValues = p.getValues()
    +              val internalProjection = unresolvedProjection.map(_ transform {
    +                // Handle partitioning columns
    +                case a: AttributeReference if partitionKeyIds.contains(a.exprId) => {
    +                  val idx = relation.partitionKeys.indexWhere(a.exprId == _.exprId)
    +                  val key = relation.partitionKeys(idx)
    +
    +                  Alias(Cast(Literal(partValues.get(idx), StringType), key.dataType), a.name)()
    +                }
    +              })
    +
    +              hiveContext
    --- End diff --
    
    It did, due to the hadoopConf getting broadcast over and over again. Hence: c0d9b72
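    For illustration, a minimal sketch of the principle behind that fix (this
    is not the actual commit c0d9b72, and the config map is a stand-in for the
    real hadoopConf): broadcast the shared configuration once and let every
    per-partition scan reuse it, rather than re-broadcasting per iteration.

        import org.apache.spark.{SparkConf, SparkContext}

        object BroadcastConfOnce {
          def main(args: Array[String]): Unit = {
            val sc = new SparkContext(
              new SparkConf().setAppName("broadcast-once").setMaster("local"))

            // Stand-in for the Hadoop configuration; broadcast exactly once.
            val confMap = Map("parquet.compression" -> "SNAPPY")
            val sharedConf = sc.broadcast(confMap)

            val partitionScans = (1 to 100).map { p =>
              // Every per-partition plan reads sharedConf.value; nothing is
              // re-broadcast inside the loop.
              s"scan p=$p (compression=${sharedConf.value("parquet.compression")})"
            }
            partitionScans.take(3).foreach(println)
            sc.stop()
          }
        }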



[GitHub] spark pull request: [SPARK-2406][SQL] Initial support for using Pa...

Posted by patmcdonough <gi...@git.apache.org>.
Github user patmcdonough commented on the pull request:

    https://github.com/apache/spark/pull/1819#issuecomment-51422764
  
    @marmbrus - great to see this. Let's test the Hive 13 syntactic sugar too, to make sure it still works (`... STORED AS PARQUET`).
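    For reference, a sketch of that check (the table name is hypothetical, and
    this assumes a Hive version that accepts the 0.13 shorthand): `STORED AS
    PARQUET` resolves to the Parquet SerDe, so the conversion rule should pick
    it up just like the verbose SERDE/INPUTFORMAT form.

        import org.apache.spark.{SparkConf, SparkContext}
        import org.apache.spark.sql.hive.HiveContext

        object StoredAsParquetCheck {
          def main(args: Array[String]): Unit = {
            val sc = new SparkContext(
              new SparkConf().setAppName("stored-as-parquet").setMaster("local"))
            val hiveContext = new HiveContext(sc)
            hiveContext.setConf("spark.sql.hive.convertMetastoreParquet", "true")

            hiveContext.sql(
              "CREATE TABLE events_parquet (id INT, name STRING) STORED AS PARQUET")

            // With conversion on, the printed plan should contain a
            // ParquetTableScan rather than a HiveTableScan for this table.
            println(hiveContext.sql("SELECT * FROM events_parquet").queryExecution)

            sc.stop()
          }
        }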



[GitHub] spark pull request: [SPARK-2406][SQL] Initial support for using Pa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1819#issuecomment-51562953
  
    QA results for PR 1819:
     - This patch PASSES unit tests.
     - This patch merges cleanly.
     - This patch adds the following public classes (experimental):
       - case class OutputFaker(output: Seq[Attribute], child: SparkPlan) extends SparkPlan {
       - implicit class LogicalPlanHacks(s: SchemaRDD) {
       - implicit class PhysicalPlanHacks(s: SparkPlan) {

    For more information see test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18179/consoleFull



[GitHub] spark pull request: [SPARK-2406][SQL] Initial support for using Pa...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1819



[GitHub] spark pull request: [WIP][SPARK-2406][SQL] Initial support for usi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1819#issuecomment-52232683
  
    QA results for PR 1819:
     - This patch FAILED unit tests.
     - This patch merges cleanly.
     - This patch adds the following public classes (experimental):
       - case class OutputFaker(output: Seq[Attribute], child: SparkPlan) extends SparkPlan {
       - implicit class LogicalPlanHacks(s: SchemaRDD) {
       - implicit class PhysicalPlanHacks(originalPlan: SparkPlan) {

    For more information see test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18555/consoleFull



[GitHub] spark pull request: [WIP][SPARK-2406][SQL] Initial support for usi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1819#issuecomment-52405086
  
    [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18681/consoleFull) for PR 1819 at commit [`41ebc5f`](https://github.com/apache/spark/commit/41ebc5f912093fdf7b21808ce19da1bae514435e).
     * This patch merges cleanly.



[GitHub] spark pull request: [WIP][SPARK-2406][SQL] Initial support for usi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1819#issuecomment-52222946
  
    QA tests have started for PR 1819. This patch merges cleanly.
    View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18555/consoleFull



[GitHub] spark pull request: [SPARK-2406][SQL] Initial support for using Pa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1819#issuecomment-52548227
  
    [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18772/consoleFull) for PR 1819 at commit [`1620079`](https://github.com/apache/spark/commit/162007913f962910d40be7f03a39cf2541ab8dcc).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class OutputFaker(output: Seq[Attribute], child: SparkPlan) extends SparkPlan `
      * `    implicit class LogicalPlanHacks(s: SchemaRDD) `
      * `    implicit class PhysicalPlanHacks(originalPlan: SparkPlan) `
      * `class FakeParquetSerDe extends SerDe `




[GitHub] spark pull request: [SPARK-2406][SQL] Initial support for using Pa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1819#issuecomment-51554839
  
    QA tests have started for PR 1819. This patch merges cleanly.
    View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18168/consoleFull



[GitHub] spark pull request: [SPARK-2406][SQL] Initial support for using Pa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1819#issuecomment-51420794
  
    QA results for PR 1819:
     - This patch FAILED unit tests.
     - This patch merges cleanly.
     - This patch adds the following public classes (experimental):
       - case class OutputFaker(output: Seq[Attribute], child: SparkPlan) extends SparkPlan {
       - implicit class LogicalPlanHacks(s: SchemaRDD) {
       - implicit class PhysicalPlanHacks(s: SparkPlan) {

    For more information see test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18086/consoleFull



[GitHub] spark pull request: [SPARK-2406][SQL] Initial support for using Pa...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1819#discussion_r15975307
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
    @@ -32,6 +38,113 @@ private[hive] trait HiveStrategies {
     
       val hiveContext: HiveContext
     
    +  /**
    +   * :: Experimental ::
    +   * Finds table scans that would use the Hive SerDe and replaces them with our own native parquet
    +   * table scan operator.
    +   *
    +   * TODO: Much of this logic is duplicated in HiveTableScan.  Ideally we would do some refactoring
    +   * but since this is after the code freeze for 1.1 all logic is here to minimize disruption.
    +   */
    +  @Experimental
    +  object ParquetConversion extends Strategy {
    +    implicit class LogicalPlanHacks(s: SchemaRDD) {
    +      def lowerCase =
    +        new SchemaRDD(s.sqlContext, LowerCaseSchema(s.logicalPlan))
    +    }
    +
    +    implicit class PhysicalPlanHacks(s: SparkPlan) {
    +      def fakeOutput(newOutput: Seq[Attribute]) = OutputFaker(newOutput, s)
    +    }
    +
    +    def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    +      case PhysicalOperation(projectList, predicates, relation: MetastoreRelation)
    +          if relation.tableDesc.getSerdeClassName.contains("Parquet") &&
    +             hiveContext.convertMetastoreParquet =>
    +
    +        // Filter out all predicates that only deal with partition keys
    +        val partitionKeyIds = relation.partitionKeys.map(_.exprId).toSet
    +        val (pruningPredicates, otherPredicates) = predicates.partition {
    +          _.references.map(_.exprId).subsetOf(partitionKeyIds)
    +        }
    +
    +        // We are going to throw the predicates and projection back at the whole optimization
    +        // sequence so lets unresolve all the attributes, allowing them to be rebound to the
    +        // matching parquet attributes.
    +        val unresolvedOtherPredicates = otherPredicates.map(_ transform {
    +          case a: AttributeReference => UnresolvedAttribute(a.name)
    +        }).reduceOption(And).getOrElse(Literal(true))
    +
    +        val unresolvedProjection = projectList.map(_ transform {
    +          // Handle non-partitioning columns
    +          case a: AttributeReference if !partitionKeyIds.contains(a.exprId) => UnresolvedAttribute(a.name)
    --- End diff --
    
    My bad... My IDE was misconfigured on the right margin...



[GitHub] spark pull request: [SPARK-2406][SQL] Initial support for using Pa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1819#issuecomment-51412917
  
    QA tests have started for PR 1819. This patch merges cleanly.
    View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18078/consoleFull



[GitHub] spark pull request: [WIP][SPARK-2406][SQL] Initial support for usi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1819#issuecomment-51653434
  
    QA tests have started for PR 1819. This patch merges cleanly.
    View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18218/consoleFull



[GitHub] spark pull request: [WIP][SPARK-2406][SQL] Initial support for usi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1819#issuecomment-52444171
  
    [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18725/consoleFull) for PR 1819 at commit [`4f3d54f`](https://github.com/apache/spark/commit/4f3d54ff8a9111ac3c340bc077dfefd62eb1dce2).
     * This patch merges cleanly.



[GitHub] spark pull request: [WIP][SPARK-2406][SQL] Initial support for usi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1819#issuecomment-52404599
  
    [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18678/consoleFull) for PR 1819 at commit [`570fd9e`](https://github.com/apache/spark/commit/570fd9eb6a27b0febe174e2d64cbfee27327a278).
     * This patch merges cleanly.



[GitHub] spark pull request: [WIP][SPARK-2406][SQL] Initial support for usi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1819#issuecomment-51660339
  
    QA results for PR 1819:
     - This patch FAILED unit tests.
     - This patch merges cleanly.
     - This patch adds the following public classes (experimental):
       - case class OutputFaker(output: Seq[Attribute], child: SparkPlan) extends SparkPlan {
       - implicit class LogicalPlanHacks(s: SchemaRDD) {
       - implicit class PhysicalPlanHacks(originalPlan: SparkPlan) {

    For more information see test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18218/consoleFull



[GitHub] spark pull request: [WIP][SPARK-2406][SQL] Initial support for usi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1819#issuecomment-52446903
  
    [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18725/consoleFull) for PR 1819 at commit [`4f3d54f`](https://github.com/apache/spark/commit/4f3d54ff8a9111ac3c340bc077dfefd62eb1dce2).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-2406][SQL] Initial support for using Pa...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1819#discussion_r15949553
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/parquet/ParquetMetastoreSuite.scala ---
    @@ -0,0 +1,138 @@
    +
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.parquet
    +
    +import java.io.File
    +
    +import org.apache.spark.sql.hive.execution.HiveTableScan
    +import org.scalatest.BeforeAndAfterAll
    +
    +import scala.reflect.ClassTag
    +
    +import org.apache.spark.sql.{SQLConf, QueryTest}
    +import org.apache.spark.sql.execution.{BroadcastHashJoin, ShuffledHashJoin}
    +import org.apache.spark.sql.hive.test.TestHive
    +import org.apache.spark.sql.hive.test.TestHive._
    +
    +case class ParquetData(intField: Int, stringField: String)
    +
    +/**
    + * Tests for our SerDe -> Native parquet scan conversion.
    + */
    +class ParquetMetastoreSuite extends QueryTest with BeforeAndAfterAll {
    +
    +  override def beforeAll(): Unit = {
    +    setConf("spark.sql.hive.convertMetastoreParquet", "true")
    +  }
    +
    +  override def afterAll(): Unit = {
    +    setConf("spark.sql.hive.convertMetastoreParquet", "false")
    +  }
    +
    +  val partitionedTableDir = File.createTempFile("parquettests", "sparksql")
    +  partitionedTableDir.delete()
    +  partitionedTableDir.mkdir()
    +
    +  (1 to 10).foreach { p =>
    +    val partDir = new File(partitionedTableDir, s"p=$p")
    +    sparkContext.makeRDD(1 to 10)
    +      .map(i => ParquetData(i, s"part-$p"))
    +      .saveAsParquetFile(partDir.getCanonicalPath)
    +  }
    +
    +  sql(s"""
    --- End diff --
    
    If we execute setup queries in the constructor, will we introduce any issues for the mvn tests? It looks similar to what we originally did for `HiveTableScanSuite`, where we had to use `createQueryTest` to run setup and execution atomically.
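    A sketch of that alternative (structure assumed; the table name and
    location are illustrative): run the DDL inside beforeAll rather than in
    the suite constructor, so setup and execution stay together even if the
    harness calls `reset()` between suites.

        package org.apache.spark.sql.parquet

        import org.scalatest.BeforeAndAfterAll

        import org.apache.spark.sql.QueryTest
        import org.apache.spark.sql.hive.test.TestHive._

        class ParquetMetastoreSetupSketch extends QueryTest with BeforeAndAfterAll {
          override def beforeAll(): Unit = {
            setConf("spark.sql.hive.convertMetastoreParquet", "true")
            // DDL moved out of the constructor so it always runs with the tests.
            sql("""
              |CREATE EXTERNAL TABLE IF NOT EXISTS partitioned_parquet
              |(intField INT, stringField STRING)
              |PARTITIONED BY (p INT)
              |ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
              |STORED AS
              | INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
              | OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
              |LOCATION '/tmp/partitioned_parquet'
            """.stripMargin)
          }

          override def afterAll(): Unit = {
            setConf("spark.sql.hive.convertMetastoreParquet", "false")
          }
        }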



[GitHub] spark pull request: [SPARK-2406][SQL] Initial support for using Pa...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1819#discussion_r15913307
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
    @@ -78,6 +78,14 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
       // Change the default SQL dialect to HiveQL
       override private[spark] def dialect: String = getConf(SQLConf.DIALECT, "hiveql")
     
    +  /**
    +   * When true, enables an experimental feature where metastore tables that use the parquet SerDe
    +   * are automatically converted to use the Spark SQL parquet table scan, instead of the Hive
    +   * SerDe.
    +   */
    +  private[spark] def convertMetastoreParquet: Boolean =
    +    getConf("spark.sql.hive.convertMetastoreParquet", "false") == "true"
    --- End diff --
    
    I have mixed feelings about that. The problem is that this only applies to HiveContexts, so it doesn't really make much sense in a SQLContext.



[GitHub] spark pull request: [SPARK-2406][SQL] Initial support for using Pa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1819#issuecomment-51554872
  
    QA results for PR 1819:
     - This patch FAILED unit tests.
     - This patch merges cleanly.
     - This patch adds the following public classes (experimental):
       - case class OutputFaker(output: Seq[Attribute], child: SparkPlan) extends SparkPlan {
       - implicit class LogicalPlanHacks(s: SchemaRDD) {
       - implicit class PhysicalPlanHacks(s: SparkPlan) {

    For more information see test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18168/consoleFull

