Posted to reviews@spark.apache.org by zhzhan <gi...@git.apache.org> on 2014/09/03 01:31:35 UTC

[GitHub] spark pull request: Spark 2706

GitHub user zhzhan opened a pull request:

    https://github.com/apache/spark/pull/2241

    Spark 2706

    Given that a lot of users are trying to use Hive 0.13 in Spark, and given the API-level incompatibility between hive-0.12 and hive-0.13, I want to propose the following approach, which has no or minimal impact on existing hive-0.12 support but jumpstarts development of hive-0.13 and future-version support.
    
    Approach: introduce a "hive.version" property and manipulate the pom.xml files to support different Hive versions at compile time through a shim layer, e.g., hive-0.12.0 and hive-0.13.1. More specifically,
    
    1. For each supported Hive version there is a very light layer of shim code that handles the API differences, sitting under sql/hive/<version>, e.g., sql/hive/v0.12 or sql/hive/v0.13.
    
    2. Add a new profile, hive-default, which is active by default and picks up all existing configuration and the hive-0.12.0 shim (v0.12) if no hive.version is specified.
    
    3. If the user specifies a different version (currently only 0.13.1, via -Dhive.version=0.13.1), the hive-0.13 profile is activated, which picks up the hive-0.13.1-specific configuration, mainly the Hive jars and the hive-0.13.1 shim (v0.13).
    
    4. With this approach, nothing changes in the current hive-0.12 support.
    
    No change by default: sbt/sbt -Phive
    For example: sbt/sbt -Phive -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 assembly
    
    To enable hive-0.13: sbt/sbt -Dhive.version=0.13.1 
    For example: sbt/sbt -Dhive.version=0.13.1 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 assembly
    
    Note that with hive-0.13 the hive-thriftserver is not enabled; that should be fixed in a separate JIRA. Also, -Phive is not needed together with -Dhive.version when building (we should probably switch to -Phive -Dhive.version=xxx once the thrift server is supported on hive-0.13 as well).
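
    As a rough illustration of points 1-3 (a hypothetical sbt fragment, not the actual pom.xml/profile changes in this patch), the build can select the shim source tree from the hive.version property along these lines:

    ```scala
    // Hypothetical sbt (Scala) sketch: compile exactly one shim directory,
    // chosen by -Dhive.version and defaulting to the hive-0.12.0 shim.
    unmanagedSourceDirectories in Compile += {
      val hiveVersion = sys.props.getOrElse("hive.version", "0.12.0")
      val shimDir = if (hiveVersion.startsWith("0.13")) "v0.13" else "v0.12"
      baseDirectory.value / shimDir / "src" / "main" / "scala"
    }
    ```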

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhzhan/spark spark-2706

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2241.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2241
    
----
commit 7d5fce2f1dad4a4f42c21b82e8feabf3dbe50903
Author: Zhan Zhang <zh...@gmail.com>
Date:   2014-08-08T17:47:18Z

    test

commit 42585ecfce299d0bda54bd169daba944263e59d3
Author: Zhan Zhang <zh...@gmail.com>
Date:   2014-08-08T17:47:18Z

    test

commit 70ffd9398a5d1d5196ec027402bfc8990c07fef3
Author: Zhan Zhang <zh...@gmail.com>
Date:   2014-08-08T17:57:06Z

    revert

commit fe0f379bade257bb69b610328ce042d8850c26bc
Author: Zhan Zhang <zh...@gmail.com>
Date:   2014-08-08T18:01:12Z

    Merge branch 'master' of https://github.com/zhzhan/spark

commit 70964fe6e3ee952d3b8301cc17bad83804fe492f
Author: Zhan Zhang <zh...@gmail.com>
Date:   2014-08-08T18:02:01Z

    revert

commit dbedff3e6fbb1c6558ce3047cf60edbc43ffa02a
Author: Zhan Zhang <zh...@gmail.com>
Date:   2014-08-29T07:14:35Z

    Merge remote-tracking branch 'upstream/master'

commit ba14f283d8a68cc90ee828eee5dcc9a95e6dd5a4
Author: Zhan Zhang <zh...@gmail.com>
Date:   2014-08-08T17:47:18Z

    test

commit f6a8a40baa588539727d2cd450f6b638756b9b29
Author: Zhan Zhang <zh...@gmail.com>
Date:   2014-08-08T17:57:06Z

    revert

commit cb53a2cc4b25878c9ce1e8629f13739d0e26b2bf
Author: Zhan Zhang <zh...@gmail.com>
Date:   2014-08-30T18:01:37Z

    Merge branch 'master' of https://github.com/apache/spark

commit 789ea21617178ea039b306070a69fcc9c807a053
Author: Zhan Zhang <zh...@gmail.com>
Date:   2014-09-02T20:31:23Z

    Merge branch 'master' of https://github.com/apache/spark

commit f896b2a5df7f914ba81140e0becdeb0c92c7eafe
Author: Zhan Zhang <zh...@gmail.com>
Date:   2014-09-02T20:31:58Z

    Merge branch 'master' into spark-2706

commit 921e914083929aeccf95c7c6876fe182ccc9ee74
Author: Zhan Zhang <zh...@gmail.com>
Date:   2014-09-02T23:07:30Z

    Merge branch 'master' of https://github.com/apache/spark

commit 87ebf3b2ec3fdf81c2188c0be46452d7621508e4
Author: Zhan Zhang <zh...@gmail.com>
Date:   2014-09-02T23:08:17Z

    Merge branch 'master' into spark-2706

commit 94b4fdc8600cc6d5d4fe338e293be74081a0e2d4
Author: Zhan Zhang <zh...@gmail.com>
Date:   2014-09-02T23:28:33Z

    Spark-2706: hive-0.13.1 support on spark

----




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-59888325
  
    @scwf I am wondering how you handle decimal support, since hive-0.13 has new semantics for this type.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-57394068
  
    @pwendell Thanks a lot. But I need some confirmation from you.
    1. Is it a long-term decision to have the customized Hive 0.13 repo org.spark-project.hive?

    2. Do you want to support both Hive 0.13.0 and 0.13.1, or just 0.13.0?
    My current patch is against Hive 0.13.1, while the package you provided is 0.13.0, and there are API compatibility issues between the two. I can provide a 0.13.0 patch, but this approach would then need a separate package for each of 0.13.0 and 0.13.1 if we want to support both.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18782903
  
    --- Diff: docs/_config.yml ---
    @@ -1,7 +1,5 @@
     highlighter: pygments
     markdown: kramdown
    -gems:
    -  - jekyll-redirect-from
    --- End diff --
    
    This particular change we should revert - I accidentally added this from my branch




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54247051
  
    Are the failed test cases in the thrift server and the Python script false positives? My local test without this change also fails the thrift server cases. Please comment.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18430042
  
    --- Diff: sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,84 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import java.net.URI
    +import java.util.Properties
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.common.`type`.HiveDecimal
    +import org.apache.hadoop.hive.conf.HiveConf
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.hive.ql.metadata.{Hive, Partition, Table}
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import org.apache.hadoop.hive.ql.processors._
    +import org.apache.hadoop.hive.ql.stats.StatsSetupConst
    +import org.apache.hadoop.hive.serde2.{Deserializer, ColumnProjectionUtils}
    +import org.apache.hadoop.{io => hadoopIo}
    +import org.apache.hadoop.mapred.InputFormat
    +import scala.collection.JavaConversions._
    +import scala.language.implicitConversions
    +
    +/**
    + * A compatibility layer for interacting with Hive version 0.12.0.
    + */
    +private[hive] object HiveShim {
    +  val version = "0.12.0"
    +  val metastoreDecimal = "decimal"
    +
    +  def getTableDesc(
    +    serdeClass: Class[_ <: Deserializer],
    +    inputFormatClass: Class[_ <: InputFormat[_, _]],
    +    outputFormatClass: Class[_],
    +    properties: Properties) = {
    +    new TableDesc(serdeClass, inputFormatClass, outputFormatClass, properties)
    --- End diff --
    
    Is it necessary?




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by qiaohaijun <gi...@git.apache.org>.
Github user qiaohaijun commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-60730251
  
    I will try it




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-59887284
  
    @scwf I mean the "change some *.ql" you already did. The problem is that it needs another layer to take care of the compatibility test suite, and I have not found a good way to do that yet. I will think again about whether there is a simple way to make it work.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17807348
  
    --- Diff: sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,85 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import java.util.Properties
    +import scala.language.implicitConversions
    +import org.apache.hadoop.hive.ql.metadata.Partition
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import scala.collection.JavaConversions._
    +import org.apache.hadoop.hive.serde2.{Deserializer, ColumnProjectionUtils}
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.common.`type`.HiveDecimal
    +import java.net.URI
    +import org.apache.hadoop.{io => hadoopIo}
    +import org.apache.hadoop.hive.ql.stats.StatsSetupConst
    +import org.apache.hadoop.mapred.InputFormat
    +import org.apache.hadoop.hive.ql.metadata.Hive
    +import org.apache.hadoop.hive.ql.metadata.Table
    +import org.apache.hadoop.hive.ql.processors._
    +import org.apache.hadoop.hive.conf.HiveConf
    +
    +/*hive-0.12.0 support shimmer layer*/
    +object HiveShim {
    +  val version = "0.12.0"
    +  val metastoreDecimal = "decimal"
    +  def getTableDesc(serdeClass: Class[_ <: Deserializer], inputFormatClass: Class[_ <: InputFormat[_, _]], outputFormatClass: Class[_], properties: Properties) = {
    +    new TableDesc(serdeClass, inputFormatClass, outputFormatClass, properties)
    +  }
    +  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
    +  def createDefaultDBIfNeeded(context: HiveContext) ={  }
    +
    +  /*handle the difference in "None" and empty ""*/
    +  def getEmptyCommentsFieldValue = "None"
    +
    +  def convertCatalystString2Hive(s: String) = new hadoopIo.Text(s) // TODO why should be Text?
    --- End diff --
    
    Do we know the reason for using `Text` here? If it is a hive-0.12-specific case, shall we update the comment?




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17812859
  
    --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,147 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
    +import scala.language.implicitConversions
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
    +import scala.collection.JavaConversions._
    +import org.apache.spark.Logging
    +import org.apache.hadoop.hive.serde2.ColumnProjectionUtils
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.hive.ql.metadata.Partition
    +import org.apache.hadoop.{io => hadoopIo}
    +import org.apache.hadoop.hive.conf.HiveConf
    +import org.apache.hadoop.hive.common.StatsSetupConst
    +import org.apache.hadoop.mapred.InputFormat
    +import java.util.Properties
    +import org.apache.hadoop.hive.serde2.Deserializer
    +
    +/*hive-0.13.1 support shimmer layer*/
    +object HiveShim {
    +  val version = "0.13.1"
    +  /*
    +   * hive-0.13 support DECIMAL(precision, scale), DECIMAL in hive-0.12 is actually DECIMAL(10,0)
    +   * Full support of new decimal feature need to be fixed in seperate PR.
    +   */
    +  val metastoreDecimal = "decimal(10,0)"
    +  def getTableDesc(serdeClass: Class[_ <: Deserializer], inputFormatClass: Class[_ <: InputFormat[_, _]], outputFormatClass: Class[_], properties: Properties) = {
    +    new TableDesc(inputFormatClass, outputFormatClass, properties)
    +  }
    +  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
    --- End diff --
    
    Insert newlines between all these methods.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18430600
  
    --- Diff: sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,84 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import java.net.URI
    +import java.util.Properties
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.common.`type`.HiveDecimal
    +import org.apache.hadoop.hive.conf.HiveConf
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.hive.ql.metadata.{Hive, Partition, Table}
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import org.apache.hadoop.hive.ql.processors._
    +import org.apache.hadoop.hive.ql.stats.StatsSetupConst
    +import org.apache.hadoop.hive.serde2.{Deserializer, ColumnProjectionUtils}
    +import org.apache.hadoop.{io => hadoopIo}
    +import org.apache.hadoop.mapred.InputFormat
    +import scala.collection.JavaConversions._
    +import scala.language.implicitConversions
    +
    +/**
    + * A compatibility layer for interacting with Hive version 0.12.0.
    + */
    +private[hive] object HiveShim {
    +  val version = "0.12.0"
    +  val metastoreDecimal = "decimal"
    +
    +  def getTableDesc(
    +    serdeClass: Class[_ <: Deserializer],
    +    inputFormatClass: Class[_ <: InputFormat[_, _]],
    +    outputFormatClass: Class[_],
    +    properties: Properties) = {
    +    new TableDesc(serdeClass, inputFormatClass, outputFormatClass, properties)
    +  }
    +
    +  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
    +
    +  def createDefaultDBIfNeeded(context: HiveContext) ={  }
    +
    +  /** The string used to denote an empty comments field in the schema. */
    +  def getEmptyCommentsFieldValue = "None"
    +
    +  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
    +    CommandProcessorFactory.get(cmd(0), conf)
    +  }
    +
    +  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
    +    new HiveDecimal(bd)
    +  }
    +
    +  def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: Seq[String]) {
    +    ColumnProjectionUtils.appendReadColumnIDs(conf, ids)
    +    ColumnProjectionUtils.appendReadColumnNames(conf, names)
    +  }
    +
    +  def getExternalTmpPath(context: Context, uri: URI): String = {
    --- End diff --
    
    It would be good to make the return type consistent between the two shims.
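
    For illustration, one way the two shims could line up (a sketch, assuming Hive 0.12's Context.getExternalTmpFileURI(URI): String and Hive 0.13's Context.getExternalTmpPath(URI): Path):

    ```scala
    // v0.12.0 shim: Hive 0.12's API already returns a String
    def getExternalTmpPath(context: Context, uri: URI): String =
      context.getExternalTmpFileURI(uri)

    // v0.13.1 shim: Hive 0.13 returns a Path; converting to String here
    // keeps the two shims' signatures identical
    def getExternalTmpPath(context: Context, uri: URI): String =
      context.getExternalTmpPath(uri).toString
    ```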




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54391969
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19707/consoleFull) for   PR 2241 at commit [`94b4fdc`](https://github.com/apache/spark/commit/94b4fdc8600cc6d5d4fe338e293be74081a0e2d4).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class SparkListenerBlockManagerAdded(time: Long, blockManagerId: BlockManagerId, maxMem: Long)`
      * `case class SparkListenerBlockManagerRemoved(time: Long, blockManagerId: BlockManagerId)`
      * `case class SparkListenerApplicationStart(appName: String, appId: Option[String], time: Long,`
      * `  implicit class wrapperToPartition(p: Partition) `
      * `  implicit class wrapperToHive(client: Hive) `
      * `class ShimContext(conf: Configuration) extends Context(conf) `
      * `class ShimFileSinkDesc(var dir: String, var tableInfo: TableDesc, var compressed: Boolean)`
      * `  implicit class wrapperToPartition(p: Partition) `
      * `class ShimContext(conf: Configuration) extends Context(conf) `
      * `class ShimFileSinkDesc(var dir: String, var tableInfo: TableDesc, var compressed: Boolean)`





[GitHub] spark pull request: Spark 2706

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54233174
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-59889646
  
    @marmbrus FYI: I ran the compatibility tests. So far the major outstanding issues are, first, decimal support and, second, udf7 and udf_round, which can be fixed, though I am not 100% sure it is the right way. Most of the other failures are false positives and should be resolved by regenerating the golden answers.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17806225
  
    --- Diff: sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,85 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import java.util.Properties
    +import scala.language.implicitConversions
    +import org.apache.hadoop.hive.ql.metadata.Partition
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import scala.collection.JavaConversions._
    +import org.apache.hadoop.hive.serde2.{Deserializer, ColumnProjectionUtils}
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.common.`type`.HiveDecimal
    +import java.net.URI
    +import org.apache.hadoop.{io => hadoopIo}
    +import org.apache.hadoop.hive.ql.stats.StatsSetupConst
    +import org.apache.hadoop.mapred.InputFormat
    +import org.apache.hadoop.hive.ql.metadata.Hive
    +import org.apache.hadoop.hive.ql.metadata.Table
    +import org.apache.hadoop.hive.ql.processors._
    +import org.apache.hadoop.hive.conf.HiveConf
    +
    +/*hive-0.12.0 support shimmer layer*/
    +object HiveShim {
    +  val version = "0.12.0"
    +  val metastoreDecimal = "decimal"
    +  def getTableDesc(serdeClass: Class[_ <: Deserializer], inputFormatClass: Class[_ <: InputFormat[_, _]], outputFormatClass: Class[_], properties: Properties) = {
    --- End diff --
    
    New line here and this is over 100 chars.
    ```scala
      def getTableDesc(
        serdeClass: Class[_ <: Deserializer],
        inputFormatClass: Class[_ <: InputFormat[_, _]],
        outputFormatClass: Class[_],
        properties: Properties) = {
    ```




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17832655
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
    @@ -59,7 +60,7 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
         val table = client.getTable(databaseName, tblName)
         val partitions: Seq[Partition] =
           if (table.isPartitioned) {
    -        client.getAllPartitionsForPruner(table).toSeq
    +        client.getAllPartitionsOf(table).toSeq
    --- End diff --
    
    This is just a name change in Hive, with no functionality change. Please refer to HIVE-5483.
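
    For reference, a sketch of how the shim layer can absorb this rename so both builds compile (the v0.13.1 half matches the shim quoted elsewhere in this thread; the v0.12.0 half uses the pre-HIVE-5483 method name from the diff above):

    ```scala
    // v0.12.0 shim: delegate to the old name
    def getAllPartitionsOf(client: Hive, tbl: Table) =
      client.getAllPartitionsForPruner(tbl)

    // v0.13.1 shim: delegate to the renamed API
    def getAllPartitionsOf(client: Hive, tbl: Table) =
      client.getAllPartitionsOf(tbl)
    ```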




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54244886
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19616/consoleFull) for   PR 2241 at commit [`94b4fdc`](https://github.com/apache/spark/commit/94b4fdc8600cc6d5d4fe338e293be74081a0e2d4).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  implicit class wrapperToPartition(p: Partition) `
      * `  implicit class wrapperToHive(client: Hive) `
      * `class ShimContext(conf: Configuration) extends Context(conf) `
      * `class ShimFileSinkDesc(var dir: String, var tableInfo: TableDesc, var compressed: Boolean)`
      * `  implicit class wrapperToPartition(p: Partition) `
      * `class ShimContext(conf: Configuration) extends Context(conf) `
      * `class ShimFileSinkDesc(var dir: String, var tableInfo: TableDesc, var compressed: Boolean)`





[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17807897
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
    @@ -140,7 +141,7 @@ class HadoopTableReader(
           filterOpt: Option[PathFilter]): RDD[Row] = {
         val hivePartitionRDDs = partitionToDeserializer.map { case (partition, partDeserializer) =>
           val partDesc = Utilities.getPartitionDesc(partition)
    -      val partPath = partition.getPartitionPath
    +      val partPath = partition.getDataLocationPath
    --- End diff --
    
    Why?




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-59880350
  
    @scwf The golden answers differ between hive-0.12 and hive-0.13. We need some extra shim layer to handle that.
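
    (For illustration, one shape such a layer could take; the paths and helper name here are hypothetical:)

    ```scala
    import java.io.File

    // Hypothetical sketch: prefer a per-Hive-version golden answer file when
    // one exists, falling back to the shared one.
    def goldenAnswerFile(query: String): File = {
      val versioned = new File(s"golden-${HiveShim.version}/$query")
      if (versioned.exists()) versioned else new File(s"golden/$query")
    }
    ```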




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18430557
  
    --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,158 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import java.util.Properties
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.common.StatsSetupConst
    +import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
    +import org.apache.hadoop.hive.conf.HiveConf
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
    +import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
    +import org.apache.hadoop.mapred.InputFormat
    +import org.apache.spark.Logging
    +import org.apache.hadoop.{io => hadoopIo}
    +import scala.collection.JavaConversions._
    +import scala.language.implicitConversions
    +
    +/**
    + * A compatibility layer for interacting with Hive version 0.13.1.
    + */
    +private[hive] object HiveShim {
    +  val version = "0.13.1"
    +  /*
    +   * TODO: hive-0.13 support DECIMAL(precision, scale), DECIMAL in hive-0.12 is actually DECIMAL(10,0)
    +   * Full support of new decimal feature need to be fixed in seperate PR.
    +   */
    +  val metastoreDecimal = "decimal(10,0)"
    --- End diff --
    
    Let's say we connect to an existing hive 0.13 metastore. If there is a decimal column with a user-defined precision and scale, will we see a parsing error?




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18429712
  
    --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,158 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import java.util.Properties
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.common.StatsSetupConst
    +import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
    +import org.apache.hadoop.hive.conf.HiveConf
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
    +import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
    +import org.apache.hadoop.mapred.InputFormat
    +import org.apache.spark.Logging
    +import org.apache.hadoop.{io => hadoopIo}
    +import scala.collection.JavaConversions._
    +import scala.language.implicitConversions
    +
    +/**
    + * A compatibility layer for interacting with Hive version 0.13.1.
    + */
    +private[hive] object HiveShim {
    +  val version = "0.13.1"
    +  /*
    +   * TODO: hive-0.13 support DECIMAL(precision, scale), DECIMAL in hive-0.12 is actually DECIMAL(10,0)
    +   * Full support of new decimal feature need to be fixed in seperate PR.
    +   */
    +  val metastoreDecimal = "decimal(10,0)"
    +
    +  def getTableDesc(
    +    serdeClass: Class[_ <: Deserializer],
    +    inputFormatClass: Class[_ <: InputFormat[_, _]],
    +    outputFormatClass: Class[_],
    +    properties: Properties) = {
    +    new TableDesc(inputFormatClass, outputFormatClass, properties)
    +  }
    +
    +  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
    +
    +  def createDefaultDBIfNeeded(context: HiveContext) ={
    +    context.runSqlHive("CREATE DATABASE default")
    +    context.runSqlHive("USE default")
    +  }
    +
    +  /** The string used to denote an empty comments field in the schema. */
    +  def getEmptyCommentsFieldValue = ""
    +
    +  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
    +    CommandProcessorFactory.get(cmd, conf)
    +  }
    +
    +  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
    +    HiveDecimal.create(bd)
    +  }
    +
    +  /*
    +   * This function in hive-0.13 become private, but we have to do this to walkaround hive bug
    +   */
    +  private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
    +    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
    +    val result: StringBuilder = new StringBuilder(old)
    +    var first: Boolean = old.isEmpty
    +
    +    for (col <- cols) {
    +      if (first) {
    +        first = false
    +      }
    +      else {
    +        result.append(',')
    +      }
    +      result.append(col)
    +    }
    +    conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, result.toString)
    +  }
    +
    +  /*
    +   * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids is null or empty
    +   */
    +  def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: Seq[String]) {
    +    if (ids != null && ids.size > 0) {
    +      ColumnProjectionUtils.appendReadColumns(conf, ids)
    +    }
    +    appendReadColumnNames(conf, names)
    +  }
    +
    +  def getExternalTmpPath(context: Context, path: Path) = {
    +    context.getExternalTmpPath(path.toUri)
    +  }
    +
    +  def getDataLocationPath(p: Partition) = p.getDataLocation
    +
    +  def getAllPartitionsOf(client: Hive, tbl: Table) =  client.getAllPartitionsOf(tbl)
    +
    +  /*
    +   * Bug introdiced in hive-0.13. FileSinkDesc is serilizable, but its member path is not.
    +   * Fix it through wrapper.
    +   * */
    +  implicit def wrapperToFileSinkDesc(w: ShimFileSinkDesc): FileSinkDesc = {
    +        var f = new FileSinkDesc(new Path(w.dir), w.tableInfo, w.compressed)
    +        f.setCompressed(w.compressed)
    --- End diff --
    
    Seems it is not necessary.
    
    Also, 2 space indentation.
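
    A minimal sketch of the trimmed-down conversion being suggested (assuming the three-argument FileSinkDesc constructor already records the compressed flag, which is what makes the extra call redundant):

    ```scala
    implicit def wrapperToFileSinkDesc(w: ShimFileSinkDesc): FileSinkDesc =
      // the constructor already captures w.compressed, so no setCompressed call
      new FileSinkDesc(new Path(w.dir), w.tableInfo, w.compressed)
    ```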




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-57420122
  
    Hey @zhzhan, thanks for pointing that out; I didn't realize there were major changes. I think it's fine to just support Hive 0.13.1.
    
    I went ahead and published a modified version of 0.13.1 here:
    
    https://oss.sonatype.org/content/repositories/orgspark-project-1079/org/spark-project/hive/




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17809388
  
    --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,147 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
    +import scala.language.implicitConversions
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
    +import scala.collection.JavaConversions._
    +import org.apache.spark.Logging
    +import org.apache.hadoop.hive.serde2.ColumnProjectionUtils
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.hive.ql.metadata.Partition
    +import org.apache.hadoop.{io => hadoopIo}
    +import org.apache.hadoop.hive.conf.HiveConf
    +import org.apache.hadoop.hive.common.StatsSetupConst
    +import org.apache.hadoop.mapred.InputFormat
    +import java.util.Properties
    +import org.apache.hadoop.hive.serde2.Deserializer
    +
    +/*hive-0.13.1 support shimmer layer*/
    +object HiveShim {
    +  val version = "0.13.1"
    +  /*
    +   * hive-0.13 support DECIMAL(precision, scale), DECIMAL in hive-0.12 is actually DECIMAL(10,0)
    +   * Full support of new decimal feature need to be fixed in seperate PR.
    +   */
    +  val metastoreDecimal = "decimal(10,0)"
    +  def getTableDesc(serdeClass: Class[_ <: Deserializer], inputFormatClass: Class[_ <: InputFormat[_, _]], outputFormatClass: Class[_], properties: Properties) = {
    +    new TableDesc(inputFormatClass, outputFormatClass, properties)
    +  }
    +  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
    +  def createDefaultDBIfNeeded(context: HiveContext) ={
    +    context.runSqlHive("CREATE DATABASE default")
    +    context.runSqlHive("USE default")
    +  }
    +  /*handle the difference in HiveQuerySuite*/
    +  def getEmptyCommentsFieldValue = ""
    +
    +  def convertCatalystString2Hive(s: String) = s
    +
    +  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
    +    CommandProcessorFactory.get(cmd, conf)
    +  }
    +
    +  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
    +    HiveDecimal.create(bd)
    +  }
    +
    +  /*
    +   * This function in hive-0.13 become private, but we have to do this to walkaround hive bug
    +   * */
    +  private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
    +    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
    +    val result: StringBuilder = new StringBuilder(old)
    +    var first: Boolean = old.isEmpty
    +
    +    for (col <- cols) {
    +      if (first) {
    +        first = false
    +      }
    +      else {
    +        result.append(',')
    +      }
    +      result.append(col)
    +    }
    +    conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, result.toString)
    +  }
    +
    +  /*
    +   * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids is null or empty
    +   * */
    +  def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: Seq[String]) {
    +    if (ids != null && ids.size > 0) {
    +      ColumnProjectionUtils.appendReadColumns(conf, ids)
    +    } else {
    +      appendReadColumnNames(conf, names)
    --- End diff --
    
    Why only in the else branch?




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18430590
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ---
    @@ -557,11 +557,14 @@ class HiveQuerySuite extends HiveComparisonTest {
               |WITH serdeproperties('s1'='9')
             """.stripMargin)
         }
    -    sql(s"ADD JAR $testJar")
    -    sql(
    -      """ALTER TABLE alter1 SET SERDE 'org.apache.hadoop.hive.serde2.TestSerDe'
    -        |WITH serdeproperties('s1'='9')
    -      """.stripMargin)
    +    // now only verify 0.12.0, and ignore other versions due to binary compatability
    --- End diff --
    
    Can you explain it a little bit more?




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-55204266
  
    Thanks for reviewing the code. Due to the special package requirements of the hive-0.12.0 support, and some internal behavior changes in Hive 0.13, I have not found a good way to use reflection to dynamically support different Hive versions.

    I agree that the current patch may be invasive in its build changes for supporting different Hive versions in the future. I have changed the profile to pick up the src directory automatically based on the user-specified version, so that if a new version is supported in the future, there is no need to change the pom any more.

    In the meantime, it may be possible to support different Hive versions with reflection in the future (a rough sketch follows at the end of this message); if so, the current patch should be adapted to that approach.

    More development work is needed on the Spark side to fully utilize the new hive-0.13 features; the purpose of this patch is to let users try it out and to jumpstart the effort in the community.

    Please review the code and let me know your feedback.
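
    For illustration only, a minimal sketch of what that reflection-based dispatch could look like (the trait and class names here are hypothetical, not part of this patch):

    ```scala
    // Hypothetical sketch: resolve a version-specific shim at runtime via
    // reflection, instead of selecting a shim source directory at compile time.
    trait HiveShims { def version: String }

    def loadShim(hiveVersion: String): HiveShims = {
      val className =
        if (hiveVersion.startsWith("0.12")) "org.apache.spark.sql.hive.Shim_v0_12"
        else "org.apache.spark.sql.hive.Shim_v0_13"
      Class.forName(className).newInstance().asInstanceOf[HiveShims]
    }
    ```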




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-57939048
  
    I tested with @pwendell's shaded hive-0.13.1 and also hit this problem:
    Exception in thread "main" java.lang.ClassNotFoundException: com.google.protobuf_spark.GeneratedMessage

    Hi @pwendell, I think we need to regenerate OrcProto.java. There is ```org.spark_project.protobuf.ExtensionRegistry``` in
    https://github.com/pwendell/hive/blob/0.13-shaded-protobuf/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java#L9

    Should it be ```com.google.protobuf_spark.ExtensionRegistry registry```?


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54384092
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19707/consoleFull) for   PR 2241 at commit [`94b4fdc`](https://github.com/apache/spark/commit/94b4fdc8600cc6d5d4fe338e293be74081a0e2d4).
     * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18944242
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
    @@ -288,23 +290,26 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
           val cmd_trimmed: String = cmd.trim()
           val tokens: Array[String] = cmd_trimmed.split("\\s+")
           val cmd_1: String = cmd_trimmed.substring(tokens(0).length()).trim()
    -      val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), hiveconf)
    +      val proc: CommandProcessor = HiveShim.getCommandProcessor(Array(tokens(0)), hiveconf)
     
           proc match {
             case driver: Driver =>
    -          driver.init()
    -
    -          val results = new JArrayList[String]
    +          val results = HiveShim.createDriverResultsArray
               val response: CommandProcessorResponse = driver.run(cmd)
               // Throw an exception if there is an error in query processing.
               if (response.getResponseCode != 0) {
    -            driver.destroy()
    +            driver.close()
                 throw new QueryExecutionException(response.getErrorMessage)
               }
               driver.setMaxRows(maxRows)
               driver.getResults(results)
    -          driver.destroy()
    -          results
    +          driver.close()
    +          results.map { r =>
    +            r match {
    +              case s: String => s
    +              case o => o.toString
    --- End diff --
    
    Here ```r``` may be an ```Array``` type (https://github.com/scwf/hive/blob/branch-0.13/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchFormatter.java#L53-L64); we should cover that case, otherwise the console result will be printed as follows:
    ```
    [Object@5e41108b
    ```
    
    On the other hand, I suggest we run some tests with this PR merged with #2685 to check the basic functionality.
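    
    For reference, a minimal sketch (not from this patch) of the kind of normalization being suggested, assuming a row may be either a String or an Array[Object] depending on the FetchFormatter in use:
    
    ```scala
    // Hypothetical helper, not from the patch: render one driver result row
    // as a single line, whether it is a String or an Array[Object].
    def normalizeRow(row: AnyRef): String = row match {
      case s: String => s
      case a: Array[Object] => a.mkString("\t") // array rows joined by tabs
      case o => o.toString                      // fallback for anything else
    }
    ```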
        
    
     


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54726521
  
    test this please


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18429781
  
    --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,158 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import java.util.Properties
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.common.StatsSetupConst
    +import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
    +import org.apache.hadoop.hive.conf.HiveConf
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
    +import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
    +import org.apache.hadoop.mapred.InputFormat
    +import org.apache.spark.Logging
    +import org.apache.hadoop.{io => hadoopIo}
    +import scala.collection.JavaConversions._
    +import scala.language.implicitConversions
    +
    +/**
    + * A compatibility layer for interacting with Hive version 0.13.1.
    + */
    +private[hive] object HiveShim {
    +  val version = "0.13.1"
    +  /*
    +   * TODO: hive-0.13 supports DECIMAL(precision, scale); DECIMAL in hive-0.12 is actually DECIMAL(10,0).
    +   * Full support of the new decimal feature needs to be addressed in a separate PR.
    +   */
    +  val metastoreDecimal = "decimal(10,0)"
    +
    +  def getTableDesc(
    +    serdeClass: Class[_ <: Deserializer],
    +    inputFormatClass: Class[_ <: InputFormat[_, _]],
    +    outputFormatClass: Class[_],
    +    properties: Properties) = {
    +    new TableDesc(inputFormatClass, outputFormatClass, properties)
    +  }
    +
    +  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
    +
    +  def createDefaultDBIfNeeded(context: HiveContext) ={
    +    context.runSqlHive("CREATE DATABASE default")
    +    context.runSqlHive("USE default")
    +  }
    +
    +  /** The string used to denote an empty comments field in the schema. */
    +  def getEmptyCommentsFieldValue = ""
    +
    +  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
    +    CommandProcessorFactory.get(cmd, conf)
    +  }
    +
    +  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
    +    HiveDecimal.create(bd)
    +  }
    +
    +  /*
    +   * This function became private in hive-0.13, but we have to duplicate it to work around a hive bug.
    +   */
    +  private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
    +    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
    +    val result: StringBuilder = new StringBuilder(old)
    +    var first: Boolean = old.isEmpty
    +
    +    for (col <- cols) {
    +      if (first) {
    +        first = false
    +      }
    +      else {
    --- End diff --
    
    ```
    if () {
      ...
    } else {
      ...
    }
    ```


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-57011211
  
    @zhzhan Thank you for updating the PR. Can you let me know how to set up the local dependencies? I'd like to try it. Thanks :)


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18430657
  
    --- Diff: sql/hive/pom.xml ---
    @@ -119,6 +83,74 @@
     
       <profiles>
         <profile>
    +      <id>hive-default</id>
    +      <activation>
    +        <property>
    +          <name>!hive.version</name>
    --- End diff --
    
    If we use the modified hive dependencies, can we avoid this error and simplify the pom changes?


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17805302
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
    @@ -292,24 +294,20 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
           val cmd_trimmed: String = cmd.trim()
           val tokens: Array[String] = cmd_trimmed.split("\\s+")
           val cmd_1: String = cmd_trimmed.substring(tokens(0).length()).trim()
    -      val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), hiveconf)
    -
    -      SessionState.start(sessionState)
    +      val proc: CommandProcessor = HiveShim.getCommandProcessor(Array(tokens(0)), hiveconf)
     
           proc match {
             case driver: Driver =>
    -          driver.init()
    -
               val results = new JArrayList[String]
               val response: CommandProcessorResponse = driver.run(cmd)
               // Throw an exception if there is an error in query processing.
               if (response.getResponseCode != 0) {
    -            driver.destroy()
    +            driver.close
    --- End diff --
    
    Use `()` on all methods that have side-effects.
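    
    For example (hypothetical names, purely to illustrate the convention):
    
    ```scala
    driver.close()        // side-effecting method: keep the parentheses
    val v = shim.version  // pure accessor: no parentheses
    ```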


[GitHub] spark pull request: Spark 2706

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54236587
  
    ok to test


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54688393
  
    retest this please.


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54335802
  
    Not sure why it fails the pyspark tests. Can somebody help retest or provide some insight? By the way, I saw some other PRs with similar failures around that time.


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17814180
  
    --- Diff: sql/hive/pom.xml ---
    @@ -119,6 +83,74 @@
     
       <profiles>
         <profile>
    +      <id>hive-default</id>
    +      <activation>
    +        <property>
    +          <name>!hive.version</name>
    +        </property>
    +      </activation>
    +      <dependencies>
    +        <dependency>
    +          <groupId>com.twitter</groupId>
    +          <artifactId>parquet-hive-bundle</artifactId>
    +          <version>1.5.0</version>
    +        </dependency>
    +        <dependency>
    +          <groupId>org.spark-project.hive</groupId>
    --- End diff --
    
    @pwendell I guess this implies that we need to build shaded artifacts for all Hive versions?


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18179030
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
    @@ -284,24 +287,20 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
           val cmd_trimmed: String = cmd.trim()
           val tokens: Array[String] = cmd_trimmed.split("\\s+")
           val cmd_1: String = cmd_trimmed.substring(tokens(0).length()).trim()
    -      val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), hiveconf)
    -
    -      SessionState.start(sessionState)
    +      val proc: CommandProcessor = HiveShim.getCommandProcessor(Array(tokens(0)), hiveconf)
     
           proc match {
             case driver: Driver =>
    -          driver.init()
    -
               val results = new JArrayList[String]
               val response: CommandProcessorResponse = driver.run(cmd)
               // Throw an exception if there is an error in query processing.
               if (response.getResponseCode != 0) {
    -            driver.destroy()
    +            driver.close()
                 throw new QueryExecutionException(response.getErrorMessage)
               }
               driver.setMaxRows(maxRows)
               driver.getResults(results)
    -          driver.destroy()
    +          driver.close()
    --- End diff --
    
    A quick question since I ran into something similar when trying to run things on 0.13: can the driver be re-used after you call `close()`? Because I don't see the driver being removed from the CommandProcessorFactory cache, so another call to `runHive` will reuse the closed driver.
    
    (I ran into this when running the `HiveFromSpark` example compiled against Hive 0.13, so it should be easy to check.)
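    
    For reference, a sketch of one way to release the cached driver, assuming hive 0.13's `CommandProcessorFactory.clean(conf)` is available to drop the Driver cached for that conf:
    
    ```scala
    import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
    
    // Close the driver and also drop it from the CommandProcessorFactory cache,
    // so the next runHive call gets a fresh Driver rather than a closed one.
    driver.close()
    CommandProcessorFactory.clean(hiveconf)
    ```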


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-57242012
  
    @zhzhan I think those implicits are not necessary. Can you change those?


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-56212828
  
    This patch does not include the thrift server patch, which will be handled by other jiras, because I don't want the scope to be too big.


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-59880636
  
    @marmbrus I think he refers to https://github.com/apache/spark/pull/2499


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-58935643
  
    I think it is because the bundle has the package org.apache.hadoop.hive.ql.io.parquet, which is now also natively located in hive, resulting in package conflicts. The way I fixed it was to remove the parquet-hive-bundle dependency for hive-0.13.1. With this, I didn't see any problems with parquet anymore. Please comment.


[GitHub] spark pull request: Spark 2706

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54236416
  
    Would you mind changing the PR title to `[SPARK-2706][SQL] Enable Spark to support Hive 0.13`?


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-59884283
  
    @marmbrus in #2499, I regenerated the golden answers and changed some *.ql files because of the 0.13 changes; the tests passed on my local machine.
    @zhzhan I don't follow you: why replace the query plan?


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-59837979
  
    @scwf I did some basic functionality testing with your thrift patch, and it looks OK to me. By the way, because the 0.13.1 customized package is not available yet, I reverted the pom for testing.


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-59877085
  
    I think this is looking pretty good, but I'm not okay with merging it before the tests are passing for Hive 13.  Let me take a look and see how hard that will be.


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18429783
  
    --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,158 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import java.util.Properties
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.common.StatsSetupConst
    +import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
    +import org.apache.hadoop.hive.conf.HiveConf
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
    +import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
    +import org.apache.hadoop.mapred.InputFormat
    +import org.apache.spark.Logging
    +import org.apache.hadoop.{io => hadoopIo}
    +import scala.collection.JavaConversions._
    +import scala.language.implicitConversions
    +
    +/**
    + * A compatibility layer for interacting with Hive version 0.13.1.
    + */
    +private[hive] object HiveShim {
    +  val version = "0.13.1"
    +  /*
    +   * TODO: hive-0.13 supports DECIMAL(precision, scale); DECIMAL in hive-0.12 is actually DECIMAL(10,0).
    +   * Full support of the new decimal feature needs to be addressed in a separate PR.
    +   */
    +  val metastoreDecimal = "decimal(10,0)"
    +
    +  def getTableDesc(
    +    serdeClass: Class[_ <: Deserializer],
    +    inputFormatClass: Class[_ <: InputFormat[_, _]],
    +    outputFormatClass: Class[_],
    +    properties: Properties) = {
    +    new TableDesc(inputFormatClass, outputFormatClass, properties)
    +  }
    +
    +  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
    +
    +  def createDefaultDBIfNeeded(context: HiveContext) ={
    +    context.runSqlHive("CREATE DATABASE default")
    +    context.runSqlHive("USE default")
    +  }
    +
    +  /** The string used to denote an empty comments field in the schema. */
    +  def getEmptyCommentsFieldValue = ""
    +
    +  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
    +    CommandProcessorFactory.get(cmd, conf)
    +  }
    +
    +  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
    +    HiveDecimal.create(bd)
    +  }
    +
    +  /*
    +   * This function became private in hive-0.13, but we have to duplicate it to work around a hive bug.
    +   */
    +  private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
    +    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
    +    val result: StringBuilder = new StringBuilder(old)
    +    var first: Boolean = old.isEmpty
    +
    +    for (col <- cols) {
    +      if (first) {
    +        first = false
    +      }
    +      else {
    +        result.append(',')
    +      }
    +      result.append(col)
    +    }
    +    conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, result.toString)
    +  }
    +
    +  /*
    +   * Cannot use ColumnProjectionUtils.appendReadColumns directly when ids is null or empty.
    +   */
    +  def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: Seq[String]) {
    +    if (ids != null && ids.size > 0) {
    +      ColumnProjectionUtils.appendReadColumns(conf, ids)
    +    }
    +    appendReadColumnNames(conf, names)
    --- End diff --
    
    Why is there no null or empty check here?


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54384096
  
    One concern related to Hive Thrift server and CLI support. Currently we're using some ugly reflection hacks to hook into Hive because the related Hive code was not designed to be extensible for such scenarios ([here](https://github.com/apache/spark/blob/f2b5b619a9efee91573c0e546792e68e72afce21/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala#L85-L95), [here](https://github.com/apache/spark/blob/f2b5b619a9efee91573c0e546792e68e72afce21/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala#L63-L73) and [here](https://github.com/apache/spark/blob/f2b5b619a9efee91573c0e546792e68e72afce21/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLSessionManager.scala#L35-L48)). These hacks are brittle and depend heavily on the specific Hive implementation. Currently I don't have a thorough plan for this, but it should definitely be taken into account.


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54249438
  
    retest this please


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-58934179
  
    I see - so what exactly is the problem - is the issue that the Parquet serde is not compatible across versions?


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18433848
  
    --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,158 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import java.util.Properties
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.common.StatsSetupConst
    +import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
    +import org.apache.hadoop.hive.conf.HiveConf
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
    +import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
    +import org.apache.hadoop.mapred.InputFormat
    +import org.apache.spark.Logging
    +import org.apache.hadoop.{io => hadoopIo}
    +import scala.collection.JavaConversions._
    +import scala.language.implicitConversions
    +
    +/**
    + * A compatibility layer for interacting with Hive version 0.13.1.
    + */
    +private[hive] object HiveShim {
    +  val version = "0.13.1"
    +  /*
    +   * TODO: hive-0.13 supports DECIMAL(precision, scale); DECIMAL in hive-0.12 is actually DECIMAL(10,0).
    +   * Full support of the new decimal feature needs to be addressed in a separate PR.
    +   */
    +  val metastoreDecimal = "decimal(10,0)"
    +
    +  def getTableDesc(
    +    serdeClass: Class[_ <: Deserializer],
    +    inputFormatClass: Class[_ <: InputFormat[_, _]],
    +    outputFormatClass: Class[_],
    +    properties: Properties) = {
    +    new TableDesc(inputFormatClass, outputFormatClass, properties)
    +  }
    +
    +  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
    +
    +  def createDefaultDBIfNeeded(context: HiveContext) ={
    +    context.runSqlHive("CREATE DATABASE default")
    +    context.runSqlHive("USE default")
    +  }
    +
    +  /** The string used to denote an empty comments field in the schema. */
    +  def getEmptyCommentsFieldValue = ""
    +
    +  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
    +    CommandProcessorFactory.get(cmd, conf)
    +  }
    +
    +  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
    +    HiveDecimal.create(bd)
    +  }
    +
    +  /*
    +   * This function became private in hive-0.13, but we have to duplicate it to work around a hive bug.
    +   */
    +  private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
    +    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
    +    val result: StringBuilder = new StringBuilder(old)
    +    var first: Boolean = old.isEmpty
    +
    +    for (col <- cols) {
    +      if (first) {
    +        first = false
    +      }
    +      else {
    +        result.append(',')
    +      }
    +      result.append(col)
    +    }
    +    conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, result.toString)
    +  }
    +
    +  /*
    +   * Cannot use ColumnProjectionUtils.appendReadColumns directly when ids is null or empty.
    +   */
    +  def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: Seq[String]) {
    +    if (ids != null && ids.size > 0) {
    +      ColumnProjectionUtils.appendReadColumns(conf, ids)
    +    }
    +    appendReadColumnNames(conf, names)
    +  }
    +
    +  def getExternalTmpPath(context: Context, path: Path) = {
    +    context.getExternalTmpPath(path.toUri)
    +  }
    +
    +  def getDataLocationPath(p: Partition) = p.getDataLocation
    +
    +  def getAllPartitionsOf(client: Hive, tbl: Table) =  client.getAllPartitionsOf(tbl)
    +
    +  /*
    +   * Bug introduced in hive-0.13: FileSinkDesc is serializable, but its member path is not.
    +   * Fix it through a wrapper.
    --- End diff --
    
    Not sure how hive handles this issue, but FileSinkDesc is certainly not serializable with the Path member. The test suite also complains about this.
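    
    For illustration, a sketch of the wrapper idea, matching the `ShimFileSinkDesc` signature reported by the QA run on this PR; the body is an assumption, not the actual patch:
    
    ```scala
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.hive.ql.plan.TableDesc
    
    // Store the non-serializable Path as a plain String so the descriptor can
    // be shipped to executors, rebuilding the Path only when it is needed.
    class ShimFileSinkDesc(var dir: String, var tableInfo: TableDesc, var compressed: Boolean)
      extends Serializable {
      def getDirPath(): Path = new Path(dir) // hypothetical accessor
    }
    ```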


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-55317095
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/70/consoleFull) for   PR 2241 at commit [`128b60b`](https://github.com/apache/spark/commit/128b60b242d7e13e86b6b93e29d9884084a79a06).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  implicit class wrapperToPartition(p: Partition) `
      * `  implicit class wrapperToHive(client: Hive) `
      * `class ShimContext(conf: Configuration) extends Context(conf) `
      * `class ShimFileSinkDesc(var dir: String, var tableInfo: TableDesc, var compressed: Boolean)`
      * `  implicit class wrapperToPartition(p: Partition) `
      * `class ShimContext(conf: Configuration) extends Context(conf) `
      * `class ShimFileSinkDesc(var dir: String, var tableInfo: TableDesc, var compressed: Boolean)`



[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18969029
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
    @@ -288,23 +290,26 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
           val cmd_trimmed: String = cmd.trim()
           val tokens: Array[String] = cmd_trimmed.split("\\s+")
           val cmd_1: String = cmd_trimmed.substring(tokens(0).length()).trim()
    -      val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), hiveconf)
    +      val proc: CommandProcessor = HiveShim.getCommandProcessor(Array(tokens(0)), hiveconf)
     
           proc match {
             case driver: Driver =>
    -          driver.init()
    -
    -          val results = new JArrayList[String]
    +          val results = HiveShim.createDriverResultsArray
               val response: CommandProcessorResponse = driver.run(cmd)
               // Throw an exception if there is an error in query processing.
               if (response.getResponseCode != 0) {
    -            driver.destroy()
    +            driver.close()
                 throw new QueryExecutionException(response.getErrorMessage)
               }
               driver.setMaxRows(maxRows)
               driver.getResults(results)
    -          driver.destroy()
    -          results
    +          driver.close()
    +          results.map { r =>
    +            r match {
    +              case s: String => s
    +              case o => o.toString
    --- End diff --
    
    @scwf Thanks for the review. The reason I did this is that in the hive testing code I didn't find a case where the result value is not Array[String]; the results are even initialized as Array[String] there. But for safety, I will change the code to process Array[Array[Object]].
    
    This patch is independent of the thrift server, but the thrift server patch should be verified after this one goes upstream, mainly due to the pom file change.


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17805665
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUdfs.scala ---
    @@ -110,7 +109,8 @@ private[hive] case class HiveSimpleUdf(functionClassName: String, children: Seq[
         val primitiveClasses = Seq(
           Integer.TYPE, classOf[java.lang.Integer], classOf[java.lang.String], java.lang.Double.TYPE,
           classOf[java.lang.Double], java.lang.Long.TYPE, classOf[java.lang.Long],
    -      classOf[HiveDecimal], java.lang.Byte.TYPE, classOf[java.lang.Byte],
    +      classOf[org.apache.hadoop.hive.common.`type`.HiveDecimal],
    +      java.lang.Byte.TYPE, classOf[java.lang.Byte],
    --- End diff --
    
    Is there a reason you changed this?


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18430322
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
    @@ -80,8 +81,10 @@ class StatisticsSuite extends QueryTest with BeforeAndAfterAll {
         sql("INSERT INTO TABLE analyzeTable SELECT * FROM src").collect()
         sql("INSERT INTO TABLE analyzeTable SELECT * FROM src").collect()
     
    -    assert(queryTotalSize("analyzeTable") === defaultSizeInBytes)
    -
    +    // TODO: How does this work? Needs to be added back for other hive versions.
    +    if (HiveShim.version == "0.12.0") {
    --- End diff --
    
    For Hive 0.13, will the table size always be updated after `INSERT INTO`?
    
    When we added this test, the table size was not updated with the `INSERT INTO` command.


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-58993918
  
    Hmmm Jenkins, test this please.


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17805505
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScan.scala ---
    @@ -23,7 +23,7 @@ import org.apache.hadoop.hive.common.`type`.{HiveDecimal, HiveVarchar}
     import org.apache.hadoop.hive.conf.HiveConf
     import org.apache.hadoop.hive.ql.metadata.{Partition => HivePartition}
     import org.apache.hadoop.hive.serde.serdeConstants
    -import org.apache.hadoop.hive.serde2.ColumnProjectionUtils
    +import org.apache.spark.sql.hive.HiveShim
    --- End diff --
    
    Sort the imports; this one should go below with the other Spark SQL imports.


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-58853725
  
    Hi @zhzhan and @scwf  - I made some changes to the build to simplify it a bit. I made a PR into your branch. I tested it locally compiling for 0.12 and 0.13, but it would be good if you tested it as well to make sure it works.
    
    https://github.com/zhzhan/spark/pull/1/files


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17805714
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
    @@ -79,9 +80,9 @@ class StatisticsSuite extends QueryTest with BeforeAndAfterAll {
         sql("CREATE TABLE analyzeTable (key STRING, value STRING)").collect()
         sql("INSERT INTO TABLE analyzeTable SELECT * FROM src").collect()
         sql("INSERT INTO TABLE analyzeTable SELECT * FROM src").collect()
    -
    -    assert(queryTotalSize("analyzeTable") === defaultSizeInBytes)
    -
    +    if (HiveShim.version.equals("0.12.0")) {
    +      assert(queryTotalSize("analyzeTable") === defaultSizeInBytes)
    --- End diff --
    
    Why is this version specific?


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17833159
  
    --- Diff: sql/hive/pom.xml ---
    @@ -119,6 +83,74 @@
     
       <profiles>
         <profile>
    +      <id>hive-default</id>
    +      <activation>
    +        <property>
    +          <name>!hive.version</name>
    +        </property>
    +      </activation>
    +      <dependencies>
    +        <dependency>
    +          <groupId>com.twitter</groupId>
    +          <artifactId>parquet-hive-bundle</artifactId>
    +          <version>1.5.0</version>
    +        </dependency>
    +        <dependency>
    +          <groupId>org.spark-project.hive</groupId>
    --- End diff --
    
    For hive-0.12, because there are version conflicts in protobuf, the hive jar has to be shaded. It is OK for hive-0.13.1, but overall the problem has to be fixed in hive; I was told that the packaging issue may be fixed in hive 0.14. Currently the primary hive version supported is hive 0.12, which uses the customized jar. After everything is sorted out and spark moves to support more recent hive versions, this part of the pom should be changed accordingly and stabilized. The current solution is not perfect, but it is hard to find a better way to solve this problem in one shot.


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18433803
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
    @@ -80,8 +81,10 @@ class StatisticsSuite extends QueryTest with BeforeAndAfterAll {
         sql("INSERT INTO TABLE analyzeTable SELECT * FROM src").collect()
         sql("INSERT INTO TABLE analyzeTable SELECT * FROM src").collect()
     
    -    assert(queryTotalSize("analyzeTable") === defaultSizeInBytes)
    -
    +    // TODO: How does this work? Needs to be added back for other hive versions.
    +    if (HiveShim.version == "0.12.0") {
    --- End diff --
    
    I don't quite understand this part, but in my testing the new value is always 11624. 


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17833192
  
    --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,147 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
    +import scala.language.implicitConversions
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
    +import scala.collection.JavaConversions._
    +import org.apache.spark.Logging
    +import org.apache.hadoop.hive.serde2.ColumnProjectionUtils
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.hive.ql.metadata.Partition
    +import org.apache.hadoop.{io => hadoopIo}
    +import org.apache.hadoop.hive.conf.HiveConf
    +import org.apache.hadoop.hive.common.StatsSetupConst
    +import org.apache.hadoop.mapred.InputFormat
    +import java.util.Properties
    +import org.apache.hadoop.hive.serde2.Deserializer
    +
    +/*hive-0.13.1 support shimmer layer*/
    +object HiveShim {
    +  val version = "0.13.1"
    +  /*
    +   * hive-0.13 support DECIMAL(precision, scale), DECIMAL in hive-0.12 is actually DECIMAL(10,0)
    +   * Full support of new decimal feature need to be fixed in seperate PR.
    +   */
    +  val metastoreDecimal = "decimal(10,0)"
    +  def getTableDesc(serdeClass: Class[_ <: Deserializer], inputFormatClass: Class[_ <: InputFormat[_, _]], outputFormatClass: Class[_], properties: Properties) = {
    +    new TableDesc(inputFormatClass, outputFormatClass, properties)
    +  }
    +  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
    +  def createDefaultDBIfNeeded(context: HiveContext) ={
    +    context.runSqlHive("CREATE DATABASE default")
    +    context.runSqlHive("USE default")
    +  }
    +  /*handle the difference in HiveQuerySuite*/
    +  def getEmptyCommentsFieldValue = ""
    +
    +  def convertCatalystString2Hive(s: String) = s
    +
    +  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
    +    CommandProcessorFactory.get(cmd, conf)
    +  }
    +
    +  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
    +    HiveDecimal.create(bd)
    +  }
    +
    +  /*
    +   * This function in hive-0.13 become private, but we have to do this to walkaround hive bug
    +   * */
    +  private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
    +    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
    +    val result: StringBuilder = new StringBuilder(old)
    +    var first: Boolean = old.isEmpty
    +
    +    for (col <- cols) {
    +      if (first) {
    +        first = false
    +      }
    +      else {
    +        result.append(',')
    +      }
    +      result.append(col)
    +    }
    +    conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, result.toString)
    +  }
    +
    +  /*
    +   * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids is null or empty
    +   * */
    +  def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: Seq[String]) {
    +    if (ids != null && ids.size > 0) {
    +      ColumnProjectionUtils.appendReadColumns(conf, ids)
    +    } else {
    +      appendReadColumnNames(conf, names)
    --- End diff --
    
    good catch


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17809366
  
    --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,147 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
    +import scala.language.implicitConversions
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
    +import scala.collection.JavaConversions._
    +import org.apache.spark.Logging
    +import org.apache.hadoop.hive.serde2.ColumnProjectionUtils
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.hive.ql.metadata.Partition
    +import org.apache.hadoop.{io => hadoopIo}
    +import org.apache.hadoop.hive.conf.HiveConf
    +import org.apache.hadoop.hive.common.StatsSetupConst
    +import org.apache.hadoop.mapred.InputFormat
    +import java.util.Properties
    +import org.apache.hadoop.hive.serde2.Deserializer
    +
    +/*hive-0.13.1 support shimmer layer*/
    +object HiveShim {
    +  val version = "0.13.1"
    +  /*
    +   * hive-0.13 support DECIMAL(precision, scale), DECIMAL in hive-0.12 is actually DECIMAL(10,0)
    +   * Full support of new decimal feature need to be fixed in seperate PR.
    +   */
    +  val metastoreDecimal = "decimal(10,0)"
    +  def getTableDesc(serdeClass: Class[_ <: Deserializer], inputFormatClass: Class[_ <: InputFormat[_, _]], outputFormatClass: Class[_], properties: Properties) = {
    +    new TableDesc(inputFormatClass, outputFormatClass, properties)
    +  }
    +  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
    +  def createDefaultDBIfNeeded(context: HiveContext) ={
    +    context.runSqlHive("CREATE DATABASE default")
    +    context.runSqlHive("USE default")
    +  }
    +  /*handle the difference in HiveQuerySuite*/
    +  def getEmptyCommentsFieldValue = ""
    +
    +  def convertCatalystString2Hive(s: String) = s
    +
    +  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
    +    CommandProcessorFactory.get(cmd, conf)
    +  }
    +
    +  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
    +    HiveDecimal.create(bd)
    +  }
    +
    +  /*
    +   * This function in hive-0.13 become private, but we have to do this to walkaround hive bug
    +   * */
    +  private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
    +    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
    +    val result: StringBuilder = new StringBuilder(old)
    +    var first: Boolean = old.isEmpty
    +
    +    for (col <- cols) {
    +      if (first) {
    +        first = false
    +      }
    +      else {
    +        result.append(',')
    +      }
    +      result.append(col)
    +    }
    +    conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, result.toString)
    +  }
    +
    +  /*
    +   * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids is null or empty
    --- End diff --
    
    Why?


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18433810
  
    --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,158 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import java.util.Properties
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.common.StatsSetupConst
    +import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
    +import org.apache.hadoop.hive.conf.HiveConf
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
    +import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
    +import org.apache.hadoop.mapred.InputFormat
    +import org.apache.spark.Logging
    +import org.apache.hadoop.{io => hadoopIo}
    +import scala.collection.JavaConversions._
    +import scala.language.implicitConversions
    +
    +/**
    + * A compatibility layer for interacting with Hive version 0.13.1.
    + */
    +private[hive] object HiveShim {
    +  val version = "0.13.1"
    +  /*
    +   * TODO: hive-0.13 support DECIMAL(precision, scale), DECIMAL in hive-0.12 is actually DECIMAL(10,0)
    --- End diff --
    
    Changed the comments, and made the parser accept any decimal(precision, scale). But the gap between hive-0.13.1 and Spark's decimal support is still there.
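    
    For readers following along, a minimal sketch of what accepting an arbitrary decimal(precision, scale) from the metastore could look like; the regex and helper name here are illustrative, not the parser actually used in this PR:
    
    ```scala
    // Sketch only: map a metastore type string such as "decimal(10,0)" to (precision, scale).
    val DecimalPattern = """decimal\((\d+),\s*(\d+)\)""".r
    
    def parseDecimal(metastoreType: String): Option[(Int, Int)] = metastoreType match {
      case DecimalPattern(precision, scale) => Some((precision.toInt, scale.toInt))
      case "decimal"                        => Some((10, 0)) // bare DECIMAL means DECIMAL(10,0)
      case _                                => None
    }
    ```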


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54691310
  
    test this please


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-58854626
  
    Note @scwf there are some TODOs in there that need to be addressed in your patch for JDBC.


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-55302989
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/70/consoleFull) for   PR 2241 at commit [`128b60b`](https://github.com/apache/spark/commit/128b60b242d7e13e86b6b93e29d9884084a79a06).
     * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18198152
  
    --- Diff: sql/hive/pom.xml ---
    @@ -119,6 +83,74 @@
     
       <profiles>
         <profile>
    +      <id>hive-default</id>
    +      <activation>
    +        <property>
    +          <name>!hive.version</name>
    --- End diff --
    
    This is due to the special packaging requirements in hive 0.12.0; activating by hive.version there would cause a compile error. For future versions the flag is OK, as long as the shim layer is provided.


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18433838
  
    --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,158 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import java.util.Properties
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.common.StatsSetupConst
    +import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
    +import org.apache.hadoop.hive.conf.HiveConf
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
    +import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
    +import org.apache.hadoop.mapred.InputFormat
    +import org.apache.spark.Logging
    +import org.apache.hadoop.{io => hadoopIo}
    +import scala.collection.JavaConversions._
    +import scala.language.implicitConversions
    +
    +/**
    + * A compatibility layer for interacting with Hive version 0.13.1.
    + */
    +private[hive] object HiveShim {
    +  val version = "0.13.1"
    +  /*
    +   * TODO: hive-0.13 support DECIMAL(precision, scale), DECIMAL in hive-0.12 is actually DECIMAL(10,0)
    +   * Full support of new decimal feature need to be fixed in seperate PR.
    +   */
    +  val metastoreDecimal = "decimal(10,0)"
    +
    +  def getTableDesc(
    +    serdeClass: Class[_ <: Deserializer],
    +    inputFormatClass: Class[_ <: InputFormat[_, _]],
    +    outputFormatClass: Class[_],
    +    properties: Properties) = {
    +    new TableDesc(inputFormatClass, outputFormatClass, properties)
    +  }
    +
    +  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
    +
    +  def createDefaultDBIfNeeded(context: HiveContext) ={
    +    context.runSqlHive("CREATE DATABASE default")
    +    context.runSqlHive("USE default")
    +  }
    +
    +  /** The string used to denote an empty comments field in the schema. */
    +  def getEmptyCommentsFieldValue = ""
    +
    +  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
    +    CommandProcessorFactory.get(cmd, conf)
    +  }
    +
    +  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
    +    HiveDecimal.create(bd)
    +  }
    +
    +  /*
    +   * This function in hive-0.13 become private, but we have to do this to walkaround hive bug
    +   */
    +  private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
    +    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
    +    val result: StringBuilder = new StringBuilder(old)
    +    var first: Boolean = old.isEmpty
    +
    +    for (col <- cols) {
    +      if (first) {
    +        first = false
    +      }
    +      else {
    +        result.append(',')
    +      }
    +      result.append(col)
    +    }
    +    conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, result.toString)
    +  }
    +
    +  /*
    +   * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids is null or empty
    +   */
    +  def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: Seq[String]) {
    +    if (ids != null && ids.size > 0) {
    +      ColumnProjectionUtils.appendReadColumns(conf, ids)
    +    }
    +    appendReadColumnNames(conf, names)
    +  }
    +
    +  def getExternalTmpPath(context: Context, path: Path) = {
    +    context.getExternalTmpPath(path.toUri)
    +  }
    +
    +  def getDataLocationPath(p: Partition) = p.getDataLocation
    +
    +  def getAllPartitionsOf(client: Hive, tbl: Table) =  client.getAllPartitionsOf(tbl)
    +
    +  /*
    +   * Bug introdiced in hive-0.13. FileSinkDesc is serilizable, but its member path is not.
    +   * Fix it through wrapper.
    +   * */
    +  implicit def wrapperToFileSinkDesc(w: ShimFileSinkDesc): FileSinkDesc = {
    --- End diff --
    
    I thought about this method, but it potentially makes ShimFileSinkDesc harder to maintain in case we need to track some internal state of FileSinkDesc. If you think returning the actual FileSinkDesc is better, please let me know.
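    
    To make the trade-off concrete, a minimal sketch of the wrapper pattern, using the `ShimFileSinkDesc` constructor the QA bot reports below; the exact hive-0.13 `FileSinkDesc` constructor signature is an assumption:
    
    ```scala
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    import scala.language.implicitConversions
    
    object ShimSketch {
      // Keep the path as a serializable String; hive-0.13's FileSinkDesc holds a
      // non-serializable Path member, which breaks plan serialization.
      class ShimFileSinkDesc(var dir: String, var tableInfo: TableDesc, var compressed: Boolean)
        extends Serializable
    
      // Implicit conversion so call sites that need a real FileSinkDesc
      // (e.g. HiveFileFormatUtils.getHiveRecordWriter) still compile unchanged.
      implicit def wrapperToFileSinkDesc(w: ShimFileSinkDesc): FileSinkDesc =
        new FileSinkDesc(new Path(w.dir), w.tableInfo, w.compressed)
    }
    ```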


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-57939320
  
    I made a shaded hive-0.13.1 version several days ago for testing (https://github.com/scwf/hive/tree/0.13.1-shaded). Hope it is useful :)


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17812765
  
    --- Diff: sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,85 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import java.util.Properties
    +import scala.language.implicitConversions
    +import org.apache.hadoop.hive.ql.metadata.Partition
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import scala.collection.JavaConversions._
    +import org.apache.hadoop.hive.serde2.{Deserializer, ColumnProjectionUtils}
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.common.`type`.HiveDecimal
    +import java.net.URI
    +import org.apache.hadoop.{io => hadoopIo}
    +import org.apache.hadoop.hive.ql.stats.StatsSetupConst
    +import org.apache.hadoop.mapred.InputFormat
    +import org.apache.hadoop.hive.ql.metadata.Hive
    +import org.apache.hadoop.hive.ql.metadata.Table
    +import org.apache.hadoop.hive.ql.processors._
    +import org.apache.hadoop.hive.conf.HiveConf
    +
    +/*hive-0.12.0 support shimmer layer*/
    +object HiveShim {
    --- End diff --
    
    Should be `private[hive]`.


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-58932508
  
    @pwendell the following is the trace from ParquetMetastoreSuite, and it is caused by the parquet-hive-bundle:
    
    sbt/sbt -Phive-0.13.1 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 "test-only org.apache.spark.sql.parquet.ParquetMetastoreSuite"
    
    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
    11:11:50.877 ERROR hive.ql.exec.DDLTask: java.lang.NoSuchFieldError: doubleTypeInfo
    	at org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getObjectInspector(ArrayWritableObjectInspector.java:66)
    	at org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.<init>(ArrayWritableObjectInspector.java:59)
    	at org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe.initialize(ParquetHiveSerDe.java:113)
    	at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:339)
    	at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:288)
    	at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:194)
    	at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:597)
    	at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4189)
    	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:281)
    	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
    	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
    	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
    	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
    	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
    	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
    	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
    	at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:297)
    	at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:273)
    	at org.apache.spark.sql.hive.test.TestHiveContext.runSqlHive(TestHive.scala:93)
    	at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
    	at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
    	at org.apache.spark.sql.execution.Command$class.execute(commands.scala:44)
    	at org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:30)
    	at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:361)
    	at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:361)
    	at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
    	at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:104)


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17805760
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ---
    @@ -511,11 +512,14 @@ class HiveQuerySuite extends HiveComparisonTest {
               |WITH serdeproperties('s1'='9')
             """.stripMargin)
         }
    -    sql(s"ADD JAR $testJar")
    -    sql(
    -      """ALTER TABLE alter1 SET SERDE 'org.apache.hadoop.hive.serde2.TestSerDe'
    -        |WITH serdeproperties('s1'='9')
    -      """.stripMargin)
    +    /*now only verify 0.12.0, and ignore other versions due to binary compatability*/
    --- End diff --
    
    Spaces before and after comments.  Nit: Though in this case for a single line comment I'd use `//`


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17805523
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScan.scala ---
    @@ -71,14 +72,14 @@ case class HiveTableScan(
         Cast(Literal(value), dataType).eval(null)
       }
     
    +
    --- End diff --
    
    Here too.


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-58938955
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amp.lab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21692/
    Test FAILed.


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17805581
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScan.scala ---
    @@ -71,14 +72,14 @@ case class HiveTableScan(
         Cast(Literal(value), dataType).eval(null)
       }
     
    +
       private def addColumnMetadataToConf(hiveConf: HiveConf) {
         // Specifies needed column IDs for those non-partitioning columns.
         val neededColumnIDs =
           attributes.map(a =>
             relation.attributes.indexWhere(_.name == a.name): Integer).filter(index => index >= 0)
     
    -    ColumnProjectionUtils.appendReadColumnIDs(hiveConf, neededColumnIDs)
    -    ColumnProjectionUtils.appendReadColumnNames(hiveConf, attributes.map(_.name))
    +    HiveShim.appendReadColumns(hiveConf, neededColumnIDs, attributes.map(_.name))
    --- End diff --
    
    @yhuai, you were saying there was some bug here in the existing code? Maybe we can fix it here.


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-57265563
  
    I meant those implicit classes and methods.


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-57040898
  
    @yhuai The following command will build with 0.13.1 support; no extra setup is required. Replace -Phive anywhere in the current build with -Dhive.version=0.13.1. Note: don't use -Phive or -Pthrift-server together with -Dhive.version.
     
    sbt/sbt -Dhive.version=0.13.1 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 assembly


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-57580447
  
    @yhuai I removed all unnecessary implicits to make it consistent, but I have to keep wrapperToFileSinkDesc because HiveFileFormatUtils.getHiveRecordWriter needs the FileSinkDesc type, and it also helps to track internal state changes of FileSinkDesc.


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18429655
  
    --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,158 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import java.util.Properties
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.common.StatsSetupConst
    +import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
    +import org.apache.hadoop.hive.conf.HiveConf
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
    +import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
    +import org.apache.hadoop.mapred.InputFormat
    +import org.apache.spark.Logging
    +import org.apache.hadoop.{io => hadoopIo}
    +import scala.collection.JavaConversions._
    +import scala.language.implicitConversions
    +
    +/**
    + * A compatibility layer for interacting with Hive version 0.13.1.
    + */
    +private[hive] object HiveShim {
    +  val version = "0.13.1"
    +  /*
    +   * TODO: hive-0.13 support DECIMAL(precision, scale), DECIMAL in hive-0.12 is actually DECIMAL(10,0)
    +   * Full support of new decimal feature need to be fixed in seperate PR.
    +   */
    +  val metastoreDecimal = "decimal(10,0)"
    +
    +  def getTableDesc(
    +    serdeClass: Class[_ <: Deserializer],
    +    inputFormatClass: Class[_ <: InputFormat[_, _]],
    +    outputFormatClass: Class[_],
    +    properties: Properties) = {
    +    new TableDesc(inputFormatClass, outputFormatClass, properties)
    +  }
    +
    +  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
    +
    +  def createDefaultDBIfNeeded(context: HiveContext) ={
    +    context.runSqlHive("CREATE DATABASE default")
    +    context.runSqlHive("USE default")
    +  }
    +
    +  /** The string used to denote an empty comments field in the schema. */
    +  def getEmptyCommentsFieldValue = ""
    +
    +  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
    +    CommandProcessorFactory.get(cmd, conf)
    +  }
    +
    +  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
    +    HiveDecimal.create(bd)
    +  }
    +
    +  /*
    +   * This function in hive-0.13 become private, but we have to do this to walkaround hive bug
    +   */
    +  private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
    +    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
    +    val result: StringBuilder = new StringBuilder(old)
    +    var first: Boolean = old.isEmpty
    +
    +    for (col <- cols) {
    +      if (first) {
    +        first = false
    +      }
    +      else {
    +        result.append(',')
    +      }
    +      result.append(col)
    +    }
    +    conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, result.toString)
    +  }
    +
    +  /*
    +   * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids is null or empty
    +   */
    +  def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: Seq[String]) {
    +    if (ids != null && ids.size > 0) {
    +      ColumnProjectionUtils.appendReadColumns(conf, ids)
    +    }
    +    appendReadColumnNames(conf, names)
    +  }
    +
    +  def getExternalTmpPath(context: Context, path: Path) = {
    +    context.getExternalTmpPath(path.toUri)
    +  }
    +
    +  def getDataLocationPath(p: Partition) = p.getDataLocation
    +
    +  def getAllPartitionsOf(client: Hive, tbl: Table) =  client.getAllPartitionsOf(tbl)
    +
    +  /*
    +   * Bug introdiced in hive-0.13. FileSinkDesc is serilizable, but its member path is not.
    +   * Fix it through wrapper.
    --- End diff --
    
    I am pretty confused about it. I think Hive needs to serialize FileSinkDesc when the query plan needs to be serialized. So, how does Hive handle this issue?
    
    Also, serilizable=>serializable.


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54777588
  
    Hey @zhzhan thanks for looking at this. One high level comment on this - we tend to prefer using reflection rather than having multiple source directories and relying on build tricks. The latter gets really complicated when you have many different versions to keep track of and requires much more invasive build changes. Would it be possible to restructure this using that approach?
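    
    To illustrate the suggestion, a minimal reflection-based sketch, assuming the two APIs differ only in method name (the names are taken from the diffs in this review); the PR itself does not work this way:
    
    ```scala
    import org.apache.hadoop.hive.ql.metadata.{Hive, Table}
    
    // Sketch only: resolve the version-specific method at runtime instead of
    // compiling a separate source tree per Hive version.
    def getAllPartitions(client: Hive, tbl: Table, hiveVersion: String): AnyRef = {
      val methodName =
        if (hiveVersion.startsWith("0.12")) "getAllPartitionsForPruner" // hive-0.12 API
        else "getAllPartitionsOf"                                       // hive-0.13 API
      classOf[Hive].getMethod(methodName, classOf[Table]).invoke(client, tbl)
    }
    ```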


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-59001076
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21718/consoleFull) for   PR 2241 at commit [`cbb4691`](https://github.com/apache/spark/commit/cbb46911f881a0704fa16a36f2a362a930db6ade).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class ShimFileSinkDesc(var dir: String, var tableInfo: TableDesc, var compressed: Boolean)`
      * `class ShimFileSinkDesc(var dir: String, var tableInfo: TableDesc, var compressed: Boolean)`



[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17807461
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
    @@ -178,6 +179,7 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
       override def unregisterAllTables() = {}
     }
     
    +
    --- End diff --
    
    Extra line?


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2241


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17806062
  
    --- Diff: sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,85 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import java.util.Properties
    +import scala.language.implicitConversions
    +import org.apache.hadoop.hive.ql.metadata.Partition
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import scala.collection.JavaConversions._
    +import org.apache.hadoop.hive.serde2.{Deserializer, ColumnProjectionUtils}
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.common.`type`.HiveDecimal
    +import java.net.URI
    +import org.apache.hadoop.{io => hadoopIo}
    +import org.apache.hadoop.hive.ql.stats.StatsSetupConst
    +import org.apache.hadoop.mapred.InputFormat
    +import org.apache.hadoop.hive.ql.metadata.Hive
    +import org.apache.hadoop.hive.ql.metadata.Table
    +import org.apache.hadoop.hive.ql.processors._
    +import org.apache.hadoop.hive.conf.HiveConf
    +
    +/*hive-0.12.0 support shimmer layer*/
    --- End diff --
    
    Use Scala doc style comments. [Spark Style Guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide)
    
    For example:
    
    ```scala
    /**
     * A compatibility layer for interacting with Hive version 0.12.0.
     */
    ```


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17806210
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScan.scala ---
    @@ -71,14 +72,14 @@ case class HiveTableScan(
         Cast(Literal(value), dataType).eval(null)
       }
     
    +
       private def addColumnMetadataToConf(hiveConf: HiveConf) {
         // Specifies needed column IDs for those non-partitioning columns.
         val neededColumnIDs =
           attributes.map(a =>
             relation.attributes.indexWhere(_.name == a.name): Integer).filter(index => index >= 0)
     
    -    ColumnProjectionUtils.appendReadColumnIDs(hiveConf, neededColumnIDs)
    -    ColumnProjectionUtils.appendReadColumnNames(hiveConf, attributes.map(_.name))
    +    HiveShim.appendReadColumns(hiveConf, neededColumnIDs, attributes.map(_.name))
    --- End diff --
    
    Yeah, because we were always appending, these lists get messed up in the case of a join, or after running the first query. I opened https://issues.apache.org/jira/browse/SPARK-3559. It would be great if this PR could also fix it.
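    
    A minimal sketch of the accumulation SPARK-3559 describes; the property key is the value of Hive's READ_COLUMN_NAMES_CONF_STR constant, and the helper is illustrative:
    
    ```scala
    import org.apache.hadoop.conf.Configuration
    
    // Sketch only: append-only updates leak the first query's columns into the second.
    val conf = new Configuration()
    def appendNames(names: Seq[String]): Unit = {
      val old = conf.get("hive.io.file.readcolumn.names", "")
      val add = names.mkString(",")
      conf.set("hive.io.file.readcolumn.names", if (old.isEmpty) add else old + "," + add)
    }
    
    appendNames(Seq("key", "value"))  // columns needed by the first query
    appendNames(Seq("page", "hits")) // columns needed by the second query
    // conf now holds "key,value,page,hits" -- stale columns from query one leak in.
    ```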


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17805241
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
    @@ -170,13 +171,14 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
     
             val tableParameters = relation.hiveQlTable.getParameters
             val oldTotalSize =
    -          Option(tableParameters.get(StatsSetupConst.TOTAL_SIZE)).map(_.toLong).getOrElse(0L)
    +          Option(tableParameters.get(HiveShim.getStatsSetupConstTotalSize)).
    --- End diff --
    
    Don't orphan the `.`.  I'd format this as:
    
    ```scala
    Option(tableParameters.get(HiveShim.getStatsSetupConstTotalSize))
      .map(_.toLong)
      .getOrElse(0L)
    ```


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17805768
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ---
    @@ -511,11 +512,14 @@ class HiveQuerySuite extends HiveComparisonTest {
               |WITH serdeproperties('s1'='9')
             """.stripMargin)
         }
    -    sql(s"ADD JAR $testJar")
    -    sql(
    -      """ALTER TABLE alter1 SET SERDE 'org.apache.hadoop.hive.serde2.TestSerDe'
    -        |WITH serdeproperties('s1'='9')
    -      """.stripMargin)
    +    /*now only verify 0.12.0, and ignore other versions due to binary compatability*/
    +    if (HiveShim.version.equals("0.12.0")) {
    --- End diff --
    
    Use `==`


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17805365
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
    @@ -59,7 +60,7 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
         val table = client.getTable(databaseName, tblName)
         val partitions: Seq[Partition] =
           if (table.isPartitioned) {
    -        client.getAllPartitionsForPruner(table).toSeq
    +        client.getAllPartitionsOf(table).toSeq
    --- End diff --
    
    This is going to change the performance, right? It was my understanding that the `forPruner` variant told Hive to include less information, making the call cheaper.
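    
    For context, roughly how the two per-version shims cover this call; the 0.13 side matches the shim quoted earlier in this review, while the 0.12 side is a sketch inferred from the replaced call site (the object names are illustrative, as both files actually define `HiveShim`):
    
    ```scala
    import org.apache.hadoop.hive.ql.metadata.{Hive, Table}
    
    object Shim12Sketch { // compiled only under sql/hive/v0.12.0
      def getAllPartitionsOf(client: Hive, tbl: Table) =
        client.getAllPartitionsForPruner(tbl) // cheaper: fetches less per-partition info
    }
    
    object Shim13Sketch { // compiled only under sql/hive/v0.13.1
      def getAllPartitionsOf(client: Hive, tbl: Table) =
        client.getAllPartitionsOf(tbl) // getAllPartitionsForPruner is not available in hive-0.13
    }
    ```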


[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18786888
  
    --- Diff: assembly/pom.xml ---
    @@ -205,6 +211,21 @@
           </dependencies>
         </profile>
         <profile>
    +      <id>hive-versions</id>
    --- End diff --
    
    We already require the `-Phive` profile to add hive to the assembly, so I think that is sufficient.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18429736
  
    --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,158 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import java.util.Properties
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.common.StatsSetupConst
    +import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
    +import org.apache.hadoop.hive.conf.HiveConf
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
    +import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
    +import org.apache.hadoop.mapred.InputFormat
    +import org.apache.spark.Logging
    +import org.apache.hadoop.{io => hadoopIo}
    +import scala.collection.JavaConversions._
    +import scala.language.implicitConversions
    +
    +/**
    + * A compatibility layer for interacting with Hive version 0.13.1.
    + */
    +private[hive] object HiveShim {
    +  val version = "0.13.1"
    +  /*
    +   * TODO: hive-0.13 supports DECIMAL(precision, scale); DECIMAL in hive-0.12 is actually DECIMAL(10,0).
    +   * Full support of the new decimal feature needs to be fixed in a separate PR.
    +   */
    +  val metastoreDecimal = "decimal(10,0)"
    +
    +  def getTableDesc(
    +    serdeClass: Class[_ <: Deserializer],
    +    inputFormatClass: Class[_ <: InputFormat[_, _]],
    +    outputFormatClass: Class[_],
    +    properties: Properties) = {
    +    new TableDesc(inputFormatClass, outputFormatClass, properties)
    +  }
    +
    +  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
    +
    +  def createDefaultDBIfNeeded(context: HiveContext) ={
    +    context.runSqlHive("CREATE DATABASE default")
    +    context.runSqlHive("USE default")
    +  }
    +
    +  /** The string used to denote an empty comments field in the schema. */
    +  def getEmptyCommentsFieldValue = ""
    +
    +  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
    +    CommandProcessorFactory.get(cmd, conf)
    +  }
    +
    +  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
    +    HiveDecimal.create(bd)
    +  }
    +
    +  /*
    +   * This function became private in hive-0.13, but we have to do this to work around a Hive bug.
    +   */
    +  private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
    +    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
    +    val result: StringBuilder = new StringBuilder(old)
    +    var first: Boolean = old.isEmpty
    +
    +    for (col <- cols) {
    +      if (first) {
    +        first = false
    +      }
    +      else {
    +        result.append(',')
    +      }
    +      result.append(col)
    +    }
    +    conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, result.toString)
    +  }
    +
    +  /*
    +   * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids is null or empty
    +   */
    +  def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: Seq[String]) {
    +    if (ids != null && ids.size > 0) {
    +      ColumnProjectionUtils.appendReadColumns(conf, ids)
    +    }
    +    appendReadColumnNames(conf, names)
    +  }
    +
    +  def getExternalTmpPath(context: Context, path: Path) = {
    +    context.getExternalTmpPath(path.toUri)
    +  }
    +
    +  def getDataLocationPath(p: Partition) = p.getDataLocation
    +
    +  def getAllPartitionsOf(client: Hive, tbl: Table) =  client.getAllPartitionsOf(tbl)
    +
    +  /*
    +   * Bug introduced in hive-0.13: FileSinkDesc is serializable, but its member path is not.
    +   * Fix it through a wrapper.
    +   * */
    +  implicit def wrapperToFileSinkDesc(w: ShimFileSinkDesc): FileSinkDesc = {
    --- End diff --
    
    If we need to use `ShimFileSinkDesc`, how about we add a method in it to return the actual `FileSinkDesc` instead of using an implicit method?
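
    For illustration, a minimal sketch of that alternative (the names below are illustrative, based on the fields shown in this diff, not the PR's actual API):
    
    ```scala
    // assuming the imports already present in the shim file above
    class ShimFileSinkDesc(var dir: String, var tableInfo: TableDesc, var compressed: Boolean) {
      var compressCodec: String = _
      var compressType: String = _
    
      // An explicit conversion method instead of an implicit def: call sites
      // read w.toFileSinkDesc, so no silent type massaging happens.
      def toFileSinkDesc: FileSinkDesc = {
        val f = new FileSinkDesc(new Path(dir), tableInfo, compressed)
        f.setCompressCodec(compressCodec)
        f.setCompressType(compressType)
        f
      }
    }
    ```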




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-57264575
  
    @yhuai Can you be more specific regarding the comments: "I think those implicits are not necessary. Can you change those?"




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17833077
  
    --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,147 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
    +import scala.language.implicitConversions
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
    +import scala.collection.JavaConversions._
    +import org.apache.spark.Logging
    +import org.apache.hadoop.hive.serde2.ColumnProjectionUtils
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.hive.ql.metadata.Partition
    +import org.apache.hadoop.{io => hadoopIo}
    +import org.apache.hadoop.hive.conf.HiveConf
    +import org.apache.hadoop.hive.common.StatsSetupConst
    +import org.apache.hadoop.mapred.InputFormat
    +import java.util.Properties
    +import org.apache.hadoop.hive.serde2.Deserializer
    +
    +/* hive-0.13.1 support shim layer */
    +object HiveShim {
    +  val version = "0.13.1"
    +  /*
    +   * hive-0.13 supports DECIMAL(precision, scale); DECIMAL in hive-0.12 is actually DECIMAL(10,0).
    +   * Full support of the new decimal feature needs to be fixed in a separate PR.
    +   */
    +  val metastoreDecimal = "decimal(10,0)"
    +  def getTableDesc(serdeClass: Class[_ <: Deserializer], inputFormatClass: Class[_ <: InputFormat[_, _]], outputFormatClass: Class[_], properties: Properties) = {
    +    new TableDesc(inputFormatClass, outputFormatClass, properties)
    +  }
    +  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
    +  def createDefaultDBIfNeeded(context: HiveContext) ={
    +    context.runSqlHive("CREATE DATABASE default")
    +    context.runSqlHive("USE default")
    +  }
    +  /*handle the difference in HiveQuerySuite*/
    +  def getEmptyCommentsFieldValue = ""
    +
    +  def convertCatalystString2Hive(s: String) = s
    +
    +  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
    +    CommandProcessorFactory.get(cmd, conf)
    +  }
    +
    +  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
    +    HiveDecimal.create(bd)
    +  }
    +
    +  /*
    +   * This function became private in hive-0.13, but we have to do this to work around a Hive bug.
    +   * */
    +  private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
    +    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
    +    val result: StringBuilder = new StringBuilder(old)
    +    var first: Boolean = old.isEmpty
    +
    +    for (col <- cols) {
    +      if (first) {
    +        first = false
    +      }
    +      else {
    +        result.append(',')
    +      }
    +      result.append(col)
    +    }
    +    conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, result.toString)
    +  }
    +
    +  /*
    +   * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids is null or empty
    --- End diff --
    
    I didn't see appendReadColumns handle empty strings, and I hit an error in this case during testing.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18431391
  
    --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,158 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import java.util.Properties
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.common.StatsSetupConst
    +import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
    +import org.apache.hadoop.hive.conf.HiveConf
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
    +import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
    +import org.apache.hadoop.mapred.InputFormat
    +import org.apache.spark.Logging
    +import org.apache.hadoop.{io => hadoopIo}
    +import scala.collection.JavaConversions._
    +import scala.language.implicitConversions
    +
    +/**
    + * A compatibility layer for interacting with Hive version 0.13.1.
    + */
    +private[hive] object HiveShim {
    +  val version = "0.13.1"
    +  /*
    +   * TODO: hive-0.13 supports DECIMAL(precision, scale); DECIMAL in hive-0.12 is actually DECIMAL(10,0).
    --- End diff --
    
    Yeah, I think you are right, it is unbounded in Hive 12. Spark SQL will also use unbounded-precision decimals internally, so when it's not specified that's what we should assume.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-58979166
  
    @zhzhan Yeah, you can just change `dev/run-tests` to build with the Hive 12 profile for now, where it currently has `-Phive`.
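
    A hedged sketch of that change to `dev/run-tests` (the variable name and the replaced line are assumptions for illustration, not quoted from the script):
    
    ```sh
    # Before (assumed): build Spark SQL against the default Hive profile
    #   SBT_MAVEN_PROFILES_ARGS="$SBT_MAVEN_PROFILES_ARGS -Phive"
    # After: pin the test build to the Hive 12 profile for now
    SBT_MAVEN_PROFILES_ARGS="$SBT_MAVEN_PROFILES_ARGS -Phive -Phive-0.12.0"
    ```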




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-59881210
  
    @scwf Did you also replace the query plans for hive-0.13 in your other PR? I also saw some query plan changes in hive-0.13.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18786749
  
    --- Diff: assembly/pom.xml ---
    @@ -205,6 +211,21 @@
           </dependencies>
         </profile>
         <profile>
    +      <id>hive-versions</id>
    --- End diff --
    
    I think this can be removed - right?




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-58943420
  
    Thanks! Could you elaborate more on the second point you were making, about the compatibility tests? Is Hive 0.13 not language-compatible with Hive 0.12?




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18430235
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TestHive.scala ---
    @@ -369,6 +371,7 @@ class TestHiveContext(sc: SparkContext) extends HiveContext(sc) {
        * tests.
        */
       protected val originalUdfs: JavaSet[String] = FunctionRegistry.getFunctionNames
    +  HiveShim.createDefaultDBIfNeeded(this)
    --- End diff --
    
    Can you add a comment here to explain why it is necessary?
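
    For example, something along these lines (the stated reason is an assumption drawn from this thread, so it should be adjusted to the actual cause):
    
    ```scala
    // Hive 0.13 does not pre-create the "default" database in the fresh
    // metastore used by the tests, so the shim creates it and switches to
    // it here; for Hive 0.12 this is a no-op.
    HiveShim.createDefaultDBIfNeeded(this)
    ```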




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17806162
  
    --- Diff: sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,85 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import java.util.Properties
    +import scala.language.implicitConversions
    +import org.apache.hadoop.hive.ql.metadata.Partition
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import scala.collection.JavaConversions._
    +import org.apache.hadoop.hive.serde2.{Deserializer, ColumnProjectionUtils}
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.common.`type`.HiveDecimal
    +import java.net.URI
    +import org.apache.hadoop.{io => hadoopIo}
    +import org.apache.hadoop.hive.ql.stats.StatsSetupConst
    +import org.apache.hadoop.mapred.InputFormat
    +import org.apache.hadoop.hive.ql.metadata.Hive
    +import org.apache.hadoop.hive.ql.metadata.Table
    +import org.apache.hadoop.hive.ql.processors._
    +import org.apache.hadoop.hive.conf.HiveConf
    +
    +/* hive-0.12.0 support shim layer */
    +object HiveShim {
    +  val version = "0.12.0"
    +  val metastoreDecimal = "decimal"
    +  def getTableDesc(serdeClass: Class[_ <: Deserializer], inputFormatClass: Class[_ <: InputFormat[_, _]], outputFormatClass: Class[_], properties: Properties) = {
    +    new TableDesc(serdeClass, inputFormatClass, outputFormatClass, properties)
    +  }
    +  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
    +  def createDefaultDBIfNeeded(context: HiveContext) ={  }
    +
    +  /*handle the difference in "None" and empty ""*/
    --- End diff --
    
    `/** The string used to denote an empty comments field in the schema. */`




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-58849802
  
    @zhzhan @scwf - I think this should be okay now for protobuf. We made some other changes this week, updating the protobuf version in akka to be based on protobuf 2.5 instead of 2.4, so we now use that throughout Spark. Mind rebasing this? I think the protobuf issue will go away.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54239094
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19616/consoleFull) for PR 2241 at commit [`94b4fdc`](https://github.com/apache/spark/commit/94b4fdc8600cc6d5d4fe338e293be74081a0e2d4).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54254817
  
    PySpark again :(
    
    retest this please




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54238705
  
    done.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18195198
  
    --- Diff: sql/hive/pom.xml ---
    @@ -119,6 +83,74 @@
     
       <profiles>
         <profile>
    +      <id>hive-default</id>
    +      <activation>
    +        <property>
    +          <name>!hive.version</name>
    --- End diff --
    
    What will happen if we have `-Dhive.version=0.12.0`?
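
    For context, `!hive.version` activation only checks whether the property is defined, not what its value is, so (a sketch, assuming no other profile picks up 0.12.0):
    
    ```sh
    mvn -Phive package                        # hive-default active: hive.version is unset
    mvn -Phive -Dhive.version=0.12.0 package  # hive-default NOT active: the property is
                                              # defined, even though it names the default
    ```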




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54692062
  
    Thanks for the follow-up.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54386775
  
    I agree. I saw that part of the code and don't think it is sustainable, but there is no better option for now. In this patch, I skip thriftserver support entirely for hive-0.13.1. I think the problem may need to be fixed on the Hive side; currently Hive is a relatively closed system.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-60426160
  
    Okay, I've merged this to master. Will file a PR shortly to fix the tests.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17834283
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
    @@ -39,6 +38,8 @@ import org.apache.spark.sql.columnar.InMemoryRelation
     import org.apache.spark.sql.hive.execution.HiveTableScan
     import org.apache.spark.util.Utils
     
    +import org.apache.spark.sql.hive.HiveShim
    +import org.apache.spark.sql.hive.HiveShim._
    --- End diff --
    
    Yes. It needs the implicit methods.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17807158
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
    @@ -292,24 +294,20 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
           val cmd_trimmed: String = cmd.trim()
           val tokens: Array[String] = cmd_trimmed.split("\\s+")
           val cmd_1: String = cmd_trimmed.substring(tokens(0).length()).trim()
    -      val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), hiveconf)
    -
    -      SessionState.start(sessionState)
    +      val proc: CommandProcessor = HiveShim.getCommandProcessor(Array(tokens(0)), hiveconf)
     
           proc match {
             case driver: Driver =>
    -          driver.init()
    -
               val results = new JArrayList[String]
               val response: CommandProcessorResponse = driver.run(cmd)
               // Throw an exception if there is an error in query processing.
               if (response.getResponseCode != 0) {
    -            driver.destroy()
    +            driver.close
    --- End diff --
    
    Why change it to `close`?




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-58928624
  
    @pwendell Thanks a lot for the help, but I have several concerns.
    1. Not sure how the new package is created, but hive-0.13.1 has native parquet support, and I remember hitting a binary compatibility issue when <artifactId>parquet-hive-bundle</artifactId> is included, so I excluded it from hive-0.13.1.
    2. Hive-0.13.1 is not compatible with hive-0.12.0, so hive-0.13.1 does not pass the full compatibility test suite yet.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by gss2002 <gi...@git.apache.org>.
Github user gss2002 commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-56185334
  
    We have been using this fix for a few weeks now against Hive 13. The only outstanding issue I see, and this could be something larger, is that the Spark Thrift service doesn't seem to support hive.server2.enable.doAs = true; it doesn't set the proxy user.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18496664
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
    @@ -287,22 +289,20 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
           val cmd_trimmed: String = cmd.trim()
           val tokens: Array[String] = cmd_trimmed.split("\\s+")
           val cmd_1: String = cmd_trimmed.substring(tokens(0).length()).trim()
    -      val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), hiveconf)
    +      val proc: CommandProcessor = HiveShim.getCommandProcessor(Array(tokens(0)), hiveconf)
     
           proc match {
             case driver: Driver =>
    -          driver.init()
    -
               val results = new JArrayList[String]
    --- End diff --
    
    Need to consider compatibility here, since the ```driver.getResults(results)``` API changed in hive-0.13.1. You can refer to my change here (https://github.com/scwf/spark/blob/hive-0.13.1-clean/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L295-L310).
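
    A hedged sketch of one way a shim could absorb that difference (the method names are illustrative and the exact 0.13.1 row shape is an assumption; see the linked change for the real handling):
    
    ```scala
    import java.util.{ArrayList => JArrayList}
    import scala.collection.JavaConversions._
    
    // Hive 0.12: Driver.getResults(ArrayList[String]).
    // Hive 0.13: getResults(List[Object]), where a row may be a String
    // or an array of column values.
    def createDriverResultsArray = new JArrayList[Object]
    
    def processResults(results: JArrayList[Object]): Seq[String] =
      results.map {
        case s: String => s
        case a: Array[Object] => a(0).asInstanceOf[String]
      }
    ```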




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-58938511
  
    @pwendell Sounds good to me. Will update the patch.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-58850172
  
    OK, thanks for that. I will also test it in https://github.com/apache/spark/pull/2685




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-58994264
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21718/consoleFull) for PR 2241 at commit [`cbb4691`](https://github.com/apache/spark/commit/cbb46911f881a0704fa16a36f2a362a930db6ade).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17808277
  
    --- Diff: sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,85 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import java.util.Properties
    +import scala.language.implicitConversions
    +import org.apache.hadoop.hive.ql.metadata.Partition
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import scala.collection.JavaConversions._
    +import org.apache.hadoop.hive.serde2.{Deserializer, ColumnProjectionUtils}
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.common.`type`.HiveDecimal
    +import java.net.URI
    +import org.apache.hadoop.{io => hadoopIo}
    +import org.apache.hadoop.hive.ql.stats.StatsSetupConst
    +import org.apache.hadoop.mapred.InputFormat
    +import org.apache.hadoop.hive.ql.metadata.Hive
    +import org.apache.hadoop.hive.ql.metadata.Table
    +import org.apache.hadoop.hive.ql.processors._
    +import org.apache.hadoop.hive.conf.HiveConf
    +
    +/* hive-0.12.0 support shim layer */
    +object HiveShim {
    +  val version = "0.12.0"
    +  val metastoreDecimal = "decimal"
    +  def getTableDesc(serdeClass: Class[_ <: Deserializer], inputFormatClass: Class[_ <: InputFormat[_, _]], outputFormatClass: Class[_], properties: Properties) = {
    +    new TableDesc(serdeClass, inputFormatClass, outputFormatClass, properties)
    +  }
    +  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
    +  def createDefaultDBIfNeeded(context: HiveContext) ={  }
    +
    +  /*handle the difference in "None" and empty ""*/
    +  def getEmptyCommentsFieldValue = "None"
    +
    +  def convertCatalystString2Hive(s: String) = new hadoopIo.Text(s) // TODO why should be Text?
    --- End diff --
    
    Also, I guess it will be better to use `convertCatalystStringToHive`?




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17832668
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
    @@ -292,24 +294,20 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
           val cmd_trimmed: String = cmd.trim()
           val tokens: Array[String] = cmd_trimmed.split("\\s+")
           val cmd_1: String = cmd_trimmed.substring(tokens(0).length()).trim()
    -      val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), hiveconf)
    -
    -      SessionState.start(sessionState)
    +      val proc: CommandProcessor = HiveShim.getCommandProcessor(Array(tokens(0)), hiveconf)
     
           proc match {
             case driver: Driver =>
    -          driver.init()
    -
               val results = new JArrayList[String]
               val response: CommandProcessorResponse = driver.run(cmd)
               // Throw an exception if there is an error in query processing.
               if (response.getResponseCode != 0) {
    -            driver.destroy()
    +            driver.close
    --- End diff --
    
    I followed some example cases in the Hive code base. If driver.destroy is used, there is some very weird behavior.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-58937829
  
    Okay then how about this to keep it simple:
    
    1. Add the `parquet-hive-bundle` dependency to `hive/pom.xml` inside a block for the `hive-0.12.0` profile only.
    2. Change the default values of `hive.version`, `hive.short.version`, etc. to 0.13.1
    3. Update the instructions as follows:
    
    ```
    # Apache Hadoop 2.4.X with Hive 13 support
    mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package
    
    # Apache Hadoop 2.4.X with Hive 12 support
    mvn -Pyarn -Phive-0.12.0 -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package
    ```
    
    The issue is that we can't rely on dynamic profile activation because it messes up people linking against the `spark-hive` artifact. Many build tools like SBT do not support loading profiles in poms from other projects.
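
    For point 1, a sketch of what that pom block could look like (the groupId and version here are assumptions and would need to match the published artifact):
    
    ```xml
    <profile>
      <id>hive-0.12.0</id>
      <dependencies>
        <dependency>
          <groupId>com.twitter</groupId>
          <artifactId>parquet-hive-bundle</artifactId>
          <version>1.5.0</version>
        </dependency>
      </dependencies>
    </profile>
    ```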




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17805519
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScan.scala ---
    @@ -35,6 +35,7 @@ import org.apache.spark.sql.catalyst.types.{BooleanType, DataType}
     import org.apache.spark.sql.execution._
     import org.apache.spark.sql.hive._
     
    +
    --- End diff --
    
    Remove these spurious changes.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-58995591
  
    @marmbrus from a build perspective this LGTM with the caveat that right now it's only passing Hive compatibility for 0.12 tests and may require further modification to actually pass 0.13 tests. Up to you in terms of whether that blocks merging.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17812902
  
    --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,147 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
    +import scala.language.implicitConversions
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
    +import scala.collection.JavaConversions._
    +import org.apache.spark.Logging
    +import org.apache.hadoop.hive.serde2.ColumnProjectionUtils
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.hive.ql.metadata.Partition
    +import org.apache.hadoop.{io => hadoopIo}
    +import org.apache.hadoop.hive.conf.HiveConf
    +import org.apache.hadoop.hive.common.StatsSetupConst
    +import org.apache.hadoop.mapred.InputFormat
    +import java.util.Properties
    +import org.apache.hadoop.hive.serde2.Deserializer
    +
    +/* hive-0.13.1 support shim layer */
    +object HiveShim {
    +  val version = "0.13.1"
    +  /*
    +   * hive-0.13 supports DECIMAL(precision, scale); DECIMAL in hive-0.12 is actually DECIMAL(10,0).
    +   * Full support of the new decimal feature needs to be fixed in a separate PR.
    +   */
    +  val metastoreDecimal = "decimal(10,0)"
    +  def getTableDesc(serdeClass: Class[_ <: Deserializer], inputFormatClass: Class[_ <: InputFormat[_, _]], outputFormatClass: Class[_], properties: Properties) = {
    +    new TableDesc(inputFormatClass, outputFormatClass, properties)
    +  }
    +  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
    +  def createDefaultDBIfNeeded(context: HiveContext) ={
    +    context.runSqlHive("CREATE DATABASE default")
    +    context.runSqlHive("USE default")
    +  }
    +  /*handle the difference in HiveQuerySuite*/
    +  def getEmptyCommentsFieldValue = ""
    +
    +  def convertCatalystString2Hive(s: String) = s
    +
    +  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
    +    CommandProcessorFactory.get(cmd, conf)
    +  }
    +
    +  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
    +    HiveDecimal.create(bd)
    +  }
    +
    +  /*
    +   * This function became private in hive-0.13, but we have to do this to work around a Hive bug.
    +   * */
    +  private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
    +    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
    +    val result: StringBuilder = new StringBuilder(old)
    +    var first: Boolean = old.isEmpty
    +
    +    for (col <- cols) {
    +      if (first) {
    +        first = false
    +      }
    +      else {
    +        result.append(',')
    +      }
    +      result.append(col)
    +    }
    +    conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, result.toString)
    +  }
    +
    +  /*
    +   * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids is null or empty
    +   * */
    +  def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: Seq[String]) {
    +    if (ids != null && ids.size > 0) {
    +      ColumnProjectionUtils.appendReadColumns(conf, ids)
    +    } else {
    +      appendReadColumnNames(conf, names)
    +    }
    +  }
    +
    +  /*
    +   * Bug introduced in hive-0.13: FileSinkDesc is serializable, but its member path is not.
    +   * Fix it through a wrapper.
    +   * */
    +  implicit def wrapperToFileSinkDesc(w: ShimFileSinkDesc): FileSinkDesc = {
    +    var f = new FileSinkDesc(new Path(w.dir), w.tableInfo, w.compressed)
    +    f.setCompressed(w.compressed)
    +    f.setCompressCodec(w.compressCodec)
    +    f.setCompressType(w.compressType)
    +    f.setTableInfo(w.tableInfo)
    +    f
    +  }
    +
    +  implicit class wrapperToPartition(p: Partition) {
    +    def getDataLocationPath: Path = p.getDataLocation
    +  }
    +}
    +
    +class ShimContext(conf: Configuration) extends Context(conf) {
    +  def  getExternalTmpPath(path: Path): Path = {
    +    super.getExternalTmpPath (path.toUri)
    +  }
    +}
    +
    +class ShimFileSinkDesc(var dir: String, var tableInfo: TableDesc, var compressed: Boolean)
    --- End diff --
    
    Why did you choose inheritance for the Context but implicits for the FileSinkDesc?  I'm pretty wary of implicits that do any massaging of types...




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-55692499
  
    I am not very familiar with the process. Can some committer help commit the code, or do I need to do anything else for the code to go upstream? Please let me know.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54254223
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19628/consoleFull) for   PR 2241 at commit [`94b4fdc`](https://github.com/apache/spark/commit/94b4fdc8600cc6d5d4fe338e293be74081a0e2d4).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17805479
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TestHive.scala ---
    @@ -77,7 +78,7 @@ class TestHiveContext(sc: SparkContext) extends HiveContext(sc) {
     
       // For some hive test case which contain ${system:test.tmp.dir}
       System.setProperty("test.tmp.dir", testTempDir.getCanonicalPath)
    -
    +  CommandProcessorFactory.clean(hiveconf);
    --- End diff --
    
    No `;`.  Also a comment about what is happening here.  Is this thread safe?




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17805441
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
    @@ -212,7 +214,13 @@ private[hive] object HiveQl {
       /**
        * Returns the AST for the given SQL string.
        */
    -  def getAst(sql: String): ASTNode = ParseUtils.findRootNonNullToken((new ParseDriver).parse(sql))
    +  def getAst(sql: String): ASTNode = {
    +    val hContext = new Context(new HiveConf())
    +    val node = ParseUtils.findRootNonNullToken((new ParseDriver).parse(sql, hContext))
    +    hContext.clear
    --- End diff --
    
    Use `()`.  Maybe also a comment about what is happening here.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-59880136
  
    @scwf, which PR?




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-59145387
  
    @marmbrus I actually tested the code quite exhaustively, in both unit tests and system tests, in a sandbox and on a real cluster, and didn't see major issues. Regarding the compatibility tests, there are several test case failures due to some hive 0.13 internal behavior changes, e.g., hive decimal. We can fix those in follow-ups. In my view, it would be good to take an incremental approach: the current patch has no impact on existing hive-0.12 support, but enables the community to actively improve hive-0.13 support.
    
    Some immediate benefits: 1st, native Parquet support; 2nd, some new UDFs in hive 0.13; and 3rd, better support for ORC as a source, e.g., compression, predicate pushdown, etc.
    
    Please let me know if you have any other concerns.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-57574745
  
    @pwendell I think the packaging has some problem, probably in protobuf. I ran some test suites, but they cannot get through; with the original package, the tests are OK. Following are some example failure cases.
    
    sbt/sbt -Dhive.version=0.13.1 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 "test-only org.apache.spark.sql.hive.CachedTableSuite"
    Caused by: sbt.ForkMain$ForkError: com.google.protobuf_spark.GeneratedMessage
    	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    
    sbt/sbt -Dhive.version=0.13.1 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 "test-only org.apache.spark.sql.hive.execution.HiveQuerySuite"
    [info]   ...
    [info]   Cause: java.lang.ClassNotFoundException: com.google.protobuf_spark.GeneratedMessage
    [info]   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    
    sbt/sbt -Dhive.version=0.13.1 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 "test-only org.apache.spark.sql.parquet.ParquetMetastoreSuite"
    [info]   ...
    [info]   Cause: java.lang.ClassNotFoundException: com.google.protobuf_spark.GeneratedMessage
    [info]   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-57273157
  
    Hey @zhzhan I've published a modified version of Hive 0.13 that we can link against. A few benefits:
    
    1. I fixed the hive-exec jar so it only contains hive packages and not a bunch of other code.
    2. I'm using a shaded version of the protobuf dependency (otherwise, this interferes with the protobuf found in older hadoop client versions).
    
    https://oss.sonatype.org/content/repositories/orgspark-project-1077/org/spark-project/hive/
    
    For now you'll need to add this repo to the build to use it, but if it all works I can just fully publish this to maven central.
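
    For reference, pointing an sbt build at that staging repository could look roughly like the following sketch (the resolver name "spark-project-staging" is made up; the URL is the one above):

        // In the sbt build definition; the resolver name is arbitrary.
        resolvers += "spark-project-staging" at
          "https://oss.sonatype.org/content/repositories/orgspark-project-1077/"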




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17805395
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
    @@ -39,6 +38,8 @@ import org.apache.spark.sql.columnar.InMemoryRelation
     import org.apache.spark.sql.hive.execution.HiveTableScan
     import org.apache.spark.util.Utils
     
    +import org.apache.spark.sql.hive.HiveShim
    +import org.apache.spark.sql.hive.HiveShim._
    --- End diff --
    
    I wouldn't import `HiveShim._` if you are going to qualify all of the accesses below with `HiveShim` anyway (which I think is a good idea).




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54383233
  
    PySpark test suites are somewhat flaky and sometimes fail; it's unrelated to your changes.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54691357
  
    Jenkins is not working very well because of an accidental power outage yesterday, and people are working on recovering it.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54383250
  
    retest this please




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-58929975
  
    @zhzhan could you give a reproduction of the compatibility issue around parquet support? It didn't occur at compile time, but maybe there is an issue at runtime.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18439490
  
    --- Diff: pom.xml ---
    @@ -1260,7 +1259,18 @@
             </dependency>
           </dependencies>
         </profile>
    -
    +    <profile>
    +      <id>hive-default</id>
    +      <activation>
    +        <property>
    +          <name>!hive.version</name>
    +        </property>
    +      </activation>
    +      <properties>
    +        <hive.version>0.12.0</hive.version>
    +        <derby.version>10.4.2.0</derby.version>
    +      </properties>
    +    </profile>
         <profile>
           <id>hive</id>
    --- End diff --
    
    The original patch took that approach, but with this new one we can support different hive versions without changing the pom file further.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17805610
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
    @@ -40,6 +39,9 @@ import org.apache.spark.rdd.RDD
     import org.apache.spark.sql.catalyst.expressions.Row
     import org.apache.spark.sql.execution.{SparkPlan, UnaryNode}
     import org.apache.spark.sql.hive.{HiveContext, MetastoreRelation, SparkHiveHadoopWriter}
    +import org.apache.spark.sql.hive.{ShimFileSinkDesc => FileSinkDesc, ShimContext => Context}
    +import org.apache.spark.sql.hive.HiveShim
    +import org.apache.spark.sql.hive.HiveShim._
    --- End diff --
    
    Remove this.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17808556
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
    @@ -39,6 +38,8 @@ import org.apache.spark.sql.columnar.InMemoryRelation
     import org.apache.spark.sql.hive.execution.HiveTableScan
     import org.apache.spark.util.Utils
     
    +import org.apache.spark.sql.hive.HiveShim
    +import org.apache.spark.sql.hive.HiveShim._
    --- End diff --
    
    Seems it is trying to load those implicit methods?




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18186480
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
    @@ -284,24 +287,20 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
           val cmd_trimmed: String = cmd.trim()
           val tokens: Array[String] = cmd_trimmed.split("\\s+")
           val cmd_1: String = cmd_trimmed.substring(tokens(0).length()).trim()
    -      val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), hiveconf)
    -
    -      SessionState.start(sessionState)
    +      val proc: CommandProcessor = HiveShim.getCommandProcessor(Array(tokens(0)), hiveconf)
     
           proc match {
             case driver: Driver =>
    -          driver.init()
    -
               val results = new JArrayList[String]
               val response: CommandProcessorResponse = driver.run(cmd)
               // Throw an exception if there is an error in query processing.
               if (response.getResponseCode != 0) {
    -            driver.destroy()
    +            driver.close()
                 throw new QueryExecutionException(response.getErrorMessage)
               }
               driver.setMaxRows(maxRows)
               driver.getResults(results)
    -          driver.destroy()
    +          driver.close()
    --- End diff --
    
    My understanding is that the driver is not removed from `mapDrivers` of CommandProcessorFactory, and will be reused.
    
    `CommandProcessorFactory.clean(hiveConf)` will destroy the driver and remove it from `mapDrivers`.
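
    A minimal sketch of that lifecycle (assuming the Hive 0.13.1 CommandProcessorFactory API quoted elsewhere in this thread; `runAndRelease` is a made-up name):

        import org.apache.hadoop.hive.conf.HiveConf
        import org.apache.hadoop.hive.ql.Driver
        import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory

        // get() caches a Driver per HiveConf; close() alone leaves it cached for
        // reuse, while clean(conf) destroys it and evicts it from the map.
        def runAndRelease(conf: HiveConf): Unit = {
          CommandProcessorFactory.get(Array("select"), conf) match {
            case driver: Driver =>
              // ... driver.run(...), driver.getResults(...) ...
              driver.close()                    // driver stays in the factory's map
            case _ =>
          }
          CommandProcessorFactory.clean(conf)   // destroy + remove from the map
        }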




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18430463
  
    --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,158 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import java.util.Properties
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.common.StatsSetupConst
    +import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
    +import org.apache.hadoop.hive.conf.HiveConf
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
    +import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
    +import org.apache.hadoop.mapred.InputFormat
    +import org.apache.spark.Logging
    +import org.apache.hadoop.{io => hadoopIo}
    +import scala.collection.JavaConversions._
    +import scala.language.implicitConversions
    +
    +/**
    + * A compatibility layer for interacting with Hive version 0.13.1.
    + */
    +private[hive] object HiveShim {
    +  val version = "0.13.1"
    +  /*
    +   * TODO: hive-0.13 support DECIMAL(precision, scale), DECIMAL in hive-0.12 is actually DECIMAL(10,0)
    --- End diff --
    
    Can you double-check it? I am not sure "DECIMAL in hive-0.12 is actually DECIMAL(10,0)"; from the code, it seems the precision is unbounded.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17833372
  
    --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,147 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
    +import scala.language.implicitConversions
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
    +import scala.collection.JavaConversions._
    +import org.apache.spark.Logging
    +import org.apache.hadoop.hive.serde2.ColumnProjectionUtils
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.hive.ql.metadata.Partition
    +import org.apache.hadoop.{io => hadoopIo}
    +import org.apache.hadoop.hive.conf.HiveConf
    +import org.apache.hadoop.hive.common.StatsSetupConst
    +import org.apache.hadoop.mapred.InputFormat
    +import java.util.Properties
    +import org.apache.hadoop.hive.serde2.Deserializer
    +
    +/* hive-0.13.1 support shim layer */
    +object HiveShim {
    +  val version = "0.13.1"
    +  /*
    +   * hive-0.13 supports DECIMAL(precision, scale); DECIMAL in hive-0.12 is actually DECIMAL(10,0).
    +   * Full support of the new decimal feature needs to be fixed in a separate PR.
    +   */
    +  val metastoreDecimal = "decimal(10,0)"
    +  def getTableDesc(serdeClass: Class[_ <: Deserializer], inputFormatClass: Class[_ <: InputFormat[_, _]], outputFormatClass: Class[_], properties: Properties) = {
    +    new TableDesc(inputFormatClass, outputFormatClass, properties)
    +  }
    +  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
    +  def createDefaultDBIfNeeded(context: HiveContext) ={
    +    context.runSqlHive("CREATE DATABASE default")
    +    context.runSqlHive("USE default")
    +  }
    +  /*handle the difference in HiveQuerySuite*/
    +  def getEmptyCommentsFieldValue = ""
    +
    +  def convertCatalystString2Hive(s: String) = s
    +
    +  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
    +    CommandProcessorFactory.get(cmd, conf)
    +  }
    +
    +  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
    +    HiveDecimal.create(bd)
    +  }
    +
    +  /*
    +   * This function became private in hive-0.13; we reimplement it here to work around a hive bug.
    +   */
    +  private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
    +    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
    +    val result: StringBuilder = new StringBuilder(old)
    +    var first: Boolean = old.isEmpty
    +
    +    for (col <- cols) {
    +      if (first) {
    +        first = false
    +      }
    +      else {
    +        result.append(',')
    +      }
    +      result.append(col)
    +    }
    +    conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, result.toString)
    +  }
    +
    +  /*
    +   * Cannot use ColumnProjectionUtils.appendReadColumns directly when ids is null or empty.
    +   */
    +  def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: Seq[String]) {
    +    if (ids != null && ids.size > 0) {
    +      ColumnProjectionUtils.appendReadColumns(conf, ids)
    +    } else {
    +      appendReadColumnNames(conf, names)
    +    }
    +  }
    +
    +  /*
    +   * Bug introduced in hive-0.13: FileSinkDesc is serializable, but its member path is not.
    +   * Fix it through a wrapper.
    +   */
    +  implicit def wrapperToFileSinkDesc(w: ShimFileSinkDesc): FileSinkDesc = {
    +    var f = new FileSinkDesc(new Path(w.dir), w.tableInfo, w.compressed)
    +    f.setCompressed(w.compressed)
    +    f.setCompressCodec(w.compressCodec)
    +    f.setCompressType(w.compressType)
    +    f.setTableInfo(w.tableInfo)
    +    f
    +  }
    +
    +  implicit class wrapperToPartition(p: Partition) {
    +    def getDataLocationPath: Path = p.getDataLocation
    +  }
    +}
    +
    +class ShimContext(conf: Configuration) extends Context(conf) {
    +  def getExternalTmpPath(path: Path): Path = {
    +    super.getExternalTmpPath(path.toUri)
    +  }
    +}
    +
    +class ShimFileSinkDesc(var dir: String, var tableInfo: TableDesc, var compressed: Boolean)
    --- End diff --
    
    ShimFileSinkDesc is a workaround for the FileSinkDesc bug introduced in hive 0.13.1. Compared to other techniques, this way is straightforward; otherwise, we would have to somehow serialize the path on the driver side and reconstruct it on the executor side, which is more complicated. ShimContext needs to handle two changes in hive 0.13.1: the input parameter type change and the function name change.
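
    To illustrate (a sketch only, assuming the shim classes from this diff; `stageDesc` and the staging path are made up): the serializable wrapper is what travels with the task, and the implicit view rebuilds Hive's FileSinkDesc only where the Hive API actually requires one.

        import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
        import org.apache.spark.sql.hive.ShimFileSinkDesc
        import org.apache.spark.sql.hive.HiveShim._  // wrapperToFileSinkDesc

        def stageDesc(tableInfo: TableDesc): FileSinkDesc = {
          // Ship the path as a plain String inside the serializable wrapper...
          val shimDesc = new ShimFileSinkDesc("/tmp/hive-staging", tableInfo, false)
          // ...and let the implicit view reconstruct the FileSinkDesc (with a
          // real Path) at the point where one is needed.
          shimDesc
        }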




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18433827
  
    --- Diff: sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,84 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import java.net.URI
    +import java.util.Properties
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.common.`type`.HiveDecimal
    +import org.apache.hadoop.hive.conf.HiveConf
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.hive.ql.metadata.{Hive, Partition, Table}
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import org.apache.hadoop.hive.ql.processors._
    +import org.apache.hadoop.hive.ql.stats.StatsSetupConst
    +import org.apache.hadoop.hive.serde2.{Deserializer, ColumnProjectionUtils}
    +import org.apache.hadoop.{io => hadoopIo}
    +import org.apache.hadoop.mapred.InputFormat
    +import scala.collection.JavaConversions._
    +import scala.language.implicitConversions
    +
    +/**
    + * A compatibility layer for interacting with Hive version 0.12.0.
    + */
    +private[hive] object HiveShim {
    +  val version = "0.12.0"
    +  val metastoreDecimal = "decimal"
    +
    +  def getTableDesc(
    +    serdeClass: Class[_ <: Deserializer],
    +    inputFormatClass: Class[_ <: InputFormat[_, _]],
    +    outputFormatClass: Class[_],
    +    properties: Properties) = {
    +    new TableDesc(serdeClass, inputFormatClass, outputFormatClass, properties)
    --- End diff --
    
    we need to return a TableDesc to MetastoreRelation.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-58956296
  
    @pwendell 
    With this new patch, we actually have to change the test script to pass all Hive-related test suites. We need help from somebody familiar with the test infra to add -Phive-0.12.0 (is changing dev/run-tests enough?).
    
    Regarding the compatibility tests, the query plans in hive-0.13 are different from hive-0.12. At the same time, there are some new features and internal changes, for example decimal support, GenericUDFRound, etc. But their scope is much smaller than this one, and we can fix them in follow-ups.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18805288
  
    --- Diff: sql/hive/pom.xml ---
    @@ -155,6 +162,25 @@
             <artifactId>scalatest-maven-plugin</artifactId>
           </plugin>
     
    +          <plugin>
    --- End diff --
    
    The indentation seems off here. It might be my fault from my earlier patch.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-59001081
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21718/
    Test PASSed.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-60148210
  
    @marmbrus Thanks for the comments. 
    
    Given that we have to support hive-0.12, there are two approaches I can think of to address the issue.
    
    1st: we can temporarily make the compatibility test hive-0.12 only in the pom, and find a good way, as a follow-up, to add a corresponding compatibility test for hive-0.13 in an elegant way. This approach also unblocks some other JIRAs and builds the foundation for further development of hive-0.13 feature support.
    
    2nd: we can create a set of separate files for hive-0.13, e.g., a compatibility suite, golden plans, and golden answers, which may involve hundreds of files. In addition, we would need to change the current basic Hive test code to adapt to different hive versions. I think this approach may be a little rushed, and it would also make the scope of this PR really big and hard to maintain.
    
    I prefer the first approach, opening follow-up JIRAs to address the leftovers in a more granular way.
    
    Please let me know what you think about it. If you have other options, please also let me know.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-59839556
  
    Thanks; if you have any comments, let me know :)




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17808871
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
    @@ -140,7 +141,7 @@ class HadoopTableReader(
           filterOpt: Option[PathFilter]): RDD[Row] = {
         val hivePartitionRDDs = partitionToDeserializer.map { case (partition, partDeserializer) =>
           val partDesc = Utilities.getPartitionDesc(partition)
    -      val partPath = partition.getPartitionPath
    +      val partPath = partition.getDataLocationPath
    --- End diff --
    
    Oh, you meant to call the method in the implicit class `wrapperToPartition`?
    
    btw, it seems `getPartitionPath` is not in Hive 0.13.1, but it does appear in trunk...
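
    For reference, a sketch of the intended call-site usage (`partitionPath` is a made-up name):

        import org.apache.hadoop.fs.Path
        import org.apache.hadoop.hive.ql.metadata.Partition
        import org.apache.spark.sql.hive.HiveShim._  // wrapperToPartition

        // Keeps call sites close to the old getPartitionPath shape, delegating
        // to the getDataLocation that Hive 0.13.1 does provide.
        def partitionPath(partition: Partition): Path =
          partition.getDataLocationPath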





[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54694343
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-56243807
  
    Hi @zhzhan, thanks for working on this! And sorry that it took me a while to look at it. I made several comments inline.
    
    I'll let @pwendell comment on the changes to the build file.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54238775
  
    ok to test




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18434383
  
    --- Diff: pom.xml ---
    @@ -1260,7 +1259,18 @@
             </dependency>
           </dependencies>
         </profile>
    -
    +    <profile>
    +      <id>hive-default</id>
    +      <activation>
    +        <property>
    +          <name>!hive.version</name>
    +        </property>
    +      </activation>
    +      <properties>
    +        <hive.version>0.12.0</hive.version>
    +        <derby.version>10.4.2.0</derby.version>
    +      </properties>
    +    </profile>
         <profile>
           <id>hive</id>
    --- End diff --
    
    Do we need the hive profile? Since we only support hive-0.12.0 and hive-0.13.1, I think providing two profiles here is OK: profile hive-0.12 for version 0.12.0 and profile hive-0.13 for version 0.13.1.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by adrian-wang <gi...@git.apache.org>.
Github user adrian-wang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17646831
  
    --- Diff: sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,85 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import java.util.Properties
    +import scala.language.implicitConversions
    +import org.apache.hadoop.hive.ql.metadata.Partition
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import scala.collection.JavaConversions._
    +import org.apache.hadoop.hive.serde2.{Deserializer, ColumnProjectionUtils}
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.common.`type`.HiveDecimal
    +import java.net.URI
    +import org.apache.hadoop.{io => hadoopIo}
    +import org.apache.hadoop.hive.ql.stats.StatsSetupConst
    +import org.apache.hadoop.mapred.InputFormat
    +import org.apache.hadoop.hive.ql.metadata.Hive
    +import org.apache.hadoop.hive.ql.metadata.Table
    +import org.apache.hadoop.hive.ql.processors._
    +import org.apache.hadoop.hive.conf.HiveConf
    +
    +/* hive-0.12.0 support shim layer */
    +object HiveShim {
    +  val version = "0.12.0"
    +  val metastoreDecimal = "decimal"
    +  def getTableDesc(serdeClass: Class[_ <: Deserializer], inputFormatClass: Class[_ <: InputFormat[_, _]], outputFormatClass: Class[_], properties: Properties) = {
    +    new TableDesc(serdeClass, inputFormatClass, outputFormatClass, properties)
    +  }
    +  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
    +  def createDefaultDBIfNeeded(context: HiveContext) ={  }
    +
    +  /*handle the difference in "None" and empty ""*/
    +  def getEmptyCommentsFieldValue = "None"
    +
    +  def convertCatalystString2Hive(s: String) = new hadoopIo.Text(s)
    --- End diff --
    
    Maybe we still need to move the comment for the `new hadoopIo.Text` call to here; it was at line 140 in `HiveInspectors.scala`.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-57041188
  
    @yhuai I tried some 0.13.1 features, for example collect_list, native Parquet file support, and limited ORC file support.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-58993065
  
    Test this please




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-59879567
  
    We can reproduce the golden answers for hive 0.13 as I did in my closed PR; how about that?




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-58927724
  
    Jenkins, test this please.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17832786
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TestHive.scala ---
    @@ -77,7 +78,7 @@ class TestHiveContext(sc: SparkContext) extends HiveContext(sc) {
     
       // For some hive test case which contain ${system:test.tmp.dir}
       System.setProperty("test.tmp.dir", testTempDir.getCanonicalPath)
    -
    +  CommandProcessorFactory.clean(hiveconf);
    --- End diff --
    
    This is for cleanup; otherwise, I observe a memory leak in hive-0.13.1. Following is the function:
      public static void clean(HiveConf conf) {
        Driver drv = mapDrivers.get(conf);
        if (drv != null) {
          drv.destroy();
        }
    
        mapDrivers.remove(conf);
      }





[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17865634
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
    @@ -40,6 +39,9 @@ import org.apache.spark.rdd.RDD
     import org.apache.spark.sql.catalyst.expressions.Row
     import org.apache.spark.sql.execution.{SparkPlan, UnaryNode}
     import org.apache.spark.sql.hive.{HiveContext, MetastoreRelation, SparkHiveHadoopWriter}
    +import org.apache.spark.sql.hive.{ShimFileSinkDesc => FileSinkDesc, ShimContext => Context}
    +import org.apache.spark.sql.hive.HiveShim
    +import org.apache.spark.sql.hive.HiveShim._
    --- End diff --
    
    It is for the implicit conversions.
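
    The distinction, in a small sketch (the object name is made up): the plain import is enough for qualified member access, but the implicit views only apply when brought into scope with the wildcard import.

        import org.apache.spark.sql.hive.HiveShim    // enough for HiveShim.version
        import org.apache.spark.sql.hive.HiveShim._  // required for the implicit views,
                                                     // e.g. ShimFileSinkDesc => FileSinkDesc

        object ImportScopeSketch {
          val shimVersion: String = HiveShim.version
        }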




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17812813
  
    --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,147 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
    +import scala.language.implicitConversions
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
    +import scala.collection.JavaConversions._
    +import org.apache.spark.Logging
    +import org.apache.hadoop.hive.serde2.ColumnProjectionUtils
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.hive.ql.metadata.Partition
    +import org.apache.hadoop.{io => hadoopIo}
    +import org.apache.hadoop.hive.conf.HiveConf
    +import org.apache.hadoop.hive.common.StatsSetupConst
    +import org.apache.hadoop.mapred.InputFormat
    +import java.util.Properties
    +import org.apache.hadoop.hive.serde2.Deserializer
    +
     +/* hive-0.13.1 support shim layer */
    +object HiveShim {
    +  val version = "0.13.1"
     +  /*
     +   * hive-0.13 supports DECIMAL(precision, scale); DECIMAL in hive-0.12 is actually DECIMAL(10,0).
     +   * Full support of the new decimal feature needs to be fixed in a separate PR.
     +   */
    +  val metastoreDecimal = "decimal(10,0)"
    +  def getTableDesc(serdeClass: Class[_ <: Deserializer], inputFormatClass: Class[_ <: InputFormat[_, _]], outputFormatClass: Class[_], properties: Properties) = {
    +    new TableDesc(inputFormatClass, outputFormatClass, properties)
    +  }
    +  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
     +  def createDefaultDBIfNeeded(context: HiveContext) = {
    +    context.runSqlHive("CREATE DATABASE default")
    +    context.runSqlHive("USE default")
    +  }
    +  /*handle the difference in HiveQuerySuite*/
    +  def getEmptyCommentsFieldValue = ""
    +
    +  def convertCatalystString2Hive(s: String) = s
    +
     +  def getCommandProcessor(cmd: Array[String], conf: HiveConf) = {
    +    CommandProcessorFactory.get(cmd, conf)
    +  }
    +
    +  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
    +    HiveDecimal.create(bd)
    +  }
    +
     +  /*
     +   * This function became private in hive-0.13, but we have to call it to work around a hive bug.
     +   */
    --- End diff --
    
    Fix formatting.
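
    For contrast, a hedged sketch of what the hive-0.12 counterpart under sql/hive/v0.12 might look like for the version-sensitive members above (bodies assumed from the hive-0.12 API, not taken from this PR):

        package org.apache.spark.sql.hive

        import org.apache.hadoop.hive.common.`type`.HiveDecimal

        /* hive-0.12.0 support shim layer: same object name and signatures */
        object HiveShim {
          val version = "0.12.0"

          // hive-0.12 has no DECIMAL(precision, scale) syntax, so the
          // metastore type string is plain "decimal".
          val metastoreDecimal = "decimal"

          // hive-0.12 still has a public constructor; HiveDecimal.create
          // is its hive-0.13 replacement.
          def createDecimal(bd: java.math.BigDecimal): HiveDecimal =
            new HiveDecimal(bd)
        }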




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by adrian-wang <gi...@git.apache.org>.
Github user adrian-wang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17646732
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUdfs.scala ---
    @@ -33,6 +31,7 @@ import org.apache.spark.util.Utils.getContextOrSparkClassLoader
     
     /* Implicit conversions */
     import scala.collection.JavaConversions._
    +import org.apache.spark.sql.hive.HiveShim
    --- End diff --
    
    We should reorder the imports into groups.
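
    For instance, Spark's usual grouping (java, scala, third-party, spark, separated by blank lines) would place the new import with the other spark imports rather than after the scala block:

        import java.util.Properties

        import scala.collection.JavaConversions._

        import org.apache.hadoop.hive.conf.HiveConf

        import org.apache.spark.sql.hive.HiveShim
        import org.apache.spark.util.Utils.getContextOrSparkClassLoader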




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54696239
  
    Conflicts resolved. Please retest.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-59962787
  
    A few comments: I'm not talking only about getting the current tests to pass, but also about upgrading the test set to include the new files. I also hope to update the whitelist to include any new tests that are now passing.

    I'm not particularly concerned about matching every small detail (e.g., we don't need to match the empty comment field on metadata based on version). What is important is that we can connect to both Hive 12 and 13 metastores and that queries run and return the correct answers.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17833036
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
    @@ -140,7 +141,7 @@ class HadoopTableReader(
           filterOpt: Option[PathFilter]): RDD[Row] = {
         val hivePartitionRDDs = partitionToDeserializer.map { case (partition, partDeserializer) =>
           val partDesc = Utilities.getPartitionDesc(partition)
    -      val partPath = partition.getPartitionPath
    +      val partPath = partition.getDataLocationPath
    --- End diff --
    
    You are right. getPartitionPath is not in hive-0.13.1, but it is in trunk, and we have to deal with it.
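
    In other words, this is another rename the shim layer can absorb, with one definition per version directory. A hedged sketch, assuming getPartitionPath (hive-0.12) and getDataLocation (hive-0.13.1) both yield a Path:

        import org.apache.hadoop.fs.Path
        import org.apache.hadoop.hive.ql.metadata.Partition

        // sql/hive/v0.12/.../Shim.scala
        def getDataLocationPath(p: Partition): Path = p.getPartitionPath

        // sql/hive/v0.13.1/.../Shim.scala
        def getDataLocationPath(p: Partition): Path = p.getDataLocation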




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17812784
  
    --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
    @@ -0,0 +1,147 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.hive
    +
    +import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
    +import scala.language.implicitConversions
    +import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
    +import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
    +import scala.collection.JavaConversions._
    +import org.apache.spark.Logging
    +import org.apache.hadoop.hive.serde2.ColumnProjectionUtils
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.Path
    +import org.apache.hadoop.hive.ql.Context
    +import org.apache.hadoop.hive.ql.metadata.Partition
    +import org.apache.hadoop.{io => hadoopIo}
    +import org.apache.hadoop.hive.conf.HiveConf
    +import org.apache.hadoop.hive.common.StatsSetupConst
    +import org.apache.hadoop.mapred.InputFormat
    +import java.util.Properties
    +import org.apache.hadoop.hive.serde2.Deserializer
    +
     +/* hive-0.13.1 support shim layer */
    +object HiveShim {
    +  val version = "0.13.1"
     +  /*
     +   * hive-0.13 supports DECIMAL(precision, scale); DECIMAL in hive-0.12 is actually DECIMAL(10,0).
     +   * Full support of the new decimal feature needs to be fixed in a separate PR.
     +   */
    --- End diff --
    
    Fix formatting. Add TODO.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r17808060
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
    @@ -176,7 +178,7 @@ case class InsertIntoHiveTable(
         // instances within the closure, since Serializer is not serializable while TableDesc is.
         val tableDesc = table.tableDesc
         val tableLocation = table.hiveQlTable.getDataLocation
    -    val tmpLocation = hiveContext.getExternalTmpFileURI(tableLocation)
    +    val tmpLocation = hiveContext.getExternalTmpPath(tableLocation)
    --- End diff --
    
    Is `getExternalTmpPath` in Hive 0.12? 
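
    Judging by the old line above, hive-0.12 only has `getExternalTmpFileURI`, which is presumably why the PR aliases `ShimContext => Context` elsewhere. A hedged sketch of how the two version shims could expose one name (signatures assumed, not taken from this PR):

        import java.net.URI
        import org.apache.hadoop.fs.Path
        import org.apache.hadoop.hive.ql.Context

        // sql/hive/v0.12/...: hive-0.12's Context deals in URI strings.
        def getExternalTmpPath(ctx: Context, uri: URI): String =
          ctx.getExternalTmpFileURI(uri)

        // sql/hive/v0.13.1/...: hive-0.13's Context deals in Path.
        def getExternalTmpPath(ctx: Context, path: Path): Path =
          ctx.getExternalTmpPath(path)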




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18430480
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
    @@ -212,7 +214,18 @@ private[hive] object HiveQl {
       /**
        * Returns the AST for the given SQL string.
        */
    -  def getAst(sql: String): ASTNode = ParseUtils.findRootNonNullToken((new ParseDriver).parse(sql))
    +  def getAst(sql: String): ASTNode = {
    +    /*
    +     * Context has to be passed in in hive0.13.1.
    --- End diff --
    
    in in
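
    The diff is truncated at the doc comment; a hedged sketch of how the getAst body presumably continues (a throwaway Context handed to the parser and cleared afterwards; Context here is the shim alias):

        def getAst(sql: String): ASTNode = {
          // hive-0.13.1's ParseDriver needs a Context, or it hits a
          // NullPointerException when reading properties from the HiveConf.
          val hContext = new Context(new HiveConf())
          val node = ParseUtils.findRootNonNullToken((new ParseDriver).parse(sql, hContext))
          hContext.clear()
          node
        }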




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2241#issuecomment-54249712
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19628/consoleFull) for PR 2241 at commit [`94b4fdc`](https://github.com/apache/spark/commit/94b4fdc8600cc6d5d4fe338e293be74081a0e2d4).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2241#discussion_r18430262
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TestHive.scala ---
    @@ -78,6 +79,7 @@ class TestHiveContext(sc: SparkContext) extends HiveContext(sc) {
       // For some hive test case which contain ${system:test.tmp.dir}
       System.setProperty("test.tmp.dir", testTempDir.getCanonicalPath)
     
    +  CommandProcessorFactory.clean(hiveconf)
    --- End diff --
    
    Since it is cleanup work, it seems better placed after `System.clearProperty("spark.hostPort")`. Also, please add a comment about what this call is doing and why it is needed.
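
    A sketch of the suggested ordering and comment, assuming the surrounding TestHiveContext initialization stays as-is:

        System.clearProperty("spark.hostPort")

        // hive-0.13.1's CommandProcessorFactory caches a Driver per HiveConf;
        // clean it up here so the cached Driver does not leak across the
        // HiveContexts created by different test suites.
        CommandProcessorFactory.clean(hiveconf)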

