You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2016/07/24 05:35:47 UTC

[GitHub] spark pull request #14331: [SPARK-16691][SQL] move BucketSpec to catalyst mo...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/14331

    [SPARK-16691][SQL] move BucketSpec to catalyst module and use it in CatalogTable

    ## What changes were proposed in this pull request?
    
    It's weird that we have `BucketSpec` to abstract bucket info, but don't use it in `CatalogTable`. This PR moves `BucketSpec` into catalyst module.
    
    
    ## How was this patch tested?
    
    existing tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark check

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14331.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14331
    
----
commit beefff2861ac142dd3a416a29cf101b39b11ac4d
Author: Wenchen Fan <we...@databricks.com>
Date:   2016-07-23T02:23:55Z

    move BucketSpec to catalyst module and use it in CatalogTable

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #14331: [SPARK-16691][SQL] move BucketSpec to catalyst mo...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14331#discussion_r71996213
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -365,9 +365,6 @@ private[hive] class HiveClientImpl(
             },
             schema = schema,
             partitionColumnNames = partCols.map(_.name),
    -        sortColumnNames = Seq(), // TODO: populate this
    -        bucketColumnNames = h.getBucketCols.asScala,
    -        numBuckets = h.getNumBuckets,
    --- End diff --
    
    Can we explicitly set `bucketSpec` to `None` at here? Also, let's add a comment to explain the reason that we do not populate this variable at here? 
    
    btw, I think we will not populate this info for a bucketed hive table because we are using different hash functions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    **[Test build #62777 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62777/consoleFull)** for PR 14331 at commit [`cc63f01`](https://github.com/apache/spark/commit/cc63f01b7362aa8c34d82ef0941bc8919e5f90b7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62763/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #14331: [SPARK-16691][SQL] move BucketSpec to catalyst mo...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14331#discussion_r71982334
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -498,23 +498,19 @@ case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isF
       }
     
       private def describeBucketingInfo(metadata: CatalogTable, buffer: ArrayBuffer[Row]): Unit = {
    -    def appendBucketInfo(numBuckets: Int, bucketColumns: Seq[String], sortColumns: Seq[String]) = {
    -      append(buffer, "Num Buckets:", numBuckets.toString, "")
    -      append(buffer, "Bucket Columns:", bucketColumns.mkString("[", ", ", "]"), "")
    -      append(buffer, "Sort Columns:", sortColumns.mkString("[", ", ", "]"), "")
    +    def appendBucketInfo(bucketSpec: Option[BucketSpec]) = bucketSpec match {
    +      case Some(BucketSpec(numBuckets, bucketColumnNames, sortColumnNames)) =>
    +        append(buffer, "Num Buckets:", numBuckets.toString, "")
    +        append(buffer, "Bucket Columns:", bucketColumnNames.mkString("[", ", ", "]"), "")
    +        append(buffer, "Sort Columns:", sortColumnNames.mkString("[", ", ", "]"), "")
    +
    +      case _ =>
         }
     
    -    DDLUtils.getBucketSpecFromTableProperties(metadata) match {
    -      case Some(bucketSpec) =>
    -        appendBucketInfo(
    -          bucketSpec.numBuckets,
    -          bucketSpec.bucketColumnNames,
    -          bucketSpec.sortColumnNames)
    -      case None =>
    -        appendBucketInfo(
    -          metadata.numBuckets,
    -          metadata.bucketColumnNames,
    -          metadata.sortColumnNames)
    +    if (DDLUtils.isDatasourceTable(metadata)) {
    +      appendBucketInfo(DDLUtils.getBucketSpecFromTableProperties(metadata))
    +    } else {
    +      appendBucketInfo(metadata.bucketSpec)
    --- End diff --
    
    If we do not want to populate the `bucketSpec` value, does that mean this line will always return nothing?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #14331: [SPARK-16691][SQL] move BucketSpec to catalyst mo...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14331#discussion_r71982187
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -764,10 +761,7 @@ private[hive] class HiveClientImpl(
           hiveTable.setFields(schema.asJava)
         }
         hiveTable.setPartCols(partCols.asJava)
    -    // TODO: set sort columns here too
    -    hiveTable.setBucketCols(table.bucketColumnNames.asJava)
    --- End diff --
    
    we don't support bucketed hive table now, and I think we never will, because we have different hash implementation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62798/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    Merging to master, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    **[Test build #62776 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62776/consoleFull)** for PR 14331 at commit [`59321d0`](https://github.com/apache/spark/commit/59321d003c1394c47dba5894d9363fc9ea65a803).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    **[Test build #62798 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62798/consoleFull)** for PR 14331 at commit [`069419d`](https://github.com/apache/spark/commit/069419d4048e05f829618d124acc784cc6c01397).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #14331: [SPARK-16691][SQL] move BucketSpec to catalyst mo...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14331#discussion_r71989869
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
    @@ -498,23 +498,19 @@ case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isF
       }
     
       private def describeBucketingInfo(metadata: CatalogTable, buffer: ArrayBuffer[Row]): Unit = {
    -    def appendBucketInfo(numBuckets: Int, bucketColumns: Seq[String], sortColumns: Seq[String]) = {
    -      append(buffer, "Num Buckets:", numBuckets.toString, "")
    -      append(buffer, "Bucket Columns:", bucketColumns.mkString("[", ", ", "]"), "")
    -      append(buffer, "Sort Columns:", sortColumns.mkString("[", ", ", "]"), "")
    +    def appendBucketInfo(bucketSpec: Option[BucketSpec]) = bucketSpec match {
    +      case Some(BucketSpec(numBuckets, bucketColumnNames, sortColumnNames)) =>
    +        append(buffer, "Num Buckets:", numBuckets.toString, "")
    +        append(buffer, "Bucket Columns:", bucketColumnNames.mkString("[", ", ", "]"), "")
    +        append(buffer, "Sort Columns:", sortColumnNames.mkString("[", ", ", "]"), "")
    +
    +      case _ =>
         }
     
    -    DDLUtils.getBucketSpecFromTableProperties(metadata) match {
    -      case Some(bucketSpec) =>
    -        appendBucketInfo(
    -          bucketSpec.numBuckets,
    -          bucketSpec.bucketColumnNames,
    -          bucketSpec.sortColumnNames)
    -      case None =>
    -        appendBucketInfo(
    -          metadata.numBuckets,
    -          metadata.bucketColumnNames,
    -          metadata.sortColumnNames)
    +    if (DDLUtils.isDatasourceTable(metadata)) {
    +      appendBucketInfo(DDLUtils.getBucketSpecFromTableProperties(metadata))
    +    } else {
    +      appendBucketInfo(metadata.bucketSpec)
    --- End diff --
    
    For now, yes. But I don't wanna change the existing logic here, and we may support bucketed hive serde table in the future(not hive table).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    **[Test build #62776 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62776/consoleFull)** for PR 14331 at commit [`59321d0`](https://github.com/apache/spark/commit/59321d003c1394c47dba5894d9363fc9ea65a803).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #14331: [SPARK-16691][SQL] move BucketSpec to catalyst mo...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14331#discussion_r71982259
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -365,9 +365,6 @@ private[hive] class HiveClientImpl(
             },
             schema = schema,
             partitionColumnNames = partCols.map(_.name),
    -        sortColumnNames = Seq(), // TODO: populate this
    -        bucketColumnNames = h.getBucketCols.asScala,
    -        numBuckets = h.getNumBuckets,
    --- End diff --
    
    nvm. I did not see the above post.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #14331: [SPARK-16691][SQL] move BucketSpec to catalyst mo...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14331#discussion_r72009811
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala ---
    @@ -20,10 +20,11 @@ package org.apache.spark.sql.sources
     import java.io.File
     
     import org.apache.spark.sql._
    +import org.apache.spark.sql.catalyst.catalog.BucketSpec
     import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.plans.physical.HashPartitioning
     import org.apache.spark.sql.execution.DataSourceScanExec
    -import org.apache.spark.sql.execution.datasources.{BucketSpec, DataSourceStrategy}
    +import org.apache.spark.sql.execution.datasources.{DataSourceStrategy}
    --- End diff --
    
    Nit: Remove the braces.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    **[Test build #62763 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62763/consoleFull)** for PR 14331 at commit [`beefff2`](https://github.com/apache/spark/commit/beefff2861ac142dd3a416a29cf101b39b11ac4d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    **[Test build #62777 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62777/consoleFull)** for PR 14331 at commit [`cc63f01`](https://github.com/apache/spark/commit/cc63f01b7362aa8c34d82ef0941bc8919e5f90b7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62776/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #14331: [SPARK-16691][SQL] move BucketSpec to catalyst mo...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14331#discussion_r72009804
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
    @@ -173,8 +190,18 @@ case class CatalogTable(
       override def toString: String = {
         val tableProperties = properties.map(p => p._1 + "=" + p._2).mkString("[", ", ", "]")
         val partitionColumns = partitionColumnNames.map("`" + _ + "`").mkString("[", ", ", "]")
    -    val sortColumns = sortColumnNames.map("`" + _ + "`").mkString("[", ", ", "]")
    -    val bucketColumns = bucketColumnNames.map("`" + _ + "`").mkString("[", ", ", "]")
    +    val bucketStrings = bucketSpec match {
    +      case Some(BucketSpec(numBuckets, bucketColumnNames, sortColumnNames)) =>
    +        val bucketColumnsString = bucketColumnNames.map("`" + _ + "`").mkString("[", ", ", "]")
    +        val sortColumnsString = sortColumnNames.map("`" + _ + "`").mkString("[", ", ", "]")
    --- End diff --
    
    Use `quoteIdentifier` instead?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #14331: [SPARK-16691][SQL] move BucketSpec to catalyst mo...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/14331


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #14331: [SPARK-16691][SQL] move BucketSpec to catalyst mo...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14331#discussion_r72047779
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
    @@ -173,8 +190,18 @@ case class CatalogTable(
       override def toString: String = {
         val tableProperties = properties.map(p => p._1 + "=" + p._2).mkString("[", ", ", "]")
         val partitionColumns = partitionColumnNames.map("`" + _ + "`").mkString("[", ", ", "]")
    -    val sortColumns = sortColumnNames.map("`" + _ + "`").mkString("[", ", ", "]")
    -    val bucketColumns = bucketColumnNames.map("`" + _ + "`").mkString("[", ", ", "]")
    +    val bucketStrings = bucketSpec match {
    +      case Some(BucketSpec(numBuckets, bucketColumnNames, sortColumnNames)) =>
    +        val bucketColumnsString = bucketColumnNames.map("`" + _ + "`").mkString("[", ", ", "]")
    +        val sortColumnsString = sortColumnNames.map("`" + _ + "`").mkString("[", ", ", "]")
    --- End diff --
    
    `org.apache.spark.sql.catalyst.quoteIdentifier`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    **[Test build #62763 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62763/consoleFull)** for PR 14331 at commit [`beefff2`](https://github.com/apache/spark/commit/beefff2861ac142dd3a416a29cf101b39b11ac4d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #14331: [SPARK-16691][SQL] move BucketSpec to catalyst mo...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14331#discussion_r71982245
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -365,9 +365,6 @@ private[hive] class HiveClientImpl(
             },
             schema = schema,
             partitionColumnNames = partCols.map(_.name),
    -        sortColumnNames = Seq(), // TODO: populate this
    -        bucketColumnNames = h.getBucketCols.asScala,
    -        numBuckets = h.getNumBuckets,
    --- End diff --
    
    This PR replaces these attributes by `bucketSpec`. Just wondering why we do not need to populate the `bucketSpec`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    Overall LGTM, a few minor comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62777/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    cc @yhuai @liancheng 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    **[Test build #62798 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62798/consoleFull)** for PR 14331 at commit [`069419d`](https://github.com/apache/spark/commit/069419d4048e05f829618d124acc784cc6c01397).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #14331: [SPARK-16691][SQL] move BucketSpec to catalyst mo...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14331#discussion_r72009802
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
    @@ -110,6 +110,25 @@ case class CatalogTablePartition(
     
     
     /**
    + * A container for bucketing information.
    + * Bucketing is a technology for decomposing data sets into more manageable parts, and the number
    + * of buckets is fixed so it does not fluctuate with data.
    + *
    + * @param numBuckets number of buckets.
    + * @param bucketColumnNames the names of the columns that used to generate the bucket id.
    + * @param sortColumnNames the names of the columns that used to sort data in each bucket.
    + */
    +case class BucketSpec(
    +    numBuckets: Int,
    +    bucketColumnNames: Seq[String],
    +    sortColumnNames: Seq[String]) {
    +  if (numBuckets <= 0) {
    +    throw new AnalysisException(s"Expected positive number of buckets, but got `$numBuckets`.")
    +  }
    +}
    +
    +
    --- End diff --
    
    Nit: Remove extra newline.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62821/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    **[Test build #62821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62821/consoleFull)** for PR 14331 at commit [`b7b9946`](https://github.com/apache/spark/commit/b7b99467b3e10b92cf086f336851c8a5f9681d27).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #14331: [SPARK-16691][SQL] move BucketSpec to catalyst mo...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14331#discussion_r72010784
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
    @@ -173,8 +190,18 @@ case class CatalogTable(
       override def toString: String = {
         val tableProperties = properties.map(p => p._1 + "=" + p._2).mkString("[", ", ", "]")
         val partitionColumns = partitionColumnNames.map("`" + _ + "`").mkString("[", ", ", "]")
    -    val sortColumns = sortColumnNames.map("`" + _ + "`").mkString("[", ", ", "]")
    -    val bucketColumns = bucketColumnNames.map("`" + _ + "`").mkString("[", ", ", "]")
    +    val bucketStrings = bucketSpec match {
    +      case Some(BucketSpec(numBuckets, bucketColumnNames, sortColumnNames)) =>
    +        val bucketColumnsString = bucketColumnNames.map("`" + _ + "`").mkString("[", ", ", "]")
    +        val sortColumnsString = sortColumnNames.map("`" + _ + "`").mkString("[", ", ", "]")
    --- End diff --
    
    which `quoteIdentifier`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    **[Test build #62821 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62821/consoleFull)** for PR 14331 at commit [`b7b9946`](https://github.com/apache/spark/commit/b7b99467b3e10b92cf086f336851c8a5f9681d27).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14331: [SPARK-16691][SQL] move BucketSpec to catalyst module an...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14331
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org