Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2017/02/14 19:22:35 UTC

[GitHub] spark pull request #16931: [SPARK-19587][SQL] bucket sorting columns should ...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/16931

    [SPARK-19587][SQL] bucket sorting columns should not be picked from partition columns

    ## What changes were proposed in this pull request?
    
    We throw an exception if bucket columns are part of the partition columns; the same check should also apply to sort columns.
    
    This PR also moves the checking logic from `DataFrameWriter` to `PreprocessTableCreation`, which is the central place for checking and normalization.
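A minimal Scala sketch of the check described above follows. It assumes column names are plain strings that are matched case-insensitively. `BucketColumnCheck`, `normalize`, `rejectOverlap` and `check` are hypothetical names used only for illustration, so this is not the code in `PreprocessTableCreation`. It also throws `IllegalArgumentException` where Spark would raise an `AnalysisException`.

```scala
// Hypothetical sketch: names, signatures and messages are illustrative only,
// not Spark's API. Spark itself raises AnalysisException for these errors.
object BucketColumnCheck {

  // Resolve a user-supplied name against the schema case-insensitively and
  // return the spelling used in the schema.
  def normalize(schema: Seq[String], name: String, kind: String): String =
    schema.find(_.equalsIgnoreCase(name)).getOrElse {
      throw new IllegalArgumentException(
        s"$kind column $name not found in existing columns (${schema.mkString(", ")})")
    }

  // Fail if any of `cols` also appears among the (normalized) partition columns.
  private def rejectOverlap(cols: Seq[String], parts: Set[String], kind: String): Unit =
    cols.find(parts.contains).foreach { c =>
      throw new IllegalArgumentException(
        s"$kind column '$c' should not be part of partition columns")
    }

  // Normalize every column list first, then apply the overlap check to both
  // bucket columns and (the new part) bucket sort columns.
  def check(
      schema: Seq[String],
      partitionCols: Seq[String],
      bucketCols: Seq[String],
      sortCols: Seq[String]): Unit = {
    val parts = partitionCols.map(normalize(schema, _, "Partition")).toSet
    rejectOverlap(bucketCols.map(normalize(schema, _, "Bucketing")), parts, "Bucketing")
    rejectOverlap(sortCols.map(normalize(schema, _, "Sorting")), parts, "Bucket sorting")
  }
}
```

Normalizing before comparing matters here. A partition column `I` and a sort column `i` only turn out to be the same field once both are resolved against the schema. This sketch always matches case-insensitively, whereas Spark resolves names according to the session's case-sensitivity setting.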
    
    ## How was this patch tested?
    
    Updated the existing test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark bucket

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16931.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16931
    
----
commit 56b1b182f869451a72276ca8ad636dfa1d955cc2
Author: Wenchen Fan <we...@databricks.com>
Date:   2017-02-14T19:20:17Z

    bucket sorting columns should not be picked from partition columns

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16931: [SPARK-19587][SQL] bucket sorting columns should not be ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16931
  
    LGTM



[GitHub] spark pull request #16931: [SPARK-19587][SQL] bucket sorting columns should ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16931#discussion_r101315930
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -270,52 +269,17 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
             ifNotExists = false)).toRdd
       }
     
    -  private def normalizedParCols: Option[Seq[String]] = partitioningColumns.map { cols =>
    -    cols.map(normalize(_, "Partition"))
    -  }
    -
    -  private def normalizedBucketColNames: Option[Seq[String]] = bucketColumnNames.map { cols =>
    -    cols.map(normalize(_, "Bucketing"))
    -  }
    -
    -  private def normalizedSortColNames: Option[Seq[String]] = sortColumnNames.map { cols =>
    -    cols.map(normalize(_, "Sorting"))
    -  }
    -
       private def getBucketSpec: Option[BucketSpec] = {
         if (sortColumnNames.isDefined) {
           require(numBuckets.isDefined, "sortBy must be used together with bucketBy")
         }
     
    -    for {
    -      n <- numBuckets
    -    } yield {
    +    numBuckets.map { n =>
           require(n > 0 && n < 100000, "Bucket number must be greater than 0 and less than 100000.")
    --- End diff --
    
    feel free to submit one :)
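The quoted diff replaces a single-generator `for`/`yield` over an `Option` with `Option.map`. The Scala compiler desugars the former into the latter, so the change is stylistic only. The small sketch below illustrates this. `OptionMapEquivalence` and `checked` are hypothetical names, and `checked` is a stand-in for the body of `getBucketSpec`, which builds a `BucketSpec` rather than returning the count.

```scala
// Hypothetical sketch: shows that the old and new forms in the diff behave the same.
object OptionMapEquivalence {
  // Stand-in for the body of getBucketSpec; reuses the require(...) from the diff.
  private def checked(n: Int): Int = {
    require(n > 0 && n < 100000, "Bucket number must be greater than 0 and less than 100000.")
    n
  }

  // Old style: a single-generator for/yield over an Option.
  def viaFor(numBuckets: Option[Int]): Option[Int] =
    for {
      n <- numBuckets
    } yield checked(n)

  // New style: what the for/yield above desugars to.
  def viaMap(numBuckets: Option[Int]): Option[Int] =
    numBuckets.map { n => checked(n) }
}
```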



[GitHub] spark issue #16931: [SPARK-19587][SQL] bucket sorting columns should not be ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16931
  
     Merged build triggered.



[GitHub] spark issue #16931: [SPARK-19587][SQL] bucket sorting columns should not be ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16931
  
    Merged build started.



[GitHub] spark issue #16931: [SPARK-19587][SQL] bucket sorting columns should not be ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16931
  
    **[Test build #72889 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72889/testReport)** for PR 16931 at commit [`56b1b18`](https://github.com/apache/spark/commit/56b1b182f869451a72276ca8ad636dfa1d955cc2).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #16931: [SPARK-19587][SQL] bucket sorting columns should ...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16931



[GitHub] spark issue #16931: [SPARK-19587][SQL] bucket sorting columns should not be ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16931
  
    Another late LGTM



[GitHub] spark pull request #16931: [SPARK-19587][SQL] bucket sorting columns should ...

Posted by tejasapatil <gi...@git.apache.org>.

Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16931#discussion_r101206619
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -270,52 +269,17 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
             ifNotExists = false)).toRdd
       }
     
    -  private def normalizedParCols: Option[Seq[String]] = partitioningColumns.map { cols =>
    -    cols.map(normalize(_, "Partition"))
    -  }
    -
    -  private def normalizedBucketColNames: Option[Seq[String]] = bucketColumnNames.map { cols =>
    -    cols.map(normalize(_, "Bucketing"))
    -  }
    -
    -  private def normalizedSortColNames: Option[Seq[String]] = sortColumnNames.map { cols =>
    -    cols.map(normalize(_, "Sorting"))
    -  }
    -
       private def getBucketSpec: Option[BucketSpec] = {
         if (sortColumnNames.isDefined) {
           require(numBuckets.isDefined, "sortBy must be used together with bucketBy")
         }
     
    -    for {
    -      n <- numBuckets
    -    } yield {
    +    numBuckets.map { n =>
           require(n > 0 && n < 100000, "Bucket number must be greater than 0 and less than 100000.")
    --- End diff --
    
    Cool. I will submit a PR for that change once you land this one.



[GitHub] spark issue #16931: [SPARK-19587][SQL] bucket sorting columns should not be ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16931
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72889/
    Test FAILed.



[GitHub] spark pull request #16931: [SPARK-19587][SQL] bucket sorting columns should ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16931#discussion_r101338203
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/BucketedWriteSuite.scala ---
    @@ -169,19 +169,20 @@ class BucketedWriteSuite extends QueryTest with SQLTestUtils with TestHiveSingle
         }
       }
     
    -  test("write bucketed data with the overlapping bucketBy and partitionBy columns") {
    -    intercept[AnalysisException](df.write
    +  test("write bucketed data with the overlapping bucketBy/sortBy and partitionBy columns") {
    --- End diff --
    
    Not related to this PR, but I think we should move most of these test cases to the sql core module. Let me try to do it. Only the ORC format is Hive-only.



[GitHub] spark issue #16931: [SPARK-19587][SQL] bucket sorting columns should not be ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16931
  
    cc @tejasapatil @gatorsmile 



[GitHub] spark issue #16931: [SPARK-19587][SQL] bucket sorting columns should not be ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16931
  
    **[Test build #72902 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72902/testReport)** for PR 16931 at commit [`e21d8ae`](https://github.com/apache/spark/commit/e21d8ae55ef5352d86e5e699677b4506437b0db0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16931: [SPARK-19587][SQL] bucket sorting columns should not be ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16931
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72902/
    Test PASSed.



[GitHub] spark issue #16931: [SPARK-19587][SQL] bucket sorting columns should not be ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16931
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request #16931: [SPARK-19587][SQL] bucket sorting columns should ...

Posted by tejasapatil <gi...@git.apache.org>.

Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16931#discussion_r101170361
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -270,52 +269,17 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
             ifNotExists = false)).toRdd
       }
     
    -  private def normalizedParCols: Option[Seq[String]] = partitioningColumns.map { cols =>
    -    cols.map(normalize(_, "Partition"))
    -  }
    -
    -  private def normalizedBucketColNames: Option[Seq[String]] = bucketColumnNames.map { cols =>
    -    cols.map(normalize(_, "Bucketing"))
    -  }
    -
    -  private def normalizedSortColNames: Option[Seq[String]] = sortColumnNames.map { cols =>
    -    cols.map(normalize(_, "Sorting"))
    -  }
    -
       private def getBucketSpec: Option[BucketSpec] = {
         if (sortColumnNames.isDefined) {
           require(numBuckets.isDefined, "sortBy must be used together with bucketBy")
         }
     
    -    for {
    -      n <- numBuckets
    -    } yield {
    +    numBuckets.map { n =>
           require(n > 0 && n < 100000, "Bucket number must be greater than 0 and less than 100000.")
    --- End diff --
    
    Orthogonal to your PR: this means Spark supports bucket counts in the range [1, 99999]. Is there any reason for such a low upper bound?
    
    Also, I don't think this code gets executed if the bucketed table is written via SQL. The only check I can see is the one performed when we create `BucketSpec`, but it covers only the lower bound: https://github.com/apache/spark/blob/4d4d0de7f64cefbca28dc532b7864de9626aa241/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala#L138 . This check should be present only in `BucketSpec` creation, to be consistent across the codebase.
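One way to follow the suggestion above is to enforce both bounds when the spec is constructed. That way the DataFrame and SQL write paths share a single check. In this sketch, `BucketSpecSketch` is a hypothetical stand-in for Spark's catalog `BucketSpec`, and the error message is illustrative only. The bounds mirror the `DataFrameWriter` check, which allows counts in [1, 99999].

```scala
// Hypothetical sketch: BucketSpecSketch stands in for Spark's catalog BucketSpec.
// Validating in the constructor means every code path that builds a spec
// (DataFrameWriter or SQL DDL) is subject to the same lower and upper bound.
case class BucketSpecSketch(
    numBuckets: Int,
    bucketColumnNames: Seq[String],
    sortColumnNames: Seq[String]) {
  // require throws IllegalArgumentException when the condition is false.
  require(numBuckets > 0 && numBuckets < 100000,
    s"Number of buckets must be greater than 0 and less than 100000, got $numBuckets")
}
```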



[GitHub] spark issue #16931: [SPARK-19587][SQL] bucket sorting columns should not be ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16931
  
    **[Test build #72902 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72902/testReport)** for PR 16931 at commit [`e21d8ae`](https://github.com/apache/spark/commit/e21d8ae55ef5352d86e5e699677b4506437b0db0).



[GitHub] spark issue #16931: [SPARK-19587][SQL] bucket sorting columns should not be ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16931
  
    thanks for the review, merging to master!



[GitHub] spark pull request #16931: [SPARK-19587][SQL] bucket sorting columns should ...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16931#discussion_r101180242
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -270,52 +269,17 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
             ifNotExists = false)).toRdd
       }
     
    -  private def normalizedParCols: Option[Seq[String]] = partitioningColumns.map { cols =>
    -    cols.map(normalize(_, "Partition"))
    -  }
    -
    -  private def normalizedBucketColNames: Option[Seq[String]] = bucketColumnNames.map { cols =>
    -    cols.map(normalize(_, "Bucketing"))
    -  }
    -
    -  private def normalizedSortColNames: Option[Seq[String]] = sortColumnNames.map { cols =>
    -    cols.map(normalize(_, "Sorting"))
    -  }
    -
       private def getBucketSpec: Option[BucketSpec] = {
         if (sortColumnNames.isDefined) {
           require(numBuckets.isDefined, "sortBy must be used together with bucketBy")
         }
     
    -    for {
    -      n <- numBuckets
    -    } yield {
    +    numBuckets.map { n =>
           require(n > 0 && n < 100000, "Bucket number must be greater than 0 and less than 100000.")
    --- End diff --
    
    Yeah, we should move this check to `BucketSpec` for consistency.
    
    About the upper bound, we just picked a value that should be big enough. In practice I don't think users will set large bucket numbers; this is just a sanity check.



[GitHub] spark issue #16931: [SPARK-19587][SQL] bucket sorting columns should not be ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16931
  
    **[Test build #72889 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72889/testReport)** for PR 16931 at commit [`56b1b18`](https://github.com/apache/spark/commit/56b1b182f869451a72276ca8ad636dfa1d955cc2).



[GitHub] spark issue #16931: [SPARK-19587][SQL] bucket sorting columns should not be ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16931
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request #16931: [SPARK-19587][SQL] bucket sorting columns should ...

Posted by tejasapatil <gi...@git.apache.org>.

Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16931#discussion_r101207253
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -270,52 +269,17 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
             ifNotExists = false)).toRdd
       }
     
    -  private def normalizedParCols: Option[Seq[String]] = partitioningColumns.map { cols =>
    -    cols.map(normalize(_, "Partition"))
    -  }
    -
    -  private def normalizedBucketColNames: Option[Seq[String]] = bucketColumnNames.map { cols =>
    -    cols.map(normalize(_, "Bucketing"))
    -  }
    -
    -  private def normalizedSortColNames: Option[Seq[String]] = sortColumnNames.map { cols =>
    -    cols.map(normalize(_, "Sorting"))
    -  }
    -
       private def getBucketSpec: Option[BucketSpec] = {
         if (sortColumnNames.isDefined) {
           require(numBuckets.isDefined, "sortBy must be used together with bucketBy")
         }
     
    -    for {
    -      n <- numBuckets
    -    } yield {
    +    numBuckets.map { n =>
           require(n > 0 && n < 100000, "Bucket number must be greater than 0 and less than 100000.")
    --- End diff --
    
    Or you could make that change right here.

