Posted to reviews@spark.apache.org by sameeragarwal <gi...@git.apache.org> on 2016/04/26 22:08:18 UTC

[GitHub] spark pull request: [SPARK-14929][SQL] Disable vectorized map for ...

GitHub user sameeragarwal opened a pull request:

    https://github.com/apache/spark/pull/12710

    [SPARK-14929][SQL] Disable vectorized map for wide schemas & high-precision decimals

    ## What changes were proposed in this pull request?
    
    The vectorized hash map in `TungstenAggregate` currently supports all primitive data types during partial aggregation. This patch enables it only for the subset of cases that have been verified to show performance improvements on our benchmarks, subject to an internal conf that caps the length of the aggregate key/value schema. This list of supported use cases should be expanded over time.
    
    ## How was this patch tested?
    
    This patch adds no new functionality, so existing tests should suffice. Performance was tested on TPC-DS benchmarks.
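
The gating described in the pull request description can be sketched roughly as follows. This is a minimal, self-contained model and not the patch itself: the `DataType`, `AggregateMode`, and `Field` definitions are stand-ins for Spark's classes, `isPrimitive` is a hypothetical substitute for `ctx.isPrimitiveType`, and `maxColumns` is an assumed name for the internal conf that caps the key/value schema length.

```scala
// Stand-in types: these are NOT Spark's classes, only a small model for illustration.
sealed trait DataType
case object IntType extends DataType
case object LongType extends DataType
case object DoubleType extends DataType
case object StringType extends DataType
final case class DecimalType(precision: Int, scale: Int) extends DataType
final case class ArrayType(element: DataType) extends DataType

sealed trait AggregateMode
case object Partial extends AggregateMode
case object PartialMerge extends AggregateMode
case object Final extends AggregateMode

final case class Field(name: String, dataType: DataType)

object VectorizedMapGate {
  // Hypothetical helper standing in for ctx.isPrimitiveType in the diff.
  def isPrimitive(dt: DataType): Boolean = dt match {
    case IntType | LongType | DoubleType => true
    case _ => false
  }

  // Returns true only when every condition for using the vectorized map holds.
  // `maxColumns` is an assumed parameter modelling the internal schema-length conf.
  def enable(
      keys: Seq[Field],
      buffer: Seq[Field],
      modes: Seq[AggregateMode],
      maxColumns: Int): Boolean = {
    // Every key and buffer column must be primitive, decimal, or string.
    val typesSupported = (keys ++ buffer).forall { f =>
      isPrimitive(f.dataType) ||
        f.dataType.isInstanceOf[DecimalType] ||
        f.dataType == StringType
    }
    // Only partial aggregation stages qualify.
    val partialOnly = modes.forall(m => m == Partial || m == PartialMerge)
    // Wide schemas are excluded by the column cap.
    val narrowEnough = keys.length + buffer.length <= maxColumns
    typesSupported && buffer.nonEmpty && partialOnly && narrowEnough
  }
}
```

Keeping each condition in its own named value makes it easier to widen the supported set later, for example by raising `maxColumns` once wider schemas have been benchmarked.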

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sameeragarwal/spark vectorized-enable

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12710.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12710
    
----
commit f48eba15ab1804c7848dd14f0ee3bb051500934f
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-04-26T20:02:31Z

    Not enable vectorized hashmap for wide schemas and high-precision decimals

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14929][SQL] Disable vectorized map for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12710#issuecomment-214903168
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57026/
    Test PASSed.




[GitHub] spark pull request: [SPARK-14929][SQL] Disable vectorized map for ...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12710#discussion_r61156270
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala ---
    @@ -447,16 +447,31 @@ case class TungstenAggregate(
         }
       }
     
    +  /**
    +   * Using the vectorized hash map in TungstenAggregate is currently supported for all primitive
    +   * data types during partial aggregation. However, we currently only enable the hash map for a
    +   * subset of cases that've been verified to show performance improvements on our benchmarks
    +   * subject to an internal conf that sets an upper limit on the maximum length of the aggregate
    +   * key/value schema.
    +   *
    +   * This list of supported use-cases should be expanded over time.
    +   */
    +  private def enableVectorizedHashMap(ctx: CodegenContext): Boolean = {
    +    val isSupported =
    +      (groupingKeySchema ++ bufferSchema).forall(f => ctx.isPrimitiveType(f.dataType) ||
    +        f.dataType.isInstanceOf[DecimalType] || f.dataType.isInstanceOf[StringType]) &&
    +        bufferSchema.forall(!_.dataType.isInstanceOf[StringType]) && bufferSchema.nonEmpty &&
    --- End diff --
    
    bufferSchema can't have StringType in TungstenAggregate




[GitHub] spark pull request: [SPARK-14929][SQL] Disable vectorized map for ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12710




[GitHub] spark pull request: [SPARK-14929][SQL] Disable vectorized map for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12710#issuecomment-214871708
  
    **[Test build #57024 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57024/consoleFull)** for PR 12710 at commit [`f48eba1`](https://github.com/apache/spark/commit/f48eba15ab1804c7848dd14f0ee3bb051500934f).




[GitHub] spark pull request: [SPARK-14929][SQL] Disable vectorized map for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12710#issuecomment-214897701
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14929][SQL] Disable vectorized map for ...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/12710#issuecomment-214870772
  
    cc @davies 




[GitHub] spark pull request: [SPARK-14929][SQL] Disable vectorized map for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12710#issuecomment-214903166
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14929][SQL] Disable vectorized map for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12710#issuecomment-214897424
  
    **[Test build #57024 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57024/consoleFull)** for PR 12710 at commit [`f48eba1`](https://github.com/apache/spark/commit/f48eba15ab1804c7848dd14f0ee3bb051500934f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14929][SQL] Disable vectorized map for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12710#issuecomment-214878256
  
    **[Test build #57026 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57026/consoleFull)** for PR 12710 at commit [`3c0ff54`](https://github.com/apache/spark/commit/3c0ff5427810353763479e6e41d7d9167e4a98e5).




[GitHub] spark pull request: [SPARK-14929][SQL] Disable vectorized map for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12710#issuecomment-214902906
  
    **[Test build #57026 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57026/consoleFull)** for PR 12710 at commit [`3c0ff54`](https://github.com/apache/spark/commit/3c0ff5427810353763479e6e41d7d9167e4a98e5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14929][SQL] Disable vectorized map for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12710#issuecomment-214897703
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57024/
    Test PASSed.




[GitHub] spark pull request: [SPARK-14929][SQL] Disable vectorized map for ...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12710#issuecomment-214898824
  
    LGTM
    Merging this into master, thanks!




[GitHub] spark pull request: [SPARK-14929][SQL] Disable vectorized map for ...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12710#discussion_r61156381
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala ---
    @@ -447,16 +447,31 @@ case class TungstenAggregate(
         }
       }
     
    +  /**
    +   * Using the vectorized hash map in TungstenAggregate is currently supported for all primitive
    +   * data types during partial aggregation. However, we currently only enable the hash map for a
    +   * subset of cases that've been verified to show performance improvements on our benchmarks
    +   * subject to an internal conf that sets an upper limit on the maximum length of the aggregate
    +   * key/value schema.
    +   *
    +   * This list of supported use-cases should be expanded over time.
    +   */
    +  private def enableVectorizedHashMap(ctx: CodegenContext): Boolean = {
    +    val isSupported =
    +      (groupingKeySchema ++ bufferSchema).forall(f => ctx.isPrimitiveType(f.dataType) ||
    +        f.dataType.isInstanceOf[DecimalType] || f.dataType.isInstanceOf[StringType]) &&
    +        bufferSchema.forall(!_.dataType.isInstanceOf[StringType]) && bufferSchema.nonEmpty &&
    +        modes.forall(mode => mode == Partial || mode == PartialMerge)
    +
    +    isSupported && bufferSchema.map(_.dataType).filter(_.isInstanceOf[DecimalType])
    --- End diff --
    
    Add a comment to say why ByteArrayDecimalType is not supported?
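
The thread does not record the reason that was eventually given. One likely explanation, offered here as an assumption, is that a decimal with up to 18 digits of precision can keep its unscaled value in a single `Long`, while a wider one needs a variable-length byte array that does not fit a fixed-width columnar slot. The sketch below illustrates only that distinction using `java.math.BigDecimal`; `DecimalStorage`, `MaxLongDigits`, `fitsInLong`, and `unscaledAsLong` are hypothetical names, not Spark APIs.

```scala
import java.math.BigDecimal

object DecimalStorage {
  // Assumption: 18 is the largest precision whose unscaled value always fits in a Long,
  // since 999,999,999,999,999,999 is below Long.MaxValue.
  val MaxLongDigits = 18

  // True when a decimal of this precision can be stored as a fixed-width Long.
  def fitsInLong(precision: Int): Boolean = precision <= MaxLongDigits

  // Returns the unscaled value as a Long when it fits, or None for wider decimals.
  def unscaledAsLong(d: BigDecimal): Option[Long] =
    if (fitsInLong(d.precision())) Some(d.unscaledValue().longValueExact())
    else None
}
```

For instance, `DecimalStorage.unscaledAsLong(new BigDecimal("123.45"))` yields `Some(12345L)`, whereas a 22-digit value yields `None` and would need the byte-array representation.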

