Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2016/02/02 01:50:26 UTC

[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/11010

    [SPARK-12950] [SQL] Improve lookup of BytesToBytesMap in aggregate

    This PR improves the lookup of BytesToBytesMap by:
    
    1. Generating code to calculate the hash code of grouping keys.
    
    2. Not using MemoryLocation; the baseObject and offset for key and value are fetched directly (removing the indirection).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark gen_map

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11010.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11010
    
----
commit 64df01e138554aee6c3d79b4a33a5842b7e12e9f
Author: Davies Liu <da...@databricks.com>
Date:   2016-02-02T00:43:52Z

    improve BytesToBytesMap

commit 4cefbc5fdc02a9a39f4870767146932b28cf96e4
Author: Davies Liu <da...@databricks.com>
Date:   2016-02-02T00:44:04Z

    Merge branch 'master' of github.com:apache/spark into gen_map

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-182147420
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179700147
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50736/
    Test PASSed.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-178291023
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11010#discussion_r51635675
  
    --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java ---
    @@ -426,14 +435,13 @@ public Location lookup(Object keyBase, long keyOffset, int keyLength) {
        *
        * This is a thread-safe version of `lookup`, could be used by multiple threads.
        */
    -  public void safeLookup(Object keyBase, long keyOffset, int keyLength, Location loc) {
    +  public void safeLookup(Object keyBase, long keyOffset, int keyLength, Location loc, int hash) {
         assert(longArray != null);
     
         if (enablePerfMetrics) {
           numKeyLookups++;
         }
    -    final int hashcode = HASHER.hashUnsafeWords(keyBase, keyOffset, keyLength);
    -    int pos = hashcode & mask;
    +    int pos = hash & mask;
    --- End diff --
    
    This doesn't work if hash is negative, right? It's not clear to me that the new hash never returns negatives. Please assert this.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-182113475
  
    **[Test build #50999 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50999/consoleFull)** for PR 11010 at commit [`7f5852a`](https://github.com/apache/spark/commit/7f5852a5c2005e191f4581344c48f0677ea748cc).



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179653930
  
    **[Test build #50734 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50734/consoleFull)** for PR 11010 at commit [`6c9ce88`](https://github.com/apache/spark/commit/6c9ce8822d3e1f29b7785a154350d6ccb6a5f6bb).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11010#discussion_r51524912
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
    @@ -424,3 +424,69 @@ case class Murmur3Hash(children: Seq[Expression], seed: Int) extends Expression
         }
       }
     }
    +
    +/**
    +  * A function that calculates hash value for a group of expressions, which basically XOR all the
    +  * hash code of children expressions together.
    +  *
    +  * Note: This is used for hash map for aggreagte, designed for performance (has worse
    --- End diff --
    
    Is `UnsafeRow.hashCode` slower than this?



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11010#discussion_r51805826
  
    --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java ---
    @@ -426,14 +435,13 @@ public Location lookup(Object keyBase, long keyOffset, int keyLength) {
        *
        * This is a thread-safe version of `lookup`, could be used by multiple threads.
        */
    -  public void safeLookup(Object keyBase, long keyOffset, int keyLength, Location loc) {
    +  public void safeLookup(Object keyBase, long keyOffset, int keyLength, Location loc, int hash) {
         assert(longArray != null);
     
         if (enablePerfMetrics) {
           numKeyLookups++;
         }
    -    final int hashcode = HASHER.hashUnsafeWords(keyBase, keyOffset, keyLength);
    -    int pos = hashcode & mask;
    +    int pos = hash & mask;
    --- End diff --
    
    It works.
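
One way to see why masking can be safe for negative hashes, assuming (as the name suggests, though this excerpt does not show it) that `mask` is a power-of-two capacity minus one: a bitwise AND with a non-negative mask clears the sign bit, so the result always lies in `[0, mask]`. The class and method names below are hypothetical and not part of Spark.

```java
public class MaskDemo {
    // Hypothetical helper: maps any int hash to a slot in [0, capacity - 1].
    // Assumes capacity is a power of two, so (capacity - 1) has only low bits set.
    static int slot(int hash, int capacity) {
        int mask = capacity - 1;
        // mask is non-negative, so the AND clears the sign bit of hash.
        return hash & mask;
    }

    public static void main(String[] args) {
        int capacity = 16;
        int[] hashes = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE, -12345};
        for (int h : hashes) {
            // The remainder operator, by contrast, keeps the sign of the dividend.
            System.out.println(h + " -> mask: " + slot(h, capacity)
                + ", remainder: " + (h % capacity));
        }
    }
}
```

The remainder column is printed only for contrast: `hash % capacity` can be negative in Java, which is the failure mode the review question is guarding against.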



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-178291024
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50513/
    Test FAILed.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-182117603
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-182149195
  
    Merging this into master, thanks!



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-178372828
  
    **[Test build #2487 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2487/consoleFull)** for PR 11010 at commit [`4cefbc5`](https://github.com/apache/spark/commit/4cefbc5fdc02a9a39f4870767146932b28cf96e4).



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179700144
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-180004247
  
    Do you know how much of this is from the general cleanup and how much is from switching to a simpler hash? In my experience, using a very weak hash function can make things really bad if you don't account for it in other ways.
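
To make the concern concrete, here is a small sketch of a hash that only XORs per-column hash codes together, which is how the doc comment in this PR describes the new function. The class and method names are hypothetical, and this is not the generated code from the PR. It only illustrates two well-known weaknesses of plain XOR combining: column order is ignored, and equal columns cancel out.

```java
public class XorHashDemo {
    // Hypothetical combiner: XOR of the per-column hash codes.
    static int xorHash(long... columns) {
        int h = 0;
        for (long c : columns) {
            h ^= Long.hashCode(c);
        }
        return h;
    }

    public static void main(String[] args) {
        // Swapping the two grouping columns yields the same hash.
        System.out.println(xorHash(1L, 2L) == xorHash(2L, 1L));
        // Any key whose two columns are equal hashes to the same value.
        System.out.println(xorHash(7L, 7L) == xorHash(42L, 42L));
    }
}
```

Whether such collisions matter in practice depends on the key distribution, which is what the question above is asking about.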



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179654368
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179661950
  
    **[Test build #50736 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50736/consoleFull)** for PR 11010 at commit [`85f8d0e`](https://github.com/apache/spark/commit/85f8d0e2a3913334a56f4b59ea946e57829e3523).



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-182147421
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50999/
    Test PASSed.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179378583
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50666/
    Test FAILed.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179567135
  
    **[Test build #50723 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50723/consoleFull)** for PR 11010 at commit [`53a2dd4`](https://github.com/apache/spark/commit/53a2dd4dbbc479b940de31935050fb65d20cb55a).



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179569062
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179568077
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50721/
    Test FAILed.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179534612
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-178377699
  
    **[Test build #2487 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2487/consoleFull)** for PR 11010 at commit [`4cefbc5`](https://github.com/apache/spark/commit/4cefbc5fdc02a9a39f4870767146932b28cf96e4).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-180115890
  
    @nongli As the benchmark shows, the weak hash function could save 10ns per row; the other changes may save 20ns per row. I'm also not sure the weak hash function is good enough in these cases. BTW, the hashCode of int/long in Java also uses this weak hash function, so it may not be that bad.
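
For reference, the JDK documents `Integer.hashCode(int)` as the value itself and `Long.hashCode(long)` as the XOR of the upper and lower 32 bits. A minimal check of both, with a hypothetical class name and helper:

```java
public class JdkHashDemo {
    // Same formula the JDK documents for Long.hashCode(long).
    static int foldLong(long v) {
        return (int) (v ^ (v >>> 32));
    }

    public static void main(String[] args) {
        // An int hashes to itself.
        System.out.println(Integer.hashCode(12345) == 12345);
        // A long is folded by XORing its two 32-bit halves.
        System.out.println(Long.hashCode(1L << 40) == foldLong(1L << 40));
        // With a power-of-two mask, int keys that differ only in bits
        // above the mask land in the same slot.
        int mask = 15;
        System.out.println((Integer.hashCode(1 << 20) & mask)
            == (Integer.hashCode(2 << 20) & mask));
    }
}
```

The last line shows the flip side of such a cheap hash: it does no bit mixing, so the spread across slots depends entirely on the low bits of the keys.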



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179537006
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179654371
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50734/
    Test FAILed.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11010#discussion_r51525298
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/BenchmarkWholeStageCodegen.scala ---
    @@ -41,20 +42,20 @@ class BenchmarkWholeStageCodegen extends SparkFunSuite {
     
         benchmark.addCase("Without codegen") { iter =>
           sqlContext.setConf("spark.sql.codegen.wholeStage", "false")
    -      sqlContext.range(values).filter("(id & 1) = 1").count()
    +      sqlContext.range(values).filter("(id & 1) = 1").groupBy().sum().collect()
         }
     
         benchmark.addCase("With codegen") { iter =>
           sqlContext.setConf("spark.sql.codegen.wholeStage", "true")
    -      sqlContext.range(values).filter("(id & 1) = 1").count()
    +      sqlContext.range(values).filter("(id & 1) = 1").groupBy().sum().collect()
         }
     
         /*
           Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      rang/filter/aggregate:            Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    +      rang/filter/aggregate:             Avg Time(ms)    Avg Rate(M/s)  Relative Rate
           -------------------------------------------------------------------------------
    -      Without codegen             7775.53            26.97         1.00 X
    -      With codegen                 342.15           612.94        22.73 X
    +      Without codegen                         5488.16            38.21         1.00 X
    +      With codegen                             531.08           394.88        10.33 X
    --- End diff --
    
    Then we would have to wait more than 2 minutes for this benchmark to finish. I will send another PR to update these benchmarks.
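
As a quick sanity check on the table in the diff above, the Relative Rate column appears to be the ratio of the two average times. A small sketch, with a hypothetical class name and the numbers copied from the diff:

```java
public class RelativeRateDemo {
    // Speedup of one case relative to a baseline, from average times in ms.
    static double relativeRate(double baselineMs, double candidateMs) {
        return baselineMs / candidateMs;
    }

    public static void main(String[] args) {
        // Removed lines of the diff: the count() benchmark.
        System.out.printf("count():           %.2f X%n", relativeRate(7775.53, 342.15));
        // Added lines of the diff: the groupBy().sum() benchmark.
        System.out.printf("groupBy().sum():   %.2f X%n", relativeRate(5488.16, 531.08));
    }
}
```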



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179699608
  
    **[Test build #50736 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50736/consoleFull)** for PR 11010 at commit [`85f8d0e`](https://github.com/apache/spark/commit/85f8d0e2a3913334a56f4b59ea946e57829e3523).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class TaskMemoryManager `



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179534615
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50701/
    Test FAILed.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179569065
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50722/
    Test FAILed.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11010#discussion_r51525005
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/BenchmarkWholeStageCodegen.scala ---
    @@ -41,20 +42,20 @@ class BenchmarkWholeStageCodegen extends SparkFunSuite {
     
         benchmark.addCase("Without codegen") { iter =>
           sqlContext.setConf("spark.sql.codegen.wholeStage", "false")
    -      sqlContext.range(values).filter("(id & 1) = 1").count()
    +      sqlContext.range(values).filter("(id & 1) = 1").groupBy().sum().collect()
         }
     
         benchmark.addCase("With codegen") { iter =>
           sqlContext.setConf("spark.sql.codegen.wholeStage", "true")
    -      sqlContext.range(values).filter("(id & 1) = 1").count()
    +      sqlContext.range(values).filter("(id & 1) = 1").groupBy().sum().collect()
         }
     
         /*
           Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      rang/filter/aggregate:            Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    +      rang/filter/aggregate:             Avg Time(ms)    Avg Rate(M/s)  Relative Rate
           -------------------------------------------------------------------------------
    -      Without codegen             7775.53            26.97         1.00 X
    -      With codegen                 342.15           612.94        22.73 X
    +      Without codegen                         5488.16            38.21         1.00 X
    +      With codegen                             531.08           394.88        10.33 X
    --- End diff --
    
    Probably want to increase it by 10x to amortize the fixed overhead. The runtime was 342 ms, which is too small.
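
The point about amortizing fixed overhead can be illustrated with a little arithmetic. This is a minimal sketch, not part of the PR: the class name, the 50 ms fixed cost and the 1 ns per-row cost are all hypothetical values chosen for illustration. It shows that when every run pays a constant setup cost, the measured rate moves closer to the true per-row rate as the input grows.

```java
final class FixedOverheadSketch {
    /**
     * Measured throughput in million rows per second when each run pays a
     * fixed cost (fixedMs) on top of a constant per-row cost (nsPerRow).
     * Both costs are hypothetical inputs, not measurements from the PR.
     */
    static double measuredRate(long rows, double fixedMs, double nsPerRow) {
        double totalMs = fixedMs + rows * nsPerRow / 1e6; // ns -> ms
        return rows / totalMs / 1e3;                      // rows per ms -> M rows per s
    }

    public static void main(String[] args) {
        long base = 200_000_000L;
        // Same assumed costs, input grown by 10x: the fixed cost matters less.
        System.out.println(measuredRate(base, 50.0, 1.0));
        System.out.println(measuredRate(base * 10, 50.0, 1.0));
    }
}
```

With a larger input the fixed cost is a smaller share of the total time, so run-to-run differences in that fixed cost distort the reported rate less.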




[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179571014
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50723/
    Test FAILed.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11010#discussion_r51634787
  
    --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java ---
    @@ -441,30 +449,27 @@ public void safeLookup(Object keyBase, long keyOffset, int keyLength, Location l
           }
           if (longArray.get(pos * 2) == 0) {
             // This is a new key.
    -        loc.with(pos, hashcode, false);
    +        loc.with(pos, hash, false);
             return;
           } else {
             long stored = longArray.get(pos * 2 + 1);
    -        if ((int) (stored) == hashcode) {
    +        if ((int) (stored) == hash) {
               // Full hash code matches.  Let's compare the keys for equality.
    -          loc.with(pos, hashcode, true);
    +          loc.with(pos, hash, true);
               if (loc.getKeyLength() == keyLength) {
    -            final MemoryLocation keyAddress = loc.getKeyAddress();
    -            final Object storedkeyBase = keyAddress.getBaseObject();
    -            final long storedkeyOffset = keyAddress.getBaseOffset();
                 final boolean areEqual = ByteArrayMethods.arrayEquals(
                   keyBase,
                   keyOffset,
    -              storedkeyBase,
    -              storedkeyOffset,
    +              loc.getKeyBase(),
    +              loc.getKeyOffset(),
                   keyLength
                 );
                 if (areEqual) {
                   return;
                 } else {
    -              if (enablePerfMetrics) {
    +              //if (enablePerfMetrics) {
    --- End diff --
    
    Please restore this `if (enablePerfMetrics)` check.
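
The lookup in the diff above compares the stored full hash code before comparing key bytes. The following is a minimal sketch of that idea only, not Spark's `BytesToBytesMap`: the class and method names are hypothetical, keys live in on-heap `byte[]` arrays, probing is plain linear probing, and the table is assumed never to fill up (there is no resizing).

```java
import java.util.Arrays;

/** Open-addressing table that stores each key's full hash next to the key. */
final class HashFirstLookupSketch {
    private final byte[][] keys;
    private final int[] hashes;
    private final int mask;

    /** capacity must be a power of two so that (hash & mask) selects a slot. */
    HashFirstLookupSketch(int capacity) {
        keys = new byte[capacity][];
        hashes = new int[capacity];
        mask = capacity - 1;
    }

    /** Returns the slot holding the key, or the empty slot where it would go. */
    int lookup(byte[] key, int hash) {
        int pos = hash & mask;
        while (true) {
            if (keys[pos] == null) {
                return pos; // empty slot: this is a new key
            }
            // Cheap integer comparison first; compare bytes only on a full hash match.
            if (hashes[pos] == hash && Arrays.equals(keys[pos], key)) {
                return pos;
            }
            pos = (pos + 1) & mask; // linear probing (an assumption of this sketch)
        }
    }

    /** Inserts the key if absent; returns false if it was already present. */
    boolean insert(byte[] key, int hash) {
        int pos = lookup(key, hash);
        if (keys[pos] != null) {
            return false;
        }
        keys[pos] = key.clone();
        hashes[pos] = hash;
        return true;
    }

    boolean contains(byte[] key, int hash) {
        return keys[lookup(key, hash)] != null;
    }
}
```

Two keys that land in the same slot but have different full hashes are told apart by a single integer comparison, so the byte-wise comparison runs only when the hashes agree.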



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179570918
  
    **[Test build #50723 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50723/consoleFull)** for PR 11010 at commit [`53a2dd4`](https://github.com/apache/spark/commit/53a2dd4dbbc479b940de31935050fb65d20cb55a).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179537012
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50705/
    Test FAILed.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-182147249
  
    **[Test build #50999 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50999/consoleFull)** for PR 11010 at commit [`7f5852a`](https://github.com/apache/spark/commit/7f5852a5c2005e191f4581344c48f0677ea748cc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179532296
  
    **[Test build #50705 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50705/consoleFull)** for PR 11010 at commit [`5eff34b`](https://github.com/apache/spark/commit/5eff34bd7f7f0fef31a6a81ddc5574914b0e00cf).



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11010#discussion_r51524225
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/BenchmarkWholeStageCodegen.scala ---
    @@ -41,20 +42,20 @@ class BenchmarkWholeStageCodegen extends SparkFunSuite {
     
         benchmark.addCase("Without codegen") { iter =>
           sqlContext.setConf("spark.sql.codegen.wholeStage", "false")
    -      sqlContext.range(values).filter("(id & 1) = 1").count()
    +      sqlContext.range(values).filter("(id & 1) = 1").groupBy().sum().collect()
         }
     
         benchmark.addCase("With codegen") { iter =>
           sqlContext.setConf("spark.sql.codegen.wholeStage", "true")
    -      sqlContext.range(values).filter("(id & 1) = 1").count()
    +      sqlContext.range(values).filter("(id & 1) = 1").groupBy().sum().collect()
         }
     
         /*
           Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      rang/filter/aggregate:            Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    +      rang/filter/aggregate:             Avg Time(ms)    Avg Rate(M/s)  Relative Rate
           -------------------------------------------------------------------------------
    -      Without codegen             7775.53            26.97         1.00 X
    -      With codegen                 342.15           612.94        22.73 X
    +      Without codegen                         5488.16            38.21         1.00 X
    +      With codegen                             531.08           394.88        10.33 X
    --- End diff --
    
    What's causing the big drop in the relative rate (22.73 X down to 10.33 X)?




[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179378580
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-182145652
  
    LGTM



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179571012
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-182109027
  
    @nongli I have reverted it to Murmur3 (we can figure out a faster hash function later).



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11010#discussion_r51526748
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
    @@ -424,3 +424,69 @@ case class Murmur3Hash(children: Seq[Expression], seed: Int) extends Expression
         }
       }
     }
    +
    +/**
    +  * A function that calculates hash value for a group of expressions, which basically XOR all the
    +  * hash code of children expressions together.
    +  *
    +  * Note: This is used for hash map for aggreagte, designed for performance (has worse
    --- End diff --
    
    UnsafeRow will call Murmur3, which is still slow.
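
The doc comment in the diff above describes a hash that XORs the hash codes of the child expressions together. The following is a minimal sketch of the trade-off such a hash makes, not the code generated by the PR: the class and method names are hypothetical, and `java.util.Arrays.hashCode` stands in as an order-sensitive hash for comparison (it is not Murmur3).

```java
import java.util.Arrays;

final class XorHashSketch {
    /** XOR-combines per-column hash codes: one cheap operation per column. */
    static int xorHash(long... columns) {
        int h = 0;
        for (long c : columns) {
            h ^= Long.hashCode(c);
        }
        return h;
    }

    /** Order-sensitive combination, used here only as a point of comparison. */
    static int orderedHash(long... columns) {
        return Arrays.hashCode(columns);
    }

    public static void main(String[] args) {
        // Swapping two columns does not change the XOR hash.
        System.out.println(xorHash(1, 2) == xorHash(2, 1));
        // Two equal columns cancel each other out.
        System.out.println(xorHash(7, 7));
        // The order-sensitive hash distinguishes the swapped columns.
        System.out.println(orderedHash(1, 2) == orderedHash(2, 1));
    }
}
```

XOR is commutative and self-cancelling, so grouping keys that differ only in column order, or that repeat a value, collide. That is the kind of quality given up in exchange for speed.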



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-182117609
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50998/
    Test FAILed.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179536912
  
    **[Test build #50705 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50705/consoleFull)** for PR 11010 at commit [`5eff34b`](https://github.com/apache/spark/commit/5eff34bd7f7f0fef31a6a81ddc5574914b0e00cf).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11010#discussion_r51524900
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/BenchmarkWholeStageCodegen.scala ---
    @@ -41,20 +42,20 @@ class BenchmarkWholeStageCodegen extends SparkFunSuite {
     
         benchmark.addCase("Without codegen") { iter =>
           sqlContext.setConf("spark.sql.codegen.wholeStage", "false")
    -      sqlContext.range(values).filter("(id & 1) = 1").count()
    +      sqlContext.range(values).filter("(id & 1) = 1").groupBy().sum().collect()
         }
     
         benchmark.addCase("With codegen") { iter =>
           sqlContext.setConf("spark.sql.codegen.wholeStage", "true")
    -      sqlContext.range(values).filter("(id & 1) = 1").count()
    +      sqlContext.range(values).filter("(id & 1) = 1").groupBy().sum().collect()
         }
     
         /*
           Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      rang/filter/aggregate:            Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    +      rang/filter/aggregate:             Avg Time(ms)    Avg Rate(M/s)  Relative Rate
           -------------------------------------------------------------------------------
    -      Without codegen             7775.53            26.97         1.00 X
    -      With codegen                 342.15           612.94        22.73 X
    +      Without codegen                         5488.16            38.21         1.00 X
    +      With codegen                             531.08           394.88        10.33 X
    --- End diff --
    
    The benchmark is not that stable; this number varies between 10 and 20 across runs. Maybe 200M is still not enough, so I will increase it to 500M.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179613974
  
    **[Test build #50734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50734/consoleFull)** for PR 11010 at commit [`6c9ce88`](https://github.com/apache/spark/commit/6c9ce8822d3e1f29b7785a154350d6ccb6a5f6bb).



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11010#issuecomment-179568076
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-12950] [SQL] Improve lookup of BytesToB...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11010

