You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2016/03/22 00:43:45 UTC

[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/11870

    [SPARK-14052] [SQL] build a BytesToBytesMap directly in HashedRelation

    ## What changes were proposed in this pull request?
    
    Currently, for the key that can not fit within a long,  we build a hash map for UnsafeHashedRelation, it's converted to BytesToBytesMap after serialization and deserialization. We should build a BytesToBytesMap directly to have better memory efficiency.
    
    In order to do that, BytesToBytesMap should support multiple (K,V) pair with the same K, a new API called `append()` is added to BytesToBytesMap, which only look for empty slot and put key there. Also add a `next()` for BytesToBytesMap.Location to check if there is any pair following current one that have the same key.
    
    ## How was this patch tested?
    
    Existing tests.
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark map2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11870.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11870
    
----
commit d18e5edda8b403ea3655a84e0bb7d8f00c8ebf9b
Author: Davies Liu <da...@databricks.com>
Date:   2016-03-21T23:14:37Z

    build a BytesToBytesMap directly in HashedRelation

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-200040663
  
    **[Test build #53808 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53808/consoleFull)** for PR 11870 at commit [`25e73c5`](https://github.com/apache/spark/commit/25e73c5c49bdf34d618e2df4d0e503c3daf4ccac).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-199964885
  
    **[Test build #53786 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53786/consoleFull)** for PR 11870 at commit [`93409de`](https://github.com/apache/spark/commit/93409de80420da20556807d3f660b2ea81cdc605).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-202605114
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-199539884
  
    **[Test build #53727 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53727/consoleFull)** for PR 11870 at commit [`d18e5ed`](https://github.com/apache/spark/commit/d18e5edda8b403ea3655a84e0bb7d8f00c8ebf9b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-200558311
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53972/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-199540564
  
    **[Test build #53727 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53727/consoleFull)** for PR 11870 at commit [`d18e5ed`](https://github.com/apache/spark/commit/d18e5edda8b403ea3655a84e0bb7d8f00c8ebf9b).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201038801
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54093/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11870#discussion_r57621283
  
    --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java ---
    @@ -560,6 +572,20 @@ private Location with(Object base, long offset, int length) {
         }
     
         /**
    +     * Find the next pair that has the same key as current one.
    +     */
    +    public boolean nextValue() {
    +      // Remember the current key
    +      assert isDefined;
    +      long nextAddr = Platform.getLong(baseObject, valueOffset + valueLength);
    +      if (nextAddr == 0) {
    +        return false;
    +      }
    --- End diff --
    
    nit: maybe have the next 2 lines in an else block for better readability?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201223202
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54148/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-200573206
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53979/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201029914
  
    **[Test build #54093 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54093/consoleFull)** for PR 11870 at commit [`c701b3b`](https://github.com/apache/spark/commit/c701b3b51940c07c126915d2a5628d47214ee3c8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-199539132
  
    cc @JoshRosen , i'm adding more unit tests for it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-199679489
  
    cc @sameeragarwal too


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201223167
  
    **[Test build #54148 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54148/consoleFull)** for PR 11870 at commit [`e76006d`](https://github.com/apache/spark/commit/e76006d19b3dfde2a3cebdf99da6454f524362be).
     * This patch **fails from timeout after a configured wait of \`250m\`**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201126653
  
    **[Test build #54143 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54143/consoleFull)** for PR 11870 at commit [`2462422`](https://github.com/apache/spark/commit/246242219ab39405bedd00d3a91c0dbb35ae194b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11870#discussion_r57091496
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -340,6 +260,29 @@ private[joins] final class UnsafeHashedRelation(
       }
     }
     
    +/**
    + * A HashedRelation for UnsafeRow with unique keys.
    + */
    +private[joins] final class UniqueUnsafeHashedRelation(
    +  private var numFields: Int,
    --- End diff --
    
    These two lines are underindented.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201724179
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201052374
  
    **[Test build #2688 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2688/consoleFull)** for PR 11870 at commit [`c701b3b`](https://github.com/apache/spark/commit/c701b3b51940c07c126915d2a5628d47214ee3c8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-200573205
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-200572183
  
    @rxin Had switched to the algorithm you suggested, it's easier than I thought, also faster than previous one in the cases of duplicated keys (as we expected), because we do not need to compare the key for every value anymore (which is not cheap). But still have multiple copies of keys, because I want to keep the iterator as simple as before.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11870#discussion_r57624415
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/BenchmarkWholeStageCodegen.scala ---
    @@ -184,11 +184,29 @@ class BenchmarkWholeStageCodegen extends SparkFunSuite {
     
         /**
         Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -    Join w 2 longs:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    Join w 2 longs:                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         -------------------------------------------------------------------------------------------
    -    Join w 2 longs codegen=false              7877 / 8358         13.3          75.1       1.0X
    -    Join w 2 longs codegen=true               3877 / 3937         27.0          37.0       2.0X
    +    Join w 2 longs codegen=false           12725 / 13158          8.2         121.4       1.0X
    +    Join w 2 longs codegen=true              6044 / 6771         17.3          57.6       2.1X
           */
    +
    +    val dim4 = broadcast(sqlContext.range(M)
    +      .selectExpr("cast(id/10 as long) as k1", "cast(id/10 as long) as k2"))
    +
    +    runBenchmark("Join w 2 longs duplicated", N) {
    --- End diff --
    
    All these are inner join, except the last two, so ignored.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-200558297
  
    **[Test build #53972 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53972/consoleFull)** for PR 11870 at commit [`883b2ea`](https://github.com/apache/spark/commit/883b2ea12b573347674c39e1afa9a80e305e9889).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-200478396
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201724183
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54260/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201223201
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-200573194
  
    **[Test build #53979 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53979/consoleFull)** for PR 11870 at commit [`9f36237`](https://github.com/apache/spark/commit/9f36237f2d7178875878611968df5d326e1a455a).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-202511033
  
    @sameeragarwal PING


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-202605115
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54351/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201038769
  
    **[Test build #54093 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54093/consoleFull)** for PR 11870 at commit [`c701b3b`](https://github.com/apache/spark/commit/c701b3b51940c07c126915d2a5628d47214ee3c8).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-200040988
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53808/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11870#discussion_r57205363
  
    --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java ---
    @@ -480,11 +480,29 @@ public void safeLookup(Object keyBase, long keyOffset, int keyLength, Location l
       }
     
       /**
    +   * Append a pair of key-value at the end, there could other pairs with the same keys.
    +   */
    +  public boolean append(Object keyBase, long keyOffset, int keyLength,
    +                        Object valueBase, long valueOffset, int valueLength) {
    +    assert(longArray != null);
    +    int hash = Murmur3_x86_32.hashUnsafeWords(keyBase, keyOffset, keyLength, 42);
    +    int pos = hash & mask;
    +    int step = 1;
    +    while (longArray.get(pos * 2) != 0) {
    --- End diff --
    
    It's the same as we insert new key into the map, the map can be atmost 70% full.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201724139
  
    **[Test build #54260 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54260/consoleFull)** for PR 11870 at commit [`16fe15a`](https://github.com/apache/spark/commit/16fe15a5df90389230e0b1274ff4c23415aa90ca).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-199548333
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-200557395
  
    **[Test build #53972 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53972/consoleFull)** for PR 11870 at commit [`883b2ea`](https://github.com/apache/spark/commit/883b2ea12b573347674c39e1afa9a80e305e9889).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-199975593
  
    **[Test build #53808 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53808/consoleFull)** for PR 11870 at commit [`25e73c5`](https://github.com/apache/spark/commit/25e73c5c49bdf34d618e2df4d0e503c3daf4ccac).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-200558305
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11870#discussion_r57092440
  
    --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java ---
    @@ -555,6 +574,31 @@ private Location with(Object base, long offset, int length) {
         }
     
         /**
    +     * Find the next pair that has the same key as current one.
    +     */
    +    public boolean next() {
    +      pos = (pos + step++) & mask;
    +      // Remember the current key
    +      Object keyBase = getKeyBase();
    +      long keyOffset = getKeyOffset();
    +      int keyLength = getKeyLength();
    +      assert (longArray != null);
    +      while (longArray.get(pos * 2) != 0) {
    --- End diff --
    
    What guarantees termination here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11870#discussion_r57622146
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/BenchmarkWholeStageCodegen.scala ---
    @@ -184,11 +184,29 @@ class BenchmarkWholeStageCodegen extends SparkFunSuite {
     
         /**
         Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -    Join w 2 longs:                      Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    Join w 2 longs:                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
         -------------------------------------------------------------------------------------------
    -    Join w 2 longs codegen=false              7877 / 8358         13.3          75.1       1.0X
    -    Join w 2 longs codegen=true               3877 / 3937         27.0          37.0       2.0X
    +    Join w 2 longs codegen=false           12725 / 13158          8.2         121.4       1.0X
    +    Join w 2 longs codegen=true              6044 / 6771         17.3          57.6       2.1X
           */
    +
    +    val dim4 = broadcast(sqlContext.range(M)
    +      .selectExpr("cast(id/10 as long) as k1", "cast(id/10 as long) as k2"))
    +
    +    runBenchmark("Join w 2 longs duplicated", N) {
    --- End diff --
    
    nit: inner join


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201143086
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54143/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-200522669
  
    @JoshRosen Had updated the benchmark result for BroadcastHashJoin in local mode, which use the HashMap before this patch, it become 50% slower (same for ShuffledHashJoin). The good part is that we are using less memory (also less objects), and more GC friendly, also support OffHeap, That's the same reason we use BytesToBytesMap for aggregate.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-200571907
  
    **[Test build #53979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53979/consoleFull)** for PR 11870 at commit [`9f36237`](https://github.com/apache/spark/commit/9f36237f2d7178875878611968df5d326e1a455a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-200040986
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-202604607
  
    **[Test build #54351 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54351/consoleFull)** for PR 11870 at commit [`b165329`](https://github.com/apache/spark/commit/b1653294bdebe2c6c0db6dce5beca870a3e58ef7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201061661
  
    **[Test build #2688 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2688/consoleFull)** for PR 11870 at commit [`c701b3b`](https://github.com/apache/spark/commit/c701b3b51940c07c126915d2a5628d47214ee3c8).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201142974
  
    **[Test build #54143 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54143/consoleFull)** for PR 11870 at commit [`2462422`](https://github.com/apache/spark/commit/246242219ab39405bedd00d3a91c0dbb35ae194b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11870#discussion_r57205642
  
    --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java ---
    @@ -555,6 +574,31 @@ private Location with(Object base, long offset, int length) {
         }
     
         /**
    +     * Find the next pair that has the same key as current one.
    +     */
    +    public boolean next() {
    +      pos = (pos + step++) & mask;
    +      // Remember the current key
    +      Object keyBase = getKeyBase();
    +      long keyOffset = getKeyOffset();
    +      int keyLength = getKeyLength();
    +      assert (longArray != null);
    +      while (longArray.get(pos * 2) != 0) {
    --- End diff --
    
    The same as we insert new key.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11870#discussion_r57621657
  
    --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java ---
    @@ -657,26 +682,23 @@ public int getValueLength() {
          * @return true if the put() was successful and false if the put() failed because memory could
          *         not be acquired.
          */
    -    public boolean putNewKey(Object keyBase, long keyOffset, int keyLength,
    -        Object valueBase, long valueOffset, int valueLength) {
    -      assert (!isDefined) : "Can only set value once for a key";
    -      assert (keyLength % 8 == 0);
    -      assert (valueLength % 8 == 0);
    -      assert(longArray != null);
    -
    +    public boolean append(Object kbase, long koff, int klen, Object vbase, long voff, int vlen) {
    +      assert (klen % 8 == 0);
    --- End diff --
    
    would bitwise & be faster here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11870#discussion_r57622346
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -347,29 +293,53 @@ private[joins] object UnsafeHashedRelation {
           keyGenerator: UnsafeProjection,
           sizeEstimate: Int): HashedRelation = {
     
    -    // Use a Java hash table here because unsafe maps expect fixed size records
    -    // TODO: Use BytesToBytesMap for memory efficiency
    -    val hashTable = new JavaHashMap[UnsafeRow, CompactBuffer[UnsafeRow]](sizeEstimate)
    +    val taskMemoryManager = if (TaskContext.get() != null) {
    +      TaskContext.get().taskMemoryManager()
    +    } else {
    +      new TaskMemoryManager(
    +        new StaticMemoryManager(
    +          new SparkConf().set("spark.memory.offHeap.enabled", "false"),
    +          Long.MaxValue,
    +          Long.MaxValue,
    +          1),
    +        0)
    +    }
    +    val pageSizeBytes = Option(SparkEnv.get).map(_.memoryManager.pageSizeBytes)
    +      .getOrElse(new SparkConf().getSizeAsBytes("spark.buffer.pageSize", "16m"))
    +
    +    val binaryMap = new BytesToBytesMap(
    +      taskMemoryManager,
    +      (sizeEstimate * 1.5 + 1).toInt, // reduce hash collision
    --- End diff --
    
    can you please comment here on why this particular value works best for reducing collisions?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-202561977
  
    **[Test build #54351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54351/consoleFull)** for PR 11870 at commit [`b165329`](https://github.com/apache/spark/commit/b1653294bdebe2c6c0db6dce5beca870a3e58ef7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201992875
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-200478398
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53959/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201143080
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201038798
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11870#discussion_r57092342
  
    --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java ---
    @@ -555,6 +574,31 @@ private Location with(Object base, long offset, int length) {
         }
     
         /**
    +     * Find the next pair that has the same key as current one.
    +     */
    +    public boolean next() {
    --- End diff --
    
    Maybe call this nextPairWithMatchingKey()?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-199965318
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-199540571
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201992876
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54273/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-199548337
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53730/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-199547643
  
    **[Test build #53730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53730/consoleFull)** for PR 11870 at commit [`b8a3dd6`](https://github.com/apache/spark/commit/b8a3dd6e3d27616a911448b6baf38905f68ddaed).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201649600
  
    **[Test build #2697 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2697/consoleFull)** for PR 11870 at commit [`e76006d`](https://github.com/apache/spark/commit/e76006d19b3dfde2a3cebdf99da6454f524362be).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-199965323
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53786/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201723272
  
    **[Test build #54260 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54260/consoleFull)** for PR 11870 at commit [`16fe15a`](https://github.com/apache/spark/commit/16fe15a5df90389230e0b1274ff4c23415aa90ca).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11870


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11870#discussion_r57091017
  
    --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java ---
    @@ -480,11 +480,29 @@ public void safeLookup(Object keyBase, long keyOffset, int keyLength, Location l
       }
     
       /**
    +   * Append a pair of key-value at the end, there could other pairs with the same keys.
    +   */
    +  public boolean append(Object keyBase, long keyOffset, int keyLength,
    +                        Object valueBase, long valueOffset, int valueLength) {
    +    assert(longArray != null);
    +    int hash = Murmur3_x86_32.hashUnsafeWords(keyBase, keyOffset, keyLength, 42);
    +    int pos = hash & mask;
    +    int step = 1;
    +    while (longArray.get(pos * 2) != 0) {
    --- End diff --
    
    What guarantees that this loop will terminate?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11870#discussion_r57623687
  
    --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java ---
    @@ -657,26 +682,23 @@ public int getValueLength() {
          * @return true if the put() was successful and false if the put() failed because memory could
          *         not be acquired.
          */
    -    public boolean putNewKey(Object keyBase, long keyOffset, int keyLength,
    -        Object valueBase, long valueOffset, int valueLength) {
    -      assert (!isDefined) : "Can only set value once for a key";
    -      assert (keyLength % 8 == 0);
    -      assert (valueLength % 8 == 0);
    -      assert(longArray != null);
    -
    +    public boolean append(Object kbase, long koff, int klen, Object vbase, long voff, int vlen) {
    +      assert (klen % 8 == 0);
    --- End diff --
    
    assert is only used when testing, also the JIT compiler can optimize these.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-202562101
  
    Merging this into master (since the last commit is trivial)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-199540573
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53727/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201146430
  
    **[Test build #54148 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54148/consoleFull)** for PR 11870 at commit [`e76006d`](https://github.com/apache/spark/commit/e76006d19b3dfde2a3cebdf99da6454f524362be).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11870#discussion_r57621843
  
    --- Diff: core/src/test/java/org/apache/spark/unsafe/map/AbstractBytesToBytesMapSuite.java ---
    @@ -587,6 +587,44 @@ public void spillInIterator() throws IOException {
       }
     
       @Test
    +  public void multpleValuesForSameKey() {
    --- End diff --
    
    nit: typo


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-199915650
  
    **[Test build #53786 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53786/consoleFull)** for PR 11870 at commit [`93409de`](https://github.com/apache/spark/commit/93409de80420da20556807d3f660b2ea81cdc605).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-202556888
  
    LGTM, just few minor comments


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201591152
  
    **[Test build #2697 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2697/consoleFull)** for PR 11870 at commit [`e76006d`](https://github.com/apache/spark/commit/e76006d19b3dfde2a3cebdf99da6454f524362be).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-200086567
  
    Does this have any performance impact on runtime rather than peak memory usage?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201980413
  
    **[Test build #54273 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54273/consoleFull)** for PR 11870 at commit [`11b2364`](https://github.com/apache/spark/commit/11b2364d714869b92a98fd863d341f743d2d302c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-199548322
  
    **[Test build #53730 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53730/consoleFull)** for PR 11870 at commit [`b8a3dd6`](https://github.com/apache/spark/commit/b8a3dd6e3d27616a911448b6baf38905f68ddaed).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11870#issuecomment-201992845
  
    **[Test build #54273 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54273/consoleFull)** for PR 11870 at commit [`11b2364`](https://github.com/apache/spark/commit/11b2364d714869b92a98fd863d341f743d2d302c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11870#discussion_r57622786
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -347,29 +293,53 @@ private[joins] object UnsafeHashedRelation {
           keyGenerator: UnsafeProjection,
           sizeEstimate: Int): HashedRelation = {
     
    -    // Use a Java hash table here because unsafe maps expect fixed size records
    -    // TODO: Use BytesToBytesMap for memory efficiency
    -    val hashTable = new JavaHashMap[UnsafeRow, CompactBuffer[UnsafeRow]](sizeEstimate)
    +    val taskMemoryManager = if (TaskContext.get() != null) {
    +      TaskContext.get().taskMemoryManager()
    +    } else {
    +      new TaskMemoryManager(
    +        new StaticMemoryManager(
    +          new SparkConf().set("spark.memory.offHeap.enabled", "false"),
    +          Long.MaxValue,
    +          Long.MaxValue,
    +          1),
    +        0)
    +    }
    +    val pageSizeBytes = Option(SparkEnv.get).map(_.memoryManager.pageSizeBytes)
    +      .getOrElse(new SparkConf().getSizeAsBytes("spark.buffer.pageSize", "16m"))
    +
    +    val binaryMap = new BytesToBytesMap(
    +      taskMemoryManager,
    +      (sizeEstimate * 1.5 + 1).toInt, // reduce hash collision
    +      pageSizeBytes)
     
         // Create a mapping of buildKeys -> rows
    +    var numFields = 0
    +    // Whether all the keys are unique or not
    +    var allUnique: Boolean = true
         while (input.hasNext) {
    -      val unsafeRow = input.next().asInstanceOf[UnsafeRow]
    -      val rowKey = keyGenerator(unsafeRow)
    -      if (!rowKey.anyNull) {
    -        val existingMatchList = hashTable.get(rowKey)
    -        val matchList = if (existingMatchList == null) {
    -          val newMatchList = new CompactBuffer[UnsafeRow]()
    -          hashTable.put(rowKey.copy(), newMatchList)
    -          newMatchList
    -        } else {
    -          existingMatchList
    +      val row = input.next().asInstanceOf[UnsafeRow]
    +      numFields = row.numFields()
    +      val key = keyGenerator(row)
    +      if (!key.anyNull) {
    +        val loc = binaryMap.lookup(key.getBaseObject, key.getBaseOffset, key.getSizeInBytes)
    +        if (loc.isDefined) {
    +          allUnique = false
    +        }
    +        val success = loc.append(
    +          key.getBaseObject, key.getBaseOffset, key.getSizeInBytes,
    +          row.getBaseObject, row.getBaseOffset, row.getSizeInBytes)
    +        if (!success) {
    +          binaryMap.free()
    +          throw new SparkException("There is no enough memory to build hash map")
    --- End diff --
    
    nit: not


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org