Posted to reviews@spark.apache.org by ooq <gi...@git.apache.org> on 2016/07/19 16:20:03 UTC

[GitHub] spark pull request #14266: [SPARK-16526] [SQL] Benchmarking Performance for ...

GitHub user ooq opened a pull request:

    https://github.com/apache/spark/pull/14266

    [SPARK-16526] [SQL] Benchmarking Performance for Fast HashMap Implementations and Set Knobs

    ## What changes were proposed in this pull request?
    
    The third PR in the series to resolve SPARK-16523.
    
    This patch adds benchmark tests comparing the vectorized hashmap with the row-based hashmap (along with results in the comments). The tests are ignored by default because they take a long time to run.
    We would also like to use the results to set the knob that switches between the vectorized and row-based hashmaps.
    
    ## How was this patch tested?
    
    This patch consists mostly of tests itself.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ooq/spark rowbasedfastaggmap-pr3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14266.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14266
    
----
commit c87f26b318b5d673ac95454df5c1cb9a56c677eb
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-13T07:35:06Z

    add RowBatch and RowBasedHashMapGenerator

commit a3360e0ab1223dd43f891e755e648680a402b7df
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-13T08:08:35Z

    enable row based hashmap

commit 45641e5a7df341522518b19bf4a4662d14d64b48
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-13T08:52:31Z

    fix scale codestyle

commit b94fc6383f0727ce4249653550833fd3f0019a65
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-13T08:53:11Z

    merge fix

commit 9b0b294013239f4db744d7f5f5c1bdf838dd0559
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-13T08:55:53Z

    fix indent

commit 24248b190745bef13c567bd2681164d990d31cf3
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-14T18:18:33Z

    add SimpleRowBatch for performance

commit 9008725af8159ac186e0c7f81b08b85ddd7a0ec7
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-14T18:19:36Z

    a number of minor fixs

commit 4bdaeada70a20f89f6c593a4fc0298597e9a43cd
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-14T18:58:08Z

    Merge branch 'master' of github.com:apache/spark into rowbasedfastaggmap-pr2

commit 225b6619cd070ac9da3846a3bd02fa730e4ec835
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-14T20:53:28Z

    fix bug

commit bb4678856ebc1d729e530b9a1949ca9211c6a92e
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-15T17:43:40Z

    return row

commit a158125956627e502a8045fb077760063a3ca397
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-15T17:45:16Z

    simply fash hash map condition check

commit 22d8afd7dbd187b85e6f0c0d51544f0234d4beac
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-15T17:52:36Z

    update data structures to be consistent with what is used

commit ecff4ff3f30aefbaea89a12d2d5b3fda062b0f38
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-17T23:39:19Z

    Update simple row batch to improve performance & use SimpleRowBatch by default

commit 33b2910fa412669b2460b99ba0b6232f462e7879
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-17T23:57:41Z

    add simplerowbatch

commit 2c1973a872e5b8d99a55234724ec24acbc5f70ff
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-18T07:58:14Z

    Add tests for SimpleRowBatch

commit ce72d900004bfa720460126a3573642a8a97bc53
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-18T08:00:11Z

    keep in sync with pr1

commit 43cf549c27451209fc3fe4c8bb726fcfb2d7501c
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-18T09:59:03Z

    Add benchmarks for comparing hashmaps

commit 6515c3dc8b6f4084f66259f18af362fccb436157
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-18T18:21:01Z

    simply free page in iterator

commit 8f538b177e36ccc5fb690a3b29eb03ca72d1a4b2
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-18T19:00:00Z

    Clean logic in SimpleRowBatch that was supposedly to deal with multiple pages

commit 461028e62c9d9821cf11abdb9d85e9a8edb58ba4
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-18T19:01:36Z

    update with pr1

commit 774e088dc719cbd4d4ef97995656ec912b11878a
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-18T19:02:39Z

    update with pr1

commit 251d3919ed1b7dccacccc9bee6e121954a698cdd
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-18T22:09:47Z

    shrink findOrInsert() code size

commit 708f7bb3790556a596f6de51f127e99cd6f11662
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-18T22:12:29Z

    update some benchmarking results

commit d9394888977c97fe95f1642ad9f613dcbee1e4fa
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-18T22:25:56Z

    remove Rowbatch; renaming SimpleRowBatch to RowBasedKeyValueBatch

commit 02e4ab1c76cc777ef84cacf894f063505a19fffa
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-18T22:26:28Z

    Merge branch 'rowbasedfastaggmap-pr1' of github.com:ooq/spark into rowbasedfastaggmap-pr3

commit 60e78bd477a90892b8568c1da08d7b0e5fe3672a
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-18T22:42:10Z

    update benchmark

commit 20baf3e24699589342e14b1e8f2c90fec85d183b
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-18T23:13:29Z

    update benchmark

commit 3b3c9ea6dfc17ba4ebd562c70608be02f22693f9
Author: Qifan Pu <qi...@gmail.com>
Date:   2016-07-19T16:15:02Z

    Add benchmark results for vectorized vs. rowbased hashmap

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14266: [SPARK-16526][SQL] Benchmarking Performance for Fast Has...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14266
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #14266: [SPARK-16526][SQL] Benchmarking Performance for Fast Has...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14266
  
    **[Test build #62945 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62945/consoleFull)** for PR 14266 at commit [`c2b276f`](https://github.com/apache/spark/commit/c2b276f015746f069b55410bc28a47537aceeb3c).


[GitHub] spark issue #14266: [SPARK-16526][SQL] Benchmarking Performance for Fast Has...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14266
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62541/
    Test PASSed.


[GitHub] spark issue #14266: [SPARK-16526][SQL] Benchmarking Performance for Fast Has...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14266
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #14266: [SPARK-16526][SQL] Benchmarking Performance for Fast Has...

Posted by ooq <gi...@git.apache.org>.

Github user ooq commented on the issue:

    https://github.com/apache/spark/pull/14266
  
    @davies Added some test results with larger number of distinct keys.


[GitHub] spark issue #14266: [SPARK-16526][SQL] Benchmarking Performance for Fast Has...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/14266
  
    (gentle ping @ooq)


[GitHub] spark issue #14266: [SPARK-16526][SQL] Benchmarking Performance for Fast Has...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14266
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #14266: [SPARK-16526][SQL] Benchmarking Performance for Fast Has...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14266
  
    Merged build finished. Test PASSed.

