Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2015/11/05 01:08:07 UTC
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
GitHub user davies opened a pull request:
https://github.com/apache/spark/pull/9477
[SPARK-7542] [SQL] Support off-heap index/sort buffer
This adds off-heap memory support for the arrays inside BytesToBytesMap and InMemorySorter, so that all execution memory can be allocated off-heap.
This PR includes #9383; see https://github.com/apache/spark/commit/2b3277781c21d0efb20275bd5632a2d4f7f171c3 for the real changes.
Closes #8068
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/davies/spark unsafe_timsort
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9477.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9477
----
commit 4e09081050633bc8baba4cebc5ffba6e9d900bae
Author: Davies Liu <da...@databricks.com>
Date: 2015-10-30T19:08:29Z
Do hash-based aggregation for all records before switch to sort-based
commit 53dbdf2d4c8c547e6bd50a589bf0223e7ce95e84
Author: Davies Liu <da...@databricks.com>
Date: 2015-10-30T19:24:08Z
merge the last map
commit 2e341f50b656d0effe36004b6abc68898a119f35
Author: Davies Liu <da...@databricks.com>
Date: 2015-11-02T18:56:59Z
update tests
commit df44fc64ed1495a1c0f6f51a7014327b6a8750b7
Author: Davies Liu <da...@databricks.com>
Date: 2015-11-02T20:57:18Z
fix bug
commit 6f3bb15b19cd326f677f15860cf215f57fd3671a
Author: Davies Liu <da...@databricks.com>
Date: 2015-11-03T21:27:35Z
address comments, add regression test
commit fc5e052ff17560d02ef7cdeec91a4a30605c65f0
Author: Davies Liu <da...@databricks.com>
Date: 2015-11-04T04:51:13Z
free the array after spilling
commit d89e03463b99e56fd25e5ada8f7d146b6749082f
Author: Davies Liu <da...@databricks.com>
Date: 2015-11-04T06:11:58Z
refactor
commit 1c0c6c36a5a16c33ceb4cd43534ce02ec3c2b286
Author: Davies Liu <da...@databricks.com>
Date: 2015-11-04T06:20:02Z
cleanup
commit d8422e15e70a1fab2535ca13ac01c0d3a7be19e9
Author: Davies Liu <da...@databricks.com>
Date: 2015-11-04T06:27:11Z
throw better exception
commit cbeaedf1cc47365ea90db6478819ca02db5acaea
Author: Davies Liu <da...@databricks.com>
Date: 2015-11-04T23:00:16Z
add more comments
commit fbce6fe74b8dccd0aefa98a1183ba1321b500a56
Author: Davies Liu <da...@databricks.com>
Date: 2015-11-04T23:08:46Z
Merge branch 'master' of github.com:apache/spark into fix_switch
commit 10d71694ae07af68265bb36a957b4ff5320d8e72
Author: Davies Liu <da...@databricks.com>
Date: 2015-11-04T23:09:16Z
fix conflict
commit 8a20e569fdb43e26804e1c71439fd2d02c5f5a69
Author: Davies Liu <da...@databricks.com>
Date: 2015-11-04T23:21:08Z
Update UnsafeFixedWidthAggregationMap.java
commit f6a5f0629c0b462fa45c8da209f724c158fba078
Author: Davies Liu <da...@databricks.com>
Date: 2015-11-04T23:46:46Z
fix build
commit 2b3277781c21d0efb20275bd5632a2d4f7f171c3
Author: Davies Liu <da...@databricks.com>
Date: 2015-11-05T00:04:11Z
support off-heap index/sort buffer
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154134910
Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153917623
Merged build started.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154140207
**[Test build #45127 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45127/consoleFull)** for PR 9477 at commit [`c35f512`](https://github.com/apache/spark/commit/c35f5124fcb2746a12faad280063a8424bfff821).
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153950136
Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154212662
Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153953921
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45070/
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153911937
Merged build started.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153917595
Merged build triggered.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/9477#discussion_r43983047
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java ---
@@ -32,24 +37,39 @@ public int compare(PackedRecordPointer left, PackedRecordPointer right) {
}
private static final SortComparator SORT_COMPARATOR = new SortComparator();
+ private final MemoryConsumer consumer;
+ private final TaskMemoryManager memoryManager;
--- End diff --
Mind calling this `taskMemoryManager` so that it's clear that's what it is?
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153914518
Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154210332
Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153941334
Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153978941
LGTM overall, but I'd like to address one concern before merging: I'm worried that passing both the `MemoryConsumer` and `TaskMemoryManager` to the sorter components will open the potential for bugs; I feel that those classes should allocate their memory using only the public `MemoryConsumer` methods.
Note that the `MemoryConsumer` methods actually end up calling the `TaskMemoryManager` methods which your code calls directly: https://github.com/apache/spark/blob/81498dd5c86ca51d2fb351c8ef52cbb28e6844f4/core/src/main/java/org/apache/spark/memory/MemoryConsumer.java#L81
I think that we should mark those TaskMemoryManager methods as methods that are only supposed to be called _by_ the memory consumer and not directly by developers. The problem with calling them directly is that the bookkeeping in the MemoryConsumer itself won't have been updated. This current API has such a high potential for this type of misuse that I think we should look at fancy Java-isms to restrict the visibility / callability of those methods such that they can only be called from MemoryConsumer.
If you fix this by having those classes only allocate through their MemoryConsumers then feel free to merge as soon as this passes tests. I can take care of a followup to fix the documentation / code to avoid this type of misuse.
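To make the bookkeeping concern concrete, here is a minimal sketch of how the two counters can drift apart. The classes `SketchTaskMemoryManager` and `SketchMemoryConsumer` are hypothetical, heavily simplified stand-ins written for this illustration; they are not Spark's real `TaskMemoryManager` / `MemoryConsumer` and they ignore spilling, paging and thread safety.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustration only: hypothetical stand-ins, not Spark's real classes. */
final class BookkeepingSketch {

    /** Stand-in for a task-level manager that records what it granted to each consumer. */
    static final class SketchTaskMemoryManager {
        private final Map<SketchMemoryConsumer, Long> granted = new HashMap<>();

        /** Grants the requested bytes and records them against the consumer. */
        long acquire(long bytes, SketchMemoryConsumer consumer) {
            granted.merge(consumer, bytes, Long::sum);
            return bytes;
        }

        long grantedTo(SketchMemoryConsumer consumer) {
            return granted.getOrDefault(consumer, 0L);
        }
    }

    /** Stand-in for a consumer that keeps its own usage counter. */
    static final class SketchMemoryConsumer {
        private final SketchTaskMemoryManager manager;
        private long used = 0L;

        SketchMemoryConsumer(SketchTaskMemoryManager manager) {
            this.manager = manager;
        }

        /** The intended path: the consumer asks the manager, then updates its own counter. */
        long allocate(long bytes) {
            long got = manager.acquire(bytes, this);
            used += got;
            return got;
        }

        long getUsed() {
            return used;
        }
    }

    public static void main(String[] args) {
        SketchTaskMemoryManager manager = new SketchTaskMemoryManager();
        SketchMemoryConsumer sorter = new SketchMemoryConsumer(manager);

        sorter.allocate(1024);          // through the consumer: both counters agree
        manager.acquire(4096, sorter);  // bypassing the consumer: only the manager knows

        System.out.println("consumer thinks it uses: " + sorter.getUsed());
        System.out.println("manager has granted:     " + manager.grantedTo(sorter));
    }
}
```

In this sketch the consumer reports 1024 bytes while the manager has granted 5120, which is the kind of mismatch that passing only the consumer to the sorter components would rule out.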
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153962196
Merged build triggered.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154248203
**[Test build #1994 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1994/consoleFull)** for PR 9477 at commit [`b367daf`](https://github.com/apache/spark/commit/b367dafd1fcf29367776a42b3ef3086506c70605).
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153953782
**[Test build #45070 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45070/consoleFull)** for PR 9477 at commit [`862b38f`](https://github.com/apache/spark/commit/862b38f9d58c51e5d06b15c311c91842aa183475).
* This patch **fails from timeout after a configured wait of `250m`**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153996532
Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154139243
Merged build triggered.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154261887
**[Test build #1993 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1993/consoleFull)** for PR 9477 at commit [`b367daf`](https://github.com/apache/spark/commit/b367dafd1fcf29367776a42b3ef3086506c70605).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `final class ShuffleSortDataFormat extends SortDataFormat<PackedRecordPointer, LongArray>`
  * `final class UnsafeSortDataFormat extends SortDataFormat<RecordPointerAndKeyPrefix, LongArray>`
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154170426
Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/9477#discussion_r43983565
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java ---
@@ -96,14 +111,12 @@ public long getMemoryUsage() {
*/
public void insertRecord(long recordPointer, int partitionId) {
if (!hasSpaceForAnotherRecord()) {
- if (array.length == Integer.MAX_VALUE) {
- throw new IllegalStateException("Sort pointer array has reached maximum size");
--- End diff --
Well, technically only if we're running in off-heap mode.
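A rough sketch of why the removed `Integer.MAX_VALUE` check would only stop mattering off-heap. It assumes the on-heap pointer array is backed by a Java `long[]` (Java arrays are indexed by `int`), while an off-heap region is addressed by a `long` byte offset. `PointerArrayLimits` and its methods are hypothetical names for this illustration, not part of Spark.

```java
/** Illustration only: hypothetical helper, not part of Spark. */
final class PointerArrayLimits {
    /** Each pointer-array entry is one 8-byte long. */
    static final int BYTES_PER_ENTRY = Long.BYTES;

    /** Assumption: an on-heap array is a Java long[], so its length is an int. */
    static long maxOnHeapEntries() {
        return Integer.MAX_VALUE;
    }

    /** Assumption: an off-heap region is addressed by a long byte offset. */
    static long maxOffHeapEntries() {
        return Long.MAX_VALUE / BYTES_PER_ENTRY;
    }

    /** True if an array holding currentEntries could still be doubled in the given mode. */
    static boolean canDouble(long currentEntries, boolean offHeap) {
        long limit = offHeap ? maxOffHeapEntries() : maxOnHeapEntries();
        return currentEntries <= limit / 2;
    }

    public static void main(String[] args) {
        long entries = Integer.MAX_VALUE;
        System.out.println("on-heap can double:  " + canDouble(entries, false));
        System.out.println("off-heap can double: " + canDouble(entries, true));
    }
}
```

Under these assumptions, an array that already holds `Integer.MAX_VALUE` entries cannot grow on-heap but can off-heap, where the practical bound becomes available memory rather than the array index type.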
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/9477#discussion_r43983728
--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java ---
@@ -78,22 +81,33 @@ public int compare(RecordPointerAndKeyPrefix r1, RecordPointerAndKeyPrefix r2) {
private int pos = 0;
public UnsafeInMemorySorter(
+ final MemoryConsumer consumer,
--- End diff --
Similar question here: why not pass _just_ the consumer?
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153924523
Merged build triggered.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154135406
Merged build triggered.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153961923
@JoshRosen This is ready for review
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154270321
LGTM, so I'm going to merge this into master and 1.6. Thanks!
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154048664
**[Test build #45108 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45108/consoleFull)** for PR 9477 at commit [`b42e7db`](https://github.com/apache/spark/commit/b42e7db68067d64b6b0e19bd00b3371ffde4b174).
* This patch **fails from timeout after a configured wait of `250m`**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `final class ShuffleSortDataFormat extends SortDataFormat<PackedRecordPointer, LongArray>`
  * `final class UnsafeSortDataFormat extends SortDataFormat<RecordPointerAndKeyPrefix, LongArray>`
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/9477#discussion_r43983106
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java ---
@@ -96,14 +111,12 @@ public long getMemoryUsage() {
*/
public void insertRecord(long recordPointer, int partitionId) {
if (!hasSpaceForAnotherRecord()) {
- if (array.length == Integer.MAX_VALUE) {
- throw new IllegalStateException("Sort pointer array has reached maximum size");
--- End diff --
I guess this lifts the size limit. Nice!
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153914523
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45067/
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154054716
**[Test build #45110 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45110/consoleFull)** for PR 9477 at commit [`854a99f`](https://github.com/apache/spark/commit/854a99f6339e831efccf9868c0bacbace2e1f75d).
* This patch **fails from timeout after a configured wait of \`250m\`**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `final class ShuffleSortDataFormat extends SortDataFormat<PackedRecordPointer, LongArray>`
  * `final class UnsafeSortDataFormat extends SortDataFormat<RecordPointerAndKeyPrefix, LongArray>`
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154273968
**[Test build #1994 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1994/consoleFull)** for PR 9477 at commit [`b367daf`](https://github.com/apache/spark/commit/b367dafd1fcf29367776a42b3ef3086506c70605).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `final class ShuffleSortDataFormat extends SortDataFormat<PackedRecordPointer, LongArray>`
  * `final class UnsafeSortDataFormat extends SortDataFormat<RecordPointerAndKeyPrefix, LongArray>`
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154210258
**[Test build #45125 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45125/consoleFull)** for PR 9477 at commit [`854a99f`](https://github.com/apache/spark/commit/854a99f6339e831efccf9868c0bacbace2e1f75d).
* This patch **fails from timeout after a configured wait of \`250m\`**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `final class ShuffleSortDataFormat extends SortDataFormat<PackedRecordPointer, LongArray>`
  * `final class UnsafeSortDataFormat extends SortDataFormat<RecordPointerAndKeyPrefix, LongArray>`
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153987350
Merged build started.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154212555
**[Test build #45127 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45127/consoleFull)** for PR 9477 at commit [`c35f512`](https://github.com/apache/spark/commit/c35f5124fcb2746a12faad280063a8424bfff821).
* This patch **fails from timeout after a configured wait of \`250m\`**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `final class ShuffleSortDataFormat extends SortDataFormat<PackedRecordPointer, LongArray>`
  * `final class UnsafeSortDataFormat extends SortDataFormat<RecordPointerAndKeyPrefix, LongArray>`
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154212667
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45127/
Test FAILed.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153992590
**[Test build #45110 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45110/consoleFull)** for PR 9477 at commit [`854a99f`](https://github.com/apache/spark/commit/854a99f6339e831efccf9868c0bacbace2e1f75d).
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154054815
Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154135434
Merged build started.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153953917
Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153925753
**[Test build #45076 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45076/consoleFull)** for PR 9477 at commit [`3cb22d4`](https://github.com/apache/spark/commit/3cb22d4422e33a8dfafb0b181309a666ebd6d369).
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154188619
**[Test build #1989 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1989/consoleFull)** for PR 9477 at commit [`b367daf`](https://github.com/apache/spark/commit/b367dafd1fcf29367776a42b3ef3086506c70605).
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154228222
The failed test is not related, will re-run it.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153924545
Merged build started.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153912804
Merged build triggered.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/9477#discussion_r43982895
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
@@ -321,9 +320,10 @@ private void growPointerArrayIfNecessary() throws IOException {
assert(inMemSorter != null);
if (!inMemSorter.hasSpaceForAnotherRecord()) {
long used = inMemSorter.getMemoryUsage();
- long needed = used + inMemSorter.getMemoryToExpand();
+ MemoryBlock page;
try {
- acquireMemory(needed); // could trigger spilling
+ // could trigger spilling
+ page = taskMemoryManager.allocatePage(used * 2, this);
--- End diff --
Implicit here is the fact that the in memory sorter's only source of memory usage is the pointer array itself. That's fine, though.
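The assumption in this comment can be illustrated with a minimal sketch: if the pointer array is the sorter's only memory, then requesting `used * 2` bytes is the same as doubling the array. `GrowablePointerArray` and its methods are hypothetical names for illustration, and a plain `long[]` stands in for the page that `taskMemoryManager.allocatePage(used * 2, this)` would return; this is not Spark's implementation.

```java
import java.util.Arrays;

// Hypothetical stand-in for an in-memory sorter whose only memory is its pointer array.
final class GrowablePointerArray {
  private long[] pointers;
  private int size = 0;

  GrowablePointerArray(int initialCapacity) {
    if (initialCapacity <= 0) {
      throw new IllegalArgumentException("initialCapacity must be positive");
    }
    pointers = new long[initialCapacity];
  }

  boolean hasSpaceForAnotherRecord() {
    return size < pointers.length;
  }

  // Bytes held by the pointer array: 8 bytes per long slot.
  long getMemoryUsage() {
    return pointers.length * (long) Long.BYTES;
  }

  // Mirrors the "used * 2" request in the diff: the new array is twice the current usage.
  void growIfNecessary() {
    if (!hasSpaceForAnotherRecord()) {
      long neededBytes = getMemoryUsage() * 2;
      int newLength = Math.toIntExact(neededBytes / Long.BYTES);
      pointers = Arrays.copyOf(pointers, newLength); // existing pointers are preserved
    }
  }

  void insert(long recordPointer) {
    growIfNecessary();
    pointers[size++] = recordPointer;
  }

  long get(int index) {
    if (index < 0 || index >= size) {
      throw new IndexOutOfBoundsException("index " + index + ", size " + size);
    }
    return pointers[index];
  }

  int size() {
    return size;
  }
}
```

If the sorter held any other buffers, `getMemoryUsage()` would over-count the array and the doubling would request more than the array needs, which is why the assumption is worth stating.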
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153911915
Merged build triggered.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154228478
**[Test build #1993 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1993/consoleFull)** for PR 9477 at commit [`b367daf`](https://github.com/apache/spark/commit/b367dafd1fcf29367776a42b3ef3086506c70605).
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153991110
Merged build triggered.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153950138
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45076/
Test FAILed.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153950106
**[Test build #45076 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45076/consoleFull)** for PR 9477 at commit [`3cb22d4`](https://github.com/apache/spark/commit/3cb22d4422e33a8dfafb0b181309a666ebd6d369).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `final class ShuffleSortDataFormat extends SortDataFormat<PackedRecordPointer, LongArray>`
  * `final class UnsafeSortDataFormat extends SortDataFormat<RecordPointerAndKeyPrefix, LongArray>`
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/9477#discussion_r43983319
--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSortDataFormat.java ---
@@ -44,37 +47,43 @@ public RecordPointerAndKeyPrefix newKey() {
}
@Override
- public RecordPointerAndKeyPrefix getKey(long[] data, int pos, RecordPointerAndKeyPrefix reuse) {
- reuse.recordPointer = data[pos * 2];
- reuse.keyPrefix = data[pos * 2 + 1];
+ public RecordPointerAndKeyPrefix getKey(LongArray data, int pos, RecordPointerAndKeyPrefix reuse) {
+ reuse.recordPointer = data.get(pos * 2);
+ reuse.keyPrefix = data.get(pos * 2 + 1);
return reuse;
}
@Override
- public void swap(long[] data, int pos0, int pos1) {
- long tempPointer = data[pos0 * 2];
- long tempKeyPrefix = data[pos0 * 2 + 1];
- data[pos0 * 2] = data[pos1 * 2];
- data[pos0 * 2 + 1] = data[pos1 * 2 + 1];
- data[pos1 * 2] = tempPointer;
- data[pos1 * 2 + 1] = tempKeyPrefix;
+ public void swap(LongArray data, int pos0, int pos1) {
+ long tempPointer = data.get(pos0 * 2);
+ long tempKeyPrefix = data.get(pos0 * 2 + 1);
+ data.set(pos0 * 2, data.get(pos1 * 2));
+ data.set(pos0 * 2 + 1, data.get(pos1 * 2 + 1));
+ data.set(pos1 * 2, tempPointer);
+ data.set(pos1 * 2 + 1, tempKeyPrefix);
}
@Override
- public void copyElement(long[] src, int srcPos, long[] dst, int dstPos) {
- dst[dstPos * 2] = src[srcPos * 2];
- dst[dstPos * 2 + 1] = src[srcPos * 2 + 1];
+ public void copyElement(LongArray src, int srcPos, LongArray dst, int dstPos) {
+ dst.set(dstPos * 2, src.get(srcPos * 2));
+ dst.set(dstPos * 2 + 1, src.get(srcPos * 2 + 1));
}
@Override
- public void copyRange(long[] src, int srcPos, long[] dst, int dstPos, int length) {
- System.arraycopy(src, srcPos * 2, dst, dstPos * 2, length * 2);
+ public void copyRange(LongArray src, int srcPos, LongArray dst, int dstPos, int length) {
+ Platform.copyMemory(
+ src.getBaseObject(),
+ src.getBaseOffset() + srcPos * 16,
+ dst.getBaseObject(),
+ dst.getBaseOffset() + dstPos * 16,
+ length * 16);
}
@Override
- public long[] allocate(int length) {
+ public LongArray allocate(int length) {
assert (length < Integer.MAX_VALUE / 2) : "Length " + length + " is too large";
- return new long[length * 2];
+ // This is used as temporary buffer, it's fine to allocate from JVM heap.
+ return new LongArray(MemoryBlock.fromLongArray(new long[length * 2]));
--- End diff --
Do we need to zero-out this array?
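For readers following the `copyRange` hunk in this diff, the factor of 16 comes from the record layout: each record occupies two consecutive longs (record pointer, then key prefix), so record `pos` starts at long index `pos * 2`, which is byte offset `pos * 16`. A minimal sketch of that arithmetic on a plain `long[]` follows. `RecordLayout` and its method names are hypothetical, and widening to `long` before multiplying is a choice of this sketch, not something taken from the patch.

```java
// Hypothetical helper showing the index/offset arithmetic for 2-long records.
final class RecordLayout {
  static final int LONGS_PER_RECORD = 2;                                      // pointer + key prefix
  static final long BYTES_PER_RECORD = LONGS_PER_RECORD * (long) Long.BYTES; // 16

  // Index of the first long of record `pos` in the backing array.
  static int longIndex(int pos) {
    return pos * LONGS_PER_RECORD;
  }

  // Byte offset of record `pos`, computed in long arithmetic to avoid int overflow.
  static long byteOffset(long baseOffset, int pos) {
    return baseOffset + pos * BYTES_PER_RECORD;
  }

  // On-heap equivalent of the byte-based copy: `length` records are `length * 2` longs.
  static void copyRange(long[] src, int srcPos, long[] dst, int dstPos, int length) {
    System.arraycopy(src, longIndex(srcPos), dst, longIndex(dstPos), length * LONGS_PER_RECORD);
  }
}
```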
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154137615
**[Test build #45125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45125/consoleFull)** for PR 9477 at commit [`854a99f`](https://github.com/apache/spark/commit/854a99f6339e831efccf9868c0bacbace2e1f75d).
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153913274
**[Test build #45070 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45070/consoleFull)** for PR 9477 at commit [`862b38f`](https://github.com/apache/spark/commit/862b38f9d58c51e5d06b15c311c91842aa183475).
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153991140
Merged build started.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/9477#discussion_r43983322
--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSortDataFormat.java ---
@@ -44,37 +47,43 @@ public RecordPointerAndKeyPrefix newKey() {
}
@Override
- public RecordPointerAndKeyPrefix getKey(long[] data, int pos, RecordPointerAndKeyPrefix reuse) {
- reuse.recordPointer = data[pos * 2];
- reuse.keyPrefix = data[pos * 2 + 1];
+ public RecordPointerAndKeyPrefix getKey(LongArray data, int pos, RecordPointerAndKeyPrefix reuse) {
+ reuse.recordPointer = data.get(pos * 2);
+ reuse.keyPrefix = data.get(pos * 2 + 1);
return reuse;
}
@Override
- public void swap(long[] data, int pos0, int pos1) {
- long tempPointer = data[pos0 * 2];
- long tempKeyPrefix = data[pos0 * 2 + 1];
- data[pos0 * 2] = data[pos1 * 2];
- data[pos0 * 2 + 1] = data[pos1 * 2 + 1];
- data[pos1 * 2] = tempPointer;
- data[pos1 * 2 + 1] = tempKeyPrefix;
+ public void swap(LongArray data, int pos0, int pos1) {
+ long tempPointer = data.get(pos0 * 2);
+ long tempKeyPrefix = data.get(pos0 * 2 + 1);
+ data.set(pos0 * 2, data.get(pos1 * 2));
+ data.set(pos0 * 2 + 1, data.get(pos1 * 2 + 1));
+ data.set(pos1 * 2, tempPointer);
+ data.set(pos1 * 2 + 1, tempKeyPrefix);
}
@Override
- public void copyElement(long[] src, int srcPos, long[] dst, int dstPos) {
- dst[dstPos * 2] = src[srcPos * 2];
- dst[dstPos * 2 + 1] = src[srcPos * 2 + 1];
+ public void copyElement(LongArray src, int srcPos, LongArray dst, int dstPos) {
+ dst.set(dstPos * 2, src.get(srcPos * 2));
+ dst.set(dstPos * 2 + 1, src.get(srcPos * 2 + 1));
}
@Override
- public void copyRange(long[] src, int srcPos, long[] dst, int dstPos, int length) {
- System.arraycopy(src, srcPos * 2, dst, dstPos * 2, length * 2);
+ public void copyRange(LongArray src, int srcPos, LongArray dst, int dstPos, int length) {
+ Platform.copyMemory(
+ src.getBaseObject(),
+ src.getBaseOffset() + srcPos * 16,
+ dst.getBaseObject(),
+ dst.getBaseOffset() + dstPos * 16,
+ length * 16);
}
@Override
- public long[] allocate(int length) {
+ public LongArray allocate(int length) {
assert (length < Integer.MAX_VALUE / 2) : "Length " + length + " is too large";
- return new long[length * 2];
+ // This is used as temporary buffer, it's fine to allocate from JVM heap.
+ return new LongArray(MemoryBlock.fromLongArray(new long[length * 2]));
--- End diff --
Wait, nevermind: it's on-heap.
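Being on-heap settles the question because the Java Language Specification gives every component of a newly created array its default value, which is `0L` for `long`, so `new long[length * 2]` needs no explicit clearing. Memory obtained off-heap, for example through `sun.misc.Unsafe.allocateMemory`, carries no such guarantee. A minimal sketch of the on-heap case follows; `HeapScratch` and its methods are hypothetical names used only for illustration.

```java
// Hypothetical helper: an on-heap temporary buffer holding two longs per record.
final class HeapScratch {
  static long[] allocate(int length) {
    if (length < 0 || length >= Integer.MAX_VALUE / 2) {
      throw new IllegalArgumentException("Length " + length + " is out of range");
    }
    // The JVM zero-initializes array components; no fill step is needed.
    return new long[length * 2];
  }

  // Returns true when every slot still holds the default value 0L.
  static boolean isAllZero(long[] data) {
    for (long value : data) {
      if (value != 0L) {
        return false;
      }
    }
    return true;
  }
}
```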
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154048716
Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153990700
@JoshRosen I should have addressed your comments; I will merge this once it passes the tests.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153941271
**[Test build #45072 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45072/consoleFull)** for PR 9477 at commit [`77555e1`](https://github.com/apache/spark/commit/77555e10664cf4a468a5a6ebd530a0b470a56356).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154048718
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45108/
Test FAILed.
---
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154165565
Merged build started.
---
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153987965
**[Test build #45108 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45108/consoleFull)** for PR 9477 at commit [`b42e7db`](https://github.com/apache/spark/commit/b42e7db68067d64b6b0e19bd00b3371ffde4b174).
---
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153941335
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45072/
Test FAILed.
---
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153987324
Merged build triggered.
---
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154054818
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45110/
Test FAILed.
---
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/9477#discussion_r43983033
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java ---
@@ -32,24 +37,39 @@ public int compare(PackedRecordPointer left, PackedRecordPointer right) {
}
private static final SortComparator SORT_COMPARATOR = new SortComparator();
+ private final MemoryConsumer consumer;
+ private final TaskMemoryManager memoryManager;
+
/**
* An array of record pointers and partition ids that have been encoded by
* {@link PackedRecordPointer}. The sort operates on this array instead of directly manipulating
* records.
*/
- private long[] array;
+ private LongArray array;
/**
* The position in the pointer array where new records can be inserted.
*/
private int pos = 0;
- public ShuffleInMemorySorter(int initialSize) {
+ public ShuffleInMemorySorter(
+ MemoryConsumer consumer,
+ TaskMemoryManager memoryManager,
+ int initialSize) {
+ this.consumer = consumer;
+ this.memoryManager = memoryManager;
assert (initialSize > 0);
- this.array = new long[initialSize];
+ this.array = new LongArray(memoryManager.allocatePage(initialSize * 8L, consumer));
--- End diff --
If this `allocatePage` call were to fail, I think you'd get an NPE here, since `allocatePage` would return null.
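A minimal sketch of the kind of guard this comment points toward, assuming (as the comment states) that the allocation call can return null on failure. `Page`, `allocatePage(sizeBytes, budgetBytes)` and `allocatePointerArray` here are hypothetical stand-ins for illustration, not Spark's actual classes or signatures:

```java
final class AllocationGuardSketch {
  /** Hypothetical stand-in for an allocated memory page. */
  static final class Page {
    final long sizeBytes;
    Page(long sizeBytes) { this.sizeBytes = sizeBytes; }
  }

  /** Hypothetical allocator: returns null when the request exceeds the budget. */
  static Page allocatePage(long sizeBytes, long budgetBytes) {
    return sizeBytes <= budgetBytes ? new Page(sizeBytes) : null;
  }

  /** Allocates the pointer array page and fails fast instead of hitting an NPE later. */
  static Page allocatePointerArray(int initialSize, long budgetBytes) {
    long requested = initialSize * 8L; // 8 bytes per long slot
    Page page = allocatePage(requested, budgetBytes);
    if (page == null) {
      throw new IllegalStateException(
        "Unable to allocate " + requested + " bytes for the pointer array");
    }
    return page;
  }
}
```

Checking the result at the allocation site turns a later, harder-to-trace NullPointerException into an immediate error that names the failed request.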
---
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/9477#discussion_r43983352
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java ---
@@ -96,14 +111,12 @@ public long getMemoryUsage() {
*/
public void insertRecord(long recordPointer, int partitionId) {
if (!hasSpaceForAnotherRecord()) {
- if (array.length == Integer.MAX_VALUE) {
- throw new IllegalStateException("Sort pointer array has reached maximum size");
- } else {
- expandPointerArray();
- }
+ // for testing
--- End diff --
For testing?
---
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153996536
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45096/
Test PASSed.
---
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153962004
**[Test build #1983 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1983/consoleFull)** for PR 9477 at commit [`3cb22d4`](https://github.com/apache/spark/commit/3cb22d4422e33a8dfafb0b181309a666ebd6d369).
---
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/9477
---
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/9477#discussion_r44043737
--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java ---
@@ -293,9 +292,10 @@ private void growPointerArrayIfNecessary() throws IOException {
assert(inMemSorter != null);
if (!inMemSorter.hasSpaceForAnotherRecord()) {
long used = inMemSorter.getMemoryUsage();
- long needed = used + inMemSorter.getMemoryToExpand();
+ LongArray array;
try {
- acquireMemory(needed); // could trigger spilling
+ // could trigger spilling
+ array = allocateArray(used / 16 * 2);
--- End diff --
Should this be `/ 8 * 2` instead, since we want to double the number of slots in the array and each slot requires 8 bytes of memory?
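The arithmetic behind this question can be worked through in a short sketch. It assumes, as the comment does, that `used` is the current array size in bytes, that each slot is 8 bytes, and that the allocation call takes a slot count; the class and method names below are hypothetical:

```java
final class PointerArrayGrowthSketch {
  /** Bytes per slot, per the review comment (one long per slot). */
  static final long BYTES_PER_SLOT = 8L;

  /** Slot count to request when doubling, dividing by 8 as the comment suggests. */
  static long doubledSlots(long usedBytes) {
    return usedBytes / BYTES_PER_SLOT * 2;
  }

  /** Slot count produced by the expression in the diff (divides by 16). */
  static long slotsFromDiff(long usedBytes) {
    return usedBytes / 16 * 2;
  }

  public static void main(String[] args) {
    long usedBytes = 1024 * BYTES_PER_SLOT;       // 1024 slots in use = 8192 bytes
    System.out.println(doubledSlots(usedBytes));  // 2048: twice the current slot count
    System.out.println(slotsFromDiff(usedBytes)); // 1024: same as the current slot count
  }
}
```

Under those assumptions, dividing by 16 requests an array of the same size as the current one, so the array would not actually grow.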
---
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154139271
Merged build started.
---
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153912836
Merged build started.
---
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153917831
**[Test build #45072 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45072/consoleFull)** for PR 9477 at commit [`77555e1`](https://github.com/apache/spark/commit/77555e10664cf4a468a5a6ebd530a0b470a56356).
---
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154165525
Merged build triggered.
---
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154210333
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45125/
Test FAILed.
---
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154223721
**[Test build #1989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1989/consoleFull)** for PR 9477 at commit [`b367daf`](https://github.com/apache/spark/commit/b367dafd1fcf29367776a42b3ef3086506c70605).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `final class ShuffleSortDataFormat extends SortDataFormat<PackedRecordPointer, LongArray>`
  * `final class UnsafeSortDataFormat extends SortDataFormat<RecordPointerAndKeyPrefix, LongArray>`
---
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153983073
**[Test build #1983 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1983/consoleFull)** for PR 9477 at commit [`3cb22d4`](https://github.com/apache/spark/commit/3cb22d4422e33a8dfafb0b181309a666ebd6d369).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/9477#discussion_r43983708
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java ---
@@ -32,24 +37,39 @@ public int compare(PackedRecordPointer left, PackedRecordPointer right) {
}
private static final SortComparator SORT_COMPARATOR = new SortComparator();
+ private final MemoryConsumer consumer;
+ private final TaskMemoryManager memoryManager;
+
/**
* An array of record pointers and partition ids that have been encoded by
* {@link PackedRecordPointer}. The sort operates on this array instead of directly manipulating
* records.
*/
- private long[] array;
+ private LongArray array;
/**
* The position in the pointer array where new records can be inserted.
*/
private int pos = 0;
- public ShuffleInMemorySorter(int initialSize) {
+ public ShuffleInMemorySorter(
+ MemoryConsumer consumer,
+ TaskMemoryManager memoryManager,
--- End diff --
Why does this class take both a `memoryManager` _and_ a consumer? Why not pass it just the consumer and use methods of the `consumer` to do the allocation?
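One way to picture the consumer-only design suggested here is the sketch below. The `Consumer` interface and its `allocateArray(int)` method are hypothetical and use an on-heap `long[]` for simplicity; they are not Spark's `MemoryConsumer` API:

```java
final class ConsumerOnlySorterSketch {
  /** Hypothetical consumer that owns allocation, so the sorter needs no separate manager. */
  interface Consumer {
    long[] allocateArray(int slots);
  }

  static final class Sorter {
    private final Consumer consumer;
    private long[] array;
    private int pos = 0;

    Sorter(Consumer consumer, int initialSize) {
      if (initialSize <= 0) {
        throw new IllegalArgumentException("initialSize must be positive");
      }
      this.consumer = consumer;
      this.array = consumer.allocateArray(initialSize);
    }

    /** Appends a pointer, asking the consumer for a doubled array when full. */
    void insert(long pointer) {
      if (pos == array.length) {
        long[] bigger = consumer.allocateArray(array.length * 2);
        System.arraycopy(array, 0, bigger, 0, pos);
        array = bigger;
      }
      array[pos++] = pointer;
    }

    int size() { return pos; }
    int capacity() { return array.length; }
  }
}
```

With a single collaborator, the constructor takes one fewer argument and every allocation goes through the same object that accounts for the memory.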
---
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153962209
Merged build started.
---
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153995771
**[Test build #45096 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45096/consoleFull)** for PR 9477 at commit [`89319e0`](https://github.com/apache/spark/commit/89319e0540dda6cd70d03e3454c999998fecc3d8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `final class ShuffleSortDataFormat extends SortDataFormat<PackedRecordPointer, LongArray>`
  * `final class UnsafeSortDataFormat extends SortDataFormat<RecordPointerAndKeyPrefix, LongArray>`
---
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-153963379
**[Test build #45096 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45096/consoleFull)** for PR 9477 at commit [`89319e0`](https://github.com/apache/spark/commit/89319e0540dda6cd70d03e3454c999998fecc3d8).
---
[GitHub] spark pull request: [SPARK-7542] [SQL] Support off-heap index/sort...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9477#issuecomment-154170431
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45133/
Test FAILed.
---