Posted to issues@spark.apache.org by "Xiangrui Meng (JIRA)" <ji...@apache.org> on 2014/10/27 23:25:33 UTC
[jira] [Updated] (SPARK-4104) KVArraySortDataFormat is not as fast as Java's Arrays.sort()
[ https://issues.apache.org/jira/browse/SPARK-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiangrui Meng updated SPARK-4104:
---------------------------------
Description:
The previous benchmark code in `SorterSuite` did not reset the array after each run, so both algorithms were being compared on already-sorted arrays. With the corrected code, KVArraySortDataFormat is slower than Java's Arrays.sort(). On Java 7, I got the following on arrays of size 25 million (average of 41958 ms vs. 25377 ms, i.e. about 1.65x slower):
{code}
Tuple-sort using Arrays.sort(): Took 25626 ms
Tuple-sort using Arrays.sort(): Took 28018 ms
Tuple-sort using Arrays.sort(): Took 26932 ms
Tuple-sort using Arrays.sort(): Took 24436 ms
Tuple-sort using Arrays.sort(): Took 25894 ms
Tuple-sort using Arrays.sort(): Took 24965 ms
Tuple-sort using Arrays.sort(): Took 23817 ms
Tuple-sort using Arrays.sort(): Took 23692 ms
Tuple-sort using Arrays.sort(): Took 26731 ms
Tuple-sort using Arrays.sort(): Took 23667 ms
Tuple-sort using Arrays.sort(): (36662 ms first try, 25377 ms average)
KV-sort using Sorter: Took 39579 ms
KV-sort using Sorter: Took 39176 ms
KV-sort using Sorter: Took 41760 ms
KV-sort using Sorter: Took 42469 ms
KV-sort using Sorter: Took 43133 ms
KV-sort using Sorter: Took 41692 ms
KV-sort using Sorter: Took 39585 ms
KV-sort using Sorter: Took 41617 ms
KV-sort using Sorter: Took 42300 ms
KV-sort using Sorter: Took 48274 ms
KV-sort using Sorter: (47217 ms first try, 41958 ms average)
{code}
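For illustration only, here is a minimal sketch of the pitfall described above: timing repeated sorts of one array without resetting it means every run after the first sees already-sorted input. This is not the code from `SorterSuite`; the class and method names (`SortBenchmarkSketch`, `timeSortsWithReset`, `timeSortsWithoutReset`, `randomArray`) are hypothetical, and it sorts a plain `long[]` with `Arrays.sort()` rather than tuples or Spark's `Sorter` with `KVArraySortDataFormat`.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch; not the benchmark from SorterSuite.
class SortBenchmarkSketch {

    /** Builds a pseudo-random array from a fixed seed so runs are repeatable. */
    static long[] randomArray(int n, long seed) {
        Random rng = new Random(seed);
        long[] a = new long[n];
        for (int i = 0; i < n; i++) {
            a[i] = rng.nextLong();
        }
        return a;
    }

    /** Returns true if the array is in non-decreasing order. */
    static boolean isSorted(long[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[i - 1] > a[i]) {
                return false;
            }
        }
        return true;
    }

    /** Flawed pattern: sorts the same array in place, so runs 2..n see sorted input. */
    static long[] timeSortsWithoutReset(long[] input, int runs) {
        long[] elapsedNanos = new long[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            Arrays.sort(input);
            elapsedNanos[r] = System.nanoTime() - start;
        }
        return elapsedNanos;
    }

    /** Corrected pattern: each run sorts a fresh copy made outside the timed region. */
    static long[] timeSortsWithReset(long[] input, int runs) {
        long[] elapsedNanos = new long[runs];
        for (int r = 0; r < runs; r++) {
            long[] work = Arrays.copyOf(input, input.length); // reset before timing
            long start = System.nanoTime();
            Arrays.sort(work);
            elapsedNanos[r] = System.nanoTime() - start;
        }
        return elapsedNanos;
    }

    public static void main(String[] args) {
        // Much smaller than the 25 million elements in the report, to keep the sketch quick.
        long[] input = randomArray(1000000, 42L);
        for (long t : timeSortsWithReset(input, 5)) {
            System.out.println("with reset:    " + (t / 1000000) + " ms");
        }
        for (long t : timeSortsWithoutReset(input, 5)) {
            System.out.println("without reset: " + (t / 1000000) + " ms");
        }
    }
}
```

The copy is made before the timer starts so that its cost is not charged to the sort. The actual timings depend on the JVM and hardware, so none are shown here.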
was:
The previous benchmark code in `SorterSuite` doesn't reset the array after each run. So we were comparing both algorithms on already ordered arrays. With the correct code, KVArraySortDataFormat is slower than Java's Arrays.sort(). On Java 7, I got the following on arrays of size 25 million:
{code}
uple-sort using Arrays.sort(): Took 25626 ms
Tuple-sort using Arrays.sort(): Took 28018 ms
Tuple-sort using Arrays.sort(): Took 26932 ms
Tuple-sort using Arrays.sort(): Took 24436 ms
Tuple-sort using Arrays.sort(): Took 25894 ms
Tuple-sort using Arrays.sort(): Took 24965 ms
Tuple-sort using Arrays.sort(): Took 23817 ms
Tuple-sort using Arrays.sort(): Took 23692 ms
Tuple-sort using Arrays.sort(): Took 26731 ms
Tuple-sort using Arrays.sort(): Took 23667 ms
Tuple-sort using Arrays.sort(): (36662 ms first try, 25377 ms average)
KV-sort using Sorter: Took 39579 ms
KV-sort using Sorter: Took 39176 ms
KV-sort using Sorter: Took 41760 ms
KV-sort using Sorter: Took 42469 ms
KV-sort using Sorter: Took 43133 ms
KV-sort using Sorter: Took 41692 ms
KV-sort using Sorter: Took 39585 ms
KV-sort using Sorter: Took 41617 ms
KV-sort using Sorter: Took 42300 ms
KV-sort using Sorter: Took 48274 ms
KV-sort using Sorter: (47217 ms first try, 41958 ms average)
{code}
> KVArraySortDataFormat is not as fast as Java's Arrays.sort()
> ------------------------------------------------------------
>
> Key: SPARK-4104
> URL: https://issues.apache.org/jira/browse/SPARK-4104
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 1.1.0, 1.2.0
> Reporter: Xiangrui Meng