Posted to issues@spark.apache.org by "Xiangrui Meng (JIRA)" <ji...@apache.org> on 2014/10/27 23:25:33 UTC

[jira] [Updated] (SPARK-4104) KVArraySortDataFormat is not as fast as Java's Arrays.sort()

     [ https://issues.apache.org/jira/browse/SPARK-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-4104:
---------------------------------
    Description: 
The previous benchmark code in `SorterSuite` didn't reset the array after each run, so both algorithms were being compared on already-sorted arrays. With the corrected code, KVArraySortDataFormat is slower than Java's Arrays.sort(). On Java 7, I got the following results on arrays of 25 million elements:

{code}
Tuple-sort using Arrays.sort(): Took 25626 ms
Tuple-sort using Arrays.sort(): Took 28018 ms
Tuple-sort using Arrays.sort(): Took 26932 ms
Tuple-sort using Arrays.sort(): Took 24436 ms
Tuple-sort using Arrays.sort(): Took 25894 ms
Tuple-sort using Arrays.sort(): Took 24965 ms
Tuple-sort using Arrays.sort(): Took 23817 ms
Tuple-sort using Arrays.sort(): Took 23692 ms
Tuple-sort using Arrays.sort(): Took 26731 ms
Tuple-sort using Arrays.sort(): Took 23667 ms
Tuple-sort using Arrays.sort(): (36662 ms first try, 25377 ms average)
KV-sort using Sorter: Took 39579 ms
KV-sort using Sorter: Took 39176 ms
KV-sort using Sorter: Took 41760 ms
KV-sort using Sorter: Took 42469 ms
KV-sort using Sorter: Took 43133 ms
KV-sort using Sorter: Took 41692 ms
KV-sort using Sorter: Took 39585 ms
KV-sort using Sorter: Took 41617 ms
KV-sort using Sorter: Took 42300 ms
KV-sort using Sorter: Took 48274 ms
KV-sort using Sorter: (47217 ms first try, 41958 ms average)
{code}
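A minimal sketch of the reset pattern the fix calls for, in Java. It is not the actual `SorterSuite` code. `ResetBenchmark` and `timeSorts` are hypothetical names, and the sketch sorts `Integer` objects with `Arrays.sort` rather than tuples or Spark's `Sorter`. The key point is that the unsorted input is copied back into the working array before every timed run, so no run sorts the already-sorted output of the previous one.

```java
import java.util.Arrays;
import java.util.Random;

public class ResetBenchmark {

    // Times `runs` calls to Arrays.sort, each on a fresh copy of `original`.
    // Refreshing the copy means every run sorts the same unsorted data,
    // not the already-sorted output of the previous run.
    static long[] timeSorts(Integer[] original, int runs) {
        long[] elapsedMs = new long[runs];
        Integer[] work = new Integer[original.length];
        for (int r = 0; r < runs; r++) {
            // Reset step: without this copy, runs 2..n would see sorted input,
            // which a TimSort-style sort handles in roughly linear time.
            System.arraycopy(original, 0, work, 0, original.length);
            long start = System.nanoTime();
            Arrays.sort(work); // object sort; uses a TimSort variant by default
            elapsedMs[r] = (System.nanoTime() - start) / 1_000_000;
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Illustrative size only; the report above used 25 million elements.
        int n = 1_000_000;
        Random rand = new Random(42);
        Integer[] original = new Integer[n];
        for (int i = 0; i < n; i++) {
            original[i] = rand.nextInt();
        }
        long[] times = timeSorts(original, 5);
        System.out.println("Per-run times (ms): " + Arrays.toString(times));
    }
}
```

Because `original` is never passed to the sort itself, it stays unsorted across all runs and can be reused to time a second algorithm on identical input.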
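The summary lines can be cross-checked against the ten per-run timings. The listed averages equal the mean of the ten "Took" lines truncated to whole milliseconds, which suggests (an assumption, not stated in the report) that the "first try" figure is a separate warm-up run excluded from the average. On those averages, the Sorter with KVArraySortDataFormat takes roughly 1.65 times as long as Arrays.sort(). `BenchmarkSummary` and `meanMs` are hypothetical names used only for this check.

```java
import java.util.Arrays;

public class BenchmarkSummary {
    // The ten timed runs for each method, copied from the output above (ms).
    static final long[] ARRAYS_SORT_MS = {25626, 28018, 26932, 24436, 25894,
                                          24965, 23817, 23692, 26731, 23667};
    static final long[] KV_SORTER_MS = {39579, 39176, 41760, 42469, 43133,
                                        41692, 39585, 41617, 42300, 48274};

    // Mean in whole milliseconds; integer division truncates the fraction.
    static long meanMs(long[] runs) {
        return Arrays.stream(runs).sum() / runs.length;
    }

    public static void main(String[] args) {
        long tupleAvg = meanMs(ARRAYS_SORT_MS); // 253778 / 10 = 25377
        long kvAvg = meanMs(KV_SORTER_MS);      // 419585 / 10 = 41958
        double slowdown = (double) kvAvg / tupleAvg;
        System.out.printf("Arrays.sort avg: %d ms, Sorter avg: %d ms, ratio: %.2f%n",
                tupleAvg, kvAvg, slowdown);
    }
}
```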

  was:
The previous benchmark code in `SorterSuite` doesn't reset the array after each run. So we were comparing both algorithms on already ordered arrays. With the correct code, KVArraySortDataFormat is slower than Java's Arrays.sort(). On Java 7, I got the following on arrays of size 25 million:

{code}
uple-sort using Arrays.sort(): Took 25626 ms
Tuple-sort using Arrays.sort(): Took 28018 ms
Tuple-sort using Arrays.sort(): Took 26932 ms
Tuple-sort using Arrays.sort(): Took 24436 ms
Tuple-sort using Arrays.sort(): Took 25894 ms
Tuple-sort using Arrays.sort(): Took 24965 ms
Tuple-sort using Arrays.sort(): Took 23817 ms
Tuple-sort using Arrays.sort(): Took 23692 ms
Tuple-sort using Arrays.sort(): Took 26731 ms
Tuple-sort using Arrays.sort(): Took 23667 ms
Tuple-sort using Arrays.sort(): (36662 ms first try, 25377 ms average)
KV-sort using Sorter: Took 39579 ms
KV-sort using Sorter: Took 39176 ms
KV-sort using Sorter: Took 41760 ms
KV-sort using Sorter: Took 42469 ms
KV-sort using Sorter: Took 43133 ms
KV-sort using Sorter: Took 41692 ms
KV-sort using Sorter: Took 39585 ms
KV-sort using Sorter: Took 41617 ms
KV-sort using Sorter: Took 42300 ms
KV-sort using Sorter: Took 48274 ms
KV-sort using Sorter: (47217 ms first try, 41958 ms average)
{code}


> KVArraySortDataFormat is not as fast as Java's Arrays.sort()
> ------------------------------------------------------------
>
>                 Key: SPARK-4104
>                 URL: https://issues.apache.org/jira/browse/SPARK-4104
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.1.0, 1.2.0
>            Reporter: Xiangrui Meng
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
