You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Adrien Grand <jp...@gmail.com> on 2013/05/03 15:26:00 UTC

Commons sorting?

Hi,

I'm working on an abstraction to sort random-access data-structures
for Lucene[1] and would like to know if you think this is interesting
enough for inclusion in Apache Commons.

The reason for such an abstraction is that we often need to sort
parallel arrays and the JDK provides no way to do it easily. This is
why we first borrowed SorterTemplate[2] from CGlib: provided the
definition of some primitive methods (compare and swap), it can sort
any sub-sequence of any random-access data-structure.

We recently faced the need to implement TimSort (LUCENE-4839[3]) for
some use-cases, but unfortunately it requires a different set of
primitive operations to be properly implemented. This is why I started
working on LUCENE-4946[1] to be able to have different primitive
operations depending on the sorting algorithm.

A few more notes about this sorting abstraction:
 - since it only assumes the data supports random-access, it can be
used to sort off-heap or commons-primitives collections,
 - it provides some useful sorting algorithms, in particular
introsort[4] and a modified Timsort[5],
 - performance is comparable to the JDK's Arrays.sort in spite of the
abstraction,
 - it is faster and more memory-efficient than Collections.sort for
random-access lists, since the latter dumps the content of the list
into an array to sort it with Arrays.sort.

What do you think?

[1] https://issues.apache.org/jira/browse/LUCENE-4946
[2] http://grepcode.com/file/repo1.maven.org/maven2/cglib/cglib/2.2.2/net/sf/cglib/util/SorterTemplate.java#SorterTemplate
[3] https://issues.apache.org/jira/browse/LUCENE-4839
[4] http://fr.wikipedia.org/wiki/Introsort
[5] http://en.wikipedia.org/wiki/Timsort

-- 
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: Commons sorting?

Posted by Luc Maisonobe <Lu...@free.fr>.
Le 04/05/2013 14:35, Adrien Grand a écrit :
> Hi Luc,
> 
> On Sat, May 4, 2013 at 1:10 PM, Luc Maisonobe <Lu...@free.fr> wrote:
>> It should be done. Could you check if it works?
> 
> It works! Can someone now add "sorting
> https://svn.apache.org/repos/asf/commons/sandbox/sorting/trunk/" to
> the svn:externals property of commons/trunks-sandbox?

Should be done now.

Luc

> 
> Thank you!
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: Commons sorting?

Posted by Adrien Grand <jp...@gmail.com>.
Hi Luc,

On Sat, May 4, 2013 at 1:10 PM, Luc Maisonobe <Lu...@free.fr> wrote:
> It should be done. Could you check if it works?

It works! Can someone now add "sorting
https://svn.apache.org/repos/asf/commons/sandbox/sorting/trunk/" to
the svn:externals property of commons/trunks-sandbox?

Thank you!

-- 
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: Commons sorting?

Posted by Luc Maisonobe <Lu...@free.fr>.
Le 03/05/2013 17:29, Adrien Grand a écrit :
> Hi Matt,

Hi Adrien,

> 
> On Fri, May 3, 2013 at 4:30 PM, Matt Benson <gu...@gmail.com> wrote:
>> Without putting too much importance on what I, personally, think, know that
>> any ASF committer can have access to hack on a Commons-style library in our
>> sandbox, just by asking for it.  From my personal POV, I'm not sure if what
>> we're talking about sounds like it has enough "surface area" to warrant a
>> separate component, but getting everything in place would allow the
>> community to make the decision of whether a separate component is in order,
>> or if perhaps this code might live comfortably in e.g. Commons [lang].
> 
> Thank you for your reply. Indeed there is little functionality so
> inclusion into an existing component would make sense.
> 
> Following your advice, I propose to get the code into a separate
> sandbox component so that everyone can have a look at the code and
> later decide if this should be abandoned, promoted to an official
> component or merged into an existing component. If that sounds good,
> could someone give me commit access to commons/sandbox (my Apache id
> is "jpountz")?

It should be done. Could you check if it works?

best regards,
Luc

> 
> Regards,
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: Commons sorting?

Posted by Adrien Grand <jp...@gmail.com>.
Hi,

On Fri, May 3, 2013 at 5:29 PM, Adrien Grand <jp...@gmail.com> wrote:
> Following your advice, I propose to get the code into a separate
> sandbox component so that everyone can have a look at the code and
> later decide if this should be abandoned, promoted to an official
> component or merged into an existing component.

I just checked in the code
(http://svn.apache.org/repos/asf/commons/sandbox/sorting/trunk/).
Don't hesitate to tell me what you think and/or how I can help for
further steps.

Regards,

--
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: Commons sorting?

Posted by Adrien Grand <jp...@gmail.com>.
Hi Matt,

On Fri, May 3, 2013 at 4:30 PM, Matt Benson <gu...@gmail.com> wrote:
> Without putting too much importance on what I, personally, think, know that
> any ASF committer can have access to hack on a Commons-style library in our
> sandbox, just by asking for it.  From my personal POV, I'm not sure if what
> we're talking about sounds like it has enough "surface area" to warrant a
> separate component, but getting everything in place would allow the
> community to make the decision of whether a separate component is in order,
> or if perhaps this code might live comfortably in e.g. Commons [lang].

Thank you for your reply. Indeed there is little functionality so
inclusion into an existing component would make sense.

Following your advice, I propose to get the code into a separate
sandbox component so that everyone can have a look at the code and
later decide if this should be abandoned, promoted to an official
component or merged into an existing component. If that sounds good,
could someone give me commit access to commons/sandbox (my Apache id
is "jpountz")?

Regards,

-- 
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: Commons sorting?

Posted by Matt Benson <gu...@gmail.com>.
Without putting too much importance on what I, personally, think, know that
any ASF committer can have access to hack on a Commons-style library in our
sandbox, just by asking for it.  From my personal POV, I'm not sure if what
we're talking about sounds like it has enough "surface area" to warrant a
separate component, but getting everything in place would allow the
community to make the decision of whether a separate component is in order,
or if perhaps this code might live comfortably in e.g. Commons [lang].

Regards,
Matt


On Fri, May 3, 2013 at 8:26 AM, Adrien Grand <jp...@gmail.com> wrote:

> Hi,
>
> I'm working on an abstraction to sort random-access data-structures
> for Lucene[1] and would like to know if you think this is interesting
> enough for inclusion in Apache Commons.
>
> The reason for such an abstraction is that we often need to sort
> parallel arrays and the JDK provides no way to do it easily. This is
> why we first borrowed SorterTemplate[2] from CGlib: provided the
> definition of some primitive methods (compare and swap), it can sort
> any sub-sequence of any random-access data-structure.
>
> We recently faced the need to implement TimSort (LUCENE-4839[3]) for
> some use-cases, but unfortunately it requires a different set of
> primitive operations to be properly implemented. This is why I started
> working on LUCENE-4946[1] to be able to have different primitive
> operations depending on the sorting algorithm.
>
> A few more notes about this sorting abstraction:
>  - since it only assumes the data supports random-access, it can be
> used to sort off-heap or commons-primitives collections,
>  - it provides some useful sorting algorithms, in particular
> introsort[4] and a modified Timsort[5],
>  - performance is comparable to the JDK's Arrays.sort in spite of the
> abstraction,
>  - it is faster and more memory-efficient than Collections.sort for
> random-access lists, since the latter dumps the content of the list
> into an array to sort it with Arrays.sort.
>
> What do you think?
>
> [1] https://issues.apache.org/jira/browse/LUCENE-4946
> [2]
> http://grepcode.com/file/repo1.maven.org/maven2/cglib/cglib/2.2.2/net/sf/cglib/util/SorterTemplate.java#SorterTemplate
> [3] https://issues.apache.org/jira/browse/LUCENE-4839
> [4] http://fr.wikipedia.org/wiki/Introsort
> [5] http://en.wikipedia.org/wiki/Timsort
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>