You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucy.apache.org by "Marvin Humphrey (JIRA)" <ji...@apache.org> on 2010/02/09 02:43:28 UTC
[jira] Updated: (LUCY-99) Merge sort elements of arbitrary width
[ https://issues.apache.org/jira/browse/LUCY-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marvin Humphrey updated LUCY-99:
--------------------------------
Attachment: mergesort_any_width.diff
Specifically, this code using memcpy() and a width argument known to be "4" at
compile time...
{code}
uint8_t *left_ptr = (uint8_t*)left_vptr;
uint8_t *right_ptr = (uint8_t*)right_vptr;
uint8_t *left_limit = left_ptr + left_size * width;
uint8_t *right_limit = right_ptr + right_size * width;
uint8_t *dest = (uint8_t*)vdest;
while (left_ptr < left_limit && right_ptr < right_limit) {
if (compare(context, left_ptr, right_ptr) < 1) {
memcpy(dest, left_ptr, width);
dest += width;
left_ptr += width;
}
else {
memcpy(dest, right_ptr, width);
dest += width;
right_ptr += width;
}
}
{code}
... produces identical assembler to this code using direct assignment and a
four-byte type:
{code}
uint32_t *left_ptr = (uint32_t*)left_vptr;
uint32_t *right_ptr = (uint32_t*)right_vptr;
uint32_t *left_limit = left_ptr + left_size;
uint32_t *right_limit = right_ptr + right_size;
uint32_t *dest = (uint32_t*)vdest;
while (left_ptr < left_limit && right_ptr < right_limit) {
if (compare(context, left_ptr, right_ptr) < 1) {
*dest++ = *left_ptr++;
}
else {
*dest++ = *right_ptr++;
}
}
{code}
This patch uses static inline functions and branching to support optimized
code for sorting 4-byte and 8-byte elements, while an unoptimized branch
supports all other widths:
{code}
/* Dispatch by element size. */
switch (width) {
case 0: THROW(ERR, "Parameter 'width' cannot be 0");
break;
case 4: S_msort4(elems, scratch, 0, num_elems - 1, compare, context);
break;
case 8: S_msort8(elems, scratch, 0, num_elems - 1, compare, context);
break;
default: S_msort_any(elems, scratch, 0, num_elems - 1, compare,
context, width);
break;
}
{code}
We can add more optimized branches as it suits us.
It would be nice to work out a similar trick for our quicksort implementation.
> Merge sort elements of arbitrary width
> --------------------------------------
>
> Key: LUCY-99
> URL: https://issues.apache.org/jira/browse/LUCY-99
> Project: Lucy
> Issue Type: Improvement
> Components: Core - Util
> Reporter: Marvin Humphrey
> Assignee: Marvin Humphrey
> Priority: Minor
> Attachments: mergesort_any_width.diff
>
>
> The present implementation of mergesort in SortUtils only handles four-byte and
> eight-byte elements, because it was intended only for sorting pointers, but it
> would be convenient to make it possible to sort elements of arbitrary width.
> We can add a width argument without sacrificing speed by taking advantage of
> the fact that optimizing compilers such as GCC produce identical assembler for
> invocations of memcpy() where the amount to copy is a small constant known
> at compile time as for direct assigment.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.