You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucy.apache.org by "Marvin Humphrey (JIRA)" <ji...@apache.org> on 2010/02/09 02:43:28 UTC
[jira] Updated: (LUCY-99) Merge sort elements of arbitrary width

     [ https://issues.apache.org/jira/browse/LUCY-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marvin Humphrey updated LUCY-99:
--------------------------------

    Attachment: mergesort_any_width.diff

Specifically, this code using memcpy() and a width argument known to be "4" at
compile time...

{code}
uint8_t *left_ptr    = (uint8_t*)left_vptr;
uint8_t *right_ptr   = (uint8_t*)right_vptr;
uint8_t *left_limit  = left_ptr + left_size * width;
uint8_t *right_limit = right_ptr + right_size * width;
uint8_t *dest        = (uint8_t*)vdest;

while (left_ptr < left_limit && right_ptr < right_limit) {
    if (compare(context, left_ptr, right_ptr) < 1) {
        memcpy(dest, left_ptr, width);
        dest += width;
        left_ptr += width;
    }
    else {
        memcpy(dest, right_ptr, width);
        dest += width;
        right_ptr += width;
    }
}
{code}

... produces identical assembler to this code using direct assignment and a
four-byte type:

{code}
uint32_t *left_ptr    = (uint32_t*)left_vptr;
uint32_t *right_ptr   = (uint32_t*)right_vptr;
uint32_t *left_limit  = left_ptr + left_size;
uint32_t *right_limit = right_ptr + right_size;
uint32_t *dest        = (uint32_t*)vdest;

while (left_ptr < left_limit && right_ptr < right_limit) {
    if (compare(context, left_ptr, right_ptr) < 1) {
        *dest++ = *left_ptr++;
    }
    else {
        *dest++ = *right_ptr++;
    }
}
{code}

This patch uses static inline functions and branching to support optimized
code for sorting 4-byte and 8-byte elements, while an unoptimized branch
supports all other widths:

{code}
/* Dispatch by element size. */
switch (width) {
    case 0:  THROW(ERR, "Parameter 'width' cannot be 0");
             break;
    case 4:  S_msort4(elems, scratch, 0, num_elems - 1, compare, context);
             break;
    case 8:  S_msort8(elems, scratch, 0, num_elems - 1, compare, context);
             break;
    default: S_msort_any(elems, scratch, 0, num_elems - 1, compare, 
                context, width);
             break;
}
{code}

We can add more optimized branches as it suits us.

It would be nice to work out a similar trick for our quicksort implementation.

> Merge sort elements of arbitrary width
> --------------------------------------
>
>                 Key: LUCY-99
>                 URL: https://issues.apache.org/jira/browse/LUCY-99
>             Project: Lucy
>          Issue Type: Improvement
>          Components: Core - Util
>            Reporter: Marvin Humphrey
>            Assignee: Marvin Humphrey
>            Priority: Minor
>         Attachments: mergesort_any_width.diff
>
>
> The present implementation of mergesort in SortUtils only handles four-byte and
> eight-byte elements, because it was intended only for sorting pointers, but it
> would be convenient to make it possible to sort elements of arbitrary width.
> We can add a width argument without sacrificing speed by taking advantage of
> the fact that optimizing compilers such as GCC produce identical assembler for 
> invocations of memcpy() where the amount to copy is a small constant known 
> at compile time as for direct assigment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.