Posted to dev@lucene.apache.org by "Toke Eskildsen (JIRA)" <ji...@apache.org> on 2016/11/23 09:03:59 UTC
[jira] [Commented] (LUCENE-7521) Simplify PackedInts

    [ https://issues.apache.org/jira/browse/LUCENE-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15689447#comment-15689447 ] 

Toke Eskildsen commented on LUCENE-7521:
----------------------------------------

I was involved in the original PackedInts implementation, where I did quite a bit of performance testing of the two different approaches: Optimal memory packing (Packed64) and word-aligned packing (Packed64SingleBlock). They were named differently back then, but the principles and the performance-relevant code parts were about the same. The JIRA is LUCENE-1990. The conclusion then was that aligned won in a few cases but added quite a lot of complexity, so it was scrapped.

Two years later the aligned version was re-introduced in LUCENE-4062. Again there was some performance testing. Performance characteristics differed depending on CPU architecture and in-memory array size (cache utilization really). Overall it seemed that aligned packing was faster, but not by much on the i7 (desktop & Xeon).

One important observation from the JIRA is that only the BPVs (Bits Per Value) 3, 5, 6, 7, 9, 10, 12 and 21 differ in representation (and get/set algorithm) between packed and aligned. There are some poor graphs from an old comparison of those values on http://ekot.dk/misc/packedints/padding.html where contiguous=packed and padding=aligned. This was for a small (10M values, AFAIR) set. Note how the performance difference between the implementations varies a lot, depending on CPU type.
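
To make the difference between the two layouts concrete, here is a minimal sketch of get/set for both. It is not the Lucene code: the class name PackingSketch, the method names and the low-bits-first ordering are assumptions made for illustration, and Packed64 / Packed64SingleBlock order and mask their bits differently. It assumes 0 < bpv < 64 and values that fit in bpv bits. The point is only that packed values may straddle two longs, while aligned values never do but leave 64 % bpv bits of each long unused.

```java
class PackingSketch {
    // Number of longs needed for n values with contiguous ("optimal") packing.
    static int packedBlocks(int n, int bpv) {
        return (int) (((long) n * bpv + 63) / 64);
    }

    // Number of longs needed for n values with word-aligned packing.
    static int alignedBlocks(int n, int bpv) {
        int perBlock = 64 / bpv;
        return (n + perBlock - 1) / perBlock;
    }

    // The two layouts are only distinct when values do not fill a long exactly.
    static boolean layoutsDiffer(int bpv) {
        return 64 % bpv != 0;
    }

    // Packed: value i starts at bit i * bpv and may spill into the next long.
    static void setPacked(long[] blocks, int bpv, int index, long value) {
        long mask = (1L << bpv) - 1;
        long bitPos = (long) index * bpv;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        int spill = shift + bpv - 64; // bits that did not fit in this long
        if (spill > 0) {
            long highMask = (1L << spill) - 1;
            blocks[block + 1] = (blocks[block + 1] & ~highMask) | (value >>> (bpv - spill));
        }
    }

    static long getPacked(long[] blocks, int bpv, int index) {
        long mask = (1L << bpv) - 1;
        long bitPos = (long) index * bpv;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bpv - 64;
        if (spill > 0) {
            value |= blocks[block + 1] << (bpv - spill); // fetch the spilled high bits
        }
        return value & mask;
    }

    // Aligned: each long holds 64 / bpv whole values, the remaining bits are padding.
    static void setAligned(long[] blocks, int bpv, int index, long value) {
        int perBlock = 64 / bpv;
        int block = index / perBlock;
        int shift = (index % perBlock) * bpv;
        long mask = (1L << bpv) - 1;
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
    }

    static long getAligned(long[] blocks, int bpv, int index) {
        int perBlock = 64 / bpv;
        int block = index / perBlock;
        int shift = (index % perBlock) * bpv;
        return (blocks[block] >>> shift) & ((1L << bpv) - 1);
    }

    public static void main(String[] args) {
        // Padding bits per long for the BPVs where the representations differ.
        for (int bpv : new int[] {3, 5, 6, 7, 9, 10, 12, 21}) {
            System.out.println("bpv=" + bpv + " padding bits per long=" + (64 % bpv));
        }
    }
}
```

For BPVs that divide 64 evenly (1, 2, 4, 8, 16, 32) the two index computations land on the same bits, which is why only the BPVs listed above need a separate code path. The aligned getter has no branch and touches one long; the packed getter sometimes touches two, which is roughly where the measured differences come from.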

Long story longer, I still favour having only 1 underlying format ("optimal" packed): Too little gain in too few cases for a high code complexity cost with aligned. On a related note, a high-quality micro-benchmark for structures like these would be great.

> Simplify PackedInts
> -------------------
>
>                 Key: LUCENE-7521
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7521
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7521.patch
>
>
> We have a lot of specialization in PackedInts about how to keep packed arrays of longs in memory. However, most use-cases have slowly moved to DirectWriter and DirectMonotonicWriter and most specializations we have are barely used for performance-sensitive operations, so I'd like to clean this up a bit.


