Posted to issues@lucene.apache.org by "Greg Miller (Jira)" <ji...@apache.org> on 2021/08/14 01:24:00 UTC

[jira] [Commented] (LUCENE-10033) Encode doc values in smaller blocks of values, like postings

    [ https://issues.apache.org/jira/browse/LUCENE-10033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17399003#comment-17399003 ] 

Greg Miller commented on LUCENE-10033:
--------------------------------------

I got swamped with some other work over the last couple of weeks and wasn't able to try benchmarking the more recent changes. I think I should be able to do this early next week though. Just pinging here so you know I haven't forgotten about this :) 

> Encode doc values in smaller blocks of values, like postings
> ------------------------------------------------------------
>
>                 Key: LUCENE-10033
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10033
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> This is a follow-up to the discussion on this thread: https://lists.apache.org/thread.html/r7b757074d5f02874ce3a295b0007dff486bc10d08fb0b5e5a4ba72c5%40%3Cdev.lucene.apache.org%3E.
> Our current approach for doc values uses large blocks of 16k values where values can be decompressed independently, using DirectWriter/DirectReader. This is a bit inefficient in some cases, e.g. a single outlier can grow the number of bits per value for the entire block, we can't easily use run-length compression, etc. Plus, it encourages using a different sub-class for every compression technique, which puts pressure on the JVM.
> We'd like to move to an approach more similar to postings, with smaller blocks (e.g. 128 values) whose values all get decompressed at once (using SIMD instructions), with skip data within blocks in order to efficiently skip to arbitrary doc IDs (or maybe still use jump tables, as today's doc values do, and as discussed here for postings: https://lists.apache.org/thread.html/r7c3cb7ab143fd4ecbc05c04064d10ef9fb50c5b4d6479b0f35732677%40%3Cdev.lucene.apache.org%3E).
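
To make the outlier argument in the description concrete, here is a rough sketch of the arithmetic. It is not Lucene's DirectWriter and not the encoding proposed in the issue: the helper names and the sample data are made up for illustration, and it assumes each block is bit-packed at the width of its largest value while ignoring per-block headers, skip data, and jump tables.

```python
def bits_required(max_value: int) -> int:
    """Bits needed to represent any value in [0, max_value] (at least 1)."""
    return max(1, max_value.bit_length())


def packed_bits(values, block_size):
    """Total payload bits when every block uses the bit width of its largest value.

    Hypothetical helper: per-block metadata is deliberately not counted.
    """
    total = 0
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        total += bits_required(max(block)) * len(block)
    return total


# Made-up data: 16,384 small values (all < 16) plus one large outlier.
values = [i % 16 for i in range(16384)]
values[5000] = 1_000_000

large = packed_bits(values, 16384)  # a single 16k-value block
small = packed_bits(values, 128)    # 128 blocks of 128 values each

print(f"16k block:  {large} bits")
print(f"128 blocks: {small} bits")
```

With this data the outlier forces 20 bits per value across the whole 16k block, while with 128-value blocks only the one block holding the outlier pays that cost and the rest stay at 4 bits per value. Real savings would be smaller once per-block metadata is accounted for.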



