Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2015/11/02 18:46:27 UTC

[jira] [Updated] (LUCENE-6863) Store sparse doc values more efficiently

     [ https://issues.apache.org/jira/browse/LUCENE-6863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-6863:
---------------------------------
    Attachment: LUCENE-6863.patch

Updated patch that:
 - makes the code a bit more readable and adds comments
 - avoids loading a slice for the values when only the set of docs that have the field is requested
 - saves some monotonic lookups

Here are updated benchmark results, computed exactly the same way as above (still with a threshold of 5% for benchmarking purposes, even though the patch itself keeps a threshold of 1%). The slowdown is now a bit more contained. Times are in ms.

||Field||sort performance on a MatchAllDocsQuery||sort performance on a term query that matches 10% of docs||sort performance on a term query that matches 1% of docs||sort performance on a term query that matches docs that have the field||
|cc2|128→99 ({color:green}-23%{color})|21.8→23.8 (+9%)|2.92→4.33 ({color:red}+48%{color})|6.84→13.0 ({color:red}+90%{color})|
|admin4|121→98 ({color:green}-19%{color})|21.4→21.1 (-1%)|3.65→2.81 ({color:green}-23%{color})|8.36→16.6 ({color:red}+98%{color})|
|admin3|116→125 (+1%)|20.6→20.0 (-3%)|3.20→3.24 (+1%)|18.9→19.4 (+8%)|
|admin2|124→132 (+6%)|21.5→20.6 (-4%)|3.30→3.49 (+6%)|8.58→8.64 (+1%)|

I think the change is good to go, but I know this can be controversial. Please let me know if you have concerns, otherwise I plan to commit it by the end of the week.

> Store sparse doc values more efficiently
> ----------------------------------------
>
>                 Key: LUCENE-6863
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6863
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>         Attachments: LUCENE-6863.patch, LUCENE-6863.patch, LUCENE-6863.patch
>
>
> For both NUMERIC fields and ordinals of SORTED fields, we store data in a dense way. As a consequence, if you have only 1000 documents out of 1B that have a value, and 8 bits are required to store those 1000 numbers, we will not require 1KB of storage, but 1GB.
> I suspect this mostly happens in abuse cases, but still it's a pity that we explode storage requirements. We could try to detect sparsity and compress accordingly.
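
The arithmetic in the issue description can be checked with a small sketch. This is only an illustration, not code from the patch: the class name `SparseStorageEstimate` and its methods are hypothetical, and `isSparse` is an assumption about how a density threshold such as the 1% mentioned above might be applied.

```java
// Hypothetical helper, not part of Lucene: estimates storage for dense vs. sparse layouts.
final class SparseStorageEstimate {

    private SparseStorageEstimate() {}

    /** Bytes needed when every document gets a slot, whether or not it has a value. */
    static long denseBytes(long maxDoc, int bitsPerValue) {
        return (maxDoc * bitsPerValue + 7) / 8; // round up to whole bytes
    }

    /** Bytes needed for the values alone when only documents with a value get a slot. */
    static long sparseValueBytes(long docsWithValue, int bitsPerValue) {
        return (docsWithValue * bitsPerValue + 7) / 8; // round up to whole bytes
    }

    /** Assumed rule: a field counts as sparse when the fraction of docs with a value is below the threshold. */
    static boolean isSparse(long docsWithValue, long maxDoc, double threshold) {
        return maxDoc > 0 && (double) docsWithValue / maxDoc < threshold;
    }

    public static void main(String[] args) {
        long maxDoc = 1_000_000_000L; // 1B documents in the index
        long docsWithValue = 1_000L;  // only 1000 of them have a value
        int bitsPerValue = 8;

        System.out.println("dense bytes:  " + denseBytes(maxDoc, bitsPerValue));
        System.out.println("sparse bytes: " + sparseValueBytes(docsWithValue, bitsPerValue));
        System.out.println("sparse at 1%? " + isSparse(docsWithValue, maxDoc, 0.01));
    }
}
```

With the numbers from the description, the dense layout needs 1,000,000,000 bytes (about 1GB) while the values alone fit in 1,000 bytes (about 1KB). A real sparse encoding also has to record which documents have a value, so its actual footprint would be somewhat larger than this lower bound.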


