Posted to issues@lucene.apache.org by "Lu Xugang (Jira)" <ji...@apache.org> on 2021/05/14 10:25:00 UTC

[jira] [Comment Edited] (LUCENE-9957) Use DirectMonotonicWriter to store sorted Values in NumericDocValues/SortedNumericDocValues

    [ https://issues.apache.org/jira/browse/LUCENE-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17344510#comment-17344510 ] 

Lu Xugang edited comment on LUCENE-9957 at 5/14/21, 10:24 AM:
--------------------------------------------------------------

The method Lucene90DocValuesConsumer#writeValues(FieldInfo field, DocValuesProducer valuesProducer) already visits every value, so during that pass we can also check whether all values are sorted. If they are, then once the docIds have been written we write all values with DirectMonotonicWriter and return.

Two conditions have to be met:
 # all values are monotonically increasing
 # numDocsWithValue == numValue

numDocsWithValue == numValue means the field is either a NumericDocValues field or a SortedNumericDocValues field in which every document has exactly one value.
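
As a rough illustration of the two conditions, here is a minimal standalone sketch in plain Java. It does not use the Lucene API: the class name, the method name and the long[][] input (one array of values per document, in docId order) are hypothetical stand-ins for the iteration that writeValues already performs. It also assumes that "monotonically increasing" allows equal neighbours (non-decreasing), which is what the UniqueValues < 256 runs suggest.

```java
public class SortedValuesCheck {

    /**
     * Returns true when both conditions hold:
     * (1) the values, read in docId order, never decrease, and
     * (2) numDocsWithValue == numValues, i.e. no document has more than one value.
     *
     * valuesPerDoc is a hypothetical input: element i holds the values of
     * document i, and an empty array means the document has no value.
     */
    static boolean canUseMonotonicEncoding(long[][] valuesPerDoc) {
        long numDocsWithValue = 0;
        long numValues = 0;
        boolean sorted = true;
        long previous = Long.MIN_VALUE;

        for (long[] docValues : valuesPerDoc) {
            if (docValues.length == 0) {
                continue; // document without a value
            }
            numDocsWithValue++;
            for (long v : docValues) {
                numValues++;
                if (v < previous) {
                    sorted = false; // order broken, monotonic encoding not applicable
                }
                previous = v;
            }
        }
        return sorted && numDocsWithValue == numValues;
    }

    public static void main(String[] args) {
        long[][] sortedSingle = {{1}, {}, {2}, {2}, {7}};
        long[][] unsorted = {{3}, {1}};
        long[][] multiValued = {{1, 2}, {3}};
        System.out.println(canUseMonotonicEncoding(sortedSingle));
        System.out.println(canUseMonotonicEncoding(unsorted));
        System.out.println(canUseMonotonicEncoding(multiValued));
    }
}
```

In the actual change the check would presumably be folded into the existing pass over the values rather than run as a separate loop, with the values handed to DirectMonotonicWriter only when it succeeds.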

I ran some simple tests: indexing 10 million documents into a single segment, then measuring only the length of the *.dvd file.

UniqueValues >= 256:
||Loop||Branch(Main)||Branch(PR)||Storage||UniqueValues|| ||
|0|20014826B|14129376B|-29.405%|6321130| |
|1|20011970B|13768928B|-31.196%|6322006| |
|2|20014826B|14145670B|-29.324%|6321066| |
|3|20014826B|14031072B|-29.896%|6319892| |
|4|20014826B|14276230B|-28.671%|6321111| |
|5|20014826B|13998304B|-30.060%|6320938| |
|6|20014826B|13932768B|-30.387%|6320997| |
|7|20014826B|13801696B|-31.042%|6321756| |
|8|20014826B|13768928B|-31.206%|6322336| |
|9|20014826B|14260448B|-28.750%|6321014| |

 
UniqueValues < 256:
||Loop||Branch(Main)||Branch(PR)||Storage||UniqueValues|| ||
|0|2500076B|66064B|-97.35%|2| |
|1|2500076B|66064B|-97.35%|2| |
|2|5000076B|82454B|-98.35%|4| |
|3|5000076B|82454B|-98.35%|4| |
|4|5000076B|115234B|-97.69%|8| |
|5|5000076B|115234B|-97.69%|8| |
|6|10000076B|180794B|-98.19%|16| |
|7|10000076B|180794B|-98.19%|16| |
|8|10000076B|311914B|-96.88%|32| |
|9|10000076B|311914B|-96.88%|32| |
|10|10000076B|574154B|-94.25%|64| |
|11|10000076B|574154B|-94.25%|64| |
|12|10000076B|1098634B|-89.01%|128| |
|13|10000076B|1098634B|-89.01%|128| |
|14|10000076B|1303509B|-86.96%|255| |
|15|10000076B|1303509B|-86.96%|255|
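
The Storage column in the tables above is consistent with the relative change in *.dvd size from Branch(Main) to Branch(PR). The sketch below shows one way such a figure could be computed. The class and method names are hypothetical, and summing the *.dvd files of an index directory is an assumption about how the measurement was taken, not a description of the actual test harness.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class DvdSizeComparison {

    /** Sums the sizes, in bytes, of all *.dvd files directly inside indexDir. */
    static long totalDvdBytes(Path indexDir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, "*.dvd")) {
            for (Path file : files) {
                total += Files.size(file);
            }
        }
        return total;
    }

    /** Relative size change in percent; negative means the PR branch is smaller. */
    static double storageChangePercent(long mainBytes, long prBytes) {
        return (prBytes - mainBytes) * 100.0 / mainBytes;
    }

    public static void main(String[] args) {
        // Loop 0 of the UniqueValues >= 256 table: 20014826B on main, 14129376B on the PR branch.
        System.out.println(storageChangePercent(20014826L, 14129376L));
    }
}
```

For loop 0 of the first table this gives roughly -29.405, matching the reported value.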


was (Author: chrislu):
I did some simple tests: indexing 10 million documents into one segment.

UniqueValues >= 256:
||Loop||Branch(Main)||Branch(PR)||Storage||UniqueValues|| ||
|0|20014826B|14129376B|-29.405%|6321130| |
|1|20011970B|13768928B|-31.196%|6322006| |
|2|20014826B|14145670B|-29.324%|6321066| |
|3|20014826B|14031072B|-29.896%|6319892| |
|4|20014826B|14276230B|-28.671%|6321111| |
|5|20014826B|13998304B|-30.060%|6320938| |
|6|20014826B|13932768B|-30.387%|6320997| |
|7|20014826B|13801696B|-31.042%|6321756| |
|8|20014826B|13768928B|-31.206%|6322336| |
|9|20014826B|14260448B|-28.750%|6321014| |
 
UniqueValues < 256:
||Loop||Branch(Main)||Branch(PR)||Storage||UniqueValues|| ||
|0|2500076B|66064B|-97.35%|2| |
|1|2500076B|66064B|-97.35%|2| |
|2|5000076B|82454B|-98.35%|4| |
|3|5000076B|82454B|-98.35%|4| |
|4|5000076B|115234B|-97.69%|8| |
|5|5000076B|115234B|-97.69%|8| |
|6|10000076B|180794B|-98.19%|16| |
|7|10000076B|180794B|-98.19%|16| |
|8|10000076B|311914B|-96.88%|32| |
|9|10000076B|311914B|-96.88%|32| |
|10|10000076B|574154B|-94.25%|64| |
|11|10000076B|574154B|-94.25%|64| |
|12|10000076B|1098634B|-89.01%|128| |
|13|10000076B|1098634B|-89.01%|128| |
|14|10000076B|1303509B|-86.96%|255| |
|15|10000076B|1303509B|-86.96%|255|

> Use DirectMonotonicWriter to store sorted Values in NumericDocValues/SortedNumericDocValues
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-9957
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9957
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>    Affects Versions: 8.8.2
>            Reporter: Lu Xugang
>            Priority: Major
>
> When all values are sorted, storing them with DirectMonotonicWriter can achieve relatively impressive compression.


