Posted to issues@lucene.apache.org by "wuda (Jira)" <ji...@apache.org> on 2021/08/24 04:55:00 UTC
[jira] [Commented] (LUCENE-10035) Simple text codec add multi level skip list data

    [ https://issues.apache.org/jira/browse/LUCENE-10035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403521#comment-17403521 ] 

wuda commented on LUCENE-10035:
-------------------------------

GitHub has the latest patch, and there seems to be no need to maintain it in two places, so I deleted the patch file.

> Simple text codec add  multi level skip list data 
> --------------------------------------------------
>
>                 Key: LUCENE-10035
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10035
>             Project: Lucene - Core
>          Issue Type: Wish
>          Components: core/codecs
>    Affects Versions: main (9.0)
>            Reporter: wuda
>            Priority: Major
>              Labels: Impact, MultiLevelSkipList, SimpleTextCodec
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Add skip list data (including impacts) to the SimpleText codec to help people understand the index format. This is for debugging, curiosity, and transparency only! When a term's docFreq is greater than or equal to SimpleTextSkipWriter.BLOCK_SIZE (default value 8), the SimpleText codec writes a skip list, and the *.pst (simple text term dictionary file)* file looks like this:
> {code:java}
> field title
>   term args
>     doc 2
>       freq 2
>       pos 7
>       pos 10
>     ## other docs omitted for readability ......
>     doc 98
>       freq 2
>       pos 2
>       pos 6
>     skipList 
> ?
>       level 1
>         skipDoc 65
>         skipDocFP 949
>         impacts 
>           impact 
>             freq 1
>             norm 2
>           impact 
>             freq 2
>             norm 12
>           impact 
>             freq 3
>             norm 13
>         impacts_end 
> ?
>       level 0
>         skipDoc 17
>         skipDocFP 284
>         impacts 
>           impact 
>             freq 1
>             norm 2
>           impact 
>             freq 2
>             norm 12
>         impacts_end         
>         skipDoc 34
>         skipDocFP 624
>         impacts 
>           impact 
>             freq 1
>             norm 2
>           impact 
>             freq 2
>             norm 12
>           impact 
>             freq 3
>             norm 14
>         impacts_end         
>         skipDoc 65
>         skipDocFP 949
>         impacts 
>           impact 
>             freq 1
>             norm 2
>           impact 
>             freq 2
>             norm 12
>           impact 
>             freq 3
>             norm 13
>         impacts_end         
>         skipDoc 90
>         skipDocFP 1311
>         impacts 
>           impact 
>             freq 1
>             norm 2
>           impact 
>             freq 2
>             norm 10
>           impact 
>             freq 3
>             norm 13
>           impact 
>             freq 4
>             norm 14
>         impacts_end 
> END
> checksum 00000000000829315543
> {code}
> Compared with the previous format, we add *skipList, level, skipDoc, skipDocFP, impacts, impact, freq, norm* nodes. At the same time, the SimpleText codec can now support advanceShallow at search time.
>  
> h2. Why is there a question mark symbol in the file?
> Because *MultiLevelSkipListWriter* writes "length" and "childPointer" as VLongs, a binary encoding rather than text, so those bytes show up as "?" in the file.
> h1. Does this speed up the search process?
> No! The codec can use advanceShallow at search time, so why doesn't it speed things up yet? Because the skip list data is written after the docs (see the file above), a reader must iterate over all docs before it can read the skip list data, so search time never improves. There is no "skipOffset" pointing directly at the skip list data, but as mentioned before, this is for debugging, curiosity, and transparency only! If this turns out to be a problem, I may add a "skipOffset" next time so the skip list data can be read directly.
>  
>  
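The level 0 and level 1 entries in the sample follow the usual multi-level skip list pattern: each complete block of BLOCK_SIZE docs yields a level 0 entry, and every skipMultiplier-th entry on one level is also written to the next level up. The Java sketch below is modeled on the level computation in Lucene's MultiLevelSkipListWriter.bufferSkip as I understand it; it is not SimpleText code. The class and method names are hypothetical, skipMultiplier = 3 is an assumption chosen only because it reproduces the sample's shape (four level 0 entries, with the third one, skipDoc 65, repeated on level 1), and it assumes one skip point per complete block.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not SimpleText or Lucene code: decides how many skip
// levels receive an entry at each skip point, modeled on the level loop in
// Lucene's MultiLevelSkipListWriter.bufferSkip.
final class SkipLevelSketch {

  /** Levels that get an entry once {@code df} docs (a multiple of skipInterval) are written. */
  static int levelsForSkip(int df, int skipInterval, int skipMultiplier, int maxLevels) {
    if (df <= 0 || df % skipInterval != 0) {
      throw new IllegalArgumentException("df must be a positive multiple of skipInterval");
    }
    int numLevels = 1;          // every skip point has a level 0 entry
    int n = df / skipInterval;  // 1-based index of this skip point on level 0
    // Promote one level for each factor of skipMultiplier, up to maxLevels.
    while (n % skipMultiplier == 0 && numLevels < maxLevels) {
      numLevels++;
      n /= skipMultiplier;
    }
    return numLevels;
  }

  /** Entries per level, assuming one skip point per complete block of skipInterval docs. */
  static List<Integer> entriesPerLevel(int docFreq, int skipInterval, int skipMultiplier, int maxLevels) {
    List<Integer> counts = new ArrayList<>();
    for (int df = skipInterval; df <= docFreq; df += skipInterval) {
      int levels = levelsForSkip(df, skipInterval, skipMultiplier, maxLevels);
      for (int level = 0; level < levels; level++) {
        if (counts.size() <= level) {
          counts.add(0);
        }
        counts.set(level, counts.get(level) + 1);
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    // Assumed parameters: skipInterval 8 (the BLOCK_SIZE default), skipMultiplier 3.
    System.out.println(entriesPerLevel(32, 8, 3, 10)); // prints [4, 1]
  }
}
```

The maxLevels cap stands in for the writer's fixed number of levels; with a larger docFreq, the ninth skip point (72 docs) would reach a third level under these assumptions.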

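Reading the sample, the impacts under level 1 (skipDoc 65) appear to be the competitive subset of the impacts from the three level 0 entries up to doc 65. A (freq, norm) pair is kept only if no other pair has a freq at least as high and a norm at least as low, since, assuming a score that never decreases with freq and never increases with norm, such a pair can never score higher. This is similar in spirit to Lucene's CompetitiveImpactAccumulator, but the sketch below is a simplified, hypothetical stand-in rather than Lucene code, and it assumes non-negative norms compared as plain numbers. Fed the three level 0 impact lists from the sample, it reproduces the level 1 list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not Lucene code: keeps the competitive (freq, norm)
// pairs from several impact lists. Assumes norms are non-negative.
final class ImpactMergeSketch {

  /** A (freq, norm) pair, as in the impact nodes of the .pst sample. */
  static final class Impact {
    final int freq;
    final long norm;

    Impact(int freq, long norm) {
      this.freq = freq;
      this.norm = norm;
    }

    @Override
    public String toString() {
      return "(" + freq + "," + norm + ")";
    }
  }

  /** Drops every pair for which another pair has freq >= and norm <= (duplicates collapse). */
  static List<Impact> competitive(List<Impact> all) {
    List<Impact> sorted = new ArrayList<>(all);
    // Highest freq first; among equal freqs, lowest norm first.
    sorted.sort(Comparator.comparingInt((Impact i) -> i.freq).reversed()
        .thenComparingLong(i -> i.norm));
    List<Impact> kept = new ArrayList<>();
    long minNorm = Long.MAX_VALUE;
    for (Impact i : sorted) {
      // All earlier pairs have freq >= i.freq, so i survives only with a strictly lower norm.
      if (i.norm < minNorm) {
        kept.add(i);
        minNorm = i.norm;
      }
    }
    kept.sort(Comparator.comparingInt((Impact i) -> i.freq)); // ascending freq, like the sample
    return kept;
  }

  public static void main(String[] args) {
    List<Impact> merged = new ArrayList<>();
    // Level 0 impacts for skipDoc 17, 34 and 65 in the sample.
    merged.addAll(List.of(new Impact(1, 2), new Impact(2, 12)));
    merged.addAll(List.of(new Impact(1, 2), new Impact(2, 12), new Impact(3, 14)));
    merged.addAll(List.of(new Impact(1, 2), new Impact(2, 12), new Impact(3, 13)));
    System.out.println(competitive(merged)); // prints [(1,2), (2,12), (3,13)]
  }
}
```

Note how (3,14) from the skipDoc 34 entry drops out because (3,13) has the same freq and a lower norm.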

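The question marks come from bytes that are not printable text. The sketch below encodes a value with the VLong layout produced by Lucene's DataOutput.writeVLong (7 bits per byte, lowest bits first, high bit set when more bytes follow) and flags which bytes fall outside printable ASCII; text viewers commonly render such bytes as "?" or a replacement glyph. The class name and the example value 300 are illustrative only, not taken from the patch.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration, not Lucene code: the VLong byte layout that
// DataOutput.writeVLong produces, plus a check for printable ASCII.
final class VLongSketch {

  /** 7 bits per byte, lowest bits first; the high bit marks that more bytes follow. */
  static byte[] encodeVLong(long value) {
    if (value < 0) {
      throw new IllegalArgumentException("VLong values must be non-negative: " + value);
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((value & ~0x7FL) != 0L) {
      out.write((int) ((value & 0x7FL) | 0x80L)); // low 7 bits plus continuation bit
      value >>>= 7;
    }
    out.write((int) value); // final byte, high bit clear
    return out.toByteArray();
  }

  /** True for printable ASCII (space through '~'). */
  static boolean isPrintableAscii(byte b) {
    int u = b & 0xFF;
    return u >= 0x20 && u <= 0x7E;
  }

  public static void main(String[] args) {
    for (byte b : encodeVLong(300)) {
      System.out.printf("0x%02X printable=%b%n", b & 0xFF, isPrintableAscii(b));
    }
    // prints:
    // 0xAC printable=false
    // 0x02 printable=false
  }
}
```

Small values can happen to land in the printable range (65 encodes to the single byte 0x41, "A"), so whether a given length or child pointer shows up as "?" depends on its value.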

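A "skipOffset" matters because skip data only helps once a reader has the entries in hand without scanning every doc first. As a hedged sketch of what it could do next: given the level 0 skipDoc / skipDocFP pairs from the sample, binary-search for the last entry whose skipDoc is below the target and resume reading at its skipDocFP. This assumes skipDoc is the last doc covered by an entry and skipDocFP is the file position where the following doc starts. SkipEntry, seekPointer and startFP are hypothetical names, and Lucene's own skip reader walks the levels from the top down rather than binary-searching a single level.

```java
import java.util.List;

// Hypothetical sketch, not SimpleText code: pick the file pointer to resume
// from for a target doc, using one level of skip entries.
final class SkipSeekSketch {

  /** One level 0 entry, mirroring the skipDoc / skipDocFP pairs in the .pst sample. */
  static final class SkipEntry {
    final int skipDoc;
    final long skipDocFP;

    SkipEntry(int skipDoc, long skipDocFP) {
      this.skipDoc = skipDoc;
      this.skipDocFP = skipDocFP;
    }
  }

  /**
   * Returns where to resume reading to find the first doc >= target, or startFP
   * when no entry can be skipped past. Entries must be sorted by skipDoc.
   */
  static long seekPointer(List<SkipEntry> entries, int target, long startFP) {
    int lo = 0;
    int hi = entries.size() - 1;
    int best = -1;
    // Binary search for the last entry whose skipDoc < target.
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (entries.get(mid).skipDoc < target) {
        best = mid;
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    return best < 0 ? startFP : entries.get(best).skipDocFP;
  }

  public static void main(String[] args) {
    List<SkipEntry> level0 = List.of(
        new SkipEntry(17, 284), new SkipEntry(34, 624),
        new SkipEntry(65, 949), new SkipEntry(90, 1311));
    System.out.println(seekPointer(level0, 70, 0L)); // prints 949
  }
}
```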
--
This message was sent by Atlassian Jira
(v8.3.4#803005)