Posted to dev@lucene.apache.org by "Eks Dev (JIRA)" <ji...@apache.org> on 2008/07/20 13:02:31 UTC
[jira] Commented: (LUCENE-1278) Add optional storing of document numbers in term dictionary

    [ https://issues.apache.org/jira/browse/LUCENE-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615077#action_12615077 ] 

Eks Dev commented on LUCENE-1278:
---------------------------------

In light of Mike's comments here (Michael McCandless - 05/May/08 05:33 AM), I think it is worth mentioning that I am working on LUCENE-1340, which stores postings without the additional frq info.

Correct me if I am wrong, but the only difference is that the approach with *.frq needs one more seek. At the same time, storing document numbers in the term dict could potentially increase its size, so we lose some locality.

Your last proposal sounds interesting: "inline short postings" into the term dict, so that for short postings (about the size of the offset pointer into *.frq) with tf==1 (which is always the case if you use omitTf(true) from LUCENE-1340) we spare one seek(). This could be a lot. Also, there is no need to store those postings in *.frq (this complicates maintenance, I guess).
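
To make the "inline short postings" idea concrete, here is a rough sketch of what a term dict entry could look like. It is not code from any of the patches: the class TermEntrySketch, its fields, and the threshold MAX_INLINE_DOCS are all hypothetical names, and the rule "inline when the list holds a single doc" is only an assumed stand-in for "about the size of the offset pointer into *.frq".

```java
// Hypothetical sketch; none of these names exist in Lucene.
final class TermEntrySketch {
    // Assumed threshold standing in for "about the size of the *.frq pointer".
    static final int MAX_INLINE_DOCS = 1;

    final String term;
    final int[] inlineDocs;  // non-null when the postings live in the term dict
    final long freqPointer;  // offset into *.frq, or -1 when the postings are inlined

    private TermEntrySketch(String term, int[] inlineDocs, long freqPointer) {
        this.term = term;
        this.inlineDocs = inlineDocs;
        this.freqPointer = freqPointer;
    }

    // Builds an entry for a term whose postings all have tf == 1.
    // Short lists are inlined; longer ones keep only a pointer into *.frq.
    static TermEntrySketch of(String term, int[] docs, long nextFreqOffset) {
        if (docs.length <= MAX_INLINE_DOCS) {
            return new TermEntrySketch(term, docs.clone(), -1L);
        }
        return new TermEntrySketch(term, null, nextFreqOffset);
    }

    boolean isInlined() {
        return inlineDocs != null;
    }

    // Extra seeks needed after the term dict entry has been read.
    int extraSeeksToReadDocs() {
        return isInlined() ? 0 : 1;
    }
}
```

With this layout, a term that occurs in a single document costs no seek into *.frq, while longer posting lists behave as they do today.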

> Add optional storing of document numbers in term dictionary
> -----------------------------------------------------------
>
>                 Key: LUCENE-1278
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1278
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.3.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: lucene.1278.5.4.2008.patch, lucene.1278.5.5.2008.2.patch, lucene.1278.5.5.2008.patch, lucene.1278.5.7.2008.patch, lucene.1278.5.7.2008.test.patch, TestTermEnumDocs.java
>
>
> Add optional storing of document numbers in term dictionary.  String index field cache and range filter creation will be faster.  
> Example read code:
> {noformat}
> TermEnum termEnum = indexReader.terms(TermEnum.LOAD_DOCS);
> do {
>   Term term = termEnum.term();
>   if (term == null || term.field() != field) break;
>   int[] docs = termEnum.docs();
> } while (termEnum.next());
> {noformat}
> Example write code:
> {noformat}
> Document document = new Document();
> document.add(new Field("tag", "dog", Field.Store.YES, Field.Index.UN_TOKENIZED, Field.Term.STORE_DOCS));
> indexWriter.addDocument(document);
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

