You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/05/04 01:47:00 UTC

[jira] [Commented] (LUCENE-4198) Allow codecs to index term impacts

    [ https://issues.apache.org/jira/browse/LUCENE-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338712#comment-17338712 ] 

ASF subversion and git services commented on LUCENE-4198:
---------------------------------------------------------

Commit c33d211d2aa3e8fc613b7d434b7a782675aac31a in lucene's branch refs/heads/main from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=c33d211 ]

LUCENE-4198: add format description for term impacts to javadocs (#115)



> Allow codecs to index term impacts
> ----------------------------------
>
>                 Key: LUCENE-4198
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4198
>             Project: Lucene - Core
>          Issue Type: Sub-task
>          Components: core/index
>            Reporter: Robert Muir
>            Priority: Major
>             Fix For: 8.0
>
>         Attachments: LUCENE-4198-BMW.patch, LUCENE-4198.patch, LUCENE-4198.patch, LUCENE-4198.patch, LUCENE-4198.patch, LUCENE-4198.patch, LUCENE-4198_flush.patch, TestSimpleTextPostingsFormat.asf.nightly.master.1466.consoleText.excerpt.txt, TestSimpleTextPostingsFormat.sarowe.jenkins.nightly.master.681.consoleText.excerpt.txt
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Subtask of LUCENE-4100.
> Thats an example of something similar to impact indexing (though, his implementation currently stores a max for the entire term, the problem is the same).
> We can imagine other similar algorithms too: I think the codec API should be able to support these.
> Currently it really doesnt: Stefan worked around the problem by providing a tool to 'rewrite' your index, he passes the IndexReader and Similarity to it. But it would be better if we fixed the codec API.
> One problem is that the Postings writer needs to have access to the Similarity. Another problem is that it needs access to the term and collection statistics up front, rather than after the fact.
> This might have some cost (hopefully minimal), so I'm thinking to experiment in a branch with these changes and see if we can make it work well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org