Posted to dev@lucene.apache.org by "John Wang (JIRA)" <ji...@apache.org> on 2009/04/25 05:14:30 UTC

[jira] Created: (LUCENE-1613) TermEnum.docFreq() is not updated with there are deletes

TermEnum.docFreq() is not updated with there are deletes
--------------------------------------------------------

                 Key: LUCENE-1613
                 URL: https://issues.apache.org/jira/browse/LUCENE-1613
             Project: Lucene - Java
          Issue Type: Bug
          Components: Search
    Affects Versions: 2.4
            Reporter: John Wang


TermEnum.docFreq is used in many places, especially scoring. However, if documents have been deleted from the index and the affected segments have not yet been merged, this value is not updated and still counts the deleted documents.

Attached is a test case.
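
The attached TestDeleteAndDocFreq.java is not reproduced in this archive. As a rough illustration only, the following self-contained Java sketch models the reported behavior with a toy segment. ToySegment, its fields, and merge() are hypothetical names invented for this sketch, not Lucene classes; the assumption is that a delete only marks a document in a bit set, while the per-term count stored at write time stays unchanged until a merge rewrites the segment.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Toy model of one index segment holding a single term; NOT Lucene's real classes. */
public class StaleDocFreqDemo {

    public static final class ToySegment {
        final int[] postings;                 // ids of the docs that contain the term
        final int storedDocFreq;              // written once, when the segment is created
        final BitSet deleted = new BitSet();  // docs marked as deleted

        public ToySegment(int[] postings) {
            this.postings = postings.clone();
            this.storedDocFreq = postings.length;
        }

        /** A delete only marks a bit; postings and the stored count are untouched. */
        public void delete(int docId) {
            deleted.set(docId);
        }

        /** What a docFreq()-style call reports: the stored value. */
        public int docFreq() {
            return storedDocFreq;
        }

        /** A merge rewrites the segment without the deleted docs (ids are not renumbered here). */
        public ToySegment merge() {
            int[] live = Arrays.stream(postings).filter(d -> !deleted.get(d)).toArray();
            return new ToySegment(live);
        }
    }

    public static void main(String[] args) {
        ToySegment seg = new ToySegment(new int[] {0, 1, 2, 3});
        seg.delete(1);
        seg.delete(3);
        System.out.println("docFreq before merge: " + seg.docFreq());         // 4, still counts deleted docs
        System.out.println("docFreq after merge:  " + seg.merge().docFreq()); // 2
    }
}
```

In this model the count only becomes accurate once the segment is rewritten, which matches the "not yet merged" condition in the description.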

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-1613) TermEnum.docFreq() is not updated with there are deletes

Posted by "Matt Chaput (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703333#action_12703333 ] 

Matt Chaput edited comment on LUCENE-1613 at 4/27/09 1:11 PM:
--------------------------------------------------------------

Given how fundamental the issue is w.r.t. how Lucene stores the index, it's unlikely to ever be fixed. (A clean, performant fix other than simply merging the segments would be a pretty incredible revelation.) As an outside observer, I would argue against keeping the bug open forever for correctness' sake.





[jira] Commented: (LUCENE-1613) TermEnum.docFreq() is not updated with there are deletes

Posted by "John Wang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702649#action_12702649 ] 

John Wang commented on LUCENE-1613:
-----------------------------------

I understand this is a rather difficult problem to fix. I thought keeping a JIRA ticket would still be good for tracking purposes. I will let the committers decide on the urgency of this issue.



[jira] Commented: (LUCENE-1613) TermEnum.docFreq() is not updated with there are deletes

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703338#action_12703338 ] 

Mark Miller commented on LUCENE-1613:
-------------------------------------

This is a dupe I believe, but for the life of me, I cannot find the original to link them.



[jira] Commented: (LUCENE-1613) TermEnum.docFreq() is not updated with there are deletes

Posted by "John Wang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786750#action_12786750 ] 

John Wang commented on LUCENE-1613:
-----------------------------------

Maybe just add a javadoc comment on the method to explain the behavior in this case?

docFreq is often called in a read-only context, and calling expungeDeletes in that context is not a good idea.

I agree it is not trivial to fix while keeping the performance. I don't mind closing the bug either.




[jira] Commented: (LUCENE-1613) TermEnum.docFreq() is not updated with there are deletes

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786697#action_12786697 ] 

Mark Miller commented on LUCENE-1613:
-------------------------------------

> As an outside observer I would argue against keeping the bug open forever for correctness' sake.

I agree - it's not really a bug. It's by design.

> I suppose we could make a "fixTermCounts()" method, which takes a looong time as it iterates through the postings for each term to compute the actual count,

Just call expungeDeletes?

+1 on closing.



[jira] Commented: (LUCENE-1613) TermEnum.docFreq() is not updated with there are deletes

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702674#action_12702674 ] 

Michael McCandless commented on LUCENE-1613:
--------------------------------------------

John, do you have cases in practice where this is causing problems?

I understand the problem, and it's certainly real, and is not easy to fix "automatically", but I'm wondering in practice whether the difference in the resulting scores is ever significant.
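
To get a rough feel for how much a stale count could move a score, the sketch below evaluates the idf formula used by Lucene's DefaultSimilarity, log(numDocs / (docFreq + 1)) + 1, for a stale and a corrected count. All index sizes and counts are hypothetical, picked only for illustration, and the method returns double rather than float. As far as I can tell, in 2.4 the numDocs argument comes from maxDoc(), which also includes deleted documents, so the first case compares stale/stale against live/live.

```java
public class IdfDriftDemo {

    /** Same formula as DefaultSimilarity.idf, returning double instead of float. */
    public static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        // All numbers are hypothetical.
        int maxDoc = 1_000_000;     // total docs, deleted ones included
        int storedDocFreq = 1_000;  // stale count for one term

        // What scoring sees while the deletes are unmerged.
        double stale = idf(storedDocFreq, maxDoc);

        // Case 1: 10% of all docs deleted, spread evenly, so the term loses 10% as well.
        double evenlyDeleted = idf(900, 900_000);

        // Case 2: deletes concentrated on this term; only 100 of its docs are still live.
        double skewed = idf(100, maxDoc - 900);

        System.out.printf("stale=%.4f evenlyDeleted=%.4f skewed=%.4f%n",
                stale, evenlyDeleted, skewed);
    }
}
```

Under these assumptions, evenly spread deletes leave idf almost unchanged, while deletes concentrated on one term shift it noticeably. Whether a real index looks like the first or the second case is exactly the open question here.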

I suppose we could make a "fixTermCounts()" method, which takes a looong time as it iterates through the postings for each term to compute the actual count, and then writes a new terms dict.  The app would have to manually call this method.
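
A minimal sketch of the recount idea, assuming a toy in-memory index: a map from term to its postings (doc ids) plus a bit set of deleted docs. The fixTermCounts name comes from the comment above; the data structures and the choice to drop terms with no live docs are assumptions of this sketch, and nothing here is an existing Lucene method.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

public class FixTermCountsSketch {

    /**
     * Recomputes the count for every term by walking its postings and skipping
     * deleted docs. The cost is proportional to the total size of all postings,
     * which is why this would be slow on a large index.
     */
    public static Map<String, Integer> fixTermCounts(Map<String, int[]> postingsByTerm,
                                                     BitSet deleted) {
        Map<String, Integer> newTermsDict = new TreeMap<>();
        for (Map.Entry<String, int[]> entry : postingsByTerm.entrySet()) {
            int live = 0;
            for (int docId : entry.getValue()) {
                if (!deleted.get(docId)) {
                    live++;
                }
            }
            if (live > 0) {
                newTermsDict.put(entry.getKey(), live); // terms with no live docs are dropped
            }
        }
        return newTermsDict;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new TreeMap<>();
        postings.put("apple", new int[] {0, 1, 2});
        postings.put("pear", new int[] {1});

        BitSet deleted = new BitSet();
        deleted.set(1);

        System.out.println(fixTermCounts(postings, deleted)); // {apple=2}
    }
}
```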



[jira] Commented: (LUCENE-1613) TermEnum.docFreq() is not updated with there are deletes

Posted by "John Wang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702691#action_12702691 ] 

John Wang commented on LUCENE-1613:
-----------------------------------

Michael: We actually ran into this in facet search. For a null search, instead of counting the results of a MatchAllDocsQuery, we were just using the docFreq() method to avoid facet counting. The problem appeared when there were updates. We did get around it, but it was rather cumbersome.

I agree the fix is non-trivial; I just wanted to open an issue for tracking purposes in case we think of something.



[jira] Updated: (LUCENE-1613) TermEnum.docFreq() is not updated with there are deletes

Posted by "John Wang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Wang updated LUCENE-1613:
------------------------------

    Attachment: TestDeleteAndDocFreq.java

Test showing docFreq not updated when there are deletes.
