Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2015/06/10 19:00:05 UTC

[jira] [Commented] (LUCENE-6545) optimize DocTermOrds in cases where the underlying TermEnum being wrapped supports ord()

    [ https://issues.apache.org/jira/browse/LUCENE-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580802#comment-14580802 ] 

Hoss Man commented on LUCENE-6545:
----------------------------------

Some relevant comments from rmuir in the original issue...

bq. If I disable the ord-sharing optimization in DocTermOrds, all 3 seeds pass. So I think there is a bug in e.g. the FixedGap/BlockTerms dictionary or something like that. Maybe BasePostingsFormatTestCase does not adequately exercise methods like size()/ord()/seek(ord). It should be failing!

bq. Is the problem the "extra" terms introduced by precision step? Maybe crank precisionStep down and see if expected/actual change. Maybe the current optimization is unsafe in that case and yields a bogus valueCount including the range terms, which screws up things down the road.

bq. Now we know: it's that this DocTermOrds optimization is conceptually broken with precisionStep. This just causes problems downstream, but it's not filtering out the "range terms" and that is the root cause. It cannot return the terms dict directly; it needs to wrap it with something that filters those out. Methods like NumericUtils.intTerms()/longTerms() are close, but those currently do not yet support ord() and seek(ord), which are needed here.
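
To make the "bogus valueCount" point concrete, here is a rough, self-contained Java sketch. It does not use Lucene; the Term class and index() method are hypothetical stand-ins. It assumes each 64-bit value is indexed once per shift (0, precisionStep, 2*precisionStep, ... below 64), with a term identified by its shift and the value's remaining high bits.

```java
import java.util.TreeSet;

public class PrecisionStepTermCount {
    // Hypothetical model of a trie-encoded numeric term: (shift, value >>> shift).
    // Real terms are prefix-coded bytes; this class only mimics their identity and order.
    static final class Term implements Comparable<Term> {
        final int shift;
        final long prefix;

        Term(int shift, long prefix) {
            this.shift = shift;
            this.prefix = prefix;
        }

        @Override
        public int compareTo(Term other) {
            int c = Integer.compare(shift, other.shift);
            return c != 0 ? c : Long.compare(prefix, other.prefix);
        }
    }

    // Collects the distinct terms produced for the given values (assumed indexing scheme).
    static TreeSet<Term> index(long[] values, int precisionStep) {
        TreeSet<Term> terms = new TreeSet<>();
        for (long v : values) {
            for (int shift = 0; shift < 64; shift += precisionStep) {
                terms.add(new Term(shift, v >>> shift));
            }
        }
        return terms;
    }

    // Counts only the shift == 0 terms, i.e. the real values.
    static long fullPrecisionCount(TreeSet<Term> terms) {
        long n = 0;
        for (Term t : terms) {
            if (t.shift == 0) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        TreeSet<Term> terms = index(new long[] {5L, 9L}, 16);
        System.out.println("terms in dictionary:  " + terms.size());             // 5
        System.out.println("full-precision terms: " + fullPrecisionCount(terms)); // 2
    }
}
```

In this toy model two distinct values produce five dictionary entries, so anything that takes the size of the unfiltered dictionary as the value count is off by the three range terms.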

{quote}
1) DocTermOrds has an optimization in case the terms dictionary supports ord(). It's broken if you are filtering out a subset of the terms, because it just passes the entire TermsEnum. Note this optimization never happens, except for a few oddball terms dicts we have, which support ord(). That's why it fails with them.
2) Those oddball terms dicts are just fine. Nothing wrong with them; it's DocTermOrds that does the wrong thing.
3) I do not have an opinion on the optimization. It's probably easy to fix, but I would just disable it as you suggest for now, since it only impacts tests or if someone explicitly uses one of these term dictionaries with this functionality.
{quote}
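
One possible shape for the "wrap it with something that filters those out" idea is sketched below in plain Java. This is only an illustration under stated assumptions: FilteredOrdTerms, its methods, and the "shift:prefix" string encoding are all hypothetical, and the underlying dictionary is modeled as a sorted list whose position is the underlying ord. It is not Lucene's TermsEnum API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class FilteredOrdTerms {
    private final List<String> dictionary; // underlying sorted terms; list index = underlying ord
    private final int[] toUnderlying;      // filtered ord -> underlying ord
    private int current = -1;              // current filtered ord, -1 when unpositioned

    public FilteredOrdTerms(List<String> sortedDictionary, Predicate<String> accept) {
        this.dictionary = sortedDictionary;
        int[] tmp = new int[sortedDictionary.size()];
        int n = 0;
        for (int i = 0; i < sortedDictionary.size(); i++) {
            if (accept.test(sortedDictionary.get(i))) {
                tmp[n++] = i; // remember where each accepted term lives underneath
            }
        }
        this.toUnderlying = Arrays.copyOf(tmp, n);
    }

    // Number of accepted terms only, so callers never see a count that includes range terms.
    public long size() {
        return toUnderlying.length;
    }

    // Positions on the term with the given ord in the filtered view.
    public void seekExact(long ord) {
        if (ord < 0 || ord >= toUnderlying.length) {
            throw new IndexOutOfBoundsException("ord=" + ord);
        }
        current = (int) ord;
    }

    public long ord() {
        return current;
    }

    public String term() {
        if (current < 0) {
            throw new IllegalStateException("call seekExact(ord) first");
        }
        return dictionary.get(toUnderlying[current]);
    }

    public static void main(String[] args) {
        // Toy dictionary: "0:" marks full-precision terms, other prefixes mark range terms.
        List<String> dict = Arrays.asList("0:05", "0:09", "1:00", "2:00", "3:00");
        FilteredOrdTerms view = new FilteredOrdTerms(dict, t -> t.startsWith("0:"));
        view.seekExact(1);
        System.out.println(view.size() + " terms, ord 1 = " + view.term()); // 2 terms, ord 1 = 0:09
    }
}
```

Precomputing the ord mapping costs one pass over the dictionary, which may or may not be acceptable for a real fix; the point is only that size(), ord() and seek-by-ord all have to be answered in terms of the filtered view rather than the raw dictionary.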

> optimize DocTermOrds in cases where the underlying TermEnum being wrapped supports ord()
> ----------------------------------------------------------------------------------------
>
>                 Key: LUCENE-6545
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6545
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Hoss Man
>
> Prior to LUCENE-6529, DocTermOrds had an optimization for when the TermEnum of the field being uninverted already supported ord().
> This optimization was removed in LUCENE-6529 (see r1684704) because it was found to produce incorrect results for numeric fields that had a precisionStep.
> This issue is to track the possibility of re-adding a correct version of this optimization.


