Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2014/08/01 22:25:40 UTC

[jira] [Commented] (LUCENE-5156) CompressingTermVectors termsEnum should probably not support seek-by-ord

    [ https://issues.apache.org/jira/browse/LUCENE-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082898#comment-14082898 ] 

David Smiley commented on LUCENE-5156:
--------------------------------------

I can understand why this change was made: better not to support an optional operation at all than to advertise one that callers expect to be fast and then implement it in linear time.  But what if seek-by-ord were made fast, along with seekCeil(), which is also O(N) right now?  For example, the first time seekCeil or an ord method is called, build an array of term start positions indexed by ordinal, which otherwise would not be built.  seekCeil could then do a binary search, and seekExact by ord a direct lookup.  The lazily created array could also be shared across repeated invocations that get Terms for the current document.
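
To make the idea concrete, here is a rough sketch, not a patch.  The class LazyOrdIndexSketch and its byte layout are hypothetical: each term is stored as one length byte followed by its UTF-8 bytes, with no prefix compression, and the constructor assumes the terms arrive sorted in byte order and are shorter than 128 bytes.  None of this is the real CompressingTermVectors format or the Lucene TermsEnum API; it only shows the lazily built offsets array, the binary search, and the direct lookup.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for one document's term vector (not a Lucene class). */
final class LazyOrdIndexSketch {
    private final byte[] data;   // assumed layout: [len][term bytes][len][term bytes]...
    private final int numTerms;
    private int[] starts;        // built on first seek; starts[ord] = offset of that term's length byte

    LazyOrdIndexSketch(String... sortedTerms) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String term : sortedTerms) {
            byte[] b = term.getBytes(StandardCharsets.UTF_8);
            out.write(b.length);          // assumption: every term is shorter than 128 bytes
            out.write(b, 0, b.length);
        }
        data = out.toByteArray();
        numTerms = sortedTerms.length;
    }

    /** One O(N) pass over the terms, paid only if a seek method is ever called. */
    private int[] starts() {
        if (starts == null) {
            int[] s = new int[numTerms];
            int pos = 0;
            for (int ord = 0; ord < numTerms; ord++) {
                s[ord] = pos;
                pos += 1 + data[pos];     // skip the length byte plus the term bytes
            }
            starts = s;
        }
        return starts;
    }

    /** Seek by ordinal: a direct array lookup once the offsets exist. */
    String seekExact(int ord) {
        int pos = starts()[ord];
        return new String(data, pos + 1, data[pos], StandardCharsets.UTF_8);
    }

    /** Returns the ordinal of the smallest term >= target, or numTerms if there is none. */
    int seekCeil(String target) {
        byte[] t = target.getBytes(StandardCharsets.UTF_8);
        int[] s = starts();
        int lo = 0, hi = numTerms - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compare(data, s[mid] + 1, data[s[mid]], t);
            if (cmp < 0) lo = mid + 1;
            else if (cmp > 0) hi = mid - 1;
            else return mid;              // exact match
        }
        return lo;                        // first term greater than target
    }

    /** Unsigned byte-wise comparison of a stored term against the target. */
    private static int compare(byte[] a, int off, int len, byte[] b) {
        int n = Math.min(len, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[off + i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) return diff;
        }
        return len - b.length;
    }
}
```

The first seek still costs O(N) to fill the offsets array, but every later seekCeil is O(log N) comparisons and every later seek by ord is O(1).  Callers that only iterate with next() never pay for the array.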

Why bother, you might ask?  I'm working on a way for the default highlighter to search directly against the Terms from term vectors instead of re-inverting into a MemoryIndex.  I'll post a separate issue for that with code, of course.  It "works" but isn't as efficient as it could be, because seekCeil on term vectors' Terms is O(N).

> CompressingTermVectors termsEnum should probably not support seek-by-ord
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-5156
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5156
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>             Fix For: 4.5, 5.0
>
>         Attachments: LUCENE-5156.patch
>
>
> Just like term vectors before it, it has a O(n) seek-by-term. 
> But this one also advertises a seek-by-ord, only this is also O(n).
> This could cause e.g. checkindex to be very slow, because if termsenum supports ord it does a bunch of seeking tests. (Another solution would be to leave it, and add a boolean so checkindex never does seeking tests for term vectors, only real fields).
> However, I think its also kinda a trap, in my opinion if seek-by-ord is supported anywhere, you kinda expect it to be faster than linear time...?



--
This message was sent by Atlassian JIRA
(v6.2#6252)
