Posted to dev@lucene.apache.org by "Steven Parkes (JIRA)" <ji...@apache.org> on 2007/01/03 20:50:27 UTC

[jira] Commented: (LUCENE-763) LuceneDictionary skips first word in enumeration

    [ https://issues.apache.org/jira/browse/LUCENE-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462042 ] 

Steven Parkes commented on LUCENE-763:
--------------------------------------

I was wondering about something very similar just recently: to call TermEnum.next() or not to call TermEnum.next() to get the first term. However, in my case I use terms() rather than terms( Term ) and there's the rub.

After looking through things, there appears to be an inconsistency between the two cases. terms( Term ) seeks so that the returned TermEnum is already positioned on a term: term() is valid immediately, without calling next(). On the other hand, terms() leaves the enum positioned "before" the first term: you need to call next() first, and calling term() before that returns null.
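To make the two conventions concrete, here is a minimal, self-contained sketch. It does not use Lucene at all: ToyTermEnum and the two terms(...) helpers are hypothetical stand-ins that only mimic the behaviour described above (as observed against SegmentReader), and the drain methods show the loop shape each convention seems to call for. The last call in main reproduces the "skipped first word" symptom by using the next()-first loop on an already-positioned enum.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TermEnumConventions {

    /** Toy stand-in for a term enumeration over a sorted list. NOT Lucene's TermEnum. */
    static final class ToyTermEnum {
        private final List<String> sorted;
        private int pos; // -1 means "before the first term"

        ToyTermEnum(List<String> sorted, int pos) {
            this.sorted = sorted;
            this.pos = pos;
        }

        /** Current term, or null when before the first or past the last term. */
        String term() {
            return (pos >= 0 && pos < sorted.size()) ? sorted.get(pos) : null;
        }

        /** Advances by one term; returns false once the enumeration is exhausted. */
        boolean next() {
            if (pos < sorted.size()) {
                pos++;
            }
            return pos < sorted.size();
        }
    }

    /** Mimics terms(): the enum starts before the first term. */
    static ToyTermEnum terms(List<String> sorted) {
        return new ToyTermEnum(sorted, -1);
    }

    /** Mimics terms(Term): the enum starts ON the first term >= from. */
    static ToyTermEnum terms(List<String> sorted, String from) {
        int i = 0;
        while (i < sorted.size() && sorted.get(i).compareTo(from) < 0) {
            i++;
        }
        return new ToyTermEnum(sorted, i);
    }

    /** Loop for an enum that starts before the first term: next() first. */
    static List<String> drainUnpositioned(ToyTermEnum te) {
        List<String> out = new ArrayList<>();
        while (te.next()) {
            out.add(te.term());
        }
        return out;
    }

    /** Loop for an enum that is already positioned: read term() first. */
    static List<String> drainPositioned(ToyTermEnum te) {
        List<String> out = new ArrayList<>();
        if (te.term() == null) {
            return out; // nothing at or after the seek point
        }
        do {
            out.add(te.term());
        } while (te.next());
        return out;
    }

    public static void main(String[] args) {
        List<String> index = Arrays.asList("eight", "five", "four");
        System.out.println(drainUnpositioned(terms(index)));     // [eight, five, four]
        System.out.println(drainPositioned(terms(index, "")));   // [eight, five, four]
        // Wrong loop for a positioned enum: the extra next() skips "eight".
        System.out.println(drainUnpositioned(terms(index, ""))); // [five, four]
    }
}
```

With real Lucene code the same two loop shapes would presumably apply to IndexReader.terms() and IndexReader.terms(Term), but that is an assumption based on the behaviour reported here, not something this sketch verifies.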

I've only tried this against SegmentReader#terms(...).

This difference of behaviour isn't mentioned in the documentation.

It would be nice for the two calls to behave the same way, but I'm a little worried that changing either one would break half the existing code. Should we just document the existing behaviour?

In that case, the spell checker just needs to get rid of the extra next() call.

While investigating, I noticed there are several other issues around the spell checker now, both the functional code and test code. It plays a bit fast and loose with when index readers and writers are opened. Perhaps it used to work, depending on when things got flushed to disk, but it doesn't work for me now under the trunk.

> LuceneDictionary skips first word in enumeration
> ------------------------------------------------
>
>                 Key: LUCENE-763
>                 URL: https://issues.apache.org/jira/browse/LUCENE-763
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Other
>    Affects Versions: 2.0.0
>         Environment: Windows Sun JRE 1.4.2_10_b03
>            Reporter: Dan Ertman
>
> The current code for LuceneDictionary will always skip the first word of the TermEnum. The reason is that it doesn't initially retrieve TermEnum.term - its first call is to TermEnum.next, which moves it past the first term (line 76).
> To see this problem cause a failure, add this test to TestSpellChecker:
>       similar = spellChecker.suggestSimilar("eihgt", 2);
>       assertEquals(1, similar.length);
>       assertEquals("eight", similar[0]);
> Because "eight" is the first word in the index, it will fail.
