Posted to dev@lucenenet.apache.org by "Shad Storhaug (JIRA)" <ji...@apache.org> on 2017/07/13 06:49:00 UTC

[jira] [Commented] (LUCENENET-590) SpellChecker.Exist() minimum word length

    [ https://issues.apache.org/jira/browse/LUCENENET-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085274#comment-16085274 ] 

Shad Storhaug commented on LUCENENET-590:
-----------------------------------------

I took a look at the source for this method. It is exactly the same as the Java version, and the implementation is unchanged in the master branch of Lucene.

{code:title=SpellChecker.cs|borderStyle=solid}
        public virtual bool Exist(string word)
        {
            // obtainSearcher calls ensureOpen
            IndexSearcher indexSearcher = ObtainSearcher();
            try
            {
                // TODO: we should use ReaderUtil+seekExact, we dont care about the docFreq
                // this is just an existence check
                return indexSearcher.IndexReader.DocFreq(new Term(F_WORD, word)) > 0;
            }
            finally
            {
                ReleaseSearcher(indexSearcher);
            }
        }
{code}

The exact way it works depends on the implementation of the {{DocFreq()}} method, which in turn depends on the {{Directory}} implementation used (specifically, what type of {{AtomicReader}} is opened). I suspect all of the built-in {{Directory}} implementations work similarly, but it is possible to provide your own that has an alternate implementation.

The {{ReaderUtil.SeekExact()}} method mentioned in the TODO comment doesn't exist in Lucene 4.8.0, but the {{Exist()}} method is virtual, so you can provide your own implementation if it doesn't work exactly the way you like.
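
A minimal sketch of such an override, assuming you want words shorter than 3 characters to be answered from your own list rather than from the spell index. The class name {{ShortWordAwareSpellChecker}} and the {{shortWords}} set are hypothetical and not part of Lucene.Net; only the {{SpellChecker(Directory)}} constructor and the virtual {{Exist()}} method come from the library.

{code:title=ShortWordAwareSpellChecker.cs (hypothetical)|borderStyle=solid}
using System.Collections.Generic;
using Lucene.Net.Search.Spell;
using Lucene.Net.Store;

// Hypothetical subclass; the name and the short-word set are illustrative only.
public class ShortWordAwareSpellChecker : SpellChecker
{
    private readonly ISet<string> shortWords;

    public ShortWordAwareSpellChecker(Directory spellIndex, ISet<string> shortWords)
        : base(spellIndex)
    {
        this.shortWords = shortWords;
    }

    public override bool Exist(string word)
    {
        // Assumed policy: words shorter than 3 characters are looked up in the
        // caller-supplied set instead of the spell index.
        if (word != null && word.Length < 3)
        {
            return shortWords.Contains(word);
        }

        // Everything else falls through to the default DocFreq-based check.
        return base.Exist(word);
    }
}
{code}

The length threshold of 3 here only mirrors the word length discussed in this issue; adjust it to whatever your test shows.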

I suspect this is the correct default behavior. After all, words shorter than 3 characters are not often misspelled, and there would likely be a performance penalty for checking them.

But there is no way to tell if this is the correct behavior without a sample of the code, including the type of directory implementation you are using. Do note that if you are using one of the {{FSDirectory.Open()}} overloads, the implementation you get depends on your OS and whether you are running as a 32-bit or 64-bit process.
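
If you are not sure which implementation you are getting, a quick check along these lines prints the concrete type. This is only a sketch: the temp-folder path and the class name {{ShowDirectoryType}} are placeholders, so substitute the location of your own spell index.

{code:title=ShowDirectoryType.cs (hypothetical)|borderStyle=solid}
using System;
using System.IO;
using Lucene.Net.Store;

public static class ShowDirectoryType
{
    public static void Main()
    {
        // Placeholder location; point this at your own spell index folder.
        var info = new DirectoryInfo(Path.Combine(Path.GetTempPath(), "spell-index-demo"));
        info.Create();

        using (FSDirectory dir = FSDirectory.Open(info))
        {
            // Prints the concrete FSDirectory subclass chosen for this OS/bitness.
            Console.WriteLine(dir.GetType().FullName);
        }
    }
}
{code}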

The quickest way to check would be to provide a test in the TestSpellChecker class (https://github.com/apache/lucenenet/blob/master/src/Lucene.Net.Tests.Suggest/Spell/TestSpellChecker.cs) that demonstrates a working and a failing case (either here or as a pull request on GitHub), which could be ported back to Java to see if it behaves the same way.
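
Something along these lines would do as a starting point. It is a sketch rather than a drop-in addition to {{TestSpellChecker}}: it is written as a standalone NUnit fixture with an in-memory {{RAMDirectory}}, and the fixture name, test name, and the words "cat" and "at" are made up for illustration. The second assertion is the case reported in this issue, so whether it passes is exactly what the test is meant to find out.

{code:title=TestSpellCheckerExistShortWords.cs (hypothetical)|borderStyle=solid}
using System.IO;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Index;
using Lucene.Net.Search.Spell;
using Lucene.Net.Store;
using Lucene.Net.Util;
using NUnit.Framework;

[TestFixture]
public class TestSpellCheckerExistShortWords
{
    [Test]
    public void TestExistWithShortAndLongWords()
    {
        using (var spellIndex = new RAMDirectory())
        using (var spellChecker = new SpellChecker(spellIndex))
        {
            // One word per line: a 2-character word and a 3-character word.
            var dictionary = new PlainTextDictionary(new StringReader("at\ncat\n"));
            var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, new KeywordAnalyzer());
            spellChecker.IndexDictionary(dictionary, config, false);

            // Expected to be the working case.
            Assert.IsTrue(spellChecker.Exist("cat"));

            // The case reported in this issue (2-character word).
            Assert.IsTrue(spellChecker.Exist("at"));
        }
    }
}
{code}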

> SpellChecker.Exist() minimum word length 
> -----------------------------------------
>
>                 Key: LUCENENET-590
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-590
>             Project: Lucene.Net
>          Issue Type: Bug
>          Components: Lucene.Net.Suggest
>    Affects Versions: Lucene.Net 4.8.0
>         Environment: .NET 4.6
>            Reporter: Meta
>
> Hi,
> I'm not exactly sure if this is a bug or by design, but I've noticed that when using the .Exist function of the SpellChecker, Lucene.Net.Search.Spell.SpellChecker.Exist(string), it does not check whether the word exists if the word is 2 characters long.
> Let me know if you have questions.
> Thanks



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)