Posted to dev@nutch.apache.org by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2011/04/01 16:37:06 UTC

[jira] [Closed] (NUTCH-72) Query basic filter with correction feature

     [ https://issues.apache.org/jira/browse/NUTCH-72?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma closed NUTCH-72.
------------------------------

    Resolution: Won't Fix

> Query basic filter with correction feature
> ------------------------------------------
>
>                 Key: NUTCH-72
>                 URL: https://issues.apache.org/jira/browse/NUTCH-72
>             Project: Nutch
>          Issue Type: New Feature
>          Components: searcher
>         Environment: lucene
>            Reporter: Christophe Noel
>         Attachments: querycorrectionplugin.zip
>
>
> This plugin extends the query-basic plugin with a correction feature.
> Lucene provides FuzzyQuery, which searches not only for exactly matching terms but also for very similar terms.
> This plugin is meant to be used instead of query-basic by people looking for an easy way to correct users' queries.
> The Correction Query Plugin can be used in two ways:
> Solution 1: To search for very similar terms, add autocorrectionmod as the first term of the query (example: 'nutch engine' -> 'autocorrectionmod nutch engine').
> Solution 2: Create a new search.jsp page that includes a "correction" checkbox (<input type="checkbox" name="autocorrection" value="true">), which can automatically add 'autocorrectionmod' as the first term of the query.
> FuzzyQuery has one big problem: it is very slow on large indexes!
> So the Correction Query Plugin works under the following constraints:
> - it is not suitable for big indexes
> - it only applies to words of 5 characters or more
> - it only looks for words sharing the first 2 characters (to improve performance, this should be set to 3 or 4)
> - it only accepts terms whose suffixes match at 65 % or more (the algorithm is Levenshtein)
> Please give your opinion about it.
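
The marker convention and the matching constraints described in the issue can be sketched in plain Java, without Lucene, roughly as follows. This is only an illustration under stated assumptions, not the code from querycorrectionplugin.zip: the class name, method names, and constants are hypothetical, and the "65 % matching suffixes" rule is interpreted here as 1 - distance / (length of the longer suffix), which may differ from how the plugin or Lucene's FuzzyQuery actually normalizes the distance.

```java
// Hypothetical sketch of the rules listed in the issue; not the plugin's code.
final class FuzzyCorrectionSketch {
    static final String MARKER = "autocorrectionmod"; // marker term from the issue
    static final int MIN_WORD_LENGTH = 5;             // "5 characters or more"
    static final int PREFIX_LENGTH = 2;               // "first 2 characters"
    static final double MIN_SIMILARITY = 0.65;        // "65 % matching suffixes"

    /** True if the first whitespace-separated term of the query is the marker. */
    static boolean correctionRequested(String query) {
        String[] terms = query.trim().split("\\s+");
        return terms.length > 0 && terms[0].equals(MARKER);
    }

    /** Classic Levenshtein edit distance, computed with two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Decides whether indexTerm is an acceptable match for queryWord. */
    static boolean isCandidate(String queryWord, String indexTerm) {
        // Short words are not corrected; only an exact match is accepted.
        if (queryWord.length() < MIN_WORD_LENGTH) {
            return queryWord.equals(indexTerm);
        }
        // Only terms sharing the first PREFIX_LENGTH characters are considered.
        if (!indexTerm.startsWith(queryWord.substring(0, PREFIX_LENGTH))) {
            return false;
        }
        // Assumed normalization: compare the suffixes after the shared prefix.
        String querySuffix = queryWord.substring(PREFIX_LENGTH);
        String termSuffix = indexTerm.substring(PREFIX_LENGTH);
        int longest = Math.max(querySuffix.length(), termSuffix.length());
        double similarity = 1.0 - (double) levenshtein(querySuffix, termSuffix) / longest;
        return similarity >= MIN_SIMILARITY;
    }
}
```

Under these assumptions, isCandidate("engine", "engines") accepts the term (one edit over a five-character suffix), while isCandidate("nutch", "notch") rejects it because the two-character prefixes differ. Raising PREFIX_LENGTH to 3 or 4, as the issue suggests, would shrink the set of index terms that have to be compared at all.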

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira