Posted to dev@lucene.apache.org by "Robert Muir (Updated) (JIRA)" <ji...@apache.org> on 2012/02/20 15:04:38 UTC

[jira] [Updated] (SOLR-3143) Supply a phrase-oriented QueryConverter for Suggesters

     [ https://issues.apache.org/jira/browse/SOLR-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-3143:
------------------------------

    Attachment: SOLR-3143.patch

Wow, phrase suggestions are ridiculously complicated to get working.

I think we need to add some configuration to the example (maybe commented out), because in my opinion this is really the default use case... but it's a lot of configuration, and the biggest traps, imo, are:

# You need to write a custom QueryConverter in Java code (I provide one in this patch), configure it as a plugin, and set it as the queryConverter (is this global, or is there a way to set it per-suggester?!)
# You need to make *sure* onlyMorePopular is true. Even though it is described as not affecting file-based spellcheckers, that's a lie: it controls whether results are alpha-sorted or ordered by relevance!
# (Assuming your queryConverter is well-behaved and respects the analyzer) You need to define a custom fieldType in schema.xml that uses KeywordTokenizer + lowercase (or whatever you want), even though it's likely not used by any actual Solr field, and point queryAnalyzerFieldType at it. If you don't do this, it will default to WhitespaceTokenizer.
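
To make the second trap concrete, here is a toy Java model of the two orderings described above. It is not Solr's implementation: the `Suggestion` class, the `order` method, and the use of a numeric weight as a stand-in for "relevance" are assumptions for illustration; only the name `onlyMorePopular` comes from the text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model only, NOT Solr code: Suggestion and order() are hypothetical,
// and a numeric weight stands in for "relevance"/popularity.
class SuggestionOrdering {

    static final class Suggestion {
        final String phrase;
        final long weight;

        Suggestion(String phrase, long weight) {
            this.phrase = phrase;
            this.weight = weight;
        }
    }

    // onlyMorePopular == false: matches come back alpha-sorted.
    // onlyMorePopular == true: matches come back heaviest first.
    static List<String> order(List<Suggestion> matches, boolean onlyMorePopular) {
        List<Suggestion> sorted = new ArrayList<>(matches);
        Comparator<Suggestion> byPhrase = Comparator.comparing(s -> s.phrase);
        if (onlyMorePopular) {
            // Descending weight; ties broken alphabetically so the output is deterministic.
            sorted.sort(Comparator.comparingLong((Suggestion s) -> s.weight)
                    .reversed()
                    .thenComparing(byPhrase));
        } else {
            sorted.sort(byPhrase);
        }
        List<String> phrases = new ArrayList<>();
        for (Suggestion s : sorted) {
            phrases.add(s.phrase);
        }
        return phrases;
    }

    public static void main(String[] args) {
        List<Suggestion> matches = List.of(
                new Suggestion("new york city", 50),
                new Suggestion("new jersey", 10),
                new Suggestion("new york times", 90));
        System.out.println(order(matches, false)); // [new jersey, new york city, new york times]
        System.out.println(order(matches, true));  // [new york times, new york city, new jersey]
    }
}
```

With the flag off, the most popular phrase ("new york times") ends up last simply because of its spelling, which is exactly the surprise the trap describes.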

Anyway, attached is my patch: basically, it's a QueryConverter that just passes the whole string as-is to the query analyzer.
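
A rough, self-contained sketch of the difference between the word-at-a-time flow and a converter that hands the whole string to the analyzer. Everything here is hypothetical: analyzers are modeled as plain `Function<String, List<String>>` stand-ins for KeywordTokenizer + lowercase and for the WhitespaceTokenizer default, whereas Solr's `QueryConverter.convert` actually returns a collection of Lucene `Token` objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical sketch: "analyzers" are plain functions from text to token strings,
// standing in for Lucene analyzers; Solr's real converters return Lucene Tokens.
class PhraseConverterSketch {

    // Word-at-a-time flow: split the query into words, then analyze each word separately.
    static List<String> convertPerWord(String query, Function<String, List<String>> analyzer) {
        List<String> tokens = new ArrayList<>();
        for (String word : query.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                tokens.addAll(analyzer.apply(word));
            }
        }
        return tokens;
    }

    // Phrase-oriented flow: pass the whole string, as-is, to the analyzer.
    static List<String> convertWhole(String query, Function<String, List<String>> analyzer) {
        return analyzer.apply(query);
    }

    // Stand-in for KeywordTokenizer + lowercase: the entire input becomes one token.
    static List<String> keywordLowercase(String text) {
        return List.of(text.toLowerCase(Locale.ROOT));
    }

    // Stand-in for the WhitespaceTokenizer default: one token per whitespace-separated run.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String query = "New York Ti";
        System.out.println(convertPerWord(query, PhraseConverterSketch::keywordLowercase)); // [new, york, ti]
        System.out.println(convertWhole(query, PhraseConverterSketch::keywordLowercase));   // [new york ti]
        System.out.println(convertWhole(query, PhraseConverterSketch::whitespace));         // [New, York, Ti]
    }
}
```

The last line shows why the analyzer configuration still matters: even a whole-string converter gets split back into words if the query analyzer falls back to whitespace tokenization.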

In my test analyzer config, I added a horrible regexp that tries to emulate what Google's autocomplete seems to do: lowercase, collapse runs of whitespace, remove query syntax, etc.

But maybe for a lot of people that's even overkill, and they could just use Keyword+Lowercase or whatever.
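
The actual regexp from the test config is not reproduced here. As a hedged approximation, the normalization steps listed above (lowercase, drop query syntax, collapse whitespace) might look like this in plain Java, next to the simpler Keyword+Lowercase behavior. The set of characters treated as "query syntax" is an assumption loosely based on Lucene's special query characters, and `SuggestNormalizer` is a hypothetical name.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical approximation of the normalization described above; this is
// NOT the regexp from the patch's test config.
class SuggestNormalizer {

    // Assumed set of query-syntax characters to strip (roughly Lucene's special characters).
    private static final Pattern QUERY_SYNTAX = Pattern.compile("[+\\-!(){}\\[\\]^\"~*?:\\\\/&|]");
    private static final Pattern WHITESPACE_RUNS = Pattern.compile("\\s+");

    // Lowercase, replace query syntax with spaces, collapse whitespace runs, trim.
    static String normalize(String raw) {
        String s = raw.toLowerCase(Locale.ROOT);
        s = QUERY_SYNTAX.matcher(s).replaceAll(" ");
        s = WHITESPACE_RUNS.matcher(s).replaceAll(" ");
        return s.trim();
    }

    // Keyword+Lowercase only lowercases; whitespace and punctuation are kept as typed.
    static String keywordLowercase(String raw) {
        return raw.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("  New   YORK  (times)* "));          // new york times
        System.out.println("[" + keywordLowercase("  New   YORK ") + "]"); // [  new   york ]
    }
}
```

Whether the extra normalization is worth it depends on how messy the incoming queries are; if users mostly type plain words, the lowercase-only variant may be enough.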

                
> Supply a phrase-oriented QueryConverter for Suggesters
> ------------------------------------------------------
>
>                 Key: SOLR-3143
>                 URL: https://issues.apache.org/jira/browse/SOLR-3143
>             Project: Solr
>          Issue Type: New Feature
>          Components: spellchecker
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 3.6, 4.0
>
>         Attachments: SOLR-3143.patch
>
>
> The supplied QueryConverter makes sense for Spellcheckers:
> it tries to parse out the 'meat' of the query (e.g. using identifier rules)
> and analyzes each parsed 'word' with the configured analyzer (a separate TokenStream per word).
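> {code}
> // splitByIdentifierRules() is a placeholder for the identifier-rule word split
> List<String> tokens = new ArrayList<String>();
> for (String word : splitByIdentifierRules(query)) {
>   // each word is analyzed in its own, separate TokenStream
>   TokenStream ts = analyzer.tokenStream(fieldName, new StringReader(word));
>   CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
>   ts.reset();
>   while (ts.incrementToken()) {
>     tokens.add(term.toString());
>   }
>   ts.end();
>   ts.close();
> }
> {code}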
> However, for Suggesters this is not really optimal, because in the general
> case they do not work one word at a time: they aren't really suggesting
> individual words, but instead an entire 'query' that matches a prefix.
> So instead, I think we just want a QueryConverter that creates a
> single string containing all the 'meat', and we pass the whole thing to
> the analyzer, then to the suggester.
> The current workaround for this problem on the wiki is to ask the user to write custom
> code (http://wiki.apache.org/solr/Suggester#Tips_and_tricks). I think that's not
> great, since this phrase-based suggesting is really the primary use case for
> suggesters.
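
To illustrate the point of the description, here is a toy phrase suggester in plain Java: entries are whole phrases, and a lookup matches the entire query string as a prefix. This is a hypothetical in-memory stand-in (the class and method names are made up), not Lucene's suggester API, and the phrases and weights are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy suggester, not Lucene's Lookup API: keys are whole
// normalized phrases, values are weights.
class PhrasePrefixSuggester {
    private final TreeMap<String, Long> phrases = new TreeMap<>();

    void add(String phrase, long weight) {
        phrases.put(phrase, weight);
    }

    // Returns up to k phrases that start with the given prefix, heaviest first.
    List<String> lookup(String prefix, int k) {
        List<Map.Entry<String, Long>> hits = new ArrayList<>();
        // Keys are sorted, so all matches form a contiguous run starting at the prefix.
        for (Map.Entry<String, Long> e : phrases.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            hits.add(e);
        }
        hits.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(k, hits.size()); i++) {
            out.add(hits.get(i).getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        PhrasePrefixSuggester s = new PhrasePrefixSuggester();
        s.add("new york times", 90);
        s.add("new york city", 50);
        s.add("tea party", 70);
        s.add("times square", 40);
        // Whole 'meat' of the query as one prefix: the phrase context is kept.
        System.out.println(s.lookup("new york t", 5)); // [new york times]
        // Only the last word as the prefix: the phrase context is lost.
        System.out.println(s.lookup("t", 5));          // [tea party, times square]
    }
}
```

Looking up only the last word ("t") loses the "new york" context, which is roughly what happens when each parsed word is sent to the suggester on its own.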
