Posted to dev@lucene.apache.org by "Robert Muir (Updated) (JIRA)" <ji...@apache.org> on 2012/02/20 15:04:38 UTC
[jira] [Updated] (SOLR-3143) Supply a phrase-oriented QueryConverter for Suggesters
[ https://issues.apache.org/jira/browse/SOLR-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated SOLR-3143:
------------------------------
Attachment: SOLR-3143.patch
Wow, phrase suggestions are ridiculously complicated to get working.
I think we need to add some configuration to the example (maybe commented out), because in my opinion this is really the default use case... but it's a lot of configuration, and the biggest traps IMO are:
# You need to write a custom QueryConverter in Java code (I provide one in this patch), configure it as a plugin, and set it as the queryConverter (is this global, or is there a way to set this per-suggester?!)
# You need to make *sure* onlyMorePopular is true. Even though it says it doesn't affect file-based spellcheckers, that's a lie: this setting controls whether results are alpha-sorted or ordered by relevance!
# (Assuming your queryConverter is well-behaved and respects the analyzer) You need to define a custom fieldType in schema.xml, even though it's likely not used by any actual Solr fields, that uses KeywordTokenizer + lowercase or whatever you want, and set this via queryAnalyzerFieldType. If you don't do this, it will default to WhitespaceTokenizer.
Anyway, attached is my patch; basically it's a QueryConverter that just passes the whole string as-is to the query analyzer.
In my test analyzer config, I added a horrible regexp that tries to emulate what Google's autocomplete seems to do: lowercase, collapse runs of whitespace, remove query syntax, etc.
But maybe for a lot of people that's even overkill and they could just use Keyword+Lowercase or whatever.
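To make the normalization steps above concrete (lowercase, strip query syntax, collapse whitespace), here is a minimal standalone Java sketch. It is not the regexp from SOLR-3143.patch: the class name SuggestQueryNormalizer, the method normalize, and the exact set of characters treated as "query syntax" are all assumptions made for illustration. In a real setup this logic would more likely live in the analyzer chain of the fieldType named by queryAnalyzerFieldType.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not the code attached to the issue.
final class SuggestQueryNormalizer {
    // Assumed set of query-syntax characters to strip. The hyphen is left alone
    // so that terms such as "wi-fi" survive; adjust to your query parser.
    private static final Pattern QUERY_SYNTAX =
        Pattern.compile("[+!(){}\\[\\]^\"~*?:\\\\/|&]");
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    // Lowercase, replace syntax characters with spaces, then collapse whitespace.
    static String normalize(String rawQuery) {
        String lowered = rawQuery.toLowerCase(Locale.ROOT);
        String withoutSyntax = QUERY_SYNTAX.matcher(lowered).replaceAll(" ");
        return WHITESPACE_RUN.matcher(withoutSyntax).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // Prints: hello world solr 2
        System.out.println(normalize("  \"Hello   WORLD\" +solr^2 "));
    }
}
```

Replacing syntax characters with a space (rather than deleting them) keeps adjacent words from being glued together, at the cost of splitting inputs like field:value into two words.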
> Supply a phrase-oriented QueryConverter for Suggesters
> ------------------------------------------------------
>
> Key: SOLR-3143
> URL: https://issues.apache.org/jira/browse/SOLR-3143
> Project: Solr
> Issue Type: New Feature
> Components: spellchecker
> Reporter: Robert Muir
> Assignee: Robert Muir
> Fix For: 3.6, 4.0
>
> Attachments: SOLR-3143.patch
>
>
> The supplied QueryConverter makes sense for Spellcheckers:
> it tries to parse out the 'meat' of the query (using e.g. identifier rules),
> and analyzes each parsed 'word' with the configured analyzer (separate tokenstream).
> {code}
> words[] = splitByIdentifierRules(query);
> for (each word in words) {
>   TokenStream ts = analyzer.tokenStream(word);  // a separate token stream per word
>   for (each analyzedWord from ts) {
>     tokens.add(analyzedWord);
>   }
> }
> {code}
> However, for Suggesters this is not really optimal, because in the general
> case they do not work one word at a time: they aren't really suggesting
> individual words but instead an entire 'query' that matches a prefix.
> So instead, I think we just want a QueryConverter that creates a
> single string containing all the 'meat', and we pass the whole thing to
> the analyzer, then the suggester.
> The current workaround on the wiki for this problem is to ask the user to write custom
> code (http://wiki.apache.org/solr/Suggester#Tips_and_tricks). I think that's not
> great, since phrase-based suggesting is really the primary use case for
> suggesters.
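The difference between the two conversion strategies described in the issue can be sketched without any Solr dependency. Everything below is a hypothetical stand-in: the analyzer is modeled as a plain Function from a string to a list of tokens, KEYWORD_LOWERCASE only imitates a KeywordTokenizer + lowercase chain, and the \w+ pattern is a crude approximation of "identifier rules". None of these names come from Solr or from the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: contrasts per-word conversion with whole-query conversion.
final class ConverterContrast {
    // Stand-in for KeywordTokenizer + lowercase: the whole input becomes one token.
    static final Function<String, List<String>> KEYWORD_LOWERCASE =
        text -> Collections.singletonList(text.trim().toLowerCase(Locale.ROOT));

    // Crude approximation of splitting by identifier rules.
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Spellchecker-style: analyze each parsed word separately.
    static List<String> convertPerWord(String query,
                                       Function<String, List<String>> analyzer) {
        List<String> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(query);
        while (m.find()) {
            tokens.addAll(analyzer.apply(m.group()));
        }
        return tokens;
    }

    // Suggester-style: hand the entire query string to the analyzer once.
    static List<String> convertWholeQuery(String query,
                                          Function<String, List<String>> analyzer) {
        return new ArrayList<>(analyzer.apply(query));
    }

    public static void main(String[] args) {
        System.out.println(convertPerWord("Harry Pot", KEYWORD_LOWERCASE));    // [harry, pot]
        System.out.println(convertWholeQuery("Harry Pot", KEYWORD_LOWERCASE)); // [harry pot]
    }
}
```

With per-word conversion the suggester would see "harry" and "pot" as two independent prefixes; with whole-query conversion it sees the single prefix "harry pot", which is what a phrase suggester needs.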