Posted to dev@lucene.apache.org by "Robert Muir (Created) (JIRA)" <ji...@apache.org> on 2012/02/19 16:20:34 UTC

[jira] [Created] (SOLR-3143) Supply a phrase-oriented QueryConverter for Suggesters

Supply a phrase-oriented QueryConverter for Suggesters
------------------------------------------------------

                 Key: SOLR-3143
                 URL: https://issues.apache.org/jira/browse/SOLR-3143
             Project: Solr
          Issue Type: New Feature
          Components: spellchecker
            Reporter: Robert Muir
             Fix For: 3.6, 4.0


The supplied QueryConverter makes sense for Spellcheckers:
it tries to parse out the 'meat' of the query (using e.g. identifier rules),
and analyzes each parsed 'word' separately with the configured analyzer (one tokenstream per word).

{code}
// pseudocode: split the query first, then analyze each word on its own
words[] = splitByIdentifierRules(query);
for (each word in words) {
  TokenStream ts = analyzer.tokenStream(word);
  for (each analyzedWord from ts) {
    tokens.add(analyzedWord);
  }
}
{code}

However, for Suggesters this is not really optimal, because in the general
case they do not work one word at a time: they aren't really suggesting 
individual words but instead an entire 'query' that matches a prefix.

So instead, I think we just want a QueryConverter that creates a
single string containing all the 'meat'; we pass the whole thing to
the analyzer, then to the suggester.
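
To make the contrast concrete, here is a minimal sketch of the two conversion strategies. It is an illustration only: it does not use the Solr or Lucene APIs, the class and method names are hypothetical, the "analyzer" is modeled as a plain function from a string to a list of tokens, and splitting on whitespace stands in for the identifier rules mentioned above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical stand-ins: none of these names come from Solr or Lucene.
final class ConverterSketch {

    // Per-word strategy (spellchecker style): split first, then analyze each word on its own.
    // Whitespace splitting is an assumption standing in for the identifier rules.
    static List<String> convertPerWord(String query, Function<String, List<String>> analyzer) {
        List<String> tokens = new ArrayList<>();
        for (String word : query.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                tokens.addAll(analyzer.apply(word));
            }
        }
        return tokens;
    }

    // Whole-phrase strategy (suggester style): hand the entire string to the analyzer once.
    static List<String> convertWholePhrase(String query, Function<String, List<String>> analyzer) {
        return analyzer.apply(query);
    }

    // Toy "keyword + lowercase" analyzer: emits one token holding the lowercased input.
    static List<String> keywordLowercase(String text) {
        return Collections.singletonList(text.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String query = "New York Ci";
        // prints [new, york, ci]
        System.out.println(convertPerWord(query, ConverterSketch::keywordLowercase));
        // prints [new york ci]
        System.out.println(convertWholePhrase(query, ConverterSketch::keywordLowercase));
    }
}
```

With the per-word strategy a prefix-based suggester only ever sees "new", "york" and "ci" in isolation; with the whole-phrase strategy it sees the single prefix "new york ci", which is what a phrase suggestion needs.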

The current workaround on the wiki for this problem is to ask the user to write custom
code (http://wiki.apache.org/solr/Suggester#Tips_and_tricks). I think that's not
great, since phrase-based suggesting is really the primary use case for
suggesters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (SOLR-3143) Supply a phrase-oriented QueryConverter for Suggesters

Posted by "Robert Muir (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved SOLR-3143.
-------------------------------

    Resolution: Fixed
    



[jira] [Assigned] (SOLR-3143) Supply a phrase-oriented QueryConverter for Suggesters

Posted by "Robert Muir (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir reassigned SOLR-3143:
---------------------------------

    Assignee: Robert Muir
    



[jira] [Updated] (SOLR-3143) Supply a phrase-oriented QueryConverter for Suggesters

Posted by "Robert Muir (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-3143:
------------------------------

    Attachment: SOLR-3143.patch

Wow, phrase suggestions are ridiculously complicated to get working.

I think we need to add some configuration to the example (maybe commented out), because in my opinion this is really the default use case... but it's a lot of configuration, and the biggest traps imo are:

# You need to write a custom QueryConverter in Java code (I provide one in this patch), configure it as a plugin, and set it as the queryConverter (is this global, or is there a way to set this per-suggester?!)
# You need to make *sure* onlyMorePopular is true. Even though it says it doesn't affect file-based spellcheckers, that's a lie: this controls whether results are alpha-sorted or ordered by relevance!
# (Assuming your queryConverter is well-behaved and respects the analyzer) you need to define a custom fieldType in schema.xml, even though it's likely not used by any actual Solr fields, that uses KeywordTokenizer + lowercase or whatever you want, and set this via queryAnalyzerFieldType. If you don't do this, it will default to WhitespaceTokenizer.
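
To illustrate why the ordering trap in the second point matters, here is a small sketch of a prefix lookup whose results are either alpha-sorted or ordered by weight. This is not Solr's lookup code: the class, the sample dictionary, the weights and the `byWeight` flag are all hypothetical and only model the behavior described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: this is not Solr's lookup code, and every name here is hypothetical.
final class SuggestionOrderSketch {

    // Hypothetical weighted dictionary: phrase -> popularity weight.
    static Map<String, Integer> sample() {
        Map<String, Integer> m = new HashMap<>();
        m.put("new york", 50);
        m.put("new jersey", 30);
        m.put("new york city", 90);
        m.put("newark", 10);
        return m;
    }

    // Collects phrases starting with the prefix, then orders them either by
    // descending weight (ties broken alphabetically) or purely alphabetically.
    static List<String> lookup(Map<String, Integer> dict, String prefix, boolean byWeight) {
        List<String> hits = new ArrayList<>();
        for (String phrase : dict.keySet()) {
            if (phrase.startsWith(prefix)) {
                hits.add(phrase);
            }
        }
        hits.sort((a, b) -> {
            if (byWeight) {
                int c = Integer.compare(dict.get(b), dict.get(a));
                if (c != 0) {
                    return c;
                }
            }
            return a.compareTo(b);
        });
        return hits;
    }

    public static void main(String[] args) {
        // prints [new jersey, new york, new york city, newark]
        System.out.println(lookup(sample(), "new", false));
        // prints [new york city, new york, new jersey, newark]
        System.out.println(lookup(sample(), "new", true));
    }
}
```

For an autocomplete box, the second ordering is usually the one users expect, which is why getting the flag wrong is easy to miss until the suggestions look oddly alphabetical.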

Anyway, attached is my patch; basically it's a QueryConverter that just passes the whole string as-is to the query analyzer.

In my test analyzer config, I added a horrible regexp that tries to emulate what Google's autocomplete seems to do: lowercase, collapse runs of whitespace, remove query syntax, etc.

But maybe for a lot of people that's even overkill, and they could just use Keyword+Lowercase or whatever.
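
As a rough sketch of the kind of cleanup described above (lowercase, strip query syntax, collapse whitespace), the following uses plain Java string operations. The regexp here is NOT the one from the patch: the set of characters treated as query syntax is an assumption chosen for illustration, and the class and method names are hypothetical.

```java
import java.util.Locale;

// Illustration only: the character set below is an assumption, not the patch's regexp.
final class QueryNormalizerSketch {

    static String normalize(String raw) {
        // Step 1: lowercase in a locale-independent way.
        String s = raw.toLowerCase(Locale.ROOT);
        // Step 2: replace an assumed set of query-syntax characters with spaces.
        s = s.replaceAll("[+\\-!(){}\\[\\]^\"~*?:\\\\/]", " ");
        // Step 3: collapse runs of whitespace and trim the ends.
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // prints title new york
        System.out.println(normalize("  +Title:\"New   York\"  "));
    }
}
```

Note that this sketch also turns a hyphen into a space and trims trailing whitespace, both of which may be wrong for some dictionaries; a plain keyword + lowercase analysis avoids those decisions entirely.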

                
