Posted to solr-user@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2006/12/22 06:27:09 UTC

Help with spellchecker integration

Hi,
I'm trying to integrate the Lucene-based spellchecker (http://wiki.apache.org/jakarta-lucene/SpellChecker + contrib/spellchecker under Lucene) with Solr (http://issues.apache.org/jira/browse/SOLR-81) in order to provide a query spellchecking service (you enter Speers and it suggests pant^H^H ... Spears).  I've created a generic NGramTokenizer (+ NGramTokenizerFactory + unit test) that I'll attach to SOLR-81 shortly.

What I'm not yet sure about is:
1) integration of this generic n-grammer with that Lucene SpellChecker code - SpellChecker & TRStringDistance classes in particular.
2) mapping n-gram Tokens that come out of my NGramTokenizer to specific field names, like 3start, 4start, gram1, gram2, gram3.... is there a schema.xml trick one can use to accomplish this?
3) once 2) is done, getting the.... request handler(?) to n-gram the query appropriately and hit the SpellChecker index to try and find alternative spelling suggestions.

Damn, that's a lot of unknowns... on top of that my computer started freezing every half an hour.  Hi Murphy.



Any pointers will be greatly appreciated. Thanks,
Otis




Re: Help with spellchecker integration

Posted by Thorsten Scherler <th...@juntadeandalucia.es>.
On Thu, 2006-12-21 at 21:27 -0800, Otis Gospodnetic wrote: 
> Hi,
> I'm trying to integrate the Lucene-based spellchecker (http://wiki.apache.org/jakarta-lucene/SpellChecker + contrib/spellchecker under Lucene) with Solr (http://issues.apache.org/jira/browse/SOLR-81) in order to provide a query spellchecking service (you enter Speers and it suggests pant^H^H ... Spears).  I've created a generic NGramTokenizer (+ NGramTokenizerFactory + unit test) that I'll attach to SOLR-81 shortly.
> 
> What I'm not yet sure about is:
> 1) integration of this generic n-grammer with that Lucene SpellChecker code - SpellChecker & TRStringDistance classes in particular.

Hmm, reading SOLR-81, you actually have everything you need.

> 2) mapping n-gram Tokens that come out of my NGramTokenizer to specific field names, like 3start, 4start, gram1, gram2, gram3.... is there a schema.xml trick one can use to accomplish this?

It is in the issue:
...
<!-- Here you map @source="word" to @dest="gram2".
     This copies the word input into the gram2 field. -->
<copyField source="word" dest="gram2"/>
...
<!-- Here you define what happens when the field "gram2" gets indexed.
     The solr.NGramTokenizerFactory will return the different
     combinations of tokens. -->
<fieldtype name="gram2" class="solr.TextField"> 
  <analyzer> 
    <!--more tokenizer --> 
    <tokenizer 
      class="solr.NGramTokenizerFactory" minGram="2" maxGram="2"/> 
  </analyzer> 
</fieldtype>

The above shows how to configure the second (spellcheck) index. However,
if you want to update both indexes at the same time, you need to write
your own implementation of the update servlet.
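
To make the minGram="2" maxGram="2" setting concrete, here is a minimal
plain-Java sketch of the character n-grams one would expect for a single
word. It does not use Lucene or the NGramTokenizer attached to SOLR-81, and
the class and method names (NGramSketch, ngrams) are made up for
illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene or Solr.
public class NGramSketch {

    // Returns every character n-gram of the word whose length lies
    // between minGram and maxGram (both inclusive), shortest first.
    public static List<String> ngrams(String word, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int i = 0; i + n <= word.length(); i++) {
                grams.add(word.substring(i, i + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Bigrams only, matching minGram="2" maxGram="2" in the config above.
        System.out.println(ngrams("spears", 2, 2)); // prints [sp, pe, ea, ar, rs]
    }
}
```

With one fieldtype per gram size (gram2, gram3, ...), each copyField
destination would receive the grams of exactly one length.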

> 3) once 2) is done, getting the.... request handler(?) to n-gram the query appropriately and hit the SpellChecker index to try and find alternative spelling suggestions.

Hmm, not sure. IMHO that depends heavily on how you plan to use
it in the end. I mean there is more than one way to use spell check.

In the issue they talked about AJAX suggestions, but that would IMO happen
before the actual search request. If you want to have it in the request
handler, then you need to decide how and when the spellchecker comes into
play.

For example, it could run only if the normal search does not return a
result, or in parallel. The parallel approach would search the spell check
index for alternatives, use these alternatives to dispatch the alternative
word query, and later on pass the result directly into the output writer.
Here again you have different alternatives: you can attack the Solr index
directly (losing all the cool features).

Or you want the Google thingy "Did you mean".
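
The first option above, consulting the spell checker only when the normal
search returns nothing, can be sketched roughly like this. It is plain Java
with no Solr or Lucene calls: the two Function arguments are stand-ins for
the main index and the spellcheck index, and all names (DidYouMeanSketch,
Result, searchWithFallback) are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch; none of these types exist in Solr.
public class DidYouMeanSketch {

    // Search hits plus any "did you mean" suggestions.
    public static final class Result {
        public final List<String> hits;
        public final List<String> suggestions;

        public Result(List<String> hits, List<String> suggestions) {
            this.hits = hits;
            this.suggestions = suggestions;
        }
    }

    // Fallback strategy: the spell checker is asked only when the
    // main search comes back empty.
    public static Result searchWithFallback(
            String query,
            Function<String, List<String>> mainSearch,
            Function<String, List<String>> spellCheck) {
        List<String> hits = mainSearch.apply(query);
        if (!hits.isEmpty()) {
            return new Result(hits, Collections.<String>emptyList());
        }
        return new Result(hits, spellCheck.apply(query));
    }

    public static void main(String[] args) {
        // Stand-in main index that only knows the correctly spelled term.
        Function<String, List<String>> mainSearch = q ->
                q.equals("spears") ? Arrays.asList("doc1", "doc2")
                                   : Collections.<String>emptyList();
        // Stand-in spellcheck index with a single known correction.
        Function<String, List<String>> spellCheck = q ->
                q.equals("speers") ? Collections.singletonList("spears")
                                   : Collections.<String>emptyList();

        Result r = searchWithFallback("speers", mainSearch, spellCheck);
        System.out.println("hits=" + r.hits + " didYouMean=" + r.suggestions);
    }
}
```

The parallel variant would call both functions unconditionally and merge
the two results before writing the response.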

... in any case, start with:
public class NGramRequestHandler extends StandardRequestHandler
        implements SolrRequestHandler, SolrInfoMBean {
    public void handleRequest(SolrQueryRequest req, SolrQueryResponse rsp) {
        // Depending on the use case, do your processing here
    }
}

This way you just need to implement the class-specific methods.


> 
> Damn, that's a lot of unknowns... on top of that my computer started freezing every half an hour.  Hi Murphy.
> 
> 
> 
> Any pointers will be greatly appreciated. Thanks,

HTH a wee bit.

salu2

> Otis
> 
> 
>