Posted to solr-dev@lucene.apache.org by "Otis Gospodnetic (JIRA)" <ji...@apache.org> on 2007/02/05 19:04:15 UTC

[jira] Commented: (SOLR-81) Add Query Spellchecker functionality

    [ https://issues.apache.org/jira/browse/SOLR-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470295 ] 

Otis Gospodnetic commented on SOLR-81:
--------------------------------------

Adam,

I took a look at your patch.  It looks like you copied several n-gram tokenizer classes, along with their unit tests, from Lucene's contrib/analyzers/.... where I had put them.  Did you do this on purpose?  I intentionally put those n-gram tokenizers under Lucene's contrib because they are generic rather than Solr-specific.  That is why my patch contains only Solr-specific classes:

src/java/org/apache/solr/analysis/EdgeNGramTokenizerFactory.java
src/java/org/apache/solr/analysis/NGramTokenizerFactory.java
src/java/org/apache/solr/analysis/BaseTokenizerFactory.java

Instead of copying the source classes from Lucene's contrib/analyzers/...., my patch adds the new jar built from those sources:
lib/lucene-analyzers-2.1-dev.jar

Plus:
lib/lucene-spellchecker-2.1-dev.jar
example/solr/conf/schema.xml

I have some locally modified code for this issue that was not part of the first patch.  I was going to attach an updated patch, assuming you didn't really want those few generic tokenizer classes copied from Lucene into Solr.  But because our changes now live in two places, so to speak, let's do the following to unify our work:

Could you please:
- open a new LUCENE issue, or just reopen the one where I originally attached this code, and post your patch for the Lucene tokenizers there.
- prepare a new patch for this issue that contains only the Solr-specific classes (see above), plus those two jars.

I'll upload my schema.xml patch so you can see my configuration (your patch didn't include it) and make sure your code changes are in sync with it.

Finally, are you making use of this code somehow already?
One thing that is completely missing from this patch is the RequestHandler that takes the input (a query string) and gets suggestions for alternative spellings via a SpellChecker instance.  I have some NGramRequestHandler code locally, but it is unfinished.
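For illustration only, here is a minimal in-memory sketch of the kind of lookup such a handler would perform: index each dictionary word by its 3-grams, collect the words that share at least one gram with the query term, and rank them by Levenshtein edit distance (ties broken by the number of shared grams). The class NGramSuggester and its methods are hypothetical. This is not the Lucene SpellChecker API or the unfinished NGramRequestHandler, and it keeps the gram index in a HashMap instead of a Lucene index.

```java
import java.util.*;

/**
 * Hypothetical sketch of n-gram based spelling suggestion.
 * Not the Lucene SpellChecker API; everything is held in memory.
 */
public class NGramSuggester {
    private final int gramSize;
    // gram -> dictionary words that contain that gram
    private final Map<String, Set<String>> index = new HashMap<>();

    public NGramSuggester(Collection<String> dictionary, int gramSize) {
        this.gramSize = gramSize;
        for (String word : dictionary) {
            for (String g : grams(word, gramSize)) {
                index.computeIfAbsent(g, k -> new TreeSet<>()).add(word);
            }
        }
    }

    /** Contiguous substrings of length n; a word of length <= n yields just itself. */
    public static List<String> grams(String word, int n) {
        List<String> out = new ArrayList<>();
        if (word.length() <= n) {
            out.add(word);
            return out;
        }
        for (int i = 0; i + n <= word.length(); i++) {
            out.add(word.substring(i, i + n));
        }
        return out;
    }

    /** Levenshtein distance using two rolling rows of the DP table. */
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    /** Words sharing at least one gram with input, closest first; input itself is excluded. */
    public List<String> suggest(String input, int max) {
        Map<String, Integer> shared = new HashMap<>();
        for (String g : grams(input, gramSize)) {
            for (String w : index.getOrDefault(g, Collections.emptySet())) {
                shared.merge(w, 1, Integer::sum);
            }
        }
        shared.remove(input);
        List<String> candidates = new ArrayList<>(shared.keySet());
        candidates.sort(Comparator
                .comparingInt((String w) -> editDistance(input, w))
                .thenComparingInt(w -> -shared.get(w))
                .thenComparing(Comparator.naturalOrder()));
        return new ArrayList<>(candidates.subList(0, Math.min(max, candidates.size())));
    }

    public static void main(String[] args) {
        NGramSuggester s = new NGramSuggester(
                Arrays.asList("lettuce", "letter", "lattice", "tomato"), 3);
        System.out.println(s.suggest("letuce", 2)); // prints [lettuce, letter]
    }
}
```

"lettuce" ranks first because it shares three grams with "letuce" and is one insertion away; "letter" shares only "let" and is three edits away. A real handler would get these suggestions from a SpellChecker instance backed by the n-gram index rather than from an in-memory map.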


> Add Query Spellchecker functionality
> ------------------------------------
>
>                 Key: SOLR-81
>                 URL: https://issues.apache.org/jira/browse/SOLR-81
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Otis Gospodnetic
>            Priority: Minor
>         Attachments: SOLR-81-edgengram-ngram.patch, SOLR-81-ngram.patch
>
>
> Use the simple approach of n-gramming outside of Solr and indexing n-gram documents.  For example:
> <doc>
> <field name="word">lettuce</field>
> <field name="start3">let</field>
> <field name="gram3">let ett ttu tuc uce</field>
> <field name="end3">uce</field>
> <field name="start4">lett</field>
> <field name="gram4">lett ettu ttuc tuce</field>
> <field name="end4">tuce</field>
> </doc>
> See:
> http://www.mail-archive.com/solr-user@lucene.apache.org/msg01254.html
> Java clients: SOLR-20 (add, delete, commit, optimize), SOLR-30 (search)
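The n-gram document layout in the issue description can be generated outside Solr with a few lines of code. The following is a hedged sketch, assuming gram sizes 3 and 4 as in the example: startN is the first N characters of the word, gramN lists every contiguous N-character substring, and endN is the last N characters. The class NGramDocBuilder and its methods are hypothetical, and the output is a plain XML string rather than a document sent through one of the Java clients.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: renders the SOLR-81 n-gram <doc> layout
 * for a single dictionary word, outside of Solr.
 */
public class NGramDocBuilder {

    /** Space-separated contiguous n-grams, e.g. "let ett ttu tuc uce" for "lettuce", n = 3. */
    public static String grams(String word, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            out.add(word.substring(i, i + n));
        }
        return String.join(" ", out);
    }

    /** Builds the doc for gram sizes minN..maxN; sizes longer than the word are skipped. */
    public static String toDoc(String word, int minN, int maxN) {
        StringBuilder sb = new StringBuilder("<doc>\n");
        field(sb, "word", word);
        for (int n = minN; n <= maxN && n <= word.length(); n++) {
            field(sb, "start" + n, word.substring(0, n));               // leading edge gram
            field(sb, "gram" + n, grams(word, n));                      // all inner grams
            field(sb, "end" + n, word.substring(word.length() - n));    // trailing edge gram
        }
        return sb.append("</doc>").toString();
    }

    private static void field(StringBuilder sb, String name, String value) {
        // No XML escaping: assumes plain alphabetic dictionary words.
        sb.append("<field name=\"").append(name).append("\">")
          .append(value).append("</field>\n");
    }

    public static void main(String[] args) {
        System.out.println(toDoc("lettuce", 3, 4));
    }
}
```

For "lettuce" with sizes 3 to 4, main prints the example document above, field for field.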
