Posted to solr-user@lucene.apache.org by anuvenk <an...@hotmail.com> on 2008/01/05 01:15:48 UTC

spellcheckhandler

Is it possible to implement something like this with the spellcheckhandler,
the way Google does it?

Say I search for 'chater 13 bakrupcy'.

It should be able to display this:

did you search for 'chapter 13 bankruptcy'

Has anyone been able to do this?
-- 
View this message in context: http://www.nabble.com/spellcheckhandler-tp14627712p14627712.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: spellcheckhandler

Posted by anuvenk <an...@hotmail.com>.

I followed the steps outlined in
http://wiki.apache.org/solr/SpellCheckerRequestHandler
for setting up the schema with a new field 'spell' and copying other
fields to this 'spell' field at index time.
It works fine with single-word queries but doesn't return anything for
multi-word queries. I read previous posts where this has been discussed, and
I understand that some of the active members are in the process of releasing
patches that fix this problem. I'm trying to implement this spell check in a
production setup. Is it really not possible to get spell check results back
for multi-word queries? Should I wait for the 1.3 release? If there is any
other option, please let me know. If a patch has already been released, how
do I add it to the 1.2 version that I'm currently using?

Re: spellcheckhandler

Posted by anuvenk <an...@hotmail.com>.

I was going to do this: create a new field (the termSourceField) called
'spell'

<field name="spell" type="spell" indexed="true" stored="false"
       multiValued="true"/>

of type 'spell'

<fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

and copy my 'name' and 'body' fields to this 'spell' field at index time:

<copyField source="name" dest="spell"/>
<copyField source="body" dest="spell"/>

But, as you mentioned, the tutorial says the spell checker has to be built on
a field that is not tokenized. How do I use my tokenized fields 'body' and
'name' to build my spell index?

How do I use it effectively for spell checking multi-word queries?



Re: spellcheckhandler

Posted by John Stewart <ca...@gmail.com>.

The way we do this with Solr 1.2 (the current release), inspired by a
discussion on the ML, is to build a spellcheck dictionary containing the
relevant collocations, such as the one in your example, based on a custom
field that is effectively not tokenized. We actually create dummy documents
for this, since each true document may give rise to more than one such
dictionary entry.
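
As an illustration only (not the actual data used here), a dummy-document
update message might look like the sketch below. It assumes a schema with a
unique key field named "id" and an untokenized field named "spell_phrase";
both names are hypothetical. Each dictionary entry gets its own document, so
one real document can contribute several phrases.

```xml
<!-- Hypothetical Solr XML update message: one dummy document per entry. -->
<!-- "id" and "spell_phrase" are assumed field names, not from this thread. -->
<add>
  <doc>
    <field name="id">spell-chapter-13-bankruptcy</field>
    <field name="spell_phrase">chapter 13 bankruptcy</field>
  </doc>
  <doc>
    <field name="id">spell-ipod-nano</field>
    <field name="spell_phrase">ipod nano</field>
  </doc>
</add>
```

Because "spell_phrase" is assumed to be untokenized, each value above would
end up in the spellcheck dictionary as a single multi-word term.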

A potential downside of this approach is that, depending on the length
of the dictionary entries, queries that only specify a small subset of
a particular entry may not match.

There have been many useful revisions to the spellchecker in the
1.3-dev branch, so check there first.

jds


RE: spellcheckhandler

Posted by Lance Norskog <go...@gmail.com>.

We use Solr 1.2. I copied the 1.2 spellchecker and made an equivalent
phrase-pair index generator.  Using this we can take an example spelling and
find example word pairs for each suggestion.  We have not deployed this.

Lance Norskog

-----Original Message-----
From: Mike Klaas [mailto:mike.klaas@gmail.com] 
Sent: Tuesday, January 29, 2008 1:48 PM
To: solr-user@lucene.apache.org
Subject: Re: spellcheckhandler

On 26-Jan-08, at 5:51 PM, anuvenk wrote:

>
> Thanks a lot for clearing my doubts. Would you know if the solr wiki 
> is up to date with the documentation for the new features that are 
> being added? I totally rely on the solr wiki documentation for my 
> project. If you may, please send me the files you had mentioned and 
> i'll be happy to test them. I appreciate your help !!

anuvenk,

Multi-word spell checking is available only with extendedResults=true, and
only in trunk.  I believe that the current javadocs are incorrect on this
point.

-Mike

Re: spellcheckhandler

Posted by Mike Klaas <mi...@gmail.com>.

On 26-Jan-08, at 5:51 PM, anuvenk wrote:

>
> Thanks a lot for clearing my doubts. Would you know if the solr wiki is
> up to date with the documentation for the new features that are being
> added? I totally rely on the solr wiki documentation for my project. If
> you may, please send me the files you had mentioned and i'll be happy to
> test them. I appreciate your help !!

anuvenk,

Multi-word spell checking is available only with  
extendedResults=true, and only in trunk.  I believe that the current  
javadocs are incorrect on this point.

-Mike

Re: spellcheckhandler

Posted by anuvenk <an...@hotmail.com>.

Thanks a lot for clearing up my doubts. Do you know whether the Solr wiki is
up to date with the documentation for the new features that are being added?
I rely entirely on the Solr wiki documentation for my project. If you can,
please send me the files you mentioned and I'll be happy to test them. I
appreciate your help!

scott.tabar wrote:
> 
> Anuvenk,
> 
> Sorry for this "Third" email, but I was reading your question below and I
> think it warrants yet another reply.
> 
> Just some background on my focus and involvement, and hence the
> generation of the JavaDocs.  I was primarily interested in having a
> Solr-based spell checker that behaved more like a traditional spell checker.
> In my application, when I generated the input into Solr for inclusion in
> the spell checker index, I was only interested in single words and not
> multi-word sets.  My intention was to send multiple words to the handler
> and have it return details on each word independently when the parameter
> multiWords was set; otherwise it was to use all input words as a single
> check against the handler.  As such, in my original efforts, I had no
> multiple words in a single term, as you were asking below.  That is not to
> say it is not possible, but I just wanted to let you know the original
> focus of my work.
> 
> I did look a little closer at the JavaDocs and it looks like they have
> been updated from what I originally generated.  So perhaps they may be up
> to date?
> 
> One thing I would like to point out is that I put some effort into
> creating a test case for the SpellCheckerRequestHandler.  If it still
> exists (I have not checked the head for a long time) then it would be a
> good starting point for some simple testing with limited data sets of
> your own.  Just make a copy of it, then feed in multi-word terms and
> see how it responds to the different settings.  This will also allow you
> to play around with the configuration settings in the schema and
> solrconfig files without impacting your actual Solr instance, and the
> turnaround time could be seconds rather than minutes for each new test.
> 
> The locations in svn and file names of the unit tests that I created were:
>   /test/test-files/solr/conf/schema-spellchecker.xml
>   /test/test-files/solr/conf/solrconfig-spellchecker.xml
>   /test/org/apache/solr/handler/SpellCheckerRequestHandlerTest.java
> 
> If these do not exist in svn currently, let me know and I can pass
> along the contents and you can recreate them locally to test with.
> 
>   Best of luck,
>     Scott Tabar
> 

Re: spellcheckhandler

Posted by anuvenk <an...@hotmail.com>.


I tried the latest nightly build. The problem still exists.
I tested with the example data that comes with the Solr package.
1) With termSourceField set to 'word', which is a string fieldtype,
q=iped nano returns 'ipod nano', which is good.

2) With termSourceField set to 'spell' (the catchall field of 'spell'
fieldtype from the tutorial
http://wiki.apache.org/solr/SpellCheckerRequestHandler
that has my text fields copied into it at index time),
q=grapics returns 'graphics', which is good,
but q=grapics card returns nothing.

Not sure if I'm missing something. Please help!
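
For context, the termSourceField switched between the two tests is an init
parameter of the handler in solrconfig.xml. A sketch, assuming the handler is
registered the way the stock Solr 1.2 example configuration does it (the
handler name, defaults and index directory are that example's values and may
differ in another setup):

```xml
<!-- solrconfig.xml sketch, modeled on the Solr 1.2 example configuration. -->
<requestHandler name="spellchecker" class="solr.SpellCheckerRequestHandler"
                startup="lazy">
  <!-- Defaults for request parameters such as those used in this thread. -->
  <lst name="defaults">
    <int name="suggestionCount">1</int>
    <float name="accuracy">0.5</float>
  </lst>
  <!-- Where the spell index is stored, relative to the Solr data dir. -->
  <str name="spellcheckerIndexDir">spell</str>
  <!-- Test 1 used 'word' (string type); test 2 used 'spell' (tokenized). -->
  <str name="termSourceField">word</str>
</requestHandler>
```

After changing termSourceField, the spell index presumably has to be rebuilt
before suggestions reflect the new field; the wiki page linked above
describes a cmd=rebuild request for that.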


Otis Gospodnetic wrote:
> 
> You don't need to wait for 1.3 to be released - you can simply use a
> recent nightly build.
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> ----- Original Message ----
> From: anuvenk <an...@hotmail.com>
> To: solr-user@lucene.apache.org
> Sent: Monday, January 21, 2008 12:35:52 AM
> Subject: Re: spellcheckhandler
> 
> 
> I followed the steps outlined in 
> http://wiki.apache.org/solr/SpellCheckerRequestHandler
> with regards to setting up of the schema with a new field 'spell' and
> copying other fields to this 'spell' field at index time.
> It works fine with single word queries but doesn't return anything for
> multi-word queries. I read previous posts where this has been discussed. I
> read that some of the active members are in the process of releasing patches
> that fixes this problem. I'm actually trying to implement this spell check
> in the production set up. Is it absolutely not possible to get spell check
> results back for multi-word queries, should i wait for 1.3 release. If there
> is any other option please educate me. In case a patch was already released,
> how to add it to the current 1.2 version that i'm using?


Re: spellcheckhandler

Posted by anuvenk <an...@hotmail.com>.

I tried the latest nightly build and followed the steps outlined in
http://wiki.apache.org/solr/SpellCheckerRequestHandler
for creating a new catchall field 'spell' of type 'spell', and
copied my text fields to 'spell' at index time.
q=grapics still returns 'graphics',
but q=grapics card returns nothing.
The same queries do return the correct spelling with string fieldtypes.
Is any fix available?


Re: spellcheckhandler

Posted by anuvenk <an...@hotmail.com>.

Thanks. But I'm looking at this example
http://.../spellchecker?indent=on&onlyMorePopular=true&accuracy=.6&suggestionCount=20&q=facial+salophosphoprotein
on
http://lucene.apache.org/solr/api/org/apache/solr/handler/SpellCheckerRequestHandler.html
It seems to return results (in the example, at least)
with and without extendedResults=true.
Does that mean that 'facial salophosphoprotein' was a single term in the
index?



Re: spellcheckhandler

Posted by Chris Hostetter <ho...@fucit.org>.

: 
: I did try with the latest nightly build and followed the steps outlined in
: http://wiki.apache.org/solr/SpellCheckerRequestHandler
: with regards to creating new catchall field 'spell' of type 'spell' and
: copied my text fields to 'spell' at index time.
: Still q=grapics returns 'graphics'
: but q=grapics card returns nothing.
: But the same queries return the correct spelling with string fieldtypes.
: Any fix available? 

I don't think Otis was suggesting any specific fix was available in the 
nightly builds, i believe he was just pointing out that if there 
was a bug someone committed a fix for, you didn't need to wait for 1.3 -- 
you can test it now using the nightly builds.

That said: I don't see any currently open or recently resolved bugs 
related to spellchecking and multiple words ... i believe (but i'm not 
100% positive) that "multi word" spell correction will work, as long as 
your dictionary contains those "multiple words" as individual "terms"

ie: if you want "graphics card" to be a suggestion for "grapics card" then 
you need to use a termSourceField in which "graphics card" is a single 
term (either because it is untokenized, or maybe because you use a 
word-based ngram tokenfilter, etc...)
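
A minimal schema.xml sketch of the untokenized option, assuming a field
called "spell_phrase" (a hypothetical name) and the 'name' field mentioned
earlier in the thread as its source. This is one possible setup, not a
confirmed fix:

```xml
<!-- schema.xml sketch. "spell_phrase" is a hypothetical field name.       -->
<!-- solr.StrField applies no analysis, so each value is indexed as        -->
<!-- exactly one term, spaces included (and case is left unchanged).       -->
<fieldType name="string" class="solr.StrField"/>

<field name="spell_phrase" type="string" indexed="true" stored="false"
       multiValued="true"/>

<!-- Copying 'name' turns each whole name value into one dictionary entry. -->
<!-- Long fields such as 'body' would produce whole-document "terms",      -->
<!-- which are unlikely to be useful as suggestions.                       -->
<copyField source="name" dest="spell_phrase"/>
```

With termSourceField pointed at such a field, "graphics card" can only be
suggested if some indexed value is exactly "graphics card", so this suits
short phrase-like fields.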

alternately, if you want to get "graphics asdfghjk" as a suggestion for
"grapics asdfghjk" (even though "asdfghjk" isn't in your index at all), 
hitting the spellcorrection handler for each input word individually is 
probably your best bet.


: > You don't need to wait for 1.3 to be released - you can simply use a
: > recent nightly build.


-Hoss
