Posted to solr-user@lucene.apache.org by Simone Tripodi <si...@apache.org> on 2011/10/18 14:05:47 UTC

IndexBasedSpellChecker on multiple fields

Hi all,
I need to configure an IndexBasedSpellChecker that uses more than
just one field as a spelling dictionary; is that possible?
In the meantime I configured two spellcheckers and let users switch
from one checker to the other via params on the GET request, but it looks like
people are not particularly happy about it...
The main problem is that the fields I need to spell-check contain different
information; I mean the intersection between the two sets could be
empty.
Many thanks in advance, all the best!
Simo

http://people.apache.org/~simonetripodi/
http://simonetripodi.livejournal.com/
http://twitter.com/simonetripodi
http://www.99soft.org/

RE: IndexBasedSpellChecker on multiple fields

Posted by "Dyer, James" <Ja...@ingrambook.com>.
Here's approximately how I've got it set up to do essentially the same thing, in one of our production indexes:
-----------
schema.xml has:

<fieldType name="text_spelling" class="solr.TextField" positionIncrementGap="100">
 <!-- whitespace analyzer, stop-word filter, word-delimiter filter, lowercase filter ... or whatever your app needs -->
</fieldType>

<field name="abstract"... />
<field name="subject" ... />
<field name="spelling_abstract_subject" type="text_spelling" indexed="true" stored="false" multiValued="true" omitNorms="true" />

<copyField source="abstract" dest="spelling_abstract_subject" />
<copyField source="subject" dest="spelling_abstract_subject" />
-------------
solrconfig.xml has:

<requestHandler name="search_abstract_and_subject" class="solr.SearchHandler" >
 <lst name="defaults">
  <str name="defType">edismax</str>
  <str name="echoParams">explicit</str>
  <float name="tie">0.01</float>
  <str name="qf">abstract subject</str>
  <str name="q.alt">*:*</str>
  <str name="spellcheck">true</str>
  <str name="spellcheck.dictionary">spellchecker_abstract_subject</str>
  <str name="spellcheck.count">10</str>
  <str name="spellcheck.collate">true</str>
  <str name="spellcheck.maxCollationTries">10</str>
  <str name="spellcheck.maxCollations">1</str>
  <str name="spellcheck.collateExtendedResults">true</str>
 </lst>
 <arr name="last-components">
  <str>spellcheck</str>
 </arr> 
</requestHandler>

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
 <str name="queryAnalyzerFieldType">text_spelling</str>
 <lst name="spellchecker">
  <str name="name">spellchecker_abstract_subject</str>
  <str name="field">spelling_abstract_subject</str>
  <str name="fieldType">text_spelling</str>
  <str name="spellcheckIndexDir">./spellchecker</str>
 </lst> 
</searchComponent>
---------------
You can then query across the 2 fields and get spell suggestions like this:
  q=query goes here&qt=search_abstract_and_subject

Of course if this is the first query since startup/commit, unless you're building automatically somehow, add:
&spellcheck.build=true
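
One way to "build automatically", sketched under the assumption that the spellchecker definition is the one shown above: IndexBasedSpellChecker accepts a buildOnCommit option, which rebuilds the spelling index after every commit. Only the buildOnCommit line is new here; everything else is copied from the configuration above.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
 <str name="queryAnalyzerFieldType">text_spelling</str>
 <lst name="spellchecker">
  <str name="name">spellchecker_abstract_subject</str>
  <str name="field">spelling_abstract_subject</str>
  <str name="fieldType">text_spelling</str>
  <str name="spellcheckIndexDir">./spellchecker</str>
  <!-- Assumption: rebuild the spelling index after each commit, so that
       spellcheck.build=true is not needed on the first query -->
  <str name="buildOnCommit">true</str>
 </lst>
</searchComponent>
```

Rebuilding on every commit can slow commits down on a large index, so treat this as a starting point rather than a recommendation.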

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: simone.tripodi@gmail.com [mailto:simone.tripodi@gmail.com] On Behalf Of Simone Tripodi
Sent: Thursday, October 20, 2011 8:58 AM
To: solr-user@lucene.apache.org
Subject: Re: IndexBasedSpellChecker on multiple fields

Hi James,
sorry for the noise, but I am not able to use the approach described;
I'm sure I'm misconfiguring something.

Basically, I have 2 fields, `abstract` and `subject`, and a field
`master-dictionary` into which the first two have been copied.
Then, in solrconfig.xml I configured the SpellCheckComponent, which
runs its checks on the `master-dictionary` field...
When I start Solr, it raises an exception:

Oct 20, 2011 3:51:00 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Specified dictionary
does not exist.
	at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:164)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
	at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:54)
	at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1177)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:680)

Could you please help me by checking this schema [1]?

Many thanks in advance, all the best!
Simo

[1] https://gist.github.com/1301194

http://people.apache.org/~simonetripodi/
http://simonetripodi.livejournal.com/
http://twitter.com/simonetripodi
http://www.99soft.org/



On Wed, Oct 19, 2011 at 9:39 AM, Simone Tripodi
<si...@apache.org> wrote:
> Hi James!
> terrific suggestion, thanks a lot!!! And sorry for the delay (due to
> my timezone ;) )
> I'll let you know how things will go, thanks once again and have a nice day!
> Simo
>
> http://people.apache.org/~simonetripodi/
> http://simonetripodi.livejournal.com/
> http://twitter.com/simonetripodi
> http://www.99soft.org/
>
>
>
> On Tue, Oct 18, 2011 at 5:16 PM, Dyer, James <Ja...@ingrambook.com> wrote:
>> Simone,
>>
>> You can set up a "master" dictionary, but with a few caveats. What you'll need to do is <copyField> all of the fields you want to include in your "master" dictionary into one field and base your IndexBasedSpellChecker dictionary on that. In addition, I would recommend you use the "collate" feature and set "spellcheck.maxCollationTries" to something greater than zero (5-10 is usually good). Otherwise, you will probably get a lot of ridiculous suggestions, because it will try to correct words from one field with values from another. See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate for more information.
>>
>> There is still a big problem with this approach, however. Unless you set "onlyMorePopular=true", Solr will never suggest a correction for a word that exists in the dictionary. By creating a huge "master" dictionary, you increase the chances that Solr will assume your users' misspelled words are in fact correct. One way to work around this is, instead of blindly using "copyField", to hand-pick a subset of your terms for the master field on which you base your dictionary. Another workaround is to use "onlyMorePopular", although this has its own problems. See the discussion for SOLR-2585 (https://issues.apache.org/jira/browse/SOLR-2585), which aims to solve these problems.
>>
>> James Dyer
>> E-Commerce Systems
>> Ingram Content Group
>> (615) 213-4311
>>
>>
>> -----Original Message-----
>> From: simone.tripodi@gmail.com [mailto:simone.tripodi@gmail.com] On Behalf Of Simone Tripodi
>> Sent: Tuesday, October 18, 2011 7:06 AM
>> To: solr-user@lucene.apache.org
>> Subject: IndexBasedSpellChecker on multiple fields
>>
>> Hi all,
>> I need to configure an IndexBasedSpellChecker that uses more than
>> just one field as a spelling dictionary; is that possible?
>> In the meantime I configured two spellcheckers and let users switch
>> from one checker to the other via params on the GET request, but it looks like
>> people are not particularly happy about it...
>> The main problem is that the fields I need to spell-check contain different
>> information; I mean the intersection between the two sets could be
>> empty.
>> Many thanks in advance, all the best!
>> Simo
>>
>> http://people.apache.org/~simonetripodi/
>> http://simonetripodi.livejournal.com/
>> http://twitter.com/simonetripodi
>> http://www.99soft.org/
>>
>

Re: IndexBasedSpellChecker on multiple fields

Posted by Simone Tripodi <si...@apache.org>.
Hi James,
sorry for the noise, but I am not able to use the approach described;
I'm sure I'm misconfiguring something.

Basically, I have 2 fields, `abstract` and `subject`, and a field
`master-dictionary` into which the first two have been copied.
Then, in solrconfig.xml I configured the SpellCheckComponent, which
runs its checks on the `master-dictionary` field...
When I start Solr, it raises an exception:

Oct 20, 2011 3:51:00 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Specified dictionary
does not exist.
	at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:164)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
	at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:54)
	at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1177)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:680)

Could you please help me by checking this schema [1]?

Many thanks in advance, all the best!
Simo

[1] https://gist.github.com/1301194

http://people.apache.org/~simonetripodi/
http://simonetripodi.livejournal.com/
http://twitter.com/simonetripodi
http://www.99soft.org/
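
A note on the exception above, offered as a sketch rather than a diagnosis, since the contents of the gist [1] are not reproduced in this thread. SpellCheckComponent raises "Specified dictionary does not exist" when the dictionary named by the spellcheck.dictionary parameter (or "default", when the parameter is absent) does not match the name of any configured spellchecker. The trace passes through QuerySenderListener, so the failing request appears to be a warming query defined in solrconfig.xml. The fragment below shows the two names that have to agree; "master_spellchecker" and the handler name "search_master" are hypothetical, while the field name master-dictionary comes from the message above.

```xml
<!-- Sketch only: "master_spellchecker" and "search_master" are hypothetical
     names; the field name "master-dictionary" is taken from the message above. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
 <lst name="spellchecker">
  <!-- (1) the name under which this dictionary is registered -->
  <str name="name">master_spellchecker</str>
  <str name="field">master-dictionary</str>
  <str name="spellcheckIndexDir">./spellchecker</str>
 </lst>
</searchComponent>

<requestHandler name="search_master" class="solr.SearchHandler">
 <lst name="defaults">
  <str name="spellcheck">true</str>
  <!-- (2) must be identical to (1); when omitted, Solr looks for a
       spellchecker named "default" -->
  <str name="spellcheck.dictionary">master_spellchecker</str>
 </lst>
 <arr name="last-components">
  <str>spellcheck</str>
 </arr>
</requestHandler>
```

Any warming query that enables spellcheck would need to resolve to the same dictionary name, either through handler defaults like these or through an explicit spellcheck.dictionary parameter.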




Re: IndexBasedSpellChecker on multiple fields

Posted by Simone Tripodi <si...@apache.org>.
Hi James!
terrific suggestion, thanks a lot!!! And sorry for the delay (due to
my timezone ;) )
I'll let you know how things will go, thanks once again and have a nice day!
Simo

http://people.apache.org/~simonetripodi/
http://simonetripodi.livejournal.com/
http://twitter.com/simonetripodi
http://www.99soft.org/




RE: IndexBasedSpellChecker on multiple fields

Posted by "Dyer, James" <Ja...@ingrambook.com>.
Simone,

You can set up a "master" dictionary, but with a few caveats. What you'll need to do is <copyField> all of the fields you want to include in your "master" dictionary into one field and base your IndexBasedSpellChecker dictionary on that. In addition, I would recommend you use the "collate" feature and set "spellcheck.maxCollationTries" to something greater than zero (5-10 is usually good). Otherwise, you will probably get a lot of ridiculous suggestions, because it will try to correct words from one field with values from another. See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate for more information.

There is still a big problem with this approach, however. Unless you set "onlyMorePopular=true", Solr will never suggest a correction for a word that exists in the dictionary. By creating a huge "master" dictionary, you increase the chances that Solr will assume your users' misspelled words are in fact correct. One way to work around this is, instead of blindly using "copyField", to hand-pick a subset of your terms for the master field on which you base your dictionary. Another workaround is to use "onlyMorePopular", although this has its own problems. See the discussion for SOLR-2585 (https://issues.apache.org/jira/browse/SOLR-2585), which aims to solve these problems.
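
The request-side settings described above can be sketched as request handler defaults. This is a minimal sketch, not a tested configuration: the handler name "search_master" and the dictionary name "master_spellchecker" are hypothetical, while the spellcheck.* parameters are the ones named in the text.

```xml
<!-- Sketch only: handler and dictionary names are hypothetical. -->
<requestHandler name="search_master" class="solr.SearchHandler">
 <lst name="defaults">
  <str name="spellcheck">true</str>
  <str name="spellcheck.dictionary">master_spellchecker</str>
  <!-- Collate, and test up to 10 candidate collations against the index
       so that cross-field nonsense suggestions are filtered out -->
  <str name="spellcheck.collate">true</str>
  <str name="spellcheck.maxCollationTries">10</str>
  <!-- Optional workaround: also offer suggestions for words that already
       exist in the dictionary; it has its own problems (see SOLR-2585) -->
  <str name="spellcheck.onlyMorePopular">true</str>
 </lst>
 <arr name="last-components">
  <str>spellcheck</str>
 </arr>
</requestHandler>
```

Leaving out the spellcheck.onlyMorePopular line gives the default behaviour, in which a word found in the dictionary is never corrected.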

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

