Posted to solr-user@lucene.apache.org by Rob Casson <ro...@gmail.com> on 2007/11/29 01:25:20 UTC
LowerCaseFilterFactory and spellchecker
think i'm just doing something wrong...
was experimenting with the spellcheck handler with the nightly
checkout from 11-28; seems my spellchecking is case-sensitive, even
tho i think i'm adding the LowerCaseFilterFactory to both the index
and query analyzers.
here's a brief rundown of my testing steps.
from schema.xml:
<fieldtype name="spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
<field name="title" type="text" indexed="true" stored="true"
       multiValued="true"/>
<field name="spelling" type="spell" indexed="true" stored="stored"
       multiValued="true"/>
<copyField source="title" dest="spelling"/>
--------------------------------
from solrconfig.xml:
<requestHandler name="spellchecker"
                class="solr.SpellCheckerRequestHandler" startup="lazy">
  <lst name="defaults">
    <int name="suggestionCount">1</int>
    <float name="accuracy">0.5</float>
  </lst>
  <str name="spellcheckerIndexDir">spell</str>
  <str name="termSourceField">spelling</str>
</requestHandler>
--------------------------------
adding the doc:
curl http://localhost:8983/solr/update -H "Content-Type: text/xml" \
  --data-binary '<add><doc><field name="title">Thorne</field></doc></add>'
curl http://localhost:8983/solr/update -H "Content-Type: text/xml" \
  --data-binary '<optimize />'
--------------------------------
building the spellchecker:
http://localhost:8983/solr/select/?q=Thorne&qt=spellchecker&cmd=rebuild
--------------------------------
querying the spellchecker:
results from http://localhost:8983/solr/select/?q=Thorne&qt=spellchecker
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <str name="words">Thorne</str>
  <str name="exist">false</str>
  <arr name="suggestions">
    <str>thorne</str>
  </arr>
</response>
results from http://localhost:8983/solr/select/?q=thorne&qt=spellchecker
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
  </lst>
  <str name="words">thorne</str>
  <str name="exist">true</str>
  <arr name="suggestions"/>
</response>
any pointers as to what i'm doing wrong, misinterpreting? i suspect
i'm just doing something bone-headed in the analyzer sections...
thanks as always,
rob casson
miami university libraries
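To compare the two responses above side by side, here is a minimal Python sketch that pulls the words, exist and suggestions values out of a SpellCheckerRequestHandler response. The element names are taken from the output shown in this message; the helper name parse_spellcheck_response is hypothetical, and it only handles this flat response layout.

```python
import xml.etree.ElementTree as ET

def parse_spellcheck_response(xml_text):
    """Return (word, exists, suggestions) from a SpellCheckerRequestHandler
    XML response shaped like the two shown above (hypothetical helper)."""
    if isinstance(xml_text, str):
        # Parse bytes so the XML encoding declaration is handled by the parser.
        xml_text = xml_text.encode("utf-8")
    root = ET.fromstring(xml_text)
    word = root.findtext("str[@name='words']")
    exists = root.findtext("str[@name='exist']") == "true"
    suggestions = [s.text for s in root.findall("arr[@name='suggestions']/str")]
    return word, exists, suggestions

UPPER = """<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<str name="words">Thorne</str>
<str name="exist">false</str>
<arr name="suggestions"><str>thorne</str></arr>
</response>"""

LOWER = """<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">2</int></lst>
<str name="words">thorne</str>
<str name="exist">true</str>
<arr name="suggestions"/>
</response>"""

print(parse_spellcheck_response(UPPER))  # ('Thorne', False, ['thorne'])
print(parse_spellcheck_response(LOWER))  # ('thorne', True, [])
```

Read this way, "Thorne" is reported as missing while its lower-cased form exists, which is exactly the case-sensitivity described in the question.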
RE: LowerCaseFilterFactory and spellchecker
Posted by "Norskog, Lance" <la...@divvio.com>.
Oops, sorry, didn't think that through.
The query to the spellchecker is not run through the field's query
analyzer. You have to do your own lower-case transformation when you
do the query. This is a simple thing to resolve. But I'm working with
international alphabets, and I would like 'protege' and 'protégé' (both
e's accented) to match. The ISOLatin1 filter does this in indexing
and querying, but I have to rip off the code and use it in my app to
preprocess words for spell-checks.
Lance
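A minimal Python sketch of the client-side workaround Lance describes: normalize the word the same way the index-time analyzer did before sending it to the spellchecker. Accent folding here uses Unicode NFKD decomposition as a stand-in for the ISOLatin1 filter; it only strips combining accents, so letters without a Unicode decomposition (such as 'ß') pass through unchanged. Both function names are hypothetical.

```python
import unicodedata
from urllib.parse import urlencode

def normalize_for_spellcheck(word):
    """Lower-case and strip accents so the query can match lower-cased,
    accent-folded dictionary entries (approximation of the index analyzer)."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", word)
    # Dropping the combining marks leaves only the base letters.
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return folded.lower()

def spellcheck_url(base, word):
    """Build the same kind of request used in this thread, with q normalized."""
    params = {"q": normalize_for_spellcheck(word), "qt": "spellchecker"}
    return base + "?" + urlencode(params)

print(normalize_for_spellcheck("Prot\u00e9g\u00e9"))  # protege
print(spellcheck_url("http://localhost:8983/solr/select/", "Thorne"))
# http://localhost:8983/solr/select/?q=thorne&qt=spellchecker
```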
-----Original Message-----
From: Rob Casson [mailto:rob.casson@gmail.com]
Sent: Wednesday, November 28, 2007 5:16 PM
To: solr-user@lucene.apache.org
Subject: Re: LowerCaseFilterFactory and spellchecker
lance,
thanks for the quick reply... looks like 'thorne' is getting added to
the dictionary, as it comes up as a suggestion for 'Thorne'.
i could certainly just lowercase in my client, but just confirming that
i'm not just screwing it up in the first place :)
thanks again,
rc
Re: LowerCaseFilterFactory and spellchecker
Posted by Sean Timm <ti...@aol.com>.
It seems the best thing to do would be a case-insensitive spellcheck
that returns the suggestion in the case the user originally typed--or
at least to make this an option. Users are often lazy about
capitalization, especially with search, where they've learned from web
search engines that case (typically) doesn't matter.
So, for example, Thurne would return Thorne, but thurne would return thorne.
-Sean
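Sean's proposal can be sketched in a few lines of Python, assuming the dictionary stays lower-cased and the client still has the user's original input. match_case is a hypothetical helper that recognizes only three patterns (all caps, leading capital, anything else), so internal capitals in names such as "McDonald" are not restored; that part of John's concern about proper names remains.

```python
def match_case(original, suggestion):
    """Re-apply the user's capitalization pattern to a lower-cased suggestion."""
    if original.isupper():
        # All-caps input such as "THURNE".
        return suggestion.upper()
    if original[:1].isupper():
        # Leading capital such as "Thurne": capitalize only the first letter.
        return suggestion[:1].upper() + suggestion[1:]
    # Anything else is returned as the dictionary has it.
    return suggestion

print(match_case("Thurne", "thorne"))  # Thorne
print(match_case("thurne", "thorne"))  # thorne
```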
John Stewart wrote:
> Rob,
>
> Let's say it worked as you want it to in the first place. If the
> query is for Thurne, wouldn't you get thorne (lower-case 't') as the
> suggestion? This may look weird for proper names.
>
> jds
>
Re: LowerCaseFilterFactory and spellchecker
Posted by John Stewart <ca...@gmail.com>.
Rob,
Let's say it worked as you want it to in the first place. If the
query is for Thurne, wouldn't you get thorne (lower-case 't') as the
suggestion? This may look weird for proper names.
jds
Re: LowerCaseFilterFactory and spellchecker
Posted by Rob Casson <ro...@gmail.com>.
lance,
thanks for the quick reply... looks like 'thorne' is getting added to
the dictionary, as it comes up as a suggestion for 'Thorne'.
i could certainly just lowercase in my client, but just confirming
that i'm not just screwing it up in the first place :)
thanks again,
rc
On Nov 28, 2007 8:11 PM, Norskog, Lance <la...@divvio.com> wrote:
> There are a few parameters for limiting what words are added to the
> dictionary. You might be trimming out 'thorne'. See this page:
>
> http://wiki.apache.org/solr/SpellCheckerRequestHandler
RE: LowerCaseFilterFactory and spellchecker
Posted by "Norskog, Lance" <la...@divvio.com>.
There are a few parameters for limiting what words are added to the
dictionary. You might be trimming out 'thorne'. See this page:
http://wiki.apache.org/solr/SpellCheckerRequestHandler
Re: LowerCaseFilterFactory and spellchecker
Posted by Chris Hostetter <ho...@fucit.org>.
: It does make some sense, but I'm not sure that it should be blindly analyzed
: without adding logic to handle certain cases (like the QueryParser does).
: What happens if the analyzer produces two tokens? The spellchecker has to
: deal with this appropriately. Spell checkers should be able to "reverse
: analyze" the suggestions as well, so "Pyhton" gets corrected to "Python" and
: not "python". Similarly, "ad-hco" should probably suggest "ad-hoc" and not
: "adhoc".
These all seem like arguments in favor of using the query analyzer for the
source field ... yes, the person making the schema has to think carefully
about what the analyzer does, but they already have to be equally careful
about what the indexing analyzer does.
Bottom line: if the indexing analyzer is used to build the dictionary, the
query analyzer should be used before looking up entries in the dictionary.
"Python" is only a good suggestion for "Pyhton" if searching for "Python"
is going to return something. "python" might be a better suggestion.
Likewise "Python" might be a good suggestion for "python" if it's always
capitalized in the source field.
-Hoss
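A rough Python sketch of the idea in Hoss's last paragraph: keep lookups case-insensitive, but choose the capitalization to display from how each term actually appears in the source field. It assumes the client has access to the raw source-field values, uses whitespace splitting in place of the real tokenizer, and build_surface_forms is a hypothetical name.

```python
from collections import Counter, defaultdict

def build_surface_forms(source_values):
    """Map each lower-cased term to its most frequent spelling in the source values."""
    forms = defaultdict(Counter)
    for value in source_values:
        for token in value.split():
            forms[token.lower()][token] += 1
    # most_common(1) picks the dominant form; ties go to the form seen first.
    return {term: counts.most_common(1)[0][0] for term, counts in forms.items()}

titles = ["Python Cookbook", "Learning Python", "python snake care"]
surface = build_surface_forms(titles)
print(surface["python"])  # Python
print(surface["snake"])   # snake
```

With such a map, a lower-cased suggestion like "python" is shown as "Python" only when that is how the term is usually written in the indexed data, not merely because the user's input was capitalized.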
Re: LowerCaseFilterFactory and spellchecker
Posted by Mike Klaas <mi...@gmail.com>.
That's a pretty difficult proposition. Currently the spellcheck
doesn't look at documents at all: only the top-level term&count data
is used to create the index. Adding select-by-query would be
considerably more complicated and expensive (I think a near-full
iteration of TermDocs would be needed).
-Mike
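To make Mike's cost argument concrete, here is a toy Python model in which plain dicts stand in for Lucene's term list and per-term document lists (this is not the Lucene API). Building the dictionary from term counts needs only the term list, while a per-partition dictionary has to scan each term's documents to test membership in the partition. The function names and the min_count threshold are hypothetical.

```python
def dictionary_from_counts(term_counts, min_count=1):
    """Cheap build: one pass over (term, document count) pairs; no documents read."""
    return {term for term, count in term_counts.items() if count >= min_count}

def dictionary_for_partition(postings, partition_docs):
    """Per-partition build: every term's document list must be scanned to see
    whether any of its documents belongs to the partition."""
    return {term for term, docs in postings.items()
            if any(doc in partition_docs for doc in docs)}

postings = {"thorne": [1], "hello": [1, 3], "hola": [2]}
term_counts = {term: len(docs) for term, docs in postings.items()}
english_docs = {1, 3}

print(sorted(dictionary_from_counts(term_counts)))               # ['hello', 'hola', 'thorne']
print(sorted(dictionary_for_partition(postings, english_docs)))  # ['hello', 'thorne']
```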
On 30-Nov-07, at 1:45 PM, Norskog, Lance wrote:
> What would also help is a query to find records for the spellcheck
> dictionary builder. We would like to make separate spelling indexes for
> all records in English, one in Spanish, etc. We would also like to
> slice&dice the records by other dimensions as well, and have separate
> spelling DBs for each partition.
>
> That is, we would like to make an English spelling dictionary and a
> Spanish dictionary, and also make subject-specific dictionaries like
> News and Sports. These are separate orthogonal partitions of our index.
>
> The usual practice for this is to create separate fields in the records
> where one field is only populated for English records, one for Spanish
> records, etc. In our situation this is not practical for space reasons
> and other proprietary reasons.
>
> Lance
RE: LowerCaseFilterFactory and spellchecker
Posted by "Norskog, Lance" <la...@divvio.com>.
What would also help is a query to find records for the spellcheck
dictionary builder. We would like to make separate spelling indexes for
all records in English, one in Spanish, etc. We would also like to
slice&dice the records by other dimensions as well, and have separate
spelling DBs for each partition.
That is, we would like to make an English spelling dictionary and a
Spanish dictionary, and also make subject-specific dictionaries like
News and Sports. These are separate orthogonal partitions of our index.
The usual practice for this is to create separate fields in the records
where one field is only populated for English records, one for Spanish
records, etc. In our situation this is not practical for space reasons
and other proprietary reasons.
Lance
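The "usual practice" Lance mentions (one spelling field per language, filled in only for matching records) would look roughly like this on the client side when building update messages. This Python sketch assumes hypothetical fields such as spelling_en and spelling_es, declared in schema.xml with the "spell" type, each with its own spellchecker handler whose termSourceField points at it; as Lance notes, this approach wasn't practical for his index.

```python
import xml.etree.ElementTree as ET

def add_doc_xml(title, lang):
    """Build an <add> message that copies the title into a per-language
    spelling field (spelling_<lang> is a hypothetical naming scheme)."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in (("title", title), ("spelling_" + lang, title)):
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")

print(add_doc_xml("Thorne", "en"))
# <add><doc><field name="title">Thorne</field><field name="spelling_en">Thorne</field></doc></add>
```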
Re: LowerCaseFilterFactory and spellchecker
Posted by Mike Klaas <mi...@gmail.com>.
On 29-Nov-07, at 5:40 PM, Chris Hostetter wrote:
>
> I'm not very familiar with the SpellCheckerRequestHandler, but i don't
> think you are doing anything wrong.
>
> a quick skim of the code indicates that the "q" param isn't being
> analyzed by that handler, so the raw input string is passed to the
> SpellChecker.suggestSimilar method. This may or may not have been
> intentional.
>
> I personally can't think of any reason why it wouldn't make sense to
> get the query analyzer for the termSourceField and use it to analyze
> the q param before getting suggestions.
It does make some sense, but I'm not sure that it should be blindly
analyzed without adding logic to handle certain cases (like the
QueryParser does). What happens if the analyzer produces two
tokens? The spellchecker has to deal with this appropriately. Spell
checkers should be able to "reverse analyze" the suggestions as well,
so "Pyhton" gets corrected to "Python" and not "python". Similarly,
"ad-hco" should probably suggest "ad-hoc" and not "adhoc".
-Mike
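One possible answer to Mike's two-token question, sketched in Python: check each token separately, then splice the corrections back into the original string so "ad-hco" becomes "ad-hoc" rather than "adhoc". The regex tokenizer and plain Levenshtein distance are stand-ins for the real analyzer and Lucene's own candidate scoring, and every function name here is hypothetical.

```python
import re

def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def suggest_per_token(query, dictionary, max_distance=2):
    """Map each misspelled token (lower-cased) to its closest dictionary word."""
    suggestions = {}
    if not dictionary:
        return suggestions
    for token in re.findall(r"\w+", query.lower()):
        if token in dictionary:
            continue
        # Ties on distance are broken alphabetically so the result is deterministic.
        best = min(dictionary, key=lambda word: (edit_distance(token, word), word))
        if edit_distance(token, best) <= max_distance:
            suggestions[token] = best
    return suggestions

def apply_suggestions(query, suggestions):
    """Replace only the word characters, keeping the user's punctuation."""
    return re.sub(r"\w+", lambda m: suggestions.get(m.group().lower(), m.group()), query)

dictionary = {"ad", "hoc", "python"}
fixes = suggest_per_token("ad-hco", dictionary)
print(fixes)                               # {'hco': 'hoc'}
print(apply_suggestions("ad-hco", fixes))  # ad-hoc
```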
Re: LowerCaseFilterFactory and spellchecker
Posted by Chris Hostetter <ho...@fucit.org>.
: think i'm just doing something wrong...
:
: was experimenting with the spellcheck handler with the nightly
: checkout from 11-28; seems my spellchecking is case-sensitive, even
: tho i think i'm adding the LowerCaseFilterFactory to both the index
: and query analyzers.
I'm not very familiar with the SpellCheckerRequestHandler, but i don't
think you are doing anything wrong.
a quick skim of the code indicates that the "q" param isn't being analyzed
by that handler, so the raw input string is passed to the
SpellChecker.suggestSimilar method. This may or may not have been
intentional.
I personally can't think of any reason why it wouldn't make sense to get
the query analyzer for the termSourceField and use it to analyze the q
param before getting suggestions.
-Hoss
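The behavior Hoss describes can be modeled with plain Python sets: the dictionary holds index-analyzed (lower-cased) terms, but the raw q value is looked up unchanged. This reproduces the exist=false result for "Thorne" in Rob's first response, and running q through the same analysis first, as Hoss suggests, makes it match. The analyze function models only the lower-casing step of the "spell" type (splitting on whitespace instead of using StandardTokenizer); both helpers are hypothetical, not the handler's code.

```python
def analyze(text):
    """Toy version of the 'spell' analyzer: whitespace split plus lower-casing."""
    return [token.lower() for token in text.split()]

def word_exists(dictionary, q, analyze_query):
    """Look q up raw (as the handler does) or analyzed (as Hoss proposes)."""
    words = analyze(q) if analyze_query else [q]
    return all(word in dictionary for word in words)

# Dictionary built from the index-time analysis of the title "Thorne".
dictionary = set(analyze("Thorne"))
print(word_exists(dictionary, "Thorne", analyze_query=False))  # False
print(word_exists(dictionary, "Thorne", analyze_query=True))   # True
```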
Re: LowerCaseFilterFactory and spellchecker
Posted by sunnyfr <jo...@gmail.com>.
Hi,
After reading this post, I looked in solrconfig.xml for:
<requestHandler name="spellchecker"
                class="solr.SpellCheckerRequestHandler" startup="lazy">
  <lst name="defaults">
    <int name="suggestionCount">1</int>
    <float name="accuracy">0.5</float>
  </lst>
  <str name="spellcheckerIndexDir">spell</str>
  <str name="termSourceField">spelling</str>
</requestHandler>
But I couldn't find it; I only found:
<!-- a request handler utilizing the spellcheck component -->
<requestHandler name="/spellCheckCompRH" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- omp = Only More Popular -->
    <str name="spellcheck.onlyMorePopular">false</str>
    <!-- exr = Extended Results -->
    <str name="spellcheck.extendedResults">false</str>
    <!-- The number of suggestions to return -->
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
Can you tell me the difference? And which directory should I point to?
Thanks a lot,
(solr1.3)
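For comparison with the spellchecker URLs used earlier in the thread, a request to the /spellCheckCompRH handler quoted above might be built as in this Python sketch. It assumes the SpellCheckComponent parameters spellcheck=true (enable the component for this request) and spellcheck.build=true (build its dictionary), and that a search component named "spellcheck" is configured elsewhere in solrconfig.xml, as the last-components entry implies; check these against the Solr 1.3 documentation before relying on them. The helper name is hypothetical.

```python
from urllib.parse import urlencode

def spellcheck_component_url(solr_base, word, build=False):
    """Build a query URL for the /spellCheckCompRH handler shown above."""
    params = {"q": word, "spellcheck": "true"}
    if build:
        # Ask the component to build its spelling index on this request.
        params["spellcheck.build"] = "true"
    return solr_base + "/spellCheckCompRH?" + urlencode(params)

print(spellcheck_component_url("http://localhost:8983/solr", "Thorne", build=True))
# http://localhost:8983/solr/spellCheckCompRH?q=Thorne&spellcheck=true&spellcheck.build=true
```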
--
View this message in context: http://www.nabble.com/LowerCaseFilterFactory-and-spellchecker-tp14016710p20029819.html
Sent from the Solr - User mailing list archive at Nabble.com.