Posted to solr-user@lucene.apache.org by Marc Ghorayeb <de...@hotmail.com> on 2010/07/08 09:46:09 UTC
Spellcheck help
Hello,

I've been trying to get rid of a bug when using the spellcheck, but so far with no success :(

When searching for a word that starts with a number, for example "3dsmax", I get the results that I want, BUT the spellcheck says it is not correctly spelled AND the collation gives me "33dsmax". Further investigation shows that the spellcheck is actually only checking "dsmax", which it considers not to exist, and it suggests "3dsmax" instead. Since I have spellcheck.collate = true, the collation shown is "33dsmax", with the first 3 being the one discarded by the spellchecker... Otherwise, the spellcheck works correctly for normal words. Any ideas? :(

My spellcheck field is fairly classic: whitespace tokenizer with a lowercase filter.

Any help would be greatly appreciated :)

Thanks,
Marc
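For readers following along, the doubled digit can be reproduced with a few lines of Java. This is only a sketch of offset-based replacement, assuming the collation splices each suggestion into the original query at the offsets of the token that was actually checked. The class and method names are made up for illustration and are not Solr's API.

```java
class CollationSketch {
    /**
     * Replaces query[start, end) with the suggestion, the way an
     * offset-based collation would (hypothetical helper, not Solr code).
     */
    static String collate(String query, int start, int end, String suggestion) {
        return new StringBuilder(query).replace(start, end, suggestion).toString();
    }

    public static void main(String[] args) {
        String query = "3dsmax";
        // Only "dsmax" (offsets 1..6) was checked; the leading "3" was skipped.
        // Splicing the suggestion "3dsmax" over that span keeps the skipped "3".
        System.out.println(collate(query, 1, 6, "3dsmax")); // prints 33dsmax
    }
}
```

If the whole token (offsets 0..6) were checked instead, the same splice would give back "3dsmax".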
RE: Spellcheck help
Posted by "Dyer, James" <Ja...@ingrambook.com>.
If you could, let me know how your testing goes with this change. I too am interested in having the collation work as well as it can. It looks like the code would be better with this change, but then again I don't know what the original author was thinking when this was put in.
James Dyer
E-Commerce Systems
Ingram Book Company
(615) 213-4311
-----Original Message-----
From: Marc Ghorayeb [mailto:dekay999@hotmail.com]
Sent: Tuesday, July 27, 2010 8:07 AM
To: solr-user@lucene.apache.org
Subject: RE: Spellcheck help
Thanks for the input, I'll check it out!
Marc
RE: Spellcheck help
Posted by Marc Ghorayeb <de...@hotmail.com>.
Thanks for the input, I'll check it out!
Marc
> Subject: RE: Spellcheck help
> Date: Fri, 23 Jul 2010 13:12:04 -0500
> From: James.Dyer@ingrambook.com
> To: solr-user@lucene.apache.org
>
> In org.apache.solr.spelling.SpellingQueryConverter, find the line (#84):
>
> final static String PATTERN = "(?:(?!(" + NMTOKEN + ":|\\d+)))[\\p{L}_\\-0-9]+";
>
> and remove the |\\d+ to make it:
>
> final static String PATTERN = "(?:(?!" + NMTOKEN + ":))[\\p{L}_\\-0-9]+";
>
> My testing shows this solves your problem. The caution is to test it against all your use cases because obviously someone thought we should ignore leading digits from keywords. Surely there's a reason why although I can't think of it.
>
> James Dyer
> E-Commerce Systems
> Ingram Book Company
> (615) 213-4311
RE: Spellcheck help
Posted by "Dyer, James" <Ja...@ingrambook.com>.
In org.apache.solr.spelling.SpellingQueryConverter, find the line (#84):
final static String PATTERN = "(?:(?!(" + NMTOKEN + ":|\\d+)))[\\p{L}_\\-0-9]+";
and remove the |\\d+ to make it:
final static String PATTERN = "(?:(?!" + NMTOKEN + ":))[\\p{L}_\\-0-9]+";
My testing shows this solves your problem. The caution is to test it against all your use cases because obviously someone thought we should ignore leading digits from keywords. Surely there's a reason why although I can't think of it.
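A quick way to see what the change does outside Solr is to run both patterns over the query with java.util.regex. The sketch below assumes a simplified NMTOKEN (the real constant in SpellingQueryConverter is a longer character class), so it only illustrates the lookahead behaviour and is not a substitute for testing the patched class against real queries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpellingPatternDemo {
    // ASSUMPTION: simplified stand-in for Solr's NMTOKEN, just enough to
    // recognise a "field:" prefix. The real constant covers many more characters.
    static final String NMTOKEN = "[A-Za-z_][A-Za-z0-9_.\\-]*";

    // Pattern as quoted in the message above (skips leading digits).
    static final String ORIGINAL = "(?:(?!(" + NMTOKEN + ":|\\d+)))[\\p{L}_\\-0-9]+";
    // Pattern with the |\\d+ alternative removed.
    static final String PATCHED = "(?:(?!" + NMTOKEN + ":))[\\p{L}_\\-0-9]+";

    /** Returns every substring of the query that the given pattern matches. */
    static List<String> tokens(String regex, String query) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(query);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens(ORIGINAL, "3dsmax"));      // [dsmax]
        System.out.println(tokens(PATCHED, "3dsmax"));       // [3dsmax]
        System.out.println(tokens(PATCHED, "title:3dsmax")); // [3dsmax]
    }
}
```

With the original pattern the negative lookahead rejects any match that starts on a digit, so the match begins at "d" and the leading "3" never reaches the spellchecker.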
James Dyer
E-Commerce Systems
Ingram Book Company
(615) 213-4311
-----Original Message-----
From: dekay999@hotmail.com [mailto:dekay999@hotmail.com]
Sent: Saturday, July 17, 2010 12:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Spellcheck help
Can anybody help me with this? :(
Re: Spellcheck help
Posted by Lance Norskog <go...@gmail.com>.
You can make two spellcheckers and consult both of them.
A spelling database made from an existing text index tends to have a
lot of confusing junk.
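As a rough illustration of consulting two sources, the sketch below treats a term as known if either a small hand-maintained word list or the index-derived list contains it. Both sets and all names here are hypothetical stand-ins; in Solr itself the two spellcheckers would be set up in the spellcheck configuration rather than written by hand like this.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class TwoDictionaryCheck {
    /** True if either word list contains the lowercased term. */
    static boolean isKnown(String term, Set<String> curated, Set<String> indexTerms) {
        String t = term.toLowerCase(Locale.ROOT);
        return curated.contains(t) || indexTerms.contains(t);
    }

    public static void main(String[] args) {
        // Hypothetical data: a short curated list for the problem terms,
        // plus whatever the index-based dictionary happens to contain.
        Set<String> curated = new HashSet<>(Arrays.asList("3dsmax"));
        Set<String> indexTerms = new HashSet<>(Arrays.asList("maya", "blender"));

        System.out.println(isKnown("3DSMax", curated, indexTerms));  // true
        System.out.println(isKnown("33dsmax", curated, indexTerms)); // false
    }
}
```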
On Sun, Jul 18, 2010 at 4:43 AM, <de...@hotmail.com> wrote:
> Can I make a dictionary of only the words that are having problems? There
> are not that many terms that present this behavior, but it is important for
> me to get rid of this bug. So can I use the dictionary AND the list built
> by the spellchecker?
--
Lance Norskog
goksron@gmail.com
Re: Spellcheck help
Posted by de...@hotmail.com.
Can I make a dictionary of only the words that are having problems? There
are not that many terms that present this behavior, but it is important for
me to get rid of this bug. So can I use the dictionary AND the list built
by the spellchecker?
-----Original Message-----
From: Lance Norskog
Sent: Sunday, July 18, 2010 1:42 AM
To: solr-user@lucene.apache.org
Subject: Re: Spellcheck help
Spellchecking can also take a dictionary as its database. Is it
possible to create a dictionary of the terms you want suggested?
Re: Spellcheck help
Posted by Lance Norskog <go...@gmail.com>.
Spellchecking can also take a dictionary as its database. Is it
possible to create a dictionary of the terms you want suggested?
On Sat, Jul 17, 2010 at 10:40 AM, <de...@hotmail.com> wrote:
> Can anybody help me with this? :(
--
Lance Norskog
goksron@gmail.com
Re: Spellcheck help
Posted by de...@hotmail.com.
Can anybody help me with this? :(