Posted to solr-user@lucene.apache.org by Marc Ghorayeb <de...@hotmail.com> on 2010/07/08 09:46:09 UTC

Spellcheck help

Hello,

I've been trying to get rid of a bug in the spellchecker, so far with no success :(

When I search for a word that starts with a number, for example "3dsmax", I get the results I want, BUT the spellchecker says the word is misspelled AND the collation gives me "33dsmax". Further investigation shows that the spellchecker is actually only checking "dsmax", which it considers nonexistent, so it suggests "3dsmax" instead. Since I have spellcheck.collate=true, the collation I display is "33dsmax": the first "3" is the one the spellchecker discarded, and the rest is its suggestion.

Otherwise, the spellchecker works correctly for normal words. Any ideas? My spellcheck field is fairly classic: a whitespace tokenizer with a lowercase filter.

Any help would be greatly appreciated :)

Thanks,
Marc
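The tokenization step Marc describes can be reproduced outside Solr. The Java sketch below is a minimal illustration under two assumptions: NMTOKEN is replaced by a hypothetical, simplified field-name pattern (the real definition in SpellingQueryConverter is more elaborate), and collation is modeled as splicing the best suggestion over the token's character span in the original query. Under those assumptions, the pattern quoted later in this thread yields the token "dsmax" at offsets 1 to 6, and splicing "3dsmax" over that span gives "33dsmax". Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LeadingDigitDemo {
    // HYPOTHETICAL stand-in for SpellingQueryConverter's NMTOKEN, which
    // matches a field-name prefix such as "title" in "title:foo".
    static final String NMTOKEN = "[a-zA-Z_][a-zA-Z0-9_]*";

    // The pattern quoted in this thread, including the "|\\d+" alternative.
    static final String OLD_PATTERN =
        "(?:(?!(" + NMTOKEN + ":|\\d+)))[\\p{L}_\\-0-9]+";

    /** A token plus its character span in the original query string. */
    record Tok(String text, int start, int end) {}

    /** Returns every match of the pattern, in order, with its offsets. */
    static List<Tok> tokenize(String pattern, String query) {
        List<Tok> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(query);
        while (m.find()) {
            tokens.add(new Tok(m.group(), m.start(), m.end()));
        }
        return tokens;
    }

    /** ASSUMED collation rule: replace the token's span with the suggestion. */
    static String collate(String query, Tok tok, String suggestion) {
        return new StringBuilder(query)
            .replace(tok.start(), tok.end(), suggestion)
            .toString();
    }

    public static void main(String[] args) {
        String query = "3dsmax";
        // The negative lookahead rejects position 0 because "\\d+" matches "3",
        // so the first token starts at position 1.
        Tok tok = tokenize(OLD_PATTERN, query).get(0);
        System.out.println(tok);                           // Tok[text=dsmax, start=1, end=6]
        System.out.println(collate(query, tok, "3dsmax")); // 33dsmax
    }
}
```

Note that the field's whitespace tokenizer and lowercase filter never see the leading "3" in this model: the regex drops it before analysis, which would explain why the field configuration looks correct.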
_________________________________________________________________
Messenger finally comes to iPhone! Download it for free!
http://www.messengersurvotremobile.com/?d=iPhone

RE: Spellcheck help

Posted by "Dyer, James" <Ja...@ingrambook.com>.
If you can, let me know how your testing goes with this change. I'm also interested in making the collation work as well as it can. It looks like the code would be better with this change, but then again I don't know what the original author was thinking when this was put in.

James Dyer
E-Commerce Systems
Ingram Book Company
(615) 213-4311

-----Original Message-----
From: Marc Ghorayeb [mailto:dekay999@hotmail.com] 
Sent: Tuesday, July 27, 2010 8:07 AM
To: solr-user@lucene.apache.org
Subject: RE: Spellcheck help


Thanks for the input, I'll check it out!
Marc

_________________________________________________________________
Exclusive: Download the new version of Messenger!
http://clk.atdmt.com/FRM/go/244627952/direct/01/

RE: Spellcheck help

Posted by Marc Ghorayeb <de...@hotmail.com>.
Thanks for the input, I'll check it out!
Marc

> Subject: RE: Spellcheck help
> Date: Fri, 23 Jul 2010 13:12:04 -0500
> From: James.Dyer@ingrambook.com
> To: solr-user@lucene.apache.org
> 
> In org.apache.solr.spelling.SpellingQueryConverter, find the line (#84):
> 
> final static String PATTERN = "(?:(?!(" + NMTOKEN + ":|\\d+)))[\\p{L}_\\-0-9]+";
> 
> and remove the |\\d+ to make it:
> 
> final static String PATTERN = "(?:(?!" + NMTOKEN + ":))[\\p{L}_\\-0-9]+";
> 
> My testing shows this solves your problem. The caveat is to test it against all your use cases, because obviously someone thought we should ignore leading digits in keywords. Surely there's a reason, although I can't think of it.
> 
> James Dyer
> E-Commerce Systems
> Ingram Book Company
> (615) 213-4311

RE: Spellcheck help

Posted by "Dyer, James" <Ja...@ingrambook.com>.
In org.apache.solr.spelling.SpellingQueryConverter, find the line (#84):

final static String PATTERN = "(?:(?!(" + NMTOKEN + ":|\\d+)))[\\p{L}_\\-0-9]+";

and remove the |\\d+ to make it:

final static String PATTERN = "(?:(?!" + NMTOKEN + ":))[\\p{L}_\\-0-9]+";

My testing shows this solves your problem. The caveat is to test it against all your use cases, because obviously someone thought we should ignore leading digits in keywords. Surely there's a reason, although I can't think of it.
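To check the change against other inputs before patching Solr, both patterns can be compared in isolation. This Java sketch again substitutes a hypothetical, simplified regex for NMTOKEN, so it only approximates the real converter. With that assumption, the old pattern yields [dsmax] for both "3dsmax" and "title:3dsmax" while the new one yields [3dsmax], so the field prefix is still skipped. For "ipod 2010", the old pattern yields [ipod] and the new one [ipod, 2010]: purely numeric terms, which were never emitted before, now reach the spellchecker. That is one of the use cases worth testing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PatternComparison {
    // HYPOTHETICAL simplification of SpellingQueryConverter's NMTOKEN.
    static final String NMTOKEN = "[a-zA-Z_][a-zA-Z0-9_]*";

    // Original pattern: rejects positions where a field prefix or digits start.
    static final String OLD_PATTERN =
        "(?:(?!(" + NMTOKEN + ":|\\d+)))[\\p{L}_\\-0-9]+";
    // Suggested change: the "|\\d+" alternative is removed.
    static final String NEW_PATTERN =
        "(?:(?!" + NMTOKEN + ":))[\\p{L}_\\-0-9]+";

    /** Returns the text of every token the pattern extracts from the query. */
    static List<String> tokens(String pattern, String query) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(query);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String[] queries = {"3dsmax", "title:3dsmax", "ipod 2010"};
        for (String q : queries) {
            System.out.printf("%-14s old=%s new=%s%n",
                q, tokens(OLD_PATTERN, q), tokens(NEW_PATTERN, q));
        }
    }
}
```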

James Dyer
E-Commerce Systems
Ingram Book Company
(615) 213-4311

-----Original Message-----
From: dekay999@hotmail.com [mailto:dekay999@hotmail.com] 
Sent: Saturday, July 17, 2010 12:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Spellcheck help

Can anybody help me with this? :(



Re: Spellcheck help

Posted by Lance Norskog <go...@gmail.com>.
You can make two spellcheckers and consult both of them.

A spelling database made from an existing text index tends to have a
lot of confusing junk.
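One way to read "make two spellcheckers and consult both" is to ask a small curated dictionary first and then the index-based one, merging the answers. In Solr this would presumably mean two separately configured dictionaries, queried for example with one request per spellcheck.dictionary value; the Java sketch below does not call Solr at all. It models each spellchecker as a hypothetical function from a word to its suggestions, and the sample data is invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class TwoCheckers {
    /**
     * Consults each checker in order and merges their suggestions,
     * keeping the first occurrence of each word (earlier checkers rank first).
     */
    @SafeVarargs
    static List<String> suggest(String word, Function<String, List<String>>... checkers) {
        Set<String> merged = new LinkedHashSet<>(); // preserves insertion order, drops duplicates
        for (Function<String, List<String>> checker : checkers) {
            merged.addAll(checker.apply(word));
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // HYPOTHETICAL curated dictionary holding only the problem terms.
        Map<String, List<String>> curated = Map.of("3dmax", List.of("3dsmax"));
        // HYPOTHETICAL index-derived suggestions, including some junk.
        Map<String, List<String>> indexBased = Map.of("3dmax", List.of("3dsmax", "dmax"));

        List<String> result = suggest("3dmax",
            w -> curated.getOrDefault(w, List.of()),
            w -> indexBased.getOrDefault(w, List.of()));
        System.out.println(result); // [3dsmax, dmax]
    }
}
```

Putting the curated list first lets a hand-picked term outrank the noisier index-derived suggestions, which is the point of keeping the two sources separate.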

On Sun, Jul 18, 2010 at 4:43 AM,  <de...@hotmail.com> wrote:
> Can I make a dictionary of only the words that are causing problems? There
> are not that many terms that show this behavior, but it is important for
> me to get rid of this bug. So can I use the dictionary AND the list built
> by the spellchecker?



-- 
Lance Norskog
goksron@gmail.com

Re: Spellcheck help

Posted by de...@hotmail.com.
Can I make a dictionary of only the words that are causing problems? There
are not that many terms that show this behavior, but it is important for
me to get rid of this bug. So can I use the dictionary AND the list built
by the spellchecker?

-----Original Message----- 
From: Lance Norskog
Sent: Sunday, July 18, 2010 1:42 AM
To: solr-user@lucene.apache.org
Subject: Re: Spellcheck help

Spellchecking can also take a dictionary as its database. Is it
possible to create a dictionary of the terms you want suggested?




-- 
Lance Norskog
goksron@gmail.com 


Re: Spellcheck help

Posted by Lance Norskog <go...@gmail.com>.
Spellchecking can also take a dictionary as its database. Is it
possible to create a dictionary of the terms you want suggested?
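A dictionary-backed spellchecker comes down to "suggest the closest word from a fixed list". The Java sketch below illustrates that idea with Levenshtein edit distance over a hypothetical in-memory list of problem terms. It does not reproduce how Solr's FileBasedSpellChecker scores candidates, and the maximum distance of 2 is an arbitrary assumption.

```java
import java.util.List;
import java.util.Optional;

final class WordListChecker {
    private final List<String> words; // assumed to be stored lowercase
    private final int maxDistance;

    WordListChecker(List<String> words, int maxDistance) {
        this.words = List.copyOf(words);
        this.maxDistance = maxDistance;
    }

    /** Dynamic-programming Levenshtein distance, keeping only two rows. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // previous row is now in prev
        }
        return prev[b.length()];
    }

    /** Empty if the word is already in the list or nothing is close enough; else the nearest word. */
    Optional<String> suggest(String word) {
        String lower = word.toLowerCase(); // mirrors a lowercase filter
        if (words.contains(lower)) return Optional.empty();
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        for (String w : words) {
            int d = distance(lower, w);
            if (d < bestDist) { best = w; bestDist = d; }
        }
        return bestDist <= maxDistance ? Optional.ofNullable(best) : Optional.empty();
    }

    public static void main(String[] args) {
        // HYPOTHETICAL curated list containing only the terms that misbehave.
        WordListChecker checker = new WordListChecker(List.of("3dsmax"), 2);
        System.out.println(checker.suggest("3dmax"));  // Optional[3dsmax]
        System.out.println(checker.suggest("3dsmax")); // Optional.empty
    }
}
```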

On Sat, Jul 17, 2010 at 10:40 AM,  <de...@hotmail.com> wrote:
> Can anybody help me with this? :(



-- 
Lance Norskog
goksron@gmail.com

Re: Spellcheck help

Posted by de...@hotmail.com.
Can anybody help me with this? :(
