Posted to dev@lucenenet.apache.org by Max Metral <ma...@artsalliancelabs.com> on 2007/04/27 21:57:18 UTC

strategy for abbreviations?

I'm doing a search against user-generated data.  I have a listing such
as "PF Changs" (a restaurant).  It might be specified as

P.F. Changs
PF Chang's
P. F. Changs

and so on.

And my search might be any of those too.  Assuming I'm using "AND"
matches by default (which I am), is there a common mechanism for dealing
with this problem?  Snowball seems to get somewhat confused by it,
turning "P.F. Chang's" into "PF Changs" and then failing because it's
matching against "P F Changs".


RE: strategy for abbreviations?

Posted by George Aroush <ge...@aroush.net>.
Hi Max,

The way I solved this problem in the past is to write my own analyzer where
it would map all of those different variations to the correct one.

I had to do this for abbreviations such as states, chemical names, etc.
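
As a rough illustration of the mapping idea described above, here is a
minimal sketch in Python. It is not Lucene.Net code and does not use any
Lucene API; in a real setup the same logic would live inside a custom
analyzer and run at both index time and query time. The function name
normalize and the CANONICAL_FORMS table are hypothetical, made up for
this example. It assumes that a run of single letters (as in "P. F.")
is always a set of initials, which will be wrong for some data.

```python
import re

# Hypothetical lookup table; a real application would supply its own entries.
CANONICAL_FORMS = {
    "calif": "california",
    "ca": "california",
}


def normalize(text):
    """Reduce a listing name or query to canonical lowercase tokens."""
    # Drop apostrophes so "Chang's" and "Changs" produce the same token.
    cleaned = re.sub("['\u2019]", "", text.lower())
    # Split on anything that is not an ASCII letter or digit.
    raw_tokens = [t for t in re.split(r"[^a-z0-9]+", cleaned) if t]

    tokens = []
    initials = ""
    for token in raw_tokens:
        if len(token) == 1 and token.isalpha():
            # Accumulate runs of single letters: "p", "f" becomes "pf".
            initials += token
            continue
        if initials:
            tokens.append(initials)
            initials = ""
        tokens.append(token)
    if initials:
        tokens.append(initials)

    # Map known abbreviations to their canonical form.
    return [CANONICAL_FORMS.get(t, t) for t in tokens]


if __name__ == "__main__":
    for variant in ["P.F. Changs", "PF Chang's", "P. F. Changs", "P F Changs"]:
        print(variant, "->", normalize(variant))
```

Because every variant is reduced to the same token sequence, an "AND"
query on the normalized tokens can match no matter which spelling was
indexed or typed. The trade-off is that unrelated single letters that
happen to sit next to each other are merged as well.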

This isn't a Lucene issue per se, but an application issue.

Regards,

-- George  

> -----Original Message-----
> From: Max Metral [mailto:max@artsalliancelabs.com] 
> Sent: Friday, April 27, 2007 3:57 PM
> To: lucene-net-dev@incubator.apache.org
> Subject: strategy for abbreviations?
> 
> I'm doing a search against user-generated data.  I have a 
> listing such as "PF Changs" (a restaurant).  It might be specified as
> 
>  
> 
> P.F. Changs
> 
> PF Chang's
> 
> P. F. Changs
> 
> And etc...
> 
>  
> 
> And my search might be any of those too.  Assuming I'm using "AND"
> matches by default (which I am), is there a common mechanism 
> for dealing with this problem?  Snowball seems to get 
> somewhat confused by it, turning "P.F. Chang's" into PF 
> Changs and then failing because it's matching against "P F Changs"
> 
>