
Posted to java-user@lucene.apache.org by "Kipping, Peter" <pk...@crcpress.com> on 2004/08/12 21:09:47 UTC

wildcard uppercase

I'm doing wildcard searches on molecular formulas where case is
critical.  For instance Co = Cobalt, CO = Carbon Monoxide.  I've read
the faq on this:


Yes, unlike other types of Lucene queries, Wildcard, Prefix, and Fuzzy
queries are case sensitive. 

That is because those types of queries are not passed through the
Analyzer, which is the component that performs operations such as
stemming and lowercasing. 

The reason for skipping the Analyzer is that if you were searching for
"dogs*" you would not want "dogs" first stemmed to "dog", since that
would then match "dog*", which is not the intended query. 
A workaround for this is simply to lowercase the entire query before
passing it to the query parser. 
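The FAQ's stemming argument can be made concrete with a small, self-contained Java sketch. It is a toy model, not Lucene's API: the "index" is a plain list of unstemmed terms, `naiveStem` is a hypothetical stemmer that strips one trailing "s", and a prefix query is a case-sensitive `startsWith` scan. It shows how stemming the prefix "dogs" to "dog" widens the match set, which is the FAQ's stated reason for not analyzing wildcard terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model (not Lucene): a list of indexed terms plus a naive stemmer.
class StemmedWildcardDemo {

    // Naive stemmer for illustration only; real stemmers (e.g. Porter) are far more involved.
    static String naiveStem(String word) {
        return word.length() > 1 && word.endsWith("s")
                ? word.substring(0, word.length() - 1)
                : word;
    }

    // Returns every indexed term that starts with the given prefix (case-sensitive).
    static List<String> prefixMatches(List<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String t : terms) {
            if (t.startsWith(prefix)) {
                hits.add(t);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("dog", "dogma", "dogs", "dogsled");
        String query = "dogs*";
        // Drop the trailing '*' to get the literal prefix the user typed.
        String prefix = query.substring(0, query.length() - 1);

        // Prefix used exactly as typed.
        System.out.println(prefixMatches(terms, prefix));            // [dogs, dogsled]
        // Prefix stemmed first ("dogs" -> "dog"), which behaves like "dog*".
        System.out.println(prefixMatches(terms, naiveStem(prefix))); // [dog, dogma, dogs, dogsled]
    }
}
```

The stemmed prefix also matches "dog" and "dogma", which is exactly the unintended "dog*" behaviour the FAQ describes.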


But it makes no sense.  First, most analyzers don't even do stemming;
I'm using the whitespace analyzer, which doesn't.  Second, lowercasing is
a completely separate issue from stemming, so I see no reason why a
wildcard query has to be lowercased.  Is there any way to prevent my
wildcard queries from being lowercased?  Example: the input string
"C9H10O5*" results in the query "c9h10o5*".

Thanks,

Peter


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: wildcard uppercase

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Just use an Analyzer that doesn't lowercase.  That FAQ entry assumes
that the Analyzer does lowercase its input.
Searching IS case sensitive.  It's just that people often use an
Analyzer that lowercases everything (at indexing and at query time), so
the search appears to be case insensitive, and that is just what most
people want.
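To illustrate the point that term matching itself is case sensitive, here is a minimal Java sketch. It is a toy model, not Lucene's API: `whitespaceTerms` and `lowercasedTerms` are hypothetical stand-ins for a whitespace-only analyzer and a lowercasing analyzer, the sample formulas (including "C9H10O5Na") are made-up terms, and `prefixQuery` is a case-sensitive `startsWith` scan. With case preserved, "Co" and "CO" stay distinct and "C9H10O5" matches; a lowercased query such as "c9h10o5" finds nothing. Once terms are lowercased at index time, Cobalt and Carbon Monoxide collide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical toy model (not Lucene): "analyzers" are plain functions from text to terms.
class FormulaPrefixDemo {

    // Splits on whitespace only and keeps case, in the spirit of a whitespace analyzer.
    static List<String> whitespaceTerms(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Splits on whitespace and lowercases every term, like a lowercasing analyzer.
    static List<String> lowercasedTerms(String text) {
        List<String> out = new ArrayList<>();
        for (String t : whitespaceTerms(text)) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Case-sensitive prefix match: a term must start with the prefix exactly, including case.
    static List<String> prefixQuery(List<String> indexTerms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : indexTerms) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String doc = "Co CO C9H10O5 C9H10O5Na";

        List<String> caseKept = whitespaceTerms(doc);
        System.out.println(prefixQuery(caseKept, "C9H10O5")); // [C9H10O5, C9H10O5Na]
        System.out.println(prefixQuery(caseKept, "c9h10o5")); // [] (lowercased query misses)
        System.out.println(prefixQuery(caseKept, "Co"));      // [Co] (CO is not matched)

        List<String> lowered = lowercasedTerms(doc);
        System.out.println(prefixQuery(lowered, "co"));       // [co, co] (Cobalt and CO collide)
    }
}
```

The design choice is the same one described above: whichever transformation is applied to the indexed terms has to be applied identically to the query text, or matches are lost.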

Otis
