Posted to java-user@lucene.apache.org by ta...@controldocs.com on 2005/08/18 21:50:18 UTC

Case-sensitive search

Is there any way to do a case-sensitive search?

Thanks
Tareque
ControlDOCS


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Case-sensitive search

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Aug 18, 2005, at 3:50 PM, tareque@controldocs.com wrote:
> Is there any way to do a case-sensitive search?

All Lucene searches are case-sensitive, actually.

But most often a lowercasing analyzer is used, so the trick is to
change the analysis process so that it does not lowercase.  It gets
more fun when you need both case-sensitive and case-insensitive
searching in the same situation, where the trick is either to build
two different indexes or to use different fields that apply different
analysis to the same text (though this gets tricky with
QueryParser-generated queries and the potential for user field selection).

     Erik
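
To make the "different fields, different analysis" idea above concrete,
here is a minimal plain-Java sketch. It does not use Lucene at all: the
class DualFieldIndexSketch, the field names "exact" and "folded", and the
whitespace tokenizer are assumptions made up for this illustration. The
same text is indexed twice, once with its case preserved and once
lowercased, and the search picks the field that matches the requested mode.

```java
import java.util.*;

// Plain-Java stand-in for "one text, two fields, two analyzers".
// Hypothetical names throughout; this is not the Lucene API.
class DualFieldIndexSketch {
    // field name -> term -> ids of the documents containing that term
    private final Map<String, Map<String, Set<Integer>>> fields = new HashMap<>();

    // Split on whitespace and optionally lowercase each token.
    static List<String> analyze(String text, boolean lowercase) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(lowercase ? t.toLowerCase(Locale.ROOT) : t);
            }
        }
        return tokens;
    }

    // Index the same text into both fields with different analysis.
    void add(int docId, String text) {
        index("exact", docId, analyze(text, false));
        index("folded", docId, analyze(text, true));
    }

    private void index(String field, int docId, List<String> tokens) {
        Map<String, Set<Integer>> postings =
                fields.computeIfAbsent(field, f -> new HashMap<>());
        for (String t : tokens) {
            postings.computeIfAbsent(t, k -> new TreeSet<>()).add(docId);
        }
    }

    // Case-sensitive searches hit "exact"; insensitive ones hit "folded"
    // after the query word has been lowercased the same way.
    Set<Integer> search(String word, boolean caseSensitive) {
        String field = caseSensitive ? "exact" : "folded";
        String term = caseSensitive ? word : word.toLowerCase(Locale.ROOT);
        return fields.getOrDefault(field, Map.of()).getOrDefault(term, Set.of());
    }

    public static void main(String[] args) {
        DualFieldIndexSketch idx = new DualFieldIndexSketch();
        idx.add(1, "Apache Lucene");
        idx.add(2, "apache lucene");
        System.out.println(idx.search("Lucene", true));   // [1]
        System.out.println(idx.search("Lucene", false));  // [1, 2]
    }
}
```

The cost of this design is index size, since every term is stored twice.
In return, both search modes remain simple exact-term lookups.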




Re: Case-sensitive search

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Aug 22, 2005, at 10:40 AM, tareque@controldocs.com wrote:
> Is there any way to index as case-sensitive and then, while searching,
> making the search case-sensitive and case-insensitive using the  
> same index
> as needed?

Not really.  Terms in the index are ordered lexicographically,  
including case.  It certainly would be possible to write customized  
Query subclasses to do this sort of thing at the expense of performance.

The only techniques I'm aware of are to either build separate indexes  
or index the same information into separate fields of the same  
documents using different analyzers per field.

     Erik
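
The point about term ordering can be illustrated with a small plain-Java
sketch. A TreeSet stands in for the sorted term dictionary here. The class
and method names are hypothetical, and the linear scan is only an
assumption about how a custom case-insensitive query might have to work,
not a description of any Lucene class.

```java
import java.util.*;

// A sorted set of strings as a stand-in for a sorted term dictionary.
class TermOrderSketch {
    // Collect every term that equals the query when case is ignored.
    // Case variants are not adjacent in the sorted order, so this sketch
    // has to look at every term instead of doing one direct lookup.
    static List<String> caseInsensitiveMatches(SortedSet<String> terms, String query) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.equalsIgnoreCase(query)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(
                List.of("lucene", "Lucene", "apache", "LUCENE", "Zebra"));
        // Uppercase letters sort before lowercase ones:
        System.out.println(terms);  // [LUCENE, Lucene, Zebra, apache, lucene]
        // A case-sensitive lookup is a single probe:
        System.out.println(terms.contains("Lucene"));  // true
        // A case-insensitive lookup walks the whole dictionary:
        System.out.println(caseInsensitiveMatches(terms, "lucene"));  // [LUCENE, Lucene, lucene]
    }
}
```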




Re: Case-sensitive search

Posted by ta...@controldocs.com.

Is there any way to index case-sensitively and then, while searching,
make the search either case-sensitive or case-insensitive as needed,
using the same index?




Re: Case-sensitive search

Posted by ta...@controldocs.com.
>
> On Aug 18, 2005, at 6:22 PM, tareque@controldocs.com wrote:
>
>>> On Thu, 2005-08-18 at 17:16, tareque@controldocs.com wrote:
>>>
>>>> Thanks again! The analyzer is working now. But seems like
>>>> actually the
>>>> QueryParser I am using is probably converting the queries to
>>>> lowercase
>>>> first. Is there any way to stop that? Here is the line of code
>>>> where I
>>>> am
>>>> parsing:
>>>>
>>>> Query query = QueryParser.parse(line, "contents", analyzer);
>>>>
>>>> As for analyzer, I have tried both StandardAnalyzer and StopAnalyzer.
>>>>
>>>
>>> You need to use the same analyzer for parsing queries as you do for
>>> indexing content.
>>>
>>> Luke Francl
>>>
>>>
>>
>> Actually I have used StopAnalyzer for indexing. So I used the same for
>> parsing queries, but it's still converting the queries to lowercase
>> before running the actual search.
>
> Both of those analyzers lowercase.  When you said it was working,
> what did you mean?   To prevent lowercasing and get stop words
> removed you *will* have to write a custom analyzer.  Also keep in
> mind that StopFilter is case-sensitive and that the stop word list is
> all lowercase - so you will need to account for this with a custom
> stop filter probably too.
>
> It is highly recommended to "analyze the analyzer" - a topic covered
> in depth in the Analysis chapter in Lucene in Action, and one of my
> java.net articles.
>
>      Erik
>

It's all working now. I wrote a custom analyzer based on StopAnalyzer,
which indexed correctly. The problem was that when parsing the query I
forgot to use my new analyzer and was using the old StopAnalyzer instead.
Thanks for all the help!

Tareque




Re: Case-sensitive search

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Aug 18, 2005, at 6:22 PM, tareque@controldocs.com wrote:

>> On Thu, 2005-08-18 at 17:16, tareque@controldocs.com wrote:
>>
>>> Thanks again! The analyzer is working now. But seems like  
>>> actually the
>>> QueryParser I am using is probably converting the queries to  
>>> lowercase
>>> first. Is there any way to stop that? Here is the line of code  
>>> where I
>>> am
>>> parsing:
>>>
>>> Query query = QueryParser.parse(line, "contents", analyzer);
>>>
>>> As for analyzer, I have tried both StandardAnalyzer and StopAnalyzer.
>>>
>>
>> You need to use the same analyzer for parsing queries as you do for
>> indexing content.
>>
>> Luke Francl
>>
>>
>
> Actually I have used StopAnalyzer for indexing. So I used the same for
> parsing queries, but it's still converting the queries to lowercase
> before running the actual search.

Both of those analyzers lowercase.  When you said it was working,  
what did you mean?   To prevent lowercasing and get stop words  
removed you *will* have to write a custom analyzer.  Also keep in  
mind that StopFilter is case-sensitive and that the stop word list is  
all lowercase - so you will need to account for this with a custom  
stop filter probably too.

It is highly recommended to "analyze the analyzer" - a topic covered  
in depth in the Analysis chapter in Lucene in Action, and one of my  
java.net articles.

     Erik
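
The stop-word caveat above can be sketched in plain Java. Nothing here is
Lucene code: the class name, the short stop list, and the letter-based
tokenizer are assumptions chosen for the example. The sketch contrasts a
case-sensitive stop check against a lowercase list with a check that
lowercases only for the comparison and emits the token unchanged.

```java
import java.util.*;

// Removing stop words without lowercasing the tokens that are kept.
class CasePreservingStopSketch {
    // Small illustrative stop list, all lowercase (not Lucene's list).
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "of", "the", "to");

    // Split on anything that is not a letter; case is left untouched.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Case-sensitive comparison: "The" is not in the lowercase list,
    // so it slips through.
    static List<String> caseSensitiveStop(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP_WORDS.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Lowercase only for the comparison; keep the original token.
    static List<String> caseInsensitiveStop(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP_WORDS.contains(t.toLowerCase(Locale.ROOT))) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("The Art of Lucene");
        System.out.println(caseSensitiveStop(tokens));    // [The, Art, Lucene]
        System.out.println(caseInsensitiveStop(tokens));  // [Art, Lucene]
    }
}
```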




Re: Case-sensitive search

Posted by ta...@controldocs.com.
> On Thu, 2005-08-18 at 17:16, tareque@controldocs.com wrote:
>> Thanks again! The analyzer is working now. But seems like actually the
>> QueryParser I am using is probably converting the queries to lowercase
>> first. Is there any way to stop that? Here is the line of code where I
>> am
>> parsing:
>>
>> Query query = QueryParser.parse(line, "contents", analyzer);
>>
>> As for analyzer, I have tried both StandardAnalyzer and StopAnalyzer.
>
> You need to use the same analyzer for parsing queries as you do for
> indexing content.
>
> Luke Francl
>

Actually I have used StopAnalyzer for indexing. So I used the same for
parsing queries, but it's still converting the queries to lowercase before
running the actual search.




Re: Case-sensitive search

Posted by Luke Francl <lu...@stellent.com>.
On Thu, 2005-08-18 at 17:16, tareque@controldocs.com wrote:
> Thanks again! The analyzer is working now. But seems like actually the
> QueryParser I am using is probably converting the queries to lowercase
> first. Is there any way to stop that? Here is the line of code where I am
> parsing:
> 
> Query query = QueryParser.parse(line, "contents", analyzer);
> 
> As for analyzer, I have tried both StandardAnalyzer and StopAnalyzer.

You need to use the same analyzer for parsing queries as you do for
indexing content.

Luke Francl
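
A small plain-Java sketch of why the analyzers have to match. The two
"analyzers" below are just functions from text to tokens. All names are
hypothetical and none of this is Lucene API. Terms indexed with their case
preserved are not found when the query side lowercases.

```java
import java.util.*;
import java.util.function.Function;

// Index-time and query-time analysis must produce comparable terms.
class AnalyzerMismatchSketch {
    // Whitespace split, case preserved.
    static final Function<String, List<String>> KEEP_CASE =
            text -> Arrays.asList(text.trim().split("\\s+"));

    // Whitespace split after lowercasing the whole text.
    static final Function<String, List<String>> LOWERCASE =
            text -> Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));

    // The "index" is simply the set of terms the analyzer produced.
    static Set<String> buildIndex(String text, Function<String, List<String>> analyzer) {
        return new HashSet<>(analyzer.apply(text));
    }

    // A query matches when every analyzed query term is an indexed term.
    static boolean matches(Set<String> indexedTerms, String query,
                           Function<String, List<String>> analyzer) {
        return indexedTerms.containsAll(analyzer.apply(query));
    }

    public static void main(String[] args) {
        Set<String> index = buildIndex("Apache Lucene", KEEP_CASE);
        System.out.println(matches(index, "Lucene", LOWERCASE));  // false
        System.out.println(matches(index, "Lucene", KEEP_CASE));  // true
    }
}
```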




Re: Case-sensitive search

Posted by ta...@controldocs.com.
Thanks again! The analyzer is working now. But seems like actually the
QueryParser I am using is probably converting the queries to lowercase
first. Is there any way to stop that? Here is the line of code where I am
parsing:

Query query = QueryParser.parse(line, "contents", analyzer);

As for analyzer, I have tried both StandardAnalyzer and StopAnalyzer.



> On Aug 18, 2005, at 4:16 PM, tareque@controldocs.com wrote:
>> Thanks! I have used StopAnalyzer to index. Does it lower-case before
>> indexing? I don't touch the query string before sending it for
>> searching, so the query string is not lower-cased.
>
> Pretty much all built-in Lucene analyzers lower-case:
>
>      http://svn.apache.org/repos/asf/lucene/java/trunk/src/java/org/apache/lucene/analysis/StopAnalyzer.java
> (scroll to the bottom to see the tokenStream method - the heart of an
> analyzer)
>
> The exception is the WhitespaceAnalyzer, which is probably not what
> you want to use.  You can write your own Analyzer (copy/paste one and
> remove the lowercasing filter - though some analyzers use a
> lowercasing tokenizer, not a filter).
>
>      Erik
>
>
>
>>
>>
>>> The search really is case sensitive, it's just that all input is
>>> usually lower-cased, so it feels like it's case insensitive.  In
>>> other
>>> words, don't lower-case your input before indexing, and don't
>>> lower-case your queries (i.e. pick an Analyzer that doesn't
>>> lower-case).
>>>
>>> Otis
>>>
>>>
>>> --- tareque@controldocs.com wrote:
>>>
>>>
>>>> Is there any way to do a case-sensitive search?
>>>>
>>>> Thanks
>>>> Tareque
>>>> ControlDOCS







Re: Case-sensitive search

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Aug 18, 2005, at 4:16 PM, tareque@controldocs.com wrote:
> Thanks! I have used StopAnalyzer to index. Does it lower-case before
> indexing? I don't touch the query string before sending it for
> searching, so the query string is not lower-cased.

Pretty much all built-in Lucene analyzers lower-case:

     http://svn.apache.org/repos/asf/lucene/java/trunk/src/java/org/apache/lucene/analysis/StopAnalyzer.java
(scroll to the bottom to see the tokenStream method - the heart of an  
analyzer)

The exception is the WhitespaceAnalyzer, which is probably not what  
you want to use.  You can write your own Analyzer (copy/paste one and  
remove the lowercasing filter - though some analyzers use a  
lowercasing tokenizer, not a filter).

     Erik
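
The "copy an analyzer and remove the lowercasing" suggestion can be
pictured as a chain of stages. The plain-Java sketch below is only an
analogy: a letter tokenizer followed by optional filters, with
hypothetical names throughout. It is not the Lucene tokenStream API.
Dropping the lowercase stage from the chain is what keeps the original
case of the tokens.

```java
import java.util.*;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// An analysis chain modelled as a tokenizer plus a list of filters.
class AnalysisChainSketch {
    // Tokenizer stage: split on anything that is not a letter.
    static List<String> letterTokenize(String text) {
        return Arrays.stream(text.split("[^\\p{L}]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Filter stage: lowercase every token.
    static final UnaryOperator<List<String>> LOWERCASE = tokens ->
            tokens.stream()
                  .map(t -> t.toLowerCase(Locale.ROOT))
                  .collect(Collectors.toList());

    // Filter stage: drop tokens found in the stop list (case-sensitive).
    static UnaryOperator<List<String>> stop(Set<String> stopWords) {
        return tokens -> tokens.stream()
                .filter(t -> !stopWords.contains(t))
                .collect(Collectors.toList());
    }

    // Run the tokenizer, then each filter in the order given.
    @SafeVarargs
    static List<String> analyze(String text, UnaryOperator<List<String>>... filters) {
        List<String> tokens = letterTokenize(text);
        for (UnaryOperator<List<String>> filter : filters) {
            tokens = filter.apply(tokens);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The Quick-Brown fox";
        Set<String> stopWords = Set.of("the");
        // Full chain: tokenize, lowercase, remove stop words.
        System.out.println(analyze(text, LOWERCASE, stop(stopWords)));  // [quick, brown, fox]
        // Same chain without the lowercase stage: case survives, and so
        // does "The", because the stop list is lowercase.
        System.out.println(analyze(text, stop(stopWords)));  // [The, Quick, Brown, fox]
    }
}
```

Printing the tokens each chain produces, as main does here, is a quick
way to check what an analysis setup emits before indexing with it.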



>
>
>> The search really is case sensitive, it's just that all input is
>> usually lower-cased, so it feels like it's case insensitive.  In  
>> other
>> words, don't lower-case your input before indexing, and don't
>> lower-case your queries (i.e. pick an Analyzer that doesn't
>> lower-case).
>>
>> Otis
>>
>>
>> --- tareque@controldocs.com wrote:
>>
>>
>>> Is there any way to do a case-sensitive search?
>>>
>>> Thanks
>>> Tareque
>>> ControlDOCS




Re: Case-sensitive search

Posted by ta...@controldocs.com.
OK, it seems that what does that is a LowerCaseFilter. Is there any analyzer
that does the same thing as StopAnalyzer, except for lowercasing?
StopAnalyzer otherwise fits my purpose best.

> Thanks! I have used StopAnalyzer to index. Does it lower-case before
> indexing? I don't touch the query string before sending it for searching,
> so the query string is not lower-cased.
>
>> The search really is case sensitive, it's just that all input is
>> usually lower-cased, so it feels like it's case insensitive.  In other
>> words, don't lower-case your input before indexing, and don't
>> lower-case your queries (i.e. pick an Analyzer that doesn't
>> lower-case).
>>
>> Otis
>>
>>
>> --- tareque@controldocs.com wrote:
>>
>>> Is there any way to do a case-sensitive search?
>>>
>>> Thanks
>>> Tareque
>>> ControlDOCS





Re: Case-sensitive search

Posted by ta...@controldocs.com.
Thanks! I have used StopAnalyzer to index. Does it lower-case before
indexing? I don't touch the query string before sending it for searching,
so the query string is not lower-cased.

> The search really is case sensitive, it's just that all input is
> usually lower-cased, so it feels like it's case insensitive.  In other
> words, don't lower-case your input before indexing, and don't
> lower-case your queries (i.e. pick an Analyzer that doesn't
> lower-case).
>
> Otis
>
>
> --- tareque@controldocs.com wrote:
>
>> Is there any way to do a case-sensitive search?
>>
>> Thanks
>> Tareque
>> ControlDOCS





Re: Case-sensitive search

Posted by Otis Gospodnetic <ot...@yahoo.com>.
The search really is case sensitive, it's just that all input is
usually lower-cased, so it feels like it's case insensitive.  In other
words, don't lower-case your input before indexing, and don't
lower-case your queries (i.e. pick an Analyzer that doesn't
lower-case).

Otis


--- tareque@controldocs.com wrote:

> Is there any way to do a case-sensitive search?
> 
> Thanks
> Tareque
> ControlDOCS

