Posted to java-user@lucene.apache.org by John Cecere <jo...@oracle.com> on 2014/09/19 15:07:35 UTC

Case sensitivity

Is there a way to set up Lucene so that both case-sensitive and case-insensitive searches can be done without having to generate two 
indexes?

-- 
John Cecere
Principal Engineer - Oracle Corporation
732-987-4317 / john.cecere@oracle.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Case sensitivity

Posted by Sujit Pal <su...@comcast.net>.

Hi John,

Take a look at PerFieldAnalyzerWrapper. As the name suggests, it lets
you use a different analyzer for each field.
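
A minimal sketch of what "an analyzer per field" means, written in plain Java as a toy model and not against the Lucene API. The class, the method names and the field names ("body", "body_exact") are made up for illustration. In Lucene itself, PerFieldAnalyzerWrapper is built from a default analyzer plus a map of field name to analyzer, and because the wrapper is itself an Analyzer it goes wherever a single analyzer would, including IndexWriterConfig.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Toy model of per-field analysis; NOT the Lucene API. All names are hypothetical. */
final class PerFieldSketch {
    /** Splits on whitespace and keeps case (stands in for a case-sensitive analyzer). */
    static List<String> exact(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Same split, then lower-cases each token (stands in for a LowerCaseFilter step). */
    static List<String> folded(String text) {
        List<String> out = new ArrayList<>();
        for (String t : exact(text)) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    }

    /** Picks the analyzer by field name, falling back to a default for unlisted fields. */
    static List<String> analyze(String field, String text) {
        Map<String, Function<String, List<String>>> perField = new HashMap<>();
        perField.put("body_exact", PerFieldSketch::exact);
        Function<String, List<String>> fallback = PerFieldSketch::folded;
        return perField.getOrDefault(field, fallback).apply(text);
    }

    public static void main(String[] args) {
        System.out.println(analyze("body", "Apache Lucene"));       // [apache, lucene]
        System.out.println(analyze("body_exact", "Apache Lucene")); // [Apache, Lucene]
    }
}
```

The same text goes into both fields; only the field name decides whether case is folded.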

-sujit


On Fri, Sep 19, 2014 at 6:50 AM, John Cecere <jo...@oracle.com> wrote:

> I've considered this, but there are two problems with it. First of all, it
> feels like I'm still taking up twice the storage, I'm just doing it using a
> single index rather than two of them. This doesn't sound like it's buying
> me anything.
>
> The second problem with this is simply that I haven't figured out how to
> do this. I assume in creating two fields you would implement two separate
> analyzers on them, one using LowerCaseFilter and the other not. I haven't
> made the connection on how to tie an Analyzer to a particular field. It
> seems to be tied to the IndexWriterConfig and the IndexWriter.
>
> Thanks,
> John
>
>
> On 9/19/14 9:36 AM, Paul Libbrecht wrote:
>
>> two fields?
>>
>> paul

Re: Case sensitivity

Posted by Ian Lea <ia...@gmail.com>.

PerFieldAnalyzerWrapper is the way to mix and match fields and analyzers.

Personally I'd simply build the case-insensitive field by calling
toLowerCase() on the value, and do the same to the search string.

You will of course use more storage, but you don't need to store the
text contents for both variants, so it won't be double, unless you
aren't storing the original either.
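
A rough sketch of the two-field idea, again as a toy in-memory index in plain Java and not as Lucene code. The class name and the field names "body_exact" and "body_lower" are hypothetical. It only models the indexed terms; stored field content, which is what the storage remark above is about, is left out.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy index with an exact-case field and a lower-cased field. Hypothetical names throughout. */
final class TwoFieldSketch {
    // field name -> term -> ids of the documents containing that term
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    private void addTerm(String field, String term, int docId) {
        postings.computeIfAbsent(field, f -> new HashMap<>())
                .computeIfAbsent(term, t -> new TreeSet<>())
                .add(docId);
    }

    /** Indexes each whitespace-separated word twice: as written, and lower-cased. */
    void add(int docId, String text) {
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) continue;
            addTerm("body_exact", word, docId);
            addTerm("body_lower", word.toLowerCase(Locale.ROOT), docId);
        }
    }

    /** Case-sensitive lookup: the term must match exactly as it was written. */
    Set<Integer> searchExact(String term) {
        return lookup("body_exact", term);
    }

    /** Case-insensitive lookup: the query is lower-cased the same way the values were. */
    Set<Integer> searchIgnoreCase(String term) {
        return lookup("body_lower", term.toLowerCase(Locale.ROOT));
    }

    private Set<Integer> lookup(String field, String term) {
        Map<String, Set<Integer>> terms = postings.getOrDefault(field, Collections.emptyMap());
        return terms.getOrDefault(term, Collections.emptySet());
    }

    public static void main(String[] args) {
        TwoFieldSketch index = new TwoFieldSketch();
        index.add(1, "Apache Lucene");
        index.add(2, "lucene in action");
        System.out.println(index.searchExact("Lucene"));      // [1]
        System.out.println(index.searchIgnoreCase("LUCENE")); // [1, 2]
    }
}
```

The point to carry over is that the query string has to be folded with the same rule as the indexed values, otherwise the two sides will not meet.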


--
Ian.


On Fri, Sep 19, 2014 at 2:50 PM, John Cecere <jo...@oracle.com> wrote:
> I've considered this, but there are two problems with it. First of all, it
> feels like I'm still taking up twice the storage, I'm just doing it using a
> single index rather than two of them. This doesn't sound like it's buying me
> anything.
>
> The second problem with this is simply that I haven't figured out how to do
> this. I assume in creating two fields you would implement two separate
> analyzers on them, one using LowerCaseFilter and the other not. I haven't
> made the connection on how to tie an Analyzer to a particular field. It
> seems to be tied to the IndexWriterConfig and the IndexWriter.
>
> Thanks,
> John
>
>
> On 9/19/14 9:36 AM, Paul Libbrecht wrote:
>>
>> two fields?
>>
>> paul

Re: Case sensitivity

Posted by John Cecere <jo...@oracle.com>.

I've considered this, but there are two problems with it. First of all, it feels like I'm still taking up twice the storage; I'm
just doing it in a single index rather than two of them. This doesn't sound like it's buying me anything.

The second problem is simply that I haven't figured out how to do it. I assume that in creating two fields you would
use two separate analyzers on them, one with LowerCaseFilter and the other without. I haven't made the connection on how to tie
an Analyzer to a particular field. It seems to be tied to the IndexWriterConfig and the IndexWriter.

Thanks,
John

On 9/19/14 9:36 AM, Paul Libbrecht wrote:
> two fields?
>
> paul

Re: Case sensitivity

Posted by Paul Libbrecht <pa...@hoplahup.net>.

two fields?

paul



Re: Case sensitivity

Posted by Michael Sokolov <ms...@safaribooksonline.com>.

On 9/19/2014 9:07 AM, John Cecere wrote:
> Is there a way to set up Lucene so that both case-sensitive and 
> case-insensitive searches can be done without having to generate two 
> indexes?
>
You might be interested in the discussion here:
https://issues.apache.org/jira/browse/LUCENE-5620 which addresses that
question. If you read it, you'll see a few different approaches based
on indexing both the lower-cased and the "original" variant of the same
term at the same position in a single field.
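
To make "same position" concrete, here is a small plain-Java sketch of a token stream that stacks a lower-cased copy on top of each original term. It is a toy model with made-up names, not code from the issue and not the Lucene API; in Lucene the equivalent of "same position" is a token whose position increment is 0. With this plain scheme an all-lower-case query term cannot tell "was written in lower case" from "was folded", and the sketch does not try to solve that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy token stream that stacks a lower-cased variant on the original's position. Hypothetical names. */
final class StackedTokensSketch {
    /** A term plus its position in the field (0-based word index). */
    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    /** Emits each word as written, plus a lower-cased copy at the same position when it differs. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) continue;
            tokens.add(new Token(word, position));
            String lower = word.toLowerCase(Locale.ROOT);
            if (!lower.equals(word)) {
                // Same position as the original, so phrase matching sees one word here.
                tokens.add(new Token(lower, position));
            }
            position++;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Apache@0, apache@0, Lucene@1, lucene@1, rocks@2]
        System.out.println(tokenize("Apache Lucene rocks"));
    }
}
```

Words that are already lower-case produce a single token, so the extra terms are only paid for where case actually carries information.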

-Mike
