Posted to solr-user@lucene.apache.org by Sébastien Lamy <la...@free.fr> on 2009/06/26 12:36:42 UTC

facets: case and accent insensitive sort

Hi!

When I ask solr for facets, with the parameter "facet.sort=index", it 
gives me the facets sorted alphabetically, but case and accent sensitive.

I found no way to have the facets returned with their original case and 
accents, yet sorted alphabetically without sensitivity to case and accents.

Is there anything I can do to achieve this without having to 
retrieve all the facet values and sort them myself? (We have fields with many, 
many facet values, and doing so hurts performance a lot.)

Sebastien.

Re: facets: case and accent insensitive sort

Posted by Michel Bottan <fr...@gmail.com>.
Hi Sébastien,

I've run into the same issue, but with range queries. Maybe this
will help you too.

I was trying to filter a query with a range such as "[ B TO F ]" that is case
and accent insensitive, while still getting the original case and accents back
in the results.

The solution was to NOT TOKENIZE the field, so that it yields a SINGLE token as
if it were a STRING field, and to index it without case and accents. The
"KeywordTokenizer" did the job: at query time the indexed value
(accent-free and lowercased) is used, but the stored value is
returned in the response.

As far as I know, facets use the indexed value for processing, but I'm not sure
which of the two (indexed or stored) is returned.

KeywordTokenizer is not clearly described in the Solr docs. See what Lucene says:
KeywordTokenizer - "Emits the entire input as a single token."

   <fieldType name="text_insensitive" class="solr.TextField"
              positionIncrementGap="100">
     <analyzer type="index">
       <tokenizer class="solr.KeywordTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.ISOLatin1AccentFilterFactory"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.KeywordTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.ISOLatin1AccentFilterFactory"/>
     </analyzer>
   </fieldType>
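
To make the effect of this analyzer chain concrete, here is a rough Python sketch of what ends up as the indexed term: the whole input kept as one token, lowercased, with accents dropped. It is only an approximation. The function name `fold` is made up for illustration, and Unicode decomposition is an assumption standing in for the accent filter, which may fold some characters (ligatures, for example) differently.

```python
import unicodedata


def fold(value: str) -> str:
    """Approximate the indexed term: one token, lowercased, accents removed."""
    # Keep the whole input as a single token (no splitting on whitespace),
    # lowercase it, then decompose accented letters into base + combining mark.
    decomposed = unicodedata.normalize("NFKD", value.lower())
    # Drop the combining marks so that only the base letters remain.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(fold("Économie Générale"))
```

The original string is untouched by this; only the folded form would be used for matching and ordering, which is why the stored value still comes back with its case and accents.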


Cheers,
Michel Bottan

On Mon, Jun 29, 2009 at 10:17 AM, Sébastien Lamy <la...@free.fr> wrote:

> Thanks for your reply. I will have a look at this.
>
> Peter Wolanin wrote:
>
>  Seems like this might be approached using a Lucene payload?  For
>> example where the original string is stored as the payload and
>> available in the returned facets for display purposes?
>>
>> Payloads are byte arrays stored with Terms on Fields. See
>> https://issues.apache.org/jira/browse/LUCENE-755
>>
>> Solr seems to have support for a few example payloads already like
>> NumericPayloadTokenFilter
>>
>> Almost any way you approach this it seems like there are potentially
>> problems since you might have multiple combinations of case and accent
>> mapping to the same case-less accent-less value that you want to use
>> for sorting (and I assume for counting) your facets?
>>
>> -Peter
>>
>> On Fri, Jun 26, 2009 at 9:02 AM, Sébastien Lamy<la...@free.fr> wrote:
>>
>>
>>> Shalin Shekhar Mangar wrote:
>>>
>>>
>>>> On Fri, Jun 26, 2009 at 6:02 PM, Sébastien Lamy <la...@free.fr>
>>>> wrote:
>>>>
>>>>
>>>>
>>>>
>>>>> If I use a copyField to store into a string type, and facet on that, my
>>>>> problem remains:
>>>>> The facets are sorted case and accent sensitive. And I want an
>>>>> *insensitive* sort.
>>>>> If I use a copyField to store into a type with no accents and case (e.g.
>>>>> alphaOnlySort), then Solr returns facet values with no accents and no
>>>>> case. And I want the facet values returned by Solr to *have accents and
>>>>> case*.
>>>>>
>>>>>
>>>>>
>>>> Ah, of course you are right. There is no way to do this right now except
>>>> at the client side.
>>>>
>>>>
>>>>
>>> Thank you for your response.
>>> Would it be easy to modify Solr to behave the way I want? Where should I
>>> start to investigate?

Re: facets: case and accent insensitive sort

Posted by Sébastien Lamy <la...@free.fr>.
Thanks for your reply. I will have a look at this.

Peter Wolanin wrote:
> Seems like this might be approached using a Lucene payload?  For
> example where the original string is stored as the payload and
> available in the returned facets for display purposes?
>
> Payloads are byte arrays stored with Terms on Fields. See
> https://issues.apache.org/jira/browse/LUCENE-755
>
> Solr seems to have support for a few example payloads already like
> NumericPayloadTokenFilter
>
> Almost any way you approach this it seems like there are potentially
> problems since you might have multiple combinations of case and accent
> mapping to the same case-less accent-less value that you want to use
> for sorting (and I assume for counting) your facets?
>
> -Peter
>
> On Fri, Jun 26, 2009 at 9:02 AM, Sébastien Lamy<la...@free.fr> wrote:
>   
>> Shalin Shekhar Mangar wrote:
>>     
>>> On Fri, Jun 26, 2009 at 6:02 PM, Sébastien Lamy <la...@free.fr> wrote:
>>>
>>>
>>>       
>>>> If I use a copyField to store into a string type, and facet on that, my
>>>> problem remains:
>>>> The facets are sorted case and accent sensitive. And I want an
>>>> *insensitive* sort.
>>>> If I use a copyField to store into a type with no accents and case (e.g.
>>>> alphaOnlySort), then Solr returns facet values with no accents and no
>>>> case. And I want the facet values returned by Solr to *have accents and
>>>> case*.
>>>>
>>>>         
>>> Ah, of course you are right. There is no way to do this right now except
>>> at the client side.
>>>
>>>       
>> Thank you for your response.
>> Would it be easy to modify Solr to behave the way I want? Where should I start
>> to investigate?


Re: facets: case and accent insensitive sort

Posted by Peter Wolanin <pe...@acquia.com>.
Seems like this might be approached using a Lucene payload?  For
example where the original string is stored as the payload and
available in the returned facets for display purposes?

Payloads are byte arrays stored with Terms on Fields. See
https://issues.apache.org/jira/browse/LUCENE-755

Solr seems to have support for a few example payloads already like
NumericPayloadTokenFilter

Almost any way you approach this, there are potential problems, since
several combinations of case and accents can map to the same case-less,
accent-less value that you want to use for sorting (and, I assume, for
counting) your facets.

-Peter
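
The collision concern above can be illustrated with a small Python sketch. It is not Solr code; the names `fold` and `merge_by_folded_key` and the sample values are made up for illustration, and accent removal via Unicode decomposition is an assumption. It groups facet values by their folded form, so that variants differing only in case or accents share one key, one summed count, and a list of the original spellings to choose a display value from.

```python
import unicodedata
from collections import defaultdict


def fold(value: str) -> str:
    """Lowercase and strip accents (an approximation of index-time folding)."""
    decomposed = unicodedata.normalize("NFKD", value.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def merge_by_folded_key(facets):
    """Group (value, count) pairs by folded key.

    Returns a dict mapping each folded key to the original variants seen
    (in input order) and the total count across those variants.
    """
    merged = defaultdict(lambda: {"variants": [], "count": 0})
    for value, count in facets:
        entry = merged[fold(value)]
        entry["variants"].append(value)
        entry["count"] += count
    return dict(merged)


if __name__ == "__main__":
    sample = [("Été", 3), ("ete", 2), ("Hiver", 1)]
    for key, entry in merge_by_folded_key(sample).items():
        print(key, entry["variants"], entry["count"])
```

Whether colliding variants should be merged like this or kept as separate facet entries that merely sort next to each other is a design decision the thread leaves open.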

On Fri, Jun 26, 2009 at 9:02 AM, Sébastien Lamy<la...@free.fr> wrote:
> Shalin Shekhar Mangar wrote:
>>
>> On Fri, Jun 26, 2009 at 6:02 PM, Sébastien Lamy <la...@free.fr> wrote:
>>
>>
>>>
>>> If I use a copyField to store into a string type, and facet on that, my
>>> problem remains:
>>> The facets are sorted case and accent sensitive. And I want an
>>> *insensitive* sort.
>>> If I use a copyField to store into a type with no accents and case (e.g.
>>> alphaOnlySort), then Solr returns facet values with no accents and no
>>> case. And I want the facet values returned by Solr to *have accents and
>>> case*.
>>>
>>
>> Ah, of course you are right. There is no way to do this right now except
>> at the client side.
>>
>
> Thank you for your response.
> Would it be easy to modify Solr to behave the way I want? Where should I start
> to investigate?
>



-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wolanin@acquia.com

Re: facets: case and accent insensitive sort

Posted by Sébastien Lamy <la...@free.fr>.
Shalin Shekhar Mangar wrote:
> On Fri, Jun 26, 2009 at 6:02 PM, Sébastien Lamy <la...@free.fr> wrote:
>
>   
>> If I use a copyField to store into a string type, and facet on that, my
>> problem remains:
>> The facets are sorted case and accent sensitive. And I want an
>> *insensitive* sort.
>> If I use a copyField to store into a type with no accents and case (e.g.
>> alphaOnlySort), then Solr returns facet values with no accents and no
>> case. And I want the facet values returned by Solr to *have accents and
>> case*.
>>     
> Ah, of course you are right. There is no way to do this right now except at
> the client side.
>   
Thank you for your response.
Would it be easy to modify Solr to behave the way I want? Where should I 
start to investigate?

Re: facets: case and accent insensitive sort

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Fri, Jun 26, 2009 at 6:02 PM, Sébastien Lamy <la...@free.fr> wrote:

>
>>
> If I use a copyField to store into a string type, and facet on that, my
> problem remains:
> The facets are sorted case and accent sensitive. And I want an
> *insensitive* sort.
> If I use a copyField to store into a type with no accents and case (e.g.
> alphaOnlySort), then Solr returns facet values with no accents and no
> case. And I want the facet values returned by Solr to *have accents and
> case*.
>
>
Ah, of course you are right. There is no way to do this right now except at
the client side.

-- 
Regards,
Shalin Shekhar Mangar.
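
For reference, the client-side workaround mentioned above could look roughly like the Python sketch below. It assumes the facet values arrive as a flat list alternating value and count (as in a JSON response using the flat named-list layout) and that accent folding by Unicode decomposition is acceptable; the function names and sample data are made up for illustration.

```python
import unicodedata


def fold(value: str) -> str:
    """Lowercase and strip accents to build a case/accent-insensitive sort key."""
    decomposed = unicodedata.normalize("NFKD", value.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sort_facets_insensitive(flat):
    """Sort a flat [value, count, value, count, ...] facet list.

    Values keep their original case and accents; only the sort key is folded.
    The original value is used as a tie-breaker so the order is deterministic.
    """
    pairs = list(zip(flat[0::2], flat[1::2]))
    return sorted(pairs, key=lambda pair: (fold(pair[0]), pair[0]))


if __name__ == "__main__":
    facets = ["Zebra", 1, "apple", 4, "Éclair", 2, "banana", 3]
    for value, count in sort_facets_insensitive(facets):
        print(value, count)
```

The drawback is the one raised in the original question: the client needs the complete list of values for the field (for example by requesting it with facet.limit=-1) before it can sort, which is expensive for fields with very many values.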

Re: facets: case and accent insensitive sort

Posted by Sébastien Lamy <la...@free.fr>.
Shalin Shekhar Mangar wrote:
> On Fri, Jun 26, 2009 at 4:06 PM, Sébastien Lamy <la...@free.fr> wrote:
>
>   
>> Hi!
>>
>> When I ask solr for facets, with the parameter "facet.sort=index", it gives
>> me the facets sorted alphabetically, but case and accent sensitive.
>>
>> I found no way to have the facets returned with their original case and
>> accents, yet sorted alphabetically without sensitivity to case and accents.
>>
>> Is there anything I can do to achieve this without having to retrieve
>> all the facet values and sort them myself? (We have fields with many, many
>> facet values, and doing so hurts performance a lot.)
>>
>>     
>
> Faceting is done on indexed values, so if your indexed values keep their
> original case and accents, they will be sorted accordingly. You could use a
> copyField to copy these values into a string type and facet on that.
>
>   
If I use a copyField to store into a string type, and facet on that, my 
problem remains:
The facets are sorted case and accent sensitive. And I want an 
*insensitive* sort.
If I use a copyField to store into a type with no accents and case (e.g. 
alphaOnlySort), then Solr returns facet values with no accents and no 
case. And I want the facet values returned by Solr to *have accents and 
case*.


Re: facets: case and accent insensitive sort

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Fri, Jun 26, 2009 at 4:06 PM, Sébastien Lamy <la...@free.fr> wrote:

> Hi!
>
> When I ask solr for facets, with the parameter "facet.sort=index", it gives
> me the facets sorted alphabetically, but case and accent sensitive.
>
> I found no way to have the facets returned with their original case and
> accents, yet sorted alphabetically without sensitivity to case and accents.
>
> Is there anything I can do to achieve this without having to retrieve
> all the facet values and sort them myself? (We have fields with many, many
> facet values, and doing so hurts performance a lot.)
>

Faceting is done on indexed values, so if your indexed values keep their
original case and accents, they will be sorted accordingly. You could use a
copyField to copy these values into a string type and facet on that.

-- 
Regards,
Shalin Shekhar Mangar.