Posted to solr-user@lucene.apache.org by "Sethi, Parampreet" <pa...@teamaol.com> on 2010/10/08 17:33:02 UTC

Accented Search in Solr

Hi All,

I am using Solr 1.3 in my project and wanted to know whether there is any other way to make the two queries below return the same results:

 Gruyère-and-Zucchini
 Gruyere-and-Zucchini

The first query has accented characters in it. I was going through the Solr tokenizers and filter factories documentation, and there is a filter factory listed, "solr.ISOLatin1AccentFilterFactory", that can be used to replace accented characters with their non-accented counterparts.
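As an illustration of what accent folding does (not how Solr implements it), the sketch below uses Python's standard `unicodedata` module: it decomposes each character with NFD and drops the resulting combining marks. `fold_accents` is a hypothetical helper name. `ISOLatin1AccentFilter` works from its own fixed table of Latin-1 characters instead, so the two approaches can differ on characters such as `Æ` or `ß`, which have no Unicode decomposition and pass through this sketch unchanged.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Hypothetical helper: strip diacritics via Unicode NFD decomposition.

    NFD splits a character like 'è' into 'e' followed by a combining grave
    accent; dropping characters with a nonzero combining class leaves the
    unaccented base letter.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Both spellings from the question fold to the same string.
print(fold_accents("Gruyère-and-Zucchini"))  # Gruyere-and-Zucchini
print(fold_accents("Gruyere-and-Zucchini"))  # Gruyere-and-Zucchini
```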

Is there any other way to do this search that is independent of how the data is stored (whether in accented or non-accented form)?

Thanks for the help.

Regards,
param

Re: Accented Search in Solr

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Param,

Note that the original value will be stored even if ISOLatin1AccentFilter
removes the accent for indexing / matching purposes.
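A toy Python model of the stored-versus-indexed split, assuming one field that keeps its raw value for display and a separate list of analysed terms for matching. The `Field` class and `fold_accents` helper are hypothetical and only mimic the idea; a crude split on `-` stands in for a real tokenizer.

```python
import unicodedata
from dataclasses import dataclass, field


def fold_accents(text: str) -> str:
    """Hypothetical helper: drop combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


@dataclass
class Field:
    """Toy stand-in for a field that is both stored and indexed."""

    stored: str  # raw value, returned verbatim in search results
    indexed: list[str] = field(default_factory=list)  # terms used for matching

    @classmethod
    def from_value(cls, value: str) -> "Field":
        # Analysis (split on '-', lowercase, fold accents) only produces the
        # indexed terms; the stored copy is kept exactly as it was given.
        terms = [fold_accents(t.lower()) for t in value.split("-")]
        return cls(stored=value, indexed=terms)


doc = Field.from_value("Gruyère-and-Zucchini")
print(doc.stored)   # Gruyère-and-Zucchini
print(doc.indexed)  # ['gruyere', 'and', 'zucchini']
```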

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

Re: Accented Search in Solr

Posted by Erick Erickson <er...@gmail.com>.
Not that I know of. Do note that whether or not the accent filter is active
at query time MUST match the index-time analysis. In other words, if you
index with the filter but search without it, or vice versa, you won't get
the results you expect.
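To make the index/query symmetry concrete, here is a toy Python sketch (hypothetical names, not Solr's API) that analyses a document and a query with or without accent folding and reports a match only when every query term is among the document's terms, a simplification of real query parsing.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Hypothetical helper: drop combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str, fold: bool) -> set[str]:
    """Toy analysis chain: split on '-', lowercase, optionally fold accents."""
    terms = (t.lower() for t in text.split("-"))
    return {fold_accents(t) if fold else t for t in terms}


def matches(doc: str, query: str, fold_index: bool, fold_query: bool) -> bool:
    """True if every query term appears among the document's indexed terms."""
    return analyze(query, fold_query) <= analyze(doc, fold_index)


doc = "Gruyère-and-Zucchini"
for q in ("Gruyère-and-Zucchini", "Gruyere-and-Zucchini"):
    print(
        q,
        matches(doc, q, fold_index=True, fold_query=True),   # same chain
        matches(doc, q, fold_index=True, fold_query=False),  # query skips filter
        matches(doc, q, fold_index=False, fold_query=True),  # index skipped filter
    )
```

With the same chain on both sides, both spellings match. If folding is skipped at query time, only the unaccented query still matches; if it was skipped at index time, neither query matches the accented document.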

Also note that, no matter what, the original text (without the filter
applied) is what's *stored*, untokenized. This is entirely independent of
what's *indexed*, even though both options are specified for the same field.

If this is irrelevant, what are you really trying to accomplish? This may be
an "XY" problem; see:
http://people.apache.org/~hossman/#xyproblem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341

Erick

Re: Accented Search in Solr

Posted by Chris Hostetter <ho...@fucit.org>.
: Subject: Accented Search in Solr
: References: <AA...@mail.gmail.com>
: In-Reply-To: <AA...@mail.gmail.com>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to
an existing message; instead, start a fresh email. Even if you change the
subject line of your email, other mail headers still track which thread
you replied to, and your question is "hidden" in that thread and gets less
attention. It also makes following discussions in the mailing list archives
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking



-Hoss