Posted to solr-user@lucene.apache.org by Angel Ice <lb...@yahoo.fr> on 2009/10/06 10:32:57 UTC

Re : wildcard searches

Hi.

Thanks for your answers Christian and Avlesh.

But I don't understand what you mean by:
"If you want to enable wildcard queries, preserving the original token (while processing each token in your filter) might work."

Could you explain this point, please?

Laurent





________________________________
From: Avlesh Singh <av...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Monday, 5 October 2009, 20:30:54
Subject: Re: wildcard searches

Zambrano is right, Laurent. The analyzers for a field are not invoked for
wildcard queries. Your custom filter is not even executed at query time.
If you want to enable wildcard queries, preserving the original token (while
processing each token in your filter) might work.
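
A minimal sketch of the behaviour described above, as a toy model in plain Python rather than Solr code. The three step functions are simplified stand-ins for the filters named in the question, and the lowercasing step is an assumption. It shows ordinary terms going through the chain while a wildcard prefix is compared verbatim against the indexed term.

```python
import unicodedata

def strip_mute_h(token):
    # Stand-in for the home-made filter: drop a leading "h".
    return token[1:] if token.startswith("h") else token

def strip_accents(token):
    # Stand-in for the accent-removal filter.
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def toy_stem(token):
    # Crude stand-in for the stemmer: it only strips a trailing "ation".
    return token[:-5] if token.endswith("ation") else token

def analyze(text):
    # Assumed chain: lowercase, then the three filters in order.
    token = text.lower()
    for step in (strip_mute_h, strip_accents, toy_stem):
        token = step(token)
    return token

# Index-time analysis turns "Hésitation" into "esit".
index = {analyze("Hésitation")}

def term_query(word):
    # Ordinary queries go through the same analysis chain.
    return analyze(word) in index

def prefix_query(pattern):
    # Wildcard queries skip analysis: the prefix is used as typed.
    prefix = pattern.rstrip("*")
    return any(term.startswith(prefix) for term in index)

print(term_query("hesitation"), prefix_query("hésita*"), prefix_query("esi*"))
```

In this model the term query and "esi*" match, while "hésita*" does not, because the indexed term never contains the "h" or the accent.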

Cheers
Avlesh

On Mon, Oct 5, 2009 at 10:39 PM, Angel Ice <lb...@yahoo.fr> wrote:

> Hi everyone,
>
> I have a small question about how the search engine behaves when a wildcard
> character is used in the query.
> Take the following example:
>
> - I indexed the word Hésitation (with an accent on the "e").
> - The filters applied to the field that handles this word result in
> "esit" being indexed: the mute H is removed (home-made filter), so is
> the accent (IsoLatin1Filter), and the SnowballPorterFilter removes the
> "ation".
>
> When I search for "hesitation", "esitation", "ésitation", etc., all is OK:
> the document is returned.
> But as soon as I use a wildcard, as in "hésita*", the document is not
> returned. In fact, I have to write the wildcard query so that its prefix
> matches the start of the indexed term exactly (for example "esi*").
>
> Does the search engine apply the filters to the word that precedes the
> wildcard? Or does it use this prefix verbatim?
>
> Thanks for your help.
>
> Laurent
>
>
>
>



      

Re: Re : Re : wildcard searches

Posted by Avlesh Singh <av...@gmail.com>.
You are right, Angel. The problem would still persist.
Why don't you consider putting the original data in a separate field? When
querying, you can query both fields: the analyzed one and the original one.
Wildcard queries will not give you any results from the analyzed field but
would match the data in your original field.
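
A rough sketch of the two-field idea, again as a toy Python model rather than Solr configuration. The field names "text" and "text_orig" are hypothetical, the analysis chain is a simplified stand-in, and lowercasing the original field is an assumption made so that a lowercase prefix can match "Hésitation".

```python
import unicodedata

def analyze(text):
    # Toy version of the analyzed field's chain: lowercase, drop a leading
    # "h", strip accents, cut a trailing "ation". Not the real filters.
    token = text.lower()
    if token.startswith("h"):
        token = token[1:]
    token = "".join(c for c in unicodedata.normalize("NFD", token)
                    if not unicodedata.combining(c))
    return token[:-5] if token.endswith("ation") else token

def index_document(doc_id, text, index):
    # "text" is fully analyzed; "text_orig" is only lowercased (assumption).
    index["text"].setdefault(analyze(text), set()).add(doc_id)
    index["text_orig"].setdefault(text.lower(), set()).add(doc_id)

def search(query, index):
    hits = set()
    if query.endswith("*"):
        prefix = query[:-1]  # used verbatim, as in a wildcard query
        for field in ("text", "text_orig"):
            for term, ids in index[field].items():
                if term.startswith(prefix):
                    hits |= ids
    else:
        # Ordinary terms are analyzed for "text" and lowercased for "text_orig".
        hits |= index["text"].get(analyze(query), set())
        hits |= index["text_orig"].get(query.lower(), set())
    return hits

index = {"text": {}, "text_orig": {}}
index_document(1, "Hésitation", index)
print(search("hésita*", index), search("hesitation", index))
```

In this model "hésita*" finds the document through the original field, while an accent-less wildcard such as "hesita*" finds it through neither field.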

Does that work?

Cheers
Avlesh


Re : Re : wildcard searches

Posted by Angel Ice <lb...@yahoo.fr>.
Ah yes, got it.
But I'm not sure this will solve my problem,
because I'm also using the IsoLatin1 filter, which removes accented characters.
So I will have the same problem with accented characters, because the original token is not kept by this filter.
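
A small sketch of this objection in plain Python, under stated assumptions: `strip_h_keep_original` is a hypothetical variant of the home-made filter that keeps the original token, `strip_accents` stands in for the accent-removal filter, and the stemmer is left out for brevity. Even with the original token kept, the accent filter that runs afterwards rewrites it.

```python
import unicodedata

def strip_accents(token):
    # Stand-in for the accent-removal filter.
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def strip_h_keep_original(token):
    # Hypothetical filter: emits the original token and the "h"-less one.
    return [token, token[1:]] if token.startswith("h") else [token]

def index_terms(word):
    # The accent filter runs after the "h" filter, on every token it emitted.
    return {strip_accents(t) for t in strip_h_keep_original(word.lower())}

def prefix_matches(pattern, terms):
    # Wildcard prefix compared verbatim against the indexed terms.
    prefix = pattern.rstrip("*")
    return any(term.startswith(prefix) for term in terms)

terms = index_terms("Hésitation")
print(sorted(terms))
print(prefix_matches("hésita*", terms), prefix_matches("hesita*", terms))
```

Here the indexed terms carry no accent, so "hésita*" still fails while "hesita*" succeeds.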

Laurent









      

Re: Re : wildcard searches

Posted by Avlesh Singh <av...@gmail.com>.
You are processing your tokens in the filter that you wrote. I am assuming
it is the first filter applied and that it removes the character 'h' from
tokens. While doing that, you can preserve the original token in the
same field as well. As of now, you are simply removing the
character, so subsequent filters don't even know that there was an 'h'
character in the original token.

Since wildcard queries are not analyzed, the 'h' character in the query
"hésita*" does NOT get removed at query time. This means that unless the
original token is preserved in the field, the query won't find any matches.
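
A minimal sketch of what preserving the original token could look like, written as a plain Python generator rather than a real Lucene TokenFilter. The function name and the `preserve_original` flag are hypothetical, and any filters that would run afterwards in the chain are deliberately left out.

```python
def mute_h_filter(tokens, preserve_original=True):
    # Hypothetical token filter: emits each token with its leading "h"
    # removed and, optionally, the untouched token as well.
    for token in tokens:
        if token.startswith("h"):
            if preserve_original:
                yield token
            yield token[1:]
        else:
            yield token

def prefix_matches(pattern, terms):
    # Wildcard prefix compared verbatim against the indexed terms.
    prefix = pattern.rstrip("*")
    return any(term.startswith(prefix) for term in terms)

with_original = set(mute_h_filter(["hésitation"]))
without_original = set(mute_h_filter(["hésitation"], preserve_original=False))

print(prefix_matches("hésita*", with_original))
print(prefix_matches("hésita*", without_original))
```

With the original token kept, the unanalyzed prefix "hésita" has something to match; without it, only "ésitation" is present and the query finds nothing.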

Does this help?

Cheers
Avlesh
