Posted to solr-user@lucene.apache.org by maurizio1976 <ma...@gmail.com> on 2012/06/29 15:21:11 UTC

Wildcard searches with leading and ending wildcard

Hi all,
I've been searching for an answer to this everywhere but I can never find an
answer that is perfect for my case, so I'll ask this myself.

I'm on Solr 3.6.
I use the ReversedWildcardFilterFactory on a field containing a
telephone number, so there is only one word to index: no phrases, no
strange tokens.
To be more exact:

<filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
        maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>

I can check with Luke that two words are being indexed, one the reverse of
the other. Perfect.

I can run a query like Num:*1234, which matches docs ending with 1234,
and a query like Num:1234*, which matches docs starting with 1234.

But this is the question that everybody seems to be asking:
is there any way to run a query that matches records that "contain" the
value 1234?

If I write Num:*1234*, it matches docs containing 1234 but also docs
containing 4321, which is wrong. In other words, the queries Num:*4321*
and Num:*1234* return exactly the same result.

Is this the wrong approach? Has anybody tried the n-gram solution to this
problem?
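
For readers unfamiliar with the n-gram idea asked about here, the following is a minimal Python sketch of the concept only, not Solr's implementation. The function names and the sample numbers are made up for illustration. Every fixed-length substring of each number is indexed, so a "contains" search becomes an exact term lookup and needs no wildcard at all:

```python
def ngrams(term, n):
    """Return the set of all contiguous substrings of length n in term."""
    return {term[i:i + n] for i in range(len(term) - n + 1)}


def build_ngram_index(numbers, n):
    """Map each n-gram to the set of numbers that contain it."""
    index = {}
    for number in numbers:
        for gram in ngrams(number, n):
            index.setdefault(gram, set()).add(number)
    return index


def contains(index, fragment):
    """Exact lookup; the fragment length must equal the gram size used."""
    return index.get(fragment, set())


# Hypothetical sample data: two numbers contain 1234, one contains 4321.
numbers = ["5551234000", "1234567890", "9994321000"]
index = build_ngram_index(numbers, 4)
matches_1234 = contains(index, "1234")
matches_4321 = contains(index, "4321")
```

In this sketch the lookups for 1234 and 4321 hit different index entries, so the two queries no longer return the same result. The cost is a larger index, since each number contributes many grams.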

thanks very much
Maurizio


Re: Wildcard searches with leading and ending wildcard

Posted by Erick Erickson <er...@gmail.com>.

For substring searches, n-grams are generally preferred. To expand
on Jack's point:

The whole purpose behind reversed wildcards is that without them, searching for
*abcd requires that _every_ term in your field be enumerated, which can be very
expensive. Indexing the reversed terms turns this into a trailing wildcard
against the reversed form, and enumerating dcba* is much easier and less costly.
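
The effect can be sketched in a few lines of Python. This is only an illustration of the idea, assuming a sorted in-memory term list; the helper name prefix_range and the sample terms are hypothetical, and Solr's actual term dictionary and reversed-token encoding differ in detail:

```python
from bisect import bisect_left


def prefix_range(sorted_terms, prefix):
    """Return the terms starting with prefix, seeking with binary search."""
    start = bisect_left(sorted_terms, prefix)
    end = start
    # Matching terms are contiguous in a sorted list, so stop at the first miss.
    while end < len(sorted_terms) and sorted_terms[end].startswith(prefix):
        end += 1
    return sorted_terms[start:end]


# Hypothetical sample terms.
terms = ["abcd", "xabcd", "abcdx", "zzzz"]

# Index the reversed form of every term, kept in sorted order.
reversed_terms = sorted(t[::-1] for t in terms)

# The leading-wildcard query *abcd becomes the prefix query dcba*
# against the reversed terms; hits are reversed back for display.
hits = [t[::-1] for t in prefix_range(reversed_terms, "abcd"[::-1])]
```

Only the terms that share the prefix dcba are visited, instead of every term in the field.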

Best
Erick


Re: Wildcard searches with leading and ending wildcard

Posted by Jack Krupansky <ja...@basetechnology.com>.

I think a double-ended wildcard essentially defeats the whole point of the
reverse wildcard filter, which is to improve performance by avoiding a
leading wildcard. So, if your data is such that a leading wildcard is okay,
just use normal wildcards to begin with.
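
A rough way to see this, as a Python sketch with hypothetical helper names (it models the reasoning only, not how Solr chooses a query plan): a term dictionary can only seek on the literal characters that come before the first wildcard, either in the pattern itself or in its reversed form. A double-ended pattern has no such characters in either direction.

```python
def literal_prefix(pattern):
    """Return the literal characters before the first wildcard in pattern."""
    for i, ch in enumerate(pattern):
        if ch in "*?":
            return pattern[:i]
    return pattern


def best_seek_prefix(pattern):
    """Pick the longer literal prefix of the pattern or its reversal.

    An empty result means there is nothing to seek on, so every term
    would have to be examined.
    """
    forward = literal_prefix(pattern)
    backward = literal_prefix(pattern[::-1])
    return max(forward, backward, key=len)


trailing = best_seek_prefix("1234*")      # seek on the original terms
leading = best_seek_prefix("*1234")       # seek on the reversed terms
double_ended = best_seek_prefix("*1234*") # nothing to seek on
```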

-- Jack Krupansky
