Posted to solr-user@lucene.apache.org by Joel Nylund <jn...@yahoo.com> on 2009/11/25 01:51:31 UTC

how to do partial word searches?

Hi, I saw some older postings on this, but didn't see a resolution.

I have a field called title, and I would like to be able to find partial
word matches within the title.

For example:

http://localhost:8983/solr/select?q=textTitle:%22*sulli*%22

I would expect it to find:
<str name="textTitle">the daily dish | by andrew sullivan</str>

but it doesn't. It does find sully (which is fine with me as a
bonus), but it doesn't seem to match any partial words. Oddly
enough, before I lowercased the title, the wildcard matching seemed to
work a bit better; it just didn't deal with case-sensitive queries.

At first I had mixed-case titles, and I read that wildcards don't
work with mixed case, so I created another field called "textTitle" that
holds a lowercased version of the title. It is of type text.

Is it possible with Solr to achieve what I am trying to do, and if so, how?
If not, is there anything closer than what I have?

thanks
Joel


Re: how to do partial word searches?

Posted by Rob Ganly <ro...@daft.ie>.
hi all,

i was having the same problem: i needed to be able to search for a substring
anywhere within a word for a specific field. i used
NGramTokenizerFactory in my index analyzer and it seems to work well
(http://lucene.apache.org/solr/api/org/apache/solr/analysis/NGramTokenizerFactory.html).

i created a new field type based on this definition:
http://coderrr.wordpress.com/category/solr/#ngram_schema_xml

apparently it will increase the size of your index and perhaps the indexing
time, but it is working fine at the moment (although i'm currently only using a
testbed of 20,000 records). i will report back if i discover any painful
issues with scaling up!
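
To get a rough feel for why n-gram indexing grows the index, here is a small Python sketch that counts character n-grams for one title. It is only an illustration, not Solr's tokenizer: the gram sizes 3 to 5 and the helper name `all_ngrams` are assumptions, and treating the whole title as a single string is a simplification.

```python
def all_ngrams(text, min_gram, max_gram):
    """Return every character n-gram of text for sizes min_gram..max_gram."""
    grams = []
    for n in range(min_gram, max_gram + 1):
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams


title = "the daily dish | by andrew sullivan"
grams = all_ngrams(title, 3, 5)  # gram sizes are hypothetical

# Compare how many terms each approach would produce for this one title.
print(len(title.split()), "whitespace tokens vs", len(grams), "n-grams")

# A substring of the right length is itself an indexed term.
print("sulli" in grams)
```

The trade-off is that a substring lookup becomes a plain term lookup, at the cost of many more terms per document.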

rob ganly

On 3 December 2009 18:21, Joel Nylund <jn...@yahoo.com> wrote:

> Just for an update on this, I tried text_rev and it seems to work great.
>
> So in summary, if you want partial word matches within a url or small
> sentence (title), here is what I did and it seems to work pretty well:
>
> - create an extra field that is all lowercase (I used MySQL's LCASE in the
> query for DIH)
> - make that field use the text_rev type in schema.xml
> - make the query "sulli OR *sulli*" (the *sulli* doesn't seem to match
> sulli if it's at the end of the field)
>
> thanks
> Joel

Re: how to do partial word searches?

Posted by Joel Nylund <jn...@yahoo.com>.
Just for an update on this, I tried text_rev and it seems to work great.

So in summary, if you want partial word matches within a URL or small
sentence (title), here is what I did and it seems to work pretty well:

- create an extra field that is all lowercase (I used MySQL's LCASE in
the query for DIH)
- make that field use the text_rev type in schema.xml
- make the query "sulli OR *sulli*" (the *sulli* doesn't seem to
match sulli if it's at the end of the field)
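
The query step above could be built on the client side roughly like this. This is a minimal Python sketch under some assumptions: the helper name `partial_word_url` is made up, the field-grouping form `textTitle:(... OR ...)` is one way to apply both clauses to the same field, and the fragment is lowercased in the client because the indexed field is all lowercase.

```python
from urllib.parse import urlencode


def partial_word_url(base, field, fragment):
    """Build a select URL that ORs an exact term with a *fragment* wildcard."""
    term = fragment.lower()  # the indexed field is lowercased, so match it
    q = f"{field}:({term} OR *{term}*)"
    return f"{base}?{urlencode({'q': q})}"


# Base URL and field name are taken from the original question.
url = partial_word_url("http://localhost:8983/solr/select", "textTitle", "Sulli")
print(url)
```

Letting `urlencode` escape the query avoids hand-writing sequences such as `%22` in the URL.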

thanks
Joel



On Nov 25, 2009, at 9:21 AM, Robert Muir wrote:

> Hi, if you are using Solr 1.4 I think you might want to try type  
> text_rev
> (look in the example schema.xml)
>
> unless i am mistaken:
>
> this will enable leading wildcard support for that field.
> this doesn't do any stemming, which I think might be making your wildcards
> behave weirdly.
> it also enables reverse wildcard support, so some of your substring  
> matches
> will be faster.
>
> -- 
> Robert Muir
> rcmuir@gmail.com


Re: how to do partial word searches?

Posted by Robert Muir <rc...@gmail.com>.
Hi, if you are using Solr 1.4 I think you might want to try type text_rev
(look in the example schema.xml)

unless i am mistaken:

this will enable leading wildcard support for that field.
text_rev doesn't do any stemming; I think the stemming in the text type might
be what is making your wildcards behave weirdly.
it also enables reverse wildcard support, so some of your substring matches
will be faster.
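
As a rough illustration of why reversed terms help with leading wildcards, the Python sketch below stores each token reversed, so that a pattern like *van becomes a cheap prefix scan for "nav". It is a simplification of the general idea, not Solr's actual implementation, and the function names are hypothetical.

```python
import bisect


def build_reversed_index(tokens):
    """Store each distinct token reversed, in sorted order."""
    return sorted(t[::-1] for t in set(tokens))


def leading_wildcard(reversed_index, suffix):
    """Find tokens ending with suffix via a prefix scan over reversed terms."""
    prefix = suffix[::-1]
    start = bisect.bisect_left(reversed_index, prefix)
    matches = []
    for term in reversed_index[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can share the prefix
        matches.append(term[::-1])
    return sorted(matches)


tokens = "the daily dish by andrew sullivan".split()
index = build_reversed_index(tokens)
print(leading_wildcard(index, "van"))
```

Without the reversed terms, a leading wildcard has to test every term in the field, which is why it is slow or disabled by default.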


-- 
Robert Muir
rcmuir@gmail.com

Re: how to do partial word searches?

Posted by Erick Erickson <er...@gmail.com>.
Confession: I haven't had occasion to use the n-gram tokenizers, but here's the
theory... And note that Solr has n-gram tokenizers available.

Using a 2-gram example for sullivan, the n-gram tokenizer would index these tokens:
su, ul, ll, li, iv, va, an. Then at query time in your example, sulli would be
broken up into su, ul, ll and li, which, when searched as a phrase,
would then match your field.
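
The 2-gram theory above can be sketched in a few lines of Python. This only mimics the idea (split into overlapping 2-grams, then look for the query grams as a consecutive run); the function names are assumptions and it does not call Solr.

```python
def ngrams(text, n=2):
    """Return the overlapping character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def phrase_match(indexed, query):
    """True if the query grams occur as a consecutive run in the indexed grams."""
    if not query:
        return False
    width = len(query)
    return any(indexed[i:i + width] == query
               for i in range(len(indexed) - width + 1))


indexed = ngrams("sullivan")  # grams stored at index time
query = ngrams("sulli")       # grams produced at query time
print(indexed)
print(query)
print(phrase_match(indexed, query))
```

Note that sully would not match sullivan here, because its last gram, ly, never appears among the indexed grams.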

The expense, of course, is that your index is larger (but surprisingly not as
much as you'd think). But your queries are much faster...

That's the theory anyway; the practice is "left as an exercise for the
reader" <G>

But "the folks" generously provided quite an explication of what wildcards are
all about on the *lucene* user's list; look for a thread titled
"I just don't get wildcards at all" from around 2006. It's a nice background for
what the underlying problem is. Some of the Solr tokenizers implement
some of these ideas, I think. And the state of the art has progressed considerably
since then, but the underlying issues are still there...

Sorry I can't be more help here..
Erick

On Wed, Nov 25, 2009 at 8:18 AM, Joel Nylund <jn...@yahoo.com> wrote:

> Hi Erick,
>
> thanks for the links, I read both of them and I still have no idea what to
> do; lots of back and forth, but I didn't see any solution in it.
>
> One person talked about indexing the field in reverse and doing an OR on
> it; this might work I guess.
>
> thanks
> Joel

Re: how to do partial word searches?

Posted by Joel Nylund <jn...@yahoo.com>.
Hi Erick,

thanks for the links, I read both of them and I still have no idea
what to do; lots of back and forth, but I didn't see any solution in it.

One person talked about indexing the field in reverse and doing an OR
on it; this might work I guess.

thanks
Joel


On Nov 24, 2009, at 9:12 PM, Erick Erickson wrote:

> copying from Erik Hatcher:
>
> See http://issues.apache.org/jira/browse/SOLR-218 - Solr currently
> does not have leading wildcard support enabled.
>
> There's a pretty extensive recent exchange on this, see the
> thread on the user's list titled
>
> "leading and trailing wildcard query"
>
> Best
> Erick


Re: how to do partial word searches?

Posted by Erick Erickson <er...@gmail.com>.
copying from Erik Hatcher:

See http://issues.apache.org/jira/browse/SOLR-218 - Solr currently
does not have leading wildcard support enabled.

There's a pretty extensive recent exchange on this, see the
thread on the user's list titled

"leading and trailing wildcard query"

Best
Erick
