Posted to solr-user@lucene.apache.org by Haagen Hasle <ha...@gmail.com> on 2012/12/10 13:24:25 UTC

Re: Wildcards and fuzzy/phonetic query

It's been two months since I asked about wildcards and phonetic filters, and finally the task of upgrading Solr to version 4.0 was prioritized in our project.  So the last couple of days I've been working on it.  Another team member upgraded Solr from 3.4 to 4.0, and I've been making changes to schema.xml to accommodate the new multiterm functionality.

However, it doesn't seem to work. Lowercasing is still not applied when I do a fuzzy search, neither through the regular index analyzer and its support for MultiTermAware components, nor when I define a dedicated multiterm analyzer.
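
For reference, a dedicated multiterm analyzer is declared as a third <analyzer> element inside the <fieldType>, roughly as in the sketch below. The type name "text_name" and the particular tokenizer and filters are assumptions chosen for illustration; this is not the actual schema.xml from the project.

```xml
<!-- Sketch only: the name "text_name" and the filter choices are hypothetical. -->
<fieldType name="text_name" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Used for multi-term queries such as wildcard and prefix queries.
       Every component here should emit exactly one token per input token,
       which is why KeywordTokenizerFactory is used instead of a splitting tokenizer. -->
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```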

Do I have to do anything special to enable the multiterm functionality in Solr 4.0?


Regards, 

Hågen

On 8 Oct 2012, at 18:09, Erick Erickson wrote:

> whether phonetic filters can be multiterm aware:
> 
> I'd be leery of this, as I don't really know how that would
> behave. You'd have to ensure that the algorithms changed the
> first parts of the words uniformly, regardless of what followed. I'm
> pretty sure that _some_ phonetic algorithms do not follow this
> pattern, e.g. eric wouldn't necessarily have the same beginning
> as erickson. That said, some of the algorithms _may_ follow this
> rule and might be OK candidates for being MultiTermAware.
> 
> But you don't need this in order to try it out. See the "Expert Level
> Schema Possibilities" section at:
> http://searchhub.org/dev/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
> 
> You can define your own analysis chain for wildcards as part of your
> <fieldType> definition and include whatever you want, whether or not
> it's MultiTermAware, and it will be applied at query time. Use the
> <analyzer type="query"> entry as a basis. _But_ you shouldn't include
> anything in this section that produces more than one output per input
> token. Note, "token", not "field". For example, a really bad candidate
> for this section is WordDelimiterFilterFactory. If you use the
> admin/analysis page (which you'll get to know intimately), look at a
> type that has WordDelimiterFilterFactory in its chain, and enter
> something like erickErickson1234, you'll see what I mean. Make sure to
> check the "verbose" box.
> 
> If you can determine that some of the phonetic algorithms _should_ be
> MultiTermAware, please feel free to raise a JIRA and we can discuss... I suspect
> it'll be on a case-by-case basis.
> 
> Best
> Erick
> 
> On Mon, Oct 8, 2012 at 11:21 AM, Hågen Pihlstrøm Hasle
> <ha...@gmail.com> wrote:
>> Hi!
>> 
>> I'm quite new to Solr. I was recently asked to help out on a project where the previous "Solr person" quit quite suddenly. I've noticed that some of our searches don't return the expected results, and I'm hoping you can help me out.
>> 
>> We've indexed a lot of names, and would like to search for a person in our system using these names. We previously used Oracle Text for this, and we find that Solr is much faster. So far so good! :) But when we try to use wildcards, things start to go wrong.
>> 
>> We're using Solr 3.4, and I see that some of our problems are solved in 3.6.  Ref SOLR-2438:
>> https://issues.apache.org/jira/browse/SOLR-2438
>> 
>> But we would also like to be able to combine wildcards with fuzzy searches, and wildcards with a phonetic filter.  I don't see anything about phonetic filters in SOLR-2438 or SOLR-2921.  (https://issues.apache.org/jira/browse/SOLR-2921)
>> Is it possible to make the phonetic filters MultiTermAware?
>> 
>> Regarding fuzzy queries, in Oracle Text I can search for "chr%" ("chr*" in Solr) and find both christian and kristian. As far as I understand, this is not possible in Solr, because WildcardQuery and FuzzyQuery cannot be combined. Is this correct, or have I misunderstood something? Are there any workarounds or filter combinations I can use to achieve the same result? I've seen people suggest using a boolean query to combine the two, but I don't really see how that would solve my "chr*" problem.
>> 
>> As I mentioned earlier, I'm quite new to this, so I apologize if my questions only show my ignorance.
>> 
>> 
>> Regards, Hågen


Re: Wildcards and fuzzy/phonetic query

Posted by Haagen Hasle <ha...@gmail.com>.
Thank you! I actually tried to look through Jira, but I didn't focus on the minor issues. For me, this is quite critical. :-)

Any chance of merging this into the 4.0.1 release?


Regards, Haagen

On 11 Dec 2012, at 12:45, Ahmet Arslan wrote:

>> Lowercasing actually seems to work with Wildcard queries,
>> but not with fuzzy queries.  Are there any reasons why
>> I should experience such a difference?
> 
> Hi Haagen,
> 
> Yonik added this recently. https://issues.apache.org/jira/browse/SOLR-4076
> 


Re: Wildcards and fuzzy/phonetic query

Posted by Ahmet Arslan <io...@yahoo.com>.
> Lowercasing actually seems to work with Wildcard queries,
> but not with fuzzy queries.  Are there any reasons why
> I should experience such a difference?

Hi Haagen,

Yonik added this recently. https://issues.apache.org/jira/browse/SOLR-4076


Re: Wildcards and fuzzy/phonetic query

Posted by Haagen Hasle <ha...@gmail.com>.
Lowercasing actually seems to work with Wildcard queries, but not with fuzzy queries.  Are there any reasons why I should experience such a difference?


Regards, Haagen

