You are viewing a plain text version of this content. The canonical link for it is here.

Posted to solr-user@lucene.apache.org by Erick Erickson <er...@gmail.com> on 2012/01/01 16:42:54 UTC

Re: Solr, SQL Server's LIKE

Chantal:

bq: The problem with the wildcard searches is that the input is not
analyzed.

As of 3.6/4.0, this is no longer entirely true. Some analysis is
performed for wildcard searches by default and you can
specify most anything you want if you really need to see:
https://issues.apache.org/jira/browse/SOLR-2438
and
http://wiki.apache.org/solr/MultitermQueryAnalysis

Best
Erick

On Fri, Dec 30, 2011 at 4:33 PM, Devon Baumgarten
<db...@nationalcorp.com> wrote:
> Hoss,
>
> Thanks. You've answered my question. To clarify, what I should have asked for instead of 'exact' was 'not fuzzy'. For some reason it didn't occur to me that I didn't need n-grams to use the wildcard. You asking for me to clarify what I meant made me realize that the n-grams are the source of all my current problems. :)
>
> Thanks!
>
> Devon Baumgarten
>
>
> -----Original Message-----
> From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
> Sent: Thursday, December 29, 2011 7:00 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Solr, SQL Server's LIKE
>
>
> : Thanks. I know I'll be able to utilize some of Solr's free text
> : searching capabilities in other search types in this project. The
> : product manager wants this particular search to exactly mimic LIKE%.
>        ...
> : Ex: If I search "Albatross" I want "Albert" to be excluded completely,
> : rather than having a low score.
>
> please be specific about the types of queries you want. ie: we need more
> then one example of the type of input you want to provide, the type of
> matches you want to see for that input, and the type of matches you want
> to get back.
>
> in your first message you said you need to match company titles "pretty
> exactly" but then seem to contradict yourself by saying the SQL's LIKE
> command fit's the bill -- even though the SQL LIKE command exists
> specificly for in-exact matches on field values.
>
> Based on your one example above of Albatross, you don't need anything
> special: don't use ngrams, don't use stemming, don't use fuzzy anything --
> just search for "Albatross" and it will match "Albatross" but not
> "Albert".  if you want "Albatross" to match "Albatross Road" use some
> basic tokenization.
>
> If all you really care about is prefix searching (which seems suggested by
> your "LIKE%" comment above, which i'm guessing is shorthand for something
> similar to "LIKE 'ABC%'"), so that queries like "abc" and "abcd" both
> match "abcdef" and "abcdzzzz" but neither of them match "xxxxabcdyyyy"
> then just use prefix queries (ie: "abcd*") -- they should be plenty
> efficient for your purposes.  you only need to worry about ngrams when you
> want to efficiently match in the middle of a string. (ie: "TITLE LIKE
> %ABC%")
>
>
> -Hoss

Re: Solr, SQL Server's LIKE

Posted by Chantal Ackermann <ch...@btelligent.de>.

Thanks, Erick! That sounds great. I really do have to upgrade.

Chantal


On Sun, 2012-01-01 at 16:42 +0100, Erick Erickson wrote:
> Chantal:
> 
> bq: The problem with the wildcard searches is that the input is not
> analyzed.
> 
> As of 3.6/4.0, this is no longer entirely true. Some analysis is
> performed for wildcard searches by default and you can
> specify most anything you want if you really need to see:
> https://issues.apache.org/jira/browse/SOLR-2438
> and
> http://wiki.apache.org/solr/MultitermQueryAnalysis
> 
> Best
> Erick