Posted to solr-user@lucene.apache.org by vsl <oc...@gmail.com> on 2013/04/25 11:33:00 UTC

Exact matching in Solr 3.6.1

Hi,
Is it possible to get only exact-match results when the search term is a
combined expression, e.g. "cats" AND London NOT Leeds?

In previous threads I have read that it is possible to create a new field
of String type and perform a phrase search on it, but none of them took a
combined search term like the one above into consideration.

BR
Pawel



--
View this message in context: http://lucene.472066.n3.nabble.com/Exact-matching-in-Solr-3-6-1-tp4058865.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Exact matching in Solr 3.6.1

Posted by Sandeep Mestry <sa...@gmail.com>.
Agree with Jack.

The current field type text_general is designed to match analyzed (stemmed)
query tokens rather than exact terms, so it cannot fulfil your requirement.

Can you use a flat file <http://wiki.apache.org/solr/FileBasedSpellChecker>
as the spell check dictionary instead? That way you can search on the
exact-match field while generating spell check suggestions from the file
instead of from the index.
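
A minimal sketch of such a file-based spell checker in solrconfig.xml,
assuming a dictionary file named spellings.txt (one term per line) in the
conf directory. The component name, the dictionary name "file" and the
index directory are illustrative choices, not taken from this thread:

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <lst name="spellchecker">
        <!-- build suggestions from a plain text file, not from an indexed field -->
        <str name="classname">solr.FileBasedSpellChecker</str>
        <str name="name">file</str>
        <!-- spellings.txt is an assumed file name -->
        <str name="sourceLocation">spellings.txt</str>
        <str name="characterEncoding">UTF-8</str>
        <str name="spellcheckIndexDir">./spellcheckerFile</str>
      </lst>
    </searchComponent>

A request would then select it with spellcheck=true&spellcheck.dictionary=file,
after the dictionary has been built once with spellcheck.build=true.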

-S


On 25 April 2013 16:25, Jack Krupansky <ja...@basetechnology.com> wrote:

> Well then just do an exact match ONLY!
>
> It sounds like you haven't worked out the inconsistencies in your
> requirements.

Re: Exact matching in Solr 3.6.1

Posted by Jack Krupansky <ja...@basetechnology.com>.
Well then just do an exact match ONLY!

It sounds like you haven't worked out the inconsistencies in your 
requirements.

To be clear: We're not offering you "solutions" - that's your job. We're 
only pointing out tools that you can use. It is up to you to utilize the 
tools wisely to implement your solution.

I suspect that you simply haven't experimented enough with various boosts to 
ensure that the unstemmed result consistently ranks higher.

Maybe you need a custom stemmer or a stemmer override so that "passengers" does 
get stemmed to "passenger", but "cats" does not (but "dogs" does). That is a 
choice you can make, but I would urge caution. Still, it is your decision; 
it's not a matter of Solr forcing or preventing you. I still think boosting 
an unstemmed field should be sufficient.
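
As a sketch of the override idea, assuming the stock
KeywordMarkerFilterFactory and a protwords.txt file that you maintain
yourself (one term per line, here containing only "cats"): terms marked as
keywords are skipped by the stemmer that follows.

    <!-- place in both the index and the query analyzer of the stemmed type -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- protwords.txt is an assumed file name; terms listed in it are not stemmed -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>

With this in place "passengers" would still stem to "passenger" while "cats"
stays "cats". Documents have to be reindexed after the change, and a query
for "cat" would then no longer match documents that contain only "cats",
which is one reason for the caution above.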

But until you clarify the inconsistencies in your requirements, we won't be 
able to make much progress.

-- Jack Krupansky

-----Original Message----- 
From: vsl
Sent: Thursday, April 25, 2013 10:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Exact matching in Solr 3.6.1

Thanks for your reply, but this solution does not fulfil my requirement,
because other documents (not exact matches) will be returned as well.





Re: Exact matching in Solr 3.6.1

Posted by vsl <oc...@gmail.com>.
Thanks for your reply, but this solution does not fulfil my requirement,
because other documents (not exact matches) will be returned as well.




Re: Exact matching in Solr 3.6.1

Posted by Majirus FANSI <ma...@gmail.com>.
Hi Pawel,
If you are searching on any field of type "text_general" as defined in your
schema, you are stuck with the Porter stemmer. With your settings, Solr
is not aware of a term like "cats", only "cat", so there is no way to do an
exact match on "cats" in this case.
What you can do is create a new field type and, with the copyField
facility, save a verbatim version of your data in a field of that type, while
the field of type "text_general" still performs stemming. Finally, add the new
field to the list of searchable fields with a higher boost so that an exact
match receives the highest score.
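
A minimal schema.xml sketch of this setup. The names text_exact, content
and content_exact are hypothetical; substitute your own fields:

    <!-- unstemmed type: whitespace tokens, lowercased, nothing else -->
    <fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <field name="content" type="text_general" indexed="true" stored="true"/>
    <field name="content_exact" type="text_exact" indexed="true" stored="false"/>

    <!-- copyField copies the raw input, so content_exact is analyzed on its own -->
    <copyField source="content" dest="content_exact"/>

The data has to be reindexed before content_exact is populated.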
Hope this helps.
regards,

Maj


On 25 April 2013 14:43, vsl <oc...@gmail.com> wrote:

> Exact matching is just one of my cases.

Re: Exact matching in Solr 3.6.1

Posted by vsl <oc...@gmail.com>.
Exact matching is just one of my cases. Currently I search on a field
with the following definition:

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="0"
                splitOnCaseChange="1" preserveOriginal="1" types="characters.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
      </analyzer>
      <analyzer type="query">
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="0"
                splitOnCaseChange="1" preserveOriginal="1" types="characters.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
      </analyzer>
    </fieldType>

This field definition fulfils all other requirements.
Examples:
- special characters
- passengers <-> passenger

The case with exact matching is the last one I have to complete.

The problem with cats <-> cat is caused by SnowballPorterFilterFactory. This
is what I know.

The question is whether it is possible to handle exact matching (edismax)
with only one result, as described in the previous post, without affecting
existing functionality?

BR 
Pawel




Re: Exact matching in Solr 3.6.1

Posted by Jack Krupansky <ja...@basetechnology.com>.
It sounds as if your field type is doing stemming - mapping "cats" to "cat". 
That is a valuable feature of search, but if you wish to turn it off... go 
ahead and do so by editing the field type. But just be aware that turning 
off stemming is a great loss of search flexibility.

Who knows, maybe you might want to have both stemmed and unstemmed fields in 
an edismax query and give a higher boost to the unstemmed field - but it's 
not up to us to guess your requirements. We're dependent on you clearly 
expressing your requirements.
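
One way this could look in solrconfig.xml, as a sketch only: the handler
name /search and the fields content (stemmed) and content_exact (unstemmed)
are assumptions, and the boost is a starting value to experiment with.

    <requestHandler name="/search" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <!-- matches in the unstemmed field are weighted ten times higher -->
        <str name="qf">content content_exact^10</str>
      </lst>
    </requestHandler>

This only changes ranking: documents that match through the stemmed field
alone are still returned, just lower down. To exclude them, the term has to
be searched on the unstemmed field only, e.g. content_exact:cats.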

As indicated before, you, the developer, have complete control here. But 
it is up to you, the developer, to choose wisely to suit your application 
requirements. If you don't describe your requirements with greater 
precision and detail, we won't be able to be of much help to you.

Your second (only two????) requirement relates to spellcheck, which is 
completely unrelated to query matching and exactness. Yes, Solr has a 
spellcheck capability, and yes, it does collation. Is that all you are 
asking? If there is a specific issue, please be specific about it.
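
For reference, a sketch of request handler defaults that turn on collation,
assuming a search component is already registered under the name spellcheck
(the handler name /search is made up for the example):

    <requestHandler name="/search" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="spellcheck">true</str>
        <str name="spellcheck.count">5</str>
        <!-- return the whole query rewritten with the best suggestions -->
        <str name="spellcheck.collate">true</str>
      </lst>
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>

Whether "Londo" collates to "London" then depends on the dictionary behind
the spellcheck component, not on how the query itself is matched.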

-- Jack Krupansky

-----Original Message----- 
From: vsl
Sent: Thursday, April 25, 2013 8:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Exact matching in Solr 3.6.1

I will explain my case in the example below:

We have three documents with given content:

First document:
london cats glenvilet

Second document:
london cat glenvilet leeds

Third document:
london cat glenvilet

Search term: "cats" AND London NOT Leeds

Expected result: First document
Current result: First document, Third document

Additionally, the next requirement says that when I type the search term
"cats" AND Londo NOT Leeds, I should get the spell check collation
"cats" AND London NOT Leeds.






Re: Exact matching in Solr 3.6.1

Posted by vsl <oc...@gmail.com>.
I will explain my case in the example below:

We have three documents with given content:

First document:
london cats glenvilet

Second document:
london cat glenvilet leeds

Third document:
london cat glenvilet

Search term: "cats" AND London NOT Leeds

Expected result: First document
Current result: First document, Third document

Additionally, the next requirement says that when I type the search term
"cats" AND Londo NOT Leeds, I should get the spell check collation
"cats" AND London NOT Leeds.
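
To reproduce this case, the three documents could be posted to /update as a
Solr XML add message. The field names id and content are assumptions about
the schema, not something stated in this thread:

    <add>
      <doc>
        <field name="id">1</field>
        <field name="content">london cats glenvilet</field>
      </doc>
      <doc>
        <field name="id">2</field>
        <field name="content">london cat glenvilet leeds</field>
      </doc>
      <doc>
        <field name="id">3</field>
        <field name="content">london cat glenvilet</field>
      </doc>
    </add>

A <commit/> message afterwards makes the documents visible to searches.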





Re: Exact matching in Solr 3.6.1

Posted by Jack Krupansky <ja...@basetechnology.com>.
As indicated previously, yes, exact matching is possible in Solr. You, the 
developer, have full control over the exactness or inexactness of all 
queries. If any query is inexact in some way, it is solely due to decisions 
that you, the developer, have made.

Generally speaking, inexactness, fuzziness if you will, is the precise 
quality that most developers - and users - are looking for in search. I 
mean, generally, having to be precise and "exact" in search requests... is 
tedious and a real drag, and something to be avoided - in general.

But that's what string fields, the white space tokenizer, the regular 
expression tokenizer, and full developer control of the token filter 
sequence are for: to let you, the developer, have full control over 
all aspects of the "exactness" of search.

As to your specific question - there is nothing about the "AND", "OR", or 
"NOT" (or "+" or "-") operators that is in any way anything other than 
"exact", in terms of document matching. "OR" can be considered a form of 
"inexactness" in that presence of a term is optional, but "AND" means 
absolutely MUST, and "NOT" means absolutely MUST_NOT. About as exact as 
anything could get.

Scoring and relevancy are another story, but have nothing to do with 
matching or "exactness". Exactness and matching only affect whether a 
document is counted in "numFound" and included in results or not, not the 
ordering of results.

But why are you asking? Is there some problem you are trying to solve? Is 
there some query that is not giving you the results you expect? If this is 
simply a general information question, fine, answered. But if you are trying 
to solve some problem, you will need to clearly state your problem rather 
than asking some general, abstract question.

-- Jack Krupansky

-----Original Message----- 
From: vsl
Sent: Thursday, April 25, 2013 5:33 AM
To: solr-user@lucene.apache.org
Subject: Exact matching in Solr 3.6.1

Hi,
Is it possible to get only exact-match results when the search term is a
combined expression, e.g. "cats" AND London NOT Leeds?

In previous threads I have read that it is possible to create a new field
of String type and perform a phrase search on it, but none of them took a
combined search term like the one above into consideration.

BR
Pawel





Re: Exact matching in Solr 3.6.1

Posted by Sandeep Mestry <sa...@gmail.com>.
I think in that case making the field a String type is your option; however,
remember that it would be case sensitive.
Another approach is to create a case-insensitive field type and search
on fields of that type only.

<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true"
           omitNorms="true" compressThreshold="10">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Can you provide your field definitions and dismax config and, if possible,
examples of records you want returned and records you do not want?

-S


On 25 April 2013 11:50, vsl <oc...@gmail.com> wrote:

> Thanks for your reply. I am using edismax as well. What I want to get is
> the exact match without other results that could be close to the given term.

Re: Exact matching in Solr 3.6.1

Posted by vsl <oc...@gmail.com>.
Thanks for your reply. I am using edismax as well. What I want to get is the
exact match without other results that could be close to the given term.




Re: Exact matching in Solr 3.6.1

Posted by Sandeep Mestry <sa...@gmail.com>.
Hi Pawel,

I'm not sure which parser you are using. I am using edismax and have used the
bq parameter to boost results with exact matches to the top.
You may try something like:
q="cats" AND London NOT Leeds&bq="cats"^50

In edismax, the pf and pf2 parameters also need some tuning to get the exact
matches to the top.
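
A sketch of what such tuning might look like as request handler defaults.
The field names content and content_exact and all boost values are
assumptions to adjust against your own data:

    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="qf">content</str>
      <!-- pf: extra score when all query terms occur together as a phrase -->
      <str name="pf">content_exact^20</str>
      <!-- pf2: extra score for each pair of adjacent query terms -->
      <str name="pf2">content_exact^5</str>
    </lst>

bq is better passed per request, since it depends on the term the user typed.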

HTH,
Sandeep


On 25 April 2013 10:33, vsl <oc...@gmail.com> wrote:

> Hi,
> Is it possible to get only exact-match results when the search term is a
> combined expression, e.g. "cats" AND London NOT Leeds?
>
> In previous threads I have read that it is possible to create a new field
> of String type and perform a phrase search on it, but none of them took a
> combined search term like the one above into consideration.
>
> BR
> Pawel