Posted to solr-user@lucene.apache.org by prashantc88 <pr...@searshc.com> on 2014/07/21 17:29:33 UTC

Solr schema.xml query analyser

I am a complete beginner to Solr and need some help.

My task is to provide a match when the search term contains the indexed
field.

For example:

    If query= foo bar and textExactMatch= foo, I should not get a MATCH
    If query= foo bar and textExactMatch= foo bar, I should get a MATCH
    If query= foo bar and textExactMatch= xyz foo bar/foo bar xyz, I should get a MATCH

I am indexing my field as follows:

<fieldType name="textExactMatch" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

So I'm indexing the text for the field as it is, without breaking it down any
further. Could someone help me with how I should tokenize and filter the
field at query time?




--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-schema-xml-query-analyser-tp4148317.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr schema.xml query analyser

Posted by Jack Krupansky <ja...@basetechnology.com>.

That sounds more like a "reverse" query - trying to match documents against 
the query rather than matching the query against the documents. Solr doesn't 
have that feature currently.

Although I'm not absolutely sure what your "textExactMatch" is. I'm guessing 
that it is a document field in your index.

-- Jack Krupansky

-----Original Message----- 
From: newBie88
Sent: Monday, July 21, 2014 1:13 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr schema.xml query analyser


Re: Solr schema.xml query analyser

Posted by newBie88 <pr...@searshc.com>.

My apologies Jack. But there was a mistake in my question.

I actually switched "query" and "textExactMatch" in my question.

It would be really helpful if you could have a look at the scenario once
again:

My task is to provide a match when the search term contains the indexed
field. 

For example: 

    If textExactMatch= foo bar and query= foo, I should not get a MATCH 
    If textExactMatch= foo bar and query= foo bar, I should get a MATCH 
    If textExactMatch= foo bar and query= xyz foo bar/foo bar xyz, I should get a MATCH

I am indexing my field as follows: 

<fieldType name="textExactMatch" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

So I'm indexing the text for the field as it is without breaking it further
down. How should I tokenize and filter the field during query time? 




Re: Solr schema.xml query analyser

Posted by Jack Krupansky <ja...@basetechnology.com>.

Based on your stated requirements, there is no obvious need to use the 
keyword tokenizer. So fix that and then quoted phrases or escaped spaces 
should work.
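
A minimal sketch of what that change might look like, keeping the field type name from the question. The choice of StandardTokenizerFactory is an assumption (the reply does not name a replacement tokenizer), and a single <analyzer> element with no type attribute is used so the same chain applies at both index and query time.

```xml
<!-- Sketch only: the keyword tokenizer is swapped for a word-level tokenizer.
     StandardTokenizerFactory is an assumed choice, not one named in the thread. -->
<fieldType name="textExactMatch" class="solr.TextField" positionIncrementGap="100">
  <!-- No type attribute: this analyzer is used for both indexing and querying. -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a field type along these lines, a quoted phrase query such as textExactMatch:"foo bar" would be expected to match field values "foo bar" and "xyz foo bar", but not a field value of just "foo".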

-- Jack Krupansky

-----Original Message----- 
From: prashantc88
Sent: Monday, July 21, 2014 11:51 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr schema.xml query analyser


Re: Solr schema.xml query analyser

Posted by prashantc88 <pr...@searshc.com>.

Thanks Jack for the reply.

I did not mention the query time analyzer in my post because I wasn't sure
what should be put there.

With regard to your reply, if I put the query term in quotes, would I get a
match for the following?

Indexed field value: foo bar
Query term: foo bar xyz/xyz foo bar

I believe I would not, as it would be looking for the exact same term in
both places.

However I want it to behave in the following way:

    If query= foo bar and textExactMatch= foo, I SHOULD NOT get a MATCH
    If query= foo bar and textExactMatch= foo bar, I SHOULD get a MATCH
    If query= foo bar and textExactMatch= xyz foo bar/foo bar xyz, I SHOULD get a MATCH

Thanks in advance.





Re: Solr schema.xml query analyser

Posted by Jack Krupansky <ja...@basetechnology.com>.

If you don't specify a "query" analyzer, Solr will use the "index" analyzer 
at query time.

But... at query time there is something called a "query parser" which 
typically breaks the query into separate terms, delimited by white space, 
and then calls the analyzer for each term, separately.

You can put the entire query in quotes or escape the space with a backslash.

Or, just use the edismax query parser with the "pf" or "pf2" parameters and 
then Solr will boost exact phrase matches even if not quoted or escaped.
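
As a rough sketch, the edismax option could be set up as request handler defaults in solrconfig.xml. The handler name "/select" and the use of textExactMatch for all three field lists are assumptions for illustration only. Note that "pf" and "pf2" affect scoring only; a document still has to match through the "qf" fields to appear in the results.

```xml
<!-- Sketch only: handler name and field choices are assumptions. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Use the extended dismax query parser by default. -->
    <str name="defType">edismax</str>
    <!-- Fields that the individual query terms are matched against. -->
    <str name="qf">textExactMatch</str>
    <!-- Boost documents where the whole query appears as a phrase. -->
    <str name="pf">textExactMatch</str>
    <!-- Boost documents where pairs of adjacent query words appear as a phrase. -->
    <str name="pf2">textExactMatch</str>
  </lst>
</requestHandler>
```

The same parameters can also be passed on the request itself instead of being configured as defaults.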

-- Jack Krupansky

-----Original Message----- 
From: prashantc88
Sent: Monday, July 21, 2014 11:29 AM
To: solr-user@lucene.apache.org
Subject: Solr schema.xml query analyser
