Posted to solr-user@lucene.apache.org by PeterKerk <ve...@hotmail.com> on 2011/01/05 16:47:23 UTC

Searching similar values for same field results in different results

Something weird is happening.

I have locations that can have 1 or more themes.
A theme can be: "Kasteel en Landgoed", or a theme can be "Strand en Zee"

I checked the database: many locations have 1 or more of these themes
assigned to them.

Also, in the response XML for a general search I get:
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="themes_raw">
	<int name="Hotel en Restaurant">366</int>
	<int name="Kasteel en Landgoed">153</int>    <----- 153 found
	<int name="Strand en Zee">16</int>	<----- 16 found
</lst>


When I request this:
http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:%22Strand%20en%20Zee%22&q=*:*&fl=id,title
I get 16 results, which is expected.

When I request this:
http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:%22Kasteel%20en%20Landgoed%22&q=*:*&fl=id,title
I get 0 results!

Why?


Definition in schema.xml:

<field name="themes" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="themes_raw" type="string" indexed="true" stored="true" multiValued="true"/>

<copyField source="themes" dest="themes_raw"/>

Why are these results differing?
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Searching-similar-values-for-same-field-results-in-different-results-tp2199269p2199269.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Searching similar values for same field results in different results

Posted by PeterKerk <ve...@hotmail.com>.
That was it! thanks!

Re: Searching similar values for same field results in different results

Posted by Juan Grande <ju...@gmail.com>.
You have a problem with the analysis chain. At query time, the
EnglishPorterFilter is stemming the last word ("landgoed" becomes "landgo"),
but no stemming is done when indexing, so the phrase no longer matches the
indexed tokens. I think that removing that filter from the query chain will
solve your problem.

Remember that there are two different analysis chains, one for indexing time
and one for querying time. I think that you didn't see the shortened word in
analysis.jsp because you entered the text in the "Field Value (Index)" text
box, so it was using the indexing time analysis chain. If you want to see
the results of applying the querying time analysis chain, you should enter
the text in the "Field Value (Query)" text box.
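
As a minimal sketch of that change (an illustration based on the schema posted
in this thread, not a tested configuration), the query-time analyzer with the
stemmer removed might look like this. Everything except the removed filter is
copied from the posted "text" field type:

```xml
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords_dutch.txt"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
          generateNumberParts="1" catenateWords="0" catenateNumbers="0"
          catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- solr.EnglishPorterFilterFactory removed here: it stemmed "landgoed"
       to "landgo" at query time only, so the phrase query could not match
       the unstemmed tokens written by the index-time analyzer. -->
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
```

Because only the query analyzer changes, reindexing should not be necessary,
but Solr will most likely need a restart (or a core reload) before the edited
schema.xml takes effect.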

Good luck,

Juan Grande


Re: Searching similar values for same field results in different results

Posted by PeterKerk <ve...@hotmail.com>.
@iorixxx:
I ran: http://localhost:8983/solr/db/update/?optimize=true
This is the response:
<response>
	<lst name="responseHeader">
		<int name="status">0</int>
		<int name="QTime">58</int>
	</lst>
</response>

Then I ran:
http://localhost:8983/solr/db/select/?indent=on&facet=on&q=*:*&facet.field=themes_raw

This is the response:
<lst name="facet_fields">
	<lst name="themes_raw">
		<int name="Hotel en Restaurant">366</int>
		<int name="Kasteel en Landgoed">153</int>
		<int name="Strand en Zee">16</int>
	</lst>
</lst>

So nothing seems to have changed there, and it looks like the counts were
already correct before the optimize operation?

When you say HTTP caching, do you mean caching by the browser? Or does Solr
have some caching by default? If the latter, how can I clear that cache?


@Erick: I added debugQuery=on

For "Strand en Zee" I see this:
<arr name="parsed_filter_queries">
<str>PhraseQuery(themes:"strand en zee")</str>
</arr>

Looks correct.


For "Kasteel en Landgoed" I see this:
<arr name="parsed_filter_queries">
<str>PhraseQuery(themes:"kasteel en landgo")</str>
</arr>

That isn't correct! So this seems to be where the problem lies.

Now I'm wondering why the value is cut off. This is my schema.xml:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_dutch.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_dutch.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="themes" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="themes_raw" type="string" indexed="true" stored="true" multiValued="true"/>


I checked analysis.jsp:
filled in Field: "themes"
and Field value: "Kasteel en Landgoed"

I also checked schema.jsp, but I didn't see any weird results in either.

Now I'm wondering what else it could be.

Re: Searching similar values for same field results in different results

Posted by Erick Erickson <er...@gmail.com>.
Often adding &debugQuery=on to the URL can show you very useful information
that helps pinpoint the problem. I confess I don't see anything amiss in
what you've shown, though.

Also, look at the "schema browser" page off the admin page and check
your "themes" field to see what is actually in your index. It may
surprise you.

Finally, the admin/analysis page (turn debug on) may also help you to see
exactly what tokenization is happening when indexing and querying. I'd guess
that the behavior isn't exactly what you expect.

Best
Erick



Re: Searching similar values for same field results in different results

Posted by Ahmet Arslan <io...@yahoo.com>.
> 
> uhm...how do I perform an optimize operation? :)


http://localhost:8983/solr/db/update/?optimize=true
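
The same operation can also be sent as an XML update message instead of a URL
parameter. A minimal sketch, assuming the update handler lives at the same
/solr/db/update path and the message is POSTed with a text/xml content type:

```xml
<!-- Body of a POST to /solr/db/update; asks Solr to optimize the index. -->
<optimize/>
```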


      

Re: Searching similar values for same field results in different results

Posted by PeterKerk <ve...@hotmail.com>.
uhm...how do I perform an optimize operation? :)

Re: Searching similar values for same field results in different results

Posted by Ahmet Arslan <io...@yahoo.com>.

Maybe you deleted those documents? Terms from deleted documents can still appear in the facet section until you optimize. Can you run these queries again after an optimize operation?
What is the output of this after an optimize:
facet=on&q=*:*&facet.field=themes_raw

Also, using a browser to query/test Solr sometimes gives old results due to HTTP caching.