Posted to solr-user@lucene.apache.org by Mark Swinson <Ma...@bbc.co.uk> on 2012/02/28 13:29:10 UTC

solr returns reduced results for same query after adding a new field to the schema.

Hi,

I'm currently setting up a schema in Solr, with the data being imported
using the data-import plugin.

The initial config contains the following key information:

		...

		<fieldType name="standardTextType" class="solr.TextField"
				positionIncrementGap="100" stored="false" multiValued="false">
			<analyzer type="index">
				<tokenizer class="solr.KeywordTokenizerFactory"/>
				<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
				<filter class="solr.LowerCaseFilterFactory"/>
				<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
			</analyzer>
			<analyzer type="query">
				<tokenizer class="solr.KeywordTokenizerFactory"/>
				<filter class="solr.LowerCaseFilterFactory"/>
				<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
			</analyzer>
		</fieldType>

		<fieldType name="ingredientSuggestionType" class="solr.TextField"
				positionIncrementGap="100" stored="false" multiValued="true">
			<analyzer type="index">
				<tokenizer class="solr.WhitespaceTokenizerFactory"/>
				<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
				<filter class="solr.LowerCaseFilterFactory"/>
				<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
			</analyzer>
			<analyzer type="query">
				<tokenizer class="solr.WhitespaceTokenizerFactory"/>
				<filter class="solr.LowerCaseFilterFactory"/>
				<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
			</analyzer>
		</fieldType>

		<fieldType name="chefSuggestionType" class="solr.TextField"
				positionIncrementGap="100" stored="false" multiValued="false">
			<analyzer type="index">
				<tokenizer class="solr.WhitespaceTokenizerFactory"/>
				<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
				<filter class="solr.LowerCaseFilterFactory"/>
				<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
				<filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15" side="front"/>
			</analyzer>
			<analyzer type="query">
				<tokenizer class="solr.WhitespaceTokenizerFactory"/>
				<filter class="solr.LowerCaseFilterFactory"/>
				<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
			</analyzer>
		</fieldType>

		<fieldType name="programmeSuggestionType" class="solr.TextField"
				positionIncrementGap="100" stored="false" multiValued="false">
			<analyzer type="index">
				<tokenizer class="solr.WhitespaceTokenizerFactory"/>
				<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
				<filter class="solr.LowerCaseFilterFactory"/>
				<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
				<filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15" side="front"/>
			</analyzer>
			<analyzer type="query">
				<tokenizer class="solr.WhitespaceTokenizerFactory"/>
				<filter class="solr.LowerCaseFilterFactory"/>
				<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
			</analyzer>
		</fieldType>
		...

with the following fields:

	...
	<fields>
		<field name="recipeId" type="standardTextType" indexed="true" stored="true" required="true"/>
		<field name="programmeName" type="standardTextType" indexed="true" stored="true" required="true"/>
		<field name="programmeSuggestion" type="programmeSuggestionType" stored="true"/>
		<field name="chefName" type="standardTextType" indexed="true" stored="true" required="true"/>
		<field name="chefSuggestion" type="chefSuggestionType" stored="true"/>
		<field name="ingredientText" type="standardTextType" indexed="true" stored="true" required="true"/>
		<field name="ingredientSuggestion" type="ingredientSuggestionType" stored="true"/>

and
	...
	<copyField source="programmeName" dest="programmeSuggestion"/>
	<copyField source="chefName" dest="chefSuggestion"/>
	<copyField source="ingredientText" dest="ingredientSuggestion"/>
	<uniqueKey>recipeId</uniqueKey>
	<defaultSearchField>ingredientText</defaultSearchField>
 	<solrQueryParser defaultOperator="OR"/>
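
As a rough illustration of what the analyzer chains above do at index
time, the Python sketch below imitates the KeywordTokenizer used by
standardTextType and the WhitespaceTokenizer used by
ingredientSuggestionType. It is not Solr code: the function names and the
stopword set are made up for the example (the real list lives in
stopwords.txt, which is not shown in this thread), and
RemoveDuplicatesTokenFilterFactory is left out because it only drops
identical tokens at the same position, which this input never produces.

```python
# Placeholder stopwords; the actual stopwords.txt is not part of this thread.
STOPWORDS = {"a", "and", "of", "the"}


def keyword_tokenize(text):
    """Imitates KeywordTokenizer: the whole value is a single token."""
    return [text]


def whitespace_tokenize(text):
    """Imitates WhitespaceTokenizer: split on runs of whitespace."""
    return text.split()


def analyze_index(text, tokenizer):
    """Apply the index-time filters in the order they appear in the schema."""
    tokens = tokenizer(text)
    # StopFilterFactory with ignoreCase="true": compare case-insensitively.
    tokens = [t for t in tokens if t.lower() not in STOPWORDS]
    # LowerCaseFilterFactory.
    return [t.lower() for t in tokens]


print(analyze_index("1 Banana and honey", keyword_tokenize))
print(analyze_index("1 Banana and honey", whitespace_tokenize))
```

Under these assumptions, a one-word query such as banana can match a token
in ingredientSuggestion, whereas ingredientText keeps the whole value as a
single token.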


When I query solr with the query ?q=ingredientSuggestion=banana I get
160 results.
Ok, all fine.

When I add a new field such as

	<field name="courseName" type="text" indexed="true" stored="true" required="true"/>

to my index it reduces the number of results from my query to 131, even
though the query hasn't changed and does not (at least explicitly) filter
the result set.


Obviously I'm missing something fundamental, but I'm not sure what it
is. Has anyone else experienced a similar problem? Am I doing something
wrong in the way I am indexing my database?


Mark








http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.
					

RE: solr returns reduced results for same query after adding a new field to the schema.

Posted by Mark Swinson <Ma...@bbc.co.uk>.
Yes, sorry.

-----Original Message-----
From: Dmitry Kan [mailto:dmitry.kan@gmail.com] 
Sent: 28 February 2012 12:33
To: solr-user@lucene.apache.org
Subject: Re: solr returns reduced results for same query after adding a new field to the schema.

Hi,

you meant your query is:

?q=ingredientSuggestion:banana

right?

-- 
Regards,

Dmitry Kan


Re: solr returns reduced results for same query after adding a new field to the schema.

Posted by Dmitry Kan <dm...@gmail.com>.
Hi,

you meant your query is:

?q=ingredientSuggestion:banana

right?

-- 
Regards,

Dmitry Kan
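
The Lucene query syntax separates a field name from its term with a colon,
which is the point of the correction above. A minimal Python sketch of
building such a request URL follows; the base URL, the helper name
select_url, and the rows value are assumptions for illustration and do not
come from this thread.

```python
from urllib.parse import urlencode


def select_url(base_url, field, term, rows=10):
    """Build a /select URL for a single field:term query.

    base_url is hypothetical, e.g. "http://localhost:8983/solr".
    """
    params = {"q": f"{field}:{term}", "rows": rows, "wt": "json"}
    # urlencode percent-encodes the colon, so it reaches Solr intact.
    return f"{base_url}/select?{urlencode(params)}"


print(select_url("http://localhost:8983/solr", "ingredientSuggestion", "banana"))
```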

Re: solr returns reduced results for same query after adding a new field to the schema.

Posted by Chris Hostetter <ho...@fucit.org>.
: When I query solr with the query ?q=ingredientSuggestion=banana I get
: 160 results.
: Ok, all fine.
: 
: When I add a new field such as
: 
: 	<field name="courseName" type="text" indexed="true"
: stored="true" required="true"/>
: 
: to my index it reduces the number of results from my query to 131, even
: though the query 
: hasn't changed and does not (at least explicitly) filter the result set.

Presumably you changed your DIH config in some way when you added that 
field? Are you certain that you didn't do anything to alter the total 
number of docs being indexed, or that the data source didn't change 
between the first time you indexed and the second?


-Hoss
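
One way to check the first question above is to compare the total number
of indexed documents before and after the schema change, using a match-all
query with rows=0. The sketch below only builds the request URL and reads
numFound from a JSON response body; the base URL is an assumption and the
function names are invented for this example. Since courseName is declared
required="true", it may also be worth checking the import logs for
documents rejected because the source row had no value for that field;
nothing in this thread confirms that this is the cause.

```python
import json
from urllib.parse import urlencode


def count_url(base_url, query="*:*"):
    """URL for a query that returns only the hit count (rows=0)."""
    params = {"q": query, "rows": 0, "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


def num_found(response_text):
    """Extract numFound from the text of a Solr JSON response."""
    return json.loads(response_text)["response"]["numFound"]


# Hypothetical response body, shaped like Solr's wt=json output.
sample = '{"response": {"numFound": 160, "start": 0, "docs": []}}'
print(count_url("http://localhost:8983/solr"))
print(num_found(sample))
```

Fetching the URL (for example with urllib.request.urlopen) once per index
build and comparing the two counts shows whether the drop is in the number
of documents indexed or only in the number matching the query.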