Posted to solr-user@lucene.apache.org by stocki <st...@shopgate.com> on 2010/03/10 14:55:52 UTC

distinct on my result

hello.

I implemented my suggest function with EdgeNGramFilter.
Now when I get my results, they are not distinct: the same name often
appears twice or more.

Is it possible for Solr to return only distinct results?

 "response":{"numFound":172,"start":0,"docs":[
	{
	 "name":"Halloween"},
	{
	 "name":"Hallo Taxi"},
	{
	 "name":"Halloween"},
	{
	 "name":"Hallstatt"},
	{
	 "name":"Hallo Mary"},
	{
	 "name":"Halloween"},
	{
	 "name":"Halloween"},
	{
	 "name":"Halloween"},
	{
	 "name":"Halleluja"},
	{
	 "name":"Halloween"}]

So how can I remove the duplicate Halloween entries in Solr?
I don't want to filter them out on the client side.

thx



-- 
View this message in context: http://old.nabble.com/distinct-on-my-result-tp27849951p27849951.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: distinct on my result

Posted by stocki <st...@shopgate.com>.

Okay, now it's better ;)

thx =)




-- 
View this message in context: http://old.nabble.com/distinct-on-my-result-tp27849951p27874112.html

Re: distinct on my result

Posted by gwk <gi...@eyefi.nl>.

Hi,

Try replacing KeywordTokenizerFactory with a WhitespaceTokenizerFactory 
so it'll create separate terms per word. After a reindex it should work.
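
A minimal sketch of what that change could look like in schema.xml, assuming the rest of the schema stays as posted. Only the tokenizer swap comes from the advice above; leaving out the stemming and stop-word filters, and dropping the n-gram filter from the query analyzer, are additional assumptions that were not tested in this thread.

```xml
<!-- Sketch only: the tokenizer swap follows the advice above; the trimmed
     filter chain is an assumption, not something confirmed in this thread. -->
<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <!-- One token per word, so "harry potter" yields "harry" and "potter" -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Prefixes of each word: h, ha, har, ... and p, po, pot, ... -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-grams at query time: the typed prefix is matched as typed -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With per-word prefixes in the index, a query for "pot" can match a name containing "potter" even when it is not the first word.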

Regards,

gwk

On 3/11/2010 4:33 PM, stocki wrote:
> hey,
>
> okay i show your my settings ;)
> i use an extra core with the standard requesthandler.
>
>
> SCHEMA.XML
> <field name="id" type="string"  indexed="true" stored="true" required="true"
> />
> <field name="name" type="text"    indexed="true" stored="true"
> required="true" />
> <field name="suggest" type="autocomplete" indexed="true" stored="true"
> multiValued="true"/>
> <copyField source="name"  dest="suggest"/>
>
> so i copy my names to the field suggest and use the EdgeNGramFilter and some
> others
>
> <fieldType name="autocomplete" class="solr.TextField">
>          <analyzer type="index">
>              <tokenizer class="solr.KeywordTokenizerFactory"/>
>              <filter class="solr.LowerCaseFilterFactory" />
> 			<filter class="solr.EdgeNGramFilterFactory" maxGramSize="100"
> minGramSize="1" />	
> 			<filter class="solr.StandardFilterFactory"/>
>          	<filter class="solr.TrimFilterFactory"/>
>          	<filter class="solr.SnowballPorterFilterFactory" language="German2"
> protected="protwords.txt"/>				
> 			<filter class="solr.SnowballPorterFilterFactory" language="English"
> protected="protwords.txt"/>
> 			<filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
>          </analyzer>
>          <analyzer type="query">
>              <tokenizer class="solr.KeywordTokenizerFactory"/>
>              <filter class="solr.LowerCaseFilterFactory" />
> 			<filter class="solr.EdgeNGramFilterFactory" maxGramSize="100"
> minGramSize="1" />
> 	<filter class="solr.StandardFilterFactory"/>
> 	<filter class="solr.TrimFilterFactory"/>
> 	<filter class="solr.SnowballPorterFilterFactory" language="German2"
> protected="protwords.txt"/>	
> 	<filter class="solr.SnowballPorterFilterFactory" language="English"
> protected="protwords.txt"/>	
> 			<filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt"/>
> 					
>          </analyzer>
> 	</fieldType>
>
>
> so with this konfig i get the results above ...
>
> maybe i have toooo many filters ;) ?!

Re: distinct on my result

Posted by stocki <st...@shopgate.com>.

hey,

Okay, I'll show you my settings ;)
I use an extra core with the standard request handler.


SCHEMA.XML
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="name" type="text" indexed="true" stored="true" required="true"/>
<field name="suggest" type="autocomplete" indexed="true" stored="true" multiValued="true"/>
<copyField source="name" dest="suggest"/>

So I copy my names to the suggest field and use the EdgeNGramFilter and some
other filters:

<fieldType name="autocomplete" class="solr.TextField">
    <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="German2" protected="protwords.txt"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="German2" protected="protwords.txt"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    </analyzer>
</fieldType>


So with this config I get the results above ...

Maybe I have too many filters ;) ?!




-- 
View this message in context: http://old.nabble.com/distinct-on-my-result-tp27849951p27865058.html

Re: distinct on my result

Posted by gwk <gi...@eyefi.nl>.

Hi,

I'm no expert on the full-text search features of Solr, but I guess this
has something to do with your field type or query. Are you using the
standard request handler or dismax for your queries? And what analysers
are you using on your product name field?

Regards,

gwk

On 3/11/2010 3:24 PM, stocki wrote:
> okay.
> we have a lot of products and i just importet the name of each product to a
> core.
> make an edgengram to this and my autoCOMPLETION runs.
>
> but i want an auto-suggestion:
>
> example.
>
> autoCompletion-->            I: "harry" O: "harry potter..."
> but when the input ist -->  I. "potter" -- O: /
>
> so what i want is, that i get "harry potter ..." when i tipping "potter"
> into my search field!
>
> any idea ?
>
> i think the solution is a mixe of termsComponent and EdgeNGram or not ?
>
> i am a little bit despair, and in this forum are too many information about
> it =(

Re: distinct on my result

Posted by stocki <st...@shopgate.com>.

Okay.
We have a lot of products, and I just imported the name of each product into
a core. I applied an edge n-gram filter to it, and my autoCOMPLETION works.

But I want auto-SUGGESTION. For example:

autoCompletion --> input: "harry"   output: "harry potter..."
but            --> input: "potter"  output: (nothing)

So what I want is to get "harry potter ..." when I type "potter" into my
search field!

Any ideas?

I think the solution is a mix of TermsComponent and EdgeNGram, or not?

I'm getting a little desperate, and there is too much information about this
in the forum =(



-- 
View this message in context: http://old.nabble.com/distinct-on-my-result-tp27849951p27864088.html

Re: distinct on my result

Posted by gwk <gi...@eyefi.nl>.

Hi,

The autosuggest core is filled by a simple script (written in PHP) which
requests facet values for all the possible strings one can search for and
adds them one by one as documents. Our case has some special issues due
to the fact that we search in multiple languages (typing "España" will
suggest "Spain" and the other way around when on the Spanish site). We
have about 97500 documents yielding approximately 12500 different
documents in our autosuggest core, and the autosuggest update script
takes about 5 minutes to do a full re-index (all this is done on a
separate server and replicated, so the indexing has no impact on the
performance of the site).
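
As a rough illustration of adding facet values one by one as documents, each value could be posted to the autosuggest core as a Solr XML update message like the one below. The field names `name` and `count` and the value 6 are made up for this sketch; the actual script and schema were not shown in this thread.

```xml
<!-- Hypothetical update message for one facet value.
     Field names and the count are assumptions for illustration only. -->
<add>
  <doc>
    <!-- The distinct string a user may search for -->
    <field name="name">Halloween</field>
    <!-- Facet count reported for that string in the main index -->
    <field name="count">6</field>
  </doc>
</add>
```

A `<commit/>` message sent after the last document would then make the new entries searchable.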

Regards,

gwk

On 3/10/2010 3:09 PM, stocki wrote:
> okay. thx
>
> my suggestion run in another core;)
>
> do you distinct during the import with DIH ?
>

Re: distinct on my result

Posted by stocki <st...@shopgate.com>.

hey.

okay. thx 

My suggestions run in another core ;)

Do you make the values distinct during the import with DIH?




-- 
View this message in context: http://old.nabble.com/distinct-on-my-result-tp27849951p27850157.html

Re: distinct on my result

Posted by gwk <gi...@eyefi.nl>.

Hi,

I ran into the same issue, and what I did (at
http://www.mysecondhome.co.uk/) was to create a separate core just for
autosuggest. It is fully updated once an hour and contains the distinct
values of the items I want to look for, including the count, so I can
display the approximate number of results in the suggest dropdown.
This might not be a good solution when your data is updated frequently,
but for us it's worked very well so far. Maybe you can also use
clustering so you won't have to create a separate core, but I'm thinking
my solution performs better (although I haven't tested it, so I could be
horribly, horribly wrong).
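
One way such a core can keep its values distinct, sketched as a schema.xml fragment: if the suggestion text itself is the uniqueKey, re-adding a name that already exists overwrites the earlier document instead of creating a duplicate. This is an assumption about how it might be set up, not the actual configuration of the site mentioned above; the `count` field and the `int` type are hypothetical, and `autocomplete` stands for an edge n-gram field type like the one discussed elsewhere in this thread.

```xml
<!-- Hypothetical schema.xml fragment for a dedicated autosuggest core.
     Field names and the "int" type are assumptions for illustration. -->
<fields>
  <!-- The suggestion text itself; used as the unique key below -->
  <field name="name" type="string" indexed="true" stored="true" required="true"/>
  <!-- How many source documents carry this name, for display in the dropdown -->
  <field name="count" type="int" indexed="false" stored="true"/>
  <!-- Edge n-gram copy of the name that prefix queries run against -->
  <field name="suggest" type="autocomplete" indexed="true" stored="false"/>
</fields>
<!-- Documents with the same key replace each other, so each name is stored once -->
<uniqueKey>name</uniqueKey>
<copyField source="name" dest="suggest"/>
```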

Regards,

gwk

On 3/10/2010 2:55 PM, stocki wrote:
> hello.
>
> i implement my suggest-function with edgengramfilter.
> now when i get my result , is the result not distinct. often ist the name
> double or more.
>
> is it possible that solr gives me only distinct result ?
>
>   "response":{"numFound":172,"start":0,"docs":[
> 	{
> 	 "name":"Halloween"},
> 	{
> 	 "name":"Hallo Taxi"},
> 	{
> 	 "name":"Halloween"},
> 	{
> 	 "name":"Hallstatt"},
> 	{
> 	 "name":"Hallo Mary"},
> 	{
> 	 "name":"Halloween"},
> 	{
> 	 "name":"Halloween"},
> 	{
> 	 "name":"Halloween"},
> 	{
> 	 "name":"Halleluja"},
> 	{
> 	 "name":"Halloween"}]
>
> so how can i delete Halloween from solr ?
> i didnt want delete it from client-side
>
> thx
>
>
>
>