Posted to solr-user@lucene.apache.org by darniz <rn...@edmunds.com> on 2009/11/24 05:42:32 UTC

Implementing phrase autopop up

Hello all,
Let me first explain what I am trying to do.
I have articles with titles, for example:
<doc>
<str name="title">Car Insurance for Teenage Drivers</str>
</doc>
<doc>
<str name="title">A Total Loss? </str>
</doc>
If a user begins to type "car insu", I want the autocomplete popup to show
the entire phrase.
There are two ways to implement this.
The first is to use the TermsComponent, and the other is to use a field whose
field type uses the solr.EdgeNGramFilterFactory filter.

I started with the TermsComponent: I declared a terms request handler and
ran the following query:

http://localhost:8080/solr/terms?terms.fl=title&terms.prefix=car
The issue is that it does not return the entire phrase; it gives me back
results like car, caravan, carbon. I know that terms.prefix will only give
me results where the title starts with "car". On top of this, I also want
titles that contain a word like "car" somewhere in the middle to show up in
the popup, much like Google, where the word does not necessarily have to be
at the beginning but can appear anywhere in the title.
The question is whether the TermsComponent is a good candidate, or whether I
should use a custom field, say autoPopupText, whose field type is configured
with the needed filters including EdgeNGramFilterFactory, copy the title into
autoPopupText, and use that field to power the popup.

The other consideration is that EdgeNGramFilterFactory is an index-time
approach: when you index documents you need to know which fields you want
to copy into the autoPopupText field, whereas with the TermsComponent you
can choose at query time which fields to fetch autocomplete suggestions
from.

Any idea which is best, and why the TermsComponent is not giving me the
entire phrase, as mentioned earlier?
FYI, my title field is of type text.
Thanks,
darniz

-- 
View this message in context: http://old.nabble.com/Implementing-phrase-autopop-up-tp26490419p26490419.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Implementing phrase autopop up

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Tue, Nov 24, 2009 at 11:58 PM, darniz <rn...@edmunds.com> wrote:

>
>
> I created a field the same way the Lucid blog describes:
>
> <field name="autocomp" type="edgytext" indexed="true" stored="true"
>        omitNorms="true" omitTermFreqAndPositions="true"/>
>
> with the following field type configuration:
>
> <fieldType name="edgytext" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
>             maxGramSize="25"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> Now when I query, I get the correct phrases. For example, if I search for
> autocomp:"how to", I get all the correct phrases, like:
>
> How to find a car
> How to find a mechanic
> How to choose the right insurance company
>
> etc... which is good.
>
> Now I have two questions.
> 1) Is it necessary to put the query in quotes? My gut feeling is yes, since
> if I don't use quotes I get phrases beginning with How followed by some
> other word, like How can, etc.
>

Yes, since we want to do phrase searches on n-grams.



> 2) If I search for a single word, for example choose, I get nothing.
> I was expecting to see a result, considering the word "choose" occurs in the
> phrase
> How to choose the right insurance company
>
> I will look at the documentation some more, but do you have any advice?
>
>
EdgeNGram creates n-grams only from the starting or the ending edge of a
token, so you can't match words in the middle of a phrase. Try using
NGramFilterFactory instead.
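
A minimal sketch of that suggestion, assuming the same KeywordTokenizerFactory setup as the edgytext type quoted above. The type name ngramtext and the gram sizes are illustrative assumptions, not values from this thread. Because the keyword tokenizer keeps the whole title as a single token, NGramFilterFactory indexes every substring of the lowercased title within the size limits, so a quoted query such as "choose" can match in the middle of "How to choose the right insurance company".

```xml
<!-- Hypothetical field type name "ngramtext"; modeled on the edgytext type above. -->
<fieldType name="ngramtext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Keep the whole title as one token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emit every substring of the title between minGramSize and maxGramSize characters -->
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-gramming at query time: the typed text is matched as one lowercased term -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Some trade-offs to expect with a setup like this: the index grows considerably compared with edge n-grams, fragments inside words also match (for example "hoos" would match "choose"), and typed text shorter than minGramSize or longer than maxGramSize matches nothing.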

-- 
Regards,
Shalin Shekhar Mangar.

Re: Implementing phrase autopop up

Posted by darniz <rn...@edmunds.com>.
Can anybody tell me whether it is possible to display a phrase when a word
within that phrase is matched?

darniz



Re: Implementing phrase autopop up

Posted by darniz <rn...@edmunds.com>.
Thanks for your input.
You made a valid point: if the field used for autocomplete is of type text, it
won't work, because the value goes through the tokenizer.
Hence it looks like for my use case I need a separate field that uses n-grams
and copy the title into it. Here is what I did.

I created a field the same way the Lucid blog describes:

<field name="autocomp" type="edgytext" indexed="true" stored="true"
       omitNorms="true" omitTermFreqAndPositions="true"/>

with the following field type configuration:

<fieldType name="edgytext" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
            maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Now when I query, I get the correct phrases. For example, if I search for
autocomp:"how to", I get all the correct phrases, like:

How to find a car
How to find a mechanic 
How to choose the right insurance company

etc... which is good.
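
One way to populate autocomp without changing the indexing client is a copyField in schema.xml. This is only a sketch: it assumes the source is the title field from the original post, which this message does not state explicitly.

```xml
<!-- Assumption: "title" is the source field; "autocomp" is the field defined above. -->
<!-- copyField hands the raw title text to autocomp, which then applies its own edgytext analysis. -->
<copyField source="title" dest="autocomp"/>
```

Since autocomp is declared with stored="true", a request along the lines of http://localhost:8080/solr/select?q=autocomp:%22how%20to%22&fl=autocomp should return the full titles for display. The /select handler path is an assumption based on the default example configuration.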

Now I have two questions.
1) Is it necessary to put the query in quotes? My gut feeling is yes, since
if I don't use quotes I get phrases beginning with How followed by some
other word, like How can, etc.

2) If I search for a single word, for example choose, I get nothing.
I was expecting to see a result, considering the word "choose" occurs in the
phrase
How to choose the right insurance company

I will look at the documentation some more, but do you have any advice?

darniz


Re: Implementing phrase autopop up

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Tue, Nov 24, 2009 at 10:12 AM, darniz <rn...@edmunds.com> wrote:

>
> Any idea which is best, and why the TermsComponent is not giving me the
> entire phrase, as mentioned earlier?
> FYI, my title field is of type text.
>


You are using a tokenized field type with TermsComponent, so each word
in your phrase gets indexed as a separate token. You should use a
non-tokenized type (such as a string type) with TermsComponent. However,
this will only let you search by prefix and not by words in the middle of
the phrase.
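
A rough sketch of the non-tokenized option, assuming a schema.xml where the string type is defined as solr.StrField, as in the example schema. The field name title_exact is hypothetical.

```xml
<!-- Assumption: "string" is declared as solr.StrField, as in the example schema. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<!-- Hypothetical field name; holds each title as one unanalyzed term. -->
<field name="title_exact" type="string" indexed="true" stored="false"/>

<!-- Copy the raw title so the existing text-typed title field stays unchanged. -->
<copyField source="title" dest="title_exact"/>
```

With this in place, a request such as http://localhost:8080/solr/terms?terms.fl=title_exact&terms.prefix=Car should return whole titles as terms. Note that a string field is not lowercased, so the prefix has to match the original capitalization exactly.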

Your best bet here would be to use EdgeNGramFilterFactory. If your index is
very large, you can consider doing a prefix search on shingles too.
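
The shingle idea could look roughly like the following. ShingleFilterFactory joins adjacent words into multi-word terms, so a TermsComponent prefix lookup can return short phrases rather than single words. The type name shingletext and the shingle size are assumptions chosen for illustration.

```xml
<!-- Hypothetical field type name; shingle size picked arbitrarily for the sketch. -->
<fieldType name="shingletext" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Index single words plus runs of up to three adjacent words as single terms -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
</fieldType>
```

A terms.prefix lookup on a field of this type returns shingles such as "car insurance" or "insurance for teenage", not necessarily the complete title, so it suits suggestion lists where a partial phrase is acceptable.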

-- 
Regards,
Shalin Shekhar Mangar.