Posted to solr-user@lucene.apache.org by jmlucjav <jm...@gmail.com> on 2012/04/07 23:24:11 UTC

Suggester not working for digit starting terms

Hi,

I am using the Suggester component, as advised in the Solr3 book (using Solr 3.5):
	<searchComponent name="suggest" class="solr.SpellCheckComponent">
		<lst name="spellchecker">
			<str name="name">a_suggest</str>
			<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
			<str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
			<str name="field">a_suggest</str>
			<str name="buildOnCommit">true</str>
			<int name="weightBuckets">100</int>
		</lst>
	</searchComponent>
	<requestHandler name="/suggest" class="solr.SearchHandler">
		<lst name="defaults">
			<str name="spellcheck">true</str>
			<str name="spellcheck.dictionary">a_suggest</str>
			<str name="spellcheck.onlyMorePopular">true</str>
			<str name="spellcheck.count">5</str>
			<str name="spellcheck.collate">true</str>
		</lst>
		<arr name="components">
			<str>suggest</str>
		</arr>
	</requestHandler>

But although it works fine with words, it does not seem to work for terms
starting with digits. For example:
http://localhost:8983/solr/suggest?&q=500
gets 0 results, but I know '500 $' is in the a_suggest field, as I can find
many hits with:
http://localhost:8983/solr/select/?q={!prefix f=a_suggest}500

Am I missing something? I have tried playing with
spellcheck.onlyMorePopular and spellcheck.accuracy, but I get the same
results.

thanks
xab

--
View this message in context: http://lucene.472066.n3.nabble.com/Suggester-not-working-for-digit-starting-terms-tp3893433p3893433.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Suggester not working for digit starting terms

Posted by Robert Muir <rc...@gmail.com>.
On Thu, Apr 12, 2012 at 3:52 PM, jmlucjav <jm...@gmail.com> wrote:
> Well now I am really lost...
>
> 1. yes I want to suggest whole sentences too, I want the tokenizer to be
> taken into account, and apparently it is working for me in 3.5.0?? I get
> suggestions that are like "foo bar abc".  Maybe what you mention is only for
> file based dictionaries? I am using the field itself.

It doesn't use *JUST* your tokenizer. It also splits the input and applies
identifier rules. Such identifier rules include things like 'cannot start
with a digit'.

That's why I recommend you configure a SuggestQueryConverter, so you
have complete control over what is going on rather than dealing with the
spellchecking one.
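
For reference, a minimal sketch of what that configuration might look like in
solrconfig.xml. The class name is the one introduced by SOLR-3143; the exact
placement is an assumption and should be checked against the example config
shipped with your release.

```xml
<!-- Assumption: Solr 3.6 or later, where SuggestQueryConverter is available
     (SOLR-3143). Declared at the top level of solrconfig.xml, alongside the
     "suggest" searchComponent. -->
<queryConverter name="queryConverter"
                class="org.apache.solr.spelling.SuggestQueryConverter"/>
```

With this in place, the whole q value (for example 500) should be handed to
the query analyzer as one string instead of being split up by the default
spellchecking converter.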

>
> Moving to 3.6.0 is not a problem (I had already downloaded the rc actually)
> but I still see weird things here.
>

Installing 3.6 isn't going to do anything magical: as mentioned above,
you have to configure the SuggestQueryConverter as in the example in
the link if you want total control over how the input is treated
before it goes to the suggester.

-- 
lucidimagination.com

Re: Suggester not working for digit starting terms

Posted by jmlucjav <jm...@gmail.com>.
Well now I am really lost...

1. yes I want to suggest whole sentences too, I want the tokenizer to be
taken into account, and apparently it is working for me in 3.5.0?? I get
suggestions that are like "foo bar abc".  Maybe what you mention is only for
file based dictionaries? I am using the field itself.

2. But for the digit issue, nothing is suggested at all, not even the
term 500, which is there because I can find it with this query:
http://localhost:8983/solr/select/?q={!prefix f=a_suggest}500

I tried setting the threshold to 0 in case the term was being removed, and
that is not it.

Moving to 3.6.0 is not a problem (I had already downloaded the rc actually)
but I still see weird things here.

xab


Re: Suggester not working for digit starting terms

Posted by Robert Muir <rc...@gmail.com>.
On Wed, Apr 11, 2012 at 4:37 PM, jmlucjav <jm...@gmail.com> wrote:
> Just to be sure, reproduced this with example config from 3.5.
>

Regardless of your tokenizer, be aware that with this version of Solr
it's going to split up terms based on 'identifier rules' (including
splitting on whitespace).
This is because suggesters go through the ordinary spellchecker framework.

If you are trying to autosuggest actual phrases, have a look at
http://wiki.apache.org/solr/Suggester#Tips_and_tricks
which describes how to set this up along with example configurations.
More information is available in
https://issues.apache.org/jira/browse/SOLR-3143

Essentially this provides a QueryConverter that's hopefully more
suitable for autosuggesters: it just passes the entire input to
your query analyzer,
and it's your responsibility to do whatever you need there to extract
the 'meat' of the query for autosuggest. The example configuration
linked from the wiki page does just that and uses some regexps to try to
imitate what Google does (discarding operators like +/- but still
keeping the whole thing as a phrase).
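
As a rough illustration of the idea (not the exact configuration from the
wiki), the query-side cleanup could be sketched like this. The field type name
phrase_suggest and the simplified regexp are assumptions made for this example;
the regexp replaces every run of characters that are neither letters nor digits
with a single space.

```xml
<!-- schema.xml: hypothetical field type that keeps the input as one token -->
<fieldType name="phrase_suggest" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- simplified cleanup: turn operators and punctuation into spaces -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[^\p{L}\p{N}]+" replacement=" " replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<!-- solrconfig.xml: placed inside the suggest searchComponent, this names the
     field type whose query analyzer is run on the incoming q value -->
<str name="queryAnalyzerFieldType">phrase_suggest</str>
```

The same cleanup would most likely need to be applied to the field the
suggestions are built from, so that the analyzed query and the dictionary
terms line up.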

You will need Solr 3.6 for this, but its release is on the way.

-- 
lucidimagination.com

Re: Suggester not working for digit starting terms

Posted by jmlucjav <jm...@gmail.com>.
Just to be sure, reproduced this with example config from 3.5.

1. add to schema.xml
		<fieldType name="simpletext" class="solr.TextField" positionIncrementGap="100">
			<analyzer>
				<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
				<tokenizer class="solr.KeywordTokenizerFactory"/>
				<filter class="solr.LowerCaseFilterFactory"/>
				<filter class="solr.TrimFilterFactory" />
			</analyzer>
		</fieldType>
	<field name="a_suggest" type="simpletext" stored="true" omitNorms="true" multiValued="true"/>
	<copyField source="*" dest="a_suggest"/>

2. add to solrconfig.xml
	<searchComponent name="suggest" class="solr.SpellCheckComponent">
		<lst name="spellchecker">
			<str name="name">a_suggest</str>
			<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
			<str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
			<str name="field">a_suggest</str>
			
			<str name="buildOnCommit">true</str>
			<int name="weightBuckets">100</int>
		</lst>
	</searchComponent>
	<requestHandler name="/suggest" class="solr.SearchHandler">
		<lst name="defaults">
			<str name="spellcheck">true</str>
			<str name="spellcheck.dictionary">a_suggest</str>
			<str name="spellcheck.onlyMorePopular">true</str>
			<str name="spellcheck.count">5</str>
			<str name="spellcheck.collate">true</str>
		</lst>
		<arr name="components">
			<str>suggest</str>
		</arr>
	</requestHandler>
3. wipe data and index sample docs
4. compare the two requests:
	http://localhost:8983/solr/suggest?&q=720&debugQuery=true   --- 0 results
	http://localhost:8983/solr/select/?q={!prefix%20f=a_suggest}720   --- 1 result



Re: Suggester not working for digit starting terms

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, I can't pursue this right now, anyone want to jump in?

Erick

On Tue, Apr 10, 2012 at 2:41 PM, jmlucjav <jm...@gmail.com> wrote:
> I have double checked and still get the same behaviour. My field is:
>                <fieldType name="simpletext" class="solr.TextField" positionIncrementGap="100">
>                        <analyzer>
>                                <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>                                <tokenizer class="solr.KeywordTokenizerFactory"/>
>                                <filter class="solr.LowerCaseFilterFactory"/>
>                                <filter class="solr.TrimFilterFactory" />
>                        </analyzer>
>                </fieldType>
>
> Analysis shows the numbers are there; for '500 $' I get this as the last
> step at both index and query time:
>
> org.apache.solr.analysis.TrimFilterFactory {luceneMatchVersion=LUCENE_35}
> position        1
> term text       500 $
> startOffset     0
> endOffset       5
>
> So I still see something going wrong here
> xab

Re: Suggester not working for digit starting terms

Posted by jmlucjav <jm...@gmail.com>.
I have double checked and still get the same behaviour. My field is:
		<fieldType name="simpletext" class="solr.TextField" positionIncrementGap="100">
			<analyzer>
				<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
				<tokenizer class="solr.KeywordTokenizerFactory"/>
				<filter class="solr.LowerCaseFilterFactory"/>
				<filter class="solr.TrimFilterFactory" />
			</analyzer>
		</fieldType>	

Analysis shows the numbers are there; for '500 $' I get this as the last
step at both index and query time:

org.apache.solr.analysis.TrimFilterFactory {luceneMatchVersion=LUCENE_35}
position	1
term text	500 $
startOffset	0
endOffset	5

So I still see something going wrong here
xab


Re: Suggester not working for digit starting terms

Posted by Erick Erickson <er...@gmail.com>.
Is it possible that your fieldType definition for a_suggest is
stripping out the digits? Consider using TermsComponent
http://wiki.apache.org/solr/TermsComponent or the admin
page or Luke to examine the terms actually _in_ your
index. Or look at the admin/analysis page and give it some
sample input to determine what the results of the analysis
chain are.
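
A minimal sketch of a TermsComponent setup for this check, modeled on the
stock example solrconfig.xml; the handler name /terms is an assumption and may
already be defined in your config.

```xml
<!-- Exposes the raw indexed terms of a field, bypassing the suggester -->
<searchComponent name="terms" class="solr.TermsComponent"/>

<requestHandler name="/terms" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>
```

A request such as
http://localhost:8983/solr/terms?terms.fl=a_suggest&terms.prefix=500
should then list the indexed terms in a_suggest that start with 500, if any.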

Best
Erick

On Sat, Apr 7, 2012 at 3:24 PM, jmlucjav <jm...@gmail.com> wrote:
> Hi,
>
> I am using the Suggester component, as advised in the Solr3 book (using Solr 3.5):
>        <searchComponent name="suggest" class="solr.SpellCheckComponent">
>                <lst name="spellchecker">
>                        <str name="name">a_suggest</str>
>                        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>                        <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
>                        <str name="field">a_suggest</str>
>                        <str name="buildOnCommit">true</str>
>                        <int name="weightBuckets">100</int>
>                </lst>
>        </searchComponent>
>        <requestHandler name="/suggest" class="solr.SearchHandler">
>                <lst name="defaults">
>                        <str name="spellcheck">true</str>
>                        <str name="spellcheck.dictionary">a_suggest</str>
>                        <str name="spellcheck.onlyMorePopular">true</str>
>                        <str name="spellcheck.count">5</str>
>                        <str name="spellcheck.collate">true</str>
>                </lst>
>                <arr name="components">
>                        <str>suggest</str>
>                </arr>
>        </requestHandler>
>
> But although it works fine with words, it does not seem to work for terms
> starting with digits. For example:
> http://localhost:8983/solr/suggest?&q=500
> gets 0 results, but I know '500 $' is in the a_suggest field, as I can find
> many hits with:
> http://localhost:8983/solr/select/?q={!prefix f=a_suggest}500
>
> Am I missing something? I have tried playing with
> spellcheck.onlyMorePopular and spellcheck.accuracy, but I get the same
> results.
>
> thanks
> xab