Posted to solr-user@lucene.apache.org by "ssharma7884@gmail.com" <ss...@gmail.com> on 2015/06/30 15:35:29 UTC

Suggester configuration queries.

Hi,
I have the following Solr 5.1 configuration:

*schema.xml*
<fields>
.....
.....
<field name="text" type="c_text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true" />
<field name="document_name" type="c_document_name" indexed="true"
       stored="true" required="true" multiValued="false" />
.....
.....
</fields>

<types>
.....
.....
<fieldType name="c_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="c_document_name" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
.....
.....
</types>


*solrconfig.xml*
...... 
...... 
<searchComponent name="suggest" class="solr.SuggestComponent">
   <lst name="suggester">
      <str name="name">textSuggester</str>
      <str name="lookupImpl">FreeTextLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">text</str>
      <str name="suggestFreeTextAnalyzerFieldType">c_text</str>
      <str name="buildOnCommit">true</str>
   </lst>
   <lst name="suggester">
      <str name="name">docNameSuggester</str>
      <str name="lookupImpl">FreeTextLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">document_name</str>
      <str name="suggestFreeTextAnalyzerFieldType">c_document_name</str>
      <str name="buildOnCommit">true</str>
   </lst>
</searchComponent>

  <requestHandler name="/suggestHandler" class="solr.SearchHandler" 
                  startup="lazy" >
    <lst name="defaults">
      <str name="wt">json</str>
      <str name="suggest">true</str>
      <str name="suggest.count">5</str>

      <str name="suggest.dictionary">textSuggester</str>
      <str name="suggest.dictionary">docNameSuggester</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>
...... 
...... 


*Query:*
1) With the above configuration (buildOnCommit=true), is it OK to autocommit on save?

I came across a link
http://www.signaldump.org/solr/qpod/33101/solr-suggester 
which mentions:

"The index-based spellcheck/suggest just reads terms from the indexed
fields which takes no time to build but suffers from reading indexed
terms, i.e. terms that have gone through the analysis process that may
have been stemmed, lowercased, all that."

So, if the above is correct, the time consumed is just for reading the data (like a SELECT).

P.S. I need buildOnCommit to get the latest tokens into the suggester. Are there any
better ideas or suggestions for achieving this?

Regards,
Sachin Vyas.



--
View this message in context: http://lucene.472066.n3.nabble.com/Suggester-configuration-queries-tp4214950.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Suggester configuration queries.

Posted by "ssharma7884@gmail.com" <ss...@gmail.com>.
Hi, 
For my reply dated "Jul 02, 2015; 4:47pm": for my scenario and test data, the
Spellchecker results in Solr 4.6 and 5.1 are fine.
The Suggester results in Solr 4.6 and 5.1 are also fine.

I was mixing up the two components.


Thanks & Regards, 
Sachin Vyas.




Re: Suggester configuration queries.

Posted by "ssharma7884@gmail.com" <ss...@gmail.com>.
Hi,
I am using the Solr TermsComponent for auto-suggestion; it provides the
functionality I need.

https://wiki.apache.org/solr/TermsComponent
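For readers taking the same route, a TermsComponent setup for prefix suggestions might look roughly like the fragment below. This is a sketch, not the poster's configuration: the `/terms` handler name and the default values are assumptions, and the source field `suggest` is the copyField defined in the Solr 4.6 and 5.1 schemas later in this thread. If a `terms` component is already declared (some of Solr's example configurations include one), reuse it rather than declaring a second.

```xml
<!-- Sketch: prefix-based auto-suggest with the TermsComponent.
     Assumes the "suggest" copyField defined elsewhere in this thread. -->
<searchComponent name="terms" class="solr.TermsComponent"/>

<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <!-- enable the component and read terms from the suggest field -->
    <str name="terms">true</str>
    <str name="terms.fl">suggest</str>
    <!-- return at most 5 terms, highest document frequency first -->
    <str name="terms.limit">5</str>
    <str name="terms.sort">count</str>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>
```

A request such as http://localhost:8983/solr/portal_documents/terms?terms.prefix=wh&wt=xml would then return up to five indexed terms starting with "wh". Because the terms are read straight from the index, they are the analyzed forms (lowercased, ASCII-folded), which is the trade-off described in the quote in the first message.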


Regards,
Sachin Vyas.




Re: Suggester configuration queries.

Posted by "ssharma7884@gmail.com" <ss...@gmail.com>.
Any suggestions on this?


Regards,
Sachin Vyas.




Re: Suggester configuration queries.

Posted by "ssharma7884@gmail.com" <ss...@gmail.com>.
Alessandro Benedetti,
Thanks for the links.


Regards,
Sachin Vyas.





Re: Suggester configuration queries.

Posted by Alessandro Benedetti <be...@gmail.com>.
Using the TermsComponent for auto-suggest is a very old approach, and it
gives minimal features…
If it is OK for you, fine!

I would suggest the following reading on auto-suggestion:

Suggester Solr wiki
<https://cwiki.apache.org/confluence/display/solr/Suggester>
Solr suggester <http://lucidworks.com/blog/solr-suggester/> (Erick's post)
http://alexbenedetti.blogspot.co.uk/2015/07/solr-you-complete-me.html (my
post)

Hope they help!

Cheers


2015-07-13 11:51 GMT+01:00 ssharma7884@gmail.com <ss...@gmail.com>:

> Hi,
> For my reply dated "Jul 02, 2015; 4:47pm": there is actually *no difference
> in results* between Solr 4.6 and Solr 5.1 for either the "spellchecker" or
> the "suggester" component. I was mixing up the two components.
>
>
> Thanks & Regards,
> Sachin Vyas.
>
>
>
>




Re: Suggester configuration queries.

Posted by "ssharma7884@gmail.com" <ss...@gmail.com>.
Erick,
We actually have a working Solr 4.6 spellchecker; its configuration
details are below:

*Solr 4.6 - schema.xml*
<field name="suggest" type="text_suggest" indexed="true" stored="false"
multiValued="true" />
<copyField source="text" dest="suggest"/>

		<fieldType name="text_suggest" class="solr.TextField"
positionIncrementGap="100">
			<analyzer type="index">
				<tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
				<filter class="solr.StopFilterFactory" ignoreCase="true"
words="lang/stopwords_en.txt" />
				<filter class="solr.ASCIIFoldingFilterFactory"/>
				<filter class="solr.EnglishPossessiveFilterFactory"/>
				<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
				<filter class="solr.TrimFilterFactory"/>
				<filter class="solr.LowerCaseFilterFactory"/>
			</analyzer>
			<analyzer type="query">
				<tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
				<filter class="solr.StopFilterFactory" ignoreCase="true"
words="lang/stopwords_en.txt" />
				<filter class="solr.ASCIIFoldingFilterFactory"/>
				<filter class="solr.EnglishPossessiveFilterFactory"/>
				<filter class="solr.LowerCaseFilterFactory"/>
			</analyzer>
		</fieldType>

*Solr 4.6 - solrconfig.xml*
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
    <lst name="defaults">
        <str name="echoParams">none</str>
        <str name="wt">json</str>
        <str name="indent">false</str>
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">suggestDictionary</str>
        <str name="spellcheck.onlyMorePopular">true</str>
        <str name="spellcheck.count">5</str>
        <str name="spellcheck.collate">false</str>
    </lst>
    <arr name="components">
        <str>suggest</str>
    </arr>
</requestHandler>

<searchComponent class="solr.SpellCheckComponent" name="suggest">
    <lst name="spellchecker">
        <str name="name">suggestDictionary</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookupFactory</str>
        <str name="field">suggest</str>
        <float name="threshold">0.</float>
        <str name="buildOnCommit">true</str>
    </lst>
</searchComponent>

*Solr 4.6 Spellcheck query*
http://localhost:8983/solr/portal_documents/suggest?&wt=xml&spellcheck.q=wh

*Solr 4.6 Spellcheck results*
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
</lst>
<lst name="spellcheck">
<lst name="suggestions">
<lst name="wh">
<int name="numFound">5</int>
<int name="startOffset">0</int>
<int name="endOffset">2</int>
<arr name="suggestion">
<str>when</str>
<str>what</str>
<str>where</str>
<str>which</str>
<str>who</str>
</arr>
</lst>
</lst>
</lst>
</response>

We are now migrating to Solr 5.1 and have the following configuration:
*Solr 5.1 - schema.xml*

<field name="suggest" type="c_suggest" indexed="true" stored="false"
multiValued="true" />
<copyField source="text" dest="suggest"/>

		<fieldType name="c_suggest" class="solr.TextField"
positionIncrementGap="100">
			<analyzer type="index">
				<tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
				<filter class="solr.StopFilterFactory" ignoreCase="true"
words="lang/stopwords_en.txt" />
				<filter class="solr.ASCIIFoldingFilterFactory"/>
				<filter class="solr.EnglishPossessiveFilterFactory"/>
				<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
				<filter class="solr.TrimFilterFactory"/>
				<filter class="solr.LowerCaseFilterFactory"/>
			</analyzer>
			<analyzer type="query">
				<tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
				<filter class="solr.StopFilterFactory" ignoreCase="true"
words="lang/stopwords_en.txt" />
				<filter class="solr.ASCIIFoldingFilterFactory"/>
				<filter class="solr.EnglishPossessiveFilterFactory"/>
				<filter class="solr.LowerCaseFilterFactory"/>
			</analyzer>
		</fieldType>

*Solr 5.1 - solrconfig.xml*

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">c_suggest</str>
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">suggest</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <str name="distanceMeasure">internal</str>
      <float name="accuracy">0.01</float>
      <int name="maxEdits">2</int>
      <int name="minPrefix">1</int>
      <int name="maxInspections">5</int>
      <int name="minQueryLength">1</int>
      <float name="maxQueryFrequency">0.01</float>
      <float name="thresholdTokenFrequency">.01</float>
    </lst>
  </searchComponent>

  <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="spellcheck.dictionary">default</str>
      <str name="spellcheck">on</str>
      <str name="spellcheck.count">5</str>
      <str name="spellcheck.extendedResults">false</str>
      <str name="wt">xml</str>
      <str name="spellcheck.onlyMorePopular">true</str>
      <str name="spellcheck.collate">false</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>

*Solr 5.1 Spellcheck query (same parameters as Solr 4.6)*
http://localhost:8983/solr/portal_documents/spell?&wt=xml&spellcheck.q=wh

*Solr 5.1 Spellcheck results*
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">62</int>
</lst>
<result name="response" numFound="0" start="0"/>
<lst name="spellcheck">
<lst name="suggestions">
<lst name="wh">
<int name="numFound">2</int>
<int name="startOffset">0</int>
<int name="endOffset">2</int>
<arr name="suggestion">
<str>we</str>
<str>who</str>
</arr>
</lst>
</lst>
</lst>
</response>

Both Solr versions have the same data, and the spellcheck index has been built.
I want Solr 5.1 to return the same spellcheck results as 4.6, but I have not
been able to achieve this.

Can you please suggest an appropriate fix?
Is there a problem in my Solr 5.1 configuration?
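Note that the 4.6 setup above is a suggester (the legacy `Suggester` class with `FSTLookupFactory`), while the 5.1 setup uses `DirectSolrSpellChecker`, which returns indexed terms within `maxEdits` edits of the input (such as "we" and "who" for "wh") rather than prefix completions. A rough 5.1 counterpart of the 4.6 suggester is sketched below. It assumes that the `SuggestComponent` with `FSTLookupFactory` over indexed terms (`HighFrequencyDictionaryFactory`) is an acceptable stand-in; the names are reused from the 4.6 configuration, and the handler layout is an assumption modelled on the `/suggestHandler` in the first message.

```xml
<!-- Sketch: Solr 5.x SuggestComponent reading the indexed terms of the
     "suggest" copyField, roughly mirroring the 4.6 FSTLookup suggester. -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggestDictionary</str>
    <!-- FST-based prefix lookup -->
    <str name="lookupImpl">FSTLookupFactory</str>
    <!-- build from indexed terms (weighted by document frequency),
         not from stored field values -->
    <str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
    <str name="field">suggest</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">suggestDictionary</str>
    <str name="suggest.count">5</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

The 4.6 query would then translate to something like http://localhost:8983/solr/portal_documents/suggest?suggest.q=wh&wt=xml, with `suggest.q` in place of `spellcheck.q`. If a SuggestComponent named `suggest` already exists, as in the first message of this thread, the `<lst name="suggester">` entry can be added to it instead of declaring a second component.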


Regards,
Sachin Vyas.





Re: Suggester configuration queries.

Posted by Erick Erickson <er...@gmail.com>.
That's where I'd start, at least.

Erick

On Wed, Jul 1, 2015 at 6:56 AM, ssharma7884@gmail.com
<ss...@gmail.com> wrote:
> Erick,
> As per your reply - *"So for your situation, I'd use a copyField to a
> minimally-analyzed field
> and use the index-based suggesters."*
>
> Are you suggesting the "spellCheck" component in Solr, with
> "DirectSolrSpellChecker" as the spell checker?
>
>
> Regards,
> Sachin Vyas.
>
>
>
>
>
>

Re: Suggester configuration queries.

Posted by Erick Erickson <er...@gmail.com>.
This will be pretty much unworkable for any large corpus. The
DocumentDictionaryFactory builds its index by reading the stored value from
every document in your index to put into a sidecar Solr index (for the free
text suggester).

This can take many minutes so doing this on every commit is an
anti-pattern. The suggester framework is very powerful, but not to be
used casually.
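If the DocumentDictionaryFactory-based suggesters are kept, one alternative to rebuilding on every commit is to switch `buildOnCommit` off and rebuild on demand, for example from the indexing job after a batch of updates. The fragment below is a sketch of the `docNameSuggester` entry from the original post with only that setting changed; when and how often to trigger the rebuild is an assumption left to the application.

```xml
<!-- Sketch: the docNameSuggester from the original post, but no longer
     rebuilt automatically on every commit. -->
<lst name="suggester">
  <str name="name">docNameSuggester</str>
  <str name="lookupImpl">FreeTextLookupFactory</str>
  <str name="dictionaryImpl">DocumentDictionaryFactory</str>
  <str name="field">document_name</str>
  <str name="suggestFreeTextAnalyzerFieldType">c_document_name</str>
  <!-- avoid re-reading every stored document on each commit -->
  <str name="buildOnCommit">false</str>
</lst>
```

A rebuild can then be requested explicitly, assuming the same `portal_documents` core used elsewhere in this thread, with something like http://localhost:8983/solr/portal_documents/suggestHandler?suggest.build=true&suggest.dictionary=docNameSuggester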

So for your situation, I'd use a copyField to a minimally-analyzed field
and use the index-based suggesters.
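A minimally-analyzed copy field of the kind described here could be declared roughly as follows. This is a sketch: the names `suggest_min` and `c_suggest_min`, and the choice of a standard tokenizer followed only by lowercasing, are hypothetical illustrations rather than a specific recommendation from this thread.

```xml
<!-- Sketch: hypothetical minimally-analyzed field used only as a
     source of terms for index-based suggestions. -->
<fieldType name="c_suggest_min" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split on word boundaries, then lowercase; no stemming or stopwords -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="suggest_min" type="c_suggest_min" indexed="true" stored="false" multiValued="true"/>
<copyField source="text" dest="suggest_min"/>
```

An index-based suggester or spell checker would then set its `field` to `suggest_min`, so the terms it reads have only been tokenized and lowercased, not stemmed or stopword-filtered.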

Best,
Erick
