Posted to user@nutch.apache.org by Danicela nutch <Da...@mail.com> on 2011/09/07 09:46:59 UTC

Spellcheck with Solr

Hi,

 I'm trying to get 'Did you mean?' search suggestions, like Google's, from Solr over data indexed by Nutch.

 I added this to my schema.xml:

 <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" >
 <analyzer>
 <tokenizer class="solr.StandardTokenizerFactory"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer>
 </fieldType>

 <field name="textSpell" type="text" stored="false" indexed="true" multiValued="true" />

 <copyField source="*_text" dest="textSpell" />

 ---

 I added this to my solrconfig.xml:

 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

 <str name="queryAnalyzerFieldType">textSpell</str>

 <lst name="spellchecker">
 <str name="classname">solr.IndexBasedSpellChecker</str>
 <str name="name">textSpell</str>
 <str name="field">textSpell</str>
 <str name="spellcheckIndexDir">./spellcheckerDefault</str>
 </lst>
 </searchComponent>

 I modified the default request handler like this:

 <requestHandler name="standard" class="solr.SearchHandler" default="true">
 <lst name="defaults">
 <str name="echoParams">explicit</str>

 <str name="spellcheck">true</str>
 <str name="spellcheck.dictionary">textSpell</str>
 <str name="spellcheck.onlyMorePopular">false</str>
 <str name="spellcheck.extendedResults">true</str>
 <str name="spellcheck.collate">true</str>
 <str name="spellcheck.count">5</str>

 </lst>

 <arr name="last-components">
 <str>spellcheck</str>
 </arr>
 </requestHandler>

 ---

 The first time, I added spellcheck.build=true to the request. The spellcheck index was modified, but it is only 20 bytes, which seems strange for 7000 indexed pages.

 This request: http://localhost:8983/solr/select/?q=nytames

 returns this:

 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">33</int>
     <lst name="params">
       <str name="q">nytames</str>
     </lst>
   </lst>
   <result name="response" numFound="0" start="0"/>
   <lst name="spellcheck">
     <lst name="suggestions">
       <bool name="correctlySpelled">false</bool>
     </lst>
   </lst>
 </response>

 I also tried passing spellcheck=true explicitly, but it doesn't change anything.

 I should get some suggestions in <lst name="suggestions">, but I get nothing.

 Does anyone have an idea what the problem is?

 Thanks.

Re: Spellcheck with Solr

Posted by Markus Jelsma <ma...@openindex.io>.
Please send Solr-specific questions to the Solr mailing list. There's more
help there.

Thanks


Re: Spellcheck with Solr

Posted by Gora Mohanty <go...@mimirtech.com>.
On Wed, Sep 7, 2011 at 1:16 PM, Danicela nutch <Da...@mail.com> wrote:
[...]
>  The first time, I put a spellcheck.build=true in the request, the index was modified, but has only 20 bytes. (I think that's strange for 7000 indexed pages)

This seems to indicate that something went wrong
with the indexing. 20 bytes is definitely too small,
and you probably have no entries at all in the index.
How are you indexing? Were there any error
messages in the Solr logs at indexing time?

>  This request : http://localhost:8983/solr/select/?q=nytames

You could check if there are any entries at all with:
http://localhost:8983/solr/select/?q=*:*

Regards,
Gora
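
The check suggested above can also be scripted. What follows is only a minimal sketch, not something posted in the thread: the function names are hypothetical, the base URL is the one from the original message, and the sample response is the one quoted there. It assumes that suggestions, when present, appear as nested <lst> elements inside <lst name="suggestions">, which is how Solr XML responses of that era are usually laid out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base URL taken from the thread; adjust it for your own Solr instance.
SOLR_SELECT = "http://localhost:8983/solr/select/"

# The response quoted in the original message.
SAMPLE_RESPONSE = """<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">33</int>
    <lst name="params"><str name="q">nytames</str></lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="spellcheck">
    <lst name="suggestions">
      <bool name="correctlySpelled">false</bool>
    </lst>
  </lst>
</response>"""


def select_url(query, extra=None):
    """Build a select URL; `extra` is an optional dict of request parameters."""
    params = {"q": query}
    params.update(extra or {})
    return SOLR_SELECT + "?" + urlencode(params)


def num_found(response_xml):
    """Return the numFound attribute of the main result element."""
    root = ET.fromstring(response_xml)
    result = root.find("result[@name='response']")
    if result is None:
        raise ValueError("no <result name='response'> element in the response")
    return int(result.get("numFound"))


def suggested_terms(response_xml):
    """Return the names of nested <lst> entries under 'suggestions' (assumed layout)."""
    root = ET.fromstring(response_xml)
    node = root.find("lst[@name='spellcheck']/lst[@name='suggestions']")
    if node is None:
        return []
    return [child.get("name") for child in node.findall("lst")]


if __name__ == "__main__":
    # URL for the "is there anything in the index at all?" check.
    print(select_url("*:*"))
    # URL for a one-off rebuild of the spellcheck dictionary.
    print(select_url("nytames", {"spellcheck": "true", "spellcheck.build": "true"}))
    # Parsed values from the response quoted in the thread.
    print(num_found(SAMPLE_RESPONSE), suggested_terms(SAMPLE_RESPONSE))
```

Fetching the first URL and passing the body to num_found shows whether the main index holds any documents; the sketch deliberately leaves the HTTP call out so it can be run without a Solr server.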

Re: Spellcheck with Solr

Posted by lewis john mcgibbney <le...@gmail.com>.
Hi Danicela,

To be honest, from what I can see this question belongs entirely on the Solr
lists.

I'm not 100% sure about the changes you've made to your configuration and
schema files, and I would need to look into them to give a rounded answer.

What I can say is that Solr does not like changes to its schema: any such
change requires you to completely reindex your data. This may explain why
your index is only 20 bytes.

I hope you can get help on the Solr list for this. Sorry I couldn't be of more
help.

-- 
*Lewis*
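
Since the replies point at the schema and at an empty index, one thing that can be checked offline is whether the copyField source pattern from the original message matches any declared field at all. The sketch below is an illustration under stated assumptions, not a diagnosis from the thread: the function name is hypothetical, and the "content" and "title" fields are stand-ins for whatever the Nutch schema actually declares. It uses fnmatch as an approximation of Solr's glob matching and ignores dynamic fields.

```python
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase

# Schema fragment: the textSpell field and copyField come from the original
# message; "content" and "title" are hypothetical stand-ins for the fields
# that the Nutch schema declares.
SCHEMA_FRAGMENT = """<schema>
  <fields>
    <field name="content" type="text" stored="false" indexed="true"/>
    <field name="title" type="text" stored="true" indexed="true"/>
    <field name="textSpell" type="text" stored="false" indexed="true" multiValued="true"/>
  </fields>
  <copyField source="*_text" dest="textSpell"/>
</schema>"""


def copyfield_matches(schema_xml):
    """Map each (source pattern, dest) pair to the declared fields it matches."""
    root = ET.fromstring(schema_xml)
    names = [field.get("name") for field in root.iter("field")]
    report = {}
    for copy in root.iter("copyField"):
        pattern = copy.get("source")
        # fnmatchcase approximates Solr's leading/trailing '*' glob.
        matched = [name for name in names if fnmatchcase(name, pattern)]
        report[(pattern, copy.get("dest"))] = matched
    return report


if __name__ == "__main__":
    for (pattern, dest), matched in copyfield_matches(SCHEMA_FRAGMENT).items():
        print(pattern, "->", dest, "matches", matched)
```

If the list of matches is empty for the real schema, nothing is ever copied into textSpell, and a spellcheck dictionary built from that field would be empty regardless of how many pages were indexed. The fragment also shows the textSpell field declared with type="text" rather than the textSpell field type defined in the same message; whether that is intended may be worth checking as well.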