Posted to solr-user@lucene.apache.org by anupamxyz <cs...@gmail.com> on 2011/08/19 17:56:27 UTC

How to implement Spell Checker using Solr?

I am using Nutch to crawl and Solr to search, and search is working. I now
want to add a file-based suggestion ("Did you mean?") feature, which is more
or less a spell checker. I have made the required changes to solrconfig.xml
and schema.xml for Solr, but indexing fails when I re-index to pick up the
new configuration. Please let me know how this can be corrected, and also
how I can display the suggestions from a JSP in my application. I can share
the changed parts of the configuration later if you are willing to help me
with this.

Thanks in advance,
Anupam

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-implement-Spell-Checker-using-Solr-tp3268450p3268450.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: How to implement Spell Checker using Solr?

Posted by Gora Mohanty <go...@mimirtech.com>.
On Fri, Aug 19, 2011 at 9:26 PM, anupamxyz <cs...@gmail.com> wrote:
> I am using Nutch to crawl and Solr to search, and search is working. I now
> want to add a file-based suggestion ("Did you mean?") feature, which is more
> or less a spell checker.

Um, not quite. At least as per my understanding they are very
different things. Please take a look at:
http://wiki.apache.org/solr/MoreLikeThis
http://wiki.apache.org/solr/SpellCheckComponent

> I have made the required changes to solrconfig.xml and schema.xml for
> Solr, but indexing fails when I re-index to pick up the new configuration.
> Please let me know how this can be corrected, and also how I can display
> the suggestions from a JSP in my application. I can share the changed parts
> of the configuration later if you are willing to help me with this.

Please share the changes you are making and explain what "it fails" means,
i.e., show us the configuration files, error messages, etc., perhaps through
pastebin.com. You might also wish to take a look at
http://wiki.apache.org/solr/UsingMailingLists

Regards,
Gora

Re: How to implement Spell Checker using Solr?

Posted by anupamxyz <cs...@gmail.com>.
Nutch and Solr can be used together as needed; see
http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ . Search is
implemented and I am able to search on the indexed values. Now I need the
spell checker. My changes are exactly the ones listed at
http://wiki.apache.org/solr/SpellCheckComponent . I will share the log
details with you by Monday.

Re: How to implement Spell Checker using Solr?

Posted by anupamxyz <cs...@gmail.com>.
I have been able to set up the Solr spell checker in my web application,
using a file-based spell checker. I should add that the suggestions are not
very accurate, since I have not applied any specific algorithm to return the
most relevant result. Do let me know if you run into any issues implementing
the same at your end.

regards,
Anupam

Re: How to implement Spell Checker using Solr?

Posted by "tamanjit.bindra@yahoo.co.in" <ta...@yahoo.co.in>.
Firstly, to be clear: since you are using *<str
name="classname">solr.IndexBasedSpellChecker</str>*, the dictionary is built
from terms that are already indexed.

Next, check the following in your *solrconfig.xml*:

1. <str name="field">spell</str> names the field that is used to build your
dictionary. Does it exist in schema.xml?

2. <str name="queryAnalyzerFieldType">textSpell</str> names the field type
whose analyzer is applied to the query for spell checking. It must be
defined in schema.xml, and the field in <str name="field">spell</str> should
be of type textSpell. Is that so?
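
For reference, the two checks above correspond to schema.xml entries along
the following lines. This is only a sketch modelled on the textSpell example
from the SpellCheckComponent wiki page; the copyField sources (title,
content) are an assumption about a typical Nutch schema and are not taken
from the poster's files.

```xml
<!-- schema.xml, inside <types>: the field type named by queryAnalyzerFieldType -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<!-- schema.xml, inside <fields>: the field named by <str name="field"> -->
<field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true"/>

<!-- schema.xml, next to the other copyField entries; source names are assumed -->
<copyField source="title" dest="spell"/>
<copyField source="content" dest="spell"/>
```

A missing field or field type is one plausible cause of an Internal Server
Error on /update, so it is worth ruling out first. The same check applies to
the lowerfilt field referenced by the jarowinkler spellchecker in the posted
configuration.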

Now for your internal error from crawling. This is most probably caused by
the changes to your solrconfig.xml/schema.xml. I assume so because, as you
say, indexing worked before you tried to implement spellcheck.

> Also, I am not sure how to make search work from the search control in my
> application: how can I search for a word and get a suggestion at the same
> time? When the search term is, say, "form"/"formm", it seems I would need a
> separate URL for each. Does the Solr spell checker component take care of
> this on its own? If so, how, and exactly how should solrconfig.xml and
> schema.xml be configured for it?
>
> Please note: I would prefer to use a file-based dictionary for the search,
> so kindly suggest along those lines.

If you want a file-based dictionary, you are going in the wrong direction.
You are using the IndexBasedSpellChecker class when what you actually need
is:

<lst name="spellchecker">
  <str name="name">file</str>
  <str name="classname">solr.FileBasedSpellChecker</str>
  <str name="sourceLocation">spellings.txt</str>
  <str name="characterEncoding">UTF-8</str>
  <str name="spellcheckIndexDir">./spellcheckerFile</str>
</lst>
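
To show where that block sits, here is a minimal sketch of a complete
file-based setup in solrconfig.xml. It reuses the component and handler
names from elsewhere in the thread; spellings.txt is assumed to be a plain
list with one word per line in Solr's conf directory.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- Field type used to analyze the query before looking up suggestions -->
  <str name="queryAnalyzerFieldType">textSpell</str>

  <lst name="spellchecker">
    <str name="name">file</str>
    <str name="classname">solr.FileBasedSpellChecker</str>
    <!-- Assumed: one word per line, located in the conf directory -->
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">./spellcheckerFile</str>
  </lst>
</searchComponent>

<requestHandler name="/spellCheckCompRH" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <!-- Must match the <str name="name"> of the spellchecker above -->
    <str name="spellcheck.dictionary">file</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

spellcheck.build is deliberately left out of the defaults in this sketch:
with it set to true there, the dictionary would be rebuilt on every request.
Passing spellcheck.build=true on a single request after changing
spellings.txt should be enough.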

Please read more about the spell checker at
http://wiki.apache.org/solr/SpellCheckComponent .
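
On the question of searching and getting a suggestion at the same time:
because the spellcheck component is registered under last-components, a
single request to the handler returns the normal results plus a spellcheck
section, so no separate URL should be needed. The fragment below is an
illustrative sketch of that section for the misspelled term "formm", not
captured output; the host, port and handler name are taken from elsewhere in
the thread, and the exact layout may differ between Solr versions.

```xml
<!-- Hypothetical request:
     http://localhost:7001/solr/spellCheckCompRH?q=formm&spellcheck=true&spellcheck.collate=true -->
<lst name="spellcheck">
  <lst name="suggestions">
    <!-- One entry per query term that was not found in the dictionary -->
    <lst name="formm">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">5</int>
      <arr name="suggestion">
        <str>form</str>
      </arr>
    </lst>
    <!-- Present when spellcheck.collate=true: the query rewritten with the suggestions -->
    <str name="collation">form</str>
  </lst>
</lst>
```

A JSP could read the collation value and render it as a "Did you mean:
form?" link that re-runs the search with the suggested term.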

Re: How to implement Spell Checker using Solr?

Posted by anupamxyz <cs...@gmail.com>.
The error I receive when indexing into Solr after the crawl is shown
below:

2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	Basic Indexing Filter (index-basic)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	Basic Summarizer Plug-in (summary-basic)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	Site Query Filter (query-site)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	Http / Https Protocol Plug-in (protocol-httpclient)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	HTTP Framework (lib-http)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	Pass-through URL Normalizer (urlnormalizer-pass)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	Regex URL Filter (urlfilter-regex)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	Http Protocol Plug-in (protocol-http)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	XML Response Writer Plug-in (response-xml)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	Regex URL Normalizer (urlnormalizer-regex)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	OPIC Scoring Plug-in (scoring-opic)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	CyberNeko HTML Parser (lib-nekohtml)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	Anchor Indexing Filter (index-anchor)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	URL Query Filter (query-url)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	Regex URL Filter Framework (lib-regex-filter)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	JSON Response Writer Plug-in (response-json)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - Registered Extension-Points:
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	Nutch Protocol (org.apache.nutch.protocol.Protocol)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	Nutch Field Filter (org.apache.nutch.indexer.field.FieldFilter)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	Nutch Search Results Response Writer (org.apache.nutch.searcher.response.ResponseWriter)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	Nutch URL Filter (org.apache.nutch.net.URLFilter)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	Nutch Content Parser (org.apache.nutch.parse.Parser)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2011-08-24 15:47:56,225 INFO  plugin.PluginRepository - 	Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2011-08-24 15:47:56,241 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2011-08-24 15:47:56,241 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2011-08-24 15:47:57,366 WARN  mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Internal Server Error

Internal Server Error

request: http://localhost:7001/solr/update?wt=javabin&version=2.2
	at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:343)
	at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
	at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
	at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
	at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:69)
	at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:170)
2011-08-24 15:47:57,882 FATAL solr.SolrIndexer - SolrIndexer: java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
	at org.apache.nutch.indexer.solr.SolrIndexer.indexSolr(SolrIndexer.java:73)
	at org.apache.nutch.indexer.solr.SolrIndexer.run(SolrIndexer.java:95)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.indexer.solr.SolrIndexer.main(SolrIndexer.java:104)

Also, I am not sure how to make search work from the search control in my
application: how can I search for a word and get a suggestion at the same
time? When the search term is, say, "form"/"formm", it seems I would need a
separate URL for each. Does the Solr spell checker component take care of
this on its own? If so, how, and exactly how should solrconfig.xml and
schema.xml be configured for it?

Please note: I would prefer to use a file-based dictionary for the search,
so kindly suggest along those lines.

Regards,
Anupam

Re: How to implement Spell Checker using Solr?

Posted by Alexei Martchenko <al...@superdownloads.com.br>.
What is the error?

2011/8/22 anupamxyz <cs...@gmail.com>

> The changes to solrconfig.xml in Solr are as follows:
>
> <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
>   <lst name="spellchecker">
>     <str name="name">default</str>
>     <str name="classname">solr.IndexBasedSpellChecker</str>
>     <str name="field">spell</str>
>     <str name="spellcheckIndexDir">./spellchecker</str>
>     <str name="accuracy">0.7</str>
>     <float name="thresholdTokenFrequency">.0001</float>
>   </lst>
>   <lst name="spellchecker">
>     <str name="name">jarowinkler</str>
>     <str name="field">lowerfilt</str>
>     <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
>     <str name="spellcheckIndexDir">./spellchecker</str>
>   </lst>
>   <str name="queryAnalyzerFieldType">textSpell</str>
> </searchComponent>
>
> And for the request handler, I have made the following changes:
>
> <requestHandler name="/spellCheckCompRH" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="spellcheck">true</str>
>     <str name="spellcheck.onlyMorePopular">false</str>
>     <str name="spellcheck.dictionary">default</str>
>     <str name="spellcheck.extendedResults">false</str>
>     <str name="spellcheck.count">5</str>
>     <str name="spellcheck.build">true</str>
>     <str name="spellcheck.collate">true</str>
>   </lst>
>   <arr name="last-components">
>     <str>spellcheck</str>
>   </arr>
> </requestHandler>
>
> Indexing fails while crawling with these changes. I have reverted them for
> now, but I can try again and post the exception that I get while crawling.



-- 

*Alexei Martchenko* | *CEO* | Superdownloads
alexei@superdownloads.com.br | alexei@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533

Re: How to implement Spell Checker using Solr?

Posted by anupamxyz <cs...@gmail.com>.
The changes to solrconfig.xml in Solr are as follows:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="accuracy">0.7</str>
    <float name="thresholdTokenFrequency">.0001</float>
  </lst>
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">lowerfilt</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
  <str name="queryAnalyzerFieldType">textSpell</str>
</searchComponent>

And for the request handler, I have made the following changes:

<requestHandler name="/spellCheckCompRH" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.build">true</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

Indexing fails while crawling with these changes. I have reverted them for
now, but I can try again and post the exception that I get while crawling.