Posted to solr-dev@lucene.apache.org by Shalin Shekhar Mangar <sh...@gmail.com> on 2008/06/03 21:53:10 UTC

Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

The current patch has been broken for some days now and implementing
correct query parsing logic may take time to get right. Let's not aim for
everything to get into the 1.3 release.

I would like to cut down the scope of this issue to an implementation that
indexes files and Lucene indices (both Solr and arbitrary) and gives
suggestions while using the correct analyzer for multi-word queries. Let's
get a spell checker working and commit it. We can deal with more
enhancements like abstractions for custom spellcheckers and query parsing
etc. in another issue which can be dealt with separately (in 1.3 or after).
Thoughts? If there is a general consensus, I can give a new patch which can
be good enough to go in.
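
To make the multi-word part of that scope concrete, here is a rough sketch of
the two consistency rules from the issue description (keep correct words as
they are, never repeat a word in the suggestion). It is not taken from any
SOLR-572 patch: the class name ConsistentSuggestion, the dictionary set and
the candidates map are made up for illustration, and whitespace splitting
only stands in for the source field's real analyzer.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Solr or Lucene.
final class ConsistentSuggestion {

    // Builds one suggestion for a multi-word query.
    //   dictionary: words considered correctly spelled (assumed input)
    //   candidates: ranked replacement words per misspelled token (assumed input)
    static String suggest(String query,
                          Set<String> dictionary,
                          Map<String, List<String>> candidates) {
        // Stand-in for the field's analyzer: split on whitespace.
        String[] tokens = query.trim().split("\\s+");

        // Reserve every correct word first so a replacement never duplicates it.
        Set<String> used = new HashSet<>();
        for (String token : tokens) {
            if (dictionary.contains(token)) {
                used.add(token);
            }
        }

        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (dictionary.contains(token)) {
                out.add(token); // rule 1: preserve correct words as they are
                continue;
            }
            String replacement = token; // fall back to the original token
            for (String candidate : candidates.getOrDefault(token, List.of())) {
                if (used.add(candidate)) { // rule 2: skip words already in use
                    replacement = candidate;
                    break;
                }
            }
            out.add(replacement);
        }
        return String.join(" ", out);
    }
}
```

With a dictionary of {pizza, hut, house} and candidates hutt -> [hut, house],
the query "hut hutt" comes back as "hut house", because "hut" is already taken
by a correct word in the original query.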

On Sat, May 31, 2008 at 2:44 AM, Oleg Gnatovskiy (JIRA) <ji...@apache.org>
wrote:

>
>    [
> https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601256#action_12601256]
>
> Oleg Gnatovskiy commented on SOLR-572:
> --------------------------------------
>
> I installed the latest patch. Still getting a NPE. Here is my config:
>
> <searchComponent name="spellcheck"
> class="org.apache.solr.handler.component.SpellCheckComponent">
>    <lst name="defaults">
>      <!-- omp = Only More Popular -->
>      <str name="spellcheck.onlyMorePopular">false</str>
>      <!-- exr = Extended Results -->
>      <str name="spellcheck.extendedResults">false</str>
>      <!--  The number of suggestions to return -->
>      <str name="spellcheck.count">1</str>
>    </lst>
>
>     <lst name="spellchecker">
>      <str
> name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
>      <str name="name">external</str>
>       <str name="sourceLocation">spellings.txt</str>
>       <str name="characterEncoding">UTF-8</str>
>       <str name="fieldType">text_ws</str>
>      <str
> name="indexDir">/usr/local/apache/lucene/solr2home/solr/data/spellIndex</str>
>    </lst>
>  </searchComponent>
>
>
> Here is the URL I am hitting:
> http://localhost:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=external&spellcheck.build=true
>
> Here is the error:
>
> HTTP Status 500 - null java.lang.NullPointerException at
> org.apache.lucene.index.Term.<init>(Term.java:39) at
> org.apache.lucene.index.Term.<init>(Term.java:36) at
> org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:228)
> at
> org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:71)
> at
> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:177)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:153)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:965) at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
> at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
> at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
> at
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
> at
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
> at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
> at java.lang.Thread.run(Thread.java:619)
>
> spelling.txt is in my solr/home/conf.
>
> > Spell Checker as a Search Component
> > -----------------------------------
> >
> >                 Key: SOLR-572
> >                 URL: https://issues.apache.org/jira/browse/SOLR-572
> >             Project: Solr
> >          Issue Type: New Feature
> >          Components: spellchecker
> >    Affects Versions: 1.3
> >            Reporter: Shalin Shekhar Mangar
> >            Assignee: Grant Ingersoll
> >            Priority: Minor
> >             Fix For: 1.3
> >
> >         Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
> > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
> > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
> > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch
> >
> >
> > Expose the Lucene contrib SpellChecker as a Search Component. Provide the
> > following features:
> > * Allow creating a spell index on a given field and make it possible to
> >   have multiple spell indices -- one for each field
> > * Give suggestions on a per-field basis
> > * Given a multi-word query, give only one consistent suggestion
> > * Process the query with the same analyzer specified for the source field
> >   and process each token separately
> > * Allow the user to specify minimum length for a token (optional)
> > Consistency criteria for a multi-word query can consist of the following:
> > * Preserve the correct words in the original query as it is
> > * Never give duplicate words in a suggestion
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>


-- 
Regards,
Shalin Shekhar Mangar.

Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Hi Grant,

I did not intend to offend you or put pressure on you in any way. Please
accept my apologies if I came off as rude. In fact, I've been having a lot
of fun working with you and Bojan on this issue. We've definitely covered a
lot of ground very fast.

I am completely in favor of the goals for this piece. I was merely suggesting
that with the 1.3 release being a priority, we should go one step at a time
and commit per the initial scope for this issue as written in the issue's
description and then handle the enhancements in another issue. But I'm all
for it if you want to add extra functionality within the same issue.

Once again, I'm deeply sorry if you found my comment offensive in any way.

Regards,
Shalin

On Wed, Jun 4, 2008 at 4:33 PM, Grant Ingersoll <gs...@apache.org> wrote:

> There are working patches available on the issue without the "advanced
> features" and everyone is free to fix the current one.  It's not like it is
> that far off from being able to have proper spellchecking, pluggability, and
> context information about where the mistakes are.  I frankly don't get what
> all the fuss is about.
>
> Is it that you disagree with the approach?  That hasn't come across in the
> discussions, but if it is, say so.  I thought we were working on it quite
> well together and made some good progress and are pretty darn close.  I
> don't see that I've taken away any functionality that the original patch
> offers, but I did change it so that it fits a broader audience, namely those
> who are interested in other spell checkers and those who want info about
> where in the query the problem occurs.  Which is what the comments suggest
> people are interested in and also what I am interested in for 1.3.
>
> And, I'm sorry, but I said I'd have to let it lie for a few days and then I
> would be back to it.  Cut me some slack.  I don't get paid to work on Solr
> full time.   Is it truly that important that someone can't wait a few days
> for a patch on the trunk version for something they never had before?  It
> ain't like we're talking some core bug here that has everyone broken.
>  Besides, others are perfectly welcome to work on it in the meantime.
>
> Sorry for the rant, but I am not going to be pressured into committing a
> patch that I don't think is ready and one that I said I am going to be
> working on to see it through so that we all are happy.
>
> -Grant
>
>
> On Jun 4, 2008, at 1:14 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>
>  On Wed, Jun 4, 2008 at 2:15 AM, Grant Ingersoll <gs...@apache.org>
>> wrote:
>>
>>> I will be back on it tomorrow and will see this through before 1.3 with
>>> the
>>> abstractions.  In other words, -1 on cutting this off prematurely.  :-)
>>> Since I don't think this is the only thing holding up 1.3, let's just
>>> play
>>> it out and get it right so all of us are happy.
>>>
>>
>> This feature may not be holding back the 1.3 release. The potential users
>> of this feature are very much interested in a basic working version.
>> They may be able to live without these advanced features. Maybe we
>> can have another JIRA issue for enhancements which may or may not go into
>> 1.3 (depending on when it happens).
>>
>>
>>
>>
>>> -Grant
>>>
>>> On Jun 3, 2008, at 3:53 PM, Shalin Shekhar Mangar wrote:
>>>
>>>> The current patch has been broken for some days now and implementing
>>>> correct query parsing logic may take time to get right. Let's not aim
>>>> for everything to get into the 1.3 release.
>>>>
>>>> I would like to cut down the scope of this issue to an implementation
>>>> that indexes files and Lucene indices (both Solr and arbitrary) and
>>>> gives suggestions while using the correct analyzer for multi-word
>>>> queries. Let's get a spell checker working and commit it. We can deal
>>>> with more enhancements like abstractions for custom spellcheckers and
>>>> query parsing etc. in another issue which can be dealt with separately
>>>> (in 1.3 or after).
>>>> Thoughts? If there is a general consensus, I can give a new patch which
>>>> can be good enough to go in.
>>>>
>>>> On Sat, May 31, 2008 at 2:44 AM, Oleg Gnatovskiy (JIRA) <jira@apache.org>
>>>> wrote:
>>>>
>>>>
>>>>> [
>>>>> https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601256#action_12601256]
>>>>>
>>>>> Oleg Gnatovskiy commented on SOLR-572:
>>>>> --------------------------------------
>>>>>
>>>>> I installed the latest patch. Still getting a NPE. Here is my config:
>>>>>
>>>>> <searchComponent name="spellcheck"
>>>>> class="org.apache.solr.handler.component.SpellCheckComponent">
>>>>> <lst name="defaults">
>>>>>  <!-- omp = Only More Popular -->
>>>>>  <str name="spellcheck.onlyMorePopular">false</str>
>>>>>  <!-- exr = Extended Results -->
>>>>>  <str name="spellcheck.extendedResults">false</str>
>>>>>  <!--  The number of suggestions to return -->
>>>>>  <str name="spellcheck.count">1</str>
>>>>> </lst>
>>>>>
>>>>>  <lst name="spellchecker">
>>>>>  <str
>>>>> name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
>>>>>  <str name="name">external</str>
>>>>>   <str name="sourceLocation">spellings.txt</str>
>>>>>   <str name="characterEncoding">UTF-8</str>
>>>>>   <str name="fieldType">text_ws</str>
>>>>>  <str
>>>>>
>>>>>
>>>>> name="indexDir">/usr/local/apache/lucene/solr2home/solr/data/spellIndex</str>
>>>>> </lst>
>>>>> </searchComponent>
>>>>>
>>>>>
>>>>> Here is the URL I am hitting:
>>>>>
>>>>>
>>>>> http://localhost:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=external&spellcheck.build=true
>>>>>
>>>>> Here is the error:
>>>>>
>>>>> HTTP Status 500 - null java.lang.NullPointerException at
>>>>> org.apache.lucene.index.Term.<init>(Term.java:39) at
>>>>> org.apache.lucene.index.Term.<init>(Term.java:36) at
>>>>>
>>>>>
>>>>> org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:228)
>>>>> at
>>>>>
>>>>>
>>>>> org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:71)
>>>>> at
>>>>>
>>>>>
>>>>> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:177)
>>>>> at
>>>>>
>>>>>
>>>>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:153)
>>>>> at
>>>>>
>>>>>
>>>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
>>>>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:965) at
>>>>>
>>>>>
>>>>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
>>>>> at
>>>>>
>>>>>
>>>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
>>>>> at
>>>>>
>>>>>
>>>>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>>>> at
>>>>>
>>>>>
>>>>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>>> at
>>>>>
>>>>>
>>>>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>>> at
>>>>>
>>>>>
>>>>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
>>>>> at
>>>>>
>>>>>
>>>>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>>>> at
>>>>>
>>>>>
>>>>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>>> at
>>>>>
>>>>>
>>>>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>>> at
>>>>>
>>>>>
>>>>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>>>>> at
>>>>>
>>>>>
>>>>> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
>>>>> at
>>>>>
>>>>>
>>>>> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>>>>> at
>>>>> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>>>>> at java.lang.Thread.run(Thread.java:619)
>>>>>
>>>>> spelling.txt is in my solr/home/conf.
>>>>>
>>>>>> Spell Checker as a Search Component
>>>>>> -----------------------------------
>>>>>>
>>>>>>             Key: SOLR-572
>>>>>>             URL: https://issues.apache.org/jira/browse/SOLR-572
>>>>>>         Project: Solr
>>>>>>      Issue Type: New Feature
>>>>>>      Components: spellchecker
>>>>>> Affects Versions: 1.3
>>>>>>        Reporter: Shalin Shekhar Mangar
>>>>>>        Assignee: Grant Ingersoll
>>>>>>        Priority: Minor
>>>>>>         Fix For: 1.3
>>>>>>
>>>>>>     Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
>>>>>> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
>>>>>> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
>>>>>> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch
>>>>>>
>>>>>>
>>>>>> Expose the Lucene contrib SpellChecker as a Search Component. Provide
>>>>>> the following features:
>>>>>> * Allow creating a spell index on a given field and make it possible
>>>>>>   to have multiple spell indices -- one for each field
>>>>>> * Give suggestions on a per-field basis
>>>>>> * Given a multi-word query, give only one consistent suggestion
>>>>>> * Process the query with the same analyzer specified for the source
>>>>>>   field and process each token separately
>>>>>> * Allow the user to specify minimum length for a token (optional)
>>>>>> Consistency criteria for a multi-word query can consist of the
>>>>>> following:
>>>>>> * Preserve the correct words in the original query as it is
>>>>>> * Never give duplicate words in a suggestion
>>>>>
>>>>> --
>>>>> This message is automatically generated by JIRA.
>>>>> -
>>>>> You can reply to this email to add a comment to the issue online.
>>>>>
>>>>>
>>>>>
>>>>


-- 
Regards,
Shalin Shekhar Mangar.

Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

Posted by Grant Ingersoll <gs...@apache.org>.
There are working patches available on the issue without the "advanced  
features" and everyone is free to fix the current one.  It's not like  
it is that far off from being able to have proper spellchecking,  
pluggability, and context information about where the mistakes are.  I  
frankly don't get what all the fuss is about.

Is it that you disagree with the approach?  That hasn't come across in  
the discussions, but if it is, say so.  I thought we were working on  
it quite well together and made some good progress and are pretty darn  
close.  I don't see that I've taken away any functionality that the  
original patch offers, but I did change it so that it fits a broader  
audience, namely those who are interested in other spell checkers and  
those who want info about where in the query the problem occurs.
Which is what the comments suggest people are interested in and also
what I am interested in for 1.3.

And, I'm sorry, but I said I'd have to let it lie for a few days and  
then I would be back to it.  Cut me some slack.  I don't get paid to  
work on Solr full time.   Is it truly that important that someone  
can't wait a few days for a patch on the trunk version for something  
they never had before?  It ain't like we're talking some core bug here  
that has everyone broken.  Besides, others are perfectly welcome to  
work on it in the meantime.

Sorry for the rant, but I am not going to be pressured into committing  
a patch that I don't think is ready and one that I said I am going to  
be working on to see it through so that we all are happy.

-Grant

On Jun 4, 2008, at 1:14 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

> On Wed, Jun 4, 2008 at 2:15 AM, Grant Ingersoll  
> <gs...@apache.org> wrote:
>> I will be back on it tomorrow and will see this through before 1.3  
>> with the
>> abstractions.  In other words, -1 on cutting this off  
>> prematurely.  :-)
>> Since I don't think this is the only thing holding up 1.3, let's  
>> just play
>> it out and get it right so all of us are happy.
>
> This feature may not be holding back the 1.3 release. The potential users
> of this feature are very much interested in a basic working version.
> They may be able to live without these advanced features. Maybe we
> can have another JIRA issue for enhancements which may or may not go into
> 1.3 (depending on when it happens).
>
>
>
>>
>> -Grant
>>
>> On Jun 3, 2008, at 3:53 PM, Shalin Shekhar Mangar wrote:
>>
>>> The current patch has been broken for some days now and implementing
>>> correct query parsing logic may take time to get right. Let's not aim for
>>> everything to get into the 1.3 release.
>>>
>>> I would like to cut down the scope of this issue to an implementation that
>>> indexes files and Lucene indices (both Solr and arbitrary) and gives
>>> suggestions while using the correct analyzer for multi-word queries. Let's
>>> get a spell checker working and commit it. We can deal with more
>>> enhancements like abstractions for custom spellcheckers and query parsing
>>> etc. in another issue which can be dealt with separately (in 1.3 or
>>> after).
>>> Thoughts? If there is a general consensus, I can give a new patch which
>>> can be good enough to go in.
>>>
>>> On Sat, May 31, 2008 at 2:44 AM, Oleg Gnatovskiy (JIRA) <jira@apache.org>
>>> wrote:
>>>
>>>>
>>>> [
>>>>
>>>> https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601256#action_12601256]
>>>>
>>>> Oleg Gnatovskiy commented on SOLR-572:
>>>> --------------------------------------
>>>>
>>>> I installed the latest patch. Still getting a NPE. Here is my  
>>>> config:
>>>>
>>>> <searchComponent name="spellcheck"
>>>> class="org.apache.solr.handler.component.SpellCheckComponent">
>>>> <lst name="defaults">
>>>>   <!-- omp = Only More Popular -->
>>>>   <str name="spellcheck.onlyMorePopular">false</str>
>>>>   <!-- exr = Extended Results -->
>>>>   <str name="spellcheck.extendedResults">false</str>
>>>>   <!--  The number of suggestions to return -->
>>>>   <str name="spellcheck.count">1</str>
>>>> </lst>
>>>>
>>>>  <lst name="spellchecker">
>>>>   <str
>>>> name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
>>>>   <str name="name">external</str>
>>>>    <str name="sourceLocation">spellings.txt</str>
>>>>    <str name="characterEncoding">UTF-8</str>
>>>>    <str name="fieldType">text_ws</str>
>>>>   <str
>>>> name="indexDir">/usr/local/apache/lucene/solr2home/solr/data/spellIndex</str>
>>>> </lst>
>>>> </searchComponent>
>>>>
>>>>
>>>> Here is the URL I am hitting:
>>>>
>>>> http://localhost:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=external&spellcheck.build=true
>>>>
>>>> Here is the error:
>>>>
>>>> HTTP Status 500 - null java.lang.NullPointerException at
>>>> org.apache.lucene.index.Term.<init>(Term.java:39) at
>>>> org.apache.lucene.index.Term.<init>(Term.java:36) at
>>>> org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:228)
>>>> at
>>>> org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:71)
>>>> at
>>>> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:177)
>>>> at
>>>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:153)
>>>> at
>>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
>>>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:965) at
>>>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
>>>> at
>>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
>>>> at
>>>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>>> at
>>>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>> at
>>>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>> at
>>>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
>>>> at
>>>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>>> at
>>>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>> at
>>>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>> at
>>>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>>>> at
>>>> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
>>>> at
>>>> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>>>> at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>>>> at java.lang.Thread.run(Thread.java:619)
>>>>
>>>> spelling.txt is in my solr/home/conf.
>>>>
>>>>> Spell Checker as a Search Component
>>>>> -----------------------------------
>>>>>
>>>>>              Key: SOLR-572
>>>>>              URL: https://issues.apache.org/jira/browse/SOLR-572
>>>>>          Project: Solr
>>>>>       Issue Type: New Feature
>>>>>       Components: spellchecker
>>>>> Affects Versions: 1.3
>>>>>         Reporter: Shalin Shekhar Mangar
>>>>>         Assignee: Grant Ingersoll
>>>>>         Priority: Minor
>>>>>          Fix For: 1.3
>>>>>
>>>>>      Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
>>>>> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
>>>>> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
>>>>> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch
>>>>>
>>>>>
>>>>> Expose the Lucene contrib SpellChecker as a Search Component. Provide
>>>>> the following features:
>>>>> * Allow creating a spell index on a given field and make it possible to
>>>>>   have multiple spell indices -- one for each field
>>>>> * Give suggestions on a per-field basis
>>>>> * Given a multi-word query, give only one consistent suggestion
>>>>> * Process the query with the same analyzer specified for the source
>>>>>   field and process each token separately
>>>>> * Allow the user to specify minimum length for a token (optional)
>>>>> Consistency criteria for a multi-word query can consist of the
>>>>> following:
>>>>> * Preserve the correct words in the original query as it is
>>>>> * Never give duplicate words in a suggestion
>>>>
>>>> --
>>>> This message is automatically generated by JIRA.
>>>> -
>>>> You can reply to this email to add a comment to the issue online.
>>>>
>>>>
>>>

Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
On Wed, Jun 4, 2008 at 2:15 AM, Grant Ingersoll <gs...@apache.org> wrote:
> I will be back on it tomorrow and will see this through before 1.3 with the
> abstractions.  In other words, -1 on cutting this off prematurely.  :-)
>  Since I don't think this is the only thing holding up 1.3, let's just play
> it out and get it right so all of us are happy.

This feature may not be holding back the 1.3 release. The potential users
of this feature are very much interested in a basic working version.
They may be able to live without these advanced features. Maybe we
can have another JIRA issue for enhancements which may or may not go into
1.3 (depending on when it happens).



>
> -Grant
>
> On Jun 3, 2008, at 3:53 PM, Shalin Shekhar Mangar wrote:
>
>> The current patch has been broken for some days now and implementing
>> correct query parsing logic may take time to get right. Let's not aim for
>> everything to get into the 1.3 release.
>>
>> I would like to cut down the scope of this issue to an implementation that
>> indexes files and Lucene indices (both Solr and arbitrary) and gives
>> suggestions while using the correct analyzer for multi-word queries. Let's
>> get a spell checker working and commit it. We can deal with more
>> enhancements like abstractions for custom spellcheckers and query parsing
>> etc. in another issue which can be dealt with separately (in 1.3 or
>> after).
>> Thoughts? If there is a general consensus, I can give a new patch which
>> can be good enough to go in.
>>
>> On Sat, May 31, 2008 at 2:44 AM, Oleg Gnatovskiy (JIRA) <ji...@apache.org>
>> wrote:
>>
>>>
>>>  [
>>>
>>> https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601256#action_12601256]
>>>
>>> Oleg Gnatovskiy commented on SOLR-572:
>>> --------------------------------------
>>>
>>> I installed the latest patch. Still getting a NPE. Here is my config:
>>>
>>> <searchComponent name="spellcheck"
>>> class="org.apache.solr.handler.component.SpellCheckComponent">
>>>  <lst name="defaults">
>>>    <!-- omp = Only More Popular -->
>>>    <str name="spellcheck.onlyMorePopular">false</str>
>>>    <!-- exr = Extended Results -->
>>>    <str name="spellcheck.extendedResults">false</str>
>>>    <!--  The number of suggestions to return -->
>>>    <str name="spellcheck.count">1</str>
>>>  </lst>
>>>
>>>   <lst name="spellchecker">
>>>    <str
>>> name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
>>>    <str name="name">external</str>
>>>     <str name="sourceLocation">spellings.txt</str>
>>>     <str name="characterEncoding">UTF-8</str>
>>>     <str name="fieldType">text_ws</str>
>>>    <str
>>>
>>> name="indexDir">/usr/local/apache/lucene/solr2home/solr/data/spellIndex</str>
>>>  </lst>
>>> </searchComponent>
>>>
>>>
>>> Here is the URL I am hitting:
>>>
>>> http://localhost:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=external&spellcheck.build=true
>>>
>>> Here is the error:
>>>
>>> HTTP Status 500 - null java.lang.NullPointerException at
>>> org.apache.lucene.index.Term.<init>(Term.java:39) at
>>> org.apache.lucene.index.Term.<init>(Term.java:36) at
>>>
>>> org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:228)
>>> at
>>>
>>> org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:71)
>>> at
>>>
>>> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:177)
>>> at
>>>
>>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:153)
>>> at
>>>
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
>>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:965) at
>>>
>>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
>>> at
>>>
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
>>> at
>>>
>>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>> at
>>>
>>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>> at
>>>
>>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>> at
>>>
>>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
>>> at
>>>
>>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>> at
>>>
>>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>> at
>>>
>>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>> at
>>>
>>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>>> at
>>>
>>> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
>>> at
>>>
>>> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>>> at
>>> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>>> at java.lang.Thread.run(Thread.java:619)
>>>
>>> spelling.txt is in my solr/home/conf.
>>>
>>>> Spell Checker as a Search Component
>>>> -----------------------------------
>>>>
>>>>               Key: SOLR-572
>>>>               URL: https://issues.apache.org/jira/browse/SOLR-572
>>>>           Project: Solr
>>>>        Issue Type: New Feature
>>>>        Components: spellchecker
>>>>  Affects Versions: 1.3
>>>>          Reporter: Shalin Shekhar Mangar
>>>>          Assignee: Grant Ingersoll
>>>>          Priority: Minor
>>>>           Fix For: 1.3
>>>>
>>>>       Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
>>>> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
>>>> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
>>>> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch
>>>>
>>>>
>>>> Expose the Lucene contrib SpellChecker as a Search Component. Provide
>>>> the following features:
>>>> * Allow creating a spell index on a given field and make it possible to
>>>>   have multiple spell indices -- one for each field
>>>> * Give suggestions on a per-field basis
>>>> * Given a multi-word query, give only one consistent suggestion
>>>> * Process the query with the same analyzer specified for the source
>>>>   field and process each token separately
>>>> * Allow the user to specify minimum length for a token (optional)
>>>> Consistency criteria for a multi-word query can consist of the
>>>> following:
>>>> * Preserve the correct words in the original query as it is
>>>> * Never give duplicate words in a suggestion
>>>
>>> --
>>> This message is automatically generated by JIRA.
>>> -
>>> You can reply to this email to add a comment to the issue online.
>>>
>>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>

-- 
--Noble Paul

Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component

Posted by Grant Ingersoll <gs...@apache.org>.
I will be back on it tomorrow and will see this through before 1.3  
with the abstractions.  In other words, -1 on cutting this off  
prematurely.  :-)  Since I don't think this is the only thing holding  
up 1.3, let's just play it out and get it right so all of us are happy.

-Grant

On Jun 3, 2008, at 3:53 PM, Shalin Shekhar Mangar wrote:

> The current patch has been broken for some days now and implementing
> correct query parsing logic may take time to get right. Let's not aim for
> everything to get into the 1.3 release.
>
> I would like to cut down the scope of this issue to an implementation that
> indexes files and Lucene indices (both Solr and arbitrary) and gives
> suggestions while using the correct analyzer for multi-word queries. Let's
> get a spell checker working and commit it. We can deal with more
> enhancements like abstractions for custom spellcheckers and query parsing
> etc. in another issue which can be dealt with separately (in 1.3 or after).
> Thoughts? If there is a general consensus, I can give a new patch which can
> be good enough to go in.
>
> On Sat, May 31, 2008 at 2:44 AM, Oleg Gnatovskiy (JIRA) <jira@apache.org>
> wrote:
>
>>
>>   [
>> https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601256#action_12601256]
>>
>> Oleg Gnatovskiy commented on SOLR-572:
>> --------------------------------------
>>
>> I installed the latest patch. Still getting an NPE. Here is my config:
>>
>> <searchComponent name="spellcheck"
>> class="org.apache.solr.handler.component.SpellCheckComponent">
>>   <lst name="defaults">
>>     <!-- omp = Only More Popular -->
>>     <str name="spellcheck.onlyMorePopular">false</str>
>>     <!-- exr = Extended Results -->
>>     <str name="spellcheck.extendedResults">false</str>
>>     <!--  The number of suggestions to return -->
>>     <str name="spellcheck.count">1</str>
>>   </lst>
>>
>>    <lst name="spellchecker">
>>     <str name="classname">org.apache.solr.spelling.FileBasedSpellChecker</str>
>>     <str name="name">external</str>
>>     <str name="sourceLocation">spellings.txt</str>
>>     <str name="characterEncoding">UTF-8</str>
>>     <str name="fieldType">text_ws</str>
>>     <str name="indexDir">/usr/local/apache/lucene/solr2home/solr/data/spellIndex</str>
>>   </lst>
>> </searchComponent>
>>
>>
>> Here is the URL I am hitting:
>> http://localhost:8983/solr/select/?q=pizza&spellcheck=true&spellcheck.dictionary=external&spellcheck.build=true
>>
>> Here is the error:
>>
>> HTTP Status 500 - null java.lang.NullPointerException
>> at org.apache.lucene.index.Term.<init>(Term.java:39)
>> at org.apache.lucene.index.Term.<init>(Term.java:36)
>> at org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:228)
>> at org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:71)
>> at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:177)
>> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:153)
>> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:965)
>> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
>> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>> at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>> at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
>> at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>> at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>> at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>> at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>> at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
>> at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>> at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>> at java.lang.Thread.run(Thread.java:619)
>>
>> spelling.txt is in my solr/home/conf.
>>
>>> Spell Checker as a Search Component
>>> -----------------------------------
>>>
>>>                Key: SOLR-572
>>>                URL: https://issues.apache.org/jira/browse/SOLR-572
>>>            Project: Solr
>>>         Issue Type: New Feature
>>>         Components: spellchecker
>>>   Affects Versions: 1.3
>>>           Reporter: Shalin Shekhar Mangar
>>>           Assignee: Grant Ingersoll
>>>           Priority: Minor
>>>            Fix For: 1.3
>>>
>>>        Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
>>> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
>>> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
>>> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch
>>>
>>>
>>> Expose the Lucene contrib SpellChecker as a Search Component. Provide the
>>> following features:
>>> * Allow creating a spell index on a given field and make it possible to
>>>   have multiple spell indices -- one for each field
>>> * Give suggestions on a per-field basis
>>> * Given a multi-word query, give only one consistent suggestion
>>> * Process the query with the same analyzer specified for the source field
>>>   and process each token separately
>>> * Allow the user to specify minimum length for a token (optional)
>>> Consistency criteria for a multi-word query can consist of the following:
>>> * Preserve the correct words in the original query as they are
>>> * Never give duplicate words in a suggestion
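
[Editorial illustration, not part of the quoted issue or of any SOLR-572
patch.] The two consistency criteria above could be sketched roughly as
follows. The class ConsistentSuggestion, the method suggest, the in-memory
dictionary set and the per-token candidates map are all hypothetical
stand-ins for the spell index and the ranked corrections a spell checker
would return; this is a minimal sketch under those assumptions, using plain
Java collections only (Java 9+ for List.of).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: builds ONE suggestion for a multi-word query.
final class ConsistentSuggestion {

    /**
     * @param tokens     the analyzed query tokens, in order
     * @param dictionary words considered correctly spelled (stand-in for a spell index)
     * @param candidates ranked corrections per misspelled token (stand-in for checker output)
     */
    public static List<String> suggest(List<String> tokens,
                                       Set<String> dictionary,
                                       Map<String, List<String>> candidates) {
        // Reserve the correct words first, so that a correction chosen for
        // another token can never duplicate one of them.
        Set<String> used = new HashSet<>();
        for (String token : tokens) {
            if (dictionary.contains(token)) {
                used.add(token);
            }
        }

        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            if (dictionary.contains(token)) {
                // Criterion 1: preserve correct words as they are.
                result.add(token);
                continue;
            }
            // Criterion 2: take the best-ranked candidate not already used.
            String chosen = token; // fall back to the original token if none is left
            for (String candidate : candidates.getOrDefault(token, List.of())) {
                if (used.add(candidate)) {
                    chosen = candidate;
                    break;
                }
            }
            result.add(chosen);
        }
        return result;
    }
}
```

For the query "piza pizza hutt", with "pizza" as the top-ranked correction
for "piza", this sketch skips "pizza" because the query already contains it
and falls through to the next-ranked candidate. Whether the real component
should rank, skip, or fall back this way is a design choice the issue leaves
open.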
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>>
>
>
> -- 
> Regards,
> Shalin Shekhar Mangar.

--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ