Posted to solr-user@lucene.apache.org by Lance Norskog <go...@gmail.com> on 2013/10/21 01:38:53 UTC

Re: SOLR: Searching on OpenNLP fields is unstable

Hi-

Unit tests to the rescue! The current unit test system in the 4.x branch
catches call-sequence problems such as this one:

  [junit4]    > Throwable #1: java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.
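
For anyone following along, here is a rough sketch of the call order that message is talking about: reset(), then incrementToken() until it returns false, then end(), then close(). It does not use Lucene at all. ToyStream, LowerCaseToyFilter, BrokenToyFilter and consume() are hypothetical stand-ins I made up to model the contract, and this is not the LUCENE-2899 code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

public class TokenStreamContractSketch {

    /** Toy token source (hypothetical; NOT Lucene's TokenStream). */
    static class ToyStream {
        private final List<String> tokens;
        private Iterator<String> cursor; // null means "not reset yet"
        private String current;

        ToyStream(String... tokens) {
            this.tokens = Arrays.asList(tokens);
        }

        void reset() {
            if (cursor != null) {
                throw new IllegalStateException("reset() called multiple times");
            }
            cursor = tokens.iterator();
        }

        boolean incrementToken() {
            if (cursor == null) {
                throw new IllegalStateException("reset() call missing");
            }
            if (!cursor.hasNext()) {
                return false;
            }
            current = cursor.next();
            return true;
        }

        String term() { return current; }

        void end() { }

        // Closing returns the stream to the "not reset yet" state so it can be reused.
        void close() { cursor = null; }
    }

    /** Toy filter that delegates every lifecycle call to its input. */
    static class LowerCaseToyFilter {
        protected final ToyStream input;

        LowerCaseToyFilter(ToyStream input) { this.input = input; }

        void reset() { input.reset(); }
        boolean incrementToken() { return input.incrementToken(); }
        String term() { return input.term().toLowerCase(Locale.ROOT); }
        void end() { input.end(); }
        void close() { input.close(); }
    }

    /** Models the "subclass does not call super.reset()" mistake. */
    static class BrokenToyFilter extends LowerCaseToyFilter {
        BrokenToyFilter(ToyStream input) { super(input); }

        @Override
        void reset() {
            // Bug on purpose: super.reset() is never called, so the input is never reset.
        }
    }

    /** Consumes a stream in the expected order: reset, incrementToken*, end, close. */
    static List<String> consume(LowerCaseToyFilter stream) {
        List<String> terms = new ArrayList<>();
        try {
            stream.reset();
            while (stream.incrementToken()) {
                terms.add(stream.term());
            }
            stream.end();
        } finally {
            stream.close();
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(consume(new LowerCaseToyFilter(new ToyStream("Brett", "Lee"))));
        try {
            consume(new BrokenToyFilter(new ToyStream("Brett", "Lee")));
        } catch (IllegalStateException e) {
            System.out.println("contract violation: " + e.getMessage());
        }
    }
}
```

In this model the broken filter fails on the first incrementToken() call, because its input was never reset. That is the same class of mistake the junit4 check reports.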

I'll try to get this right. But both OpenNLP and LUCENE-2899 have
deployment problems:
1) OpenNLP does not have a good source of statistical training data for its
models. For example, the NER models are trained on late-1980s newspaper
articles, so the organization finder is kind of... obsolete. That kind of
problem. I think the currency recognizer is trained on text from before the
Euro was introduced (not sure about this).
2) Solr has a basic packaging problem when the Lucene code uses external
libraries.

As to adding it to the main Solr project, I think the Marketplace Of Ideas
has spoken with deafening silence :)


On Wed, Sep 25, 2013 at 9:26 AM, rashi gandhi <ga...@gmail.com> wrote:

> Hi,
>
> I am working on OpenNLP integration with Solr. I have successfully applied
> the patch (LUCENE-2899-x.patch) to the latest Solr source code (branch_4x).
>
> I have designed an OpenNLP analyzer and indexed data with it. The analyzer
> declaration in schema.xml is:
>
> <fieldType name="nlp_type" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <!-- Sequence of tokenizers and filters applied at index time -->
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.SnowballPorterFilterFactory"/>
>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <!-- Sequence of tokenizers and filters applied at query time -->
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.OpenNLPFilterFactory" posTaggerModel="opennlp/en-pos-maxent.bin"/>
>     <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-person.bin"/>
>     <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-location.bin"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
>   </analyzer>
> </fieldType>
>
> And the field declared for this analyzer:
>
> <field name="Detail_Person" type="nlp_type" indexed="true" stored="true" omitNorms="true" omitPositions="true"/>
>
> The problem: when I search on this field, Detail_Person, the results are
> not consistent.
>
> When I search for Detail_Person:brett, it returns one document.
>
> But when I fire the same query again, it returns zero documents.
>
> Searching on the OpenNLP field is not stable: sometimes it returns
> documents and sometimes it does not, even though the documents are there.
>
> If I search on non-OpenNLP fields, it works properly; the results are
> stable and correct.
>
> Please help me make the Solr results consistent.
>
> Thanks in Advance.
>
>



-- 
Lance Norskog
goksron@gmail.com