Posted to users@opennlp.apache.org by rashi gandhi <ga...@gmail.com> on 2013/09/26 19:01:11 UTC

SOLR: Searching on OpenNLP Field is Unstable

Hi,

I am working on OpenNLP integration with SOLR. I have successfully applied
the patch (LUCENE-2899-x.patch) to the latest SOLR source code (branch_4x).

I have designed an OpenNLP analyzer and indexed data with it. The analyzer
declaration in schema.xml is:



  <fieldType name="nlp_type" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <!-- Sequence of tokenizers and filters applied at index time -->
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.SnowballPorterFilterFactory"/>
      <filter class="solr.ASCIIFoldingFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <!-- Sequence of tokenizers and filters applied at query time -->
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.OpenNLPFilterFactory" posTaggerModel="opennlp/en-pos-maxent.bin"/>
      <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-person.bin"/>
      <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-location.bin"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    </analyzer>
  </fieldType>



And the field declared with this type:

<field name="Detail_Person" type="nlp_type" indexed="true" stored="true" omitNorms="true" omitPositions="true"/>
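To see what each analyzer in nlp_type actually emits for a given value, Solr's field-analysis handler can run the index chain and the query chain side by side. A minimal Python sketch that builds such a request, assuming Solr is reachable at localhost:8080 with the collection1 core (both inferred from the log below) and that FieldAnalysisRequestHandler is registered at /analysis/field as in the stock Solr 4.x example solrconfig.xml; the helper name field_analysis_url is hypothetical:

```python
from urllib.parse import urlencode

# Assumed endpoint: port from the "http-bio-8080" thread names, core name from
# "[collection1]" in the log, handler path from the stock Solr 4.x solrconfig.xml.
ANALYSIS_URL = "http://localhost:8080/solr/collection1/analysis/field"


def field_analysis_url(field, index_text, query_text, base=ANALYSIS_URL):
    """Build a request that shows every tokenizer/filter stage for one field."""
    params = {
        "analysis.fieldname": field,        # analyze with this field's type
        "analysis.fieldvalue": index_text,  # text run through the index analyzer
        "analysis.query": query_text,       # text run through the query analyzer
        "analysis.showmatch": "true",       # flag index tokens the query matches
        "wt": "json",
    }
    return base + "?" + urlencode(params)


print(field_analysis_url("Detail_Person", "a guy named Brett is here", "brett"))
```

Opening the printed URL lists the tokens produced by each stage of both chains, which makes it easier to compare what the query-side OpenNLP filters produce with what the index chain stored.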



The problem: when I search on this Detail_Person field, the results are
not consistent.

When I search Detail_Person:brett, it returns one document.

But when I fire the same query again, it returns zero documents.

These are the logs:

97139 [http-bio-8080-exec-9] INFO  org.apache.solr.analysis.OpenNLPFilterFactory  – OpenNLPFilterFactory create
97139 [http-bio-8080-exec-9] INFO  org.apache.solr.analysis.OpenNLPFilterFactory  – OpenNLPFilterFactory create
97139 [http-bio-8080-exec-9] INFO  org.apache.solr.analysis.OpenNLPFilterFactory  – OpenNLPFilterFactory create
97154 [http-bio-8080-exec-9] INFO  org.apache.solr.core.SolrCore  – [collection1] webapp=/solr path=/select params={fl=score,*&indent=true&q=Detail_Person:rashi&wt=json} hits=1 status=0 QTime=15
97154 [http-bio-8080-exec-9] DEBUG org.apache.solr.servlet.SolrDispatchFilter  – Closing out SolrRequest: {{params(fl=score,*&indent=true&q=Detail_Person:rashi&wt=json),defaults(df=text&echoParams=explicit&rows=10)}}

134874 [http-bio-8080-exec-3] INFO  org.apache.solr.analysis.OpenNLPFilterFactory  – OpenNLPFilterFactory create
134890 [http-bio-8080-exec-3] INFO  org.apache.solr.analysis.OpenNLPFilterFactory  – OpenNLPFilterFactory create
134890 [http-bio-8080-exec-3] INFO  org.apache.solr.analysis.OpenNLPFilterFactory  – OpenNLPFilterFactory create
134906 [http-bio-8080-exec-3] INFO  org.apache.solr.core.SolrCore  – [collection1] webapp=/solr path=/select params={fl=score,*&indent=true&q=Detail_Person:brett&wt=json} hits=2 status=0 QTime=32
134906 [http-bio-8080-exec-3] DEBUG org.apache.solr.servlet.SolrDispatchFilter  – Closing out SolrRequest: {{params(fl=score,*&indent=true&q=Detail_Person:brett&wt=json),defaults(df=text&echoParams=explicit&rows=10)}}

147136 [http-bio-8080-exec-3] INFO  org.apache.solr.core.SolrCore  – [collection1] webapp=/solr path=/select params={fl=score,*&indent=true&q=Detail_Person:john&wt=json} hits=0 status=0 QTime=0
147136 [http-bio-8080-exec-3] DEBUG org.apache.solr.servlet.SolrDispatchFilter  – Closing out SolrRequest: {{params(fl=score,*&indent=true&q=Detail_Person:john&wt=json),defaults(df=text&echoParams=explicit&rows=10)}}

302164 [http-bio-8080-exec-10] INFO  org.apache.solr.analysis.OpenNLPFilterFactory  – OpenNLPFilterFactory create
302164 [http-bio-8080-exec-10] INFO  org.apache.solr.analysis.OpenNLPFilterFactory  – OpenNLPFilterFactory create
302164 [http-bio-8080-exec-10] INFO  org.apache.solr.analysis.OpenNLPFilterFactory  – OpenNLPFilterFactory create
302164 [http-bio-8080-exec-10] INFO  org.apache.solr.core.SolrCore  – [collection1] webapp=/solr path=/select params={fl=score,*&indent=true&q=Detail_Person:john&wt=json} hits=1 status=0 QTime=15
302164 [http-bio-8080-exec-10] DEBUG org.apache.solr.servlet.SolrDispatchFilter  – Closing out SolrRequest: {{params(fl=score,*&indent=true&q=Detail_Person:john&wt=json),defaults(df=text&echoParams=explicit&rows=10)}}
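Read side by side, these entries show one query returning different counts: Detail_Person:john gave hits=0 in the entry stamped 147136 and hits=1 in the entry stamped 302164. A minimal Python sketch for scanning a longer log for the same pattern, assuming each SolrCore request entry sits on a single line in the real log file; the function names are made up for illustration:

```python
import re
from collections import defaultdict

# Matches SolrCore request entries such as
#   ... path=/select params={...&q=Detail_Person:john&wt=json} hits=0 status=0 QTime=0
# The DEBUG "Closing out SolrRequest: {{params(...)}}" lines do not match.
REQUEST = re.compile(r"params=\{(?P<params>[^}]*)\}\s+hits=(?P<hits>\d+)")


def hits_by_query(lines):
    """Map each q= value to the hit counts it produced, in log order."""
    seen = defaultdict(list)
    for line in lines:
        m = REQUEST.search(line)
        if m is None:
            continue
        params = dict(p.split("=", 1) for p in m.group("params").split("&") if "=" in p)
        seen[params.get("q", "")].append(int(m.group("hits")))
    return dict(seen)


def unstable_queries(lines):
    """Queries that returned more than one distinct hit count."""
    return {q: h for q, h in hits_by_query(lines).items() if len(set(h)) > 1}


# The four SolrCore request entries from the log, one per line.
LOG = [
    "97154 [http-bio-8080-exec-9] INFO  org.apache.solr.core.SolrCore  - [collection1] webapp=/solr path=/select params={fl=score,*&indent=true&q=Detail_Person:rashi&wt=json} hits=1 status=0 QTime=15",
    "134906 [http-bio-8080-exec-3] INFO  org.apache.solr.core.SolrCore  - [collection1] webapp=/solr path=/select params={fl=score,*&indent=true&q=Detail_Person:brett&wt=json} hits=2 status=0 QTime=32",
    "147136 [http-bio-8080-exec-3] INFO  org.apache.solr.core.SolrCore  - [collection1] webapp=/solr path=/select params={fl=score,*&indent=true&q=Detail_Person:john&wt=json} hits=0 status=0 QTime=0",
    "302164 [http-bio-8080-exec-10] INFO  org.apache.solr.core.SolrCore  - [collection1] webapp=/solr path=/select params={fl=score,*&indent=true&q=Detail_Person:john&wt=json} hits=1 status=0 QTime=15",
]

print(unstable_queries(LOG))  # {'Detail_Person:john': [0, 1]}
```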


Searching on the OpenNLP field is not stable: sometimes it returns documents
and sometimes it doesn't, even though the documents are there.

When I search on non-OpenNLP fields, everything works properly: the results
are stable and correct.
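Since the complaint is that one and the same query gives different counts, a quick way to reproduce it outside the admin UI is to fire the query several times and compare numFound. A minimal Python sketch using only the standard library; the base URL (localhost:8080, core collection1) is an assumption based on the log, and the helper names are hypothetical:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed /select endpoint, inferred from "http-bio-8080" and "[collection1]" in the log.
SELECT_URL = "http://localhost:8080/solr/collection1/select"


def select_url(query, base=SELECT_URL):
    """Count-only version of the logged request: rows=0 still reports numFound."""
    return base + "?" + urlencode({"q": query, "rows": 0, "wt": "json"})


def num_found(query, base=SELECT_URL):
    """Run the query once and return response.numFound from the JSON reply."""
    with urlopen(select_url(query, base)) as resp:
        return json.load(resp)["response"]["numFound"]


def is_consistent(counts):
    """True when every repetition returned the same number of hits."""
    return len(set(counts)) <= 1


def repeat_query(query, times=10, base=SELECT_URL):
    """Fire the same query several times and collect the hit counts."""
    return [num_found(query, base) for _ in range(times)]
```

For example, is_consistent(repeat_query("Detail_Person:brett")) can be compared with the same check on a field that does not use nlp_type; if only the OpenNLP field fails, the problem lies in its query analyzer rather than in the index.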

Please help me make the SOLR results consistent.
Thanks in advance.

Re: SOLR: Searching on OpenNLP Field is Unstable

Posted by Lance Norskog <go...@gmail.com>.
Hi-

I'm the developer of this patch; it's not a product of the OpenNLP crew.
Please sign up for the SOLR JIRA and add this report to the LUCENE-2899 issue.

1)
The POS filters only add payloads to the search terms. Your query
ignores payloads, so I don't see the point of this definition. If you
add a FilterPayloadFilter at the bottom of the stack, you can limit
the query to the words that were found.
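Point 1 in miniature: a toy Python model (not Lucene code, and not the patch's actual FilterPayloadFilter) of a chain where a tagging stage only attaches payloads and a final stage filters on them. The payload value "person" is made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Token:
    term: str
    payload: Optional[str] = None  # e.g. a POS or NER label; None when untagged


def tag(tokens: Iterable[Token], labels: dict) -> Iterator[Token]:
    """Tagging stage: attaches a payload to known terms but never drops a token."""
    for tok in tokens:
        yield Token(tok.term, labels.get(tok.term, tok.payload))


def keep_payloads(tokens: Iterable[Token], allowed: set) -> Iterator[Token]:
    """Final stage: drops every token whose payload is not in `allowed`."""
    return (tok for tok in tokens if tok.payload in allowed)


query = [Token(t) for t in "a guy named Brett is here".split()]
tagged = list(tag(query, {"Brett": "person"}))  # "person" is a made-up label

print([t.term for t in tagged])                             # all six terms remain
print([t.term for t in keep_payloads(tagged, {"person"})])  # ['Brett']
```

Without the final stage, the tagging changes nothing about which terms are searched.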

2)
The POS algorithm is statistical: it is trained on both the words and
the pattern of surrounding words. A single word on its own may not trigger
it, whereas 'a guy named Brett is here' will find the word 'Brett'.
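A toy illustration of point 2, assuming a window of one token on each side and plain word features (OpenNLP's real feature generators are richer, and this is not its API): a one-word query such as Detail_Person:brett gives a model only boundary markers as context, while the sentence supplies 'named' and 'is'.

```python
def context_features(tokens, i, window=1):
    """Toy features for tokens[i]: the word plus its neighbours within `window`.

    Positions before the start or past the end are filled with boundary
    markers, so a one-word input carries no real context.
    """
    features = ["w=" + tokens[i].lower()]
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if j < 0:
            neighbour = "<s>"
        elif j >= len(tokens):
            neighbour = "</s>"
        else:
            neighbour = tokens[j].lower()
        features.append(f"w[{offset:+d}]={neighbour}")
    return features


print(context_features(["Brett"], 0))
# ['w=brett', 'w[-1]=<s>', 'w[+1]=</s>']
print(context_features("a guy named Brett is here".split(), 3))
# ['w=brett', 'w[-1]=named', 'w[+1]=is']
```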

3)
The POS models are trained on old data. I think the names and 
organizations models were trained on newspaper data from 20 years ago. 
The organizations filter will not find "Google".

Lance
