Posted to users@opennlp.apache.org by rashi gandhi <ga...@gmail.com> on 2013/09/26 19:01:11 UTC
SOLR: Searching on OpenNLP Field is Unstable
Hi,
I am working on integrating OpenNLP with Solr. I have successfully applied
the patch (LUCENE-2899-x.patch) to the latest Solr source code (branch_4x).
I have designed an OpenNLP analyzer and indexed data with it. The analyzer
declaration in schema.xml is:
<fieldType name="nlp_type" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Sequence of tokenizers and filters applied at index time -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- Sequence of tokenizers and filters applied at query time -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.OpenNLPFilterFactory" posTaggerModel="opennlp/en-pos-maxent.bin"/>
    <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-person.bin"/>
    <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-location.bin"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>
And the field declared with this analyzer:
<field name="Detail_Person" type="nlp_type" indexed="true" stored="true" omitNorms="true" omitPositions="true"/>
The problem: when I search the Detail_Person field, the results are not
consistent.
When I search Detail_Person:brett, it returns one document,
but when I run the same query again, it returns zero documents.
These are the logs:
97139 [http-bio-8080-exec-9] INFO org.apache.solr.analysis.OpenNLPFilterFactory – OpenNLPFilterFactory create
97139 [http-bio-8080-exec-9] INFO org.apache.solr.analysis.OpenNLPFilterFactory – OpenNLPFilterFactory create
97139 [http-bio-8080-exec-9] INFO org.apache.solr.analysis.OpenNLPFilterFactory – OpenNLPFilterFactory create
97154 [http-bio-8080-exec-9] INFO org.apache.solr.core.SolrCore – [collection1] webapp=/solr path=/select params={fl=score,*&indent=true&q=Detail_Person:rashi&wt=json} hits=1 status=0 QTime=15
97154 [http-bio-8080-exec-9] DEBUG org.apache.solr.servlet.SolrDispatchFilter – Closing out SolrRequest: {{params(fl=score,*&indent=true&q=Detail_Person:rashi&wt=json),defaults(df=text&echoParams=explicit&rows=10)}}
134874 [http-bio-8080-exec-3] INFO org.apache.solr.analysis.OpenNLPFilterFactory – OpenNLPFilterFactory create
134890 [http-bio-8080-exec-3] INFO org.apache.solr.analysis.OpenNLPFilterFactory – OpenNLPFilterFactory create
134890 [http-bio-8080-exec-3] INFO org.apache.solr.analysis.OpenNLPFilterFactory – OpenNLPFilterFactory create
134906 [http-bio-8080-exec-3] INFO org.apache.solr.core.SolrCore – [collection1] webapp=/solr path=/select params={fl=score,*&indent=true&q=Detail_Person:brett&wt=json} hits=2 status=0 QTime=32
134906 [http-bio-8080-exec-3] DEBUG org.apache.solr.servlet.SolrDispatchFilter – Closing out SolrRequest: {{params(fl=score,*&indent=true&q=Detail_Person:brett&wt=json),defaults(df=text&echoParams=explicit&rows=10)}}
147136 [http-bio-8080-exec-3] INFO org.apache.solr.core.SolrCore – [collection1] webapp=/solr path=/select params={fl=score,*&indent=true&q=Detail_Person:john&wt=json} hits=0 status=0 QTime=0
147136 [http-bio-8080-exec-3] DEBUG org.apache.solr.servlet.SolrDispatchFilter – Closing out SolrRequest: {{params(fl=score,*&indent=true&q=Detail_Person:john&wt=json),defaults(df=text&echoParams=explicit&rows=10)}}
302164 [http-bio-8080-exec-10] INFO org.apache.solr.analysis.OpenNLPFilterFactory – OpenNLPFilterFactory create
302164 [http-bio-8080-exec-10] INFO org.apache.solr.analysis.OpenNLPFilterFactory – OpenNLPFilterFactory create
302164 [http-bio-8080-exec-10] INFO org.apache.solr.analysis.OpenNLPFilterFactory – OpenNLPFilterFactory create
302164 [http-bio-8080-exec-10] INFO org.apache.solr.core.SolrCore – [collection1] webapp=/solr path=/select params={fl=score,*&indent=true&q=Detail_Person:john&wt=json} hits=1 status=0 QTime=15
302164 [http-bio-8080-exec-10] DEBUG org.apache.solr.servlet.SolrDispatchFilter – Closing out SolrRequest: {{params(fl=score,*&indent=true&q=Detail_Person:john&wt=json),defaults(df=text&echoParams=explicit&rows=10)}}
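Because the symptom is a hit count that changes for the same query, the SolrCore request lines can be checked mechanically. The following is a minimal Java sketch, assuming each log entry sits on a single line in the SolrCore format shown in these logs; the class name HitsByQuery is hypothetical and not part of Solr. It groups the hits= values by the q parameter and reports every query whose count differs between runs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: scans SolrCore request log lines (one entry per line assumed).
final class HitsByQuery {
    // Captures the q=... parameter value and the hits=N count from a request line.
    // DEBUG "Closing out SolrRequest" lines contain no hits= field, so they never match.
    private static final Pattern REQUEST =
        Pattern.compile("[{&]q=([^&}]*).*?\\bhits=(\\d+)");

    /** Groups the hit counts reported for each q value, in log order. */
    static Map<String, List<Integer>> collect(List<String> lines) {
        Map<String, List<Integer>> hits = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = REQUEST.matcher(line);
            if (m.find()) {
                hits.computeIfAbsent(m.group(1), k -> new ArrayList<>())
                    .add(Integer.parseInt(m.group(2)));
            }
        }
        return hits;
    }

    /** Returns the queries whose hit count was not the same on every run. */
    static List<String> unstable(Map<String, List<Integer>> hits) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : hits.entrySet()) {
            // More than one distinct count means the same query gave different results.
            if (new TreeSet<>(e.getValue()).size() > 1) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Fed the request lines above, this would group rashi and brett under a single count each and flag Detail_Person:john, which returned hits=0 and then hits=1.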
Searching on the OpenNLP field is not stable: sometimes it returns documents
and sometimes it does not, even though the documents are in the index.
Searching on non-OpenNLP fields works properly; the results are stable and
correct.
Please help me make the Solr results consistent.
Thanks in advance.
Re: SOLR: Searching on OpenNLP Field is Unstable
Posted by Lance Norskog <go...@gmail.com>.
Hi-
I'm the developer of this patch; it is not a product of the OpenNLP crew.
Please sign up for the Solr JIRA and add this report to the LUCENE-2899 issue.
1) The POS filters only add payloads to the search terms. Your query
ignores payloads, so I don't see the point of this definition. If you
add a FilterPayloadFilter to the bottom of the stack, you can limit
the query to the words that were found.
2) The POS algorithm is statistical: it is trained on both the words and
the pattern of surrounding words. A single word on its own may not trigger
it, whereas 'a guy named Brett is here' will find the word 'Brett'.
3) The POS models are trained on old data. I think the names and
organizations models were trained on newspaper data from 20 years ago,
so the organizations filter will not find "Google".
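Point 1 describes a filter at the end of the query chain that keeps only the words the tagger marked. The sketch below models that idea in plain Java, without Lucene: each token carries an optional payload, and only tokens whose payload is in a keep set survive. The Token class, the keepTagged method and the payload value "person" are hypothetical illustrations, not the API of the LUCENE-2899 patch or of FilterPayloadFilter; real Lucene payloads are byte arrays, and a String is used here only for readability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of payload-based filtering; not the LUCENE-2899 classes.
final class PayloadFilterSketch {
    /** A token leaving the analysis chain: term text plus an optional payload (tag). */
    static final class Token {
        final String term;
        final String payload; // null when no upstream filter tagged the token

        Token(String term, String payload) {
            this.term = term;
            this.payload = payload;
        }
    }

    /** Keeps only tokens whose payload is in the keep set; untagged tokens are dropped. */
    static List<String> keepTagged(List<Token> tokens, Set<String> keep) {
        List<String> kept = new ArrayList<>();
        for (Token t : tokens) {
            // Check for null first: immutable sets from Set.of reject contains(null).
            if (t.payload != null && keep.contains(t.payload)) {
                kept.add(t.term);
            }
        }
        return kept;
    }
}
```

Note that a token the tagger left untagged is dropped, so under this sketch a one-word query that never triggers the tagger would produce no terms at all.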
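Point 2 can be illustrated with a toy binary logistic scorer, in the spirit of the maximum-entropy models OpenNLP uses. The feature names and weights below are hand-picked, hypothetical values, not anything taken from the OpenNLP models, which learn their weights from training data. The sketch only shows how a context feature such as the previous word can push a token over the decision threshold when the word on its own does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy context-aware tagger; class name, features and weights are all hypothetical.
final class ContextTagger {
    // Hand-picked weights for illustration only; unknown features weigh 0.
    private static final Map<String, Double> WEIGHTS = Map.of(
        "bias", -2.0,
        "cap", 1.0,
        "prev=named", 2.5,
        "w=brett", 0.5);

    /** Probability that tokens[i] is a person name under a logistic model. */
    static double personProbability(String[] tokens, int i) {
        List<String> features = new ArrayList<>();
        features.add("bias");
        features.add("w=" + tokens[i].toLowerCase(Locale.ROOT));      // the word itself
        if (Character.isUpperCase(tokens[i].charAt(0))) {
            features.add("cap");                                       // capitalization
        }
        if (i > 0) {
            features.add("prev=" + tokens[i - 1].toLowerCase(Locale.ROOT)); // left context
        }
        double score = 0.0;
        for (String f : features) {
            score += WEIGHTS.getOrDefault(f, 0.0);
        }
        return 1.0 / (1.0 + Math.exp(-score)); // logistic (sigmoid) of the summed weights
    }

    static boolean isPerson(String[] tokens, int i) {
        return personProbability(tokens, i) >= 0.5;
    }
}
```

With these weights, Brett in "a guy named Brett is here" scores about 0.88, while the single-word queries brett and Brett score about 0.18 and 0.38, both below the 0.5 threshold.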
Lance