Posted to solr-user@lucene.apache.org by Lance Norskog <go...@gmail.com> on 2013/05/31 00:47:32 UTC

Re: OPENNLP problems

I will look at these problems. Thanks for trying it out!

Lance Norskog

On 05/28/2013 10:08 PM, Patrick Mi wrote:
> Hi there,
>
> I checked out branch_4x and applied the latest patch,
> LUCENE-2899-current.patch, but I ran into two problems.
>
> First problem: I followed the wiki page instructions and set up a field with
> this type, aiming to keep only nouns and verbs and to facet on the field.
> ==
> <fieldType name="text_opennlp_nvf" class="solr.TextField"
> positionIncrementGap="100">
>        <analyzer>
>          <tokenizer class="solr.OpenNLPTokenizerFactory"
> tokenizerModel="opennlp/en-token.bin"/>
>          <filter class="solr.OpenNLPFilterFactory"
> posTaggerModel="opennlp/en-pos-maxent.bin"/>
>          <filter class="solr.FilterPayloadsFilterFactory"
> payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/>
>          <filter class="solr.StripPayloadsFilterFactory"/>
>        </analyzer>
>      </fieldType>
> ==
>
> I struggled to get that working until I added the extra parameter
> keepPayloads="true", as below.
>       <filter class="solr.FilterPayloadsFilterFactory" keepPayloads="true"
> payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/>
>
> Question: am I doing the right thing? Is this a mistake on the wiki?
>
> Second problem:
>
> I posted the document XML files one by one to Solr and the result was what I
> expected.
>
> <add>
> <doc>
>    <field name="id">1</field>
>    <field name="text_opennlp_nvf">check in the hotel</field></doc>
> </add>
>
> However, if I put multiple documents into the same XML file and post it in
> one go, only the first document gets processed (only 'check' and 'hotel' were
> showing in the facet result).
>   
> <add>
> <doc>
>    <field name="id">1</field>
>    <field name="text_opennlp_nvf">check in the hotel</field>
> </doc>
> <doc>
>    <field name="id">2</field>
>    <field name="text_opennlp_nvf">removes the payloads</field>
> </doc>
> <doc>
>    <field name="id">3</field>
>    <field name="text_opennlp_nvf">retains only nouns and verbs </field>
> </doc>
> </add>
>
> The same problem occurs when I update the data using CSV upload.
>
> Is that a bug, or did I do something wrong?
>
> Thanks in advance!
>
> Regards,
> Patrick
>
>