Posted to solr-user@lucene.apache.org by Alexey Ponomarenko <al...@gmail.com> on 2018/04/17 12:23:17 UTC

Solr OpenNLP named entity extraction

Hi, once more: I am trying to implement named entity extraction using this
manual:
https://lucene.apache.org/solr/7_3_0//solr-analysis-extras/org/apache/solr/update/processor/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.html

I modified solrconfig.xml like this:

 <updateRequestProcessorChain name="multiple-extract">
   <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
     <str name="modelFile">opennlp/en-ner-person.bin</str>
     <str name="analyzerFieldType">text_opennlp</str>
     <str name="source">description_en</str>
     <str name="dest">content</str>
   </processor>
 </updateRequestProcessorChain>

But when I tried to add data using:

*request:*

POST
http://localhost:8983/solr/numberplate/update?version=2.2&wt=xml&update.chain=multiple-extract

<add>
  <doc>
    <field name="description_en">This is Steve Jobs 2</field>
    <field name="content_pos">This is text 2</field>
    <field name="content">This is text for content 2</field>
  </doc>
</add>

*response*

<?xml version="1.0" encoding="UTF-8"?>
<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">3</int>
    </lst>
</response>

But I don't see any data inserted into the *content* field or any other field.

*If you need some additional data I can provide it.*

Can you help me? What have I done wrong?

Re: Solr OpenNLP named entity extraction

Posted by Jerome Yang <je...@pivotal.io>.
Thanks a lot Steve!

On Wed, Jul 11, 2018 at 10:24 AM Steve Rowe <sa...@gmail.com> wrote:

> Hi Jerome,
>
> I was able to set up a configset to perform OpenNLP NER, loading the model
> files from local storage.
>
> There is a trick, though[1]: the model files must be located *in a jar* or
> *in a subdirectory* under ${solr.solr.home}/lib/ or under a directory
> specified via a solrconfig.xml <lib> directive.
>
> I tested with the bin/solr cloud example, and put model files under the
> two Solr home directories, at example/cloud/node1/solr/lib/opennlp/ and
> example/cloud/node2/solr/lib/opennlp/.  The “opennlp/” subdirectory is
> required, though you can give it any name you choose.
>
> [1] As you noted, ZkSolrResourceLoader delegates to its parent classloader
> when it can’t find resources in a configset, and the parent classloader is
> set up to load from subdirectories and jar files under
> ${solr.solr.home}/lib/ or under a directory specified via a solrconfig.xml
> <lib> directive.  These directories themselves are not included in the set
> of directories from which resources are loaded; only their children are.
>
> --
> Steve
> www.lucidworks.com
>

-- 
 Pivotal Greenplum | Pivotal Software, Inc. <https://pivotal.io/>

Re: Solr OpenNLP named entity extraction

Posted by Steve Rowe <sa...@gmail.com>.
Hi Jerome,

I was able to set up a configset to perform OpenNLP NER, loading the model files from local storage.

There is a trick, though[1]: the model files must be located *in a jar* or *in a subdirectory* under ${solr.solr.home}/lib/ or under a directory specified via a solrconfig.xml <lib> directive.

I tested with the bin/solr cloud example, and put model files under the two Solr home directories, at example/cloud/node1/solr/lib/opennlp/ and example/cloud/node2/solr/lib/opennlp/.  The “opennlp/” subdirectory is required, though you can give it any name you choose.

[1] As you noted, ZkSolrResourceLoader delegates to its parent classloader when it can’t find resources in a configset, and the parent classloader is set up to load from subdirectories and jar files under ${solr.solr.home}/lib/ or under a directory specified via a solrconfig.xml <lib> directive.  These directories themselves are not included in the set of directories from which resources are loaded; only their children are.
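
A minimal sketch of the <lib> route (an editor's illustration, not part of Steve's message): the directive names a parent directory, and the model files go one level below it, e.g. in /opt/solr-models/opennlp/. The /opt/solr-models/ path is hypothetical, and, as with ${solr.solr.home}/lib/, every node would need its own local copy of the files.

```xml
<config>
  <!-- Hypothetical parent directory. Per the explanation above, only its
       children (subdirectories such as opennlp/, and jars) are searched for
       resources; files placed directly in /opt/solr-models/ would not be found. -->
  <lib dir="/opt/solr-models/" />

  <!-- ... rest of solrconfig.xml ... -->
</config>
```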

--
Steve
www.lucidworks.com

> On Jul 9, 2018, at 10:10 PM, Jerome Yang <je...@pivotal.io> wrote:
> 
> Hi Steve,
> 
> Putting the models under ${solr.solr.home}/lib/ is not working.
> I checked ZkSolrResourceLoader; it seems to first try to find the models in
> the configset.
> If it can't find them there, it uses the class loader to load them from resources.
> 
> Regards,
> Jerome
> 


Re: Solr OpenNLP named entity extraction

Posted by Jerome Yang <je...@pivotal.io>.
Hi Steve,

Putting the models under ${solr.solr.home}/lib/ is not working.
I checked ZkSolrResourceLoader; it seems to first try to find the models in
the configset.
If it can't find them there, it uses the class loader to load them from resources.

Regards,
Jerome

On Tue, Jul 10, 2018 at 9:58 AM Jerome Yang <je...@pivotal.io> wrote:

> Thanks Steve!
>


Re: Solr OpenNLP named entity extraction

Posted by Jerome Yang <je...@pivotal.io>.
Thanks Steve!


On Tue, Jul 10, 2018 at 5:20 AM Steve Rowe <sa...@gmail.com> wrote:

> Hi Jerome,
>
> See the ref guide[1] for a writeup of how to enable uploading files larger
> than 1MB into ZooKeeper.
>
> Local storage should also work. Have you tried placing OpenNLP model
> files in ${solr.solr.home}/lib/? Make sure you do the same on each node.
>
> [1]
> https://lucene.apache.org/solr/guide/7_4/setting-up-an-external-zookeeper-ensemble.html#increasing-the-file-size-limit
>
> --
> Steve
> www.lucidworks.com
>


Re: Solr OpenNLP named entity extraction

Posted by Steve Rowe <sa...@gmail.com>.
Hi Jerome,

See the ref guide[1] for a writeup of how to enable uploading files larger than 1MB into ZooKeeper.

Local storage should also work. Have you tried placing OpenNLP model files in ${solr.solr.home}/lib/? Make sure you do the same on each node.

[1] https://lucene.apache.org/solr/guide/7_4/setting-up-an-external-zookeeper-ensemble.html#increasing-the-file-size-limit

--
Steve
www.lucidworks.com

> On Jul 9, 2018, at 12:50 AM, Jerome Yang <je...@pivotal.io> wrote:
> 
> Hi guys,
> 
> In SolrCloud mode, where should the OpenNLP models be placed?
> Should they be uploaded to ZooKeeper?
> In my tests on Solr 7.3.1, an absolute path on the local host does not seem to work.
> And I cannot upload a model into ZooKeeper if it is larger than 1MB.
> 
> Regards,
> Jerome
> 


Re: Solr OpenNLP named entity extraction

Posted by Jerome Yang <je...@pivotal.io>.
Hi guys,

In SolrCloud mode, where should the OpenNLP models be placed?
Should they be uploaded to ZooKeeper?
In my tests on Solr 7.3.1, an absolute path on the local host does not seem to work.
And I cannot upload a model into ZooKeeper if it is larger than 1MB.

Regards,
Jerome

On Wed, Apr 18, 2018 at 9:54 AM Steve Rowe <sa...@gmail.com> wrote:

> Hi Alexey,
>
> First, thanks for moving the conversation to the mailing list.  Discussion
> of usage problems should take place here rather than in JIRA.
>
> I locally set up Solr 7.3 similarly to you and was able to get things to
> work.
>
> Problems with your setup:
>
> 1. Your update chain is missing the Log and Run update processors at the
> end (I see these are missing from the example in the javadocs for the
> OpenNLP NER update processor; I’ll fix that):
>
>      <processor class="solr.LogUpdateProcessorFactory" />
>      <processor class="solr.RunUpdateProcessorFactory" />
>
>    The Log update processor isn’t strictly necessary, but, from
>    <https://lucene.apache.org/solr/guide/7_3/update-request-processors.html#custom-update-request-processor-chain>:
>
>        Do not forget to add RunUpdateProcessorFactory at the end of any
>        chains you define in solrconfig.xml. Otherwise update requests
>        processed by that chain will not actually affect the indexed data.
>
> 2. Your example document is missing an “id” field.
>
> 3. For whatever reason, the pre-trained model "en-ner-person.bin" doesn’t
> extract anything from the text “This is Steve Jobs 2”.  It will, however,
> extract “Steve Jobs” from, e.g., the text “This is Steve Jobs in white”.
>
> 4. (Not a problem necessarily) You may want to use a multi-valued “string”
> field for the “dest” field in your update chain, e.g. “people_str” (“*_str”
> in the default configset is so configured).
>
> --
> Steve
> www.lucidworks.com
>


Re: Solr OpenNLP named entity extraction

Posted by Steve Rowe <sa...@gmail.com>.
Hi Alexey,

First, thanks for moving the conversation to the mailing list.  Discussion of usage problems should take place here rather than in JIRA.

I set up Solr 7.3 locally with a configuration similar to yours and was able to get things working.

Problems with your setup:

1. Your update chain is missing the Log and Run update processors at the end (I see these are missing from the example in the javadocs for the OpenNLP NER update processor; I’ll fix that):

     <processor class="solr.LogUpdateProcessorFactory" />
     <processor class="solr.RunUpdateProcessorFactory" />

   The Log update processor isn’t strictly necessary, but the Run update processor is. From <https://lucene.apache.org/solr/guide/7_3/update-request-processors.html#custom-update-request-processor-chain>:

       Do not forget to add RunUpdateProcessorFactory at the end of any
       chains you define in solrconfig.xml. Otherwise update requests
       processed by that chain will not actually affect the indexed data.

2. Your example document is missing an “id” field.

3. For whatever reason, the pre-trained model “en-ner-person.bin” doesn’t extract anything from the text “This is Steve Jobs 2”.  It does, however, extract “Steve Jobs” from, e.g., “This is Steve Jobs in white”.

4. (Not necessarily a problem) You may want to use a multi-valued “string” field as the “dest” field in your update chain, e.g. “people_str” (the “*_str” dynamic field in the default configset is configured this way).
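With points 1 and 4 applied, the chain might look like the minimal sketch below. The model file, analyzer field type and source field are unchanged from the original post; using `people_str` as the destination assumes the schema has a multi-valued `*_str` dynamic field, as the default configset does.

```xml
<updateRequestProcessorChain name="multiple-extract">
  <!-- Extract person names from description_en and copy them into people_str -->
  <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
    <str name="modelFile">opennlp/en-ner-person.bin</str>
    <str name="analyzerFieldType">text_opennlp</str>
    <str name="source">description_en</str>
    <!-- Assumption: people_str matches a multi-valued *_str dynamic field -->
    <str name="dest">people_str</str>
  </processor>
  <!-- Optional: logs each update request -->
  <processor class="solr.LogUpdateProcessorFactory" />
  <!-- Required: without it, updates processed by this chain never reach the index -->
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```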

--
Steve
www.lucidworks.com

> On Apr 17, 2018, at 8:23 AM, Alexey Ponomarenko <al...@gmail.com> wrote:


Re: Solr OpenNLP named entity extraction

Posted by David Hastings <ha...@gmail.com>.
Did you send a commit after you sent the document?
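A minimal sketch of a request that asks Solr to commit as part of the update, by appending the standard `commit=true` parameter to the URL from the original post. The document also folds in points 2 and 3 from Steve's reply: it has an `id` (assumed to be the uniqueKey, as in the default configset; the value `doc-2` is a hypothetical placeholder) and uses text the person model was reported to extract a name from.

```xml
<!-- POST to http://localhost:8983/solr/numberplate/update?version=2.2&wt=xml&update.chain=multiple-extract&commit=true -->
<add>
  <doc>
    <!-- Assumed uniqueKey field; "doc-2" is a hypothetical value -->
    <field name="id">doc-2</field>
    <!-- en-ner-person.bin reportedly extracts "Steve Jobs" from this text -->
    <field name="description_en">This is Steve Jobs in white</field>
    <field name="content_pos">This is text 2</field>
    <field name="content">This is text for content 2</field>
  </doc>
</add>
```

A commit only helps once the `multiple-extract` chain ends with `solr.RunUpdateProcessorFactory` (point 1 in Steve's reply); without it, the document never reaches the index at all.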

On Tue, Apr 17, 2018 at 8:23 AM, Alexey Ponomarenko <al...@gmail.com>
wrote:
