
Posted to dev@stanbol.apache.org by Dileepa Jayakody <di...@gmail.com> on 2013/11/27 11:12:22 UTC

How to improve NER results in Stanbol

[Typo corrected in the subject of the mail]
---------- Forwarded message ----------
From: Dileepa Jayakody <di...@gmail.com>
Date: Wed, Nov 27, 2013 at 3:40 PM
Subject: How to refinin NER results in Stanbol
To: Stanbol Dev List <de...@stanbol.apache.org>

Hi All,

I have been running some load tests on Stanbol entity recognition with a
high volume of content extracted from web articles and stored in a Solr index.

My objective is to achieve efficient and accurate enhancement results for
the submitted content.

But I think some of the NER results obtained are not accurate.

For example, I submit the content:
Group Finance Director Chris Lucas and Group General Counsel Mark Harding
to retire from Barclays

I get the following entity recognition results from the default enhancement chain:

People : Chris Lucas, Mark Harding
Organization: Barclays, *BT Group*, *Finance Director Chris Lucas and Group
General Counsel*

The highlighted organization results above are inaccurate.
BT Group is not mentioned in the content, and *Finance Director Chris Lucas
and Group General Counsel* is not an organization but simply a phrase.
Further, if I add a full stop (.) to the end of the sentence, "Barclays" is
not recognized as an organization.
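One cheap post-processing check for the "BT Group" case is to flag any reported organization whose text never occurs in the submitted content. The sketch below assumes the organization labels have already been pulled out of the enhancement results as plain strings; the function name `unsupported_mentions` is hypothetical and not part of Stanbol.

```python
def unsupported_mentions(content: str, mentions: list[str]) -> list[str]:
    """Return the mentions whose text does not occur anywhere in `content`.

    Matching is case-sensitive; runs of whitespace are collapsed on both
    sides so that line breaks in the submitted text do not cause false alarms.
    """
    normalised = " ".join(content.split())
    flagged = []
    for mention in mentions:
        if " ".join(mention.split()) not in normalised:
            flagged.append(mention)
    return flagged


content = ("Group Finance Director Chris Lucas and Group General Counsel "
           "Mark Harding to retire from Barclays")
organizations = ["Barclays", "BT Group",
                 "Finance Director Chris Lucas and Group General Counsel"]
print(unsupported_mentions(content, organizations))  # ['BT Group']
```

This only catches mentions that are absent from the text. The *Finance Director Chris Lucas and Group General Counsel* span does occur verbatim, so it passes this check and needs a different remedy.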

I think we need to improve these results in Stanbol NER. Can we tweak the
OpenNLP NER component for this?

Any ideas or pointers on how to refine these enhancement results would be
immensely helpful.
I'm looking for a way to improve the accuracy of the results as much as
possible.

Thanks,
Dileepa

Re: How to improve NER results in Stanbol

Posted by Rafa Haro <rh...@apache.org>.

Hi Dileepa,

Indeed, using the OpenNLP NER engine I also get a
TextAnnotation with the selected-text "Executive Committee of Barclays",
although its confidence is low (0.40117949264383646), a feature
that you might want to take into account. In any case, the only way to
improve OpenNLP NER is to build better NER models, and for that
you need to provide better and/or bigger training data. If you check the
OpenNLP documentation, you will find that the English NER model has
been trained on freely available, manually annotated corpora, mainly from
news articles. Consequently, the OpenNLP name finder is expected to work
better with news documents, especially those using linguistic
structures similar to the ones in the training data. And, like any
other statistical tool, it is never going to be perfect.
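The low confidence value mentioned above can be used as a simple threshold on the results. A minimal sketch, assuming each TextAnnotation's selected text, type and confidence have already been read out of the enhancement results; the `TextAnnotation` class here is a simplified, hypothetical stand-in, and both the 0.5 threshold and the 0.9 confidence on the second annotation are illustrative values, not real engine output.

```python
from dataclasses import dataclass


@dataclass
class TextAnnotation:
    """Simplified, hypothetical stand-in for a Stanbol TextAnnotation."""
    selected_text: str
    entity_type: str
    confidence: float


def filter_by_confidence(annotations, threshold=0.5):
    """Keep only the annotations whose confidence is at least `threshold`."""
    return [a for a in annotations if a.confidence >= threshold]


annotations = [
    # Confidence reported for this span in the reply above.
    TextAnnotation("Executive Committee of Barclays", "organization",
                   0.40117949264383646),
    # Illustrative value only, not real engine output.
    TextAnnotation("Antony Jenkins", "person", 0.9),
]
kept = filter_by_confidence(annotations, threshold=0.5)
print([a.selected_text for a in kept])  # ['Antony Jenkins']
```

The threshold is a trade-off: raising it removes spurious spans like this one, but it also drops correct low-confidence entities, so it is worth tuning on a sample of your own content.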

Improving named entity recognition is a hard task. The ideal solution is
to provide training data for your domain. You can also try some
workarounds, for example mixing/merging the results of different NER
engines (OpenNLP, Stanford, Freeling...) or, at the Entity Linking level,
combining the results of the Named Entity Linking and Keyword Linking engines.
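One way to read the "mixing/merging" suggestion is a voting scheme over the spans the engines return. The sketch below illustrates that idea only; it is not something Stanbol provides. It assumes each engine's results have already been converted to (start, end, type, confidence) tuples over the same text, and the engine outputs in the example are made up.

```python
from collections import defaultdict


def merge_ner_results(results_per_engine, min_votes=2):
    """Merge NER outputs from several engines by simple voting.

    Each engine's output is a list of (start, end, entity_type, confidence)
    tuples using character offsets into the same text. A span is kept when
    at least `min_votes` engines report exactly the same offsets and type;
    the merged confidence is the mean over the agreeing engines.
    """
    votes = defaultdict(list)
    for engine_output in results_per_engine:
        # Count each (start, end, type) at most once per engine,
        # keeping that engine's highest confidence for it.
        best = {}
        for start, end, entity_type, confidence in engine_output:
            key = (start, end, entity_type)
            best[key] = max(confidence, best.get(key, 0.0))
        for key, confidence in best.items():
            votes[key].append(confidence)

    merged = [
        (start, end, entity_type, sum(confs) / len(confs))
        for (start, end, entity_type), confs in votes.items()
        if len(confs) >= min_votes
    ]
    return sorted(merged)


# "Barclays has appointed Shaygan Kheradpir ..." with hypothetical engine output.
opennlp = [(0, 8, "organization", 0.5), (23, 40, "person", 0.8)]
stanford = [(0, 8, "organization", 0.75)]
print(merge_ner_results([opennlp, stanford], min_votes=2))
# [(0, 8, 'organization', 0.625)]
```

Exact-offset matching is deliberately strict. A practical merge would probably also need to reconcile overlapping spans (such as "Barclays" versus "Executive Committee of Barclays") and map each engine's type labels onto a common set.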

Cheers,
Rafa

On 27/11/13 11:28, Dileepa Jayakody wrote:
> Hi Rafa,
>
> I'm using the default chain:
> tika
> langdetect
> opennlp-sentence
> opennlp-token
> opennlp-pos
> opennlp-ner
> dbpediaLinking
> entityhubExtraction
>
> Thanks,
> Dileepa
>
>
> On Wed, Nov 27, 2013 at 3:54 PM, Rafa Haro <rh...@apache.org> wrote:
>
>> Hi Dileepa,
>>
>> Are you using only OpenNLP NER engine or are you also including an Entity
>> Linking engine?
>>
>>
>> On 27/11/13 11:17, Dileepa Jayakody wrote:
>>
>>> Content:
>>> Barclays has appointed Shaygan Kheradpir to the role of Chief Operations
>>> and Technology Officer. He will join the Executive Committee of Barclays
>>> and report directly to Group Chief Executive Antony Jenkins.
>>>
>>> The above content doesn't identify *Barclays* as an organization, but
>>> identifies *Executive Committee of Barclays* as an organization.
>>>
>>>
>>> How can we improve the accuracy of these results?
>>>
>>> Thanks,
>>> Dileepa
>>>

Re: How to improve NER results in Stanbol

Posted by Dileepa Jayakody <di...@gmail.com>.

Hi Cristian,

Thanks a lot. Yes, the server is up and running, and I can access the server's
main page in the browser too.
I was expecting a log entry indicating that startup had finished... sorry for the
false alarm, guys.

BTW, I think it would be good to write a log entry when server startup
completes successfully.

Thanks,
Dileepa


On Fri, Nov 29, 2013 at 8:46 PM, Cristian Petroaca <
cristian.petroaca@gmail.com> wrote:

> Hi Dileepa,
>
> I've played with the Stanbol Stanford NLP project for a little while. From
> what I saw, that is normal output; it means the server is up and running.
>
> If you issue a command like
>
>   curl -X POST -H "Content-Type: text/plain" -H "Content-Language: en" --data "[YOUR TEXT]" http://localhost:[PORT]/analysis
>
> replacing [YOUR TEXT] and [PORT] accordingly, you should see it work and
> return JSON output.
>
> Regards,
> Cristian

Re: How to improve NER results in Stanbol

Posted by Cristian Petroaca <cr...@gmail.com>.

Hi Dileepa,

I've played with the Stanbol Stanford NLP project for a little while. From
what I saw, that is normal output; it means the server is up and running.

If you issue a command like

  curl -X POST -H "Content-Type: text/plain" -H "Content-Language: en" --data "[YOUR TEXT]" http://localhost:[PORT]/analysis

replacing [YOUR TEXT] and [PORT] accordingly, you should see it work and
return JSON output.
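For scripted tests (for example the load tests mentioned at the start of the thread), the same request can be sent from Python using only the standard library. A minimal sketch: the `/analysis` path and the two headers come from the curl command above, the port is whatever your server actually listens on, and the function names are hypothetical.

```python
import urllib.request


def build_analysis_request(text: str, port: int, language: str = "en"):
    """Build a POST request mirroring the curl command: a plain-text body
    with Content-Type and Content-Language headers, sent to /analysis."""
    return urllib.request.Request(
        f"http://localhost:{port}/analysis",
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain",
            "Content-Language": language,
        },
        method="POST",
    )


def analyze(text: str, port: int, language: str = "en") -> str:
    """Send `text` to the analysis server and return the raw response body."""
    request = build_analysis_request(text, port, language)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Building the request separately from sending it keeps the request easy to inspect without a running server. Since the server is described as answering with JSON, the string returned by `analyze` can be passed to `json.loads`.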

Regards,
Cristian


2013/11/29 Dileepa Jayakody <di...@gmail.com>

> Hi Rupert,
>
> Thanks again for your suggestions.
> I cloned and built the stanbol-stanfordnlp project above and executed the
> run command [1] below in a separate directory. But the server startup
> doesn't complete; it hangs at the log entry "Reading TokensRegex rules from
> edu/stanford/nlp/models/sutime/english.holidays.sutime.txt".
>
> Any ideas? Can I edit the configuration to skip the above TokensRegex rules
> and start the server?
>
> Thanks,
> Dileepa
>
> [1]
> dileepa@dileepa-laptop2:~/apache/stanfordNLP_stanbol/server$ java -Xmx1g -jar at.salzburgresearch.stanbol.stanbol.enhancer.nlp.stanford.server-1.0.0-SNAPSHOT-jar-with-dependencies.jar
> Loading default properties from tagger edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger
> Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [2.2 sec].
> Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [6.1 sec].
> Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [4.3 sec].
> Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [3.9 sec].
> Initialization JollyDayHoliday for sutime
> Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
> Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
> Nov 29, 2013 7:14:24 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
> INFO: Ignoring inactive rule: temporal-composite-8:ranges
> Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt

Re: How to improve NER results in Stanbol

Posted by Dileepa Jayakody <di...@gmail.com>.

Hi Rupert,

Thanks again for your suggestions.
I cloned and built the stanbol-stanfordnlp project above and executed the
run command [1] below in a separate directory. But the server startup
doesn't complete; it hangs at the log entry "Reading TokensRegex rules from
edu/stanford/nlp/models/sutime/english.holidays.sutime.txt".

Any ideas? Can I edit the configuration to skip the above TokensRegex rules
and start the server?

Thanks,
Dileepa

[1]
dileepa@dileepa-laptop2:~/apache/stanfordNLP_stanbol/server$ java -Xmx1g -jar at.salzburgresearch.stanbol.stanbol.enhancer.nlp.stanford.server-1.0.0-SNAPSHOT-jar-with-dependencies.jar
Loading default properties from tagger edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [2.2 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [6.1 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [4.3 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [3.9 sec].
Initialization JollyDayHoliday for sutime
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
Nov 29, 2013 7:14:24 PM edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor appendRules
INFO: Ignoring inactive rule: temporal-composite-8:ranges
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt



On Fri, Nov 29, 2013 at 11:48 AM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi Dileepa
>
> If you need to detect entities that are not part of the controlled
> vocabularies, then there is no way around NER. If you want good
> results, there is no way around building your own models based
> on a custom training set.
>
> If you need to detect persons, organizations and places, you might have
> a look at Stanford NLP with the Stanbol integration [1], as the
> models provided by Stanford NLP are much better than those of
> OpenNLP.
>
> best
> Rupert
>
>
> [1] https://github.com/westei/stanbol-stanfordnlp
>
> On Thu, Nov 28, 2013 at 6:57 AM, Dileepa Jayakody
> <di...@gmail.com> wrote:
> > Hi Rafa, Rupert,
> >
> > Thanks a lot for your input. I will look at the options you have suggested.
> > However, in the first phase of my project I don't require entity linking
> > from the entity-hub, because many of the entities mentioned in the content I
> > submit will not be available in DBpedia. Therefore I currently also don't
> > need the dbpediaLinking and entityhubExtraction engines in the default chain
> > I'm using. I will look at implementing a custom vocabulary in the second
> > phase of the project for entity linking and disambiguation purposes.
> >
> > At the moment, I am focusing on improving the accuracy of
> > named entity recognition using NLP techniques, so I think opennlp-chunker
> > based improvements will be very helpful at this point.
> >
> > Do you think the accuracy of NER will improve if I also use entity
> > linking with dbpedia / dbpedia-fst-linking?
> >
> > Thanks,
> > Dileepa
> >
> >
> > On Wed, Nov 27, 2013 at 7:54 PM, Rupert Westenthaler <
> > rupert.westenthaler@gmail.com> wrote:
> >
> >> Hi Dileepa,
> >>
> >> I would suggest you also test with a chain that uses Entity Linking
> >> instead of Named Entity Linking. Have you tried the
> >> "dbpedia-fst-linking" chain? This one is also configured in the
> >> default launcher. Please also have a look at STANBOL-1211 [1] that
> >> brought a lot of improvements for EntityLinking if you include a
> >> chunker (e.g. the opennlp-chunker) in your chain.
> >>
> >> best
> >> Rupert
> >>
> >>
> >> [1] https://issues.apache.org/jira/browse/STANBOL-1211
> >>
> >> --
> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> | A-5500 Bischofshofen
> >>
>
>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>

Re: How to improve NER results in Stanbol

Posted by Rupert Westenthaler <ru...@gmail.com>.

Hi Dileepa

If you need to detect entities that are not part of the controlled
vocabularies, then there is no way around NER. If you want good
results, there is also no way around building your own models based
on a custom training set.
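To make the "custom training set" concrete: OpenNLP's name finder is trained from plain text with one whitespace-tokenized sentence per line, where each entity is wrapped as <START:type> ... <END>. The helper below is a hypothetical sketch (not part of Stanbol or OpenNLP) that renders hand-labelled token spans in that format. The type names "person" and "organization" are assumptions and need to match the types the NER engine is configured for.

```python
from typing import List, Tuple

# (first_token, end_token_exclusive, entity_type); token indices, not characters
Span = Tuple[int, int, str]


def to_opennlp_training_line(tokens: List[str], spans: List[Span]) -> str:
    """Render one tokenized sentence in the OpenNLP name finder training
    format, where each entity is wrapped as <START:type> ... <END>."""
    ordered = sorted(spans)
    for start, end, _ in ordered:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"invalid span ({start}, {end})")
    # the format cannot express nested or overlapping entities
    for (_, prev_end, _), (next_start, _, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            raise ValueError("overlapping spans are not supported")

    starts = {start: etype for start, _, etype in ordered}
    ends = {end for _, end, _ in ordered}
    out: List[str] = []
    for i, token in enumerate(tokens):
        if i in ends:            # close the entity that ended before this token
            out.append("<END>")
        if i in starts:          # open a new entity at this token
            out.append(f"<START:{starts[i]}>")
        out.append(token)
    if len(tokens) in ends:      # entity running to the end of the sentence
        out.append("<END>")
    return " ".join(out)


tokens = ("Group Finance Director Chris Lucas and Group General Counsel "
          "Mark Harding to retire from Barclays .").split()
spans = [(3, 5, "person"), (9, 11, "person"), (14, 15, "organization")]
print(to_opennlp_training_line(tokens, spans))
```

The full stop is kept as a separate token on purpose: training sentences should be tokenized the same way the opennlp-token engine tokenizes text at runtime. A file of such lines can then be fed to OpenNLP's TokenNameFinderTrainer; the OpenNLP manual documents its parameters.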

If you need to detect Persons, Organizations and Places, you might have
a look at Stanford NLP with the Stanbol integration [1], as the model
provided by Stanford NLP is much better than the one provided by
OpenNLP.

best
Rupert


[1] https://github.com/westei/stanbol-stanfordnlp

On Thu, Nov 28, 2013 at 6:57 AM, Dileepa Jayakody
<di...@gmail.com> wrote:



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: How to improve NER results in Stanbol

Posted by Dileepa Jayakody <di...@gmail.com>.

Hi Rafa, Rupert,

Thanks a lot for your input. I will look at the options you have suggested.
However, in the first phase of my project I don't require entity linking
against the Entityhub, because many of the entities mentioned in the content
I submit will not be available in DBpedia. Therefore, I currently don't
need the dbpediaLinking and entityhubExtraction engines of the default
chain I'm using. I will look at implementing a custom vocabulary for entity
linking and disambiguation in the second phase of the project.
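The idea behind linking against a custom vocabulary can be sketched as a lookup of token sequences in a table of known labels. The function below is a hypothetical, greedy longest-match sketch; it is not how Stanbol's EntityLinking engine is implemented (which, as discussed in this thread, can also take chunks into account). The vocabulary and its identifiers are made up for illustration.

```python
from typing import Dict, List, Tuple


def link_with_vocabulary(tokens: List[str], vocabulary: Dict[str, str],
                         max_len: int = 5) -> List[Tuple[int, int, str]]:
    """Greedy longest-match lookup of token n-grams in a controlled vocabulary.

    vocabulary maps a lower-cased label to an entity identifier; the result is
    a list of (start_token, end_token_exclusive, entity_id) matches.
    """
    matches: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        match = None
        # try the longest candidate starting at token i first
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = " ".join(tokens[i:i + n]).lower()
            if label in vocabulary:
                match = (i, i + n, vocabulary[label])
                break
        if match:
            matches.append(match)
            i = match[1]  # continue after the matched label
        else:
            i += 1
    return matches


# Hypothetical vocabulary with made-up identifiers.
vocab = {
    "barclays": "urn:example:org:barclays",
    "antony jenkins": "urn:example:person:antony-jenkins",
}
tokens = ("He will join the Executive Committee of Barclays and report "
          "directly to Group Chief Executive Antony Jenkins .").split()
print(link_with_vocabulary(tokens, vocab))
```

"Barclays" is found regardless of the surrounding "Executive Committee of", which is the case the statistical NER model got wrong. The trade-off is that only entities present in the vocabulary can be found.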

At the moment, I am focusing on improving the accuracy of named entity
recognition using NLP techniques, so I think opennlp-chunker-based
improvements will be very helpful at this point.
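Why a chunker helps can be shown with a deliberately simplified sketch. It assumes hand-assigned Penn Treebank POS tags (not actual opennlp-pos output) and treats every maximal run of noun-like tags as a noun phrase, which is far cruder than OpenNLP's statistical chunker. A mention is then kept only if it lies inside a single chunk; this filter is a hypothetical post-processing step, not an existing Stanbol feature.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start_token, end_token_exclusive)

# Simplification: a noun phrase is any maximal run of these Penn Treebank tags.
NP_TAGS = {"DT", "JJ", "NN", "NNS", "NNP", "NNPS"}


def noun_phrase_chunks(tags: List[str]) -> List[Span]:
    """Return maximal runs of noun-phrase tags as token spans."""
    chunks: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag in NP_TAGS and start is None:
            start = i                    # a chunk begins
        elif tag not in NP_TAGS and start is not None:
            chunks.append((start, i))    # a chunk ends before token i
            start = None
    if start is not None:
        chunks.append((start, len(tags)))
    return chunks


def within_single_chunk(mention: Span, chunks: List[Span]) -> bool:
    """Keep a mention only if it does not cross a chunk boundary."""
    start, end = mention
    return any(c_start <= start and end <= c_end for c_start, c_end in chunks)


# Tokens of the example sentence with hand-assigned (assumed) POS tags.
tokens = ("Group Finance Director Chris Lucas and Group General Counsel "
          "Mark Harding to retire from Barclays .").split()
tags = ["NNP"] * 5 + ["CC"] + ["NNP"] * 5 + ["TO", "VB", "IN", "NNP", "."]
chunks = noun_phrase_chunks(tags)

mentions = {"Chris Lucas": (3, 5), "Mark Harding": (9, 11), "Barclays": (14, 15),
            "Finance Director Chris Lucas and Group General Counsel": (1, 9)}
for text, span in mentions.items():
    print(text, "->", "keep" if within_single_chunk(span, chunks) else "drop")
```

The coordinating conjunction "and" ends a chunk, so the spurious organization "Finance Director Chris Lucas and Group General Counsel" is dropped while both persons and "Barclays" are kept. A filter like this cannot catch "BT Group", which does not occur in the text at all.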

Do you think the accuracy of NER would improve if I also used entity
linking against DBpedia, e.g. with the dbpedia-fst-linking chain?

Thanks,
Dileepa


On Wed, Nov 27, 2013 at 7:54 PM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:


Re: How to improve NER results in Stanbol

Posted by Rupert Westenthaler <ru...@gmail.com>.

Hi Dileepa,

I would suggest you also test with a chain that uses Entity Linking
instead of Named Entity Linking. Have you tried the
"dbpedia-fst-linking" chain? It is also configured in the
default launcher. Please also have a look at STANBOL-1211 [1], which
brought a lot of improvements to EntityLinking when a chunker
(e.g. the opennlp-chunker) is included in the chain.
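Comparing chains is easiest when the same text is sent to each of them. The sketch below assumes a Stanbol launcher on http://localhost:8080 and the enhancer REST endpoint /enhancer/chain/{chain-name}; it only builds the POST requests with Python's standard urllib, so it runs without a server. The Accept value text/turtle is an assumption and can be swapped for any RDF serialization the server supports.

```python
import urllib.parse
import urllib.request

# Assumption: a Stanbol launcher running locally on its default port.
STANBOL_BASE = "http://localhost:8080"


def build_enhancement_request(text: str, chain: str = "default",
                              base: str = STANBOL_BASE,
                              accept: str = "text/turtle") -> urllib.request.Request:
    """Build (but do not send) a POST of plain text to the enhancer
    endpoint of the named enhancement chain."""
    url = f"{base.rstrip('/')}/enhancer/chain/{urllib.parse.quote(chain)}"
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain; charset=UTF-8", "Accept": accept},
    )


text = ("Barclays has appointed Shaygan Kheradpir to the role of Chief "
        "Operations and Technology Officer.")
requests_by_chain = {chain: build_enhancement_request(text, chain)
                     for chain in ("default", "dbpedia-fst-linking")}

# To run the comparison against a live server:
# for chain, req in requests_by_chain.items():
#     with urllib.request.urlopen(req) as resp:
#         print(chain, resp.read().decode("utf-8")[:500])
```

Keeping everything except the chain name fixed means any difference in the returned annotations comes from the chain configuration.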

best
Rupert


[1] https://issues.apache.org/jira/browse/STANBOL-1211

On Wed, Nov 27, 2013 at 11:28 AM, Dileepa Jayakody
<di...@gmail.com> wrote:




Re: How to improve NER results in Stanbol

Posted by Dileepa Jayakody <di...@gmail.com>.

Hi Rafa,

I'm using the default chain:
tika
langdetect
opennlp-sentence
opennlp-token
opennlp-pos
opennlp-ner
dbpediaLinking
entityhubExtraction

Thanks,
Dileepa


On Wed, Nov 27, 2013 at 3:54 PM, Rafa Haro <rh...@apache.org> wrote:


Re: How to improve NER results in Stanbol

Posted by Rafa Haro <rh...@apache.org>.

Hi Dileepa,

Are you using only the OpenNLP NER engine, or are you also including an
Entity Linking engine?


On 27/11/13 11:17, Dileepa Jayakody wrote:

Re: How to improve NER results in Stanbol

Posted by Dileepa Jayakody <di...@gmail.com>.

Content:
Barclays has appointed Shaygan Kheradpir to the role of Chief Operations
and Technology Officer. He will join the Executive Committee of Barclays
and report directly to Group Chief Executive Antony Jenkins.

The above content doesn't identify *Barclays* as an organization, but
identifies *Executive Committee of Barclays* as an organization.

How can we improve the accuracy of these results?
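Before tuning anything, it helps to measure accuracy on a small hand-labelled sample so that different chains can be compared on numbers rather than on individual examples. The sketch below computes exact-match precision, recall and F1 over (text, type) pairs. Exact string matching is a simplifying assumption, so a partial match such as "Executive Committee of Barclays" for "Barclays" counts as one false positive plus one false negative. The predicted set is the result reported at the start of this thread.

```python
from typing import Set, Tuple

Mention = Tuple[str, str]  # (surface text, entity type)


def precision_recall_f1(predicted: Set[Mention],
                        gold: Set[Mention]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over (text, type) mentions."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Results reported at the start of this thread for the
# "Group Finance Director ... to retire from Barclays" sentence.
predicted = {
    ("Chris Lucas", "person"), ("Mark Harding", "person"),
    ("Barclays", "organization"), ("BT Group", "organization"),
    ("Finance Director Chris Lucas and Group General Counsel", "organization"),
}
# Hand-labelled expectation for the same sentence.
gold = {("Chris Lucas", "person"), ("Mark Harding", "person"),
        ("Barclays", "organization")}
print(precision_recall_f1(predicted, gold))
```

For that sentence this gives a precision of 3/5 = 0.6, a recall of 3/3 = 1.0 and an F1 of 0.75.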

Thanks,
Dileepa


On Wed, Nov 27, 2013 at 3:42 PM, Dileepa Jayakody <dileepajayakody@gmail.com> wrote:
