Posted to users@opennlp.apache.org by Filipe Araujo <7....@gmail.com> on 2013/01/25 18:16:34 UTC

Question about correlation between entities in OpenNLP

Hi there,

First of all, nice job with OpenNLP. :)

I'm writing to ask your opinion on how to implement some features for my
project.
The goal of the project is to extract entities (persons, dates, companies,
etc.) and the relations between them from resumes (CVs), for example a
company name and the period (dates) a person worked there.

I have a set of resumes and I'm creating a training set for entity
recognition, but I'm a little worried about relation extraction. Is there
a way to write nested tags in a training set to improve relation
extraction? If there isn't, what would you recommend for this problem?

As an example, right now I have:
<START:description> Senior Developer <END> <START:date> 2009-current <END> <START:company> Apple Inc. <END>

What I hoped I could have:

<START:jobs>
    <START:description> Senior Developer <END> <START:date> 2009-current <END> <START:company> Apple Inc. <END>
    ...
<END>

Is a nested tag approach the correct way to capture this correlation
between entities, or are there better approaches to this problem?

Thank you very much for your help.

Best regards,

-- 

Filipe Araújo

Re: Question about correlation between entities in OpenNLP

Posted by James Kosin <ja...@gmail.com>.
On 1/25/2013 12:16 PM, Filipe Araujo wrote:
> Hi there,
>
<... SNIP ...>
> As an example for this, right now i have:
> <START:description> Senior Developer <END>  <START:date> 2009-current <END>
>   <START:company> Apple Inc. <END>
>
> What i hoped i could have:
>
> <START:jobs>
>      <START:description> Senior Developer <END>  <START:date> 2009-current
> <END> <START:company> Apple Inc. <END>
>      ...
> <END>
>
> Is a nested tag approach the correct way of trying to achieve this
> correlation between entities or is there better approaches to this problem?
>
> Thank you very much for your help.
>
> Best regards,
>
No, the name finder models don't support nested tags. Training data set
up like this would cause problems and would not train.
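
A quick pre-check along these lines can catch nested tags before training.
This is only a sketch: NestedTagCheck and badLines are made-up names, not
part of OpenNLP, and it assumes each training sample sits on a single line
with tags and tokens separated by whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-check, not part of OpenNLP.
final class NestedTagCheck {

    // Returns the 1-based numbers of lines where a <START:...> tag opens
    // before the previous one was closed, where <END> has no open tag,
    // or where a tag is still open at the end of the line.
    static List<Integer> badLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            boolean open = false;
            boolean ok = true;
            for (String token : lines.get(i).trim().split("\\s+")) {
                if (token.startsWith("<START") && token.endsWith(">")) {
                    if (open) {
                        ok = false; // a tag opened inside another tag
                    }
                    open = true;
                } else if (token.equals("<END>")) {
                    if (!open) {
                        ok = false; // <END> without a matching <START:...>
                    }
                    open = false;
                }
            }
            if (open) {
                ok = false; // tag never closed on this line
            }
            if (!ok) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "<START:description> Senior Developer <END> <START:date> 2009-current <END>",
                "<START:jobs> <START:description> Senior Developer <END> <END>");
        System.out.println("Lines with nesting problems: " + badLines(lines));
    }
}
```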

Maybe another approach like:
     <START:job_description> ... <END> <START:job_date> ... <END> <START:job_date> ... <END> <START:job_company> ... <END>

Then use an XML-type generator afterwards to split the results more
logically.
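
A rough sketch of that post-processing step, in plain Java. JobGrouper is a
made-up helper, not an OpenNLP class. It reads flat job_* annotations from
a line of text as a stand-in for the spans a trained name finder would
return, and it assumes each entity type appears at most once per job, so a
repeated type is taken as the start of the next job. That grouping rule is
a guess and would need adjusting for real resumes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP: groups flat job_* entities into records.
final class JobGrouper {

    // Matches one flat annotation such as "<START:job_date> 2009-current <END>".
    private static final Pattern ENTITY =
            Pattern.compile("<START:([^>]+)>\\s*(.*?)\\s*<END>");

    // Assumption: an entity type occurs at most once per job,
    // so seeing a type again starts a new record.
    static List<Map<String, String>> group(String annotated) {
        List<Map<String, String>> jobs = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        Matcher m = ENTITY.matcher(annotated);
        while (m.find()) {
            String type = m.group(1);
            String text = m.group(2);
            if (current.containsKey(type)) {
                jobs.add(current);
                current = new LinkedHashMap<>();
            }
            current.put(type, text);
        }
        if (!current.isEmpty()) {
            jobs.add(current);
        }
        return jobs;
    }

    public static void main(String[] args) {
        String line = "<START:job_description> Senior Developer <END> "
                + "<START:job_date> 2009-current <END> "
                + "<START:job_company> Apple Inc. <END> "
                + "<START:job_description> Intern <END> "
                + "<START:job_date> 2008 <END> "
                + "<START:job_company> Example Corp <END>";
        // Each map holds the entities that belong to one job.
        for (Map<String, String> job : group(line)) {
            System.out.println(job);
        }
    }
}
```

Each resulting map could then be written out as one XML element per job,
or in whatever nested structure the application needs.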

James