Posted to dev@opennlp.apache.org by Carlos Scheidecker <na...@gmail.com> on 2013/09/26 11:37:00 UTC

Triplet Extraction with OpenNLP

Hello all,

I am interested in performing Triplet Extraction.

For that, I need to traverse the parse tree.

I know how to use the ChunkerME; however, I am not sure how to use the Parser
to build a tree that I can traverse.

Ideally, I want to obtain the subject, predicate and object.

To find the subject, I need to search the NP subtree and select the first
descendant of NP that is a noun, using breadth-first search.

To find the predicate, I will search the VP subtree; the deepest verb
descendant in that subtree gives the predicate.

The object(s) can be in three different subtrees: PP, NP and ADJP.
In NP and PP the object is the first noun, while in ADJP we need to locate
the first adjective.
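
A minimal sketch of the three searches described above, not a definitive
implementation. It uses a small hypothetical SNode class as a stand-in for
opennlp.tools.parser.Parse so that it runs without a model; with the real
class, getType(), getChildren() and getCoveredText() would play the same
roles. Assumptions: nouns, verbs and adjectives are recognised by the Penn
Treebank tag prefixes NN, VB and JJ, and the object is searched for inside
the VP subtree.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for opennlp.tools.parser.Parse (not part of OpenNLP).
class SNode {
    final String type;            // phrase or part-of-speech tag, e.g. "NP", "VBD"
    final String word;            // token text; null for phrase nodes
    final List<SNode> children;

    private SNode(String type, String word, List<SNode> children) {
        this.type = type;
        this.word = word;
        this.children = children;
    }
    static SNode leaf(String tag, String word) {
        return new SNode(tag, word, Collections.<SNode>emptyList());
    }
    static SNode phrase(String type, SNode... children) {
        return new SNode(type, null, Arrays.asList(children));
    }
}

class TripletSketch {
    // Breadth-first search for the first tagged word whose tag starts with prefix.
    static SNode firstWord(SNode root, String prefix) {
        if (root == null) return null;
        Deque<SNode> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            SNode n = queue.remove();
            if (n.word != null && n.type.startsWith(prefix)) return n;
            queue.addAll(n.children);
        }
        return null;
    }

    // Verb at the greatest depth below root; ties go to the leftmost verb.
    static SNode deepestVerb(SNode root) {
        SNode best = null;
        int bestDepth = -1;
        Deque<SNode> nodes = new ArrayDeque<>();
        Deque<Integer> depths = new ArrayDeque<>();
        nodes.add(root);
        depths.add(0);
        while (!nodes.isEmpty()) {
            SNode n = nodes.remove();
            int d = depths.remove();
            if (n.word != null && n.type.startsWith("VB") && d > bestDepth) {
                best = n;
                bestDepth = d;
            }
            for (SNode c : n.children) {
                nodes.add(c);
                depths.add(d + 1);
            }
        }
        return best;
    }

    // First noun inside an NP or PP, or first adjective inside an ADJP, below the VP.
    static SNode findObject(SNode vp) {
        Deque<SNode> queue = new ArrayDeque<>();
        queue.add(vp);
        while (!queue.isEmpty()) {
            SNode n = queue.remove();
            SNode hit = null;
            if (n.type.equals("NP") || n.type.equals("PP")) hit = firstWord(n, "NN");
            else if (n.type.equals("ADJP")) hit = firstWord(n, "JJ");
            if (hit != null) return hit;
            queue.addAll(n.children);
        }
        return null;
    }

    static SNode child(SNode parent, String type) {
        for (SNode c : parent.children) if (c.type.equals(type)) return c;
        return null;
    }

    // Returns {subject, predicate, object}; an entry is null when nothing matched.
    static String[] extract(SNode s) {
        SNode vp = child(s, "VP");
        SNode subj = firstWord(child(s, "NP"), "NN");
        SNode pred = vp == null ? null : deepestVerb(vp);
        SNode obj = vp == null ? null : findObject(vp);
        return new String[] { subj == null ? null : subj.word,
            pred == null ? null : pred.word, obj == null ? null : obj.word };
    }

    // (S (NP (DT The) (NNS countries)) (VP (VBD broke) (PRT (RP off)) (NP (NN peace) (NNS talks))))
    static SNode sample() {
        return SNode.phrase("S",
            SNode.phrase("NP", SNode.leaf("DT", "The"), SNode.leaf("NNS", "countries")),
            SNode.phrase("VP", SNode.leaf("VBD", "broke"),
                SNode.phrase("PRT", SNode.leaf("RP", "off")),
                SNode.phrase("NP", SNode.leaf("NN", "peace"), SNode.leaf("NNS", "talks"))));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(extract(sample())));
    }
}
```

On this small tree the sketch yields countries / broke / peace. Note that
"first noun" picks "peace" rather than the head noun "talks", which is one
of the limits of the heuristic.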

Therefore, what I need to learn is how to create the parser and the main
tree so that I can navigate the subtrees.

Thanks for the help,

Carlos.

Re: Triplet Extraction with OpenNLP

Posted by Mark G <gi...@gmail.com>.
Inside the Parse class, the showCodeTree() method does something similar to
what you want, at least as a starting point: it recursively walks the
children of the top Parse object. If you have the source code, look at the
Parse class and its showCodeTree() method.

I was thinking you could build a sorted map (TreeMap) keyed by part-of-speech
or chunk tag, ordered by where each tag is first mentioned, with a TreeSet of
the matching parts as the value for each key. You could then take the first
or last element of each set, depending on the position and type of the key.
Just a rough thought, though.

Mark G
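
A rough sketch of the map idea above, under stated assumptions. TagNode is a
hypothetical stand-in for opennlp.tools.parser.Parse so the example runs
without a model. It also swaps in a LinkedHashMap with List values instead
of TreeMap and TreeSet, because a TreeMap orders keys by comparison rather
than by order of first mention; that substitution is a choice made here, not
something from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for opennlp.tools.parser.Parse (not part of OpenNLP).
class TagNode {
    final String type;            // phrase or part-of-speech tag
    final String word;            // token text; null for phrase nodes
    final List<TagNode> children;

    TagNode(String type, String word, TagNode... children) {
        this.type = type;
        this.word = word;
        this.children = Arrays.asList(children);
    }
}

class TagIndexSketch {
    // Text covered by a node: its own word, or its children's text joined by spaces.
    static String text(TagNode n) {
        if (n.word != null) return n.word;
        StringBuilder sb = new StringBuilder();
        for (TagNode c : n.children) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(text(c));
        }
        return sb.toString();
    }

    // Recursive pre-order walk that files every node's text under its tag.
    static void collect(TagNode n, Map<String, List<String>> index) {
        index.computeIfAbsent(n.type, k -> new ArrayList<>()).add(text(n));
        for (TagNode c : n.children) collect(c, index);
    }

    static Map<String, List<String>> index(TagNode root) {
        Map<String, List<String>> index = new LinkedHashMap<>();
        collect(root, index);
        return index;
    }

    // (S (NP (DT The) (NNS countries)) (VP (VBD broke) (NP (NN peace) (NNS talks))))
    static TagNode sample() {
        return new TagNode("S", null,
            new TagNode("NP", null,
                new TagNode("DT", "The"), new TagNode("NNS", "countries")),
            new TagNode("VP", null,
                new TagNode("VBD", "broke"),
                new TagNode("NP", null,
                    new TagNode("NN", "peace"), new TagNode("NNS", "talks"))));
    }

    public static void main(String[] args) {
        Map<String, List<String>> index = index(sample());
        List<String> nps = index.get("NP");
        System.out.println("first NP: " + nps.get(0));
        System.out.println("last NP:  " + nps.get(nps.size() - 1));
    }
}
```

With the index built, the first and last entry for a tag are simply the
first and last elements of its list.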


On Fri, Sep 27, 2013 at 3:09 AM, Carlos Scheidecker <na...@gmail.com> wrote:

> This is awesome Mark, thanks!
>
> This will be quite useful for everybody else as well.
>
> I ended up writing my own version and went further with the rest of the
> extraction.
>
> What I found interesting is how long it takes to load the
> en-parser-chunking.bin model, which is about 36 MB.
>
> So I am not loading it every time, only once during object creation.
>
> Does anyone have a better suggestion?
>
> cheers.

Re: Triplet Extraction with OpenNLP

Posted by Carlos Scheidecker <na...@gmail.com>.
This is awesome, Mark, thanks!

This will be quite useful for everybody else as well.

I ended up writing my own version and went further with the rest of the
extraction.

What I found interesting is how long it takes to load the
en-parser-chunking.bin model, which is about 36 MB.

So I am not loading it every time, only once during object creation.

Does anyone have a better suggestion?
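
One possible way to make "load once" explicit is a small lazy holder that
runs an expensive loader on first use and caches the result. This is only a
sketch: the Lazy class and the placeholder string are hypothetical, and in
real use the supplier would wrap new ParserModel(inputStream) and deal with
the IOException. As far as I know, OpenNLP model objects are intended to be
shared between threads while Parser instances are not, so sharing the model
and creating a Parser per thread may be the safer split; treat that as an
assumption to verify.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper: runs the loader at most once and caches the result.
class Lazy<T> implements Supplier<T> {
    private final Supplier<T> loader;
    private volatile T value;     // volatile makes double-checked locking safe

    Lazy(Supplier<T> loader) {
        this.loader = loader;
    }

    @Override
    public T get() {
        T v = value;
        if (v == null) {
            synchronized (this) {
                v = value;
                if (v == null) {
                    // A loader that returns null would be retried on the next call.
                    v = loader.get();
                    value = v;
                }
            }
        }
        return v;
    }
}

class LazyDemo {
    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        // Stand-in for the slow model load; the string is a placeholder.
        Lazy<String> model = new Lazy<>(() -> {
            loads.incrementAndGet();
            return "model";
        });
        model.get();
        model.get();
        System.out.println("loads = " + loads.get());
    }
}
```

The second call to get() returns the cached value, so the loader runs once
no matter how many objects ask for the model.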

cheers.


On Thu, Sep 26, 2013 at 4:59 PM, Mark G <gi...@gmail.com> wrote:

> Carlos, I threw this together to show how to get a Parser running.
> Look at what it prints; I think you can iterate through topParses[] and
> traverse the tree from there. If there is a more efficient way, I am
> sure the other OpenNLPers will chime in.

Re: Triplet Extraction with OpenNLP

Posted by Mark G <gi...@gmail.com>.
Carlos, I threw this together to show how to get a Parser running.
Look at what it prints; I think you can iterate through topParses[] and
traverse the tree from there. If there is a more efficient way, I am
sure the other OpenNLPers will chime in.


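  import java.io.FileInputStream;
  import java.io.IOException;
  import java.io.InputStream;

  import opennlp.tools.cmdline.parser.ParserTool;
  import opennlp.tools.parser.Parse;
  import opennlp.tools.parser.Parser;
  import opennlp.tools.parser.ParserFactory;
  import opennlp.tools.parser.ParserModel;

  // The class name is arbitrary; the original snippet showed only main().
  public class ParserExample {

    // InvalidFormatException extends IOException, so one throws clause covers both.
    public static void main(String[] args) throws IOException {

      ParserModel model;
      // try-with-resources closes the stream even if loading fails.
      try (InputStream is = new FileInputStream(
          "c:\\temp\\opennlpmodels\\en-parser-chunking.bin")) {
        model = new ParserModel(is);
      }
      Parser parser = ParserFactory.create(model);

      String sentence = "The countries broke off peace talks following the "
          + "Mumbai attacks but have begun discussions again, focusing on "
          + "increasing trade.";
      // Request only the single best parse.
      Parse[] topParses = ParserTool.parseLine(sentence, parser, 1);

      Parse p = topParses[0];
      p.showCodeTree();   // one line per node, with its index path
      p.show();           // bracketed tree
      p.getParent();      // navigation calls; results unused here
      p.getChildren();

      System.out.println(p.getText());
    }
  }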

It should print all this...

[0] S 2092924121 -> 2092924121 TOP The countries broke off peace talks following the Mumbai attacks but have begun discussions again, focusing on increasing trade.
[0.0] NP 2092766686 -> 2092924121 S The countries
[0.0.0] DT 2092752996 -> 2092766686 NP The
[0.0.0.0] TK 2092752996 -> 2092752996 DT The
[0.0.1] NNS 2092969298 -> 2092766686 NP countries
[0.0.1.0] TK 2092969298 -> 2092969298 NNS countries
[0.1] VP 2093633263 -> 2092924121 S broke off peace talks following the Mumbai attacks but have begun discussions again, focusing on increasing trade.
[0.1.0] VP 2093545647 -> 2093633263 VP broke off peace talks following the Mumbai attacks
[0.1.0.0] VBD 2093484042 -> 2093545647 VP broke
[0.1.0.0.0] TK 2093484042 -> 2093484042 VBD broke
[0.1.0.1] PRT 2093793436 -> 2093545647 VP off
[0.1.0.1.0] RP 2093793436 -> 2093793436 PRT off
[0.1.0.1.0.0] TK 2093793436 -> 2093793436 RP off
[0.1.0.2] NP 2094012476 -> 2093545647 VP peace talks
[0.1.0.2.0] NN 2094004262 -> 2094012476 NP peace
[0.1.0.2.0.0] TK 2094004262 -> 2094004262 NN peace
[0.1.0.2.1] NNS 2094316394 -> 2094012476 NP talks
[0.1.0.2.1.0] TK 2094316394 -> 2094316394 NNS talks
[0.1.0.3] PP 2094660013 -> 2093545647 VP following the Mumbai attacks
[0.1.0.3.0] VBG 2094634002 -> 2094660013 PP following
[0.1.0.3.0.0] TK 2094634002 -> 2094634002 VBG following
[0.1.0.3.1] NP 2095166543 -> 2094660013 PP the Mumbai attacks
[0.1.0.3.1.0] DT 2095146008 -> 2095166543 NP the
[0.1.0.3.1.0.0] TK 2095146008 -> 2095146008 DT the
[0.1.0.3.1.1] NNP 2095358203 -> 2095166543 NP Mumbai
[0.1.0.3.1.1.0] TK 2095358203 -> 2095358203 NNP Mumbai
[0.1.0.3.1.2] NNS 2095723726 -> 2095166543 NP attacks
[0.1.0.3.1.2.0] TK 2095723726 -> 2095723726 NNS attacks
[0.1.1] CC 2096134426 -> 2093633263 VP but
[0.1.1.0] TK 2096134426 -> 2096134426 CC but
[0.1.2] VP 2096419178 -> 2093633263 VP have begun discussions again, focusing on increasing trade.
[0.1.2.0] VBP 2096343883 -> 2096419178 VP have
[0.1.2.0.0] TK 2096343883 -> 2096343883 VBP have
[0.1.2.1] VP 2096672443 -> 2096419178 VP begun discussions again, focusing on increasing trade.
[0.1.2.1.0] VBN 2096605362 -> 2096672443 VP begun
[0.1.2.1.0.0] TK 2096605362 -> 2096605362 VBN begun
[0.1.2.1.1] NP 2096925708 -> 2096672443 VP discussions
[0.1.2.1.1.0] NNS 2096925708 -> 2096925708 NP discussions
[0.1.2.1.1.0.0] TK 2096925708 -> 2096925708 NNS discussions
[0.1.2.1.2] PP 2097584197 -> 2096672443 VP again, focusing on increasing trade.
[0.1.2.1.2.0] IN 2097543127 -> 2097584197 PP again,
[0.1.2.1.2.0.0] TK 2097543127 -> 2097543127 IN again,
[0.1.2.1.2.1] S 2097938768 -> 2097584197 PP focusing on increasing trade.
[0.1.2.1.2.1.0] VP 2097938768 -> 2097938768 S focusing on increasing trade.
[0.1.2.1.2.1.0.0] VBG 2097910019 -> 2097938768 VP focusing
[0.1.2.1.2.1.0.0.0] TK 2097910019 -> 2097910019 VBG focusing
[0.1.2.1.2.1.0.1] PP 2098394645 -> 2097938768 VP on increasing trade.
[0.1.2.1.2.1.0.1.0] IN 2098370003 -> 2098394645 PP on
[0.1.2.1.2.1.0.1.0.0] TK 2098370003 -> 2098370003 IN on
[0.1.2.1.2.1.0.1.1] NP 2098546604 -> 2098394645 PP increasing trade.
[0.1.2.1.2.1.0.1.1.0] VBG 2098537021 -> 2098546604 NP increasing
[0.1.2.1.2.1.0.1.1.0.0] TK 2098537021 -> 2098537021 VBG increasing
[0.1.2.1.2.1.0.1.1.1] NN 2099103787 -> 2098546604 NP trade.
[0.1.2.1.2.1.0.1.1.1.0] TK 2099103787 -> 2099103787 NN trade.
(TOP (S (NP (DT The) (NNS countries)) (VP (VP (VBD broke) (PRT (RP off)) (NP (NN peace) (NNS talks)) (PP (VBG following) (NP (DT the) (NNP Mumbai) (NNS attacks)))) (CC but) (VP (VBP have) (VP (VBN begun) (NP (NNS discussions)) (PP (IN again,) (S (VP (VBG focusing) (PP (IN on) (NP (VBG increasing) (NN trade.)))))))))))
The countries broke off peace talks following the Mumbai attacks but have begun discussions again, focusing on increasing trade
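
Judging from the dump above, each showCodeTree() line appears to read:
[child-index path] node type, node code -> parent code, parent type, covered
text. Assuming that reading, a path such as 0.1.0 can be followed by
indexing into the children one level at a time, starting at the TOP node.
The sketch below shows this with a hypothetical PathNode stand-in; with the
real Parse class the equivalent step would be getChildren()[i]. The TK token
nodes from the dump are left out of the sample tree for brevity.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for opennlp.tools.parser.Parse (not part of OpenNLP).
class PathNode {
    final String type;
    final List<PathNode> children;

    PathNode(String type, PathNode... children) {
        this.type = type;
        this.children = Arrays.asList(children);
    }
}

class PathSketch {
    // Follow a dotted index path such as "0.1.2" downwards from the TOP node.
    // Returns null if any index along the path is out of range.
    static PathNode nodeAt(PathNode top, String path) {
        PathNode current = top;
        for (String part : path.split("\\.")) {
            int i = Integer.parseInt(part);
            if (i < 0 || i >= current.children.size()) return null;
            current = current.children.get(i);
        }
        return current;
    }

    // (TOP (S (NP DT NNS) (VP VBD (PRT RP) (NP NN NNS))))
    static PathNode sample() {
        return new PathNode("TOP",
            new PathNode("S",
                new PathNode("NP", new PathNode("DT"), new PathNode("NNS")),
                new PathNode("VP",
                    new PathNode("VBD"),
                    new PathNode("PRT", new PathNode("RP")),
                    new PathNode("NP", new PathNode("NN"), new PathNode("NNS")))));
    }

    public static void main(String[] args) {
        PathNode top = sample();
        System.out.println(nodeAt(top, "0").type);
        System.out.println(nodeAt(top, "0.0.1").type);
        System.out.println(nodeAt(top, "0.1.2").type);
    }
}
```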

Let me know how it works.

Happy coding!

Mark G



On Thu, Sep 26, 2013 at 4:14 PM, Carlos Scheidecker <na...@gmail.com> wrote:

> Thanks Svetoslav,
>
> Would you have an example of that?
>
> cheers,
>
> Carlos.

Re: Triplet Extraction with OpenNLP

Posted by Carlos Scheidecker <na...@gmail.com>.
Thanks Svetoslav,

Would you have an example of that?

cheers,

Carlos.


On Thu, Sep 26, 2013 at 5:09 AM, Svetoslav Marinov <
svetoslav.marinov@findwise.com> wrote:

> Hi Carlos,
>
> This is not exactly an answer to your question, but I am not really
> convinced that a phrase structure tree is the best way to extract
> triplets. A dependency graph is a much better option.
>
> There will be a number of NPs and PPs that are neither the subject nor
> the object, and I am not at all sure that an adjective can be an object.
>
> However, if you want to use OpenNLP and the parse tree, you could
> consider mapping the tree to FrameNet; that would show you what kinds of
> arguments a verb takes and which of them can potentially be the
> subject and the object.
>
> Best,
>
> Svetoslav

Re: Triplet Extraction with OpenNLP

Posted by Svetoslav Marinov <sv...@findwise.com>.
Hi Carlos,

This is not exactly an answer to your question, but I am not really convinced that a phrase structure tree is the best way to extract triplets. A dependency graph is a much better option.

There will be a number of NPs and PPs that are neither the subject nor the object, and I am not at all sure that an adjective can be an object.

However, if you want to use OpenNLP and the parse tree, you could consider mapping the tree to FrameNet; that would show you what kinds of arguments a verb takes and which of them can potentially be the subject and the object.

Best,

Svetoslav