Posted to users@opennlp.apache.org by John Kimpton <ki...@gmail.com> on 2011/03/21 11:28:16 UTC

Filtering POS Tags from Parse Trees

Hello,
I'm creating a GUI app that takes in a file of use cases and parses each
of them into Treebank-style parse trees. From there I am trying to filter
each parse tree for its noun and verb structures and output them back to
the user, so they can decide which are classes, attributes, and methods.


Here's the code I have. The first method takes in the tokenized sentence,
builds a flat tree structure from the tokens, runs the parser on it, and
returns the resulting parses to the GUI. On a button click, the GUI calls
this method and prints the tree structure to the text area.
    private Parse[] parseSentence(String[] tokens) throws FileNotFoundException, IOException {
            //Model location for each specific tool
            InputStream parserModelFile = new FileInputStream(modulePath.getText() + "\\en-parser-chunking.bin");

            //Create an instance of that model for the specific tool
            ParserModel parseModel = new ParserModel(parserModelFile);

            //Create an instance of the tool, passing in its model
            Parser parseDet = ParserFactory.create(parseModel);

           // build the string to parse (tokens joined by single spaces)
           // as well as a list of the tokens
            StringBuffer sb = new StringBuffer();
            List<String> tokenList = new ArrayList<String>();
            for (int j = 0; j < tokens.length; j++)
            {
               String tok = tokens[j];
               tokenList.add(tok);
               sb.append(tok).append(" ");
            }
            String text = sb.substring(0, sb.length() - 1).toString();


           Parse p = new Parse(text, new Span(0, text.length()), "INC", 1, null);


           // create a parse object for each token and add it to the parent
           int start = 0;
           int myIndex = 0; // token index, starting at 0 for the first token
           for (String tok : tokenList)
           {
               p.insert(new Parse(text, new Span(start, start + tok.length()), AbstractBottomUpParser.TOK_NODE, 0, myIndex));
               myIndex++;
               // advance past this token and the single space that follows it
               start += tok.length() + 1;
               // print the token nodes inserted so far (not the array reference)
               System.out.println(java.util.Arrays.toString(p.getChildren()));
           }

           // the second argument is the number of parses to return,
           // not a character offset
           Parse[] parses = parseDet.parse(p, 1);


           return parses;
    }


And then in the button code that calls this method I have the following
code:
            //Tokenize the input file for parsing
            String[] tokens = tokeSentence();
            Parse[] parseTree = parseSentence(tokens);

            //Clear Text Area Before Printing
            jTextArea1.setText("");

            StringBuffer sBuff = new StringBuffer();

            parseTree[0].show(sBuff);
            jTextArea1.append(sBuff.toString()+"\n");

            for(int i=0; i<parseTree.length; i++)
                  jTextArea1.append(parseTree[i].getType());


I am able to create the parse tree for each sentence, but when I try to go
through the trees and look for the tags, the only thing returned is the TOP
tag. Any help on how to traverse the tree, filter it on the POS tags, and
return the matching nodes' content would be much appreciated.
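
A minimal sketch of the kind of traversal being asked about. Each element of
the array returned by the parser is the root of one complete parse, and that
root's type is TOP, so calling getType() on the array elements only ever
reports TOP; the tagged nodes sit further down, reachable through
getChildren(). To keep the sketch runnable without the OpenNLP jar or a
model, it uses a hypothetical stand-in class Node (a type label, the covered
text, and a list of children) rather than the real Parse class. The names
Node, PhraseFilter, collect and sample are illustrative assumptions, not
OpenNLP API, and the sample tree is hand-built and simplified.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for opennlp.tools.parser.Parse (NOT OpenNLP API):
// a type label, the text the node covers, and its child nodes.
final class Node {
    final String type;
    final String text;
    final List<Node> children = new ArrayList<>();

    Node(String type, String text, Node... kids) {
        this.type = type;
        this.text = text;
        for (Node k : kids) {
            children.add(k);
        }
    }
}

class PhraseFilter {
    // Depth-first, pre-order walk: record every node whose type starts
    // with the given prefix, e.g. "NN" for nouns or "VB" for verbs.
    static void collect(Node node, String prefix, List<String> out) {
        if (node.type.startsWith(prefix)) {
            out.add(node.type + ": " + node.text);
        }
        for (Node child : node.children) {
            collect(child, prefix, out);
        }
    }

    // Hand-built, simplified tree for "The customer places an order".
    static Node sample() {
        Node subject = new Node("NP", "The customer",
                new Node("DT", "The"), new Node("NN", "customer"));
        Node object = new Node("NP", "an order",
                new Node("DT", "an"), new Node("NN", "order"));
        Node predicate = new Node("VP", "places an order",
                new Node("VBZ", "places"), object);
        Node sentence = new Node("S", "The customer places an order",
                subject, predicate);
        return new Node("TOP", "The customer places an order", sentence);
    }

    public static void main(String[] args) {
        List<String> nouns = new ArrayList<>();
        List<String> verbs = new ArrayList<>();
        collect(sample(), "NN", nouns);
        collect(sample(), "VB", verbs);
        System.out.println(nouns);
        System.out.println(verbs);
    }
}
```

With the real library the same recursion would presumably start from
parseTree[0] and use getType() and getChildren() on each Parse in place of
the fields above; how the covered text is read depends on the OpenNLP
version in use. Matching on a prefix keeps NN, NNS and NNP together, and
using "NP" or "VP" instead would collect whole phrases rather than single
words.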

Thanks,
John

PS. Sorry for the long email; I'm just really stuck at the moment.