Posted to notifications@ctakes.apache.org by "Pei Chen (JIRA)" <ji...@apache.org> on 2012/11/02 21:13:12 UTC

[jira] [Created] (CTAKES-96) Update Dependency Parser and Semantic Role Labeler - Thanks Jinho Choi and Lee Beecker

Pei Chen created CTAKES-96:
------------------------------

             Summary: Update Dependency Parser and Semantic Role Labeler - Thanks Jinho Choi and Lee Beecker
                 Key: CTAKES-96
                 URL: https://issues.apache.org/jira/browse/CTAKES-96
             Project: cTAKES
          Issue Type: New Feature
            Reporter: Pei Chen
             Fix For: future enhancement


Update or create new wrappers for the ClearNLP components, whose models have been trained on clinical notes (SHARP/MiPACQ).

Some notes:
The integration will mostly be a matter of switching to cTAKES types.

Here are a few critical spots:

In the tokenizer (https://code.google.com/p/cleartk/source/browse/cleartk-clearnlp/src/main/java/org/cleartk/clearnlp/Tokenizer.java), lines 96 and 106 are all that should need changing to switch to cTAKES Sentence and Token types.

In the pos-tagger (https://code.google.com/p/cleartk/source/browse/cleartk-clearnlp/src/main/java/org/cleartk/clearnlp/PosTagger.java) most of the changes should be on lines 109 and 116-118.

In the MP Analyzer (https://code.google.com/p/cleartk/source/browse/cleartk-clearnlp/src/main/java/org/cleartk/clearnlp/MPAnalyzer.java) the changes would be on lines 122-124, again to use the cTAKES token types.

The Dependency Parser (https://code.google.com/p/cleartk/source/browse/cleartk-clearnlp/src/main/java/org/cleartk/clearnlp/DependencyParser.java) is a bit harder, but similar.  I think you can step through and find instances of ClearTK types and swap them for the Dependency Relation types in cTAKES.  Basically the code grabs the token, POS, and lemma data from the CAS and passes it on to Jinho's SRL.  Then the work is in mapping that output back into CAS-appropriate types.
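
A rough sketch of that read / parse / map-back flow, in plain Java.  Everything here is a hypothetical stand-in written for illustration: TokenInfo, Arc, Parser, and DependencyNode are not real ClearTK, cTAKES, or ClearNLP classes, and the assumption that the parser returns one head index and relation label per token is mine, not something taken from the linked code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only; none of these are real ClearTK, cTAKES, or ClearNLP types.
final class DependencyFlowSketch {

    /** Stand-in for the token, POS, and lemma data read from the CAS. */
    static final class TokenInfo {
        final String form;
        final String pos;
        final String lemma;

        TokenInfo(String form, String pos, String lemma) {
            this.form = form;
            this.pos = pos;
            this.lemma = lemma;
        }
    }

    /** Assumed parser output for one token: 1-based head index (0 = root) plus a label. */
    static final class Arc {
        final int head;
        final String label;

        Arc(int head, String label) {
            this.head = head;
            this.label = label;
        }
    }

    /** Stand-in for the wrapped parser; the real ClearNLP API is not modeled here. */
    interface Parser {
        List<Arc> parse(List<TokenInfo> sentence);
    }

    /** Stand-in for the type-system-specific dependency annotation (the part to swap). */
    static final class DependencyNode {
        final TokenInfo token;
        DependencyNode head;   // stays null for the root token
        String relation;

        DependencyNode(TokenInfo token) {
            this.token = token;
        }
    }

    /** Passes the tokens to the parser, then maps its output back onto one node per token. */
    static List<DependencyNode> annotate(List<TokenInfo> sentence, Parser parser) {
        List<Arc> arcs = parser.parse(sentence);
        if (arcs.size() != sentence.size()) {
            throw new IllegalStateException("parser must return one arc per token");
        }
        // Create every node first so that a head can be linked regardless of word order.
        List<DependencyNode> nodes = new ArrayList<>();
        for (TokenInfo token : sentence) {
            nodes.add(new DependencyNode(token));
        }
        for (int i = 0; i < arcs.size(); i++) {
            Arc arc = arcs.get(i);
            DependencyNode node = nodes.get(i);
            node.relation = arc.label;
            node.head = (arc.head == 0) ? null : nodes.get(arc.head - 1);
        }
        return nodes;
    }
}
```

In this sketch only TokenInfo and DependencyNode touch the type system, which is why the port should mostly come down to swapping those for their cTAKES equivalents while the call into the parser stays as it is.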

The Semantic Role Labeler (https://code.google.com/p/cleartk/source/browse/cleartk-clearnlp/src/main/java/org/cleartk/clearnlp/SemanticRoleLabeler.java) follows a similar flow, but also pulls the Dependency Parse information out of the CAS.  Then the work is in extracting the SRL arguments and predicates to put back into ClearTK CAS types.

Lastly, to get an idea of how these components are called in a UIMA pipeline, I would refer to the test cases, especially the ClearNLP test case (https://code.google.com/p/cleartk/source/browse/cleartk-clearnlp/src/test/java/org/cleartk/clearnlp/ClearNLPTest.java).

