Posted to dev@ctakes.apache.org by Damir Olejar <ol...@gmail.com> on 2015/05/25 18:04:30 UTC

I need some help understanding the CVD output

To whom it may concern,

First, I am sorry if this question has already been asked, but I cannot
find any answers online. I am trying to understand the CVD output, and what
each field means, when running the AggregatePlainTextUMLSProcessor.

Mainly, I would like to know what discoveryTechnique=0, Confidence,
polarity, uncertainty, conditional, … mean exactly, and what the possible
values are (for example, how many discoveryTechniques there are, and what
each number means).

So far, I have managed to gather some information, but I am having
difficulty with the rest, since I do not know what the main source of
information (or perhaps the research papers) behind the CVD output is.

Thank you for your help,
Damir
----------------------------------------
Sofa - Each representation of an Artifact is called a Subject of Analysis,
abbreviated using the acronym “Sofa” which stands for Subject OF Analysis.

CAS - Common Analysis Structure - The CAS is the central data structure
through which all UIMA components communicate. The Java interface to the
CAS is called the JCas.

Annotation index - Annotators add metadata about a Sofa to the CAS. It is
often useful to have this metadata denote a region of the Sofa to which it
applies. For instance, assuming the Sofa is a String, the metadata might
describe a particular substring as the name of a person.
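
To make the idea of metadata pointing at a region of the Sofa concrete,
here is a minimal, self-contained sketch. It does not use the UIMA API: the
Span class, its fields, and the sample sentence are all hypothetical
stand-ins for an annotation that carries a label plus begin/end character
offsets.

```java
public class SpanSketch {

    // Hypothetical stand-in for an annotation: a label plus begin/end
    // character offsets into the Sofa text (end is exclusive).
    static final class Span {
        final String label;
        final int begin;
        final int end;

        Span(String label, int begin, int end) {
            this.label = label;
            this.begin = begin;
            this.end = end;
        }

        // Returns the substring of the Sofa that this span covers.
        String coveredText(String sofa) {
            return sofa.substring(begin, end);
        }
    }

    public static void main(String[] args) {
        // Assumed example Sofa; any String would do.
        String sofa = "Dr. Smith examined the patient.";
        // Mark characters 4..9 as the name of a person.
        Span person = new Span("Person", 4, 9);
        System.out.println(person.label + ": " + person.coveredText(sofa));
    }
}
```

The span stores only offsets, so the Sofa text itself is never copied or
modified; the label is metadata attached to a region of it.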

Semantic Role Relation (Generate Index) - Consists of ID, Category,
Discovery Technique, Confidence, Polarity, Uncertainty, Conditional,
Predicate, and Argument.

    Categories (the role labels) are:
    A0: Agent? (Similar to a Nominative)
    A1: Patient? (Similar to an Accusative)
    A2: Purpose or direction? (Similar to a Dative)
    A3: No generalization can be made?
    A4: No generalization can be made?
    A5: No generalization can be made?
    AM-ADV: general purpose
    AM-CAU: cause
    AM-DIR: direction
    AM-DIS: discourse marker
    AM-EXT: extent
    AM-LOC: location
    AM-MNR: manner
    AM-MOD: modal verb
    AM-NEG: negation marker
    AM-PNC: purpose
    AM-PRD: predication
    AM-PRP: purpose
    AM-REC: reciprocal
    AM-TMP: temporal

RE: I need some help understanding the CVD output

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.

Damir,
A lot of these are set using values in the CONST class:

http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-type-system/src/main/java/org/apache/ctakes/typesystem/type/constants/CONST.java

Discovery technique is sometimes used to differentiate gold standard annotations (read in from an external file created by humans) from automatically discovered annotations.

For attributes like polarity and uncertainty, there are constants for each possible value, for example polarity asserted vs. polarity negated.

While best practice is to use these constants and add to them if you create a new module, that doesn't always happen, and there are also cases where attributes exist for which methods do not yet exist to populate them. In many of these cases you will see an attribute set to 0 or null.
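
As a rough illustration of reading these values, here is a small self-contained sketch. The constants are defined locally rather than imported from cTAKES, and their numeric values are assumptions based on my recollection of CONST.java, so please check them against the file linked above before relying on them. The class name and the describe methods are hypothetical helpers, not part of cTAKES.

```java
public class CvdAttributeSketch {

    // ASSUMED values, believed to mirror CONST.java; verify against the source.
    static final int NE_POLARITY_NEGATION_ABSENT = 1;        // asserted
    static final int NE_POLARITY_NEGATION_PRESENT = -1;      // negated
    static final int NE_DISCOVERY_TECH_DICT_LOOKUP = 1;      // dictionary lookup
    static final int NE_DISCOVERY_TECH_GOLD_ANNOTATION = 2;  // human gold standard

    // Hypothetical helper: turn a polarity value into a readable label.
    static String describePolarity(int polarity) {
        switch (polarity) {
            case NE_POLARITY_NEGATION_ABSENT:
                return "asserted";
            case NE_POLARITY_NEGATION_PRESENT:
                return "negated";
            default:
                // 0 usually means nothing populated the attribute.
                return "unset or unknown (" + polarity + ")";
        }
    }

    // Hypothetical helper: turn a discoveryTechnique value into a readable label.
    static String describeDiscoveryTechnique(int technique) {
        switch (technique) {
            case NE_DISCOVERY_TECH_DICT_LOOKUP:
                return "dictionary lookup";
            case NE_DISCOVERY_TECH_GOLD_ANNOTATION:
                return "gold annotation";
            default:
                return "unset or unknown (" + technique + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(describePolarity(-1));
        System.out.println(describeDiscoveryTechnique(0));
    }
}
```

Under these assumptions, discoveryTechnique=0 in CVD falls into the default branch, meaning the component that created the annotation most likely never set it.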

Tim

