Posted to dev@ctakes.apache.org by "Miller, Timothy" <Ti...@childrens.harvard.edu> on 2014/05/09 17:53:58 UTC

markable types

What do people think about taking the "markable" types out of the
coreference project and adding them to the standard type system? This is
a pretty standard concept in coreference that doesn't really have a
great natural representation in the current type system -- it
encompasses IdentifiedAnnotations as well as pronouns ("It", "him",
"her") and some determiners ("this").

The drawback I can see is that it is probably not something anyone would
want extracted -- ultimately you want the actual coref pairs or chains.
But it is useful for things like representing gold standard input or
splitting coreference resolution into separate markable recognition and
relation classification steps.

Tim


Re: markable types

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
Hmm, we can check what the current coreference Markable type extends. But my guess is Annotation, for the pronoun use case? A pronoun could be a Markable, but not necessarily an IdentifiedAnnotation.


> On May 16, 2014, at 5:16 PM, "Dligach, Dmitriy" <Dm...@childrens.harvard.edu> wrote:
> 
> Probably a good idea. Would this new type be related to IdentifiedAnnotation? Its super type?
> 
> Dima

Re: markable types

Posted by "Wu, Stephen T., Ph.D." <Wu...@mayo.edu>.
Yeah there's something weird with the mailing system.

stephen

On 5/16/14 4:31 PM, "Dligach, Dmitriy"
<Dm...@childrens.harvard.edu> wrote:

>Weird… I sent this email days ago.
>
>Dima
>
>
>
>
>On May 16, 2014, at 16:16, Dligach, Dmitriy
><Dm...@childrens.harvard.edu> wrote:
>
>> Probably a good idea. Would this new type be related to
>>IdentifiedAnnotation? Its super type?
>> 
>> Dima


Re: markable types

Posted by "Dligach, Dmitriy" <Dm...@childrens.harvard.edu>.
Weird… I sent this email days ago.

Dima




On May 16, 2014, at 16:16, Dligach, Dmitriy <Dm...@childrens.harvard.edu> wrote:

> Probably a good idea. Would this new type be related to IdentifiedAnnotation? Its super type?
> 
> Dima


Re: markable types

Posted by "Dligach, Dmitriy" <Dm...@childrens.harvard.edu>.
Probably a good idea. Would this new type be related to IdentifiedAnnotation? Its super type?

Dima






RE: markable types

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
+1 for a consolidated common type system...
I would go a step further: 'Markable' seems like a pretty general concept, so if folks can think of other uses, maybe we can subclass it (Markable > CoRefMarkable)?
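
One way to picture the proposed hierarchy is the minimal Java sketch below. It is only an illustration of the idea under discussion, not the cTAKES type system: every class name here (SketchAnnotation, SketchIdentified, SketchMarkable, SketchCorefMarkable, MarkableSketch) is a hypothetical stand-in, and real UIMA types would be declared in a type system descriptor rather than written by hand. The point it shows is that a markable built on the plain annotation level can wrap an identified concept or a bare pronoun span.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the cTAKES or UIMA classes.
class SketchAnnotation {
    final int begin;
    final int end;
    SketchAnnotation(int begin, int end) { this.begin = begin; this.end = end; }
}

// Stand-in for an IdentifiedAnnotation-like type: a span tied to a concept.
class SketchIdentified extends SketchAnnotation {
    final String concept;
    SketchIdentified(int begin, int end, String concept) { super(begin, end); this.concept = concept; }
}

// General markable: any span that may take part in some relation.
class SketchMarkable extends SketchAnnotation {
    SketchMarkable(int begin, int end) { super(begin, end); }
}

// Coreference-specific markable; 'source' is null for pronouns and determiners.
class SketchCorefMarkable extends SketchMarkable {
    final SketchIdentified source;
    SketchCorefMarkable(int begin, int end, SketchIdentified source) {
        super(begin, end);
        this.source = source;
    }
}

class MarkableSketch {
    // Build one markable per entity and one per pronoun span.
    static List<SketchCorefMarkable> collect(List<SketchIdentified> entities,
                                             List<SketchAnnotation> pronouns) {
        List<SketchCorefMarkable> out = new ArrayList<>();
        for (SketchIdentified e : entities) out.add(new SketchCorefMarkable(e.begin, e.end, e));
        for (SketchAnnotation p : pronouns) out.add(new SketchCorefMarkable(p.begin, p.end, null));
        return out;
    }

    public static void main(String[] args) {
        // Offsets refer to the text "Aspirin was given. It helped."
        List<SketchIdentified> entities = new ArrayList<>();
        entities.add(new SketchIdentified(0, 7, "MEDICATION"));
        List<SketchAnnotation> pronouns = new ArrayList<>();
        pronouns.add(new SketchAnnotation(19, 21));
        for (SketchCorefMarkable m : collect(entities, pronouns)) {
            String kind = (m.source == null) ? "pronoun/determiner" : m.source.concept;
            System.out.println("[" + m.begin + "," + m.end + "] " + kind);
        }
    }
}
```

Keeping the markable separate from the identified type, with an optional link back to it, is just one possible design; making Markable a supertype of IdentifiedAnnotation would be another.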

> -----Original Message-----
> From: Steven Bethard [mailto:steven.bethard@gmail.com]
> Sent: Sunday, May 11, 2014 8:12 AM
> To: dev@ctakes.apache.org
> Subject: Re: markable types
> 
> I don't think "not something anyone would want extracted" should be an
> argument against anything. We already have constituent and dependency
> parse trees in the type system, and those would fall under that category.
> 
> So +1 on markables in the type system. (In general, +1 on moving module-
> specific types to the standard type system. I'm not sure what the real benefit
> of splitting them out is...)
> 
> Steve

Re: markable types

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.
Again I'm not sure I understand so please clarify if this isn't what you're looking for.

The cTAKES type system represents syntax trees with three types: TopTreebankNode, TreebankNode, and TerminalTreebankNode. Top and Terminal inherit from TreebankNode and add special properties for being the root of a tree or a leaf of a tree (the leaf carries the part-of-speech tag and a word). For most nodes, calling getNodeType() will get you the category you want. For terminal nodes, getNodeType() and getNodeValue() will return the POS tag and the word, respectively. You can get the subtrees of a node with getChildren(), and a specific subtree with getChildren(int), where the int argument is indexed from 0. Each node is also connected to its parent via getParent().

Each node also has its headword denoted by the getHead() method (I think that's right but I'm doing this from memory so you'll have to check), which is an index into the array of _all_ terminals in the sentence. So if tree.getHead() returns 5, you would call getTerminals() on the root tree and take the word at index 5 to get the head of the node held in the variable tree.
The parser works at the sentence level, so a standard thing is to simultaneously get all trees/sentences by doing:
for(TopTreebankNode tree : JCasUtil.select(jcas, TopTreebankNode.class)){
  // do something with this tree
}
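
To make the shape of such a tree concrete, here is a small self-contained sketch. It does not use the cTAKES classes: SketchNode and TreeWalkSketch are hypothetical stand-ins that only mirror the properties described above (a node type, a node value on leaves, children, a parent, and a head index into the sentence's terminals), and the field name headIndex is an assumption made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a treebank node; NOT the cTAKES TreebankNode class.
class SketchNode {
    final String nodeType;   // phrase category, or the POS tag on a leaf
    final String nodeValue;  // the word on a leaf, null on phrase nodes
    final List<SketchNode> children = new ArrayList<>();
    SketchNode parent;
    int headIndex;           // index into the root's list of terminals (set on phrase nodes only)

    SketchNode(String nodeType, String nodeValue) {
        this.nodeType = nodeType;
        this.nodeValue = nodeValue;
    }

    SketchNode add(SketchNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }

    boolean isLeaf() { return children.isEmpty(); }
}

class TreeWalkSketch {
    static SketchNode leaf(String pos, String word) { return new SketchNode(pos, word); }

    // Builds the tree for "The patient takes aspirin".
    static SketchNode example() {
        SketchNode subject = new SketchNode("NP", null).add(leaf("DT", "The")).add(leaf("NN", "patient"));
        subject.headIndex = 1;
        SketchNode object = new SketchNode("NP", null).add(leaf("NN", "aspirin"));
        object.headIndex = 3;
        SketchNode vp = new SketchNode("VP", null).add(leaf("VBZ", "takes")).add(object);
        vp.headIndex = 2;
        SketchNode root = new SketchNode("S", null).add(subject).add(vp);
        root.headIndex = 2;
        return root;
    }

    // Collects the leaves left to right with a depth-first walk.
    static List<SketchNode> terminals(SketchNode node) {
        List<SketchNode> out = new ArrayList<>();
        if (node.isLeaf()) {
            out.add(node);
        } else {
            for (SketchNode child : node.children) out.addAll(terminals(child));
        }
        return out;
    }

    // Renders the tree in bracketed treebank notation.
    static String bracketed(SketchNode node) {
        if (node.isLeaf()) return "(" + node.nodeType + " " + node.nodeValue + ")";
        StringBuilder sb = new StringBuilder("(" + node.nodeType);
        for (SketchNode child : node.children) sb.append(' ').append(bracketed(child));
        return sb.append(')').toString();
    }

    // Looks up a phrase node's head word via the root's terminal list.
    static String headWord(SketchNode root, SketchNode node) {
        return terminals(root).get(node.headIndex).nodeValue;
    }

    public static void main(String[] args) {
        SketchNode root = example();
        System.out.println(bracketed(root));
        System.out.println("head of subject NP: " + headWord(root, root.children.get(0)));
    }
}
```

For the example sentence this prints the bracketed form (S (NP (DT The) (NN patient)) (VP (VBZ takes) (NP (NN aspirin)))) and reports "patient" as the head of the subject NP.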

Hope this helps.
Tim


On May 17, 2014, at 1:54 PM, Anirban Chakraborti wrote:

> Thanks Timothy,
> 
> I get the point, but it would be greatly helpful if you had an illustrative
> example of a tree structure describing the branches and the nodes generated
> by cTAKES. I have got the hang of how to parse the tree now.
> 
> 
> 
> 
> On Thu, May 15, 2014 at 5:03 PM, Miller, Timothy <
> Timothy.Miller@childrens.harvard.edu> wrote:
> 
>> Anir -- I'm not sure I understand your question but from your example it
>> doesn't sound like a tree exactly. If you just want a list of medication
>> concepts you can do something like:
>> 
>> List<MedicationMention> meds = new ArrayList<>(JCasUtil.select(jcas,
>> MedicationMention.class));
>> (I believe MedicationMention is the correct class but check your output.)
>> 
>> If you really do want to put them into a syntax tree, there are also
>> methods for doing that in AnnotationTreeUtils class.
>> 
>> getAnnotationTree(JCas, Annotation) will give you the tree for the whole
>> sentence containing the annotation you give it
>> annotationNode(JCas, Annotation) will give you the smallest subtree
>> covering the annotation you give it.
>> insertAnnotationNode(JCas, TopTreebankNode, Annotation, String) will
>> insert a node into the tree specified at the level specified by the
>> annotation with the category specified by the string. So for example if you
>> had meds as above you could then do:
>> 
>> for(MedicationMention med : meds){
>>   AnnotationTreeUtils.insertAnnotationNode(jcas,
>>       AnnotationTreeUtils.getAnnotationTree(jcas, med), med, "MEDICATION");
>> }
>> 
>> which would insert a new node into every tree with the label "MEDICATION"
>> in every position where a medication was found.
>> 
>> One caveat to the above code is that these methods actually will change
>> the tree in the cas. That might be ok for some use cases, but for many you
>> want to work on a tree outside the cas, so that's why there are also methods:
>> getTreeCopy(JCas, TopTreebankNode)
>> getTreeCopy(JCas, TreebankNode)
>> 
>> If you use the getAnnotationTree method to obtain the tree you want, then
>> you can get a copy from these methods, then use the insert methods and do
>> something with the copies immediately (like print them out), without altering
>> the originals in the cas if other AEs may use them.
>> 
>> Tim
>> 
>> 
>> 
>> ________________________________________
>> From: Anirban Chakraborti [chakraborti.anirban@googlemail.com]
>> Sent: Sunday, May 11, 2014 9:15 AM
>> To: dev@ctakes.apache.org
>> Subject: Re: markable types
>> 
>> Steven,
>> 
>> Would you have any example code of a tree parser so the output can be
>> arranged as needed? I mean, after successful annotation, I want to
>> extract certain concepts like medication only and arrange them in a new
>> tree so that all annotations referring to a medication concept and their
>> sources are listed together.
>> 
>> Anir


Re: markable types

Posted by Anirban Chakraborti <ch...@googlemail.com>.
Thanks Timothy,

I get the point, but it would be greatly helpful if you had an illustrative
example of a tree structure describing the branches and the nodes generated
by cTAKES. I have got the hang of how to parse the tree now.





Re: markable types

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.
On 05/20/2014 07:24 AM, Anirban Chakraborti wrote:
> Here it is
>
> 1. The Ctakes typesystem represents syntax trees with three types:
> TopTreebankNode, TreebankNode, and TerminalTreebankNode - Understood.
>
> 2. The parser works at the sentence level, so a standard thing is to
> simultaneously get all trees/sentences by doing:
> for(TopTreebankNode tree : JCasUtil.select(jcas, TopTreebankNode.class)) -
> Understood
>
> My question is that a single word in a sentence may belong to various
> types simultaneously. How does the associated type class get stored in the
> nodes of the tree so that when we parse the tree/sentence, we can select the
> type of interest and its associated features/attributes?
>
> What I want to understand is what the key/value pairs of each node are.
>
> Basically so that the following code works
>
> List<DiseaseDisorderMention> disease = new
> ArrayList<>(JCasUtil.select(jcas, DiseaseDisorderMention.class));  //
> DiseaseDisorderMention is the selected typeclass to be extracted
>
Ok, I think I understand a bit. The way all UIMA annotations work is
they just represent spans on a string, so yes, there can be multiple
annotations for a given word. In fact, it may be a little misleading
to even use "word" there since UIMA has no sense of words, just strings
with character offsets.

So the above code should work to get DiseaseDisorderMention types. And
there is really no relation to the parse tree in the way the processing
works. So the extractor runs and creates a bunch of
DiseaseDisorderMention spans, then the parser runs and creates a bunch
of TreebankNode spans, and never the twain shall meet unless you supply
some code to bring them together.

So if you're looking for an easy way to navigate a parse tree and find
the named entities in it or vice versa, it's possible to do but it's not
automatic. You would probably want to start with the utility classes I
pointed you to earlier, possibly make your own modifications, and maybe
you would even need to create your own derived types to represent what
you're interested in.
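
As a rough illustration of what "bringing them together" could look like, the sketch below matches a mention to the smallest tree node that covers it, using nothing but character offsets. It is a self-contained toy, not cTAKES code: OffsetSpan and SpanLinkSketch are hypothetical stand-ins, and the node spans are typed in by hand. In a real pipeline, uimaFIT's JCasUtil offers covering/covered selection helpers that work on the same offset principle, and the AnnotationTreeUtils methods mentioned earlier would be the place to start.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for any UIMA-style annotation: a label plus character offsets.
class OffsetSpan {
    final String label;
    final int begin;
    final int end;   // exclusive

    OffsetSpan(String label, int begin, int end) {
        this.label = label;
        this.begin = begin;
        this.end = end;
    }

    boolean covers(OffsetSpan other) { return begin <= other.begin && end >= other.end; }

    int length() { return end - begin; }
}

class SpanLinkSketch {
    // Returns the shortest node span that fully covers the mention, or null if none does.
    static OffsetSpan smallestCovering(List<OffsetSpan> nodes, OffsetSpan mention) {
        OffsetSpan best = null;
        for (OffsetSpan node : nodes) {
            if (node.covers(mention) && (best == null || node.length() < best.length())) {
                best = node;
            }
        }
        return best;
    }

    // Hand-written phrase spans for "The patient takes aspirin daily."
    static List<OffsetSpan> demoNodes() {
        List<OffsetSpan> nodes = new ArrayList<>();
        nodes.add(new OffsetSpan("S", 0, 32));
        nodes.add(new OffsetSpan("NP", 0, 11));
        nodes.add(new OffsetSpan("VP", 12, 31));
        nodes.add(new OffsetSpan("NP", 18, 25));
        return nodes;
    }

    public static void main(String[] args) {
        String text = "The patient takes aspirin daily.";
        // A mention produced independently of the parser, over the same text.
        OffsetSpan mention = new OffsetSpan("MEDICATION", 18, 25);
        OffsetSpan node = smallestCovering(demoNodes(), mention);
        System.out.println(text.substring(mention.begin, mention.end)
                + " -> " + node.label + "[" + node.begin + "," + node.end + "]");
    }
}
```

Because both kinds of annotation are only spans over the same string, the match needs no shared keys or node attributes; the offsets are the join condition.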

Tim



>
> Hope I am clearer this time
>
>  Anir
>
>
>
>
> On Tue, May 20, 2014 at 4:32 PM, Miller, Timothy <
> Timothy.Miller@childrens.harvard.edu> wrote:
>
>> I don't understand this question. Can you try to rephrase it? Or maybe if
>> you tell me what you want to do that would help me understand.
>>
>> ________________________________________
>> From: Anirban Chakraborti [chakraborti.anirban@googlemail.com]
>> Sent: Tuesday, May 20, 2014 6:34 AM
>> To: dev@ctakes.apache.org
>> Subject: Re: markable types
>>
>> thanks again Timothy
>>
>> final question for now
>>
>> You had explained that each sentence is parsed and is converted to a
>> tree with head and terminal node. Is the typesystem of cTAKES a feature
>> of the node, i.e. can one node belong to two or more typesystems and their
>> further attributes, OR for each type system is there a syntax tree for
>> every sentence parsed? I mean a sentence has various trees attached to it
>> but there is a 1:1 mapping between the node and typesystem.
>> Anir
>>
>>
>> On Tue, May 20, 2014 at 2:17 AM, Miller, Timothy <
>> Timothy.Miller@childrens.harvard.edu> wrote:
>>
>>> On 05/18/2014 07:40 AM, Anirban Chakraborti wrote:
>>>> Timothy,
>>>>
>>>> 1. So to get concepts of procedure, lab (if any), disease disorder,
>>>> sign symptoms, anatomical sites, I would need to do
>>>>
>>>> List<MedicationMention> meds = new ArrayList<>(JCasUtil.select(jcas,
>>>> MedicationMention.class));
>>>> List<DiseaseDisorderMention> disease = new
>>>> ArrayList<>(JCasUtil.select(jcas, DiseaseDisorderMention.class));
>>>> List<SignSymptomMention> signs = new ArrayList<>(JCasUtil.select(jcas,
>>>> SignSymptomMention.class));
>>>> List<AnatomicalSiteMention> anatomy = new
>>>> ArrayList<>(JCasUtil.select(jcas, AnatomicalSiteMention.class));
>>>> List<LabMention> labs = new ArrayList<>(JCasUtil.select(jcas,
>>>> LabMention.class));
>>>>
>>>> then check the size of the lists {meds, disease, signs, anatomy, labs},
>>>> print out each list, or make a new list using the java.util.List or
>>>> java.util.ArrayList types as the case might be. Right...
>>> yep
>>>> 2. I am more interested in the IdentifiedAnnotation class. However, there
>>>> are concepts like FractionAnnotation which are not defined as an enum in
>>>> const.java. How do I handle them? Do I need to add them to the const.java
>>>> file?
>>> nope, you probably just want EntityMention (for anatomical sites) and
>>> EventMention (for all clinical events, including DiseaseDisorder,
>>> Procedure, SignSymptom, etc.).
>>>
>>>> 3. What exactly is the functional difference between, say,
>>>> MedicationEventMention.java, MedicationMention.java, Medication.java and
>>>> MedicationEventMention_type.java? I understand a similar difference exists
>>>> between the classes for lab, procedure, etc.
>>> The types ending in _type.java are UIMA-internal types, you can ignore.
>>> Medication is a referential type -- something in the real world that
>>> could be referred to multiple times in a document. What you probably
>>> want are the mention types. Here I believe MedicationMention is the
>>> preferred type going forward for a particular mention of a medication in
>>> text (MedicationEventMention is the same thing but not preferred going
>>> forward).
>>>
>>>
>>>> 4. You had explained that each sentence is parsed and is converted to a
>>>> tree with head and terminal node. Is the typesystem of cTAKES a feature
>>>> of the node, i.e. can one node belong to two or more typesystems and their
>>>> further attributes, OR for each type system is there a syntax tree for
>>>> every sentence parsed? I mean a sentence has various trees attached to it
>>>> but there is a 1:1 mapping between the node and typesystem.
>>>>
>>>> Many Thanks
>>>>
>>>> Anirban
>>> --
>>> Tim Miller
>>> Instructor
>>> Boston Children's Hospital and Harvard Medical School
>>> timothy.miller@childrens.harvard.edu
>>> 617-919-1223
>>>
>>>

-- 
Tim Miller
Instructor
Boston Children's Hospital and Harvard Medical School
timothy.miller@childrens.harvard.edu
617-919-1223


Re: markable types

Posted by Anirban Chakraborti <ch...@googlemail.com>.
Here it is

1. The Ctakes typesystem represents syntax trees with three types:
TopTreebankNode, TreebankNode, and TerminalTreebankNode - Understood.

2. The parser works at the sentence level, so a standard thing is to
simultaneously get all trees/sentences by doing:
for(TopTreebankNode tree : JCasUtil.select(jcas, TopTreebankNode.class)) -
Understood

My question is that a single word in a sentence may belong to various
types simultaneously. How does the associated type class get stored in the
nodes of the tree so that when we parse the tree/sentence, we can select the
type of interest and its associated features/attributes?

What I want to understand is what the key/value pairs of each node are.

Basically so that the following code works

List<DiseaseDisorderMention> disease = new
ArrayList<>(JCasUtil.select(jcas, DiseaseDisorderMention.class));  //
DiseaseDisorderMention is the selected typeclass to be extracted



Hope I am clearer this time

 Anir




On Tue, May 20, 2014 at 4:32 PM, Miller, Timothy <
Timothy.Miller@childrens.harvard.edu> wrote:

> I don't understand this question. Can you try to rephrase it? Or maybe if
> you tell me what you want to do that would help me understand.
>
> ________________________________________
> From: Anirban Chakraborti [chakraborti.anirban@googlemail.com]
> Sent: Tuesday, May 20, 2014 6:34 AM
> To: dev@ctakes.apache.org
> Subject: Re: markable types
>
> thanks again Timothy
>
> final question for now
>
> You had explained that each sentence is parsed and is converted to a
> tree with head and terminal node. Is the typesystem of cTAKES a feature
> of the node, i.e. can one node belong to two or more typesystems and their
> further attributes, OR for each type system is there a syntax tree for
> every sentence parsed? I mean a sentence has various trees attached to it
> but there is a 1:1 mapping between the node and typesystem.
>
> Anir
>
>
> On Tue, May 20, 2014 at 2:17 AM, Miller, Timothy <
> Timothy.Miller@childrens.harvard.edu> wrote:
>
> >
> > On 05/18/2014 07:40 AM, Anirban Chakraborti wrote:
> > > Timothy,
> > >
> > > 1. so to get concepts of procedure, lab (if any), disease disorder ,
> sign
> > > symptoms, Anatomical sites , I would need to do
> > >
> > > List<MedicationMention> meds = new ArrayList<>(JCasUtil.select(jcas,
> > > MedicationMention.class));
> > > List<DiseaseDisorderMention> disease = new
> > > ArrayList<>(JCasUtil.select(jcas, DiseaseDisorderMention.class));
> > > List<SignSymptomMention> signs = new ArrayList<>(JCasUtil.select(jcas,
> > > SignSymptomMention.class));
> > > List<AnatomicalSiteMention> anatomy = new
> > > ArrayList<>(JCasUtil.select(jcas, AnatomicalSiteMention.class));
> > > List<LabMention> labs = new ArrayList<>(JCasUtil.select(jcas,
> > > LabMention.class));
> > >
> > > then check the size of the array { meds, disease, signs, anatomy ,
> labs}
> > ,
> > > print out the array or make a new array using the Java.utils.List or
> > > Java.utils.Arraylist  package interfaces as the case might me.  Right
> ...
> > yep
> > > 2. I am more interested in the IdentifiedAnnotation class. However
> there
> > > are concepts like FractionAnnotation which are not defined enum in the
> > > const.java. How do I handle them. Do I need to add to the const.java
> > file.
> > nope, you probably just want EntityMention (for anatomical sites) and
> > EventMention (for all clinical events, including DiseaseDisorder,
> > Procedure, SignSymptom, etc.).
> >
> > >
> > > 3. what exactly is the functional difference between say
> > > MedicationEventMention .java, MedicationMention.java, Medication.java
> and
> > > MedicationEventMention_type.java .  I understand similar difference is
> > > between class of lab, procedure etc...
> > The types ending in _type.java are UIMA-internal types, you can ignore.
> > Medication is a referential type -- something in the real world that
> > could be referred to multiple times in a document. What you probably
> > want are the mention types. Here I believe MedicationMention is the
> > preferred type going forward for a particular mention of a medication in
> > text (MedicationEventMention is the same thing but not preferred going
> > forward).
> >
> >
> > >
> > > 4.  You had explained that each sentence is parsed and is converted to
> a
> > > tree with head and terminal node . Is the typesystem of ctakes an
> feature
> > > of the node, i.e can one node belong to two more typesystems and their
> > > further attributes OR for each type system , there is a syntax tree for
> > > every sentence parsed. I mean a sentence has various trees attached to
> it
> > > but there is 1:1 mapping between the node and typesystem.
> > >
> > > Many Thanks
> > >
> > > Anirban
> > >
> > >
> > >
> > >
> > >
> > > On Thu, May 15, 2014 at 5:03 PM, Miller, Timothy <
> > > Timothy.Miller@childrens.harvard.edu> wrote:
> > >
> > >> Anir -- I'm not sure I understand your question but from your example
> it
> > >> doesn't sound like a tree exactly. If you just want a list of
> medication
> > >> concepts you can do something like:
> > >>
> > >> List<MedicationMention> meds = new ArrayList<>(JCasUtil.select(jcas,
> > >> MedicationMention.class));
> > >> (I believe MedicationMention is the correct class but check your
> > output.)
> > >>
> > >> If you really do want to put them into a syntax tree, there are also
> > >> methods for doing that in AnnotationTreeUtils class.
> > >>
> > >> getAnnotationTree(JCas, Annotation) will give you the tree for the
> whole
> > >> sentence containing the annotation you give it
> > >> annotationNode(JCas, Annotation) will give you the smallest subtree
> tree
> > >> covering the annotation you give it.
> > >> insertAnnotationNode(JCas, TopTreebankNode, Annotation, String) will
> > >> insert a node into the tree specified at the level specified by the
> > >> annotation with the category specified by the string. So for example
> if
> > you
> > >> had meds as above you could then do:
> > >>
> > >> for(MedicationMention med : meds){
> > >>   AnnotationTreeUtils.insertAnnotationNode(jcas,
> > >> AnnotationTreeUtils.getAnnotationTree(jcas, med), med, "MEDICATION")
> > >> }
> > >>
> > >> which would insert a new node into every tree with the label
> > "MEDICATION"
> > >> in every position where a medication was found.
> > >>
> > >> One caveat to the above code is that these methods actually will
> change
> > >> the tree in the cas. That might be ok for some use cases but for many
> > you
> > >> want to work on a tree outside the cas so that's why there is also
> > methods:
> > >> getTreeCopy(JCas, TopTreebankNode)
> > >> getTreeCopy(JCas, TreebankNode)
> > >>
> > >> if you use the getAnnotationTree method to obtain the tree you want,
> > then
> > >> you can get a copy from these methods, then use the insert methods and
> > do
> > >> something with them immediately (like print them out), without
> altering
> > the
> > >> originals in the cas if other AEs may use them.
> > >>
> > >> Tim
> > >>
> > >>
> > >>
> > >> ________________________________________
> > >> From: Anirban Chakraborti [chakraborti.anirban@googlemail.com]
> > >> Sent: Sunday, May 11, 2014 9:15 AM
> > >> To: dev@ctakes.apache.org
> > >> Subject: Re: markable types
> > >>
> > >> Steven,
> > >>
> > >> Would you have any example code of tree parser so the output can be
> > >> arranged as per need. I mean, after successful annotation, I want to
> > >> extract certain concepts like medication only and arrange them in a
> new
> > >> tree so that all annotation in reference to medication concept and
> their
> > >> sources are listed together.
> > >>
> > >> Anir
> > >>
> > >>
> > >> On Sun, May 11, 2014 at 3:55 PM, Steven Bethard <
> > steven.bethard@gmail.com
> > >>> wrote:
> > >>> I don't think "not something anyone would want extracted" should be
> an
> > >>> argument against anything. We already have constituent and dependency
> > >>> parse trees in the type system, and those would fall under that
> > >>> category.
> > >>>
> > >>> So +1 on markables in the type system. (In general, +1 on moving
> > >>> module-specific types to the standard type system. I'm not sure what
> > >>> the real benefit of splitting them out is...)
> > >>>
> > >>> Steve
> > >>>
> > >>> On Fri, May 9, 2014 at 11:53 AM, Miller, Timothy
> > >>> <Ti...@childrens.harvard.edu> wrote:
> > >>>> What do people think about taking the "markable" types out of the
> > >>>> coreference project and adding them to the standard type system?
> This
> > >> is
> > >>>> a pretty standard concept in coreference that doesn't really have a
> > >>>> great natural representation in the current type system -- it
> > >>>> encompasses IdentifiedAnnotations as well as pronouns ("It", "him",
> > >>>> "her") and some determiners ("this").
> > >>>>
> > >>>> The drawback I can see is that it is probably not something anyone
> > >> would
> > >>>> want extracted -- ultimately you want the actual coref pairs or
> > chains.
> > >>>> But it is useful for things like representing gold standard input or
> > >>>> splitting coreference resolution into separate markable recognition
> > and
> > >>>> relation classification steps.
> > >>>>
> > >>>> Tim
> > >>>>
> >
> > --
> > Tim Miller
> > Instructor
> > Boston Children's Hospital and Harvard Medical School
> > timothy.miller@childrens.harvard.edu
> > 617-919-1223
> >
> >
>

RE: markable types

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.
I don't understand this question. Can you try to rephrase it? Or maybe if you tell me what you want to do that would help me understand.

________________________________________
From: Anirban Chakraborti [chakraborti.anirban@googlemail.com]
Sent: Tuesday, May 20, 2014 6:34 AM
To: dev@ctakes.apache.org
Subject: Re: markable types

thanks again Timothy

final question for now

You had explained that each sentence is parsed and converted to a
tree with head and terminal nodes. Is the type system of cTAKES a feature
of the node, i.e., can one node belong to two or more type systems and their
further attributes? Or, for each type system, is there a syntax tree for
every sentence parsed? I mean, a sentence has various trees attached to it,
but there is a 1:1 mapping between the node and the type system.

Anir



Re: markable types

Posted by Anirban Chakraborti <ch...@googlemail.com>.
thanks again Timothy

final question for now

You had explained that each sentence is parsed and converted to a
tree with head and terminal nodes. Is the type system of cTAKES a feature
of the node, i.e., can one node belong to two or more type systems and their
further attributes? Or, for each type system, is there a syntax tree for
every sentence parsed? I mean, a sentence has various trees attached to it,
but there is a 1:1 mapping between the node and the type system.

Anir



Re: markable types

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.
On 05/18/2014 07:40 AM, Anirban Chakraborti wrote:
> Timothy,
>
> 1. So to get concepts of procedure, lab (if any), disease disorder, sign
> symptoms, and anatomical sites, I would need to do
>
> List<MedicationMention> meds = new ArrayList<>(JCasUtil.select(jcas,
> MedicationMention.class));
> List<DiseaseDisorderMention> disease = new
> ArrayList<>(JCasUtil.select(jcas, DiseaseDisorderMention.class));
> List<SignSymptomMention> signs = new ArrayList<>(JCasUtil.select(jcas,
> SignSymptomMention.class));
> List<AnatomicalSiteMention> anatomy = new
> ArrayList<>(JCasUtil.select(jcas, AnatomicalSiteMention.class));
> List<LabMention> labs = new ArrayList<>(JCasUtil.select(jcas,
> LabMention.class));
>
> then check the size of each list {meds, disease, signs, anatomy, labs},
> and print it out or build a new list using java.util.List or
> java.util.ArrayList as the case might be. Right?
yep
> 2. I am more interested in the IdentifiedAnnotation class. However, there
> are concepts like FractionAnnotation which are not defined as an enum in
> const.java. How do I handle them? Do I need to add them to the const.java file?
nope, you probably just want EntityMention (for anatomical sites) and
EventMention (for all clinical events, including DiseaseDisorder,
Procedure, SignSymptom, etc.).
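
A minimal sketch of the point above, assuming that selecting by type also returns instances of its subtypes, so asking for the event supertype yields the disease and sign/symptom mentions together. The classes below are stand-ins written only for this illustration (they are not the cTAKES type system), and select() is a hypothetical analogue of JCasUtil.select, not the real call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in types only: they mirror the hierarchy described in this thread
// (disease and sign/symptom mentions are events, anatomical sites are
// entities) but are NOT the real cTAKES classes.
final class SelectByTypeSketch {
    static class IdentifiedAnnotation {
        final String text;
        IdentifiedAnnotation(String text) { this.text = text; }
    }
    static class EventMention extends IdentifiedAnnotation {
        EventMention(String text) { super(text); }
    }
    static class EntityMention extends IdentifiedAnnotation {
        EntityMention(String text) { super(text); }
    }
    static class DiseaseDisorderMention extends EventMention {
        DiseaseDisorderMention(String text) { super(text); }
    }
    static class SignSymptomMention extends EventMention {
        SignSymptomMention(String text) { super(text); }
    }
    static class AnatomicalSiteMention extends EntityMention {
        AnatomicalSiteMention(String text) { super(text); }
    }

    // Hypothetical analogue of a select-by-type call: keep every annotation
    // that is an instance of the requested class, subclasses included.
    static <T> List<T> select(List<? extends IdentifiedAnnotation> index, Class<T> type) {
        List<T> out = new ArrayList<>();
        for (IdentifiedAnnotation a : index) {
            if (type.isInstance(a)) {
                out.add(type.cast(a));
            }
        }
        return out;
    }

    // Made-up sample annotations for the demonstration.
    static List<IdentifiedAnnotation> sampleIndex() {
        return Arrays.<IdentifiedAnnotation>asList(
                new DiseaseDisorderMention("pneumonia"),
                new SignSymptomMention("cough"),
                new AnatomicalSiteMention("left lung"));
    }

    public static void main(String[] args) {
        List<IdentifiedAnnotation> index = sampleIndex();
        for (EventMention e : select(index, EventMention.class)) {
            System.out.println("event:  " + e.text);
        }
        for (EntityMention e : select(index, EntityMention.class)) {
            System.out.println("entity: " + e.text);
        }
    }
}
```

With the sample data, the event query returns the disease and the symptom, and the entity query returns only the anatomical site, so two supertype queries cover everything without listing each concrete mention class.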

>
> 3. What exactly is the functional difference between, say,
> MedicationEventMention.java, MedicationMention.java, Medication.java, and
> MedicationEventMention_type.java? I understand a similar difference exists
> between the classes for lab, procedure, etc.
The types ending in _type.java are UIMA-internal types that you can ignore.
Medication is a referential type -- something in the real world that
could be referred to multiple times in a document. What you probably
want are the mention types. Here I believe MedicationMention is the
preferred type going forward for a particular mention of a medication in
text (MedicationEventMention is the same thing but not preferred going
forward).
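
To make the mention/referent distinction concrete, here is a rough sketch under stated assumptions: a mention is one span of text, and a referent is the real-world thing that several mentions may point to. The Mention class and groupByReferent() are hypothetical names invented for this example, and grouping by lowercased text is only a placeholder; how cTAKES actually links mentions to referential types is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class MentionVsReferentSketch {
    // Stand-in for a mention type: one particular span of document text.
    static final class Mention {
        final int begin;
        final int end;
        final String text;
        Mention(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    // Group mentions under one key per referent. Lowercasing the covered
    // text is a placeholder assumption for "refers to the same thing".
    static Map<String, List<Mention>> groupByReferent(List<Mention> mentions) {
        Map<String, List<Mention>> byReferent = new LinkedHashMap<>();
        for (Mention m : mentions) {
            String key = m.text.toLowerCase(Locale.ROOT);
            byReferent.computeIfAbsent(key, k -> new ArrayList<>()).add(m);
        }
        return byReferent;
    }

    // Made-up sample: the same drug mentioned twice, another drug once.
    static List<Mention> sample() {
        return Arrays.asList(
                new Mention(0, 7, "Aspirin"),
                new Mention(20, 29, "ibuprofen"),
                new Mention(40, 47, "aspirin"));
    }

    public static void main(String[] args) {
        for (Map.Entry<String, List<Mention>> e : groupByReferent(sample()).entrySet()) {
            System.out.println(e.getKey() + " is mentioned " + e.getValue().size() + " time(s)");
        }
    }
}
```

Three mentions collapse into two referents here, which is why extraction code usually iterates over the mention types and only consults the referential type when it needs document-level identity.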


>
> 4. You had explained that each sentence is parsed and converted to a
> tree with head and terminal nodes. Is the type system of cTAKES a feature
> of the node, i.e., can one node belong to two or more type systems and their
> further attributes? Or, for each type system, is there a syntax tree for
> every sentence parsed? I mean, a sentence has various trees attached to it,
> but there is a 1:1 mapping between the node and the type system.
>
> Many Thanks
>
> Anirban

-- 
Tim Miller
Instructor
Boston Children's Hospital and Harvard Medical School
timothy.miller@childrens.harvard.edu
617-919-1223


Re: markable types

Posted by Anirban Chakraborti <ch...@googlemail.com>.
Timothy,

1. So to get concepts of procedure, lab (if any), disease disorder, sign
symptoms, and anatomical sites, I would need to do

List<MedicationMention> meds = new ArrayList<>(JCasUtil.select(jcas,
MedicationMention.class));
List<DiseaseDisorderMention> disease = new
ArrayList<>(JCasUtil.select(jcas, DiseaseDisorderMention.class));
List<SignSymptomMention> signs = new ArrayList<>(JCasUtil.select(jcas,
SignSymptomMention.class));
List<AnatomicalSiteMention> anatomy = new
ArrayList<>(JCasUtil.select(jcas, AnatomicalSiteMention.class));
List<LabMention> labs = new ArrayList<>(JCasUtil.select(jcas,
LabMention.class));

then check the size of each list {meds, disease, signs, anatomy, labs},
and print it out or build a new list using java.util.List or
java.util.ArrayList as the case might be. Right?

2. I am more interested in the IdentifiedAnnotation class. However, there
are concepts like FractionAnnotation which are not defined as an enum in
const.java. How do I handle them? Do I need to add them to the const.java file?


3. What exactly is the functional difference between, say,
MedicationEventMention.java, MedicationMention.java, Medication.java, and
MedicationEventMention_type.java? I understand a similar difference exists
between the classes for lab, procedure, etc.

4. You had explained that each sentence is parsed and converted to a
tree with head and terminal nodes. Is the type system of cTAKES a feature
of the node, i.e., can one node belong to two or more type systems and their
further attributes? Or, for each type system, is there a syntax tree for
every sentence parsed? I mean, a sentence has various trees attached to it,
but there is a 1:1 mapping between the node and the type system.

Many Thanks

Anirban






RE: markable types

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.
Anir -- I'm not sure I understand your question, but from your example it doesn't sound like you need a tree exactly. If you just want a list of medication concepts, you can do something like:

List<MedicationMention> meds = new ArrayList<>(JCasUtil.select(jcas, MedicationMention.class));
(I believe MedicationMention is the correct class but check your output.)
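
If the goal is only to have each medication concept listed together with the places it was found, a flat grouping may be enough. The following is a minimal sketch of that idea, not cTAKES code: the Mention class, its fields, and the concept ids "CUI-A" and "CUI-B" are all made-up stand-ins for whatever you read off your MedicationMention annotations (concept id, covered text, begin/end offsets).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MedicationGrouping {

    // Hypothetical stand-in for a medication annotation: a concept id plus
    // the covered text and its character offsets in the source document.
    static final class Mention {
        final String conceptId;
        final String text;
        final int begin;
        final int end;

        Mention(String conceptId, String text, int begin, int end) {
            this.conceptId = conceptId;
            this.text = text;
            this.begin = begin;
            this.end = end;
        }
    }

    // Groups mentions by concept id. LinkedHashMap keeps concepts in
    // first-seen order; each list keeps the mentions in input order.
    static Map<String, List<Mention>> groupByConcept(List<Mention> mentions) {
        Map<String, List<Mention>> groups = new LinkedHashMap<>();
        for (Mention m : mentions) {
            groups.computeIfAbsent(m.conceptId, k -> new ArrayList<>()).add(m);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up example data; the ids are placeholders, not real codes.
        List<Mention> meds = Arrays.asList(
            new Mention("CUI-A", "aspirin", 10, 17),
            new Mention("CUI-B", "ibuprofen", 30, 39),
            new Mention("CUI-A", "ASA", 55, 58));

        for (Map.Entry<String, List<Mention>> e : groupByConcept(meds).entrySet()) {
            System.out.println(e.getKey());
            for (Mention m : e.getValue()) {
                System.out.println("  " + m.text + " [" + m.begin + "," + m.end + ")");
            }
        }
    }
}
```

In a real pipeline you would presumably fill the list from the annotations selected above rather than from hard-coded values.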

If you really do want to put them into a syntax tree, there are also methods for doing that in the AnnotationTreeUtils class:

getAnnotationTree(JCas, Annotation) will give you the tree for the whole sentence containing the annotation you give it.
annotationNode(JCas, Annotation) will give you the smallest subtree covering the annotation you give it.
insertAnnotationNode(JCas, TopTreebankNode, Annotation, String) will insert a node into the given tree at the level determined by the annotation, with the category given by the string.

So for example, if you had meds as above, you could then do:

for (MedicationMention med : meds) {
  AnnotationTreeUtils.insertAnnotationNode(jcas, AnnotationTreeUtils.getAnnotationTree(jcas, med), med, "MEDICATION");
}

which would insert a new node labeled "MEDICATION" into the sentence tree at every position where a medication was found.

One caveat to the above code is that these methods actually change the tree in the CAS. That might be OK for some use cases, but for many you want to work on a tree outside the CAS, which is why there are also these methods:
getTreeCopy(JCas, TopTreebankNode)
getTreeCopy(JCas, TreebankNode)

If you use the getAnnotationTree method to obtain the tree you want, you can get a copy from these methods, then use the insert methods on the copy and do something with it immediately (like print it out), without altering the originals in the CAS, which other AEs may use.
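
To make the copy-then-modify point concrete, here is a small self-contained sketch of the general pattern. It does not use the cTAKES classes: Node, copy, and wrapChild are hypothetical stand-ins, and wrapChild is only an illustrative in-place edit, not a claim about what insertAnnotationNode does internally.

```java
import java.util.ArrayList;
import java.util.List;

public class TreeCopyDemo {

    // Hypothetical stand-in for a parse tree node; not the cTAKES TreebankNode.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }

        // Appends a child and returns this node so calls can be chained.
        Node add(Node child) {
            children.add(child);
            return this;
        }

        // Bracketed rendering: leaves print as their label, inner nodes as
        // "(LABEL child child ...)".
        String toBracketString() {
            if (children.isEmpty()) {
                return label;
            }
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node c : children) {
                sb.append(' ').append(c.toBracketString());
            }
            return sb.append(')').toString();
        }
    }

    // Deep copy: every node in the result is a new object.
    static Node copy(Node n) {
        Node c = new Node(n.label);
        for (Node child : n.children) {
            c.children.add(copy(child));
        }
        return c;
    }

    // In-place edit: wraps the child at the given index in a new labeled node.
    static void wrapChild(Node parent, int index, String label) {
        Node wrapper = new Node(label).add(parent.children.get(index));
        parent.children.set(index, wrapper);
    }

    public static void main(String[] args) {
        Node original = new Node("S")
            .add(new Node("NP").add(new Node("NN").add(new Node("aspirin"))))
            .add(new Node("VP").add(new Node("VBD").add(new Node("helped"))));

        Node working = copy(original);          // work on a copy...
        wrapChild(working, 0, "MEDICATION");    // ...so this edit stays local

        System.out.println(working.toBracketString());
        System.out.println(original.toBracketString());
    }
}
```

The second printed line shows the original tree unchanged, which is the property you want when other components still read the tree stored in the CAS.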

Tim




Re: markable types

Posted by Anirban Chakraborti <ch...@googlemail.com>.
Steven,

Would you have any example code for a tree parser so that the output can
be arranged as needed? I mean, after successful annotation, I want to
extract only certain concepts, such as medications, and arrange them in a
new tree so that all annotations referring to a medication concept and
their sources are listed together.

Anir



Re: markable types

Posted by Steven Bethard <st...@gmail.com>.
I don't think "not something anyone would want extracted" should be an
argument against anything. We already have constituent and dependency
parse trees in the type system, and those would fall under that
category.

So +1 on markables in the type system. (In general, +1 on moving
module-specific types to the standard type system. I'm not sure what
the real benefit of splitting them out is...)

Steve
