Posted to user@uima.apache.org by Keith Suderman <su...@anc.org> on 2009/05/07 22:10:02 UTC

Representing a DAG in UIMA

What is the best way to represent a tree and/or DAG in UIMA?  Is 
there a "standard" way to represent these structures?  I've tried 
searching the web, but all I can find is some discussion on the OASIS 
list from 2007.

Currently we declare an annotation type with two special features, 
children and ancestor, that are arrays of annotations.  We then 
populate these arrays with the children and ancestor 
annotations.  Everything works fine, but I wanted to know if there 
was a better way, or at least a generally accepted way, of accomplishing this.
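
For readers following along, the arrangement described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class DagNodeSketch and the helper link() are hypothetical names, and ordinary lists stand in for the arrays of annotations. It does not use the UIMA API or an actual type system descriptor.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for an annotation type with "children" and
// "ancestor" features. Hypothetical names; this is not the UIMA API.
class DagNodeSketch {
    final String label;
    // Stand-ins for the two array-valued features described in the post.
    final List<DagNodeSketch> children = new ArrayList<>();
    final List<DagNodeSketch> ancestor = new ArrayList<>();

    DagNodeSketch(String label) {
        this.label = label;
    }

    // Record one parent-to-child edge in both directions.
    static void link(DagNodeSketch parent, DagNodeSketch child) {
        parent.children.add(child);
        child.ancestor.add(parent);
    }

    public static void main(String[] args) {
        DagNodeSketch s = new DagNodeSketch("S");
        DagNodeSketch np = new DagNodeSketch("NP");
        DagNodeSketch vp = new DagNodeSketch("VP");
        DagNodeSketch shared = new DagNodeSketch("shared");
        link(s, np);
        link(s, vp);
        // A node with two parents is what makes this a DAG and not a tree.
        link(np, shared);
        link(vp, shared);
        System.out.println(shared.ancestor.size()); // prints 2
    }
}
```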

Thanks,
Keith Suderman

--------------------------------------------------
Research Associate
American National Corpus
http://www.anc.org

Re: Representing a DAG in UIMA

Posted by Keith Suderman <su...@anc.org>.

Hi Richard,

It is good to hear from you.  We are still plugging away, and you are 
correct, we are just finishing our work on importing GrAF (Graph 
Annotation Framework) models into UIMA. I just wanted to do a quick 
sanity check in case I was missing anything obvious.

At 01:23 AM 5/8/2009, Richard Eckart wrote:
>The type systems I know do it in the same manner you do, only they may use
>just a "children" or just a "parents" feature and assume that an edge
>is always directed from parent to child - thus having both would be
>redundant.

Redundant, yes, but having both allows easy navigation in both 
directions from a particular annotation. It is the classic space vs. 
speed tradeoff, and we do a lot of things like finding the target of a 
given FrameNet annotation and then determining which part of the syntax 
tree it belongs to.
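
As a rough illustration of why the upward links help, here is a plain-Java sketch of walking from a target node to everything that dominates it. The names UpwardWalkSketch, Node, link() and dominatingLabels() are hypothetical, the tiny syntax tree is made up for the example, and nothing here is UIMA or GrAF API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: navigate upward through "ancestor" links.
class UpwardWalkSketch {
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        final List<Node> ancestor = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }
    }

    // Store each edge in both directions (the redundant part).
    static void link(Node parent, Node child) {
        parent.children.add(child);
        child.ancestor.add(parent);
    }

    // Breadth-first walk over "ancestor" links. The visited set ensures a
    // node reachable along several paths in the DAG is reported only once.
    static Set<String> dominatingLabels(Node start) {
        Set<Node> seen = new LinkedHashSet<>();
        Deque<Node> queue = new ArrayDeque<>(start.ancestor);
        while (!queue.isEmpty()) {
            Node n = queue.removeFirst();
            if (seen.add(n)) {
                queue.addAll(n.ancestor);
            }
        }
        Set<String> labels = new LinkedHashSet<>();
        for (Node n : seen) {
            labels.add(n.label);
        }
        return labels;
    }

    public static void main(String[] args) {
        Node s = new Node("S");
        Node vp = new Node("VP");
        Node v = new Node("V"); // imagine this is the FrameNet target
        link(s, vp);
        link(vp, v);
        System.out.println(dominatingLabels(v)); // prints [VP, S]
    }
}
```

With only a "children" feature, the same question would need a search from the root (or an index built beforehand), which is the speed side of the tradeoff.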

>Actually the CAS is a DAG. You have an edge whenever a feature
>structure references another feature structure.

Yes, I should have phrased my question more carefully. I was 
wondering about more explicit representations. For example, I see 
there is an AnnotationTreeNode interface and was hoping for a similar 
GraphNode interface.

I like your idea of expressing edges with an explicit 'Edge' feature 
structure, as edges are already feature structures in GrAF and this 
provides a nice one-to-one correspondence. However, it would likely 
make sense to let the user specify whether edges should be implicit 
(a reference to an annotation feature structure) or explicit (a 
reference to an Edge feature structure).  While GrAF allows labelled 
edges, we don't actually have any graphs that label edges right now, 
so adding the extra layer of indirection would simply be extra 
overhead, particularly for graphs with thousands of edges.

>I am pretty sure this is already more or less how you handle things,
>so in reiterating it I am probably only expressing that I do not know
>of any better idea either.

It sounds like we are on the same page. Thanks for the feedback.

Cheers,
Keith

--------------------------------------------------
Research Associate
American National Corpus
http://www.anc.org

Re: Representing a DAG in UIMA

Posted by Richard Eckart <ec...@linglit.tu-darmstadt.de>.

Hello Keith,

Long time no see. How are you doing?

> What is the best way to represent a tree and/or DAG in UIMA?  
> Currently we declare an annotation type with two special features,  
> children and ancestor, that are arrays of annotations.  We then  
> populate these arrays with the children and ancestor annotations.

There are some methods in UIMA to create a tree structure by  
analysing how annotations cover each other, but this is not helpful  
here, since an explicit declaration of dominance is required. The  
type systems I know do it in the same manner you do, only they may use  
just a "children" or just a "parents" feature and assume that an edge  
is always directed from parent to child - thus having both would be  
redundant.

Actually the CAS is a DAG. You have an edge whenever a feature  
structure references another feature structure. I think the only  
thing you do here is to reserve two features of a particular type of  
feature structure to represent dominance.

Since I suppose you need to represent GrAF in the CAS, it may be  
sensible to make edges a bit more explicit by elevating them to a  
feature structure type. So your "parent", "children" or maybe "coref"  
features would be arrays of a subtype of Edge instead of a subtype of,  
say, Constituent, and thus the edge could bear features.
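
To make the suggestion concrete, here is a minimal plain-Java sketch of edges as first-class objects. The names ExplicitEdgeSketch, Node, Edge and connect(), and the single "label" field standing in for edge features, are all assumptions made for illustration. A real CAS type system would declare these as types and features in a descriptor instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: edges elevated to their own type so they can
// carry features. Plain Java, not the UIMA API.
class ExplicitEdgeSketch {
    static final class Node {
        final String label;
        // The features now hold Edge objects and not Nodes directly.
        final List<Edge> children = new ArrayList<>();
        final List<Edge> parent = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }
    }

    static final class Edge {
        final Node from;
        final Node to;
        final String label; // example edge feature; null if unlabelled

        Edge(Node from, Node to, String label) {
            this.from = from;
            this.to = to;
            this.label = label;
        }
    }

    // Create one edge object and register it at both endpoints.
    static Edge connect(Node from, Node to, String label) {
        Edge e = new Edge(from, to, label);
        from.children.add(e);
        to.parent.add(e);
        return e;
    }

    public static void main(String[] args) {
        Node vp = new Node("VP");
        Node np = new Node("NP");
        connect(vp, np, "object");
        Edge e = vp.children.get(0);
        System.out.println(e.label + " -> " + e.to.label); // prints object -> NP
    }
}
```

The cost is one extra object and one extra hop per edge compared with referencing the target node directly.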

I am pretty sure this is already more or less how you handle things,  
so in reiterating it I am probably only expressing that I do not know  
of any better idea either.

Cheers,

Richard

-- 
-------------------------------------------------------------------
Richard Eckart de Castilho
Software Engineer
Ubiquitous Knowledge Processing Lab
FB 20 Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone +49 (6151) 16 - 6218, fax -5455, room S2/02/E225
eckartde@tk.informatik.tu-darmstadt.de
www.ukp.tu-darmstadt.de
-------------------------------------------------------------------