Posted to user@uima.apache.org by Keith Suderman <su...@anc.org> on 2009/05/07 22:10:02 UTC
Representing a DAG in UIMA
What is the best way to represent a tree and/or DAG in UIMA? Is
there a "standard" way to represent these structures? I've tried
searching the web, but all I can find is some discussion on the OASIS
list from 2007.
Currently we declare an annotation type with two special features,
children and ancestor, that are arrays of annotations. We then
populate these arrays with the children and ancestor
annotations. Everything works fine, but I wanted to know if there
was a better way, or at least a generally accepted way, of accomplishing this.
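For readers who want to see the layout described above, here is a minimal sketch in plain Java. It is not the UIMA API and not our actual type system; the class name DagNode and the method addChild are hypothetical, and it assumes the "ancestor" array holds only the direct parents of a node.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java stand-in for an annotation type with "children" and "ancestor"
// array features. NOT the UIMA API; all names here are illustrative.
final class DagNode {
    private final String label;
    private final List<DagNode> children = new ArrayList<>();
    // Assumption: "ancestor" holds the direct parents only.
    private final List<DagNode> ancestors = new ArrayList<>();

    DagNode(String label) {
        this.label = label;
    }

    String label() {
        return label;
    }

    List<DagNode> children() {
        return Collections.unmodifiableList(children);
    }

    List<DagNode> ancestors() {
        return Collections.unmodifiableList(ancestors);
    }

    // Adds a directed edge this -> child and keeps both arrays in sync,
    // so a node with several parents (a DAG, not a tree) is allowed.
    void addChild(DagNode child) {
        children.add(child);
        child.ancestors.add(this);
    }
}
```

Updating both arrays in a single method is one way to keep the two features from drifting apart; in a real CAS the same bookkeeping would have to be done when the arrays are populated.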
Thanks,
Keith Suderman
--------------------------------------------------
Research Associate
American National Corpus
http://www.anc.org
Re: Representing a DAG in UIMA
Posted by Keith Suderman <su...@anc.org>.
Hi Richard,
It is good to hear from you. We are still plugging away, and you are
correct, we are just finishing our work on importing GrAF (Graph
Annotation Framework) models into UIMA. I just wanted to do a quick
sanity check in case I was missing anything obvious.
At 01:23 AM 5/8/2009, Richard Eckart wrote:
>The type systems I know do it in the same manner you do, only they may use
>just a "children" or just a "parents" feature and assume that an edge
>is always directed from parent to child - thus having both would be
>redundant.
Redundant yes, but having both allows easy navigation in both
directions given a particular annotation. It is the classic space vs
speed tradeoff and we do a lot of things like: find the target of a
given FrameNet annotation and then determine what part of the syntax
tree it belongs to.
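To make the tradeoff concrete, here is a rough sketch in plain Java (not the UIMA API; SyntaxNode, UpwardNavigation, and both method names are hypothetical). It contrasts walking up through a stored parents array with recovering the parents by scanning every node when only children are stored.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative node type; stands in for an annotation with both features.
final class SyntaxNode {
    final String label;
    final List<SyntaxNode> children = new ArrayList<>();
    final List<SyntaxNode> parents = new ArrayList<>();

    SyntaxNode(String label) {
        this.label = label;
    }

    void addChild(SyntaxNode child) {
        children.add(child);
        child.parents.add(this);
    }
}

final class UpwardNavigation {
    // With a stored parents array: follow the back-references directly.
    // Visits only the nodes above the start node.
    static Set<SyntaxNode> rootsViaParents(SyntaxNode start) {
        Set<SyntaxNode> roots = new LinkedHashSet<>();
        Set<SyntaxNode> seen = new HashSet<>();
        Deque<SyntaxNode> todo = new ArrayDeque<>();
        todo.push(start);
        while (!todo.isEmpty()) {
            SyntaxNode node = todo.pop();
            if (!seen.add(node)) {
                continue; // already handled via another path in the DAG
            }
            if (node.parents.isEmpty()) {
                roots.add(node);
            } else {
                node.parents.forEach(todo::push);
            }
        }
        return roots;
    }

    // With children only: every single step upward needs a scan over all nodes.
    static List<SyntaxNode> parentsByScan(SyntaxNode target, List<SyntaxNode> allNodes) {
        List<SyntaxNode> found = new ArrayList<>();
        for (SyntaxNode candidate : allNodes) {
            if (candidate.children.contains(target)) {
                found.add(candidate);
            }
        }
        return found;
    }
}
```

The first method touches only the nodes above the starting annotation, while the second has to look at the whole graph for each step up, which is the cost the redundant array avoids.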
>Actually the CAS is a DAG. You have an edge whenever a feature
>structure references another feature structure.
Yes, I should have phrased my question more carefully. I was
wondering about more explicit representations. For example, I see
there is an AnnotationTreeNode interface and was hoping for a similar
GraphNode interface.
I like your idea of expressing edges with an explicit 'Edge' feature
structure as edges are already feature structures in GrAF and this
provides a nice one-to-one correspondence. However, it would likely
make sense to allow the user to specify if edges should be implicit
(a reference to an annotation feature structure) or explicit (a
reference to an Edge feature structure). While GrAF allows labelled
edges, we don't actually have any graphs that label edges right now
so adding the extra layer of indirection would simply be extra
overhead, particularly for graphs with thousands of edges.
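As a rough illustration of that choice, the sketch below models both representations in plain Java. None of this is UIMA or GrAF API; GNode, GEdge, GraphBuilder, and the explicit flag are hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative node: can hold implicit edges (direct references) and
// explicit edges (one edge object per connection).
final class GNode {
    final String id;
    final List<GNode> implicitTargets = new ArrayList<>();
    final List<GEdge> explicitEdges = new ArrayList<>();

    GNode(String id) {
        this.id = id;
    }

    // Uniform view of the successors, whichever representation was used.
    List<GNode> targets() {
        List<GNode> out = new ArrayList<>(implicitTargets);
        for (GEdge edge : explicitEdges) {
            out.add(edge.to);
        }
        return out;
    }
}

// Explicit edge: an extra object per connection, but it can carry a label.
final class GEdge {
    final GNode from;
    final GNode to;
    final String label; // may be null for unlabelled edges

    GEdge(GNode from, GNode to, String label) {
        this.from = from;
        this.to = to;
        this.label = label;
    }
}

// Lets the caller decide once whether edges are implicit or explicit.
final class GraphBuilder {
    private final boolean explicit;
    int edgeObjectsCreated = 0;

    GraphBuilder(boolean explicit) {
        this.explicit = explicit;
    }

    void connect(GNode from, GNode to, String label) {
        if (explicit) {
            from.explicitEdges.add(new GEdge(from, to, label));
            edgeObjectsCreated++;
        } else {
            // Implicit mode cannot store the label; it is dropped here.
            from.implicitTargets.add(to);
        }
    }
}
```

In implicit mode no edge objects are allocated at all, which is the saving that matters for graphs with thousands of unlabelled edges; explicit mode pays one object per edge in exchange for a place to put a label.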
>I am pretty sure this is already more or less how you handle things,
>so in reiterating it I am probably only expressing that I do not know
>of any better idea either.
It sounds like we are on the same page. Thanks for the feedback.
Cheers,
Keith
Re: Representing a DAG in UIMA
Posted by Richard Eckart <ec...@linglit.tu-darmstadt.de>.
Hello Keith,
Long time no see. How are you doing?
> What is the best way to represent a tree and/or DAG in UIMA?
> Currently we declare an annotation type with two special features,
> children and ancestor, that are arrays of annotations. We then
> populate these arrays with the children and ancestor annotations.
There are some methods in UIMA to create a tree structure by
analysing how annotations cover each other, but this is not helpful
here, since an explicit declaration of dominance is required. The
type systems I know do it in the same manner you do, only they may use
just a "children" or just a "parents" feature and assume that an edge
is always directed from parent to child - thus having both would be
redundant.
Actually the CAS is a DAG. You have an edge whenever a feature
structure references another feature structure. I think the only
thing you do here is to reserve two features of a particular type of
feature structure to represent dominance.
Since I suppose you need to represent GrAF in the CAS it may be
sensible to make edges a bit more explicit by elevating them to a
feature structure type. So your "parent", "children" or maybe "coref"
features would be arrays of a subtype of Edge instead of a subtype of
say Constituent and thus the edge could bear features.
I am pretty sure this is already more or less how you handle things,
so in reiterating it I am probably only expressing that I do not know
of any better idea either.
Cheers,
Richard
--
-------------------------------------------------------------------
Richard Eckart de Castilho
Software Engineer
Ubiquitous Knowledge Processing Lab
FB 20 Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone +49 (6151) 16 - 6218, fax -5455, room S2/02/E225
eckartde@tk.informatik.tu-darmstadt.de
www.ukp.tu-darmstadt.de
-------------------------------------------------------------------