Posted to user@uima.apache.org by Ramon Ziai <ra...@linuxusers.de> on 2009/02/06 03:01:41 UTC

Re: Representing parse trees in the CAS

Hi Matthias,

thanks for the suggestions. I'll comment on them below.

Matthias Wendt wrote:

> storing pointers to the subconstituents (top down) might be an
> alternative - which is quite straightforward. However, if you don't plan
> to disambiguate the tree there may be sets of pointers.

This is what I ended up doing. But instead of designing my type system
so that it can represent a parse forest (which involves sets of
pointers), I stored multiple versions of the same subtree. That's not
very space-efficient, but my CASes are small and I can afford it.
I also store a pointer to the parent constituent so I can traverse the
tree in both directions.
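
To make the layout concrete, here is a minimal sketch in plain Java. It is
not actual CAS or JCas code: the class 'Constituent', its fields, and the
methods 'addChild', 'copySubtree' and 'bracketed' are hypothetical
stand-ins for a feature structure with a 'label' feature, an array of
child references and a parent reference. Spans are given as token indices
purely for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical plain-Java stand-in for a 'Constituent' feature structure.
final class Constituent {
    final String label;  // grammar category kept as data, e.g. "NP"
    final int begin;     // span start (token index in this sketch)
    final int end;       // span end (exclusive)
    private Constituent parent;                                    // bottom-up pointer
    private final List<Constituent> children = new ArrayList<>();  // top-down pointers

    Constituent(String label, int begin, int end) {
        this.label = label;
        this.begin = begin;
        this.end = end;
    }

    // Attach a child and set its parent pointer; returns this for chaining.
    Constituent addChild(Constituent child) {
        child.parent = this;
        children.add(child);
        return this;
    }

    Constituent getParent() { return parent; }

    List<Constituent> getChildren() { return Collections.unmodifiableList(children); }

    // Deep copy, so every alternative parse owns its own version of a subtree.
    // The copy has no parent until it is attached somewhere.
    Constituent copySubtree() {
        Constituent copy = new Constituent(label, begin, end);
        for (Constituent c : children) {
            copy.addChild(c.copySubtree());
        }
        return copy;
    }

    // Labelled bracketing of the subtree, for inspection only.
    String bracketed() {
        if (children.isEmpty()) return label;
        StringBuilder sb = new StringBuilder("(").append(label);
        for (Constituent c : children) {
            sb.append(' ').append(c.bracketed());
        }
        return sb.append(')').toString();
    }
}

class ParseTreeSketch {
    public static void main(String[] args) {
        Constituent np = new Constituent("NP", 2, 4);
        Constituent pp = new Constituent("PP", 4, 7);

        // Reading 1: the PP attaches to the VP.
        Constituent reading1 = new Constituent("VP", 1, 7)
            .addChild(new Constituent("V", 1, 2))
            .addChild(np.copySubtree())
            .addChild(pp.copySubtree());

        // Reading 2: the PP attaches inside a larger NP.
        Constituent reading2 = new Constituent("VP", 1, 7)
            .addChild(new Constituent("V", 1, 2))
            .addChild(new Constituent("NP", 2, 7)
                .addChild(np.copySubtree())
                .addChild(pp.copySubtree()));

        System.out.println(reading1.bracketed()); // (VP V NP PP)
        System.out.println(reading2.bracketed()); // (VP V (NP NP PP))
    }
}
```

Because each reading gets its own copy, every node has exactly one parent,
which is what keeps the parent pointer single-valued. The price is the
duplicated storage mentioned above.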

> There is also a second alternative that works without saving the
> pointers in the feature structures at all: the method tree
> <http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/api/org/apache/uima/cas/text/AnnotationIndex.html#tree%28org.apache.uima.cas.text.AnnotationFS%29>
> in the AnnotationIndex. However, care has to be taken in order to avoid
> endless recursion (in case of multiple annotations having the same span)
> here. To achieve this consult the reference manual
> <http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/references/references.html>
> about type priorities.

That sounded terrific in the beginning, but unfortunately it has some
drawbacks. Type priorities cannot help if all my constituents have the
same type 'Constituent' with only a feature 'label' to distinguish them.
If instead I were to model my grammar categories in the type system,
i.e. create types 'NP', 'VP' and so forth, I'd be tying the grammar far
too closely to the type system. A change in the grammar or parser model
would entail adapting the type system. And the type priorities would
have to encode grammatical constraints like "An S is bigger than an NP".
Apart from the fact that creating such an ordering might be non-trivial,
I think a type system should be more generic.
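
The tie can be illustrated with a small plain-Java imitation of a
span-based ordering (begin ascending, longer span first, then type
priority). This is only a sketch under that assumption, not the UIMA API:
'Span', 'SpanOrderSketch' and 'compare' are hypothetical names, and the
priority list is passed in as plain strings.

```java
import java.util.Arrays;

// Hypothetical stand-in for an annotation: type name, 'label' feature, span.
final class Span {
    final String type;
    final String label;
    final int begin;
    final int end;

    Span(String type, String label, int begin, int end) {
        this.type = type;
        this.label = label;
        this.begin = begin;
        this.end = end;
    }
}

class SpanOrderSketch {
    // Imitates a span-based ordering: begin ascending, then end descending
    // (longer spans first), then position in the given type priority list.
    // The 'label' feature plays no role in the ordering.
    static int compare(Span a, Span b, String... typePriorities) {
        if (a.begin != b.begin) return Integer.compare(a.begin, b.begin);
        if (a.end != b.end) return Integer.compare(b.end, a.end);
        return Integer.compare(Arrays.asList(typePriorities).indexOf(a.type),
                               Arrays.asList(typePriorities).indexOf(b.type));
    }

    public static void main(String[] args) {
        // Unary chain NP -> N over the same span, e.g. a single-word NP.
        Span np = new Span("Constituent", "NP", 0, 4);
        Span n = new Span("Constituent", "N", 0, 4);
        // One generic type: the two constituents tie, so the ordering
        // cannot say which one dominates the other.
        System.out.println(compare(np, n, "Constituent")); // 0

        // One type per category: the tie is broken, but only because the
        // priority list now encodes the grammar ("NP before N").
        Span typedNp = new Span("NP", "NP", 0, 4);
        Span typedN = new Span("N", "N", 0, 4);
        System.out.println(compare(typedNp, typedN, "S", "NP", "N")); // -1
    }
}
```

With explicit child and parent pointers the dominance relation is stored
directly, so the label can stay an ordinary feature value.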

Best,
Ramon