Posted to user@uima.apache.org by Burn Lewis <bu...@gmail.com> on 2008/11/18 17:21:13 UTC

Donation of a widely used type system for multi-modal text analysis

NOTE:  This was foolishly posted to just uima-dev in September.  The type
system and some of the sample code are used in the recently mentioned UCC
tool from CMU.

We would like to add a UIMA type system and sample annotators to the Apache
Incubator project as an example of a rich multimodal application. Our hope
is that others will find the techniques and types useful, and a good
starting point for developing other multimodal applications.

GTS is a type system designed for multi-modal applications that combine
analytics from multiple sources and modalities, such as speech recognition,
language translation, and entity detection.  It is currently used by 10
cooperating groups participating in the DARPA GALE project (
http://www.darpa.mil/ipto/programs/gale/gale.asp) to transcribe, translate,
and extract information from foreign language news broadcasts.  This
application requires that all the data be cross-referenced so that, for
example, any English sentence can be traced back to the precise region of
foreign language audio that generated it.

The CAS organization and type system have been designed to allow each
analytic to easily work on data of the appropriate modality.  Speech
recognition engines annotate an audio view with words aligned to a time
axis; machine translation annotates a text view of foreign sentences with
their English translation; entity detection annotates a text view of the
English sentences.  Multiple analytics of each type may be employed to
improve the overall accuracy.
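The multi-view organization described above can be sketched in a few lines.
This is a simplified model, not the actual UIMA or GTS API: the names
(AudioView, Annotation, the "word" feature) and the sample data are
illustrative assumptions. The key idea it shows is that each view pairs its
own subject of analysis with annotations whose offsets are interpreted in
that view's units (milliseconds for audio, characters for text).

```python
# Minimal sketch of a multi-view CAS: each named view holds its own
# subject of analysis (sofa) and its own annotations.  All names here
# are illustrative, not the real UIMA/GTS type names.
from dataclasses import dataclass, field

@dataclass
class Annotation:
    begin: int                 # offset into the view's sofa
    end: int                   # (ms for an audio view, chars for a text view)
    features: dict = field(default_factory=dict)

@dataclass
class View:
    sofa: object               # the subject of analysis: audio ref or text
    annotations: list = field(default_factory=list)

class CAS:
    def __init__(self):
        self.views = {}

    def create_view(self, name, sofa):
        view = View(sofa)
        self.views[name] = view
        return view

    def get_view(self, name):
        return self.views[name]

# A speech recognizer would annotate the audio view with time-aligned words:
cas = CAS()
audio = cas.create_view("AudioView", sofa="broadcast.wav")
audio.annotations.append(Annotation(0, 480, {"word": "buenos"}))
audio.annotations.append(Annotation(480, 900, {"word": "dias"}))
```

Later analytics (machine translation, entity detection) would then attach
their annotations to whichever view carries the modality they understand.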

The sample code includes data reorganization components that are inserted
between the different analytics to perform the necessary bookkeeping of
creating views and cross-reference links from one view back to an earlier
one.  For example, after all speech-to-text (STT) analytics have run, a
reorg module creates a source-language text view for each STT engine, along
with
cross-reference annotations from each word in the new view back to the
appropriate time span in the audio view.  One reorg component is a CAS
Multiplier that resegments the initial fixed-length audio segments at likely
story boundaries so that later components can treat each CAS as a complete
story.  The STT and machine translation (MT) analytics in the sample are
simulated: they read their results from a file, so that a complete pipeline
of components can be tested.
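The cross-reference bookkeeping a reorg module performs can be sketched as
follows. This is an assumption-laden illustration, not the donated code: the
function name, the tuple layout of the STT output, and the single-space word
separator are all invented for the example. What it shows is the essential
step of building a text sofa from time-aligned words while recording, for
each character span in the new view, the audio time span it came from.

```python
# Sketch of the reorg bookkeeping step: build a text view from one STT
# engine's time-aligned words, keeping a cross-reference from each text
# span back to its time span in the audio view.  Names and data shapes
# are illustrative assumptions, not the GTS types.

def build_text_view(stt_words):
    """stt_words: list of (word, t_begin_ms, t_end_ms) tuples.

    Returns (text, xrefs) where text is the new view's sofa and xrefs
    maps each word's character span back to its audio time span."""
    text_parts, xrefs, offset = [], [], 0
    for word, t0, t1 in stt_words:
        begin, end = offset, offset + len(word)
        text_parts.append(word)
        xrefs.append({"text_span": (begin, end), "audio_span": (t0, t1)})
        offset = end + 1            # +1 for the separating space
    return " ".join(text_parts), xrefs

text, xrefs = build_text_view([("buenos", 0, 480), ("dias", 480, 900)])
# text is "buenos dias"; xrefs[1] links chars 7..11 back to 480..900 ms
```

With links like these in place, any annotation on a downstream view (an MT
sentence, a detected entity) can be walked back, span by span, to the region
of foreign-language audio that produced it.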

We welcome any comments or suggestions or questions!

- Burn.