Posted to dev@uima.apache.org by Burn Lewis <bu...@gmail.com> on 2008/09/12 19:44:03 UTC

Donation of a widely used type system for multi-modal text analysis

We would like to add a UIMA type system and sample annotators to the Apache
Incubator project as an example of a rich multi-modal application. Our hope
is that others will find the techniques and types useful, and will find it a
good starting point for developing other multi-modal applications.

GTS is a type system designed for multi-modal applications that combine
analytics from multiple sources and modalities, such as speech recognition,
language translation, entity detection, etc.  It is currently used by 10
cooperating groups participating in the DARPA GALE project (
http://www.darpa.mil/ipto/programs/gale/gale.asp) to transcribe, translate,
and extract information from foreign language news broadcasts.  This
application requires that all the data be cross-referenced so that, for
example, any English sentence can be traced back to the precise region of
foreign language audio that generated it.
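
To make the traceability concrete: a consumer can walk the cross-reference
links from an English sentence back to the audio span that produced it.
Here is a minimal sketch against the plain CAS API; the
example.CrossReference type and its "source" feature are hypothetical
stand-ins, not the actual GTS names:

    import org.apache.uima.cas.CAS;
    import org.apache.uima.cas.FSIterator;
    import org.apache.uima.cas.Feature;
    import org.apache.uima.cas.Type;
    import org.apache.uima.cas.text.AnnotationFS;

    public class TraceBack {
        // Hypothetical: find the audio time span behind an English sentence.
        public static void printAudioSpan(CAS englishView,
                AnnotationFS sentence) {
            Type xrefType =
                englishView.getTypeSystem().getType("example.CrossReference");
            Feature sourceFeat = xrefType.getFeatureByBaseName("source");
            FSIterator<AnnotationFS> it =
                englishView.getAnnotationIndex(xrefType).iterator();
            while (it.hasNext()) {
                AnnotationFS xref = it.next();
                // Take the cross-reference covering this sentence's offsets
                if (xref.getBegin() <= sentence.getBegin()
                        && xref.getEnd() >= sentence.getEnd()) {
                    // "source" points at an annotation in the audio view; we
                    // assume its begin/end are offsets on the time axis
                    AnnotationFS audio =
                        (AnnotationFS) xref.getFeatureValue(sourceFeat);
                    System.out.println("audio span " + audio.getBegin()
                        + ".." + audio.getEnd());
                }
            }
        }
    }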

The CAS organization and type system have been designed to allow each
analytic to easily work on data of the appropriate modality.  Speech
recognition engines annotate an audio view with words aligned to a time
axis; machine translation annotates a text view of foreign sentences with
their English translation; entity detection annotates a text view of the
English sentences.  Multiple analytics of each type may be employed to
improve the overall accuracy.
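
In UIMA terms, each modality lives in its own view with its own subject of
analysis (sofa). A minimal sketch of that organization (the view names and
sofa contents below are made up for illustration; the real GTS layout may
differ):

    import org.apache.uima.cas.CAS;

    public class ViewSetup {
        public static void organize(CAS cas) {
            // Audio view: the sofa is a URI reference to the broadcast audio
            CAS audioView = cas.createView("AudioView");
            audioView.setSofaDataURI("file:///broadcasts/sample.wav",
                "audio/wav");

            // One source-language text view per STT engine; the sofa holds
            // that engine's transcript
            CAS sttView = cas.createView("SourceText-stt1");
            sttView.setDocumentText("transcribed source-language text here");

            // Each analytic then asks only for the view whose modality it
            // understands
            CAS forTranslation = cas.getView("SourceText-stt1");
        }
    }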

The sample code includes data reorganization components that are inserted
between the different analytics to perform the necessary bookkeeping of
creating views and cross-reference links from one view back to an earlier
one.  For example, after all speech recognition analytics have run, a reorg
module
creates a source-language text view for each STT engine, along with
cross-reference annotations from each word in the new view back to the
appropriate time span in the audio view.  One reorg component is a CAS
Multiplier that resegments the initial fixed-length audio segments at likely
story boundaries so that later components can treat each CAS as a complete
story.  The STT and MT analytics are simulated, reading their results from a
file, so that a complete pipeline of components can be tested.
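
For readers unfamiliar with CAS Multipliers: process() buffers each
incoming fixed-length segment, and hasNext()/next() then emit one CAS per
detected story. Here is a skeleton of that pattern; the boundary detection
(splitting transcript text on blank lines) is a placeholder assumption, not
the actual resegmentation logic:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.uima.analysis_component.JCasMultiplier_ImplBase;
    import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
    import org.apache.uima.cas.AbstractCas;
    import org.apache.uima.jcas.JCas;

    public class StorySegmenter extends JCasMultiplier_ImplBase {

        private final List<String> pendingStories = new ArrayList<String>();

        public void process(JCas segment)
                throws AnalysisEngineProcessException {
            // Placeholder boundary detection: split the segment's transcript
            // at blank lines, one story per piece
            for (String story : segment.getDocumentText().split("\n\n")) {
                pendingStories.add(story);
            }
        }

        public boolean hasNext() {
            return !pendingStories.isEmpty();
        }

        public AbstractCas next() throws AnalysisEngineProcessException {
            // getEmptyJCas() takes a fresh CAS from the pool for each story
            JCas storyCas = getEmptyJCas();
            storyCas.setDocumentText(pendingStories.remove(0));
            return storyCas;
        }
    }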

We welcome any comments or suggestions or questions!

- Burn.

Re: Donation of a widely used type system for multi-modal text analysis

Posted by Marshall Schor <ms...@schor.com>.
I think this could be a good new kind of donation for the sandbox. 
Perhaps we could have a collection of these; with them present and
available for easy download, the ones of more general use and interest to
the community could gradually evolve.

So I'm +1 for this kind of donation, especially since this particular
one has been actively used by several groups already.

-Marshall
