Posted to user@uima.apache.org by js...@gmx-topmail.de on 2007/05/10 16:24:50 UTC

Modeling Ink Data

Hello,

I intend to use UIMA for analyzing electronic ink. This consists of a set of stroke objects, which, in turn, each contain a set of samples (x,y coordinates). Since I'm new to UIMA and I have only found examples dealing with text analysis, I wonder how ink data can best be represented within UIMA.

I would prefer to use this pre-structured data as the base data of one SOFA. A second SOFA would consist of the text recognized by a handwriting recognition engine.

Yet, as I understand it, SOFA base data must be "flat". For ink data, this would imply storing all samples in an array of integers, which is then annotated with metadata for the structure (samples and strokes). Is this the only way to model this data?

Thanks in advance for your help!

Jürgen







Re: Modeling Ink Data

Posted by Thilo Goetz <tw...@gmx.de>.

jstk@gmx-topmail.de wrote:
> Hello,
> 
> I intend to use UIMA for analyzing electronic ink. This consists of a set of stroke objects, which, in turn, each contain a set of samples (x,y coordinates). Since I'm new to UIMA and I have only found examples dealing with text analysis, I wonder how ink data can best be represented within UIMA.
> 
> I would prefer to use this pre-structured data as the base data of one SOFA. A second SOFA would consist of the text recognized by a handwriting recognition engine.
> 
> Yet, as I understand it, SOFA base data must be "flat". For ink data, this would imply storing all samples in an array of integers, which is then annotated with metadata for the structure (samples and strokes). Is this the only way to model this data?
> 
> Thanks in advance for your help!
> 
> Jürgen
> 

Sounds like a cool project.  Anyway, you don't absolutely need any sofa
data, if it's not useful.  You can create *only* structured data
(feature structures) if you like.  They don't need to be anchored in
some (text-like) artifact if that's not useful for your problem domain.

So you could create stroke objects as containers, where you reference
the corresponding set of samples.  The samples could be simple objects
themselves.  You would probably want to put the stroke objects in a
custom index, sorted by some properties that make sense for these kinds
of objects.
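
To make the shape of that model concrete, here is a rough sketch in plain
Java.  It deliberately does not use the UIMA API: in UIMA, Sample and
Stroke would be feature structure types declared in a type system, and
the sorted index would be defined in a descriptor.  All names here
(InkModel, Sample, Stroke, minX, LEFT_TO_RIGHT) are made up for
illustration, and sorting strokes by their leftmost x coordinate is just
one assumed choice of sort key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Plain-Java analogy of the suggested model; NOT the UIMA API.
public class InkModel {

    /** One digitizer sample: a single (x, y) coordinate pair. */
    static final class Sample {
        final int x;
        final int y;

        Sample(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /** A stroke is a container that references its samples in drawing order. */
    static final class Stroke {
        final int id;               // hypothetical identifier, used as a tie-breaker
        final List<Sample> samples; // the "set of samples" the stroke refers to

        Stroke(int id, List<Sample> samples) {
            this.id = id;
            this.samples = Collections.unmodifiableList(new ArrayList<>(samples));
        }

        /** Leftmost x coordinate; Integer.MAX_VALUE for an empty stroke. */
        int minX() {
            int min = Integer.MAX_VALUE;
            for (Sample s : samples) {
                min = Math.min(min, s.x);
            }
            return min;
        }
    }

    /** Assumed sort key: leftmost x first, then id so distinct strokes never compare equal. */
    static final Comparator<Stroke> LEFT_TO_RIGHT =
            Comparator.comparingInt(Stroke::minX).thenComparingInt(s -> s.id);

    /** Stand-in for a custom sorted index over strokes. */
    static TreeSet<Stroke> newIndex() {
        return new TreeSet<>(LEFT_TO_RIGHT);
    }

    public static void main(String[] args) {
        TreeSet<Stroke> index = newIndex();
        index.add(new Stroke(1, Arrays.asList(new Sample(40, 5), new Sample(55, 9))));
        index.add(new Stroke(2, Arrays.asList(new Sample(10, 3), new Sample(12, 8))));

        // Iteration follows the index order, not insertion order.
        for (Stroke stroke : index) {
            System.out.println("stroke " + stroke.id + " starts at x=" + stroke.minX());
        }
    }
}
```

Note that nothing in this structure is anchored in a flat artifact: the
strokes and samples are the data, and the index only decides the order in
which the strokes are visited.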

Hope this helps.  Let us know how it goes.

--Thilo