Posted to dev@uima.apache.org by Richard Eckart de Castilho <re...@apache.org> on 2016/09/12 10:00:24 UTC

Handling type conversion (was Re: "Standard" UIMA typesystem)

> On 12.09.2016, at 11:52, Joern Kottmann <ko...@gmail.com> wrote:
> 
>> On Sun, Sep 11, 2016 at 3:38 PM, Peter Klügl <pe...@averbis.com>
>> wrote:
>> 
>>> On 09.09.2016 at 23:24, Joern Kottmann wrote:
>>> 
>>> A framework like Uima has to make it easy to reuse components and in my
>>> opinion strict compile time typing makes that really difficult to
>> achieve.
>> 
>> Components are already very reusable if they use the same typesystem.
>> Again, that has nothing to do with compile time typing.
>> 
>> And, this is not the only purpose but there are many more, e.g., allow
>> the developer to create large maintainable pipelines.
>> In my opinion, components in UIMA are much more reusable because of the
>> static typing, not just throw-away prototypes.
> 
> I strongly disagree here. I think the really static type system (and with
> JCas even compile-time static) in UIMA makes it hard to reuse a component,
> because I need to write explicit type system converters in many cases to be
> able to use them.

IMHO type converters are necessary whether or not the type system is compiled
statically. You seem to want per-component converters (what you call adapters).
I personally prefer converters at the beginning and end of pipeline sections
(which can be realized e.g. through collection readers, CAS consumers or
CAS multipliers).
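
As a rough illustration of converters placed at the beginning and end of a
pipeline section, here is a minimal sketch in plain Java. It does not use the
UIMA API: Ann, SectionConverterSketch, and the type names my.Token, x.Token,
x.Sentence and my.Sentence are all hypothetical stand-ins, and in a real
pipeline the steps would be analysis engines rather than functions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch, not UIMA API: a pipeline section wrapped by converters.
final class SectionConverterSketch {
    // Converter step: retype every annotation of type `from` to type `to`.
    static UnaryOperator<List<Ann>> convert(String from, String to) {
        return anns -> {
            List<Ann> out = new ArrayList<>();
            for (Ann a : anns) {
                out.add(a.type.equals(from) ? new Ann(to, a.begin, a.end) : a);
            }
            return out;
        };
    }

    // Stand-in for a reused component that only understands "x.Token" and
    // adds one "x.Sentence" spanning all tokens it sees.
    static UnaryOperator<List<Ann>> sentenceAnnotator() {
        return anns -> {
            List<Ann> out = new ArrayList<>(anns);
            int begin = Integer.MAX_VALUE;
            int end = -1;
            for (Ann a : anns) {
                if (a.type.equals("x.Token")) {
                    begin = Math.min(begin, a.begin);
                    end = Math.max(end, a.end);
                }
            }
            if (end >= 0) {
                out.add(new Ann("x.Sentence", begin, end));
            }
            return out;
        };
    }

    // Run the steps in order over the same annotation list.
    static List<Ann> run(List<Ann> input, List<UnaryOperator<List<Ann>>> steps) {
        List<Ann> current = input;
        for (UnaryOperator<List<Ann>> step : steps) {
            current = step.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<Ann> input = List.of(new Ann("my.Token", 0, 5), new Ann("my.Token", 6, 11));
        List<Ann> output = run(input, List.of(
            convert("my.Token", "x.Token"),        // converter before the section
            sentenceAnnotator(),                   // the section itself
            convert("x.Token", "my.Token"),        // converters after the section
            convert("x.Sentence", "my.Sentence")));
        System.out.println(output);
    }
}

// Hypothetical stand-in for an annotation: a type name and a text span.
final class Ann {
    final String type;
    final int begin;
    final int end;

    Ann(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }

    @Override
    public String toString() {
        return type + "[" + begin + "," + end + "]";
    }
}
```

Every annotation that crosses the section boundary is copied by a converter
step, which is the cost of this approach; in exchange, the component inside
the section stays unaware of the outer type system.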

Regarding adapters: IMHO a UIMA component largely *is* the adapter between type system X
and underlying implementation Y. My hypothesis is that if there were a generic
configurable mechanism by which this mapping functionality could be externalized
from a UIMA component, then this mechanism would have the same level of complexity
as the Java code which usually fulfills this purpose in a component. Furthermore,
I expect the remaining component code would then become largely trivial. Or, if the
mapping mechanism is reduced in functionality in order to become simpler, then
certain type system designs would not be supported (cf. the OpenNLP type
mapping not being compatible with the DKPro Core type system and others).

Type conversion is an entirely separate thing from JCas classes and managing
different JCas wrappers at the level of classloaders. Here, UIMA offers
the PEAR solution and I think a constructive discussion could revolve
around how PEARs can be improved or replaced by a superior approach.

> The alternative to this would be a type system which is much less static
> (or dynamic) and APIs to write AEs which can adapt well to similar but
> different user defined type systems. This could be achieved by allowing
> type system mappings, by adding explicit support for adapters in the
> framework, allowing dynamic definition of types,
> 
> Together with Thilo I wrote a paper which speaks a bit about this topic
> (see at 6.4):
> http://www.aclweb.org/anthology/W14-5209

A more dynamic approach to the type system would be great, in particular
the ability to add types and features at execution time. We have an API
that in principle supports this (CAS). Again, this is decoupled from JCas
which is a higher-level API than CAS is.

>>> But maybe I am wrong and people have some good examples of Jcas based AEs
>>> which are nice to reuse in a simple custom pipeline.
>>> 
>> 
>> I am not interested in simple pipelines. I do not need UIMA for that. I
>> need UIMA for pipelines with dozens of components developed by different
>> people.
>> 
> If you have a large pipeline you will end up writing two converters if you use
> an AE which can't adapt to your type system: one to convert to the AE's type
> system, which you place before it, and one to convert back from the AE's type
> system to yours. I was speaking here about a simple example, not a
> simple pipeline.

Hm, here you seem to be talking about not per-component adapters but adapters
between larger subsections of a pipeline - those I suggested could be implemented
e.g. via CAS multipliers. At this point, I am actually unsure what you are suggesting
in terms of what you call "adapters".

Cheers,

-- Richard

Re: Handling type conversion (was Re: "Standard" UIMA typesystem)

Posted by Peter Klügl <pe...@averbis.com>.

@Jörn: this is what the converter analysis engines do, but no objects
need to be created in the trivial case. For the non-trivial case it's the
same for me.


@Richard: this could work, but only for the trivial case: same structure,
different names.


Maybe I am a bit pessimistic, but I see many many problems and a lot of
work if someone wants to implement it.


Best,


Peter


On 12.09.2016 at 16:36, Richard Eckart de Castilho wrote:
> On 12.09.2016, at 16:53, Joern Kottmann <ko...@gmail.com> wrote:
>> With special APIs you could probably do the following things:
>> - Define type name mappings, type A looks like type B to the AE
>> - Define functions which are used to access the features of a FS (the
>> function can map the feature value to something new) and let the CAS APIs
>> take care of calling it
>> - Define functions which convert an entire FS of type A into an FS of type
>> B and let the CAS APIs take care of calling them
>> - It could be possible to define adapters for AAEs as well (same TS AEs
>> could be grouped)
> Hm, JCas cover classes (might be able to) address all these issues if some assumptions
> are relaxed, e.g. that the name of a JCas class is the same as that of the UIMA type.
>
> So I have a feeling that these things could be solved by allowing users to prep
> a CAS with their own cover classes which would be used instead of generated
> JCas cover classes.
>
> Does that make some initial sense already or should I elaborate this further...
> or does it sound completely wrong?
>
> Cheers,
>
> -- Richard

Re: Handling type conversion (was Re: "Standard" UIMA typesystem)

Posted by Richard Eckart de Castilho <re...@apache.org>.

On 12.09.2016, at 16:53, Joern Kottmann <ko...@gmail.com> wrote:
> 
> With special APIs you could probably do the following things:
> - Define type name mappings, type A looks like type B to the AE
> - Define functions which are used to access the features of a FS (the
> function can map the feature value to something new) and let the CAS APIs
> take care of calling it
> - Define functions which convert an entire FS of type A into an FS of type
> B and let the CAS APIs take care of calling them
> - It could be possible to define adapters for AAEs as well (same TS AEs
> could be grouped)

Hm, JCas cover classes (might be able to) address all these issues if some assumptions
are relaxed, e.g. that the name of a JCas class is the same as that of the UIMA type.

So I have a feeling that these things could be solved by allowing users to prep
a CAS with their own cover classes which would be used instead of generated
JCas cover classes.

Does that make some initial sense already or should I elaborate this further...
or does it sound completely wrong?

Cheers,

-- Richard

Re: Handling type conversion (was Re: "Standard" UIMA typesystem)

Posted by Joern Kottmann <ko...@gmail.com>.

On Mon, Sep 12, 2016 at 12:00 PM, Richard Eckart de Castilho <rec@apache.org> wrote:

> > On 12.09.2016, at 11:52, Joern Kottmann <ko...@gmail.com> wrote:
> >
> >> On Sun, Sep 11, 2016 at 3:38 PM, Peter Klügl <pe...@averbis.com>
> >> wrote:
> >>
> >>> On 09.09.2016 at 23:24, Joern Kottmann wrote:
> >>>
> >>> A framework like Uima has to make it easy to reuse components and in my
> >>> opinion strict compile time typing makes that really difficult to
> >> achieve.
> >>
> >> Components are already very reusable if they use the same typesystem.
> >> Again, that has nothing to do with compile time typing.
> >>
> >> And, this is not the only purpose but there are many more, e.g., allow
> >> the developer to create large maintainable pipelines.
> >> In my opinion, components in UIMA are much more reusable because of the
> >> static typing, not just throw-away prototypes.
> >
> > I strongly disagree here. I think the really static type system (and with
> > JCas even compile-time static) in UIMA makes it hard to reuse a component,
> > because I need to write explicit type system converters in many cases to be
> > able to use them.
>
> IMHO type converters are necessary whether or not the type system is compiled
> statically. You seem to want per-component converters (what you call adapters).
> I personally prefer converters at the beginning and end of pipeline sections
> (which can be realized e.g. through collection readers, CAS consumers or
> CAS multipliers).
>
> Regarding adapters: IMHO a UIMA component largely *is* the adapter between type system X
> and underlying implementation Y. My hypothesis is that if there were a generic
> configurable mechanism by which this mapping functionality could be externalized
> from a UIMA component, then this mechanism would have the same level of complexity
> as the Java code which usually fulfills this purpose in a component. Furthermore,
> I expect the remaining component code would then become largely trivial. Or, if the
> mapping mechanism is reduced in functionality in order to become simpler, then
> certain type system designs would not be supported (cf. the OpenNLP type
> mapping not being compatible with the DKPro Core type system and others).
>


Today you would write pairs of converters and place them as strategically
as possible in your pipeline, right? So you would want to group AEs with
the same type system in one place.

The OpenNLP UIMA annotators are built in a more generic way and only make
certain assumptions about the type system, e.g. that it has token and sentence
annotations. The user has to configure a type and feature mapping in the
XML descriptor. This works for many cases; in some it doesn't.
So there are definitely cases where type mapping isn't enough, e.g. outputting
the best POS tag versus a list of the n best POS tags. For those cases I propose
to use adapters which can adapt the component to the type system the user is
using. Those adapters could also handle the type/feature mapping case.

I think if the adapters have support from the framework we could come up
with certain tricks that are not possible with a converter AE.
A converter AE needs to duplicate all (or probably most) of the inputs and
outputs for the conversion. This means everything needs to be copied at
least once.

With special APIs you could probably do the following things:
- Define type name mappings, type A looks like type B to the AE
- Define functions which are used to access the features of a FS (the
function can map the feature value to something new) and let the CAS APIs
take care of calling it
- Define functions which convert an entire FS of type A into an FS of type
B and let the CAS APIs take care of calling them
- It could be possible to define adapters for AAEs as well (same TS AEs
could be grouped)
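
To make the first two points concrete, here is a minimal sketch in plain Java.
It is not an existing UIMA API: SimpleFs, TypeAdapterSketch, mapType and
mapFeature are hypothetical names, and the type names my.Token and
component.Token are made up for the example. The idea is that the component
asks for its own type and feature names and the adapter resolves them against
the user's feature structures at access time, without copying anything.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical adapter (not a UIMA class): resolves the component's type and
// feature names against the user's feature structures at access time.
final class TypeAdapterSketch {
    // component type name -> user type name
    private final Map<String, String> typeNames = new HashMap<>();
    // "componentType:feature" -> function computing the value from the user's FS
    private final Map<String, Function<SimpleFs, Object>> accessors = new HashMap<>();

    // Type name mapping: componentType "looks like" userType to the component.
    TypeAdapterSketch mapType(String componentType, String userType) {
        typeNames.put(componentType, userType);
        return this;
    }

    // Feature accessor function: computes the value the component expects.
    TypeAdapterSketch mapFeature(String componentType, String feature,
                                 Function<SimpleFs, Object> accessor) {
        accessors.put(componentType + ":" + feature, accessor);
        return this;
    }

    // True if the user's FS plays the role of componentType.
    boolean isInstance(SimpleFs fs, String componentType) {
        return fs.typeName.equals(typeNames.getOrDefault(componentType, componentType));
    }

    // Read a feature through the accessor if one is registered, else directly.
    Object get(SimpleFs fs, String componentType, String feature) {
        Function<SimpleFs, Object> accessor = accessors.get(componentType + ":" + feature);
        return accessor != null ? accessor.apply(fs) : fs.features.get(feature);
    }

    public static void main(String[] args) {
        // The user's type system stores a list of the n best POS tags ...
        SimpleFs token = new SimpleFs("my.Token");
        token.features.put("posTags", List.of("NN", "VB"));

        // ... while the component expects a single "pos" feature.
        TypeAdapterSketch adapter = new TypeAdapterSketch()
            .mapType("component.Token", "my.Token")
            .mapFeature("component.Token", "pos",
                fs -> ((List<?>) fs.features.get("posTags")).get(0));

        System.out.println(adapter.get(token, "component.Token", "pos")); // prints NN
    }
}

// Hypothetical stand-in for a feature structure: a type name plus feature values.
final class SimpleFs {
    final String typeName;
    final Map<String, Object> features = new HashMap<>();

    SimpleFs(String typeName) {
        this.typeName = typeName;
    }
}
```

This only covers reading. Writing results back through the adapter, and
converting whole FSes from type A to type B, would need the reverse mapping
as well.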


> Type conversion is an entirely separate thing from JCas classes and managing
> different JCas wrappers at the level of classloaders. Here, UIMA offers
> the PEAR solution and I think a constructive discussion could revolve
> around how PEARs can be improved or replaced by a superior approach.
>
> > The alternative to this would be a type system which is much less static
> > (or dynamic) and APIs to write AEs which can adapt well to similar but
> > different user defined type systems. This could be achieved by allowing
> > type system mappings, by adding explicit support for adapters in the
> > framework, allowing dynamic definition of types,
> >
> > Together with Thilo I wrote a paper which speaks a bit about this topic
> > (see at 6.4):
> > http://www.aclweb.org/anthology/W14-5209
>
> A more dynamic approach to the type system would be great, in particular
> the ability to add types and features at execution time. We have an API
> that in principle supports this (CAS). Again, this is decoupled from JCas
> which is a higher-level API than CAS is.
>

I agree. I think that can be useful in many situations; one example is the
output of debug or log FSes.

Jörn

De-/serialization (was Re: "Standard" UIMA typesystem)

Posted by Richard Eckart de Castilho <re...@apache.org>.

On 12.09.2016, at 13:00, Richard Eckart de Castilho <re...@apache.org> wrote:
> 
>> The alternative to this would be a type system which is much less static
>> (or dynamic) and APIs to write AEs which can adapt well to similar but
>> different user defined type systems. This could be achieved by allowing
>> type system mappings, by adding explicit support for adapters in the
>> framework, allowing dynamic definition of types,
>> 
>> Together with Thilo I wrote a paper which speaks a bit about this topic
>> (see at 6.4):
>> http://www.aclweb.org/anthology/W14-5209

I never really understood the points about the data serialization (6.1 and 6.3)
in that paper, despite having been at the workshop and briefly discussing that
after the presentation with Thilo.

UIMA natively supports a number of different serialization formats. External
component collections add a large number of additional formats. Each of these
formats has specific benefits and drawbacks and is therefore best suited
for particular uses.

My understanding is that you suggest keeping the graph structure as the
underlying model. My hypothesis is that any format supporting this
graph structure would be at least as complex as the XMI serialization.
More specifically, if a JSON serialization were defined in UIMA, it
would end up having a structure quite similar to XMI. Actually, that
is what happened in the JSON serializer that Marshall has implemented.

Unfortunately, the paper does not describe what kind of JSON dialect you
have in mind that would not inherit the kind of structural complexity
that XMI exposes.

I also don't get the critique about being more relaxed in serialization (6.3).
You are probably aware that various formats supported by UIMA allow lenient
loading. What kind of relaxation would you deem necessary beyond that?

Best,

-- Richard