Posted to dev@uima.apache.org by Joern Kottmann <ko...@gmail.com> on 2016/09/09 11:57:13 UTC

Re: "Standard" UIMA typesystem

I personally think that we depend way too much on particular type systems
in UIMA. I really hope we can solve this to some degree in UIMA 3. If I
write code using JCas today, I am totally stuck with the TS I use; reusing
any of that code with a different TS is impossible.

The best you can do is to use just the CAS, but then it is still difficult
to support multiple type systems (e.g. complex configuration, various
styles) and to allow the component to be reused in different systems.

Jörn

On Tue, Aug 30, 2016 at 7:57 PM, Richard Eckart de Castilho <re...@apache.org>
wrote:

> On 30.08.2016, at 16:39, Peter Klügl <pe...@averbis.com> wrote:
> >
> > If there is no standard type system, then people have two options: create
> > their own one or reuse an existing type system of a component
> > repository, e.g., DKPro Core. As far as I know LiMoSINe [1] moved from
> > their own type system to DKPro Core (I am waiting for some text to put on
> > our external resources page - in case they read this). I was also
> > thinking about switching our NLP components to the DKPro Core type
> > system, but there are several issues preventing that, first of all that
> > I cannot build it :-/
>
> /me Apache/UIMA hat off, DKPro Core hat on
>
> Ok... I am finally addressing this annoying Windowsism...
>
>   https://github.com/dkpro/dkpro-core/issues/414
>
> Btw. feel free to submit issues for DKPro Core type system improvements.
> We actually do evolve the TS - trying to avoid breaking changes...
>
> Cheers,
>
> -- Richard

Re: "Standard" UIMA typesystem

Posted by Tommaso Teofili <to...@gmail.com>.

+1 to Joern's comment and more generally to a sort of general purpose /
common TS.

My 2 cents,
Tommaso

On Fri, Sep 9, 2016 at 13:57, Joern Kottmann <ko...@gmail.com>
wrote:

> I personally think that we depend way too much on particular type systems
> in UIMA. I really hope we can solve this to some degree in UIMA 3, if I
> today
> write code using JCAS I am totally stuck with the TS I use, reusing any of
> that
> code with a different TS is impossible.
>
> The best you can do is using just the CAS, but then it is still difficult
> to support
> multiple type systems (e.g. complex configuration, various styles) and
> allow
> reusing of the component in different systems.
>
> Jörn
>
> On Tue, Aug 30, 2016 at 7:57 PM, Richard Eckart de Castilho <
> rec@apache.org>
> wrote:
>
> > On 30.08.2016, at 16:39, Peter Klügl <pe...@averbis.com> wrote:
> > >
> > > If there no standard type system, then people have two options: create
> > > their own one or reuse an existing type system of a component
> > > repository, e.g., DKPro Core. As far as I know LiMoSINe [1] moved  from
> > > their own type system to DKPro Core (I waiting for some text to put on
> > > our external resources page - in case they read this).  I also was
> > > thinking about switching our NLP components to the DKPro Core type
> > > system, but there are several issues preventing that, first of all that
> > > I cannot build it :-/
> >
> > /me Apache/UIMA hat off, DKPro Core hat on
> >
> > Ok... I am finally addressing this annoying Windowsisim...
> >
> >   https://github.com/dkpro/dkpro-core/issues/414
> >
> > Btw. feel free to submit issues for DKPro Core type system improvements.
> > We actually do evolve the TS - trying to avoid breaking changes...
> >
> > Cheers,
> >
> > -- Richard
>

Re: "Standard" UIMA typesystem

Posted by Richard Eckart de Castilho <re...@apache.org>.

In my experience, 95% of the function of a UIMA component class is
data conversion, namely between the data model/format of some wrapped
non-UIMA tool and the type system that is used in the pipeline.
The other 5% is passing along parameters and configuring resources.

In some cases, I have factored out the data conversion from the
process method of the UIMA component, leaving basically this:

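  // convertToDataModel, runWrappedTool, convertToCas and DataModel are
  // placeholders for the tool-specific parts
  public void process(CAS cas) {
    DataModel data = convertToDataModel(cas); // CAS -> data model of the tool
    runWrappedTool(data);                     // run the wrapped non-UIMA tool
    convertToCas(data, cas);                  // write the results back to the CAS
  }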

UIMA allows me to implement that conversion quickly and in a rather
streamlined way. If that conversion needs to be made more flexible
to support different type system designs, it would IMHO introduce
unnecessary and annoying complexity.

That said...

=> I could imagine that some minimal support for "type mapping"
directly in the CAS could help in certain situations, e.g. when
types/features get renamed as part of evolving a type system

We already have a view mapping, i.e. when a component accesses view X,
it may be mapped to view Y in the CAS. The same could be done for type
names and for features. Eventually, I would like to change the type
names of the DKPro Core type system, and then this would come in very
handy.
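
To make the renaming case concrete, here is a minimal sketch in plain Java
of what such a name mapping could look like. It is not UIMA API: the class
TypeNameMapping, its methods, and the type and feature names are all made
up for illustration. It only shows the lookup that a framework would have
to do when a component asks for a type or feature by its old name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT part of the UIMA API: a lookup table that lets a
// component keep using old type/feature names after a type system is renamed.
final class TypeNameMapping {
    // name used by the component -> name actually defined in the CAS
    private final Map<String, String> types = new HashMap<>();
    // "<component type name>:<feature>" -> feature name defined in the CAS
    private final Map<String, String> features = new HashMap<>();

    TypeNameMapping mapType(String componentName, String casName) {
        types.put(componentName, casName);
        return this;
    }

    TypeNameMapping mapFeature(String componentType, String componentFeature,
                               String casFeature) {
        features.put(componentType + ":" + componentFeature, casFeature);
        return this;
    }

    // Names without a mapping are returned unchanged.
    String resolveType(String componentName) {
        return types.getOrDefault(componentName, componentName);
    }

    String resolveFeature(String componentType, String componentFeature) {
        return features.getOrDefault(componentType + ":" + componentFeature,
                                     componentFeature);
    }

    public static void main(String[] args) {
        // All names below are invented examples.
        TypeNameMapping mapping = new TypeNameMapping()
            .mapType("org.example.old.Token", "org.example.v2.Token")
            .mapFeature("org.example.old.Token", "pos", "posTag");
        System.out.println(mapping.resolveType("org.example.old.Token"));
        System.out.println(mapping.resolveFeature("org.example.old.Token", "pos"));
        System.out.println(mapping.resolveType("org.example.Sentence"));
    }
}
```

A pure rename like this is the easy case; it does not help once the
structure of the types differs.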

=> the ability to extend the type system at runtime (via CAS API, ignoring JCas)

For frameworks like Ruta, it would be nice if types and features could
be added after the CAS has been initialized. Past discussions about this
can be found elsethread.

However, beyond that, I presently find it hard to imagine any sensible
framework support. If the structural design of a type system is changed,
then the type of mapping that can be done declaratively is usually hardly
sufficient.

Btw. nobody forces you to use the JCas API if you don't like it. Just
use the CAS API if that provides you with more of the flexibility that
you would like to have. You can happily mix components coded against CAS
and JCas in the same pipeline. I personally use the JCas whenever possible
and CAS whenever necessary (and LowLevelCas in a few cases as well ;) ).

Cheers,

-- Richard

> On 09.09.2016, at 15:11, Joern Kottmann <ko...@gmail.com> wrote:
> 
> A very good reason to use a framework like UIMA is that we can reuse
> components
> and don't have to build everything from scratch (if I have to do that I
> don't need UIMA these days).
> 
> To be able to reuse a component, it must work with multiple type systems
> or be easily adaptable to a custom type system.
>
> I personally think the convenience the JCas brings is outweighed many
> times by all the complexity and disadvantages which come with it, e.g. the
> code generation step, having extra special classes, and the written code
> being mostly impossible to reuse.
> 
> Jörn
> 
> On Fri, Sep 9, 2016 at 2:37 PM, Peter Klügl <pe...@averbis.com>
> wrote:
> 
>> How should this be solved/improved? I do not see it.
>> 
>> You have either generic analysis engines with parameters for the types,
>> or the analysis engine knows the types and depends on them, regardless of
>> whether you use CAS or JCas.
>>
>> Isn't that the thing with statically typed feature structures? If you have
>> Java code that depends on a class hierarchy, you are stuck with that
>> hierarchy. (I hope this discussion won't go in the direction that
>> dynamically typed programming languages are better.)
>> 
>> 
>> I probably do not understand the motivation. Can you give me an example?
>> 
>> 
>> Best,
>> 
>> 
>> Peter

Re: "Standard" UIMA typesystem

Posted by Joern Kottmann <ko...@gmail.com>.

On Fri, Sep 9, 2016 at 4:29 PM, Peter Klügl <pe...@averbis.com>
wrote:

>
> On 09.09.2016 at 15:59, Joern Kottmann wrote:
> > If you merge the type systems of the components you want to use you end
> > up with a huge mess of a merged type system and you have to do type
> > system conversions between the AEs.
>
> Why do you end up with a huge mess? All is fine if everyone uses their
> own namespaces, as it is with java classes. If you combine many java
> libraries you also get a lot of classes.
>

I know that it works, but this doesn't change the fact that I end up with
a type system which is the result of merging multiple type systems.
And that means that more or less very similar things are duplicated,
e.g. different annotations to represent tokens, sentences, etc.
Why do I need to have more or less the same type multiple times?

> In my opinion, if you get beyond simple type systems, you cannot convert
> the types on the fly. You need some knowledge about the conversion,
> e.g., implemented in a converter. This can be a performance problem...
>

Yes, and that is the worst thing: I am burdened with understanding the
merged type system in order to write components which can translate between
the types.

Jörn

Re: "Standard" UIMA typesystem

Posted by Peter Klügl <pe...@averbis.com>.

On 09.09.2016 at 15:59, Joern Kottmann wrote:
> If you merge the type systems of the components you want to use you end up
> with
> a huge mess of a merged type system and you have to do type system
> conversions
> between the AEs.

Why do you end up with a huge mess? All is fine if everyone uses their
own namespaces, as it is with java classes. If you combine many java
libraries you also get a lot of classes.

In my opinion, if you get beyond simple type systems, you cannot convert
the types on the fly. You need some knowledge about the conversion,
e.g., implemented in a converter. This can be a performance problem...

Peter

> Jörn
>
> On Fri, Sep 9, 2016 at 3:49 PM, Peter Klügl <pe...@averbis.com>
> wrote:
>
>> I still don't get it.
>>
>>
>> You can reuse all components if you include some type system mapping
>> that knows the transformation. I combined some of our components with
>> DKPro Core, ClearTK, cTAKES and JCore components.
>>
>>
>> Our components won't work with the DKPro Core typesystem even if the
>> UIMA Framework would support more "laziness" or dynamic typing (for
>> different reasons, e.g., missing information).
>>
>>
>> I get the point with code generation though.
>>
>>
>> Best,
>>
>> Peter
>>
>>
>> On 09.09.2016 at 15:11, Joern Kottmann wrote:
>>> A very good reason to use a framework like UIMA is that we can reuse
>>> components
>>> and don't have to build everything from scratch (if I have to do that I
>>> don't need UIMA these days).
>>>
>>> To be able to reuse a component it must work with multiple type systems
>> or
>>> can easily be adapted
>>> to a custom type system.
>>>
>>> I am personally think the convenience the JCas brings is outweighed many
>>> times by all the complexity
>>> and disadvantages which come with it, e.g. code generation step, having
>>> extra special classes and mostly impossible
>>> to reuse the written code.
>>>
>>> Jörn
>>>
>>> On Fri, Sep 9, 2016 at 2:37 PM, Peter Klügl <pe...@averbis.com>
>>> wrote:
>>>
>>>> How should this be solved/improved? I do not see it.
>>>>
>>>> You have either generic analysis engines with parameters for the types,
>>>> or the analysis engine knows the types and depends on it, regardless if
>>>> you use CAS or JCas.
>>>>
>>>> Isn't that the thing with static typed feature structures? If you have
>>>> Java code that depends on a class hierarchy, you are stuck with that
>>>> hierarchy. (I hope this discussion won't go in a direction that
>>>> dynamically typed programming languages are  better)
>>>>
>>>>
>>>> I probably do not understand the motivation. Can you give me an example?
>>>>
>>>>
>>>> Best,
>>>>
>>>>
>>>> Peter
>>>>
>>>>
>>>>
>>>> On 09.09.2016 at 13:57, Joern Kottmann wrote:
>>>>> I personally think that we depend way too much on particular type
>> systems
>>>>> in UIMA. I really hope we can solve this to some degree in UIMA 3, if I
>>>>> today
>>>>> write code using JCAS I am totally stuck with the TS I use, reusing any
>>>> of
>>>>> that
>>>>> code with a different TS is impossible.
>>>>>
>>>>> The best you can do is using just the CAS, but then it is still
>> difficult
>>>>> to support
>>>>> multiple type systems (e.g. complex configuration, various styles) and
>>>> allow
>>>>> reusing of the component in different systems.
>>>>>
>>>>> Jörn
>>>>>
>>>>> On Tue, Aug 30, 2016 at 7:57 PM, Richard Eckart de Castilho <
>>>> rec@apache.org>
>>>>> wrote:
>>>>>
>>>>>> On 30.08.2016, at 16:39, Peter Klügl <pe...@averbis.com>
>> wrote:
>>>>>>> If there no standard type system, then people have two options:
>> create
>>>>>>> their own one or reuse an existing type system of a component
>>>>>>> repository, e.g., DKPro Core. As far as I know LiMoSINe [1] moved
>> from
>>>>>>> their own type system to DKPro Core (I waiting for some text to put
>> on
>>>>>>> our external resources page - in case they read this).  I also was
>>>>>>> thinking about switching our NLP components to the DKPro Core type
>>>>>>> system, but there are several issues preventing that, first of all
>> that
>>>>>>> I cannot build it :-/
>>>>>> /me Apache/UIMA hat off, DKPro Core hat on
>>>>>>
>>>>>> Ok... I am finally addressing this annoying Windowsisim...
>>>>>>
>>>>>>   https://github.com/dkpro/dkpro-core/issues/414
>>>>>>
>>>>>> Btw. feel free to submit issues for DKPro Core type system
>> improvements.
>>>>>> We actually do evolve the TS - trying to avoid breaking changes...
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> -- Richard
>>

Re: "Standard" UIMA typesystem

Posted by Joern Kottmann <ko...@gmail.com>.

If you merge the type systems of the components you want to use you end up
with a huge mess of a merged type system and you have to do type system
conversions between the AEs.

Jörn

On Fri, Sep 9, 2016 at 3:49 PM, Peter Klügl <pe...@averbis.com>
wrote:

> I still don't get it.
>
>
> You can reuse all components if you include some type system mapping
> that knows the transformation. I combined some of our components with
> DKPro Core, ClearTK, cTAKES and JCore components.
>
>
> Our components won't work with the DKPro Core typesystem even if the
> UIMA Framework would support more "laziness" or dynamic typing (for
> different reasons, e.g., missing information).
>
>
> I get the point with code generation though.
>
>
> Best,
>
> Peter
>
>
> On 09.09.2016 at 15:11, Joern Kottmann wrote:
> > A very good reason to use a framework like UIMA is that we can reuse
> > components
> > and don't have to build everything from scratch (if I have to do that I
> > don't need UIMA these days).
> >
> > To be able to reuse a component it must work with multiple type systems
> or
> > can easily be adapted
> > to a custom type system.
> >
> > I am personally think the convenience the JCas brings is outweighed many
> > times by all the complexity
> > and disadvantages which come with it, e.g. code generation step, having
> > extra special classes and mostly impossible
> > to reuse the written code.
> >
> > Jörn
> >
> > On Fri, Sep 9, 2016 at 2:37 PM, Peter Klügl <pe...@averbis.com>
> > wrote:
> >
> >> How should this be solved/improved? I do not see it.
> >>
> >> You have either generic analysis engines with parameters for the types,
> >> or the analysis engine knows the types and depends on it, regardless if
> >> you use CAS or JCas.
> >>
> >> Isn't that the thing with static typed feature structures? If you have
> >> Java code that depends on a class hierarchy, you are stuck with that
> >> hierarchy. (I hope this discussion won't go in a direction that
> >> dynamically typed programming languages are  better)
> >>
> >>
> >> I probably do not understand the motivation. Can you give me an example?
> >>
> >>
> >> Best,
> >>
> >>
> >> Peter
> >>
> >>
> >>
> >> On 09.09.2016 at 13:57, Joern Kottmann wrote:
> >>> I personally think that we depend way too much on particular type
> systems
> >>> in UIMA. I really hope we can solve this to some degree in UIMA 3, if I
> >>> today
> >>> write code using JCAS I am totally stuck with the TS I use, reusing any
> >> of
> >>> that
> >>> code with a different TS is impossible.
> >>>
> >>> The best you can do is using just the CAS, but then it is still
> difficult
> >>> to support
> >>> multiple type systems (e.g. complex configuration, various styles) and
> >> allow
> >>> reusing of the component in different systems.
> >>>
> >>> Jörn
> >>>
> >>> On Tue, Aug 30, 2016 at 7:57 PM, Richard Eckart de Castilho <
> >> rec@apache.org>
> >>> wrote:
> >>>
> >>>> On 30.08.2016, at 16:39, Peter Klügl <pe...@averbis.com>
> wrote:
> >>>>> If there no standard type system, then people have two options:
> create
> >>>>> their own one or reuse an existing type system of a component
> >>>>> repository, e.g., DKPro Core. As far as I know LiMoSINe [1] moved
> from
> >>>>> their own type system to DKPro Core (I waiting for some text to put
> on
> >>>>> our external resources page - in case they read this).  I also was
> >>>>> thinking about switching our NLP components to the DKPro Core type
> >>>>> system, but there are several issues preventing that, first of all
> that
> >>>>> I cannot build it :-/
> >>>> /me Apache/UIMA hat off, DKPro Core hat on
> >>>>
> >>>> Ok... I am finally addressing this annoying Windowsisim...
> >>>>
> >>>>   https://github.com/dkpro/dkpro-core/issues/414
> >>>>
> >>>> Btw. feel free to submit issues for DKPro Core type system
> improvements.
> >>>> We actually do evolve the TS - trying to avoid breaking changes...
> >>>>
> >>>> Cheers,
> >>>>
> >>>> -- Richard
> >>
>
>

Re: "Standard" UIMA typesystem

Posted by Peter Klügl <pe...@averbis.com>.

On 09.09.2016 at 16:09, Joern Kottmann wrote:
> Well you are forced to use it when you have to use an AE using it.

Why? Well, yes, if you want to change the implementation of that AE.
However, you can combine generic CAS AEs with JCas AEs operating on the
same annotations.


> I think the problem with the JCas is that people think, because we are
> offering it as part of UIMA, that it is acceptable to use it, but the truth
> is it really isn't. 

I do not see the problem yet. Why is it not acceptable to use it?

Peter

> If it would be me deciding, the JCas would be the first
> thing
> I would throw away for UIMA 3 (and also many other things).
>
> Jörn
>
> On Fri, Sep 9, 2016 at 4:00 PM, Richard Eckart de Castilho <re...@apache.org>
> wrote:
>
>> On 09.09.2016, at 15:49, Peter Klügl <pe...@averbis.com> wrote:
>>> I get the point with code generation though.
>>>
>>> On 09.09.2016 at 15:11, Joern Kottmann wrote:
>>>> I am personally think the convenience the JCas brings is outweighed many
>>>> times by all the complexity
>>>> and disadvantages which come with it, e.g. code generation step, having
>>>> extra special classes and mostly impossible
>>>> to reuse the written code.
>> Again, nothing forces anybody to actually make use of the JCas.
>> If it does not match your taste, then do not use it.
>>
>> If you find the CAS interface to be lacking some convenience,
>> check out the getFeature() and setFeature() methods in uimaFIT FSUtil
>> and also the CasUtil select* methods in uimaFIT.
>>
>> Cheers,
>>
>> -- Richard

Re: "Standard" UIMA typesystem

Posted by Richard Eckart de Castilho <re...@apache.org>.

If an AE uses it, you are not forced to use it as well. Other code
can still operate via the CAS interface. The JCas wrappers don't even
need to be visible to the rest of the code if you use PEARs or OSGi
or some alternative classloader management.

For my part, I consider JCas to be a great utility to build UIMA wrappers
that provides type safety and facilitates refactoring. If JCas were not
there as a generic mechanism, the first thing I'd do would be to
manually implement Java wrapper classes for specific types of FSes.

The UIMA framework comes in layers of APIs and if you don't like some
of these layers, then don't use them. An API that you may find horrible
may well be the exact thing that others like about the framework.

Btw. I think the title of this subthread should be changed. It doesn't seem
anymore to be about promoting interoperability through convergence on a
specific type system, but rather about promoting flexibility by removing
constraints (which IMHO will likely end up in even less interoperability).

Cheers,

-- Richard

> On 09.09.2016, at 16:09, Joern Kottmann <ko...@gmail.com> wrote:
> 
> Well you are forced to use it when you have to use an AE using it.
> I think the problem with the JCas is that people think, because we are
> offering it as part of UIMA, that is is acceptable to use it, but the truth
> is it really isn't. If it would be me deciding, the JCas would be the first
> thing
> I would throw away for UIMA 3 (and also many other things).
> 
> Jörn
> 
> On Fri, Sep 9, 2016 at 4:00 PM, Richard Eckart de Castilho <re...@apache.org>
> wrote:
> 
>> On 09.09.2016, at 15:49, Peter Klügl <pe...@averbis.com> wrote:
>>> 
>>> I get the point with code generation though.
>>> 
>>> On 09.09.2016 at 15:11, Joern Kottmann wrote:
>>>> I am personally think the convenience the JCas brings is outweighed many
>>>> times by all the complexity
>>>> and disadvantages which come with it, e.g. code generation step, having
>>>> extra special classes and mostly impossible
>>>> to reuse the written code.
>> 
>> Again, nothing forces anybody to actually make use of the JCas.
>> If it does not match your taste, then do not use it.
>> 
>> If you find the CAS interface to be lacking some convenience,
>> check out the getFeature() and setFeature() methods in uimaFIT FSUtil
>> and also the CasUtil select* methods in uimaFIT.
>> 
>> Cheers,
>> 
>> -- Richard

Re: "Standard" UIMA typesystem

Posted by Marshall Schor <ms...@schor.com>.

One feature of PEAR packaging of components is that it implements classpath
isolation.  So code within a PEAR can make use of JCas and this will not in any
way force the types into other people's type systems, I think.  The classes you
define within the PEAR will only be "visible" within the PEAR.  Outside of the
PEAR, in other components in the pipeline, you can have classes with identical
names and namespaces that know nothing about the UIMA types or the JCas
definitions in use within the PEAR.

Does this kind of isolation capability address your concern about forcing the
types into other people's type systems?  Or is it something else?

-Marshall


On 9/9/2016 5:24 PM, Joern Kottmann wrote:
> My point is about b): when someone is using JCas, the compile-time static
> typing forces these types into my type system and the classes have to
> be on my classpath. It is impossible to use JCas and not force the types
> into other people's type systems. And my opinion is that this is just wrong.
>
> A framework like Uima has to make it easy to reuse components and in my
> opinion strict compile time typing makes that really difficult to achieve.
>
> But maybe I am wrong and people have some good examples of Jcas based AEs
> which are nice to reuse in a simple custom pipeline.
>
> Jörn
>
> On Sep 9, 2016 17:58, "Marshall Schor" <ms...@schor.com> wrote:
>
>> Hi Jörn,
>>
>> I think people really want to understand the details about why:
>>
>> a) you are forced to use JCas just because an AE is using it (I assume you
>> must
>> mean, outside of that AE).   Other AEs don't need to use the JCas, so I'm
>> misunderstanding something I think, about your point.
>>
>> b) why JCas isn't acceptable (specifics?).  I understand wanting APIs that
>> allow
>> "dynamic" specifications of types, which UIMA has in its plain CAS APIs. It
>> sounds like you see no benefit from what I'll call "compile-time" static
>> typing
>> style of APIs, which the JCas implies; is that what we're discussing?
>>
>> Thanks again for your input!
>>
>> -Marshall
>> On 9/9/2016 10:09 AM, Joern Kottmann wrote:
>>> Well you are forced to use it when you have to use an AE using it.
>>> I think the problem with the JCas is that people think, because we are
>>> offering it as part of UIMA, that is is acceptable to use it, but the
>> truth
>>> is it really isn't. If it would be me deciding, the JCas would be the
>> first
>>> thing
>>> I would throw away for UIMA 3 (and also many other things).
>>>
>>> Jörn
>>>
>>> On Fri, Sep 9, 2016 at 4:00 PM, Richard Eckart de Castilho <rec@apache.org>
>>> wrote:
>>>
>>>> On 09.09.2016, at 15:49, Peter Klügl <pe...@averbis.com> wrote:
>>>>> I get the point with code generation though.
>>>>>
>>>>> On 09.09.2016 at 15:11, Joern Kottmann wrote:
>>>>>> I am personally think the convenience the JCas brings is outweighed
>> many
>>>>>> times by all the complexity
>>>>>> and disadvantages which come with it, e.g. code generation step,
>> having
>>>>>> extra special classes and mostly impossible
>>>>>> to reuse the written code.
>>>> Again, nothing forces anybody to actually make use of the JCas.
>>>> If it does not match your taste, then do not use it.
>>>>
>>>> If you find the CAS interface to be lacking some convenience,
>>>> check out the getFeature() and setFeature() methods in uimaFIT FSUtil
>>>> and also the CasUtil select* methods in uimaFIT.
>>>>
>>>> Cheers,
>>>>
>>>> -- Richard
>>

Re: Handling type conversion (was Re: "Standard" UIMA typesystem)

Posted by Peter Klügl <pe...@averbis.com>.

@Jörn: this is what the converter analysis engines do, but no objects
need to be created in the trivial case. For the non-trivial case it's the
same for me.


@Richard: this could work, but only for the trivial case: same structure,
different names.


Maybe I am a bit pessimistic, but I see many many problems and a lot of
work if someone wants to implement it.


Best,


Peter


On 12.09.2016 at 16:36, Richard Eckart de Castilho wrote:
> On 12.09.2016, at 16:53, Joern Kottmann <ko...@gmail.com> wrote:
>> With special APIs you could probably do the following things:
>> - Define type name mappings, type A looks like type B to the AE
>> - Define functions which are used to access the features of a FS (the
>> function can map the feature value to something new) and let the CAS APIs
>> take care of calling it
>> - Define functions which converts an entire FS of type A into an FS of type
>> B  and let the CAS APIs take care of calling it
>> - It could be possible to define adapters for AAEs as well (same TS AEs
>> could be grouped)
> Hm, JCas cover classes (might be able to) address all these issues if some assumptions
> are relaxed, e.g. that the name of a JCas class is the same as for a uima type.
>
> So I have a feeling that these things could be solved by allowing users to prep
> a CAS with their own cover classes which would be used instead of generated
> JCas cover classes.
>
> Does that make some initial sense already or should I elaborate this further...
> or does it sound completely wrong?
>
> Cheers,
>
> -- Richard

Re: Handling type conversion (was Re: "Standard" UIMA typesystem)

Posted by Richard Eckart de Castilho <re...@apache.org>.

On 12.09.2016, at 16:53, Joern Kottmann <ko...@gmail.com> wrote:
> 
> With special APIs you could probably do the following things:
> - Define type name mappings, type A looks like type B to the AE
> - Define functions which are used to access the features of a FS (the
> function can map the feature value to something new) and let the CAS APIs
> take care of calling it
> - Define functions which converts an entire FS of type A into an FS of type
> B  and let the CAS APIs take care of calling it
> - It could be possible to define adapters for AAEs as well (same TS AEs
> could be grouped)

Hm, JCas cover classes (might be able to) address all these issues if some assumptions
are relaxed, e.g. that the name of a JCas class is the same as for a uima type.

So I have a feeling that these things could be solved by allowing users to prep
a CAS with their own cover classes which would be used instead of generated
JCas cover classes.

Does that make some initial sense already or should I elaborate this further...
or does it sound completely wrong?

Cheers,

-- Richard

Re: Handling type conversion (was Re: "Standard" UIMA typesystem)

Posted by Joern Kottmann <ko...@gmail.com>.

On Mon, Sep 12, 2016 at 12:00 PM, Richard Eckart de Castilho <rec@apache.org>
wrote:

> > On 12.09.2016, at 11:52, Joern Kottmann <ko...@gmail.com> wrote:
> >
> >> On Sun, Sep 11, 2016 at 3:38 PM, Peter Klügl <pe...@averbis.com>
> >> wrote:
> >>
> >>> On 09.09.2016 at 23:24, Joern Kottmann wrote:
> >>>
> >>> A framework like Uima has to make it easy to reuse components and in my
> >>> opinion strict compile time typing makes that really difficult to
> >> achieve.
> >>
> >> Components are already very reusable if they use the same typesystem.
> >> Again, that has nothing to do with compile time typing.
> >>
> >> And, this is not the only purpose but there are many more, e.g., allow
> >> the developer to create large maintainable pipelines.
> >> In my opinion, components in UIMA are much more reusable because of the
> >> static typing, not just throw-away prototypes.
> >
> > I strongly disagree here. I think the really static type system (and with
> > JCas even compile-time static) in UIMA makes it hard to reuse a component,
> > because I need to write explicit type system converters in many cases to
> > be able to use them.
>
> IMHO type converters are necessary whether or not the type system is
> compiled statically. You seem to want per-component converters (what you
> call adapters). I personally prefer converters at the beginning and end of
> pipeline sections (which can be realized e.g. through collection readers,
> CAS consumers or CAS multipliers).
>
> Regarding adapters: IMHO a UIMA component largely *is* the adapter between
> type system X and underlying implementation Y. My hypothesis is that if
> there were a generic configurable mechanism by which this mapping
> functionality could be externalized from a UIMA component, then this
> mechanism would have the same level of complexity as the Java code which
> usually fulfills this purpose in a component. Furthermore, I expect the
> remaining component code to become largely trivial then. - OR - if the
> mapping functionality is reduced in order to become simpler, then it would
> mean certain type system designs are not supported (cf. OpenNLP type
> mapping not being compatible with the DKPro Core type system and others).
>


Today you would write pairs of converters and place them as strategically
as possible in your pipeline, right,
so you would want to group AEs with the same type system in one place.

The OpenNLP UIMA annotators are built in a more generic way and only make
certain assumptions about the type system, e.g. that it has token and
sentence annotations. The user has to configure a type and feature mapping
in the XML descriptor. This works for many cases; in some it doesn't.
So there are definitely cases where type mapping isn't enough, e.g. output
of the best POS tag vs. a list of the best n POS tags. For those cases I
propose to use adapters which can adapt the component to the type system the
user is using. And those adapters could also handle the type/feature mapping
case.
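
As an illustration of the "generic annotator plus configured type names"
style described above, here is a small self-contained sketch in plain Java.
It does not use the UIMA or OpenNLP APIs: ConfigurableTokenizer, Span, and
the parameter name "tokenType" are hypothetical stand-ins. The point is only
that the component never hard-codes a type name; it labels its output with
whatever name the user configured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT UIMA/OpenNLP API: a component whose output type
// name comes from configuration instead of being compiled in.
final class ConfigurableTokenizer {
    // Stand-in for an annotation: a type name plus character offsets.
    static final class Span {
        final String type;
        final int begin;
        final int end;
        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    private final String tokenType;

    ConfigurableTokenizer(Map<String, String> params) {
        // "tokenType" is an invented parameter name for this sketch.
        this.tokenType = params.get("tokenType");
    }

    // Splits on whitespace and labels every span with the configured type.
    List<Span> process(String text) {
        List<Span> spans = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++; // skip whitespace before the next token
            }
            int begin = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++; // consume the token
            }
            if (i > begin) {
                spans.add(new Span(tokenType, begin, i));
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        Map<String, String> params = new java.util.HashMap<>();
        params.put("tokenType", "org.example.Token"); // invented type name
        for (Span s : new ConfigurableTokenizer(params).process("ab  cd")) {
            System.out.println(s.type + " [" + s.begin + "," + s.end + ")");
        }
    }
}
```

This covers the simple renaming case only; differing output structures
(best tag vs. n-best list) need more than a configured name.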

I think if the adapters have support from the framework we could come up
with certain tricks that are not possible with a converter AE.
A converter AE needs to duplicate all (or probably most) of the inputs and
outputs for the conversion. This means everything needs to be copied at
least once.

With special APIs you could probably do the following things:
- Define type name mappings, so that type A looks like type B to the AE
- Define functions which are used to access the features of an FS (the
function can map the feature value to something new) and let the CAS APIs
take care of calling them
- Define functions which convert an entire FS of type A into an FS of type
B and let the CAS APIs take care of calling them
- It could be possible to define adapters for AAEs as well (same TS AEs
could be grouped)
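
The second and third points of the list can be sketched roughly as follows.
This is plain Java and not an existing UIMA API: FsAdapters, the stand-in
class Fs, and the type and feature names ("Word", "Token", "tag", "posTag",
"pos") are assumptions made up for the example. The idea is that a feature
read first goes through a registered accessor function, and that a whole FS
can be run through a registered converter, without the AE knowing about it.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, NOT UIMA API: Fs is a stand-in for a feature structure.
final class FsAdapters {
    static final class Fs {
        final String type;
        final Map<String, Object> features = new HashMap<>();
        Fs(String type) { this.type = type; }
        Fs set(String feature, Object value) {
            features.put(feature, value);
            return this;
        }
    }

    // "<type>:<feature>" as requested by the component -> accessor function
    private final Map<String, Function<Fs, Object>> accessors = new HashMap<>();
    // source type name -> function producing the FS the component expects
    private final Map<String, Function<Fs, Fs>> converters = new HashMap<>();

    void registerAccessor(String type, String feature, Function<Fs, Object> accessor) {
        accessors.put(type + ":" + feature, accessor);
    }

    void registerConverter(String sourceType, Function<Fs, Fs> converter) {
        converters.put(sourceType, converter);
    }

    // Feature read: use a registered accessor if there is one, else read directly.
    Object getFeature(Fs fs, String feature) {
        Function<Fs, Object> accessor = accessors.get(fs.type + ":" + feature);
        return accessor != null ? accessor.apply(fs) : fs.features.get(feature);
    }

    // Whole-FS conversion; an FS without a registered converter is returned as is.
    Fs adapt(Fs fs) {
        Function<Fs, Fs> converter = converters.get(fs.type);
        return converter != null ? converter.apply(fs) : fs;
    }

    public static void main(String[] args) {
        FsAdapters adapters = new FsAdapters();
        // The AE asks for "pos"; the data stores a lower-case "posTag".
        adapters.registerAccessor("Token", "pos",
            fs -> String.valueOf(fs.features.get("posTag")).toUpperCase(Locale.ROOT));
        // A "Word" FS is presented to the AE as a "Token" FS.
        adapters.registerConverter("Word",
            fs -> new Fs("Token").set("posTag", fs.features.get("tag")));

        Fs token = adapters.adapt(new Fs("Word").set("tag", "nn"));
        System.out.println(token.type + " " + adapters.getFeature(token, "pos"));
    }
}
```

Note that the converter in this sketch still creates a new object per FS,
so it does not by itself avoid the copying cost mentioned above; only the
accessor route reads the original data in place.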


> Type conversion is an entirely separate thing from JCas classes and managing
> different JCas wrappers at the level of classloaders. Here, UIMA offers
> the PEAR solution and I think a constructive discussion could revolve
> how PEARs can be improved or replaced by a superior approach.
>
> > The alternative to this would be a type system which is much less static
> > (or dynamic) and APIs to write AEs which can adapt well to similar but
> > different user defined type systems. This could be achieved by allowing
> > type system mappings, by adding explicit support for adapters in the
> > framework, allowing dynamic definition of types,
> >
> > Together with Thilo I wrote a paper which speaks a bit about this topic
> > (see at 6.4):
> > http://www.aclweb.org/anthology/W14-5209
>
> A more dynamic approach to the type system would be great, in particular
> the ability to add types and features at execution time. We have an API
> that in principle supports this (CAS). Again, this is decoupled from JCas
> which is a higher-level API than CAS is.
>

I agree; I think that can be useful in many situations. One example is the
output of debug or log FSes.

Jörn

De-/serialization (was Re: "Standard" UIMA typesystem)

Posted by Richard Eckart de Castilho <re...@apache.org>.

On 12.09.2016, at 13:00, Richard Eckart de Castilho <re...@apache.org> wrote:
> 
>> The alternative to this would be a type system which is much less static
>> (or dynamic) and APIs to write AEs which can adapt well to similar but
>> different user defined type systems. This could be achieved by allowing
>> type system mappings, by adding explicit support for adapters in the
>> framework, allowing dynamic definition of types,
>> 
>> Together with Thilo I wrote a paper which speaks a bit about this topic
>> (see at 6.4):
>> http://www.aclweb.org/anthology/W14-5209

I never really understood the points about the data serialization (6.1 and 6.3)
in that paper, despite having been at the workshop and briefly discussing that
after the presentation with Thilo.

UIMA natively supports a number of different serialization formats. External
component collections add a large number of additional formats. Each of these
formats has specific benefits and drawbacks and is therefore best suited
for particular uses.

My understanding is that you suggest keeping the graph structure as the
underlying model. My hypothesis is that any format supporting this
graph structure would be at least as complex as the XMI serialization.
More specifically, if a JSON serialization were defined in UIMA, it
would end up having a structure quite similar to XMI. In fact, that
is what happened in the JSON serializer that Marshall has implemented.

Unfortunately, the paper does not describe what kind of JSON dialect you
have in mind that would not inherit the kind of structural complexity
that XMI exposes.

I also don't get the critique about being more relaxed in serialization (6.3).
You are probably aware that various formats supported by UIMA allow lenient
loading. What kind of relaxation would you deem necessary beyond that?
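
For readers not familiar with the term, the following plain Java sketch shows
what I mean by lenient loading in principle: data whose type is not in the
target type system is dropped instead of causing an error. This is an
illustration only; the class and method names are hypothetical and it does not
reproduce the actual UIMA deserializer code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the idea behind lenient loading; not UIMA code.
final class LenientLoadSketch {

    /** Minimal stand-in for one serialized feature structure. */
    static final class Item {
        final String typeName;
        final String text;

        Item(String typeName, String text) {
            this.typeName = typeName;
            this.text = text;
        }
    }

    /**
     * Loads the items whose type is known to the target type system.
     * Strict mode fails on the first unknown type; lenient mode skips it.
     */
    static List<Item> load(List<Item> serialized, Set<String> knownTypes, boolean lenient) {
        List<Item> loaded = new ArrayList<>();
        for (Item item : serialized) {
            if (knownTypes.contains(item.typeName)) {
                loaded.add(item);
            } else if (!lenient) {
                throw new IllegalArgumentException("Unknown type: " + item.typeName);
            }
            // lenient == true: the item is silently dropped
        }
        return loaded;
    }
}
```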

Best,

-- Richard

Handling type conversion (was Re: "Standard" UIMA typesystem)

Posted by Richard Eckart de Castilho <re...@apache.org>.

> On 12.09.2016, at 11:52, Joern Kottmann <ko...@gmail.com> wrote:
> 
>> On Sun, Sep 11, 2016 at 3:38 PM, Peter Klügl <pe...@averbis.com>
>> wrote:
>> 
>>> On 09.09.2016 at 23:24, Joern Kottmann wrote:
>>> 
>>> A framework like Uima has to make it easy to reuse components and in my
>>> opinion strict compile time typing makes that really difficult to
>> achieve.
>> 
>> Components are already very reusable if they use the same typesystem.
>> Again, that has nothing to do with compile time typing.
>> 
>> And, this is not the only purpose but there are many more, e.g., allow
>> the developer to create large maintainable pipelines.
>> In my opinion, components in UIMA are much more reusable because of the
>> static typing, not just throw-away prototypes.
> 
> I strongly disagree here I think the really static type system (and with
> JCas even compile time static) in UIMA makes it hard reuse a component,
> because I need to write explicit type system converters in many cases to be
> able to use them.

IMHO type converters are necessary whether or not the type system is compiled
statically. You seem to want per-component converters (what you call adapters).
I personally prefer converters at the beginning and end of pipeline sections
(which can be realized e.g. through collection readers, CAS consumers or
CAS multipliers).

Regarding adapters: IMHO a UIMA component largely *is* the adapter between type system X
and underlying implementation Y. My hypothesis is that if there were a generic
configurable mechanism by which this mapping functionality could be externalized
from a UIMA component, then this mechanism would have the same level of complexity
as the Java code which usually fulfills this purpose in a component. Furthermore,
I expect the remaining component code would then become largely trivial. Alternatively,
if the mapping mechanism is reduced in functionality in order to become simpler, then
certain type system designs would not be supported (cf. OpenNLP type
mapping not being compatible with the DKPro Core type system and others).

Type conversion is an entirely separate thing from JCas classes and managing
different JCas wrappers at the level of classloaders. Here, UIMA offers
the PEAR solution and I think a constructive discussion could revolve
around how PEARs can be improved or replaced by a superior approach.

> The alternative to this would be a type system which is much less static
> (or dynamic) and APIs to write AEs which can adapt well to similar but
> different user defined type systems. This could be achieved by allowing
> type system mappings, by adding explicit support for adapters in the
> framework, allowing dynamic definition of types,
> 
> Together with Thilo I wrote a paper which speaks a bit about this topic
> (see at 6.4):
> http://www.aclweb.org/anthology/W14-5209

A more dynamic approach to the type system would be great, in particular
the ability to add types and features at execution time. We have an API
that in principle supports this (CAS). Again, this is decoupled from JCas
which is a higher-level API than CAS is.

>>> But maybe I am wrong and people have some good examples of Jcas based AEs
>>> which are nice to reuse in a simple custom pipeline.
>>> 
>> 
>> I am not interested in simple pipelines. I do not need UIMA for that. I
>> need UIMA for pipelines with dozens of components developed by different
>> people.
>> 
> If have a large pipeline you will end up writing two converters if you use
> an AE which can't adapt to your type system, one to convert to the AEs type
> system, this one you place before, and one to convert back from the AE type
> system to yours. I was speaking here about a simple example, and not a
> simple pipeline.

Hm, here you seem to be talking about not per-component adapters but adapters
between larger subsections of a pipeline - those I suggested could be implemented
e.g. via CAS multipliers. At this point, I am actually unsure what you are suggesting
in terms of what you call "adapters".

Cheers,

-- Richard

Re: Handling type conversion (was Re: "Standard" UIMA typesystem)

Posted by Peter Klügl <pe...@averbis.com>.

Hi,


On 12.09.2016 at 10:52, Joern Kottmann wrote:
> I strongly disagree here I think the really static type system (and with
> JCas even compile time static) in UIMA makes it hard reuse a component,
> because I need to write explicit type system converters in many cases to be
> able to use them.

In my opinion, the static type system is one of the big advantages of
UIMA compared to GATE.
The explicit type system converter can be a performance problem, but it
is the only thing that will work for non-trivial types. Btw, a generic
converter will cause even more performance problems.
I can see your point, but I do not agree. How often does one write an
analysis engine compared to a converter? The converter is written once,
and adapted if the type system changes (won't happen so often normally).
So, I would rather take advantage of static type systems for developing
analysis engines.


>
> The alternative to this would be a type system which is much less static
> (or dynamic) and APIs to write AEs which can adapt well to similar but
> different user defined type systems. This could be achieved by allowing
> type system mappings, by adding explicit support for adapters in the
> framework, allowing dynamic definition of types,

Type system mapping is not as easy as it sounds, and leads exactly to
the explicit converters mentioned above. Yes, you can do that for simple
use cases, but not for complex type systems. And this is not a specific
problem of UIMA but rather a general one.

I can see that type mapping, like sofa mapping for aggregate analysis
engines, can be handy, but that will work only for simple use cases,
e.g., read-only access or equal feature ranges. Ruta, for example, also
provides type aliasing when importing type systems.

Dynamic type systems where new types and features are incrementally
added by analysis engines can be a nice feature, but can also reduce the
maintainability of the pipelines. It would have been a nice feature for
Ruta since Ruta spams new types, but the generation of type system
descriptors during compile time works perfectly well for me now.


>
> Together with Thilo I wrote a paper which speaks a bit about this topic
> (see at 6.4):
> http://www.aclweb.org/anthology/W14-5209
>
> You have a different view and that is ok, and other people here too.
>

I know the paper of course and I liked it.

There is a difference between stating that something "is just wrong", or
complaining about JCas in general with arguments that are not accurate (in my
opinion), and providing arguments about what can be improved in UIMA and
how it can be improved.

Different views will always be for the better of UIMA if the arguments
are constructive.

> If have a large pipeline you will end up writing two converters if you use
> an AE which can't adapt to your type system, one to convert to the AEs type
> system, this one you place before, and one to convert back from the AE type
> system to yours. I was speaking here about a simple example, and not a
> simple pipeline.
>

Well, I implemented the converters for the major type systems - once -,
and now I can use the analysis engines which are wrapped in an aggregate
analysis engine with the converters. This is of course not an optimal
solution, but I do not see a realistically better one. Can you provide a
better one that will work, e.g., for combining cTAKES and DKPro Core
components up to the parser level without loss of information? If yes,
I'll be the first to adopt it.

Best,

Peter



> Jörn
>

Re: "Standard" UIMA typesystem

Posted by Joern Kottmann <ko...@gmail.com>.

On Sun, Sep 11, 2016 at 3:38 PM, Peter Klügl <pe...@averbis.com>
wrote:

> Hi,
>
>
> On 09.09.2016 at 23:24, Joern Kottmann wrote:
> > My point is about b when someone is using Jcas the compile time static
> > typing is forcing these types into me type system and the classes have to
> > be on my classpath. It is impossible to use Jcas and not force the types
> > into other people's type systems. And my opinion is that this is just
> wrong.
>
> I disagree. For me, the type system is the part of API of the component.
> You need the API in order to use the component.
>
> > A framework like Uima has to make it easy to reuse components and in my
> > opinion strict compile time typing makes that really difficult to
> achieve.
>
> Components are already very reusable if they use the same typesystem.
> Again, that has nothing to do with compile time typing.
>
> And, this is not the only purpose but there are many more, e.g., allow
> the developer to create large maintainable pipelines.
> In my opinion, components in UIMA are much more reusable because of the
> static typing, not just throw-away prototypes.



I strongly disagree here. I think the really static type system (and with
JCas even compile-time static) in UIMA makes it hard to reuse a component,
because in many cases I need to write explicit type system converters to be
able to use it.

The alternative to this would be a type system which is much less static
(or dynamic) and APIs to write AEs which can adapt well to similar but
different user-defined type systems. This could be achieved by allowing
type system mappings, by adding explicit support for adapters in the
framework, or by allowing dynamic definition of types.

Together with Thilo I wrote a paper which speaks a bit about this topic
(see at 6.4):
http://www.aclweb.org/anthology/W14-5209

You have a different view, as do other people here, and that is OK.



> >
> > But maybe I am wrong and people have some good examples of Jcas based AEs
> > which are nice to reuse in a simple custom pipeline.
> >
>
> I am not interested in simple pipelines. I do not need UIMA for that. I
> need UIMA for pipelines with dozens of components developed by different
> people.
>
>
If you have a large pipeline you will end up writing two converters if you use
an AE which can't adapt to your type system: one to convert to the AE's type
system, which you place before it, and one to convert back from the AE's type
system to yours. I was speaking here about a simple example, and not a
simple pipeline.
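
As a rough illustration of the two-converter setup, here is a small plain Java
sketch. It is not UIMA code: a Map stands in for an FS, and the key names
("text", "form", "pos", "tag") and the foreignTagger method are made up for the
example. Note that each converter step copies the data, which is the cost I
mentioned.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a Map stands in for a feature structure.
final class ConverterWrapSketch {

    /** Copies a record and renames one key; stands in for a type system converter. */
    static Function<Map<String, String>, Map<String, String>> rename(String from, String to) {
        return in -> {
            Map<String, String> out = new HashMap<>(in); // the copy a converter cannot avoid
            if (out.containsKey(from)) {
                out.put(to, out.remove(from));
            }
            return out;
        };
    }

    /** Made-up foreign AE: reads "form" and writes "pos", i.e. it uses its own type system. */
    static Map<String, String> foreignTagger(Map<String, String> token) {
        Map<String, String> out = new HashMap<>(token);
        boolean capitalized = Character.isUpperCase(out.get("form").charAt(0));
        out.put("pos", capitalized ? "NNP" : "NN");
        return out;
    }

    /** Runs the foreign AE between a converter to its type system and converters back. */
    static Map<String, String> runWrapped(Map<String, String> userToken) {
        Function<Map<String, String>, Map<String, String>> pipeline =
            rename("text", "form")                          // user TS -> AE TS
                .andThen(ConverterWrapSketch::foreignTagger)
                .andThen(rename("form", "text"))            // AE TS -> user TS
                .andThen(rename("pos", "tag"));
        return pipeline.apply(userToken);
    }
}
```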


Jörn

Re: "Standard" UIMA typesystem

Posted by Peter Klügl <pe...@averbis.com>.

Hi,


On 09.09.2016 at 23:24, Joern Kottmann wrote:
> My point is about b when someone is using Jcas the compile time static
> typing is forcing these types into me type system and the classes have to
> be on my classpath. It is impossible to use Jcas and not force the types
> into other people's type systems. And my opinion is that this is just wrong.

I disagree. For me, the type system is part of the API of the component.
You need the API in order to use the component.

> A framework like Uima has to make it easy to reuse components and in my
> opinion strict compile time typing makes that really difficult to achieve.

Components are already very reusable if they use the same typesystem.
Again, that has nothing to do with compile time typing.

And, this is not the only purpose but there are many more, e.g., allow
the developer to create large maintainable pipelines.
In my opinion, components in UIMA are much more reusable because of the
static typing, not just throw-away prototypes.


>
> But maybe I am wrong and people have some good examples of Jcas based AEs
> which are nice to reuse in a simple custom pipeline.
>

I am not interested in simple pipelines. I do not need UIMA for that. I
need UIMA for pipelines with dozens of components developed by different
people.


Good example of a JCas-based AE: every AE that was developed to solve a
specific task for well-defined semantics defined in a type system.


Jörn, if you do not provide constructive arguments or a specific
reasonable use case, but just complain, there is no reason for me to
continue this discussion.


Best,

Peter


> Jörn
>
> On Sep 9, 2016 17:58, "Marshall Schor" <ms...@schor.com> wrote:
>
>> Hi Jörn,
>>
>> I think people really want to understand the details about why:
>>
>> a) you are forced to use JCas just because an AE is using it (I assume you
>> must
>> mean, outside of that AE).   Other AEs don't need to use the JCas, so I'm
>> misunderstanding something I think, about your point.
>>
>> b) why JCas isn't acceptable (specifics?).  I understand wanting APIs that
>> allow
>> "dynamic" specifications of types, which UIMA has in its plain CAS APIs. It
>> sounds like you see no benefit from what I'll call "compile-time" static
>> typing
>> style of APIs, which the JCas implies; is that what we're discussing?
>>
>> Thanks again for your input!
>>
>> -Marshall
>> On 9/9/2016 10:09 AM, Joern Kottmann wrote:
>>> Well you are forced to use it when you have to use an AE using it.
>>> I think the problem with the JCas is that people think, because we are
>>> offering it as part of UIMA, that is is acceptable to use it, but the
>> truth
>>> is it really isn't. If it would be me deciding, the JCas would be the
>> first
>>> thing
>>> I would throw away for UIMA 3 (and also many other things).
>>>
>>> Jörn
>>>
>>> On Fri, Sep 9, 2016 at 4:00 PM, Richard Eckart de Castilho <
>> rec@apache.org>
>>> wrote:
>>>
>>>> On 09.09.2016, at 15:49, Peter Klügl <pe...@averbis.com> wrote:
>>>>> I get the point with code generation though.
>>>>>
>>>>> On 09.09.2016 at 15:11, Joern Kottmann wrote:
>>>>>> I am personally think the convenience the JCas brings is outweighed
>> many
>>>>>> times by all the complexity
>>>>>> and disadvantages which come with it, e.g. code generation step,
>> having
>>>>>> extra special classes and mostly impossible
>>>>>> to reuse the written code.
>>>> Again, nothing forces anybody to actually make use of the JCas.
>>>> If it does not match your taste, then do not use it.
>>>>
>>>> If you find the CAS interface to be lacking some convenience,
>>>> check out the getFeature() and setFeature() methods in uimaFIT FSUtil
>>>> and also the CasUtil select* methods in uimaFIT.
>>>>
>>>> Cheers,
>>>>
>>>> -- Richard
>>

Re: "Standard" UIMA typesystem

Posted by Joern Kottmann <ko...@gmail.com>.

My point is about b): when someone is using JCas, the compile-time static
typing forces these types into my type system and the classes have to
be on my classpath. It is impossible to use JCas and not force the types
into other people's type systems. And my opinion is that this is just wrong.

A framework like UIMA has to make it easy to reuse components and in my
opinion strict compile-time typing makes that really difficult to achieve.

But maybe I am wrong and people have some good examples of JCas-based AEs
which are nice to reuse in a simple custom pipeline.

Jörn

On Sep 9, 2016 17:58, "Marshall Schor" <ms...@schor.com> wrote:

> Hi Jörn,
>
> I think people really want to understand the details about why:
>
> a) you are forced to use JCas just because an AE is using it (I assume you
> must
> mean, outside of that AE).   Other AEs don't need to use the JCas, so I'm
> misunderstanding something I think, about your point.
>
> b) why JCas isn't acceptable (specifics?).  I understand wanting APIs that
> allow
> "dynamic" specifications of types, which UIMA has in its plain CAS APIs. It
> sounds like you see no benefit from what I'll call "compile-time" static
> typing
> style of APIs, which the JCas implies; is that what we're discussing?
>
> Thanks again for your input!
>
> -Marshall
> On 9/9/2016 10:09 AM, Joern Kottmann wrote:
> > Well you are forced to use it when you have to use an AE using it.
> > I think the problem with the JCas is that people think, because we are
> > offering it as part of UIMA, that is is acceptable to use it, but the
> truth
> > is it really isn't. If it would be me deciding, the JCas would be the
> first
> > thing
> > I would throw away for UIMA 3 (and also many other things).
> >
> > Jörn
> >
> > On Fri, Sep 9, 2016 at 4:00 PM, Richard Eckart de Castilho <
> rec@apache.org>
> > wrote:
> >
> >> On 09.09.2016, at 15:49, Peter Klügl <pe...@averbis.com> wrote:
> >>> I get the point with code generation though.
> >>>
> >>> On 09.09.2016 at 15:11, Joern Kottmann wrote:
> >>>> I am personally think the convenience the JCas brings is outweighed
> many
> >>>> times by all the complexity
> >>>> and disadvantages which come with it, e.g. code generation step,
> having
> >>>> extra special classes and mostly impossible
> >>>> to reuse the written code.
> >> Again, nothing forces anybody to actually make use of the JCas.
> >> If it does not match your taste, then do not use it.
> >>
> >> If you find the CAS interface to be lacking some convenience,
> >> check out the getFeature() and setFeature() methods in uimaFIT FSUtil
> >> and also the CasUtil select* methods in uimaFIT.
> >>
> >> Cheers,
> >>
> >> -- Richard
>
>

Re: "Standard" UIMA typesystem

Posted by Marshall Schor <ms...@schor.com>.

Hi Jörn,

I think people really want to understand the details about why:

a) you are forced to use JCas just because an AE is using it (I assume you must
mean, outside of that AE).   Other AEs don't need to use the JCas, so I'm
misunderstanding something I think, about your point.

b) why JCas isn't acceptable (specifics?).  I understand wanting APIs that allow
"dynamic" specifications of types, which UIMA has in its plain CAS APIs. It
sounds like you see no benefit from what I'll call "compile-time" static typing
style of APIs, which the JCas implies; is that what we're discussing?

Thanks again for your input!

-Marshall
On 9/9/2016 10:09 AM, Joern Kottmann wrote:
> Well you are forced to use it when you have to use an AE using it.
> I think the problem with the JCas is that people think, because we are
> offering it as part of UIMA, that is is acceptable to use it, but the truth
> is it really isn't. If it would be me deciding, the JCas would be the first
> thing
> I would throw away for UIMA 3 (and also many other things).
>
> Jörn
>
> On Fri, Sep 9, 2016 at 4:00 PM, Richard Eckart de Castilho <re...@apache.org>
> wrote:
>
>> On 09.09.2016, at 15:49, Peter Klügl <pe...@averbis.com> wrote:
>>> I get the point with code generation though.
>>>
>>> On 09.09.2016 at 15:11, Joern Kottmann wrote:
>>>> I am personally think the convenience the JCas brings is outweighed many
>>>> times by all the complexity
>>>> and disadvantages which come with it, e.g. code generation step, having
>>>> extra special classes and mostly impossible
>>>> to reuse the written code.
>> Again, nothing forces anybody to actually make use of the JCas.
>> If it does not match your taste, then do not use it.
>>
>> If you find the CAS interface to be lacking some convenience,
>> check out the getFeature() and setFeature() methods in uimaFIT FSUtil
>> and also the CasUtil select* methods in uimaFIT.
>>
>> Cheers,
>>
>> -- Richard

Re: "Standard" UIMA typesystem

Posted by Joern Kottmann <ko...@gmail.com>.

Well, you are forced to use it when you have to use an AE using it.
I think the problem with the JCas is that people think, because we are
offering it as part of UIMA, that it is acceptable to use it, but the truth
is it really isn't. If it were up to me, the JCas would be the first
thing I would throw away for UIMA 3 (and also many other things).

Jörn

On Fri, Sep 9, 2016 at 4:00 PM, Richard Eckart de Castilho <re...@apache.org>
wrote:

> On 09.09.2016, at 15:49, Peter Klügl <pe...@averbis.com> wrote:
> >
> > I get the point with code generation though.
> >
> > On 09.09.2016 at 15:11, Joern Kottmann wrote:
> >> I am personally think the convenience the JCas brings is outweighed many
> >> times by all the complexity
> >> and disadvantages which come with it, e.g. code generation step, having
> >> extra special classes and mostly impossible
> >> to reuse the written code.
>
> Again, nothing forces anybody to actually make use of the JCas.
> If it does not match your taste, then do not use it.
>
> If you find the CAS interface to be lacking some convenience,
> check out the getFeature() and setFeature() methods in uimaFIT FSUtil
> and also the CasUtil select* methods in uimaFIT.
>
> Cheers,
>
> -- Richard

Re: "Standard" UIMA typesystem

Posted by Richard Eckart de Castilho <re...@apache.org>.

On 09.09.2016, at 15:49, Peter Klügl <pe...@averbis.com> wrote:
> 
> I get the point with code generation though.
> 
> On 09.09.2016 at 15:11, Joern Kottmann wrote:
>> I am personally think the convenience the JCas brings is outweighed many
>> times by all the complexity
>> and disadvantages which come with it, e.g. code generation step, having
>> extra special classes and mostly impossible
>> to reuse the written code.

Again, nothing forces anybody to actually make use of the JCas.
If it does not match your taste, then do not use it.

If you find the CAS interface to be lacking some convenience,
check out the getFeature() and setFeature() methods in uimaFIT FSUtil
and also the CasUtil select* methods in uimaFIT.

Cheers,

-- Richard

Re: "Standard" UIMA typesystem

Posted by Peter Klügl <pe...@averbis.com>.

I still don't get it.


You can reuse all components if you include some type system mapping
that knows the transformation. I combined some of our components with
DKPro Core, ClearTK, cTAKES and JCore components.


Our components won't work with the DKPro Core typesystem even if the
UIMA framework supported more "laziness" or dynamic typing (for
different reasons, e.g., missing information).


I get the point with code generation though.


Best,

Peter


On 09.09.2016 at 15:11, Joern Kottmann wrote:
> A very good reason to use a framework like UIMA is that we can reuse
> components
> and don't have to build everything from scratch (if I have to do that I
> don't need UIMA these days).
>
> To be able to reuse a component it must work with multiple type systems or
> can easily be adapted
> to a custom type system.
>
> I am personally think the convenience the JCas brings is outweighed many
> times by all the complexity
> and disadvantages which come with it, e.g. code generation step, having
> extra special classes and mostly impossible
> to reuse the written code.
>
> Jörn
>
> On Fri, Sep 9, 2016 at 2:37 PM, Peter Klügl <pe...@averbis.com>
> wrote:
>
>> How should this be solved/improved? I do not see it.
>>
>> You have either generic analysis engines with parameters for the types,
>> or the analysis engine knows the types and depends on it, regardless if
>> you use CAS or JCas.
>>
>> Isn't that the thing with static typed feature structures? If you have
>> Java code that depends on a class hierarchy, you are stuck with that
>> hierarchy. (I hope this discussion won't go in a direction that
>> dynamically typed programming languages are  better)
>>
>>
>> I probably do not understand the motivation. Can you give me an example?
>>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>>
>> On 09.09.2016 at 13:57, Joern Kottmann wrote:
>>> I personally think that we depend way too much on particular type systems
>>> in UIMA. I really hope we can solve this to some degree in UIMA 3, if I
>>> today
>>> write code using JCAS I am totally stuck with the TS I use, reusing any
>> of
>>> that
>>> code with a different TS is impossible.
>>>
>>> The best you can do is using just the CAS, but then it is still difficult
>>> to support
>>> multiple type systems (e.g. complex configuration, various styles) and
>> allow
>>> reusing of the component in different systems.
>>>
>>> Jörn
>>>
>>> On Tue, Aug 30, 2016 at 7:57 PM, Richard Eckart de Castilho <
>> rec@apache.org>
>>> wrote:
>>>
>>>> On 30.08.2016, at 16:39, Peter Klügl <pe...@averbis.com> wrote:
>>>>> If there no standard type system, then people have two options: create
>>>>> their own one or reuse an existing type system of a component
>>>>> repository, e.g., DKPro Core. As far as I know LiMoSINe [1] moved  from
>>>>> their own type system to DKPro Core (I waiting for some text to put on
>>>>> our external resources page - in case they read this).  I also was
>>>>> thinking about switching our NLP components to the DKPro Core type
>>>>> system, but there are several issues preventing that, first of all that
>>>>> I cannot build it :-/
>>>> /me Apache/UIMA hat off, DKPro Core hat on
>>>>
>>>> Ok... I am finally addressing this annoying Windowsisim...
>>>>
>>>>   https://github.com/dkpro/dkpro-core/issues/414
>>>>
>>>> Btw. feel free to submit issues for DKPro Core type system improvements.
>>>> We actually do evolve the TS - trying to avoid breaking changes...
>>>>
>>>> Cheers,
>>>>
>>>> -- Richard
>>

Re: "Standard" UIMA typesystem

Posted by Joern Kottmann <ko...@gmail.com>.

A very good reason to use a framework like UIMA is that we can reuse
components
and don't have to build everything from scratch (if I have to do that I
don't need UIMA these days).

To be reusable, a component must work with multiple type systems or
be easily adaptable to a custom type system.

I personally think the convenience the JCas brings is outweighed many
times over by all the complexity and disadvantages which come with it,
e.g. the code generation step, the extra special classes, and the fact
that it is mostly impossible to reuse the written code.

Jörn

On Fri, Sep 9, 2016 at 2:37 PM, Peter Klügl <pe...@averbis.com>
wrote:

> How should this be solved/improved? I do not see it.
>
> You have either generic analysis engines with parameters for the types,
> or the analysis engine knows the types and depends on it, regardless if
> you use CAS or JCas.
>
> Isn't that the thing with static typed feature structures? If you have
> Java code that depends on a class hierarchy, you are stuck with that
> hierarchy. (I hope this discussion won't go in a direction that
> dynamically typed programming languages are  better)
>
>
> I probably do not understand the motivation. Can you give me an example?
>
>
> Best,
>
>
> Peter
>
>
>
> On 09.09.2016 at 13:57, Joern Kottmann wrote:
> > I personally think that we depend way too much on particular type systems
> > in UIMA. I really hope we can solve this to some degree in UIMA 3, if I
> > today
> > write code using JCAS I am totally stuck with the TS I use, reusing any
> of
> > that
> > code with a different TS is impossible.
> >
> > The best you can do is using just the CAS, but then it is still difficult
> > to support
> > multiple type systems (e.g. complex configuration, various styles) and
> allow
> > reusing of the component in different systems.
> >
> > Jörn
> >
> > On Tue, Aug 30, 2016 at 7:57 PM, Richard Eckart de Castilho <
> rec@apache.org>
> > wrote:
> >
> >> On 30.08.2016, at 16:39, Peter Klügl <pe...@averbis.com> wrote:
> >>> If there no standard type system, then people have two options: create
> >>> their own one or reuse an existing type system of a component
> >>> repository, e.g., DKPro Core. As far as I know LiMoSINe [1] moved  from
> >>> their own type system to DKPro Core (I waiting for some text to put on
> >>> our external resources page - in case they read this).  I also was
> >>> thinking about switching our NLP components to the DKPro Core type
> >>> system, but there are several issues preventing that, first of all that
> >>> I cannot build it :-/
> >> /me Apache/UIMA hat off, DKPro Core hat on
> >>
> >> Ok... I am finally addressing this annoying Windowsisim...
> >>
> >>   https://github.com/dkpro/dkpro-core/issues/414
> >>
> >> Btw. feel free to submit issues for DKPro Core type system improvements.
> >> We actually do evolve the TS - trying to avoid breaking changes...
> >>
> >> Cheers,
> >>
> >> -- Richard
>
>

Re: "Standard" UIMA typesystem

Posted by Peter Klügl <pe...@averbis.com>.

How should this be solved/improved? I do not see it.

You have either generic analysis engines with parameters for the types,
or the analysis engine knows the types and depends on them, regardless of
whether you use CAS or JCas.
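
To illustrate the two options, here is a minimal plain Java sketch. It is not
UIMA code and all names in it are made up: the Token class stands in for a
compiled type, and a Map with a "type" entry stands in for an FS that is
accessed by name.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch contrasting a type-bound and a parameterized component.
final class GenericVsBoundSketch {

    /** Stand-in for a compiled type class: the type is fixed at compile time. */
    static final class Token {
        final String text;

        Token(String text) {
            this.text = text;
        }
    }

    /** Bound component: knows the Token class and only counts instances of it. */
    static int countBound(List<Object> index) {
        int count = 0;
        for (Object fs : index) {
            if (fs instanceof Token) {
                count++;
            }
        }
        return count;
    }

    /** Generic component: the token type name is a parameter resolved at run time. */
    static int countGeneric(List<Map<String, String>> index, String tokenTypeName) {
        int count = 0;
        for (Map<String, String> fs : index) {
            if (tokenTypeName.equals(fs.get("type"))) {
                count++;
            }
        }
        return count;
    }
}
```

The generic variant runs against any type name passed in, but it still
assumes something about the data it finds there, which is the dependency I
mean.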

Isn't that the thing with statically typed feature structures? If you have
Java code that depends on a class hierarchy, you are stuck with that
hierarchy. (I hope this discussion won't go in the direction that
dynamically typed programming languages are better.)


I probably do not understand the motivation. Can you give me an example?


Best,


Peter



On 09.09.2016 at 13:57, Joern Kottmann wrote:
> I personally think that we depend way too much on particular type systems
> in UIMA. I really hope we can solve this to some degree in UIMA 3, if I
> today
> write code using JCAS I am totally stuck with the TS I use, reusing any of
> that
> code with a different TS is impossible.
>
> The best you can do is using just the CAS, but then it is still difficult
> to support
> multiple type systems (e.g. complex configuration, various styles) and allow
> reusing of the component in different systems.
>
> Jörn
>
> On Tue, Aug 30, 2016 at 7:57 PM, Richard Eckart de Castilho <re...@apache.org>
> wrote:
>
>> On 30.08.2016, at 16:39, Peter Klügl <pe...@averbis.com> wrote:
>>> If there is no standard type system, then people have two options: create
>>> their own one or reuse an existing type system of a component
>>> repository, e.g., DKPro Core. As far as I know LiMoSINe [1] moved from
>>> their own type system to DKPro Core (I am waiting for some text to put on
>>> our external resources page - in case they read this). I also was
>>> thinking about switching our NLP components to the DKPro Core type
>>> system, but there are several issues preventing that, first of all that
>>> I cannot build it :-/
>> /me Apache/UIMA hat off, DKPro Core hat on
>>
>> Ok... I am finally addressing this annoying Windowsism...
>>
>>   https://github.com/dkpro/dkpro-core/issues/414
>>
>> Btw. feel free to submit issues for DKPro Core type system improvements.
>> We actually do evolve the TS - trying to avoid breaking changes...
>>
>> Cheers,
>>
>> -- Richard

Re: "Standard" UIMA typesystem

Posted by Marshall Schor <ms...@schor.com>.

Following on Peter's comment - is the desire here for alternatives supporting
"dynamic" typing?

Some computer language analogies: Java (strong static typing), Groovy (some
combination of dynamic typing and static typing), JavaScript (dynamic),
TypeScript (a variant of JavaScript that adds static typing + classes, but
compiles to JavaScript).

UIMA makes some attempt to allow flexibility.  One example: suppose you have a
complex type system in which Token is defined with 30 features, and some old
code in which Token is defined with only 3 features (with the same supertypes).
If those 3 features are the same as 3 of the features in the more complex
version, then a "simple" annotator written against the 3-feature kind can
still be used.
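
A rough sketch of that compatibility rule, reduced to feature names only.
This is not UIMA code: the class FeatureSubsetSketch and the feature names
are invented for illustration, and the real check would also have to
compare feature range types and supertypes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class FeatureSubsetSketch {
    // Features the "simple" annotator was written against (hypothetical names).
    static final Set<String> SIMPLE_TOKEN =
        new HashSet<>(Arrays.asList("pos", "lemma", "stem"));

    // The old code can run if every feature it expects is also defined
    // on the deployed version of the type.
    static boolean canRunAgainst(Set<String> required, Set<String> deployed) {
        return deployed.containsAll(required);
    }

    // A richer Token: the same 3 features plus 27 more, 30 in total.
    static Set<String> richToken() {
        Set<String> features = new HashSet<>(SIMPLE_TOKEN);
        for (int i = 4; i <= 30; i++) {
            features.add("feature" + i);
        }
        return features;
    }

    public static void main(String[] args) {
        System.out.println(canRunAgainst(SIMPLE_TOKEN, richToken()));
        System.out.println(canRunAgainst(richToken(), SIMPLE_TOKEN));
    }
}
```

The check is one-directional: the 3-feature annotator fits the 30-feature
Token, but an annotator that needs all 30 features would not fit the
3-feature one.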

The reuse you're considering goes beyond these constraints, correct?

-Marshall

On 9/9/2016 7:57 AM, Joern Kottmann wrote:
> I personally think that we depend way too much on particular type systems
> in UIMA. I really hope we can solve this to some degree in UIMA 3, if I
> today write code using JCAS I am totally stuck with the TS I use, reusing
> any of that code with a different TS is impossible.
>
> The best you can do is using just the CAS, but then it is still difficult
> to support multiple type systems (e.g. complex configuration, various
> styles) and allow reusing of the component in different systems.
>
> Jörn
>
> On Tue, Aug 30, 2016 at 7:57 PM, Richard Eckart de Castilho <re...@apache.org>
> wrote:
>
>> On 30.08.2016, at 16:39, Peter Klügl <pe...@averbis.com> wrote:
>>> If there is no standard type system, then people have two options: create
>>> their own one or reuse an existing type system of a component
>>> repository, e.g., DKPro Core. As far as I know LiMoSINe [1] moved from
>>> their own type system to DKPro Core (I am waiting for some text to put on
>>> our external resources page - in case they read this). I also was
>>> thinking about switching our NLP components to the DKPro Core type
>>> system, but there are several issues preventing that, first of all that
>>> I cannot build it :-/
>> /me Apache/UIMA hat off, DKPro Core hat on
>>
>> Ok... I am finally addressing this annoying Windowsism...
>>
>>   https://github.com/dkpro/dkpro-core/issues/414
>>
>> Btw. feel free to submit issues for DKPro Core type system improvements.
>> We actually do evolve the TS - trying to avoid breaking changes...
>>
>> Cheers,
>>
>> -- Richard