Posted to dev@uima.apache.org by Marshall Schor <ms...@schor.com> on 2016/07/15 18:35:38 UTC

making the CAS Editor work with any UIMA supported serialization format

Hi,

I'm starting a separate thread on this :-)

We currently have the following kinds of serialized formats:

a) <only used for client-server communication>: various "delta" formats - these
require the original CAS be available.

b) various forms that can be saved to disk and reloaded.

----------

For (b), there are:

 - XCAS, XMI (these are xml based)

 - binary (plain, compressed form 4, form 6)

 - some compound forms using Java object serialization and encoding type systems
as well

----------

The binary forms are "self-identifying" for deserialization, so one piece of
deserializer code can "read" the format and pick the right deserializer.
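
To illustrate what "self-identifying" can mean in practice, here is a minimal
Java sketch that classifies a stream by its leading bytes. The class and enum
names (FormatSniffer, Family) are hypothetical and not part of uimaj-core. The
0xACED prefix is the standard Java object serialization magic number; the
4-byte "UIMA" marker for the binary forms is an assumption made for this
sketch, and the XML check assumes no byte-order mark or leading whitespace.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; the names are illustrative, not UIMA API. */
final class FormatSniffer {

  /** Coarse format families a generic reader could dispatch on. */
  enum Family { XML, BINARY_CAS, JAVA_SERIALIZED, UNKNOWN }

  /** Classifies content by its leading bytes (pass at least the first four). */
  static Family sniff(byte[] head) {
    if (head == null || head.length == 0) {
      return Family.UNKNOWN;
    }
    // XCAS and XMI are XML, so the first byte is assumed to be '<'.
    if (head[0] == '<') {
      return Family.XML;
    }
    // Java object serialization streams start with the magic number 0xACED.
    if (head.length >= 2 && (head[0] & 0xFF) == 0xAC && (head[1] & 0xFF) == 0xED) {
      return Family.JAVA_SERIALIZED;
    }
    // Assumption for this sketch: binary forms start with the ASCII bytes "UIMA".
    if (head.length >= 4
        && "UIMA".equals(new String(head, 0, 4, StandardCharsets.US_ASCII))) {
      return Family.BINARY_CAS;
    }
    return Family.UNKNOWN;
  }
}
```

When reading from a stream rather than a byte array, the same check could be
applied to bytes peeked through a BufferedInputStream using mark(4) and
reset(), so that the chosen deserializer still sees the stream from the start.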

Is it correct to assume the request for enhancement from Peter for the CAS
Editor is to have that support the various forms (b)?  If so, which ones does it
not support at the moment?

-Marshall


Re: making the CAS Editor work with any UIMA supported serialization format

Posted by Peter Klügl <pe...@averbis.com>.
Hi,


The functionality is implemented in uimaj-core, the CAS Editor, and UIMA Ruta.


I added default file extension info to the format enums, which we should
discuss.
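
As a basis for that discussion, a format enum carrying a default file
extension might look roughly like the sketch below. The enum name
CasFileFormat, its constants, and the extension strings are placeholders
chosen for illustration only; they are not the values committed to uimaj-core.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical format enum; the extensions are placeholders still open for discussion. */
enum CasFileFormat {
  XMI("xmi"),
  XCAS("xcas"),
  BINARY("bcas");

  private final String defaultExtension;

  CasFileFormat(String defaultExtension) {
    this.defaultExtension = defaultExtension;
  }

  /** The extension (without the dot) suggested when saving in this format. */
  String defaultExtension() {
    return defaultExtension;
  }

  /** Guesses the format from a file name, ignoring case; empty if nothing matches. */
  static Optional<CasFileFormat> fromFileName(String fileName) {
    int dot = fileName.lastIndexOf('.');
    if (dot < 0 || dot == fileName.length() - 1) {
      return Optional.empty();
    }
    String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    for (CasFileFormat format : values()) {
      if (format.defaultExtension.equals(ext)) {
        return Optional.of(format);
      }
    }
    return Optional.empty();
  }
}
```

A lookup by extension like this is only a hint for file dialogs and default
names; the content itself should still decide which deserializer is used.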


I plan to do some additional testing in a use case with large CAS files,
but apart from that (and the file extensions) I think it is ready for the
release. The test coverage could be higher, but on the other hand there
are already tests for the serializers.


Best,


Peter



On 17.07.2016 at 04:54, Marshall Schor wrote:
> Sounds good. 
>
> If this can get done in a few days, I guess we can hold up the release. 
> Otherwise, I'd like to go ahead with the release, and pick this improvement up
> on the next one.
>
> Cheers. -Marshall
>
>
> On 7/16/2016 6:55 AM, Peter Klügl wrote:
>> Hi,
>>
>>
>> Yes, it's (b). Right now, the CAS Editor only supports XMI and XCAS, as
>> far as I remember. No binary format is supported.
>>
>>
>> I am thinking about some utils class for CAS serialization in
>> uimaj-core. The read method should be somewhat generic (detecting the
>> format), the write method takes the format as argument.
>>
>>
>> This utils class can then be utilized in the CAS Editor (and also by
>> others) to read and write the CAS files (with the format it was read).
>>
>>
>> Best,
>>
>> Peter
>>
>>
>> On 15.07.2016 at 20:35, Marshall Schor wrote:
>>> Hi,
>>>
>>> I'm starting a separate thread on this :-)
>>>
>>> We currently have the following kinds of serialized formats
>>>
>>> a) <only used for client-server communication>: various "delta" formats - these
>>> require the original CAS be available.
>>>
>>> b) various forms that can be saved to disk and reloaded.
>>>
>>> ----------
>>>
>>> For (b), there are:
>>>
>>>  - XCAS, XMI (these are xml based)
>>>
>>>  - binary (plain, compressed form 4, form 6)
>>>
>>>  - some compound forms using Java object serialization and encoding type systems
>>> as well
>>>
>>> ----------
>>>
>>> The binary forms are "self-identifying" for deserialization, so one piece of
>>> deserializer code can "read" the format and pick the right deserializer.
>>>
>>> Is it correct to assume the request for enhancement from Peter for the CAS
>>> Editor is to have that support the various forms (b)?  If so, which ones does it
>>> not support at the moment?
>>>
>>> -Marshall
>>>


Re: making the CAS Editor work with any UIMA supported serialization format

Posted by Marshall Schor <ms...@schor.com>.
Sounds good. 

If this can get done in a few days, I guess we can hold up the release. 
Otherwise, I'd like to go ahead with the release, and pick this improvement up
on the next one.

Cheers. -Marshall


On 7/16/2016 6:55 AM, Peter Klügl wrote:
> Hi,
>
>
> Yes, it's (b). Right now, the CAS Editor only supports XMI and XCAS, as
> far as I remember. No binary format is supported.
>
>
> I am thinking about some utils class for CAS serialization in
> uimaj-core. The read method should be somewhat generic (detecting the
> format), the write method takes the format as argument.
>
>
> This utils class can then be utilized in the CAS Editor (and also by
> others) to read and write the CAS files (with the format it was read).
>
>
> Best,
>
> Peter
>
>
> On 15.07.2016 at 20:35, Marshall Schor wrote:
>> Hi,
>>
>> I'm starting a separate thread on this :-)
>>
>> We currently have the following kinds of serialized formats
>>
>> a) <only used for client-server communication>: various "delta" formats - these
>> require the original CAS be available.
>>
>> b) various forms that can be saved to disk and reloaded.
>>
>> ----------
>>
>> For (b), there are:
>>
>>  - XCAS, XMI (these are xml based)
>>
>>  - binary (plain, compressed form 4, form 6)
>>
>>  - some compound forms using Java object serialization and encoding type systems
>> as well
>>
>> ----------
>>
>> The binary forms are "self-identifying" for deserialization, so one piece of
>> deserializer code can "read" the format and pick the right deserializer.
>>
>> Is it correct to assume the request for enhancement from Peter for the CAS
>> Editor is to have that support the various forms (b)?  If so, which ones does it
>> not support at the moment?
>>
>> -Marshall
>>
>


Re: making the CAS Editor work with any UIMA supported serialization format

Posted by Peter Klügl <pe...@averbis.com>.
Hi,


Yes, it's (b). Right now, the CAS Editor only supports XMI and XCAS, as
far as I remember. No binary format is supported.


I am thinking about some utils class for CAS serialization in
uimaj-core. The read method should be somewhat generic (detecting the
format), the write method takes the format as argument.


This utils class can then be utilized in the CAS Editor (and also by
others) to read and write the CAS files (with the format it was read).
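
The intended shape of such a utils class (a read that detects the format and
reports it, a write that takes the format as an argument) can be sketched
with a toy example. Everything below is hypothetical: ToyCasIO, its two toy
formats, and the "TOY1" marker merely stand in for the real CAS formats so
that the sketch runs without any UIMA dependency, and the XML-like format
does no escaping.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Toy stand-in for a CAS I/O utils class; not the uimaj-core implementation. */
final class ToyCasIO {

  enum Format { XML_LIKE, TAGGED_BINARY }

  /** Result of a read: the content plus the format it was stored in. */
  static final class Loaded {
    final String content;
    final Format format;

    Loaded(String content, Format format) {
      this.content = content;
      this.format = format;
    }
  }

  private static final byte[] MAGIC = {'T', 'O', 'Y', '1'};
  private static final String OPEN = "<toy>";
  private static final String CLOSE = "</toy>";

  /** Generic read: detects the format from the data itself and reports it. */
  static Loaded read(byte[] data) {
    if (data.length >= MAGIC.length
        && Arrays.equals(Arrays.copyOf(data, MAGIC.length), MAGIC)) {
      String body = new String(data, MAGIC.length, data.length - MAGIC.length,
          StandardCharsets.UTF_8);
      return new Loaded(body, Format.TAGGED_BINARY);
    }
    String text = new String(data, StandardCharsets.UTF_8);
    if (text.length() >= OPEN.length() + CLOSE.length()
        && text.startsWith(OPEN) && text.endsWith(CLOSE)) {
      String body = text.substring(OPEN.length(), text.length() - CLOSE.length());
      return new Loaded(body, Format.XML_LIKE);
    }
    throw new IllegalArgumentException("unrecognized format");
  }

  /** Write: the caller states the format explicitly. */
  static byte[] write(String content, Format format) {
    if (format == Format.TAGGED_BINARY) {
      byte[] body = content.getBytes(StandardCharsets.UTF_8);
      byte[] out = Arrays.copyOf(MAGIC, MAGIC.length + body.length);
      System.arraycopy(body, 0, out, MAGIC.length, body.length);
      return out;
    }
    return (OPEN + content + CLOSE).getBytes(StandardCharsets.UTF_8);
  }
}
```

An editor built on this would keep the Format returned by read and pass it
back to write when saving, so a file keeps the format it was opened with.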


Best,

Peter


On 15.07.2016 at 20:35, Marshall Schor wrote:
> Hi,
>
> I'm starting a separate thread on this :-)
>
> We currently have the following kinds of serialized formats
>
> a) <only used for client-server communication>: various "delta" formats - these
> require the original CAS be available.
>
> b) various forms that can be saved to disk and reloaded.
>
> ----------
>
> For (b), there are:
>
>  - XCAS, XMI (these are xml based)
>
>  - binary (plain, compressed form 4, form 6)
>
>  - some compound forms using Java object serialization and encoding type systems
> as well
>
> ----------
>
> The binary forms are "self-identifying" for deserialization, so one piece of
> deserializer code can "read" the format and pick the right deserializer.
>
> Is it correct to assume the request for enhancement from Peter for the CAS
> Editor is to have that support the various forms (b)?  If so, which ones does it
> not support at the moment?
>
> -Marshall
>


Re: making the CAS Editor work with any UIMA supported serialization format

Posted by Tami Takamiya <tt...@us.ibm.com>.
It would be great if the authors of Argo could make their RDF
serializer/deserializer code open.
The ideas presented in their articles (e.g. writing annotators in SPARQL)
are very clever.

I saw the UIMA RDF CAS Consumer

https://uima.apache.org/downloads/sandbox/RDF_CC/RDFCASConsumerUserGuide.html

which uses code from Apache Clerezza.  But it does not seem to have been
updated recently, and it does not have a deserialization feature.  Although I
thought I might be able to re-invent (inferior) wheels by myself, I had
neither the guts nor the time.

Tami (Masaaki) Takamiya


Richard Eckart de Castilho <re...@apache.org> wrote on 07/16/2016 04:21:16 AM:

> From: Richard Eckart de Castilho <re...@apache.org>
> To: dev@uima.apache.org
> Date: 07/16/2016 04:21 AM
> Subject: Re: making the CAS Editor work with any UIMA supported 
> serialization format
> 
> Hi,
> 
> presently, Argo is closed source - although I hope that the authors 
> will change that in the future (at least partially).
> 
> Best,
> 
> -- Richard
> 
> > On 15.07.2016, at 21:32, Marshall Schor <ms...@schor.com> wrote:
> > 
> > Hi Tami,
> > 
> > We would welcome a contribution for this.
> > 
> > Can you investigate if the work done to support this for the Argo system
> > could be reused (what kind of license, etc.)?
> > 
> > -Marshall
> 



Re: making the CAS Editor work with any UIMA supported serialization format

Posted by Richard Eckart de Castilho <re...@apache.org>.
Hi,

presently, Argo is closed source - although I hope that the authors will change that in the future (at least partially).

Best,

-- Richard

> On 15.07.2016, at 21:32, Marshall Schor <ms...@schor.com> wrote:
> 
> Hi Tami,
> 
> We would welcome a contribution for this.
> 
> Can you investigate if the work done to support this for the Argo system could
> be reused (what kind of license, etc.)?
> 
> -Marshall


Re: making the CAS Editor work with any UIMA supported serialization format

Posted by Marshall Schor <ms...@schor.com>.
Hi Tami,

We would welcome a contribution for this.

Can you investigate if the work done to support this for the Argo system could
be reused (what kind of license, etc.)?

-Marshall


On 7/15/2016 3:01 PM, Tami Takamiya wrote:
> Marshall Schor <ms...@schor.com> wrote on 07/15/2016 02:35:38 PM:
>
>> b) various forms that can be saved to disk and reloaded.
>>
>> ----------
>>
>> For (b), there are:
>>
>>  - XCAS, XMI (these are xml based)
>>
>>  - binary (plain, compressed form 4, form 6)
>>
>>  - some compound forms using Java object serialization and encoding type
>> systems as well
>>
>> ----------
> Although it is off-topic, I wish that UIMA could add RDF as a supported
> serialization format as I see in this article 
> (http://www.aclweb.org/anthology/P13-4020) on an implementation in
> the Argo system. 
>
> Because an RDF graph can contain both type system and data, it would 
> be an ideal format for editing.  Also a standard query language (SPARQL) 
> exists for automated editing.
>
> Tami (Masaaki) Takamiya
>
>


Re: making the CAS Editor work with any UIMA supported serialization format

Posted by Tami Takamiya <tt...@us.ibm.com>.
Marshall Schor <ms...@schor.com> wrote on 07/15/2016 02:35:38 PM:

> b) various forms that can be saved to disk and reloaded.
> 
> ----------
> 
> For (b), there are:
> 
>  - XCAS, XMI (these are xml based)
> 
>  - binary (plain, compressed form 4, form 6)
> 
>  - some compound forms using Java object serialization and encoding type
> systems as well
> 
> ----------

Although it is off-topic, I wish that UIMA could add RDF as a supported
serialization format, as described in this article
(http://www.aclweb.org/anthology/P13-4020) on an implementation in
the Argo system.

Because an RDF graph can contain both type system and data, it would 
be an ideal format for editing.  Also a standard query language (SPARQL) 
exists for automated editing.

Tami (Masaaki) Takamiya