Posted to dev@uima.apache.org by Richard Eckart de Castilho <ri...@gmail.com> on 2013/08/02 20:29:31 UTC

Re: How to use the new binary CAS (de)serialization?

Hi,

I'm still trying to use the new serialization methods but continue
running into problems.

Last time we discussed that I need to know the original type system
when I want to deserialize a format 6 binary CAS into a CAS.

So when I serialize the CAS now, I first write a header, then I
dump the type system into my output stream, and then the binary CAS
using 

serializeWithCompression(cas, outputStream, cas.getTypeSystem());


When I read the data, I check for my header. If it is there, I
read the type system.
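
A minimal sketch of the framing described above, using only java.io so that
it runs without UIMA on the classpath. The class name FramedCasContainer, the
"MYTS" marker, and the length-prefixed layout are assumptions made for
illustration; plain byte arrays stand in for the serialized type system and
for the binary CAS that serializeWithCompression would write.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FramedCasContainer {
    // Hypothetical 4-byte marker for this sketch; not part of any UIMA format.
    static final byte[] MAGIC = "MYTS".getBytes(StandardCharsets.US_ASCII);

    /** Writes the marker, the length-prefixed type system bytes, then the payload. */
    public static byte[] write(byte[] typeSystemBytes, byte[] casBytes) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.write(MAGIC);
            out.writeInt(typeSystemBytes.length);
            out.write(typeSystemBytes);
            // Placeholder for: serializeWithCompression(cas, out, cas.getTypeSystem());
            out.write(casBytes);
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Returns the embedded type system bytes and leaves the stream positioned
     * at the payload. Returns null and rewinds if the marker is absent.
     */
    public static byte[] readTypeSystemIfPresent(InputStream in) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        try {
            in.mark(MAGIC.length);
            byte[] head = new byte[MAGIC.length];
            int n = 0;
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) {
                    break; // stream ended before a full marker was read
                }
                n += r;
            }
            if (n < head.length || !Arrays.equals(head, MAGIC)) {
                in.reset(); // no marker: hand the stream back untouched
                return null;
            }
            // DataInputStream does not buffer, so "in" stays exactly after the
            // type system bytes once this method returns.
            DataInputStream data = new DataInputStream(in);
            byte[] typeSystemBytes = new byte[data.readInt()];
            data.readFully(typeSystemBytes);
            return typeSystemBytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After readTypeSystemIfPresent returns, the same stream object would be the
one handed on to the binary CAS deserialization call.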

Now I wanted to call

deserializeCAS(cas, inputStream, typeSystem, null);

Unfortunately, that fails. The reason is that this overload of
deserializeCAS immediately uses BinaryCasSerDes6 to read
data from the input stream. However, serializeWithCompression
writes a header before the BinaryCasSerDes6 data. This
header is read by deserializeCAS(cas, inputStream), but
that overload gives me no way of specifying the original
type system.

Of course I can copy the whole header checking code from CASImpl,
but I don't think that is a good solution. I think the
deserializeCAS methods that UIMA provides should either all deal
with the header that the serializeWithCompression methods write,
or none should.

Maybe a fix for this dilemma could also go into a 2.4.2
release.

Cheers,

-- Richard

Re: How to use the new binary CAS (de)serialization?

Posted by Richard Eckart de Castilho <ri...@gmail.com>.
I have it working by now. My last issue was the subtyping of 
the document annotation, which was a problem in a unit test
that I wrote, but is unlikely to be a problem in actual use.

-- Richard

On 05.08.2013 at 17:40, Marshall Schor <ms...@schor.com> wrote:

> I think if you "pre-read" some info from a stream, and then pass that stream to
> the reinit (or other method of binary deserialization), it just continues
> reading from wherever the stream was positioned, so I think your approach ought
> to work...
> 
> -Marshall


Re: How to use the new binary CAS (de)serialization?

Posted by Marshall Schor <ms...@schor.com>.
I think if you "pre-read" some info from a stream, and then pass that stream to
the reinit (or other method of binary deserialization), it just continues
reading from wherever the stream was positioned, so I think your approach ought
to work...

-Marshall
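
The point above can be checked with plain java.io. The following is a sketch
with hypothetical names (StreamPositionDemo, readRemaining); readRemaining
stands in for a binary deserializer and is not a UIMA method. It also shows
one case where the position is not what one might expect: if the header is
pre-read through a buffering wrapper, the wrapper may consume more bytes from
the underlying stream than were requested.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamPositionDemo {
    /** Stand-in for a deserializer: reads whatever is left on the stream. */
    static byte[] readRemaining(InputStream in) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        int r;
        while ((r = in.read(buf)) >= 0) {
            bos.write(buf, 0, r);
        }
        return bos.toByteArray();
    }

    /** Pre-reads headerLen bytes without buffering, then delegates the same stream. */
    public static String preReadThenDelegate(byte[] data, int headerLen) {
        try {
            ByteArrayInputStream in = new ByteArrayInputStream(data);
            // DataInputStream reads exactly what is asked for, nothing more.
            new DataInputStream(in).readFully(new byte[headerLen]);
            return new String(readRemaining(in), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Pre-reads through a BufferedInputStream, then delegates the raw stream. */
    public static String preReadThroughBuffer(byte[] data, int headerLen) {
        try {
            ByteArrayInputStream raw = new ByteArrayInputStream(data);
            BufferedInputStream buffered = new BufferedInputStream(raw);
            new DataInputStream(buffered).readFully(new byte[headerLen]);
            // The buffer has already pulled the rest of the (small) input out
            // of "raw", so a reader given "raw" sees nothing.
            return new String(readRemaining(raw), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the input "HEADpayload" and a header length of 4, the first method
returns "payload" and the second returns an empty string. Whether buffering
played any role in the problem discussed in this thread is not established
here; the sketch only shows that the stream object passed on should be the
one the header was read from.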

On 8/2/2013 2:34 PM, Richard Eckart de Castilho wrote:
> Hm, I just noticed that my problem analysis was not quite correct.
> BinaryCasSerDes6 indeed is able to handle the header… so my problem
> must be somewhere else.
>
> -- Richard 


Re: How to use the new binary CAS (de)serialization?

Posted by Richard Eckart de Castilho <ri...@gmail.com>.
Hm, I just noticed that my problem analysis was not quite correct.
BinaryCasSerDes6 indeed is able to handle the header… so my problem
must be somewhere else.

-- Richard 
