Posted to dev@uima.apache.org by Richard Eckart de Castilho <ri...@gmail.com> on 2013/07/04 22:20:44 UTC

How to use the new binary CAS (de)serialization?

Hi Marshall,

I'd like to try out the new CAS (de)serialization stuff you did recently.

I try serialization with three approaches:

case 0: Serialization.serializeCAS(aJCas.getCas(), docOS); break;
case 4: Serialization.serializeWithCompression(aJCas.getCas(), docOS); break;
case 6: Serialization.serializeWithCompression(aJCas.getCas(), docOS, aJCas.getTypeSystem()); break;

Then I try to load the data back into a CAS (within a reader component) using

Serialization.deserializeCAS(aCAS, is);

For cases 0 and 4, this appears to work. But in case 6, the document text is <null> after
deserializing.

Apparently, I'm doing something wrong - but I have no idea what. Can you give me a hint?

-- Richard



Re: How to use the new binary CAS (de)serialization?

Posted by Richard Eckart de Castilho <ri...@gmail.com>.
I have it working by now. My last issue was the subtyping of 
the document annotation, which was a problem in a unit test
that I wrote, but is unlikely to be a problem in actual use.

-- Richard

On 05.08.2013 at 17:40, Marshall Schor <ms...@schor.com> wrote:

> I think if you "pre-read" some info from a stream, and then pass that stream to
> the reinit (or other method of binary deserialization), it just continues
> reading from wherever the stream was positioned, so I think your approach ought
> to work...
> 
> -Marshall
> 
> On 8/2/2013 2:34 PM, Richard Eckart de Castilho wrote:
>> Hm, I just noticed that my problem analysis was not quite correct.
>> BinaryCasSerDes6 indeed is able to handle the header… so my problem
>> must be somewhere else.
>> 
>> -- Richard 
>> 
>> On 02.08.2013 at 20:29, Richard Eckart de Castilho <ri...@gmail.com> wrote:
>> 
>>> Hi,
>>> 
>>> I'm still trying to use the new serialization methods but continue
>>> running into problems.
>>> 
>>> Last time we discussed that I need to know the original type system
>>> when I want to deserialize a format 6 binary CAS into a CAS.
>>> 
>>> So when I serialize the CAS now, I first write a header, then I
>>> dump the type system into my output stream, and then the binary CAS
>>> using 
>>> 
>>> serializeWithCompression(cas, outputStream, cas.getTypeSystem());
>>> 
>>> 
>>> When I read the data, I check for my header. If it is there, I
>>> read the type system.
>>> 
>>> Now I wanted to call
>>> 
>>> deserializeCAS(cas, inputStream, typeSystem, null);
>>> 
>>> Unfortunately, that fails. The reason is that this overload of
>>> deserializeCAS immediately uses BinaryCasSerDes6 to read
>>> data from the input stream. However, serializeWithCompression
>>> writes a header before the data that BinaryCasSerDes6 reads. This
>>> header is consumed by deserializeCAS(cas, inputStream), but
>>> that overload gives me no way of specifying the original
>>> type system.
>>> 
>>> Of course I can copy the whole header checking code from CASImpl,
>>> but I don't think that is a good solution. I think the
>>> deserializeCAS methods that UIMA provides should either all deal
>>> with the header that the serializeWithCompression methods write,
>>> or none should.
>>> 
>>> Maybe a solution for this dilemma is something that could also
>>> go into a 2.4.2 release.
>>> 
>>> Cheers,
>>> 
>>> -- Richard


Re: How to use the new binary CAS (de)serialization?

Posted by Marshall Schor <ms...@schor.com>.
I think if you "pre-read" some info from a stream, and then pass that stream to
the reinit (or other method of binary deserialization), it just continues
reading from wherever the stream was positioned, so I think your approach ought
to work...

-Marshall
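
A minimal sketch of the "pre-read, then hand over the same stream" idea, using only java.io. The class name, the magic number, and the length-prefixed layout are assumptions made up for illustration; they are not UIMA's header format, and readRest merely stands in for the binary deserializer that would consume the remaining bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class PreReadHeaderSketch {
    // Hypothetical marker for a custom container format (not a UIMA constant).
    static final int MAGIC = 0x52454331;

    /** Writes the marker, the length-prefixed type system bytes, then the payload. */
    static byte[] write(byte[] typeSystemBytes, byte[] payload) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeInt(MAGIC);
            out.writeInt(typeSystemBytes.length);
            out.write(typeSystemBytes);
            out.write(payload); // where the binary CAS bytes would go
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Pre-reads the custom header and leaves the stream positioned at the payload. */
    static byte[] readHeader(DataInputStream in) {
        try {
            if (in.readInt() != MAGIC) {
                throw new IOException("unexpected header");
            }
            byte[] typeSystemBytes = new byte[in.readInt()];
            in.readFully(typeSystemBytes);
            return typeSystemBytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stand-in for the downstream deserializer: consumes whatever is left. */
    static byte[] readRest(InputStream in) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }
}
```

One thing to watch: if the header is read through a buffering wrapper, pass that same wrapper on to the next reader. A buffered stream may already have pulled payload bytes out of the underlying stream.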

On 8/2/2013 2:34 PM, Richard Eckart de Castilho wrote:
> Hm, I just noticed that my problem analysis was not quite correct.
> BinaryCasSerDes6 indeed is able to handle the header… so my problem
> must be somewhere else.
>
> -- Richard 
>
> On 02.08.2013 at 20:29, Richard Eckart de Castilho <ri...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm still trying to use the new serialization methods but continue
>> running into problems.
>>
>> Last time we discussed that I need to know the original type system
>> when I want to deserialize a format 6 binary CAS into a CAS.
>>
>> So when I serialize the CAS now, I first write a header, then I
>> dump the type system into my output stream, and then the binary CAS
>> using 
>>
>> serializeWithCompression(cas, outputStream, cas.getTypeSystem());
>>
>>
>> When I read the data, I check for my header. If it is there, I
>> read the type system.
>>
>> Now I wanted to call
>>
>> deserializeCAS(cas, inputStream, typeSystem, null);
>>
>> Unfortunately, that fails. The reason is that this overload of
>> deserializeCAS immediately uses BinaryCasSerDes6 to read
>> data from the input stream. However, serializeWithCompression
>> writes a header before the data that BinaryCasSerDes6 reads. This
>> header is consumed by deserializeCAS(cas, inputStream), but
>> that overload gives me no way of specifying the original
>> type system.
>>
>> Of course I can copy the whole header checking code from CASImpl,
>> but I don't think that is a good solution. I think the
>> deserializeCAS methods that UIMA provides should either all deal
>> with the header that the serializeWithCompression methods write,
>> or none should.
>>
>> Maybe a solution for this dilemma is something that could also
>> go into a 2.4.2 release.
>>
>> Cheers,
>>
>> -- Richard
>


Re: How to use the new binary CAS (de)serialization?

Posted by Richard Eckart de Castilho <ri...@gmail.com>.
Hm, I just noticed that my problem analysis was not quite correct.
BinaryCasSerDes6 indeed is able to handle the header… so my problem
must be somewhere else.

-- Richard 

On 02.08.2013 at 20:29, Richard Eckart de Castilho <ri...@gmail.com> wrote:

> Hi,
> 
> I'm still trying to use the new serialization methods but continue
> running into problems.
> 
> Last time we discussed that I need to know the original type system
> when I want to deserialize a format 6 binary CAS into a CAS.
> 
> So when I serialize the CAS now, I first write a header, then I
> dump the type system into my output stream, and then the binary CAS
> using 
> 
> serializeWithCompression(cas, outputStream, cas.getTypeSystem());
> 
> 
> When I read the data, I check for my header. If it is there, I
> read the type system.
> 
> Now I wanted to call
> 
> deserializeCAS(cas, inputStream, typeSystem, null);
> 
> Unfortunately, that fails. The reason is that this overload of
> deserializeCAS immediately uses BinaryCasSerDes6 to read
> data from the input stream. However, serializeWithCompression
> writes a header before the data that BinaryCasSerDes6 reads. This
> header is consumed by deserializeCAS(cas, inputStream), but
> that overload gives me no way of specifying the original
> type system.
> 
> Of course I can copy the whole header checking code from CASImpl,
> but I don't think that is a good solution. I think the
> deserializeCAS methods that UIMA provides should either all deal
> with the header that the serializeWithCompression methods write,
> or none should.
> 
> Maybe a solution for this dilemma is something that could also
> go into a 2.4.2 release.
> 
> Cheers,
> 
> -- Richard


Re: How to use the new binary CAS (de)serialization?

Posted by Richard Eckart de Castilho <ri...@gmail.com>.
Hi,

I'm still trying to use the new serialization methods but continue
running into problems.

Last time we discussed that I need to know the original type system
when I want to deserialize a format 6 binary CAS into a CAS.

So when I serialize the CAS now, I first write a header, then I
dump the type system into my output stream, and then the binary CAS
using 

serializeWithCompression(cas, outputStream, cas.getTypeSystem());


When I read the data, I check for my header. If it is there, I
read the type system.

Now I wanted to call

deserializeCAS(cas, inputStream, typeSystem, null);

Unfortunately, that fails. The reason is that this overload of
deserializeCAS immediately uses BinaryCasSerDes6 to read
data from the input stream. However, serializeWithCompression
writes a header before the data that BinaryCasSerDes6 reads. This
header is consumed by deserializeCAS(cas, inputStream), but
that overload gives me no way of specifying the original
type system.

Of course I can copy the whole header checking code from CASImpl,
but I don't think that is a good solution. I think the
deserializeCAS methods that UIMA provides should either all deal
with the header that the serializeWithCompression methods write,
or none should.

Maybe a solution for this dilemma is something that could also
go into a 2.4.2 release.

Cheers,

-- Richard

Re: How to use the new binary CAS (de)serialization?

Posted by Marshall Schor <ms...@schor.com>.
On 7/8/2013 6:06 PM, Richard Eckart de Castilho wrote:
> On 08.07.2013 at 23:49, Marshall Schor <ms...@schor.com> wrote:
>
>>> The documentation says:
>>>
>>>> Deserialize with type filtering:
>>>>
>>>> The reuseInfo should be null unless deserializing a delta CAS, in which case, it must be the reuse info captured when the original CAS was serialized out. If the target type system is identical to the one in the CAS, you may pass null for it. If a delta cas is not being received, you must pass null for the reuseInfo.
>>>>
>>>> Serialization.deserializeCAS(cas, bais, tgtTypeSystem, reuseInfo);
>>> So I assume that when I deserialize my persisted CAS into a fresh one which doesn't contain any types, the only thing that should arrive is the SofA. But, no matter what serialization format I use (0, 4, or 6), I always get an ArrayIndexOutOfBoundsException.
>>>
>>> I create the target CAS like this:
>>>
>>>        CAS cas = CasCreationUtils.createCas((TypeSystemDescription) null, null, null);
>>>
>>> Format 6:
>>>
>>> java.lang.ArrayIndexOutOfBoundsException: 37
>>> 	at org.apache.uima.cas.impl.TypeSystemImpl.getTypeInfo(TypeSystemImpl.java:1566)
>>> 	at org.apache.uima.cas.impl.BinaryCasSerDes6.deserializeAfterVersion(BinaryCasSerDes6.java:1701)
>>> 	at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1203)
>>> 	at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1168)
>>> 	at org.apache.uima.cas.impl.Serialization.deserializeCAS(Serialization.java:171)
>>>        …
>>>
>>> Am I misunderstanding how the (de)serialization is supposed to work?
>> Form 6 supports having different type systems.  When using this, it expects the
>> "other" type system to be passed in, as a type system impl object.  If "null" is
>> passed in, then it assumes the "other" type system is identical to the first
>> one.  This is what the Javadoc means when it says:
>>
>> If the target type system is identical to the one in the CAS, you may pass null for it. 
> In the sentence above, I assumed that "CAS" means "the one which I deserialize into/the target CAS"
> and that "target type system is identical" means that "I want all types available in the target
> CAS to be deserialized/I do not want any types that are available in the target CAS to be ignored".
Yes, I had a hard time figuring out for all the use cases how to "name" the
various parts...

In that description, "CAS" did mean "that which I deserialize into".  The
"target type system is identical" means that the type system of the CAS and the
type system of the serialized data are the same.   Basically, the serialization
/ deserialization mechanism needs to know both type systems, in order to figure
out how to decode things.
>
>> So, to make form 6 work for you, you have to do something like:
>>
>>  a) Create an instance of a type system impl for the types in your serialized form.
>> For instance, if you created a CAS with some types in it, and serialized it,
>> before you get rid of that CAS, save its type system in a variable:
>>
>>    TypeSystem tsThatWasSerialized = theCASthatWasSerialized.getTypeSystem();
>>
>> Use this type system as the argument, (not "null") when calling the form 6 style deserialize:
>>
>> Serialization.deserializeCAS(cas, bais, tsThatWasSerialized, null);
>>
>> Is that something like what you did? 
> Nope, that's not what I did. I thought it was not necessary to preserve the "source" type
> system. 
Well, it is for this.
> I interpreted the documentation such that "tsThatWasSerialized" was not the "source"
> type system, but the "target" type system (e.g. a subset of the actual target CAS type system).
I apologize for the confusion.  I'm happy to improve the documentation
(suggestions welcome).  I did struggle to find some wording that would work for
the various use-cases.
>
> Ignoring the potential waste of space, wouldn't you find it useful to serialize all used 
> types of the type system as part of format 6, thus avoiding the need to maintain an
> external copy of the type system? 
Sure. But it's at a cost (space and time).  I think that there are many use
cases of this serialization (e.g., for sending CASes in UIMA-AS between nodes)
where sending the ts along is not needed.  This was the original motivation for
doing this type-mapping kind of thing.  It allows the following scenario and
efficiencies:

1) Imagine a UIMA pipeline acting as a client for a UIMA service running
remotely.  That UIMA service has some type system, let's call it TS_service,
that it defines for whatever it is that it does. 

   Note that the same service might be used by many different UIMA Clients, each
having some different type system.

2) When the UIMA client pipeline starts up, UIMA forms a combined type system -
combining the types from the Service with those of the client.  In doing this,
the client acquires knowledge of each service's type system.  The type system
merge UIMA does at startup time would typically result in a (much) bigger type
system than TS_service.

3) When it's time to send a CAS to the service, when using form 6, the
serialization takes advantage of the knowledge of the service's TS_service, and
only sends the parts of the CAS having those types to the service.  This is in
contrast to other methods of client-service communication, which send the entire
CAS to the service.
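
The filtering in step 3 can be pictured with a toy model: map each type of the sender's (larger) type system to the receiver's type system by name, and drop anything the receiver does not know. This is only a sketch under that assumption. The class and method names are hypothetical, type systems are reduced to lists of names, and none of this is the BinaryCasSerDes6 implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class TypeFilterSketch {
    /** Maps each source type code to the target code with the same name, or -1 if absent. */
    static int[] buildMapping(List<String> sourceTypes, List<String> targetTypes) {
        int[] mapping = new int[sourceTypes.size()];
        for (int i = 0; i < mapping.length; i++) {
            mapping[i] = targetTypes.indexOf(sourceTypes.get(i));
        }
        return mapping;
    }

    /** Keeps only instances whose type the target knows, rewritten to target type codes. */
    static List<Integer> filter(int[] sourceCodes, int[] mapping) {
        List<Integer> kept = new ArrayList<>();
        for (int code : sourceCodes) {
            int targetCode = mapping[code];
            if (targetCode >= 0) {
                kept.add(targetCode);
            }
        }
        return kept;
    }
}
```

Both sides need both type tables to build the same mapping, which is why the "other" type system has to be supplied whenever it differs from the CAS's own.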
> The CASCompleteSerializer conveniently wraps up all
> data (CAS + type system) in a single serializable object. I find that very convenient.
> The only annoying part is that it's not possible to deserialize that into a CAS with
> a new type system, e.g. with some types added or removed.
I think it would be pretty easy to do a similar thing with the compressed binary
forms.
> Btw. it might be nice if deserializeCAS() could not only detect the formats 0, 4 and 6, but
> also serialized forms of the CASCompleteSerializer.
Another enhancement :-)
>
> Did you do any performance measures for the new serialization forms?
I think it depends pretty heavily on the kind of machine you run on.  In running
on my Intel i7 laptop, which has a multi-level L1, L2, etc. cache architecture,
it actually ran faster than plain binary serialization, on large test CASes.  I
suspect this was because it compressed these so much.  (And I was measuring
using byte-array-style input/output, not writing to disk).  Needless to say, I
was pretty surprised by this.
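
For anyone who wants to repeat this kind of in-memory measurement, here is a rough sketch of the setup, assuming GZIP as a stand-in compressor. That is an assumption for illustration only and not the compression scheme of forms 4 or 6. The class name is hypothetical, and a single System.nanoTime reading is only a coarse indication, not a proper benchmark.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class InMemoryCompressionSketch {
    /** Compresses to a byte array, so no disk I/O is part of the measurement. */
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Decompresses from a byte array. */
    static byte[] decompress(byte[] packed) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Wall-clock time of one run in nanoseconds. */
    static long nanosFor(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }
}
```

Timing compress against a plain copy of the same bytes, repeated over many iterations after a warm-up, gives a first impression of whether the smaller output pays for the extra CPU work on a given machine.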

-Marshall

Re: How to use the new binary CAS (de)serialization?

Posted by Richard Eckart de Castilho <ri...@gmail.com>.
On 08.07.2013 at 23:49, Marshall Schor <ms...@schor.com> wrote:

>> The documentation says:
>> 
>>> Deserialize with type filtering:
>>> 
>>> The reuseInfo should be null unless deserializing a delta CAS, in which case, it must be the reuse info captured when the original CAS was serialized out. If the target type system is identical to the one in the CAS, you may pass null for it. If a delta cas is not being received, you must pass null for the reuseInfo.
>>> 
>>> Serialization.deserializeCAS(cas, bais, tgtTypeSystem, reuseInfo);
>> So I assume that when I deserialize my persisted CAS into a fresh one which doesn't contain any types, the only thing that should arrive is the SofA. But, no matter what serialization format I use (0, 4, or 6), I always get an ArrayIndexOutOfBoundsException.
>> 
>> I create the target CAS like this:
>> 
>>        CAS cas = CasCreationUtils.createCas((TypeSystemDescription) null, null, null);
>> 
>> Format 6:
>> 
>> java.lang.ArrayIndexOutOfBoundsException: 37
>> 	at org.apache.uima.cas.impl.TypeSystemImpl.getTypeInfo(TypeSystemImpl.java:1566)
>> 	at org.apache.uima.cas.impl.BinaryCasSerDes6.deserializeAfterVersion(BinaryCasSerDes6.java:1701)
>> 	at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1203)
>> 	at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1168)
>> 	at org.apache.uima.cas.impl.Serialization.deserializeCAS(Serialization.java:171)
>>        …
>> 
>> Am I misunderstanding how the (de)serialization is supposed to work?
> 
> Form 6 supports having different type systems.  When using this, it expects the
> "other" type system to be passed in, as a type system impl object.  If "null" is
> passed in, then it assumes the "other" type system is identical to the first
> one.  This is what the Javadoc means when it says:
> 
> If the target type system is identical to the one in the CAS, you may pass null for it. 

In the sentence above, I assumed that "CAS" means "the one which I deserialize into/the target CAS"
and that "target type system is identical" means that "I want all types available in the target
CAS to be deserialized/I do not want any types that are available in the target CAS to be ignored".

> So, to make form 6 work for you, you have to do something like:
> 
>  a) Create an instance of a type system impl for the types in your serialized form.
> For instance, if you created a CAS with some types in it, and serialized it,
> before you get rid of that CAS, save its type system in a variable:
> 
>    TypeSystem tsThatWasSerialized = theCASthatWasSerialized.getTypeSystem();
> 
> Use this type system as the argument, (not "null") when calling the form 6 style deserialize:
> 
> Serialization.deserializeCAS(cas, bais, tsThatWasSerialized, null);
> 
> Is that something like what you did? 

Nope, that's not what I did. I thought it was not necessary to preserve the "source" type
system. I interpreted the documentation such that "tsThatWasSerialized" was not the "source"
type system, but the "target" type system (e.g. a subset of the actual target CAS type system).

Ignoring the potential waste of space, wouldn't you find it useful to serialize all used 
types of the type system as part of format 6, thus avoiding the need to maintain an
external copy of the type system? The CASCompleteSerializer conveniently wraps up all
data (CAS + type system) in a single serializable object. I find that very convenient.
The only annoying part is that it's not possible to deserialize that into a CAS with
a new type system, e.g. with some types added or removed.

Btw. it might be nice if deserializeCAS() could not only detect the formats 0, 4 and 6, but
also serialized forms of the CASCompleteSerializer.

Did you do any performance measures for the new serialization forms?

-- Richard

Re: How to use the new binary CAS (de)serialization?

Posted by Marshall Schor <ms...@schor.com>.
On 7/8/2013 5:00 PM, Richard Eckart de Castilho wrote:
> Thanks for fixing the issue :)
Thank you for finding the bug :-)
>
> Now I'm trying another basic operation: serializing a CAS with a type system and deserializing
> into a CAS with zero types. I bump into two problems:
>
>
> First one:
>
> Following what to me would appear as the path of least surprise, I assumed that
>
> 1) Serialization.deserializeCAS(cas, bais, null, null);
>
> should behave the same as
>
> 2) Serialization.deserializeCAS(cas, bais);
>
> It apparently doesn't. 1) declares a ResourceInitializationException and only reads format 6 CASes, while 2) appears to accept form 0 (is that the correct name?), 4, and 6, and does not throw a ResourceInitializationException.
I like the principle of least surprise :-)...

In this "edge" case, where the deserializeCAS is called with 4 args, but the
last 2 are null, I agree that a better implementation would be that it should
behave just like the 2 arg form.  I'll add that...
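
The intended behaviour can be sketched as a simple delegation between overloads. Everything here is a hypothetical stand-in: the single leading byte used as a format marker, the class, and the returned strings are made up for illustration and do not reflect the real header layout or the Serialization class.

```java
final class OverloadDelegationSketch {
    /** Two-argument style: detect the form from the data and dispatch. */
    static String deserialize(byte[] data) {
        switch (data[0]) {
            case 0: return "form 0";
            case 4: return "form 4";
            case 6: return "form 6";
            default: throw new IllegalArgumentException("unknown form: " + data[0]);
        }
    }

    /** Four-argument style: with nothing extra supplied, behave like the short overload. */
    static String deserialize(byte[] data, Object typeSystem, Object reuseInfo) {
        if (typeSystem == null && reuseInfo == null) {
            return deserialize(data);
        }
        if (data[0] != 6) {
            throw new IllegalArgumentException("type mapping needs form 6");
        }
        return "form 6 with type mapping";
    }
}
```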

>
>
> Second one:
>
> The documentation says:
>
>> Deserialize with type filtering:
>>
>> The reuseInfo should be null unless deserializing a delta CAS, in which case, it must be the reuse info captured when the original CAS was serialized out. If the target type system is identical to the one in the CAS, you may pass null for it. If a delta cas is not being received, you must pass null for the reuseInfo.
>>
>> Serialization.deserializeCAS(cas, bais, tgtTypeSystem, reuseInfo);
> So I assume that when I deserialize my persisted CAS into a fresh one which doesn't contain any types, the only thing that should arrive is the SofA. But, no matter what serialization format I use (0, 4, or 6), I always get an ArrayIndexOutOfBoundsException.
>
> I create the target CAS like this:
>
>         CAS cas = CasCreationUtils.createCas((TypeSystemDescription) null, null, null);
>
>
> Format 0: 
>
> java.lang.ArrayIndexOutOfBoundsException: 37
> 	at org.apache.uima.cas.impl.FSIndexRepositoryImpl.incrementIllegalIndexUpdateDetector(FSIndexRepositoryImpl.java:1543)
> 	at org.apache.uima.cas.impl.FSIndexRepositoryImpl.ll_addFS(FSIndexRepositoryImpl.java:1625)
> 	at org.apache.uima.cas.impl.FSIndexRepositoryImpl.addFS(FSIndexRepositoryImpl.java:1059)
> 	at org.apache.uima.cas.impl.CASImpl.reinitIndexedFSs(CASImpl.java:1480)
> 	at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1282)
> 	at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1168)
> 	at org.apache.uima.cas.impl.Serialization.deserializeCAS(Serialization.java:171)
>         …
>
> Format 4:
>
> java.lang.ArrayIndexOutOfBoundsException: 37
> 	at org.apache.uima.cas.impl.BinaryCasSerDes4.getTypeInfo(BinaryCasSerDes4.java:2497)
> 	at org.apache.uima.cas.impl.BinaryCasSerDes4.access$1(BinaryCasSerDes4.java:2496)
> 	at org.apache.uima.cas.impl.BinaryCasSerDes4$Deserializer.deserialize(BinaryCasSerDes4.java:1621)
> 	at org.apache.uima.cas.impl.BinaryCasSerDes4$Deserializer.access$18(BinaryCasSerDes4.java:1567)
> 	at org.apache.uima.cas.impl.BinaryCasSerDes4.deserialize(BinaryCasSerDes4.java:360)
> 	at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1197)
> 	at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1168)
> 	at org.apache.uima.cas.impl.Serialization.deserializeCAS(Serialization.java:171)
>         …
>
> Format 6:
>
> java.lang.ArrayIndexOutOfBoundsException: 37
> 	at org.apache.uima.cas.impl.TypeSystemImpl.getTypeInfo(TypeSystemImpl.java:1566)
> 	at org.apache.uima.cas.impl.BinaryCasSerDes6.deserializeAfterVersion(BinaryCasSerDes6.java:1701)
> 	at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1203)
> 	at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1168)
> 	at org.apache.uima.cas.impl.Serialization.deserializeCAS(Serialization.java:171)
>         …
>
> Am I misunderstanding how the (de)serialization is supposed to work?
Form 0 and 4 do not support binary serialization / deserialization unless the
source and target type systems are identical.  If this is not the case, you'll
get errors like you saw.
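
A toy model of why this surfaces as an ArrayIndexOutOfBoundsException: if the binary data identifies types only by their numeric code, the decoder has to index into its own type table, and a code from a larger source type system falls off the end of a smaller target table. The class and method names below are hypothetical, and the real decoders are far more involved.

```java
import java.util.Arrays;

final class TypeCodeMismatchSketch {
    /** Encodes each instance's type as its index in the source type table. */
    static int[] encode(String[] sourceTypes, String[] instanceTypes) {
        int[] codes = new int[instanceTypes.length];
        for (int i = 0; i < instanceTypes.length; i++) {
            codes[i] = Arrays.asList(sourceTypes).indexOf(instanceTypes[i]);
        }
        return codes;
    }

    /** Decodes by plain array lookup, with no check that the two tables agree. */
    static String[] decode(String[] targetTypes, int[] codes) {
        String[] names = new String[codes.length];
        for (int i = 0; i < codes.length; i++) {
            names[i] = targetTypes[codes[i]];
        }
        return names;
    }

    /** True if decoding against this table runs out of bounds. */
    static boolean decodeFails(String[] targetTypes, int[] codes) {
        try {
            decode(targetTypes, codes);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }
}
```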

Form 6 supports having different type systems.  When using this, it expects the
"other" type system to be passed in, as a type system impl object.  If "null" is
passed in, then it assumes the "other" type system is identical to the first
one.  This is what the Javadoc means when it says:

If the target type system is identical to the one in the CAS, you may pass null for it. 


So, to make form 6 work for you, you have to do something like:

  a) Create an instance of a type system impl for the types in your serialized form.
For instance, if you created a CAS with some types in it, and serialized it,
before you get rid of that CAS, save its type system in a variable:

    TypeSystem tsThatWasSerialized = theCASthatWasSerialized.getTypeSystem();

Use this type system as the argument, (not "null") when calling the form 6 style deserialize:

Serialization.deserializeCAS(cas, bais, tsThatWasSerialized, null);

Is that something like what you did? 

-Marshall


Re: How to use the new binary CAS (de)serialization?

Posted by Richard Eckart de Castilho <ri...@gmail.com>.
Thanks for fixing the issue :)

Now I'm trying another basic operation: serializing a CAS with a type system and deserializing
into a CAS with zero types. I bump into two problems:


First one:

Following what to me would appear as the path of least surprise, I assumed that

1) Serialization.deserializeCAS(cas, bais, null, null);

should behave the same as

2) Serialization.deserializeCAS(cas, bais);

It apparently doesn't. 1) declares a ResourceInitializationException and only reads format 6 CASes, while 2) appears to accept form 0 (is that the correct name?), 4, and 6, and does not throw a ResourceInitializationException.


Second one:

The documentation says:

> Deserialize with type filtering:
> 
> The reuseInfo should be null unless deserializing a delta CAS, in which case, it must be the reuse info captured when the original CAS was serialized out. If the target type system is identical to the one in the CAS, you may pass null for it. If a delta cas is not being received, you must pass null for the reuseInfo.
> 
> Serialization.deserializeCAS(cas, bais, tgtTypeSystem, reuseInfo);

So I assume that when I deserialize my persisted CAS into a fresh one which doesn't contain any types, the only thing that should arrive is the SofA. But, no matter what serialization format I use (0, 4, or 6), I always get an ArrayIndexOutOfBoundsException.

I create the target CAS like this:

        CAS cas = CasCreationUtils.createCas((TypeSystemDescription) null, null, null);


Format 0: 

java.lang.ArrayIndexOutOfBoundsException: 37
	at org.apache.uima.cas.impl.FSIndexRepositoryImpl.incrementIllegalIndexUpdateDetector(FSIndexRepositoryImpl.java:1543)
	at org.apache.uima.cas.impl.FSIndexRepositoryImpl.ll_addFS(FSIndexRepositoryImpl.java:1625)
	at org.apache.uima.cas.impl.FSIndexRepositoryImpl.addFS(FSIndexRepositoryImpl.java:1059)
	at org.apache.uima.cas.impl.CASImpl.reinitIndexedFSs(CASImpl.java:1480)
	at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1282)
	at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1168)
	at org.apache.uima.cas.impl.Serialization.deserializeCAS(Serialization.java:171)
        …

Format 4:

java.lang.ArrayIndexOutOfBoundsException: 37
	at org.apache.uima.cas.impl.BinaryCasSerDes4.getTypeInfo(BinaryCasSerDes4.java:2497)
	at org.apache.uima.cas.impl.BinaryCasSerDes4.access$1(BinaryCasSerDes4.java:2496)
	at org.apache.uima.cas.impl.BinaryCasSerDes4$Deserializer.deserialize(BinaryCasSerDes4.java:1621)
	at org.apache.uima.cas.impl.BinaryCasSerDes4$Deserializer.access$18(BinaryCasSerDes4.java:1567)
	at org.apache.uima.cas.impl.BinaryCasSerDes4.deserialize(BinaryCasSerDes4.java:360)
	at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1197)
	at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1168)
	at org.apache.uima.cas.impl.Serialization.deserializeCAS(Serialization.java:171)
        …

Format 6:

java.lang.ArrayIndexOutOfBoundsException: 37
	at org.apache.uima.cas.impl.TypeSystemImpl.getTypeInfo(TypeSystemImpl.java:1566)
	at org.apache.uima.cas.impl.BinaryCasSerDes6.deserializeAfterVersion(BinaryCasSerDes6.java:1701)
	at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1203)
	at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1168)
	at org.apache.uima.cas.impl.Serialization.deserializeCAS(Serialization.java:171)
        …

Am I misunderstanding how the (de)serialization is supposed to work?

-- Richard


On 08.07.2013 at 21:00, Richard Eckart de Castilho <ri...@gmail.com> wrote:

> I opened an issue for this and pasted a minimal test case there.
> 
> https://issues.apache.org/jira/browse/UIMA-3054
> 
> -- Richard


Re: How to use the new binary CAS (de)serialization?

Posted by Richard Eckart de Castilho <ri...@gmail.com>.
I opened an issue for this and pasted a minimal test case there.

https://issues.apache.org/jira/browse/UIMA-3054

-- Richard

On 08.07.2013 at 20:24, Marshall Schor <ms...@schor.com> wrote:

> Hi,
> 
> I need help in reproducing this.
> 
> I added a test case that sets the document text to a text
> string before serializing.  I then did a serialize / deserialize, and checked
> that getDocumentText returned the right string.
> 
> In the original CAS, I did: 
>    casSrc.setDocumentText("some test text");
> 
> In your failing case, did you set up the CAS with a document text, and if so, how?
> 
> -Marshall
> 
> On 7/4/2013 4:20 PM, Richard Eckart de Castilho wrote:
>> Hi Marshall,
>> 
>> I'd like to try out the new CAS (de)serialization stuff you did recently.
>> 
>> I try serialization with three approaches:
>> 
>> case 0: Serialization.serializeCAS(aJCas.getCas(), docOS); break;
>> case 4: Serialization.serializeWithCompression(aJCas.getCas(), docOS); break;
>> case 6: Serialization.serializeWithCompression(aJCas.getCas(), docOS, aJCas.getTypeSystem()); break;
>> 
>> Then I try to load the data back into a CAS (within a reader component) using
>> 
>> Serialization.deserializeCAS(aCAS, is);
>> 
>> For cases 0 and 4, this appears to work. But in case 6, the document text is <null> after
>> deserializing.
>> 
>> Apparently, I'm doing something wrong - but I have no idea what. Can you give me a hint?
>> 
>> -- Richard


Re: How to use the new binary CAS (de)serialization?

Posted by Marshall Schor <ms...@schor.com>.
Hi,

I need help in reproducing this.

I added a test case that sets the document text to a text
string before serializing.  I then did a serialize / deserialize, and checked
that getDocumentText returned the right string.

In the original CAS, I did: 
    casSrc.setDocumentText("some test text");

In your failing case, did you set up the CAS with a document text, and if so, how?

-Marshall

On 7/4/2013 4:20 PM, Richard Eckart de Castilho wrote:
> Hi Marshall,
>
> I'd like to try out the new CAS (de)serialization stuff you did recently.
>
> I try serialization with three approaches:
>
> case 0: Serialization.serializeCAS(aJCas.getCas(), docOS); break;
> case 4: Serialization.serializeWithCompression(aJCas.getCas(), docOS); break;
> case 6: Serialization.serializeWithCompression(aJCas.getCas(), docOS, aJCas.getTypeSystem()); break;
>
> Then I try to load the data back into a CAS (within a reader component) using
>
> Serialization.deserializeCAS(aCAS, is);
>
> For cases 0 and 4, this appears to work. But in case 6, the document text is <null> after
> deserializing.
>
> Apparently, I'm doing something wrong - but I have no idea what. Can you give me a hint?
>
> -- Richard
>
>
>


Re: How to use the new binary CAS (de)serialization?

Posted by Marshall Schor <ms...@schor.com>.
On 7/4/2013 4:20 PM, Richard Eckart de Castilho wrote:
> Hi Marshall,
>
> I'd like to try out the new CAS (de)serialization stuff you did recently.
Great! 
>
> I try serialization with three approaches:
>
> case 0: Serialization.serializeCAS(aJCas.getCas(), docOS); break;
> case 4: Serialization.serializeWithCompression(aJCas.getCas(), docOS); break;
> case 6: Serialization.serializeWithCompression(aJCas.getCas(), docOS, aJCas.getTypeSystem()); break;
>
> Then I try to load the data back into a CAS (within a reader component) using
>
> Serialization.deserializeCAS(aCAS, is);
>
> For cases 0 and 4, this appears to work. But in case 6, the document text is <null> after
> deserializing.
I'll take a look, maybe tomorrow...

-M
>
> Apparently, I'm doing something wrong - but I have no idea what. Can you give me a hint?
>
> -- Richard
>
>
>