Posted to user@uima.apache.org by José Tomás Atria <jt...@gmail.com> on 2015/09/29 20:57:59 UTC

How to correctly implement delta serialization in locally deployed CPE pipeline?

Hello all,

I've been trying to wrap my head around this for a while, and I can't seem
to get it to work. Could someone please explain what is the most
straightforward way of implementing delta serialization in a local,
multithreaded CPE pipeline?

So far, I've tried the following: a collection reader uses a
SharedSerializationData object stored in the current UIMA session and
creates a CAS marker, which is also stored in the session, in a map keyed
by a CAS identifier. The SharedSerializationData object and the marker
retrieved from the session by CAS identifier are then used to serialize
the delta to disk. However, this procedure causes an OutOfMemory exception
if I try to process all of my data (not that much in my opinion, ~2000
CASes).
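
For readers following the thread, the arrangement described above can be modeled without any UIMA classes. The sketch below is only an illustration under stated assumptions: DeltaState, DeltaStateRegistry and DeltaStateRegistryDemo are hypothetical names, the two Object fields are placeholders for the shared serialization data and the marker, and keeping one entry per CAS is an assumption rather than something the post confirms. It shows one way such a map can grow: entries registered by the reader stay reachable unless the writing side removes them.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the state kept per CAS. The fields are plain
// placeholders, not UIMA types.
final class DeltaState {
    final Object sharedData; // placeholder for the shared serialization data
    final Object marker;     // placeholder for the CAS marker

    DeltaState(Object sharedData, Object marker) {
        this.sharedData = sharedData;
        this.marker = marker;
    }
}

// Hypothetical registry keyed by CAS identifier. A concurrent map is used
// because several pipeline threads may touch it at the same time.
final class DeltaStateRegistry {
    private final Map<String, DeltaState> states = new ConcurrentHashMap<>();

    // Called by the reading side when a CAS is created.
    void register(String casId, DeltaState state) {
        states.put(casId, state);
    }

    // Called by the writing side: removes the entry and returns it, so the
    // state can be garbage collected once the delta has been written.
    DeltaState release(String casId) {
        return states.remove(casId);
    }

    int size() {
        return states.size();
    }
}

final class DeltaStateRegistryDemo {
    // Simulates casCount CASes passing through the pipeline and returns the
    // number of entries still held afterwards.
    static int run(int casCount, boolean releaseAfterWrite) {
        DeltaStateRegistry registry = new DeltaStateRegistry();
        for (int i = 0; i < casCount; i++) {
            String casId = "cas-" + i;
            registry.register(casId, new DeltaState(new Object(), new Object()));
            if (releaseAfterWrite) {
                // A real writer would serialize the delta before releasing.
                registry.release(casId);
            }
        }
        return registry.size();
    }

    public static void main(String[] args) {
        System.out.println(run(2000, false)); // 2000 entries stay reachable
        System.out.println(run(2000, true));  // 0 entries stay reachable
    }
}
```

Whether this is what exhausts memory in a given pipeline depends on how large the stored objects are and on what else holds references to them; the sketch only covers the bookkeeping side.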

I assume that I'm missing some basic aspect of the API, but after trying to
deal with it for a while I just gave up...

A more specific version, as far as I could understand: Delta serialization
requires a SharedSerializationData object and a CAS marker. What is the
correct way to create, store and retrieve these in a simple,
multi-threaded, locally deployed CPE processing pipeline? (i.e., no need to
support AS or DUCC facilities, etc.)

Any help would be greatly appreciated.
Thanks!
jta

-- 
entia non sunt multiplicanda praeter necessitatem

Re: How to correctly implement delta serialization in locally deployed CPE pipeline?

Posted by José Tomás Atria <jt...@gmail.com>.
Hi Marshall, thanks for your reply. I think I have figured it out: I had
not realized that the session object saves data under a per-component
namespace, which prevented the serializer from finding the data saved by
the collection reader and deleting it.
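
The effect described above can be illustrated with a toy model. This is a sketch under assumptions, not UIMA's session implementation: ToySession, the "reader" and "writer" component names, and the "component/key" prefixing scheme are all hypothetical and chosen only to show why a lookup from a different component comes back empty while the original entry stays in memory.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class NamespacedSessionDemo {

    // Toy session whose keys are scoped by component name. The prefixing
    // scheme is an assumption made for illustration only.
    static final class ToySession {
        private final Map<String, Object> store = new ConcurrentHashMap<>();

        void put(String component, String key, Object value) {
            store.put(component + "/" + key, value);
        }

        Object get(String component, String key) {
            return store.get(component + "/" + key);
        }

        Object remove(String component, String key) {
            return store.remove(component + "/" + key);
        }

        int size() {
            return store.size();
        }
    }

    public static void main(String[] args) {
        ToySession session = new ToySession();

        // The reading component stores state for one CAS.
        session.put("reader", "cas-1", "delta state");

        // The writing component looks in its own namespace and finds nothing,
        // so its attempt to delete the entry has no effect either.
        System.out.println(session.get("writer", "cas-1"));    // null
        System.out.println(session.remove("writer", "cas-1")); // null

        // The entry is still held under the reader's namespace.
        System.out.println(session.size());                    // 1
    }
}
```

In a model like this, any state that two components need to exchange has to live somewhere both can reach under the same key, for example in an object they both hold a reference to.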

I'll try an alternative strategy today and come back if I run into any
other issues.

Thanks!
Jta

On Thu, Oct 1, 2015, 10:53 Marshall Schor <ms...@schor.com> wrote:

> Hi,
>
> A little more detail of what you're doing may help us figure out what's
> happening.
>
> What API(s) are you using to do the serialization?
>
> -Marshall
>
> --

sent from a phone. please excuse terseness and tpyos.


Re: How to correctly implement delta serialization in locally deployed CPE pipeline?

Posted by Marshall Schor <ms...@schor.com>.
Hi,

A little more detail of what you're doing may help us figure out what's happening.

What API(s) are you using to do the serialization?

-Marshall
