Posted to user@uima.apache.org by Eric Riebling <er...@cs.cmu.edu> on 2012/03/15 15:38:06 UTC

Getting annotations from CASes 'external' to a pipeline

I have a pipeline with its own type system.
I also have serialized, annotated CASes on disk with a different type system.
Suppose I want an Analysis Engine in the pipeline to read in the serialized
CASes in order to obtain annotations and 'do things with them'.

I understand some limitations in the UIMA framework prevent this, but
could it be done by making the first type system include that of the
CASes to deserialize?

Also, it would necessitate creating new CASes within the Analysis Engine.
I could think of several approaches, and have tried some without success:

  * Create a new, 'temporary' View in the AE's process() method, obtain a
    JCas, obtain its CAS, and use that to store the deserialized CASes
    (seems to mangle the original CAS and break downstream AEs in the pipeline,
    and seems to not be able to find any annotations in the deserialized CAS)

  * Use the CAS in the process() method to store the deserialized CASes
    (also mangles the original CAS, breaks downstream AEs, but DOES
    permit obtaining annotations from the deserialized CASes)

  * Make the Analysis Engine be a CAS Multiplier, and deserialize into
    a CAS created with getEmptyCAS()
    (I haven't tried this yet)

It's kind of a use case for a hybrid Component that behaves in some ways like
an AE (has a process() method), in some ways like XMI Collection Reader, and
in some ways like a CAS Multiplier.

But it's a useful use case!  It is also a very bizarre one because you could
almost think of it as a pipeline within a pipeline, which processes a set
of deserialized annotated XMI documents, within a pipeline that processes ...
in our case, a Question Answering system with question keyterms,
ranked lists of documents and answer candidates.

Re: Getting annotations from CASes 'external' to a pipeline

Posted by Eddie Epstein <ea...@gmail.com>.

Hi Eric,

If the pipeline AE has no use for "out-of-typesystem" data in the serialized
CASes, it can deserialize using the "lenient" flag. Only types in the AE
typesystem will be put into the pipeline CAS.

Is that all you need?

Eddie
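
The effect of the "lenient" flag described above can be pictured with a small
toy model. This is a sketch in plain Java, not the UIMA API: the class
LenientLoadSketch, its load() method and the type names are all made up for
illustration. It only shows the idea that strict loading rejects a type the
target does not declare, while lenient loading drops it and keeps the rest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of strict vs. lenient loading. Plain Java, not the UIMA API. */
final class LenientLoadSketch {

    /**
     * Keeps only the type names declared in knownTypes.
     * Strict mode rejects the first undeclared type; lenient mode skips it.
     */
    static List<String> load(List<String> serializedTypes, Set<String> knownTypes, boolean lenient) {
        List<String> loaded = new ArrayList<>();
        for (String typeName : serializedTypes) {
            if (knownTypes.contains(typeName)) {
                loaded.add(typeName);
            } else if (!lenient) {
                throw new IllegalArgumentException("Undeclared type: " + typeName);
            } // else: lenient mode silently drops the undeclared item
        }
        return loaded;
    }

    public static void main(String[] args) {
        // Hypothetical type names, for illustration only.
        List<String> onDisk = Arrays.asList("uima.tcas.DocumentAnnotation", "ext.Keyterm", "ext.Answer");
        Set<String> pipelineTypes = new HashSet<>(Arrays.asList("uima.tcas.DocumentAnnotation", "ext.Keyterm"));
        System.out.println(load(onDisk, pipelineTypes, true));
    }
}
```

In this model, if the target declares none of the serialized types, a lenient
load succeeds but returns nothing, which is worth keeping in mind when a
lenient deserialization "works" yet no annotations can be found afterwards.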



Re: Getting annotations from CASes 'external' to a pipeline

Posted by Eddie Epstein <ea...@gmail.com>.

Just to clarify, the CPM has no support for CasMultipliers. So any use
of a CM in a CPE would only be supported inside the AE of a single CAS
processor. That is, no child CAS would ever come out of one CAS
processor and flow into another.

So in your case, all the action is within a single [aggregate] AE?


On Fri, Mar 16, 2012 at 5:07 PM, Eric Riebling <er...@cs.cmu.edu> wrote:
> Last one, sorry list members for the spam.
>
> The reason things were funky is that this was a system that
> used CAS Multipliers to create the CASes that my Component
> was seeing.  If I run my Component in a straight-line pipeline,
> getEmptyCAS() produces CASes with the full type system as it
> is supposed to do.
>
> I don't fully understand the architecture of the surrounding
> system, but once I do, will supply you guys with the details in
> case this is a bug with the way UIMA handles CASes that are
> multiplied more than once.
>
>
> On 3/16/2012 2:16 PM, Eric Riebling wrote:
>>
>> And the difference in environment:
>>
>> * use SimpleRunCPE - user defined types don't show up
>> * use CPE GUI - they DO show up
>>
>> This is interesting!
>>

Re: Getting annotations from CASes 'external' to a pipeline

Posted by Eric Riebling <er...@cs.cmu.edu>.

Last one, sorry list members for the spam.

The reason things were funky is that this was a system that
used CAS Multipliers to create the CASes that my Component
was seeing.  If I run my Component in a straight-line pipeline,
getEmptyCAS() produces CASes with the full type system as it
is supposed to do.

I don't fully understand the architecture of the surrounding
system, but once I do, will supply you guys with the details in
case this is a bug with the way UIMA handles CASes that are
multiplied more than once.

On 3/16/2012 2:16 PM, Eric Riebling wrote:
> And the difference in environment:
>
> * use SimpleRunCPE - user defined types don't show up
> * use CPE GUI - they DO show up
>
> This is interesting!
>

Re: Getting annotations from CASes 'external' to a pipeline

Posted by Eric Riebling <er...@cs.cmu.edu>.

And the difference in environment:

  * use SimpleRunCPE - user defined types don't show up
  * use CPE GUI      - they DO show up

This is interesting!

On 3/15/2012 6:50 PM, Eddie Epstein wrote:
> My last note was incorrect. Here is a paraphrase of working code:
>
>    public AbstractCas next() throws AnalysisEngineProcessException {
>      CAS aCAS = getEmptyCAS();
>      try {
>        ByteArrayInputStream casIn = getNextXmiCas();
>        // deserialize in a lenient fashion
>        XmiCasDeserializer.deserialize(casIn, aCAS, true);
>        return aCAS;
>      } catch (SAXException e) {
>        throw new AnalysisEngineProcessException(e);
>      } catch (IOException e) {
>        throw new AnalysisEngineProcessException(e);
>      }
> ...
>
>
> On Thu, Mar 15, 2012 at 5:59 PM, Marshall Schor<ms...@schor.com>  wrote:
>>
>>
>> On 3/15/2012 4:38 PM, Eddie Epstein wrote:
>>>
>>> Cannot deserialize into a CAS from getEmptyCas().
>>
>> This is not right.  More information soon (ran out of time today). -Marshall
>>
>>> Must use a CAS from
>>> CasCreationUtils.createCas for deserialization, and then use CasCopier
>>> to copy to the CAS from getEmptyCas().
>>>
>>> Pick the version of createCas that specifies a typesystem, and use the
>>> typesystem from the pipeline CAS (i.e. the one from getEmptyCas).
>>>
>>> On Thu, Mar 15, 2012 at 2:44 PM, Eric Riebling<er...@cs.cmu.edu>    wrote:
>>>>
>>>> Thanks, guys.  This is getting me closer to the goal, and explains the
>>>> observed behaviors.  Now I'm facing issues when implemented as a CAS
>>>> Multiplier.  I try creating a new CAS first with getEmptyJCas().
>>>>
>>>> Here are some various strategies and what resulted:
>>>>
>>>>   * create a deserializer with the typesystem from the AE (which
>>>>     includes types in the 'external' CAS to be deserialized)
>>>>   * use it to deserialize into the empty CAS created with getEmptyJCas()
>>>>
>>>>   -> The deserialized CAS for some reason has only the base TOP
>>>>      typesystem
>>>>   -> Trying to access an annotation from an index (that should be there)
>>>>      generates the "used in Java code, but was not declared in the XML
>>>>      type descriptor" exception
>>>>
>>>>   * same as above, but use CasCopier to try and copy the type system
>>>>     (and everything else) from the CAS in the AE's process() method
>>>>     into the empty CAS
>>>>
>>>>   -> Attempted to copy a FeatureStructure of type "(my type name)", which
>>>>      is not defined in the type system of the destination CAS.
>>>>
>>>> It seems the ONLY way to obtain a CAS (empty or otherwise) that has the
>>>> type system able to accept the external CAS being deserialized is to use
>>>> the very CAS passed into the AE's process() method.  Doing so obviously
>>>> mangles that CAS for the rest of the pipeline.
>>>>
>>>>
>>>> On 3/15/2012 1:50 PM, Marshall Schor wrote:
>>>>>
>>>>>
>>>>> On 3/15/2012 10:38 AM, Eric Riebling wrote:
>>>>>>
>>>>>> I have a pipeline with its own type system.
>>>>>> I also have serialized, annotated CASes on disk with a different type
>>>>>> system.
>>>>>> Suppose I want an Analysis Engine in the pipeline to read in the
>>>>>> serialized CASes in order to obtain annotations and 'do things with
>>>>>> them'.
>>>>>>
>>>>>> I understand some limitations in the UIMA framework prevent this, but
>>>>>> could it be done by making the first type system include that of the
>>>>>> CASes to deserialize?
>>>>>
>>>>> Yes, I think so.
>>>>>>
>>>>>> Also, it would necessitate creating new CASes within the Analysis
>>>>>> Engine.  I could think of several approaches, and have tried some
>>>>>> without success:
>>>>>>
>>>>>> * Create a new, 'temporary' View in the AE's process() method, obtain a
>>>>>>   JCas, obtain its CAS, and use that to store the deserialized CASes
>>>>>>   (seems to mangle the original CAS and break downstream AEs in the
>>>>>>   pipeline, and seems to not be able to find any annotations in the
>>>>>>   deserialized CAS)
>>>>>>
>>>>> This won't work. The deserialize method effectively "resets" the CAS
>>>>> before loading it.
>>>>> A view is not a new CAS; it is a new view of the same CAS.
>>>>>
>>>>>> * Use the CAS in the process() method to store the deserialized CASes
>>>>>>   (also mangles the original CAS, breaks downstream AEs, but DOES
>>>>>>   permit obtaining annotations from the deserialized CASes)
>>>>>
>>>>> Right, deserializing into an existing CAS resets it in flight.
>>>>>>
>>>>>> * Make the Analysis Engine be a CAS Multiplier, and deserialize into
>>>>>>   a CAS created with getEmptyCAS()
>>>>>>   (I haven't tried this yet)
>>>>>
>>>>> Yes, this is the way to get a separate CAS instance to deserialize into.
>>>>> It's how Collection Readers do it.
>>>>> -Marshall
>>>>>>
>>>>>> It's kind of a use case for a hybrid Component that behaves in some
>>>>>> ways like an AE (has a process() method), in some ways like XMI
>>>>>> Collection Reader, and in some ways like a CAS Multiplier.
>>>>>>
>>>>>> But it's a useful use case! It is also a very bizarre one because you
>>>>>> could almost think of it as a pipeline within a pipeline, which
>>>>>> processes a set of deserialized annotated XMI documents, within a
>>>>>> pipeline that processes ... in our case, a Question Answering system
>>>>>> with question keyterms, ranked lists of documents and answer candidates.
>>>>>>
>>
>
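
Marshall's point in the quoted exchange above (a view is not a new CAS, and
deserializing resets the CAS before loading it) can be sketched with a toy
model. This is plain Java, not the UIMA API: SharedViewSketch and its methods
are invented for illustration, and the view names are only examples. The
model assumes that all views of one CAS share a single underlying store.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of one CAS with several views. Plain Java, not the UIMA API. */
final class SharedViewSketch {

    /** All views of this one CAS live in the same underlying store. */
    private final Map<String, List<String>> annotationsByView = new HashMap<>();

    /** Returns the named view, creating it inside this same store if needed. */
    List<String> view(String name) {
        return annotationsByView.computeIfAbsent(name, key -> new ArrayList<>());
    }

    boolean hasView(String name) {
        return annotationsByView.containsKey(name);
    }

    /** Models deserialization: reset the whole store, then load the new content. */
    void deserialize(Map<String, List<String>> incoming) {
        annotationsByView.clear(); // every view is wiped, not only a "temporary" one
        for (Map.Entry<String, List<String>> entry : incoming.entrySet()) {
            annotationsByView.put(entry.getKey(), new ArrayList<>(entry.getValue()));
        }
    }

    public static void main(String[] args) {
        SharedViewSketch cas = new SharedViewSketch();
        cas.view("_InitialView").add("Question"); // what the pipeline put there
        cas.view("temp");                         // the 'temporary' view

        Map<String, List<String>> external = new HashMap<>();
        external.put("_InitialView", Arrays.asList("Keyterm"));
        cas.deserialize(external);

        // The pipeline's own content and the temporary view are both gone.
        System.out.println(cas.view("_InitialView") + " " + cas.hasView("temp"));
    }
}
```

Under that assumption, creating a view never yields an independent container,
which is why a separate CAS instance is needed as the deserialization target.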

Re: Getting annotations from CASes 'external' to a pipeline

Posted by Eric Riebling <er...@cs.cmu.edu>.

I found out some more about what's going on.  In my pipeline,
for whatever reason, getEmpty(J)Cas() is producing CASes that
have only the base UIMA Types.  No user-defined types are showing
up.  This is not the defined behavior; it should be producing CASes
that have the merged typesystems of the outermost containing
Components.  And there's a lot of them.

I verified that getEmptyCas() does the right thing when in a simple
CPE using tutorial type system and annotators.  So the 'real problem'
is what is weird about the environmental setup of the pipeline such
that getEmptyCas() isn't finding the typesystem.

So now I'm debugging what it is about this environment that's different,
that makes getEmptyCas() produce CASes with no user-defined types, when
it produces them correctly in other environments.
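
The expectation described above (an empty CAS should carry the merged type
systems of the containing Components) can be pictured as a set union, and the
symptom as a set difference. The following is a toy sketch in plain Java, not
the UIMA API: MergedTypesSketch, merge(), missing() and the type names are
all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of type-system merging. Plain Java, not the UIMA API. */
final class MergedTypesSketch {

    /** Union of the type names declared by each component. */
    static Set<String> merge(List<Set<String>> declaredByComponent) {
        Set<String> merged = new TreeSet<>();
        for (Set<String> declared : declaredByComponent) {
            merged.addAll(declared);
        }
        return merged;
    }

    /** Type names that were expected in the CAS but are absent from it. */
    static Set<String> missing(Set<String> expected, Set<String> actualInCas) {
        Set<String> result = new TreeSet<>(expected);
        result.removeAll(actualInCas);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical component declarations, for illustration only.
        Set<String> builtIn = new TreeSet<>(Arrays.asList("uima.tcas.DocumentAnnotation"));
        Set<String> userDefined = new TreeSet<>(Arrays.asList("ext.Keyterm", "ext.Answer"));
        Set<String> expected = merge(Arrays.asList(builtIn, userDefined));

        // A CAS that carries only the built-in type, as in the symptom above.
        System.out.println(missing(expected, builtIn));
    }
}
```

Listing the missing names this way in both the working and the failing
environment would show whether the user-defined types are lost entirely or
only partly.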


Re: Getting annotations from CASes 'external' to a pipeline

Posted by Eric Riebling <er...@cs.cmu.edu>.

Interesting things to note:

  * When I run a CPE with XMI Collection Reader, the CAS being passed
    into getNext() already has the type system of the XMI being
    deserialized - so it works

  * XmiCasDeserializer as used in XMI Collection Reader doesn't
    even instantiate a deserializer with a type system; it's
    used in a static way.

  * When I use a CAS created with getEmpty(J)Cas(), the CAS that's
    about to be populated does NOT have the type system of the
    XMI I'm trying to deserialize... which is why it DOESN'T
    work

  * Using a deserializer instantiated with a type system doesn't
    seem to do the right thing.  Even though the deserializer
    is given a type system to use, if the CAS being deserialized
    into doesn't already also have that type system, the types
    are ignored.  Or I don't understand the purpose of giving a
    typesystem to a deserializer. :)

I know the lenient parameter is supposed to suppress exceptions
and ignore unknown types.  I think what's happening in my case is
that almost ALL the types are unknown types, they DO get ignored,
and so I don't get annotations.

What I'm looking for now is some way to create an empty (or even full!)
CAS into which to deserialize that DOES have our typesystem.  Then it
ought to work, like it does when I use the CAS from process(), because
it DOES have our typesystem.

Interesting.
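
One way to check the hypothesis above (that almost all the types in the XMI
are unknown to the target CAS) without involving UIMA at all is to list which
element names the XMI file actually contains. The sketch below uses only the
JDK's XML parser. It assumes that each feature structure appears as a child
element of the XMI root, named after its type; XmiTypeCounter and the sample
document are hypothetical, and a real file will contain more elements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Counts the child elements of an XMI root by qualified name, using only the JDK. */
final class XmiTypeCounter {

    static Map<String, Integer> countElements(String xmi) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            NodeList children = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmi.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement()
                    .getChildNodes();
            Map<String, Integer> counts = new TreeMap<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    counts.merge(child.getNodeName(), 1, Integer::sum);
                }
            }
            return counts;
        } catch (Exception e) {
            // Parser configuration, SAX and I/O failures all end up here.
            throw new IllegalArgumentException("Could not parse XMI", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, trimmed-down XMI document for illustration only.
        String xmi = "<xmi:XMI xmlns:xmi=\"http://www.omg.org/XMI\" xmlns:ext=\"http:///ext.ecore\">"
                + "<ext:Keyterm begin=\"0\" end=\"4\"/>"
                + "<ext:Keyterm begin=\"5\" end=\"9\"/>"
                + "<ext:Answer begin=\"0\" end=\"9\"/>"
                + "</xmi:XMI>";
        System.out.println(countElements(xmi));
    }
}
```

Comparing these names against the types present in the CAS returned by
getEmpty(J)Cas() should show quickly whether lenient mode has anything left
to keep.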



On 3/15/2012 6:50 PM, Eddie Epstein wrote:
> My last note was incorrect. Here is a paraphrase of working code:
>
>    public AbstractCas next() throws AnalysisEngineProcessException {
>      CAS aCAS = getEmptyCAS();
>      try {
>        ByteArrayInputStream casIn = getNextXmiCas();
>        XmiCasDeserializer.deserialize(casIn, aCAS, true); //
> deserialize in a lenient fashion
>        return aCAS;
>      } catch (SAXException e) {
>        throw new AnalysisEngineProcessException(e);
>      } catch (IOException e) {
>        throw new AnalysisEngineProcessException(e);
>      }
> ...

Re: Getting annotations from CASes 'external' to a pipeline

Posted by Eddie Epstein <ea...@gmail.com>.

My last note was incorrect. Here is a paraphrase of working code:

  public AbstractCas next() throws AnalysisEngineProcessException {
    // Ask the framework for a fresh CAS from the CAS Multiplier's pool.
    CAS aCAS = getEmptyCAS();
    try {
      ByteArrayInputStream casIn = getNextXmiCas();
      // Deserialize in a lenient fashion (third argument is the lenient flag).
      XmiCasDeserializer.deserialize(casIn, aCAS, true);
      return aCAS;
    } catch (SAXException e) {
      throw new AnalysisEngineProcessException(e);
    } catch (IOException e) {
      throw new AnalysisEngineProcessException(e);
    }
...



Re: Getting annotations from CASes 'external' to a pipeline

Posted by Marshall Schor <ms...@schor.com>.


On 3/15/2012 4:38 PM, Eddie Epstein wrote:
> Cannot deserialize into a CAS from getEmptyCAS().
This is not right.  More information soon (ran out of time today). -Marshall
> Must use a CAS from
> CasCreationUtils.createCas for deserialization, and then use CasCopier
> to copy to the CAS from getEmptyCAS().
>
> Pick the version of createCas that specifies a typesystem, and use the
> typesystem from the pipeline CAS (i.e. the one from getEmptyCAS).

Re: Getting annotations from CASes 'external' to a pipeline

Posted by Eddie Epstein <ea...@gmail.com>.

Cannot deserialize into a CAS from getEmptyCAS(). Must use a CAS from
CasCreationUtils.createCas for deserialization, and then use CasCopier
to copy to the CAS from getEmptyCAS().

Pick the version of createCas that specifies a typesystem, and use the
typesystem from the pipeline CAS (i.e. the one from getEmptyCAS).


Re: Getting annotations from CASes 'external' to a pipeline

Posted by Eric Riebling <er...@cs.cmu.edu>.

Thanks, guys.  This is getting me closer to the goal, and explains the observed
behaviors.  Now I'm facing issues when it is implemented as a CAS Multiplier.
I first try creating a new CAS with getEmptyJCas().

Here are the strategies I tried and what resulted:

  * create a deserializer with the typesystem from the AE (which
	includes types in the 'external' CAS to be deserialized)
  * use it to deserialize into the empty CAS created with getEmptyJCas()

  -> The deserialized CAS for some reason has only the base TOP typesystem
  -> Trying to access an annotation from an index (that should be there)
     generates the "used in Java code,  but was not declared in the XML type descriptor"
     exception

  * same as above, but use CasCopier to try to copy the type system
	(and everything else) from the CAS in the AE's process() method
	into the empty CAS

  -> Attempted to copy a FeatureStructure of type "(my type name)", which is not defined in the type system of the destination CAS.

It seems the ONLY way to obtain a CAS (empty or otherwise) that has the type system able
to accept the external CAS being deserialized is to use the very CAS passed into
the AE's process() method.  Doing so obviously mangles that CAS for the rest of
the pipeline.


Re: Getting annotations from CASes 'external' to a pipeline

Posted by Marshall Schor <ms...@schor.com>.

On 3/15/2012 10:38 AM, Eric Riebling wrote:
> I have a pipeline with its own type system.
> I also have deserialized, annotated CASes on disk with a different type system.
> Suppose I want an Analysis Engine in the pipeline to read in the deserialized
> CASes in order to obtain annotations and 'do things with them'
>
> I understand some limitations in the UIMA framework prevent this, but
> could it be done by making the first type system include that of the
> CASes to deserialize?
Yes, I think so.
>
> Also, it would necessitate creating new CASes within the Analysis Engine.
> I could think of several approaches, and have tried some without success:
>
>  * Create a new, 'temporary' View in the AE's process() method, obtain a
>     JCas, obtain its CAS, and use that to store the deserialized CASes
>    (seems to mangle the original CAS and break downstream AEs in the pipeline,
>     and seems to not be able to find any annotations in the deserialized CAS)
>
This won't work.  The deserialize method effectively "resets" the CAS before 
loading it.
A view is not a new CAS; it is a new view of the same CAS.

>  * Use the CAS in the process() method to store the deserialized CASes
>     (also mangles the original CAS, breaks downstream AEs, but DOES
>     permit obtaining annotations from the deserialized CASes)
Right, deserializing into an existing CAS resets it in flight.
>
>  * Make the Analysis Engine be a CAS Multiplier, and deserialize into
>     a CAS created with getEmptyCAS()
>     (I haven't tried this yet)
Yes, this is the way to get a separate CAS instance to deserialize into.  It's 
how Collection Readers do it.
  -Marshall
>
> It's kind of a use case for a hybrid Component that behaves in some ways like
> an AE (has a process() method), in some ways like XMI Collection Reader, and
> in some ways like a CAS Multiplier.
>
> But it's a useful use case!  It is also a very bizarre one because you could
> almost think of it as a pipeline within a pipeline, which processes a set
> of deserialized annotated XMI documents, within a pipeline that processes ...
> in our case, a Question Answering system with question keyterms,
> ranked lists of documents and answer candidates.
>
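
The two points above (a view is not a new CAS, and deserializing resets the
CAS) can be pictured with a small toy sketch in plain Java. This is only a
model under stated assumptions, not the UIMA API: ToyCas and its methods are
hypothetical, annotations are plain strings, and "deserialize" is modeled as
clearing all shared storage before loading. Only the view name "_InitialView"
is borrowed from UIMA.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: ToyCas is hypothetical and is NOT the UIMA CAS API.
class ToyCas {
    // Every view of this CAS keeps its annotations in this one shared map.
    private final Map<String, List<String>> annotationsByView = new HashMap<>();

    /** Adds an annotation to the named view, creating the view if needed. */
    void add(String view, String annotation) {
        annotationsByView.computeIfAbsent(view, k -> new ArrayList<>()).add(annotation);
    }

    /** Returns a copy of the annotations currently held by the named view. */
    List<String> annotations(String view) {
        return new ArrayList<>(annotationsByView.getOrDefault(view, new ArrayList<String>()));
    }

    /** Models a deserialize that resets the whole CAS before loading one view. */
    void deserializeInto(String view, List<String> loaded) {
        annotationsByView.clear(); // reset: every view loses its content
        annotationsByView.put(view, new ArrayList<>(loaded));
    }

    public static void main(String[] args) {
        // Case 1: load into a "temporary" view of the pipeline CAS.
        ToyCas pipelineCas = new ToyCas();
        pipelineCas.add("_InitialView", "question keyterm");
        pipelineCas.deserializeInto("temp", Arrays.asList("external annotation"));
        System.out.println(pipelineCas.annotations("_InitialView").size()); // prints 0

        // Case 2: load into a separate CAS instance instead.
        ToyCas untouched = new ToyCas();
        untouched.add("_InitialView", "question keyterm");
        ToyCas separate = new ToyCas();
        separate.deserializeInto("_InitialView", Arrays.asList("external annotation"));
        System.out.println(untouched.annotations("_InitialView").size());   // prints 1
    }
}
```

In this model the temporary view buys nothing, because the reset hits the
storage that all views share; only a separate instance keeps the pipeline's
content intact, which is the role a CAS Multiplier's empty CAS would play.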