Posted to user@uima.apache.org by Eduard Moraru <en...@gmail.com> on 2009/10/08 11:48:56 UTC

Re: Change document text from analysis engine for the initial view of a CAS after it has already been set.

Hi Christoph,

Thanks for the quick reply.

Your suggestion is actually quite nice; the only problem is that I cannot
dynamically create sofa mappings from inside the flow controller of an
aggregate.

I don't know if I mentioned it explicitly before, but I need to route
multiple annotators for a single document (CAS), and each of these annotators
might need a converter.

In this case, I am back to the original problem:

1. The collection reader populates the CAS with the sofa data string.
2. The flow controller wants to route annotator A1, but it first needs to route
converter C1 so that A1 can work.
3. Converter C1 runs and populates view V1 with the conversion result.
4. Annotator A1 runs and simply calls cas.getDocumentText() on the default view
(because the sofa mappings took care of assigning V1 as the default view).
5. The flow controller needs to route annotator A2, with converter C2 before it.
6. Since sofa mappings are defined statically in the aggregate when the
process/application starts, converter C2 will try to populate the same view
V1 with its conversion result.
7. Step 6 crashes: the sofa data string has already been set.
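To make step 6 concrete, here is a toy model in plain Java that assumes nothing from the UIMA API: `ToyCas`, `setSofaData` and the `OUTPUT_VIEW` table are hypothetical stand-ins. It encodes only two rules from this thread: a view's sofa data can be set once, and the sofa mapping is fixed when the aggregate is built, so both converters resolve to the same output view.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a CAS (NOT the UIMA API): just enough to show the set-once rule.
final class ToyCas {
    private final Map<String, String> sofaDataByView = new HashMap<>();

    // Like a sofa, each view's data may be assigned only once.
    void setSofaData(String view, String data) {
        if (sofaDataByView.containsKey(view)) {
            throw new IllegalStateException(
                    "Sofa data for view '" + view + "' has already been set.");
        }
        sofaDataByView.put(view, data);
    }

    String getSofaData(String view) {
        return sofaDataByView.get(view);
    }
}

final class StaticMappingDemo {
    // Hypothetical static sofa mapping, fixed when the aggregate is built:
    // both converters write their output to the same aggregate view.
    static final Map<String, String> OUTPUT_VIEW = Map.of(
            "C1", "conversionResult",
            "C2", "conversionResult");

    // Runs C1 then C2; returns the error message, or null if both writes succeed.
    static String run(ToyCas cas) {
        try {
            cas.setSofaData(OUTPUT_VIEW.get("C1"), "output of C1"); // step 3: first write, succeeds
            cas.setSofaData(OUTPUT_VIEW.get("C2"), "output of C2"); // step 6: second write to the same view
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        ToyCas cas = new ToyCas();
        cas.setSofaData("_InitialView", "original newsML text"); // step 1: collection reader
        System.out.println(run(cas));
    }
}
```

Running `main` prints the set-once error for `conversionResult`: the first converter's write succeeds, and the second one hits the same view.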


Along the same lines (combined with my initial approach), I tried the
following in the flow controller:
                CAS conversionCAS = null;
                try {
                    // Simulate converter actions.
                    conversionCAS = cas.getView("conversionResult");
                    System.out.println("View retrieved.");

                    // Second annotator's turn: the view already exists. Destroy it.
                    CASImpl conversionCASImpl = (CASImpl) conversionCAS;
                    conversionCASImpl.resetNoQuestions();
                    conversionCASImpl.release();

                    // Recreate the view.
                    conversionCAS = cas.createView("conversionResult");
                    System.out.println("View re-created.");
                } catch (Exception e) {
                    // First run for the first annotator.
                    conversionCAS = cas.createView("conversionResult");
                    System.out.println("View created.");
                }

                // Simulate the converter populating the view with the converted data.
                conversionCAS.setDocumentText(UUID.randomUUID().toString());

                // Return the annotator's step so it can run.

In the above example, the aggregate that owns this flow controller has a sofa
mapping set up for two annotators. In the flow controller I simulate the
actions of a converter, and right after that the step corresponding to an
annotator is returned (basic flow scenario).

I observe that the document text does get changed, but the indexes of the
base CAS (_InitialView) get reset as well. I expected only the
"conversionResult" view to be reset, together with its sofa data.

I also tried caching an annotation from the initial view, resetting the
existing "conversionResult" view, recreating it (as above) and then adding the
annotation back to the initial view, but addFsToIndexes failed just as in my
first mail on this thread.

Isn't there any way to achieve this for the same CAS (document) in UIMA with
sofa-unaware annotators?

As a summary, the desired flow is this:

1. collectionReader.next() -> document content inside a CAS

2. Flow controller (originalContent = cas.getDocumentText())

2.1 Converter1 (cas.getDocumentText();
cas.setDocumentText("convertedVersion"))
2.2 Annotator1 (cas.getDocumentText())
2.3 Annotator2 (cas.getDocumentText())
2.4 AnnotatorX (cas.getDocumentText())

2.5 Flow controller resets the CAS (cas.setDocumentText(originalContent))
2.6 Converter2 (cas.getDocumentText();
cas.setDocumentText("convertedVersion"))
2.7 Same as 2.2-2.4

2.8 Same as 2.5-2.7
2.9 etc.

3. Consumer (cas.getAnnotationIndex()...)

4. Go back to 1 for next document (CAS)
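The desired flow can be sketched as a flat list of steps that the flow controller would hand out in order. This is plain Java with hypothetical names (`ConversionGroup`, `DesiredRoute`), not the UIMA `FlowController` API; `restore` marks step 2.5, where the original text would be written back.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical description of one converter and the annotators that consume its output.
final class ConversionGroup {
    final String converter;
    final List<String> annotators;

    ConversionGroup(String converter, List<String> annotators) {
        this.converter = converter;
        this.annotators = List.copyOf(annotators);
    }
}

final class DesiredRoute {
    // Flattens the desired flow into the ordered steps a flow controller would return.
    // A "restore" step is inserted before every converter except the first (step 2.5).
    static List<String> steps(List<ConversionGroup> groups) {
        List<String> steps = new ArrayList<>();
        for (int i = 0; i < groups.size(); i++) {
            if (i > 0) {
                steps.add("restore");
            }
            ConversionGroup g = groups.get(i);
            steps.add(g.converter);
            steps.addAll(g.annotators);
        }
        return steps;
    }

    public static void main(String[] args) {
        List<ConversionGroup> groups = List.of(
                new ConversionGroup("Converter1", List.of("Annotator1", "Annotator2")),
                new ConversionGroup("Converter2", List.of("Annotator3")));
        System.out.println(steps(groups)); // [Converter1, Annotator1, Annotator2, restore, Converter2, Annotator3]
    }
}
```

Each converter step and each `restore` would write to a sofa that the collection reader has already set, which is exactly the operation UIMA rejects.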

I am really stuck with this. Any help/idea is greatly appreciated.

Thanks!

On Wed, Oct 7, 2009 at 5:44 PM, Christoph Büscher <
christoph.buescher@neofonie.de> wrote:

> Hi Eduardo,
>
> maybe working with SofaMappings and multiple views can solve what you're
> trying to do:
>
> - Write various converters that you run before your actual annotator. Each
> of these converters creates a new view named "XYZ" in the format that
> your annotator understands.
>
> - You can also copy annotations from the "source" view to the "target" view.
> However, this only makes sense if they are not real "annotations" that point
> to spans in your text. For real annotations, you could convert their
> begin/end offsets as well.
>
> - In the aggregate descriptor that contains your annotator, use a sofa
> mapping to map the view "XYZ" to the annotator's default view. This way your
> annotator's implementation doesn't need to know about multiple views.
>
> Hope this helps,
>
> Christoph
>
>
> Eduard Moraru wrote:
>
>  Hello UIMA users,
>>
>> I know annotators are not *supposed* to change the sofa data once it has
>> been set, but I really need to do that in my setup.
>>
>> First of all, I start from the premise that I want to integrate an
>> annotator that is not really sofa-aware and thus uses cas.getDocumentText()
>> to do its job.
>>
>> The problem is that an annotator might know how to process RSS feeds, CSV
>> data, etc., while the current CAS sofa data string is in another (text)
>> format (say newsML). Remember: the annotator does not know about multiple
>> views. It just wants the default sofa data string.
>>
>> This requires adding some sort of reusable converter. The converter would be
>> implemented as an analysis engine and routed by a flow controller just
>> before the annotator, so that the annotator finds the right text format in
>> the CAS.
>>
>> The problem with this approach is that, after the flow controller
>> determines that an annotator needs a converter and returns the converter's
>> step so that the converter can prepare the CAS for the annotator, the
>> converter gets into trouble: UIMA does not allow modifying the sofa data
>> string after it has already been set, so the conversion fails.
>>
>> Here is the exception, just in case:
>> org.apache.uima.cas.CASRuntimeException: Data for Sofa feature setLocalSofaData() has already been set.
>>
>> I understand that this is a common-sense restriction that UIMA imposes by
>> default, but I would like to disable it from the flow controller just for
>> the converter and then enable it again for the annotator. The flow
>> controller would cache the original content and restore it after each
>> annotator has finished its job and before routing another
>> annotator/converter pair.
>>
>> I tried this:
>> ((CASImpl) cas).enableReset(true);
>> cas.reset();
>> cas.setDocumentText("test");
>>
>> But that, obviously, removes all annotations from the CAS's indexes, and I
>> don't want that. So I tried to restore the CAS by doing:
>> cas.addFsToIndexes(somePreviousAnnotation);
>>
>> The result is an NPE that seems to be caused by an invalid state of the CAS
>> that I have just reset.
>>
>> Here's the stack trace:
>> Caused by: java.lang.NullPointerException
>>    at org.apache.uima.cas.impl.FSIndexRepositoryImpl.ll_addFS(FSIndexRepositoryImpl.java:1344)
>>    at org.apache.uima.cas.impl.FSIndexRepositoryImpl.addFS(FSIndexRepositoryImpl.java:812)
>>    at org.apache.uima.cas.impl.FSIndexRepositoryImpl.addFS(FSIndexRepositoryImpl.java:1258)
>>    at org.apache.uima.cas.impl.CASImpl.addFsToIndexes(CASImpl.java:3787)
>>    at ws.scribo.MediaTypeFlowController$MediaTypeFlow.next(MediaTypeFlowController.java:184)
>>
>> I would really appreciate it if someone could help me set the CAS sofa data
>> from an annotator after it has already been set, as explained above.
>>
>> If my approach is fundamentally flawed, I would greatly appreciate
>> suggestions on how I might achieve the results I originally wanted,
>> ideally still respecting the restriction that annotators are not
>> sofa-aware.
>>
>> Thanks for your time!
>>
>>
>
> --
> --------------------------------
> Christoph Büscher
> Software Development
>
> neofonie
> Technologieentwicklung und
> Informationsmanagement GmbH
> Robert-Koch-Platz 4
> 10115 Berlin
> phone: +49.30 24627 522
> fax: +49.30 24627 120
> http://www.neofonie.de
>
> Commercial register
> Berlin-Charlottenburg: HRB 67460
>
> Management
> Helmut Hoffer von Ankershoffen
> (Spokesman of the management)
> Nurhan Yildirim
>

Re: Change document text from analysis engine for the initial view of a CAS after it has already been set.

Posted by Eddie Epstein <ea...@gmail.com>.
Hi Eduard,

With "slightly" sofa-aware annotators 1..X, a fairly simple solution is
possible. For each new conversion, the flow controller would set the
name of the new view in a "control" FS. Converters would use that name
to create an output view. Annotators 1..X would then also use the
control FS to select which view to work on.

If the views were named "fixedPrefix.variableSuffix", the consumer
could use getViewIterator("fixedPrefix") to iterate through all the
result views.
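Eddie's scheme can be modeled in plain Java as a minimal sketch. Everything here is a hypothetical stand-in, not UIMA API: the map plays the role of the CAS views, `controlViewName` plays the "control" FS, and `resultViews` mimics what I understand `CAS.getViewIterator(prefix)` to match (a name equal to the prefix, or the prefix followed by a dot); check that against the UIMA Javadoc before relying on it. The point is that every conversion gets a fresh view, so no sofa is ever written twice and `_InitialView` stays untouched.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Plain-Java model of the scheme (NOT the UIMA API): views are a name -> text map,
// and a single field stands in for the "control" FS holding the current view name.
final class ControlledViews {
    static final String PREFIX = "conversion"; // hypothetical "fixedPrefix"

    private final Map<String, String> views = new LinkedHashMap<>();
    private String controlViewName; // stands in for the control FS
    private int counter = 0;

    ControlledViews(String originalText) {
        views.put("_InitialView", originalText); // set once by the reader, never rewritten
    }

    // Flow controller: choose a fresh view name for the next conversion.
    void startConversion() {
        counter++;
        controlViewName = PREFIX + "." + counter;
    }

    // Converter: read the original text and write its output to the named view, once.
    void convert(UnaryOperator<String> converter) {
        if (controlViewName == null) {
            throw new IllegalStateException("startConversion() has not been called");
        }
        if (views.containsKey(controlViewName)) {
            throw new IllegalStateException("view already populated: " + controlViewName);
        }
        views.put(controlViewName, converter.apply(views.get("_InitialView")));
    }

    // Annotator: use the control name to find the text it should process.
    String textForAnnotator() {
        return views.get(controlViewName);
    }

    // Consumer: names equal to the prefix or starting with prefix + "."
    // (assumed to match how CAS.getViewIterator(prefix) selects views).
    List<String> resultViews(String prefix) {
        List<String> names = new ArrayList<>();
        for (String name : views.keySet()) {
            if (name.equals(prefix) || name.startsWith(prefix + ".")) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        ControlledViews model = new ControlledViews("<newsML>text</newsML>");
        model.startConversion();
        model.convert(s -> s.replace("<newsML>", "").replace("</newsML>", ""));
        System.out.println(model.textForAnnotator()); // text
        model.startConversion();
        model.convert(String::toUpperCase);
        System.out.println(model.textForAnnotator()); // <NEWSML>TEXT</NEWSML>
        System.out.println(model.resultViews(PREFIX)); // [conversion.1, conversion.2]
    }
}
```

In real UIMA code the converter would presumably call `cas.createView(name)` with the name it read from the control FS; the sketch leaves that out so it runs without the UIMA jar.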

Is it possible for you to make annotators 1..X view aware?

Eddie
