Posted to user@uima.apache.org by Nils Reiter <li...@nilsreiter.de> on 2016/02/16 12:02:25 UTC
Running an AnalysisEngine on part of a document
Hi,
is there a way to run an analysis engine on only a part of the CAS?
I have UIMA annotations over all the substrings that I want to process. The only way I could think of is creating new views or CASs for each string, but that would result in > 100 views. Is there a more straightforward way?
Background:
Only part of the CAS contains natural language, other parts are lists, names and headers. I would like to POS-tag the text, but not the rest.
Thanks in advance for any pointers or suggestions,
Nils
Re: Running an AnalysisEngine on part of a document
Posted by Richard Eckart de Castilho <re...@apache.org>.
Yes, most of the DKPro Core components rely on token/sentence annotations.
You can find a list of the types and which components create/consume them
here https://dkpro.github.io/dkpro-core/documentation/ under the section
"DKPro Core 1.8.0-SNAPSHOT" -> "Typesystem reference".
Best,
-- Richard
> On 16.02.2016, at 13:13, Nils Reiter <li...@nilsreiter.de> wrote:
>
> Hi Richard,
>
> thanks for your reply and don’t worry, I am planning on using DKPro components :)
>
> So if I get you correctly, all DKPro components rely on token/sentence annotations and ignore the rest, right?
>
> Best regards,
> Nils
>
>> On 16 Feb 2016, at 12:18, Richard Eckart de Castilho <re...@apache.org> wrote:
>>
>> Ok, sorry, the answer below would assume you are using DKPro Core components ;)
>>
>> Sorry Nils, I didn't notice you were posting to the Apache UIMA list.
>>
>> So for UIMA in general, I am not aware of a solution other than what you describe. So it would depend on the components / component collection that you are using.
>>
>> Cheers,
>>
>> -- Richard
>>
>>> On 16.02.2016, at 12:17, Richard Eckart de Castilho <re...@apache.org> wrote:
>>>
>>> The easiest would be to remove the token/sentence annotations of those parts of the text that you do not care about.
>>> Or alternatively - if you have annotations that specifically mark the text sections, then configure the segmenter component to create sentences/tokens only within the boundaries of these annotations using PARAM_ZONE_TYPES and PARAM_STRICT_ZONING.
>>>
>>> Cheers,
>>>
>>> -- Richard
>>>
>>>> On 16.02.2016, at 12:02, Nils Reiter <li...@nilsreiter.de> wrote:
>>>>
>>>> Hi,
>>>>
>>>> is there a way to run an analysis engine on only a part of the CAS?
>>>>
>>>> I have UIMA annotations over all the substrings that I want to process. The only way I could think of is creating new views or CASs for each string, but that would result in > 100 views. Is there a more straightforward way?
>>>>
>>>> Background:
>>>> Only part of the CAS contains natural language, other parts are lists, names and headers. I would like to POS-tag the text, but not the rest.
>>>>
>>>> Thanks in advance for any pointers or suggestions,
>>>> Nils
>>>
>>
>
Re: Running an AnalysisEngine on part of a document
Posted by Nils Reiter <li...@nilsreiter.de>.
Hi Richard,
thanks for your reply and don’t worry, I am planning on using DKPro components :)
So if I get you correctly, all DKPro components rely on token/sentence annotations and ignore the rest, right?
Best regards,
Nils
Re: Running an AnalysisEngine on part of a document
Posted by Richard Eckart de Castilho <re...@apache.org>.
Ok, sorry, the answer below would assume you are using DKPro Core components ;)
Sorry Nils, I didn't notice you were posting to the Apache UIMA list.
So for UIMA in general, I am not aware of a solution other than what you describe. So it would depend on the components / component collection that you are using.
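To make the per-substring approach from the question more concrete, here is a minimal, library-independent Java sketch. It processes each marked span on its own and shifts the resulting offsets back into document coordinates, which is roughly what one would have to do when working with separate views or CASs. The class PerSpanProcessing, its methods, and the whitespace tokenizer standing in for the real analysis engine are all assumptions made for this illustration and not part of the UIMA API.

```java
import java.util.ArrayList;
import java.util.List;

public class PerSpanProcessing {

    /**
     * Stand-in for the real analysis engine (hypothetical): finds
     * whitespace-separated tokens and returns {begin, end} pairs
     * relative to the given text.
     */
    public static List<int[]> tokenize(String text) {
        List<int[]> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // skip whitespace before the next token
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            // consume the token itself
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                tokens.add(new int[] {start, i});
            }
        }
        return tokens;
    }

    /**
     * Processes each marked span {begin, end} separately and maps the
     * results back to offsets in the full document.
     */
    public static List<int[]> processSpans(String document, List<int[]> spans) {
        List<int[]> result = new ArrayList<>();
        for (int[] span : spans) {
            String part = document.substring(span[0], span[1]);
            for (int[] tok : tokenize(part)) {
                // shift span-relative offsets by the span's begin
                result.add(new int[] {tok[0] + span[0], tok[1] + span[0]});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String doc = "TITLE\nA short text.\n- item\nMore text here.";
        String first = "A short text.";
        String second = "More text here.";
        List<int[]> spans = new ArrayList<>();
        spans.add(new int[] {doc.indexOf(first), doc.indexOf(first) + first.length()});
        spans.add(new int[] {doc.indexOf(second), doc.indexOf(second) + second.length()});

        // Only text inside the two marked spans is tokenized.
        for (int[] tok : processSpans(doc, spans)) {
            System.out.println(tok[0] + "-" + tok[1] + ": " + doc.substring(tok[0], tok[1]));
        }
    }
}
```

The bookkeeping is simple for a handful of spans, but it has to be repeated for every result type, which is one reason the approach gets unwieldy with more than 100 substrings.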
Cheers,
-- Richard
Re: Running an AnalysisEngine on part of a document
Posted by Richard Eckart de Castilho <re...@apache.org>.
The easiest way would be to remove the token/sentence annotations from those parts of the text that you do not care about.
Alternatively, if you have annotations that specifically mark the text sections, configure the segmenter component to create sentences/tokens only within the boundaries of these annotations using PARAM_ZONE_TYPES and PARAM_STRICT_ZONING.
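As a library-independent illustration of the idea, the following minimal Java sketch keeps only the tokens that lie completely inside a marked text section. The names ZoneFilter, Span and keepInsideZones are made up for this example and are not part of UIMA or DKPro Core; in a real pipeline the spans would be annotations in the CAS, and the unwanted ones would be removed from its indexes or never created by the segmenter in the first place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ZoneFilter {

    /** A character span [begin, end); a hypothetical stand-in for an annotation. */
    public static final class Span {
        public final int begin;
        public final int end;

        public Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        /** True if the other span lies completely inside this one. */
        public boolean covers(Span other) {
            return begin <= other.begin && other.end <= end;
        }
    }

    /** Returns only those tokens that are fully covered by at least one zone. */
    public static List<Span> keepInsideZones(List<Span> tokens, List<Span> zones) {
        List<Span> kept = new ArrayList<>();
        for (Span token : tokens) {
            for (Span zone : zones) {
                if (zone.covers(token)) {
                    kept.add(token);
                    break; // one covering zone is enough
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "HEADER The cat sleeps.";
        // Tokens over the whole text, including the header.
        List<Span> tokens = Arrays.asList(
                new Span(0, 6), new Span(7, 10), new Span(11, 14),
                new Span(15, 21), new Span(21, 22));
        // One zone marking the natural-language part after the header.
        List<Span> zones = Arrays.asList(new Span(7, text.length()));

        // Downstream components would then only see the tokens printed here.
        for (Span token : keepInsideZones(tokens, zones)) {
            System.out.println(text.substring(token.begin, token.end));
        }
    }
}
```

Components that only look at tokens and sentences would then skip the header without needing any extra views.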
Cheers,
-- Richard