Posted to user@uima.apache.org by LASRI YASSINE <la...@gmail.com> on 2007/02/28 16:11:30 UTC

Help on UIMA Analysis Engine Aggregation

Hello,

 I have created an annotator that extracts all strings beginning with a
capital letter (Accccc), and I want to use this annotator (in an aggregate)
to extract all sentences containing two strings that both begin with a
capital letter (Xaaaaa Ybbbbb).

Please help me if you can.

Best regards
Yassine

Re: Help on UIMA Analysis Engine Aggregation

Posted by Adam Lally <al...@alum.rpi.edu>.

On 2/28/07, LASRI YASSINE <la...@gmail.com> wrote:
> Thank you for your response. My problem is this:
> I have an external file that contains a list of person names, for example:
>
> adam
> smith
> lary
> page
> ... etc
> and I need to extract all person names from other sources (text documents),
> for example:
> "Lary Page is the creator of google and Adam Smith is an economist"
> The annotator should extract <Adam Smith> and <Lary Page> as person names.
> So what can I do?
>

I'm not sure I completely understand your scenario, but is it the case
that you've already written an Annotator that creates annotations over
the individual words in the list?  So for example it would annotate
<Adam> and <Smith> as separate PersonName annotations?

If so, then I think the approach from my last mail would work.  In a
second annotator, iterate over all the PersonName annotations.  For
each two consecutive annotations a1 and a2, check if
documentText.substring(a1.getEnd(), a2.getBegin()) is all whitespace.
If so, create a new annotation (e.g. FullPersonName) spanning from
a1.getBegin() to a2.getEnd().
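
The merging step described above can be sketched in plain Java without any
UIMA classes. This is only an illustration under stated assumptions: Span is
a hypothetical stand-in for an annotation with begin/end offsets, the spans
are assumed to arrive sorted in document order, and an empty gap is treated
as "not adjacent" (a choice made here, not something stated in the thread).

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch; Span is a hypothetical stand-in for a UIMA annotation. */
final class FullNameMerger {

    /** Half-open character range [begin, end) in the document text. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** True if s is non-empty and consists only of whitespace characters. */
    static boolean isWhitespaceGap(String s) {
        return !s.isEmpty() && s.chars().allMatch(Character::isWhitespace);
    }

    /**
     * Walks the spans in document order and, for each two consecutive spans
     * separated only by whitespace, emits one new span covering both.
     */
    static List<Span> mergeAdjacent(String text, List<Span> names) {
        List<Span> merged = new ArrayList<>();
        for (int i = 0; i + 1 < names.size(); i++) {
            Span a1 = names.get(i);
            Span a2 = names.get(i + 1);
            if (a1.end <= a2.begin
                    && isWhitespaceGap(text.substring(a1.end, a2.begin))) {
                merged.add(new Span(a1.begin, a2.end));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        String text =
            "Lary Page is the creator of google and Adam Smith is an economist";
        // Simulate the first annotator's output by locating each word once.
        List<Span> names = new ArrayList<>();
        for (String word : new String[] {"Lary", "Page", "Adam", "Smith"}) {
            int begin = text.indexOf(word);
            names.add(new Span(begin, begin + word.length()));
        }
        for (Span full : mergeAdjacent(text, names)) {
            System.out.println(text.substring(full.begin, full.end));
        }
    }
}
```

Run on the example sentence from this thread, the main method prints
"Lary Page" and then "Adam Smith". Note that three names in a row would
yield two overlapping pairs; whether that is wanted depends on the use case.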

-Adam

Re: Help on UIMA Analysis Engine Aggregation

Posted by LASRI YASSINE <la...@gmail.com>.

Thank you for your response. My problem is this:
I have an external file that contains a list of person names, for example:

adam
smith
lary
page
... etc
and I need to extract all person names from other sources (text documents),
for example:
"Lary Page is the creator of google and Adam Smith is an economist"
The annotator should extract <Adam Smith> and <Lary Page> as person names.
So what can I do?

Best,
- Yassine




Re: Help on UIMA Analysis Engine Aggregation

Posted by Adam Lally <al...@alum.rpi.edu>.

On 2/28/07, LASRI YASSINE <la...@gmail.com> wrote:
> Hello,
>
>  I have created an annotator that extracts all strings beginning with a
> capital letter (Accccc), and I want to use this annotator (in an aggregate)
> to extract all sentences containing two strings that both begin with a
> capital letter (Xaaaaa Ybbbbb).
>

Hi,

You will need to create a second annotator, which will take the
results of your first annotator and do further processing on them.
This approach is shown in the MeetingAnnotator example that is
exercise 4 of the tutorial (see the Annotator & Analysis Engine
Developer's Guide chapter in the documentation).

Say your first annotator outputs FeatureStructures of the type
CapitalizedWord.  Your second annotator would get an iterator over
CapitalizedWords, for example:

jcas.getJFSIndexRepository().getAnnotationIndex(CapitalizedWord.type).iterator()

Then you iterate over the CapitalizedWord annotations, and for each
pair of consecutive annotations you can check whether they are adjacent
in the document by seeing if the document text between them is all
whitespace.  If you find an adjacent pair of CapitalizedWords you can
then create a new annotation of some other type that spans both
CapitalizedWords.

You then create an Aggregate Analysis Engine that contains both of your
annotators.  The way to do this is shown in the tutorial as well.
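
To illustrate the idea of two annotators running in a fixed order over shared
results, here is a plain-Java sketch. It does not use UIMA: Doc, Ann, and
Stage are hypothetical stand-ins for the CAS, an annotation, and an
annotator, and the type name CapitalizedPair is made up for the example. A
real aggregate is defined in an XML descriptor, as the tutorial shows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Plain-Java sketch of a fixed two-stage flow; none of these are UIMA types. */
final class TwoStagePipeline {

    /** Hypothetical stand-in for an annotation: a type name plus offsets. */
    static final class Ann {
        final String type;
        final int begin;
        final int end;

        Ann(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Hypothetical stand-in for the CAS: text plus annotations shared by all stages. */
    static final class Doc {
        final String text;
        final List<Ann> anns = new ArrayList<>();

        Doc(String text) {
            this.text = text;
        }

        /** Returns annotations of one type, in the order they were added. */
        List<Ann> ofType(String type) {
            List<Ann> out = new ArrayList<>();
            for (Ann a : anns) {
                if (a.type.equals(type)) {
                    out.add(a);
                }
            }
            return out;
        }
    }

    /** Hypothetical stand-in for an annotator. */
    interface Stage {
        void process(Doc doc);
    }

    /** Stage 1: mark every word that starts with an uppercase letter. */
    static final Stage CAPITALIZED_WORDS = doc -> {
        Matcher m = Pattern.compile("\\b\\p{Lu}\\p{L}*").matcher(doc.text);
        while (m.find()) {
            doc.anns.add(new Ann("CapitalizedWord", m.start(), m.end()));
        }
    };

    /** Stage 2: read stage 1 output and mark pairs separated only by whitespace. */
    static final Stage CAPITALIZED_PAIRS = doc -> {
        List<Ann> words = doc.ofType("CapitalizedWord");
        for (int i = 0; i + 1 < words.size(); i++) {
            Ann w1 = words.get(i);
            Ann w2 = words.get(i + 1);
            String gap = doc.text.substring(w1.end, w2.begin);
            if (!gap.isEmpty() && gap.trim().isEmpty()) {
                doc.anns.add(new Ann("CapitalizedPair", w1.begin, w2.end));
            }
        }
    };

    /** Runs both stages in a fixed order over one shared Doc. */
    static Doc run(String text) {
        Doc doc = new Doc(text);
        for (Stage stage : List.of(CAPITALIZED_WORDS, CAPITALIZED_PAIRS)) {
            stage.process(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        Doc doc = run("Lary Page is the creator of google and Adam Smith is an economist");
        for (Ann pair : doc.ofType("CapitalizedPair")) {
            System.out.println(doc.text.substring(pair.begin, pair.end));
        }
    }
}
```

The point of the sketch is the ordering: the second stage only works because
the first stage has already added its results to the shared document, which
is the same dependency the aggregate's fixed flow expresses.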

It wasn't clear to me from your question whether you also need to
detect sentence boundaries in your document.  If so, you can use the
example SimpleTokenAndSentenceAnnotator that comes with the SDK.

Hope that helps,

-Adam