Posted to user@uima.apache.org by Ted Pedersen <tp...@d.umn.edu> on 2010/12/29 19:09:41 UTC

basic question on sharing results from ./documentAnalyzer.sh demo

Greetings all,

I'm fairly new to UIMA, and to get myself oriented I've been running
the documentAnalyzer.sh demo/samples, and it's proven to be pretty
easy to use and quite informative (about what you can do with UIMA).

One thing I'd like to be able to do is cut some output and send that
to colleagues who aren't necessarily using UIMA, so as to say - look!
I gave this input file to the NamesAndPersonTitles_TAE.xml
function/descriptor, and this is what I got!

Let's assume they don't have UIMA installed, and that I don't want to
send them a screen shot (yes, I'm old school in that regard). Rather,
I'd just like to send them a text based file they can read in a
relatively simple way.

It doesn't have to be exactly this format, but just to give you an idea...

If my input is...

Mr. Smith works at IBM.

Then I'd like to send something like....

<name> <title> Mr. </title> Smith </name> works at IBM.

The actual results don't seem to recognize IBM. :) Note that I just
wrote the above manually....

Anyway, I'd just like to have these results in a somewhat simple,
readable, mailable form. I would even settle for being able to cut and
paste from the right hand column where the annotation details are
shown, to get something like....

Person Title ("Mr.")
begin = 0
end = 3
Name ("Mr. Smith")
begin = 0
end = 9

Note that I had to do that manually...anyway, the specific format
doesn't actually matter (doesn't need to be either of the above
precisely) just something that conveys the output of UIMA in a way
that can be read by a human and sent via email...
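
A minimal sketch of the inline format described above, assuming the
annotations are already available as (begin, end, tag) offsets into the
input text. The function name render_inline and the tag names are
hypothetical and are not part of UIMA; the sketch also assumes spans
nest properly and never partially overlap.

```python
def render_inline(text, spans):
    """Wrap each (begin, end, tag) span of `text` in <tag>...</tag> markers.

    Assumes spans nest properly (no partial overlaps).
    """
    events = []
    for index, (begin, end, tag) in enumerate(spans):
        length = end - begin
        # Opening markers: at the same offset, longer (outer) spans come first.
        events.append((begin, 1, -length, index, f"<{tag}>"))
        # Closing markers: sorted before openings at the same offset,
        # shorter (inner) spans first.
        events.append((end, 0, length, -index, f"</{tag}>"))
    events.sort()

    pieces, position = [], 0
    for offset, _, _, _, marker in events:
        pieces.append(text[position:offset])  # plain text up to the marker
        pieces.append(marker)
        position = offset
    pieces.append(text[position:])  # whatever follows the last marker
    return "".join(pieces)


text = "Mr. Smith works at IBM."
spans = [(0, 3, "title"), (0, 9, "name")]
print(render_inline(text, spans))
# prints: <name><title>Mr.</title> Smith</name> works at IBM.
```

The result is ordinary text, so it can be pasted straight into a mail
message.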

BTW, I did see the HTML and XML options on the Results Display Format
buttons on Analysis Results, but when I try to use those to see what
they do, it just seems to hang and nothing is displayed. I saw some
output directories, interactive_temp and interactive_out, but those
just contained the input text and the .xmi output (which I don't find
particularly readable. :)
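
One way to make the .xmi output readable is to list each annotation
together with the text it covers. The following is only a sketch: it
assumes the file stores the document text in the sofaString attribute of
a Sofa element and that annotations carry begin and end attributes. The
SAMPLE string is hand-written for illustration, not real tool output,
and list_annotations is a hypothetical helper. UIMA offsets count UTF-16
code units, so the slicing below is only exact for text without
characters outside the Basic Multilingual Plane.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the general shape of an XMI file (an assumption).
SAMPLE = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
         xmlns:cas="http:///uima/cas.ecore"
         xmlns:example="http:///example.ecore" xmi:version="2.0">
  <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView"
            mimeType="text" sofaString="Mr. Smith works at IBM."/>
  <example:PersonTitle xmi:id="2" sofa="1" begin="0" end="3"/>
  <example:Name xmi:id="3" sofa="1" begin="0" end="9"/>
</xmi:XMI>"""


def list_annotations(xmi_text):
    """Return (type, begin, end, covered_text) for every element with offsets."""
    root = ET.fromstring(xmi_text)

    # Take the document text from the first Sofa element found.
    document = ""
    for element in root.iter():
        if element.tag.rsplit("}", 1)[-1] == "Sofa":
            document = element.get("sofaString", "")
            break

    rows = []
    for element in root.iter():
        begin, end = element.get("begin"), element.get("end")
        if begin is None or end is None:
            continue  # not an annotation (for example the Sofa itself)
        # ElementTree reports namespaced tags as "{namespace-uri}LocalName".
        type_name = element.tag.rsplit("}", 1)[-1]
        b, e = int(begin), int(end)
        rows.append((type_name, b, e, document[b:e]))
    # Sort by start offset, longer spans first.
    return sorted(rows, key=lambda row: (row[1], -row[2]))


for type_name, b, e, covered in list_annotations(SAMPLE):
    print(f'{type_name} ("{covered}") begin={b} end={e}')
```

For a real file, the text could be read with open(path, encoding="utf-8")
and passed to the same function.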

Any thoughts, suggestions, arguments as to why this is a bad idea,
etc. are of course welcome.

Cordially,
Ted

-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse

Re: basic question on sharing results from ./documentAnalyzer.sh demo

Posted by Marshall Schor <ms...@schor.com>.


On 12/30/2010 12:24 PM, Ted Pedersen wrote:
> PS Just a few details on a second experiment, as there was an
> interesting little twist that initially confused me. This time I just
> used
>
> Analysis Engine : PersonTitleAnnotator.xml
>
> and ran as described below. What was nice about this was that all the
> possible titles as defined in the xml file were shown to me in the CPE
> Gui, so I could review those and remove or add as needed....
>
> But, initially I did not get any titles identified! Instead I got the
> following error....
>
> No output is being produced by the PersonTitleAnnotator because the
> Result Specification did not contain a request for the type
> example.PersonTitle with the language 'x-unspecified'
>   (Note: this message will only be shown once.)

We added that error message in the 2.3.1 version of the PersonTitleAnnotator
because others have been hit by this same issue: previously the annotator
just produced nothing, with no message.  You can find out more about result
specifications in the documentation:
http://uima.apache.org/d/uimaj-2.3.1/tutorials_and_users_guides.html#ugr.tug.aae.result_specification_setting

Many annotators ignore the result specification and produce their output
regardless.  But the PersonTitleAnnotator was written as a tutorial
example, and its code does check the result specification.  You can see this
code here:
http://svn.apache.org/viewvc/uima/uimaj/tags/uimaj-2.3.1/uimaj-examples/src/main/java/org/apache/uima/examples/cas/PersonTitleAnnotator.java?revision=1044478&view=markup
see around line 158.
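
The behaviour can be illustrated with a toy model. This is not the UIMA
API and not the PersonTitleAnnotator source; the classes ResultSpec and
TitleAnnotator are hypothetical. The sketch assumes that the type was
requested for English only, and that a request made for 'x-unspecified'
would match any language, which is a simplification.

```python
UNSPECIFIED = "x-unspecified"


class ResultSpec:
    """Toy result specification: a set of (type, language) requests."""

    def __init__(self, requests):
        self.requests = set(requests)

    def contains_type(self, type_name, language):
        # Simplification: match the exact language, or a request that was
        # made without a specific language.
        return ((type_name, language) in self.requests
                or (type_name, UNSPECIFIED) in self.requests)


class TitleAnnotator:
    """Toy annotator that consults the result specification before working."""

    TYPE = "example.PersonTitle"

    def __init__(self, result_spec, titles):
        self.result_spec = result_spec
        self.titles = titles
        self.warned = False

    def process(self, text, language=UNSPECIFIED):
        if not self.result_spec.contains_type(self.TYPE, language):
            if not self.warned:  # report the problem only once
                print(f"No output: the result specification has no request "
                      f"for {self.TYPE} with the language '{language}'")
                self.warned = True
            return []
        found = []
        for title in self.titles:
            start = text.find(title)
            while start != -1:
                found.append((start, start + len(title), title))
                start = text.find(title, start + 1)
        return sorted(found)


spec = ResultSpec([("example.PersonTitle", "en")])
annotator = TitleAnnotator(spec, ["Mr.", "Professor"])
text = "Professor Jimmy Smith and Mr. John Smith are friends."

print(annotator.process(text))        # language left unspecified
print(annotator.process(text, "en"))  # language set to English
```

With the language left unspecified the toy annotator prints its warning
once and returns an empty list; with "en" it returns the two titles and
their offsets.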

-Marshall
> So, on a hunch I specified the language as English (en) via the field
> provided for that in the CPE (which is blank by default it seems), and
> then I re-ran and got results. Note that before getting results I
> added Professor to the list of titles (via the CPE).
>
> Anyway, after doing the above with the PersonTitle Analysis Engine, I
> got the following results...
>
> <++++NEW DOCUMENT++++>
> DOCUMENT URI:file:/home/ted/data/test.txt
>
> uima.tcas.DocumentAnnotation Professor Jimmy Smith and Mr. John Smith
> are friends. They both live in  Mankato and like the Minnesota
> Gophers, but they aren't too happy with  Coach Jones.
> example.PersonTitle Professor
> org.apache.uima.examples.SourceDocumentInformation
> example.PersonTitle Mr.
>
> So...very nice.
>
> Thanks!
> Ted
>

Re: basic question on sharing results from ./documentAnalyzer.sh demo

Posted by Ted Pedersen <tp...@d.umn.edu>.

PS Just a few details on a second experiment, as there was an
interesting little twist that initially confused me. This time I just
used

Analysis Engine : PersonTitleAnnotator.xml

and ran as described below. What was nice about this was that all the
possible titles as defined in the xml file were shown to me in the CPE
Gui, so I could review those and remove or add as needed....

But, initially I did not get any titles identified! Instead I got the
following error....

No output is being produced by the PersonTitleAnnotator because the
Result Specification did not contain a request for the type
example.PersonTitle with the language 'x-unspecified'
  (Note: this message will only be shown once.)

So, on a hunch I specified the language as English (en) via the field
provided for that in the CPE (which is blank by default it seems), and
then I re-ran and got results. Note that before getting results I
added Professor to the list of titles (via the CPE).

Anyway, after doing the above with the PersonTitle Analysis Engine, I
got the following results...

<++++NEW DOCUMENT++++>
DOCUMENT URI:file:/home/ted/data/test.txt

uima.tcas.DocumentAnnotation Professor Jimmy Smith and Mr. John Smith
are friends. They both live in  Mankato and like the Minnesota
Gophers, but they aren't too happy with  Coach Jones.
example.PersonTitle Professor
org.apache.uima.examples.SourceDocumentInformation
example.PersonTitle Mr.

So...very nice.

Thanks!
Ted

On Thu, Dec 30, 2010 at 10:56 AM, Ted Pedersen <tp...@d.umn.edu> wrote:
> Thank you!!! Mission accomplished. :)
>
> Just to make a few notes on how I did this (in the event anyone else
> ever wonders, and to make sure I didn't do this in a weird way...)
>
> I created a plain text input file that consisted of the following....
>
> Professor Jimmy Smith and Mr. John Smith are friends. They both live
> in  Mankato and like the Minnesota Gophers, but they aren't too happy
> with  Coach Jones.
>
> Then, I started
>
> bin/cpeGui.sh
>
> to get the Collection Processing Engine Configurator going...When that
> was running, I loaded the directory in which my file was found, as
> well as the following (all found in the examples/descriptors
> directory):
>
> Collection Reader : FileSystemCollectionReader.xml
> Analysis Engine : NamesAndPersonTitles_TAE.xml
> CAS Consumer : AnnotationPrinter.xml
>
> And I clicked. Then I found the following in my output directory in a
> file called annotprint.
>
> <++++NEW DOCUMENT++++>
> DOCUMENT URI:file:/home/ted/data/test.txt
>
> uima.tcas.DocumentAnnotation Professor Jimmy Smith and Mr. John Smith
> are friends. They both live in  Mankato and like the Minnesota
> Gophers, but they aren't too happy with  Coach Jones.
> example.Name Professor Jimmy Smith
> org.apache.uima.examples.SourceDocumentInformation
> example.Name Mr. John Smith
> example.Name Minnesota Gophers
> example.Name Coach Jones
>
> Which is exactly the sort of information I wanted, and note, I can
> send it to you in an email message. :)
>
> As you can tell, I'm pretty new at this - given that, I feel like I
> should ask if this is the standard way to set this up, or is
> there another way to go that is more common? (That said, I'm pretty
> content with what I did here, so asking mostly out of curiosity).
>
> Thanks!
> Ted

-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse

Re: basic question on sharing results from ./documentAnalyzer.sh demo

Posted by Ted Pedersen <tp...@d.umn.edu>.

Thank you!!! Mission accomplished. :)

Just to make a few notes on how I did this (in the event anyone else
ever wonders, and to make sure I didn't do this in a weird way...)

I created a plain text input file that consisted of the following....

Professor Jimmy Smith and Mr. John Smith are friends. They both live
in  Mankato and like the Minnesota Gophers, but they aren't too happy
with  Coach Jones.

Then, I started

bin/cpeGui.sh

to get the Collection Processing Engine Configurator going...When that
was running, I loaded the directory in which my file was found, as
well as the following (all found in the examples/descriptors
directory):

Collection Reader : FileSystemCollectionReader.xml
Analysis Engine : NamesAndPersonTitles_TAE.xml
CAS Consumer : AnnotationPrinter.xml

And I clicked. Then I found the following in my output directory in a
file called annotprint.

<++++NEW DOCUMENT++++>
DOCUMENT URI:file:/home/ted/data/test.txt

uima.tcas.DocumentAnnotation Professor Jimmy Smith and Mr. John Smith
are friends. They both live in  Mankato and like the Minnesota
Gophers, but they aren't too happy with  Coach Jones.
example.Name Professor Jimmy Smith
org.apache.uima.examples.SourceDocumentInformation
example.Name Mr. John Smith
example.Name Minnesota Gophers
example.Name Coach Jones
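
If a per-type summary is easier to share than the raw listing, output
of this shape can be regrouped with a few lines of script. This is a
sketch under assumptions: it takes each annotation to occupy a single
line in the file, with the type name first and the covered text after
the first space (the listing above is wrapped by the mail client). The
helper group_by_type is hypothetical.

```python
from collections import defaultdict

NEW_DOCUMENT = "<++++NEW DOCUMENT++++>"
URI_PREFIX = "DOCUMENT URI:"


def group_by_type(report):
    """Map document URI -> type name -> list of covered texts."""
    documents = {}
    current = None
    for line in report.splitlines():
        line = line.strip()
        if not line or line == NEW_DOCUMENT:
            continue
        if line.startswith(URI_PREFIX):
            current = defaultdict(list)
            documents[line[len(URI_PREFIX):]] = current
            continue
        if current is None:
            continue  # ignore anything before the first document header
        # First token is the type name, the rest is the covered text.
        type_name, _, covered = line.partition(" ")
        current[type_name].append(covered)
    return documents


report = """<++++NEW DOCUMENT++++>
DOCUMENT URI:file:/home/ted/data/test.txt

example.Name Professor Jimmy Smith
org.apache.uima.examples.SourceDocumentInformation
example.Name Mr. John Smith
example.Name Minnesota Gophers
example.Name Coach Jones
"""

for uri, types in group_by_type(report).items():
    print(uri)
    for type_name, texts in types.items():
        print(f"  {type_name}: {texts}")
```

Types that cover no text, such as SourceDocumentInformation here, show
up with an empty string.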

Which is exactly the sort of information I wanted, and note, I can
send it to you in an email message. :)

As you can tell, I'm pretty new at this - given that, I feel like I
should ask if this is the standard way to set this up, or is
there another way to go that is more common? (That said, I'm pretty
content with what I did here, so asking mostly out of curiosity).

Thanks!
Ted

On Thu, Dec 30, 2010 at 9:19 AM, Eddie Epstein <ea...@gmail.com> wrote:
> Try adding the following sample annotator to the end of your pipeline:
> $UIMA_HOME/examples/descriptors/cas_consumer/AnnotationPrinter.xml
>
> Eddie

-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse

Re: basic question on sharing results from ./documentAnalyzer.sh demo

Posted by Eddie Epstein <ea...@gmail.com>.

Try adding the following sample annotator to the end of your pipeline:
$UIMA_HOME/examples/descriptors/cas_consumer/AnnotationPrinter.xml

Eddie
