Posted to user@uima.apache.org by estelle <ed...@similis.org> on 2009/06/15 15:38:29 UTC

difficulty using Dictionary Annotator and Hmm Tagger

Hello,
I'm new to UIMA and I am currently testing the sandbox add-ons.
I'm testing them with the Document Analyzer utility.
The Dictionary Annotator and the Hmm Tagger seem to work fine (there are no
error messages), but once the text is processed, I can't see any annotations in
the Annotation Results panel.

Can someone help me, please?

Re: difficulty using Dictionary Annotator and Hmm Tagger

Posted by Tommaso Teofili <to...@gmail.com>.

It seems to me that the Hmm Tagger is working properly in the CVD. When you run
the Hmm Tagger, it does not record the part of speech as a separate annotation; it
fills a property of the Token annotations created by the Whitespace Tokenizer (I
can't recall the name of the property, something like 'pos'), so try having
a look at the Token annotations before and after the Tagger has run.
The Document Analyzer log shows only that the Whitespace Tokenizer has
started, but do have a look at the Token properties, as described above.
Bye,
Tommaso
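
To illustrate the point above: in a UIMA type system, a value such as a part-of-speech tag is typically declared as a feature on the token type rather than as an annotation type of its own, which is why it does not show up as a separate annotation. The fragment below is only a sketch of such a declaration. The type name org.apache.uima.TokenAnnotation and the feature name posTag are assumptions (the message above only recalls "something like 'pos'"), so check the type system descriptor shipped with the tokenizer for the actual names.

```xml
<!-- Sketch of a token type carrying its part-of-speech tag as a feature. -->
<!-- Type and feature names are assumed for illustration, not confirmed by the thread. -->
<typeDescription>
  <name>org.apache.uima.TokenAnnotation</name>
  <supertypeName>uima.tcas.Annotation</supertypeName>
  <features>
    <featureDescription>
      <!-- The tagger would fill this string on each existing token. -->
      <name>posTag</name>
      <rangeTypeName>uima.cas.String</rangeTypeName>
    </featureDescription>
  </features>
</typeDescription>
```

With a declaration like this, the tagger adds no new annotations; the number of Token annotations is the same before and after tagging, and only the feature value on each token changes.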



Re: difficulty using Dictionary Annotator and Hmm Tagger

Posted by Thilo Goetz <tw...@gmx.de>.

Estelle,

this may just be a usability issue with the
DocumentAnalyzer.  If your analysis chain
works with CVD, there is no reason to believe
it wouldn't work with the DocumentAnalyzer.
Did you follow the instructions described here?
http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/tools/tools.html#ugr.tools.doc_analyzer.viewing_results

The difference in the log file is no cause
for alarm.  The process trace is logged by
CVD itself, not the annotators.  Looks to
me like the POS tagger is not logging anything,
neither in CVD nor in DocumentAnalyzer.

HTH,
Thilo


Re: difficulty using Dictionary Annotator and Hmm Tagger

Posted by estelle <ed...@similis.org>.

Hello and thank you for your answer. 

HmmTagger and DictionaryAnnotator work fine with the CAS Visual Debugger. 

I use the aggregate annotator "Tokenizer > HmmTagger" for tagging and the
"Tokenizer > DictionaryAnnotator" aggregate annotator for dictionary annotation.

The entries in the dictionary are the default entries plus an entry for the word
"UIMA" that I've added to make sure it would match on the sample texts.

I have checked the log files, and it seems that only the WhitespaceTokenizer
runs when launching the Tokenizer + HmmTagger aggregate.


Log file from running "Tokenizer + Hmm" with Document Analyzer : 

16/06/09 09:32:09 - 12: WhitespaceTokenizer.initialize: INFO: "Whitespace tokenizer successfully initialized"
16/06/09 09:32:10 - 13: WhitespaceTokenizer.typeSystemInit: INFO: "Whitespace tokenizer typesystem initialized"
16/06/09 09:32:10 - 13: WhitespaceTokenizer.process: INFO: "Whitespace tokenizer starts processing"
16/06/09 09:32:10 - 13: WhitespaceTokenizer.process: INFO: "Whitespace tokenizer finished processing"
16/06/09 09:32:10 - 13: WhitespaceTokenizer.process: INFO: "Whitespace tokenizer starts processing"
16/06/09 09:32:10 - 13: WhitespaceTokenizer.process: INFO: "Whitespace tokenizer finished processing"
16/06/09 09:32:10 - 13: WhitespaceTokenizer.process: INFO: "Whitespace tokenizer starts processing"
16/06/09 09:32:10 - 13: WhitespaceTokenizer.process: INFO: "Whitespace tokenizer finished processing"
16/06/09 09:32:10 - 13: WhitespaceTokenizer.process: INFO: "Whitespace tokenizer starts processing"
16/06/09 09:32:10 - 13: WhitespaceTokenizer.process: INFO: "Whitespace tokenizer finished processing"
16/06/09 09:32:10 - 13: WhitespaceTokenizer.process: INFO: "Whitespace tokenizer starts processing"
16/06/09 09:32:10 - 13: WhitespaceTokenizer.process: INFO: "Whitespace tokenizer finished processing"
16/06/09 09:32:10 - 13: WhitespaceTokenizer.process: INFO: "Whitespace tokenizer starts processing"
16/06/09 09:32:10 - 13: WhitespaceTokenizer.process: INFO: "Whitespace tokenizer finished processing"
16/06/09 09:32:10 - 13: WhitespaceTokenizer.process: INFO: "Whitespace tokenizer starts processing"
16/06/09 09:32:10 - 13: WhitespaceTokenizer.process: INFO: "Whitespace tokenizer finished processing"
16/06/09 09:32:10 - 13: WhitespaceTokenizer.process: INFO: "Whitespace tokenizer starts processing"
16/06/09 09:32:10 - 13: WhitespaceTokenizer.process: INFO: "Whitespace tokenizer finished processing"



Log file from running "Tokenizer + Hmm" with CAS Visual Debugger : 

16/06/09 09:23:52 - 10: WhitespaceTokenizer.initialize: INFO: "Whitespace tokenizer successfully initialized"
16/06/09 09:24:05 - 10: WhitespaceTokenizer.typeSystemInit: INFO: "Whitespace tokenizer typesystem initialized"
16/06/09 09:24:05 - 10: WhitespaceTokenizer.process: INFO: "Whitespace tokenizer starts processing"
16/06/09 09:24:05 - 10: WhitespaceTokenizer.process: INFO: "Whitespace tokenizer finished processing"
16/06/09 09:24:05 - 10: org.apache.uima.tools.cvd.MainFrame.internalRunAE(1570): INFO: Process trace of AE run:
Component Name: HmmTaggerTAE
Event Type: Analysis
Duration: 179ms (100%)
Sub-events:
        Component Name: WhitespaceTokenizer
        Event Type: Analysis
        Duration: 7ms (3,91%)

        Component Name: Hidden Markov Model - Part of Speech Tagger
        Event Type: Analysis
        Duration: 162ms (90,5%)

        Component Name: Fixed Flow Controller
        Event Type: Analysis
        Duration: 5ms (2,79%)

Re: difficulty using Dictionary Annotator and Hmm Tagger

Posted by Tommaso Teofili <to...@gmail.com>.

Hi, try the CAS Visual Debugger (CVD); I think it's very useful when you start
developing with UIMA. The HMM tagger needs the Whitespace Tokenizer to
process the document first in order to annotate parts of speech.
The flow order is significant, so beware.
For the Dictionary Annotator: are there any entries in the dictionary? Is the
annotator pointed at the right dictionary file?
Check the log at runtime too.
Provide more info :-)
Regards,
Tommaso
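
The ordering requirement above is expressed in the aggregate descriptor's fixed flow. The fragment below is a rough sketch of such an aggregate, with the tokenizer listed before the tagger. The delegate keys and the import locations are hypothetical placeholders, not the file names from the sandbox distribution; only the aggregate name HmmTaggerTAE is taken from the CVD process trace in this thread.

```xml
<!-- Sketch of an aggregate that runs the tokenizer first, then the tagger. -->
<!-- Delegate keys and import locations are placeholders chosen for illustration. -->
<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
  <primitive>false</primitive>
  <delegateAnalysisEngineSpecifiers>
    <delegateAnalysisEngine key="WhitespaceTokenizer">
      <import location="WhitespaceTokenizer.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="HmmTagger">
      <import location="HmmTagger.xml"/>
    </delegateAnalysisEngine>
  </delegateAnalysisEngineSpecifiers>
  <analysisEngineMetaData>
    <name>HmmTaggerTAE</name>
    <flowConstraints>
      <fixedFlow>
        <!-- Order matters: tokens must exist before the tagger can fill them in. -->
        <node>WhitespaceTokenizer</node>
        <node>HmmTagger</node>
      </fixedFlow>
    </flowConstraints>
  </analysisEngineMetaData>
</analysisEngineDescription>
```

If the two nodes were swapped, the tagger would run on a CAS that contains no tokens yet and would have nothing to tag.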


Re: difficulty using Dictionary Annotator and Hmm Tagger

Posted by Thilo Goetz <tw...@gmx.de>.

I wish I had a better memory.  This is a bug in
DocumentAnalyzer that I found a while back.  It's
documented here:
https://issues.apache.org/jira/browse/UIMA-1114

To verify, try using the java viewer instead of
the html viewer.

Since it's not been fixed yet, I have small hopes
that this will be fixed in time for our next
release.

--Thilo