Posted to notifications@ctakes.apache.org by "Tim Miller (JIRA)" <ji...@apache.org> on 2012/11/01 18:39:12 UTC

[jira] [Commented] (CTAKES-60) Null pointer error with empty sentences

    [ https://issues.apache.org/jira/browse/CTAKES-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488867#comment-13488867 ] 

Tim Miller commented on CTAKES-60:
----------------------------------

This problem is a bit more complex than I thought.  I think the cause is that the dependency parser depends on the POS tagger, and thus expects every token to have a POS tag.  However, the POS tagger for some reason does not tag single-token sentences.  So in the example from the issue description, the second sentence is just a period, which does not get a POS tag in the dependency pipeline.  Similarly, I also found this error with a sentence of the form "( This is a sentence. )"  (i.e. a sentence surrounded by parens).  The close paren is taken as its own sentence, and again the token gets no POS tag.

Here comes the twist: the default pipeline (aggregatePlaintextUMLSProcessor) does give this token a POS tag!  So something weird is going on where a different component is assigning POS tags to the tokens the POS tagger skipped.  I've tracked it down to the Chunker.  Since the dependency parser pipeline does not use that component, it gets these occasional errors.

A quick workaround is simply to add the chunker to the dependency parser pipeline, but that will certainly not be intuitive for new users.  I think it is worth looking into why the chunker does this and whether the behavior can be moved back to the POS tagger.
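
To make the failure mode concrete, here is a minimal Python sketch of it. It is not cTAKES code: the Token class, toy_pos_tagger and untagged_tokens are hypothetical names, and the "skip one-token sentences" rule only mimics the behavior I observed, not the real tagger logic. It shows how a lone ")" or "." ends up with no POS tag, and how a check before the parser could report such tokens instead of hitting a null pointer.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    text: str
    pos: Optional[str] = None  # None means no component has set a POS tag yet


def toy_pos_tagger(sentences: List[List[Token]]) -> None:
    """Hypothetical stand-in for the observed behavior, not the real tagger."""
    for sentence in sentences:
        if len(sentence) < 2:
            continue  # assumption: single-token sentences are skipped entirely
        for token in sentence:
            # Placeholder tags; a real tagger would predict these.
            token.pos = "PUNCT" if token.text in {".", "(", ")"} else "WORD"


def untagged_tokens(sentences: List[List[Token]]) -> List[Tuple[int, int, str]]:
    """Return (sentence index, token index, text) for each token with no POS."""
    return [
        (s, t, token.text)
        for s, sentence in enumerate(sentences)
        for t, token in enumerate(sentence)
        if token.pos is None
    ]


# "( This is a sentence. )" where the close paren becomes its own sentence.
sentences = [
    [Token(text) for text in ["(", "This", "is", "a", "sentence", "."]],
    [Token(")")],
]
toy_pos_tagger(sentences)
missing = untagged_tokens(sentences)
print(missing)  # [(1, 0, ')')]
```

A downstream component that requires POS tags could run a check like untagged_tokens first and fail with a clear message, or skip the offending sentence, rather than dereferencing a missing tag.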
                
> Null pointer error with empty sentences
> ---------------------------------------
>
>                 Key: CTAKES-60
>                 URL: https://issues.apache.org/jira/browse/CTAKES-60
>             Project: cTAKES
>          Issue Type: Bug
>          Components: ctakes-chunker, ctakes-dependency-parser, ctakes-pos-tagger
>            Reporter: Tim Miller
>
> Null pointer exception in SRL module caused by certain ill-formed sentences (that other components handle gracefully).
> Smallest workable example input:
> I encouraged exercise. She needs a vaccine still but we don't have any moer now. . She will follow up with me in 4 months' time and also with her primary care physician.
> </example>
> The problem has something to do with the double period.  Running this example in the UIMA-CVD with the AE located in "desc/analysis_engine/ClearParserSRLPlaintextAggregate.xml" produces the error.
