Posted to dev@ctakes.apache.org by "Chen Lin (JIRA)" <ji...@apache.org> on 2012/09/21 22:50:07 UTC

[jira] [Created] (CTAKES-63) exception formed by malformed email address

Chen Lin created CTAKES-63:
------------------------------

             Summary: exception formed by malformed email address
                 Key: CTAKES-63
                 URL: https://issues.apache.org/jira/browse/CTAKES-63
             Project: cTAKES
          Issue Type: Bug
          Components: ctakes-dictionary-lookup
    Affects Versions: 2.6-incubating
         Environment: windows
            Reporter: Chen Lin
            Priority: Critical


2012-09-21 12:48:36,789 INFO  edu.mayo.bmi.uima.lookup.ae.UmlsDictionaryLookupAnnotator  - process(JCas)
org.apache.lucene.queryParser.ParseException: Cannot parse 'mailto:abcoman@t nec.org]': Lexical error at line 1, column 26.  Encountered: <EOF> after : ""
       at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:192)
       at edu.mayo.bmi.dictionary.lucene.LuceneDictionaryImpl.getEntries(LuceneDictionaryImpl.java:106)
       at edu.mayo.bmi.dictionary.DictionaryEngine.metaLookup(DictionaryEngine.java:181)


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CTAKES-63) exception formed by malformed email address

Posted by "Pei Chen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CTAKES-63?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469651#comment-13469651 ] 

Pei Chen commented on CTAKES-63:
--------------------------------

Thanks Sean.
Just to confirm, are you saying that we should rebuild the indexes and completely remove the (String.contains("-")) workaround code?
That sounds much cleaner.
                
> exception formed by malformed email address
> -------------------------------------------
>
>                 Key: CTAKES-63
>                 URL: https://issues.apache.org/jira/browse/CTAKES-63
>             Project: cTAKES
>          Issue Type: Bug
>          Components: ctakes-dictionary-lookup
>    Affects Versions: 2.6-incubating
>         Environment: windows
>            Reporter: Chen Lin
>            Priority: Critical
>              Labels: Stability
>
> 2012-09-21 12:48:36,789 INFO  edu.mayo.bmi.uima.lookup.ae.UmlsDictionaryLookupAnnotator  - process(JCas)
> org.apache.lucene.queryParser.ParseException: Cannot parse 'mailto:abcoman@t nec.org]': Lexical error at line 1, column 26.  Encountered: <EOF> after : ""
>        at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:192)
>        at edu.mayo.bmi.dictionary.lucene.LuceneDictionaryImpl.getEntries(LuceneDictionaryImpl.java:106)
>        at edu.mayo.bmi.dictionary.DictionaryEngine.metaLookup(DictionaryEngine.java:181)


[jira] [Commented] (CTAKES-63) exception formed by malformed email address

Posted by "Sean (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CTAKES-63?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469654#comment-13469654 ] 

Sean commented on CTAKES-63:
----------------------------

I am out of the office on PTO today, Thursday 4th.  I will be back in the office tomorrow.
         ~Sean


                

[jira] [Commented] (CTAKES-63) exception formed by malformed email address

Posted by "Pei Chen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CTAKES-63?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461843#comment-13461843 ] 

Pei Chen commented on CTAKES-63:
--------------------------------

I could reproduce this in cTAKES v2.5 using the CAS Visual Debugger with the standard AggregatePlainTextPipeline.
Example text to reproduce:
“From: Chen, Pei [mailto:abcoman@t-nec.org] Sent: Wednesday, February 27, 2008 12:55 PMTo: Chen, Pei M.,M.D.Subject: RE: blah blah blah”

Full exception:
2012-09-21 12:48:36,789 INFO  edu.mayo.bmi.uima.lookup.ae.UmlsDictionaryLookupAnnotator  - process(JCas)
org.apache.lucene.queryParser.ParseException: Cannot parse 'mailto:abcoman@t nec.org]': Lexical error at line 1, column 26.  Encountered: <EOF> after : ""
       at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:192)
       at edu.mayo.bmi.dictionary.lucene.LuceneDictionaryImpl.getEntries(LuceneDictionaryImpl.java:106)
       at edu.mayo.bmi.dictionary.DictionaryEngine.metaLookup(DictionaryEngine.java:181)
       at edu.mayo.bmi.lookup.algorithms.FirstTokenPermutationImpl.getFirstTokenHits(FirstTokenPermutationImpl.java:701)
       at edu.mayo.bmi.lookup.algorithms.FirstTokenPermutationImpl.lookup(FirstTokenPermutationImpl.java:145)
       at edu.mayo.bmi.uima.lookup.ae.DictionaryLookupAnnotator.performLookup(DictionaryLookupAnnotator.java:155)
       at edu.mayo.bmi.uima.lookup.ae.DictionaryLookupAnnotator.process(DictionaryLookupAnnotator.java:131)
       at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
       at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:375)
       at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:296)
       at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
       at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:409)
       at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
       at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
       at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
       at org.apache.uima.tools.cvd.MainFrame.internalRunAE(MainFrame.java:1526)
       at org.apache.uima.tools.cvd.MainFrame.runAE(MainFrame.java:430)
       at org.apache.uima.tools.cvd.control.AnnotatorRerunEventHandler.actionPerformed(AnnotatorRerunEventHandler.java:40)
       at javax.swing.AbstractButton.fireActionPerformed(Unknown Source)
       at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source)
       at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source)
       at javax.swing.DefaultButtonModel.setPressed(Unknown Source)
       at javax.swing.AbstractButton.doClick(Unknown Source)
       at javax.swing.AbstractButton.doClick(Unknown Source)
       at javax.swing.plaf.basic.BasicMenuItemUI$Actions.actionPerformed(Unknown Source)
       at javax.swing.SwingUtilities.notifyAction(Unknown Source)
       at javax.swing.JComponent.processKeyBinding(Unknown Source)
       at javax.swing.JMenuBar.processBindingForKeyStrokeRecursive(Unknown Source)
       at javax.swing.JMenuBar.processBindingForKeyStrokeRecursive(Unknown Source)
       at javax.swing.JMenuBar.processBindingForKeyStrokeRecursive(Unknown Source)
       at javax.swing.JMenuBar.processKeyBinding(Unknown Source)
       at javax.swing.KeyboardManager.fireBinding(Unknown Source)
       at javax.swing.KeyboardManager.fireKeyboardAction(Unknown Source)
       at javax.swing.JComponent.processKeyBindingsForAllComponents(Unknown Source)
       at javax.swing.JComponent.processKeyBindings(Unknown Source)
       at javax.swing.JComponent.processKeyEvent(Unknown Source)
       at java.awt.Component.processEvent(Unknown Source)
       at java.awt.Container.processEvent(Unknown Source)
       at java.awt.Component.dispatchEventImpl(Unknown Source)
       at java.awt.Container.dispatchEventImpl(Unknown Source)
       at java.awt.Component.dispatchEvent(Unknown Source)
       at java.awt.KeyboardFocusManager.redispatchEvent(Unknown Source)
       at java.awt.DefaultKeyboardFocusManager.dispatchKeyEvent(Unknown Source)
       at java.awt.DefaultKeyboardFocusManager.preDispatchKeyEvent(Unknown Source)
       at java.awt.DefaultKeyboardFocusManager.typeAheadAssertions(Unknown Source)
       at java.awt.DefaultKeyboardFocusManager.dispatchEvent(Unknown Source)
       at java.awt.Component.dispatchEventImpl(Unknown Source)
       at java.awt.Container.dispatchEventImpl(Unknown Source)
       at java.awt.Window.dispatchEventImpl(Unknown Source)
       at java.awt.Component.dispatchEvent(Unknown Source)
       at java.awt.EventQueue.dispatchEventImpl(Unknown Source)
       at java.awt.EventQueue.access$000(Unknown Source)
       at java.awt.EventQueue$1.run(Unknown Source)
       at java.awt.EventQueue$1.run(Unknown Source)
       at java.security.AccessController.doPrivileged(Native Method)
       at java.security.AccessControlContext$1.doIntersectionPrivilege(Unknown Source)
       at java.security.AccessControlContext$1.doIntersectionPrivilege(Unknown Source)
       at java.awt.EventQueue$2.run(Unknown Source)
       at java.awt.EventQueue$2.run(Unknown Source)
       at java.security.AccessController.doPrivileged(Native Method)
       at java.security.AccessControlContext$1.doIntersectionPrivilege(Unknown Source)
       at java.awt.EventQueue.dispatchEvent(Unknown Source)
       at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
       at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
       at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
       at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
       at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
       at java.awt.EventDispatchThread.run(Unknown Source)
Caused by: org.apache.lucene.queryParser.TokenMgrError: Lexical error at line 1, column 26.  Encountered: <EOF> after : ""
       at org.apache.lucene.queryParser.QueryParserTokenManager.getNextToken(QueryParserTokenManager.java:1229)
       at org.apache.lucene.queryParser.QueryParser.jj_scan_token(QueryParser.java:1650)
       at org.apache.lucene.queryParser.QueryParser.jj_3R_2(QueryParser.java:1533)
       at org.apache.lucene.queryParser.QueryParser.jj_3_1(QueryParser.java:1540)
       at org.apache.lucene.queryParser.QueryParser.jj_2_1(QueryParser.java:1526)
       at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1221)
       at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1207)
       at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1167)
       at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:182)
       ... 67 more


                

[jira] [Commented] (CTAKES-63) exception formed by malformed email address

Posted by "Sean (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CTAKES-63?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469355#comment-13469355 ] 

Sean commented on CTAKES-63:
----------------------------

Pei raised the following in an email that was forwarded to me; I'm including it here for documentation purposes (my response follows):

> After some debugging, this happens when the token contains a dash (-)
> and also contains a special character such as the right bracket (]).
> I believe all of the characters in the QueryParser str token should be
> escaped to avoid issues such as a token ending with ']'.
> 
> Before we add and test the proposed fix (adding an escape() call) shown
> below, I also noticed another potential issue: we search for and replace
> all dashes with spaces.  I just wanted to ensure that this was done
> intentionally and works because the dashes have already been removed in
> the index.  Otherwise, we'll need to replace the dash with a '?' instead
> of a space, or use a PhraseQuery instead of a TermQuery.  It would be
> great if someone familiar with this bit of code could confirm...
> 
> LuceneDictionaryImpl.java (dictionary-lookup) [~Line 106]
> 
>               if (str.indexOf('-') == -1) {
>                      q = new TermQuery(new Term(iv_lookupFieldName, str));
>                      topDoc = iv_searcher.search(q, iv_maxHits);
>               }
>               else {
>                      // needed the KeywordAnalyzer for situations where
>                      // the hyphen was included in the f-word
>                      QueryParser query = new QueryParser(Version.LUCENE_30,
>                                    iv_lookupFieldName, new KeywordAnalyzer());
>                      try {
>                            // original call:
>                            // topDoc = iv_searcher.search(query.parse(str.replace('-', ' ')), iv_maxHits);
>                            // proposed fix: escape query syntax characters before parsing
>                            String escaped = QueryParser.escape(str.replace('-', ' '));
>                            topDoc = iv_searcher.search(query.parse(escaped), iv_maxHits);
>                      } catch (ParseException e) {
>                            // TODO Auto-generated catch block
>                            e.printStackTrace();
>                      }
>               }
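
For reference, the effect of the proposed escape() call can be sketched without Lucene on the classpath. The class name, method, and main below are hypothetical, and the character list is an assumption modelled on the characters that the Lucene 3.x query syntax treats as special. It is a stand-in for QueryParser.escape, not its source.

```java
// Hypothetical stand-in for QueryParser.escape(String); not the Lucene source.
final class QueryEscapeSketch {

    // Characters assumed to be special in the Lucene 3.x query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Prefix every special character with a backslash so that a query
    // parser would read it as literal text rather than as syntax.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Token from the report, after the existing dash-to-space replacement.
        String token = "mailto:abcoman@t-nec.org]".replace('-', ' ');
        System.out.println(escape(token));
    }
}
```

With this sketch the ':' and the trailing ']' in the reported token come out prefixed with a backslash, while '@', '.', and the space are left alone. Whether the space should still split the token into separate terms is the open question raised in the email above.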

I was the author of the code in question above.  Prior versions of cTAKES used dictionary resources that required this workaround when a hyphen was contained in the first term (f-word) being looked up.  Part of the issue was that hyphenated terms were handled as single tokens; however, the problem had more to do with how the Lucene dictionary was built than with the content of the dictionary.

After some experimentation, I found that the way the field was indexed affected what could be queried within the string.  Indexing the field as follows gave better results:

    document.add(new Field("first_word", s[0].trim(),
            Field.Store.YES, Field.Index.ANALYZED));
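
As a rough illustration of why the indexing mode matters, the toy model below compares a first-word index that stores the value verbatim with one that runs it through an analysis step first. It does not use Lucene. The class, its methods, and the splitting rule (lower-case, split on hyphens and whitespace) are all assumptions standing in for whatever analyzer the real index is built with.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: a map from index token to entry ids, no Lucene involved.
final class FirstWordIndexSketch {

    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final boolean analyzed;

    FirstWordIndexSketch(boolean analyzed) {
        this.analyzed = analyzed;
    }

    // Assumed analysis rule: lower-case, then split on hyphens and whitespace.
    static List<String> analyze(String text) {
        return Arrays.asList(text.trim().toLowerCase().split("[-\\s]+"));
    }

    // Index the first word either as analyzed tokens or as one verbatim token.
    void add(int id, String firstWord) {
        List<String> tokens = analyzed
                ? analyze(firstWord)
                : Collections.singletonList(firstWord.trim());
        for (String t : tokens) {
            postings.computeIfAbsent(t, k -> new HashSet<>()).add(id);
        }
    }

    // Query side, mirroring the workaround: replace dashes with spaces,
    // split on whitespace, and require every resulting token to be indexed.
    boolean matches(String word) {
        for (String t : word.replace('-', ' ').toLowerCase().trim().split("\\s+")) {
            if (!postings.containsKey(t)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        FirstWordIndexSketch verbatim = new FirstWordIndexSketch(false);
        verbatim.add(1, "t-cell");
        FirstWordIndexSketch analyzedIndex = new FirstWordIndexSketch(true);
        analyzedIndex.add(1, "t-cell");
        System.out.println("verbatim index hit: " + verbatim.matches("t-cell"));
        System.out.println("analyzed index hit: " + analyzedIndex.matches("t-cell"));
    }
}
```

In this model the dash-to-space lookup only finds the entry when the index side was split the same way, which is the kind of index-time versus query-time mismatch described above.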

                