Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2017/05/15 15:22:04 UTC

[jira] [Comment Edited] (TIKA-2360) Handle SentimentParser resource failure more robustly

    [ https://issues.apache.org/jira/browse/TIKA-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010631#comment-16010631 ] 

Tim Allison edited comment on TIKA-2360 at 5/15/17 3:21 PM:
------------------------------------------------------------

My preference would be not to include the SentimentParser by default:

1) network calls that are not currently robustly handled

2) the .sent glob in mime detection, which could cause problems for users who happen to have files with that suffix. Yes, I can't imagine users have a bunch of Apple II files kicking around, but this is a mildly worrisome way of triggering the SentimentParser

3) while very cool, it is a fundamentally different thing than a parser.  It enriches already extracted UTF-8 text, kind of like the phone number handler, etc.  I realize NER does exactly the same thing...I know...

My proposal is that we treat the SentimentParser the same way we do NER.  Remove it from SPI, remove glob detection, swallow but log exceptions on initialization.
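
To make the "swallow but log exceptions on initialization" part concrete, here is a minimal sketch of the pattern. It is not Tika's actual code: the class name OptionalResource and its methods are hypothetical, and it uses java.util.logging only to stay self-contained. The idea is that a failed load (for example, a model download that cannot get through a proxy) is logged once and leaves the feature disabled instead of propagating out of the constructor.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical helper (not part of Tika): tries to load a resource once
 * and degrades gracefully if loading fails.
 */
class OptionalResource<T> {
    private static final Logger LOG =
            Logger.getLogger(OptionalResource.class.getName());

    // Stays null when initialization failed.
    private final T value;

    OptionalResource(String name, Callable<T> loader) {
        T loaded = null;
        try {
            loaded = loader.call();
        } catch (Exception e) {
            // Swallow but log: the caller keeps working without this resource.
            LOG.log(Level.WARNING,
                    "Could not initialize " + name + "; feature disabled", e);
        }
        this.value = loaded;
    }

    /** True only if the loader completed and returned a non-null value. */
    boolean isAvailable() {
        return value != null;
    }

    /** Returns the loaded resource, or null if initialization failed. */
    T get() {
        return value;
    }
}
```

A parser built this way would check isAvailable() before doing any work and simply skip enrichment when the resource is missing, so a blocked network call shows up as a warning in the log, not as a failed parse.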

[~chrismattmann] and others, any objections? 



> Handle SentimentParser resource failure more robustly
> -----------------------------------------------------
>
>                 Key: TIKA-2360
>                 URL: https://issues.apache.org/jira/browse/TIKA-2360
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Blocker
>
> The SentimentParser tests currently require a network call to GitHub.  For those working behind a proxy, or who would prefer that Tika not make unexpected network calls, can we please turn this off by default?


