Posted to issues@opennlp.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/01/24 16:55:00 UTC

[jira] [Commented] (OPENNLP-1267) Allow the LanguageDetector to stop before processing the full string

    [ https://issues.apache.org/jira/browse/OPENNLP-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023101#comment-17023101 ] 

ASF GitHub Bot commented on OPENNLP-1267:
-----------------------------------------

smarthi commented on pull request #357: OPENNLP-1267 -- add a ProbingLanguageDetector that can stop early.
URL: https://github.com/apache/opennlp/pull/357
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Allow the LanguageDetector to stop before processing the full string
> --------------------------------------------------------------------
>
>                 Key: OPENNLP-1267
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1267
>             Project: OpenNLP
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Major
>
> On TIKA-2790, I found that Yalder is stopping after computing character ngrams on roughly the first 60 characters.  That _likely_ explains its impressive speed.  Let's make this "stopping short" feature available in OpenNLP.
>  
> Ideally, the language detector wouldn't copy the full String, it wouldn't normalize the full String, and it wouldn't compute ngrams on the full String.
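
A minimal sketch of the "stopping short" idea described above, assuming the detector only needs character n-gram counts from a bounded prefix. The class name PrefixNgramSketch, the method countNgrams, and the maxChars parameter are hypothetical and are not part of the OpenNLP API; lower-casing stands in for whatever normalization the real detector applies. The input is read through CharSequence.charAt, so the full String is never copied or normalized, and n-grams are computed only for the characters actually visited.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not an OpenNLP class.
final class PrefixNgramSketch {

    /**
     * Counts character n-grams of length n over at most the first
     * maxChars characters of text, normalizing one char at a time.
     */
    static Map<String, Integer> countNgrams(CharSequence text, int n, int maxChars) {
        if (n <= 0 || maxChars < 0) {
            throw new IllegalArgumentException("n must be > 0 and maxChars >= 0");
        }
        // Stop early: never look past maxChars, however long the input is.
        int limit = Math.min(text.length(), maxChars);
        Map<String, Integer> counts = new HashMap<>();
        StringBuilder window = new StringBuilder(n);
        for (int i = 0; i < limit; i++) {
            // Stand-in normalization (assumption): lower-case each char lazily.
            window.append(Character.toLowerCase(text.charAt(i)));
            if (window.length() > n) {
                window.deleteCharAt(0); // slide the window forward by one char
            }
            if (window.length() == n) {
                counts.merge(window.toString(), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

With maxChars set to roughly 60, as observed for Yalder, the cost of this step is bounded by the prefix length rather than by the length of the document. Whether 60 characters is enough for accurate detection is a separate question that this sketch does not address.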



--
This message was sent by Atlassian Jira
(v8.3.4#803005)