
Posted to issues@opennlp.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/05/31 23:04:00 UTC

[jira] [Commented] (OPENNLP-1265) Improve speed of lang detect

    [ https://issues.apache.org/jira/browse/OPENNLP-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853462#comment-16853462 ] 

Tim Allison commented on OPENNLP-1265:
--------------------------------------

Baseline:
Input string: "estava em uma marcenaria na Rua Bruno " repeated 10,000 times
model: langdetect-183.bin
runs: 4 (results for the first run, a warm-up, are not shown)

Results (millis)
13366 : {por=50}
13608 : {por=50}
14035 : {por=50}
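The measurement procedure above (four runs, with the first discarded as JVM warm-up) can be sketched as a small timing harness. This is a minimal sketch, not the benchmark that produced these numbers. The class and method names are hypothetical. The workload is a placeholder that only counts spaces. The original benchmark called the language detector with langdetect-183.bin, and that call is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical timing harness: the first run is a JVM warm-up and is discarded. */
public class WarmupBenchmark {

    /**
     * Runs the workload {@code runs} times and returns the elapsed milliseconds
     * of every run except the first (warm-up) run.
     */
    static <T> List<Long> timeRuns(int runs, Supplier<T> workload) {
        List<Long> millis = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            T result = workload.get();
            long elapsed = (System.nanoTime() - start) / 1_000_000;
            if (i > 0) { // skip the warm-up run, as in the baseline
                millis.add(elapsed);
                System.out.println(elapsed + " : " + result);
            }
        }
        return millis;
    }

    public static void main(String[] args) {
        // Same input as the baseline: the phrase repeated 10,000 times (String.repeat needs Java 11+).
        String input = "estava em uma marcenaria na Rua Bruno ".repeat(10_000);
        // Placeholder workload (assumption): count spaces instead of calling a real detector.
        List<Long> times = timeRuns(4, () -> input.chars().filter(c -> c == ' ').count());
        System.out.println(times.size() + " timed runs");
    }
}
```

Discarding the first run keeps class loading and JIT compilation out of the reported times. A harness such as JMH would give more reliable numbers.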

If we switch to working with string-based n-grams instead of StringList, speed improves roughly 2x:
6087 : {por=50}
6202 : {por=50}
6146 : {por=50}

see: https://github.com/tballison/opennlp/blob/OPENNLP-1265/opennlp-tools/src/main/java/opennlp/tools/ngram/NGramModelSimplified.java
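The idea behind the change can be illustrated by counting character n-grams with plain String keys, so that no StringList wrapper is built for each n-gram. This is a minimal sketch under stated assumptions. It assumes the detector's features are character n-grams of a fixed length range, and the range 1..3 used below is only for illustration. The class and method names are hypothetical and are not taken from NGramModelSimplified.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: count character n-grams using String keys. */
public class StringNGrams {

    /**
     * Counts every character n-gram of length minLength..maxLength in text.
     * Each n-gram is a String created with substring(); no StringList or
     * other wrapper object is created per n-gram.
     */
    static Map<String, Integer> countNGrams(String text, int minLength, int maxLength) {
        Map<String, Integer> counts = new HashMap<>();
        for (int start = 0; start < text.length(); start++) {
            for (int n = minLength; n <= maxLength && start + n <= text.length(); n++) {
                counts.merge(text.substring(start, start + n), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countNGrams("rua bruno", 1, 3);
        // "ru" occurs twice: in "rua" and in "bruno"
        System.out.println(counts.get("ru")); // prints 2
    }
}
```

Each substring still allocates a String. The saving comes from skipping the extra wrapper object and its array for every n-gram, and from cheaper hashing and equality checks on plain String keys.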

> Improve speed of lang detect
> ----------------------------
>
>                 Key: OPENNLP-1265
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1265
>             Project: OpenNLP
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> Over on TIKA-2790, we found that OpenNLP's language detector is far, far slower than Optimaize and yalder.
> Let's use this ticket to see what we can do to improve lang detect's speed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)