Posted to dev@lucene.apache.org by "Steve Rowe (JIRA)" <ji...@apache.org> on 2018/01/17 03:38:00 UTC

[jira] [Comment Edited] (SOLR-11592) add another language detector using OpenNLP

    [ https://issues.apache.org/jira/browse/SOLR-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328236#comment-16328236 ] 

Steve Rowe edited comment on SOLR-11592 at 1/17/18 3:37 AM:
------------------------------------------------------------

[~koji], I've attached a modified version of your patch that I think is ready to go, including ref guide docs, a {{CHANGES.txt}} entry, and tests; tests and precommit pass for me.  If you have time I'd appreciate a review.

Notable changes from the previous version of the patch:
 * Added target {{train-test-models}} to the langid contrib's {{build.xml}}.  This downloads Leipzig corpora data files for five languages, extracts the data required for OpenNLP to train a model, then trains a test model.  The resulting model is included in the patch.
 * Added tests that use the test model.
 * Automatically convert from the 3-letter ISO 639-3 codes provided by the OpenNLP model into the corresponding 2-letter ISO 639-1 codes, to match the other two langid implementations.
 * Modified the update processor factory to interrogate the "invariants" and "defaults" config sections for the {{langid.model}} param.
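
For readers who want to picture the ISO 639-3 to ISO 639-1 conversion mentioned in the list above, here is a minimal sketch. It is not the code from the patch: the class name {{Iso639Converter}} and the method {{toIso6391}} are hypothetical, and it assumes that the JDK's own {{Locale}} data is an acceptable source for the mapping. Codes with no 2-letter equivalent are returned unchanged.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;

/** Hypothetical helper for illustration only; not the class used in the patch. */
final class Iso639Converter {
  // Reverse lookup table: 3-letter code -> 2-letter ISO 639-1 code.
  private static final Map<String, String> ISO3_TO_ISO2 = new HashMap<>();

  static {
    // Locale.getISOLanguages() lists the 2-letter codes known to the JDK.
    for (String iso2 : Locale.getISOLanguages()) {
      try {
        String iso3 = Locale.forLanguageTag(iso2).getISO3Language();
        if (!iso3.isEmpty()) {
          ISO3_TO_ISO2.putIfAbsent(iso3, iso2);
        }
      } catch (MissingResourceException e) {
        // No 3-letter code available for this language; skip it.
      }
    }
  }

  /** Returns the 2-letter code for a 3-letter code, or the input unchanged if none is known. */
  static String toIso6391(String code) {
    return ISO3_TO_ISO2.getOrDefault(code, code);
  }
}
```

The JDK reports ISO 639-2/T codes from {{getISO3Language()}}, which coincide with ISO 639-3 for individual languages that have a 2-letter code, so a table built this way covers the common cases; whether the patch handles codes without a 2-letter equivalent the same way is not stated here.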


was (Author: steve_rowe):
[~koji], I've attached a modified version of your patch that I think is ready to go, including ref guide docs, a {{CHANGES.txt}} entry, and tests; tests and precommit pass for me.  If you have time I'd appreciate a review.

Notable changes from the previous version of the patch:
 * Added {{test-train-models}} target to the langid contrib's {{build.xml}}.  This downloads Leipzig corpora data files for five languages, extracts the data required for OpenNLP to train a model, then trains a test model.  The resulting model is included in the patch.
 * Added tests that use the test model.
 * Automatically convert from the 3-letter ISO 639-3 codes provided by the OpenNLP model into the corresponding 2-letter ISO 639-1 codes, to match the other two langid implementations.
 * Modified the update processor factory to interrogate the "invariants" and "defaults" config sections for the {{langid.model}} param.

> add another language detector using OpenNLP
> -------------------------------------------
>
>                 Key: SOLR-11592
>                 URL: https://issues.apache.org/jira/browse/SOLR-11592
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: contrib - LangId
>    Affects Versions: 7.1
>            Reporter: Koji Sekiguchi
>            Priority: Minor
>         Attachments: SOLR-11592.patch, SOLR-11592.patch
>
>
> We already have two language detectors, lang-detect and Tika's lang detect. This ticket gives users a third option using OpenNLP. :)


