Posted to dev@lucene.apache.org by "Steve Rowe (JIRA)" <ji...@apache.org> on 2018/01/17 03:38:00 UTC
[jira] [Comment Edited] (SOLR-11592) add another language detector using OpenNLP
[ https://issues.apache.org/jira/browse/SOLR-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328236#comment-16328236 ]
Steve Rowe edited comment on SOLR-11592 at 1/17/18 3:37 AM:
------------------------------------------------------------
[~koji], I've attached a modified version of your patch that I think is ready to go, including ref guide docs, a {{CHANGES.txt}} entry, and tests; tests and precommit pass for me. If you have time I'd appreciate a review.
Notable changes from the previous version of the patch:
* Added target {{train-test-models}} to the langid contrib's {{build.xml}}. This downloads Leipzig corpora data files for five languages, extracts the data required for OpenNLP to train a model, then trains a test model. The resulting model is included in the patch.
* Added tests that use the test model.
* Automatically convert from the 3-letter ISO 639-3 codes provided by the OpenNLP model into the corresponding 2-letter ISO 639-1 codes, to match the other two langid implementations.
* Modified the update processor factory to interrogate the "invariants" and "defaults" config sections for the {{langid.model}} param.
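For illustration only, the 3-letter to 2-letter conversion mentioned in the list above could be sketched as follows. This is not the code from the patch: the class name {{IsoLangCodes}} and the method {{toTwoLetter}} are hypothetical, and the sketch assumes the JDK's {{java.util.Locale}} tables are an acceptable source for the mapping (they expose ISO 639-2/T codes, which match ISO 639-3 for most individual languages). Codes without a known 2-letter equivalent are passed through unchanged, which is also an assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;

/** Hypothetical helper; a sketch, not the implementation in the patch. */
final class IsoLangCodes {
  // Maps 3-letter language codes to 2-letter ISO 639-1 codes.
  private static final Map<String, String> THREE_TO_TWO = new HashMap<>();

  static {
    // Locale.getISOLanguages() returns the 2-letter codes known to the JDK.
    for (String two : Locale.getISOLanguages()) {
      try {
        String three = Locale.forLanguageTag(two).getISO3Language();
        if (!three.isEmpty()) {
          THREE_TO_TWO.put(three, two);
        }
      } catch (MissingResourceException e) {
        // The JDK has no 3-letter code for this language; skip it.
      }
    }
  }

  private IsoLangCodes() {}

  /** Returns the 2-letter code for a 3-letter code, or the input if none is known. */
  static String toTwoLetter(String code) {
    return THREE_TO_TWO.getOrDefault(code, code);
  }
}
```

With this sketch, {{IsoLangCodes.toTwoLetter("eng")}} yields {{en}}, so a detector that reports 3-letter codes would produce the same values as the other two langid implementations.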
was (Author: steve_rowe):
[~koji], I've attached a modified version of your patch that I think is ready to go, including ref guide docs, a {{CHANGES.txt}} entry, and tests; tests and precommit pass for me. If you have time I'd appreciate a review.
Notable changes from the previous version of the patch:
* Added {{test-train-models}} target to the langid contrib's {{build.xml}}. This downloads Leipzig corpora data files for five languages, extracts the data required for OpenNLP to train a model, then trains a test model. The resulting model is included in the patch.
* Added tests that use the test model.
* Automatically convert from the 3-letter ISO 639-3 codes provided by the OpenNLP model into the corresponding 2-letter ISO 639-1 codes, to match the other two langid implementations.
* Modified the update processor factory to interrogate the "invariants" and "defaults" config sections for the {{langid.model}} param.
> add another language detector using OpenNLP
> -------------------------------------------
>
> Key: SOLR-11592
> URL: https://issues.apache.org/jira/browse/SOLR-11592
> Project: Solr
> Issue Type: New Feature
> Security Level: Public (Default Security Level. Issues are Public)
> Components: contrib - LangId
> Affects Versions: 7.1
> Reporter: Koji Sekiguchi
> Priority: Minor
> Attachments: SOLR-11592.patch, SOLR-11592.patch
>
>
> We already have two language detectors, lang-detect and Tika's lang detect. This ticket gives users a third option using OpenNLP. :)