Posted to dev@tika.apache.org by "Ken Krugler (JIRA)" <ji...@apache.org> on 2015/09/03 19:26:46 UTC
[jira] [Commented] (TIKA-492) Add language identification support
for North Sami, Lule Sami and South Sami
[ https://issues.apache.org/jira/browse/TIKA-492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729427#comment-14729427 ]
Ken Krugler commented on TIKA-492:
----------------------------------
The language-detector library I'm currently integrating (see TIKA-1723) doesn't support any of the three Sami languages. I'd suggest opening an issue with that project (see https://github.com/optimaize/language-detector/). I'm going to close this issue unless somebody wants to (a) port the current built-in Tika detector to the new architecture, (b) follow up with Jan about getting training text, and (c) add the new profiles. I'll wait a few days before closing.
> Add language identification support for North Sami, Lule Sami and South Sami
> ----------------------------------------------------------------------------
>
> Key: TIKA-492
> URL: https://issues.apache.org/jira/browse/TIKA-492
> Project: Tika
> Issue Type: New Feature
> Components: languageidentifier
> Affects Versions: 0.7
> Reporter: Jan Høydahl
> Assignee: Ken Krugler
> Priority: Minor
>
> We need to add support for the Sami languages.
> According to the document "Requirements for support for Sami languages in data processing" (http://www.samit.no/01-850-51.pdf), Tika will get "Basic Level" support by detecting North Sami, Lule Sami and South Sami.