Posted to dev@tika.apache.org by "Ken Krugler (JIRA)" <ji...@apache.org> on 2015/09/03 19:26:46 UTC

[jira] [Commented] (TIKA-492) Add language identification support for North Sami, Lule Sami and South Sami

    [ https://issues.apache.org/jira/browse/TIKA-492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729427#comment-14729427 ] 

Ken Krugler commented on TIKA-492:
----------------------------------

Currently the language-detector library I'm integrating (see TIKA-1723) doesn't support any of the three Sami languages. I'd open an issue at that project (see https://github.com/optimaize/language-detector/). So I'm closing this issue, unless somebody wants to (a) port the current built-in Tika detector to the new architecture, (b) follow up with Jan about getting training text, and (c) add the new profiles. I'll wait a few days.

> Add language identification support for North Sami, Lule Sami and South Sami
> ----------------------------------------------------------------------------
>
>                 Key: TIKA-492
>                 URL: https://issues.apache.org/jira/browse/TIKA-492
>             Project: Tika
>          Issue Type: New Feature
>          Components: languageidentifier
>    Affects Versions: 0.7
>            Reporter: Jan Høydahl
>            Assignee: Ken Krugler
>            Priority: Minor
>
> We need added support for Sami languages.
> According to document "Requirements for support for Sami languages in data processing" (http://www.samit.no/01-850-51.pdf) Tika will get "Basic Level" support by detecting North Sami, Lule Sami and South Sami.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)