Posted to dev@tika.apache.org by "Jan Høydahl (JIRA)" <ji...@apache.org> on 2010/08/21 19:00:16 UTC

[jira] Updated: (TIKA-492) Add language identification support for North Sami, Lule Sami and South Sami

     [ https://issues.apache.org/jira/browse/TIKA-492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated TIKA-492:
-----------------------------

    Description: 
We need to add support for the Sami languages.

According to the document "Requirements for support for Sami languages in data processing" (http://www.samit.no/01-850-51.pdf), Tika will reach "Basic Level" support by detecting North Sami, Lule Sami and South Sami.

  was:
Currently there is one Norwegian language profile in Tika, "no". We need to distinguish between the two official Norwegian languages, which have the ISO 639-1 codes "nb" and "nn". Using these codes is recommended over the common "no" tag.

Proposed solution: remove the current language profile no.ngp and replace it with two new ones for nb and nn.

We must also add tests for Norwegian.


> Add language identification support for North Sami, Lule Sami and South Sami
> ----------------------------------------------------------------------------
>
>                 Key: TIKA-492
>                 URL: https://issues.apache.org/jira/browse/TIKA-492
>             Project: Tika
>          Issue Type: New Feature
>          Components: languageidentifier
>    Affects Versions: 0.7
>            Reporter: Jan Høydahl
>
> We need to add support for the Sami languages.
> According to the document "Requirements for support for Sami languages in data processing" (http://www.samit.no/01-850-51.pdf), Tika will reach "Basic Level" support by detecting North Sami, Lule Sami and South Sami.
