Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2021/04/09 19:13:00 UTC

[jira] [Comment Edited] (TIKA-3340) LanguageProfile for Myanmar

    [ https://issues.apache.org/jira/browse/TIKA-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318216#comment-17318216 ] 

Tim Allison edited comment on TIKA-3340 at 4/9/21, 7:12 PM:
------------------------------------------------------------

Tika's OpenNLP language detector will soon support 126 languages.  I'm rebuilding the common tokens lists for tika-eval, and I should be able to commit this to 2.x by the end of the afternoon.

{noformat}
afr
amh
ara
asm
ast
aze
azj
bak
ban
bel
ben
bih
bos
bre
bul
cat
ceb
ces
che
cmn
cym
dan
deu
div
ekk
ell
eng
epo
est
eus
fao
fas
fin
fra
fry
gle
glg
gom
gsw
guj
hat
heb
hin
hrv
hun
hye
ind
isl
ita
jav
jpn
kan
kas
kat
kaz
khm
kir
knn
kor
lao
lat
lav
lim
lit
ltz
lvs
mal
mar
mhr
min
mkd
mlt
mon
mri
msa
mya
nan
nds
nep
new
nld
nno
nob
oci
ori
pan
pes
plt
pnb
pol
por
pus
ron
rus
san
sin
slk
slv
som
spa
sqi
srp
sun
swa
swe
tam
tat
tel
tgk
tgl
tha
tuk
tur
uig
ukr
urd
uzb
vie
vol
war
xho
yid
zho-simp
zho-trad
zsm
zul
{noformat}



> LanguageProfile for Myanmar
> ---------------------------
>
>                 Key: TIKA-3340
>                 URL: https://issues.apache.org/jira/browse/TIKA-3340
>             Project: Tika
>          Issue Type: Improvement
>          Components: languageidentifier
>            Reporter: Arky
>            Priority: Major
>         Attachments: 20210401-model.report.txt, table-summarized-truncated.txt.gz
>
>
> A language profile for detecting Myanmar/Burmese (my).


