Posted to user@tika.apache.org by sai kumar <zs...@gmail.com> on 2019/12/17 14:38:17 UTC

Fwd: Inaccuracy in japanese language detection-reg

Hi,

I am using Tika language detection (the Optimaize language detector) to detect
languages.
Some languages other than English are not detected properly.
My content is about 30% English and 70% Japanese, but it is detected as
English instead of Japanese.
I tried setting mixedLanguages=true, but that didn't solve it.
Any suggestions on how to modify my input so that Tika detects
Japanese more reliably?
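
To make the question concrete, here is a minimal sketch of the kind of input pre-processing that could be tried before the text is handed to the detector. It is only an illustration and uses nothing from Tika: the class ScriptFilter and its methods japaneseRatio and stripLatin are hypothetical names, and the approach (measuring or filtering by Unicode script) is an assumption, not something the Tika documentation prescribes. Note also that Han characters are shared with Chinese, so a high ratio is a hint rather than proof of Japanese.

```java
import java.lang.Character.UnicodeScript;

// Hypothetical helper, not part of Tika: inspects or filters text by Unicode script.
final class ScriptFilter {

    // True for the scripts used to write Japanese (kana plus kanji).
    static boolean isJapaneseScript(int codePoint) {
        UnicodeScript script = UnicodeScript.of(codePoint);
        return script == UnicodeScript.HIRAGANA
                || script == UnicodeScript.KATAKANA
                || script == UnicodeScript.HAN;
    }

    // Fraction of letters in the text that belong to a Japanese script.
    // Returns 0.0 when the text contains no letters at all.
    static double japaneseRatio(String text) {
        int letters = 0;
        int japanese = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!Character.isLetter(cp)) {
                continue; // skip digits, spaces, punctuation
            }
            letters++;
            if (isJapaneseScript(cp)) {
                japanese++;
            }
        }
        return letters == 0 ? 0.0 : (double) japanese / letters;
    }

    // Removes Latin-script characters and keeps everything else unchanged,
    // so that embedded English words no longer reach the detector.
    static String stripLatin(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints()
            .filter(cp -> UnicodeScript.of(cp) != UnicodeScript.LATIN)
            .forEach(sb::appendCodePoint);
        return sb.toString();
    }
}
```

With something like this, the output of stripLatin(content) could be passed to the detector instead of the raw content, or japaneseRatio(content) could be checked first to see how much of the text is really in Japanese script. Whether either step actually changes the detection result is something I have not verified.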