Posted to dev@tika.apache.org by "Jan Høydahl (JIRA)" <ji...@apache.org> on 2011/06/07 09:35:59 UTC

[jira] [Commented] (TIKA-568) Language Detection isReasonablyCertain() hides valuable information

    [ https://issues.apache.org/jira/browse/TIKA-568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045311#comment-13045311 ] 

Jan Høydahl commented on TIKA-568:
----------------------------------

An intermediate step would perhaps be to add the getDistance() method, but mark it as experimental. That way we can start using it but still be aware that the actual value it returns may change in the future as the backend algorithm is improved.
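
To make the suggestion concrete, here is a rough sketch of what an experimental accessor could look like. It is not taken from the attached patch or from Tika's source. The class LanguageGuess, the @Experimental annotation, and the isCertainWithin() method are hypothetical names invented for illustration. The sketch also assumes that a smaller distance means a better match, and that 0.022 (the value quoted in the issue description) remains the default threshold.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker annotation (not part of Tika): flags API whose
// returned values may change as the detection algorithm evolves.
@Documented
@Retention(RetentionPolicy.SOURCE)
@Target(ElementType.METHOD)
@interface Experimental {}

// Hypothetical stand-in for a detection result; NOT Tika's LanguageIdentifier.
final class LanguageGuess {
    // Default threshold, using the value quoted in the issue description.
    static final double DEFAULT_THRESHOLD = 0.022;

    private final String language;
    private final double distance;

    LanguageGuess(String language, double distance) {
        this.language = language;
        this.distance = distance;
    }

    String getLanguage() {
        return language;
    }

    // Exposes the raw score. Callers should not persist or compare it
    // across versions, since its scale is assumed to be unstable.
    @Experimental
    double getDistance() {
        return distance;
    }

    // Existing behaviour: compare against the fixed default threshold.
    boolean isReasonablyCertain() {
        return isCertainWithin(DEFAULT_THRESHOLD);
    }

    // Lets the application choose its own threshold
    // (assumes a smaller distance means a better match).
    boolean isCertainWithin(double threshold) {
        return distance < threshold;
    }
}
```

With something like this, an application that tolerates weaker matches could call isCertainWithin(0.05) on a result, while existing callers of isReasonablyCertain() would see no change.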

> Language Detection isReasonablyCertain() hides valuable information
> -------------------------------------------------------------------
>
>                 Key: TIKA-568
>                 URL: https://issues.apache.org/jira/browse/TIKA-568
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Grant Ingersoll
>            Priority: Minor
>         Attachments: TIKA-568.patch
>
>
> LanguageIdentifier.isReasonablyCertain() hardcodes a threshold for language detection, which is fine, except applications should be allowed to decide what threshold suits them.  For instance, how was 0.022 decided?
