Posted to dev@tika.apache.org by "Felix Meschberger (JIRA)" <ji...@apache.org> on 2010/08/13 07:03:16 UTC

[jira] Commented: (TIKA-322) Improve encoding detection speed and accuracy

    [ https://issues.apache.org/jira/browse/TIKA-322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898081#action_12898081 ] 

Felix Meschberger commented on TIKA-322:
----------------------------------------

According to [1], the MPL is a Category B license, and works under such a license can be included in binary-only form.

[1] http://www.apache.org/legal/resolved.html#category-b

> Improve encoding detection speed and accuracy
> ---------------------------------------------
>
>                 Key: TIKA-322
>                 URL: https://issues.apache.org/jira/browse/TIKA-322
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>            Reporter: Jukka Zitting
>            Priority: Minor
>
> The encoding detection code we took from ICU4J is not very efficient and sometimes produces odd results when more than one encoding matches the given input data. It would be good to refactor the code to be faster for easy-to-detect encodings and to have better heuristics in case multiple matches are found.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Commented: (TIKA-322) Improve encoding detection speed and accuracy

Posted by Oleg Tikhonov <ol...@gmail.com>.
I support that, +1.

On Fri, Aug 13, 2010 at 8:03 AM, Felix Meschberger (JIRA) <ji...@apache.org> wrote:

>
>    [ https://issues.apache.org/jira/browse/TIKA-322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898081#action_12898081 ]
>
> Felix Meschberger commented on TIKA-322:
> ----------------------------------------
>
> According to [1], the MPL is a Category B license, and works under such a
> license can be included in binary-only form.
>
> [1] http://www.apache.org/legal/resolved.html#category-b


-- 
Best regards, Oleg.