Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2017/02/23 03:54:44 UTC

[jira] [Comment Edited] (TIKA-2273) Enable configuration of EncodingDetectors via TikaConfig

    [ https://issues.apache.org/jira/browse/TIKA-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879807#comment-15879807 ] 

Tim Allison edited comment on TIKA-2273 at 2/23/17 3:54 AM:
------------------------------------------------------------

First draft of a patch.  Not all tests are passing.

If anyone has a chance to review, I'd appreciate it!

Parsers that use AutoDetectReader have to get a TikaConfig from somewhere, and I don't much like this.

This could be inefficient: TXTParser, HtmlParser, and other parsers might build an entire TikaConfig on every parse.  I've mitigated this for those using AutoDetectParser by adding a TikaConfig to the ParseContext if the user hasn't already specified one.
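
To illustrate the mitigation described above, here is a minimal sketch of the "add a default only if none was supplied" pattern. It is not taken from the attached patch: `Config` and `Context` are hypothetical stand-ins for TikaConfig and ParseContext, and `ensureConfig` is an illustrative helper name, so the sketch runs without Tika on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

public class ContextConfigSketch {

    /** Hypothetical stand-in for TikaConfig; assume construction is expensive. */
    static final class Config {
        static int builds = 0; // counts constructions so reuse is observable
        final String label;

        Config(String label) {
            builds++;
            this.label = label;
        }
    }

    /** Hypothetical stand-in for ParseContext: values keyed by their class. */
    static final class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        <T> T get(Class<T> key) {
            // Class.cast passes null through, so a missing entry yields null.
            return key.cast(entries.get(key));
        }
    }

    /** Stores a default Config in the context only if the caller supplied none. */
    static Config ensureConfig(Context context) {
        Config config = context.get(Config.class);
        if (config == null) {
            config = new Config("default");
            context.set(Config.class, config);
        }
        return config;
    }

    public static void main(String[] args) {
        // Repeated lookups on one context reuse the same instance.
        Context context = new Context();
        Config first = ensureConfig(context);
        Config second = ensureConfig(context);
        System.out.println("same instance reused: " + (first == second));

        // A config set by the user is left untouched.
        Context custom = new Context();
        custom.set(Config.class, new Config("user"));
        System.out.println("user config kept: " + ensureConfig(custom).label);
    }
}
```

With this shape, a downstream parser that receives the same context can look the config up instead of building its own. A parser called directly, without the wrapper that populates the context, would still pay the construction cost, which is the case that remains open here.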

Are there better options?


was (Author: tallison@mitre.org):
First draft of a patch.  If anyone has a chance to review, I'd appreciate it!

Parsers that use the AutoDetectReader have to grab TikaConfig from somewhere...I don't much like this.

This could lead to inefficiencies of creating the entire TikaConfig at each parse for TXTParser and others.  I've mitigated this for those using AutoDetectParser by including a TikaConfig in the ParseContext if a user hasn't already specified one.

Are there better options?

> Enable configuration of EncodingDetectors via TikaConfig
> --------------------------------------------------------
>
>                 Key: TIKA-2273
>                 URL: https://issues.apache.org/jira/browse/TIKA-2273
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>         Attachments: TIKA_2273_first_draft.patch
>
>
> It would be nice to allow easier configuration of encoding detectors.  It should be straightforward to follow the example of detectors...(famous last words).


