Posted to dev@tika.apache.org by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2010/01/20 03:01:54 UTC
[jira] Created: (TIKA-366) Increase buffer size for mime type sniffing
Increase buffer size for mime type sniffing
-------------------------------------------
Key: TIKA-366
URL: https://issues.apache.org/jira/browse/TIKA-366
Project: Tika
Issue Type: Improvement
Components: mime
Affects Versions: 0.5
Environment: My local MacBook Pro laptop.
Reporter: Chris A. Mattmann
Assignee: Chris A. Mattmann
Fix For: 0.6
While working on TIKA-357, which addresses a similar problem in charset detection, I found that mime type identification suffers from the same general problem. Tika currently examines only the first MimeTypes#getMinLength() bytes of the input when matching magic headers to sniff the mime type. The example file attached by Ken Krugler shows that the current minimum length of 4 * 1024 bytes isn't enough. Extending it to 8K (8 * 1024 bytes) fixes detection for this file and should open up more opportunities for mime detection at little overhead cost.
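
To illustrate the effect of the buffer size, here is a minimal sketch in plain Java. It is not Tika's code: the class SniffPrefix, its methods, and the sample document (an "<html" marker placed after 5000 bytes of padding) are all hypothetical and are not taken from the attached file. It only shows that a marker located past the first 4 * 1024 bytes is invisible to a 4K prefix and visible to an 8K one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Tika.
final class SniffPrefix {

    // Reads up to maxLength bytes from the start of the stream, then
    // rewinds the stream so a later parser still sees the full content.
    public static byte[] readPrefix(InputStream in, int maxLength) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(maxLength);
        try {
            byte[] buffer = new byte[maxLength];
            int total = 0;
            int n;
            // A single read() may return fewer bytes than requested, so loop.
            while (total < maxLength
                    && (n = in.read(buffer, total, maxLength - total)) != -1) {
                total += n;
            }
            return Arrays.copyOf(buffer, total);
        } finally {
            in.reset();
        }
    }

    // Naive search for a byte pattern anywhere in the prefix.
    public static boolean contains(byte[] haystack, byte[] needle) {
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            boolean match = true;
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return true;
            }
        }
        return false;
    }

    // Assumed sample: 5000 bytes of whitespace followed by an "<html" marker.
    public static byte[] sampleDocument() {
        byte[] marker = "<html".getBytes(StandardCharsets.US_ASCII);
        byte[] doc = new byte[5000 + marker.length];
        Arrays.fill(doc, 0, 5000, (byte) ' ');
        System.arraycopy(marker, 0, doc, 5000, marker.length);
        return doc;
    }

    public static void main(String[] args) throws IOException {
        byte[] marker = "<html".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new ByteArrayInputStream(sampleDocument());
        System.out.println("4K prefix sees marker: "
                + contains(readPrefix(in, 4 * 1024), marker));
        System.out.println("8K prefix sees marker: "
                + contains(readPrefix(in, 8 * 1024), marker));
    }
}
```

The cost of the larger prefix is one extra 4K of buffered input per detection call, which is consistent with the "little overhead" noted above.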
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (TIKA-366) Increase buffer size for mime type sniffing
Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann resolved TIKA-366.
------------------------------------
Resolution: Fixed
- fixed in r901033