Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2016/12/22 01:32:58 UTC

[jira] [Commented] (TIKA-2224) Mime magic for OneNote formats

    [ https://issues.apache.org/jira/browse/TIKA-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768699#comment-15768699 ] 

Nick Burch commented on TIKA-2224:
----------------------------------

Mime magic has now been added for `.one` and `.onetoc`. `.onepkg` is actually just a CAB file containing other OneNote files, so we can't add magic for it; it can only be detected by opening the container.
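
For anyone following along, here is a rough sketch of what magic-based detection amounts to for these formats. It is not the actual Tika change (in Tika the magic lives in the mime types definition file, not in Java code). The two 16-byte signatures are my assumption of the header GUIDs from the published file format spec, written in on-disk byte order, and should be checked against the spec before being relied on. The class name, method names and returned labels are made up for illustration.

```java
import java.util.Arrays;

// Illustrative sketch only: hypothetical class, not part of Tika.
class OneNoteMagicSketch {
    // ASSUMPTION: header GUID {7B5C52E4-D88C-4DA7-AEB1-5378D02996D3} for .one,
    // as stored on disk (first three GUID fields are little-endian).
    static final byte[] ONE_MAGIC = hex("E4525C7B8CD8A74DAEB15378D02996D3");
    // ASSUMPTION: header GUID {43FF2FA1-EFD9-4C76-9EE2-10EA5722765F} for the
    // table-of-contents file, in the same on-disk byte order.
    static final byte[] ONETOC_MAGIC = hex("A12FFF43D9EF764C9EE210EA5722765F");
    // Cabinet files start with the ASCII bytes "MSCF".
    static final byte[] CAB_MAGIC = {'M', 'S', 'C', 'F'};

    // Convert a hex string (two characters per byte) to a byte array.
    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // True if data begins with the given magic bytes.
    static boolean startsWith(byte[] data, byte[] magic) {
        return data.length >= magic.length
                && Arrays.equals(Arrays.copyOf(data, magic.length), magic);
    }

    // Classify a file from its leading bytes. The labels are illustrative,
    // not the mime type names registered in Tika.
    static String detect(byte[] head) {
        if (startsWith(head, ONE_MAGIC)) {
            return "onenote-section";
        }
        if (startsWith(head, ONETOC_MAGIC)) {
            return "onenote-toc";
        }
        if (startsWith(head, CAB_MAGIC)) {
            // Magic alone cannot tell a .onepkg from any other cabinet file;
            // that needs the container to be opened and its entries inspected.
            return "cab-container";
        }
        return "unknown";
    }
}
```

Note how the `.onepkg` case falls out of this: the leading bytes only say "this is a cabinet file", which is why it needs container-aware detection.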

There are no unit tests yet, so I'm leaving this open until we get some small sample files we can use, hopefully from the original poster on StackOverflow!

> Mime magic for OneNote formats
> ------------------------------
>
>                 Key: TIKA-2224
>                 URL: https://issues.apache.org/jira/browse/TIKA-2224
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>    Affects Versions: 1.14
>            Reporter: Nick Burch
>
> As raised at http://stackoverflow.com/questions/41272195/onenote-support-for-apache-tika-parsers, we don't have any magic for the OneNote formats. Several years ago we dug out the file format specs (see http://lucene.472066.n3.nabble.com/Tika-OneNote-Support-td4020393.html), but didn't have the volunteer energy to implement a parser. However, armed with those specs, we should be able to come up with some mime magic for detection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)