Posted to dev@tika.apache.org by "Nicholas DiPiazza (JIRA)" <ji...@apache.org> on 2019/01/15 03:35:00 UTC
[jira] [Comment Edited] (TIKA-2224) Mime magic for OneNote formats
[ https://issues.apache.org/jira/browse/TIKA-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742717#comment-16742717 ]
Nicholas DiPiazza edited comment on TIKA-2224 at 1/15/19 3:34 AM:
------------------------------------------------------------------
Where are we at with this?
There is already a OneNote -> JSON parser implemented in C++: https://github.com/dropbox/onenote-parser
We could port this code to Java without much fuss.
Has anyone done this already?
[^Sample1.one] --> [^Sample1.json]
was (Author: ndipiazza_gmail):
Where are we at with this?
There is already a onenote parser implemented in c++ https://github.com/dropbox/onenote-parser
We could convert this code into java without much fuss.
Has anyone done this already?
> Mime magic for OneNote formats
> ------------------------------
>
> Key: TIKA-2224
> URL: https://issues.apache.org/jira/browse/TIKA-2224
> Project: Tika
> Issue Type: Improvement
> Components: mime
> Affects Versions: 1.14
> Reporter: Nick Burch
> Priority: Major
> Attachments: Sample1.json, Sample1.one, note-ssn-test-mmmm.one
>
>
> As raised at http://stackoverflow.com/questions/41272195/onenote-support-for-apache-tika-parsers, we don't have any magic for the OneNote formats. Several years ago we dug out the file format specs (see http://lucene.472066.n3.nabble.com/Tika-OneNote-Support-td4020393.html), but didn't have the volunteer energy to implement a parser. However, armed with those specs, we should be able to come up with some mime magic for detection.