Posted to dev@tika.apache.org by "Nicholas DiPiazza (Jira)" <ji...@apache.org> on 2021/06/15 17:11:00 UTC

[jira] [Created] (TIKA-3446) OneNote - look into adding support for OneNote 365 documents

Nicholas DiPiazza created TIKA-3446:
---------------------------------------

             Summary: OneNote - look into adding support for OneNote 365 documents
                 Key: TIKA-3446
                 URL: https://issues.apache.org/jira/browse/TIKA-3446
             Project: Tika
          Issue Type: New Feature
          Components: parser
    Affects Versions: 1.27
            Reporter: Nicholas DiPiazza
            Assignee: Nicholas DiPiazza


While parsing a batch of OneNote documents, I investigated a slew of them that did not seem to parse very well.

When I dug deeper, I found that these documents had been generated by SharePoint Online.

I had hoped that OneNote documents generated by SharePoint Online (OneNote 365) would be the same as on-premises OneNote documents from 2016, 2019, etc.

But it turns out this is NOT the case.

I checked the Microsoft MS-ONESTORE specification and found that these documents do not match the published specification.

I opened a community post: [Looking for the MS spec for OneNote 365 version - Microsoft Q&A|https://docs.microsoft.com/en-us/answers/questions/436943/looking-for-the-ms-spec-for-onenote-365-version-1.html]

I also opened an internal ticket with Microsoft.

They will respond soon with an analysis of my issue, and we'll see if there is anything we can do.


