Posted to dev@tika.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/03/25 02:09:00 UTC

[jira] [Commented] (TIKA-3077) OneNote parser - very inefficient when parsing OneNote <= 2007 files

    [ https://issues.apache.org/jira/browse/TIKA-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066317#comment-17066317 ] 

ASF GitHub Bot commented on TIKA-3077:
--------------------------------------

nddipiazza commented on pull request #314: address TIKA-3077 - very slow parsing performance on OneNote <= 2007 docs.
URL: https://github.com/apache/tika/pull/314
 
 
   The OneNote 2007 code I wrote did not account for the fact that the direct file resource utility has no byte buffer. Because the parser sets the stream position over and over while reading bytes from OneNote 2007 files, every one of those calls hit the underlying file, which made parsing extremely inefficient.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> OneNote parser - very inefficient when parsing OneNote <= 2007 files
> --------------------------------------------------------------------
>
>                 Key: TIKA-3077
>                 URL: https://issues.apache.org/jira/browse/TIKA-3077
>             Project: Tika
>          Issue Type: Improvement
>          Components: core
>            Reporter: Nicholas DiPiazza
>            Priority: Major
>
> The code I put in place for OneNote 2007 files is horribly inefficient. I hadn't realized that the OneNoteDirectFileResource that I extracted from another parser was not buffering the bytes, so every set-position call was very expensive.
> The fix is to buffer the bytes into chunks and operate on them instead.
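
To illustrate the idea of buffering into chunks, here is a minimal sketch in Java. It is not the code from the pull request: `ChunkedFileReader`, its `chunkLoads` counter, and the chunk size are hypothetical names and values chosen for this example. The assumption is that setting the position should only update a field, and that the file is touched only when a read falls outside the chunk currently held in memory.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper (not Tika's class): serves positioned reads from an in-memory chunk. */
class ChunkedFileReader implements AutoCloseable {
    private final FileChannel channel;
    private final byte[] chunk;
    private long chunkStart = 0;  // file offset of chunk[0]
    private int chunkLength = 0;  // number of valid bytes in chunk
    private long position = 0;    // logical read position
    int chunkLoads = 0;           // counts real file reads, for illustration only

    ChunkedFileReader(Path path, int chunkSize) throws IOException {
        this.channel = FileChannel.open(path, StandardOpenOption.READ);
        this.chunk = new byte[chunkSize];
    }

    /** Setting the position is only a field update; no I/O happens here. */
    void position(long newPosition) {
        position = newPosition;
    }

    /** Returns the next byte as 0..255, or -1 at end of file. */
    int read() throws IOException {
        boolean outsideChunk = position < chunkStart || position >= chunkStart + chunkLength;
        if (outsideChunk && !loadChunk(position)) {
            return -1;
        }
        return chunk[(int) (position++ - chunkStart)] & 0xFF;
    }

    /** Fills the chunk starting at the given file offset; returns false if nothing was read. */
    private boolean loadChunk(long start) throws IOException {
        ByteBuffer buffer = ByteBuffer.wrap(chunk);
        long offset = start;
        int n;
        while (buffer.hasRemaining() && (n = channel.read(buffer, offset)) > 0) {
            offset += n;
        }
        chunkLoads++;
        chunkStart = start;
        chunkLength = buffer.position();
        return chunkLength > 0;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}

class ChunkedFileReaderDemo {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("chunked", ".bin");
        byte[] data = new byte[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        Files.write(file, data);
        try (ChunkedFileReader reader = new ChunkedFileReader(file, 256)) {
            reader.position(10);
            int a = reader.read();
            reader.position(200);  // still inside the chunk loaded at offset 10
            int b = reader.read();
            System.out.println(a + " " + b + " loads=" + reader.chunkLoads);  // prints "10 200 loads=1"
        } finally {
            Files.delete(file);
        }
    }
}
```

In this sketch, two reads at different positions cost a single file access because the second position falls inside the chunk that was already loaded. A real parser would likely also need bulk reads and a chunk size tuned to the file layout.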



--
This message was sent by Atlassian Jira
(v8.3.4#803005)