Posted to dev@pdfbox.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2018/11/07 22:07:00 UTC

[jira] [Commented] (PDFBOX-4370) Jempbox's ResourceEvent crazily slow to initialize

    [ https://issues.apache.org/jira/browse/PDFBOX-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678851#comment-16678851 ] 

Tim Allison commented on PDFBOX-4370:
-------------------------------------

There are two other files in our new corpus that trigger this.  I'm OK with a "won't fix" unless the solution is fairly easy, and there's no rush if this is fixable. :)  Thank you!

> Jempbox's ResourceEvent crazily slow to initialize
> --------------------------------------------------
>
>                 Key: PDFBOX-4370
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4370
>             Project: PDFBox
>          Issue Type: Task
>          Components: JempBox
>    Affects Versions: 1.8.16
>            Reporter: Tim Allison
>            Priority: Trivial
>         Attachments: slow.zip
>
>
> In our new batch of regression files on Tika, one of the new PDFs caused a timeout.  This is not an infinite loop, but it does take several minutes. This may not be fixable.
> Admittedly, the XMP is large, and there are quite a few events.
> This is the code that triggers the problem.
> {noformat}
>             XMPMetadata xmp = XMPMetadata.load(is);
>             XMPSchemaMediaManagement mmSchema = xmp.getMediaManagementSchema();
>             mmSchema.getHistory();
> {noformat}
> The slow part _seems_ to be setting the attribute namespace when creating a new ResourceEvent.  When I comment out the following in ResourceEvent's initializer, the processing time is quite fast (1 second).
> {noformat}
>             parent.setAttributeNS( 
>                 XMPSchema.NS_NAMESPACE, 
>                 "xmlns:stEvt", 
>                 NAMESPACE );
> {noformat}
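
To check whether the namespace declaration itself is the expensive step, the call can be timed in isolation against a plain JDK DOM element, with no Jempbox on the classpath. The following is only a rough sketch under stated assumptions: the class SetAttributeNsTiming and its helpers newParent, declareOnce and timeDeclarations are hypothetical names, the two constants are assumed to match XMPSchema.NS_NAMESPACE and ResourceEvent.NAMESPACE, and the "declare only if missing" guard is an untested idea, not the project's fix. It may not reproduce the slowdown seen with the attached files, since that could depend on the DOM implementation and on the shape of the XMP.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class SetAttributeNsTiming {
    // Assumed to match XMPSchema.NS_NAMESPACE and ResourceEvent.NAMESPACE in Jempbox.
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";
    static final String ST_EVT_NS = "http://ns.adobe.com/xap/1.0/sType/ResourceEvent#";

    /** Creates a stand-in parent element holding the given number of child elements. */
    static Element newParent(int children) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element parent = doc.createElementNS(null, "parent");
            doc.appendChild(parent);
            for (int i = 0; i < children; i++) {
                parent.appendChild(doc.createElementNS(ST_EVT_NS, "stEvt:action"));
            }
            return parent;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    /** Hypothetical guard: declares xmlns:stEvt only if it is not already set to the expected URI. */
    static boolean declareOnce(Element parent) {
        if (ST_EVT_NS.equals(parent.getAttributeNS(XMLNS_NS, "stEvt"))) {
            return false; // already declared, nothing to do
        }
        parent.setAttributeNS(XMLNS_NS, "xmlns:stEvt", ST_EVT_NS);
        return true;
    }

    /** Returns the elapsed nanoseconds for issuing the declaration the given number of times. */
    static long timeDeclarations(Element parent, int repeats, boolean guarded) {
        long start = System.nanoTime();
        for (int i = 0; i < repeats; i++) {
            if (guarded) {
                declareOnce(parent);
            } else {
                // Same call as in the ResourceEvent snippet quoted above.
                parent.setAttributeNS(XMLNS_NS, "xmlns:stEvt", ST_EVT_NS);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int repeats = 100_000;
        long plain = timeDeclarations(newParent(1_000), repeats, false);
        long guarded = timeDeclarations(newParent(1_000), repeats, true);
        System.out.println("unguarded setAttributeNS: " + plain / 1_000_000 + " ms");
        System.out.println("guarded declareOnce:      " + guarded / 1_000_000 + " ms");
    }
}
```

If the unguarded loop is not noticeably slow here, the cost is more likely tied to the specific DOM tree built from the attached XMP than to setAttributeNS in general, and profiling the original three-line reproduction would be the more reliable next step.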


