Posted to dev@pdfbox.apache.org by "Valdis Andersons (JIRA)" <ji...@apache.org> on 2014/07/16 11:15:06 UTC

[jira] [Updated] (PDFBOX-2212) OutOfMemoryError in GlyfCompositeDescrip

     [ https://issues.apache.org/jira/browse/PDFBOX-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Valdis Andersons updated PDFBOX-2212:
-------------------------------------

    Attachment: adobe_error1.jpg
                adobe_error2.jpg

Attached are screenshots of the errors Adobe Reader reports for the same corrupted file.

> OutOfMemoryError in GlyfCompositeDescrip
> ----------------------------------------
>
>                 Key: PDFBOX-2212
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2212
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox, Preflight
>    Affects Versions: 1.8.6
>         Environment: Windows 7, JDK6
>            Reporter: Valdis Andersons
>         Attachments: adobe_error1.jpg, adobe_error2.jpg
>
>
> Hi All,
>  
> The application I’m working on is a web service that accepts PDF documents and combines them into a single larger PDF: the client submits a batch of PDFs and we create one PDF out of them. In some rare cases one of the submitted PDF documents has a glitch in it that causes Adobe Reader to throw errors when viewing the final document (screenshots attached).
> When I tried to check the buggy PDF with the approach outlined here:
>  
> https://pdfbox.apache.org/cookbook/pdfavalidation.html
>  
> I got an OutOfMemoryError in the GlyfCompositeDescript class. Here is the full stack trace:
>  
> java.lang.OutOfMemoryError: Java heap space
>                 at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:58)
>                 at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:62)
>                 at org.apache.fontbox.ttf.GlyphTable.initData(GlyphTable.java:69)
>                 at org.apache.fontbox.ttf.TrueTypeFont.initializeTable(TrueTypeFont.java:280)
>                 at org.apache.fontbox.ttf.AbstractTTFParser.parseTables(AbstractTTFParser.java:128)
>                 at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:80)
>                 at org.apache.fontbox.ttf.AbstractTTFParser.parseTTF(AbstractTTFParser.java:109)
>                 at org.apache.fontbox.ttf.TTFParser.parseTTF(TTFParser.java:25)
>                 at org.apache.fontbox.ttf.AbstractTTFParser.parseTTF(AbstractTTFParser.java:84)
>                 at org.apache.fontbox.ttf.TTFParser.parseTTF(TTFParser.java:25)
>                 at org.apache.pdfbox.preflight.font.descriptor.TrueTypeDescriptorHelper.processFontFile(TrueTypeDescriptorHelper.java:84)
>                 at org.apache.pdfbox.preflight.font.descriptor.FontDescriptorHelper.validate(FontDescriptorHelper.java:97)
>                 at org.apache.pdfbox.preflight.font.SimpleFontValidator.processFontDescriptorValidation(SimpleFontValidator.java:82)
>                 at org.apache.pdfbox.preflight.font.SimpleFontValidator.validate(SimpleFontValidator.java:55)
>                 at org.apache.pdfbox.preflight.process.reflect.FontValidationProcess.validate(FontValidationProcess.java:69)
>                 at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
>                 at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
>                 at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validateFonts(ResourcesValidationProcess.java:96)
>                 at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validate(ResourcesValidationProcess.java:74)
>                 at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
>                 at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
>                 at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validateXObjectResources(XObjFormValidator.java:178)
>                 at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validate(XObjFormValidator.java:75)
>                 at org.apache.pdfbox.preflight.process.reflect.GraphicObjectPageValidationProcess.validate(GraphicObjectPageValidationProcess.java:77)
>                 at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
>                 at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
>                 at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validateXObjects(ResourcesValidationProcess.java:191)
>                 at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validate(ResourcesValidationProcess.java:78)
>                 at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
>                 at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
>                 at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validateXObjectResources(XObjFormValidator.java:178)
>                 at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validate(XObjFormValidator.java:75)
>  
> While I can’t send on the PDF in question because of the sensitivity of its contents, I did a bit of digging and debugging to find out why this is happening.
> In the constructor of the GlyfCompositeDescript class there is a do … while loop that constructs GlyfCompositeComp objects and adds them to the components list of GlyfCompositeDescript. The GlyfCompositeComp constructor reads a signed short from the TTFDataStream into its flags field, and the GlyfCompositeDescript constructor then uses that field to check whether there are more components to read. Here is the code in question:
>  
> public GlyfCompositeDescript(TTFDataStream bais, GlyphTable glyphTable) throws IOException
> {
>     …
>     do
>     {
>         comp = new GlyfCompositeComp(bais); // this is where the OutOfMemoryError happens
>         components.add(comp);
>     } while ((comp.getFlags() & GlyfCompositeComp.MORE_COMPONENTS) != 0); // the flags are used here to check whether more components follow
>     …
> }
>  
> protected GlyfCompositeComp(TTFDataStream bais) throws IOException
> {
>     flags = bais.readSignedShort();
>     …
> }
>  
> In the case of the corrupted PDF that we get from time to time, the bais.readSignedShort() call in GlyfCompositeComp starts returning -1. A 16-bit -1 has every bit set, so from then on the loop condition in the GlyfCompositeDescript constructor, comp.getFlags() & GlyfCompositeComp.MORE_COMPONENTS, always evaluates to 32 (!= 0). The loop therefore never terminates and keeps constructing GlyfCompositeComp objects until the memory runs out.
>  
> My main question: has anyone encountered a PDF corruption that causes this behaviour, and how would one check a PDF document for this sort of corruption without causing the application to run out of memory?
>  
> We’re not required to fix the document, just check if it’s valid. If it’s not valid then we just reject the document. Ideally I’d also like to know what the corruption could be so that I can at least give a hint to the client software as to what is causing this document to be rejected (I do understand that without the actual PDF that’s causing this it might be impossible to tell that).
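
The arithmetic described in the report, and one possible way to bound such a loop, can be sketched as follows. This is only an illustration under stated assumptions, not FontBox code and not a proposed patch: the class CompositeFlagsSketch, the cap MAX_COMPONENTS and the method countFlagWords are hypothetical names, each component is reduced to a bare 16-bit flags word, and java.io.DataInputStream stands in for TTFDataStream because its readShort() throws EOFException at end of data. The flag value 0x0020 is taken from the result of 32 quoted in the report.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical illustration; not part of PDFBox or FontBox. */
final class CompositeFlagsSketch {
    // Flag bit tested by the loop; 0x0020 == 32, the value quoted in the report.
    static final int MORE_COMPONENTS = 0x0020;
    // Arbitrary illustrative cap (an assumption, not a FontBox setting).
    static final int MAX_COMPONENTS = 1024;

    /** True when the flags word asks the parser to read another component. */
    static boolean hasMoreComponents(short flags) {
        return (flags & MORE_COMPONENTS) != 0;
    }

    /**
     * Counts consecutive flags words until MORE_COMPONENTS is cleared.
     * Simplification: a real component also carries a glyph index,
     * arguments and an optional transform, which are ignored here.
     */
    static int countFlagWords(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int count = 0;
        short flags;
        do {
            if (count >= MAX_COMPONENTS) {
                throw new IOException("more than " + MAX_COMPONENTS
                        + " components; font data is probably corrupt");
            }
            // Throws EOFException when the data runs out instead of yielding -1.
            flags = in.readShort();
            count++;
        } while (hasMoreComponents(flags));
        return count;
    }

    public static void main(String[] args) throws IOException {
        short corrupt = -1; // 0xFFFF: every bit set, including MORE_COMPONENTS
        System.out.println(corrupt & MORE_COMPONENTS); // prints 32
        // Two well-formed flags words: the first sets the bit, the second clears it.
        System.out.println(countFlagWords(new byte[] {0x00, 0x20, 0x00, 0x00})); // prints 2
    }
}
```

With either guard in place (a component cap or an end-of-data exception), a stream that yields nothing but 0xFFFF ends in an IOException that the caller can turn into a "reject this document" result, rather than in an OutOfMemoryError.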


