Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2010/03/28 16:21:27 UTC

[jira] Resolved: (PDFBOX-574) PDFBox image extraction fails with an ArrayOutOfBoundsException in PDPixelMap.getRGBImage()

     [ https://issues.apache.org/jira/browse/PDFBOX-574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-574.
---------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.2.0

I've applied Ian's patch in revision 928402.

Thanks for the contribution!

> PDFBox image extraction fails with an ArrayOutOfBoundsException in PDPixelMap.getRGBImage()
> -------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-574
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-574
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 0.8.0-incubator
>         Environment: Java
>            Reporter: Ian Kaplan
>            Assignee: Andreas Lehmkühler
>             Fix For: 1.2.0
>
>         Attachments: omv_overview.pdf
>
>
> The project that I'm working on has been using PDFBox for both text extraction and image extraction from PDF documents.  We wrote a class, PDFImageStripper, which extends PDFStreamEngine:
> public class PDFImageStripper extends PDFStreamEngine {
>     // Field declarations and the other methods (processOperator, saveImage) are omitted here.
>
>     public List<ExtractedImage> getImages(PDDocument document, String documentFilename, File targetDirectory) throws IOException {
>         resetEngine();
>
>         this.document = document;
>         this.documentFilename = documentFilename;
>         this.targetDirectory = targetDirectory;
>
>         currentImageNumber = 1;
>
>         images.clear();
>         writeImages();
>         return images;
>     }
>
>     private void writeImages() throws IOException {
>         List<PDPage> pages = (List<PDPage>) document.getDocumentCatalog().getAllPages();
>         for (PDPage page : pages) {
>             if (page != null) {
>                 processStream(page, page.findResources(), page.getContents().getStream());
>             }
>         }
>     }
> }
> The call chain is shown below:
> None.decode(byte[], byte[]) line: 57	
> PDPixelMap.getRGBImage() line: 182	
> PDPixelMap.write2OutputStream(OutputStream) line: 209	
> PDPixelMap(PDXObjectImage).write2file(File) line: 142	
> PDFImageStripper.saveImage(PDXObjectImage, String, File) line: 208	
> PDFImageStripper.processOperator(PDFOperator, List) line: 155	
> PDFImageStripper(PDFStreamEngine).processSubStream(PDPage, PDResources, COSStream) line: 229	
> PDFImageStripper(PDFStreamEngine).processStream(PDPage, PDResources, COSStream) line: 188	
> PDFImageStripper.writeImages() line: 113	
> An ArrayIndexOutOfBoundsException is thrown in the decode method. The decode method is nothing more than a wrapper for a call to System.arraycopy():
>     public void decode(byte[] src, byte[] dest)
>     {
>         System.arraycopy(src,0,dest,0,src.length);
>     }
> The problem is that the source array is larger than the destination array. This is shown (from the Eclipse debugger) below:
> src	byte[455112]  (id=171)	
> dest	byte[435456]  (id=175)	
> The code that seems to be causing the problem is shown below. The bug shows up on the LZW_DECODE branch. Note that the other branch guards against a size mismatch by copying no more than the shorter of the two array lengths.
>             if( predictor < 10 ||
>                 filters == null || !(filters.contains( COSName.LZW_DECODE.getName()) ||
>                          filters.contains( COSName.FLATE_DECODE.getName()) ) )
>             {
>                 PredictorAlgorithm filter = PredictorAlgorithm.getFilter(predictor);
>                 filter.setWidth(width);
>                 filter.setHeight(height);
>                 filter.setBpp((bpc * 3) / 8);
>                 filter.decode(array, bufferData);
>             }
>             else
>             {
>                 System.arraycopy( array, 0,bufferData, 0, 
>                         (array.length<bufferData.length?array.length: bufferData.length) );
>             }
> One fix may be to simply change the code as follows (again, recall that the "decode" method is nothing but a wrapper for System.arraycopy()):
>             if( predictor < 10 ||
>                 filters == null || !(filters.contains( COSName.LZW_DECODE.getName()) ||
>                          filters.contains( COSName.FLATE_DECODE.getName()) ) )
>             {
>                 PredictorAlgorithm filter = PredictorAlgorithm.getFilter(predictor);
>                 filter.setWidth(width);
>                 filter.setHeight(height);
>                 filter.setBpp((bpc * 3) / 8);
>             }
>             System.arraycopy( array, 0,bufferData, 0, 
>                     (array.length<bufferData.length?array.length: bufferData.length) );
> If Jira allows me to attach a file that causes this problem I will do so.
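
The size mismatch described in the report can be reproduced outside PDFBox. The following is a minimal sketch, not the PDFBox code itself: the class name BoundedCopyDemo and the methods unboundedCopy and boundedCopy are hypothetical names chosen for illustration, and the array sizes are the ones reported from the debugger. It contrasts a copy of src.length bytes with a copy limited to the shorter of the two arrays.

```java
import java.util.Arrays;

public class BoundedCopyDemo {

    // Mirrors the wrapper quoted in the report: always copies src.length bytes.
    static void unboundedCopy(byte[] src, byte[] dest) {
        System.arraycopy(src, 0, dest, 0, src.length);
    }

    // Copies only as many bytes as fit in dest and returns that count.
    static int boundedCopy(byte[] src, byte[] dest) {
        int n = Math.min(src.length, dest.length);
        System.arraycopy(src, 0, dest, 0, n);
        return n;
    }

    public static void main(String[] args) {
        byte[] src = new byte[455112];   // source size seen in the debugger
        byte[] dest = new byte[435456];  // destination size seen in the debugger
        Arrays.fill(src, (byte) 1);

        try {
            unboundedCopy(src, dest);
            System.out.println("unbounded copy succeeded");
        } catch (IndexOutOfBoundsException e) {
            // System.arraycopy rejects a length that overruns dest.
            System.out.println("unbounded copy failed: " + e);
        }

        int copied = boundedCopy(src, dest);
        System.out.println("bounded copy wrote " + copied + " bytes");
    }
}
```

With these sizes the bounded copy drops the last 19656 bytes of the source. It avoids the exception, but it may also hide a disagreement between the decoded stream length and the expected image size, so the truncation is worth keeping in mind when reviewing the fix.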
