Posted to dev@pdfbox.apache.org by "susheel (Commented) (JIRA)" <ji...@apache.org> on 2011/11/14 11:18:52 UTC

[jira] [Commented] (PDFBOX-1169) Images extracted from PDF are loosing color (are shown in blackcolor)

    [ https://issues.apache.org/jira/browse/PDFBOX-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149529#comment-13149529 ] 

susheel commented on PDFBOX-1169:
---------------------------------

Code used to extract the images:

private void processImages(PDResources resources, String destinationFolder) throws IOException {
    // imageCounter is presumably an int field of the enclosing class (declaration not shown)
    Map images = resources.getImages();

    if (images != null) {
        Iterator imageIter = images.keySet().iterator();
        while (imageIter.hasNext()) {
            String key = (String) imageIter.next();
            PDXObjectImage image = (PDXObjectImage) images.get(key);
            String name = destinationFolder + "image-" + imageCounter++ + "." + image.getSuffix();

            // image.write2file(name) was also tried, but the extracted images looked the same
            BufferedImage bufferedImage = image.getRGBImage();
            File outputfile = new File(name);
            ImageIO.write(bufferedImage, image.getSuffix(), outputfile);
            System.out.println("szaveri - using imageio to write files " + name + " suffix =" + image.getSuffix());
        }
    }
}
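
One thing the code above does not check is the boolean returned by ImageIO.write, which is false when no registered writer accepts the image for the requested format. The following is only a minimal sketch using the JDK alone; the class name CheckedImageWrite and the method writeChecked are made up for illustration and are not part of PDFBox. It would not explain the black backgrounds by itself, but it rules out silent write failures.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of PDFBox or of the original code.
final class CheckedImageWrite {

    /** Writes the image and fails loudly if no ImageIO writer accepted it. */
    static void writeChecked(BufferedImage image, String format, File target) throws IOException {
        // ImageIO.write returns false when no appropriate writer is found
        boolean written = ImageIO.write(image, format, target);
        if (!written) {
            throw new IOException("No ImageIO writer for format '" + format
                    + "' accepted image type " + image.getType());
        }
    }

    public static void main(String[] args) throws IOException {
        BufferedImage img = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        File out = File.createTempFile("image-", ".png");
        writeChecked(img, "png", out);
        System.out.println("wrote " + out.length() + " bytes to " + out);
    }
}
```

In processImages, the ImageIO.write call could be swapped for writeChecked(bufferedImage, image.getSuffix(), outputfile) to see whether any of the images are rejected.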


Please note that out of the 200-odd images in the PDF, only two were extracted correctly; all the rest came out with a black background.
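
The cause of the black backgrounds is not established in this report. One possibility, and it is only an assumption, is that the affected images carry transparency that is dropped when they are converted to opaque RGB, which turns fully transparent pixels black. A minimal JDK-only sketch of compositing such an image onto white before writing it is shown here; AlphaFlattener and flattenOnWhite are hypothetical names, not PDFBox API.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of PDFBox or of the original code.
final class AlphaFlattener {

    /** Returns an opaque RGB copy of the source, composited over a white background. */
    static BufferedImage flattenOnWhite(BufferedImage source) {
        BufferedImage flat = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = flat.createGraphics();
        try {
            // Paint the background first, then draw the source over it
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, flat.getWidth(), flat.getHeight());
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return flat;
    }

    public static void main(String[] args) {
        // A fully transparent test image stands in for an extracted image with alpha
        BufferedImage transparent = new BufferedImage(2, 2, BufferedImage.TYPE_INT_ARGB);
        BufferedImage flat = flattenOnWhite(transparent);
        System.out.println(Integer.toHexString(flat.getRGB(0, 0) & 0xFFFFFF));
    }
}
```

If the images in eBook-Mini.pdf have no transparency, this will change nothing and the problem lies elsewhere, for example in how the color space is handled.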

I am sure I am missing some configuration or other parameter, but I have been unable to find it.

As an update, I have also added the following JAI jars to my project:
jai_codec
jai_core
mlibwrapper_jai
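
Whether these jars actually add anything to javax.imageio is worth verifying, since ImageIO only sees formats whose plugins are registered on the classpath. As far as I know, ImageIO plugins usually come from the separate JAI Image I/O Tools jar rather than from jai_core or jai_codec, but that is an assumption to check. A small sketch that lists what ImageIO can read and write in the running JVM (the class name ImageIoFormats is made up for illustration):

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import javax.imageio.ImageIO;

// Hypothetical diagnostic class, not part of PDFBox or JAI.
final class ImageIoFormats {

    /** Lower-cased, sorted names of all formats ImageIO can currently read. */
    static Set<String> readerFormats() {
        return normalize(ImageIO.getReaderFormatNames());
    }

    /** Lower-cased, sorted names of all formats ImageIO can currently write. */
    static Set<String> writerFormats() {
        return normalize(ImageIO.getWriterFormatNames());
    }

    private static Set<String> normalize(String[] names) {
        Set<String> result = new TreeSet<>();
        for (String name : names) {
            result.add(name.toLowerCase(Locale.ROOT));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("readers: " + readerFormats());
        System.out.println("writers: " + writerFormats());
    }
}
```

Running it with and without the extra jars on the classpath shows whether they change the set of supported formats.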
                
> Images extracted from PDF are loosing color (are shown in blackcolor)
> ---------------------------------------------------------------------
>
>                 Key: PDFBOX-1169
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1169
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 1.6.0
>         Environment: Windows
>            Reporter: susheel
>         Attachments: eBook-Mini.pdf, image-1.jpg, image-2.jpg
>
>
> Using PDFBox, I tried to read the attached file (eBook-Mini.pdf).
> When the images are extracted with the code mentioned below, they do not match the ones in the PDF: they have lost their color.
> Extracting the images with other tools worked correctly.
> The images extracted with PDFBox are attached as well.
