Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2015/02/23 18:52:13 UTC

[jira] [Commented] (PDFBOX-2128) CMYK images are not supported correctly

    [ https://issues.apache.org/jira/browse/PDFBOX-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333571#comment-14333571 ] 

Tilman Hausherr commented on PDFBOX-2128:
-----------------------------------------

I made a new attempt with [twelvemonkeys|https://github.com/haraldk/TwelveMonkeys] in the DCTFilter. The CMYK images do not throw an exception when calling reader.read(0); instead we get an RGB image with a 3-band raster, which results in an array-out-of-bounds exception because we expect more data (4 bands). Calling reader.readRaster(0, null) works fine for these images. However, it doesn't work for RGB images: the rendered image is bluish, and I don't know why. Thus the following code works for all my test files (except one, see the end):
{code}
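// readRaster() gives us all 4 bands for CMYK images
raster = reader.readRaster(0, null);
if (raster.getNumBands() == 3)
{
    // a 3-band raster from readRaster() renders bluish, so use read() for RGB images
    BufferedImage image = reader.read(0);
    raster = image.getRaster();
}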
{code}

It might be even faster to just read the metadata first to decide what to do.
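
A minimal sketch of what such a metadata check could look like, using only the standard javax.imageio API. The class name JpegChannelProbe and the method numChannels are made up for illustration and are not part of PDFBox. It assumes the reader exposes the standard "javax_imageio_1.0" metadata tree with a Chroma/NumChannels node; I have not verified what TwelveMonkeys reports there for CMYK files, and whether this is actually faster is untested.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.metadata.IIOMetadataFormatImpl;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of PDFBox.
public class JpegChannelProbe
{
    /**
     * Returns the channel count found in the standard ImageIO metadata tree,
     * or -1 if no reader is available or the reader does not report one.
     */
    public static int numChannels(byte[] imageBytes) throws IOException
    {
        try (ImageInputStream iis =
                 new MemoryCacheImageInputStream(new ByteArrayInputStream(imageBytes)))
        {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext())
            {
                return -1;
            }
            ImageReader reader = readers.next();
            try
            {
                reader.setInput(iis);
                // reads the headers only, the pixel data is not decoded here
                IIOMetadata metadata = reader.getImageMetadata(0);
                if (metadata == null || !metadata.isStandardMetadataFormatSupported())
                {
                    return -1;
                }
                Element root = (Element) metadata.getAsTree(
                        IIOMetadataFormatImpl.standardMetadataFormatName);
                NodeList nodes = root.getElementsByTagName("NumChannels");
                if (nodes.getLength() == 0)
                {
                    return -1;
                }
                String value = ((Element) nodes.item(0)).getAttribute("value");
                return Integer.parseInt(value);
            }
            finally
            {
                reader.dispose();
            }
        }
    }
}
```

The caller could then choose readRaster(0, null) when the result is 4 and read(0) otherwise, and treat -1 as "unknown" and fall back to checking raster.getNumBands() as in the code above.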

The trick I used for PDFBOX-2501 (seek if there is an extra 0x0a) no longer works. However, the user had accepted that his PDF is malformed, and the creator of that PDF has fixed the bug too.

> CMYK images are not supported correctly
> ---------------------------------------
>
>                 Key: PDFBOX-2128
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2128
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.8.5, 1.8.6, 2.0.0
>         Environment: Windows 7 Professional
> Running jvm: Java HotSpot(TM) 64-Bit Server VM - 1.6.0_26-b03 - 20.1-b02 - Sun Microsystems Inc
>            Reporter: Ludovic Davoine
>              Labels: PDJpeg, cmyk, images
>             Fix For: 2.1.0
>
>         Attachments: porsche_cmyk.pdf-2.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I have a PDF with CMYK images inside and I need to extract the images in RGB format. But the PDJpeg class does not seem to work correctly; the colors are wrong. Example:
> - Original image in the PDF: http://ludoda.free.fr/IMAGE_IN_PDF.jpg
> - Extracted image: http://ludoda.free.fr/IMAGE_EXTRACTED.jpg
> You can download the PDF: http://ludoda.free.fr/PORSCHE_CMYK.PDF
> and try my simple test case (I'm using PDFBox 1.8.5):
> {code}
> import java.awt.image.BufferedImage;
> import java.io.File;
> import java.io.IOException;
> import java.util.Iterator;
> import java.util.List;
> import java.util.Map;
> import javax.imageio.ImageIO;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.PDPage;
> import org.apache.pdfbox.pdmodel.PDResources;
> import org.apache.pdfbox.pdmodel.graphics.xobject.PDJpeg;
> import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObject;
> import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;
> public class TestCase {
> 	
> 	public static void main(String[] args) 
> 	{
> 		try 
> 		{
> 			System.out.println("START EXTRACTING IMAGES...");
> 			read_pdf();
> 			System.out.println("COMPLETE");
> 		}
> 		catch (IOException ex) 
> 		{
> 		    System.out.println("" + ex);
> 		}
> 	}
> 	public static void read_pdf() throws IOException 
> 	{
> 		    PDDocument document = null; 
> 		    document = PDDocument.load("C:\\temp\\PORSCHE_CMYK.pdf");
> 		    @SuppressWarnings("unchecked")
> 		    List<PDPage> pages = document.getDocumentCatalog().getAllPages();
> 		    Iterator<PDPage> iter = pages.iterator(); 
> 		    int i =1;
> 		    while (iter.hasNext())
> 		    {
> 		        PDPage page = (PDPage) iter.next();
> 		        PDResources resources = page.getResources();
> 		        Map<String, PDXObject> pageImages = resources.getXObjects();
> 		        if (pageImages != null)
> 		        { 
> 		            Iterator<String> imageIter = pageImages.keySet().iterator();
> 		            while (imageIter.hasNext())
> 		            {
> 		            	String key = (String) imageIter.next();
> 		            	if(pageImages.get(key) instanceof PDJpeg) // check the type that is cast to below
> 		                {
> 		                	PDJpeg image = (PDJpeg) pageImages.get(key);
> 		                	
> 		                	// Test 1 : write2file
> 		                	image.write2file("C:\\workspace\\JAVA_PDFTools\\temp\\image" + i);
> 		                	
> 		                	// Test 2: getRGBImage
> 		                	BufferedImage bimage=image.getRGBImage();
> 		                	File outputfile = new File("C:\\workspace\\JAVA_PDFTools\\temp\\image" + i+"_buffered.jpg");
> 		                	ImageIO.write(bimage, "jpg", outputfile);
> 		                	i ++;
> 		                }
> 		            }
> 		        }
> 		    }
> 		}
> }
> {code}
