Posted to dev@pdfbox.apache.org by "Roland Quast (JIRA)" <ji...@apache.org> on 2011/05/26 07:43:47 UTC

[jira] [Created] (PDFBOX-1018) PDPage convertToImage bug creates white images from black and white pdf files.

PDPage convertToImage bug creates white images from black and white pdf files.
------------------------------------------------------------------------------

                 Key: PDFBOX-1018
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1018
             Project: PDFBox
          Issue Type: Bug
          Components: PDModel
    Affects Versions: 1.3.1, 1.4.0
         Environment: JDK 1.6.0_21
            Reporter: Roland Quast
            Assignee: Andreas Lehmkühler


When converting a PDPage of this PDF into an image, the resulting file is always a white image with no contents.

The following message appeared in the log output (it doesn't seem to be a duplicate of PDFBOX-794):

 ERROR                  filter.FlateFilter - Stop reading corrupt stream

Here's the code used to convert the image:

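import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.List;
import javax.imageio.ImageIO;
import junit.framework.TestCase;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.junit.Test;

public class ConvertImageTest {

	@Test
	@SuppressWarnings("unchecked") // getAllPages() returns a raw List in PDFBox 1.x
	public void testConvertImage() throws IOException {
		// Any IOException (including FileNotFoundException) fails the test
		PDDocument pdDocument = PDDocument.load("pdf_causing_white_pages.pdf");
		try {
			List<PDPage> documentPageList = pdDocument.getDocumentCatalog().getAllPages();
			TestCase.assertNotNull(documentPageList);
			int pageNumber = 1;
			// Render every page and write it out as result_<n>.jpeg
			for (PDPage tmpPage : documentPageList) {
				BufferedImage tempImage = tmpPage.convertToImage();
				ImageIO.write(tempImage, "jpeg", new File("result_" + pageNumber + ".jpeg"));
				pageNumber++;
			}
		} finally {
			pdDocument.close(); // always release the document, even on failure
		}
	}
}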
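The symptom reported here is a rendered page that comes out pure white. A minimal sketch of a helper a regression test could use to detect that automatically follows. `BlankPageCheck` and `isBlank` are hypothetical names (not part of PDFBox), and the sketch assumes "blank" means every pixel is exactly RGB 0xFFFFFF.

```java
import java.awt.image.BufferedImage;

public class BlankPageCheck {

    /**
     * Hypothetical test helper: returns true if every pixel of the image is
     * pure white (RGB 0xFFFFFF). The alpha channel is ignored.
     */
    public static boolean isBlank(BufferedImage image) {
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                // getRGB returns ARGB in the default sRGB model; mask off alpha
                if ((image.getRGB(x, y) & 0xFFFFFF) != 0xFFFFFF) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        // A new TYPE_INT_RGB image starts out black, so paint it white first
        for (int y = 0; y < 4; y++) {
            for (int x = 0; x < 4; x++) {
                img.setRGB(x, y, 0xFFFFFF);
            }
        }
        System.out.println(isBlank(img)); // prints true
        img.setRGB(2, 1, 0x000000);
        System.out.println(isBlank(img)); // prints false
    }
}
```

In the test above, each page could then be checked with `TestCase.assertFalse(BlankPageCheck.isBlank(tempImage))`. Under this strict definition any near-white anti-aliased pixel counts as content, which is the safe direction for catching completely empty output.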



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PDFBOX-1018) PDPage convertToImage bug creates white images from black and white pdf files.

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039802#comment-13039802 ] 

Andreas Lehmkühler commented on PDFBOX-1018:
--------------------------------------------

That's a cool bug report, the best I've ever seen!!

As I already mentioned earlier, PDFBox doesn't support decoding the CCITTFaxDecode filter out of the box. This only works if you include the optional JAI jar. See PDFBOX-955 for further details.
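Whether the optional jar is actually visible at runtime can be checked before rendering. The sketch below assumes, as this thread suggests, that PDFBox 1.x decodes CCITTFaxDecode images by handing them to a TIFF ImageReader registered with javax.imageio, which the JAI Image I/O jar provides. `TiffSupportCheck` and `hasTiffReader` are hypothetical names. Note that JDK 9 and later ship a TIFF reader of their own, so on those JDKs the check returns true even without JAI.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

public class TiffSupportCheck {

    /** Returns true if some ImageIO plugin on the classpath can read TIFF. */
    public static boolean hasTiffReader() {
        // Re-scan the classpath so plugin jars added after startup are seen
        ImageIO.scanForPlugins();
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("tiff");
        return readers.hasNext();
    }

    public static void main(String[] args) {
        System.out.println("TIFF reader available: " + hasTiffReader());
    }
}
```

Logging this once at startup would turn the silent white page into an explicit configuration hint.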



> PDPage convertToImage bug creates white images from black and white pdf files.
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1018
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1018
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.2.1, 1.3.1, 1.4.0, 1.5.0
>         Environment: JDK 1.6.0_22
>            Reporter: Roland Quast
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>              Labels: pdfbox
>         Attachments: BlackAndWhiteBug.java, ColorWorks.java, black_and_white.pdf, color.pdf
>
>
> This bug has been reported in various other tickets before. I am attempting to conclusively prove that this is an issue that needs to be attended to, since all past tickets regarding this bug have been marked invalid.
> I have attached a video showing very basic code that reproduces the issue. I have also attached the code that causes the issue, as well as a PDF file that works (a color one) and a black and white PDF file that doesn't.
> The main issue is that when reading a black and white PDF file (see the attached black and white PDF file), the following message is displayed and the contents of the output image are completely white.
> 26/05/2011 3:20:14 PM org.apache.pdfbox.util.operator.pagedrawer.Invoke process
> WARNING: getRGBImage returned NULL
> We use PDFBox in our program for reading PDF files, and at least 50 percent of our customers' PDF files (from different scanners) will not read because of this issue. This is a complete showstopper, and I'd be more than happy to help in any way I can to resolve it.


[jira] [Commented] (PDFBOX-1018) PDPage convertToImage bug creates white images from black and white pdf files.

Posted by "Roland Quast (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040709#comment-13040709 ] 

Roland Quast commented on PDFBOX-1018:
--------------------------------------

Which you can find here: http://ij-plugins.sourceforge.net/plugins/imageio/

> PDPage convertToImage bug creates white images from black and white pdf files.
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1018
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1018
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.2.1, 1.3.1, 1.4.0, 1.5.0
>         Environment: JDK 1.6.0_22
>            Reporter: Roland Quast
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>              Labels: pdfbox
>         Attachments: BlackAndWhiteBug.java, ColorWorks.java, PDFBOX1018-black_and_white1.png, black_and_white.pdf, color.pdf
>
>

[jira] [Updated] (PDFBOX-1018) Remove imageIO dependency (was: PDPage convertToImage bug creates white images from black and white pdf files.)

Posted by "Roland Quast (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roland Quast updated PDFBOX-1018:
---------------------------------

    Comment: was deleted

(was: Inverted black and white happens with new filter.)

> Remove imageIO dependency (was: PDPage convertToImage bug creates white images from black and white pdf files.)
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1018
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1018
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 1.2.0, 1.2.1, 1.3.1, 1.4.0, 1.5.0
>         Environment: JDK 1.6.0_22
>            Reporter: Roland Quast
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>              Labels: pdfbox
>             Fix For: 1.6.0
>
>         Attachments: BlackAndWhiteBug.java, ColorWorks.java, PDFBOX1018-black_and_white1.png, black_and_white.pdf, color.pdf, tast_bb.pdf
>
>

[jira] [Reopened] (PDFBOX-1018) Remove imageIO dependency (was: PDPage convertToImage bug creates white images from black and white pdf files.)

Posted by "Roland Quast (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roland Quast reopened PDFBOX-1018:
----------------------------------


Close issue and create another issue or fix and close again?

> Remove imageIO dependency (was: PDPage convertToImage bug creates white images from black and white pdf files.)
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1018
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1018
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 1.2.0, 1.2.1, 1.3.1, 1.4.0, 1.5.0
>         Environment: JDK 1.6.0_22
>            Reporter: Roland Quast
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>              Labels: pdfbox
>             Fix For: 1.6.0
>
>         Attachments: BlackAndWhiteBug.java, ColorWorks.java, PDFBOX1018-black_and_white1.png, black_and_white.pdf, color.pdf, tast_bb.pdf
>
>

[jira] [Updated] (PDFBOX-1018) PDPage convertToImage bug creates white images from black and white pdf files.

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-1018:
---------------------------------------

    Attachment: PDFBOX1018-black_and_white1.png

I got the attached result when using the following command line:

java -cp ./pdfbox-app-1.5.0.jar:./jai_imageio.jar org.apache.pdfbox.PDFToImage -imageType png PDFBOX1018-black_and_white.pdf 

> PDPage convertToImage bug creates white images from black and white pdf files.
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1018
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1018
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.2.1, 1.3.1, 1.4.0, 1.5.0
>         Environment: JDK 1.6.0_22
>            Reporter: Roland Quast
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>              Labels: pdfbox
>         Attachments: BlackAndWhiteBug.java, ColorWorks.java, PDFBOX1018-black_and_white1.png, black_and_white.pdf, color.pdf
>
>

[jira] [Commented] (PDFBOX-1018) PDPage convertToImage bug creates white images from black and white pdf files.

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040661#comment-13040661 ] 

Andreas Lehmkühler commented on PDFBOX-1018:
--------------------------------------------

I agree with Roland concerning the error message. We should change it to something more meaningful.

Due to the license issue we can't bundle PDFBox with the mentioned jar, so it is a good idea to use something else. IMO it doesn't make sense to add a new dependency just to use a very small piece of it. More importantly, AFAIK Sanselan doesn't support the needed CCITTFaxDecoder.

I'm experimenting with the TIFFFaxDecoder, which is part of Apache XMLGraphics [1]. It works with your PDF but fails with others. I have to dig deeper into it and try to understand the decoding algorithm.

[1] http://svn.apache.org/viewvc/xmlgraphics/commons/trunk/src/java/org/apache/xmlgraphics/image/codec/tiff/TIFFImageEncoder.java?view=markup

> PDPage convertToImage bug creates white images from black and white pdf files.
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1018
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1018
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.2.1, 1.3.1, 1.4.0, 1.5.0
>         Environment: JDK 1.6.0_22
>            Reporter: Roland Quast
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>              Labels: pdfbox
>         Attachments: BlackAndWhiteBug.java, ColorWorks.java, PDFBOX1018-black_and_white1.png, black_and_white.pdf, color.pdf
>
>

[jira] [Updated] (PDFBOX-1018) Remove imageIO dependency (was: PDPage convertToImage bug creates white images from black and white pdf files.)

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-1018:
---------------------------------------

    Issue Type: Improvement  (was: Bug)
       Summary: Remove imageIO dependency (was: PDPage convertToImage bug creates white images from black and white pdf files.)  (was: PDPage convertToImage bug creates white images from black and white pdf files.)

> Remove imageIO dependency (was: PDPage convertToImage bug creates white images from black and white pdf files.)
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1018
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1018
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 1.2.0, 1.2.1, 1.3.1, 1.4.0, 1.5.0
>         Environment: JDK 1.6.0_22
>            Reporter: Roland Quast
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>              Labels: pdfbox
>         Attachments: BlackAndWhiteBug.java, ColorWorks.java, PDFBOX1018-black_and_white1.png, black_and_white.pdf, color.pdf
>
>

[jira] [Updated] (PDFBOX-1018) PDPage convertToImage bug creates white images from black and white pdf files.

Posted by "Roland Quast (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roland Quast updated PDFBOX-1018:
---------------------------------

          Component/s:     (was: PDModel)
               Labels: pdfbox  (was: )
             Priority: Critical  (was: Major)
          Description: 
This bug has been reported in various other tickets before. I am attempting to conclusively prove that this is an issue that needs to be attended to, since all past tickets regarding this bug have been marked invalid.

I have attached a video showing very basic code that reproduces the issue. I have also attached the code that causes the issue, as well as a PDF file that works (a color one) and a black and white PDF file that doesn't.

The main issue is that when reading a black and white PDF file (see the attached black and white PDF file), the following message is displayed and the contents of the output image are completely white.

26/05/2011 3:20:14 PM org.apache.pdfbox.util.operator.pagedrawer.Invoke process
WARNING: getRGBImage returned NULL

We use PDFBox in our program for reading PDF files, and at least 50 percent of our customers' PDF files (from different scanners) will not read because of this issue. This is a complete showstopper, and I'd be more than happy to help in any way I can to resolve it.




  was:
(the original description, as posted at the start of this thread)

          Environment: JDK 1.6.0_22  (was: JDK 1.6.0_21)
    Affects Version/s: 1.2.0
                       1.2.1
                       1.5.0

> PDPage convertToImage bug creates white images from black and white pdf files.
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1018
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1018
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.2.1, 1.3.1, 1.4.0, 1.5.0
>         Environment: JDK 1.6.0_22
>            Reporter: Roland Quast
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>              Labels: pdfbox
>         Attachments: BlackAndWhiteBug.java, ColorWorks.java, black_and_white.pdf, color.pdf
>
>

[jira] [Issue Comment Edited] (PDFBOX-1018) Remove imageIO dependency (was: PDPage convertToImage bug creates white images from black and white pdf files.)

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040661#comment-13040661 ] 

Andreas Lehmkühler edited comment on PDFBOX-1018 at 5/29/11 5:33 PM:
---------------------------------------------------------------------

I agree with Roland concerning the error message. We should change it to something more meaningful.

Due to the license issue we can't bundle PDFBox with the mentioned jar, so it is a good idea to use something else. IMO it doesn't make sense to add a new dependency just to use a very small piece of it. More importantly, AFAIK Sanselan doesn't support the needed CCITTFaxDecoder.

I'm experimenting with the TIFFFaxDecoder, which is part of Apache XMLGraphics [1]. It works with your PDF but fails with others. I have to dig deeper into it and try to understand the decoding algorithm.

[1] http://svn.apache.org/viewvc/xmlgraphics/commons/trunk/src/java/org/apache/xmlgraphics/image/codec/tiff/TIFFFaxDecoder.java?view=markup

      was (Author: lehmi): the same comment as originally posted, except that reference [1] pointed to TIFFImageEncoder.java instead of TIFFFaxDecoder.java.
  
> Remove imageIO dependency (was: PDPage convertToImage bug creates white images from black and white pdf files.)
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1018
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1018
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 1.2.0, 1.2.1, 1.3.1, 1.4.0, 1.5.0
>         Environment: JDK 1.6.0_22
>            Reporter: Roland Quast
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>              Labels: pdfbox
>         Attachments: BlackAndWhiteBug.java, ColorWorks.java, PDFBOX1018-black_and_white1.png, black_and_white.pdf, color.pdf
>
>

[jira] [Updated] (PDFBOX-1018) Remove imageIO dependency (was: PDPage convertToImage bug creates white images from black and white pdf files.)

Posted by "Roland Quast (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roland Quast updated PDFBOX-1018:
---------------------------------

    Attachment: tast_bb.pdf

With the new filter, black and white are inverted.
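One plausible cause of the inversion, offered as an assumption rather than a diagnosis: the PDF CCITTFaxDecode filter's BlackIs1 parameter defaults to false, meaning 0 bits are black, whereas a TIFF-style fax decoder typically writes 1 bits for black. If the decoder output is used as-is, black and white swap. A minimal sketch of the correction step on packed 1-bit rows follows; `FaxBitInverter` and `normalize` are hypothetical names, not PDFBox or XMLGraphics API.

```java
public class FaxBitInverter {

    /** Flips every bit in place, so 1-bits become 0-bits and vice versa. */
    public static void invertBits(byte[] packedRows) {
        for (int i = 0; i < packedRows.length; i++) {
            packedRows[i] = (byte) ~packedRows[i];
        }
    }

    /**
     * Hypothetical post-processing step. Assumes the decoder wrote 1 for black.
     * With BlackIs1 = false (the PDF default) the PDF expects 0 for black,
     * so the bits are flipped; with BlackIs1 = true they are left as they are.
     */
    public static void normalize(byte[] decoded, boolean blackIs1) {
        if (!blackIs1) {
            invertBits(decoded);
        }
    }

    public static void main(String[] args) {
        byte[] row = {0x00, (byte) 0xF0};
        normalize(row, false);
        System.out.printf("%02X %02X%n", row[0] & 0xFF, row[1] & 0xFF); // prints FF 0F
    }
}
```

Flipping whole bytes also flips any padding bits at the end of each row; that is harmless as long as the consumer honours the row width.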

> Remove imageIO dependency (was: PDPage convertToImage bug creates white images from black and white pdf files.)
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1018
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1018
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 1.2.0, 1.2.1, 1.3.1, 1.4.0, 1.5.0
>         Environment: JDK 1.6.0_22
>            Reporter: Roland Quast
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>              Labels: pdfbox
>             Fix For: 1.6.0
>
>         Attachments: BlackAndWhiteBug.java, ColorWorks.java, PDFBOX1018-black_and_white1.png, black_and_white.pdf, color.pdf, tast_bb.pdf
>
>

[jira] [Commented] (PDFBOX-1018) PDPage convertToImage bug creates white images from black and white pdf files.

Posted by "Roland Quast (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040700#comment-13040700 ] 

Roland Quast commented on PDFBOX-1018:
--------------------------------------

May also require non_com.media.jai.codec.*

> PDPage convertToImage bug creates white images from black and white pdf files.
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1018
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1018
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.2.1, 1.3.1, 1.4.0, 1.5.0
>         Environment: JDK 1.6.0_22
>            Reporter: Roland Quast
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>              Labels: pdfbox
>         Attachments: BlackAndWhiteBug.java, ColorWorks.java, PDFBOX1018-black_and_white1.png, black_and_white.pdf, color.pdf
>
>
> This bug has been reported in various other tickets submitted before. I am attempting to conclusively prove that this is an issue, and it needs to be attended to since all past tickets regarding this bug have been marked invalid.
> I have attached a video showing very basic code that will reproduce the issue. I have also attached the code that causes the issue, as well as a PDF file that works (a color one), and a black and white PDF file that doesn't.
> The main issue is that when reading a black and white PDF file (see the attached black and white PDF file), the following message is displayed, and the contents of the output image are completely white.
> 26/05/2011 3:20:14 PM org.apache.pdfbox.util.operator.pagedrawer.Invoke process
> WARNING: getRGBImage returned NULL
> We use PDFBox in our program for reading PDF files, and at least 50 percent of our customers' PDF files (from different scanners) will not read because of this issue. This is a complete showstopper, and I'd be more than happy to help in any way I could to resolve it.


[jira] [Commented] (PDFBOX-1018) PDPage convertToImage bug creates white images from black and white pdf files.

Posted by "Roland Quast (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039521#comment-13039521 ] 

Roland Quast commented on PDFBOX-1018:
--------------------------------------

I have made a video of the issue, trying to clearly show that this issue is real and it has not gone away. Please watch the video and see for yourself.

http://www.rolandquast.com/files/pdfbox_black_and_white_bug.html

> PDPage convertToImage bug creates white images from black and white pdf files.
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1018
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1018
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.2.1, 1.3.1, 1.4.0, 1.5.0
>         Environment: JDK 1.6.0_22
>            Reporter: Roland Quast
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>              Labels: pdfbox
>         Attachments: BlackAndWhiteBug.java, ColorWorks.java, black_and_white.pdf, color.pdf

[jira] [Updated] (PDFBOX-1018) PDPage convertToImage bug creates white images from black and white pdf files.

Posted by "Roland Quast (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roland Quast updated PDFBOX-1018:
---------------------------------

    Attachment: color.pdf
                ColorWorks.java
                black_and_white.pdf
                BlackAndWhiteBug.java

> PDPage convertToImage bug creates white images from black and white pdf files.
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1018
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1018
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.2.1, 1.3.1, 1.4.0, 1.5.0
>         Environment: JDK 1.6.0_22
>            Reporter: Roland Quast
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>              Labels: pdfbox
>         Attachments: BlackAndWhiteBug.java, ColorWorks.java, black_and_white.pdf, color.pdf

[jira] [Closed] (PDFBOX-1018) Remove imageIO dependency (was: PDPage convertToImage bug creates white images from black and white pdf files.)

Posted by "Roland Quast (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roland Quast closed PDFBOX-1018.
--------------------------------

    Resolution: Fixed

Re-closing. The tast_bb.pdf file is not CCITT encoded, so it is a different issue.

> Remove imageIO dependency (was: PDPage convertToImage bug creates white images from black and white pdf files.)
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1018
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1018
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 1.2.0, 1.2.1, 1.3.1, 1.4.0, 1.5.0
>         Environment: JDK 1.6.0_22
>            Reporter: Roland Quast
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>              Labels: pdfbox
>             Fix For: 1.6.0
>
>         Attachments: BlackAndWhiteBug.java, ColorWorks.java, PDFBOX1018-black_and_white1.png, black_and_white.pdf, color.pdf, tast_bb.pdf

[jira] [Commented] (PDFBOX-1018) PDPage convertToImage bug creates white images from black and white pdf files.

Posted by "Roland Quast (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039525#comment-13039525 ] 

Roland Quast commented on PDFBOX-1018:
--------------------------------------

Oh, and one other thing: isn't a warning message far too mild for the kind of problem this causes? Shouldn't it throw an exception instead of failing silently?

> PDPage convertToImage bug creates white images from black and white pdf files.
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1018
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1018
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.2.1, 1.3.1, 1.4.0, 1.5.0
>         Environment: JDK 1.6.0_22
>            Reporter: Roland Quast
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>              Labels: pdfbox
>         Attachments: BlackAndWhiteBug.java, ColorWorks.java, black_and_white.pdf, color.pdf

[jira] [Resolved] (PDFBOX-1018) Remove imageIO dependency (was: PDPage convertToImage bug creates white images from black and white pdf files.)

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-1018.
----------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.6.0

The ImageJ lib is GPL licensed, which isn't compatible with our Apache license. Furthermore, it needs non_com.media.jai.codec, which is an early version of the jai-imageIO lib.

I finally figured out how to embed the TIFFFaxDecoder. I removed the JAI dependency in revision 1128912.

> Remove imageIO dependency (was: PDPage convertToImage bug creates white images from black and white pdf files.)
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1018
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1018
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 1.2.0, 1.2.1, 1.3.1, 1.4.0, 1.5.0
>         Environment: JDK 1.6.0_22
>            Reporter: Roland Quast
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>              Labels: pdfbox
>             Fix For: 1.6.0
>
>         Attachments: BlackAndWhiteBug.java, ColorWorks.java, PDFBOX1018-black_and_white1.png, black_and_white.pdf, color.pdf

[jira] [Commented] (PDFBOX-1018) Remove imageIO dependency (was: PDPage convertToImage bug creates white images from black and white pdf files.)

Posted by "Roland Quast (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040892#comment-13040892 ] 

Roland Quast commented on PDFBOX-1018:
--------------------------------------

Well done! Sorry, I forgot the library was LGPL licensed. Yes, non_com also has a BSD-style license, but it is old, as you say. Now that I've found out that XMLGraphics has a TIFF reader, I might end up using that as well. Thanks once again!

> Remove imageIO dependency (was: PDPage convertToImage bug creates white images from black and white pdf files.)
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1018
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1018
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 1.2.0, 1.2.1, 1.3.1, 1.4.0, 1.5.0
>         Environment: JDK 1.6.0_22
>            Reporter: Roland Quast
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>              Labels: pdfbox
>             Fix For: 1.6.0
>
>         Attachments: BlackAndWhiteBug.java, ColorWorks.java, PDFBOX1018-black_and_white1.png, black_and_white.pdf, color.pdf

[jira] [Commented] (PDFBOX-1018) PDPage convertToImage bug creates white images from black and white pdf files.

Posted by "Roland Quast (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040698#comment-13040698 ] 

Roland Quast commented on PDFBOX-1018:
--------------------------------------

You are correct about the lack of a fax decode codec in Sanselan; I just checked their supported formats page. I had to fix the same problem in a TIFF file reader before, and after looking at that code I saw that I had used ImageJ to do it. I don't know if the code is of any use, but the ImageJ license is public domain.

import java.awt.RenderingHints;
import java.awt.geom.AffineTransform;
import java.awt.image.AffineTransformOp;
import java.awt.image.BufferedImage;
import java.awt.image.BufferedImageOp;
import java.awt.image.RenderedImage;
import java.awt.image.WritableRaster;
import ij.ImagePlus;
// TIFFImage, TIFFDirectory, TIFFImageDecoder, ImagePlusCreator and UnsupportedImageModelException
// come from the JAI codec and ImageJ plugin libraries used here; their packages depend on those versions.

public static BufferedImage convertRenderedImage(RenderedImage img, String[] decoders) throws UnsupportedImageModelException {
    if (img instanceof BufferedImage) {
        return (BufferedImage) img;
    }

    WritableRaster wr = ImagePlusCreator.forceTileUpdate(img);
    ImagePlus im;

    if (decoders[0].equalsIgnoreCase("GIF") || decoders[0].equalsIgnoreCase("JPEG")) {
        // Convert the way ImageJ does (ij.io.Opener.openJpegOrGif())
        BufferedImage bi = new BufferedImage(img.getColorModel(), wr, false, null);
        im = ImagePlusCreator.create(wr, img.getColorModel());
        im.setImage(bi);
    } else {
        im = ImagePlusCreator.create(wr, img.getColorModel());

        if (img instanceof TIFFImage) {
            TIFFImage ti = (TIFFImage) img;
            try {
                Object o = ti.getProperty("tiff_directory");
                if (o instanceof TIFFDirectory) {
                    TIFFDirectory dir = (TIFFDirectory) o;
                    BufferedImage preimg = im.getBufferedImage();

                    int compression = (int) dir.getFieldAsLong(TIFFImageDecoder.TIFF_COMPRESSION);
                    switch (compression) {
                    case TIFFImage.COMP_FAX_G3_1D:
                    case TIFFImage.COMP_FAX_G3_2D:
                    case TIFFImage.COMP_FAX_G4_2D:
                        // Resize the image to the same scale: fax images often have unequal
                        // X/Y resolutions, so stretch the lower-resolution axis
                        if (dir.isTagPresent(TIFFImageDecoder.TIFF_X_RESOLUTION)
                                && dir.isTagPresent(TIFFImageDecoder.TIFF_Y_RESOLUTION)) {
                            double xRes = dir.getFieldAsDouble(TIFFImageDecoder.TIFF_X_RESOLUTION);
                            double yRes = dir.getFieldAsDouble(TIFFImageDecoder.TIFF_Y_RESOLUTION);

                            double xScale = 1.0d;
                            double yScale = 1.0d;
                            if (xRes > yRes) {
                                yScale = xRes / yRes;
                            } else if (yRes > xRes) {
                                xScale = yRes / xRes;
                            }

                            if (xScale != 1.0d || yScale != 1.0d) {
                                BufferedImageOp op = new AffineTransformOp(
                                        AffineTransform.getScaleInstance(xScale, yScale),
                                        new RenderingHints(RenderingHints.KEY_INTERPOLATION,
                                                RenderingHints.VALUE_INTERPOLATION_NEAREST_NEIGHBOR));
                                preimg = op.filter(preimg, null);
                            }
                        }

                        // Invert an image that has the wrong photometric interpretation
                        // (TIFF PhotometricInterpretation 1 = BlackIsZero)
                        if (dir.isTagPresent(TIFFImageDecoder.TIFF_PHOTOMETRIC_INTERPRETATION)) {
                            long photo = dir.getFieldAsLong(TIFFImageDecoder.TIFF_PHOTOMETRIC_INTERPRETATION);
                            if (photo == 1) {
                                preimg = binarizeImageAndInvert(preimg, 165);
                            }
                        }
                        return preimg;

                    default:
                        return im.getBufferedImage();
                    }
                }
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }
    }

    return im.getBufferedImage();
}
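The resolution handling above matters for fax (CCITT) images, whose horizontal and vertical resolutions often differ (a "normal" fax page is 204 x 98 dpi). A minimal stand-alone sketch of the same idea, assuming a BufferedImage and resolutions in dpi; the class and method names are hypothetical, and it redraws with Graphics2D and a nearest-neighbour hint instead of the AffineTransformOp used above:

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.WritableRaster;

public final class FaxAspectCorrection {

    private FaxAspectCorrection() {}

    /** Returns {xScale, yScale}: the lower-resolution axis is stretched to match the higher one. */
    static double[] scaleFactors(double xRes, double yRes) {
        double xScale = 1.0;
        double yScale = 1.0;
        if (xRes > yRes) {
            yScale = xRes / yRes;
        } else if (yRes > xRes) {
            xScale = yRes / xRes;
        }
        return new double[] {xScale, yScale};
    }

    /** Redraws src at the corrected size; nearest-neighbour sampling keeps bilevel pixels bilevel. */
    static BufferedImage correctAspect(BufferedImage src, double xRes, double yRes) {
        double[] s = scaleFactors(xRes, yRes);
        int w = (int) Math.round(src.getWidth() * s[0]);
        int h = (int) Math.round(src.getHeight() * s[1]);
        // Reuse the source colour model so a 1-bit image stays 1-bit
        ColorModel cm = src.getColorModel();
        WritableRaster raster = cm.createCompatibleWritableRaster(w, h);
        BufferedImage dst = new BufferedImage(cm, raster, cm.isAlphaPremultiplied(), null);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_NEAREST_NEIGHBOR);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    public static void main(String[] args) {
        double[] s = scaleFactors(204, 98);
        System.out.println("xScale=" + s[0] + " yScale=" + s[1]);
        BufferedImage fax = new BufferedImage(1728, 1100, BufferedImage.TYPE_BYTE_BINARY);
        BufferedImage fixed = correctAspect(fax, 204, 98);
        System.out.println(fixed.getWidth() + " x " + fixed.getHeight());
    }
}
```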



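public static BufferedImage binarizeImageAndInvert(BufferedImage image, int luminanceCutOff) {
    // Signature reconstructed from the call binarizeImageAndInvert(preimg, 165) above;
    // the isBlack(...) helper was not included in the original comment
    BufferedImage newImg = new BufferedImage(image.getWidth(), image.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
    WritableRaster raster = newImg.getRaster();

    int imageWidth = image.getWidth();
    int imageSize = imageWidth * image.getHeight();
    for (int i = 0; i < imageSize; i++) {
        int y = i / imageWidth;
        int x = i - (y * imageWidth);
        // In the default TYPE_BYTE_BINARY palette sample 1 is white and 0 is black,
        // so dark source pixels come out white: this is where the inversion happens
        if (isBlack(image, x, y, luminanceCutOff)) {
            raster.setSample(x, y, 0, 1);
        } else {
            raster.setSample(x, y, 0, 0);
        }
    }
    newImg.flush();
    return newImg;
}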
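The fragment above depends on an `isBlack` helper that is not included in the comment. One plausible reconstruction, offered only as an assumption: treat a pixel as black when its Rec. 601 luma falls below `luminanceCutOff` (on a 0 to 255 scale, so the 165 passed in the first fragment would be the threshold). The class name `LuminanceThreshold` is hypothetical.

```java
import java.awt.image.BufferedImage;

public final class LuminanceThreshold {

    private LuminanceThreshold() {}

    /**
     * Hypothetical isBlack: a pixel counts as black when its Rec. 601 luma
     * (0.299 R + 0.587 G + 0.114 B, range 0..255) is below the cutoff.
     */
    static boolean isBlack(BufferedImage image, int x, int y, int luminanceCutOff) {
        int rgb = image.getRGB(x, y);  // packed ARGB in the default sRGB model
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        double luma = 0.299 * r + 0.587 * g + 0.114 * b;
        return luma < luminanceCutOff;
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        img.setRGB(0, 0, 0x202020);  // dark grey
        img.setRGB(1, 0, 0xF0F0F0);  // light grey
        // prints "true false"
        System.out.println(isBlack(img, 0, 0, 165) + " " + isBlack(img, 1, 0, 165));
    }
}
```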

> PDPage convertToImage bug creates white images from black and white pdf files.
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1018
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1018
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.2.1, 1.3.1, 1.4.0, 1.5.0
>         Environment: JDK 1.6.0_22
>            Reporter: Roland Quast
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>              Labels: pdfbox
>         Attachments: BlackAndWhiteBug.java, ColorWorks.java, PDFBOX1018-black_and_white1.png, black_and_white.pdf, color.pdf

[jira] [Commented] (PDFBOX-1018) PDPage convertToImage bug creates white images from black and white pdf files.

Posted by "Roland Quast (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039991#comment-13039991 ] 

Roland Quast commented on PDFBOX-1018:
--------------------------------------

Hope you enjoyed my little presentation :-)

I tried jai_imageio.jar as you described and it worked perfectly! Previously I had tried jai_core.jar and jai_codec.jar, which didn't work. As I mentioned in the video, someone told me that JAI is required to read these PDF files, but I didn't know it had to be jai_imageio.jar.
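Since adding jai_imageio.jar made these files render, a caller could check up front whether any ImageIO TIFF reader is registered, instead of discovering the problem through blank pages. A minimal sketch, assuming (as this thread suggests) that CCITT decoding in these PDFBox versions goes through an ImageIO TIFF reader; the class name is hypothetical. Note that JDK 9 and later ship their own TIFF plugin, so on a modern JVM the check passes even without JAI.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

public final class TiffReaderCheck {

    private TiffReaderCheck() {}

    /** True if any ImageIO plugin on the classpath claims to read the "tiff" format. */
    static boolean hasTiffReader() {
        ImageIO.scanForPlugins();  // pick up plugin jars (e.g. jai_imageio.jar) added to the classpath
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("tiff");
        return readers.hasNext();
    }

    public static void main(String[] args) {
        if (hasTiffReader()) {
            System.out.println("A TIFF ImageIO reader is registered.");
        } else {
            System.err.println("No TIFF ImageIO reader registered: CCITT fax images may render blank.");
        }
    }
}
```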

There are still a few problems though:

1. It shouldn't fail silently. It should throw an exception. It is hard to unit test if you can't catch an exception and you have to assert that the output isn't a white image.
2. The message should be meaningful, not a "returned NULL" warning about something that is hard to understand.
3. It should mention that you need jai_imageio.jar, at least in that warning message or somewhere on the main pages of the PDFBox site.
4. The JAI license is not exactly a commercially friendly license. One of the great advantages of PDFBox is that it uses an Apache license.
5. jai_imageio.jar contains native code, which requires a different jar for each platform. For instance, our app supports Windows, Mac and Linux, so we'd need some kind of jar loader to pick the right imageio jar... and that is very difficult.
6. When they scan in black and white, the majority of scanners use that codec (CCITT fax) for PDF.
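Points 1 and 2 come down to making a blank rendering detectable. Until the library reports the failure itself, a test or caller can check the output explicitly. A minimal sketch, assuming "blank" means every pixel is pure white (near-white anti-aliased output would need a tolerance); the class and method names are hypothetical:

```java
import java.awt.image.BufferedImage;

public final class BlankPageCheck {

    private BlankPageCheck() {}

    /** Returns true if every pixel is pure white (alpha is ignored). */
    static boolean isAllWhite(BufferedImage image) {
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                // getRGB converts any image type to packed sRGB ARGB; mask off alpha
                if ((image.getRGB(x, y) & 0x00FFFFFF) != 0x00FFFFFF) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Throws instead of silently accepting a null or completely white rendering. */
    static BufferedImage requireContent(BufferedImage image, int pageNumber) {
        if (image == null) {
            throw new IllegalStateException("Page " + pageNumber + " rendered to null");
        }
        if (isAllWhite(image)) {
            throw new IllegalStateException("Page " + pageNumber + " rendered completely white");
        }
        return image;
    }

    public static void main(String[] args) {
        BufferedImage page = new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 10; y++) {
            for (int x = 0; x < 10; x++) {
                page.setRGB(x, y, 0xFFFFFF);  // simulate a blank (all-white) rendering
            }
        }
        try {
            requireContent(page, 1);
        } catch (IllegalStateException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

In the loop from the original report, wrapping the call as `requireContent(tmpPage.convertToImage(), pageNumber)` would turn a silent white page into a test failure.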

Having said all that, what about using Commons Sanselan (another Apache project), which already includes a decoder for that format? Sanselan is pure Java code (not a native binary), it is Apache licensed, and it is stable and mature. I am also assuming it could be distributed with PDFBox at some stage if required.

If you feel it would take too much time, please let me know and I can try to hack up something that works with Sanselan.

> PDPage convertToImage bug creates white images from black and white pdf files.
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1018
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1018
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.2.1, 1.3.1, 1.4.0, 1.5.0
>         Environment: JDK 1.6.0_22
>            Reporter: Roland Quast
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>              Labels: pdfbox
>         Attachments: BlackAndWhiteBug.java, ColorWorks.java, PDFBOX1018-black_and_white1.png, black_and_white.pdf, color.pdf

[jira] [Commented] (PDFBOX-1018) Remove imageIO dependency (was: PDPage convertToImage bug creates white images from black and white pdf files.)

Posted by "Roland Quast (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040902#comment-13040902 ] 

Roland Quast commented on PDFBOX-1018:
--------------------------------------

I just tried a snapshot of the trunk and it works great for most PDF files, except for the one I'm about to attach (tast_bb.pdf). It looks like the photometric interpretation isn't working (the implementation of the IS_BLACK COS dictionary value in your new code?).

It currently renders with black and white inverted. Shall we open another ticket for this?
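Inverted output from a bilevel image means the meaning of 0 and 1 bits got flipped somewhere. For CCITT-encoded images, the PDF specification's CCITTFaxDecode filter has a `BlackIs1` parameter (default false, meaning 0 bits are black), which the IS_BLACK guess above presumably refers to; for other encodings the polarity can also come from an image's Decode array. Whatever the root cause turns out to be, the correction itself is a per-pixel flip. A minimal sketch, assuming the decoded page is a 1-bit TYPE_BYTE_BINARY image with the default black/white palette; the class name is hypothetical:

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

public final class BilevelInvert {

    private BilevelInvert() {}

    /** Flips every pixel of a 1-bit image in place: black becomes white and vice versa. */
    static void invertInPlace(BufferedImage image) {
        if (image.getType() != BufferedImage.TYPE_BYTE_BINARY
                || image.getColorModel().getPixelSize() != 1) {
            throw new IllegalArgumentException("expected a 1-bit TYPE_BYTE_BINARY image");
        }
        WritableRaster raster = image.getRaster();
        for (int y = 0; y < raster.getHeight(); y++) {
            for (int x = 0; x < raster.getWidth(); x++) {
                // Samples are palette indices 0 (black) or 1 (white) in the default palette
                raster.setSample(x, y, 0, 1 - raster.getSample(x, y, 0));
            }
        }
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(2, 1, BufferedImage.TYPE_BYTE_BINARY);
        img.getRaster().setSample(0, 0, 0, 1);  // pixel 0 white, pixel 1 black
        invertInPlace(img);
        // prints "0 1"
        System.out.println(img.getRaster().getSample(0, 0, 0) + " " + img.getRaster().getSample(1, 0, 0));
    }
}
```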

> Remove imageIO dependency (was: PDPage convertToImage bug creates white images from black and white pdf files.)
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1018
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1018
>             Project: PDFBox
>          Issue Type: Improvement
>    Affects Versions: 1.2.0, 1.2.1, 1.3.1, 1.4.0, 1.5.0
>         Environment: JDK 1.6.0_22
>            Reporter: Roland Quast
>            Assignee: Andreas Lehmkühler
>            Priority: Critical
>              Labels: pdfbox
>             Fix For: 1.6.0
>
>         Attachments: BlackAndWhiteBug.java, ColorWorks.java, PDFBOX1018-black_and_white1.png, black_and_white.pdf, color.pdf, tast_bb.pdf