Posted to dev@pdfbox.apache.org by "Tilman Hausherr (Jira)" <ji...@apache.org> on 2022/03/23 19:01:00 UTC

[jira] [Comment Edited] (PDFBOX-5397) Certain PDF cannot be processed

    [ https://issues.apache.org/jira/browse/PDFBOX-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17511348#comment-17511348 ] 

Tilman Hausherr edited comment on PDFBOX-5397 at 3/23/22, 7:00 PM:
-------------------------------------------------------------------

TET_5_4xxx_GR_00_00_XX_14_F.pdf fails bitonal rendering with Amazon Corretto 11, 17 and 18 but works on jdk8.

Here are the last operators processed before it hangs:
{noformat}
PDFOperator{d} [COSArray{[COSFloat{.001}, COSInt{0}, COSInt{0}, COSInt{0}]}, COSFloat{.001}]
PDFOperator{m} [COSFloat{1182.74}, COSFloat{814.76}]
PDFOperator{l} [COSFloat{1428.79}, COSFloat{814.76}]
PDFOperator{S} []
{noformat}
The cause is (once again) a Java bug triggered by a bad dash pattern, as seen in PDFBOX-3360, PDFBOX-2373, PDFBOX-2929, PDFBOX-3204, PDFBOX-3813 and PDFBOX-3724. We have a workaround, but since PDFBOX-4492 it has been disabled for jdk10 and higher because I thought the bug had been fixed there. Apparently it wasn't.
{code}
import java.awt.BasicStroke;
import java.awt.Graphics2D;
import java.awt.geom.GeneralPath;
import java.awt.image.BufferedImage;

public class DashCrash
{
    public static void main(String[] args)
    {
        System.out.println("Version: " + System.getProperty("java.version"));
        long t0 = System.currentTimeMillis();
        // bitonal target image, as in the failing rendering
        BufferedImage bim = new BufferedImage(2000, 2000, BufferedImage.TYPE_BYTE_BINARY);
        Graphics2D g2d = (Graphics2D) bim.getGraphics();
        // same horizontal line as the m / l operators above
        GeneralPath path = new GeneralPath();
        path.moveTo(1182.74f, 814.76f);
        path.lineTo(1428.79f, 814.76f);
        path.closePath();
        // the tiny dash length from the PDF
        float[] dash = {0.001f};
        BasicStroke stroke = new BasicStroke(1, BasicStroke.CAP_BUTT, BasicStroke.JOIN_MITER, 10, dash, 0);
        g2d.setStroke(stroke);
        g2d.draw(path);
        g2d.dispose();
        System.out.println("done in " + (System.currentTimeMillis() - t0) + " millis");
    }
}
{code}
I played around a bit: it does finish eventually, and the time depends on the dash value but also on the image type. A dash value of 0.005 is done in 3 seconds, 0.002 in 25 seconds.
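
As a rough illustration of why such small values hurt, the sketch below estimates how many on/off segments the dash pattern splits the test path into. The class and method names are hypothetical and this is not how the JDK renderer works internally; it only divides the path length by the dash length. The path in the test is traversed twice (lineTo plus closePath), so its length is about 492 units, which gives roughly 98,000 segments for 0.005, 246,000 for 0.002 and 492,000 for 0.001. Note that the measured time grew about 8x between 0.005 and 0.002 while the segment count only grew 2.5x, so the segment count is probably only part of the story.

```java
// Hypothetical helper for illustration only, not part of PDFBox or the JDK.
final class DashSegmentEstimate
{
    private DashSegmentEstimate()
    {
    }

    // Approximate number of alternating on/off segments for a single-entry
    // dash array {dashLength} applied to a path of the given length.
    static long estimateSegments(double pathLength, double dashLength)
    {
        if (dashLength <= 0)
        {
            throw new IllegalArgumentException("dash length must be positive");
        }
        return (long) Math.ceil(pathLength / dashLength);
    }

    public static void main(String[] args)
    {
        // moveTo / lineTo / closePath covers the horizontal line twice
        double pathLength = 2 * (1428.79 - 1182.74);
        for (double dash : new double[] {0.005, 0.002, 0.001})
        {
            System.out.println(dash + " -> about "
                    + estimateSegments(pathLength, dash) + " segments");
        }
    }
}
```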

Re the Adobe error message: no idea what's going on; maybe it's the dash pattern, maybe not. It's probably not the fault of PDFTron. The pattern is likely calculated by other software that uses PDFTron, so that software should avoid using tiny dash patterns, which make no sense.
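
For software that produces or renders such patterns, one possible guard is to clamp each dash entry to a minimum length before building the stroke. The sketch below is only an illustration under assumptions: the class name, the method names and the 0.1 threshold are made up here and are not the PDFBox workaround or any recommended value. It only uses the standard java.awt.BasicStroke constructor.

```java
import java.awt.BasicStroke;

// Hypothetical helper for illustration only, not PDFBox code.
final class DashSanitizer
{
    private DashSanitizer()
    {
    }

    // Returns a copy of the dash array in which every entry is at least
    // minLength. The input array is left unchanged.
    static float[] clampDash(float[] dash, float minLength)
    {
        float[] result = dash.clone();
        for (int i = 0; i < result.length; ++i)
        {
            result[i] = Math.max(result[i], minLength);
        }
        return result;
    }

    // Builds a stroke with the clamped dash array, or a solid stroke
    // if no dash array is given.
    static BasicStroke createStroke(float width, float[] dash, float phase, float minLength)
    {
        if (dash == null || dash.length == 0)
        {
            return new BasicStroke(width, BasicStroke.CAP_BUTT, BasicStroke.JOIN_MITER, 10);
        }
        return new BasicStroke(width, BasicStroke.CAP_BUTT, BasicStroke.JOIN_MITER, 10,
                clampDash(dash, minLength), phase);
    }

    public static void main(String[] args)
    {
        // 0.1f is an assumed threshold, chosen only for this example
        BasicStroke stroke = createStroke(1, new float[] {0.001f}, 0, 0.1f);
        System.out.println("dash used: " + stroke.getDashArray()[0]);
    }
}
```

Clamping changes the appearance slightly compared to the original pattern, so whether it is acceptable depends on the use case; falling back to a solid line would be another option.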


> Certain PDF cannot be processed
> -------------------------------
>
>                 Key: PDFBOX-5397
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5397
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.24, 2.0.25
>            Reporter: Tobias Hugendubel
>            Priority: Blocker
>             Fix For: 2.0.26, 3.0.0 PDFBox
>
>         Attachments: TET_5_4xxx_GR_00_00_XX_14_F.pdf, TSA_5_4xxx_SH_XX_05_XX_03_F.pdf, TSA_5_4xxx_SH_XX_05_XX_03_F1.jpg, TSA_5_BF2x_GR_-1_01_XX_09_F.pdf, image-2022-03-23-12-25-56-963.png, image-2022-03-23-12-29-03-735.png
>
>
> !https://cdn.discordapp.com/attachments/381016918703996928/955833631484563566/unknown.png|width=570,height=291!
> For certain PDFs where we use PDFBox to open a PDF, scan for defined dummy QR codes on it, and then replace the dummy with a real QR code, we either get the above error, or the process does not terminate.
> Sample files are TET_5_4xxx_GR_00_00_XX_14_F.pdf and TSA_5_BF2x_GR_-1_01_XX_09_F.pdf.
> They both lead to the above problem.
> Our own analysis so far is that it might be the same issue as mentioned in [https://stackoverflow.com/questions/69237146/pdfbox-renderimagewithdpi-hangs-sometimes]
>  
> For our company PMG Projektraum GmbH, Munich, Germany, this is an essential function. Currently users cannot download PDFs in most cases because of this: The download tries to add a QR code and never ends.
> It could be that the QR code on these PDFs causes issues because it is in a certain layer. We observed that in Adobe Reader we see a QR code, but with our own viewer it is invisible:
> This is TSA_5_4xxx_SH_XX_05_XX_03_F.pdf,
> where we have the same issues.
> Any idea what this might be and how to solve it?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
