Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2014/10/08 19:16:34 UTC
[jira] [Comment Edited] (PDFBOX-2401) Image has wrong colors after Merge

    [ https://issues.apache.org/jira/browse/PDFBOX-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163804#comment-14163804 ] 

Tilman Hausherr edited comment on PDFBOX-2401 at 10/8/14 5:16 PM:
------------------------------------------------------------------

I did some more research:
- all flate compressed streams are identical in the good and bad files
- I found a difference in the indexed colorspace. It was sort of "invisible" in PDFDebugger because it consists only of 0 and FF. In the original "13-17" file, one starts with 000000FF00FFFF and another starts with 000000FF000000. In the "double" file, all start with 000000FF000000.
- something is weird in PDFCloneUtility.cloneForNewDocument(): the hex string is not cloned every time; the method thinks it has already been cloned?!
- the hash code of the COSString with the hex value is 0 ??????
- however, a hash is only an aid, and identical hashes don't imply equality. A look into COSString shows that a String comparison is done. Further tracing shows that these different strings are considered identical by Java.
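
To illustrate the effect described in the list above, here is a minimal sketch. It is not PDFBox code: the class TextKey, the method cloneOnce and the US-ASCII decoding are assumptions chosen only to show how a map of already cloned objects can hand back the wrong clone when equals() compares a lossy text form instead of the raw bytes. The actual decoding inside COSString may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class CloneCacheDemo {
    /** Hypothetical stand-in for a string object whose equals() compares decoded text. */
    static final class TextKey {
        final byte[] bytes;

        TextKey(byte[] bytes) {
            this.bytes = bytes.clone();
        }

        // Lossy on purpose (assumption for this sketch):
        // US-ASCII decoding maps every byte >= 0x80 to U+FFFD.
        String text() {
            return new String(bytes, StandardCharsets.US_ASCII);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof TextKey && text().equals(((TextKey) o).text());
        }

        @Override
        public int hashCode() {
            return text().hashCode();
        }
    }

    /** Returns the cached clone if the map claims the object was already cloned. */
    static TextKey cloneOnce(Map<TextKey, TextKey> clonedVersion, TextKey base) {
        TextKey existing = clonedVersion.get(base);
        if (existing != null) {
            return existing; // "we are done, it has already been converted"
        }
        TextKey copy = new TextKey(base.bytes);
        clonedVersion.put(base, copy);
        return copy;
    }

    public static void main(String[] args) {
        Map<TextKey, TextKey> clonedVersion = new HashMap<>();
        TextKey a = new TextKey(new byte[] {0x00, (byte) 0x80});
        TextKey b = new TextKey(new byte[] {0x00, (byte) 0xFF});
        TextKey cloneA = cloneOnce(clonedVersion, a);
        TextKey cloneB = cloneOnce(clonedVersion, b);
        System.out.println("bytes equal? " + Arrays.equals(a.bytes, b.bytes));
        System.out.println("same clone returned? " + (cloneA == cloneB));
    }
}
```

In this sketch the second object gets the clone of the first one even though the bytes differ, which is the same kind of mix-up as the colorspace data being replaced during the merge.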

Here is the relevant debug code I used. The first line is existing code; it runs when the cloner believes that the object already exists in its map of already cloned objects.
{code}
 //we are done, it has already been converted.
if (base instanceof COSString)
{
    System.out.println("WTF!?");
    COSString str1 = (COSString) base;
    System.out.println("c1: " + str1.getHexString().substring(0, 20));
    System.out.println("c1 hash: " + str1.hashCode());
    System.out.println("c1 str hash: " + str1.getString().hashCode());
    System.out.println("c1 hex hash: " + str1.getHexString().hashCode());
    COSString str2 = (COSString) clonedVersion.get(base);
    System.out.println("c2: " + str2.getHexString().substring(0, 20));
    System.out.println("c2 hash: " + str2.hashCode());
    System.out.println("c2 str hash: " + str2.getString().hashCode());
    System.out.println("c2 hex hash: " + str2.getHexString().hashCode());
    System.out.println("are they equal? " + str1.getString().equals(str2.getString()));
}
{code}
The output:
{code}
WTF!?
c1: 000000FF00FFFF000000
c1 hash: 0
c1 str hash: 0
c1 hex hash: -215576448
c2: 000000FF000000FF0000
c2 hash: 0
c2 str hash: 0
c2 hex hash: -1354755968
are they equal? true
{code}

1st solution: don't make a string compare if forceHexForm is true. Doesn't work, because forceHexForm is only set in signatures.

2nd solution: force hex comparison => merge works

3rd solution: remember if the string has weird content => not possible, COSStrings can be changed after construction.

4th solution: use byte compare => merge works
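
For illustration, a byte comparison could look roughly like the sketch below. This is not the committed change: ByteString and fromHex are hypothetical names, and the two sample values are just the 20 hex digits printed in the debug output above.

```java
import java.util.Arrays;

public class ByteCompareDemo {
    /** Hypothetical wrapper; not the actual COSString implementation. */
    static final class ByteString {
        private final byte[] bytes;

        ByteString(byte[] bytes) {
            this.bytes = bytes.clone();
        }

        // Equality and hash are both derived from the raw bytes,
        // so no text decoding is involved.
        @Override
        public boolean equals(Object o) {
            return o instanceof ByteString && Arrays.equals(bytes, ((ByteString) o).bytes);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(bytes);
        }
    }

    /** Parses an even-length hex string such as "00FF" into bytes. */
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        ByteString c1 = new ByteString(fromHex("000000FF00FFFF000000"));
        ByteString c2 = new ByteString(fromHex("000000FF000000FF0000"));
        System.out.println("are they equal? " + c1.equals(c2)); // prints "are they equal? false"
    }
}
```

Deriving hashCode() from the same bytes keeps it consistent with equals(), which matters for any object used as a map key.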

I committed solution 4.


was (Author: tilman):
I'll run the rendering tests and then I'll commit solution 4.

> Image has wrong colors after Merge
> ----------------------------------
>
>                 Key: PDFBOX-2401
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2401
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel, Utilities
>    Affects Versions: 2.0.0
>            Reporter: Tilman Hausherr
>            Assignee: Tilman Hausherr
>         Attachments: michael levine.pdf, p13-17.pdf, p13-17double.pdf
>
>
> Marc Davis from the user mailing list has provided a file (michael levine.pdf) that, when merged with another file, has a black image on page 17 ("TL-9"). I tried to investigate / narrow this somewhat:
> - it happens with any other file, or just use the michael levine file twice
> - extracting p17 with PDFSplit and then merging the result doesn't do it
> - extracting p1-17 with PDFSplit and then merging the result does do it
> - extracting p13-17 with PDFSplit and then merging the result does do it, although the black is now on the first page
> The page is not really "black", the colors are incorrect.
> That's all I found out until now. I compared the two files with PDFDebugger and can't see any obvious differences. I looked into the files with Notepad++; there are some differences, such as the colorspace now being indirect.


