Posted to users@pdfbox.apache.org by Andrea Canu <an...@gmail.com> on 2016/06/08 11:27:32 UTC

Invalid signed content from PDSignature

Hi guys

I want to ask you about the correct way to get the signed content from the
signature.
Until now I've used this method of the PDSignature class:

signature.getSignedContent(pdfInputStream)

With this method I'm able to extract from the pdfInputStream the byte array
of the signed content, based on the signature's ByteRange.

I've noticed that if I try to verify the signature against that byte array,
the verification sometimes fails unexpectedly!

Now, looking at the COSParser class, I've found this method:

COSParser.parseHeader


While trying to find the document's header, this method is able to skip
some garbage in the PDF document by looking for the markers "%PDF-" and
"%FDF-".

I've noticed that the signature verification succeeds if I skip that
garbage during the signed-content extraction.

My question is:
Why is this garbage handling not also present in the getSignedContent code?

The workaround I found is to skip that garbage manually in the
pdfInputStream, but now the problem is how to correctly calculate the
offset to skip in the pdfInputStream.
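
A minimal sketch of one way to calculate such an offset in plain Java: search
the raw bytes for the first "%PDF-" marker and treat its index as the number
of junk bytes. The names PdfHeaderOffset and find are hypothetical and not
part of PDFBox; the sketch assumes the whole file fits into a byte array and
only looks for "%PDF-", not "%FDF-".

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of PDFBox. */
final class PdfHeaderOffset {
    private static final byte[] PDF_MARKER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns the index of the first "%PDF-" marker, or -1 if there is none. */
    static int find(byte[] data) {
        for (int i = 0; i + PDF_MARKER.length <= data.length; i++) {
            int j = 0;
            // compare the marker byte by byte at position i
            while (j < PDF_MARKER.length && data[i + j] == PDF_MARKER[j]) {
                j++;
            }
            if (j == PDF_MARKER.length) {
                return i;
            }
        }
        return -1;
    }
}
```

A result of 0 means the file starts with the header; any positive result is
the number of bytes in front of it.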

Any suggestions?

Kind regards
Andrea.

Re: Invalid signed content from PDSignature

Posted by Tilman Hausherr <TH...@t-online.de>.
Hello Andrea,

There is no "strict" mode. We process even the crappiest PDFs, because
many people insist that their files are OK ("but they display with
Adobe Reader!").

All you could do is insist on PDF/A; for that we have PDFBox Preflight.
But we don't have anything that just checks against the full
PDF specification.

Checking that PDFs start with "%PDF" should be easy, i.e. you don't need 
PDFBox for that.
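
A minimal sketch of such a check in plain Java, assuming the first bytes of
the file have already been read into a byte array. StrictHeaderCheck and
startsWithPdfHeader are hypothetical names; the sketch compares against
"%PDF-", the marker quoted elsewhere in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper, not part of PDFBox. */
final class StrictHeaderCheck {
    private static final byte[] MARKER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** True only if the very first bytes are "%PDF-", i.e. nothing precedes the header. */
    static boolean startsWithPdfHeader(byte[] data) {
        return data.length >= MARKER.length
                && Arrays.equals(Arrays.copyOf(data, MARKER.length), MARKER);
    }
}
```

A file that fails this check could be rejected before it is ever handed to
the parser.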

Tilman

On 10.06.2016 at 12:29, Andrea Canu wrote:
> Hi Tilman, you are correct!
> My file is a ZIP file which contains three signed PDF documents.
>
> But now I'm in trouble again.
>
> Reading this stream with the two classes PDDocument and PDFParser, I'm not
> able to detect whether some "header junk" was skipped by the parser! In
> that case, every PDSignature extracted from the resulting PDDocument
> refers to a byte range with invalid offsets.
>
> Is it possible to read the PDF stream in a "strict mode"? This capability
> would be useful to detect whether a PDF is not "clean".
>
> Alternatively, the PDDocument class could provide a new method that
> returns the signed content for a given PDSignature.
>
>
> Andrea
>
> P.S.
> No, the signature validation I'm referring to is performed by a
> well-known commercial library.
>


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: Invalid signed content from PDSignature

Posted by Andrea Canu <an...@gmail.com>.
Hi Tilman, you are correct!
My file is a ZIP file which contains three signed PDF documents.

But now I'm in trouble again.

Reading this stream with the two classes PDDocument and PDFParser, I'm not
able to detect whether some "header junk" was skipped by the parser! In
that case, every PDSignature extracted from the resulting PDDocument
refers to a byte range with invalid offsets.

Is it possible to read the PDF stream in a "strict mode"? This capability
would be useful to detect whether a PDF is not "clean".

Alternatively, the PDDocument class could provide a new method that
returns the signed content for a given PDSignature.


Andrea

P.S.
No, the signature validation I'm referring to is performed by a
well-known commercial library.

On Thu, Jun 9, 2016 at 6:24 PM, Tilman Hausherr <TH...@t-online.de>
wrote:

> Hello Andrea,
>
> I disagree - IMHO your PDF is incorrect. "PK" means that it is a ZIP file,
> apparently with an uncompressed PDF in it (yes, ZIP can contain
> uncompressed files). Of course one could adjust the offsets, but this
> wouldn't be right: the PDF has been modified, the PK header has been added.
> Try renaming that file and then clicking on it to confirm my theory that
> it is really a ZIP file.
>
> (I suspect you'll tell me that it validates with Adobe Reader. If so, then
> I'd say Adobe is wrong. I just tried adding "XXXX" in front of a file with
> Notepad++ and Adobe does not report that the file was modified.)
>
> The good thing is that there is no bug in COSFilterInputStream (I was
> afraid of that), so I'll use getSignedContent() in the signature example
> instead of the code I have now.
>
> Tilman
>
>

Re: Invalid signed content from PDSignature

Posted by Tilman Hausherr <TH...@t-online.de>.
Hello Andrea,

I disagree - IMHO your PDF is incorrect. "PK" means that it is a ZIP
file, apparently with an uncompressed PDF in it (yes, ZIP can contain
uncompressed files). Of course one could adjust the offsets, but this
wouldn't be right: the PDF has been modified, the PK header has been
added. Try renaming that file and then clicking on it to confirm my
theory that it is really a ZIP file.
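
The same theory can also be checked programmatically. The following is a
minimal sketch, assuming the first bytes of the file are available as a byte
array: a ZIP local file header starts with the four bytes 0x50 0x4B 0x03 0x04
("PK" followed by 3 and 4). ZipSignature and looksLikeZip are hypothetical
names, and the sketch deliberately ignores other ZIP record types such as
the one that starts an empty archive.

```java
/** Hypothetical helper, not part of PDFBox. */
final class ZipSignature {
    /** True if the data starts with the ZIP local file header signature "PK\3\4". */
    static boolean looksLikeZip(byte[] data) {
        return data.length >= 4
                && data[0] == 0x50   // 'P'
                && data[1] == 0x4B   // 'K'
                && data[2] == 0x03
                && data[3] == 0x04;
    }
}
```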

(I suspect you'll tell me that it validates with Adobe Reader. If so,
then I'd say Adobe is wrong. I just tried adding "XXXX" in front of a
file with Notepad++ and Adobe does not report that the file was modified.)

The good thing is that there is no bug in COSFilterInputStream (I was 
afraid of that), so I'll use getSignedContent() in the signature example 
instead of the code I have now.

Tilman


On 09.06.2016 at 10:45, Andrea Canu wrote:
> Hi Tilman,
> thank you for your answer.
>
> The PDF is a real document so I can't share it, but I can give you an
> extract.
>
> These are the first 1044 bytes of the document:
> --------------------------------------------------------------
>
>
>
> *PK      ¹Js: ¼àð3£ 3£ <   CAACT-00-00-08 document.pdf*%PDF-1.6
> %âãÏÓ
> 3582 0 obj
> <</Linearized 1/L 697139/O 3585/E 118808/N 42/T 625450/H [ 1000 1986]>>
> endobj
>
> xref
> 3582 34
> 0000000016 00000 n
> 0000003154 00000 n
> 0000003481 00000 n
> 0000003680 00000 n
> 0000004019 00000 n
> 0000004048 00000 n
> 0000004265 00000 n
> 0000004495 00000 n
> 0000004765 00000 n
> 0000004950 00000 n
> 0000006189 00000 n
> 0000007372 00000 n
> 0000007629 00000 n
> 0000060752 00000 n
> 0000061525 00000 n
> 0000062245 00000 n
> 0000062284 00000 n
> 0000062509 00000 n
> 0000062740 00000 n
> 0000062819 00000 n
> 0000064540 00000 n
> 0000064945 00000 n
> 0000065082 00000 n
> 0000065306 00000 n
> 0000065606 00000 n
> 0000072471 00000 n
> 0000075166 00000 n
> 0000078960 00000 n
> 0000079194 00000 n
> 0000079411 00000 n
> 0000118645 00000 n
> 0000118722 00000 n
> 0000002986 00000 n
> 0000001000 00000 n
> trailer
> <</Size 3616/Prev 625437/XRefStm 2986/Root 3583 0 R/Info 3580 0
> R/ID[<A71F76F2A24FB6D888EDCB04CB86B815><6CCE97BD63E74F479ED22F39881647F0>]>>
> startxref
> 0
> %%EOF
>
> .....
> --------------------------------------------------------------
>
> I would like to draw your attention to the first 60 bytes.
> Those bytes are stripped out by COSParser, which skips them as garbage.
> The method that skips those bytes is:
>
> COSParser.parseHeader(PDF_HEADER, PDF_DEFAULT_VERSION)
>
> ....
> private static final String PDF_HEADER = "%PDF-";
>
>
> I've noticed that I also have to skip those 60 bytes manually in the
> pdfInputStream before calling the method
>
> signature.getSignedContent(pdfInputStream)
>
> This way, the digest (hash) of the returned byte array and the hash inside
> the signature match.
>
>
> Andrea
>
>




Re: Invalid signed content from PDSignature

Posted by Andrea Canu <an...@gmail.com>.
Hi Tilman,
thank you for your answer.

The PDF is a real document so I can't share it, but I can give you an
extract.

These are the first 1044 bytes of the document:
--------------------------------------------------------------



*PK      ¹Js: ¼àð3£ 3£ <   CAACT-00-00-08 document.pdf*%PDF-1.6
%âãÏÓ
3582 0 obj
<</Linearized 1/L 697139/O 3585/E 118808/N 42/T 625450/H [ 1000 1986]>>
endobj

xref
3582 34
0000000016 00000 n
0000003154 00000 n
0000003481 00000 n
0000003680 00000 n
0000004019 00000 n
0000004048 00000 n
0000004265 00000 n
0000004495 00000 n
0000004765 00000 n
0000004950 00000 n
0000006189 00000 n
0000007372 00000 n
0000007629 00000 n
0000060752 00000 n
0000061525 00000 n
0000062245 00000 n
0000062284 00000 n
0000062509 00000 n
0000062740 00000 n
0000062819 00000 n
0000064540 00000 n
0000064945 00000 n
0000065082 00000 n
0000065306 00000 n
0000065606 00000 n
0000072471 00000 n
0000075166 00000 n
0000078960 00000 n
0000079194 00000 n
0000079411 00000 n
0000118645 00000 n
0000118722 00000 n
0000002986 00000 n
0000001000 00000 n
trailer
<</Size 3616/Prev 625437/XRefStm 2986/Root 3583 0 R/Info 3580 0
R/ID[<A71F76F2A24FB6D888EDCB04CB86B815><6CCE97BD63E74F479ED22F39881647F0>]>>
startxref
0
%%EOF

.....
--------------------------------------------------------------

I would like to draw your attention to the first 60 bytes.
Those bytes are stripped out by COSParser, which skips them as garbage.
The method that skips those bytes is:

COSParser.parseHeader(PDF_HEADER, PDF_DEFAULT_VERSION)

....
private static final String PDF_HEADER = "%PDF-";


I've noticed that I also have to skip those 60 bytes manually in the
pdfInputStream before calling the method

signature.getSignedContent(pdfInputStream)

This way, the digest (hash) of the returned byte array and the hash inside
the signature match.
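
A minimal sketch of the manual skip in plain Java. InputStream.skip may skip
fewer bytes than requested, so the loop repeats until the whole prefix is
consumed. StreamSkipper and skipFully are hypothetical names, not PDFBox API;
the prefix length (60 for the extract above) has to be supplied by the caller.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper, not part of PDFBox. */
final class StreamSkipper {
    /** Skips exactly n bytes, or throws EOFException if the stream ends first. */
    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() == -1) {
                // skip() made no progress and read() confirms end of stream
                throw new EOFException("stream ended with " + remaining + " bytes left to skip");
            } else {
                remaining--; // read() consumed one byte
            }
        }
    }
}
```

After a call such as StreamSkipper.skipFully(pdfInputStream, 60), the stream
is positioned at "%PDF-" and can be passed to
signature.getSignedContent(pdfInputStream).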


Andrea


On Wed, Jun 8, 2016 at 6:06 PM, Tilman Hausherr <TH...@t-online.de>
wrote:

> On 08.06.2016 at 13:27, Andrea Canu wrote:
>
>> Hi guys
>>
>> I want to ask you about the correct way to get the signed content from the
>> signature.
>> Until now I've used this method of the PDSignature class:
>>
>> signature.getSignedContent(pdfInputStream)
>>
>> With this method I'm able to extract from the pdfInputStream the byte array
>> of the signed content, based on the signature's ByteRange.
>>
>> I've noticed that if I try to verify the signature against that byte array,
>> the verification sometimes fails unexpectedly!
>>
>
> Hello Andrea,
>
> Can you share the PDF (upload it)?
>
> I doubt your theory re: bug in COSParser. I'd rather check whether there
> is a bug in COSFilterInputStream.
>
> If you can't share the PDF, then please extract the bytes "the hard way":
>
>     // read the signed content, described in the /ByteRange COSArray:
>     // [offset1 len1 offset2 len2]
>     // (needs java.io.RandomAccessFile)
>     int[] byteRange = sig.getByteRange();
>     byte[] buf = new byte[byteRange[1] + byteRange[3]];
>     RandomAccessFile raf = new RandomAccessFile(infile, "r");
>     raf.seek(byteRange[0]);
>     // the second argument is the offset into buf, not into the file
>     raf.readFully(buf, 0, byteRange[1]);
>     raf.seek(byteRange[2]);
>     raf.readFully(buf, byteRange[1], byteRange[3]);
>     raf.close();
>
> This code is not fully correct, because /ByteRange might have more than 4
> elements. So have a look at it to be sure.
>
> Then compare the byte array "buf" with the one from getSignedContent.
>
> Another possible reason for the failure is that there are different
> signature methods. See the code at
>
> https://svn.apache.org/viewvc/pdfbox/branches/2.0/examples/src/main/java/org/apache/pdfbox/examples/signature/ShowSignature.java?view=markup
>
> I didn't use getSignedContent() there but I think I should. So I'd be very
> interested to find out if there is a bug there.
>
> Tilman
>
>
>> Now, looking at the COSParser class, I've found this method:
>>
>> COSParser.parseHeader
>>
>>
>> While trying to find the document's header, this method is able to skip
>> some garbage in the PDF document by looking for the markers "%PDF-" and
>> "%FDF-".
>>
>> I've noticed that the signature verification succeeds if I skip that
>> garbage during the signed-content extraction.
>>
>> My question is:
>> Why is this garbage handling not also present in the getSignedContent
>> code?
>>
>> The workaround I found is to skip that garbage manually in the
>> pdfInputStream, but now the problem is how to correctly calculate the
>> offset to skip in the pdfInputStream.
>>
>> Any suggestions?
>>
>> Kind regards
>> Andrea.
>>
>>
>
>
>

Re: Invalid signed content from PDSignature

Posted by Tilman Hausherr <TH...@t-online.de>.
On 08.06.2016 at 13:27, Andrea Canu wrote:
> Hi guys
>
> I want to ask you about the correct way to get the signed content from the
> signature.
> Until now I've used this method of the PDSignature class:
>
> signature.getSignedContent(pdfInputStream)
>
> With this method I'm able to extract from the pdfInputStream the byte array
> of the signed content, based on the signature's ByteRange.
>
> I've noticed that if I try to verify the signature against that byte array,
> the verification sometimes fails unexpectedly!

Hello Andrea,

Can you share the PDF (upload it)?

I doubt your theory re: bug in COSParser. I'd rather check whether there is
a bug in COSFilterInputStream.

If you can't share the PDF, then please extract the bytes "the hard way":

    // read the signed content, described in the /ByteRange COSArray:
    // [offset1 len1 offset2 len2]
    // (needs java.io.RandomAccessFile)
    int[] byteRange = sig.getByteRange();
    byte[] buf = new byte[byteRange[1] + byteRange[3]];
    RandomAccessFile raf = new RandomAccessFile(infile, "r");
    raf.seek(byteRange[0]);
    // the second argument is the offset into buf, not into the file
    raf.readFully(buf, 0, byteRange[1]);
    raf.seek(byteRange[2]);
    raf.readFully(buf, byteRange[1], byteRange[3]);
    raf.close();

This code is not fully correct, because /ByteRange might have more than 
4 elements. So have a look at it to be sure.
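
A minimal sketch of how the same read could handle any number of
offset/length pairs, in plain Java. ByteRangeReader and read are hypothetical
names, not PDFBox API; the sketch assumes the array comes from
sig.getByteRange() and that the total length fits into an int.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical helper, not part of PDFBox. */
final class ByteRangeReader {
    /** Concatenates the file regions described by [offset1 len1 offset2 len2 ...]. */
    static byte[] read(RandomAccessFile raf, int[] byteRange) throws IOException {
        if (byteRange.length % 2 != 0) {
            throw new IllegalArgumentException("ByteRange must hold offset/length pairs");
        }
        int total = 0;
        for (int i = 1; i < byteRange.length; i += 2) {
            total += byteRange[i]; // sum of all lengths
        }
        byte[] buf = new byte[total];
        int pos = 0; // write position inside buf
        for (int i = 0; i < byteRange.length; i += 2) {
            raf.seek(byteRange[i]);
            raf.readFully(buf, pos, byteRange[i + 1]);
            pos += byteRange[i + 1];
        }
        return buf;
    }
}
```

The result can then be compared byte by byte with the array returned by
getSignedContent.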

Then compare the byte array "buf" with the one from getSignedContent.

Another possible reason for the failure is that there are different
signature methods. See the code at
https://svn.apache.org/viewvc/pdfbox/branches/2.0/examples/src/main/java/org/apache/pdfbox/examples/signature/ShowSignature.java?view=markup

I didn't use getSignedContent() there but I think I should. So I'd be
very interested to find out if there is a bug there.

Tilman

>
> Now, looking at the COSParser class, I've found this method:
>
> COSParser.parseHeader
>
>
> While trying to find the document's header, this method is able to skip
> some garbage in the PDF document by looking for the markers "%PDF-" and
> "%FDF-".
>
> I've noticed that the signature verification succeeds if I skip that
> garbage during the signed-content extraction.
>
> My question is:
> Why is this garbage handling not also present in the getSignedContent
> code?
>
> The workaround I found is to skip that garbage manually in the
> pdfInputStream, but now the problem is how to correctly calculate the
> offset to skip in the pdfInputStream.
>
> Any suggestions?
>
> Kind regards
> Andrea.
>

