Posted to users@pdfbox.apache.org by "Cary L. Schofield" <ca...@eesoh.com> on 2014/02/21 23:47:01 UTC

PDFParser Conflict Resolution

I have a signed document that is getting parsed incorrectly.

Using PDFParser, the document form is missing all of its fields and I can't get 
to the signature fields.
Using NonSequentialPDFParser, I can get to the signature fields, but the 
signed data appears to be corrupted.

I was able to determine that the form was being replaced or corrupted 
during conflict resolution.

I solved the problem by patching PDFParser.ConflictObj to ignore an 
object in the conflict list when the existing object (from the object 
pool) is a direct object.

I know I should do the research, but was hoping someone would already 
know if the patch is reasonable or likely to cause more/other problems.

Thanks

Re: PDFParser Conflict Resolution

Posted by "Cary L. Schofield" <ca...@eesoh.com>.

Thanks for your reply.  I have followed your recommendation.  There was a 
TODO in NonSequentialPDFParser indicating that signature contents 
are not encrypted and thus should not be decrypted.  I have added code to 
skip decryption in this case, and my documents now seem to be parsed correctly.
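
For anyone reading along, here is a rough sketch of the idea, not the actual change. It is a toy model using only the JDK: a dictionary is a plain Map, strings are byte arrays, and toyDecrypt is a placeholder XOR rather than a real cipher. The class name, the method names, and the rule for detecting a signature dictionary (Type equal to Sig) are all assumptions made for illustration and are not PDFBox API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class SkipSignatureContents {

    /** Placeholder "cipher" (XOR with a fixed byte). Hypothetical; stands in for the real decryption. */
    static byte[] toyDecrypt(byte[] data) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ 0x5A);
        }
        return out;
    }

    /** Assumed detection rule for this sketch: the dictionary has Type = Sig. */
    static boolean isSignatureDictionary(Map<String, Object> dict) {
        return "Sig".equals(dict.get("Type"));
    }

    /** Decrypts every string value, except Contents inside a signature dictionary. */
    static Map<String, Object> decryptStrings(Map<String, Object> dict) {
        boolean isSig = isSignatureDictionary(dict);
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : dict.entrySet()) {
            Object value = e.getValue();
            boolean skip = isSig && "Contents".equals(e.getKey());
            if (value instanceof byte[] && !skip) {
                result.put(e.getKey(), toyDecrypt((byte[]) value));
            } else {
                result.put(e.getKey(), value); // names, numbers, and the raw signature bytes
            }
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] signatureBytes = new byte[] {0x30, (byte) 0x82};
        Map<String, Object> sig = new LinkedHashMap<>();
        sig.put("Type", "Sig");
        sig.put("Contents", signatureBytes); // stored unencrypted
        // XOR is its own inverse, so this produces the "encrypted" form of the string.
        sig.put("Reason", toyDecrypt("approved".getBytes(StandardCharsets.US_ASCII)));

        Map<String, Object> out = decryptStrings(sig);
        System.out.println(Arrays.equals((byte[]) out.get("Contents"), signatureBytes));
        System.out.println(new String((byte[]) out.get("Reason"), StandardCharsets.US_ASCII));
    }
}
```

The point is only that running the decryption over bytes that were never encrypted scrambles them, which would match the corrupted signed data I was seeing.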

Thanks again.


On 02/22/2014 09:23 AM, Maruan Sahyoun wrote:
> Hi,
>
> The PDFParser works sequentially through the file from top to bottom and collects all objects. Conflict resolution assumes that if an object with the same number appears again later in the file, the later one is the correct one.
>
> NonSequentialPDFParser works through the file by following the xref information (table or stream). This is in line with the PDF specification.
>
> So patching as you’ve done might resolve your issue, but it might also introduce issues with other files. The best approach would be to find out why NonSequentialPDFParser has trouble parsing your file. If you think it’s a bug, please open an issue in JIRA [https://issues.apache.org/jira/browse/PDFBOX] and attach the PDF file together with some sample code.
>
> BR
> Maruan Sahyoun

Re: PDFParser Conflict Resolution

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.

Hi,

The PDFParser works sequentially through the file from top to bottom and collects all objects. Conflict resolution assumes that if an object with the same number appears again later in the file, the later one is the correct one.

NonSequentialPDFParser works through the file by following the xref information (table or stream). This is in line with the PDF specification.
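
To make the difference concrete, here is a minimal sketch of the two strategies on a toy model. It uses only the JDK and is not PDFBox code: ObjDef, sequential, and byXref are hypothetical names, and the "file" is just a list of object definitions with made-up byte offsets. The example assumes a file in which a second definition of object 5 appears later in the file but is not the one the xref points to.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParserStrategies {

    /** One "N 0 obj" definition found at a byte offset (toy model, not a PDFBox type). */
    static final class ObjDef {
        final int number;
        final long offset;
        final String body;

        ObjDef(int number, long offset, String body) {
            this.number = number;
            this.offset = offset;
            this.body = body;
        }
    }

    /** Sequential strategy: scan top to bottom; a later definition replaces an earlier one. */
    static Map<Integer, String> sequential(List<ObjDef> fileOrder) {
        Map<Integer, String> pool = new LinkedHashMap<>();
        for (ObjDef d : fileOrder) {
            pool.put(d.number, d.body); // last definition wins
        }
        return pool;
    }

    /** Xref strategy: keep only the definition located at the offset the xref names. */
    static Map<Integer, String> byXref(List<ObjDef> fileOrder, Map<Integer, Long> xref) {
        Map<Integer, String> pool = new LinkedHashMap<>();
        for (ObjDef d : fileOrder) {
            Long wanted = xref.get(d.number);
            if (wanted != null && wanted == d.offset) {
                pool.put(d.number, d.body);
            }
        }
        return pool;
    }

    public static void main(String[] args) {
        List<ObjDef> file = Arrays.asList(
                new ObjDef(5, 100L, "form with fields"),
                new ObjDef(7, 200L, "signature field"),
                new ObjDef(5, 900L, "form without fields")); // same number, later in the file

        Map<Integer, Long> xref = new LinkedHashMap<>();
        xref.put(5, 100L);
        xref.put(7, 200L);

        System.out.println("sequential: " + sequential(file).get(5));
        System.out.println("xref:       " + byXref(file, xref).get(5));
    }
}
```

In this made-up file the sequential scan ends up with the later copy of object 5, while the xref lookup keeps the copy at offset 100. For a file whose xref points to the latest copy of every object, both strategies would agree.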

So patching as you’ve done might resolve your issue, but it might also introduce issues with other files. The best approach would be to find out why NonSequentialPDFParser has trouble parsing your file. If you think it’s a bug, please open an issue in JIRA [https://issues.apache.org/jira/browse/PDFBOX] and attach the PDF file together with some sample code.

BR
Maruan Sahyoun
