Posted to users@pdfbox.apache.org by Marc Davis <ma...@gmail.com> on 2014/10/01 15:21:37 UTC

Mollify PDF before merging

I use PDFBox to merge PDF files, but we find that many files from scanners or files generated by AutoCAD do not merge properly (they are either blank or missing fonts). However, when we open and save the file in a native reader such as Adobe Reader (Windows) or Preview (Mac), and then merge again, the merge works fine!

Is there a workaround for this in PDFBox?

Thanks,
Marc




Re: Mollify PDF before merging

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
There are samples in the examples package.

Maruan Sahyoun

> On 02.10.2014 at 20:43, Ivan <cu...@gmx.com> wrote:
> 
> Is there an example of how to sign a PDF with PDFBox?
> Best Regards,
> 
> Ivan
> 

Re: Mollify PDF before merging

Posted by Ivan <cu...@gmx.com>.
Is there an example of how to sign a PDF with PDFBox?
Best Regards,

Ivan
On Oct 2, 2014, at 1:28 PM, Marc Davis <ma...@gmail.com> wrote:

> Tim, 1.8.7 seems to have fixed all our issues!  Thanks so much for recommending this.
> 
> We do have two images that seem troublesome:
> 
> https://www.dropbox.com/s/35lafjdqrt7vy3e/michael%20levine.pdf?dl=0 (after merging TL-9 page is black)
> https://www.dropbox.com/s/dwlxoj2hpvbnr5i/badform.pdf?dl=0 (file is password protected, does PDFBox have a way around this?)
> 
> I’d love to hear your thoughts on this...
> 
> Thanks,
> Marc


Re: Mollify PDF before merging

Posted by Tilman Hausherr <TH...@t-online.de>.
Re problem 2:

I opened the file with Adobe Reader. In Properties, on the "Security" tab, 
the second entry (in German "Dokumentzusammenstellung", i.e. "Document 
Assembly"; I assume this covers merging) says "no". I take this to mean 
you shouldn't merge it. And neither should we.

Tilman

On 03.10.2014 at 21:39, Marc Davis wrote:
> Problem 2:
>
> We used PDFBox 1.8.7 to merge these two files:
> https://www.dropbox.com/s/7lnbdieo9t8k38e/good.pdf?dl=0
> https://www.dropbox.com/s/dwlxoj2hpvbnr5i/badform.pdf?dl=0
>
> The merge does not proceed due to password encryption of badform.pdf. Does PDFBox have a way to handle password-encrypted files? Strangely, the file can be opened normally (without the need to enter a password)!
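Tilman's reading of the Security tab can also be checked in code. In a PDF encrypted with the standard security handler, the user access permissions are stored as a 32-bit integer in the /P entry of the encryption dictionary, and the PDF specification (ISO 32000-1, Table 22) assigns bit 11 (counting from 1 at the least significant bit) to assembling the document (inserting, rotating or deleting pages) for security handlers of revision 3 or greater. That is the permission Adobe Reader shows as "Document Assembly". The following is a minimal plain-Java sketch that decodes such a value; the class and method names are hypothetical, and it assumes you already have the /P integer (it does not parse PDF files).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper: decodes the /P (user access permissions) value of a PDF
 * standard security handler. Bit numbers follow the PDF specification, where
 * bit 1 is the least significant bit.
 */
public class PdfPermissions {

    static final int PRINT = 3;
    static final int MODIFY = 4;
    static final int EXTRACT = 5;
    static final int ANNOTATE = 6;
    static final int FILL_FORMS = 9;
    static final int EXTRACT_ACCESSIBILITY = 10;
    static final int ASSEMBLE = 11;           // shown as "Document Assembly" in Adobe Reader
    static final int PRINT_HIGH_QUALITY = 12;

    /** True if the given 1-based bit is set in the permission value p. */
    static boolean isSet(int p, int bit) {
        return (p & (1 << (bit - 1))) != 0;
    }

    /** Assembly (insert/rotate/delete pages) permission for revision 3+ handlers. */
    static boolean canAssemble(int p) {
        return isSet(p, ASSEMBLE);
    }

    /** Readable summary of the permission bits, in specification order. */
    static Map<String, Boolean> describe(int p) {
        Map<String, Boolean> flags = new LinkedHashMap<>();
        flags.put("print", isSet(p, PRINT));
        flags.put("modify", isSet(p, MODIFY));
        flags.put("extract", isSet(p, EXTRACT));
        flags.put("annotate", isSet(p, ANNOTATE));
        flags.put("fillForms", isSet(p, FILL_FORMS));
        flags.put("extractForAccessibility", isSet(p, EXTRACT_ACCESSIBILITY));
        flags.put("assemble", isSet(p, ASSEMBLE));
        flags.put("printHighQuality", isSet(p, PRINT_HIGH_QUALITY));
        return flags;
    }

    public static void main(String[] args) {
        int allAllowed = -4;                                   // 0xFFFFFFFC: every permission bit set
        int noAssembly = allAllowed & ~(1 << (ASSEMBLE - 1));  // same value with bit 11 cleared

        System.out.println(noAssembly);               // -1028
        System.out.println(canAssemble(allAllowed));  // true
        System.out.println(canAssemble(noAssembly));  // false
        System.out.println(describe(noAssembly));
    }
}
```

A file whose user password is empty opens without a prompt, yet can still have the assembly bit cleared, which fits what Marc saw with badform.pdf. PDFBox itself wraps these bits in its AccessPermission class (for example canAssembleDocument()); check the Javadoc of the version you use for how to obtain it from a loaded document.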



Re: Mollify PDF before merging

Posted by Marc Davis <ma...@gmail.com>.
Oh, very nice… thanks for the heads-up!

Thanks,
Marc



On Oct 8, 2014, at 1:28 PM, Tilman Hausherr <TH...@t-online.de> wrote:

> And it has been fixed now. The cause was a JDK bug. Get a new jar file here in a few hours:
> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/1.8.8-SNAPSHOT/
> 
> Tilman
> 
> 


Re: Mollify PDF before merging

Posted by Tilman Hausherr <TH...@t-online.de>.
And it has been fixed now. The cause was a JDK bug. Get a new jar file 
here in a few hours:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/1.8.8-SNAPSHOT/

Tilman

On 04.10.2014 at 15:44, Tilman Hausherr wrote:
> Problem 1 is now here:
> https://issues.apache.org/jira/browse/PDFBOX-2401
>
> Tilman
>
>


Re: Mollify PDF before merging

Posted by Tilman Hausherr <TH...@t-online.de>.
Problem 1 is now here:
https://issues.apache.org/jira/browse/PDFBOX-2401

Tilman

On 03.10.2014 at 21:39, Marc Davis wrote:
> Problem 1:
>
> We use PDFBox 1.8.7 to merge these two files:
>
> https://www.dropbox.com/s/7lnbdieo9t8k38e/good.pdf?dl=0
> https://www.dropbox.com/s/35lafjdqrt7vy3e/michael%20levine.pdf?dl=0
>
> This is the resultant merged file:
> https://www.dropbox.com/s/gwmbd053269at0p/Merged%20PDF.pdf?dl=0
>
> The problem: page TL-9 appears black as shown here:
> https://www.dropbox.com/s/09bcw1h87f5hbyy/Screenshot%202014-10-03%20at%203.28.51%20PM.png?dl=0


Re: Mollify PDF before merging

Posted by Marc Davis <ma...@gmail.com>.
Tilman,

Please accept my sincere apologies for incorrectly calling you Tim!  This was a genuine oversight.

Here are my issues:

Problem 1:

We use PDFBox 1.8.7 to merge these two files:

https://www.dropbox.com/s/7lnbdieo9t8k38e/good.pdf?dl=0
https://www.dropbox.com/s/35lafjdqrt7vy3e/michael%20levine.pdf?dl=0

This is the resultant merged file:
https://www.dropbox.com/s/gwmbd053269at0p/Merged%20PDF.pdf?dl=0

The problem: page TL-9 appears black as shown here:
https://www.dropbox.com/s/09bcw1h87f5hbyy/Screenshot%202014-10-03%20at%203.28.51%20PM.png?dl=0
————————
Problem 2:

We used PDFBox 1.8.7 to merge these two files:
https://www.dropbox.com/s/7lnbdieo9t8k38e/good.pdf?dl=0
https://www.dropbox.com/s/dwlxoj2hpvbnr5i/badform.pdf?dl=0

The merge does not proceed due to password encryption of badform.pdf. Does PDFBox have a way to handle password-encrypted files? Strangely, the file can be opened normally (without the need to enter a password)!

We had another 8 files that did not merge properly with 1.8.6 but now merge fine with 1.8.7. Only the two issues above are outstanding.

Thanks,
Marc



On Oct 2, 2014, at 3:23 PM, Tilman Hausherr <TH...@t-online.de> wrote:

> On 02.10.2014 at 20:28, Marc Davis wrote:
>> Tim, 1.8.7 seems to have fixed all our issues!  Thanks so much for recommending this.
> 
> I'm "Tilman". "Tim" is a (very nice) committer from Apache TIKA, a project that does use PDFBox.
> 
>> We do have two images that seem troublesome:
>> 
>> https://www.dropbox.com/s/35lafjdqrt7vy3e/michael%20levine.pdf?dl=0 (after merging TL-9 page is black)
> 
> Then please post the other file, and the result. In other words, just assume we're dumb and lazy: please provide every file and every step that produces an error, and describe more rather than less. Even then, solutions may take some time:
> https://issues.apache.org/jira/browse/PDFBOX-1511
> took over a year and was a group effort of at least six people.
> 
> And there's a contradiction: you're writing "1.8.7 seems to have fixed all our issues", but then you're mentioning two new problems...
> 
>> https://www.dropbox.com/s/dwlxoj2hpvbnr5i/badform.pdf?dl=0 (file is password protected, does PDFBox have a way around this?)
> 
> I was able to display it in the browser. I didn't test it with PDFBox; some files are protected with an empty password. If you use the new non-sequential parser (loadNonSeq()), just pass "" as the extra password parameter. If you use load(), it is more complex: you then need openProtection() (download the source code to see how).
> 
> Tilman
> 
> 


Re: Mollify PDF before merging

Posted by Tilman Hausherr <TH...@t-online.de>.
On 02.10.2014 at 20:28, Marc Davis wrote:
> Tim, 1.8.7 seems to have fixed all our issues!  Thanks so much for recommending this.

I'm "Tilman". "Tim" is a (very nice) committer from Apache Tika, a 
project that uses PDFBox.

> We do have two images that seem troublesome:
>
> https://www.dropbox.com/s/35lafjdqrt7vy3e/michael%20levine.pdf?dl=0 (after merging TL-9 page is black)

Then please post the other file, and the result. In other words, just 
assume we're dumb and lazy: please provide every file and every step that 
produces an error, and describe more rather than less. Even then, 
solutions may take some time:
https://issues.apache.org/jira/browse/PDFBOX-1511
took over a year and was a group effort of at least six people.

And there's a contradiction: you're writing "1.8.7 seems to have fixed 
all our issues", but then you're mentioning two new problems...

> https://www.dropbox.com/s/dwlxoj2hpvbnr5i/badform.pdf?dl=0 (file is password protected, does PDFBox have a way around this?)

I was able to display it in the browser. I didn't test it with PDFBox; 
some files are protected with an empty password. If you use the new 
non-sequential parser (loadNonSeq()), just pass "" as the extra password 
parameter. If you use load(), it is more complex: you then need 
openProtection() (download the source code to see how).

Tilman

>
> I’d love to hear your thoughts on this...
>
> Thanks,
> Marc


Re: Mollify PDF before merging

Posted by Marc Davis <ma...@gmail.com>.
Tim, 1.8.7 seems to have fixed all our issues!  Thanks so much for recommending this.

We do have two images that seem troublesome:

https://www.dropbox.com/s/35lafjdqrt7vy3e/michael%20levine.pdf?dl=0 (after merging TL-9 page is black)
https://www.dropbox.com/s/dwlxoj2hpvbnr5i/badform.pdf?dl=0 (file is password protected, does PDFBox have a way around this?)

I’d love to hear your thoughts on this...

Thanks,
Marc



On Oct 1, 2014, at 3:18 PM, Tilman Hausherr <TH...@t-online.de> wrote:

> Then please retry with 1.8.7, because the problem should hopefully be fixed there (a problem related to identically named resources in both PDF files).
> 
> If it still happens, please open an issue in JIRA, and attach the two PDF files and the result. If the files are confidential, please try producing non-confidential files.
> 
> Tilman
> 
> 


Re: Mollify PDF before merging

Posted by Tilman Hausherr <TH...@t-online.de>.
Then please retry with 1.8.7, because the problem should hopefully be 
fixed there (a problem related to identically named resources in both PDF 
files).

If it still happens, please open an issue in JIRA, and attach the two 
PDF files and the result. If the files are confidential, please try 
producing non-confidential files.

Tilman

On 01.10.2014 at 21:13, Marc Davis wrote:
> I am using v1.8.6
>
> Thanks,
> Marc
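Tilman only hints at the cause of the 1.8.6 problem: "identically named resources in both PDF files". As a generic illustration (not PDFBox's actual code or fix), here is why that matters when name-keyed entries from two documents end up in one dictionary: a plain put() lets the second /Im0 silently replace the first, so pages from the first file could end up drawing the wrong object. One common remedy is to rename on collision and keep the old-to-new mapping so that references such as "/Im0 Do" in a content stream can be rewritten. The class and method names, the "_n" suffix scheme, and the use of Strings as stand-ins for resource objects are all assumptions made for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: combine two name-keyed resource maps without letting
 * identically named entries overwrite each other.
 */
public class ResourceNameMerge {

    /**
     * Copies every entry of incoming into target. When a name is already taken,
     * the entry is stored under the first free name of the form name_1, name_2, ...
     * The returned map lists old name -> new name for every renamed entry, which
     * a caller would use to rewrite references to those entries.
     */
    static Map<String, String> mergeInto(Map<String, Object> target,
                                         Map<String, Object> incoming) {
        Map<String, String> renames = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : incoming.entrySet()) {
            String name = entry.getKey();
            String newName = name;
            int counter = 1;
            // Probe until we find a name not yet used in the target.
            while (target.containsKey(newName)) {
                newName = name + "_" + counter++;
            }
            target.put(newName, entry.getValue());
            if (!newName.equals(name)) {
                renames.put(name, newName);
            }
        }
        return renames;
    }

    public static void main(String[] args) {
        // Strings stand in for real resource objects (images, fonts, ...).
        Map<String, Object> merged = new LinkedHashMap<>();
        merged.put("Im0", "image from the first file");
        merged.put("F1", "font from the first file");

        Map<String, Object> second = new LinkedHashMap<>();
        second.put("Im0", "image from the second file"); // same name, different object
        second.put("F2", "font from the second file");

        Map<String, String> renames = mergeInto(merged, second);
        System.out.println(merged);
        System.out.println(renames); // {Im0=Im0_1}
    }
}
```

Renaming alone is not enough in a real merger: every content stream, form XObject or annotation appearance that refers to a renamed entry has to be rewritten with the same mapping, or the reference would point at the wrong object or at nothing.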


Re: Mollify PDF before merging

Posted by Marc Davis <ma...@gmail.com>.
I am using v1.8.6  

Thanks,
Marc



On Oct 1, 2014, at 3:08 PM, Tilman Hausherr <TH...@t-online.de> wrote:

> What version are you using? We recently fixed a bug with merge.
> 
> Tilman
> 
> 


Re: Mollify PDF before merging

Posted by Tilman Hausherr <TH...@t-online.de>.
What version are you using? We recently fixed a bug with merge.

Tilman

On 01.10.2014 at 15:21, Marc Davis wrote:
> I use PDFBox to merge PDF files, but we find that many files from scanners or files generated by AutoCAD do not merge properly (they are either blank or missing fonts). However, when we open and save the file in a native reader such as Adobe Reader (Windows) or Preview (Mac), and then merge again, the merge works fine!
>
> Is there a workaround for this in PDFBox?
>
> Thanks,
> Marc
>
>
>