Posted to users@pdfbox.apache.org by Marc Davis <ma...@gmail.com> on 2014/10/01 15:21:37 UTC
Mollify PDF before merging
I use PDFBox to merge PDF files, but we find that many files from scanners or files generated from AutoCAD do not merge properly (the merged pages are either blank or missing fonts). However, when we open and save the file in a native reader such as Adobe Reader (Windows) or Preview (Mac), and then merge again, the merge works fine!
Is there a workaround for this in PDFBox?
Thanks,
Marc
Re: Mollify PDF before merging
Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
There are samples in the examples package.
Maruan Sahyoun
> On 02.10.2014 at 20:43, Ivan <cu...@gmx.com> wrote:
>
> Is there an example of how to sign a PDF with PDFBox?
> Best Regards,
>
> Ivan
Re: Mollify PDF before merging
Posted by Ivan <cu...@gmx.com>.
Is there an example of how to sign a PDF with PDFBox?
Best Regards,
Ivan
Re: Mollify PDF before merging
Posted by Tilman Hausherr <TH...@t-online.de>.
Re problem 2:
I opened the file with Adobe Reader. In the properties, on the "security"
tab, the second entry (in German "Dokumentzusammenstellung", I assume this
is "merge") says "no". I assume this means you shouldn't merge it. And
neither should we.
Tilman
On 03.10.2014 at 21:39, Marc Davis wrote:
> Problem 2:
>
> We used PDFBox 1.8.7 to merge these two files:
> https://www.dropbox.com/s/7lnbdieo9t8k38e/good.pdf?dl=0
> https://www.dropbox.com/s/dwlxoj2hpvbnr5i/badform.pdf?dl=0
>
> The merge does not proceed due to password encryption of badform.pdf. Does PDFBox have a way to handle password-encrypted files? Strangely, the file can be opened normally (without the need to enter a password)!
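For context on the "no" above: in a file that uses the standard security handler, restrictions such as document assembly are stored as bit flags in the /P entry of the encryption dictionary, and a file can carry such flags while still opening with an empty user password. The following is a minimal, self-contained Java sketch of how those flags decode, using the bit positions from the PDF specification (bit 11 is "assemble document"). The class and method names are hypothetical, the sample values are made up rather than read from badform.pdf, and the code does not use PDFBox; PDFBox exposes the same information through its AccessPermission class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: decodes the /P permission flags of a PDF encryption dictionary. */
final class PdfPermissionBits {
    // 1-based bit positions as numbered in the PDF specification (bit 1 is the lowest bit).
    private static final int[] BITS = {3, 4, 5, 6, 9, 10, 11, 12};
    private static final String[] NAMES = {
        "print", "modify contents", "copy or extract", "annotate",
        "fill in forms", "extract for accessibility",
        "assemble document", "print in high quality"
    };

    static final int ASSEMBLE_DOCUMENT = 11;

    /** Returns true if the given 1-based bit is set in the /P value. */
    static boolean isSet(int p, int bit) {
        return (p & (1 << (bit - 1))) != 0;
    }

    /** Document assembly (inserting, rotating, deleting pages; merging) is allowed if bit 11 is set. */
    static boolean canAssemble(int p) {
        return isSet(p, ASSEMBLE_DOCUMENT);
    }

    /** Lists the operations whose permission bit is cleared. */
    static List<String> denied(int p) {
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < BITS.length; i++) {
            if (!isSet(p, BITS[i])) {
                result.add(NAMES[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int sample = -1025; // made-up value: all bits set except bit 11
        System.out.println("can assemble: " + canAssemble(sample));
        System.out.println("denied: " + denied(sample));
    }
}
```

A merge tool that wants to respect the author's restrictions could check the assembly flag before adding a file as a merge source and skip or report the file when the flag is cleared.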
Re: Mollify PDF before merging
Posted by Marc Davis <ma...@gmail.com>.
Oh, very nice…thanks for the heads up!
Thanks,
Marc
On Oct 8, 2014, at 1:28 PM, Tilman Hausherr <TH...@t-online.de> wrote:
> And it has been fixed now. The cause was a JDK bug. Get a new jar file here in a few hours:
> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/1.8.8-SNAPSHOT/
>
> Tilman
Re: Mollify PDF before merging
Posted by Tilman Hausherr <TH...@t-online.de>.
And it has been fixed now. The cause was a JDK bug. Get a new jar file
here in a few hours:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/1.8.8-SNAPSHOT/
Tilman
On 04.10.2014 at 15:44, Tilman Hausherr wrote:
> Problem 1 is now here:
> https://issues.apache.org/jira/browse/PDFBOX-2401
>
> Tilman
Re: Mollify PDF before merging
Posted by Tilman Hausherr <TH...@t-online.de>.
Problem 1 is now here:
https://issues.apache.org/jira/browse/PDFBOX-2401
Tilman
On 03.10.2014 at 21:39, Marc Davis wrote:
> Problem 1:
>
> We use PDFBox 1.8.7 to merge these two files:
>
> https://www.dropbox.com/s/7lnbdieo9t8k38e/good.pdf?dl=0
> https://www.dropbox.com/s/35lafjdqrt7vy3e/michael%20levine.pdf?dl=0
>
> This is the resultant merged file:
> https://www.dropbox.com/s/gwmbd053269at0p/Merged%20PDF.pdf?dl=0
>
> The problem: page TL-9 appears black as shown here:
> https://www.dropbox.com/s/09bcw1h87f5hbyy/Screenshot%202014-10-03%20at%203.28.51%20PM.png?dl=0
Re: Mollify PDF before merging
Posted by Marc Davis <ma...@gmail.com>.
Tilman,
Please accept my sincere apologies for incorrectly calling you Tim! This was a genuine oversight.
Here are my issues:
Problem 1:
We use PDFBox 1.8.7 to merge these two files:
https://www.dropbox.com/s/7lnbdieo9t8k38e/good.pdf?dl=0
https://www.dropbox.com/s/35lafjdqrt7vy3e/michael%20levine.pdf?dl=0
This is the resultant merged file:
https://www.dropbox.com/s/gwmbd053269at0p/Merged%20PDF.pdf?dl=0
The problem: page TL-9 appears black as shown here:
https://www.dropbox.com/s/09bcw1h87f5hbyy/Screenshot%202014-10-03%20at%203.28.51%20PM.png?dl=0
————————
Problem 2:
We used PDFBox 1.8.7 to merge these two files:
https://www.dropbox.com/s/7lnbdieo9t8k38e/good.pdf?dl=0
https://www.dropbox.com/s/dwlxoj2hpvbnr5i/badform.pdf?dl=0
The merge does not proceed due to password encryption of badform.pdf. Does PDFBox have a way to handle password-encrypted files? Strangely, the file can be opened normally (without the need to enter a password)!
We had another 8 files that did not merge properly with 1.8.6, but now merge fine with 1.8.7. Only the two issues above are outstanding.
Thanks,
Marc
On Oct 2, 2014, at 3:23 PM, Tilman Hausherr <TH...@t-online.de> wrote:
> On 02.10.2014 at 20:28, Marc Davis wrote:
>> Tim, 1.8.7 seems to have fixed all our issues! Thanks so much for recommending this.
>
> I'm "Tilman". "Tim" is a (very nice) committer from Apache TIKA, a project that does use PDFBox.
>
>> We do have two images that seem troublesome:
>>
>> https://www.dropbox.com/s/35lafjdqrt7vy3e/michael%20levine.pdf?dl=0 (after merging TL-9 page is black)
>
> Then please post the other file, and the result. In other words - just assume we're dumb and lazy, so please provide every file / step that produces an error; better to describe more than needed. Even then, solutions may take some time:
> https://issues.apache.org/jira/browse/PDFBOX-1511
> took over a year and was a group effort of at least six people.
>
> And there's a contradiction: you're writing "1.8.7 seems to have fixed all our issues", but then you're mentioning two new problems...
>
>> https://www.dropbox.com/s/dwlxoj2hpvbnr5i/badform.pdf?dl=0 (file is password protected, does PDFBox have a way around this?)
>
> I was able to display it in the browser. I didn't test it with PDFBox; some files are protected with the empty password. If you use the new nonSeq parser (loadNonSeq()), just pass "" as the extra parameter. If you use load(), it is more complex: use openProtection() (download the source code to see how).
>
> Tilman
Re: Mollify PDF before merging
Posted by Tilman Hausherr <TH...@t-online.de>.
On 02.10.2014 at 20:28, Marc Davis wrote:
> Tim, 1.8.7 seems to have fixed all our issues! Thanks so much for recommending this.
I'm "Tilman". "Tim" is a (very nice) committer from Apache TIKA, a
project that does use PDFBox.
> We do have two images that seem troublesome:
>
> https://www.dropbox.com/s/35lafjdqrt7vy3e/michael%20levine.pdf?dl=0 (after merging TL-9 page is black)
Then please post the other file, and the result. In other words - just
assume we're dumb and lazy, so please provide every file / step that
produces an error; better to describe more than needed. Even then,
solutions may take some time:
https://issues.apache.org/jira/browse/PDFBOX-1511
took over a year and was a group effort of at least six people.
And there's a contradiction: you're writing "1.8.7 seems to have fixed
all our issues", but then you're mentioning two new problems...
> https://www.dropbox.com/s/dwlxoj2hpvbnr5i/badform.pdf?dl=0 (file is password protected, does PDFBox have a way around this?)
I was able to display it in the browser. I didn't test it with PDFBox;
some files are protected with the empty password. If you use the new
nonSeq parser (loadNonSeq()), just pass "" as the extra parameter. If you
use load(), it is more complex: use openProtection() (download the
source code to see how).
Tilman
>
> I’d love to hear your thoughts on this...
>
> Thanks,
> Marc
Re: Mollify PDF before merging
Posted by Marc Davis <ma...@gmail.com>.
Tim, 1.8.7 seems to have fixed all our issues! Thanks so much for recommending this.
We do have two images that seem troublesome:
https://www.dropbox.com/s/35lafjdqrt7vy3e/michael%20levine.pdf?dl=0 (after merging TL-9 page is black)
https://www.dropbox.com/s/dwlxoj2hpvbnr5i/badform.pdf?dl=0 (file is password protected, does PDFBox have a way around this?)
I’d love to hear your thoughts on this...
Thanks,
Marc
On Oct 1, 2014, at 3:18 PM, Tilman Hausherr <TH...@t-online.de> wrote:
> Then please retry with 1.8.7, because the problem should be fixed there, hopefully. (A problem related to identically named resources in both PDF files)
>
> If it still happens, please open an issue in JIRA, and attach the two PDF files and the result. If the files are confidential, please try producing non-confidential files.
>
> Tilman
Re: Mollify PDF before merging
Posted by Tilman Hausherr <TH...@t-online.de>.
Then please retry with 1.8.7, because the problem should be fixed there,
hopefully. (A problem related to identically named resources in both PDF
files)
If it still happens, please open an issue in JIRA, and attach the two
PDF files and the result. If the files are confidential, please try
producing non-confidential files.
Tilman
On 01.10.2014 at 21:13, Marc Davis wrote:
> I am using v1.8.6
>
> Thanks,
> Marc
Re: Mollify PDF before merging
Posted by Marc Davis <ma...@gmail.com>.
I am using v1.8.6
Thanks,
Marc
On Oct 1, 2014, at 3:08 PM, Tilman Hausherr <TH...@t-online.de> wrote:
> What version are you using? We recently fixed a bug with merge.
>
> Tilman
Re: Mollify PDF before merging
Posted by Tilman Hausherr <TH...@t-online.de>.
What version are you using? We recently fixed a bug with merge.
Tilman
On 01.10.2014 at 15:21, Marc Davis wrote:
> I use pdfbox to merge PDF files but we find that many files from scanners or files generated from AutoCAD do not merge properly (they are either blank or missing fonts). However, when we open and save the file in a native reader such as Adobe Reader (Windows) or Preview in Mac, and then merge again, the merge works fine!
>
> Is there a workaround for this in PDFBox?
>
> Thanks,
> Marc