Posted to users@pdfbox.apache.org by al...@extern.sdv-it.de on 2016/04/13 12:03:36 UTC
How to merge PDF/A-1b documents and keep conformity
Hi, I am new to this list.
My profile is: experienced Java programmer, knowing how to use
PDFMergerUtility, but not a PDF or even PDF/A-1b expert.
Situation: We have a bunch of PDF/A-1b compliant documents from a 3rd
party system and merge them into a new document. The end result is not
PDF/A-1b compliant though.
I found this on the mailing list archive:
http://pdfbox-users.markmail.org/search/?q=merge%20pdf%2Fa#query:merge%20pdf%2Fa+page:1+mid:uwvybz6lhgof3agg+state:results
Is there a better answer today than to look into PDFMergerUtility sources?
Because this class is what we are using, but it does not do it, at least
not in version 1.8.9. Is there a reason to assume that this has changed in
2.x?
I also found this:
https://pdfbox.apache.org/1.8/cookbook/pdfacreation.html
It does not explain how to do it in a merge, though.
I would appreciate concrete hints, sample code, links to technical
documentation giving me hints etc.
Kind regards
--
Alexander Kriegisch
Re: How to merge PDF/A-1b documents and keep conformity
Posted by Tilman Hausherr <TH...@t-online.de>.
On 15.04.2016 at 12:35, alexander.kriegisch@extern.sdv-it.de wrote:
> It would be really nice if I could either tell the merger to set a given
> output intent or to copy the first one as shown above. How do I achieve
> this without duplicating your original code? An additional parameter for
> setting the desired PDF/A standard type or at least one for setting the
> top level output intent to the PDFMergerUtility constructor or to
> mergeDocuments() would be really nice.
I'll think about a solution in the source code, i.e. so that it would
work with the command line utility as well. My thought is to check for
the OutputConditionIdentifier and to add only one such outputIntent.
This would solve all cases where the files are created by the same
source. I'll work on that after getting some high quality sleep.
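The idea above (keep only one output intent per OutputConditionIdentifier) can be sketched in plain Java. This is only an illustration of the de-duplication logic, not the PDFBox implementation: IntentModel and collectUnique are hypothetical stand-ins, and no PDFBox classes are used. Treating OutputConditionIdentifier as the key is the assumption voiced in this thread, not a confirmed rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an output intent; this is NOT PDFBox's PDOutputIntent. */
final class IntentModel {
    final String outputConditionIdentifier;
    final String sourceName;

    IntentModel(String outputConditionIdentifier, String sourceName) {
        this.outputConditionIdentifier = outputConditionIdentifier;
        this.sourceName = sourceName;
    }
}

final class OutputIntentDedup {
    /**
     * Walks the sources in merge order and keeps the first intent seen for
     * each OutputConditionIdentifier; later intents with the same identifier
     * are skipped.
     */
    static List<IntentModel> collectUnique(List<List<IntentModel>> intentsPerSource) {
        // LinkedHashMap preserves the order in which identifiers first appear.
        Map<String, IntentModel> firstByIdentifier = new LinkedHashMap<>();
        for (List<IntentModel> sourceIntents : intentsPerSource) {
            for (IntentModel intent : sourceIntents) {
                firstByIdentifier.putIfAbsent(intent.outputConditionIdentifier, intent);
            }
        }
        return new ArrayList<>(firstByIdentifier.values());
    }

    public static void main(String[] args) {
        // Two sources created by the same system carry the same identifier.
        List<IntentModel> a = Arrays.asList(new IntentModel("CGATS TR 001", "a.pdf"));
        List<IntentModel> b = Arrays.asList(new IntentModel("CGATS TR 001", "b.pdf"));
        List<IntentModel> merged = collectUnique(Arrays.asList(a, b));
        for (IntentModel intent : merged) {
            System.out.println(intent.outputConditionIdentifier + " from " + intent.sourceName);
        }
    }
}
```

Sources with different identifiers would still end up with more than one intent in this sketch; how a validator treats that case is not settled here.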
Tilman
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org
Re: Cannot comment on Jira issues anymore
Posted by Andreas Lehmkuehler <an...@lehmi.de>.
On 22.04.2016 at 16:46, Petras Petkus wrote:
> I'm also unable to add a comment to the PDFBOX-3321 issue (created by me). Could you please include my account as well, or should I rather wait? Thank you.
Done
BR
Andreas
RE: Cannot comment on Jira issues anymore
Posted by Petras Petkus <pe...@mitsoft.lt>.
I'm also unable to add a comment to the PDFBOX-3321 issue (created by me). Could you please include my account as well, or should I rather wait? Thank you.
With best regards,
Petras Petkus
Re: Cannot comment on Jira issues anymore
Posted by Andreas Lehmkühler <an...@lehmi.de>.
Hi,
> On 22 April 2016 at 09:50, alexander.kriegisch@extern.sdv-it.de wrote:
>
>
> Sorry to bother everyone here on the mailing list, but something seems to
> be wrong in Jira: I cannot comment on
> https://issues.apache.org/jira/browse/PDFBOX-3323 and other issues
> anymore, the comment button has vanished.
Infra changed the auth settings for JIRA due to a lot of spam. According to a
discussion on infra@, they are working on a solution so that they can revert that
change. They expect to get this done within the next 24 - 48 hours at most.
I've added your JIRA-account to the contributor-group so that you should be able
to comment again.
BR
Andreas
Cannot comment on Jira issues anymore
Posted by al...@extern.sdv-it.de.
Sorry to bother everyone here on the mailing list, but something seems to
be wrong in Jira: I cannot comment on
https://issues.apache.org/jira/browse/PDFBOX-3323 and other issues
anymore, the comment button has vanished.
Re: How to merge PDF/A-1b documents and keep conformity
Posted by al...@extern.sdv-it.de.
Hi Tilman.
As I already said in the ticket, it works. I guess this will make a lot
more users happy in the future. Thanks a bunch! :-) Let us continue the
discussion in Jira.
Regards
--
Alexander Kriegisch
Re: Reply: Re: How to merge PDF/A-1b documents and keep conformity
Posted by Tilman Hausherr <TH...@t-online.de>.
I have opened a new issue
https://issues.apache.org/jira/browse/PDFBOX-3317
and created a solution that requires no changes. Please try it:
get a -SNAPSHOT version with Maven, or find it here in a few hours:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.1-SNAPSHOT/
Please give feedback on whether it worked.
Tilman
Reply: Re: How to merge PDF/A-1b documents and keep conformity
Posted by al...@extern.sdv-it.de.
If you mean saving to a temp-file, re-reading and manipulating it, writing
it again, this is not an option because performance is very important for
us. As you can see from my code snippet, I am already using piped streams
to avoid disk I/O. But anyway, Maruan, what is your suggestion?
From: Maruan Sahyoun <sa...@fileaffairs.de>
To: users@pdfbox.apache.org,
Date: 15.04.2016 12:54
Subject: Re: How to merge PDF/A-1b documents and keep conformity
Hi,
Would it be an option to do the merge first and afterwards remove all output
intents from the merged document except the one that is needed/you'd like to keep?
BR
Maruan
Re: How to merge PDF/A-1b documents and keep conformity
Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi,
> On 15.04.2016 at 12:35, alexander.kriegisch@extern.sdv-it.de wrote:
>
> It would be really nice if I could either tell the merger to set a given
> output intent or to copy the first one as shown above. How do I achieve
> this without duplicating your original code? An additional parameter for
> setting the desired PDF/A standard type or at least one for setting the
> top level output intent to the PDFMergerUtility constructor or to
> mergeDocuments() would be really nice.
Would it be an option to do the merge first and afterwards remove all output intents from the merged document except the one that is needed/you'd like to keep?
BR
Maruan
Re: How to merge PDF/A-1b documents and keep conformity
Posted by al...@extern.sdv-it.de.
Basically your hack works if I replace PDFMergerUtility with a modified copy
(extending it is no option even in the same package because
'appendDocument()' needs private members). I had to modify your snippet as
follows in order to avoid adding multiple intents, which leads to a
validation error:
    private boolean hasIntent = false;
    ...
    public void appendDocument(PDDocument destination, PDDocument source)
            throws IOException
    {
        ...
        // copy output intents only once, from the first appended source
        if (!hasIntent) {
            hasIntent = true;
            List<PDOutputIntent> srcOutputIntents = srcCatalog.getOutputIntents();
            for (PDOutputIntent outputIntent : srcOutputIntents) {
                destCatalog.addOutputIntent(outputIntent);
            }
        }
        ...
    }
It would be really nice if I could either tell the merger to set a given
output intent or to copy the first one as shown above. How do I achieve
this without duplicating your original code? An additional parameter for
setting the desired PDF/A standard type or at least one for setting the
top level output intent to the PDFMergerUtility constructor or to
mergeDocuments() would be really nice.
Reply: Re: How to merge PDF/A-1b documents and keep conformity
Posted by al...@extern.sdv-it.de.
Hi Tilman.
What exactly do you need to know except for what I already told you in the
"situation" paragraph? We currently use something like this:
    public InputStream merge(final List<InputStream> sources) throws IOException {
        PDFMergerUtility merger = new PDFMergerUtility();
        for (InputStream source : sources) {
            logger.trace("PDF merger source = {}", source);
            merger.addSource(source);
        }
        PipedOutputStream outputStream = new PipedOutputStream();
        PipedInputStream inputStream = new PipedInputStream(outputStream);
        merger.setDestinationStream(outputStream);
        new Thread(() -> {
            try {
                merger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
            } catch (IOException e) {
                logger.error("PDF merge problem", e);
            }
        }).start();
        return inputStream;
    }
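The piped-stream pattern used above can be shown in isolation with the JDK only. The following is a minimal sketch, not the code from this thread: StreamWriter, startProducer, readFully and roundTrip are hypothetical names, and the writer callback merely stands in for whatever writes to the destination stream. The one thing it adds is that the producer thread always closes the write end, so the reader gets a clean end-of-stream instead of depending on the writer thread's fate.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class PipedProducer {
    /** Hypothetical callback standing in for the step that writes to the destination stream. */
    interface StreamWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Starts the writer on its own thread and returns the read end of the pipe. */
    static InputStream startProducer(StreamWriter writer) throws IOException {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out);
        Thread producer = new Thread(() -> {
            // try-with-resources closes the write end even if the writer throws.
            // Note: after a failure the reader then sees an early end-of-stream,
            // so errors still need to be reported through some other channel.
            try (OutputStream closing = out) {
                writer.writeTo(closing);
            } catch (IOException e) {
                System.err.println("producer failed: " + e.getMessage());
            }
        });
        producer.setDaemon(true);
        producer.start();
        return in;
    }

    /** Reads until end-of-stream; must run on a different thread than the writer. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    /** Demo helper: pipes the payload through and returns what the reader received. */
    static String roundTrip(String payload) {
        byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
        try (InputStream in = startProducer(out -> out.write(bytes))) {
            return new String(readFully(in), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("merged bytes would flow through here"));
    }
}
```

The pipe has a small internal buffer, so the writer blocks until the reader consumes data; reading and writing on the same thread would deadlock.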
Does that help? By the way, I need an automated, stable PDF merge
solution, not a one-time hack including manual editing in Notepad++.
Furthermore, I cannot just add code to your API; I would like to use the
API as is. I tried a quick & dirty subclass of PDFMergerUtility that
overrides 'appendDocument', copying all the original source code. But the
thing is that the method uses non-public classes like PDFCloneUtility,
non-public members etc. I could only try to use the same package as the
original, but this is not nice.
The source documents are, as I said, PDF/A-1b compliant, all of them
created by the same output management system. So I guess the output
intents (whatever that means) are similar or identical.
Regards
--
Alexander Kriegisch
Re: How to merge PDF/A-1b documents and keep conformity
Posted by Tilman Hausherr <TH...@t-online.de>.
On 13.04.2016 at 12:03, alexander.kriegisch@extern.sdv-it.de wrote:
> Hi, I am new to this list.
>
> My profile is: experienced Java programmer, knowing how to use
> PDFMergerUtility, but not a PDF or even PDF/A-1b expert.
>
> Situation: We have a bunch of PDF/A-1b compliant documents from a 3rd
> party system and merge them into a new document. The end result is not
> PDF/A-1b compliant though.
>
> I found this on the mailing list archive:
> http://pdfbox-users.markmail.org/search/?q=merge%20pdf%2Fa#query:merge%20pdf%2Fa+page:1+mid:uwvybz6lhgof3agg+state:results
> Is there a better answer today than to look into PDFMergerUtility sources?
> Because this class is what we are using, but it does not do it, at least
> not in version 1.8.9. Is there a reason to assume that this has changed in
> 2.x?
>
You didn't mention what went wrong. I had that problem once with 2 files
from the same source. What I did is:
1) In the 2.0 source code (I won't bother with 1.8), add this in
PDFMergerUtility.appendDocument() above the comment "merge logical
structure hierarchy":
    List<PDOutputIntent> srcOutputIntents = srcCatalog.getOutputIntents();
    for (PDOutputIntent outputIntent : srcOutputIntents)
    {
        destCatalog.addOutputIntent(outputIntent);
    }
2) Then I edited the result PDF manually to remove one of the output
intents. The result PDF should have something like this:
    /OutputIntents [7 0 R 8 0 R]
Just blank one of the two, e.g. like this:
    /OutputIntents [7 0 R      ]
Make sure that you don't change any positions, i.e. switch your editor
(Notepad++) to overwrite mode.
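The reason for overwrite mode is that byte offsets recorded elsewhere in the file must stay valid, so the reference has to be replaced by the same number of spaces. A rough sketch of that manual step in plain Java, assuming the array is stored uncompressed and appears literally in the bytes; SameLengthBlanking and blankWithin are hypothetical names, and no PDFBox API is involved:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SameLengthBlanking {
    /**
     * Finds the first occurrence of context in data and overwrites the part
     * of it that equals reference with spaces, in place. The array length and
     * all other byte positions stay unchanged.
     * Returns the offset of the blanked bytes, or -1 if context was not found.
     */
    static int blankWithin(byte[] data, String context, String reference) {
        int offsetInContext = context.indexOf(reference);
        if (reference.isEmpty() || offsetInContext < 0) {
            throw new IllegalArgumentException("reference must be a non-empty part of context");
        }
        // ISO-8859-1 maps one char to one byte, so string offsets equal byte offsets.
        byte[] needle = context.getBytes(StandardCharsets.ISO_8859_1);
        for (int i = 0; i + needle.length <= data.length; i++) {
            if (matchesAt(data, i, needle)) {
                int start = i + offsetInContext;
                Arrays.fill(data, start, start + reference.length(), (byte) ' ');
                return start;
            }
        }
        return -1;
    }

    private static boolean matchesAt(byte[] data, int pos, byte[] needle) {
        for (int j = 0; j < needle.length; j++) {
            if (data[pos + j] != needle[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pdf = "<< /OutputIntents [7 0 R 8 0 R] >>".getBytes(StandardCharsets.ISO_8859_1);
        int before = pdf.length;
        // Searching for the whole array avoids hitting e.g. "18 0 R" elsewhere.
        blankWithin(pdf, "/OutputIntents [7 0 R 8 0 R]", "8 0 R");
        System.out.println(new String(pdf, StandardCharsets.ISO_8859_1));
        System.out.println("length unchanged: " + (before == pdf.length));
    }
}
```

Matching on the full array text rather than on the bare reference is a deliberate choice here, since a short reference such as "8 0 R" can also occur inside other references.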
This may or may not work... if the two files have different output
intents, then you'll have surprises, obviously.
I haven't done any code changes... I don't know for sure what element of
the outputIntent is the "key" (so to skip others with the same key), and
don't know what I should do if files have different ones. I suspect it
is "OutputConditionIdentifier".
Example of an outputIntent:
<<
/Type/OutputIntent
/S/GTS_PDFA1
/OutputCondition(U.S. Web Coated \(SWOP\) v2)
/OutputConditionIdentifier(CGATS TR 001)
/Info(U.S. Web Coated \(SWOP\) v2)
/DestOutputProfile 4 0 R
>>
4 0 obj
<<
/N 4
/Filter/FlateDecode
/Length 389758
>>
stream
...
endstream
endobj
If you tell me more about what you're trying to do (one-time-only problem or
not?), maybe I can help...
Tilman