Posted to users@pdfbox.apache.org by al...@extern.sdv-it.de on 2016/04/13 12:03:36 UTC

How to merge PDF/A-1b documents and keep conformity

Hi, I am new to this list.

My profile is: experienced Java programmer, knowing how to use
PDFMergerUtility, but not a PDF or even PDF/A-1b expert.

Situation: We have a bunch of PDF/A-1b compliant documents from a 3rd 
party system and merge them into a new document. The end result is not 
PDF/A-1b compliant though.

I found this on the mailing list archive:
http://pdfbox-users.markmail.org/search/?q=merge%20pdf%2Fa#query:merge%20pdf%2Fa+page:1+mid:uwvybz6lhgof3agg+state:results
Is there a better answer today than to look into PDFMergerUtility sources? 
Because this class is what we are using, but it does not do it, at least 
not in version 1.8.9. Is there a reason to assume that this has changed in 
2.x?

I also found this: 
https://pdfbox.apache.org/1.8/cookbook/pdfacreation.html
It does not explain how to do it in a merge, though.

I would appreciate concrete hints, sample code, links to technical 
documentation giving me hints etc.

Kind regards
--
Alexander Kriegisch

Re: How to merge PDF/A-1b documents and keep conformity

Posted by Tilman Hausherr <TH...@t-online.de>.
On 15.04.2016 at 12:35, alexander.kriegisch@extern.sdv-it.de wrote:
> It would be really nice if I could either tell the merger to set a given
> output intent or to copy the first one as shown above. How do I achieve
> this without duplicating your original code? An additional parameter for
> setting the desired PDF/A standard type or at least one for setting the
> top level output intent to the PDFMergerUtility constructor or to
> mergeDocuments() would be really nice.

I'll think about a solution in the source code, i.e. so that it would 
work with the command line utility as well. My thought is to check for 
the OutputConditionIdentifier and to add only one such outputIntent. 
This would solve all cases where the files are created by the same 
source. I'll work on that after getting some high quality sleep.
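
A minimal sketch of that idea, under stated assumptions: the `Intent` class
below is a hypothetical stand-in for PDFBox's `PDOutputIntent` (so that the
example runs without PDFBox on the classpath), and `mergeIntents` is an
illustrative helper, not part of the PDFBox API. It keeps the first output
intent seen for each OutputConditionIdentifier and skips later ones with the
same identifier. In real code the identifier would presumably come from
`PDOutputIntent.getOutputConditionIdentifier()` and the surviving intents
would be added via `PDDocumentCatalog.addOutputIntent()`.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for PDOutputIntent; only the fields needed here.
final class Intent {
    final String outputConditionIdentifier;
    final String info;

    Intent(String outputConditionIdentifier, String info) {
        this.outputConditionIdentifier = outputConditionIdentifier;
        this.info = info;
    }
}

final class OutputIntentDedup {
    // Appends to 'dest' every source intent whose identifier is not yet
    // present in 'dest'. Intents sharing an identifier are assumed to be
    // interchangeable, which is only plausible for files from the same source.
    static void mergeIntents(List<Intent> dest, List<Intent> src) {
        Set<String> seen = new HashSet<>();
        for (Intent existing : dest) {
            seen.add(existing.outputConditionIdentifier);
        }
        for (Intent candidate : src) {
            // Set.add returns false if the identifier was already seen
            if (seen.add(candidate.outputConditionIdentifier)) {
                dest.add(candidate);
            }
        }
    }
}
```

If the sources carry different identifiers, this sketch still ends up with
more than one output intent in the destination; how to handle that case is
left open here, as it is in the message above.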

Tilman


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: Cannot comment on Jira issues anymore

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
On 22.04.2016 at 16:46, Petras Petkus wrote:
> I'm also unable to add comment to the PDFBOX-3321 issue (created by me). Could you please also include my account too or should I better wait? Thank you.
Done

BR
Andreas

RE: Cannot comment on Jira issues anymore

Posted by Petras Petkus <pe...@mitsoft.lt>.
I'm also unable to add a comment to the PDFBOX-3321 issue (created by me). Could you please add my account as well, or should I rather wait? Thank you.

With best regards,
Petras Petkus




Re: Cannot comment on Jira issues anymore

Posted by Andreas Lehmkühler <an...@lehmi.de>.
Hi,

> On 22 April 2016 at 09:50, alexander.kriegisch@extern.sdv-it.de wrote:
> 
> 
> Sorry to bother everyone here on the mailing list, but something seems to 
> be wrong in Jira: I cannot comment on 
> https://issues.apache.org/jira/browse/PDFBOX-3323 and other issues 
> anymore, the comment button has vanished.
Infra changed the auth settings for JIRA due to a lot of spam. According to a
discussion on infra@ they are working on a solution to be able to revert that
change. They expect to get this done within the next 24 - 48 hours at most.

I've added your JIRA-account to the contributor-group so that you should be able
to comment again.

BR
Andreas



Cannot comment on Jira issues anymore

Posted by al...@extern.sdv-it.de.
Sorry to bother everyone here on the mailing list, but something seems to 
be wrong in Jira: I cannot comment on 
https://issues.apache.org/jira/browse/PDFBOX-3323 and other issues 
anymore, the comment button has vanished.

Re: How to merge PDF/A-1b documents and keep conformity

Posted by al...@extern.sdv-it.de.
Hi Tilman.

As I already said in the ticket, it works. I guess this will make a lot
more users happy in the future. Thanks a bunch! :-) Let us continue the
discussion in Jira.

Regards
--
Alexander Kriegisch








Re: Antwort: Re: How to merge PDF/A-1b documents and keep conformity

Posted by Tilman Hausherr <TH...@t-online.de>.
I have opened a new issue
https://issues.apache.org/jira/browse/PDFBOX-3317

and created a solution that requires no changes, please try it.

get a -SNAPSHOT version with maven, or find it here in a few hours:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.1-SNAPSHOT/

please give feedback whether it worked.

Tilman



Antwort: Re: How to merge PDF/A-1b documents and keep conformity

Posted by al...@extern.sdv-it.de.
If you mean saving to a temp-file, re-reading and manipulating it, writing 
it again, this is not an option because performance is very important for 
us. As you can see from my code snippet, I am already using piped streams 
to avoid disk I/O. But anyway, Maruan, what is your suggestion?








Re: How to merge PDF/A-1b documents and keep conformity

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi,

> On 15.04.2016 at 12:35, alexander.kriegisch@extern.sdv-it.de wrote:
> 
> Basically your hack works if I overwrite PDFMergerUtility (extending it is 
> no option even in the same package because 'appendDocument()' needs 
> private members). I had to modify your snippet by this in order to avoid 
> adding multiple intents, leading to a validation error:
> 
>  private boolean hasIntent = false;
>  ...
>  public void appendDocument(PDDocument destination, PDDocument source) 
> throws IOException
>  {
>    ...
>    if (!hasIntent) {
>      hasIntent = true;
>      List<PDOutputIntent> srcOutputIntents =
>        srcCatalog.getOutputIntents();
>     for (PDOutputIntent outputIntent : srcOutputIntents)
>        destCatalog.addOutputIntent(outputIntent);
>    }
>    ...
>  }
> 
> It would be really nice if I could either tell the merger to set a given 
> output intent or to copy the first one as shown above. How do I achieve 
> this without duplicating your original code? An additional parameter for 
> setting the desired PDF/A standard type or at least one for setting the 
> top level output intent to the PDFMergerUtility constructor or to 
> mergeDocuments() would be really nice.

Would it be an option to do the merge first and afterwards fix up the output intents on the merged document, so that only the one that is needed (the one you'd like to keep) remains?
BR
Maruan



Re: How to merge PDF/A-1b documents and keep conformity

Posted by al...@extern.sdv-it.de.
Basically your hack works if I replace PDFMergerUtility with a patched copy
(extending it is not an option, even in the same package, because
'appendDocument()' needs private members). I had to modify your snippet as
follows in order to avoid adding multiple intents, which leads to a
validation error:

  private boolean hasIntent = false;
  ...
  public void appendDocument(PDDocument destination, PDDocument source) throws IOException
  {
    ...
    // copy the output intents of the first source document only
    if (!hasIntent) {
      hasIntent = true;
      List<PDOutputIntent> srcOutputIntents = srcCatalog.getOutputIntents();
      for (PDOutputIntent outputIntent : srcOutputIntents) {
        destCatalog.addOutputIntent(outputIntent);
      }
    }
    ...
  }

It would be really nice if I could either tell the merger to set a given 
output intent or to copy the first one as shown above. How do I achieve 
this without duplicating your original code? An additional parameter for 
setting the desired PDF/A standard type or at least one for setting the 
top level output intent to the PDFMergerUtility constructor or to 
mergeDocuments() would be really nice.









Antwort: Re: How to merge PDF/A-1b documents and keep conformity

Posted by al...@extern.sdv-it.de.
Hi Tilman.

What exactly do you need to know except for what I already told you in the 
"situation" paragraph? We currently use something like this:

public InputStream merge(final List<InputStream> sources) throws IOException {
  PDFMergerUtility merger = new PDFMergerUtility();
  for (InputStream source : sources) {
    logger.trace("PDF merger source = {}", source);
    merger.addSource(source);
  }
  PipedOutputStream outputStream = new PipedOutputStream();
  PipedInputStream inputStream = new PipedInputStream(outputStream);
  merger.setDestinationStream(outputStream);
  new Thread(() -> {
    try {
      merger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
    } catch (IOException e) {
      logger.error("PDF merge problem", e);
    }
  }).start();
  return inputStream;
}
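
The piped-stream hand-off used above can be illustrated without PDFBox. The
following is only a sketch under assumptions: `Producer` and
`PipedHandOff.stream` are hypothetical names, and the producer simply writes
bytes where the real method calls `merger.mergeDocuments(...)`. It shows the
general pattern: connect the pipe first, write on a background thread, and
close the writing end in all cases so that the reader sees a clean
end-of-stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Hypothetical callback standing in for the PDF merge step.
interface Producer {
    void writeTo(OutputStream out) throws IOException;
}

final class PipedHandOff {
    // Runs the producer on a background thread and returns the reading end.
    // The pipe is connected before the thread starts writing.
    static InputStream stream(Producer producer) throws IOException {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out);
        Thread worker = new Thread(() -> {
            // try-with-resources closes the writing end even if the producer fails
            try (OutputStream o = out) {
                producer.writeTo(o);
            } catch (IOException e) {
                // Swallowed in this sketch; real code would at least log the
                // failure, since the reader cannot see the cause.
            }
        });
        worker.setDaemon(true);
        worker.start();
        return in;
    }
}
```

A caller would read the returned stream on a different thread than the one
doing the writing, which is what the background thread ensures here.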

Does that help? By the way, I need an automated, stable PDF merge
solution, not a one-time hack including manual editing in Notepad++.
Furthermore, I cannot just add code to your API; I would like to use the
API as is. I tried a quick and dirty subclass of PDFMergerUtility that
overrides 'appendDocument', copying all the original source code. But the
thing is that the method uses non-public classes like PDFCloneUtility,
non-public members etc. I could only try to use the same package as the
original, but this is not nice.

The source documents are, as I said, PDF/A-1b compliant, all of them
created by the same output management system. So I guess the output
intents (whatever that means) are similar or identical.

Regards
--
Alexander Kriegisch








Re: How to merge PDF/A-1b documents and keep conformity

Posted by Tilman Hausherr <TH...@t-online.de>.
You didn't mention what went wrong. I had that problem once with 2 files 
from the same source, what I did is:

1) in 2.0 source code (I won't bother with 1.8) add this in 
PDFMergerUtility.appendDocument() above the comment "merge logical 
structure hierarchy":

         List<PDOutputIntent> srcOutputIntents = srcCatalog.getOutputIntents();
         for (PDOutputIntent outputIntent : srcOutputIntents)
         {
             destCatalog.addOutputIntent(outputIntent);
         }

then I edited the result PDF manually to remove one of the output 
intents. The result PDF should have something like this:

/OutputIntents [7 0 R 8 0 R]

just blank one of the two, e.g. like this:

/OutputIntents [7 0 R      ]

make sure that you don't change any positions, i.e. switch your editor 
(NOTEPAD++) to overwrite.
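
The manual edit described above (blanking one reference without shifting any
byte positions) could also be scripted. The following is only a sketch under
assumptions: `blankSecondReference` is a hypothetical helper, it handles just
the simple uncompressed `/OutputIntents [a 0 R b 0 R]` form shown above, and
it decodes the file as ISO-8859-1 so that every byte maps to exactly one
character.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OutputIntentBlanker {
    // Matches "/OutputIntents [n g R n g R]" and captures the second reference.
    private static final Pattern TWO_REFS = Pattern.compile(
        "/OutputIntents\\s*\\[\\s*\\d+ \\d+ R\\s+(\\d+ \\d+ R)\\s*\\]");

    // Returns a copy of the PDF bytes in which the second output intent
    // reference is overwritten with spaces. The length never changes, so
    // byte offsets recorded elsewhere in the file are not shifted.
    static byte[] blankSecondReference(byte[] pdf) {
        // ISO-8859-1 maps each byte to one char, so string indices equal byte indices.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Matcher m = TWO_REFS.matcher(text);
        byte[] result = pdf.clone();
        if (m.find()) {
            for (int i = m.start(1); i < m.end(1); i++) {
                result[i] = ' ';
            }
        }
        return result;
    }
}
```

If the pattern is not found, the input is returned unchanged, so the caller
can compare input and output to see whether anything was blanked.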

This may or may not work... if the two files have different output 
intents, then you'll have surprises, obviously.

I haven't done any code changes... I don't know for sure what element of 
the outputIntent is the "key" (so to skip others with the same key), and 
don't know what I should do if files have different ones. I suspect it 
is "OutputConditionIdentifier".


Example of an outputIntent:

<<
/Type/OutputIntent
/S/GTS_PDFA1
/OutputCondition(U.S. Web Coated \(SWOP\) v2)
/OutputConditionIdentifier(CGATS TR 001)
/Info(U.S. Web Coated \(SWOP\) v2)
/DestOutputProfile 4 0 R
 >>

4 0 obj

<<
/N 4
/Filter/FlateDecode
/Length 389758
 >>
stream
...
endstream

endobj


If you tell more what you're trying to do (one time only problem or 
not?), maybe I can help...

Tilman


