Posted to fop-users@xmlgraphics.apache.org by Stepan RYBAR <xr...@seznam.cz> on 2009/10/09 12:22:20 UTC

What is the fastest way to create 100,000 unique PDFs?

Hello! 

I am looking for the fastest way to create 100,000 unique PDFs. I have 10 unique XML files of about 25 MB each (the output of an SQL database); each one contains the data for 10,000 unique letters. From these I have to create 100,000 letters as PDFs of 1 or 2 pages each. Currently I do it the following way: 

1) For each of the 10 XML files, run an XSLT 2.0 transformation with Saxon 9.2, which produces 100,000 FO files of 4 to 7 kB each. Technically I do this with 10 invocations of Saxon, splitting the input with the XSLT 2.0 xsl:result-document instruction. This step is fast enough for me; it takes about 18 minutes. 

2) Now the painful part. For each of the 100,000 FO files, convert FO to PDF with Apache FOP 0.95, producing 100,000 PDF files of a few kB each. Technically I do this with 100,000 separate command-line invocations of Apache FOP. I guess the problem is the time spent starting and shutting down Apache FOP for every call. Is there a more efficient way to do this? For example, can I specify a wildcarded filename instead of an exact input file? Or do I have to do some Java programming with Apache FOP? Or can Apache FOP do some kind of output splitting? 
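
If the Java route is acceptable, a minimal embedding sketch along these lines (FOP 0.95 API; the class name and directory arguments are illustrative, and the FOP jars must be on the classpath) keeps a single JVM and a single FopFactory alive for all documents, which removes the per-file startup cost:

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;

import org.apache.fop.apps.Fop;
import org.apache.fop.apps.FopFactory;
import org.apache.fop.apps.MimeConstants;

public class BatchFo2Pdf {
    public static void main(String[] args) throws Exception {
        // Create the heavyweight factories once and reuse them for every file.
        FopFactory fopFactory = FopFactory.newInstance();
        TransformerFactory tFactory = TransformerFactory.newInstance();

        File inDir = new File(args[0]);   // directory containing the .fo files
        File outDir = new File(args[1]);  // directory for the generated PDFs

        for (File foFile : inDir.listFiles()) {
            if (!foFile.getName().endsWith(".fo")) {
                continue;
            }
            File pdfFile = new File(outDir,
                    foFile.getName().replaceAll("\\.fo$", ".pdf"));
            OutputStream out = new BufferedOutputStream(
                    new FileOutputStream(pdfFile));
            try {
                // A fresh Fop instance per document (they are not reusable),
                // but the shared factory keeps its caches warm.
                Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, out);
                // Identity transform feeds the FO file into FOP's SAX handler.
                Transformer transformer = tFactory.newTransformer();
                transformer.transform(new StreamSource(foFile),
                        new SAXResult(fop.getDefaultHandler()));
            } finally {
                out.close();
            }
        }
    }
}
```

This way the JVM startup, JIT compilation, and factory initialization are paid for once instead of 100,000 times; only the per-document rendering work remains in the loop.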

I tried to speed this up with parallel conversion, running 

for %%f in (*0.fo) do fop... 
for %%f in (*1.fo) do fop... 
for %%f in (*2.fo) do fop... 
... 
in separate batch files. 

This works because my files are named sequenceNumber.fo, so the loops partition the work and utilize more cores; it should have an even bigger effect on machines with 10+ cores. Core utilization is at 100 % and RAM usage is fine. 
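
For completeness, the separate-console approach can also be driven from a single master batch file with start (a sketch; it assumes fop.bat is on the PATH and that each PDF should land next to its FO file, %%~nf being the filename without extension):

```
start "fo-batch-0" cmd /c "for %%f in (*0.fo) do fop -fo %%f -pdf %%~nf.pdf"
start "fo-batch-1" cmd /c "for %%f in (*1.fo) do fop -fo %%f -pdf %%~nf.pdf"
...
```

Each start line launches its loop in its own console, so the ten loops run concurrently; note that every single fop call still pays the full JVM startup cost.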

I am using Sun Java 1.6.0_05 on MS Windows Vista 32-bit Enterprise, an Intel Core2 6600 at 2.4 GHz, and 2.0 GB RAM, running java with -Xmx128m. 

Thank you in advance for your help. Stepan

---------------------------------------------------------------------
To unsubscribe, e-mail: fop-users-unsubscribe@xmlgraphics.apache.org
For additional commands, e-mail: fop-users-help@xmlgraphics.apache.org


Re: What is the fastest way to create 100,000 unique PDFs?

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
Hi Stepan

What costs most in your setup is actually the JVM startup (including
class loading and just-in-time compilation), and that cost is paid once
per document. And no, FOP cannot split anything from the command line.

You could try FOP's Ant task [1] if you're not a Java programmer. Ant [2]
allows you to parallelize [3] tasks, and with the FOP integration this
should let you run just a single JVM for all documents. You could even do
the XSLT in Ant, i.e. run your whole process there.
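
A build.xml sketch along those lines (the paths are assumptions for a local FOP 0.95 install; the <fop> attributes follow the task documentation [1]):

```xml
<project name="fo2pdf" default="pdf">

  <!-- Make FOP's Ant task available; point the classpath at your FOP install. -->
  <taskdef name="fop" classname="org.apache.fop.tools.anttasks.Fop">
    <classpath>
      <fileset dir="C:/fop-0.95/lib" includes="*.jar"/>
      <fileset dir="C:/fop-0.95/build" includes="fop.jar"/>
    </classpath>
  </taskdef>

  <target name="pdf">
    <!-- One JVM for all documents; the fileset replaces 100,000 CLI calls. -->
    <fop format="application/pdf" outdir="out" messagelevel="warn">
      <fileset dir="fo" includes="*.fo"/>
    </fop>
  </target>

</project>
```

Wrapping several <fop> elements, each with a disjoint fileset (*0.fo, *1.fo, ...), inside <parallel> [3] would additionally spread the work over multiple cores within that single JVM.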

HTH

[1] http://xmlgraphics.apache.org/fop/0.95/anttask.html
[2] http://ant.apache.org/
[3] http://ant.apache.org/manual/CoreTasks/parallel.html

On 09.10.2009 12:22:20 Stepan RYBAR wrote:
> [...]



Jeremias Maerki

