Posted to fop-dev@xmlgraphics.apache.org by Paul Reavis <pr...@partnersoft.com> on 2002/05/20 17:43:47 UTC

diffs for on-the-fly image support

Attached are gzipped diffs for the changes I made vs. the 0.20.3
release. I'm working on patches against CVS, but am pretty busy and
wanted to get something out soonest.

Essentially the patch includes:
-> support for callback-based, on-the-fly images (URLs like
"onthefly:SomeImage", you have to preregister the callback named
"SomeImage" before running the FOP transformation)

-> a modified PDFGraphics2D called PDFStreamGraphics2D that does not
use an intermediate byte buffer, but renders direct to a PDFStream

-> modified PDFStream so that it caches to tempfiles on disk rather
than to heap

-> modified the drawImage portion of PDFStreamGraphics2D so that it
only creates a new xObject for the image if it has never seen that
image before, otherwise it reuses the reference

The combination of these things took us from render times of up to 10
minutes and hundreds of megabytes of heap to render times of less than
10 seconds and less than 64MB of heap (the default max heap size).

-- 

Paul Reavis                                      preavis@partnersoft.com
Design Lead
Partner Software, Inc.                        http://www.partnersoft.com

Re: diffs for on-the-fly image support

Posted by Paul Reavis <pr...@partnersoft.com>.

J.U. Anderegg (hansuli.anderegg@bluewin.ch) wrote To fop-dev@xml.apache.org on Tue, May 21, 2002 at 05:31:53PM +0200:

> Inserting JPEG into a PDF file is a simple file copy - given the URI,
> bits/pixel and color model. The latter are coded within JPEG files. PDF
> stores the image once and allows multiple references to it. Is programmed
> caching superior to the caching of the file system?
> 
> From PDF view, memory = (JPEG file size + PDF encoded image) is needed at
> most during the lifetime of an output page in memory. Why isn't that so:
> device independence, AWT compatibility?
> 
> Similar considerations apply to GIF, TIFF and Fax formats.

I'm not sure exactly what you're referring to.

My hacks primarily address the issue I had of rendering large vector
plots of maps to pdf. The images that are used do not already exist as
jpegs or any other form; they are an amalgam of vector routines and
raster icons, and the icons are rotated in memory for
speed. Generating this mess to svg, then into pdf was very time
consuming and memory-intensive. So I switched to "rendering" directly
into the pdf using the existing PDFGraphics2D, which allowed me to use
the exact same routines that I use to render to the AWT window. 

Once I got that working I ran into memory problems, because the
current design of the PDF generation code keeps a lot of things in
memory as buffers, I believe because it doesn't know exactly where in
the pdf file the data will be placed at final output - it's juggling
layout etc.

So, this is not a case of file buffering, but of storing chunks
of rendered pdf for later use. My hack puts them in tempfiles rather
than in in-memory buffers. This is obviously slower but more scalable.
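
A minimal sketch of the tempfile idea described above, not the code from the patch. TempFileBuffer is a hypothetical name, and the real PDFStream change may look quite different. It only shows the shape of the approach: rendered chunks are appended to a file on disk and copied into the final output once their position is known.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;

// Hypothetical stand-in for a stream buffer that lives on disk instead of heap.
class TempFileBuffer implements AutoCloseable {
    private final File file;
    private final OutputStream out;
    private long length = 0;

    TempFileBuffer() throws IOException {
        file = File.createTempFile("pdfstream", ".tmp");
        file.deleteOnExit(); // safety net in case close() is never called
        out = new BufferedOutputStream(new FileOutputStream(file));
    }

    // Append a chunk of rendered PDF content to the on-disk buffer.
    void write(byte[] data) throws IOException {
        out.write(data);
        length += data.length;
    }

    // Number of bytes written so far.
    long length() {
        return length;
    }

    // Copy everything buffered so far to the final output; returns bytes copied.
    long copyTo(OutputStream target) throws IOException {
        out.flush();
        return Files.copy(file.toPath(), target);
    }

    // Close the stream and remove the temp file.
    @Override
    public void close() throws IOException {
        out.close();
        Files.deleteIfExists(file.toPath());
    }
}
```

Heap use stays roughly constant regardless of how much is buffered, at the cost of one extra write and one extra read per chunk.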

As for programmed caching being superior to file system caching, well,
that's another debate and really depends on the operating system. For
Windows systems, especially over SMB networks, the answer is generally
yes, because they are not aggressive about disk caching and they flush
a lot. Linux, on the other hand, is very aggressive (on my 754MB
development machine, 320MB is being used for disk cache) and flushes
less often. At least in my experience... in one case I got a 20x speed
increase with a decent caching framework; I finally noticed that machines
with crappy disk drive subsystems were far slower than those with good
ones even at the same memory and CPU speed.

-- 

Paul Reavis                                      preavis@partnersoft.com
Design Lead
Partner Software, Inc.                        http://www.partnersoft.com

---------------------------------------------------------------------
To unsubscribe, e-mail: fop-dev-unsubscribe@xml.apache.org
For additional commands, email: fop-dev-help@xml.apache.org

Re: diffs for on-the-fly image support

Posted by Keiron Liddle <ke...@aftexsw.com>.

A normal cvs checkout gives you the development code, which is different
from the current maintenance releases.

What you are describing can definitely be done with an extension (in the
devel code only, so this is for later).
In your fo:
<instream-foreign-object width=".." height="..">
<myImage xmlns="my-space" id="unique-id"/>
</instream-foreign-object>

This small bit of xml will then be passed to your extension available on
the user agent.
This extension gets the image and sets up the PDFGraphics2D and does its
thing.

It should be easier. This way the extra code is contained in a simple
extension. The difference is that you need to use
instream-foreign-object instead of image.
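
To make the idea concrete, here is a rough sketch of how such a handler could be dispatched by XML namespace. ForeignObjectHandler and ForeignObjectDispatcher are hypothetical names invented for illustration. They are not the FOP development API, whose handler interface and user agent may differ.

```java
import java.awt.Graphics2D;
import java.util.HashMap;
import java.util.Map;
import org.w3c.dom.Element;

// Hypothetical handler contract; FOP's real extension interface may differ.
interface ForeignObjectHandler {
    void handle(Element foreignXml, Graphics2D target);
}

// Hypothetical dispatcher keyed by XML namespace, standing in for the user agent.
class ForeignObjectDispatcher {
    private final Map<String, ForeignObjectHandler> handlers = new HashMap<>();

    // Register the handler responsible for one namespace, e.g. "my-space".
    void addHandler(String namespaceUri, ForeignObjectHandler handler) {
        handlers.put(namespaceUri, handler);
    }

    // Returns true if a handler for the element's namespace was found and run.
    boolean dispatch(Element foreignXml, Graphics2D target) {
        ForeignObjectHandler handler = handlers.get(foreignXml.getNamespaceURI());
        if (handler == null) {
            return false;
        }
        handler.handle(foreignXml, target);
        return true;
    }
}
```

A handler registered for "my-space" would read the id attribute from the element, look up the matching image, and paint it onto the supplied Graphics2D.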

This class is the default pdf extension that handles svg:
http://cvs.apache.org/viewcvs.cgi/xml-fop/src/org/apache/fop/render/pdf/PDFXMLHandler.java?rev=1.4&content-type=text/vnd.viewcvs-markup

On Wed, 2002-05-22 at 14:42, Paul Reavis wrote:
> In brief, the algo is this:
> 
> 1) before pdf generation, the client program sets up the on-the-fly
> snapshot objects - each is a subclass of OnTheFlyFopImage, supplying a
> paint(Graphics2D) routine.
> 
> 2) the client then registers the images somewhere in the FOP api (in
> my current hack, with FopImageFactory directly) with a url like "onthefly:uniquename"
> 
> 3) the client then runs the PDF generation
> 
> 4) the PDFRenderer, when it encounters an external image reference
> with an "onthefly:uniquename" URL, looks up the correspondingly-named
> OnTheFlyFopImage in the registry 
> 
> 5) the PDFRenderer then sets up a PDFGraphics2D and runs
> OnTheFlyFopImage.paint on it.
> 
> 6) at some point before or after pdf generation, the application can
> clear the registry, freeing up any memory used by the OnTheFlyFopImages.
> 
> If you can describe in general what the algo would be for an extension
> I'll be glad to try and implement it. Incidentally, am I getting the
> development or maintenance branch when I just do a `cvs checkout`?
> 
> Here are the actual examples from my current (outside of FOP)
> code. Incidentally, I really think there needs to be a library class
> with static methods like my convert() that allow a simple default
> embedding for folks - that's a lot of code to have to write just to
> run fop.




Re: diffs for on-the-fly image support

Posted by Paul Reavis <pr...@partnersoft.com>.

Keiron Liddle (keiron@aftexsw.com) wrote To FOP on Wed, May 22, 2002 at 10:30:45AM +0200:

> 
> Yes, several patches would be good, thanks.
> This way the appropriate ones can be applied to both code bases.
> 
> I agree that 3 is probably better and should be done for the development
> code. 1 is suitable for a quick solution for the maintenance branch.
> 
> As for the extension, this is really for the development code. I don't
> know exactly where you are getting your data etc. from but the new code
> could handle this as an extension. The svg drawing itself is an
> extension and it could be done in the same way. You supply a handler on
> the user agent, this handler receives some xml data and has access to
> the pdf document, streams etc. This could make it easier but I would
> need more info.

In brief, the algo is this:

1) before pdf generation, the client program sets up the on-the-fly
snapshot objects - each is a subclass of OnTheFlyFopImage, supplying a
paint(Graphics2D) routine.

2) the client then registers the images somewhere in the FOP api (in
my current hack, with FopImageFactory directly) with a url like "onthefly:uniquename"

3) the client then runs the PDF generation

4) the PDFRenderer, when it encounters an external image reference
with an "onthefly:uniquename" URL, looks up the correspondingly-named
OnTheFlyFopImage in the registry 

5) the PDFRenderer then sets up a PDFGraphics2D and runs
OnTheFlyFopImage.paint on it.

6) at some point before or after pdf generation, the application can
clear the registry, freeing up any memory used by the OnTheFlyFopImages.
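
The registry part of these steps can be sketched roughly as follows. OnTheFlyPainter and OnTheFlyRegistry are hypothetical names used only for illustration. In the patch itself the callback is an OnTheFlyFopImage subclass and registration goes through FopImageFactory, so treat this as the general shape rather than the actual code.

```java
import java.awt.Graphics2D;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical callback type; the patch uses an OnTheFlyFopImage subclass instead.
interface OnTheFlyPainter {
    void paint(Graphics2D graphics);
}

// Hypothetical registry standing in for the FopImageFactory hooks.
final class OnTheFlyRegistry {
    static final String SCHEME = "onthefly:";
    private static final Map<String, OnTheFlyPainter> PAINTERS = new ConcurrentHashMap<>();

    private OnTheFlyRegistry() {
    }

    // Step 2: register a painter under a unique name before running FOP.
    static void register(String name, OnTheFlyPainter painter) {
        PAINTERS.put(name, painter);
    }

    // Step 4: resolve "onthefly:name"; returns null for other URLs or unknown names.
    static OnTheFlyPainter lookup(String url) {
        if (url == null || !url.startsWith(SCHEME)) {
            return null;
        }
        return PAINTERS.get(url.substring(SCHEME.length()));
    }

    // Step 6: drop all painters so the memory they hold can be reclaimed.
    static void clear() {
        PAINTERS.clear();
    }
}
```

The renderer side (step 5) would call lookup() with the image URL and, if a painter comes back, hand it a Graphics2D that writes into the PDF.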

If you can describe in general what the algo would be for an extension
I'll be glad to try and implement it. Incidentally, am I getting the
development or maintenance branch when I just do a `cvs checkout`?

Here are the actual examples from my current (outside of FOP)
code. Incidentally, I really think there needs to be a library class
with static methods like my convert() that allow a simple default
embedding for folks - that's a lot of code to have to write just to
run fop.

... snip ....

    public void createOnTheFly(MapViewPanel sourcePanel, File reportDir) {
        SystemLog.singleton().enter("Creating on-the-fly snapshots...");
        try {
            FopImageFactory.clearCache();
            FopImageFactory.clearOnTheFlyImages();

            Iterator e = getSnapshots().iterator();
            int i = 0;
            while (e.hasNext()) {
                RenderMold currentSnapshot = (RenderMold)e.next();
                currentSnapshot.setMonochromeBackground(monochromeBackground);
                currentSnapshot.setInvertBackgroundColor(!noColorFiltering);
                currentSnapshot.setPrinting(true);
                SystemLog.singleton().enter("Rendering snapshot " + currentSnapshot + " to image");
                this.setDrawFinerThanScale(currentSnapshot.getScale());

                FopImageFactory.addOnTheFlyImage("Snapshot" + i, new OnTheFlySnapshot(sourcePanel, currentSnapshot));
                i++;
                }
            
            // wrap up
            this.setDrawFinerThanScale(null);
            }
        catch ( Exception oopsie ) {
            System.out.println("problem creating image in Snapshot source");
            Death.instant(oopsie);
            }
        }


... snip ....

    private class OnTheFlySnapshot extends OnTheFlyFopImage {
        
        private MapViewPanel sourcePanel;
        private RenderMold mold;
        
        public OnTheFlySnapshot(MapViewPanel sourcePanel, RenderMold mold) throws FopImageException {
            super("onthefly:Snapshot", viewFinder.getWidth(), viewFinder.getHeight());
            this.sourcePanel = sourcePanel;
            this.mold = mold;
            }
        
        public void paint(Graphics2D graphics) {
            if (isNoColorFiltering())
                GUILib.setRenderingHintsForPrinting(graphics);
            else
                GUILib.setRenderingHintsForInvertedPrinting(graphics);
            
            /*
            SystemLog.singleton().enter("Setting on-the-fly clip to: " + viewFinder.getWidth() + ", " + viewFinder.getHeight());
            graphics.setClip(0, 0, viewFinder.getWidth(), viewFinder.getHeight());
            */
            graphics.setFont(sourcePanel.getFont());
            if (noColorFiltering) {
                graphics.setColor(Color.black);
                graphics.fillRect(0, 0, viewFinder.getWidth(), viewFinder.getHeight());
                }
            else {
                graphics.setColor(Color.white);
                graphics.fillRect(0, 0, viewFinder.getWidth(), viewFinder.getHeight());
                }
            
            // iterate through layers and renderers to paint
            Iterator it = sourcePanel.layers();
            while (it.hasNext()) {
                MapViewLayer currentLayer = (MapViewLayer)it.next();
                SystemLog.singleton().enter("Rendering " + currentLayer + " to PDF");
                Iterator nit = currentLayer.getRenderers().iterator();
                while (nit.hasNext()) {
                    MapDataRenderer currentRenderer = (MapDataRenderer)nit.next();
                    currentRenderer.render(mold, graphics, null);
                    }
                }
                    
//             graphics.dispose();
            }
        }

... snip ....

    public static void convert(File xmlFile, File xslFile, File foFile, File pdfFile) {
        try {
            SystemLog.singleton().enter("Converting xml to pdf.");
            SystemLog.singleton().logMemoryUsage();
            // clear annoying image cache KLUDGE
            FopImageFactory.clearCache();

            Driver driverInstance = new Driver();
            Hierarchy hierarchy = Hierarchy.getDefaultHierarchy();
            PatternFormatter formatter = new PatternFormatter(
                                                              "[%{priority}]: %{message}\n%{throwable}" );
            
            LogTarget target = null;
            target = new StreamTarget(System.out, formatter);
            
            hierarchy.setDefaultLogTarget(target);

            Logger log = hierarchy.getLoggerFor("fop");
            log.setPriority(Priority.INFO);
            driverInstance.setLogger(log);

            PDFRenderer rendererInstance = new PDFRenderer();
            
            // set baseDir
            Configuration.put("baseDir", xmlFile.getParentFile().toURL().toExternalForm());
            
            SystemLog.singleton().enter("Converting " + xmlFile + " to " + foFile + " using " + xslFile);

            TransformerFactory tFactory = TransformerFactory.newInstance();
            Transformer transformer = tFactory.newTransformer(new StreamSource(xslFile));
            
            FileOutputStream foOut = new FileOutputStream(foFile);
            transformer.transform(new StreamSource(xmlFile), 
                                  new StreamResult(foOut)
                                  );
            foOut.close();

            SystemLog.singleton().enter("Converting " + foFile + " to " + pdfFile);
            InputHandler inputHandler = new FOInputHandler(IOLib.pathToURL(foFile.getCanonicalPath()));
            InputSource inputSource = inputHandler.getInputSource();
            BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(pdfFile));
                    
            // reset driver and supply properties
            driverInstance.reset();
            driverInstance.setRenderer(rendererInstance);
            driverInstance.setInputSource(inputSource);
            driverInstance.setOutputStream(out);
            driverInstance.run();
            out.close();
            }
        catch (IllegalArgumentException oopsie) {
            SystemLog.singleton().error(oopsie);
            SystemLog.singleton().enter("Probably demo, ignoring.");
            }
        catch (Exception oopsie) {
            SystemLog.singleton().error(oopsie);
            SystemLog.singleton().enter("Probably demo, ignoring.");
            }
        SystemLog.singleton().enter("Done converting xml to pdf.");
        SystemLog.singleton().logMemoryUsage();
        }

... snip ....



-- 

Paul Reavis                                      preavis@partnersoft.com
Design Lead
Partner Software, Inc.                        http://www.partnersoft.com


Re: diffs for on-the-fly image support

Posted by Keiron Liddle <ke...@aftexsw.com>.

Yes, several patches would be good, thanks.
This way the appropriate ones can be applied to both code bases.

I agree that 3 is probably better and should be done for the development
code. 1 is suitable for a quick solution for the maintenance branch.

As for the extension, this is really for the development code. I don't
know exactly where you are getting your data etc. from but the new code
could handle this as an extension. The svg drawing itself is an
extension and it could be done in the same way. You supply a handler on
the user agent, this handler receives some xml data and has access to
the pdf document, streams etc. This could make it easier but I would
need more info.

On Tue, 2002-05-21 at 16:00, Paul Reavis wrote:
> Agreed. Here are some possible solutions:
> 1) a boolean switch (in the api or system properties)
> 2) intelligence in the buffer itself, where it uses a tempfile after a
> certain size is reached
> 3) better overall architecture where buffers are immediately flushed
> to output rather than remaining in memory
> 
> (3) seems best and is in line with the next-gen design documents I see
> on the fop site, but I don't know how far along y'all are with that. I
> have to use a similar architecture for my map translation software;
> GIS systems are hundreds of megabytes and scalability requires a flat
> memory usage model. All my buffers are strictly memory-limited.
> 
> (1) is easy enough
> 
> (2) would be fine but probably has pitfalls; the problem is that there
> are a _lot_ of these buffers and PDFStreams running around, and
> therefore it's a global problem - I counted dozens for one plot, 24MB
> total. 
> 
> I was planning on using a switch for the cvs patch, unless y'all have
> (3) figured out.
> 
> > I don't see the need for an extra PDFStreamGraphics2D class. Modifying
> > the PDFGraphics2D should suffice.
> 
> Agreed. I just didn't want to break the existing (the current patch
> uses PDFStreamGraphics2D just for my case).
>  
> > An extension may work better in this situation with the development
> > code. If I understand the problem properly.
> 
> ?? An extension to the code, or a file extension for the URL? I'm not
> sure what you mean.
> 
> As far as my plans for the other features:
> 
> I figure the drawImage hack is a no-brainer. It's just the right thing
> to do in that instance. The additional memory usage should be no big
> deal (it's a hash of image pointer to integer ID).
> 
> I'll just modify PDFGraphics2D directly to use the underlying
> PDFStream. I think this is fine for all cases.
> 
> Should I break it up into several patches?
> -> tempfile buffering
> -> drawImage hack
> -> PDFGraphics2D hack
> -> on-the-fly images




Re: diffs for on-the-fly image support

Posted by Paul Reavis <pr...@partnersoft.com>.

Keiron Liddle (keiron@aftexsw.com) wrote To FOP on Tue, May 21, 2002 at 11:47:19AM +0200:

> I don't think we can apply this patch directly for a number of reasons.
> Although there are parts in it with value that should be put into cvs
> when you have finished.

I figured as much. Mainly I wanted to get an example out for anyone to
look at; the code I wrote is hardly high quality and I would rather do
a more careful modification against CVS.

Sorry about the formatting changes - emacs redid the indentation
according to my weird standard and I was just too lazy to fix it
back. Is there an apache or fop standard style or style canonicalizer?
I know some projects use a tool to fix style to a standard.

> Using temp files can cause problems in certain situations.

Agreed. Here are some possible solutions:
1) a boolean switch (in the api or system properties)
2) intelligence in the buffer itself, where it uses a tempfile after a
certain size is reached
3) better overall architecture where buffers are immediately flushed
to output rather than remaining in memory
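
For what it's worth, option (2) could look something like the sketch below. SpillingBuffer is a hypothetical class written for this illustration, not something in FOP or in the patch. It keeps data on the heap until a size threshold is reached, then moves it to a tempfile and continues writing there.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;

// Hypothetical buffer: stays on the heap until it reaches a threshold,
// then moves its contents to a temp file and keeps writing there.
class SpillingBuffer extends OutputStream {
    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private File spillFile;        // null while the data is still in memory
    private OutputStream fileOut;  // only set after spilling

    SpillingBuffer(int thresholdBytes) {
        this.threshold = thresholdBytes;
    }

    @Override
    public void write(int b) throws IOException {
        if (spillFile == null && memory.size() >= threshold) {
            spill();
        }
        if (spillFile == null) {
            memory.write(b);
        } else {
            fileOut.write(b);
        }
    }

    // Move the in-memory bytes to a temp file and release the heap copy.
    private void spill() throws IOException {
        spillFile = File.createTempFile("fopbuf", ".tmp");
        spillFile.deleteOnExit();
        fileOut = new BufferedOutputStream(new FileOutputStream(spillFile));
        memory.writeTo(fileOut);
        memory = null;
    }

    boolean isOnDisk() {
        return spillFile != null;
    }

    // Copy the buffered bytes, wherever they live, to the final output.
    void copyTo(OutputStream target) throws IOException {
        if (spillFile == null) {
            memory.writeTo(target);
        } else {
            fileOut.flush();
            Files.copy(spillFile.toPath(), target);
        }
    }

    @Override
    public void close() throws IOException {
        if (fileOut != null) {
            fileOut.close();
            Files.deleteIfExists(spillFile.toPath());
        }
    }
}
```

A very large threshold such as Integer.MAX_VALUE effectively turns spilling off, which gives the on/off behaviour of option (1) with the same class.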

(3) seems best and is in line with the next-gen design documents I see
on the fop site, but I don't know how far along y'all are with that. I
have to use a similar architecture for my map translation software;
GIS systems are hundreds of megabytes and scalability requires a flat
memory usage model. All my buffers are strictly memory-limited.

(1) is easy enough

(2) would be fine but probably has pitfalls; the problem is that there
are a _lot_ of these buffers and PDFStreams running around, and
therefore it's a global problem - I counted dozens for one plot, 24MB
total. 

I was planning on using a switch for the cvs patch, unless y'all have
(3) figured out.

> I don't see the need for an extra PDFStreamGraphics2D class. Modifying
> the PDFGraphics2D should suffice.

Agreed. I just didn't want to break the existing code (the current patch
uses PDFStreamGraphics2D just for my case).
 
> An extension may work better in this situation with the development
> code. If I understand the problem properly.

?? An extension to the code, or a file extension for the URL? I'm not
sure what you mean.

As far as my plans for the other features:

I figure the drawImage hack is a no-brainer. It's just the right thing
to do in that instance. The additional memory usage should be no big
deal (it's a hash of image pointer to integer ID).
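
A minimal sketch of that hash, assuming the image's object identity is the key. ImageXObjectCache is a hypothetical helper name, and how the IDs map onto actual PDF XObjects is left out.

```java
import java.awt.Image;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical helper: remembers which ID was assigned to each Image instance
// so a repeated drawImage call can reuse the existing reference.
class ImageXObjectCache {
    // Keyed by object identity, so equal-looking but distinct images get separate IDs.
    private final Map<Image, Integer> ids = new IdentityHashMap<>();
    private int nextId = 1;

    // True if this exact Image object was seen before (no new XObject needed).
    boolean contains(Image image) {
        return ids.containsKey(image);
    }

    // Returns the existing ID for this Image object, or assigns a new one.
    int idFor(Image image) {
        Integer known = ids.get(image);
        if (known != null) {
            return known;
        }
        int id = nextId++;
        ids.put(image, id);
        return id;
    }

    int size() {
        return ids.size();
    }
}
```

One thing to keep in mind is that the map holds a strong reference to every image it has seen, so it should be discarded along with the Graphics2D or document it belongs to.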

I'll just modify PDFGraphics2D directly to use the underlying
PDFStream. I think this is fine for all cases.

Should I break it up into several patches?
-> tempfile buffering
-> drawImage hack
-> PDFGraphics2D hack
-> on-the-fly images

-- 

Paul Reavis                                      preavis@partnersoft.com
Design Lead
Partner Software, Inc.                        http://www.partnersoft.com


AW: diffs for on-the-fly image support

Posted by "J.U. Anderegg" <ha...@bluewin.ch>.

Inserting JPEG into a PDF file is a simple file copy - given the URI,
bits/pixel and color model. The latter are coded within JPEG files. PDF
stores the image once and allows multiple references to it. Is programmed
caching superior to the caching of the file system?

From PDF view, memory = (JPEG file size + PDF encoded image) is needed at
most during the lifetime of an output page in memory. Why isn't that so:
device independence, AWT compatibility?

Similar considerations apply to GIF, TIFF and Fax formats.

Hansuli Anderegg




Re: diffs for on-the-fly image support

Posted by Keiron Liddle <ke...@aftexsw.com>.

Hi Paul,

I don't think we can apply this patch directly for a number of reasons.
Although there are parts in it with value that should be put into cvs
when you have finished.

The patch should be done against cvs rather than what you did, which
seems to be in reverse anyway (I suppose this is what you are working
on). It's better to avoid the various formatting changes, which really
confuse things.

Using temp files can cause problems in certain situations.
I don't see the need for an extra PDFStreamGraphics2D class. Modifying
the PDFGraphics2D should suffice.

An extension may work better in this situation with the development
code. If I understand the problem properly.

Thanks,
Keiron.

On Mon, 2002-05-20 at 17:43, Paul Reavis wrote:
> Attached are gzipped diffs for the changes I made vs. the 0.20.3
> release. I'm working on patches against CVS, but am pretty busy and
> wanted to get something out soonest.
> 
> Essentially the patch includes:
> -> support for callback-based, on-the-fly images (URLs like
> "onthefly:SomeImage", you have to preregister the callback named
> "SomeImage" before running the FOP transformation)
> 
> -> a modified PDFGraphics2D called PDFStreamGraphics2D that does not
> use an intermediate byte buffer, but renders direct to a PDFStream
> 
> -> modified PDFStream so that it caches to tempfiles on disk rather
> than to heap
> 
> -> modified the drawImage portion of PDFStreamGraphics2D so that it
> only creates a new xObject for the image if it has never seen that
> image before, otherwise it reuses the reference
> 
> The combination of these things took us from render times of up to 10
> minutes and hundreds of megabytes of heap to render times of less than
> 10 seconds and less than 64MB of heap (the default max heap size).
