Posted to fop-dev@xmlgraphics.apache.org by John Austin <jw...@sympatico.ca> on 2003/11/29 20:22:00 UTC

Properties Implementation and Canonical Mappings

In the interest of contributing to the proposed implementation
(instead of just trashing it), I wrote a simple Perl script to
get some counts out of a real-world XSL-FO file.

Input: The XSL-FO file produced from a DocBook file left over
from a dormant project of mine. The Perl program counts the
number of properties in the source file.

PDF size:           130 pages    // some users have a lot more
FO file size:       1.2 MB
Properties:         22,815
Unique prop names:  89           // bounded by the spec
Unique prop values: 2,227        // bounded by the real world
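For anyone who would rather reproduce counts like these in Java than in Perl, here is a minimal SAX sketch. It is not the Perl script used above; the class name PropertyCounter is hypothetical, and it assumes that every attribute on every element counts as one property, with namespace declarations excluded.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical Java counterpart of a property-counting script (not FOP code).
// Assumption: each attribute on each element is one property.
class PropertyCounter extends DefaultHandler {
    int properties;                                  // total attribute occurrences
    final Set<String> names = new HashSet<>();       // distinct attribute names
    final Set<String> values = new HashSet<>();      // distinct attribute values

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        for (int i = 0; i < atts.getLength(); i++) {
            String name = atts.getQName(i);
            if (name.equals("xmlns") || name.startsWith("xmlns:")) {
                continue;                            // namespace declarations are not properties
            }
            properties++;
            names.add(name);
            values.add(atts.getValue(i));
        }
    }

    // Parses an FO document held in a string and returns the filled-in counter.
    static PropertyCounter count(String xml) {
        PropertyCounter counter = new PropertyCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), counter);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse FO input", e);
        }
        return counter;
    }
}
```

Printing properties, names.size() and values.size() after a run over an FO file should give figures comparable to the last three rows of the table, though the exact numbers depend on what the original script treated as a property.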

Note that storing the property name and value refs supplied
to the Property constructor will use 45,630 strings (one name
and one value per property). If the Property implementation
employs canonical mapping to ensure that only one copy of each
unique string is stored, then just over 2,300 strings
(89 + 2,227) are required.
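A canonical mapping of this kind can be sketched with a plain HashMap that maps each string to the first instance seen with the same characters. StringTable is a hypothetical name used for illustration, not FOP or alt-design code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical canonical-mapping ("string table") sketch, not FOP code.
// Each distinct string value is stored once; later equal strings are
// replaced by that single shared instance.
class StringTable {
    private final Map<String, String> table = new HashMap<>();

    // Returns the canonical instance equal to s, registering s if it is new.
    String canonical(String s) {
        String existing = table.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    // Number of distinct strings held by the table.
    int size() {
        return table.size();
    }
}
```

If every property name and value passes through canonical() before the Property constructor keeps it, the duplicates become garbage as soon as the parser drops them. Because the table is an ordinary object, it can also be discarded along with the document.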

The property strings are given to the Property object
constructor by some path beginning with a SAX parser.
It is reasonable to assume that the SAX parser loses
refs to most of these strings and that the Property
implementation retains the only references to these 
String objects.

How big are String objects?
At least 16 bytes, plus storage for the characters.

What does this save us?
Probably only about 1,600,000 bytes for this file.
The CPU cost of creating strings is probably similar to the
cost of checking the string table for an existing copy.
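The byte estimate can be cross-checked with a few lines of arithmetic. SavingsEstimate is a hypothetical helper; it uses only the counts from the table above and assumes that no name string is also used as a value string.

```java
// Back-of-envelope check of the savings estimate (hypothetical helper).
// Assumes one name String and one value String per property, and that
// names and values never share a string.
class SavingsEstimate {
    // Strings kept without canonical mapping: one name + one value per property.
    static long stored(long properties) {
        return 2 * properties;
    }

    // Strings kept with canonical mapping: one per unique name or value.
    static long canonical(long uniqueNames, long uniqueValues) {
        return uniqueNames + uniqueValues;
    }

    public static void main(String[] args) {
        long stored = stored(22_815);             // 45,630
        long unique = canonical(89, 2_227);       // 2,316
        long avoided = stored - unique;           // 43,314 duplicate String objects
        System.out.println("avoided strings: " + avoided);
        System.out.println("header bytes at 16 each: " + avoided * 16);   // 693,024
        System.out.println("bytes per avoided string implied by 1.6 MB: "
                + 1_600_000 / avoided);           // 36 (integer division)
    }
}
```

So the 16-byte headers alone account for roughly 0.7 MB; the rest of the 1.6 MB figure corresponds to about 21 more bytes of character storage per duplicate string.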

What does it buy us?
It bounds one source of the current O(n) memory growth.
It also gets us in the habit of using another good technique.

I am already thinking along these lines:
The property lists for these FOs are usually generated by
programs and will be repeated many times. Perhaps we
could use larger, faster working property lists, consolidated
with canonical mappings, to save both time and space.
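The same canonical-mapping idea can be applied to whole property lists rather than single strings. The sketch below assumes a property list can be modelled as an immutable name-to-value Map; PropertyListTable is a hypothetical name, and FOP's actual property-list classes look different.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch (not FOP code): canonicalize whole property lists.
// Equal lists, regardless of attribute order, share one immutable Map.
class PropertyListTable {
    private final Map<Map<String, String>, Map<String, String>> lists = new HashMap<>();

    // Returns the shared instance equal to props, registering a copy if it is new.
    Map<String, String> canonical(Map<String, String> props) {
        // Immutable, sorted snapshot: safe to share and to use as a hash key.
        Map<String, String> snapshot = Collections.unmodifiableMap(new TreeMap<>(props));
        Map<String, String> existing = lists.putIfAbsent(snapshot, snapshot);
        return existing != null ? existing : snapshot;
    }

    // Number of distinct property lists seen so far.
    int size() {
        return lists.size();
    }
}
```

Sharing only works because the snapshots are immutable: an FO that needs a different value must get its own list instead of modifying the shared one.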

I am also thinking about handling properties more like a
C++ virtual function table (vtable). This object is larger
than Peter's ordered Property array, but would be faster.
That is one reason C++ virtual function dispatch is fast.
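A vtable-style layout can be sketched as a fixed array with one slot per property ID, so a lookup is a single index operation rather than a search by name. The PropId constants and the SlotPropertyList class are hypothetical; a real table would need one ID per XSL-FO property.

```java
// Hypothetical property identifiers; a real table would cover every XSL-FO property.
enum PropId { FONT_SIZE, FONT_FAMILY, SPACE_BEFORE, COLOR }

// vtable-style sketch (not FOP code): one slot per property ID, so get()
// is a single array index. Every instance pays for a slot per property,
// even unset ones, which is why this layout is larger than a sparse array.
class SlotPropertyList {
    private final String[] slots = new String[PropId.values().length];

    void set(PropId id, String value) {
        slots[id.ordinal()] = value;
    }

    // Returns the value for id, or null if it was never set on this list.
    String get(PropId id) {
        return slots[id.ordinal()];
    }
}
```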
-- 
John Austin <jw...@sympatico.ca>

Re: Properties Implementation and Canonical Mappings

Posted by John Austin <jw...@sympatico.ca>.
On Sat, 2003-11-29 at 16:35, J.Pietschmann wrote:
> Darn, recall the last post.
> 
> John Austin wrote:
> > Note that storing the property name and value refs supplied
> > to the Property constructor will use 45,630 strings. If the
> > Property implementation employs canonical mapping to ensure
> > that only one copy of each unique string is stored, then just
> > over 2,300 strings are required. 
> 
> Have a look at String.intern()

Bruce Eckel said not to trust it, for some reason. I have the 2nd
edition of "Thinking in Java" and the online one is the 3rd edition,
so I haven't found chapter and verse for this yet.

The only 'bad thing' said about it that I could find quickly was:

http://mindprod.com/jgloss/gotchas.html

The other good thing we can do is compare these string refs for
equality with == instead of calling equals().
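A short illustration of why interned strings can be compared by reference. InternDemo and sameCanonical are hypothetical names; String.intern() is the standard Java method mentioned above. Reference comparison is only safe when both sides are known to be canonical.

```java
// Shows why interned (canonical) strings can be compared with ==.
class InternDemo {
    // True when both strings intern to the same canonical instance, i.e. are equal.
    static boolean sameCanonical(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String a = new String("font-weight");          // two distinct heap objects
        String b = new String("font-weight");          // with identical characters
        System.out.println(a == b);                    // false: different objects
        System.out.println(a.equals(b));               // true: same characters
        System.out.println(sameCanonical(a, b));       // true: one shared copy
    }
}
```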


> J.Pietschmann
-- 
John Austin <jw...@sympatico.ca>

Re: Properties Implementation and Canonical Mappings

Posted by "J.Pietschmann" <j3...@yahoo.de>.
Darn, recall the last post.

John Austin wrote:
> Note that storing the property name and value refs supplied
> to the Property constructor will use 45,630 strings. If the
> Property implementation employs canonical mapping to ensure
> that only one copy of each unique string is stored, then just
> over 2,300 strings are required. 

Have a look at String.intern()


J.Pietschmann



Re: Properties Implementation and Canonical Mappings

Posted by "J.Pietschmann" <j3...@yahoo.de>.

Re: Properties Implementation and Canonical Mappings

Posted by John Austin <jw...@sympatico.ca>.
Input: The XSL-FO file produced from:
"DocBook: The Definitive Guide"

Document size:      648 pages          // for the O'Reilly edition
FO file size:       21,659,370 bytes
Properties:         526,648
Tags:               285,223
Height of tree:     17                 // max height of the parse tree
Unique prop names:  117                // bounded by the spec
Unique prop values: 13,520             // bounded by the real world

Using these numbers, we can explore the sort of benefits to expect
from a revised Property implementation. With over a million strings
(one name and one value per property), the FOTree for this document
would use forty or fifty MB for strings alone, in addition to its
other data structures.

This document can be used as an example even though FOP probably
can't format it (yet); it has a lot of tables. Generating this
well-known document could be a goal of the FOP project.

I was thinking of using the XSL-FO spec from the W3C web site, but
couldn't find the stylesheet used to make its FO file. If anyone knows
where to find it, please let me know.

Statistics from this file:

Number of Elements by tree level:
level=1 count=1
level=2 count=473
level=3 count=5242
level=4 count=5480
level=5 count=7129
level=6 count=26231
level=7 count=22475
level=8 count=36447
level=9 count=62288
level=10 count=38536
level=11 count=30486
level=12 count=23641
level=13 count=23190
level=14 count=2023
level=15 count=771
level=16 count=701
level=17 count=109

Element frequencies:
a 24    <==== I wonder where this came from?
fo:basic-link 5225
fo:block 112142
fo:conditional-page-master-reference 48
fo:external-graphic 1097
fo:flow 472
fo:footnote 22
fo:footnote-body 22
fo:inline 62792
fo:layout-master-set 1
fo:leader 1764
fo:list-block 279
fo:list-item 1004
fo:list-item-body 1004
fo:list-item-label 1004
fo:marker 5335
fo:page-number 1872
fo:page-number-citation 3224
fo:page-sequence 472
fo:page-sequence-master 12
fo:region-after 38
fo:region-before 38
fo:region-body 38
fo:repeatable-page-master-alternatives 12
fo:root 1
fo:simple-page-master 38
fo:static-content 4720
fo:table 6497
fo:table-body 6497
fo:table-cell 33174
fo:table-column 19225
fo:table-footer 1
fo:table-header 29
fo:table-row 15301
fo:wrapper 1799

Properties: 526648
Tags: 285223
num_keys: 117
num_vals: 13520
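Per-level and per-element counts like the ones above can be collected in one SAX pass by tracking the nesting depth. This is a hypothetical Java sketch (TreeStats is an illustrative name), not the Perl script that produced these statistics.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: element counts by tree level and by element name.
class TreeStats extends DefaultHandler {
    private int depth;                                           // current nesting level
    int height;                                                  // deepest level seen
    final Map<Integer, Integer> countByLevel = new TreeMap<>();  // level -> elements
    final Map<String, Integer> countByName = new TreeMap<>();    // qName -> elements

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;                                                 // the root element is level 1
        height = Math.max(height, depth);
        countByLevel.merge(depth, 1, Integer::sum);
        countByName.merge(qName, 1, Integer::sum);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }

    // Parses an FO document held in a string and returns the collected stats.
    static TreeStats of(String xml) {
        TreeStats stats = new TreeStats();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), stats);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse FO input", e);
        }
        return stats;
    }
}
```

Summing countByLevel gives the total tag count; for this document the per-level counts above add up to exactly 285,223, matching the Tags line.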


-- 

John Austin <jw...@sympatico.ca>

Re: Properties Implementation and Canonical Mappings

Posted by John Austin <jw...@sympatico.ca>.
On Mon, 2003-12-01 at 02:45, Glen Mazza wrote:
> --- John Austin <jw...@sympatico.ca> wrote:
> > 
> > The property strings are given to the Property object
> > constructor by some path beginning with a SAX parser.
> > It is reasonable to assume that the SAX parser loses
> > refs to most of these strings and that the Property
> > implementation retains the only references to these 
> > String objects.
> > 
> > How big are String Objects ? 
> > At least 16 bytes plus storage for characters. 
> > 
> > What does this save us ? 
> > Probably only about 1,600,000 bytes for this file. 
> > CPU cost of creating strings is probably similar to 
> > cost of checking string table for a copy.
> > 
> 
> Just to clarify, the (additional?) "CPU cost" you're
> mentioning above is *not* occurring for the present
> process, correct?  I think you're referring to the
> cost that would be added as a result of the changes
> you're recommending (because there now will be a
> string table search to avoid duplication).

Going back to the beginning of my involvement, I found this
issue because Property searches are the biggest CPU consumer
in FOP. I don't want to split hairs in isolation over which
search/constructor sequence is faster. I want to remove the
conditions that cause the current pathology.

Hash table lookups are FAST. Whatever we invest at object-creation
time, we recover many times over in later lookups.
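A sketch of that trade-off: building a HashMap index when properties are added costs a little extra at creation time, but each later lookup by name avoids a linear scan of the list. PropertyLookup is a hypothetical class, and it assumes property names are unique within one list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical comparison (not FOP code): find a property by linear scan
// versus by a HashMap index built while the list is filled.
// Assumes each property name occurs at most once per list.
class PropertyLookup {
    static final class Prop {
        final String name;
        final String value;

        Prop(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    private final List<Prop> list = new ArrayList<>();
    private final Map<String, Prop> index = new HashMap<>();

    void add(String name, String value) {
        Prop p = new Prop(name, value);
        list.add(p);
        index.put(name, p);              // extra work paid once, at creation time
    }

    // Linear scan: cost grows with the number of properties on the list.
    String findByScan(String name) {
        for (Prop p : list) {
            if (p.name.equals(name)) {
                return p.value;
            }
        }
        return null;
    }

    // Hash lookup: expected constant time per call, however long the list is.
    String findByHash(String name) {
        Prop p = index.get(name);
        return p == null ? null : p.value;
    }
}
```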

> Also, the "string table" you mention--I think you're
> speaking generically, but is there a specific, already
> available construct in Java that we can use for this
> purpose in FOP?  I'd like to find out what you have in
> mind for a specific implementation.

HashMap works fine the way Peter has it set up in alt-design.

I use the same construct in the Perl code I use to analyze the
large sample FO files.

-- 
John Austin <jw...@sympatico.ca>

Re: Properties Implementation and Canonical Mappings

Posted by Glen Mazza <gr...@yahoo.com>.
--- John Austin <jw...@sympatico.ca> wrote:
> 
> The property strings are given to the Property object
> constructor by some path beginning with a SAX parser.
> It is reasonable to assume that the SAX parser loses
> refs to most of these strings and that the Property
> implementation retains the only references to these 
> String objects.
> 
> How big are String Objects ? 
> At least 16 bytes plus storage for characters. 
> 
> What does this save us ? 
> Probably only about 1,600,000 bytes for this file. 
> CPU cost of creating strings is probably similar to 
> cost of checking string table for a copy.
> 

Just to clarify, the (additional?) "CPU cost" you're
mentioning above is *not* occurring for the present
process, correct?  I think you're referring to the
cost that would be added as a result of the changes
you're recommending (because there now will be a
string table search to avoid duplication).

Also, the "string table" you mention--I think you're
speaking generically, but is there a specific, already
available construct in Java that we can use for this
purpose in FOP?  I'd like to find out what you have in
mind for a specific implementation.

Thanks,
Glen
