You are viewing a plain text version of this content. The canonical link for it is here.

Posted to fop-dev@xmlgraphics.apache.org by Finn Bock <bc...@worldonline.dk> on 2004/01/10 14:34:48 UTC

[Bug 25480] - Experimental performance improvements.

[Glen in bugzilla]

> http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25480

> I'm now looking at the changes to PropertySets.java (already applied) and 
> PropertyList.java (only partly so) and have a few questions:
> 
> 1.) For the new PropertyList constructor (in the patch), you appear to be 
> duplicating the element ID argument, once as "el", the other time 
> as "elementId"--just to confirm, they are referring to the same thing (and 
> hence one of them can be removed)?

Yes, it is a silly leftover from my own conversion process.

> ----------------
> 
> 2.) In PropertySets.java (already applied), method makeSparseIndices, you 
> define indices[0] as: 
> 
> indices[0] = (short) (set.cardinality() + 1);
> 
> Later, in PropertyList, you initialize the values array as follows:
> 
> this.values = new Property[indices[0]];
> 
> I think we can then just use set.cardinality() in makeSparseIndices(), 
> correct?  (i.e., leave out the +1).

No, the + 1 is a deliberate trick to handle unknown properties which
should return a null value during lookup(). The other part of the trick
is the

    int j = 1;

in makeSpareIndices() which ensure that all the unknown properties in a
indices array all have a value of '0' and all known properties has a
value > 0. For an element fo:bar which support 2 properties foo=21 and
baz=137, the indices array has the values

    indices[21] = 1
    indices[137] = 2

and all other values are 0. The PropertyList.values array then look
like this:

    values[0] = null // always null.
    values[1] = reference to a 'foo' Property instance
    values[2] = reference to a 'baz' Property instance

In the performance sensitive PropertyList.lookup(), the code doesn't
have to test for properties that are unknown by fo:bar

    return values[indices[propertyName]];

because all unknown properties map to the values[0] index which always
have a null value.

> ------------------
> 
> 3.) PropertySets.java defines those properties which are valid for each FO--in 
> PropertyList, the proposed implementation then uses that information to limit 
> the properties that can be assigned to an FObj (i.e., only those defined as 
> valid for it.)  Am I correct here on this point?

And it includes the properties that are valid for all the children element of
each FO. That is what the large while loop does when it calls mergeContent.
It keeps pulling the childrens properties into the parent and repeats doing
it for all the FOs until no more properties can be pulled (yes, there has
to be a better way of doing that).

> ... and do we also need to somehow additionally qualify *those* 
> properties as "valid for the FO but not directly relevant for it"?

I don't think so, and the patch doesn't do it. ProperttySets only return
the set of properties that are valid for a FO.

> ------------------------------------
> 
> 4.)  Finally, I'm too far removed from my C programming days to understand the 
> math here:
> 
> In the PropertyList constructor, you code this:
> 
>    this.specified = new int[(indices[0] >> 5) + 1];  
> 
> (where indices[0] defines the number of properties valid for the FObj)
> 
> Why the bitshifting 5 to the right?  What does this accomplish--what is this 
> shorthand for?

The 32 bits that is stored in a int. See below.

> also, in putSpecified(int idx, Property value), you code this:
> 
>    specified[i >> 5] |= 1 << (i & 31);
> 
> I'm not clear what this is doing either.  What does putSpecified() do, and 
> what's the point of the i & 31 and the Or'ing?  

PropertList.specified is just a bitmap of the same size as values.length.
All the shifting and masking is similar to what is done in
java.util.BitSet, inlined in PropertyList for performance.

putSpecified() insert a property in the array and set the specified bit to
true. This is in contrast to the inherit() method which also inserts a
property in the array but doesn't set the specified bit. The other put()
method isn't used anymore.

I also think that inlining the bitset implementation is going to far,
but it does make a performance difference, so I included it in the patch.

[Andreas L. Delmelle]

 > IIC the initial strategy WRT inherited properties was to add methods to the
 > FObj's to get these from their parent. I think the problem with this
 > implementation is that, in the case of very large documents with deeply
 > nested elements that inherit a property which is specified at the top-level,
 > you would end up with one getter being dispatched to the parent's getter,
 > and this in its turn being dispatched to yet another ancestor's getter (or
 > Makers in case of Property creation)... In this case, however, I think you
 > can't fully 'push' these onto the descendants, as this would lead to absurd
 > storage-reqs for quite average-sized documents.
 >
 > OTOH, the inherited property value (resolved or unresolved) can indeed be
 > supposed as available at parse time, because a parent is per se processed
 > *before* any of its children.

Absolutely correct, and I'm not at all sure that we should go to an array
based storage of the properties in PropertyList, as I did in the patch.
Using arrays has strengths:

- speed during lookup.
- possible to pushing inherited props into the children.

and weaknesses:

- much, much larger memory consumption.

I have some ideas regarding releasing the array memory earlier during the
endElement event that I will try out, but until we have a solution to
the memory consumption, I think we are better of with the current HashMap
implementation of property storage and then delay the implementation of
pushing inherited props into the children.

 > I just wonder if this has something to do with
 > Finn's other idea of moving logic to the property side to save on
 > unnecessary calls to empty methods ?

No, it is unrelated, but while I made the patch to remove the generated
maker classes

   http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25873

, I've concluded that it would also be good design decision and I plan
on updating patch 25873 to show that the Property.Maker can become
simpler and easier to understand as well as faster.

regards,
finn

Re: [Bug 25480] - Experimental performance improvements.

Posted by Finn Bock <bc...@worldonline.dk>.

[me]

> 1) Only store the specified properties. That is what HEAD does now.
> 2) Put the relevant properties in a fast Property[] array and the
>    remaining specified properties in a HashMap. For fo:root the result
>    would be an array of size 1 for the 'media-usage' property.
> 3) Expect to store every valid property. For fo:root that would mean
>    allocating an array large enough to store every defined property.
>    This is what my patch does, and the "values" array works as the
>    PropertyWindow.
> 
> ... I'll try to come up with some numbers to see
> how much memory that would use/save compared to 1) and 3).

You can find the counts of relevant and valid properties for each 
element type here:

    http://bckfnn-modules.sf.net/valid
    http://bckfnn-modules.sf.net/relevant

and a trace of the number of (base) properties defined at each element 
in the DocBook:TDG example I've been using all along.

    http://bckfnn-modules.sf.net/prop-count

Adding these numbers together in a rough sort of way, I've come to this 
result. The number to the right is the amount of bytes it takes to store 
the references to the properties.

1) hashsize(specified):                 713828
2) relevant * 4 + hashsize(specified)  3906168
3) valid * 4                           7007052

Where hashsize is a function
    cnt * (16 + 12) + int(cnt * 1.3333) * 4
that tries calculate to amount of memory consumed by each entry in a 
HashMap. 16 bytes is taken by each HashMap.Entry object and I estimate 
12 bytes system overhead for each object. 1.3333 is the loadfactor of 
the HashMap.table array.

The number for 2) is most likely too high, it should only be necessary 
to store the non-relevant properties in the HashMap, but I don't have 
the count of non-relevant properties available.

The full summary can be found here:
    http://bckfnn-modules.sf.net/summary

occur: the number of times the element occurs in the input.
spec:  the number of specified properties on all the occurrences of that
        element type.
relev: the maximum number of relevant slots that can be filled on all
        the occurrences of that element type.
valid: the maximum number of valid slots that can be filled on all
        the occurrences of that element type.


regards,
finn

Re: [Bug 25480] - Experimental performance improvements.

Posted by "Peter B. West" <pb...@powerup.com.au>.

Finn Bock wrote:
> [Peter B. West]
> 
>> Alt-design (trying the hyphen for a while) takes different approaches 
>> at different times.  While building the subtree of any node, all of 
>> the properties are maintained in a HashMap, along with a BitSet of 
>> specified properties.
>>
>> When the subtree construction is complete, the HashMap and BitSet are 
>> used to build the sparse array of only the relevant *resolved* 
>> property values 
> 
> 
> If I understand the Alt-design code correctly, the function calls, like 
> from-parent(), are resolved but percentage are not resolved at this 
> point, but still saved as an IndirectValue, right? The percentage will 
> be resolved at a later stage?

Basically, yes.  There are complications with from-parent() and 
from-nearest-specified-value() when used with shorthands.  These have 
lead to the creation of the FromParent and FromNearestSpecified 
pseudo-types.

>> (not properties - one of the differences with HEAD) 
> 
> 
> I think you have mentioned this before, but is it such a big difference? 
> HEAD wraps its datavalues in a very thin Property wrapper, but otherwise 
> there is a one-to-one binding between a HEAD Property and its value.

I freely admit that when I started working with properties, I had only 
the fuzziest notion of the way they were processed in the original code. 
  I'm not a lot better informed now.  However, the idea of expressing 
properties in terms of data values still seems to me to be strange. 
Even though individual properties may *eventually* resolve to a 
particular basic type, the road there can be very complicated.  It 
seemed to me that I should be able to process property values into a 
range of possible data types (e.g. enumerations and lengths), postponing 
the resolution into a particular, say, length, as long as possible.

The other issue was that some types (enumerations, strings) resolve 
eventually into very different types depending on the property on which 
they are expressed.  The Rec. (and consequently the parser) allow a 
multiplicity of different data types in the expressions on many 
properties.  It just seemed cleaner to me to separate properties (which 
have certain static characteristics) out from data types.  That way, I 
have the option of resolving different datatypes into their final 
datatype downstream of the parsing and FO tree building.

I am open to enlightenment on this.

Peter
-- 
Peter B. West <http://www.powerup.com.au/~pbwest/resume.html>

Re: [Bug 25480] - Experimental performance improvements.

Posted by Finn Bock <bc...@worldonline.dk>.

[Peter B. West]

> Alt-design (trying the hyphen for a while) takes different approaches at 
> different times.  While building the subtree of any node, all of the 
> properties are maintained in a HashMap, along with a BitSet of specified 
> properties.
> 
> When the subtree construction is complete, the HashMap and BitSet are 
> used to build the sparse array of only the relevant *resolved* property 
> values 

If I understand the Alt-design code correctly, the function calls, like 
from-parent(), are resolved but percentage are not resolved at this 
point, but still saved as an IndirectValue, right? The percentage will 
be resolved at a later stage?

> (not properties - one of the differences with HEAD) 

I think you have mentioned this before, but is it such a big difference? 
HEAD wraps its datavalues in a very thin Property wrapper, but otherwise 
there is a one-to-one binding between a HEAD Property and its value.

regards,
finn

Re: [Bug 25480] - Experimental performance improvements.

Posted by "Peter B. West" <pb...@powerup.com.au>.

Finn Bock wrote:
> [Glen]
> 
>> One thing I'm missing here, for Finn's design below:
>>
>> values[0] = null // always null.
>> values[1] = reference to a 'foo' Property instance
>> values[2] = reference to a 'baz' Property instance
>>
>> Can't we just have, say, values[1] refer to the
>> parent's Property instance in cases where inheritance
>> is applicable?
>>
>> I.e., for all those references to the 'foo' property
>> instance for the children of an FO where that value
>> would be inherited, we don't have to create a new
>> Property instance, just a reference to the inherited
>> instance.
> 
> 
> Right, that is also the way I see it.
> 
>> But if the problem is the *recursion* necessary to
>> determine the property instance to inherit--here, not
>> the memory problem, but processing speed--I'm thinking
>> of a PropertyWindow instance as follows:
>>
>> A PropertyWindow would be used temporarily during FObj
>> property initialization to hold references to all the
>> property instances that would be relevant for that
>> FObj should a property not be explicitly defined. 
>> So, to populate the property instances for a
>> particular FObj, i.e., the "values" array:
> 
> 
> Pardon me for repeating what might be obvious, but I'll like to take
> a look at what information we want to store at each FObj. I can come
> up with 3 different strategies:
> 
> 1) Only store the specified properties. That is what HEAD does now.
> 2) Put the relevant properties in a fast Property[] array and the
>    remaining specified properties in a HashMap. For fo:root the result
>    would be an array of size 1 for the 'media-usage' property.
> 3) Expect to store every valid property. For fo:root that would mean
>    allocating an array large enough to store every defined property.
>    This is what my patch does, and the "values" array works as the
>    PropertyWindow.
> 
> As I understand your PropertyWindow proposal, it would allow us to
> implement no. 2) above. I'll try to come up with some numbers to see
> how much memory that would use/save compared to 1) and 3).

Alt-design (trying the hyphen for a while) takes different approaches at 
different times.  While building the subtree of any node, all of the 
properties are maintained in a HashMap, along with a BitSet of specified 
properties.

When the subtree construction is complete, the HashMap and BitSet are 
used to build the sparse array of only the relevant *resolved* property 
values (not properties - one of the differences with HEAD) and then 
thrown away.

This approach has to be modified in two environments - fo:static-content 
and fo:marker.  In the case of fo:marker, the inherited environment is 
not known at parse time, and in the case of static-content, the 
appropriate fo:retrieve-marker subtrees are not know until the 
region-body area tree is constructed.

In general, the impact on storage of maintaining full details for 
fo:static-content and fo:marker will not be huge, even if these are 
parsed as encountered.  However, the plan for alt-design is 1) not the 
parse an fo:marker subtree unless and until it is required, and 2) to 
re-parse fo:static-content for each page after the region-body area tree 
has been constructed. (I'm working on these modifications now.)

Peter
-- 
Peter B. West <http://www.powerup.com.au/~pbwest/resume.html>

Re: [Bug 25480] - Experimental performance improvements.

Posted by Glen Mazza <gr...@yahoo.com>.

--- Finn Bock <bc...@worldonline.dk> wrote:
> > I.e., for all those references to the 'foo'
> property
> > instance for the children of an FO where that
> value
> > would be inherited, we don't have to create a new
> > Property instance, just a reference to the
> inherited
> > instance.
> 
> Right, that is also the way I see it.
> 

Good, all three of us are on the same page.

> 2) Put the relevant properties in a fast Property[]
> array and the
>     remaining specified properties in a HashMap. For
> fo:root the result
>     would be an array of size 1 for the
> 'media-usage' property.

No, I wasn't thinking of having a HashMap at all--just
a property array--first populated by incoming
specified properties, then populated by querying the
PropertyWindow for ancestor already-created property
instances.

The PropertyWindow is only a FO-tree build time
convenience for quickly attaching already created
property instances to the current node, without
needing to make recursive calls up the tree to obtain
those instances.  It is not needed if (1) the
recursion is not that big a deal in time, or (2) it is
not desired to store an array of all valid instances
for each FObj (i.e., we should continue something
similar to what HEAD does presently).  

However, its usage may also eliminate the need to
store all relevant properties for the children in the
array of each FObj.  (i.e., fo:root can return to just
storing media-usage.)  Of course, it is certainly more
space-efficient to store 250 properties at the top
than at each of the child nodes.  Ultimately, it
depends on what you want the property arrays for each
FObj instance to store.

Glen

__________________________________
Do you Yahoo!?
Yahoo! Hotjobs: Enter the "Signing Bonus" Sweepstakes
http://hotjobs.sweepstakes.yahoo.com/signingbonus

RE: [Bug 25480] - Experimental performance improvements.

Posted by "Andreas L. Delmelle" <a_...@pandora.be>.

> -----Original Message-----
> From: Peter B. West [mailto:pbwest@powerup.com.au]
>
<snip />
> If I mentioned PropertyValue singletons, it was a slip of the fingers.
> I maintain Property singletons, which are exist solely to provide access
> to certain "static" information about individual properties.
>

Don't worry, your fingers still OK. The slip was all mine...

In any case: thanks for a very fine explanation. I'm going to digest your
remarks first, then maybe I'll be back for more.


Cheers,

Andreas

Re: [Bug 25480] - Experimental performance improvements.

Posted by "Peter B. West" <pb...@powerup.com.au>.

Andreas L. Delmelle wrote:
> Sorry for the long post... Just trying to put a few ideas together.
> 
> 
>>Alt-design (trying the hyphen for a while) takes different approaches at
>>different times.  While building the subtree of any node, all of the
>>properties are maintained in a HashMap, along with a BitSet of specified
>>properties.
>>
>>When the subtree construction is complete, the HashMap and BitSet are
>>used to build the sparse array of only the relevant *resolved* property
>>values (not properties - one of the differences with HEAD) and then
>>thrown away.
>>
> 
> 
> Yum... so a Collection of Objects is traded for a handful of --what? (Would
> this be the propertyValue singletons you mention?)

If I mentioned PropertyValue singletons, it was a slip of the fingers. 
I maintain Property singletons, which are exist solely to provide access 
to certain "static" information about individual properties.

> What happens with
> 'unresolvable' Props at that point (for the cases you mention below)?

See the package org.apache.fop.datatypes.indirect, especially 
IndirectValue, which the others extend.  IndirectValue itself extends 
AbstractPropertyValue.  Unresolved values take their place alongside the 
resolved properties.

> Do
> they get thrown away, or placed in a sort of Property stack for later
> accessing? If so, how exactly do you indicate in the FObj (or FONode) that
> the property in question still has to be resolved? So that, when Layout asks
> for this property, it is signaled that some computations still need to be
> made? (Pardon me if these questions mean I haven't read your docs thoroughly
> enough; I did read some of them, but it seems I still miss a bit of
> technical background to fully understand it.)

The docs are far from adequate.

> 
>>This approach has to be modified in two environments - fo:static-content
>>and fo:marker.  In the case of fo:marker, the inherited environment is
>>not known at parse time, and in the case of static-content, the
>>appropriate fo:retrieve-marker subtrees are not know until the
>>region-body area tree is constructed.
>>
> 
> 
> Yes... cases I was overlooking --not so plain inheritance as the FObj's for
> static-content appear only once or twice every page-sequence, but their
> inherited properties and values vary, depending on the further processing of
> the tree. Still, I'm wondering whether you really need to re-parse them (as
> you indicated)... Couldn't you just, say, keep the Property alive and alter
> its value when needed? (If this would save you any re-initializing, I mean)
> The re-parsing idea seems very interesting to be able to say after a
> threesome of pages have been processed: collapse the branches of the FOTree
> that have already been layed out to the level below the current fo:flow (or
> fo:block; in any case some logical point of reference). If downstream, it
> turns out that their layout needs to be modified again (what are the odds?
> any way of predicting this?), one could trigger a re-parse from the subtree
> in question onwards. (This would, I think, save us some memory)

The re-parse may not be strictly necessary, but it is a workable first 
cut.  Generally, static-content and markers will not be large subtrees, 
so re-parsing them ought not to have a major impact.  This can be 
experimented with once the layout is working.  The beauty of the 
approach is that it simplifies the logic, without (I hope) costing too 
much in performance.

In terms of the re-layout, re-parsing is not required (Re, re, re your 
boat...)  IndirectValues will need to be re-evaluated in the new 
context, but the process of re-layout is not markedly different from the 
normal layout process.  Any particular lineage descendant from an 
fo:flow must be able to adapt to a new environment.

Because the page is a logical unit different from the flow, some line of 
descent from the fo:flow will be unresolved when any particular page, 
which is being laid out from that flow, fills up.  The incomplete 
lineage must somehow be associated with the new page, whose region-body 
dimensions may be different.  The layout must then proceed from the 
point at which the previous page filled.

There are similarities here with backtracking.  Backtracking layout 
involves backtracking to a particular lineage in a particular state 
(even it that is the beginning of an fo:flow.)

>   Perhaps the
> same approach can be taken WRT tables: collapse all finished rows, so their
> cells are released. When their layed-out state turns out to be insufficient,
> start processing again from the row in question onwards, *with* the
> knowledge of what lies ahead this time...
> 

Peter
-- 
Peter B. West <http://www.powerup.com.au/~pbwest/resume.html>

RE: [Bug 25480] - Experimental performance improvements.

Posted by "Andreas L. Delmelle" <a_...@pandora.be>.

Sorry for the long post... Just trying to put a few ideas together.

> From: Finn Bock [mailto:bckfnn@worldonline.dk]
>
<snip />
> Pardon me for repeating what might be obvious,

Well, it wasn't to me, so... thanks in advance

> but I'll like to take
> a look at what information we want to store at each FObj.

If you look at the big picture, I think this could as well be none ( a null
vector ) for FObj's with nothing but inherited props. Their property
accessor should be able to perform the logic 'if absent in this FObj's prop
specs, get the specified/computed value (or the property, in case the value
is still unresolved) for the correct FObj', instead of having to redirect
the call to the immediate parent's Propertylist, which may or may not have
to do the same, who knows... the parser probably did at some point, but it
forgot, so did we and now it's got us frantically climbing up the
FOTree --like monkeys :)

> I can come up with 3 different strategies:
>
> 1) Only store the specified properties. That is what HEAD does now.
     ^^^^
? Is this meant ironically? :)
IIC, if not optimized or normalized in some way, overconstraint would lead
to a drastically high object instantiation rate, not? Besides that, the spec
*does* enforce verbosity in some quite trivial and widely used constructs...
like, <sigh/>, tables

> From: Peter B. West [mailto:pbwest@powerup.com.au]
>
<snip />
> Alt-design (trying the hyphen for a while) takes different approaches at
> different times.  While building the subtree of any node, all of the
> properties are maintained in a HashMap, along with a BitSet of specified
> properties.
>
> When the subtree construction is complete, the HashMap and BitSet are
> used to build the sparse array of only the relevant *resolved* property
> values (not properties - one of the differences with HEAD) and then
> thrown away.
>

Yum... so a Collection of Objects is traded for a handful of --what? (Would
this be the propertyValue singletons you mention?) What happens with
'unresolvable' Props at that point (for the cases you mention below)? Do
they get thrown away, or placed in a sort of Property stack for later
accessing? If so, how exactly do you indicate in the FObj (or FONode) that
the property in question still has to be resolved? So that, when Layout asks
for this property, it is signaled that some computations still need to be
made? (Pardon me if these questions mean I haven't read your docs thoroughly
enough; I did read some of them, but it seems I still miss a bit of
technical background to fully understand it.)

> This approach has to be modified in two environments - fo:static-content
> and fo:marker.  In the case of fo:marker, the inherited environment is
> not known at parse time, and in the case of static-content, the
> appropriate fo:retrieve-marker subtrees are not know until the
> region-body area tree is constructed.
>

Yes... cases I was overlooking --not so plain inheritance as the FObj's for
static-content appear only once or twice every page-sequence, but their
inherited properties and values vary, depending on the further processing of
the tree. Still, I'm wondering whether you really need to re-parse them (as
you indicated)... Couldn't you just, say, keep the Property alive and alter
its value when needed? (If this would save you any re-initializing, I mean)
The re-parsing idea seems very interesting to be able to say after a
threesome of pages have been processed: collapse the branches of the FOTree
that have already been layed out to the level below the current fo:flow (or
fo:block; in any case some logical point of reference). If downstream, it
turns out that their layout needs to be modified again (what are the odds?
any way of predicting this?), one could trigger a re-parse from the subtree
in question onwards. (This would, I think, save us some memory) Perhaps the
same approach can be taken WRT tables: collapse all finished rows, so their
cells are released. When their layed-out state turns out to be insufficient,
start processing again from the row in question onwards, *with* the
knowledge of what lies ahead this time...

> From: Glen Mazza [mailto:grm7793@yahoo.com]
> --- "Andreas L. Delmelle" <a_...@pandora.be>
> > What I'm very concerned about, for example, are cases like
> > tables, where it would be quite awkward to have the TableCell
> > FObj's reference their own copy of a Property instance
>
> To put it more precisely, the individual Properties of
> an FObj are currently stored in the PropertyList of
> that FOBj.
<snip />
> This sounds like it could be an excellent idea--a
> PropertyRepository ( ... ) could be a very useful
> tool for FO Tree Building.  Prior to creating any
> Property instance for any FO's array, we send the
> specs of the needed property to the
> PropertyRepository, and it gives us either (1) a
> brand-new property instance, or (2) a reference to an
> already-created one.

Aha! So this approach could also be used to recycle some of the HashMap and
BitSet that get thrown away in alt-design, correct?


Cheers,

Andreas

Re: [Bug 25480] - Experimental performance improvements.

Posted by Finn Bock <bc...@worldonline.dk>.

[Glen]

> One thing I'm missing here, for Finn's design below:
> 
> values[0] = null // always null.
> values[1] = reference to a 'foo' Property instance
> values[2] = reference to a 'baz' Property instance
> 
> Can't we just have, say, values[1] refer to the
> parent's Property instance in cases where inheritance
> is applicable?
> 
> I.e., for all those references to the 'foo' property
> instance for the children of an FO where that value
> would be inherited, we don't have to create a new
> Property instance, just a reference to the inherited
> instance.

Right, that is also the way I see it.

> But if the problem is the *recursion* necessary to
> determine the property instance to inherit--here, not
> the memory problem, but processing speed--I'm thinking
> of a PropertyWindow instance as follows:
> 
> A PropertyWindow would be used temporarily during FObj
> property initialization to hold references to all the
> property instances that would be relevant for that
> FObj should a property not be explicitly defined.  
> 
> So, to populate the property instances for a
> particular FObj, i.e., the "values" array:

Pardon me for repeating what might be obvious, but I'll like to take
a look at what information we want to store at each FObj. I can come
up with 3 different strategies:

1) Only store the specified properties. That is what HEAD does now.
2) Put the relevant properties in a fast Property[] array and the
    remaining specified properties in a HashMap. For fo:root the result
    would be an array of size 1 for the 'media-usage' property.
3) Expect to store every valid property. For fo:root that would mean
    allocating an array large enough to store every defined property.
    This is what my patch does, and the "values" array works as the
    PropertyWindow.

As I understand your PropertyWindow proposal, it would allow us to
implement no. 2) above. I'll try to come up with some numbers to see
how much memory that would use/save compared to 1) and 3).

regards,
finn

RE: [Bug 25480] - Experimental performance improvements.

Posted by John Austin <jw...@sympatico.ca>.

On Tue, 2004-01-13 at 20:49, Glen Mazza wrote:
> Let's not get too certain of anything right now with
> respect to implementation--but you probably have a
> point--a huge and very repetitively formatted document
> (say, the Chicago phone book, perhaps) would have
> comparatively fewer properties with a higher
> cardinality for each.

SOLVED! Yes!

Something to cheer up a morbidly downcast Packers fan two
days after the fall of the mighty number '4'.

I used DocBook for the frequency table because I was familiar
with formatting it as PDF with FOP. I suspect that properties
have similar distributions in general because XSL-FO are always
generated with programs and (ransom notes notwithstanding) 
adhere to general styles.

Really repetitive documents would be only slightly more skewed
than general text documents. (Say 90-10 rather than 80-20).

Someone told me where to get the style sheets for the XSL-FO
specification (RenderX) and I wanted to generate the XSL-FO
file for it, as a more appropriate 'challenge' for the project. 

--
John Austin <jw...@sympatico.ca>

RE: [Bug 25480] - Experimental performance improvements.

Posted by Glen Mazza <gr...@yahoo.com>.

Let's not get too certain of anything right now with
respect to implementation--but you probably have a
point--a huge and very repetitively formatted document
(say, the Chicago phone book, perhaps) would have
comparatively fewer properties with a higher
cardinality for each.

Glen

--- "Andreas L. Delmelle" <a_...@pandora.be>
wrote:
> > -----Original Message-----
> > From: Glen Mazza [mailto:grm7793@yahoo.com]
> >
> > OK--this may also be overkill then.  Thank you for
> the
> > analysis.
> >
> 
> It will prove useful, I am sure --provided we want
> to uphold the intention
> to be able to process any size of document (from
> sources that may not be as
> exemplary as DocBook :/)
> ... and eventually redo part
> of the process with
> significantly less effort.
> 
> Over & out (for the time being :) )
> 
> Cheers,
> 
> Andreas
> 

__________________________________
Do you Yahoo!?
Yahoo! Hotjobs: Enter the "Signing Bonus" Sweepstakes
http://hotjobs.sweepstakes.yahoo.com/signingbonus

RE: [Bug 25480] - Experimental performance improvements.

Posted by "Andreas L. Delmelle" <a_...@pandora.be>.

> -----Original Message-----
> From: Glen Mazza [mailto:grm7793@yahoo.com]
>
> OK--this may also be overkill then.  Thank you for the
> analysis.
>

It will prove useful, I am sure --provided we want to uphold the intention
to be able to process any size of document (from sources that may not be as
exemplary as DocBook :/)... and eventually redo part of the process with
significantly less effort.

Over & out (for the time being :) )

Cheers,

Andreas

Re: [Bug 25480] - Experimental performance improvements.

Posted by Glen Mazza <gr...@yahoo.com>.

OK--this may also be overkill then.  Thank you for the
analysis.

Glen

--- Finn Bock <bc...@worldonline.dk> wrote:
> [Glen Mazza]
> 
> > This sounds like it could be an excellent idea--a
> > PropertyRepository (extending, of course, a
> > DelmelleRepository (tm) ;) ) could be a very
> useful
> > tool for FO Tree Building.  Prior to creating any
> > Property instance for any FO's array, we send the
> > specs of the needed property to the
> > PropertyRepository, and it gives us either (1) a
> > brand-new property instance, or (2) a reference to
> an
> > already-created one.  So, indeed, only one
> instance of
> > that font-size = "12pt" would need to be created. 
> 
> 
> The amount to be saved will of course depend on the
> input, but for 
> "DocBook: The Definite Guide", the amount is quite
> small. So small that 
> I didn't bothered to do it when I made the
> performance patch.
> 
> Here is a break down on the string values that are
> parsed into 
> properties. It is the output from "sort | uniq -c |
> sort" so in the 
> first column is the number of times that a value
> occurs.
> 
>     
>
http://bckfnn-modules.sf.net/property-value-breakdown.txt
> 
> The properties that starts with a '.' are the
> default values for 
> subproperties and they can all be reduced to one
> occurrence if the 
> default value for subproperties was cached.
> 
> 'start-ident' and 'line-height=1.2em' both depend on
> other properties 
> and therefore can't be (easily) cached.
> 
> regards,
> finn
> 


__________________________________
Do you Yahoo!?
Yahoo! Hotjobs: Enter the "Signing Bonus" Sweepstakes
http://hotjobs.sweepstakes.yahoo.com/signingbonus

Re: [Bug 25480] - Experimental performance improvements.

Posted by Finn Bock <bc...@worldonline.dk>.

[Glen Mazza]

> This sounds like it could be an excellent idea--a
> PropertyRepository (extending, of course, a
> DelmelleRepository (tm) ;) ) could be a very useful
> tool for FO Tree Building.  Prior to creating any
> Property instance for any FO's array, we send the
> specs of the needed property to the
> PropertyRepository, and it gives us either (1) a
> brand-new property instance, or (2) a reference to an
> already-created one.  So, indeed, only one instance of
> that font-size = "12pt" would need to be created.  

The amount to be saved will of course depend on the input, but for 
"DocBook: The Definite Guide", the amount is quite small. So small that 
I didn't bothered to do it when I made the performance patch.

Here is a break down on the string values that are parsed into 
properties. It is the output from "sort | uniq -c | sort" so in the 
first column is the number of times that a value occurs.

     http://bckfnn-modules.sf.net/property-value-breakdown.txt

The properties that starts with a '.' are the default values for 
subproperties and they can all be reduced to one occurrence if the 
default value for subproperties was cached.

'start-ident' and 'line-height=1.2em' both depend on other properties 
and therefore can't be (easily) cached.

regards,
finn

RE: [Bug 25480] - Experimental performance improvements.

Posted by Glen Mazza <gr...@yahoo.com>.

--- "Andreas L. Delmelle" <a_...@pandora.be>
wrote:
> What I'm very concerned about, for example, are
> cases like tables, where it
> would be quite awkward to have the TableCell FObj's
> reference their own copy
> of a Property instance

To put it more precisely, the individual Properties of
an FObj are currently stored in the PropertyList of
that FOBj.

>
> > 2.) for any properties undefined, access the
> > PropertyWindow to determine the property instances
> to
> > use for them.  No recursion needed now.
> >
> 
> Sounds good, and while we're at it, could we test
> for equal specified values
> on any ancestors in step 1, so we can use the
> advantages of inheritance to
> the max...? 

> We don't really want to *punish* users
> who feel like specifying

who *don't* feel, I presume?  ;)

> 'font-size="12pt"' on 80K different FObj's that are
> descendants of the
> fo:flow, I think.
> In the proposed case, IMHO, there should be only one
> Property instance for
> 'font-size="12pt"'. Is this a correct view? Or is
> this outright impossible
> for some reason I'm missing?
> 

This sounds like it could be an excellent idea--a
PropertyRepository (extending, of course, a
DelmelleRepository (tm) ;) ) could be a very useful
tool for FO Tree Building.  Prior to creating any
Property instance for any FO's array, we send the
specs of the needed property to the
PropertyRepository, and it gives us either (1) a
brand-new property instance, or (2) a reference to an
already-created one.  So, indeed, only one instance of
that font-size = "12pt" would need to be created.  

The PropertyRepository would be accessed by
PropertyList when creating properties out of specified
attributes, prior to going to a PropertyWindow (if we
use one) to determine which non-specified properties
to attach.  (The PropertyWindow would only need the
PropertyRepository on app-startup, when it would
create its default objects.)  

But we may not need to bother with a
PropertyWindow--as you're saying, I doubt we're going
to need to store references to all properties at each
FObj--and its time savings on FO Tree building may be
questionable anyway.

Glen

__________________________________
Do you Yahoo!?
Yahoo! Hotjobs: Enter the "Signing Bonus" Sweepstakes
http://hotjobs.sweepstakes.yahoo.com/signingbonus

RE: [Bug 25480] - Experimental performance improvements.

Posted by "Andreas L. Delmelle" <a_...@pandora.be>.

> -----Original Message-----
> From: Glen Mazza [mailto:grm7793@yahoo.com]
>
<snip />
> One thing I'm missing here, for Finn's design below:
>
> values[0] = null // always null.
> values[1] = reference to a 'foo' Property instance
> values[2] = reference to a 'baz' Property instance
>
> Can't we just have, say, values[1] refer to the
> parent's Property instance in cases where inheritance
> is applicable?
>
> I.e., for all those references to the 'foo' property
> instance for the children of an FO where that value
> would be inherited, we don't have to create a new
> Property instance, just a reference to the inherited
> instance.
>

That sounds indeed very much like the sort of thing I'm referring to,
expressed with more profound background knowledge, so more to the point.

> [Me: ]
> > The difference would be that, instead of n
> > Property objects, each with
> > its own FObj, you would end up with one larger
> > Property object --larger,
> > because it has to store references of all FObj's to
> > which it applies - that
> > is in some way shared by n FObj's.
> >
> [Glen: ]
> I think this could be done more simply by the above,
> just have values[1] refer to the property of the
> parent instead.
>

Agreed.

What I'm very concerned about, for example, are cases like tables, where it
would be quite awkward to have the TableCell FObj's reference their own copy
of a Property instance --thus increasing their size in memory - while there
would be a way to get the value (resolved/unresolved) of the ancestor in
question, directly without any recursion, given the necessary adjustments.
IIC the TableRow FObj's Property instances will always be available as long
as any Cells are being processed. Same goes for the Table and the Columns
while any Rows are processed. I admit, this is a quite different form of
inheritance, but it serves as an illustration.
(Thinking of my recent colspan excursions here: does every cell really need
its own colspan variable where you could simply retrieve it from the column
(if specified)? This also has me doubting about simply ignoring implicit
columns at LM creation stage, like the proposed (and recently applied) patch
25809 does. The corresponding FObj would simply not exist. If it did, a LM
would automatically be instantiated for it. I'm thinking of creating 'a' (at
least one) column in the FOTree, but I'll discuss this in another
thread... )

> But if the problem is the *recursion* necessary to
> determine the property instance to inherit--here, not
> the memory problem, but processing speed--I'm thinking
> of a PropertyWindow instance as follows:
>

My concern, I believe, is both. Take, for instance, the font-size property.
I would consider it to be best practice in XSL-FO to specify the value that
will be used the most throughout the rest of the document on the fo:flow,
and use specified values to override this value only where it is absolutely
necessary to do so. This, to me, means that when it comes to processing,
there has to be an advantage to adhere to such practices, WRT memory
consumption as well as WRT speed. I mean: if it takes only a little more
time to process at FOTree building stage, it might be worth it if you can
avoid a lot of recursive calls later on in the process.


So...
> A PropertyWindow would be used temporarily during FObj
> property initialization to hold references to all the
> property instances that would be relevant for that
> FObj should a property not be explicitly defined.
>
> So, to populate the property instances for a
> particular FObj, i.e., the "values" array:
>
> 1.) read any incoming attributes and make properties
> out of them.
>
> 2.) for any properties undefined, access the
> PropertyWindow to determine the property instances to
> use for them.  No recursion needed now.
>

Sounds good, and while we're at it, could we test for equal specified values
on any ancestors in step 1, so we can use the advantages of inheritance to
the max...? We don't really want to *punish* users who feel like specifying
'font-size="12pt"' on 80K different FObj's that are descendants of the
fo:flow, I think.
In the proposed case, IMHO, there should be only one Property instance for
'font-size="12pt"'. Is this a correct view? Or is this outright impossible
for some reason I'm missing?

> 3.) if the FObj has any child objects, create a new
> (temporary) PropertyWindow for their
> processing, based on the FObj's (parent)
> PropertyWindow and the value of its own Properties.
> This temporary PropertyWindow can be dropped once the
> children's FObjs have been processed.  So the number
> of open PropertyWindow instances during processing
> would just be equal to the depth of the tree at the
> current point of processing.
>
> (What I'm describing above is may very well be a
> common design pattern, I don't know its name,
> however.)
>

Me neither... If it turns out not to be one, we could call it the
'Mazza-Window' :)

> OTOH, for any XSL functions which request, "run-time",
> to use different Property values than that obtained by
> simple inheritance--we can't rely on the above.  These
> functions are relatively infrequently used, and IMO do
> not need to be as optimized as normal inheritance
> would be--i.e., we still have to support them, but we
> shouldn't be altering the speed of the base processing
> in order to optimize for *these* functions--recursion
> would be OK here, if it allows speed/memory saving for
> usual processing.
>

Except maybe the most obvious cases like a flat 'from-parent()'?


Cheers,

Andreas

RE: [Bug 25480] - Experimental performance improvements.

Posted by Glen Mazza <gr...@yahoo.com>.

--- "Andreas L. Delmelle" <a_...@pandora.be>
wrote:
> > -----Original Message-----
> > From: Finn Bock [mailto:bckfnn@worldonline.dk]
> >
> > > [Andreas L. Delmelle]
> > > In this case, however, I think you
> > > can't fully 'push' these onto the descendants,
> as this would
> > > lead to absurd storage-reqs for quite
> average-sized documents.
> >

<snip/>

> >
> > Absolutely correct, and I'm not at all sure that
> we should go to an array
> > based storage of the properties in PropertyList,
> as I did in the patch.

One thing I'm missing here, for Finn's design below:

values[0] = null // always null.
values[1] = reference to a 'foo' Property instance
values[2] = reference to a 'baz' Property instance

Can't we just have, say, values[1] refer to the
parent's Property instance in cases where inheritance
is applicable?

I.e., for all those references to the 'foo' property
instance for the children of an FO where that value
would be inherited, we don't have to create a new
Property instance, just a reference to the inherited
instance.

> The difference would be that, instead of n
> Property objects, each with
> its own FObj, you would end up with one larger
> Property object --larger,
> because it has to store references of all FObj's to
> which it applies - that
> is in some way shared by n FObj's.
> 

I think this could be done more simply by the above,
just have values[1] refer to the property of the
parent instead.  

But if the problem is the *recursion* necessary to
determine the property instance to inherit--here, not
the memory problem, but processing speed--I'm thinking
of a PropertyWindow instance as follows:

A PropertyWindow would be used temporarily during FObj
property initialization to hold references to all the
property instances that would be relevant for that
FObj should a property not be explicitly defined.  

So, to populate the property instances for a
particular FObj, i.e., the "values" array:

1.) read any incoming attributes and make properties
out of them.

2.) for any properties undefined, access the
PropertyWindow to determine the property instances to
use for them.  No recursion needed now.

3.) if the FObj has any child objects, create a new
(temporary) PropertyWindow for their
processing, based on the FObj's (parent)
PropertyWindow and the value of its own Properties.
This temporary PropertyWindow can be dropped once the
children's FObjs have been processed.  So the number
of open PropertyWindow instances during processing
would just be equal to the depth of the tree at the
current point of processing.

(What I'm describing above is may very well be a
common design pattern, I don't know its name,
however.)

OTOH, for any XSL functions which request, "run-time",
to use different Property values than that obtained by
simple inheritance--we can't rely on the above.  These
functions are relatively infrequently used, and IMO do
not need to be as optimized as normal inheritance
would be--i.e., we still have to support them, but we
shouldn't be altering the speed of the base processing
in order to optimize for *these* functions--recursion
would be OK here, if it allows speed/memory saving for
usual processing.

Glen

__________________________________
Do you Yahoo!?
Yahoo! Hotjobs: Enter the "Signing Bonus" Sweepstakes
http://hotjobs.sweepstakes.yahoo.com/signingbonus

RE: [Bug 25480] - Experimental performance improvements.

Posted by "Andreas L. Delmelle" <a_...@pandora.be>.

> -----Original Message-----
> From: Finn Bock [mailto:bckfnn@worldonline.dk]
>
> > [Andreas L. Delmelle]
> > In this case, however, I think you
> > can't fully 'push' these onto the descendants, as this would
> > lead to absurd storage-reqs for quite average-sized documents.
>
> > OTOH, the inherited property value (resolved or unresolved)
> > can indeed be supposed as available at parse time, because a parent is
> > per se processed *before* any of its children.
>
> Absolutely correct, and I'm not at all sure that we should go to an array
> based storage of the properties in PropertyList, as I did in the patch.
> Using arrays has strengths:
>
> - speed during lookup.
> - possible to pushing inherited props into the children.
>
> and weaknesses:
>
> - much, much larger memory consumption.
>
> I have some ideas regarding releasing the array memory earlier during the
> endElement event that I will try out, but until we have a solution to
> the memory consumption, I think we are better of with the current HashMap
> implementation of property storage and then delay the implementation of
> pushing inherited props into the children.
>

My understanding of the Property side is still a bit too fragmented, but
I'll try do describe what I think to be 'the best of both worlds' in the
case of inherited props (maybe it is already (partially) done this way, I'm
not sure, but I don't get that impression from the code or the docs... see
the pointer below[1]) AFAICT, the way I describe it would turn the logic
almost inside-out :

One structure for an inherited property (in the form of a subclass? or an
'Inheritable' interface?) that contains a reference to the base FObj and an
array of all FObj's whose property accessors should be routed directly to
return the property (value) of the base FObj. This to avoid unnecessary
calls later on to an FObj's accessor from which you already know at parse
time that it's going to dispatch the call anyway. Take the case of a
property value where the property in question is inherited from some levels
up... The difference would be that, instead of n Property objects, each with
its own FObj, you would end up with one larger Property object --larger,
because it has to store references of all FObj's to which it applies - that
is in some way shared by n FObj's.

I have for the moment, no idea whether such an approach is feasible, but one
thing I do know, is that if it is, it'll save on a number of recursive calls
later on in the process.

When an FObj is interrogated by Layout about an unspecified value for a
property which falls in the inheritable category, the call for the property
value should trigger:

- getting the id for the nearest ancestor on which the property *was*
specified (conveniently stored this info at parse time, so shouldn't be a
problem)
- pass this id as a parameter to get({property}), so effectively reducing
the number of calls to 1

I must admit, I do see an increase in storage for :
- FO sources with little or no nesting (? Is this even conceivable? )
- FO sources where the values for inherited properties are specified at
every level (but this IMHO should be considered as 'bad practice')

   [Me]
>  > I just wonder if this has something to do with
>  > Finn's other idea of moving logic to the property side to save on
>  > unnecessary calls to empty methods ?

> [Finn]
> No, it is unrelated, but while I made the patch to remove the generated
> maker classes
>

As I said, I have a number of things to learn WRT the property side, so
thanks for the enlightenment.

>    http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25873
>
> , I've concluded that it would also be good design decision and I plan
> on updating patch 25873 to show that the Property.Maker can become
> simpler and easier to understand as well as faster.

Great! Looking forward to this.


Cheers,

Andreas

[1] http://xml.apache.org/fop/design/properties.html#property-list-struct
    in the main logic : "If the property is inherited, the process repeats
using the PropertyList of the FO's parent object. (This is easy because ..."
Of course, this would be _easy_, but my question is: Would it also be the
_preferred_ way of dealing with inheritance?

Re: [Bug 25480] - Experimental performance improvements.

Posted by Finn Bock <bc...@worldonline.dk>.

[Glen Mazza]

> ... what's your opinion on switching to FO
> Constants at this time?  They probably will not give
> us the rate of return that property constants have;
> but there may be future indexing or processing
> advantages with them.  I'm not strong one way or the
> other on them.

Since then, you have removed the unused support for elementTable,
which was the only place a FO constanst id was used. So I would
delay the implementation of FO ids until that future arrives.

> http://marc.theaimsgroup.com/?l=fop-dev&m=107182754300725&w=2
> 
> Your solution in your email does not look difficult to
> implement, but as a simplification, how about just
> implementing getElementId() within FONode to query an
> new ID if its value is unintialized, say, -1, and to
> store that value with the FONode, *instead of* having
> each Extension FO's implement that query & store
> function themselves?  

That would also work, but the beauty of having each extension FO
do the work is that it saves the size of an int (4 bytes) for each
instance of a FONode. In my mail, the extension stores the id value
in a static vrbl so it doesn't weight down every FONode object.

>>And it includes the properties that are valid for
>>all the children element of
>>each FO. That is what the large while loop does when
>>it calls mergeContent.
> 
> 
> OK--but to satisfy the spec, we probably need to add a
> static boolean array for the properties, defining
> true/false of whether each property is "inheritable". 

Right, my patch stores that array in PropertyListBuilder.inherit.

> Because one can attach an inheritable property to any
> FO, regardless of whether or not it is relevant for it
> or its children, we don't want to report an error if
> the user chooses to do so--we can query this static
> array to determine that we should just ignore the
> property instead.

A good point, my patch just reports an error when a property is
specified that isn't relevant for the FO or any of its children.

regards,
finn

Re: [Bug 25480] - Experimental performance improvements.

Posted by Glen Mazza <gr...@yahoo.com>.

--- Finn Bock <bc...@worldonline.dk> wrote:
> > 1.) For the new PropertyList constructor (in the
> patch), you appear to be 
> > duplicating the element ID argument, once as "el",
> the other time 
> > as "elementId"--just to confirm, they are
> referring to the same thing (and 
> > hence one of them can be removed)?
> 
> Yes, it is a silly leftover from my own conversion
> process.
> 

OK--BTW, what's your opinion on switching to FO
Constants at this time?  They probably will not give
us the rate of return that property constants have;
but there may be future indexing or processing
advantages with them.  I'm not strong one way or the
other on them.

One issue your brought up in the email below, was that
of extension elements needing their own ID's
dynamically:  

http://marc.theaimsgroup.com/?l=fop-dev&m=107182754300725&w=2

Your solution in your email does not look difficult to
implement, but as a simplification, how about just
implementing getElementId() within FONode to query an
new ID if its value is unintialized, say, -1, and to
store that value with the FONode, *instead of* having
each Extension FO's implement that query & store
function themselves?  

> > ----------------
> 
> No, the + 1 is a deliberate trick to handle unknown
> properties which
> should return a null value during lookup().

<snip/>

Excellent--thanks for your full explanation--I
understand it now, and have added comments to
PropertySets.java to make it clearer for others who
may also have questions.

> > ------------------
> > 
> And it includes the properties that are valid for
> all the children element of
> each FO. That is what the large while loop does when
> it calls mergeContent.

OK--but to satisfy the spec, we probably need to add a
static boolean array for the properties, defining
true/false of whether each property is "inheritable". 

Because one can attach an inheritable property to any
FO, regardless of whether or not it is relevant for it
or its children, we don't want to report an error if
the user chooses to do so--we can query this static
array to determine that we should just ignore the
property instead.  I can probably help out with
this--but I'll wait until later when 25873 is
settled--we may end up having this information created
anyway by then.

> , I've concluded that it would also be good design
> decision and I plan
> on updating patch 25873 to show that the
> Property.Maker can become
> simpler and easier to understand as well as faster.
> 

Looking forward to seeing it--I haven't had much time
to look at 25873 yet.  One possible way I see of 
simplifying the makers would be for them to no longer
be a nested inner class of Property, but either part
of the Property class itself or part of its own class.

BTW, when you have write access available, let me know
when you'd like me to step back, I'll happily retreat
back to the renderers and let you take over the
properties at your much faster rate of speed.

Thanks,
Glen

__________________________________
Do you Yahoo!?
Yahoo! Hotjobs: Enter the "Signing Bonus" Sweepstakes
http://hotjobs.sweepstakes.yahoo.com/signingbonus