You are viewing a plain text version of this content. The canonical link for it is here.

Posted to fop-dev@xmlgraphics.apache.org by Finn Bock <bc...@worldonline.dk> on 2004/10/13 12:40:14 UTC

Performance improvement in property consumption.

Hi Team,

I've put together another largish patch that changes the way properties
and used by the FO objects, the layout managers and the FOEventHandler
subclasses.

     http://issues.apache.org/bugzilla/show_bug.cgi?id=31699

The performance increase is 42% while at the same time using %27 less
memory. The numbers comes from rendering the initial chapters of DocBook 
The Definite Guide [1] to a 131 page pdf file using jdk1.4.2_03. 
Performance is meassured in wallclock time and memory by finding the 
minimum -Xmx value where the document can be rendered (in steps of 1000000).

The improvement in performance and in memory are due to PropertyList now
being a temporary object that is only available to a FObj during the
call to FObj.bind(). All fo elements are required to extract all the
properties that they need and to store the values as instance fields in
the fo element.

When the PropertyList is a temporary object that isn't stored in the
FObj, the size of the PropertyList does not matter and a much faster
version can be used (implemented by StaticPropertyList).

The patch also adds correct property resolution for fo:marker and
fo:retrieve-marker. Children of fo:marker has a memory efficient version
of their PropertyList stored within the Marker object and they are
re-bind'ed by fo:retrieve-marker each time a marker is retrieved.

Any objections?

[1] http://www.apache.org/~bckfnn/DocBookTDG.zip

regards,
finn

Re: Performance improvement in property consumption.

Posted by Finn Bock <bc...@worldonline.dk>.

>>Keep in mind that there is 2 different sets of
>>properties:
>>- The set of specified properties.
>>- The relevant properties (as listed in the spec
>>under each element).
>>
>>The existing PropertyList stores the specified
>>properties in the super 
>>HashMap and has an additional cache which stores all
>>retrieved properties.

[Glen]

> Ummm...just to be very careful so I can understand
> what you're saying--instead of "retrieved properties"
> above, did you mean "relevant properties"?

No, the current PropertyList caches nearly all accessed properties. This 
  was added (IIUC) to improve speed of repeated retrieval of the same 
property, as typically seen for inherited properties.

You can find the caching in PropertyList.findProperty().

regards,
finn

Re: Performance improvement in property consumption.

Posted by Glen Mazza <gr...@yahoo.com>.

--- Finn Bock <bc...@worldonline.dk> wrote:
> 
> [Glen]
> 
> > Why is it more efficient (I know it is, given your
> > metrics, but want to know why)--aren't you just
> moving
> > the values already stored in the PropertyList into
> > separate fields in the FO objects?  Yes, you're
> > releasing the PropertyList's memory, but the
> elements
> > that the PropertyList previously stored are now
> stored
> > in the FObj.
> 
> Keep in mind that there is 2 different sets of
> properties:
> - The set of specified properties.
> - The relevant properties (as listed in the spec
> under each element).
> 
> The existing PropertyList stores the specified
> properties in the super 
> HashMap and has an additional cache which stores all
> retrieved properties.
> 

Ummm...just to be very careful so I can understand
what you're saying--instead of "retrieved properties"
above, did you mean "relevant properties"?


> In my proposal the specified and the cached
> properties are still stored 
> in the property list but only the relevant
> properties are retained in 
> the fo object.
> 

Glen

Re: Performance improvement in property consumption.

Posted by Glen Mazza <gr...@yahoo.com>.

Thanks for your explanation Finn.  (Also thanks Peter
and Andreas for taking the time to respond--I read
through both your messages quite carefully as well, in
order to better understand the property resolution
issues involved.)  I looked at the current code and
the patch again, and I think I now have a better
understanding of why it performs faster.

Anyway, +1 for this change, except I would like to
have  the FONode.start() methods renamed to
.startOfNode().   IMO it is a little more descriptive
to newcomers to the code (even if annoying for those
very familiar with the code.)  Also it complements the
endOfNode() method (although I admittedly renamed that
from end()), and  it helps with global searches/S&R's,
as start() may also be defined in other packages with
completely different meanings.

Thanks,
Glen

--- Finn Bock <bc...@worldonline.dk> wrote:

Re: Performance improvement in property consumption.

Posted by Finn Bock <bc...@worldonline.dk>.

>>In my proposal the specified and the cached properties are still stored
>>in the property list but only the relevant properties are retained in
>>the fo object.

[Andreas]

> Yes, and IIJC, at the same time, you're eliminating the need for downstream
> property queries having to be performed through the PropertyList, so the
> FObj's can communicate directly (--less underlying HashMap fetches...)
> 
> So roughly:
> a. FObj1 asks FObj2
> b. FObj2 probes its property instance variable
> c. FObj2 responds to FObj1
> 
> instead of
> a. FObj1 asks its PropertyList
> b. PropertyList asks FObj2's PropertyList
> c. FObj2's PropertyList queries its HashMap
> d. FObj2's PropertyList responds to PropertyList
> e. PropertyList responds to FObj1

No, at the startElement() event the property list exists for all the 
parent elements and they are used to answer all property queries, 
including the property function and inheritance.

So the process is you outline above is unchanged.

PS. I'm ignoring the handling of markers in my descriptions.

regards,
finn

RE: Performance improvement in property consumption.

Posted by "Andreas L. Delmelle" <a_...@pandora.be>.

> -----Original Message-----
> From: Finn Bock [mailto:bckfnn@worldonline.dk]
>

Hi there,

> [Glen]
>
> > Why is it more efficient (I know it is, given your
> > metrics, but want to know why)--...
<snip />
> In my proposal the specified and the cached properties are still stored
> in the property list but only the relevant properties are retained in
> the fo object.
>

Yes, and IIJC, at the same time, you're eliminating the need for downstream
property queries having to be performed through the PropertyList, so the
FObj's can communicate directly (--less underlying HashMap fetches...)

So roughly:
a. FObj1 asks FObj2
b. FObj2 probes its property instance variable
c. FObj2 responds to FObj1

instead of
a. FObj1 asks its PropertyList
b. PropertyList asks FObj2's PropertyList
c. FObj2's PropertyList queries its HashMap
d. FObj2's PropertyList responds to PropertyList
e. PropertyList responds to FObj1

> > So if PropertyList can be thought of as a C-like
> > struct holding the values of its FObj's properties,
> > what you're doing appears to be just taking that
> > struct's member variables and moving them to the FObj.
>
> No, see above.

To clarify this further: PropertyList seems to be more than simple wrapper
around the individual property-bundles. It offers a functionality of its
own, so needs extra space, and because it needs to be alive to be able to
provide, it currently occupies this extra space longer than strictly
necessary. It's not a simple data structure, it's an object in itself,
occupying memory that won't be GC'ed until the parent FObj is released.
What you seem to propose is: limit the functionality of the PropertyList
object to the translation of the captured lists of attributes to the already
instantiated FOBj's 'properties' --in the sense of instance variables--, and
because storing the props in instance vars eliminates the need for
inter-FObj communication through the PropertyList, it speeds up the process
downstream as well... I like it :-)

>
> > But, obviously, given the performance/memory boost
> > you're noting, PropertyList *can't* be regarded as a
> > C-like struct.  Why?  Could PropertyList be made more
> > efficient instead of this change--make it more like a
> > C-like struct?
>
> Speed can be improved, but at the cost of additional memory.
>

Indeed! While your approach would only affect the size of the different FObj
Classes, not the instantiated FObj's themselves. The latter would only grow
because they now have extra instance variables --but the balance is
ultimately kept since their PropertyList had to store these anyway--, plus
certain portions of occupied memory are released earlier. All the different
PropertyLists' HashMaps for starters... Besides that, I kind of like the
idea of the API reflecting the spec in this way.

> The beauty of my proposal is that we can pick the fastest implementation
> of property assignment and property lookup without worrying about the
> memory because the property list is released.
>

Unless there's a piece of the PropertyList's functionality I'm overlooking
here...
i.e. Are there conceivable situations where the particular functions offered
by the current PropertyList *have* to be available downstream? And where
they can't be replaced by a similar addition of functionality to the FObj
(--which you always have a reference to anyway)?

If the answer to immediately above questions is 'No', I can dig the beauty
of the approach :-)


Greetz,

Andreas

Re: Performance improvement in property consumption.

Posted by Finn Bock <bc...@worldonline.dk>.

[Glen]

>>>So if we did this at the FO level, in effect, we'd
>>>have to (1) store an instance variable of every
>>>valid
>>>property for each FO, given that we wouldn't know
>>>whether the FOEventHandler's needs it beforehand,
>>>and

[me]

>>Yes. Which is massively more efficient than storing
>>the exact same 
>>properties in a PropertyList.

[Glen]

> Why is it more efficient (I know it is, given your
> metrics, but want to know why)--aren't you just moving
> the values already stored in the PropertyList into
> separate fields in the FO objects?  Yes, you're
> releasing the PropertyList's memory, but the elements
> that the PropertyList previously stored are now stored
> in the FObj.

Keep in mind that there is 2 different sets of properties:
- The set of specified properties.
- The relevant properties (as listed in the spec under each element).

The existing PropertyList stores the specified properties in the super 
HashMap and has an additional cache which stores all retrieved properties.

In my proposal the specified and the cached properties are still stored 
in the property list but only the relevant properties are retained in 
the fo object.

> So if PropertyList can be thought of as a C-like
> struct holding the values of its FObj's properties,
> what you're doing appears to be just taking that
> struct's member variables and moving them to the FObj.

No, see above.

> But, obviously, given the performance/memory boost
> you're noting, PropertyList *can't* be regarded as a
> C-like struct.  Why?  Could PropertyList be made more
> efficient instead of this change--make it more like a
> C-like struct?

Speed can be improved, but at the cost of additional memory.

The beauty of my proposal is that we can pick the fastest implementation 
of property assignment and property lookup without worrying about the 
memory because the property list is released.

regards,
finn

Re: Performance improvement in property consumption.

Posted by Glen Mazza <gr...@yahoo.com>.

--- Finn Bock <bc...@worldonline.dk> wrote:
> 
> > So if we did this at the FO level, in effect, we'd
> > have to (1) store an instance variable of every
> valid
> > property for each FO, given that we wouldn't know
> > whether the FOEventHandler's needs it beforehand,
> and
> 
> Yes. Which is massively more efficient than storing
> the exact same 
> properties in a PropertyList.
> 

Why is it more efficient (I know it is, given your
metrics, but want to know why)--aren't you just moving
the values already stored in the PropertyList into
separate fields in the FO objects?  Yes, you're
releasing the PropertyList's memory, but the elements
that the PropertyList previously stored are now stored
in the FObj.  

So if PropertyList can be thought of as a C-like
struct holding the values of its FObj's properties,
what you're doing appears to be just taking that
struct's member variables and moving them to the FObj.

But, obviously, given the performance/memory boost
you're noting, PropertyList *can't* be regarded as a
C-like struct.  Why?  Could PropertyList be made more
efficient instead of this change--make it more like a
C-like struct?

> > (2) instead of the programmatic convenience of
> > getPropString(PR_OFFICIAL_PROPERTY_NAME), we'd
> have to
> > remember what each of the properties are named in
> the
> > FO's (We can probably standardize these though.)  
> 
> I've consistly picked getOfficielPropertyName() as
> the name of the 
> getter method.
> 

OK--good--strike that argument then.  (Although that's
a lot of methods for us to have.)

> > Another clarification needed:  when would the
> property
> > list's memory get released--at the PageSequence
> level,
> > or as soon as the FO is finished?  
> 
> At the endElement() event.
> 

OK, I like that--it guarantees that every node's
ancestor--even FORoot is alive while that node is
being processed.  This is good.

> > Because the values
> > of some FO's properties further downstream depend
> on
> > an earlier FO's properties, we may not be able to
> > release the memory when the earlier FO is
> finished. 
> 
> If a property value is needed by a following
> sibling, then it can be 
> found as an instance field on the fo object that
> defined the property.
> 

Yes, I see. Good.

Thanks,
Glen

Re: Performance improvement in property consumption.

Posted by Finn Bock <bc...@worldonline.dk>.

[Glen]

>>All fo elements are required to
>>extract all the
>>properties that they need and to store the values as
>>instance fields in
>>the fo element.
>>
> 
> 
> I love the performance boost you're recording but have
> a philosophical issue of an "fo object needing a
> property"--it is really an issue of the
> *FOEventHandler subclass* that needs the
> properties--and some handlers need/implement more
> properties than others.

Does any event handler need more or different properties than what is 
defined in the spec under each fo element? Because that is the set of 
properties that I store in the fo objects.

If further properties are needed by any property consumer, it can just 
be added to the fo element. An example could be "text-decoration" on 
fo:block.

> So if we did this at the FO level, in effect, we'd
> have to (1) store an instance variable of every valid
> property for each FO, given that we wouldn't know
> whether the FOEventHandler's needs it beforehand, and

Yes. Which is massively more efficient than storing the exact same 
properties in a PropertyList.

> (2) instead of the programmatic convenience of
> getPropString(PR_OFFICIAL_PROPERTY_NAME), we'd have to
> remember what each of the properties are named in the
> FO's (We can probably standardize these though.)  

I've consistly picked getOfficielPropertyName() as the name of the 
getter method.

> Another clarification needed:  when would the property
> list's memory get released--at the PageSequence level,
> or as soon as the FO is finished?  

At the endElement() event.

> Because the values
> of some FO's properties further downstream depend on
> an earlier FO's properties, we may not be able to
> release the memory when the earlier FO is finished. 

If a property value is needed by a following sibling, then it can be 
found as an instance field on the fo object that defined the property.

> Or, (to support inheritance) we may need to store
> instance variables for what may be a very unlikely
> event of a child needing that value.

I don't understand what you mean. All property inheritence is resolved 
at the time of the child startElement() event. The resolution is done 
using the parent propertylist while it still exists.

> Also, are there any other options for releasing this
> memory that you know of? Is there someway this .bind()
> or other releasing of memory can be done within the
> FOEventHandler subclasses, or at least within
> LayoutManager (forgetting about RTF) instead?  

I don't see any benefit at all, from such approaches.

PS. The description of releasing property list only apply to normal 
elements. All element under fo:marker are treated differently.

regards,
finn

Re: Performance improvement in property consumption.

Posted by Glen Mazza <gr...@yahoo.com>.

--- Finn Bock <bc...@worldonline.dk> wrote:

> Hi Team,
> 
> I've put together another largish patch that changes
> the way properties
> and used by the FO objects, the layout managers and
> the FOEventHandler
> subclasses.
> 
>     
>
http://issues.apache.org/bugzilla/show_bug.cgi?id=31699
> 
> The performance increase is 42% while at the same
> time using %27 less
> memory. 

... 

> All fo elements are required to
> extract all the
> properties that they need and to store the values as
> instance fields in
> the fo element.
> 

I love the performance boost you're recording but have
a philosophical issue of an "fo object needing a
property"--it is really an issue of the
*FOEventHandler subclass* that needs the
properties--and some handlers need/implement more
properties than others.  

So if we did this at the FO level, in effect, we'd
have to (1) store an instance variable of every valid
property for each FO, given that we wouldn't know
whether the FOEventHandler's needs it beforehand, and
(2) instead of the programmatic convenience of
getPropString(PR_OFFICIAL_PROPERTY_NAME), we'd have to
remember what each of the properties are named in the
FO's (We can probably standardize these though.)  

Another clarification needed:  when would the property
list's memory get released--at the PageSequence level,
or as soon as the FO is finished?  Because the values
of some FO's properties further downstream depend on
an earlier FO's properties, we may not be able to
release the memory when the earlier FO is finished. 
Or, (to support inheritance) we may need to store
instance variables for what may be a very unlikely
event of a child needing that value.

Also, are there any other options for releasing this
memory that you know of? Is there someway this .bind()
or other releasing of memory can be done within the
FOEventHandler subclasses, or at least within
LayoutManager (forgetting about RTF) instead?  

Not ruling this out--just want to be the devil's
advocate here.

Thanks,
Glen