You are viewing a plain text version of this content. The canonical link for it is here.
Posted to fop-dev@xmlgraphics.apache.org by Andreas L Delmelle <a_...@pandora.be> on 2007/03/30 01:04:24 UTC

Suggestion: refactoring property access in the FO tree?

Hi,

Just an idea that popped in my head. I was thinking of the access to  
a FONode's properties from the layoutengine, and it became apparent  
to me that ATM the approach is not too flexible: each subclass has  
its own little set of private instance methods and public accessors.  
This makes it hard to implement new properties or deal with extension  
properties in a generic way.

In ancient times, each FO had a full PropertyList, so the properties  
could be queried via a generic get(PROPERTY_ID) accessor that was  
simply a proxy to the PropertyList's corresponding get(). This was,  
however, a much less efficient approach than what we currently have.

My suggested best-of-both-worlds would be the addition of (at least)  
one new type, which would store the applicable properties for a FO.

Starting with an interface:

public interface FOProperties {
     Property get(int);
     Property get(String);
}

Each of the particular FO classes can then define its own  
implementation, which stores the applicable properties and maybe, for  
some FOs (like FOText!), this implementation can simply search the  
ancestors, instead of having to allocate space for properties that  
are always identical and can't be specified anyway...

The downside would be the loss in convenience, for instance, where we  
now have individual accessors returning a Length, LengthRange,  
Numeric... Not sure how I would address this, yet. :/

If anyone has suggestions, feedback is welcome.


Cheers,

Andreas

Re: Suggestion: refactoring property access in the FO tree?

Posted by Vincent Hennebert <vi...@anyware-tech.com>.
Hi Richard,

Thanks for your contribution. I'll try to have a look ASAP but I'm a bit
in a hurry currently. Stay tuned.

Vincent


richardw geoquip-rnd demon co uk a écrit :
> Vincent Hennebert writes:
>  > I fully agree. Good design should not be sacrificed for efficiency.
>  > Anyway only the profiling of a FOP run would give us proper indications
>  > of what is actually eating time and memory, and where we should start
>  > optimizing.
> 
> 
> If you apply the the recent patches here:
>    http://issues.apache.org/bugzilla/show_bug.cgi?id=41656
> and here:
>    http://issues.apache.org/bugzilla/show_bug.cgi?id=41044
> then process a big silly 40MB plus fop table and wait for it to
> bomb out with an OOM you wind up with a set of instance counts
> as shown:
> 
> 134407 instances of class org.apache.fop.fo.FONode[]
> 106438 instances of class char[]
> 100833 instances of class org.apache.fop.fo.FOText
> 100805 instances of class org.apache.fop.fo.flow.Block
> 100803 instances of class org.apache.fop.fo.flow.TableCell
> 33601 instances of class org.apache.fop.fo.flow.TableRow
> 5618 instances of class java.lang.String
> 5047 instances of class java.lang.Integer
> 4316 instances of class java.util.HashMap$Entry
> 2620 instances of class org.apache.fop.fo.FObj$FObjIterator
> 1475 instances of class java.lang.Class
> 
> Richard
> 

Re: Suggestion: refactoring property access in the FO tree?

Posted by ri...@geoquip-rnd.demon.co.uk.
Vincent Hennebert writes:
 > I fully agree. Good design should not be sacrificed for efficiency.
 > Anyway only the profiling of a FOP run would give us proper indications
 > of what is actually eating time and memory, and where we should start
 > optimizing.


If you apply the the recent patches here:
   http://issues.apache.org/bugzilla/show_bug.cgi?id=41656
and here:
   http://issues.apache.org/bugzilla/show_bug.cgi?id=41044
then process a big silly 40MB plus fop table and wait for it to
bomb out with an OOM you wind up with a set of instance counts
as shown:

134407 instances of class org.apache.fop.fo.FONode[]
106438 instances of class char[]
100833 instances of class org.apache.fop.fo.FOText
100805 instances of class org.apache.fop.fo.flow.Block
100803 instances of class org.apache.fop.fo.flow.TableCell
33601 instances of class org.apache.fop.fo.flow.TableRow
5618 instances of class java.lang.String
5047 instances of class java.lang.Integer
4316 instances of class java.util.HashMap$Entry
2620 instances of class org.apache.fop.fo.FObj$FObjIterator
1475 instances of class java.lang.Class

Richard


Re: Suggestion: refactoring property access in the FO tree?

Posted by Vincent Hennebert <vi...@anyware-tech.com>.
Hi Andreas,

Andreas L Delmelle a écrit :
> On Mar 30, 2007, at 11:21, Vincent Hennebert wrote:
<snip/>
>>> In ancient times, each FO had a full PropertyList, so the properties
>>> could be queried via a generic get(PROPERTY_ID) accessor that was simply
>>> a proxy to the PropertyList's corresponding get(). This was, however, a
>>> much less efficient approach than what we currently have.
>>
>> To be sure I understand: each object had the very same list of
>> properties, with null values for the properties wich were not applicable
>> to it?
> 
> Errm... yes, if what you mean by 'the very same' is 'a different
> instance of the very same type'.

Yes that's what I meant. Am I imprecise sometimes. And I dare
complaining about the Rec's uncertainties ;-)


> Each FObj instance had its very own instance of the same PropertyList type.
> 
>> And the loss of efficiency was due to the indirection caused by
>> the generic get(PROPERTY_ID) I guess?
> 
> The biggest inefficiency was the space that these lists consume. They
> allocate space for an array with a number of elements equal to the
> number of /possible/ properties, effectively wasting space in case of
> FOs to which only a handful properties apply. On top of that, this space
> was not reclaimed until very late in the process, whereas now, the
> PropertyLists and their backing arrays in most cases can be
> garbage-collected right after endElement().
> 
> A FO to which only three properties apply will currently have only three
> instance variables corresponding to those properties. In the old
> architecture, it would have one PropertyList member, containing an array
> that could store some 250 Property instances, but only three elements
> were actually in use...

Not that I want to go back to that situation, but a Map instead of an
array would probably have been acceptable? More memory-efficient without
being much more CPU-intensive I think.


> The inefficiency caused by the indirection I would consider to be a
> small but necessary price to pay for an increased amount of flexibility
> and extensibility.

I fully agree. Good design should not be sacrificed for efficiency.
Anyway only the profiling of a FOP run would give us proper indications
of what is actually eating time and memory, and where we should start
optimizing.


>>> <snip />
>>> Each of the particular FO classes can then define its own
>>> implementation, which stores the applicable properties and maybe, for
>>> some FOs (like FOText!), this implementation can simply search the
>>> ancestors, instead of having to allocate space for properties that are
>>> always identical and can't be specified anyway...
>>
>> Hmmm. My understanding of the whole thing is still a bit vague, but
>> wouldn't that lead to the same code replication as we have now?
>> I'm wondering if it's not possible to define a restricted number of
>> implementations of this interface, each applying to a whole set of
>> similar FOs.
> 
> Good point!
> 
>> Or use object composition: there would be objects dealing
>> with a particular set of properties (say, border/padddin/background),
>> and each FO for which that set applies would be composed of the
>> corresponding object.
>>
>> Perhaps the flyweight pattern could apply here: only one object for each
>> set of properties, initializing the correct fields in the FObj. With
>> some means to automagically wire everything by using marker interfaces
>> or whatever.
> 
> Also fine suggestions. I'm going to chew on these a bit more.
> 
>> As I said my ideas are still all pretty vague... Also, does anyone have
>> knowledge about aspect programming? I've the feeling that could apply
>> quite well here.
> 
> Just been reading up on AOSD, and it does indeed seem to be fitting in
> here.
> 
>>> The downside would be the loss in convenience, for instance, where we
>>> now have individual accessors returning a Length, LengthRange,
>>> Numeric... Not sure how I would address this, yet. :/
>>
>> I'm not sure of what you mean? The same property can be accessed in
>> different ways?
> 
> What I mean is that right now, the bind() method already takes care of
> getting the right type of Property.
> pList.get(PROPERTY_ID).getLengthRange() for instance...
> 
> An accessor getInlineProgressionDimension(), for example, returns a
> LengthRange (not a Property).

Ok, I see your point. Well then the specific implementations I was
talking about earlier could probably handle that. For example almost
every formatting object can have border/padding/background. We would
have a specific class in charge of getting the values of these
properties set on the object (or retrieving inherited values), and
converting them into the right types (e.g., LengthRange). Each
applicable object would be given an instance of such a class at its
creation. When needed it would call the specific methods from this class
(getBorderBefore, etc.). Something like that...


Cheers,
Vincent

Re: Suggestion: refactoring property access in the FO tree?

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Mar 30, 2007, at 11:21, Vincent Hennebert wrote:

Hi Vincent,

> I have little knowledge of the FO tree construction, so I'll perhaps
> make naive questions and remarks. I write them in the hope they will
> trigger further thoughts.

That was the general idea. :-)
Besides, people who are not all too familiar yet with that part of  
the code are sometimes more likely to offer helpful hints (as their  
thinking has not yet been contaminated ;-))

> <snip />
> Also, that causes the same code to be replicated in every class; for
> example each object for which the margin properties apply will have  
> (1)
> a CommonMarginBlock field; (2) a "commonMarginBlock =
> pList.getMarginBlockProps();" line in the bind method. This  
> replication
> of code is unfortunate as it artificially increases the codebase size,
> is more error-prone and more difficult to maintain.

Indeed! Yet another reason pro refactoring in that area.

>> In ancient times, each FO had a full PropertyList, so the properties
>> could be queried via a generic get(PROPERTY_ID) accessor that was  
>> simply
>> a proxy to the PropertyList's corresponding get(). This was,  
>> however, a
>> much less efficient approach than what we currently have.
>
> To be sure I understand: each object had the very same list of
> properties, with null values for the properties wich were not  
> applicable
> to it?

Errm... yes, if what you mean by 'the very same' is 'a different  
instance of the very same type'.
Each FObj instance had its very own instance of the same PropertyList  
type.

> And the loss of efficiency was due to the indirection caused by
> the generic get(PROPERTY_ID) I guess?

The biggest inefficiency was the space that these lists consume. They  
allocate space for an array with a number of elements equal to the  
number of /possible/ properties, effectively wasting space in case of  
FOs to which only a handful properties apply. On top of that, this  
space was not reclaimed until very late in the process, whereas now,  
the PropertyLists and their backing arrays in most cases can be  
garbage-collected right after endElement().

A FO to which only three properties apply will currently have only  
three instance variables corresponding to those properties. In the  
old architecture, it would have one PropertyList member, containing  
an array that could store some 250 Property instances, but only three  
elements were actually in use...

The inefficiency caused by the indirection I would consider to be a  
small but necessary price to pay for an increased amount of  
flexibility and extensibility.

>> <snip />
>> Each of the particular FO classes can then define its own
>> implementation, which stores the applicable properties and maybe, for
>> some FOs (like FOText!), this implementation can simply search the
>> ancestors, instead of having to allocate space for properties that  
>> are
>> always identical and can't be specified anyway...
>
> Hmmm. My understanding of the whole thing is still a bit vague, but
> wouldn't that lead to the same code replication as we have now?
> I'm wondering if it's not possible to define a restricted number of
> implementations of this interface, each applying to a whole set of
> similar FOs.

Good point!

> Or use object composition: there would be objects dealing
> with a particular set of properties (say, border/padddin/background),
> and each FO for which that set applies would be composed of the
> corresponding object.
>
> Perhaps the flyweight pattern could apply here: only one object for  
> each
> set of properties, initializing the correct fields in the FObj. With
> some means to automagically wire everything by using marker interfaces
> or whatever.

Also fine suggestions. I'm going to chew on these a bit more.

> As I said my ideas are still all pretty vague... Also, does anyone  
> have
> knowledge about aspect programming? I've the feeling that could apply
> quite well here.

Just been reading up on AOSD, and it does indeed seem to be fitting  
in here.

>> The downside would be the loss in convenience, for instance, where we
>> now have individual accessors returning a Length, LengthRange,
>> Numeric... Not sure how I would address this, yet. :/
>
> I'm not sure of what you mean? The same property can be accessed in
> different ways?

What I mean is that right now, the bind() method already takes care  
of getting the right type of Property.
pList.get(PROPERTY_ID).getLengthRange() for instance...

An accessor getInlineProgressionDimension(), for example, returns a  
LengthRange (not a Property).

This treatment would obviously have to be moved or performed  
somewhere else, if a generic get() would always return an instance of  
type Property.

>> If anyone has suggestions, feedback is welcome.
>
> I hope what I wrote makes sense :-\

It did.


Cheers,

Andreas

Re: Suggestion: refactoring property access in the FO tree?

Posted by Vincent Hennebert <vi...@anyware-tech.com>.
Hi Andreas,

I have little knowledge of the FO tree construction, so I'll perhaps
make naive questions and remarks. I write them in the hope they will
trigger further thoughts.

Andreas L Delmelle a écrit :
> 
> Hi,
> 
> Just an idea that popped in my head. I was thinking of the access to a
> FONode's properties from the layoutengine, and it became apparent to me
> that ATM the approach is not too flexible: each subclass has its own
> little set of private instance methods and public accessors. This makes
> it hard to implement new properties or deal with extension properties in
> a generic way.

Also, that causes the same code to be replicated in every class; for
example each object for which the margin properties apply will have (1)
a CommonMarginBlock field; (2) a "commonMarginBlock =
pList.getMarginBlockProps();" line in the bind method. This replication
of code is unfortunate as it artificially increases the codebase size,
is more error-prone and more difficult to maintain.


> In ancient times, each FO had a full PropertyList, so the properties
> could be queried via a generic get(PROPERTY_ID) accessor that was simply
> a proxy to the PropertyList's corresponding get(). This was, however, a
> much less efficient approach than what we currently have.

To be sure I understand: each object had the very same list of
properties, with null values for the properties wich were not applicable
to it? And the loss of efficiency was due to the indirection caused by
the generic get(PROPERTY_ID) I guess?


> My suggested best-of-both-worlds would be the addition of (at least) one
> new type, which would store the applicable properties for a FO.
> 
> Starting with an interface:
> 
> public interface FOProperties {
>     Property get(int);
>     Property get(String);
> }
> 
> Each of the particular FO classes can then define its own
> implementation, which stores the applicable properties and maybe, for
> some FOs (like FOText!), this implementation can simply search the
> ancestors, instead of having to allocate space for properties that are
> always identical and can't be specified anyway...

Hmmm. My understanding of the whole thing is still a bit vague, but
wouldn't that lead to the same code replication as we have now? I'm
wondering if it's not possible to define a restricted number of
implementations of this interface, each applying to a whole set of
similar FOs. Or use object composition: there would be objects dealing
with a particular set of properties (say, border/padddin/background),
and each FO for which that set applies would be composed of the
corresponding object.

Perhaps the flyweight pattern could apply here: only one object for each
set of properties, initializing the correct fields in the FObj. With
some means to automagically wire everything by using marker interfaces
or whatever.

As I said my ideas are still all pretty vague... Also, does anyone have
knowledge about aspect programming? I've the feeling that could apply
quite well here.


> The downside would be the loss in convenience, for instance, where we
> now have individual accessors returning a Length, LengthRange,
> Numeric... Not sure how I would address this, yet. :/

I'm not sure of what you mean? The same property can be accessed in
different ways?


> If anyone has suggestions, feedback is welcome.

I hope what I wrote makes sense :-\


Cheers,
Vincent