You are viewing a plain text version of this content. The canonical link for it is here.
Posted to fop-dev@xmlgraphics.apache.org by Glen Mazza <gr...@yahoo.com> on 2004/01/18 21:04:14 UTC

Comments on new property maker implementation

Finn,

I've looked at your changes--I like them, and I'm
thankful to have someone on our team to be able to
redesign the properties as you have.  Getting rid of
the 250 autogenerated or so classes will be a welcome
improvement.

Comments right now:

1.)  Unlike what I was saying earlier, I don't think
we should move from Property.Maker to a new
PropertyMaker class after all, your design looks fine.
 I've noticed most subclasses of Property.Maker are
within subclasses of Properties themselves (e.g.,
LengthProperty, LengthProperty.Maker, etc.) so it
looks like a neat, clean design.


2.)  The new FOPropertyMapping.java class appears (1)
autogenerated, and (2) to be an XSLT masterpiece at
that as well.  If it is indeed in good shape, I'd like
you to submit it to Bugzilla as the new
fo-property-mapping.xsl, replacing the old one of that
name in src/codegen.  (We won't apply it however,
until we no longer need the current autogenerated
fo-property-mapping.xsl, i.e., until all the old
properties have been tossed out.)

This way if we have to make wide-ranging changes to
FOPropertyMapping, we'll have a XSLT source file we
can conveniently work with.  (Note that putting it in
codegen does *not* mean that it will be automatically
autogenerated anymore--it won't, just as constants.xsl
no longer is--we'll pull it out of the main Ant build
target at that time and keep it the separate, manual
xsltToJava target in our build file[1].

[1]
http://cvs.apache.org/viewcvs.cgi/xml-fop/build.xml?rev=1.97&view=auto
)


Comments on FOPropertyMapping:

I like removing all these autogenerated classes, but I
think we can still keep some processing at
compile-time for more of a performance gain, as
follows:

3)  I think the runtime construction of the generic
properties (genericColor, genericCondBorderWidth,
etc.) may not be necessary.  We can still have those
xslt-generated into classes (6-8 classes total), but
this time we check them into FOP (again, keeping the
xsl available for manual re-generation when needed). 
But most of the generic classes are so small (your
initialization of GenericCondPadding is only 4 lines
of code), that going back to creating concrete classes
would be noticeably beneficial either, so I'm not
recommending this change.

One thing that *does* stick out, however, is the 100
or so addKeyword() calls for genericColor (the largest
of the generic properties):

  genericColor.addKeyword("antiquewhite", "#faebd7");
  genericColor.addKeyword("aqua", "#00ffff");
  genericColor.addKeyword("aquamarine", "#7fffd4");
  .....

I'd like us to have a static array of these
values--i.e., something done compile-time, that
genericColor can just reference, so we don't have to
do this keyword initialization.  


4)  I'd also like us to, rather than call
setInherited() and setDefault() for each of the
properties during initialization, for the
Property/Property.Maker classes to just reference that
information from two (new) static arrays, added to
Constants.java.  We can also get rid of these two
setter methods as well (ideally there shouldn't be
setters for these attributes anyway--they should
remain inherent to the Property.)

This change will allow us to take advantage of the
fact that we are now on int-constants. 
getDefault(PR_WHATEVER), for example, is just
Constants.DefaultArray[PR_WHATEVER].


5)  Similar to (b) above, several of the makers also
have a "useGeneric()" initialization requirement:

m  = new CondLengthProperty.Maker(PR_PADDING_END);
m.useGeneric(genericCondPadding);

For those Makers that require it, I'd like the
constructor to be expanded to this:

m  = new CondLengthProperty.Maker(PR_PADDING_END,
genericCondPadding);

Again, getting rid of the useGeneric() function.  This
is for more speed, encapsulation, and also shrinking
FOPropertyMapping class a bit.

Sorry for the long post.  I'll probably have other
comments in other areas, but this is all I've studied
for now.

Thoughts?

Thanks,
Glen


__________________________________
Do you Yahoo!?
Yahoo! Hotjobs: Enter the "Signing Bonus" Sweepstakes
http://hotjobs.sweepstakes.yahoo.com/signingbonus

RE: Comments on new property maker implementation

Posted by "Andreas L. Delmelle" <a_...@pandora.be>.
> -----Original Message-----
> From: Finn Bock [mailto:bckfnn@worldonline.dk]

> [ Glen : ]
> > Sigh...I guess I *didn't* know bytecode by heart after
> > all!  ;-)
>
> I didn't bring it up to discourage the use of static initialized arrays.
> If it makes sense to put something in a static array we should do so
> without concern of compiletime vs. runtime. After all, the
> initialization is only performed once per classloader.
>

Well, (sorry to disappoint you, Peter) I don't know my BC by heart, but IIRC
the real difference would be in the size of the compiled classes...
See also a little trick I stumbled upon for for-loops. It's common(?)
knowledge that testing for a value to be greater than (or equal to) another
value, is the same as testing whether the result of their subtraction is
greater than (or equal to) zero --rewriting this effectively saves a
processor instruction per comparison.
If you subsequently combine the test and the decrementing of the counter,
you can slightly reduce the size of the compiled class further.

In short:

  for( int i = 0; i < j ; ++i )

is better written as ( = faster; that is: it will save a few hundred
millisecs in large loops, the few that might just be enough to give the
average user the impression that the software is actually any faster than
before )

  for( int i = j; i > 0; --i )

and even better ( leads to even more compact compiled classes;
performance-boost is negligeable with current hardware, but might turn out
to be an advantage --albeit a minor one - when this class has to be loaded
from a network location )

  for( int i = j; --i >= 0; )


Cheers,

Andreas


Re: Comments on new property maker implementation

Posted by Finn Bock <bc...@worldonline.dk>.
> [Finn Bock]
> 
>>>You should perhaps also be aware that the values
>>
>>in a static array gets 
>>
>>>assigned to the array one element at a time. So
>>>
>>>    static int[] a = {
>>
>>101,102,103,104,105,106,107,108 };
>>
>>>becomes in bytecodes:
>>>
>>>Method static {}
>>>   0 bipush 8
>>>   2 newarray int
>>>   4 dup
>>>   5 iconst_0

[Glen Mazza]

> Hmmm...Are you saying that declaring a static array
> isn't much (any?) faster than manually creating one? 

I would guess that it is a bit faster than the typical bytecode for 
manually created arrays since the above uses 'dup' instead of 
'getstatic' or 'aload' to push the array on the stack.

> I didn't realize that there is code being run for
> static arrays--I would have thought the compiled
> bytecode just includes the array internally, and not
> the code to create it.  (i.e., if you opened the
> bytecode you would see an array 101 102 103 104...
> sitting someplace.)

Arrays can't be stored in the constant pool so there is no place to put 
the data except as bytecode.

http://java.sun.com/docs/books/vmspec/html/ClassFile.doc.html#20080

It should perhaps be noted that constant instances of String is not 
really stored in the constant pool either. The pool just stores the 
utf-8 representation of the string constant, and each literal string is 
initialized as a new pool String object based on that (as UTF-16ish).

> Isn't that how C works, at least?

I think so, but C has hardware support for code and data segments so C 
can make better use of it.

> Sigh...I guess I *didn't* know bytecode by heart after
> all!  ;-)

I didn't bring it up to discourage the use of static initialized arrays.
If it makes sense to put something in a static array we should do so 
without concern of compiletime vs. runtime. After all, the 
initialization is only performed once per classloader.

regards,
finn


Re: Comments on new property maker implementation

Posted by Glen Mazza <gr...@yahoo.com>.
--- "Peter B. West" <pb...@powerup.com.au> wrote:
[Finn Bock]
> > 
> > You should perhaps also be aware that the values
> in a static array gets 
> > assigned to the array one element at a time. So
> > 
> >     static int[] a = {
> 101,102,103,104,105,106,107,108 };
> > 
> > becomes in bytecodes:
> > 
> > Method static {}
> >    0 bipush 8
> >    2 newarray int
> >    4 dup
> >    5 iconst_0
....

Hmmm...Are you saying that declaring a static array
isn't much (any?) faster than manually creating one? 
I didn't realize that there is code being run for
static arrays--I would have thought the compiled
bytecode just includes the array internally, and not
the code to create it.  (i.e., if you opened the
bytecode you would see an array 101 102 103 104...
sitting someplace.)  Isn't that how C works, at least?

Sigh...I guess I *didn't* know bytecode by heart after
all!  ;-)

Glen


__________________________________
Do you Yahoo!?
Yahoo! Hotjobs: Enter the "Signing Bonus" Sweepstakes
http://hotjobs.sweepstakes.yahoo.com/signingbonus

Re: Comments on new property maker implementation

Posted by "Peter B. West" <pb...@powerup.com.au>.
Finn Bock wrote:
> 
> I probably need an example of what you thinking are here. Right now in 
> HEAD all the color keywords are stored in a HashMap created in 
> GenericColor so the keywords initialization is already done. Putting the 
> keywords in static array would require us to somehow search the array 
> and I don't see how that will be much faster.
> 
> You should perhaps also be aware that the values in a static array gets 
> assigned to the array one element at a time. So
> 
>     static int[] a = { 101,102,103,104,105,106,107,108 };
> 
> becomes in bytecodes:
> 
> Method static {}
>    0 bipush 8
>    2 newarray int
>    4 dup
>    5 iconst_0
>    6 bipush 101
>    8 iastore
>    9 dup
>   10 iconst_1
>   11 bipush 102
>   13 iastore
>   14 dup
>   15 iconst_2
>   16 bipush 103
>   18 iastore
>   ...
> 
> and so on for each index. (In case you don't know bytecode by heart, 
> iconst and bipush both push a constant on the stack and iastore pops 3 
> items from the stack; an index, a value and an array and assign the 
> value to the index in the array).

Finn,

I can't imagine there is anyone here who doesn't know bytecode by heart. 
  (Except maybe me.)

Peter
-- 
Peter B. West <http://www.powerup.com.au/~pbwest/resume.html>


Re: Comments on new property maker implementation

Posted by Finn Bock <bc...@worldonline.dk>.
> Finn Bock wrote:
> 
>> I would guess that doing ~6 string compares to navigate the binary 
>> tree (with 148 color keywords) is slower than one string hash, ~1.2 
>> int compares and one string compare. But I haven't measured it, so you 
>> might be well be right. Many keyword sets for other properties are 
>> much smaller and could perhaps benefit from a more suitable collection 
>> type.

[J.Pietschmann]

> I meant setup effort, although a binary tree will most likely do
> additional memory management. You are right about the lookup. Just
> for curiosity, where do you get the 1.2 int comparisions? A perfect
> hash should not have collisions.

I was comparing a standard HashMap with your binary tree. A perfect hash 
would likely have a more complicated hash function and of course zero 
int compares.

> It might also be interesting how a trie or ternary tree (as used for
> hyphenation patterns) would compare to hash maps for keywords (in
> terms of setup costs, lookup costs and memory). I have doing a
> study of various Java implementations on my todo list but didn't
> quite get around to do this.

Very interesting indeed.

regards,
finn


Re: Comments on new property maker implementation

Posted by "J.Pietschmann" <j3...@yahoo.de>.
Finn Bock wrote:
> I would guess that doing ~6 string compares to navigate the binary tree 
> (with 148 color keywords) is slower than one string hash, ~1.2 int 
> compares and one string compare. But I haven't measured it, so you might 
> be well be right. Many keyword sets for other properties are much 
> smaller and could perhaps benefit from a more suitable collection type.

I meant setup effort, although a binary tree will most likely do
additional memory management. You are right about the lookup. Just
for curiosity, where do you get the 1.2 int comparisions? A perfect
hash should not have collisions.

It might also be interesting how a trie or ternary tree (as used for
hyphenation patterns) would compare to hash maps for keywords (in
terms of setup costs, lookup costs and memory). I have doing a
study of various Java implementations on my todo list but didn't
quite get around to do this.

J.Pietschmann

Re: Comments on new property maker implementation

Posted by Finn Bock <bc...@worldonline.dk>.
>> You should perhaps also be aware that the values in a static array 
>> gets assigned to the array one element at a time. So

[J.Pietschmann]

> That's an unpleasant surprise. I was always under the impression
> statically initialized data was stored along with the string
> constants, like in C. This means a generated perfect has table
> wouldn't have much of an advantage over, let's say, a simple
> binary tree loaded with the values in proper order so that the
> tree becomes automatically balanced (without rotations like
> rb-trees do).

I would guess that doing ~6 string compares to navigate the binary tree 
(with 148 color keywords) is slower than one string hash, ~1.2 int 
compares and one string compare. But I haven't measured it, so you might 
be well be right. Many keyword sets for other properties are much 
smaller and could perhaps benefit from a more suitable collection type.

> It would make sense, however, to properly initialitze initial size
> values for the various hashmaps currently used.

Indeed. Rehashing a HashMap is very fast tho, so I wouldn't expect a 
major speedup, but it all adds up.

regards,
finn


Re: Comments on new property maker implementation

Posted by "J.Pietschmann" <j3...@yahoo.de>.
Finn Bock wrote:
> You should perhaps also be aware that the values in a static array gets 
> assigned to the array one element at a time. So

That's an unpleasant surprise. I was always under the impression
statically initialized data was stored along with the string
constants, like in C. This means a generated perfect has table
wouldn't have much of an advantage over, let's say, a simple
binary tree loaded with the values in proper order so that the
tree becomes automatically balanced (without rotations like
rb-trees do).

It would make sense, however, to properly initialitze initial size
values for the various hashmaps currently used.

J.Pietschmann


Re: Comments on new property maker implementation

Posted by Glen Mazza <gr...@yahoo.com>.
--- Finn Bock <bc...@worldonline.dk> wrote:
> 
> But the biggest improvement is IMHO the easy ability
> to create special 
> maker subclasses to handle the corner cases. Take a
> look at 
> IndentPropertyMaker for the calculation of start and
> end-indent and at 
> BorderWidthPropertyMaker for the special handling of
> border-width when 
> border-style is NONE.
> 

Well, I'm not there yet, but I'll be able to
appreciate it in due time.

> 
> Initially my new FOPropertyMapping was generated by
> XSLT but that is now 
> a long time ago and I have made lots of manual
> changes since then. 

OK, no problem, we'll modify the Java source from now
on.

> 
> The generic properties are just templates that
> carries default data 
> values to be used later on, so I don't fully see how
> we could xslt- 
> generate them as anything other than containers of
> default values. Your 
> last statement is a bit difficult to parse so I'm
> not sure what exactly 
> you are recommending.
> 

Umm, never mind.  What I was trying to say is that the
generic templates (GenericKeep, GenericSpace, etc., of
the present code) were all autogenerated.  *If* you
thought it still useful to keep it as such, it's OK
with me (i.e., going down from 250 autogenerated to
about 8 is still a very nice improvement.)  But you no
longer see a need for it, which is absolutely fine
with me.

> > One thing that *does* stick out, however, is the
> 100
> > or so addKeyword() calls for genericColor (the
> largest
> > of the generic properties):
> > 
> >   genericColor.addKeyword("antiquewhite",
> "#faebd7");
> >   genericColor.addKeyword("aqua", "#00ffff");
> >   genericColor.addKeyword("aquamarine",
> "#7fffd4");
> >   .....
> > 
> > I'd like us to have a static array of these
> > values--i.e., something done compile-time, that
> > genericColor can just reference, so we don't have
> to
> > do this keyword initialization.  
> 
> I probably need an example of what you thinking are
> here. Right now in 
> HEAD all the color keywords are stored in a HashMap
> created in 
> GenericColor so the keywords initialization is
> already done. 

OK--I see, thanks for the enlightenment here.  Never
mind again, I was wrong on this point.

> Putting the 
> keywords in static array would require us to somehow
> search the array 
> and I don't see how that will be much faster.
> 

Yes, wasn't thinking of that.

> > 4)  I'd also like us to, rather than call
> > setInherited() and setDefault() for each of the
> > properties during initialization, for the
> > Property/Property.Maker classes to just reference
> that
> > information from two (new) static arrays, added to
> > Constants.java.  We can also get rid of these two
> > setter methods as well (ideally there shouldn't be
> > setters for these attributes anyway--they should
> > remain inherent to the Property.)
>  >
> > This change will allow us to take advantage of the
> > fact that we are now on int-constants. 
> > getDefault(PR_WHATEVER), for example, is just
> > Constants.DefaultArray[PR_WHATEVER].
> 
> I think 'Default' is a bad example, noone ever tries
> to get the default 
> value except for the property maker itself, but your
> argument holds for 
> isInherited().
> 

No--I don't think you've gotten my point here.  I
don't care about the consumers of that
information--even if it is just Property.Maker.  But I
don't see the reason to run-time initialize a
PropertyMaker with inherited and default values,
because I can add the whole array in the Constants
interface, or even in Property.Maker directly.  

static Boolean inheritedArray[] =
{
false ....  // 0
true        // PR_PROP_1
true        // PR_PROP_2
false       // PR_PROP_3
true        // ...
....

Once you initialize a Property.Maker with its
PR_XXXXXX constant, *it* (the Maker) can always obtain
these values by accessing inheritedArray[PR_XXXXXX] or
defaultArray[PR_XXXXXX].  No reason to initialize via
setInherited(true) or setDefault("5").  Do you see
what I'm trying to say?

> Still, I disagree. If one want to know is a property
> is inherited, the 
> proper way to get the information should be to call 
> propertyMapping[PR_WHATEVER].isInherited().
> 

OK--we can place these two arrays in a location where
only the Property.Makers can get to it.  (Maybe a
protected static array in Property.Maker?)  Thoughts
here?


> > Again, getting rid of the useGeneric() function. 
> This
> > is for more speed, encapsulation, and also
> shrinking
> > FOPropertyMapping class a bit.
> 
> A very good idea. +1.
> 

I can probably make the modifications to this--looks
simple.

Thanks,
Glen


__________________________________
Do you Yahoo!?
Yahoo! Hotjobs: Enter the "Signing Bonus" Sweepstakes
http://hotjobs.sweepstakes.yahoo.com/signingbonus

Re: Comments on new property maker implementation

Posted by Finn Bock <bc...@worldonline.dk>.
[Glen Mazza]

> I've looked at your changes--I like them, and I'm
> thankful to have someone on our team to be able to
> redesign the properties as you have.  Getting rid of
> the 250 autogenerated or so classes will be a welcome
> improvement.

But the biggest improvement is IMHO the easy ability to create special 
maker subclasses to handle the corner cases. Take a look at 
IndentPropertyMaker for the calculation of start and end-indent and at 
BorderWidthPropertyMaker for the special handling of border-width when 
border-style is NONE.

> Comments right now:
> 
> 1.)  Unlike what I was saying earlier, I don't think
> we should move from Property.Maker to a new
> PropertyMaker class after all, your design looks fine.
>  I've noticed most subclasses of Property.Maker are
> within subclasses of Properties themselves (e.g.,
> LengthProperty, LengthProperty.Maker, etc.) so it
> looks like a neat, clean design.
> 
> 
> 2.)  The new FOPropertyMapping.java class appears (1)
> autogenerated, and (2) to be an XSLT masterpiece at
> that as well.  If it is indeed in good shape, I'd like
> you to submit it to Bugzilla as the new
> fo-property-mapping.xsl, replacing the old one of that
> name in src/codegen.  

Initially my new FOPropertyMapping was generated by XSLT but that is now 
a long time ago and I have made lots of manual changes since then. The 
XSLT script only handled the most common property information and was 
just a hack to get me started. The output isn't a complete java file, it 
doesn't link the subproperties to the base properties and it doesn't 
deal with the classname of any of the complex properties.

> (We won't apply it however,
> until we no longer need the current autogenerated
> fo-property-mapping.xsl, i.e., until all the old
> properties have been tossed out.)
> 
> This way if we have to make wide-ranging changes to
> FOPropertyMapping, we'll have a XSLT source file we
> can conveniently work with.  (Note that putting it in
> codegen does *not* mean that it will be automatically
> autogenerated anymore--it won't, just as constants.xsl
> no longer is--we'll pull it out of the main Ant build
> target at that time and keep it the separate, manual
> xsltToJava target in our build file[1].
> 
> [1]
> http://cvs.apache.org/viewcvs.cgi/xml-fop/build.xml?rev=1.97&view=auto
> )
> 
> 
> Comments on FOPropertyMapping:
> 
> I like removing all these autogenerated classes, but I
> think we can still keep some processing at
> compile-time for more of a performance gain, as
> follows:
> 
> 3)  I think the runtime construction of the generic
> properties (genericColor, genericCondBorderWidth,
> etc.) may not be necessary.  We can still have those
> xslt-generated into classes (6-8 classes total), but
> this time we check them into FOP (again, keeping the
> xsl available for manual re-generation when needed). 
> But most of the generic classes are so small (your
> initialization of GenericCondPadding is only 4 lines
> of code), that going back to creating concrete classes
> would be noticeably beneficial either, so I'm not
> recommending this change.

The generic properties are just templates that carries default data 
values to be used later on, so I don't fully see how we could xslt- 
generate them as anything other than containers of default values. Your 
last statement is a bit difficult to parse so I'm not sure what exactly 
you are recommending.

> One thing that *does* stick out, however, is the 100
> or so addKeyword() calls for genericColor (the largest
> of the generic properties):
> 
>   genericColor.addKeyword("antiquewhite", "#faebd7");
>   genericColor.addKeyword("aqua", "#00ffff");
>   genericColor.addKeyword("aquamarine", "#7fffd4");
>   .....
> 
> I'd like us to have a static array of these
> values--i.e., something done compile-time, that
> genericColor can just reference, so we don't have to
> do this keyword initialization.  

I probably need an example of what you thinking are here. Right now in 
HEAD all the color keywords are stored in a HashMap created in 
GenericColor so the keywords initialization is already done. Putting the 
keywords in static array would require us to somehow search the array 
and I don't see how that will be much faster.

You should perhaps also be aware that the values in a static array gets 
assigned to the array one element at a time. So

     static int[] a = { 101,102,103,104,105,106,107,108 };

becomes in bytecodes:

Method static {}
    0 bipush 8
    2 newarray int
    4 dup
    5 iconst_0
    6 bipush 101
    8 iastore
    9 dup
   10 iconst_1
   11 bipush 102
   13 iastore
   14 dup
   15 iconst_2
   16 bipush 103
   18 iastore
   ...

and so on for each index. (In case you don't know bytecode by heart, 
iconst and bipush both push a constant on the stack and iastore pops 3 
items from the stack; an index, a value and an array and assign the 
value to the index in the array).

> 4)  I'd also like us to, rather than call
> setInherited() and setDefault() for each of the
> properties during initialization, for the
> Property/Property.Maker classes to just reference that
> information from two (new) static arrays, added to
> Constants.java.  We can also get rid of these two
> setter methods as well (ideally there shouldn't be
> setters for these attributes anyway--they should
> remain inherent to the Property.)
 >
> This change will allow us to take advantage of the
> fact that we are now on int-constants. 
> getDefault(PR_WHATEVER), for example, is just
> Constants.DefaultArray[PR_WHATEVER].

I think 'Default' is a bad example, noone ever tries to get the default 
value except for the property maker itself, but your argument holds for 
isInherited().

Still, I disagree. If one want to know is a property is inherited, the 
proper way to get the information should be to call 
propertyMapping[PR_WHATEVER].isInherited().

> 5)  Similar to (b) above, several of the makers also
> have a "useGeneric()" initialization requirement:
> 
> m  = new CondLengthProperty.Maker(PR_PADDING_END);
> m.useGeneric(genericCondPadding);
> 
> For those Makers that require it, I'd like the
> constructor to be expanded to this:
> 
> m  = new CondLengthProperty.Maker(PR_PADDING_END,
> genericCondPadding);
> 
> Again, getting rid of the useGeneric() function.  This
> is for more speed, encapsulation, and also shrinking
> FOPropertyMapping class a bit.

A very good idea. +1.

regards,
finn


Re: Comments on new property maker implementation

Posted by "J.Pietschmann" <j3...@yahoo.de>.
Glen Mazza wrote:
> One thing that *does* stick out, however, is the 100
> or so addKeyword() calls for genericColor
...
> I'd like us to have a static array of these
> values--i.e., something done compile-time, that
> genericColor can just reference, so we don't have to
> do this keyword initialization.  

Look up "perfect hash code" and the associated generators
on the internet, like gperf, a C++ implementation used by
gcc and a veriety of other compilers to provide a data
structure for mapping strings to something else in an
efficient way. Mind you, this would also benefit mapping
to FO and property names to their associated classes or
code numbers.

J.Pietschmann