You are viewing a plain text version of this content. The canonical link for it is here.
Posted to fop-dev@xmlgraphics.apache.org by John Austin <jw...@sympatico.ca> on 2003/12/01 01:31:12 UTC

String.intern() thoughts

I mentioned yesterday that I thought I had read a comment
by Bruce Eckel suggesting that String.intern() might be 
avoided. 

I could not find the reference in either the 2nd or 3rd editions
of "Thinking In Java".

A couple of observations from some research:

1) There were some problems in Java 1.1 and before.
2) There may be problems in non-Sun implementations (KAffe...)

3) There have been discussions in the SAX2 list and other 
   places about using String.ntern() and I notice that
   interning is a feature of SAX2 that can be turned on.
   There is a lot of support for the technique and I suspect
   some of the objections are of the theological type.

4) The property strings in Peter West's code start life as
   string literals which are interned by the Java Language Spec.
   So they are all ready present in the table.

Some of the benefit of interning can be turned on in the parser. 

-- 
John Austin <jw...@sympatico.ca>

RE: String.intern() test and measurement

Posted by "Andreas L. Delmelle" <a_...@pandora.be>.
> -----Original Message-----
> From: Andreas L. Delmelle [mailto:a_l.delmelle@pandora.be]
> 
> for(int i= cnt; --i >= cnt; )
                         ^^^^

sorry. meant 0

cheers,

andreas


RE: String.intern() test and measurement

Posted by "Andreas L. Delmelle" <a_...@pandora.be>.
> -----Original Message-----
> From: Finn Bock [mailto:bckfnn@worldonline.dk]
>
> I can see performance slow down well before 100.000 unique string are
> interned.
>
<snip />
>
> I've attached another demo program that shows that intern'ing is slower
> than doing memory sharing with a Hashtable. This programs creates 13520
> unique strings and does 9 subsequent lookup of the same strings.
>

say... a little trick I just stumbled upon.

try changing the two for loops to :

for(int i= cnt; --i >= cnt; )

take a look at the result. FREAKY!

how much are optimizations like this taken into consideration elsewhere in
the code?


Cheers,

Andreas
> regards,
> finn
>


Re: String.intern() test and measurement

Posted by Finn Bock <bc...@worldonline.dk>.
[John Austin]

> I wondered how much chaining there has to be before performance
> gets really bad when I checked your program more closely. Your 
> example program would produce external chains of length:
> 2000000/20011 ~ 100. 

I can see performance slow down well before 100.000 unique string are 
interned.

And remember that the number of strings that FOP explicit intern must be 
added to the number of intern'ed string that the classloaders has 
created and whatever all the other components that are loaded has created.

>>Ad.2: The memory sharing can implemented by using intern'ed string but 
>>it is a very bad idea to do that. A normal java.util.Hashtable can be 
>>used just as well for the implementation of memory sharing:
> 
> 
> HashMap you mean.

Well sure, but a Hashtable is slightly faster.

I've attached another demo program that shows that intern'ing is slower 
than doing memory sharing with a Hashtable. This programs creates 13520 
unique strings and does 9 subsequent lookup of the same strings.

regards,
finn

Re: String.intern() test and measurement

Posted by John Austin <jw...@sympatico.ca>.
On Tue, 2003-12-02 at 12:59, Finn Bock wrote:
> I'm resending this mail since it hasn't yet shown up in the archives. 
> I'm sorry about any duplicates.
> 
> [John Austin]
> 
> > 4) Changed the handling of strings at the for-loop storing the
> >    attributes received from the parser in startElement( ... )
> > 
> >         // Process attributes
> >         for (int i=0; i<atts.getLength(); i++) {
> >             DefaultMutableTreeNode attribute =
> >                 new DefaultMutableTreeNode(("Attribute (name = '" +
> >                                            atts.getLocalName(i) + 
> >                                            "', value = '" +
> >                                            atts.getValue(i) +
> > "')").intern() );
> > 
> >     So I intern these strings rather than storing new strings.
> 
> Here you are also interning the attribute values, right?

Yes.

> Interning is best used with discretion. The real problem, which isn't 
> really spelled out in the gotchas.html page, is that the interning 
> algorithm is completely undefined by the java spec.
> 
> F.ex, in jdk1.4 the intern table (in symbolTable.[hpp,cpp]) is a fixed 
> size hashing table (size of 20011) with chaining buckets. So when the 
> total number of intern'ed string grows beyond that number, the interning 
> process becomes linear in time. Try the attached test program to see the 
> effect.

Gasp! SUN implemented something might not scale up ?

Note that my example of last night, the SAXTreeValidator.java
results processed the 'defguide' file that has 117 unique names and
13,520 unique values. The memory saved by interning was impressive (184M
down to 87M = 97M reduction). 

[My results may be artificially good as I have interned character
strings as well. I expect that the nature of DEFGUIDE includes many
repeated character strings. I shall re-run the benchmark: 

Hmm .. the interning version of the program now uses 104M, a reduction
of 80M rather than the previous reduction of 97M. The number of interned
strings must have been well over the hash table size, but the difference
in CPU usage to parse that file was less than ten seconds (more for
interned character strings).]

I wondered how much chaining there has to be before performance
gets really bad when I checked your program more closely. Your 
example program would produce external chains of length:
2000000/20011 ~ 100. Because the table keys are constructed, I 
expect your access times are uniformly distributed and average
access times reflect that.

Your demonstration program artificially employs 2 million strings
which is not a behavior we would expect for FOP. The number of 
attribute names is limited (by the XSL-FO Spec) and the number of
distinct values is limited by some probability distributions that
are definitely not Uniform. Takes a LOT of ransom-note typography 
to make that many unique property values.

> I also think that two different aspects of interning is being mixed 
> around in your measurements here.
> 
> 1. The identity sharing.
> 2. The memory sharing.

I have not changed the programs beyond calling intern() on strings
passed to some constructors. So no identity sharing is present.

> The identity sharing can quite possible give a performance boost to the 
> lookup of the attribute names.

I prefer to optimize from measurements. The FOP high runner is property
lookup. I didn't see a lot of time in String.equals(), I may look again.

Peter's alt-design all ready provides this functionality. I am chasing
parallel memory effects.

> The memory sharing can quite possible give a memory boost for duplicated 
> attribute values and a performance boost during garbage collection.

I agree, esp in light of the counts I made using a Perl program and the
large file defguide.fo.

> Ad.1: If one decide that all attribute names must be interned before 
> insertion and also before lookup of attributes, a special hashtable 
> (identity-hashtable) can be coded that is significantly faster than a 
> normal value-based hashtable. As an example of the performance boost, 
> the hash calculation can be done as:

I would use that if i felt that intern() wasn't theraputic. So far, I
just intern the name and value for each attribute passed to the SAX
parser callback: startElement. This stores the interned string in the
parsed tree and lets the parser trash it's copies of strings whenever.

>     int index = (System.identityHashCode(key) & 0x7fffffff) % maxindex;
> 
> In addition attribute names can be compared with '==' instead of equals.
> The downside is that the lookup can only be done with intern'ed keys.

But these values would be encapsulated in a class. Good reason for
not breaking encapsulation.

> Such an identity-hash table has been used by the jython project as a 
> significant performance boost in jython's attribute lookups.

I believe it.

> But, as Glenn noticed, the attribute names can also be implemented with 
> enumeration tokens.

PropNames class in Peter's alt-design.

> Ad.2: The memory sharing can implemented by using intern'ed string but 
> it is a very bad idea to do that. A normal java.util.Hashtable can be 
> used just as well for the implementation of memory sharing:

HashMap you mean.

> public String getSharedValue(String value) {
>     String ret = hashTable.get(value);
>     if (ret != null)
>         return ret;
>     hashTable.put(value, value);
>     return value;
> }
> 
> The hashTable can then be cleared at any time, perhaps at the end of 
> each page sequence.

I agree with the idea that we could implement a separate internment
facility easily. I am not quite convinced that it needs to be done
in an application. 

This COULD give someone the idea that an object interning mechanism
could be implemented at the language level. There's an idea for a 
stand-alone product ... (or a JSR?)

> regards,
> finn
> 
> ______________________________________________________________________
> public class interntst {
>     public static void main(String[] args) throws Exception {
>         new interntst().test();
>     }
> 
>     public void test() {
>         int cnt = 2000000;

In my large test file, there are 117 property names and 13,520 property
values. A lot less than the hash table size of 20,011.

>         String[] arr = new String[cnt];
>         long now = System.currentTimeMillis();
>         for (int i = 0; i < cnt; i++) {
>             if ((i % 1000) == 0) {
>                 System.out.println(i + " " + (System.currentTimeMillis() - now));
>                 now = System.currentTimeMillis();
>             }
>             String s = ("attribute" + i).intern();
>             arr[i] = s;
>         }
>     }
> }
-- 
John Austin <jw...@sympatico.ca>

Re: String.intern() test and measurement

Posted by John Austin <jw...@sympatico.ca>.
On Tue, 2003-12-02 at 14:04, J.Pietschmann wrote:
> Finn Bock wrote:
> >>                 new DefaultMutableTreeNode(("Attribute (name = '" +
> >>                                            atts.getLocalName(i) + 
> >>                                            "', value = '" +
> >>                                            atts.getValue(i) +
> >> "')").intern() );
> 
> > Here you are also interning the attribute values, right?
> 
> Eh, no. The String "Attribute (name = ';name:', value = ';value:')"
  ^ Canadian ?
> is interned.

But I feel it's similar for the purpose of modelling the behavior in the
SAXTreeValidator.java example. The references held by the parser go away
in this model program. I ensure the strings are unique. That's where the
80M savings comes in. In FOP, many strings are created in the parser and
references to them are stored in the parsed tree. Interning these will
produce memory savings. 

> > But, as Glenn noticed, the attribute names can also be implemented with 
> > enumeration 
> 
> There are no enumerations in pre 1.5 Java. What was meant was that
> strings denoting XSLFO property enumeration tokens can be interned
> as the set is of limited and more or less fixed size, while it is
> probably not prudent to intern the complete XML attribute value
> strings.
> For example:
>    text-decoration="underline overline"
> (Yes, that's valid, provided "overline" is valid).
> The possibly interned strings are "text-decoration" (property name),
> "underline" and "overline" (enumeration tokens).
> Somewhere else, the user might have put
>    text-decoration="overline underline"
> Granted, given that FO source ought to be XSLT or otherwise generated,
> this isn't very likely, but still.

This is why I wrote a perl program to count from a real-world case
(defguide). I haven't concerned myself with how an interned "overline
underline" string is used, just ensured it is stored that way once.

The attribute name space is defined in the XSL-FO spec. So the number
of names is strictly limited. Peter lists 380 or so in PropNames.java.
 
> Another issue is how the values are stored. For example, there are only
> 8 distinct TextDecoration objects. In 0.20.5, basically every Text node
> gets its own instance.
> The weird thing is, when I hacked the PropertyManager to look up the
> actual text decoration in an array of preinstantiated objects, FOP
> run slower and took more memory. Apparently something went wrong. If
> anybody is up there to get it right, please do.

I tried testing 0.20.5 doing things that worked in SAXTReeValidator
and haven't had instant success. The benefits would disappear if the
references passed out of the parser are still held elsewhere in FOP.
Give it some time.

> Same for some other objects, BorderAndPadding and especially FontInfo
> come to mind, although there is more variation.

-- 
John Austin <jw...@sympatico.ca>

Re: String.intern() test and measurement

Posted by "Peter B. West" <pb...@powerup.com.au>.
J.Pietschmann wrote:
> John Austin wrote:
> 
>> A high runner in FOP 0.20.5 is: PropertyList.findProperty().
>> It calls other functions in org.apache.fop.fo that consume
>> significant CPU resources. In one example it called itself
>> recursively to a (depth of 10)
> 
> 
> Without taking a closer look at the code, I suspect it tries
> to find inherited values. One approach to cope with this is
> to resolve inherited and default values immediately during
> FO construction. Because the full property set is stored,
> in contrast to only specified properties now, only the
> parent has to be looked up. This comes, of course at the
> price of consuming more memory, and there are functions like
> frome-nearest-specified-value() which require specified
> properties to be marked. If someone can come up with a space
> efficient storage, this may be a solution.

You could also look at the way alt.design tries for the best of both 
worlds.  See fop.fo.FONode.java in the alt.design tree; notably 
makeSparsePropsSet()
BitSet specifiedProps
PropertyValue[] propertySet
PropertyValue[] sparsePropsSet
final int[] sparsePropsMap
final int[] sparseIndices

Peter
-- 
Peter B. West <http://www.powerup.com.au/~pbwest/resume.html


Re: String.intern() test and measurement

Posted by "J.Pietschmann" <j3...@yahoo.de>.
John Austin wrote:
> A high runner in FOP 0.20.5 is: PropertyList.findProperty().
> It calls other functions in org.apache.fop.fo that consume
> significant CPU resources. In one example it called itself
> recursively to a (depth of 10)

Without taking a closer look at the code, I suspect it tries
to find inherited values. One approach to cope with this is
to resolve inherited and default values immediately during
FO construction. Because the full property set is stored,
in contrast to only specified properties now, only the
parent has to be looked up. This comes, of course at the
price of consuming more memory, and there are functions like
frome-nearest-specified-value() which require specified
properties to be marked. If someone can come up with a space
efficient storage, this may be a solution.

J.Pietschmann



Re: String.intern() test and measurement

Posted by John Austin <jw...@sympatico.ca>.
On Tue, 2003-12-02 at 16:43, Glen Mazza wrote:
> --- "J.Pietschmann" <j3...@yahoo.de> wrote:
> > > But, as Glenn noticed, the attribute names can
> > also be implemented with 
> > > enumeration 
> > 
> > There are no enumerations in pre 1.5 Java. What was
> > meant was that
> > strings denoting XSLFO property enumeration tokens
> > can be interned
> > as the set is of limited and more or less fixed
> > size, 
> 
> No I was actually thinking static final variables, my
> reference to "enumerations" was in a generic sense:
> 
> public static final int PROPA = 1;
> public static final int PROPB = 2;
> public static final int PROPC = 3;
> 
> If it is only the property names we were planning on
> interning, then I thought static final variables would
> be faster/more efficient instead.  
> 
> Glen

Given that there are between 249 and 380 names and they
exist in both integer and String format, there isn't a lot
that we can recover here.

If we are after better performance, we must measure to find the
'high runner' and tune from there. To 'design-in' stuff that we 
think will be fast is often unproductive. This is why I have been
using JMP to measure performance of FOP and some sample programs.

A high runner in FOP 0.20.5 is: PropertyList.findProperty().
It calls other functions in org.apache.fop.fo that consume
significant CPU resources. In one example it called itself
recursively to a (depth of 10)

One of the reasons I am playing with the SAXTreeValidator program
is as a simplified test bed for the Property implementation. I want to
be able to plug in a new Property implementation and test
it independantly of the rest of FOP.


-- 
John Austin <jw...@sympatico.ca>

Re: String.intern() test and measurement

Posted by Finn Bock <bc...@worldonline.dk>.
[Glen Mazza]

> No I was actually thinking static final variables, my
> reference to "enumerations" was in a generic sense:
> 
> public static final int PROPA = 1;
> public static final int PROPB = 2;
> public static final int PROPC = 3;

That was also what I though you meant.

Please accept my apology for misspelling your name.

regards,
finn



Re: String.intern() test and measurement

Posted by Glen Mazza <gr...@yahoo.com>.
--- "J.Pietschmann" <j3...@yahoo.de> wrote:
> > But, as Glenn noticed, the attribute names can
> also be implemented with 
> > enumeration 
> 
> There are no enumerations in pre 1.5 Java. What was
> meant was that
> strings denoting XSLFO property enumeration tokens
> can be interned
> as the set is of limited and more or less fixed
> size, 

No I was actually thinking static final variables, my
reference to "enumerations" was in a generic sense:

public static final int PROPA = 1;
public static final int PROPB = 2;
public static final int PROPC = 3;

If it is only the property names we were planning on
interning, then I thought static final variables would
be faster/more efficient instead.  

Glen


__________________________________
Do you Yahoo!?
Free Pop-Up Blocker - Get it now
http://companion.yahoo.com/

Re: String.intern() test and measurement

Posted by "J.Pietschmann" <j3...@yahoo.de>.
Finn Bock wrote:
>>                 new DefaultMutableTreeNode(("Attribute (name = '" +
>>                                            atts.getLocalName(i) + 
>>                                            "', value = '" +
>>                                            atts.getValue(i) +
>> "')").intern() );

> Here you are also interning the attribute values, right?

Eh, no. The String "Attribute (name = ';name:', value = ';value:')"
is interned.

> But, as Glenn noticed, the attribute names can also be implemented with 
> enumeration 

There are no enumerations in pre 1.5 Java. What was meant was that
strings denoting XSLFO property enumeration tokens can be interned
as the set is of limited and more or less fixed size, while it is
probably not prudent to intern the complete XML attribute value
strings.
For example:
   text-decoration="underline overline"
(Yes, that's valid, provided "overline" is valid).
The possibly interned strings are "text-decoration" (property name),
"underline" and "overline" (enumeration tokens).
Somewhere else, the user might have put
   text-decoration="overline underline"
Granted, given that FO source ought to be XSLT or otherwise generated,
this isn't very likely, but still.

Another issue is how the values are stored. For example, there are only
8 distinct TextDecoration objects. In 0.20.5, basically every Text node
gets its own instance.
The weird thing is, when I hacked the PropertyManager to look up the
actual text decoration in an array of preinstantiated objects, FOP
run slower and took more memory. Apparently something went wrong. If
anybody is up there to get it right, please do.
Same for some other objects, BorderAndPadding and especially FontInfo
come to mind, although there is more variation.

J.Pietschmann


Re: String.intern() test and measurement

Posted by Finn Bock <bc...@worldonline.dk>.
I'm resending this mail since it hasn't yet shown up in the archives. 
I'm sorry about any duplicates.

[John Austin]

> 4) Changed the handling of strings at the for-loop storing the
>    attributes received from the parser in startElement( ... )
> 
>         // Process attributes
>         for (int i=0; i<atts.getLength(); i++) {
>             DefaultMutableTreeNode attribute =
>                 new DefaultMutableTreeNode(("Attribute (name = '" +
>                                            atts.getLocalName(i) + 
>                                            "', value = '" +
>                                            atts.getValue(i) +
> "')").intern() );
> 
>     So I intern these strings rather than storing new strings.

Here you are also interning the attribute values, right?

Interning is best used with discretion. The real problem, which isn't 
really spelled out in the gotchas.html page, is that the interning 
algorithm is completely undefined by the java spec.

F.ex, in jdk1.4 the intern table (in symbolTable.[hpp,cpp]) is a fixed 
size hashing table (size of 20011) with chaining buckets. So when the 
total number of intern'ed string grows beyond that number, the interning 
process becomes linear in time. Try the attached test program to see the 
effect.



I also think that two different aspects of interning is being mixed 
around in your measurements here.

1. The identity sharing.
2. The memory sharing.

The identity sharing can quite possible give a performance boost to the 
lookup of the attribute names.

The memory sharing can quite possible give a memory boost for duplicated 
attribute values and a performance boost during garbage collection.


Ad.1: If one decide that all attribute names must be interned before 
insertion and also before lookup of attributes, a special hashtable 
(identity-hashtable) can be coded that is significantly faster than a 
normal value-based hashtable. As an example of the performance boost, 
the hash calculation can be done as:

    int index = (System.identityHashCode(key) & 0x7fffffff) % maxindex;

In addition attribute names can be compared with '==' instead of equals.
The downside is that the lookup can only be done with intern'ed keys.

Such an identity-hash table has been used by the jython project as a 
significant performance boost in jython's attribute lookups.

But, as Glenn noticed, the attribute names can also be implemented with 
enumeration tokens.


Ad.2: The memory sharing can implemented by using intern'ed string but 
it is a very bad idea to do that. A normal java.util.Hashtable can be 
used just as well for the implementation of memory sharing:

public String getSharedValue(String value) {
    String ret = hashTable.get(value);
    if (ret != null)
        return ret;
    hashTable.put(value, value);
    return value;
}

The hashTable can then be cleared at any time, perhaps at the end of 
each page sequence.

regards,
finn

String.intern() test and measurement

Posted by John Austin <jw...@sympatico.ca>.
I decided to find a demonstration program that works
similar enough to FOP that I could try the String.intern()
technique.

1) SAXTreeValidator.java from Chapter 3 of Brett McLaughlin's
   "XML and Java" the online copy of the example is 2nd Ed.

2) Fed this program various fragments of .fo files I have
   accumulated lately.

3) There was one line I had to change in the program

   Line 355: if (attPrefix == null || attPrefix.equals("")) {
   had to add test for null as the prog threw NPE.

4) Changed the handling of strings at the for-loop storing the
   attributes received from the parser in startElement( ... )

        // Process attributes
        for (int i=0; i<atts.getLength(); i++) {
            DefaultMutableTreeNode attribute =
                new DefaultMutableTreeNode(("Attribute (name = '" +
                                           atts.getLocalName(i) + 
                                           "', value = '" +
                                           atts.getValue(i) +
"')").intern() );

    So I intern these strings rather than storing new strings.

5) Tested this program with this particular call to 'intern'
   enabled and disabled. With my 'defguide.fo' file from
   styling "DocBook: The Definitive Guide"

   The results:

	interned attributes:  87M reported by 'top' in column RSS
        non-interned strings 184M reported by 'top'

6) Speed was about the same ('time' reported 27 and 30 second 'user')


7) I ran a similar test in the JMP profiler with a much smaller 
   .fo file. AT the end of execution I determined that the heap
   was about 4.8M for the non-interned code and 2.8M for the interned.


Conclusion: Interning strings in FOP will help. I believe the code
to change is in FOTree.java.

I will code up an example and test it in a day or so.

 
-- 

John Austin <jw...@sympatico.ca>

Re: String.intern() thoughts

Posted by Glen Mazza <gr...@yahoo.com>.
Thanks for the explanation.

Glen

--- "Peter B. West" <pb...@powerup.com.au> wrote:
> Glen Mazza wrote:
> ...
> > 
> > I think the next thing to consider is the storage
> of
> > specified vs. computed values.  Let's say we store
> > pointers for many properties to the same
> > {"property-name", "property-value"} pair.  A
> specified
> > property value of "10%" would not make this a very
> > helpful data structure if that percentage resolves
> to
> > different computed values for each property
> sharing
> > this pair.  I believe the goal for us then would
> be
> > just store the computed value for each pair
> (meaning
> > many more pairs), as long as we take into account
> the
> >
> can't-resolve-everything-without-knowledge-of-layout
> > issue.
> 
> alt.design makes no attempt to look for
> commonalities here.  It resolves 
> every possible property value, and keeps a
> partly-resolved value for 
> percentages.  Inheritance is (almost exclusively) of
> computed values.
> <quote>
> For a given inheritable property, if that property
> is present on a 
> child, then that value of the property is used for
> that child (and its 
> descendants until explicitly re-set in a lower
> descendant); otherwise, 
> the specified value of that property on the child is
> the computed value 
> of that property on the parent formatting object.
> </quote> 5.1.4 Inheritance
> 
> There is an exception that comes to mind.  For
> "line-height" a value may 
> be specified which, although not specified as a
> percentage, is a factor 
> by which the font-size (from memory) is multiplied. 
> When such a value 
> is inherited, it is the factor, not the computed
> value.
> 
> In general, the computed value of a percentage is
> inherited.  That still 
> leaves a problem, because the computed value is
> unknown.  In alt.design, 
> what is effectively a link back to the unresolved
> property is specified 
> as the value.  When the parent property is resolved,
> that resolved value 
> is then available to the inheriting descendants.
> 
> Yet-to-be-implemented is the handling of expressions
> involving 
> percentages, as mentioned in an earlier post.
> 
> In essence, alt.design attempts to resolve every
> property to its 
> computed value, and store the result only on nodes
> to which the property 
> applies.  There is no attempt to reduce storage by
> procedures like 
> interning strings.  Note, though, that the process
> of resolving 
> properties does eliminate most strings.  My
> objective was that when 
> areas were being laid out, the relevant properties
> would all be directly 
> available.
> 
> Peter
> -- 
> Peter B. West
> <http://www.powerup.com.au/~pbwest/resume.html>
> 


__________________________________
Do you Yahoo!?
Free Pop-Up Blocker - Get it now
http://companion.yahoo.com/

Re: String.intern() thoughts

Posted by "Peter B. West" <pb...@powerup.com.au>.
Glen Mazza wrote:
...
> 
> I think the next thing to consider is the storage of
> specified vs. computed values.  Let's say we store
> pointers for many properties to the same
> {"property-name", "property-value"} pair.  A specified
> property value of "10%" would not make this a very
> helpful data structure if that percentage resolves to
> different computed values for each property sharing
> this pair.  I believe the goal for us then would be
> just store the computed value for each pair (meaning
> many more pairs), as long as we take into account the
> can't-resolve-everything-without-knowledge-of-layout
> issue.

alt.design makes no attempt to look for commonalities here.  It resolves 
every possible property value, and keeps a partly-resolved value for 
percentages.  Inheritance is (almost exclusively) of computed values.
<quote>
For a given inheritable property, if that property is present on a 
child, then that value of the property is used for that child (and its 
descendants until explicitly re-set in a lower descendant); otherwise, 
the specified value of that property on the child is the computed value 
of that property on the parent formatting object.
</quote> 5.1.4 Inheritance

There is an exception that comes to mind.  For "line-height" a value may 
be specified which, although not specified as a percentage, is a factor 
by which the font-size (from memory) is multiplied.  When such a value 
is inherited, it is the factor, not the computed value.

In general, the computed value of a percentage is inherited.  That still 
leaves a problem, because the computed value is unknown.  In alt.design, 
what is effectively a link back to the unresolved property is specified 
as the value.  When the parent property is resolved, that resolved value 
is then available to the inheriting descendants.

Yet-to-be-implemented is the handling of expressions involving 
percentages, as mentioned in an earlier post.

In essence, alt.design attempts to resolve every property to its 
computed value, and store the result only on nodes to which the property 
applies.  There is no attempt to reduce storage by procedures like 
interning strings.  Note, though, that the process of resolving 
properties does eliminate most strings.  My objective was that when 
areas were being laid out, the relevant properties would all be directly 
available.

Peter
-- 
Peter B. West <http://www.powerup.com.au/~pbwest/resume.html>


Re: String.intern() thoughts

Posted by Glen Mazza <gr...@yahoo.com>.
--- "J.Pietschmann" <j3...@yahoo.de> wrote:
> Glen Mazza wrote:
> > No need-the "gotcha" site you gave earlier did
> give
> > some specific drawbacks under string compares:  
> > 
> > http://mindprod.com/jgloss/gotchas.html#COMPARISON
> 
> That's not a drawback. Interned strings allow
> comparision of
> references:
>    String a="a".intern();
>    String b="a".intern();
>    if(a==b) { System.out.println("equal"); }
> With non-interned strings you need a.equals(b),
> which
> may be slower, for longer strings anyway.
> 
> The most important gotcha is that interned strings
> may lock
> up memory forever. 

Yes, I may have misread that--outside of the fact that
strings are not GC'ed away, the intern() function does
not look that bad (their mention of a possible 64K
limit may be an issue if we also keep property values,
however.)  Still, I still don't know enough at this
stage to rule out any particular design.

> If interning is done only for
> property
> names and enumeration tokens (in contrast to
> property values),
> this may be acceptable.
> 

If we just do that, I believe we're back to
enumeration tokens, which don't need intern() anyway.

I think the next thing to consider is the storage of
specified vs. computed values.  Let's say we store
pointers for many properties to the same
{"property-name", "property-value"} pair.  A specified
property value of "10%" would not make this a very
helpful data structure if that percentage resolves to
different computed values for each property sharing
this pair.  I believe the goal for us then would be
just store the computed value for each pair (meaning
many more pairs), as long as we take into account the
can't-resolve-everything-without-knowledge-of-layout
issue.

Glen


__________________________________
Do you Yahoo!?
Free Pop-Up Blocker - Get it now
http://companion.yahoo.com/

Re: String.intern() thoughts

Posted by "J.Pietschmann" <j3...@yahoo.de>.
Glen Mazza wrote:
> No need-the "gotcha" site you gave earlier did give
> some specific drawbacks under string compares:  
> 
> http://mindprod.com/jgloss/gotchas.html#COMPARISON

That's not a drawback. Interned strings allow comparision of
references:
   String a="a".intern();
   String b="a".intern();
   if(a==b) { System.out.println("equal"); }
With non-interned strings you need a.equals(b), which
may be slower, for longer strings anyway.

Same for the time needed for interning: I doubt a home
grown mechanism for this, which is exactly what Austin
proposes, would be more efficient.

The most important gotcha is that interned strings may lock
up memory forever. If interning is done only for property
names and enumeration tokens (in contrast to property values),
this may be acceptable.

J.Pietschmann



Re: String.intern() thoughts

Posted by John Austin <jw...@sympatico.ca>.
On Mon, 2003-12-01 at 02:11, Glen Mazza wrote:
> --- John Austin <jw...@sympatico.ca> wrote:
> > I mentioned yesterday that I thought I had read a

> BTW, The third drawback listed in the link above gave
> "weak references" as an alternative implmentation--I'm
> unsure what that construct is about--is this the
> vtable you were speaking of in an earlier message?

The Vtable is used to dispatch virtual functions in (some
implementations) of C++. There is one such table for each class
and it contains a pointer to each virtual function defined. Each
object holds a pointer to it's actual class vtable. This is the
mechanism used to implement polymorphism for C++ virtual functions.

I think a similar means can be used for the inheritence of
properties in FOP.

class a {
  virtual int f() { return 3; }
}

class b : a {
  virtual int f() { return 33; }
}

...

a* z = new b;

cout << "a" << z->f() << endl;

class a vTable contains address of a::f()
      b                            b::f()

instance z includes pointer to the class b vTable 
and is function b::f() is called using pointer to
object of type a.

-- 
John Austin <jw...@sympatico.ca>

Re: String.intern() thoughts and more stats

Posted by John Austin <jw...@sympatico.ca>.
On Mon, 2003-12-01 at 02:11, Glen Mazza wrote:
> --- John Austin <jw...@sympatico.ca> wrote:
> > I mentioned yesterday that I thought I had read a
> > comment
> > by Bruce Eckel suggesting that String.intern() might
> > be 
> > avoided. 
> > 
> > I could not find the reference in either the 2nd or
> > 3rd editions
> > of "Thinking In Java".
> > 
> 
> No need-the "gotcha" site you gave earlier did give
> some specific drawbacks under string compares:  
> 
> http://mindprod.com/jgloss/gotchas.html#COMPARISON
> 
> BTW, The third drawback listed in the link above gave
> "weak references" as an alternative implmentation--I'm
> unsure what that construct is about--is this the
> vtable you were speaking of in an earlier message?
> 
> Also, another question for my comprehension here--the
> "canonical mappings" you have been referring to in
> this thread--is this the same thing as the property
> enumerations that alt-design uses?  I'm unsure of the
> difference between the two.

I started using the term here after re-reading parts of Eckel's
Thinking in Java. I think the CM I refer to and the alt-design
implementation are almost to the same thing. 

>>From the on-line version of TIJ(3rd ed) the following excerpt:

TIJ313.htm:
Weak references are for implementing canonicalizing mappings where
instances of objects can be simultaneously used in multiple places in a
program, to save storage - that do not prevent their keys (or values)
from being reclaimed. 

Don't be mislead to the red herring of 'weak references'. I am
arguing for the "cache of unique objects" not for this GC technique.

After I started using a large .FO file to provide statistics, I
realized that we can use the same technique for larger non-string
objects. 

To this end, I have some more statistics.

In the sample FO file (DocBook: The Definitive Guide), there are
285,223 tags but there are only 18,419 unique property lists.
(There may be fewer, my perl stats program treats different
orderings of the same attributes as different lists)

The program prints out property lists which occur more than 100 times. I
prefix each list with tag names to distinguish empty lists by tag type.
That increases the number of lists by only 15 or so.



Number of Elements by tree level:
level=1 count=1
level=2 count=473
level=3 count=5242
level=4 count=5480
level=5 count=7129
level=6 count=26231
level=7 count=22475
level=8 count=36447
level=9 count=62288
level=10 count=38536
level=11 count=30486
level=12 count=23641
level=13 count=23190
level=14 count=2023
level=15 count=771
level=16 count=701
level=17 count=109


Element frequencies:
a 24
fo:basic-link 5225
fo:block 112142
fo:conditional-page-master-reference 48
fo:external-graphic 1097
fo:flow 472
fo:footnote 22
fo:footnote-body 22
fo:inline 62792
fo:layout-master-set 1
fo:leader 1764
fo:list-block 279
fo:list-item 1004
fo:list-item-body 1004
fo:list-item-label 1004
fo:marker 5335
fo:page-number 1872
fo:page-number-citation 3224
fo:page-sequence 472
fo:page-sequence-master 12
fo:region-after 38
fo:region-before 38
fo:region-body 38
fo:repeatable-page-master-alternatives 12
fo:root 1
fo:simple-page-master 38
fo:static-content 4720
fo:table 6497
fo:table-body 6497
fo:table-cell 33174
fo:table-column 19225
fo:table-footer 1
fo:table-header 29
fo:table-row 15301
fo:wrapper 1799


Property List frequencies:
395	fo:basic-link internal-destination=common.attributes,
66878	fo:block 
1292	fo:block
end-indent=24pt,text-align-last=justify,last-line-end-indent=-24pt,
2119	fo:block
font-family=monospace,space-after.optimum=1em,white-space-collapse=false,text-align=start,space-before.maximum=1.2em,space-before.optimum=1em,wrap-option=no-wrap,space-before.minimum=0.8em,space-after.maximum=1.2em,linefeed-treatment=preserve,space-after.minimum=0.8em,
5082	fo:block
font-family=sans-serif,Symbol,ZapfDingbats,keep-together=always,
236	fo:block
font-family=sans-serif,Symbol,ZapfDingbats,margin-left=-4pc,keep-together=always,
439	fo:block
font-family=sans-serif,space-after.optimum=0.5em,hyphenate=false,font-weight=bold,font-size=18pt,space-after.maximum=0.6em,space-after.minimum=0.4em,keep-with-next.within-column=always,space-after=1em,
5321	fo:block
font-family=sans-serif,space-before.maximum=1.2em,font-weight=bold,space-before.optimum=1.0em,space-before.minimum=0.8em,keep-with-next.within-column=always,
3768	fo:block font-family=serif,Symbol,ZapfDingbats,margin-left=-4pc,
3533	fo:block font-size=17.28pt,
1722	fo:block font-size=20.735999999999997pt,
104	fo:block font-weight=bold,
5332	fo:block keep-with-next.within-column=always,
439	fo:block space-after=1em,
6037	fo:block
space-before.maximum=1.2em,space-before.optimum=1em,space-before.minimum=0.8em,
2558	fo:block span=none,
191	fo:block start-indent=1pc,
476	fo:block text-align=center,
472	fo:flow flow-name=xsl-region-body,
2114	fo:inline 
50373	fo:inline font-family=monospace,
175	fo:inline font-family=monospace,font-style=italic,
448	fo:inline font-family=serif,
1055	fo:inline font-style=italic,
5817	fo:inline font-weight=bold,
115	fo:inline hyphenate=false,
1292	fo:inline keep-together.within-line=always,
1292	fo:inline keep-with-next.within-line=always,
462	fo:leader color=black,leader-pattern=rule,leader-length=1in,
1292	fo:leader
keep-with-next.within-line=always,leader-pattern-width=3pt,leader-pattern=dots,leader-alignment=reference-area,
1004	fo:list-item-body start-indent=body-start(),
313	fo:list-item-label end-indent=label-end(),
671	fo:list-item-label end-indent=label-end(),text-align=start,
5335	fo:marker marker-class-name=section.head.marker,
1872	fo:page-number 
446	fo:page-sequence
hyphenation-push-character-count=2,hyphenation-character=-,xmlns:axf=http://www.antennahouse.com/names/XSL/Extensions,language=en,hyphenate=true,hyphenation-remain-character-count=2,master-reference=body,format=1,
472	fo:static-content flow-name=blank-body,
472	fo:static-content flow-name=xsl-footnote-separator,
472	fo:static-content flow-name=xsl-region-after-blank,
472	fo:static-content flow-name=xsl-region-after-even,
472	fo:static-content flow-name=xsl-region-after-first,
472	fo:static-content flow-name=xsl-region-after-odd,
472	fo:static-content flow-name=xsl-region-before-blank,
472	fo:static-content flow-name=xsl-region-before-even,
472	fo:static-content flow-name=xsl-region-before-first,
472	fo:static-content flow-name=xsl-region-before-odd,
144	fo:table
space-before.maximum=1.2em,space-before.optimum=1em,space-before.minimum=0.8em,
1887	fo:table
width=100%,table-layout=fixed,border-bottom-width=0.5pt,border-bottom-style=solid,border-bottom-color=black,
1887	fo:table
width=100%,table-layout=fixed,border-top-color=black,border-top-style=solid,border-top-width=0.5pt,
6497	fo:table-body 
756	fo:table-cell 
1848	fo:table-cell
display-align=after,text-align=center,relative-align=baseline,
1848	fo:table-cell
display-align=after,text-align=left,relative-align=baseline,
1848	fo:table-cell
display-align=after,text-align=right,relative-align=baseline,
1848	fo:table-cell
display-align=before,text-align=center,relative-align=baseline,
1848	fo:table-cell
display-align=before,text-align=left,relative-align=baseline,
1848	fo:table-cell
display-align=before,text-align=right,relative-align=baseline,
2583	fo:table-cell
padding-bottom=2pt,padding-top=2pt,border-bottom-width=0.5pt,border-bottom-style=solid,border-right-color=black,padding-right=2pt,border-right-width=0.5pt,border-right-style=solid,padding-left=2pt,border-bottom-color=black,
1026	fo:table-cell
padding-bottom=2pt,padding-top=2pt,border-bottom-width=0.5pt,border-bottom-style=solid,border-right-color=black,padding-right=2pt,text-align=center,border-right-width=0.5pt,border-right-style=solid,padding-left=2pt,border-bottom-color=black,
956	fo:table-cell
padding-bottom=2pt,padding-top=2pt,border-bottom-width=0.5pt,border-bottom-style=solid,display-align=after,border-right-color=black,padding-right=2pt,text-align=center,border-right-width=0.5pt,border-right-style=solid,padding-left=2pt,border-bottom-color=black,
1578	fo:table-cell
padding-bottom=2pt,padding-top=2pt,border-bottom-width=0.5pt,border-bottom-style=solid,display-align=before,border-right-color=black,padding-right=2pt,text-align=left,border-right-width=0.5pt,border-right-style=solid,padding-left=2pt,border-bottom-color=black,
789	fo:table-cell
padding-bottom=2pt,padding-top=2pt,border-bottom-width=0.5pt,border-bottom-style=solid,display-align=before,padding-right=2pt,text-align=left,padding-left=2pt,border-bottom-color=black,
118	fo:table-cell
padding-bottom=2pt,padding-top=2pt,border-bottom-width=0.5pt,border-bottom-style=solid,padding-right=2pt,number-columns-spanned=2,padding-left=2pt,border-bottom-color=black,
2095	fo:table-cell
padding-bottom=2pt,padding-top=2pt,border-bottom-width=0.5pt,border-bottom-style=solid,padding-right=2pt,padding-left=2pt,border-bottom-color=black,
831	fo:table-cell
padding-bottom=2pt,padding-top=2pt,border-bottom-width=0.5pt,border-bottom-style=solid,padding-right=2pt,text-align=left,number-columns-spanned=3,padding-left=2pt,border-bottom-color=black,
142	fo:table-cell
padding-bottom=2pt,padding-top=2pt,display-align=before,border-right-color=black,padding-right=2pt,text-align=left,border-right-width=0.5pt,border-right-style=solid,padding-left=2pt,
278	fo:table-cell
padding-bottom=2pt,padding-top=2pt,padding-right=2pt,number-columns-spanned=2,padding-left=2pt,
5588	fo:table-cell
padding-bottom=2pt,padding-top=2pt,padding-right=2pt,padding-left=2pt,
4904	fo:table-cell
padding-bottom=2pt,padding-top=2pt,padding-right=2pt,text-align=left,number-columns-spanned=3,padding-left=2pt,
2552	fo:table-column column-number=1,
3918	fo:table-column
column-number=1,column-width=proportional-column-width(1),
2553	fo:table-column column-number=2,
3777	fo:table-column
column-number=2,column-width=proportional-column-width(1),
2549	fo:table-column column-number=3,
3776	fo:table-column
column-number=3,column-width=proportional-column-width(1),
11527	fo:table-row 
3774	fo:table-row height=14pt,
18419 Property Lists


Properties: 526648
Tags: 285223
num_keys: 117
num_vals: 13520

-- 
John Austin <jw...@sympatico.ca>

Re: String.intern() thoughts

Posted by Glen Mazza <gr...@yahoo.com>.
--- John Austin <jw...@sympatico.ca> wrote:
> I mentioned yesterday that I thought I had read a
> comment
> by Bruce Eckel suggesting that String.intern() might
> be 
> avoided. 
> 
> I could not find the reference in either the 2nd or
> 3rd editions
> of "Thinking In Java".
> 

No need-the "gotcha" site you gave earlier did give
some specific drawbacks under string compares:  

http://mindprod.com/jgloss/gotchas.html#COMPARISON

BTW, The third drawback listed in the link above gave
"weak references" as an alternative implmentation--I'm
unsure what that construct is about--is this the
vtable you were speaking of in an earlier message?

Also, another question for my comprehension here--the
"canonical mappings" you have been referring to in
this thread--is this the same thing as the property
enumerations that alt-design uses?  I'm unsure of the
difference between the two.

Thanks,
Glen


__________________________________
Do you Yahoo!?
Free Pop-Up Blocker - Get it now
http://companion.yahoo.com/