You are viewing a plain text version of this content. The canonical link for it is here.
Posted to fop-dev@xmlgraphics.apache.org by John Austin <jw...@sympatico.ca> on 2003/12/01 01:31:12 UTC
String.intern() thoughts
I mentioned yesterday that I thought I had read a comment
by Bruce Eckel suggesting that String.intern() might be
avoided.
I could not find the reference in either the 2nd or 3rd editions
of "Thinking In Java".
A couple of observations from some research:
1) There were some problems in Java 1.1 and before.
2) There may be problems in non-Sun implementations (KAffe...)
3) There have been discussions in the SAX2 list and other
places about using String.ntern() and I notice that
interning is a feature of SAX2 that can be turned on.
There is a lot of support for the technique and I suspect
some of the objections are of the theological type.
4) The property strings in Peter West's code start life as
string literals which are interned by the Java Language Spec.
So they are all ready present in the table.
Some of the benefit of interning can be turned on in the parser.
--
John Austin <jw...@sympatico.ca>
RE: String.intern() test and measurement
Posted by "Andreas L. Delmelle" <a_...@pandora.be>.
> -----Original Message-----
> From: Andreas L. Delmelle [mailto:a_l.delmelle@pandora.be]
>
> for(int i= cnt; --i >= cnt; )
^^^^
sorry. meant 0
cheers,
andreas
RE: String.intern() test and measurement
Posted by "Andreas L. Delmelle" <a_...@pandora.be>.
> -----Original Message-----
> From: Finn Bock [mailto:bckfnn@worldonline.dk]
>
> I can see performance slow down well before 100.000 unique string are
> interned.
>
<snip />
>
> I've attached another demo program that shows that intern'ing is slower
> than doing memory sharing with a Hashtable. This programs creates 13520
> unique strings and does 9 subsequent lookup of the same strings.
>
say... a little trick I just stumbled upon.
try changing the two for loops to :
for(int i= cnt; --i >= cnt; )
take a look at the result. FREAKY!
how much are optimizations like this taken into consideration elsewhere in
the code?
Cheers,
Andreas
> regards,
> finn
>
Re: String.intern() test and measurement
Posted by Finn Bock <bc...@worldonline.dk>.
[John Austin]
> I wondered how much chaining there has to be before performance
> gets really bad when I checked your program more closely. Your
> example program would produce external chains of length:
> 2000000/20011 ~ 100.
I can see performance slow down well before 100.000 unique string are
interned.
And remember that the number of strings that FOP explicit intern must be
added to the number of intern'ed string that the classloaders has
created and whatever all the other components that are loaded has created.
>>Ad.2: The memory sharing can implemented by using intern'ed string but
>>it is a very bad idea to do that. A normal java.util.Hashtable can be
>>used just as well for the implementation of memory sharing:
>
>
> HashMap you mean.
Well sure, but a Hashtable is slightly faster.
I've attached another demo program that shows that intern'ing is slower
than doing memory sharing with a Hashtable. This programs creates 13520
unique strings and does 9 subsequent lookup of the same strings.
regards,
finn
Re: String.intern() test and measurement
Posted by John Austin <jw...@sympatico.ca>.
On Tue, 2003-12-02 at 12:59, Finn Bock wrote:
> I'm resending this mail since it hasn't yet shown up in the archives.
> I'm sorry about any duplicates.
>
> [John Austin]
>
> > 4) Changed the handling of strings at the for-loop storing the
> > attributes received from the parser in startElement( ... )
> >
> > // Process attributes
> > for (int i=0; i<atts.getLength(); i++) {
> > DefaultMutableTreeNode attribute =
> > new DefaultMutableTreeNode(("Attribute (name = '" +
> > atts.getLocalName(i) +
> > "', value = '" +
> > atts.getValue(i) +
> > "')").intern() );
> >
> > So I intern these strings rather than storing new strings.
>
> Here you are also interning the attribute values, right?
Yes.
> Interning is best used with discretion. The real problem, which isn't
> really spelled out in the gotchas.html page, is that the interning
> algorithm is completely undefined by the java spec.
>
> F.ex, in jdk1.4 the intern table (in symbolTable.[hpp,cpp]) is a fixed
> size hashing table (size of 20011) with chaining buckets. So when the
> total number of intern'ed string grows beyond that number, the interning
> process becomes linear in time. Try the attached test program to see the
> effect.
Gasp! SUN implemented something might not scale up ?
Note that my example of last night, the SAXTreeValidator.java
results processed the 'defguide' file that has 117 unique names and
13,520 unique values. The memory saved by interning was impressive (184M
down to 87M = 97M reduction).
[My results may be artificially good as I have interned character
strings as well. I expect that the nature of DEFGUIDE includes many
repeated character strings. I shall re-run the benchmark:
Hmm .. the interning version of the program now uses 104M, a reduction
of 80M rather than the previous reduction of 97M. The number of interned
strings must have been well over the hash table size, but the difference
in CPU usage to parse that file was less than ten seconds (more for
interned character strings).]
I wondered how much chaining there has to be before performance
gets really bad when I checked your program more closely. Your
example program would produce external chains of length:
2000000/20011 ~ 100. Because the table keys are constructed, I
expect your access times are uniformly distributed and average
access times reflect that.
Your demonstration program artificially employs 2 million strings
which is not a behavior we would expect for FOP. The number of
attribute names is limited (by the XSL-FO Spec) and the number of
distinct values is limited by some probability distributions that
are definitely not Uniform. Takes a LOT of ransom-note typography
to make that many unique property values.
> I also think that two different aspects of interning is being mixed
> around in your measurements here.
>
> 1. The identity sharing.
> 2. The memory sharing.
I have not changed the programs beyond calling intern() on strings
passed to some constructors. So no identity sharing is present.
> The identity sharing can quite possible give a performance boost to the
> lookup of the attribute names.
I prefer to optimize from measurements. The FOP high runner is property
lookup. I didn't see a lot of time in String.equals(), I may look again.
Peter's alt-design all ready provides this functionality. I am chasing
parallel memory effects.
> The memory sharing can quite possible give a memory boost for duplicated
> attribute values and a performance boost during garbage collection.
I agree, esp in light of the counts I made using a Perl program and the
large file defguide.fo.
> Ad.1: If one decide that all attribute names must be interned before
> insertion and also before lookup of attributes, a special hashtable
> (identity-hashtable) can be coded that is significantly faster than a
> normal value-based hashtable. As an example of the performance boost,
> the hash calculation can be done as:
I would use that if i felt that intern() wasn't theraputic. So far, I
just intern the name and value for each attribute passed to the SAX
parser callback: startElement. This stores the interned string in the
parsed tree and lets the parser trash it's copies of strings whenever.
> int index = (System.identityHashCode(key) & 0x7fffffff) % maxindex;
>
> In addition attribute names can be compared with '==' instead of equals.
> The downside is that the lookup can only be done with intern'ed keys.
But these values would be encapsulated in a class. Good reason for
not breaking encapsulation.
> Such an identity-hash table has been used by the jython project as a
> significant performance boost in jython's attribute lookups.
I believe it.
> But, as Glenn noticed, the attribute names can also be implemented with
> enumeration tokens.
PropNames class in Peter's alt-design.
> Ad.2: The memory sharing can implemented by using intern'ed string but
> it is a very bad idea to do that. A normal java.util.Hashtable can be
> used just as well for the implementation of memory sharing:
HashMap you mean.
> public String getSharedValue(String value) {
> String ret = hashTable.get(value);
> if (ret != null)
> return ret;
> hashTable.put(value, value);
> return value;
> }
>
> The hashTable can then be cleared at any time, perhaps at the end of
> each page sequence.
I agree with the idea that we could implement a separate internment
facility easily. I am not quite convinced that it needs to be done
in an application.
This COULD give someone the idea that an object interning mechanism
could be implemented at the language level. There's an idea for a
stand-alone product ... (or a JSR?)
> regards,
> finn
>
> ______________________________________________________________________
> public class interntst {
> public static void main(String[] args) throws Exception {
> new interntst().test();
> }
>
> public void test() {
> int cnt = 2000000;
In my large test file, there are 117 property names and 13,520 property
values. A lot less than the hash table size of 20,011.
> String[] arr = new String[cnt];
> long now = System.currentTimeMillis();
> for (int i = 0; i < cnt; i++) {
> if ((i % 1000) == 0) {
> System.out.println(i + " " + (System.currentTimeMillis() - now));
> now = System.currentTimeMillis();
> }
> String s = ("attribute" + i).intern();
> arr[i] = s;
> }
> }
> }
--
John Austin <jw...@sympatico.ca>
Re: String.intern() test and measurement
Posted by John Austin <jw...@sympatico.ca>.
On Tue, 2003-12-02 at 14:04, J.Pietschmann wrote:
> Finn Bock wrote:
> >> new DefaultMutableTreeNode(("Attribute (name = '" +
> >> atts.getLocalName(i) +
> >> "', value = '" +
> >> atts.getValue(i) +
> >> "')").intern() );
>
> > Here you are also interning the attribute values, right?
>
> Eh, no. The String "Attribute (name = ';name:', value = ';value:')"
^ Canadian ?
> is interned.
But I feel it's similar for the purpose of modelling the behavior in the
SAXTreeValidator.java example. The references held by the parser go away
in this model program. I ensure the strings are unique. That's where the
80M savings comes in. In FOP, many strings are created in the parser and
references to them are stored in the parsed tree. Interning these will
produce memory savings.
> > But, as Glenn noticed, the attribute names can also be implemented with
> > enumeration
>
> There are no enumerations in pre 1.5 Java. What was meant was that
> strings denoting XSLFO property enumeration tokens can be interned
> as the set is of limited and more or less fixed size, while it is
> probably not prudent to intern the complete XML attribute value
> strings.
> For example:
> text-decoration="underline overline"
> (Yes, that's valid, provided "overline" is valid).
> The possibly interned strings are "text-decoration" (property name),
> "underline" and "overline" (enumeration tokens).
> Somewhere else, the user might have put
> text-decoration="overline underline"
> Granted, given that FO source ought to be XSLT or otherwise generated,
> this isn't very likely, but still.
This is why I wrote a perl program to count from a real-world case
(defguide). I haven't concerned myself with how an interned "overline
underline" string is used, just ensured it is stored that way once.
The attribute name space is defined in the XSL-FO spec. So the number
of names is strictly limited. Peter lists 380 or so in PropNames.java.
> Another issue is how the values are stored. For example, there are only
> 8 distinct TextDecoration objects. In 0.20.5, basically every Text node
> gets its own instance.
> The weird thing is, when I hacked the PropertyManager to look up the
> actual text decoration in an array of preinstantiated objects, FOP
> run slower and took more memory. Apparently something went wrong. If
> anybody is up there to get it right, please do.
I tried testing 0.20.5 doing things that worked in SAXTReeValidator
and haven't had instant success. The benefits would disappear if the
references passed out of the parser are still held elsewhere in FOP.
Give it some time.
> Same for some other objects, BorderAndPadding and especially FontInfo
> come to mind, although there is more variation.
--
John Austin <jw...@sympatico.ca>
Re: String.intern() test and measurement
Posted by "Peter B. West" <pb...@powerup.com.au>.
J.Pietschmann wrote:
> John Austin wrote:
>
>> A high runner in FOP 0.20.5 is: PropertyList.findProperty().
>> It calls other functions in org.apache.fop.fo that consume
>> significant CPU resources. In one example it called itself
>> recursively to a (depth of 10)
>
>
> Without taking a closer look at the code, I suspect it tries
> to find inherited values. One approach to cope with this is
> to resolve inherited and default values immediately during
> FO construction. Because the full property set is stored,
> in contrast to only specified properties now, only the
> parent has to be looked up. This comes, of course at the
> price of consuming more memory, and there are functions like
> frome-nearest-specified-value() which require specified
> properties to be marked. If someone can come up with a space
> efficient storage, this may be a solution.
You could also look at the way alt.design tries for the best of both
worlds. See fop.fo.FONode.java in the alt.design tree; notably
makeSparsePropsSet()
BitSet specifiedProps
PropertyValue[] propertySet
PropertyValue[] sparsePropsSet
final int[] sparsePropsMap
final int[] sparseIndices
Peter
--
Peter B. West <http://www.powerup.com.au/~pbwest/resume.html
Re: String.intern() test and measurement
Posted by "J.Pietschmann" <j3...@yahoo.de>.
John Austin wrote:
> A high runner in FOP 0.20.5 is: PropertyList.findProperty().
> It calls other functions in org.apache.fop.fo that consume
> significant CPU resources. In one example it called itself
> recursively to a (depth of 10)
Without taking a closer look at the code, I suspect it tries
to find inherited values. One approach to cope with this is
to resolve inherited and default values immediately during
FO construction. Because the full property set is stored,
in contrast to only specified properties now, only the
parent has to be looked up. This comes, of course at the
price of consuming more memory, and there are functions like
frome-nearest-specified-value() which require specified
properties to be marked. If someone can come up with a space
efficient storage, this may be a solution.
J.Pietschmann
Re: String.intern() test and measurement
Posted by John Austin <jw...@sympatico.ca>.
On Tue, 2003-12-02 at 16:43, Glen Mazza wrote:
> --- "J.Pietschmann" <j3...@yahoo.de> wrote:
> > > But, as Glenn noticed, the attribute names can
> > also be implemented with
> > > enumeration
> >
> > There are no enumerations in pre 1.5 Java. What was
> > meant was that
> > strings denoting XSLFO property enumeration tokens
> > can be interned
> > as the set is of limited and more or less fixed
> > size,
>
> No I was actually thinking static final variables, my
> reference to "enumerations" was in a generic sense:
>
> public static final int PROPA = 1;
> public static final int PROPB = 2;
> public static final int PROPC = 3;
>
> If it is only the property names we were planning on
> interning, then I thought static final variables would
> be faster/more efficient instead.
>
> Glen
Given that there are between 249 and 380 names and they
exist in both integer and String format, there isn't a lot
that we can recover here.
If we are after better performance, we must measure to find the
'high runner' and tune from there. To 'design-in' stuff that we
think will be fast is often unproductive. This is why I have been
using JMP to measure performance of FOP and some sample programs.
A high runner in FOP 0.20.5 is: PropertyList.findProperty().
It calls other functions in org.apache.fop.fo that consume
significant CPU resources. In one example it called itself
recursively to a (depth of 10)
One of the reasons I am playing with the SAXTreeValidator program
is as a simplified test bed for the Property implementation. I want to
be able to plug in a new Property implementation and test
it independantly of the rest of FOP.
--
John Austin <jw...@sympatico.ca>
Re: String.intern() test and measurement
Posted by Finn Bock <bc...@worldonline.dk>.
[Glen Mazza]
> No I was actually thinking static final variables, my
> reference to "enumerations" was in a generic sense:
>
> public static final int PROPA = 1;
> public static final int PROPB = 2;
> public static final int PROPC = 3;
That was also what I though you meant.
Please accept my apology for misspelling your name.
regards,
finn
Re: String.intern() test and measurement
Posted by Glen Mazza <gr...@yahoo.com>.
--- "J.Pietschmann" <j3...@yahoo.de> wrote:
> > But, as Glenn noticed, the attribute names can
> also be implemented with
> > enumeration
>
> There are no enumerations in pre 1.5 Java. What was
> meant was that
> strings denoting XSLFO property enumeration tokens
> can be interned
> as the set is of limited and more or less fixed
> size,
No I was actually thinking static final variables, my
reference to "enumerations" was in a generic sense:
public static final int PROPA = 1;
public static final int PROPB = 2;
public static final int PROPC = 3;
If it is only the property names we were planning on
interning, then I thought static final variables would
be faster/more efficient instead.
Glen
__________________________________
Do you Yahoo!?
Free Pop-Up Blocker - Get it now
http://companion.yahoo.com/
Re: String.intern() test and measurement
Posted by "J.Pietschmann" <j3...@yahoo.de>.
Finn Bock wrote:
>> new DefaultMutableTreeNode(("Attribute (name = '" +
>> atts.getLocalName(i) +
>> "', value = '" +
>> atts.getValue(i) +
>> "')").intern() );
> Here you are also interning the attribute values, right?
Eh, no. The String "Attribute (name = ';name:', value = ';value:')"
is interned.
> But, as Glenn noticed, the attribute names can also be implemented with
> enumeration
There are no enumerations in pre 1.5 Java. What was meant was that
strings denoting XSLFO property enumeration tokens can be interned
as the set is of limited and more or less fixed size, while it is
probably not prudent to intern the complete XML attribute value
strings.
For example:
text-decoration="underline overline"
(Yes, that's valid, provided "overline" is valid).
The possibly interned strings are "text-decoration" (property name),
"underline" and "overline" (enumeration tokens).
Somewhere else, the user might have put
text-decoration="overline underline"
Granted, given that FO source ought to be XSLT or otherwise generated,
this isn't very likely, but still.
Another issue is how the values are stored. For example, there are only
8 distinct TextDecoration objects. In 0.20.5, basically every Text node
gets its own instance.
The weird thing is, when I hacked the PropertyManager to look up the
actual text decoration in an array of preinstantiated objects, FOP
run slower and took more memory. Apparently something went wrong. If
anybody is up there to get it right, please do.
Same for some other objects, BorderAndPadding and especially FontInfo
come to mind, although there is more variation.
J.Pietschmann
Re: String.intern() test and measurement
Posted by Finn Bock <bc...@worldonline.dk>.
I'm resending this mail since it hasn't yet shown up in the archives.
I'm sorry about any duplicates.
[John Austin]
> 4) Changed the handling of strings at the for-loop storing the
> attributes received from the parser in startElement( ... )
>
> // Process attributes
> for (int i=0; i<atts.getLength(); i++) {
> DefaultMutableTreeNode attribute =
> new DefaultMutableTreeNode(("Attribute (name = '" +
> atts.getLocalName(i) +
> "', value = '" +
> atts.getValue(i) +
> "')").intern() );
>
> So I intern these strings rather than storing new strings.
Here you are also interning the attribute values, right?
Interning is best used with discretion. The real problem, which isn't
really spelled out in the gotchas.html page, is that the interning
algorithm is completely undefined by the java spec.
F.ex, in jdk1.4 the intern table (in symbolTable.[hpp,cpp]) is a fixed
size hashing table (size of 20011) with chaining buckets. So when the
total number of intern'ed string grows beyond that number, the interning
process becomes linear in time. Try the attached test program to see the
effect.
I also think that two different aspects of interning is being mixed
around in your measurements here.
1. The identity sharing.
2. The memory sharing.
The identity sharing can quite possible give a performance boost to the
lookup of the attribute names.
The memory sharing can quite possible give a memory boost for duplicated
attribute values and a performance boost during garbage collection.
Ad.1: If one decide that all attribute names must be interned before
insertion and also before lookup of attributes, a special hashtable
(identity-hashtable) can be coded that is significantly faster than a
normal value-based hashtable. As an example of the performance boost,
the hash calculation can be done as:
int index = (System.identityHashCode(key) & 0x7fffffff) % maxindex;
In addition attribute names can be compared with '==' instead of equals.
The downside is that the lookup can only be done with intern'ed keys.
Such an identity-hash table has been used by the jython project as a
significant performance boost in jython's attribute lookups.
But, as Glenn noticed, the attribute names can also be implemented with
enumeration tokens.
Ad.2: The memory sharing can implemented by using intern'ed string but
it is a very bad idea to do that. A normal java.util.Hashtable can be
used just as well for the implementation of memory sharing:
public String getSharedValue(String value) {
String ret = hashTable.get(value);
if (ret != null)
return ret;
hashTable.put(value, value);
return value;
}
The hashTable can then be cleared at any time, perhaps at the end of
each page sequence.
regards,
finn
String.intern() test and measurement
Posted by John Austin <jw...@sympatico.ca>.
I decided to find a demonstration program that works
similar enough to FOP that I could try the String.intern()
technique.
1) SAXTreeValidator.java from Chapter 3 of Brett McLaughlin's
"XML and Java" the online copy of the example is 2nd Ed.
2) Fed this program various fragments of .fo files I have
accumulated lately.
3) There was one line I had to change in the program
Line 355: if (attPrefix == null || attPrefix.equals("")) {
had to add test for null as the prog threw NPE.
4) Changed the handling of strings at the for-loop storing the
attributes received from the parser in startElement( ... )
// Process attributes
for (int i=0; i<atts.getLength(); i++) {
DefaultMutableTreeNode attribute =
new DefaultMutableTreeNode(("Attribute (name = '" +
atts.getLocalName(i) +
"', value = '" +
atts.getValue(i) +
"')").intern() );
So I intern these strings rather than storing new strings.
5) Tested this program with this particular call to 'intern'
enabled and disabled. With my 'defguide.fo' file from
styling "DocBook: The Definitive Guide"
The results:
interned attributes: 87M reported by 'top' in column RSS
non-interned strings 184M reported by 'top'
6) Speed was about the same ('time' reported 27 and 30 second 'user')
7) I ran a similar test in the JMP profiler with a much smaller
.fo file. AT the end of execution I determined that the heap
was about 4.8M for the non-interned code and 2.8M for the interned.
Conclusion: Interning strings in FOP will help. I believe the code
to change is in FOTree.java.
I will code up an example and test it in a day or so.
--
John Austin <jw...@sympatico.ca>
Re: String.intern() thoughts
Posted by Glen Mazza <gr...@yahoo.com>.
Thanks for the explanation.
Glen
--- "Peter B. West" <pb...@powerup.com.au> wrote:
> Glen Mazza wrote:
> ...
> >
> > I think the next thing to consider is the storage
> of
> > specified vs. computed values. Let's say we store
> > pointers for many properties to the same
> > {"property-name", "property-value"} pair. A
> specified
> > property value of "10%" would not make this a very
> > helpful data structure if that percentage resolves
> to
> > different computed values for each property
> sharing
> > this pair. I believe the goal for us then would
> be
> > just store the computed value for each pair
> (meaning
> > many more pairs), as long as we take into account
> the
> >
> can't-resolve-everything-without-knowledge-of-layout
> > issue.
>
> alt.design makes no attempt to look for
> commonalities here. It resolves
> every possible property value, and keeps a
> partly-resolved value for
> percentages. Inheritance is (almost exclusively) of
> computed values.
> <quote>
> For a given inheritable property, if that property
> is present on a
> child, then that value of the property is used for
> that child (and its
> descendants until explicitly re-set in a lower
> descendant); otherwise,
> the specified value of that property on the child is
> the computed value
> of that property on the parent formatting object.
> </quote> 5.1.4 Inheritance
>
> There is an exception that comes to mind. For
> "line-height" a value may
> be specified which, although not specified as a
> percentage, is a factor
> by which the font-size (from memory) is multiplied.
> When such a value
> is inherited, it is the factor, not the computed
> value.
>
> In general, the computed value of a percentage is
> inherited. That still
> leaves a problem, because the computed value is
> unknown. In alt.design,
> what is effectively a link back to the unresolved
> property is specified
> as the value. When the parent property is resolved,
> that resolved value
> is then available to the inheriting descendants.
>
> Yet-to-be-implemented is the handling of expressions
> involving
> percentages, as mentioned in an earlier post.
>
> In essence, alt.design attempts to resolve every
> property to its
> computed value, and store the result only on nodes
> to which the property
> applies. There is no attempt to reduce storage by
> procedures like
> interning strings. Note, though, that the process
> of resolving
> properties does eliminate most strings. My
> objective was that when
> areas were being laid out, the relevant properties
> would all be directly
> available.
>
> Peter
> --
> Peter B. West
> <http://www.powerup.com.au/~pbwest/resume.html>
>
__________________________________
Do you Yahoo!?
Free Pop-Up Blocker - Get it now
http://companion.yahoo.com/
Re: String.intern() thoughts
Posted by "Peter B. West" <pb...@powerup.com.au>.
Glen Mazza wrote:
...
>
> I think the next thing to consider is the storage of
> specified vs. computed values. Let's say we store
> pointers for many properties to the same
> {"property-name", "property-value"} pair. A specified
> property value of "10%" would not make this a very
> helpful data structure if that percentage resolves to
> different computed values for each property sharing
> this pair. I believe the goal for us then would be
> just store the computed value for each pair (meaning
> many more pairs), as long as we take into account the
> can't-resolve-everything-without-knowledge-of-layout
> issue.
alt.design makes no attempt to look for commonalities here. It resolves
every possible property value, and keeps a partly-resolved value for
percentages. Inheritance is (almost exclusively) of computed values.
<quote>
For a given inheritable property, if that property is present on a
child, then that value of the property is used for that child (and its
descendants until explicitly re-set in a lower descendant); otherwise,
the specified value of that property on the child is the computed value
of that property on the parent formatting object.
</quote> 5.1.4 Inheritance
There is an exception that comes to mind. For "line-height" a value may
be specified which, although not specified as a percentage, is a factor
by which the font-size (from memory) is multiplied. When such a value
is inherited, it is the factor, not the computed value.
In general, the computed value of a percentage is inherited. That still
leaves a problem, because the computed value is unknown. In alt.design,
what is effectively a link back to the unresolved property is specified
as the value. When the parent property is resolved, that resolved value
is then available to the inheriting descendants.
Yet-to-be-implemented is the handling of expressions involving
percentages, as mentioned in an earlier post.
In essence, alt.design attempts to resolve every property to its
computed value, and store the result only on nodes to which the property
applies. There is no attempt to reduce storage by procedures like
interning strings. Note, though, that the process of resolving
properties does eliminate most strings. My objective was that when
areas were being laid out, the relevant properties would all be directly
available.
Peter
--
Peter B. West <http://www.powerup.com.au/~pbwest/resume.html>
Re: String.intern() thoughts
Posted by Glen Mazza <gr...@yahoo.com>.
--- "J.Pietschmann" <j3...@yahoo.de> wrote:
> Glen Mazza wrote:
> > No need-the "gotcha" site you gave earlier did
> give
> > some specific drawbacks under string compares:
> >
> > http://mindprod.com/jgloss/gotchas.html#COMPARISON
>
> That's not a drawback. Interned strings allow
> comparision of
> references:
> String a="a".intern();
> String b="a".intern();
> if(a==b) { System.out.println("equal"); }
> With non-interned strings you need a.equals(b),
> which
> may be slower, for longer strings anyway.
>
> The most important gotcha is that interned strings
> may lock
> up memory forever.
Yes, I may have misread that--outside of the fact that
strings are not GC'ed away, the intern() function does
not look that bad (their mention of a possible 64K
limit may be an issue if we also keep property values,
however.) Still, I still don't know enough at this
stage to rule out any particular design.
> If interning is done only for
> property
> names and enumeration tokens (in contrast to
> property values),
> this may be acceptable.
>
If we just do that, I believe we're back to
enumeration tokens, which don't need intern() anyway.
I think the next thing to consider is the storage of
specified vs. computed values. Let's say we store
pointers for many properties to the same
{"property-name", "property-value"} pair. A specified
property value of "10%" would not make this a very
helpful data structure if that percentage resolves to
different computed values for each property sharing
this pair. I believe the goal for us then would be
just store the computed value for each pair (meaning
many more pairs), as long as we take into account the
can't-resolve-everything-without-knowledge-of-layout
issue.
Glen
__________________________________
Do you Yahoo!?
Free Pop-Up Blocker - Get it now
http://companion.yahoo.com/
Re: String.intern() thoughts
Posted by "J.Pietschmann" <j3...@yahoo.de>.
Glen Mazza wrote:
> No need-the "gotcha" site you gave earlier did give
> some specific drawbacks under string compares:
>
> http://mindprod.com/jgloss/gotchas.html#COMPARISON
That's not a drawback. Interned strings allow comparision of
references:
String a="a".intern();
String b="a".intern();
if(a==b) { System.out.println("equal"); }
With non-interned strings you need a.equals(b), which
may be slower, for longer strings anyway.
Same for the time needed for interning: I doubt a home
grown mechanism for this, which is exactly what Austin
proposes, would be more efficient.
The most important gotcha is that interned strings may lock
up memory forever. If interning is done only for property
names and enumeration tokens (in contrast to property values),
this may be acceptable.
J.Pietschmann
Re: String.intern() thoughts
Posted by John Austin <jw...@sympatico.ca>.
On Mon, 2003-12-01 at 02:11, Glen Mazza wrote:
> --- John Austin <jw...@sympatico.ca> wrote:
> > I mentioned yesterday that I thought I had read a
> BTW, The third drawback listed in the link above gave
> "weak references" as an alternative implmentation--I'm
> unsure what that construct is about--is this the
> vtable you were speaking of in an earlier message?
The Vtable is used to dispatch virtual functions in (some
implementations) of C++. There is one such table for each class
and it contains a pointer to each virtual function defined. Each
object holds a pointer to it's actual class vtable. This is the
mechanism used to implement polymorphism for C++ virtual functions.
I think a similar means can be used for the inheritence of
properties in FOP.
class a {
virtual int f() { return 3; }
}
class b : a {
virtual int f() { return 33; }
}
...
a* z = new b;
cout << "a" << z->f() << endl;
class a vTable contains address of a::f()
b b::f()
instance z includes pointer to the class b vTable
and is function b::f() is called using pointer to
object of type a.
--
John Austin <jw...@sympatico.ca>
Re: String.intern() thoughts and more stats
Posted by John Austin <jw...@sympatico.ca>.
On Mon, 2003-12-01 at 02:11, Glen Mazza wrote:
> --- John Austin <jw...@sympatico.ca> wrote:
> > I mentioned yesterday that I thought I had read a
> > comment
> > by Bruce Eckel suggesting that String.intern() might
> > be
> > avoided.
> >
> > I could not find the reference in either the 2nd or
> > 3rd editions
> > of "Thinking In Java".
> >
>
> No need-the "gotcha" site you gave earlier did give
> some specific drawbacks under string compares:
>
> http://mindprod.com/jgloss/gotchas.html#COMPARISON
>
> BTW, The third drawback listed in the link above gave
> "weak references" as an alternative implmentation--I'm
> unsure what that construct is about--is this the
> vtable you were speaking of in an earlier message?
>
> Also, another question for my comprehension here--the
> "canonical mappings" you have been referring to in
> this thread--is this the same thing as the property
> enumerations that alt-design uses? I'm unsure of the
> difference between the two.
I started using the term here after re-reading parts of Eckel's
Thinking in Java. I think the CM I refer to and the alt-design
implementation are almost to the same thing.
>>From the on-line version of TIJ(3rd ed) the following excerpt:
TIJ313.htm:
Weak references are for implementing canonicalizing mappings where
instances of objects can be simultaneously used in multiple places in a
program, to save storage - that do not prevent their keys (or values)
from being reclaimed.
Don't be mislead to the red herring of 'weak references'. I am
arguing for the "cache of unique objects" not for this GC technique.
After I started using a large .FO file to provide statistics, I
realized that we can use the same technique for larger non-string
objects.
To this end, I have some more statistics.
In the sample FO file (DocBook: The Definitive Guide), there are
285,223 tags but there are only 18,419 unique property lists.
(There may be fewer, my perl stats program treats different
orderings of the same attributes as different lists)
The program prints out property lists which occur more than 100 times. I
prefix each list with tag names to distinguish empty lists by tag type.
That increases the number of lists by only 15 or so.
Number of Elements by tree level:
level=1 count=1
level=2 count=473
level=3 count=5242
level=4 count=5480
level=5 count=7129
level=6 count=26231
level=7 count=22475
level=8 count=36447
level=9 count=62288
level=10 count=38536
level=11 count=30486
level=12 count=23641
level=13 count=23190
level=14 count=2023
level=15 count=771
level=16 count=701
level=17 count=109
Element frequencies:
a 24
fo:basic-link 5225
fo:block 112142
fo:conditional-page-master-reference 48
fo:external-graphic 1097
fo:flow 472
fo:footnote 22
fo:footnote-body 22
fo:inline 62792
fo:layout-master-set 1
fo:leader 1764
fo:list-block 279
fo:list-item 1004
fo:list-item-body 1004
fo:list-item-label 1004
fo:marker 5335
fo:page-number 1872
fo:page-number-citation 3224
fo:page-sequence 472
fo:page-sequence-master 12
fo:region-after 38
fo:region-before 38
fo:region-body 38
fo:repeatable-page-master-alternatives 12
fo:root 1
fo:simple-page-master 38
fo:static-content 4720
fo:table 6497
fo:table-body 6497
fo:table-cell 33174
fo:table-column 19225
fo:table-footer 1
fo:table-header 29
fo:table-row 15301
fo:wrapper 1799
Property List frequencies:
395 fo:basic-link internal-destination=common.attributes,
66878 fo:block
1292 fo:block
end-indent=24pt,text-align-last=justify,last-line-end-indent=-24pt,
2119 fo:block
font-family=monospace,space-after.optimum=1em,white-space-collapse=false,text-align=start,space-before.maximum=1.2em,space-before.optimum=1em,wrap-option=no-wrap,space-before.minimum=0.8em,space-after.maximum=1.2em,linefeed-treatment=preserve,space-after.minimum=0.8em,
5082 fo:block
font-family=sans-serif,Symbol,ZapfDingbats,keep-together=always,
236 fo:block
font-family=sans-serif,Symbol,ZapfDingbats,margin-left=-4pc,keep-together=always,
439 fo:block
font-family=sans-serif,space-after.optimum=0.5em,hyphenate=false,font-weight=bold,font-size=18pt,space-after.maximum=0.6em,space-after.minimum=0.4em,keep-with-next.within-column=always,space-after=1em,
5321 fo:block
font-family=sans-serif,space-before.maximum=1.2em,font-weight=bold,space-before.optimum=1.0em,space-before.minimum=0.8em,keep-with-next.within-column=always,
3768 fo:block font-family=serif,Symbol,ZapfDingbats,margin-left=-4pc,
3533 fo:block font-size=17.28pt,
1722 fo:block font-size=20.735999999999997pt,
104 fo:block font-weight=bold,
5332 fo:block keep-with-next.within-column=always,
439 fo:block space-after=1em,
6037 fo:block
space-before.maximum=1.2em,space-before.optimum=1em,space-before.minimum=0.8em,
2558 fo:block span=none,
191 fo:block start-indent=1pc,
476 fo:block text-align=center,
472 fo:flow flow-name=xsl-region-body,
2114 fo:inline
50373 fo:inline font-family=monospace,
175 fo:inline font-family=monospace,font-style=italic,
448 fo:inline font-family=serif,
1055 fo:inline font-style=italic,
5817 fo:inline font-weight=bold,
115 fo:inline hyphenate=false,
1292 fo:inline keep-together.within-line=always,
1292 fo:inline keep-with-next.within-line=always,
462 fo:leader color=black,leader-pattern=rule,leader-length=1in,
1292 fo:leader
keep-with-next.within-line=always,leader-pattern-width=3pt,leader-pattern=dots,leader-alignment=reference-area,
1004 fo:list-item-body start-indent=body-start(),
313 fo:list-item-label end-indent=label-end(),
671 fo:list-item-label end-indent=label-end(),text-align=start,
5335 fo:marker marker-class-name=section.head.marker,
1872 fo:page-number
446 fo:page-sequence
hyphenation-push-character-count=2,hyphenation-character=-,xmlns:axf=http://www.antennahouse.com/names/XSL/Extensions,language=en,hyphenate=true,hyphenation-remain-character-count=2,master-reference=body,format=1,
472 fo:static-content flow-name=blank-body,
472 fo:static-content flow-name=xsl-footnote-separator,
472 fo:static-content flow-name=xsl-region-after-blank,
472 fo:static-content flow-name=xsl-region-after-even,
472 fo:static-content flow-name=xsl-region-after-first,
472 fo:static-content flow-name=xsl-region-after-odd,
472 fo:static-content flow-name=xsl-region-before-blank,
472 fo:static-content flow-name=xsl-region-before-even,
472 fo:static-content flow-name=xsl-region-before-first,
472 fo:static-content flow-name=xsl-region-before-odd,
144 fo:table
space-before.maximum=1.2em,space-before.optimum=1em,space-before.minimum=0.8em,
1887 fo:table
width=100%,table-layout=fixed,border-bottom-width=0.5pt,border-bottom-style=solid,border-bottom-color=black,
1887 fo:table
width=100%,table-layout=fixed,border-top-color=black,border-top-style=solid,border-top-width=0.5pt,
6497 fo:table-body
756 fo:table-cell
1848 fo:table-cell
display-align=after,text-align=center,relative-align=baseline,
1848 fo:table-cell
display-align=after,text-align=left,relative-align=baseline,
1848 fo:table-cell
display-align=after,text-align=right,relative-align=baseline,
1848 fo:table-cell
display-align=before,text-align=center,relative-align=baseline,
1848 fo:table-cell
display-align=before,text-align=left,relative-align=baseline,
1848 fo:table-cell
display-align=before,text-align=right,relative-align=baseline,
2583 fo:table-cell
padding-bottom=2pt,padding-top=2pt,border-bottom-width=0.5pt,border-bottom-style=solid,border-right-color=black,padding-right=2pt,border-right-width=0.5pt,border-right-style=solid,padding-left=2pt,border-bottom-color=black,
1026 fo:table-cell
padding-bottom=2pt,padding-top=2pt,border-bottom-width=0.5pt,border-bottom-style=solid,border-right-color=black,padding-right=2pt,text-align=center,border-right-width=0.5pt,border-right-style=solid,padding-left=2pt,border-bottom-color=black,
956 fo:table-cell
padding-bottom=2pt,padding-top=2pt,border-bottom-width=0.5pt,border-bottom-style=solid,display-align=after,border-right-color=black,padding-right=2pt,text-align=center,border-right-width=0.5pt,border-right-style=solid,padding-left=2pt,border-bottom-color=black,
1578 fo:table-cell
padding-bottom=2pt,padding-top=2pt,border-bottom-width=0.5pt,border-bottom-style=solid,display-align=before,border-right-color=black,padding-right=2pt,text-align=left,border-right-width=0.5pt,border-right-style=solid,padding-left=2pt,border-bottom-color=black,
789 fo:table-cell
padding-bottom=2pt,padding-top=2pt,border-bottom-width=0.5pt,border-bottom-style=solid,display-align=before,padding-right=2pt,text-align=left,padding-left=2pt,border-bottom-color=black,
118 fo:table-cell
padding-bottom=2pt,padding-top=2pt,border-bottom-width=0.5pt,border-bottom-style=solid,padding-right=2pt,number-columns-spanned=2,padding-left=2pt,border-bottom-color=black,
2095 fo:table-cell
padding-bottom=2pt,padding-top=2pt,border-bottom-width=0.5pt,border-bottom-style=solid,padding-right=2pt,padding-left=2pt,border-bottom-color=black,
831 fo:table-cell
padding-bottom=2pt,padding-top=2pt,border-bottom-width=0.5pt,border-bottom-style=solid,padding-right=2pt,text-align=left,number-columns-spanned=3,padding-left=2pt,border-bottom-color=black,
142 fo:table-cell
padding-bottom=2pt,padding-top=2pt,display-align=before,border-right-color=black,padding-right=2pt,text-align=left,border-right-width=0.5pt,border-right-style=solid,padding-left=2pt,
278 fo:table-cell
padding-bottom=2pt,padding-top=2pt,padding-right=2pt,number-columns-spanned=2,padding-left=2pt,
5588 fo:table-cell
padding-bottom=2pt,padding-top=2pt,padding-right=2pt,padding-left=2pt,
4904 fo:table-cell
padding-bottom=2pt,padding-top=2pt,padding-right=2pt,text-align=left,number-columns-spanned=3,padding-left=2pt,
2552 fo:table-column column-number=1,
3918 fo:table-column
column-number=1,column-width=proportional-column-width(1),
2553 fo:table-column column-number=2,
3777 fo:table-column
column-number=2,column-width=proportional-column-width(1),
2549 fo:table-column column-number=3,
3776 fo:table-column
column-number=3,column-width=proportional-column-width(1),
11527 fo:table-row
3774 fo:table-row height=14pt,
18419 Property Lists
Properties: 526648
Tags: 285223
num_keys: 117
num_vals: 13520
--
John Austin <jw...@sympatico.ca>
Re: String.intern() thoughts
Posted by Glen Mazza <gr...@yahoo.com>.
--- John Austin <jw...@sympatico.ca> wrote:
> I mentioned yesterday that I thought I had read a
> comment
> by Bruce Eckel suggesting that String.intern() might
> be
> avoided.
>
> I could not find the reference in either the 2nd or
> 3rd editions
> of "Thinking In Java".
>
No need-the "gotcha" site you gave earlier did give
some specific drawbacks under string compares:
http://mindprod.com/jgloss/gotchas.html#COMPARISON
BTW, The third drawback listed in the link above gave
"weak references" as an alternative implmentation--I'm
unsure what that construct is about--is this the
vtable you were speaking of in an earlier message?
Also, another question for my comprehension here--the
"canonical mappings" you have been referring to in
this thread--is this the same thing as the property
enumerations that alt-design uses? I'm unsure of the
difference between the two.
Thanks,
Glen
__________________________________
Do you Yahoo!?
Free Pop-Up Blocker - Get it now
http://companion.yahoo.com/