You are viewing a plain text version of this content. The canonical link for it is here.
Posted to fop-dev@xmlgraphics.apache.org by Jeremias Maerki <de...@greenmail.ch> on 2005/06/26 18:05:36 UTC

Re: svn commit: r201864 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop/fo: ./ flow/ pagination/ pagination/bookmarks/

I don't think the discussion was finished when you did this, Glen.

Anyway, I've looked up our use of String.intern() in our sources and the
only use is with namespace URIs (see FOTreeBuilder). I believe the
intern() call there doesn't help a lot here concerning memory
consumption. If the XML parser creates a new String instance for every
SAX method call and for the namespace URI, then this is a thing we can't
change. We only evaluate the namespace URI to determine the routing of
the received elements. Since we implicitely know the namespace URI of
each element we don't need to save this information in our FO tree. It
doesn't take up any memory. We are only using the String instances given
to us by the XML parser. So I believe Nils proposed change doesn't have
any negative effects, especially since we don't seem to have pursued a
more widespread use of String.intern() back when it was discussed in
December 2003 (because it was seen as potentially problematic?). I hope
I interpreted this right.

On 26.06.2005 17:13:40 gmazza wrote:
> Author: gmazza
> Date: Sun Jun 26 08:13:38 2005
> New Revision: 201864
> 
> URL: http://svn.apache.org/viewcvs?rev=201864&view=rev
> Log:
> Switch to .equals() instead of == for String compares of namespaces.  Thanks to Nils Meier for the suggestion.



Jeremias Maerki


Re: svn commit: r201864 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop/fo: ./ flow/ pagination/ pagination/bookmarks/

Posted by Jeremias Maerki <de...@greenmail.ch>.
Sorry, but I have to take a step back here and withdraw from this
discussion. This nit is taking too much of my attention. It works, so
I'll leave it be, although I'm not fully comfortable with the presence
of the intern() calls. It works just fine locally without them and I
can't imagine a big speed difference compared to the rest of the layout
process. If anyone wants to take action, feel free.

On 03.07.2005 21:04:09 Andreas L. Delmelle wrote:
> > -----Original Message-----
> > From: Jeremias Maerki [mailto:dev.jeremias@greenmail.ch]
> >
> > Thanks to Finn and Andreas for looking at this. Given what I see here
> > I'd say the two intern() calls in FOTreeBuilder would better be removed.
> > The Set the namespaces are stored in works with equals() anyway, so I
> > don't see the point of interning the Strings.
> 
> Not exactly... Maybe they had better be removed, but let's make sure it
> isn't for the wrong reasons.
> 
> > Or do I miss anything here?
> 
> Could be. Ultimately, apart from the string's lengths and the number of
> identical copies that are going to be alive at a given point, this would
> depend on:
> a) the number of times the relevant portions of code --addElementMapping() +
> conditional in findFOMaker()-- are actually executed
> b) how many times those particular strings --URIs-- are going to be compared
> to other potentially interned strings: see my addition of measurement
> results with only one of the strings interned
> 
> To repaint the full picture:
> 
> Writing
> 
> static final String s1 = "some-string-value";
> 
> is the same as
> 
> static final String s1 = "some-string-value".intern();
> 
> All strings that are subsequently assigned the value of s1, will be
> reference strings pointing to the same canonical string.
> 
> String s2 = s1;
> 
> is exactly the same as:
> 
> String s2 = s1.intern();
> 
> (so: (s1 == s2) == true)
> 
> which is precisely why the following is considered Bad
> (unless you need a really long string for a very brief moment, just once)
> 
> String s3 = new String("some-string-value");
> 
> (so: (s1 == s3) == false; s1.equals(s3) == true)
> 
> Effects similar to those of the latter statement --different String
> instances with the same string value-- are inevitable when the Strings
> originate from a file or database, or are built at runtime using
> StringBuffer.toString() --anywhere the value isn't known at compile-time.
> Hence the option of intern() to allow the compiler to optimize the bytecode.
> Optimizations among which you'll find: using bytecode for reference
> comparison, unless when explicitly asked not to do so (by explicitly using
> only equals()).
> One thing I noticed was that, once both strings to be compared were
> guaranteed to be interned at compile-time it didn't even matter anymore
> whether or not the values were the same, so '== || equals()' gave exactly
> the same results as plain '=='. Apparently, the compiler could figure out
> that in this situation, equals() would never really be needed, or would lead
> to the same results anyway.
> Strange though, that it optimizes only partly when both strings are assigned
> the same hard-coded literal --in that case, when the string values are
> different, the results of || would indicate the equals() side *is* evaluated
> (maybe even the only side evaluated, because at compile-time the strings are
> known to be different, but given the source-code, the compiler somehow
> cannot exclude that they are going to be different at run-time...?)
> 
> String.intern() should indeed be used with care. It's not a good idea to
> intern a string at random, but if that happens only in a relatively small
> number of situations, then it won't do much harm. If the call to intern() is
> going to be made many times, you have to take into account that it
> ultimately maps to a native (JNI) method.
> 
> To be absolutely sure, it would probably be wise to check for a threshold:
> i.e. at what point does the overhead of interning really become a
> drawback --we already know from the measurements that it's still worthwhile
> to intern() once, if the number of subsequent comparisons is sufficiently
> large (10^8).
> 
> AFAICT, the preferred option would be to introduce a layer --a pool of
> interned strings-- in between, where you go:
> 
> if( !contains( string ) ) {
>   add( string.intern() );
> }
> ...
> return get( string );
> 
> So, the use of equals() (via: contains()) on separate instances remains
> limited.
> The local table is guaranteed to return a reference string every time, but
> interning happens only the first time a given string value is encountered at
> run-time. Once the string is added to the local table, the call to intern()
> is avoided altogether and replaced with a faster get() --another drain
> caused by intern() is precisely the lookup in a much larger global table to
> check if an instance with that value already exists.
> The only thing to remain aware of is that, by implementing such a map, one
> might end up holding references to some string-values that are used only
> once, keeping them from being garbage-collected.
> 
> The real fun begins when you create Strings as substrings of interned
> Strings...
> You could build one large string containing, say all possible names in a
> document. If you interned this string only once, and used String.substring()
> to create individual names later on, then from the POV of the compiler, all
> of them would be pointers into different parts of one and the same string,
> each of the names and all of their copies taking up the space of an int no
> matter how long they actually are. Even without knowing the exact value of
> the canonical string at compile-time, the compiler still will be able to
> generate much more efficient code in many places, at the 'cost' of only one
> intern() at run-time.
> 
> 
> 
> Hope any of this is useful...
> 
> Cheers,
> 
> Andreas



Jeremias Maerki


RE: svn commit: r201864 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop/fo: ./ flow/ pagination/ pagination/bookmarks/

Posted by "Andreas L. Delmelle" <a_...@pandora.be>.
> -----Original Message-----
> From: Jeremias Maerki [mailto:dev.jeremias@greenmail.ch]
>
> Thanks to Finn and Andreas for looking at this. Given what I see here
> I'd say the two intern() calls in FOTreeBuilder would better be removed.
> The Set the namespaces are stored in works with equals() anyway, so I
> don't see the point of interning the Strings.

Not exactly... Maybe they had better be removed, but let's make sure it
isn't for the wrong reasons.

> Or do I miss anything here?

Could be. Ultimately, apart from the string's lengths and the number of
identical copies that are going to be alive at a given point, this would
depend on:
a) the number of times the relevant portions of code --addElementMapping() +
conditional in findFOMaker()-- are actually executed
b) how many times those particular strings --URIs-- are going to be compared
to other potentially interned strings: see my addition of measurement
results with only one of the strings interned

To repaint the full picture:

Writing

static final String s1 = "some-string-value";

is the same as

static final String s1 = "some-string-value".intern();

All strings that are subsequently assigned the value of s1, will be
reference strings pointing to the same canonical string.

String s2 = s1;

is exactly the same as:

String s2 = s1.intern();

(so: (s1 == s2) == true)

which is precisely why the following is considered Bad
(unless you need a really long string for a very brief moment, just once)

String s3 = new String("some-string-value");

(so: (s1 == s3) == false; s1.equals(s3) == true)

Effects similar to those of the latter statement --different String
instances with the same string value-- are inevitable when the Strings
originate from a file or database, or are built at runtime using
StringBuffer.toString() --anywhere the value isn't known at compile-time.
Hence the option of intern() to allow the compiler to optimize the bytecode.
Optimizations among which you'll find: using bytecode for reference
comparison, unless when explicitly asked not to do so (by explicitly using
only equals()).
One thing I noticed was that, once both strings to be compared were
guaranteed to be interned at compile-time it didn't even matter anymore
whether or not the values were the same, so '== || equals()' gave exactly
the same results as plain '=='. Apparently, the compiler could figure out
that in this situation, equals() would never really be needed, or would lead
to the same results anyway.
Strange though, that it optimizes only partly when both strings are assigned
the same hard-coded literal --in that case, when the string values are
different, the results of || would indicate the equals() side *is* evaluated
(maybe even the only side evaluated, because at compile-time the strings are
known to be different, but given the source-code, the compiler somehow
cannot exclude that they are going to be different at run-time...?)

String.intern() should indeed be used with care. It's not a good idea to
intern a string at random, but if that happens only in a relatively small
number of situations, then it won't do much harm. If the call to intern() is
going to be made many times, you have to take into account that it
ultimately maps to a native (JNI) method.

To be absolutely sure, it would probably be wise to check for a threshold:
i.e. at what point does the overhead of interning really become a
drawback --we already know from the measurements that it's still worthwhile
to intern() once, if the number of subsequent comparisons is sufficiently
large (10^8).

AFAICT, the preferred option would be to introduce a layer --a pool of
interned strings-- in between, where you go:

if( !contains( string ) ) {
  add( string.intern() );
}
...
return get( string );

So, the use of equals() (via: contains()) on separate instances remains
limited.
The local table is guaranteed to return a reference string every time, but
interning happens only the first time a given string value is encountered at
run-time. Once the string is added to the local table, the call to intern()
is avoided altogether and replaced with a faster get() --another drain
caused by intern() is precisely the lookup in a much larger global table to
check if an instance with that value already exists.
The only thing to remain aware of is that, by implementing such a map, one
might end up holding references to some string-values that are used only
once, keeping them from being garbage-collected.

The real fun begins when you create Strings as substrings of interned
Strings...
You could build one large string containing, say all possible names in a
document. If you interned this string only once, and used String.substring()
to create individual names later on, then from the POV of the compiler, all
of them would be pointers into different parts of one and the same string,
each of the names and all of their copies taking up the space of an int no
matter how long they actually are. Even without knowing the exact value of
the canonical string at compile-time, the compiler still will be able to
generate much more efficient code in many places, at the 'cost' of only one
intern() at run-time.



Hope any of this is useful...

Cheers,

Andreas


Re: svn commit: r201864 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop/fo: ./ flow/ pagination/ pagination/bookmarks/

Posted by Jeremias Maerki <de...@greenmail.ch>.
Thanks to Finn and Andreas for looking at this. Given what I see here
I'd say the two intern() calls in FOTreeBuilder would better be removed.
The Set the namespaces are stored in works with equals() anyway, so I
don't see the point of interning the Strings. Or do I miss anything here?

On 28.06.2005 01:07:09 Andreas L. Delmelle wrote:
> > -----Original Message-----
> > From: Andreas L. Delmelle [mailto:a_l.delmelle@pandora.be]
> >
> > > -----Original Message-----
> > > From: Finn Bock [mailto:bckfnn@worldonline.dk]
> >
> > <snip />
> > >
> > > But since == *is* faster then .equals and I think we can assume that
> > > most URIs are in fact from the FO namespace we can get the benefit from
> > > both.
> > >
> > > Measured with jdk1.4.2_02 on winXP:
> > >
> > > Equal string
> > > == 141
> > > .equals 1938
> > > == || .equals 203
> >
> > Now THERE's an alternative I can live with...
> >
> > Everyone happy, including myself :-)
> 
> BTW, have tried with added tests for the case where one of the two strings
> isn't interned --create a new String(), looks stupid, but approaches the
> situation best--, and noticed a *very* poor performance there.
> 
> (measured on OS X 10.2 - Java 1.4.2)
> 
> Equal strings - both interned
> == 252
> .equals 923
> == || .equals 302
> 
> Different strings - both interned
> == 351
> .equals 1306
> == || .equals 1357
> 
> Equal strings - only one interned
> == n/a
> .equals 6143
> == || .equals 6164
> 
> Different strings - only one interned
> == n/a
> .equals 6140
> == || .equals 6168
> 
> A one time intern for the non-interned string before the loop --intern the
> non-interned string when feeding it to the functions--, leads to figures
> like the first case, both for different and equal string values.
> 
> As was to be expected, performing the interning inside the loop is a Very
> Bad idea, but that would never have been mine...
> 
> Cheers,
> 
> Andreas



Jeremias Maerki


RE: svn commit: r201864 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop/fo: ./ flow/ pagination/ pagination/bookmarks/

Posted by "Andreas L. Delmelle" <a_...@pandora.be>.
> -----Original Message-----
> From: Andreas L. Delmelle [mailto:a_l.delmelle@pandora.be]
>
> > -----Original Message-----
> > From: Finn Bock [mailto:bckfnn@worldonline.dk]
>
> <snip />
> >
> > But since == *is* faster then .equals and I think we can assume that
> > most URIs are in fact from the FO namespace we can get the benefit from
> > both.
> >
> > Measured with jdk1.4.2_02 on winXP:
> >
> > Equal string
> > == 141
> > .equals 1938
> > == || .equals 203
>
> Now THERE's an alternative I can live with...
>
> Everyone happy, including myself :-)

BTW, have tried with added tests for the case where one of the two strings
isn't interned --create a new String(), looks stupid, but approaches the
situation best--, and noticed a *very* poor performance there.

(measured on OS X 10.2 - Java 1.4.2)

Equal strings - both interned
== 252
.equals 923
== || .equals 302

Different strings - both interned
== 351
.equals 1306
== || .equals 1357

Equal strings - only one interned
== n/a
.equals 6143
== || .equals 6164

Different strings - only one interned
== n/a
.equals 6140
== || .equals 6168

A one time intern for the non-interned string before the loop --intern the
non-interned string when feeding it to the functions--, leads to figures
like the first case, both for different and equal string values.

As was to be expected, performing the interning inside the loop is a Very
Bad idea, but that would never have been mine...

Cheers,

Andreas


RE: svn commit: r201864 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop/fo: ./ flow/ pagination/ pagination/bookmarks/

Posted by "Andreas L. Delmelle" <a_...@pandora.be>.
> -----Original Message-----
> From: Finn Bock [mailto:bckfnn@worldonline.dk]

<snip />
> 
> But since == *is* faster then .equals and I think we can assume that 
> most URIs are in fact from the FO namespace we can get the benefit from 
> both.
> 
> Measured with jdk1.4.2_02 on winXP:
> 
> Equal string
> == 141
> .equals 1938
> == || .equals 203

Now THERE's an alternative I can live with...

Everyone happy, including myself :-)

Cheers,

Andreas

Re: svn commit: r201864 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop/fo: ./ flow/ pagination/ pagination/bookmarks/

Posted by Finn Bock <bc...@worldonline.dk>.
> Jeremias Maerki wrote:
> 
>> I don't think the discussion was finished when you did this, Glen.

[Glen Mazza]

> Oh no, it wasn't--but we know the status quo *wasn't* acceptable, and 
> that the performance argument was no longer valid because of the 
> research you did on .equals().  

Not so fast. Measurements are always better than guesses. And there *is* 
a difference in performance.

We should not intern strings, unless we have very special reasons, 
better reasons than performance. Unless SAX promises that the URI 
strings are interned, then we can't depend on receiving interned strings.

But since == *is* faster then .equals and I think we can assume that 
most URIs are in fact from the FO namespace we can get the benefit from 
both.

Measured with jdk1.4.2_02 on winXP:

Equal string
== 141
.equals 1938
== || .equals 203

Different string
== 140
.equals 1844
== || .equals 1891


regards,
finn




public class c1 {
     public static void main(String[] args) {
         String s1 = "http://xml.org/sax/features/string-interning";
         String s2 = "http://xml.org/sax/features/string-interning";
         String s3 = "http://xml.org/sax/features/namespace";
         long before;
         long after;

         System.out.println("Equal string");
         cmp(s1, s2);
         cmp(s1, s2);
         before = System.currentTimeMillis();
         cmp(s1, s2);
         after = System.currentTimeMillis();
         System.out.println("== " + (after - before));

         equals(s1, s2);
         equals(s1, s2);
         before = System.currentTimeMillis();
         equals(s1, s2);
         after = System.currentTimeMillis();
         System.out.println(".equals " + (after - before));

         cmpequals(s1, s2);
         cmpequals(s1, s2);
         before = System.currentTimeMillis();
         cmpequals(s1, s2);
         after = System.currentTimeMillis();
         System.out.println("== || .equals " + (after - before));

         System.out.println("Different string");
         cmp(s1, s3);
         cmp(s1, s3);
         before = System.currentTimeMillis();
         cmp(s1, s3);
         after = System.currentTimeMillis();
         System.out.println("== " + (after - before));

         equals(s1, s3);
         equals(s1, s3);
         before = System.currentTimeMillis();
         equals(s1, s3);
         after = System.currentTimeMillis();
         System.out.println(".equals " + (after - before));

         cmpequals(s1, s3);
         cmpequals(s1, s3);
         before = System.currentTimeMillis();
         cmpequals(s1, s3);
         after = System.currentTimeMillis();
         System.out.println("== || .equals " + (after - before));
     }

     public static void cmp(String s1, String s2) {
         for (int i = 0; i < 100000000; i++) {
             if (s1 == s2) {
                 continue;
             }
         }
     }

     public static void equals(String s1, String s2) {
         for (int i = 0; i < 100000000; i++) {
             if (s1.equals(s2)) {
                 continue;
             }
         }
     }

     public static void cmpequals(String s1, String s2) {
         for (int i = 0; i < 100000000; i++) {
             if (s1 == s2 || s1.equals(s2)) {
                 continue;
             }
         }
     }

}

Re: svn commit: r201864 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop/fo: ./ flow/ pagination/ pagination/bookmarks/

Posted by Glen Mazza <gm...@apache.org>.
Andreas L. Delmelle wrote:

>The status quo wasn't acceptable to someone that is developing an
>application embedding FOP, but from where we were standing, it was perfectly
>acceptable to leave everything as is, and have Nils change his application
>to feed interned strings to the relevant portions of our code, not?
>  
>

No--it wasn't perfectly acceptable.  Any intern() was going to be within 
*our* code, not *his*.  Under no circumstances should FOP throw a "wrong 
namespace URI" error if indeed the namespace URI is correct.  I 
shouldn't have used == in comparing strings.

The issue boiled down to:  intern() or equals() *within FOP*.  I 
switched to the latter from the previous bad code, but if the results of 
healthy disagreements within the team go to the former, than go ahead 
and change FOP to the former.  I don't care either way, but the problem 
is fixed within FOP now.

Glen


RE: svn commit: r201864 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop/fo: ./ flow/ pagination/ pagination/bookmarks/

Posted by "Andreas L. Delmelle" <a_...@pandora.be>.
> -----Original Message-----
> From: Glen Mazza [mailto:gmazza@apache.org]
>

<snip />
> Oh no, it wasn't--but we know the status quo *wasn't* acceptable, and
> that the performance argument was no longer valid because of the
> research you did on .equals().  Now you and Andreas can finish out the
> argument -- intern() or equals()? -- and if the former wins, just switch
> the code to intern().  If the latter wins, than do nothing, we're done.

It's not a question of which one 'wins' --on the FOP side. No noticeable
changes there, but I really saw no NEED whatsoever, to change it either.

The status quo wasn't acceptable to someone that is developing an
application embedding FOP, but from where we were standing, it was perfectly
acceptable to leave everything as is, and have Nils change his application
to feed interned strings to the relevant portions of our code, not?

Here it doesn't seem to matter all that much, but the main point is: we
can't just jump and change our code because *one* single user/developer has
a problem with it. Better to inform him/her on what else could be done to
avoid that problem.

If he has a problem with interning the strings, let HIM come up with
compelling arguments/figures/numbers to persuade US. At least, we'll all
learn from it.

But to make changes so quickly, based on a rough description of one
scenario? --I dunno


Cheers,

Andreas


Re: svn commit: r201864 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop/fo: ./ flow/ pagination/ pagination/bookmarks/

Posted by Glen Mazza <gm...@apache.org>.
Jeremias Maerki wrote:

>I don't think the discussion was finished when you did this, Glen.
>
>  
>

Oh no, it wasn't--but we know the status quo *wasn't* acceptable, and 
that the performance argument was no longer valid because of the 
research you did on .equals().  Now you and Andreas can finish out the 
argument -- intern() or equals()? -- and if the former wins, just switch 
the code to intern().  If the latter wins, than do nothing, we're done.  
I'm out of this discussion now.


>especially since we don't seem to have pursued a
>more widespread use of String.intern() back when it was discussed in
>December 2003 (because it was seen as potentially problematic?). I hope
>I interpreted this right.
>  
>

The December 2003 argument had mainly to do with the interning the names 
of properties and attribute values--but we've switched to faster 
enumeration constants since then.  The discussion wasn't much about the 
(much fewer) namespace URIs to handle.

Glen


RE: svn commit: r201864 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop/fo: ./ flow/ pagination/ pagination/bookmarks/

Posted by "Andreas L. Delmelle" <a_...@pandora.be>.
> -----Original Message-----
> From: Jeremias Maerki [mailto:dev.jeremias@greenmail.ch]
>

Hi,

> ... So I believe Nils proposed change doesn't have
> any negative effects, especially since we don't seem to have pursued a
> more widespread use of String.intern() back when it was discussed in
> December 2003 (because it was seen as potentially problematic?). I hope
> I interpreted this right.

That's the way I see it too. No negative effects for us, but no particular
benefits to speak of. From our side it doesn't really make a difference, but
on Nils' side it MAY prove worthwhile in the long run to intern them anyway.


Cheers,

Andreas