Posted to j-dev@xerces.apache.org by Jeffrey Rodriguez <je...@hotmail.com> on 2000/07/11 11:28:52 UTC

REDOM Design discussion.

Hi Costin,

I think the idea of using Class.forName() to load modules, together with
conditional compilation, has a lot of potential for what
Arnaud has been pursuing with the DOM API.





>One thing we did a lot in tomcat is use Class.forName() to load modules,
>and conditional compilation.
>In DOM it is much more difficult to demarcate "features", but at least if
>there are separate classes you can use an interface and load the class only
>if needed.
>
>
>
>Costin
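
[A minimal sketch of the load-on-demand idea quoted above. The names
DomFeature, TraversalFeature and FeatureLoader are hypothetical and are not
Xerces or Tomcat classes. Core code refers to the optional module only by
its class name, so the class is loaded only when asked for, and a missing
module can be handled at runtime.]

```java
// Hypothetical names throughout; this is not Xerces or Tomcat API.
interface DomFeature {
    String name();
}

// An optional module. Core code never references this class directly.
class TraversalFeature implements DomFeature {
    public String name() { return "traversal"; }
}

class FeatureLoader {
    // Loads a feature by class name; returns null if the module is
    // absent, cannot be instantiated, or does not implement DomFeature.
    static DomFeature load(String className) {
        try {
            Class<?> c = Class.forName(className);
            return (DomFeature) c.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            return null;
        }
    }
}

class FeatureLoaderDemo {
    public static void main(String[] args) {
        // Assumes the classes live in the default package; otherwise the
        // fully qualified name would be needed here.
        DomFeature f = FeatureLoader.load("TraversalFeature");
        System.out.println(f == null ? "feature missing" : "loaded " + f.name());
        System.out.println(FeatureLoader.load("no.such.Feature") == null);
    }
}
```

[The class name would normally come from configuration rather than a
literal, which is what makes this a runtime decision.]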

Re: REDOM Design discussion.

Posted by Costin Manolache <co...@eng.sun.com>.

Edwin Goei wrote:

> > Ed has a very good point - it would be great if we could use SAX2 plus a few
> > extensions, or something close enough. We can then move from 2 directions -
> > moving xerces to use the new APIs ( and nothing else, no more internals !),
>
> Do you mean by "moving xerces to use the new APIs" to re-use existing Xerces
> code?  I think this needs to be considered carefully.  For example, I would
> prefer not to take the core parsing code from Xerces because the last time I
> looked at the current Xerces code, it looked like the parser was implemented
> as a state machine and was difficult to understand -- I don't remember the
> name of the class right now.  This would conflict with one of my main goals
> which is to have code that is easy to understand so it can be maintained.
> (Maybe we should start by agreeing on some goals.)

I was thinking more of the schema implementation and maybe some other
high-level features.

Once we agree on the design and API and we make it modular, I suppose
we will have anyway multiple "flavours" of parsers - we already have 3
DOM implementations, with different features, we may have 2 different
SAX parsers.

> In any case, one of the first decisions that needs to be made is what the
> parser will look like.  My preference would be to have a top-down recursive
> descent parser such as what is used in Aelfred2 and Crimson because that is
> simplest to understand.  Of course there are probably other issues here like
> what kind of events the parser needs to emit that need to be discussed.  I'm
> proposing that once general issues like this are decided then individual(s)
> can code something and let others look at it and provide feedback.

Sure - that would be the best start, but the actual implementation of
the parser ( top-down r.d. ) shouldn't matter or be visible in the API.

> > and building modules that are un-optimized or optimized for a different
> > target. It will be then a compile ( or runtime ) decision.
>
> I would prefer easiest to understand and un-optimized.

I said "modules" - we can have multiple implementations for each.

I agree, pre-optimization is bad, and even "post"-optimization may
be bad.  We need to start and have as default a set of easy to
understand and un-optimized modules, but we should be
prepared to create optimized versions where needed.

The "stable" and "most supported" code will consist only of the
clean modules, but in special cases you should be able to use
the tuned ones.

Costin

Re: REDOM Design discussion.

Posted by Edwin Goei <Ed...@eng.sun.com>.

> Ed has a very good point - it would be great if we could use SAX2 plus a few
> extensions, or something close enough. We can then move from 2 directions -
> moving xerces to use the new APIs ( and nothing else, no more internals !),

Do you mean by "moving xerces to use the new APIs" to re-use existing Xerces
code?  I think this needs to be considered carefully.  For example, I would
prefer not to take the core parsing code from Xerces because the last time I
looked at the current Xerces code, it looked like the parser was implemented
as a state machine and was difficult to understand -- I don't remember the
name of the class right now.  This would conflict with one of my main goals
which is to have code that is easy to understand so it can be maintained.
(Maybe we should start by agreeing on some goals.)

In any case, one of the first decisions that needs to be made is what the
parser will look like.  My preference would be to have a top-down recursive
descent parser such as what is used in Aelfred2 and Crimson because that is
simplest to understand.  Of course there are probably other issues here like
what kind of events the parser needs to emit that need to be discussed.  I'm
proposing that once general issues like this are decided then individual(s)
can code something and let others look at it and provide feedback.
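
[To make the "top-down recursive descent" shape concrete, here is a toy
sketch. It is not code from Aelfred2, Crimson, or Xerces; TinyParser,
TinyHandler and EventLog are made-up names, and the grammar is a tiny subset
(nested elements and text only, no attributes, entities, or comments). Each
grammar rule becomes one method, and the events go to a callback interface.]

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event callback; a real parser would more likely emit SAX2 events.
interface TinyHandler {
    void startElement(String name);
    void characters(String text);
    void endElement(String name);
}

// Records events as strings so the parse can be inspected.
class EventLog implements TinyHandler {
    final List<String> events = new ArrayList<>();
    public void startElement(String name) { events.add("start:" + name); }
    public void characters(String text)   { events.add("text:" + text); }
    public void endElement(String name)   { events.add("end:" + name); }
}

class TinyParser {
    private final String in;
    private final TinyHandler handler;
    private int pos = 0;

    TinyParser(String in, TinyHandler handler) {
        this.in = in;
        this.handler = handler;
    }

    // document ::= element
    void parseDocument() {
        parseElement();
        if (pos != in.length()) throw error("trailing content");
    }

    // element ::= '<' name '>' content '</' name '>'
    private void parseElement() {
        expect('<');
        String name = parseName();
        expect('>');
        handler.startElement(name);
        parseContent();
        expect('<');
        expect('/');
        String close = parseName();
        if (!close.equals(name)) throw error("mismatched end tag " + close);
        expect('>');
        handler.endElement(name);
    }

    // content ::= (text | element)*, stopping at the enclosing end tag
    private void parseContent() {
        while (pos < in.length()) {
            if (in.charAt(pos) != '<') {
                int start = pos;
                while (pos < in.length() && in.charAt(pos) != '<') pos++;
                handler.characters(in.substring(start, pos));
            } else if (in.startsWith("</", pos)) {
                return;
            } else {
                parseElement(); // the recursion mirrors element nesting
            }
        }
    }

    private String parseName() {
        int start = pos;
        while (pos < in.length() && Character.isLetterOrDigit(in.charAt(pos))) pos++;
        if (start == pos) throw error("name expected");
        return in.substring(start, pos);
    }

    private void expect(char c) {
        if (pos >= in.length() || in.charAt(pos) != c) throw error("expected '" + c + "'");
        pos++;
    }

    private IllegalStateException error(String msg) {
        return new IllegalStateException(msg + " at offset " + pos);
    }
}
```

[Usage: new TinyParser("<a>hi<b>x</b></a>", log).parseDocument() with an
EventLog leaves start:a, text:hi, start:b, text:x, end:b, end:a in
log.events.]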

> and building modules that are un-optimized or optimized for a different
> target. It will be then a compile ( or runtime ) decision.

I would prefer easiest to understand and un-optimized.

-Edwin

Re: REDOM Design discussion.

Posted by Edwin Goei <Ed...@eng.sun.com>.

----- Original Message -----
From: "James Duncan Davidson" <ja...@eng.sun.com>
To: <xe...@xml.apache.org>
Sent: Tuesday, July 11, 2000 1:50 PM
Subject: Re: REDOM Design discussion.


> on 7/11/00 7:46 AM, Costin Manolache at Costin.Manolache@eng.sun.com wrote:
>
> > One big issue ( IMHO) is to decide about the use of String/int/StringHolder.
> > In my experience this is a major performance factor. Using int instead
> > of String is uncomfortable for many people, and StringHolder (i.e.
> > a recyclable object) may be a good compromise or not. I'm +1 on
> > ints, but I have a feeling I'll be in minority.
>
> :) I'm -1 on going as far as ints.. I'd actually advocate using Strings up
> front and looking at how to better optimize those points that show as Hot in
> OptimizeIt (or whatever profiling tool you like) later -- but I'm solidly
> bought into the religion of avoiding premature optimizations, so I'm biased.

I agree with Duncan on using Strings directly as it makes code easiest to
understand, use, and maintain.  I think there needs to be a good enough
reason to use less understandable code.  I'm not sure this qualifies.

> I'm +0 on using something like StringHolder. As long as it's understandable,
> then I'm ok with it. :)

I don't understand the StringHolder proposal.  I'll send out an opinion
later.

-Edwin

Re: The Deferred DOM (was Re: REDOM Design discussion.)

Posted by Ryan Schmidt <ry...@cpsc.ucalgary.ca>.

Arnaud Le Hors wrote:
> > The DeferredDocumentImpl
> > class is a perfect example of this - I spent hours trying to figure out
> > what was going on with that class...very aggravating.
> 
> I agree this project is suffering from a serious lack of documentation
> on the internals. 
<snip>
> But there is no point in wasting your time like that, you should have
> asked. It's not that complicated actually.
> 
> The idea of the Deferred DOM is to lazily create the DOM structure.

Agreed, it's not that complicated. I understood the purpose of the int arrays
quickly. The hours I spent trying to figure out what was going on were spent
in the debugger, inspecting the int arrays and watching what happened at each
step of the deferred-expansion. This was the point I was trying to
get across. Having internals documentation is nice, but having methods that
would dump out the internals would be _far_ more useful, IMHO. Takes less
time to write, and easier to keep up to date (compile errors sort of force
that on you). I find that being able to watch what is going on (usually)
increases my understanding much more than reading about what is going to
happen.

Anyway, enough of that. This should probably be in a separate message...but
how would I go about timing/benchmarking something like, say, ints vs
Strings vs RecyclableStrings? I've been writing a simple 'RecyclableString'
implementation, and I'd like to know how it compares, but I'm not exactly
sure how to go about this.

-RMS

The Deferred DOM (was Re: REDOM Design discussion.)

Posted by Arnaud Le Hors <le...@us.ibm.com>.

[ message first sent on Friday which never made it to the list before ]

Ryan Schmidt wrote:
> 
> The DeferredDocumentImpl
> class is a perfect example of this - I spent hours trying to figure out
> what was going on with that class...very aggravating.

I agree this project is suffering from a serious lack of documentation
on the internals. That's what happens when you don't have enough
resources and you're constantly under pressure to provide more features.
It's not an excuse, just a fact.
But there is no point in wasting your time like that, you should have
asked. It's not that complicated actually. 

The idea of the Deferred DOM is to lazily create the DOM structure.
Instead of creating all the nodes up-front, the parser creates a more
compact structure (a set of int arrays) and only creates a shallow DOM
structure that points back to the compact structure. Then as the DOM
structure is traversed, the information from the compact structure is
fluffed up into the DOM.

Note that if you actually traverse the whole tree you don't really gain
anything. On the contrary, every node is a little bigger (due to the
reference to the original structure). On the other hand, if you only
access a few nodes, it's a big win.
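
[A rough sketch of the lazy expansion described above, under assumed names
and an assumed array layout. CompactTree and LazyNode are hypothetical, and
the real DeferredDocumentImpl arrays are organised differently. The parser
fills parallel int arrays; node objects are created only when a parent's
children are first requested.]

```java
import java.util.ArrayList;
import java.util.List;

// Compact, parser-built structure: one slot per node in parallel arrays.
// Hypothetical layout, chosen only to keep the example short.
class CompactTree {
    final String[] names;     // node name per slot
    final int[] firstChild;   // slot of the first child, or -1
    final int[] nextSibling;  // slot of the next sibling, or -1
    int materialized = 0;     // how many node objects exist so far (demo only)

    CompactTree(String[] names, int[] firstChild, int[] nextSibling) {
        this.names = names;
        this.firstChild = firstChild;
        this.nextSibling = nextSibling;
    }
}

class LazyNode {
    private final CompactTree tree;   // back-reference to the compact structure
    private final int slot;           // this node's position in the arrays
    private List<LazyNode> children;  // null until the first traversal

    LazyNode(CompactTree tree, int slot) {
        this.tree = tree;
        this.slot = slot;
        tree.materialized++;
    }

    String name() { return tree.names[slot]; }

    // Creates the child node objects on first access, then reuses them.
    List<LazyNode> children() {
        if (children == null) {
            children = new ArrayList<>();
            for (int c = tree.firstChild[slot]; c != -1; c = tree.nextSibling[c]) {
                children.add(new LazyNode(tree, c));
            }
        }
        return children;
    }
}
```

[With a five-node tree, creating the root materializes one object and asking
for its two children materializes two more; the grandchildren stay in the
int arrays until someone walks down to them. The tree and slot fields are
the per-node overhead mentioned above.]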
-- 
Arnaud  Le Hors - IBM Cupertino, XML Technology Group

Strings vs. RecyclableStrings vs int (was Re: REDOM Design discussion.)

Posted by Jay Sachs <js...@iclick.com>.

I've come to a realization that a RecyclableString has more negative
implications than just additional abstraction for developers to grok. A
RecyclableString is really a wrapper for some (slice of a) char array,
and that can be "reanchored" to a different array, position and/or
length. The semantics of such a thing are very un-String like: if I have
a reference to a RecyclableString, the contents may change out from
underneath me by some other code "reanchoring" it.  If a
RecyclableString isn't "reanchorable", then it's not really recyclable,
since we'd need to allocate them each time, and this doesn't save
anything over using Strings. 
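
[A small sketch of the hazard described above. RecyclableString here is a
hypothetical class written for illustration, not an existing
implementation; "reanchor" is the operation named in the text.]

```java
// Hypothetical sketch of the idea under discussion, not an existing class.
class RecyclableString {
    private char[] buf;
    private int offset;
    private int length;

    // Points this object at a slice of a char array without copying.
    void reanchor(char[] buf, int offset, int length) {
        this.buf = buf;
        this.offset = offset;
        this.length = length;
    }

    // Copies the current slice into an immutable String.
    @Override
    public String toString() {
        return new String(buf, offset, length);
    }
}

class ReanchorHazardDemo {
    public static void main(String[] args) {
        char[] input = "<foo><bar>".toCharArray();
        RecyclableString name = new RecyclableString();

        name.reanchor(input, 1, 3);          // currently "foo"
        RecyclableString kept = name;        // one caller keeps the reference
        String snapshot = name.toString();   // another caller copies the text

        name.reanchor(input, 6, 3);          // the parser recycles the object

        System.out.println(kept);      // prints "bar": the contents changed underneath
        System.out.println(snapshot);  // prints "foo": the String copy is stable
    }
}
```

[The only safe way to hold on to the value is to copy it into a String,
which is the allocation the recyclable object was meant to avoid.]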

Note that the int/StringPool idea does not have this downside.  I still
really think that raw ints are inappropriate and would make the internal
API way too obfuscated and open to misuse.

I strongly urge that Strings be adopted as the type of choice for
representing text, throughout the API. If performance becomes/is an
issue, use some sort of SymbolCache, which Costin listed as an
alternative in an early message during this discussion.

jay

Re: REDOM Design discussion.

Posted by Ryan Schmidt <ry...@cpsc.ucalgary.ca>.

This might not be a high priority, but it's something to consider: If you
use ints instead of Strings, inspecting variables with a debugger takes
_much_ more time. Someone needs to decide which is more important (at least
initially), speed or ease-of-use/development.

If you do go for the ints, _please_ provide some sort of utility method
that will dump out the int <-> String mappings. I know it would probably be
trivial code to write, but it's this kind of thing that hinders new
developers from contributing if it's not there. If you're going to have
complicated internal data structures (that are usually undocumented),
at least provide a way to easily inspect them. The DeferredDocumentImpl
class is a perfect example of this - I spent hours trying to figure out
what was going on with that class...very aggravating.
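
[Something along these lines is what I have in mind. This is only a sketch:
IntStringPool and its methods are made-up names and do not reflect the
actual Xerces StringPool API.]

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical int <-> String pool with a debugging dump.
class IntStringPool {
    private final List<String> byId = new ArrayList<>();
    private final Map<String, Integer> ids = new HashMap<>();

    // Returns the existing id for s, or assigns the next free one.
    int add(String s) {
        Integer id = ids.get(s);
        if (id == null) {
            id = byId.size();
            byId.add(s);
            ids.put(s, id);
        }
        return id;
    }

    // Turns an id back into its String.
    String toString(int id) {
        return byId.get(id);
    }

    // One "id -> string" line per entry, in id order.
    String dump() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < byId.size(); i++) {
            sb.append(i).append(" -> ").append(byId.get(i)).append('\n');
        }
        return sb.toString();
    }
}
```

[Calling dump() from a debugger watch or a log statement would show every
mapping at once instead of decoding ints one at a time.]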

This probably doesn't mesh well with some of the goals of
[spinnaker/Xerces2/?], especially if you are trying to minimize the code
base. But it's a huge help in getting non-core-developers to contribute
fixes, etc.

-RMS

Edwin Goei wrote:
> 
> Jay Sachs <js...@iclick.com> writes:
> 
> > By "client" I meant the other portions of what you called the "internal
> > API". Developing the software itself requires maintenance and clear
> > APIs, and there are costs to introducing an abstraction between
> > developers and Strings.
> 
> Yes, I agree with Jay here.  I think we should design the clearest
> interfaces so it will make it easiest for developers of XRI (or whatever its
> name is) to understand.  This is where all the maintenance happens.  As a
> new person looking at the internals of the current Xerces, after a while I
> could figure out that they were using int IDs but it's not obvious why or
> how to turn it back into a String.  If a String is used directly, there is
> no problem.
> 
> > I'd much prefer to see a RecyclableString, it at least gives a type that
> > indicates what it is. A bare int opens up many possibilities for
> > incorrect usage. And if the RecyclableString turns out to be not
> > working, it's much easier to do a massive search & replace to turn it
> > into an int than to do the reverse.
> 
> Similarly, using something like RecyclableString (I haven't figured out what
> you mean by this yet, BTW), is non-obvious to a generic Java programmer.  I
> think there needs to be a very good reason to add something non-obvious to a
> design and I'm still not convinced that there is yet.
> 
> It seems to me that something like what is in the xerces_j_2 branch in
> org.apache.xerces.utils.SymbolTable would handle your concern about creating
> unnecessary String objects and yet allow Xerces developers to directly use
> String objects and not some other indirect object like RecyclableString or
> int.
> 
> -Edwin
> 
> PS: I tried, but I can't seem to access the code via the CVS web interface
> http://xml.apache.org/websrc/cvsweb.cgi/xml-xerces/java/src/org/apache/xerces/utils/?only_with_tag=xerces_j_2.

-- 
Ryan Schmidt                            Computer Science UG
ryansc@cpsc.ucalgary.ca                 MILOS Developer
http://www.cpsc.ucalgary.ca/~ryansc     code poet

Re: REDOM Design discussion.

Posted by Edwin Goei <Ed...@eng.sun.com>.

Jay Sachs <js...@iclick.com> writes:

> By "client" I meant the other portions of what you called the "internal
> API". Developing the software itself requires maintenance and clear
> APIs, and there are costs to introducing an abstraction between
> developers and Strings.

Yes, I agree with Jay here.  I think we should design the clearest
interfaces so it will make it easiest for developers of XRI (or whatever its
name is) to understand.  This is where all the maintenance happens.  As a
new person looking at the internals of the current Xerces, after a while I
could figure out that they were using int IDs but it's not obvious why or
how to turn it back into a String.  If a String is used directly, there is
no problem.

> I'd much prefer to see a RecyclableString, it at least gives a type that
> indicates what it is. A bare int opens up many possibilities for
> incorrect usage. And if the RecyclableString turns out to be not
> working, it's much easier to do a massive search & replace to turn it
> into an int than to do the reverse.

Similarly, using something like RecyclableString (I haven't figured out what
you mean by this yet, BTW), is non-obvious to a generic Java programmer.  I
think there needs to be a very good reason to add something non-obvious to a
design and I'm still not convinced that there is yet.

It seems to me that something like what is in the xerces_j_2 branch in
org.apache.xerces.utils.SymbolTable would handle your concern about creating
unnecessary String objects and yet allow Xerces developers to directly use
String objects and not some other indirect object like RecyclableString or
int.
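
[A sketch of the symbol-table idea only. This is not the code in the
xerces_j_2 branch; SimpleSymbolTable and addSymbol are names I am assuming
for illustration. The table hands back one shared String per distinct name,
looked up straight from the parser's char buffer, so a new String is
allocated only the first time a name is seen.]

```java
// Hypothetical sketch; fixed bucket count and no resizing, to keep it short.
class SimpleSymbolTable {
    private static final class Entry {
        final String symbol;
        final Entry next;
        Entry(String symbol, Entry next) {
            this.symbol = symbol;
            this.next = next;
        }
    }

    private final Entry[] buckets = new Entry[64];

    // Returns the shared String for the given slice, allocating only on first sight.
    String addSymbol(char[] buf, int offset, int length) {
        int b = (hash(buf, offset, length) & 0x7fffffff) % buckets.length;
        for (Entry e = buckets[b]; e != null; e = e.next) {
            if (matches(e.symbol, buf, offset, length)) {
                return e.symbol;
            }
        }
        String s = new String(buf, offset, length);
        buckets[b] = new Entry(s, buckets[b]);
        return s;
    }

    // Hashes the slice directly, so no temporary String is needed for lookup.
    private static int hash(char[] buf, int offset, int length) {
        int h = 0;
        for (int i = 0; i < length; i++) {
            h = 31 * h + buf[offset + i];
        }
        return h;
    }

    private static boolean matches(String s, char[] buf, int offset, int length) {
        if (s.length() != length) return false;
        for (int i = 0; i < length; i++) {
            if (s.charAt(i) != buf[offset + i]) return false;
        }
        return true;
    }
}
```

[Callers get ordinary String objects, and two occurrences of the same
element name come back as the same instance, so internal code can compare
them by reference if it wants the speed.]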

-Edwin

PS: I tried, but I can't seem to access the code via the CVS web interface
http://xml.apache.org/websrc/cvsweb.cgi/xml-xerces/java/src/org/apache/xerces/utils/?only_with_tag=xerces_j_2.

Re: REDOM Design discussion.

Posted by Brett McLaughlin <br...@lutris.com>.


Andy Heninger wrote:
> 
> My guess is that any application that does not want to run inside
> of a web browser on the client and has any communication with
> a server (of any kind) is a good candidate for use of XML for
> that client/server communication.

Agreed, but I doubt that even in an optimistic look, this is very many
clients.

> 
> Any such client will require an XML parser, and I believe that
> Xerces should have a goal of meeting the needs for this kind
> of client usage  (where the client is some reasonably capable
> system, not some absolute minimum cost dedicated device).

But I don't think they will directly go and download a JVM, then
download an XML parser. I think that you are giving /way/ too much
credit to even above-average users. I think users will purchase software
which allows them to run applications, in the small case where users
need thick-clients. And I don't think this is WinZip we're talking about
here, where anyone will just hop on and get it. You can't honestly
expect Joe the accountant to find xml.apache.org, find the dist/ page,
and then read through a directory listing and pick the correct Xerces,
unzip it, and drop the jar file in his classpath? Come on... ;-)

> 
> Whether this translates to any kind of a strong argument for
> Java 1.1 support is something that I'm much less convinced of.
> Any real, install it and go, type of end user application that
> uses Java is pretty much forced to bundle the appropriate JVM
> for its use anyway.  And on any dedicated device, the developers
> will have complete control over the execution environment.

This is what I think is more common - they buy a packaged piece of
software, which takes us back to the original controlled case of where
developers are our target, and not users.

-Brett

> 
>   -- Andy
> 
> >
> "Brett McLaughlin" <br...@lutris.com> wrote
> > I don't buy this. Users will use services on a server, where developers
> > control the VM. Unless you plan on pushing the XML parser to the client,
> > we don't need to worry about the VM of the client - this is standard
> > thin-client type stuff. In fact, I would argue that people using XML are
> > /more likely/ to be on newer VM's, because XML itself to some degree
> > lends itself to newer applications.
> >
> > -Brett
> >
> >
> > Andy Heninger wrote:
> > >
> > > If XML succeeds in some of the areas that it's being hyped for, we will
> > > end up with users (not developers, but users) who have no clue that
> > > they're using either XML or Java, and will not consider themselves
> > > hardcore at all.
> > >
> > > We need to think about where applications will need to run for the end
> > > users, not what the developers will be using.  The answer may be the
> > > same - go with the new stuff - but we need to be clear that applications
> > > of XML will target end user environments, not just developers or servers
> > > or newly-developed devices.
> >

-- 
Brett McLaughlin, Enhydra Strategist
Lutris Technologies, Inc. 
1200 Pacific Avenue, Suite 300 
Santa Cruz, CA 95060 USA 
http://www.lutris.com
http://www.enhydra.org

Re: REDOM Design discussion.

Posted by Andy Heninger <an...@jtcsv.com>.

My guess is that any application that does not want to run inside
of a web browser on the client and has any communication with
a server (of any kind) is a good candidate for use of XML for
that client/server communication.

Any such client will require an XML parser, and I believe that
Xerces should have a goal of meeting the needs for this kind
of client usage  (where the client is some reasonably capable
system, not some absolute minimum cost dedicated device).

Whether this translates to any kind of a strong argument for
Java 1.1 support is something that I'm much less convinced of.
Any real, install it and go, type of end user application that
uses Java is pretty much forced to bundle the appropriate JVM
for its use anyway.  And on any dedicated device, the developers
will have complete control over the execution environment.

  -- Andy

>
"Brett McLaughlin" <br...@lutris.com> wrote
> I don't buy this. Users will use services on a server, where developers
> control the VM. Unless you plan on pushing the XML parser to the client,
> we don't need to worry about the VM of the client - this is standard
> thin-client type stuff. In fact, I would argue that people using XML are
> /more likely/ to be on newer VM's, because XML itself to some degree
> lends itself to newer applications.
>
> -Brett
>
>
> Andy Heninger wrote:
> >
> > If XML succeeds in some of the areas that it's being hyped for, we will
> > end up with users (not developers, but users) who have no clue that
> > they're using either XML or Java, and will not consider themselves
> > hardcore at all.
> >
> > We need to think about where applications will need to run for the end
> > users, not what the developers will be using.  The answer may be the
> > same - go with the new stuff - but we need to be clear that applications
> > of XML will target end user environments, not just developers or servers
> > or newly-developed devices.
>

Re: REDOM Design discussion.

Posted by Brett McLaughlin <br...@lutris.com>.


Andy Heninger wrote:
> 
> "James Duncan Davidson" <ja...@eng.sun.com> wrote
> 
> >
> > Yep, users will take a while to migrate, that's fine -- but I think the hard
> > core Java developers (our target) are going to migrate fast ...
> 
> If XML succeeds in some of the areas that it's being hyped for, we will
> end up with users (not developers, but users) who have no clue that
> they're using either XML or Java, and will not consider themselves
> hardcore at all.
> 
> We need to think about where applications will need to run for the end
> users, not what the developers will be using.  The answer may be the
> same - go with the new stuff - but we need to be clear that applications
> of XML will target end user environments, not just developers or servers
> or newly-developed devices.

I don't buy this. Users will use services on a server, where developers
control the VM. Unless you plan on pushing the XML parser to the client,
we don't need to worry about the VM of the client - this is standard
thin-client type stuff. In fact, I would argue that people using XML are
/more likely/ to be on newer VM's, because XML itself to some degree
lends itself to newer applications.

-Brett

> 
> Andy Heninger
> IBM XML Technology Group, Cupertino, CA
> heninger@us.ibm.com

-- 
Brett McLaughlin, Enhydra Strategist
Lutris Technologies, Inc. 
1200 Pacific Avenue, Suite 300 
Santa Cruz, CA 95060 USA 
http://www.lutris.com
http://www.enhydra.org

Re: REDOM Design discussion.

Posted by Andy Heninger <an...@jtcsv.com>.

"James Duncan Davidson" <ja...@eng.sun.com> wrote

>
> Yep, users will take a while to migrate, that's fine -- but I think the hard
> core Java developers (our target) are going to migrate fast ...

If XML succeeds in some of the areas that it's being hyped for, we will
end up with users (not developers, but users) who have no clue that
they're using either XML or Java, and will not consider themselves
hardcore at all.

We need to think about where applications will need to run for the end
users, not what the developers will be using.  The answer may be the
same - go with the new stuff - but we need to be clear that applications
of XML will target end user environments, not just developers or servers
or newly-developed devices.

Andy Heninger
IBM XML Technology Group, Cupertino, CA
heninger@us.ibm.com

Re: REDOM Design discussion.

Posted by James Duncan Davidson <ja...@eng.sun.com>.

on 7/14/00 6:28 AM, Elliotte Rusty Harold at elharo@metalab.unc.edu wrote:

> Let me correct that last statement, since I've seen that
> misapprehension cause problems in this thread and elsewhere several
> times now. MacOS is not jumping to 1.3. In fact, for the foreseeable
> future, MacOS will remain restricted to Java 1.1, not even Java 1.2.
> From a Mac perspective, working well with 1.1 is crucial.

Where MacOS == MacOS 9 & previous, yes, you are totally correct. I should
have been more clear and said MacOS X is moving to 1.3 (sorry, my bad, the
problem with actually using the new OS is that you get used to calling it a
Mac -- even if you flip between the two OSs a couple of times a day) 1.1
will be the only thing there on 9. My statement wasn't really meant to put Mac OS
9 on the radar as a platform to hit, as Xerces1 is the target for 1.1 based
VMs imho. It was to show the direction that OS vendors are moving
in wrt 1.3.

> It's more akin to Microsoft's
> switch from the Windows 3.0/3.1/95/98 architecture to the Windows NT
> architecture. We all know that's going a lot more slowly than
> Microsoft planned, advertised, or hoped for. I predict a similar long
> changeover for MacOS classic to MacOS X.

I'll point out a couple of things, and then let this be as it's going off
topic (we can continue elsewhere.. :).. From my understanding, aapl isn't
going to do what msft did and try to continue to have both OSs in the world.
Once they flip the switch, there they go -- kind of like 68k to PowerPC.
imho, having both trains has been detrimental to both for msft. Apple has
done better transitions (once again, the hardware thing was amazingly
smooth). 

Yep, users will take a while to migrate, that's fine -- but I think the hard
core Java developers (our target) are going to migrate fast from 9 to 10.
After using 10 for a bit, I've found a gorgeous Java development platform..
:) I've already chucked out my Windows laptop (ok, it's still sitting on the
desk at home, but gathering dust. It's about to become a Linux machine again
I think..) and if a few things are fixed in the public beta, I'll be ready
to push the Solaris machine at work into the corner. :)

.duncan

Re: REDOM Design discussion.

Posted by Elliotte Rusty Harold <el...@metalab.unc.edu>.

At 9:38 PM -0700 7/13/00, James Duncan Davidson wrote:

>Yep. But I see that there has been other discussion about VMs. My bias is,
>of course, to move the top of the bell curve of performance to be squarely
>on top of J2SE 1.3 since that seems to be the JDK version that is going to
>be picked up most widely over the next 9 months (since IBM is jumping right
>there according to what I hear, and Mac OS is jumping straight to 1.3, etc.)
>

Let me correct that last statement, since I've seen that 
misapprehension cause problems in this thread and elsewhere several 
times now. MacOS is not jumping to 1.3. In fact, for the foreseeable 
future, MacOS will remain restricted to Java 1.1, not even Java 1.2. 
From a Mac perspective, working well with 1.1 is crucial.

What is going to happen is that sometime in 2001 or later, Apple will 
release a completely new operating system called MacOS X. This 
operating system will run some but not most of the current installed 
base of Macs. This new OS will support Java 1.3. However, the classic 
OS currently used on 100% of Macs will not be able to run Java 1.3. 
The new VM is MacOS X only and no back port is planned.

Over time users will slowly upgrade to MacOS X, especially as they 
buy new hardware. However, this is a much more major switch than 
MacOS 8.5 to MacOS 9.0 (for example). It's more akin to Microsoft's 
switch from the Windows 3.0/3.1/95/98 architecture to the Windows NT 
architecture. We all know that's going a lot more slowly than 
Microsoft planned, advertised, or hoped for. I predict a similar long 
changeover for MacOS classic to MacOS X. If we want to support the 
current installed base of Macs, then we have to support Java 1.1.

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|                  The XML Bible (IDG Books, 1999)                   |
|              http://metalab.unc.edu/xml/books/bible/               |
|   http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://metalab.unc.edu/javafaq/ |
|  Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/     |
+----------------------------------+---------------------------------+

Re: REDOM Design discussion.

Posted by James Duncan Davidson <ja...@eng.sun.com>.

on 7/12/00 9:09 AM, Costin Manolache at Costin.Manolache@eng.sun.com wrote:

>> I'd much prefer to see a RecyclableString, it at least gives a type that
>> indicates what it is. A bare int opens up many possibilities for
>> incorrect usage. And if the RecyclableString turns out to be not
>> working, it's much easier to do a massive search & replace to turn it
>> into an int than to do the reverse.
> 
> +1 on that.

+1 on the name -- mucho clearer.

>> I still have the questions: what VM? what JIT? I only persist with this
>> question because James indicated that the intended targets are "modern"
>> VMs, where object creation is (supposed to be) significantly cheaper.
> 
> 1.2.2 and 1.3. The diff is bigger on 1.2.2, of course.

Yep. But I see that there has been other discussion about VMs. My bias is,
of course, to move the top of the bell curve of performance to be squarely
on top of J2SE 1.3 since that seems to be the JDK version that is going to
be picked up most widely over the next 9 months (since IBM is jumping right
there according to what I hear, and Mac OS is jumping straight to 1.3, etc.)

.duncan

RE: REDOM Design discussion.

Posted by Paulo Gaspar <pa...@krankikom.de>.

As an addition to the JDK discussion:
Even if it is voted to go JDK 1.1.*, be aware that SUN has some 
collection classes JAR for JDK 1.1.* at their site.

(Ok, ok! you all know that but I wanted to be sure.)


Have fun,
Paulo Gaspar

RE: REDOM Design discussion.

Posted by Paulo Gaspar <pa...@krankikom.de>.

> -----Original Message-----
> From: James Duncan Davidson [mailto:james.davidson@eng.sun.com]
> Sent: Tuesday, July 18, 2000 12:28
>
> on 7/15/00 6:33 AM, Paulo Gaspar at paulo.gaspar@krankikom.de wrote:
>
> > Palm owners - gadget lovers as they are - will probably tend to upgrade
> > those devices.
>
> Yep. original --> V, and I'll get the next gen color one probably ;)
>
> .duncan
>

My new boss asked me if I want one. I told him I wasn't in a hurry.

I am buying my way to be hated at the office by betting on having the first
color Palm there. hehehe


Have fun,
Paulo Gaspar

Re: REDOM Design discussion.

Posted by James Duncan Davidson <ja...@eng.sun.com>.

on 7/15/00 6:33 AM, Paulo Gaspar at paulo.gaspar@krankikom.de wrote:

> Small devices are evolving so damn fast that it is even better not to
> mention names. But I agree there will be "small-but-performant-ambitious"
> devices and "as-small-and-cheap-as-possible" devices.

I guess I should make it clear that my aim in being a proponent of small
devices is "small-but-performant-ambitious", not
"as-small-and-cheap-as-possible" -- for that second market, you've got to
do custom everything anyway.

> Now, if we talk about Palms and WinCE devices or even some mobile
> telephones, we are talking about "small-but-performant-ambitious"
> devices:
> - Their ambition is to be a tiny handheld personal computer;
> - Their cost target is not "as-cheap-as-possible" but "cheap-enough".
> 
> When hardware technology is available to have a Pentium processing power
> class CPU with 128 MB memory at a "cheap-enough" price, they will use it.
> And they will cram those devices with as much software as possible and
> good libraries will be badly needed.

It's already happening even with TV sets. The WebTV box is actually quite
capable. As is the Tivo machine. As is the Sega machine. And the PS2 is a
*hell* of a machine.

> This means that we are in a short transitional period before these products
> are powerful enough to include quite complete Java environments. And most
> Palm owners - gadget lovers as they are - will probably tend to upgrade
> those devices.

Yep. original --> V, and I'll get the next gen color one probably ;)

.duncan

RE: REDOM Design discussion.

Posted by Paulo Gaspar <pa...@krankikom.de>.

> -----Original Message-----
> From: James Duncan Davidson [mailto:james.davidson@eng.sun.com]
> Sent: Friday, July 14, 2000 06:58
>
>
> on 7/13/00 10:38 PM, Ted Leung at twleung@sauria.com wrote:
>
> > Yet this is one of the things that's been mentioned as a
> problem with the
> > current
> > code base.  If the requirement is one code base and runs well
> on server side
> > and on devices, then we have a problem.
>
> One core parser should be able to run excellent on full scale VMs, and
> pretty good on things like set top boxes... In my talk about TVs
> and such, I
> definitely wasn't talking about things like Cell Phones or such (that
> requires a special parser, no argument).
>
> I think that acceptable perf on a BeOS based set top box would be doable.
> That's the level of smaller device that I was thinking of. Not like Palm
> Pilots. :)
>
> .duncan

Small devices are evolving so damn fast that it is even better not to
mention names. But I agree there will be "small-but-performant-ambitious"
devices and "as-small-and-cheap-as-possible" devices.

The question seems to be around the "as-small-and-cheap-as-possible".

This subject was already mentioned on the XML-Dev list (THE XML mailing
list, now based at www.xml.org - check the resources link).

The conclusion was that for some environments a simpler XML dialect would
be needed in order to have simpler parsers too. Don Park (a well known XML
expert) even published a spec of what he called Minimal XML.
You can find it at:
  http://www.docuverse.com/smldev/minxmlspec.html

You can find an example of a tiny parser implemented in JavaScript at:
  http://sjoerd.editthispage.com/stories/storyReader$20

Now, you will notice that this parser implements no standard interface; it
is just a custom parser that is supposed (not in my version of MS-IE) to
output a tree representing any Minimal XML it gets as input.

What is the idea then?

If you have already worked in a company related to embedded systems, I am
sure you are guessing it:
 - The idea is that when you have an "as-small-and-cheap-as-possible"
   device, it only does simple stuff and you tend to use more custom made
   stuff and less libraries;
 - The idea is to have an XML dialect that is simple enough to parse with
   only a couple of reusable routines and custom code for the few simple
   things that the device does.
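
As an illustration of the second point, here is a minimal sketch in Java of
what "a couple of reusable routines" could look like. It assumes a dialect
with only elements and text (no attributes, comments, entities or
declarations); that is a simplification made up for this example, not a
rendering of the Minimal XML spec, and TinyNode and TinyParser are
hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node: an element with a name, text and child elements.
class TinyNode {
    final String name;
    final StringBuilder text = new StringBuilder();
    final List<TinyNode> children = new ArrayList<>();

    TinyNode(String name) { this.name = name; }
}

// Hypothetical recursive-descent parser for the elements-and-text dialect.
class TinyParser {
    private final String in;
    private int pos;

    private TinyParser(String in) { this.in = in; }

    static TinyNode parse(String xml) {
        TinyParser p = new TinyParser(xml.trim());
        TinyNode root = p.element();
        if (p.pos != p.in.length()) {
            throw new IllegalArgumentException("trailing content at " + p.pos);
        }
        return root;
    }

    // element := '<' name '>' (text | element)* '</' name '>'
    private TinyNode element() {
        expect('<');
        TinyNode node = new TinyNode(name());
        expect('>');
        while (true) {
            if (pos >= in.length()) {
                throw new IllegalArgumentException("unclosed <" + node.name + ">");
            }
            if (in.startsWith("</", pos)) {
                pos += 2;
                String close = name();
                if (!close.equals(node.name)) {
                    throw new IllegalArgumentException("mismatched </" + close + ">");
                }
                expect('>');
                return node;
            } else if (in.charAt(pos) == '<') {
                node.children.add(element());
            } else {
                node.text.append(in.charAt(pos++));
            }
        }
    }

    // name := one or more letters or digits
    private String name() {
        int start = pos;
        while (pos < in.length() && Character.isLetterOrDigit(in.charAt(pos))) {
            pos++;
        }
        if (pos == start) {
            throw new IllegalArgumentException("name expected at " + pos);
        }
        return in.substring(start, pos);
    }

    private void expect(char c) {
        if (pos >= in.length() || in.charAt(pos) != c) {
            throw new IllegalArgumentException("'" + c + "' expected at " + pos);
        }
        pos++;
    }
}
```

Everything device-specific would then be custom code walking the TinyNode
tree; there is no SAX, no DOM and no validation in this sketch.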

I think that "as-small-and-cheap-as-possible" devices are not potential
customers for Xerces or any other full blown XML parser.

Now, if we talk about Palms and WinCE devices or even some mobile
telephones, we are talking about "small-but-performant-ambitious"
devices:
 - Their ambition is to be a tiny handheld personal computer;
 - Their cost target is not "as-cheap-as-possible" but "cheap-enough".

When hardware technology is available to have a Pentium processing power
class CPU with 128 MB memory at a "cheap-enough" price, they will use it.
And they will cram those devices with as much software as possible and
good libraries will be badly needed.

And that hardware technology is coming quite fast.

This means that we are in a short transitional period before these products
are powerful enough to include quite complete Java environments. And most
Palm owners - gadget lovers as they are - will probably tend to upgrade
those devices.

So, are you planning a Xerces version for the next year or for the next
years?

(This all means I fully agree with Duncan even when I have a different
view on mobile telephones and Palms.)

Have fun,

Paulo Gaspar

Re: REDOM Design discussion.

Posted by James Duncan Davidson <ja...@eng.sun.com>.

on 7/13/00 10:38 PM, Ted Leung at twleung@sauria.com wrote:

> Yet this is one of the things that's been mentioned as a problem with the
> current
> code base.  If the requirement is one code base and runs well on server side
> and on devices, then we have a problem.

One core parser should be able to run excellently on full scale VMs, and
pretty well on things like set top boxes... In my talk about TVs and such, I
definitely wasn't talking about things like Cell Phones or such (that
requires a special parser, no argument).

I think that acceptable perf on a BeOS based set top box would be doable.
That's the level of smaller device that I was thinking of. Not like Palm
Pilots. :)

.duncan

Re: REDOM Design discussion.

Posted by Ted Leung <tw...@sauria.com>.

----- Original Message -----
From: "Jim Driscoll" <ji...@eng.sun.com>
To: <xe...@xml.apache.org>
Sent: Thursday, July 13, 2000 9:12 AM
Subject: Re: REDOM Design discussion.


> I've been generally keeping quiet, but I know the answer to this one:
>
> Ted Leung wrote:
> >
> > HotSpot is not the only JVM in the world.
>
> True.  It is, however, safe to assume that object creation is going to
> get cheaper in every VM, not just HotSpot.  The technology is just plain
> easy to implement.  So writing code that eliminates object creation
> before you actually write a benchmark for testing the cost of object
> creation may be premature.
>
> > There are a number of open source JVM's, and you can bet that
> > the JVM's in devices are not going to be HotSpot VMs.   Or maybe
> > I'm wrong -- Duncan?
>
> Don't have specific information on this, but since Hotspot gets around
> object creation costs via pre-allocation of a large tract of memory,
> it's pretty safe to assume that that technology won't translate well to
> small devices.
>
> > If a parser that runs on devices is a requirement then I don't know if
> > we can assume that HotSpot will be there to save us.
>
> If you're writing a parser for devices, it seems to me that your
> requirements might well be radically different, in terms of performance
> characteristics and memory footprint.  In fact, I'd submit that the
> parser you write for devices, and the parser you write for server-side
> processing, should be two separate codebases, since they have such
> different requirements.

Yet this is one of the things that's been mentioned as a problem with the
current
code base.  If the requirement is one code base and runs well on server side
and on devices, then we have a problem.

Ted

Re: Xerces vs. com.sun.xml

Posted by Costin Manolache <co...@eng.sun.com>.

Magnus Þor Torfason wrote:

> I was just looking through the Tomcat distribution and noticed that it is
> distributed with the com.sun.xml classes, but not Xerces.  Why is this
> (considering that both Xerces and Tomcat are Apache projects)?

One reason is "historic" - that's the parser that was used initially.
The current release uses JAXP - so any parser would work.

I tried to use xerces, but that version had a strange problem - even
if tomcat implements resolveEntity(), xerces didn't seem to use it.
I found some internal APIs and I'm sure there is a simple way
to turn it off, but that would mean to use xerces internals.
I haven't tested with the latest version, but if you are on the
online you can use any version of xerces that supports jaxp.
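
For illustration, a minimal sketch of parsing a config file through JAXP with
an entity resolver installed. It is written against the JAXP and SAX2 APIs as
they exist in current JDKs, not the versions discussed here; LocalDtdResolver,
ConfigReader and the example.invalid URL are hypothetical, and this is not
Tomcat's code.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical resolver: answers every external entity with an empty
// local source instead of letting the parser fetch the system ID.
class LocalDtdResolver implements EntityResolver {
    int calls; // how often the parser consulted this resolver

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        calls++;
        return new InputSource(new StringReader(""));
    }
}

class ConfigReader {
    // Parse with whatever SAX parser JAXP finds; no parser internals used.
    static void parse(String xml, EntityResolver resolver) {
        try {
            XMLReader reader = SAXParserFactory.newInstance()
                    .newSAXParser().getXMLReader();
            reader.setEntityResolver(resolver);
            reader.setContentHandler(new DefaultHandler());
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("config parse failed", e);
        }
    }
}
```

If the parser honours resolveEntity(), the DOCTYPE's system ID is never
fetched; the calls counter is only there to make that observable.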

A third reason is the size, of course. Tomcat does use an XML
parser for configuration, but the parser was bigger than tomcat
itself. A bit too much for just reading the config.

Costin

Xerces vs. com.sun.xml

Posted by Magnus Þor Torfason <ma...@handtolvur.is>.

I was just looking through the Tomcat distribution and noticed that it is
distributed with the com.sun.xml classes, but not Xerces.  Why is this
(considering that both Xerces and Tomcat are Apache projects)?

magnus

Re: REDOM Design discussion.

Posted by Jim Driscoll <ji...@eng.sun.com>.

I've been generally keeping quiet, but I know the answer to this one:

Ted Leung wrote:
> 
> HotSpot is not the only JVM in the world.

True.  It is, however, safe to assume that object creation is going to
get cheaper in every VM, not just HotSpot.  The technology is just plain
easy to implement.  So writing code that eliminates object creation
before you actually write a benchmark for testing the cost of object
creation may be premature.
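
A minimal sketch of such a benchmark, with made-up names and sizes. It only
compares a loop that allocates a String per iteration with one that reads the
same characters from the buffer. Naive timings like this are easy to get
wrong (JIT warm-up, and a modern VM may remove the allocation altogether), so
treat the numbers as a rough hint rather than a measurement.

```java
// Hypothetical micro-benchmark; not taken from Xerces or Tomcat.
class AllocationBenchmark {
    // Allocates a fresh String per iteration.
    static long withAllocation(char[] data, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            String s = new String(data, 0, data.length);
            sum += s.length() + s.charAt(i % data.length);
        }
        return sum;
    }

    // Reads the same characters straight from the buffer, no allocation.
    static long withoutAllocation(char[] data, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += data.length + data[i % data.length];
        }
        return sum;
    }

    public static void main(String[] args) {
        char[] data = "elementName".toCharArray();
        int n = 5_000_000;

        // Warm up so both loops are compiled before timing starts.
        withAllocation(data, n);
        withoutAllocation(data, n);

        long t0 = System.nanoTime();
        long a = withAllocation(data, n);
        long t1 = System.nanoTime();
        long b = withoutAllocation(data, n);
        long t2 = System.nanoTime();

        // The sums are printed so the work cannot be discarded as unused.
        System.out.println("alloc:    " + (t1 - t0) / 1_000_000 + " ms, sum " + a);
        System.out.println("no alloc: " + (t2 - t1) / 1_000_000 + " ms, sum " + b);
    }
}
```

Running it on each VM under discussion would at least turn the argument into
numbers for that VM.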

> There are a number of open source JVM's, and you can bet that
> the JVM's in devices are not going to be HotSpot VMs.   Or maybe
> I'm wrong -- Duncan?

Don't have specific information on this, but since Hotspot gets around
object creation costs via pre-allocation of a large tract of memory,
it's pretty safe to assume that that technology won't translate well to
small devices.

> If a parser that runs on devices is a requirement then I don't know if
> we can assume that HotSpot will be there to save us.

If you're writing a parser for devices, it seems to me that your
requirements might well be radically different, in terms of performance
characteristics and memory footprint.  In fact, I'd submit that the
parser you write for devices, and the parser you write for server-side
processing, should be two separate codebases, since they have such
different requirements.

Jim

Re: REDOM Design discussion.

Posted by James Duncan Davidson <ja...@eng.sun.com>.

on 7/12/00 5:42 PM, Ted Leung at twleung@sauria.com wrote:

> There are a number of open source JVM's, and you can bet that
> the JVM's in devices are not going to be HotSpot VMs.   Or maybe
> I'm wrong -- Duncan?

Unfortunately, I haven't seen any open source VMs attain enough usage to
really warrant too much worry about them.

But, we should be relatively concerned about VMs in the Micro space. These
VMs won't have adaptive characteristics. Granted, speed isn't hellishly
important here, but should be considered.

Top of bell curve on Hotspot, more modern non-adaptive compilers (such as
you'd find on the Palm or something) at least half way up the curve?

.duncan

Re: REDOM Design discussion.

Posted by Ted Leung <tw...@sauria.com>.

HotSpot is not the only JVM in the world.

There are a number of open source JVM's, and you can bet that
the JVM's in devices are not going to be HotSpot VMs.   Or maybe
I'm wrong -- Duncan?

If a parser that runs on devices is a requirement then I don't know if
we can assume that HotSpot will be there to save us.

----- Original Message -----
From: "Kevin Regan" <ke...@valicert.com>
To: <xe...@xml.apache.org>
Sent: Wednesday, July 12, 2000 11:11 AM
Subject: Re: REDOM Design discussion.


>
> It might be useful to consult with the HotSpot folks before making
> any conclusions about this...
>
> --Kevin
>
> On Wed, 12 Jul 2000, Costin Manolache wrote:
>
> > > I'd much prefer to see a RecyclableString, it at least gives a type
> > that
> > > indicates what it is. A bare int opens up many possibilities for
> > > incorrect usage. And if the RecyclableString turns out to be not
> > > working, it's much easier to do a massive search & replace to turn it
> > > into an int than to do the reverse.
> >
> > +1 on that.
> >
> > > > Tomcat 3.1 versus tomcat 3.2.
> > > > The only big difference is the reuse of ( some !) objects ( we still
> > have a
> > > > large number of string allocation/request).
> > > > Performance diff: at least double ( with the max. time per request
> > 3-4 time
> > > > smaller).
> > >
> > > I still have the questions: what VM? what JIT? I only persist with
> > this
> > > question because James indicated that the intended targets are
> > "modern"
> > > VMs, where object creation is (supposed to be) significantly cheaper.
> >
> > 1.2.2 and 1.3. The diff is bigger on 1.2.2, of course.
> >
> > > And you're saying that a good percentage of the "unnecessary" object
> > > allocations are of java.lang.String, correct? Do you have an idea of
> > how
> > > many (or perhaps what percentage) java.lang.String allocations are
> > saved
> > > in the current Xerces implementation via int and StringPool?
> >
> > I would be interested to find out - I hope people working on xerces have
> > some
> > data, this seems to be very optimized in the current xerces. Is it just
> > to obfuscate
> >
> > it  or did they had some reasons ?
> >
> > Costin

Re: REDOM Design discussion.

Posted by co...@eng.sun.com.

> 
> It might be useful to consult with the HotSpot folks before making
> any conclusions about this...

I don't think HotSpot ( or any future VM ) will have free memory
allocation and free garbage collection. It may be cheap, but it's never
free.

Also, recycling and reusing is a good pattern - we should do it whenever
is possible ( and not only in code :-) Wasting memory and cpu is in
general a bad pattern ( even if both CPU and memory are cheap)

Costin



> 
> --Kevin 
> 
> On Wed, 12 Jul 2000, Costin Manolache wrote:
> 
> > > I'd much prefer to see a RecyclableString, it at least gives a type
> > that
> > > indicates what it is. A bare int opens up many possibilities for
> > > incorrect usage. And if the RecyclableString turns out to be not
> > > working, it's much easier to do a massive search & replace to turn it
> > > into an int than to do the reverse.
> > 
> > +1 on that.
> > 
> > > > Tomcat 3.1 versus tomcat 3.2.
> > > > The only big difference is the reuse of ( some !) objects ( we still
> > have a
> > > > large number of string allocation/request).
> > > > Performance diff: at least double ( with the max. time per request
> > 3-4 time
> > > > smaller).
> > >
> > > I still have the questions: what VM? what JIT? I only persist with
> > this
> > > question because James indicated that the intended targets are
> > "modern"
> > > VMs, where object creation is (supposed to be) significantly cheaper.
> > 
> > 1.2.2 and 1.3. The diff is bigger on 1.2.2, of course.
> > 
> > > And you're saying that a good percentage of the "unnecessary" object
> > > allocations are of java.lang.String, correct? Do you have an idea of
> > how
> > > many (or perhaps what percentage) java.lang.String allocations are
> > saved
> > > in the current Xerces implementation via int and StringPool?
> > 
> > I would be interested to find out - I hope people working on xerces have
> > some
> > data, this seems to be very optimized in the current xerces. Is it just
> > to obfuscate
> > 
> > it  or did they had some reasons ?
> > 
> > Costin

Re: REDOM Design discussion.

Posted by Kevin Regan <ke...@valicert.com>.

It might be useful to consult with the HotSpot folks before making
any conclusions about this...

--Kevin 

On Wed, 12 Jul 2000, Costin Manolache wrote:

> > I'd much prefer to see a RecyclableString, it at least gives a type
> that
> > indicates what it is. A bare int opens up many possibilities for
> > incorrect usage. And if the RecyclableString turns out to be not
> > working, it's much easier to do a massive search & replace to turn it
> > into an int than to do the reverse.
> 
> +1 on that.
> 
> > > Tomcat 3.1 versus tomcat 3.2.
> > > The only big difference is the reuse of ( some !) objects ( we still
> have a
> > > large number of string allocation/request).
> > > Performance diff: at least double ( with the max. time per request
> 3-4 time
> > > smaller).
> >
> > I still have the questions: what VM? what JIT? I only persist with
> this
> > question because James indicated that the intended targets are
> "modern"
> > VMs, where object creation is (supposed to be) significantly cheaper.
> 
> 1.2.2 and 1.3. The diff is bigger on 1.2.2, of course.
> 
> > And you're saying that a good percentage of the "unnecessary" object
> > allocations are of java.lang.String, correct? Do you have an idea of
> how
> > many (or perhaps what percentage) java.lang.String allocations are
> saved
> > in the current Xerces implementation via int and StringPool?
> 
> I would be interested to find out - I hope people working on xerces have
> some
> data, this seems to be very optimized in the current xerces. Is it just
> to obfuscate
> 
> it  or did they had some reasons ?
> 
> Costin

Re: REDOM Design discussion.

Posted by Costin Manolache <co...@eng.sun.com>.

> I'd much prefer to see a RecyclableString, it at least gives a type that
> indicates what it is. A bare int opens up many possibilities for
> incorrect usage. And if the RecyclableString turns out to be not
> working, it's much easier to do a massive search & replace to turn it
> into an int than to do the reverse.

+1 on that.

> > Tomcat 3.1 versus tomcat 3.2.
> > The only big difference is the reuse of ( some !) objects ( we still have a
> > large number of string allocation/request).
> > Performance diff: at least double ( with the max. time per request 3-4 time
> > smaller).
>
> I still have the questions: what VM? what JIT? I only persist with this
> question because James indicated that the intended targets are "modern"
> VMs, where object creation is (supposed to be) significantly cheaper.

1.2.2 and 1.3. The diff is bigger on 1.2.2, of course.

> And you're saying that a good percentage of the "unnecessary" object
> allocations are of java.lang.String, correct? Do you have an idea of how
> many (or perhaps what percentage) java.lang.String allocations are saved
> in the current Xerces implementation via int and StringPool?

I would be interested to find out - I hope people working on xerces have some
data, this seems to be very optimized in the current xerces. Is it just to
obfuscate it, or did they have some reasons?

Costin

Re: REDOM Design discussion.

Posted by Jay Sachs <js...@iclick.com>.

Costin Manolache wrote:
> 
> Jay Sachs wrote:
> 
> > Costin Manolache wrote:
> >
> > > For example using QuickSort instead of BubbleSort may be a good
> > > idea in some cases, even if it's much more complex and harder to
> > > understand.
> >
> > That metaphor doesn't hold water. The interface to both QuickSort and
> > BubbleSort is the same. Client code of that subsystem doesn't care how
> > messy it is on the inside. Using ints or StringHolders instead of
> > Strings impacts in a major way all client code of the subsystem.
> 
> This is only for the internal API, it will never be visible in the client. SAX
> will generate Strings ( interned - only one String allocation per element name
> or attribute name ),
> DOM interfaces will also generate Strings ( again, lazy).

By "client" I meant the other portions of what you called the "internal
API". Developing the software itself requires maintenance and clear
APIs, and there are costs to introducing an abstraction between
developers and Strings.

> Regarding QuickSort - the interfaces will be the same ( or very simple ).
> You'll have a StringHolder ( or a better name - RecyclableString ) with a
> toString()
> method, or an Int2StringMap. Yes, it's one extra method call - but having the
> "simplest possible parser" is not the main goal ( I assume we'll need a lot of
> caching and pooling all over the code - DTDs, schemas, etc).

I'd much prefer to see a RecyclableString, it at least gives a type that
indicates what it is. A bare int opens up many possibilities for
incorrect usage. And if the RecyclableString turns out to be not
working, it's much easier to do a massive search & replace to turn it
into an int than to do the reverse.
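
A minimal sketch of the difference, with made-up names: SimpleStringPool is
not the Xerces StringPool, and the small Symbol wrapper merely stands in for a
typed holder such as RecyclableString. With bare ints, any int (an offset, a
length, a handle from another pool) is accepted by the compiler; with a type,
it is not.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A typed handle: only values produced by the pool should end up in here.
final class Symbol {
    final int handle;

    Symbol(int handle) { this.handle = handle; }
}

// Hypothetical symbol table mapping strings to small int handles.
class SimpleStringPool {
    private final Map<String, Integer> index = new HashMap<>();
    private final List<String> strings = new ArrayList<>();

    // Returns the same int handle for equal strings.
    int addSymbol(String s) {
        Integer h = index.get(s);
        if (h == null) {
            h = strings.size();
            strings.add(s);
            index.put(s, h);
        }
        return h;
    }

    // Bare int: pool.toString(someOffset) compiles just as happily.
    String toString(int handle) {
        return strings.get(handle);
    }

    // Typed variant: passing an offset or a length is a compile error.
    String toString(Symbol s) {
        return strings.get(s.handle);
    }
}
```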

> > > Not generating garbage is a good idea even if the GC is very fast -
> > > recycling is good ( as it is in real life ).
> > >
> > > I am very concerned about using Strings, but if int is too much we should
> > > use StringHolder or something else - creating 100000 objects is waste,
> > > and I don't think it's "understandable".
> >
> > Can you provide justification with some reasonably concrete numbers
> > (possibly based on the intended target platforms/VMs) that using Strings
> > will be a debilitating performance bottleneck? I understand your
> > concerns in theory, but is it really a problem? I suspect it will be
> > difficult to answer that at this point in time.
> 
> Tomcat 3.1 versus tomcat 3.2.
> The only big difference is the reuse of ( some !) objects ( we still have a
> large number of string allocation/request).
> Performance diff: at least double ( with the max. time per request 3-4 time
> smaller).

I still have the questions: what VM? what JIT? I only persist with this
question because James indicated that the intended targets are "modern"
VMs, where object creation is (supposed to be) significantly cheaper.

> Of course, a tomcat request has far fewer objects anyway - if you plan to use
> xerces for a soap or large file - there is a big chance you'll see much worse.
> 

And you're saying that a good percentage of the "unnecessary" object
allocations are of java.lang.String, correct? Do you have an idea of how
many (or perhaps what percentage) java.lang.String allocations are saved
in the current Xerces implementation via int and StringPool?

jay

Re: REDOM Design discussion.

Posted by Costin Manolache <co...@eng.sun.com>.

Jay Sachs wrote:

> Costin Manolache wrote:
>
> > For example using QuickSort instead of BubbleSort may be a good
> > idea in some cases, even if it's much more complex and harder to
> > understand.
>
> That metaphor doesn't hold water. The interface to both QuickSort and
> BubbleSort is the same. Client code of that subsystem doesn't care how
> messy it is on the inside. Using ints or StringHolders instead of
> Strings impacts in a major way all client code of the subsystem.

This is only for the internal API, it will never be visible in the client. SAX
will generate Strings ( interned - only one String allocation per element name
or attribute name ),
DOM interfaces will also generate Strings ( again, lazy).
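
A minimal sketch of what interning buys, using only java.lang.String; the
class and method names are made up. Note that intern() by itself still
creates a temporary String before returning the canonical one, so getting
down to one allocation per name would need a lookup keyed on the characters
first; that part is not shown.

```java
// Hypothetical helper; illustrates String.intern() only.
class InternedNames {
    // Build a name from the parser's character buffer and canonicalize it.
    static String name(char[] buf, int off, int len) {
        return new String(buf, off, len).intern();
    }

    // With interned names, identity comparison is enough.
    static boolean sameName(String a, String b) {
        return a == b;
    }
}
```

Two occurrences of the same element name then share one String instance, and
handlers can compare names with == instead of equals().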

Regarding QuickSort - the interfaces will be the same ( or very simple ).
You'll have a StringHolder ( or a better name - RecyclableString ) with a
toString()
method, or an Int2StringMap. Yes, it's one extra method call - but having the
"simplest possible parser" is not the main goal ( I assume we'll need a lot of
caching and pooling all over the code - DTDs, schemas, etc).
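
A minimal sketch of the RecyclableString idea, assuming nothing beyond what is
said above: a holder that is reused for each new value and only creates a
String when toString() is called. The field and method names are made up;
no such class exists yet.

```java
// Hypothetical sketch of a reusable character holder.
class RecyclableString {
    private char[] buf = new char[32];
    private int len;
    private String cached; // created lazily, dropped on every set()

    // Point the holder at new content, reusing the buffer when it fits.
    void set(char[] src, int off, int count) {
        if (count > buf.length) {
            buf = new char[Math.max(count, buf.length * 2)];
        }
        System.arraycopy(src, off, buf, 0, count);
        len = count;
        cached = null;
    }

    int length() {
        return len;
    }

    // Compare against a known name without allocating anything.
    boolean contentEquals(String s) {
        if (s.length() != len) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (s.charAt(i) != buf[i]) {
                return false;
            }
        }
        return true;
    }

    // The one extra call: a String is allocated only when someone asks.
    @Override
    public String toString() {
        if (cached == null) {
            cached = new String(buf, 0, len);
        }
        return cached;
    }
}
```

Internal code would pass the holder around and call contentEquals(); only the
SAX and DOM edges would call toString().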

> > Not generating garbage is a good idea even if the GC is very fast -
> > recycling is good ( as it is in real life ).
> >
> > I am very concerned about using Strings, but if int is too much we should
> > use StringHolder or something else - creating 100000 objects is waste,
> > and I don't think it's "understandable".
>
> Can you provide justification with some reasonably concrete numbers
> (possibly based on the intended target platforms/VMs) that using Strings
> will be a debilitating performance bottleneck? I understand your
> concerns in theory, but is it really a problem? I suspect it will be
> difficult to answer that at this point in time.

Tomcat 3.1 versus tomcat 3.2.
The only big difference is the reuse of ( some !) objects ( we still have a
large number of string allocation/request).
Performance diff: at least double ( with the max. time per request 3-4 times
smaller).

Of course, a tomcat request has far fewer objects anyway - if you plan to use
xerces for a soap or large file - there is a big chance you'll see much worse.

Costin

Re: REDOM Design discussion.

Posted by James Duncan Davidson <ja...@eng.sun.com>.

on 7/12/00 7:33 AM, Jay Sachs at jsachs@iclick.com wrote:

> Costin Manolache wrote:
> 
>> For example using QuickSort instead of BubbleSort may be a good
>> idea in some cases, even if it's much more complex and harder to
>> understand.
> 
> That metaphor doesn't hold water. The interface to both QuickSort and
> BubbleSort is the same. Client code of that subsystem doesn't care how
> messy it is on the inside. Using ints or StringHolders instead of
> Strings impacts in a major way all client code of the subsystem.

Brings up the point that it'd be really cool to use collections if we do
decide to move into 1.2+ API land. :)

.duncan

Re: REDOM Design discussion.

Posted by Jay Sachs <js...@iclick.com>.

Costin Manolache wrote:

> For example using QuickSort instead of BubbleSort may be a good
> idea in some cases, even if it's much more complex and harder to
> understand.

That metaphor doesn't hold water. The interface to both QuickSort and
BubbleSort is the same. Client code of that subsystem doesn't care how
messy it is on the inside. Using ints or StringHolders instead of
Strings impacts in a major way all client code of the subsystem.

> Not generating garbage is a good idea even if the GC is very fast -
> recycling is good ( as it is in real life ).
> 
> I am very concerned about using Strings, but if int is too much we should
> use StringHolder or something else - creating 100000 objects is waste,
> and I don't think it's "understandable".

Can you provide justification with some reasonably concrete numbers
(possibly based on the intended target platforms/VMs) that using Strings
will be a debilitating performance bottleneck? I understand your
concerns in theory, but is it really a problem? I suspect it will be
difficult to answer that at this point in time.

jay

Re: REDOM Design discussion.

Posted by James Duncan Davidson <ja...@eng.sun.com>.

on 7/11/00 5:40 PM, Costin Manolache at Costin.Manolache@eng.sun.com wrote:

> We both work for Sun and still have different opinions :-)

He he he.. :)

> For example using QuickSort instead of BubbleSort may be a good
> idea in some cases, even if it's much more complex and harder to
> understand.

Yep. Your point is well taken.. Which is why I'm +0 on something like
StringHolder or something that does a good job of avoiding garbage. :) No
opinion. :)

> I am very concerned about using Strings, but if int is too much we should
> use StringHolder or something else - creating 100000 objects is waste,
> and I don't think it's "understandable".

I mean "understandable" code wise -- not pardon wise. As long as the code is
clear when we do such things, I'm cool.

.duncan

Re: REDOM Design discussion.

Posted by Costin Manolache <co...@eng.sun.com>.

James Duncan Davidson wrote:

> on 7/11/00 7:46 AM, Costin Manolache at Costin.Manolache@eng.sun.com wrote:
>
> > One big issue ( IMHO) is to decide about the use of String/int/StringHolder.
> > In my experience this is a major performance factor. Using int instead
> > of String is uncomfortable for many people, and StringHolder (i.e.
> > a recyclable object) may be a good compromise or not. I'm +1 on
> > ints, but I have a feeling I'll be in minority.
>
> :) I'm -1 on going as far as ints.. I'd actually advocate using Strings up
> front and looking at how to better optimize those points that show as Hot in
> OptimizeIt (or whatever profiling tool you like) later -- but I'm solidly
> bought into the religion of avoiding premature optimizations, so I'm biased.
> I'm +0 on using something like StringHolder. As long as it's understandable,
> then I'm ok with it. :)

We both work for Sun and still have different opinions :-)

Yes, I agree with you fully on premature optimizations, but not every
optimization is premature and not everything that improves
performance is an optimization :-).
For example using QuickSort instead of BubbleSort may be a good
idea in some cases, even if it's much more complex and harder to
understand.
Not generating garbage is a good idea even if the GC is very fast -
recycling is good ( as it is in real life ).

I am very concerned about using Strings, but if int is too much we should
use StringHolder or something else - creating 100000 objects is waste,
and I don't think it's "understandable".

Costin

Re: REDOM Design discussion.

Posted by James Duncan Davidson <ja...@eng.sun.com>.

on 7/11/00 7:46 AM, Costin Manolache at Costin.Manolache@eng.sun.com wrote:

> One big issue ( IMHO) is to decide about the use of String/int/StringHolder.
> In my experience this is a major performance factor. Using int instead
> of String is uncomfortable for many people, and StringHolder (i.e.
> a recyclable object) may be a good compromise or not. I'm +1 on
> ints, but I have a feeling I'll be in minority.

:) I'm -1 on going as far as ints.. I'd actually advocate using Strings up
front and looking at how to better optimize those points that show as Hot in
OptimizeIt (or whatever profiling tool you like) later -- but I'm solidly
bought into the religion of avoiding premature optimizations, so I'm biased.
I'm +0 on using something like StringHolder. As long as it's understandable,
then I'm ok with it. :)

.duncan

Re: REDOM Design discussion.

Posted by Costin Manolache <co...@eng.sun.com>.

> I think that idea of using Class.forName() to load modules, and
> conditional compilation have a lot of potential, to what
> Arnaud has been pursuing with the DOM API.

I hope not only for DOM ( where it's probably the hardest to
do ), but for xerces in general: for example org.apache.xml.serialize,
wml, html, plus most of the features that are clearly demarcated.
Even some of the readers can be left out ( it seems there are 2
UTF readers, only one of them used; the other just adds to the jar size),
probably xcatalog, etc. The AUC package is also a good way to extract
all general-purpose code, leaving only the core API and the modules.
There is a lot of great code in xerces, and it should have a nice interface
that allows its use in other projects.

The main difficulty, as James pointed out, is to have a good set of
internal APIs that are easy to understand - that's what enables modularization.

Ed has a very good point - it would be great if we could use SAX2 plus a few
extensions,  or something close enough. We can then move from 2 directions -
moving xerces to use the new APIs ( and nothing else, no more internals !),
and building modules that are un-optimized or optimized for a different
target. It will be then a compile ( or runtime ) decision.
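
A minimal sketch of the runtime side of that decision, with made-up names
(Serializer, UpperCaseSerializer and ModuleLoader are not real Xerces or
Tomcat classes): the core refers only to an interface, and the implementation
is looked up by name with Class.forName() when it is actually needed.

```java
// Hypothetical names throughout; this is not Xerces or Tomcat code.
interface Serializer {
    String serialize(String text);
}

// An optional module. The core never names this class directly.
class UpperCaseSerializer implements Serializer {
    public String serialize(String text) {
        return text.toUpperCase();
    }
}

class ModuleLoader {
    // Load an optional module by class name. Returns null when the class
    // is not on the classpath, so the caller can skip the feature.
    static Serializer load(String className) {
        try {
            Class<?> c = Class.forName(className);
            return (Serializer) c.getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException e) {
            return null; // module was left out of the jar
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }
}
```

A jar built without the module simply yields null from load(), and nothing in
the core fails to link because the core only ever mentions the interface.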

One big issue ( IMHO) is to decide about the use of String/int/StringHolder.
In my experience this is a major performance factor. Using int instead
of String is uncomfortable for many people, and StringHolder (i.e.
a recyclable object) may be a good compromise or not. I'm +1 on
ints, but I have a feeling I'll be in minority.

Costin

> >One thing we did a lot in tomcat is use Class.forName() to load modules,
> >and
> >conditional compilation.
> >In DOM is much more difficult to demarcate "features", but at least if
> >there
> >are separated classes you can use an interface and load the class only if
> >needed.
> >
> >Costin