Posted to dev@geronimo.apache.org by Alex Blewitt <Al...@ioshq.com> on 2003/09/09 20:53:52 UTC
Re: Geronimo Deployment Descriptors -- and premature optimisation
On Tuesday, Sep 9, 2003, at 17:44 Europe/London, Jeremy Boynes wrote:
>> However, it doesn't necessarily mean that it can't generate the XML,
>> rather than a binary-compatible format that Jeremy was suggesting. An
>> XML document will always be more portable between versions than a
>> generated bunch of code, because when bugs are fixed in the latter you
>> have to regenerate, whereas with an XML file you don't.
>
> Please do not think I am thinking binary is the only way to go - that
> notion was discarded back in EJB1.0 days. What I want is to have it as an
> option.
Can I make a few observations here:
o Assumption: large XML files take a long time to parse, therefore the
server will be slow to start up
o Assumption: the way to solve that is with the deploy tool, and
possibly a combined XML+binary format.
I think there are other solutions to the problem than just these.
Whilst it is true that XML parsing can take some time, it is not
likely to be where most of the server's startup time goes. If we had
metrics to prove otherwise, I'd shut up, but we don't.
I'd postulate that we could fire up the server faster with different
optimisations: for example, a multi-threaded startup (as provided by
Avalon) instead of a single-threaded model; parsing the XML file on the
fly instead of into a DOM/POJO; ditching the JMX layer in favour of
plain Java method calls; and so on.
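To make the "on-the-fly parse" option concrete, here is a minimal sketch using the SAX API that ships with J2SE (JAXP). It streams a descriptor and handles each `<ejb-name>` element as it is read, instead of first building a DOM tree. The class name `StreamingDescriptorReader` is hypothetical, and collecting names merely stands in for whatever per-bean configuration a server would actually do at that point.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical reader: handles each ejb-name as it streams past, with no DOM tree in memory. */
class StreamingDescriptorReader extends DefaultHandler {
    private final List<String> names = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inName;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("ejb-name".equals(qName)) {
            inName = true;
            text.setLength(0);  // start collecting this element's text
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inName) {
            text.append(ch, start, length);  // may be called several times per element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("ejb-name".equals(qName)) {
            // Placeholder: a server could configure the bean right here.
            names.add(text.toString().trim());
            inName = false;
        }
    }

    /** Parses the descriptor in one streaming pass and returns bean names in document order. */
    static List<String> readNames(byte[] xml) throws Exception {
        StreamingDescriptorReader handler = new StreamingDescriptorReader();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml), handler);
        return handler.names;
    }
}
```

With this approach, memory use depends on what the handler chooses to keep rather than on the size of a full document tree.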
But we don't *know* that this is where the bottleneck is. It may be,
and we can run tests to show that in a simple scenario, option A is
faster than option B, but that doesn't mean that that's where the
bottleneck will be in the server.
But if it takes (say) 10 or 100 times as long to dynamically create the
beans as it does to parse the XML, we are solving the wrong problem.
Don't get me wrong, I don't know how much time it takes to create a
bean -- but we don't seem to have any profiling to choose between the
various options. It could even be the case that a more optimised XML
parser would solve the problem, or a different way of creating the
POJOs.
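Getting the missing numbers need not take much. The sketch below is a starting point under stated assumptions, not Geronimo code: it times XML parsing and a per-bean step as separate phases, so the slower phase is measured rather than guessed. The descriptor is synthetic (only `<session>`/`<ejb-name>` elements), and the "create beans" phase is a placeholder that must be replaced with the server's real bean construction before the figures mean anything. A single run is also sensitive to JVM warm-up, so repeat it before trusting the results.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical harness: times each startup phase separately. */
class StartupProfile {
    /** Runs one phase, prints its wall-clock time, and returns its result. */
    static <T> T timed(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        T result = work.get();
        System.out.printf("%s: %.1f ms%n", phase, (System.nanoTime() - start) / 1e6);
        return result;
    }

    /** Synthetic descriptor with n beans; the element layout is illustrative only. */
    static byte[] syntheticDescriptor(int n) {
        StringBuilder sb = new StringBuilder("<ejb-jar><enterprise-beans>");
        for (int i = 0; i < n; i++) {
            sb.append("<session><ejb-name>Bean").append(i).append("</ejb-name></session>");
        }
        return sb.append("</enterprise-beans></ejb-jar>").toString().getBytes();
    }

    /** Parses into a DOM using the JAXP parser bundled with the JDK. */
    static Document parse(byte[] xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = syntheticDescriptor(1000);  // built outside the timed phase
        Document doc = timed("parse", () -> parse(xml));
        List<Object> beans = timed("create beans", () -> {
            NodeList names = doc.getElementsByTagName("ejb-name");
            List<Object> created = new ArrayList<>();
            for (int i = 0; i < names.getLength(); i++) {
                // Placeholder: swap in the server's real (e.g. reflective) bean construction.
                created.add(names.item(i).getTextContent());
            }
            return created;
        });
        System.out.println(beans.size() + " beans");
    }
}
```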
I'd also like to disagree that this optimisation should be done by the
deployer. Why not have the server do it when the code is deployed?
Sure, you wouldn't want it to happen every time the server starts (as
with compiling JSPs), so dump out a binary representation on the server
side and drop that cache when the application gets redeployed. That way
you still get the fast startup (from the 2nd time onwards) whilst
maintaining portability and without burdening the developer.
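A minimal sketch of that server-side cache, assuming a hypothetical `DescriptorCache` and a caller-supplied `compiler` function that turns descriptor bytes into some serializable internal form. The XML stays the portable source of truth; the cached form is keyed by a SHA-256 digest of the descriptor, so an edited descriptor is recompiled automatically, and `invalidate` is what a redeploy would call.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.Function;

/** Hypothetical server-side cache of a compiled descriptor form. */
class DescriptorCache {
    private final Path cacheDir;

    DescriptorCache(Path cacheDir) {
        this.cacheDir = cacheDir;
    }

    /** Returns the cached form if the descriptor is unchanged; otherwise compiles and caches it. */
    public <T extends Serializable> T load(String appId, byte[] descriptor,
                                           Function<byte[], T> compiler) throws IOException {
        String digest = sha256(descriptor);
        Path file = cacheDir.resolve(appId + ".cache");
        if (Files.exists(file)) {
            try (InputStream raw = Files.newInputStream(file);
                 ObjectInputStream in = new ObjectInputStream(raw)) {
                if (digest.equals(in.readUTF())) {
                    @SuppressWarnings("unchecked")
                    T cached = (T) in.readObject();
                    return cached;
                }
            } catch (ClassNotFoundException | ObjectStreamException e) {
                // Stale or incompatible entry (e.g. classes changed): rebuild from the XML.
            }
        }
        T compiled = compiler.apply(descriptor);
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeUTF(digest);
            out.writeObject(compiled);
        }
        return compiled;
    }

    /** Called on redeploy: drop the cached form so it is rebuilt from the XML. */
    public void invalidate(String appId) throws IOException {
        Files.deleteIfExists(cacheDir.resolve(appId + ".cache"));
    }

    private static String sha256(byte[] data) {
        try {
            StringBuilder sb = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-256").digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the binary form never leaves the server and is rebuilt from the XML whenever it is missing, outdated, or fails to deserialize (for instance with an `InvalidClassException` after a class change), its format can change between server versions without affecting the portable descriptor.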
> For example, parsing the XML with
> full schema validation is a dog - on my machine even a simple file takes a
> couple of seconds and a couple of MB and I am concerned about a) large
> applications with hundreds of modules taking forever to start, and b)
> applications trying to run with constrained resources. And yes, we do need
> to consider these things :-)
But if you had that large an application, how long would you expect it
to take to start up? Realistically, what is the largest app you've had
to deal with? Most web-apps have just a single servlet these days (à la
Struts), so the only issue is with EJBs, and even with 1000 EJBs you'd
need 1k of data per EJB to make a 1MB file. That's a hell of a lot. And
do we know how long it takes to deploy 1000 EJBs once the XML file has
loaded? Are we seriously saying that we expect that part of the process
to take dramatically less than 2s? If not, then the bottleneck isn't
going to be at the XML parsing stage.
> We have also had proposals for storing configuration information in
> LDAP repositories and relational databases, neither of which would allow
> vi-style access to the XML. A binary format may well be a better option
> for them.
IMHO, 'vi'-style access isn't the only reason to use XML. I am
personally more a fan of storing the configuration in LDAP, which will
be slower still than having it in XML files. But I wanted to raise a
big 'no' to a binary file format, including any serialized form of
MBeans, which would be very difficult to interpret if we ever managed
to break away from JMX. No, I don't think that will happen soon, but I
can hope :-) See Elliotte's comments on XML and binary at
http://www.cafeconleche.org/books/effectivexml/chapters/50.html (or the
cached version at
http://216.239.41.104/search?q=cache:oxknzyhXE9MJ:www.cafeconleche.org/books/effectivexml/chapters/50.html+%22Compress+if+space+is+a+problem%22&hl=en&ie=UTF-8
since I couldn't see it on the former).
> Think of it like JSP: some people want to pre-compile, and this is *very*
> common in production environments.
I don't see the two as being that comparable. A site may have many
hundreds of JSPs with several k of data in each, and they take
(relatively speaking) a long time to parse, translate, and then
compile. I don't see parsing an EJB-JAR.xml file as being in the same
order of magnitude.
I don't disagree that we can cache an internal form to speed up
startup; I just don't think producing it should be the deployment
tool's job. Same with JSPs: we can upload them into Geronimo, and then
a background process can pre-compile them when resources are available.
I don't think we should force the developer to decide between the two.
[What other JSP engines get wrong is assuming that all JSPs must be
precompiled before deployment. They don't; they just need to be compiled
before the user sees them. The process should be: deploy -> run app ->
precompile all the JSPs that the user could move to next.]
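The deploy -> run app -> precompile flow can be sketched with a concurrent map: a background thread works through a list of pages, while a request for a page the background pass has not reached yet compiles it on demand. `ConcurrentHashMap.computeIfAbsent` applies the mapping function at most once per key, so no page is compiled twice even when the two paths race. `LazyJspCompiler` and its `compiler` function are hypothetical stand-ins for a real JSP engine; the page list could be, as suggested above, the pages reachable from the current one.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical lazy precompiler: background pass plus compile-on-first-request. */
class LazyJspCompiler {
    private final ConcurrentHashMap<String, Object> compiled = new ConcurrentHashMap<>();
    private final Function<String, Object> compiler;  // hypothetical: JSP path -> compiled servlet
    private final ExecutorService background = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "jsp-precompile");
        t.setDaemon(true);  // never keeps the server alive on its own
        return t;
    });

    LazyJspCompiler(Function<String, Object> compiler) {
        this.compiler = compiler;
    }

    /** Request path: compile on demand if the background pass hasn't reached this page yet. */
    public Object get(String jsp) {
        return compiled.computeIfAbsent(jsp, compiler);
    }

    /** After deployment: queue pages for precompilation when resources allow. */
    public void precompileInBackground(List<String> jsps) {
        for (String jsp : jsps) {
            background.execute(() -> get(jsp));
        }
    }

    /** Stops accepting new work and waits for queued precompilation to finish. */
    public boolean finishBackground(long timeout, TimeUnit unit) throws InterruptedException {
        background.shutdown();
        return background.awaitTermination(timeout, unit);
    }
}
```

For genuinely slow compilations, a per-page `FutureTask` would avoid holding the map's internal lock while compiling; plain `computeIfAbsent` keeps the sketch short.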
Premature optimisation is the root of all evil.
Alex.
Re: Geronimo Deployment Descriptors -- and premature optimisation
Posted by Bill de hÓra <bi...@dehora.net>.
Dain Sundstrom wrote:
> Alex in an embedded system you may only have a small prom available to
> boot from. Specifically, you may want to ditch the 1 meg for an XML
> parser. You may also have a small amount of memory and XML parsers are
> known for being memory pigs (which is fine in a normal server).
Dain,
I'd make the argument that any embedded system that needed to ditch
an XML parser probably has no business running a J2EE stack. *
Fwiw, the answer to this problem is to eliminate implementation-specific
descriptors where possible. To paraphrase jwz: some people, when
confronted with a vendor-specific J2EE descriptor, think “I know, I’ll
use a deployment tool.” Now they have two problems.
Bill de hÓra
* [It's possible, btw, to make small-footprint, fast XML parsers
(such as aelfred or piccolo). I have no idea why a number of open
source ones insist on piling on the cruft, but they do.]
Re: Geronimo Deployment Descriptors -- and premature optimisation
Posted by Alex Blewitt <Al...@ioshq.com>.
On Tuesday, Sep 9, 2003, at 22:35 Europe/London, Jeremy Boynes wrote:
>> From: Alex Blewitt [mailto:Alex.Blewitt@ioshq.com]
>> Yes, provided that those forms are a declarative non-binary
>> representation for the reasons already pointed out. As I said, I'm not
>> a definitive fan of XML; I'd prefer to see JNDI/LDAP being used
>> personally, but there are bound to be other alternatives. I was merely
>> pointing out the flaws in the argument; that it was optimisation
>> without benchmarking proving the case in Geronimo's loading time; that
>> programmatic specs degrade faster than declarative specs, and that
>> binary formats degrade faster than text formats.
>>
>> I think that that's a pretty open PoV :-)
>
> I originally said:
>
> "This allows us, if we wish, to pre-compile the configuration information
> into other forms.
My interpretation of 'compile' was 'translate into a binary format'. It
could, of course, mean translating to any formats, not just binary.
> For example, it could be an archive of serialized MBean
> states that can simply be unmarshalled by the server and started.
Yes, this is one possible way of doing it, but I tried to point out the
risks of taking a binary approach.
> This has many potential advantages, such as reducing the startup time for
> very large applications or reducing the resources required for an embedded
> server."
I did disagree with the claim that it will reduce the startup time; that
claim is based on nothing but speculation.
It also may not necessarily reduce the resources required for an
embedded server: if, for example, other aspects of the core require an
XML parser to be present, then the size may stay the same.
However, I do agree that there are several formats and options we could
use with advantages and disadvantages.
> which is hardly advocating excessive premature optimization, just
> illustrating a possible future path.
I believe that avoiding using XML because of feared performance is
premature optimisation. I don't believe that I said 'excessive', though
I may have done.
And whilst it is good to explore other ideas and issues, it's also
good to be able to discuss the advantages and disadvantages of the
approaches. That isn't to say that this approach shouldn't or couldn't
be taken, but it has some disadvantages that its advantages may not
outweigh.
> Please do not misrepresent me.
I do not recall misrepresenting you, but I apologise if I did. I merely
tried to point out some problems with a compiled and/or binary approach.
Alex.
RE: Geronimo Deployment Descriptors -- and premature optimisation
Posted by Jeremy Boynes <je...@coredevelopers.net>.
> From: Alex Blewitt [mailto:Alex.Blewitt@ioshq.com]
> Sent: Tuesday, September 09, 2003 1:32 PM
>
> On Tuesday, Sep 9, 2003, at 21:22 Europe/London, Dain Sundstrom wrote:
>
> > Alex in an embedded system you may only have a small prom available to
> > boot from. Specifically, you may want to ditch the 1 meg for an XML
> > parser. You may also have a small amount of memory and XML parsers
> > are known for being memory pigs (which is fine in a normal server).
>
> You can't seriously be comparing Geronimo's kernel to that of an
> embedded kernel, can you? Never mind that the JDK in its current form
> is way too big to fit on a small device.
>
> If you want to write Geronimo for J2ME, go ahead.
>
> If we're using J2SE, especially 1.4, then we can use the XML parsers
> that it comes with.
>
> > Also we need to be open to other persistent forms. The XML document
> > is simply a persistent form of data and we need to be open to other
> > persistent forms.
>
> Yes, provided that those forms are a declarative non-binary
> representation for the reasons already pointed out. As I said, I'm not
> a definitive fan of XML; I'd prefer to see JNDI/LDAP being used
> personally, but there are bound to be other alternatives. I was merely
> pointing out the flaws in the argument; that it was optimisation
> without benchmarking proving the case in Geronimo's loading time; that
> programmatic specs degrade faster than declarative specs, and that
> binary formats degrade faster than text formats.
>
> I think that that's a pretty open PoV :-)
>
I originally said:
"This allows us, if we wish, to pre-compile the configuration information
into other forms. For example, it could be an archive of serialized MBean
states that can simply be unmarshalled by the server and started. This has
many potential advantages, such as reducing the startup time for very large
applications or reducing the resources required for an embedded server."
which is hardly advocating excessive premature optimization, just
illustrating a possible future path.
Please do not misrepresent me.
--
Jeremy
Re: Geronimo Deployment Descriptors -- and premature optimisation
Posted by Alex Blewitt <Al...@ioshq.com>.
On Tuesday, Sep 9, 2003, at 21:22 Europe/London, Dain Sundstrom wrote:
> Alex in an embedded system you may only have a small prom available to
> boot from. Specifically, you may want to ditch the 1 meg for an XML
> parser. You may also have a small amount of memory and XML parsers
> are known for being memory pigs (which is fine in a normal server).
You can't seriously be comparing Geronimo's kernel to that of an
embedded kernel, can you? Never mind that the JDK in its current form
is way too big to fit on a small device.
If you want to write Geronimo for J2ME, go ahead.
If we're using J2SE, especially 1.4, then we can use the XML parsers
that it comes with.
> Also we need to be open to other persistent forms. The XML document
> is simply a persistent form of data and we need to be open to other
> persistent forms.
Yes, provided that those forms are a declarative non-binary
representation for the reasons already pointed out. As I said, I'm not
a definitive fan of XML; I'd prefer to see JNDI/LDAP being used
personally, but there are bound to be other alternatives. I was merely
pointing out the flaws in the argument; that it was optimisation
without benchmarking proving the case in Geronimo's loading time; that
programmatic specs degrade faster than declarative specs, and that
binary formats degrade faster than text formats.
I think that that's a pretty open PoV :-)
Alex.
Re: Geronimo Deployment Descriptors -- and premature optimisation
Posted by Dain Sundstrom <da...@coredevelopers.net>.
Alex in an embedded system you may only have a small prom available to
boot from. Specifically, you may want to ditch the 1 meg for an XML
parser. You may also have a small amount of memory and XML parsers are
known for being memory pigs (which is fine in a normal server).
Also we need to be open to other persistent forms. The XML document is
simply a persistent form of data and we need to be open to other
persistent forms.
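As a sketch of what being open to other persistent forms could look like, assume a hypothetical `ConfigurationStore` interface: the server codes against the interface, and each backend decides how the settings are persisted. Only a declarative, text-based `java.util.Properties` backend is shown; XML, LDAP or JDBC backends could implement the same two methods. None of these names are existing Geronimo API.

```java
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical abstraction: configuration data independent of how it is persisted. */
interface ConfigurationStore {
    void store(String module, Map<String, String> settings) throws IOException;

    Map<String, String> load(String module) throws IOException;
}

/** One declarative, human-readable backend: one .properties file per module. */
class PropertiesStore implements ConfigurationStore {
    private final File dir;

    PropertiesStore(File dir) {
        this.dir = dir;
    }

    public void store(String module, Map<String, String> settings) throws IOException {
        Properties p = new Properties();
        p.putAll(settings);
        try (Writer w = new FileWriter(new File(dir, module + ".properties"))) {
            p.store(w, module);  // plain text, editable with vi
        }
    }

    public Map<String, String> load(String module) throws IOException {
        Properties p = new Properties();
        try (Reader r = new FileReader(new File(dir, module + ".properties"))) {
            p.load(r);
        }
        Map<String, String> result = new TreeMap<>();
        for (String key : p.stringPropertyNames()) {
            result.put(key, p.getProperty(key));
        }
        return result;
    }
}
```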
-dain
On Tuesday, September 9, 2003, at 01:53 PM, Alex Blewitt wrote:
/*************************
* Dain Sundstrom
* Partner
* Core Developers Network
*************************/