Posted to dev@geronimo.apache.org by Alex Blewitt <Al...@ioshq.com> on 2003/09/09 20:53:52 UTC

Re: Geronimo Deployment Descriptors -- and premature optimisation

On Tuesday, Sep 9, 2003, at 17:44 Europe/London, Jeremy Boynes wrote:

>> However, it doesn't necessarily mean that it can't generate the XML,
>> rather than a binary-compatible format that Jeremy was suggesting. An
>> XML document will always be more portable between versions than a
>> generated bunch of code, because when bugs are fixed in the latter you
>> have to regenerate, whereas with an XML file you don't.
>
> Please do not think I am thinking binary is the only way to go - that  
> notion
> was discarded back in EJB1.0 days. What I want is to have it as an  
> option.

Can I make a few observations here:

o Assumption: large XML files take a long time to parse, and therefore
the server will be slow to start up.
o Assumption: the way to solve that is with the deploy tool, and
possibly a combined XML+binary format.

I think there are other solutions to the problem than just these.
Whilst it is true that parsing the XML file can take some time, it is
not actually likely to be where most of the server's startup time goes.
If we had metrics to prove it, I'd shut up, but we don't.

I'd postulate that we would be able to fire up the server faster if we
used different optimisations: for example, a multi-threaded startup
(like that provided by Avalon) instead of a single-threaded model; an
on-the-fly (streaming) parse of the XML file instead of building a full
DOM/POJO tree; ditching the JMX layer and using plain Java method
calls; and so on.
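To illustrate the streaming option: a minimal sketch using the SAX API
that ships with JAXP could look something like this (the handler class
and the element name it watches for are made up, not proposals for real
Geronimo code):

import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical streaming handler: builds lightweight config objects as
// the elements stream past, instead of materialising a full DOM first.
public class DeploymentHandler extends DefaultHandler {
    public void startElement(String uri, String localName,
                             String qName, Attributes attrs) {
        if ("ejb-name".equals(qName)) {
            // start assembling the POJO for this bean entry here
        }
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        // one pass over the descriptor; memory use stays roughly
        // constant regardless of the size of the file
        parser.parse(new File("ejb-jar.xml"), new DeploymentHandler());
    }
}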

But we don't *know* that this is where the bottleneck is. It may be,  
and we can run tests to show that in a simple scenario, option A is  
faster than option B, but that doesn't mean that that's where the  
bottleneck will be in the server.

But if it takes (say) 10 or 100 times as long to dynamically create the
bean as it does to parse the XML, we are solving the wrong problem.
Don't get me wrong, I don't know how much time it takes to create a
bean -- but we don't seem to have any profiling to compare the various
options. It could even be the case that a more optimised XML parser
would solve the problem, or a different way of creating the POJOs.
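The sort of measurement I'm asking for is trivial to sketch; something
like the following, where parseDescriptor and createBeans are made-up
stand-ins for whatever the real deployment code does:

import java.io.File;

// Hypothetical micro-benchmark: time the two stages separately before
// deciding which one deserves optimising.
public class DeployTimer {
    static Object parseDescriptor(File f) {
        return new Object(); // stand-in for the XML -> metadata step
    }

    static void createBeans(Object metadata) {
        // stand-in for the metadata -> live beans step
    }

    public static void main(String[] args) {
        long t0 = System.currentTimeMillis();
        Object metadata = parseDescriptor(new File("ejb-jar.xml"));
        long t1 = System.currentTimeMillis();
        createBeans(metadata);
        long t2 = System.currentTimeMillis();
        System.out.println("parse:  " + (t1 - t0) + " ms");
        System.out.println("deploy: " + (t2 - t1) + " ms");
    }
}

If the second number dwarfs the first, arguing about descriptor formats
is beside the point.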

I'd also like to disagree that this optimisation should be done by the
deployer. Why not have it done by the server when the code is deployed?
Sure, you wouldn't want it to happen every time the server starts (like
compiling JSPs) -- so dump out a binary representation on the server
side, and drop that cache when the application gets redeployed. That
way, you still get the fast startup (2nd time onwards) whilst
maintaining portability and without pushing any of these issues onto
the developer.
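For illustration, the kind of server-side cache I mean is only a few
lines (all the names here are invented; the real metadata type would be
whatever the deployer produces):

import java.io.*;

// Hypothetical cache: parse the descriptor once, serialise the
// resulting metadata next to the deployment, and reuse it on later
// startups. Redeployment simply deletes the cache file.
public class DescriptorCache {

    public static Serializable load(File descriptor, File cacheFile)
            throws Exception {
        // reuse the cache only if it is newer than the descriptor
        if (cacheFile.exists()
                && cacheFile.lastModified() >= descriptor.lastModified()) {
            ObjectInputStream in =
                new ObjectInputStream(new FileInputStream(cacheFile));
            try {
                return (Serializable) in.readObject();
            } finally {
                in.close();
            }
        }
        Serializable metadata = parseDescriptor(descriptor); // slow path
        ObjectOutputStream out =
            new ObjectOutputStream(new FileOutputStream(cacheFile));
        try {
            out.writeObject(metadata);
        } finally {
            out.close();
        }
        return metadata;
    }

    public static void invalidate(File cacheFile) {
        cacheFile.delete(); // called when the application is redeployed
    }

    static Serializable parseDescriptor(File f) {
        // stand-in for the real XML -> metadata step
        return new java.util.HashMap();
    }
}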

> For example, parsing the XML with
> full schema validation is a dog - on my machine even a simple file  
> takes a
> couple of seconds and a couple of MB and I am concerned about a) large
> applications with hundreds of modules taking forever to start, and b)
> applications trying to run with constrained resources. And yes, we do  
> need
> to consider these things :-)

But if you had that large an application, how long would you expect it
to take to start? Realistically, what is the largest size of app you've
had to deal with? Most web-apps have just a single servlet these days
(a la Struts), so the only issue is with EJBs, and even with 1000 EJBs
you're still looking at 1k of data/EJB to make a 1MB file. That's a
hell of a lot. And do we know how long it takes to deploy 1000 EJBs
once the XML file has loaded? Are we seriously saying that we expect
that part of the process to take dramatically less than 2s? If not,
then the bottleneck isn't going to be at the XML parsing stage.

> We have also had proposals for storing configuration information in  
> LDAP
> repositories and relational databases, neither of which would allow
> vi-style access to the XML. A binary format may well be a better  
> option for
> them.

IMHO, 'vi'-style access is not the sole reason to use XML. I am
personally more a fan of storing the configuration in LDAP, which will
be slower still than having it in XML files. But I wanted to raise a
big 'no' to a binary file format, including any serialized form of
MBeans, which would be really difficult to interpret if we ever managed
to break away from JMX. No, I don't think it will happen soon, but I
can hope :-) See Elliotte's comments on XML and binary at
http://www.cafeconleche.org/books/effectivexml/chapters/50.html (or the
cached version at
http://216.239.41.104/search?q=cache:oxknzyhXE9MJ:www.cafeconleche.org/books/effectivexml/chapters/50.html+%22Compress+if+space+is+a+problem%22&hl=en&ie=UTF-8
since I couldn't see it on the former).
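For what it's worth, reading configuration out of LDAP through JNDI is
simple enough -- roughly like the following, where the provider URL, DN
and attribute name are invented purely for illustration:

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Hypothetical sketch of pulling deployment configuration from LDAP
// via JNDI; the URL, DN and attribute name are illustrative only.
public class LdapConfigExample {
    public static void main(String[] args) throws Exception {
        Hashtable env = new Hashtable();
        env.put(Context.INITIAL_CONTEXT_FACTORY,
                "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://localhost:389");

        DirContext ctx = new InitialDirContext(env);
        Attributes attrs = ctx.getAttributes(
            "cn=MyEjbModule,ou=deployments,dc=example,dc=com");
        System.out.println("jndi-name: " + attrs.get("jndiName").get());
        ctx.close();
    }
}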

> Think of it like JSP: some people want to pre-compile, and this is  
> *very*
> common in production environments.

I don't see the two as being that comparable. A site may have many
hundreds of JSPs with several KB of data in each, and they take
(relatively speaking) a long time to parse, translate, and then
compile. I don't see parsing an ejb-jar.xml file as being in the same
order of magnitude.

I don't disagree that we can cache an internal form to speed things up;
I just don't think it should be something the deployment tool produces.
Same with JSPs: we can upload them into Geronimo, and then a background
process can pre-compile them when resources are available. I don't
think we should force the developer to decide between the two. [What
other JSP engines get wrong is assuming it's necessary to precompile
all JSPs before deployment. It's not; they just need to be compiled
before the user sees them. The process should be: deploy -> run app ->
precompile the JSPs the user could navigate to next.]
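To sketch what that background process could look like (everything here
is illustrative, not a proposal for real class names):

import java.util.Iterator;
import java.util.List;

// Hypothetical background precompiler: compile JSPs after deployment
// at low priority, so a user only pays the compile cost for a page the
// background thread has not reached yet.
public class BackgroundJspCompiler implements Runnable {
    private final List jspPaths; // paths discovered at deploy time

    public BackgroundJspCompiler(List jspPaths) {
        this.jspPaths = jspPaths;
    }

    public void run() {
        for (Iterator i = jspPaths.iterator(); i.hasNext();) {
            String path = (String) i.next();
            compileIfNeeded(path); // no-op if a request compiled it already
        }
    }

    void compileIfNeeded(String path) {
        // stand-in for the container's JSP -> servlet translation step
    }

    public static void start(List jspPaths) {
        Thread t = new Thread(new BackgroundJspCompiler(jspPaths),
                              "jsp-precompile");
        t.setPriority(Thread.MIN_PRIORITY); // yield to request threads
        t.setDaemon(true);
        t.start();
    }
}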

Premature optimisation is the root of all evil.

Alex.


Re: Geronimo Deployment Descriptors -- and premature optimisation

Posted by Bill de hÓra <bi...@dehora.net>.
Dain Sundstrom wrote:
> Alex in an embedded system you may only have a small prom available to  
> boot from.  Specifically, you may want to ditch the 1 meg for an XML  
> parser.  You may also have a small amount of memory and XML parsers are  
> known for being memory pigs (which is fine in a normal server).

Dain,

I'd make the argument that any embedded system that needed to ditch 
an XML parser probably has no business running a J2EE stack. *

Fwiw, the answer to this problem is to eliminate implementation-specific
descriptors where possible. To paraphrase jwz: some people, when
confronted with a vendor-specific J2EE descriptor, think
“I know, I’ll use a deployment tool.” Now they have two problems.


Bill de hÓra

* [It's possible, btw, to make small-footprint, fast XML parsers
(such as aelfred or piccolo). I have no idea why a number of open
source ones insist on piling on the cruft, but they do.]


Re: Geronimo Deployment Descriptors -- and premature optimisation

Posted by Alex Blewitt <Al...@ioshq.com>.
On Tuesday, Sep 9, 2003, at 22:35 Europe/London, Jeremy Boynes wrote:

>> From: Alex Blewitt [mailto:Alex.Blewitt@ioshq.com]
>> Yes, provided that those forms are a declarative non-binary
>> representation for the reasons already pointed out. As I said, I'm not
>> a definitive fan of XML; I'd prefer to see JNDI/LDAP being used
>> personally, but there are bound to be other alternatives. I was merely
>> pointing out the flaws in the argument; that it was optimisation
>> without benchmarking proving the case in Geronimo's loading time; that
>> programmatic specs degrade faster than declarative specs, and that
>> binary formats degrade faster than text formats.
>>
>> I think that that's a pretty open PoV :-)
>
> I originally said:
>
> "This allows us, if we wish, to pre-compile the configuration 
> information
> into other forms.

My interpretation of 'compile' was 'translate into a binary format'. It
could, of course, mean translating into any format, not just binary.

> For example, it could be an archive of serialized MBean
> states that can simply be unmarshalled by the server and started.

Yes, this is one possible way of doing it, but I tried to address the
risks that such a binary approach could entail.
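The core of that risk is easy to show. A serialized-state archive is
tied to the exact shape of the classes that wrote it; a made-up example
(not anything in Geronimo) illustrates the point:

import java.io.Serializable;

// Hypothetical class standing in for a serialized MBean state. If a
// later server release changes this class's fields without careful
// serialVersionUID management, archives written by the old release
// fail to unmarshal with an InvalidClassException -- whereas an XML
// descriptor would simply be re-parsed by the new release.
public class ContainerState implements Serializable {
    // Leave this out and the JVM derives it from the class shape, so
    // any field change invalidates previously written archives.
    private static final long serialVersionUID = 1L;

    private String ejbName;
    private String jndiName;
    // adding, removing or retyping fields changes what old archives need
}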

> This has
> many potential advantages, such as reducing the startup time for very 
> large
> applications or reducing the resources required for an embedded 
> server."

I did disagree with the claim that it will reduce the startup time;
that is based on nothing but speculation.

It also may not necessarily reduce the resources required for an
embedded server; if, for example, other aspects of the core require an
XML parser to be present anyway, then the footprint may stay the same.

However, I do agree that there are several formats and options we could 
use with advantages and disadvantages.

> which is hardly advocating excessive premature optimization, just
> illustrating a possible future path.

I believe that avoiding XML because of feared performance problems is
premature optimisation. I don't believe that I said 'excessive', though
I may have done.

And, whilst it is good to explore other ideas and issues, it's also
good to be able to discuss the advantages and disadvantages of the
approaches. That isn't to say that this approach shouldn't or couldn't
be taken, but there are some disadvantages that the advantages may not
outweigh.

> Please do not misrepresent me.

I do not recall misrepresenting you, but I apologise if I did. I merely 
tried to point out some problems with a compiled and/or binary approach.

Alex.


RE: Geronimo Deployment Descriptors -- and premature optimisation

Posted by Jeremy Boynes <je...@coredevelopers.net>.
> From: Alex Blewitt [mailto:Alex.Blewitt@ioshq.com]
> Sent: Tuesday, September 09, 2003 1:32 PM
>
> On Tuesday, Sep 9, 2003, at 21:22 Europe/London, Dain Sundstrom wrote:
>
> > Alex in an embedded system you may only have a small prom available to
> > boot from.  Specifically, you may want to ditch the 1 meg for an XML
> > parser.  You may also have a small amount of memory and XML parsers
> > are known for being memory pigs (which is fine in a normal server).
>
> You can't seriously be comparing Geronimo's kernel to that of an
> embedded kernel, can you? Never mind that the JDK in its current form
> is way too big to fit on a small device.
>
> If you want to write Geronimo for J2ME, go ahead.
>
> If we're using J2SE, especially 1.4, then we can use the XML parsers
> that it comes with.
>
> > Also we need to be open to other persistent forms.  The XML document
> > is simply a persistent form of data and we need to be open to other
> > persistent forms.
>
> Yes, provided that those forms are a declarative non-binary
> representation for the reasons already pointed out. As I said, I'm not
> a definitive fan of XML; I'd prefer to see JNDI/LDAP being used
> personally, but there are bound to be other alternatives. I was merely
> pointing out the flaws in the argument; that it was optimisation
> without benchmarking proving the case in Geronimo's loading time; that
> programmatic specs degrade faster than declarative specs, and that
> binary formats degrade faster than text formats.
>
> I think that that's a pretty open PoV :-)
>

I originally said:

"This allows us, if we wish, to pre-compile the configuration information
into other forms. For example, it could be an archive of serialized MBean
states that can simply be unmarshalled by the server and started. This has
many potential advantages, such as reducing the startup time for very large
applications or reducing the resources required for an embedded server."

which is hardly advocating excessive premature optimization, just
illustrating a possible future path.

Please do not misrepresent me.

--
Jeremy


Re: Geronimo Deployment Descriptors -- and premature optimisation

Posted by Alex Blewitt <Al...@ioshq.com>.
On Tuesday, Sep 9, 2003, at 21:22 Europe/London, Dain Sundstrom wrote:

> Alex in an embedded system you may only have a small prom available to 
> boot from.  Specifically, you may want to ditch the 1 meg for an XML 
> parser.  You may also have a small amount of memory and XML parsers 
> are known for being memory pigs (which is fine in a normal server).

You can't seriously be comparing Geronimo's kernel to that of an 
embedded kernel, can you? Never mind that the JDK in its current form 
is way too big to fit on a small device.

If you want to write Geronimo for J2ME, go ahead.

If we're using J2SE, especially 1.4, then we can use the XML parsers
that it comes with.
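For example, on J2SE 1.4 something like this works out of the box, with
no extra parser jar on the classpath (the file name is just an example):

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Minimal use of the XML parser bundled with J2SE 1.4 via JAXP.
public class BundledParserExample {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory =
            DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new File("ejb-jar.xml"));
        System.out.println("root element: "
            + doc.getDocumentElement().getTagName());
    }
}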

> Also we need to be open to other persistent forms.  The XML document 
> is simply a persistent form of data and we need to be open to other 
> persistent forms.

Yes, provided that those forms are a declarative, non-binary
representation, for the reasons already pointed out. As I said, I'm not
a definitive fan of XML; I'd prefer to see JNDI/LDAP being used
personally, but there are bound to be other alternatives. I was merely
pointing out the flaws in the argument: that it was optimisation
without benchmarks proving the case for Geronimo's loading time; that
programmatic specs degrade faster than declarative specs; and that
binary formats degrade faster than text formats.

I think that that's a pretty open PoV :-)

Alex.


Re: Geronimo Deployment Descriptors -- and premature optimisation

Posted by Dain Sundstrom <da...@coredevelopers.net>.
Alex, in an embedded system you may only have a small PROM available to
boot from. Specifically, you may want to ditch the 1 meg for an XML
parser. You may also have a small amount of memory, and XML parsers are
known for being memory pigs (which is fine in a normal server).

Also we need to be open to other persistent forms.  The XML document is  
simply a persistent form of data and we need to be open to other  
persistent forms.

-dain

/*************************
  * Dain Sundstrom
  * Partner
  * Core Developers Network
  *************************/