Posted to dev@geronimo.apache.org by James Strachan <ja...@yahoo.co.uk> on 2003/09/01 15:02:53 UTC

Re: [XML Parsing]

On Sunday, August 31, 2003, at 09:19  pm, Davanum Srinivas wrote:

> How about using digester? - 
> http://jakarta.apache.org/commons/digester.html

If we go the POJO route (as described in the [vote] XML Parsing thread) 
then using Betwixt (which uses Digester) is my preferred route. We can 
then let Xerces take care of the XSD validation.
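James's point about leaving XSD validation to Xerces can be sketched with the JAXP validation API (javax.xml.validation, in the JDK since Java 5, so newer than this thread; the JDK's bundled implementation is derived from Xerces). This is a minimal sketch under stated assumptions: the schema is a hypothetical, namespace-free toy covering only the ejb-jar fragment discussed in this thread, not the real J2EE ejb-jar schema, and XsdCheck is an illustrative name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XsdCheck {
    // Hypothetical toy schema: <ejb-jar> with a required version attribute and
    // any number of <description lang="..."> children. Not the real J2EE schema.
    static final String TOY_EJB_JAR_XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"ejb-jar\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"description\" minOccurs=\"0\" maxOccurs=\"unbounded\">"
      + "<xs:complexType><xs:simpleContent><xs:extension base=\"xs:string\">"
      + "<xs:attribute name=\"lang\" type=\"xs:string\"/>"
      + "</xs:extension></xs:simpleContent></xs:complexType></xs:element>"
      + "</xs:sequence>"
      + "<xs:attribute name=\"version\" type=\"xs:string\" use=\"required\"/>"
      + "</xs:complexType></xs:element></xs:schema>";

    // Returns true if xml validates against xsd. Any SAXException (including one
    // raised while compiling a malformed schema) is reported as "not valid".
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXParseException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException("could not read input", e);
        }
    }
}
```

The point of the design is that the binding layer (Betwixt or anything else) never has to check structure itself; the parser rejects a descriptor missing the required version attribute before any beans are built.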

James
-------
http://radio.weblogs.com/0112098/


Re: [XML Parsing]

Posted by James Strachan <ja...@yahoo.co.uk>.
On Monday, September 1, 2003, at 02:28  pm, Aaron Mulder wrote:

> On Mon, 1 Sep 2003, James Strachan wrote:
>> If we go the POJO route (as described in the [vote] XML Parsing thread)
>> then using Betwixt (which uses Digester) is my preferred route. We can
>> then let Xerces take care of the XSD validation.
>
> 	I'm new to Betwixt, but it looks like we have two options.  One is
> to define a .betwixt file for every POJO

No, that's only if you want to customize things on a per-class basis.


> , and the other is to have one big
> class which sets up all the Betwixt mappings in code.

No. By default Betwixt will do this all for you using sensible defaults.


> I was hoping for
> the middle solution -- one mapping file listing all of the mappings.
> Which path do you recommend?

Out of the box Betwixt can handle the marshalling for you. You can
instantiate your own XMLIntrospector and customize it to make the XML
look how you want (e.g. for J2EE that means using elements rather than
attributes and so forth). Then just go right ahead and marshal.
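To illustrate the default-mapping idea (derive the XML from the bean's properties, written as elements rather than attributes), here is a JDK-only sketch. It is not Betwixt's API: it uses java.beans.Introspector to find readable properties and turns camelCase names into J2EE-style hyphenated element names. SessionInfo and IntrospectingWriter are hypothetical names; Betwixt's real classes (XMLIntrospector, BeanWriter, BeanReader) handle far more, including collections and reading XML back into beans.

```java
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;

// Hypothetical example bean; not part of Geronimo or Betwixt.
class SessionInfo {
    private String ejbName;
    private String ejbClass;
    public String getEjbName() { return ejbName; }
    public void setEjbName(String ejbName) { this.ejbName = ejbName; }
    public String getEjbClass() { return ejbClass; }
    public void setEjbClass(String ejbClass) { this.ejbClass = ejbClass; }
}

// Introspection-driven marshalling: every readable, non-null property
// becomes a child element of the root. Illustrates the idea only.
class IntrospectingWriter {
    static String toXml(String rootName, Object bean) {
        StringBuilder out = new StringBuilder();
        out.append('<').append(rootName).append('>');
        try {
            // Stop at Object.class so getClass() is not treated as a property.
            BeanInfo info = Introspector.getBeanInfo(bean.getClass(), Object.class);
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                Method getter = pd.getReadMethod();
                if (getter == null) continue;
                Object value = getter.invoke(bean);
                if (value == null) continue;
                String element = toElementName(pd.getName());
                out.append('<').append(element).append('>')
                   .append(escape(value.toString()))
                   .append("</").append(element).append('>');
            }
        } catch (Exception e) {
            throw new IllegalStateException("cannot marshal " + bean.getClass(), e);
        }
        out.append("</").append(rootName).append('>');
        return out.toString();
    }

    // ejbName -> ejb-name. Simple camelCase conversion; acronyms are not handled.
    static String toElementName(String property) {
        StringBuilder sb = new StringBuilder();
        for (char c : property.toCharArray()) {
            if (Character.isUpperCase(c)) {
                sb.append('-').append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Escape the characters that are unsafe in element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

For example, a SessionInfo with ejbName "Cart" yields a `<session>` element containing `<ejb-name>Cart</ejb-name>`, with no per-class mapping file involved.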

A good example to see it in action is commons-sql...

http://jakarta.apache.org/commons/sandbox/sql/index.html

Here are the two simple classes that wrap up the use of Betwixt for
marshalling in and out of XML...

http://jakarta.apache.org/commons/sandbox/sql/apidocs/org/apache/commons/sql/io/package-summary.html

Then here are the POJOs...

http://jakarta.apache.org/commons/sandbox/sql/apidocs/org/apache/commons/sql/model/package-summary.html

Doing something similar should be pretty easy and save us having to
hand-craft Digester rules.

James


Re: [XML Parsing]

Posted by Dain Sundstrom <da...@coredevelopers.net>.
On Tuesday, September 2, 2003, at 06:31 AM, Jan Bartel wrote:

> Dain Sundstrom wrote:
>
>> containers.  Why do I have to generate some xml to deploy a servlet?  
>> I should be able to create some pojo data structures and tell the 
>> deployer to "do it".  Anyway, that is an issue for another day...
> No offense taken :-)  The reason they all deal with xml descriptors is 
> purely because the spec mandates it.

Yes, but it doesn't say that the only way to deploy a servlet is to
generate an xml file or dom.  Maybe I'm in left field, but I think it
would be very useful to be able to deploy a servlet at runtime by just
creating some pojos describing what you want.

>> For now, I think we make an exception for the web containers, but for
>> everything else I think the metadata should be passed around in
>> pojos.  That way users will be able to deploy ejb and message queues
>> on the fly.
> Not sure I'm with you here, why does on-the-fly deployment require 
> POJOs? If we pass a DOM of some kind around, representing the 
> deployment descriptors:
>
>   + the DOM can be constructed either from xml or on the fly
>   + we've already got all the jars necessary to support it in
>     geronimo's /lib
>   + there are efficiencies to be gained, at least in the web
>     deployer/container area by using DOMs

I agree with the first two but the last I don't get... unless you 
actually end up parsing the xml file several times.

>>>    + nice if object <-> DOM was easily available
>> What do you mean?
> I mean I still haven't given up on trying to stop the web containers
> double or triple parsing the descriptors, so I want the option to be
> able to pass them in a pre-parsed DOM. If we use POJOs then my only
> way of doing that is if there is POJO -> DOM.

I really think we should make an exception for the web container.
Maybe one of the fields in the pojo is the actual dom.  For everything
else, since we are starting fresh, I don't see why we shouldn't make the
system easier to use by defining pojos for metadata.
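The "one of the fields in the pojo is the actual dom" idea can be sketched with nothing but JAXP: parse web.xml once, derive whatever fields Geronimo needs from that DOM, and keep the Document on the same object so a container able to accept a pre-parsed DOM could skip its own parse. That container capability is an assumption here (neither Jetty nor Tomcat is shown doing it), WebAppMetadata and WebXmlParser are hypothetical names, and only servlet names are extracted to keep it short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical metadata POJO: fields Geronimo cares about, plus the DOM
// they were read from, so the descriptor never has to be parsed twice.
class WebAppMetadata {
    private final List<String> servletNames;
    private final Document descriptor;

    WebAppMetadata(List<String> servletNames, Document descriptor) {
        this.servletNames = Collections.unmodifiableList(new ArrayList<>(servletNames));
        this.descriptor = descriptor;
    }

    List<String> getServletNames() { return servletNames; }

    // The already-parsed web.xml, for a container that can accept a DOM.
    Document getDescriptor() { return descriptor; }
}

class WebXmlParser {
    // Parses web.xml once and derives the POJO fields from that same DOM.
    static WebAppMetadata parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse web.xml", e);
        }
        List<String> names = new ArrayList<>();
        // Only <servlet> declarations; <servlet-mapping> also contains <servlet-name>.
        NodeList servlets = doc.getElementsByTagName("servlet");
        for (int i = 0; i < servlets.getLength(); i++) {
            Element servlet = (Element) servlets.item(i);
            NodeList nameNodes = servlet.getElementsByTagName("servlet-name");
            if (nameNodes.getLength() > 0) {
                names.add(nameNodes.item(0).getTextContent().trim());
            }
        }
        return new WebAppMetadata(names, doc);
    }
}
```

This keeps the POJO as the primary API for the rest of the server while still leaving Jan's pre-parsed DOM available to the web container.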

-dain

/*************************
  * Dain Sundstrom
  * Partner
  * Core Developers Network
  *************************/


Re: [XML Parsing]

Posted by Jan Bartel <ja...@mortbay.com>.
Dain Sundstrom wrote:
<snip>
>>   + Web deployment:
>>     the concrete web containers (eg Jetty, Tomcat) deal with xml
>>     descriptors anyway, so they already DOM-ify them. This in fact
>>     leads to double parsing of web.xml descriptors: once by geronimo
>>     and once by the container. This could be optimised by modifying
>>     Jetty and Tomcat to operate on a DOM version of web.xml instead.
>>     However, if geronimo moves to POJO representations of the
>>     descriptors, then such a modification wouldn't be possible (or
>>     we'd have to introduce a POJO->DOM conversion).
> 
> 
> (No offense meant here) I have always considered this a design problem
> with web containers.  Why do I have to generate some xml to deploy a 
> servlet?  I should be able to create some pojo data structures and tell 
> the deployer to "do it".  Anyway, that is an issue for another day...
No offense taken :-)  The reason they all deal with xml descriptors is 
purely because the spec mandates it.

> For now, I think we make an exception for the web containers, but for
> everything else I think the metadata should be passed around in pojos.
> That way users will be able to deploy ejb and message queues on the fly.
Not sure I'm with you here, why does on-the-fly deployment require 
POJOs? If we pass a DOM of some kind around, representing the deployment 
descriptors:

   + the DOM can be constructed either from xml or on the fly
   + we've already got all the jars necessary to support it in
     geronimo's /lib
   + there are efficiencies to be gained, at least in the web
     deployer/container area by using DOMs

> 
>> So can we enumerate the advantages of a non-DOM representation of the
>> deployment descriptors?
> 
> 
> oops above :-)
Well, it was a start at least :-)

> 
>> Other requirements for whatever representation of the deployment 
>> descriptors we choose should include:
>>
>>    + must be fast translating object <-> xml
> 
> 
> Well how fast?  All of the parsing happens during deployment time, so it 
> is not on the invocation critical path.  Deployment should be fast, but 
> it does not need to be eyeblink fast.
True.

> 
>>    + must require minimum supporting jars (we *must* avoid /lib bloat)
> 
> 
> +100  All of the frameworks we are looking at are huge.  I personally 
> think we have the best option with XMLBeans as it will be an apache 
> project.
(One of) my pet peeves is bloat in the number of jars required just to 
get a "Hello World" happening. It would be great if geronimo was lean 
and mean (without of course sacrificing functionality).

>>    + must support xpath
> 
> 
> Unless we can get outboard xpath over pojos support.
Yes. I believe there is at least one Jakarta xpath-for-POJO project. I 
don't have any experience with it and I was hoping someone on this list
might contribute their evaluation of it.

> 
>>    + nice if object <-> DOM was easily available
> 
> 
> What do you mean?
I mean I still haven't given up on trying to stop the web containers
double or triple parsing the descriptors, so I want the option to be
able to pass them in a pre-parsed DOM. If we use POJOs then my only
way of doing that is if there is POJO -> DOM.


cheers,
Jan




Re: [XML Parsing]

Posted by Dain Sundstrom <da...@coredevelopers.net>.
On Monday, September 1, 2003, at 05:39 PM, Jan Bartel wrote:

> Apropos of Aaron's plea (see below), I know I voted +1 for POJOs, but
> considering it a little more and given the difficulties experienced, 
> are we sure that this is a necessary approach?
>
> That is, do we have a clear set of use-cases and requirements that are 
> met by POJOs or in fact a non-DOM model? I'm not against some POJO etc 
> representation of the deployment descriptors, I just think we need to 
> clearly understand the reasons why we need them.

I need them.  I am writing a Dynamic MBean which generates the
MBean*Info objects from an xml file.  These objects are very specific
and must be subclasses of the MBean*Info classes (they are classes and
not interfaces).
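Since the end product is a set of concrete javax.management classes, the parsed metadata has to become objects anyway. A minimal sketch of that last step, assuming a hypothetical AttributeMetadata POJO produced by whichever parser is chosen; for brevity it builds plain MBeanAttributeInfo and MBeanInfo instances with the standard JMX constructors rather than the custom subclasses described above.

```java
import java.util.List;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanConstructorInfo;
import javax.management.MBeanInfo;
import javax.management.MBeanNotificationInfo;
import javax.management.MBeanOperationInfo;

// Hypothetical POJO for one attribute entry read from the MBean's XML file.
class AttributeMetadata {
    final String name;
    final String type;        // Java type name, e.g. "int" or "java.lang.String"
    final String description;
    final boolean readable;
    final boolean writable;

    AttributeMetadata(String name, String type, String description,
                      boolean readable, boolean writable) {
        this.name = name;
        this.type = type;
        this.description = description;
        this.readable = readable;
        this.writable = writable;
    }
}

// Converts parsed metadata into the MBeanInfo a DynamicMBean returns from getMBeanInfo().
class MBeanInfoBuilder {
    static MBeanInfo build(String className, String description,
                           List<AttributeMetadata> attributes) {
        MBeanAttributeInfo[] infos = new MBeanAttributeInfo[attributes.size()];
        for (int i = 0; i < infos.length; i++) {
            AttributeMetadata a = attributes.get(i);
            // Last argument is isIs (boolean "isFoo" getter); not used in this sketch.
            infos[i] = new MBeanAttributeInfo(a.name, a.type, a.description,
                                              a.readable, a.writable, false);
        }
        return new MBeanInfo(className, description, infos,
                new MBeanConstructorInfo[0], new MBeanOperationInfo[0],
                new MBeanNotificationInfo[0]);
    }
}
```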

As an aside I prefer to write my own beans.  We get much better control 
and it is easy (and fast) with a decent IDE.

> I haven't actually seen any compelling use-cases yet - the ones I can 
> enumerate would be better served by dealing directly with a DOM :
>
>   + JSR88 deployment:
>     the purpose of JSR88 beans is to deal with xml deployment
>     descriptors. They need to perform xpath matching.

IIRC there is an XPath project that will query over a pojo graph.

>   + Web deployment:
>     the concrete web containers (eg Jetty, Tomcat) deal with xml
>     descriptors anyway, so they already DOM-ify them. This in fact
>     leads to double parsing of web.xml descriptors: once by geronimo
>     and once by the container. This could be optimised by modifying
>     Jetty and Tomcat to operate on a DOM version of web.xml instead.
>     However, if geronimo moves to POJO representations of the
>     descriptors, then such a modification wouldn't be possible (or
>     we'd have to introduce a POJO->DOM conversion).

(No offense meant here) I have always considered this a design problem
with web containers.  Why do I have to generate some xml to deploy a 
servlet?  I should be able to create some pojo data structures and tell 
the deployer to "do it".  Anyway, that is an issue for another day...

For now, I think we make an exception for the web containers, but for
everything else I think the metadata should be passed around in pojos.
That way users will be able to deploy ejb and message queues on the
fly.

> So can we enumerate the advantages of a non-DOM representation of the
> deployment descriptors?

oops above :-)

> Other requirements for whatever representation of the deployment 
> descriptors we choose should include:
>
>    + must be fast translating object <-> xml

Well how fast?  All of the parsing happens during deployment time, so 
it is not on the invocation critical path.  Deployment should be fast, 
but it does not need to be eyeblink fast.

>    + must require minimum supporting jars (we *must* avoid /lib bloat)

+100  All of the frameworks we are looking at are huge.  I personally 
think we have the best option with XMLBeans as it will be an apache 
project.

>    + must support xpath

Unless we can get outboard xpath over pojos support.

>    + nice if object <-> DOM was easily available

What do you mean?

-dain


Re: [XML Parsing]

Posted by Jan Bartel <ja...@mortbay.com>.
Apropos of Aaron's plea (see below), I know I voted +1 for POJOs, but
considering it a little more and given the difficulties experienced, are 
we sure that this is a necessary approach?

That is, do we have a clear set of use-cases and requirements that are 
met by POJOs or in fact a non-DOM model? I'm not against some POJO etc 
representation of the deployment descriptors, I just think we need to 
clearly understand the reasons why we need them.

I haven't actually seen any compelling use-cases yet - the ones I can 
enumerate would be better served by dealing directly with a DOM :

   + JSR88 deployment:
     the purpose of JSR88 beans is to deal with xml deployment
     descriptors. They need to perform xpath matching.

   + Web deployment:
     the concrete web containers (eg Jetty, Tomcat) deal with xml
     descriptors anyway, so they already DOM-ify them. This in fact
     leads to double parsing of web.xml descriptors: once by geronimo
     and once by the container. This could be optimised by modifying
     Jetty and Tomcat to operate on a DOM version of web.xml instead.
     However, if geronimo moves to POJO representations of the
     descriptors, then such a modification wouldn't be possible (or
     we'd have to introduce a POJO->DOM conversion).


So can we enumerate the advantages of a non-DOM representation of the
deployment descriptors?

Other requirements for whatever representation of the deployment 
descriptors we choose should include:

    + must be fast translating object <-> xml
    + must require minimum supporting jars (we *must* avoid /lib bloat)
    + must support xpath
    + nice if object <-> DOM was easily available
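The "must support xpath" requirement is straightforward if the descriptor stays a DOM. A minimal sketch using the javax.xml.xpath API (part of the JDK since Java 5, so newer than this thread) against the ejb-jar fragment from Aaron's message; DescriptorXPath is a hypothetical helper name. On the POJO route the equivalent would be the XPath-over-POJOs project mentioned in the thread, which is not shown here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: XPath queries over a deployment descriptor kept as a DOM.
class DescriptorXPath {
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse descriptor", e);
        }
    }

    // String value of the first node matched by the expression ("" if none).
    static String valueOf(Document doc, String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }

    // Number of nodes matched by the expression.
    static int count(Document doc, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }
}
```

For instance, `/ejb-jar/description[@lang='es']` selects the Spanish description directly, which is the kind of matching JSR88 beans need.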


regards
Jan

Aaron Mulder wrote:
<snip>
> 	Also, the profusion of commons libraries required to get this
> running is a little frustrating.
> 
> 	Can we just use DOM please?


Re: [XML Parsing]

Posted by Aaron Mulder <am...@alumni.princeton.edu>.
On Mon, 1 Sep 2003, Aaron Mulder wrote:
> On Mon, 1 Sep 2003, James Strachan wrote:
> > If we go the POJO route (as described in the [vote] XML Parsing thread) 
> > then using Betwixt (which uses Digester) is my preferred route. We can 
> > then let Xerces take care of the XSD validation.

	Well, I tried, and I'm defeated by Betwixt.  Does anyone have an
example?  I'm trying to map a tiny EJB JAR DD fragment to beans.  I can
get the version (an attribute) into a bean property, but I can't get the
description (a repeating element) into a child bean property.  I've
included the XML fragment, the two source files, and the two .betwixt
files, but the result of parsing this is that I get an EjbJarTag with the
version set and no descriptions set.

	Also, between Digester and Betwixt, the performance on this sucks.  
Perhaps it's related to the (I kid you not) 1023 lines of log output
resulting from parsing my 33-line XML file.

	Also, the profusion of commons libraries required to get this
running is a little frustrating.

	Can we just use DOM please?

Thanks,
	Aaron


<ejb-jar version="2.1">
    <description lang="en">text</description>
    <description lang="es">text</description>
</ejb-jar>



import java.util.ArrayList;
import java.util.List;

public class EjbJarTag {
    private String version;
    // Raw List (pre-generics style, as posted); holds DescriptionTag instances.
    private List descriptions = new ArrayList();

    public String getVersion() {
        return version;
    }

    public void setVersion(String version) {
        this.version = version;
    }

    public DescriptionTag[] getDescriptions() {
        return (DescriptionTag[]) descriptions.toArray(new DescriptionTag[descriptions.size()]);
    }

    // Called once per parsed <description> element.
    public void addDescription(DescriptionTag desc) {
        descriptions.add(desc);
    }
}


<info>
    <element name="ejb-jar">
        <attribute name="version" property="version" />
        <element name="description" property="description" />
    </element>
</info>


public class DescriptionTag {
    private String lang;
    private String content;

    public String getContent() {
        return content;
    }

    public void setContent(String content) {
        this.content = content;
    }

    public String getLang() {
        return lang;
    }

    public void setLang(String lang) {
        this.lang = lang;
    }
}


<info>
    <element name="description">
        <text property="content" />
        <attribute name="lang" property="lang" />
    </element>
</info>
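For comparison with the Betwixt attempt, the same fragment can be read with plain DOM, which is what Aaron is asking for. A minimal sketch, assuming only the JAXP parser bundled with the JDK: the two beans are repeated in compressed form (with a generic List) so the block compiles on its own, and EjbJarDomReader is a hypothetical name. readDescriptions() accepts any parent element, so one description mapping could be reused wherever `<description>` appears, which is the reuse Aaron hopes for in his other message.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Compressed copies of the two beans above so this block compiles on its own.
class DescriptionTag {
    private String lang;
    private String content;
    public String getLang() { return lang; }
    public void setLang(String lang) { this.lang = lang; }
    public String getContent() { return content; }
    public void setContent(String content) { this.content = content; }
}

class EjbJarTag {
    private String version;
    private final List<DescriptionTag> descriptions = new ArrayList<>();
    public String getVersion() { return version; }
    public void setVersion(String version) { this.version = version; }
    public DescriptionTag[] getDescriptions() { return descriptions.toArray(new DescriptionTag[0]); }
    public void addDescription(DescriptionTag desc) { descriptions.add(desc); }
}

// Hand-written DOM mapping: no commons jars, only what ships with the JDK.
class EjbJarDomReader {
    static EjbJarTag read(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse ejb-jar descriptor", e);
        }
        Element root = doc.getDocumentElement();
        EjbJarTag jar = new EjbJarTag();
        // getAttribute returns "" when absent, so check first to leave the bean null.
        if (root.hasAttribute("version")) {
            jar.setVersion(root.getAttribute("version"));
        }
        for (DescriptionTag d : readDescriptions(root)) {
            jar.addDescription(d);
        }
        return jar;
    }

    // Reusable for any parent element with <description lang="..."> children.
    static List<DescriptionTag> readDescriptions(Element parent) {
        List<DescriptionTag> result = new ArrayList<>();
        for (Element child : childElements(parent, "description")) {
            DescriptionTag d = new DescriptionTag();
            if (child.hasAttribute("lang")) d.setLang(child.getAttribute("lang"));
            d.setContent(child.getTextContent().trim());
            result.add(d);
        }
        return result;
    }

    // Direct children only, unlike getElementsByTagName, which searches the whole subtree.
    static List<Element> childElements(Element parent, String name) {
        List<Element> result = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                result.add((Element) n);
            }
        }
        return result;
    }
}
```

The trade-off is explicit code per element type instead of mapping files, in exchange for no extra jars and no framework logging during parsing.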


Re: [XML Parsing]

Posted by Aaron Mulder <am...@alumni.princeton.edu>.
On Mon, 1 Sep 2003, James Strachan wrote:
> If we go the POJO route (as described in the [vote] XML Parsing thread) 
> then using Betwixt (which uses Digester) is my preferred route. We can 
> then let Xerces take care of the XSD validation.

	I'm new to Betwixt, but it looks like we have two options.  One is
to define a .betwixt file for every POJO, and the other is to have one big
class which sets up all the Betwixt mappings in code.  I was hoping for
the middle solution -- one mapping file listing all of the mappings.
Which path do you recommend?

	It seems like we'll need to do a bit of customization since some 
of the tags have a mix of attributes and content (description, icon, 
etc.), but hopefully we can map the same "description" bean to the 
"description" element wherever it may appear.

	The Betwixt documentation would be significantly improved by an
example with a more complex XML format... I guess we can contribute one if
we go forward with this.  :)

Aaron