Posted to dev@subversion.apache.org by Sam TH <sa...@uchicago.edu> on 2001/02/02 08:32:51 UTC

Some XML Questions

Well, to introduce myself, I'm not really a Be user, mailing list
posts to the contrary.  

I'm interested, to start with, in writing a DTD for the XML format
that subversion uses.  And I have a couple of questions.  

1) Should the DTD cover the postfix delta format?  I ask because it
isn't really possible, not without adding another root element outside
tree-delta?  

2)  (Random and pedantic) Does SVN_ERR_MALFORMED_XML represent
non-well-formed XML (as the name indicates) or invalid XML (as the
comment immediately preceding indicates)?  It's unlikely that
subversion will actually validate the XML, since it's using expat,
which has no plans to become a validating parser.

3) Speaking of expat, since the subversion project began, expat has
undergone a number of radical changes.  It is now autoconfiscated, and
libtoolized.  It builds a shared library, and does other nice things.
Would people be interested in changing to the newer version?  (Yeah, I
am volunteering for the work.)
           
	sam th		     
	sam@uchicago.edu
	http://www.abisource.com/~sam/
	GnuPG Key:  
	http://www.abisource.com/~sam/key

Re: New expat (was Re: Some XML Questions)

Posted by Sam TH <sa...@uchicago.edu>.
On Fri, Feb 02, 2001 at 11:39:10AM -0800, Greg Stein wrote:
> On Fri, Feb 02, 2001 at 01:53:25PM -0500, Greg Hudson wrote:
> > > If I had to vote, I guess I'd be -0.  Even if we import a static
> > > `new' expat into our tree, it adds the risks of new bugs, while
> > > solving a problem we don't have.
> > 
> > I think we might have a problem in the long run.  Right now, if an
> > application wants to use a Subversion library, it has to use
> > expat-lite.  Suppose the application also wanted to use the real expat
> > for other purposes; would there be a namespace conflict?
> > 
> > To be honest, I don't know if that's a real problem now or whether
> > upgrading would help eliminate it.  More study required.  But it's a
> > possibility.
> 
> Exactly. This was my comment about expat-lite not being able to live in the
> same address space as the real expat (or heck, Apache's expat-lite!)
> 
> Note that Apache is intending to lose its copy of expat-lite, too, in favor
> of a released version.

I hope you'll excuse me if I find this funny. :-)

When you figure out exactly what you want, let me know, and I'll do
it.  

With regards to CVS vs released, I understand the desire to use a
released version as a baseline (and it's a good idea).  However, there
are at least a couple fixes (at least one of which isn't even in CVS,
since I submitted it about 3 hours ago) which will be necessary.  So
I recommend starting from a CVS build right now, and moving to the
next release whenever it comes out.  
           
	sam th		     
	sam@uchicago.edu
	http://www.abisource.com/~sam/
	GnuPG Key:  
	http://www.abisource.com/~sam/key

Re: New expat (was Re: Some XML Questions)

Posted by Greg Stein <gs...@lyra.org>.
On Fri, Feb 02, 2001 at 01:53:25PM -0500, Greg Hudson wrote:
> > If I had to vote, I guess I'd be -0.  Even if we import a static
> > `new' expat into our tree, it adds the risks of new bugs, while
> > solving a problem we don't have.
> 
> I think we might have a problem in the long run.  Right now, if an
> application wants to use a Subversion library, it has to use
> expat-lite.  Suppose the application also wanted to use the real expat
> for other purposes; would there be a namespace conflict?
> 
> To be honest, I don't know if that's a real problem now or whether
> upgrading would help eliminate it.  More study required.  But it's a
> possibility.

Exactly. This was my comment about expat-lite not being able to live in the
same address space as the real expat (or heck, Apache's expat-lite!)

Note that Apache is intending to lose its copy of expat-lite, too, in favor
of a released version.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: Some XML Questions

Posted by Ben Collins-Sussman <su...@newton.collab.net>.
Sam TH <sa...@uchicago.edu> writes:

> CVS expat is consistently 20-30% faster than expat-lite. ...
> 
> Does that change anyone's opinion?  

Is there a speed problem?

Seriously, our `need' for XML parsing is minimal.  We use XML as an
import/export format to describe tree changes (which will become
mostly obsolete once we have a working repository.)  We use XML
formats to store administrative data in the SVN/ area.  But this is
all small, lightweight stuff.  No need to swat a fly with a Buick.

If I had to vote, I guess I'd be -0.  Even if we import a static `new'
expat into our tree, it adds the risks of new bugs, while solving a
problem we don't have.

Re: Some XML Questions

Posted by Sam TH <sa...@uchicago.edu>.
On Fri, Feb 02, 2001 at 08:52:24AM -0600, Karl Fogel wrote:
> 
> Re expat-lite vs tracking expat:
> 
> The minute we need something expat-lite doesn't give us, we should
> upgrade and probably start tracking the live expat.  But to do so
> before then would be fixing what's not broken, and that always risks
> causing bugs.  There's no point making trouble for ourselves.

Well, I never did like taking no for an answer.  How about this:

CVS expat is consistently 20-30% faster than expat-lite.  This is
based on the benchmark for XML parsers available at
http://www.xml.com/lpt/a/Benchmark/parsertest.html 

I did some hackery to get this to test different versions of expat,
and the test harness doesn't work, but I have two binaries on my
machine, one of which is significantly and consistently faster.

Does that change anyone's opinion?  

And, just to note, I don't advocate tracking expat in the same way APR
is tracked.  I think we should just import current CVS expat (modulo
the few patches I wrote for it this morning).  
           
	sam th		     
	sam@uchicago.edu
	http://www.abisource.com/~sam/
	GnuPG Key:  
	http://www.abisource.com/~sam/key

Re: Some XML Questions

Posted by Ben Collins-Sussman <su...@newton.collab.net>.
Greg Stein <gs...@lyra.org> writes:

> Here's a minute:
> 
> 1) we cannot (dynamically) link expat-lite into a process which includes
>    another copy of Expat or expat-lite.
> 
> 2) our use of libtool 1.3 currently prevents us from defining library-to-
>    library dependencies. but when libtool 1.4 comes out, we will. at that
>    point, we'll mark libsvn_subr as dependent upon a version of Expat.
> 
> 3) mod_dav_svn uses libsvn_subr, which means it depends on Expat. that
>    cannot be expat-lite because of (1) above.

Uncle!  Uncle!  =)

Re: Some XML Questions

Posted by Greg Stein <gs...@lyra.org>.
On Fri, Feb 02, 2001 at 08:52:24AM -0600, Karl Fogel wrote:
> Sam TH <sa...@uchicago.edu> writes:
> > Patch for the comment. (How can I make ediff respect .cvsrc?)
> 
> Applied (thanks!).
> 
> Re expat-lite vs tracking expat:
> 
> The minute we need something expat-lite doesn't give us, we should
> upgrade and probably start tracking the live expat.  But to do so
> before then would be fixing what's not broken, and that always risks
> causing bugs.  There's no point making trouble for ourselves.

Here's a minute:

1) we cannot (dynamically) link expat-lite into a process which includes
   another copy of Expat or expat-lite.

2) our use of libtool 1.3 currently prevents us from defining library-to-
   library dependencies. but when libtool 1.4 comes out, we will. at that
   point, we'll mark libsvn_subr as dependent upon a version of Expat.

3) mod_dav_svn uses libsvn_subr, which means it depends on Expat. that
   cannot be expat-lite because of (1) above.


Today, mod_dav_svn happens to work because I rely on Apache's expat-lite to
satisfy the needs of libsvn_subr. All of this is asking for trouble down the
line. Simplifying management of this stuff will steer us towards the most
recent Expat, rather than our own copy.

Note that we should probably track releases of Expat rather than the CVS
version. When we ship, we need to work against a release... not CVS.
APR(UTIL) does not have a formal release at this point, so we aren't doing
that. It is also an alpha/beta product still seeing changes, so tracking
them via CVS is best.

[ and note that we don't track Neon via CVS because it isn't on a public CVS
  share for logistical / Joe-productivity reasons. ]

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: Some XML Questions

Posted by Karl Fogel <kf...@galois.collab.net>.
Sam TH <sa...@uchicago.edu> writes:
> Patch for the comment. (How can I make ediff respect .cvsrc?)

Applied (thanks!).

Re expat-lite vs tracking expat:

The minute we need something expat-lite doesn't give us, we should
upgrade and probably start tracking the live expat.  But to do so
before then would be fixing what's not broken, and that always risks
causing bugs.  There's no point making trouble for ourselves.

-K

Re: Some XML Questions

Posted by Sam TH <sa...@uchicago.edu>.
On Fri, Feb 02, 2001 at 06:43:08AM -0600, Ben Collins-Sussman wrote:
> Sam TH <sa...@uchicago.edu> writes:
> 
> > 1) Should the DTD cover the postfix delta format?  I ask because it
> > isn't really possible, not without adding another root element outside
> > tree-delta?  
> 
> I believe this DTD already exists.  Check the mail archive:
> 
> http://subversion.tigris.org/subversion-dev/6/msg00171.html
> 
> Perhaps it's not part of the source tree though... we could fix this.

Well, it isn't in the source tree.  Has the spec for the XML format
changed at all since September?

> 
> > 
> > 2)  (Random and pedantic) Does SVN_ERR_MALFORMED_XML represent
> > non-well-formed XML (as the name indicates) or invalid XML (as the
> > comment immediately preceding indicates)?  It's unlikely that
> > subversion will actually validate the XML, since it's using expat,
> > which has no plans to become a validating parser.
> 
> Right, expat only checks for well-formed XML.  Validation isn't
> possible with expat, and we don't care much.
> 

Patch for the comment. (How can I make ediff respect .cvsrc?)

Index: svn_error.h
===================================================================
RCS file: /cvs/subversion/subversion/include/svn_error.h,v
retrieving revision 1.66
diff -u -c -r1.66 svn_error.h
cvs server: conflicting specifications of output style
*** svn_error.h	2001/01/25 20:30:52	1.66
--- svn_error.h	2001/02/02 12:54:57
***************
*** 44,50 ****
    SVN_ERR_MALFORMED_FILE,
    SVN_ERR_INCOMPLETE_DATA,
  
!   /* The xml delta we got was not valid. */
    SVN_ERR_MALFORMED_XML,
  
    /* A working copy "descent" crawl came up empty */
--- 44,50 ----
    SVN_ERR_MALFORMED_FILE,
    SVN_ERR_INCOMPLETE_DATA,
  
!   /* The xml delta we got was not well formed. */
    SVN_ERR_MALFORMED_XML,
  
    /* A working copy "descent" crawl came up empty */


> 
> > 3) Speaking of expat, since the subversion project began, expat has
> > undergone a number of radical changes.  It is now autoconfiscated, and
> > libtoolized.  It builds a shared library, and does other nice things.
> > Would people be interested in changing to the newer version?  (Yeah, I
> > am volunteering for the work.)
> 
> Um... our XML needs are pretty lightweight.  If we switch to the
> "real" expat, I see a big disadvantage: we now depend on *another*
> external package (it's already annoying enough that we have to
> manually fetch APR and Neon to build).  As it is, it's convenient to
> have expat-lite as a permanent part of our tree, and it's perfectly
> sufficient for what we're doing.
> 

Well, I think that having the newer version of expat as part of the
tree would be an excellent idea.  It hasn't changed much since about
November of last year, and should be quite stable.

> That said, what are the pros of upgrading?  I can't think of any.
> 

Well, there are a few.

1) Subversion could link against a version of expat already on the
system (shared versions of expat are available for Debian already)

2) There's an interface to set your own memory handler for expat, so
we might get it to use APR stuff (NB: I've never actually looked at
it). 

3) It has a bunch of new features merged since James Clark stopped
maintaining it (mostly developed for Perl expat).  Mostly these
involve handling parts of the file in different ways, or handling
different parts of the file (that expat didn't previously).

4) I'm not entirely sure how old the version of expat in the
Subversion tree is, but if it predates expat 1.1, then all the stuff
relating to namespaces is missing too.  Unfortunately, since James
Clark provides no information about the different versions at all,
it's hard to tell exactly what's different.  
           
5) If the expat people make any other changes Subversion might care
about, they would be mergeable.

6) The expat people are making sure it's portable to various
platforms.  

Well, 1 is really the biggest one here.  And, for reference, if you
remove the sample applications, the newer expat is about 1500 lines of
source longer than expat-lite (11000 vs 9500).  
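
To make point 4 concrete, here is a rough sketch of what namespace
processing in expat looks like from the caller's side.  It uses Python's
stdlib binding (xml.parsers.expat) rather than the C API, purely so it is
easy to run; the namespace URI and the svn: prefix are made up for
illustration and are not part of any Subversion format.

```python
import xml.parsers.expat


def element_names(document: str, namespace_aware: bool):
    """Collect start-tag names exactly as expat reports them."""
    # With a separator, expat resolves prefixes and reports "URI<sep>localname".
    sep = " " if namespace_aware else None
    parser = xml.parsers.expat.ParserCreate(namespace_separator=sep)
    names = []
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(document, True)
    return names


# Hypothetical namespace URI and prefix, for illustration only.
DOC = '<svn:tree-delta xmlns:svn="http://example.org/svn"/>'

print(element_names(DOC, namespace_aware=False))
print(element_names(DOC, namespace_aware=True))
```

Without namespace processing the handler just sees the raw qualified
name; with it, the prefix is replaced by the URI it is bound to.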

	sam th		     
	sam@uchicago.edu
	http://www.abisource.com/~sam/
	GnuPG Key:  
	http://www.abisource.com/~sam/key

Re: Some XML Questions

Posted by Ben Collins-Sussman <su...@newton.collab.net>.
Sam TH <sa...@uchicago.edu> writes:

> 1) Should the DTD cover the postfix delta format?  I ask because it
> isn't really possible, not without adding another root element outside
> tree-delta?  

I believe this DTD already exists.  Check the mail archive:

http://subversion.tigris.org/subversion-dev/6/msg00171.html

Perhaps it's not part of the source tree though... we could fix this.

> 
> 2)  (Random and pedantic) Does SVN_ERR_MALFORMED_XML represent
> non-well-formed XML (as the name indicates) or invalid XML (as the
> comment immediately preceding indicates)?  It's unlikely that
> subversion will actually validate the XML, since it's using expat,
> which has no plans to become a validating parser.

Right, expat only checks for well-formed XML.  Validation isn't
possible with expat, and we don't care much.
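
For anyone following along, the distinction can be sketched like this.
The example goes through Python's stdlib expat binding instead of the C
API so that it runs as-is; the tiny DTD and the <add> and <bogus> element
names are invented for the example and are not the real delta format.

```python
import xml.parsers.expat


def is_well_formed(document: str) -> bool:
    """Return True if expat accepts the document, False on a parse error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


# Hypothetical DTD: <tree-delta> may only contain <add> elements.
DTD = "<!DOCTYPE tree-delta [<!ELEMENT tree-delta (add*)><!ELEMENT add EMPTY>]>"

# Violates the DTD (undeclared child element) but is still well formed.
invalid_but_well_formed = DTD + "<tree-delta><bogus/></tree-delta>"

# Mismatched end tag: not well formed, so expat rejects it.
not_well_formed = "<tree-delta><add></tree-delta>"

print(is_well_formed(invalid_but_well_formed))
print(is_well_formed(not_well_formed))
```

expat reads the DTD but never checks content against it, so only the
second document is an error as far as the parser is concerned.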


> 3) Speaking of expat, since the subversion project began, expat has
> undergone a number of radical changes.  It is now autoconfiscated, and
> libtoolized.  It builds a shared library, and does other nice things.
> Would people be interested in changing to the newer version?  (Yeah, I
> am volunteering for the work.)

Um... our XML needs are pretty lightweight.  If we switch to the
"real" expat, I see a big disadvantage: we now depend on *another*
external package (it's already annoying enough that we have to
manually fetch APR and Neon to build).  As it is, it's convenient to
have expat-lite as a permanent part of our tree, and it's perfectly
sufficient for what we're doing.

That said, what are the pros of upgrading?  I can't think of any.

Re: Some XML Questions

Posted by Sam TH <sa...@uchicago.edu>.
On Mon, Feb 05, 2001 at 09:40:34AM -0600, Karl Fogel wrote:
> Sure, post the patches, thanks...
> 

Here are the patches.  I've commented a little after each one.  When
you are actually ready to commit this, I'll write some changelog
entries.  

Index: subversion/include/svn_xml.h
===================================================================
RCS file: /cvs/subversion/subversion/include/svn_xml.h,v
retrieving revision 1.28
diff -u -r1.28 svn_xml.h
--- subversion/include/svn_xml.h	2000/12/26 18:37:46	1.28
+++ subversion/include/svn_xml.h	2001/02/03 12:31:25
@@ -20,7 +20,7 @@
 #ifndef SVN_XML_H
 #define SVN_XML_H
 
-#include "xmlparse.h"
+#include "expat.h"
 #include "svn_error.h"
 #include "svn_delta.h"
 #include "svn_string.h"

This is just the annoying fact that they changed the name of the
header.  

Index: Makefile.am
===================================================================
RCS file: /cvs/subversion/Makefile.am,v
retrieving revision 1.16
diff -u -r1.16 Makefile.am
--- Makefile.am	2001/01/28 16:44:28	1.16
+++ Makefile.am	2001/02/03 12:31:25
@@ -8,7 +8,7 @@
 ##       dependencies between Makefile.am, Makefile.in, and Makefile.
 ##       In other words, SUBDIRS does not completely control automake
 ##       generation.
-SUBDIRS = apr expat-lite neon subversion doc
+SUBDIRS = apr expat neon subversion doc
 
 ACLOCAL = @ACLOCAL@ -I ac-helpers
 

Since it's no longer lite, I renamed it.

Index: configure.in
===================================================================
RCS file: /cvs/subversion/configure.in,v
retrieving revision 1.52
diff -u -r1.52 configure.in
--- configure.in	2001/01/31 17:36:18	1.52
+++ configure.in	2001/02/03 12:31:25
@@ -43,7 +43,8 @@
 
 if test "$enable_subdir_config" = "yes"; then
   RUN_SUBDIR_CONFIG_NOW(apr)
-  RUN_SUBDIR_CONFIG_NOW(neon, --with-expat="$abs_srcdir/expat-lite/libexpat.la")
+  RUN_SUBDIR_CONFIG_NOW(expat, --disable-shared)
+  RUN_SUBDIR_CONFIG_NOW(neon, --with-expat="$abs_srcdir/expat/lib/libexpat.la")
 fi
 
 
@@ -99,9 +100,9 @@
 AC_SUBST(SVN_APR_LIBS)
 
 dnl Expat
-SVN_EXPAT_INCLUDES='-I$(top_srcdir)/expat-lite'
+SVN_EXPAT_INCLUDES='-I$(top_srcdir)/expat/lib'
 AC_SUBST(SVN_EXPAT_INCLUDES)
-SVN_EXPAT_LIBS='$(top_builddir)/expat-lite/libexpat.la'
+SVN_EXPAT_LIBS='$(top_builddir)/expat/lib/libexpat.la'
 AC_SUBST(SVN_EXPAT_LIBS)
 
 dnl Neon
@@ -247,7 +248,6 @@
 
 AC_OUTPUT([Makefile                                           \
            doc/Makefile                                       \
-           expat-lite/Makefile                                \
            subversion/Makefile                                \
            subversion/libsvn_subr/Makefile                    \
            subversion/libsvn_delta/Makefile                   \

The new expat uses libtool, so it's nice and easy to link.  We don't
have to create its makefile either.

Index: subversion/libsvn_delta/xml_parse.c
===================================================================
RCS file: /cvs/subversion/subversion/libsvn_delta/xml_parse.c,v
retrieving revision 1.142
diff -u -r1.142 xml_parse.c
--- subversion/libsvn_delta/xml_parse.c	2001/01/27 02:36:32	1.142
+++ subversion/libsvn_delta/xml_parse.c	2001/02/03 12:31:26
@@ -42,7 +42,6 @@
 #include "svn_base64.h"
 #include "svn_quoprint.h"
 #include "apr_strings.h"
-#include "xmlparse.h"
 #include "delta.h"
 
 
This patch stands even if the expat change doesn't happen.  There's
no need for this file to know the name of the expat header, since it
includes svn_xml.h.  

Of course, the actual changes to expat are much larger.  :-)
           
	sam th		     
	sam@uchicago.edu
	http://www.abisource.com/~sam/
	GnuPG Key:  
	http://www.abisource.com/~sam/key

Re: Some XML Questions

Posted by Karl Fogel <kf...@galois.collab.net>.
Sure, post the patches, thanks...

-K


Sam TH <sa...@uchicago.edu> writes:
> On Fri, Feb 02, 2001 at 11:24:09AM -0600, Karl Fogel wrote:
> > Let's please not do this now, since there are more urgent matters to
> > concentrate on (the repository), but make sure we do it well before
> > 1.0.
> 
> You sure you don't want to do it now?  I am done.  :-)
> 
> Actually, it was pretty easy.  The major difficulty was learning to
> write Autoconf macros so that I could modify the neon ones.
> 
> The change depends on a patch to neon, which I have submitted.  If
> anyone wants to see my patch in the interim, just let me know.  It's
> only about 20 lines.
> 
> 	sam th
> 	sam@uchicago.edu
> 	http://www.abisource.com/~sam/
> 	GnuPG Key:
> 	http://www.abisource.com/~sam/key
> 

Re: Some XML Questions

Posted by Sam TH <sa...@uchicago.edu>.
On Fri, Feb 02, 2001 at 11:24:09AM -0600, Karl Fogel wrote:
> Let's please not do this now, since there are more urgent matters to
> concentrate on (the repository), but make sure we do it well before
> 1.0.

You sure you don't want to do it now?  I am done.  :-)

Actually, it was pretty easy.  The major difficulty was learning to
write Autoconf macros so that I could modify the neon ones.

The change depends on a patch to neon, which I have submitted.  If
anyone wants to see my patch in the interim, just let me know.  It's
only about 20 lines.  
           
	sam th		     
	sam@uchicago.edu
	http://www.abisource.com/~sam/
	GnuPG Key:  
	http://www.abisource.com/~sam/key

Re: Some XML Questions

Posted by Karl Fogel <kf...@galois.collab.net>.
Greg Stein <gs...@lyra.org> writes:
> > 3) Speaking of expat, since the subversion project began, expat has
> > undergone a number of radical changes.  It is now autoconfiscated, and
> > libtoolized.  It builds a shared library, and does other nice things.
> > Would people be interested in changing to the newer version?  (Yeah, I
> > am volunteering for the work.)
> 
> There is zero intent to stick with expat-lite. It should be replaced with
> the real Expat distribution, much in the same way that we handle Neon. If
> you want to do it sooner than later, please help! But it will be replaced
> before we ship 1.0.

Okay, I'm convinced (by this and the arguments in your later mail). :-)

Let's please not do this now, since there are more urgent matters to
concentrate on (the repository), but make sure we do it well before
1.0.

Re: Some XML Questions

Posted by Greg Stein <gs...@lyra.org>.
On Fri, Feb 02, 2001 at 02:32:51AM -0600, Sam TH wrote:
>...
> 1) Should the DTD cover the postfix delta format?  I ask because it
> isn't really possible, not without adding another root element outside
> tree-delta?

There should be only one root element, no matter what. I believe there is
supposed to be a delta-pkg element that wraps everything up.
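
A quick way to see the single-root rule in action, sketched with Python's
stdlib expat binding for convenience.  The tree-delta and delta-pkg names
come from this thread; the <postfix-deltas> element is a made-up stand-in
for whatever the postfix part is really called.

```python
import xml.parsers.expat


def parse_error(document: str):
    """Return expat's error message for the document, or None if it parses."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError as exc:
        return str(exc)
    return None


# Two top-level elements: rejected, since XML allows exactly one root.
two_roots = "<tree-delta/><postfix-deltas/>"

# The same content wrapped in a single delta-pkg root parses cleanly.
wrapped = "<delta-pkg><tree-delta/><postfix-deltas/></delta-pkg>"

print(parse_error(two_roots))
print(parse_error(wrapped))
```

So a DTD can cover both parts as long as they sit under one wrapping
element.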

> 2)  (Random and pedantic) Does SVN_ERR_MALFORMED_XML represent
> non-well-formed XML (as the name indicates) or invalid XML (as the
> comment immediately preceding indicates)?  It's unlikely that
> subversion will actually validate the XML, since it's using expat,
> which has no plans to become a validating parser.

You're right. We wouldn't ever validate the XML; just check for
well-formedness. The software itself essentially performs the validation as
it handles the elements.

[ in fact, I have a general refusal to believe much in validation since the
  semantics and the DTD are effectively encoded in the stuff that processes
  the XML. ]
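
As an illustration of what "the software performs the validation" can
mean, here is a minimal sketch in which the element handlers themselves
enforce the structure.  It is written against Python's stdlib expat
binding; the content rules and the <add> and <delete> element names are
assumptions made up for the example, not the actual Subversion code or
format.

```python
import xml.parsers.expat

# Hypothetical content rules: which children each element may contain.
ALLOWED_CHILDREN = {
    None: {"delta-pkg"},  # None stands for the document level
    "delta-pkg": {"tree-delta"},
    "tree-delta": {"add", "delete"},
    "add": set(),
    "delete": set(),
}


class StructureError(Exception):
    """Raised when an element appears where the rules do not allow it."""


def check_structure(document: str) -> None:
    stack = []  # names of the currently open elements

    def start(name, attrs):
        parent = stack[-1] if stack else None
        if name not in ALLOWED_CHILDREN.get(parent, set()):
            raise StructureError(f"<{name}> not allowed inside {parent!r}")
        stack.append(name)

    def end(name):
        stack.pop()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    # Exceptions raised in a handler propagate out of Parse().
    parser.Parse(document, True)
```

Calling check_structure() on a well-formed document with a misplaced
element raises StructureError, while a document that is not well formed
still fails earlier with expat's own error.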

> 3) Speaking of expat, since the subversion project began, expat has
> undergone a number of radical changes.  It is now autoconfiscated, and
> libtoolized.  It builds a shared library, and does other nice things.
> Would people be interested in changing to the newer version?  (Yeah, I
> am volunteering for the work.)

There is zero intent to stick with expat-lite. It should be replaced with
the real Expat distribution, much in the same way that we handle Neon. If
you want to do it sooner than later, please help! But it will be replaced
before we ship 1.0.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/