Posted to j-users@xalan.apache.org by Jimmy Zhang <jz...@ximpleware.com> on 2006/02/13 22:57:21 UTC
[ANN] VTD-XML Version 1.5 Released
Eight years after the invention of XML, DOM and SAX,
despite their respective issues, are still the mainstays
of application developers.
So is it the end of the road for XML parsing innovation?
The VTD-XML project team think not. We are proud to
announce the availability of both C and Java version
1.5 of VTD-XML, the next generation open-source XML
parser that goes beyond DOM and SAX in terms of
performance, memory usage and ease of use.
The technical highlights of VTD-XML are:
* Performance: the world's fastest XML parser,
5x to 10x faster than DOM
* Memory Usage: 3x to 5x less than DOM, 1.3x to 1.5x
the XML document size
* Random access with built-in XPath support
* A simple and intuitive API
Other advanced features include:
* Buffer reuse
* Large document support (2GByte)
* Incremental update
* Hardware acceleration
* Native XML indexing.
For demos, latest benchmarks, related articles and software
downloads, please visit http://vtd-xml.sf.net. Also let us
know your thoughts and suggestions and help us improve
VTD-XML.
Re: VTD-XML Version 1.5 Released
Posted by Elliotte Harold <el...@metalab.unc.edu>.
Jimmy Zhang wrote:
> Elliotte, when XML pull parsers came out, they didn't support DTDs
> or external references either. To date, several XML pull
> implementations still do not support the full spec, yet they call
> themselves XML parsers. It seems to us that VTD-XML can do the same
> thing?
They can't; you can't. Nothing should call itself an XML pull parser or
claim to support XML if it doesn't reach the minimal level of
conformance to the XML spec. As I've said before, VTD-XML is hardly the
only product to cheat on benchmarks by failing to implement the spec.
The case of XML pull parsers is informative and relevant. You're right
that some XML pull parsers did not, and still don't, fully implement XML. When XML
pull parsers were first proposed, it was claimed that they would be
faster than push parsers like SAX. Initial benchmarks seemed to indicate
this might be true. However, those initial benchmarks were based on
incomplete and incorrect implementations of the XML specification. Once
those products were brought into conformance with XML, performance
plummeted to the level that could already be achieved with SAX and push
parsers.
Some XML pull parsers such as the StAX parser included in the JDK do
fully support XML (modulo bugs that will eventually be fixed). Some pull
parsers such as Woodstox do not. Products like Woodstox and VTD-XML that
don't meet the minimum level of conformance are being disingenuous in
claiming to be faster than standard products like Xerces. Formula-1 race
cars are faster than a Toyota Corolla on a race track, but there's a
reason they aren't allowed to drive on local streets.
I can't take any of VTD's measurements seriously until it demonstrates
conformance to the specification. Measuring performance before you prove
correctness is putting the cart before the horse. I don't care how fast
a car goes on the Autobahn if it blows up when hitting a speed bump on a
local road.
--
Elliotte Rusty Harold elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
Re: VTD-XML Version 1.5 Released
Posted by Jimmy Zhang <jz...@ximpleware.com>.
Elliotte, when XML pull parsers came out, they didn't support DTDs
or external references either. To date, several XML pull
implementations still do not support the full spec, yet they call
themselves XML parsers. It seems to us that VTD-XML can do the same
thing?
Cheers,
jimmy
----- Original Message -----
From: "Elliotte Harold" <el...@metalab.unc.edu>
To: <jz...@ximpleware.com>
Cc: <xa...@xml.apache.org>
Sent: Monday, February 13, 2006 3:10 PM
Subject: Re: VTD-XML Version 1.5 Released
> jzhang@ximpleware.com wrote:
>> Elliotte,
>> It is not about doing things for the sake of doing them;
>> it is about doing things where they make sense. I think that
>> a large majority of XML-related vocabularies do not
>> require entity references. SOAP also excludes DTDs and
>> external references because of their performance drag.
>> So although VTD-XML does not support external entities, we
>> still believe many people will find it useful.
>> It is not about taking egregious shortcuts; it is
>> more about making sensible compromises.
>> Numeric character references are supported.
>
> If you're not supporting the *minimum* things required of an XML parser,
> and have no intentions of doing so, then you shouldn't be using the name
> XML for your product.
>
> You further shouldn't make claims about speed until VTD-XML does correctly
> implement XML. Over the last eight years, I have seen a lot of claims for
> speed fall apart when the last 10% of features were implemented.
>
> --
> Elliotte Rusty Harold elharo@metalab.unc.edu
> XML in a Nutshell 3rd Edition Just Published!
> http://www.cafeconleche.org/books/xian3/
> http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
>
Re: VTD-XML Version 1.5 Released
Posted by Elliotte Harold <el...@metalab.unc.edu>.
jzhang@ximpleware.com wrote:
> Elliotte,
> It is not about doing things for the sake of doing them;
> it is about doing things where they make sense. I think that
> a large majority of XML-related vocabularies do not
> require entity references. SOAP also excludes DTDs and
> external references because of their performance drag.
> So although VTD-XML does not support external entities, we
> still believe many people will find it useful.
> It is not about taking egregious shortcuts; it is
> more about making sensible compromises.
> Numeric character references are supported.
If you're not supporting the *minimum* things required of an XML parser,
and have no intentions of doing so, then you shouldn't be using the name
XML for your product.
You further shouldn't make claims about speed until VTD-XML does
correctly implement XML. Over the last eight years, I have seen a lot of
claims for speed fall apart when the last 10% of features were implemented.
--
Elliotte Rusty Harold elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
Re: VTD-XML Version 1.5 Released
Posted by jz...@ximpleware.com.
Elliotte,
It is not about doing things for the sake of doing them;
it is about doing things where they make sense. I think that
a large majority of XML-related vocabularies do not
require entity references. SOAP also excludes DTDs and
external references because of their performance drag.
So although VTD-XML does not support external entities, we
still believe many people will find it useful.
It is not about taking egregious shortcuts; it is
more about making sensible compromises.
Numeric character references are supported.
Cheers,
Jimmy
Elliotte Harold writes:
> Jimmy Zhang wrote:
>
>> The VTD-XML project team think not. We are proud to
>> announce the availability of both C and Java version
>> 1.5 of VTD-XML, the next generation open-source XML
>> parser that goes beyond DOM and SAX in terms of
>> performance, memory usage and ease of use.
>
> But not conformance. VTD-XML does not provide the minimum level of
> functionality required of an XML 1.0 parser. Until it does, performance
> measurements against fully conformant products are deceptive and
> dishonest.
>
> To be specific VTD-XML advertises that "Currently it only supports
> built-in entity references(&quot; &amp; &apos; &gt; &lt;)." Seeing that
> omission makes me wonder what else may be missing. In particular does
> VTD-XML support numeric character references? Does it correctly check all
> well-formedness constraints?
>
> It's not hard to speed up XML by cutting out the parts you don't like. The
> trick is to write a real XML parser that is both conformant and fast.
> There's some interesting stuff here. This might be a productive area for
> research, but it's not ready for production, and comparing it to parsers
> that are is wrong.
>
> --
> Elliotte Rusty Harold elharo@metalab.unc.edu
> XML in a Nutshell 3rd Edition Just Published!
> http://www.cafeconleche.org/books/xian3/
> http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
Re: [ANN] VTD-XML Version 1.5 Released
Posted by Elliotte Harold <el...@metalab.unc.edu>.
Jimmy Zhang wrote:
> The VTD-XML project team think not. We are proud to
> announce the availability of both C and Java version
> 1.5 of VTD-XML, the next generation open-source XML
> parser that goes beyond DOM and SAX in terms of
> performance, memory usage and ease of use.
But not conformance. VTD-XML does not provide the minimum level of
functionality required of an XML 1.0 parser. Until it does, performance
measurements against fully conformant products are deceptive and dishonest.
To be specific VTD-XML advertises that "Currently it only supports
built-in entity references(&quot; &amp; &apos; &gt; &lt;)." Seeing that
omission makes me wonder what else may be missing. In particular does
VTD-XML support numeric character references? Does it correctly check
all well-formedness constraints?
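As an illustration of the distinctions raised here, the following minimal sketch uses the DOM parser bundled with the JDK (not VTD-XML) to process the three kinds of reference under discussion: built-in entity references, numeric character references, and a general entity declared in an internal DTD subset. It also checks that input which is not well-formed is rejected. The class and method names (`EntityKindsDemo`, `rootText`, `isWellFormed`) are made up for this example, and the mapping of parse failures to `IllegalArgumentException` is just a convenience of the sketch.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; uses whatever JAXP DOM parser the runtime provides.
class EntityKindsDemo {

    // Parses the string and returns the text content of the root element.
    // Input that the parser rejects is reported as IllegalArgumentException.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the parser accepts the input, false if it reports an error.
    static boolean isWellFormed(String xml) {
        try {
            rootText(xml);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Built-in (predefined) entity references.
        System.out.println(rootText("<a>&lt;&amp;&gt;</a>"));
        // Numeric character references, decimal and hexadecimal.
        System.out.println(rootText("<a>&#65;&#x42;</a>"));
        // General entity declared in the internal DTD subset.
        System.out.println(
            rootText("<!DOCTYPE a [<!ENTITY co 'Example Corp'>]><a>&co;</a>"));
        // Mismatched tags and an undeclared entity are well-formedness errors.
        System.out.println(isWellFormed("<a><b></a>"));
        System.out.println(isWellFormed("<a>&undeclared;</a>"));
    }
}
```

Running the same inputs through any parser under evaluation gives a quick, informal picture of which of these cases it handles; it is not a substitute for a full conformance test suite.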
It's not hard to speed up XML by cutting out the parts you don't like.
The trick is to write a real XML parser that is both conformant and
fast. There's some interesting stuff here. This might be a productive
area for research, but it's not ready for production, and comparing it
to parsers that are is wrong.
--
Elliotte Rusty Harold elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
Re: [ANN] VTD-XML Version 1.5 Released
Posted by Santiago Pericas-Geertsen <Sa...@Sun.COM>.
Jimmy,
Really interesting work. I've heard about this idea before, but
this is the first implementation I know of. Modulo XML conformance
issues (already discussed elsewhere) I can see how this can be useful
for certain applications.
There is one area, however, that I find is not elaborated enough
on the project page: updates. I see that you list incremental
updates, but I haven't found a definition for that. Moreover, I would
expect DOM to be much more flexible when it comes to updates, which,
in a way, makes your comparison of VTD-XML and DOM a bit unfair.
Analogously, comparing VTD-XML to SAX with a null content handler is
not apples-to-apples, as VTD-XML defers some work until the
application actually uses the data (which, again, may actually be a
good thing for some applications, but maybe less so for others that,
for example, bind the entire infoset to programming-language objects).
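To make the "null content handler" point concrete, here is a minimal sketch using the JDK's SAX API. Passing a bare `DefaultHandler` makes the parser do its work while the application does nothing with the events; the second handler stands in for an application that actually consumes the data. The names `SaxHandlerWorkDemo`, `CollectingHandler`, and `parse` are hypothetical, and the sketch deliberately contains no timing code, since a fair benchmark would need far more care than fits here.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class contrasting a no-op handler with one that does work.
class SaxHandlerWorkDemo {

    // Handler that materializes data: counts elements and copies all text.
    static class CollectingHandler extends DefaultHandler {
        int elementCount = 0;
        final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elementCount++;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Runs a SAX parse of the string, delivering events to the given handler.
    static void parse(String xml, DefaultHandler handler) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<r><a>x</a><b>yz</b></r>";

        // "Null" handler: every callback is a no-op, so only parser work is done.
        parse(xml, new DefaultHandler());

        // Handler that uses the data, as a real application would.
        CollectingHandler collector = new CollectingHandler();
        parse(xml, collector);
        System.out.println(collector.elementCount + " elements, text=" + collector.text);
    }
}
```

A comparison that times only the first call measures the parser alone, whereas one that times the second also includes the application's share of the work. Which of the two is the fair counterpart depends on how much of the document the application ends up touching.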
To summarize, IMO it may be best to position VTD-XML as a new
processing model (which I think you are!), but at the same time be
explicit and fair when comparing it with existing processing models.
Hope this feedback helps. Nice work!
-- Santiago
On Feb 13, 2006, at 4:57 PM, Jimmy Zhang wrote:
> [ANN] VTD-XML Version 1.5 Released
>
> Eight years after the invention of XML, DOM and SAX,
> despite their respective issues, are still the mainstays
> of application developers.
>
> So is it the end of the road for XML parsing innovation?
>
> The VTD-XML project team think not. We are proud to
> announce the availability of both C and Java version
> 1.5 of VTD-XML, the next generation open-source XML
> parser that goes beyond DOM and SAX in terms of
> performance, memory usage and ease of use.
>
> The technical highlights of VTD-XML are:
>
> * Performance: the world's fastest XML parser,
> 5x to 10x faster than DOM
> * Memory Usage: 3x to 5x less than DOM, 1.3x to 1.5x
> the XML document size
> * Random access with built-in XPath support
> * A simple and intuitive API
>
> Other advanced features include:
> * Buffer reuse
> * Large document support (2GByte)
> * Incremental update
> * Hardware acceleration
> * Native XML indexing.
>
> For demos, latest benchmarks, related articles and software
> downloads, please visit http://vtd-xml.sf.net. Also let us
> know your thoughts and suggestions and help us improve
> VTD-XML.
>
Re: VTD-XML Version 1.5 Released
Posted by jz...@ximpleware.com.
Xerces 2.7.1 is the one we used for the comparison.
Joseph Kesselman writes:
> Comparing parse speed and memory use to DOM, rather than to specific
> implementations of parser and DOM, immediately makes the claims invalid.
>
> (They may have something here or they may not, but there's no evidence that
> they know how to make meaningful measurements.)
>
> ______________________________________
> Joe Kesselman -- Beware of Blueshift!
> "The world changed profoundly and unpredictably the day Tim Berners Lee
> got bitten by a radioactive spider." -- Rafe Culpin, in r.m.filk
>
Re: [ANN] VTD-XML Version 1.5 Released
Posted by Joseph Kesselman <ke...@us.ibm.com>.
Comparing parse speed and memory use to DOM, rather than to specific
implementations of parser and DOM, immediately makes the claims invalid.
(They may have something here or they may not, but there's no evidence that
they know how to make meaningful measurements.)
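Since DOM and SAX are APIs rather than implementations, a benchmark report needs to name the concrete parser behind them. As a minimal sketch of one way to record that in Java, the snippet below asks JAXP which implementation classes it actually instantiates. The class name `ParserIdentity` and its methods are invented for this example, and the result depends on the runtime and classpath, so no particular output is assumed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

// Hypothetical helper that reports which parser implementation is in use.
class ParserIdentity {

    // Concrete class of the DOM builder chosen by JAXP on this runtime.
    static String domBuilderClass() {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getClass().getName();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Concrete class of the SAX parser chosen by JAXP on this runtime.
    static String saxParserClass() {
        try {
            return SAXParserFactory.newInstance()
                .newSAXParser().getClass().getName();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Record these alongside any measurements so results can be reproduced.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.vendor  = " + System.getProperty("java.vendor"));
        System.out.println("DOM builder  = " + domBuilderClass());
        System.out.println("SAX parser   = " + saxParserClass());
    }
}
```

Publishing this information together with the parser version, the test documents, and the measurement method lets readers judge a claim such as "faster than DOM" against a specific, reproducible baseline.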
______________________________________
Joe Kesselman -- Beware of Blueshift!
"The world changed profoundly and unpredictably the day Tim Berners Lee
got bitten by a radioactive spider." -- Rafe Culpin, in r.m.filk