Posted to j-users@xalan.apache.org by Jimmy Zhang <jz...@ximpleware.com> on 2006/02/13 22:57:21 UTC

[ANN] VTD-XML Version 1.5 Released

Eight years after the invention of XML, DOM and SAX, 
despite their respective issues, are still the mainstays 
of application developers.  
 
So is it the end of the road for XML parsing innovation?
 
The VTD-XML project team think not. We are proud to 
announce the availability of both C and Java version 
1.5 of VTD-XML, the next generation open-source XML 
parser that goes beyond DOM and SAX in terms of 
performance, memory usage and ease of use. 
 
The technical highlights of VTD-XML are: 

* Performance: the world's fastest XML parser,
  5x to 10x faster than DOM
* Memory usage: 3x to 5x less than DOM, 1.3x to 1.5x
  the XML document size
* Random access with built-in XPath support
* A simple and intuitive API 

Other advanced features include:
* Buffer reuse
* Large document support (2GByte)
* Incremental update
* Hardware acceleration
* Native XML indexing.

For demos, latest benchmarks, related articles and software 
downloads, please visit http://vtd-xml.sf.net. Also let us 
know your thoughts and suggestions and help us improve 
VTD-XML.

Re: VTD-XML Version 1.5 Released

Posted by Elliotte Harold <el...@metalab.unc.edu>.
Jimmy Zhang wrote:
> Elliotte, when XML pull parsers came out, they didn't support DTDs and
> external references either. To date, several XML pull implementations
> still do not support the full spec, yet they call themselves XML
> parsers. It seems to us that VTD-XML can do the same thing?

They can't; you can't. Nothing should call itself an XML pull parser or 
claim to support XML if it doesn't reach the minimal level of 
conformance to the XML spec. As I've said before, VTD-XML is hardly the 
only product to cheat on benchmarks by failing to implement the spec.

The case of XML pull parsers is informative and relevant. You're right 
that some XML pull parsers did not, and still do not, fully implement XML. When XML 
pull parsers were first proposed, it was claimed that they would be 
faster than push parsers like SAX. Initial benchmarks seemed to indicate 
this might be true. However, those initial benchmarks were based on 
incomplete and incorrect implementations of the XML specification. Once 
those products were brought into conformance with XML, performance 
plummeted to the level that could already be achieved with SAX and push 
parsers.

Some XML pull parsers such as the StAX parser included in the JDK do 
fully support XML (modulo bugs that will eventually be fixed). Some pull 
parsers such as Woodstox do not. Products like Woodstox and VTD-XML that 
don't meet the minimum level of conformance are being disingenuous in 
claiming to be faster than standard products like Xerces. Formula-1 race 
cars are faster than a Toyota Corolla on a race track, but there's a 
reason they aren't allowed to drive on local streets.

I can't take any of VTD's measurements seriously until it demonstrates 
conformance to the specification. Measuring performance before you prove 
correctness is putting the cart before the horse. I don't care how fast 
a car goes on the Autobahn if it blows up when hitting a speed bump on a 
local road.

-- 
Elliotte Rusty Harold  elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim

Re: VTD-XML Version 1.5 Released

Posted by Jimmy Zhang <jz...@ximpleware.com>.
Elliotte, when XML pull parsers came out, they didn't support DTDs and
external references either. To date, several XML pull implementations
still do not support the full spec, yet they call themselves XML
parsers. It seems to us that VTD-XML can do the same thing?
Cheers,
jimmy
----- Original Message ----- 
From: "Elliotte Harold" <el...@metalab.unc.edu>
To: <jz...@ximpleware.com>
Cc: <xa...@xml.apache.org>
Sent: Monday, February 13, 2006 3:10 PM
Subject: Re: VTD-XML Version 1.5 Released


> jzhang@ximpleware.com wrote:
>> Elliotte,
>> It is not about doing things for the sake of doing them;
>> it is about doing things where it makes sense. I think that
>> a large majority of XML-related vocabularies do not
>> require entity references. SOAP also excludes DTDs and
>> external references because of their performance drag.
>> So although VTD-XML does not support external entities, we
>> still believe many people will find it useful.
>> It is not about taking egregious shortcuts; it is
>> more like making sensible compromises.
>> Numeric character references are supported.
>
> If you're not supporting the *minimum* things required of an XML parser, 
> and have no intentions of doing so, then you shouldn't be using the name 
> XML for your product.
>
> You further shouldn't make claims about speed until VTD-XML does correctly 
> implement XML. Over the last eight years, I have seen a lot of claims for 
> speed fall apart when the last 10% of features were implemented.
>
> -- 
> Elliotte Rusty Harold  elharo@metalab.unc.edu
> XML in a Nutshell 3rd Edition Just Published!
> http://www.cafeconleche.org/books/xian3/
> http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
> 



Re: VTD-XML Version 1.5 Released

Posted by Elliotte Harold <el...@metalab.unc.edu>.
jzhang@ximpleware.com wrote:
> Elliotte,
> It is not about doing things for the sake of doing them;
> it is about doing things where it makes sense. I think that
> a large majority of XML-related vocabularies do not
> require entity references. SOAP also excludes DTDs and
> external references because of their performance drag.
> So although VTD-XML does not support external entities, we
> still believe many people will find it useful.
> It is not about taking egregious shortcuts; it is
> more like making sensible compromises.
> Numeric character references are supported.

If you're not supporting the *minimum* things required of an XML parser, 
and have no intentions of doing so, then you shouldn't be using the name 
XML for your product.

You further shouldn't make claims about speed until VTD-XML does 
correctly implement XML. Over the last eight years, I have seen a lot of 
claims for speed fall apart when the last 10% of features were implemented.

-- 
Elliotte Rusty Harold  elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim

Re: VTD-XML Version 1.5 Released

Posted by jz...@ximpleware.com.
Elliotte,
It is not about doing things for the sake of doing them;
it is about doing things where it makes sense. I think that
a large majority of XML-related vocabularies do not
require entity references. SOAP also excludes DTDs and
external references because of their performance drag.
So although VTD-XML does not support external entities, we
still believe many people will find it useful.
It is not about taking egregious shortcuts; it is
more like making sensible compromises.
Numeric character references are supported.
Cheers,
Jimmy 

Elliotte Harold writes: 

> Jimmy Zhang wrote: 
> 
>> The VTD-XML project team think not. We are proud to
>> announce the availability of both C and Java version
>> 1.5 of VTD-XML, the next generation open-source XML
>> parser that goes beyond DOM and SAX in terms of
>> performance, memory usage and ease of use.
> 
> But not conformance. VTD-XML does not provide the minimum level of 
> functionality required of an XML 1.0 parser. Until it does, performance 
> measurements against fully conformant products are deceptive and 
> dishonest. 
> 
> To be specific, VTD-XML advertises that "Currently it only supports 
> built-in entity references (&quot; &amp; &apos; &gt; &lt;)." Seeing that 
> omission makes me wonder what else may be missing. In particular, does 
> VTD-XML support numeric character references? Does it correctly check all 
> well-formedness constraints? 
> 
> It's not hard to speed up XML by cutting out the parts you don't like. The 
> trick is to write a real XML parser that is both conformant and fast. 
> There's some interesting stuff here. This might be a productive area for 
> research, but it's not ready for production, and comparing it to parsers 
> that are is wrong. 
> 
> -- 
> Elliotte Rusty Harold  elharo@metalab.unc.edu
> XML in a Nutshell 3rd Edition Just Published!
> http://www.cafeconleche.org/books/xian3/
> http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
 


Re: [ANN] VTD-XML Version 1.5 Released

Posted by Elliotte Harold <el...@metalab.unc.edu>.
Jimmy Zhang wrote:

> The VTD-XML project team think not. We are proud to
> announce the availability of both C and Java version
> 1.5 of VTD-XML, the next generation open-source XML
> parser that goes beyond DOM and SAX in terms of
> performance, memory usage and ease of use.

But not conformance. VTD-XML does not provide the minimum level of 
functionality required of an XML 1.0 parser. Until it does, performance 
measurements against fully conformant products are deceptive and dishonest.

To be specific, VTD-XML advertises that "Currently it only supports 
built-in entity references (&quot; &amp; &apos; &gt; &lt;)." Seeing that 
omission makes me wonder what else may be missing. In particular, does 
VTD-XML support numeric character references? Does it correctly check 
all well-formedness constraints?

It's not hard to speed up XML by cutting out the parts you don't like. 
The trick is to write a real XML parser that is both conformant and 
fast. There's some interesting stuff here. This might be a productive 
area for research, but it's not ready for production, and comparing it 
to parsers that are is wrong.
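
The questions above can be turned into a small, repeatable check. The
following is a minimal sketch, assuming the JDK's built-in JAXP DOM
parser as the conformant baseline. The class name EntityCheck and the
sample document are hypothetical and are not taken from VTD-XML or its
test suite. It feeds the parser an internal general entity, a decimal
and a hexadecimal character reference, two built-in entity references,
and a document that is not well-formed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EntityCheck {
    // Hypothetical sample: an entity declared in the internal DTD subset,
    // numeric character references (decimal and hex), and built-in entities.
    static final String SAMPLE =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE doc [ <!ENTITY vendor \"Example Corp\"> ]>\n"
        + "<doc>&vendor; &#65;&#x42; &amp; &lt;</doc>";

    private static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them.
        builder.setErrorHandler(new DefaultHandler());
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Returns the text content of the root element after all references
    // have been resolved by the parser.
    static String textOf(String xml) {
        try {
            return parse(xml).getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // True if the parser accepts the document, false if it reports a
    // fatal (well-formedness) error.
    static boolean isWellFormed(String xml) {
        try {
            parse(xml);
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOf(SAMPLE));             // Example Corp AB & <
        System.out.println(isWellFormed(SAMPLE));       // true
        System.out.println(isWellFormed("<a><b></a>")); // false
    }
}
```

Running the same inputs through a parser under test and comparing the
results against a baseline like this is one way to establish
correctness before measuring speed.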

-- 
Elliotte Rusty Harold  elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim

Re: [ANN] VTD-XML Version 1.5 Released

Posted by Santiago Pericas-Geertsen <Sa...@Sun.COM>.
Jimmy,

  Really interesting work. I've heard about this idea before, but  
this is the first implementation I know of. Modulo XML conformance  
issues (already discussed elsewhere) I can see how this can be useful  
for certain applications.

  There is one area, however, that I find is not elaborated enough
on the project page: updates. I see that you list incremental
updates, but I haven't found a definition for that. Moreover, I would
expect DOM to be much more flexible when it comes to updates, which
in a way makes your comparison of VTD-XML and DOM a bit unfair.
Analogously, comparing VTD-XML to SAX with a null content handler is
not apples-to-apples, as VTD-XML defers some work until the
application actually uses the data. That, again, may actually be a
good thing for some applications, but maybe less so for others that,
for example, bind the entire infoset to programming language objects.
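
  To make the SAX point concrete, here is a minimal sketch using the
JDK's built-in SAX parser. The class SaxWorkload and the handler
TouchEverythingHandler are hypothetical names invented for this
illustration. The first run uses a no-op DefaultHandler, which is the
"null content handler" case; the second run uses a handler that
actually reads every attribute value and every run of character data,
which is closer to what an application that consumes the whole
document would do.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxWorkload {
    // Hypothetical handler that touches all data the parser reports.
    static class TouchEverythingHandler extends DefaultHandler {
        int elements;
        long charsSeen;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++;
            for (int i = 0; i < atts.getLength(); i++) {
                charsSeen += atts.getValue(i).length();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Materialize the text, as a data-binding layer would.
            charsSeen += new String(ch, start, length).length();
        }
    }

    // Parses the document and delivers events to the given handler.
    static void run(String xml, DefaultHandler handler) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a x=\"12\"><b>hey</b><b>you</b></a>";

        // Case 1: null content handler, every event is discarded.
        run(xml, new DefaultHandler());

        // Case 2: the application consumes everything it is given.
        TouchEverythingHandler h = new TouchEverythingHandler();
        run(xml, h);
        System.out.println(h.elements + " elements, " + h.charsSeen + " chars");
    }
}
```

  A benchmark that times only the first case measures tokenizing alone,
so a fair comparison would state which of the two workloads each side
was given.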

  To summarize, IMO it may be best to position VTD-XML as a new  
processing model (which I think you are!), but at the same time be  
explicit and fair when comparing it with existing processing models.

  Hope this feedback helps. Nice work!

-- Santiago

On Feb 13, 2006, at 4:57 PM, Jimmy Zhang wrote:

> [ANN] VTD-XML Version 1.5 Released
>
> Eight years after the invention of XML, DOM and SAX,
> despite their respective issues, are still the mainstays
> of application developers.
>
> So is it the end of the road for XML parsing innovation?
>
> The VTD-XML project team think not. We are proud to
> announce the availability of both C and Java version
> 1.5 of VTD-XML, the next generation open-source XML
> parser that goes beyond DOM and SAX in terms of
> performance, memory usage and ease of use.
>
> The technical highlights of VTD-XML are:
>
> * Performance: the world's fastest XML parser,
>   5x to 10x faster than DOM
> * Memory usage: 3x to 5x less than DOM, 1.3x to 1.5x
>   the XML document size
> * Random access with built-in XPath support
> * A simple and intuitive API
>
> Other advanced features include:
> * Buffer reuse
> * Large document support (2GByte)
> * Incremental update
> * Hardware acceleration
> * Native XML indexing.
>
> For demos, latest benchmarks, related articles and software
> downloads, please visit http://vtd-xml.sf.net. Also let us
> know your thoughts and suggestions and help us improve
> VTD-XML.
>


Re: VTD-XML Version 1.5 Released

Posted by jz...@ximpleware.com.
Xerces 2.7.1 is the one we used for comparison.

 

Joseph Kesselman writes: 

> Comparing parse speed and memory use to DOM, rather than to specific
> implementations of parser and DOM, immediately makes the claims invalid. 
> 
> (They may have something here or they may not, but there's no evidence that
> they know how to make meaningful measurements.) 
> 
> ______________________________________
> Joe Kesselman -- Beware of Blueshift!
> "The world changed profoundly and unpredictably the day Tim Berners Lee
> got bitten by a radioactive spider." -- Rafe Culpin, in r.m.filk 
> 
 


Re: [ANN] VTD-XML Version 1.5 Released

Posted by Joseph Kesselman <ke...@us.ibm.com>.
Comparing parse speed and memory use to DOM, rather than to specific
implementations of parser and DOM, immediately makes the claims invalid.

(They may have something here or they may not, but there's no evidence that
they know how to make meaningful measurements.)
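
One way to pin down "specific implementations" in a Java benchmark
report is to record which parser classes the JAXP factories actually
resolve to on the machine under test. The following is a minimal
sketch; the class name ParserIdentity and its describe helper are
hypothetical, and the version string depends on whether the
implementation's package metadata provides one.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

public class ParserIdentity {
    // Formats the implementation class name and, if the package metadata
    // provides it, the implementation version.
    static String describe(Class<?> c) {
        Package p = c.getPackage();
        String version = (p == null) ? null : p.getImplementationVersion();
        return c.getName()
            + " (version: " + (version == null ? "unknown" : version) + ")";
    }

    public static void main(String[] args) {
        System.out.println("DOM builder factory: "
            + describe(DocumentBuilderFactory.newInstance().getClass()));
        System.out.println("SAX parser factory:  "
            + describe(SAXParserFactory.newInstance().getClass()));
        System.out.println("JVM: " + System.getProperty("java.vm.name")
            + " " + System.getProperty("java.version"));
    }
}
```

Publishing this output next to the timing and memory numbers lets
readers see exactly which parser, DOM implementation and JVM the
figures refer to.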

______________________________________
Joe Kesselman -- Beware of Blueshift!
"The world changed profoundly and unpredictably the day Tim Berners Lee
got bitten by a radioactive spider." -- Rafe Culpin, in r.m.filk