Posted to j-users@xerces.apache.org by Harrell Lyndell E <Ly...@irs.gov> on 2007/08/01 16:52:33 UTC

Questions about XML Parser for Java

Would you be so kind as to provide me a rough estimate of the man hours
that were expended in developing the XML Parser?
I am interested in an estimate for the entire end product, start to
finish, not for some particular version such as XML4J which was probably
built on existing software.
 
My guess is that a lot of work was done open source so it may be
impossible to say with any accuracy closer than +/- 300%. Still,
whatever estimate you would guess would be helpful. We are considering
writing our own XML parser, and it would be helpful to know what your
experience has been.
 
We have another question, please. We have noted that saving an XML file
as an Excel file gets you an Excel file that seems to have been parsed
in some manner.
While I'm sure you agree with us that Excel is an excellent product, I
wonder if you would be willing to comment on the differences between
what XML4J would provide and what Excel provides for some particular
XML file.
 
My request is not concerned with me personally, or my personal software.
My employer is the Internal Revenue Service and my inquiry concerns
software we are developing here in house for IRS use.        
 
Thank you in advance for your response, and please comment in lengthy
detail if you wish to do so. It would be helpful.

Lyndell  Harrell 
Custodial  Accounting 
OS:CIO:AD:IM:FS:CU 
202-283-5147,  A6-433

Re: Questions about XML Parser for Java

Posted by Michael Glavassevich <mr...@ca.ibm.com>.

Hi Lyndell,

Have you heard of Ohloh? This site has estimated development costs for
many open source projects, including Xerces [1]. I think it only factors in
the current lines of code, rather than all the changes which were made to
the codebase to get to that point, so I wouldn't say it's that accurate, but
it might give you some idea of what the lower bound might be if you get
everything right from the start.

Thanks.

[1] http://www.ohloh.net/projects/3466

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

Re: Questions about XML Parser for Java

Posted by ke...@us.ibm.com.

>Would you be so kind as to provide me a rough estimate of the man hours
>that were expended in developing the XML Parser

Probably not possible, but it's a significant number of man-years.

Xerces started off as an early prototype of IBM's XML4J parser, which went
through several complete redesigns and reimplementations, API changes,
changes in the validation scheme... Heck, the DOM implementation alone is
probably multiple man-years during that stage, since the first DOM
implementation was discarded in favor of one I wrote, which then underwent
a lot of further evolution. Work on that was done across multiple IBM
groups from Tokyo to California to New York to Toronto to wherever. I
really doubt anyone was attempting to track total time investment.

And of course once Xerces hit Apache, and we started getting contributions
from the open source community, any pretense of time tracking would have
gone right out the window.

Could a parser be written in less time? Sure; a lot of the time was spent
in helping the standards to evolve, and a lot was spent in performance
tuning, and Xerces supports things that your particular application may not
need (the downside of being a generally useful tool is that one has to
invest in being general.) And the requirements for an XML parser are better
understood these days. But writing a parser that you'll be happy using is
still not a trivial exercise; the devil really is in the details.


>We have noted that saving an XML file as an Excel file gets you an Excel
>file that seems to have been parsed in some manner. [...] I wonder if you
>would be willing to comment on the differences between what XML4J would
>provide and what Excel provides for some particular XML file.

I'm sorry, but that question really doesn't make a lot of sense. It's like
asking what the difference is between a motor and a washing machine.

Excel is a particular application. It supports a particular XML-based
markup language as one of its file export/import syntaxes, and therefore
must contain at least a limited XML serializer and parser. (These may not
be fully general, since Excel's developers know a priori exactly what kind
of XML they intend to generate and process.)

XML4J/Xerces is a general-purpose XML parser for invocation from
applications. It converts between XML syntax and the standard APIs for
working with XML (DOM, SAX, etc.), as well as performing validation against
DTDs and/or schemas that describe the particular XML-based markup language
you are working with. Xerces can be used as a building block for any
application which needs to read or write data represented in XML.
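
To make the "building block" point concrete, here is a minimal sketch of
an application calling a parser through the standard JAXP interfaces
(javax.xml.parsers), which Xerces implements. The class name, the method
names, and the sample "ledger" document are made up for illustration and
are not taken from Xerces or from any real application. The first method
builds a DOM tree and reads the root element name; the second streams the
same input through SAX and counts element start events.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; only the JAXP, DOM, and SAX types are real APIs.
class XmlParseSketch {

    // DOM: parse the whole document into an in-memory tree,
    // then return the name of the root element.
    public static String rootElementName(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getNodeName();
    }

    // SAX: stream through the document without building a tree,
    // counting one event per element start tag.
    public static int countElements(String xml) throws Exception {
        final int[] count = {0};
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(
            new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attrs) {
                    count[0]++;
                }
            });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data, assumed only for this sketch.
        String xml = "<ledger><entry id=\"1\">100.00</entry>"
                   + "<entry id=\"2\">250.50</entry></ledger>";
        System.out.println(rootElementName(xml));
        System.out.println(countElements(xml));
    }
}
```

Either way, the application only sees elements, attributes, and text; the
parser deals with the XML syntax itself. Which implementation JAXP picks
up at run time depends on the classpath and JDK configuration, so this
sketch does not by itself guarantee that Xerces is the parser in use.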



______________________________________
"... Three things see no end: A loop with exit code done wrong,
A semaphore untested, And the change that comes along. ..."
  -- "Threes" Rev 1.1 - Duane Elms / Leslie Fish
(http://www.ovff.org/pegasus/songs/threes-rev-11.html)