Posted to j-dev@xerces.apache.org by Andy Clark <an...@apache.org> on 2001/08/06 07:44:54 UTC

Re: [Xerces2] Scanner Interfaces

Everyone,

Okay, I have added the scanner interfaces and made the changes
to the Xerces2 reference implementation to implement this. I
built it and ran it through the OASIS test suite. Nothing seems 
to have broken so I went ahead and committed the changes.

This is actually quite cool because of the following reasons:

1) We can define a parser configuration's pipeline dynamically.
   For example, through an XML config file or something. That
   way people could create new configurations (e.g. adding an
   XInclude filter in the pipeline) by tweaking the standard
   configuration file instead of having to create a new
   XMLParserConfiguration duplicating a lot of code.

2) We can remove some of these extraneous class layers from 
   the parsers package. Right now each parser type, DOM for
   example, has a base class called "AbstractDOMParser" that 
   takes an XMLParserConfiguration object in its constructor 
   and the normal "DOMParser" which uses the standard 
   configuration.

   We made the separation because the standard configuration
   depends on the various Xerces2 parser components in order
   to compile. Without the separation, the *parser* class
   would have relied on these same classes just to compile,
   which hurts the layerability of the whole thing.

   So now we could define the standard parser configuration
   in some kind of config file which gets loaded dynamically
   or some such thing. Perhaps using a META-INF/services
   kind of file??? I guess we could do this anyway without
   the new scanner interfaces. But now that we have them,
   I'm getting all sorts of crazy ideas running through my
   head about the possibilities! :)
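
To make the layering point in (2) concrete, here is a minimal sketch
of a parser class that compiles against a configuration interface only
and loads the concrete configuration by class name at run time. All of
the names below (ParserConfiguration, StandardConfiguration, Parser)
are hypothetical stand-ins for illustration. They are not the real XNI
or Xerces2 classes, and this is not how the current code works.

```java
import java.util.ArrayList;
import java.util.List;

public class DynamicConfigSketch {

    /** Hypothetical stand-in for XMLParserConfiguration; not the real XNI interface. */
    public interface ParserConfiguration {
        List<String> componentNames();
    }

    /** Hypothetical standard configuration; only this class knows the concrete components. */
    public static class StandardConfiguration implements ParserConfiguration {
        public List<String> componentNames() {
            List<String> names = new ArrayList<>();
            names.add("scanner");
            names.add("validator");
            return names;
        }
    }

    /** The parser refers to the interface only, never to StandardConfiguration. */
    public static class Parser {
        private final ParserConfiguration config;

        public Parser(ParserConfiguration config) {
            this.config = config;
        }

        /** Loads the configuration reflectively from a class name (e.g. read from a file). */
        public Parser(String configClassName) throws ReflectiveOperationException {
            this((ParserConfiguration) Class.forName(configClassName)
                    .getDeclaredConstructor().newInstance());
        }

        public ParserConfiguration getConfiguration() {
            return config;
        }
    }

    public static void main(String[] args) throws Exception {
        // In a real setup the name would come from a config or services file.
        String name = DynamicConfigSketch.class.getName() + "$StandardConfiguration";
        Parser parser = new Parser(name);
        System.out.println(parser.getConfiguration().componentNames());
    }
}
```

With something like this, a single parser class could cover both cases:
callers pass a configuration object, or the default is looked up by
name, and the extra "Abstract" layer is no longer needed just to keep
the compile-time dependencies apart.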

Next, I'll be updating the XNI Manual in the documentation to
detail these new interfaces. (I'll have to force myself 'cause
I'm really feeling burnt out writing so much documentation. :)
Anybody out there want to help with the documentation? We
still need some docs on how to build parser configurations 
that use the Xerces2 standard components -- because they have
certain dependencies, etc. So that's something that still
needs to be done.

Ted,

You done with that grammar for the pipeline setup, yet?
Now that we have the scanner interfaces, I could whip up a 
sample that implements this "dynamic" parser configuration
that's possible. When I tried this before (and stopped 'cause
we didn't have any scanner interfaces), I came up with a DTD
grammar for the purpose. 

<!ELEMENT parser (component*,xml?,dtd?)>
<!ATTLIST parser name    CDATA #IMPLIED
                 version CDATA #FIXED '0.9'>

<!ELEMENT component EMPTY>
<!ATTLIST component id       ID    #REQUIRED
                    class    CDATA #REQUIRED
                    property CDATA #IMPLIED>

<!ELEMENT xml (source,filter*)>
<!ELEMENT dtd (source,filter*)>

<!ELEMENT source EMPTY>
<!ATTLIST source idref IDREF #REQUIRED>

<!ELEMENT filter EMPTY>
<!ATTLIST filter idref IDREF #REQUIRED>

I'm hoping that you can sort of figure out how it works by 
looking at the grammar. Basically, you have a number of 
<component>s with an ID and then you construct the <xml> 
and <dtd> parsing pipelines from <source>s and <filter>s 
that reference the components using an IDREF. It's very 
simple, but I was just writing it as an XNI sample. Not 
something that was too complicated.
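
As a rough illustration of how an instance of this grammar might be
read, the sketch below parses a small, made-up configuration document
and resolves each pipeline's IDREFs to component class names. The
instance document, the class names in it (example.DocumentScanner and
so on), and the PipelineConfigSketch class itself are assumptions for
illustration only. It uses the plain JAXP DOM API without validation,
so the ID/IDREF lookup is done by hand.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class PipelineConfigSketch {

    // Hypothetical instance document; the component class names are made up.
    static final String SAMPLE =
        "<parser name='sample' version='0.9'>"
      + "<component id='scanner' class='example.DocumentScanner'/>"
      + "<component id='validator' class='example.DTDValidator'/>"
      + "<component id='dtdScanner' class='example.DTDScanner'/>"
      + "<xml><source idref='scanner'/><filter idref='validator'/></xml>"
      + "<dtd><source idref='dtdScanner'/></dtd>"
      + "</parser>";

    /** Returns the class names for the "xml" or "dtd" pipeline, source first. */
    static List<String> resolvePipeline(String configXml, String pipeline) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();

        // Index every <component> by its id attribute.
        Map<String, String> classesById = new HashMap<>();
        NodeList components = root.getElementsByTagName("component");
        for (int i = 0; i < components.getLength(); i++) {
            Element c = (Element) components.item(i);
            classesById.put(c.getAttribute("id"), c.getAttribute("class"));
        }

        // Walk the <source> and <filter> children in document order.
        List<String> stages = new ArrayList<>();
        NodeList pipelines = root.getElementsByTagName(pipeline);
        if (pipelines.getLength() == 0) {
            return stages; // <xml> and <dtd> are both optional in the grammar
        }
        for (Node n = pipelines.item(0).getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            String idref = ((Element) n).getAttribute("idref");
            String className = classesById.get(idref);
            if (className == null) {
                throw new IllegalArgumentException("Unknown component id: " + idref);
            }
            stages.add(className);
        }
        return stages;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("xml pipeline: " + resolvePipeline(SAMPLE, "xml"));
        System.out.println("dtd pipeline: " + resolvePipeline(SAMPLE, "dtd"));
    }
}
```

A real implementation would go on to instantiate each class and wire
the stages together; this only shows the lookup step.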

Anyway, let me know where you were heading with this idea.

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org


Re: [Xerces2] Scanner Interfaces

Posted by Andy Clark <an...@apache.org>.
Edwin Goei wrote:
> Which test suite did you run?  

OASIS 15 Mar 2001

Does anyone know when this suite will be updated/fixed at OASIS?

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org



Re: [Xerces2] Scanner Interfaces

Posted by Edwin Goei <ed...@sun.com>.
Andy Clark wrote:
> 
> Edwin Goei wrote:
> > When you said you ran it through the OASIS test suite, what do you mean
> > by this?  As I understand it the current tests at OASIS are broken and
> > there is a fixed version elsewhere, but it requires manually
> > interpreting over 1200 negative tests to verify passing those tests.
> 
> Basically, what I was trying to say is that after making the
> change, I ran the test suite and compared the results with a
> previous run of the test suite. A regression would have
> affected the results. Yet the results were the same so that
> led me to assume that my changes were a qualified "ok".

OK.  I recently downloaded the OASIS XML 1.0 Test Suite from
http://www.oasis-open.org/committees/xml-conformance/xml-test-suite.shtml
and the suite was broken.  Which test suite did you run?  Is it an IBM
internal test suite or an earlier version of the OASIS suite?  I was
able to run a fixed version of the OASIS suite (on crimson) which I got
from http://xmlconf.sourceforge.net/, but did not interpret the results.

-Edwin



Re: [Xerces2] Scanner Interfaces

Posted by Andy Clark <an...@apache.org>.
Edwin Goei wrote:
> When you said you ran it through the OASIS test suite, what do you mean
> by this?  As I understand it the current tests at OASIS are broken and
> there is a fixed version elsewhere, but it requires manually
> interpreting over 1200 negative tests to verify passing those tests.

Basically, what I was trying to say is that after making the
change, I ran the test suite and compared the results with a
previous run of the test suite. A regression would have 
affected the results. Yet the results were the same so that
led me to assume that my changes were a qualified "ok".

The IBM team has also been running it through their modified 
test suite and seems pretty confident in the results. There 
are a few more minor problems but it looks good for the most 
part.


> Does this mean that Xerces 2 must read in and parse this config file
> before parsing any source documents?

No. I was only describing what *could* be done with the
new scanner definitions. Not what *is* done in the code
now. But by describing the configuration in an XML file,
we can do various things with it, including:

  * actually read it in to build the standard configuration
  * allow users to quickly and easily make custom configs
  * write a tool that reads the parser configuration and
    generates the code for that config

The last one seems more appropriate. Imagine a tool that
reads in your parser configuration description and then
generates the configuration class, and the JAXP impl that
will instantiate the various parsers using that config.
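
Just to give a feel for the idea, a very rough sketch of such a
generator follows. It takes the list of stage class names for one
pipeline and emits Java source for a class that instantiates them in
order. Everything here (ConfigCodeGenSketch, the shape of the
generated class, the example.* class names) is hypothetical. A real
tool would generate a proper configuration class and the JAXP
implementation as well.

```java
import java.util.Arrays;
import java.util.List;

public class ConfigCodeGenSketch {

    /** Emits Java source for a class that instantiates the given stages in order. */
    static String generate(String className, List<String> stageClassNames) {
        StringBuilder src = new StringBuilder();
        src.append("public class ").append(className).append(" {\n");
        src.append("    public Object[] createPipeline() throws Exception {\n");
        src.append("        return new Object[] {\n");
        for (String stage : stageClassNames) {
            // Each stage is assumed to have a public no-argument constructor.
            src.append("            new ").append(stage).append("(),\n");
        }
        src.append("        };\n");
        src.append("    }\n");
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        // Made-up stage names, as they might be read from a configuration description.
        List<String> stages = Arrays.asList("example.DocumentScanner", "example.DTDValidator");
        System.out.print(generate("GeneratedConfiguration", stages));
    }
}
```

Because the output is ordinary source code, nothing has to be parsed
at startup; the configuration description is only needed at build
time.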

> Did you get any takers?  I'd like to get more familiar with this parser
> config scheme so I may have some time to help out.

Nope. I'd really appreciate any help you can offer in 
this area. Let me know if you have any questions
regarding the components and their use.

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org



Re: [Xerces2] Scanner Interfaces

Posted by Edwin Goei <ed...@sun.com>.
First, sorry it took so long to get around to reading this message.  My
comments/questions are inline...

Andy Clark wrote:
> 
> Everyone,
> 
> Okay, I have added the scanner interfaces and made the changes
> to the Xerces2 reference implementation to implement this. I
> built it and ran it through the OASIS test suite. Nothing seems
> to have broken so I went ahead and committed the changes.

When you said you ran it through the OASIS test suite, what do you mean
by this?  As I understand it the current tests at OASIS are broken and
there is a fixed version elsewhere, but it requires manually
interpreting over 1200 negative tests to verify passing those tests.

> 
> This is actually quite cool because of the following reasons:
> 
> 1) We can define a parser configuration's pipeline dynamically.
>    For example, through an XML config file or something. That
>    way people could create new configurations (e.g. adding an
>    XInclude filter in the pipeline) by tweaking the standard
>    configuration file instead of having to create a new
>    XMLParserConfiguration duplicating a lot of code.

Does this mean that Xerces 2 must read in and parse this config file
before parsing any source documents?

> Next, I'll be updating the XNI Manual in the documentation to
> detail these new interfaces. (I'll have to force myself 'cause
> I'm really feeling burnt out writing so much documentation. :)
> Anybody out there want to help with the documentation? We
> still need some docs on how to build parser configurations
> that use the Xerces2 standard components -- because they have
> certain dependencies, etc. So that's something that still
> needs to be done.

Did you get any takers?  I'd like to get more familiar with this parser
config scheme so I may have some time to help out.

-Edwin
