Posted to p-dev@xerces.apache.org by Geert Theys <ge...@staf.pi.be> on 2001/07/24 10:32:02 UTC

example testing DTD

hello,

Can someone give me a working example, for Perl Xerces, of how to validate
an XML document against a DTD? With the limited documentation I'm stuck...


Thanks.




--
Geert Theys
Just Another Perl Hacker

Email : perl -e "print join('@', 'geert','staf.pi.be')"

        // Anyone who can't laugh at himself is taking life far too
seriously. -- Larry Wall //




---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-p-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-p-dev-help@xml.apache.org


Re: example testing DTD

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
"Geert Theys" <ge...@staf.pi.be> writes:

> > I'm afraid to say that Xerces.pm has not yet implemented
> > EntityResolver's completely (oh the shame ....). All the framework is
> > there. I just need to hook it up. I'll try to cut a release while I'm
> > here at the Open Source conference.
> 
> No problem. For the moment I'll use the XML::LibXML module. But the xerces-p
> module has nice prospects :)

So I got the EntityResolver code hooked up today. Give me a day or so,
and I'll have a test and a working example for you, and I'll cut a
release before the weekend.

jas.



RE: example testing DTD

Posted by Geert Theys <ge...@staf.pi.be>.
> Aha!! Now I finally understand (amazing how dense module maintainers
> can be sometimes ;-)
>
> Yes, I agree completely this type of high-level documentation is
> completely missing from Xerces-P/C/J
>
Yup, even the source code of the module is sometimes difficult to read.

> In Xerces, Parser::parse() does all the validation for you without
> needing to call the is_valid() method. (Just curious, why set
> validation(1) when you call is_valid() by hand? And what is the
> difference between validate() and is_valid()?).

The parser in LibXML only checks if your XML is well-formed.
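
For anyone following along, the difference Jason asks about can be sketched
as below. This is a minimal example, assuming a reasonably current
XML::LibXML; the DTD and the <note> document are made up for illustration
and are not from this thread. is_valid() just returns true or false, while
validate() dies and leaves the reason in $@.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical DTD and document, inlined so the sketch is self-contained.
my $dtd_str = <<'DTD';
<!ELEMENT note (to, body)>
<!ELEMENT to   (#PCDATA)>
<!ELEMENT body (#PCDATA)>
DTD

my $dtd = XML::LibXML::Dtd->parse_string($dtd_str);

# Well-formed, but <body> is missing, so it is not valid against the DTD.
my $doc = XML::LibXML->load_xml(string => '<note><to>Geert</to></note>');

# is_valid() only reports true or false.
my $ok = $doc->is_valid($dtd);
print $ok ? "valid\n" : "not valid\n";

# validate() throws, and the exception carries the reason.
my $reason;
eval { $doc->validate($dtd); 1 } or $reason = $@;
print "validate() said: $reason\n" if defined $reason;
```

So is_valid() suits a simple yes/no check, and validate() is the one to use
when the error text is needed.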


> Depending on which subclass of Parser you are using (i.e. DOMParser,
> SAXParser, SAX2XMLReader, or IDOMParser) you would handle things
> slightly differently for the steps after handling the DTD, but they
> all handle DTD's the same.
>

Ok

> The simplest way to deal with DTD's is to use SYSTEM and make sure
> that the DTD's are where you say they will be. If not, then you need to
> set up an EntityResolver. I believe this is all part of the XML spec,
> and so, although the other library is convenient, it's not actually
> implementing the spec...
>

Ah, so you can't set the DTD in your code; it depends on the XML. But the
XML gets sent from somewhere else and I need to validate it as it comes in,
and I can't rely on the SYSTEM thingie to be correct...


> I'm afraid to say that Xerces.pm has not yet implemented
> EntityResolver's completely (oh the shame ....). All the framework is
> there. I just need to hook it up. I'll try to cut a release while I'm
> here at the Open Source conference.

No problem. For the moment I'll use the XML::LibXML module. But the xerces-p
module has nice prospects :)

Friendly greets.




Re: example testing DTD

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
"Geert Theys" <ge...@staf.pi.be> writes:

> > If you've set DoSchema, it will use a schema to validate the document,
> > if you haven't it will expect a DTD.
> >
> 
> this is the header of the XML I receive:
> <!DOCTYPE PXMLServiceRequest SYSTEM "D:\DTD\PXMLServiceRequest.dtd">
> 
> But the DTD is not in that place, of course. Where in the program do I set
> where the DTD is?
> 
> In my other code I do(just to illustrate what I want):
> 
> use strict;
> use XML::LibXML;
> 
> # Read the DTD into a string
> my $dtdstr = do {
>     local $/;    # slurp mode: read the whole file at once
>     open(my $fh, '<', 'XMLS/PXMLStatusMessage.dtd') or die $!;
>     my $str = <$fh>;
>     close $fh;
>     $str;
> };
> # Turn validation on
> XML::LibXML->validation(1);
> 
> # Parse the XML and the DTD
> my $dtd1 = XML::LibXML::Dtd->parse_string($dtdstr);
> my $xml1 = XML::LibXML->new->parse_file('XMLS/PXMLStatusMessage.xml');
> 
> # Two ways to check validity. Both work rather well :)
> if(!$xml1->is_valid($dtd1)) {
>     print "Our XML is not valid!!!\n";
> }
> 
> eval { $xml1->validate($dtd1); }; print $@ if $@;

Aha!! Now I finally understand (amazing how dense module maintainers
can be sometimes ;-)

Yes, I agree completely this type of high-level documentation is
completely missing from Xerces-P/C/J

In Xerces, Parser::parse() does all the validation for you without
needing to call the is_valid() method. (Just curious, why set
validation(1) when you call is_valid() by hand? And what is the
difference between validate() and is_valid()?).

Depending on which subclass of Parser you are using (i.e. DOMParser,
SAXParser, SAX2XMLReader, or IDOMParser) you would handle things
slightly differently for the steps after handling the DTD, but they
all handle DTD's the same.

The simplest way to deal with DTD's is to use SYSTEM and make sure
that the DTD's are where you say they will be. If not, then you need to
set up an EntityResolver. I believe this is all part of the XML spec,
and so, although the other library is convenient, it's not actually
implementing the spec...

I'm afraid to say that Xerces.pm has not yet implemented
EntityResolver's completely (oh the shame ....). All the framework is
there. I just need to hook it up. I'll try to cut a release while I'm
here at the Open Source conference.
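
Independent of when Xerces.pm gets its EntityResolver hook, the lookup such
a resolver would do is simple enough to sketch in plain Perl. This is only
the mapping step, under the assumption that local copies of the DTDs live in
one directory; local_dtd_path() and the directory argument are hypothetical
names, not part of Xerces.pm, and the sketch does not touch the parser at
all.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Xerces.pm: map the system identifier
# from a DOCTYPE declaration to a local copy of the DTD, keeping only
# the file name and ignoring whatever directory the sender used.
sub local_dtd_path {
    my ($system_id, $dtd_dir) = @_;

    # Everything after the last slash or backslash, so Windows paths
    # and URL-style identifiers are treated alike.
    my ($name) = $system_id =~ m{([^/\\]+)\z};
    return unless defined $name;

    my $path = "$dtd_dir/$name";
    return unless -e $path;    # no local copy: let the caller decide
    return $path;
}

# Example with the identifier from this thread; 'XMLS' is assumed to be
# the directory holding the local DTDs.
my $local = local_dtd_path('D:\DTD\PXMLServiceRequest.dtd', 'XMLS');
print defined $local ? "use $local\n" : "no local copy of the DTD\n";
```

Matching on the file name alone is a deliberate simplification; two DTDs
with the same name from different senders would need a stricter lookup.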

Sorry,
jas.



RE: example testing DTD

Posted by Geert Theys <ge...@staf.pi.be>.
> > what you told me I already did.
>
> Sorry about that.

No problem about that.

>
> I had hoped that they were more clear... Any suggestions as to how to
> make them more useful are welcome.
>

Erm, for a trained C programmer they're good I think. But I just started to
learn C in my spare time (I have a basic understanding of C)...


> If you've set DoSchema, it will use a schema to validate the document,
> if you haven't it will expect a DTD.
>

this is the header of the XML I receive:
<!DOCTYPE PXMLServiceRequest SYSTEM "D:\DTD\PXMLServiceRequest.dtd">

But the DTD is not in that place, of course. Where in the program do I set
where the DTD is?

In my other code I do(just to illustrate what I want):

use strict;
use XML::LibXML;

# Read the DTD into a string
my $dtdstr = do {
    local $/;    # slurp mode: read the whole file at once
    open(my $fh, '<', 'XMLS/PXMLStatusMessage.dtd') or die $!;
    my $str = <$fh>;
    close $fh;
    $str;
};
# Turn validation on
XML::LibXML->validation(1);

# Parse the XML and the DTD
my $dtd1 = XML::LibXML::Dtd->parse_string($dtdstr);
my $xml1 = XML::LibXML->new->parse_file('XMLS/PXMLStatusMessage.xml');

# Two ways to check validity. Both work rather well :)
if(!$xml1->is_valid($dtd1)) {
    print "Our XML is not valid!!!\n";
}

eval { $xml1->validate($dtd1); }; print $@ if $@;



As you can see, also with error handling :)
(Sorry about the use of another module)

>   my $error_handler = XML::Xerces::PerlErrorHandler->new();
>   $parser->setErrorHandler($error_handler);

Yup I saw that in the example and understood the use :)


> Once you're done, getDocument() will give you the Document object,
> with which you can begin picking apart the information...


With the parsing part I don't have any problems.


Thanx for the support.




Re: example testing DTD

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
"Geert Theys" <ge...@staf.pi.be> writes:

> hehe,
> 
> what you told me I already did. 

Sorry about that.

> I read the examples in samples and in t :) but don't really
> understand the validation scheme?  It doesn't use a DTD for
> validation but scheme?
> 
> I read the C docs too, but my Perl skills are a lot better than my C.
> I figured out how to use the xerces as parser, but on the DTD end I'm
> stuck....

I had hoped that they were more clear... Any suggestions as to how to
make them more useful are welcome.

Let's pick apart DOMCount.pl for a bit:

  my $parser = XML::Xerces::DOMParser->new();
  $parser->setValidationScheme ($validate);
  $parser->setDoNamespaces ($namespace);
  $parser->setCreateEntityReferenceNodes(1);
  $parser->setDoSchema ($schema);

First we create the parser object, then we set whatever parse flags we
need. Don't confuse ValidationScheme with schema: ValidationScheme can
take three values:

  * $XML::Xerces::DOMParser::Val_Always
  * $XML::Xerces::DOMParser::Val_Never
  * $XML::Xerces::DOMParser::Val_Auto

If you've set DoSchema, it will use a schema to validate the document,
if you haven't it will expect a DTD.

  my $error_handler = XML::Xerces::PerlErrorHandler->new();
  $parser->setErrorHandler($error_handler);

It's always a good idea to use an error handler. Otherwise parsing
just stops with no error message, not too friendly.


  $parser->parse (XML::Xerces::LocalFileInputSource->new($file));

Currently, only one of the parse() methods is available, so you need
to create an InputSource. parse() does all the work.

  my $doc = $parser->getDocument ();
  my $element_count = $doc->getElementsByTagName("*")->getLength();

Once you're done, getDocument() will give you the Document object,
with which you can begin picking apart the information...
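
Put together, the pieces above come to roughly the following. This is a
sketch assembled only from the calls shown in this walk-through, assuming
Xerces.pm is installed and loaded with "use XML::Xerces";
count_valid_elements() is a hypothetical wrapper name, the choice of
Val_Always is illustrative, and the eval is a precaution on the assumption
that the error handler reports problems by dying.

```perl
use strict;
use warnings;
use XML::Xerces;

# Hypothetical wrapper: parse and validate $file against its DTD, then
# return the number of elements in the document.
sub count_valid_elements {
    my ($file) = @_;

    my $parser = XML::Xerces::DOMParser->new();

    # Always validate; with DoSchema left off, the parser expects a DTD.
    $parser->setValidationScheme($XML::Xerces::DOMParser::Val_Always);
    $parser->setDoSchema(0);

    # Without an error handler, parsing just stops with no message.
    $parser->setErrorHandler(XML::Xerces::PerlErrorHandler->new());

    # parse() does the parsing and the validation in one go.
    eval { $parser->parse(XML::Xerces::LocalFileInputSource->new($file)); 1 }
        or die "$file did not parse or validate: $@\n";

    my $doc = $parser->getDocument();
    return $doc->getElementsByTagName("*")->getLength();
}

print "element count: ", count_valid_elements($ARGV[0]), "\n" if @ARGV;
```

Note that this still relies on the DOCTYPE in the file pointing at a DTD the
parser can actually find.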

HTH,
jas.





RE: example testing DTD

Posted by Geert Theys <ge...@staf.pi.be>.
hehe,

what you told me I already did. I read the examples in samples and in t :)
but don't really understand the validation scheme?
It doesn't use a DTD for validation but scheme?

I read the C docs too, but my Perl skills are a lot better than my C.
I figured out how to use the xerces as parser, but on the DTD end I'm
stuck....

Thanx for answering my mail.





Re: example testing DTD

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
"Geert Theys" <ge...@staf.pi.be> writes:

> Can someone give me a working example, for Perl Xerces, of how to
> validate an XML document against a DTD? With the limited documentation
> I'm stuck...

Hey Geert,

Welcome back to Xerces!

Check out the examples in the samples/ directory. Any one of them will
work for you. For example:

perl samples/DOMCount -v foo.xml

will validate the file foo.xml using the DOM interface. It creates a
DOMParser instance, an InputSource instance, sets the validation
scheme, and then calls parse() on the input source.

The tests in the t/ directory give a lot of low-level examples as
opposed to the more complete ones in samples/.

Also, I do recommend using the Xerces-C documentation:

http://xml.apache.org/xerces-c/apiDocs/index.html

They are very complete, and the Xerces.pm API maps almost 1-to-1 with
the Xerces-C API. The only differences are specifically dealt with in
the README:

Even though Xerces.pm is based on the C++ API, it has been modified in
a few ways to make it more accessible to typical Perl usage, primarily
in the handling:
* strings (DOMString, XMLCh, and perl string)
* lists   (DOM_NodeList and perl list)
* hashes  (DOM_NamedNodeMap and perl hash)
* DOMParse.pm (for serializing a DOM tree)
* implementing Perl handlers for C++ event callbacks
* handling C++ exceptions ({XML,DOM,SAX}Exception's)

HTH,
jas.

PS. I'm also accepting volunteers to help improve the "limited"
documentation ...

