Posted to p-dev@xerces.apache.org by Rodent of Unusual Size <Ke...@Golux.Com> on 2001/07/27 13:00:16 UTC
Hola, folks.. This is not really a PMC issue, but a fast
New to actually *using* XML and Xerces, so please be gentle..
I am probably missing some obvious pieces.
I have three [basic] questions with which I hope someone
can help me..
I am using XML::Xerces::DOM_Document::createDocument() to
create a DOM tree, and populating it with the appropriate
calls as I process the data I want to represent. Knowing
next to nothing about the DOM, I am essentially following the
sample apps' example blindly. After completing the model,
I am emitting it as XML using XML::Xerces::DOMParse::format()
and XML::Xerces::DOMParse::print().
I am finding the Xerces-P and Xerces-C documentation pretty
bloody opaque. For instance, one major facet it seems to be
missing is *examples*..
1. How can I set the additional pieces of the DOCTYPE from
Xerces-P? I can set the name, but I do not see any way to
set the SYSTEM/PUBLIC identifier keyword and the external
subset URL. Trying to set them in the createDocumentType()
call causes a segfault. :-)
2. How can I persuade the printing method to *not* turn "
into &quot; in my processing instructions??? 'type="text/css"'
gets turned into 'type=&quot;text/css&quot;' which is really
annoying..
3. How can I convince the printing/formatting method that *some*
elements should not be newline-and-indented? This is adding
incorrect whitespace. For instance, something that should be
<foo>this is[<bar>something</bar>] to see</foo>
is being emitted as
<foo>this is[
<bar>something</bar>] to see
</foo>
The whitespace between '[' and '<bar>something' is screwing up the
result.
TIA! :-)
--
#ken P-)}
Ken Coar, Sanagendamgagwedweinini http://Golux.Com/coar/
Apache Software Foundation http://www.apache.org/
"Apache Server for Dummies" http://Apache-Server.Com/
"Apache Server Unleashed" http://ApacheUnleashed.Com/
---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-p-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-p-dev-help@xml.apache.org
Re: Doctype attributes, and formatting in the DOM
Posted by Rodent of Unusual Size <Ke...@Golux.Com>.
Dunno what happened to the subject of my previous message.
This subject is more appropriate for the thread..
--
#ken P-)}
---------------------------------------------------------------------
Re: Doctype attributes, and formatting in the DOM
Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
Hey Ken,
Glad to hear that things are finally going your way.
"Rodent of Unusual Size" <Ke...@Golux.Com> writes:
> Aha. Now I just need to figure out how to get a handle on that
> <lexicon/> element createDocument() added so I can futz with
> it.. found it, getDocumentElement(). Man, wading through the docco
> to find that was a pain. This stuff is clearly written for people
> who are already very familiar with XML and the DOM. :-(
Ah... That is actually quite a good point. There really is no tutorial
for XML, DOM, or SAX in Xerces, so if you don't know it, then you're
SOL. I hadn't thought about that. I've pretty much taught myself XML
over the past two years, so I forget what it's like.
In terms of using DOM to do anything, I'd look at the DOM-related
tests in t/. They'll give you a clue as to how I do it. Remember that
I've made some Perl-specific adjustments that you can read about in the
README for anything that returns NodeLists or NamedNodeMaps (who
wants ugly heavy-weight C++ objects when you can get nice Perl arrays
and hashes?). Also getElementsByTagName() and getElementById() are your
friends.
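
For instance, once createDocument() has built the root element,
something along these lines should let you hang new nodes off it. This
is only a sketch: createElement(), createTextNode() and appendChild()
are the standard DOM Level 1 methods, and it assumes Xerces-P exposes
them unchanged. The 'entry' element and its text are made up for
illustration.

```perl
use strict;
use XML::Xerces;

my $impl = XML::Xerces::DOM_DOMImplementation::getImplementation();
my $dt   = $impl->createDocumentType('lexicon', '', 'ap-dict.dtd');
my $doc  = $impl->createDocument('lexicon', 'lexicon', $dt);

# The root element that createDocument() already added to the tree.
my $root = $doc->getDocumentElement();

# Hypothetical child element with some text content (assumed API:
# standard DOM createElement/createTextNode/appendChild).
my $entry = $doc->createElement('entry');
$entry->appendChild($doc->createTextNode('aardvark'));
$root->appendChild($entry);
```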
> On which list are you going to bring up the documentation issue?
This one. More on that later.
jas.
---------------------------------------------------------------------
Re: Doctype attributes, and formatting in the DOM
Posted by Rodent of Unusual Size <Ke...@Golux.Com>.
"Jason E. Stewart" wrote:
>
> The problem is after you run createDocument() your document already
> has two child nodes, the doctype node, and an element node
> <lexicon/>. When you use appendChild() it puts the XMLDecl node
> in the wrong place, it wants to be first in the list, so use
> insertBefore():
Aha. Now I just need to figure out how to get a handle on that
<lexicon/> element createDocument() added so I can futz with
it.. found it, getDocumentElement(). Man, wading through the
docco to find that was a pain. This stuff is clearly written
for people who are already very familiar with XML and the DOM. :-(
On which list are you going to bring up the documentation issue?
--
#ken P-)}
---------------------------------------------------------------------
Re: Hola, folks.. This is not really a PMC issue, but a fast
Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
Hi Ken,
"Rodent of Unusual Size" <Ke...@Golux.Com> writes:
> Okey, I tried that, and I ran into an entirely new set of problems:
>
> use XML::Xerces;
> use XML::Xerces::DOMParse;
>
> $DOMimpl = XML::Xerces::DOM_DOMImplementation::getImplementation();
> $doctype = $DOMimpl->createDocumentType('lexicon', '', 'ap-dict.dtd');
> $doc = $DOMimpl->createDocument('lexicon', 'lexicon', $doctype);
> $XMLdecl = $doc->createXMLDecl("1.0", "utf-8", 'yes');
> $doc->appendChild($XMLdecl);
> $XML::Xerces::DOMParse::INDENT = " ";
> XML::Xerces::DOMParse::format($doc);
> XML::Xerces::DOMParse::print(\*STDOUT, $doc);
The problem is that after you run createDocument() your document already
has two child nodes: the doctype node and an element node,
<lexicon/>. When you use appendChild() it puts the XMLDecl node in the
wrong place; it wants to be first in the list, so use insertBefore():
  DB<19> $doc->insertBefore($XMLdecl,$doctype)
  DB<20> x $doc->getChildNodes
0  XML::Xerces::DOM_XMLDecl=HASH(0x1066baa8)
     empty hash
1  XML::Xerces::DOM_DocumentType=HASH(0x1066cb08)
     empty hash
2  XML::Xerces::DOM_Element=HASH(0x1066ca60)
     empty hash
  DB<21> XML::Xerces::DOMParse::print(\*STDOUT,$doc)
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE lexicon SYSTEM 'ap-dict.dtd' >
<lexicon/>
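
Putting the pieces from this message together, the whole script might
end up looking something like the following. It is only a sketch
assembled from the calls quoted above: it assumes the DOM_* classes of
the Xerces-P release discussed in this thread, and it leaves out the
format() call because the debugger session above only exercised
print().

```perl
#!/usr/bin/perl -w
use strict;
use XML::Xerces;
use XML::Xerces::DOMParse;

# Build the doctype first: name, public ID (empty here), system ID.
my $impl    = XML::Xerces::DOM_DOMImplementation::getImplementation();
my $doctype = $impl->createDocumentType('lexicon', '', 'ap-dict.dtd');

# createDocument() attaches the doctype and creates the <lexicon/> root.
my $doc = $impl->createDocument('lexicon', 'lexicon', $doctype);

# The XMLDecl has to be the first child, so insert it before the
# doctype instead of appending it after the root element.
my $xmldecl = $doc->createXMLDecl('1.0', 'utf-8', 'yes');
$doc->insertBefore($xmldecl, $doctype);

XML::Xerces::DOMParse::print(\*STDOUT, $doc);
```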
> > > I am finding the Xerces-P and Xerces-C documentation pretty
> > > bloody opaque. For instance, one major facet it seems to be
> > > missing is *examples*..
> >
> > Yes, that seems to be everyone's comment. So far, I have no
> > takers for helping me improve the docs (nor, in fact, have I
> > had any volunteers to help me improve the code, which would
> > also be welcome ;-)
>
> I would be glad to help out if and how. I do a lot of writing,
> and my incredibly neophyte perspective on this stuff might be
> useful.
Ok, I'll start a discussion on the list of what needs to be done and
different ways that we can do it.
> > > 2. How can I persuade the printing method to *not* turn "
> > > into &quot; in my processing instructions??? 'type="text/css"'
> > > gets turned into 'type=&quot;text/css&quot;' which is really
> > > annoying..
> >
> > Hmmm... I would fiddle with
> > $parser->setCreateEntityReferenceNodes(). If you set this to '1' it
> > should leave your entities alone. Let us know if this succeeds.
>
> Well, I *want* entities encoded everywhere except in the
> processing directives.
Sorry, I'm not sure. I've found some other issues with how Xerces
handles entities, so I haven't dug into it much further...
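
One workaround that was not suggested in this thread is to leave the
serializer alone and post-process its output, undoing the &quot;
escaping only inside processing instructions. A minimal pure-Perl
sketch, assuming the serialized XML is already held in a string; the
unescape_pi_quotes() helper and the sample input are made up for
illustration and are not part of Xerces-P.

```perl
use strict;
use warnings;

# Hypothetical helper: turn &quot; back into " inside <?...?> only,
# leaving attribute values and text content untouched.
sub unescape_pi_quotes {
    my ($xml) = @_;
    $xml =~ s{(<\?.*?\?>)}{
        my $pi = $1;            # copy before running the inner s///
        $pi =~ s/&quot;/"/g;
        $pi;
    }gse;
    return $xml;
}

# Made-up sample of serializer output.
my $serialized =
      qq{<?xml-stylesheet type=&quot;text/css&quot; href=&quot;lexicon.css&quot;?>\n}
    . qq{<lexicon note="a &quot;quoted&quot; word"/>\n};

print unescape_pi_quotes($serialized);
```

The pattern is deliberately naive: it would also rewrite anything that
merely looks like a processing instruction inside a comment or CDATA
section, so treat it as a stopgap rather than a fix.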
jas.
---------------------------------------------------------------------
Re: Hola, folks.. This is not really a PMC issue, but a fast
Posted by Rodent of Unusual Size <Ke...@Golux.Com>.
Thanks for the assist, Jason. I still have some issues..
"Jason E. Stewart" wrote:
>
> "Rodent of Unusual Size" <Ke...@Golux.Com> writes:
>
> > I am using XML::Xerces::DOM_Document::createDocument() to
> > create a DOM tree, and populating it with the appropriate
> > calls as I process the data I want to represent.
>
> I do not know why this interface exists; as far as I can tell, it is
> broken in Xerces-C. Check out the DOMException.t example for creating
> a document using the DOM_DOMImplementation::createDocument interface
> instead:
>
> my $impl = XML::Xerces::DOM_DOMImplementation::getImplementation();
> my $dt = $impl->createDocumentType('Foo', '', 'Foo.dtd');
> my $doc = $impl->createDocument('Foo', 'foo',$dt);
Okey, I tried that, and I ran into an entirely new set of problems:
use XML::Xerces;
use XML::Xerces::DOMParse;
$DOMimpl = XML::Xerces::DOM_DOMImplementation::getImplementation();
$doctype = $DOMimpl->createDocumentType('lexicon', '', 'ap-dict.dtd');
$doc = $DOMimpl->createDocument('lexicon', 'lexicon', $doctype);
$XMLdecl = $doc->createXMLDecl("1.0", "utf-8", 'yes');
$doc->appendChild($XMLdecl);
$XML::Xerces::DOMParse::INDENT = " ";
XML::Xerces::DOMParse::format($doc);
XML::Xerces::DOMParse::print(\*STDOUT, $doc);
gives me
./logerr-extract.pl: couldn't find an XMLDecl node, try
$parser->setToCreateXMLDeclTypeNode(1) at
/usr/local/lib/perl5/site_perl/5.6.0/i686-linux/XML/Xerces/DOMParse.pm line 269.
Eh?
> > I am finding the Xerces-P and Xerces-C documentation pretty
> > bloody opaque. For instance, one major facet it seems to be
> > missing is *examples*..
>
> Yes, that seems to be everyone's comment. So far, I have no
> takers for helping me improve the docs (nor, in fact, have I
> had any volunteers to help me improve the code, which would
> also be welcome ;-)
I would be glad to help out if and how. I do a lot of writing,
and my incredibly neophyte perspective on this stuff might be
useful.
I can tell you right off the top one thing that would make
the docco *much* more useful for me, and possibly others:
ditch the organisation-by-object as the sole index. If I want
details about a createFoo method, I want to be able to go to the
TOC/index and look for "createFoo", not drill down to it through
the object tree. If multiple object types implement createFoo,
fine, list/xref them -- but 'createFoo' should be in the index.
> > 2. How can I persuade the printing method to *not* turn "
> > into &quot; in my processing instructions??? 'type="text/css"'
> > gets turned into 'type=&quot;text/css&quot;' which is really
> > annoying..
>
> Hmmm... I would fiddle with
> $parser->setCreateEntityReferenceNodes(). If you set this to '1' it
> should leave your entities alone. Let us know if this succeeds.
Well, I *want* entities encoded everywhere except in the
processing directives.
--
#ken P-)}
---------------------------------------------------------------------
Re: Hola, folks.. This is not really a PMC issue, but a fast
Posted by "Jason E. Stewart" <ja...@openinformatics.com>.
"Rodent of Unusual Size" <Ke...@Golux.Com> writes:
> I am using XML::Xerces::DOM_Document::createDocument() to
> create a DOM tree, and populating it with the appropriate
> calls as I process the data I want to represent.
I do not know why this interface exists; as far as I can tell, it is
broken in Xerces-C. Check out the DOMException.t example for creating
a document using the DOM_DOMImplementation::createDocument interface
instead:
my $impl = XML::Xerces::DOM_DOMImplementation::getImplementation();
my $dt = $impl->createDocumentType('Foo', '', 'Foo.dtd');
my $doc = $impl->createDocument('Foo', 'foo',$dt);
> Knowing next to nothing about the DOM, I am essentially following
> the sample apps' example blindly. After completing the model, I am
> emitting it as XML using XML::Xerces::DOMParse::format() and
> XML::Xerces::DOMParse::print().
DOMParse::print() works well (although the interface was designed by
Tom Watson a couple of years ago and really needs an overhaul), but if
you want something simpler just use the serialize() method for
DOM_Node. It won't format the output, but in most cases, that's
unnecessary.
> I am finding the Xerces-P and Xerces-C documentation pretty
> bloody opaque. For instance, one major facet it seems to be
> missing is *examples*..
Yes, that seems to be everyone's comment. So far, I have no takers for
helping me improve the docs (nor, in fact, have I had any volunteers
to help me improve the code, which would also be welcome ;-)
> 1. How can I set the additional pieces of the DOCTYPE from
> Xerces-P? I can set the name, but I do not see any way to
> set the SYSTEM/PUBLIC identifier keyword and the external
> subset URL. Trying to set them in the createDocumentType()
> call causes a segfault. :-)
See the example above.
> 2. How can I persuade the printing method to *not* turn "
> into &quot; in my processing instructions??? 'type="text/css"'
> gets turned into 'type=&quot;text/css&quot;' which is really
> annoying..
Hmmm... I would fiddle with
$parser->setCreateEntityReferenceNodes(). If you set this to '1' it
should leave your entities alone. Let us know if this succeeds.
> 3. How can I convince the printing/formatting method that *some*
> elements should not be newline-and-indented? This is adding
> incorrect whitespace. For instance, something that should be
>
> <foo>this is[<bar>something</bar>] to see</foo>
>
> is being emitted as
>
> <foo>this is[
> <bar>something</bar>] to see
> </foo>
>
> The whitespace between '[' and '<bar>something' is screwing up the
> result.
DOMParse may have to be rewritten. I haven't looked at its interface
since I ported it to Xerces.pm-1.3.
jas.