Posted to p-dev@xerces.apache.org by Rodent of Unusual Size <Ke...@Golux.Com> on 2001/07/27 13:00:16 UTC

Hola, folks.. This is not really a PMC issue, but a fast

New to actually *using* XML and Xerces, so please be gentle..
I am probably missing some obvious pieces.

I have three [basic] questions with which I hope someone
can help me..

I am using XML::Xerces::DOM_Document::createDocument() to
create a DOM tree, and populating it with the appropriate
calls as I process the data I want to represent.  Knowing
next to nothing about the DOM, I am essentially following the
sample apps' example blindly.  After completing the model,
I am emitting it as XML using XML::Xerces::DOMParse::format()
and XML::Xerces::DOMParse::print().

I am finding the Xerces-P and Xerces-C documentation pretty
bloody opaque.  For instance, one major facet it seems to be
missing is *examples*..

1. How can I set the additional pieces of the DOCTYPE from
Xerces-P?  I can set the name, but I do not see any way to
set the SYSTEM/PUBLIC identifier keyword and the external
subset URL.  Trying to set them in the createDocumentType()
call causes a segfault. :-)

2. How can I persuade the printing method to *not* turn "
into &quot; in my processing instructions???  'type="text/css"'
gets turned into 'type=&quot;text/css&quot;' which is really
annoying..

3. How can I convince the printing/formatting method that *some*
elements should not be newline-and-indented?  This is adding
incorrect whitespace.  For instance, something that should be

  <foo>this is[<bar>something</bar>] to see</foo>

is being emitted as

  <foo>this is[
   <bar>something</bar>] to see
  </foo>

The whitespace between '[' and '<bar>something' is screwing up the
result.

TIA! :-)
-- 
#ken    P-)}

Ken Coar, Sanagendamgagwedweinini  http://Golux.Com/coar/
Apache Software Foundation         http://www.apache.org/
"Apache Server for Dummies"        http://Apache-Server.Com/
"Apache Server Unleashed"          http://ApacheUnleashed.Com/

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-p-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-p-dev-help@xml.apache.org

Re: Doctype attributes, and formatting in the DOM

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.

Dunno what happened to the subject of my previous message.
This subject is more appropriate for the thread..
-- 
#ken    P-)}

Ken Coar, Sanagendamgagwedweinini  http://Golux.Com/coar/
Apache Software Foundation         http://www.apache.org/
"Apache Server for Dummies"        http://Apache-Server.Com/
"Apache Server Unleashed"          http://ApacheUnleashed.Com/


Re: Doctype attributes, and formatting in the DOM

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.

Hey Ken,

Glad to hear that things are finally going your way.

"Rodent of Unusual Size" <Ke...@Golux.Com> writes:

> Aha.  Now I just need to figure out how to get a handle on that
> <lexicon/> element createDocument() added so I can futz with
> it.. found it, getDocumentElement().  Man, wading through the docco
> to find that was a pain.  This stuff is clearly written for people
> who are already very familiar with XML and the DOM. :-(

Ah... That is actually quite a good point. There really is no tutorial
for XML, DOM, or SAX in Xerces, so if you don't know it, then you're
SOL. I hadn't thought about that. I've pretty much taught myself XML
over the past two years, so I forget what it's like.

In terms of using DOM to do anything, I'd look at the DOM-related
tests in t/. They'll give you a clue as to how I do it. Remember that
I've made some Perl-specific adjustments, described in the README, to
anything that returns NodeLists or NamedNodeMaps (who wants ugly
heavy-weight C++ objects when you can get nice Perl arrays and
hashes?). Also, getElementsByTagName() and getElementById() are your
friends.

> On which list are you going to bring up the documentation issue?

This one. More on that later.

jas.


Re: Doctype attributes, and formatting in the DOM

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.

"Jason E. Stewart" wrote:
> 
> The problem is after you run createDocument() your document already
> has two child nodes, the doctype node, and an element node
> <lexicon/>. When you use appendChild() it puts the XMLDecl node
> in the wrong place, it wants to be first in the list, so use
> insertBefore():

Aha.  Now I just need to figure out how to get a handle on that
<lexicon/> element createDocument() added so I can futz with
it.. found it, getDocumentElement().  Man, wading through the
docco to find that was a pain.  This stuff is clearly written
for people who are already very familiar with XML and the DOM. :-(

On which list are you going to bring up the documentation issue?
-- 
#ken    P-)}

Ken Coar, Sanagendamgagwedweinini  http://Golux.Com/coar/
Apache Software Foundation         http://www.apache.org/
"Apache Server for Dummies"        http://Apache-Server.Com/
"Apache Server Unleashed"          http://ApacheUnleashed.Com/


Re: Hola, folks.. This is not really a PMC issue, but a fast

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.

Hi Ken,

"Rodent of Unusual Size" <Ke...@Golux.Com> writes:

> Okey, I tried that, and I ran into an entirely new set of problems:
> 
>   use XML::Xerces;
>   use XML::Xerces::DOMParse;
> 
>   $DOMimpl = XML::Xerces::DOM_DOMImplementation::getImplementation();
>   $doctype = $DOMimpl->createDocumentType('lexicon', '', 'ap-dict.dtd');
>   $doc = $DOMimpl->createDocument('lexicon', 'lexicon', $doctype);
>   $XMLdecl = $doc->createXMLDecl("1.0", "utf-8", 'yes');
>   $doc->appendChild($XMLdecl);
>   $XML::Xerces::DOMParse::INDENT = " ";
>   XML::Xerces::DOMParse::format($doc);
>   XML::Xerces::DOMParse::print(\*STDOUT, $doc);

The problem is that after you run createDocument(), your document
already has two child nodes: the doctype node and an element node
<lexicon/>. When you use appendChild(), it puts the XMLDecl node in the
wrong place. It wants to be first in the list, so use insertBefore():

  DB<19> $doc->insertBefore($XMLdecl,$doctype)
  DB<20> x $doc->getChildNodes
0  XML::Xerces::DOM_XMLDecl=HASH(0x1066baa8)
     empty hash
1  XML::Xerces::DOM_DocumentType=HASH(0x1066cb08)
     empty hash
2  XML::Xerces::DOM_Element=HASH(0x1066ca60)
     empty hash
  DB<21> XML::Xerces::DOMParse::print(\*STDOUT,$doc)
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE lexicon SYSTEM 'ap-dict.dtd' >
<lexicon/>

> > > I am finding the Xerces-P and Xerces-C documentation pretty
> > > bloody opaque.  For instance, one major facet it seems to be
> > > missing is *examples*..
> > 
> > Yes, that seems to be everyone's comment. So far, I have no
> > takers for helping me improve the docs (nor, in fact, have I
> > had any volunteers to help me improve the code, which would
> > also be welcome ;-)
> 
> I would be glad to help out if and how.  I do a lot of writing,
> and my incredibly neophyte perspective on this stuff might be
> useful.

Ok, I'll start a discussion on the list of what needs to be done and
different ways that we can do it.

> > > 2. How can I persuade the printing method to *not* turn "
> > > into &quot; in my processing instructions???  'type="text/css"'
> > > gets turned into 'type=&quot;text/css&quot;' which is really
> > > annoying..
> > 
> > Hmmm... I would fiddle with
> > $parser->setCreateEntityReferenceNodes(). If you set this to '1' it
> > should leave your entities alone. Let us know if this succeeds.
> 
> Well, I *want* entities encoded everywhere except in the
> processing directives.

Sorry, I'm not sure. I've found some other issues with how Xerces
handles entities, so I haven't dug into it much further...

jas.
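
For the processing-instruction question above, one possible workaround
is to post-process the serialized text rather than change how the tree
is printed. This is only a sketch under assumptions: it presumes you
already have the document as a string (for example from the serialize()
method mentioned elsewhere in this thread), and the helper name
unescape_pi_quotes is made up for illustration; it is not part of
Xerces-P. It rewrites &quot; back to " only inside <? ... ?> spans, so
entities stay encoded everywhere else:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Xerces): restore literal quotes
# inside processing instructions only, leaving &quot; elsewhere untouched.
sub unescape_pi_quotes {
    my ($xml) = @_;
    # Match each <? ... ?> span non-greedily (/s lets it span lines) and
    # rewrite &quot; within a copy of that match only.
    $xml =~ s{(<\?.*?\?>)}{ (my $pi = $1) =~ s/&quot;/"/g; $pi }gse;
    return $xml;
}

# Sample input: one mangled processing instruction, plus an attribute
# whose &quot; must stay encoded.
my $in = qq{<?xml-stylesheet type=&quot;text/css&quot; href=&quot;a.css&quot;?>\n}
       . qq{<foo note="say &quot;hi&quot;">text</foo>\n};
print unescape_pi_quotes($in);
```

With the sample input, the first line comes out as
<?xml-stylesheet type="text/css" href="a.css"?> while the &quot; in the
note attribute of <foo> is left alone. A regex pass like this is blunt:
it would also touch text inside a comment or CDATA section that happens
to look like a processing instruction.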


Re: Hola, folks.. This is not really a PMC issue, but a fast

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.

Thanks for the assist, Jason.  I still have some issues..

"Jason E. Stewart" wrote:
> 
> "Rodent of Unusual Size" <Ke...@Golux.Com> writes:
> 
> > I am using XML::Xerces::DOM_Document::createDocument() to
> > create a DOM tree, and populating it with the appropriate
> > calls as I process the data I want to represent.
> 
> I do not know why this interface exists; as far as I can tell, it is
> broken in Xerces-C. Check out the DOMException.t example for creating
> a document using the DOM_DOMImplementation::createDocument interface
> instead:
> 
> my $impl = XML::Xerces::DOM_DOMImplementation::getImplementation();
> my $dt = $impl->createDocumentType('Foo', '', 'Foo.dtd');
> my $doc = $impl->createDocument('Foo', 'foo',$dt);

Okey, I tried that, and I ran into an entirely new set of problems:

  use XML::Xerces;
  use XML::Xerces::DOMParse;

  $DOMimpl = XML::Xerces::DOM_DOMImplementation::getImplementation();
  $doctype = $DOMimpl->createDocumentType('lexicon', '', 'ap-dict.dtd');
  $doc = $DOMimpl->createDocument('lexicon', 'lexicon', $doctype);
  $XMLdecl = $doc->createXMLDecl("1.0", "utf-8", 'yes');
  $doc->appendChild($XMLdecl);
  $XML::Xerces::DOMParse::INDENT = " ";
  XML::Xerces::DOMParse::format($doc);
  XML::Xerces::DOMParse::print(\*STDOUT, $doc);

gives me

  ./logerr-extract.pl: couldn't find an XMLDecl node, try
    $parser->setToCreateXMLDeclTypeNode(1) at
    /usr/local/lib/perl5/site_perl/5.6.0/i686-linux
    /XML/Xerces/DOMParse.pm line 269.

Eh?

> > I am finding the Xerces-P and Xerces-C documentation pretty
> > bloody opaque.  For instance, one major facet it seems to be
> > missing is *examples*..
> 
> Yes, that seems to be everyone's comment. So far, I have no
> takers for helping me improve the docs (nor, in fact, have I
> had any volunteers to help me improve the code, which would
> also be welcome ;-)

I would be glad to help out if and how.  I do a lot of writing,
and my incredibly neophyte perspective on this stuff might be
useful.

I can tell you right off the top one thing that would make
the docco *much* more useful for me, and possibly others:
ditch the organisation-by-object as the sole index.  If I want
details about a createFoo method, I want to be able to go to the
TOC/index and look for "createFoo", not drill down to it through
the object tree.  If multiple object types implement createFoo,
fine, list/xref them -- but 'createFoo' should be in the index.

> > 2. How can I persuade the printing method to *not* turn "
> > into &quot; in my processing instructions???  'type="text/css"'
> > gets turned into 'type=&quot;text/css&quot;' which is really
> > annoying..
> 
> Hmmm... I would fiddle with
> $parser->setCreateEntityReferenceNodes(). If you set this to '1' it
> should leave your entities alone. Let us know if this succeeds.

Well, I *want* entities encoded everywhere except in the
processing directives.
-- 
#ken    P-)}

Ken Coar, Sanagendamgagwedweinini  http://Golux.Com/coar/
Apache Software Foundation         http://www.apache.org/
"Apache Server for Dummies"        http://Apache-Server.Com/
"Apache Server Unleashed"          http://ApacheUnleashed.Com/


Re: Hola, folks.. This is not really a PMC issue, but a fast

Posted by "Jason E. Stewart" <ja...@openinformatics.com>.

"Rodent of Unusual Size" <Ke...@Golux.Com> writes:

> I am using XML::Xerces::DOM_Document::createDocument() to
> create a DOM tree, and populating it with the appropriate
> calls as I process the data I want to represent.  

I do not know why this interface exists; as far as I can tell, it is
broken in Xerces-C. Check out the DOMException.t example for creating
a document using the DOM_DOMImplementation::createDocument interface
instead:

my $impl = XML::Xerces::DOM_DOMImplementation::getImplementation();
my $dt = $impl->createDocumentType('Foo', '', 'Foo.dtd');
my $doc = $impl->createDocument('Foo', 'foo',$dt);

> Knowing next to nothing about the DOM, I am essentially following
> the sample apps' example blindly.  After completing the model, I am
> emitting it as XML using XML::Xerces::DOMParse::format() and
> XML::Xerces::DOMParse::print().

DOMParse::print() works well (although the interface was designed by
Tom Watson a couple of years ago and really needs an overhaul), but if
you want something simpler just use the serialize() method for
DOM_Node. It won't format the output, but in most cases, that's
unnecessary. 

> I am finding the Xerces-P and Xerces-C documentation pretty
> bloody opaque.  For instance, one major facet it seems to be
> missing is *examples*..

Yes, that seems to be everyone's comment. So far, I have no takers for
helping me improve the docs (nor, in fact, have I had any volunteers
to help me improve the code, which would also be welcome ;-)

> 1. How can I set the additional pieces of the DOCTYPE from
> Xerces-P?  I can set the name, but I do not see any way to
> set the SYSTEM/PUBLIC identifier keyword and the external
> subset URL.  Trying to set them in the createDocumentType()
> call causes a segfault. :-)

See the example above.

> 2. How can I persuade the printing method to *not* turn "
> into &quot; in my processing instructions???  'type="text/css"'
> gets turned into 'type=&quot;text/css&quot;' which is really
> annoying..

Hmmm... I would fiddle with
$parser->setCreateEntityReferenceNodes(). If you set this to '1' it
should leave your entities alone. Let us know if this succeeds.

> 3. How can I convince the printing/formatting method that *some*
> elements should not be newline-and-indented?  This is adding
> incorrect whitespace.  For instance, something that should be
> 
>   <foo>this is[<bar>something</bar>] to see</foo>
> 
> is being emitted as
> 
>   <foo>this is[
>    <bar>something</bar>] to see
>   </foo>
> 
> The whitespace between '[' and '<bar>something' is screwing up the
> result.

DOMParse may have to be rewritten. I haven't looked at its interface
since I ported it to Xerces.pm-1.3.

jas.
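
On the indentation question, the underlying rule a formatter needs is
that whitespace is significant in mixed content, so only elements whose
children are all elements can safely be newline-and-indented. A minimal
sketch of that rule in plain Perl follows. It does not use Xerces-P or
DOMParse at all: the hash-based node shape and the function names are
hypothetical, and text is assumed to be already escaped.

```perl
use strict;
use warnings;

# Hypothetical node shape (not a Xerces class): an element is
# { name => 'foo', kids => [ ... ] }; a text node is a plain string.

# Count direct children that are text with a non-whitespace character.
sub has_text_child {
    my ($el) = @_;
    return scalar grep { !ref($_) && /\S/ } @{ $el->{kids} };
}

# Serialize a node with no added whitespace at all.
sub inline_xml {
    my ($node) = @_;
    return $node unless ref $node;    # text node: emit as-is
    my $body = join '', map { inline_xml($_) } @{ $node->{kids} };
    return "<$node->{name}>$body</$node->{name}>";
}

# Indent element-only content; keep mixed content on a single line.
sub pretty_xml {
    my ($el, $depth) = @_;
    my $pad = ' ' x $depth;
    return $pad . inline_xml($el) . "\n"
        if !@{ $el->{kids} } || has_text_child($el);
    my $out = "$pad<$el->{name}>\n";
    # Whitespace-only text children are dropped here.
    $out .= pretty_xml($_, $depth + 1) for grep { ref } @{ $el->{kids} };
    return $out . "$pad</$el->{name}>\n";
}

my $foo = { name => 'foo', kids => [
    'this is[', { name => 'bar', kids => ['something'] }, '] to see',
] };
my $doc = { name => 'lexicon', kids => [ $foo ] };
print pretty_xml($doc, 0);
```

Here <lexicon> has only element children, so it is split across lines,
while <foo> contains text and is emitted intact as
<foo>this is[<bar>something</bar>] to see</foo> on one indented line.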
