Posted to c-users@xerces.apache.org by Ben Griffin <be...@redsnapper.net> on 2009/04/08 16:19:04 UTC

How to stop elements with no content from being converted to empty elements?

On Xerces-C 3.0.1, using DOMLSParser and DOMLSSerializer.

I am sure that I am missing something really obvious here.

If I have an element with no content, such as <xxx></xxx>, how can I
preserve it in that form, without it being normalised to the empty
element <xxx />?
Of course, I'm only interested in the final output; it doesn't really
matter to me how it looks internally.
Is it something I need to set in the schema, or an option that I
switch within the parser/serializer?

I am using the current set of options:

DOMLSParser options
fgXercesValidationErrorAsFatal, false
fgXercesUserAdoptsDOMDocument, true
fgXercesSchema, true
fgXercesContinueAfterFatalError, true
fgXercesCacheGrammarFromParse, true
fgXercesUseCachedGrammarInParse, true
fgDOMElementContentWhitespace, false
fgDOMNamespaces, true
fgDOMDatatypeNormalization, true
fgXercesLoadExternalDTD, false
fgXercesIgnoreCachedDTD, false
fgXercesIdentityConstraintChecking, true
fgDOMValidateIfSchema, true

DOMLSSerializer options
fgDOMErrorHandler, errorHandler
fgDOMWRTDiscardDefaultContent, true
fgDOMXMLDeclaration, false
fgDOMWRTWhitespaceInElementContent, true
fgDOMWRTBOM, false


Thanks for any help!


Re: How to stop elements with no content from being converted to empty elements?

Posted by Ben Griffin <be...@redsnapper.net>.
Hi Dale, I have been using the W3C's XHTML schema for XHTML 1.0 Strict,
just as you suggested below, for over four years.
The problem is that the following resulting elements will crash or
cause other problems on many legacy browsers:

<script />  <--- kills MSIE before version 6.
<div />  <--- destroys rendering on MSIE
<textarea />  <-- breaks MSIE.

Now, there are workarounds for the first two: using XML comments to
prevent the element from being collapsed, such as:

<script xmlns="http://www.w3.org/1999/xhtml"><!-- stop the closure --></script>

<div xmlns="http://www.w3.org/1999/xhtml"><!-- stop the closure --></div>

BUT within MSIE (still, unfortunately, the most popular and least
standards-compliant user agent) textarea will go awry:

<textarea xmlns="http://www.w3.org/1999/xhtml"><!-- this comment will appear IN the textarea on MSIE --></textarea>

I am assuming that this is something you (and possibly others) are  
unaware of, otherwise you would have mentioned it.
I guess the only option is to do some horrible post-processing over  
any result documents, looking for textarea elements using the xhtml  
namespace, and manually converting them.
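
For illustration, the post-processing pass described above could be
sketched roughly as follows. This is only a sketch in plain C++ with no
Xerces calls: the function name expandEmptyElements and the set of
element names are hypothetical, and it assumes the serialized output is
already held in a std::string and that names are listed exactly as they
appear in the text (including any namespace prefix).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper (not part of Xerces): rewrite "<name .../>" as
// "<name ...></name>" for every element whose name is in `names`.
// Deliberately naive: it scans text rather than parsing XML, so it does
// not understand CDATA sections or a '>' inside an attribute value.
std::string expandEmptyElements(const std::string& xml,
                                const std::set<std::string>& names) {
    std::string out;
    std::size_t pos = 0;
    while (pos < xml.size()) {
        const std::size_t lt = xml.find('<', pos);
        const std::size_t gt =
            (lt == std::string::npos) ? lt : xml.find('>', lt);
        if (gt == std::string::npos) {           // no further complete tag
            out.append(xml, pos, std::string::npos);
            break;
        }
        out.append(xml, pos, lt - pos);          // text before the tag
        const std::string tag = xml.substr(lt, gt - lt + 1);
        pos = gt + 1;

        // A self-closing tag is at least "<a/>" and ends in "/>".
        const bool selfClosing =
            tag.size() >= 4 && tag[tag.size() - 2] == '/';
        if (selfClosing) {
            // The name runs from after '<' to the first space or '/'.
            const std::size_t end = tag.find_first_of(" \t\r\n/", 1);
            assert(end != std::string::npos);    // the final '/' guarantees it
            const std::string name = tag.substr(1, end - 1);
            if (names.count(name) != 0) {
                // Drop "/>" and any whitespace in front of it.
                std::size_t keep = tag.size() - 2;
                while (keep > 0 &&
                       std::isspace(static_cast<unsigned char>(tag[keep - 1]))) {
                    --keep;
                }
                out.append(tag, 0, keep);
                out += "></" + name + ">";
                continue;
            }
        }
        out += tag;                              // leave every other tag alone
    }
    return out;
}
```

Calling it with the names "script", "div" and "textarea" on the
serializer's output would leave elements such as <br/> untouched. A
pass that works on the DOM before serialization would be more robust
than text scanning, if the extra work is acceptable.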

Initially I was handling this by using the XHTML Strict DTD, but
unfortunately I have not been able to cache it successfully, and the
W3C was getting hit with grammar loads every time I ran the executable,
so I moved back to the cached XHTML schema instead (which caches just
fine!)

Any suggestions / work-arounds are very welcome!

Ben

> On Wed, 2009-04-08 at 15:43 +0100, Ben Griffin wrote:
>> Thanks for your speedy response Dale,
>>
>> Yes, I know that within XML they are defined to be equivalent, but
>> there are many legacy SGML parsers that differentiate them, such as
>> Web browsers.
>> I was -hoping- that there would be a way of preserving the form of
>> the text, regardless of its equivalence.  But I am assuming from your
>> answer that this is probably not the case.
>
> If you're interested in generating text for consumption by web
> browsers, you may want to look at "XHTML".  If I have it right,
> XHTML's goal is to be both valid XML and acceptable HTML for browsers.
> Which means that XHTML generators might well do what you want so as
> to not have problems with web browsers.
>
> Dale
>
>


RE: How to stop elements with no content from being converted to empty elements?

Posted by Jesse Pelton <js...@PKC.com>.
 
> If you're interested in generating text for consumption by web
> browsers, you may want to look at "XHTML".  If I have it right,
> XHTML's goal is to be both valid XML and acceptable HTML for browsers.
> Which means that XHTML generators might well do what you want so as
> to not have problems with web browsers.

While you could in theory use any XML generator (like Xerces) to
generate XHTML, if you do so you need a way to deal with the problem
Ben raised. I use Xerces to generate content in the form of XHTML, but
its content-type is "text/html", because the reality is that the
dominant browser can't handle XHTML served as XHTML.  Consequently, the
short-form rendering of some tags (like <script .../> and <div .../>)
confuses most browsers.

My (ugly) solution is to make sure such tags have content, even if it's
whitespace.  Another would be to write a generalized serializer that
never uses the short form.  It would be better, though, to write a
serializer that knows what XHTML elements are problematic and guarantees
that they'll be emitted in long form.  You can think of this as using
Xerces to build an XHTML generator like that suggested by Dale.
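
To make the last idea concrete, here is a toy sketch of a serializer
that picks the long form per element name. It is not Xerces code and
does not show how DOMLSSerializer works: the Node struct, the serialize
function and the longForm set are all hypothetical stand-ins, attributes
and character escaping are left out, and it assumes a C++17 compiler.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node; this is not a Xerces type.
struct Node {
    std::string name;            // element name; empty means a text node
    std::string text;            // character data (text nodes only)
    std::vector<Node> children;  // child nodes (elements only)
};

// Serialize `node`, writing childless elements named in `longForm` as
// <x></x> and all other childless elements in the short form <x/>.
std::string serialize(const Node& node, const std::set<std::string>& longForm) {
    if (node.name.empty()) {
        assert(node.children.empty());  // text nodes have no children
        return node.text;               // escaping omitted in this sketch
    }
    if (node.children.empty()) {
        return longForm.count(node.name) != 0
                   ? "<" + node.name + "></" + node.name + ">"
                   : "<" + node.name + "/>";
    }
    std::string out = "<" + node.name + ">";
    for (const Node& child : node.children) {
        out += serialize(child, longForm);
    }
    return out + "</" + node.name + ">";
}
```

The point of the design is that the long-form decision lives in one
place, keyed on the element name, so elements such as br can still be
written in short form while script, div and textarea never are.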



Re: How to stop elements with no content from being converted to empty elements?

Posted by Dale Worley <dw...@nortel.com>.
On Wed, 2009-04-08 at 15:43 +0100, Ben Griffin wrote:
> Thanks for your speedy response Dale,
> 
> Yes, I know that within XML they are defined to be equivalent, but
> there are many legacy SGML parsers that differentiate them, such as
> Web browsers.
> I was -hoping- that there would be a way of preserving the form of the
> text, regardless of its equivalence.  But I am assuming from your
> answer that this is probably not the case.

If you're interested in generating text for consumption by web browsers,
you may want to look at "XHTML".  If I have it right, XHTML's goal is to
be both valid XML and acceptable HTML for browsers.  Which means that
XHTML generators might well do what you want so as to not have problems
with web browsers.

Dale



Re: How to stop elements with no content from being converted to empty elements?

Posted by Ben Griffin <be...@redsnapper.net>.
Thanks for your speedy response Dale,

Yes, I know that within XML they are defined to be equivalent, but
there are many legacy SGML parsers that differentiate them, such as
Web browsers.
I was -hoping- that there would be a way of preserving the form of the
text, regardless of its equivalence.  But I am assuming from your
answer that this is probably not the case.

B.

On 8 Apr 2009, at 15:37, Dale Worley wrote:

> On Wed, 2009-04-08 at 15:19 +0100, Ben Griffin wrote:
>> If I have an element with no content such as: .... <xxx></xxx>  ....
>> How can I preserve it as an element, without it being normalised to
>> the empty element <xxx />
>
> The two forms are defined to be equivalent, so I wouldn't be
> surprised if XML output software feels free to replace the one with
> the other.
>
> Dale
>
>


Re: How to stop elements with no content from being converted to empty elements?

Posted by Dale Worley <dw...@nortel.com>.
On Wed, 2009-04-08 at 15:19 +0100, Ben Griffin wrote:
> If I have an element with no content such as: .... <xxx></xxx>  ....
> How can I preserve it as an element, without it being normalised to  
> the empty element <xxx />

The two forms are defined to be equivalent, so I wouldn't be surprised
if XML output software feels free to replace the one with the other.

Dale