Posted to general@xml.apache.org by Mark Shacklette <jm...@home.com> on 2000/11/01 19:08:16 UTC

question about embedded data in XML

We have a need to embed the following types of information in an XML document:

strings that themselves contain XML (which we DON'T want parsed)
strings that themselves contain HTML (which we also don't want parsed)
encoded data, possibly encoded binary data, which could contain ANYTHING
(which we also don't want parsed).

Does anyone have any recommendations on XML best practices to handle the above
data needs?

If more info is needed, I can provide that.

Thanks.


Re: question about embedded data in XML

Posted by "Thomas B. Passin" <tp...@mitretek.org>.
Sean Kelly wrote -

> How about setting the content type of the element that
> contains the embedded XML to ANY ...
>
Actually, ANY means you can use any element declared in the DTD, not
that you can use any element whatever.

Tom Passin


Re: question about embedded data in XML

Posted by Sean Kelly <ke...@ad1440.net>.
> This means that within XML, there is NO WAY WHATEVER
> to embed XML within an XML document, right?  What if
> the "embedded" XML were encoded, might that work?
> Any other ideas for workarounds?

How about setting the content type of the element that
contains the embedded XML to ANY ...

<!ELEMENT root (title, xml)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT xml ANY>

So a doc like this is valid:

<root>
  <title>My Embedded XML</title>
  <xml>
    <foo>
      <bar>baz</bar>
      <eeep/>
    </foo>
  </xml>
</root>
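
Note that with this approach the embedded markup is still parsed as part
of the document. A minimal sketch in Python, assuming the standard
library's xml.etree.ElementTree (which checks well-formedness only and
does not validate against the DTD above):

```python
import xml.etree.ElementTree as ET

doc = """<root>
  <title>My Embedded XML</title>
  <xml>
    <foo>
      <bar>baz</bar>
      <eeep/>
    </foo>
  </xml>
</root>"""

root = ET.fromstring(doc)

# The parser has turned the "embedded" markup into ordinary elements
# of the tree; it is not preserved as an unparsed string.
embedded_tags = [el.tag for el in root.find("xml").iter()]
print(embedded_tags)
```

So ANY helps when the embedded XML may be parsed, but not when it must
stay opaque.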

Alternatively, encode the embedded XML using a format
opaque to XML---say base-64, and store that as CDATA.
You could then embed the <?xml...?> declaration,
DOCTYPE declaration, schema declaration,
etc., too.
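
The base-64 idea can be sketched as follows in Python. The <blob>
element, its "encoding" attribute, and the helper names are assumptions
made up for this illustration, not part of any standard:

```python
import base64
import xml.etree.ElementTree as ET


def embed_opaque(payload: bytes) -> str:
    """Wrap arbitrary bytes in a hypothetical <blob> element as base-64."""
    encoded = base64.b64encode(payload).decode("ascii")
    # The base-64 alphabet (A-Z, a-z, 0-9, '+', '/', '=') never contains
    # '<', '&' or ']', so the CDATA wrapper is strictly optional here;
    # it is kept only to follow the suggestion above.
    return f'<blob encoding="base64"><![CDATA[{encoded}]]></blob>'


def extract_opaque(xml_text: str) -> bytes:
    """Parse the wrapper element and decode its text back to bytes."""
    element = ET.fromstring(xml_text)
    return base64.b64decode(element.text)


# The inner document keeps its own declaration, DOCTYPE and CDATA section.
inner = b'<?xml version="1.0"?>\n<!DOCTYPE foo>\n<foo><![CDATA[hello]]></foo>'
wrapped = embed_opaque(inner)
restored = extract_opaque(wrapped)
```

The cost is that the payload grows by about a third and is no longer
readable in the file.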

--Sean




Re: question about embedded data in XML

Posted by Andy Clark <an...@apache.org>.
Dane Foster wrote:
> What about, instead of embedding the file inside your XML, you store
> the file somewhere else (like disk or a database)?  Then your XML
> file would only store a reference (URI, path, etc.) to the actual
> file.

Or encode the inclusion in some other format besides XML -- with
the assumption that you don't include invalid XML chars and do the
proper escaping for markup characters such as '<' and '&'.
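
A minimal sketch of the escaping approach in Python, using escape() from
the standard library's xml.sax.saxutils. The <payload> element name is a
made-up example, and the input is assumed to contain only characters
that are legal in XML:

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def embed_escaped(text: str) -> str:
    """Store text as ordinary character data in a hypothetical <payload>."""
    # escape() replaces '&', '<' and '>' with &amp;, &lt; and &gt;
    return f"<payload>{escape(text)}</payload>"


inner = "<foo><![CDATA[a & b]]></foo>"
wrapped = embed_escaped(inner)

# The parser undoes the escaping, so the original string comes back
# as text and is never interpreted as markup.
restored = ET.fromstring(wrapped).text
```

Because '>' is escaped too, the "]]>" sequence never appears literally
in the character content.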

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: question about embedded data in XML

Posted by Dane Foster <df...@equitytg.com>.
What about, instead of embedding the file inside your XML, you store the file
somewhere else (like disk or a database)?  Then your XML file would only
store a reference (URI, path, etc.) to the actual file.
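
One way to picture this, as a sketch only: the <payload> element, its
"href" attribute, and the file URI below are all hypothetical names
chosen for the example.

```python
import xml.etree.ElementTree as ET


def make_reference_doc(title: str, uri: str) -> str:
    """Build a document that points at the payload instead of embedding it."""
    root = ET.Element("root")
    ET.SubElement(root, "title").text = title
    # Only the location is stored; the payload itself lives elsewhere.
    ET.SubElement(root, "payload", {"href": uri})
    return ET.tostring(root, encoding="unicode")


# Hypothetical location of the externally stored file.
doc = make_reference_doc("My Embedded XML", "file:///data/objects/42.xml")
href = ET.fromstring(doc).find("payload").get("href")
```

The reader of the document then fetches the payload separately and never
hands it to the XML parser.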
----- Original Message -----
From: "Mark Shacklette" <jm...@home.com>
To: <ge...@xml.apache.org>
Cc: <an...@apache.org>
Sent: Friday, November 03, 2000 12:12 PM
Subject: Re: question about embedded data in XML


> Thanks Andy for the info.
>
> This means that within XML, there is NO WAY WHATEVER to embed XML within
> an XML document, right?  What if the "embedded" XML were encoded, might
> that work?  Any other ideas for workarounds?
>
> What we're trying to do is use XML for our file format.  HOWEVER, any given
> file (which represents an "object") might very well itself contain XML (say,
> one of the "string" variables in the object itself contained some XML), which
> is why we need the ability to "embed" XML in an XML document.  I was hoping the
> CDATA section might be the solution, apparently it is not.
>
> Any other ideas?
>
> Thanks for all info and help.
>
>
> > Mark Shacklette wrote:
> > > the point is this:  Will Xerces parsers correctly work with
> > > something like the following (pardon if it's not exactly correct,
> > > but you get the idea...):
> > >
> > > <foo><! [CDATA[<embeddedXML><! [CDATA[hello]]></embeddedXML>]]></foo>
> >
> > This has a couple of problems:
> >
> > 1) You aren't allowed a space between "<!" and "[CDATA[".
> >    (XML spec 2.7 [19])
> > 2) CDATA (and character content, for that matter) cannot contain
> >    the "]]>" sequence. Which means that you cannot embed CDATA
> >    sections within each other. (XML spec 2.4 [14], 2.7 [20])
> >
> > --
> > Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org
> >
> > ---------------------------------------------------------------------
> > In case of troubles, e-mail:     webmaster@xml.apache.org
> > To unsubscribe, e-mail:          general-unsubscribe@xml.apache.org
> > For additional commands, e-mail: general-help@xml.apache.org
> >
>


Re: question about embedded data in XML

Posted by Mark Shacklette <jm...@home.com>.
Thanks Andy for the info.

This means that within XML, there is NO WAY WHATEVER to embed XML within
an XML document, right?  What if the "embedded" XML were encoded, might
that work?  Any other ideas for workarounds?

What we're trying to do is use XML for our file format.  HOWEVER, any given
file (which represents an "object") might very well itself contain XML (say,
one of the "string" variables in the object itself contained some XML), which
is why we need the ability to "embed" XML in an XML document.  I was hoping the
CDATA section might be the solution, apparently it is not.

Any other ideas?

Thanks for all info and help.


> Mark Shacklette wrote:
> > the point is this:  Will Xerces parsers correctly work with 
> > something like the following (pardon if it's not exactly correct,
> > but you get the idea...):
> > 
> > <foo><! [CDATA[<embeddedXML><! [CDATA[hello]]></embeddedXML>]]></foo>
> 
> This has a couple of problems:
> 
> 1) You aren't allowed a space between "<!" and "[CDATA[". 
>    (XML spec 2.7 [19])
> 2) CDATA (and character content, for that matter) cannot contain 
>    the "]]>" sequence. Which means that you cannot embed CDATA 
>    sections within each other. (XML spec 2.4 [14], 2.7 [20])
> 
> -- 
> Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org
> 


Re: question about embedded data in XML

Posted by Andy Clark <an...@apache.org>.
Mark Shacklette wrote:
> the point is this:  Will Xerces parsers correctly work with 
> something like the following (pardon if it's not exactly correct,
> but you get the idea...):
> 
> <foo><! [CDATA[<embeddedXML><! [CDATA[hello]]></embeddedXML>]]></foo>

This has a couple of problems:

1) You aren't allowed a space between "<!" and "[CDATA[". 
   (XML spec 2.7 [19])
2) CDATA (and character content, for that matter) cannot contain 
   the "]]>" sequence. Which means that you cannot embed CDATA 
   sections within each other. (XML spec 2.4 [14], 2.7 [20])
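
Both problems can be checked with any conforming parser. A small sketch
using Python's xml.etree.ElementTree as a stand-in (this illustrates the
well-formedness rules, not Xerces itself; is_well_formed is a helper
written for this example):

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Problem 1: a space between "<!" and "[CDATA[".
with_space = "<foo><! [CDATA[hello]]></foo>"

# Problem 2: the first "]]>" closes the outer section, so the parser
# then sees </embeddedXML> while <foo> is the open element.
nested = "<foo><![CDATA[<embeddedXML><![CDATA[hello]]></embeddedXML>]]></foo>"

# A single, non-nested section is fine and comes back as plain text.
single = "<foo><![CDATA[<embeddedXML>hello</embeddedXML>]]></foo>"

print(is_well_formed(with_space), is_well_formed(nested), is_well_formed(single))
```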

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: question about embedded data in XML

Posted by "Thomas B. Passin" <tp...@mitretek.org>.
Mark Shacklette asked

> And this is I suppose the question, because what happens in the case that THAT
> CDATA section itself includes XML, and to really get at it, what if that
> enclosed XML itself includes multiple CDATA sections.  Which CDATA closure will
> operate?  Or am I missing something?
>
> the point is this:  Will Xerces parsers correctly work with something like the
> following (pardon if it's not exactly correct, but you get the idea...):
>
> <foo><! [CDATA[<embeddedXML><! [CDATA[hello]]></embeddedXML>]]></foo>
>                                ^^^^^^^^^^^^^^
>
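You can have any legal XML characters in a CDATA section except for the
"]]>" token that denotes the end of one.  To include that token, you'd
have to put it outside the CDATA section and escape it there with an
entity or a character reference (for example "]]&gt;"), because
references are not expanded inside a CDATA section.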

But there is a trick - you can split the delimiter characters between
two adjacent CDATA sections, then it works.  This means you have to do
some pre and post processing to get what you want, but it's not much
processing.
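
The splitting trick can be sketched like this in Python. The to_cdata
helper is a name made up for this example; it ends one section after
"]]" and starts the next one before ">":

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting every "]]>" across two sections."""
    # "]]>" becomes "]]" + "]]>" (close) + "<![CDATA[" (reopen) + ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# The embedded string contains its own complete CDATA section.
inner = "<embeddedXML><![CDATA[hello]]></embeddedXML>"
doc = f"<foo>{to_cdata(inner)}</foo>"

# ElementTree joins the adjacent sections into a single text string.
restored = ET.fromstring(doc).text
```

With ElementTree the adjacent sections are concatenated on parsing, so
in this sketch only the pre-processing step is needed; a parser or API
that reports each CDATA section separately would need the pieces joined
afterwards.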

Tom Passin


Re: question about embedded data in XML

Posted by "Thomas B. Passin" <tp...@mitretek.org>.
Mark Shacklette asked

> And this is I suppose the question, because what happens in the case that THAT
> CDATA section itself includes XML, and to really get at it, what if that
> enclosed XML itself includes multiple CDATA sections.  Which CDATA closure will
> operate?  Or am I missing something?
>
> the point is this:  Will Xerces parsers correctly work with something like the
> following (pardon if it's not exactly correct, but you get the idea...):
>
> <foo><! [CDATA[<embeddedXML><! [CDATA[hello]]></embeddedXML>]]></foo>
>                                ^^^^^^^^^^^^^^
>
You can also include an XML fragment using an entity, but then it would
get parsed, which you said you don't want.  Why not include a reference
(http:// or file://) to the XML that you want to transmit, rather than
trying to include it verbatim?

Tom Passin


Re: question about embedded data in XML

Posted by Mark Shacklette <jm...@home.com>.
And this is I suppose the question, because what happens in the case that THAT
CDATA section itself includes XML, and to really get at it, what if that
enclosed XML itself includes multiple CDATA sections.  Which CDATA closure will
operate?  Or am I missing something?

the point is this:  Will Xerces parsers correctly work with something like the
following (pardon if it's not exactly correct, but you get the idea...):

<foo><! [CDATA[<embeddedXML><! [CDATA[hello]]></embeddedXML>]]></foo>
                               ^^^^^^^^^^^^^^

Thanks for any and all help or suggestions.

> Mark Shacklette
> 
> > We have a need to embed the following types of information in an XML document:
> >
> > strings that themselves contain XML (which we DON'T want parsed)
> > strings that themselves contain HTML (which we also don't want parsed)
> > encoded data that may be binary data encoded which could contain ANYTHING
> > (which we also don't want parsed).
> >
> > Does anyone have any recommendations on XML best practices to handle the above
> > data needs?
> >
> This is what CDATA sections are for.  Binary data should be base-64
> encoded, then put into a CDATA section.
> 
> Tom Passin
> 


Re: question about embedded data in XML

Posted by "Thomas B. Passin" <tp...@mitretek.org>.
Mark Shacklette

> We have a need to embed the following types of information in an XML document:
>
> strings that themselves contain XML (which we DON'T want parsed)
> strings that themselves contain HTML (which we also don't want parsed)
> encoded data that may be binary data encoded which could contain ANYTHING
> (which we also don't want parsed).
>
> Does anyone have any recommendations on XML best practices to handle the above
> data needs?
>
This is what CDATA sections are for.  Binary data should be base-64
encoded, then put into a CDATA section.

Tom Passin