Posted to xindice-users@xml.apache.org by Henrik Vendelbo <hv...@bluprints.com> on 2003/10/22 15:55:05 UTC

Storing XML in string format

I am playing around with JAXB persistence to Xindice, and was passing the
data as a String.

Apparently there was a parsing error, which is a bit surprising as the string
was just marshalled from an object and looks fine as far as I can see. Anyway,
I noticed DOM being involved. Is the string converted to a DOM tree in order
to check whether it is well formed?

If so, how should I pass it? Creating a DOM tree seems quite a waste
of CPU and memory.


The string that was passed :
<data:MobileContract xmlns:data="http://commons.dspc.net/datamodel">

<data:AccountName>henrik</data:AccountName>
<data:CustomerNo>2222222</data:CustomerNo>
<data:ContractNo>123454</data:ContractNo>
<data:ContractType>std.bundle</data:ContractType>
<data:PersonName><data:First>Henrik</data:First><data:Last>Vendelbo</data:Last></data:PersonName>
<data:CompanyName>BLUPRINTS</data:CompanyName>
<data:MobileNo>07930416886</data:MobileNo>

<data:MobileNetwork>t-mobile.uk</data:MobileNetwork>

<data:HomeHostName>localhost</data:HomeHostName>
<data:HomePortNo>80</data:HomePortNo>
</data:MobileContract>
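
One way to narrow down a parsing error like this is to run the marshalled
string through a namespace-aware SAX parser before handing it to the database.
SAX reports well-formedness errors without building a DOM tree. The sketch
below uses only the JDK; the class and method names are made up for
illustration, and it says nothing about what Xindice itself does internally.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Xindice or JAXB.
final class WellFormedCheck {

    /** Returns null if the text parses, otherwise a description of the parse error. */
    static String parseError(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so undeclared prefixes are reported too
            // DefaultHandler ignores all content events; only errors matter here.
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXException e) {
            return e.toString();
        } catch (Exception e) {
            // Parser configuration or I/O trouble, not a problem with the document.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<data:MobileContract xmlns:data=\"http://commons.dspc.net/datamodel\">"
                + "<data:AccountName>henrik</data:AccountName></data:MobileContract>";
        String bad = "<data:MobileContract><data:AccountName>henrik</data:MobileContract>";
        System.out.println("good: " + parseError(good));
        System.out.println("bad:  " + parseError(bad));
    }
}
```

If the string passes this check but the database still rejects it, the problem
is more likely in how the string is handed over than in the XML itself.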



Re: Storing XML in string format

Posted by Henrik Vendelbo <hv...@bluprints.com>.
Without thinking too much about it, I expected the string format to be the
most direct.

Looking at the internal storage documentation, it is obvious that Xindice
needs to parse the XML. There are two benefits:
1) Only actual XML documents are stored as such, so there is less risk of
retrieving inconsistent resources later.
2) XML is saved in a binary form, which helps both queries and storage size.

The execution speed impact should be reasonable. The only potential problem
I see is memory consumption. Using DOM for a big document could be quite a
significant hit.

Apparently setContent(String) uses DOM, although no one has verified
this.

Currently I have opted to use SAX.

For the most part I am storing Java objects implemented using XML binding
(JaxMe), so I never really want the XML document as a string in my finished
solution.

I suggest you use a SAX ContentHandler and feed your OutputStream into that.

When you get it to work, you can post the code snippet for others who run into
these issues :)
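
A minimal sketch of the SAX route, using only the JDK: a producer pushes
events straight into a ContentHandler, so no intermediate String or DOM tree
is built. Here emitContract plays the role of the marshaller (JAXB's
Marshaller.marshal(Object, ContentHandler) would do this for real), and
RecordingHandler is a hypothetical stand-in for whatever handler the database
resource supplies. As far as I recall, the XML:DB XMLResource interface has a
setContentAsSAX() method that returns such a handler, but treat that as an
assumption and check it against your version of the API.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; none of these names come from Xindice or JaxMe.
final class SaxFeed {
    static final String NS = "http://commons.dspc.net/datamodel";

    /** Emits a tiny MobileContract document as SAX events, the way a marshaller would. */
    static void emitContract(ContentHandler out, String accountName) throws SAXException {
        Attributes none = new AttributesImpl();
        out.startDocument();
        out.startPrefixMapping("data", NS);
        out.startElement(NS, "MobileContract", "data:MobileContract", none);
        out.startElement(NS, "AccountName", "data:AccountName", none);
        out.characters(accountName.toCharArray(), 0, accountName.length());
        out.endElement(NS, "AccountName", "data:AccountName");
        out.endElement(NS, "MobileContract", "data:MobileContract");
        out.endPrefixMapping("data");
        out.endDocument();
    }

    /** Stand-in sink that just records the events it receives. */
    static final class RecordingHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            events.add("start " + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            events.add("text " + new String(ch, start, length));
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            events.add("end " + qName);
        }
    }

    /** Runs the producer against the recording sink and returns what arrived. */
    static List<String> record(String accountName) {
        RecordingHandler sink = new RecordingHandler();
        try {
            emitContract(sink, accountName);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return sink.events;
    }

    public static void main(String[] args) {
        record("henrik").forEach(System.out::println);
    }
}
```

In a real setup the recording sink would be replaced by the handler obtained
from the resource, and the resource stored once endDocument() has been called.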

Henrik
----- Original Message ----- 
From: "JC Tchitchiama" <jc...@panonet.net>
To: <xi...@xml.apache.org>; "Henrik Vendelbo"
<hv...@bluprints.com>
Sent: Friday, October 31, 2003 12:21 PM
Subject: Re: Storing XML in string format


Re: Storing XML in string format

Posted by JC Tchitchiama <jc...@panonet.net>.
Henrik,

I'm trying to do something close to what you're doing.
XMLResource.setContent(String) works fine for me. However, I'd rather have
XMLResource.setContent(OutputStream), to try to avoid the document being
parsed several times, by me and by Xindice. Does this sound like what you want too?

On Wednesday 22 Oct 2003 4:24 pm, Henrik Vendelbo wrote:
> > If this isn't helpful, then I'm probably barking up the wrong tree.
>
> Apparently it was heh. Thanks for the input.
>
> What still remains is a number of questions,
>
> 1) Given a JAXB java object, with the ability to marshal into a DOM tree,
> SAX ContentHandler or java.lang.String;
>  what is the best object to pass to Xindice in order to store it as regular
> XML ?
>
> 2) Does Xindice have a problem with namespaces ?
>
> 3) Does Xindice handle data internally as DOM trees. I got an error from
> DOM manipulation code, when I attempted to
> use XMLResource.setContent(String) ?
>
> Henrik

-- 

Best Regards.

JC.
           \\- - -//
          (  @ @  )
===oOOo-(_)-oOOo=================================================
      jct@panonet.net
=================================================================


Re: Storing XML in string format

Posted by Henrik Vendelbo <hv...@bluprints.com>.
> If this isn't helpful, then I'm probably barking up the wrong tree.

Apparently it was heh. Thanks for the input.

What still remains is a number of questions:

1) Given a JAXB Java object, with the ability to marshal into a DOM tree,
a SAX ContentHandler or a java.lang.String, what is the best object to pass
to Xindice in order to store it as regular XML?

2) Does Xindice have a problem with namespaces?

3) Does Xindice handle data internally as DOM trees? I got an error from DOM
manipulation code when I attempted to use XMLResource.setContent(String).

Henrik



Re: Storing XML in string format

Posted by Murray Altheim <m....@open.ac.uk>.
Henrik Vendelbo wrote:
> Hmm, looks like a hack, but if it does the job I can live with it. I don't
> quite understand why the string I pass isn't valid XML, which Xindice should
> store properly. Or are you merely saying that I can speed up the DOM parsing
> by using a CDATA section?
> 
> Does this mean that Xindice will go via DOM even if you use SAX ?
> 
> And will the CDATA wrap be removed before storing it in the db ?

Henrik,

Honestly, I was in a hurry and perhaps don't understand your
predicament entirely. If you want to store XML as XML, as DOM,
then don't use CDATA sections. If you want to store XML as a
String, use a CDATA section. You'll lose the ability to do XPath
queries and anything else XML-ish, but you can store non-well-formed
markup, or any content whose form as markup you don't care
about.
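
To make the trade-off concrete, here is a small JDK-only sketch (the class
name and the sample documents are invented for illustration). The same markup
is visible to XPath when stored as XML, and invisible when wrapped in a CDATA
section, because the parser then treats it as plain character data.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical illustration using only standard JAXP classes.
final class CdataVsXPath {

    /** Counts the <item> elements that XPath can see in the given document text. */
    static int countItems(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Double n = (Double) XPathFactory.newInstance().newXPath()
                    .evaluate("count(//item)", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String asXml = "<doc><item>1</item><item>2</item></doc>";
        String asCdata = "<doc><![CDATA[<item>1</item><item>2</item>]]></doc>";
        System.out.println(countItems(asXml));   // prints 2
        System.out.println(countItems(asCdata)); // prints 0
    }
}
```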

If this isn't helpful, then I'm probably barking up the wrong tree.

Murray

......................................................................
Murray Altheim                    http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK               .

   Monkeys use thoughts to control robotic arm
     http://www.sfgate.com/cgi-bin/article.cgi?file=/c/a/2003/10/13/MN2018.DTL
   Bush uses media expertly to push apocalyptic view
     http://truthout.org/docs_03/091403J.shtml


Re: Storing XML in string format

Posted by Henrik Vendelbo <hv...@bluprints.com>.
Hmm, looks like a hack, but if it does the job I can live with it. I don't
quite understand why the string I pass isn't valid XML, which Xindice should
store properly. Or are you merely saying that I can speed up the DOM parsing
by using a CDATA section?

Does this mean that Xindice will go via DOM even if you use SAX ?

And will the CDATA wrap be removed before storing it in the db ?

----- Original Message ----- 
From: "Murray Altheim" <m....@open.ac.uk>
To: <xi...@xml.apache.org>
Sent: Wednesday, October 22, 2003 3:03 PM
Subject: Re: Storing XML in string format




Re: Storing XML in string format

Posted by Murray Altheim <m....@open.ac.uk>.
Wrap any Strings containing markup within a CDATA section, which in
XML looks like this:

    <![CDATA[ ......your content....  ]]>

You have to check that your content doesn't contain a "]]>" (if it does,
escape or split it, because a CDATA section ends at the first "]]>").
You put this in an org.w3c.dom.CDATASection.
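
A rough sketch of that advice with the standard DOM API. The "wrapper"
element name and the class are assumptions made for the example. Any "]]>" in
the content is handled by ending one CDATA section after the "]]" and starting
the next one with the ">", which is one common way of doing the escaping.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper; the "wrapper" element name is an arbitrary choice.
final class CdataWrap {

    /** Builds <wrapper> holding the markup as CDATA, split wherever "]]>" occurs. */
    static Document wrap(String markup) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("wrapper");
        doc.appendChild(root);

        int start = 0;
        int idx;
        while ((idx = markup.indexOf("]]>", start)) >= 0) {
            // Close this section after "]]" so the ">" opens the next section.
            root.appendChild(doc.createCDATASection(markup.substring(start, idx + 2)));
            start = idx + 2;
        }
        root.appendChild(doc.createCDATASection(markup.substring(start)));
        return doc;
    }

    public static void main(String[] args) {
        Document doc = wrap("<a>x]]>y</a>");
        // Two sections: "<a>x]]" and ">y</a>"
        System.out.println(doc.getDocumentElement().getChildNodes().getLength()); // prints 2
        System.out.println(doc.getDocumentElement().getTextContent());            // prints <a>x]]>y</a>
    }
}
```

Reading the element's text content back gives the original string, since
adjacent CDATA sections are simply concatenated.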

Murray
