Posted to xindice-users@xml.apache.org by jcplerm <jc...@ameritech.net> on 2003/06/25 17:53:17 UTC

XML growth after repeated XUpdate commands

I noticed that whenever an element of an XML document is removed by means of an XUpdate remove command, blank spaces are left in place of the removed element.

My application does lots of XUpdate removes and appends on medium-sized XML docs. As a result, I will end up with large documents containing lots of spaces.

For example, assuming this is the original XML doc:

<doc>
    <elem1>xxxx</elem1>
    <elem2>xxxx</elem2>
</doc>

After removing elem2 and adding elem3, this is how the doc looks after exporting it to a flat text file:

<doc>
    <elem1>xxxx</elem1>

    <elem3>xxxx</elem3>
</doc>

For relatively static documents, it's not a problem.
But even small docs, if they repeatedly go through a similar process, might grow unnecessarily large.
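A minimal sketch of the effect described above, using only the JDK's DOM API rather than Xindice itself. It assumes (for illustration, not as Xindice's documented behaviour) that a remove deletes only the element node and leaves the whitespace Text nodes around it, and that each append adds an indentation Text node along with the new element. The class and method names are hypothetical. Under those assumptions, every remove/append cycle leaves one more whitespace-only Text node behind:

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class WhitespaceGrowthDemo {

    static Document parse(String xml) throws Exception {
        // A non-validating parser keeps the indentation as Text nodes.
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Counts Text children of the root element that contain only whitespace.
    static int countBlankTextNodes(Document doc) {
        int count = 0;
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE && n.getNodeValue().trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // One remove/append cycle under the assumptions stated above.
    static void removeThenAppend(Document doc, String oldName, String newName) {
        Element root = doc.getDocumentElement();
        Node old = root.getElementsByTagName(oldName).item(0);
        root.removeChild(old);                            // surrounding whitespace stays behind
        root.appendChild(doc.createTextNode("\n    "));   // assumed indentation for the new element
        Element added = doc.createElement(newName);
        added.setTextContent("xxxx");
        root.appendChild(added);
    }

    static String serialize(Document doc) throws Exception {
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<doc>\n    <elem1>xxxx</elem1>\n    <elem2>xxxx</elem2>\n</doc>");
        String previous = "elem2";
        for (int i = 3; i <= 6; i++) {
            String next = "elem" + i;
            removeThenAppend(doc, previous, next);
            previous = next;
            System.out.println("cycle " + (i - 2) + ": " + countBlankTextNodes(doc)
                    + " blank text nodes, " + serialize(doc).length() + " chars");
        }
    }
}
```

Running main should print a blank-text-node count that grows by one per cycle, along with a steadily growing serialized length, even though the document always holds just two elements.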

Could this be considered a bug and be properly fixed?
Or is there a straightforward workaround?

Thanks a lot,

jlerm

Re: XML growth after repeated XUpdate commands

Posted by Murray Altheim <m....@open.ac.uk>.
jcplerm wrote:
>Murray Altheim writes:
>
>>All parts of a DOM Document are composed of DOM Nodes -- it's the base
>>interface. So you can apply it to any portion of a DOM Document, not
>>just org.w3c.dom.Document. If you can use XUpdate to overwrite portions
>>of a Document, this should work just fine in manipulating those portions.
> 
> Please explain how you can apply your idea to XUpdate.
> 
> The problem is that the XUpdateQueryService's update methods
> don't take DOM trees as input parameters. They only accept Strings
> containing the text of the XUpdate commands.
> This means it does not matter how clean I make a DOM tree
> in my client application, because, in order to use XUpdate, it
> must be serialized to a String so it can be passed to XUpdateQueryService.
> And from that point on, that's where the useless space-only text nodes
> are left.
> 
> There is no way of stripping off space-only (or empty) text nodes
> merely with XUpdate commands. All commands (insert-after,
> remove, update, etc.) must be passed as strings, and after
> nodes are removed, XUpdate leaves behind useless whitespace-only
> text nodes that build up the size of the document.
> 
> In summary, by using XUpdateQueryService, there is no way of doing
> what you are suggesting.
> 
> Unless I am missing something.

Perhaps not. I don't use XUpdate myself, so I'm unfamiliar with how it
works. I suggested the Node.normalize() method since that's how in the
DOM one would consolidate whitespace (i.e., convert a run of separate but
contiguous DOM Text nodes into one). Since from what you say it can't
happen client-side, it sounds like this should be either the default
behaviour or an optional feature of XUpdate itself, to keep the kind of
problem you're describing from building up over time in documents
stored within Xindice. (Myself, I'd probably want it as a feature
settable via a system property. My own development lately has had me
digging deep into another aspect of my application, so I've not
attempted integrating the 1.1 code; I'm still using 1.0.)
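A minimal sketch of the kind of opt-in, post-update cleanup suggested above, written against the JDK's DOM API rather than Xindice's code. The property name, class, and hook are hypothetical. Note that Node.normalize() on its own merges adjacent Text nodes (and drops empty ones) but keeps a merged whitespace-only node such as "\n    \n", so the sketch adds an explicit pass that removes whitespace-only Text nodes when the property is set:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class PostUpdateCleanup {

    // Hypothetical property name, for illustration only; not a Xindice setting.
    static final String STRIP_PROPERTY = "example.xupdate.stripWhitespace";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // normalize() merges adjacent Text nodes; whitespace-only Text nodes are
    // removed only when stripBlank is true.
    static void cleanUp(Node node, boolean stripBlank) {
        node.normalize();
        if (stripBlank) {
            removeBlankText(node);
        }
    }

    // Recursively removes Text nodes that contain only whitespace.
    static void removeBlankText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // read before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.hasChildNodes()) {
                removeBlankText(child);
            }
            child = next;
        }
    }

    // Hypothetical hook a server could call after applying each XUpdate
    // command; stripping is opt-in via -Dexample.xupdate.stripWhitespace=true.
    static void afterUpdate(Document doc) {
        cleanUp(doc.getDocumentElement(), Boolean.getBoolean(STRIP_PROPERTY));
    }
}
```

Keeping the strip step optional seems prudent: removing whitespace-only text is not safe for mixed content (in <p><b>a</b> <i>b</i></p> the space between the two words would disappear), and this sketch also ignores xml:space="preserve".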

Murray

...........................................................................
Murray Altheim                         http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK                    .

        "There's a lot of intelligence out there that you don't
         know if it's true or not."  -- Anonymous US official
         http://news.bbc.co.uk/1/hi/world/middle_east/3014850.stm


Re: XML growth after repeated XUpdate commands

Posted by jcplerm <jc...@ameritech.net>.
----- Original Message ----- 
From: "Murray Altheim" <m....@open.ac.uk>
To: <xi...@xml.apache.org>
Sent: Wednesday, June 25, 2003 3:42 PM
Subject: Re: XML growth after repeated XUpdate commands


> jcplerm wrote:
> >>If you're manipulating the DOM, have you tried Node.normalize()? It
> >
> > I guess I would then have to pull the whole document out of the DB
> > and apply this function, right?
> > But that's not what I want to do. I simply fetch portions of the
> > document with XPathQueryService and then update that same
> > document with XUpdate commands. The goal is to avoid
> > fetching and massaging the entire document.
>
> All parts of a DOM Document are composed of DOM Nodes -- it's the base
> interface. So you can apply it to any portion of a DOM Document, not
> just org.w3c.dom.Document. If you can use XUpdate to overwrite portions
> of a Document, this should work just fine in manipulating those portions.
>

Please explain how you can apply your idea to XUpdate.

The problem is that the XUpdateQueryService's update methods
don't take DOM trees as input parameters. They only accept Strings
containing the text of the XUpdate commands.
This means it does not matter how clean I make a DOM tree
in my client application, because, in order to use XUpdate, it
must be serialized to a String so it can be passed to XUpdateQueryService.
And from that point on, that's where the useless space-only text nodes
are left.

There is no way of stripping off space-only (or empty) text nodes
merely with XUpdate commands. All commands (insert-after,
remove, update, etc.) must be passed as strings, and after
nodes are removed, XUpdate leaves behind useless whitespace-only
text nodes that build up the size of the document.

In summary, by using XUpdateQueryService, there is no way of doing
what you are suggesting.

Unless I am missing something.
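To illustrate the point, a minimal sketch of the kind of XUpdate command string a client builds for the remove/append cycle in the original example. The /doc paths and element names come from that example; the class and method names are hypothetical. Only this String, not a DOM tree, is handed over to the server:

```java
final class XUpdateCommands {

    static final String NS = "http://www.xmldb.org/xupdate";

    // Builds one XUpdate modifications document that removes a child element
    // of /doc and appends a new one. Values are not XML-escaped: illustration only.
    static String removeThenAppend(String oldName, String newName, String text) {
        return "<xupdate:modifications version=\"1.0\" xmlns:xupdate=\"" + NS + "\">"
                + "<xupdate:remove select=\"/doc/" + oldName + "\"/>"
                + "<xupdate:append select=\"/doc\">"
                + "<xupdate:element name=\"" + newName + "\">" + text + "</xupdate:element>"
                + "</xupdate:append>"
                + "</xupdate:modifications>";
    }

    public static void main(String[] args) {
        String commands = removeThenAppend("elem2", "elem3", "xxxx");
        // With the XML:DB API this String would be passed to XUpdateQueryService,
        // for example via service.updateResource(id, commands); not called here.
        System.out.println(commands);
    }
}
```

The command is kept on a single line, so no formatting whitespace appears inside the xupdate:append element itself.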

Thanks,

jlerm


Re: XML growth after repeated XUpdate commands

Posted by Murray Altheim <m....@open.ac.uk>.
jcplerm wrote:
>>If you're manipulating the DOM, have you tried Node.normalize()? It
> 
> I guess I would then have to pull the whole document out of the DB
> and apply this function, right?
> But that's not what I want to do. I simply fetch portions of the
> document with XPathQueryService and then update that same
> document with XUpdate commands. The goal is to avoid
> fetching and massaging the entire document.

All parts of a DOM Document are composed of DOM Nodes -- it's the base
interface. So you can apply it to any portion of a DOM Document, not
just org.w3c.dom.Document. If you can use XUpdate to overwrite portions
of a Document, this should work just fine in manipulating those portions.
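A minimal sketch of applying normalize() to one portion rather than the whole Document. It uses the JDK's javax.xml.xpath to select that portion, as a stand-in for (not an example of) Xindice's XPathQueryService; the class and method names are hypothetical. Only the selected subtree is merged, and the sibling element keeps its split Text nodes. In a client, such a portion would presumably be a local copy, so the change would only reach the database if written back:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class SubtreeNormalize {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Number of Text nodes directly under the given node.
    static int textChildren(Node node) {
        int count = 0;
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE) {
                count++;
            }
        }
        return count;
    }

    // Selects one element with an XPath expression and normalizes only that
    // subtree; the rest of the document is left as it is.
    static Element normalizePortion(Document doc, String xpath) throws Exception {
        Element portion = (Element) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODE);
        portion.normalize();
        return portion;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<doc><a>x</a><b>y</b></doc>");
        // Split the text of both elements into three adjacent Text nodes each.
        for (String name : new String[] {"a", "b"}) {
            Node el = doc.getElementsByTagName(name).item(0);
            el.appendChild(doc.createTextNode("1"));
            el.appendChild(doc.createTextNode("2"));
        }
        normalizePortion(doc, "/doc/a");
        System.out.println("a: " + textChildren(doc.getElementsByTagName("a").item(0))
                + " text node(s), b: " + textChildren(doc.getElementsByTagName("b").item(0)));
    }
}
```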

Murray


Re: XML growth after repeated XUpdate commands

Posted by jcplerm <jc...@ameritech.net>.
> If you're manipulating the DOM, have you tried Node.normalize()? It

I guess I would then have to pull the whole document out of the DB
and apply this function, right?
But that's not what I want to do. I simply fetch portions of the
document with XPathQueryService and then update that same
document with XUpdate commands. The goal is to avoid
fetching and massaging the entire document.

But if there is no workaround, then I might have to implement
some sort of periodic cleansing of the database documents
based on the function you proposed.
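A minimal sketch of such a periodic cleansing step, assuming the job fetches each stored document as a serialized String and stores the result back (those calls are not shown, and the class name is hypothetical). It calls normalize() first, so each run of whitespace becomes a single Text node, then removes every whitespace-only Text node selected by the XPath expression //text()[normalize-space(.) = ''] using the JDK's javax.xml.xpath API:

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class PeriodicCleanse {

    // Returns the serialized document with whitespace-only Text nodes removed.
    static String cleanse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // Merge adjacent Text nodes first, so each whitespace run is one node.
        doc.normalize();

        NodeList blanks = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//text()[normalize-space(.) = '']", doc, XPathConstants.NODESET);
        // Copy the matches before mutating the tree.
        List<Node> toRemove = new ArrayList<>();
        for (int i = 0; i < blanks.getLength(); i++) {
            toRemove.add(blanks.item(i));
        }
        for (Node n : toRemove) {
            n.getParentNode().removeChild(n);
        }

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Comparing the length of the input and output strings gives a simple measure of how much a pass saved. Like any whitespace stripping, this would also drop significant spaces in mixed content, so it may only suit data-oriented documents like the one in the original example.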

> likely does what you want. There may be an appropriate place to use
> this within Xindice, but I'm not sure where in the code this would
> be applied.
>
> Murray
>


Re: XML growth after repeated XUpdate commands

Posted by Murray Altheim <m....@open.ac.uk>.
If you're manipulating the DOM, have you tried Node.normalize()? It
likely does what you want. There may be an appropriate place to use
this within Xindice, but I'm not sure where in the code this would
be applied.

Murray
