Posted to general@xml.apache.org by Peter Chen <sl...@psn.net> on 2000/04/07 08:45:42 UTC

XSLT problem . . .

I hope someone on this list can help me with this problem.  I am trying to use XSLT to transform an XML document into another XML document.  The problem is that the source XML document is highly denormalized, by which I mean that it contains a lot of redundant data.  I would like to transform the denormalized XML document, via XSLT, into a normalized XML document.  For example, a denormalized XML document might look like this:

<GROUP>group A</GROUP>
    <SECTION>section 1</SECTION>
        <STUFF>0001</STUFF>
        <MORE_STUFF>000000001</MORE_STUFF>
<GROUP>group A</GROUP>
    <SECTION>section 1</SECTION>
        <STUFF>0002</STUFF>
        <MORE_STUFF>000000002</MORE_STUFF>
    
The normalized form of the above XML document would look like:

<GROUP>group A</GROUP>
    <SECTION>section 1</SECTION>
        <STUFF>0001</STUFF>
        <STUFF>0002</STUFF>
        <MORE_STUFF>000000001</MORE_STUFF>
        <MORE_STUFF>000000002</MORE_STUFF>

The above XML document represents the same data as the first XML document in a more compact manner (we are assuming that the corresponding DTD defines the elements "STUFF" and "MORE_STUFF" to be repeating elements).

Is there a way to do the above with XSLT without comparing any of the actual element values?
Any help is appreciated.



Re: XSLT problem . . .

Posted by David Bourget <db...@videotron.ca>.
Hi Peter,

The only way I can see to achieve this without looking at the elements' PCDATA would require some heuristics about how the document is formed. For example: is the stuff for group A ALWAYS written first, does it always contain the same number of sections, etc.? I think this is very unlikely to give you a complete solution, and it would be inelegant and error-prone. So I would recommend using very simple regular expressions. I understand RE support is planned for future versions of XSLT; for now you will have to use another method or XSLT extensions. If your source document is as simple as it seems, you could probably write a little script in Python or Perl using an XML module and have the task done in a snap. XSLT is not the universal solution :)
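As a rough illustration of the "little script" route, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It rests on several assumptions that are not in the original question: the fragments are wrapped in a hypothetical <DOC> root element so that they parse as well-formed XML, every STUFF and MORE_STUFF belongs to the most recent GROUP and SECTION before it, and records are merged by comparing the GROUP and SECTION text (so it does look at the element values). The function name normalize is likewise just for illustration.

```python
import xml.etree.ElementTree as ET

# Sample input from the question, wrapped in a hypothetical <DOC> root
# (the fragments as posted have no single root element).
SOURCE = """<DOC>
<GROUP>group A</GROUP>
    <SECTION>section 1</SECTION>
        <STUFF>0001</STUFF>
        <MORE_STUFF>000000001</MORE_STUFF>
<GROUP>group A</GROUP>
    <SECTION>section 1</SECTION>
        <STUFF>0002</STUFF>
        <MORE_STUFF>000000002</MORE_STUFF>
</DOC>"""


def normalize(xml_text):
    """Merge records that share the same GROUP and SECTION text."""
    root = ET.fromstring(xml_text)

    # (group text, section text) -> values collected per repeating element.
    # Dicts keep insertion order, so groups come out in first-seen order.
    merged = {}
    group = section = None
    for child in root:
        text = (child.text or "").strip()
        if child.tag == "GROUP":
            group, section = text, None
        elif child.tag == "SECTION":
            section = text
        elif child.tag in ("STUFF", "MORE_STUFF"):
            bucket = merged.setdefault(
                (group, section), {"STUFF": [], "MORE_STUFF": []}
            )
            bucket[child.tag].append(text)

    # Rebuild the flat layout: GROUP, SECTION, all STUFF, all MORE_STUFF.
    out = ET.Element(root.tag)
    for (group, section), bucket in merged.items():
        ET.SubElement(out, "GROUP").text = group
        ET.SubElement(out, "SECTION").text = section
        for tag in ("STUFF", "MORE_STUFF"):
            for value in bucket[tag]:
                ET.SubElement(out, tag).text = value
    return out


if __name__ == "__main__":
    print(ET.tostring(normalize(SOURCE), encoding="unicode"))
```

Note that this sketch emits one GROUP element per distinct (group, section) pair, so a group with several sections would still have its GROUP repeated; whether that is acceptable depends on the real DTD.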

I hope this helps,
David.

