Posted to users@groovy.apache.org by garneke <ke...@issinc.com> on 2016/04/11 18:40:51 UTC

XmlSlurper Namespace Question

I have a requirement in my application to receive XML files from a user and
filter out specific nodes based on a configuration option (which accepts a
GPath string).

This is fine.  
I can use XmlSlurper to parse the file and with the defined GPath I can find
and remove the node and rewrite the file.

My application is generic and can accept any XML.  
My user base is not terribly savvy so I need to be able to have the GPath
specified without the namespace prefixes.   This is also fine if the
XmlSlurper is created to be namespace aware.

*The problem is...*
If my XmlSlurper is namespace aware and I remove a node, when I re-write the
XML file all of the namespace prefixes get altered to "tag0:", "tag1:",
"tag2:" etc.

Is there a way to produce the XML with its original namespace prefixes?
Is there some way I can query the original file for its namespaces and use
that to declare the namespaces for the slurper?

Thanks in advance for your help.






--
View this message in context: http://groovy.329449.n5.nabble.com/XmlSlurper-Namespace-Question-tp5732293.html
Sent from the Groovy Users mailing list archive at Nabble.com.

Re: XmlSlurper Namespace Question

Posted by garneke <ke...@issinc.com>.

Thanks!  I have extended the example to show four variations of this.
In my code I was creating a SAXParser pool to initialize the XmlSlurper with,
because I was told it greatly reduces the time and overhead of instantiating
an XmlSlurper object from scratch.
Although I thought I had tested both with and without a SAXParser, I must
have missed something.
From the four examples here you can see that passing in a SAXParser
introduces the generated namespace prefixes that I did not want.

Perhaps I could just use a pool of XmlSlurper objects.

import groovy.xml.*;
import groovy.util.slurpersupport.GPathResult;
import javax.xml.parsers.*;


public boolean removePath(def xml, def path) throws Exception{
    Eval.x( xml, "x.${path}.replaceNode({})" )
    return true;
}

public GPathResult getPathFilter( GPathResult gpResult, String filterPath )
{
    removePath( gpResult, filterPath );
    return gpResult;
}
 
String xml = '''
<a:root xmlns:a="urn:a" xmlns:b="urn:b" xmlns:c="urn:c">
    <a:child1>child1</a:child1>
    <b:child2>
        <b:content>
            <b:dataNode>data</b:dataNode>
        </b:content>
    </b:child2>
    <c:child3>child3</c:child3>
</a:root>
'''
String nodeToRemove = 'child2.content.dataNode';
String nodeToRemove_Incl_NS = "'b:child2'.'b:content'.'b:dataNode'";
SAXParserFactory parserFactory = SAXParserFactory.newInstance();

// 1) - SAXParse NS Aware
parserFactory.setNamespaceAware( true );
SAXParser parser = parserFactory.newSAXParser();
def root_one = new XmlSlurper(parser).parseText(xml)
getPathFilter( root_one , nodeToRemove);
def newXml =  XmlUtil.serialize(root_one)
print "1) - SAXParse NS Aware Filter:["+nodeToRemove+"]\n" + newXml

// 2) - SAXParse NON-NS Aware
parserFactory.setNamespaceAware( false );
parser = parserFactory.newSAXParser();
def root_two = new XmlSlurper(parser).parseText(xml)
getPathFilter( root_two , nodeToRemove_Incl_NS);
newXml =  XmlUtil.serialize(root_two)
print "\n2) - SAXParse Non-NS Aware Filter:["+nodeToRemove_Incl_NS+"]\n" +
newXml

// 3) - Slurper NS Aware
def root_three = new XmlSlurper(false, true).parseText(xml)
getPathFilter( root_three , nodeToRemove);
newXml =  XmlUtil.serialize(root_three)
print "\n3) - XmlSlurper NS Aware Filter:["+nodeToRemove+"]\n" + newXml

// 4) - Slurper Non-NS Aware
def root_four = new XmlSlurper(false, false).parseText(xml)
getPathFilter( root_four , nodeToRemove_Incl_NS);
newXml =  XmlUtil.serialize(root_four)
print "\n4) - XmlSlurper Non-NS Aware Filter:["+nodeToRemove_Incl_NS+"]\n" +
newXml





--
View this message in context: http://groovy.329449.n5.nabble.com/XmlSlurper-Namespace-Question-tp5732293p5732402.html
Sent from the Groovy Users mailing list archive at Nabble.com.

Re: XmlSlurper Namespace Question

Posted by John Wagenleitner <jo...@gmail.com>.

By default, XmlSlurper is namespace aware, so new XmlSlurper() is the same as
new XmlSlurper(false, true).  I tried your example and I still get the
original prefixes.  Maybe the difference is in how your code is removing
the node?


import groovy.xml.*

String xml = '''
<a:root xmlns:a="urn:a" xmlns:b="urn:b" xmlns:c="urn:c">
    <a:child1>child1</a:child1>
    <b:child2>
        <b:content>
            <b:dataNode>data</b:dataNode>
        </b:content>
    </b:child2>
    <c:child3>child3</c:child3>
</a:root>
'''

def root = new XmlSlurper(false, true).parseText(xml)

String userInput = 'child2.content.dataNode'
def nodeToRemove = root
userInput.split("\\.").each {
    nodeToRemove = nodeToRemove."${it}"
}
nodeToRemove.replaceNode {}

String newXml = XmlUtil.serialize(root)
def newRoot = new XmlSlurper(false, true).parseText(newXml)

println XmlUtil.serialize(newRoot)
assert newRoot.lookupNamespace('a') == 'urn:a'
assert newRoot.child3.lookupNamespace('c') == 'urn:c'


OUTPUT:

<?xml version="1.0" encoding="UTF-8"?><a:root xmlns:a="urn:a">
  <a:child1>child1</a:child1>
  <b:child2 xmlns:b="urn:b">
    <b:content/>
  </b:child2>
  <c:child3 xmlns:c="urn:c">child3</c:child3>
</a:root>


On Fri, Apr 15, 2016 at 11:41 AM, Kenton Garner <ke...@issinc.com>
wrote:

> I appreciate your response.
>
> The problem is that the node to be removed is defined by the user during
> a configuration stage.
> Because the application is generic, the node to remove is defined via a
> GPath string.
>
> I do not want to require the user to specify the GPath string with the
> namespace prefixes.
>
> To expand on your example…
>
> String xml = '''
> <a:root xmlns:a="urn:a" xmlns:b="urn:b" xmlns:c="urn:c">
>     <a:child1>child1</a:child1>
>     <b:child2>
>        <b:content>
>          <b:dataNode>
>             Proprietary data to sanitize
>          </b:dataNode>
>        </b:content>
>     </b:child2>
>     <c:child3>child3</c:child3>
> </a:root>
> '''
>
> User would enter a GPath of  “child2.content.dataNode”  NOT
>  “b:child2.b:content.b:dataNode”
>
> In order for this to work the XML has to be parsed with the XmlSlurper
> argument namespaceAware=true.
>
> def root = new XmlSlurper(false, true).parseText(xml)
>
> Now the println will show that the namespace prefixes are all altered:
> a: becomes tag0:, b: becomes tag1:, c: becomes tag2:, etc.
>
> println XmlUtil.serialize(newRoot)
>

RE: XmlSlurper Namespace Question

Posted by Kenton Garner <ke...@issinc.com>.

I appreciate your response.

The problem is that the node to be removed is defined by the user during a configuration stage.
Because the application is generic, the node to remove is defined via a GPath string.

I do not want to require the user to specify the GPath string with the namespace prefixes.

To expand on your example…
String xml = '''
<a:root xmlns:a="urn:a" xmlns:b="urn:b" xmlns:c="urn:c">
    <a:child1>child1</a:child1>
    <b:child2>
       <b:content>
         <b:dataNode>
            Proprietary data to sanitize
         </b:dataNode>
       </b:content>
    </b:child2>
    <c:child3>child3</c:child3>
</a:root>
'''
User would enter a GPath of  “child2.content.dataNode”  NOT  “b:child2.b:content.b:dataNode”

In order for this to work the XML has to be parsed with the XmlSlurper argument namespaceAware=true.
def root = new XmlSlurper(false, true).parseText(xml)

Now the println will show that the namespace prefixes are all altered: a: becomes tag0:, b: becomes tag1:, c: becomes tag2:, etc.
println XmlUtil.serialize(newRoot)




From: John Wagenleitner [mailto:john.wagenleitner@gmail.com]
Sent: Friday, April 15, 2016 2:18 PM
To: users@groovy.apache.org
Subject: Re: XmlSlurper Namespace Question

Are you doing more than just removing nodes?  The following seems to retain the namespace prefixes when serializing back out after removing a node.


import groovy.xml.*

String xml = '''
<a:root xmlns:a="urn:a" xmlns:b="urn:b" xmlns:c="urn:c">
    <a:child1>child1</a:child1>
    <b:child2>child2</b:child2>
    <c:child3>child3</c:child3>
</a:root>
'''

def root = new XmlSlurper().parseText(xml)
root.child2.replaceNode {}

String newXml = XmlUtil.serialize(root)
def newRoot = new XmlSlurper().parseText(newXml)

println XmlUtil.serialize(newRoot)
assert newRoot.lookupNamespace('a') == 'urn:a'
assert newRoot.child3.lookupNamespace('c') == 'urn:c'


I don't believe there's a defined way to get all the namespaces from the original files, other than possibly picking at the non-public namespaceTagHints field or by visiting each node and using #namespaceURI() to get its namespace.


Re: XmlSlurper Namespace Question

Posted by John Wagenleitner <jo...@gmail.com>.

Are you doing more than just removing nodes?  The following seems to retain
the namespace prefixes when serializing back out after removing a node.


import groovy.xml.*

String xml = '''
<a:root xmlns:a="urn:a" xmlns:b="urn:b" xmlns:c="urn:c">
    <a:child1>child1</a:child1>
    <b:child2>child2</b:child2>
    <c:child3>child3</c:child3>
</a:root>
'''

def root = new XmlSlurper().parseText(xml)
root.child2.replaceNode {}

String newXml = XmlUtil.serialize(root)
def newRoot = new XmlSlurper().parseText(newXml)

println XmlUtil.serialize(newRoot)
assert newRoot.lookupNamespace('a') == 'urn:a'
assert newRoot.child3.lookupNamespace('c') == 'urn:c'


I don't believe there's a defined way to get all the namespaces from the
original files, other than possibly picking at the
non-public namespaceTagHints field or by visiting each node and using
#namespaceURI() to get its namespace.
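
For what it's worth, the "visit each node" idea could look roughly like the
sketch below. It is only an illustration, not a tested recipe for your
application: it reuses the sample document from this thread, the variable
name namespaceUris is made up for the example, and it only recovers the
namespace URIs that are actually used by elements, not the prefixes that
were bound to them in the original file.

```groovy
import groovy.xml.*

String xml = '''
<a:root xmlns:a="urn:a" xmlns:b="urn:b" xmlns:c="urn:c">
    <a:child1>child1</a:child1>
    <b:child2>child2</b:child2>
    <c:child3>child3</c:child3>
</a:root>
'''

// Namespace-aware slurper, same constructor arguments as used elsewhere in the thread
def root = new XmlSlurper(false, true).parseText(xml)

// Visit every element (the root included) and record its namespace URI.
// namespaceURI() gives an empty string for elements without a namespace,
// so those are skipped. A TreeSet keeps the result sorted and free of duplicates.
Set<String> namespaceUris = new TreeSet<String>()
root.depthFirst().each { node ->
    String uri = node.namespaceURI()
    if (uri) {
        namespaceUris << uri
    }
}

println namespaceUris
```

For the sample document this should collect urn:a, urn:b and urn:c. Mapping
those URIs back to the original prefixes would still need something else,
since the prefix is not exposed by namespaceURI().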
