Posted to user@commons.apache.org by Ken Tanaka <Ke...@noaa.gov> on 2008/05/16 01:02:36 UTC

[digester] xml attribute values containing "]" character get scrambled

I'm configuring a program with regex patterns to inventory filenames into 
a database. I have 8 types of files to handle, each with its own filename 
pattern; whichever pattern matches determines the action to take. I'm not 
sure whether I'm doing something wrong or have hit a bug. I'm using 
Digester 1.8 with jdk1.5.0_11 on Red Hat Enterprise Linux Client release 
5.1.

So I'm configuring my program with regular expressions that match parts 
of a directory path. For example, "/([^/]+)/" matches one or more 
characters other than "/" between two "/" directory separators. I ran into 
some weirdness when parsing the configuration file, so I reduced my more 
elaborate regular expressions to a configuration file containing:

<?xml version="1.0" encoding="UTF-8"?>
<!--     Document   : digester_conf.xml  -->
<toplevel>
    <!--    Set of strings -->
    <!--    If all the strings contain "]", then string seven gets garbled
            by content from string eight. -->
    <stringset
        attr1="/one/([^/]+)"
        attr2="/two/([^/]+)"
        attr3="/three/([^/]+)"
        attr4="/four/([^/]+)"
        attr5="/five/([^/]+)"
        attr6="/six/([^/]+)"
        attr7="/seven/([^/]+)"
        attr8="/eight/([^/]+)"
        attr9="/nine/([^/]+)"
        />
</toplevel>
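As a sanity check that does not involve Digester at all, each pattern can be tried directly with java.util.regex. This is only a sketch: the class name, method name and sample path are made up for illustration. It shows why the corruption matters: a value read back as "/eight/([^/]+)" instead of "/seven/([^/]+)" silently stops matching seven-type paths.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PathPatternDemo {

    /**
     * Returns the text captured by group 1 of regex the first time it
     * occurs anywhere in path, or null if the pattern does not occur.
     */
    static String captureComponent(String regex, String path) {
        Matcher m = Pattern.compile(regex).matcher(path);
        // find() searches the whole path, not just from its start
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical path; only the patterns come from digester_conf.xml.
        String path = "/archive/seven/2008-05/data.txt";
        System.out.println(captureComponent("/seven/([^/]+)", path)); // prints 2008-05
        System.out.println(captureComponent("/eight/([^/]+)", path)); // prints null
    }
}
```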

I print out the stringset values and get (note attr7 doesn't look right):

String set:
  attr1 =/one/([^/]+)
  attr2 =/two/([^/]+)
  attr3 =/three/([^/]+)
  attr4 =/four/([^/]+)
  attr5 =/five/([^/]+)
  attr6 =/six/([^/]+)
  attr7 =/eight/([^/]+)
  attr8 =/eight/([^/]+)
  attr9 =/nine/([^/]+)
  attr10=null

If the configuration file sets attr1...attr9 to strings that do not 
contain the ']' character, everything works as expected. Am I using 
special characters that I should be escaping in the values somehow? Do 
you see any errors in my addRules method? Or could this be a bug? At 
first I suspected a limit on the number of attributes, but variations of 
the program work with up to 15 stringset attributes, which is as far as 
I've tested. Still, the seventh string containing ']' gets corrupted by 
the eighth string containing ']', even when other values are 
interspersed.
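One way to narrow down where the corruption happens is to read the same attributes with a plain SAX parser, with no Digester rules involved. Digester is itself a SAX handler built on the default JAXP parser, so if this sketch (class and method names are illustrative) also reports attr7 as "/eight/([^/]+)", the problem most likely lies in the XML parser rather than in addRules; if it prints the expected values, Digester's rule processing becomes the suspect. It avoids newer language features so it should compile on Java 5. Passing the path of digester_conf.xml is the better test, because the inline fallback copy omits the file's original line breaks and indentation.

```java
import java.io.File;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class RawAttributeDump {

    /** Builds an inline copy of the nine-attribute stringset element. */
    static String sampleConfig() {
        String[] names = {"one", "two", "three", "four", "five",
                          "six", "seven", "eight", "nine"};
        StringBuilder xml = new StringBuilder("<toplevel><stringset");
        for (int i = 0; i < names.length; i++) {
            // Produces e.g.  attr1="/one/([^/]+)"
            xml.append(" attr").append(i + 1)
               .append("=\"/").append(names[i]).append("/([^/]+)\"");
        }
        return xml.append("/></toplevel>").toString();
    }

    /** Parses src with the default JAXP SAX parser; returns stringset attributes in document order. */
    static Map<String, String> stringsetAttributes(InputSource src) throws Exception {
        final Map<String, String> attrs = new LinkedHashMap<String, String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("stringset".equals(qName)) {
                    // Copy each value exactly as the parser reports it.
                    for (int i = 0; i < atts.getLength(); i++) {
                        attrs.put(atts.getQName(i), atts.getValue(i));
                    }
                }
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(src, handler);
        return attrs;
    }

    public static void main(String[] args) throws Exception {
        // With a file argument the parser opens the real config itself;
        // otherwise fall back to the inline copy.
        InputSource src = args.length > 0
                ? new InputSource(new File(args[0]).toURI().toString())
                : new InputSource(new StringReader(sampleConfig()));
        for (Map.Entry<String, String> e : stringsetAttributes(src).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

For example, `java RawAttributeDump digester_conf.xml` prints one `attrN = value` line per attribute, which can be compared line by line with the Digester output above.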

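The main Java file contains:

// ConfigData and StringSet (not shown) are assumed to be in the same package.
import org.apache.commons.digester.Digester;

public class TryDigester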
{  
    /** Creates a new instance of TryDigester */
    public TryDigester() {
    }
   
    public static void main( String[] args )
    {
        System.out.println( "Starting configuration test" );
        if (args.length < 1) {
            System.out.println("Usage: java -jar target/TryDigester.jar digester_conf.xml");
            System.exit(-1);
        }
       
        TryDigester app = new TryDigester();
       
        //configure loading session
        String configFile = args[0];
        // Create a Digester instance
        Digester d = new Digester();
       
        // Prime the digester stack with an object for rules to
        // operate on.
        ConfigData cfg = new ConfigData();
        d.push(cfg);
       
        // Add rules to the digester that will be triggered while
        // parsing occurs.
        addRules(d);
       
        // Process the input file.
        try {
            java.io.File srcfile = new java.io.File(configFile);
            d.parse(srcfile);
        } catch(java.io.IOException ioe) {
            System.out.println("Error reading input file:" + ioe.getMessage());
            System.exit(-1);
        } catch(org.xml.sax.SAXException se) {
            System.out.println("Error parsing input file:" + se.getMessage());
            System.exit(-1);
        }
       
        // For debugging, see the results of configuration file processing.
        cfg.print();
       
        // Do something with configuration data
//        app.doSomething(cfg);
       
        System.out.println("Done");
    } // end main(String[] args)
   
   
    /**
     * The parsing rules for filling in session parameters from the XML
     * configuration file.
     */
    private static void addRules(Digester d) {      
        d.addObjectCreate("toplevel/stringset", StringSet.class);
        d.addSetProperties("toplevel/stringset");
        d.addSetNext("toplevel/stringset", "addStringset");      
    } // addRules(Digester d)


    private void doSomething(ConfigData cfgData) {
        if (cfgData.getStringset() == null) {
            throw new NullPointerException("stringset is null");
        }
        // ============ processing code goes here ==========
    } // doSomething(ConfigData cfgData)
}

ConfigData is an object representing anything of interest from the 
configuration file. It contains just a single StringSet object, which is 
a data bean with 10 String properties (attr1 through attr10). If people 
want, I can post the source files (3 small Java files and 1 maven2 
pom.xml); the first listing above shows the configuration file in its 
entirety.
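Since the bean sources were not posted, here is a hypothetical sketch of a StringSet that would work with the rules in addRules. addSetProperties maps each XML attribute name to the JavaBean property of the same name (attr7 to setAttr7), so the bean needs public setters with exactly those names; the print() layout is only guessed from the output shown above. ConfigData would then need the addStringset(StringSet), getStringset() and print() methods that TryDigester already calls.

```java
/**
 * Hypothetical sketch of the StringSet data bean (the real source was not
 * posted). addSetProperties("toplevel/stringset") calls setAttrN for each
 * attribute attrN found on the element.
 */
public class StringSet {
    private String attr1, attr2, attr3, attr4, attr5,
                   attr6, attr7, attr8, attr9, attr10;

    public String getAttr1() { return attr1; }
    public void setAttr1(String v) { attr1 = v; }
    public String getAttr2() { return attr2; }
    public void setAttr2(String v) { attr2 = v; }
    public String getAttr3() { return attr3; }
    public void setAttr3(String v) { attr3 = v; }
    public String getAttr4() { return attr4; }
    public void setAttr4(String v) { attr4 = v; }
    public String getAttr5() { return attr5; }
    public void setAttr5(String v) { attr5 = v; }
    public String getAttr6() { return attr6; }
    public void setAttr6(String v) { attr6 = v; }
    public String getAttr7() { return attr7; }
    // A temporary println here would show exactly what value Digester passes in.
    public void setAttr7(String v) { attr7 = v; }
    public String getAttr8() { return attr8; }
    public void setAttr8(String v) { attr8 = v; }
    public String getAttr9() { return attr9; }
    public void setAttr9(String v) { attr9 = v; }
    public String getAttr10() { return attr10; }
    public void setAttr10(String v) { attr10 = v; }

    /** Prints the properties in a layout like the output shown in the post. */
    public void print() {
        String[] values = {attr1, attr2, attr3, attr4, attr5,
                           attr6, attr7, attr8, attr9, attr10};
        System.out.println("String set:");
        for (int i = 0; i < values.length; i++) {
            // %-6s pads "attr1".."attr9" to the width of "attr10"; null prints as "null"
            System.out.println(String.format("  %-6s=%s", "attr" + (i + 1), values[i]));
        }
    }
}
```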

Thanks in advance for any suggestions,
-Ken
