Posted to user@commons.apache.org by Worley Kevin <Ke...@tetrapak.com> on 2005/05/01 19:27:44 UTC

[Digester] How can I 'shortcut' a parse?

Howdy,

I would like to know if anyone can tell me how to 'shortcut' a parse.

The XML file that I need to parse does not use Namespaces, but the rules
I need to use vary based on information in the "header" section.
Basically, the file is something like:

<bulkdata>
  <header>
    <PayloadID>X956487</PayloadID>
    <GenDate>1967-08-13</GenDate>
    <Mode>ABC</Mode>
    <ContactEmail>name@domain.com</ContactEmail>
    <SupplierID>A123456789B</SupplierID>
  </header>
  <body>
    <LineItem LineNumber="0">
      <ID>
        <Supplier>XYX Corp.</Supplier>
        <SupplierGroup>Group a</SupplierGroup>
        <ReferenceID>6565656</ReferenceID>
        <EmployeeName>John Doe</EmployeeName>
        <EmployeeNumber>123</EmployeeNumber>
        <BLNumber>AW54645664Z</BLNumber>
        <AWB>456789456</AWB>
        <HAWB>456789789</HAWB>
      </ID>
	...
    </LineItem>
  </body>
</bulkdata>

The 'body' can consist of thousands of 'LineItem' elements which are
each much more extensive than shown here.  Currently, I use Digester
with rules to parse the header.  When complete, I can look at what was
returned and set the rules to correctly parse the 'body' of the file.
This works, but requires the digester to completely parse the file
twice.  I'd really like to avoid doing it this way.

Does anyone know of a way I can have the parser simply end after the
'header' section is parsed?  

Thanks,

Kevin


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org

Re: [Digester] How can I 'shortcut' a parse?

Posted by Simon Kitching <sk...@apache.org>.

On Sun, 2005-05-01 at 12:27 -0500, Worley Kevin wrote:
> Does anyone know of a way I can have the parser simply end after the
> 'header' section is parsed?  

I think you could write your own Rule class that throws an exception:

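import org.apache.commons.digester.Digester;
import org.apache.commons.digester.Rule;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;

// Thrown deliberately to stop the parse once the header is done.
class CancelParseException extends SAXException {
    CancelParseException() { super("parse cancelled at <body>"); }
}

// Fires when <body> begins, i.e. right after </header>.
class CancelParseRule extends Rule {
    public void begin(String namespace, String name, Attributes attributes)
            throws Exception {
        throw new CancelParseException();
    }
}

...
digester.addRule("bulkdata/body", new CancelParseRule());
...
try {
  digester.parse(input);
} catch (SAXException ex) {
  // Digester may wrap an exception thrown by a rule in its own
  // SAXException, so check the wrapped exception as well.
  if (!(ex instanceof CancelParseException)
      && !(ex.getException() instanceof CancelParseException)) {
    throw ex;
  }
  // ok: header rules have fired; the body was never processed
}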

The above is only pseudocode; not tested!
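Since Digester is not part of the JDK, here is a self-contained sketch of the same stop-by-exception idea using only the JDK's built-in SAX parser. The names `HeaderOnlyParse`, `HeaderHandler` and `StopParsingException` are hypothetical: `HeaderHandler` stands in for Digester plus its header rules, records the text of the elements inside `<header>`, and throws as soon as `<body>` starts. It assumes the file uses no namespaces, as described above.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class HeaderOnlyParse {

    /** Hypothetical marker exception: thrown on purpose to end the parse early. */
    static final class StopParsingException extends SAXException {
        StopParsingException() {
            super("stopped at <body>");
        }
    }

    /** Records the text of the elements inside <header>, then aborts at <body>. */
    static final class HeaderHandler extends DefaultHandler {
        final Map<String, String> header = new LinkedHashMap<>();
        private final StringBuilder text = new StringBuilder();
        private boolean inHeader;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if (qName.equals("body")) {
                // Header is complete; the parser is stopped at this start tag.
                throw new StopParsingException();
            }
            if (qName.equals("header")) {
                inHeader = true;
            }
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("header")) {
                inHeader = false;
            } else if (inHeader) {
                header.put(qName, text.toString().trim());
            }
        }
    }

    static Map<String, String> readHeader(String xml) throws Exception {
        HeaderHandler handler = new HeaderHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsingException expected) {
            // Normal early exit: nothing after the <body> start tag was read.
        }
        return handler.header;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<bulkdata><header><PayloadID>X956487</PayloadID>"
                + "<Mode>ABC</Mode></header>"
                + "<body><LineItem LineNumber=\"0\"/></body></bulkdata>";
        System.out.println(readHeader(xml));
    }
}
```

With Digester in place of the plain handler, the exception thrown from a Rule may arrive wrapped in another SAXException, so it is worth checking getException() as well as the exception itself.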


Alternatively, run the xml input through an org.xml.sax.XMLFilter before
passing it to Digester to discard the unwanted xml on the first pass.
The xml will still be parsed, but at least Digester won't have to
process it. Or use XSLT to do the same job of "filtering" the input
xml. 
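A minimal sketch of such a filter, again using only JDK classes. The class `BodySkippingFilter` is hypothetical: it extends `org.xml.sax.helpers.XMLFilterImpl` and swallows element and text events between `<body>` and `</body>`, so the downstream handler (a recording `DefaultHandler` here, standing in for Digester) only sees the header. As noted above, the parser itself still reads the whole file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

final class BodySkippingFilterDemo {

    /** Passes SAX events through, except element/text events inside <body>. */
    static final class BodySkippingFilter extends XMLFilterImpl {
        private int bodyDepth = 0; // > 0 while inside <body>...</body>

        BodySkippingFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if (bodyDepth > 0 || qName.equals("body")) {
                bodyDepth++; // swallow <body> and everything nested in it
                return;
            }
            super.startElement(uri, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if (bodyDepth > 0) {
                bodyDepth--; // reaches 0 again at </body>
                return;
            }
            super.endElement(uri, localName, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (bodyDepth == 0) {
                super.characters(ch, start, length);
            }
        }
    }

    /** Returns the element names the downstream handler actually receives. */
    static List<String> elementsSeen(String xml) throws Exception {
        List<String> seen = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        BodySkippingFilter filter = new BodySkippingFilter(reader);
        // Stand-in for Digester, which is itself a ContentHandler.
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) {
                seen.add(qName);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementsSeen("<bulkdata><header><Mode>ABC</Mode></header>"
                + "<body><LineItem LineNumber=\"0\"><ID/></LineItem></body></bulkdata>"));
    }
}
```

XMLFilterImpl.parse registers the filter as the parent reader's content handler before parsing, so the filter sits between the parser and whatever handler is downstream.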

Note that Digester is a ContentHandler, so it can be passed to
parser.setContentHandler instead of calling digester.parse, which lets
you incorporate it into a "pipeline" of SAX events. This is useful when
playing tricks with xml, such as filtering or transforming it before
passing the data to Digester.
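The same pipeline idea works with the JDK's XSLT support: a `javax.xml.transform.sax.SAXResult` delivers the transformer's output as SAX events to any `ContentHandler`, so in principle a Digester instance could sit where the recording `DefaultHandler` sits in this sketch (untested with Digester). The stylesheet `DROP_BODY_XSLT` is a hypothetical identity transform with an empty template for `body`. Bear in mind that an XSLT 1.0 processor typically builds the whole input tree in memory first, so this saves Digester work but not parsing time or memory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class XsltPipelineDemo {

    // Identity transform that copies everything except <body> and its contents.
    static final String DROP_BODY_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='body'/>"
        + "</xsl:stylesheet>";

    /** Returns the element names the ContentHandler receives from the transformer. */
    static List<String> elementsSeen(String xml) throws Exception {
        List<String> seen = new ArrayList<>();
        // Stand-in for Digester: any ContentHandler can receive the SAX events.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) {
                seen.add((qName == null || qName.isEmpty()) ? localName : qName);
            }
        };
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(DROP_BODY_XSLT)));
        t.transform(new StreamSource(new StringReader(xml)), new SAXResult(handler));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementsSeen("<bulkdata><header><Mode>ABC</Mode></header>"
                + "<body><LineItem LineNumber=\"0\"/></body></bulkdata>"));
    }
}
```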

Regards,

Simon

