
Posted to user@commons.apache.org by rich coco <ra...@starbak.com> on 2004/07/21 17:23:38 UTC

parsing xmlrpc message

I am attempting to use Digester (Jakarta Commons) to parse an xmlrpc
message. The format of the incoming xmlrpc is somewhat peculiar to the
company sending it (I am looking at a response to a sent command request).

The details:

     * there is only 1 <param> coming back: a structure.

     * Each <member> element of this single "top-level" structure is also (always) a
       structure. That is, no primitive type is returned at this 'secondary level'.

     * Each of these secondary structures can, arbitrarily, have members that are
       primitives, structures, or arrays of said types. This can, in principle, be
       arbitrarily deep.

My difficulty is that I believe I need some kind of recursive pattern-matching capability
in order to properly handle the general case. However, I cannot see any way to do this
using Digester. I am hoping this is a reflection of my inexperience with Digester
and not a limitation of that package itself.

The Digester "method" I can imagine helping with this involves the
NodeCreateRule class, which allows a subset of the xmlrpc stream, in the
form of a DOM Node, to be passed into a user-defined method for subsequent
(possibly recursive) parsing. However, I discovered that NodeCreateRule
suppresses, as a side effect, the triggering of pattern-matching rules under
the XML element whose match triggered the NodeCreateRule in the first place.
As a result, an object was not popped from the Digester stack, and Digester
threw an Exception later on: its stack was out of sync because a rule that
would have popped an entry off the stack was no longer being triggered when
an 'n-level structure' was encountered.

I hope this is not too vague. I can provide an example xmlrpc msg and the Java source
I am using if anyone cares to help out with this.

In brief, I guess I am asking: given what I am trying to do and the structure of the
incoming xmlrpc stream, can I use Digester to properly parse this arbitrarily
recursive stream? Is there some other xmlrpc parsing package that I should use
instead?

Many thanks for any help

- rich

-- 
rich coco
racoco@starbak.com
781.736.1200  x165
Starbak Inc.
29 Sawyer Rd.
One University Office Park
Waltham, MA 02453


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org

[digester] Re: parsing xmlrpc message

Posted by Bill Keese <bi...@tech.beacon-it.co.jp>.

Simon beat me to the answer to your question by 3 minutes  :-)

> The Digester "method" I can imagine helping with this involves the use 
> of the
> NodeCreateRule class... Digester's stack now being
> out of sync

As the previous responses implied, I don't think you need or want to
use the NodeCreateRule class. But if you do end up using that class,
you need to pull the latest code from CVS. The 1.5 release has a bug
where the created node is not popped off the stack:

 http://www.mail-archive.com/commons-user@jakarta.apache.org/msg03695.html

Bill


[digester] Re: parsing xmlrpc message

Posted by Bill Keese <bi...@tech.beacon-it.co.jp>.

> My difficulty is that I believe I need some kind of recursive 
> pattern-matching capability
> in order to properly handle the general case.

As Robert said, you may have to use ExtendedBaseRules
(http://jakarta.apache.org/commons/digester/apidocs/org/apache/commons/digester/ExtendedBaseRules.html).

But from your description, I suspect that you can get what you want
from the standard Digester configuration, by using tail match. The
pattern "*/xyz" will match element xyz regardless of how many levels of
nesting there are. From the manual:

Tail Match - A pattern "*/a/b" matches a <b> element, nested inside
an <a> element, no matter how deeply the pair is nested.
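
To illustrate the tail-match behaviour quoted from the manual, here is a small
JDK-only sketch. It is not Digester's own matching code; the class name
TailMatchSketch and the method matches() are made up for this example, and
element paths are assumed to be written as slash-separated names.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: compares a "*\/..." style pattern with an element path. */
class TailMatchSketch {

    /**
     * Returns true if the pattern matches the slash-separated element path.
     * A pattern starting with "*" + "/" matches when the path ends with the
     * remaining segments; any other pattern must equal the whole path.
     */
    static boolean matches(String pattern, String path) {
        if (!pattern.startsWith("*/")) {
            return pattern.equals(path);
        }
        List<String> tail = Arrays.asList(pattern.substring(2).split("/"));
        List<String> segments = Arrays.asList(path.split("/"));
        if (tail.size() > segments.size()) {
            return false;
        }
        // Compare whole segments so that "xa/b" is not mistaken for "a/b".
        int from = segments.size() - tail.size();
        return segments.subList(from, segments.size()).equals(tail);
    }

    public static void main(String[] args) {
        System.out.println(matches("*/a/b", "root/a/b"));
        System.out.println(matches("*/a/b", "root/x/y/a/b"));
        System.out.println(matches("*/a/b", "root/a/c/b"));
    }
}
```

The point is only that the depth of the prefix is irrelevant: the same
pattern applies to an <a>/<b> pair at level 2 or at level 20.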

Are you asking for something more complicated than that?

Bill


Re: [digester] parsing xmlrpc message

Posted by Bill Keese <bi...@tech.beacon-it.co.jp>.

I think what Simon said explains exactly how to do it. I would only add
one thing. There's a common misunderstanding people have when learning
Digester, and I wonder if you are making that mistake.

People think: "When I call addObjectCreate(), it scans the input XML for
a tag matching the pattern and then creates an object on the stack". But
actually, that's not what happens at all. All the calls like
addObjectCreate() just add a rule into a table. They don't even look at
the input XML. After you are finished creating all the rules, Digester
sequentially processes each tag/attribute in the input XML, by applying
the matching rule(s) to it.

So, the rules don't necessarily execute in the order that you wrote them
in the Java source code. And a single rule can execute many times, on
elements at different levels of the input XML. The end effect is that
you get something like recursion, although it's implemented using the
Digester stack.
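
The two phases can be sketched with a toy rule table built on the JDK's SAX
parser. This is not Digester; RuleTableSketch, addRule() and demo() are
hypothetical names, and the matching is a simplified tail match. The sample
XML is the gui/window/widget document from elsewhere in this thread.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Toy rule engine: rules are stored first and fired later, during parsing. */
class RuleTableSketch extends DefaultHandler {
    private final Map<String, Consumer<String>> rules = new LinkedHashMap<>();
    private final Deque<String> path = new ArrayDeque<>();

    /** Phase 1: only records the rule in the table; no XML is read here. */
    void addRule(String pattern, Consumer<String> action) {
        rules.put(pattern, action);
    }

    /** Phase 2: the parser walks the document and fires matching rules. */
    void parse(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), this);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    private static boolean matches(String pattern, String current) {
        if (pattern.startsWith("*/")) {
            String tail = pattern.substring(2);
            return current.equals(tail) || current.endsWith("/" + tail);
        }
        return pattern.equals(current);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        path.addLast(qName);
        String current = String.join("/", path);
        for (Map.Entry<String, Consumer<String>> rule : rules.entrySet()) {
            if (matches(rule.getKey(), current)) {
                rule.getValue().accept(current);
            }
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        path.removeLast();
    }

    /** Registers the widget rule first, yet the gui rule fires first. */
    static List<String> demo() {
        List<String> fired = new ArrayList<>();
        RuleTableSketch engine = new RuleTableSketch();
        engine.addRule("*/window/widget", p -> fired.add("widget rule at " + p));
        engine.addRule("gui", p -> fired.add("gui rule at " + p));
        engine.parse("<gui><window><widget id='1'/>"
                + "<window><widget id='2'/></window></window></gui>");
        return fired;
    }
}
```

In demo() the widget rule is added before the gui rule, but the firing order
follows the document, and the single widget rule fires once per widget, at
two different depths.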

Bill

Re: [digester] parsing xmlrpc message

Posted by Simon Kitching <si...@ecnetwork.co.nz>.

On Fri, 2004-07-23 at 01:10, rich coco wrote:
> bill/simon/bob -
> 
> thanks to all of you for your advice.
> 
> Unfortunately (for me) I do not see how the wildcard rule helps me,
> though this may be more due to my lack of imagination than anything to
> do with the rule itself.
> 
> I'll try to explain why i think that. refer to the attached xmlrpc msg
> (that i have to parse) in what follows. In general, the xmlrpc response
> has this format:
> 
>     * exactly one argument is returned, an unnamed (top-level) structure.
>
>     * Each member of that single argument is itself a (level-2) structure.
>
>     * From this point on, each member of a level-2 structure can be a
>       primitive (int, string, etc...), another structure, or an array
>       of any supported type (including structure).
>
>     * In principle, there is no limit to the depth of structures:
>       structures can have members that are structures, whose corresponding
>       members may be structures, etc. (or even arrays thereof).
>
> So with this in mind, and referring to the sample xml below, it is not
> clear to me that a wild-card prefix rule like "*/struct" (or any rule of the form
> "*/whatever/...") would help.
> 
> Since the first 2 levels of the xmlrpc response have a well-defined (as opposed
> to arbitrary and recursive) structure, maybe I might use a wild-card rule like this:
>
>     /methodResponse/params/param/value/struct/member/value/struct/*/struct
>
> to capture all level-3 and deeper structures? (similarly for arrays?).
>
> One of you clued me in to the fix to the NodeCreateRule bug. Thanks.
> Just today I downloaded the latest Digester CVS code and rebuilt the jar file.
> I had been getting an exception when using it precisely because it did not
> pop an element off the Digester stack when it was supposed to (the bug). The man page
> for that Rule gave this detailed warning about how using the rule would suppress
> the firing of subsequent rules (or some such), which made me think maybe that was
> how it was supposed to work. Though, in retrospect, its behavior made it
> somewhat unusable, which should have been a clue to me...
> 
> Any additional insight welcome.
> 
> - rich
> 
> -------------
> 
> <methodResponse>
>    <params>
>      <param>
>        <value>
>          <struct>
>            <member>
>              <name>status</name>
>              <value>
>                <struct>
>                  <member>
>                    <name>encoding</name>
>                    <value><i4>200</i4></value>
>                  </member>
>                  <member>
>                    <name>content_info</name>
>                    <value>
>                      <struct>
>                        <member>
>                          <name>description</name>
>                          <value><string></string></value>
>                        </member>
>                        <member>
>                          <name>encoder_info</name>
>                          <value>
>                            <struct>
>                              <member>
>                               .  .  .
> 

Presumably you have a Java class which represents the info in a <value>
tag, and another class that represents the info in a <struct> tag,
right? So:

  // Push a new Value object whenever a <value> element starts, at any depth.
  digester.addObjectCreate("*/value", Value.class);

  // The pattern "*/value/struct" could simply be "*/struct", because
  // structs always occur within a value tag, but putting the value
  // in there makes the expected structure clearer.
  digester.addObjectCreate("*/value/struct", Struct.class);
  digester.addSetNext("*/value/struct", "addStruct");

  // Pass the body text of <i4> to setIntValue() on the enclosing Value.
  digester.addCallMethod("*/value/i4", "setIntValue", 1);
  digester.addCallParam("*/value/i4", 0);

etc...

I don't see anything in your input that can't be handled with wildcard
prefixes (though I haven't investigated in detail). Although the spec
appears to be specific about the first 2 levels of nesting being
mandatory, they appear to have exactly the same hierarchy/nesting rules
that apply to the levels below that. So the same parsing rules should
work at any depth as far as I can see.
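
For reference, the rules above assume bean classes roughly like the ones
sketched here. Value, Struct, addStruct() and setIntValue() are the names
used in the rules; their bodies, and the hand-written simulate() method that
mimics the net stack effect of the object-create and set-next rules for
<value><struct/></value>, are assumptions made for illustration only and use
no Digester classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class ValueStructSketch {

    /** Hypothetical bean for a <value> element. */
    static class Value {
        Integer intValue;
        final List<Struct> structs = new ArrayList<>();

        /** Receives the body text of an <i4> element. */
        public void setIntValue(String body) {
            intValue = Integer.valueOf(body.trim());
        }

        /** Receives a finished child struct. */
        public void addStruct(Struct child) {
            structs.add(child);
        }
    }

    /** Hypothetical bean for a <struct> element (members omitted). */
    static class Struct {
    }

    /** Hand-simulates the net stack effect for <value><struct/></value>. */
    static Value simulate() {
        Deque<Object> stack = new ArrayDeque<>();
        stack.push(new Value());              // <value> starts: object is created and pushed
        stack.push(new Struct());             // <struct> starts: object is created and pushed
        Struct child = (Struct) stack.pop();  // </struct>: child leaves the stack ...
        ((Value) stack.peek()).addStruct(child); // ... and is handed to its parent
        return (Value) stack.pop();           // </value>: parent leaves the stack
    }
}
```

Because the same push/pop sequence repeats for every nested <value> and
<struct>, the depth of the input does not need to be known in advance.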

Regards,

Simon

Re: [digester] parsing xmlrpc message

Posted by rich coco <ra...@starbak.com>.

bill/simon/bob -

thanks to all of you for your advice.

Unfortunately (for me) I do not see how the wildcard rule helps me,
though this may be more due to my lack of imagination than anything to
do with the rule itself.

I'll try to explain why i think that. refer to the attached xmlrpc msg
(that i have to parse) in what follows. In general, the xmlrpc response
has this format:

    * exactly one argument is returned, an unnamed (top-level) structure.

    * Each member of that single argument is itself a (level-2) structure.

    * From this point on, each member of a level-2 structure can be a
      primitive (int, string, etc...), another structure, or an array
      of any supported type (including structure).

    * In principle, there is no limit to the depth of structures:
      structures can have members that are structures, whose corresponding
      members may be structures, etc. (or even arrays thereof).

So with this in mind, and referring to the sample xml below, it is not
clear to me that a wild-card prefix rule like "*/struct" (or any rule of the form
"*/whatever/...") would help.

Since the first 2 levels of the xmlrpc response have a well-defined (as opposed
to arbitrary and recursive) structure, maybe I might use a wild-card rule like this:

    /methodResponse/params/param/value/struct/member/value/struct/*/struct

to capture all level-3 and deeper structures? (similarly for arrays?).

One of you clued me in to the fix to the NodeCreateRule bug. Thanks.
Just today I downloaded the latest Digester CVS code and rebuilt the jar file.
I had been getting an exception when using it precisely because it did not
pop an element off the Digester stack when it was supposed to (the bug). The man page
for that Rule gave this detailed warning about how using the rule would suppress
the firing of subsequent rules (or some such), which made me think maybe that was
how it was supposed to work. Though, in retrospect, its behavior made it
somewhat unusable, which should have been a clue to me...

Any additional insight welcome.

- rich

-------------

<methodResponse>
   <params>
     <param>
       <value>
         <struct>
           <member>
             <name>status</name>
             <value>
               <struct>
                 <member>
                   <name>encoding</name>
                   <value><i4>200</i4></value>
                 </member>
                 <member>
                   <name>content_info</name>
                   <value>
                     <struct>
                       <member>
                         <name>description</name>
                         <value><string></string></value>
                       </member>
                       <member>
                         <name>encoder_info</name>
                         <value>
                           <struct>
                             <member>
                              .  .  .



Re: [digester] parsing xmlrpc message [WAS Re: parsing xmlrpc message]

Posted by Simon Kitching <si...@ecnetwork.co.nz>.

On Thu, 2004-07-22 at 08:27, robert burrell donkin wrote:
> you may need to use the ExtendedBaseRules (or create a regex rule 
> implementation from something like ORO) if the base pattern matching 
> rule vocabulary is not rich enough but it's hard to give you more help 
> without the idea of the xml and the source to which you're trying to 
> map.
> 

Digester does handle recursive structures, at least for the basic cases.

The standard rules engine allows wildcard prefixes, e.g.
 "*/window/widget"

This pattern will match both of the "widget" tags, firing the same Rule
(which is generally what is desired).

 <gui>
   <window>
     <widget id="1"/>
     <window>
       <widget id="2"/>
     </window>
   </window>
 </gui>

If this doesn't give you what you want, please let us know why.
As Robert said, there are a number of other pattern-matching engines
(ExtendedBaseRules and RegexRules) that might work for you.

Regards,

Simon


[digester] parsing xmlrpc message [WAS Re: parsing xmlrpc message]

Posted by robert burrell donkin <ro...@blueyonder.co.uk>.

On 21 Jul 2004, at 16:23, rich coco wrote:

hi rich

<snip>

> In brief, I guess I am asking: given what i am trying to do and the 
> structure of the
> incoming xmlrpc stream, can I use Digester to properly parse this 
> arbitrarily
> recursive stream?

digester is fundamentally just an easy wrapper around SAX. given enough
effort, anything you can process using SAX you can process using
digester. you can always just create a custom rule implementation if
none of the standard ones are right for you.
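
As a rough illustration of what the plain-SAX route looks like for a nested
xmlrpc response, here is a JDK-only sketch that keeps its own stacks. It is
an assumption-laden example, not Digester code: the class name
XmlRpcStructSketch is made up, only <struct>, <i4>/<int> and <string> are
handled, and arrays and the other xmlrpc types are left out.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Turns nested <struct> elements into nested Maps using explicit stacks. */
class XmlRpcStructSketch extends DefaultHandler {
    private final Deque<Map<String, Object>> structs = new ArrayDeque<>();
    private final Deque<String> names = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    private Object lastValue;

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        text.setLength(0); // start collecting body text afresh
        if (qName.equals("struct")) {
            structs.push(new LinkedHashMap<>()); // one map per open <struct>
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        switch (qName) {
            case "name":   names.push(text.toString().trim()); break;
            case "i4":
            case "int":    lastValue = Integer.valueOf(text.toString().trim()); break;
            case "string": lastValue = text.toString(); break;
            case "struct": lastValue = structs.pop(); break; // finished struct becomes a value
            case "member": structs.peek().put(names.pop(), lastValue); break;
            default:       break; // wrapper elements such as <value> and <param>
        }
    }

    /** Parses a response whose single param is a struct; returns that struct. */
    static Map<String, Object> parse(String xml) {
        XmlRpcStructSketch handler = new XmlRpcStructSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse xmlrpc response", e);
        }
        @SuppressWarnings("unchecked")
        Map<String, Object> top = (Map<String, Object>) handler.lastValue;
        return top;
    }

    public static void main(String[] args) {
        String xml = "<methodResponse><params><param><value><struct>"
                + "<member><name>status</name><value><struct>"
                + "<member><name>encoding</name><value><i4>200</i4></value></member>"
                + "</struct></value></member>"
                + "</struct></value></param></params></methodResponse>";
        System.out.println(parse(xml));
    }
}
```

The handler never needs to know how deep the structures go: every <struct>
pushes a map and every </struct> pops one, which is the same stack
discipline digester applies on your behalf.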

you may need to use the ExtendedBaseRules (or create a regex rule
implementation from something like ORO) if the base pattern matching
rule vocabulary is not rich enough, but it's hard to give you more help
without an idea of the xml and the source to which you're trying to
map.

> is there some other xmlrpc parsing package that I should use
> instead?

i suppose this question becomes: is there any easier way to do this?

digester isn't a specialist xmlrpc package so it's possible that 
someone might know a specialist package that will make this easier. 
maybe they'd like to jump in since i don't.

BTW please remember to prefix your subject with the name of the 
component you're posting about.

- robert
