Posted to p-dev@xerces.apache.org by "Jason E. Stewart" <ja...@openinformatics.com> on 2001/03/29 20:07:18 UTC

Re: About XML::Xerces, bugs, observations and questions.

Hey Brian,

You've uncovered some of the penalties that one pays when using SWIG
to wrap a big library. If you just want to write straight-forward
programs, SWIG is great. If you want to do things that SWIG's authors
didn't anticipate, you might be in trouble....

read on

"Brian Thomas" <th...@adc.gsfc.nasa.gov> writes:

> 1) A minor patch for DOMParse (attached). Perl complains when you use
>     deprecated '\1' and \2 instead of '$1' and '$2'. Perhaps there is a platform
>     issue here for why you need '\1' ? If so please ignore the
>     patch!!

I just read the perlre man page and got clear on $<digit> versus
\<digit>. I misunderstood the semantics: \<digit> should only be used
inside the pattern itself, as a backreference; $<digit> is used
everywhere else, including the replacement side of s///.

There aren't any platform issues, so I'll change it, Thanks!!
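
For the record, here is a small sketch of the distinction (the strings
and variable names are made up for illustration, they are not from
DOMParse):

```perl
use strict;
use warnings;

# Inside the pattern, \1 is a backreference to the first capture group.
my $text       = "hello hello world";
my $has_repeat = $text =~ /\b(\w+) \1\b/ ? 1 : 0;

# In the replacement (and in ordinary code after a match), use $1 and $2.
(my $swapped = "key=value") =~ s/(\w+)=(\w+)/$2=$1/;

print "$has_repeat $swapped\n";   # prints "1 value=key"
```

Writing \1 in the replacement still works, but it is the usage perl
warns about under -w.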

> 2) The GeneXML classes proved to be instructive, for I was having a
>    hard time sub-classing XML::Xerces classes !! Under the paradigm
>    you use, (basically store the superclass under an attribute, and
>    use AUTOLOAD) this works OK for most of my needs.

I can't remember what the Gang of Four calls the pattern, delegation
or something like that. But avoiding inheritance can be very
useful. The idea being: "is the relationship you're modeling an 'is-a'
relationship or a 'has-a' relationship". I find delegation solves a
lot of problems.
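
A minimal sketch of that has-a arrangement, using hypothetical Inner
and Wrapper classes (this is not the actual GeneXML code, just the
general shape of storing the wrapped object in an attribute and
forwarding through AUTOLOAD):

```perl
use strict;
use warnings;

# Hypothetical stand-in for a wrapped class such as a DOM element.
package Inner;
sub new      { my ($class, %args) = @_; return bless { %args }, $class }
sub get_name { return $_[0]{name} }

# Wrapper "has-a" Inner instead of inheriting from it.
package Wrapper;
our $AUTOLOAD;
sub new {
    my ($class, %args) = @_;
    return bless { inner => Inner->new(%args) }, $class;
}
sub AUTOLOAD {
    my $self = shift;
    (my $method = $AUTOLOAD) =~ s/.*:://;   # strip the package name
    return if $method eq 'DESTROY';         # never forward destruction
    return $self->{inner}->$method(@_);     # delegate everything else
}

package main;
my $w = Wrapper->new(name => 'gene');
print $w->get_name, "\n";   # prints "gene", answered by Inner
print ref($w), "\n";        # prints "Wrapper"
```

Note that $w is a Wrapper and not an Inner, so anything that checks
the class of the object it is handed will not accept it in place of an
Inner. Delegation of this kind handles many subclassing problems.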

Delegation, however, does not solve this one:

>    However, I find that I need to be able to sub-class the
>    DOM::Element AND be able to insert it in the parent document
>    (which is holding an XML::Xerces::DOM_Document in a private
>    field). The parent document attempts to insert the sub-classed
>    element object, and then gets a failure statement. Something to
>    the effect of 'This is not a Xerces::DOM_Element you are trying
>    to insert here bud, forget it'.

What you have found is a severe limitation of the current
implementation of SWIG for perl classes. Dave Beazley (SWIG's author)
has a lot of nice type checking to keep SWIG'ed code from dumping core
when given the wrong kind of pointer, but he's done it in a way that
asks the equivalent of:

   ref($object) eq 'XML::Xerces::Node'

when what we really want is:

   $object->isa('XML::Xerces::Node')

isa() checks the inheritance tree, and the string equals operator
can't... 
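
To make the difference concrete, here is a small pure-perl sketch.
Base::Node and My::Element are hypothetical names standing in for
XML::Xerces::Node and a user subclass; nothing here touches SWIG:

```perl
use strict;
use warnings;

# Hypothetical base class, standing in for the SWIG-generated one.
package Base::Node;
sub new { return bless {}, shift }

# Hypothetical user subclass.
package My::Element;
our @ISA = ('Base::Node');

package main;
my $obj = My::Element->new;

my $exact   = ref($obj) eq 'Base::Node' ? 1 : 0;   # exact class name only
my $related = $obj->isa('Base::Node')   ? 1 : 0;   # walks the @ISA tree

print "exact=$exact isa=$related\n";   # prints "exact=0 isa=1"
```

The subclass fails the string comparison even though it is a perfectly
good Base::Node, which is the rejection you ran into.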

The problem is that at one point in time SWIG had a very simple
implementation of this type checking, and so this would have been a
one line fix. But now (with SWIG 1.3) there is a very complicated
implementation, and although I can imagine how I might fix it, it will
take me some time (a week most likely, since I'm at a conference in
Palo Alto until Sun evening). Let me look into it, and maybe I can
send you an inheritance-friendly Xerces.C file tomorrow.


> So question: is it really needed to have such strict checking on the use of 
> classes? Can I set SWIG to not 'hardwire' the classes it is expecting from a 
> method call? This could be a 'show-stopper' for me using Xerces.

Sorry, it's necessary, but SWIG did it wrong. The issue is that under
the hood the DOM_Node objects are really C++ objects. Perl is able to
reference those C++ objects by keeping around pointers to them. So if
you really wanted to mess perl up, you could give it a pointer to
anything you wanted and perl would happily cast it to a DOM_Node and
you might get a core dump. SWIG tries to 'help you out' by type
checking it for you.

This is just a case of what is all too easy to do: forget that someone
may want to subclass the object. One of the other modules I maintain,
Class::ObjectTemplate made the same mistake (except it was written
entirely in perl).

> 3) I found that cloneNode breaks if you use Xerces-c v1.4 and the 1.3.3 XML-Xerces
>     package. I've included a script which illustrates the core dump. I now use c++ 1.3.0
>     package :)

Thanks! I'll look into that one. 

Harmon, Frederick Paul, either of you got time to check out the problem?

> 4) Question: when I make a call to find the ownerdocument, I get a
>    different document each time!!! (same script from #3 will
>    illustrate this). Whats going on here, isnt this a bug??

<discussion duration="lengthy" 
            context="Perl and SWIG internals"
            geek_factor="high"
            warning="proceed at your own risk">

Hmmmm.... maybe yes, maybe no. After digging for a bit, you may be
correct. 

Just printing out the perl memory location won't tell you, though.
Under the hood SWIG calls c++_object->getDocument() and wraps
the return value as a *new* perl object. Each of those perl objects
*should* still reference the same underlying C++ object. If they
don't, then that's up to Xerces-C.

<background>
First, the C glue code in Xerces.C is auto-generated by SWIG to
communicate between the Xerces.pm *perl* code and the libxerces-c C++
code. The Xerces.C methods return scalar references to the underlying
C++ objects. Then because SWIG wants to maintain extra information
about those objects, it uses 'tie' to store the scalar reference in a
hash, and then blesses that hash into the appropriate class. This is
basically what I do with the GeneXML class, but I don't bother with
the 'tie' step.
</background>
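
A rough pure-perl mock of that layering, in case it helps to picture
it. Mock::Document and its wrap() method are hypothetical, and this is
not the code SWIG actually generates; it only mimics the scalar ref
hidden behind a tied, blessed hash:

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Hypothetical mock class; not the real SWIG-generated wrapper.
package Mock::Document;

# tie() calls TIEHASH; whatever it returns is what tied() hands back.
sub TIEHASH { my ($class, $inner) = @_; return $inner }

sub wrap {
    my ($class, $address) = @_;
    my $inner = bless \$address, $class;   # scalar ref holding the "pointer"
    my %outer;
    tie %outer, $class, $inner;            # hide the scalar behind a tied hash
    return bless \%outer, $class;          # the object user code sees
}

package main;
my $doc   = Mock::Document->wrap(277381056);
my $inner = tied %$doc;

print reftype($doc), "\n";     # prints "HASH"
print reftype($inner), "\n";   # prints "SCALAR"
print "$$inner\n";             # prints "277381056"
```

Both layers are blessed into the same class, which is why the debugger
shows the same class name on a HASH ref and on a SCALAR ref.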

To really find out what's going on you need to use Devel::Peek in the
debugger, and poke around a bit:

  DB<3> $doc = Bio::Genex::GeneXML->new()

  DB<9> $d1 = $doc->getOwnerDocument

  DB<10> x $d1
0  XML::Xerces::DOM_Document=HASH(0x10592640)
     empty hash
  DB<11> $d2 = $doc->getOwnerDocument

  DB<12> x $d2
0  XML::Xerces::DOM_Document=HASH(0x104e3fdc)
     empty hash

So $d1 and $d2 are definitely *different* perl objects, but the real
question is: what pointers are they storing underneath? Notice that
they are both HASH refs. Let's use 'tied' and see what they're really
storing:

  DB<17> x tied %{$d2}
0  XML::Xerces::DOM_Document=SCALAR(0x104e415c)
   -> 277377024

  DB<18> x tied %{$d1}
0  XML::Xerces::DOM_Document=SCALAR(0x104e400c)
   -> 277381056

Notice that we've still got a couple of XML::Xerces::DOM_Document
objects, but now they're SCALAR refs. Those are the objects that
Xerces.C gave to us. Let's use Devel::Peek::Dump and look under the
hood: 

  DB<19> Devel::Peek::Dump  tied %{$d1}
SV = PVMG(0x1088dac0) at 0x104e4648
  REFCNT = 1
  FLAGS = (ROK,OVERLOAD)
  IV = 0
  NV = 0
  RV = 0x104e400c
  SV = PVMG(0x108852b0) at 0x104e400c
    REFCNT = 2
    FLAGS = (OBJECT,IOK,pIOK)
    IV = 277381056
    NV = 0
    PV = 0
    STASH = 0x1056f350	"XML::Xerces::DOM_Document"

  DB<20> Devel::Peek::Dump tied %{$d2}
SV = PVMG(0x1088db40) at 0x104e4528
  REFCNT = 1
  FLAGS = (ROK,OVERLOAD)
  IV = 0
  NV = 0
  RV = 0x104e415c
  SV = PVMG(0x10885470) at 0x104e415c
    REFCNT = 2
    FLAGS = (OBJECT,IOK,pIOK)
    IV = 277377024
    NV = 0
    PV = 0
    STASH = 0x1056f350	"XML::Xerces::DOM_Document"

Well, hmm... Notice that we've got a PVMG in each case, a MAGIC perl
object. In each case it's a reference to another scalar that has the
OBJECT and IOK flags set, indicating that it's wrapping a C/C++
pointer, and the pointer values are in the IV field:

    IV = 277381056 ($d1)
    IV = 277377024 ($d2)

Looks like different pointers to me... That means that libxerces-c
gave us different pointers when we called getOwnerDocument twice on
the same node. libxerces-c doesn't seem to be doing anything fancier
than return a pointer to the DOM_Document, so calling it multiple
times should give the same pointer. 
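
If you want to run the same comparison from a script rather than the
debugger, something along these lines should do it. This is a sketch
that assumes the tied-hash layout shown above; Mock::Document and
same_underlying_object() are hypothetical names, and the mock stands
in for real XML::Xerces objects so that it runs on its own:

```perl
use strict;
use warnings;

# Hypothetical mock of the wrapper layout (tied hash over a scalar ref).
package Mock::Document;
sub TIEHASH { return $_[1] }
sub wrap {
    my ($class, $address) = @_;
    tie my %outer, $class, bless(\$address, $class);
    return bless \%outer, $class;
}

package main;

# Compare the wrapped addresses, not the perl wrappers themselves.
sub same_underlying_object {
    my ($x, $y) = @_;
    return ${ tied %$x } == ${ tied %$y } ? 1 : 0;
}

my $d1 = Mock::Document->wrap(277381056);
my $d2 = Mock::Document->wrap(277377024);
my $d3 = Mock::Document->wrap(277381056);

print same_underlying_object($d1, $d2), "\n";   # prints "0"
print same_underlying_object($d1, $d3), "\n";   # prints "1"
```

Here $d1 and $d3 are different perl objects wrapping the same address,
which is what two getOwnerDocument calls on one node ought to produce.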

Let me dig deeper and get back to you...
</discussion>

So I need to:
* solve the inheritance problem
* see if these different pointers create a problem

Thanks again Brian,
jas.
