Posted to p-dev@xerces.apache.org by "Jason E. Stewart" <ja...@openinformatics.com> on 2001/03/29 20:07:18 UTC
Re: About XML::Xerces, bugs, observations and questions.
Hey Brian,
You've uncovered some of the penalties that one pays when using SWIG
to wrap a big library. If you just want to write straight-forward
programs, SWIG is great. If you want to do things that SWIG's authors
didn't anticipate, you might be in trouble....
read on
"Brian Thomas" <th...@adc.gsfc.nasa.gov> writes:
> 1) A minor patch for DOMParse (attached). Perl complains when you use
> deprecated '\1' and \2 instead of '$1' and '$2'. Perhaps there is a platform
> issue here for why you need '\1' ? If so please ignore the
> patch!!
I just read the perlre man page and got clear on $<digit> versus
\<digit>. I misunderstood the semantics: \<digit> should only be
used inside the pattern itself, and $<digit> is used everywhere else,
including the replacement side of s///.
There aren't any platform issues, so I'll change it. Thanks!!
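For the archives, a minimal sketch of the distinction. The strings and
variable names are made up for illustration and are not taken from
DOMParse:

```perl
use strict;
use warnings;

# \1 inside the pattern: match a word that is immediately repeated.
my $text = "the the quick brown fox";
(my $deduped = $text) =~ s/\b(\w+) \1\b/$1/;

# $1 and $2 in the replacement side of s///: swap two fields.
(my $swapped = "key=value") =~ s/(\w+)=(\w+)/$2=$1/;

# $1 outside the regexp, after a successful match.
my $first = "DOMParse" =~ /^(DOM)/ ? $1 : undef;

print "$deduped\n$swapped\n$first\n";
```

Writing \1 or \2 in the replacement side is what draws the complaint
when warnings are on.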
> 2) The GeneXML classes proved to be instructive, for I was having a
> hard time sub-classing XML::Xerces classes !! Under the paradigm
> you use, (basically store the superclass under an attribute, and
> use AUTOLOAD) this works OK for most of my needs.
I can't remember what the Gang of Four calls the pattern, delegation
or something like that. But avoiding inheritance can be very
useful. The idea being: "is the relationship you're modeling an 'is-a'
relationship or a 'has-a' relationship". I find delegation solves a
lot of problems.
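A rough sketch of the store-it-in-an-attribute-and-AUTOLOAD approach,
assuming a toy inner class. Inner::Element and My::Element are
hypothetical names, not the GeneXML or XML::Xerces code:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a wrapped library class.
package Inner::Element;
sub new      { my ($class, $name) = @_; return bless { name => $name }, $class }
sub get_name { return $_[0]{name} }

# Has-a wrapper: keeps the inner object in an attribute and forwards
# any method it does not define itself.
package My::Element;
our $AUTOLOAD;

sub new {
    my ($class, $name) = @_;
    return bless { delegate => Inner::Element->new($name) }, $class;
}

sub describe { return "element <" . $_[0]->get_name . ">" }

sub AUTOLOAD {
    my $self = shift;
    (my $method = $AUTOLOAD) =~ s/.*:://;
    return if $method eq 'DESTROY';          # never forward destruction
    my $code = $self->{delegate}->can($method)
        or die "No method '$method' on delegate\n";
    return $self->{delegate}->$code(@_);
}

package main;

my $el = My::Element->new('gene');
print $el->get_name, "\n";                   # forwarded through AUTOLOAD
print $el->describe, "\n";                   # defined on the wrapper
print $el->isa('Inner::Element') ? "is-a\n" : "has-a only\n";
```

The last line is the catch: the wrapper forwards calls, but it is not
an instance of the inner class, so anything that checks the type of
the object will still reject it.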
this however is not one of them:
> However, I find that I need to be able to sub-class the
> DOM::Element AND be able to insert it in the parent document
> (which is holding an XML::Xerces::DOM_Document in a private
> field). The parent document attempts to insert the sub-classed
> element object, and then gets a failure statement. Something to
> the effect of 'This is not a Xerces::DOM_Element you are trying
> to insert here bud, forget it'.
What you have found is a severe limitation of the current
implementation of SWIG for perl classes. Dave Beazley (SWIG's author)
has a lot of nice type checking to keep SWIG'ed code from dumping core
when given the wrong kind of pointer, but he's done it in a way that
asks the equivalent of:
ref($object) eq 'XML::Xerces::Node'
when what we really want is:
$object->isa('XML::Xerces::Node')
isa() checks the inheritance tree, and the string equals operator
can't...
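A small pure-perl sketch of the difference between the two checks.
Base::Node and My::Node are hypothetical stand-ins; nothing here
touches SWIG or XML::Xerces:

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

package Base::Node;                 # stands in for the wrapped class
sub new { return bless {}, shift }

package My::Node;                   # a user subclass
our @ISA = ('Base::Node');

package main;

# Exact class-name comparison: rejects every subclass.
sub strict_check { return ref($_[0]) eq 'Base::Node' ? 1 : 0 }

# Inheritance-aware check: accepts the class and anything derived from it.
sub isa_check { return blessed($_[0]) && $_[0]->isa('Base::Node') ? 1 : 0 }

my $plain = Base::Node->new;
my $sub   = My::Node->new;

printf "%-10s strict=%d isa=%d\n", ref($_), strict_check($_), isa_check($_)
    for $plain, $sub;
```

Only the isa() version lets the subclassed object through.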
The problem is that at one point in time SWIG had a very simple
implementation of this type checking, and so this would have been a
one-line fix. But now (with SWIG 1.3) there is a very complicated
implementation, and although I can imagine how I might fix it, it will
take me some time (a week most likely since I'm at a conference in
Palo Alto until Sun evening). Let me look into it, and maybe I can
send you an inheritance friendly Xerces.C file tomorrow.
> So question: is it really needed to have such strict checking on the use of
> classes? Can I set SWIG to not 'hardwire' the classes it is expecting from a
> method call? This could be a 'show-stopper' for me using Xerces.
Sorry, it's necessary, but SWIG did it wrong. The issue is that under
the hood the DOM_Node objects are really C++ objects. Perl is able to
reference those C++ objects by keeping around pointers to them. So if
you really wanted to mess perl up, you could give it a pointer to
anything you wanted and perl would happily cast it to a DOM_Node and
you might get a core dump. SWIG tries to 'help you out' by type
checking it for you.
This is just a case of what is all too easy to do: forget that someone
may want to subclass the object. One of the other modules I maintain,
Class::ObjectTemplate, made the same mistake (except it was written
entirely in perl).
> 3) I found that cloneNode breaks if you use Xerces-c v1.4 and the 1.3.3 XML-Xerces
> package. I've included a script which illustrates the core dump. I now use c++ 1.3.0
> package :)
Thanks! I'll look into that one.
Harmon, Frederick Paul, either of you got time to check out the problem?
> 4) Question: when I make a call to find the ownerdocument, I get a
> different document each time!!! (same script from #3 will
> illustrate this). Whats going on here, isnt this a bug??
<discussion duration="lengthy"
context="Perl and SWIG internals"
geek_factor="high"
warning="proceed at your own risk">
Hmmmm.... maybe yes, maybe no. After digging for a bit, you may be
correct.
However, just printing out the perl memory location won't tell you.
Under the hood SWIG calls: c++_object->getDocument() and wraps
the return value as a *new* perl object. Each of those perl objects
*should* still reference the same underlying C++ object. If they
don't, then that's up to Xerces-C.
<background>
First, the C glue code in Xerces.C is auto-generated by SWIG to
communicate between the Xerces.pm *perl* code and the libxerces-c C++
code. The Xerces.C methods return scalar references to the underlying
C++ objects. Then because SWIG wants to maintain extra information
about those objects, it uses 'tie' to store the scalar reference in a
hash, and then blesses that hash into the appropriate class. This is
basically what I do with the GeneXML class, but I don't bother with
the 'tie' step.
</background>
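A pure-perl mock of that layout, as I understand it. Demo::Document is
a hypothetical class and the "pointer" is just an integer; the real
glue code is generated by SWIG and will differ in detail:

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

package Demo::Document;   # hypothetical; stands in for a SWIG-generated class

# Minimal tied-hash interface. The tied object is the blessed scalar
# reference itself; the hash never has any keys of its own.
sub TIEHASH  { my ($class, $inner) = @_; return $inner }
sub FETCH    { return undef }
sub FIRSTKEY { return undef }
sub NEXTKEY  { return undef }

sub wrap {
    my ($class, $address) = @_;
    my $ptr   = $address;
    my $inner = bless \$ptr, $class;   # scalar ref holding the "pointer"
    tie my %outer, $class, $inner;     # stored behind a tied hash
    return bless \%outer, $class;      # the hash ref the caller sees
}

package main;

my $doc   = Demo::Document->wrap(277381056);
my $inner = tied %$doc;

print ref($doc), " is a ", reftype($doc), " ref\n";
print "tied object is a ", reftype($inner), " ref holding $$inner\n";
```

The outer object is a HASH ref with no visible keys, and 'tied' hands
back the SCALAR ref that carries the pointer value.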
To really find out what's going on you need to use Devel::Peek in the
debugger, and poke around a bit:
DB<3> $doc = Bio::Genex::GeneXML->new()
DB<9> $d1 = $doc->getOwnerDocument
DB<10> x $d1
0 XML::Xerces::DOM_Document=HASH(0x10592640)
empty hash
DB<11> $d2 = $doc->getOwnerDocument
DB<12> x $d2
0 XML::Xerces::DOM_Document=HASH(0x104e3fdc)
empty hash
so $d1 and $d2 are definitely *different* perl objects, but the real
question is what pointers are they storing underneath? Notice that
they are both HASH refs? Let's use 'tied' and see what they're really
storing:
DB<17> x tied %{$d2}
0 XML::Xerces::DOM_Document=SCALAR(0x104e415c)
-> 277377024
DB<18> x tied %{$d1}
0 XML::Xerces::DOM_Document=SCALAR(0x104e400c)
-> 277381056
Notice that we've still got a couple of XML::Xerces::DOM_Document
objects, but now they're SCALAR refs. Those are the objects that
Xerces.C gave to us. Let's use Devel::Peek::Dump and look under the
hood:
DB<19> Devel::Peek::Dump tied %{$d1}
SV = PVMG(0x1088dac0) at 0x104e4648
REFCNT = 1
FLAGS = (ROK,OVERLOAD)
IV = 0
NV = 0
RV = 0x104e400c
SV = PVMG(0x108852b0) at 0x104e400c
REFCNT = 2
FLAGS = (OBJECT,IOK,pIOK)
IV = 277381056
NV = 0
PV = 0
STASH = 0x1056f350 "XML::Xerces::DOM_Document"
DB<20> Devel::Peek::Dump tied %{$d2}
SV = PVMG(0x1088db40) at 0x104e4528
REFCNT = 1
FLAGS = (ROK,OVERLOAD)
IV = 0
NV = 0
RV = 0x104e415c
SV = PVMG(0x10885470) at 0x104e415c
REFCNT = 2
FLAGS = (OBJECT,IOK,pIOK)
IV = 277377024
NV = 0
PV = 0
STASH = 0x1056f350 "XML::Xerces::DOM_Document"
Well, hmm... Notice that we've got a PVMG in each case, a MAGIC perl
object. In each case it's a reference to another scalar that has the
OBJECT and IOK flags set, indicating that it's wrapping a C/C++
pointer, and the pointer values are in the IV field:
IV = 277381056 ($d1)
IV = 277377024 ($d2)
Looks like different pointers to me... That means that libxerces-c
gave us different pointers when we called getOwnerDocument twice on
the same node. libxerces-c doesn't seem to be doing anything fancier
than returning a pointer to the DOM_Document, so calling it multiple
times should give the same pointer.
Let me dig deeper and get back to you...
</discussion>
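If you want to make the same comparison from a script instead of the
debugger, something like this sketch should do it. Demo::Wrapper and
underlying() are hypothetical names, and the mock only imitates the
tied layout seen in the dumps; with real XML::Xerces objects only the
underlying() helper would be needed:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

package Demo::Wrapper;    # hypothetical stand-in for a SWIG shadow class
sub TIEHASH  { return $_[1] }
sub FIRSTKEY { return undef }
sub NEXTKEY  { return undef }
sub wrap {
    my ($class, $address) = @_;
    my $inner = bless \$address, $class;
    tie my %outer, $class, $inner;
    return bless \%outer, $class;
}

package main;

# Return the integer stored beneath a wrapper (the pointer value).
sub underlying { my ($obj) = @_; my $inner = tied %$obj; return $$inner }

my $d1 = Demo::Wrapper->wrap(277381056);   # values from the session above
my $d2 = Demo::Wrapper->wrap(277377024);
my $d3 = Demo::Wrapper->wrap(277381056);   # a second wrapper, same address

print "d1 vs d2: ", (underlying($d1) == underlying($d2) ? "same" : "different"), "\n";
print "d1 vs d3: ", (underlying($d1) == underlying($d3) ? "same" : "different"), "\n";
print "d1 and d3 are distinct perl objects\n" if refaddr($d1) != refaddr($d3);
```

Two wrappers can be different perl objects and still agree on the
pointer, which is the case I expected to see for getOwnerDocument.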
So I need to:
* solve the inheritance problem
* see if these different pointers create a problem
Thanks again Brian,
jas.