You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucy.apache.org by Nick Wellnhofer <we...@aevum.de> on 2012/04/19 18:12:05 UTC

[lucy-dev] OFFSET globals

Attached is a patch that replaces the OFFSET globals with NUM #defines 
which contain the method number. This more than halves the number of 
extern symbols in Lucy.so. The only cost is a constant add for every 
method invocation which should be negligible.

This also removes the use of offsetof in parcel.c, so we don't need the 
definition of the VTable struct in that file.

Thoughts?

Nick

Re: [lucy-dev] OFFSET globals

Posted by Nick Wellnhofer <we...@aevum.de>.

On 24/04/2012 03:36, Marvin Humphrey wrote:
> So... that means that before dlopen returns, any OFFSET vars in the DSO will
> have been resolved to local copies (which will always be present under your
> scheme).  These will hold potentially incorrect values until our bootstrap
> routines run, but I don't see a scenario where a *different* thread starts
> using the DSO before then -- and of course the thread doing the loading will
> execute linearly and finish bootstrapping before use.

I did some testing on Linux, and it seems that if two DSOs define the 
same symbols, the DSO loaded second will always see the symbol of the 
first DSO and never its own ones. So the whole thing shouldn't be a problem.

Nick

Re: [lucy-dev] OFFSET globals

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Mon, Apr 23, 2012 at 8:50 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> There's only a small problem I can see on platforms where we can't hide
> symbols: Between the loading of the DLL and the initialization of the parcel
> there will be a window during which the offsets will become uninitialized.
> This would cause problems with threads.

I'm not sure that we need to worry about that, because it seems that on common
POSIX systems, *variable* symbols are always resolved by the dynamic loader
before the DSO can be used. (On Windows, symbols don't have to be exported
so there's no issue.) Only function symbols are resolved lazily:

    http://www.akkadia.org/drepper/dsohowto.pdf  [section 1.5.2]

    Due to the way code is generated on some architectures it is possible to
    delay the processing of some relocations until the references in question
    are actually used. This is on many architectures the case for calls to
    functions. All other kinds of relocations always have to be processed
    before the object can be used.

    http://linux.die.net/man/3/dlopen

    RTLD_LAZY

    Perform lazy binding. Only resolve symbols as the code that references
    them is executed. If the symbol is never referenced, then it is never
    resolved. (Lazy binding is only performed for function references;
    references to variables are always immediately bound when the library is
    loaded.)

    http://www.freebsd.org/cgi/man.cgi?query=dlopen&sektion=3

    RTLD_LAZY

    Each external function reference is resolved when the function is first
    called.

    http://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man3/dlopen.3.html

    RTLD_LAZY

    Each external function reference is bound the first time the function is
    called.

    http://www.unix.com/man-page/OpenSolaris/3c/dlopen/

    RTLD_LAZY

    Only immediate symbol references are relocated when the object is first
    loaded. Lazy references are not relocated until a given function is called
    for the first time. This value for mode should improve performance, since
    a process might not require all lazy references in any given object.  This
    behavior mimics the normal loading of dependencies during process
    initialization. See NOTES.

These implementations all behave this way despite the fact that POSIX
describes the timing as "implementation-defined":

    http://pubs.opengroup.org/onlinepubs/009695399/functions/dlopen.html

    RTLD_LAZY

    Relocations shall be performed at an implementation-defined time, ranging
    from the time of the dlopen() call until the first reference to a given
    symbol occurs. Specifying RTLD_LAZY should improve performance on
    implementations supporting dynamic symbol binding as a process may not
    reference all of the functions in any given object. And, for systems
    supporting dynamic symbol resolution for normal process execution, this
    behavior mimics the normal handling of process execution.

So... that means that before dlopen returns, any OFFSET vars in the DSO will
have been resolved to local copies (which will always be present under your
scheme).  These will hold potentially incorrect values until our bootstrap
routines run, but I don't see a scenario where a *different* thread starts
using the DSO before then -- and of course the thread doing the loading will
execute linearly and finish bootstrapping before use.

Now, I believe we would have a problem if we were exposing *function* symbols,
as a different thread could attempt to lazy load a function symbol in the
window between the return of dlopen and bootstrap completion.  But since our
method invocations are static inline routines, they are not exported as DSO
symbols -- only the OFFSET vars that those static inline routines reference
are at issue.

For totally different reasons, though, it's worth revisiting our
initialization policy for OFFSET vars.  For any DSO other than the Clownfish
core, we should avoid assigning to them in order to save space, as all OFFSET
vars in every extension will have to be bootstrapped anyway.

Marvin Humphrey

Re: [lucy-dev] OFFSET globals

Posted by Nick Wellnhofer <we...@aevum.de>.

On 23/04/2012 16:55, Marvin Humphrey wrote:
> On Mon, Apr 23, 2012 at 4:09 AM, Nick Wellnhofer<we...@aevum.de>  wrote:
>> On 23/04/2012 06:15, Marvin Humphrey wrote:
>
>>>    * Generating headers suddenly get more complicated, because the content
>>>      derived from each .cfh file will be different for each target parcel!
>>>      Plus, things like METHOD and SUPER_METHOD will require parcel-specific
>>>      implementations...
>>
>> This could be handled with defines:
>>
>> #define Lucy_Hash_Fetch_OFFSET ext_Hash_fetch_OFFSET
>
> Hmm, wait a minute... I think that even on systems where we can't stop symbol
> export, ELF symbol resolution rules may work in our favor and render this
> complexity unnecessary.
>
> My understanding is that in the event that two ELF DSOs export the same
> symbol, the general rule is that the last library loaded wins[1]:
>
>      http://www.akkadia.org/drepper/dsohowto.pdf
>
>      Note that there can be many references to the same symbol in different
>      objects. The result of the lookup can be different for each of the objects
>      so there can be no short cuts except for caching results for a symbol in
>      each object in case more than one relocation references the same symbol.
>      The lookup scope mentioned in the steps below is an ordered list of a
>      subset of the loaded objects which can be different for each object
>      itself.
>
> (I imagine symbol resolution of Mach-o DSOs on OS X follows similar rules.)
>
> But if Clownfish.so and Lucy.so both export Cfish_Hash_Fetch_OFFSET, it
> doesn't matter which copy of the var gets used so long as all copies get
> initialized to the proper value during bootstrapping (and no method calls are
> made during the boostrapping phase). The values of these OFFSET vars are
> constant once bootstrapping completes.
>
> The bootstrapping will work because there will only be a single canonical
> copy of each VTable, and it will know the offsets.  Cyclic dependency graphs
> are a problem, but that was already true regardless of visibility.
>
> So I don't think symbol visibility matters unless there are systems out there
> where we can't stop symbol export AND the dynamic loader crashes if it finds
> multiple copies of the same symbol.  I'm not an expert on this subject (yet!),
> but probably such a system does not exist.

Oh, great idea. This would make the implementation much cleaner. I think 
it crossed my mind, but I dismissed it for some reason. There's only a 
small problem I can see on platforms where we can't hide symbols: 
Between the loading of the DLL and the initialization of the parcel 
there will be a window during which the offsets will become 
uninitialized. This would cause problems with threads.

Nick

Re: [lucy-dev] OFFSET globals

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Sat, Apr 28, 2012 at 11:28 AM, Nathan Kurz <na...@verse.com> wrote:
> On Thu, Apr 26, 2012 at 6:23 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
>> If that were _PERL, we would eliminate Host_callback() indirection layer and
>> flesh out the function body with raw XS:
>
> +1.  I like that it removes a layer of indirection, and makes it
> clearer what the function does.

Sounds good -- I'll open an issue.

> And I might be overplanning, but I think there are cases where it might be
> nice to try to bridge between to host languages, Host language specific
> naming would allow for this.

There are other obstacles:

  * What should Obj#To_Host() return when there are multiple hosts?
  * What about classes that use host language data structures directly, like
    Lucy::Document::Doc?
  * Where do we put the cached host object that makes it possible to support
    inside-out member variables under Perl?

The fundamental challenge is that the more tightly we integrate with the host
language, the harder it becomes to bridge between multiple hosts.

> But what I'd like to move towards is a policy of "if you can grab hold of
> it, you can subclass it", with the final or non-public classes being static
> or otherwise opaque in the object file.  This doesn't need to be absolute,
> and convention will still play a role, but reduced visibility equals reduced
> temptation.

I'm with you on reduced visibility for non-public classes -- that's just
best-practice information hiding.

Final classes are different, though.  They enable certain compiler
optimizations, e.g. inlining, because final methods can often be resolved to
specific functions, avoiding vtable dispatch (which is why virtual methods
are not the default in C++).

> But I do want to make sure that Core->Compiled->Script hierarchies
> works identically to Core->Script.

+1 -- That's how it works now (parcel prefix bugs notwithstanding) -- so for
instance we can subclass Nick's test extension
LucyX::Analysis::WhitespaceTokenizer as "MyWhitespaceTokenizer" and it will
behave as expected.

> And certainly we want Core->Script->Script to work within the same host
> language.

+1 -- Also 100% supported.

> Do we care about Core->ScriptA->ScriptB:  probably not, although I maybe if
> ScriptA was something embedded like Lua.  But I see no harm in aiming
> for this capability as a polygot point of pride.

I feel the attraction of this feature too, but to be honest, I think it's
deceptively difficult to implement.  Object life cycle issues such as
destructors are a real headache already without having to coordinate with
*multiple* external garbage collection systems.  And who wants to write the
tests to demonstrate that every last feature works under
Core->ScriptA->ScriptB?

In my opinion, we need to leave this item off of the requirements list instead
specify that Clownfish need only target one host language at a time.

> [...] Currently, it looks like it's possible for a Clownfish object to
> create a per-instance copy of the registered per-class VTable using
> VTable_singleton.
>
> Within Clownfish this copying is as part of the process of creating a
> subclass, but I don't see any technical reasons that an object couldn't
> create this copy, modify it, use it, and never register it.  Multiple
> objects could even share this "private" and unregistered VTable, or others
> could copy and modify it creating an ad-hoc prototype system independent of
> the official class hierarchy.
>
> I'm not advocating for this, but wondered if this was a valuable capability
> that should be preserved or just a spandrel.

It's a spandrel[1].  In other words, it's an accident of the implementation
that per-instance VTables look doable.

I'd oppose supporting per-instance methods as a Clownfish feature.  They have
their place in Ruby (and JavaScript, etc), but they would be difficult to
support properly in other languages -- notably Perl.

One of the defining characteristics of the Clownfish undertaking is that we
do NOT need to innovate in the OO space.  We only need to support a vanilla
classical inheritance model -- and the ambitious part of our task is to
integrate that model into diverse environments and to provide idiomatic APIs
which make using Clownfish-based software from the host language feel as
natural as possible.

The only OO feature I think Clownfish needs to add is some sort of weak
multiple inheritance along the lines of mixins or interface inheritance.  Raw
single inheritance was good enough to build Lucy, but it's going to be hard to
live with such a limitation once Clownfish gains a public API.

> What I will be advocating for is a window of malleability before the
> classes are registered.  I'm with you regarding immutability once a
> class is "visible", but I'd like the host language to have a shot at
> things before the core classes are declared visible.
> I also want to make sure we preserve the current unadvertised ability to
> override the core methods with compiled "overlay" extensions using
> LD_PRELOAD.  But this is a secret I'm waiting to spring on you at some later
> point when have a better feel for the whole situation.

Hmm... as you probably anticipated, I have certain reservations about
officially sanctioning a technique which violates encapsulation.  :)

But one quirk of this proposal is that it can only be achieved at the level of
the application, since libraries don't get to control LD_PRELOAD.  That
doesn't make it any less risky to reach in and monkey patch a core class, but
it *does* mean that the monkey who monkey patches is going to be the *same*
monkey who gets bitten to death by the spooky action-at-a-distance bugs their
tinkering unleashed.

I'm still not enthusiastic because I don't think it's in our interest to
forswear the compiler optimizations that hiding those function symbols
enables.

>> Even if you don't override any methods, the VTable stores other metadata,
>> like the class name.  (Which is why "VTable" should be renamed to
>>"MetaClass" -- it's not just an array of function pointers any more.)
>
> +0.  I see your logic, but I'm not a fan of "MetaClass" as a name.
> VTable might be technically inaccurate, but it sets up the right
> expectations.

I've come to believe that it's important that we ditch the name "VTable",
which is an accident of the historical implementation -- originally, it *was*
just an array of function pointers.  Very few of our users are going to have
any idea what on earth a "vtable" is; "MetaClass" is easier to grok.  And if
we change the implementation to use multiple vtables, "VTable" becomes even
less accurate.

The other obvious candidate is "Class".  Personally, I like that a little less
than "MetaClass" because it's an overloaded term and it means we have to spell
instances "klass" to avoid colliding with the C++ keyword, but it's apparently
good enough for both Ruby and Java.

>> Thanks for starting up a requirements list, as recommended in that
>> Joshua Bloch presentation.
>
> Now I'm worried that you read every link I send you in full, and that
> if I were to accidentally send the wrong long thing we could lose you
> for weeks! I'll try to make sure to aim for high quality at least. :)

Haha, I'd seen that Bloch presentation before -- it's really good!

It would be a great exercise to apply Bloch's critiques to the classes that
end up in the Clownfish core -- especially since a lot of that presentation
deals with Java platform APIs which bear a strong resemblance to Clownfish.

Marvin Humphrey

[1] For those of you like me who weren't familiar with the term "spandrel",
    here's Wikipedia: http://en.wikipedia.org/wiki/Spandrel_(biology)
    "In evolutionary biology, a Spandrel is a phenotypic characteristic that
    is a byproduct of the evolution of some other characteristic, rather than
    a direct product of adaptive selection."

Re: [lucy-dev] OFFSET globals

Posted by Nathan Kurz <na...@verse.com>.

On Thu, Apr 26, 2012 at 6:23 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
> If that were _PERL, we would eliminate Host_callback() indirection layer and
> flesh out the function body with raw XS:

+1.  I like that it removes a layer of indirection, and makes it
clearer what the function does.  And I might be overplanning, but I
think there are cases where it might be nice to try to bridge between
to host languages, Host language specific naming would allow for this.

>> 7) It must be possible to dynamically subclass a core class at runtime.
>
> +1
>
> You might say "non-final public class", since we support "final" classes.
> (InStream and OutStream are both final.)

Yes, more correct.  But what I'd like to move towards is a policy of
"if you can grab hold of it, you can subclass it", with the final or
non-public classes being static or otherwise opaque in the object
file. This doesn't need to be absolute, and convention will still play
a role, but reduced visibility equals reduced temptation.

>> 7) a. Dynamically created subclasses should be indistinguishable from
>> and interchangeable with core classes.
>
> I prefer your other formulation.  I don't think the inability to get at
> pure-host-language subclasses from C poses a major impediment.

Perhaps we need to distinguish between compiled extensions and
scripted extensions.  Classes in compiled extensions are
interchangeable with core classes; classes created by a scripting
language might not need to be.  Simplicity has advantages, though, and
it would be nice if once they are registered that one can't tell the
difference.  Would be even nicer if the pathways to registration were
the same for both.

Like you, I'm not worried about Core->Script->Compiled extensions.
But I do want to make sure that Core->Compiled->Script hierarchies
works identically to Core->Script.  And certainly we want
Core->Script->Script to work within the same host language.  Do we
care about Core->ScriptA->ScriptB:  probably not, although I maybe if
ScriptA was something embedded like Lua.  But I see no harm in aiming
for this capability as a polygot point of pride.

>> Before thinking about approaches, there is at least one more potential
>> constraint questions that concerns me:  How important is the ability
>> to for objects to be able to have a "private" VTable?
>
> I interpret your phrase '"private" VTable' as meaning "a dedicated VTable
> singleton for each subclass, distinct from the VTable singleton belonging to
> the parent class" -- in which case the answer is that per-class VTable
> singletons are mandatory.

I guess I was fumbling with terminology again.  In a class-based OO
system, every object is in a class, and every instance of that class
has the same methods.  In an instance-based system, it's possible for
one instance to change its behaviour without formally creating a new
class.   Currently, it looks like it's possible for a Clownfish object
to create a per-instance copy of the registered per-class VTable using
VTable_singleton.

Within Clownfish this copying is as part of the process of creating a
subclass, but I don't see any technical reasons that an object
couldn't create this copy, modify it, use it, and never register it.
Multiple objects could even share this "private" and unregistered
VTable, or others could copy and modify it creating an ad-hoc
prototype system independent of the official class hierarchy.

I'm not advocating for this, but wondered if this was a valuable
capability that should be preserved or just a spandrel.

What I will be advocating for is a window of malleability before the
classes are registered.  I'm with you regarding immutability once a
class is "visible", but I'd like the host language to have a shot at
things before the core classes are declared visible.   I also want to
make sure we preserve the current unadvertised ability to override the
core methods with compiled "overlay" extensions using LD_PRELOAD.
But this is a secret I'm waiting to spring on you at some later point
when have a better feel for the whole situation.

>  Even if you don't override any methods, the VTable
> stores other metadata, like the class name.
> (Which is why "VTable" should be
> renamed to "MetaClass" -- it's not just an array of function pointers any
> more.)

+0.  I see your logic, but I'm not a fan of "MetaClass" as a name.
VTable might be technically inaccurate, but it sets up the right
expectations.

> Thanks for starting up a requirements list, as recommended in that
> Joshua Bloch presentation.

Now I'm worried that you read every link I send you in full, and that
if I were to accidentally send the wrong long thing we could lose you
for weeks! I'll try to make sure to aim for high quality at least. :)

--nate

Re: [lucy-dev] OFFSET globals

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Thu, Apr 26, 2012 at 1:10 PM, Nathan Kurz <na...@verse.com> wrote:
> I was confused by the _OVERRIDE functions
> that are (were?) used to provide Perl callbacks.  I thought at first
> that the existence of this symbol was a signaling mechanism, but now
> realize that it's just the name of the wrapper function.

Yes.  I can see now how _OVERRIDE might have been misleading.

> I wonder if naming _WRAPPER, or _CALLBACK, or something host language
> specific like _PERL might be clearer.

It's an implementation detail, so no big deal what we go with.  _CALLBACK was
being used for something else until very recently, but has become available
once again.  _WRAPPER is fine.  I might suggest _HOSTCALL.

_PERL would be inappropriate right now because these autogenerated functions
are part of the host-agnostic core.  Consider this function, which makes it
possible to override Query#set_boost via the host:

    void
    lucy_Query_set_boost_OVERRIDE(lucy_Query* self, float boost) {
        cfish_Host_callback(self, "set_boost", 1,
                            CFISH_ARG_F64("boost", boost));
    }

If that were _PERL, we would eliminate Host_callback() indirection layer and
flesh out the function body with raw XS:

    void
    lucy_Query_set_boost_PERL(lucy_Query* self, float boost) {
        dSP;
        EXTEND(SP, 2);
        ENTER;
        SAVETMPS;
        PUSHMARK(SP);
        PUSHs(sv_2mortal((SV*)Lucy_Query_To_Host(self)));
        PUSHs(sv_2mortal(newSVnv(boost)));
        PUTBACK;
        call_method("set_boost", G_VOID | G_DISCARD);
        FREETMPS;
        LEAVE;
    }

Hmm... that might not be a bad idea.  The CFC sprintfs to generate XS would be
hilariously ugly, but there's nothing stopping us.

> In a previous message, Marvin writes:
>> Supporting CPAN/Rubygems/PyPI-style development for compiled extensions is of
>> paramount importance, IMO.
>
> I think I'm now understanding the implications of this, which means I
> forgot something from the list of requirements:
>
> 7) It must be possible to dynamically subclass a core class at runtime.

+1

You might say "non-final public class", since we support "final" classes.
(InStream and OutStream are both final.)

> Currently, this is distinct from loading a C extension compiled as a
> shared object, in that when we subclass from a scripting language the
> _OFFSET globals are not created.  With our current mechanism, I think
> this means that the dynamically created subclasses (and possibly the
> dynamically loaded subclasses) are not quite whole:  one can't (I
> think) dynamically subclass these subclasses.

It is true that subclasses written in pure Perl do not have C APIs.  It's
impossible to subclass something like LucyX::Remote::ClusterSearcher from
host-agnostic C because you can't get at its Perl-space-only constructor.  You
would need to write callback wrappers -- at which point it's no longer a
pure-host-language subclass.

> I think these should be harmonized, and that there should also be:
> 7) a. Dynamically created subclasses should be indistinguishable from
> and interchangeable with core classes.

I prefer your other formulation.  I don't think the inability to get at
pure-host-language subclasses from C poses a major impediment.

> Before thinking about approaches, there is at least one more potential
> constraint questions that concerns me:  How important is the ability
> to for objects to be able to have a "private" VTable?

I interpret your phrase '"private" VTable' as meaning "a dedicated VTable
singleton for each subclass, distinct from the VTable singleton belonging to
the parent class" -- in which case the answer is that per-class VTable
singletons are mandatory.  Even if you don't override any methods, the VTable
stores other metadata, like the class name. (Which is why "VTable" should be
renamed to "MetaClass" -- it's not just an array of function pointers any
more.)

...

Thanks for starting up a requirements list, as recommended in that
Joshua Bloch presentation.

Marvin Humphrey

Re: [lucy-dev] OFFSET globals

Posted by Nathan Kurz <na...@verse.com>.

On Wed, Apr 25, 2012 at 4:03 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
> The output of CFC is definitely easier to understand than the code like
> S_bequeath_methods() that does the work. :)  Let's look at some generated
> code.
>
> [snip]

All wonderfully helpful and clear.  Thanks!

> I wouldn't say that TermQuery "signals" to anyone that it has overridden a
> method.

I think I understand now.   I was confused by the _OVERRIDE functions
that are (were?) used to provide Perl callbacks.  I thought at first
that the existence of this symbol was a signaling mechanism, but now
realize that it's just the name of the wrapper function.  I wonder if
naming _WRAPPER, or _CALLBACK, or something host language specific
like _PERL might be clearer.  Or could just be me.

> The problem that Nick has been working on is how to cut down on the number of
> OFFSET vars.  In order to guarantee that certain seemingly innocuous
> refactoring actions won't break the ABI, the current release of Lucy generates
> a huge number of OFFSET vars -- and they're all exposed as globals in the
> DSO.  Under the new design, we will have far fewer OFFSET vars, and when the
> build environment supports it, they will not be visible as global symbols in
> the DSO.

I think I'm following along now, although I still have some fuzzy
areas.   Thanks for your patience in explaining.   As I read through
the archives, I see myself asking the same questions repeatedly over
many years, with you answering very kindly in each case.

In a previous message, Marvin writes:
> Supporting CPAN/Rubygems/PyPI-style development for compiled extensions is of
> paramount importance, IMO.

I think I'm now understanding the implications of this, which means I
forgot something from the list of requirements:

7) It must be possible to dynamically subclass a core class at runtime.

Currently, this is distinct from loading a C extension compiled as a
shared object, in that when we subclass from a scripting language the
_OFFSET globals are not created.  With our current mechanism, I think
this means that the dynamically created subclasses (and possibly the
dynamically loaded subclasses) are not quite whole:  one can't (I
think) dynamically subclass these subclasses.

I think these should be harmonized, and that there should also be:
7) a. Dynamically created subclasses should be indistinguishable from
and interchangeable with core classes.

As it see it (from my fuzzy vantage point) this would mean that either
we have to avoid reliance on DSO symbols like the _OFFSET variables,
or we have to create these symbols when the subclasses are made at
runtime (with something like libbfd, possibly by writing an ELF file
and and reloading it with dlopen()).  Nick's proposal of doing a hash
table lookup for offsets leads to one solution, and the approach I was
envisioning (intimately twining with the dynamic loader) would be the
other.

Given the difficulty of doing cross-platform in-process symbol table
manipulation, Nick's approach of a runtime initialization with some
sort of find_offset() function seems quite appealing.  But if we want
the extensions to be first class citizens, I think we need to go all
the way and remove reliance on global _OFFSET symbols altogether, and
have all the metadata available from the VTable registry.

(Wow, how's that for oblique.  But I think I'm on the right path here.)

Before thinking about approaches, there is at least one more potential
constraint questions that concerns me:  How important is the ability
to for objects to be able to have a "private" VTable?  Is this a
requirement, or just an implementation detail of the current host
language subclassing using VTable_singleton?

--nate

Re: [lucy-dev] OFFSET globals

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Wed, Apr 25, 2012 at 3:27 PM, Nathan Kurz <na...@verse.com> wrote:
> Yes, very helpful.  That's essentially what I determined, but I got
> lost around S_bequeath_methods.  It's difficult reading through the
> sprintf()'s.
>
> Going one layer deeper to the object code level, which symbols are we
> currently using? I think I found where the OFFSETS come in, but I
> haven't figured out how overrides are signalled.

The output of CFC is definitely easier to understand than the code like
S_bequeath_methods() that does the work. :)  Let's look at some generated
code.

Here are TermQuery's methods:

cfish_method_t LUCY_TERMQUERY_METHODS[] = {
    (cfish_method_t)lucy_Obj_make,
    (cfish_method_t)lucy_Obj_get_refcount,
    (cfish_method_t)lucy_Obj_inc_refcount,
    (cfish_method_t)lucy_Obj_dec_refcount,
    (cfish_method_t)lucy_Obj_to_host,
    (cfish_method_t)lucy_Obj_clone,
    (cfish_method_t)lucy_TermQuery_destroy,
    (cfish_method_t)lucy_TermQuery_equals,
    (cfish_method_t)lucy_Obj_compare_to,
    (cfish_method_t)lucy_Obj_hash_sum,
    (cfish_method_t)lucy_Obj_get_vtable,
    (cfish_method_t)lucy_Obj_get_class_name,
    (cfish_method_t)lucy_Obj_is_a,
    (cfish_method_t)lucy_TermQuery_to_string,
    (cfish_method_t)lucy_Obj_to_i64,
    (cfish_method_t)lucy_Obj_to_f64,
    (cfish_method_t)lucy_Obj_to_bool,
    (cfish_method_t)lucy_TermQuery_serialize,
    (cfish_method_t)lucy_TermQuery_deserialize,
    (cfish_method_t)lucy_TermQuery_dump,
    (cfish_method_t)lucy_TermQuery_load,
    (cfish_method_t)lucy_Obj_mimic,
    (cfish_method_t)lucy_TermQuery_make_compiler,
    (cfish_method_t)lucy_Query_set_boost,
    (cfish_method_t)lucy_Query_get_boost,
    (cfish_method_t)lucy_TermQuery_get_field,
    (cfish_method_t)lucy_TermQuery_get_term,
    NULL
};

So, TermQuery overrides the following methods with its own "fresh"
implementations:

    Destroy        <--- Obj
    Equals         <--- Obj
    To_String      <--- Obj
    Serialize      <--- Obj
    Deserialize    <--- Obj
    Make_Compiler  <--- Query

TermQuery inherits many implementations from Obj.  It also inherits two
implementations from Query:

    Set_Boost
    Get_Boost

And then, TermQuery defines two "novel" methods of its own:

    Get_Field
    Get_Term

I wouldn't say that TermQuery "signals" to anyone that it has overridden a
method.  It's more that when somebody invokes a method on a TermQuery -- let's
pick Equals() -- the caller goes to the slot indicated by
Lucy_TermQuery_Equals_OFFSET and uses whatever function pointer happens to be
there.  Here's the actual method invocation code:

static CHY_INLINE chy_bool_t
Lucy_TermQuery_Equals(const lucy_TermQuery *self, lucy_Obj* other) {
    char *const method_address
        = *(char**)self + Lucy_TermQuery_Equals_OFFSET;
    const Lucy_TermQuery_Equals_t method
        = *((Lucy_TermQuery_Equals_t*)method_address);
    return method((lucy_TermQuery*)self, other);
}

Note that *(char**)self is a way of getting at the first member variable of
the lucy_TermQuery struct, which is a VTable*.

The problem that Nick has been working on is how to cut down on the number of
OFFSET vars.  In order to guarantee that certain seemingly innocuous
refactoring actions won't break the ABI, the current release of Lucy generates
a huge number of OFFSET vars -- and they're all exposed as globals in the
DSO.  Under the new design, we will have far fewer OFFSET vars, and when the
build environment supports it, they will not be visible as global symbols in
the DSO.

Marvin Humphrey

Re: [lucy-dev] OFFSET globals

Posted by Nathan Kurz <na...@verse.com>.

On Wed, Apr 25, 2012 at 2:40 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
> The CFC source code is visually difficult to parse because C format strings
> and sprintf invocations suck readability-wise.  The generated code is easier
> to grok.

Yes, that does work much better.  I don't have a working development
machine right now, and was just browsing the online source.   I'll try
to rectify that when I have time.

> (It would be easier still if we had a smaller project to run CFC on.  Maybe a
> single-parcel sample project with 3 or 4 .cfh files under clownfish/sample
> would help.  But we still need to break out Lucy::Object* into Clownfish::*
> for that to work...)

Yes, this would be a great thing to do.  It seems likely that
Clownfish will experience some breakage when we use it a second time.
But if we can get to three:
http://lcsd05.cs.tamu.edu/slides/keynote.pdf ("if you write three").

>> I think I understand the finished layout, but is there an overview of how
>> the bootstrapping works?
>
> Here's how it works in the final product:
>
>    1. Lucy.pm gets loaded.
>    2. Lucy.pm invokes XSLoader::load() to load the DSO.
>    3. The lucy_Lucy_bootstrap() function (which is defined in
>       autogen/source/lucy_boot.c) gets invoked as part of the XS
>       loading process.
>    4. lucy_Lucy_bootstrap() runs lucy_bootstrap_parcel(), which is defined in
>       autogen/source/parcel.c.  [After lucy_bootstrap_parcel() returns, it
>       runs some hacks which preserve KinoSearch compatibility and then does
>       some Perl-specific initialization of @ISA variables.]
>    5. lucy_bootstrap_parcel() is where all the VTable bootstrapping stuff
>       happens -- so that's probably what you want to look at.  After the
>       VTable bootstrapping happens, the last thing lucy_bootstrap_parcel()
>       does is run lucy_init_parcel() (which is defined in core/Lucy.c).
>       lucy_init_parcel() runs 3 class-specific init functions:
>       lucy_Bool_init_class(), lucy_Hash_init_class(), and
>       lucy_Err_init_class().
>
> Hope that helps.

Yes, very helpful.  That's essentially what I determined, but I got
lost around S_bequeath_methods.  It's difficult reading through the
sprintf()'s.

Going one layer deeper to the object code level, which symbols are we
currently using? I think I found where the OFFSETS come in, but I
haven't figured out how overrides are signalled.

> It would be a lot of work to get to the point where we have something which
> matches or improves on markmail.org -- most of it front-end web work rather
> than Lucy-specific index design.  Between Apache's own archives (which I use
> in conjunction with the shortener at s.apache.org when I want to embed durable
> links in public emails which will be archived or issue-tracker comments) and
> markmail.org, it's hard to justify such an undertaking.

Very hard to justify having you spend your time on it, but if someone
else wanted to get _very_ familiar with Lucy and simultaneously give
something back to Apache it would be a great way to promote the
project.  :)

--nate

Re: [lucy-dev] OFFSET globals

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Wed, Apr 25, 2012 at 1:10 PM, Nathan Kurz <na...@verse.com> wrote:
> On Tue, Apr 24, 2012 at 4:32 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
>> The general solution is certainly for the upstream author to signal ABI
>> breakage via version number changes.  What I don't know how to do is automatic
>> detection of accidental or irresponsible ABI breakage -- we *have* to rely on
>> author-supplied version numbers.
>
> I'm catching up here, but I'm still feeling lost.  I've been reading
> the recent messages that I only skimmed, and browsing through the
> Clownfish source, but it feels like I'm still lacking some steps.

One technique which helps a little is to run the following commands from
within trunk/perl and then look at the files that CFC generates in ./autogen:

   perl Build.PL
   ./Build clownfish

The CFC source code is visually difficult to parse because C format strings
and sprintf invocations suck readability-wise.  The generated code is easier
to grok.

(It would be easier still if we had a smaller project to run CFC on.  Maybe a
single-parcel sample project with 3 or 4 .cfh files under clownfish/sample
would help.  But we still need to break out Lucy::Object* into Clownfish::*
for that to work...)

> I'm having trouble matching up the discussion here
> <http://mail-archives.apache.org/mod_mbox/lucy-dev/201204.mbox/%3CCAAS6=7gU_oUY1Mm3T-BJVg-cTYVbNQ6N7EkqDrUhzOv+bJ=90Q@mail.gmail.com%3E>
> to the C code.

Not all of the proposals in that email have been implemented yet, FWIW.  The
method access macros have been updated, and the capitalization conventions for
method typedefs have been changed.  However, the capitalization conventions
for methods have not changed, and method-implementation functions have not
been changed to static visibility.

> I think I understand the finished layout, but is there an overview of how
> the bootstrapping works?

Here's how it works in the final product:

    1. Lucy.pm gets loaded.
    2. Lucy.pm invokes XSLoader::load() to load the DSO.
    3. The lucy_Lucy_bootstrap() function (which is defined in
       autogen/source/lucy_boot.c) gets invoked as part of the XS
       loading process.
    4. lucy_Lucy_bootstrap() runs lucy_bootstrap_parcel(), which is defined in
       autogen/source/parcel.c.  [After lucy_bootstrap_parcel() returns, it
       runs some hacks which preserve KinoSearch compatibility and then does
       some Perl-specific initialization of @ISA variables.]
    5. lucy_bootstrap_parcel() is where all the VTable bootstrapping stuff
       happens -- so that's probably what you want to look at.  After the
       VTable bootstrapping happens, the last thing lucy_bootstrap_parcel()
       does is run lucy_init_parcel() (which is defined in core/Lucy.c).
       lucy_init_parcel() runs 3 class-specific init functions:
       lucy_Bool_init_class(), lucy_Hash_init_class(), and
       lucy_Err_init_class().

Hope that helps.

> It's sadly ironic, but it would be really helpful to have a better
> search tool for the Lucy source and mailing lists.   The Apache online
> tools for browsing the list archives and version control are clumsy,
> and Google has evolved so that I can no longer depend on it.  I
> realize it's as much my task as anyone else's, but using Lucy to
> improve the Apache tools might be a great way to showcase the project.

I tend to use http://lucy.markmail.org for searching the Lucy lists.  It's
powered by Lucene, and the interface is pretty nice.

It would be a lot of work to get to the point where we have something which
matches or improves on markmail.org -- most of it front-end web work rather
than Lucy-specific index design.  Between Apache's own archives (which I use
in conjunction with the shortener at s.apache.org when I want to embed durable
links in public emails which will be archived or issue-tracker comments) and
markmail.org, it's hard to justify such an undertaking.

Marvin Humphrey

Re: [lucy-dev] OFFSET globals

Posted by Nathan Kurz <na...@verse.com>.

On Tue, Apr 24, 2012 at 4:32 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
> The general solution is certainly for the upstream author to signal ABI
> breakage via version number changes.  What I don't know how to do is automatic
> detection of accidental or irresponsible ABI breakage -- we *have* to rely on
> author-supplied version numbers.

I'm catching up here, but I'm still feeling lost.  I've been reading
the recent messages that I only skimmed, and browsing through the
Clownfish source, but it feels like I'm still lacking some steps.
I'm having trouble matching up the discussion here
<http://mail-archives.apache.org/mod_mbox/lucy-dev/201204.mbox/%3CCAAS6=7gU_oUY1Mm3T-BJVg-cTYVbNQ6N7EkqDrUhzOv+bJ=90Q@mail.gmail.com%3E>
to the C code.  I think I understand the finished layout, but is there
an overview of how the bootstrapping works?

It's sadly ironic, but it would be really helpful to have a better
search tool for the Lucy source and mailing lists.   The Apache online
tools for browsing the list archives and version control are clumsy,
and Google has evolved so that I can no longer depend on it.   I
realize it's as much my task as anyone else's, but using Lucy to
improve the Apache tools might be a great way to showcase the project.

--nate

Re: [lucy-dev] OFFSET globals

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Tue, Apr 24, 2012 at 2:21 PM, Nathan Kurz <na...@verse.com> wrote:
> On Tue, Apr 24, 2012 at 4:32 AM, Marvin Humphrey <ma...@rectangular.com> wrote:

>>> 4) In all other cases (removal or signature change of an inherited
>>> method) we just want to detect the incompatibility at start time and
>>> die with an appropriate error message rather than randomly crashing or
>>> doing the wrong thing.
>>
>> +0.  Nice if possible, but from the perspective of an existing compiled
>> extension, it's hard to detect when upstream has broken an ABI.
>
> Difficult, but I think as necessary as the other requirements.  If we
> don't do this, we have no way of knowing whether the system is stable,
> or just hasn't yet encountered a broken pathway.  The main library
> needs increment something every time an internal ABI changes, and this
> needs to be checked by the extension.   This is one of the places I
> hope we can use DSO symbol versioning, so this check would happen
> automatically at runtime linkage.

The general solution is certainly for the upstream author to signal ABI
breakage via version number changes.  What I don't know how to do is automatic
detection of accidental or irresponsible ABI breakage -- we *have* to rely on
author-supplied version numbers.

>> 6) It should be possible to remove methods from upstream which are not marked
>> as "public.
>
> What would be the desired behaviour when a compiled extension
> references a removed method?

A compiled extension should not reference a non-public method.  When the
build environment supports it, we should avoid exporting such symbols.  If a
non-public symbol gets exported despite our efforts and an extension author
references it, a runtime error when symbol relocation or bootstrapping fails is
acceptable.

The main point here is that we should only propagage OFFSET vars into
extension DSOs when those OFFSET vars correspond to *public* methods.  That
way, our automatically generated bootstrapping routines will not break when a
non-public method goes away.

>> Then, after we finish dealing with methods, the next step is to give upstream
>> the ability to add, remove or rearrange member variables. :)
>
> Can this wait for the second round, or should this be solved now as well?

It is a separate issue.  I have some ideas, though.

> For reference, here's a followup from Ian Wienand.  I don't yet
> comprehend his suggestion regarding libbfd.
>
> Ian Wienand replies:
>> Nathan Kurz writes:
>>    Unfortunately, the externally compiled Boxer_vtable has a fixed layout,
>>    and it puts Boxer_drool in the slot where the core expects to find eat().
>>    When the core tries to call eat() on a Boxer object, chaos will ensue.
>
> Yeah; aren't you also relying on the compiler laying out the
> structures the same, independent of you adding a function?

In Lucy's C code, we certainly rely on struct equivalence, make heavy use of
type punning and violate strict aliasing rules -- but the problem Ian is
referring to here has never bitten us.  We are at least aware enough of the
vagaries of struct padding to have avoided getting into trouble so far. :)

In the case of the Boxer_vtable and Dog_vtable sample pseudo-code, I omitted
types for brevity, apparently at the expense of clarity. Those vtables
would have been arrays of function pointers rather than structs.

       cfish_method_t Dog_vtable[] = {
           (cfish_method_t)Dog_bark,
           (cfish_method_t)Dog_bite
       };

       cfish_method_t Boxer_vtable[] = {
           (cfish_method_t)Boxer_bark,
           (cfish_method_t)Dog_bite,
           (cfish_method_t)Boxer_drool
       };

       // After revising "Dog"...

       cfish_method_t Dog_vtable[] = {
           (cfish_method_t)Dog_bark,
           (cfish_method_t)Dog_bite,
           (cfish_method_t)Dog_eat
       };

I would consider it a surprising result if the compiler were to lay those
arrays out incompatibly.  :)

>> Our goal is to figure out a way to leverage the runtime linker to do
>> the lifting work so we don't have to do our own bootstrapping.  Likely
>> this will be by specifying the VTable lengths (offsets) as a variable
>> within the .so rather than as hardcoded.  But fully understand that
>> this *might* not be the best use of your time today.  Thanks for the
>> response!
>
> I don't think the dynamic linker is going to help you much with this
> ... it seems you really need to version "Dog" and have "Boxer" tell
> the core what version of "Dog" it implements.  Something you could
> consider is stamping this info into a separate section of the .so with
> something like
>
> #define DOG_VERSION(version)
>  const char __dog_version __attribute__((section(".version.info"))) =
> "dog_version =" #version
>
> in your "boxer" plugin, and then use libbfd to read and parse that
> section at load time, and hence know what members to call.

It was generous of Ian to code up that sample for us.

"libbfd" is an object file parser; we could use it to extract DSO symbol
versions (on some platforms) and then change the behavior of the bootstrapping
routine on the fly based on the versioning info extracted from the extension.

The real problem we have been working on here is not really related to our
ability to version symbols -- it is that it is tough for the upstream author
to keep track of when the ABI has actually been broken.  The upstream author
has to realize, "Oops, I moved the first declaration of Do_Stuff() up into
BasicObj from Obj, so now I need to increment the ABI version."  Such a system
is guaranteed to fail frequently due to "human error" because it asks too much
of humans.

Fortunately, the design that we've hashed out renders that unnecessary -- by
removing the mechanism that would cause the ABI to break in such subtle ways.
Now, we just rely on the upstream author to change up the version number when
they have done something obvious to break the ABI, such as changing a method
signature.

Marvin Humphrey

Re: [lucy-dev] OFFSET globals

Posted by Nathan Kurz <na...@verse.com>.

On Tue, Apr 24, 2012 at 4:32 AM, Marvin Humphrey <ma...@rectangular.com> wrote:
> On Mon, Apr 23, 2012 at 10:20 PM, Nathan Kurz <na...@verse.com> wrote:
>> 2) We don't want to allow "monkey-patching" of classes at runtime,
>> such that an extension can make changes to parent classes that
>> propagate back to the child classes.
>
> I think you can stop at "make changes to parent classes [period]".  As far as
> Clownfish is concerned, once a class has been registered, it is immutable.

OK, slightly more stringent but also true.  Children load after their
parents, and are allowed presume that their parents are immutable.
This follows from the requirement that registration happens from the
top down and that all classes are immutable once registered.

>> 4) In all other cases (removal or signature change of an inherited
>> method) we just want to detect the incompatibility at start time and
>> die with an appropriate error message rather than randomly crashing or
>> doing the wrong thing.
>
> +0.  Nice if possible, but from the perspective of an existing compiled
> extension, it's hard to detect when upstream has broken an ABI.

Difficult, but I think as necessary as the other requirements.  If we
don't do this, we have no way of knowing whether the system is stable,
or just hasn't yet encountered a broken pathway.  The main library
needs increment something every time an internal ABI changes, and this
needs to be checked by the extension.   This is one of the places I
hope we can use DSO symbol versioning, so this check would happen
automatically at runtime linkage.

>> Are these correct?  Are the requirements I'm missing?
>
> I would add this:
>
> 6) It should be possible to remove methods from upstream which are not marked
> as "public.

What would be the desired behaviour when a compiled extension
references a removed method?

> Then, after we finish dealing with methods, the next step is to give upstream
> the ability to add, remove or rearrange member variables. :)

Can this wait for the second round, or should this be solved now as well?

---

For reference, here's a followup from Ian Wienand.  I don't yet
comprehend his suggestion regarding libbfd.

Ian Wienand replies:
> Nathan Kurz writes:
>    Unfortunately, the externally compiled Boxer_vtable has a fixed layout,
>    and it puts Boxer_drool in the slot where the core expects to find eat().
>    When the core tries to call eat() on a Boxer object, chaos will ensue.

Yeah; aren't you also relying on the compiler laying out the
structures the same, independent of you adding a function?

> Our goal is to figure out a way to leverage the runtime linker to do
> the lifting work so we don't have to do our own bootstrapping.  Likely
> this will be by specifying the VTable lengths (offsets) as a variable
> within the .so rather than as hardcoded.  But fully understand that
> this *might* not be the best use of your time today.  Thanks for the
> response!

I don't think the dynamic linker is going to help you much with this
... it seems you really need to version "Dog" and have "Boxer" tell
the core what version of "Dog" it implements.  Something you could
consider is stamping this info into a separate section of the .so with
something like

#define DOG_VERSION(version)
 const char __dog_version __attribute__((section(".version.info"))) =
"dog_version =" #version

in your "boxer" plugin, and then use libbfd to read and parse that
section at load time, and hence know what members to call.

Re: [lucy-dev] OFFSET globals

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Mon, Apr 23, 2012 at 10:20 PM, Nathan Kurz <na...@verse.com> wrote:
> A fine reason to finally do this right, and having him excited to
> solve this is another.  But I haven't been sure whether it's Nick
> demanding an upgrade proof ABI, or whether you feel it should be one.
> I was wondering whether it would be better to have him continue
> writing extensions and recompiling as necessary until the ABI settles
> down, since the process of writing these extensions will probably
> cause faster change than will be the usual case.
>
> But as long as _both_ of you view this as a preliminary requirement,
> it's a great thing to get out of the way.

Supporting CPAN/Rubygems/PyPI-style development for compiled extensions is of
paramount importance, IMO.  Extension authors need to be able to rapid
prototype in their host language of choice, and then at the time they deem
appropriate, port their extension to a compiled format for efficiency.  It
cannot be the case that in order to attain maximum performance, features must
be accepted into core, distributed and compiled with core, and then supported
in perpetuity by the core development team.  That development model does not
scale.

Of course a feature might go into core because there is consensus that it
belongs there, but it can't be the case that "core is fast and extensions are
slow", creating an unnatural pressure to bloat core that is unhealthy for the
community.

> 1) We want to make it possible to add methods to core Lucy objects on
> a library upgrade without requiring a recompile of extension classes
> that inherit from these.

+1

> 2) We don't want to allow "monkey-patching" of classes at runtime,
> such that an extension can make changes to parent classes that
> propagate back to the child classes.

I think you can stop at "make changes to parent classes [period]".  As far as
Clownfish is concerned, once a class has been registered, it is immutable.

For an extension which is implemented in a host language, you might get away
with some lazy code generation funny business, but that's an implementation
detail and you're on your own.

In particular: once registered, VTables are immutable[1].  No adding new
methods, no modification of method pointers or any member variables.  VTables
are shared globals; non-constant shared globals suck for many, many reasons,
including thread-safety, vulnerability to action-at-a-distance bugs,
resistance to compiler optimizations, etc.

For what it's worth, there is a direct parallel between Lucy allowing addition
of new segments but forbidding modification of existing segment data (enabling
mmap with MMAP_SHARED), and Clownfish allowing incremental compilation of new
classes but forbidding modification of existing ones (allowing VTables to be
shared across threads).

> 3) We want to handle the case of a method that moves from a parent
> class to a grandparent, and vice versa.   As long as it's in the chain
> of ancestors, the exact location shouldn't matter.

+1, well articulated.

> 4) In all other cases (removal or signature change of an inherited
> method) we just want to detect the incompatibility at start time and
> die with an appropriate error message rather than randomly crashing or
> doing the wrong thing.

+0.  Nice if possible, but from the perspective of an existing compiled
extension, it's hard to detect when upstream has broken an ABI.

> 5) It's OK to start with a Linux-only solution, so long as there is a
> pathway to a solution on Windows and OSX, and

+1, so long as there aren't regressions at release points.  It wouldn't be
cool to release version Lucy 0.4.0 and have it broken everywhere but Linux,
but that policy sounds fine for new features.

> it's OK to to have a
> solution that is efficient only on x64 as long as there is an x86
> workaround. (?)

+1

> Are these correct?  Are the requirements I'm missing?

I would add this:

6) It should be possible to remove methods from upstream which are not marked
as "public.

Then, after we finish dealing with methods, the next step is to give upstream
the ability to add, remove or rearrange member variables. :)

Marvin Humphrey

[1] In the present implementation, the cached host object in a VTable is
    mutable.  This is a bug.

Re: [lucy-dev] OFFSET globals

Posted by Nick Wellnhofer <we...@aevum.de>.

On 24/04/2012 07:20, Nathan Kurz wrote:
> A fine reason to finally do this right, and having him excited to
> solve this is another.  But I haven't been sure whether it's Nick
> demanding an upgrade proof ABI, or whether you feel it should be one.
> I was wondering whether it would be better to have him continue
> writing extensions and recompiling as necessary until the ABI settles
> down, since the process of writing these extensions will probably
> cause faster change than will be the usual case.
>
> But as long as _both_ of you view this as a preliminary requirement,
> it's a great thing to get out of the way.

For my own needs, I don't care much about ABI compatibility. I actually 
started this thread with a proposal that would make ABI compatibility 
very hard to maintain.

I also wouldn't say that this compatibility work is essential for 
compiled extensions. But I realized it's an interesting and important 
issue that I'd like to work on.

Nick

Re: [lucy-dev] OFFSET globals

Posted by Nathan Kurz <na...@verse.com>.

On Mon, Apr 23, 2012 at 8:38 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
>> 1) This can and should be solved through DSO symbol versioning.
>
> Hmm, I'm not sure that the problem we're working on can actually be solved
> this way.  What we're tying to protect against is the possible *removal* of a
> symbol.  I don't see how you guard against that using DSO symbol versioning.

Then you are likely right!  I'm still doing background reading to
fully understand the problem.  Just made may way through the short and
quite readable paper by Wu (which I found easier to read as a PDF:
http://static.usenix.org/events/javavm02/yu/yu.pdf).  Also finding a
couple posts by Eli to be useful:
http://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/

Also, I didn't mean that symbol versioning was all we needed -- rather
 I was thinking (and still think, but perhaps only due to ignorance)
that  we can leverage the runtime linker to solve most of this
problem.  We'd still need some way of updating the the appropriate
offsets, but my hope was that we could get the linker to do this for
us rather than needing to do our own second bootstrap pass.

>> 2) It's great to get things right so as to have a solid foundation, but
>> 3) We got along just fine for years without actually implementing this, thus
>> 4) We don't need to solve this right now, just future-proof ourselves.
>
> Well, the reason we're working on this now is that Nick wants to write
> compiled extensions, and we want to be able to update Lucy without breaking
> his compiled extensions.  We haven't needed this until now because we haven't
> supported compiled extensions, but that's changing!

A fine reason to finally do this right, and having him excited to
solve this is another.  But I haven't been sure whether it's Nick
demanding an upgrade proof ABI, or whether you feel it should be one.
I was wondering whether it would be better to have him continue
writing extensions and recompiling as necessary until the ABI settles
down, since the process of writing these extensions will probably
cause faster change than will be the usual case.

But as long as _both_ of you view this as a preliminary requirement,
it's a great thing to get out of the way.

---

Checking that my understanding the scope of the problem is correct:

1) We want to make it possible to add methods to core Lucy objects on
a library upgrade without requiring a recompile of extension classes
that inherit from these.

2) We don't want to allow "monkey-patching" of classes at runtime,
such that an extension can make changes to parent classes that
propagate back to the child classes.

3) We want to handle the case of a method that moves from a parent
class to a grandparent, and vice versa.   As long as it's in the chain
of ancestors, the exact location shouldn't matter.

4) In all other cases (removal or signature change of an inherited
method) we just want to detect the incompatibility at start time and
die with an appropriate error message rather than randomly crashing or
doing the wrong thing.

5) It's OK to start with a Linux-only solution, so long as there is a
pathway to a solution on Windows and OSX, and it's OK to to have a
solution that is efficient only on x64 as long as there is an x86
workaround. (?)

Are these correct?  Are the requirements I'm missing?

--nate

Re: [lucy-dev] OFFSET globals

Posted by Nathan Kurz <na...@verse.com>.

On Mon, Apr 23, 2012 at 11:01 PM, Nathan Kurz <na...@verse.com> wrote:
> I'll try to write to him (Wienand) and see if he can clarify or fix
> the text.  Both bottomupcs.com and his blog at technovelty.org seem to
> have a lot of useful information.

Got a very quick and polite response from Ian Wienand, confirming that
LD_PRELOAD does indeed go at the front of the list, and that the first
occurrence is preferred.

He offered this link to the source for details:
http://repo.or.cz/w/glibc.git/blob/HEAD:/elf/rtld.c#l1696

His response did make me wonder about another question:  are we
concerned with loading extensions at runtime (after startup) through
dlopen() type mechanisms, or do we expect all extensions to be
available at compile time?

--nate

Re: [lucy-dev] OFFSET globals

Posted by Nathan Kurz <na...@verse.com>.

On Mon, Apr 23, 2012 at 8:38 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
> On Mon, Apr 23, 2012 at 12:49 PM, Nathan Kurz <na...@verse.com> wrote:
>> On Mon, Apr 23, 2012 at 7:55 AM, Marvin Humphrey <ma...@rectangular.com> wrote:
>>> My understanding is that in the event that two ELF DSOs export the same
>>> symbol, the general rule is that the last library loaded wins[1]:
>>
>> I'm pretty sure that at runtime it's the first definition found that
>> wins, which is why LD_PRELOAD can be used to override functions like
>> malloc().
>>
>>>    http://www.akkadia.org/drepper/dsohowto.pdf
>>
>> "Note that there is no problem if the scope contains more than one
>> definition of the same symbol. The symbol lookup algorithm simply
>> picks up the first definition it finds." p.6, 1.5.2
>
> Ah, that's a better quote than the one I found.  Thanks!
>
> This page has more on symbol resolution, and it's a little more approachable
> than the Drepper:
>
>    http://www.bottomupcs.com/libraries_and_the_linker.html
>
>    It is often very useful for a programmer to be able to override a symbol
>    in a library; that is to subvert the normal symbol with a different
>    definition.
>
>    We mentioned that the order that libraries is searched is given by the
>    order of the DT_NEEDED fields within the library. However, it is possible
>    to insert libraries as the last libraries to be searched; this means that
>    any symbols within them will be found as the final reference.
>
>    This is done via an environment variable called LD_PRELOAD which specifies
>    libraries that the linker should load last.

I've yet to figure out how to reconcile these two quotes.  One seems
to say first, and the other says last.  My best guess is that the
Drepper paper is correct (ie, first reference wins) and the "Computer
Science from the Bottom Up" is mistaken.  But I don't have a lot of
basis for this.

1) http://linux.die.net/man/3/dlsym
The definition of RTLD_DEFAULT refers to the "first occurrence", and
"next" would seem to refer to later libraries.

2) It would sad but not unprecedented that "LD_PRELOAD which specifies
 libraries that the linker should load last."  One would hope that
preloads would happen first.

3) Drepper is probably more of an authority than Wienand, the author
of bottomupcs.com.

4) In a fairly recent blog post
(http://www.technovelty.org/linux/pltgot.html) Wienand also says
"LD_PRELOAD, for example, simply tells the dynamic loader it should
insert a library as first to be looked-up for symbols".

I'll try to write to him (Wienand) and see if he can clarify or fix
the text.  Both bottomupcs.com and his blog at technovelty.org seem to
have a lot of useful information.

--nate

Re: [lucy-dev] OFFSET globals

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Mon, Apr 23, 2012 at 12:49 PM, Nathan Kurz <na...@verse.com> wrote:
> On Mon, Apr 23, 2012 at 7:55 AM, Marvin Humphrey <ma...@rectangular.com> wrote:
>> My understanding is that in the event that two ELF DSOs export the same
>> symbol, the general rule is that the last library loaded wins[1]:
>
> I'm pretty sure that at runtime it's the first definition found that
> wins, which is why LD_PRELOAD can be used to override functions like
> malloc().
>
>>    http://www.akkadia.org/drepper/dsohowto.pdf
>
> "Note that there is no problem if the scope contains more than one
> definition of the same symbol. The symbol lookup algorithm simply
> picks up the first definition it finds." p.6, 1.5.2

Ah, that's a better quote than the one I found.  Thanks!

This page has more on symbol resolution, and it's a little more approachable
than the Drepper:

    http://www.bottomupcs.com/libraries_and_the_linker.html

    It is often very useful for a programmer to be able to override a symbol
    in a library; that is to subvert the normal symbol with a different
    definition.

    We mentioned that the order that libraries is searched is given by the
    order of the DT_NEEDED fields within the library. However, it is possible
    to insert libraries as the last libraries to be searched; this means that
    any symbols within them will be found as the final reference.

    This is done via an environment variable called LD_PRELOAD which specifies
    libraries that the linker should load last.

    ...

    We have seen how we can override a function in another library by
    preloading another shared library with the same symbol defined. The symbol
    that gets resolved as the final one is the last one in the order that the
    dynamic loader loads the libraries.

    Libraries are loaded in the order they are specified in the DT_NEEDED flag
    of the binary. This in turn is decided from the order that libraries are
    passed in on the command line when the object is built. When symbols are
    to be located, the dynamic linker starts at the last loaded library and
    works backwards until the symbol is found.

> I've been planning to chime in on this, but haven't had time to do it
> justice.  My quick thoughts are that:
>
> 1) This can and should be solved through DSO symbol versioning.

Hmm, I'm not sure that the problem we're working on can actually be solved
this way.  What we're tying to protect against is the possible *removal* of a
symbol.  I don't see how you guard against that using DSO symbol versioning.

> 2) It's great to get things right so as to have a solid foundation, but
> 3) We got along just fine for years without actually implementing this, thus
> 4) We don't need to solve this right now, just future-proof ourselves.

Well, the reason we're working on this now is that Nick wants to write
compiled extensions, and we want to be able to update Lucy without breaking
his compiled extensions.  We haven't needed this until now because we haven't
supported compiled extensions, but that's changing!

> I'm rusty and busy, though, and might not be of immediate help.  In
> addition to Ulrich's paper, Ian's series here might be a good
> refresher: http://www.airs.com/blog/page/4?s=%22linkers+part%22
>
> I haven't found a good overview of the links, but versioning is covered in
> Part 9: Symbol Versions http://www.airs.com/blog/archives/46
> Part 13: Symbol Versions Redux http://www.airs.com/blog/archives/50

Thanks for these links (and for being the first to introduce Drepper's paper
to me a while back).  I've been wondering if DSO symbol versioning could help
us.  After reading these materials, I'm reasonably sure it cannot --
though if I'm
wrong, I would be excited to learn why!

Marvin Humphrey

Re: [lucy-dev] OFFSET globals

Posted by Nathan Kurz <na...@verse.com>.

On Mon, Apr 23, 2012 at 7:55 AM, Marvin Humphrey <ma...@rectangular.com> wrote:
> My understanding is that in the event that two ELF DSOs export the same
> symbol, the general rule is that the last library loaded wins[1]:

I'm pretty sure that at runtime it's the first definition found that
wins, which is why LD_PRELOAD can be used to override functions like
malloc().

>    http://www.akkadia.org/drepper/dsohowto.pdf

"Note that there is no problem if the scope contains more than one
definition of the same symbol. The symbol lookup algorithm simply
picks up the first definition it finds." p.6, 1.5.2

> [1] "How to Write Shared Libraries" by Ulrich Drepper, section 1.4.5.

Thought maybe you were referencing a different version of the paper,
then figured out that you meant "1.5.4" (transposition).

---

I've been planning to chime in on this, but haven't had time to do it
justice.  My quick thoughts are that:

1) This can and should be solved through DSO symbol versioning.
2) It's great to get things right so as to have a solid foundation, but
3) We got along just fine for years without actually implementing this, thus
4) We don't need to solve this right now, just future-proof ourselves.

I'm rusty and busy, though, and might not be of immediate help.  In
addition to Ulrich's paper, Ian's series here might be a good
refresher: http://www.airs.com/blog/page/4?s=%22linkers+part%22

I haven't found a good overview of the links, but versioning is covered in
Part 9: Symbol Versions http://www.airs.com/blog/archives/46
Part 13: Symbol Versions Redux http://www.airs.com/blog/archives/50

--nate

Re: [lucy-dev] OFFSET globals

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Mon, Apr 23, 2012 at 4:09 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> On 23/04/2012 06:15, Marvin Humphrey wrote:

>>   * Generating headers suddenly get more complicated, because the content
>>     derived from each .cfh file will be different for each target parcel!
>>     Plus, things like METHOD and SUPER_METHOD will require parcel-specific
>>     implementations...
>
> This could be handled with defines:
>
> #define Lucy_Hash_Fetch_OFFSET ext_Hash_fetch_OFFSET

Hmm, wait a minute... I think that even on systems where we can't stop symbol
export, ELF symbol resolution rules may work in our favor and render this
complexity unnecessary.

My understanding is that in the event that two ELF DSOs export the same
symbol, the general rule is that the last library loaded wins[1]:

    http://www.akkadia.org/drepper/dsohowto.pdf

    Note that there can be many references to the same symbol in different
    objects. The result of the lookup can be different for each of the objects
    so there can be no short cuts except for caching results for a symbol in
    each object in case more than one relocation references the same symbol.
    The lookup scope mentioned in the steps below is an ordered list of a
    subset of the loaded objects which can be different for each object
    itself.

(I imagine symbol resolution of Mach-o DSOs on OS X follows similar rules.)

But if Clownfish.so and Lucy.so both export Cfish_Hash_Fetch_OFFSET, it
doesn't matter which copy of the var gets used so long as all copies get
initialized to the proper value during bootstrapping (and no method calls are
made during the boostrapping phase). The values of these OFFSET vars are
constant once bootstrapping completes.

The bootstrapping will work because there will only be a single canonical
copy of each VTable, and it will know the offsets.  Cyclic dependency graphs
are a problem, but that was already true regardless of visibility.

So I don't think symbol visibility matters unless there are systems out there
where we can't stop symbol export AND the dynamic loader crashes if it finds
multiple copies of the same symbol.  I'm not an expert on this subject (yet!),
but probably such a system does not exist.

Marvin Humphrey

[1] "How to Write Shared Libraries" by Ulrich Drepper, section 1.4.5.

Re: [lucy-dev] OFFSET globals

Posted by Nick Wellnhofer <we...@aevum.de>.

On 23/04/2012 06:15, Marvin Humphrey wrote:
> On Sun, Apr 22, 2012 at 8:11 AM, Nick Wellnhofer<we...@aevum.de>  wrote:
>> On 20/04/2012 19:44, Marvin Humphrey wrote:
>>> Creating OFFSET vars for every invocant/method-name combo solved that
>>> problem, though at the cost of considerable DLL symbol proliferation.
>>> However, as you point out, a mechanism to propagate the offsets was never
>>> introduced.
>>
>> OK, I see. Then I would propose the following solution: We go back to
>> defining offsets only for novel methods. But we define the offset variables
>> per-parcel even for included classes. Then we lookup the offsets of included
>> methods by method name at run-time. We'll have to store a bit of method
>> metadata to make this work. Patch 03 in LUCY-231 already goes in that
>> direction.
>>
>> So a compiled extension would have something like the following in parcel.c:
>>
>> size_t ext_Hash_fetch_OFFSET;
>>
>> ...
>>
>> void
>> ext_bootstrap_parcel() {
>>     ext_Hash_fetch_OFFSET
>>         = cfish_VTable_find_offset(LUCY_HASH, "Fetch");
>>     ...
>> }
>
> It's a provocative idea -- I definitely had not considered such an approach!
>
>> Note that the offsets of included classes are also prefixed with "ext_", not
>> "lucy_".
>
> I think you need both prefixes: "Ext_Lucy_Hash_Fetch_OFFSET", or after
> Hash moves under Clownfish, "Ext_Cfish_Hash_Fetch_OFFSET".  Otherwise you
> couldn't have a class named "Hash" in your extension.

Good point.

>> Every extension will use its own private copy. This makes it possible to
>> hide the offset variables in the DLL.
>
> OK, so this means...
>
>    * A novel method will spawn one variable for each extension, rather than one
>      variable for each subclass.
>    * Or to think of it another way, each extension will duplicate the vars for
>      all of its direct dependencies.  (I don't believe an extension needs to
>      dupe the vars of indirect dependencies.)
>    * Lookups of OFFSET vars will not go through the GOT (global offset table).

Yes, at least on some platforms.

>    * Generating headers suddenly get more complicated, because the content
>      derived from each .cfh file will be different for each target parcel!
>      Plus, things like METHOD and SUPER_METHOD will require parcel-specific
>      implementations...

This could be handled with defines:

#define Lucy_Hash_Fetch_OFFSET ext_Hash_fetch_OFFSET

>      I think this means Lucy, which will have at least two
>      parcels eventually including a separate parcel for tests, will need to run
>      CFC multiple times and output into different dirs...

That shouldn't be a problem. We'd have to that already, because of the 
lucy_bootstrap_parcel and lucy_init_parcel functions.

> Sounds like this would be a significant improvment in terms of DSO symbol
> economy over the current system.
>
> I'm wincing at how messy CFC would have to become to pull it off, though.

Yeah, the changes are pretty intrusive and I'm not sure if the added 
complexity is worth it.

Nick

Re: [lucy-dev] OFFSET globals

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Sun, Apr 22, 2012 at 8:11 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> On 20/04/2012 19:44, Marvin Humphrey wrote:
>> Creating OFFSET vars for every invocant/method-name combo solved that
>> problem, though at the cost of considerable DLL symbol proliferation.
>> However, as you point out, a mechanism to propagate the offsets was never
>> introduced.
>
> OK, I see. Then I would propose the following solution: We go back to
> defining offsets only for novel methods. But we define the offset variables
> per-parcel even for included classes. Then we lookup the offsets of included
> methods by method name at run-time. We'll have to store a bit of method
> metadata to make this work. Patch 03 in LUCY-231 already goes in that
> direction.
>
> So a compiled extension would have something like the following in parcel.c:
>
> size_t ext_Hash_fetch_OFFSET;
>
> ...
>
> void
> ext_bootstrap_parcel() {
>    ext_Hash_fetch_OFFSET
>        = cfish_VTable_find_offset(LUCY_HASH, "Fetch");
>    ...
> }

It's a provocative idea -- I definitely had not considered such an approach!

> Note that the offsets of included classes are also prefixed with "ext_", not
> "lucy_".

I think you need both prefixes: "Ext_Lucy_Hash_Fetch_OFFSET", or after
Hash moves under Clownfish, "Ext_Cfish_Hash_Fetch_OFFSET".  Otherwise you
couldn't have a class named "Hash" in your extension.

> Every extension will use its own private copy. This makes it possible to
> hide the offset variables in the DLL.

OK, so this means...

  * A novel method will spawn one variable for each extension, rather than one
    variable for each subclass.
  * Or to think of it another way, each extension will duplicate the vars for
    all of its direct dependencies.  (I don't believe an extension needs to
    dupe the vars of indirect dependencies.)
  * Lookups of OFFSET vars will not go through the GOT (global offset table).
  * Generating headers suddenly get more complicated, because the content
    derived from each .cfh file will be different for each target parcel!
    Plus, things like METHOD and SUPER_METHOD will require parcel-specific
    implementations...  I think this means Lucy, which will have at least two
    parcels eventually including a separate parcel for tests, will need to run
    CFC multiple times and output into different dirs...

Sounds like this would be a significant improvment in terms of DSO symbol
economy over the current system.

I'm wincing at how messy CFC would have to become to pull it off, though.

Marvin Humphrey

Re: [lucy-dev] OFFSET globals

Posted by Nick Wellnhofer <we...@aevum.de>.

On 20/04/2012 19:44, Marvin Humphrey wrote:
> On Fri, Apr 20, 2012 at 3:02 AM, Nick Wellnhofer<we...@aevum.de>  wrote:
>> On 19/04/2012 23:55, Marvin Humphrey wrote:
>>> Unfortunately, applying this patch would severely constrain the development
>>> of the Lucy core.  Because it freezes the layout of our vtables by
>>> hard-coding the offsets at which method pointers are located, we would not
>>> be able to add new methods to Lucy (among other problems) without breaking
>>> compiled extensions.
>>
>> But that's exactly what we do right now. The offsets are never changed.
>
> Heh.
>
> The original concept was to have one OFFSET var per novel method.  However,
> that design was flawed because it caused the ABI to break if the method's
> novel declaration moved from one class to another, e.g. up into a parent
> class: a compiled extension would be counting on the existence of
> Parcel_OldClass_MethodName_OFFSET, but that var would go away with the new
> release of the core, breaking ABI compat.
>
> Creating OFFSET vars for every invocant/method-name combo solved that problem,
> though at the cost of considerable DLL symbol proliferation.  However, as you
> point out, a mechanism to propagate the offsets was never introduced.

OK, I see. Then I would propose the following solution: We go back to 
defining offsets only for novel methods. But we define the offset 
variables per-parcel even for included classes. Then we lookup the 
offsets of included methods by method name at run-time. We'll have to 
store a bit of method metadata to make this work. Patch 03 in LUCY-231 
already goes in that direction.

So a compiled extension would have something like the following in parcel.c:

size_t ext_Hash_fetch_OFFSET;

...

void
ext_bootstrap_parcel() {
     ext_Hash_fetch_OFFSET
         = cfish_VTable_find_offset(LUCY_HASH, "Fetch");
     ...
}

Note that the offsets of included classes are also prefixed with "ext_", 
not "lucy_". Every extension will use its own private copy. This makes 
it possible to hide the offset variables in the DLL.

VTable_find_offset can then find the correct offset even if a novel 
method has moved to a parent class.

Nick

Re: [lucy-dev] OFFSET globals

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Fri, Apr 20, 2012 at 3:02 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> On 19/04/2012 23:55, Marvin Humphrey wrote:
>> Unfortunately, applying this patch would severely constrain the development
>> of the Lucy core.  Because it freezes the layout of our vtables by
>> hard-coding the offsets at which method pointers are located, we would not
>> be able to add new methods to Lucy (among other problems) without breaking
>> compiled extensions.
>
> But that's exactly what we do right now. The offsets are never changed.

Heh.

The original concept was to have one OFFSET var per novel method.  However,
that design was flawed because it caused the ABI to break if the method's
novel declaration moved from one class to another, e.g. up into a parent
class: a compiled extension would be counting on the existence of
Parcel_OldClass_MethodName_OFFSET, but that var would go away with the new
release of the core, breaking ABI compat.

Creating OFFSET vars for every invocant/method-name combo solved that problem,
though at the cost of considerable DLL symbol proliferation.  However, as you
point out, a mechanism to propagate the offsets was never introduced.

> If we want binary compatibility across different versions, a compiled
> extension would have to create the offsets at run-time.

Yes.

>> I have occasionally fantasized about writing a JIT for Clownfish to get rid
>> of those OFFSET globals: something along the lines of what Dachuan Yu et al
>> propose in their paper.  However, their mechanism involves requires
>> knowledge of the class implementation, which in Clownfish's case resides in
>> C code that CFC doesn't have access to.
>
> AFAICS, we'll need something like the mechanism described in the paper. In
> the Dog/Boxer example above, the Boxer VTable could be intialized like this:
>
>    BOXER = VTable_allocate(DOG->vt_alloc_size + extra_space)
>    memcpy(BOXER->methods, DOG->methods, dog_methods_size)
>    Boxer_Bark_OFFSET  = Dog_Bark_OFFSET;
>    Boxer_Bite_OFFSET  = Dog_Bite_OFFSET;
>    Boxer_Drool_OFFSET = DOG->vt_alloc_size;
>    VTable_Override(BOXER, Boxer_bark,  Boxer_Bark_OFFSET);
>    VTable_Override(BOXER, Boxer_drool, Boxer_Drool_OFFSET);
>
> This shouldn't be too hard to implement.

Indeed, this is exactly what we need to complete the existing design.
Excellent diagnosis and prescription, Herr Doktor Wellnhofer. :)

I've considered several other approaches to solving this problem, but I
haven't yet found one that's suitable.

A traditional JIT would transform an intermediate stage of compilation into
final object code.  In our case, the only transformation we want to make is
to update hard-coded vtable offsets at every method invocation site.
However, Clownfish at this point is just an interface description language,
and the C implementation code compiles down to system-specific library formats
like ELF, DLL, etc.  I don't know of a way to treat those as input to a JIT.
:)

In theory, another possibility is to turn method invocations into global
functions which delegate.

    // autogen

    bool_t
    Cfish_Obj_Equals(const cfish_Obj *self, cfish_Obj *other) {
        char *const method_address = *(char**)self
                                   + cfish_VTable_offset_of_methods
                                   + Cfish_Obj_Equals_NUM *
sizeof(cfish_method_t);
        const Cfish_Obj_Equals_t method =
*((Cfish_Obj_Equals_t*)method_address);
        return method(self, other);
    }

    // Obj.c

    bool_t
    Cfish_Obj_Equals_IMPL(cfish_Obj *self, cfish_Obj *other) {
        // ...
    }

However, then we face the same symbol proliferation problem we're currently
facing with our global OFFSET vars: we need one delegator for each
invocant type,
or the ABI will break if the first declaration moves to another class.

Another approach is to use static variables which trigger some sort of
initialization the first time they are invoked, causing the method invocation
behavior to self-modify.

Here's one variant which modifies a static function pointer...

    static chy_bool_t
    Cfish_Obj_Equals_INIT(cfish_Obj *self, cfish_Obj *other);

    static Cfish_Obj_Equals_t Cfish_Obj_Equals = Cfish_Obj_Equals_INIT;

    static chy_bool_t
    Cfish_Obj_Equals_INIT(cfish_Obj *self, cfish_Obj *other) {
        CFish_Obj_Equals
            = (CFish_Obj_Equals_t)Obj_Look_Up_Delegator(self, "equals");
        return Cfish_Obj_Equals(self, other);
    }

... and here's an alternative that uses a static OFFSET var, but
requires an extra
conditional check for each method invocation.

    static size_t Cfish_Obj_Equals_OFFSET = 0;
    static CHY_INLINE chy_bool_t bool_t
    Cfish_Obj_Equals(cfish_Obj *self, cfish_Obj *other) {
        if (Cfish_Obj_Equals_OFFSET == 0) {
            CFish_Obj_Equals_OFFSET
                = (CFish_Obj_Equals_t)Obj_Look_Up_Offset(self, "equals");
        }
        char *const method_address = *(char**)self + Cfish_Obj_Equals_OFFSET;
        const Cfish_Obj_Equals_t method =
*((Cfish_Obj_Equals_t*)method_address);
        return method(self, other);
    }

These self-modifying-init techniques are sort of similar to the Yu proposal, but
have drawbacks:

  * Initialization happens lazily and piecemeal, as opposed to once per
    class file at boot as part of the Java class loader in Yu.
  * AFAICT, there's no way to avoid the overhead of either a delegator or
    an extra conditional for each method invocation.

Marvin Humphrey

Re: [lucy-dev] OFFSET globals

Posted by Nick Wellnhofer <we...@aevum.de>.

On 19/04/2012 23:55, Marvin Humphrey wrote:
> On Thu, Apr 19, 2012 at 9:12 AM, Nick Wellnhofer<we...@aevum.de>  wrote:
>> Attached is a patch that replaces the OFFSET globals with NUM #defines which
>> contain the method number. This more than halves the number of extern
>> symbols in Lucy.so. The only cost is a constant add for every method
>> invocation which should be negligible.
>>
>> This also removes the use of offsetof in parcel.c, so we don't need the
>> definition of the VTable struct in that file.
>>
>> Thoughts?
>
> Unfortunately, applying this patch would severely constrain the development of
> the Lucy core.  Because it freezes the layout of our vtables by hard-coding
> the offsets at which method pointers are located, we would not be able to add
> new methods to Lucy (among other problems) without breaking compiled
> extensions.

But that's exactly what we do right now. The offsets are never changed. 
If we want binary compatibility across different versions, a compiled 
extension would have to create the offsets at run-time.

> Here's what we want to avoid:
>
>      http://s.apache.org/hzY
>
>      The idea is that if you install an independent third-party compiled
>      extension like "LucyX::RTree", it should still work after you upgrade the
>      Lucy core.
>
>      Using Perl/CPAN as an example, consider the following sequence of events:
>
>      1. Install Lucy 1.00 via CPAN.
>      2. Install LucyX::RTree via CPAN.
>      3. Upgrade Lucy to version 1.01 via CPAN.
>
>      If we do not preserve Lucy's binary compatibility from version 1.00 to
>      1.01, apps which use LucyX::RTree will suddenly start crashing hard
>      immediately after the upgrade finishes. That's not acceptable.
>
> Here's the mechanism by which ABI compat breaks:
>
>      Say that we have a core class "Dog" with two methods, bark() and bite(),
>      and an externally compiled subclass "Boxer" which overrides bark() and
>      adds drool().
>
>          Dog_vtable = {
>              Dog_bark,
>              Dog_bite
>          };
>
>          Boxer_vtable = {
>              Boxer_bark,
>              Dog_bite,
>              Boxer_drool
>          };
>
>      Now say that we add eat(Food *food) to the base class Dog:
>
>          Dog_vtable = {
>              Dog_bark,
>              Dog_bite,
>              Dog_eat
>          };
>
>      Unfortunately, the externally compiled Boxer_vtable has a fixed layout,
>      and it puts Boxer_drool in the slot where the core expects to find eat().
>      When the core tries to call eat() on a Boxer object, chaos will ensue.
>
> Here is the rationale for those OFFSET globals as the solution:
>
>      http://s.apache.org/pSd
>
>      To address the virtual method ABI problem, we can use what I call the
>      "inside-out vtable" approach. Normally, when compiling virtual method
>      invocations, the compiler hard-codes the offset into the vtable. This
>      causes severe runtime memory errors when a compiled extension expects to
>      find a function pointer with a certain signature at a given hard-coded
>      offset, but finds something unexpected and incompatible there instead.
>      However, if we store the offsets into the vtable as variables – a change
>      which seems to have minimal/negligible performance impact – then a
>      compiled extension can adapt to a new vtable layout presented by a
>      recompiled core. We still can't remove methods, rename them, or change
>      their signatures, but we can add new ones.
>
>      http://s.apache.org/hzY
>
>      The "inside-out" aspect of using individual variables to hold the offsets
>      was inspired by the "inside-out object" technique drawn from Perl culture.
>      However, the idea of using variable vtable offsets has been studied
>      before, and is actually implemented in GCJ.
>
>      See "Supporting Binary Compatibility with Static Compilation" by Dachuan
>      Yu, Zhong Shao, and Valery Trifonov, at
>      http://www.usenix.org/events/javavm02/yu/yu_html/index.html.
>
> I have occasionally fantasized about writing a JIT for Clownfish to get rid
> of those OFFSET globals: something along the lines of what Dachuan Yu et al
> propose in their paper.  However, their mechanism involves requires knowledge
> of the class implementation, which in Clownfish's case resides in C code that
> CFC doesn't have access to.

AFAICS, we'll need something like the mechanism described in the paper. 
In the Dog/Boxer example above, the Boxer VTable could be intialized 
like this:

     BOXER = VTable_allocate(DOG->vt_alloc_size + extra_space)
     memcpy(BOXER->methods, DOG->methods, dog_methods_size)
     Boxer_Bark_OFFSET  = Dog_Bark_OFFSET;
     Boxer_Bite_OFFSET  = Dog_Bite_OFFSET;
     Boxer_Drool_OFFSET = DOG->vt_alloc_size;
     VTable_Override(BOXER, Boxer_bark,  Boxer_Bark_OFFSET);
     VTable_Override(BOXER, Boxer_drool, Boxer_Drool_OFFSET);

This shouldn't be too hard to implement.

Nick

Re: [lucy-dev] OFFSET globals

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Thu, Apr 19, 2012 at 9:12 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> Attached is a patch that replaces the OFFSET globals with NUM #defines which
> contain the method number. This more than halves the number of extern
> symbols in Lucy.so. The only cost is a constant add for every method
> invocation which should be negligible.
>
> This also removes the use of offsetof in parcel.c, so we don't need the
> definition of the VTable struct in that file.
>
> Thoughts?

Unfortunately, applying this patch would severely constrain the development of
the Lucy core.  Because it freezes the layout of our vtables by hard-coding
the offsets at which method pointers are located, we would not be able to add
new methods to Lucy (among other problems) without breaking compiled
extensions.

Here's what we want to avoid:

    http://s.apache.org/hzY

    The idea is that if you install an independent third-party compiled
    extension like "LucyX::RTree", it should still work after you upgrade the
    Lucy core.

    Using Perl/CPAN as an example, consider the following sequence of events:

    1. Install Lucy 1.00 via CPAN.
    2. Install LucyX::RTree via CPAN.
    3. Upgrade Lucy to version 1.01 via CPAN.

    If we do not preserve Lucy's binary compatibility from version 1.00 to
    1.01, apps which use LucyX::RTree will suddenly start crashing hard
    immediately after the upgrade finishes. That's not acceptable.

Here's the mechanism by which ABI compat breaks:

    Say that we have a core class "Dog" with two methods, bark() and bite(),
    and an externally compiled subclass "Boxer" which overrides bark() and
    adds drool().

        Dog_vtable = {
            Dog_bark,
            Dog_bite
        };

        Boxer_vtable = {
            Boxer_bark,
            Dog_bite,
            Boxer_drool
        };

    Now say that we add eat(Food *food) to the base class Dog:

        Dog_vtable = {
            Dog_bark,
            Dog_bite,
            Dog_eat
        };

    Unfortunately, the externally compiled Boxer_vtable has a fixed layout,
    and it puts Boxer_drool in the slot where the core expects to find eat().
    When the core tries to call eat() on a Boxer object, chaos will ensue.

Here is the rationale for those OFFSET globals as the solution:

    http://s.apache.org/pSd

    To address the virtual method ABI problem, we can use what I call the
    "inside-out vtable" approach. Normally, when compiling virtual method
    invocations, the compiler hard-codes the offset into the vtable. This
    causes severe runtime memory errors when a compiled extension expects to
    find a function pointer with a certain signature at a given hard-coded
    offset, but finds something unexpected and incompatible there instead.
    However, if we store the offsets into the vtable as variables – a change
    which seems to have minimal/negligible performance impact – then a
    compiled extension can adapt to a new vtable layout presented by a
    recompiled core. We still can't remove methods, rename them, or change
    their signatures, but we can add new ones.

    http://s.apache.org/hzY

    The "inside-out" aspect of using individual variables to hold the offsets
    was inspired by the "inside-out object" technique drawn from Perl culture.
    However, the idea of using variable vtable offsets has been studied
    before, and is actually implemented in GCJ.

    See "Supporting Binary Compatibility with Static Compilation" by Dachuan
    Yu, Zhong Shao, and Valery Trifonov, at
    http://www.usenix.org/events/javavm02/yu/yu_html/index.html.

I have occasionally fantasized about writing a JIT for Clownfish to get rid
of those OFFSET globals: something along the lines of what Dachuan Yu et al
propose in their paper.  However, their mechanism involves requires knowledge
of the class implementation, which in Clownfish's case resides in C code that
CFC doesn't have access to.

Marvin Humphrey