Posted to dev@lucy.apache.org by Marvin Humphrey <ma...@rectangular.com> on 2014/09/09 04:04:21 UTC

[lucy-dev] Re: (CLOWNFISH-12) Clownfish interfaces

On Fri, Sep 5, 2014 at 1:07 PM, Nick Wellnhofer (JIRA) <ji...@apache.org> wrote:
>
> Nick Wellnhofer commented on CLOWNFISH-12:
> ------------------------------------------
>
> Another approach is "implicit" interfaces like in Go. Some details on how to
> implement this can be found here:
>
>     http://research.swtch.com/interfaces
>
> We'd have to add method signatures to our runtime class metadata. Then the
> creation of an interface instance from an object would typically require a
> heap allocation and a hash lookup. Another downside is that interface
> instances must be decref'd. But Go-like interfaces would be a really cool
> feature.

Implicitly satisfied interfaces are a very nice feature of Go, and I would
love to have them in Clownfish.

Building the interface table on the fly at runtime seems straightforward:

    In our simple example, the method table for Stringer has one method, while
    the table for Binary has two methods. In general there might be _ni_
    methods for the interface type and _nt_ methods for the concrete type. The
    obvious search to find the mapping from interface methods to concrete
    methods would take `O(ni × nt)` time, but we can do better. By sorting the
    two method tables and walking them simultaneously, we can build the
    mapping in `O(ni + nt)` time instead.
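The sorted-merge construction described in the quoted passage can be sketched in C. This is only an illustration, not Clownfish code: `MethodEntry`, `build_itable`, and the `Binary_*` functions are hypothetical names, and both tables are assumed to be sorted by method name already.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*method_t)(void);   /* opaque method pointer (hypothetical) */

typedef struct {
    const char *name;
    method_t    func;
} MethodEntry;

/* Stand-ins for a concrete type with two methods, as in the Go article. */
static void Binary_Get(void)    {}
static void Binary_String(void) {}

/* Walk both name-sorted tables once, filling itable[i] with the concrete
 * function for interface method i.  Returns 1 if the concrete type supplies
 * every interface method, 0 otherwise.  Runs in O(ni + nt). */
static int
build_itable(const char *const *iface_names, size_t ni,
             const MethodEntry *concrete, size_t nt,
             method_t *itable) {
    size_t i = 0;
    size_t t = 0;
    assert(ni == 0 || itable != NULL);
    while (i < ni && t < nt) {
        int cmp = strcmp(iface_names[i], concrete[t].name);
        if (cmp == 0) {
            itable[i++] = concrete[t++].func;
        }
        else if (cmp > 0) {
            t++;          /* concrete method not part of the interface */
        }
        else {
            return 0;     /* interface method missing from concrete type */
        }
    }
    return i == ni;
}
```

With a one-method interface (`String`) and a concrete table holding `Get` and `String`, the walk skips `Get` and maps slot 0 to `Binary_String`.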

However, we have a problem with C implementation code: how can we get the C
compiler to tell us whether invoking an interface method on a given value is
safe?

In Go, the compiler detects invalid assignment to an interface variable
(<http://play.golang.org/p/mT743SWoxu>):

    package main

    type Futzer interface {
        Futz()
    }

    func main() {
        var futzer Futzer = "I am not a Futzer" // fails to compile.
        futzer.Futz()
    }

In Clownfish-flavored C, we can perform a runtime check:

    // Frobber implements Futzer, so the runtime check succeeds.
    Frobber *frobber = Frobber_new();
    Futzer  *futzer  = (Futzer*)IMPLEMENTS(frobber, iFUTZER);  // succeeds
    Futzer_Futz(futzer);

    // String doesn't implement Futzer, so the runtime check fails.
    char   *cstring = strdup("I am not a Futzer");
    String *string  = Str_new_from_utf8(cstring, strlen(cstring));
    Futzer *futzer  = (Futzer*)IMPLEMENTS(string, iFUTZER);  // runtime error
    Futzer_Futz(futzer);

I think we can make that check redundant using DSO-style lazy loading
techniques: have each class start off with a dummy interface table populated
with stubs which lazily build the interface table, replacing themselves and
reinvoking on success or throwing an exception on failure.

However, unless we cast or have interface method invocation functions take
`void*` for the invocant, they'll produce compile-time warnings.

    // Futzer_Futz() takes `Futzer*`: spurious compile-time warning.
    Futzer_Futz(frobber);

    // Futzer_Futz() takes `Futzer*`: compile-time warning, segfault.
    Futzer_Futz(cstring);

    // Futzer_Futz() takes `void*`: no compile-time warning, runtime error.
    Futzer_Futz(string);

    // Futzer_Futz() takes `void*`: no compile-time warning, segfault.
    Futzer_Futz(cstring);

Marvin Humphrey

Re: [lucy-dev] Re: (CLOWNFISH-12) Clownfish interfaces

Posted by Nathan Kurz <na...@verse.com>.
On Tue, Sep 16, 2014 at 11:14 AM, Marvin Humphrey
<ma...@rectangular.com> wrote:
> On Tue, Sep 16, 2014 at 1:04 AM, Nick Wellnhofer <we...@aevum.de> wrote:
>> I'm more concerned with the waste of memory. I'd happily sacrifice some more
>> cycles to find a solution that doesn't need those sparse arrays.
>
> I suppose there's always linear search or binary search.

I've only been skimming this thread, so perhaps I'm way off, but from
afar it sounds like a little custom hash would fit your needs
excellently.  Memory overhead can be very low, and probably can be
localized to one or a pair of cachelines. Reads of certain keys are
consistently high in frequency and thus well predicted.   The slowest
part is hashing the key, but you could have this happen once at
registration and return the result.  Writes are rare, and usually
happen at startup, so runtime resizes can probably be avoided.
Deletions are practically never, so you don't have to worry about
shrinking.  Overrides are easy, and probably could be done in a way
compatible with caller caching.
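A minimal sketch of such a per-class hash, under assumptions of my own: 8 slots of 16 bytes each (two 64-byte cache lines on a typical 64-bit system), FNV-1a computed once at registration, linear probing, no resizing, and no deletion. All names here are hypothetical, and a real version would also compare interface identity rather than trust a 32-bit hash alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ITABLE_SLOTS 8u   /* power of two, so masking replaces modulo */

typedef struct {
    uint32_t    hash;     /* precomputed at registration; 0 means empty */
    const void *itable;
} ItableSlot;

typedef struct {
    ItableSlot slots[ITABLE_SLOTS];
} ItableMap;

/* FNV-1a over the interface name; computed once, then passed around. */
static uint32_t
iface_hash(const char *name) {
    uint32_t h = 2166136261u;
    assert(name != NULL);
    for (; *name != '\0'; name++) {
        h ^= (unsigned char)*name;
        h *= 16777619u;
    }
    return h != 0 ? h : 1;   /* reserve 0 for "empty slot" */
}

/* Insert or override.  Returns 0 if the table is full. */
static int
itable_map_put(ItableMap *map, uint32_t hash, const void *itable) {
    for (uint32_t n = 0; n < ITABLE_SLOTS; n++) {
        ItableSlot *slot = &map->slots[(hash + n) & (ITABLE_SLOTS - 1u)];
        if (slot->hash == 0 || slot->hash == hash) {
            slot->hash   = hash;
            slot->itable = itable;
            return 1;
        }
    }
    return 0;
}

/* Lookup by precomputed hash; NULL if the interface was never registered. */
static const void*
itable_map_get(const ItableMap *map, uint32_t hash) {
    for (uint32_t n = 0; n < ITABLE_SLOTS; n++) {
        const ItableSlot *slot = &map->slots[(hash + n) & (ITABLE_SLOTS - 1u)];
        if (slot->hash == hash) { return slot->itable; }
        if (slot->hash == 0)    { return NULL; }
    }
    return NULL;
}
```

Because the hash is returned at registration, the hot path is a mask, a load, and a compare; overriding an entry is just a second `itable_map_put` with the same hash.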

--nate

Re: [lucy-dev] Re: (CLOWNFISH-12) Clownfish interfaces

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Tue, Sep 16, 2014 at 1:04 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> I'm more concerned with the waste of memory. I'd happily sacrifice some more
> cycles to find a solution that doesn't need those sparse arrays.

I suppose there's always linear search or binary search.

We could speed that up using your proposed itable_cache variable, but moved to
the Class object -- which would save adding overhead to every object and help
achieve the objective of reducing memory footprint.  This approach seems to
violate the principle of avoiding non-constant global variables, but it's only
a cache and we don't need to update it atomically.

The downside would be non-linear performance degradation due to thrashing in
unpredictable situations, possibly worsened in multi-threaded operation.
However, performance would probably never be that bad, because subtleties of
method dispatch are going to have only minor impact on non-trivial
Clownfish-based programs.
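For illustration, here is roughly what a binary search with a per-class cache might look like. The `Class` and `ItableEntry` layouts and the `find_itable` name are invented for this sketch. The cache is written with a plain pointer store, as the thread suggests; strictly portable C11 code running multiple threads would want a relaxed atomic there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t    interface_id;
    const void *itable;
} ItableEntry;

typedef struct {
    const ItableEntry *entries;      /* sorted by interface_id */
    size_t             num_entries;
    const ItableEntry *cache;        /* last hit; one per class, not per object */
} Class;

static const void*
find_itable(Class *klass, uint32_t interface_id) {
    assert(klass != NULL);

    /* Fast path: the same interface was used on this class last time. */
    const ItableEntry *hit = klass->cache;
    if (hit != NULL && hit->interface_id == interface_id) {
        return hit->itable;
    }

    /* Slow path: binary search over the interfaces the class implements. */
    size_t lo = 0;
    size_t hi = klass->num_entries;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const ItableEntry *entry = &klass->entries[mid];
        if (entry->interface_id == interface_id) {
            klass->cache = entry;    /* only a cache, so a stale value is harmless */
            return entry->itable;
        }
        if (entry->interface_id < interface_id) { lo = mid + 1; }
        else                                    { hi = mid; }
    }
    return NULL;                     /* class does not implement the interface */
}
```

The thrashing case mentioned above corresponds to two call sites alternating between different interfaces on the same class, so that every call takes the slow path.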

> Even if a class from module A implements an interface of module B, it's
> possible that the class already provides the added method or that the new
> method has a concrete implementation defined in the interface (like you
> suggest below).

I think we should support adding concrete methods to later versions of an
interface/trait/role, but that we should forbid adding abstract methods.

If you really need to add abstract functionality, the workaround is to add a
new concrete method which dies at runtime.

Marvin Humphrey

Re: [lucy-dev] Re: (CLOWNFISH-12) Clownfish interfaces

Posted by Nick Wellnhofer <we...@aevum.de>.
On 16/09/2014 04:39, Marvin Humphrey wrote:
> So we go from ~15.9 cycles for our current class method dispatch to ~17.9
> cycles for this flavor of interface method dispatch.

I'm more concerned with the waste of memory. I'd happily sacrifice some more 
cycles to find a solution that doesn't need those sparse arrays.

> Sure, but how much cheaper can we get?  It doesn't seem like the penalty is
> all that severe.

Memory-wise, we can get a lot cheaper.

> It occurs to me that an additional optimization is possible for all flavors of
> interface method dispatch.  Since adding a new abstract method to an interface
> is a backwards compatibility break, we could sort the abstract methods by name
> and assign fixed offsets.

Adding a new interface method isn't always a compatibility break. If module A 
uses (not implements) an interface of module B, and module B adds a new 
interface method, module A should continue to function unless the semantics of 
the other methods change (just like with classes). Even if a class from module 
A implements an interface of module B, it's possible that the class already 
provides the added method or that the new method has a concrete implementation 
defined in the interface (like you suggest below).

> This leads to another topic I wanted to cover: I would really like to allow
> interfaces to define concrete methods in addition to abstract methods.
>
> Allowing concrete methods facilitates stuff like Ruby's Comparable, where
> defining `<=>` gets you 6 other methods:
>
>      http://www.ruby-doc.org/core-2.1.2/Comparable.html
>
> Sometimes people use "trait", "role", or "mixin" instead of "interface" to
> describe such a construct.  In any case, for Clownfish I'm suggesting that
> interfaces may define additional methods but with the limitation that these
> methods may not access any ivars.

We would need a way to mark interface methods that are non-abstract and 
store them in the itable as default methods if there's no implementation in a 
class. The default methods would take an interface object as "self". If 
interface object pointers aren't equivalent to object pointers, we'd need an 
auto-generated wrapper that does the conversion.
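As a rough sketch of the default-method idea: when an itable is built, each slot takes the class's own implementation if there is one and otherwise falls back to the interface's default. The names below (`imethod_t`, `fill_itable`, `Foo_Size`, `Default_Describe`) are made up, and the sketch assumes interface pointers and object pointers are the same, so no conversion wrapper is needed.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*imethod_t)(void *self);

/* Example methods: one provided by a class, one default from the interface. */
static int Foo_Size(void *self)         { (void)self; return 3; }
static int Default_Describe(void *self) { (void)self; return 0; }

/* Fill an n-slot itable.  A NULL entry in `defaults` marks an abstract
 * method, which the class must implement itself.  Returns 1 on success,
 * 0 if an abstract method has no implementation. */
static int
fill_itable(imethod_t *itable, const imethod_t *class_impls,
            const imethod_t *defaults, size_t n) {
    assert(itable != NULL && class_impls != NULL && defaults != NULL);
    for (size_t i = 0; i < n; i++) {
        imethod_t method = class_impls[i] != NULL ? class_impls[i]
                                                  : defaults[i];
        if (method == NULL) {
            return 0;
        }
        itable[i] = method;
    }
    return 1;
}
```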

Nick


Re: [lucy-dev] Re: (CLOWNFISH-12) Clownfish interfaces

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Mon, Sep 15, 2014 at 10:57 AM, Nick Wellnhofer <we...@aevum.de> wrote:

> IMO, the main benefit is fast method dispatch without the need for the
> possibly huge itables arrays. The main downside is that the creation of
> interface objects is expensive. Using the same pointer for objects and
> interface objects has many benefits but it means that we have to find the
> correct method pointer on every method invocation which is either slow or
> memory-hungry.

I made some commits today to the method dispatch benchmark to explore what
kind of penalty we incur for interface method dispatch with an extra layer of
indirection.  Here's the output on Linux:

    LD_LIBRARY_PATH=. ./exe
    cycles/call with method ptr loop: 12.894064
    cycles/call with wrapper loop: 15.903370
    cycles/call with fixed offset wrapper loop: 11.401846
    cycles/call with interface loop: 17.897250
    cycles/call with simulated inline: 6.756627
    cycles/call with wrapper: 20.345853

So we go from ~15.9 cycles for our current class method dispatch to ~17.9
cycles for this flavor of interface method dispatch.

> I wouldn’t rule out this approach so quickly. We should also still consider
> “traditional” interfaces where every class must declare the interfaces it
> implements explicitly. This would make both the cast to interface types and
> method dispatch extremely cheap.

Sure, but how much cheaper can we get?  It doesn't seem like the penalty is
all that severe.

It occurs to me that an additional optimization is possible for all flavors of
interface method dispatch.  Since adding a new abstract method to an interface
is a backwards compatibility break, we could sort the abstract methods by name
and assign fixed offsets.

This leads to another topic I wanted to cover: I would really like to allow
interfaces to define concrete methods in addition to abstract methods.

Allowing concrete methods facilitates stuff like Ruby's Comparable, where
defining `<=>` gets you 6 other methods:

    http://www.ruby-doc.org/core-2.1.2/Comparable.html

Sometimes people use "trait", "role", or "mixin" instead of "interface" to
describe such a construct.  In any case, for Clownfish I'm suggesting that
interfaces may define additional methods but with the limitation that these
methods may not access any ivars.

Marvin Humphrey

[lucy-dev] Re: (CLOWNFISH-12) Clownfish interfaces

Posted by Nick Wellnhofer <we...@aevum.de>.
On Sep 15, 2014, at 18:14 , Marvin Humphrey <ma...@rectangular.com> wrote:

> On Mon, Sep 15, 2014 at 3:01 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> 
>> Originally, I didn't think of it as a cast operation but more like a
>> constructor for interface objects. It would be nice if the API would support
>> an implementation using interface objects that are allocated separately.
> 
> That's possible under the current proposal by combining an INCREF and a cast,
> right?
> 
> Let's also consider the possibility of representing interface objects using a
> two-word struct, like Go.

My original idea was to allocate interface objects on the heap and use a separate refcount. In this case, they should start with a refcount of 1. But then we’d need separate INCREF/DECREF macros for interfaces, which would mean a different API anyway and would be error-prone. Alternatively, it should be possible to make interface objects inherit from Obj, using the class vtable as itable (or at least making INCREF and DECREF compatible). With interfaces, Obj basically only needs the Inc_RefCount, Dec_RefCount and Destroy methods. Interface objects would then essentially be proxy objects that forward all method calls (except for memory management) to the wrapped object.

But it’s also possible to allocate interface structs on the stack, pass them by value and reuse the original object’s refcount. Refcount handling would then be more cumbersome and error-prone, though.

> This more or less works for invocations, but also means that you can't use the
> interface object in any place we would ordinarily use an `Obj*`.
> 
>    int
>    S_compare(void *va, void *vb) {
>        Comparer *a = (Comparer*)va;
>        Comparer *b = (Comparer*)vb;
>        return Comparer_Compare_To(*a, b->obj);  // complicated.
>    }

That’s true in this case but I don’t think that’s a typical example.

> So, the only benefit is slightly streamlined interface method dispatch, but it
> comes at the cost of making the C API for using interface objects more
> cumbersome.

IMO, the main benefit is fast method dispatch without the need for the possibly huge itables arrays. The main downside is that the creation of interface objects is expensive. Using the same pointer for objects and interface objects has many benefits but it means that we have to find the correct method pointer on every method invocation which is either slow or memory-hungry.

> It also makes it more difficult to convert to host objects and back.  For
> instance, we can't store a two-word struct in a Perl SV's IV slot.

Yeah, but only for stack-allocated structs. It shouldn’t be necessary to expose interface objects to duck typing languages anyway. Instead of an interface object, we should simply pass the original object. Name-based method dispatch should then do the right thing when calling an interface method. Conversion of host language objects to interface objects should also happen behind the scenes. We don’t want to force host language code to cast objects to interfaces manually.

> That doesn't seem worthwhile.  It might be different if the compiler were able
> to perform conversions implicitly as with Go's compiler, but even then
> interoperability would suffer -- and Clownfish is all about interoperability.

I wouldn’t rule out this approach so quickly. We should also still consider “traditional” interfaces where every class must declare the interfaces it implements explicitly. This would make both the cast to interface types and method dispatch extremely cheap.

Nick


Re: [lucy-dev] Re: (CLOWNFISH-12) Clownfish interfaces

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Mon, Sep 15, 2014 at 3:01 AM, Nick Wellnhofer <we...@aevum.de> wrote:

> Originally, I didn't think of it as a cast operation but more like a
> constructor for interface objects. It would be nice if the API would support
> an implementation using interface objects that are allocated separately.

That's possible under the current proposal by combining an INCREF and a cast,
right?

Let's also consider the possibility of representing interface objects using a
two-word struct, like Go.

    typedef struct Comparer {
        cfish_Obj       *obj;
        cfish_Interface *itable;
    } Comparer;

    static inline struct Comparer
    toCOMPARER(void *vself) {
        Comparer comparer;
        comparer.obj = (Obj*)vself;
        comparer.itable = fast_find_itable(comparer.obj);
        if (comparer.itable == NULL) {
            // slow lookup ...
        }
        return comparer;
    }

    static inline int32_t
    Comparer_Compare_To(Comparer self, Obj *other) {
        char *ptr = (char*)self.itable;
        Comparer_Compare_To_t method
            = *(Comparer_Compare_To_t*)(ptr + Comparer_Compare_To_OFFSET);
        return method(self.obj, other);
    }

    int32_t
    Foo_Compare_To_IMP(Foo *self, Obj *other) {
        ...
    }

This more or less works for invocations, but also means that you can't use the
interface object in any place we would ordinarily use an `Obj*`.

    int
    S_compare(void *va, void *vb) {
        Comparer *a = (Comparer*)va;
        Comparer *b = (Comparer*)vb;
        return Comparer_Compare_To(*a, b->obj);  // complicated.
    }

So, the only benefit is slightly streamlined interface method dispatch, but it
comes at the cost of making the C API for using interface objects more
cumbersome.

It also makes it more difficult to convert to host objects and back.  For
instance, we can't store a two-word struct in a Perl SV's IV slot.

That doesn't seem worthwhile.  It might be different if the compiler were able
to perform conversions implicitly as with Go's compiler, but even then
interoperability would suffer -- and Clownfish is all about interoperability.

Marvin Humphrey

[lucy-dev] Re: (CLOWNFISH-12) Clownfish interfaces

Posted by Nick Wellnhofer <we...@aevum.de>.
On 11/09/2014 18:55, Marvin Humphrey wrote:
> I'm skeptical about having a cast operation trigger a refcount increment.
>
> Besides the runtime cost of the refcount manipulation (or interface object
> allocation), the need to DECREF after use also makes programming more complex
> and increases the likelihood of memory leaks.
>
>      static int
>      compare(const void *va, const void *vb) {
>          Comparer *a = toCOMPARER(*(Obj**)va);
>          Comparer *b = toCOMPARER(*(Obj**)vb);
>          int retval = Comparer_Compare_To(a, b); // exception here causes leak
>          DECREF(a);
>          DECREF(b);
>          return retval;
>      }
>
> I understand that avoiding refcount manipulation narrows our implementation
> options.

Originally, I didn't think of it as a cast operation but more like a 
constructor for interface objects. It would be nice if the API would support 
an implementation using interface objects that are allocated separately.

>> 1. Allocate the itable array on demand. Objects of many classes are never
>> converted to interface types. This would only require an additional NULL
>> check.
>
> Right, and the NULL check is in the cast which is better than having it in the
> method dispatch routine.
>
> We could do this and also divide the itable array into, say, 8 parts.  Then
> during the lookup, figure out which array to look in by performing a
> mask/shift on the interface ID.

Good idea. It doesn't change the quadratic memory growth but it should give us 
enough leeway to support even thousands of classes and interfaces with 
reasonable memory usage.

> Hey, how about we encode the interface ID into the high bits of the method
> OFFSET variable?  Then there's only one global variable memory fetch needed
> during method dispatch.
>
>      static inline void
>      Futzer_Futz(Futzer *self) {
>          uint32_t offset = Futzer_Futz_OFFSET;
>          uint32_t itable_array_slot
>              = (offset & ITABLE_ARRAY_MASK) >> ITABLE_ARRAY_SHIFT;
>          Interface **itables = self->klass->itables[itable_array_slot];
>          uint32_t interface_id
>              = (offset & INTERFACE_ID_MASK) >> INTERFACE_ID_SHIFT;
>          Interface *interface = itables[interface_id];
>          char *ptr = (char*)interface + (offset & IMETHOD_OFFSET_MASK);
>          Futzer_Futz_t method = *(Futzer_Futz_t*)ptr;
>          method(self);
>      }
>
> Oh, and OFFSET vars should probably be uint32_t rather than size_t.  No class
> is ever going to have so many methods that we need a size_t.  That'll save
> some space on 64-bit systems.

Yes, we could use something like a 20/12 or a 16/16 bit split. This would 
limit the total number of interfaces and the number of methods per interface 
to a value between 2^12 and 2^20 but this shouldn't be a problem.
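For concreteness, a 20/12 split could be packed and unpacked as below. This sketch ignores the separate itable-array-slot field from the quoted code and treats the low 12 bits as a method offset or index; the macro and function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define IMETHOD_OFFSET_BITS  12u
#define INTERFACE_ID_SHIFT   IMETHOD_OFFSET_BITS
#define IMETHOD_OFFSET_MASK  ((UINT32_C(1) << IMETHOD_OFFSET_BITS) - 1u)
#define INTERFACE_ID_MAX     ((UINT32_C(1) << (32u - IMETHOD_OFFSET_BITS)) - 1u)

/* Combine a 20-bit interface ID and a 12-bit method offset in one uint32_t. */
static uint32_t
pack_offset(uint32_t interface_id, uint32_t imethod_offset) {
    assert(interface_id <= INTERFACE_ID_MAX);
    assert(imethod_offset <= IMETHOD_OFFSET_MASK);
    return (interface_id << INTERFACE_ID_SHIFT) | imethod_offset;
}

static uint32_t
unpack_interface_id(uint32_t packed) {
    return packed >> INTERFACE_ID_SHIFT;
}

static uint32_t
unpack_imethod_offset(uint32_t packed) {
    return packed & IMETHOD_OFFSET_MASK;
}
```

A 16/16 split only changes `IMETHOD_OFFSET_BITS`; the dispatch code stays the same.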

Nick


Re: [lucy-dev] Re: (CLOWNFISH-12) Clownfish interfaces

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Thu, Sep 11, 2014 at 3:44 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> On 11/09/2014 03:20, Marvin Humphrey wrote:
>>
>> +1, this is a better solution.  It should allow NULLs, though.
>>
>>           if (obj != NULL && !obj->klass->itables[Futzer_INTERFACE_ID]) {
>
> Yes, and it should also INCREF the result.

I'm skeptical about having a cast operation trigger a refcount increment.

Besides the runtime cost of the refcount manipulation (or interface object
allocation), the need to DECREF after use also makes programming more complex
and increases the likelihood of memory leaks.

    static int
    compare(const void *va, const void *vb) {
        Comparer *a = toCOMPARER(*(Obj**)va);
        Comparer *b = toCOMPARER(*(Obj**)vb);
        int retval = Comparer_Compare_To(a, b); // exception here causes leak
        DECREF(a);
        DECREF(b);
        return retval;
    }

I understand that avoiding refcount manipulation narrows our implementation
options.

> I'm still concerned about the quadratic space complexity. It shouldn't be a
> problem unless there are thousands of classes and interfaces but here are
> some numbers to illustrate the non-linear growth of the itable arrays:
>
>     50 classes, 20 interfaces: 8 KB
>     500 classes, 200 interfaces: 800 KB
>     5000 classes, 2000 interfaces: 80 MB

It's a reasonable concern.

> Some other ideas (simply brainstorming):

Here's an article on how Mono implements interface method dispatch:

    http://monoruntime.wordpress.com/2009/04/22/interface-method-dispatch-im-table-and-thunks/

It influenced my thinking on lazy loading.

> 1. Allocate the itable array on demand. Objects of many classes are never
> converted to interface types. This would only require an additional NULL
> check.

Right, and the NULL check is in the cast which is better than having it in the
method dispatch routine.

We could do this and also divide the itable array into, say, 8 parts.  Then
during the lookup, figure out which array to look in by performing a
mask/shift on the interface ID.

Hey, how about we encode the interface ID into the high bits of the method
OFFSET variable?  Then there's only one global variable memory fetch needed
during method dispatch.

    static inline void
    Futzer_Futz(Futzer *self) {
        uint32_t offset = Futzer_Futz_OFFSET;
        uint32_t itable_array_slot
            = (offset & ITABLE_ARRAY_MASK) >> ITABLE_ARRAY_SHIFT;
        Interface **itables = self->klass->itables[itable_array_slot];
        uint32_t interface_id
            = (offset & INTERFACE_ID_MASK) >> INTERFACE_ID_SHIFT;
        Interface *interface = itables[interface_id];
        char *ptr = (char*)interface + (offset & IMETHOD_OFFSET_MASK);
        Futzer_Futz_t method = *(Futzer_Futz_t*)ptr;
        method(self);
    }

Oh, and OFFSET vars should probably be uint32_t rather than size_t.  No class
is ever going to have so many methods that we need a size_t.  That'll save
some space on 64-bit systems.

Marvin Humphrey

[lucy-dev] Re: (CLOWNFISH-12) Clownfish interfaces

Posted by Nick Wellnhofer <we...@aevum.de>.
On 11/09/2014 03:20, Marvin Humphrey wrote:
> +1, this is a better solution.  It should allow NULLs, though.
>
>           if (obj != NULL && !obj->klass->itables[Futzer_INTERFACE_ID]) {

Yes, and it should also INCREF the result.

> I feel like the name could be shorter.  How about generating a `toCLASSNAME`
> macro for every Clownfish interface (and every class) which performs a safe
> runtime cast?  We'd have to accept `void*` which is somewhat dangerous,
> but oh well.
>
>      Futzer *futzer = toFUTZER(obj);
>
> This would allow us to eliminate DOWNCAST.
>
>      Foo *foo = (Foo*)DOWNCAST(obj, FOO);
>      Foo *foo = toFOO(obj);
>
> If we add a NOTNULL, it would also allow us to replace CERTIFY, at the cost of
> slightly degraded error messages on NULL values (because type information
> which is available to CERTIFY would not be available to NOTNULL).
>
>      Foo *foo = (Foo*)CERTIFY(obj, FOO);
>      Foo *foo = toFOO(NOTNULL(obj));

Sounds good.

I'm still concerned about the quadratic space complexity. It shouldn't be a 
problem unless there are thousands of classes and interfaces but here are some 
numbers to illustrate the non-linear growth of the itable arrays:

     50 classes, 20 interfaces: 8 KB
     500 classes, 200 interfaces: 800 KB
     5000 classes, 2000 interfaces: 80 MB
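These figures appear to correspond to one 8-byte pointer per (class, interface) pair, with KB and MB read as powers of ten. A tiny helper (hypothetical name) reproduces them:

```c
#include <assert.h>
#include <stdint.h>

/* Total bytes for dense itable arrays: one pointer per class per interface. */
static uint64_t
itable_array_bytes(uint64_t num_classes, uint64_t num_interfaces,
                   uint64_t pointer_size) {
    assert(pointer_size > 0);
    return num_classes * num_interfaces * pointer_size;
}
```

Scaling both counts by 10 multiplies the total by 100, which is the quadratic growth in question.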

Some other ideas (simply brainstorming):

1. Allocate the itable array on demand. Objects of many classes are never 
converted to interface types. This would only require an additional NULL check.

2. Store only the interfaces a class is known to implement in a list that is 
iterated for every method call. This sounds expensive, but typically, a class 
only implements very few interfaces.

3. Like 2 but lookup interfaces in a hash table, either per class, per 
interface, or globally. A specialized hash table could be made reasonably fast.

4. Approaches 2 and 3 could be sped up by caching the last used interface in 
the object struct. Then an interface method call would look like:

     static inline void
     Futzer_Futz(Futzer *self) {
         ITable *itable = self->itable_cache;
         if (itable != FUTZER) {
             itable = slow_lookup(self->klass, FUTZER);
             self->itable_cache = itable;
         }
         char *ptr = (char*)itable + Futzer_Futz_OFFSET;
         Futzer_Futz_t method = *(Futzer_Futz_t*)ptr;
         method(self);
     }

This would shift the memory overhead from the class to the object structs.

5. (The original approach I thought of in CLOWNFISH-12.) When casting an 
object to an interface type, allocate "interface objects" on the heap which 
contain a pointer to the object and the itable. Subsequent interface method 
calls would be made using the separate interface object.

Nick


Re: [lucy-dev] Re: (CLOWNFISH-12) Clownfish interfaces

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Wed, Sep 10, 2014 at 12:53 PM, Nick Wellnhofer <we...@aevum.de> wrote:
> Why don’t we simply initialize the itable array with NULLs and have a
> function like
>
>     Futzer*
>     Futzer_from_obj(Obj* obj) {
>         if (!obj->klass->itables[Futzer_INTERFACE_ID]) {
>             // Build the interface table and store it in itables array.
>             // Throw if the class doesn’t satisfy the interface.
>         }
>         return (Futzer*)obj;
>     }

+1, this is a better solution.  It should allow NULLs, though.

         if (obj != NULL && !obj->klass->itables[Futzer_INTERFACE_ID]) {

I feel like the name could be shorter.  How about generating a `toCLASSNAME`
macro for every Clownfish interface (and every class) which performs a safe
runtime cast?  We'd have to accept `void*` which is somewhat dangerous,
but oh well.

    Futzer *futzer = toFUTZER(obj);

This would allow us to eliminate DOWNCAST.

    Foo *foo = (Foo*)DOWNCAST(obj, FOO);
    Foo *foo = toFOO(obj);

If we add a NOTNULL, it would also allow us to replace CERTIFY, at the cost of
slightly degraded error messages on NULL values (because type information
which is available to CERTIFY would not be available to NOTNULL).

    Foo *foo = (Foo*)CERTIFY(obj, FOO);
    Foo *foo = toFOO(NOTNULL(obj));

Marvin Humphrey

Re: [lucy-dev] Re: (CLOWNFISH-12) Clownfish interfaces

Posted by Nick Wellnhofer <we...@aevum.de>.
On Sep 10, 2014, at 19:22 , Marvin Humphrey <ma...@rectangular.com> wrote:

> Each interface gets:
> 
> *   A unique integer id.
> *   A stub interface table filled with stub methods.
> 
> Each Class gets:
> 
> *   An array of interfaces, initialized to the stub interface tables for each
>    interface.
> 
> Interface method dispatch would look something like this:
> 
>    static inline void
>    Futzer_Futz(Futzer *self) {
>        Interface *interface = self->klass->itables[Futzer_INTERFACE_ID];
>        char *ptr = (char*)interface + Futzer_Futz_OFFSET;
>        Futzer_Futz_t method = *(Futzer_Futz_t*)ptr;
>        method(self);
>    }
> 
> If we add many interfaces dynamically, we'll have to reallocate, at a cost
> proportional to O(M*N) for M classes and N interfaces.  Thread safety when
> modifying Class data will be tricky but ought to be manageable.

Ah, you mean an array of itable pointers for every class. This would mean a per-class overhead of 8 KB for 1,000 interfaces. This is a bit much but having an itable array has the benefit that we don’t need a heap allocation for interface instances. Method dispatch is also fast with only one additional memory access for the itables member.

But I don’t understand why we need the stub interfaces. Although we could simply cast an Obj to any interface type, we should still have a function that checks whether an object satisfies the interface before calling any interface methods. Otherwise, we’ll get an exception on an attempt to call an unsupported method. This can happen much later and looks dangerous to me.

Why don’t we simply initialize the itable array with NULLs and have a function like

    Futzer*
    Futzer_from_obj(Obj* obj) {
        if (!obj->klass->itables[Futzer_INTERFACE_ID]) {
            // Build the interface table and store it in itables array.
            // Throw if the class doesn’t satisfy the interface.
        }
        return (Futzer*)obj;
    }

This can be used to cast an object to an interface type and make sure that the object’s class is compatible and the itables entry is populated.
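A filled-in version of that function might look like the sketch below. It makes several simplifying assumptions that are not part of the proposal: there is a single interface with a single method, the class records its `Futz` implementation in a `futz_impl` field (standing in for runtime method metadata), and failure is signalled by returning NULL instead of throwing.

```c
#include <assert.h>
#include <stddef.h>

enum { Futzer_INTERFACE_ID = 0, NUM_INTERFACES = 1 };

typedef struct Obj Obj;
typedef Obj Futzer;                    /* same pointer, different static type */
typedef void (*Futz_t)(Futzer *self);

typedef struct { Futz_t futz; } FutzerItable;

typedef struct Class {
    Futz_t       futz_impl;                /* NULL if the class lacks Futz */
    void        *itables[NUM_INTERFACES];  /* all NULL until the first cast */
    FutzerItable futzer_storage;           /* backing store for the itable */
} Class;

struct Obj { Class *klass; };

static int futz_calls = 0;
static void Frobber_Futz(Futzer *self) { (void)self; futz_calls++; }

/* Checked cast: builds the itable on first use. */
static Futzer*
Futzer_from_obj(Obj *obj) {
    assert(obj != NULL);
    Class *klass = obj->klass;
    if (!klass->itables[Futzer_INTERFACE_ID]) {
        if (klass->futz_impl == NULL) {
            return NULL;   /* stand-in for throwing an exception */
        }
        klass->futzer_storage.futz = klass->futz_impl;
        klass->itables[Futzer_INTERFACE_ID] = &klass->futzer_storage;
    }
    return (Futzer*)obj;
}

/* Dispatch needs no check: the cast guarantees the slot is populated. */
static void
Futzer_Futz(Futzer *self) {
    FutzerItable *itable
        = (FutzerItable*)self->klass->itables[Futzer_INTERFACE_ID];
    itable->futz(self);
}
```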

Nick


Re: [lucy-dev] Re: (CLOWNFISH-12) Clownfish interfaces

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Tue, Sep 9, 2014 at 2:01 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> On 09/09/2014 04:04, Marvin Humphrey wrote:

>> I think we can make that check redundant using DSO-style lazy loading
>> techniques: have each class start off with a dummy interface table
>> populated with stubs which lazily build the interface table, replacing
>> themselves and reinvoking on success or throwing an exception on failure.
>
> I can't see how this would work. The interfaces a class could implement
> aren't known at compile time.

Each interface gets:

*   A unique integer id.
*   A stub interface table filled with stub methods.

Each Class gets:

*   An array of interfaces, initialized to the stub interface tables for each
    interface.

Interface method dispatch would look something like this:

    static inline void
    Futzer_Futz(Futzer *self) {
        Interface *interface = self->klass->itables[Futzer_INTERFACE_ID];
        char *ptr = (char*)interface + Futzer_Futz_OFFSET;
        Futzer_Futz_t method = *(Futzer_Futz_t*)ptr;
        method(self);
    }

If we add many interfaces dynamically, we'll have to reallocate, at a cost
proportional to O(M*N) for M classes and N interfaces.  Thread safety when
modifying Class data will be tricky but ought to be manageable.
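The stub mechanism can be illustrated with a single interface and a single method. Everything named here is hypothetical: the class keeps one `futzer` itable pointer rather than an array, `futz_impl` stands in for runtime method metadata, and `abort()` stands in for throwing an exception.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Obj Obj;
typedef void (*Futz_t)(Obj *self);

typedef struct { Futz_t futz; } FutzerItable;

typedef struct Class {
    const FutzerItable *futzer;     /* starts out pointing at the stub table */
    Futz_t              futz_impl;  /* NULL if the class lacks Futz */
    FutzerItable        built;      /* storage for the real table */
} Class;

struct Obj { Class *klass; };

static int futz_calls = 0;
static void Frobber_Futz(Obj *self) { (void)self; futz_calls++; }

/* Dispatch is unconditional: no NULL check and no "implements" check. */
static void
Futzer_Futz(Obj *self) {
    assert(self != NULL);
    self->klass->futzer->futz(self);
}

/* The stub runs on the first call only.  It builds the real table, swaps
 * it in for the stub table, and reinvokes the method. */
static void
S_futz_stub(Obj *self) {
    Class *klass = self->klass;
    if (klass->futz_impl == NULL) {
        abort();                    /* stand-in for throwing an exception */
    }
    klass->built.futz = klass->futz_impl;
    klass->futzer = &klass->built;
    Futzer_Futz(self);
}

static const FutzerItable FUTZER_STUBS = { S_futz_stub };
```

After the first call the class no longer points at `FUTZER_STUBS`, so later calls go straight to the real method.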

Marvin Humphrey

[lucy-dev] Re: (CLOWNFISH-12) Clownfish interfaces

Posted by Nick Wellnhofer <we...@aevum.de>.
On 09/09/2014 04:04, Marvin Humphrey wrote:
> However, we have a problem with C implementation code: how can we get the C
> compiler to tell us whether invoking an interface method on a given value is
> safe?
>
> In Go, the compiler detects invalid assignment to an interface variable
> (<http://play.golang.org/p/mT743SWoxu>):
>
>      package main
>
>      type Futzer interface {
>          Futz()
>      }
>
>      func main() {
>          var futzer Futzer = "I am not a Futzer" // fails to compile.
>          futzer.Futz()
>      }
>
> In Clownfish-flavored C, we can perform a runtime check:

You can also perform a runtime check in Go (http://play.golang.org/p/8irDzxASe4):

     futzer, ok := any.(Futzer)
     if ok {
         futzer.Futz()
     } else {
         fmt.Println("Not a Futzer")
     }

> I think we can make that check redundant using DSO-style lazy loading
> techniques: have each class start off with a dummy interface table populated
> with stubs which lazily build the interface table, replacing themselves and
> reinvoking on success or throwing an exception on failure.

I can't see how this would work. The interfaces a class could implement aren't 
known at compile time.

Nick

[lucy-dev] Re: (CLOWNFISH-12) Clownfish interfaces

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Fri, Sep 5, 2014 at 1:07 PM, Nick Wellnhofer (JIRA) <ji...@apache.org> wrote:

> Nick Wellnhofer commented on CLOWNFISH-12:
> ------------------------------------------
>
> Another approach is "implicit" interfaces like in Go.

A couple more comments about Go-style interfaces...

Go is quite deliberately designed to avoid inheritance; its type embedding is
syntactic sugar in support of composition.

    http://en.wikipedia.org/wiki/Composition_over_inheritance

In Go, if the type Hello has a Greet() method, embedding Hello within HelloJr
makes it possible to call Greet() on a HelloJr object, but **the method
receiver is the inner type**.

Those who are used to inheritance would expect the second line of output for
this program to be "wassup world":

    http://play.golang.org/p/nqO1UKbzP6

    hello world
    hello world

Here are the shenanigans you have to go through to make that happen (which are
definitely not idiomatic Go):

    http://play.golang.org/p/PihtxohvHa
    https://paste.apache.org/foJK
    http://tech.t9i.in/2014/01/inheritance-semantics-in-go/

I understand why the Go folks believe their type system is superior and hope
that it will propagate, but I'm not yet persuaded that Clownfish interfaces
should emulate this aspect of Go's interfaces.

On a related note, Go's lack of support for inheritance is problematic for
classes which are designed to have methods overridden:

    http://lucy.apache.org/docs/perl/Lucy/Docs/Cookbook/CustomQueryParser.html#Extending-the-query-language

    To add support for trailing wildcards to our query language, we need to
    override expand_leaf() to accommodate PrefixQuery...

To support such functionality for Go, QueryParser would need to delegate to a
"LeafExpander" instead of calling Expand_Leaf() on itself.  To customize
behavior, one would create a QueryParser which has-a custom LeafExpander,
rather than creating a new subclass which is-a QueryParser yet overrides
Expand_Leaf().
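In plain C, the has-a arrangement might look something like this. It is only a sketch: `LeafExpander`, `QParser_Parse_Leaf`, and the string return values are placeholders for illustration, not the Lucy API.

```c
#include <assert.h>
#include <string.h>

typedef struct LeafExpander LeafExpander;
struct LeafExpander {
    /* Returns the name of the query type that the leaf expands to. */
    const char *(*expand_leaf)(const LeafExpander *self, const char *leaf);
};

typedef struct {
    const LeafExpander *leaf_expander;   /* has-a, not is-a */
} QueryParser;

static const char*
S_default_expand(const LeafExpander *self, const char *leaf) {
    (void)self;
    (void)leaf;
    return "TermQuery";
}

/* Custom behavior: a trailing wildcard becomes a PrefixQuery. */
static const char*
S_prefix_expand(const LeafExpander *self, const char *leaf) {
    size_t len = strlen(leaf);
    if (len > 1 && leaf[len - 1] == '*') {
        return "PrefixQuery";
    }
    return S_default_expand(self, leaf);
}

static const LeafExpander DEFAULT_EXPANDER = { S_default_expand };
static const LeafExpander PREFIX_EXPANDER  = { S_prefix_expand };

/* The parser delegates instead of calling an overridable method on itself. */
static const char*
QParser_Parse_Leaf(const QueryParser *self, const char *leaf) {
    assert(self != NULL && self->leaf_expander != NULL);
    return self->leaf_expander->expand_leaf(self->leaf_expander, leaf);
}
```

Customizing the parser then means constructing it with `&PREFIX_EXPANDER` rather than subclassing it.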

Interestingly, such an approach would work for other host languages as well.
Composition seems to map cleanly to more host languages than inheritance.

Marvin Humphrey