Posted to dev@lucy.apache.org by Marvin Humphrey <ma...@rectangular.com> on 2015/06/03 23:19:45 UTC

Re: [lucy-dev] Error handling using MAYBE types

On Wed, May 20, 2015 at 7:43 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> On 18/05/2015 02:09, Marvin Humphrey wrote:
>>
>> As an alternative to throwing exceptions or storing exception objects in
>> thread-local variables, let's consider encoding error information into
>> return values using a crude form of algebraic data types: pre-defined
>> "MAYBE" types which can be either an Err or something else.
>
> +1. This is a great idea.

This won't work with Ruby's GC, but here's another variant implementing a
quasi-tagged-union using the low bit to distinguish error from value.

https://gist.github.com/rectang/d229dfd3e27057540940

*   MAYBE types are typedef'd to `size_t`.  (This can encode enough bits to
    hold pointers and primitive types narrower than a pointer.  MAYBE types
    for wider primitives will need to be implemented as structs.)
*   The low bit is set to 1 to indicate success and 0 for failure.
*   Access to the MAYBE type is funneled through subroutines.

Unlike the previous proof-of-concept code which inspects the class pointer,
using the low bit as a type tag allows us to differentiate between an error
condition and a deliberate Err* return value.

The reason this approach is incompatible with the MRI Ruby runtime is that
MRI's conservative GC scans the C stack looking for Ruby object pointers, and
setting the low bit would hide them from it.  However, if we only manipulate
MAYBE types through subroutines, they could be two-slot structs under some
hosts (such as Ruby) and integers under others.
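To make the last point concrete, here is a rough sketch of how the same accessor names could sit on top of either representation. Everything in it is an assumption for illustration: the `MAYBE_USE_STRUCT` switch, the `MAYBEPtr` type and the `MAYBEPtr_*` function names do not appear in the gist.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical build switch: define MAYBE_USE_STRUCT for hosts whose
 * conservative GC must see unmodified pointers. */
#ifdef MAYBE_USE_STRUCT

/* Two-slot variant: the pointer is stored untouched next to a flag. */
typedef struct {
    void *ptr;
    int   is_good;   /* 1 for success, 0 for failure */
} MAYBEPtr;

static inline MAYBEPtr
MAYBEPtr_good(void *value) {
    MAYBEPtr maybe = { value, 1 };
    return maybe;
}

static inline MAYBEPtr
MAYBEPtr_bad(void *error) {
    MAYBEPtr maybe = { error, 0 };
    return maybe;
}

static inline int
MAYBEPtr_is_good(MAYBEPtr maybe) {
    return maybe.is_good;
}

static inline void*
MAYBEPtr_ptr(MAYBEPtr maybe) {
    return maybe.ptr;
}

#else

/* Integer variant: the low bit of the pointer carries the flag. */
typedef uintptr_t MAYBEPtr;

static inline MAYBEPtr
MAYBEPtr_good(void *value) {
    /* Relies on the pointer being at least 2-byte aligned. */
    assert(((uintptr_t)value & 1) == 0);
    return (uintptr_t)value | 1;
}

static inline MAYBEPtr
MAYBEPtr_bad(void *error) {
    return (uintptr_t)error;
}

static inline int
MAYBEPtr_is_good(MAYBEPtr maybe) {
    return (int)(maybe & 1);
}

static inline void*
MAYBEPtr_ptr(MAYBEPtr maybe) {
    return (void*)(maybe & ~(uintptr_t)1);
}

#endif
```

Callers that only go through `MAYBEPtr_good`, `MAYBEPtr_bad`, `MAYBEPtr_is_good` and `MAYBEPtr_ptr` should compile unchanged against either variant, which is the property the paragraph above depends on.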

Marvin Humphrey

// -------------------------------------------------------------------------


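/* strdup() is POSIX rather than ISO C; request its declaration explicitly. */
#define _POSIX_C_SOURCE 200809L

#include <stdio.h>
#include <stdlib.h>
#include <string.h>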

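/* Low bit of a MAYBE value: 1 means success, 0 means failure. */
#define MAYBE_SUCCESS_FLAG 1
/* Every bit except the flag; recovers the original pointer on success. */
#define MAYBE_SUCCESS_MASK (((size_t)-1) ^ MAYBE_SUCCESS_FLAG)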

typedef enum {
    OBJ,
    ERR,
    GREETER
} Class;

typedef struct Obj {
    Class klass;
} Obj;

typedef struct Err {
    Class klass;
    char *message;
} Err;

typedef struct Greeter {
    Class klass;
    char *greeting;
} Greeter;

typedef size_t MAYBEGreeter;

static inline MAYBEGreeter
MAYBEGreeter_good(Greeter *greeter) {
    return ((MAYBEGreeter)greeter) | MAYBE_SUCCESS_FLAG;
}

static inline MAYBEGreeter
MAYBEGreeter_bad(void *error) {
    return (MAYBEGreeter)error;
}

static void
S_fail_unwrap(size_t maybe, Class klass) {
    Err *err = (Err*)maybe;
    (void)klass;
    const char *message = err == NULL
                          ? "unexpected NULL"
                          : err->message;
    fprintf(stderr, "An error occurred: %s\n", message);
    exit(EXIT_FAILURE);
}

static inline Obj*
SI_unwrap_any(size_t maybe, Class klass) {
    if (!(maybe & MAYBE_SUCCESS_FLAG)) {
        S_fail_unwrap(maybe, klass);
    }
    return (Obj*)(maybe & MAYBE_SUCCESS_MASK);
}

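Err*
Err_new(const char *message) {
    Err *self = (Err*)malloc(sizeof(Err));
    if (self == NULL) {
        abort();  /* out of memory: cannot even build an error object */
    }
    self->klass = ERR;
    self->message = strdup(message);
    if (self->message == NULL) {
        abort();
    }
    return self;
}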

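MAYBEGreeter
Greeter_new(void) {
    Greeter *self = (Greeter*)malloc(sizeof(Greeter));
    if (self == NULL) {
        /* Report allocation failure through the MAYBE return value. */
        return MAYBEGreeter_bad(Err_new("out of memory"));
    }
    self->klass    = GREETER;
    self->greeting = "Hello, world!";
    return MAYBEGreeter_good(self);
}

/* Always fails; demonstrates the error path. */
MAYBEGreeter
Greeter_bad(void) {
    return MAYBEGreeter_bad(Err_new("*earth-shattering ka-boom*"));
}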

static inline Greeter*
Greeter_UNWRAP(MAYBEGreeter maybe) {
    return (Greeter*)SI_unwrap_any(maybe, GREETER);
}

int
main(void) {
    Greeter *good = Greeter_UNWRAP(Greeter_new());
    printf("Good Greeter says: %s\n", good->greeting);
    /* Unwrapping a failed MAYBE prints the error and exits, so the final
     * printf is never reached. */
    Greeter *bad = Greeter_UNWRAP(Greeter_bad());
    printf("Bad Greeter says: %s\n", bad->greeting);
    return 0;
}

Re: [lucy-dev] Error handling using MAYBE types

Posted by Nick Wellnhofer <we...@aevum.de>.

On 17/06/2015 05:11, Marvin Humphrey wrote:
>> You mean using a struct RObject as CFISH_OBJ_HEAD similar to what we plan
>> for the Python bindings? Whether this works depends on how Ruby allocates
>> and deallocates memory but I wouldn't be surprised if it turns out to be
>> infeasible.
>
> Yep, that's the case. :(  That's frustrating -- having to allocate a new Ruby
> object wrapper every time that Clownfish sends an object into Ruby space
> annoys me.  Since Ruby's GC actually watches the C stack, it would have been
> an opportunity to demonstrate that Clownfish was flexible enough to work
> within both a refcounting model and a tracing GC model while keeping the
> object lifetimes coordinated.

Even if it works, embedding RObject in CFISH_OBJ_HEAD blows up the size of 
every Clownfish object by a considerable amount. I count five machine words 
(40 bytes on 64-bit) in RObject:

     https://github.com/ruby/ruby/blob/trunk/include/ruby/ruby.h

With Python, the situation is much better. PyObject only contains two words, 
and one of them is a refcount that can be reused by Clownfish.

I think allocating a Ruby wrapper for every Clownfish object passed to Ruby 
space isn't too bad.

Another approach that we already discussed is to cache the Ruby VALUE in 
cfish_Obj, and keep a list of all these objects in some kind of registry. 
During GC, the registry could mark all the wrappers as alive.

     http://s.apache.org/cfish-gc

Here's a sketch of how it could work:

     https://gist.github.com/nwellnhof/8daf2871fda012fc23c6

Which approach is more performant depends on the application.

>> Also, this is only useful for subclassing Clownfish classes from the host
>> language. If we have some kind of interface support, we'll probably want to
>> remove this ability anyway. It doesn't make sense to have core features that
>> are only supported by a subset of host languages.
>
> This is what you described in the thread "Interface-based callbacks for Perl
> bindings", right?
>
>      http://markmail.org/message/nauw3psvuta7myn4

Yes. This design is based on abstract classes, but interfaces would work the
same way.

Nick

Re: [lucy-dev] Error handling using MAYBE types

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Thu, Jun 4, 2015 at 8:08 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> On 04/06/2015 16:37, Marvin Humphrey wrote:

>> I'm hoping that we can implement Clownfish under Ruby such that...
>>
>> *   Clownfish objects *are* Ruby objects.
>> *   The Clownfish refcounting routines are made no-ops.
>> *   A `Visit_Refs` method can be factored out of our current `Destroy`
>>     methods which will be used for decref'ing under refcounting hosts and
>>     for the "mark" phase of mark-and-sweep GC under Ruby (and Python).
>
> You mean using a struct RObject as CFISH_OBJ_HEAD similar to what we plan
> for the Python bindings? Whether this works depends on how Ruby allocates
> and deallocates memory but I wouldn't be surprised if it turns out to be
> infeasible.

Yep, that's the case. :(  That's frustrating -- having to allocate a new Ruby
object wrapper every time that Clownfish sends an object into Ruby space
annoys me.  Since Ruby's GC actually watches the C stack, it would have been
an opportunity to demonstrate that Clownfish was flexible enough to work
within both a refcounting model and a tracing GC model while keeping the
object lifetimes coordinated.

> Also, this is only useful for subclassing Clownfish classes from the host
> language. If we have some kind of interface support, we'll probably want to
> remove this ability anyway. It doesn't make sense to have core features that
> are only supported by a subset of host languages.

This is what you described in the thread "Interface-based callbacks for Perl
bindings", right?

    http://markmail.org/message/nauw3psvuta7myn4

I guess I'm stuck in my old mental model and haven't quite grokked all the
details.

Marvin Humphrey

Re: [lucy-dev] Error handling using MAYBE types

Posted by Nick Wellnhofer <we...@aevum.de>.

On 04/06/2015 16:37, Marvin Humphrey wrote:
> On Thu, Jun 4, 2015 at 7:06 AM, Nick Wellnhofer <we...@aevum.de> wrote:
>
>> But we don't set the low bit on the Ruby object pointer (VALUE). Pointers to
>> Clownfish objects will probably end up in the `data` field of struct RData.
>> Ruby shouldn't care about the contents of this field.
>
> I'm hoping that we can implement Clownfish under Ruby such that...
>
> *   Clownfish objects *are* Ruby objects.
> *   The Clownfish refcounting routines are made no-ops.
> *   A `Visit_Refs` method can be factored out of our current `Destroy` methods
>      which will be used for decref'ing under refcounting hosts and for the
>      "mark" phase of mark-and-sweep GC under Ruby (and Python).

You mean using a struct RObject as CFISH_OBJ_HEAD similar to what we plan for 
the Python bindings? Whether this works depends on how Ruby allocates and 
deallocates memory but I wouldn't be surprised if it turns out to be infeasible.

Also, this is only useful for subclassing Clownfish classes from the host 
language. If we have some kind of interface support, we'll probably want to 
remove this ability anyway. It doesn't make sense to have core features that 
are only supported by a subset of host languages.

Nick

Re: [lucy-dev] Error handling using MAYBE types

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Thu, Jun 4, 2015 at 7:06 AM, Nick Wellnhofer <we...@aevum.de> wrote:

> But we don't set the low bit on the Ruby object pointer (VALUE). Pointers to
> Clownfish objects will probably end up in the `data` field of struct RData.
> Ruby shouldn't care about the contents of this field.

I'm hoping that we can implement Clownfish under Ruby such that...

*   Clownfish objects *are* Ruby objects.
*   The Clownfish refcounting routines are made no-ops.
*   A `Visit_Refs` method can be factored out of our current `Destroy` methods
    which will be used for decref'ing under refcounting hosts and for the
    "mark" phase of mark-and-sweep GC under Ruby (and Python).
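As a rough illustration of the third bullet, the sketch below shows one traversal routine serving both purposes. The `Obj` layout, the `Visitor` callback type and all function names are assumptions made up for this example, not the Clownfish API.

```c
#include <stdlib.h>

typedef struct Obj Obj;
typedef void (*Visitor)(Obj *child, void *context);

struct Obj {
    int  refcount;
    int  marked;
    Obj *left;     /* owned references; either may be NULL */
    Obj *right;
};

/* Hypothetical Visit_Refs: report every owned reference to the visitor. */
static void
Obj_visit_refs(Obj *self, Visitor visit, void *context) {
    if (self->left)  { visit(self->left, context); }
    if (self->right) { visit(self->right, context); }
}

static void Obj_decref(Obj *self);
static void Obj_mark(Obj *self);

static void
S_decref_visitor(Obj *child, void *context) {
    (void)context;
    Obj_decref(child);
}

static void
S_mark_visitor(Obj *child, void *context) {
    (void)context;
    Obj_mark(child);
}

/* Refcounting host: destruction releases children via the traversal. */
static void
Obj_decref(Obj *self) {
    if (--self->refcount == 0) {
        Obj_visit_refs(self, S_decref_visitor, NULL);
        free(self);
    }
}

/* Tracing host: the mark phase reuses the same traversal. */
static void
Obj_mark(Obj *self) {
    if (self->marked) { return; }
    self->marked = 1;
    Obj_visit_refs(self, S_mark_visitor, NULL);
}

/* Takes ownership of the references passed as left and right. */
static Obj*
Obj_new(Obj *left, Obj *right) {
    Obj *self = (Obj*)calloc(1, sizeof(Obj));
    if (self == NULL) { abort(); }
    self->refcount = 1;
    self->left     = left;
    self->right    = right;
    return self;
}
```

The only class-specific knowledge lives in `Obj_visit_refs`; whether the host decrements or marks is decided by the visitor that gets passed in.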

Marvin Humphrey

Re: [lucy-dev] Error handling using MAYBE types

Posted by Nick Wellnhofer <we...@aevum.de>.

On 03/06/2015 23:19, Marvin Humphrey wrote:
> *   MAYBE types are typedef'd to `size_t`.  (This can encode enough bits to
>      hold pointers and primitive types narrower than a pointer.  MAYBE types
>      for wider primitives will need to be implemented as structs.)
> *   The low bit is set to 1 to indicate success and 0 for failure.
> *   Access to the MAYBE type is funneled through subroutines.
>
> Unlike the previous proof-of-concept code which inspects the class pointer,
> using the low bit as a type tag allows us to differentiate between an error
> condition and a deliberate Err* return value.

I like the idea of using tagged pointers but I'd still prefer a union for 
"MAYBE" types to improve type safety. If every "MAYBE" type is a size_t (or 
uintptr_t), they can be used interchangeably and the compiler won't complain 
about assigning a MAYBEHash to a MAYBEVector, for example.
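A minimal sketch of what I have in mind, with the caveat that the `Hash` and `Vector` structs here are stand-ins and the function names are made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the real Clownfish classes. */
typedef struct Hash   { int size; } Hash;
typedef struct Vector { int size; } Vector;

/* Each MAYBE type is its own union, so the compiler treats them as
 * distinct types even though both hold a single tagged word. */
typedef union MAYBEHash   { uintptr_t bits; } MAYBEHash;
typedef union MAYBEVector { uintptr_t bits; } MAYBEVector;

static inline MAYBEHash
MAYBEHash_good(Hash *hash) {
    MAYBEHash maybe;
    maybe.bits = (uintptr_t)hash | 1;   /* low bit 1 == success */
    return maybe;
}

static inline int
MAYBEHash_is_good(MAYBEHash maybe) {
    return (int)(maybe.bits & 1);
}

/* Returns the Hash on success, NULL if the MAYBE holds an error. */
static inline Hash*
MAYBEHash_unwrap(MAYBEHash maybe) {
    if (!MAYBEHash_is_good(maybe)) {
        return NULL;
    }
    return (Hash*)(maybe.bits & ~(uintptr_t)1);
}

static inline MAYBEVector
MAYBEVector_good(Vector *vector) {
    MAYBEVector maybe;
    maybe.bits = (uintptr_t)vector | 1;
    return maybe;
}

/* With these definitions the following line no longer compiles, because
 * the two union types are incompatible:
 *
 *     MAYBEVector v = MAYBEHash_good(hash);
 */
```

The wrapper should cost nothing at runtime on common ABIs, since a union holding one word is normally passed and returned like the word itself.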

> The reason this approach is incompatible with the MRI Ruby runtime is that
> MRI's conservative GC scans the C stack looking for Ruby object pointers, and
> setting the low bit would hide them from it.  However, if we only manipulate
> MAYBE types through subroutines, they could be two-slot structs under some
> hosts (such as Ruby) and integers under others.

But we don't set the low bit on the Ruby object pointer (VALUE). Pointers to 
Clownfish objects will probably end up in the `data` field of struct RData. 
Ruby shouldn't care about the contents of this field.

Nick