Posted to dev@lucy.apache.org by Marvin Humphrey <ma...@rectangular.com> on 2009/03/20 01:33:55 UTC

Reference counting inside a GC host (was "real time updates")

On Tue, Mar 17, 2009 at 05:50:14AM -0400, Michael McCandless wrote:

> > There are some quirks, though, with how it manages host objects.
> > The default behavior is to create a host language object at the same
> > time as the Boilerplater object, and have the host object manage the
> > refcount.
> 
> Hmm, sounds tricky... because there are typically consumers in C and
> in the Host language and both need to incRef/decRef.

Indeed, we need to accommodate refcounting ops both in the Lucy core and in the
Host.  For a refcounted Host like Perl or Python, all of these ops will affect
a single, *unified* refcount which resides in the cached Perl/Python object at
self->ref.host_obj.  That's what I meant by "have the host manage the
refcount" -- I wasn't clear enough that Lucy would be able to manipulate that
host refcount using wrapper methods.

The Lucy::Obj header at trunk/core/Lucy/Obj.bp will declare Inc_RefCount() and
Dec_RefCount() methods:

    /** Increment an object's refcount.
     * 
     * @return the object, allowing an assignment idiom.
     */
    public incremented Obj* 
    Inc_RefCount(Obj *self);

    /** Decrement an object's refcount, calling Destroy() if it hits 0.
     * 
     * @return the modified refcount.
     */
    u32_t
    Dec_RefCount(Obj *self);

However, no implementation for these methods is provided in
trunk/core/Lucy/Obj.c.  It will be up to the bindings to provide an
implementation, or a link error will occur.

For the Perl bindings, we'll provide a second
Obj.c at trunk/perl/xs/Lucy/Obj.c which will contain the following:

    lucy_Obj*
    lucy_Obj_inc_refcount(lucy_Obj *self)
    {
        SvREFCNT_inc_simple_void_NN((SV*)self->ref.host_obj);
        return self;
    }

    chy_u32_t
    lucy_Obj_dec_refcount(lucy_Obj *self)
    {
        chy_u32_t modified_refcount = SvREFCNT((SV*)self->ref.host_obj) - 1;
        /* If the SV's refcount falls to 0, DESTROY will be invoked from
         * Perl-space.
         */
        SvREFCNT_dec((SV*)self->ref.host_obj);
        return modified_refcount;
    }
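
If we end up wanting a refcount accessor as well, the Perl version would
presumably be equally trivial.  (This is only a sketch, not actual KS/Lucy
code.)

    chy_u32_t
    lucy_Obj_get_refcount(lucy_Obj *self)
    {
        /* The unified refcount lives in the cached Perl SV. */
        return SvREFCNT((SV*)self->ref.host_obj);
    }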

That's how most objects in Lucy will be managed.  However, that approach isn't
ideal for all of them.

The first, obvious objection to caching a host object inside every single Lucy
object is that it wastes memory for those objects which never venture into
Host-space; an integer refcount would require less overhead.  The "FastObj"
class was originally written to address this concern.

However, that's not a major problem unless we're creating and destroying a
boatload of small objects.  Lucene 1.4.3 was a profligate wastrel in this
regard, but KinoSearch's basic architecture has gotten pretty lean and has
room to get leaner still.  If memory use and speed were the only reasons to
use FastObj, I think we could kill it off.

However, there's a second, more annoying problem.  It's not possible to
declare static structs which contain e.g. a Perl object, because all Perl
objects are malloc'd at runtime.  That's inconvenient for declaring things
like CharBuf literals or VTables:

   /* Can't do this unless CharBuf is a subclass of FastObj. */
   static CharBuf foo = {
        (VTable*)&CHARBUF,
        1,      /* ref.count */
        "foo",  /* character data */
        3,      /* size */
        4       /* capacity (includes terminating NULL) */
   };

It's probably possible to initialize all of our VTables, CharBuf literals, and
such in a bootstrap routine, but setting something like that up is enough of a
pain that I haven't gone and made the change in KS.
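
Something along these lines is what I have in mind -- purely a sketch, with
member names guessed from the snippets above and make_host_obj() standing in
for whatever hypothetical helper creates the Perl/Python wrapper:

    /* Hypothetical bootstrap routine, run once at load time, which fills in
     * the statics we can't declare as compile-time constants because their
     * host objects have to be malloc'd at runtime. */
    static CharBuf foo;

    void
    bootstrap_charbuf_literals(void)
    {
        foo.vtable       = (VTable*)&CHARBUF;
        foo.ref.host_obj = make_host_obj(&foo);   /* hypothetical helper */
        foo.ptr          = "foo";                 /* character data */
        foo.size         = 3;
        foo.cap          = 4;                     /* includes terminating NULL */
    }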

I'd really like to kill off FastObj just for the sake of simplicity, though.

> > I've tried searching the web for resources on how to make
> > refcounting and GC coexist happily, but I haven't found anything so
> > far.  If anybody's got strong google-fu today or actually knows and
> > can recommend some literature, I'm all ears.
> 
> This is tricky!

There's one scheme that I know will at least work under a tracing garbage
collector: the one used by Ferret.

    * Within the C portion of Lucy, perform integer refcounting.
    * Every time a unique host wrapper object is created, increment the
      refcount.
    * Every time a host wrapper is destroyed, decrement the refcount.

In other words, for Hosts that use tracing garbage collection, all Lucy
objects would use an integer refcount, and nobody would cache any host
objects.
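
In sketch form, the wrapper hooks might look roughly like this, assuming Lucy
objects carried a plain integer refcount member as the Ferret scheme requires
(the hook names are made up):

    /* Called by the binding whenever a new host wrapper is created around a
     * Lucy object. */
    static void
    S_wrapper_created(Obj *obj)
    {
        obj->refcount++;
    }

    /* Called from the host wrapper's finalizer when the collector reclaims
     * the wrapper. */
    static void
    S_wrapper_destroyed(Obj *obj)
    {
        if (--obj->refcount == 0) {
            Obj_Destroy(obj);
        }
    }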

However, the Ferret approach has a drawback: You create and destroy host
wrappers every time you cross the host/C boundary.  That'll create a
performance drag in some situations.

The Ferret scheme won't cause problems with light usage of the library,
because most of Lucy's work will be done within tight loops in the C core.  It
also doesn't stop you from attaching host data to the C object using the
"flyweight" design pattern, a.k.a. the "inside-out object" pattern, because
you can still key data off of the unchanging C object memory address.

However, once you start writing subclasses, all that OO overhead at the host/C
boundary is going to slow down tight loops like Scorer_Next.  

Caching a host object for the life of the Lucy object solves that problem,
but I'm not sure how to do that within the context of a tracing garbage
collector.  

We can assume that the program initiates within Host-space; most Lucy objects
will be able to trace back to the host.  However, independent objects that we
create as statics, globals, or C stack vars won't be visible to the garbage
collector and will get reclaimed prematurely.

Any ideas on how to pull off the caching trick?  Is there something we can do
if we allocate space within all of our Lucy objects for *both* an integer
refcount and a cached host object?  Do we need to add all new objects to a
giant Hash that we tell the host about, and yank C stack vars out of that Hash
before the C function returns?
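
To make that last idea concrete, I'm picturing something vaguely like this --
pure speculation, and Registry_Add()/Registry_Remove() are hypothetical:

    void
    some_core_function(void)
    {
        /* A C stack var that the host's tracing collector can't see... */
        VArray *temp = VA_new(10);
        Registry_Add((Obj*)temp);      /* ...so tell the giant Hash about it */

        /* ... do some work with temp ... */

        Registry_Remove((Obj*)temp);   /* yank it out before returning */
        Obj_Dec_RefCount((Obj*)temp);
    }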

Marvin Humphrey


Re: Reference counting inside a GC host (was "real time updates")

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Tue, Mar 31, 2009 at 12:06 AM, Marvin Humphrey
<ma...@rectangular.com> wrote:
> On Mon, Mar 30, 2009 at 06:19:09PM -0400, Michael McCandless wrote:
>
>> > Analyzer: maybe not so much because the analysis method would be non-trivial.
>> > (Assuming that we settle on something like the KS "Inversion"-style Analyzers
>> > rather than the method-call-orgy of Lucene TokenStream.)
>>
>> What's an inversion style analyzer?
>
> Sorry, that wasn't a good way of putting it.  KS passes around Tokens as
> arrays rather than as iterators; Inversion is the class that holds Tokens, and
> it actually descends from VArray.
>
> It's hard to write an Analyzer in pure Perl that operates on multiple tokens
> and isn't a sloth.  Because of method call overhead, having to call next()
> for every token makes that even harder.

Ahh, OK.

>> OK, I'm convinced... it seems like we should stick with your approach
>> (make & cache host wrapper when requested or RC becomes 2).
>
> Thanks to you and Peter for driving the discussion towards an improved version
> of that model.

It's been great fun :)  Lucy looks to have a great infrastructure design.

Mike

Re: Reference counting inside a GC host (was "real time updates")

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Mon, Mar 30, 2009 at 06:19:09PM -0400, Michael McCandless wrote:

> > Analyzer: maybe not so much because the analysis method would be non-trivial.
> > (Assuming that we settle on something like the KS "Inversion"-style Analyzers
> > rather than the method-call-orgy of Lucene TokenStream.)
> 
> What's an inversion style analyzer?

Sorry, that wasn't a good way of putting it.  KS passes around Tokens as
arrays rather than as iterators; Inversion is the class that holds Tokens, and
it actually descends from VArray.

It's hard to write an Analyzer in pure Perl that operates on multiple tokens
and isn't a sloth.  Because of method call overhead, having to call next()
for every token makes that even harder.

> OK, I'm convinced... it seems like we should stick with your approach
> (make & cache host wrapper when requested or RC becomes 2).

Thanks to you and Peter for driving the discussion towards an improved version
of that model.

Marvin Humphrey


Re: Reference counting inside a GC host (was "real time updates")

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Sun, Mar 29, 2009 at 9:20 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
> On Sun, Mar 29, 2009 at 05:55:11AM -0400, Michael McCandless wrote:
>
>> > FWIW, the cached-host-object algo wins the tight loop performance contest.
>>
>> Which makes sense, but how often are such loops [expected to be] used
>> in practice?  EG we already thought Collector inside host would be
>> doable but expected poor performance.
>
> The performance hit will matter whenever someone overrides a tight-loop
> method.  How much it matters will vary from host to host.
>
> Probably any Scorer or Collector would feel it.  DataReader: it depends on
> whether there are Scorer implementations which call on it in tight loops.
>
> InStream and OutStream: definitely would have felt it, except that they're final
> classes so you can't override any methods.

Yeah OK.

> Analyzer: maybe not so much because the analysis method would be non-trivial.
> (Assuming that we settle on something like the KS "Inversion"-style Analyzers
> rather than the method-call-orgy of Lucene TokenStream.)

What's an inversion style analyzer?

> Searcher, Indexer, Query, QueryParser, DataWriter, IndexManager, Lock: no,
> because they don't run lightweight methods within tight loops.  FieldSpec,
> Schema: I don't think so because so far it's been possible to cache the return
> values of certain methods which would otherwise be called often.
> (KinoSearch::Index::Inverter performs such caching.)
>
>> How much work is being done when you create a new host-wrapper object?
>
> That will vary from Host to Host.  In Perl, we're using blessed scalars, which
> are about as cheap as Perl objects get.
>
> However, Perl method dispatch is comparatively slow; if method calls were
> faster, the effect would loom larger.  In Java, for instance, method calls are
> very fast, and creating a host wrapper object on each loop will probably cause
> a significant degradation much sooner.
>
> Put another way: caching a host wrapper object probably matters a lot more
> when method dispatch is fast.

OK.

>> Could you somehow re-use such objects?  (Sort of like how when you go
>> to a children's museum, to the water section, they provide a bunch of
>> yellow frocks that the kids can wear when they play with the water,
>> and return when they are done).
>
> Evocative metaphor. :)  So, who manages the pool?

It's a massively concurrent free for all with no locking :)  And
sometimes frocks get lost for a little while and a background sweeping
thread eventually reclaims them.

> Ideally, when we know we're going to be calling back to the host in a tight
> loop, we'd want the caller to cache the host object.  But that's not easy.
>
> Here's the basic implementation of Matcher_Collect():
>
>  void
>  Matcher_collect(Matcher *self, Collector *collector)
>  {
>    i32_t doc_num;
>    Collector_Set_Matcher(collector, self);
>    while (0 != (doc_num = Matcher_Next(self))) {
>      Coll_Collect(collector, doc_num);
>    }
>  }
>
> Say that the user has overridden Collector's Collect() method in the host, so
> it's slow.  To get best performance, we need to change how *Matcher* works,
> even though it's *Collector* that's the source of the slowdown.
>
>  void
>  Matcher_collect(Matcher *self, Collector *collector)
>  {
>    i32_t  doc_num;
>    void  *host_obj = Coll_To_Host(collector); /* <--- Cache wrapper here */
>
>    Collector_Set_Matcher(collector, self);
>    while (0 != (doc_num = Matcher_Next(self))) {
>      Host_callback(host_obj, "collect", 1, ARG_I32("doc_num", doc_num));
>    }
>
>    Coll_Destroy_Host(collector, host_obj); /* <--- Destroy wrapper here */
>  }
>
> (Host_callback() ordinarily takes a Lucy object as its first argument, but
> pretend that it takes a host wrapper object for now.)
>
> Coll_Collect doesn't know it's being called in a loop.  Even if it did, it
> could only cache host object wrappers using static or global vars. (Bye-bye
> thread safety, and what on earth do we do about subclasses?)

OK.

>  static void *Coll_collect_HOST_OBJ = NULL;
>  void
>  Coll_collect_OVERRIDE(Collector* self, i32_t doc_num)
>  {
>    /* Bleah. */
>    Coll_collect_HOST_OBJ = Coll_Validate_Or_Make_Wrapper(self,
>        Coll_collect_HOST_OBJ);
>    Host_callback(Coll_collect_HOST_OBJ, "collect", 1,
>        ARG_I32("doc_num", doc_num));
>  }
>
>> > We lose convenient use of the "inside-out-object" pattern. :(
>>
>> Neat.... I've seen a similar approach (though likely someone has given
>> / would give it a different named pattern)
>
> The "FlyWeight" design pattern is very similar.
>
>> In your example, shouldn't/couldn't the hash key somehow be the id of
>> the *Lucy* object, not of its fleeting host wrapper obj?
>
> Actually, that's what $$self is.
>
>   $self          <---- address of Perl object[1]
>   $$self         <---- address of Lucy object
>
> However, invoking $self->DESTROY on multiple wrappers around the same Lucy
> object is still a problem.

Hmmmm true.

>  sub DESTROY {
>      my $self = shift;
>      delete $foo{$$self};
>      $self->SUPER::DESTROY;  # Calls Dec_RefCount(), not Destroy() as it
>                              # would under the cached-object model.
>  }
>
> As soon as the wrapper has its refcount fall to 0, Perl invokes DESTROY, and
> despite the fact that the Lucy object doesn't get zapped yet, $foo{$$self}
> gets deleted.

OK, I'm convinced... it seems like we should stick with your approach
(make & cache host wrapper when requested or RC becomes 2).

Mike

Re: Reference counting inside a GC host (was "real time updates")

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Tue, Mar 31, 2009 at 12:52 AM, Marvin Humphrey
<ma...@rectangular.com> wrote:
> On Mon, Mar 30, 2009 at 06:34:18PM -0400, Michael McCandless wrote:
>
>> What is the vtable for vtable vtable?  Itself?
>
> This is going to get a bit "meta" -- prepare yourself.  :)
>
> The "vtable" member for VTable objects is &VTABLE.
>
> So, all of the following are true:
>
>  OBJ.vtable    == &VTABLE;
>  VARRAY.vtable == &VTABLE;
>  VTABLE.vtable == &VTABLE;

OK so VTABLE's vtable is simply itself.

> We need VTABLE because VTables are Lucy objects, and not just arrays of
> function pointers.  They have methods, and they also contain a couple very
> important pieces of class data, e.g. obj_alloc_size which is sizeof(Obj) for
> Obj, sizeof(VArray) for VArray, etc.

Got it.

>> >> Do VTables only store methods?  Or can they store fields as well?
>> >
>> > You mean could they store class data?  I suppose they could.  Initializing
>> > might get a little messy, and I'd want to make sure that we locked them down
>> > and made them stateless before threading starts.
>>
>> I actually meant data fields on the object, but class data ("static"
>> in Java) is also good.
>
> Oh, I think I understand now.  Each VTable object does have a few member vars:
> obj_alloc_size, vt_alloc_size, etc.

Actually I meant that e.g. SegMetaData (not sure on the name) would have a
String data member segmentName in it.  Is there an entry in VTable
for members that are data, not methods?

>> Hmm...  Python's cyclic collector won't collect cycles involving
>> classes that have __del__ since it can't guess a safe order to run the
>> __del__ methods.  (I know we're expecting people to just avoid making
>> cycles, but still important to know).
>
> That's a bummer.  But I think we'd need __del__ no matter what, since we need
> to trigger C-level destructors.

Yeah.  The whole "finalizers cannot be run in a deterministic manner
so we simply don't collect classes with finalizers" thing is scary...
all tracing collectors struggle with this.

>> you could probably simply use that host_obj
>> field to hold low value ref counts (which are not valid pointers).
>> Though that's scary-C-hack-territory :)
>
> If we could guarantee that, say, 0x0 (NULL), 0x1, 0x2, and 0x3 could never
> represent a pointer to a real host object for every single system that Lucy
> might be compiled on, we could probably prevent a few unnecessary host object
> creations that way -- more than 0x3 and we're at the point of diminishing
> returns.  And I think we can, because malloc() always returns word-aligned
> pointers, which 0x1, 0x2, and 0x3 would not be for any 32-bit or 64-bit system
> where NULL was 0x0.
>
> So, this is almost certainly safe, but makes the library less maintainable
> because the hack has to be explained.
>
> I think we should do it.
>
> If we're really paranoid, we can check the return value for To_Host() and
> throw an exception if it's less than 4.

Nice!  The scary things you can do in C...

Mike

Re: Reference counting inside a GC host (was "real time updates")

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Mon, Mar 30, 2009 at 06:34:18PM -0400, Michael McCandless wrote:

> What is the vtable for vtable vtable?  Itself?

This is going to get a bit "meta" -- prepare yourself.  :)

The "vtable" member for VTable objects is &VTABLE.

So, all of the following are true:

  OBJ.vtable    == &VTABLE;
  VARRAY.vtable == &VTABLE;
  VTABLE.vtable == &VTABLE;

We need VTABLE because VTables are Lucy objects, and not just arrays of
function pointers.  They have methods, and they also contain a couple very
important pieces of class data, e.g. obj_alloc_size which is sizeof(Obj) for
Obj, sizeof(VArray) for VArray, etc.

> >> Do VTables only store methods?  Or can they store fields as well?
> >
> > You mean could they store class data?  I suppose they could.  Initializing
> > might get a little messy, and I'd want to make sure that we locked them down
> > and made them stateless before threading starts.
> 
> I actually meant data fields on the object, but class data ("static"
> in Java) is also good.

Oh, I think I understand now.  Each VTable object does have a few member vars:
obj_alloc_size, vt_alloc_size, etc.

> Hmm...  Python's cyclic collector won't collect cycles involving
> classes that have __del__ since it can't guess a safe order to run the
> __del__ methods.  (I know we're expecting people to just avoid making
> cycles, but still important to know).

That's a bummer.  But I think we'd need __del__ no matter what, since we need
to trigger C-level destructors.

> you could probably simply use that host_obj
> field to hold low value ref counts (which are not valid pointers).
> Though that's scary-C-hack-territory :)

If we could guarantee that, say, 0x0 (NULL), 0x1, 0x2, and 0x3 could never
represent a pointer to a real host object for every single system that Lucy
might be compiled on, we could probably prevent a few unnecessary host object
creations that way -- more than 0x3 and we're at the point of diminishing
returns.  And I think we can, because malloc() always returns word-aligned
pointers, which 0x1, 0x2, and 0x3 would not be for any 32-bit or 64-bit system
where NULL was 0x0.

So, this is almost certainly safe, but makes the library less maintainable
because the hack has to be explained.

I think we should do it.  

If we're really paranoid, we can check the return value for To_Host() and
throw an exception if it's less than 4.
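
For illustration, the inc-refcount side of that hack might look roughly like
this -- a sketch only, using uintptr_t for the pointer/integer comparison and
a hypothetical Wrap_with_refcount() helper:

    Obj*
    Obj_inc_refcount(Obj *self)
    {
        uintptr_t raw = (uintptr_t)self->host_obj;
        if (raw < 0x4) {
            /* 0x0 through 0x3 can never be malloc'd pointers, so they encode
             * refcounts 1 through 4 directly in the host_obj slot. */
            if (raw < 0x3) {
                self->host_obj = (void*)(raw + 1);
            }
            else {
                /* The count is climbing past what we can stash, so create the
                 * real host object and transfer the count to it. */
                self->host_obj = Wrap_with_refcount(self, 5);
            }
        }
        else {
            PyINCREF((PyObject*)self->host_obj);
        }
        return self;
    }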

Marvin Humphrey


Re: Reference counting inside a GC host (was "real time updates")

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Sun, Mar 29, 2009 at 11:48 PM, Marvin Humphrey
<ma...@rectangular.com> wrote:
> On Sun, Mar 29, 2009 at 08:36:57AM -0400, Michael McCandless wrote:
>
>> Are VTables also considered Obj's?
>
> Yes.  They are structs, with the last member being a "flexible array member"
> which can hold any number of "method_t" pointers.  Here's the list of members
> (omitting host_obj and vtable, which are inherited).
>
>    VTable            *parent;
>    CharBuf           *name;         /* class name */
>    u32_t              flags;
>    size_t             obj_alloc_size;
>    size_t             vt_alloc_size;
>    Callback         **callbacks;
>    boil_method_t[1]   methods;      /* flexible array */
>
> I've included the generated code for the VARRAY VTable below my sig.  It's a
> little messy because some mild hacks are needed to satisfy C89 and portability
> constraints.

What is the vtable for vtable vtable?  Itself?

>> Do VTables only store methods?  Or can they store fields as well?
>
> You mean could they store class data?  I suppose they could.  Initializing
> might get a little messy, and I'd want to make sure that we locked them down
> and made them stateless before threading starts.

I actually meant data fields on the object, but class data ("static"
in Java) is also good.

>> Can an arbitrary Obj at runtime become a VTable for another Obj?
>> (True "prototype" programming language).  Seems like "no", because an
>> arbitrary Obj is not allowed to add new members wrt its parent (only
>> new VTables can do so).
>
> Correct.
>
>> Does VARRAY (VTable for VArray objects) hold a reference to OBJ?
>
> Yes: self->vtable->parent.  The Obj_Is_A() method, similar to Java's
> "instanceof" follows the chain upwards:
>
>    bool_t
>    Obj_is_a(Obj *self, VTable *target_vtable)
>    {
>        VTable *vtable = self ? self->vtable : NULL;
>        while (vtable != NULL) {
>            if (vtable == target_vtable) { return true; }
>            vtable = vtable->parent;
>        }
>        return false;
>    }

OK.

>> How are these trees of VTables init'd?
>
> All core classes have VTables which are global structs initialized at compile
> time.  Boilerplater spits them out into a file called "boil.c".
>
> User subclass VTables are allocated on the fly at runtime.

OK.

>> And Lucy objs are single inheritance.
>
> Correct.
>
>> So a VArray is allowed to have C NULLs in its elems array (vs say Java
>> which always inits the array to hold Java null's).
>
> That's how it's implemented now.  Changing it over to something like Java
> null's might be a good idea.

Saves having to NULL check, though it's wasteful if the code that
created the VArray will immediately then fill it in.

>> Is there an explicit object in Lucy that represents null (java), None
>> (Python), etc.?
>
> Yes: UNDEF, a singleton belonging to the class Undefined.
> However, it's not used very much.

OK.

>> > First, note that the destructor for VArray invokes the destructor of its
>> > parent class, Obj.  This superclass call makes it possible for us to add
>> > members to a base class without having to manually edit all subclasses.
>>
>> Great.
>
> The same is true at construction time; each class implements two functions,
> new() and init(), and subclasses call their parent class's init():
>
>    TermQuery*
>    TermQuery_new(const CharBuf *field, const Obj *term)
>    {
>        TermQuery *self = (TermQuery*)CREATE(NULL, TERMQUERY);
>        return TermQuery_init(self, field, term);
>    }
>
>    TermQuery*
>    TermQuery_init(TermQuery *self, const CharBuf *field, const Obj *term)
>    {
>        Query_init((Query*)self, 1.0f);
>        self->field       = CB_Clone(field);
>        self->term        = Obj_Clone(term);
>        return self;
>    }

OK.

>> I'm a bit confused: what if you have a Lucy obj, that's got a cached
>> host obj, such that the host obj is not referred to anywhere in the
>> host language, but is referred to in Lucy, and Lucy finally decrefs
>> its last reference.
>
> At that point, the Lucy object and the Python object share a single refcount,
> which resides in the Python object's ob_refcnt member.  When you call PyDECREF
> on the Python object and ob_refcnt falls to 0, Python then invokes the
> "__del__" method.  We will define "__del__" so that it invokes Destroy().

AHH, sorry, I see now.

Hmm...  Python's cyclic collector won't collect cycles involving
classes that have __del__ since it can't guess a safe order to run the
__del__ methods.  (I know we're expecting people to just avoid making
cycles, but still important to know).

>> How is the cycle broken in that case?
>
> There's no reference cycle because the Lucy object and the Python object share
> a single unified refcount.  They have C pointers which point at each other, but
> they don't hold refcounts open against each other.
>
>> (Ie, Destroy should be invoked via Lucy).
>
> The call sequence will be:
>
>    Obj_Dec_Refcount(lucy_obj);
>    PyDECREF(lucy_obj->host_obj);
>    python_obj.__del__()
>    Obj_Destroy(lucy_obj);

OK.

>> OK.  That 1 refCount "belonging" to Lucy.  This is essentially an
>> efficient way to represent the common case of "only Lucy has a single
>> reference to this object".
>
> Yes.  In addition, it allows us to initialize structs at compile-time.
>
>   CharBuf snapshot_new = {
>       (VTable*)&CHARBUF, /* vtable */
>       NULL,              /* <------ NULL host_obj */
>       "snapshot.new",    /* ptr */
>       12,                /* size */
>       13                 /* capacity */
>   }
>
> Since Python objects live on the heap, we can't create them at compile-time.
> The NULL conveniently stands in for them.

Great.

>> >  void*
>> >  Obj_to_host(Obj *self)
>> >  {
>> >      if (self->host_obj) {
>> >          /* The Python object is already cached, so incref it and return. */
>> >          PyINCREF((PyObject*)self->host_obj);
>> >          return self->host_obj;
>> >      }
>> >      else {
>> >          /* Lazily create Python object. */
>> >          self->host_obj = PyCObject_FromVoidPtr((void*)self, Obj_Destroy)
>> >      }
>> >  }
>>
>> (missing a return on the else clause, but I get it).
>
> Heh.  Why won't my email client test my code for me? :)

Someday :)

>> So this is a great approach, in that a host obj
>> is not created immediately on creating a Lucy obj.  However, it's
>> still "falsely" created, in order to track refCount > 1 from within
>> Lucy, even when the obj never crosses the bridge.
>
> Yes.  But if we are parsimonious about creating objects in the first place, it
> doesn't matter so much if constant costs per object are large.

Yes.

>> If only host languages let us override what decRef does for a given
>> obj... then we could break the tight cycles ourself and only allocate
>> a host obj when needed.
>
> There's an alternative.  We can waste an extra 4 bytes per object to hold an
> integer refcount, and use that unless a host object has been cached.  The
> instant that the host object has been cached, we set its refcount and use that
> instead.
>
> I'm not sure that's either clearer or faster, but it does mean that we
> wouldn't ever needlessly create host objects.

That's interesting... you could probably simply use that host_obj
field to hold low value ref counts (which are not valid pointers).
Though that's scary-C-hack-territory :)

Mike

Re: Reference counting inside a GC host (was "real time updates")

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sun, Mar 29, 2009 at 08:36:57AM -0400, Michael McCandless wrote:

> Are VTables also considered Obj's?  

Yes.  They are structs, with the last member being a "flexible array member"
which can hold any number of "method_t" pointers.  Here's the list of members
(omitting host_obj and vtable, which are inherited).

    VTable            *parent;
    CharBuf           *name;         /* class name */
    u32_t              flags;
    size_t             obj_alloc_size;
    size_t             vt_alloc_size;
    Callback         **callbacks;
    boil_method_t[1]   methods;      /* flexible array */

I've included the generated code for the VARRAY VTable below my sig.  It's a
little messy because some mild hacks are needed to satisfy C89 and portability
constraints.

> Do VTables only store methods?  Or can they store fields as well?

You mean could they store class data?  I suppose they could.  Initializing
might get a little messy, and I'd want to make sure that we locked them down
and made them stateless before threading starts.

> Can an arbitrary Obj at runtime become a VTable for another Obj?
> (True "prototype" programming language).  Seems like "no", because an
> arbitrary Obj is not allowed to add new members wrt its parent (only
> new VTables can do so).

Correct.

> Does VARRAY (VTable for VArray objects) hold a reference to OBJ?  

Yes: self->vtable->parent.  The Obj_Is_A() method, similar to Java's
"instanceof" follows the chain upwards:

    bool_t
    Obj_is_a(Obj *self, VTable *target_vtable)
    {
        VTable *vtable = self ? self->vtable : NULL;
        while (vtable != NULL) {
            if (vtable == target_vtable) { return true; }
            vtable = vtable->parent;
        }   
        return false;
    }

> How are these trees of VTables init'd?

All core classes have VTables which are global structs initialized at compile
time.  Boilerplater spits them out into a file called "boil.c".

User subclass VTables are allocated on the fly at runtime.

> And Lucy objs are single inheritance.

Correct.

> So a VArray is allowed to have C NULLs in its elems array (vs say Java
> which always inits the array to hold Java null's).  

That's how it's implemented now.  Changing it over to something like Java
null's might be a good idea.

> Is there an explicit object in Lucy that represents null (java), None
> (Python), etc.?

Yes: UNDEF, a singleton belonging to the class Undefined.
However, it's not used very much.

> > First, note that the destructor for VArray invokes the destructor of its
> > parent class, Obj.  This superclass call makes it possible for us to add
> > members to a base class without having to manually edit all subclasses.
> 
> Great.

The same is true at construction time; each class implements two functions,
new() and init(), and subclasses call their parent class's init():

    TermQuery*
    TermQuery_new(const CharBuf *field, const Obj *term)
    {
        TermQuery *self = (TermQuery*)CREATE(NULL, TERMQUERY);
        return TermQuery_init(self, field, term);
    }

    TermQuery*
    TermQuery_init(TermQuery *self, const CharBuf *field, const Obj *term)
    {
        Query_init((Query*)self, 1.0f);
        self->field       = CB_Clone(field);
        self->term        = Obj_Clone(term);
        return self;
    }

> I'm a bit confused: what if you have a Lucy obj, that's got a cached
> host obj, such that the host obj is not referred to anywhere in the
> host language, but is referred to in Lucy, and Lucy finally decrefs
> its last reference.  

At that point, the Lucy object and the Python object share a single refcount,
which resides in the Python object's ob_refcnt member.  When you call PyDECREF
on the Python object and ob_refcnt falls to 0, Python then invokes the
"__del__" method.  We will define "__del__" so that it invokes Destroy().

> How is the cycle broken in that case?  

There's no reference cycle because the Lucy object and the Python object share
a single unified refcount.  They have C pointers which point at each other, but
they don't hold refcounts open against each other.

> (Ie, Destroy should be invoked via Lucy).

The call sequence will be: 

    Obj_Dec_Refcount(lucy_obj);
    PyDECREF(lucy_obj->host_obj);
    python_obj.__del__()
    Obj_Destroy(lucy_obj);

> OK.  That 1 refCount "belonging" to Lucy.  This is essentially an
> efficient way to represent the common case of "only Lucy has a single
> reference to this object".

Yes.  In addition, it allows us to initialize structs at compile-time.

   CharBuf snapshot_new = {
       (VTable*)&CHARBUF, /* vtable */
       NULL,              /* <------ NULL host_obj */
       "snapshot.new",    /* ptr */
       12,                /* size */
       13                 /* capacity */
   }

Since Python objects live on the heap, we can't create them at compile-time.
The NULL conveniently stands in for them.

> >  void*
> >  Obj_to_host(Obj *self)
> >  {
> >      if (self->host_obj) {
> >          /* The Python object is already cached, so incref it and return. */
> >          PyINCREF((PyObject*)self->host_obj);
> >          return self->host_obj;
> >      }
> >      else {
> >          /* Lazily create Python object. */
> >          self->host_obj = PyCObject_FromVoidPtr((void*)self, Obj_Destroy)
> >      }
> >  }
> 
> (missing a return on the else clause, but I get it).

Heh.  Why won't my email client test my code for me? :)

> So this is a great approach, in that a host obj
> is not created immediately on creating a Lucy obj.  However, it's
> still "falsely" created, in order to track refCount > 1 from within
> Lucy, even when the obj never crosses the bridge.  

Yes.  But if we are parsimonious about creating objects in the first place, it
doesn't matter so much if constant costs per object are large.

> If only host languages let us override what decRef does for a given
> obj... then we could break the tight cycles ourself and only allocate
> a host obj when needed.

There's an alternative.  We can waste an extra 4 bytes per object to hold an
integer refcount, and use that unless a host object has been cached.  The
instant that the host object has been cached, we set its refcount and use that
instead.

I'm not sure that's either clearer or faster, but it does mean that we
wouldn't ever needlessly create host objects.
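
In code, that hybrid might look roughly like this -- a sketch only, reusing
the PyINCREF/PyCObject shorthand from the examples above and assuming an
extra "count" member next to host_obj:

    Obj*
    Obj_inc_refcount(Obj *self)
    {
        if (self->host_obj == NULL) {
            /* No host object yet: the integer refcount is authoritative. */
            self->count++;
        }
        else {
            PyINCREF((PyObject*)self->host_obj);
        }
        return self;
    }

    void*
    Obj_to_host(Obj *self)
    {
        if (self->host_obj == NULL) {
            /* Lazily create the Python object and hand it the refcount that
             * has accumulated on the C side. */
            self->host_obj = PyCObject_FromVoidPtr((void*)self,
                (void (*)(void*))Obj_Destroy);
            ((PyObject*)self->host_obj)->ob_refcnt = self->count;
        }
        PyINCREF((PyObject*)self->host_obj);   /* new reference */
        return self->host_obj;
    }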

Marvin Humphrey

/************************** VARRAY VTable definition ************************/

KINO_VARRAY_VT KINO_VARRAY = {
    (kino_VTable*)&KINO_VTABLE, /* vtable vtable */
    NULL, /* host_obj */
    (kino_VTable*)&KINO_OBJ, /* parent */
    (kino_CharBuf*)&KINO_VARRAY_CLASS_NAME, /* class name */
    KINO_VTABLE_F_IMMORTAL, /* flags */
    sizeof(kino_VArray), /* obj_alloc_size */
    offsetof(kino_VTable, methods) 
        + 38 * sizeof(boil_method_t), /* vt_alloc_size */
    (kino_Callback**)&KINO_VARRAY_CALLBACKS,  /* callbacks */
    {
        (boil_method_t)kino_Obj_create,
        (boil_method_t)kino_Obj_foster,
        (boil_method_t)kino_Obj_make,
        (boil_method_t)kino_Obj_get_refcount,
        (boil_method_t)kino_Obj_inc_refcount,
        (boil_method_t)kino_Obj_dec_refcount,
        (boil_method_t)kino_Obj_to_host,
        (boil_method_t)kino_VA_clone,
        (boil_method_t)kino_VA_destroy,
        (boil_method_t)kino_VA_equals,
        (boil_method_t)kino_Obj_compare_to,
        (boil_method_t)kino_Obj_hash_code,
        (boil_method_t)kino_Obj_get_vtable,
        (boil_method_t)kino_Obj_get_class_name,
        (boil_method_t)kino_Obj_is_a,
        (boil_method_t)kino_Obj_to_string,
        (boil_method_t)kino_Obj_to_i64,
        (boil_method_t)kino_Obj_to_f64,
        (boil_method_t)kino_VA_serialize,
        (boil_method_t)kino_VA_deserialize,
        (boil_method_t)kino_VA_dump,
        (boil_method_t)kino_VA_load,
        (boil_method_t)kino_VA_push,
        (boil_method_t)kino_VA_push_varray,
        (boil_method_t)kino_VA_pop,
        (boil_method_t)kino_VA_unshift,
        (boil_method_t)kino_VA_shift,
        (boil_method_t)kino_VA_grow,
        (boil_method_t)kino_VA_fetch,
        (boil_method_t)kino_VA_store,
        (boil_method_t)kino_VA_delete,
        (boil_method_t)kino_VA_splice,
        (boil_method_t)kino_VA_shallow_copy,
        (boil_method_t)kino_VA_sort,
        (boil_method_t)kino_VA_resize,
        (boil_method_t)kino_VA_clear,
        (boil_method_t)kino_VA_get_size,
        (boil_method_t)kino_VA_grep
    }
};



Re: Reference counting inside a GC host (was "real time updates")

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Sat, Mar 28, 2009 at 8:24 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
> On Fri, Mar 27, 2009 at 08:21:54AM -0400, Michael McCandless wrote:
>
>> On Fri, Mar 27, 2009 at 1:08 AM, Marvin Humphrey <ma...@rectangular.com>
>> wrote:
>>
>> > I think I have an approach that's going to allow us to eliminate FastObj:
>> > We lazily create the host object, and treat a NULL host_obj as
>> > semantically equivalent to a refcount of 1.
>
> I'm happy to report that this approach succeeded.  FastObj is now history.  :)

Most excellent!  Though we do need to figure out how tracing GC languages will fit in.

>> Much of this is beyond me, but...
>
> Hopefully we will soon get to the point where that's no longer the case.
>
> Our KS prototype now has only one object model.  I'll describe how it works
> for refcounting hosts like Perl and Python.
>
> ---
>
> Every Obj is a struct with a VTable and a host object as its first two members:
>
>  struct Obj {
>      VTable   *vtable;
>      void     *host_obj;
>  };
>
>  struct VArray {
>      VTable   *vtable;
>      void     *host_obj;
>      Obj     **elems;
>      u32_t     size;
>      u32_t     capacity;
>  }
>
> When any Lucy object is created, self->host_obj is NULL.
>
> Here are some simplified sample constructors for Lucy::Obj and our
> variable-sized array class Lucy::Util::VArray:
>
>  Obj*
>  Obj_new() {
>      Obj *self      = (Obj*)malloc(sizeof(Obj));
>      self->vtable   = (VTable*)VTable_Inc_RefCount(&OBJ);
>      self->host_obj = NULL;
>      return self;
>  }
>
>  VArray*
>  VA_new(u32_t capacity)
>  {
>      VArray *self   = (VArray*)malloc(sizeof(VArray));
>      self->vtable   = (VTable*)VTable_Inc_RefCount(&VARRAY);
>      self->host_obj = NULL;
>      self->elems    = (Obj**)calloc(capacity, sizeof(Obj*));
>      self->size     = 0;
>      self->capacity = capacity;
>      return self;
>  }
>
> Note that the VTable for the Obj class is OBJ, and the VTable for VArray is
> VARRAY.  The same pattern holds true for other classes: TermScorer's VTable is
> TERMSCORER, etc.

OK.

Are VTables also considered Obj's?  Ie, you are Inc/DecRef'ing them --
will a VTable be destroyed when it DecRefs to 0?

Do VTables only store methods?  Or can they store fields as well?

Can an arbitrary Obj at runtime become a VTable for another Obj?
(True "prototype" programming language).  Seems like "no", because an
arbitrary Obj is not allowed to add new members wrt its parent (only
new VTables can do so).

Does VARRAY (VTable for VArray objects) hold a reference to OBJ?  How
are these trees of VTables init'd?

And Lucy objs are single inheritance.

(NOTE: I'm just "probing" with these questions... I'm certainly not
implying Lucy should have / needs all of these capabilities).

> Here are corresponding destructors for Obj and VArray:
>
>  void
>  Obj_destroy(Obj *self)
>  {
>      VTable_Dec_RefCount(self->vtable);
>      free(self);
>  }
>
>  void
>  VA_destroy(VArray *self)
>  {
>      u32_t i;
>      for (i = 0; i < self->size; i++) {
>          if (self->elems[i]) {
>              Obj_Dec_RefCount(self->elems[i]);
>          }
>      }
>      free(self->elems);
>      Obj_destroy((Obj*)self); /* super */
>  }

OK, it's making sense.

So a VArray is allowed to have C NULLs in its elems array (vs say Java
which always inits the array to hold Java null's).  Is there an
explicit object in Lucy that represents null (java), None (Python),
etc.?

> Two items of note about the destructors:
>
> First, note that the destructor for VArray invokes the destructor of its
> parent class, Obj.  This superclass call makes it possible for us to add
> members to a base class without having to manually edit all subclasses.

Great.

> Second, there is no mention whatsoever of self->host_obj in the destructor.
> That's because there are only two paths into the destructor, and both of them
> avoid the need for Lucy core code to worry about the cached host object.
>
>  1) The cached host object was never created so it doesn't need to be
>     cleaned up.
>  2) Destroy() is being invoked from host-space via e.g. Python's "__del__"
>     method, and after it returns the host will clean up the host object
>     itself.

I'm a bit confused: what if you have a Lucy obj, that's got a cached
host obj, such that the host obj is not referred to anywhere in the
host language, but is referred to in Lucy, and Lucy finally decrefs
its last reference.  How is the cycle broken in that case?  (Ie,
Destroy should be invoked via Lucy).

> Obj declares four methods which each host must implement:
>
>   Get_RefCount
>   Inc_RefCount
>   Dec_RefCount
>   To_Host
>
> Mike, since you're familiar with Python, I'll have a go at implementing those
> methods for the Python bindings.
>
> First, the accessor for the object's refcount, which is shared by the Lucy
> object and the Python object.  If self->host_obj is NULL, then the refcount is
> 1.  Otherwise, we delegate responsibility for tracking the refcount to the
> Python object cached in self->host_obj.
>
>  u32_t
>  Obj_get_refcount(Obj *self)
>  {
>      if (self->host_obj == NULL) {
>          return 1;  /* NULL host_obj implies a refcount of 1. */
>      }
>      else {
>          PyObject *py_object = (PyObject*)self->host_obj;
>          return py_object->ob_refcnt;
>      }
>  }
>
> Next, the method which increments the refcount.  Calling this method even once
> guarantees that a Python object will be created, since the first time it is
> called, the refcount will progress from 1 to 2, and we need a place to put
> that number.
>
> This means that there are two ways to indicate a refcount of 1.  Either we
> have a newly created Lucy object with a NULL self->host_obj which *implies* a
> refcount of 1, or we have a cached host object which had a refcount of 2 or
> more at some point, but which has fallen back down to an *explicit* refcount
> of 1.

OK.  That 1 refCount "belonging" to Lucy.  This is essentially an
efficient way to represent the common case of "only Lucy has a single
reference to this object".

>  Obj*
>  Obj_inc_refcount(Obj *self)
>  {
>      if (self->host_obj == NULL) {
>        self->host_obj = Obj_To_Host(self);
>      }
>      PyINCREF((PyObject*)self->host_obj);
>      return self;
>  }
>
> Once the host object is cached, it never goes away -- it's there for the life
> of the Lucy object.
>
> Next, the method to decrement the refcount.  Note that we only call Destroy()
> directly if self->host_obj is NULL.  If we've created a Python object, then we
> count on it to invoke the __del__ method when its refcount falls to 0; we will
> have defined __del__ to invoke Destroy().

I'm still confused because it seems like there should be times when
Lucy needs to destroy a Lucy obj that had at some point crossed the
bridge, yet the host retained no reference.  Ie the destruction may
need to initiate from the Lucy side of the bridge.

>  u32_t
>  Obj_dec_refcount(Obj *self)
>  {
>      if (self->host_obj == NULL) {
>          /* NULL host object implies a refcount of 1.  That's dropping to 0
>           * as a result of this call, so it's time to invoke Destroy(). */
>          Obj_Destroy(self);
>          return 0;
>      }
>      else {
>          /* If the PyObject's ob_refcnt falls to 0, then the destructor will
>           * be invoked from Python-space via the "__del__" method.  Compute
>           * the return value before decrementing. */
>          u32_t modified_refcount = ((PyObject*)self->host_obj)->ob_refcnt - 1;
>          PyDECREF((PyObject*)self->host_obj);
>          return modified_refcount;
>      }
>  }
>
> The last method we need to define is To_Host(), which, in the parlance of the
> Python C API docs, will return a "new reference".
>
> (I'm not sure that this implementation is correct, but it should convey
> the gist.)
>
>  void*
>  Obj_to_host(Obj *self)
>  {
>      if (self->host_obj) {
>          /* The Python object is already cached, so incref it and return. */
>          PyINCREF((PyObject*)self->host_obj);
>          return self->host_obj;
>      }
>      else {
>          /* Lazily create Python object. */
>          self->host_obj = PyCObject_FromVoidPtr((void*)self, Obj_Destroy)
>      }
>  }

(missing a return on the else clause, but I get it).

>> won't there be multiple references in C to a given Lucy object, each of
>> which would need to incRef the RC?
>
> Yes.  As soon as the refcount has to be increased above 1, we lazily create a
> Host object to hold the refcount.
>
> ---
>
> Leaving aside the question of tracing GC hosts for now... does the
> cached-host-object delegated refcounting model seem sufficiently clear to you
> for use within Lucy Python bindings?
>
> The rest of the Lucy library doesn't need to know about the host object
> caching -- it just uses the opaque refcounting API, which looks like plain
> old integer refcounting from the outside.

Yes, makes sense (except for the "Lucy destroys object that has cached
host object" case).  So this is a great approach, in that a host obj
is not created immediately on creating a Lucy obj.  However, it's
still "falsely" created, in order to track refCount > 1 from within
Lucy, even when the obj never crosses the bridge.  I wonder in
practice how often that actually happens -- my guess is the
RefCount==1 case is the "long tail" in a typical snapshot of a future
running Lucy app.

If only host languages let us override what decRef does for a given
obj... then we could break the tight cycles ourself and only allocate
a host obj when needed.

Mike

Re: Reference counting inside a GC host (was "real time updates")

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Fri, Mar 27, 2009 at 08:21:54AM -0400, Michael McCandless wrote:

> On Fri, Mar 27, 2009 at 1:08 AM, Marvin Humphrey <ma...@rectangular.com>
> wrote:
> 
> > I think I have an approach that's going to allow us to eliminate FastObj:
> > We lazily create the host object, and treat a NULL host_obj as
> > semantically equivalent to a refcount of 1.

I'm happy to report that this approach succeeded.  FastObj is now history.  :)

> Much of this is beyond me, but... 

Hopefully we will soon get to the point where that's no longer the case.  

Our KS prototype now has only one object model.  I'll describe how it works
for refcounting hosts like Perl and Python.

---

Every Obj is a struct with a VTable and a host object as its first two members:

  struct Obj {
      VTable   *vtable;
      void     *host_obj;
  };

  struct VArray {
      VTable   *vtable;
      void     *host_obj;
      Obj     **elems;
      u32_t     size;
      u32_t     capacity;
  }

When any Lucy object is created, self->host_obj is NULL.  

Here are some simplified sample constructors for Lucy::Obj and our
variable-sized array class Lucy::Util::VArray:

  Obj*
  Obj_new() {
      Obj *self      = (Obj*)malloc(sizeof(Obj));
      self->vtable   = (VTable*)VTable_Inc_RefCount(&OBJ);
      self->host_obj = NULL;
      return self;
  }

  VArray*
  VA_new(u32_t capacity) 
  {
      VArray *self   = (VArray*)malloc(sizeof(VArray));
      self->vtable   = (VTable*)VTable_Inc_RefCount(&VARRAY);
      self->host_obj = NULL;
      self->elems    = (Obj**)calloc(capacity, sizeof(Obj*));
      self->size     = 0;
      self->capacity = capacity;
      return self;
  }

Note that the VTable for the Obj class is OBJ, and the VTable for VArray is
VARRAY.  The same pattern holds true for other classes: TermScorer's VTable is
TERMSCORER, etc.

Here are corresponding destructors for Obj and VArray:  

  void
  Obj_destroy(Obj *self)
  {
      VTable_Dec_RefCount(self->vtable);
      free(self);
  }

  void
  VA_destroy(VArray *self)
  {
      u32_t i;
      for (i = 0; i < self->size; i++) {
          if (self->elems[i]) {
              Obj_Dec_RefCount(self->elems[i]);
          }
      }
      free(self->elems);
      Obj_destroy((Obj*)self); /* super */
  }

Two items of note about the destructors:

First, note that the destructor for VArray invokes the destructor of its
parent class, Obj.  This superclass call makes it possible for us to add
members to a base class without having to manually edit all subclasses.

Second, there is no mention whatsoever of self->host_obj in the destructor.
That's because there are only two paths into the destructor, and both of them
avoid the need for Lucy core code to worry about the cached host object.
    
  1) The cached host object was never created so it doesn't need to be 
     cleaned up.
  2) Destroy() is being invoked from host-space via e.g. Python's "__del__"
     method, and after it returns the host will clean up the host object
     itself.

Obj declares four methods which each host must implement: 

   Get_RefCount
   Inc_RefCount
   Dec_RefCount
   To_Host

Mike, since you're familiar with Python, I'll have a go at implementing those
methods for the Python bindings.

First, the accessor for the object's refcount, which is shared by the Lucy
object and the Python object.  If self->host_obj is NULL, then the refcount is
1.  Otherwise, we delegate responsibility for tracking the refcount to the
Python object cached in self->host_obj.

  u32_t
  Obj_get_refcount(Obj *self) 
  {
      if (self->host_obj == NULL) {
          return 1;  /* NULL host_obj implies a refcount of 1. */
      }
      else {
          PyObject *py_object = (PyObject*)self->host_obj;
          return py_object->ob_refcnt;
      }
  }

Next, the method which increments the refcount.  Calling this method even once
guarantees that a Python object will be created, since the first time it is
called, the refcount will progress from 1 to 2, and we need a place to put
that number.

This means that there are two ways to indicate a refcount of 1.  Either we
have a newly created Lucy object with a NULL self->host_obj which *implies* a
refcount of 1, or we have a cached host object which had a refcount of 2 or
more at some point, but which has fallen back down to an *explicit* refcount
of 1.

  Obj*
  Obj_inc_refcount(Obj *self)
  {
      if (self->host_obj == NULL) {
        self->host_obj = Obj_To_Host(self);
      }
      PyINCREF((PyObject*)self->host_obj);
      return self;
  }

Once the host object is cached, it never goes away -- it's there for the life
of the Lucy object.

Next, the method to decrement the refcount.  Note that we only call Destroy()
directly if self->host_obj is NULL.  If we've created a Python object, then we
count on it to invoke the __del__ method when its refcount falls to 0; we will
have defined __del__ to invoke Destroy().

  u32_t
  Obj_dec_refcount(Obj *self)
  {
      if (self->host_obj == NULL) {
          /* NULL host object implies a refcount of 1.  That's dropping to 0
           * as a result of this call, so it's time to invoke Destroy(). */
          Obj_Destroy(self);
          return 0;
      }
      else {
          /* If the PyObject's ob_refcnt falls to 0, then the destructor will
           * be invoked from Python-space via the "__del__" method.  Compute
           * the return value before decrementing. */
          u32_t modified_refcount = ((PyObject*)self->host_obj)->ob_refcnt - 1;
          PyDECREF((PyObject*)self->host_obj);
          return modified_refcount;
      }
  }

The last method we need to define is To_Host(), which, in the parlance of the
Python C API docs, will return a "new reference".  

(I'm not sure that this implementation is correct, but it should convey
the gist.)

  void*
  Obj_to_host(Obj *self)
  {
      if (self->host_obj) {
          /* The Python object is already cached, so incref it and return. */
          PyINCREF((PyObject*)self->host_obj);
          return self->host_obj;
      }
      else {
          /* Lazily create Python object. */
          self->host_obj = PyCObject_FromVoidPtr((void*)self, Obj_Destroy)
      }
  }

> won't there be multiple references in C to a given Lucy object, each of
> which would need to incRef the RC?

Yes.  As soon as the refcount has to be increased above 1, we lazily create a
Host object to hold the refcount.  

---

Leaving aside the question of tracing GC hosts for now... does the
cached-host-object delegated refcounting model seem sufficiently clear to you
for use within Lucy Python bindings?

The rest of the Lucy library doesn't need to know about the host object
caching -- it just uses the opaque refcounting API, which looks like plain
old integer refcounting from the outside.
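
For example, a setter in core code would just look like this, with no idea of
what lives behind the method calls.  (SomeClass and its "member" slot are made
up for illustration.)

  void
  SomeClass_set_member(SomeClass *self, Obj *value)
  {
      Obj *old = self->member;
      /* Inc_RefCount returns its argument, enabling the assignment idiom. */
      self->member = value ? Obj_Inc_RefCount(value) : NULL;
      if (old != NULL) { Obj_Dec_RefCount(old); }
  }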

Marvin Humphrey


Re: Reference counting inside a GC host (was "real time updates")

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sun, Mar 29, 2009 at 05:55:11AM -0400, Michael McCandless wrote:

> > FWIW, the cached-host-object algo wins the tight loop performance contest.
> 
> Which makes sense, but how often are such loops [expected to be] used
> in practice?  EG we already thought Collector inside host would be
> doable but expected poor performance.

The performance hit will matter whenever someone overrides a tight-loop
method.  How much it matters will vary from host to host.

Probably any Scorer or Collector would feel it.  DataReader: it depends on
whether there are Scorer implementations which call on it in tight loops.

InStream and OutStream: definitely would have felt it, except that they're final
classes so you can't override any methods.

Analyzer: maybe not so much because the analysis method would be non-trivial.
(Assuming that we settle on something like the KS "Inversion"-style Analyzers
rather than the method-call-orgy of Lucene TokenStream.)

Searcher, Indexer, Query, QueryParser, DataWriter, IndexManager, Lock: no,
because they don't run lightweight methods within tight loops.  FieldSpec,
Schema: I don't think so because so far it's been possible to cache the return
values of certain methods which would otherwise be called often.
(KinoSearch::Index::Inverter performs such caching.)

> How much work is being done when you create a new host-wrapper object?

That will vary from Host to Host.  In Perl, we're using blessed scalars, which
are about as cheap as Perl objects get.

However, Perl method dispatch is comparatively slow; if method calls were
faster, the effect would loom larger.  In Java, for instance, method calls are
very fast, and creating a host wrapper object on each loop will probably cause
a significant degradation much sooner.  

Put another way: caching a host wrapper object probably matters a lot more
when method dispatch is fast.

> Could you somehow re-use such objects?  (Sort of like how when you go
> to a children's museum, to the water section, they provide a bunch of
> yellow frocks that the kids can wear when they play with the water,
> and return when they are done).  

Evocative metaphor. :)  So, who manages the pool?  

Ideally, when we know we're going to be calling back to the host in a tight
loop, we'd want the caller to cache the host object.  But that's not easy.

Here's the basic implementation of Matcher_Collect():

  void
  Matcher_collect(Matcher *self, Collector *collector)
  {
    i32_t doc_num;
    Collector_Set_Matcher(collector, self);
    while (0 != (doc_num = Matcher_Next(self))) {
      Coll_Collect(collector, doc_num);
    }
  }

Say that the user has overridden Collector's Collect() method in the host, so
it's slow.  To get best performance, we need to change how *Matcher* works,
even though it's *Collector* that's the source of the slowdown.

  void
  Matcher_collect(Matcher *self, Collector *collector)
  {
    i32_t  doc_num;
    void  *host_obj = Coll_To_Host(collector); /* <--- Cache wrapper here */

    Collector_Set_Matcher(collector, self);
    while (0 != (doc_num = Matcher_Next(self))) {
      Host_callback(host_obj, "collect", 1, ARG_I32("doc_num", doc_num));
    }

    Coll_Destroy_Host(collector, host_obj); /* <--- Destroy wrapper here */
  }

(Host_callback() ordinarily takes a Lucy object as its first argument, but
pretend that it takes a host wrapper object for now.)

Coll_Collect doesn't know it's being called in a loop.  Even if it did, it
could only cache host object wrappers using static or global vars. (Bye-bye
thread safety, and what on earth do we do about subclasses?)

  static void *Coll_collect_HOST_OBJ = NULL;
  void
  Coll_collect_OVERRIDE(Collector* self, i32_t doc_num)
  {
    /* Bleah. */
    Coll_collect_HOST_OBJ = Coll_Validate_Or_Make_Wrapper(self, 
        Coll_collect_HOST_OBJ);  
    Host_callback(Coll_collect_HOST_OBJ, "collect", 1, 
        ARG_I32("doc_num", doc_num));
  }

> > We lose convenient use of the "inside-out-object" pattern. :(
> 
> Neat.... I've seen a similar approach (though likely someone has given
> / would give it a different named pattern) 

The "FlyWeight" design pattern is very similar.

> In your example, shouldn't/couldn't the hash key somehow be the id of
> the *Lucy* object, not of its fleeting host wrapper obj?  

Actually, that's what $$self is.

   $self          <---- address of Perl object[1]
   $$self         <---- address of Lucy object

However, invoking $self->DESTROY on multiple wrappers around the same Lucy
object is still a problem.

  sub DESTROY {
      my $self = shift;
      delete $foo{$$self};
      $self->SUPER::DESTROY;  # Calls Dec_RefCount(), not Destroy() as it
                              # would under the cached-object model.
  }

As soon as the wrapper has its refcount fall to 0, Perl invokes DESTROY, and
despite the fact that the Lucy object doesn't get zapped yet, $foo{$$self}
gets deleted.

Marvin Humphrey

[1] Slight oversimplification.

Re: Reference counting inside a GC host (was "real time updates")

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Sun, Mar 29, 2009 at 9:11 AM, Marvin Humphrey <ma...@rectangular.com> wrote:
> On Sun, Mar 29, 2009 at 06:01:16AM -0400, Michael McCandless wrote:
>
>> You might consider adding the hooks for a tracing collector, but not
>> creating such a collector now.  IE have each Lucy obj expose a method
>> to visit the other objs it refers to.
>
> We can probably have Boilerplater autogenerate that code.  That way we only
> have to write one compiler module instead of manually maintaining methods
> for X hundred classes.

Excellent!  I guess Boilerplater knows enough about the members of an
object to determine which are objects vs which are atomic (and don't
need visiting by tracer).
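
For illustration, a minimal sketch of what such an autogenerated hook might
look like -- the Visit_t typedef and the TermQuery_visit_members() name are
made up, and the member list just mirrors TermQuery's .bp declaration:

    typedef void (*Visit_t)(Obj *referenced, void *context);

    /* Autogenerated: call the visitor on each Obj-derived member; atomic
     * members (ints, floats) are skipped. */
    void
    TermQuery_visit_members(TermQuery *self, Visit_t visit, void *context)
    {
        visit((Obj*)self->field, context);
        visit(self->term, context);
    }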

>> I think you'd be surprised at how easily cycles are accidentally created.
>>
>> It's sort of like saying programmers are accustomed to writing high
>> performance code, but then an O(N^2) slips in somewhere and goes
>> undetected until N happens to get large deep in production.
>
> Heh.  Or a reference cycle leak slips in that happens to get large and causes
> a performance hit in production.  Been there done that.

Tell me about it.  We all have our battle scars ;)  I've had great
"fun" writing tools in Python to trace the sneaky cycles... once an
app gets big enough it's practically impossible to prevent accidental
cycles.

Mike

Re: Reference counting inside a GC host (was "real time updates")

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sun, Mar 29, 2009 at 06:01:16AM -0400, Michael McCandless wrote:

> You might consider adding the hooks for a tracing collector, but not
> creating such a collector now.  IE have each Lucy obj expose a method
> to visit the other objs it refers to.

We can probably have Boilerplater autogenerate that code.  That way we only
have to write one compiler module instead of manually maintaining methods
for X hundred classes.

> I think you'd be surprised at how easily cycles are accidentally created.
> 
> It's sort of like saying programmers are accustomed to writing high
> performance code, but then an O(N^2) slips in somewhere and goes
> undetected until N happens to get large deep in production.

Heh.  Or a reference cycle leak slips in that happens to get large and causes
a performance hit in production.  Been there done that.

Marvin Humphrey


Re: Reference counting inside a GC host (was "real time updates")

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Sat, Mar 28, 2009 at 6:55 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
> On Fri, Mar 27, 2009 at 08:23:45AM -0400, Michael McCandless wrote:
>
>> since Lucy will use reference counting, how will you deal with cycles?
>
> I don't think we should do anything about them.  It will be up to the user to
> avoid them.
>
> There *are* limits to my ambitions for the Lucy object model.  :)

OK :)  Intentionally leaky abstraction.  FWIW Python took that same
approach for many years, before sprouting a cycle detector addition to
its GC.  But it's awkward, because sometimes things are collected
immediately (when they decref to 0), but at other times, only when the
cyclic collector sweeps.  And of course, because it's tracing, it
requires each class to expose a "visit all things I refer to" API.

You might consider adding the hooks for a tracing collector, but not
creating such a collector now.  IE have each Lucy obj expose a method
to visit the other objs it refers to.

>> since the host language is involved, a cycle could easily reach out to the
>> host language and wrap back around into Lucy?
>
> Possible.  I doubt it will be much of a problem for programmers accustomed to
> working within a refcounted environment.

I think you'd be surprised at how easily cycles are accidentally created.

It's sort of like saying programmers are accustomed to writing high
performance code, but then an O(N^2) slips in somewhere and goes
undetected until N happens to get large deep in production.

> Programmers who normally work within a tracing GC environment may be more
> prone to create reference cycles, because they're not trained to avoid
> cyclical data structures and alarms might not go off in their heads when they
> see them.
>
> But what are our options?  Our class hierarchy is sophisticated and large
> enough that we have to use either tracing GC or refcounting.  Writing our own
> tracing GC, now *that* would be ambitious. ;)  And though I haven't studied
> cycle detection, I imagine it has to be both involved and costly.

Agreed.  I wonder if there's a decent tracing GC library somehow out
there, that only requires certain limited methods to be exposed for
your objects...

And note that I'm really playing devil's advocate here -- I don't have
good solutions to offer for these hard problems.

Mike

Re: Reference counting inside a GC host (was "real time updates")

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Fri, Mar 27, 2009 at 08:23:45AM -0400, Michael McCandless wrote:

> since Lucy will use reference counting, how will you deal with cycles?

I don't think we should do anything about them.  It will be up to the user to
avoid them.

There *are* limits to my ambitions for the Lucy object model.  :)

> since the host language is involved, a cycle could easily reach out to the
> host language and wrap back around into Lucy?

Possible.  I doubt it will be much of a problem for programmers accustomed to
working within a refcounted environment.  

Programmers who normally work within a tracing GC environment may be more
prone to create reference cycles, because they're not trained to avoid
cyclical data structures and alarms might not go off in their heads when they
see them.

But what are our options?  Our class hierarchy is sophisticated and large
enough that we have to use either tracing GC or refcounting.  Writing our own
tracing GC, now *that* would be ambitious. ;)  And though I haven't studied
cycle detection, I imagine it has to be both involved and costly.

Marvin Humphrey


Re: Reference counting inside a GC host (was "real time updates")

Posted by Michael McCandless <lu...@mikemccandless.com>.
A new top-level question: since Lucy will use reference counting, how
will you deal with cycles?

You could try to ensure the Lucy core doesn't create cycles of its
own, but.... since the host language is involved, a cycle could easily
reach out to the host language and wrap back around into Lucy?

Mike

On Fri, Mar 27, 2009 at 8:21 AM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> On Fri, Mar 27, 2009 at 1:08 AM, Marvin Humphrey <ma...@rectangular.com> wrote:
>
>> I think I have an approach that's going to allow us to eliminate FastObj: We
>> lazily create the host object, and treat a NULL host_obj as semantically
>> equivalent to a refcount of 1.
>
> Much of this is beyond me, but... won't there be multiple references
> in C to a given Lucy object, each of which would need to incRef the
> RC?
>
> Expanding on/restating my "sort of like autoboxing" idea...: what if
> the Lucy obj never held onto the reference to the mirror host obj?
> (And the Lucy obj did its own reference counting, separately from the
> host's GC).
>
> But, the reverse is allowed (the host can hold onto the Lucy mirror
> obj, which incRefs (refers to) the Lucy obj).
>
> When we cross the bridge, Lucy to host, we would make a new host
> wrapper obj each time.  These host mirror objs are very lightweight
> wrappers, right?  When that call returns across the bridge, Lucy drops
> the reference to it.
>
> The host is allowed to keep a reference somewhere to that host mirror
> obj, and later cross the bridge in the reverse direction, using it.
>
> Note that Lucy can still clearly store "other" host objects.  It's
> just these thin 'host mirror of a Lucy object' objects that would not be
> retained in Lucy.
>
> This then breaks the circular ref, and gives complete freedom for Lucy
> to do whatever GC it needs, separate from the Host's GC.
>
> But what do we lose by not retaining a permanent host obj wrapper?
> (Note that if it's a real performance problem, then in certain cases
> you could eg explicitly retain a reference at the top of the loop, do
> the loop, then drop the reference at the end).
>
> Mike
>

Re: Reference counting inside a GC host (was "real time updates")

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Sat, Mar 28, 2009 at 6:08 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
> On Fri, Mar 27, 2009 at 08:21:54AM -0400, Michael McCandless wrote:
>
>> what if the Lucy obj never held onto the reference to the mirror host obj?
>> (And the Lucy obj did its own reference counting, separately from the
>> host's GC).
>
> What you've laid out is *exactly* the same scheme Peter Karman used in SWISH3,
> the same scheme Dave Balmain used in Ferret, and the same scheme I used in
> KinoSearch for objects which subclass FastObj.

OK, excellent.  I'm playing catch up over here...

> Since this algorithm seems to be an intuitive winner, and since I'd like to
> just solve this problem and move on, I took a stab at converting KS over to
> using it exclusively.  Switching over to something that everybody else seems to
> grok would be great -- indeed, I was rooting for my cached-host-object algo to
> lose.  Sacrificing a little performance for the sake of an implementation that
> other people might have less difficulty understanding, maintaining, and
> modifying would be just fine with me.
>
> The results were disappointing.  The cached-host-object algo has certain
> advantages we should be reluctant to discard.

Interesting.

>> When we cross the bridge, Lucy to host, we would make a new host
>> wrapper obj each time.  These host mirror objs are very lightweight
>> wrappers, right?
>
> FWIW, the cached-host-object algo wins the tight loop performance contest.

Which makes sense, but how often are such loops [expected to be] used
in practice?  EG we already thought Collector inside host would be
doable but expected poor performance.

> Below my sig, you'll find a patch that applies against KinoSearch svn revision
> 4361, and a benchmarking script. The patch gives TermQuery a member var named
> "thing", plus Set_Thing() and Get_Thing() methods.  The loop compares the
> performance of calling $term_query->get_thing from Perl-space when "thing" is a
> VArray (which subclasses FastObj) vs. when "thing" is a Hash (which subclasses
> Obj).
>
> The only difference in those loops is the behavior of FastObj_to_host (which
> creates a new host object every time) vs.  Obj_to_host (which exploits a cached
> host object).  The results:
>
>  marvin@smokey:~/projects/ks4361/perl $ perl -Mblib time_callbacks.pl
>                     Rate no_cached_object    cached_object
>  no_cached_object 1.93/s               --             -66%
>  cached_object    5.71/s             196%               --
>
> On its own, this performance gap probably doesn't justify sacrificing clarity
> of implementation.  It only matters for host-language subclassing, where
> performance is going to be heavily compromised anyway -- and if your method
> does something non-trivial, this effect is going to get swamped.

How much work is being done when you create a new host-wrapper object?
 Could you somehow re-use such objects?  (Sort of like how when you go
to a children's museum, to the water section, they provide a bunch of
yellow frocks that the kids can wear when they play with the water,
and return when they are done).  I want to make sure we've fully
explored performance improvements for the "simpler"
non-cached-host-object approach before making the decision.

> However...
>
>> But what do we lose by not retaining a permanent host obj wrapper?
>
> We lose convenient use of the "inside-out-object" pattern. :(

Neat.... I've seen a similar approach (though likely someone has given
/ would give it a different named pattern) applied to fine-grained
locks, too, where you want to make a lock for objects where you may
have zillions of them, so instead of actually making a lock per
object, you make a single shared array of locks and each object is
hashed into one lock in that array.
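
For anyone who hasn't seen that trick, here's a rough C sketch of the
striped-lock idea (purely illustrative -- nobody is proposing Lucy do this):

    #include <stdint.h>
    #include <pthread.h>

    #define N_LOCK_STRIPES 64
    static pthread_mutex_t lock_stripes[N_LOCK_STRIPES];

    static void
    init_lock_stripes(void)
    {
        for (int i = 0; i < N_LOCK_STRIPES; i++) {
            pthread_mutex_init(&lock_stripes[i], NULL);
        }
    }

    /* Hash an object's address into a small shared pool of locks rather
     * than embedding a mutex in each of zillions of objects. */
    static pthread_mutex_t*
    lock_for(const void *obj)
    {
        return &lock_stripes[((uintptr_t)obj >> 4) % N_LOCK_STRIPES];
    }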

In your example, shouldn't/couldn't the hash key somehow be the id of
the *Lucy* object, not of its fleeting host wrapper obj?  I'm not sure
how easy this'd be ($$self likely is not something you could override
to return the pointer to the Lucy object instead of the host
wrapper?).  And then when the Lucy obj is really destroyed, you'd have
to cross the bridge and call DESTROY in the host.

>> (Note that if it's a real performance problem, then in certain cases
>> you could eg explicitly retain a reference at the top of the loop, do
>> the loop, then drop the reference at the end).

I view this as being similar to the "thou shalt not create cycles"
approach.  We are expecting the user (not end-user) to understand some
limitations of how Lucy works and to properly work around them.  It's
an intentionally leaky abstraction...

Mike

Re: Reference counting inside a GC host (was "real time updates")

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Fri, Mar 27, 2009 at 08:21:54AM -0400, Michael McCandless wrote:

> what if the Lucy obj never held onto the reference to the mirror host obj?
> (And the Lucy obj did its own reference counting, separately from the
> host's GC).

What you've laid out is *exactly* the same scheme Peter Karman used in SWISH3,
the same scheme Dave Balmain used in Ferret, and the same scheme I used in
KinoSearch for objects which subclass FastObj.

Since this algorithm seems to be an intuitive winner, and since I'd like to
just solve this problem and move on, I took a stab at converting KS over to
using it exclusively.  Switching over to something that everybody else seems to
grok would be great -- indeed, I was rooting for my cached-host-object algo to
lose.  Sacrificing a little performance for the sake of an implementation that
other people might have less difficulty understanding, maintaining, and
modifying would be just fine with me.

The results were disappointing.  The cached-host-object algo has certain
advantages we should be reluctant to discard.

> When we cross the bridge, Lucy to host, we would make a new host
> wrapper obj each time.  These host mirror objs are very lightweight
> wrappers, right?  

FWIW, the cached-host-object algo wins the tight loop performance contest.

Below my sig, you'll find a patch that applies against KinoSearch svn revision
4361, and a benchmarking script. The patch gives TermQuery a member var named
"thing", plus Set_Thing() and Get_Thing() methods.  The loop compares the
performance of calling $term_query->get_thing from Perl-space when "thing" is a
VArray (which subclasses FastObj) vs. when "thing" is a Hash (which subclasses
Obj).

The only difference in those loops is the behavior of FastObj_to_host (which
creates a new host object every time) vs.  Obj_to_host (which exploits a cached
host object).  The results:

  marvin@smokey:~/projects/ks4361/perl $ perl -Mblib time_callbacks.pl 
                     Rate no_cached_object    cached_object
  no_cached_object 1.93/s               --             -66%
  cached_object    5.71/s             196%               --

On its own, this performance gap probably doesn't justify sacrificing clarity
of implementation.  It only matters for host-language subclassing, where
performance is going to be heavily compromised anyway -- and if your method
does something non-trivial, this effect is going to get swamped.  

However...

> But what do we lose by not retaining a permanent host obj wrapper?

We lose convenient use of the "inside-out-object" pattern. :(

The "inside-out object" design pattern is a technique that has been increasing
in popularity within the Perl community for a few years now -- Perl 5.10
provides the Hash::Util::FieldHash class specifically to support it:

    http://perldoc.perl.org/Hash/Util/FieldHash.html#The-Inside-out-Technique

The inside-out technique is the *only* way we will be able to attach arbitrary
data to a pure-Perl subclass of a Lucy object.  Within the KinoSearch
distribution, all of the pure-Perl KSx classes depend on it --
KSx::Search::MockScorer, KSx::Simple, KSx::Remote::SearchServer,
KSx::Remote::SearchClient, etc -- and it is essential for making pure-Perl
subclassing of Lucy feasible.

Here's example code from the docs for KinoSearch::Obj which illustrates the
technique, attaching a "foo" variable to our "MyObj" subclass.

    package MyObj;
    use base qw( KinoSearch::Obj );
    
    # Inside-out member var.
    my %foo;
    
    sub new {
        my ( $class, %args ) = @_;
        my $foo = delete $args{foo};
        my $self = $class->SUPER::new(%args);
        $foo{$$self} = $foo;
        return $self;
    }
    
    sub get_foo {
        my $self = shift;
        return $foo{$$self};
    }
    
    sub DESTROY {
        my $self = shift;
        delete $foo{$$self};
        $self->SUPER::DESTROY;
    }

The problem is that DESTROY method.  It is vital, as the Hash::Util::FieldHash
documentation explains:

    When a normal object dies, anything stored in the object body is
    garbage-collected by perl. With inside-out objects, Perl knows nothing
    about the data stored in field hashes by a class, but these must be
    deleted when the object goes out of scope. Thus the class must provide a
    DESTROY method to take care of that.

Under the cached-host-object algorithm, that DESTROY method is called only
once, when the Lucy object has really had its refcount fall to 0.  However, if
we create host wrappers for each callback, DESTROY gets called *multiple*
times, and the attached data gets zapped too soon.

It is technically possible to address the problem like so:

    sub DESTROY {
        my $self = shift;
        if ( $self->get_refcount == 1 ) {
            delete $foo{$$self};
        }
        $self->SUPER::DESTROY;
    }

However, having to explain to the user that DESTROY will be called many times
for each Lucy object -- and having to expose get_refcount and explain why it's
needed -- makes for a stupid and ugly API.

> (Note that if it's a real performance problem, then in certain cases
> you could eg explicitly retain a reference at the top of the loop, do
> the loop, then drop the reference at the end).

This is a novel idea.  I'd really like to get to a place where someone besides
me would feel comfortable going in and making such an optimization.

I've at least made some progress on the simplification front, which I'll
explain in another post.

Marvin Humphrey

##############################################
# Benchmark Obj_to_host vs. FastObj_to_host. #
##############################################

use strict;
use warnings;

use KinoSearch;
use Benchmark qw( cmpthese );

my $foo_query = KinoSearch::Search::TermQuery->new(
    field => 'content',
    term  => 'foo',
);
my $bar_query = KinoSearch::Search::TermQuery->new(
    field => 'content',
    term  => 'bar',
);
$foo_query->set_thing( KinoSearch::Util::VArray->new( capacity => 16 ) );
$bar_query->set_thing( KinoSearch::Util::Hash->new( capacity => 16 ) );

cmpthese(
    10, 
    {   no_cached_object => sub { $foo_query->get_thing for 1 .. 100000 },
        cached_object    => sub { $bar_query->get_thing for 1 .. 100000 },
    }   
);

#############################################
# Patch to add "thing" member to TermQuery. #
#############################################

Index: ../core/KinoSearch/Search/TermQuery.bp
===================================================================
--- ../core/KinoSearch/Search/TermQuery.bp  (revision 4361)
+++ ../core/KinoSearch/Search/TermQuery.bp  (working copy)
@@ -12,6 +12,7 @@
 
     CharBuf *field;
     Obj     *term;
+    Obj     *thing;
 
     static incremented TermQuery*
     new(const CharBuf *field, const Obj *term);
@@ -33,6 +34,12 @@
     Obj*
     Get_Term(TermQuery *self);
 
+    public void
+    Set_Thing(TermQuery *self, Obj *thing);
+
+    public Obj*
+    Get_Thing(TermQuery *self);
+
     public incremented Compiler*
     Make_Compiler(TermQuery *self, Searchable *searchable, float boost);
 
Index: ../core/KinoSearch/Search/TermQuery.c
===================================================================
--- ../core/KinoSearch/Search/TermQuery.c   (revision 4361)
+++ ../core/KinoSearch/Search/TermQuery.c   (working copy)
@@ -29,6 +29,7 @@
     Query_init((Query*)self, 1.0f);
     self->field       = CB_Clone(field);
     self->term        = Obj_Clone(term);
+    self->thing       = NULL;
     return self;
 }
 
@@ -55,6 +56,7 @@
 {
     DECREF(self->field);
     DECREF(self->term);
+    DECREF(self->thing);
     Query_destroy((Query*)self);
 }
 
@@ -63,6 +65,15 @@
 Obj*
 TermQuery_get_term(TermQuery *self)  { return self->term; }
 
+Obj*
+TermQuery_get_thing(TermQuery *self)  { return self->thing; }
+void
+TermQuery_set_thing(TermQuery *self, Obj *thing)  
+{
+    DECREF(self->thing);
+    self->thing = thing ? INCREF(thing) : NULL;
+}
+
 bool_t
 TermQuery_equals(TermQuery *self, Obj *other)
 {
Index: ../perl/lib/KinoSearch/Search/TermQuery.pm
===================================================================
--- ../perl/lib/KinoSearch/Search/TermQuery.pm  (revision 4361)
+++ ../perl/lib/KinoSearch/Search/TermQuery.pm  (working copy)
@@ -41,7 +41,7 @@
 END_CONSTRUCTOR
 
 {   "KinoSearch::Search::TermQuery" => {
-        bind_methods      => [qw( Get_Field )],
+        bind_methods      => [qw( Get_Field Set_Thing Get_Thing )],
         make_constructors => ["new"],
         make_pod          => {
             synopsis    => $synopsis,


Re: Reference counting inside a GC host (was "real time updates")

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Fri, Mar 27, 2009 at 1:08 AM, Marvin Humphrey <ma...@rectangular.com> wrote:

> I think I have an approach that's going to allow us to eliminate FastObj: We
> lazily create the host object, and treat a NULL host_obj as semantically
> equivalent to a refcount of 1.

Much of this is beyond me, but... won't there be multiple references
in C to a given Lucy object, each of which would need to incRef the
RC?

Expanding on/restating my "sort of like autoboxing" idea...: what if
the Lucy obj never held onto the reference to the mirror host obj?
(And the Lucy obj did its own reference counting, separately from the
host's GC).

But, the reverse is allowed (the host can hold onto the Lucy mirror
obj, which incRefs (refers to) the Lucy obj).

When we cross the bridge, Lucy to host, we would make a new host
wrapper obj each time.  These host mirror objs are very lightweight
wrappers, right?  When that call returns across the bridge, Lucy drops
the reference to it.

The host is allowed to keep a reference somewhere to that host mirror
obj, and later cross the bridge in the reverse direction, using it.

Note that Lucy can still clearly store "other" host objects.  It's
just these thin 'host mirror of a Lucy object' objects that would not be
retained in Lucy.

This then breaks the circular ref, and gives complete freedom for Lucy
to do whatever GC it needs, separate from the Host's GC.

> But what do we lose by not retaining a permanent host obj wrapper?
(Note that if it's a real performance problem, then in certain cases
you could eg explicitly retain a reference at the top of the loop, do
the loop, then drop the reference at the end).

Mike

Re: Reference counting inside a GC host (was "real time updates")

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Tue, Mar 24, 2009 at 08:23:49PM -0500, Peter Karman wrote:

> I tried all the patterns thus far mentioned in the
> libswish3 Perl/XS bindings (SWISH::3) and finally (through much trial and error)
> ended up with the same pattern that Ferret (and I believe KS) uses: create host
> objects at the C boundaries, refcounting only the internal C structs, and then
> in the perl DESTROY() method doing something like:
> 
>   if (--c_obj->ref_cnt < 1) {
>       swish_destroy(c_obj);
>   }
> 
> I've chosen to mitigate the host object expense by doing all the required heavy
> lifting (tight loops, etc) in native C, exposing only those methods that the
> Perl user really needs.

That's a good system.  And it may be that we can't improve on it in the
context of a garbage collecting host.

KS does something a little different these days, though.   Instead of creating
Perl objects at the Perl/C boundary, KS creates a Perl object and a C object
at the same time, and they are mated for life.[1]

HOWEVER, despite the fact that the C object and the Perl object point at each
other, we don't have a circular reference problem because the two objects
*share* a single, unified refcount.  

  Obj *self = Obj_new();        // SvREFCNT((SV*)self->ref.host_obj) == 1
  Obj_Inc_RefCount(self);       // SvREFCNT((SV*)self->ref.host_obj) == 2
  SV *pobj = Obj_To_Host(self); // SvREFCNT((SV*)self->ref.host_obj) == 3
  SV *copy = newSVsv(pobj);     // SvREFCNT((SV*)self->ref.host_obj) == 4
  Obj_Dec_RefCount(self)        // SvREFCNT((SV*)self->ref.host_obj) == 3
  Obj_Dec_RefCount(self)        // SvREFCNT((SV*)self->ref.host_obj) == 2
  Obj_Dec_RefCount(self)        // SvREFCNT((SV*)self->ref.host_obj) == 1
  Obj_Dec_RefCount(self)        // $obj->DESTROY invoked from Perl-space

Unlike SWISH3, KinoSearch::Obj::DESTROY knows that it will only be invoked
when the refcount is 0, so it can invoke the C destructor every time.

    void
    DESTROY(self)
        kino_Obj *self;
    PPCODE:
        Kino_Obj_Destroy(self);

In fact, every time a KinoSearch::Obj's refcount falls to 0, we venture
back into Perl-space.  That's because Obj_Dec_RefCount() is a wrapper around
SvREFCNT_dec(), and when SvREFCNT_dec() hits 0, it invokes the Perl DESTROY
method.

The advantage of this scheme is that we create fewer objects, and thus incur
less overhead at the host/C boundary.

The question I'm trying to answer is: Is there any way to make this work in
the context of a host like Ruby or Java, which use tracing garbage collection
instead of a refcounting scheme that we can piggyback on?

Marvin Humphrey

[1] OK, I'm lying.  KinoSearch classes which subclass  
    KinoSearch::Obj::FastObj actually behave exactly the same way as Ferret
    and SWISH.  It would be really nice if that were not the case, because if
    even really really smart people like Peter find this hard to follow, the
    system needs to be simplified.
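
For concreteness, here's a rough sketch of that Ferret/SWISH-style
arrangement -- not the actual KS code: an integer refcount lives in the C
struct, it gets bumped whenever a host wrapper is minted at the boundary, and
the wrapper's DESTROY/finalizer drops it.  S_make_host_wrapper() is a
hypothetical helper, and the ref.count member is assumed from the FastObj
design.

    void*
    FastObj_to_host(kino_Obj *self)
    {
        /* Each trip across the boundary mints a fresh wrapper, which
         * counts as one more reference to the C object. */
        self->ref.count++;
        return S_make_host_wrapper(self);
    }

    void
    FastObj_wrapper_destroyed(kino_Obj *self)
    {
        /* Invoked from the host wrapper's DESTROY/finalizer. */
        if (--self->ref.count < 1) {
            Kino_Obj_Destroy(self);
        }
    }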


Re: Reference counting inside a GC host (was "real time updates")

Posted by Marvin Humphrey <ma...@rectangular.com>.
> >> > I'd really like to kill off FastObj just for the sake of simplicity,
> >> > though.
> >>
> >> What objects are planned to subclass FastObj?
> >
> > VTable, plus the CharBuf family.
> >
> > Right now in KS, the following classes also descend from FastObj:
> >
> >    VArray
> >    Posting
> >    Token
> >    Inversion (nee TokenBatch)
> >    ByteBuf
> >    ViewByteBuf
> >    Undefined
> >
> > However, none of them have the declaration/bootstrapping problem presented by
> > VTable and CharBuf.  

I think I have an approach that's going to allow us to eliminate FastObj: We
lazily create the host object, and treat a NULL host_obj as semantically
equivalent to a refcount of 1.

chy_u32_t
lucy_Obj_dec_refcount(lucy_Obj *self)
{
    if (self->ref.host_obj) {
        chy_u32_t modified_refcount = SvREFCNT((SV*)self->ref.host_obj) - 1;
        /* If the SV's refcount falls to 0, DESTROY will be invoked from
         * Perl-space.
         */
        SvREFCNT_dec((SV*)self->ref.host_obj);
        return modified_refcount;
    }
    else {
        /* NULL host_obj implies a refcount of 1, so invoke Destroy(). */
        Lucy_Obj_Destroy(self);
        return 0;
    }
}

That allows us to declare VTable and CharBuf structs at compile time, by
inserting NULL where the host object would be.
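
For symmetry, the increment side of the same lazy scheme might look roughly
like this -- S_lazily_create_host_obj() is a hypothetical helper that creates
the Perl object, caches it at self->ref.host_obj, and leaves its refcount at
1, standing in for the implicit reference:

    lucy_Obj*
    lucy_Obj_inc_refcount(lucy_Obj *self)
    {
        if (!self->ref.host_obj) {
            /* NULL implies a refcount of 1, so materialize the SV before
             * bumping the unified refcount below. */
            S_lazily_create_host_obj(self);
        }
        SvREFCNT_inc_simple_void_NN((SV*)self->ref.host_obj);
        return self;
    }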

I've switched over VTable and CharBuf with either zero or maybe a 1% impact on
performance (the numbers are noisy).

The only thing that's weird is that I can't figure out what's happening to the
Perl objects these persistent objects must be acquiring.  I think they're
getting zapped during Perl's "everyone out of the pool" global destruction
phase, but I'm not sure yet -- I tried to make VTables leak by incrementing the
refcount a whole bunch of times in the VTable_singleton() constructor, but
couldn't get Valgrind to show anything.

Marvin Humphrey

Re: Reference counting inside a GC host (was "real time updates")

Posted by Michael McCandless <lu...@mikemccandless.com>.
Marvin Humphrey <ma...@rectangular.com> wrote:

>> So... one alternative would be to separately track a private-to-Lucy
>> refCount from the host object's refCount?
>
> Since you refer to the "host object's refCount", I take it the following
> passage applies to refcounted systems like Perl and Python, and not to GC
> systems like Ruby and Java:

Sorry, let me refine the proposal: let the host GC the host object,
and Lucy GC the C object, and cross-link (circular ref) the first
time the bridge is crossed.

I think Java (JNI) actually exposes a refCount-like API, ie you can
say "I (C) am referring to Java object X", which I think just adds a
new global ref that the tracing collector includes.  Presumably Ruby
exposes something similar?  And RefCount host languages obviously let
you incRef/decRef.
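
Concretely, the JNI calls here are NewGlobalRef() and DeleteGlobalRef().  A
minimal sketch -- the pin/unpin names and the env plumbing are just
assumptions about how a Java binding might wire this up:

    #include <jni.h>

    /* "I (C) am referring to Java object X." */
    void
    lucy_Obj_pin_host(lucy_Obj *self, JNIEnv *env, jobject wrapper)
    {
        self->ref.host_obj = (void*)(*env)->NewGlobalRef(env, wrapper);
    }

    /* "I (C) have stopped referring to Java object X." */
    void
    lucy_Obj_unpin_host(lucy_Obj *self, JNIEnv *env)
    {
        (*env)->DeleteGlobalRef(env, (jobject)self->ref.host_obj);
        self->ref.host_obj = NULL;
    }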

>> Then, for Lucy objects that
>> never cross the bridge you wouldn't have to make a "false" host
>> object.  But you'd need to take care to destroy an object when both
>> Lucy's Obj & the host's wrapper obj drop to refCount 1.
>
> OK, if I understand you correctly, then there would be two refcounts.
> In essence, we intentionally create a circular reference between the Lucy
> object and the host wrapper object.  However, we avoid the memory leak because
> whenever either of the two refcounts falls to one, we see whether the other
> refcount is also at 1 -- and if that's the case, we destroy the object.
>
> The problem I see here is that we don't necessarily have control over what
> happens during the host refcounting.  In Perl at least, I don't know of a way
> to override what happens during SvREFCNT_dec, because it's not a method.  So
> we can't set up a trigger that fires when the Perl object wrapper's refcount
> falls to 1 -- the only event we can count on is the call to $obj->DESTROY()
> when the perl reference count falls to 0.

Hmm, right, that's a problem.... I think Python also doesn't let you
override incRef/decRef for an object.

Maybe you could not keep a permanent ref to the host object?  Eg say a
Term crosses the bridge; at that point you could make a host object
for it, but when the term comes back across the bridge, remove that
reference.  If the host language keeps a ref and later crosses the
bridge backwards, it would hold a ref to the Lucy obj.

Ie things that make a stack-only foray into the host language would
get a temporary host object wrapper; sort of like autoboxing.

I haven't really thought through the implications of that though...
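
Roughly, something like this -- Obj_To_Host_Temp() and Host_release_wrapper()
are made-up names, and the Host_callback() usage follows the Matcher_collect
sketch elsewhere in this thread:

    void
    S_stack_only_foray(Obj *self)
    {
        /* Box just for this trip across the bridge... */
        void *wrapper = Obj_To_Host_Temp(self);
        Host_callback(wrapper, "on_event", 0);
        /* ...and unbox as soon as the call returns. */
        Host_release_wrapper(wrapper);
    }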

>> This may also be better for non-refCount languages (Java).
>
> How would maintaining a refcount for a Java object work?  Under a tracing GC
> system, I think you could say that the refcount is "true" until the object
> becomes unreachable, at which point it's "false".  Other than the fact that
> the object is reachable, you don't know how many things are pointing at it.
> Because of that, there's no way to detect when the only reference left is the
> one from the Lucy object.

Sorry, I meant you'd use JNI's API to say "I am now referring to java
object X" and "I have stopped referring to java object X".  Basically,
a refCount.  But we still have the same tight-cycle challenge.

>> > However, that's not a major problem unless we're creating and
>> > destroying a boatload of small objects.
>
> ---->8 snip 8<----
>
>> I think Lucene has improved here too, especially on the indexing side
>> (though the searching side doesn't create too many tiny objects I
>> think).
>
> Nothing like indexing.  The only spot I can think of is the term dictionary
> index, with all those Term and TermInfo objects.

Yeah... LUCENE-1458 addresses that (no more TermInfo & Term instances
created for the terms index... though it still creates String
instances).

> In Java, you could address that by turning a 1000-object array of 5-element
> TermInfo objects into 5 primitive-type arrays with 1000 slots each.  We want
> to do something like that anyway because of flexible indexing.

Right, column stride instead of row stride.
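
Sketching the difference in C terms (the TermInfo members here are
illustrative, not the real layout):

    #include <stdint.h>

    /* Row stride: one small struct per term. */
    typedef struct {
        int32_t doc_freq;
        int64_t post_ptr;
    } TermInfo;
    static TermInfo term_infos[1000];

    /* Column stride: one flat, primitive-typed array per field, shared
     * by all 1000 terms. */
    static int32_t doc_freqs[1000];
    static int64_t post_ptrs[1000];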

> In KS, we no longer generate a lot of Term/TermInfo objects because that data
> structure has been modified for use with mmap.  However, we'll want to mod
> that system further for flexible indexing and PFOR.

Right, LUCENE-1458 takes some similar steps forward -- it stores less
in RAM, eg we don't need most of the TermInfo for each indexed term;
just the long pointer into the tis file.

> I look forward to taking that up in the future.  :)

There's a lot to take up in the future!!

>> > literals, and such in a bootstrap routine, but it's enough of a pain
>> > to set something like that up that I haven't gone and made such a
>> > change in KS.
>> >
>> > I'd really like to kill off FastObj just for the sake of simplicity,
>> > though.
>>
>> What objects are planned to subclass FastObj?
>
> VTable, plus the CharBuf family.
>
> Right now in KS, the following classes also descend from FastObj:
>
>    VArray
>    Posting
>    Token
>    Inversion (nee TokenBatch)
>    ByteBuf
>    ViewByteBuf
>    Undefined
>
> However, none of them have the declaration/bootstrapping problem presented by
> VTable and CharBuf.  Additionally, I hope to eliminate Token (by expanding the
> capabilities of Inversion), Posting is due for an index-time overhaul but
> isn't a problem at search-time, Undefined isn't important, and ByteBuf and
> ViewByteBuf don't present speed problems.
>
> Rebasing VArray and Inversion (which is a subclass of VArray) to be children
> of Obj rather than FastObj causes a performance dip of about 3% on the
> indexing benchmark.  I imagine that's as bad as it would be for any binding
> because Perl objects are pretty heavy.

OK

>> > The Ferret scheme won't cause problems with light usage of the
>> > library, because most of Lucy's work will be done within tight loops
>> > in the C core.
>>
>> What about a HitCollector in the host language?  Can you efficiently
>> re-use an object from the host language?  (Python has such tricks, eg
>> to re-use a TupleObject during iteration).
>
> Oh, definitely a HitCollector would be a problem.
>
> Restating the original assertion: the Ferret scheme won't cause problems if
> you avoid host-language subclasses.

OK

>> > Do we need to
>> > add all new objects to a giant Hash that we tell the host about, and
>> > yank C stack vars out of that Hash before the C function returns?
>>
>> What does Ruby's C-embedding API expose to interact with its GC?  I
>> would imagine it'd be similar to Java's (ie "here's a new global root
>> object")?
>
> I understand you can do something like that.

OK

>> I don't understand the C stack vars / hash question.
>
> Consider the following routine, set up for tracing garbage collection:
>
>    void
>    FSFolder_delete(FSFolder *self, const CharBuf *filename)
>    {
>        CharBuf *fullpath = (CharBuf*)GC_add(
>            CB_newf("%o/%o", self->path, filename));
>        bool_t result = Host_callback_i(self, "do_delete", 1,
>            ARG_STR("path", fullpath));
>        if (!result) {
>            THROW("failed to delete %o", fullpath);
>        }
>        GC_remove(fullpath);
>    }
>
> Assume that the CharBuf formatted constructor CB_newf() caches a Host obj
> within the CharBuf, and that no reference counting is performed because we're
> going to rely upon the host's tracing garbage collector.
>
> When "fullpath" is created at the top of our function, it's unreferenced as
> far as the Host's garbage collector is concerned.  The only place it exists is
> on the C stack -- but the garbage collector doesn't walk the C stack looking
> for roots.  It walks the host's stack, but not the C stack.
>
> As soon as fullpath is born, it is a candidate for garbage collection, and
> could be reclaimed at any moment.  Therefore, we *have* to call GC_add() to
> inform the Host about our new CharBuf.

You're going to need to register all threads with the host language
for this reason?  Doesn't java need the threads to stop mutating
things, at least to get the roots for tracing?

> If we *don't* call GC_add(), we could potentially get an invalid read error
> while composing the message for THROW.  Say that Host_callback_i doesn't send
> "fullpath" itself into host-space, but instead copies its contents into a host
> string type.  (This is actually what happens in the current KS implementation.)
> Now, say that a GC takes place during Host_callback_i().  Uh-oh, "fullpath" is
> toast -- it was unreferenced, and got reclaimed.  If THROW needs it, we'll be
> reading from freed memory.
>
> Our only remedy is to call GC_add() early on, protecting fullpath until we
> know we don't need it any more.  But calling GC_add() means that we have to
> call GC_remove() somewhere, or we'll leak memory.
>
> That call to GC_remove() at the end was what I meant by "yank C stack vars out
> of that Hash before the C function returns".
>
> Now, it turns out that the same function set up for a refcounting host looks
> pretty similar... eliminate the call to GC_add(), and change the GC_remove() to
> a DECREF():
>
>    void
>    FSFolder_delete(FSFolder *self, const CharBuf *filename)
>    {
>        CharBuf *fullpath = CB_newf("%o/%o", self->path, filename);
>        bool_t result = Host_callback_i(self, "do_delete", 1,
>            ARG_STR("path", fullpath));
>        if (!result) {
>            THROW("failed to delete %o", fullpath);
>        }
>        DECREF(fullpath);
>    }
>
> Perhaps it's possible to somehow have the DECREF() op perform double duty and
> trigger GC_remove() on a garbage collecting host.  I dunno, that seems likely
> to cause double-free problems, but I'll try thinking that through in a future
> post.

This is devilishly tricky stuff!

Mike

Re: Reference counting inside a GC host (was "real time updates")

Posted by Peter Karman <pe...@peknet.com>.
Marvin Humphrey wrote on 3/24/09 7:12 PM:
> On Fri, Mar 20, 2009 at 04:30:21PM -0400, Michael McCandless wrote:
> 
>> So... one alternative would be to separately track a private-to-Lucy
>> refCount from the host object's refCount?  
> 
> Since you refer to the "host object's refCount", I take it the following
> passage applies to refcounted systems like Perl and Python, and not to GC
> systems like Ruby and Java:
> 
>> Then, for Lucy objects that
>> never cross the bridge you wouldn't have to make a "false" host
>> object.  But you'd need to take care to destroy an object when both
>> Lucy's Obj & the host's wrapper obj drop to refCount 1.
> 
> OK, if I understand you correctly, then there would be two refcounts.
> In essence, we intentionally create a circular reference between the Lucy
> object and the host wrapper object.  However, we avoid the memory leak because
> whenever either of the two refcounts falls to one, we see whether the other
> refcount is also at 1 -- and if that's the case, we destroy the object.
> 
> The problem I see here is that we don't necessarily have control over what
> happens during the host refcounting.  In Perl at least, I don't know of a way
> to override what happens during SvREFCNT_dec, because it's not a method.  So
> we can't set up a trigger that fires when the Perl object wrapper's refcount
> falls to 1 -- the only event we can count on is the call to $obj->DESTROY()
> when the perl reference count falls to 0.
> 

I'm chiming in late on this thread, but I want to echo what Marvin says here
about the host refcounting. I tried all the patterns thus far mentioned in the
libswish3 Perl/XS bindings (SWISH::3) and finally (through much trial and error)
ended up with the same pattern that Ferret (and I believe KS) uses: create host
objects at the C boundaries, refcounting only the internal C structs, and then
in the perl DESTROY() method doing something like:

  if (--c_obj->ref_cnt < 1) {
      swish_destroy(c_obj);
  }

I've chosen to mitigate the host object expense by doing all the required heavy
lifting (tight loops, etc) in native C, exposing only those methods that the
Perl user really needs.

>>> The Ferret scheme won't cause problems with light usage of the
>>> library, because most of Lucy's work will be done within tight loops
>>> in the C core.
>> What about a HitCollector in the host language?  Can you efficiently
>> re-use an object from the host language?  (Python has such tricks, eg
>> to re-use a TupleObject during iteration).
>
> Oh, definitely a HitCollector would be a problem.
>
> Restating the original assertion: the Ferret scheme won't cause problems if
> you avoid host-language subclasses.
>

Amen. I had to create a subclassing scheme for SWISH::3 for just that reason.
It's not as elegant as the Boilerplater VTable stuff, but does address the same
problem.

-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com

Re: Reference counting inside a GC host (was "real time updates")

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Fri, Mar 20, 2009 at 04:30:21PM -0400, Michael McCandless wrote:

> So... one alternative would be to separately track a private-to-Lucy
> refCount from the host object's refCount?  

Since you refer to the "host object's refCount", I take it the following
passage applies to refcounted systems like Perl and Python, and not to GC
systems like Ruby and Java:

> Then, for Lucy objects that
> never cross the bridge you wouldn't have to make a "false" host
> object.  But you'd need to take care to destroy an object when both
> Lucy's Obj & the host's wrapper obj drop to refCount 1.

OK, if I understand you correctly, then there would be two refcounts.
In essence, we intentionally create a circular reference between the Lucy
object and the host wrapper object.  However, we avoid the memory leak because
whenever either of the two refcounts falls to one, we see whether the other
refcount is also at 1 -- and if that's the case, we destroy the object.

The problem I see here is that we don't necessarily have control over what
happens during the host refcounting.  In Perl at least, I don't know of a way
to override what happens during SvREFCNT_dec, because it's not a method.  So
we can't set up a trigger that fires when the Perl object wrapper's refcount
falls to 1 -- the only event we can count on is the call to $obj->DESTROY()
when the perl reference count falls to 0.

> This may also be better for non-refCount languages (Java).

How would maintaining a refcount for a Java object work?  Under a tracing GC
system, I think you could say that the refcount is "true" until the object
becomes unreachable, at which point it's "false".  Other than the fact that
the object is reachable, you don't know how many things are pointing at it.
Because of that, there's no way to detect when the only reference left is the
one from the Lucy object.

> > However, that's not a major problem unless we're creating and
> > destroying a boatload of small objects.

---->8 snip 8<----

> I think Lucene has improved here too, especially on the indexing side
> (though the searching side doesn't create too many tiny objects I
> think).

Nothing like indexing.  The only spot I can think of is the term dictionary
index, with all those Term and TermInfo objects.  

In Java, you could address that by turning a 1000-object array of 5-element
TermInfo objects into 5 primitive-type arrays with 1000 slots each.  We want
to do something like that anyway because of flexible indexing.

In KS, we no longer generate a lot of Term/TermInfo objects because that data
structure has been modified for use with mmap.  However, we'll want to mod
that system further for flexible indexing and PFOR.  

I look forward to taking that up in the future.  :)

> > literals, and such in a bootstrap routine, but it's enough of a pain
> > to set something like that up that I haven't gone and made such a
> > change in KS.
> >
> > I'd really like to kill off FastObj just for the sake of simplicity,
> > though.
> 
> What objects are planned to subclass FastObj?

VTable, plus the CharBuf family.  

Right now in KS, the following classes also descend from FastObj:

    VArray 
    Posting
    Token
    Inversion (nee TokenBatch)
    ByteBuf
    ViewByteBuf
    Undefined 
    
However, none of them have the declaration/bootstrapping problem presented by
VTable and CharBuf.  Additionally, I hope to eliminate Token (by expanding the
capabilities of Inversion), Posting is due for an index-time overhaul but
isn't a problem at search-time, Undefined isn't important, and ByteBuf and
ViewByteBuf don't present speed problems.

Rebasing VArray and Inversion (which is a subclass of VArray) to be children
of Obj rather than FastObj causes a performance dip of about 3% on the
indexing benchmark.  I imagine that's as bad as it would be for any binding
because Perl objects are pretty heavy.

> > The Ferret scheme won't cause problems with light usage of the
> > library, because most of Lucy's work will be done within tight loops
> > in the C core.
> 
> What about a HitCollector in the host language?  Can you efficiently
> re-use an object from the host language?  (Python has such tricks, eg
> to re-use a TupleObject during iteration).

Oh, definitely a HitCollector would be a problem.

Restating the original assertion: the Ferret scheme won't cause problems if
you avoid host-language subclasses.

> > Do we need to
> > add all new objects to a giant Hash that we tell the host about, and
> > yank C stack vars out of that Hash before the C function returns?
>
> What does Ruby's C-embedding API expose to interact with its GC?  I
> would imagine it'd be similar to Java's (ie "here's a new global root
> object")?

I understand you can do something like that.

> I don't understand the C stack vars / hash question.

Consider the following routine, set up for tracing garbage collection:

    void
    FSFolder_delete(FSFolder *self, const CharBuf *filename)
    {
        CharBuf *fullpath = (CharBuf*)GC_add(
            CB_newf("%o/%o", self->path, filename));
        bool_t result = Host_callback_i(self, "do_delete", 1, 
            ARG_STR("path", fullpath));
        if (!result) {
            THROW("failed to delete %o", fullpath);
        }
        GC_remove(fullpath);
    }

Assume that the CharBuf formatted constructor CB_newf() caches a Host obj
within the CharBuf, and that no reference counting is performed because we're
going to rely upon the host's tracing garbage collector.  

When "fullpath" is created at the top of our function, it's unreferenced as
far as the Host's garbage collector is concerned.  The only place it exists is
on the C stack -- but the garbage collector doesn't walk the C stack looking
for roots.  It walks the host's stack, but not the C stack.  
    
As soon as fullpath is born, it is a candidate for garbage collection, and
could be reclaimed at any moment.  Therefore, we *have* to call GC_add() to
inform the Host about our new CharBuf.

If we *don't* call GC_add(), we could potentially get an invalid read error
while composing the message for THROW.  Say that Host_callback_i doesn't send
"fullpath" itself into host-space, but instead copies its contents into a host
string type.  (This is actually what happens in the current KS implementation.)
Now, say that a GC takes place during Host_callback_i().  Uh-oh, "fullpath" is
toast -- it was unreferenced, and got reclaimed.  If THROW needs it, we'll be
reading from freed memory.

Our only remedy is to call GC_add() early on, protecting fullpath until we
know we don't need it any more.  But calling GC_add() means that we have to
call GC_remove() somewhere, or we'll leak memory.

That call to GC_remove() at the end was what I meant by "yank C stack vars out
of that Hash before the C function returns".

Now, it turns out that the same function set up for a refcounting host looks
pretty similar... eliminate the call to GC_add(), and change the GC_remove() to
a DECREF():

    void
    FSFolder_delete(FSFolder *self, const CharBuf *filename)
    {
        CharBuf *fullpath = CB_newf("%o/%o", self->path, filename);
        bool_t result = Host_callback_i(self, "do_delete", 1, 
            ARG_STR("path", fullpath));
        if (!result) {
            THROW("failed to delete %o", fullpath);
        }
        DECREF(fullpath);
    }

Perhaps it's possible to somehow have the DECREF() op perform double duty and
trigger GC_remove() on a garbage collecting host.  I dunno, that seems likely
to cause double-free problems, but I'll try thinking that through in a future
post.
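
To make that concrete, one shape this might take -- LUCY_HOST_REFCOUNTED is a
hypothetical compile-time switch, and the refcounted branch just defers to
the Obj_Dec_RefCount() wrapper discussed earlier in the thread:

    #ifdef LUCY_HOST_REFCOUNTED
      /* Refcounting host (Perl, Python): DECREF really decrements. */
      #define DECREF(obj)  Obj_Dec_RefCount((Obj*)(obj))
    #else
      /* Tracing-GC host (Ruby, Java): DECREF merely unregisters the
       * object as a root; the collector reclaims it later. */
      #define DECREF(obj)  GC_remove((Obj*)(obj))
    #endif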

Marvin Humphrey


Re: Reference counting inside a GC host (was "real time updates")

Posted by Michael McCandless <lu...@mikemccandless.com>.
Marvin Humphrey <ma...@rectangular.com> wrote:

>> > There are some quirks, though, with how it manages host objects.
>> > The default behavior is to create a host language object at the
>> > same time as the Boilerplater object, and have the host object
>> > manage the refcount.
>>
>> Hmm, sounds tricky... because there are typically consumers in C
>> and in the Host language and both need to incRef/decRef.
>
> Indeed, we need to accommodate refcounting ops both in the Lucy core
> and in the Host.  For a refcounted Host like Perl or Python, all of
> these ops will affect a single, *unified* refcount which resides in
> the cached Perl/Python object at self->ref.host_obj.  That's what I
> meant by "have the host manage the refcount" -- I wasn't clear
> enough that Lucy would be able to manipulate that host refcount
> using wrapper methods.
>
> The Lucy::Obj header at trunk/core/Lucy/Obj.bp will declare
> Inc_RefCount() and Dec_RefCount() methods:
>
>    /** Increment an object's refcount.
>     *
>     * @return the object, allowing an assignment idiom.
>     */
>    public incremented Obj*
>    Inc_RefCount(Obj *self);
>
>    /** Decrement an object's refcount, calling Destroy() if it hits 0.
>     *
>     * @return the modified refcount.
>     */
>    u32_t
>    Dec_RefCount(Obj *self);
>
> However, no implementation for these methods is provided in
> trunk/core/Lucy/Obj.c.  It will be up to the bindings to provide an
> implementation, or a linking error will occur.
>
> For the Perl bindings, we'll provide a second
> Obj.c at trunk/perl/xs/Lucy/Obj.c which will contain the following:
>
>    lucy_Obj*
>    lucy_Obj_inc_refcount(lucy_Obj *self)
>    {
>        SvREFCNT_inc_simple_void_NN((SV*)self->ref.host_obj);
>        return self;
>    }
>
>    chy_u32_t
>    lucy_Obj_dec_refcount(lucy_Obj *self)
>    {
>        chy_u32_t modified_refcount = SvREFCNT((SV*)self->ref.host_obj) - 1;
>        /* If the SV's refcount falls to 0, DESTROY will be invoked from
>         * Perl-space.
>         */
>        SvREFCNT_dec((SV*)self->ref.host_obj);
>        return modified_refcount;
>    }
>
> That's how most objects in Lucy will be managed.  However, that
> approach isn't ideal for all of them.
>
> The first, obvious objection to caching a host object inside every
> single Lucy object is that it wastes memory for those objects which
> never venture into Host-space; an integer refcount would require
> less overhead.  The "FastObj" class was originally written to
> address this concern.

So... one alternative would be to separately track a private-to-Lucy
refCount from the host object's refCount?  Then, for Lucy objects that
never cross the bridge you wouldn't have to make a "false" host
object.  But you'd need to take care to destroy an object when both
Lucy's Obj & the host's wrapper obj drop to refCount 1.

This may also be better for non-refCount languages (Java).

> However, that's not a major problem unless we're creating and
> destroying a boatload of small objects.  Lucene 1.4.3 was a
> profligate wastrel in this regard, but KinoSearch's basic
> architecture has gotten pretty lean and has room to get leaner
> still.  If memory use and speed were the only reasons to use
> FastObj, I think we could kill it off.

I think Lucene has improved here too, especially on the indexing side
(though the searching side doesn't create too many tiny objects I
think).

> However, there's a second, more annoying problem.  It's not possible
> to declare static structs which contain e.g. a Perl object, because
> all Perl objects are malloc'd at runtime.  That's inconvenient for
> declaring things like CharBuf literals or VTables:
>
>   /* Can't do this unless CharBuf is a subclass of FastObj. */
>   static CharBuf foo = {
>        (VTable*)&CHARBUF,
>        1,      /* ref.count */
>        "foo",  /* character data */
>        3,      /* size */
>        4       /* capacity (includes terminating NULL) */
>   };
>
> It's probably possible to initialize all of our VTables, CharBuf
> literals, and such in a bootstrap routine, but it's enough of a pain
> to set something like that up that I haven't gone and made such a
> change in KS.
>
> I'd really like to kill off FastObj just for the sake of simplicity,
> though.

What objects are planned to subclass FastObj?

>> > I've tried searching the web for resources on how to make
>> > refcounting and GC coexist happily, but I haven't found anything
>> > so far.  If anybody's got strong google-fu today or actually
>> > knows and can recommend some literature, I'm all ears.
>>
>> This is tricky!
>
> There's one scheme that I know will at least work under a tracing
> garbage collector: the one used by Ferret.
>
>    * Within the C portion of Lucy, perform integer refcounting.
>    * Every time a unique host wrapper object is created, increment
>      the refcount.
>    * Every time a host wrapper is destroyed, decrement the
>      refcount.
>
> In other words, for Hosts that use tracing garbage collection, all
> Lucy objects would use an integer refcount, and nobody would cache
> any host objects.

I think, similarly, Java (JNI) lets you add a global reference to a
Java object, which is analogous to incref-ing.  But caching a host
object would be fine?  Why lose that? (I don't know Ferret/Ruby very
well...).

> However, the Ferret approach has a drawback: You create and destroy
> host wrappers every time you cross the host/C boundary.  That'll
> create a performance drag in some situations.

Yeah.

> The Ferret scheme won't cause problems with light usage of the
> library, because most of Lucy's work will be done within tight loops
> in the C core.

What about a HitCollector in the host language?  Can you efficiently
re-use an object from the host language?  (Python has such tricks, eg
to re-use a TupleObject during iteration).

> It also doesn't stop you from attaching host data to the C object
> using the "flyweight" design pattern, a.k.a. the "inside-out object"
> pattern, because you can still key data off of the unchanging C
> object memory address.

OK.

> However, once you start writing subclasses, all that OO overhead at
> the host/C boundary is going to slow down tight loops like
> Scorer_Next.
>
> Caching a host object for the life of the the Lucy object solves
> that problem, but I'm not sure how to do that within the context of
> a tracing garbage collector.
>
> We can assume that the program initiates within Host-space; most
> Lucy objects will be able to trace back to the host.  However,
> independent objects that we create as statics, globals, or C stack
> vars won't be visible to the garbage collector and will get
> reclaimed prematurely.
>
> Any ideas on how to pull off the caching trick?  Is there something
> we can do if we allocate space within all of our Lucy objects for
> *both* an integer refcount and a cached host object?  Do we need to
> add all new objects to a giant Hash that we tell the host about, and
> yank C stack vars out of that Hash before the C function returns?

What does Ruby's C-embedding API expose to interact with its GC?  I
would imagine it'd be similar to Java's (ie "here's a new global root
object")?

I don't understand the C stack vars / hash question.

Mike