You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucy.apache.org by Nick Wellnhofer <we...@aevum.de> on 2014/07/29 01:12:59 UTC

[lucy-dev] Handling of methods excluded via Clownfish

Lucifers,

I’d like to discuss how to handle the case when a Perl class defines a method with the same name as a Clownfish method that is excluded from the bindings (manually or because the return type or parameters can’t be mapped). The current code throws an exception. This might be confusing if a user defines such a method in a subclass without even knowing that the same name is used by Clownfish under the hood.

I think the better approach would be to let Clownfish simply ignore such methods. They would work only in Perl-space and never be called from Clownfish. Since they’re not part of the Perl API, this shouldn’t cause problems. What do you think?

Nick

Re: [lucy-dev] Thread safety of class registry

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Tue, Aug 5, 2014 at 3:19 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> Yes, the special refcount value would be lost as soon as a host object is
> created. But we could use the second least significant bit of cfish_ref_t
> for the immortal flag, regardless of whether it contains a refcount or a
> host object. This would mean to mask the flag to zero whenever the host
> object is accessed but that's a small price to pay.
>
> Then we could check the flag in Obj#Destroy and throw an exception if it's
> set.

+1

More on hiding bits in pointers:

https://www.mikeash.com/pyblog/friday-qa-2013-09-27-arm64-and-you.html
http://www.sealiesoftware.com/blog/archive/2013/09/24/objc_explain_Non-pointer_isa.html

Marvin Humphrey

Re: [lucy-dev] Thread safety of class registry

Posted by Nick Wellnhofer <we...@aevum.de>.

On 04/08/2014 18:09, Marvin Humphrey wrote:
> Using a special value in `ref.count` such as -1 prevents us from caching a
> host object in `ref.host_obj` and thus has the consequence that a Clownfish
> object may be associated with multiple Perl objects during the course of its
> lifetime.  That means we need to account for the fact that the Perl-space
> DESTROY may be called an arbitrary number of times for a given Clownfish
> object.
>
> [...]
>
> So what I'm left with is the unsatisfying suggestion that Obj_make_immortal()
> be implemented in Perl by caching a host object and setting its refcount to
> half of the maximum.  In the absence of additional refcounting bugs, that
> yields immortality in practice.  But if there are refcounting bugs, the
> failure mode is bad: only long-running processes would be affected, making it
> hard to reproduce.

Yes, the special refcount value would be lost as soon as a host object is 
created. But we could use the second least significant bit of cfish_ref_t for 
the immortal flag, regardless of whether it contains a refcount or a host 
object. This would mean to mask the flag to zero whenever the host object is 
accessed but that's a small price to pay.

Then we could check the flag in Obj#Destroy and throw an exception if it's set.

Nick

Re: [lucy-dev] Thread safety of class registry

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Mon, Aug 4, 2014 at 2:42 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> OK, then let's discuss the immortal flag.
>
> To protect against refcount races on the Perl side, objects are shared using
> SvSHARE, right? So if a host object is created and the immortal flag is set,
> it could be shared automatically. This eliminates the need to override
> To_Host.
>
> To set the immortal flag, a simple function like Obj_make_immortal would be
> enough. This should be called immediately after object creation. We wouldn't
> even need a flag but only a special value like -1 in ref.count.
>
> Shouldn't be hard to implement.

Using a special value in `ref.count` such as -1 prevents us from caching a
host object in `ref.host_obj` and thus has the consequence that a Clownfish
object may be associated with multiple Perl objects during the course of its
lifetime.  That means we need to account for the fact that the Perl-space
DESTROY may be called an arbitrary number of times for a given Clownfish
object.

For various reasons (including incompatibility with inside-out member vars),
modern Clownfish avoids this pattern and tries to ensure that only a single
Perl object is ever associated with a Clownfish object during its lifetime --
but let's imagine breaking that discipline for the sake of argument.

For Class, all of whose instances are immortal, we could override Perl-space
DESTROY to do nothing.  (This would preclude using inside-out member vars with
Class, but it isn't designed to be subclassed so that's not a deal-killer.)
But for String, we can't do that -- so what would it mean to call
`Obj_make_immortal(klass->name)`?

What I'd like Obj_make_immortal() to do is cache a host object and set an
"immortal" flag on *that*, yielding the behavior that Perl exhibits with
regards to SVs which are SvIMMORTAL.

    http://perl5.git.perl.org/perl.git/blob/dee28d0dbca2d9719243e3e17392dcd12d011104:/sv.c#l6615

Unfortunately, `SvIMMORTAL` (which is not public) isn't a flag -- membership is
limited to a whitelist of Perl-internal SVs.

    http://perl5.git.perl.org/perl.git/blob/dee28d0dbca2d9719243e3e17392dcd12d011104:/sv.h#l2101

So what I'm left with is the unsatisfying suggestion that Obj_make_immortal()
be implemented in Perl by caching a host object and setting its refcount to
half of the maximum.  In the absence of additional refcounting bugs, that
yields immortality in practice.  But if there are refcounting bugs, the
failure mode is bad: only long-running processes would be affected, making it
hard to reproduce.

Marvin Humphrey

Re: [lucy-dev] Thread safety of class registry

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Tue, Aug 5, 2014 at 2:50 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> On 04/08/2014 22:37, Marvin Humphrey wrote:

>> Could we instead decide that Clone() is not thread safe?  And then in the
>> context of duplicating `klass->name` and the like we'd use some other
>> constructor instead of Clone().
>
> OK with me. It will only be surprising for anyone who really needs to clone
> a String and doesn't know that String is special in that regard.

Yeah, you're right. :\

We have a funny overlap between Clone() and Inc_RefCount().  Both involve
taking ownership of a refcount.  Both involve optimizations around
not-completely independent copies.

I feel like we ought to be able to disentangle this situation, and that when
we finish there should be fewer methods on Obj.  The best answer will probably
reveal itself as we add more host languages.

> It depends on what guarantees LFReg#Clone should make exactly. I think it's
> OK if it returns a clone of the registry in some arbitrary state during the
> cloning process. For example, if a single object is inserted during the
> cloning process, it would be OK to return a clone without that object. The
> registry could change immediately after the final size check anyway.
>
> But my implementation could return a version of the registry in a completely
> different state. If, during cloning, object A is inserted first and then
> object B, a version with only object B could be returned. This is clearly a
> bug, so let's skip that commit.
>
>> Iteration, in contrast, simply cannot defend against concurrent
>> modifications without locking.  And therefore this implementation suffices.
>> Very nice, actually!
>
> Iteration suffers from the same problem mentioned above. This could, for
> example, lead to finding a class but not its parent.
>
> This is all more complicated than I initially thought, so I'll leave LFReg
> as is.

Check out the caveats in the docs on iteration with Java's ConcurrentHashMap
or enumeration with C#'s ConcurrentDictionary.  It doesn't matter what kind of
engineering resources you throw at this problem -- even exemplary
implementations are incredibly hard to reason about and use safely.

"All shared state immutable and all mutable state thread-private" ought to be
an acceptable model of concurrency for Clownfish.  Abstracting a common
interface which generalizes the wildly varying multi-threading capabilities of
various hosts is not feasible in any case.

I'm content to have LockFreeRegistry remain a private implementation detail of
the Clownfish runtime forever -- exposing it as a public API is not central to
the project's mission.

>> One thing to think about is that if we add interfaces to Clownfish, we'll
>> have to normalize the various signatures of Next(), perhaps necessitating
>> that we access key and value via Get_Key() and Get_Value().  But that
>> applies to several other iterators as well, so shouldn't block this commit.
>
> +1 for Get_Key() and Get_Value(). I'm also for separate Iterator objects in
> general. But let's ignore that for now.

The iterator you wrote for LockFreeRegistry could easily be adapted for Hash.
I talked to my colleague Tim Wilkens about that today and it looks like he's
interested in picking up that task.

Marvin Humphrey

Re: [lucy-dev] Thread safety of class registry

Posted by Nick Wellnhofer <we...@aevum.de>.

On 04/08/2014 22:37, Marvin Humphrey wrote:
> On Mon, Aug 4, 2014 at 2:42 AM, Nick Wellnhofer <we...@aevum.de> wrote:
>> - Make String#Clone create a new String object
>>
>> This removes an optimization that immutable strings made possible. But
>> especially with regard to threads, I think it's important that String#Clone
>> really creates a new object and doesn't just increment the refcount.
>
> This is somewhat unfortunate because under hosts with tracing GC the
> optimization remains valid.  And of course the optimization remains valid
> within single threaded execution.  I liked what you had before!
>
> Could we instead decide that Clone() is not thread safe?  And then in the
> context of duplicating `klass->name` and the like we'd use some other
> constructor instead of Clone().

OK with me. It will only be surprising for anyone who really needs to clone a 
String and doesn't know that String is special in that regard.

>> - Implement Method#Clone
>> - Make sure to clone a Class's methods
>>
>> Not strictly needed right now but a good precaution.
>
> This change to Method_new() isn't safe if someone passes in a StackString,
> right?
>
> -    self->name          = Str_Clone(name);
> +    self->name          = (String*)INCREF(name);

No, it's safe. String#Inc_RefCount returns a new object for "wrapper" strings 
that don't own the char buffer (string->origin == NULL). StackStrings are 
always wrappers. (This also means that Inc_RefCount has to stay a method even 
with the immortal flag.)

> With regards to the other commit about cloning a Class's methods, it seems
> like an appropriate fix for the current implementation.  (Ultimately, I
> would prefer to use a raw Method** array rather than a VArray here in order to
> guarantee immutability.)

OK, let's use a raw array.

>> - Implement LFReg#Clone
>> - Implement iterator for LockFreeRegistry
>
> Since LockFreeRegistry only supports addition of elements and not replacement
> or removal, it is possible to implement Clone() with a lock-free algorithm
> that it is truly threadsafe.
>
> *   Iterate through all elements, counting the number of iterations.
> *   When finished, iterate again to confirm that no elements were added by
>      other threads while the cloning was underway, by ensuring that the total
>      number of elements hasn't changed.
> *   If the count *has* changed, iterate again to find and insert any missing
>      elements.
>
> The second count-only iteration can be avoided if we modify LockFreeRegistry
> to track its size.
>
> If we are going to implement functionality to duplicate a LockFreeRegistry, I
> think that the implementation should be threadsafe.  This is the only
> threadsafe class we support which has mutable data aside from refcounts, so
> the extra care is justified.

Good point.

It depends on what guarantees LFReg#Clone should make exactly. I think it's OK 
if it returns a clone of the registry in some arbitrary state during the 
cloning process. For example, if a single object is inserted during the 
cloning process, it would be OK to return a clone without that object. The 
registry could change immediately after the final size check anyway.

But my implementation could return a version of the registry in a completely 
different state. If, during cloning, object A is inserted first and then 
object B, a version with only object B could be returned. This is clearly a 
bug, so let's skip that commit.

> Iteration, in contrast, simply cannot defend against concurrent modifications
> without locking.  And therefore this implementation suffices.  Very nice,
> actually!

Iteration suffers from the same problem mentioned above. This could, for 
example, lead to finding a class but not its parent.

This is all more complicated than I initially thought, so I'll leave LFReg as is.

> One thing to think about is that if we add interfaces to Clownfish, we'll have
> to normalize the various signatures of Next(), perhaps necessitating that we
> access key and value via Get_Key() and Get_Value().  But that applies to
> several other iterators as well, so shouldn't block this commit.

+1 for Get_Key() and Get_Value(). I'm also for separate Iterator objects in 
general. But let's ignore that for now.

Nick

Re: [lucy-dev] Thread safety of class registry

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Mon, Aug 4, 2014 at 2:42 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> - Make String#Clone create a new String object
>
> This removes an optimization that immutable strings made possible. But
> especially with regard to threads, I think it's important that String#Clone
> really creates a new object and doesn't just increment the refcount.

This is somewhat unfortunate because under hosts with tracing GC the
optimization remains valid.  And of course the optimization remains valid
within single threaded execution.  I liked what you had before!

Could we instead decide that Clone() is not thread safe?  And then in the
context of duplicating `klass->name` and the like we'd use some other
constructor instead of Clone().

> - Special case String hash keys
>
> To make up for the commit above.

The principle of special-casing String keys in Hash for the sake of
optimization is sound because it's the truly common case.  But if we don't
make the commit above this commit may not be necessary.

> - Implement Method#Clone
> - Make sure to clone a Class's methods
>
> Not strictly needed right now but a good precaution.

This change to Method_new() isn't safe if someone passes in a StackString,
right?

-    self->name          = Str_Clone(name);
+    self->name          = (String*)INCREF(name);

Aside from that, the commit adding Method_Clone() seems fine in principle.

With regards to the other commit about cloning a Class's methods, it seems
like an appropriate fix for the current implementation.  (Ultimately, I
would prefer to use a raw Method** array rather than a VArray here in order to
guarantee immutability.)

> - Implement LFReg#Clone
> - Implement iterator for LockFreeRegistry

Since LockFreeRegistry only supports addition of elements and not replacement
or removal, it is possible to implement Clone() with a lock-free algorithm
that it is truly threadsafe.

*   Iterate through all elements, counting the number of iterations.
*   When finished, iterate again to confirm that no elements were added by
    other threads while the cloning was underway, by ensuring that the total
    number of elements hasn't changed.
*   If the count *has* changed, iterate again to find and insert any missing
    elements.

The second count-only iteration can be avoided if we modify LockFreeRegistry
to track its size.

If we are going to implement functionality to duplicate a LockFreeRegistry, I
think that the implementation should be threadsafe.  This is the only
threadsafe class we support which has mutable data aside from refcounts, so
the extra care is justified.

Iteration, in contrast, simply cannot defend against concurrent modifications
without locking.  And therefore this implementation suffices.  Very nice,
actually!

One thing to think about is that if we add interfaces to Clownfish, we'll have
to normalize the various signatures of Next(), perhaps necessitating that we
access key and value via Get_Key() and Get_Value().  But that applies to
several other iterators as well, so shouldn't block this commit.

Marvin Humphrey

[lucy-dev] Thread safety of class registry

Posted by Nick Wellnhofer <we...@aevum.de>.

On 04/08/2014 04:50, Marvin Humphrey wrote:
> On Sun, Aug 3, 2014 at 3:20 PM, Nick Wellnhofer <we...@aevum.de> wrote:
>> But I never understood exactly why the refcount hack is needed at all.
>
> These days, the custom refcounting methods on Class, HashTombStone and
> RawPosting faciliate two objectives: compatibility with threads, and speed.
>
> For the sake of compatibility with threads, we made two changes a while back:
>
> *   Eliminate refcounting of Class ("VTable" at the time).
> *   Use LockFreeRegistry instead of Hash for the Clownfish class registry.
>
> After those changes, you still couldn't share Clownfish-based objects across
> threads, but basic object creation and destruction were now possible without
> refcount race conditions on elements of the global OO infrastructure.

OK, then let's discuss the immortal flag.

To protect against refcount races on the Perl side, objects are shared using 
SvSHARE, right? So if a host object is created and the immortal flag is set, 
it could be shared automatically. This eliminates the need to override To_Host.

To set the immortal flag, a simple function like Obj_make_immortal would be 
enough. This should be called immediately after object creation. We wouldn't 
even need a flag but only a special value like -1 in ref.count.

Shouldn't be hard to implement.

Then I'd like to salvage most commits from the clone_class_registry branch:

- Make String#Clone create a new String object

This removes an optimization that immutable strings made possible. But 
especially with regard to threads, I think it's important that String#Clone 
really creates a new object and doesn't just increment the refcount.

- Special case String hash keys

To make up for the commit above.

- Implement Method#Clone
- Make sure to clone a Class's methods

Not strictly needed right now but a good precaution.

- Implement LFReg#Clone
- Implement iterator for LockFreeRegistry

Especially the latter might be useful later.

Nick

Re: [lucy-dev] Handling of methods excluded via Clownfish

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Sun, Aug 3, 2014 at 3:20 PM, Nick Wellnhofer <we...@aevum.de> wrote:
> But I never understood exactly why the refcount hack is needed at all.

These days, the custom refcounting methods on Class, HashTombStone and
RawPosting faciliate two objectives: compatibility with threads, and speed.

For the sake of compatibility with threads, we made two changes a while back:

*   Eliminate refcounting of Class ("VTable" at the time).
*   Use LockFreeRegistry instead of Hash for the Clownfish class registry.

After those changes, you still couldn't share Clownfish-based objects across
threads, but basic object creation and destruction were now possible without
refcount race conditions on elements of the global OO infrastructure.

Eliminating refcounting on Class objects also had the side effect of saving a
handful of CPU cycles.  Similarly, we save a small amount of time and effort
by not refcounting the HashTombStone singleton.

Lastly, RawPosting instances are allocated using a MemoryPool rather than
malloc(), which allows us to bulk-deallocate groups of them in a single call
to MemPool_Release().  (This is possible only because we use RawPostings in a
closed environment and do not allow any references to escape.)  Deallocating
lots of small objects can be expensive because of the bookkeeping involved in
a heap memory allocator.  However, there are alternative techniques we could
use to achieve similar optimizations, and the MemPool approach has saved less
time than I would have hoped anyway.

Marvin Humphrey

Re: [lucy-dev] Handling of methods excluded via Clownfish

Posted by Nick Wellnhofer <we...@aevum.de>.

On Aug 3, 2014, at 21:50 , Marvin Humphrey <ma...@rectangular.com> wrote:

> Hmm.  I'm sorry but I'm not a fan of this approach, which throws away hard-won
> invariants in the current codebase and conflicts with the philosophy which
> undergirds the class registry: our OO infrastructure is thread-safe even
> though most objects are not.
> 
> We've gone to a lot of trouble to guarantee that Class objects will always be
> true global singletons -- that there will always be one and only one Class
> object per process for a given class, across all threads.  If we can no longer
> count on that, reasoning about concurrency becomes much harder.  We are
> precluded from doing things like returning an object from a child thread to a
> parent thread:
> 
> 1.  Child thread spawned.
> 2.  FooJr class initialized in child thread.
> 3.  FooJr object returned to parent thread.
> 4.  FooJr class initialized in parent thread.
> 5.  Now FooJr_Equals() fails, FooJr_Is_A() fails.
> 
> There are probably other subtle bugs lurking.  Non-constant global variables
> are evil.  Singleton initialization is a notorious problem area.
> 
> Meanwhile, there is a competing approach (per-object flags) which might not
> only address these thread-safety issues, but allow us to remove several
> methods from Obj.
> 
> I'm comfortable with making the Lucy 0.4.0 release without these thread-safety
> fixes, and I was willing to put off discussing per-object flags at your
> request.  I think it's important to evaluate that approach before
> proceeding... but can we agree to do nothing for the time being?

Of course, we can agree on that. I’d only like to add that I see this approach simply as a Perl-only fix. Other host languages should stick with a single per-process class registry. I don’t see a problem with using different approaches to threading for different languages.

Perl’s threading model is a bit special and it’s extremely hard to make it play nicely with objects that wrap C pointers. For now, I’d be happy with a solution that allows to use Clownfish objects in a completely isolated way within a Perl thread. Cloning the class registry is a straight-forward solution and I don’t see how it would throw anything away or change the guarantees we make for other host languages. It also requires no changes to other parts of the code and doesn’t cause any performance loss.

I haven’t thought it through yet, but I think with a cloned class registry, we don’t need any special handling of refcounts of immortal objects. So we might be able to remove the refcount methods as well. But I never understood exactly why the refcount hack is needed at all.

Nick

Re: [lucy-dev] Handling of methods excluded via Clownfish

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Sun, Aug 3, 2014 at 9:12 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> On Aug 3, 2014, at 11:41 , Nick Wellnhofer <we...@aevum.de> wrote:
>
>> Another approach is to clone the class registry for every Perl thread. Something like that should work:
>>
>> - Store the class registry in a Perl global. This would mean we need a
>>   host-specific function to fetch the registry.
>> - Define the special CLONE method which is invoked whenever Perl creates
>>   a new thread. This method would clone the registry and store it in the
>>   new thread’s version of the global variable.
>>
>> Then we wouldn’t have to care about thread-safety of refcount handling at all.
>
> I implemented this approach in branch ‘clone_class_registry’.

Hmm.  I'm sorry but I'm not a fan of this approach, which throws away hard-won
invariants in the current codebase and conflicts with the philosophy which
undergirds the class registry: our OO infrastructure is thread-safe even
though most objects are not.

We've gone to a lot of trouble to guarantee that Class objects will always be
true global singletons -- that there will always be one and only one Class
object per process for a given class, across all threads.  If we can no longer
count on that, reasoning about concurrency becomes much harder.  We are
precluded from doing things like returning an object from a child thread to a
parent thread:

1.  Child thread spawned.
2.  FooJr class initialized in child thread.
3.  FooJr object returned to parent thread.
4.  FooJr class initialized in parent thread.
5.  Now FooJr_Equals() fails, FooJr_Is_A() fails.

There are probably other subtle bugs lurking.  Non-constant global variables
are evil.  Singleton initialization is a notorious problem area.

Meanwhile, there is a competing approach (per-object flags) which might not
only address these thread-safety issues, but allow us to remove several
methods from Obj.

I'm comfortable with making the Lucy 0.4.0 release without these thread-safety
fixes, and I was willing to put off discussing per-object flags at your
request.  I think it's important to evaluate that approach before
proceeding... but can we agree to do nothing for the time being?

Marvin Humphrey

Re: [lucy-dev] Handling of methods excluded via Clownfish

Posted by Nick Wellnhofer <we...@aevum.de>.

On Aug 3, 2014, at 11:41 , Nick Wellnhofer <we...@aevum.de> wrote:

> Another approach is to clone the class registry for every Perl thread. Something like that should work:
> 
> - Store the class registry in a Perl global. This would mean we need a
>  host-specific function to fetch the registry.
> - Define the special CLONE method which is invoked whenever Perl creates
>  a new thread. This method would clone the registry and store it in the
>  new thread’s version of the global variable.
> 
> Then we wouldn’t have to care about thread-safety of refcount handling at all.

I implemented this approach in branch ‘clone_class_registry’.

Nick

Re: [lucy-dev] Handling of methods excluded via Clownfish

Posted by Nick Wellnhofer <we...@aevum.de>.

On Aug 2, 2014, at 20:24 , Marvin Humphrey <ma...@rectangular.com> wrote:

> Agreed about mutability being OK during initalization, but what I'm referring
> to are member variables:
> 
> *   Clownfish::Class => "name" (String)
> *   Clownfish::Class => "methods" (VArray)
> *   Clownfish::Method => "name" (String)
> *   Clownfish::Method => "host_alias" (String)
> 
> Under refcounting hosts, the refcounts of all of those objects are vulnerable
> to race bugs under multi-threading.
> 
> Under Perl we also have the cached host object to think about; that won't be a
> concern under e.g. Python because Clownfish objects will inherit from Python's
> `object` rather than cache a host object.  Perl is probably an odd case here.
> 
> Since the refcounts on Class and Method objects are held constant, they are
> not themselves vulnerable to refcount race conditions -- but that's not true
> for member variables whose refcounts are not held constant.

Ah, the refcounts.

> In addition, the "methods" VArray is a concern because unlike the Strings, its
> content is mutable.
> 
> I think the most straightforward way to address this issue would be to use
> raw arrays instead of Clownfish types, with `char*` in place of String
> and `Method**` in place of the VArray.  That would be annoying, though,
> for frequently accessed members like `name`.
> 
> Perhaps a better option would be to control refcounting with a flag.
> 
>        if (!(obj->ref.count & CFISH_OBJ_IS_IMMORTAL)) {
>            ...
>        }

This would also mean that “immortal” Perl host objects must be SvSHAREd, right? Otherwise, manipulating the Perl refcount wouldn’t be thread-safe.

> If we do that, then we don't need refcounting to go through method calls any
> more.  The only reason we've ever needed to override the refcounting methods
> is to thwart refcounting (Class, Method, HashTombStone, RawPosting).

Another approach is to clone the class registry for every Perl thread. Something like that should work:

- Store the class registry in a Perl global. This would mean we need a
  host-specific function to fetch the registry.
- Define the special CLONE method which is invoked whenever Perl creates
  a new thread. This method would clone the registry and store it in the
  new thread’s version of the global variable.

Then we wouldn’t have to care about thread-safety of refcount handling at all.

Nick

Re: [lucy-dev] Handling of methods excluded via Clownfish

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Sat, Aug 2, 2014 at 6:00 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> On Aug 1, 2014, at 18:34 , Marvin Humphrey <ma...@rectangular.com> wrote:
>
>> It would be great if we could replace To_Host() with an ordinary subroutine
>> and remove it as a method from Obj.
>
> Can we postpone this and simply remove the void* mapping for now?

No objections here.  Thanks for digging up that snippet!

>> (Hmm, it occurs to me that other global classes such as Clownfish::Method may
>> also need SvSHARE enabled.  Also, using non-threadsafe classes such as String
>> and VArray within Class and Method is probably a bug.)
>
> Yes, it looks like Clownfish::Method needs SvSHARE. The use of
> non-threadsafe classes shouldn’t be a problem if data is modified only
> during initialization. Afterwards, Class and Method should be treated as
> immutable.

Agreed about mutability being OK during initalization, but what I'm referring
to are member variables:

*   Clownfish::Class => "name" (String)
*   Clownfish::Class => "methods" (VArray)
*   Clownfish::Method => "name" (String)
*   Clownfish::Method => "host_alias" (String)

Under refcounting hosts, the refcounts of all of those objects are vulnerable
to race bugs under multi-threading.

Under Perl we also have the cached host object to think about; that won't be a
concern under e.g. Python because Clownfish objects will inherit from Python's
`object` rather than cache a host object.  Perl is probably an odd case here.

Since the refcounts on Class and Method objects are held constant, they are
not themselves vulnerable to refcount race conditions -- but that's not true
for member variables whose refcounts are not held constant.

In addition, the "methods" VArray is a concern because unlike the Strings, its
content is mutable.

I think the most straightforward way to address this issue would be to use
raw arrays instead of Clownfish types, with `char*` in place of String
and `Method**` in place of the VArray.  That would be annoying, though,
for frequently accessed members like `name`.

Perhaps a better option would be to control refcounting with a flag.

        if (!(obj->ref.count & CFISH_OBJ_IS_IMMORTAL)) {
            ...
        }

If we do that, then we don't need refcounting to go through method calls any
more.  The only reason we've ever needed to override the refcounting methods
is to thwart refcounting (Class, Method, HashTombStone, RawPosting).

Marvin Humphrey

Re: [lucy-dev] Handling of methods excluded via Clownfish

Posted by Nick Wellnhofer <we...@aevum.de>.

On Aug 1, 2014, at 18:34 , Marvin Humphrey <ma...@rectangular.com> wrote:

> It would be great if we could replace To_Host() with an ordinary subroutine
> and remove it as a method from Obj.

Can we postpone this and simply remove the void* mapping for now?

> (Hmm, it occurs to me that other global classes such as Clownfish::Method may
> also need SvSHARE enabled.  Also, using non-threadsafe classes such as String
> and VArray within Class and Method is probably a bug.)

Yes, it looks like Clownfish::Method needs SvSHARE. The use of non-threadsafe classes shouldn’t be a problem if data is modified only during initialization. Afterwards, Class and Method should be treated as immutable.

Nick

Re: [lucy-dev] Handling of methods excluded via Clownfish

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Thu, Jul 31, 2014 at 12:26 PM, Nick Wellnhofer <we...@aevum.de> wrote:

> OTOH, do we have to expose To_Host at all? It doesn't make sense to call
> To_Host from a host language since it will simply return the same object.
> And overriding it in the host language would result in a endless recursive
> call.

It would be great if we could replace To_Host() with an ordinary subroutine
and remove it as a method from Obj.

Here's the implementation in the Perl bindings for Obj, which would be serve
as the basis of the new subroutine:

    http://s.apache.org/hE8

There are four custom implementations of To_Host().

Clownfish::Class and Clownfish::LockFreeRegistry need to set a flag to enable
sharing across Perl ithreads:

    http://s.apache.org/Yqf
    http://s.apache.org/p4L

(Hmm, it occurs to me that other global classes such as Clownfish::Method may
also need SvSHARE enabled.  Also, using non-threadsafe classes such as String
and VArray within Class and Method is probably a bug.)

Clownfish::Err needs to set a flag to enable stringification overloading:

    http://s.apache.org/ve

Lucy::Document::Doc needs to set a flag to enable hashref overloading:

    http://s.apache.org/rH

The problem is that these flags are set on the *outer* RV (in at least some
versions of Perl), while Clownfish only keeps a copy of the inner object SV.

We could evaluate a return to caching the RV in Clownfish objects rather than
the inner object SV -- to_host() would then be implemented with
`newSVsv(obj->ref.host_obj)` rather than `newRV_inc(obj->ref.host_obj)`.  This
would allow Class, LockFreeRegistry, Err and Doc to get at an outer RV and set
those flags during object construction, which would be propagated via newSVsv.

Marvin Humphrey

Re: [lucy-dev] Handling of methods excluded via Clownfish

Posted by Nick Wellnhofer <we...@aevum.de>.

On Jul 31, 2014, at 19:00 , Nick Wellnhofer <we...@aevum.de> wrote:

> On Jul 31, 2014, at 17:16 , Marvin Humphrey <ma...@rectangular.com> wrote:
> 
>> However, we'll need to figure out what to do about To_Host() first.
> 
> Defining a new type cfish_host_obj_t in cfish_hostdefs.h for host language objects looks like the sanest approach to me. With such a type, CFC could create the appropriate mappings easily. This would also allow us to remove a few casts to SV* in XSBind.c.

OTOH, do we have to expose To_Host at all? It doesn't make sense to call To_Host from a host language since it will simply return the same object. And overriding it in the host language would result in a endless recursive call.

Nick

Re: [lucy-dev] Handling of methods excluded via Clownfish

Posted by Nick Wellnhofer <we...@aevum.de>.

On Jul 31, 2014, at 17:16 , Marvin Humphrey <ma...@rectangular.com> wrote:

> However, we'll need to figure out what to do about To_Host() first.

Defining a new type cfish_host_obj_t in cfish_hostdefs.h for host language objects looks like the sanest approach to me. With such a type, CFC could create the appropriate mappings easily. This would also allow us to remove a few casts to SV* in XSBind.c.

Nick

Re: [lucy-dev] Handling of methods excluded via Clownfish

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Tue, Jul 29, 2014 at 3:14 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> There is also a minor inconsistency in the functions that map a method's
> return type for XSUBs and callbacks. In CFCPerlTypeMap_to_perl which is used
> for XSUBs we have this piece of code:
>
>     else if (CFCType_is_composite(type)) {
>         if (strcmp(type_str, "void*") == 0) {
>             // Assume that void* is a reference SV -- either a hashref or an
>             // arrayref.
>             result = CFCUtil_sprintf("newRV_inc((SV*)%s)", cf_var);
>         }
>     }
>
> The code in CFCPerlMethod_callback_def doesn't support 'void*' return types.
> Assuming that void pointers are a reference SV looks a bit fragile to me.
> Are there any places where we rely on this behavior?

Your analysis is dead on.

That snippet is an artifact of a dreadful hack I thought I had eliminated a
while ago.  Once upon a time, the autogenerated XS binding for
LUCY_Doc_Get_Fields depended on that hack, but we now supply a hand-rolled XS
binding so it's no longer an issue.

However, it turns out that disabling that block prevents autogenerated XS
bindings from being created for the following methods, visible by diffing the
autogenerated files Clownfish.xs and Lucy.xs before and after.

    CFISH_Obj_To_Host
    CFISH_Class_To_Host
    CFISH_Err_To_Host
    CFISH_LFReg_To_Host

    LUCY_Doc_To_Host
    LUCY_MemPool_Grab
    LUCY_SortCache_Get_Ords

Calling either MemPool#grab or SortCache#get_ords from Perl is going to
explode when the autogenerated XS binding calls newRV_inc() on their return
values -- illustrating why we need to give this hack the heave-ho.

However, we'll need to figure out what to do about To_Host() first.

Marvin Humphrey

Re: [lucy-dev] Handling of methods excluded via Clownfish

Posted by Nick Wellnhofer <we...@aevum.de>.

On 29/07/2014 04:44, Marvin Humphrey wrote:
> Hmm, this would mean that an object can have two distinct methods with the
> same name which get invoked in different contexts: in C-space, one method
> gets called, while in Perl-space, another one gets called.
>
> I feel like that makes it harder to reason about the class -- when you talk
> about calling "do_stuff" on an object, which "do_stuff" are you referring to?
>
> It wouldn't be confusing to have two different methods for different contexts
> if they were conspicuously scope-limited by a `private` qualifier or something
> similar.  But instead what differentiates these methods from other methods
> might be quite subtle: returning `char*`, or accepting an argument of type
> `void*`.
>
> Another item worth noting is that the proposed behavior change optimizes for a
> different failure mode.  The current behavior protects inquisitive hackers
> attempting to override a method they "shouldn't" know about from silent
> failure.  The proposed change protects innocents who don't know about a hidden
> method from noisy failure.
>
> Can you provide a concrete example?

An example would be a Clownfish class that implements a helper function that 
can't be mapped to Perl for some reasons. If a Perl subclass tries to 
implement the same helper, things will fail.

I agree that it's confusing that methods would live in different contexts just 
because of their signature. To fix this, I propose to force methods to be 
declared private if they can't be mapped to Perl. If a non-private method 
can't be mapped, the Clownfish compiler would die with an error message 
suggesting to make the method private.

All these issues aren't important for a Clownfish release. I created a new 
branch 'overridden_aliases' yesterday to fix some issues when overriding 
aliased methods from Perl. There are similar issues when overriding excluded 
methods. The current implementation allows to override private or excluded 
methods. This is something I want to fix before release. The question above 
just popped up when looking at the code.

There is also a minor inconsistency in the functions that map a method's 
return type for XSUBs and callbacks. In CFCPerlTypeMap_to_perl which is used 
for XSUBs we have this piece of code:

     else if (CFCType_is_composite(type)) {
         if (strcmp(type_str, "void*") == 0) {
             // Assume that void* is a reference SV -- either a hashref or an
             // arrayref.
             result = CFCUtil_sprintf("newRV_inc((SV*)%s)", cf_var);
         }
     }

The code in CFCPerlMethod_callback_def doesn't support 'void*' return types. 
Assuming that void pointers are a reference SV looks a bit fragile to me. Are 
there any places where we rely on this behavior?

Nick

Re: [lucy-dev] Handling of methods excluded via Clownfish

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Mon, Jul 28, 2014 at 4:12 PM, Nick Wellnhofer <we...@aevum.de> wrote:
> I’d like to discuss how to handle the case when a Perl class defines a
> method with the same name as a Clownfish method that is excluded from the
> bindings (manually or because the return type or parameters can’t be
> mapped). The current code throws an exception. This might be confusing if a
> user defines such a method in a subclass without even knowing that the same
> name is used by Clownfish under the hood.
>
> I think the better approach would be to let Clownfish simply ignore such
> methods. They would work only in Perl-space and never be called from
> Clownfish. Since they’re not part of the Perl API, this shouldn’t cause
> problems. What do you think?

Hmm, this would mean that an object can have two distinct methods with the
same name which get invoked in different contexts: in C-space, one method
gets called, while in Perl-space, another one gets called.

I feel like that makes it harder to reason about the class -- when you talk
about calling "do_stuff" on an object, which "do_stuff" are you referring to?

It wouldn't be confusing to have two different methods for different contexts
if they were conspicuously scope-limited by a `private` qualifier or something
similar.  But instead what differentiates these methods from other methods
might be quite subtle: returning `char*`, or accepting an argument of type
`void*`.

Another item worth noting is that the proposed behavior change optimizes for a
different failure mode.  The current behavior protects inquisitive hackers
attempting to override a method they "shouldn't" know about from silent
failure.  The proposed change protects innocents who don't know about a hidden
method from noisy failure.

Can you provide a concrete example?

Marvin Humphrey

Re: [lucy-dev] Handling of methods excluded via Clownfish

Posted by Peter Karman <pe...@peknet.com>.

Nick Wellnhofer wrote on 7/28/14 7:12 PM:
> Lucifers,
> 
> I’d like to discuss how to handle the case when a Perl class defines a method
> with the same name as a Clownfish method that is excluded from the bindings
> (manually or because the return type or parameters can’t be mapped). The
> current code throws an exception. This might be confusing if a user defines
> such a method in a subclass without even knowing that the same name is used
> by Clownfish under the hood.
> 
> I think the better approach would be to let Clownfish simply ignore such
> methods. They would work only in Perl-space and never be called from
> Clownfish. Since they’re not part of the Perl API, this shouldn’t cause
> problems. What do you think?

I tend to agree with Marvin on this. If I spelunk the code and find a method I
want to override, and then can't figure out why Perl never calls it but C does,
I will tear what little hair I have left out trying to debug. POLS.

It's not unusual IME to have a superclass with private methods that are reserved
to the superclass. Often those methods have some kind of naming convention that
is documented, even if the methods themselves are undocumented, so that users
who extend the class know to stay clear of such names.

Example:

 Clownfish reserves private function and method names with the [[whatever]]
 prefix, so avoid those names in your subclass method names.

That's usually sufficient, IMO, as a caveat.

Even if we didn't/don't have naming prefix conventions, I would keep the
existing behavior of croaking if someone does hit on a name. That would be a
clear message that the name is reserved and to pick a different one.


-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com