You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucy.apache.org by Nick Wellnhofer <we...@aevum.de> on 2013/05/02 17:12:57 UTC

Re: [lucy-dev] Proposal for implementation of immutable strings

On 30/04/2013 02:35, Marvin Humphrey wrote:
> I don't know of any other reference counting mechanisms with such a behavior.
> For that reason, I have a mild preference for a novel name.

+1 for CAPTURE/RELEASE, then.

> For String, though, it seems like one internal encoding matching the primary
> host encoding ought to suffice.

This should work. But for the Python bindings, it would mean to use the 
Python encoding for every Clownfish string, even if it isn't passed to 
the Python side at all. And what about the on-disk index? Since it 
probably should always use UTF-8, most strings passed to and from Python 
have to be reencoded anyway. So we could as well use UTF-8 for all 
Clownfish Strings.

Nick

Re: [lucy-dev] Proposal for implementation of immutable strings

Posted by Nick Wellnhofer <we...@aevum.de>.

On May 9, 2013, at 04:28 , Marvin Humphrey <ma...@rectangular.com> wrote:

> I coded up a Perl benchmark script using Inline::C which approximates the
> optimization under best-case conditions. (See below my sig.)
> 
> Here are results on one system:
> 
>                   Rate heap_latin1 heap_utf8 stack_latin1 stack_utf8
> heap_latin1   4591346/s          --       -6%         -67%       -69%
> heap_utf8     4906438/s          7%        --         -65%       -66%
> stack_latin1 13849117/s        202%      182%           --        -5%
> stack_utf8   14588883/s        218%      197%           5%         --

I get a speedup of about 130% on OS X, and 90% on a Linux system.

> That's not nothing, but it's not a 10x speedup either.  Perl's subroutine call
> overhead is pretty substantial.  Other languages may be better, but I bet
> Python and Ruby are comparably sluggish.

If a host language has immutable strings, it might be possible to retain a reference to the host language string. So the string buffer would never have to be copied.

> In the abstract, I like stack-allocated host string wrappers because the
> approach is appropos to the Clownfish mission.  But if they require sacrifices
> to the API (CAPTURE is inferior to INCREF), a gnarly implementation,
> and they don't speed things up all that much anyway, it may be time to give up
> on them and focus on other things.

> If we start translating to string SVs to Clownfish Strings the way that we
> translate AVs to VArrays and HVs to Hashes, we'll need to mortalize them so
> they don't leak on exceptions.
> 
> From XSBind_maybe_sv_to_cfish_obj:
> 
>            if (retval) {
>                // Mortalize the converted object -- which is somewhat
>                // dangerous, but is the only way to avoid requiring that the
>                // caller take responsibility for a refcount.
>                SV *mortal = (SV*)Cfish_Obj_To_Host(retval);
>                CFISH_DECREF(retval);
>                sv_2mortal(mortal);
>            }
> 
> That will make things slower still, but it's probably worth it for the sake of
> correctness.


Good point. I'd say we keep this optimization and switch to CAPTURE/RELEASE. I was only hoping we could find a way to reduce the amount of copying.

By the way, we'll have to rename Lock#Release and maybe MemoryPool#Release_All and FileHandle#Release_Window to avoid confusion.

Nick

Re: [lucy-dev] Proposal for implementation of immutable strings

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Tue, May 7, 2013 at 6:50 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> So in many cases a "zombie" string has to be copied anyway defeating the
> whole optimization (even costing a bit).

Any speedup is going to be dependent on a number of factors.

*   Whether the subroutine being invoked requires that the string be copied.
*   The cost ratio between host-subroutine-call overhead and duplicating the
    content with a heap-allocated Clownfish String.
*   Whether the host string encoding matches the Clownfish String native
    encoding.  (Perl scalars without the SVf_UTF8 flag set must be checked for
    whether they are ASCII and possibly converted from Latin-1 to UTF-8).

> But, as you say, it would be nice if this could be measured.

This stack-allocated-string mechanism sure does suck up a lot of developer
oxygen. :P  It's primarily justified as an optimization, and it's only affects
the speed of code which crosses the host/clownfish border.  Host code is
always going to be slow; optimizing pure Clownfish-flavored C is more
important, and stack-allocated string wrappers have no effect on that.

I coded up a Perl benchmark script using Inline::C which approximates the
optimization under best-case conditions. (See below my sig.)

Here are results on one system:

                   Rate heap_latin1 heap_utf8 stack_latin1 stack_utf8
heap_latin1   4591346/s          --       -6%         -67%       -69%
heap_utf8     4906438/s          7%        --         -65%       -66%
stack_latin1 13849117/s        202%      182%           --        -5%
stack_utf8   14588883/s        218%      197%           5%         --

That's not nothing, but it's not a 10x speedup either.  Perl's subroutine call
overhead is pretty substantial.  Other languages may be better, but I bet
Python and Ruby are comparably sluggish.

In the abstract, I like stack-allocated host string wrappers because the
approach is appropos to the Clownfish mission.  But if they require sacrifices
to the API (CAPTURE is inferior to INCREF), a gnarly implementation,
and they don't speed things up all that much anyway, it may be time to give up
on them and focus on other things.

>> I think that exceptions may longjmp out of Clownfish code past the host
>> wrapper, though.  Right now if that happens we can leak memory -- a newly
>> allocated Clownfish object needed for calling into Clownfish code may not
>> get decremented.  If we delay copying we might start getting memory read
>> errors, though.
>
> Right. The lazy copying won't work with longjmps.

If we start translating to string SVs to Clownfish Strings the way that we
translate AVs to VArrays and HVs to Hashes, we'll need to mortalize them so
they don't leak on exceptions.

>From XSBind_maybe_sv_to_cfish_obj:

            if (retval) {
                // Mortalize the converted object -- which is somewhat
                // dangerous, but is the only way to avoid requiring that the
                // caller take responsibility for a refcount.
                SV *mortal = (SV*)Cfish_Obj_To_Host(retval);
                CFISH_DECREF(retval);
                sv_2mortal(mortal);
            }

That will make things slower still, but it's probably worth it for the sake of
correctness.

Marvin Humphrey


use strict;
use warnings;
use Benchmark qw( cmpthese );
use Inline C => <<'END_C';

typedef struct String {
  union { int32_t count; void *host_obj; } ref;
  void *vtable;
  const char *content;
  size_t size;
  struct String *origin;
} String;

String*
String_init(String *self, const char *content, size_t size, int zombie) {
  self->vtable    = NULL;
  self->ref.count = 1;
  if (zombie) {
    self->content = content;
    self->size    = size;
    self->origin  = NULL;
  }
  else {
    self->content = (char*)malloc(size);
    memcpy((char*)self->content, content, size);
    self->origin = self;
  }
  return self;
}

void
String_destroy(String *self) {
  if (self->origin == self) {
    free((char*)self->content);
    free(self);
  }
  else if (self->origin == NULL) {
    croak("Can't destroy a zombie string!");
  }
}

void
do_nothing_with_string(String *string) {
  (void)string;
}

void
stack_wrap(SV *sv) {
  STRLEN len;
  char *content = SvPVutf8(sv, len);
  void *allocation = alloca(sizeof(String));
  String *string = String_init(allocation, content, len, TRUE);
  do_nothing_with_string(string);
}

void
heap_dupe(SV *sv) {
  STRLEN len;
  char *content = SvPVutf8(sv, len);
  void *allocation = malloc(sizeof(String));
  String *string = String_init(allocation, content, len, FALSE);
  do_nothing_with_string(string);
  String_destroy(string);
}

END_C

cmpthese(-1, {
  heap_latin1 => sub { heap_dupe("Mot\xF6rhead") },
  heap_utf8   => sub { heap_dupe("Smile: \x{263a}") },
  stack_latin1 => sub { stack_wrap("Mot\xF6rhead") },
  stack_utf8   => sub { stack_wrap("Smile: \x{263a}") },
})

Re: [lucy-dev] Proposal for implementation of immutable strings

Posted by Nick Wellnhofer <we...@aevum.de>.

On 07/05/2013 05:02, Marvin Humphrey wrote:
> We might want to introduce String as a subclass of CharBuf with all of the
> mutability disabled.  That makes it easy to break up monolithic interconnected
> webs of CharBuf usage into bite-sized chunks when switching over.  Once the
> transition is complete, we can remove the inheritance.

Yes, this might be helpful.

> That's true, but I still imagine stack wrappers are a win in the aggregate.
> (I wish I could quantify that assertion.)  It's very common to pass simple
> strings: field names, hash keys, query strings, file names and file paths,
> user names, URIs, etc.

My guess is that there will be a win, but that it's smaller than it 
might look. Some examples:

     * Hash keys: No copy needed if used to fetch from a hash. If used to
       store, they must be cloned.

     * Field names: No copy needed if used as hash key like in
       Schema_Fetch_Type. Will be copied in constructors like
       SortRule_new, Highlighter_new, *Query_new.

     * Query strings will need iterators, so a copy will be made.

     * File names and paths won't have to be copied mostly.

So in many cases a "zombie" string has to be copied anyway defeating the 
whole optimization (even costing a bit). But, as you say, it would be 
nice if this could be measured.

> I think that exceptions may longjmp out of Clownfish code past the host
> wrapper, though.  Right now if that happens we can leak memory -- a newly
> allocated Clownfish object needed for calling into Clownfish code may not get
> decremented.  If we delay copying we might start getting memory read errors,
> though.

Right. The lazy copying won't work with longjmps.

Nick

Re: [lucy-dev] Proposal for implementation of immutable strings

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Sat, May 4, 2013 at 4:43 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> On May 4, 2013, at 01:14 , Marvin Humphrey <ma...@rectangular.com> wrote:
>> OK, cool... Since it's primarily a naming change but disruptive, I'd
>> suggest the following order of actions:
>>
>> 1.  Grep for `INCREF` and `Inc_RefCount` and make sure all invocations
>>     capture the returned reference.
>> 2.  Implement immutable String.
>> 3.  Make the naming change.
>
> As a side note, step 3 is quite a bit more than a naming change.

I was unclear.  The "naming change" that I was referring to was INCREF/DECREF
to CAPTURE/RELEASE.  The mutable-CharBuf-to-immutable-String change is, as you
point out, more involved than a renaming.

> All usages of CB_Nip must be converted to string iterators since CB_Nip
> mutates the string. Additionally, all the places where we construct new
> strings have to be identified. In this case we have to keep using a CharBuf
> followed by CB_Yield_String after the construction is complete.

+1

We might want to introduce String as a subclass of CharBuf with all of the
mutability disabled.  That makes it easy to break up monolithic interconnected
webs of CharBuf usage into bite-sized chunks when switching over.  Once the
transition is complete, we can remove the inheritance.

>> Hmm.  Well, the most incremental strategy is to hard-code UTF-8 into String
>> for now and the Python bindings can just forego the stack-allocated-string
>> optimization until we make up our minds later.
>
> It just occurred to me that there's another problem with the INCREF
> approach. Since string iterators INCREF the source string, every zombie
> string would be copied as soon as  it's iterated. The String methods using
> zombie iterators wouldn't be affected, but it would still result in many
> unneccessary copies of host strings.

That's true, but I still imagine stack wrappers are a win in the aggregate.
(I wish I could quantify that assertion.)  It's very common to pass simple
strings: field names, hash keys, query strings, file names and file paths,
user names, URIs, etc.

> A possible solution would be to do away with stack allocation but to keep
> using the host string buffer directly. Before returning to the host
> language, we could check whether the refcount of the string is greater than
> one and copy the string only then. This scheme wouldn't require changing
> INCREF semantics.

I think that exceptions may longjmp out of Clownfish code past the host
wrapper, though.  Right now if that happens we can leak memory -- a newly
allocated Clownfish object needed for calling into Clownfish code may not get
decremented.  If we delay copying we might start getting memory read errors,
though.

Marvin Humphrey

Re: [lucy-dev] Proposal for implementation of immutable strings

Posted by Nick Wellnhofer <we...@aevum.de>.

On May 4, 2013, at 01:14 , Marvin Humphrey <ma...@rectangular.com> wrote:
> OK, cool... Since it's primarily a naming change but disruptive, I'd suggest
> the following order of actions:
> 
> 1.  Grep for `INCREF` and `Inc_RefCount` and make sure all invocations capture
>    the returned reference.
> 2.  Implement immutable String.
> 3.  Make the naming change.

As a side note, step 3 is quite a bit more than a naming change. All usages of CB_Nip must be converted to string iterators since CB_Nip mutates the string. Additionally, all the places where we construct new strings have to be identified. In this case we have to keep using a CharBuf followed by CB_Yield_String after the construction is complete.

> Hmm.  Well, the most incremental strategy is to hard-code UTF-8 into String
> for now and the Python bindings can just forego the stack-allocated-string
> optimization until we make up our minds later.

It just occurred to me that there's another problem with the INCREF approach. Since string iterators INCREF the source string, every zombie string would be copied as soon as  it's iterated. The String methods using zombie iterators wouldn't be affected, but it would still result in many unneccessary copies of host strings.

A possible solution would be to do away with stack allocation but to keep using the host string buffer directly. Before returning to the host language, we could check whether the refcount of the string is greater than one and copy the string only then. This scheme wouldn't require changing INCREF semantics.

Nick

Re: [lucy-dev] Proposal for implementation of immutable strings

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Thu, May 2, 2013 at 8:12 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> +1 for CAPTURE/RELEASE, then.

OK, cool... Since it's primarily a naming change but disruptive, I'd suggest
the following order of actions:

1.  Grep for `INCREF` and `Inc_RefCount` and make sure all invocations capture
    the returned reference.
2.  Implement immutable String.
3.  Make the naming change.

>> For String, though, it seems like one internal encoding matching the
>> primary host encoding ought to suffice.
>
> This should work. But for the Python bindings, it would mean to use the
> Python encoding for every Clownfish string, even if it isn't passed to the
> Python side at all. And what about the on-disk index? Since it probably
> should always use UTF-8, most strings passed to and from Python have to be
> reencoded anyway. So we could as well use UTF-8 for all Clownfish Strings.

Hmm.  Well, the most incremental strategy is to hard-code UTF-8 into String
for now and the Python bindings can just forego the stack-allocated-string
optimization until we make up our minds later.

Marvin Humphrey