You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Greg Stein <gs...@lyra.org> on 2001/06/06 09:25:49 UTC

svn_kstring_t

As most of you may be aware, I have a particular dislike(*) to our abundance
of svn_string_t use. In my mind, the problem is exposed by constructs such
as this:

    svn_string_create("some constant string", pool);

We are creating an svn_string_t on the heap, allocating more bytes for the
string on the heap, and copying that string to the heap. All in the name of
passing a *constant* to a function.

When C defines a perfectly reasonable alternative (const char *) for this,
it just pisses me off. :-)


Now, it is somewhat possible to "fake" a constant svn_string_t. Consider its
definition:

typedef struct svn_string_t
{
  char *data;
  apr_size_t len;
  apr_size_t blocksize;
  apr_pool_t *pool;          
} svn_string_t;
		    
We could initialize a string like this:

  svn_string_t my_str = { CONSTANT, strlen(CONSTANT), 0, NULL };

The problem is that when we have a function such as:

  void some_function(svn_string_t *s);

As a caller, we don't know whether that function is going to modify the
string or not. If it *does*, then my_str's construction is going to cause a
segfault right quick.

We can ameliorate the problem somewhat by saying:

  svn_string_t my_str = { CONSTANT, strlen(CONSTANT), strlen(CONSTANT), pool };

Thus, when some_function() goes to change the string, it can at least
reallocate it from a pool. However, it still can't poke values into s->data
because those are bytes from CONSTANT which is (typically) mapped into
readonly memory.

As a result, for functions taking svn_string_t, even if they will treat them
as a constant, we must construct a "real" structure and pass that.

Even worse, if the function *will* treat it as a constant, we can't assume
that (based purely on the function signature) and must pass it an
svn_string_t that we are okay with having it changed. This could mean
creating a dup() of the string. (yes, it is possible to resolve the
situation in the function doc/comments, but that reliance causes
brittle-ness in the code)


So far we've discussed two problems:

1) a function taking an svn_string_t cannot tell its caller whether the
   argument will be modified or not (thus the caller must be conservative in
   the construction of the argument value)

2) because of (1), we must copy "STRING CONSTANTS" onto the heap, along with
   the supporting svn_string_t structure, to be passed to the function.


All that said... why do we have svn_string_t? Two words: counted strings.

The svn_string_t type means that we don't have to do a strlen() all over the
place; we can easily compute the length once, then pass it around.

Second: we can over-allocate the buffer so that we have space for various
operations (such as "append").


------------------

My recommendation is based on the assumption that Benefit #2 (over-allocate)
happens only in a few situations. Most of our operations are passing around
strings that will not be modified within the called function. With this in
mind, we can construct a more finely tuned structure:

typedef struct {
    const char *data;
    apr_size_t len;

} svn_kstring_t;


I'm calling it a "kstring" for "constant string" (some ancient programming
style uses "k" to mean "constant"). I avoided "cstring" because of confusion
with "C style strings". Another alternative might be "istring" for
"immutable string". Being counted strings, these could contain null bytes,
but the typical usage is for a C string, so I discounted names such as
"svn_membytes_t" or "svn_counted_t".


I'd like to propose that we introduce the svn_kstring_t type and begin
converting functions from "svn_string_t *" to "const svn_kstring_t *". Since
a kstring is immutable, there aren't a lot of functions to be added to
svn_string.h:

    svn_kstring_create()
    svn_kstring_ncreate()
    svn_kstring_createf()
    svn_kstring_createv()
    svn_kstring_dup()
    svn_kstring_isempty()
    svn_kstring_compare()
    svn_kstring_first_non_whitespace()
    svn_kstring_find_char_backward()

We can implement the other svn_string functions, but they would return a new
svn_kstring_t rather than modify-in-place. I don't have a particular feel
either way on that.

There are a number of other macros and functions for flipping between
svn_string_t and svn_kstring_t as needed.


As an aside: my particular desire to revisit this issue is because of the
language bindings issue. We'd like to be able to pass a language's native
string to our functions. However, when these functions take an svn_string_t,
we have two problems:

1) if the svn_string_t implies a modification (and we can't tell from the
   signature! we have to explicitly encode this knowledge into our binding
   code), then we must treat the parameter as IN/OUT and all that implies.

2) to create a proper svn_string_t, we need a pool. *what* pool should that
   be? This has two answers:
   a) most functions have a pool argument; encode knowledge for each
      function binding that says to use that pool
   b) create and toss a pool on each function invocation to hold the
      svn_string_t.

The above two problems are a real bitch for our language bindings. Yes, we
can solve it without a new svn_kstring_t, but it is a perfect demonstration
that svn_string_t is an improper type for many things. Consider
svn_fs_change_rev_prop:

svn_error_t *svn_fs_change_rev_prop (svn_fs_t *fs,
                                     svn_revnum_t rev,
				     svn_string_t *name,
                                     svn_string_t *value,
				     apr_pool_t *pool);

The binding language should be able to pass constant strings for name and
value. Not IN/OUT parameters. Nor should we have to add logic to our
bindings generation to say "well, svn_fs_change_rev_prop doesn't actually
change those values, so it is fine to construct pseudo-svn_string_t values)

If the above example used svn_kstring_t (or const char *), then we could
take the binding language's value and directly pass it in, without worry
about how to propagate changes back.



Okay... that is my opening salvo. I'll stop for now. :-)

Thoughts?


Cheers,
-g

(*) I find seven swear words in our codebase; all attributable to me; all in
    association with svn_string_t usage. :-)

-- 
Greg Stein, http://www.lyra.org/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: svn_kstring_t

Posted by David Ascher <Da...@ActiveState.com>.
> The binding language should be able to pass constant strings for name and
> value. Not IN/OUT parameters. Nor should we have to add logic to our
> bindings generation to say "well, svn_fs_change_rev_prop doesn't actually
> change those values, so it is fine to construct pseudo-svn_string_t values)

Right on!  Mozilla's string types change all the time, but the inability
to deal w/ constant strings is a real pain in PyXPCOM.

Keep up the good fight, Greg! =)

--david

PS: it's especially a good fight if no one opposes you =)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: svn_kstring_t

Posted by Ben Collins-Sussman <su...@collab.net>.
Greg Stein <gs...@lyra.org> writes:

> svn_kstring_t

+1.

I am *so* there.  :)


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: plan of action (was: Re: svn_kstring_t)

Posted by Karl Fogel <kf...@collab.net>.
Greg Stein <gs...@lyra.org> writes:
> Okay... make sure everything is checked in before you leave tonite. Or
> you'll be in merge hell tomorrow :-)

I'm committed, I'm committed!  Oh, be merciful! :-)

> I'm going to make a mother-commit to s/svn_string_t/svn_stringbuf_t/ across
> our whole code repository.
> 
> We'll let that shake out, then I'll start introducing svn_string_t (the name
> for what I called svn_kstring_t). The first place I'll be starting is with
> the FS library so that I can get good bindings around that sucker.

Sounds good.

-K

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

plan of action (was: Re: svn_kstring_t)

Posted by Greg Stein <gs...@lyra.org>.
On Thu, Jun 07, 2001 at 02:33:47PM -0700, Daniel Rall wrote:
> Greg Stein <gs...@lyra.org> writes:
> 
> > On Wed, Jun 06, 2001 at 05:01:24PM -0400, Greg Hudson wrote:
> > > Some notes:
> > > 
> > > "string" (for constant counted string) and "stringbuf" (for modifiable
> > > counted string) are more intuitive names than "kstring" and "string".
> > 
> > Agreed. Unfortunately, I'm not sure if we could pull off this name change
> > since svn_string_t means something quite different. (I think the problem
> > mostly from the human angle of changing the semantics of the "svn_string_t"
> > which is in their brains)
> > 
> > I'm +1 on your suggested naming. Thoughts from others? 
> 
> Both my experience with the Java language and with other C APIs lead
> me to agree that putting "buf[fer]" into the name of the expandable
> string is much more intuitive.


Okay... make sure everything is checked in before you leave tonite. Or
you'll be in merge hell tomorrow :-)

I'm going to make a mother-commit to s/svn_string_t/svn_stringbuf_t/ across
our whole code repository.

We'll let that shake out, then I'll start introducing svn_string_t (the name
for what I called svn_kstring_t). The first place I'll be starting is with
the FS library so that I can get good bindings around that sucker.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: svn_kstring_t

Posted by Daniel Rall <dl...@finemaltcoding.com>.
Greg Stein <gs...@lyra.org> writes:

> On Wed, Jun 06, 2001 at 05:01:24PM -0400, Greg Hudson wrote:
> > Some notes:
> > 
> > "string" (for constant counted string) and "stringbuf" (for modifiable
> > counted string) are more intuitive names than "kstring" and "string".
> 
> Agreed. Unfortunately, I'm not sure if we could pull off this name change
> since svn_string_t means something quite different. (I think the problem
> mostly from the human angle of changing the semantics of the "svn_string_t"
> which is in their brains)
> 
> I'm +1 on your suggested naming. Thoughts from others? 

Both my experience with the Java language and with other C APIs lead
me to agree that putting "buf[fer]" into the name of the expandable
string is much more intuitive.

Daniel

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: svn_kstring_t

Posted by Greg Stein <gs...@lyra.org>.
On Wed, Jun 06, 2001 at 05:01:24PM -0400, Greg Hudson wrote:
> Some notes:
> 
> "string" (for constant counted string) and "stringbuf" (for modifiable
> counted string) are more intuitive names than "kstring" and "string".

Agreed. Unfortunately, I'm not sure if we could pull off this name change
since svn_string_t means something quite different. (I think the problem
mostly from the human angle of changing the semantics of the "svn_string_t"
which is in their brains)

I'm +1 on your suggested naming. Thoughts from others? 

[ hmm. maybe we do a two-step: add svn_stringbuf_t first and do a mass
  rename. then we add svn_string_t. then start changing the usage. ]

> I'll assume your terminology for the rest of these comments, though.
> 
> Avoid macros which know about the internal structure of a string
> (either kind).  They complicate the ABI.

Hmm. Not sure that I see this. How can a macro affect the ABI? Are you
thinking in terms of making it harder in the future to change the structure
/ layout of the svn_string_t datatype? I don't see how avoiding macros can
help, given that the structure(s) are public.

> Creating a kstring will still require a pool, to allocate the
> structure, even though it doesn't require a pool to allocate the
> actual data.

Nope. We don't need pools for the structure:

    svn_kstring_t ks = { "hello", sizeof("hello") - 1 };

    function(&ks);

In fact, we can even put these into global const variables now (because
there is no pool field):

    static const svn_kstring_t ks = ...

Now... if the svn_kstring_t structure is *opaque*, then yes: we would need a
pool. But I'm not sure that I see the benefit for making it opaque.

> We could--I'm not sure whether I like this idea or not--pass around
> actual kstring structures and not merely pointers to them.  It's quite
> portable given that we're assuming ANSI C.  This would finesse the
> issue of requiring a pool, and would be very much like passing around
> separate <data,length> arguments without the pain.  But it has some
> minor down sides, like you can't use NULL any more.
> 
> It's also possible that we use constant, counted strings rarely enough
> that we should just pass around the data and length as separate
> parameters; there might not be enough pain to really warrant creating
> an abstraction.

I'm not that these follow since we can easily put the structures on the
stack...

Note that if a function is going to store an svn_kstring_t, it needs to
dup() it just like it would need to copy an svn_string_t (to ensure that the
passed-in value has the correct lifetime). That dup will be allocated from a
pool.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org