You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by "Peter N. Lundblad" <pe...@famlundblad.se> on 2006/03/25 20:53:23 UTC

Re: [PATCH] was: More documentation for svn_utf_cstring_from_utf8_ex()[Show] etc.

Julian Foad writes:
 > Paul Burba wrote:
 > > 
 > > Here is my first stab at this.
 > > 
 > > While it isn't strictly necessary, I changed all callers to 
 > > svn_utf_cstring_*_utf8_ex() which passed a non-NULL convset_key to now 
 > > pass NULL.  This makes it clearer to someone working on the calling code 
 > > that this argument is no longer relevant.
 > 
 > Good.

This makes me question my suggestion not to rev the API. Obviously,
you should change the arguments to null everywhere, and then it
doesn't take much to rev the APIs. Well...

 > > Index: subversion/libsvn_subr/utf.c
 > > ===================================================================
 > > --- subversion/libsvn_subr/utf.c	(revision 19012)
 > > +++ subversion/libsvn_subr/utf.c	(working copy)
 > > @@ -729,6 +729,22 @@
 > >    xlate_handle_node_t *node;
 > >    svn_error_t *err;
 > >  
 > > +  /* In the cases of SVN_APR_LOCALE_CHARSET and SVN_APR_DEFAULT_CHARSET
 > > +   * frompage is really an int, not a valid string.  So generate a unique
 > > +   * key accordingly.  The exception is OS400 where code pages are always
 > > +   * ints. */
 > > +#ifndef AS400
 > > +  if (frompage == SVN_APR_LOCALE_CHARSET
 > > +      || frompage == SVN_APR_DEFAULT_CHARSET)
 > > +#endif
 > > +    convset_key = apr_psprintf(pool, "svn-utf-%dtou-xlate-handle",
 > > +                               (int)frompage);
 > > +#ifndef AS400
 > > +  else
 > > +    convset_key = apr_psprintf(pool, "svn-utf-%stou-xlate-handle",
 > > +                               frompage);
 > > +#endif
 > > +
 > >    SVN_ERR(get_xlate_handle_node(&node, SVN_APR_UTF8_CHARSET, frompage,
 > >                                  convset_key, pool));
 > 
 > That looks like it will work, but I was hoping we could do something more 
 > efficient, along the lines of:
 > 
 > * Use the object "struct {const char *frompage, topage; }" as the hash key, not 
 > caring whether the two pointers are real pointers or type-cast integers.

NOthing guarantees that two string constants that are equal share the
same address, even if the compiler *might* optimize that way.  But you
could probably save some cycles by using apr_pstrcat instead of
sprintf and that's as easy to read and maintain.

 > That would be done in a single place, in get_xlate_handle_node().
 > 
 > Presently, the code uses either a dedicated, global-ish hash 
 > "xlate_handle_hash" if our UTF-8 sub-system has been initialised 
 > (svn_utf_initialize()), otherwise the "user data" hash of whatever pool the 
 > caller supplies.
 > 
 > When using the pool user data as the hash, there are two additional 
 > requirements on the key: it must be a null-terminated cstring (because the user 
 > data access API requires this), and it must be a unique string within the 
 > program - hence the "svn-utf-" prefix and "-xlate-handle" suffix.
 > 
 > In order to use a bare pointer pair as a hash key, we must always use our own 
 > hash: we can't go looking up this key directly in the pool userdata hash. 
 > Therefore we must do one of:
 > 
 >    * Require that svn_utf_initialize() be used, which means we always use the 
 > dedicated "xlate_handle_hash".  We can remove the other code path except in the 
 > backwards-compatibility API.  I think this requires "revving" the API, but I 
 > don't see that as a problem (indeed, it would be good because we could 
 > eliminate the redundant arguments).
 > 
 > or
 > 
 >    * Store a translation handle hash (which uses our pointer pair as its key) 
 > within the pool userdata (which can then use a single, constant string as its key).
 > 

I think both of these are too messy for little gain.

Thanks,
//Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] was: More documentation for svn_utf_cstring_from_utf8_ex()[Show] etc.

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
Paul Burba writes:
 > Julian Foad <ju...@btopenworld.com> wrote on 03/26/2006 08:15:24 AM:
 > 
 > Sorry for the delay in getting back to this.  Let me know if this is what 
 > you all had in mind.
 > 

Looks good. +1.

Regards,
//Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] was: More documentation for svn_utf_cstring_from_utf8_ex()[Show] etc.

Posted by Julian Foad <ju...@btopenworld.com>.
Paul Burba wrote:
> Just an FYI: I tested this via ra_local and ra_dav on Windows XP (Apache 
> 2.0.54, Neon 0.24.7) with fsfs.  I also ran my OS400 versions of the test 
> suite.  No problems.
> 
> Also, FWIW, I ran two versions of the svnserve client tests side-by-side 
> against 1.3.1 on the iSeries, one with this change applied, one without, 
[...]

Thanks.  That's more than enough testing for me!  +1 to commit it.

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] was: More documentation for svn_utf_cstring_from_utf8_ex()[Show] etc.

Posted by Paul Burba <pa...@softlanding.com>.
Julian Foad <ju...@btopenworld.com> wrote on 04/10/2006 10:51:56 AM:

> Paul Burba wrote:
> > Sorry for the delay in getting back to this.  Let me know if this is 
what 
> > you all had in mind.
> > 
> > Paul B.
> > 
> > [[[
> > Rev svn_utf_cstring_*_utf8_ex so xlate_handle_node_t structs are
> > cached based on the conversion taking place rather than a string
> > key argument.
> 
> Yes, this patch looks fine to me, though I haven't tested it.

Julian,

Just an FYI: I tested this via ra_local and ra_dav on Windows XP (Apache 
2.0.54, Neon 0.24.7) with fsfs.  I also ran my OS400 versions of the test 
suite.  No problems.

Also, FWIW, I ran two versions of the svnserve client tests side-by-side 
against 1.3.1 on the iSeries, one with this change applied, one without, 
trying to see if there was any noticeable performance difference (i.e. Did 
the string manipulation now required for every call to svn_utf_*_utf8_ex2 
show up in an appreciable way?).  The OS400 port makes fairly heavy use of 
the *_ex2 funcs since all the command line programs use it to convert 
incoming EBCDIC args to UTF-8.  Not exactly scientific, but they both ran 
within a few seconds of each other.  These tests take several hours to 
complete on the iSeries so that seems an insignificant difference.

Paul B.

_____________________________________________________________________________
Scanned for SoftLanding Systems, Inc. and SoftLanding Europe Plc by IBM Email Security Management Services powered by MessageLabs. 
_____________________________________________________________________________

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] was: More documentation for svn_utf_cstring_from_utf8_ex()[Show] etc.

Posted by Julian Foad <ju...@btopenworld.com>.
Paul Burba wrote:
> Sorry for the delay in getting back to this.  Let me know if this is what 
> you all had in mind.
> 
> Paul B.
> 
> [[[
> Rev svn_utf_cstring_*_utf8_ex so xlate_handle_node_t structs are
> cached based on the conversion taking place rather than a string
> key argument.

Yes, this patch looks fine to me, though I haven't tested it.

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] was: More documentation for svn_utf_cstring_from_utf8_ex()[Show] etc.

Posted by Paul Burba <pa...@softlanding.com>.
Julian Foad <ju...@btopenworld.com> wrote on 03/26/2006 08:15:24 AM:

> Peter N. Lundblad wrote:
> > I think both of these are too messy for little gain.
> 
> Well, I'd describe the first idea (requiring initialisation) as the 
> "tidy" way 
> in terms of its result (and I'm sure we'll want to do it sooner or 
> later), but 
> if you mean it's too much code churn for this particular gain then Ican 
agree 
> with you.
> 
> 
> After all that, I'm happy for you (Paul) to go with something like what 
you 
> had, but probably "revving" the API as Peter mentioned because I don't 
really 
> see a benefit in not doing so.
> 
> - Julian

Hello,

Sorry for the delay in getting back to this.  Let me know if this is what 
you all had in mind.

Paul B.

[[[
Rev svn_utf_cstring_*_utf8_ex so xlate_handle_node_t structs are
cached based on the conversion taking place rather than a string
key argument.

* subversion/include/svn_utf.h
  (svn_utf_cstring_to_utf8_ex2, svn_utf_cstring_from_utf8_ex2): New.
  (svn_utf_cstring_to_utf8_ex, svn_utf_cstring_from_utf8_ex):
  Deprecate.

* subversion/libsvn_subr/cmdline.c
  (SVN_UTF_CONTOU_XLATE_HANDLE, SVN_UTF_UTOCON_XLATE_HANDLE,
  SVN_UTF_ETOU_XLATE_HANDLE): Remove.
  (svn_cmdline_cstring_from_utf8, svn_cmdline_cstring_to_utf8,
  svn_cmdline__getopt_init): Change calls to svn_utf_cstring_*_utf8_ex
  to use svn_utf_cstring_*_utf8_ex2.

* subversion/libsvn_subr/io.c
  (SVN_UTF_ETOU_XLATE_HANDLE, SVN_UTF_UTOE_XLATE_HANDLE): Remove.
  (svn_io_create_unique_link, svn_io_read_link): Change calls to
   svn_utf_cstring_*_utf8_ex to use svn_utf_cstring_*_utf8_ex2.

* subversion/libsvn_subr/utf.c
  (get_xlate_key): New function.
  (convert_cstring): Update doc string.
  (svn_utf_cstring_to_utf8_ex2): New.
  (svn_utf_cstring_to_utf8_ex): Reimplement as wrapper around
  svn_utf_cstring_to_utf8_ex2.
  (svn_utf_cstring_from_utf8_ex2): New.
  (svn_utf_cstring_from_utf8_ex): Reimplement as wrapper around
  svn_utf_cstring_from_utf8_ex2.

* subversion/libsvn_client/diff.c,
  subversion/libsvn_diff/diff_file.c,
  subversion/libsvn_repos/hooks.c,
  subversion/libsvn_subr/stream.c,
  subversion/libsvn_subr/subst.c,
  subversion/svn/util.c:
  Change all callers of svn_utf_cstring_*_utf8_ex to use
  svn_utf_cstring_*_utf8_ex2.
]]]


_____________________________________________________________________________
Scanned for SoftLanding Systems, Inc. and SoftLanding Europe Plc by IBM Email Security Management Services powered by MessageLabs. 
_____________________________________________________________________________

Re: [PATCH] was: More documentation for svn_utf_cstring_from_utf8_ex()[Show] etc.

Posted by Julian Foad <ju...@btopenworld.com>.
Peter N. Lundblad wrote:
> Julian Foad writes:
>  > > +#ifndef AS400
>  > > +  if (frompage == SVN_APR_LOCALE_CHARSET
>  > > +      || frompage == SVN_APR_DEFAULT_CHARSET)
>  > > +#endif
>  > > +    convset_key = apr_psprintf(pool, "svn-utf-%dtou-xlate-handle",
>  > > +                               (int)frompage);
>  > > +#ifndef AS400
>  > > +  else
>  > > +    convset_key = apr_psprintf(pool, "svn-utf-%stou-xlate-handle",
>  > > +                               frompage);
>  > > +#endif
>  > > +
>  > >    SVN_ERR(get_xlate_handle_node(&node, SVN_APR_UTF8_CHARSET, frompage,
>  > >                                  convset_key, pool));
>  > 
>  > That looks like it will work, but I was hoping we could do something more 
>  > efficient, along the lines of:
>  > 
>  > * Use the object "struct {const char *frompage, topage; }" as the hash key, not 
>  > caring whether the two pointers are real pointers or type-cast integers.
> 
> NOthing guarantees that two string constants that are equal share the
> same address, even if the compiler *might* optimize that way.  But you

I'm aware of that, but it was bad of me not to mention it.  I think the benefit 
of this caching would normally be reaped when called multiple times from the 
same place, in which case the pointer will be the same each time.

The same sort of limitation already applies when the subsystem has not been 
initialised: in that case, it's only calls from the same place (or, more 
accurately, that pass the same pool) that can share cached data.

However, now you are making me reconsider, and I agree it would be better to do 
the string concatenation and/or integer-to-ASCII conversion, as above, to 
ensure that the semantics are as unsurprising as possible and the opportunities 
for caching are as wide as possible.

> could probably save some cycles by using apr_pstrcat instead of
> sprintf and that's as easy to read and maintain.

Perhaps, at least in the non-AS400, non-APR_{DEFAULT,LOCALE}_CHARSET case, but 
I expect it's barely measurable.

To be honest, I haven't a clue how slow the translation look-up is without this 
caching, and therefore how much string manipulation is acceptable before it 
negates the benefit of the cache, so take my ideas with a pinch of salt.


>  >    * Require that svn_utf_initialize() be used, which means we always use the 
>  > dedicated "xlate_handle_hash".  We can remove the other code path except in the 
>  > backwards-compatibility API.  I think this requires "revving" the API, but I 
>  > don't see that as a problem (indeed, it would be good because we could 
>  > eliminate the redundant arguments).
>  > 
>  > or
>  > 
>  >    * Store a translation handle hash (which uses our pointer pair as its key) 
>  > within the pool userdata (which can then use a single, constant string as its key).
> 
> I think both of these are too messy for little gain.

Well, I'd describe the first idea (requiring initialisation) as the "tidy" way 
in terms of its result (and I'm sure we'll want to do it sooner or later), but 
if you mean it's too much code churn for this particular gain then I can agree 
with you.


After all that, I'm happy for you (Paul) to go with something like what you 
had, but probably "revving" the API as Peter mentioned because I don't really 
see a benefit in not doing so.

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org