You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Stefan Küng <to...@gmail.com> on 2012/09/03 20:47:59 UTC

thread safe

Hi,

It seems that the svn_config_t structure isn't thread safe, i.e. can't 
be shared among multiple threads.
See here for a detailed report on what problems this causes:
http://tortoisesvn.tigris.org/ds/viewMessage.do?dsForumId=757&dsMessageId=3003152

Is this by design? I assumed that all svn APIs are thread safe for read 
access at least unless noted otherwise (example: the API 
svn_utf_initialize states its thread-unsafeness in the note section of 
the docs).

While we can work around that thread problem a little bit, that 
workaround has its own big problems:
* each thread would need its own svn_config_t structure
   --> for each thread the config file is read (open, read, close)
   which results in multiple unnecessary disk accesses
   --> when the config file is read multiple times, sometimes the read
   fails (at least on Windows) because the file is opened already -
   usually because virus scanners open and scan every file that a
   process accesses, even when only for reading.

Joel Jirak made a patch for subversion/libsvn_subr/config.c which would 
fix the problem (see the thread linked above). Would that be an 
acceptable solution?

Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.net

Re: thread safe

Posted by Stefan Küng <to...@gmail.com>.
On 03.09.2012 21:09, Bert Huijben wrote:
>
>
>> -----Original Message-----
>> From: Stefan Küng [mailto:tortoisesvn@gmail.com]
>> Sent: maandag 3 september 2012 20:48
>> To: Subversion Development
>> Subject: thread safe
>>
>> Hi,
>>
>> It seems that the svn_config_t structure isn't thread safe, i.e. can't
>> be shared among multiple threads.
>> See here for a detailed report on what problems this causes:
>> http://tortoisesvn.tigris.org/ds/viewMessage.do?dsForumId=757&dsMessa
>> geId=3003152
>>
>> Is this by design? I assumed that all svn APIs are thread safe for read
>> access at least unless noted otherwise (example: the API
>> svn_utf_initialize states its thread-unsafeness in the note section of
>> the docs).
>
> I always assume(d) that all functions that are not documented otherwise are
> thread safe, but that they also assume that the data is not passed to
> multiple functions in multiple threads at once.
>
> So mostly the same thing as what the C runtime or the .Net runtime promise:
> Static functions are thread safe, but instances and functions on those are
> not.

Well, for the c-runtime functions and structs/data objects it's 
documented that they are thread safe for read-access. Only for write 
access you have to write your own thread guards.

As for the svn_config_t object after initializing it (i.e., reading the 
config from the config file and the registry) I assume that it isn't 
written to anymore.
So I assumed that read-only access from multiple threads wouldn't be a 
problem.

>
> In my testing via SharpSvn (and the bug reports over the years) this mostly
> proved safe. (There were a few issues fixed in early 1.6 development).
>
>
> In SharpSvn I keep a config instance alive per SvnClient. And in AnkhSVN I
> keep a small pool of recently used SvnClient instances at hand to avoid
> re-reading the config files over and over again.

Do you have a fixed number of threads that you use?
Because in TSVN we use a variable number of threads, depending on how 
many processors/cores are available.
we'd have to implement a somewhat complicated pool of config objects to 
re-use.

>
> This keeps svn_client_context_t and everything below thread safe as an
> SvnClient is documented to be only usable in one thread at a time.
>
>> While we can work around that thread problem a little bit, that
>> workaround has its own big problems:
>> * each thread would need its own svn_config_t structure
>>     --> for each thread the config file is read (open, read, close)
>>     which results in multiple unnecessary disk accesses
>>     --> when the config file is read multiple times, sometimes the read
>>     fails (at least on Windows) because the file is opened already -
>>     usually because virus scanners open and scan every file that a
>>     process accesses, even when only for reading.
>
> I think this is the proper way to fix this issue at your side: keep data
> thread safe.
>
> How are you handling the authentication cache, and other settings. I would
> just keep it safe for future extension.

All objects are created fresh and initialized for each thread. Except 
the svn_config_t object because of the very, very expensive 
initialization it requires. And because of the problems with accessing 
the config file from multiple threads.

>
>> Joel Jirak made a patch for subversion/libsvn_subr/config.c which would
>> fix the problem (see the thread linked above). Would that be an
>> acceptable solution?
>
> The patch on
> http://tortoisesvn.tigris.org/ds/getDSMessageAttachment.do/config.c.hack-fix
> .patch?dsForumId=757&dsMessageId=3003152&dsAttachmentId=3606977&dsAttachment
> Mime=application/octet-stream
> creates a string in a very short living thread pool, in a static function
> with a few callers that all have a pool available.
>
> With a bit of cleanup it should be possible to apply this patch with a
> proper scratch sub-pool and I don't see why such a patch wouldn't be
> accepted.
>
>
> But I don't think that this will be the solution to make the rest of
> Subversion thread safe, or to guarantee that all other memory structures
> will be thread safe in the future. (But it might make a read-once config
> struct reusable in multiple threads)

I don't expect all memory structures to be thread safe since most of 
them will be written to in the APIs.

What would also help: a deep-copy function for the svn_config_t object, 
something similar to e.g. svn_wc_dup_status3.
That way I could avoid initializing new objects by re-reading the config 
file but simply copy the already available data to the new object.

Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.net

RE: thread safe

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Stefan Küng [mailto:tortoisesvn@gmail.com]
> Sent: maandag 3 september 2012 20:48
> To: Subversion Development
> Subject: thread safe
> 
> Hi,
> 
> It seems that the svn_config_t structure isn't thread safe, i.e. can't
> be shared among multiple threads.
> See here for a detailed report on what problems this causes:
> http://tortoisesvn.tigris.org/ds/viewMessage.do?dsForumId=757&dsMessa
> geId=3003152
> 
> Is this by design? I assumed that all svn APIs are thread safe for read
> access at least unless noted otherwise (example: the API
> svn_utf_initialize states its thread-unsafeness in the note section of
> the docs).

I always assume(d) that all functions that are not documented otherwise are
thread safe, but that they also assume that the data is not passed to
multiple functions in multiple threads at once. 

So mostly the same thing as what the C runtime or the .Net runtime promise:
Static functions are thread safe, but instances and functions on those are
not.

In my testing via SharpSvn (and the bug reports over the years) this mostly
proved safe. (There were a few issues fixed in early 1.6 development).


In SharpSvn I keep a config instance alive per SvnClient. And in AnkhSVN I
keep a small pool of recently used SvnClient instances at hand to avoid
re-reading the config files over and over again.

This keeps svn_client_context_t and everything below thread safe as an
SvnClient is documented to be only usable in one thread at a time.

> While we can work around that thread problem a little bit, that
> workaround has its own big problems:
> * each thread would need its own svn_config_t structure
>    --> for each thread the config file is read (open, read, close)
>    which results in multiple unnecessary disk accesses
>    --> when the config file is read multiple times, sometimes the read
>    fails (at least on Windows) because the file is opened already -
>    usually because virus scanners open and scan every file that a
>    process accesses, even when only for reading.

I think this is the proper way to fix this issue at your side: keep data
thread safe.

How are you handling the authentication cache, and other settings. I would
just keep it safe for future extension.

> Joel Jirak made a patch for subversion/libsvn_subr/config.c which would
> fix the problem (see the thread linked above). Would that be an
> acceptable solution?

The patch on
http://tortoisesvn.tigris.org/ds/getDSMessageAttachment.do/config.c.hack-fix
.patch?dsForumId=757&dsMessageId=3003152&dsAttachmentId=3606977&dsAttachment
Mime=application/octet-stream
creates a string in a very short living thread pool, in a static function
with a few callers that all have a pool available.

With a bit of cleanup it should be possible to apply this patch with a
proper scratch sub-pool and I don't see why such a patch wouldn't be
accepted.


But I don't think that this will be the solution to make the rest of
Subversion thread safe, or to guarantee that all other memory structures
will be thread safe in the future. (But it might make a read-once config
struct reusable in multiple threads)

	Bert 


Re: thread safe

Posted by Stefan Küng <to...@gmail.com>.
Here's my attempt to implement two new APIs to duplicate an svn_config_t 
and a hash of those.

I factored out some code from svn_config_set into two new helper 
functions which I then used in the new dup functions.

Please review.

Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.net

Re: thread safe

Posted by Stefan Küng <to...@gmail.com>.
On Wed, Sep 5, 2012 at 3:54 PM, Branko Čibej <br...@wandisco.com> wrote:

>>> As I said, the return value must be a svn_config_t*.
>>> More comments below (I'll ignore coding style).
>> Why must it be an svn_config_t* ?
>> That makes no sense: we want to duplicate the config hash.
>
> Ah, I begin to understand; you want to duplicate the result of
> svn_config_get_config, instead of the result of
> svn_config_create/svn_config_read. I tend to think these are two
> separate functions:
>
> svn_error_t *svn_config_dup(svn_config_t **cfgp, svn_config_t *src, apr_pool_t *pool);
> svn_error_t *svn_config_copy_config(apr_hash_t **cfg_hash, apr_hash_t *src_hash, apr_pool_t *pool);
>
>
> Sorry about the misunderstanding.

Ok, I'll create two separate dup functions then.
The hash-dup function can just call the config-dup function.

>
>> > And here; svn_config_set already has code to create option
>> > and section records, it only needs to be factored out.
>> Yes, but I don't like the idea of using svn_config_set to do this
>> because it does more than is necessary here.
>> We don't have to check whether to replace or add an entry because we
>> always add (we're copying).
>>
>> And the question is: do we want an exact copy or a new object,
>> initialized with the same data?
>
> I'm not suggesting you use svn_config_set, that would be wrong, because
> it fiddles with value expansions. I'm suggesting that you factor out the
> parts of svn_config_set that create section and options into separate
> static helper functions, so that the new code and svn_config_set can
> both use those. That's better than duplicating the code.

Ok, I'll do some factoring.

Stefan

-- 
       ___
  oo  // \\      "De Chelonian Mobile"
 (_,\/ \_/ \     TortoiseSVN
   \ \_/_\_/>    The coolest Interface to (Sub)Version Control
   /_/   \_\     http://tortoisesvn.net

Re: thread safe

Posted by Branko Čibej <br...@wandisco.com>.
On 05.09.2012 15:24, Stefan Küng wrote:
> On Wed, Sep 5, 2012 at 1:14 AM, Branko Čibej <br...@wandisco.com> wrote:
>> On 04.09.2012 22:24, Stefan Küng wrote:
>>> Attached is my first draft of the new svn_config_dup() API.
>>>
>>> * it does not return an svn_error_t but the duplicate hash directly.
>>> The only API call that could return an error is svn_config_create()
>>> but that API also returns SVN_NO_ERROR unconditionally (for now at
>>> least).
>>> * no error checking is done for now, will add some checks for failed
>>> memory allocations later.
>>> * hash keys are duplicated as well (allocated in the passed pool)
>>>
>>> Please have a look at this. I think it's correct, at least my first
>>> tests show that it works and solves my problem in TSVN with multiple
>>> threads as well.
>>>
>>> be back tomorrow...
>> Index: subversion/include/svn_config.h
>> ===================================================================
>> --- subversion/include/svn_config.h     (revision 1380737)
>> +++ subversion/include/svn_config.h     (working copy)
>> @@ -626,6 +626,16 @@
>>                                  const char *fname,
>>                                  apr_pool_t *pool);
>>
>> +/** Return a deep copy of the config hash.
>> + * @note use this function instead of repeated calls
>> + * to svn_config_get_config to avoid reading the config file
>> + * over and over.
>> + *
>> + * @since New in 1.8.
>> + */
>> +apr_hash_t *
>> +svn_config_dup(apr_hash_t * config, apr_pool_t * pool);
>>
>>
>> As I said, the return value must be a svn_config_t*.
>> More comments below (I'll ignore coding style).
> Why must it be an svn_config_t* ?
> That makes no sense: we want to duplicate the config hash.

Ah, I begin to understand; you want to duplicate the result of
svn_config_get_config, instead of the result of
svn_config_create/svn_config_read. I tend to think these are two
separate functions:

svn_error_t *svn_config_dup(svn_config_t **cfgp, svn_config_t *src, apr_pool_t *pool);
svn_error_t *svn_config_copy_config(apr_hash_t **cfg_hash, apr_hash_t *src_hash, apr_pool_t *pool);


Sorry about the misunderstanding.

> > And here; svn_config_set already has code to create option
> > and section records, it only needs to be factored out.
> Yes, but I don't like the idea of using svn_config_set to do this
> because it does more than is necessary here.
> We don't have to check whether to replace or add an entry because we
> always add (we're copying).
>
> And the question is: do we want an exact copy or a new object,
> initialized with the same data?

I'm not suggesting you use svn_config_set, that would be wrong, because
it fiddles with value expansions. I'm suggesting that you factor out the
parts of svn_config_set that create section and options into separate
static helper functions, so that the new code and svn_config_set can
both use those. That's better than duplicating the code.

-- Brane

-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download


Re: thread safe

Posted by Stefan Küng <to...@gmail.com>.
On Wed, Sep 5, 2012 at 1:14 AM, Branko Čibej <br...@wandisco.com> wrote:
> On 04.09.2012 22:24, Stefan Küng wrote:
>> Attached is my first draft of the new svn_config_dup() API.
>>
>> * it does not return an svn_error_t but the duplicate hash directly.
>> The only API call that could return an error is svn_config_create()
>> but that API also returns SVN_NO_ERROR unconditionally (for now at
>> least).
>> * no error checking is done for now, will add some checks for failed
>> memory allocations later.
>> * hash keys are duplicated as well (allocated in the passed pool)
>>
>> Please have a look at this. I think it's correct, at least my first
>> tests show that it works and solves my problem in TSVN with multiple
>> threads as well.
>>
>> be back tomorrow...
>
> Index: subversion/include/svn_config.h
> ===================================================================
> --- subversion/include/svn_config.h     (revision 1380737)
> +++ subversion/include/svn_config.h     (working copy)
> @@ -626,6 +626,16 @@
>                                  const char *fname,
>                                  apr_pool_t *pool);
>
> +/** Return a deep copy of the config hash.
> + * @note use this function instead of repeated calls
> + * to svn_config_get_config to avoid reading the config file
> + * over and over.
> + *
> + * @since New in 1.8.
> + */
> +apr_hash_t *
> +svn_config_dup(apr_hash_t * config, apr_pool_t * pool);
>
>
> As I said, the return value must be a svn_config_t*.
> More comments below (I'll ignore coding style).

Why must it be an svn_config_t* ?
That makes no sense: we want to duplicate the config hash.

>
> Index: subversion/libsvn_subr/config.c
> ===================================================================
> --- subversion/libsvn_subr/config.c     (revision 1380737)
> +++ subversion/libsvn_subr/config.c     (working copy)
> @@ -1011,3 +1011,88 @@
>    sec = apr_hash_get(cfg->sections, cfg->tmp_key->data, APR_HASH_KEY_STRING);
>    return sec != NULL;
>  }
> +
> +
> +apr_hash_t *
> +svn_config_dup(apr_hash_t * config, apr_pool_t * pool)
> +{
> +  apr_hash_t * rethash;
> +  apr_hash_index_t * cidx;
> +  apr_hash_index_t * sectidx;
> +  apr_hash_index_t * optidx;
> +
> +  rethash = apr_hash_make(pool);
> +  for (cidx = apr_hash_first(pool, config);
> +       cidx != NULL;
> +       cidx = apr_hash_next(cidx))
> +  {
> +    const void *ckey = NULL;
> +    void *cval = NULL;
> +    apr_ssize_t ckeyLength = 0;
> +    svn_config_t * c;
> +    svn_config_t * newconfig = NULL;
> +
> +    svn_config_create(&newconfig, FALSE, pool);
>
>
> Um, what? You return a hash of svn_config_t? None of the
> svn_config_* APIs use such a structure.

Yes, the config hash that you get from
svn_config_get_config().
And the config hash you have to pass to client APIs as part of the
client context object.

>
> +
> +    apr_hash_this(cidx, &ckey, &ckeyLength, &cval);
> +    c = cval;
> +
> +    newconfig->sections = apr_hash_make(pool);
> +    newconfig->pool = pool;
> +    newconfig->x_pool = svn_pool_create(pool);
> +    newconfig->x_values = c->x_values;
> +    newconfig->tmp_key = svn_stringbuf_dup(c->tmp_key, pool);
> +    newconfig->tmp_value = svn_stringbuf_dup(c->tmp_value, pool);
> +    newconfig->section_names_case_sensitive = c->section_names_case_sensitive;
>
> svn_config_create already creates the hash table, populates
> pool and x_pool, and creates the TEMPORARY stringbufs.
> In other words, all you have to do is copy x_values and
> section_names_case_sensitive, everything else is already done.

ok, thanks for the hint.

> +
> +    for (sectidx = apr_hash_first(pool, c->sections);
> +         sectidx != NULL;
> +         sectidx = apr_hash_next(sectidx))
> +    {
> +      const void *sectkey = NULL;
> +      void *sectval = NULL;
> +      apr_ssize_t sectkeyLength = 0;
> +      cfg_section_t * sect;
> +      cfg_section_t * newsec;
> +
> +      apr_hash_this(sectidx, &sectkey, &sectkeyLength, &sectval);
> +      sect = sectval;
> +
> +      newsec = apr_palloc(pool, sizeof(*newsec));
> +      newsec->name = apr_pstrdup(pool, sect->name);
> +      newsec->hash_key = apr_pstrdup(pool, sect->hash_key);
> +      newsec->options = apr_hash_make(pool);
>
> Suggest you factor this part out of svn_config_set instead
> of duplicating the logic here.
>
> +      for (optidx = apr_hash_first(pool, sect->options);
> +           optidx != NULL;
> +           optidx = apr_hash_next(optidx))
> +      {
> +        const void *optkey = NULL;
> +        void *optval = NULL;
> +        apr_ssize_t optkeyLength = 0;
> +        cfg_option_t * opt;
> +        cfg_option_t * newopt;
> +
> +        apr_hash_this(optidx, &optkey, &optkeyLength, &optval);
> +        opt = optval;
> +
> +        newopt = apr_palloc(pool, sizeof(*newopt));
> +        newopt->name = apr_pstrdup(pool, opt->name);
> +        newopt->hash_key = apr_pstrdup(pool, opt->hash_key);
> +        newopt->value = apr_pstrdup(pool, opt->value);
> +        newopt->x_value = apr_pstrdup(pool, opt->x_value);
> +        newopt->expanded = opt->expanded;
> +        apr_hash_set(newsec->options,
> +                     apr_pstrdup(pool, (const char*)optkey),
> +                     optkeyLength, newopt);
>
> And here; svn_config_set already has code to create option
> and section records, it only needs to be factored out.

Yes, but I don't like the idea of using svn_config_set to do this
because it does more than is necessary here.
We don't have to check whether to replace or add an entry because we
always add (we're copying).

And the question is: do we want an exact copy or a new object,
initialized with the same data?

>
> +      }
> +      apr_hash_set(newconfig->sections,
> +                   apr_pstrdup(pool, (const char*)sectkey),
> +                   sectkeyLength, newsec);
> +    }
> +    apr_hash_set(rethash,
> +                 apr_pstrdup(pool, (const char*)ckey),
> +                 ckeyLength, newconfig);
> +  }
> +
> +  return rethash;
> +}
>
>
> So except for the rethash which I frankly don't understand at all, the
> rest looks like a good start. Basically just strip away the outer loop
> and rely on (refactored) existing code instead of copying it around, and
> you're done.

as I said, the hash is necessary because that's the object we want to
duplicate, the object that's passed to svn client APIs.
To deal with configs alone, we don't really need a duplicate function.

Stefan

-- 
       ___
  oo  // \\      "De Chelonian Mobile"
 (_,\/ \_/ \     TortoiseSVN
   \ \_/_\_/>    The coolest Interface to (Sub)Version Control
   /_/   \_\     http://tortoisesvn.net

Re: thread safe

Posted by Branko Čibej <br...@wandisco.com>.
On 04.09.2012 22:24, Stefan Küng wrote:
> Attached is my first draft of the new svn_config_dup() API.
>
> * it does not return an svn_error_t but the duplicate hash directly.
> The only API call that could return an error is svn_config_create()
> but that API also returns SVN_NO_ERROR unconditionally (for now at
> least).
> * no error checking is done for now, will add some checks for failed
> memory allocations later.
> * hash keys are duplicated as well (allocated in the passed pool)
>
> Please have a look at this. I think it's correct, at least my first
> tests show that it works and solves my problem in TSVN with multiple
> threads as well.
>
> be back tomorrow...

Index: subversion/include/svn_config.h
===================================================================
--- subversion/include/svn_config.h	(revision 1380737)
+++ subversion/include/svn_config.h	(working copy)
@@ -626,6 +626,16 @@
                                 const char *fname,
                                 apr_pool_t *pool);
 
+/** Return a deep copy of the config hash.
+ * @note use this function instead of repeated calls
+ * to svn_config_get_config to avoid reading the config file
+ * over and over.
+ *
+ * @since New in 1.8.
+ */
+apr_hash_t *
+svn_config_dup(apr_hash_t * config, apr_pool_t * pool);


As I said, the return value must be a svn_config_t*.
More comments below (I'll ignore coding style).

 
Index: subversion/libsvn_subr/config.c
===================================================================
--- subversion/libsvn_subr/config.c	(revision 1380737)
+++ subversion/libsvn_subr/config.c	(working copy)
@@ -1011,3 +1011,88 @@
   sec = apr_hash_get(cfg->sections, cfg->tmp_key->data, APR_HASH_KEY_STRING);
   return sec != NULL;
 }
+
+
+apr_hash_t *
+svn_config_dup(apr_hash_t * config, apr_pool_t * pool)
+{
+  apr_hash_t * rethash;
+  apr_hash_index_t * cidx;
+  apr_hash_index_t * sectidx;
+  apr_hash_index_t * optidx;
+
+  rethash = apr_hash_make(pool);
+  for (cidx = apr_hash_first(pool, config);
+       cidx != NULL;
+       cidx = apr_hash_next(cidx))
+  {
+    const void *ckey = NULL;
+    void *cval = NULL;
+    apr_ssize_t ckeyLength = 0;
+    svn_config_t * c;
+    svn_config_t * newconfig = NULL;
+
+    svn_config_create(&newconfig, FALSE, pool);


Um, what? You return a hash of svn_config_t? None of the
svn_config_* APIs use such a structure.

+
+    apr_hash_this(cidx, &ckey, &ckeyLength, &cval);
+    c = cval;
+
+    newconfig->sections = apr_hash_make(pool);
+    newconfig->pool = pool;
+    newconfig->x_pool = svn_pool_create(pool);
+    newconfig->x_values = c->x_values;
+    newconfig->tmp_key = svn_stringbuf_dup(c->tmp_key, pool);
+    newconfig->tmp_value = svn_stringbuf_dup(c->tmp_value, pool);
+    newconfig->section_names_case_sensitive = c->section_names_case_sensitive;

svn_config_create already creates the hash table, populates
pool and x_pool, and creates the TEMPORARY stringbufs.
In other words, all you have to do is copy x_values and
section_names_case_sensitive, everything else is already done.

+
+    for (sectidx = apr_hash_first(pool, c->sections);
+         sectidx != NULL;
+         sectidx = apr_hash_next(sectidx))
+    {
+      const void *sectkey = NULL;
+      void *sectval = NULL;
+      apr_ssize_t sectkeyLength = 0;
+      cfg_section_t * sect;
+      cfg_section_t * newsec;
+
+      apr_hash_this(sectidx, &sectkey, &sectkeyLength, &sectval);
+      sect = sectval;
+
+      newsec = apr_palloc(pool, sizeof(*newsec));
+      newsec->name = apr_pstrdup(pool, sect->name);
+      newsec->hash_key = apr_pstrdup(pool, sect->hash_key);
+      newsec->options = apr_hash_make(pool);

Suggest you factor this part out of svn_config_set instead
of duplicating the logic here.

+      for (optidx = apr_hash_first(pool, sect->options);
+           optidx != NULL;
+           optidx = apr_hash_next(optidx))
+      {
+        const void *optkey = NULL;
+        void *optval = NULL;
+        apr_ssize_t optkeyLength = 0;
+        cfg_option_t * opt;
+        cfg_option_t * newopt;
+
+        apr_hash_this(optidx, &optkey, &optkeyLength, &optval);
+        opt = optval;
+
+        newopt = apr_palloc(pool, sizeof(*newopt));
+        newopt->name = apr_pstrdup(pool, opt->name);
+        newopt->hash_key = apr_pstrdup(pool, opt->hash_key);
+        newopt->value = apr_pstrdup(pool, opt->value);
+        newopt->x_value = apr_pstrdup(pool, opt->x_value);
+        newopt->expanded = opt->expanded;
+        apr_hash_set(newsec->options,
+                     apr_pstrdup(pool, (const char*)optkey),
+                     optkeyLength, newopt);

And here; svn_config_set already has code to create option
and section records, it only needs to be factored out.


+      }
+      apr_hash_set(newconfig->sections,
+                   apr_pstrdup(pool, (const char*)sectkey),
+                   sectkeyLength, newsec);
+    }
+    apr_hash_set(rethash, 
+                 apr_pstrdup(pool, (const char*)ckey),
+                 ckeyLength, newconfig);
+  }
+
+  return rethash;
+}


So except for the rethash which I frankly don't understand at all, the
rest looks like a good start. Basically just strip away the outer loop
and rely on (refactored) existing code instead of copying it around, and
you're done.

-- Brane

-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download


Re: thread safe

Posted by Stefan Küng <to...@gmail.com>.
Attached is my first draft of the new svn_config_dup() API.

* it does not return an svn_error_t but the duplicate hash directly. The 
only API call that could return an error is svn_config_create() but that 
API also returns SVN_NO_ERROR unconditionally (for now at least).
* no error checking is done for now, will add some checks for failed 
memory allocations later.
* hash keys are duplicated as well (allocated in the passed pool)

Please have a look at this. I think it's correct, at least my first 
tests show that it works and solves my problem in TSVN with multiple 
threads as well.

be back tomorrow...

Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.net

Re: thread safe

Posted by Branko Čibej <br...@wandisco.com>.
On 04.09.2012 21:14, Stefan Küng wrote:
> On 04.09.2012 21:07, Branko Čibej wrote:
>> On 04.09.2012 20:42, Stefan Küng wrote:
>>> Seems I got confused here:
>>> What we have to pass to almost each svn API is the config hash for the
>>> svn_client_ctx_t struct. That's what I need a copy of.
>>
>> The svn_config_* functions work with svn_config_t object. The tmp_key
>> and tmp_value fields in those object are in fact the ones that prevent
>> you from reading the same svn_config_t from multiple threads without
>> synchronization.
>>
>>> And as Bert already noticed: apr_hash_copy only does a shallow copy,
>>> not a deep copy. Seems I have to iterate over the hash and copy every
>>> item separately.
>>
>> That is correct. You'll also have to duplicate the svn_config_t
>> structure itself, I think that essentially means doing an
>> svn_config_create and copying over "x_values" and
>> "section_names_case_sensitive"; and, of course, doing the deep copy of
>> "sections".
>>
>> If you do the deep copy properly -- i.e., copying the keys and values
>> into the new config_t's pool as well as the hashes -- then you'll save
>> yourself a bit of trouble with pool lifetimes, too.
>>
>>> I'll do that first in TSVN, then do some tests.
>>> After that I can use that code to create an svn_config_dup() API if
>>> still required.
>>
>> Why not just write svn_config_dup in libsvn_subr/config.c? Surely
>> there's no reason to keep this functionality out of the core libraries.
>
> This will be a new API which won't get backported.
> But in TSVN we have the problem now and need to fix it.
>
> So I'll have to implement it there as well, at least until 1.8 is out.

Ah, yes, good point. Still, writing the code only once wouldn't be that
bad. :)

-- Brane

-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download


Re: thread safe

Posted by Stefan Küng <to...@gmail.com>.
On 04.09.2012 21:07, Branko Čibej wrote:
> On 04.09.2012 20:42, Stefan Küng wrote:
>> Seems I got confused here:
>> What we have to pass to almost each svn API is the config hash for the
>> svn_client_ctx_t struct. That's what I need a copy of.
>
> The svn_config_* functions work with svn_config_t object. The tmp_key
> and tmp_value fields in those object are in fact the ones that prevent
> you from reading the same svn_config_t from multiple threads without
> synchronization.
>
>> And as Bert already noticed: apr_hash_copy only does a shallow copy,
>> not a deep copy. Seems I have to iterate over the hash and copy every
>> item separately.
>
> That is correct. You'll also have to duplicate the svn_config_t
> structure itself, I think that essentially means doing an
> svn_config_create and copying over "x_values" and
> "section_names_case_sensitive"; and, of course, doing the deep copy of
> "sections".
>
> If you do the deep copy properly -- i.e., copying the keys and values
> into the new config_t's pool as well as the hashes -- then you'll save
> yourself a bit of trouble with pool lifetimes, too.
>
>> I'll do that first in TSVN, then do some tests.
>> After that I can use that code to create an svn_config_dup() API if
>> still required.
>
> Why not just write svn_config_dup in libsvn_subr/config.c? Surely
> there's no reason to keep this functionality out of the core libraries.

This will be a new API which won't get backported.
But in TSVN we have the problem now and need to fix it.

So I'll have to implement it there as well, at least until 1.8 is out.

Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.net

Re: thread safe

Posted by Branko Čibej <br...@wandisco.com>.
On 04.09.2012 20:42, Stefan Küng wrote:
> Seems I got confused here:
> What we have to pass to almost each svn API is the config hash for the
> svn_client_ctx_t struct. That's what I need a copy of.

The svn_config_* functions work with svn_config_t object. The tmp_key
and tmp_value fields in those object are in fact the ones that prevent
you from reading the same svn_config_t from multiple threads without
synchronization.

> And as Bert already noticed: apr_hash_copy only does a shallow copy,
> not a deep copy. Seems I have to iterate over the hash and copy every
> item separately.

That is correct. You'll also have to duplicate the svn_config_t
structure itself, I think that essentially means doing an
svn_config_create and copying over "x_values" and
"section_names_case_sensitive"; and, of course, doing the deep copy of
"sections".

If you do the deep copy properly -- i.e., copying the keys and values
into the new config_t's pool as well as the hashes -- then you'll save
yourself a bit of trouble with pool lifetimes, too.

> I'll do that first in TSVN, then do some tests.
> After that I can use that code to create an svn_config_dup() API if
> still required.

Why not just write svn_config_dup in libsvn_subr/config.c? Surely
there's no reason to keep this functionality out of the core libraries.

-- Brane

-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download


Re: thread safe

Posted by Stefan Küng <to...@gmail.com>.
On 04.09.2012 20:31, Branko Čibej wrote:
> On 04.09.2012 19:33, Stefan Küng wrote:
>> On 03.09.2012 22:21, Branko Čibej wrote:
>>> On 03.09.2012 22:16, Stefan Küng wrote:
>>>> On 03.09.2012 22:11, Branko Čibej wrote:
>>>>> Either that, or add a new API that creates a deep-copy of the
>>>>> in-memory
>>>>> svn_config_t structure, making this another way to avoid re-reading
>>>>> the
>>>>> config file for each thread and avoiding the hassle of managing a
>>>>> mutex.
>>>>
>>>> I'm in favor of implementing a deep copy API for this. I think that
>>>> would be the best solution.
>>>
>>> I find it to be the best option, too. Will you write a patch for that?
>>
>> Ahem:
>> apr_hash_copy
>> will do the trick: svn_config_t is a typedef for an apr_hash_t.
>
> Really? That's strange; are you quite sure you're looking at the right code?
> subversion/libsvn_subr/config_impl.h says:
>
> struct svn_config_t
> {
>    /* Table of cfg_section_t's. */
>    apr_hash_t *sections;
>
>    /* Pool for hash tables, table entries and unexpanded values */
>    apr_pool_t *pool;
>
>    /* Pool for expanded values -- this is separate, so that we can
>       clear it when modifying the config data. */
>    apr_pool_t *x_pool;
>
>    /* Indicates that some values in the configuration have been expanded. */
>    svn_boolean_t x_values;
>
>    /* Temporary string used for lookups.  (Using a stringbuf so that
>       frequent resetting is efficient.) */
>    svn_stringbuf_t *tmp_key;
>
>    /* Temporary value used for expanded default values in svn_config_get.
>       (Using a stringbuf so that frequent resetting is efficient.) */
>    svn_stringbuf_t *tmp_value;
>
>    /* Specifies whether section names are populated case sensitively. */
>    svn_boolean_t section_names_case_sensitive;
> };
>
>
>> So it's already done :)
>
> Not even close, I'm afraid.


Seems I got confused here:
What we have to pass to almost each svn API is the config hash for the 
svn_client_ctx_t struct. That's what I need a copy of.

And as Bert already noticed: apr_hash_copy only does a shallow copy, not 
a deep copy. Seems I have to iterate over the hash and copy every item 
separately.

I'll do that first in TSVN, then do some tests.
After that I can use that code to create an svn_config_dup() API if 
still required.

Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.net

Re: thread safe

Posted by Branko Čibej <br...@wandisco.com>.
On 04.09.2012 19:33, Stefan Küng wrote:
> On 03.09.2012 22:21, Branko Čibej wrote:
>> On 03.09.2012 22:16, Stefan Küng wrote:
>>> On 03.09.2012 22:11, Branko Čibej wrote:
>>>> Either that, or add a new API that creates a deep-copy of the
>>>> in-memory
>>>> svn_config_t structure, making this another way to avoid re-reading
>>>> the
>>>> config file for each thread and avoiding the hassle of managing a
>>>> mutex.
>>>
>>> I'm in favor of implementing a deep copy API for this. I think that
>>> would be the best solution.
>>
>> I find it to be the best option, too. Will you write a patch for that?
>
> Ahem:
> apr_hash_copy
> will do the trick: svn_config_t is a typedef for an apr_hash_t.

Really? That's strange; are you quite sure you're looking at the right code?
subversion/libsvn_subr/config_impl.h says:

struct svn_config_t
{
  /* Table of cfg_section_t's. */
  apr_hash_t *sections;

  /* Pool for hash tables, table entries and unexpanded values */
  apr_pool_t *pool;

  /* Pool for expanded values -- this is separate, so that we can
     clear it when modifying the config data. */
  apr_pool_t *x_pool;

  /* Indicates that some values in the configuration have been expanded. */
  svn_boolean_t x_values;

  /* Temporary string used for lookups.  (Using a stringbuf so that
     frequent resetting is efficient.) */
  svn_stringbuf_t *tmp_key;

  /* Temporary value used for expanded default values in svn_config_get.
     (Using a stringbuf so that frequent resetting is efficient.) */
  svn_stringbuf_t *tmp_value;

  /* Specifies whether section names are populated case sensitively. */
  svn_boolean_t section_names_case_sensitive;
};


> So it's already done :)

Not even close, I'm afraid.

-- Brane

-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download


Re: thread safe

Posted by Stefan Küng <to...@gmail.com>.
On 03.09.2012 22:21, Branko Čibej wrote:
> On 03.09.2012 22:16, Stefan Küng wrote:
>> On 03.09.2012 22:11, Branko Čibej wrote:
>>> Either that, or add a new API that creates a deep-copy of the in-memory
>>> svn_config_t structure, making this another way to avoid re-reading the
>>> config file for each thread and avoiding the hassle of managing a mutex.
>>
>> I'm in favor of implementing a deep copy API for this. I think that
>> would be the best solution.
>
> I find it to be the best option, too. Will you write a patch for that?

Ahem:
apr_hash_copy
will do the trick: svn_config_t is a typedef for an apr_hash_t.

So it's already done :)

Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.net

Re: thread safe

Posted by Stefan Küng <to...@gmail.com>.
On 03.09.2012 22:21, Branko Čibej wrote:
> On 03.09.2012 22:16, Stefan Küng wrote:
>> On 03.09.2012 22:11, Branko Čibej wrote:
>>> Either that, or add a new API that creates a deep-copy of the in-memory
>>> svn_config_t structure, making this another way to avoid re-reading the
>>> config file for each thread and avoiding the hassle of managing a mutex.
>>
>> I'm in favor of implementing a deep copy API for this. I think that
>> would be the best solution.
>
> I find it to be the best option, too. Will you write a patch for that?

I'll give it a try tomorrow, yes.

Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.net

Re: thread safe

Posted by Branko Čibej <br...@wandisco.com>.
On 03.09.2012 22:16, Stefan Küng wrote:
> On 03.09.2012 22:11, Branko Čibej wrote:
>> Either that, or add a new API that creates a deep-copy of the in-memory
>> svn_config_t structure, making this another way to avoid re-reading the
>> config file for each thread and avoiding the hassle of managing a mutex.
>
> I'm in favor of implementing a deep copy API for this. I think that
> would be the best solution.

I find it to be the best option, too. Will you write a patch for that?

-- Brane

-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download


Re: thread safe

Posted by Stefan Küng <to...@gmail.com>.
On 03.09.2012 22:11, Branko Čibej wrote:
> On 03.09.2012 20:47, Stefan Küng wrote:
>> Hi,
>>
>> It seems that the svn_config_t structure isn't thread safe, i.e. can't
>> be shared among multiple threads.
>> See here for a detailed report on what problems this causes:
>> http://tortoisesvn.tigris.org/ds/viewMessage.do?dsForumId=757&dsMessageId=3003152
>>
>>
>> Is this by design? I assumed that all svn APIs are thread safe for
>> read access at least unless noted otherwise (example: the API
>> svn_utf_initialize states its thread-unsafeness in the note section of
>> the docs).
>
> Subversion's functions are re-entrant. Data structures, including
> svn_config_t, are thread-specific. We don't promise any more than that.
>
>> While we can work around that thread problem a little bit, that
>> workaround has its own big problems:
>> * each thread would need its own svn_config_t structure
>>    --> for each thread the config file is read (open, read, close)
>>    which results in multiple unnecessary disk accesses
>
> One would assume that the config file is contents are cached once
> they've been read, so disk accesses shouldn't be an issue.
>
>>    --> when the config file is read multiple times, sometimes the read
>>    fails (at least on Windows) because the file is opened already -
>>    usually because virus scanners open and scan every file that a
>>    process accesses, even when only for reading.
>
> So we have to add a delay loop. Why a file that has not been changes
> should be locked by virus scanners is, IMO, an issue to be taken up with
> Windows virus scanner writers.

Well, in an ideal world that would work. But that's not how it works: we 
have to deal with such issues ourselves.

> Also, one can always use a mutex to control access to the svn_config_t.
> Surely that's preferable to re-reading it any number of times.
>
>> Joel Jirak made a patch for subversion/libsvn_subr/config.c which
>> would fix the problem (see the thread linked above). Would that be an
>> acceptable solution?
>
> The patch creates a new pool for almost every access, so no, that's not
> acceptable. The only way to make this work would be to rev all the
> svn_config_* global APIs to accept a scratch pool argument from which
> allocations should be made.
>
> Either that, or add a new API that creates a deep-copy of the in-memory
> svn_config_t structure, making this another way to avoid re-reading the
> config file for each thread and avoiding the hassle of managing a mutex.

I'm in favor of implementing a deep copy API for this. I think that 
would be the best solution.

Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.net

Re: thread safe

Posted by Branko Čibej <br...@wandisco.com>.
On 03.09.2012 20:47, Stefan Küng wrote:
> Hi,
>
> It seems that the svn_config_t structure isn't thread safe, i.e. can't
> be shared among multiple threads.
> See here for a detailed report on what problems this causes:
> http://tortoisesvn.tigris.org/ds/viewMessage.do?dsForumId=757&dsMessageId=3003152
>
>
> Is this by design? I assumed that all svn APIs are thread safe for
> read access at least unless noted otherwise (example: the API
> svn_utf_initialize states its thread-unsafeness in the note section of
> the docs).

Subversion's functions are re-entrant. Data structures, including
svn_config_t, are thread-specific. We don't promise any more than that.

> While we can work around that thread problem a little bit, that
> workaround has its own big problems:
> * each thread would need its own svn_config_t structure
>   --> for each thread the config file is read (open, read, close)
>   which results in multiple unnecessary disk accesses

One would assume that the config file is contents are cached once
they've been read, so disk accesses shouldn't be an issue.

>   --> when the config file is read multiple times, sometimes the read
>   fails (at least on Windows) because the file is opened already -
>   usually because virus scanners open and scan every file that a
>   process accesses, even when only for reading.

So we have to add a delay loop. Why a file that has not been changes
should be locked by virus scanners is, IMO, an issue to be taken up with
Windows virus scanner writers.

Also, one can always use a mutex to control access to the svn_config_t.
Surely that's preferable to re-reading it any number of times.

> Joel Jirak made a patch for subversion/libsvn_subr/config.c which
> would fix the problem (see the thread linked above). Would that be an
> acceptable solution?

The patch creates a new pool for almost every access, so no, that's not
acceptable. The only way to make this work would be to rev all the
svn_config_* global APIs to accept a scratch pool argument from which
allocations should be made.

Either that, or add a new API that creates a deep-copy of the in-memory
svn_config_t structure, making this another way to avoid re-reading the
config file for each thread and avoiding the hassle of managing a mutex.

-- Brane

-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download