You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Ivan Zhakov <iv...@visualsvn.com> on 2014/03/25 12:02:33 UTC

Re: svn commit: r1344347 - /subversion/trunk/subversion/libsvn_subr/config_file.c

On 30 May 2012 20:52,  <rh...@apache.org> wrote:
> Author: rhuijben
> Date: Wed May 30 16:52:36 2012
> New Revision: 1344347
>
> URL: http://svn.apache.org/viewvc?rev=1344347&view=rev
> Log:
> Remove about 15% of the cpu overhead of svnserve and httpd while running the
> test suite by introducing a very simple parser buffer in the config file parser
> code.
>
> The nice clean stream translation code had an 7 times overhead on this very
> performance critical code path, while clients of svnserve and httpd were
> waiting. (Before this patch the this code path used 22% of the cpu time,
> now just 3%)
>
> * subversion/libsvn_subr/config_file.c
>   (parse_context_t): Add buffer and position variables. Use EOF as no ungetc var.
>   (parser_getc): Skip '\r' characters as we only need the '\n' character for our
>     config files. Use EOF as no ungetc var.
>   (parse_option,
>    parse_section_name): Rename pool argument to scratch_pool.
>   (svn_config__parse_file): Create scratch pool and store the file and parse
>     ctx in it. Update initialization and destroy scratch_pool after successfull
>     parsing.
>
> Modified:
>     subversion/trunk/subversion/libsvn_subr/config_file.c
>
> Modified: subversion/trunk/subversion/libsvn_subr/config_file.c
> URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_subr/config_file.c?rev=1344347&r1=1344346&r2=1344347&view=diff
> ==============================================================================
> --- subversion/trunk/subversion/libsvn_subr/config_file.c (original)
> +++ subversion/trunk/subversion/libsvn_subr/config_file.c Wed May 30 16:52:36 2012
> @@ -60,15 +60,19 @@ typedef struct parse_context_t
>    /* The current line in the file */
>    int line;
>
> -  /* Cached ungotten character  - streams don't support ungetc()
> -     [emulate it] */
> +  /* Emulate an ungetc */
>    int ungotten_char;
> -  svn_boolean_t have_ungotten_char;
>
> -  /* Temporary strings, allocated from the temp pool */
> +  /* Temporary strings */
>    svn_stringbuf_t *section;
>    svn_stringbuf_t *option;
>    svn_stringbuf_t *value;
> +
> +  /* Parser buffer for getc() to avoid call overhead into several libraries
> +     for every character */
> +  char parser_buffer[SVN_STREAM_CHUNK_SIZE]; /* Larger than most config files */
Hi Bert,

Did you have specific reasons for using deprecated
SVN_STREAM_CHUNK_SIZE (100 kb) instead of SVN__STREAM_CHUNK_SIZE
(16kb) constant for read buffer? I'm asking because it seems this
significantly increase memory footprint for high loaded servers due
APR allocator behavior and fragmentation.

I would like change this buffer to 16kb instead of 100kb, if you don't
have strong objections.

-- 
Ivan Zhakov
CTO | VisualSVN | http://www.visualsvn.com

Re: svn commit: r1344347 - /subversion/trunk/subversion/libsvn_subr/config_file.c

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On 25 March 2014 15:11, Bert Huijben <be...@qqmail.nl> wrote:
>
>
>> -----Original Message-----
>> From: Ivan Zhakov [mailto:ivan@visualsvn.com]
>> Sent: dinsdag 25 maart 2014 12:03
>> To: Subversion Development; Bert Huijben
>> Subject: Re: svn commit: r1344347 -
>> /subversion/trunk/subversion/libsvn_subr/config_file.c
>>
>> On 30 May 2012 20:52,  <rh...@apache.org> wrote:
>> > Author: rhuijben
>> > Date: Wed May 30 16:52:36 2012
>> > New Revision: 1344347
>> >
>> > URL: http://svn.apache.org/viewvc?rev=1344347&view=rev
>> > Log:
>> > Remove about 15% of the cpu overhead of svnserve and httpd while
>> running the
>> > test suite by introducing a very simple parser buffer in the config file parser
>> > code.
>> >
>> > The nice clean stream translation code had an 7 times overhead on this
>> very
>> > performance critical code path, while clients of svnserve and httpd were
>> > waiting. (Before this patch the this code path used 22% of the cpu time,
>> > now just 3%)
>> >
>> > * subversion/libsvn_subr/config_file.c
>> >   (parse_context_t): Add buffer and position variables. Use EOF as no
>> ungetc var.
>> >   (parser_getc): Skip '\r' characters as we only need the '\n' character for
>> our
>> >     config files. Use EOF as no ungetc var.
>> >   (parse_option,
>> >    parse_section_name): Rename pool argument to scratch_pool.
>> >   (svn_config__parse_file): Create scratch pool and store the file and parse
>> >     ctx in it. Update initialization and destroy scratch_pool after successfull
>> >     parsing.
>> >
>> > Modified:
>> >     subversion/trunk/subversion/libsvn_subr/config_file.c
>> >
>> > Modified: subversion/trunk/subversion/libsvn_subr/config_file.c
>> > URL:
>> http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_subr/co
>> nfig_file.c?rev=1344347&r1=1344346&r2=1344347&view=diff
>> >
>> ==========================================================
>> ====================
>> > --- subversion/trunk/subversion/libsvn_subr/config_file.c (original)
>> > +++ subversion/trunk/subversion/libsvn_subr/config_file.c Wed May 30
>> 16:52:36 2012
>> > @@ -60,15 +60,19 @@ typedef struct parse_context_t
>> >    /* The current line in the file */
>> >    int line;
>> >
>> > -  /* Cached ungotten character  - streams don't support ungetc()
>> > -     [emulate it] */
>> > +  /* Emulate an ungetc */
>> >    int ungotten_char;
>> > -  svn_boolean_t have_ungotten_char;
>> >
>> > -  /* Temporary strings, allocated from the temp pool */
>> > +  /* Temporary strings */
>> >    svn_stringbuf_t *section;
>> >    svn_stringbuf_t *option;
>> >    svn_stringbuf_t *value;
>> > +
>> > +  /* Parser buffer for getc() to avoid call overhead into several libraries
>> > +     for every character */
>> > +  char parser_buffer[SVN_STREAM_CHUNK_SIZE]; /* Larger than most
>> config files */
>> Hi Bert,
>>
>> Did you have specific reasons for using deprecated
>> SVN_STREAM_CHUNK_SIZE (100 kb) instead of SVN__STREAM_CHUNK_SIZE
>> (16kb) constant for read buffer? I'm asking because it seems this
>> significantly increase memory footprint for high loaded servers due
>> APR allocator behavior and fragmentation.
>>
>> I would like change this buffer to 16kb instead of 100kb, if you don't
>> have strong objections.
>
> Answered on IRC: +1
>
> Using something like <= 8 KByte would not handle our default generated client side config files...
> But 16 KB will easily hold them.
>
> The fsfs and repository config files are much smaller, so if it really helps we could make this configurable per use case.
>
Bert,

Thanks for quick reply.

Committed in r1581296 and nominated for backport to 1.8.x.
-- 
Ivan Zhakov
CTO | VisualSVN | http://www.visualsvn.com

RE: svn commit: r1344347 - /subversion/trunk/subversion/libsvn_subr/config_file.c

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Ivan Zhakov [mailto:ivan@visualsvn.com]
> Sent: dinsdag 25 maart 2014 12:03
> To: Subversion Development; Bert Huijben
> Subject: Re: svn commit: r1344347 -
> /subversion/trunk/subversion/libsvn_subr/config_file.c
> 
> On 30 May 2012 20:52,  <rh...@apache.org> wrote:
> > Author: rhuijben
> > Date: Wed May 30 16:52:36 2012
> > New Revision: 1344347
> >
> > URL: http://svn.apache.org/viewvc?rev=1344347&view=rev
> > Log:
> > Remove about 15% of the cpu overhead of svnserve and httpd while
> running the
> > test suite by introducing a very simple parser buffer in the config file parser
> > code.
> >
> > The nice clean stream translation code had an 7 times overhead on this
> very
> > performance critical code path, while clients of svnserve and httpd were
> > waiting. (Before this patch the this code path used 22% of the cpu time,
> > now just 3%)
> >
> > * subversion/libsvn_subr/config_file.c
> >   (parse_context_t): Add buffer and position variables. Use EOF as no
> ungetc var.
> >   (parser_getc): Skip '\r' characters as we only need the '\n' character for
> our
> >     config files. Use EOF as no ungetc var.
> >   (parse_option,
> >    parse_section_name): Rename pool argument to scratch_pool.
> >   (svn_config__parse_file): Create scratch pool and store the file and parse
> >     ctx in it. Update initialization and destroy scratch_pool after successfull
> >     parsing.
> >
> > Modified:
> >     subversion/trunk/subversion/libsvn_subr/config_file.c
> >
> > Modified: subversion/trunk/subversion/libsvn_subr/config_file.c
> > URL:
> http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_subr/co
> nfig_file.c?rev=1344347&r1=1344346&r2=1344347&view=diff
> >
> ==========================================================
> ====================
> > --- subversion/trunk/subversion/libsvn_subr/config_file.c (original)
> > +++ subversion/trunk/subversion/libsvn_subr/config_file.c Wed May 30
> 16:52:36 2012
> > @@ -60,15 +60,19 @@ typedef struct parse_context_t
> >    /* The current line in the file */
> >    int line;
> >
> > -  /* Cached ungotten character  - streams don't support ungetc()
> > -     [emulate it] */
> > +  /* Emulate an ungetc */
> >    int ungotten_char;
> > -  svn_boolean_t have_ungotten_char;
> >
> > -  /* Temporary strings, allocated from the temp pool */
> > +  /* Temporary strings */
> >    svn_stringbuf_t *section;
> >    svn_stringbuf_t *option;
> >    svn_stringbuf_t *value;
> > +
> > +  /* Parser buffer for getc() to avoid call overhead into several libraries
> > +     for every character */
> > +  char parser_buffer[SVN_STREAM_CHUNK_SIZE]; /* Larger than most
> config files */
> Hi Bert,
> 
> Did you have specific reasons for using deprecated
> SVN_STREAM_CHUNK_SIZE (100 kb) instead of SVN__STREAM_CHUNK_SIZE
> (16kb) constant for read buffer? I'm asking because it seems this
> significantly increase memory footprint for high loaded servers due
> APR allocator behavior and fragmentation.
> 
> I would like change this buffer to 16kb instead of 100kb, if you don't
> have strong objections.

Answered on IRC: +1

Using something like <= 8 KByte would not handle our default generated client side config files...
But 16 KB will easily hold them.

The fsfs and repository config files are much smaller, so if it really helps we could make this configurable per use case.

	Bert 


Re: r1344347 - stream buffering in config_file.c

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On 3 April 2014 22:15, Julian Foad <ju...@btopenworld.com> wrote:
>>> URL: http://svn.apache.org/viewvc?rev=1344347&view=rev
>
>>> Log:
>>> Remove about 15% of the cpu overhead of svnserve and httpd while running
>>> the test suite by introducing a very simple parser buffer in the config file
>>> parser code.
> [...]
>>>  * subversion/libsvn_subr/config_file.c
>>>    (parse_context_t): Add buffer and position variables. Use EOF as no
>>> ungetc var.
> [...]
>>>  Modified: subversion/trunk/subversion/libsvn_subr/config_file.c
>>> ==============================================================================
>>>  +
>>>  +  /* Parser buffer for getc() to avoid call overhead into several libraries
>>>  +     for every character */
>>>  +  char parser_buffer[SVN_STREAM_CHUNK_SIZE];
> [...]
>
> This commit implements buffering of a generic stream, directly in the config file parser. Would it
> not be better to implement buffering of a generic stream as a generic module?
>
This commit improved performance over buffer APR file, so switching to
back to buffered stream will degrade performance again.
Config parser also use parser_buffer knowledge to use extremle
optimized memchr() to find end of line to skip comment lines. Config
parsers used very often, that's because it should be optimized.


-- 
Ivan Zhakov

Re: r1344347 - stream buffering in config_file.c

Posted by Julian Foad <ju...@btopenworld.com>.
Branko Čibej wrote:
> Julian Foad wrote:
>> URL: http://svn.apache.org/viewvc?rev=1344347&view=rev
>> This commit implements buffering of a generic stream, directly in the
>> config file parser. Would it not be better to implement buffering of
>> a generic stream as a generic module?
> 
> We already have svn_stream_buffered.

That's something different: it's a read/write stream interface to a spill buffer.

- Julian

Re: r1344347 - stream buffering in config_file.c

Posted by Branko Čibej <br...@wandisco.com>.
On 03.04.2014 20:15, Julian Foad wrote:
>>> URL: http://svn.apache.org/viewvc?rev=1344347&view=rev
>>> Log:
>>> Remove about 15% of the cpu overhead of svnserve and httpd while running
>>> the test suite by introducing a very simple parser buffer in the config file
>>> parser code.
> [...]
>>>   * subversion/libsvn_subr/config_file.c
>>>    (parse_context_t): Add buffer and position variables. Use EOF as no 
>>> ungetc var.
> [...]
>>>   Modified: subversion/trunk/subversion/libsvn_subr/config_file.c
>>> ==============================================================================
>>>   +
>>>   +  /* Parser buffer for getc() to avoid call overhead into several libraries
>>>   +     for every character */
>>>   +  char parser_buffer[SVN_STREAM_CHUNK_SIZE];
> [...]
>
> This commit implements buffering of a generic stream, directly in the config file parser. Would it not be better to implement buffering of a generic stream as a generic module?

We already have svn_stream_buffered.

-- Brane


-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. brane@wandisco.com

Re: r1344347 - stream buffering in config_file.c

Posted by Julian Foad <ju...@btopenworld.com>.
Bert Huijben wrote:

> Julian Foad wrote:
>>>> URL: http://svn.apache.org/viewvc?rev=1344347&view=rev
>>  This commit implements buffering of a generic stream, directly in the config
>>  file parser. Would it not be better to implement buffering of a generic stream
>>  as a generic module?
> 
> Perhaps...
> 
> But a patch like that didn't allow speeding up the test suite by 15% in one
> afternoon. This was a very local patch with a very huge result. (And as it
> was very local it is easy to remove it when somebody else optimizes it in a
> different place). 

Sure -- I appreciate why it was worth doing that way.

> [...]
> 
> And just abstracting it, just for abstracting will add more overhead... (The
> current code is all in one C file and will be completely inlined by every
> compiler... Which is impossible if we need 3 callback and/or shared library
> calls for reading every byte... Even worse when that is taking out mutexes
> such as the apr read code)

I don't think it's fair to assume that an abstraction need be that inefficient, although I admit that an implementation easily can be that inefficient if we don't take care of it.

Part of what I was thinking is, why are we re-inventing the standard <stdio.h> streams interface (fread, fgetc, fungetc, ...) in Subversion? And I started wondering if we could hook in to the standard streams. The GNU C library (glibc) does provide hooks for doing this:

  http://www.gnu.org/software/libc/manual/html_mono/libc.html#Custom-Streams

But it's not standardized, and Windows doesn't seem to provide any such facility.

I don't think we need to do anything right now, but I wanted to note the suggestion.

- Julian

RE: r1344347 - stream buffering in config_file.c

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Julian Foad [mailto:julianfoad@btopenworld.com]
> Sent: donderdag 3 april 2014 20:15
> To: Bert Huijben
> Cc: Ivan Zhakov; Subversion Development
> Subject: Re: r1344347 - stream buffering in config_file.c
> 
> >> URL: http://svn.apache.org/viewvc?rev=1344347&view=rev
> 
> >> Log:
> >> Remove about 15% of the cpu overhead of svnserve and httpd while
> running
> >> the test suite by introducing a very simple parser buffer in the config
file
> >> parser code.
> [...]
> >>  * subversion/libsvn_subr/config_file.c
> >>    (parse_context_t): Add buffer and position variables. Use EOF as no
> >> ungetc var.
> [...]
> >>  Modified: subversion/trunk/subversion/libsvn_subr/config_file.c
> >>
> ==========================================================
> ====================
> >>  +
> >>  +  /* Parser buffer for getc() to avoid call overhead into several
libraries
> >>  +     for every character */
> >>  +  char parser_buffer[SVN_STREAM_CHUNK_SIZE];
> [...]
> 
> This commit implements buffering of a generic stream, directly in the
config
> file parser. Would it not be better to implement buffering of a generic
stream
> as a generic module?

Perhaps...

But a patch like that didn't allow speeding up the test suite by 15% in one
afternoon. This was a very local patch with a very huge result. (And as it
was very local it is easy to remove it when somebody else optimizes it in a
different place). 

The stream api is optimized for using larger buffers while the config parser
is really specific in that it reads byte by byte. And the callback on
callback on callback wrapping (stream over svn_io over apr... none of them
inlineable by the compiler) was getting very expensive here, while there are
as far as I know no other users that have the same needs.
(Almost every other location uses per line parsing, which is already
optimized)

This was (at the time I applied the patch) by far the most expensive code in
both mod_dav_svn and svnserve when running the testsuite on Windows (on a
ramdrive), as we are reading the config files (e.g. fsfs.conf and the authz
files) over and over again.

I would guess this code should be easy to abstract... if/when there are more
parts of Subversion that need it, but I haven't found other performance
critical code paths that would need this.

And just abstracting it, just for abstracting will add more overhead... (The
current code is all in one C file and will be completely inlined by every
compiler... Which is impossible if we need 3 callback and/or shared library
calls for reading every byte... Even worse when that is taking out mutexes
such as the apr read code)

	Bert



Re: r1344347 - stream buffering in config_file.c

Posted by Julian Foad <ju...@btopenworld.com>.
>> URL: http://svn.apache.org/viewvc?rev=1344347&view=rev

>> Log:
>> Remove about 15% of the cpu overhead of svnserve and httpd while running
>> the test suite by introducing a very simple parser buffer in the config file
>> parser code.
[...]
>>  * subversion/libsvn_subr/config_file.c
>>    (parse_context_t): Add buffer and position variables. Use EOF as no 
>> ungetc var.
[...]
>>  Modified: subversion/trunk/subversion/libsvn_subr/config_file.c
>> ==============================================================================
>>  +
>>  +  /* Parser buffer for getc() to avoid call overhead into several libraries
>>  +     for every character */
>>  +  char parser_buffer[SVN_STREAM_CHUNK_SIZE];
[...]

This commit implements buffering of a generic stream, directly in the config file parser. Would it not be better to implement buffering of a generic stream as a generic module?

- Julian