You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Daniel Shahaf <da...@elego.de> on 2012/01/20 03:40:44 UTC

Re: svn commit: r1233566 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c

hwright@apache.org wrote on Thu, Jan 19, 2012 at 20:56:35 -0000:
> Author: hwright
> Date: Thu Jan 19 20:56:35 2012
> New Revision: 1233566
> 
> URL: http://svn.apache.org/viewvc?rev=1233566&view=rev
> Log:
> Add a public wrapper around our spillbuf-backed stream.
> 
> * subversion/include/svn_io.h
>   (svn_stream_buffered): New.
> 
> * subversion/libsvn_subr/stream.c
>   (svn_stream_buffered): New.

So, with this change, exported files of up to 100kB in size will be
processed in-memory, without being written to disk twice.  Nice. :-)

I do wonder if the "to disk" threshold should be in the public
signature, but don't have offhand a use-case justifying that.

Thanks,

Daniel

Re: svn commit: r1233566 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c

Posted by Daniel Shahaf <da...@elego.de>.
Hyrum K Wright wrote on Fri, Jan 20, 2012 at 06:59:45 -0600:
> On Fri, Jan 20, 2012 at 12:42 AM, Greg Stein <gs...@gmail.com> wrote:
> >
> > On Jan 19, 2012 11:02 PM, "Daniel Shahaf" <da...@elego.de> wrote:
> >>
> >> Hyrum K Wright wrote on Thu, Jan 19, 2012 at 21:47:51 -0600:
> >> > On Thu, Jan 19, 2012 at 8:40 PM, Daniel Shahaf <da...@elego.de>
> >> > wrote:
> >> > > I do wonder if the "to disk" threshold should be in the public
> >> > > signature, but don't have offhand a use-case justifying that.
> >> >
> >> > We could, but I figured callers who needed finer-grain control over
> >> > the size of the buffer would use the underlying private API which
> >> > gives them that ability.  And as I noted in the code, the choice of
> >> > 100k was completely arbitrary, the result of a lot of squinting and
> >> > hand waving.  If somebody has better guestimates (Stefan F.?) as to
> >> > what would be better, feel free to improve upon it.
> >>
> >> FWIW, my point was that the caller (of this now-public API) may be able
> >> to have a better guesstimate as they'd know the use-case, the number of
> >> concurrent spillbuf objects, the typical length (`wc -c`) of the stream,
> >> etc.
> >
> > Right. We always assume "caller knows best", and we try to document our APIs
> > to give them the needed info.
> >
> > I would recommend exposure, with two DEFAULT constants. I'd be -0 with
> > making them 0 and documenting their remapping to "suitable defaults". (ie.
> > allowing callers to use 0 instead of a 20-char symbol)
> 
> But some callers don't care; those that do already have a way of
> having finer-grained control: the private API.  The blocksize and

You're suggesting a private API as a replacement to a public API.

I'll agree with you, though, if you claim that it's not a problem until
a caller outside of the core libraries has demonstrated a need for this
API. :-)

> maxsize parameters to the spillbuf code are an implementation detail
> of a more generic paradigm, so exposing them through the primary API
> is inappropriate.  It's the "generic buffered stream" abstraction that
> callers should be interested in, and letting us improve the
> implementation as we see fit.  For callers who want or need the
> specific spillbuf semantics: use the spillbuf-specific private API.
> 
> As implemented, using svn_stream_buffered() is no worse than
> unconditionally spooling data to disk (and is in many usecases *much*
> better).  Callers get a free benefit now, so let's not worry about
> giving them the additional control at the expense of additional
> complexity until it's shown they need it.

Re: svn commit: r1233566 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c

Posted by Hyrum K Wright <hy...@wandisco.com>.
On Fri, Jan 20, 2012 at 12:42 AM, Greg Stein <gs...@gmail.com> wrote:
>
> On Jan 19, 2012 11:02 PM, "Daniel Shahaf" <da...@elego.de> wrote:
>>
>> Hyrum K Wright wrote on Thu, Jan 19, 2012 at 21:47:51 -0600:
>> > On Thu, Jan 19, 2012 at 8:40 PM, Daniel Shahaf <da...@elego.de>
>> > wrote:
>> > > I do wonder if the "to disk" threshold should be in the public
>> > > signature, but don't have offhand a use-case justifying that.
>> >
>> > We could, but I figured callers who needed finer-grain control over
>> > the size of the buffer would use the underlying private API which
>> > gives them that ability.  And as I noted in the code, the choice of
>> > 100k was completely arbitrary, the result of a lot of squinting and
>> > hand waving.  If somebody has better guestimates (Stefan F.?) as to
>> > what would be better, feel free to improve upon it.
>>
>> FWIW, my point was that the caller (of this now-public API) may be able
>> to have a better guesstimate as they'd know the use-case, the number of
>> concurrent spillbuf objects, the typical length (`wc -c`) of the stream,
>> etc.
>
> Right. We always assume "caller knows best", and we try to document our APIs
> to give them the needed info.
>
> I would recommend exposure, with two DEFAULT constants. I'd be -0 with
> making them 0 and documenting their remapping to "suitable defaults". (ie.
> allowing callers to use 0 instead of a 20-char symbol)

But some callers don't care; those that do already have a way of
having finer-grained control: the private API.  The blocksize and
maxsize parameters to the spillbuf code are an implementation detail
of a more generic paradigm, so exposing them through the primary API
is inappropriate.  It's the "generic buffered stream" abstraction that
callers should be interested in, and letting us improve the
implementation as we see fit.  For callers who want or need the
specific spillbuf semantics: use the spillbuf-specific private API.

As implemented, using svn_stream_buffered() is no worse than
unconditionally spooling data to disk (and is in many usecases *much*
better).  Callers get a free benefit now, so let's not worry about
giving them the additional control at the expense of additional
complexity until it's shown they need it.

> I'm also a bit concerned with putting these in the public API. I guess our
> prevalence of streams kind of demands this "memory shortcut", but I worry
> about doing this earlier rather than later. Maybe delay one release? (I'm
> confident in the code; unclear on what we want to sign up for long-term)

I was really surprised we didn't already have a stream which
implemented these types of semantics.  We can remove it if you want,
but the abstraction is valid, and one whose usefulness is apparent.  I
think we should leave the no-parameter generic API in.

-Hyrum



-- 

uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com/

Re: svn commit: r1233566 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c

Posted by Greg Stein <gs...@gmail.com>.
On Jan 19, 2012 11:02 PM, "Daniel Shahaf" <da...@elego.de> wrote:
>
> Hyrum K Wright wrote on Thu, Jan 19, 2012 at 21:47:51 -0600:
> > On Thu, Jan 19, 2012 at 8:40 PM, Daniel Shahaf <da...@elego.de>
wrote:
> > > I do wonder if the "to disk" threshold should be in the public
> > > signature, but don't have offhand a use-case justifying that.
> >
> > We could, but I figured callers who needed finer-grain control over
> > the size of the buffer would use the underlying private API which
> > gives them that ability.  And as I noted in the code, the choice of
> > 100k was completely arbitrary, the result of a lot of squinting and
> > hand waving.  If somebody has better guestimates (Stefan F.?) as to
> > what would be better, feel free to improve upon it.
>
> FWIW, my point was that the caller (of this now-public API) may be able
> to have a better guesstimate as they'd know the use-case, the number of
> concurrent spillbuf objects, the typical length (`wc -c`) of the stream,
> etc.

Right. We always assume "caller knows best", and we try to document our
APIs to give them the needed info.

I would recommend exposure, with two DEFAULT constants. I'd be -0 with
making them 0 and documenting their remapping to "suitable defaults". (ie.
allowing callers to use 0 instead of a 20-char symbol)

I'm also a bit concerned with putting these in the public API. I guess our
prevalence of streams kind of demands this "memory shortcut", but I worry
about doing this earlier rather than later. Maybe delay one release? (I'm
confident in the code; unclear on what we want to sign up for long-term)

Cheers,
-g

Re: svn commit: r1233566 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c

Posted by Daniel Shahaf <da...@elego.de>.
Hyrum K Wright wrote on Thu, Jan 19, 2012 at 21:47:51 -0600:
> On Thu, Jan 19, 2012 at 8:40 PM, Daniel Shahaf <da...@elego.de> wrote:
> > I do wonder if the "to disk" threshold should be in the public
> > signature, but don't have offhand a use-case justifying that.
> 
> We could, but I figured callers who needed finer-grain control over
> the size of the buffer would use the underlying private API which
> gives them that ability.  And as I noted in the code, the choice of
> 100k was completely arbitrary, the result of a lot of squinting and
> hand waving.  If somebody has better guestimates (Stefan F.?) as to
> what would be better, feel free to improve upon it.

FWIW, my point was that the caller (of this now-public API) may be able
to have a better guesstimate as they'd know the use-case, the number of
concurrent spillbuf objects, the typical length (`wc -c`) of the stream,
etc.

Re: svn commit: r1233566 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c

Posted by Daniel Shahaf <da...@elego.de>.
Hyrum K Wright wrote on Thu, Jan 19, 2012 at 21:47:51 -0600:
> On Thu, Jan 19, 2012 at 8:40 PM, Daniel Shahaf <da...@elego.de> wrote:
> > I do wonder if the "to disk" threshold should be in the public
> > signature, but don't have offhand a use-case justifying that.
> 
> We could, but I figured callers who needed finer-grain control over
> the size of the buffer would use the underlying private API which
> gives them that ability.  And as I noted in the code, the choice of
> 100k was completely arbitrary, the result of a lot of squinting and
> hand waving.  If somebody has better guestimates (Stefan F.?) as to
> what would be better, feel free to improve upon it.

FWIW, my point was that the caller (of this now-public API) may be able
to have a better guesstimate as they'd know the use-case, the number of
concurrent spillbuf objects, the typical length (`wc -c`) of the stream,
etc.

Re: svn commit: r1233566 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c

Posted by Hyrum K Wright <hy...@wandisco.com>.
On Thu, Jan 19, 2012 at 8:40 PM, Daniel Shahaf <da...@elego.de> wrote:
> hwright@apache.org wrote on Thu, Jan 19, 2012 at 20:56:35 -0000:
>> Author: hwright
>> Date: Thu Jan 19 20:56:35 2012
>> New Revision: 1233566
>>
>> URL: http://svn.apache.org/viewvc?rev=1233566&view=rev
>> Log:
>> Add a public wrapper around our spillbuf-backed stream.
>>
>> * subversion/include/svn_io.h
>>   (svn_stream_buffered): New.
>>
>> * subversion/libsvn_subr/stream.c
>>   (svn_stream_buffered): New.
>
> So, with this change, exported files of up to 100kB in size will be
> processed in-memory, without being written to disk twice.  Nice. :-)

On the ev2-export branch, yes.  I haven't yet made a corresponding
change on trunk, and I don't know of we even can with the existing
editor implementation there.

On the other hand, repos-to-wc copy used a similar paradigm, and
that's been improved on trunk, so it's not a complete loss. :)

> I do wonder if the "to disk" threshold should be in the public
> signature, but don't have offhand a use-case justifying that.

We could, but I figured callers who needed finer-grain control over
the size of the buffer would use the underlying private API which
gives them that ability.  And as I noted in the code, the choice of
100k was completely arbitrary, the result of a lot of squinting and
hand waving.  If somebody has better guestimates (Stefan F.?) as to
what would be better, feel free to improve upon it.

-Hyrum


-- 

uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com/

Re: svn commit: r1233566 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c

Posted by Hyrum K Wright <hy...@wandisco.com>.
On Thu, Jan 19, 2012 at 8:40 PM, Daniel Shahaf <da...@elego.de> wrote:
> hwright@apache.org wrote on Thu, Jan 19, 2012 at 20:56:35 -0000:
>> Author: hwright
>> Date: Thu Jan 19 20:56:35 2012
>> New Revision: 1233566
>>
>> URL: http://svn.apache.org/viewvc?rev=1233566&view=rev
>> Log:
>> Add a public wrapper around our spillbuf-backed stream.
>>
>> * subversion/include/svn_io.h
>>   (svn_stream_buffered): New.
>>
>> * subversion/libsvn_subr/stream.c
>>   (svn_stream_buffered): New.
>
> So, with this change, exported files of up to 100kB in size will be
> processed in-memory, without being written to disk twice.  Nice. :-)

On the ev2-export branch, yes.  I haven't yet made a corresponding
change on trunk, and I don't know of we even can with the existing
editor implementation there.

On the other hand, repos-to-wc copy used a similar paradigm, and
that's been improved on trunk, so it's not a complete loss. :)

> I do wonder if the "to disk" threshold should be in the public
> signature, but don't have offhand a use-case justifying that.

We could, but I figured callers who needed finer-grain control over
the size of the buffer would use the underlying private API which
gives them that ability.  And as I noted in the code, the choice of
100k was completely arbitrary, the result of a lot of squinting and
hand waving.  If somebody has better guestimates (Stefan F.?) as to
what would be better, feel free to improve upon it.

-Hyrum


-- 

uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com/