You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Hyrum K Wright <hy...@wandisco.com> on 2011/09/13 20:15:48 UTC

[PATCH] Make client diff output streamy

In looking over the client diff APIs, I noticed the output parameters
are file handles, not streams.  This feels...odd to me, so I hacked
together the attached patch which updates the APIs to use output
streams.  It isn't ready to commit just yet, but before doing the
backward compat dance (rev'ing an API before it's even released, ugh),
I thought I'd post it here for comments or concerns first.

The one gotcha is that running an external diff command still requires
files, so this patch creates temporary ones and then copies them to
the stream.  Right now, this is a limitation of our own APIs, but
fundamentally it's because APR wants an apr_file_t to plug into the
stdout and stderr of processes it launches.  Ideally, we'd find some
way of mimicking an apr_file_t with a stream, but that feels like
Future Work.

Incidentally, one motivation for having streams instead of files is
compressed pristines.  At some point, I'd like to use streams as
*input* to things like diff and merge, which is a prerequisite for
storing stuff in compressed form and uncompressing it via a stream.
(If that doesn't make much sense, it's 'cause I'm still working out
the details in my head.)

-Hyrum

-- 

uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com/

Re: [PATCH] Make client diff output streamy

Posted by Hyrum K Wright <hy...@wandisco.com>.
I didn't hear any objections, so I committed the patch (along with
rev'ing the API) in r1171713.

-Hyrum

On Tue, Sep 13, 2011 at 1:15 PM, Hyrum K Wright
<hy...@wandisco.com> wrote:
> In looking over the client diff APIs, I noticed the output parameters
> are file handles, not streams.  This feels...odd to me, so I hacked
> together the attached patch which updates the APIs to use output
> streams.  It isn't ready to commit just yet, but before doing the
> backward compat dance (rev'ing an API before it's even released, ugh),
> I thought I'd post it here for comments or concerns first.
>
> The one gotcha is that running an external diff command still requires
> files, so this patch creates temporary ones and then copies them to
> the stream.  Right now, this is a limitation of our own APIs, but
> fundamentally it's because APR wants an apr_file_t to plug into the
> stdout and stderr of processes it launches.  Ideally, we'd find some
> way of mimicking an apr_file_t with a stream, but that feels like
> Future Work.
>
> Incidentally, one motivation for having streams instead of files is
> compressed pristines.  At some point, I'd like to use streams as
> *input* to things like diff and merge, which is a prerequisite for
> storing stuff in compressed form and uncompressing it via a stream.
> (If that doesn't make much sense, it's 'cause I'm still working out
> the details in my head.)
>
> -Hyrum
>
> --
>
> uberSVN: Apache Subversion Made Easy
> http://www.uberSVN.com/
>



-- 

uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com/

Re: [PATCH] Make client diff output streamy

Posted by Hyrum K Wright <hy...@wandisco.com>.
On Wed, Sep 14, 2011 at 4:21 AM, Julian Foad <ju...@wandisco.com> wrote:
> On Tue, 2011-09-13, Hyrum K Wright wrote:
>> In looking over the client diff APIs, I noticed the output parameters
>> are file handles, not streams.  This feels...odd to me, so I hacked
>> together the attached patch which updates the APIs to use output
>> streams.  It isn't ready to commit just yet, but before doing the
>> backward compat dance (rev'ing an API before it's even released, ugh),
>> I thought I'd post it here for comments or concerns first.
>
> Cool.  +1 (concept).  Haven't read the patch.
>
>
>> The one gotcha is that running an external diff command still requires
>> files, so this patch creates temporary ones and then copies them to
>> the stream.  Right now, this is a limitation of our own APIs, but
>> fundamentally it's because APR wants an apr_file_t to plug into the
>> stdout and stderr of processes it launches.
>
> Side note:  This (stdout -> apr_file_t -> svn_stream_t) is in contrast
> with how to achieve streamy *input* to an external diff.  Most external
> diff programs take their input from files on disk (not through stdin or
> an open file descriptor), and thus we require not just an apr_file_t
> object but an actual path on disk.  See idea below.
>
>>   Ideally, we'd find some
>> way of mimicking an apr_file_t with a stream, but that feels like
>> Future Work.
>
> Right.
>
>> Incidentally, one motivation for having streams instead of files is
>> compressed pristines.  At some point, I'd like to use streams as
>> *input* to things like diff and merge, which is a prerequisite for
>> storing stuff in compressed form and uncompressing it via a stream.
>> (If that doesn't make much sense, it's 'cause I'm still working out
>> the details in my head.)
>
> For converting efficiently from a readable stream to a readable file,
> for passing to external diff programs such as "kdiff3", I envisage a
> stream API that either creates a temp file and copies the stream content
> into it, or, if it's just a simple stream reading from a file, returns
> the path of the existing file, using some private knowledge inside the
> streams layer to detect that case.  Pseudo-code like this:
>
>  svn_stream_t stream1 = pristine_read(checksum)
>    # Read pristine text (from a plain file, a compressed file,
>    # or, for a higher level API, perhaps from an RA connection?)
>
>  const char *in_path1 = svn_stream_get_as_readable_file(stream1)
>    # If stream1 is reading directly from an existing plain file,
>    # get that file's path, else copy the stream to a temp file
>    # and return its path.
>
>  system("kdiff3", in_path1, in_path2, "-o", out_path)
>    # Run the external command that requires its input as a file
>
>  svn_stream_close(stream1)
>    # Delete the file iff it was a temp file

An intriguing idea, but you'd have to know more about the underlying
stream than our current stream API currently exposes.

>> Index: subversion/include/svn_client.h
>> ===================================================================
>> --- subversion/include/svn_client.h     (revision 1170202)
>> +++ subversion/include/svn_client.h     (working copy)
>> @@ -2831,8 +2831,8 @@ svn_client_diff5(const apr_array_header_t
>> *diff_op
>>                   svn_boolean_t ignore_content_type,
>>                   svn_boolean_t use_git_diff_format,
>>                   const char *header_encoding,
>> -                 apr_file_t *outfile,
>> -                 apr_file_t *errfile,
>> +                 svn_stream_t *outstream,
>> +                 svn_stream_t *errstream,
>>                   const apr_array_header_t *changelists,
>>                   svn_client_ctx_t *ctx,
>>                   apr_pool_t *pool);
>
> Makes me wonder why we have errfile/errstream.  This is a diff API, not
> a general-purpose external program runner.

Yeah, I was wondering about that, too.  Something like
svn_io_run_diff2() can return an error stream/file, but a client-level
API should somehow wrap that in an svn_error_t, me thinks.  Though,
I've no experience nor knowledge of the typical use case for external
diff tools.  Perhaps they do provide meaningful feedback through
stderr?

-Hyrum


-- 

uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com/

Re: [PATCH] Make client diff output streamy

Posted by Julian Foad <ju...@wandisco.com>.
On Tue, 2011-09-13, Hyrum K Wright wrote:
> In looking over the client diff APIs, I noticed the output parameters
> are file handles, not streams.  This feels...odd to me, so I hacked
> together the attached patch which updates the APIs to use output
> streams.  It isn't ready to commit just yet, but before doing the
> backward compat dance (rev'ing an API before it's even released, ugh),
> I thought I'd post it here for comments or concerns first.

Cool.  +1 (concept).  Haven't read the patch.


> The one gotcha is that running an external diff command still requires
> files, so this patch creates temporary ones and then copies them to
> the stream.  Right now, this is a limitation of our own APIs, but
> fundamentally it's because APR wants an apr_file_t to plug into the
> stdout and stderr of processes it launches.

Side note:  This (stdout -> apr_file_t -> svn_stream_t) is in contrast
with how to achieve streamy *input* to an external diff.  Most external
diff programs take their input from files on disk (not through stdin or
an open file descriptor), and thus we require not just an apr_file_t
object but an actual path on disk.  See idea below.

>   Ideally, we'd find some
> way of mimicking an apr_file_t with a stream, but that feels like
> Future Work.

Right.

> Incidentally, one motivation for having streams instead of files is
> compressed pristines.  At some point, I'd like to use streams as
> *input* to things like diff and merge, which is a prerequisite for
> storing stuff in compressed form and uncompressing it via a stream.
> (If that doesn't make much sense, it's 'cause I'm still working out
> the details in my head.)

For converting efficiently from a readable stream to a readable file,
for passing to external diff programs such as "kdiff3", I envisage a
stream API that either creates a temp file and copies the stream content
into it, or, if it's just a simple stream reading from a file, returns
the path of the existing file, using some private knowledge inside the
streams layer to detect that case.  Pseudo-code like this:

  svn_stream_t stream1 = pristine_read(checksum)
    # Read pristine text (from a plain file, a compressed file,
    # or, for a higher level API, perhaps from an RA connection?)

  const char *in_path1 = svn_stream_get_as_readable_file(stream1)
    # If stream1 is reading directly from an existing plain file,
    # get that file's path, else copy the stream to a temp file
    # and return its path.

  system("kdiff3", in_path1, in_path2, "-o", out_path)
    # Run the external command that requires its input as a file

  svn_stream_close(stream1)
    # Delete the file iff it was a temp file


> Index: subversion/include/svn_client.h
> ===================================================================
> --- subversion/include/svn_client.h     (revision 1170202)
> +++ subversion/include/svn_client.h     (working copy)
> @@ -2831,8 +2831,8 @@ svn_client_diff5(const apr_array_header_t
> *diff_op
>                   svn_boolean_t ignore_content_type,
>                   svn_boolean_t use_git_diff_format,
>                   const char *header_encoding,
> -                 apr_file_t *outfile,
> -                 apr_file_t *errfile,
> +                 svn_stream_t *outstream,
> +                 svn_stream_t *errstream,
>                   const apr_array_header_t *changelists,
>                   svn_client_ctx_t *ctx,
>                   apr_pool_t *pool);

Makes me wonder why we have errfile/errstream.  This is a diff API, not
a general-purpose external program runner.

- Julian