You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by "Hyrum K. Wright" <hy...@mail.utexas.edu> on 2010/04/28 15:05:58 UTC

patch API review

I'm looking at adding support for the patch client API to JavaHL, but in
doing so, I have a few questions about the API itself.  I'm pretty late the
game, so these questions might already be answered.  If so, please point me
to the relevant thread(s).

For reference, here's the public API:

svn_error_t *
svn_client_patch(const char *abs_patch_path,
                 const char *local_abspath,
                 svn_boolean_t dry_run,
                 int strip_count,
                 svn_boolean_t reverse,
                 const apr_array_header_t *include_patterns,
                 const apr_array_header_t *exclude_patterns,
                 apr_hash_t **patched_tempfiles,
                 apr_hash_t **reject_tempfiles,
                 svn_boolean_t ignore_whitespaces,
                 svn_client_ctx_t *ctx,
                 apr_pool_t *result_pool,
                 apr_pool_t *scratch_pool);

Firstly, the first parameter should probably be 'patch_abspath', which
follows convention with other variable names.  That being said, is there a
reason why the patch needs to be a file, instead of a stream?  Using a
stream seems much more flexible for the long-term evolution of the API.

Despite my best efforts, I still haven't been able to fully grok the
patched_tempfiles and reject_tempfiles parameters.  However, if they are
truly output parameters, they should be at the front of the parameter list,
not hidden in the middle somewhere.  Also, I know that the API uses
notifications, but it might also be useful to return the _tempfile
parameters through a callback mechanism, rather than as (arbitrarily large)
hashes.

Anyway, those are my thoughts.  I'd go ahead and make these changes myself,
but as I mentioned before, I'm not sure if there are Deep reasons for the
current API.

Thanks,
-Hyrum

Re: patch API review

Posted by Stefan Küng <to...@gmail.com>.
On 04.05.2010 11:46, Hyrum K. Wright wrote:
> On Wed, Apr 28, 2010 at 9:20 PM, Hyrum K. Wright<
> hyrum_wright@mail.utexas.edu>  wrote:
>
>>
>>
>> On Wed, Apr 28, 2010 at 1:43 PM, Stefan Sperling<st...@elego.de>  wrote:
>>
>>> ...
>>>
>>> Also, I know that the API uses
>>>> notifications, but it might also be useful to return the _tempfile
>>>> parameters through a callback mechanism, rather than as (arbitrarily
>>> large)
>>>> hashes.
>>>
>>> That would work as well, yes. Might look nicer than having the output
>>> parameters at the front.
>>>
>>
>> I'll play around with this idea.
>>
>
> r940786 changes the API to use a patch callback function.  Review welcome.

Looks good to me.
Haven't tested it though. But the API is usable from an UI client just fine.

Stefan


-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.net

Re: patch API review

Posted by "Hyrum K. Wright" <hy...@mail.utexas.edu>.
On Wed, Apr 28, 2010 at 9:20 PM, Hyrum K. Wright <
hyrum_wright@mail.utexas.edu> wrote:

>
>
> On Wed, Apr 28, 2010 at 1:43 PM, Stefan Sperling <st...@elego.de> wrote:
>
>> ...
>>
> > Also, I know that the API uses
>> > notifications, but it might also be useful to return the _tempfile
>> > parameters through a callback mechanism, rather than as (arbitrarily
>> large)
>> > hashes.
>>
>> That would work as well, yes. Might look nicer than having the output
>> parameters at the front.
>>
>
> I'll play around with this idea.
>

r940786 changes the API to use a patch callback function.  Review welcome.

-Hyrum

Re: patch API review

Posted by "Hyrum K. Wright" <hy...@mail.utexas.edu>.
On Wed, Apr 28, 2010 at 1:43 PM, Stefan Sperling <st...@elego.de> wrote:

> On Wed, Apr 28, 2010 at 10:05:58AM -0500, Hyrum K. Wright wrote:
> > I'm looking at adding support for the patch client API to JavaHL, but in
> > doing so, I have a few questions about the API itself.  I'm pretty late
> the
> > game, so these questions might already be answered.  If so, please point
> me
> > to the relevant thread(s).
> >
> > For reference, here's the public API:
> >
> > svn_error_t *
> > svn_client_patch(const char *abs_patch_path,
> >                  const char *local_abspath,
> >                  svn_boolean_t dry_run,
> >                  int strip_count,
> >                  svn_boolean_t reverse,
> >                  const apr_array_header_t *include_patterns,
> >                  const apr_array_header_t *exclude_patterns,
> >                  apr_hash_t **patched_tempfiles,
> >                  apr_hash_t **reject_tempfiles,
> >                  svn_boolean_t ignore_whitespaces,
> >                  svn_client_ctx_t *ctx,
> >                  apr_pool_t *result_pool,
> >                  apr_pool_t *scratch_pool);
> >
> > Firstly, the first parameter should probably be 'patch_abspath', which
> > follows convention with other variable names.
>
> Sure.
>
> > That being said, is there a
> > reason why the patch needs to be a file, instead of a stream?  Using a
> > stream seems much more flexible for the long-term evolution of the API.
>
> Be prepared. Such a change will probably go deep into the diff parser.
>
> The current svn patch and diff parsing code is optimized around never
> allocating more memory than needed to process a single line.
> Otherwise svn patch would run out of memory quickly for large files.
> We currently support lines as long as memory can hold, rather than
> limiting the size of the entire file by memory.
>
> Furthermore the diff parsing code provides the patch application code
> with streams mapped to portions of the patch file via the
> svn_stream_from_aprfile_range_readonly() API.
> We could now do something similar via mark / seek support in streams.
> When I wrote the diff parser we did not yet have mark / seek support
> in streams yet.
>
> The mark / seek support is currently only available for streams backed
> by APR files or stringbufs. IOW, making the patch file a stream will
> not provide a lot of abstraction -- you'll either use a file anyway and
> wrap a stream around it, or load the entire patch into memory and hog
> tons of RAM for large patches (which is exactly what we want to avoid).
>
> While I would not object to making the patch file a stream if it doesn't
> hurt memory usage, I don't think it's worth re-implementing a lot of
> the svn patch and diff parsing code just to provide callers with the
> option of passing a stream. Why would they want to use a stream?
> I have no problem with requiring callers of this API to create a tempfile
> if they get the patch data from something other than a file. If callers
> pass a stream we have no idea how much memory we'll end up using.
> Granted, we could say that's the caller's problem, but even then I still
> don't see enough benefit to justify the effort involved in changing
> this so I won't do it myself.
>

This sounds like a very valid reason, so I won't touch anything here.


> > Despite my best efforts, I still haven't been able to fully grok the
> > patched_tempfiles and reject_tempfiles parameters.  However, if they are
> > truly output parameters, they should be at the front of the parameter
> list,
> > not hidden in the middle somewhere.
>
> They are output parameters, so yes, we might want to move them.
> They're primarily for use by TortoiseSVN and similar 3rd party clients,
> not really a primary feature of the API. Hence it didn't look right to
> me to put them at the front. But I don't mind moving them.
>
> > Also, I know that the API uses
> > notifications, but it might also be useful to return the _tempfile
> > parameters through a callback mechanism, rather than as (arbitrarily
> large)
> > hashes.
>
> That would work as well, yes. Might look nicer than having the output
> parameters at the front.
>

I'll play around with this idea.


> > Anyway, those are my thoughts.  I'd go ahead and make these changes
> myself,
> > but as I mentioned before, I'm not sure if there are Deep reasons for the
> > current API.
>
> Feel free to go ahead and make these changes. If you change the patch
> file to a stream, please try not to make 'svn patch' use more memory
> than it currently does


As noted, I won't change the file to a stream.  That's out of my league. :)

-Hyrum

Re: patch API review

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Apr 28, 2010 at 10:05:58AM -0500, Hyrum K. Wright wrote:
> I'm looking at adding support for the patch client API to JavaHL, but in
> doing so, I have a few questions about the API itself.  I'm pretty late the
> game, so these questions might already be answered.  If so, please point me
> to the relevant thread(s).
> 
> For reference, here's the public API:
> 
> svn_error_t *
> svn_client_patch(const char *abs_patch_path,
>                  const char *local_abspath,
>                  svn_boolean_t dry_run,
>                  int strip_count,
>                  svn_boolean_t reverse,
>                  const apr_array_header_t *include_patterns,
>                  const apr_array_header_t *exclude_patterns,
>                  apr_hash_t **patched_tempfiles,
>                  apr_hash_t **reject_tempfiles,
>                  svn_boolean_t ignore_whitespaces,
>                  svn_client_ctx_t *ctx,
>                  apr_pool_t *result_pool,
>                  apr_pool_t *scratch_pool);
> 
> Firstly, the first parameter should probably be 'patch_abspath', which
> follows convention with other variable names.

Sure.

> That being said, is there a
> reason why the patch needs to be a file, instead of a stream?  Using a
> stream seems much more flexible for the long-term evolution of the API.
 
Be prepared. Such a change will probably go deep into the diff parser.

The current svn patch and diff parsing code is optimized around never
allocating more memory than needed to process a single line.
Otherwise svn patch would run out of memory quickly for large files.
We currently support lines as long as memory can hold, rather than
limiting the size of the entire file by memory.

Furthermore the diff parsing code provides the patch application code 
with streams mapped to portions of the patch file via the
svn_stream_from_aprfile_range_readonly() API.
We could now do something similar via mark / seek support in streams.
When I wrote the diff parser we did not yet have mark / seek support
in streams yet.

The mark / seek support is currently only available for streams backed
by APR files or stringbufs. IOW, making the patch file a stream will
not provide a lot of abstraction -- you'll either use a file anyway and
wrap a stream around it, or load the entire patch into memory and hog
tons of RAM for large patches (which is exactly what we want to avoid).

While I would not object to making the patch file a stream if it doesn't
hurt memory usage, I don't think it's worth re-implementing a lot of
the svn patch and diff parsing code just to provide callers with the
option of passing a stream. Why would they want to use a stream?
I have no problem with requiring callers of this API to create a tempfile
if they get the patch data from something other than a file. If callers
pass a stream we have no idea how much memory we'll end up using.
Granted, we could say that's the caller's problem, but even then I still
don't see enough benefit to justify the effort involved in changing
this so I won't do it myself.

> Despite my best efforts, I still haven't been able to fully grok the
> patched_tempfiles and reject_tempfiles parameters.  However, if they are
> truly output parameters, they should be at the front of the parameter list,
> not hidden in the middle somewhere.

They are output parameters, so yes, we might want to move them.
They're primarily for use by TortoiseSVN and similar 3rd party clients,
not really a primary feature of the API. Hence it didn't look right to
me to put them at the front. But I don't mind moving them.

> Also, I know that the API uses
> notifications, but it might also be useful to return the _tempfile
> parameters through a callback mechanism, rather than as (arbitrarily large)
> hashes.

That would work as well, yes. Might look nicer than having the output
parameters at the front.

> Anyway, those are my thoughts.  I'd go ahead and make these changes myself,
> but as I mentioned before, I'm not sure if there are Deep reasons for the
> current API.

Feel free to go ahead and make these changes. If you change the patch
file to a stream, please try not to make 'svn patch' use more memory
than it currently does.

Stefan