You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Nik Clayton <ni...@ngo.org.uk> on 2006/11/08 11:57:53 UTC

Bug with SVN::Client::diff() and in-memory files

All,

SVN::Client::diff() is supposed to take (as part of its interface) two file
handles, where the output and errors will be sent, respectively.

If these are two file handles that correspond to files on disk then
everything works as expected.

Perl has the facility to open what are called "in memory" files, where the
file corresponds to a Perl variable.  This is handy way to avoid having to
create temporary files.  However, file handles created in this way don't
work with SVN::Client::diff().

I've attached diff.pl which shows the problem:

Run

   ./diff.pl 0 http://svn.collab.net/repos/svn 2 1

(or use another repo).  You should see the diff printed to STDOUT.

Change that to

   ./diff.pl 1 http://svn.collab.net/repos/svn 2 1

to use in-memory files.  I get (error message wrapped):

   Bad file descriptor: Can't write to stream: Bad file descriptor at
     diff.pl line 40

This seems to be coming from libsvn_subr/io.c:svn_io_file_write(), which in
turn calls apr_file_write()

As far as Perl is concerned there shouldn't be any difference between the
two types of file handle.

Can anyone else reproduce this?

Some other client functions work correctly.  For example, SVN::Client::cat()
works with in memory file handles.  Poking through

   subversion/bindings/swig/perl/native/svn_client.c

it looks as though this might be because ::cat()'s first argument is an
svn_stream_t, while ::diff()'s ninth argument is an apr_file_t.

Is there a reason for this discrepency?

N


Re: Bug with SVN::Client::diff() and in-memory files

Posted by Peter Lundblad <pl...@google.com>.
Malcolm Rowe writes:
> I think I'm agreeing with you: I meant that we can't literally just
> pass streams to be inherited by child processes.  What you said above,
> I think I said in more detail next:
> 
Oh, below you said "hard" which I maybe misunderstood to mean "very hard"
or something:-)  Anyway, we agree this is doable if we want to.

> > > Instead, we'd need to create a temporary file (pipe?) to store the
> > > input or output and then connect filehandles to that, marshalling the
> > > data to or from the svn_stream_t* to the filehandle as necessary.
> > > (We'd also want the ability to detect and extract filehandles from
> > > apr_file_t-wrapping-svn_stream_t's, because I suspect we'd want to
> > > short-circuit all that nonsense in the usual case).
> > > 
> > Why?
> > 
> 
> Why would we want to special-case the file-wrapping-stream case?  Only
> because it seems a little mad to use something that provides a pipe to
> a pipe reader/stream writer that's wrapping a file handle when we could
> instead just provide the file handle directly.
> 
Maybe mad, but not a big problem IMO.  If we think the API change
would be worth it, then I don't think we need to wory about this small
performance penalty.

> > I don't think it would be too hard, actually. It wouldn't be worth the
> > complexity just for this Perl use case, but I think it would improve
> > our APIs in general. In general, we should pass streams of data using
> > svn_stream_t objects in our APIs rather than files.
> > 
> 
> I completely agree; I just meant that it sounded like a lot of work for
> the one specific case.
> 
Agreed.

Best,
//Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Bug with SVN::Client::diff() and in-memory files

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Thu, Nov 09, 2006 at 02:12:10PM +0100, Peter Lundblad wrote:
> > Actually solving the problem would require us to be able to exec()
> > commands with stdin/out/err connected to non-file-handle-backed streams.
> > Of course, that's impossible.
> > 
> No, we don't need that. We could use pipes and forward to our own stream
> objects if we want to.
> 

I think I'm agreeing with you: I meant that we can't literally just
pass streams to be inherited by child processes.  What you said above,
I think I said in more detail next:

> > Instead, we'd need to create a temporary file (pipe?) to store the
> > input or output and then connect filehandles to that, marshalling the
> > data to or from the svn_stream_t* to the filehandle as necessary.
> > (We'd also want the ability to detect and extract filehandles from
> > apr_file_t-wrapping-svn_stream_t's, because I suspect we'd want to
> > short-circuit all that nonsense in the usual case).
> > 
> Why?
> 

Why would we want to special-case the file-wrapping-stream case?  Only
because it seems a little mad to use something that provides a pipe to
a pipe reader/stream writer that's wrapping a file handle when we could
instead just provide the file handle directly.

> I don't think it would be too hard, actually. It wouldn't be worth the
> complexity just for this Perl use case, but I think it would improve
> our APIs in general. In general, we should pass streams of data using
> svn_stream_t objects in our APIs rather than files.
> 

I completely agree; I just meant that it sounded like a lot of work for
the one specific case.

Regards,
Malcolm

Re: Bug with SVN::Client::diff() and in-memory files

Posted by Peter Lundblad <pl...@google.com>.
Malcolm Rowe writes:
> On Wed, Nov 08, 2006 at 11:39:46AM -0500, David Glasser wrote:
> > On 11/8/06, Malcolm Rowe <ma...@farside.org.uk> wrote:
> > 
> So the problem boils down to this:
> 
> - Perl IO memory-backed globs are implemented using the "scalar" PerlIO
>   layer, which doesn't make any use of a filehandle (so PerlIO_fileno()
>   returns -1).
> - svn_swig_pl_make_file() successfully makes an apr_file_t* that refers
>   to a filehandle -1, but then attempting to write anything to that
>   handle fails, obviously.
> 
> The solution is probably just to change svn_swig_pl_make_file() so that
> it returns an svn_error_t*, and make it return a sensible error message
> if it sees a PerlIO fileno of -1, because that indicates a
> non-filehandle-backed PerlIO handle that we aren't going to be able to
> wrap in an apr_file_t.
> 
At least that should be done, yes.


> 
> Actually solving the problem would require us to be able to exec()
> commands with stdin/out/err connected to non-file-handle-backed streams.
> Of course, that's impossible.
> 
No, we don't need that. We could use pipes and forward to our own stream
objects if we want to.

> Instead, we'd need to create a temporary file (pipe?) to store the
> input or output and then connect filehandles to that, marshalling the
> data to or from the svn_stream_t* to the filehandle as necessary.
> (We'd also want the ability to detect and extract filehandles from
> apr_file_t-wrapping-svn_stream_t's, because I suspect we'd want to
> short-circuit all that nonsense in the usual case).
> 
Why?

> All that sounds rather.. hard.
> 

I don't think it would be too hard, actually. It wouldn't be worth the
complexity just for this Perl use case, but I think it would improve
our APIs in general. In general, we should pass streams of data using
svn_stream_t objects in our APIs rather than files.

I am not going to work on it, so anyone, feel free if inclined.

Thanks,
//Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Bug with SVN::Client::diff() and in-memory files

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Wed, Nov 08, 2006 at 11:39:46AM -0500, David Glasser wrote:
> On 11/8/06, Malcolm Rowe <ma...@farside.org.uk> wrote:
> 
> >Just to clarify: we're talking solely about the filehandles passed as
> >the output and error 'streams' for diff, not anything related to the
> >input files (which yes, need to be mmapable).
> 
> An attempt to bump svn_client_diff to use svn_streams turns into a
> need to do the same to svn_io_run_diff, which then implies
> svn_io_run_cmd, which then implies svn_io_start_cmd, which does
> certainly appear to start doing file-specific things.  That's probably
> the place to look.
> 

So the problem boils down to this:

- Perl IO memory-backed globs are implemented using the "scalar" PerlIO
  layer, which doesn't make any use of a filehandle (so PerlIO_fileno()
  returns -1).
- svn_swig_pl_make_file() successfully makes an apr_file_t* that refers
  to a filehandle -1, but then attempting to write anything to that
  handle fails, obviously.

The solution is probably just to change svn_swig_pl_make_file() so that
it returns an svn_error_t*, and make it return a sensible error message
if it sees a PerlIO fileno of -1, because that indicates a
non-filehandle-backed PerlIO handle that we aren't going to be able to
wrap in an apr_file_t.


Actually solving the problem would require us to be able to exec()
commands with stdin/out/err connected to non-file-handle-backed streams.
Of course, that's impossible.

Instead, we'd need to create a temporary file (pipe?) to store the
input or output and then connect filehandles to that, marshalling the
data to or from the svn_stream_t* to the filehandle as necessary.
(We'd also want the ability to detect and extract filehandles from
apr_file_t-wrapping-svn_stream_t's, because I suspect we'd want to
short-circuit all that nonsense in the usual case).

All that sounds rather.. hard.

Regards,
Malcolm

Re: Bug with SVN::Client::diff() and in-memory files

Posted by David Glasser <gl...@mit.edu>.
On 11/8/06, Malcolm Rowe <ma...@farside.org.uk> wrote:

> Just to clarify: we're talking solely about the filehandles passed as
> the output and error 'streams' for diff, not anything related to the
> input files (which yes, need to be mmapable).

An attempt to bump svn_client_diff to use svn_streams turns into a
need to do the same to svn_io_run_diff, which then implies
svn_io_run_cmd, which then implies svn_io_start_cmd, which does
certainly appear to start doing file-specific things.  That's probably
the place to look.

--dave

-- 
David Glasser | glasser@mit.edu | http://www.davidglasser.net/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Bug with SVN::Client::diff() and in-memory files

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Wed, Nov 08, 2006 at 04:23:58PM +0100, Erik Huelsmann wrote:
> >> >SVN::Client::diff() is supposed to take (as part of its interface) two 
> >file
> >> >handles, where the output and errors will be sent, respectively.
> >> >
> >> >If these are two file handles that correspond to files on disk then
> >> >everything works as expected.
> >> >
> >>
> >> This is not all that surprising.  The diff functions don't take
> >> streams, they work in terms of apr_file_t, so they really do expect
> >> there to be an actual file on disk.  The fix is to rewrite them to
> >> work in terms of streams (not trivial, IIRC), or work around it in
> >> either the bindings or in your application by writing that in-memory
> >> data to disk and passing in handles to those files to diff.
> >>
> >
> >I doubt that the diff code really needs the file to exist on disk,
> >though it probably does need it to be a seekable filehandle.
> 
> As the diff code shares a lot of code with merge, that code *does*
> require a disk file, since it tries to mmap the files. I have never
> actually looked through the code thoroughly enough to determine
> necessity for merge, but in the presence of mmap, this requirement
> can't be lifted now.
> 

Just to clarify: we're talking solely about the filehandles passed as
the output and error 'streams' for diff, not anything related to the
input files (which yes, need to be mmapable).

I doubt we need to mmap those filehandles :-)

Regards,
Malcolm

Re: Bug with SVN::Client::diff() and in-memory files

Posted by Erik Huelsmann <eh...@gmail.com>.
On 11/8/06, Malcolm Rowe <ma...@farside.org.uk> wrote:
> On Wed, Nov 08, 2006 at 09:05:28AM -0500, Garrett Rooney wrote:
> > On 11/8/06, Nik Clayton <ni...@ngo.org.uk> wrote:
> > >SVN::Client::diff() is supposed to take (as part of its interface) two file
> > >handles, where the output and errors will be sent, respectively.
> > >
> > >If these are two file handles that correspond to files on disk then
> > >everything works as expected.
> > >
> >
> > This is not all that surprising.  The diff functions don't take
> > streams, they work in terms of apr_file_t, so they really do expect
> > there to be an actual file on disk.  The fix is to rewrite them to
> > work in terms of streams (not trivial, IIRC), or work around it in
> > either the bindings or in your application by writing that in-memory
> > data to disk and passing in handles to those files to diff.
> >
>
> I doubt that the diff code really needs the file to exist on disk,
> though it probably does need it to be a seekable filehandle.

As the diff code shares a lot of code with merge, that code *does*
require a disk file, since it tries to mmap the files. I have never
actually looked through the code thoroughly enough to determine
necessity for merge, but in the presence of mmap, this requirement
can't be lifted now.

bye,

Erik.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Bug with SVN::Client::diff() and in-memory files

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Wed, Nov 08, 2006 at 09:05:28AM -0500, Garrett Rooney wrote:
> On 11/8/06, Nik Clayton <ni...@ngo.org.uk> wrote:
> >SVN::Client::diff() is supposed to take (as part of its interface) two file
> >handles, where the output and errors will be sent, respectively.
> >
> >If these are two file handles that correspond to files on disk then
> >everything works as expected.
> >
> 
> This is not all that surprising.  The diff functions don't take
> streams, they work in terms of apr_file_t, so they really do expect
> there to be an actual file on disk.  The fix is to rewrite them to
> work in terms of streams (not trivial, IIRC), or work around it in
> either the bindings or in your application by writing that in-memory
> data to disk and passing in handles to those files to diff.
> 

I doubt that the diff code really needs the file to exist on disk,
though it probably does need it to be a seekable filehandle.

The code that converts a Perl file glob to an apr_file_t* is in
svn_swig_pl_make_file(), and just grabs the OS filehandle from the Perl
glob and apr_os_file_put()'s it into a new apr_file_t structure, with
roughly ('file' is the Perl reference-to-glob):

        apr_os_file_t osfile = PerlIO_fileno(IoIFP(sv_2io(file)));
        status = apr_os_file_put(&apr_file, &osfile, 
                                 O_CREAT | O_WRONLY, pool);

It's possible that Perl's in-memory files don't have a filehandle, and
so osfile is being set to a bogus value.  If so, we should probably flag
an early error at this point rather than waiting for the write to fail
later.  (Or, if we can get an OS filehandle for the glob using some
other mechanism, do that).

Regards,
Malcolm

Re: Bug with SVN::Client::diff() and in-memory files

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 11/8/06, Nik Clayton <ni...@ngo.org.uk> wrote:
> All,
>
> SVN::Client::diff() is supposed to take (as part of its interface) two file
> handles, where the output and errors will be sent, respectively.
>
> If these are two file handles that correspond to files on disk then
> everything works as expected.
>
> Perl has the facility to open what are called "in memory" files, where the
> file corresponds to a Perl variable.  This is handy way to avoid having to
> create temporary files.  However, file handles created in this way don't
> work with SVN::Client::diff().
>
> I've attached diff.pl which shows the problem:
>
> Run
>
>    ./diff.pl 0 http://svn.collab.net/repos/svn 2 1
>
> (or use another repo).  You should see the diff printed to STDOUT.
>
> Change that to
>
>    ./diff.pl 1 http://svn.collab.net/repos/svn 2 1
>
> to use in-memory files.  I get (error message wrapped):
>
>    Bad file descriptor: Can't write to stream: Bad file descriptor at
>      diff.pl line 40
>
> This seems to be coming from libsvn_subr/io.c:svn_io_file_write(), which in
> turn calls apr_file_write()
>
> As far as Perl is concerned there shouldn't be any difference between the
> two types of file handle.
>
> Can anyone else reproduce this?

This is not all that surprising.  The diff functions don't take
streams, they work in terms of apr_file_t, so they really do expect
there to be an actual file on disk.  The fix is to rewrite them to
work in terms of streams (not trivial, IIRC), or work around it in
either the bindings or in your application by writing that in-memory
data to disk and passing in handles to those files to diff.

-garrett

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org