You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Ben Reser <be...@reser.org> on 2004/01/12 10:10:23 UTC

apr_off_t is of an ambiguous size.

I started out with a perl bindings bug and found a nasty bug in the our
interface.  John Peacock had reported trouble running the test suite for
my pending perl bindings patch.  It would segfault on him.

We tracked it down to the blame receiver callback.  The C thunk seemed
to be getting its parameters wrong.  Yet I checked and the parameters
were correct coming from the client lib.  

What I eventually discovered was a size mismatch for the apr_off_t type
between the bindings and the client lib.  The apr_off_t type is used for
the line_no parameter to the blame receiver.  

apr_off_t is just a typedef to off_t.  Which on Linux is defined in
sys/types.h as:
# ifndef __USE_FILE_OFFSET64
typedef __off_t off_t;
# else
typedef __off64_t off_t;
# endif

The problem is pretty clear.  Some compliations might have be a 64-bit
int and some might be a 32-bit int.  In my case perl bindings are
picking up the 64-bit off_t form perl via the SWIG_PL_INCLUDES variable.
libsvn_client sees it as 32-bit, the bindings see it as 64-bit.  This is
a problem.

Looking at the headers we're using apr_off_t in svn_diff, svn_io and
svn_client (for the blame receiver).  By only have a 32-bit apr_off_t
we're crippling ourselves in the long run for large file support and
making interoperating with the growing number of libraries that do have
it difficult.

Unfortunately, APR doesn't turn it on.  Nor does apr provide us with an
apr_off64_t.  I would guess it doesn't try to turn it on because it
can't be reliably done on all platforms.

We can't just ignore the issue because we have an ambiguous data type in
our interface.  Even if APR adds support apr_off_t will still probably
be ambiguous.  Only a new type of apr_off32_t and apr_off64_t would be
unambiguous.  So if/when APR fixes this we'd be left with an API change.

We can turn on the 64-bit off_t in our compilation but then we won't
necessarily match the APR lib we build against.  This shouldn't be a
huge problem, unless we run into a large file.  Then APR wouldn't be
able to handle it and we might get bizzare behavior, which I don't think
is good.

So I guess my suggestion at this point is to do the following:

a) For exported interfaces do not use apr_off_t.  Use apr_int64_t.

b) At build time check to see if APR is using a 32-bit off_t or a 64-bit
off_t (it could be 64-bit on some platforms).  We can check this by
looking at the APR_OFF_T_FMT define.  If it's 32-bit enable bounds
checking so we can throw an error if the value would surpass the bounds
of what APR can handle.  I don't think there'd be very many places we'd
have to check for this.  Only right before we passed one of these vars
to an APR function.  (Which as far as I can tell is only apr_file_seek
right now).

c) Don't use apr_off_t for the line_no parameter of the blame receiver.
I figure either unsigned long or apr_uint32_t.  The former gets the
advantage of scaling with 64-bit archs.  Not that I think anyone is ever
going to want to run blame on a file with more than 4 billion lines.

Since we're mostly only using apr_off_t as an output variable as opposed
to an input variable this ought to be pretty safe.  The one exception is
svn_io_file_seek.  Which would require the bounds checking to match APR.
However, we might be able to ignore the bounds checking and just drop
apr_io_file_seek, I don't see it being used anywhere.  

If nobody has any better ideas, I'll start coding up a patch to
implement this.  I strongly think if we're going to do this we should do
it before 1.0.

Thoughts?

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Mon, Jan 12, 2004 at 07:49:18PM -0500, mark benedetto king wrote:
> It is a line number, but it is also implicitly the same datatype as the
> results of svn_diff_file_diff(), which uses apr_off_t.
> 
> What are we going to do about apr_off_t in public interfaces in general,
> rather than in this specific case?

See this email for what seems to be the consensus:
http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=54363

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by mark benedetto king <mb...@lowlatency.com>.
On Mon, Jan 12, 2004 at 01:45:06PM -0800, Greg Stein wrote:
> 
> I'd go with apr_uint64_t rather than svn_filesize_t. This is a line number
> rather than an offset/size of a file. Yes, the two are proportional to
> some degree, but I'd agree with ghudson about the "mental judo" if we used
> svn_filesize_t for that param.
> 

It is a line number, but it is also implicitly the same datatype as the
results of svn_diff_file_diff(), which uses apr_off_t.

What are we going to do about apr_off_t in public interfaces in general,
rather than in this specific case?

--ben


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Chia-Liang Kao <cl...@clkao.org>.
SVN::Ra is currently using io_open_unique_file, mainly because the existing
typemap doesn't allow perl filehandles to be wrapped as apr_file_t to be
returned by racallback->open_tmp_file. I think the io_* removal should be
couple with the the apr_file_t out typemap fixes.

Cheers,
CLK

On Mon, Jan 12, 2004 at 02:01:02PM -0800, Ben Reser wrote:
> > I would seriously recommend dropping all svn_io_* functions from the Perl
> > bindings. In general, Perl itself has its own APIs to do that stuff (and
> > it does it in a Perl-standard fashion and is doc'd well and has precedent
> > and ...). Note that we already ignore a bunch of svn_io_ functions (see
> > core.i). I suspect there are a lot more svn_io_* functions that have been
> > added recently which need to be ignored, too.
> 
> Agreed, I didn't notice until last night that some of them were getting
> wrapped.  I just need to get around to removing them.  I've made a note
> of it on my mental todo list.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Tue, Jan 13, 2004 at 04:31:19AM -0500, Russell Yanofsky wrote:
> I agree that svn_filesize_t should be used instead of apr_off_t in the diff
> and blame apis.
> 
> I the functions that wrap around apr calls for error handling would be
> better off using APR types. But if you're only going to change
> svn_io_file_seek, then I guess it's not really a big deal. That's as a good
> a place as any to do bounds checking on platforms where APR doesn't do 64
> bit i/o.

We're not even wrapping svn_io_file_seek.  As far as I'm concerned it
could stay as apr_off_t but then we're just pushing the typecasting up
one level.  

As far as I can see the only interface we export that takes apr_off_t
and passes it on to an actual APR function is svn_io_file_seek.  So it
would be the only place where we'd need bounds checking if we moved to
an always 64-bit int for our offsets.

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Russell Yanofsky <re...@columbia.edu>.
Ben Reser wrote:
> Okay sounds like there's some consensus on the best way to fix this...
> I wasn't aware of the svn_filesize_t discussion until ghudson pointed
> it out to me.
>
>> I'd go with apr_uint64_t rather than svn_filesize_t. This is a line
>> number rather than an offset/size of a file. Yes, the two are
>> proportional to some degree, but I'd agree with ghudson about the
>> "mental judo" if we used svn_filesize_t for that param.
>
> So apr_uint64_t for blame's line_no, svn_filesize_t for the uses of
> svn_off_t in libsvn_diff, svn_filesize_t for the svn_io_file_seek with
> appropriate bounds checking for making sure we don't pass a 64-bit
> number to a 32-bit off_t.

I agree that svn_filesize_t should be used instead of apr_off_t in the diff
and blame apis.

I the functions that wrap around apr calls for error handling would be
better off using APR types. But if you're only going to change
svn_io_file_seek, then I guess it's not really a big deal. That's as a good
a place as any to do bounds checking on platforms where APR doesn't do 64
bit i/o.

- Russ



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Russell Yanofsky <re...@columbia.edu>.
Ben Reser wrote:
>   const apr_off_t min = ((apr_off_t) sizeof (apr_off_t)-1) <<
>                         ((sizeof (apr_off_t)*8)-1);

Again, this should be

   const apr_off_t min = ((apr_off_t) 1) <<
                         ((sizeof (apr_off_t)*8)-1);

- Russ


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Fri, Jan 16, 2004 at 05:19:29PM -0800, Ben Reser wrote:
> You're right.  And this does work even on 32-bit big endian archs
> (except you need to cast what you want to shift or you'll get the wrong
> output.
> 
> The following example code:
> 
> long long min, max;
> 
> min = (long long) (sizeof(long long) - 1) << (sizeof(long long)*8-1);
> max = ~min;
> 
> printf("min=%lld\nmax=%lld\n",min,max);
> printf("min-1=%lld\nmax+1=%lld\n",min-1,max+1);
> 
> Produces the following output: 
> 
> min=-9223372036854775808
> max=9223372036854775807
> min-1=9223372036854775807
> max+1=-9223372036854775808
> 
> This is true on the follow archs:
> 32-bit little endian (i586)
> 32-bit bit enndia (ppc)
> 64-bit little endian (amd64)
> 64-bit big endian (sun4u)
> 
> The min-1 and max+1 is there to prove that the 2s compliment is correct.

Based on the above the bounds svn_io_file_seek can be implemented like
so, which looks a whole lot nicer:

svn_error_t *
svn_io_file_seek (apr_file_t *file, apr_seek_where_t where,
                  svn_offset_t *offset, apr_pool_t *pool)
{
  const apr_off_t min = ((apr_off_t) sizeof (apr_off_t)-1) <<
                        ((sizeof (apr_off_t)*8)-1);
  const apr_off_t max = ~min;

  if (sizeof (apr_off_t) >= sizeof (svn_offset_t) ||
      (*offset > min && *offset < max))
    {
      apr_off_t apr_offset = *offset;
      return do_io_file_wrapper_cleanup
        (file, apr_file_seek (file, where, &apr_offset),
         "set psoition pointer in", pool);
    }
  else
    {
      return svn_error_create (SVN_ERR_IO_OFFSET_SIZE, NULL,
                               "Cannot seek file");
    }
}

If nobody dislikes this I'll regen the patch on the issue with this
change.

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Kean Johnston <jk...@sco.com>.
> Here we go again... and with this I'm stopping for the day.  It's
> Friday.
Beforfe you stop consider one small change please :)

>   const apr_off_t min = (apr_off_t) 1 << ((sizeof (apr_off_t)*8)-1);
>   const apr_off_t max = ~min;
Dont use the names min and max. They are all-too-frequently defined to 
be macros that will cause syntax errors when that file is compiled.

Kean

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Branko Čibej <br...@xbc.nu>.
Ben Reser wrote:

>On Sun, Jan 18, 2004 at 08:53:21AM +0100, Branko ??ibej wrote:
>  
>
>>Not on Windows, where apr_off_t is always 64-bit.
>>    
>>
>
>I didn't say anything about Windows.  All I was saying is we know there
>are platforms with 32-bit apr_off_t's.  Nor did I say every platform has
>a 32-bit apr_off_t.
>  
>
I wasn't saying you had.

>>If anyone takes the time to make this a complete patch, please don't
>>introduce svn_offset_t. Use the existing svn_filesize_t. Yes, I know
>>that's unsigned, but we can fix that (i.e., either make it signed, or
>>tell svn_io_file_seek of we're seeking backwards or forwards from the
>>current position). But I don't want to see two different types.
>>    
>>
>
>What's wrong with two types?  We already have two types if we make no
>change.  It's not like this is really introducing a new type.  It's just
>replacing apr_off_t with svn_offset_t.
>  
>
I know we have two types now, and I don't like it, for the reasons you
describe below.


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Sun, Jan 18, 2004 at 08:53:21AM +0100, Branko ??ibej wrote:
> Not on Windows, where apr_off_t is always 64-bit.

I didn't say anything about Windows.  All I was saying is we know there
are platforms with 32-bit apr_off_t's.  Nor did I say every platform has
a 32-bit apr_off_t.

> If anyone takes the time to make this a complete patch, please don't
> introduce svn_offset_t. Use the existing svn_filesize_t. Yes, I know
> that's unsigned, but we can fix that (i.e., either make it signed, or
> tell svn_io_file_seek of we're seeking backwards or forwards from the
> current position). But I don't want to see two different types.

What's wrong with two types?  We already have two types if we make no
change.  It's not like this is really introducing a new type.  It's just
replacing apr_off_t with svn_offset_t.

Though, I do see a problem these types the way they are.  If filesize is
going to be an unsigned value and the offset type is a signed value then
the offset type has to be twice as big.  Otherwise you can't seek to all
points in the file with one operation.  Plus APR sets the offset
parameter to your current offset within the file after the operation
completes.  Unfortunately it can't do this without overflowing if your
filesize is bigger than your offset type can support.

Alternatives to this are as follows:

*  Only have svn_filesize_t as a apr_int64_t.  This decreases our
maximum file size to 4GB but provides a consistent way to seek any
location in the file.  Use the same type for the offsets.

*  svn_filesize_t is an apr_uint32_t and svn_offset_t is an apr_int64_t.
This has the same maximum file size of 4GB.  

But the problem here is we don't really gain the ability to seek to all
points in the file anyway.  We'd still have these platforms with a
32-bit apr_off_t.  There's no way to write (using APR) portable code
that provides that functionality without decreasing our maximum filesize
to 2GB.

Using an unsigned for offsets has its own problems.  The diff code
assumes that the offset type is signed.  It would likely take more than
trivial changes to make this work.  So I don't really see that as a
great option at this point.  Plus we're still stuck with a signed
interface to APR and the difficulties with its output into the offset
parameter.

So maybe seeking to all points in a single operation isn't a big deal.
Right now the only thing we seem to be using seek for is to get to the
beginning of the file.  In fact svn_io_file_seek isn't being used at
all.  So all the seeks are still using arp_off_t even if you applied the
patch as it stands today.  And no matter what we do platforms with
32-bit apr_off_t's are not going to be able to support the seeking on
these large files because of the whole offset issue.

So in the end I think the way the patch is implemented now is best.  I'd
rather have the additional file length than be able to seek around the
file.  No matter what we do we'll still be stuck with the lack of good
cross platform large file support in APR.  So this seems like the best
solution until and if APR does that.

Having two types is going to make it a lot easier to deal with whatever
APR does because we'll be able to track down the code effected by such
an APR improvement easier.  Mixing the two into one type is going to
force someone to potentially have to go through all the code and figure
out if the type should be changed or not.

I don't think there is a great solution to this problem, just ones with
varying amounts of hassle and annoyance.  

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Sat, Jan 17, 2004 at 09:13:42PM -0500, Greg Hudson wrote:
> And, last I checked, 64 is at least as big as 32.
> 
> The first check is fine.  The second check (before stuffing the APR
> offset back into the parameter) seems needless.

Ahh I misunderstood what you meant.  I thought you meant let's just
ditch the bounds checking entirely.  Rather you're saying let's not
bother testing before putting back into the svn_offset_t from the
apr_off_t.  

That's fine with me.  I tend to error on the side of being overly
paranoid.

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Branko Čibej <br...@xbc.nu>.
Ben Reser wrote:

>On Sat, Jan 17, 2004 at 06:08:29PM -0500, Greg Hudson wrote:
>  
>
>>I think I'd rather we just hide our heads in the sand and assume that
>>svn_offset_t is at least as big as apr_off_t.  If we ever run into
>>128-bit apr_off_t values and people want to use >64-bit offsets, we're
>>in more trouble than we can really solve with a bounds check here.
>>    
>>
>
>But we already know that this isn't true.  Out of the box on 32-bit
>archs running Linux apr_off_t is going to be a 32-bit int.
>
Not on Windows, where apr_off_t is always 64-bit.

>svn_offset_t is 64-bits on all platforms.
>
If anyone takes the time to make this a complete patch, please don't
introduce svn_offset_t. Use the existing svn_filesize_t. Yes, I know
that's unsigned, but we can fix that (i.e., either make it signed, or
tell svn_io_file_seek of we're seeking backwards or forwards from the
current position). But I don't want to see two different types.


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Greg Hudson <gh...@MIT.EDU>.
On Sat, 2004-01-17 at 21:10, Ben Reser wrote:
> On Sat, Jan 17, 2004 at 06:08:29PM -0500, Greg Hudson wrote:
> > I think I'd rather we just hide our heads in the sand and assume that
> > svn_offset_t is at least as big as apr_off_t.  If we ever run into
> > 128-bit apr_off_t values and people want to use >64-bit offsets, we're
> > in more trouble than we can really solve with a bounds check here.
> 
> But we already know that this isn't true.  Out of the box on 32-bit
> archs running Linux apr_off_t is going to be a 32-bit int.  svn_offset_t
> is 64-bits on all platforms.  

And, last I checked, 64 is at least as big as 32.

The first check is fine.  The second check (before stuffing the APR
offset back into the parameter) seems needless.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Sat, Jan 17, 2004 at 06:08:29PM -0500, Greg Hudson wrote:
> I think I'd rather we just hide our heads in the sand and assume that
> svn_offset_t is at least as big as apr_off_t.  If we ever run into
> 128-bit apr_off_t values and people want to use >64-bit offsets, we're
> in more trouble than we can really solve with a bounds check here.

But we already know that this isn't true.  Out of the box on 32-bit
archs running Linux apr_off_t is going to be a 32-bit int.  svn_offset_t
is 64-bits on all platforms.  

Anybody who tries to use such a platform for a file larger than 2
gigabytes could encouter undefined behavior due to the difference
between apr_off_t and svn_offset_t.  A 2 gigabyte file is not
unrealistic, so if it happens we should produce an error rather than
having undefined behavior.

Obviously the only solution for those people will be to rebuild their
APR with a 64-bit apr_off_t.  So in this situation this person will
receive an error message which they can track down and realize they need
to rebuild APR appropriately.  If we stick our heads in the sand we'll
have bizzare behavior without an error message.  

Sure a 128-bit apr_off_t would be a problem, but an edge case that's not
likely to happen.  A 64-bit off_t supports files up to 8 exabytes.  I
have to doubt that subversion is going to run reliably for anything that
big.  Can BDB even scale that big?  I kinda doubt there's a database
available to us that can handle that.

At this point I don't think anyone is going to attempt to put even a
single exabyte file into subversion.  But I do suspect we'll have people
looking to put multiple gigabyte files into it.  I seem to recall
someone discussing storing disk images in subversion at one point.

Let's solve the problems we've got today.  If and when we see 128-bit
apr_off_t's we can worry about them then.

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Greg Hudson <gh...@MIT.EDU>.
On Sat, 2004-01-17 at 17:39, Ben Reser wrote:
> svn_error_t *
> svn_io_file_seek (apr_file_t *file, apr_seek_where_t where,
>                   svn_offset_t *offset, apr_pool_t *pool)
> {
>   /* Because apr_off_t may not be the same size as svn_offset_t we need 
>    * to do bounds checking to make sure it will fit.  The typecasted
>    * comparisions do that for us. */
>   if ((apr_off_t) *offset == *offset )
>     {
>       apr_off_t apr_offset = (apr_off_t) *offset;
>       apr_status_t status;
> 
>       status = apr_file_seek (file, where, &apr_offset);
> 
>       if ((svn_offset_t) apr_offset == apr_offset)
>         {

I think I'd rather we just hide our heads in the sand and assume that
svn_offset_t is at least as big as apr_off_t.  If we ever run into
128-bit apr_off_t values and people want to use >64-bit offsets, we're
in more trouble than we can really solve with a bounds check here.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Sat, Jan 17, 2004 at 02:28:20PM -0800, (Mr Hyde) Ben Reser wrote:
> Any objections to this implementation?

Yes you forgot to typecast the assignments.

svn_error_t *
svn_io_file_seek (apr_file_t *file, apr_seek_where_t where,
                  svn_offset_t *offset, apr_pool_t *pool)
{
  /* Because apr_off_t may not be the same size as svn_offset_t we need 
   * to do bounds checking to make sure it will fit.  The typecasted
   * comparisions do that for us. */
  if ((apr_off_t) *offset == *offset )
    {
      apr_off_t apr_offset = (apr_off_t) *offset;
      apr_status_t status;

      status = apr_file_seek (file, where, &apr_offset);

      if ((svn_offset_t) apr_offset == apr_offset)
        {
          *offset = (svn_offset_t) apr_offset;
          return do_io_file_wrapper_cleanup (file, status,
                                             "set position pointer in",
                                             pool);
        }
    }

    return svn_error_create (SVN_ERR_IO_OFFSET_SIZE, NULL,
                             "Cannot seek file");
}




-- 
Ben Reser (Dr. Jekyll) <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Sat, Jan 17, 2004 at 11:52:43AM -0500, Greg Hudson wrote:
> No, the standard is pretty clear that the compiler is not allowed to
> optimize out operations that could result in an overflow.  Otherwise
> optimized code would produce different results than unoptimized code
> with much greater regularity than it does currently, leading to tons of
> debugging headaches.

Okay just thought I'd double check.  I've heard the horror stories of
compilers optimizing stuff out that was necessary before.  But if you're
comfortable with it I'm okay with it.

> apr_file_seek does modify the offset value, so yes, we need to stuff it
> back into *offset after the call completes.

Yup, I just spaced handling this.

Here's a changed implementation on the basis of your comments:

svn_error_t *
svn_io_file_seek (apr_file_t *file, apr_seek_where_t where,
                  svn_offset_t *offset, apr_pool_t *pool)
{ 
  /* Because apr_off_t may not be the same size as svn_offset_t we need 
   * to do bounds checking to make sure it will fit.  The typecasted
   * comparisions do that for us. */
  if ((apr_off_t) *offset == *offset )
    { 
      apr_off_t apr_offset = *offset;
      apr_status_t status;

      status = apr_file_seek (file, where, &apr_offset);
    
      if ((svn_offset_t) apr_offset == apr_offset)
        {
          *offset = apr_offset;
          return do_io_file_wrapper_cleanup (file, status,
                                             "set position pointer in",
                                             pool);
        }
    }

    return svn_error_create (SVN_ERR_IO_OFFSET_SIZE, NULL,
                             "Cannot seek file");
}

Any objections to this implementation?

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by mark benedetto king <mb...@lowlatency.com>.
On Sat, Jan 17, 2004 at 12:11:36AM -0800, Ben Reser wrote:
> On Sat, Jan 17, 2004 at 12:07:19AM -0500, Greg Hudson wrote:
> > >   const apr_off_t min = (apr_off_t) 1 << ((sizeof (apr_off_t)*8)-1);
> > 
> > The behavior of this expression is undefined by C99.  If we're going
> > to rely on common signed overflow behavior, "if ((apr_off_t) *offset
> > == *offset)" would appear to be simpler.
> > 
> > I'm not sure if there's a good way to perform this check which is
> > totally standards-conformant.  C99 allows two's-complement,
> > one's-complement, and sign-and-magnitude representation of integers.
> > It would be nice if APR would help us out by defining *_MIN and *_MAX
> > constants for apr_off_t, but it doesn't; moreover, it looks like it
> > would have a difficult job doing so, since there are apparently no
> > system constants for the minimum or maximum off_t value.
> 
> I'd say the comparision test looks brilliant.  My only concern is will a
> compiler optimize it out?
> 

It would take a very smart compiler.  Regardless: this is a single compare
and branch; it's not the end of the world.

--ben


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Russell Yanofsky <re...@columbia.edu>.
Greg Hudson wrote:
> Russell Yanofsky wrote:
>> Well, I did ask whether it was ok to assume 2's complement
>
> Even if you could assume two's complement, you can't assume that the
> environment won't raise a signal on signed integer overflow.  (Not
> that I've ever heard of an environment which does.)

Hmm. An environment that triggers overflows on bitwise operations. That
can't be a pretty sight, but I don't doubt that the C standard allows it.

>> I'm curious about something else. Does the standard allows just
>> these three representations, or does it allow any representation?
>> If it allows just those three, then in theory you could write a
>> big ass conditional expression that works everywhere.
>
> Just those three.  I'm not sure if you can determine which you have
> without risking an overflow, though.

Interesting. Thanks for answering my questions. I'd RTFS if I had a copy...

- Russ



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Greg Hudson <gh...@MIT.EDU>.
On Sat, 2004-01-17 at 03:11, Ben Reser wrote:
> On Sat, Jan 17, 2004 at 12:07:19AM -0500, Greg Hudson wrote:
> > If we're going to rely on common signed overflow behavior,
> >  "if ((apr_off_t) *offset == *offset)" would appear to be
> >  simpler.

> I'd say the comparision test looks brilliant.  My only concern is will a
> compiler optimize it out?

No, the standard is pretty clear that the compiler is not allowed to
optimize out operations that could result in an overflow.  Otherwise
optimized code would produce different results than unoptimized code
with much greater regularity than it does currently, leading to tons of
debugging headaches.

mbk wrote:
> It would take a very smart compiler.  Regardless: this is a single
> compare and branch; it's not the end of the world.

The question is not "will the compiler do this as quickly as we'd like?"
but "will the compiler do this *too* quickly, and produce a false
positive when *offset won't fit in an apr_off_t?"

Russell Yanofsky wrote:
> Well, I did ask whether it was ok to assume 2's complement

Even if you could assume two's complement, you can't assume that the
environment won't raise a signal on signed integer overflow.  (Not that
I've ever heard of an environment which does.)

> Which reminds me, Ben, that you didn't answer my question at the
> bottom of that post. The one about whether the *apr_offset value
> can be manipulated.

apr_file_seek does modify the offset value, so yes, we need to stuff it
back into *offset after the call completes.

> I'm curious about something else. Does the standard allows just
> these three representations, or does it allow any representation?
> If it allows just those three, then in theory you could write a
> big ass conditional expression that works everywhere.

Just those three.  I'm not sure if you can determine which you have
without risking an overflow, though.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Sat, Jan 17, 2004 at 12:07:19AM -0500, Greg Hudson wrote:
> >   const apr_off_t min = (apr_off_t) 1 << ((sizeof (apr_off_t)*8)-1);
> 
> The behavior of this expression is undefined by C99.  If we're going
> to rely on common signed overflow behavior, "if ((apr_off_t) *offset
> == *offset)" would appear to be simpler.
> 
> I'm not sure if there's a good way to perform this check which is
> totally standards-conformant.  C99 allows two's-complement,
> one's-complement, and sign-and-magnitude representation of integers.
> It would be nice if APR would help us out by defining *_MIN and *_MAX
> constants for apr_off_t, but it doesn't; moreover, it looks like it
> would have a difficult job doing so, since there are apparently no
> system constants for the minimum or maximum off_t value.

I'd say the comparision test looks brilliant.  My only concern is will a
compiler optimize it out?

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Russell Yanofsky <re...@columbia.edu>.
Greg Hudson wrote:
>>   const apr_off_t min = (apr_off_t) 1 << ((sizeof (apr_off_t)*8)-1);
>
> The behavior of this expression is undefined by C99.  If we're going
> to rely on common signed overflow behavior, "if ((apr_off_t) *offset
> == *offset)" would appear to be simpler.

Much simpler!

> I'm not sure if there's a good way to perform this check which is
> totally standards-conformant.  C99 allows two's-complement,
> one's-complement, and sign-and-magnitude representation of integers.

Well, I did ask whether it was ok to assume 2's complement when I first
posted the code here:

http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=54782

Which reminds me, Ben, that you didn't answer my question at the bottom of
that post. The one about whether the *apr_offset value can be manipulated.

- Russ



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Russell Yanofsky <re...@columbia.edu>.
Greg Hudson wrote:
> C99 allows two's-complement,
> one's-complement, and sign-and-magnitude representation of integers.

I'm curious about something else. Does the standard allows just these three
representations, or does it allow any representation? If it allows just
those three, then in theory you could write a big ass conditional expression
that works everywhere.

- Russ



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Greg Hudson <gh...@MIT.EDU>.
>   const apr_off_t min = (apr_off_t) 1 << ((sizeof (apr_off_t)*8)-1);

The behavior of this expression is undefined by C99.  If we're going
to rely on common signed overflow behavior, "if ((apr_off_t) *offset
== *offset)" would appear to be simpler.

I'm not sure if there's a good way to perform this check which is
totally standards-conformant.  C99 allows two's-complement,
one's-complement, and sign-and-magnitude representation of integers.
It would be nice if APR would help us out by defining *_MIN and *_MAX
constants for apr_off_t, but it doesn't; moreover, it looks like it
would have a difficult job doing so, since there are apparently no
system constants for the minimum or maximum off_t value.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Fri, Jan 16, 2004 at 08:40:43PM -0500, Russell Yanofsky wrote:
> Ben Reser wrote:
> > min = (long long) (sizeof(long long) - 1) << (sizeof(long long)*8-1);
> 
> Looks like something was added in translation there. The (sizeof(long
> long) - 1) should be replaced with just 1. Either one will work, because any
> odd number will work, but 1 makes more sense:
> 
>   min = (long long) 1 << (sizeof(long long)*8-1);

Here we go again... and with this I'm stopping for the day.  It's
Friday.

svn_error_t *
svn_io_file_seek (apr_file_t *file, apr_seek_where_t where,
                  svn_offset_t *offset, apr_pool_t *pool)
{
  const apr_off_t min = (apr_off_t) 1 << ((sizeof (apr_off_t)*8)-1);
  const apr_off_t max = ~min;

  if (sizeof (apr_off_t) >= sizeof (svn_offset_t) ||
      (*offset > min && *offset < max))
    {
      apr_off_t apr_offset = *offset;
      return do_io_file_wrapper_cleanup
        (file, apr_file_seek (file, where, &apr_offset),
         "set psoition pointer in", pool);
    }
  else
    {
      return svn_error_create (SVN_ERR_IO_OFFSET_SIZE, NULL,
                               "Cannot seek file");
    }
}


-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Russell Yanofsky <re...@columbia.edu>.
Ben Reser wrote:
> min = (long long) (sizeof(long long) - 1) << (sizeof(long long)*8-1);

Looks like something was added in translation there. The (sizeof(long
long) - 1) should be replaced with just 1. Either one will work, because any
odd number will work, but 1 makes more sense:

  min = (long long) 1 << (sizeof(long long)*8-1);

- Russ




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Branko Čibej <br...@xbc.nu>.
Kean Johnston wrote:

>>> min = 0x8000000000000000 = -9223372036854775808 = -2^63
>>> max = 0x7FFFFFFFFFFFFFFF = 9223372036854775807 = 2^63-1
>>
> Bad code.
>
> Those constants cannot be expressed on a platform that does not
> support 64-bit integers. Even on platforms that do if you really get a
> long long the compiler is being generous. They should be suffixed with
> LL, or ULL for unsigned. Type long long is, I believe, an ANSI
> extension, its not base ANSI (but I may be wrong on that point).

There is no such thing as ANSI C. :-)
But in ISO C, "long long" is a standard type. Has been for more than
three years. :-)


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Kean Johnston <jk...@sco.com>.
Russell Yanofsky wrote:

> Kean Johnston wrote:
> 
>>>>min = 0x8000000000000000 = -9223372036854775808 = -2^63
>>>>max = 0x7FFFFFFFFFFFFFFF = 9223372036854775807 = 2^63-1
>>
>>Bad code.
> 
> 
> It's not code, just an illustration.
I know it was, but I was trying to point out the constants were bad. I 
thought Id seen one of the previous incantations of the patch actually 
use those constants.

Kean

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Russell Yanofsky <re...@columbia.edu>.
Kean Johnston wrote:
>>> min = 0x8000000000000000 = -9223372036854775808 = -2^63
>>> max = 0x7FFFFFFFFFFFFFFF = 9223372036854775807 = 2^63-1
> Bad code.

It's not code, just an illustration.

- Russ


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Kean Johnston <jk...@sco.com>.
>> min = 0x8000000000000000 = -9223372036854775808 = -2^63
>>max = 0x7FFFFFFFFFFFFFFF = 9223372036854775807 = 2^63-1
Bad code.

Those constants cannot be expressed on a platform that does not support 
64-bit integers. Even on platforms that do if you really get a long long 
the compiler is being generous. They should be suffixed with LL, or ULL 
for unsigned. Type long long is, I believe, an ANSI extension, its not 
base ANSI (but I may be wrong on that point).

> For whatever reason I always thought of the << and >> operators as
> literally shifting the bits left or right.  Rather they should be
> described as shifting fromt he low bit to the high bit and the high bit
> to the low bit respectively.
That is one and the same thing. You're dealing with numbers here, not 
data busses and in-memory representation. Just as a decimal number, in a 
little endian machine, is not expressed differently, neither is a binary 
number. For the purposes of the shift operators, you are shifting a 
binary number, either left or right by a given number of bits. How those 
bits are implemented in memory should never be a concern (until such 
time as they are stored or transfered over a non-memory mediam, such as 
a network).

Kean

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Fri, Jan 16, 2004 at 07:30:53PM -0500, Russell Yanofsky wrote:
> There appear to be a few bugs above. In the first line >= should be replaced
> by <=. On the third line | should be &&. On the seventh line the second >
> should be < and an extra parenthesis should be added on the end.

err yeah.

> Yes. With the bugs fixed it should be equivalent to my first suggestion, but
> shorter.

And easier to understand if I wrote things right.

> Huh? How does endianness come into play here? Do you see me casting to a
> char array? declaring a union? calling a socket api? :)
> With an 8 byte apr_off_t the expression should be evaluated as follows on
> *any* architecture:
>
> min = ((apr_off_t)-1) << (sizeof(apr_off_t)*8-1)
> min = 0xFFFFFFFFFFFFFFFF << (sizeof(apr_off_t)*8-1)
> min = 0xFFFFFFFFFFFFFFFF << 63
> min = 0x8000000000000000
> max = ~min
> max = 0x7FFFFFFFFFFFFFFF
> 
> In 2's complement representation
> 
> min = 0x8000000000000000 = -9223372036854775808 = -2^63
> max = 0x7FFFFFFFFFFFFFFF = 9223372036854775807 = 2^63-1

You're right.  And this does work even on 32-bit big endian archs
(except you need to cast what you want to shift or you'll get the wrong
output.

The following example code:

long long min, max;

min = (long long) (sizeof(long long) - 1) << (sizeof(long long)*8-1);
max = ~min;

printf("min=%lld\nmax=%lld\n",min,max);
printf("min-1=%lld\nmax+1=%lld\n",min-1,max+1);

Produces the following output: 

min=-9223372036854775808
max=9223372036854775807
min-1=9223372036854775807
max+1=-9223372036854775808

This is true on the follow archs:
32-bit little endian (i586)
32-bit bit enndia (ppc)
64-bit little endian (amd64)
64-bit big endian (sun4u)

The min-1 and max+1 is there to prove that the 2s compliment is correct.
 
For whatever reason I always thought of the << and >> operators as
literally shifting the bits left or right.  Rather they should be
described as shifting fromt he low bit to the high bit and the high bit
to the low bit respectively.

For whatever reason, everyone calls them right and left shifts.  But
it's only really a right or left shift on big endian archs.  On a little
endian arch the shift is actually reversed.

Anyway I'll fix the code to work that way.  And to think I asked someone
else if they thought that was Endian safe and they told me no too.  Next
time I'll just test it for myself.

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Russell Yanofsky <re...@columbia.edu>.
Ben Reser wrote:
> On Fri, Jan 16, 2004 at 05:09:23PM -0500, Russell Yanofsky wrote:
> I see what you're getting at and I think this would be better then
> what
> I did or even what you did:
>
> if (sizeof (svn_offset_t) >= sizeof (apr_off_t) ||
>     (sizeof (apr_off_t) == 4 &&
>       *offset > -0x7FFFFFFF | *offset < 0x7FFFFFFF) ||
>     (sizeof (apr_off_t) == 2 &&
>       *offset > -0x7FFF && *offset < 0x7FFF) ||
>     (sizeof (apr_off_t) == 1 &&
>       *offset > -0x7F || *offset > 0x7FF)
>   {
>     /* execute apr_file_seek */
>   }
> else
>   {
>     /* throw error */
>   }

There appear to be a few bugs above. In the first line >= should be replaced
by <=. On the third line | should be &&. On the seventh line the second >
should be < and an extra parenthesis should be added on the end.

> It eliminates an entire block that mostly is there because I started
> to implement it the way you suggest next.

Yes. With the bugs fixed it should be equivalent to my first suggestion, but
shorter.

>> I'm also curious if another implementation might work too:
>>
>>   const apr_off_t min = ((apr_off_t)-1) << (sizeof(apr_off_t)*8-1);
>>   const apr_off_t max = ~min;
>>
>>   if (*offset < min || *offset > max) ...
>>
>> It assumes a 2's complement representation. Anyone know if that's ok
>> in ansi c?
>
> Nope because it's dependent upon the bit order of the arch.  If you do
> this on a big endian 32-bit system that has an 8 byte apr_off_t it
> will
> fail.  And such architectures do exist.  That's why I went with it the
> way I did.  It's kinda ugly but it works more portably.

Huh? How does endianness come into play here? Do you see me casting to a
char array? declaring a union? calling a socket api? :)
With an 8 byte apr_off_t the expression should be evaluated as follows on
*any* architecture:

min = ((apr_off_t)-1) << (sizeof(apr_off_t)*8-1)
min = 0xFFFFFFFFFFFFFFFF << (sizeof(apr_off_t)*8-1)
min = 0xFFFFFFFFFFFFFFFF << 63
min = 0x8000000000000000
max = ~min
max = 0x7FFFFFFFFFFFFFFF

In 2's complement representation

min = 0x8000000000000000 = -9223372036854775808 = -2^63
max = 0x7FFFFFFFFFFFFFFF = 9223372036854775807 = 2^63-1

- Russ



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Fri, Jan 16, 2004 at 05:09:23PM -0500, Russell Yanofsky wrote:
> If you really wanted to be paranoid you could write it like
> 
>       if ((sizeof (apr_off_t) != 4 ||
>             *offset < -0x7FFFFFFF || *offset > 0x7FFFFFFF) &&
>           (sizeof (apr_off_t) != 2 ||
>             *offset < -0x7FFF || *offset > 0x7FFF) &&
>           (sizeof (apr_off_t) != 1 ||
>             *offset < -0x7F || *offset > 0x7FF))
>         {
>           return svn_error_create (SVN_ERR_IO_OFFSET_SIZE, NULL,
>                                    "Cannot seek file");
>         }

I see what you're getting at and I think this would be better then what
I did or even what you did:

if (sizeof (svn_offset_t) >= sizeof (apr_off_t) ||
    (sizeof (apr_off_t) == 4 && 
      *offset > -0x7FFFFFFF | *offset < 0x7FFFFFFF) ||
    (sizeof (apr_off_t) == 2 &&
      *offset > -0x7FFF && *offset < 0x7FFF) ||
    (sizeof (apr_off_t) == 1 &&
      *offset > -0x7F || *offset > 0x7FF) 
  {
    /* execute apr_file_seek */
  }
else
  {
    /* throw error */
  }

It eliminates an entire block that mostly is there because I started to
implement it the way you suggest next.

> That way there'd be an error for 3, 5, 6, and 7 byte off_t's.
> 
> I'm also curious if another implementation might work too:
> 
>   const apr_off_t min = ((apr_off_t)-1) << (sizeof(apr_off_t)*8-1);
>   const apr_off_t max = ~min;
> 
>   if (*offset < min || *offset > max) ...
> 
> It assumes a 2's complement representation. Anyone know if that's ok in ansi
> c?

Nope because it's dependent upon the bit order of the arch.  If you do
this on a big endian 32-bit system that has an 8 byte apr_off_t it will
fail.  And such architectures do exist.  That's why I went with it the
way I did.  It's kinda ugly but it works more portably.

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Russell Yanofsky <re...@columbia.edu>.
Ben Reser wrote:
> Patch is up in Issue #1705.  It's also marked 1.0-cand and in the
> STATUS file.

[from the patch]
>+      /* This is probably overly paranoid about checking
>+       * but I'd rather be safe than sorry */
>+      if ((sizeof (apr_off_t) == 4 &&
>+            (*offset < -0x7FFFFFFF || *offset > 0x7FFFFFFF)) ||
>+          (sizeof (apr_off_t) == 2 &&
>+            (*offset < -0x7FFF || *offset > 0x7FFF)) ||
>+          (sizeof (apr_off_t) == 1 &&
>+            (*offset < -0x7F || *offset > 0x7FF)))
>+        {
>+          return svn_error_create (SVN_ERR_IO_OFFSET_SIZE, NULL,
>+                                   "Cannot seek file");
>+        }

If you really wanted to be paranoid you could write it like

      if ((sizeof (apr_off_t) != 4 ||
            *offset < -0x7FFFFFFF || *offset > 0x7FFFFFFF) &&
          (sizeof (apr_off_t) != 2 ||
            *offset < -0x7FFF || *offset > 0x7FFF) &&
          (sizeof (apr_off_t) != 1 ||
            *offset < -0x7F || *offset > 0x7FF))
        {
          return svn_error_create (SVN_ERR_IO_OFFSET_SIZE, NULL,
                                   "Cannot seek file");
        }

That way there'd be an error for 3, 5, 6, and 7 byte off_t's.

I'm also curious if another implementation might work too:

  const apr_off_t min = ((apr_off_t)-1) << (sizeof(apr_off_t)*8-1);
  const apr_off_t max = ~min;

  if (*offset < min || *offset > max) ...

It assumes a 2's complement representation. Anyone know if that's ok in ansi
c?

[further down in patch]
>+  apr_offset = (apr_off_t) *offset;
>+
>   return do_io_file_wrapper_cleanup
>-    (file, apr_file_seek (file, where, offset),
>+    (file, apr_file_seek (file, where, &apr_offset),
>      "set position pointer in", pool);
> }

The function declaration says "svn_offset_t *offset" instead of
"svn_offset_t const *offset." Does that mean the APR function might
manipulate the offset pointer? If so you'd need to forward the new value
back to the caller with:

  *offset = apr_offset;

- Russ



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Fri, Jan 16, 2004 at 11:46:51AM -0800, Ben Reser wrote:
> Patch is up in Issue #1705.  It's also marked 1.0-cand and in the STATUS
> file.

Updated patch is on Issue #1705.  Only changes are to the
svn_io_file_seek function.  It now uses the method Greg Hudson
recommended.

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Mon, Jan 12, 2004 at 02:01:02PM -0800, Ben Reser wrote:
> So apr_uint64_t for blame's line_no, svn_filesize_t for the uses of
> svn_off_t in libsvn_diff, svn_filesize_t for the svn_io_file_seek with
> appropriate bounds checking for making sure we don't pass a 64-bit
> number to a 32-bit off_t.
> 
> I'll try to get a patch for this out in the next few days.

Patch is up in Issue #1705.  It's also marked 1.0-cand and in the STATUS
file.

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Mon, Jan 12, 2004 at 01:45:06PM -0800, Greg Stein wrote:
> This is a good, general rule. A long while back, we came up with the idea
> to use svn_filesize_t because we wanted to ensure that SVN could hold
> large files, regardless of the underlying platform's capabilities. Of
> course, it will obviously fail if it tries, but the APIs are in place to
> be correct and "where it hits the road", we can compensate between SVN's
> large file size and the platform specifics.

Okay sounds like there's some consensus on the best way to fix this...
I wasn't aware of the svn_filesize_t discussion until ghudson pointed it
out to me.

> I'd go with apr_uint64_t rather than svn_filesize_t. This is a line number
> rather than an offset/size of a file. Yes, the two are proportional to
> some degree, but I'd agree with ghudson about the "mental judo" if we used
> svn_filesize_t for that param.

So apr_uint64_t for blame's line_no, svn_filesize_t for the uses of
svn_off_t in libsvn_diff, svn_filesize_t for the svn_io_file_seek with
appropriate bounds checking for making sure we don't pass a 64-bit
number to a 32-bit off_t.

I'll try to get a patch for this out in the next few days.

> I would seriously recommend dropping all svn_io_* functions from the Perl
> bindings. In general, Perl itself has its own APIs to do that stuff (and
> it does it in a Perl-standard fashion and is doc'd well and has precedent
> and ...). Note that we already ignore a bunch of svn_io_ functions (see
> core.i). I suspect there are a lot more svn_io_* functions that have been
> added recently which need to be ignored, too.
> 
> Thus, I'd expect to see a patch which extends the list in core.i to
> include a lot more functions. There are probably a good number more of
> these across the subr functions.

Agreed, I didn't notice until last night that some of them were getting
wrapped.  I just need to get around to removing them.  I've made a note
of it on my mental todo list.

But my comments were about the C API.  Not specifically the perl
wrapping.

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Greg Stein <gs...@lyra.org>.
On Mon, Jan 12, 2004 at 02:10:23AM -0800, Ben Reser wrote:
>...
> So I guess my suggestion at this point is to do the following:
> 
> a) For exported interfaces do not use apr_off_t.  Use apr_int64_t.

This is a good, general rule. A long while back, we came up with the idea
to use svn_filesize_t because we wanted to ensure that SVN could hold
large files, regardless of the underlying platform's capabilities. Of
course, it will obviously fail if it tries, but the APIs are in place to
be correct and "where it hits the road", we can compensate between SVN's
large file size and the platform specifics.

>...
> c) Don't use apr_off_t for the line_no parameter of the blame receiver.
> I figure either unsigned long or apr_uint32_t.  The former gets the
> advantage of scaling with 64-bit archs.  Not that I think anyone is ever
> going to want to run blame on a file with more than 4 billion lines.

I'd go with apr_uint64_t rather than svn_filesize_t. This is a line number
rather than an offset/size of a file. Yes, the two are proportional to
some degree, but I'd agree with ghudson about the "mental judo" if we used
svn_filesize_t for that param.

> Since we're mostly only using apr_off_t as an output variable as opposed
> to an input variable this ought to be pretty safe.  The one exception is
> svn_io_file_seek.  Which would require the bounds checking to match APR.
> However, we might be able to ignore the bounds checking and just drop
> apr_io_file_seek, I don't see it being used anywhere.

I would seriously recommend dropping all svn_io_* functions from the Perl
bindings. In general, Perl itself has its own APIs to do that stuff (and
it does it in a Perl-standard fashion and is doc'd well and has precedent
and ...). Note that we already ignore a bunch of svn_io_ functions (see
core.i). I suspect there are a lot more svn_io_* functions that have been
added recently which need to be ignored, too.

Thus, I'd expect to see a patch which extends the list in core.i to
include a lot more functions. There are probably a good number more of
these across the subr functions.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Mon, Jan 12, 2004 at 11:41:36AM +0000, Chia-Liang Kao wrote:
> This definitely should be fixed. Regarding the pending patch, I plan to commit
> it with the blame test in client test suite skipped before this gets fixed,
> what do you think?

That's fine.

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Chia-Liang Kao <cl...@clkao.org>.
This definitely should be fixed. Regarding the pending patch, I plan to commit
it with the blame test in client test suite skipped before this gets fixed,
what do you think?

On Mon, Jan 12, 2004 at 02:10:23AM -0800, Ben Reser wrote:
> If nobody has any better ideas, I'll start coding up a patch to
> implement this.  I strongly think if we're going to do this we should do
> it before 1.0.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Mon, Jan 12, 2004 at 11:23:27AM -0500, Greg Hudson wrote:
>   * Because of the mess Linux (and Solaris?) have created, using
>     apr_off_t in our APIs is somewhat fraught.  We can use #ifdef magic
>     to make our APIs work with both LARGEFILE and non-LARGEFILE callers,
>     but I bet we'd be happier without doing that.

*nod*

>   * We have a type svn_filesize_t, which is always apr_int64_t.  It
>     seems reasonable to use in our APIs where we want offsets (although
>     it requires a bit of mental judo to realize that the type which
>     holds file sizes is also appropriately large for holding file
>     offsets).  See issue #639.

Pretty much the same issue then.  Sounds like the job didn't get
finished.  svn_filesize_t works for me.  If we think it's too much judo
we can always typedef svn_offset_t to svn_filesize_t.

>   * I'm waffling on what we should do with the svn_io APIs which simply
>     wrap APR functions.  I think we want to fix those as well (i.e. use
>     svn_filesize_t), and then if APR ever gets a largefile story we
>     should be in a good position to make use of it.

*nod*

>   * I can see no good motivation for using a 32-bit type in the blame
>     receiver callback.  It introduces a type narrowing problem in the
>     blame implementation, even if it seems "unlikely" that there will
>     ever be more than four billion lines in a file.  If the perl
> bindings
>     have to be a little more complicated because perl can't handle
>     64-bit types gracefully, that's not a good reason to screw up the C
>     API.

Well I didn't see a motivation for a 64-bit int.  All line_no is a
counter that gets incremented with every call.  I have no problem using
a 64-bit int or doing the tiny (and I mean tiny) workaround to make it
work nicely on various perl versions.  I'll be more than happy to
implement it as a apr_uint64_t if that's what people want.

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Greg Hudson <gh...@MIT.EDU>.
On Tue, 2004-01-13 at 16:25, Joe Orton wrote:
> > I was assuming that APR would take the apr_off_t bashing route. Is this not
> > a safe assumption?

> Yes, the only sane way to get LFS support in APR is to define apr_off_t
> as off64_t and internally use the transitional API (open64 etc).  Having
> APR export -D_FILE_OFFSET_BITS=64 would screw everything up just as much
> as when Perl does it.

Well, they can't really do that after they hit 1.0.  And if they're
going to do that very soon (would it have implications for Apache 2?),
we might want to factor that into our release schedule to avoid an ABI
issue.

(Of course, it won't matter too much if we eliminate all the apr_off_t
uses in our API.)


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Joe Orton <jo...@manyfish.co.uk>.
On Tue, Jan 13, 2004 at 01:32:46PM -0500, Russell Yanofsky wrote:
> Greg Hudson wrote:
> >>> My input:
> >>> ...
> >>>   * I'm waffling on what we should do with the svn_io APIs which
> >>>     simply wrap APR functions.  I think we want to fix those as well
> >>>     (i.e. use svn_filesize_t), and then if APR ever gets a largefile
> >>>     story we should be in a good position to make use of it.
> >>
> >> If APR gets a large filesize story, won't they just make the
> >> definition of apr_off_t 64 bits? If so, then we're already in a
> >> position to make use of it. Why complicate the wrapper functions if
> >> they are already portable?
> >
> > If APR simply bashes apr_off_t on Linux to 64 bits, breaking ABI
> > compatibility, then by avoiding apr_off_t, at least our own ABIs won't
> > break, although that would be small consolation to apps which use APR
> > functions in addition to svn functions.

I don't know where you get the idea that there is something unique about
how Linux implements large file support: the kernel and glibc just
implement the LFS standard, http://ftp.sas.com/standards/large.file/. 
It's the same in Solaris, HP-UX, AIX, UnixWare, and probably lots of
other non-BSD-derived Unixes.

> > If APR makes apr_off_t on Linux 64 bits for new compiles but uses
> > #defines so as to leave around 32-bit functions for old compiles (the
> > *BSD route), then by avoiding apr_off_t, our ABI won't break when it
> > otherwise would.
> >
> > If APR makes apr_off_t 64 bits only when a special symbol is defined
> > (the Linux route), then we'll be free to define that symbol and get
> > 64-bit I/O without fear of breaking our ABI.

> I was assuming that APR would take the apr_off_t bashing route. Is this not
> a safe assumption?

Yes, the only sane way to get LFS support in APR is to define apr_off_t
as off64_t and internally use the transitional API (open64 etc).  Having
APR export -D_FILE_OFFSET_BITS=64 would screw everything up just as much
as when Perl does it.

Currently, users can already choose a 64-bit apr_off_t on LFS systems by
building APR with -D_FILE_OFFSET_BITS=64, and changing the size of off_t
itself.  (though APR wasn't choosing the correct APR_FMT_OFF_T in that
case until yesterday on HEAD)

Regards,

joe

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Russell Yanofsky <re...@columbia.edu>.
Greg Hudson wrote:
> On Tue, 2004-01-13 at 04:16, Russell Yanofsky wrote:
>> Then surely you can see that what this linux people did is pretty
>> reasonable. Instead of changing off_t from 32 bits to 64 when they
>> introduced 64 bit file I/O, they decided that making 64 bits default
>> wasn't worth breaking backwards compatibility over.
>
> Perhaps you're not familiar with "the *BSD route" I mentioned above.
> For each libc symbol accepting an off_t or struct stat, they created a
> new symbol and used header file redirects to point newly-compiled
> source files at the new symbol.

Well sure. The Linux library actually does the exact same thing when
__USE_FILE_OFFSET64 is defined. And, like you say, this doesn't provide
binary compatibility with 3rd party libraries which expose off_t in their
interfaces. So they took a conservative position, keeping the posix
interfaces 32 bit, and providing a parallel 64 bit interface for
applications that need it (off64_t, struct stat64, lseek64, etc).

>...
>> IMO, it is perl that is at fault here. It would be fine if perl just
>> used __USE_FILE_OFFSET internally to do 64 bit I/O on linux, but it
>> has no business imposing nonstandard library options on libraries
>> that call into it.
>
> Well, perl may be at fault for doing that, but Linux is delegating an
> awful lot of complexity to libraries here.  It's not at all surprising
> that some will get it wrong, or will just ignore the problem and only
> provide 32-bit file I/O.

Linux developers only need to know two things:

1) sizeof(off_t) == 4
2) there's a separate, nonposix interface for 64-bit i/o

Surely we could expect the big brains at perl.com to handle this
information. But instead they decided to mess with glibc library options to
wangle 64-bit I/O. Messing with library options is fine if you are, say,
writing a prototype app or supporting legacy software, but it's not the
brightest thing to do when you are building something that needs to interact
with third party libraries

>>> My input:
>>> ...
>>>   * I'm waffling on what we should do with the svn_io APIs which
>>>     simply wrap APR functions.  I think we want to fix those as well
>>>     (i.e. use svn_filesize_t), and then if APR ever gets a largefile
>>>     story we should be in a good position to make use of it.
>>
>> If APR gets a large filesize story, won't they just make the
>> definition of apr_off_t 64 bits? If so, then we're already in a
>> position to make use of it. Why complicate the wrapper functions if
>> they are already portable?
>
> If APR simply bashes apr_off_t on Linux to 64 bits, breaking ABI
> compatibility, then by avoiding apr_off_t, at least our own ABIs won't
> break, although that would be small consolation to apps which use APR
> functions in addition to svn functions.
>
> If APR makes apr_off_t on Linux 64 bits for new compiles but uses
> #defines so as to leave around 32-bit functions for old compiles (the
> *BSD route), then by avoiding apr_off_t, our ABI won't break when it
> otherwise would.
>
> If APR makes apr_off_t 64 bits only when a special symbol is defined
> (the Linux route), then we'll be free to define that symbol and get
> 64-bit I/O without fear of breaking our ABI.

I was assuming that APR would take the apr_off_t bashing route. Is this not
a safe assumption?

- Russ



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Greg Hudson <gh...@MIT.EDU>.
On Tue, 2004-01-13 at 04:16, Russell Yanofsky wrote:
> Greg Hudson wrote:
> > Well, Linux has certainly created a mess here.  Pity they didn't go
> > the *BSD route and actually wind up with a 64-bit off_t.
> 
> I agree with you here, but wow. This is from the same Greg Hudson that
> expects the 1.0 subversion API to be supported 5 years from now? The one who
> goes around saying that if a library isn't backwards compatible it might as
> well not exist at all? :)

> Then surely you can see that what this linux people did is pretty
> reasonable. Instead of changing off_t from 32 bits to 64 when they
> introduced 64 bit file I/O, they decided that making 64 bits default wasn't
> worth breaking backwards compatibility over.

Perhaps you're not familiar with "the *BSD route" I mentioned above. 
For each libc symbol accepting an off_t or struct stat, they created a
new symbol and used header file redirects to point newly-compiled source
files at the new symbol.  Other system libraries which used off_t or
struct stat had to be addressed similarly.  Old binaries continued to
work just fine.  There was a compatibility issue with third-party
libraries using off_t or struct stat (you had to recompile them before
you could link against them with newly-compiled source, and without
special effort, the recompiled library would not be compatible with old
libraries, so you had to bump the major rev or something), but Back in
the Day, there weren't such a huge number of third-party libraries
running around, so it wasn't very traumatic.

In today's age of GNOME's towering library dependency chart and so on,
the damage would probably be too great.

(I'm not certain what form the "header file redirects" were.  They might
have been straight #defines, but they might have used compiler magic so
that they didn't have to infect all namespaces.)

> IMO, it is perl that is at fault here. It would be fine if perl just used
> __USE_FILE_OFFSET internally to do 64 bit I/O on linux, but it has no
> business imposing nonstandard library options on libraries that call into
> it.

Well, perl may be at fault for doing that, but Linux is delegating an
awful lot of complexity to libraries here.  It's not at all surprising
that some will get it wrong, or will just ignore the problem and only
provide 32-bit file I/O.

> > My input:
> > ...
> >   * I'm waffling on what we should do with the svn_io APIs which
> >     simply wrap APR functions.  I think we want to fix those as well
> >     (i.e. use svn_filesize_t), and then if APR ever gets a largefile
> >     story we should be in a good position to make use of it.
> 
> If APR gets a large filesize story, won't they just make the definition of
> apr_off_t 64 bits? If so, then we're already in a position to make use of
> it. Why complicate the wrapper functions if they are already portable?

If APR simply bashes apr_off_t on Linux to 64 bits, breaking ABI
compatibility, then by avoiding apr_off_t, at least our own ABIs won't
break, although that would be small consolation to apps which use APR
functions in addition to svn functions.

If APR makes apr_off_t on Linux 64 bits for new compiles but uses
#defines so as to leave around 32-bit functions for old compiles (the
*BSD route), then by avoiding apr_off_t, our ABI won't break when it
otherwise would.

If APR makes apr_off_t 64 bits only when a special symbol is defined
(the Linux route), then we'll be free to define that symbol and get
64-bit I/O without fear of breaking our ABI.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Russell Yanofsky <re...@columbia.edu>.
Greg Hudson wrote:
> Well, Linux has certainly created a mess here.  Pity they didn't go
> the *BSD route and actually wind up with a 64-bit off_t.

I agree with you here, but wow. This is from the same Greg Hudson that
expects the 1.0 subversion API to be supported 5 years from now? The one who
goes around saying that if a library isn't backwards compatible it might as
well not exist at all? :)

Then surely you can see that what this linux people did is pretty
reasonable. Instead of changing off_t from 32 bits to 64 when they
introduced 64 bit file I/O, they decided that making 64 bits default wasn't
worth breaking backwards compatibility over.

IMO, it is perl that is at fault here. It would be fine if perl just used
__USE_FILE_OFFSET internally to do 64 bit I/O on linux, but it has no
business imposing nonstandard library options on libraries that call into
it.

> My input:
> ...
>   * I'm waffling on what we should do with the svn_io APIs which
>     simply wrap APR functions.  I think we want to fix those as well
>     (i.e. use svn_filesize_t), and then if APR ever gets a largefile
>     story we should be in a good position to make use of it.

If APR gets a large filesize story, won't they just make the definition of
apr_off_t 64 bits? If so, then we're already in a position to make use of
it. Why complicate the wrapper functions if they are already portable?

> ...  If the perl
> bindings
>     have to be a little more complicated because perl can't handle
>     64-bit types gracefully, that's not a good reason to screw up the
>     C API.

I agree here.

- Russ



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Kean Johnston <jk...@sco.com>.
Greg Hudson wrote:

> Well, Linux has certainly created a mess here.  Pity they didn't go the
> *BSD route and actually wind up with a 64-bit off_t.
I speak under correction here but memory tells me that the 
_FILE_OFFSET_BITS nonsense was POSIX mandated. I'll ask some of our 
standards gurus if they remember.

Kean



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Greg Hudson <gh...@MIT.EDU>.
Well, Linux has certainly created a mess here.  Pity they didn't go the
*BSD route and actually wind up with a 64-bit off_t.

My input:

  * Because of the mess Linux (and Solaris?) have created, using
    apr_off_t in our APIs is somewhat fraught.  We can use #ifdef magic
    to make our APIs work with both LARGEFILE and non-LARGEFILE callers,
    but I bet we'd be happier without doing that.

  * We have a type svn_filesize_t, which is always apr_int64_t.  It
    seems reasonable to use in our APIs where we want offsets (although
    it requires a bit of mental judo to realize that the type which
    holds file sizes is also appropriately large for holding file
    offsets).  See issue #639.

  * I'm waffling on what we should do with the svn_io APIs which simply
    wrap APR functions.  I think we want to fix those as well (i.e. use
    svn_filesize_t), and then if APR ever gets a largefile story we
    should be in a good position to make use of it.

  * I can see no good motivation for using a 32-bit type in the blame
    receiver callback.  It introduces a type narrowing problem in the
    blame implementation, even if it seems "unlikely" that there will
    ever be more than four billion lines in a file.  If the perl
bindings
    have to be a little more complicated because perl can't handle
    64-bit types gracefully, that's not a good reason to screw up the C
    API.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Mon, Jan 12, 2004 at 02:10:23AM -0800, Ben Reser wrote:
> b) At build time check to see if APR is using a 32-bit off_t or a 64-bit
> off_t (it could be 64-bit on some platforms).  We can check this by
> looking at the APR_OFF_T_FMT define.  If it's 32-bit enable bounds
> checking so we can throw an error if the value would surpass the bounds
> of what APR can handle.  I don't think there'd be very many places we'd
> have to check for this.  Only right before we passed one of these vars
> to an APR function.  (Which as far as I can tell is only apr_file_seek
> right now).

*sigh* this is a little more complicated than I thought.  Looking at the
APR_OFF_T_FMT isn't going to help by itself.  Because on a 64 bit
platform "ld" and "lld" are both 64-bit formats.  So I guess I'd have to
do the test on the size of a long AND APR_OFF_T_FMT.  

If a long is 32-bits and the OFF_T_FMT is "ld" then we know we have a
32-bit off_t in APR and need to do bounds checking.

If OFF_T_FMT is "lld" or a long is 64-bits then off_t has to be a 64-bit
int.

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Mon, Jan 12, 2004 at 01:00:58PM -0800, Kean Johnston wrote:
> Thats fair. I think its rarer than a developer doing it, but I can see 
> it happening. Of course, only a developer is likely to be in need of an 
> SCM, especially at its bleeding edge version, but your point is valid :)

SCM is useful for things other than what most people would call source
code.  You can certainly use it for HTML and Javascripts.  In fact due
to the integration with Apache I'd guess that subversion will end up
being extermely popular for such people.

But I'd venture to bet that at least some portion of those people are
not going to be familiar with the issues of large file support.  Nor
will they likely care.

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Mon, Jan 12, 2004 at 01:00:58PM -0800, Kean Johnston wrote:
>  if (sizeof(aprtype) < sizeof(svntype)) {
>    whatever();
>  }

That works.  I think I like that better than the whole OFF_T_FMT thing I
was thinking of.  

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Kean Johnston <jk...@sco.com>.
> It seems much more logical to simply always use a 64-bit offset and to
> #ifdef the bounds checking we'd need on the machines that have an APR
> built with a 32-bit off_t.
Thats a good approach too. However, I would suggest you use real code, 
not an ifdef, to do the switching. It may well result in a constant 
condition which the optimizer will eliminate, but in general its better 
to get to the real truth of the matter as the compiler sees it (which 
can, as you know, be subtly different to what the pre-processor sees). 
So if you make the SVN type always be 64 bits, at the point you need to 
possibly downscale, do something like:

  if (sizeof(aprtype) < sizeof(svntype)) {
    whatever();
  }

> Users build subversion too you know.  I don't think it's realstic to
> expect them to just know the details of large file support.  Fixing the
Thats fair. I think its rarer than a developer doing it, but I can see 
it happening. Of course, only a developer is likely to be in need of an 
SCM, especially at its bleeding edge version, but your point is valid :)

> Fair enough.  Sorry for venting at you about that.  Wrote the reply
No harm done. I've had to develop a skin so thick it would make a rhino 
blush with envy over the last few months :)

Kean



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Mon, Jan 12, 2004 at 12:22:12PM -0800, Kean Johnston wrote:
> I would expect that querying Perl should tell you that.

Sure and we ask perl for the compile flags we need.  One of which is the
OFFSET_BITS stuff.  However, that doesn't mean that APR was built that
way and it would be wrong to just turn on the flag and ignore the issue.
So simply fixing the build system to turn it on when perl has it turned
on would be wrong.  We'd still have to do bound checking for what we'd
pass into APR.

It seems much more logical to simply always use a 64-bit offset and to
#ifdef the bounds checking we'd need on the machines that have an APR
built with a 32-bit off_t.

> And you seem to be confusing users and developers. Users dont need to
> be aware of the issue, developers do.

No I'm not.  This is a user visible issue.  I install a fresh copy of
Mandrake.  I decide I want the latest subversion that hasn't been
packaged.  So I download and build it.  Except now this nifty perl
script that I've been using that I didn't write segfaults.

Users build subversion too you know.  I don't think it's realstic to
expect them to just know the details of large file support.  Fixing the
build system would require us to do some of the pain of bounds checking
anyway.  Might as well get the benefits of a fixed size in our interface
while we're at it.

> Thats was completely uncalled for, and I think you know it. Please
> redirect your anger at my employer to the appropriate channels, not at
> any random person who happens to post. Thank you.     

Fair enough.  Sorry for venting at you about that.  Wrote the reply
shortly after getting up and should have put more thought into what I
was saying.

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by kf...@collab.net.
Kean Johnston <jk...@sco.com> writes:
> > I'm sorry but that just seems a tad crazy to me.  Certainly doesn't seem
> > user friendly.  But then considering the company you're keeping, I guess
> > I shouldn't be surprised.
>
> Thats was completely uncalled for, and I think you know it. Please
> redirect your anger at my employer to the appropriate channels, not at
> any random person who happens to post. Thank you.

Bravo for the civilized response (and completely agree with you).

Ben: please don't rag on people for what their employers do.  We've
seen this happen occasionally with employers other than SCO, too, and
it's never constructive.  It won't change the things you don't like
about SCO, and it won't help Kean help Subversion.

Thanks,
-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Kean Johnston <jk...@sco.com>.
> Most users are not going to be aware of _FILE_OFFSET_BITS.  You're
> suggesting that users should be aware of that, aware of how their perl
> was compiled and know that they should pass a
> CFLAGS=-D_FILE_OFFSET_BITS=64 to the configure?
I would expect that querying Perl should tell you that. And you seem to 
be confusing users and developers. Users dont need to be aware of the 
issue, developers do.

> I'm sorry but that just seems a tad crazy to me.  Certainly doesn't seem
> user friendly.  But then considering the company you're keeping, I guess
> I shouldn't be surprised.
Thats was completely uncalled for, and I think you know it. Please 
redirect your anger at my employer to the appropriate channels, not at 
any random person who happens to post. Thank you.

Kean


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Mon, Jan 12, 2004 at 04:16:55AM -0800, Kean Johnston wrote:
> Although I agree its a problem, which may well be addressed by changing 
> the particular usage you encountered to an unambiguous type such as 
> apr_uint32_t, I think the real error is in how you are compiling svn. If 
> you have decided to use the Perl bindings then you should compile svn in 
> the same approximate environment that Perl was compiled in. This 
> includes subtle things like _FILE_OFFSET_BITS, as well as other things 
> such as FD_SETSIZE or other nasty things that can trip you up if they 
> are set to different values in different applications. Likewise things 
> like threading.
> 
> There is an implied contract between the Perl distribution and its 
> users. If you want to use Perl in your app, you have to play Perl's 
> game. The same can be said for Python, Tcl, SLang, or any other library 
> for that matter. These days, you get to make an almost universal choice 
> for your platform. You either compile everything with 64bit support or 
> nothing, or else you do everything twice and get to futz around with 
> library paths.
> 
> Another popular library you may run into this exact problem with is 
> libz. It also defines z_off_t in terms of off_t, which makes calls to 
> thinks like gzseek() dangerous if you mix and match _FILE_OFFSET_BITS 
> values betwwen the app that uses it and the way the library was compiled.
> 
> In an ideal world, every application or library that needed offsets 
> would be sensitive to _FILE_OFFSET_BITS and provide both whatever32 and 
> whatever64 functions, and switch between them as needed. This is not an 
> ideal world :)

It was a standard distro perl.  Every single user of Mandrake on a
32-bit platform (and possibly others) would end up with this problem
unless they go out of their way to do some hoop jumping to handle this.
I did nothing extra to tell it to use _FILE_OFFSET_BITS, the build
system is picking that up automaticaly from Perl via:
perl -MExtUtils::Embed -e ccopts

Most users are not going to be aware of _FILE_OFFSET_BITS.  You're
suggesting that users should be aware of that, aware of how their perl
was compiled and know that they should pass a
CFLAGS=-D_FILE_OFFSET_BITS=64 to the configure?

I'm sorry but that just seems a tad crazy to me.  Certainly doesn't seem
user friendly.  But then considering the company you're keeping, I guess
I shouldn't be surprised.

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Mon, Jan 12, 2004 at 10:02:22AM -0500, John Peacock wrote:
> Sorry, I am the one who noticed the problem, and I did exactly that.  While 
> debugging this, I literally built _everything_ from the ground up, 
> including Perl, SWIG, Subversion, and the Perl bindings.  If there is 
> something special I need to do at any point with _FILE_OFFSET_BITS, et al, 
> then the build system must be enhanced to do this, or else the bindings are 
> useless to anyone not running a pure-64bit environment.
> 
> It just so happens that Ben was developing with a 64bit pre-release version 
> of Mandrake, so he didn't see the problem.  I'm using a 32bit distro 
> (currently a much more common environment), so although I can build Perl to 
> use 64-bit ints, I shouldn't have to unless I want to for some other 
> reason.  I will, in fact, build a 64-bit aware Perl and see if that fixes 
> it (as it should).  But the larger issue of relying on an off_t that is not 
> consistent is still a problem that needs solving.

It won't.  The segfault is happening because the perl bindings are
expecting 32-bits more memory than the client library is actually giving
them.  So when it goes to read the last parameter it reads past the end
of its memory.  

It's not visable on a 64-bit platform because the size of a long and a
long long are identical.  So despite the different data types there's no
difference.

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Kean Johnston <jk...@sco.com>.
> if that fixes it (as it should).  But the larger issue of relying on an 
> off_t that is not consistent is still a problem that needs solving.
I concur, and tried to express just that. Its a pretty thorny issue though.

Kean


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by John Peacock <jp...@rowman.com>.
Kean Johnston wrote:
> Although I agree its a problem, which may well be addressed by changing 
> the particular usage you encountered to an unambiguous type such as 
> apr_uint32_t, I think the real error is in how you are compiling svn. If 
> you have decided to use the Perl bindings then you should compile svn in 
> the same approximate environment that Perl was compiled in. This 
> includes subtle things like _FILE_OFFSET_BITS, as well as other things 
> such as FD_SETSIZE or other nasty things that can trip you up if they 
> are set to different values in different applications. Likewise things 
> like threading.

Sorry, I am the one who noticed the problem, and I did exactly that.  While 
debugging this, I literally built _everything_ from the ground up, including 
Perl, SWIG, Subversion, and the Perl bindings.  If there is something special I 
need to do at any point with _FILE_OFFSET_BITS, et al, then the build system 
must be enhanced to do this, or else the bindings are useless to anyone not 
running a pure-64bit environment.

It just so happens that Ben was developing with a 64bit pre-release version of 
Mandrake, so he didn't see the problem.  I'm using a 32bit distro (currently a 
much more common environment), so although I can build Perl to use 64-bit ints, 
I shouldn't have to unless I want to for some other reason.  I will, in fact, 
build a 64-bit aware Perl and see if that fixes it (as it should).  But the 
larger issue of relying on an off_t that is not consistent is still a problem 
that needs solving.

John

-- 
John Peacock
Director of Information Research and Technology
Rowman & Littlefield Publishing Group
4501 Forbes Boulevard
Suite H
Lanham, MD  20706
301-459-3366 x.5010
fax 301-429-5748


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Kean Johnston <jk...@sco.com>.
> The problem is pretty clear.  Some compliations might have be a 64-bit
> int and some might be a 32-bit int.  In my case perl bindings are
> picking up the 64-bit off_t form perl via the SWIG_PL_INCLUDES variable.
> libsvn_client sees it as 32-bit, the bindings see it as 64-bit.  This is
> a problem.
Although I agree its a problem, which may well be addressed by changing 
the particular usage you encountered to an unambiguous type such as 
apr_uint32_t, I think the real error is in how you are compiling svn. If 
you have decided to use the Perl bindings then you should compile svn in 
the same approximate environment that Perl was compiled in. This 
includes subtle things like _FILE_OFFSET_BITS, as well as other things 
such as FD_SETSIZE or other nasty things that can trip you up if they 
are set to different values in different applications. Likewise things 
like threading.

There is an implied contract between the Perl distribution and its 
users. If you want to use Perl in your app, you have to play Perl's 
game. The same can be said for Python, Tcl, SLang, or any other library 
for that matter. These days, you get to make an almost universal choice 
for your platform. You either compile everything with 64bit support or 
nothing, or else you do everything twice and get to futz around with 
library paths.

Another popular library you may run into this exact problem with is 
libz. It also defines z_off_t in terms of off_t, which makes calls to 
thinks like gzseek() dangerous if you mix and match _FILE_OFFSET_BITS 
values betwwen the app that uses it and the way the library was compiled.

In an ideal world, every application or library that needed offsets 
would be sensitive to _FILE_OFFSET_BITS and provide both whatever32 and 
whatever64 functions, and switch between them as needed. This is not an 
ideal world :)

Kean


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Thu, Jan 15, 2004 at 06:24:09AM -0500, Russell Yanofsky wrote:
> I agree. The subversion libraries and APR should be hardened to provide a
> consistent interface on any given platform regardless what runtime library
> or library options they are used with. And the changes you propose at
> 
> http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=54363
> 
> would be desirable even if this issue didn't exist.

I'm working on this today.  So far it's going pretty well.  Looks like
it's going to be fairly straightforward.  

> It sounds like perl gives extension writers some leeway in this area. I'd
> say if that if you aren't forced to build your extension
> with -D_FILE_OFFSET_BITS=64 (or   -D__USE_FILE_OFFSET64 or whatever) then,
> in the name of all that is good and holy, don't do that. But that's up to
> you, and your patch for subversion will probably make the issue moot.

I'm not inclined to go mucking in the compile time flags that perl is
telling us to use.

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Russell Yanofsky <re...@columbia.edu>.
Ben Reser wrote:
> On Tue, Jan 13, 2004 at 04:07:57AM -0500, Russell Yanofsky wrote:
>> Is it possible to just strip out -D__USE_FILE_OFFSET64 from what it
>> returns? Or do the perl APIs use off_t in their own interfaces?
>
> Depends on which IO interface you use.  There's several of them.
> I don't think it would create any problems for the perl bindings to
> strip that.  But I don't think it solves the problem.  It hides it for
> one situation that happens to be easier to work around and does
> nothing
> for the rest.

I agree. The subversion libraries and APR should be hardened to provide a
consistent interface on any given platform regardless what runtime library
or library options they are used with. And the changes you propose at

http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=54363

would be desirable even if this issue didn't exist.

It sounds like perl gives extension writers some leeway in this area. I'd
say if that if you aren't forced to build your extension
with -D_FILE_OFFSET_BITS=64 (or   -D__USE_FILE_OFFSET64 or whatever) then,
in the name of all that is good and holy, don't do that. But that's up to
you, and your patch for subversion will probably make the issue moot.

- Russ



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Tue, Jan 13, 2004 at 04:07:57AM -0500, Russell Yanofsky wrote:
> Is it possible to just strip out -D__USE_FILE_OFFSET64 from what it returns?
> Or do the perl APIs use off_t in their own interfaces?

Depends on which IO interface you use.  There's several of them.
I don't think it would create any problems for the perl bindings to
strip that.  But I don't think it solves the problem.  It hides it for
one situation that happens to be easier to work around and does nothing
for the rest.

> It seems the perl people have screwed up in making extension developers
> build with non-standard library options. It would be fine if they just used
> the __USE_FILE_OFFSET64 macro internally to do 64 bit I/O on Linux, but they
> aren't right to try to impose their implementation detail on everybody else.

They largely let you do whatever you want when writing modules.  There's
good and bad in this strategy.  The good is there have been some very
nifty modules written that wouldn't otherwise be possible.  The bad is
that it makes modules dependent upon impelementation details.  

> This problem should be addressed in the perl bindings itself. It should not
> be passed on to Subversion or APR. If stripping out the macro definition is
> not enough to deal with the problem, then you can create a little
> intermediate library in the perl bindings compiled without perl options,
> that forwards calls to the handful of functions that use apr_off_t, and uses
> apr_int64_t in it's own interface.

This is not a unique problem to perl.  If someone uses a library that is
using 64-bit off_t's when build their client and APR is using 32-bit
off_t's we're going to run into the same problem.  Except this will be
harder to fix.  It'll require that they rebuild APR and anything the
depends on it to use 64-bit off_t's.  From when I was investigating I
didn't see any options for APR to enable this.  I think you'd have to
use CFLAGS to get APR to do it.

> How is that not consistent with what APR is doing? APR is a platform
> compatility layer. It's whole purpose is to provide a platform neutral
> interface that works on multiple platforms. There is no reason to hardcode
> 64 bit types into a generic file i/o interface.
> 
> I'm not proposing that. In your original post you seemed to suggest that the
> apr interface needed be changed to address this problem. I disagree with
> that. This is a linux specific problem and if anything at all is needs to be
> changed in APR to address it, the changes should be limited to the linux
> implementation.

I don't see APR fixing this issue happening in a timeframe that fits the
schedule we're trying to keep.  And even then we'd have issues with
continuing to use a type name that was ambiguous.  We'd have to require
that everyone use the latest version of APR that has the fix in order to
be sure that we weren't running into such an issue.

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Russell Yanofsky <re...@columbia.edu>.
Ben Reser wrote:
> On Mon, Jan 12, 2004 at 08:22:06AM -0500, Russell Yanofsky wrote:
>> I don't understand. Are you defining __USE_FILE_OFFSET64 somewhere
>> in the perl bindings? Isn't the solution to just use the same
>> library settings across all parts of subversion?
>
> Automatically picked up by the subversion build system with:
> perl -MExtUtils::Embed -e ccopt


Is it possible to just strip out -D__USE_FILE_OFFSET64 from what it returns?
Or do the perl APIs use off_t in their own interfaces?

It seems the perl people have screwed up in making extension developers
build with non-standard library options. It would be fine if they just used
the __USE_FILE_OFFSET64 macro internally to do 64 bit I/O on Linux, but they
aren't right to try to impose their implementation detail on everybody else.

This problem should be addressed in the perl bindings itself. It should not
be passed on to Subversion or APR. If stripping out the macro definition is
not enough to deal with the problem, then you can create a little
intermediate library in the perl bindings compiled without perl options,
that forwards calls to the handful of functions that use apr_off_t, and uses
apr_int64_t in it's own interface.

>> The nice thing about apr_off_t interfaces over off32 and off64
>> interfaces is that apr_off_t interfaces can work ideally on all
>> platforms whether they support 64-bit files or not.
>>
>> If apr has to be changed to avoid this ambiguity, a better way to do
>> it would be to define
>>
>>   typedef  off64_t apr_off_t;
>>
>> unconditionally on linux and have it use 64 bit functions like
>> lseek64 internally.
>
> That would work I guess but that's not consistent with what APR is
> doing

How is that not consistent with what APR is doing? APR is a platform
compatility layer. It's whole purpose is to provide a platform neutral
interface that works on multiple platforms. There is no reason to hardcode
64 bit types into a generic file i/o interface.

> and I don't think it's a good idea for us to go around redefining apr
> types.

I'm not proposing that. In your original post you seemed to suggest that the
apr interface needed be changed to address this problem. I disagree with
that. This is a linux specific problem and if anything at all is needs to be
changed in APR to address it, the changes should be limited to the linux
implementation.

- Russ



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Ben Reser <be...@reser.org>.
On Mon, Jan 12, 2004 at 08:22:06AM -0500, Russell Yanofsky wrote:
> I don't understand. Are you defining __USE_FILE_OFFSET64 somewhere in the
> perl bindings? Isn't the solution to just use the same library settings
> across all parts of subversion?

Automatically picked up by the subversion build system with:
perl -MExtUtils::Embed -e ccopts

> The nice thing about apr_off_t interfaces over off32 and off64 interfaces is
> that apr_off_t interfaces can work ideally on all platforms whether they
> support 64-bit files or not.
> 
> If apr has to be changed to avoid this ambiguity, a better way to do it
> would be to define
> 
>   typedef  off64_t apr_off_t;
> 
> unconditionally on linux and have it use 64 bit functions like lseek64
> internally.

That would work I guess but that's not consistent with what APR is doing
and I don't think it's a good idea for us to go around redefining apr
types.

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: apr_off_t is of an ambiguous size.

Posted by Russell Yanofsky <re...@columbia.edu>.
Ben Reser wrote:
> ...
> apr_off_t is just a typedef to off_t.  Which on Linux is defined in
> sys/types.h as:
> # ifndef __USE_FILE_OFFSET64
> typedef __off_t off_t;
> # else
> typedef __off64_t off_t;
> # endif
>
> The problem is pretty clear.  Some compliations might have be a 64-bit
> int and some might be a 32-bit int.  In my case perl bindings are
> picking up the 64-bit off_t form perl via the SWIG_PL_INCLUDES
> variable. libsvn_client sees it as 32-bit, the bindings see it as
> 64-bit.  This is a problem.

I don't understand. Are you defining __USE_FILE_OFFSET64 somewhere in the
perl bindings? Isn't the solution to just use the same library settings
across all parts of subversion?

> Looking at the headers we're using apr_off_t in svn_diff, svn_io and
> svn_client (for the blame receiver).  By only have a 32-bit apr_off_t
> we're crippling ourselves in the long run for large file support and
> making interoperating with the growing number of libraries that do
> have it difficult.
>
> Unfortunately, APR doesn't turn it on.  Nor does apr provide us with
> an apr_off64_t.  I would guess it doesn't try to turn it on because it
> can't be reliably done on all platforms.
>
> We can't just ignore the issue because we have an ambiguous data type
> in our interface.  Even if APR adds support apr_off_t will still
> probably be ambiguous.  Only a new type of apr_off32_t and
> apr_off64_t would be unambiguous.  So if/when APR fixes this we'd be
> left with an API change.

The nice thing about apr_off_t interfaces over off32 and off64 interfaces is
that apr_off_t interfaces can work ideally on all platforms whether they
support 64-bit files or not.

If apr has to be changed to avoid this ambiguity, a better way to do it
would be to define

  typedef  off64_t apr_off_t;

unconditionally on linux and have it use 64 bit functions like lseek64
internally.

- Russ



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org