You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@apr.apache.org by Greg Stein <gs...@lyra.org> on 2001/01/11 06:43:16 UTC

[wrowe@rowe-clan.net: RE: apr_fileinfo_t is ___BOGUS___]

misfired... forwarding to dev@ ...

----- Forwarded message from "William A. Rowe, Jr." <wr...@rowe-clan.net> -----

From: "William A. Rowe, Jr." <wr...@rowe-clan.net>
Subject: RE: apr_fileinfo_t is ___BOGUS___
To: "'Greg Stein'" <gs...@lyra.org>
Date: Wed, 10 Jan 2001 19:03:18 -0600

> From: Greg Stein [mailto:gstein@lyra.org]
> Sent: Wednesday, January 10, 2001 6:35 PM
> 
> [ moving this to dev@apr.apache.org, where it belongs... ]

ack... typo the third time I tried retransmitting (my usual account
isn't moderated in.)

> On Wed, Jan 10, 2001 at 02:21:37PM -0600, William A. Rowe, Jr. wrote:
> > Ok guys...
> > 
> >   apr_fileinfo_t must be restored to an incomplete type.
> > 
> > On win32, we have the following scenario;
> > 
> >   inode, dev, and ctype require a CreateFile call.
> 
> ctype?

file, pipe, char, dev etc.  Simple typo.

> >   type (pipe, etc) requires CreateFile/GetFileType calls.
> >   query owner/access kills performance by two orders of magnitude.
> 
> that doesn't happen today. user/group fields are zeroed.

Ack.  I certainly expect to see an accessor for these where they are
-required-, e.g. permission properties of dav?

> >   truename requires a FindFile call.
> 
> truename? eh?

eh, ya, eh.  We may not need it everywhere, but we need it some places.
Both to distinguish case and mappings (shortnames).  Just because it
it a noop returning the original string passed under unix doesn't imply
apps don't need it.  And explain that to the folks trying to do security
on case insensitive unixes such as OS X, or a case-insensitive volume
mount point.  Noone but the kernel knows what case folding was accepted.
On unix, that's a stat - readdir - match for inode.  But we obviously
need some 'inside information' that there is case folding going on.
That part, on unix, I have no idea.

> > Now, there is no SINGLE ATOMIC CALL that does all this work for
> > us.  It -should- be our problem (the win32 hackers) to get this
> > right, but we can only do that with accessor functions.
> 
> You're talking about performance, not correctness... right?

It's (nearly) correct now, but as Stoddard observed, performancewise
we have work to do.

> > The changes need to also augment the lstat flavor, so that the
> > one can be queried from the other (a significant performance
> > boost on win32.)
> 
> I don't understand this "queried from the other" bit.

If we have resolved the truename of the link file, do we really need
to call findfirstfile again as the target and get a truename again?  No.

> > BTW - we also need an optimized form of apr_file_open that accepts
> > an apr_fileinfo_t ... that handle we grabbed to stat the file can
> > be Duplicated into a read or read/write handle for file access!  The
> > file handle in the win32 flavor will be cached with a registered
> > cleanup against the pool, so until we are destroyed, we can keep
> > grabbing the file for different flavors of access.
> 
> That is a bit of a stretch. How about saying people can open the file, then
> use apr_getfileinfo() on it. (as opposed to supporting the other direction)

Both.  Either way, we have a handle embedded in both structures.  If it's
there, we can DuplicateHandle on win32.

> Some other comments on filestat.c, from some other 
> discussions with Bill
> Tutt, and looking at it now:
> 
> *) is_exe() should be static and/or #if'd out.

Ack.

> *) the code needs refactoring big time (reduce code dup errors).

Ack.  Already once-throughed and reduced utf folding to four blocks, and
even that can be compressed.  Other opportunities there.

> *) in apr_stat(): if the CreateFileW() fails because the user doesn't have
>    permission to open the file, then the code should fall thru to the
>    GetFileAttributesEx() path.

Cool thought!  

> *) possibly the same fall thru on apr_lstat()

Perhaps... that's trickier.

> *) refactoring would prevent the fake construction of an incomplete
>    apr_file_t (which could pose maintenance problems if apr_getfileinfo()
>    assumed that another field was valid, but the fake ones didn't set it
>    up). the refactoring would just pass a filehandle into an
>    attribute-fetching funtcion, and apr_getfileinfo() would also call this
>    function.

Actually, based on the volume and other factors, several fields are invalid
on a runtime basis (CreateFileW fell through?  No inode, it's APR_ENOTIMPL.
Same always on Win9x.)

> *) note that finfo->size computation is done differently in two places.
>    again: a suggestion to refactor to remove duplication.

Concur, thanks for all the pointers.

The fact remains that non-unicies may require more than one call to get
complete info, and even unicies have file info that isn't part of a stat
call (e.g. true name of file on a case-insensitive mount point.)

If we are going to optimize, this cruft must become transparent to the user.

----- End forwarded message -----

-- 
Greg Stein, http://www.lyra.org/

RE: apr_fileinfo_t is ___BOGUS___

Posted by Greg Stein <gs...@lyra.org>.
On Wed, Jan 10, 2001 at 09:43:16PM -0800, Greg Stein wrote:
>...
> ----- Forwarded message from "William A. Rowe, Jr." <wr...@rowe-clan.net> -----
>...
> > From: Greg Stein [mailto:gstein@lyra.org]
> > Sent: Wednesday, January 10, 2001 6:35 PM
>...
> > On Wed, Jan 10, 2001 at 02:21:37PM -0600, William A. Rowe, Jr. wrote:
> > > Ok guys...
> > > 
> > >   apr_fileinfo_t must be restored to an incomplete type.
> > > 
> > > On win32, we have the following scenario;
> > > 
> > >   inode, dev, and ctype require a CreateFile call.
> > 
> > ctype?
> 
> file, pipe, char, dev etc.  Simple typo.

Ah. Actually, it is "filetype". I just did a search, and found that Apache
only cares about the following types: APR_NOFILE, APR_REG, APR_DIR, and
APR_LNK.

> > >   type (pipe, etc) requires CreateFile/GetFileType calls.
> > >   query owner/access kills performance by two orders of magnitude.
> > 
> > that doesn't happen today. user/group fields are zeroed.
> 
> Ack.  I certainly expect to see an accessor for these where they are
> -required-, e.g. permission properties of dav?

DAV certainly doesn't need those. Looks like mod_include and mod_rewrire are
the only users for those two fields.

> 
> > >   truename requires a FindFile call.
> > 
> > truename? eh?
> 
> eh, ya, eh.  We may not need it everywhere, but we need it some places.

My point was: what the heck are you talking about? There is no "truename"
field anywhere in the apr_fileinfo_t structure. Since it isn't there, you
can't complain about it :-)

>...
> > > Now, there is no SINGLE ATOMIC CALL that does all this work for
> > > us.  It -should- be our problem (the win32 hackers) to get this
> > > right, but we can only do that with accessor functions.
> > 
> > You're talking about performance, not correctness... right?
> 
> It's (nearly) correct now,

If there are correctness issues, then they should be fixed regardless.

> but as Stoddard observed, performancewise
> we have work to do.

Okay. Just wanted to be sure that you're only talking about perf here.

And if we're talking about perf changes as the cause for an API change, then
I'd like to see some before/after numbers to really know what we're talking
about here. Turning it into an opaque type for a 5% win is a poor tradeoff
in my book :-)

> > > The changes need to also augment the lstat flavor, so that the
> > > one can be queried from the other (a significant performance
> > > boost on win32.)
> > 
> > I don't understand this "queried from the other" bit.
> 
> If we have resolved the truename of the link file, do we really need
> to call findfirstfile again as the target and get a truename again?  No.

That truename stuff again. From your earlier discussion, that sounds like
some of the canonical name / trusted name stuff. That doesn't belong in this
discussion, does it? I don't see a truename in apr_fileinfo_t, and I don't
see that it would be added; so that point seems moot.

> > > BTW - we also need an optimized form of apr_file_open that accepts
> > > an apr_fileinfo_t ... that handle we grabbed to stat the file can
> > > be Duplicated into a read or read/write handle for file access!  The
> > > file handle in the win32 flavor will be cached with a registered
> > > cleanup against the pool, so until we are destroyed, we can keep
> > > grabbing the file for different flavors of access.
> > 
> > That is a bit of a stretch. How about saying people can open the file, then
> > use apr_getfileinfo() on it. (as opposed to supporting the other direction)
> 
> Both.  Either way, we have a handle embedded in both structures.  If it's
> there, we can DuplicateHandle on win32.

Eh? Embedded in both structures? No... I'm saying only in the apr_file_t. If
a person wants to open the file just once, then they should open the darn
thing with apr_open(), then use apr_getfileinfo() on the result.

apr_getfileinfo() would just use the already-opened handle from the
apr_file_t structure. No double handles, no re-opening.

>...
> > *) the code needs refactoring big time (reduce code dup errors).
> 
> Ack.  Already once-throughed and reduced utf folding to four blocks, and
> even that can be compressed.  Other opportunities there.

I see two calls to utf8_to_unicode_path(). It really sounds like you're
working with a totally different filestat.c. We can't effectively talk about
revamping the API when you're working with something else.

> > *) in apr_stat(): if the CreateFileW() fails because the user doesn't have
> >    permission to open the file, then the code should fall thru to the
> >    GetFileAttributesEx() path.
> 
> Cool thought!

Credit Bill Tutt on this one. He ran into it.

[ Bill actually had a few of these suggestions, and I looked through to see
  what he was talking about. eek. ]

>...
> > *) refactoring would prevent the fake construction of an incomplete
> >    apr_file_t (which could pose maintenance problems if apr_getfileinfo()
> >    assumed that another field was valid, but the fake ones didn't set it
> >    up). the refactoring would just pass a filehandle into an
> >    attribute-fetching funtcion, and apr_getfileinfo() would also call this
> >    function.
> 
> Actually, based on the volume and other factors, several fields are invalid
> on a runtime basis (CreateFileW fell through?  No inode, it's APR_ENOTIMPL.
> Same always on Win9x.)

Sure, but that doesn't obviate the need to simplify the code.


When we get some perf numbers, then we can look at the root cause of the
problem here. I'm thinking that rather than make the whole thing opaque,
that we just move key (slow) items to other functions. But then we bung up
Unix systems where this falls out in a single step. Hard to say, which is
why we need perf data.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/