You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@apr.apache.org by Ryan Bloom <rb...@gmail.com> on 2005/01/19 19:33:14 UTC

Re-architecture for 2.0 tree

Those of you who have been here for a long time have heard me talking
about this idea,  but I wanted to get it out there for everybody and
on list.  Obviously, I probably won't implement much of it, but I
would like people to think about it.  This is a 2.0 change, because it
completely changes the source and binary compatibility of APR.

One of my biggest mistakes when initially designing APR was that I
forced each platform to have completely distinct implementations for
simple functions if the structures were distinct.  This has led to a
great amount of duplicate code in the library, and makes it harder to
see where the differences between the platforms truly are.  I would
love to see APR 2.0 solve this problem.

My idea is simple enough, share as much of the structures as possible
between platforms.  This allows the functions that act on those
structures to be common code.  To do this, the basic format of our
structures have to change, so that they look more like the apr_poll_t
structure.  I'll use files as an example (forgive the syntax):

typedef struct arch_file arch_file;
typedef struct apr_file_t {
    apr_pool_t *p;
    char *name;
    int eof_hit;
    ....
    union 
    apr_arch_file_t *arch_file;
} apr_file_t;

In another header file:
#ifdef WIN32
struct arch_file {
    HANDLE filedes;
}
#else
struct arch_file {
    int filedes;
}
#endif

This should be enough that you can see what I am thinking about. 
Basically, anything that is common to all files is in a shared
structure which can be incomplete, but doesn't have to be.  Anything
that isn't common is stuck in an internal structure, which must be
incomplete.  This can allow simple functions, like
apr_file_get_name(), to read the name of the file quickly (and only
have one implementation of that function), while still allowing native
functions to be used for the more complex work.

If we don't want the apr_file_t structure to be incomplete, we can
still provide the getters/setters for the internal structures, but
also open up our structures, so that people can get the file's name by
accessing fp->name directly.

This may also help us remove the depenance on pools throughout the
code.  My least favorite part of APR is that all allocation _must_ be
done through pools, even if pools don't make sense for your
application (I take full blame for that).  With this model, end-users
can allocate the space for apr_file_t themselves, and all we need to
do is figure out how to allocate the arch_file structure correctly, a
much smaller allocation, and I believe we will find that we allocate
less within APR and allow developers to pass us pre-allocated
structures.

Thoughts, comments, critisms, questions?

Ryan

-- 
Ryan Bloom
rbb@apache.org
rbb@redhat.com
rbloom@gmail.com

Re: Re-architecture for 2.0 tree

Posted by Nick Kew <ni...@webthing.com>.

On Wednesday 19 January 2005 18:33, Ryan Bloom wrote:

> One of my biggest mistakes when initially designing APR was that I
> forced each platform to have completely distinct implementations for
> simple functions if the structures were distinct.  This has led to a
> great amount of duplicate code in the library, and makes it harder to
> see where the differences between the platforms truly are.  I would
> love to see APR 2.0 solve this problem.

I don't think the architecture is inherently wrong.  Duplicated code suggests
rather that the platform-layer is too big.

> My idea is simple enough, share as much of the structures as possible
> between platforms.  This allows the functions that act on those
> structures to be common code.  To do this, the basic format of our
> structures have to change, so that they look more like the apr_poll_t
> structure.  I'll use files as an example (forgive the syntax):

If I understand you aright, you mean factoring out common code from the
platform layer into a new platform-independent layer, yesno?   I think that
makes sense.

> typedef struct arch_file arch_file;
> typedef struct apr_file_t {
>     apr_pool_t *p;
>     char *name;
>     int eof_hit;
>     ....
>     union
>     apr_arch_file_t *arch_file;
> } apr_file_t;

Good so far: an APR-ised indirection for a platform-dependent entry
(except - what's that "union" doing in there?)

> In another header file:
> #ifdef WIN32
> struct arch_file {
>     HANDLE filedes;
> }
> #else
> struct arch_file {
>     int filedes;
> }
> #endif

I'm not so keen on that.  A proliferation of #ifdefs in actual code
are harder to work with (some of us still use tools like grep:-)
A minor reorganisation like

#ifdef WIN32
#include "win32stuff.h"
#else
#include "posixstuff.h"
#endif

with a level of granularity TBD serves to define the interface of the
platform-layer much more clearly.

> Thoughts, comments, critisms, questions?

My apr_dbd code deals with a somewhat analagous problem where the
"platform" is the backend database.  It deals with that in a similar way,
using incomplete types for the platform-dependent structs.

It also dlloads the drivers (provided APR_HAS_DSO).  That isn't relevant to
most of APR as it stands now (except perhaps APR_DBM), but could become
the norm if - as has been suggested - we're going to make it a much more
pluggable framework and accommodate third-party extensions, apr-java, etc.

-- 
Nick Kew

Re: Re-architecture for 2.0 tree

Posted by kf...@collab.net.

"Graham Leggett" <mi...@sharp.fm> writes:
> Brian W. Fitzpatrick said:
> > Might I suggest making a private branch?  Branches are cheap and easy in
> > Subversion.  While I abhor branching in CVS, in Subversion, I prefer to
> > work on a branch cause it's so easy to manage.
> 
> Can do - but I am going to need some handholding to set it up, haven't yet
> got to branching in subversion and I don't want to mess it up. Do you have
> a short howto somewhere on what to do?

Quick answer:

   $ svn cp -m "Create XYZ branch." http://.../trunk/ http://.../branches/XYZ

Longer answer:

   http://svnbook.red-bean.com/en/1.1/svn-book.html#svn-ch-4

-Karl

Re: Re-architecture for 2.0 tree

Posted by Graham Leggett <mi...@sharp.fm>.

Brian W. Fitzpatrick said:

> Might I suggest making a private branch?  Branches are cheap and easy in
> Subversion.  While I abhor branching in CVS, in Subversion, I prefer to
> work on a branch cause it's so easy to manage.

Can do - but I am going to need some handholding to set it up, haven't yet
got to branching in subversion and I don't want to mess it up. Do you have
a short howto somewhere on what to do?

Regards,
Graham
--

Re: Re-architecture for 2.0 tree

Posted by "Brian W. Fitzpatrick" <fi...@red-bean.com>.

On Thu, 2005-01-20 at 07:36, Graham Leggett wrote:
> Ryan Bloom said:
> 
> >> +1
> 
> Is it time for apr/trunk to become v2.0?
> 
> The LDAP stuff has some fundamental flaws that I need to fix, but I have
> no sandbox to play in :(

Might I suggest making a private branch?  Branches are cheap and easy in
Subversion.  While I abhor branching in CVS, in Subversion, I prefer to
work on a branch cause it's so easy to manage.

-Fitz

Re: Re-architecture for 2.0 tree

Posted by Graham Leggett <mi...@sharp.fm>.

Ryan Bloom said:

>> +1

Is it time for apr/trunk to become v2.0?

The LDAP stuff has some fundamental flaws that I need to fix, but I have
no sandbox to play in :(

Regards,
Graham
--

Re-architecture for 2.0 tree

Posted by Ryan Bloom <rb...@gmail.com>.

grrrrr,  should have sent this to the list.

Ryan


---------- Forwarded message ----------
From: Ryan Bloom <rb...@gmail.com>
Date: Wed, 19 Jan 2005 15:05:10 -0500
Subject: Re: Re-architecture for 2.0 tree
To: "William A. Rowe, Jr." <wr...@rowe-clan.net>


So, I took a few shortcuts in my original message, explained below.

On Wed, 19 Jan 2005 13:16:10 -0600, William A. Rowe, Jr.
<wr...@rowe-clan.net> wrote:
> At 12:33 PM 1/19/2005, Ryan Bloom wrote:
>
> >One of my biggest mistakes when initially designing APR was that I
> >forced each platform to have completely distinct implementations for
> >simple functions if the structures were distinct.  This has led to a
> >great amount of duplicate code in the library, and makes it harder to
> >see where the differences between the platforms truly are.  I would
> >love to see APR 2.0 solve this problem.
>
> +1
>
> >In another header file:
> >#ifdef WIN32
> >struct arch_file {
> >    HANDLE filedes;
> >}
> >#else
> >struct arch_file {
> >    int filedes;
> >}
> >#endif
>
> I still like the concept of doing something like
>
> struct apr_foo_t {
>     apr_pool_t *pool;
>     void *      foodat;
>     struct apr_foo_arch_t arch;
> }
>
> and defining struct apr_foo_arch_t within an include/arch/xxx/foo.h
> file.  This gains you the transparency of common elements (pool,
> foodat, etc) while leaving certain elements opaque.  Because the
> arch element above is undefined, the structure remains incomplete.
> (We can't anticipate the maximum size for the object.)
>
> Right now, in 1.2.0, we could add an element onto the end of the
> apr_foo_t opaque structure, and not break abi.  Entirely transparent
> structures don't share that attribute, folks can determine the size
> of the complete, transparent structure in their code, and force
> minor version dependencies that don't exist today.
>
> The bit that disturbed me was long #ifdef litanies which we
> know (from apache-1.3/src/server/main.c) get illegible quickly.

Yeah, I know, this was a shortcut.  I honestly do expect those
arch_file_t type of structures to be hidden inside private headers and
to have them as incomplete structures, but I was too lazy to make that
clear, you caught me.

> >This can allow simple functions, like apr_file_get_name(),
> >to read the name of the file quickly (and only have one
> >implementation of that function), while still allowing native
> >functions to be used for the more complex work.
>
> Again +1.  I'd even suggest the structure
>
>   apr/foo/common/*.c
>   apr/foo/netware/*.c
>   apr/foo/unix/*.c
>   apr/foo/win32/*.c
>
> where we drop the concept of '/unix/' as the 'common' impl,
> and build apr/*/common/ on all platforms.

++1.  Using Unix as the common base was a hack, which I only got away
with because POSIX was renamed to UNIX98 at some point.   :-D

> >If we don't want the apr_file_t structure to be incomplete, we can
> >still provide the getters/setters for the internal structures, but
> >also open up our structures, so that people can get the file's name by
> >accessing fp->name directly.
>
> Huge risks, garbage collection and modifying associated fields.
> The more we let people diddle in the transparent elements, the
> less they recognize side effects that are only possible when
> they invoke the set'ter function.  This will vary based on the
> apr_xxx_t object we are talking about of course.

Yeah, but I am a firm believer that some stuff is private, and some
stuff isn't private.  The filename doesn't necessarily need to be a
private field.  We could change my original proposal to be:

typedef struct apr_file_t {
    char *name;
    apr_common_file_t *common;
    apr_arch_file_t *arch_specific;
}

Then the common stuff is hidden in an incomplete type, but some stuff
can be easily exposed.  Obviously not all structures would necessarily
need all three types of data.

> >This may also help us remove the depenance on pools throughout the
> >code.  My least favorite part of APR is that all allocation _must_ be
> >done through pools, even if pools don't make sense for your
> >application (I take full blame for that).
>
> Unfortuantely -we- perform allocations within our own operations
> on these objects.  When that happens, we have to look at the pool
> allocator.  I believe we can and should revisit allocator/free
> indirection.  Yes, a small performance hit will result, but it
> should not be substantial enough to invalidate the entire concept.

I think we should take a closer look at why _we_ allocate structures.
Or, we could make the allocator stuff a compile-time choice, which
would mean no performance hit.  Or, we could have functions to
allocate from a pool, and other functions that allocate using
malloc/free.  Lots of options.

> >With this model, end-users can allocate the space for apr_file_t
> >themselves, and all we need to do is figure out how to allocate
> >the arch_file structure correctly, a much smaller allocation,
> >and I believe we will find that we allocate less within APR and
> >allow developers to pass us pre-allocated structures.
>
> Again, garbage collection is the primary issue.  If the user goes
> and allocates our structures, how are our (critical) destructors
> then invoked?  And how are elements within the structure ultimately
> freed?  This part of the approach scares me.
>
> One alternative would be more reusable apr_xxx_t objects, which
> would ensure we don't hit some of these issues.  To be able to
> call an apr_xxx_clear() prior to reuse, or apr_xxx_free() that
> invokes the cleanups might help here, even if they are mostly
> no-ops until a destructor actually returns the memory.
>
> In other works, clearly segregate what c++ users refer to as
> a pre-destruction phase from the actual garbage collection,
> and the allocation from the initialization phase.  This would
> help you realize your concept of detaching us (somewhat) from
> pools all together.


I think this is going to be required.  The original design relied on
pools to do this stuff, but the 2.0 approach should include explicit
destruction methods that are no-ops when pools are used, but that
actually free memory and release resources when pools aren't used.
This part of my thoughts aren't well thought out yet, but I think
there is some wiggle room here.

I think you and I are on the same page, I just took some short-cuts
and didn't explain it as well as you did.   Thanks!   :-)

Ryan

--
Ryan Bloom
rbb@apache.org
rbb@redhat.com
rbloom@gmail.com


-- 
Ryan Bloom
rbb@apache.org
rbb@redhat.com
rbloom@gmail.com

Re: Re-architecture for 2.0 tree

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.

At 12:33 PM 1/19/2005, Ryan Bloom wrote:

>One of my biggest mistakes when initially designing APR was that I
>forced each platform to have completely distinct implementations for
>simple functions if the structures were distinct.  This has led to a
>great amount of duplicate code in the library, and makes it harder to
>see where the differences between the platforms truly are.  I would
>love to see APR 2.0 solve this problem.

+1

>In another header file:
>#ifdef WIN32
>struct arch_file {
>    HANDLE filedes;
>}
>#else
>struct arch_file {
>    int filedes;
>}
>#endif

I still like the concept of doing something like

struct apr_foo_t {
    apr_pool_t *pool;
    void *      foodat;
    struct apr_foo_arch_t arch;
}

and defining struct apr_foo_arch_t within an include/arch/xxx/foo.h
file.  This gains you the transparency of common elements (pool,
foodat, etc) while leaving certain elements opaque.  Because the
arch element above is undefined, the structure remains incomplete.
(We can't anticipate the maximum size for the object.)

Right now, in 1.2.0, we could add an element onto the end of the
apr_foo_t opaque structure, and not break abi.  Entirely transparent
structures don't share that attribute, folks can determine the size
of the complete, transparent structure in their code, and force
minor version dependencies that don't exist today.

The bit that disturbed me was long #ifdef litanies which we
know (from apache-1.3/src/server/main.c) get illegible quickly.

>This can allow simple functions, like apr_file_get_name(), 
>to read the name of the file quickly (and only have one 
>implementation of that function), while still allowing native
>functions to be used for the more complex work.

Again +1.  I'd even suggest the structure

  apr/foo/common/*.c
  apr/foo/netware/*.c
  apr/foo/unix/*.c
  apr/foo/win32/*.c

where we drop the concept of '/unix/' as the 'common' impl,
and build apr/*/common/ on all platforms.

>If we don't want the apr_file_t structure to be incomplete, we can
>still provide the getters/setters for the internal structures, but
>also open up our structures, so that people can get the file's name by
>accessing fp->name directly.

Huge risks, garbage collection and modifying associated fields.
The more we let people diddle in the transparent elements, the 
less they recognize side effects that are only possible when
they invoke the set'ter function.  This will vary based on the
apr_xxx_t object we are talking about of course.

>This may also help us remove the depenance on pools throughout the
>code.  My least favorite part of APR is that all allocation _must_ be
>done through pools, even if pools don't make sense for your
>application (I take full blame for that).

Unfortuantely -we- perform allocations within our own operations
on these objects.  When that happens, we have to look at the pool
allocator.  I believe we can and should revisit allocator/free
indirection.  Yes, a small performance hit will result, but it
should not be substantial enough to invalidate the entire concept.

>With this model, end-users can allocate the space for apr_file_t 
>themselves, and all we need to do is figure out how to allocate 
>the arch_file structure correctly, a much smaller allocation, 
>and I believe we will find that we allocate less within APR and 
>allow developers to pass us pre-allocated structures.

Again, garbage collection is the primary issue.  If the user goes
and allocates our structures, how are our (critical) destructors 
then invoked?  And how are elements within the structure ultimately
freed?  This part of the approach scares me.

One alternative would be more reusable apr_xxx_t objects, which
would ensure we don't hit some of these issues.  To be able to
call an apr_xxx_clear() prior to reuse, or apr_xxx_free() that
invokes the cleanups might help here, even if they are mostly
no-ops until a destructor actually returns the memory.

In other works, clearly segregate what c++ users refer to as
a pre-destruction phase from the actual garbage collection,
and the allocation from the initialization phase.  This would 
help you realize your concept of detaching us (somewhat) from
pools all together.

Bill