You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@apr.apache.org by "William A. Rowe, Jr." <wr...@rowe-clan.net> on 2002/07/10 18:50:10 UTC

[PATCH] example BUSEC patch for benchmarking only

At 01:14 PM 7/9/2002, you wrote:
>Bill Stoddard wrote:
>>
>>Where and when was the post that described this proposal?  I'd like to take
>>a look. Is there a patch?
>
>There's no patch that I know of, but here's the thread with wrowe's
>original proposal for the binary microseconds design:
>  http://marc.theaimsgroup.com/?l=apr-dev&m=102376298728487&w=2

http://marc.theaimsgroup.com/?l=apr-dev&m=102383695930036&w=2

is a better link to some example macros.

>With the big batch of changes that I made last week, almost all
>of the time-related code in the httpd now uses the new macros--
>apr_time_sec(), apr_time_usec(), etc--instead of multiplying and
>dividing by 1,000,000 directly.  Hopefully that will make it
>easier to try out a new time representation.

Agreed.  But even discounting that USEC_PER_SEC is used for
it's real meaning somewhere, and would introduce slight discrepancies,
attached is simply the gross hack to force busec units.

Note the 'right' change is to have a BUSEC_PER_SEC constant,
but this is a truer test for benchmarking.

Bill

Re: [PATCH] example BUSEC patch for benchmarking only

Posted by Brian Pane <br...@apache.org>.

Brian Pane wrote:

> Bill Stoddard wrote:
>
>> I've not looked at the generated code, but profiling indicates that an
>> additional division is happening, adding an extra 231 instructions.
>> (xlc_r -O2)
>>
>
> If you redefine the macro as a shift, does the profile look better? 

er, I mean redefine apr_time_sec() as a shift, and
apr_time_usec() as "value & mask"...

--Brian

Re: [PATCH 3] example binary BUSEC for discussion

Posted by Brian Pane <br...@cnet.com>.

William A. Rowe, Jr. wrote:

> Just as a trivial example of using apr_time_sec_get... it helps us
> avoid casting over in the new poll/unix/poll.c line 164, where
>
>         tv.tv_sec = apr_time_sec_get(timeout);
>
> avoids a compiler emit that we are downcasting an in64 to a long. 

What's your strategy for updating the macro when we
someday start supporting 64-bit seconds in the apr_time
code?  Change the macro definition to cast to apr_uint64_t
instead of apr_uint32_t, or leave the 32-bit cast in place?
I have reservations about the latter, as it would create a
hidden Y2038 problem in code like your example above.

--Brian

Re: [PATCH 3] example binary BUSEC for discussion

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.

A simple example of the disparity below...

At 02:31 PM 7/11/2002, you wrote:
>At 02:13 PM 7/11/2002, Brian Pane wrote:
>>William A. Rowe, Jr. wrote:
>>
>>>apr_int32_t apr_time_sec_get(time)  deprecates apr_time_sec(time)
>...
>>>apr_int64_t apr_time_as_sec(time)
>>
>>The naming convention is way too subtle:
>>   apr_time_[unit]_get     means 32-bit
>>   apr_time_as_[unit]      means 64-bit
>
>WRONG! apr_time_msec_get(time) gets the fractional component
>of time... e.g. 32500.15 seconds would return 150 msecs..
>
>apr_time_as_msec returns the entire time value!  So we end up with
>a time of 32,500,150 in msec's.  That's why these are int64.
>
>>How often do you anticipate that people will be using the
>>32-bit versions?  If it's not very often, then I'd rather
>>just supply the 64-bit versions in the API, because those
>>are the safe ones (no Y2038 problem).  People who really
>>need to do 32-bit math for performance reasons would have
>>to cast explicitly, but IMHO that's good because it forces
>>them to think more carefully about whether the loss of
>>precision is safe.
>
>The 32bit flavor of sec, vs. a 64 bit flavor is a simple way of
>avoiding casts in the user's code.  I'm only asking for the number
>of seconds in an apr_time_t in order to set a time, calculate a
>difference, etc.  But I don't care too much, one way or the other.

Just as a trivial example of using apr_time_sec_get... it helps us
avoid casting over in the new poll/unix/poll.c line 164, where

         tv.tv_sec = apr_time_sec_get(timeout);

avoids a compiler emit that we are downcasting an in64 to a long.

>apr_time_as_sec gives me a HUGE number of seconds to work
>with as an absolute time from 1/1/70.  That must remain 64bit.

Re: [PATCH 3] example binary BUSEC for discussion

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.

At 02:13 PM 7/11/2002, Brian Pane wrote:
>William A. Rowe, Jr. wrote:
>
>>apr_int32_t apr_time_sec_get(time)  deprecates apr_time_sec(time)
>>apr_int32_t apr_time_msec_get(time) deprecates apr_time_msec(time)
>>apr_int32_t apr_time_usec_get(time) deprecates apr_time_usec(time)
>>apr_int32_t apr_time_nsec_get(time) deprecates apr_time_nsec(time)
>
>Oh no, I just recently finished changing all the code to use
>those now-deprecated macros. :-)

I know... but there are two different 'things'...

>>apr_int64_t apr_time_as_sec(time)
>>apr_int64_t apr_time_as_msec(time) or apr_time_as_272yrs_msec(time)
>>apr_int64_t apr_time_as_usec(time) or apr_time_as_3mos_usec(time)
>>apr_int64_t apr_time_as_nsec(time) or apr_time_as_2hrs_nsec(time)
>
>The naming convention is way too subtle:
>   apr_time_[unit]_get     means 32-bit
>   apr_time_as_[unit]      means 64-bit

WRONG! apr_time_msec_get(time) gets the fractional component
of time... e.g. 32500.15 seconds would return 150 msecs..

apr_time_as_msec returns the entire time value!  So we end up with
a time of 32,500,150 in msec's.  That's why these are int64.

>How often do you anticipate that people will be using the
>32-bit versions?  If it's not very often, then I'd rather
>just supply the 64-bit versions in the API, because those
>are the safe ones (no Y2038 problem).  People who really
>need to do 32-bit math for performance reasons would have
>to cast explicitly, but IMHO that's good because it forces
>them to think more carefully about whether the loss of
>precision is safe.

The 32bit flavor of sec, vs. a 64 bit flavor is a simple way of
avoiding casts in the user's code.  I'm only asking for the number
of seconds in an apr_time_t in order to set a time, calculate a
difference, etc.  But I don't care too much, one way or the other.

apr_time_as_sec gives me a HUGE number of seconds to work
with as an absolute time from 1/1/70.  That must remain 64bit.

Bill

Re: [PATCH 3] example binary BUSEC for discussion

Posted by Brian Pane <br...@apache.org>.

William A. Rowe, Jr. wrote:

> At 11:31 AM 7/11/2002, Brian Pane wrote:
>
>> I don't see a way to eliminate the "/ 1000" to convert usec to
>> msec.  But we may be able to do all the math in 32-bit mode, by
>> limiting the maximum timeout to the number of milliseconds that
>> will fit in 32 bits, which works out to a max timeout of about
>> 50 days.
>
>
> Actually, here is my current math, with optimized forms for specific
> situations we encounter often (aprox. limits are given in the macro 
> name).
>
> They break down to;
>
> apr_int32_t apr_time_sec_get(time)  deprecates apr_time_sec(time)
> apr_int32_t apr_time_msec_get(time) deprecates apr_time_msec(time)
> apr_int32_t apr_time_usec_get(time) deprecates apr_time_usec(time)
> apr_int32_t apr_time_nsec_get(time) deprecates apr_time_nsec(time)

Oh no, I just recently finished changing all the code to use
those now-deprecated macros. :-)

> apr_int64_t apr_time_as_sec(time)
> apr_int64_t apr_time_as_msec(time) or apr_time_as_272yrs_msec(time)
> apr_int64_t apr_time_as_usec(time) or apr_time_as_3mos_usec(time)
> apr_int64_t apr_time_as_nsec(time) or apr_time_as_2hrs_nsec(time) 

The naming convention is way too subtle:
   apr_time_[unit]_get     means 32-bit
   apr_time_as_[unit]      means 64-bit

How often do you anticipate that people will be using the
32-bit versions?  If it's not very often, then I'd rather
just supply the 64-bit versions in the API, because those
are the safe ones (no Y2038 problem).  People who really
need to do 32-bit math for performance reasons would have
to cast explicitly, but IMHO that's good because it forces
them to think more carefully about whether the loss of
precision is safe.

--Brian

Re: [PATCH 3] example binary BUSEC for discussion

Posted by Greg Marr <gr...@alum.wpi.edu>.

At 02:49 PM 07/11/2002, William A. Rowe, Jr. wrote:
>The code follows, patch attached is hard to follow.  [Like my code 
>isn't :o]
>
>/* Number of binary microseconds per second is (2^20)
>  * to keep the math fast and simple [binary arithmetic.]
>  */
>#define APR_BUSEC_BITS 20
>#define APR_BUSEC_PER_SEC APR_TIME_C(2 ^ APR_BUSEC_BITS)

Thanks, but (2 ^ APR_BUSEC_BITS) is 22 (^ is XOR).  You want 1 << 
APR_BUSEC_BITS.

-- 
Greg Marr
gregm@alum.wpi.edu
"We thought you were dead."
"I was, but I'm better now." - Sheridan, "The Summoning"

Re: [PATCH 3] example binary BUSEC for discussion

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.

Guys, everyone here admits we are lousy at inventing names...

all suggestions for better names of these macros are welcome!

At 01:49 PM 7/11/2002, William A. Rowe, Jr. wrote:
Actually, here is my current math, with optimized forms for specific
>situations we encounter often (aprox. limits are given in the macro name).
>
>They break down to;
>
>apr_int32_t apr_time_sec_get(time)  deprecates apr_time_sec(time)
>apr_int32_t apr_time_msec_get(time) deprecates apr_time_msec(time)
>apr_int32_t apr_time_usec_get(time) deprecates apr_time_usec(time)
>apr_int32_t apr_time_nsec_get(time) deprecates apr_time_nsec(time)
>
>apr_int64_t apr_time_as_sec(time)
>apr_int64_t apr_time_as_msec(time) or apr_time_as_272yrs_msec(time)
>apr_int64_t apr_time_as_usec(time) or apr_time_as_3mos_usec(time)
>apr_int64_t apr_time_as_nsec(time) or apr_time_as_2hrs_nsec(time)
>
>apr_time_t apr_time_mmake(sec, msec)
>apr_time_t apr_time_umake(sec, usec)
>apr_time_t apr_time_nmake(sec, nsec)
>
>apr_time_t apr_time_from_sec(sec)
>apr_time_t apr_time_from_msec(msec) or apr_time_from_3mos_msec(msec)
>apr_time_t apr_time_from_usec(usec) or apr_time_from_3mos_usec(usec)
>apr_time_t apr_time_from_nsec(nsec) or apr_time_from_3mos_nsec(nsec)

[PATCH 3] example binary BUSEC for discussion

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.

At 11:31 AM 7/11/2002, Brian Pane wrote:
>I don't see a way to eliminate the "/ 1000" to convert usec to
>msec.  But we may be able to do all the math in 32-bit mode, by
>limiting the maximum timeout to the number of milliseconds that
>will fit in 32 bits, which works out to a max timeout of about
>50 days.

Actually, here is my current math, with optimized forms for specific
situations we encounter often (aprox. limits are given in the macro name).

They break down to;

apr_int32_t apr_time_sec_get(time)  deprecates apr_time_sec(time)
apr_int32_t apr_time_msec_get(time) deprecates apr_time_msec(time)
apr_int32_t apr_time_usec_get(time) deprecates apr_time_usec(time)
apr_int32_t apr_time_nsec_get(time) deprecates apr_time_nsec(time)

apr_int64_t apr_time_as_sec(time)
apr_int64_t apr_time_as_msec(time) or apr_time_as_272yrs_msec(time)
apr_int64_t apr_time_as_usec(time) or apr_time_as_3mos_usec(time)
apr_int64_t apr_time_as_nsec(time) or apr_time_as_2hrs_nsec(time)

apr_time_t apr_time_mmake(sec, msec)
apr_time_t apr_time_umake(sec, usec)
apr_time_t apr_time_nmake(sec, nsec)

apr_time_t apr_time_from_sec(sec)
apr_time_t apr_time_from_msec(msec) or apr_time_from_3mos_msec(msec)
apr_time_t apr_time_from_usec(usec) or apr_time_from_3mos_usec(usec)
apr_time_t apr_time_from_nsec(nsec) or apr_time_from_3mos_nsec(nsec)

The code follows, patch attached is hard to follow.  [Like my code isn't :o]

/* Number of binary microseconds per second is (2^20)
  * to keep the math fast and simple [binary arithmetic.]
  */
#define APR_BUSEC_BITS 20
#define APR_BUSEC_PER_SEC APR_TIME_C(2 ^ APR_BUSEC_BITS)

/* apr_time_sec_get returns 32 bits and is subject to the Y2038 bug
  * @see apr_time_as_sec for 64 bit resolution
  */
#define apr_time_sec_get(time) ((apr_int32_t)((time) >> APR_BUSEC_BITS))

/* These functions 'get' the xsec fractional component of the time.
  *
  * msec extraction won't overflow 32 bits so convert early,
  * while usec/nsec computations must be in all 64 bit math.
  */
#define apr_time_msec_get(time) \
     (((apr_int32_t)((time) & (APR_BUSEC_PER_SEC - 1)) * 1000) \
                                                    >> APR_BUSEC_BITS)

#define apr_time_usec_get(time) \
     ((apr_int32_t)((((time) & (APR_BUSEC_PER_SEC - 1)) \
                             * APR_TIME_C(1000000)) >> APR_BUSEC_BITS))

#define apr_time_nsec_get(time) \
     ((apr_int32_t)((((time) & (APR_BUSEC_PER_SEC - 1)) \
                             * APR_TIME_C(1000000000)) >> APR_BUSEC_BITS))


/* apr_time_as_sec returns 64 bits and escapes the Y2038 bug
  * @see apr_time_sec_get for 32 bit resolution
  */
#define apr_time_as_sec(time) ((apr_int64_t)((time) >> APR_BUSEC_BITS))

/* These functions return the given time value, complete,
  * in the given resolution (not simply the fractional component)
  *
  * They carry no resolution penalty, and due to the binary math,
  * they suffer little performance penalty either.
  */
#define apr_time_as_msec(time) \
     (((apr_int64_t)(time) >> APR_BUSEC_BITS) \
      + ((((apr_int32_t)(time) & (APR_BUSEC_PER_SEC - 1)) \
          * 1000) >> APR_BUSEC_BITS))

#define apr_time_as_usec(time) \
     (((apr_int64_t)(time) >> APR_BUSEC_BITS) \
      + ((((apr_int64_t)(time) & (APR_BUSEC_PER_SEC - 1)) \
          * APR_TIME_C(1000000)) >> APR_BUSEC_BITS))

#define apr_time_as_nsec(time) \
     (((apr_int64_t)(time) >> APR_BUSEC_BITS) \
      + ((((apr_int64_t)(time) & (APR_BUSEC_PER_SEC - 1)) \
          * APR_TIME_C(1000000000)) >> APR_BUSEC_BITS))

/* this computation is alright, but not great, we are limited to 272
  * years.  No Y2038 bug, but consider it an 'optimized form' for which
  * we have some loss of resolution.  Doing this in 32 bit signed math
  * would yield only 2 seconds, which is not worth implementing compared
  * to the existing apr_time_msec_get().
  */
#define apr_time_as_272yrs_msec(time) \
     (((apr_int64_t)(time) * APR_TIME_C(1000)) >> APR_BUSEC_BITS)

/* Another optimized form, we are limited to some 101 days.
  * Consider it a bleeding optimization with a great loss of resolution.
  */
#define apr_time_as_3mos_usec(time) \
     (((apr_int64_t)(time) * APR_TIME_C(1000000)) >> APR_BUSEC_BITS)

/* The final optimized form, we are limited to some 2.44 hours.
  * Use with extreme care.
  */
#define apr_time_as_2hrs_nsec(time) \
     (((apr_int64_t)(time) * APR_TIME_C(1000000000)) >> APR_BUSEC_BITS)


/* These have no loss of resolution if the 2nd xsec arg is < 1second
  */
#define apr_time_mmake(sec, msec) ((apr_time_t)(sec) << APR_BUSEC_BITS \
           + ((apr_time_t)(msec) << APR_BUSEC_BITS) / APR_TIME_C(1000))

#define apr_time_umake(sec, usec) ((apr_time_t)(sec) << APR_BUSEC_BITS \
           + ((apr_time_t)(usec) << APR_BUSEC_BITS) / APR_TIME_C(1000000))

#define apr_time_nmake(sec, nsec) ((apr_time_t)(sec) << APR_BUSEC_BITS \
           + ((apr_time_t)(nsec) << APR_BUSEC_BITS) / APR_TIME_C(1000000000))


#define apr_time_from_sec(sec)  ((apr_time_t)(sec) << APR_BUSEC_BITS)

/* These carry no resolution penalty, just a nasty performance penalty.
  */
#define apr_time_from_msec(msec) \
     ((((apr_time_t)(msec) / APR_TIME_C(1000)) << APR_BUSEC_BITS) \
      + ((((apr_time_t)(msec) % APR_TIME_C(1000)) << APR_BUSEC_BITS) \
         / APR_TIME_C(1000)))

#define apr_time_from_usec(usec) \
     ((((apr_time_t)(usec) / APR_TIME_C(1000000)) << APR_BUSEC_BITS) \
      + ((((apr_time_t)(usec) % APR_TIME_C(1000000)) << APR_BUSEC_BITS) \
         / APR_TIME_C(1000000)))

#define apr_time_from_nsec(nsec) \
     ((((apr_time_t)(nsec) / APR_TIME_C(1000000000)) << APR_BUSEC_BITS) \
      + ((((apr_time_t)(nsec) % APR_TIME_C(1000000000)) << APR_BUSEC_BITS) \
         / APR_TIME_C(1000000000)))

/* 97 day macros; you would expect these to be correct, until you
  * consider that the initial SHL reduces precision to 43 bits (signed)
  * of which 20 bits is the busecs.  23 remaining bits ~= 97 days.
  * The apr_time_xmake macros above doesn't suffer the problem, since
  * we presume that the xsec component is < 1 second... and the 'proper'
  * apr_time_from_xsec macros will split into integral and fractional
  * seconds while converting, so there is no unexpected loss of range.
  */
#define apr_time_from_3mos_msec(msec) \
     (((apr_time_t)(msec) << APR_BUSEC_BITS) / APR_TIME_C(1000))

#define apr_time_from_3mos_usec(usec) \
     (((apr_time_t)(usec) << APR_BUSEC_BITS) / APR_TIME_C(1000000))

#define apr_time_from_3mos_nsec(nsec) \
     (((apr_time_t)(usec) << APR_BUSEC_BITS) / APR_TIME_C(1000000000))

/* XXX These are ambiguous and dangerous.  The symbol names must be
  * deprecated to assure folks don't misuse them (_xsec_get vs _as_xsec)
  */
#define apr_time_sec apr_time_sec_get
#define apr_time_msec apr_time_msec_get
#define apr_time_usec apr_time_usec_get
#define apr_time_nsec apr_time_nsec_get

/* XXX Also dangerous, since folks might assume the literal meaning.
  * we have no need to define this constant anymore for library users
  */
#define APR_USEC_PER_SEC APR_BUSEC_PER_SEC

RE: [PATCH 2] example binary BUSEC patch for benchmarking only

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.

At 10:41 AM 7/11/2002, Brian Pane wrote:
>On Thu, 2002-07-11 at 06:58, Bill Stoddard wrote:
>
> > I ran a quick profile with this patch and it eliminated a couple of
> > divisions (calls to __divi64 reduced from 4 to 2 in my test setup. your
> > mileage may vary) which was good for 493 instructions. Still have 3 
> __divu64
> > and 2 __divi64 calls. The three __divu64 calls are in the 
> gettimeofday() CRT
> > function, so there is not much we can do about these directly.  One 
> __divi64
> > is in apr_poll (convert microseconds to milliseconds. This can probably be
> > optimized away). The other __divi64 is somewhere in cached_explode
> > (util_time.c).
>
>The only division that I know of in cached_explode() is:
>
>     struct exploded_time_cache_element *cache_element =
>         &(cache[seconds % TIME_CACHE_SIZE]);
>
>Is that where the division operation is being generated
>on your test system?

Actually, we still have division on construct busec from usecs.

That would be the one /1000000 which can never be optimized
away.  Fortunately, it should be infrequent.

Bill

RE: [PATCH 2] example binary BUSEC patch for benchmarking only

Posted by Ryan Bloom <rb...@covalent.net>.

Brian, fix is commit soon.

Ryan

----------------------------------------------
Ryan Bloom                  rbb@covalent.net
645 Howard St.              rbb@apache.org
San Francisco, CA 

> -----Original Message-----
> From: Brian Pane [mailto:bpane@pacbell.net]
> Sent: Thursday, July 11, 2002 9:31 AM
> To: APR Development List
> Subject: RE: [PATCH 2] example binary BUSEC patch for benchmarking
only
> 
> On Thu, 2002-07-11 at 06:58, Bill Stoddard wrote:
> > I ran a quick profile with this patch and it eliminated a couple of
> > divisions (calls to __divi64 reduced from 4 to 2 in my test setup.
your
> > mileage may vary) which was good for 493 instructions. Still have 3
> __divu64
> > and 2 __divi64 calls. The three __divu64 calls are in the
gettimeofday()
> CRT
> > function, so there is not much we can do about these directly.  One
> __divi64
> > is in apr_poll (convert microseconds to milliseconds. This can
probably
> be
> > optimized away). The other __divi64 is somewhere in cached_explode
> > (util_time.c).
> 
> With the cached_explode() division now fixed, I just looked at the
> division in apr_poll().  The code that's there now is doing the math
> like this:
> 
>     if (timeout > 0) {
>         timeout /= 1000; /* convert microseconds to milliseconds */
>     }
> 
> It's assuming that timeout, an apr_interval_time_t, represents an
> absolute time in microseconds--but that may not be true much longer
> if we switch to binary microseconds.  The calculation should be
> something like:
> 
>     timeout_msec = apr_time_usec(timeout) / 1000;
>     timeout_sec = apr_time_sec(timeout);
>     if (timeout_sec) {
>         timeout_msec += 1000 * timeout_sec;
>     }
> 
> I don't see a way to eliminate the "/ 1000" to convert usec to
> msec.  But we may be able to do all the math in 32-bit mode, by
> limiting the maximum timeout to the number of milliseconds that
> will fit in 32 bits, which works out to a max timeout of about
> 50 days.
> 
> --Brian
>

RE: [PATCH 2] example binary BUSEC patch for benchmarking only

Posted by Brian Pane <bp...@pacbell.net>.

On Thu, 2002-07-11 at 06:58, Bill Stoddard wrote:
> I ran a quick profile with this patch and it eliminated a couple of
> divisions (calls to __divi64 reduced from 4 to 2 in my test setup. your
> mileage may vary) which was good for 493 instructions. Still have 3 __divu64
> and 2 __divi64 calls. The three __divu64 calls are in the gettimeofday() CRT
> function, so there is not much we can do about these directly.  One __divi64
> is in apr_poll (convert microseconds to milliseconds. This can probably be
> optimized away). The other __divi64 is somewhere in cached_explode
> (util_time.c).

With the cached_explode() division now fixed, I just looked at the
division in apr_poll().  The code that's there now is doing the math
like this:

    if (timeout > 0) {
        timeout /= 1000; /* convert microseconds to milliseconds */
    }

It's assuming that timeout, an apr_interval_time_t, represents an
absolute time in microseconds--but that may not be true much longer
if we switch to binary microseconds.  The calculation should be
something like:

    timeout_msec = apr_time_usec(timeout) / 1000;
    timeout_sec = apr_time_sec(timeout);
    if (timeout_sec) {
        timeout_msec += 1000 * timeout_sec;
    }

I don't see a way to eliminate the "/ 1000" to convert usec to
msec.  But we may be able to do all the math in 32-bit mode, by
limiting the maximum timeout to the number of milliseconds that
will fit in 32 bits, which works out to a max timeout of about
50 days.

--Brian

RE: [PATCH 2] example binary BUSEC patch for benchmarking only

Posted by Brian Pane <bp...@pacbell.net>.

On Thu, 2002-07-11 at 06:58, Bill Stoddard wrote:

> I ran a quick profile with this patch and it eliminated a couple of
> divisions (calls to __divi64 reduced from 4 to 2 in my test setup. your
> mileage may vary) which was good for 493 instructions. Still have 3 __divu64
> and 2 __divi64 calls. The three __divu64 calls are in the gettimeofday() CRT
> function, so there is not much we can do about these directly.  One __divi64
> is in apr_poll (convert microseconds to milliseconds. This can probably be
> optimized away). The other __divi64 is somewhere in cached_explode
> (util_time.c).

The only division that I know of in cached_explode() is:

    struct exploded_time_cache_element *cache_element =
        &(cache[seconds % TIME_CACHE_SIZE]);

Is that where the division operation is being generated
on your test system?

TIME_CACHE_SIZE is 16, specifically so that the compiler
can optimize away the division, but if there's still a
division being generated, we can replace it with
  "cache[seconds & TIME_CACHE_MASK]"

--Brian

RE: [PATCH 2] example binary BUSEC patch for benchmarking only

Posted by Ryan Bloom <rb...@covalent.net>.

> From: Bill Stoddard [mailto:bill@wstoddard.com]
> 
> I ran a quick profile with this patch and it eliminated a couple of
> divisions (calls to __divi64 reduced from 4 to 2 in my test setup.
your
> mileage may vary) which was good for 493 instructions. Still have 3
> __divu64
> and 2 __divi64 calls. The three __divu64 calls are in the
gettimeofday()
> CRT
> function, so there is not much we can do about these directly.  One

Has anybody looked at ExtendedStatus to make sure that we are
eliminating all of the time calls that we can during request processing?

> __divi64
> is in apr_poll (convert microseconds to milliseconds. This can
probably be

If we move to using the macros, it should go away.

Ryan

RE: [PATCH 2] example binary BUSEC patch for benchmarking only

Posted by Bill Stoddard <bi...@wstoddard.com>.

I ran a quick profile with this patch and it eliminated a couple of
divisions (calls to __divi64 reduced from 4 to 2 in my test setup. your
mileage may vary) which was good for 493 instructions. Still have 3 __divu64
and 2 __divi64 calls. The three __divu64 calls are in the gettimeofday() CRT
function, so there is not much we can do about these directly.  One __divi64
is in apr_poll (convert microseconds to milliseconds. This can probably be
optimized away). The other __divi64 is somewhere in cached_explode
(util_time.c).

Bill

> At 10:38 PM 7/10/2002, William A. Rowe, Jr. wrote:
> >At 10:03 PM 7/10/2002, Brian Pane wrote:
> >>Bill Stoddard wrote:
> >>
> >>>I've not looked at the generated code, but profiling indicates that an
> >>>additional division is happening, adding an extra 231 instructions.
> >>>(xlc_r -O2)
> >>
> >>If you redefine the macro as a shift, does the profile look better?
>
> Ok, attached is the code redone as binary math.  I'm tired, could be
> any number of major blunders in it, but on first pass, it looked right.
>
> Bill

Re: [PATCH 2] example binary BUSEC patch for benchmarking only

Posted by Cliff Woolley <jw...@virginia.edu>.

On Thu, 11 Jul 2002, Greg Marr wrote:

> I keep thinking that APR_USEC_PER_SEC should be (1 << 20), or now
> (1 << APR_USEC_BITS) instead of the magical constant.  I have no way
> of verifying with a quick glance that 1048576 is really 2^20.

You don't know your powers of 2?  Memorize, Greg, Memorize.  ;)

Re: [PATCH 2] example binary BUSEC patch for benchmarking only

Posted by Greg Marr <gr...@alum.wpi.edu>.

At 01:33 AM 07/11/2002, William A. Rowe, Jr. wrote:
>Ok, attached is the code redone as binary math.  I'm tired, could be 
>any number of major blunders in it, but on first pass, it looked right.
>
>-/** number of microseconds per second */
>-#define APR_USEC_PER_SEC APR_TIME_C(1000000)
>+/** number of binary microseconds per second (2^20) */
>+#define APR_USEC_PER_SEC APR_TIME_C(1048576)
>+#define APR_USEC_BITS 20

I keep thinking that APR_USEC_PER_SEC should be (1 << 20), or now
(1 << APR_USEC_BITS) instead of the magical constant.  I have no way 
of verifying with a quick glance that 1048576 is really 2^20.

-- 
Greg Marr
gregm@alum.wpi.edu
"We thought you were dead."
"I was, but I'm better now." - Sheridan, "The Summoning"

[PATCH 2] example binary BUSEC patch for benchmarking only

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.

At 10:38 PM 7/10/2002, William A. Rowe, Jr. wrote:
>At 10:03 PM 7/10/2002, Brian Pane wrote:
>>Bill Stoddard wrote:
>>
>>>I've not looked at the generated code, but profiling indicates that an
>>>additional division is happening, adding an extra 231 instructions.
>>>(xlc_r -O2)
>>
>>If you redefine the macro as a shift, does the profile look better?

Ok, attached is the code redone as binary math.  I'm tired, could be
any number of major blunders in it, but on first pass, it looked right.

Bill

Re: [PATCH] example BUSEC patch for benchmarking only

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.

At 10:03 PM 7/10/2002, Brian Pane wrote:
>Bill Stoddard wrote:
>
>>I've not looked at the generated code, but profiling indicates that an
>>additional division is happening, adding an extra 231 instructions.
>>(xlc_r -O2)
>
>If you redefine the macro as a shift, does the profile look better?

It isn't unbelievable.

Consider that constructing or devolving a -true- usec value become
more expensive [instruction-wise] because they need to be factored
out modulos (e.g. time & (2^20 - 1)) and then multiplied by usecs,
then divided by busecs.

The question is CPU cycles.  Shifts and ANDs are cheap.  Huge
integer division is not.  At least factoring out a usec from an apr_time
is only a huge integer multiplication.

Of course factoring a usec into a busec is just as expensive as it
once was, since we multiply (shift) by busec/sec, then divide the
huge value by usec/sec.  That is the one most expensive operation
in the entire schema.

BTW - if you are looking at NT numbers, Bill, keep in mind we have
additional optimizations due to the fact that we deal in 100ns units
and milliseconds on Win32, which can all be cleaned up for busec
optimized calculations.

Don't forget to optimize both the busec/sec division and multiplication
to shifts, and the modulos operator to &(2^20-1).

Bill

Re: [PATCH] example BUSEC patch for benchmarking only

Posted by Brian Pane <br...@apache.org>.

Bill Stoddard wrote:

>I've not looked at the generated code, but profiling indicates that an
>additional division is happening, adding an extra 231 instructions.
>(xlc_r -O2)
>

If you redefine the macro as a shift, does the profile look better?

--Brian

RE: [PATCH] example BUSEC patch for benchmarking only

Posted by Cliff Woolley <jw...@virginia.edu>.

On Wed, 10 Jul 2002, Bill Stoddard wrote:

> > It's definitely a valid optimization.  I just checked gcc on
> > Sparc and it generates a shift rather than a division.  But
>
> I've not looked at the generated code, but profiling indicates that an
> additional division is happening, adding an extra 231 instructions.
> (xlc_r -O2)

Note that even gcc will only do this optimization if -fstrength-reduce is
specified (granted, for gcc that optimization is turned on automatically
with -O2 and -O3).

--Cliff

RE: [PATCH] example BUSEC patch for benchmarking only

Posted by Bill Stoddard <bi...@wstoddard.com>.

> Bill Stoddard wrote:
>
> >>Humm... looking at this macro which is used all over the place, I see a
> >>division.
> >>
> >> #define apr_time_sec(time) ((apr_int64_t)((time) / APR_USEC_PER_SEC))
> >>
> >>Since APR_USEC_PER_SEC is now a binary representation, I assume
> >>the compiler
> >>will do the proper optimization.
> >>
> >>
> >
> >ie, turn the division into a shift, which is much less expensive.
> >
>
> It's definitely a valid optimization.  I just checked gcc on
> Sparc and it generates a shift rather than a division.  But
> if the busec code becomes part of APR, I'd rather define the
> macro as a shift, just to make sure that people get the benefit
> of the speedup even if their compilers don't do the optimization.
> (My fear is that some 32-bit compilers might immediately generate
> a call to a library function if they see anything that looks like
> a 64-bit division.)
>
> --Brian
>

I've not looked at the generated code, but profiling indicates that an
additional division is happening, adding an extra 231 instructions.
(xlc_r -O2)

Bill

RE: [PATCH] example BUSEC patch for benchmarking only

Posted by Emery Berger <em...@cs.utexas.edu>.

> -----Original Message-----
> From: Brian Pane [mailto:brianp@apache.org]
> Sent: Wednesday, July 10, 2002 8:46 PM
> To: APR Development List
> Subject: Re: [PATCH] example BUSEC patch for benchmarking only
> 
> Bill Stoddard wrote:
> 
> >>Humm... looking at this macro which is used all over the place, I
see a
> >>division.
> >>
> >> #define apr_time_sec(time) ((apr_int64_t)((time) /
APR_USEC_PER_SEC))
> >>
> >>Since APR_USEC_PER_SEC is now a binary representation, I assume
> >>the compiler
> >>will do the proper optimization.
> >>
> >>
> >
> >ie, turn the division into a shift, which is much less expensive.
> >
> 
> It's definitely a valid optimization.  I just checked gcc on
> Sparc and it generates a shift rather than a division.  But
> if the busec code becomes part of APR, I'd rather define the
> macro as a shift, just to make sure that people get the benefit
> of the speedup even if their compilers don't do the optimization.
> (My fear is that some 32-bit compilers might immediately generate
> a call to a library function if they see anything that looks like
> a 64-bit division.)

Brian's concern is quite valid.

For 64-bit integers, VC7 (Microsoft's latest Visual C++) does indeed use
a library call when dividing __int64's (on x86). Shifts are of course
compiled to shifts.

-- Emery

--
Emery Berger
Assistant Professor (starting Fall 2002)
Dept. of Computer Science
University of Massachusetts, Amherst
www.cs.utexas.edu/users/emery

Re: [PATCH] example BUSEC patch for benchmarking only

Posted by Brian Pane <br...@apache.org>.

Bill Stoddard wrote:

>>Humm... looking at this macro which is used all over the place, I see a
>>division.
>>
>> #define apr_time_sec(time) ((apr_int64_t)((time) / APR_USEC_PER_SEC))
>>
>>Since APR_USEC_PER_SEC is now a binary representation, I assume 
>>the compiler
>>will do the proper optimization.
>>    
>>
>
>ie, turn the division into a shift, which is much less expensive.
>

It's definitely a valid optimization.  I just checked gcc on
Sparc and it generates a shift rather than a division.  But
if the busec code becomes part of APR, I'd rather define the
macro as a shift, just to make sure that people get the benefit
of the speedup even if their compilers don't do the optimization.
(My fear is that some 32-bit compilers might immediately generate
a call to a library function if they see anything that looks like
a 64-bit division.)

--Brian

RE: [PATCH] example BUSEC patch for benchmarking only

Posted by Bill Stoddard <bi...@wstoddard.com>.

> > At 01:14 PM 7/9/2002, you wrote:
> > >Bill Stoddard wrote:
> > >>
> > >>Where and when was the post that described this proposal?  I'd
> > like to take
> > >>a look. Is there a patch?
> > >
> > >There's no patch that I know of, but here's the thread with wrowe's
> > >original proposal for the binary microseconds design:
> > >  http://marc.theaimsgroup.com/?l=apr-dev&m=102376298728487&w=2
> >
> > http://marc.theaimsgroup.com/?l=apr-dev&m=102383695930036&w=2
> >
> > is a better link to some example macros.
> >
> > >With the big batch of changes that I made last week, almost all
> > >of the time-related code in the httpd now uses the new macros--
> > >apr_time_sec(), apr_time_usec(), etc--instead of multiplying and
> > >dividing by 1,000,000 directly.  Hopefully that will make it
> > >easier to try out a new time representation.
> >
> > Agreed.  But even discounting that USEC_PER_SEC is used for
> > it's real meaning somewhere, and would introduce slight discrepancies,
> > attached is simply the gross hack to force busec units.
> >
> > Note the 'right' change is to have a BUSEC_PER_SEC constant,
> > but this is a truer test for benchmarking.
> >
> > Bill
> 
> Humm... looking at this macro which is used all over the place, I see a
> division.
> 
>  #define apr_time_sec(time) ((apr_int64_t)((time) / APR_USEC_PER_SEC))
> 
> Since APR_USEC_PER_SEC is now a binary representation, I assume 
> the compiler
> will do the proper optimization.

ie, turn the division into a shift, which is much less expensive.

Bill

RE: [PATCH] example BUSEC patch for benchmarking only

Posted by Bill Stoddard <bi...@wstoddard.com>.

> At 01:14 PM 7/9/2002, you wrote:
> >Bill Stoddard wrote:
> >>
> >>Where and when was the post that described this proposal?  I'd
> like to take
> >>a look. Is there a patch?
> >
> >There's no patch that I know of, but here's the thread with wrowe's
> >original proposal for the binary microseconds design:
> >  http://marc.theaimsgroup.com/?l=apr-dev&m=102376298728487&w=2
>
> http://marc.theaimsgroup.com/?l=apr-dev&m=102383695930036&w=2
>
> is a better link to some example macros.
>
> >With the big batch of changes that I made last week, almost all
> >of the time-related code in the httpd now uses the new macros--
> >apr_time_sec(), apr_time_usec(), etc--instead of multiplying and
> >dividing by 1,000,000 directly.  Hopefully that will make it
> >easier to try out a new time representation.
>
> Agreed.  But even discounting that USEC_PER_SEC is used for
> it's real meaning somewhere, and would introduce slight discrepancies,
> attached is simply the gross hack to force busec units.
>
> Note the 'right' change is to have a BUSEC_PER_SEC constant,
> but this is a truer test for benchmarking.
>
> Bill

Humm... looking at this macro which is used all over the place, I see a
division.

 #define apr_time_sec(time) ((apr_int64_t)((time) / APR_USEC_PER_SEC))

Since APR_USEC_PER_SEC is now a binary representation, I assume the compiler
will do the proper optimization.

Bill