You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Ruediger Pluem <rp...@apache.org> on 2020/06/29 09:41:04 UTC

Re: svn commit: r1879285 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/core.xml include/ap_mmn.h include/http_core.h include/http_protocol.h include/httpd.h modules/dav/fs/repos.c modules/test/mod_dialup.c server/core.c server/util_etag.c


On 6/28/20 1:41 AM, minfrin@apache.org wrote:
> Author: minfrin
> Date: Sat Jun 27 23:41:00 2020
> New Revision: 1879285
> 
> URL: http://svn.apache.org/viewvc?rev=1879285&view=rev
> Log:
> "[mod_dav_fs etag handling] should really honor the FileETag setting".
> - It now does.
> - Add "Digest" to FileETag directive, allowing a strong ETag to be
>   generated using a file digest.
> - Add ap_make_etag_ex() and ap_set_etag_fd() to allow full control over
>   ETag generation.
> - Add concept of "binary notes" to request_rec, allowing packed bit flags
>   to be added to a request.
> - First binary note - AP_REQUEST_STRONG_ETAG - allows modules to force
>   the ETag to a strong ETag to comply with RFC requirements, such as those
>   mandated by various WebDAV extensions.
> 
> Modified:
>     httpd/httpd/trunk/CHANGES
>     httpd/httpd/trunk/docs/manual/mod/core.xml
>     httpd/httpd/trunk/include/ap_mmn.h
>     httpd/httpd/trunk/include/http_core.h
>     httpd/httpd/trunk/include/http_protocol.h
>     httpd/httpd/trunk/include/httpd.h
>     httpd/httpd/trunk/modules/dav/fs/repos.c
>     httpd/httpd/trunk/modules/test/mod_dialup.c
>     httpd/httpd/trunk/server/core.c
>     httpd/httpd/trunk/server/util_etag.c
> 

> Modified: httpd/httpd/trunk/server/util_etag.c
> URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/server/util_etag.c?rev=1879285&r1=1879284&r2=1879285&view=diff
> ==============================================================================
> --- httpd/httpd/trunk/server/util_etag.c (original)
> +++ httpd/httpd/trunk/server/util_etag.c Sat Jun 27 23:41:00 2020

> @@ -73,13 +99,96 @@ AP_DECLARE(char *) ap_make_etag(request_
>      cfg = (core_dir_config *)ap_get_core_module_config(r->per_dir_config);
>      etag_bits = (cfg->etag_bits & (~ cfg->etag_remove)) | cfg->etag_add;
>  
> +    if (er->force_weak) {
> +        weak = ETAG_WEAK;
> +        weak_len = sizeof(ETAG_WEAK);
> +    }
> +
> +    if (r->vlist_validator) {
> +
> +        /* If we have a variant list validator (vlv) due to the
> +         * response being negotiated, then we create a structured
> +         * entity tag which merges the variant etag with the variant
> +         * list validator (vlv).  This merging makes revalidation
> +         * somewhat safer, ensures that caches which can deal with
> +         * Vary will (eventually) be updated if the set of variants is
> +         * changed, and is also a protocol requirement for transparent
> +         * content negotiation.
> +         */
> +
> +        /* if the variant list validator is weak, we make the whole
> +         * structured etag weak.  If we would not, then clients could
> +         * have problems merging range responses if we have different
> +         * variants with the same non-globally-unique strong etag.
> +         */
> +
> +        vlv = r->vlist_validator;
> +        if (vlv[0] == 'W') {
> +            vlv += 3;
> +            weak = ETAG_WEAK;
> +            weak_len = sizeof(ETAG_WEAK);
> +        }
> +        else {
> +            vlv++;
> +        }
> +        vlv_len = strlen(vlv);
> +
> +    }
> +    else {
> +        vlv = NULL;
> +        vlv_len = 0;
> +    }
> +
> +    /*
> +     * Did a module flag the need for a strong etag, or did the
> +     * configuration tell us to generate a digest?
> +     */
> +    if (er->finfo->filetype == APR_REG &&
> +            (AP_REQUEST_IS_STRONG_ETAG(r) || (etag_bits & ETAG_DIGEST))) {
> +
> +        apr_sha1_ctx_t context;
> +        unsigned char buf[4096]; /* keep this a multiple of 64 */
> +        unsigned char digest[APR_SHA1_DIGESTSIZE];
> +        apr_file_t *fd = NULL;
> +
> +        apr_size_t nbytes;
> +        apr_off_t offset = 0L;
> +
> +        if (er->fd) {
> +            fd = er->fd;
> +        }
> +        else if (er->pathname) {
> +             apr_file_open(&fd, er->pathname, APR_READ | APR_BINARY, 0, r->pool);
> +        }
> +        if (!fd) {
> +            return "";
> +        }
> +
> +        etag = apr_palloc(r->pool, weak_len + sizeof("\"\"") +
> +                4*(APR_SHA1_DIGESTSIZE/3) + vlv_len + 4);

Using apr_base64_encode_len in the formula above would make it easier to understand and read IMHO.

> +
> +        apr_sha1_init(&context);
> +        nbytes = sizeof(buf);
> +        while (apr_file_read(fd, buf, &nbytes) == APR_SUCCESS) {

I assume that the code has been taken from ap_md5digest, but
if we have MMAP available and if we have not disabled MMAP we use MMAP to read the contents of the file
if sendfile is not possible (e.g. SSL connections).
Wouldn't using MMAP more performant here especially in case of larger files?

> +            apr_sha1_update_binary(&context, buf, nbytes);
> +            nbytes = sizeof(buf);
> +        }
> +        apr_file_seek(fd, APR_SET, &offset);

Why do we always reset the file pointer to 0? Why don't we get the actual file pointer of fd before we do the reading
and restore it afterwards?

> +        apr_sha1_final(digest, &context> +
> +        etag_start(etag, weak, &next);
> +        next += apr_base64_encode_binary(next, digest, APR_SHA1_DIGESTSIZE) - 1;
> +        etag_end(next, vlv, vlv_len);
> +
> +        return etag;
> +    }
> +
>      /*
>       * If it's a file (or we wouldn't be here) and no ETags
>       * should be set for files, return an empty string and
>       * note it for the header-sender to ignore.
>       */
>      if (etag_bits & ETAG_NONE) {
> -        apr_table_setn(r->notes, "no-etag", "omit");
>          return "";
>      }
>  


Regards

Rüdiger

Re: svn commit: r1879285 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/core.xml include/ap_mmn.h include/http_core.h include/http_protocol.h include/httpd.h modules/dav/fs/repos.c modules/test/mod_dialup.c server/core.c server/util_etag.c

Posted by Graham Leggett <mi...@sharp.fm>.
On 03 Jul 2020, at 11:19, Ruediger Pluem <rp...@apache.org> wrote:

> Thanks for the pointer. Is Content-MD5 really used? And given that it has been removed in the RFC my approach would be as follows:
> 
> 1. Continue with your new additions as is. Do not try to merge any of this code with Content-MD5 related content.
> 2. Backport them.
> 3. Leave Content-MD5 untouched in 2.4.x.
> 4. Remove Content-MD5 in trunk.

Thought about it some more and was going to suggest the above instead, +1.

Regards,
Graham
—


Re: svn commit: r1879285 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/core.xml include/ap_mmn.h include/http_core.h include/http_protocol.h include/httpd.h modules/dav/fs/repos.c modules/test/mod_dialup.c server/core.c server/util_etag.c

Posted by Ruediger Pluem <rp...@apache.org>.

On 7/3/20 10:54 AM, Graham Leggett wrote:
> On 29 Jun 2020, at 16:37, Ruediger Pluem <rp...@apache.org> wrote:
> 
>> Makes sense.
>> Do you see a possibility to merge this code and the one of ap_md5digest to a more generic procedure that
>> allows to choose the digest algorithm while using 'MMAPED' reads?
>> BTW: Is sha1 mandatory for strong etags? If not wouldn't MD5 be enough and if MD5 is seen as too insecure
>> why isn't sha1?
> 
> I chose sha1 as it was a) widely available in APR and b) better than md5, but that was it.
> 
> I am wondering if for 2.4 if we use md5 instead, and then set Content-MD5 at the same time in the same code instead of calculating the md5 twice.
> 
> Then - as per the removal of Content-MD5 from https://tools.ietf.org/html/rfc7231#appendix-B - we separately switch it to sha1 and remove Content-MD5 in trunk.

Thanks for the pointer. Is Content-MD5 really used? And given that it has been removed in the RFC my approach would be as follows:

1. Continue with your new additions as is. Do not try to merge any of this code with Content-MD5 related content.
2. Backport them.
3. Leave Content-MD5 untouched in 2.4.x.
4. Remove Content-MD5 in trunk.

Regards

Rüdiger


Re: svn commit: r1879285 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/core.xml include/ap_mmn.h include/http_core.h include/http_protocol.h include/httpd.h modules/dav/fs/repos.c modules/test/mod_dialup.c server/core.c server/util_etag.c

Posted by Graham Leggett <mi...@sharp.fm>.
On 29 Jun 2020, at 16:37, Ruediger Pluem <rp...@apache.org> wrote:

> Makes sense.
> Do you see a possibility to merge this code and the one of ap_md5digest to a more generic procedure that
> allows to choose the digest algorithm while using 'MMAPED' reads?
> BTW: Is sha1 mandatory for strong etags? If not wouldn't MD5 be enough and if MD5 is seen as too insecure
> why isn't sha1?

I chose sha1 as it was a) widely available in APR and b) better than md5, but that was it.

I am wondering if for 2.4 if we use md5 instead, and then set Content-MD5 at the same time in the same code instead of calculating the md5 twice.

Then - as per the removal of Content-MD5 from https://tools.ietf.org/html/rfc7231#appendix-B - we separately switch it to sha1 and remove Content-MD5 in trunk.

Is that sane?

Regards,
Graham
—


Re: svn commit: r1879285 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/core.xml include/ap_mmn.h include/http_core.h include/http_protocol.h include/httpd.h modules/dav/fs/repos.c modules/test/mod_dialup.c server/core.c server/util_etag.c

Posted by Ruediger Pluem <rp...@apache.org>.

On 6/29/20 4:08 PM, Graham Leggett wrote:
> On 29 Jun 2020, at 11:41, Ruediger Pluem <rpluem@apache.org <ma...@apache.org>> wrote:
> 
>>> +        etag = apr_palloc(r->pool, weak_len + sizeof("\"\"") +
>>> +                4*(APR_SHA1_DIGESTSIZE/3) + vlv_len + 4);
>>
>> Using apr_base64_encode_len in the formula above would make it easier to understand and read IMHO.
> 
> It would also mean it could no longer be optimised to a constant. What side do we want to fall on?

Fair enough. The above is faster. Nevertheless I would prefer to group that better. Probably a separate define
SHA1_DIGEST_BASE64_LEN that only defines the base64 length of the base64 encode SHA1 digest. This would mean
more calc work for the compiler and not during runtime, but I guess this is acceptable.

> 
>>> +
>>> +        apr_sha1_init(&context);
>>> +        nbytes = sizeof(buf);
>>> +        while (apr_file_read(fd, buf, &nbytes) == APR_SUCCESS) {
>>
>> I assume that the code has been taken from ap_md5digest, but
>> if we have MMAP available and if we have not disabled MMAP we use MMAP to read the contents of the file
>> if sendfile is not possible (e.g. SSL connections).
>> Wouldn't using MMAP more performant here especially in case of larger files?
> 
> It makes sense, but I would want to change something like this separately.

Makes sense.
Do you see a possibility to merge this code and the one of ap_md5digest to a more generic procedure that
allows to choose the digest algorithm while using 'MMAPED' reads?
BTW: Is sha1 mandatory for strong etags? If not wouldn't MD5 be enough and if MD5 is seen as too insecure
why isn't sha1?

>>> +            apr_sha1_update_binary(&context, buf, nbytes);
>>> +            nbytes = sizeof(buf);
>>> +        }
>>> +        apr_file_seek(fd, APR_SET, &offset);
>>
>> Why do we always reset the file pointer to 0? Why don't we get the actual file pointer of fd before we do the reading
>> and restore it afterwards?
>
> I am guessing that the why is lost in the mists of time. As a guess, it avoids an additional call.
>
> Using the actual file pointer is cleaner, as there is a guarantee that the function does not have any side effects.
>
> http://svn.apache.org/viewvc?rev=1879332&view=rev

Thanks. Another option that just came to mind would it be also fine in case that we get an fd passed to do
an apr_file_dup() on that descriptor such that we also do not touch the actual buffering (if enabled) of the used fd?
Furthermore don't we need to do a seek to 0 before we start reading? Who tells us that the filepointer is at 0 for a passed fd?


Regards

Rüdiger

Re: svn commit: r1879285 - in /httpd/httpd/trunk: CHANGES docs/manual/mod/core.xml include/ap_mmn.h include/http_core.h include/http_protocol.h include/httpd.h modules/dav/fs/repos.c modules/test/mod_dialup.c server/core.c server/util_etag.c

Posted by Graham Leggett <mi...@sharp.fm>.
On 29 Jun 2020, at 11:41, Ruediger Pluem <rp...@apache.org> wrote:

>> +        etag = apr_palloc(r->pool, weak_len + sizeof("\"\"") +
>> +                4*(APR_SHA1_DIGESTSIZE/3) + vlv_len + 4);
> 
> Using apr_base64_encode_len in the formula above would make it easier to understand and read IMHO.

It would also mean it could no longer be optimised to a constant. What side do we want to fall on?

>> +
>> +        apr_sha1_init(&context);
>> +        nbytes = sizeof(buf);
>> +        while (apr_file_read(fd, buf, &nbytes) == APR_SUCCESS) {
> 
> I assume that the code has been taken from ap_md5digest, but
> if we have MMAP available and if we have not disabled MMAP we use MMAP to read the contents of the file
> if sendfile is not possible (e.g. SSL connections).
> Wouldn't using MMAP more performant here especially in case of larger files?

It makes sense, but I would want to change something like this separately.

>> +            apr_sha1_update_binary(&context, buf, nbytes);
>> +            nbytes = sizeof(buf);
>> +        }
>> +        apr_file_seek(fd, APR_SET, &offset);
> 
> Why do we always reset the file pointer to 0? Why don't we get the actual file pointer of fd before we do the reading
> and restore it afterwards?

I am guessing that the why is lost in the mists of time. As a guess, it avoids an additional call.

Using the actual file pointer is cleaner, as there is a guarantee that the function does not have any side effects.

http://svn.apache.org/viewvc?rev=1879332&view=rev <http://svn.apache.org/viewvc?rev=1879332&view=rev>

Regards,
Graham
—