Posted to dev@trafficserver.apache.org by Jack Bates <6n...@nottheoilrig.com> on 2012/06/22 12:12:24 UTC

Re: Download mirrors, plugin, GSoC

What's the best way to compute SHA-256 digests for content in the cache?
I am thinking of using libgcrypt [1]. Can anyone comment on whether this
is a good choice, or offer advice?

To read the content, I am thinking of following the null transform
example and, whenever I copy a chunk from the input to the output buffer,
updating the digest with that chunk. Is the null transform example the
best way to read the content for the purpose of computing the digest?

Should I worry about tying up the event loop with calls to libgcrypt?

Here is a first attempt, on GitHub, at using libgcrypt and following the
null transform example [2]. I would love any feedback.

My GSoC project is a plugin to exploit RFC 6249, Metalink/HTTP: Mirrors
and Hashes [3], and redirect clients to mirrors that are already cached.
So far it works just well enough that, given a response whose
"Location: ..." header has a URL that is not already cached and whose
"Link: <...>; rel=duplicate" header has a URL that is already cached, it
rewrites the "Location: ..." header with the cached URL. This should
redirect clients that are not Metalink aware to mirrors that are already
cached.
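Picking the duplicate URL out of a "Link: ..." header value could be done roughly as follows. This is a minimal sketch, assuming the header value is already available as a NUL-terminated string; find_duplicate_link() and param_is_rel_duplicate() are hypothetical helpers, not Traffic Server API, and the parser ignores quoted parameter values containing commas or angle brackets, as well as rel values that list several relation types.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* True if one ';'-separated parameter is rel=duplicate or rel="duplicate". */
static int param_is_rel_duplicate(const char *s, size_t n)
{
  /* Trim surrounding whitespace before comparing. */
  while (n > 0 && isspace((unsigned char)*s)) {
    s++;
    n--;
  }
  while (n > 0 && isspace((unsigned char)s[n - 1]))
    n--;
  return (n == 13 && strncasecmp(s, "rel=duplicate", 13) == 0) ||
         (n == 15 && strncasecmp(s, "rel=\"duplicate\"", 15) == 0);
}

/* Copies the URL of the first link-value with rel=duplicate into out.
 * Returns 1 on success, 0 if none is found or out is too small. */
static int find_duplicate_link(const char *hdr, char *out, size_t outlen)
{
  const char *p = hdr;
  while ((p = strchr(p, '<')) != NULL) {
    const char *url = p + 1;
    const char *gt = strchr(url, '>');
    if (gt == NULL)
      return 0; /* unterminated <...> */

    /* This link-value's parameters end at the next ',' or end of string. */
    const char *params = gt + 1;
    const char *comma = strchr(params, ',');
    const char *stop = comma ? comma : params + strlen(params);

    const char *q = params;
    while (q < stop) {
      const char *semi = memchr(q, ';', (size_t)(stop - q));
      if (semi == NULL)
        break;
      const char *pstart = semi + 1;
      const char *pend = memchr(pstart, ';', (size_t)(stop - pstart));
      if (pend == NULL)
        pend = stop;
      if (param_is_rel_duplicate(pstart, (size_t)(pend - pstart))) {
        size_t ulen = (size_t)(gt - url);
        if (ulen + 1 > outlen)
          return 0;
        memcpy(out, url, ulen);
        out[ulen] = '\0';
        return 1;
      }
      q = pend;
    }
    p = stop; /* move on to the next link-value */
  }
  return 0;
}
```

For example, given `<http://a/x>; rel=describedby, <http://b/x>; pri=1; rel="duplicate"` it returns 1 and copies `http://b/x`.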

I think the next step is for this plugin to check the digest of the 
cached content:

    If Instance Digests are not provided by the Metalink servers, the
    Link header fields pertaining to this specification MUST be ignored.

So I plan to compute digests for content in the cache and check the 
"Digest: ..." header against these. Does this sound like the right approach?

   [1] http://directory.fsf.org/wiki/Libgcrypt
   [2] https://github.com/jablko/dedup
   [3] http://tools.ietf.org/html/rfc6249
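Checking the "Digest: ..." header against a locally computed digest could look roughly like the sketch below. It assumes the instance-digest syntax of RFC 3230 (comma-separated "algorithm=value" pairs), with "SHA-256" values base64-encoded as registered by RFC 5843, and that the 32-byte digest of the cached content is already in hand. The function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Value of one base64 character, or -1 if it is not in the alphabet. */
static int b64val(int c)
{
  if (c >= 'A' && c <= 'Z') return c - 'A';
  if (c >= 'a' && c <= 'z') return c - 'a' + 26;
  if (c >= '0' && c <= '9') return c - '0' + 52;
  if (c == '+') return 62;
  if (c == '/') return 63;
  return -1;
}

/* Decodes n characters of padded base64 into out (capacity cap).
 * Returns the number of bytes written, or -1 if the input is malformed. */
static long b64_decode(const char *in, size_t n, unsigned char *out, size_t cap)
{
  size_t o = 0;
  if (n % 4 != 0)
    return -1;
  for (size_t i = 0; i < n; i += 4) {
    int v[4], pad = 0;
    for (int k = 0; k < 4; k++) {
      if (in[i + k] == '=' && i + 4 == n && k >= 2) {
        v[k] = 0; /* padding is only allowed in the last two positions */
        pad++;
      } else if (pad) {
        return -1; /* data after padding */
      } else if ((v[k] = b64val((unsigned char)in[i + k])) < 0) {
        return -1;
      }
    }
    unsigned long bits = ((unsigned long)v[0] << 18) | ((unsigned long)v[1] << 12) |
                         ((unsigned long)v[2] << 6) | (unsigned long)v[3];
    size_t nbytes = (size_t)(3 - pad);
    if (o + nbytes > cap)
      return -1;
    out[o++] = (unsigned char)(bits >> 16);
    if (nbytes > 1) out[o++] = (unsigned char)(bits >> 8);
    if (nbytes > 2) out[o++] = (unsigned char)bits;
  }
  return (long)o;
}

/* Finds "SHA-256=<base64>" in a Digest header value and decodes it.
 * Returns 1 only if it is present and decodes to exactly 32 bytes. */
static int digest_header_sha256(const char *hdr, unsigned char out[32])
{
  const char *p = hdr;
  for (;;) {
    while (*p == ' ' || *p == '\t' || *p == ',')
      p++;
    const char *eq = strchr(p, '=');
    if (eq == NULL)
      return 0;
    const char *val = eq + 1;
    size_t vlen = strcspn(val, ", \t");
    if (eq - p == 7 && strncasecmp(p, "SHA-256", 7) == 0)
      return b64_decode(val, vlen, out, 32) == 32;
    p = val + vlen; /* skip this algorithm=value pair */
  }
}

/* True if the header's SHA-256 value equals the locally computed digest. */
static int digest_header_matches(const char *hdr, const unsigned char computed[32])
{
  unsigned char claimed[32];
  return digest_header_sha256(hdr, claimed) && memcmp(claimed, computed, 32) == 0;
}
```

A value that is missing or does not decode to exactly 32 bytes is treated as absent, which errs on the side of ignoring the Link header fields, as the RFC requires.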

Posted by Jack Bates <6n...@nottheoilrig.com>.
On 22/06/12 07:34 AM, Leif Hedstrom wrote:
> On 6/22/12 4:12 AM, Jack Bates wrote:
>> What's the best way to compute SHA-256 digests for content in the
>> cache? I am thinking of using libgcrypt [1], can anyone comment on
>> whether this is a good choice, or offer advice?
>
> hmmm, maybe consider something from OpenSSL? We already link / use it
> heavily, and I believe there's a SHA-256 in it as well?

Thanks Leif, done. I switched from libgcrypt to OpenSSL [1].

This plugin now computes SHA-256 digests for content in the cache. Given 
a response with a "Location: ..." header and a "Digest: SHA-256=..." 
header, if the "Location: ..." URL isn't already cached but content with 
a matching digest does exist in the cache, the plugin will rewrite the 
"Location: ..." header with the cached URL. This should redirect clients 
that are not Metalink aware to mirrors that are already cached.

The code is up on GitHub [2]. I would love any feedback.

To check for content with a matching digest, this plugin uses 
TSCacheWrite() and TSCacheRead() to map the SHA-256 digest of the 
content to the request URL.

It listens for responses from origin servers with 
TS_EVENT_HTTP_READ_RESPONSE_HDR and sets up a transform. The transform 
doesn't alter the content, just feeds it to OpenSSL SHA256_Update(). 
When complete, it calls TSCacheKeyDigestSet() on the SHA-256 digest and
TSCacheWrite() to store the request URL under that key.

It also listens for responses to clients (from cache or from origin 
server) with TS_EVENT_HTTP_SEND_RESPONSE_HDR. If the response has a 
"Location: ..." header and a "Digest: SHA-256=..." header then it calls 
TSCacheKeyDigestFromUrlSet() and TSCacheRead() to check if the 
"Location: ..." URL is already cached. If not then it calls 
TSCacheKeyDigestSet() and TSCacheRead() to check if the "Digest: 
SHA-256=..." digest already exists in the cache. If so then it calls 
TSVConnRead() to read the URL associated with the digest.

Finally it calls TSCacheKeyDigestFromUrlSet() and TSCacheRead() again, 
on the URL associated with the digest, to check that the content is 
still fresh. If it is then the plugin rewrites the "Location: ..." 
header with the cached URL.
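The lookup sequence above can be summarised as a synchronous decision function. This is only a sketch under stated assumptions: in the plugin each lookup is an asynchronous TSCacheRead()/TSVConnRead() round trip driven by events, whereas here hypothetical callbacks (url_is_cached, url_for_digest) stand in for those calls, and a toy in-memory table exists only to exercise them.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical synchronous stand-ins for the plugin's cache lookups. */
typedef struct {
  /* Like TSCacheKeyDigestFromUrlSet() + TSCacheRead(): is a fresh copy cached? */
  int (*url_is_cached)(void *self, const char *url);
  /* Like TSCacheKeyDigestSet() + TSCacheRead() + TSVConnRead(): the URL
   * stored under this digest, or NULL if there is none. */
  const char *(*url_for_digest)(void *self, const unsigned char digest[32]);
  void *self;
} cache_ops;

/* URL to send in "Location: ...": the original, unless it is not cached
 * and a fresh cached copy with the same SHA-256 digest exists. */
static const char *choose_location(const cache_ops *c, const char *location,
                                   const unsigned char digest[32])
{
  if (c->url_is_cached(c->self, location))
    return location; /* already cached: leave the header alone */
  const char *mirror = c->url_for_digest(c->self, digest);
  if (mirror != NULL && c->url_is_cached(c->self, mirror))
    return mirror; /* matching content exists and is still fresh */
  return location;
}

/* Toy in-memory cache, only for exercising choose_location(). */
typedef struct {
  const char **fresh_urls;
  size_t n_fresh;
  const unsigned char (*digests)[32];
  const char **digest_urls;
  size_t n_digests;
} toy_cache;

static int toy_url_is_cached(void *self, const char *url)
{
  const toy_cache *t = self;
  for (size_t i = 0; i < t->n_fresh; i++)
    if (strcmp(t->fresh_urls[i], url) == 0)
      return 1;
  return 0;
}

static const char *toy_url_for_digest(void *self, const unsigned char digest[32])
{
  const toy_cache *t = self;
  for (size_t i = 0; i < t->n_digests; i++)
    if (memcmp(t->digests[i], digest, 32) == 0)
      return t->digest_urls[i];
  return NULL;
}
```

The final freshness check matters because the digest-to-URL entry can outlive the cached object it points to; without it the plugin could redirect clients to a URL that would then miss the cache anyway.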

What are your thoughts on this approach?

This satisfies the requirement from RFC 6249:

    If Instance Digests are not provided by the Metalink servers, the
    Link header fields pertaining to this specification MUST be ignored.

RFC 6249 also requires the SHA-256 digest:

    Metalinks contain whole file hashes as described in
    Section 6, and MUST include SHA-256, as specified in [FIPS-180-3].

Should the plugin support additional digest algorithms, if they would be useful?

   [1] https://github.com/jablko/dedup/commit/3d1e6c1980df5b75aa44ace24f6a4886d6ba4215
   [2] https://github.com/jablko/dedup

Posted by Leif Hedstrom <zw...@apache.org>.
On 6/22/12 4:12 AM, Jack Bates wrote:
> What's the best way to compute SHA-256 digests for content in the cache? I 
> am thinking of using libgcrypt [1], can anyone comment on whether this is 
> a good choice, or offer advice?

hmmm, maybe consider something from OpenSSL? We already link / use it 
heavily, and I believe there's a SHA-256 in it as well?

-- leif