Posted to dev@trafficserver.apache.org by Rasim Saltuk Alakuş <ra...@turksat.com.tr> on 2014/08/27 17:42:48 UTC

generating hash from packet content

Hi All,

ATS uses a hash of the URL as the key for cache storage, and the CacheUrl plugin adds some more flexibility to the URL hashing strategy.

We are thinking of generating the hash from the packet content and using it as the key when storing to and retrieving from the cache. This looks like a better solution, because URI changes would no longer hurt the caching system. One immediate benefit: if you cache YouTube, each request for the same video can have a different URL, and the CacheUrl plugin does not always provide a good solution. Also, maintaining per-site hash filters does not look like an elegant solution.

Is there any previous or active work on implementing content-based hashing? What kinds of problems and constraints do you foresee? Would anyone volunteer to implement this feature together with us?

Kind regards
Saltuk Alakuş

Rasim Saltuk Alakuş

Senior Specialist
IT R&D and Technology Directorate

www.turksat.com.tr
ralakus@turksat.com.tr

TUNA MAH. İSMAİL ÖZKUNT 1709 SOK. NO:3 KAT:2 KARŞIYAKA – İZMİR
T : +90 232 323 43 00
F : +90 232 323 43 44



Re: generating hash from packet content

Posted by Leif Hedstrom <zw...@apache.org>.
On Aug 27, 2014, at 9:42 AM, Rasim Saltuk Alakuş <ra...@turksat.com.tr> wrote:

> Hi All,
> 
> ATS uses a hash of the URL as the key for cache storage, and the CacheUrl plugin adds some more flexibility to the URL hashing strategy.
> 
> We are thinking of generating the hash from the packet content and using it as the key when storing to and retrieving from the cache. This looks like a better solution, because URI changes would no longer hurt the caching system. One immediate benefit: if you cache YouTube, each request for the same video can have a different URL, and the CacheUrl plugin does not always provide a good solution. Also, maintaining per-site hash filters does not look like an elegant solution.
> 
> Is there any previous or active work on implementing content-based hashing? What kinds of problems and constraints do you foresee? Would anyone volunteer to implement this feature together with us?



So what would the client lookup “hash” on? At that point, all ATS has is the URL and various headers. Without further actions (see next paragraph), it would not be able to retrieve an object from cache.

Now, what would work, which I think has been mentioned somewhere before, is to fetch (from origin) a very small piece of the object on every client request: say, 512 bytes (or some other size that fits within one typical TCP packet), using e.g. "Range: bytes=0-511". You then use that as your cache key. This could do what you are asking for, using the “data” as the cache key, but it has the downside that you always have to ask origin for some data. At a minimum, I think such a plugin must also be a per-remap plugin, so you can decide when you want to take that hit and when not.
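To make the trade-off concrete, here is a minimal sketch of the range-probe idea. It is written in Python for brevity and does not use the ATS plugin API (a real plugin would be C/C++ against that API). The origin is modeled as an in-memory dict, and `PROBE_SIZE`, `fetch_range`, `content_key` and `ContentKeyedCache` are hypothetical names chosen for illustration. SHA-256 is an assumption; any strong hash would do. Note that two different objects that happen to share their first 512 bytes would collide under this scheme.

```python
import hashlib

PROBE_SIZE = 512  # bytes; mirrors the "Range: bytes=0-511" example


def range_header(size: int = PROBE_SIZE) -> str:
    # HTTP byte ranges are inclusive, so the first 512 bytes are "bytes=0-511".
    return f"bytes=0-{size - 1}"


def fetch_range(origin: dict, url: str, size: int = PROBE_SIZE) -> bytes:
    # Hypothetical stand-in for a ranged GET to origin using range_header(size).
    return origin[url][:size]


def content_key(probe: bytes) -> str:
    # The cache key is derived from the probed bytes, not from the URL.
    return hashlib.sha256(probe).hexdigest()


class ContentKeyedCache:
    """Toy cache keyed on a hash of each object's first PROBE_SIZE bytes."""

    def __init__(self, origin: dict):
        self.origin = origin
        self.store = {}          # content key -> full object body
        self.probe_fetches = 0   # small ranged requests, paid on every lookup
        self.full_fetches = 0    # full-object requests, paid only on a miss

    def get(self, url: str) -> bytes:
        probe = fetch_range(self.origin, url)
        self.probe_fetches += 1
        key = content_key(probe)
        if key not in self.store:
            # Miss: fetch the whole object once and store it under the content key.
            self.store[key] = self.origin[url]
            self.full_fetches += 1
        return self.store[key]
```

If two URLs such as `/v?id=1&sig=aaa` and `/v?id=1&sig=bbb` serve the same bytes, calling `get` on both costs two probe requests but only one full fetch, which is exactly the "always ask origin for some data" cost described above.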

I don’t know of anyone working on such a plugin. But it sounds potentially very useful :).


Also, as a side note, there was, way back when, some early code to deal with cache dedup. I don’t know that it ever worked, but John might know. The idea was to hash the actual data (body) and dedup it, just like some modern file systems can do. This wouldn’t avoid origin requests per se; you’d still have to fetch the object, calculate the hash, and then decide that you don’t have to duplicate the storage (so it saves some disk storage).
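The dedup idea amounts to content-addressed storage: URLs map to a digest of the body, and each distinct body is stored only once. Below is a minimal Python sketch of that structure. It is not the old ATS dedup code, whose details are not described here. `DedupStore` and its methods are hypothetical names, and SHA-256 is an assumed choice of hash.

```python
import hashlib
from typing import Optional


class DedupStore:
    """Toy content-addressed store: URL -> body digest -> single stored body."""

    def __init__(self):
        self.url_to_digest = {}  # URL key -> digest of its body
        self.blobs = {}          # digest -> body, stored once per distinct body

    def put(self, url: str, body: bytes) -> str:
        # The full body must already have been fetched from origin to hash it.
        digest = hashlib.sha256(body).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = body
        self.url_to_digest[url] = digest
        return digest

    def get(self, url: str) -> Optional[bytes]:
        digest = self.url_to_digest.get(url)
        return None if digest is None else self.blobs[digest]

    def bytes_stored(self) -> int:
        # Disk used by bodies only; identical bodies under many URLs count once.
        return sum(len(b) for b in self.blobs.values())
```

Storing the same 4 KB body under `/a` and `/b` leaves `bytes_stored()` at 4096. Lookups are still keyed by URL, so this saves disk space without saving any origin requests.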

Cheers,

— leif