Posted to common-dev@hadoop.apache.org by Bharath Ravi <bh...@gmail.com> on 2011/11/12 18:09:16 UTC

Detecting/Predicting hotspots

Hi all,

We're trying to set up monitoring on HDFS that can detect when a datanode or
a data block is "hot". It would be useful to see patterns of popularity in
live HDFS deployments.

Would anyone know if there are any publicly available statistics on data
access patterns that we could look at?

Thanks a lot!
-- 
Bharath Ravi

Re: Detecting/Predicting hotspots

Posted by Mi...@emc.com.
Last year, Rini Kaushik and I authored a paper, "GreenHDFS: Towards An
Energy-Conserving, Storage-Efficient, Hybrid Hadoop Compute Cluster", at
HotPower'10 (PDF here:
http://www.usenix.org/event/hotpower10/tech/full_papers/Kaushik.pdf), that
analyzed the "hotness" of files based on real namenode audit logs from Yahoo
production clusters.

Because Hadoop applications are scan-centric, we focused on file hotness,
not block hotness.
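
For anyone who wants to try a similar file-level count on their own audit
logs, here is a minimal sketch. It is not taken from the paper. It assumes
each audit line carries whitespace-separated key=value fields that include
cmd=... followed by src=..., which should be checked against the log format
of the Hadoop version in use. The function names and the sample paths in the
usage note are hypothetical.

```python
import re
from collections import Counter

# Assumption: an audit line contains "cmd=<operation>" followed by
# "src=<path>", separated by whitespace (tabs or spaces).
AUDIT_FIELDS = re.compile(r"cmd=(?P<cmd>\S+)\s+src=(?P<src>\S+)")


def count_file_opens(lines):
    """Count how many times each file path appears in an 'open' audit event."""
    opens = Counter()
    for line in lines:
        match = AUDIT_FIELDS.search(line)
        if match and match.group("cmd") == "open":
            opens[match.group("src")] += 1
    return opens


def hottest_files(lines, top_n=10):
    """Return the top_n (path, open_count) pairs, most frequently opened first."""
    return count_file_opens(lines).most_common(top_n)
```

Passing an open log file object, for example hottest_files(open("audit.log")),
would list the most frequently opened paths; "audit.log" is a placeholder
name. Paths containing spaces are not handled by this pattern, and a raw open
count is only one possible proxy for hotness.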

- Milind

---
Milind Bhandarkar
Greenplum Labs, EMC
(Disclaimer: Opinions expressed in this email are those of the author, and
do not necessarily represent the views of any organization, past or
present, the author might be affiliated with.)


