Posted to common-user@hadoop.apache.org by PORTO aLET <po...@gmail.com> on 2009/06/13 18:00:48 UTC

data in yahoo / facebook hdfs

Hi,
I am wondering what Facebook and Yahoo do with the data in HDFS after
they finish processing the log files (or whatever else is stored there).
Is it simply deleted, or is it backed up to tape?
What is the typical process?
Also, what is the process for adding a new node to a Hadoop cluster? Do you
simply connect a new computer to the network and set up the Hadoop conf?

Re: data in yahoo / facebook hdfs

Posted by Allen Wittenauer <aw...@yahoo-inc.com>.


On 6/13/09 9:00 AM, "PORTO aLET" <po...@gmail.com> wrote:
> I am wondering what Facebook and Yahoo do with the data in HDFS after
> they finish processing the log files (or whatever else is stored there).
> Is it simply deleted, or is it backed up to tape?
> What is the typical process?

    The grid ops team here at Yahoo! has a strict retention policy that
dictates that data is deleted after X amount of time.  We perform no backups
of the data on the grid.  It is also worth mentioning that the data is loaded
from the primary source, so in the case of data corruption (hai hadoop-0.18)
or accidental deletion (where are my snapshots, dev people?), we reload the
data from that primary source (dependent, of course, on whether they still
have it).
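
    To illustrate a time-based retention policy like the one described above,
here is a minimal sketch in Python. It is not Yahoo!'s actual tooling: the
function name, the (path, modified_at) input format, and the 30-day window are
all assumptions made for the example. It only selects which paths are past the
retention window; the deletion itself is left out.

```python
from datetime import datetime, timedelta


def expired_paths(entries, retention_days, now):
    """Return the paths whose modification time is older than the retention window.

    entries: iterable of (path, modified_at) pairs. This input format is
    hypothetical, e.g. something parsed from a directory listing.
    """
    cutoff = now - timedelta(days=retention_days)
    # Strictly older than the cutoff counts as expired; data exactly at the
    # cutoff is kept.
    return [path for path, modified_at in entries if modified_at < cutoff]


if __name__ == "__main__":
    now = datetime(2009, 6, 13)
    listing = [
        ("/logs/2009-03-01", datetime(2009, 3, 1)),
        ("/logs/2009-06-10", datetime(2009, 6, 10)),
    ]
    # Assumed 30-day retention window, purely for illustration.
    for path in expired_paths(listing, retention_days=30, now=now):
        print("would delete:", path)
```

    Since there are no backups, a dry run like this (listing what would be
deleted before deleting anything) is a cheap safeguard against removing data
that can no longer be reloaded from the primary source.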

> Also what is the process of adding a new node to the hadoop cluster? simply
> connect a new computer to the network (and setup the hadoop conf)?

    http://wiki.apache.org/hadoop/FAQ#17