Posted to common-user@hadoop.apache.org by "Bryan A. Pendleton" <bp...@geekdom.net> on 2006/03/08 02:16:41 UTC

Files rotting?

So, I was sick for a week, before which it had been a week or so since I'd
touched the hadoop work I've been doing. I left the namenode/datanodes
running during that time, so that disk failures and whatnot could clean up
after themselves.

My current dataset is composed of about 16k files, totalling around 450GB.

For the second time now, I've found that files have started to rot. No nodes
from the ~20-machine cluster died during the time I wasn't paying
attention to them. Each machine has an average of 3 drives in it, and, after
the last time, I turned replication up to 4x, "just in case". Yet, somehow,
dozens of files are now missing blocks. They weren't missing blocks before.

Has anyone run into this? I can't find any gremlins in the system,
especially not ones that would leave 99% of my data alone but kill all 4
copies of a few blocks on different machines, making them disappear from
the cluster entirely.... but it's starting to get annoying.

--
Bryan A. Pendleton
Ph: (877) geek-1-bp