Posted to common-user@hadoop.apache.org by ch huang <ju...@gmail.com> on 2013/07/30 08:54:51 UTC

hadoop missing file?

One of my colleagues told me that some of his files are missing. I ran fsck and
found the following info. How can I prevent files from going missing? Can anyone help me?

Status: HEALTHY
 Total size:    272020850157 B (Total open files size: 652056 B)
 Total dirs:    1143
 Total files:   1886 (Files currently being written: 2)
 Total blocks (validated):      5651 (avg. block size 48136763 B) (Total open file blocks (not validated): 1)
 Minimally replicated blocks:   5651 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       129 (2.2827818 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              903 (5.0571237 %)
 Number of data-nodes:          3
 Number of racks:               1
FSCK ended at Tue Jul 30 14:38:01 CST 2013 in 462 milliseconds

Re: hadoop missing file?

Posted by Bertrand Dechoux <de...@gmail.com>.

Short answer: (10-3) * 129 = 903

Long answer:
1) Which file is missing?
2) How do you know it is missing?

You have a cluster with 3 datanodes. The default replication factor is 3,
but not for the job jar, whose replication factor is 10 (mapred.submit.replication).
Let's say you ran 129 jobs that failed in a weird way (for example at submission).
You would then have 129 under-replicated blocks (one block per jar, because your
jar is small), each 7 replicas short, and so 903 missing replicas, because with
3 datanodes you can't have more than 3 replicas anyway.
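
The arithmetic behind that estimate can be sketched in Python. This is only an
illustration and assumes the hypothesis above holds (129 single-block job jars
requested at replication 10 on a 3-datanode cluster). The helper name
missing_replicas is made up for this example and is not part of Hadoop.

```python
# Hypothetical helper, not a Hadoop API: counts the replicas that cannot be
# placed when blocks request more copies than there are datanodes.
def missing_replicas(blocks, requested_replication, datanodes):
    # HDFS keeps at most one replica of a given block per datanode.
    achievable = min(requested_replication, datanodes)
    return blocks * (requested_replication - achievable)


# Numbers taken from the fsck report and the hypothesis in this thread.
total_blocks = 5651
under_replicated = 129
datanodes = 3
default_replication = 3
jar_replication = 10  # mapred.submit.replication, as assumed in the reply

missing = missing_replicas(under_replicated, jar_replication, datanodes)
print(missing)  # 903

# Replicas the namenode would expect if the hypothesis is right.
expected = ((total_blocks - under_replicated) * default_replication
            + under_replicated * jar_replication)
print(expected)  # 17856
print(round(100.0 * missing / expected, 4))  # 5.0571
```

The last ratio agrees with the 5.0571237 % that fsck printed, which is
consistent with the job-jar explanation but does not prove it.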

So back to the first question: which file is missing?
It might simply be that the file was never uploaded in the first place. It
happens.

For all your blocks, you do have at least one replica: Minimally
replicated blocks:   5651 (100.0 %)

Bertrand

On Tue, Jul 30, 2013 at 8:54 AM, ch huang <ju...@gmail.com> wrote:

> One of my colleagues told me that some of his files are missing. I ran fsck and
> found the following info. How can I prevent files from going missing? Can anyone help me?
>
> Status: HEALTHY
>  Total size:    272020850157 B (Total open files size: 652056 B)
>  Total dirs:    1143
>  Total files:   1886 (Files currently being written: 2)
>  Total blocks (validated):      5651 (avg. block size 48136763 B) (Total open file blocks (not validated): 1)
>  Minimally replicated blocks:   5651 (100.0 %)
>  Over-replicated blocks:        0 (0.0 %)
>  Under-replicated blocks:       129 (2.2827818 %)
>  Mis-replicated blocks:         0 (0.0 %)
>  Default replication factor:    3
>  Average block replication:     3.0
>  Corrupt blocks:                0
>  Missing replicas:              903 (5.0571237 %)
>  Number of data-nodes:          3
>  Number of racks:               1
> FSCK ended at Tue Jul 30 14:38:01 CST 2013 in 462 milliseconds
>

-- 
Bertrand Dechoux
