Posted to hdfs-user@hadoop.apache.org by Quan Nguyen Hong <nh...@tma.com.vn> on 2015/08/24 14:36:43 UTC
Appending to a file makes the Hadoop cluster run out of storage.
Hi all,
Have a good day!
I used the code below to append to a file in HDFS from a local file.
The local file is 85 MB.
The Hadoop cluster (CDH 5.4.2, HDFS 2.6, replication factor 3) has 140 GB
free.
I have a while loop in which I do:
FSDataOutputStream out = fs.append(outFile);
out.write(buffer, 0, bytesRead);
out.close();
Each iteration appends 1024 bytes from the local file to the HDFS file.
This loop ran my cluster out of storage before my program could finish.
Here's the full code.
import java.io.*;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

public class writeflushexisted {
    public static void main(String[] argv) throws IOException, URISyntaxException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.94.185:8020"), conf);

        Path inFile = new Path("testdata.txt");
        Path outFile = new Path("/myhdfs/testdata.txt");
        File localFile = new File(inFile.toString());

        // Copy the local file to HDFS 1 KB at a time.
        FileInputStream in = new FileInputStream(localFile);
        int i = 0;
        byte[] buffer = new byte[1024];
        try {
            int bytesRead = 0;
            while ((bytesRead = in.read(buffer)) > 0) {
                // Reopen and close the append stream on every chunk.
                FSDataOutputStream out = fs.append(outFile);
                out.write(buffer, 0, bytesRead);
                out.close();
                i++;
            }
        } catch (IOException e) {
            System.out.println("Error while copying file: " + e.getMessage());
        } finally {
            in.close();
            System.out.println("Number of loop:" + i);
        }
    }
}
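For reference, here is a sketch of the same copy written the way I expect it
should be done, opening the append stream once and closing it once after the
loop (the class name AppendOnce is just for illustration; I have not verified
that this avoids the space problem):

import java.io.*;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

public class AppendOnce {
    public static void main(String[] argv) throws IOException, URISyntaxException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.94.185:8020"), conf);
        Path outFile = new Path("/myhdfs/testdata.txt");

        FileInputStream in = new FileInputStream("testdata.txt");
        byte[] buffer = new byte[1024];
        // Open the append stream once; note fs.append() throws if the
        // target file does not already exist.
        FSDataOutputStream out = fs.append(outFile);
        try {
            int bytesRead;
            while ((bytesRead = in.read(buffer)) > 0) {
                out.write(buffer, 0, bytesRead);
            }
        } finally {
            out.close(); // the last block is finalized once, not per chunk
            in.close();
        }
    }
}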
Here's the information before I ran this code:
---------------------------------------------------------------
[hdfs@chdhost125 current]$ hadoop fs -df -h
Filesystem                           Size     Used    Available  Use%
hdfs://chdhost185.vitaldev.com:8020  266.4 G  38.2 G  139.8 G    14%
---------------------------------------------------------------
[hdfs@chdhost125 lib]$ hadoop fs -du -h /
67.7 M 1.3 G /hbase
0 0 /myhdfs
0 0 /solr
1.8 G 5.4 G /tmp
10.6 G 31.4 G /user
Here's the information while the above code was running:
---------------------------------------------------------------
Filesystem                           Size     Used     Available  Use%
hdfs://chdhost185.vitaldev.com:8020  266.4 G  170.2 G  95.9 G     64%
---------------------------------------------------------------
[hdfs@chdhost125 lib]$ hadoop fs -du -h /
67.7 M 1.3 G /hbase
32.9 M 384 M /myhdfs
0 0 /solr
1.8 G 5.4 G /tmp
10.6 G 31.4 G /user
After 10 minutes, my cluster was out of storage and my program threw an
exception: "Error while copying file: Failed to replace a bad
datanode on the existing pipeline due to no more good datanodes being
available to try. (Nodes: current=[192.168.94.185:50010,
192.168.94.27:50010], original=[192.168.94.185:50010, 192.168.94.27:50010]).
The current failed datanode replacement policy is DEFAULT, and a client may
configure this via
'dfs.client.block.write.replace-datanode-on-failure.policy' in its
configuration."
So why does appending in small chunks (1024 bytes at a time) run my cluster
out of space? The local file is only 85 MB, but HDFS consumed ~140 GB while
appending it; even mid-run, the 32.9 MB in /myhdfs occupied 384 MB of raw
disk, roughly 11.7x the logical size instead of the 3x I expect from
replication. Is there any problem with my code? I know that appending in
small chunks is not recommended, but I want to understand why HDFS consumes
so much space.
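To quantify the overhead while the copy runs, I could also check the file
with getContentSummary(); a small sketch (ContentSummary is covered by the
org.apache.hadoop.fs.* import already used above):

// Sketch: compare the file's logical size with the raw space consumed
// across all replicas; the ratio should be ~3 with replication factor 3.
ContentSummary cs = fs.getContentSummary(new Path("/myhdfs/testdata.txt"));
System.out.println("length:         " + cs.getLength());
System.out.println("space consumed: " + cs.getSpaceConsumed());
System.out.println("ratio:          " + (double) cs.getSpaceConsumed() / cs.getLength());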
Thanks and Regards,
Quan Nguyen