Posted to hdfs-user@hadoop.apache.org by Michał Michalak <mi...@gmail.com> on 2014/07/15 07:13:15 UTC
Java/Webhdfs - upload with snappy compression on the fly
Hello,
I need to upload a large file over WebHDFS (from local disk into HDFS;
WebHDFS is my only option, as I don't have direct access to the cluster).
Because the network connection is my bottleneck, I decided to compress the
file with Snappy before sending it. I am using a Java application compiled
against the "org.apache.hadoop:hadoop-client:2.4.0" library.
So far my code looks like this:
private void uploadFile(Path hdfsPath, FileSystem fileSystem) throws IOException {
    // Reader for the local input file
    BufferedReader bufferedReader = new BufferedReader(
            new FileReader(localFile), INPUT_STREAM_BUFFER_SIZE);
    // Writer chain: HDFS stream -> Snappy compression -> UTF-8 text
    FSDataOutputStream hdfsDataOutputStream =
            fileSystem.create(hdfsPath, false, OUTPUT_STREAM_BUFFER_SIZE);
    SnappyOutputStream snappyOutputStream =
            new SnappyOutputStream(hdfsDataOutputStream, OUTPUT_STREAM_BUFFER_SIZE);
    BufferedWriter bufferedWriter = new BufferedWriter(
            new OutputStreamWriter(snappyOutputStream, "UTF-8"));
    String line;
    while ((line = bufferedReader.readLine()) != null) {
        bufferedWriter.write(line);
        bufferedWriter.newLine(); // readLine() strips the terminator, so re-add it
    }
    bufferedReader.close();
    bufferedWriter.close();
}
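(As a side note on the copy itself: since the file passes through unmodified,
a raw byte copy would avoid the charset decode/encode round-trip entirely.
A minimal JDK-only sketch of the loop I mean - in my real code `out` would be
the SnappyOutputStream rather than a plain in-memory stream:)

```java
import java.io.*;

public class StreamCopy {
    // Copy raw bytes from in to out. Unlike a Reader/Writer pipeline this
    // preserves line terminators exactly and does no charset decoding.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "line one\nline two\n".getBytes("UTF-8");
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data),
                new BufferedOutputStream(sink));
        System.out.println(copied); // prints 18
    }
}
```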
Basically it works: a Snappy-compressed file is uploaded to HDFS. Yet there
seems to be a problem with the Snappy format itself: Hadoop does not
recognize the result as a Snappy-compressed file. I compared my compressed
file with one compressed by Hadoop; the main compressed stream seems to be
the same in both files, but the headers are different.
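To make the difference concrete, here is roughly what I see when I compare
the two header layouts. Both framings are my reading of the respective
sources, so treat them as assumptions: snappy-java's SnappyOutputStream
seems to start the stream with a magic preamble, while Hadoop's SnappyCodec
seems to write no magic at all, only big-endian length-prefixed blocks.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class SnappyHeaderCompare {
    // Magic bytes I believe snappy-java's SnappyOutputStream writes first
    // (assumption, from my reading of the snappy-java sources):
    static final byte[] SNAPPY_JAVA_MAGIC = {
        (byte) 0x82, 'S', 'N', 'A', 'P', 'P', 'Y', 0
    };

    // Hadoop's SnappyCodec, as far as I can tell, frames each block as a
    // 4-byte big-endian uncompressed length, then a 4-byte big-endian
    // compressed length, then the raw snappy-compressed bytes - no magic.
    static byte[] hadoopBlockHeader(int uncompressedLen, int compressedLen)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(uncompressedLen); // DataOutputStream is big-endian
        out.writeInt(compressedLen);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] hadoopHeader = hadoopBlockHeader(65536, 1234);
        // The first 8 bytes of the two files cannot match: one starts with
        // 0x82, the other with a small big-endian length.
        System.out.println(Arrays.equals(
                Arrays.copyOf(hadoopHeader, 8), SNAPPY_JAVA_MAGIC)); // prints false
    }
}
```

This would explain why the compressed payloads look alike while Hadoop still
refuses to read my file.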
What am I doing wrong? Would you be so kind as to suggest a solution for my
issue?
Best Regards
Michal Michalak