Posted to hdfs-user@hadoop.apache.org by Michał Michalak <mi...@gmail.com> on 2014/07/15 07:13:15 UTC

Java/Webhdfs - upload with snappy compressing on the fly

Hello

I need to upload a large file using WEBHDFS (from local disk into HDFS;
WEBHDFS is my only option, as I don't have direct access). Because in my
case the network connection is the bottleneck, I decided to compress the
file with snappy before sending. I am using a Java application, compiled
against the "org.apache.hadoop:hadoop-client:2.4.0" library.

So far, my code looks as follows:

private void uploadFile(Path hdfsPath, FileSystem fileSystem) throws IOException {
        // Reader for the local input file
        try (BufferedReader bufferedReader = new BufferedReader(
                     new FileReader(localFile), INPUT_STREAM_BUFFER_SIZE);
             // Writer chain: HDFS stream -> snappy compression -> UTF-8 text
             FSDataOutputStream hdfsDataOutputStream =
                     fileSystem.create(hdfsPath, false, OUTPUT_STREAM_BUFFER_SIZE);
             SnappyOutputStream snappyOutputStream =
                     new SnappyOutputStream(hdfsDataOutputStream, OUTPUT_STREAM_BUFFER_SIZE);
             BufferedWriter bufferedWriter = new BufferedWriter(
                     new OutputStreamWriter(snappyOutputStream, "UTF-8"))) {

            String line;
            while ((line = bufferedReader.readLine()) != null) {
                bufferedWriter.write(line);
                // readLine() strips line terminators, so re-add them
                bufferedWriter.newLine();
            }
        }
    }

Basically it works: the snappy-compressed file is uploaded to HDFS. Yet
there seems to be a problem with the snappy format itself, because the
file is not recognized as snappy-compressed by Hadoop. I compared my
compressed file with one compressed by Hadoop: the main compressed stream
seems to be the same in both files, but the headers are different.
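The header comparison above can be reproduced with plain Java. This is
only a sketch (the class name and the file-path argument are mine); and
if I read the snappy-java sources correctly, its SnappyOutputStream
prepends its own magic header (0x82 'S' 'N' 'A' 'P' 'P' 'Y' 0x00) -
treat those exact byte values as an assumption to verify:

```java
import java.io.FileInputStream;
import java.io.IOException;

public class SnappyHeaderCheck {
    // Magic header written by snappy-java's SnappyOutputStream
    // (assumption, based on reading the snappy-java sources)
    static final byte[] SNAPPY_JAVA_MAGIC =
            {(byte) 0x82, 'S', 'N', 'A', 'P', 'P', 'Y', 0x00};

    // Render the first len bytes of buf as a space-separated hex string
    static String toHex(byte[] buf, int len) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < len; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02x", buf[i] & 0xff));
        }
        return sb.toString();
    }

    // True if the buffer starts with the snappy-java magic header
    static boolean hasSnappyJavaMagic(byte[] buf, int len) {
        if (len < SNAPPY_JAVA_MAGIC.length) return false;
        for (int i = 0; i < SNAPPY_JAVA_MAGIC.length; i++) {
            if (buf[i] != SNAPPY_JAVA_MAGIC[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a compressed file to inspect (placeholder)
        try (FileInputStream in = new FileInputStream(args[0])) {
            byte[] header = new byte[16];
            int n = Math.max(in.read(header), 0);
            System.out.println("first bytes: " + toHex(header, n));
            System.out.println("snappy-java magic: " + hasSnappyJavaMagic(header, n));
        }
    }
}
```

By contrast, Hadoop's own SnappyCodec writes length-prefixed compressed
blocks with no such stream magic, which would explain why the two files
carry the same compressed payload but different headers.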

What am I doing wrong? Would you be so kind as to suggest a solution for
my issue?

Best Regards
Michal Michalak