Posted to hdfs-dev@hadoop.apache.org by 张春玮 <zc...@gmail.com> on 2011/01/19 03:15:29 UTC

help:2 problems in using hadoop sequencefile

Hi all,

I am an HDFS beginner. I use Hadoop 0.20.2 in my system, which needs to
store many small files. The number of these small files grows every day, so
I adopted SequenceFile to solve the "large number of small files" problem.
The problem appears in the following situation:



    // fs (FileSystem) and conf (Configuration) are fields of the enclosing
    // class, which is not shown here.
    public static void testSequenceFileWrite(String path, int fileCount,
            SequenceFile.CompressionType type) throws Throwable {
        Writer w = null;
        try {
            w = SequenceFile.createWriter(fs, conf, new Path(path),
                    BytesWritable.class, BytesWritable.class, type);
            for (int i = 0; i < fileCount; i++) {
                // Build a dummy "small file" of i + 1 + 4096 bytes.
                byte bs[] = new byte[i + 1 + 4096];
                for (int j = 0; j < bs.length; j++) {
                    bs[j] = (byte) i;
                }
                BytesWritable key = new BytesWritable(String.valueOf(i + 4000)
                        .getBytes());
                BytesWritable value = new BytesWritable(bs);
                // Print the record index and the writer position before the append.
                System.out.printf("%d %d\n", i, w.getLength());
                w.append(key, value);
            }
        } catch (Throwable t) {
            t.printStackTrace();
        } finally {
            if (w != null) {
                w.close();
            }
        }
    }

    public static void main(String args[]) throws Throwable {
        testSequenceFileWrite("/test", 100, SequenceFile.CompressionType.RECORD);
        testSequenceFileWrite("/test", 100, SequenceFile.CompressionType.RECORD);
    }


When I call this function twice in main, the second call overwrites the
file "/test" in HDFS instead of appending to it. Can you tell me how to
append data when reopening an existing SequenceFile in HDFS?

My second question:
Is the append operation supported for HAR files?

Re: help:2 problems in using hadoop sequencefile

Posted by Allen Wittenauer <aw...@linkedin.com>.
On Jan 18, 2011, at 6:15 PM, 张春玮 wrote:
> 
> 
> When I invoke this function 2 times in main function, the second time it
> will overwrite not append the file “/test” in hdfs. Can you tell me how to
> append data when reopen an existing sequencefile in hdfs?

	There is no working append code in the released Apache Hadoop 0.20.2.  So you'll need to read all the files with an identity mapper and write them out with a single identity reducer.
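
	A minimal sketch of such a merge job, assuming the old org.apache.hadoop.mapred API that ships with 0.20.2 and BytesWritable keys and values as in the question. The class name MergeSequenceFiles, the helper buildMergeJob, and the two command-line paths are hypothetical names used only for illustration:

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

// Hypothetical class name; not part of Hadoop.
public class MergeSequenceFiles {

    // Configures a job that reads every SequenceFile under inputDir and
    // rewrites all records through one reducer, i.e. into one output file.
    static JobConf buildMergeJob(Path inputDir, Path outputDir) {
        JobConf job = new JobConf(MergeSequenceFiles.class);
        job.setJobName("merge-sequence-files");

        job.setInputFormat(SequenceFileInputFormat.class);
        job.setOutputFormat(SequenceFileOutputFormat.class);

        // Identity map and reduce: records pass through unchanged.
        job.setMapperClass(IdentityMapper.class);
        job.setReducerClass(IdentityReducer.class);

        // Assumption: same key/value types as in the question's writer.
        job.setOutputKeyClass(BytesWritable.class);
        job.setOutputValueClass(BytesWritable.class);

        // A single reducer yields a single part file in outputDir.
        job.setNumReduceTasks(1);

        // Assumption: keep RECORD compression, as in the question.
        FileOutputFormat.setCompressOutput(job, true);
        SequenceFileOutputFormat.setOutputCompressionType(
                job, SequenceFile.CompressionType.RECORD);

        FileInputFormat.setInputPaths(job, inputDir);
        FileOutputFormat.setOutputPath(job, outputDir);
        return job;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: MergeSequenceFiles <inputDir> <outputDir>");
            return;
        }
        JobClient.runJob(buildMergeJob(new Path(args[0]), new Path(args[1])));
    }
}
```

	With this approach each batch of new records goes into its own SequenceFile under the input directory, and the job is rerun whenever a merged copy is wanted. The output directory must not already exist, and because the records pass through the shuffle, the merged file comes out sorted by key rather than in write order.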

> 
> Another problem:
> Is Appending operation  supported in HAR file?

	No.