Posted to common-user@hadoop.apache.org by "Teng, James" <xt...@ebay.com> on 2011/07/18 08:00:42 UTC

Multiple Output Format -Unrecognizable Characters in Output File

Hi,
I ran into a problem while trying to define my own MultipleOutputFormat class. The code is below.
public class MultipleOutputFormat extends FileOutputFormat<LongWritable,Text>{
      public class LineWriter extends RecordWriter<LongWritable,Text>{
            private DataOutputStream output;
            private byte separatorBytes[];
            public LineWriter(DataOutputStream output, String separator) throws UnsupportedEncodingException
            {
                  this.output=output;
                  this.separatorBytes=separator.getBytes("UTF-8");
            }
            @Override
            public synchronized void close(TaskAttemptContext context) throws IOException,
                        InterruptedException {
                  // TODO Auto-generated method stub
                  output.close();
            }

            @Override
            public void write(LongWritable key, Text value) throws IOException,
                        InterruptedException {
                  System.out.println("key:"+key.get());
                  System.out.println("value:"+value.toString());
                  // TODO Auto-generated method stub
                  //output.writeLong(key.)
                  //output.write(separatorBytes);
                  //output.write(value.toString().getBytes("UTF-8"));
                  //output.write("\n".getBytes("UTF-8"));
                  //key.write(output);
                  key.write(output);
                  value.write(output);

                  output.write("\n".getBytes("UTF-8"));
            }
      }
      private Path path;
      protected String generateFileNameForKeyValue(LongWritable key,Text value,String name)
      {
            return "key"+Math.random();
      }

      @Override
      public RecordWriter<LongWritable, Text> getRecordWriter(
                  TaskAttemptContext context) throws IOException, InterruptedException {
            path=getOutputPath(context);
            System.out.println("ddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd");
            // TODO Auto-generated method stub
            Path file = getDefaultWorkFile(context, "");
            FileSystem fs = file.getFileSystem(context.getConfiguration());

            FSDataOutputStream fileOut = fs.create(file, false);

            return new LineWriter(fileOut, "\t");

      }
}

However, unrecognizable characters appear in the output file.
Has anyone encountered this problem before? Any comments are greatly appreciated. Thanks in advance.


James, Teng (Teng Linxiao)
eRL,   CDC,    eBay,    Shanghai
Extension:        86-21-28913530
MSN:     tenglinxiao@hotmail.com
Skype:                James,Teng
Email:            xteng@ebay.com

Re: Multiple Output Format -Unrecognizable Characters in Output File

Posted by Yaozhen Pan <it...@gmail.com>.
Hi James,

Not sure if you meant to write both key and value as text.
key.write(output);
This line writes the long key in binary format, which might be the
reason you saw unrecognizable characters in the output file.
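
To illustrate the difference, here is a minimal sketch that uses only java.io, so it runs without Hadoop on the classpath. The class and method names (TextVersusBinaryDemo, binaryLong, textLine) are made up for this example. It relies on LongWritable.write(DataOutput) delegating to writeLong, which emits 8 raw bytes; Text.write similarly puts a binary length prefix in front of the UTF-8 bytes, so value.write(output) adds non-text bytes as well.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class TextVersusBinaryDemo {

    // Same bytes that key.write(output) produces for a LongWritable:
    // 8 raw big-endian bytes, not readable decimal digits.
    static byte[] binaryLong(long key) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeLong(key);
        out.flush();
        return buf.toByteArray();
    }

    // Text-style record: decimal key, separator, value, newline, all UTF-8.
    static byte[] textLine(long key, String value, String separator) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.write(Long.toString(key).getBytes(StandardCharsets.UTF_8));
        out.write(separator.getBytes(StandardCharsets.UTF_8));
        out.write(value.getBytes(StandardCharsets.UTF_8));
        out.write("\n".getBytes(StandardCharsets.UTF_8));
        out.flush();
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Number of bytes written for the binary form of the key.
        System.out.println("binary key length: " + binaryLong(42L).length);
        // The text form is a readable, tab-separated line.
        System.out.print(new String(textLine(42L, "hello", "\t"), StandardCharsets.UTF_8));
    }
}
```

In the LineWriter above, the equivalent change would presumably be to replace key.write(output) and value.write(output) with text writes along the lines of the commented-out calls: output.write(Long.toString(key.get()).getBytes("UTF-8")), then output.write(separatorBytes), then output.write(value.toString().getBytes("UTF-8")).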

Yaozhen
