Posted to common-user@hadoop.apache.org by Vivek K <ha...@gmail.com> on 2011/09/13 18:27:54 UTC

Outputformat and RecordWriter in Hadoop Pipes

Hi all,

I am trying to build a Hadoop/MR application in C++ using hadoop-pipes. I
have been able to work successfully with my own mappers and reducers, but
now I need to generate output (from the reducer) in a format different from
the default TextOutputFormat. I have a few questions:

(1) As in Hadoop Streaming, is there an option to set the OutputFormat in
HadoopPipes (in order to use, say, org.apache.hadoop.io.SequenceFile.Writer)?
I am using Hadoop version 0.20.2.

(2) As a simple test of how to use a built-in, non-default writer, I tried
the following:

     hadoop pipes \
       -D hadoop.pipes.java.recordreader=true \
       -D hadoop.pipes.java.recordwriter=false \
       -input input.seq -output output \
       -inputformat org.apache.hadoop.mapred.SequenceFileInputFormat \
       -writer org.apache.hadoop.io.SequenceFile.Writer \
       -program my_test_program

     However, this fails with a ClassNotFound exception. If I remove the
-writer flag and use the default writer, it works just fine.

(3) Is there an example or discussion of how to write your own
RecordWriter and run it with Hadoop-pipes?

Thanks.

Best,
Vivek
--

Re: Outputformat and RecordWriter in Hadoop Pipes

Posted by Vivek K <ha...@gmail.com>.
It would be very helpful if someone could point me to where I might find a
solution to this problem.

Thanks.
Vivek
--

Re: Outputformat and RecordWriter in Hadoop Pipes

Posted by Vivek K <ha...@gmail.com>.
Hi Brock

Thanks for a prompt and to-the-point response.
It is working as you said.

Best,
Vivek
--
On Tue, Sep 20, 2011 at 6:25 PM, Brock Noland <br...@cloudera.com> wrote:

> Hi,
>
> -writer wants an outputformat:
>
>      if (results.hasOption("writer")) {
>        setIsJavaRecordWriter(job, true);
>        job.setOutputFormat(getClass(results, "writer", job,
>                                      OutputFormat.class));
>
>
>
> As such I think you want:
>
> -writer org.apache.hadoop.mapred.SequenceFileOutputFormat
>
> SequenceFile.Writer simply writes sequence files; it has nothing to do with
> MapReduce.
>
> This is also wrong:
>
> hadoop.pipes.java.recordwriter=false
>
> Brock
>

Re: Outputformat and RecordWriter in Hadoop Pipes

Posted by Brock Noland <br...@cloudera.com>.
Hi,

On Tue, Sep 13, 2011 at 12:27 PM, Vivek K <ha...@gmail.com> wrote:
> Hi all,
>
> I am trying to build a Hadoop/MR application in c++ using hadoop-pipes. I
> have been able to successfully work with my own mappers and reducers, but
> now I need to generate output (from reducer) in a format different from the
> default TextOutputFormat. I have a few questions:
>
> (1) Similar to Hadoop streaming, is there an option to set OutputFormat in
> HadoopPipes (in order to use say org.apache.hadoop.io.SequenceFile.Writer) ?
> I am using Hadoop version 0.20.2.
>
> (2) For a simple test on how to use an in-built non-default writer, I tried
> the following:
>
>     hadoop pipes -D hadoop.pipes.java.recordreader=true -D
> hadoop.pipes.java.recordwriter=false -input input.seq -output output
> -inputformat org.apache.hadoop.mapred.SequenceFileInputFormat -writer
> org.apache.hadoop.io.SequenceFile.Writer -program my_test_program


-writer wants an outputformat:

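      // The class named by -writer is loaded as an OutputFormat,
      // and the Java record writer is switched on.
      if (results.hasOption("writer")) {
        setIsJavaRecordWriter(job, true);
        job.setOutputFormat(getClass(results, "writer", job,
                                      OutputFormat.class));
      }
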
As such I think you want:

-writer org.apache.hadoop.mapred.SequenceFileOutputFormat

SequenceFile.Writer simply writes sequence files; it has nothing to do with
MapReduce.

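This is also wrong, since -writer means a Java OutputFormat writes the
output (setIsJavaRecordWriter is called with true in the code above):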

hadoop.pipes.java.recordwriter=false
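
Putting the two corrections together, the invocation would presumably look
something like the sketch below. The paths and program name are taken from
the original message. The value recordwriter=true is an inference from the
setIsJavaRecordWriter(job, true) call and is not stated explicitly in this
thread. The run_pipes_job function and the RUN variable are hypothetical
helpers added only so the command can be printed without a cluster.

```shell
# Sketch only: corrected form of the command from the original message.
# RUN defaults to "echo", so this prints the command instead of submitting
# a job; set RUN=command to actually invoke hadoop.
# Assumption: recordwriter=true is inferred, not confirmed in the thread.
run_pipes_job() {
  ${RUN:-echo} hadoop pipes \
    -D hadoop.pipes.java.recordreader=true \
    -D hadoop.pipes.java.recordwriter=true \
    -input input.seq -output output \
    -inputformat org.apache.hadoop.mapred.SequenceFileInputFormat \
    -writer org.apache.hadoop.mapred.SequenceFileOutputFormat \
    -program my_test_program
}

run_pipes_job
```

The only changes from the failing command are the class passed to -writer
(an OutputFormat rather than SequenceFile.Writer) and the recordwriter
setting.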

Brock