Posted to hdfs-user@hadoop.apache.org by Barak Yaish <ba...@gmail.com> on 2013/01/24 08:10:26 UTC

MultipleOutputs outputs just one line

Hi,

I'm trying to use MultipleOutputs (Hadoop 1.0.4) to produce multiple
files based on some policy. In the job I set:

MultipleOutputs.addNamedOutput( job, "rejected", TextOutputFormat.class,
Text.class, NullWritable.class );

And at the mapper:

private MultipleOutputs<Text, Writable> mos;

setup(): mos = new MultipleOutputs( context );

map():   if( somecond )
         {
             context.write( new Text( key ), NullWritable.get() );
         }
         else
         {
             logger.info( "Going to write to mos: " + key );
             mos.write( new Text( key ), NullWritable.get(), "/tmp" );
         }

The problem I'm facing is that when multiple mappers run that code, I can
see in the logs that mos.write() is being invoked, but only one line is
written to the output file under /tmp. Is there some config I missed?

Thanks.

Re: MultipleOutputs outputs just one line

Posted by Barak Yaish <ba...@gmail.com>.
Yes, I'm calling mos.close() in Mapper.cleanup(). Are there any logs
that I can turn on to troubleshoot this issue?

On Thu, Jan 24, 2013 at 9:36 AM, Harsh J <ha...@cloudera.com> wrote:

> Hi Barak,
>
> As instructed on
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html,
> do you also make sure to call the mos.close() function at the end of
> Mapper (in its cleanup stage)?
>
> --
> Harsh J
>

Re: MultipleOutputs outputs just one line

Posted by Harsh J <ha...@cloudera.com>.
Hi Barak,

As instructed on
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html,
do you also make sure to call the mos.close() function at the end of
Mapper (in its cleanup stage)?

On Thu, Jan 24, 2013 at 12:40 PM, Barak Yaish <ba...@gmail.com> wrote:
> Hi,
>
> I'm trying to use MultipleOutputs (Hadoop 1.0.4) to produce multiple
> files based on some policy. In the job I set:
>
> MultipleOutputs.addNamedOutput( job, "rejected", TextOutputFormat.class,
> Text.class, NullWritable.class );
>
> And at the mapper:
>
> private MultipleOutputs<Text, Writable> mos;
>
> setup(): mos = new MultipleOutputs( context );
>
> map():   if( somecond )
>              {
>                      context.write( new Text( key ), NullWritable.get() );
>              }
>              else
>              {
>                      logger.info( "Going to write to mos: " + key );
>                      mos.write( new Text( key ), NullWritable.get(), "/tmp" );
>              }
>
> The problem I'm facing is that when multiple mappers run that code, I can
> see in the logs that mos.write() is being invoked, but only one line is
> written to the output file under /tmp. Is there some config I missed?
>
> Thanks.



-- 
Harsh J
