Posted to common-user@hadoop.apache.org by Sugandha Naolekar <su...@gmail.com> on 2009/09/04 09:46:51 UTC

Some issues!

Hello!

        I am running a simple MR job with a replication factor of 2. After
it runs, the output is split into files named part-00000 and so on. Is there
a way to keep the keys from being printed in the output files? Right now the
output is written as key-value pairs. I want just the data, not the
(integer) keys.




-- 
Regards!
Sugandha

Re: Some issues!

Posted by ll_oz_ll <hi...@yahoo.com>.
Yes, you can do that. Just output null as the key in the reducer and you won't
get the key or the tab delimiter in your output.


Sugandha Naolekar wrote:
> 
> Hello!
> 
>         Running a simple MR job, and setting a replication factor of 2.
> Now,
> after its execution, the output is split in files named as part-00000 and
> so
> on. I want to ask is, can't we avoid these keys or key values to get
> printed
> in output files? I mean, I am getting the output in the files in key-value
> pair. I want just the data and not the keys(integers) in it.
> 
> 
> 
> 
> -- 
> Regards!
> Sugandha
> 
> 



Re: Some issues!

Posted by bharath vissapragada <bh...@gmail.com>.
Amogh, thanks for your reply.

Let me make my question clearer.

Suppose I have an array that gets updated in MRjob1, and I want to access it
in MRjob2. That is what I meant in my previous question. I have gone through
the JobConf class, but I haven't found anything useful. If I am wrong, kindly
point me to the correct methods.

Thanks
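
One approach that may fit here, for small data only: tasks in MRjob1 cannot
update an array held by the driver directly, so the driver would first collect
it once MRjob1 finishes (for example by reading MRjob1's output files), pack
it into a string, and store it in MRjob2's JobConf with set(name, value). The
tasks of MRjob2 can then read it back with get(name) in their
configure(JobConf) method. The sketch below shows only the packing and
unpacking step in plain Java. ArrayParam, encode and decode are hypothetical
helper names, not part of Hadoop. Larger data is usually better left in HDFS
as MRjob1's output and read from there by MRjob2.

```java
import java.util.Arrays;

// Hypothetical helpers: pack a small int array into one string so it can be
// stored as a single job-configuration value, and unpack it again.
final class ArrayParam {
    /** Joins the values with commas, e.g. {3, 1, 4} becomes "3,1,4". */
    static String encode(int[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(values[i]);
        }
        return sb.toString();
    }

    /** Reverses encode(); a missing or empty string yields an empty array. */
    static int[] decode(String encoded) {
        if (encoded == null || encoded.isEmpty()) {
            return new int[0];
        }
        String[] parts = encoded.split(",");
        int[] values = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Integer.parseInt(parts[i].trim());
        }
        return values;
    }

    public static void main(String[] args) {
        String packed = encode(new int[] {3, 1, 4});
        System.out.println(packed);
        System.out.println(Arrays.toString(decode(packed)));
    }
}
```

In the driver this would look roughly like job2Conf.set("my.array",
ArrayParam.encode(array)), where "my.array" is an arbitrary property name
chosen for this example.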

On Fri, Sep 4, 2009 at 9:26 PM, Amogh Vasekar <am...@yahoo-inc.com> wrote:

> Have a look at jobclient, it should suffice.
>
> Cheers!
> Amogh

RE: Some issues!

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
Have a look at JobClient; it should suffice.

Cheers!
Amogh

-----Original Message-----
From: bharath vissapragada [mailto:bharathvissapragada1990@gmail.com] 
Sent: Friday, September 04, 2009 9:15 PM
To: common-user@hadoop.apache.org
Subject: Re: Some issues!

Hey ,

I have one more doubt , Suppose I have some cascading mapred jobs and
suppose some data which was collected in
MRjob1 is to be used in MRjob2 m is there any way?

Thanks


Re: Some issues!

Posted by bharath vissapragada <bh...@gmail.com>.
Hey,

I have one more doubt. Suppose I have some cascading MapReduce jobs, and some
data that was collected in MRjob1 is to be used in MRjob2. Is there any way
to do that?

Thanks

On Fri, Sep 4, 2009 at 1:54 PM, Amandeep Khurana <am...@gmail.com> wrote:

> Or you can output the data in the keys and NullWritable as the value.
> That ways you'll get only unique data...

Re: Some issues!

Posted by Amandeep Khurana <am...@gmail.com>.
Or you can output the data as the keys and NullWritable as the value.
That way you'll get only unique data...
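
The reason this gives unique data is that the framework groups map output by
key, so reduce is called once per distinct key. If the reducer writes each key
once with no value, duplicates disappear. The sketch below models that
grouping step in plain Java with a TreeSet. It is an illustration only, not
Hadoop code, and UniqueKeysDemo and reduceToKeysOnly are names made up for
this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Illustration only: models how grouping by key removes duplicates.
final class UniqueKeysDemo {
    /**
     * Takes the keys emitted by the map phase and returns the lines a
     * keys-only reducer would write: one line per distinct key, sorted.
     */
    static List<String> reduceToKeysOnly(List<String> mapOutputKeys) {
        List<String> lines = new ArrayList<>();
        // TreeSet stands in for the sort-and-group step before reduce.
        for (String key : new TreeSet<>(mapOutputKeys)) {
            // Stands in for output.collect(key, NullWritable.get()).
            lines.add(key);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("pear", "apple", "pear", "apple", "fig");
        for (String line : reduceToKeysOnly(keys)) {
            System.out.println(line);
        }
    }
}
```

Note that this only helps when duplicates should be dropped. If every input
record has to appear in the output, use the null-key approach instead.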

On 9/4/09, zhang jianfeng <zj...@gmail.com> wrote:
> Hi Sugandha ,
>
> If you only want to the value, you need to set the key as NullWritable in
> reduce.
>
> e.g.
> output.collect(NullWritable.get(), value);


-- 


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz

Re: Some issues!

Posted by zhang jianfeng <zj...@gmail.com>.
Hi Sugandha ,

If you only want the value, you need to set the key to NullWritable in the
reducer.

e.g.
output.collect(NullWritable.get(), value);



On Fri, Sep 4, 2009 at 12:46 AM, Sugandha Naolekar
<su...@gmail.com> wrote:

> Hello!
>
>        Running a simple MR job, and setting a replication factor of 2. Now,
> after its execution, the output is split in files named as part-00000 and
> so
> on. I want to ask is, can't we avoid these keys or key values to get
> printed
> in output files? I mean, I am getting the output in the files in key-value
> pair. I want just the data and not the keys(integers) in it.
>
>
>
>
> --
> Regards!
> Sugandha
>