Posted to user@hadoop.apache.org by Fatih Haltas <fa...@nyu.edu> on 2013/03/24 14:03:15 UTC

2 Reduce method in one Job

I want to take the reduce output as key/value pairs and pass them to a
new reduce as its input key and input value.

Is there any Map-Reduce-Reduce kind of method?

Thanks to all.

Re: 2 Reduce method in one Job

Posted by Harsh J <ha...@cloudera.com>.

Yes, just use an identity mapper (in the new API, the base Mapper class
itself identity-maps; in the old API, use the IdentityMapper class) and set
the second job's input path to the output path of the first job.

If you end up doing more of this step-wise job chaining,
consider using Apache Oozie's workflow system.
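
A minimal sketch of the two-job chain described above, using the new
(org.apache.hadoop.mapreduce) API. The class names TwoReduceChain,
TokenMapper, CountReducer and GroupReducer, the word-count workload, and the
three path arguments are assumptions made up for illustration; nothing in
this thread prescribes them. It needs the Hadoop client libraries on the
classpath, and on older 1.x releases `new Job(conf, name)` takes the place of
`Job.getInstance(conf, name)`.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Hypothetical driver: job 1 = map + reduce, job 2 = identity map + reduce.
public class TwoReduceChain {

    // Job 1 mapper (illustrative): emits (word, 1) for each token in a line.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            for (String token : line.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Job 1 reducer: sums the ones and emits (count, word), so the count
    // becomes the key that job 2 groups on.
    public static class CountReducer extends Reducer<Text, IntWritable, IntWritable, Text> {
        @Override
        protected void reduce(Text word, Iterable<IntWritable> ones, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable one : ones) {
                sum += one.get();
            }
            context.write(new IntWritable(sum), word);
        }
    }

    // Job 2 reducer: sees all words sharing a count and emits (count, how many words).
    public static class GroupReducer extends Reducer<IntWritable, Text, IntWritable, IntWritable> {
        @Override
        protected void reduce(IntWritable count, Iterable<Text> words, Context context)
                throws IOException, InterruptedException {
            int n = 0;
            for (Text ignored : words) {
                n++;
            }
            context.write(count, new IntWritable(n));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path(args[0]);
        Path intermediate = new Path(args[1]); // output of job 1, input of job 2
        Path output = new Path(args[2]);

        Job first = Job.getInstance(conf, "reduce pass 1");
        first.setJarByClass(TwoReduceChain.class);
        first.setMapperClass(TokenMapper.class);
        first.setReducerClass(CountReducer.class);
        first.setMapOutputKeyClass(Text.class);
        first.setMapOutputValueClass(IntWritable.class);
        first.setOutputKeyClass(IntWritable.class);
        first.setOutputValueClass(Text.class);
        // SequenceFile keeps the key/value types intact for the next job.
        first.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileInputFormat.addInputPath(first, input);
        FileOutputFormat.setOutputPath(first, intermediate);
        if (!first.waitForCompletion(true)) {
            System.exit(1);
        }

        Job second = Job.getInstance(conf, "reduce pass 2");
        second.setJarByClass(TwoReduceChain.class);
        second.setInputFormatClass(SequenceFileInputFormat.class);
        second.setMapperClass(Mapper.class); // base Mapper passes records through unchanged
        second.setReducerClass(GroupReducer.class);
        second.setMapOutputKeyClass(IntWritable.class);
        second.setMapOutputValueClass(Text.class);
        second.setOutputKeyClass(IntWritable.class);
        second.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(second, intermediate);
        FileOutputFormat.setOutputPath(second, output);
        System.exit(second.waitForCompletion(true) ? 0 : 1);
    }
}
```

The intermediate directory is still written to HDFS between the two jobs;
the identity mapper only saves you from writing a second map function.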

On Sun, Mar 24, 2013 at 7:23 PM, Fatih Haltas <fa...@nyu.edu> wrote:
> Thank you very much.
>
> You are right, Harsh, that is exactly what I am trying to do.
>
> I want to process my result according to the keys, and I do not want to
> spend time writing this data to HDFS; I want to pass the data as input to
> another reduce.
>
> One more question, then: if I create two different jobs, where the second
> one has only a reduce, for example, is it possible to pass the first job's
> output as an argument to the second job?
>
>
> On Sun, Mar 24, 2013 at 5:44 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> You seem to want to re-sort/partition your data without materializing
>> it onto HDFS.
>>
>> Azuryy is right: there isn't a way to do this right now, and a second job
>> (with an identity mapper) is necessary. With YARN, though, this becomes
>> more feasible to implement in the project.
>>
>> The newly inducted incubator project Tez sorta targets this. It's in
>> its nascent stages, though (for general user use), and the website
>> should hopefully appear at
>> http://incubator.apache.org/projects/tez.html soon. Meanwhile, you can
>> read the proposal behind this project at
>> http://wiki.apache.org/incubator/TezProposal. Initial sources are at
>> https://svn.apache.org/repos/asf/incubator/tez/trunk/.
>>
>> On Sun, Mar 24, 2013 at 6:33 PM, Fatih Haltas <fa...@nyu.edu>
>> wrote:
>> > I want to get reduce output as key and value then I want to pass them to a
>> > new reduce as input key and input value.
>> >
>> > So is there any Map-Reduce-Reduce kind of method?
>> >
>> > Thanks to all.
>>
>>
>>
>> --
>> Harsh J
>
>



--
Harsh J

Re: 2 Reduce method in one Job

Posted by Fatih Haltas <fa...@nyu.edu>.

Thank you very much.

You are right, Harsh, that is exactly what I am trying to do.

I want to process my result according to the keys, and I do not want to
spend time writing this data to HDFS; I want to pass the data as input to
another reduce.

One more question, then: if I create two different jobs, where the second
one has only a reduce, for example, is it possible to pass the first job's
output as an argument to the second job?

On Sun, Mar 24, 2013 at 5:44 PM, Harsh J <ha...@cloudera.com> wrote:

> You seem to want to re-sort/partition your data without materializing
> it onto HDFS.
>
> Azuryy is right: there isn't a way to do this right now, and a second job
> (with an identity mapper) is necessary. With YARN, though, this becomes
> more feasible to implement in the project.
>
> The newly inducted incubator project Tez sorta targets this. It's in
> its nascent stages, though (for general user use), and the website
> should hopefully appear at
> http://incubator.apache.org/projects/tez.html soon. Meanwhile, you can
> read the proposal behind this project at
> http://wiki.apache.org/incubator/TezProposal. Initial sources are at
> https://svn.apache.org/repos/asf/incubator/tez/trunk/.
>
> On Sun, Mar 24, 2013 at 6:33 PM, Fatih Haltas <fa...@nyu.edu>
> wrote:
> > I want to get reduce output as key and value then I want to pass them to a
> > new reduce as input key and input value.
> >
> > So is there any Map-Reduce-Reduce kind of method?
> >
> > Thanks to all.
>
>
>
> --
> Harsh J
>

Re: 2 Reduce method in one Job

Posted by Harsh J <ha...@cloudera.com>.

You seem to want to re-sort/partition your data without materializing
it onto HDFS.

Azuryy is right: there isn't a way to do this right now, and a second job
(with an identity mapper) is necessary. With YARN, though, this becomes
more feasible to implement in the project.
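
To see why the second pass needs its own sort/partition step, here is a small
in-memory model in plain Java (no Hadoop involved). Everything in it is a
made-up illustration: the class ReduceReduceModel, the helper names, and the
word-count workload. The first reduce emits records under a new key, and those
records have to be regrouped by that key before a second reduce can see them
together, which is the work the second job's shuffle does.

```java
import java.util.*;

// Toy model of map -> shuffle -> reduce -> (identity map) -> shuffle -> reduce.
class ReduceReduceModel {

    // Shuffle: group values by key, with keys in sorted order.
    static <K extends Comparable<K>, V> SortedMap<K, List<V>> shuffle(List<Map.Entry<K, V>> pairs) {
        SortedMap<K, List<V>> grouped = new TreeMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // First reduce: (word, [1, 1, ...]) -> (count, word). The output key changes.
    static List<Map.Entry<Integer, String>> firstReduce(SortedMap<String, List<Integer>> groups) {
        List<Map.Entry<Integer, String>> out = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            int sum = 0;
            for (int one : group.getValue()) {
                sum += one;
            }
            out.add(new AbstractMap.SimpleEntry<>(sum, group.getKey()));
        }
        return out;
    }

    // Second reduce: (count, [words]) -> (count, number of words with that count).
    static SortedMap<Integer, Integer> secondReduce(SortedMap<Integer, List<String>> groups) {
        SortedMap<Integer, Integer> out = new TreeMap<>();
        for (Map.Entry<Integer, List<String>> group : groups.entrySet()) {
            out.put(group.getKey(), group.getValue().size());
        }
        return out;
    }

    static SortedMap<Integer, Integer> run(List<String> words) {
        // Map phase: emit (word, 1) for every word.
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String word : words) {
            mapped.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        List<Map.Entry<Integer, String>> firstOut = firstReduce(shuffle(mapped));
        // Identity map: firstOut passes through unchanged. The second shuffle
        // is what regroups the records by their new key (the count).
        return secondReduce(shuffle(firstOut));
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a", "b", "a", "c", "b", "d"))); // prints {1=2, 2=2}
    }
}
```

Here "a" and "b" each occur twice and "c" and "d" once, so the second reduce
reports two words with count 1 and two words with count 2.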

The newly inducted incubator project Tez sorta targets this. It's in
its nascent stages, though (for general user use), and the website
should hopefully appear at
http://incubator.apache.org/projects/tez.html soon. Meanwhile, you can
read the proposal behind this project at
http://wiki.apache.org/incubator/TezProposal. Initial sources are at
https://svn.apache.org/repos/asf/incubator/tez/trunk/.

On Sun, Mar 24, 2013 at 6:33 PM, Fatih Haltas <fa...@nyu.edu> wrote:
> I want to get reduce output as key and value then I want to pass them to a
> new reduce as input key and input value.
>
> So is there any Map-Reduce-Reduce kind of method?
>
> Thanks to all.

-- 
Harsh J

Re: 2 Reduce method in one Job

Posted by Azuryy Yu <az...@gmail.com>.

There is no such method; you have to submit another MapReduce job.
On Mar 24, 2013 9:03 PM, "Fatih Haltas" <fa...@nyu.edu> wrote:

> I want to get reduce output as key and value then I want to pass them to a
> new reduce as input key and input value.
>
> So is there any Map-Reduce-Reduce kind of method?
>
> Thanks to all.
>
