Posted to user@pig.apache.org by Binal Jhaveri <bi...@gmail.com> on 2015/11/12 23:23:23 UTC

Merge more than 127 map-reduce jobs not supported

Hi All,

I am trying to group 135 relations on a common parameter (id) but I am
getting an error.

ERROR 1082: Merge more than 127 map-reduce jobs not supported.

My initial error was "ERROR 1082: Cogroups with more than 127 inputs not
supported", which I resolved by splitting the group clause.

Now I get the "Merge more than 127 map-reduce jobs not supported" error when
I try to merge the split jobs.
Below is the code I am using:

groupAllCat1 = GROUP
    r1 BY id, r2 BY id, .....;  -- 65 such relations

groupAllCat2 = GROUP
    x1 BY id, x2 BY id, ....;  -- 70 such relations

mergedAllCat1 = FOREACH groupAllCat1 GENERATE FLATTEN(group) AS id
    , FLATTEN(EmptyBagToNull(r1.c1)) AS r1c1
    , FLATTEN(EmptyBagToNull(r2.c1)) AS r2c1
    , .....;

mergedAllCat2 = FOREACH groupAllCat2 GENERATE FLATTEN(group) AS id
    , FLATTEN(EmptyBagToNull(x1.c1)) AS x1c1
    , FLATTEN(EmptyBagToNull(x2.c1)) AS x2c1
    , .....;

mergedAll = GROUP mergedAllCat1 BY id, mergedAllCat2 BY id;

My end goal is to produce one row per id with all the fields corresponding
to the id in one single row.

Please advise.

Thanks,
Binal

Re: Merge more than 127 map-reduce jobs not supported

Posted by Vitalii Tymchyshyn <vi...@tym.im>.

Well, I think you should be able to store the intermediate results to force
a multi-level job to be generated.
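
A minimal sketch of what storing the intermediate results might look like,
assuming the aliases mergedAllCat1 and mergedAllCat2 from the original post.
The output paths under tmp/, the reload aliases cat1 and cat2, and the
chararray field types are assumptions for illustration only; the schemas
would need to list every stored field. The exec statement with no arguments
is described in the Pig multi-query documentation as a way to trigger
execution of the preceding statements in a batch script. Whether this keeps
the plan under the 127-job check on a given Pig version has not been
verified here.

```pig
-- Stage 1: materialize each partial result so it ends its own job chain.
STORE mergedAllCat1 INTO 'tmp/mergedAllCat1' USING PigStorage('\t');  -- path is an assumption
STORE mergedAllCat2 INTO 'tmp/mergedAllCat2' USING PigStorage('\t');  -- path is an assumption

-- Run everything above before the statements below are planned.
exec;

-- Stage 2: reload the stored results and combine them on id.
-- Field types are assumed; the schema must list every stored field.
cat1 = LOAD 'tmp/mergedAllCat1' USING PigStorage('\t')
       AS (id:chararray, r1c1:chararray, r2c1:chararray);
cat2 = LOAD 'tmp/mergedAllCat2' USING PigStorage('\t')
       AS (id:chararray, x1c1:chararray, x2c1:chararray);

mergedAll = GROUP cat1 BY id, cat2 BY id;
```

Running stage 2 as a separate script after stage 1 has finished would be
another way to get the same separation.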

On Thu, Nov 19, 2015 at 22:27, Binal Jhaveri <bi...@gmail.com> wrote:

> Hi Daniel/Arvind,
>
> Thanks for the suggestions !
>
> I understand the check is there for a long time. Is there a way to get
> around this ? Considering the cluster does not have Tez.
>
> Thanks !

Re: Merge more than 127 map-reduce jobs not supported

Posted by Arvind S <ar...@gmail.com>.

Please give details on:
- the data sample you are using as input
- a sample of the expected output
- the volume and number of files you wish to process
- the cluster (version, nodes, CPU, RAM, etc.)
- any other limitations or restrictions in the environment


*Cheers !!*
Arvind

On Fri, Nov 20, 2015 at 8:57 AM, Binal Jhaveri <bi...@gmail.com> wrote:

> Hi Daniel/Arvind,
>
> Thanks for the suggestions !
>
> I understand the check is there for a long time. Is there a way to get
> around this ? Considering the cluster does not have Tez.
>
> Thanks !

Re: Merge more than 127 map-reduce jobs not supported

Posted by Binal Jhaveri <bi...@gmail.com>.

Hi Daniel/Arvind,

Thanks for the suggestions !

I understand the check has been there for a long time. Is there a way to get
around it, considering that the cluster does not have Tez?

Thanks !

On Thu, Nov 19, 2015 at 5:27 PM, Daniel Dai <da...@hortonworks.com> wrote:

> This check is there for a very long time. Not sure why you just saw it
> recently.
>
> Yes, try to run it on tez.
>
> Thanks,
> Daniel

Re: Merge more than 127 map-reduce jobs not supported

Posted by Daniel Dai <da...@hortonworks.com>.

This check has been there for a very long time. Not sure why you only ran
into it recently.

Yes, try running it on Tez.

Thanks,
Daniel

On 11/17/15, 8:49 PM, "Arvind S" <ar...@gmail.com> wrote:

>not faced this issue till now..
>
>suggestions
>> could be related to number of slots you have on your cluster .. if you
>are not executing in local mode
>> try tez based executor if you have the option. .. launch using "pig -x
>tez"
>
>
>
>*Cheers !!*
>Arvind

Re: Merge more than 127 map-reduce jobs not supported

Posted by Arvind S <ar...@gmail.com>.

I have not faced this issue so far.

Suggestions:
- It could be related to the number of slots you have on your cluster, if
  you are not executing in local mode.
- Try the Tez-based executor if you have the option; launch it using
  "pig -x tez".



*Cheers !!*
Arvind

On Fri, Nov 13, 2015 at 3:53 AM, Binal Jhaveri <bi...@gmail.com> wrote:

> Hi All,
>
> I am trying to group 135 relations on a common parameter (id) but I am
> getting an error.
>
> ERROR 1082: Merge more than 127 map-reduce jobs not supported.
>
> My initial error was ERROR 1082: Cogroups with more than 127 inputs not
> supported which I resolved by splitting the group clause.
>
> Now I get the map-reduce jobs not supported error when I try to merge the
> split jobs.
> Below is the code I am using:
>
> groupAllCat1 = GROUP
> r1 BY id, r2 BY id, .....; (65 such relations)
>
> groupAllCat2 = GROUP
> x1 BY id, x2 BY id, ....; (70 such relations)
>
> mergedAllCat1 = FOREACH groupAllCat1 GENERATE FLATTEN(group) AS id
> , FLATTEN(EmptyBagToNull(r1.c1)) AS r1c1
> , FLATTEN(EmptyBagToNull(r2.c1)) AS r2c1
> ,.....;
>
> mergedAllCat2 = FOREACH groupAllCat2 GENERATE FLATTEN(group) AS id
> , FLATTEN(EmptyBagToNull(x1.c1)) AS x1c1
> , FLATTEN(EmptyBagToNull(x2.c1)) AS x2c1
> , .....;
>
> mergedAll = GROUP mergedAllCat1 BY id, mergedAllCat2 BY id;
>
> My end goal is to produce one row per id with all the fields corresponding
> to the id in one single row.
>
> Please advise.
>
> Thanks,
> Binal
>