Posted to mapreduce-user@hadoop.apache.org by Amit Sela <am...@infolinks.com> on 2014/01/12 17:25:58 UTC

manipulating key in combine phase

Hi all,

I was wondering if it is possible to manipulate the key during combine:

Say I have a mapreduce job where the key has many qualifiers.
I would like to "split" the key into two (or more) keys if it has more
than, say, 100 qualifiers.
In the combiner class I would do something like:

int count = 0;
for (Writable value : values) {
  if (++count >= 100) {
    context.write(newKey, value);
  } else {
    context.write(key, value);
  }
}

where newKey is something like key+randomUUID

I know that the combiner can be called "zero, once or more..." and I'm
getting strange results (the same key written more than once), so I would
be glad to get some deeper insight into how the combiner works.

Thanks,

Amit.
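
To see why this snippet can yield the "same key written more than once", here is a plain-Java simulation of the split (not Hadoop code; key and value types are simplified to String and Integer, and the method names are hypothetical): each combiner invocation mints a fresh salted key, so one logical key fans out into several reducer keys that all share the original key as a prefix.

```java
import java.util.*;

public class KeySplit {
    // Mirror of the combiner snippet above: values past the limit go to a
    // salted key built as "key + randomUUID".
    static Map<String, List<Integer>> split(String key, List<Integer> values, int limit) {
        String newKey = key + "#" + UUID.randomUUID();
        Map<String, List<Integer>> out = new LinkedHashMap<>();
        int count = 0;
        for (int v : values) {
            // first (limit - 1) values keep the original key, the rest move
            String target = (++count >= limit) ? newKey : key;
            out.computeIfAbsent(target, k -> new ArrayList<>()).add(v);
        }
        return out;
    }
}
```

Because the combiner may run on several mappers (and on several spills of the same mapper), each run produces a different UUID, so the reducer receives the original key plus an unpredictable number of salted variants of it.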

Re: manipulating key in combine phase

Posted by Devin Suiter RDX <ds...@rdx.com>.
I believe the combine process happens after that step, so, no.

What comes out of a mapper is a set of records {k1, v1} {k1, v2} {k1, v(n)}
{k2, v1} {k2, v2} {k2, v(n)}; the reducers then aggregate that into arrays
like {k1, {v1, v2, v(n)}}, {k2, {v1, v2, v(n)}} and perform logic on the
value set for each unique key.

What comes out of a combiner is {k1, {v1, v2, v(n)}}, {k2, {v1, v2, v(n)}},
the same {k, v} map that the reducer builds, and then the reducer does the
logic on the value set for each unique key.

If you change the key in the combiner, you aren't working with the same
set, so you've essentially used your combiner as another mapper. But
your method signature won't be right.

The combiner is designed solely to reduce network traffic from mappers to
reducers; since there are usually more mappers than reducers, it reduces
bottlenecking at switches.

If you want to change the key after you've set it, I feel like you
should use ChainMapper and/or write custom input/output format classes if
you need to.

*Devin Suiter*
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com


On Mon, Jan 13, 2014 at 12:39 PM, Amit Sela <am...@infolinks.com> wrote:

> More than a solution, I'd like to know: is a combiner allowed to change
> the key? Will it interfere with the mapper's sort/merge?
>
>
> On Mon, Jan 13, 2014 at 3:06 PM, Devin Suiter RDX <ds...@rdx.com> wrote:
>
>> Amit,
>>
>> Have you explored chainMapper class?
>>
>> *Devin Suiter*
>> Jr. Data Solutions Software Engineer
>> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
>> Google Voice: 412-256-8556 | www.rdx.com
>>
>>
>> On Sun, Jan 12, 2014 at 7:28 PM, John Lilley <jo...@redpoint.net> wrote:
>>
>>>  Isn’t this is what you’d normally do in the Mapper?
>>>
>>> My understanding of the combiner is that it is like a “mapper-side
>>> pre-reducer” and operates on blocks of data that have already been sorted
>>> by key, so mucking with the keys doesn’t **seem** like a good idea.
>>>
>>> john
>>>
>>>
>>>
>>> *From:* Amit Sela [mailto:amits@infolinks.com]
>>> *Sent:* Sunday, January 12, 2014 9:26 AM
>>> *To:* user@hadoop.apache.org
>>> *Subject:* manipulating key in combine phase
>>>
>>>
>>>
>>> Hi all,
>>>
>>>
>>>
>>> I was wondering if it is possible to manipulate the key during combine:
>>>
>>>
>>>
>>> Say I have a mapreduce job where the key has many qualifiers.
>>>
>>> I would like to "split" the key into two (or more) keys if it has more
>>> than, say 100 qualifiers.
>>>
>>> In the combiner class I would do something like:
>>>
>>>
>>>
>>> int count = 0;
>>>
>>> for (Writable value: values) {
>>>
>>>   if (++count >= 100){
>>>
>>>     context.write(newKey, value);
>>>
>>>   } else {
>>>
>>>     context.write(key, value);
>>>
>>>   }
>>>
>>> }
>>>
>>>
>>>
>>> where newKey is something like key+randomUUID
>>>
>>>
>>>
>>> I know that the combiner can be called "zero, once or more..." and I'm
>>> getting strange results (same key written more then once) so I would be
>>> glad to get some deeper insight into how the combiner works.
>>>
>>>
>>>
>>> Thanks,
>>>
>>>
>>>
>>> Amit.
>>>
>>
>>
>
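
The local pre-reduce behaviour Devin describes can be sketched outside Hadoop in plain Java (keys simplified to String, values to Integer counts; the class and method names are hypothetical, not Hadoop API): each mapper's raw output is grouped and collapsed to one partial sum per key, and the final reduce gets the same totals whether or not the combine step ran, just from fewer shuffled records.

```java
import java.util.*;

public class LocalCombine {
    // Group one mapper's raw (key, value) pairs by key, as the combiner sees them.
    static Map<String, List<Integer>> group(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs)
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        return grouped;
    }

    // Combine: one partial sum per key per mapper -- fewer records to shuffle.
    // Keys are preserved; only the value list is collapsed.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> pairs) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        group(pairs).forEach((k, vs) ->
            out.add(Map.entry(k, vs.stream().mapToInt(Integer::intValue).sum())));
        return out;
    }
}
```

This only works because the operation is associative and commutative and the combiner leaves keys untouched; a key-changing combiner breaks the "same result with or without combining" property.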

Re: manipulating key in combine phase

Posted by Amit Sela <am...@infolinks.com>.
More than a solution, I'd like to know: is a combiner allowed to change
the key? Will it interfere with the mapper's sort/merge?


On Mon, Jan 13, 2014 at 3:06 PM, Devin Suiter RDX <ds...@rdx.com> wrote:

> Amit,
>
> Have you explored chainMapper class?
>
> *Devin Suiter*
> Jr. Data Solutions Software Engineer
> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> Google Voice: 412-256-8556 | www.rdx.com
>
>
> On Sun, Jan 12, 2014 at 7:28 PM, John Lilley <jo...@redpoint.net> wrote:
>
>>  Isn’t this is what you’d normally do in the Mapper?
>>
>> My understanding of the combiner is that it is like a “mapper-side
>> pre-reducer” and operates on blocks of data that have already been sorted
>> by key, so mucking with the keys doesn’t **seem** like a good idea.
>>
>> john
>>
>>
>>
>> *From:* Amit Sela [mailto:amits@infolinks.com]
>> *Sent:* Sunday, January 12, 2014 9:26 AM
>> *To:* user@hadoop.apache.org
>> *Subject:* manipulating key in combine phase
>>
>>
>>
>> Hi all,
>>
>>
>>
>> I was wondering if it is possible to manipulate the key during combine:
>>
>>
>>
>> Say I have a mapreduce job where the key has many qualifiers.
>>
>> I would like to "split" the key into two (or more) keys if it has more
>> than, say 100 qualifiers.
>>
>> In the combiner class I would do something like:
>>
>>
>>
>> int count = 0;
>>
>> for (Writable value: values) {
>>
>>   if (++count >= 100){
>>
>>     context.write(newKey, value);
>>
>>   } else {
>>
>>     context.write(key, value);
>>
>>   }
>>
>> }
>>
>>
>>
>> where newKey is something like key+randomUUID
>>
>>
>>
>> I know that the combiner can be called "zero, once or more..." and I'm
>> getting strange results (same key written more then once) so I would be
>> glad to get some deeper insight into how the combiner works.
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Amit.
>>
>
>
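
One concrete way a key-changing combiner interferes with the sort/merge: by the time the combiner runs, each record has already been assigned a reducer partition based on its original key, and the combiner's output stays in that partition. A simplified sketch of the default HashPartitioner logic (plain Java, not the Hadoop class itself) shows that a salted key would usually belong somewhere else:

```java
import java.util.*;

public class Partitioning {
    // Simplified form of Hadoop's default HashPartitioner:
    // partition = (key.hashCode() & Integer.MAX_VALUE) % numReducers
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }
}
```

So records rewritten to key+randomUUID remain in the original key's partition even though the partitioner would have routed them elsewhere, and salted variants produced by different mappers never regroup on one reducer.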

Re: manipulating key in combine phase

Posted by Devin Suiter RDX <ds...@rdx.com>.
Amit,

Have you explored the ChainMapper class?

*Devin Suiter*
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com


On Sun, Jan 12, 2014 at 7:28 PM, John Lilley <jo...@redpoint.net> wrote:

>  Isn’t this is what you’d normally do in the Mapper?
>
> My understanding of the combiner is that it is like a “mapper-side
> pre-reducer” and operates on blocks of data that have already been sorted
> by key, so mucking with the keys doesn’t **seem** like a good idea.
>
> john
>
>
>
> *From:* Amit Sela [mailto:amits@infolinks.com]
> *Sent:* Sunday, January 12, 2014 9:26 AM
> *To:* user@hadoop.apache.org
> *Subject:* manipulating key in combine phase
>
>
>
> Hi all,
>
>
>
> I was wondering if it is possible to manipulate the key during combine:
>
>
>
> Say I have a mapreduce job where the key has many qualifiers.
>
> I would like to "split" the key into two (or more) keys if it has more
> than, say 100 qualifiers.
>
> In the combiner class I would do something like:
>
>
>
> int count = 0;
>
> for (Writable value: values) {
>
>   if (++count >= 100){
>
>     context.write(newKey, value);
>
>   } else {
>
>     context.write(key, value);
>
>   }
>
> }
>
>
>
> where newKey is something like key+randomUUID
>
>
>
> I know that the combiner can be called "zero, once or more..." and I'm
> getting strange results (same key written more then once) so I would be
> glad to get some deeper insight into how the combiner works.
>
>
>
> Thanks,
>
>
>
> Amit.
>

RE: manipulating key in combine phase

Posted by John Lilley <jo...@redpoint.net>.
Isn't this what you'd normally do in the Mapper?
My understanding of the combiner is that it is like a "mapper-side pre-reducer" and operates on blocks of data that have already been sorted by key, so mucking with the keys doesn't *seem* like a good idea.
john

From: Amit Sela [mailto:amits@infolinks.com]
Sent: Sunday, January 12, 2014 9:26 AM
To: user@hadoop.apache.org
Subject: manipulating key in combine phase

Hi all,

I was wondering if it is possible to manipulate the key during combine:

Say I have a mapreduce job where the key has many qualifiers.
I would like to "split" the key into two (or more) keys if it has more than, say 100 qualifiers.
In the combiner class I would do something like:

int count = 0;
for (Writable value: values) {
  if (++count >= 100){
    context.write(newKey, value);
  } else {
    context.write(key, value);
  }
}

where newKey is something like key+randomUUID

I know that the combiner can be called "zero, once or more..." and I'm getting strange results (the same key written more than once), so I would be glad to get some deeper insight into how the combiner works.

Thanks,

Amit.
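
Following John's suggestion, the split can instead be done deterministically in the mapper. A plain-Java sketch of the chunking logic (hypothetical names; in a real job this loop would live in map() and emit each chunk via context.write):

```java
import java.util.*;

public class MapperSideSplit {
    // Split one key's qualifiers into chunks of at most `limit` entries,
    // each under a deterministic salted key built from the chunk index.
    static Map<String, List<String>> chunk(String key, List<String> qualifiers, int limit) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (int i = 0; i < qualifiers.size(); i += limit) {
            String salted = key + "#" + (i / limit);
            out.put(salted, qualifiers.subList(i, Math.min(i + limit, qualifiers.size())));
        }
        return out;
    }
}
```

Using a chunk index rather than a random UUID keeps the output keys deterministic, so reruns and speculatively executed task attempts produce the same keys instead of fresh salted variants.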
