Posted to common-user@hadoop.apache.org by Gayatri Rao <rg...@gmail.com> on 2012/04/23 07:30:48 UTC
Problem with using BinSedesTuple as Mapper key
Hello,
I am using BinSedesTuple as a mapper key to emit a tuple of values, but
identical keys do not reach the same reducer, so the values are not
aggregated.
Is it not recommended as a mapper key?
For example, in my mapper I emit:
Mapper:
Output key : BinSedesTuple value: int
Example key construction:
tuple.append(url);
tuple.append(category);
Reducer:
Input key: BinSedesTuple value: int
Output key: Text value: int
Example output:
url1 category1 3
url1 category1 2
In the reducer output, the same key appears more than once. My
expected output is
url1 category1 5
Any ideas what might be wrong?
Thanks,
Gayatri
Re: Problem with using BinSedesTuple as Mapper key
Posted by Harsh J <ha...@cloudera.com>.
Per http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/data/BinSedesTuple.html
(from where you may have picked up BinSedesTuple), it's advised not to
use this class directly outside of Pig.
You're better off implementing your own Writable, or using a
serialization library that provides similar structures and much more,
such as Apache Avro:
http://avro.apache.org/docs/1.6.3/api/java/org/apache/avro/mapred/package-summary.html#package_description
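For the "own Writable" route, here is a minimal sketch of what such a key
could look like. The class name UrlCategoryKey, its two fields and the
helpers partitionFor and roundTrip are hypothetical, made up for this
example. In a real job the class would implement
org.apache.hadoop.io.WritableComparable; the Hadoop interface is left out
so that the sketch compiles with the JDK alone. The point it illustrates:
the default HashPartitioner picks a reducer from key.hashCode(), and the
reduce side groups keys by their sort order, so hashCode(), equals(),
compareTo() and the serialized form all have to agree for equal keys.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

// Hypothetical composite key for the (url, category) pair in the question.
// In a real job: implements org.apache.hadoop.io.WritableComparable<UrlCategoryKey>.
class UrlCategoryKey implements Comparable<UrlCategoryKey> {
    private String url = "";
    private String category = "";

    // Hadoop instantiates keys reflectively, so a no-arg constructor is needed.
    UrlCategoryKey() {}

    UrlCategoryKey(String url, String category) {
        this.url = url;
        this.category = category;
    }

    // Same shape as Writable.write(DataOutput): fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeUTF(category);
    }

    // Same shape as Writable.readFields(DataInput): read in the order written.
    public void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        category = in.readUTF();
    }

    // Sort order: by url, then by category. Returns 0 only for equal keys.
    @Override
    public int compareTo(UrlCategoryKey other) {
        int c = url.compareTo(other.url);
        return c != 0 ? c : category.compareTo(other.category);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UrlCategoryKey)) return false;
        UrlCategoryKey k = (UrlCategoryKey) o;
        return url.equals(k.url) && category.equals(k.category);
    }

    // Derived from the field values only, so it is stable across JVMs.
    @Override
    public int hashCode() {
        return Objects.hash(url, category);
    }

    @Override
    public String toString() {
        return url + "\t" + category;
    }

    // Same arithmetic as Hadoop's default HashPartitioner.getPartition.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Serializes and deserializes a key, as happens between map and reduce.
    static UrlCategoryKey roundTrip(UrlCategoryKey key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            key.write(new DataOutputStream(bytes));
            UrlCategoryKey copy = new UrlCategoryKey();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a key like this, two map outputs for ("url1", "category1") get the
same partition number and compare as equal, so their values arrive in a
single reduce call. Whether a mismatch in one of these methods is what
breaks BinSedesTuple outside of Pig is not established in this thread.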
--
Harsh J
Re: Problem with using BinSedesTuple as Mapper key
Posted by Pere Ferrera <fe...@gmail.com>.
Hi Gayatri,
Looks like you might want to use a low-level enhancement of the default
Hadoop API called Pangool (http://pangool.net) which uses tuples and
simplifies grouping by, sorting by and joining datasets in Hadoop.