Posted to user@pig.apache.org by Juan Martin Pampliega <jp...@gmail.com> on 2011/07/25 20:01:23 UTC
Merging multiple columns into 2 columns
I have data stored in an HBase table in the following format:
rowkey group_id:1 group_id:2 ... group_id:n
2fcab50712467eab4004583eb8fb7f89 1 0 1
085125e8f7cdc99fd91dbd7280373c5b 0 1 0
dd53e23487da03fd02396306d248cda0 2 1 0
where the column family group_id contains one column for each data set,
and each value is the number of times that the hash appears in that data
set.
I need to reformat the data and obtain the output in the following format:
hash group_id
2fcab50712467eab4004583eb8fb7f89 1
dd53e23487da03fd02396306d248cda0 1
dd53e23487da03fd02396306d248cda0 1
085125e8f7cdc99fd91dbd7280373c5b 2
dd53e23487da03fd02396306d248cda0 2
...
2fcab50712467eab4004583eb8fb7f89 n
Any ideas on how to achieve this? I'm really at a loss here.
Re: Merging multiple columns into 2 columns
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Sounds like you need a UDF that takes the map (hash) Pig returns when you
load a column family and returns a bag of column names, each column name
repeated as many times as the column's value indicates. You would then
flatten the result of this UDF.
D
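A minimal sketch of the UDF described above, written as a Pig Python (Jython) UDF. The name `expand_groups` and the output schema are hypothetical. The sketch assumes Pig hands the group_id family to the function as a dict keyed by column qualifier, and that each count converts cleanly with `int()`. HBase cells may arrive as raw bytes, so a cast in the Pig script may be needed. The `outputSchema` fallback exists only so the file can be run and tested outside Pig.

```python
# Hypothetical Pig Jython UDF: expands a column-family map into a bag.
# Assumption: Pig passes the group_id family as {qualifier: count}.
try:
    outputSchema  # provided by Pig when the file is registered as a UDF
except NameError:
    # Stand-in decorator so the file also runs outside Pig.
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


@outputSchema("groups:bag{t:tuple(group_id:chararray)}")
def expand_groups(columns):
    """Return a list of 1-field tuples (a Pig bag): each column name is
    repeated as many times as its count says."""
    bag = []
    if columns is None:
        return bag
    for column, count in columns.items():
        # Assumption: the count converts with int(); missing cells count as 0.
        n = int(count) if count is not None else 0
        for _ in range(n):
            bag.append((column,))
    return bag


if __name__ == "__main__":
    # Sample rows from the question, groups 1 and 2 only.
    rows = {
        "2fcab50712467eab4004583eb8fb7f89": {"1": 1, "2": 0},
        "085125e8f7cdc99fd91dbd7280373c5b": {"1": 0, "2": 1},
        "dd53e23487da03fd02396306d248cda0": {"1": 2, "2": 1},
    }
    # Emulate FOREACH ... GENERATE rowkey, FLATTEN(expand_groups(map)),
    # then a stable sort on the group id.
    flat = [(key, t[0]) for key, cols in rows.items()
            for t in expand_groups(cols)]
    for key, group in sorted(flat, key=lambda r: r[1]):
        print("%s %s" % (key, group))
```

Run standalone, the `__main__` block prints the group 1 and group 2 lines of the requested output in the same order. In the actual Pig script, ordering by the flattened group id would play the role of the final sort. Note that group ids compare as strings here, so "10" would sort before "2" unless they are cast to numbers first.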