Posted to user@pig.apache.org by Eduardo Afonso Ferreira <ea...@yahoo.com> on 2011/09/06 18:50:17 UTC

Union of multiple loads using HBaseStorage not working as expected.

Hi there,

We hit a possible issue with Pig (version 0.9.1) and HBaseStorage when we LOAD multiple sets of data and UNION them. Here's a simple example that shows the problem:

HBase Data (use hbase shell to create table and add rows):


create 'test', {NAME => 'data', VERSIONS => 1}

put 'test', '11111', 'data:value', '1'
put 'test', '11112', 'data:value', '2'
put 'test', '11113', 'data:value', '3'
put 'test', '22221', 'data:value', '4'
put 'test', '22222', 'data:value', '5'
put 'test', '22223', 'data:value', '6'

Pig Statements (create file test.pig):

load1 = LOAD 'hbase://test' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('data:*','-loadKey -gte 11110 -lte 22220') AS (key:chararray, map:map[]);
load2 = LOAD 'hbase://test' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('data:*','-loadKey -gte 22220 -lte 33330') AS (key:chararray, map:map[]);
result = UNION load1, load2;
dump result;
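To make the intended semantics concrete, here is a small Python sketch (not Pig) that models each LOAD as a row-key range scan and the UNION as a concatenation. It assumes, as the option names suggest, that -gte/-lte are inclusive bounds on the row key; for this data inclusivity does not matter, since no row key equals 11110, 22220 or 33330. The helper scan_range is hypothetical and only illustrates which rows each LOAD should return.

```python
# Rows from the hbase shell session above: row key -> value of data:value
ROWS = {
    "11111": "1",
    "11112": "2",
    "11113": "3",
    "22221": "4",
    "22222": "5",
    "22223": "6",
}


def scan_range(rows, gte, lte):
    """Hypothetical model of one LOAD: keep rows whose key lies in [gte, lte].

    HBase orders row keys as bytes; for these equal-length ASCII-digit keys,
    Python string comparison yields the same order.
    """
    return [(key, {"value": rows[key]}) for key in sorted(rows) if gte <= key <= lte]


load1 = scan_range(ROWS, "11110", "22220")  # '-loadKey -gte 11110 -lte 22220'
load2 = scan_range(ROWS, "22220", "33330")  # '-loadKey -gte 22220 -lte 33330'

# Pig's UNION keeps duplicates and does not guarantee output order;
# a plain list concatenation is the simplest model of that.
result = load1 + load2

for key, cols in result:
    print(f"({key},[value#{cols['value']}])")
```

Run against the six rows above, this prints the six tuples listed as the expected result.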


Run Script:
pig -x local test.pig


Result:
(11111,[value#1])
(11112,[value#2])
(11113,[value#3])
(11111,[value#1])
(11112,[value#2])
(11113,[value#3])



The result should be the following:
(11111,[value#1])
(11112,[value#2])
(11113,[value#3])
(22221,[value#4])
(22222,[value#5])
(22223,[value#6])

If we dump load1 or load2 on its own, we get the rows we expect. When the two are combined with UNION, however, the output contains load1's rows twice and none of load2's rows.
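Because UNION does not guarantee output order, comparing the dumped tuples as a bag (multiset) is a more reliable check than comparing line by line. The Python sketch below (the helper diff_bags is hypothetical) does that for the output reported here and pins down the symptom: each of load1's rows appears once too often, and each of load2's rows is missing.

```python
from collections import Counter

# Output reported in this post, as (key, value) pairs
observed = [
    ("11111", "1"), ("11112", "2"), ("11113", "3"),
    ("11111", "1"), ("11112", "2"), ("11113", "3"),
]
# Output we expect: load1's rows plus load2's rows
expected = [
    ("11111", "1"), ("11112", "2"), ("11113", "3"),
    ("22221", "4"), ("22222", "5"), ("22223", "6"),
]


def diff_bags(observed, expected):
    """Compare two outputs as multisets, ignoring order.

    Returns (extra, missing): tuples seen more often than expected,
    and tuples seen less often than expected, with the count difference.
    """
    obs, exp = Counter(observed), Counter(expected)
    # Counter subtraction keeps only positive counts
    return dict(obs - exp), dict(exp - obs)


extra, missing = diff_bags(observed, expected)
print("extra:", sorted(extra.items()))
print("missing:", sorted(missing.items()))
```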

Is this a known issue with Pig/HBaseStorage, or are we not using them correctly?
If it's a usage problem, what would be the proper way to load multiple sets of data and union them?

Thanks in advance.
Eduardo.

Re: Union of multiple loads using HBaseStorage not working as expected.

Posted by Eduardo Afonso Ferreira <ea...@yahoo.com>.
Hey, Dmitriy,

We built from code we got from the 0.9 branch a couple of weeks ago.

But we just built from trunk, and now it works as expected.

Thanks for the help.
Eduardo.



________________________________
From: Dmitriy Ryaboy <dv...@gmail.com>
To: user@pig.apache.org; Eduardo Afonso Ferreira <ea...@yahoo.com>
Sent: Tuesday, September 6, 2011 12:56 PM
Subject: Re: Union of multiple loads using HBaseStorage not working as expected.


Hi Eduardo, there is no 0.9.1. Do you mean you built it from the 0.9 branch?
Could you try trunk?



Re: Union of multiple loads using HBaseStorage not working as expected.

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Hi Eduardo, there is no 0.9.1. Do you mean you built it from the 0.9 branch?
Could you try trunk?