Posted to user@pig.apache.org by Vincent Barat <vb...@ubikod.com> on 2011/05/23 16:58:39 UTC

HBaseStorage does not load all my regions!

Hi,

While testing a very simple Pig 0.8.0 script that counts the number of rows
in one of my HBase tables, I got a strange result: the number of rows
reported was only half of what it should have been (compared to a 'count'
done in the HBase shell).

It appears that the HBaseStorage loader loads only a single region of my
table.

Any idea? Is this a known regression?

Here is my script:

myRows = LOAD 'hbase://<my table>' USING 
org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:deviceid') AS 
(deviceid:chararray);
allRows = GROUP myRows ALL;
nbRows = FOREACH allRows GENERATE COUNT(myRows);
DUMP nbRows;


-- 

*Vincent BARAT, UBIKOD, CTO*


vbarat@ubikod.com <ma...@ubikod.com>  Mob +33 (0)6 15 41 15 18

UBIKOD Paris, c/o ESSEC VENTURES, Avenue Bernard Hirsch, 95021 
Cergy-Pontoise cedex, FRANCE, Tel +33 (0)1 34 43 28 89

UBIKOD Rennes, 10 rue Duhamel, 35000 Rennes, FRANCE, Tel. +33 (0)2 
99 65 69 13


www.ubikod.com <http://www.ubikod.com/>@ubikod 
<http://twitter.com/ubikod>

www.capptain.com <http://www.capptain.com/>@capptain_hq 
<http://twitter.com/capptain_hq>


IMPORTANT NOTICE – UBIKOD and CAPPTAIN are registered trademarks of 
UBIKOD S.A.R.L., all copyrights are reserved.  The contents of this 
email and attachments are confidential and may be subject to legal 
privilege and/or protected by copyright. Copying or communicating 
any part of it to others is prohibited and may be unlawful. If you 
are not the intended recipient you must not use, copy, distribute or 
rely on this email and should please return it immediately or notify 
us by telephone. At present the integrity of email across the 
Internet cannot be guaranteed. Therefore UBIKOD S.A.R.L. will not 
accept liability for any claims arising as a result of the use of 
this medium for transmissions by or to UBIKOD S.A.R.L.. UBIKOD 
S.A.R.L. may exercise any of its rights under relevant law, to 
monitor the content of all electronic communications. You should 
therefore be aware that this communication and any responses might 
have been monitored, and may be accessed by UBIKOD S.A.R.L. The 
views expressed in this document are that of the individual and may 
not necessarily constitute or imply its endorsement or 
recommendation by UBIKOD S.A.R.L. The content of this electronic 
mail may be subject to the confidentiality terms of a 
"Non-Disclosure Agreement" (NDA).


Re: HBaseStorage does not load all my regions!

Posted by Vincent Barat <vi...@gmail.com>.
You're right: I tested Pig 0.8.0 with HBase 0.20.6, not Pig 0.8.1
(very sorry).

On 24/05/11 07:44, Dmitriy Ryaboy wrote:
> You couldn't possibly have tested with Pig 0.8.1 successfully, as it
> does not work with HBase 0.20.6 at all. This issue should not show up
> if you use Pig 0.8.1 and HBase 0.90+.
>
> Upgrade HBase. The reason I decided this was an acceptable bump in a
> minor release was that 0.20.6 has a lot of scaling issues that have been
> fixed in 0.90; anyone running HBase in production should be upgrading
> immediately unless they really like manually rescuing regions.
Yes, we were planning to upgrade to HBase 0.90, but were blocked
because Pig 0.8.0 only works with HBase 0.20.6. Now that 0.8.1 is
out, we can upgrade.
> Of course, as Jameson suggested, you can also just turn off split
> combination in 0.8.0. The bug is not in either of the features, it's in
> how they interact, which is why we didn't catch it until it was too
> late :-(.
Thanks a lot for this clarification.

Regards,

Re: HBaseStorage does not load all my regions!

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
You couldn't possibly have tested with Pig 0.8.1 successfully, as it
does not work with HBase 0.20.6 at all. This issue should not show up
if you use Pig 0.8.1 and HBase 0.90+.

Upgrade HBase. The reason I decided this was an acceptable bump in a
minor release was that 0.20.6 has a lot of scaling issues that have been
fixed in 0.90; anyone running HBase in production should be upgrading
immediately unless they really like manually rescuing regions.

Of course, as Jameson suggested, you can also just turn off split
combination in 0.8.0. The bug is not in either of the features, it's in
how they interact, which is why we didn't catch it until it was too
late :-(.

D

On Mon, May 23, 2011 at 10:34 PM, Vincent Barat <vb...@ubikod.com> wrote:
> Actually, I tested Pig 0.8.1 a few days ago and I had the same issue.
>
> Furthermore, Pig 0.8.1 uses HBase 0.90, while Pig 0.8.0 uses HBase 0.20.6. I
> thought this difference was the reason why Pig 0.8.1 didn't load all of my
> data (I use HBase 0.20.6).
> So I went back to Pig 0.8.0 and discovered that the issue is also present
> in that version.
>
> I think that this bug makes Pig useless when working with HBase, and I'm
> very disappointed to see that the new HBase loader has such a bug!
>
> On 23/05/11 22:19, Dmitriy Ryaboy wrote:
>>
>> Please let us know if this still happens on 0.8.1 when split combination is
>> on.
>

Re: HBaseStorage does not load all my regions!

Posted by Vincent Barat <vb...@ubikod.com>.
Actually, I tested Pig 0.8.1 a few days ago and I had the same issue.

Furthermore, Pig 0.8.1 uses HBase 0.90, while Pig 0.8.0 uses HBase
0.20.6. I thought this difference was the reason why Pig 0.8.1
didn't load all of my data (I use HBase 0.20.6).
So I went back to Pig 0.8.0 and discovered that the issue is also
present in that version.

I think that this bug makes Pig useless when working with HBase, and
I'm very disappointed to see that the new HBase loader has such a bug!

On 23/05/11 22:19, Dmitriy Ryaboy wrote:
> Please let us know if this still happens on 0.8.1 when split combination is on.



Re: HBaseStorage does not load all my regions!

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
I believe we fixed this issue in 0.8.1 (for 0.8.0, the solution is
what Jameson suggests: turning off split combination completely).

Please let us know if this still happens on 0.8.1 when split combination is on.

D

On Mon, May 23, 2011 at 8:11 AM, Jameson Lopp <ja...@bronto.com> wrote:
> This sounds like a problem I also ran into a while back. I believe I solved
> it by setting:
>
> SET pig.splitCombination 'false';
>
> There may be a better way (turning off split combination feels like a bad
> thing to do), but it is the only thing that worked for me when I was seeing
> only partial data being loaded.
> --
> Jameson Lopp
> Software Engineer
> Bronto Software, Inc.
>
> On 05/23/2011 10:58 AM, Vincent Barat wrote:
>>
>> Hi,
>>
>> While testing a very simple Pig 0.8.0 script that counts the number of rows
>> in one of my HBase tables, I got a strange result: the number of rows
>> reported was only half of what it should have been (compared to a 'count'
>> done in the HBase shell).
>>
>> It appears that the HBaseStorage loader loads only a single region of my
>> table.
>>
>> Any idea? Is this a known regression?
>>
>> Here is my script:
>>
>> myRows = LOAD 'hbase://<my table>' USING
>> org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:deviceid') AS
>> (deviceid:chararray);
>> allRows = GROUP myRows ALL;
>> nbRows = FOREACH allRows GENERATE COUNT(myRows);
>> DUMP nbRows;
>>
>>
>

Re: HBaseStorage does not load all my regions!

Posted by Jameson Lopp <ja...@bronto.com>.
This sounds like a problem I also ran into a while back. I believe I solved it by setting:

SET pig.splitCombination 'false';

There may be a better way (turning off split combination feels like a bad thing to do), but it is
the only thing that worked for me when I was seeing only partial data being loaded.
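
For reference, a minimal sketch of how this setting might be combined with the script from the original post. It assumes the SET statement goes at the top of the script, before the LOAD; '<my table>' is the placeholder from the original post and has to be replaced with a real table name.

```pig
-- Workaround discussed in this thread for Pig 0.8.0: turn off split
-- combination, which is reported to interact badly with HBaseStorage.
-- Assumption: the SET statement is placed before the LOAD it should affect.
SET pig.splitCombination 'false';

-- '<my table>' is a placeholder, not a real table name.
myRows = LOAD 'hbase://<my table>' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:deviceid') AS
    (deviceid:chararray);

-- Group every row into a single bag and count its tuples.
allRows = GROUP myRows ALL;
nbRows = FOREACH allRows GENERATE COUNT(myRows);
DUMP nbRows;
```

The value printed by DUMP can then be compared with the result of 'count' in the HBase shell, as was done in the original post.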
--
Jameson Lopp
Software Engineer
Bronto Software, Inc.

On 05/23/2011 10:58 AM, Vincent Barat wrote:
> Hi,
>
> While testing a very simple Pig 0.8.0 script that counts the number of rows in one of my HBase
> tables, I got a strange result: the number of rows reported was only half of what it should have
> been (compared to a 'count' done in the HBase shell).
>
> It appears that the HBaseStorage loader loads only a single region of my table.
>
> Any idea? Is this a known regression?
>
> Here is my script:
>
> myRows = LOAD 'hbase://<my table>' USING
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:deviceid') AS (deviceid:chararray);
> allRows = GROUP myRows ALL;
> nbRows = FOREACH allRows GENERATE COUNT(myRows);
> DUMP nbRows;
>
>