Posted to user@pig.apache.org by "HAJIHASHEMI, ZAHRA (AG/1000)" <za...@monsanto.com> on 2012/10/24 19:34:31 UTC

RE: Load table sorted by key

Hi,

I have a table in HBase and want to load all of its records sorted by row key, which is an integer.
Here is my code:
library = LOAD 'discovery_rnaseq_library' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('A:COMMON_NAME A:SCIENTIFIC_NAME A:GENETIC_BACKGROUND A:TISSUE', '-loadKey true') as (id:int, COMMON_NAME:chararray, SCIENTIFIC_NAME:chararray, GENETIC_BACKGROUND:chararray, TISSUE:chararray);

I want the result sorted by id (int) in increasing order, but the rows come back like this:
1              soybean               Glycine max       A3525    leaf
10           soybean               Glycine max       A3525    root
100         soybean               Glycine max       A3244    root
101         soybean               Glycine max       80% A3525 + 20% Opal
...

I appreciate any help.

-Zara

This e-mail message may contain privileged and/or confidential information, and is intended to be received only by persons entitled
to receive such information. If you have received this e-mail in error, please notify the sender immediately. Please delete it and
all attachments from any servers, hard drives or any other media. Other use of this e-mail by you is strictly prohibited.

All e-mails and attachments sent and received are subject to monitoring, reading and archival by Monsanto, including its
subsidiaries. The recipient of this e-mail is solely responsible for checking for the presence of "Viruses" or other "Malware".
Monsanto, along with its subsidiaries, accepts no liability for any damage caused by any such code transmitted by or accompanying
this e-mail or any attachment.


The information contained in this email may be subject to the export control laws and regulations of the United States, potentially
including but not limited to the Export Administration Regulations (EAR) and sanctions regulations issued by the U.S. Department of
Treasury, Office of Foreign Asset Controls (OFAC).  As a recipient of this information you are obligated to comply with all
applicable U.S. export laws and regulations.

Re: Load table sorted by key

Posted by Adam Kawa <ka...@gmail.com>.
Rows in HBase are sorted lexicographically by row key, so 1 precedes 10 and 100, and all of them precede 2.
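
To see the effect outside HBase, here is a small Python sketch (Python is used only for illustration, and the keys are made-up sample values) that compares byte-wise ordering of string keys with numeric ordering:

```python
# Sample row keys stored as strings (illustrative values, not taken from the real table).
keys = ["1", "2", "10", "100", "101", "20"]

# HBase compares row keys byte by byte; sorting the encoded bytes mimics that.
lexicographic = sorted(keys, key=lambda k: k.encode("utf-8"))

# Numeric order is what the original question expects.
numeric = sorted(keys, key=int)

print(lexicographic)  # ['1', '10', '100', '101', '2', '20']
print(numeric)        # ['1', '2', '10', '20', '100', '101']
```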

You need to either
a) ORDER library BY id in your Pig script, or
b) pad your HBase row keys with zeros (e.g. 001, 010, 100, 101) before
loading the table in Pig. This requires rewriting the HBase table,
because a row key cannot be updated in place; each row has to be
re-inserted under its new key.
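
Option (b) could look roughly like the Python sketch below. The helper name pad_rowkey and the chosen width are assumptions for illustration only; the width must be large enough for the biggest id you ever expect, and writing the padded keys back to HBase is not shown.

```python
def pad_rowkey(row_id: int, width: int = 10) -> str:
    """Return row_id as a fixed-width, zero-padded string (hypothetical helper)."""
    if row_id < 0:
        raise ValueError("negative ids do not sort correctly with zero padding")
    key = str(row_id).zfill(width)
    if len(key) > width:
        raise ValueError(f"{row_id} does not fit in {width} digits")
    return key

# With fixed-width keys, plain string order matches numeric order.
ids = [1, 10, 100, 101, 2]
padded = sorted(pad_rowkey(i, width=3) for i in ids)
print(padded)  # ['001', '002', '010', '100', '101']
```

Once every key has the same width, HBase's lexicographic order and the numeric order agree, so no extra ORDER step is needed to get that ordering.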

Best,
Adam
