Posted to user@pig.apache.org by Vincent Barat <vi...@gmail.com> on 2010/06/12 23:14:44 UTC

HBaseStorage: how to extract columns with non ASCII characters in their name ?

Hello,

I am facing a difficult issue: I need to extract some data from HBase 
columns whose names include non-ASCII characters like "Cinéma", or 
even white space " " and commas ",".

Example:

activity = LOAD 'hbase://activity' USING HBaseStorage('data:Cinéma') 
AS (cinema:chararray);

Grunt does not reject this line, but it does not work either: it behaves 
as if the "data:Cinéma" column did not exist in my HBase table.

When I scan the table with the HBase shell, I get the following output:

1276366750803/c849058758bac1b01b3bb77215b53922 column=data:Cin\xC3\xA9mas, timestamp=1276367292195, value=1

Do you see any character encoding mismatch there?
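
As a sanity check on the encoding question, here is a minimal Python sketch. It is not part of Pig or HBase, and `shell_escape` is a hypothetical helper that only approximates how the HBase shell escapes non-printable bytes. It compares the bytes of the column qualifier under UTF-8 and under Latin-1:

```python
# Hypothetical check, independent of Pig and HBase: which charset produces
# the escaped bytes seen in the HBase shell scan output?
qualifier = "Cinémas"

utf8_bytes = qualifier.encode("utf-8")
latin1_bytes = qualifier.encode("latin-1")


def shell_escape(raw: bytes) -> str:
    """Approximate the shell's display: printable ASCII as-is, other bytes as \\xNN."""
    return "".join(chr(b) if 32 <= b < 127 else "\\x%02X" % b for b in raw)


print(shell_escape(utf8_bytes))    # Cin\xC3\xA9mas
print(shell_escape(latin1_bytes))  # Cin\xE9mas
```

The UTF-8 form matches the scan output, so the qualifier appears to be stored in HBase as UTF-8. If there is a mismatch, it would have to come from how the string in the Pig script is turned into bytes before the lookup. This sketch cannot confirm what HBaseStorage actually does.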


Thanks a lot for your help.



Re: HBaseStorage: how to extract columns with non ASCII characters in their name ?

Posted by Vincent Barat <vi...@gmail.com>.
No, the issue does not come from the missing "s" in "Cinéma"!
That typo is only in the email, not in my tests :-)
