Posted to dev@pig.apache.org by "Jochem van Grondelle (JIRA)" <ji...@apache.org> on 2013/12/16 18:03:09 UTC

[jira] [Created] (PIG-3628) When using UNION with 2 HbaseStorages, casting to chararray results in empty string

Jochem van Grondelle created PIG-3628:
-----------------------------------------

             Summary: When using UNION with 2 HbaseStorages, casting to chararray results in empty string
                 Key: PIG-3628
                 URL: https://issues.apache.org/jira/browse/PIG-3628
             Project: Pig
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.11
         Environment: CDH5, Centos 6
            Reporter: Jochem van Grondelle
            Priority: Minor


Hi,

We stumbled upon the following issue and are wondering if anyone can help us with it. I am available for any follow-up questions. Unfortunately, I am not a Java programmer, so I cannot supply a fix if this turns out to be a bug.

The issue seems to be specific to HBaseStorage, but I am not sure. With other loaders (for example, two PigStorage loads), the problem does not occur.

It seems that even when we specify 'content:map[chararray]' when loading data from HBase, and Pig reports that the schema contains chararrays, the fields may still be bytearrays in the background that cannot be converted.

First, create two HBase tables:
{code}
--hbase shell
--
--hbase(main):001:0> create 'test_table1','f'
--0 row(s) in 20.0530 seconds
--
--hbase(main):002:0> create 'test_table2', 'f'
--0 row(s) in 1.4420 seconds
--
--hbase(main):008:0> put 'test_table1','1-1386066912072','f:date_created','2012-01-04T11:33:59:05321'
--0 row(s) in 5.3380 seconds
--
--hbase(main):002:0> put 'test_table2','2-1386066912074','f:date_created','2012-01-04T11:33:59:05321'
--0 row(s) in 0.0540 seconds
--
--
--hbase(main):003:0> quit
{code}

Then run the following Pig script:
{code}
hbs1 = LOAD 'hbase://test_table1'
        USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
               'f:*','-loadKey true')
               AS ( id:bytearray, content:map[chararray]);
               
hbs2 = LOAD 'hbase://test_table2'
        USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
               'f:*','-loadKey true')
               AS ( id:bytearray, content:map[chararray]);

hbs3 = UNION hbs1, hbs2;


hbs4 = FOREACH hbs3
GENERATE        id as hbase_id               
               , flatten(content#'date_created') as date_created                   
               ;   

hbs5 = FOREACH hbs4
GENERATE        hbase_id   
              , date_created  --without (chararray)           
              ,  SUBSTRING( date_created,0,10) as date_created_trunc  -- start index is 0-based, stop index is exclusive
            ;
              
DUMP hbs5;
{code}

*Result*
{code}
(2-1386066912074,2012-01-04T11:33:59:05321,)
(1-1386066912072,2012-01-04T11:33:59:05321,)
{code}

*Expected result*
{code}
(2-1386066912074,2012-01-04T11:33:59:05321,2012-01-04)
(1-1386066912072,2012-01-04T11:33:59:05321,2012-01-04)
{code}

SUBSTRING on date_created is used here only as an example. There are several other string functions that we want to be able to use in the same way.
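
For comparison, here is a minimal sketch of the PigStorage control case mentioned earlier (two PigStorage loads combined with UNION). The file names test_file1.txt and test_file2.txt and their tab-separated layout are assumptions made for illustration only, and the map column is simplified to a plain chararray field, so this is not an exact mirror of the HBase script.
{code}
-- Hypothetical input files (assumption): each line is "<id><TAB><date_created>"
ps1 = LOAD 'test_file1.txt' USING PigStorage('\t')
        AS ( id:bytearray, date_created:chararray);

ps2 = LOAD 'test_file2.txt' USING PigStorage('\t')
        AS ( id:bytearray, date_created:chararray);

ps3 = UNION ps1, ps2;

-- SUBSTRING takes a 0-based start index and an exclusive stop index
ps4 = FOREACH ps3
GENERATE        id as hbase_id
              , date_created
              , SUBSTRING(date_created, 0, 10) as date_created_trunc
              ;

DUMP ps4;
{code}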



