Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2011/09/21 23:52:26 UTC

[jira] [Resolved] (PIG-2289) HBaseStorage do not care about delimiter in STORE

     [ https://issues.apache.org/jira/browse/PIG-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-2289.
-----------------------------

    Resolution: Invalid

> HBaseStorage do not care about delimiter in STORE
> -------------------------------------------------
>
>                 Key: PIG-2289
>                 URL: https://issues.apache.org/jira/browse/PIG-2289
>             Project: Pig
>          Issue Type: Bug
>          Components: internal-udfs
>    Affects Versions: 0.9.1, 0.10
>         Environment: Hadoop, HBase, ZooKeeper from cdh3u1
> Pig from GitHub (version 0.9.1, then trunk: 0.10)
>            Reporter: Damien Hardy
>
> I want to store in HBase a set of tuples generated by Pig streaming (inspired by http://www.cloudera.com/blog/2009/06/analyzing-apache-logs-with-pig/ ).
> Here is my script:
> set debug 'off'
> DEFINE iplookup `wrapper.sh GeoIP`
> ship ('wrapper.sh')
> cache('/GeoIP/GeoIPcity.dat#GeoIP');
> A = load 'log' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('default:body','-gt=_f:squid_t:201109161405 -lte=_f:squid_t:201109161410 -loadKey') AS (rowkey, data);
> B = LIMIT A 10;
> C = FOREACH B {
>         t = REGEX_EXTRACT(data,'([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}):([0-9]+) ',1);
>         generate rowkey, t;
> }
> D = STREAM C THROUGH iplookup AS (rowkey, ip, country_code, country, state, city);
> DESCRIBE D;
> -- DUMP D;
> STORE D INTO 'geoip_pig' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('location:ip location:country_code location:country location:state location:city') ;
> "DESCRIBE D;" shows:
> D: {rowkey: bytearray,ip: bytearray,country_code: bytearray,country: bytearray,state: bytearray,city: bytearray}
> as expected.
> STORE picks up only the rowkey and puts the rest of the tuple in the first column (location:ip), as you can see:
> hbase(main):033:0> get 'geoip_pig', "_f:squid_t:20110916140500_b:squid_s:200-1VPVjbVwywTpNtLA4mHl+A=="
> COLUMN                 CELL
>  location:city         timestamp=1316180980265, value=
>  location:country      timestamp=1316180980265, value=
>  location:country_code timestamp=1316180980265, value=
>  location:ip           timestamp=1316180980265, value=90.9.213.170,FR,France,A9,Llupia
>  location:state        timestamp=1316180980265, value=
> 5 row(s) in 0.0150 seconds
> I also tried the option '-delim=,', with no effect.
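
The message does not say why the issue was resolved as Invalid. One reading that fits the output above, offered here as an assumption rather than a confirmed diagnosis, is that the fields are already merged before STORE runs: the streaming command emits comma-separated values, while the stream output is split on a different delimiter (assumed here to be a tab). The Python sketch below models only that splitting step. `split_fields` is a hypothetical helper, not part of Pig or HBaseStorage, and the sample line is the value of the location:ip cell from the report.

```python
# Hypothetical model of delimiter-based field splitting.
# This is NOT Pig's implementation; it only illustrates the assumed behaviour.
def split_fields(line, delimiter, width):
    """Split one output line into fields, padding missing fields with ''."""
    fields = line.rstrip("\n").split(delimiter)
    return fields + [""] * (width - len(fields))

# Value copied from the location:ip cell in the report (rowkey left out).
streamed = "90.9.213.170,FR,France,A9,Llupia"
columns = ["ip", "country_code", "country", "state", "city"]

# Assumed default: split on tab, so the whole line lands in the first field.
as_tab = dict(zip(columns, split_fields(streamed, "\t", len(columns))))

# Splitting on the delimiter the command actually emits recovers five fields.
as_comma = dict(zip(columns, split_fields(streamed, ",", len(columns))))

print(as_tab)
print(as_comma)
```

If this reading holds, the delimiter would have to be set on the streaming side (for instance in the OUTPUT clause of the DEFINE statement) rather than on HBaseStorage, which would also be consistent with '-delim=,' having no effect on the stored values.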

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira