Posted to user@hbase.apache.org by Krishna Kalyan <kr...@gmail.com> on 2014/12/29 08:11:48 UTC
Incorrect Dump using HBase Storage Class
Hi,
Happy holidays :).
I have two different Pig scripts containing the statements below:
(1)
GeoRef_IP = LOAD '$TBL_GEOGRAPHY' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf_data:cq_geog_id
cf_data:cq_pc_sector cf_data:cq_district_code cf_data:cq_postal_town
cf_data:cq_postal_county cf_data:cq_mosaic_code cf_data:cq_mosaic_code_desc
cf_data:cq_mosaic_group cf_data:cq_sales_territory cf_data:cq_sales_area
cf_data:cq_sales_region cf_data:cq_dqtimestamp cf_data:cq_checkarray',
'-loadKey true');
and
(2)
GeoRef_IP = LOAD '$TBL_GEOGRAPHY' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf_data:cq_geog_id
cf_data:cq_pc_sector cf_data:cq_district_code cf_data:cq_postal_town
cf_data:cq_postal_county cf_data:cq_mosaic_code cf_data:cq_mosaic_code_desc
cf_data:cq_mosaic_group cf_data:cq_sales_territory cf_data:cq_sales_area
cf_data:cq_sales_region cf_data:cq_dqtimestamp cf_data:cq_checkarray',
'-loadKey true') as
(postcode,geog_id,pc_sector,district_code,postal_town,postal_county,mosaic_code,mosaic_code_desc,mosaic_group,sales_territory,sales_area,sales_region,dqtimestamp,checkarray);
The only difference is the AS clause.
For example, a FOREACH that generates $0, $4, $5 followed by a DUMP gives different results for statements 1 and 2; statement 1 gives the correct result.
Has anyone seen this behavior before?
Regards,
Krishna
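Since statements (1) and (2) differ only in the AS clause, one quick check is whether the names in that clause line up with the positions HBaseStorage produces. The Python sketch below only does bookkeeping on the two lists copied from the statements (it does not talk to HBase or Pig). It relies on the documented HBaseStorage behavior that '-loadKey true' loads the row key as the first field.

```python
# Column list passed to HBaseStorage in both statements (order matters).
hbase_columns = [
    "cf_data:cq_geog_id", "cf_data:cq_pc_sector", "cf_data:cq_district_code",
    "cf_data:cq_postal_town", "cf_data:cq_postal_county", "cf_data:cq_mosaic_code",
    "cf_data:cq_mosaic_code_desc", "cf_data:cq_mosaic_group",
    "cf_data:cq_sales_territory", "cf_data:cq_sales_area",
    "cf_data:cq_sales_region", "cf_data:cq_dqtimestamp", "cf_data:cq_checkarray",
]

# With '-loadKey true', the row key comes first, followed by the listed columns.
loaded_fields = ["<row key>"] + hbase_columns

# Field names from the AS clause of statement (2).
as_names = ("postcode,geog_id,pc_sector,district_code,postal_town,postal_county,"
            "mosaic_code,mosaic_code_desc,mosaic_group,sales_territory,sales_area,"
            "sales_region,dqtimestamp,checkarray").split(",")

# Both sides should have the same number of fields (14).
assert len(as_names) == len(loaded_fields)

# Every AS name after postcode should match the column at the same position.
aligned = all(col == "cf_data:cq_" + name
              for col, name in zip(hbase_columns, as_names[1:]))
print("AS clause aligned with HBase columns:", aligned)

# Show what the positional references in the FOREACH point to.
for pos in (0, 4, 5):
    print(f"${pos}: {loaded_fields[pos]} -> {as_names[pos]}")
```

With the names aligned like this, $0, $4 and $5 should refer to postcode, postal_town and postal_county in both statements. That suggests the difference does not come from an off-by-one in the AS list itself.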
Streaming.XMLLoader not working on PIG
Posted by harry Shah <hr...@hotmail.com>.
Hi, I am new to Pig scripting.
I am trying to parse XML values with a Pig script, but I am getting this error:
ERROR 1070: Could not resolve org.apache.pig.piggybank.storage.StreamingXMLLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
My XML file is:
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet href="latest_ob.xsl" type="text/xsl"?>
<current_observation version="1.0"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://www.weather.gov/view/current_observation.xsd">
<credit>NOAA's National Weather Service</credit>
<location>Unknown Station</location>
<station_id>SH007</station_id>
<temperature_string>32.0 F (0.0 C)</temperature_string>
<temp_f>32.0</temp_f>
<temp_c>0.0</temp_c>
<water_temp_f>32.0</water_temp_f>
<water_temp_c>0.0</water_temp_c>
<wind_string>Calm</wind_string>
<wind_dir>North</wind_dir>
<wind_degrees>0</wind_degrees>
<wind_mph>0.0</wind_mph>
<wind_gust_mph>0.0</wind_gust_mph>
<pressure_string>1019.0 mb</pressure_string>
<privacy_policy_url>http://weather.gov/notice.html</privacy_policy_url>
</current_observation>
**************************************************************************
I want to extract location, station_id, temp_c and wind_dir. I tried writing three Pig scripts: the first two run but produce no output, and the third gives me the error above.
I am using Hadoop 2.5 and Pig 0.14.
I think the problem is the root element, which also carries attributes. Please suggest what I should do about this issue.
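Harry suspects the attributes on the root element. As a sanity check of the XML itself (outside Pig), here is a minimal Python sketch using the standard library's xml.etree.ElementTree on a trimmed copy of SH007.xml. It shows that the root's attributes and prefixed namespace declarations (there is no default namespace) leave the child elements with plain names. It says nothing about how piggybank's XMLLoader or XPath UDF handle that root tag.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of SH007.xml from the post (only the elements of interest kept).
SH007_XML = """<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet href="latest_ob.xsl" type="text/xsl"?>
<current_observation version="1.0"
 xmlns:xsd="http://www.w3.org/2001/XMLSchema"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:noNamespaceSchemaLocation="http://www.weather.gov/view/current_observation.xsd">
 <location>Unknown Station</location>
 <station_id>SH007</station_id>
 <temp_c>0.0</temp_c>
 <wind_dir>North</wind_dir>
</current_observation>
"""

# Encode to bytes so the parser reads the declared ISO-8859-1 encoding.
root = ET.fromstring(SH007_XML.encode("iso-8859-1"))

# Only prefixed namespaces are declared, so the root tag stays unqualified.
print(root.tag)
print(root.get("version"))

# Child elements can be looked up by their plain names.
wanted = ("location", "station_id", "temp_c", "wind_dir")
values = {name: root.findtext(name) for name in wanted}
print(values)
```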
Here are the three Pig scripts I tried:
1.
REGISTER /home/hduser/Desktop/apache_pig/lib/piggybank.jar;
A = LOAD '/demo1/SH007.xml' USING
org.apache.pig.piggybank.storage.XMLLoader('location') AS (x:chararray);
B = foreach A GENERATE FLATTEN(REGEX_EXTRACT_ALL(x,'<location>(.*)
</location>\\s*<temp_c>(.*)</temp_c>\\s*<pressure_string>(.*)
</pressure_string>'))
AS (location:chararray,temp_c:int,pressure_string:chararray);
dump B;
2.
REGISTER /home/hduser/Desktop/apache_pig/lib/piggybank.jar;
DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();
A = LOAD '/demo1/SH007.xml' using
org.apache.pig.piggybank.storage.XMLLoader('current_observation') as
(x:chararray);
B = FOREACH A GENERATE XPath(x, 'current_observation/location'), XPath(x,
'current_observation/temp_c');
dump B;
3.
REGISTER /home/hduser/Desktop/pig-0.14.0/lib/piggybank.jar;
data = LOAD '/demo1/SH007.xml'
USING org.apache.pig.piggybank.storage.StreamingXMLLoader(
'current_observation',
'location'
) AS (
location: {(attr:map[], content:chararray)}
);
dump data;
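Script 1 loads only the location elements, so each record handed to REGEX_EXTRACT_ALL presumably contains just the <location>...</location> text. That is an assumption about XMLLoader's output and has not been checked against the piggybank source. Under that assumption, a minimal Python sketch with the standard re module shows that the multi-field pattern cannot match such a record, while a single-field pattern can. Pig evaluates these patterns with Java regular expressions; for patterns this simple, Python's re behaves the same way. The email also wrapped the pattern across lines, and the sketch rejoins it without a space, which is another assumption.

```python
import re

# Assumed content of one record produced by XMLLoader('location').
record = "<location>Unknown Station</location>"

# Script 1's pattern, rejoined onto one line (the email wrapped it).
multi_field = (r"<location>(.*)</location>\s*<temp_c>(.*)</temp_c>"
               r"\s*<pressure_string>(.*)</pressure_string>")

# re.search is more permissive than a whole-string match; if it finds
# nothing, a whole-string match cannot succeed either.
multi_match = re.search(multi_field, record)
print(multi_match)

# A pattern that expects only the element the record actually holds.
single_field = r"<location>(.*)</location>"
single_match = re.fullmatch(single_field, record)
print(single_match.group(1))
```

If each record really holds a single element, extracting several fields at once would need a loader tag that encloses all of them (such as current_observation, as in script 2) rather than location.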
Any help would be appreciated.
Thank you
Harry
Re: Incorrect Dump using HBase Storage Class
Posted by Ted Yu <yu...@gmail.com>.
Can you pastebin the output for both queries?
What version of HBase are you using?
Cheers