Posted to user@pig.apache.org by harry Shah <hr...@hotmail.com> on 2014/12/30 08:45:51 UTC
StreamingXMLLoader not working in Pig
Hi, I am new to Pig scripting.
I am trying to parse XML values with a Pig script, but I am getting the following error:
ERROR 1070: Could not resolve org.apache.pig.piggybank.storage.StreamingXMLLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
My XML file is:
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet href="latest_ob.xsl" type="text/xsl"?>
<current_observation version="1.0"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://www.weather.gov/view/current_observation.xsd">
<credit>NOAA's National Weather Service</credit>
<location>Unknown Station</location>
<station_id>SH007</station_id>
<temperature_string>32.0 F (0.0 C)</temperature_string>
<temp_f>32.0</temp_f>
<temp_c>0.0</temp_c>
<water_temp_f>32.0</water_temp_f>
<water_temp_c>0.0</water_temp_c>
<wind_string>Calm</wind_string>
<wind_dir>North</wind_dir>
<wind_degrees>0</wind_degrees>
<wind_mph>0.0</wind_mph>
<wind_gust_mph>0.0</wind_gust_mph>
<pressure_string>1019.0 mb</pressure_string>
<privacy_policy_url>http://weather.gov/notice.html</privacy_policy_url>
</current_observation>
**************************************************************************
I want to extract location, station_id, temp_c and wind_dir. I tried writing
three Pig scripts: the first two run without errors but produce no output, and the
third gives the error above.
I am using Hadoop 2.5 and Pig 0.14.
I think the problem is the root element, which also carries attributes.
Please suggest what to do about this issue.
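Here is one way the root element's attributes could matter. A loader that searched for the literal start tag `<current_observation>` would never find it, because the real start tag is `<current_observation version="1.0" ...>` spread over several lines. The Python sketch below is purely illustrative and does not reproduce piggybank's XMLLoader; how that loader actually matches tags is an assumption here, not something checked. The helper names `literal_match` and `attribute_tolerant_match` are made up for this example. The sketch contrasts a literal match with one that tolerates attributes.

```python
import re

# Root start tag from the sample (shortened to two attributes), plus one child.
SAMPLE = (
    '<current_observation version="1.0"\n'
    'xmlns:xsd="http://www.w3.org/2001/XMLSchema">\n'
    '<location>Unknown Station</location>\n'
    '</current_observation>'
)

def literal_match(doc: str, tag: str) -> bool:
    """Naive matcher (hypothetical): looks only for the bare start tag '<tag>'."""
    return f"<{tag}>" in doc

def attribute_tolerant_match(doc: str, tag: str) -> bool:
    """Accepts '<tag>' or '<tag' followed by whitespace and attributes (newlines allowed)."""
    return re.search(rf"<{re.escape(tag)}(?:\s[^>]*)?>", doc) is not None

print(literal_match(SAMPLE, "current_observation"))             # False
print(attribute_tolerant_match(SAMPLE, "current_observation"))  # True
print(literal_match(SAMPLE, "location"))                        # True: no attributes
```

The child elements in the sample carry no attributes, so the naive approach would still find `<location>`. Only the root element is affected.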
These are the three Pig scripts I tried:
1.
REGISTER /home/hduser/Desktop/apache_pig/lib/piggybank.jar;
A = LOAD '/demo1/SH007.xml' USING
org.apache.pig.piggybank.storage.XMLLoader('location') AS (x:chararray);
B = foreach A GENERATE FLATTEN(REGEX_EXTRACT_ALL(x,'<location>(.*)</location>\\s*<temp_c>(.*)</temp_c>\\s*<pressure_string>(.*)</pressure_string>'))
AS (location:chararray,temp_c:int,pressure_string:chararray);
dump B;
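This script produces no output, and two assumptions together would explain why:
- `XMLLoader('location')` hands over each `<location>...</location>` element as a single record.
- `REGEX_EXTRACT_ALL` returns groups only when the pattern matches the whole input string, as Java's `Matcher.matches()` does.

Both assumptions are unverified. If they hold, the three-group pattern can never match. The Python sketch below mimics this with `re.fullmatch` on the same pattern. It also shows that a contiguous-tags pattern fails even on the document text itself, because `station_id` sits between `</location>` and `<temp_c>`.

```python
import re

# Script 1's pattern with the line wraps removed (Pig's '\\s' is the regex \s).
PATTERN = (r"<location>(.*)</location>\s*<temp_c>(.*)</temp_c>\s*"
           r"<pressure_string>(.*)</pressure_string>")

# Assumed shape of one record from XMLLoader('location'): only that element.
record = "<location>Unknown Station</location>"

# Whole-string match, assumed to mirror REGEX_EXTRACT_ALL's behaviour.
print(re.fullmatch(PATTERN, record))  # None: no temp_c or pressure_string in the record

# A trimmed copy of the document also fails: <station_id> comes right after </location>.
doc = ("<location>Unknown Station</location>\n<station_id>SH007</station_id>\n"
       "<temp_c>0.0</temp_c>\n<pressure_string>1019.0 mb</pressure_string>")
print(re.search(PATTERN, doc))  # None

# A single-field pattern does match the record.
print(re.fullmatch(r"<location>(.*)</location>", record).groups())  # ('Unknown Station',)
```

In the full sample, further elements (water_temp_f, wind_string and others) also sit between `</temp_c>` and `<pressure_string>`.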
2.
REGISTER /home/hduser/Desktop/apache_pig/lib/piggybank.jar;
DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();
A = LOAD '/demo1/SH007.xml' using
org.apache.pig.piggybank.storage.XMLLoader('current_observation') as
(x:chararray);
B = FOREACH A GENERATE XPath(x, 'current_observation/location'), XPath(x,
'current_observation/temp_c');
dump B;
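As a cross-check of what a working script should return, here is the same extraction done outside Pig with Python's standard `xml.etree.ElementTree`, run on a trimmed copy of the sample. It only shows that a standard XML parser handles the attributed root element without trouble. It says nothing about how piggybank's XMLLoader or XPath UDF treat that element. The function name `extract` is just for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (prolog and root attributes kept).
SAMPLE = """<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet href="latest_ob.xsl" type="text/xsl"?>
<current_observation version="1.0"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://www.weather.gov/view/current_observation.xsd">
<credit>NOAA's National Weather Service</credit>
<location>Unknown Station</location>
<station_id>SH007</station_id>
<temp_c>0.0</temp_c>
<wind_dir>North</wind_dir>
</current_observation>"""

FIELDS = ("location", "station_id", "temp_c", "wind_dir")

def extract(doc: str) -> dict:
    # Parse bytes so the ISO-8859-1 declaration in the prolog is honoured.
    root = ET.fromstring(doc.encode("iso-8859-1"))
    # root is <current_observation>, so findtext("location") plays the role of
    # the XPath 'current_observation/location' used in script 2.
    return {name: root.findtext(name) for name in FIELDS}

print(extract(SAMPLE))
# {'location': 'Unknown Station', 'station_id': 'SH007', 'temp_c': '0.0', 'wind_dir': 'North'}
```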
3.
REGISTER /home/hduser/Desktop/pig-0.14.0/lib/piggybank.jar;
data = LOAD '/demo1/SH007.xml'
USING org.apache.pig.piggybank.storage.StreamingXMLLoader(
'current_observation',
'location'
) AS (
location: {(attr:map[], content:chararray)}
);
dump data;
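ERROR 1070 means Pig could not find a class with that name on its classpath. As the import list in the message suggests, it tried the name as given and then with each default package prefix. Note the jar paths:
- Script 3 registers /home/hduser/Desktop/pig-0.14.0/lib/piggybank.jar.
- Scripts 1 and 2 register /home/hduser/Desktop/apache_pig/lib/piggybank.jar.

Whether StreamingXMLLoader ships in the Pig 0.14 piggybank jar at all is not confirmed here. A jar is a zip archive, so one way to check a given jar is to look for the class's `.class` entry. Below is a minimal Python sketch; the helper name `jar_has_class` is hypothetical.

```python
import zipfile

def jar_has_class(jar_path: str, class_name: str) -> bool:
    """True if the jar (a zip archive) contains the compiled top-level class."""
    # Class files are stored under their package path, e.g. org/apache/.../Foo.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()

# Example call with the jar path registered in script 3:
# jar_has_class("/home/hduser/Desktop/pig-0.14.0/lib/piggybank.jar",
#               "org.apache.pig.piggybank.storage.StreamingXMLLoader")
```

From the shell, `jar tf piggybank.jar | grep StreamingXMLLoader` answers the same question.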
Please help.
Thank you
Harry