Posted to user@pig.apache.org by harry Shah <hr...@hotmail.com> on 2014/12/30 08:45:51 UTC

StreamingXMLLoader not working in Pig

 
Hi, I am new to Pig scripting.

I am trying to parse XML values with a Pig script, but I get this error:

ERROR 1070: Could not resolve org.apache.pig.piggybank.storage.StreamingXMLLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

My XML file is:

<?xml version="1.0" encoding="ISO-8859-1"?> 
<?xml-stylesheet href="latest_ob.xsl" type="text/xsl"?>

<current_observation version="1.0"
	 xmlns:xsd="http://www.w3.org/2001/XMLSchema"
	 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	 xsi:noNamespaceSchemaLocation="http://www.weather.gov/view/current_observation.xsd">
	<credit>NOAA's National Weather Service</credit>
	<location>Unknown Station</location>
	<station_id>SH007</station_id>
	<temperature_string>32.0 F (0.0 C)</temperature_string>
	<temp_f>32.0</temp_f>
	<temp_c>0.0</temp_c>
	<water_temp_f>32.0</water_temp_f>
	<water_temp_c>0.0</water_temp_c>
	<wind_string>Calm</wind_string>
	<wind_dir>North</wind_dir>
	<wind_degrees>0</wind_degrees>
	<wind_mph>0.0</wind_mph>
	<wind_gust_mph>0.0</wind_gust_mph>
	<pressure_string>1019.0 mb</pressure_string>
	<privacy_policy_url>http://weather.gov/notice.html</privacy_policy_url>
</current_observation>

**************************************************************************
I want to extract location, station_id, temp_c and wind_dir. I tried three
Pig scripts: the first two run without errors but produce no output, and the
third one gives the error above.
I am using Hadoop 2.5 and Pig 0.14.

I suspect the problem is the root element, which also carries attributes.
Please suggest how to deal with this.

Here are the three Pig scripts I tried:
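As a quick sanity check outside Pig (a sketch, not one of the original attempts), a standard XML parser reads the four wanted values from this file without trouble, so for a real parser at least, the root element's attributes are not an obstacle; whether piggybank's XMLLoader treats them the same way is a separate question. This assumes Python 3 and its standard xml.etree.ElementTree module; the helper name extract_fields is hypothetical, and the sample is a trimmed copy of the posted file.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the posted file (schema URL rejoined, unused elements dropped).
SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet href="latest_ob.xsl" type="text/xsl"?>
<current_observation version="1.0"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="http://www.weather.gov/view/current_observation.xsd">
  <credit>NOAA's National Weather Service</credit>
  <location>Unknown Station</location>
  <station_id>SH007</station_id>
  <temp_c>0.0</temp_c>
  <wind_dir>North</wind_dir>
</current_observation>
"""

# The four values the post wants to extract.
FIELDS = ("location", "station_id", "temp_c", "wind_dir")


def extract_fields(xml_bytes):
    """Hypothetical helper: map each wanted child element name to its text."""
    # Bytes input, as if read straight from the file, so the encoding declaration applies.
    root = ET.fromstring(xml_bytes)
    # The root's attributes (version, xsi:noNamespaceSchemaLocation) live in
    # root.attrib; they do not get in the way of finding its child elements.
    return {name: root.findtext(name) for name in FIELDS}


if __name__ == "__main__":
    print(extract_fields(SAMPLE))
```

Running it prints {'location': 'Unknown Station', 'station_id': 'SH007', 'temp_c': '0.0', 'wind_dir': 'North'}.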

1. 



REGISTER /home/hduser/Desktop/apache_pig/lib/piggybank.jar;

A = LOAD '/demo1/SH007.xml' USING 
org.apache.pig.piggybank.storage.XMLLoader('location') AS (x:chararray);

B = foreach A GENERATE FLATTEN(REGEX_EXTRACT_ALL(x,'<location>(.*)</location>\\s*<temp_c>(.*)</temp_c>\\s*<pressure_string>(.*)</pressure_string>'))
AS (location:chararray,temp_c:int,pressure_string:chararray);

dump B;
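One possible reason the first script yields nothing useful can be sketched with Python's re module, under the assumption that Pig's REGEX_EXTRACT_ALL needs the pattern to match the whole input (like Java's Matcher.matches()) and gives no groups otherwise. With XMLLoader('location') each record is only the location element, so a pattern that also expects temp_c and pressure_string cannot match it; and even in the full document those elements are not adjacent in that order. The record strings and the helper whole_match_groups are illustrative assumptions.

```python
import re

# The pattern from script 1 with the mail-wrapped pieces rejoined.
# Pig's '\\s' is the regex \s; a raw string keeps it as-is here.
PATTERN = re.compile(
    r"<location>(.*)</location>\s*<temp_c>(.*)</temp_c>"
    r"\s*<pressure_string>(.*)</pressure_string>"
)

# What XMLLoader('location') hands to the FOREACH: just the <location> element.
LOCATION_RECORD = "<location>Unknown Station</location>"

# A slice of the document body: station_id sits between location and temp_c,
# and pressure_string comes only after the wind fields.
BODY_SLICE = (
    "<location>Unknown Station</location>\n"
    "<station_id>SH007</station_id>\n"
    "<temp_c>0.0</temp_c>\n"
    "<wind_dir>North</wind_dir>\n"
    "<pressure_string>1019.0 mb</pressure_string>"
)


def whole_match_groups(text):
    """Mimic a whole-string match (like Matcher.matches()); None when it fails."""
    m = PATTERN.fullmatch(text)
    return m.groups() if m else None


if __name__ == "__main__":
    print(whole_match_groups(LOCATION_RECORD))  # None
    print(whole_match_groups(BODY_SLICE))       # None
```

The pattern only succeeds when the three elements really follow each other, e.g. on "<location>A</location> <temp_c>1</temp_c> <pressure_string>p</pressure_string>".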



2.


REGISTER /home/hduser/Desktop/apache_pig/lib/piggybank.jar;

DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();
 

A = LOAD '/demo1/SH007.xml' using 
org.apache.pig.piggybank.storage.XMLLoader('current_observation') as 
(x:chararray);
 
B = FOREACH A GENERATE XPath(x, 'current_observation/location'), XPath(x, 
'current_observation/temp_c');

dump B;



3.



REGISTER /home/hduser/Desktop/pig-0.14.0/lib/piggybank.jar;

data = LOAD '/demo1/SH007.xml'
       USING org.apache.pig.piggybank.storage.StreamingXMLLoader(
          'current_observation',
          'location'
       ) AS (
           location:    {(attr:map[], content:chararray)}
       );

dump data;
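ERROR 1070 indicates that Pig could not find the named class on its classpath, so one thing worth checking is whether the registered piggybank.jar actually contains StreamingXMLLoader (note also that script 3 registers the jar from a different directory than scripts 1 and 2). A jar is a zip archive, so a small Python sketch can look for the compiled class entry. The helper name jar_has_class is hypothetical; the default path is the one from script 3.

```python
import os
import sys
import zipfile

DEFAULT_JAR = "/home/hduser/Desktop/pig-0.14.0/lib/piggybank.jar"  # from script 3
CLASSES = (
    "org.apache.pig.piggybank.storage.XMLLoader",
    "org.apache.pig.piggybank.storage.StreamingXMLLoader",
)


def jar_has_class(jar_path, class_name):
    """Hypothetical helper: True if the jar contains the compiled class.

    class_name is a fully qualified Java name; inside the jar it is stored
    as a path such as org/apache/pig/piggybank/storage/XMLLoader.class.
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


if __name__ == "__main__":
    jar = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_JAR
    if not os.path.isfile(jar):
        print(f"jar not found: {jar}")
    else:
        for name in CLASSES:
            print(name, jar_has_class(jar, name))
```

Pass the path of whichever jar the script REGISTERs as the first argument; a False next to StreamingXMLLoader would explain the error.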


Any help would be appreciated.

Thank you 
Harry