Posted to user@pig.apache.org by ajay kumar <aj...@gmail.com> on 2013/09/10 08:58:12 UTC

xml parsing issue

Hi all,

I have an XML file like this:

<CATALOG>
<CD>
<TITLE>hadoop developer</TITLE>
<ARTIST>ajay</ARTIST>
<COUNTRY>india</COUNTRY>
<COMPANY>ITC</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>2013</YEAR>
</CD>
</CATALOG>


This is my Pig script:

register /usr/lib/pig/piggybank.jar;

A = load '/home/sudeep/Desktop/CATALOG.xml'
    using org.apache.pig.piggybank.storage.XMLLoader('CATALOG')
    as (x: chararray);

B = foreach A GENERATE
FLATTEN(REGEX_EXTRACT_ALL(x,'<CATALOG>\n<CD>\\n<TITLE>(.*)</TITLE>\n<ARTIST>(.*)</ARTIST>\n<COUNTRY>(.*)</COUNTRY>\n<COMPANY>(.*)<COMPANY>\n<PRICE>(.*)</PRICE>\n<YEAR>(.*)</YEAR>\n</CD>\n</CATALOG>'))
as (id: int, name:chararray);


Expected output:
hadoop developer|ajay|india|ITC|10.90|2013


But I am getting output like this:

()

()

()


What is wrong?


-- 
Thanks & Regards,
S. Ajay Kumar
+91-9966159106