Posted to user@pig.apache.org by Cheolsoo Park <pi...@gmail.com> on 2014/05/19 06:57:12 UTC

Re: XMLLoader not working

I haven't used XMLLoader myself, so I can't offer much help. But someone
recently rewrote it completely in trunk:
https://issues.apache.org/jira/browse/PIG-3865

Can you try building piggybank.jar from trunk? The command is: ant clean piggybank
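
If the trunk build succeeds, a minimal sketch for picking up the rebuilt jar
might look like the following. The jar path is an assumption (adjust it to
wherever your checkout places piggybank.jar); the load statement is copied
from the script quoted below. Dumping a small sample of A first shows exactly
what each record looks like before any regex is applied to it.

```pig
-- Assumption: location of the piggybank.jar produced by the trunk build.
REGISTER /path/to/pig-trunk/contrib/piggybank/java/piggybank.jar;

-- Same load statement as in the quoted script.
A = LOAD 'hdfs:///user/hduser/smsCorpus_en_2012.04.30_all.xml'
    USING org.apache.pig.piggybank.storage.XMLLoader('message')
    AS (x:chararray);

-- Look at a handful of raw records before running REGEX_EXTRACT_ALL.
A_sample = LIMIT A 5;
DUMP A_sample;
```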


On Mon, Apr 21, 2014 at 2:14 PM, Edmund Day <ed...@yahoo.com> wrote:

> When I run the script below I get lots of '()' output. Can anyone explain
> why I get no data in B? (Pig version is 0.12.1, and A dumps OK.)
> TIA!
>
>
>
> A = load 'hdfs:///user/hduser/smsCorpus_en_2012.04.30_all.xml' using
> org.apache.pig.piggybank.storage.XMLLoader('message')
>     as (x:chararray);
> describe A;
>
>
> B = foreach A GENERATE FLATTEN(REGEX_EXTRACT_ALL(x,
> '<message>\\n\\s*<text>(.*)</text>\\n\\s*
> <source>\\n\\s*<srcNumber>(.*)</srcNumber>\\n\\s*<phoneModel
> (.*)/>\\n\\s*<userProfile>\\n\\s*<userID>(.*)</userID>\\n\\s*<age>(.*)</age>\\n\\s*<gender>(.*)</gender>\\n\\s*<nativeSpeaker>(.*)</nativeSpeaker>\\n\\s*<country>(.*)</country>\\n\\s*<city>(.*)</city>\\n\\s*<experience>(.*)</experience>\\n\\s*<frequency>(.*)</frequency>\\n\\s*<inputMethod>(.*)</inputMethod>\\n\\s*</userProfile>\\n\\s*</source>\\n\\s*<destination
> (.*)>\\n\\s*<destNumber>(.*)</destNumber>\\n\\s*</destination>\\n\\s*<messageProfile
> (.*)/>\\n\\s*<collectionMethod (.*)/>\\n\\s*</message>'))
> as (SMStext:chararray, srcNumber:chararray, phoneModel:chararray,
>     userID:chararray, age:chararray, gender:chararray,
> nativeSpeaker:chararray,
>     country:chararray, city:chararray, experience:chararray,
> frequency:chararray,
>     inputMethod:chararray, destination:chararray, destNumber:chararray,
> messageProfile:chararray, collectionMethod:chararray);
>
> describe B;
> dump B;
>
>     /* EXAMPLE DATA FROM NUS SMS CORPUS
>
> <message id="1">
>       <text>K</text>
>       <source>
>     <srcNumber>79780a9dbe83fd1e5dd2bd2543e7da2a</srcNumber>
>     <phoneModel manufactuer="Nokia" smartphone="unknown"/>
>     <userProfile>
>       <userID>79780a9dbe83fd1e5dd2bd2543e7da2a</userID>
>       <age>21-25</age>
>       <gender>unknown</gender>
>       <nativeSpeaker>yes</nativeSpeaker>
>       <country>India</country>
>       <city>Tiruppur</city>
>       <experience>3 to 5 years</experience>
>       <frequency>More than 50 SMS daily</frequency>
>       <inputMethod>Multi-tap</inputMethod>
>     </userProfile>
>       </source>
>       <destination country="unknown">
>     <destNumber>0ffc7585148560b7520931d354c00a9b</destNumber>
>       </destination>
>       <messageProfile language="en" time="2010.10.24 11:59" type="send"/>
>       <collectionMethod collector="Tao Chen" method="SMS Export"
> time="2010/11"/>
>     </message>
>
>     */