Posted to user@pig.apache.org by krishnan N <kr...@gmail.com> on 2012/04/18 00:43:12 UTC

Parsing XML using XMLloader

Hi,

Has anyone used XMLLoader to parse an XML file? If so, could you please
share a few lines of your script?
I tried the example given on pig.apache.org but am not sure how to use it.

Thanks
Krishnan

Re: Parsing XML using XMLloader

Posted by Francisco Javier Gonzalez Garcia <fr...@altran.es>.
Hi,

I think you should use the RegexExtractAll function (REGEX_EXTRACT_ALL among
Pig's built-ins) with a pattern; it turns an XML structure into fields.
In this case, it would be:

REGISTER /usr/lib/pig/contrib/piggybank/java/piggybank.jar;

-- Load each <register> element as a single chararray.
xml_file = LOAD '/home/test2.xml' USING
org.apache.pig.piggybank.storage.XMLLoader('register') AS (doc:chararray);

-- The pattern is abbreviated: it needs one <field> clause per field and
-- must match the entire loaded string, enclosing tags included.
loof_file = FOREACH xml_file GENERATE FLATTEN
(REGEX_EXTRACT_ALL(doc,'\\s*<field\\s+id="([^"]*)">\\n\\s*<value>([^<]*)</value>\\n\\s*</field>\\n\\s*<field .........')
)
AS
(
field1: chararray,
value1: chararray,
field2: chararray,
value2: chararray,
...);

for an XML structure like:

<register>
    <field id="productId">
        <value>12354678</value>
    </field>
    <field id="AckLevel">
        <value>LEVEL2</value>
    </field>
    ...
</register>
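
To make the pattern concrete, here is a minimal Python sketch of the same capture-group idea, run against a two-field version of the record above. Python's re module stands in for the Java regex engine that Pig uses, and the names RECORD, FIELD, PATTERN and extract_all are illustrative only, not part of Pig or piggybank. The complete pattern, including the enclosing <register> tags, is an assumption about how the abbreviated one would be finished.

```python
import re

# Assumed sample: one <register> element with two fields, as a single string.
RECORD = """<register>
<field id="productId">
    <value>12354678</value>
</field>
<field id="AckLevel">
    <value>LEVEL2</value>
</field>
</register>"""

# One clause per <field>: capture the id attribute and the <value> text.
FIELD = r'\s*<field\s+id="([^"]*)">\s*<value>([^<]*)</value>\s*</field>'

# Two clauses because the sample has two fields; repeat once per field.
PATTERN = re.compile(r'\s*<register>' + FIELD * 2 + r'\s*</register>\s*')


def extract_all(record):
    """Return all capture groups as a flat tuple, or None when the
    pattern does not match the whole record."""
    match = PATTERN.fullmatch(record)
    return match.groups() if match else None


print(extract_all(RECORD))
# ('productId', '12354678', 'AckLevel', 'LEVEL2')
```

The flat tuple corresponds to what FLATTEN spreads into field1, value1, field2, value2. A record with a different number of fields does not match at all, which is the main weakness of a fixed pattern.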



Re: Parsing XML using XMLloader

Posted by krishnan N <kr...@gmail.com>.
Hi,

Thanks, I appreciate your response. Please see my requirement below.
Current Pig does not support this, so I need to write a UDF to achieve it.

I am trying to parse XML with Pig; the code below uses the XMLLoader class.
I want to convert the XML to a text file, with the attributes as columns
and the attribute values as the column values.



register /usr/lib/pig/contrib/piggybank/java/piggybank.jar;

xml_file = LOAD '/home/test2.xml' using
org.apache.pig.piggybank.storage.XMLLoader('field') as (doc:chararray);

loof_file = foreach xml_file generate doc;

store loof_file into '/home/xml2_to_text.dat';



XMLLoader picks up only the ‘tag’ supplied as its input parameter and
returns the result below for that tag alone. Is there any way to get the
attribute values?



<field id="productId">
    <value>12354678</value>
</field>
<field id="AckLevel">
    <value>LEVEL2</value>
</field>
<field id="AckDate">
    <value>2012-02-29T16:21:54</value>
</field>
<field id="Success">
    <value>true</value>
</field>



Required output:

Product_Id | AckLevel | AckDate             | Success
12354678   | LEVEL2   | 2012-02-29T16:21:54 | true

Thanks
Krishnan
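
A UDF along the lines mentioned above could look like the following Python sketch, which could be registered with Pig as a Jython UDF. It assumes the four <field> elements sit inside one enclosing element (called register here, as in the other reply) that is loaded as a single chararray, and that Pig supplies the outputSchema decorator when it loads the script. The function name fields_to_row and the COLUMNS tuple are hypothetical.

```python
import xml.etree.ElementTree as ET

try:
    outputSchema  # expected to be supplied by Pig for Jython UDF scripts
except NameError:
    def outputSchema(schema):
        # No-op stand-in so the file also runs under plain Python.
        def decorate(func):
            return func
        return decorate

# Assumed column order, taken from the required output in this message.
COLUMNS = ("productId", "AckLevel", "AckDate", "Success")


@outputSchema("row:(productId:chararray, AckLevel:chararray, "
              "AckDate:chararray, Success:chararray)")
def fields_to_row(record_xml):
    """Turn one enclosing element holding <field id=..><value>..</value></field>
    children into a tuple of values ordered as in COLUMNS."""
    if record_xml is None:
        return None
    root = ET.fromstring(record_xml)
    values = {}
    for field in root.findall(".//field"):
        value = field.find("value")
        text = value.text if value is not None else None
        values[field.get("id")] = text.strip() if text else None
    # Fields missing from the record come back as None (null in Pig).
    return tuple(values.get(name) for name in COLUMNS)
```

In Pig this would be wired in with something like REGISTER 'xml_udf.py' USING jython AS xmludf; followed by FLATTEN(xmludf.fields_to_row(doc)) and a STORE ... USING PigStorage('|'), where the file name and the xmludf namespace are placeholders.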

On Mon, Apr 23, 2012 at 6:26 AM, Francisco Javier Gonzalez Garcia <
francisco.gonzalez@altran.es> wrote:

> an example:
>
> REGISTER /usr/lib/pig/contrib/piggybank/java/piggybank.jar;
> DEFINE XMLLoader org.apache.pig.piggybank.storage.XMLLoader();
>
> indisXML = LOAD 'indis.xml' USING XMLLoader('indisposicion') AS
> (indisposicion:chararray);
> dump indisXML;

Re: Parsing XML using XMLloader

Posted by Francisco Javier Gonzalez Garcia <fr...@altran.es>.
An example:

REGISTER /usr/lib/pig/contrib/piggybank/java/piggybank.jar;
DEFINE XMLLoader org.apache.pig.piggybank.storage.XMLLoader();

indisXML = LOAD 'indis.xml' USING XMLLoader('indisposicion') AS
(indisposicion:chararray);
dump indisXML;
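
As a rough picture of what loading by tag gives you, the Python sketch below returns every element with a given tag name as its own string, opening and closing tags included. It only illustrates the shape of the records and is not how XMLLoader is implemented; split_by_tag is a made-up helper, and it does not handle an element nested inside another of the same name.

```python
import re


def split_by_tag(xml_text, tag):
    """Return each <tag ...>...</tag> span found in xml_text as a string."""
    name = re.escape(tag)
    # \b keeps <field from also matching a longer name such as <fieldset;
    # the non-greedy .*? stops at the first closing tag.
    pattern = re.compile(r"<%s\b[^>]*>.*?</%s>" % (name, name), re.DOTALL)
    return pattern.findall(xml_text)


doc = ('<register>'
       '<field id="a"><value>1</value></field>\n'
       '<field id="b"><value>2</value></field>'
       '</register>')

for record in split_by_tag(doc, "field"):
    print(record)
```

Each returned string plays the role of one row in the loaded relation, so choosing the tag decides whether a row is a single field or a whole enclosing record.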
