You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by "Russell Jurney (JIRA)" <ji...@apache.org> on 2012/12/24 08:48:13 UTC
[jira] [Updated] (PIG-3104) XMLLoader return Pig tuple/map/bag
representation of the DOM of XML documents
[ https://issues.apache.org/jira/browse/PIG-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Russell Jurney updated PIG-3104:
--------------------------------
Description:
I want to extend Pig's existing XMLLoader to go beyond capturing the text inside a tag and to actually create a Pig mapping of the Document Object Model the XML represents. This would be similar to elephant-bird's JsonLoader. Semi-structured data can vary, so this behavior can be risky but... I want people to be able to load JSON and XML data easily their first session with Pig.
-------
characters = load 'example.xml' using XMLLoader('character');
describe characters
{properties:map[], name:chararray, born:datetime, qualification:chararray}
-------
<book id="b0836217462" available="true">
<isbn>
0836217462
</isbn>
<title lang="en">
Being a Dog Is a Full-Time Job
</title>
<author id="CMS">
<name>
Charles M Schulz
</name>
<born>
1922-11-26
</born>
<dead>
2000-02-12
</dead>
</author>
<character id="PP">
<name>
Peppermint Patty
</name>
<born>
1966-08-22
</born>
<qualification>
bold, brash and tomboyish
</qualification>
</character>
<character id="Snoopy">
<name>
Snoopy
</name>
<born>
1950-10-04
</born>
<qualification>
extroverted beagle
</qualification>
</character>
<character id="Schroeder">
<name>
Schroeder
</name>
<born>
1951-05-30
</born>
<qualification>
brought classical music to the Peanuts strip
</qualification>
</character>
<character id="Lucy">
<name>
Lucy
</name>
<born>
1952-03-03
</born>
<qualification>
bossy, crabby and selfish
</qualification>
</character>
</book>
</library>
was:
I want to extend Pig's existing XMLLoader to go beyond capturing the text inside a tag and to actually create a Pig mapping of the Document Object Model the XML represents. This would be similar to elephant-bird's JsonLoader. Semi-structured data can vary, so this behavior can be risky but... I want people to be able to load JSON and XML data easily their first session with Pig.
> XMLLoader return Pig tuple/map/bag representation of the DOM of XML documents
> -----------------------------------------------------------------------------
>
> Key: PIG-3104
> URL: https://issues.apache.org/jira/browse/PIG-3104
> Project: Pig
> Issue Type: Improvement
> Components: internal-udfs, piggybank
> Affects Versions: 0.10.0, 0.11
> Reporter: Russell Jurney
> Assignee: Russell Jurney
> Fix For: 0.12
>
>
> I want to extend Pig's existing XMLLoader to go beyond capturing the text inside a tag and to actually create a Pig mapping of the Document Object Model the XML represents. This would be similar to elephant-bird's JsonLoader. Semi-structured data can vary, so this behavior can be risky but... I want people to be able to load JSON and XML data easily their first session with Pig.
> -------
> characters = load 'example.xml' using XMLLoader('character');
> describe characters
>
> {properties:map[], name:chararray, born:datetime, qualification:chararray}
> -------
> <book id="b0836217462" available="true">
> <isbn>
> 0836217462
> </isbn>
> <title lang="en">
> Being a Dog Is a Full-Time Job
> </title>
> <author id="CMS">
> <name>
> Charles M Schulz
> </name>
> <born>
> 1922-11-26
> </born>
> <dead>
> 2000-02-12
> </dead>
> </author>
> <character id="PP">
> <name>
> Peppermint Patty
> </name>
> <born>
> 1966-08-22
> </born>
> <qualification>
> bold, brash and tomboyish
> </qualification>
> </character>
> <character id="Snoopy">
> <name>
> Snoopy
> </name>
> <born>
> 1950-10-04
> </born>
> <qualification>
> extroverted beagle
> </qualification>
> </character>
> <character id="Schroeder">
> <name>
> Schroeder
> </name>
> <born>
> 1951-05-30
> </born>
> <qualification>
> brought classical music to the Peanuts strip
> </qualification>
> </character>
> <character id="Lucy">
> <name>
> Lucy
> </name>
> <born>
> 1952-03-03
> </born>
> <qualification>
> bossy, crabby and selfish
> </qualification>
> </character>
> </book>
> </library>
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira