Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2015/04/29 01:30:07 UTC
[jira] [Updated] (PIG-3104) XMLLoader return Pig tuple/map/bag representation of the DOM of XML documents
[ https://issues.apache.org/jira/browse/PIG-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-3104:
----------------------------
Fix Version/s: 0.16.0 (was: 0.15.0)
> XMLLoader return Pig tuple/map/bag representation of the DOM of XML documents
> -----------------------------------------------------------------------------
>
> Key: PIG-3104
> URL: https://issues.apache.org/jira/browse/PIG-3104
> Project: Pig
> Issue Type: Improvement
> Components: internal-udfs, piggybank
> Affects Versions: 0.10.0, 0.11
> Reporter: Russell Jurney
> Assignee: Daniel Dai
> Fix For: 0.16.0
>
>
> I want to extend Pig's existing XMLLoader to go beyond capturing the text inside a tag and actually create a Pig mapping of the Document Object Model (DOM) that the XML represents. This would be similar to elephant-bird's JsonLoader. Semi-structured data can vary, so this behavior can be risky, but I want people to be able to load JSON and XML data easily in their first session with Pig.
> -------
> characters = load 'example.xml' using XMLLoader('character');
> describe characters;
>
> {properties:map[], name:chararray, born:datetime, qualification:chararray}
> -------
> <library>
> <book id="b0836217462" available="true">
> <isbn>
> 0836217462
> </isbn>
> <title lang="en">
> Being a Dog Is a Full-Time Job
> </title>
> <author id="CMS">
> <name>
> Charles M Schulz
> </name>
> <born>
> 1922-11-26
> </born>
> <dead>
> 2000-02-12
> </dead>
> </author>
> <character id="PP">
> <name>
> Peppermint Patty
> </name>
> <born>
> 1966-08-22
> </born>
> <qualification>
> bold, brash and tomboyish
> </qualification>
> </character>
> <character id="Snoopy">
> <name>
> Snoopy
> </name>
> <born>
> 1950-10-04
> </born>
> <qualification>
> extroverted beagle
> </qualification>
> </character>
> <character id="Schroeder">
> <name>
> Schroeder
> </name>
> <born>
> 1951-05-30
> </born>
> <qualification>
> brought classical music to the Peanuts strip
> </qualification>
> </character>
> <character id="Lucy">
> <name>
> Lucy
> </name>
> <born>
> 1952-03-03
> </born>
> <qualification>
> bossy, crabby and selfish
> </qualification>
> </character>
> </book>
> </library>
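
For illustration only, the mapping proposed in the description can be sketched outside Pig. The Python below is not XMLLoader's implementation. `element_to_record` and `load_records` are hypothetical names, and treating the element's attributes as `properties` and `born` as a date is an assumption read off the `describe` output quoted above. It uses only the standard library's `xml.etree.ElementTree`, on a shortened copy of the sample document.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Shortened copy of the sample document from the issue description.
SAMPLE = """
<library>
  <book id="b0836217462" available="true">
    <character id="PP">
      <name>
        Peppermint Patty
      </name>
      <born>
        1966-08-22
      </born>
      <qualification>
        bold, brash and tomboyish
      </qualification>
    </character>
    <character id="Snoopy">
      <name>Snoopy</name>
      <born>1950-10-04</born>
      <qualification>extroverted beagle</qualification>
    </character>
  </book>
</library>
"""


def element_to_record(elem):
    """Map one element to a dict shaped like the proposed schema.

    Assumption: attributes go into a 'properties' map and each child
    element becomes a field holding its whitespace-trimmed text.
    """
    record = {"properties": dict(elem.attrib)}
    for child in elem:
        record[child.tag] = (child.text or "").strip()
    # Assumption: 'born' holds an ISO date, matching born:datetime.
    if record.get("born"):
        record["born"] = date.fromisoformat(record["born"])
    return record


def load_records(xml_text, tag):
    """Return one record per element named `tag`, at any depth."""
    root = ET.fromstring(xml_text.strip())
    return [element_to_record(e) for e in root.iter(tag)]


characters = load_records(SAMPLE, "character")
for c in characters:
    print(c)
```

A real loader would also have to decide how to handle repeated child tags, nested elements, and records whose fields differ, which is the risk with semi-structured data that the description mentions.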
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)