You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2015/04/29 01:30:07 UTC

[jira] [Updated] (PIG-3104) XMLLoader return Pig tuple/map/bag representation of the DOM of XML documents

     [ https://issues.apache.org/jira/browse/PIG-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-3104:
----------------------------
    Fix Version/s:     (was: 0.15.0)
                   0.16.0

> XMLLoader return Pig tuple/map/bag representation of the DOM of XML documents
> -----------------------------------------------------------------------------
>
>                 Key: PIG-3104
>                 URL: https://issues.apache.org/jira/browse/PIG-3104
>             Project: Pig
>          Issue Type: Improvement
>          Components: internal-udfs, piggybank
>    Affects Versions: 0.10.0, 0.11
>            Reporter: Russell Jurney
>            Assignee: Daniel Dai
>             Fix For: 0.16.0
>
>
> I want to extend Pig's existing XMLLoader to go beyond capturing the text inside a tag and to actually create a Pig mapping of the Document Object Model the XML represents. This would be similar to elephant-bird's JsonLoader. Semi-structured data can vary, so this behavior can be risky but... I want people to be able to load JSON and XML data easily their first session with Pig.
> -------
> characters = load 'example.xml' using XMLLoader('character');
> describe characters
>  
> {properties:map[], name:chararray, born:datetime, qualification:chararray}
> -------
>   <book id="b0836217462" available="true">
>     <isbn>
>       0836217462
>     </isbn>
>     <title lang="en">
>       Being a Dog Is a Full-Time Job
>     </title>
>     <author id="CMS">
>       <name>
>         Charles M Schulz
>       </name>
>       <born>
>         1922-11-26
>       </born>
>       <dead>
>         2000-02-12
>       </dead>
>     </author>
>     <character id="PP">
>       <name>
>         Peppermint Patty
>       </name>
>       <born>
>         1966-08-22
>       </born>
>       <qualification>
>         bold, brash and tomboyish
>       </qualification>
>     </character>
>     <character id="Snoopy">
>       <name>
>         Snoopy
>       </name>
>       <born>
>         1950-10-04
>       </born>
>       <qualification>
>         extroverted beagle
>       </qualification>
>     </character>
>     <character id="Schroeder">
>       <name>
>         Schroeder
>       </name>
>       <born>
>         1951-05-30
>       </born>
>       <qualification>
>         brought classical music to the Peanuts strip
>       </qualification>
>     </character>
>     <character id="Lucy">
>       <name>
>         Lucy
>       </name>
>       <born>
>         1952-03-03
>       </born>
>       <qualification>
>         bossy, crabby and selfish
>       </qualification>
>     </character>
>   </book>
> </library>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)