Posted to dev@pig.apache.org by "Vivek Padmanabhan (JIRA)" <ji...@apache.org> on 2011/06/28 14:40:17 UTC

[jira] [Created] (PIG-2147) Support nested tags for XMLLoader

Support nested tags for XMLLoader
---------------------------------

                 Key: PIG-2147
                 URL: https://issues.apache.org/jira/browse/PIG-2147
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.8.1, 0.9.0
            Reporter: Vivek Padmanabhan
            Assignee: Vivek Padmanabhan


Currently, XMLLoader does not support nested tags that share the same tag name. For example, given the content below:

{code}
<event>
 <relatedEvents>
   <event>x</event>
   <event>y</event>
   <event>z</event>
 </relatedEvents>
</event>
{code}

And I load the above using XMLLoader,
events = load 'input' using org.apache.pig.piggybank.storage.XMLLoader('event') as (doc:chararray);


The output will be,
{code}
<event>
 <relatedEvents>
   <event>x</event>
{code}

Whereas the desired output is:
{code}
 <relatedEvents>
   <event>x</event>
   <event>y</event>
   <event>z</event>
 </relatedEvents>
{code}
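
The truncated result is what one would expect from a scanner that pairs the first opening tag with the first closing tag it meets. The Java sketch below only illustrates that failure mode; it is not the actual XMLLoader code, and the names FirstCloseSketch and firstMatch are hypothetical.

```java
public class FirstCloseSketch {

    /**
     * Returns the text from the first <tag> up to and including the first
     * </tag> that follows it, or null if no such pair exists.
     * Nesting is ignored on purpose, to show how the record gets cut short.
     */
    public static String firstMatch(String xml, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        int start = xml.indexOf(open);
        if (start < 0) {
            return null;
        }
        // The search stops at the first closing tag, even if it belongs
        // to a nested element with the same name.
        int end = xml.indexOf(close, start + open.length());
        if (end < 0) {
            return null;
        }
        return xml.substring(start, end + close.length());
    }

    public static void main(String[] args) {
        String xml =
              "<event>\n"
            + " <relatedEvents>\n"
            + "   <event>x</event>\n"
            + "   <event>y</event>\n"
            + "   <event>z</event>\n"
            + " </relatedEvents>\n"
            + "</event>";
        // Prints the record cut off after the first nested closing tag.
        System.out.println(firstMatch(xml, "event"));
    }
}
```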


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PIG-2147) Support nested tags for XMLLoader

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2147:
----------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 0.8.1)
                       (was: 0.9.0)
                   0.10
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

This does not seem to be an urgent issue, so I don't think we need to backport it to 0.9. Since the patch is already committed to trunk, I am closing the ticket.


        

[jira] [Commented] (PIG-2147) Support nested tags for XMLLoader

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062870#comment-13062870 ] 

jiraposter@reviews.apache.org commented on PIG-2147:
----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1064/
-----------------------------------------------------------

Review request for pig.


Summary
-------


Currently, XMLLoader does not support nested tags that share the same tag name. For example, given the content below:

<event>
 <relatedEvents>
   <event>x</event>
   <event>y</event>
   <event>z</event>
 </relatedEvents>
</event>

And I load the above using XMLLoader,
events = load 'input' using org.apache.pig.piggybank.storage.XMLLoader('event') as (doc:chararray);

The output will be,

<event>
 <relatedEvents>
   <event>x</event>

Whereas the desired output is:

 <relatedEvents>
   <event>x</event>
   <event>y</event>
   <event>z</event>
 </relatedEvents>

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Modified the behaviour of XMLLoader so that it also handles nested tags. This is implemented by counting the nesting depth: the count goes up for each nested opening tag and is decremented for each closing tag.
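
A minimal Java sketch of the depth-counting idea, for illustration only. It is not the code from PIG-2147_1.patch; the names NestedTagSketch and extractOutermost are hypothetical. It assumes tags carry no attributes, and it returns each outermost element in full.

```java
import java.util.ArrayList;
import java.util.List;

public class NestedTagSketch {

    /** Returns every outermost <tag>...</tag> element found in xml. */
    public static List<String> extractOutermost(String xml, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        List<String> records = new ArrayList<>();
        int depth = 0;   // number of <tag> elements currently open
        int start = -1;  // index where the current outermost element began
        int pos = 0;
        while (pos < xml.length()) {
            int nextOpen = xml.indexOf(open, pos);
            int nextClose = xml.indexOf(close, pos);
            if (nextClose < 0) {
                break; // no closing tag left, so any open element is incomplete
            }
            if (nextOpen >= 0 && nextOpen < nextClose) {
                // Opening tag comes first: remember the start at depth 0,
                // then go one level deeper.
                if (depth == 0) {
                    start = nextOpen;
                }
                depth++;
                pos = nextOpen + open.length();
            } else {
                // Closing tag comes first: go one level up and emit the
                // record once the outermost element is closed.
                pos = nextClose + close.length();
                if (depth > 0) {
                    depth--;
                    if (depth == 0) {
                        records.add(xml.substring(start, pos));
                    }
                }
                // A closing tag seen at depth 0 is stray and is skipped.
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String xml =
              "<event>\n"
            + " <relatedEvents>\n"
            + "   <event>x</event>\n"
            + "   <event>y</event>\n"
            + "   <event>z</event>\n"
            + " </relatedEvents>\n"
            + "</event>";
        for (String record : extractOutermost(xml, "event")) {
            System.out.println(record);
        }
    }
}
```

With the sample input from the summary, the sketch yields a single record spanning the whole outer event element, because the record is only closed when the depth returns to zero.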


This addresses bug PIG-2147.
    https://issues.apache.org/jira/browse/PIG-2147


Diffs
-----


Diff: https://reviews.apache.org/r/1064/diff


Testing
-------


Thanks,

Vivek




        

[jira] [Updated] (PIG-2147) Support nested tags for XMLLoader

Posted by "Vivek Padmanabhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Padmanabhan updated PIG-2147:
-----------------------------------

    Attachment: PIG-2147_1.patch

Attaching an initial patch.


        

[jira] [Updated] (PIG-2147) Support nested tags for XMLLoader

Posted by "Vivek Padmanabhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Padmanabhan updated PIG-2147:
-----------------------------------

    Fix Version/s: 0.8.1
                   0.9.0
           Status: Patch Available  (was: Open)


        

[jira] [Commented] (PIG-2147) Support nested tags for XMLLoader

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063700#comment-13063700 ] 

Daniel Dai commented on PIG-2147:
---------------------------------

All tests pass. test-patch shows a positive result. Committed to trunk first.
