Posted to dev@pig.apache.org by "Alan Gates (JIRA)" <ji...@apache.org> on 2011/03/30 22:19:05 UTC

[jira] [Updated] (PIG-1842) Improve Scalability of the XMLLoader for large datasets such as wikipedia

     [ https://issues.apache.org/jira/browse/PIG-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1842:
----------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> Improve Scalability of the XMLLoader for large datasets such as wikipedia
> -------------------------------------------------------------------------
>
>                 Key: PIG-1842
>                 URL: https://issues.apache.org/jira/browse/PIG-1842
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.7.0, 0.8.0, 0.9.0
>            Reporter: Viraj Bhat
>            Assignee: Vivek Padmanabhan
>             Fix For: 0.9.0, 0.8.0, 0.7.0
>
>         Attachments: PIG-1842_1.patch, PIG-1842_2.patch, TEST-org.apache.pig.piggybank.test.storage.TestXMLLoader.txt
>
>
> The current XMLLoader for Pig does not work well for large datasets such as the Wikipedia dataset. Each mapper reads in the entire XML file, resulting in extremely slow run times.
> Viraj
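A common way to avoid having every mapper read the whole file is to make the loader split-aware. Under this approach, a mapper owns only the records whose opening tag begins inside its byte range. It may read past the end of its range to finish the last record it owns, but it skips any partial record at the start of its range because the previous split owns that record. The sketch below illustrates only that ownership rule. It is not the code from PIG-1842_1.patch or PIG-1842_2.patch. The function name records_in_split and its parameters are hypothetical. The sketch also assumes non-nested <tag>...</tag> records with no attributes on the opening tag, and it searches an in-memory buffer instead of streaming from HDFS.

```python
from typing import List


def records_in_split(data: bytes, start: int, end: int, tag: bytes) -> List[bytes]:
    """Return the <tag>...</tag> records whose opening tag begins in [start, end).

    Hypothetical sketch: assumes records are not nested and that the opening
    tag carries no attributes (it must match b"<tag>" exactly).
    """
    open_tag = b"<" + tag + b">"
    close_tag = b"</" + tag + b">"
    records = []
    pos = start
    while True:
        # Searching from `start` skips any record that began in an earlier
        # split; that split is responsible for emitting it.
        begin = data.find(open_tag, pos)
        # A record starting at or beyond `end` belongs to the next split.
        if begin == -1 or begin >= end:
            break
        # The closing tag may lie past `end`; reading beyond the split
        # boundary is allowed so the last owned record is complete.
        finish = data.find(close_tag, begin + len(open_tag))
        if finish == -1:
            break  # truncated record at the end of the input
        finish += len(close_tag)
        records.append(data[begin:finish])
        pos = finish
    return records


if __name__ == "__main__":
    doc = b"<x><page>a</page><page>bb</page><page>c</page></x>"
    first = records_in_split(doc, 0, 20, b"page")
    second = records_in_split(doc, 20, len(doc), b"page")
    print(first)   # records owned by the split covering bytes [0, 20)
    print(second)  # records owned by the split covering bytes [20, len(doc))
```

Because each record is emitted only by the split that contains its opening tag, the per-split results concatenate to exactly the records a single full scan would find, with no duplicates and no omissions, wherever the split boundary falls.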

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira