Posted to dev@pig.apache.org by "Nandor Kollar (JIRA)" <ji...@apache.org> on 2016/10/04 07:59:21 UTC

[jira] [Work started] (PIG-3891) FileBasedOutputSizeReader does not calculate size of files in sub-directories

     [ https://issues.apache.org/jira/browse/PIG-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on PIG-3891 started by Nandor Kollar.
------------------------------------------
> FileBasedOutputSizeReader does not calculate size of files in sub-directories
> -----------------------------------------------------------------------------
>
>                 Key: PIG-3891
>                 URL: https://issues.apache.org/jira/browse/PIG-3891
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Rohini Palaniswamy
>            Assignee: Nandor Kollar
>         Attachments: PIG-3891-1.patch, PIG-3891-2.patch, PIG-3891-3.patch, PIG-3891-4.patch
>
>
> FileBasedOutputSizeReader only includes files in the top-level output directory, so if files are stored under subdirectories (e.g., by MultiStorage), the bytes written are not reported correctly.
> 0.11 shows the correct total number of bytes written, so this is a regression. However, a quick look at the code shows that JobStats.addOneOutputStats() in 0.11 does not iterate recursively either; its code is the same as FileBasedOutputSizeReader. We need to investigate where the correct value comes from in 0.11 and fix it in 0.12.1/0.13.
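
The gap described above comes down to listing only the direct children of the output directory versus walking the whole directory tree. The following is a minimal sketch of that difference, not Pig's actual FileBasedOutputSizeReader. It uses java.nio on the local filesystem as a stand-in for HDFS, and the class name `OutputSize` and all file and directory names are hypothetical. In Hadoop's `FileSystem` API the analogous choice is roughly between `listStatus(Path)` (one level) and `listFiles(Path, true)` or `getContentSummary(Path)` (recursive); which of these, if any, the actual fix uses is not stated here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical illustration class, not part of Pig.
class OutputSize {

    // Sums sizes of regular files directly under dir. Files inside
    // subdirectories are never visited (the behaviour the issue reports).
    static long topLevelBytes(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile)
                          .mapToLong(OutputSize::sizeOf)
                          .sum();
        }
    }

    // Sums sizes of all regular files anywhere under dir, at any depth.
    static long recursiveBytes(Path dir) throws IOException {
        try (Stream<Path> entries = Files.walk(dir)) {
            return entries.filter(Files::isRegularFile)
                          .mapToLong(OutputSize::sizeOf)
                          .sum();
        }
    }

    // Wraps Files.size so it can be used inside a stream lambda.
    private static long sizeOf(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output tree: one file at the top level and two files in
        // per-key subdirectories, similar in shape to MultiStorage output.
        // Temporary files are left in place for brevity.
        Path root = Files.createTempDirectory("pig-output");
        Files.write(root.resolve("part-m-00000"), new byte[10]);
        Path a = Files.createDirectory(root.resolve("a"));
        Files.write(a.resolve("part-m-00000"), new byte[20]);
        Path b = Files.createDirectory(root.resolve("b"));
        Files.write(b.resolve("part-m-00000"), new byte[30]);

        System.out.println("top-level only: " + topLevelBytes(root)); // 10
        System.out.println("recursive:      " + recursiveBytes(root)); // 60
    }
}
```

Running `main` reports 10 bytes for the top-level count and 60 for the recursive one, which is the kind of under-reporting the issue describes for output written into subdirectories.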



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)