Posted to dev@pig.apache.org by "Pradeep Kamath (JIRA)" <ji...@apache.org> on 2010/04/06 23:25:33 UTC

[jira] Commented: (PIG-1299) Implement Pig counter to track number of output rows for each output files

    [ https://issues.apache.org/jira/browse/PIG-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854188#action_12854188 ] 

Pradeep Kamath commented on PIG-1299:
-------------------------------------

Changes are mostly good - a few comments:
1) Instead of creating a wrapper RecordWriter in MapReducePOStoreImpl, the counter should be incremented in POStore.getNext(). POStore holds a reference to MapReducePOStoreImpl, so the counter is available there. This way, we keep our contract with StoreFunc that the RecordWriter instance provided in prepareToWrite() is the same one returned by StoreFunc.getOutputFormat().getRecordWriter(). With this change, the change to BinStorage should be reverted.
2) Is the check for store.isMultiStore() required in MapReducePOStoreImpl? I think MapReducePOStoreImpl is used only with multi-store POStore(s), so the check seems redundant.
3) If javac warnings can be addressed, please address them - also unit tests along the lines of those in TestCounters would be good.
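
To illustrate point 1, here is a minimal sketch of counting in the store operator rather than in a wrapper around the writer. It does not use Pig or Hadoop classes: Writer, StoreImpl, Store and their methods are hypothetical stand-ins for RecordWriter, MapReducePOStoreImpl and POStore, and the real signatures (for example that of POStore.getNext()) differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only; none of these are Pig's or Hadoop's real classes.
class StoreCounterSketch {

    /** Stand-in for the RecordWriter obtained from the output format. */
    interface Writer {
        void write(String record);
    }

    /** Stand-in for the store implementation: holds the writer and the counter. */
    static class StoreImpl {
        private final Writer writer;
        private long outputRecords = 0;

        StoreImpl(Writer writer) {
            this.writer = writer;
        }

        // Hands out the original writer instance; it is never wrapped.
        Writer getWriter() {
            return writer;
        }

        void incrementCounter() {
            outputRecords++;
        }

        long getOutputRecords() {
            return outputRecords;
        }
    }

    /** Stand-in for the store operator, which keeps a reference to the impl. */
    static class Store {
        private final StoreImpl impl;

        Store(StoreImpl impl) {
            this.impl = impl;
        }

        // Writes one record, then bumps the counter through the impl reference.
        void getNext(String record) {
            impl.getWriter().write(record);
            impl.incrementCounter();
        }
    }

    /** Stores three records and returns the resulting counter value. */
    static long demo() {
        List<String> sink = new ArrayList<>();
        Writer original = sink::add;
        StoreImpl impl = new StoreImpl(original);
        Store store = new Store(impl);

        for (String record : new String[] {"a", "b", "c"}) {
            store.getNext(record);
        }

        // The store function would still see the very same writer instance.
        if (impl.getWriter() != original) {
            throw new IllegalStateException("writer was wrapped");
        }
        return impl.getOutputRecords();
    }

    public static void main(String[] args) {
        System.out.println("output records: " + demo());
    }
}
```

Because the counting happens in the operator, the writer handed to the store function is untouched, which is the contract the comment wants to preserve.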

> Implement Pig counter  to track number of output rows for each output files 
> ----------------------------------------------------------------------------
>
>                 Key: PIG-1299
>                 URL: https://issues.apache.org/jira/browse/PIG-1299
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Richard Ding
>            Assignee: Richard Ding
>             Fix For: 0.8.0
>
>         Attachments: PIG-1299.patch
>
>
> When running a multi-store query, the Hadoop job tracker often displays only 0 for the "Reduce output records" or "Map output records" counters. This is incorrect and misleading. Pig should implement an "output records" counter for each output file in the query.
