Posted to dev@pig.apache.org by "Cheolsoo Park (JIRA)" <ji...@apache.org> on 2012/10/29 23:56:13 UTC

[jira] [Commented] (PIG-2924) PigStats should not be assuming all Storage classes to be file-based storage

    [ https://issues.apache.org/jira/browse/PIG-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486456#comment-13486456 ] 

Cheolsoo Park commented on PIG-2924:
------------------------------------

Hi Bill, thank you very much for reviewing my patch!

I totally agree with most of your comments. In particular, adding a supports() method seems like an elegant way to support multiple computers. I will make that change in a new patch.

However, would you agree to removing POStore from the interface? I don't think POStore is needed to implement supports() and getOutputSize() for any kind of computer. All we probably need is the uri string, so it seems to make sense to pass the uri string (or a URI object) instead of the whole POStore. Please let me know if you think otherwise.
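
To make the proposal concrete, here is a rough sketch of what a uri-based interface could look like. This is not the patch itself. The names OutputSizeReader, LocalFileSizeReader, HBaseSizeReader and OutputSizeSketch are hypothetical, java.io.File stands in for Hadoop's FileSystem so the example has no dependencies, and returning -1 for an unknown size is an assumption.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.List;

// Hypothetical name; the interface name is still under discussion.
interface OutputSizeReader {
    // True if this reader knows how to size the output at the given uri.
    boolean supports(String uri);

    // Output size in bytes, or -1 if it cannot be determined (assumed convention).
    long getOutputSize(String uri);
}

// Stand-in for a file-based reader. Uses java.io.File instead of Hadoop's
// FileSystem purely to keep the sketch self-contained.
class LocalFileSizeReader implements OutputSizeReader {
    public boolean supports(String uri) {
        try {
            return "file".equals(new URI(uri).getScheme());
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public long getOutputSize(String uri) {
        try {
            File f = new File(URI.create(uri));
            return f.isFile() ? f.length() : -1L;
        } catch (IllegalArgumentException e) {
            // Not a usable file: uri (for example, it carries an authority part).
            return -1L;
        }
    }
}

// Hypothetical reader for HBase-backed stores. It only claims the uri and
// reports the size as unknown instead of trying to list a file.
class HBaseSizeReader implements OutputSizeReader {
    public boolean supports(String uri) {
        return uri.startsWith("hbase://");
    }

    public long getOutputSize(String uri) {
        return -1L;
    }
}

class OutputSizeSketch {
    static final List<OutputSizeReader> READERS =
            Arrays.asList(new LocalFileSizeReader(), new HBaseSizeReader());

    // Ask each reader in turn; the first one that supports the uri answers.
    static long sizeOf(String uri) {
        for (OutputSizeReader reader : READERS) {
            if (reader.supports(uri)) {
                return reader.getOutputSize(uri);
            }
        }
        return -1L; // no reader supports this uri
    }

    public static void main(String[] args) {
        System.out.println(sizeOf("hbase://tablename"));
        System.out.println(sizeOf("file:///tmp/part-00000"));
    }
}
```

In this sketch neither method needs anything beyond the uri string, which is the point of the proposal.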

Regarding the name of the interface, I couldn't come up with a better name. Reader sounds good to me. Maybe reporter or calculator?

Thanks!
                
> PigStats should not be assuming all Storage classes to be file-based storage
> ----------------------------------------------------------------------------
>
>                 Key: PIG-2924
>                 URL: https://issues.apache.org/jira/browse/PIG-2924
>             Project: Pig
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 0.9.2, 0.10.0
>            Reporter: Harsh J
>            Assignee: Cheolsoo Park
>         Attachments: PIG-2924-2.patch, PIG-2924.patch
>
>
> Using PigStatsUtil (like Oozie does) to collect JobStats for jobs that use HBaseStorage blows up when the stats are asked to be accumulated.
> This is because JobStats (which adds stuff up) is assuming all storages are file based and that it can do listStatus/etc. operations on their filespec-provided filename. For HBaseStorage, this is set to the tablename and there's no such file, leading to an exception (FileNotFound or Invalid URI - depending on using 'tablename' or 'hbase://tablename').
