Posted to dev@pig.apache.org by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org> on 2011/02/14 12:41:57 UTC

[jira] Commented: (PIG-1828) HBaseStorage has problems with processing multiregion tables

    [ https://issues.apache.org/jira/browse/PIG-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994267#comment-12994267 ] 

Dmitriy V. Ryaboy commented on PIG-1828:
----------------------------------------

I hunted around for that this weekend, and it looks like Job is only passed in setLocation, at which point it's too late: the modified job conf won't be the one used for the actual job (see http://www.apacheserver.net/UDFContext-in-0-8-LoadFunc-at1098923.htm).

The suggestion in the linked thread was to use relativeToAbsolutePath to stick things into UDFContext, which worked there since the task was just to pass something along to the job-side loadfuncs, but does not work in this case, where we actually want to fix up the Conf.

Various LoadMetadata functions also get called with the Job param; I'll try that next.
I am not sure when those get called relative to job creation, but regardless, it's silly to make people implement LoadMetadata just to be able to muck with the job config. We can add something like this to JobControlCompiler.getJob:


                    // Call setLocation as a hacky way of letting a LoadFunc fix up the Job.
                    lf.setLocation(ld.getLFile().getFileName(), nwJob);

(inside its POLoad loop). 
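
To make the difference concrete, here is a minimal toy sketch of the two call orders. It does not use Pig or Hadoop: ToyConf, ToyLoadFunc, ToyTableLoader, ToyLoad, the "toy.input.table" key, and both compile methods are hypothetical stand-ins invented for illustration, and the real JobControlCompiler.getJob does considerably more. It only assumes what is described above: a loader that edits the conf handed to setLocation, and a compiler that may or may not hand it the conf that ends up in the actual job.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are Pig or Hadoop classes.
final class SetLocationSketch {

    /** Hypothetical stand-in for a job configuration: a mutable string map. */
    static final class ToyConf {
        private final Map<String, String> props = new HashMap<>();
        void set(String key, String value) { props.put(key, value); }
        String get(String key) { return props.get(key); }
    }

    /** Hypothetical stand-in for the part of a LoadFunc that matters here. */
    interface ToyLoadFunc {
        void setLocation(String location, ToyConf conf);
    }

    /** A loader that wants to fix up the job configuration it is handed. */
    static final class ToyTableLoader implements ToyLoadFunc {
        @Override
        public void setLocation(String location, ToyConf conf) {
            conf.set("toy.input.table", location);
        }
    }

    /** Hypothetical stand-in for a load operator: a location plus its loader. */
    static final class ToyLoad {
        final String location;
        final ToyLoadFunc func;
        ToyLoad(String location, ToyLoadFunc func) {
            this.location = location;
            this.func = func;
        }
    }

    /** The loader only ever sees a throwaway conf, so its edits are lost. */
    static ToyConf compileWithoutHook(List<ToyLoad> loads) {
        for (ToyLoad ld : loads) {
            ld.func.setLocation(ld.location, new ToyConf());
        }
        return new ToyConf(); // the conf the "real" job would run with
    }

    /** The proposed shape: call setLocation with the conf the job will use. */
    static ToyConf compileWithHook(List<ToyLoad> loads) {
        ToyConf jobConf = new ToyConf();
        for (ToyLoad ld : loads) {
            ld.func.setLocation(ld.location, jobConf);
        }
        return jobConf;
    }

    public static void main(String[] args) {
        List<ToyLoad> loads = new ArrayList<>();
        loads.add(new ToyLoad("mytable", new ToyTableLoader()));
        System.out.println("without hook: "
                + compileWithoutHook(loads).get("toy.input.table"));
        System.out.println("with hook: "
                + compileWithHook(loads).get("toy.input.table"));
    }
}
```

In this toy version the loader's setting only survives in compileWithHook, because that is the only path where setLocation receives the same conf object that is returned for the job.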

Thoughts?

> HBaseStorage has problems with processing multiregion tables
> ------------------------------------------------------------
>
>                 Key: PIG-1828
>                 URL: https://issues.apache.org/jira/browse/PIG-1828
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>         Environment: Hadoop 0.20.2, Hbase 0.20.6, Distributed mode
>            Reporter: Lukas
>            Assignee: Dmitriy V. Ryaboy
>
> As brought up on the Pig user mailing list (http://www.mail-archive.com/user%40pig.apache.org/msg00606.html), Pig sometimes does not scan the full HBase table.
> It seems that HBaseStorage has problems scanning large tables. It issues just one mapper job instead of one mapper job per table region.
> Ian Stevens, who brought this issue up in the mailing list, attached a script to reproduce the problem (https://gist.github.com/766929).
> However, in my case, the problem only occurred after the table was split into more than one region.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira