Posted to dev@pig.apache.org by "Albert Sunwoo (JIRA)" <ji...@apache.org> on 2011/06/02 01:04:47 UTC

[jira] [Updated] (PIG-2107) When using pig with HBaseStorage, pig filters should utilize hbase indexes to limit workset.

     [ https://issues.apache.org/jira/browse/PIG-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Albert Sunwoo updated PIG-2107:
-------------------------------

    Summary: When using pig with HBaseStorage, pig filters should utilize hbase indexes to limit workset.  (was: When using pig with hbase, pig filters should utilize hbase indexes to limit workset.)

> When using pig with HBaseStorage, pig filters should utilize hbase indexes to limit workset.
> --------------------------------------------------------------------------------------------
>
>                 Key: PIG-2107
>                 URL: https://issues.apache.org/jira/browse/PIG-2107
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Albert Sunwoo
>
> The LOAD function, when used with HBaseStorage, accepts filter arguments that you can use to limit the working set for an MR job.
> e.g.
> blah = LOAD 'hbase://test' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:field1', '-loadKey -gte foo1 -lte foo1');
> It would be really great if this also applied to FILTER statements within pig, so that a filter statement, e.g.
> blah2 = FILTER blah by key == 'foo1'; or
> blah2 = FILTER blah by key > 'foo1' and key < 'foo2';
> would actually limit what is retrieved from hbase, so pig has a smaller working set to perform MR on.
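
To make the request concrete, here is a minimal sketch of the translation being asked for: turning simple row-key comparisons from a FILTER into the range options that HBaseStorage already accepts in LOAD. This is not Pig's optimizer code. The function names (key_filter_to_options, load_statement) and the (operator, value) predicate format are hypothetical, and -gt / -lt are assumed to be the strict counterparts of the -gte / -lte options shown in the LOAD example above. The sketch does not merge or validate conflicting bounds.

```python
# Hypothetical sketch of the requested pushdown; not part of Pig itself.
# Maps ANDed row-key comparisons onto HBaseStorage range options.

# Comparison operator -> HBaseStorage option name.
# -gte / -lte appear in the issue's LOAD example; -gt / -lt are assumed
# to be their strict counterparts.
OP_TO_OPTION = {">": "-gt", ">=": "-gte", "<": "-lt", "<=": "-lte"}


def key_filter_to_options(predicates):
    """Translate [(op, value), ...] row-key predicates into an option string.

    Equality becomes a closed range (-gte v -lte v), mirroring the LOAD
    example in the issue. Raises ValueError for operators that cannot be
    expressed as a key range.
    """
    options = ["-loadKey"]
    for op, value in predicates:
        if op == "==":
            options += ["-gte", value, "-lte", value]
        elif op in OP_TO_OPTION:
            options += [OP_TO_OPTION[op], value]
        else:
            raise ValueError(f"cannot push down operator: {op}")
    return " ".join(options)


def load_statement(alias, table, columns, predicates):
    """Build the LOAD statement that the FILTER would be rewritten into."""
    options = key_filter_to_options(predicates)
    return (
        f"{alias} = LOAD 'hbase://{table}' using "
        f"org.apache.pig.backend.hadoop.hbase.HBaseStorage("
        f"'{columns}', '{options}');"
    )


if __name__ == "__main__":
    # FILTER blah by key > 'foo1' and key < 'foo2'
    print(load_statement("blah", "test", "cf:field1",
                         [(">", "foo1"), ("<", "foo2")]))
```

Under these assumptions, the equality filter maps back to exactly the option string used in the LOAD example ('-loadKey -gte foo1 -lte foo1'), and the range filter maps to '-loadKey -gt foo1 -lt foo2'. Predicates on anything other than the row key would still have to be evaluated by pig after the scan.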

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira