Posted to dev@pig.apache.org by "Koji Noguchi (JIRA)" <ji...@apache.org> on 2015/07/23 00:11:06 UTC

[jira] [Updated] (PIG-4551) Partition filter is not pushed down in case of SPLIT

     [ https://issues.apache.org/jira/browse/PIG-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-4551:
------------------------------
    Attachment: pig-4551_v01_notestyet.patch

We can think of two approaches:
(1) Do not use LOSplit; load the input once per branch instead.
(2) Merge the filters and see if the merged filter can be pushed down as a partition filter.

Attaching a primitive patch (pig-4551_v01_notestyet.patch) that tries to do (2).
It adds a merged filter above the LOSplit, in the hope that it gets pushed up to the loader and used as the partition filter.

So far there is no code that removes this unnecessary filter later in the optimization.
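
To make the two approaches concrete, here is a rough script-level sketch of what each would amount to for the query in the description. This is only an illustration, not what the patch emits: the patch works on the logical plan, the aliases A1, A2 and M are made up for this example, and it assumes date, pk2 and pk3 are partition keys of db1.table1. Whether the optimizer actually pushes the merged filter down to the loader is untested.
{code}
-- Approach (1): one LOAD per branch, so no LOSplit is created
A1 = LOAD 'db1.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
B  = FILTER A1 BY ( ((date=='20150501' AND pk2=='1')) AND pk3=='127' );
A2 = LOAD 'db1.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
C  = FILTER A2 BY ( ((date=='20150501' AND pk2=='1') OR (date=='20150430' AND pk2=='1')) AND pk3=='127' );

-- Approach (2): a single LOAD, with the OR of both branch conditions
-- (hypothetical alias M) placed between the loader and the split
A = LOAD 'db1.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
M = FILTER A BY ( ((date=='20150501' AND pk2=='1') AND pk3=='127')
               OR (((date=='20150501' AND pk2=='1') OR (date=='20150430' AND pk2=='1')) AND pk3=='127') );
B = FILTER M BY ( ((date=='20150501' AND pk2=='1')) AND pk3=='127' );
C = FILTER M BY ( ((date=='20150501' AND pk2=='1') OR (date=='20150430' AND pk2=='1')) AND pk3=='127' );
{code}
In (2), every row that passes either branch filter also passes M, so B and C produce the same results as before. M only narrows what has to be read, provided it reaches the loader as a partition filter.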

> Partition filter is not pushed down in case of SPLIT
> ----------------------------------------------------
>
>                 Key: PIG-4551
>                 URL: https://issues.apache.org/jira/browse/PIG-4551
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11.1
>            Reporter: Rohini Palaniswamy
>         Attachments: pig-4551_v01_notestyet.patch
>
>
>   The query below, which has an implicit split, does not push down the partition filters and scans the whole table.
> {code}
> A  = LOAD 'db1.table1'        USING org.apache.hive.hcatalog.pig.HCatLoader();
> B = FILTER A BY ( ((date=='20150501' AND pk2 =='1')) and pk3 == '127' );
> C  = FILTER A BY ( ((date=='20150501' AND pk2=='1') OR (date=='20150430' AND pk2=='1')) and pk3 == '127' );
> {code}
> The current workaround is to write a separate LOAD statement for each FILTER. We should do that behind the scenes during planning instead of requiring the user to do it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)