Posted to dev@falcon.apache.org by "Satish Mittal (JIRA)" <ji...@apache.org> on 2014/01/31 13:28:12 UTC

[jira] [Comment Edited] (FALCON-284) Hcatalog based feed retention doesn't work when partition filter spans across multiple partition keys

    [ https://issues.apache.org/jira/browse/FALCON-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887687#comment-13887687 ] 

Satish Mittal edited comment on FALCON-284 at 1/31/14 12:27 PM:
----------------------------------------------------------------

Suppose we have an HCatalog table, table1, that is PARTITIONED BY (year STRING, month STRING, day STRING, hour STRING, minute STRING).

And we submit a Falcon feed corresponding to table1 with a retention of 2 hours:

    <clusters>
        <cluster name="hcat-cluster">
            <validity start="2013-01-01T00:00Z" end="2030-01-01T00:00Z"/>
            <retention limit="hours(2)" action="delete"/>
        </cluster>
    </clusters>

    <table uri="catalog:default:table1#year=${YEAR};month=${MONTH};day=${DAY};hour=${HOUR};minute=${MINUTE}" />

The feed retention jobs for this feed succeed; however, the partition filter computed by retention considers only *year* and ignores month, day, hour and minute. Here is a snippet of the task log:

*2014-01-30 12:12:10,940 INFO  - List partitions for : table1, partition filter: year < '2014' (HiveCatalogService:134)*
2014-01-30 12:12:11,519 WARN  - DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore. (HiveConf:1231)
2014-01-30 12:12:11,844 INFO  - Trying to connect to metastore with URI thrift://localhost:5055 (metastore:249)
2014-01-30 12:12:11,881 INFO  - Waiting 1 seconds before next connection attempt. (metastore:327)
2014-01-30 12:12:12,881 INFO  - Connected to metastore. (metastore:337)
2014-01-30 12:12:12,930 INFO  - Caching HCatalog client object for thrift://localhost:5055 (HiveCatalogService:61)
*2014-01-30 12:12:12,971 INFO  - No partitions to delete. (FeedEvictor:389)*
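
For illustration, here is a rough sketch of what a filter spanning all five partition keys could look like. It is not Falcon's actual code: the function name build_partition_filter, the key/pattern list, and the Python strftime patterns are hypothetical stand-ins for the feed's date patterns. It also assumes partition values are zero-padded strings (so that string comparison matches date order) and that the metastore filter accepts =, <, and, or.

```python
from datetime import datetime, timedelta


def build_partition_filter(keys_and_patterns, cutoff):
    """Return a filter matching partitions strictly older than `cutoff`.

    keys_and_patterns: ordered (partition_key, strftime_pattern) pairs,
    coarsest key first. Hypothetical helper, not part of Falcon.
    """
    clauses = []
    equal_prefix = []  # equality tests on all coarser keys seen so far
    for key, pattern in keys_and_patterns:
        value = cutoff.strftime(pattern)
        # Older at this level: coarser keys equal, this key smaller.
        less_than = f"{key} < '{value}'"
        clauses.append("(" + " and ".join(equal_prefix + [less_than]) + ")")
        equal_prefix.append(f"{key} = '{value}'")
    return " or ".join(clauses)


# Assumed mapping of the table's partition keys to date patterns.
KEYS = [
    ("year", "%Y"),
    ("month", "%m"),
    ("day", "%d"),
    ("hour", "%H"),
    ("minute", "%M"),
]

# Retention of 2 hours relative to the time of the task log above.
now = datetime(2014, 1, 30, 12, 12)
print(build_partition_filter(KEYS, now - timedelta(hours=2)))
```

With only the first clause (year < '2014'), which is what the log shows, no partition from the current year can ever match, which is consistent with the "No partitions to delete" message.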


> Hcatalog based feed retention doesn't work when partition filter spans across multiple partition keys
> -----------------------------------------------------------------------------------------------------
>
>                 Key: FALCON-284
>                 URL: https://issues.apache.org/jira/browse/FALCON-284
>             Project: Falcon
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Satish Mittal
>
> When an HCatalog-based feed is scheduled in Falcon, retention only looks at the first partition key that matches one of the date patterns: yyyy | MM | dd | HH | mm. As a result, it calculates a partition filter that contains only one of these patterns. However, if the HCatalog table is defined in such a way that the date spans multiple partition keys (year/month/day/hour/minute), then feed retention doesn't delete any partitions that are more granular than the first level (year).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)