Posted to dev@falcon.apache.org by "Satish Mittal (JIRA)" <ji...@apache.org> on 2014/01/31 13:28:12 UTC
[jira] [Comment Edited] (FALCON-284) Hcatalog based feed retention
doesn't work when partition filter spans across multiple partition keys
[ https://issues.apache.org/jira/browse/FALCON-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887687#comment-13887687 ]
Satish Mittal edited comment on FALCON-284 at 1/31/14 12:27 PM:
----------------------------------------------------------------
Suppose we have an HCatalog table, table1, that is PARTITIONED BY (year STRING, month STRING, day STRING, hour STRING, minute STRING).
We submit a Falcon feed for table1 with a retention of 2 hours:
<clusters>
    <cluster name="hcat-cluster">
        <validity start="2013-01-01T00:00Z" end="2030-01-01T00:00Z"/>
        <retention limit="hours(2)" action="delete"/>
    </cluster>
</clusters>
<table uri="catalog:default:table1#year=${YEAR};month=${MONTH};day=${DAY};hour=${HOUR};minute=${MINUTE}" />
The feed retention jobs for this feed succeed; however, the partition filter used by retention considers only *year*. Here is a snippet of the task log:
*2014-01-30 12:12:10,940 INFO - List partitions for : table1, partition filter: year < '2014' (HiveCatalogService:134)*
2014-01-30 12:12:11,519 WARN - DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore. (HiveConf:1231)
2014-01-30 12:12:11,844 INFO - Trying to connect to metastore with URI thrift://localhost:5055 (metastore:249)
2014-01-30 12:12:11,881 INFO - Waiting 1 seconds before next connection attempt. (metastore:327)
2014-01-30 12:12:12,881 INFO - Connected to metastore. (metastore:337)
2014-01-30 12:12:12,930 INFO - Caching HCatalog client object for thrift://localhost:5055 (HiveCatalogService:61)
*2014-01-30 12:12:12,971 INFO - No partitions to delete. (FeedEvictor:389)*
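To make the effect of the logged filter concrete, here is a minimal Python sketch (not Falcon code). The partition values are hypothetical, and the evaluation time is assumed to be the 12:12 timestamp from the log. It compares the year-only filter with a comparison across all five date keys for a 2-hour retention limit.

```python
from datetime import datetime, timedelta

# Hypothetical partition values (year, month, day, hour, minute) for table1.
partitions = [
    ("2014", "01", "30", "08", "00"),
    ("2014", "01", "30", "09", "30"),
    ("2014", "01", "30", "11", "00"),
    ("2014", "01", "30", "12", "00"),
]

# Retention limit hours(2), evaluated at the (assumed) time of the log snippet.
now = datetime(2014, 1, 30, 12, 12)
cutoff = now - timedelta(hours=2)  # 2014-01-30 10:12

def year_only_expired(partition):
    """Mimics the logged filter: year < '2014'."""
    return partition[0] < cutoff.strftime("%Y")

def all_keys_expired(partition):
    """Compares every date key, most significant first.

    Relies on zero-padded string values, so tuple comparison
    follows chronological order.
    """
    cutoff_key = tuple(cutoff.strftime("%Y %m %d %H %M").split())
    return partition < cutoff_key

year_only = [p for p in partitions if year_only_expired(p)]
all_keys = [p for p in partitions if all_keys_expired(p)]

print("year-only filter selects:", year_only)
print("all-key comparison selects:", all_keys)
```

With these sample values the year-only filter selects nothing, which matches the "No partitions to delete" line in the log, while the all-key comparison selects the 08:00 and 09:30 partitions.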
> Hcatalog based feed retention doesn't work when partition filter spans across multiple partition keys
> -----------------------------------------------------------------------------------------------------
>
> Key: FALCON-284
> URL: https://issues.apache.org/jira/browse/FALCON-284
> Project: Falcon
> Issue Type: Bug
> Affects Versions: 0.5
> Reporter: Satish Mittal
>
> When an HCatalog-based feed is scheduled in Falcon, retention only looks at the first partition key that matches one of the date patterns yyyy | MM | dd | HH | mm. As a result, it computes a partition filter that contains only one of these patterns. However, if the HCatalog table is defined so that the date spans multiple partition keys (year/month/day/hour/minute), feed retention doesn't delete any partitions that are more granular than the first level (year).
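One way a filter spanning several date keys could be expressed is as a lexicographic "less than" over the keys, most significant first. The Python sketch below is illustrative only and is not the Falcon implementation: `build_retention_filter` is a hypothetical helper, and it assumes that partition values are zero-padded strings and that the metastore filter language accepts `and`, `or`, `=` and `<` on string partition keys.

```python
def build_retention_filter(keys, cutoff_values):
    """Build a filter matching partitions strictly older than the cutoff.

    keys          -- partition key names, most significant first
    cutoff_values -- zero-padded cutoff value for each key, in the same order

    For each key, one clause fixes all more significant keys to their
    cutoff value and requires the current key to be smaller.
    """
    if len(keys) != len(cutoff_values):
        raise ValueError("keys and cutoff_values must have the same length")

    clauses = []
    for i, (key, value) in enumerate(zip(keys, cutoff_values)):
        terms = [f"{k} = '{v}'" for k, v in zip(keys[:i], cutoff_values[:i])]
        terms.append(f"{key} < '{value}'")
        clauses.append("(" + " and ".join(terms) + ")")
    return " or ".join(clauses)


example = build_retention_filter(["year", "month", "day"], ["2014", "01", "30"])
print(example)
```

For a table partitioned down to the minute, the same helper would be called with all five keys and the five components of the retention cutoff.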