You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@falcon.apache.org by Seetharam Venkatesh <ve...@innerzeal.com> on 2014/03/05 20:46:54 UTC

Re: Review Request 18626: FALCON-284: Hcatalog based feed retention doesn't work when partition filter spans across multiple partition keys

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18626/#review36258
-----------------------------------------------------------



common/src/main/java/org/apache/falcon/catalog/AbstractCatalogService.java
<https://reviews.apache.org/r/18626/#comment67209>

    Why do you need a separate method for this? Why not add this to getPartition method itself. Seems wasted IMO.



common/src/main/java/org/apache/falcon/catalog/HiveCatalogService.java
<https://reviews.apache.org/r/18626/#comment67210>

    Why do you need a separate method for this? Why not add this to getPartition method itself. Seems wasted IMO.
    
    Also, it naturally belongs to CatalogStorage, no?



retention/src/main/java/org/apache/falcon/retention/FeedEvictor.java
<https://reviews.apache.org/r/18626/#comment67212>

    nit: method can be map or fill rather than get



retention/src/main/java/org/apache/falcon/retention/FeedEvictor.java
<https://reviews.apache.org/r/18626/#comment67213>

    Can this method be on CatalogStorage?



retention/src/main/java/org/apache/falcon/retention/FeedEvictor.java
<https://reviews.apache.org/r/18626/#comment67215>

    nit: dropPartitionsForAnInstance



retention/src/main/java/org/apache/falcon/retention/FeedEvictor.java
<https://reviews.apache.org/r/18626/#comment67216>

    Why 2 booleans? dropped and deleted? rename dropped to deleted.


- Seetharam Venkatesh


On Feb. 28, 2014, 2:56 p.m., Satish Mittal wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18626/
> -----------------------------------------------------------
> 
> (Updated Feb. 28, 2014, 2:56 p.m.)
> 
> 
> Review request for Falcon and Srikanth Sundarrajan.
> 
> 
> Repository: falcon-git
> 
> 
> Description
> -------
> 
> When an HCatalog based feed is scheduled in falcon, retention only looks at the first partition key that satisfies either of date pattern: yyyy | MM | dd | HH | mm. As a result, it calculates a partition filter that contains only one of these patterns. However if HCatalog table is defined in such a way that date spans across multiple partition keys (year/month/day/hour/minute), then feed retention doesn't delete any partitions that are granular than first level (year).
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/falcon/catalog/AbstractCatalogService.java fc9c3b1 
>   common/src/main/java/org/apache/falcon/catalog/HiveCatalogService.java 3c3660e 
>   common/src/main/java/org/apache/falcon/entity/common/FeedDataPath.java 4031e14 
>   retention/src/main/java/org/apache/falcon/retention/FeedEvictor.java 13c447c 
>   webapp/src/test/java/org/apache/falcon/lifecycle/TableStorageFeedEvictorIT.java 770780e 
> 
> Diff: https://reviews.apache.org/r/18626/diff/
> 
> 
> Testing
> -------
> 
> - Added new integration tests in TableStorageFeedEvictorIT.java to test retention for an Hcatalog feed where date consists of multiple partitions columns (year/month/day).
> - Verified the retention behavior on a test cluster having an Hcatalog based feed partitioned by year/month/day/hour/minute/country.
> 
> 
> Thanks,
> 
> Satish Mittal
> 
>


Re: Review Request 18626: FALCON-284: Hcatalog based feed retention doesn't work when partition filter spans across multiple partition keys

Posted by Satish Mittal <sa...@apache.org>.

> On March 5, 2014, 7:46 p.m., Seetharam Venkatesh wrote:
> > common/src/main/java/org/apache/falcon/catalog/AbstractCatalogService.java, line 111
> > <https://reviews.apache.org/r/18626/diff/2/?file=507006#file507006line111>
> >
> >     Why do you need a separate method for this? Why not add this to getPartition method itself. Seems wasted IMO.

The getPartitionColumns method fetches the names of all partition columns defined in the given table name. This info can't be extracted from getPartition method, which returns details about a specific partition within the table. Will rename the former method to getTablePartitionCols() to be more precise.


> On March 5, 2014, 7:46 p.m., Seetharam Venkatesh wrote:
> > common/src/main/java/org/apache/falcon/catalog/HiveCatalogService.java, line 239
> > <https://reviews.apache.org/r/18626/diff/2/?file=507007#file507007line239>
> >
> >     Why do you need a separate method for this? Why not add this to getPartition method itself. Seems wasted IMO.
> >     
> >     Also, it naturally belongs to CatalogStorage, no?

See earlier comments.

CatalogStorage class today contains the table details defined in a falcon feed xml, eg. URI, db name, table name, partition string etc. It is not getting populated with any info fetched from AbstractCatalogService. Have retained that abstraction.


> On March 5, 2014, 7:46 p.m., Seetharam Venkatesh wrote:
> > retention/src/main/java/org/apache/falcon/retention/FeedEvictor.java, line 391
> > <https://reviews.apache.org/r/18626/diff/2/?file=507009#file507009line391>
> >
> >     nit: method can be map or fill rather than get

Will rename.


> On March 5, 2014, 7:46 p.m., Seetharam Venkatesh wrote:
> > retention/src/main/java/org/apache/falcon/retention/FeedEvictor.java, line 502
> > <https://reviews.apache.org/r/18626/diff/2/?file=507009#file507009line502>
> >
> >     Can this method be on CatalogStorage?

See earlier comments.


> On March 5, 2014, 7:46 p.m., Seetharam Venkatesh wrote:
> > retention/src/main/java/org/apache/falcon/retention/FeedEvictor.java, line 539
> > <https://reviews.apache.org/r/18626/diff/2/?file=507009#file507009line539>
> >
> >     nit: dropPartitionsForAnInstance

Will rename.


> On March 5, 2014, 7:46 p.m., Seetharam Venkatesh wrote:
> > retention/src/main/java/org/apache/falcon/retention/FeedEvictor.java, line 553
> > <https://reviews.apache.org/r/18626/diff/2/?file=507009#file507009line553>
> >
> >     Why 2 booleans? dropped and deleted? rename dropped to deleted.

Will do.


- Satish


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18626/#review36258
-----------------------------------------------------------


On Feb. 28, 2014, 2:56 p.m., Satish Mittal wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18626/
> -----------------------------------------------------------
> 
> (Updated Feb. 28, 2014, 2:56 p.m.)
> 
> 
> Review request for Falcon and Srikanth Sundarrajan.
> 
> 
> Repository: falcon-git
> 
> 
> Description
> -------
> 
> When an HCatalog based feed is scheduled in falcon, retention only looks at the first partition key that satisfies either of date pattern: yyyy | MM | dd | HH | mm. As a result, it calculates a partition filter that contains only one of these patterns. However if HCatalog table is defined in such a way that date spans across multiple partition keys (year/month/day/hour/minute), then feed retention doesn't delete any partitions that are granular than first level (year).
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/falcon/catalog/AbstractCatalogService.java fc9c3b1 
>   common/src/main/java/org/apache/falcon/catalog/HiveCatalogService.java 3c3660e 
>   common/src/main/java/org/apache/falcon/entity/common/FeedDataPath.java 4031e14 
>   retention/src/main/java/org/apache/falcon/retention/FeedEvictor.java 13c447c 
>   webapp/src/test/java/org/apache/falcon/lifecycle/TableStorageFeedEvictorIT.java 770780e 
> 
> Diff: https://reviews.apache.org/r/18626/diff/
> 
> 
> Testing
> -------
> 
> - Added new integration tests in TableStorageFeedEvictorIT.java to test retention for an Hcatalog feed where date consists of multiple partitions columns (year/month/day).
> - Verified the retention behavior on a test cluster having an Hcatalog based feed partitioned by year/month/day/hour/minute/country.
> 
> 
> Thanks,
> 
> Satish Mittal
> 
>