You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sentry.apache.org by Alexander Kolbasov <ak...@gmail.com> on 2017/09/01 05:33:15 UTC

Re: Review Request 61973: SENTRY-1888: Sentry might not fetch all HMS duplicated events IDs when requested


> On Aug. 31, 2017, 12:25 a.m., Alexander Kolbasov wrote:
> > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HiveNotificationFetcher.java
> > Lines 127 (patched)
> > <https://reviews.apache.org/r/61973/diff/1/?file=1807071#file1807071line127>
> >
> >     If I am reading this code correctly, the following will happen:
> >     
> >     1) Every time we request IDs starting from last seen ID, so normally the first ID in the sequence is always a duplicate
> >     2) This first ID in the sequence isn't in the cache yet, so we go to the sentryStore and fetch the hash and compare.
> >     
> >     This means that we always go to sentryStore for the frist element of the batch which means every time we call fetchNotifications().
> >     
> >     This is not very bad, but if we store the sha1 hash of the last event in the cache we don't have to go to sentryStore at all under normal circumstances
> 
> Sergio Pena wrote:
>     Keeping the cache of the last ID is not enough. The HiveNotificationFetcher does not know if the IDs fetched are processed, so the next call to fetchNotifications() may  be called with an ID that was not cached. If I want to cache all of them, then I will add more procesisng on getting the sha1() of all the notifications and adding them to the cache. Isn't simpler just to keep it the way it is now?

This is possible but requires a different code structure. This fix is good enough.


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61973/#review184215
-----------------------------------------------------------


On Aug. 31, 2017, 6:13 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61973/
> -----------------------------------------------------------
> 
> (Updated Aug. 31, 2017, 6:13 p.m.)
> 
> 
> Review request for sentry, Alexander Kolbasov and Na Li.
> 
> 
> Bugs: sentry-1888
>     https://issues.apache.org/jira/browse/sentry-1888
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> Approach:
> The proposed solution will be to request notifications from the last ID seen again. This way we could bring current duplicates and apply them on Sentry. We have the risk to miss duplicates that were committed much time later, but we cannot trust on those duplicates as they will not know the order of the time they were committed.
> 
> The patch adds a new helper class called HiveNotificationFetcher that allow us to bring new notifications and apply a filter to those notifications that were already processed by Sentry.
> 
> 
> Diffs
> -----
> 
>   sentry-hdfs/sentry-hdfs-common/src/main/java/org/apache/sentry/hdfs/UniquePathsUpdate.java 7dae2f538602f4c264084791fb45bb6891a34941 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java 593b92f96b47f959266280bce3d0809f6a80c362 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java 35da6fc7908ec7498d1fd658d75b62028df35751 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HiveNotificationFetcher.java PRE-CREATION 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/SentryHMSClient.java 4a8fb953ce486e1aeb1042884de56872b5539cd0 
>   sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/service/thrift/TestHMSFollower.java f56384a99e128b3e9880cbc5692107d61f2f500f 
>   sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/service/thrift/TestHiveNotificationFetcher.java PRE-CREATION 
>   sentry-provider/sentry-provider-db/src/test/java/org/apache/sentry/service/thrift/TestSentryHMSClient.java 77a2bbb214e23ca146c2934ee4d3bf15e592906f 
> 
> 
> Diff: https://reviews.apache.org/r/61973/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>