You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@sentry.apache.org by Sergio Pena <se...@cloudera.com> on 2017/05/15 14:58:55 UTC

Review Request 59274: SENTRY-1760: HMSFollower should detect when a full snapshot from HMS is required

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59274/
-----------------------------------------------------------

Review request for sentry, Alexander Kolbasov, kalyan kumar kalvagadda, and Na Li.


Bugs: SENTRY-1760
    https://issues.apache.org/jira/browse/SENTRY-1760


Repository: sentry


Description
-------

The patch will set the 'needHiveSnapshot' to TRUE whenever the following cases are found:

- List of notifications received are different than expected. 
  This may happen when Sentry has been down or HDFS sync was disabled for a while (more than 24h), 
  and the HMS cleared old notifications (older than 24h) not processed by Sentry causing a gap when retrieving notifications.
  
- Latest Sentry notification ID processed is bigger than current HMS notification ID.
  This may happen when the HMS DB data was reset or restore from an old snapshot causing sync issues with Sentry.
  
When needHiveSnapshot is set to TRUE, then the HMSFollower will CLEAR any hive snapshot stored on the Sentry store, and recreate a
new hive snapshot from scratch to keep Sentry in sync.


Diffs
-----

  sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java ef6786537e9c5f7730bc86d44e8b4a168c20677e 
  sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java 5e6b906587f6422d9bf1466ab83815722bd51fb0 


Diff: https://reviews.apache.org/r/59274/diff/1/


Testing
-------

HadoopQA is GREEN.


Thanks,

Sergio Pena

Re: Review Request 59274: SENTRY-1760: HMSFollower should detect when a full snapshot from HMS is required

Posted by Alexander Kolbasov <ak...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59274/#review175110
-----------------------------------------------------------




sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java
Lines 328 (patched)
<https://reviews.apache.org/r/59274/#comment248467>

    I was asking about a scenario when you ask for the current notification id from HMS, and afterwords asked for specific notifications and their value is smaller. This seems to be a weird scenario.


- Alexander Kolbasov


On May 15, 2017, 2:58 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59274/
> -----------------------------------------------------------
> 
> (Updated May 15, 2017, 2:58 p.m.)
> 
> 
> Review request for sentry, Alexander Kolbasov, kalyan kumar kalvagadda, and Na Li.
> 
> 
> Bugs: SENTRY-1760
>     https://issues.apache.org/jira/browse/SENTRY-1760
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> The patch will set the 'needHiveSnapshot' to TRUE whenever the following cases are found:
> 
> - List of notifications received are different than expected. 
>   This may happen when Sentry has been down or HDFS sync was disabled for a while (more than 24h), 
>   and the HMS cleared old notifications (older than 24h) not processed by Sentry causing a gap when retrieving notifications.
>   
> - Latest Sentry notification ID processed is bigger than current HMS notification ID.
>   This may happen when the HMS DB data was reset or restore from an old snapshot causing sync issues with Sentry.
>   
> When needHiveSnapshot is set to TRUE, then the HMSFollower will CLEAR any hive snapshot stored on the Sentry store, and recreate a
> new hive snapshot from scratch to keep Sentry in sync.
> 
> 
> Diffs
> -----
> 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java ef6786537e9c5f7730bc86d44e8b4a168c20677e 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java 5e6b906587f6422d9bf1466ab83815722bd51fb0 
> 
> 
> Diff: https://reviews.apache.org/r/59274/diff/1/
> 
> 
> Testing
> -------
> 
> HadoopQA is GREEN.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>

Re: Review Request 59274: SENTRY-1760: HMSFollower should detect when a full snapshot from HMS is required

Posted by Sergio Pena <se...@cloudera.com>.


> On May 15, 2017, 3:14 p.m., Alexander Kolbasov wrote:
> > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 2513 (patched)
> > <https://reviews.apache.org/r/59274/diff/1/?file=1718346#file1718346line2513>
> >
> >     Will this automatically clear all the paths or we need to do it separately somehow?
> 
> Sergio Pena wrote:
>     What do you mean? I copied this small code from the method that clears all tables.
> 
> Alexander Kolbasov wrote:
>     There is a separate table that holds all paths values (and also a mapping table that glues them together. Whwn we delete all MAuthzPathsMapping entries, does this delete the actual entries or just removes the mapping? Do we need to have a separate cleanup step to remove stale entries?

What is the other table? I thought MAuthzPathsMapping was the only table used based that SentryStore.createAuthzPathsMappingCore() is the only table it persists. I don't think my code is deleting all the entries you're mentioning.


- Sergio


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59274/#review174953
-----------------------------------------------------------


On May 15, 2017, 2:58 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59274/
> -----------------------------------------------------------
> 
> (Updated May 15, 2017, 2:58 p.m.)
> 
> 
> Review request for sentry, Alexander Kolbasov, kalyan kumar kalvagadda, and Na Li.
> 
> 
> Bugs: SENTRY-1760
>     https://issues.apache.org/jira/browse/SENTRY-1760
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> The patch will set the 'needHiveSnapshot' to TRUE whenever the following cases are found:
> 
> - List of notifications received are different than expected. 
>   This may happen when Sentry has been down or HDFS sync was disabled for a while (more than 24h), 
>   and the HMS cleared old notifications (older than 24h) not processed by Sentry causing a gap when retrieving notifications.
>   
> - Latest Sentry notification ID processed is bigger than current HMS notification ID.
>   This may happen when the HMS DB data was reset or restore from an old snapshot causing sync issues with Sentry.
>   
> When needHiveSnapshot is set to TRUE, then the HMSFollower will CLEAR any hive snapshot stored on the Sentry store, and recreate a
> new hive snapshot from scratch to keep Sentry in sync.
> 
> 
> Diffs
> -----
> 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java ef6786537e9c5f7730bc86d44e8b4a168c20677e 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java 5e6b906587f6422d9bf1466ab83815722bd51fb0 
> 
> 
> Diff: https://reviews.apache.org/r/59274/diff/1/
> 
> 
> Testing
> -------
> 
> HadoopQA is GREEN.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>

Re: Review Request 59274: SENTRY-1760: HMSFollower should detect when a full snapshot from HMS is required

Posted by Sergio Pena <se...@cloudera.com>.


> On May 15, 2017, 3:14 p.m., Alexander Kolbasov wrote:
> > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 2513 (patched)
> > <https://reviews.apache.org/r/59274/diff/1/?file=1718346#file1718346line2513>
> >
> >     Will this automatically clear all the paths or we need to do it separately somehow?

What do you mean? I copied this small code from the method that clears all tables.


> On May 15, 2017, 3:14 p.m., Alexander Kolbasov wrote:
> > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java
> > Lines 328 (patched)
> > <https://reviews.apache.org/r/59274/diff/1/?file=1718347#file1718347line329>
> >
> >     note that getting eventId and getting events is done via to separate calls. New events may appear between these two calls, so it is very likely that the condition 
> >     
> >     response.getEvents().size() == expectedNotificationsSize)
> >     
> >     is false.

True. What about just checking if expected notifications is LESS instead? If we get less events, then something is wrong. If we get more events, then we're fine.


- Sergio


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59274/#review174953
-----------------------------------------------------------


On May 15, 2017, 2:58 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59274/
> -----------------------------------------------------------
> 
> (Updated May 15, 2017, 2:58 p.m.)
> 
> 
> Review request for sentry, Alexander Kolbasov, kalyan kumar kalvagadda, and Na Li.
> 
> 
> Bugs: SENTRY-1760
>     https://issues.apache.org/jira/browse/SENTRY-1760
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> The patch will set the 'needHiveSnapshot' to TRUE whenever the following cases are found:
> 
> - List of notifications received are different than expected. 
>   This may happen when Sentry has been down or HDFS sync was disabled for a while (more than 24h), 
>   and the HMS cleared old notifications (older than 24h) not processed by Sentry causing a gap when retrieving notifications.
>   
> - Latest Sentry notification ID processed is bigger than current HMS notification ID.
>   This may happen when the HMS DB data was reset or restore from an old snapshot causing sync issues with Sentry.
>   
> When needHiveSnapshot is set to TRUE, then the HMSFollower will CLEAR any hive snapshot stored on the Sentry store, and recreate a
> new hive snapshot from scratch to keep Sentry in sync.
> 
> 
> Diffs
> -----
> 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java ef6786537e9c5f7730bc86d44e8b4a168c20677e 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java 5e6b906587f6422d9bf1466ab83815722bd51fb0 
> 
> 
> Diff: https://reviews.apache.org/r/59274/diff/1/
> 
> 
> Testing
> -------
> 
> HadoopQA is GREEN.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>

Re: Review Request 59274: SENTRY-1760: HMSFollower should detect when a full snapshot from HMS is required

Posted by Na Li <li...@cloudera.com>.


> On May 15, 2017, 3:14 p.m., Alexander Kolbasov wrote:
> > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java
> > Lines 328 (patched)
> > <https://reviews.apache.org/r/59274/diff/1/?file=1718347#file1718347line329>
> >
> >     note that getting eventId and getting events is done via to separate calls. New events may appear between these two calls, so it is very likely that the condition 
> >     
> >     response.getEvents().size() == expectedNotificationsSize)
> >     
> >     is false.
> 
> Sergio Pena wrote:
>     True. What about just checking if expected notifications is LESS instead? If we get less events, then something is wrong. If we get more events, then we're fine.
> 
> Alexander Kolbasov wrote:
>     What are you trying to achieve here? Asl long as we know that there are some new events to process, is there a reason not to process them? The case where we get LESS events that we'd hope we would get is, indeed, weird, but it doesn't seem to warrant a full snapshot, or you think otherwise?

I believe Sergio considers (expected notification size > retrived events size) equal to (current eventId < min(Notification ID) at meta store) (which inditicates a gap and some notifications are removed from meta store and not received by Sentry). 

This equivalence seems OK. One problem is that it assumes the meta store returns all notifications from [current eventID, max(Notification ID)]. Is it possible meta store sends only part of the notification because they are too big?


- Na


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59274/#review174953
-----------------------------------------------------------


On May 15, 2017, 2:58 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59274/
> -----------------------------------------------------------
> 
> (Updated May 15, 2017, 2:58 p.m.)
> 
> 
> Review request for sentry, Alexander Kolbasov, kalyan kumar kalvagadda, and Na Li.
> 
> 
> Bugs: SENTRY-1760
>     https://issues.apache.org/jira/browse/SENTRY-1760
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> The patch will set the 'needHiveSnapshot' to TRUE whenever the following cases are found:
> 
> - List of notifications received are different than expected. 
>   This may happen when Sentry has been down or HDFS sync was disabled for a while (more than 24h), 
>   and the HMS cleared old notifications (older than 24h) not processed by Sentry causing a gap when retrieving notifications.
>   
> - Latest Sentry notification ID processed is bigger than current HMS notification ID.
>   This may happen when the HMS DB data was reset or restore from an old snapshot causing sync issues with Sentry.
>   
> When needHiveSnapshot is set to TRUE, then the HMSFollower will CLEAR any hive snapshot stored on the Sentry store, and recreate a
> new hive snapshot from scratch to keep Sentry in sync.
> 
> 
> Diffs
> -----
> 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java ef6786537e9c5f7730bc86d44e8b4a168c20677e 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java 5e6b906587f6422d9bf1466ab83815722bd51fb0 
> 
> 
> Diff: https://reviews.apache.org/r/59274/diff/1/
> 
> 
> Testing
> -------
> 
> HadoopQA is GREEN.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>

Re: Review Request 59274: SENTRY-1760: HMSFollower should detect when a full snapshot from HMS is required

Posted by Alexander Kolbasov <ak...@gmail.com>.


> On May 15, 2017, 3:14 p.m., Alexander Kolbasov wrote:
> > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
> > Lines 2513 (patched)
> > <https://reviews.apache.org/r/59274/diff/1/?file=1718346#file1718346line2513>
> >
> >     Will this automatically clear all the paths or we need to do it separately somehow?
> 
> Sergio Pena wrote:
>     What do you mean? I copied this small code from the method that clears all tables.

There is a separate table that holds all paths values (and also a mapping table that glues them together. Whwn we delete all MAuthzPathsMapping entries, does this delete the actual entries or just removes the mapping? Do we need to have a separate cleanup step to remove stale entries?


> On May 15, 2017, 3:14 p.m., Alexander Kolbasov wrote:
> > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java
> > Lines 328 (patched)
> > <https://reviews.apache.org/r/59274/diff/1/?file=1718347#file1718347line329>
> >
> >     note that getting eventId and getting events is done via to separate calls. New events may appear between these two calls, so it is very likely that the condition 
> >     
> >     response.getEvents().size() == expectedNotificationsSize)
> >     
> >     is false.
> 
> Sergio Pena wrote:
>     True. What about just checking if expected notifications is LESS instead? If we get less events, then something is wrong. If we get more events, then we're fine.

What are you trying to achieve here? Asl long as we know that there are some new events to process, is there a reason not to process them? The case where we get LESS events that we'd hope we would get is, indeed, weird, but it doesn't seem to warrant a full snapshot, or you think otherwise?


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59274/#review174953
-----------------------------------------------------------


On May 15, 2017, 2:58 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59274/
> -----------------------------------------------------------
> 
> (Updated May 15, 2017, 2:58 p.m.)
> 
> 
> Review request for sentry, Alexander Kolbasov, kalyan kumar kalvagadda, and Na Li.
> 
> 
> Bugs: SENTRY-1760
>     https://issues.apache.org/jira/browse/SENTRY-1760
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> The patch will set the 'needHiveSnapshot' to TRUE whenever the following cases are found:
> 
> - List of notifications received are different than expected. 
>   This may happen when Sentry has been down or HDFS sync was disabled for a while (more than 24h), 
>   and the HMS cleared old notifications (older than 24h) not processed by Sentry causing a gap when retrieving notifications.
>   
> - Latest Sentry notification ID processed is bigger than current HMS notification ID.
>   This may happen when the HMS DB data was reset or restore from an old snapshot causing sync issues with Sentry.
>   
> When needHiveSnapshot is set to TRUE, then the HMSFollower will CLEAR any hive snapshot stored on the Sentry store, and recreate a
> new hive snapshot from scratch to keep Sentry in sync.
> 
> 
> Diffs
> -----
> 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java ef6786537e9c5f7730bc86d44e8b4a168c20677e 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java 5e6b906587f6422d9bf1466ab83815722bd51fb0 
> 
> 
> Diff: https://reviews.apache.org/r/59274/diff/1/
> 
> 
> Testing
> -------
> 
> HadoopQA is GREEN.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>

Re: Review Request 59274: SENTRY-1760: HMSFollower should detect when a full snapshot from HMS is required

Posted by Alexander Kolbasov <ak...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59274/#review174953
-----------------------------------------------------------




sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java
Lines 2513 (patched)
<https://reviews.apache.org/r/59274/#comment248251>

    Will this automatically clear all the paths or we need to do it separately somehow?



sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java
Lines 328 (patched)
<https://reviews.apache.org/r/59274/#comment248252>

    note that getting eventId and getting events is done via to separate calls. New events may appear between these two calls, so it is very likely that the condition 
    
    response.getEvents().size() == expectedNotificationsSize)
    
    is false.


- Alexander Kolbasov


On May 15, 2017, 2:58 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59274/
> -----------------------------------------------------------
> 
> (Updated May 15, 2017, 2:58 p.m.)
> 
> 
> Review request for sentry, Alexander Kolbasov, kalyan kumar kalvagadda, and Na Li.
> 
> 
> Bugs: SENTRY-1760
>     https://issues.apache.org/jira/browse/SENTRY-1760
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> The patch will set the 'needHiveSnapshot' to TRUE whenever the following cases are found:
> 
> - List of notifications received are different than expected. 
>   This may happen when Sentry has been down or HDFS sync was disabled for a while (more than 24h), 
>   and the HMS cleared old notifications (older than 24h) not processed by Sentry causing a gap when retrieving notifications.
>   
> - Latest Sentry notification ID processed is bigger than current HMS notification ID.
>   This may happen when the HMS DB data was reset or restore from an old snapshot causing sync issues with Sentry.
>   
> When needHiveSnapshot is set to TRUE, then the HMSFollower will CLEAR any hive snapshot stored on the Sentry store, and recreate a
> new hive snapshot from scratch to keep Sentry in sync.
> 
> 
> Diffs
> -----
> 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java ef6786537e9c5f7730bc86d44e8b4a168c20677e 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java 5e6b906587f6422d9bf1466ab83815722bd51fb0 
> 
> 
> Diff: https://reviews.apache.org/r/59274/diff/1/
> 
> 
> Testing
> -------
> 
> HadoopQA is GREEN.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>

Re: Review Request 59274: SENTRY-1760: HMSFollower should detect when a full snapshot from HMS is required

Posted by kalyan kumar kalvagadda <kk...@cloudera.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59274/#review175142
-----------------------------------------------------------




sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java
Lines 303 (patched)
<https://reviews.apache.org/r/59274/#comment248519>

    When decision is made to create a snapshot again, 
    1. Namenode plug-in should be forced/signalled to request for a full snapshot. otherwise the newly created snapshot will never be in use untill namenode restarts.
    
    2. Apart from path update, you may want to reset perm update so that name node can start fresh.(I'm not 100% sure here. Please check)


- kalyan kumar kalvagadda


On May 15, 2017, 2:58 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59274/
> -----------------------------------------------------------
> 
> (Updated May 15, 2017, 2:58 p.m.)
> 
> 
> Review request for sentry, Alexander Kolbasov, kalyan kumar kalvagadda, and Na Li.
> 
> 
> Bugs: SENTRY-1760
>     https://issues.apache.org/jira/browse/SENTRY-1760
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> The patch will set the 'needHiveSnapshot' to TRUE whenever the following cases are found:
> 
> - List of notifications received are different than expected. 
>   This may happen when Sentry has been down or HDFS sync was disabled for a while (more than 24h), 
>   and the HMS cleared old notifications (older than 24h) not processed by Sentry causing a gap when retrieving notifications.
>   
> - Latest Sentry notification ID processed is bigger than current HMS notification ID.
>   This may happen when the HMS DB data was reset or restore from an old snapshot causing sync issues with Sentry.
>   
> When needHiveSnapshot is set to TRUE, then the HMSFollower will CLEAR any hive snapshot stored on the Sentry store, and recreate a
> new hive snapshot from scratch to keep Sentry in sync.
> 
> 
> Diffs
> -----
> 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java ef6786537e9c5f7730bc86d44e8b4a168c20677e 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java 5e6b906587f6422d9bf1466ab83815722bd51fb0 
> 
> 
> Diff: https://reviews.apache.org/r/59274/diff/1/
> 
> 
> Testing
> -------
> 
> HadoopQA is GREEN.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>

Re: Review Request 59274: SENTRY-1760: HMSFollower should detect when a full snapshot from HMS is required

Posted by kalyan kumar kalvagadda <kk...@cloudera.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59274/#review174976
-----------------------------------------------------------




sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java
Lines 304 (patched)
<https://reviews.apache.org/r/59274/#comment248280>

    When HMSFOllower descides to take a snapshot, PATH UPDATE information should be clearned here.



sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java
Line 322 (original), 325 (patched)
<https://reviews.apache.org/r/59274/#comment248276>

    Sergio, Do you think requesting all the notificaitons is good idea. How about having a cap on the number of notification by sending that value instead of Integer.MAX_VALUE.



sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java
Lines 341 (patched)
<https://reviews.apache.org/r/59274/#comment248282>

    Last notification ID information stored should also be be reset.



sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java
Lines 348 (patched)
<https://reviews.apache.org/r/59274/#comment248281>

    Last notification ID information stored should also be be reset.


- kalyan kumar kalvagadda


On May 15, 2017, 2:58 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59274/
> -----------------------------------------------------------
> 
> (Updated May 15, 2017, 2:58 p.m.)
> 
> 
> Review request for sentry, Alexander Kolbasov, kalyan kumar kalvagadda, and Na Li.
> 
> 
> Bugs: SENTRY-1760
>     https://issues.apache.org/jira/browse/SENTRY-1760
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> The patch will set the 'needHiveSnapshot' to TRUE whenever the following cases are found:
> 
> - List of notifications received are different than expected. 
>   This may happen when Sentry has been down or HDFS sync was disabled for a while (more than 24h), 
>   and the HMS cleared old notifications (older than 24h) not processed by Sentry causing a gap when retrieving notifications.
>   
> - Latest Sentry notification ID processed is bigger than current HMS notification ID.
>   This may happen when the HMS DB data was reset or restore from an old snapshot causing sync issues with Sentry.
>   
> When needHiveSnapshot is set to TRUE, then the HMSFollower will CLEAR any hive snapshot stored on the Sentry store, and recreate a
> new hive snapshot from scratch to keep Sentry in sync.
> 
> 
> Diffs
> -----
> 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java ef6786537e9c5f7730bc86d44e8b4a168c20677e 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java 5e6b906587f6422d9bf1466ab83815722bd51fb0 
> 
> 
> Diff: https://reviews.apache.org/r/59274/diff/1/
> 
> 
> Testing
> -------
> 
> HadoopQA is GREEN.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>

Re: Review Request 59274: SENTRY-1760: HMSFollower should detect when a full snapshot from HMS is required

Posted by Sergio Pena <se...@cloudera.com>.


> On May 15, 2017, 4:22 p.m., Alexander Kolbasov wrote:
> > sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java
> > Lines 328 (patched)
> > <https://reviews.apache.org/r/59274/diff/1/?file=1718347#file1718347line329>
> >
> >     "(which inditicates a gap and some notifications are removed from meta store and not received by Sentry)"
> >     
> >     what do you mean by that? What would be the scenario that would cause this to happen?
> >     
> >     Sergio - what is the speci for notifications - does it guarantee that it returns *all* events since the specified value or *some number of consequitive events*"?

Scenarios where this gap might happen are:
- Sentry is down for more than 24h. HMS clears all entries that has been in the DB for more than 24h. If HMS cleares them before Sentry starts, then there is a gap.
- Same as HDFS sync disabled. If HDFS sync is disabled at some point after Sentry already did a snapshot, and HDFS sync is disabled for more than 24h, then we have a gap too.

Regarding the number of notifications. The HMS will return all notifications from the specified value up to the maximum specified. Although it might get less if we specify a filter (sentry does not use a filter). 

See http://github.mtv.cloudera.com/CDH/hive/blob/cdh5-1.1.0/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L7629

Although there is not comments that will guarantee this behavior. This being an external project, we always get the risk that the API implementation changes. 
We could try to check if the older notification ID received is the next sequantial ID.

IMPROVEMENT: We could write code to improve this and request notifications in batches (in Sentry side) on every HMSFollower run.


- Sergio


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59274/#review174967
-----------------------------------------------------------


On May 15, 2017, 2:58 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59274/
> -----------------------------------------------------------
> 
> (Updated May 15, 2017, 2:58 p.m.)
> 
> 
> Review request for sentry, Alexander Kolbasov, kalyan kumar kalvagadda, and Na Li.
> 
> 
> Bugs: SENTRY-1760
>     https://issues.apache.org/jira/browse/SENTRY-1760
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> The patch will set the 'needHiveSnapshot' to TRUE whenever the following cases are found:
> 
> - List of notifications received are different than expected. 
>   This may happen when Sentry has been down or HDFS sync was disabled for a while (more than 24h), 
>   and the HMS cleared old notifications (older than 24h) not processed by Sentry causing a gap when retrieving notifications.
>   
> - Latest Sentry notification ID processed is bigger than current HMS notification ID.
>   This may happen when the HMS DB data was reset or restore from an old snapshot causing sync issues with Sentry.
>   
> When needHiveSnapshot is set to TRUE, then the HMSFollower will CLEAR any hive snapshot stored on the Sentry store, and recreate a
> new hive snapshot from scratch to keep Sentry in sync.
> 
> 
> Diffs
> -----
> 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java ef6786537e9c5f7730bc86d44e8b4a168c20677e 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java 5e6b906587f6422d9bf1466ab83815722bd51fb0 
> 
> 
> Diff: https://reviews.apache.org/r/59274/diff/1/
> 
> 
> Testing
> -------
> 
> HadoopQA is GREEN.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>

Re: Review Request 59274: SENTRY-1760: HMSFollower should detect when a full snapshot from HMS is required

Posted by Alexander Kolbasov <ak...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59274/#review174967
-----------------------------------------------------------




sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java
Lines 328 (patched)
<https://reviews.apache.org/r/59274/#comment248270>

    "(which inditicates a gap and some notifications are removed from meta store and not received by Sentry)"
    
    what do you mean by that? What would be the scenario that would cause this to happen?
    
    Sergio - what is the speci for notifications - does it guarantee that it returns *all* events since the specified value or *some number of consequitive events*"?


- Alexander Kolbasov


On May 15, 2017, 2:58 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59274/
> -----------------------------------------------------------
> 
> (Updated May 15, 2017, 2:58 p.m.)
> 
> 
> Review request for sentry, Alexander Kolbasov, kalyan kumar kalvagadda, and Na Li.
> 
> 
> Bugs: SENTRY-1760
>     https://issues.apache.org/jira/browse/SENTRY-1760
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> The patch will set the 'needHiveSnapshot' to TRUE whenever the following cases are found:
> 
> - List of notifications received are different than expected. 
>   This may happen when Sentry has been down or HDFS sync was disabled for a while (more than 24h), 
>   and the HMS cleared old notifications (older than 24h) not processed by Sentry causing a gap when retrieving notifications.
>   
> - Latest Sentry notification ID processed is bigger than current HMS notification ID.
>   This may happen when the HMS DB data was reset or restore from an old snapshot causing sync issues with Sentry.
>   
> When needHiveSnapshot is set to TRUE, then the HMSFollower will CLEAR any hive snapshot stored on the Sentry store, and recreate a
> new hive snapshot from scratch to keep Sentry in sync.
> 
> 
> Diffs
> -----
> 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java ef6786537e9c5f7730bc86d44e8b4a168c20677e 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/HMSFollower.java 5e6b906587f6422d9bf1466ab83815722bd51fb0 
> 
> 
> Diff: https://reviews.apache.org/r/59274/diff/1/
> 
> 
> Testing
> -------
> 
> HadoopQA is GREEN.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>