You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sentry.apache.org by Kalyan Kumar Kalvagadda <kk...@cloudera.com> on 2017/04/28 04:14:22 UTC

SQL changes for SENTRY-1726

Hello all,

As part of SENTRY-1726, I'm adding new table to store notification-id. HMS
follower just needs the the latest notification-id that sentry has
processed.
All HMS follower needs is the latest notification. As we need not store
older notification-id's, I defined the table to hold only one record so
that we can avoid inserting more records in the table and the application
should just update the existing record.


CREATE TABLE `SENTRY_LAST_NOTIFICATION_ID`
(
    `NOTIFICATION_ID` BIGINT NOT NULL,
    `RESTRICTION` VARCHAR(15) NOT NULL DEFAULT 'singleton',
    CONSTRAINT `SENTRY_NOTIFICATION_PK` PRIMARY KEY (`RESTRICTION`)
)ENGINE=INNODB;


Application just needs to insert/update the NOTIFICATION_ID. Once we
insert for the fist time, it is update only.

On the other hand, If we have NOTIFICATION_ID as primary key and keep
on inserting NOTIFICATION_ID's into

 the table. we have to use MAX(NOTIFICATION_ID) every time we need the
notification id. Additionally, we need to build logic to cleanup older
entries.


You can refer to review board for complete change set.

https://reviews.apache.org/r/58808/


-Kalyan

Re: SQL changes for SENTRY-1726

Posted by Alexander Kolbasov <ak...@cloudera.com>.
This makes sense. Since normally HMSFollower is only running on one server, there shouldn’t be any contention, so read/modify/write should be fine in this case.

> On Apr 28, 2017, at 8:30 AM, Kalyan Kumar Kalvagadda <kk...@cloudera.com> wrote:
> 
> Help me understand why should we be concerned about locking.
> 
> As far I know there are two threads that will be interested in last
> notification id
> 1. HMS Follower to construct  request to pull the latest notification
> 2. Sentry Metrics
> 
> HMSFollower should get this information from database every time it needs
> and have a in-memory copy. Other components like Metrics could depend on
> the in-memory copy stored by HMSFollower.
> Even if, we want to avoid using in-memory copy and use the database always,
> I don't know if locking would be an issue as there will not many requests
> to read the notification-id other then HMSFollower thread.
> 
> 
> -Kalyan
> 
> On Fri, Apr 28, 2017 at 2:00 AM, Alexander Kolbasov <ak...@cloudera.com>
> wrote:
> 
>> Hello Kalyan,
>> 
>>> On Apr 27, 2017, at 9:14 PM, Kalyan Kumar Kalvagadda <
>> kkalyan@cloudera.com> wrote:
>>> 
>>> Hello all,
>>> 
>>> As part of SENTRY-1726, I'm adding new table to store notification-id.
>> HMS
>>> follower just needs the the latest notification-id that sentry has
>>> processed.
>>> All HMS follower needs is the latest notification. As we need not store
>>> older notification-id's, I defined the table to hold only one record so
>>> that we can avoid inserting more records in the table and the application
>>> should just update the existing record.
>> 
>> I think it is better to store new notification ID as a new row .This will
>> avoid holding a row lock for read-edify-write operation. The dwnside that
>> we need to use MAX(), but given that tis is the primary key, MAX() should
>> be an index scan rather then a table scan. And yes, we do need to prune old
>> rows, we should be able to do that in the background to keep the table size
>> small.
>> 
>>> 
>>> 
>>> CREATE TABLE `SENTRY_LAST_NOTIFICATION_ID`
>>> (
>>>   `NOTIFICATION_ID` BIGINT NOT NULL,
>>>   `RESTRICTION` VARCHAR(15) NOT NULL DEFAULT 'singleton',
>>>   CONSTRAINT `SENTRY_NOTIFICATION_PK` PRIMARY KEY (`RESTRICTION`)
>>> )ENGINE=INNODB;
>>> 
>>> 
>>> Application just needs to insert/update the NOTIFICATION_ID. Once we
>>> insert for the fist time, it is update only.
>>> 
>>> On the other hand, If we have NOTIFICATION_ID as primary key and keep
>>> on inserting NOTIFICATION_ID's into
>>> 
>>> the table. we have to use MAX(NOTIFICATION_ID) every time we need the
>>> notification id. Additionally, we need to build logic to cleanup older
>>> entries.
>>> 
>>> 
>>> You can refer to review board for complete change set.
>>> 
>>> https://reviews.apache.org/r/58808/
>>> 
>>> 
>>> -Kalyan
>> 
>> 


Re: SQL changes for SENTRY-1726

Posted by Kalyan Kumar Kalvagadda <kk...@cloudera.com>.
Help me understand why should we be concerned about locking.

As far I know there are two threads that will be interested in last
notification id
1. HMS Follower to construct  request to pull the latest notification
2. Sentry Metrics

HMSFollower should get this information from database every time it needs
and have a in-memory copy. Other components like Metrics could depend on
the in-memory copy stored by HMSFollower.
Even if, we want to avoid using in-memory copy and use the database always,
I don't know if locking would be an issue as there will not many requests
to read the notification-id other then HMSFollower thread.


-Kalyan

On Fri, Apr 28, 2017 at 2:00 AM, Alexander Kolbasov <ak...@cloudera.com>
wrote:

> Hello Kalyan,
>
> > On Apr 27, 2017, at 9:14 PM, Kalyan Kumar Kalvagadda <
> kkalyan@cloudera.com> wrote:
> >
> > Hello all,
> >
> > As part of SENTRY-1726, I'm adding new table to store notification-id.
> HMS
> > follower just needs the the latest notification-id that sentry has
> > processed.
> > All HMS follower needs is the latest notification. As we need not store
> > older notification-id's, I defined the table to hold only one record so
> > that we can avoid inserting more records in the table and the application
> > should just update the existing record.
>
> I think it is better to store new notification ID as a new row .This will
> avoid holding a row lock for read-edify-write operation. The dwnside that
> we need to use MAX(), but given that tis is the primary key, MAX() should
> be an index scan rather then a table scan. And yes, we do need to prune old
> rows, we should be able to do that in the background to keep the table size
> small.
>
> >
> >
> > CREATE TABLE `SENTRY_LAST_NOTIFICATION_ID`
> > (
> >    `NOTIFICATION_ID` BIGINT NOT NULL,
> >    `RESTRICTION` VARCHAR(15) NOT NULL DEFAULT 'singleton',
> >    CONSTRAINT `SENTRY_NOTIFICATION_PK` PRIMARY KEY (`RESTRICTION`)
> > )ENGINE=INNODB;
> >
> >
> > Application just needs to insert/update the NOTIFICATION_ID. Once we
> > insert for the fist time, it is update only.
> >
> > On the other hand, If we have NOTIFICATION_ID as primary key and keep
> > on inserting NOTIFICATION_ID's into
> >
> > the table. we have to use MAX(NOTIFICATION_ID) every time we need the
> > notification id. Additionally, we need to build logic to cleanup older
> > entries.
> >
> >
> > You can refer to review board for complete change set.
> >
> > https://reviews.apache.org/r/58808/
> >
> >
> > -Kalyan
>
>

Re: SQL changes for SENTRY-1726

Posted by Alexander Kolbasov <ak...@cloudera.com>.
Hello Kalyan,

> On Apr 27, 2017, at 9:14 PM, Kalyan Kumar Kalvagadda <kk...@cloudera.com> wrote:
> 
> Hello all,
> 
> As part of SENTRY-1726, I'm adding new table to store notification-id. HMS
> follower just needs the the latest notification-id that sentry has
> processed.
> All HMS follower needs is the latest notification. As we need not store
> older notification-id's, I defined the table to hold only one record so
> that we can avoid inserting more records in the table and the application
> should just update the existing record.

I think it is better to store new notification ID as a new row .This will avoid holding a row lock for read-edify-write operation. The dwnside that we need to use MAX(), but given that tis is the primary key, MAX() should be an index scan rather then a table scan. And yes, we do need to prune old rows, we should be able to do that in the background to keep the table size small.

> 
> 
> CREATE TABLE `SENTRY_LAST_NOTIFICATION_ID`
> (
>    `NOTIFICATION_ID` BIGINT NOT NULL,
>    `RESTRICTION` VARCHAR(15) NOT NULL DEFAULT 'singleton',
>    CONSTRAINT `SENTRY_NOTIFICATION_PK` PRIMARY KEY (`RESTRICTION`)
> )ENGINE=INNODB;
> 
> 
> Application just needs to insert/update the NOTIFICATION_ID. Once we
> insert for the fist time, it is update only.
> 
> On the other hand, If we have NOTIFICATION_ID as primary key and keep
> on inserting NOTIFICATION_ID's into
> 
> the table. we have to use MAX(NOTIFICATION_ID) every time we need the
> notification id. Additionally, we need to build logic to cleanup older
> entries.
> 
> 
> You can refer to review board for complete change set.
> 
> https://reviews.apache.org/r/58808/
> 
> 
> -Kalyan


Re: SQL changes for SENTRY-1726

Posted by Alexander Kolbasov <ak...@cloudera.com>.
> On May 8, 2017, at 5:11 PM, Kalyan Kumar Kalvagadda <kk...@cloudera.com> wrote:
> 
> Even if HMSFollower is running in both the nodes, we can have simple logic
> in HMSFollower to avoid processing duplicate notifications.
> Before processing the notification, if a check is added to see if the new
> notification id is greater than the one stored in database. If not, we
> ignore the notification.

Imagine two threads. Both start with the same notification ID. Both see that the new one is greater then the on win the database. And both successfully write the new value. What would prevent this scenario from happening?

Another scenario (much more dangerous): 

Thread one reads current notification ID and it is 10
    after that thread 2 processes ten notifications and now current ID is 20
Thread 1 processes single notification and writes value 11. Note that it will not see updates from another threads because we are using repeatable-read transaction isolation level.

- Alex

> 
> -Kalyan
> 
> On Mon, May 8, 2017 at 7:03 PM, Alexander Kolbasov <ak...@cloudera.com>
> wrote:
> 
>> The major problem with read-modify-write approach is that it is difficult
>> to handle the case with two writers trying to update the value at the same
>> time. If you handle this by adding new rows and having the ID as the
>> primary key, one writer will succeed and another will fail because the key
>> already exists. How would you achieve this in your approach?
>> 
>> - Alex
>> 
>>> On Apr 27, 2017, at 9:14 PM, Kalyan Kumar Kalvagadda <
>> kkalyan@cloudera.com> wrote:
>>> 
>>> Hello all,
>>> 
>>> As part of SENTRY-1726, I'm adding new table to store notification-id.
>> HMS
>>> follower just needs the the latest notification-id that sentry has
>>> processed.
>>> All HMS follower needs is the latest notification. As we need not store
>>> older notification-id's, I defined the table to hold only one record so
>>> that we can avoid inserting more records in the table and the application
>>> should just update the existing record.
>>> 
>>> 
>>> CREATE TABLE `SENTRY_LAST_NOTIFICATION_ID`
>>> (
>>>   `NOTIFICATION_ID` BIGINT NOT NULL,
>>>   `RESTRICTION` VARCHAR(15) NOT NULL DEFAULT 'singleton',
>>>   CONSTRAINT `SENTRY_NOTIFICATION_PK` PRIMARY KEY (`RESTRICTION`)
>>> )ENGINE=INNODB;
>>> 
>>> 
>>> Application just needs to insert/update the NOTIFICATION_ID. Once we
>>> insert for the fist time, it is update only.
>>> 
>>> On the other hand, If we have NOTIFICATION_ID as primary key and keep
>>> on inserting NOTIFICATION_ID's into
>>> 
>>> the table. we have to use MAX(NOTIFICATION_ID) every time we need the
>>> notification id. Additionally, we need to build logic to cleanup older
>>> entries.
>>> 
>>> 
>>> You can refer to review board for complete change set.
>>> 
>>> https://reviews.apache.org/r/58808/
>>> 
>>> 
>>> -Kalyan
>> 
>> 


Re: SQL changes for SENTRY-1726

Posted by Kalyan Cloudera <kk...@cloudera.com>.
I see there is an issue. Let me think and get back.

-kalyan

> On May 8, 2017, at 7:11 PM, Kalyan Kumar Kalvagadda <kk...@cloudera.com> wrote:
> 
> 
> Even if HMSFollower is running in both the nodes, we can have simple logic in HMSFollower to avoid processing duplicate notifications.
> Before processing the notification, if a check is added to see if the new notification id is greater than the one stored in database. If not, we ignore the notification.
> 
> -Kalyan
> 
>> On Mon, May 8, 2017 at 7:03 PM, Alexander Kolbasov <ak...@cloudera.com> wrote:
>> The major problem with read-modify-write approach is that it is difficult to handle the case with two writers trying to update the value at the same time. If you handle this by adding new rows and having the ID as the primary key, one writer will succeed and another will fail because the key already exists. How would you achieve this in your approach?
>> 
>> - Alex
>> 
>> > On Apr 27, 2017, at 9:14 PM, Kalyan Kumar Kalvagadda <kk...@cloudera.com> wrote:
>> >
>> > Hello all,
>> >
>> > As part of SENTRY-1726, I'm adding new table to store notification-id. HMS
>> > follower just needs the the latest notification-id that sentry has
>> > processed.
>> > All HMS follower needs is the latest notification. As we need not store
>> > older notification-id's, I defined the table to hold only one record so
>> > that we can avoid inserting more records in the table and the application
>> > should just update the existing record.
>> >
>> >
>> > CREATE TABLE `SENTRY_LAST_NOTIFICATION_ID`
>> > (
>> >    `NOTIFICATION_ID` BIGINT NOT NULL,
>> >    `RESTRICTION` VARCHAR(15) NOT NULL DEFAULT 'singleton',
>> >    CONSTRAINT `SENTRY_NOTIFICATION_PK` PRIMARY KEY (`RESTRICTION`)
>> > )ENGINE=INNODB;
>> >
>> >
>> > Application just needs to insert/update the NOTIFICATION_ID. Once we
>> > insert for the fist time, it is update only.
>> >
>> > On the other hand, If we have NOTIFICATION_ID as primary key and keep
>> > on inserting NOTIFICATION_ID's into
>> >
>> > the table. we have to use MAX(NOTIFICATION_ID) every time we need the
>> > notification id. Additionally, we need to build logic to cleanup older
>> > entries.
>> >
>> >
>> > You can refer to review board for complete change set.
>> >
>> > https://reviews.apache.org/r/58808/
>> >
>> >
>> > -Kalyan
>> 
> 

Re: SQL changes for SENTRY-1726

Posted by Kalyan Kumar Kalvagadda <kk...@cloudera.com>.
Even if HMSFollower is running in both the nodes, we can have simple logic
in HMSFollower to avoid processing duplicate notifications.
Before processing the notification, if a check is added to see if the new
notification id is greater than the one stored in database. If not, we
ignore the notification.

-Kalyan

On Mon, May 8, 2017 at 7:03 PM, Alexander Kolbasov <ak...@cloudera.com>
wrote:

> The major problem with read-modify-write approach is that it is difficult
> to handle the case with two writers trying to update the value at the same
> time. If you handle this by adding new rows and having the ID as the
> primary key, one writer will succeed and another will fail because the key
> already exists. How would you achieve this in your approach?
>
> - Alex
>
> > On Apr 27, 2017, at 9:14 PM, Kalyan Kumar Kalvagadda <
> kkalyan@cloudera.com> wrote:
> >
> > Hello all,
> >
> > As part of SENTRY-1726, I'm adding new table to store notification-id.
> HMS
> > follower just needs the the latest notification-id that sentry has
> > processed.
> > All HMS follower needs is the latest notification. As we need not store
> > older notification-id's, I defined the table to hold only one record so
> > that we can avoid inserting more records in the table and the application
> > should just update the existing record.
> >
> >
> > CREATE TABLE `SENTRY_LAST_NOTIFICATION_ID`
> > (
> >    `NOTIFICATION_ID` BIGINT NOT NULL,
> >    `RESTRICTION` VARCHAR(15) NOT NULL DEFAULT 'singleton',
> >    CONSTRAINT `SENTRY_NOTIFICATION_PK` PRIMARY KEY (`RESTRICTION`)
> > )ENGINE=INNODB;
> >
> >
> > Application just needs to insert/update the NOTIFICATION_ID. Once we
> > insert for the fist time, it is update only.
> >
> > On the other hand, If we have NOTIFICATION_ID as primary key and keep
> > on inserting NOTIFICATION_ID's into
> >
> > the table. we have to use MAX(NOTIFICATION_ID) every time we need the
> > notification id. Additionally, we need to build logic to cleanup older
> > entries.
> >
> >
> > You can refer to review board for complete change set.
> >
> > https://reviews.apache.org/r/58808/
> >
> >
> > -Kalyan
>
>

Re: SQL changes for SENTRY-1726

Posted by Alexander Kolbasov <ak...@cloudera.com>.
The major problem with read-modify-write approach is that it is difficult to handle the case with two writers trying to update the value at the same time. If you handle this by adding new rows and having the ID as the primary key, one writer will succeed and another will fail because the key already exists. How would you achieve this in your approach?

- Alex

> On Apr 27, 2017, at 9:14 PM, Kalyan Kumar Kalvagadda <kk...@cloudera.com> wrote:
> 
> Hello all,
> 
> As part of SENTRY-1726, I'm adding new table to store notification-id. HMS
> follower just needs the the latest notification-id that sentry has
> processed.
> All HMS follower needs is the latest notification. As we need not store
> older notification-id's, I defined the table to hold only one record so
> that we can avoid inserting more records in the table and the application
> should just update the existing record.
> 
> 
> CREATE TABLE `SENTRY_LAST_NOTIFICATION_ID`
> (
>    `NOTIFICATION_ID` BIGINT NOT NULL,
>    `RESTRICTION` VARCHAR(15) NOT NULL DEFAULT 'singleton',
>    CONSTRAINT `SENTRY_NOTIFICATION_PK` PRIMARY KEY (`RESTRICTION`)
> )ENGINE=INNODB;
> 
> 
> Application just needs to insert/update the NOTIFICATION_ID. Once we
> insert for the fist time, it is update only.
> 
> On the other hand, If we have NOTIFICATION_ID as primary key and keep
> on inserting NOTIFICATION_ID's into
> 
> the table. we have to use MAX(NOTIFICATION_ID) every time we need the
> notification id. Additionally, we need to build logic to cleanup older
> entries.
> 
> 
> You can refer to review board for complete change set.
> 
> https://reviews.apache.org/r/58808/
> 
> 
> -Kalyan