You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@sentry.apache.org by Lars Francke <la...@gmail.com> on 2018/11/19 11:03:20 UTC

Question about the SENTRY_HMS_NOTIFICATION_ID table

Hi,

we are/were facing an issue where we were running into an OOM error in
Sentry. This happened right after startup and it is because of
SENTRY-2461[1].

We're trying to understand how this is supposed to work.

We see HmsFollower waking up every 500ms.
It looks at the current notification Id we have stored
in SENTRY_HMS_NOTIFICATION_ID (e.g. 100).
It then subtracts 1 (e.g. 99) and retrieves any notifications newer than
that (e.g. "100").
HmsFollower then calls NotificationProcessor#processNotificationEvent to
process the event.
If that method returns false we explicitly persist the notification Id

There's a few things that we find weird about this:
* We see hundreds of duplicates being processed because we never check if
we already have processed the same notification id. Now I do see comments
in the code that Hive reuses those ids so this might be intentional?

* But we also store these duplicate notification ids over and over. In the
normal scenario where the id hasn't changed in the last 500ms we just store
it again and because the table does not have a constraint on duplicates it
stores tens of thousands of duplicates for us at least until the purging
process kicks in 12h later

I guess our question is: Is this normal behavior? Should we see thousands
of duplicates and should Sentry reprocess the last notification over and
over?

Thank you very much!

Cheers,
Lars

[1] <https://issues.apache.org/jira/browse/SENTRY-2461>

Re: Question about the SENTRY_HMS_NOTIFICATION_ID table

Posted by Lars Francke <la...@gmail.com>.

Kalyan,

thank you for the detailed explanation. I missed the cache. And yeah, we're
running into SENTRY-2422. We'll monitor and wait.

Thanks again!

Cheers,
Lars

On Mon, Nov 19, 2018 at 4:18 PM Kalyan Kumar Kalvagadda
<kk...@cloudera.com.invalid> wrote:

> Lars,
>
>
>    1. SENTRY-2461 <https://issues.apache.org/jira/browse/SENTRY-2461>
> should
>    definitely be fixed. Current approach is not efficient.
>    2.
>    3.   Sentry currently checks if the notification is already processed.
>    It is in HiveNotificationFetcher.java
>
>
> >    1.
> >
> >    try {
> >      if (cache.contains(hash) ||
> sentryStore.isNotificationProcessed(hash)) {
> >        cache.add(hash);
> >
> >        LOGGER.debug("Ignoring HMS notification already processed: ID =
> {}", id);
> >        return false;
> >      }
> >
> >
> > But there is issue(SENTRY-2422
> <https://issues.apache.org/jira/browse/SENTRY-2422>) in this check.
> "isNotificationProcessed"
> checks if the notification is processed by looking at SENTRY_PATH_CHANGE
> table.
> This is where the issue exists as sentry doesn't record all the
> notifications in SENTRY_PATH_CHANGE table. Let's say if there is alter
> table where the name and location is not change, sentry doesn't record it.
>
> Simple solution for this is to use SENTRY_HMS_NOTIFICATION_ID table instead
> of SENTRY_PATH_CHANGE to verify if the notification is processed.
> This will introduce new issues with the Hive version 2.3.3. In this version
> of Hive, event id generated in NOTIFICATION_LOG table could be out-of-order
> and even have duplicate event-id's. That is why we are working on
> bumping up the hive version which the fix before making changes in Sentry.
>
>    1.
>       1.
>       2. SENTRY-2422 <https://issues.apache.org/jira/browse/SENTRY-2422>
> will
>    be fixed after bumping up the hive version. For now, as a work around
> you
>    can reduce the purge time by providing the configuration "
>
> sentry.store.clean.period.seconds" to avoid OOM memory issue.
>
>
>
> *Thanks,Kalyan Kumar Kalvagadda* | Software Engineer
> t. (469) 279- <0000000000>5732
> cloudera.com <https://www.cloudera.com>
>
> [image: Cloudera] <https://www.cloudera.com/>
>
> [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
> on LinkedIn] <https://www.linkedin.com/company/cloudera>
> ------------------------------
>
>
> On Mon, Nov 19, 2018 at 5:04 AM Lars Francke <la...@gmail.com>
> wrote:
>
> > Hi,
> >
> > we are/were facing an issue where we were running into an OOM error in
> > Sentry. This happened right after startup and it is because of
> > SENTRY-2461[1].
> >
> > We're trying to understand how this is supposed to work.
> >
> > We see HmsFollower waking up every 500ms.
> > It looks at the current notification Id we have stored
> > in SENTRY_HMS_NOTIFICATION_ID (e.g. 100).
> > It then subtracts 1 (e.g. 99) and retrieves any notifications newer than
> > that (e.g. "100").
> > HmsFollower then calls NotificationProcessor#processNotificationEvent to
> > process the event.
> > If that method returns false we explicitly persist the notification Id
> >
> > There's a few things that we find weird about this:
> > * We see hundreds of duplicates being processed because we never check if
> > we already have processed the same notification id. Now I do see comments
> > in the code that Hive reuses those ids so this might be intentional?
> >
> > * But we also store these duplicate notification ids over and over. In
> the
> > normal scenario where the id hasn't changed in the last 500ms we just
> store
> > it again and because the table does not have a constraint on duplicates
> it
> > stores tens of thousands of duplicates for us at least until the purging
> > process kicks in 12h later
> >
> > I guess our question is: Is this normal behavior? Should we see thousands
> > of duplicates and should Sentry reprocess the last notification over and
> > over?
> >
> > Thank you very much!
> >
> > Cheers,
> > Lars
> >
> > [1] <https://issues.apache.org/jira/browse/SENTRY-2461>
> >
>

Re: Question about the SENTRY_HMS_NOTIFICATION_ID table

Posted by Kalyan Kumar Kalvagadda <kk...@cloudera.com.INVALID>.

Lars,


   1. SENTRY-2461 <https://issues.apache.org/jira/browse/SENTRY-2461> should
   definitely be fixed. Current approach is not efficient.
   2.
   3.   Sentry currently checks if the notification is already processed.
   It is in HiveNotificationFetcher.java


>    1.
>
>    try {
>      if (cache.contains(hash) || sentryStore.isNotificationProcessed(hash)) {
>        cache.add(hash);
>
>        LOGGER.debug("Ignoring HMS notification already processed: ID = {}", id);
>        return false;
>      }
>
>
> But there is issue(SENTRY-2422
<https://issues.apache.org/jira/browse/SENTRY-2422>) in this check.
"isNotificationProcessed"
checks if the notification is processed by looking at SENTRY_PATH_CHANGE
table.
This is where the issue exists as sentry doesn't record all the
notifications in SENTRY_PATH_CHANGE table. Let's say if there is alter
table where the name and location is not change, sentry doesn't record it.

Simple solution for this is to use SENTRY_HMS_NOTIFICATION_ID table instead
of SENTRY_PATH_CHANGE to verify if the notification is processed.
This will introduce new issues with the Hive version 2.3.3. In this version
of Hive, event id generated in NOTIFICATION_LOG table could be out-of-order
and even have duplicate event-id's. That is why we are working on
bumping up the hive version which the fix before making changes in Sentry.

   1.
      1.
      2. SENTRY-2422 <https://issues.apache.org/jira/browse/SENTRY-2422> will
   be fixed after bumping up the hive version. For now, as a work around you
   can reduce the purge time by providing the configuration "

sentry.store.clean.period.seconds" to avoid OOM memory issue.



*Thanks,Kalyan Kumar Kalvagadda* | Software Engineer
t. (469) 279- <0000000000>5732
cloudera.com <https://www.cloudera.com>

[image: Cloudera] <https://www.cloudera.com/>

[image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
on LinkedIn] <https://www.linkedin.com/company/cloudera>
------------------------------


On Mon, Nov 19, 2018 at 5:04 AM Lars Francke <la...@gmail.com> wrote:

> Hi,
>
> we are/were facing an issue where we were running into an OOM error in
> Sentry. This happened right after startup and it is because of
> SENTRY-2461[1].
>
> We're trying to understand how this is supposed to work.
>
> We see HmsFollower waking up every 500ms.
> It looks at the current notification Id we have stored
> in SENTRY_HMS_NOTIFICATION_ID (e.g. 100).
> It then subtracts 1 (e.g. 99) and retrieves any notifications newer than
> that (e.g. "100").
> HmsFollower then calls NotificationProcessor#processNotificationEvent to
> process the event.
> If that method returns false we explicitly persist the notification Id
>
> There's a few things that we find weird about this:
> * We see hundreds of duplicates being processed because we never check if
> we already have processed the same notification id. Now I do see comments
> in the code that Hive reuses those ids so this might be intentional?
>
> * But we also store these duplicate notification ids over and over. In the
> normal scenario where the id hasn't changed in the last 500ms we just store
> it again and because the table does not have a constraint on duplicates it
> stores tens of thousands of duplicates for us at least until the purging
> process kicks in 12h later
>
> I guess our question is: Is this normal behavior? Should we see thousands
> of duplicates and should Sentry reprocess the last notification over and
> over?
>
> Thank you very much!
>
> Cheers,
> Lars
>
> [1] <https://issues.apache.org/jira/browse/SENTRY-2461>
>