Posted to issues@sentry.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2019/01/25 22:36:00 UTC

[jira] [Commented] (SENTRY-2491) Sentry High availability unit tests run into deadlock sometimes

    [ https://issues.apache.org/jira/browse/SENTRY-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16752760#comment-16752760 ] 

Hadoop QA commented on SENTRY-2491:
-----------------------------------

Here are the results of testing the latest attachment
https://issues.apache.org/jira/secure/attachment/12956400/SENTRY-2491.002.patch against master.

{color:red}Overall:{color} -1 due to 2 errors

{color:red}ERROR:{color} mvn test exited 1
{color:red}ERROR:{color} Failed: org.apache.sentry.hdfs.TestSentryHDFSServiceProcessor

Console output: https://builds.apache.org/job/PreCommit-SENTRY-Build/4337/console

This message is automatically generated.

> Sentry High availability unit tests run into deadlock sometimes
> ---------------------------------------------------------------
>
>                 Key: SENTRY-2491
>                 URL: https://issues.apache.org/jira/browse/SENTRY-2491
>             Project: Sentry
>          Issue Type: Bug
>          Components: Sentry
>    Affects Versions: 2.2.0
>            Reporter: Na Li
>            Assignee: Na Li
>            Priority: Major
>         Attachments: SENTRY-2491.001.patch, SENTRY-2491.002.patch
>
>
> In Sentry unit tests, we don't create the schema before running a test. Instead, we use DataNucleus to create the Sentry tables when they are accessed. This creates a potential deadlock when running tests for a Sentry HA setup.
> For example, the following event sequence causes a deadlock:
> 1) thread_1 gets a shared lock on SYSTABLES in order to read table SENTRY_HMS_NOTIFICATION_ID
> 2) thread_2 gets a shared lock on SYSTABLES in order to read table SENTRY_HMS_NOTIFICATION_ID
> 3) thread_1 tries to get an exclusive lock to create table SENTRY_HMS_NOTIFICATION_ID,
>    and waits for it because thread_2 already holds a shared lock.
> 4) thread_2 tries to get an exclusive lock to create table SENTRY_HMS_NOTIFICATION_ID,
>    and waits for it because thread_1 already holds a shared lock.
> The solution is to start the Sentry service instances with a delay between them. Specifically,
> space the HMS follower threads as far apart as possible, i.e., by half of the interval.
>
> This deadlock only occurs in unit tests. It does not occur in production because the schema is created before the Sentry services start, so no table is created after a service starts.
> Example of the log message when such a deadlock happens:
> {code}
> 2019-01-04 18:32:46,332 (pool-13-thread-1) [INFO - org.apache.sentry.hdfs.DBUpdateForwarder.getAllUpdatesFrom(DBUpdateForwarder.java:140)] (org.apache.sentry.hdfs.PermImageRetriever) Requested sequence number 0 is less than 0 or requested deltas for that sequence number are not available. Fetch a full update
> 2019-01-04 18:32:49,346 (hms-follower) [DEBUG - org.apache.sentry.service.thrift.SentryStateBank.enableState(SentryStateBank.java:102)] HMSFollower entered state STARTED
> 2019-01-04 18:32:50,091 (hms-follower) [DEBUG - org.apache.sentry.service.thrift.SentryStateBank.enableState(SentryStateBank.java:102)] HMSFollower entered state STARTED
> 2019-01-04 18:33:00,286 (store-cleaner) [ERROR - org.datanucleus.util.Log4JLogger.error(Log4JLogger.java:115)] Error thrown executing CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
> (
>     NOTIFICATION_ID BIGINT NOT NULL
> ) : A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
> Lock : ROW, SYSTABLES, (1,3)
>   Waiting XID : {436, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
> (
>     NOTIFICATION_ID BIGINT NOT NULL
> )
>   Granted XID : {419, S} 
> Lock : ROW, SYSTABLES, (1,3)
>   Waiting XID : {419, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
> (
>     NOTIFICATION_ID BIGINT NOT NULL
> )
>   Granted XID : {419, S} , {436, S} 
> . The selected victim is XID : 436.
> java.sql.SQLTransactionRollbackException: A lock could not be obtained due to a deadlock, cycle of locks and waiters is:
> Lock : ROW, SYSTABLES, (1,3)
>   Waiting XID : {436, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
> (
>     NOTIFICATION_ID BIGINT NOT NULL
> )
>   Granted XID : {419, S} 
> Lock : ROW, SYSTABLES, (1,3)
>   Waiting XID : {419, X} , SENTRY, CREATE TABLE SENTRY_HMS_NOTIFICATION_ID
> (
>     NOTIFICATION_ID BIGINT NOT NULL
> )
>   Granted XID : {419, S} , {436, S} 
> . The selected victim is XID : 436.
> 	at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
> 	at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
> 	at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
> 	at com.jolbox.bonecp.StatementHandle.execute(StatementHandle.java:254)
> 	at org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:879)
> 	at org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatementList(AbstractTable.java:830)
> 	at org.datanucleus.store.rdbms.table.AbstractTable.create(AbstractTable.java:546)
> 	at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:609)
> 	at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3385)
> 	at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2896)
> 	at org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119)
> 	at 
> ...[truncated 36339941 chars]...
> eImpl(RetryClientInvocationHandler.java:94)] Calling getAllUpdatesFrom
> {code}
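The staggered start described in the issue can be sketched as follows. This is only an illustration of the idea, not the code in the attached patch: the class name `StaggeredStart`, the method `initialDelaysMillis`, and the 500 ms interval are all hypothetical. With two instances, the second follower starts half an interval after the first, which matches the "half of the interval" spacing mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not taken from the Sentry code base.
public class StaggeredStart {

    /**
     * Returns one initial delay per instance so that the periodic tasks are
     * spread evenly across a single polling interval.
     */
    static List<Long> initialDelaysMillis(int instances, long intervalMillis) {
        if (instances <= 0 || intervalMillis < 0) {
            throw new IllegalArgumentException("instances must be > 0 and interval >= 0");
        }
        List<Long> delays = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            // Instance i starts i/instances of the way through the interval.
            delays.add(i * intervalMillis / instances);
        }
        return delays;
    }

    public static void main(String[] args) throws InterruptedException {
        long intervalMillis = 500; // assumed polling interval for the demo
        List<Long> delays = initialDelaysMillis(2, intervalMillis);

        ScheduledExecutorService pool = Executors.newScheduledThreadPool(delays.size());
        for (int i = 0; i < delays.size(); i++) {
            final int id = i;
            // Each follower polls at the same rate but with its own offset,
            // so the two are less likely to touch the schema at the same moment.
            pool.scheduleAtFixedRate(
                () -> System.out.println("follower-" + id + " polls"),
                delays.get(i), intervalMillis, TimeUnit.MILLISECONDS);
        }

        Thread.sleep(3 * intervalMillis); // let a few polls run
        pool.shutdownNow();
    }
}
```

Spacing the start times reduces the chance that both followers try to create the same table at once, but it does not remove the race entirely. Creating the schema before the test starts, as production does, would avoid on-demand table creation altogether.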



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)