Posted to dev@atlas.apache.org by "Sarath Subramanian (JIRA)" <ji...@apache.org> on 2017/04/05 07:12:41 UTC

[jira] [Updated] (ATLAS-1720) Add titan storage.lock.wait-time for Berkley DB to fix intermittent IT failures

     [ https://issues.apache.org/jira/browse/ATLAS-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sarath Subramanian updated ATLAS-1720:
--------------------------------------
    Summary: Add titan storage.lock.wait-time for Berkley DB to fix intermittent IT failures   (was: Increase titan storage.lock.wait-time for Berkley DB to fix intermittent IT failures )

> Add titan storage.lock.wait-time for Berkley DB to fix intermittent IT failures 
> --------------------------------------------------------------------------------
>
>                 Key: ATLAS-1720
>                 URL: https://issues.apache.org/jira/browse/ATLAS-1720
>             Project: Atlas
>          Issue Type: Bug
>          Components:  atlas-core
>    Affects Versions: trunk, 0.9-incubating
>            Reporter: Sarath Subramanian
>            Assignee: Sarath Subramanian
>
> Some of the ITs in Atlas fail intermittently with the exception "Could not execute operation due to backend exception".
> Investigation showed that this is caused by a Berkeley DB LockTimeoutException (https://github.com/thinkaurelius/titan/issues/1113).
> The default lock timeout for Berkeley DB is 500 ms. If a thread (some IT) is waiting on a Titan storage resource that is locked by another thread, and that thread doesn't release the lock within 500 ms, the waiting thread fails with the above exception (see the error log below).
> The fix is to set storage.lock.wait-time for Berkeley DB to 10000 ms, which is consistent with the lock wait timeout specified for HBase.
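> A minimal sketch of how this setting might look in atlas-application.properties. It assumes that Atlas passes Titan options through with an "atlas.graph." prefix and that the Berkeley DB backend is selected with "berkeleyje"; both key names are assumptions for illustration and are not taken from the patch itself.
> {code}
> # Assumed: Berkeley DB JE selected as the Titan storage backend
> atlas.graph.storage.backend=berkeleyje
> # Assumed key: lock wait time in milliseconds, set to the 10000 ms proposed above
> atlas.graph.storage.lock.wait-time=10000
> {code}
> With a value like this, a thread would wait up to 10 seconds for a contended lock before failing, instead of the timeoutMillis=500 reported in the error log below.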
> {code}
> Caused by: com.sleepycat.je.LockTimeoutException: (JE 5.0.73) Lock expired. Locker 1516581475 7535_NotificationHookConsumer thread-0_Txn: waited for lock on database=edgestore LockAddr:284896285 LSN=0x0/0x21d55f type=WRITE grant=WAIT_PROMOTION timeoutMillis=500 startTime=1491261268442 endTime=1491261268942
> Owners: [<LockInfo locker="1445928922 7537_qtp184901207-1038 - e015a355-d6c5-4424-b7a7-833a289aea9d_Txn" type="READ"/>, <LockInfo locker="1516581475 7535_NotificationHookConsumer thread-0_Txn" type="READ"/>]
> Waiters: []
> Transaction 1445928922 7537_qtp184901207-1038 - e015a355-d6c5-4424-b7a7-833a289aea9d_Txn waits for  LockAddr:471572402 Owners:<LockInfo locker="1516581475 7535_NotificationHookConsumer thread-0_Txn" type="WRITE"/> Waiters:[<LockInfo locker="1445928922 7537_qtp184901207-1038 - e015a355-d6c5-4424-b7a7-833a289aea9d_Txn" type="READ"/>]
> Transaction 1516581475 7535_NotificationHookConsumer thread-0_Txn owns LockAddr:471572402 <LockInfo locker="1516581475 7535_NotificationHookConsumer thread-0_Txn" type="WRITE"/>
> Transaction 1516581475 7535_NotificationHookConsumer thread-0_Txn waits for LockAddr:284896285
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)