Posted to dev@atlas.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/10/11 04:51:00 UTC

[jira] [Commented] (ATLAS-4408) Dynamic handling of failure in updating index

    [ https://issues.apache.org/jira/browse/ATLAS-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17426950#comment-17426950 ] 

ASF subversion and git services commented on ATLAS-4408:
--------------------------------------------------------

Commit 261331bbd1fa8e50517421ab834f01f89f4997e1 in atlas's branch refs/heads/master from Radhika Kundam
[ https://gitbox.apache.org/repos/asf?p=atlas.git;h=261331b ]

ATLAS-4408: Dynamic handling of failure in updating index


> Dynamic handling of failure in updating index
> ---------------------------------------------
>
>                 Key: ATLAS-4408
>                 URL: https://issues.apache.org/jira/browse/ATLAS-4408
>             Project: Atlas
>          Issue Type: New Feature
>          Components:  atlas-core
>            Reporter: Radhika Kundam
>            Assignee: Radhika Kundam
>            Priority: Major
>         Attachments: IndexRecovery.png, IndexRecovery_FunctionalFlow.png
>
>
> *Index failure resilience:* dynamic handling of failures in updating the index (i.e. the HBase commit succeeds but the index commit fails).
> When the secondary persistence commit fails, the indexes become inconsistent for every transaction that failed at Solr. The only existing way to repair this is to re-index all the data, which is time-consuming because it involves indexing the entire database.
> To recover from such inconsistencies we can use the *transaction write-ahead log option*. With the write-ahead log (tx.log-tx) enabled, JanusGraph maintains log data for all transactions, which can be used to recover indices after failures. Maintaining log data for every transaction adds overhead, but it makes the system more resilient and proactive, so the advantages of this approach can offset that overhead.
> Design details are as follows.
>  # Start a new service, IndexRecoveryService, at Atlas startup.
>  ## Continuously monitor Solr (index client) health every retryTime milliseconds.
>  ### If Solr is healthy and a recovery start time is available:
>  #### Start transaction recovery from the available recovery start time (which was noted when Solr became unhealthy).
>  #### Persist the current recovery start time as the previous one, so that it can later be passed as a custom recovery time to start index recovery if required.
>  #### Reset the current recovery start time.
>  #### Continue with Solr health checks.
>  ### If Solr is unhealthy and no recovery start time is available:
>  #### Shut down the existing transaction recovery process.
>  #### Record the time that should be the next recovery start time and persist it in the graph.
>  #### Continue with Solr health checks.
> Configuration properties used by this feature:
> 1. To enable or disable index recovery (by default, index recovery is enabled on Atlas startup):
>     *atlas.graph.enable.index.recovery=true*
> 2. To configure how frequently the Solr health check runs:
>     *atlas.graph.index.search.solr.status.retry.interval=<time in ms>*
> 3. To start index recovery from a user-provided custom recovery time:
>     *atlas.graph.index.search.solr.recovery.start.time=1630086622*
>  
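
The health-check flow in the design above can be sketched as a small state machine. This is only an illustrative model under stated assumptions, not the code from the commit: IndexRecoveryMonitor, tick, simulate, and the two callbacks are hypothetical names, the Solr health probe and the clock are passed in as plain values, and "persist in graph" is reduced to in-memory fields.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class IndexRecoveryMonitor:
    """Hypothetical model of the design's health-check loop (not Atlas's class)."""

    start_recovery: Callable[[int], None]  # stand-in for starting transaction recovery
    stop_recovery: Callable[[], None]      # stand-in for shutting recovery down
    # The design persists these in the graph; here they are in-memory only.
    recovery_start_time: Optional[int] = None
    previous_start_time: Optional[int] = None

    def tick(self, healthy: bool, now_ms: int) -> None:
        """One health check; a real service would repeat this every retry interval."""
        if healthy and self.recovery_start_time is not None:
            # Index is back: replay transactions from the noted start time,
            # keep that time as "previous", then reset the current one.
            self.start_recovery(self.recovery_start_time)
            self.previous_start_time = self.recovery_start_time
            self.recovery_start_time = None
        elif not healthy and self.recovery_start_time is None:
            # Index just went down: stop any running recovery and note the time.
            self.stop_recovery()
            self.recovery_start_time = now_ms
        # In every other case there is nothing to do except keep monitoring.


def simulate(health_sequence: List[bool]) -> Tuple[list, IndexRecoveryMonitor]:
    """Run one tick per health reading; the tick index doubles as a fake clock."""
    events: list = []
    monitor = IndexRecoveryMonitor(
        start_recovery=lambda t: events.append(("start_recovery", t)),
        stop_recovery=lambda: events.append(("stop_recovery",)),
    )
    for now_ms, healthy in enumerate(health_sequence):
        monitor.tick(healthy, now_ms)
    return events, monitor
```

For example, simulate([True, False, False, True, True]) records a stop_recovery event at tick 1, when the index first becomes unhealthy, and a start_recovery event at tick 3 that carries the start time noted at tick 1. The repeated unhealthy reading at tick 2 does not overwrite the noted start time, which is what lets recovery cover the whole outage.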



--
This message was sent by Atlassian Jira
(v8.3.4#803005)