Posted to issues@sentry.apache.org by "kalyan kumar kalvagadda (JIRA)" <ji...@apache.org> on 2018/10/08 20:57:00 UTC

[jira] [Commented] (SENTRY-2305) Optimize time taken for persistence HMS snapshot

    [ https://issues.apache.org/jira/browse/SENTRY-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642479#comment-16642479 ] 

kalyan kumar kalvagadda commented on SENTRY-2305:
-------------------------------------------------

I have considered multiple options. Persisting in batches is not an option without changing the schema, because DataNucleus does not persist rows in batches for tables that have foreign keys to other tables.

I see that the best option is to persist the paths in parallel; it gave good results. I will update this issue with the test results within a day.

> Optimize time taken for persistence HMS snapshot 
> -------------------------------------------------
>
>                 Key: SENTRY-2305
>                 URL: https://issues.apache.org/jira/browse/SENTRY-2305
>             Project: Sentry
>          Issue Type: Sub-task
>          Components: Sentry
>    Affects Versions: 2.1.0
>            Reporter: kalyan kumar kalvagadda
>            Assignee: kalyan kumar kalvagadda
>            Priority: Major
>
> There are a couple of options:
> # Break the total snapshot into batches and persist all of them in parallel in different transactions. As Sentry uses the repeatable_read isolation level, we should be able to have parallel writes on the same table. This raises an issue if persisting any of the batches fails: the approach needs additional logic to clean up the partially persisted snapshot. I’m evaluating this option.
> ** *Result:* Initial results are promising. Time to persist the snapshot came down by 60%.
> # Try disabling the L1 cache for persisting the snapshot.
> # Try persisting the snapshot entries sequentially in separate transactions, as transactions that commit huge amounts of data might take longer because they spend a lot of CPU cycles keeping the rollback log up to date.
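
For illustration only, the first option (parallel batches in separate transactions, with cleanup of a partially persisted snapshot) might be sketched as below. This is not the Sentry implementation: the BatchStore interface, its two methods, and the class name are hypothetical stand-ins for the real store, and each persistBatch call is assumed to run in its own transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSnapshotPersister {

    /** Hypothetical store abstraction; each call is assumed to be one transaction. */
    public interface BatchStore {
        void persistBatch(long snapshotId, List<String> paths) throws Exception;
        void deleteSnapshot(long snapshotId) throws Exception;
    }

    /** Splits items into consecutive batches of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    /**
     * Persists all batches in parallel. Returns true if every batch succeeded.
     * If any batch fails, waits for the remaining batches to finish, removes
     * whatever was persisted for this snapshot, and returns false.
     */
    public static boolean persist(BatchStore store, long snapshotId,
                                  List<String> paths, int batchSize, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Void>> futures = new ArrayList<>();
            for (List<String> batch : partition(paths, batchSize)) {
                Callable<Void> task = () -> {
                    store.persistBatch(snapshotId, batch);
                    return null;
                };
                futures.add(pool.submit(task));
            }
            boolean failed = false;
            // Wait for every batch so none can commit after the cleanup runs.
            for (Future<Void> future : futures) {
                try {
                    future.get();
                } catch (ExecutionException e) {
                    failed = true;
                }
            }
            if (failed) {
                store.deleteSnapshot(snapshotId); // clean up the partial snapshot
                return false;
            }
            return true;
        } finally {
            pool.shutdown();
        }
    }
}
```

The sketch waits for all batches before cleaning up, so a slow batch cannot commit after the partial snapshot has been deleted. Batch size and thread count are left as parameters because suitable values would have to come from measurement.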



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)