Posted to dev@sling.apache.org by "Angela Schreiber (Jira)" <ji...@apache.org> on 2022/02/02 14:27:00 UTC

[jira] [Commented] (SLING-11113) resource resolver: bloom filter might be out of sync on startup

    [ https://issues.apache.org/jira/browse/SLING-11113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17485842#comment-17485842 ] 

Angela Schreiber commented on SLING-11113:
------------------------------------------

[~reschke], do I read your summary correctly that you are saying that
 # the bloom-filter can't be guaranteed to be accurate
 # the fix to make it accurate essentially will kill the optimization the bloom-filter was intended to deliver

?

So, my questions would be:
 * can we live with the inaccuracy as you observe it?
 * do we have any benchmarks that help us understand what the critical threshold is and how big the performance gain from the filter is? In other words, what would the impact be if we got rid of it altogether?

> resource resolver: bloom filter might be out of sync on startup
> ---------------------------------------------------------------
>
>                 Key: SLING-11113
>                 URL: https://issues.apache.org/jira/browse/SLING-11113
>             Project: Sling
>          Issue Type: Bug
>          Components: ResourceResolver
>            Reporter: Julian Reschke
>            Priority: Major
>
> It appears that the bloom filter can be out of sync with the repo on startup.
> Upon startup, if the filter is not present, it gets created and filled with all vanity paths found in the repo. If present, it is used as is.
> So for a restart of a node, there's a time window (up to the save interval of 60s, plus the downtime) during which the addition of vanity paths will not be reflected in the bloom filter.
> Now the bloom filter is only relevant if the number of vanity paths exceeds the maximum number, so this problem might be hard to observe.
> AFAIU, the *intent* of persisting the bloom filter is to avoid the cost of re-filling it on startup. However, we already know that *finding* the vanity paths (doing the query, getting the resources and processing the properties) is costly. It's dubious that avoiding the cost of updating the filter helps here.
> Proposal: get rid of the persistence of the bloom filter altogether, reducing the complexity of the code significantly.
>  
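To make the window described in the issue concrete, here is a minimal, self-contained sketch. It is not the Sling implementation: ToyBloomFilter, its sizes, its hashing and the paths "/old" and "/new" are all made up for illustration. It only shows that a filter loaded from a snapshot taken before a vanity path was added can answer "definitely not present" for that path, while a filter rebuilt from the repo content cannot.

```java
import java.util.BitSet;
import java.util.List;

public class StaleBloomFilterDemo {

    /** Toy Bloom filter; the hashing is deliberately simplistic and NOT what Sling uses. */
    static final class ToyBloomFilter {
        private static final int SIZE = 1 << 16; // number of bits (assumption)
        private static final int HASHES = 2;     // number of hash functions (assumption)
        private final BitSet bits;

        ToyBloomFilter() {
            this(new BitSet(SIZE));
        }

        private ToyBloomFilter(BitSet bits) {
            this.bits = bits;
        }

        // Derive the i-th bit index from String.hashCode(); good enough for a demo only.
        private static int index(String path, int i) {
            return Math.floorMod(path.hashCode() * (i + 1) + i, SIZE);
        }

        void add(String path) {
            for (int i = 0; i < HASHES; i++) {
                bits.set(index(path, i));
            }
        }

        boolean mightContain(String path) {
            for (int i = 0; i < HASHES; i++) {
                if (!bits.get(index(path, i))) {
                    return false; // at least one bit missing: "definitely not present"
                }
            }
            return true;
        }

        // Stand-in for writing the filter to disk.
        byte[] persist() {
            return bits.toByteArray();
        }

        // Stand-in for "if present, it is used as is".
        static ToyBloomFilter load(byte[] data) {
            return new ToyBloomFilter(BitSet.valueOf(data));
        }

        // Stand-in for "when not present, it gets created and filled from the repo".
        static ToyBloomFilter rebuild(Iterable<String> vanityPaths) {
            ToyBloomFilter filter = new ToyBloomFilter();
            for (String path : vanityPaths) {
                filter.add(path);
            }
            return filter;
        }
    }

    public static void main(String[] args) {
        ToyBloomFilter live = new ToyBloomFilter();
        live.add("/old");
        byte[] snapshot = live.persist(); // last periodic save before shutdown

        // "/new" reaches the repo after the last save; then the node restarts.
        List<String> repo = List.of("/old", "/new");
        ToyBloomFilter stale = ToyBloomFilter.load(snapshot);
        ToyBloomFilter rebuilt = ToyBloomFilter.rebuild(repo);

        System.out.println("stale   /new: " + stale.mightContain("/new"));   // false
        System.out.println("rebuilt /new: " + rebuilt.mightContain("/new")); // true
    }
}
```

The rebuilt filter is correct by construction, but building it requires exactly the repo traversal that persistence was meant to avoid, which is the trade-off the questions above are about.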



--
This message was sent by Atlassian Jira
(v8.20.1#820001)