You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@unomi.apache.org by "Kevan Jahanshahi (Jira)" <ji...@apache.org> on 2022/10/28 09:32:00 UTC

[jira] [Resolved] (UNOMI-670) Unomi purge system is not working

     [ https://issues.apache.org/jira/browse/UNOMI-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevan Jahanshahi resolved UNOMI-670.
------------------------------------
    Resolution: Fixed

> Unomi purge system is not working
> ---------------------------------
>
>                 Key: UNOMI-670
>                 URL: https://issues.apache.org/jira/browse/UNOMI-670
>             Project: Apache Unomi
>          Issue Type: Bug
>          Components: unomi(-core)
>    Affects Versions: unomi-2.0.0, unomi-1.6.1
>            Reporter: Kevan Jahanshahi
>            Assignee: Kevan Jahanshahi
>            Priority: Major
>             Fix For: unomi-2.0.1, unomi-1.7.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The purge system have been fixed recently in version 1.6.0: [https://jira.jahia.org/browse/UNOMI-504] 
> The fix that was done, just fixed the query used for identifying the profiles to be purged.
> Actually we figured a bit late that the purge system contains more issues than that.
> Here is a non exhaustive list of the one already identified and that will need fix/answers an even reflections/discussions for some of them
> h2. Bug 1: disabling the profile purge will end up with all profiles to be purge everyday
> Using this conf:
> {code:java}
> org.apache.unomi.profile.purge.existTime = -1
> org.apache.unomi.profile.purge.inactiveTime = -1
> org.apache.unomi.monthly.index.purge.existTime= 12 {code}
> To only enable events/session purge will have a bad side effect: all profiles of the system will be removed everyday.
> h2. bug 2: OOM issue for profile purge.
> The persistence service is implementing a method: 
> {code:java}
> /**
>  * Deletes items with the specified Item subclass matching the specified {@link Condition}.
>  *
>  * @param <T>   the type of the Item subclass we want to delete
>  * @param query a {@link Condition} identifying which elements we want to delete
>  * @param clazz the {@link Item} subclass of the items we want to delete
>  * @return {@code true} if the deletion was successful, {@code false} otherwise
>  */
> <T extends Item> boolean removeByQuery(Condition query, Class<T> clazz); {code}
> The issue with the implem is that it's doing the following:
>  * query profiles to get the IDs of documents
>  * building a batch request with a delete request per profile ID
>  * sending the batch.
> So in case of a lot profiles to be purge at once, the batch can grow a lot, and consume all the memory available.
> We should not do like that, instead we should:
>  * just use the ES query: [https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html] directly with the query to identify the profiles
> h2. Bug 3: What is the cache system implemented in the persistence service.
> Seem's coming from: https://issues.apache.org/jira/browse/UNOMI-166 
> It's used by the persistence service, and it explain why the removeByQuery implem is bad.
> The reason why the *removeByQuery* is querying all the documents IDs is to invalidate the cache.
> But it's strange, the cache system is not used at all:
> {code:java}
> <cm:property name="itemClassesToCache" value="" /> {code}
> So I would say, we should remove this cache system, because:
>  * it's not tested
>  * nobody know what would be the effect if enabled
>  * it's not possible to activate it with ENV conf, or global unomi conf.
>  * it's not documented
>  * it's impacting implems like: removeByQuery, even if the cache system is not used
>  * it's mostly dead code inside the core persistence of Unomi, witch is dangerous to keep.
> Update during team planning: This caching system should be removed.
> h2. Bug 4: Purge job seem's never unregistered.
> We do register the purge job when the profile service is started, but in case the profile service is stopped (stop the bundle for example). Then it's not unregistered.
> h2. Bug 5: what should we do in case of unomi cluster, should all of them do the purge ?
> In case of unomi cluster, all of them may try to trigger the purge, I dont think it's an issue, but it look like something could be done to avoid doing unecessary operations twice or more.
> h2. Final: We should also add integrations tests to cover this feature.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)