Posted to dev@unomi.apache.org by "Serge Huber (Jira)" <ji...@apache.org> on 2023/03/10 15:54:00 UTC

[jira] [Commented] (UNOMI-748) Unomi merge system is exposed to OOM

    [ https://issues.apache.org/jira/browse/UNOMI-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699000#comment-17699000 ] 

Serge Huber commented on UNOMI-748:
-----------------------------------

Thanks Kevan, I think you are right in this analysis.

I've wanted some kind of scheduler service in Unomi for the longest time, so that we could run background tasks such as an updateByQuery, query their state, and make sure that we are not running too many background tasks at the same time in a cluster.

This might be a little too involved for these needs, but I worry that just using a background thread could also be a problem if multiple merge requests are done in parallel. We already have a basic scheduler service in place; maybe we should use that in the meantime?
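
A minimal sketch of the concern above, assuming plain java.util.concurrent rather than Unomi's own scheduler service: a fixed-size pool with a bounded queue caps how many background merge tasks can run or wait at once, and the returned Future lets a caller check a task's state. The class name BoundedMergeExecutor and its parameters are hypothetical and not part of Unomi.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not a Unomi class): runs background merge work on a bounded pool. */
final class BoundedMergeExecutor implements AutoCloseable {
    private final ThreadPoolExecutor pool;

    /**
     * @param maxConcurrent number of tasks allowed to run at the same time
     * @param maxQueued     number of tasks allowed to wait for a free thread
     */
    BoundedMergeExecutor(int maxConcurrent, int maxQueued) {
        this.pool = new ThreadPoolExecutor(
                maxConcurrent, maxConcurrent,          // fixed pool size
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(maxQueued),   // bounded waiting queue
                new ThreadPoolExecutor.AbortPolicy()); // reject when saturated
    }

    /**
     * Submits a task. Throws RejectedExecutionException when all threads are
     * busy and the queue is full, so the caller can decide to retry later.
     */
    Future<?> submit(Runnable task) {
        return pool.submit(task);
    }

    /** Stops accepting new tasks; tasks already submitted still run. */
    @Override
    public void close() {
        pool.shutdown();
    }
}
```

With new BoundedMergeExecutor(1, 1), one task runs, one waits, and a third submission made while both slots are taken is rejected instead of piling up. This only bounds work inside a single JVM; limiting tasks across a cluster would need shared state, which this sketch does not attempt.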

Regards,
  Serge... 

> Unomi merge system is exposed to OOM
> ------------------------------------
>
>                 Key: UNOMI-748
>                 URL: https://issues.apache.org/jira/browse/UNOMI-748
>             Project: Apache Unomi
>          Issue Type: Improvement
>    Affects Versions: unomi-2.1.0
>            Reporter: Kevan Jahanshahi
>            Priority: Major
>
> Currently the sessions/events update uses the bulkProcessor and is asynchronous: we never know when the bulk will be performed.
>  * The benefit: merge requests are fast, since the request is not held up; the bulk processor does the job in a separate thread.
>  * The cons: all previous sessions/events are first loaded in memory, so when merging active profiles that contain a lot of past events/sessions, we could be exposed to OOM. (We already had a similar case with the purge, which was loading all profiles in memory.)
> If we replace the update (one item at a time) with updateByQuery, the request will lose the asynchronous nature provided by the BulkProcessor.
>  * The benefit: sessions and events are not loaded in memory, so no OOM is possible.
>  * The cons: the request will be synchronous and we expose merge requests to timeouts on the client side. The merge is actually triggered by the login on the jExp side, so adding extra time there could have bad impacts and side effects.
>
> Since neither of these solutions seems OK, the ideal solution should combine the strengths of both:
>  * Use updateByQuery in a separate thread to avoid holding up the merge request.
>  ** We have the OOM protection by not loading all the past events/sessions.
>  ** We have the asynchronous execution done in a separate thread/job to free the current request.
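
The combined approach described in the issue could be sketched as follows. This is an illustration under stated assumptions, not Unomi's implementation: BulkReassigner is a hypothetical stand-in for whatever persistence call performs the update-by-query, and AsyncMerge is a hypothetical wrapper that hands the work to an Executor so the merge request can return immediately.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

/** Hypothetical stand-in for a persistence call that rewrites items in the store itself. */
interface BulkReassigner {
    /**
     * Re-points every session/event owned by fromProfileId to toProfileId
     * without loading the items in memory; returns the number of items updated.
     */
    long reassign(String fromProfileId, String toProfileId);
}

/** Hypothetical wrapper (not a Unomi class): runs the reassignment off the request thread. */
final class AsyncMerge {
    private final BulkReassigner reassigner;
    private final Executor executor;

    AsyncMerge(BulkReassigner reassigner, Executor executor) {
        this.reassigner = reassigner;
        this.executor = executor;
    }

    /**
     * Returns immediately. The future completes with the total number of
     * updated items once every merged profile has been processed.
     */
    CompletableFuture<Long> merge(String masterProfileId, List<String> mergedProfileIds) {
        return CompletableFuture.supplyAsync(() -> {
            long updated = 0;
            for (String mergedId : mergedProfileIds) {
                updated += reassigner.reassign(mergedId, masterProfileId);
            }
            return updated;
        }, executor);
    }
}
```

The caller (for example the login flow) can ignore the returned future or attach logging to it; either way the request thread is freed as soon as the task is handed to the executor. Passing a bounded executor keeps parallel merge requests from starting an unlimited number of background updates.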



--
This message was sent by Atlassian Jira
(v8.20.10#820010)