Posted to dev@unomi.apache.org by "Thomas Draier (JIRA)" <ji...@apache.org> on 2018/10/09 16:09:00 UTC

[jira] [Created] (UNOMI-204) Optimize pastEvents conditions execution and count

Thomas Draier created UNOMI-204:
-----------------------------------

             Summary: Optimize pastEvents conditions execution and count
                 Key: UNOMI-204
                 URL: https://issues.apache.org/jira/browse/UNOMI-204
             Project: Apache Unomi
          Issue Type: Improvement
            Reporter: Thomas Draier


Past event condition query execution is currently based on an aggregation on events to collect all matching profile ids, which are then used to generate an id query on profiles. This leads to several issues:
- The terms aggregation is limited to 5000 buckets by default (configurable since UNOMI-119), so the condition will never return more than 5000 profiles, which is a problem for updateExistingProfilesForSegment. The limit is necessary to avoid running out of memory, but we still need the full list of profiles - using the aggregation's filter/partition support should allow retrieving all items (see the first sketch after this list).
- The resulting id query can be huge (potentially millions of ids), even though in the end we may only want a limited number of results. This is unfortunately difficult to optimize, because 1/ we do not know in advance whether a limit will be applied, and 2/ the condition can be part of an "and" boolean condition, which would require an unknown minimal number of ids.
- the "count" method is not optimal as it executes the full query and gets the number of results, where it can in some cases be optimized. For pastEventCondition, we generate an IdQuery with a list of ids to just get the count of profiles - counting the ids should be enough, and in some cases we could even use cardinality aggregate to directly get the count. In all cases, keeping the list of all ids in memory should not be needed for counting.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)