Posted to notifications@asterixdb.apache.org by "Chen Luo (JIRA)" <ji...@apache.org> on 2018/03/23 17:40:00 UTC

[jira] [Created] (ASTERIXDB-2339) Improve Inverted Index Merge Performance

Chen Luo created ASTERIXDB-2339:
-----------------------------------

             Summary: Improve Inverted Index Merge Performance
                 Key: ASTERIXDB-2339
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2339
             Project: Apache AsterixDB
          Issue Type: Improvement
          Components: STO - Storage
            Reporter: Chen Luo
            Assignee: Chen Luo


Currently, an inverted index merge is implemented as a full range scan: token+key pairs are generated and fed into a priority queue to obtain a global ordering. However, a token typically corresponds to tens or hundreds of keys, and sometimes many more. As a result, many token comparisons are wasted, because consecutive pairs often share the same token. To improve this, we can use two priority queues, one for tokens and one for keys. For each token, we merge its inverted lists using the key priority queue. We then fetch the next token from the token queue and merge its inverted lists in the same way.
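
The two-queue idea can be sketched roughly as follows. This is only an illustrative Python sketch, not the AsterixDB implementation: the in-memory `Component` type and the `merge_components` function are hypothetical names, each component is assumed to be a token-sorted list of (token, sorted key list) entries with distinct tokens, and deleted (antimatter) entries and duplicate keys are not handled.

```python
import heapq
from typing import Iterator, List, Tuple

# Hypothetical in-memory stand-in for one index component:
# a list of (token, sorted_keys) entries, sorted by token.
Component = List[Tuple[str, List[int]]]


def merge_components(components: List[Component]) -> Iterator[Tuple[str, List[int]]]:
    """Yield (token, merged_keys) in token order."""
    # Token queue: at most one entry per component, ordered by current token.
    # Entries are (token, component id, position within that component).
    token_queue: List[Tuple[str, int, int]] = []
    for cid, comp in enumerate(components):
        if comp:
            heapq.heappush(token_queue, (comp[0][0], cid, 0))

    while token_queue:
        token = token_queue[0][0]
        lists: List[List[int]] = []
        # Collect the inverted list of every component positioned on this token.
        while token_queue and token_queue[0][0] == token:
            _, cid, pos = heapq.heappop(token_queue)
            lists.append(components[cid][pos][1])
            # Advance that component to its next (strictly larger) token.
            if pos + 1 < len(components[cid]):
                next_token = components[cid][pos + 1][0]
                heapq.heappush(token_queue, (next_token, cid, pos + 1))
        # Key queue: heapq.merge does a heap-based k-way merge that compares
        # keys only; the token is not compared again for this group.
        yield token, list(heapq.merge(*lists))
```

With this shape, tokens are compared once per component per distinct token, while the per-key work inside a token involves key comparisons only, which is where the saving described above would come from.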



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)