You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@asterixdb.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/03/30 02:19:00 UTC

[jira] [Commented] (ASTERIXDB-2339) Improve Inverted Index Merge Performance

    [ https://issues.apache.org/jira/browse/ASTERIXDB-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420073#comment-16420073 ] 

ASF subversion and git services commented on ASTERIXDB-2339:
------------------------------------------------------------

Commit 3036c98099f6c912af29bced9b2bf71bdfa9026e in asterixdb's branch refs/heads/master from [~luochen01]
[ https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;h=3036c98 ]

[ASTERIXDB-2339] Add a new inverted index merge cursor

- user model changes: no
- storage format changes: no
- interface changes: no

Details:
- Implement a new inverted index merge cursor which uses two priority queues,
one for tokens and one for keys. For each token, we merge their inverted
lists using the key queue. After that, we fetch the next token and merge
their lists again. This reduces unnecessary token comparision a lot.
- Along this change, created a fast path for inverted index bulkloader.
Based on how the token+key pair is created, there is no need to copy
bulkloaded tuple and check whether it's a new token during merge.

Change-Id: I57d039cd7e08033884529a204bff9acffd96d9bb
Reviewed-on: https://asterix-gerrit.ics.uci.edu/2519
Tested-by: Jenkins <je...@fulliautomatix.ics.uci.edu>
Contrib: Jenkins <je...@fulliautomatix.ics.uci.edu>
Integration-Tests: Jenkins <je...@fulliautomatix.ics.uci.edu>
Reviewed-by: Ian Maxon <im...@apache.org>


> Improve Inverted Index Merge Performance
> ----------------------------------------
>
>                 Key: ASTERIXDB-2339
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2339
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>          Components: STO - Storage
>            Reporter: Chen Luo
>            Assignee: Chen Luo
>            Priority: Major
>
> Currently, the merge of inverted index is implemented by a full range scan, i.e., token+key pairs are generated and fed into a priority queue to obtain a global ordering. However, it is typical that a token can correspond to tens or hundreds (or even much more) keys. As a result, comparisons of tokens are wasted because for many times tokens would be the same. To improve this, we can have two priority queues, one for tokens and one for keys. For each token, we merge their inverted lists using the key priority queue. After that, we fetch the next token from the token queue, and merge their inverted lists again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)