Posted to notifications@asterixdb.apache.org by "Wail Alkowaileet (JIRA)" <ji...@apache.org> on 2016/10/19 08:18:58 UTC

[jira] [Commented] (ASTERIXDB-1698) Secondary index doesn't follow the compaction policy

    [ https://issues.apache.org/jira/browse/ASTERIXDB-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588022#comment-15588022 ] 

Wail Alkowaileet commented on ASTERIXDB-1698:
---------------------------------------------

I had a discussion with Sattam about this. I don't think it's a bug so much as logic that was never implemented for this specific case.
Normally, an LSM index tries to keep as few disk components as possible, ideally just one. So when you create a secondary index on a dataset that already holds data, the index is built with the bulk loader, which writes everything into a single disk component without consulting the compaction policy.
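
A rough way to picture the difference is the toy Python model below. It is not AsterixDB code: the function names are hypothetical, merging is ignored entirely, and the 80M flush size and 1.2G index size are simply the figures reported in the issue. It only shows why the incremental path yields many components under the size limit while the bulk-load path yields one component far above it.

```python
# Toy model only: not AsterixDB code. All function names are hypothetical.
MAX_MERGEABLE_BYTES = 134217728  # "max-mergable-component-size" from the DDL (128 MiB)


def ingest(total_bytes, flush_bytes):
    """Incremental path: each memory-component flush yields one disk component."""
    sizes = []
    remaining = total_bytes
    while remaining > 0:
        size = min(flush_bytes, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


def bulk_load(total_bytes):
    """Bulk-load path: all existing records go into a single disk component."""
    return [total_bytes]


def oversized(sizes, limit=MAX_MERGEABLE_BYTES):
    """Return the components whose size exceeds the policy's limit."""
    return [s for s in sizes if s > limit]


MIB = 1024 * 1024
index_bytes = 1200 * MIB  # roughly the 1.2G per partition reported in the issue

ingested = ingest(index_bytes, flush_bytes=80 * MIB)  # ~80M components, as observed
loaded = bulk_load(index_bytes)

print(len(ingested), len(oversized(ingested)))  # 15 0
print(len(loaded), len(oversized(loaded)))      # 1 1
```

In this model the only way to get the reporter's expected result would be for the bulk-load path to split its output by the policy's size limit, which is the piece that appears to be missing.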

> Secondary index doesn't follow the compaction policy
> ----------------------------------------------------
>
>                 Key: ASTERIXDB-1698
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1698
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: Storage
>         Environment: master : 4819ea44723b87a68406d248782861cf6e5d3305
>            Reporter: Jianfeng Jia
>            Assignee: Ian Maxon
>
> Here is the ddl for the dataset:
> {code}
> create dataset ds_tweet(typeTweet) if not exists primary key id using compaction policy prefix (("max-mergable-component-size"="134217728"),("max-tolerance-component-count"="10")) with filter on create_at ;
> create index text_idx if not exists on ds_tweet("text") type keyword;
> {code}
> In this case, I want to keep each component small, around 128M. During the data ingestion phase this works well, and each text_idx component is also small (~80M each). I assume it also followed the component size constraint?
> After ingestion, I found that I needed to build another index:
> {code}
> create index time_idx if not exists on ds_tweet(create_at) type btree;
> {code}
> When it finished, I found that this time_idx didn't follow the constraint and ended up with one giant 1.2G component on each partition. 


