You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@asterixdb.apache.org by "Till (JIRA)" <ji...@apache.org> on 2018/03/14 23:47:00 UTC
[jira] [Updated] (ASTERIXDB-2215) Filter is not properly applied for a secondary inverted index search

     [ https://issues.apache.org/jira/browse/ASTERIXDB-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till updated ASTERIXDB-2215:
----------------------------
    Component/s: IDX - Indexes
                 COMP - Compiler

> Filter is not properly applied for a secondary inverted index search
> --------------------------------------------------------------------
>
>                 Key: ASTERIXDB-2215
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2215
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: COMP - Compiler, IDX - Indexes
>            Reporter: Taewoo Kim
>            Priority: Major
>              Labels: triaged
>
> Based on the way of writing predicate conditions on a field with filter, the generated plan does not correctly show min and max value of a filter.
> {code}
> drop dataverse twitter if exists;
> create dataverse twitter if not exists;
> use dataverse twitter;
> create type typeUser if not exists as open {
>     id: int64,
>     name: string,
>     screen_name : string,
>     profile_image_url : string,
>     lang : string,
>     location: string,
>     create_at: date,
>     description: string,
>     followers_count: int32,
>     friends_count: int32,
>     statues_count: int64
> };
> create type typePlace if not exists as open{
>     country : string,
>     country_code : string,
>     full_name : string,
>     id : string,
>     name : string,
>     place_type : string,
>     bounding_box : rectangle
> };
> create type typeGeoTag if not exists as open {
>     stateID: int32,
>     stateName: string,
>     countyID: int32,
>     countyName: string,
>     cityID: int32?,
>     cityName: string?
> };
> create type typeTweet if not exists as open {
>     create_at : datetime,
>     id: int64,
>     "text": string,
>     in_reply_to_status : int64,
>     in_reply_to_user : int64,
>     favorite_count : int64,
>     coordinate: point?,
>     retweet_count : int64,
>     lang : string,
>     is_retweet: boolean,
>     hashtags : {{ string }} ?,
>     user_mentions : {{ int64 }} ? ,
>     user : typeUser,
>     place : typePlace?,
>     geo_tag: typeGeoTag
> };
> create dataset ds_tweet(typeTweet) if not exists primary key id with filter on create_at;
> {code}
> For the following query, the logical plan shows empty min[] and two variables in max[] when doing an inverted-index search. 
> {code}
> USE twitter;
> SELECT spatial_cell(get_points(place.bounding_box)[0], create_point(0.0,0.0),1.0,1.0) AS cell, count(*) AS cnt FROM ds_tweet
> WHERE ftcontains(text, ['trump'], {'mode':'any'}) AND place.bounding_box IS NOT unknown 
> AND datetime('2017-02-25T00:00:00') <= create_at AND  create_at < datetime('2017-02-26T00:00:00')
> GROUP BY cell;
> {code}
> Exact predicates on the filter
> {code}
> datetime('2017-02-25T00:00:00') <= create_at AND  create_at < datetime('2017-02-26T00:00:00')
> {code}
> {code}
> unnest-map [$$64, $$69, $$70] <- index-search("text_idx", 2, "twitter", "ds_tweet", FALSE, FALSE, 5, null, 21, TRUE, 1, $$63) with filter on min:[] max:[$$67, $$68]
>                                         -- SINGLE_PARTITION_INVERTED_INDEX_SEARCH  |PARTITIONED|
>                                           exchange
>                                           -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                                             assign [$$67, $$68, $$63] <- [datetime: { 2017-02-26T00:00:00.000Z }, datetime: { 2017-02-25T00:00:00.000Z }, array: [ "trump" ]]
>                                             -- ASSIGN  |PARTITIONED|
>                                               empty-tuple-source
>                                               -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
> {code}
> However, for the following query(just switched the location of datetime and create_at at the end of the predicates), it shows another incorrect plan.
> {code}
> SELECT spatial_cell(get_points(place.bounding_box)[0], create_point(0.0,0.0),1.0,1.0) AS cell, count(*) AS cnt FROM ds_tweet
> WHERE ftcontains(text, ['trump'], {'mode':'any'}) AND place.bounding_box IS NOT unknown 
> AND datetime('2017-02-25T00:00:00') <= create_at AND  datetime('2017-02-26T00:00:00') > create_at
> GROUP BY cell;
> {code}
> Exact predicates on the filter:
> {code}
> datetime('2017-02-25T00:00:00') <= create_at AND  datetime('2017-02-26T00:00:00') > create_at
> {code}
> {code}
> unnest-map [$$64, $$69, $$70] <- index-search("text_idx", 2, "twitter", "ds_tweet", FALSE, FALSE, 5, null, 21, TRUE, 1, $$63) with filter on min:[$$67] max:[$$68]
>                                         -- SINGLE_PARTITION_INVERTED_INDEX_SEARCH  |PARTITIONED|
>                                           exchange
>                                           -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                                             assign [$$67, $$68, $$63] <- [datetime: { 2017-02-26T00:00:00.000Z }, datetime: { 2017-02-25T00:00:00.000Z }, array: [ "trump" ]]
>                                             -- ASSIGN  |PARTITIONED|
>                                               empty-tuple-source
>                                               -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)