Posted to notifications@asterixdb.apache.org by "Taewoo Kim (JIRA)" <ji...@apache.org> on 2017/12/29 00:50:00 UTC
[jira] [Created] (ASTERIXDB-2215) Filter is not properly applied for a secondary inverted index search

Taewoo Kim created ASTERIXDB-2215:
-------------------------------------

             Summary: Filter is not properly applied for a secondary inverted index search
                 Key: ASTERIXDB-2215
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2215
             Project: Apache AsterixDB
          Issue Type: Bug
            Reporter: Taewoo Kim


Depending on how the predicate conditions on the filter field are written, the generated plan is sometimes correct and sometimes not.

{code}
drop dataverse twitter if exists;
create dataverse twitter if not exists;
use dataverse twitter;

create type typeUser if not exists as open {
    id: int64,
    name: string,
    screen_name : string,
    profile_image_url : string,
    lang : string,
    location: string,
    create_at: date,
    description: string,
    followers_count: int32,
    friends_count: int32,
    statues_count: int64
};

create type typePlace if not exists as open{
    country : string,
    country_code : string,
    full_name : string,
    id : string,
    name : string,
    place_type : string,
    bounding_box : rectangle
};

create type typeGeoTag if not exists as open {
    stateID: int32,
    stateName: string,
    countyID: int32,
    countyName: string,
    cityID: int32?,
    cityName: string?
};

create type typeTweet if not exists as open {
    create_at : datetime,
    id: int64,
    "text": string,
    in_reply_to_status : int64,
    in_reply_to_user : int64,
    favorite_count : int64,
    coordinate: point?,
    retweet_count : int64,
    lang : string,
    is_retweet: boolean,
    hashtags : {{ string }} ?,
    user_mentions : {{ int64 }} ? ,
    user : typeUser,
    place : typePlace?,
    geo_tag: typeGeoTag
};

create dataset ds_tweet(typeTweet) if not exists primary key id with filter on create_at;
{code}

For the following query, the logical plan shows an empty min:[] and two variables in max:[] for the inverted-index search.

{code}
USE twitter;
SELECT spatial_cell(get_points(place.bounding_box)[0], create_point(0.0,0.0),1.0,1.0) AS cell, count(*) AS cnt FROM ds_tweet
WHERE ftcontains(text, ['trump'], {'mode':'any'}) AND place.bounding_box IS NOT unknown 
AND datetime('2017-02-25T00:00:00') <= create_at AND  create_at < datetime('2017-02-26T00:00:00')
GROUP BY cell;
{code}

Exact predicates on the filter:
{code}
datetime('2017-02-25T00:00:00') <= create_at AND  create_at < datetime('2017-02-26T00:00:00')
{code}

{code}
unnest-map [$$64, $$69, $$70] <- index-search("text_idx", 2, "twitter", "ds_tweet", FALSE, FALSE, 5, null, 21, TRUE, 1, $$63) with filter on min:[] max:[$$67, $$68]
                                        -- SINGLE_PARTITION_INVERTED_INDEX_SEARCH  |PARTITIONED|
                                          exchange
                                          -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                            assign [$$67, $$68, $$63] <- [datetime: { 2017-02-26T00:00:00.000Z }, datetime: { 2017-02-25T00:00:00.000Z }, array: [ "trump" ]]
                                            -- ASSIGN  |PARTITIONED|
                                              empty-tuple-source
                                              -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
{code}


However, the following query (which only swaps the positions of the datetime constant and create_at in the last predicate) produces the correct plan.

{code}
SELECT spatial_cell(get_points(place.bounding_box)[0], create_point(0.0,0.0),1.0,1.0) AS cell, count(*) AS cnt FROM ds_tweet
WHERE ftcontains(text, ['trump'], {'mode':'any'}) AND place.bounding_box IS NOT unknown 
AND datetime('2017-02-25T00:00:00') <= create_at AND  datetime('2017-02-26T00:00:00') > create_at
GROUP BY cell;
{code}

Exact predicates on the filter:
{code}
datetime('2017-02-25T00:00:00') <= create_at AND  datetime('2017-02-26T00:00:00') > create_at
{code}

{code}
unnest-map [$$64, $$69, $$70] <- index-search("text_idx", 2, "twitter", "ds_tweet", FALSE, FALSE, 5, null, 21, TRUE, 1, $$63) with filter on min:[$$67] max:[$$68]
                                        -- SINGLE_PARTITION_INVERTED_INDEX_SEARCH  |PARTITIONED|
                                          exchange
                                          -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                            assign [$$67, $$68, $$63] <- [datetime: { 2017-02-26T00:00:00.000Z }, datetime: { 2017-02-25T00:00:00.000Z }, array: [ "trump" ]]
                                            -- ASSIGN  |PARTITIONED|
                                              empty-tuple-source
                                              -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
{code}
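
The two predicate forms are logically equivalent, so they should produce the same filter bounds. The following is a minimal Python sketch of that expectation only. It is not AsterixDB's optimizer code, and the names FilterBounds, classify, and filter_bounds are hypothetical. It assumes each predicate is a simple (left, operator, right) comparison and normalizes "constant op field" into "field op constant" before deciding whether the constant is a lower or an upper bound.

```python
from dataclasses import dataclass, field

# Operators that mirror each other when the two operands are swapped.
MIRROR = {"<": ">", "<=": ">=", ">": "<", ">=": "<=", "=": "="}


@dataclass
class FilterBounds:
    # Hypothetical container for the min:[] and max:[] lists shown in the plan.
    min_values: list = field(default_factory=list)
    max_values: list = field(default_factory=list)


def classify(left, op, right, filter_field, bounds):
    """Record one comparison as a lower or upper bound on filter_field."""
    if left == filter_field:
        constant = right
    elif right == filter_field:
        # Rewrite `constant op field` as `field mirror(op) constant`.
        constant, op = left, MIRROR[op]
    else:
        return  # The predicate does not touch the filter field.
    if op in (">", ">=", "="):
        bounds.min_values.append(constant)
    if op in ("<", "<=", "="):
        bounds.max_values.append(constant)


def filter_bounds(predicates, filter_field="create_at"):
    """Collect the bounds implied by a list of (left, op, right) comparisons."""
    bounds = FilterBounds()
    for left, op, right in predicates:
        classify(left, op, right, filter_field, bounds)
    return bounds


LOW = "2017-02-25T00:00:00"
HIGH = "2017-02-26T00:00:00"

# First query:  LOW <= create_at AND create_at < HIGH
first = filter_bounds([(LOW, "<=", "create_at"), ("create_at", "<", HIGH)])
# Second query: LOW <= create_at AND HIGH > create_at
second = filter_bounds([(LOW, "<=", "create_at"), (HIGH, ">", "create_at")])
```

Under these assumptions both queries yield one lower bound and one upper bound, regardless of which side of the comparison the constant appears on.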



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)