Posted to notifications@asterixdb.apache.org by "Glenn Justo Galvizo (Jira)" <ji...@apache.org> on 2021/06/25 22:18:00 UTC

[jira] [Assigned] (ASTERIXDB-2899) Accelerate Jaccard Similarity Queries w/ Array Indexes

     [ https://issues.apache.org/jira/browse/ASTERIXDB-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Glenn Justo Galvizo reassigned ASTERIXDB-2899:
----------------------------------------------

    Assignee: Glenn Justo Galvizo

> Accelerate Jaccard Similarity Queries w/ Array Indexes
> ------------------------------------------------------
>
>                 Key: ASTERIXDB-2899
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2899
>             Project: Apache AsterixDB
>          Issue Type: New Feature
>            Reporter: Glenn Justo Galvizo
>            Assignee: Glenn Justo Galvizo
>            Priority: Major
>
> Given the following:
>  
> {code:java}
> CREATE INDEX storesCatIdx ON Stores (UNNEST categories);
> FROM    Stores S
> WHERE   SIMILARITY_JACCARD_CHECK(S.categories, ["Fruits", "Bread"], 0.6)
> SELECT  *;
> FROM    Stores S
> WHERE   SIMILARITY_JACCARD(S.categories, ["Fruits", "Bread"]) > 0.6
> SELECT  *;{code}
> The index Stores.storesCatIdx can be used to accelerate both of the queries above. A rule can be introduced to transform each query into a join between the ["Fruits", "Bread"] array and the S.categories array. The existing rule that optimizes array index joins will then fire, using the primary index validation to remove false positives.
> The resulting plan generates false positives from the multi-valued index search (stores whose categories contain at least one of the items). These are filtered out by applying the actual Jaccard similarity function and threshold check before the results are returned to the user.
>  
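The plan described in the issue (index probe for candidates, then an exact Jaccard check) can be illustrated outside AsterixDB with a small Python model. This is only a sketch under stated assumptions: the names `build_array_index`, `candidate_ids`, and `jaccard_search` are hypothetical, the sample stores are made up, the in-memory dictionary merely stands in for the array index, and Jaccard is computed with plain set semantics. It does not reflect how the optimizer rule is or will be implemented.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two collections, using set semantics (assumption)."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty collections are treated as dissimilar here; this is an
    # arbitrary choice for the sketch, not AsterixDB's documented behavior.
    return len(a & b) / len(union) if union else 0.0


def build_array_index(stores):
    """Model of an index on UNNEST categories: category -> set of store ids."""
    index = defaultdict(set)
    for store_id, categories in stores.items():
        for category in categories:
            index[category].add(store_id)
    return index


def candidate_ids(index, query):
    """Index probe: every store that shares at least one item with the query.

    This step over-approximates the answer (false positives are expected).
    """
    ids = set()
    for item in query:
        ids |= index.get(item, set())
    return ids


def jaccard_search(stores, index, query, threshold):
    """Probe the index, then verify each candidate with the real Jaccard check."""
    return sorted(
        store_id
        for store_id in candidate_ids(index, query)
        if jaccard(stores[store_id], query) > threshold
    )


# Hypothetical sample data, keyed by store id.
stores = {
    1: ["Fruits", "Bread"],
    2: ["Fruits", "Bread", "Dairy"],
    3: ["Fruits", "Meat", "Dairy"],  # shares one item only: a false positive
    4: ["Meat"],                     # shares nothing: never probed
}
index = build_array_index(stores)
matches = jaccard_search(stores, index, ["Fruits", "Bread"], 0.6)
```

In this model, store 3 is returned by the index probe but rejected by the verification step, which mirrors the false-positive filtering described in the issue. The probe cannot miss a qualifying store for a positive threshold, because any store with nonzero Jaccard similarity shares at least one item with the query array.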



--
This message was sent by Atlassian Jira
(v8.3.4#803005)