Posted to notifications@asterixdb.apache.org by "Glenn Justo Galvizo (Jira)" <ji...@apache.org> on 2021/06/25 22:18:00 UTC
[jira] [Assigned] (ASTERIXDB-2899) Accelerate Jaccard Similarity Queries w/ Array Indexes
[ https://issues.apache.org/jira/browse/ASTERIXDB-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Glenn Justo Galvizo reassigned ASTERIXDB-2899:
----------------------------------------------
Assignee: Glenn Justo Galvizo
> Accelerate Jaccard Similarity Queries w/ Array Indexes
> ------------------------------------------------------
>
> Key: ASTERIXDB-2899
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2899
> Project: Apache AsterixDB
> Issue Type: New Feature
> Reporter: Glenn Justo Galvizo
> Assignee: Glenn Justo Galvizo
> Priority: Major
>
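> Given the following index and queries:
>
> {code:sql}
> -- Array index on the individual elements of each store's categories array.
> CREATE INDEX storesCatIdx ON Stores (UNNEST categories);
>
> -- Query 1: threshold check via SIMILARITY_JACCARD_CHECK.
> FROM Stores S
> WHERE SIMILARITY_JACCARD_CHECK(S.categories, ["Fruits", "Bread"], 0.6)
> SELECT *;
>
> -- Query 2: explicit comparison on SIMILARITY_JACCARD.
> FROM Stores S
> WHERE SIMILARITY_JACCARD(S.categories, ["Fruits", "Bread"]) > 0.6
> SELECT *;
> {code}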
> The index Stores.storesCatIdx can be used to accelerate both queries above. A rule can be introduced that rewrites each query into a join between the ["Fruits", "Bread"] array and the S.categories array. The rule that optimizes array index joins will then fire, using primary index validation to remove false positives.
> In the resulting plan, the multi-valued index search generates false positives: stores whose categories contain at least one of the query items but are not similar enough. These are filtered out by applying the actual Jaccard similarity function and threshold check before the results are returned to the user.
>
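The plan described in this issue can be sketched in memory as follows. This is a minimal Python illustration under stated assumptions, not AsterixDB code. STORES stands in for the primary index, and the result of build_array_index stands in for storesCatIdx. The names STORES, build_array_index, jaccard and search are all hypothetical. Jaccard similarity is computed with set semantics, which is also an assumption. The index probe returns every store that shares at least one category with the query, and each candidate is then re-checked against the original predicate.

```python
from collections import defaultdict

# Hypothetical stand-in for the primary index: store id -> categories array.
STORES = {
    1: ["Fruits", "Bread", "Meat"],
    2: ["Fruits", "Meat", "Dairy"],
    3: ["Bread", "Fruits"],
    4: ["Toys", "Games"],
}


def build_array_index(primary):
    """Mimic an array index on UNNEST categories: one entry per element."""
    index = defaultdict(set)
    for pk, categories in primary.items():
        for category in categories:
            index[category].add(pk)
    return index


def jaccard(a, b):
    """Jaccard similarity, assuming set semantics for both arrays."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 0.0  # convention chosen for this sketch only
    return len(sa & sb) / len(union)


def search(primary, index, query, threshold):
    # Step 1: join the query array with the array index. Any store sharing
    # at least one element becomes a candidate; the set drops duplicate keys.
    candidates = set()
    for element in set(query):
        candidates |= index.get(element, set())
    # Step 2: fetch each candidate from the primary index and reapply the
    # original predicate, which removes the false positives.
    matches = [pk for pk in sorted(candidates)
               if jaccard(primary[pk], query) > threshold]
    return sorted(candidates), matches


if __name__ == "__main__":
    index = build_array_index(STORES)
    candidates, matches = search(STORES, index, ["Fruits", "Bread"], 0.6)
    print("candidates from index:", candidates)
    print("after verification:", matches)
```

Running it prints the candidate keys [1, 2, 3] and the verified matches [1, 3]. Store 2 shares only "Fruits" with the query, so its similarity is 1/4 = 0.25 and it is discarded during verification. Store 4 is never fetched. Probing only on shared elements is safe here because a similarity strictly greater than 0.6 requires a non-empty intersection.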