Posted to notifications@asterixdb.apache.org by "Taewoo Kim (JIRA)" <ji...@apache.org> on 2017/02/26 01:43:44 UTC

[jira] [Created] (ASTERIXDB-1813) similarity-jaccard-prefix() issue

Taewoo Kim created ASTERIXDB-1813:
-------------------------------------

             Summary: similarity-jaccard-prefix() issue
                 Key: ASTERIXDB-1813
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1813
             Project: Apache AsterixDB
          Issue Type: Bug
            Reporter: Taewoo Kim


For the following two records, similarity-jaccard-prefix() does not produce the correct result. To see the difference, swap which of the two where lines in the query (the /* +indexnl */ one and the /* +skip-index */ one) is commented out. Reproducing it requires enabling the fuzzy join rule; it does not occur on master yet. This bug needs to be fixed before the fuzzy join rule is enabled.

{code}
drop dataverse test if exists;
create dataverse test;

use dataverse test;

create type DBLPType as open {
  id: uuid
}

create dataset AmazonReviewNoDup(DBLPType)
  primary key id;

create index AmazonReviewNoDup_summary_b_idx
on AmazonReviewNoDup(summary:string?) type btree enforced;

create index AmazonReviewNoDup_summary_kw_idx
on AmazonReviewNoDup(summary:string?) type keyword enforced;

insert into dataset AmazonReviewNoDup(
{ "id": uuid("83208a78-7007-8d77-935b-d9127e4cc9dc"), "summary": "Clear, Concise, and fun!" }
);

insert into dataset AmazonReviewNoDup(
{ "id": uuid("83208a78-7007-8d77-935b-d9127e4cc9dd"), "summary": "Clear, Concise, and Charitable" }
);

for $o in dataset
AmazonReviewNoDup
for $i in dataset
AmazonReviewNoDup
//where /* +indexnl */ similarity-jaccard(word-tokens($o.summary), word-tokens($i.summary)) >= 0.6
where /* +skip-index */ similarity-jaccard(word-tokens($o.summary), word-tokens($i.summary)) >= 0.6
and $o.id < $i.id
return {"oid":$o.id, "iid":$i.id};
{code}
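
For reference, the Jaccard value for these two summaries can be checked by hand. The sketch below is plain Python, not AsterixDB code. It assumes word-tokens() lowercases the input and splits on non-alphanumeric characters, and it treats the tokens as sets; the helper names word_tokens and jaccard are made up for illustration.

```python
import re


def word_tokens(text):
    # Assumption: tokens are lowercased runs of letters/digits,
    # approximating what word-tokens() is expected to return.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    # Set-based Jaccard similarity: |A intersect B| / |A union B|.
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


o_tokens = word_tokens("Clear, Concise, and fun!")
i_tokens = word_tokens("Clear, Concise, and Charitable")
score = jaccard(o_tokens, i_tokens)

# 3 shared tokens (clear, concise, and) out of 5 distinct tokens.
print(sorted(o_tokens & i_tokens), len(o_tokens | i_tokens), score)
```

If that tokenization assumption holds, the pair scores 3/5 = 0.6, which sits exactly on the >= 0.6 threshold, so both the skip-index and the indexnl variants of the query would be expected to return it.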




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)