Posted to notifications@asterixdb.apache.org by "Wenhai (JIRA)" <ji...@apache.org> on 2016/10/23 04:26:58 UTC

[jira] [Comment Edited] (ASTERIXDB-1704) Fuzzy-join query is slow

    [ https://issues.apache.org/jira/browse/ASTERIXDB-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15599024#comment-15599024 ] 

Wenhai edited comment on ASTERIXDB-1704 at 10/23/16 4:26 AM:
-------------------------------------------------------------

How many partitions did you configure? What are the running times for the inverted-index join and the nested-loop join? Theoretically, the nested-loop join needs 200,000 * 200,000 * 40 (the average number of tokens per record) token-level comparisons, so I think the result you got by hand is much better than this.
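
To put a number on the estimate above, here is a minimal sketch in Python that only does the arithmetic. The function names are hypothetical, and the second figure assumes the `$o.id < $i.id` predicate is evaluated before the similarity function, which may not match the actual query plan.

```python
# Back-of-the-envelope cost of the nested-loop join, using only the
# figures quoted in this thread (200,000 records, ~40 tokens per record).
NUM_RECORDS = 200_000
AVG_TOKENS_PER_RECORD = 40


def all_pairs_token_comparisons(num_records, avg_tokens):
    """Every record is compared against every record (n * n pairs)."""
    return num_records * num_records * avg_tokens


def ordered_pairs_token_comparisons(num_records, avg_tokens):
    """Only pairs with o.id < i.id are compared (n * (n - 1) / 2 pairs).

    Assumption: the id predicate filters pairs before the similarity
    function runs; otherwise the all-pairs figure applies.
    """
    return num_records * (num_records - 1) // 2 * avg_tokens


if __name__ == "__main__":
    full = all_pairs_token_comparisons(NUM_RECORDS, AVG_TOKENS_PER_RECORD)
    half = ordered_pairs_token_comparisons(NUM_RECORDS, AVG_TOKENS_PER_RECORD)
    print(f"all pairs:     {full:,} token-level comparisons")
    print(f"ordered pairs: {half:,} token-level comparisons")
```

The all-pairs figure is 1.6 * 10^12 token-level comparisons; counting each unordered pair once roughly halves it.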


was (Author: lwhay):
How many partitions did you configured? How about the running time in the inverted index join and nested loop join? Theoretically, we need 200,000 * 200, 000 * 40(the tokens number in each record on average).

> Fuzzy-join query is slow
> ------------------------
>
>                 Key: ASTERIXDB-1704
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1704
>             Project: Apache AsterixDB
>          Issue Type: Bug
>            Reporter: Taewoo Kim
>
> I have an issue regarding the prefix-based fuzzy join (non-index based fuzzy join) on a small dataset. The following query runs forever even for a dataset with 200K records on 9 nodes, so each node holds only about 22,000 records. Also, the records are not large.
> {code}
> count(
> for $o in dataset AmazonReview
> for $i in dataset AmazonReview
> where similarity-jaccard(word-tokens($o.reviewText), word-tokens($i.reviewText)) >= 0.2 and $o.id < $i.id
> return {"oid":$o.reviewerID, "iid":$i.reviewerID}
> );
> {code}
> An example record is as follows.  
> {code}
> {
>   "reviewerID": "A2SUAM1J3GNN3B",
>   "asin": "0000013714",
>   "reviewerName": "J. McDonald",
>   "helpful": [2, 3],
>   "reviewText": "I bought this for my husband who plays the piano.  He is having a wonderful time playing these old hymns.  The music  is at times hard to read because we think the book was published for singing from more than playing from.  Great purchase though!",
>   "overall": 5.0,
>   "summary": "Heavenly Highway Hymns",
>   "unixReviewTime": 1252800000,
>   "reviewTime": "09 13, 2009"
> }
> {code}
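
For readers unfamiliar with what the quoted query computes, the following Python sketch mimics it on in-memory data. It is an illustration under stated assumptions, not AsterixDB's implementation: `word_tokens` here lowercases and splits on non-alphanumeric characters, `jaccard` is set-based and returns 0.0 for two empty inputs, and `prefix_length` uses the textbook prefix-filtering bound for Jaccard. All four function names are hypothetical.

```python
import math
import re


def word_tokens(text):
    # Assumption: lowercase, split on non-alphanumerics. The built-in
    # word-tokens function in AsterixDB may tokenize differently.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def jaccard(tokens_a, tokens_b):
    # Set-based Jaccard: |A intersect B| / |A union B|.
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0  # convention chosen for this sketch
    return len(set_a & set_b) / len(set_a | set_b)


def prefix_length(set_size, threshold):
    # Textbook prefix-filtering bound: under a shared global token order,
    # two sets with Jaccard >= threshold must share a token within the
    # first (size - ceil(threshold * size) + 1) tokens of each set.
    return set_size - math.ceil(threshold * set_size) + 1


def naive_fuzzy_self_join(records, threshold):
    """Nested-loop self join over (id, text) pairs, keeping o.id < i.id."""
    tokenized = [(rid, word_tokens(text)) for rid, text in records]
    matches = []
    for oid, otokens in tokenized:
        for iid, itokens in tokenized:
            if oid < iid and jaccard(otokens, itokens) >= threshold:
                matches.append((oid, iid))
    return matches


if __name__ == "__main__":
    sample = [(1, "a b c"), (2, "b c d"), (3, "x y z")]
    print(naive_fuzzy_self_join(sample, 0.2))
    print(prefix_length(10, 0.5))
```

Under this bound, a low threshold such as 0.2 leaves a long prefix: a 40-token record keeps 40 - 8 + 1 = 33 tokens, so prefix filtering alone prunes few candidate pairs.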


