Posted to issues@drill.apache.org by "Tomer Shiran (JIRA)" <ji...@apache.org> on 2014/11/03 06:45:33 UTC

[jira] [Commented] (DRILL-1629) Cast issue during query

    [ https://issues.apache.org/jira/browse/DRILL-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194268#comment-14194268 ] 

Tomer Shiran commented on DRILL-1629:
-------------------------------------

Note that the same query does work when both datasets come from DFS:

{code}
0: jdbc:drill:zk=local> SELECT u.user_id, u.name, count(*) reviews FROM dfs.yelp.`yelp_academic_dataset_user.json` u, dfs.yelp.`yelp_academic_dataset_review.json` r WHERE u.user_id = r.user_id GROUP BY u.user_id, u.name ORDER BY reviews DESC LIMIT 10;
+------------+------------+------------+
|  user_id   |    name    |  reviews   |
+------------+------------+------------+
| kGgAARL2UmvCcTRfiscjug | J          | 1399       |
| ikm0UCahtK34LbLCEw4YTw | Rand       | 1137       |
| Iu3Jo9ROp2IWC9FwtWOaUQ | Norm       | 1046       |
| glRXVWWD6x1EZKfjJawTOg | Jade       | 1013       |
| PV5voYSD43Cn_3gHmxG7DA | David      | 895        |
| fczQCSmaWF78toLEmb0Zsw | Gabi       | 872        |
| lHHwLi_YZuDSfdlSShFkug | Nelson     | 816        |
| ia1nTRAQEaFWv0cwADeK7g | Emily      | 811        |
| 3gIfcQq5KxAegwCPXc83cQ | Jennifer   | 790        |
| DrWLhrK8WMZf7Jb-Oqc7ww | Brad       | 765        |
+------------+------------+------------+
10 rows selected (54.847 seconds)
0: jdbc:drill:zk=local> 
{code}

And here's the explain plan:

{code}
0: jdbc:drill:zk=local> EXPLAIN PLAN FOR SELECT u.user_id, u.name, count(*) reviews FROM dfs.yelp.`yelp_academic_dataset_user.json` u, dfs.yelp.`yelp_academic_dataset_review.json` r WHERE u.user_id = r.user_id GROUP BY u.user_id, u.name ORDER BY reviews DESC LIMIT 10;
+------------+------------+
|    text    |    json    |
+------------+------------+
| 00-00    Screen
00-01      Project(user_id=[$0], name=[$1], reviews=[$2])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[10])
00-04            SingleMergeExchange(sort0=[2 DESC])
01-01              SelectionVectorRemover
01-02                TopN(limit=[10])
01-03                  HashToRandomExchange(dist0=[[$2]])
02-01                    HashAgg(group=[{0, 1}], reviews=[COUNT()])
02-02                      Project(user_id=[$0], name=[$1])
02-03                        HashJoin(condition=[=($0, $2)], joinType=[inner])
02-05                          HashToRandomExchange(dist0=[[$0]])
03-01                            Scan(groupscan=[EasyGroupScan [selectionRoot=/Users/tshiran/Development/yelp/yelp_academic_dataset_user.json, numFiles=1, columns = [SchemaPath [`user_id`], SchemaPath [`name`]]]])
02-04                          Project(user_id0=[$0])
02-06                            HashToRandomExchange(dist0=[[$0]])
04-01                              Scan(groupscan=[Ea |
+------------+------------+
1 row selected (0.415 seconds)
{code}
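
One thing that may be worth trying as a workaround is casting both join keys explicitly. This is an untested sketch. It assumes the failure comes from the two sides of the join condition resolving to different types (VARCHAR and INT, as the compare_to signature in the error suggests), and the VARCHAR(255) length is an arbitrary choice:

{code}
-- Untested sketch: force both join keys to VARCHAR before comparing.
-- VARCHAR(255) is an assumed length, not taken from the data.
SELECT u.user_id, u.name, count(*) reviews
FROM mongo.yelp.users u, dfs.yelp.`yelp_academic_dataset_review.json` r
WHERE CAST(u.user_id AS VARCHAR(255)) = CAST(r.user_id AS VARCHAR(255))
GROUP BY u.user_id, u.name
ORDER BY reviews DESC
LIMIT 10;
{code}

If the casts make the query succeed, that would point at type resolution for the join key rather than at the MongoDB scan itself.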

> Cast issue during query
> -----------------------
>
>                 Key: DRILL-1629
>                 URL: https://issues.apache.org/jira/browse/DRILL-1629
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Tomer Shiran
>
> Imported Yelp user.json file into a local MongoDB instance (via mongoimport). Ran the following query to find the names of the users who had the most reviews:
> {code}
> 0: jdbc:drill:zk=local> SELECT u.user_id, u.name, count(*) reviews FROM mongo.yelp.users u, dfs.yelp.`yelp_academic_dataset_review.json` r WHERE u.user_id = r.user_id GROUP BY u.user_id, u.name ORDER BY reviews DESC LIMIT 10;
> Query failed: Failure while running fragment. Failure finding function that runtime code generation expected.  Signature: compare_to( VARCHAR:OPTIONALINT:OPTIONAL,  ) returns INT:REQUIRED [c796f1fb-b73a-49e8-8b4a-ed973d89b7d3]
> Error: exception while executing query: Failure while trying to get next result batch. (state=,code=0)
> {code}
> Here's the explain plan:
> {code}
> 0: jdbc:drill:zk=local> EXPLAIN PLAN FOR SELECT u.user_id, u.name, count(*) reviews FROM mongo.yelp.users u, dfs.yelp.`yelp_academic_dataset_review.json` r WHERE u.user_id = r.user_id GROUP BY u.user_id, u.name ORDER BY reviews DESC LIMIT 10;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      Project(user_id=[$0], name=[$1], reviews=[$2])
> 00-02        SelectionVectorRemover
> 00-03          Limit(fetch=[10])
> 00-04            SingleMergeExchange(sort0=[2 DESC])
> 01-01              SelectionVectorRemover
> 01-02                TopN(limit=[10])
> 01-03                  HashToRandomExchange(dist0=[[$2]])
> 02-01                    HashAgg(group=[{0, 1}], reviews=[COUNT()])
> 02-02                      Project(user_id=[$0], name=[$1])
> 02-03                        HashJoin(condition=[=($0, $2)], joinType=[inner])
> 02-05                          HashToRandomExchange(dist0=[[$0]])
> 03-01                            Scan(groupscan=[MongoGroupScan [MongoScanSpec=MongoScanSpec [dbName=yelp, collectionName=users, filters=null], columns=[SchemaPath [`user_id`], SchemaPath [`name`]]]])
> 02-04                          HashToRandomExchange(dist0=[[$0]])
> 04-01                            Project(T8¦¦*=[$0])
> 04-02                              Project(T8¦¦*=[$0], T8¦¦user_id=[$ |
> +------------+------------+
> 1 row selected (0.42 seconds)
> {code}


