You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@drill.apache.org by yousuf <yo...@css.org.sa> on 2016/12/11 08:10:26 UTC

nested array flatten and then group by - Apache drill

Hi,

How can I retrieve hashtags[n].text values as flatten from the following 
json document (storage mongo). My final goal is to group text with count.

Apache drill version : 1.8.0
MongoDB version: 3.2

I've tried several examples but my final result is either empty array eg 
[] or blank.

{
     "twitter_entities": {
         "hashtags": [{
             "text": "snow",
             "indices": [
                 66,
                 71
             ]
         }, {
             "text": "skiing",
             "indices": [
                 73,
                 80
             ]
         }],
         "trends": [],
         "urls": [{
             "url": "http://xxxxxxx",
             "expanded_url": "http://xxxxx",
             "display_url": xxxxx",
             "indices": [
                 71,
                 93
             ]
         }]
     }
}

Any help is appreciated
Regards
Yousuf

Re: nested array flatten and then group by - Apache drill

Posted by Kathleen Li <kl...@maprtech.com>.

Dechang provided method by using drill flatten function with mongdb plugin works fine in my test env by using the json document Yousuf provided.

Details as below:

I am using drill 1.8 and mongodb 3.2.9

> db.test.find()
{ "_id" : ObjectId("584f3bad7161b4e2b9fa0a7d"), "twitter_entities" : { "hashtags" : [ { "text" : "snow", "indices" : [ 66, 71 ] }, { "text" : "skiing", "indices" : [ 73, 80 ] } ], "trends" : [ ], "urls" : [ { "url" : "http://xxxxxxx", "expanded_url" : "http://xxxxx", "display_url" : "xxxxx", "indices" : [ 71, 93 ] } ] } }



0: jdbc:drill:zk=drill1:5181,drill2:5181,dril> use mongo.test;
+-------+-----------------------------------------+
|  ok   |                 summary                 |
+-------+-----------------------------------------+
| true  | Default schema changed to [mongo.test]  |
+-------+-----------------------------------------+
1 row selected (0.376 seconds)
0: jdbc:drill:zk=drill1:5181,drill2:5181,dril> select t1.c1.text, count(t1.c1.text) from (select
. . . . . . . . . . . . . . . . . . . . . . .> flatten(tbl.twitter_entities.hashtags) as c1 from test as tbl) t1 group
. . . . . . . . . . . . . . . . . . . . . . .> by t1.c1.text;
+---------+---------+
| EXPR$0  | EXPR$1  |
+---------+---------+
| skiing  | 1       |
| snow    | 1       |
+---------+---------+
2 rows selected (0.372 seconds)





The only thing difference is that I noticed in the original document Yousuf provided, for key/value pair "display_url": xxxxx”, “ is missing before xxxxx” and I was not able to insert/import into mongodb, after I added “ —>"display_url": “xxxxx”, everything works fine.

Regards,

Kathleen

On 12/13/16, 6:30 AM, "yousuf" <yo...@css.org.sa> wrote:

>Hi Dechang,
>
>The query didn't work with mongodb.
>
>
>use mongo.test;
>
>select t1.c1.text, count(t1.c1.text) from (select 
>flatten(tbl.twitter_entities.hashtags) as c1 from test as tbl) t1 group 
>by t1.c1.text;
>+---------+---------+
>| EXPR$0  | EXPR$1  |
>+---------+---------+
>+---------+---------+
>
>Kind Regards
>
>Yousuf
>
>
>
>
>
>
>-------- Forwarded Message --------
>Subject: 	Re: nested array flatten and then group by - Apache drill
>Date: 	Mon, 12 Dec 2016 10:32:53 -0800
>From: 	Dechang Gu <dg...@maprtech.com>
>Reply-To: 	user@drill.apache.org
>To: 	user@drill.apache.org
>
>
>
>AFAIK,  for MongoDB plugin, we have very limited tests.
>Anyway, I  tried the following query on your sample json file:
>
>0: jdbc:drill:schema=dfs.tpchPar100> select t1.c1.text, count(t1.c1.text)
>from (select flatten(tbl.twitter_entities.hashtags) as c1 from
>dfs.`/jsondata/junk.json` as tbl) t1 group by t1.c1.text;
>
>+---------+---------+
>
>| EXPR$0  | EXPR$1  |
>
>+---------+---------+
>
>| skiing  | 1       |
>
>| snow    | 1       |
>
>+---------+---------+
>
>2 rows selected (0.542 seconds)
>
>
>is this the group/count you want to retrieve?
>
>
>HTH,
>Dechang
>
>On Sun, Dec 11, 2016 at 12:10 AM, yousuf <yo...@css.org.sa> wrote:
>
>> Hi,
>>
>> How can I retrieve hashtags[n].text values as flatten from the following
>> json document (storage mongo). My final goal is to group text with count.
>>
>> Apache drill version : 1.8.0
>> MongoDB version: 3.2
>>
>> I've tried several examples but my final result is either empty array eg
>> [] or blank.
>>
>> {
>>     "twitter_entities": {
>>         "hashtags": [{
>>             "text": "snow",
>>             "indices": [
>>                 66,
>>                 71
>>             ]
>>         }, {
>>             "text": "skiing",
>>             "indices": [
>>                 73,
>>                 80
>>             ]
>>         }],
>>         "trends": [],
>>         "urls": [{
>>             "url": "http://xxxxxxx",
>>             "expanded_url": "http://xxxxx",
>>             "display_url": xxxxx",
>>             "indices": [
>>                 71,
>>                 93
>>             ]
>>         }]
>>     }
>> }
>>
>> Any help is appreciated
>> Regards
>> Yousuf
>>
>>
>>
>

Fwd: Re: nested array flatten and then group by - Apache drill

Posted by yousuf <yo...@css.org.sa>.

Hi Dechang,

The query didn't work with mongodb.


use mongo.test;

select t1.c1.text, count(t1.c1.text) from (select 
flatten(tbl.twitter_entities.hashtags) as c1 from test as tbl) t1 group 
by t1.c1.text;
+---------+---------+
| EXPR$0  | EXPR$1  |
+---------+---------+
+---------+---------+

Kind Regards

Yousuf






-------- Forwarded Message --------
Subject: 	Re: nested array flatten and then group by - Apache drill
Date: 	Mon, 12 Dec 2016 10:32:53 -0800
From: 	Dechang Gu <dg...@maprtech.com>
Reply-To: 	user@drill.apache.org
To: 	user@drill.apache.org



AFAIK,  for MongoDB plugin, we have very limited tests.
Anyway, I  tried the following query on your sample json file:

0: jdbc:drill:schema=dfs.tpchPar100> select t1.c1.text, count(t1.c1.text)
from (select flatten(tbl.twitter_entities.hashtags) as c1 from
dfs.`/jsondata/junk.json` as tbl) t1 group by t1.c1.text;

+---------+---------+

| EXPR$0  | EXPR$1  |

+---------+---------+

| skiing  | 1       |

| snow    | 1       |

+---------+---------+

2 rows selected (0.542 seconds)


is this the group/count you want to retrieve?


HTH,
Dechang

On Sun, Dec 11, 2016 at 12:10 AM, yousuf <yo...@css.org.sa> wrote:

> Hi,
>
> How can I retrieve hashtags[n].text values as flatten from the following
> json document (storage mongo). My final goal is to group text with count.
>
> Apache drill version : 1.8.0
> MongoDB version: 3.2
>
> I've tried several examples but my final result is either empty array eg
> [] or blank.
>
> {
>     "twitter_entities": {
>         "hashtags": [{
>             "text": "snow",
>             "indices": [
>                 66,
>                 71
>             ]
>         }, {
>             "text": "skiing",
>             "indices": [
>                 73,
>                 80
>             ]
>         }],
>         "trends": [],
>         "urls": [{
>             "url": "http://xxxxxxx",
>             "expanded_url": "http://xxxxx",
>             "display_url": xxxxx",
>             "indices": [
>                 71,
>                 93
>             ]
>         }]
>     }
> }
>
> Any help is appreciated
> Regards
> Yousuf
>
>
>

Re: nested array flatten and then group by - Apache drill

Posted by Dechang Gu <dg...@maprtech.com>.

AFAIK,  for MongoDB plugin, we have very limited tests.
Anyway, I  tried the following query on your sample json file:

0: jdbc:drill:schema=dfs.tpchPar100> select t1.c1.text, count(t1.c1.text)
from (select flatten(tbl.twitter_entities.hashtags) as c1 from
dfs.`/jsondata/junk.json` as tbl) t1 group by t1.c1.text;

+---------+---------+

| EXPR$0  | EXPR$1  |

+---------+---------+

| skiing  | 1       |

| snow    | 1       |

+---------+---------+

2 rows selected (0.542 seconds)


is this the group/count you want to retrieve?


HTH,
Dechang

On Sun, Dec 11, 2016 at 12:10 AM, yousuf <yo...@css.org.sa> wrote:

> Hi,
>
> How can I retrieve hashtags[n].text values as flatten from the following
> json document (storage mongo). My final goal is to group text with count.
>
> Apache drill version : 1.8.0
> MongoDB version: 3.2
>
> I've tried several examples but my final result is either empty array eg
> [] or blank.
>
> {
>     "twitter_entities": {
>         "hashtags": [{
>             "text": "snow",
>             "indices": [
>                 66,
>                 71
>             ]
>         }, {
>             "text": "skiing",
>             "indices": [
>                 73,
>                 80
>             ]
>         }],
>         "trends": [],
>         "urls": [{
>             "url": "http://xxxxxxx",
>             "expanded_url": "http://xxxxx",
>             "display_url": xxxxx",
>             "indices": [
>                 71,
>                 93
>             ]
>         }]
>     }
> }
>
> Any help is appreciated
> Regards
> Yousuf
>
>
>