You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@drill.apache.org by yousuf <yo...@css.org.sa> on 2016/12/11 08:10:26 UTC
nested array flatten and then group by - Apache drill
Hi,
How can I retrieve hashtags[n].text values as flatten from the following
json document (storage mongo). My final goal is to group text with count.
Apache drill version : 1.8.0
MongoDB version: 3.2
I've tried several examples but my final result is either empty array eg
[] or blank.
{
"twitter_entities": {
"hashtags": [{
"text": "snow",
"indices": [
66,
71
]
}, {
"text": "skiing",
"indices": [
73,
80
]
}],
"trends": [],
"urls": [{
"url": "http://xxxxxxx",
"expanded_url": "http://xxxxx",
"display_url": xxxxx",
"indices": [
71,
93
]
}]
}
}
Any help is appreciated
Regards
Yousuf
Re: nested array flatten and then group by - Apache drill
Posted by Kathleen Li <kl...@maprtech.com>.
Dechang provided method by using drill flatten function with mongdb plugin works fine in my test env by using the json document Yousuf provided.
Details as below:
I am using drill 1.8 and mongodb 3.2.9
> db.test.find()
{ "_id" : ObjectId("584f3bad7161b4e2b9fa0a7d"), "twitter_entities" : { "hashtags" : [ { "text" : "snow", "indices" : [ 66, 71 ] }, { "text" : "skiing", "indices" : [ 73, 80 ] } ], "trends" : [ ], "urls" : [ { "url" : "http://xxxxxxx", "expanded_url" : "http://xxxxx", "display_url" : "xxxxx", "indices" : [ 71, 93 ] } ] } }
0: jdbc:drill:zk=drill1:5181,drill2:5181,dril> use mongo.test;
+-------+-----------------------------------------+
| ok | summary |
+-------+-----------------------------------------+
| true | Default schema changed to [mongo.test] |
+-------+-----------------------------------------+
1 row selected (0.376 seconds)
0: jdbc:drill:zk=drill1:5181,drill2:5181,dril> select t1.c1.text, count(t1.c1.text) from (select
. . . . . . . . . . . . . . . . . . . . . . .> flatten(tbl.twitter_entities.hashtags) as c1 from test as tbl) t1 group
. . . . . . . . . . . . . . . . . . . . . . .> by t1.c1.text;
+---------+---------+
| EXPR$0 | EXPR$1 |
+---------+---------+
| skiing | 1 |
| snow | 1 |
+---------+---------+
2 rows selected (0.372 seconds)
The only thing difference is that I noticed in the original document Yousuf provided, for key/value pair "display_url": xxxxx”, “ is missing before xxxxx” and I was not able to insert/import into mongodb, after I added “ —>"display_url": “xxxxx”, everything works fine.
Regards,
Kathleen
On 12/13/16, 6:30 AM, "yousuf" <yo...@css.org.sa> wrote:
>Hi Dechang,
>
>The query didn't work with mongodb.
>
>
>use mongo.test;
>
>select t1.c1.text, count(t1.c1.text) from (select
>flatten(tbl.twitter_entities.hashtags) as c1 from test as tbl) t1 group
>by t1.c1.text;
>+---------+---------+
>| EXPR$0 | EXPR$1 |
>+---------+---------+
>+---------+---------+
>
>Kind Regards
>
>Yousuf
>
>
>
>
>
>
>-------- Forwarded Message --------
>Subject: Re: nested array flatten and then group by - Apache drill
>Date: Mon, 12 Dec 2016 10:32:53 -0800
>From: Dechang Gu <dg...@maprtech.com>
>Reply-To: user@drill.apache.org
>To: user@drill.apache.org
>
>
>
>AFAIK, for MongoDB plugin, we have very limited tests.
>Anyway, I tried the following query on your sample json file:
>
>0: jdbc:drill:schema=dfs.tpchPar100> select t1.c1.text, count(t1.c1.text)
>from (select flatten(tbl.twitter_entities.hashtags) as c1 from
>dfs.`/jsondata/junk.json` as tbl) t1 group by t1.c1.text;
>
>+---------+---------+
>
>| EXPR$0 | EXPR$1 |
>
>+---------+---------+
>
>| skiing | 1 |
>
>| snow | 1 |
>
>+---------+---------+
>
>2 rows selected (0.542 seconds)
>
>
>is this the group/count you want to retrieve?
>
>
>HTH,
>Dechang
>
>On Sun, Dec 11, 2016 at 12:10 AM, yousuf <yo...@css.org.sa> wrote:
>
>> Hi,
>>
>> How can I retrieve hashtags[n].text values as flatten from the following
>> json document (storage mongo). My final goal is to group text with count.
>>
>> Apache drill version : 1.8.0
>> MongoDB version: 3.2
>>
>> I've tried several examples but my final result is either empty array eg
>> [] or blank.
>>
>> {
>> "twitter_entities": {
>> "hashtags": [{
>> "text": "snow",
>> "indices": [
>> 66,
>> 71
>> ]
>> }, {
>> "text": "skiing",
>> "indices": [
>> 73,
>> 80
>> ]
>> }],
>> "trends": [],
>> "urls": [{
>> "url": "http://xxxxxxx",
>> "expanded_url": "http://xxxxx",
>> "display_url": xxxxx",
>> "indices": [
>> 71,
>> 93
>> ]
>> }]
>> }
>> }
>>
>> Any help is appreciated
>> Regards
>> Yousuf
>>
>>
>>
>
Fwd: Re: nested array flatten and then group by - Apache drill
Posted by yousuf <yo...@css.org.sa>.
Hi Dechang,
The query didn't work with mongodb.
use mongo.test;
select t1.c1.text, count(t1.c1.text) from (select
flatten(tbl.twitter_entities.hashtags) as c1 from test as tbl) t1 group
by t1.c1.text;
+---------+---------+
| EXPR$0 | EXPR$1 |
+---------+---------+
+---------+---------+
Kind Regards
Yousuf
-------- Forwarded Message --------
Subject: Re: nested array flatten and then group by - Apache drill
Date: Mon, 12 Dec 2016 10:32:53 -0800
From: Dechang Gu <dg...@maprtech.com>
Reply-To: user@drill.apache.org
To: user@drill.apache.org
AFAIK, for MongoDB plugin, we have very limited tests.
Anyway, I tried the following query on your sample json file:
0: jdbc:drill:schema=dfs.tpchPar100> select t1.c1.text, count(t1.c1.text)
from (select flatten(tbl.twitter_entities.hashtags) as c1 from
dfs.`/jsondata/junk.json` as tbl) t1 group by t1.c1.text;
+---------+---------+
| EXPR$0 | EXPR$1 |
+---------+---------+
| skiing | 1 |
| snow | 1 |
+---------+---------+
2 rows selected (0.542 seconds)
is this the group/count you want to retrieve?
HTH,
Dechang
On Sun, Dec 11, 2016 at 12:10 AM, yousuf <yo...@css.org.sa> wrote:
> Hi,
>
> How can I retrieve hashtags[n].text values as flatten from the following
> json document (storage mongo). My final goal is to group text with count.
>
> Apache drill version : 1.8.0
> MongoDB version: 3.2
>
> I've tried several examples but my final result is either empty array eg
> [] or blank.
>
> {
> "twitter_entities": {
> "hashtags": [{
> "text": "snow",
> "indices": [
> 66,
> 71
> ]
> }, {
> "text": "skiing",
> "indices": [
> 73,
> 80
> ]
> }],
> "trends": [],
> "urls": [{
> "url": "http://xxxxxxx",
> "expanded_url": "http://xxxxx",
> "display_url": xxxxx",
> "indices": [
> 71,
> 93
> ]
> }]
> }
> }
>
> Any help is appreciated
> Regards
> Yousuf
>
>
>
Re: nested array flatten and then group by - Apache drill
Posted by Dechang Gu <dg...@maprtech.com>.
AFAIK, for MongoDB plugin, we have very limited tests.
Anyway, I tried the following query on your sample json file:
0: jdbc:drill:schema=dfs.tpchPar100> select t1.c1.text, count(t1.c1.text)
from (select flatten(tbl.twitter_entities.hashtags) as c1 from
dfs.`/jsondata/junk.json` as tbl) t1 group by t1.c1.text;
+---------+---------+
| EXPR$0 | EXPR$1 |
+---------+---------+
| skiing | 1 |
| snow | 1 |
+---------+---------+
2 rows selected (0.542 seconds)
is this the group/count you want to retrieve?
HTH,
Dechang
On Sun, Dec 11, 2016 at 12:10 AM, yousuf <yo...@css.org.sa> wrote:
> Hi,
>
> How can I retrieve hashtags[n].text values as flatten from the following
> json document (storage mongo). My final goal is to group text with count.
>
> Apache drill version : 1.8.0
> MongoDB version: 3.2
>
> I've tried several examples but my final result is either empty array eg
> [] or blank.
>
> {
> "twitter_entities": {
> "hashtags": [{
> "text": "snow",
> "indices": [
> 66,
> 71
> ]
> }, {
> "text": "skiing",
> "indices": [
> 73,
> 80
> ]
> }],
> "trends": [],
> "urls": [{
> "url": "http://xxxxxxx",
> "expanded_url": "http://xxxxx",
> "display_url": xxxxx",
> "indices": [
> 71,
> 93
> ]
> }]
> }
> }
>
> Any help is appreciated
> Regards
> Yousuf
>
>
>