You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@drill.apache.org by mufy <mu...@gmail.com> on 2014/11/11 08:13:56 UTC

Doubt on Querying JSON

Say I have the following repeat format in my JSON,

data[1].a
data[2].a
data[3].a
data[4].a
data[5].a
data[6].a
...
...
...
data[100000].a

Is there a way to query all of them, say,

select data[*].a from `sample.json`.

May be wishful thinking :-). But felt it could be a very simple use-case.

---
Mufeed Usman
My LinkedIn <http://www.linkedin.com/pub/mufeed-usman/28/254/400> | My
Social Cause <http://www.vision2016.org.in/> | My Blogs : LiveJournal
<http://mufeed.livejournal.com>

Re: Doubt on Querying JSON

Posted by Hao Zhu <hz...@maprtech.com>.

Hi Jim,

I can successfully reproduce the issue you mentioned.

1. Put 2 json files with the same content -- 1.json and 2.json.

$ cat 1.json
> {
>         "timestamp":1415688106710,
>         "status":"OK",
>         "total":17,
>         "data":[
>                 { "volumename":"a", "actualreplication":[0,0,0],
> "InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 },
>                 { "volumename":"b", "actualreplication":[0,0,0],
> "InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 },
>                 { "volumename":"c", "actualreplication":[0,0,0],
> "InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 }
>                ]
> }

2. Then this SQL hung:

> > select `timestamp` as newtime,flatten(data) as newdata from
> dfs.tmp.`*.json`;
> +------------+------------+
> |  newtime   |  newdata   |
> +------------+------------+
> | 1415688106710 |
> {"volumename":"a","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> |
> | 1415688106710 |
> {"volumename":"b","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> |
> | 1415688106710 |
> {"volumename":"c","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> |
>
It should produce 6 rows instead.

Thanks,

Hao

On Tue, Nov 11, 2014 at 12:21 PM, Jim Bates <jb...@maprtech.com> wrote:

> I had trouble with flatten extending past one wile. If you include more
> than 1 json file in your data set it never seams to finish the command. In
> my case it returned in 20 sec with one file but when the data set had two
> files it never finiahed after 5 min so I gave up till a later release comes
> along.
> On Nov 11, 2014 11:57 AM, "Hao Zhu" <hz...@maprtech.com> wrote:
>
> > Hi Mufeed,
> >
> > I tested on latest 0.7 build and flatten can work in your case.
> >
> > with sub as
> > > (select `timestamp` as newtime,flatten(data) as newdata from
> > > dfs.tmp.`1.json`)
> > > select sub.newtime, sub.newdata.volumename,
> > >
> >
> sub.newdata.actualreplication,sub.newdata.InodesExceededAlarm,sub.newdata.ContainersNonLocalAlarm
> > >  from sub;
> > >
> > > +------------+------------+------------+------------+------------+
> > > |  newtime   |   EXPR$1   |   EXPR$2   |   EXPR$3   |   EXPR$4   |
> > > +------------+------------+------------+------------+------------+
> > > | 1415688106710 | a          | [0,0,0]    | 0          | 0          |
> > > | 1415688106710 | b          | [0,0,0]    | 0          | 0          |
> > > | 1415688106710 | c          | [0,0,0]    | 0          | 0          |
> > > +------------+------------+------------+------------+------------+
> > > 3 rows selected (0.158 seconds)
> >
> > Regarding your errors:
> >
> > 1. select timestamp from `1.json`;
> >
> > Since "timestamp" is a reserved word in Drill, so it should be changed
> to:
> >
> > select `timestamp` from `1.json`;
> >
> >
> > 2. select timestamp[0] from `1.json`;
> >
> > Since "timestamp" is not an array, so we can not use "[0]" to specify the
> > 1st element.
> >
> > 3. select status[0] from `1.json`;
> >
> > The same as 2, "status" is not an array.
> >
> > Thanks,
> >
> > Hao
> >
> > On Tue, Nov 11, 2014 at 7:22 AM, Jacques Nadeau <ja...@apache.org>
> > wrote:
> >
> > > We're indeed seeing some issues with the current master (development)
> > > branch and flatten functionality.  There a number of open bugs right
> now
> > > against it.  Can you review the open bugs and see if they describe the
> > > problem you are seeing?
> > >
> > > thanks
> > >
> > > On Tue, Nov 11, 2014 at 5:35 AM, mufy <mu...@gmail.com> wrote:
> > >
> > > > I have the following.
> > > >
> > > > {
> > > >         "timestamp":1415688106710,
> > > >         "status":"OK",
> > > >         "total":17,
> > > >         "data":[
> > > >                 { "volumename":"a", "actualreplication":[0,0,0],
> > > > "InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 },
> > > >                 { "volumename":"b", "actualreplication":[0,0,0],
> > > > "InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 },
> > > >                 { "volumename":"c", "actualreplication":[0,0,0],
> > > > "InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 }
> > > >                ]
> > > > }
> > > >
> > > > But query results are "partial".
> > > >
> > > > 0: jdbc:drill:zk=n67:5181> select data[0] from `1.json`;
> > > > +------------+
> > > > |   EXPR$0   |
> > > > +------------+
> > > > |
> > > >
> > >
> >
> {"volumename":"a","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > > > |
> > > > +------------+
> > > > 1 row selected (0.139 seconds)
> > > > 0: jdbc:drill:zk=n67:5181> select data[1] from `1.json`;
> > > > +------------+
> > > > |   EXPR$0   |
> > > > +------------+
> > > > |
> > > >
> > >
> >
> {"volumename":"b","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > > > |
> > > > +------------+
> > > > 1 row selected (0.125 seconds)
> > > >
> > > >
> > > >
> > > > 0: jdbc:drill:zk=n67:5181> select * from `1.json`;
> > > > +------------+------------+------------+------------+
> > > > | timestamp  |   status   |   total    |    data    |
> > > > +------------+------------+------------+------------+
> > > > | 1415688106710 | OK         | 17         |
> > > >
> > > >
> > >
> >
> [{"volumename":"a","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0},{"volumename":"b","actualreplication":[0,0,0]
> > > > |
> > > > +------------+------------+------------+------------+
> > > > 1 row selected (0.22 seconds)
> > > >
> > > >
> > > >
> > > > 0: jdbc:drill:zk=n67:5181> select timestamp from `1.json`;
> > > > Query failed: Failure while parsing sql.
> > > >
> > > > Error: exception while executing query: Failure while executing
> query.
> > > > (state=,code=0)
> > > > 0: jdbc:drill:zk=n67:5181> select timestamp[0] from `1.json`;
> > > > Query failed: Failure while parsing sql.
> > > >
> > > > Error: exception while executing query: Failure while executing
> query.
> > > > (state=,code=0)
> > > > 0: jdbc:drill:zk=n67:5181> select status[0] from `1.json`;
> > > > +------------+
> > > > |   EXPR$0   |
> > > > +------------+
> > > > Query failed: Failure while running fragment.
> > > >
> > > > java.lang.RuntimeException: java.sql.SQLException: Failure while
> > > > executing query.
> > > >     at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
> > > >     at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
> > > >     at sqlline.SqlLine.print(SqlLine.java:1809)
> > > >     at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
> > > >     at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
> > > >     at sqlline.SqlLine.dispatch(SqlLine.java:889)
> > > >     at sqlline.SqlLine.begin(SqlLine.java:763)
> > > >     at sqlline.SqlLine.start(SqlLine.java:498)
> > > >     at sqlline.SqlLine.main(SqlLine.java:460)
> > > >
> > >
> >
>

Re: Doubt on Querying JSON

Posted by Hao Zhu <hz...@maprtech.com>.

Correct. In this case, flatten() has to be in a subquery.

with tmp as
(select `timestamp` as newtime,flatten(data) as newdata from
dfs.root.`/Users/hzu/Documents/sharefolder/test2/*.json`)
SELECT newtime, tmp.newdata.volumename AS volumename from tmp;



On Sat, Nov 15, 2014 at 12:46 AM, Steven Phillips <sp...@maprtech.com>
wrote:

> You should try something like this:
>
> SELECT x.newtime,
>        x.newdata.volumename AS volumename
> FROM   (SELECT `timestamp`   AS newtime,
>                flatten(data) AS newdata
>         FROM   dfs.root.`/users/hzu/documents/sharefolder/test2/*.json`)x;
>
> On Fri, Nov 14, 2014 at 9:59 PM, mufy <mu...@gmail.com> wrote:
>
> > Hao,
> >
> > In the above, how should the query be modified to extract the info as
> > below?
> >
> > +------------+------------+
> > |  newtime   |  volumename   |
> > +------------+------------+
> > | 1415688106710 | a |
> > | 1415688106710 | b |
> > | 1415688106710 | c |
> > +------------+------------+
> >
> >
> >
> > ---
> > Mufeed Usman
> > My LinkedIn <http://www.linkedin.com/pub/mufeed-usman/28/254/400> | My
> > Social Cause <http://www.vision2016.org.in/> | My Blogs : LiveJournal
> > <http://mufeed.livejournal.com>
> >
> >
> >
> >
> > On Sat, Nov 15, 2014 at 2:35 AM, Hao Zhu <hz...@maprtech.com> wrote:
> >
> > > Hi Jim,
> > >
> > > I verified this issue you mentioned on latest 0.7 build and it is
> fixed.
> > >
> > > 0: jdbc:drill:zk=local> select `timestamp` as newtime,flatten(data) as
> > > > newdata from
> dfs.root.`/Users/hzu/Documents/sharefolder/test2/*.json`;
> > > > +------------+------------+
> > > > |  newtime   |  newdata   |
> > > > +------------+------------+
> > > > | 1415688106710 |
> > > >
> > >
> >
> {"volumename":"a","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > > > |
> > > > | 1415688106710 |
> > > >
> > >
> >
> {"volumename":"b","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > > > |
> > > > | 1415688106710 |
> > > >
> > >
> >
> {"volumename":"c","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > > > |
> > > > | 1415688106710 |
> > > >
> > >
> >
> {"volumename":"a","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > > > |
> > > > | 1415688106710 |
> > > >
> > >
> >
> {"volumename":"b","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > > > |
> > > > | 1415688106710 |
> > > >
> > >
> >
> {"volumename":"c","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > > > |
> > > > +------------+------------+
> > > > 6 rows selected (0.242 seconds)
> > >
> > > Thanks,
> > >
> > > Hao
> > >
> > > On Tue, Nov 11, 2014 at 3:06 PM, Jacques Nadeau <ja...@apache.org>
> > > wrote:
> > >
> > > > On Tue, Nov 11, 2014 at 12:21 PM, Jim Bates <jb...@maprtech.com>
> > wrote:
> > > >
> > > > > I had trouble with flatten extending past one wile. If you include
> > more
> > > > > than 1 json file in your data set it never seams to finish the
> > command.
> > > > In
> > > > > my case it returned in 20 sec with one file but when the data set
> had
> > > two
> > > > > files it never finiahed after 5 min so I gave up till a later
> release
> > > > comes
> > > > > along.
> > > > >
> > > >
> > > > Yeah, that'll happen with nightly builds.  Give the build a few days.
> > > > We're testing fixes now to deal with a lot of these issues.
> > > >
> > >
> >
>
>
>
> --
>  Steven Phillips
>  Software Engineer
>
>  mapr.com
>

Re: Doubt on Querying JSON

Posted by Steven Phillips <sp...@maprtech.com>.

You should try something like this:

SELECT x.newtime,
       x.newdata.volumename AS volumename
FROM   (SELECT `timestamp`   AS newtime,
               flatten(data) AS newdata
        FROM   dfs.root.`/users/hzu/documents/sharefolder/test2/*.json`)x;

On Fri, Nov 14, 2014 at 9:59 PM, mufy <mu...@gmail.com> wrote:

> Hao,
>
> In the above, how should the query be modified to extract the info as
> below?
>
> +------------+------------+
> |  newtime   |  volumename   |
> +------------+------------+
> | 1415688106710 | a |
> | 1415688106710 | b |
> | 1415688106710 | c |
> +------------+------------+
>
>
>
> ---
> Mufeed Usman
> My LinkedIn <http://www.linkedin.com/pub/mufeed-usman/28/254/400> | My
> Social Cause <http://www.vision2016.org.in/> | My Blogs : LiveJournal
> <http://mufeed.livejournal.com>
>
>
>
>
> On Sat, Nov 15, 2014 at 2:35 AM, Hao Zhu <hz...@maprtech.com> wrote:
>
> > Hi Jim,
> >
> > I verified this issue you mentioned on latest 0.7 build and it is fixed.
> >
> > 0: jdbc:drill:zk=local> select `timestamp` as newtime,flatten(data) as
> > > newdata from dfs.root.`/Users/hzu/Documents/sharefolder/test2/*.json`;
> > > +------------+------------+
> > > |  newtime   |  newdata   |
> > > +------------+------------+
> > > | 1415688106710 |
> > >
> >
> {"volumename":"a","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > > |
> > > | 1415688106710 |
> > >
> >
> {"volumename":"b","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > > |
> > > | 1415688106710 |
> > >
> >
> {"volumename":"c","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > > |
> > > | 1415688106710 |
> > >
> >
> {"volumename":"a","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > > |
> > > | 1415688106710 |
> > >
> >
> {"volumename":"b","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > > |
> > > | 1415688106710 |
> > >
> >
> {"volumename":"c","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > > |
> > > +------------+------------+
> > > 6 rows selected (0.242 seconds)
> >
> > Thanks,
> >
> > Hao
> >
> > On Tue, Nov 11, 2014 at 3:06 PM, Jacques Nadeau <ja...@apache.org>
> > wrote:
> >
> > > On Tue, Nov 11, 2014 at 12:21 PM, Jim Bates <jb...@maprtech.com>
> wrote:
> > >
> > > > I had trouble with flatten extending past one wile. If you include
> more
> > > > than 1 json file in your data set it never seams to finish the
> command.
> > > In
> > > > my case it returned in 20 sec with one file but when the data set had
> > two
> > > > files it never finiahed after 5 min so I gave up till a later release
> > > comes
> > > > along.
> > > >
> > >
> > > Yeah, that'll happen with nightly builds.  Give the build a few days.
> > > We're testing fixes now to deal with a lot of these issues.
> > >
> >
>



-- 
 Steven Phillips
 Software Engineer

 mapr.com

Re: Doubt on Querying JSON

Posted by mufy <mu...@gmail.com>.

Hao,

In the above, how should the query be modified to extract the info as below?

+------------+------------+
|  newtime   |  volumename   |
+------------+------------+
| 1415688106710 | a |
| 1415688106710 | b |
| 1415688106710 | c |
+------------+------------+



---
Mufeed Usman
My LinkedIn <http://www.linkedin.com/pub/mufeed-usman/28/254/400> | My
Social Cause <http://www.vision2016.org.in/> | My Blogs : LiveJournal
<http://mufeed.livejournal.com>




On Sat, Nov 15, 2014 at 2:35 AM, Hao Zhu <hz...@maprtech.com> wrote:

> Hi Jim,
>
> I verified this issue you mentioned on latest 0.7 build and it is fixed.
>
> 0: jdbc:drill:zk=local> select `timestamp` as newtime,flatten(data) as
> > newdata from dfs.root.`/Users/hzu/Documents/sharefolder/test2/*.json`;
> > +------------+------------+
> > |  newtime   |  newdata   |
> > +------------+------------+
> > | 1415688106710 |
> >
> {"volumename":"a","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > |
> > | 1415688106710 |
> >
> {"volumename":"b","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > |
> > | 1415688106710 |
> >
> {"volumename":"c","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > |
> > | 1415688106710 |
> >
> {"volumename":"a","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > |
> > | 1415688106710 |
> >
> {"volumename":"b","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > |
> > | 1415688106710 |
> >
> {"volumename":"c","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > |
> > +------------+------------+
> > 6 rows selected (0.242 seconds)
>
> Thanks,
>
> Hao
>
> On Tue, Nov 11, 2014 at 3:06 PM, Jacques Nadeau <ja...@apache.org>
> wrote:
>
> > On Tue, Nov 11, 2014 at 12:21 PM, Jim Bates <jb...@maprtech.com> wrote:
> >
> > > I had trouble with flatten extending past one wile. If you include more
> > > than 1 json file in your data set it never seams to finish the command.
> > In
> > > my case it returned in 20 sec with one file but when the data set had
> two
> > > files it never finiahed after 5 min so I gave up till a later release
> > comes
> > > along.
> > >
> >
> > Yeah, that'll happen with nightly builds.  Give the build a few days.
> > We're testing fixes now to deal with a lot of these issues.
> >
>

Re: Doubt on Querying JSON

Posted by Hao Zhu <hz...@maprtech.com>.

Hi Jim,

I verified this issue you mentioned on latest 0.7 build and it is fixed.

0: jdbc:drill:zk=local> select `timestamp` as newtime,flatten(data) as
> newdata from dfs.root.`/Users/hzu/Documents/sharefolder/test2/*.json`;
> +------------+------------+
> |  newtime   |  newdata   |
> +------------+------------+
> | 1415688106710 |
> {"volumename":"a","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> |
> | 1415688106710 |
> {"volumename":"b","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> |
> | 1415688106710 |
> {"volumename":"c","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> |
> | 1415688106710 |
> {"volumename":"a","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> |
> | 1415688106710 |
> {"volumename":"b","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> |
> | 1415688106710 |
> {"volumename":"c","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> |
> +------------+------------+
> 6 rows selected (0.242 seconds)

Thanks,

Hao

On Tue, Nov 11, 2014 at 3:06 PM, Jacques Nadeau <ja...@apache.org> wrote:

> On Tue, Nov 11, 2014 at 12:21 PM, Jim Bates <jb...@maprtech.com> wrote:
>
> > I had trouble with flatten extending past one wile. If you include more
> > than 1 json file in your data set it never seams to finish the command.
> In
> > my case it returned in 20 sec with one file but when the data set had two
> > files it never finiahed after 5 min so I gave up till a later release
> comes
> > along.
> >
>
> Yeah, that'll happen with nightly builds.  Give the build a few days.
> We're testing fixes now to deal with a lot of these issues.
>

Re: Doubt on Querying JSON

Posted by Jacques Nadeau <ja...@apache.org>.

On Tue, Nov 11, 2014 at 12:21 PM, Jim Bates <jb...@maprtech.com> wrote:

> I had trouble with flatten extending past one wile. If you include more
> than 1 json file in your data set it never seams to finish the command. In
> my case it returned in 20 sec with one file but when the data set had two
> files it never finiahed after 5 min so I gave up till a later release comes
> along.
>

Yeah, that'll happen with nightly builds.  Give the build a few days.
We're testing fixes now to deal with a lot of these issues.

Re: Doubt on Querying JSON

Posted by Jim Bates <jb...@maprtech.com>.

I had trouble with flatten extending past one wile. If you include more
than 1 json file in your data set it never seams to finish the command. In
my case it returned in 20 sec with one file but when the data set had two
files it never finiahed after 5 min so I gave up till a later release comes
along.
On Nov 11, 2014 11:57 AM, "Hao Zhu" <hz...@maprtech.com> wrote:

> Hi Mufeed,
>
> I tested on latest 0.7 build and flatten can work in your case.
>
> with sub as
> > (select `timestamp` as newtime,flatten(data) as newdata from
> > dfs.tmp.`1.json`)
> > select sub.newtime, sub.newdata.volumename,
> >
> sub.newdata.actualreplication,sub.newdata.InodesExceededAlarm,sub.newdata.ContainersNonLocalAlarm
> >  from sub;
> >
> > +------------+------------+------------+------------+------------+
> > |  newtime   |   EXPR$1   |   EXPR$2   |   EXPR$3   |   EXPR$4   |
> > +------------+------------+------------+------------+------------+
> > | 1415688106710 | a          | [0,0,0]    | 0          | 0          |
> > | 1415688106710 | b          | [0,0,0]    | 0          | 0          |
> > | 1415688106710 | c          | [0,0,0]    | 0          | 0          |
> > +------------+------------+------------+------------+------------+
> > 3 rows selected (0.158 seconds)
>
> Regarding your errors:
>
> 1. select timestamp from `1.json`;
>
> Since "timestamp" is a reserved word in Drill, so it should be changed to:
>
> select `timestamp` from `1.json`;
>
>
> 2. select timestamp[0] from `1.json`;
>
> Since "timestamp" is not an array, so we can not use "[0]" to specify the
> 1st element.
>
> 3. select status[0] from `1.json`;
>
> The same as 2, "status" is not an array.
>
> Thanks,
>
> Hao
>
> On Tue, Nov 11, 2014 at 7:22 AM, Jacques Nadeau <ja...@apache.org>
> wrote:
>
> > We're indeed seeing some issues with the current master (development)
> > branch and flatten functionality.  There a number of open bugs right now
> > against it.  Can you review the open bugs and see if they describe the
> > problem you are seeing?
> >
> > thanks
> >
> > On Tue, Nov 11, 2014 at 5:35 AM, mufy <mu...@gmail.com> wrote:
> >
> > > I have the following.
> > >
> > > {
> > >         "timestamp":1415688106710,
> > >         "status":"OK",
> > >         "total":17,
> > >         "data":[
> > >                 { "volumename":"a", "actualreplication":[0,0,0],
> > > "InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 },
> > >                 { "volumename":"b", "actualreplication":[0,0,0],
> > > "InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 },
> > >                 { "volumename":"c", "actualreplication":[0,0,0],
> > > "InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 }
> > >                ]
> > > }
> > >
> > > But query results are "partial".
> > >
> > > 0: jdbc:drill:zk=n67:5181> select data[0] from `1.json`;
> > > +------------+
> > > |   EXPR$0   |
> > > +------------+
> > > |
> > >
> >
> {"volumename":"a","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > > |
> > > +------------+
> > > 1 row selected (0.139 seconds)
> > > 0: jdbc:drill:zk=n67:5181> select data[1] from `1.json`;
> > > +------------+
> > > |   EXPR$0   |
> > > +------------+
> > > |
> > >
> >
> {"volumename":"b","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > > |
> > > +------------+
> > > 1 row selected (0.125 seconds)
> > >
> > >
> > >
> > > 0: jdbc:drill:zk=n67:5181> select * from `1.json`;
> > > +------------+------------+------------+------------+
> > > | timestamp  |   status   |   total    |    data    |
> > > +------------+------------+------------+------------+
> > > | 1415688106710 | OK         | 17         |
> > >
> > >
> >
> [{"volumename":"a","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0},{"volumename":"b","actualreplication":[0,0,0]
> > > |
> > > +------------+------------+------------+------------+
> > > 1 row selected (0.22 seconds)
> > >
> > >
> > >
> > > 0: jdbc:drill:zk=n67:5181> select timestamp from `1.json`;
> > > Query failed: Failure while parsing sql.
> > >
> > > Error: exception while executing query: Failure while executing query.
> > > (state=,code=0)
> > > 0: jdbc:drill:zk=n67:5181> select timestamp[0] from `1.json`;
> > > Query failed: Failure while parsing sql.
> > >
> > > Error: exception while executing query: Failure while executing query.
> > > (state=,code=0)
> > > 0: jdbc:drill:zk=n67:5181> select status[0] from `1.json`;
> > > +------------+
> > > |   EXPR$0   |
> > > +------------+
> > > Query failed: Failure while running fragment.
> > >
> > > java.lang.RuntimeException: java.sql.SQLException: Failure while
> > > executing query.
> > >     at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
> > >     at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
> > >     at sqlline.SqlLine.print(SqlLine.java:1809)
> > >     at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
> > >     at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
> > >     at sqlline.SqlLine.dispatch(SqlLine.java:889)
> > >     at sqlline.SqlLine.begin(SqlLine.java:763)
> > >     at sqlline.SqlLine.start(SqlLine.java:498)
> > >     at sqlline.SqlLine.main(SqlLine.java:460)
> > >
> >
>

Re: Doubt on Querying JSON

Posted by Hao Zhu <hz...@maprtech.com>.

Hi Mufeed,

I tested on latest 0.7 build and flatten can work in your case.

with sub as
> (select `timestamp` as newtime,flatten(data) as newdata from
> dfs.tmp.`1.json`)
> select sub.newtime, sub.newdata.volumename,
> sub.newdata.actualreplication,sub.newdata.InodesExceededAlarm,sub.newdata.ContainersNonLocalAlarm
>  from sub;
>
> +------------+------------+------------+------------+------------+
> |  newtime   |   EXPR$1   |   EXPR$2   |   EXPR$3   |   EXPR$4   |
> +------------+------------+------------+------------+------------+
> | 1415688106710 | a          | [0,0,0]    | 0          | 0          |
> | 1415688106710 | b          | [0,0,0]    | 0          | 0          |
> | 1415688106710 | c          | [0,0,0]    | 0          | 0          |
> +------------+------------+------------+------------+------------+
> 3 rows selected (0.158 seconds)

Regarding your errors:

1. select timestamp from `1.json`;

Since "timestamp" is a reserved word in Drill, so it should be changed to:

select `timestamp` from `1.json`;


2. select timestamp[0] from `1.json`;

Since "timestamp" is not an array, so we can not use "[0]" to specify the
1st element.

3. select status[0] from `1.json`;

The same as 2, "status" is not an array.

Thanks,

Hao

On Tue, Nov 11, 2014 at 7:22 AM, Jacques Nadeau <ja...@apache.org> wrote:

> We're indeed seeing some issues with the current master (development)
> branch and flatten functionality.  There a number of open bugs right now
> against it.  Can you review the open bugs and see if they describe the
> problem you are seeing?
>
> thanks
>
> On Tue, Nov 11, 2014 at 5:35 AM, mufy <mu...@gmail.com> wrote:
>
> > I have the following.
> >
> > {
> >         "timestamp":1415688106710,
> >         "status":"OK",
> >         "total":17,
> >         "data":[
> >                 { "volumename":"a", "actualreplication":[0,0,0],
> > "InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 },
> >                 { "volumename":"b", "actualreplication":[0,0,0],
> > "InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 },
> >                 { "volumename":"c", "actualreplication":[0,0,0],
> > "InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 }
> >                ]
> > }
> >
> > But query results are "partial".
> >
> > 0: jdbc:drill:zk=n67:5181> select data[0] from `1.json`;
> > +------------+
> > |   EXPR$0   |
> > +------------+
> > |
> >
> {"volumename":"a","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > |
> > +------------+
> > 1 row selected (0.139 seconds)
> > 0: jdbc:drill:zk=n67:5181> select data[1] from `1.json`;
> > +------------+
> > |   EXPR$0   |
> > +------------+
> > |
> >
> {"volumename":"b","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> > |
> > +------------+
> > 1 row selected (0.125 seconds)
> >
> >
> >
> > 0: jdbc:drill:zk=n67:5181> select * from `1.json`;
> > +------------+------------+------------+------------+
> > | timestamp  |   status   |   total    |    data    |
> > +------------+------------+------------+------------+
> > | 1415688106710 | OK         | 17         |
> >
> >
> [{"volumename":"a","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0},{"volumename":"b","actualreplication":[0,0,0]
> > |
> > +------------+------------+------------+------------+
> > 1 row selected (0.22 seconds)
> >
> >
> >
> > 0: jdbc:drill:zk=n67:5181> select timestamp from `1.json`;
> > Query failed: Failure while parsing sql.
> >
> > Error: exception while executing query: Failure while executing query.
> > (state=,code=0)
> > 0: jdbc:drill:zk=n67:5181> select timestamp[0] from `1.json`;
> > Query failed: Failure while parsing sql.
> >
> > Error: exception while executing query: Failure while executing query.
> > (state=,code=0)
> > 0: jdbc:drill:zk=n67:5181> select status[0] from `1.json`;
> > +------------+
> > |   EXPR$0   |
> > +------------+
> > Query failed: Failure while running fragment.
> >
> > java.lang.RuntimeException: java.sql.SQLException: Failure while
> > executing query.
> >     at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
> >     at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
> >     at sqlline.SqlLine.print(SqlLine.java:1809)
> >     at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
> >     at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
> >     at sqlline.SqlLine.dispatch(SqlLine.java:889)
> >     at sqlline.SqlLine.begin(SqlLine.java:763)
> >     at sqlline.SqlLine.start(SqlLine.java:498)
> >     at sqlline.SqlLine.main(SqlLine.java:460)
> >
>

Re: Doubt on Querying JSON

Posted by Jacques Nadeau <ja...@apache.org>.

We're indeed seeing some issues with the current master (development)
branch and flatten functionality.  There a number of open bugs right now
against it.  Can you review the open bugs and see if they describe the
problem you are seeing?

thanks

On Tue, Nov 11, 2014 at 5:35 AM, mufy <mu...@gmail.com> wrote:

> I have the following.
>
> {
>         "timestamp":1415688106710,
>         "status":"OK",
>         "total":17,
>         "data":[
>                 { "volumename":"a", "actualreplication":[0,0,0],
> "InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 },
>                 { "volumename":"b", "actualreplication":[0,0,0],
> "InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 },
>                 { "volumename":"c", "actualreplication":[0,0,0],
> "InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 }
>                ]
> }
>
> But query results are "partial".
>
> 0: jdbc:drill:zk=n67:5181> select data[0] from `1.json`;
> +------------+
> |   EXPR$0   |
> +------------+
> |
> {"volumename":"a","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> |
> +------------+
> 1 row selected (0.139 seconds)
> 0: jdbc:drill:zk=n67:5181> select data[1] from `1.json`;
> +------------+
> |   EXPR$0   |
> +------------+
> |
> {"volumename":"b","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
> |
> +------------+
> 1 row selected (0.125 seconds)
>
>
>
> 0: jdbc:drill:zk=n67:5181> select * from `1.json`;
> +------------+------------+------------+------------+
> | timestamp  |   status   |   total    |    data    |
> +------------+------------+------------+------------+
> | 1415688106710 | OK         | 17         |
>
> [{"volumename":"a","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0},{"volumename":"b","actualreplication":[0,0,0]
> |
> +------------+------------+------------+------------+
> 1 row selected (0.22 seconds)
>
>
>
> 0: jdbc:drill:zk=n67:5181> select timestamp from `1.json`;
> Query failed: Failure while parsing sql.
>
> Error: exception while executing query: Failure while executing query.
> (state=,code=0)
> 0: jdbc:drill:zk=n67:5181> select timestamp[0] from `1.json`;
> Query failed: Failure while parsing sql.
>
> Error: exception while executing query: Failure while executing query.
> (state=,code=0)
> 0: jdbc:drill:zk=n67:5181> select status[0] from `1.json`;
> +------------+
> |   EXPR$0   |
> +------------+
> Query failed: Failure while running fragment.
>
> java.lang.RuntimeException: java.sql.SQLException: Failure while
> executing query.
>     at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
>     at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
>     at sqlline.SqlLine.print(SqlLine.java:1809)
>     at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
>     at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
>     at sqlline.SqlLine.dispatch(SqlLine.java:889)
>     at sqlline.SqlLine.begin(SqlLine.java:763)
>     at sqlline.SqlLine.start(SqlLine.java:498)
>     at sqlline.SqlLine.main(SqlLine.java:460)
>

Re: Doubt on Querying JSON

Posted by mufy <mu...@gmail.com>.

I have the following.

{
        "timestamp":1415688106710,
        "status":"OK",
        "total":17,
        "data":[
                { "volumename":"a", "actualreplication":[0,0,0],
"InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 },
                { "volumename":"b", "actualreplication":[0,0,0],
"InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 },
                { "volumename":"c", "actualreplication":[0,0,0],
"InodesExceededAlarm":0, "ContainersNonLocalAlarm":0 }
               ]
}

But query results are "partial".

0: jdbc:drill:zk=n67:5181> select data[0] from `1.json`;
+------------+
|   EXPR$0   |
+------------+
| {"volumename":"a","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
|
+------------+
1 row selected (0.139 seconds)
0: jdbc:drill:zk=n67:5181> select data[1] from `1.json`;
+------------+
|   EXPR$0   |
+------------+
| {"volumename":"b","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0}
|
+------------+
1 row selected (0.125 seconds)



0: jdbc:drill:zk=n67:5181> select * from `1.json`;
+------------+------------+------------+------------+
| timestamp  |   status   |   total    |    data    |
+------------+------------+------------+------------+
| 1415688106710 | OK         | 17         |
[{"volumename":"a","actualreplication":[0,0,0],"InodesExceededAlarm":0,"ContainersNonLocalAlarm":0},{"volumename":"b","actualreplication":[0,0,0]
|
+------------+------------+------------+------------+
1 row selected (0.22 seconds)



0: jdbc:drill:zk=n67:5181> select timestamp from `1.json`;
Query failed: Failure while parsing sql.

Error: exception while executing query: Failure while executing query.
(state=,code=0)
0: jdbc:drill:zk=n67:5181> select timestamp[0] from `1.json`;
Query failed: Failure while parsing sql.

Error: exception while executing query: Failure while executing query.
(state=,code=0)
0: jdbc:drill:zk=n67:5181> select status[0] from `1.json`;
+------------+
|   EXPR$0   |
+------------+
Query failed: Failure while running fragment.

java.lang.RuntimeException: java.sql.SQLException: Failure while
executing query.
    at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
    at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
    at sqlline.SqlLine.print(SqlLine.java:1809)
    at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
    at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
    at sqlline.SqlLine.dispatch(SqlLine.java:889)
    at sqlline.SqlLine.begin(SqlLine.java:763)
    at sqlline.SqlLine.start(SqlLine.java:498)
    at sqlline.SqlLine.main(SqlLine.java:460)

Re: Doubt on Querying JSON

Posted by Christopher Matta <cm...@mapr.com>.

Mufy,

If I read your question right, you’d like a row for each item in the array,
that is, for it to be flattened?

Similar to this data:

{
    "id": 1
    "data": [
        {"object": 1},
        {"object": 2},
        {"object": 3},
        {"object": 4},
    ]
}

You could select it like this in an upcoming build:

0: jdbc:drill:zk=sen11:5181,sen12:5181> select `id`, flatten(`data`)
from `test_flatten.json`;
+------------+------------+
|     id     |   EXPR$1   |
+------------+------------+
| 1          | {"object":1} |
| 1          | {"object":2} |
| 1          | {"object":3} |
| 1          | {"object":4} |
+------------+------------+
4 rows selected (0.124 seconds)
0: jdbc:drill:zk=sen11:5181,sen12:5181> select t.`id`,
t.`data`.`object` from (select `id`, flatten(`data`) as `data` from
`test_flatten.json`) t;
+------------+------------+
|     id     |   EXPR$1   |
+------------+------------+
| 1          | 1          |
| 1          | 2          |
| 1          | 3          |
| 1          | 4          |
+------------+------------+



Chris Matta
cmatta@mapr.com
215-701-3146

On Tue, Nov 11, 2014 at 2:13 AM, mufy <mu...@gmail.com> wrote:

> Say I have the following repeat format in my JSON,
>
> data[1].a
> data[2].a
> data[3].a
> data[4].a
> data[5].a
> data[6].a
> ...
> ...
> ...
> data[100000].a
>
> Is there a way to query all of them, say,
>
> select data[*].a from `sample.json`.
>
> May be wishful thinking :-). But felt it could be a very simple use-case.
>
> ---
> Mufeed Usman
> My LinkedIn <http://www.linkedin.com/pub/mufeed-usman/28/254/400> | My
> Social Cause <http://www.vision2016.org.in/> | My Blogs : LiveJournal
> <http://mufeed.livejournal.com>
>