You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@drill.apache.org by Ashwin Jayaprakash <as...@gmail.com> on 2015/01/15 17:27:09 UTC
Some issues with nested (JSON) queries
Hello, I was trying to run some queries on a JSON document. I think I may
have discovered some bugs. I was using Drill 0.7.0.
This is the JSON document (test1.json):
{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55,
"batters":
{
"batter":
[
{ "id": "1001", "type": "Regular" },
{ "id": "1002", "type": "Chocolate" },
{ "id": "1003", "type": "Blueberry" },
{ "id": "1004", "type": "Devil's Food" }
]
},
"topping":
[
{ "id": "5001", "type": "None" },
{ "id": "5002", "type": "Glazed" },
{ "id": "5005", "type": "Sugar" },
{ "id": "5007", "type": "Powdered Sugar" },
{ "id": "5006", "type": "Chocolate with Sprinkles" },
{ "id": "5003", "type": "Chocolate" },
{ "id": "5004", "type": "Maple" }
]
}
1) I think the parser got confused with the various "type" fields. I think
this query is valid as "j.type" is "donut" for the one and only row.
Although there are other "type" fields, I believe my query should have
worked.
select j.id id, j.name name, flatten(j.topping) tt,
flatten(j.batters.batter) bb from
dfs.root.`/Users/ashwin.jayaprakash/Downloads/apache-drill-0.7.0/sample/test1.json`
j where j.type = 'donut';
Query failed: Query failed: Failure while running fragment., Trying to
flatten a non-repeated filed.
2) The parser appears to be automatically converting "id" to a tinyint. I
suppose this is correct, but wanted your opinion on this.
select j.id id, j.name name, flatten(j.topping) tt,
flatten(j.batters.batter) bb from
dfs.root.`/Users/ashwin.jayaprakash/Downloads/apache-drill-0.7.0/sample/test1.json`
j where id = 'donut';
Query failed: Query failed: Failure while running fragment., index: -4,
length: 4 (expected: range(0, 16384))
3) Isn't there a way to filter the records before the flattening happens by
specifying that the path "j.topping.type" should only be "Sugar".
select j.id id, j.name name, flatten(j.topping) tt,
flatten(j.batters.batter) bb from
dfs.root.`/Users/ashwin.jayaprakash/Downloads/apache-drill-0.7.0/sample/test1.json`
j where j.topping.type = 'Sugar';
Query failed: Query failed: Failure while running fragment.,
org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be cast to
org.apache.drill.exec.vector.complex.MapVector
4) Is there a length function supported on the nested arrays?
5) There is a spelling mistake in the error message :) "Trying to flatten a
non-repeated filed." - "filed"
Thanks.
Re: Some issues with nested (JSON) queries
Posted by Ashwin Jayaprakash <as...@gmail.com>.
Rahul, thanks for your response. (I couldn't figure out a way to reply to
your actual email as I have "digest subscription").
I think #4 length function on arrays will be a very useful feature (does
anyone remember XPath?)
Same with #3 - predicate pushdown into a nested field. Otherwise the only
way to filter would be to flatten/unnest and then filter it.
On Thu, Jan 15, 2015 at 8:27 AM, Ashwin Jayaprakash <
ashwin.jayaprakash@gmail.com> wrote:
> Hello, I was trying to run some queries on a JSON document. I think I may
> have discovered some bugs. I was using Drill 0.7.0.
>
> This is the JSON document (test1.json):
>
> {
> "id": "0001",
> "type": "donut",
> "name": "Cake",
> "ppu": 0.55,
> "batters":
> {
> "batter":
> [
> { "id": "1001", "type": "Regular" },
> { "id": "1002", "type": "Chocolate" },
> { "id": "1003", "type": "Blueberry" },
> { "id": "1004", "type": "Devil's Food" }
> ]
> },
> "topping":
> [
> { "id": "5001", "type": "None" },
> { "id": "5002", "type": "Glazed" },
> { "id": "5005", "type": "Sugar" },
> { "id": "5007", "type": "Powdered Sugar" },
> { "id": "5006", "type": "Chocolate with Sprinkles" },
> { "id": "5003", "type": "Chocolate" },
> { "id": "5004", "type": "Maple" }
> ]
> }
>
>
> 1) I think the parser got confused with the various "type" fields. I think
> this query is valid as "j.type" is "donut" for the one and only row.
> Although there are other "type" fields, I believe my query should have
> worked.
>
> select j.id id, j.name name, flatten(j.topping) tt,
> flatten(j.batters.batter) bb from
> dfs.root.`/Users/ashwin.jayaprakash/Downloads/apache-drill-0.7.0/sample/test1.json`
> j where j.type = 'donut';
> Query failed: Query failed: Failure while running fragment., Trying to
> flatten a non-repeated filed.
>
>
> 2) The parser appears to be automatically converting "id" to a tinyint. I
> suppose this is correct, but wanted your opinion on this.
>
> select j.id id, j.name name, flatten(j.topping) tt,
> flatten(j.batters.batter) bb from
> dfs.root.`/Users/ashwin.jayaprakash/Downloads/apache-drill-0.7.0/sample/test1.json`
> j where id = 'donut';
> Query failed: Query failed: Failure while running fragment., index: -4,
> length: 4 (expected: range(0, 16384))
>
>
> 3) Isn't there a way to filter the records before the flattening happens
> by specifying that the path "j.topping.type" should only be "Sugar".
>
> select j.id id, j.name name, flatten(j.topping) tt,
> flatten(j.batters.batter) bb from
> dfs.root.`/Users/ashwin.jayaprakash/Downloads/apache-drill-0.7.0/sample/test1.json`
> j where j.topping.type = 'Sugar';
> Query failed: Query failed: Failure while running fragment.,
> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be cast to
> org.apache.drill.exec.vector.complex.MapVector
>
> 4) Is there a length function supported on the nested arrays?
>
> 5) There is a spelling mistake in the error message :) "Trying to flatten
> a non-repeated filed." - "filed"
>
> Thanks.
>
Re: Some issues with nested (JSON) queries
Posted by rahul challapalli <ch...@gmail.com>.
https://issues.apache.org/jira/browse/DRILL-2012
https://issues.apache.org/jira/browse/DRILL-2013
On Thu, Jan 15, 2015 at 9:51 AM, rahul challapalli <
challapallirahul@gmail.com> wrote:
> Hi Ashwin,
>
> Thanks for trying out drill. Below are a few of my comments in blue. For
> the issues found I will raise relevant jira's. Let us know if you have
> further questions.
>
>
> On Thu, Jan 15, 2015 at 8:27 AM, Ashwin Jayaprakash <
> ashwin.jayaprakash@gmail.com> wrote:
>
>> Hello, I was trying to run some queries on a JSON document. I think I may
>> have discovered some bugs. I was using Drill 0.7.0.
>>
>> This is the JSON document (test1.json):
>>
>> {
>> "id": "0001",
>> "type": "donut",
>> "name": "Cake",
>> "ppu": 0.55,
>> "batters":
>> {
>> "batter":
>> [
>> { "id": "1001", "type": "Regular" },
>> { "id": "1002", "type": "Chocolate" },
>> { "id": "1003", "type": "Blueberry" },
>> { "id": "1004", "type": "Devil's Food" }
>> ]
>> },
>> "topping":
>> [
>> { "id": "5001", "type": "None" },
>> { "id": "5002", "type": "Glazed" },
>> { "id": "5005", "type": "Sugar" },
>> { "id": "5007", "type": "Powdered Sugar" },
>> { "id": "5006", "type": "Chocolate with Sprinkles" },
>> { "id": "5003", "type": "Chocolate" },
>> { "id": "5004", "type": "Maple" }
>> ]
>> }
>>
>>
>> 1) I think the parser got confused with the various "type" fields. I think
>> this query is valid as "j.type" is "donut" for the one and only row.
>> Although there are other "type" fields, I believe my query should have
>> worked.
>>
>> select j.id id, j.name name, flatten(j.topping) tt,
>> flatten(j.batters.batter) bb from
>>
>> dfs.root.`/Users/ashwin.jayaprakash/Downloads/apache-drill-0.7.0/sample/test1.json`
>> j where j.type = 'donut';
>> Query failed: Query failed: Failure while running fragment., Trying to
>> flatten a non-repeated filed.
>>
>
> This looks like a bug. You can overcome this by adding the filter column
> to the list of columns in the 'select' part
> *select j.type, j.id <http://j.id> id, j.name <http://j.name> name,
> flatten(j.topping) tt, flatten(j.batters.batter) bb from `sample.json` j
> where j.type = 'donut'; *
>
>>
>>
>> 2) The parser appears to be automatically converting "id" to a tinyint. I
>> suppose this is correct, but wanted your opinion on this.
>>
>> select j.id id, j.name name, flatten(j.topping) tt,
>> flatten(j.batters.batter) bb from
>>
>> dfs.root.`/Users/ashwin.jayaprakash/Downloads/apache-drill-0.7.0/sample/test1.json`
>> j where id = 'donut';
>> Query failed: Query failed: Failure while running fragment., index: -4,
>> length: 4 (expected: range(0, 16384))
>>
>> To my knowledge drill only tries to do an implicit cast based on your
> filter(or join) condition. If you compare a column with a string literal,
> drill tries to cast the column as a string
> The same query works without flatten or by changing your filter condition
> so that the filter condition succeeds. So its most likely a flatten related
> issue
> *select j.id <http://j.id> id, j.name <http://j.name> name from
> `sample.json` j where j.id <http://j.id> = 'donut';*
>
> *select j.id <http://j.id> id, j.name <http://j.name> name,
> flatten(j.topping) tt, flatten(j.batters.batter) bb from `user.json` j
> where j.id <http://j.id> = '0001';*
>
>
>>
>> 3) Isn't there a way to filter the records before the flattening happens
>> by
>> specifying that the path "j.topping.type" should only be "Sugar".
>>
>> select j.id id, j.name name, flatten(j.topping) tt,
>> flatten(j.batters.batter) bb from
>>
>> dfs.root.`/Users/ashwin.jayaprakash/Downloads/apache-drill-0.7.0/sample/test1.json`
>> j where j.topping.type = 'Sugar';
>> Query failed: Query failed: Failure while running fragment.,
>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be cast to
>> org.apache.drill.exec.vector.complex.MapVector
>>
>
> You cannot do something like this as of today. However you can apply a
> filter on a particular element in the array.
> *select j.id <http://j.id> id, j.name <http://j.name> name,
> flatten(j.topping) tt, flatten(j.batters.batter) bb from `user.json` j
> where j.topping[2].type = 'Sugar'; *
>
>>
>> 4) Is there a length function supported on the nested arrays?
>>
> I believe not. Some can correct me if I am wrong.
>
>>
>> 5) There is a spelling mistake in the error message :) "Trying to flatten
>> a
>> non-repeated filed." - "filed"
>>
>
> I will raise a JIRA to get this fixed.
>
>>
>> Thanks.
>>
>
>
Re: Some issues with nested (JSON) queries
Posted by rahul challapalli <ch...@gmail.com>.
Hi Ashwin,
Thanks for trying out drill. Below are a few of my comments in blue. For
the issues found I will raise relevant jira's. Let us know if you have
further questions.
On Thu, Jan 15, 2015 at 8:27 AM, Ashwin Jayaprakash <
ashwin.jayaprakash@gmail.com> wrote:
> Hello, I was trying to run some queries on a JSON document. I think I may
> have discovered some bugs. I was using Drill 0.7.0.
>
> This is the JSON document (test1.json):
>
> {
> "id": "0001",
> "type": "donut",
> "name": "Cake",
> "ppu": 0.55,
> "batters":
> {
> "batter":
> [
> { "id": "1001", "type": "Regular" },
> { "id": "1002", "type": "Chocolate" },
> { "id": "1003", "type": "Blueberry" },
> { "id": "1004", "type": "Devil's Food" }
> ]
> },
> "topping":
> [
> { "id": "5001", "type": "None" },
> { "id": "5002", "type": "Glazed" },
> { "id": "5005", "type": "Sugar" },
> { "id": "5007", "type": "Powdered Sugar" },
> { "id": "5006", "type": "Chocolate with Sprinkles" },
> { "id": "5003", "type": "Chocolate" },
> { "id": "5004", "type": "Maple" }
> ]
> }
>
>
> 1) I think the parser got confused with the various "type" fields. I think
> this query is valid as "j.type" is "donut" for the one and only row.
> Although there are other "type" fields, I believe my query should have
> worked.
>
> select j.id id, j.name name, flatten(j.topping) tt,
> flatten(j.batters.batter) bb from
>
> dfs.root.`/Users/ashwin.jayaprakash/Downloads/apache-drill-0.7.0/sample/test1.json`
> j where j.type = 'donut';
> Query failed: Query failed: Failure while running fragment., Trying to
> flatten a non-repeated filed.
>
This looks like a bug. You can overcome this by adding the filter column to
the list of columns in the 'select' part
*select j.type, j.id <http://j.id> id, j.name <http://j.name> name,
flatten(j.topping) tt, flatten(j.batters.batter) bb from `sample.json` j
where j.type = 'donut'; *
>
>
> 2) The parser appears to be automatically converting "id" to a tinyint. I
> suppose this is correct, but wanted your opinion on this.
>
> select j.id id, j.name name, flatten(j.topping) tt,
> flatten(j.batters.batter) bb from
>
> dfs.root.`/Users/ashwin.jayaprakash/Downloads/apache-drill-0.7.0/sample/test1.json`
> j where id = 'donut';
> Query failed: Query failed: Failure while running fragment., index: -4,
> length: 4 (expected: range(0, 16384))
>
> To my knowledge drill only tries to do an implicit cast based on your
filter(or join) condition. If you compare a column with a string literal,
drill tries to cast the column as a string
The same query works without flatten or by changing your filter condition
so that the filter condition succeeds. So its most likely a flatten related
issue
*select j.id <http://j.id> id, j.name <http://j.name> name from
`sample.json` j where j.id <http://j.id> = 'donut';*
*select j.id <http://j.id> id, j.name <http://j.name> name,
flatten(j.topping) tt, flatten(j.batters.batter) bb from `user.json` j
where j.id <http://j.id> = '0001';*
>
> 3) Isn't there a way to filter the records before the flattening happens by
> specifying that the path "j.topping.type" should only be "Sugar".
>
> select j.id id, j.name name, flatten(j.topping) tt,
> flatten(j.batters.batter) bb from
>
> dfs.root.`/Users/ashwin.jayaprakash/Downloads/apache-drill-0.7.0/sample/test1.json`
> j where j.topping.type = 'Sugar';
> Query failed: Query failed: Failure while running fragment.,
> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be cast to
> org.apache.drill.exec.vector.complex.MapVector
>
You cannot do something like this as of today. However you can apply a
filter on a particular element in the array.
*select j.id <http://j.id> id, j.name <http://j.name> name,
flatten(j.topping) tt, flatten(j.batters.batter) bb from `user.json` j
where j.topping[2].type = 'Sugar'; *
>
> 4) Is there a length function supported on the nested arrays?
>
I believe not. Some can correct me if I am wrong.
>
> 5) There is a spelling mistake in the error message :) "Trying to flatten a
> non-repeated filed." - "filed"
>
I will raise a JIRA to get this fixed.
>
> Thanks.
>