Posted to user@drill.apache.org by Vinupriyaa Muthusamypillai Ananthakrishna <va...@sapient.com> on 2015/04/27 13:28:16 UTC

Drill Flatten function unable to read Nested Parquet generated by spark

The JSON structure was:
{"site": [{"siteid":"site1","sitename":"sitename1"},{"siteid":"site2","sitename":"sitename2"}]}


Parquet data when I query it through Drill Explorer
(the column name is site; its value is below):
{
  "array" : [ {
    "siteid" : "site1",
    "sitename" : "sitename1"
  }, {
    "siteid" : "site2",
    "sitename" : "sitename2"
  } ]
}

The word "array" is shown with the data, and because of it Drill is not able to read the data set using the FLATTEN function.
Is there any way to read the data using the FLATTEN function?


Thanks
Vinupriyaa

Re: Drill Flatten function unable to read Nested Parquet generated by spark

Posted by Hao Zhu <hz...@maprtech.com>.

Does it work from sqlline?

It works fine for me from sqlline with Drill 0.8:

0: jdbc:drill:zk=> select * from testparq;
+------------+
|   array    |
+------------+
| [{"siteid":"site1","sitename":"sitename1"},{"siteid":"site2","sitename":"sitename2"}] |
+------------+
1 row selected (0.063 seconds)
0: jdbc:drill:zk>  select flatten(`array`) from testparq;
+------------+
|   EXPR$0   |
+------------+
| {"siteid":"site1","sitename":"sitename1"} |
| {"siteid":"site2","sitename":"sitename2"} |
+------------+
2 rows selected (0.054 seconds)

Thanks,
Hao

Re: Drill Flatten function unable to read Nested Parquet generated by spark

Posted by rahul challapalli <ch...@gmail.com>.

Hi,

The output from your Parquet file does not correspond to the JSON structure
you have. In your Parquet file, the 'site' element is a map that contains a
single key, "array". Which of these does your flatten query look like?

select flatten(d.site.array) from `data.parquet` d;
or
select flatten(d.site) from `data.parquet` d; (This would not work)

- Rahul
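
The difference between the two queries above can be illustrated with a small
Python sketch. It is only a toy model of the data shape, not Drill's
implementation: the flatten helper and its error message are hypothetical
stand-ins, and the real error text from Drill will differ. It assumes the
Parquet row has the shape shown in the original post, where "site" is a map
wrapping an "array" key.

```python
import json

# Original JSON record, as posted at the top of the thread.
source = json.loads(
    '{"site": [{"siteid":"site1","sitename":"sitename1"},'
    '{"siteid":"site2","sitename":"sitename2"}]}'
)

# Shape reported for the Spark-written Parquet file: "site" is a map
# with a single key, "array", that holds the repeated elements.
parquet_row = {"site": {"array": source["site"]}}


def flatten(value):
    """Hypothetical stand-in for FLATTEN: one output row per list element."""
    if not isinstance(value, list):
        raise TypeError(f"flatten expects a list, got {type(value).__name__}")
    return list(value)


# Analogous to flatten(d.site.array): the argument is the repeated field.
rows = flatten(parquet_row["site"]["array"])

# Analogous to flatten(d.site): the argument is a map, so this fails.
try:
    flatten(parquet_row["site"])
    map_error = None
except TypeError as exc:
    map_error = str(exc)

for row in rows:
    print(row)
print("flatten on the map failed:", map_error)
```

In this model, the extra "array" level is the only thing separating the
Parquet shape from the original JSON, which is why the path passed to flatten
has to include it.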
