Posted to issues@drill.apache.org by "Jason Altekruse (JIRA)" <ji...@apache.org> on 2015/03/17 00:34:39 UTC

[jira] [Commented] (DRILL-2241) CTAS fails when writing a repeated list

    [ https://issues.apache.org/jira/browse/DRILL-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364216#comment-14364216 ] 

Jason Altekruse commented on DRILL-2241:
----------------------------------------

I did not realize this while working on the Parquet reader/writer: Parquet does not support a list nested directly inside another list. We need to change the schema to add an intermediate field between the two lists (even if that field is always named something like "inner_list").

In the Parquet file, the records would have to look something like this:

{code}
{ "a" : null, "b" : [ { "inner_list" : ["B1", "B2"] } ] }
{code}
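A minimal Python sketch of the write-side rewrite, assuming records are plain JSON-like values (dicts, lists, scalars). The function name `wrap_nested_lists` is hypothetical and this is not Drill's Parquet writer code; it only illustrates wrapping every list that sits directly inside another list in a single-field map keyed by "inner_list", recursively.

```python
from typing import Any

# Wrapper field name taken from the example record above (illustrative only).
INNER_LIST_FIELD = "inner_list"


def wrap_nested_lists(value: Any) -> Any:
    """Rewrite a JSON-like value so that no list directly contains another list.

    Hypothetical helper: each list element that is itself a list is replaced
    by {"inner_list": [...]}, and the rewrite is applied at every depth.
    """
    if isinstance(value, dict):
        return {key: wrap_nested_lists(v) for key, v in value.items()}
    if isinstance(value, list):
        wrapped = []
        for element in value:
            converted = wrap_nested_lists(element)
            if isinstance(element, list):
                # A list directly inside a list: insert the intermediate field.
                converted = {INNER_LIST_FIELD: converted}
            wrapped.append(converted)
        return wrapped
    # Scalars (strings, numbers, None) pass through unchanged.
    return value


record = {"a": None, "b": [["B1", "B2"]]}
print(wrap_nested_lists(record))
# {'a': None, 'b': [{'inner_list': ['B1', 'B2']}]}
```

Lists of scalars or of maps are left alone; only list-in-list nesting gets the extra level, so deeper nesting such as `[[[1]]]` gains one "inner_list" wrapper per level.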

When reading, we would then have to translate this format back into a plain nested list, which is only safe if we know that we wrote this transformed data.
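A minimal Python sketch of the read-side translation, assuming records arrive as plain JSON-like Python values. The function name `unwrap_nested_lists` is hypothetical and this is not Drill's Parquet reader code. It collapses any list element of the form `{"inner_list": [...]}` back into a bare nested list.

```python
from typing import Any

# Wrapper field name taken from the example record in this comment (illustrative only).
INNER_LIST_FIELD = "inner_list"


def unwrap_nested_lists(value: Any) -> Any:
    """Turn [{"inner_list": [...]}] back into [[...]], recursively.

    Hypothetical helper: only list elements that are maps with exactly one
    key, "inner_list", whose value is a list are unwrapped.
    """
    if isinstance(value, dict):
        return {key: unwrap_nested_lists(v) for key, v in value.items()}
    if isinstance(value, list):
        result = []
        for element in value:
            is_wrapper = (
                isinstance(element, dict)
                and list(element.keys()) == [INNER_LIST_FIELD]
                and isinstance(element[INNER_LIST_FIELD], list)
            )
            if is_wrapper:
                # Drop the intermediate field and keep the inner list itself.
                result.append(unwrap_nested_lists(element[INNER_LIST_FIELD]))
            else:
                result.append(unwrap_nested_lists(element))
        return result
    return value


stored = {"a": None, "b": [{"inner_list": ["B1", "B2"]}]}
print(unwrap_nested_lists(stored))
# {'a': None, 'b': [['B1', 'B2']]}
```

The sketch also shows why the reader must know the data was written this way: a user map that genuinely has a single "inner_list" field inside a list would be unwrapped by mistake.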

To make this work seamlessly with JSON, this transformation would have to happen in the background on every write and read. Unfortunately, that puts us in the same position as all the other object models that have to map between their own representations and the Parquet object model, so we will need to discuss the priority of this. We may need to just say this is unsupported in Parquet for now.

> CTAS fails when writing a repeated list
> ---------------------------------------
>
>                 Key: DRILL-2241
>                 URL: https://issues.apache.org/jira/browse/DRILL-2241
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 0.8.0
>            Reporter: Abhishek Girish
>            Assignee: Jason Altekruse
>            Priority: Blocker
>             Fix For: 0.9.0
>
>         Attachments: drillbit_replist.log
>
>
> Drill can read the following JSON file with a repeated list:
> {
>   "a" : null,
>   "b" : [ ["B1", "B2"] ]
> }
> Writing this to Parquet via a simple CTAS fails. 
> > create table temp as select * from `replist.json`;
> The log indicates that this is unsupported (UnsupportedOperationException: Unsupported type LIST).
> Log attached. 


