Posted to issues@drill.apache.org by "Dave Kincaid (JIRA)" <ji...@apache.org> on 2017/01/09 19:56:58 UTC

[jira] [Created] (DRILL-5183) Drill doesn't seem to handle array values correctly in Parquet files

Dave Kincaid created DRILL-5183:
-----------------------------------

             Summary: Drill doesn't seem to handle array values correctly in Parquet files
                 Key: DRILL-5183
                 URL: https://issues.apache.org/jira/browse/DRILL-5183
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Dave Kincaid
         Attachments: books.parquet

It looks to me as though Drill is not properly converting array values in Parquet records. I have created a simple example and attached a small Parquet file to this issue. The records are written using the following Avro schema:

{code:title=Book.avsc}
{ "type": "record",
  "name": "Book",
  "fields": [
    { "name": "title", "type": "string" },
    { "name": "pages", "type": "int" },
    { "name": "authors", "type": {"type": "array", "items": "string"} }
  ]
}
{code}
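
For reference, here is a minimal Python sketch of the two records as plain data, with values taken from the query results in this report. The {{conforms}} helper is hypothetical and only checks the three field types this schema uses; it is not how Avro or Parquet validate records.

```python
import json

# Avro schema copied from Book.avsc in this report.
BOOK_SCHEMA = json.loads("""
{ "type": "record",
  "name": "Book",
  "fields": [
    { "name": "title", "type": "string" },
    { "name": "pages", "type": "int" },
    { "name": "authors", "type": {"type": "array", "items": "string"} }
  ]
}
""")

# The two records, with values taken from the query results in this report.
BOOKS = [
    {"title": "Physics of Waves", "pages": 477,
     "authors": ["William C. Elmore", "Mark A. Heald"]},
    {"title": "Foundations of Mathematical Analysis", "pages": 428,
     "authors": ["Richard Johnsonbaugh", "W.E. Pfaffenberger"]},
]


def conforms(record, schema):
    """Hypothetical helper: minimal type check for this schema only."""
    for field in schema["fields"]:
        value = record.get(field["name"])
        ftype = field["type"]
        if ftype == "string":
            ok = isinstance(value, str)
        elif ftype == "int":
            # bool is a subclass of int in Python, so exclude it explicitly.
            ok = isinstance(value, int) and not isinstance(value, bool)
        elif isinstance(ftype, dict) and ftype.get("type") == "array":
            # Assumes string items, which is the only case in Book.avsc.
            ok = isinstance(value, list) and all(isinstance(v, str) for v in value)
        else:
            ok = False
        if not ok:
            return False
    return True
```

Under this schema, {{authors}} is a plain list of strings, which is the shape one would expect a query to return.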

I wrote two records using this schema into the attached Parquet file. When I then run {{SELECT * FROM dfs.`books.parquet`}}, I get the following result:

||title||pages||authors||
|Physics of Waves|477|{"array":["William C. Elmore","Mark A. Heald"]}|
|Foundations of Mathematical Analysis|428|{"array":["Richard Johnsonbaugh","W.E. Pfaffenberger"]}|

You can see that the authors column appears as a nested record with a single field named "array" rather than as a repeated value. If I change the SQL query to {{SELECT title,pages,t.authors.`array` FROM dfs.`/home/davek/src/drill-parquet-example/resources/books.parquet` t;}} then I get:

||title||pages||EXPR$2||
|Physics of Waves|477|["William C. Elmore","Mark A. Heald"]|
|Foundations of Mathematical Analysis|428|["Richard Johnsonbaugh","W.E. Pfaffenberger"]|

and that column now behaves in Drill as a repeated-value column.
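
To make the difference concrete, here is a small Python sketch of the row shape returned by {{SELECT *}} next to the shape I expected. The {{unwrap_authors}} helper is hypothetical; it only mirrors what {{t.authors.`array`}} does in the query above. The wrapper name may come from how the Avro-to-Parquet writer names the repeated field inside its list group, but that is an assumption I have not verified here.

```python
# Row shape as Drill returned it for SELECT * (first row of the first result table).
observed = {
    "title": "Physics of Waves",
    "pages": 477,
    "authors": {"array": ["William C. Elmore", "Mark A. Heald"]},
}


def unwrap_authors(row, wrapper_key="array"):
    """Hypothetical helper: return a copy of row with authors as a plain list."""
    authors = row["authors"]
    # Only unwrap when authors is a record whose single field is the wrapper.
    if isinstance(authors, dict) and set(authors) == {wrapper_key}:
        authors = authors[wrapper_key]
    return {**row, "authors": list(authors)}


# Shape implied by the Avro schema: authors is a repeated string value.
expected = unwrap_authors(observed)
```

Rows that already have a plain list pass through unchanged, so the helper is safe to apply to either shape.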


