Posted to issues@drill.apache.org by "Igor Guzenko (Jira)" <ji...@apache.org> on 2019/10/21 09:24:00 UTC
[jira] [Assigned] (DRILL-5183) Drill doesn't seem to handle array values correctly in Parquet files
[ https://issues.apache.org/jira/browse/DRILL-5183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Igor Guzenko reassigned DRILL-5183:
-----------------------------------
Assignee: Igor Guzenko
> Drill doesn't seem to handle array values correctly in Parquet files
> --------------------------------------------------------------------
>
> Key: DRILL-5183
> URL: https://issues.apache.org/jira/browse/DRILL-5183
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Dave Kincaid
> Assignee: Igor Guzenko
> Priority: Major
> Attachments: books.parquet
>
>
> It looks to me as though Drill is not properly converting array values in Parquet records. I have created a simple example and attached a small Parquet file to this issue. The records are written using the following Avro schema:
> {code:title=Book.avsc}
> {
>   "type": "record",
>   "name": "Book",
>   "fields": [
>     { "name": "title", "type": "string" },
>     { "name": "pages", "type": "int" },
>     { "name": "authors", "type": {"type": "array", "items": "string"} }
>   ]
> }
> {code}
> I wrote two records using this schema into the attached Parquet file. When I then run {{SELECT * FROM dfs.`books.parquet`}}, I get the following result:
> ||title||pages||authors||
> |Physics of Waves|477|{"array":["William C. Elmore","Mark A. Heald"]}|
> |Foundations of Mathematical Analysis|428|{"array":["Richard Johnsonbaugh","W.E. Pfaffenberger"]}|
> You can see that the authors column seems to be a nested record with the name "array" instead of being a repeated value. If I change the SQL query to {{SELECT title,pages,t.authors.`array` FROM dfs.`/home/davek/src/drill-parquet-example/resources/books.parquet` t;}} then I get:
> ||title||pages||EXPR$2||
> |Physics of Waves|477|["William C. Elmore","Mark A. Heald"]|
> |Foundations of Mathematical Analysis|428|["Richard Johnsonbaugh","W.E. Pfaffenberger"]|
> and now that column behaves in Drill as a repeated-value column.
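
The two result tables above differ only in whether the one-field record named "array" is still wrapped around the list. As a rough illustration of what the {{t.authors.`array`}} projection does, here is a small sketch that uses plain Python dictionaries as stand-ins for the rows Drill returned. The helper name unwrap_list_column is hypothetical and is not part of Drill; the row values are copied from the tables above.

```python
# Rows as returned by SELECT *: "authors" is a record with a single
# field named "array" rather than a repeated value.
rows = [
    {"title": "Physics of Waves", "pages": 477,
     "authors": {"array": ["William C. Elmore", "Mark A. Heald"]}},
    {"title": "Foundations of Mathematical Analysis", "pages": 428,
     "authors": {"array": ["Richard Johnsonbaugh", "W.E. Pfaffenberger"]}},
]


def unwrap_list_column(row, column, wrapper="array"):
    """Return a copy of row with row[column][wrapper] hoisted into row[column].

    Hypothetical helper for illustration only; it mirrors the effect of
    selecting t.<column>.`array` in the second query.
    """
    value = row[column]
    if isinstance(value, dict) and set(value) == {wrapper}:
        return {**row, column: value[wrapper]}
    # Already a plain repeated value: return an unchanged copy.
    return dict(row)


unwrapped = [unwrap_list_column(r, "authors") for r in rows]
for r in unwrapped:
    print(r["title"], r["pages"], r["authors"])
```

After unwrapping, each authors value is a plain list of strings, which is the shape the Avro schema describes.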
--
This message was sent by Atlassian Jira
(v8.3.4#803005)