Posted to dev@drill.apache.org by "benj (JIRA)" <ji...@apache.org> on 2019/03/11 13:48:00 UTC
[jira] [Created] (DRILL-7090) Improve management of
Optional(Nullable) / Required(Not nullable) type at least for parquet
storage
benj created DRILL-7090:
---------------------------
Summary: Improve management of Optional(Nullable) / Required(Not nullable) type at least for parquet storage
Key: DRILL-7090
URL: https://issues.apache.org/jira/browse/DRILL-7090
Project: Apache Drill
Issue Type: Improvement
Components: Storage - Parquet
Affects Versions: 1.15.0
Reporter: benj
It would be useful to be able to specify, define, or cast the "mode" of columns for Parquet storage.
Example of the problem without this capability: several files are created by different methods/processes, and all the files have the same columns. Querying all the files together and grouping on a column fails:
{code:java}
SELECT source, count(*) FROM ....`ALL` GROUP BY source;
=>
java.sql.SQLException: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema change
Prior schema : BatchSchema [fields=[[`source` (VARCHAR:REQUIRED)]], selectionVector=NONE]
New schema : BatchSchema [fields=[[`source` (VARCHAR:OPTIONAL)]], selectionVector=NONE]
{code}
This happens because source is generated differently from one file to another (for example, from a constant or from dir0*).
It would be nice to be able to define the nullable attribute (required/optional) oneself, or at least to cast the mode/type of the field on read. This would make the files more homogeneous and avoid failures on simple operations such as aggregation.
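To make the request concrete, here is a minimal Python sketch of the mismatch reported in the error message above and of what a "cast on read" to OPTIONAL would achieve. It is only an illustration and not Drill's implementation: the `Field` class and the `schemas_match` and `promote_to_optional` helpers are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


# Hypothetical model of a column as printed in the error message:
# a name, a minor type (e.g. VARCHAR) and a data mode.
@dataclass(frozen=True)
class Field:
    name: str
    minor_type: str
    mode: str  # "REQUIRED" (not nullable) or "OPTIONAL" (nullable)


def schemas_match(prior, new):
    """Strict comparison: any difference, including the mode, counts as a schema change."""
    return prior == new


def promote_to_optional(schema):
    """Illustrative 'cast on read': treat every column as OPTIONAL (nullable)."""
    return [Field(f.name, f.minor_type, "OPTIONAL") for f in schema]


prior_schema = [Field("source", "VARCHAR", "REQUIRED")]
new_schema = [Field("source", "VARCHAR", "OPTIONAL")]

# Same column name and type, but the mode differs, so the strict check fails.
strict_ok = schemas_match(prior_schema, new_schema)

# Once both sides are promoted to OPTIONAL, the schemas are identical.
unified_ok = schemas_match(promote_to_optional(prior_schema),
                           promote_to_optional(new_schema))

print(strict_ok, unified_ok)
```

Promoting REQUIRED to OPTIONAL is the safe direction, since a nullable column can hold every value a non-nullable one can; casting the other way would additionally require checking that no nulls are present.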
(*) Surprisingly:
* dir0 => varchar<NULLABLE>
* '' => varchar<NOT NULL>
* coalesce(dir0, '') => varchar<NULLABLE> *???*
Users should be able to override the system's choice of whether a column's mode is required or optional.