Posted to dev@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/11/09 22:03:00 UTC
[jira] [Created] (DRILL-5949) JSON format options should be part of plugin config; not session options
Paul Rogers created DRILL-5949:
----------------------------------
Summary: JSON format options should be part of plugin config; not session options
Key: DRILL-5949
URL: https://issues.apache.org/jira/browse/DRILL-5949
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.12.0
Reporter: Paul Rogers
Drill provides a JSON record reader and two ways to configure it:
* Using the JSON plugin configuration.
* Using a set of session options.
The plugin configuration defines the file suffix associated with JSON files. The session options are:
* {{store.json.all_text_mode}}
* {{store.json.read_numbers_as_double}}
* {{store.json.reader.skip_invalid_records}}
* {{store.json.reader.print_skipped_invalid_record_number}}
Suppose I have two JSON files from different sources (and keep them in distinct directories). For the first, I want {{all_text_mode}} off because the data is nicely formatted. My numbers are fine too, so I also want {{read_numbers_as_double}} off.
But the other file is a mess and uses a rather ad-hoc format, so I want both of these options turned on.
As it turns out I often query both files. Today, I must set the session options one way to query my "clean" file, then reverse them to query the "dirty" file.
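As a rough illustration of that back-and-forth, a session might look like the sketch below. The option names are the ones listed above; the {{dfs}} workspace and the file paths are made up for the example.

```sql
-- Hypothetical paths; only the option names come from this issue.

-- Query the "clean" file: both options off.
ALTER SESSION SET `store.json.all_text_mode` = false;
ALTER SESSION SET `store.json.read_numbers_as_double` = false;
SELECT * FROM dfs.`/data/clean/events.json`;

-- Query the "dirty" file: both options must be flipped first.
ALTER SESSION SET `store.json.all_text_mode` = true;
ALTER SESSION SET `store.json.read_numbers_as_double` = true;
SELECT * FROM dfs.`/data/dirty/events.json`;
```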
Next, I want to join the two files. How do I set the options one way for the "clean" file, and the other way for the "dirty" file, within the *same query*? I can't, because session options apply to the entire query.
Now, consider the text format plugin that can read CSV, TSV, PSV and so on. It has a variety of options, but they are *not* session options; they are instead options in the plugin definition. This allows me to, say, have a plugin config for CSV-with-headers files that I get from source A, and a different plugin config for my CSV-without-headers files from source B.
Suppose we applied the text reader technique to the JSON reader. We'd move the session options listed above into the JSON format plugin. Then, I can define one plugin for my "clean" files, and a different plugin config for my "dirty" files.
What's more, I can then use table functions to adjust the format for each file as needed within a single query. Since table functions are part of a query, I can add them to a view that I define for the various JSON files.
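A sketch of what this might look like if the proposal were adopted. The table function syntax is the one Drill already accepts for format plugins, but {{allTextMode}} and {{readNumbersAsDouble}} are hypothetical parameter names invented here for the proposed JSON options, and the paths, view name, and column names are placeholders.

```sql
-- Sketch only: allTextMode and readNumbersAsDouble are hypothetical
-- names for the proposed JSON plugin options; they are not existing
-- plugin options. Paths and columns are placeholders.

-- Join the "clean" and "dirty" files with different settings in one query.
SELECT c.id, d.payload
FROM table(dfs.`/data/clean/events.json`(
       type => 'json', allTextMode => false, readNumbersAsDouble => false)) AS c
JOIN table(dfs.`/data/dirty/events.json`(
       type => 'json', allTextMode => true, readNumbersAsDouble => true)) AS d
  -- With all-text mode on, d.id is read as VARCHAR, so compare as text.
  ON CAST(c.id AS VARCHAR) = d.id;

-- Capture the "dirty" settings once in a view so later queries need no setup.
CREATE VIEW dfs.tmp.dirty_events AS
SELECT *
FROM table(dfs.`/data/dirty/events.json`(
       type => 'json', allTextMode => true, readNumbersAsDouble => true));
```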
The result is a far simpler user experience than the tedium of resetting session options for every query.