Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/01/18 19:59:26 UTC
[jira] [Created] (DRILL-5204) Extend mock data source to use table specs from SQL
Paul Rogers created DRILL-5204:
----------------------------------
Summary: Extend mock data source to use table specs from SQL
Key: DRILL-5204
URL: https://issues.apache.org/jira/browse/DRILL-5204
Project: Apache Drill
Issue Type: Improvement
Components: Tools, Build & Test
Affects Versions: 1.9.0
Reporter: Paul Rogers
Assignee: Paul Rogers
Priority: Minor
DRILL-5152 provided a simple way to generate mock data from SQL:
{code}
SELECT colName_type FROM `mock`.`tableName_size` ...
{code}
That fix encoded each column's type in the column name and the record count in the table name, directly in the SQL, which is very handy for many simple cases.
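To make the naming convention concrete, here is a minimal sketch of how an encoded name such as `colName_type` or `tableName_size` could be separated into its two parts. This is not Drill's actual parser: the class `MockNameSketch`, the method `split`, and the choice to split at the last underscore are all assumptions made for illustration, and the suffix is deliberately left uninterpreted.

```java
// Hypothetical sketch, not Drill code: separates "name_suffix" into its parts.
class MockNameSketch {

    /**
     * Splits an encoded name at its LAST underscore (an assumption of this
     * sketch) and returns {name, suffix}. The suffix is returned as-is;
     * interpreting it as a type or a record count is left to the caller.
     */
    static String[] split(String encoded) {
        int idx = encoded.lastIndexOf('_');
        if (idx <= 0 || idx == encoded.length() - 1) {
            throw new IllegalArgumentException(
                "Expected name_suffix, got: " + encoded);
        }
        return new String[] {
            encoded.substring(0, idx),
            encoded.substring(idx + 1)
        };
    }

    public static void main(String[] args) {
        String[] column = split("colName_type");
        String[] table = split("tableName_size");
        System.out.println(column[0] + " / " + column[1]);
        System.out.println(table[0] + " / " + table[1]);
    }
}
```

Splitting at the last underscore keeps names that themselves contain underscores intact, at the cost of rejecting names with no suffix at all.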
The original mock data source has another feature: it lets you create multiple mock blocks of data that can be read in multiple threads. Later additions made it easy to repeat a column definition (to generate, say, a table with 1000 columns), to choose the data generator class, etc. All of this was available only when writing physical plans by hand and encoding the definition in the sub scan for the mock data source.
This enhancement extends the SQL feature so that the definitions can appear in a JSON file that is easily referenced from SQL. The JSON file must be somewhere on the class path (typically in a resources directory). Then:
{code}
SELECT red, blue, green FROM `mock`.`foo/colors.json` ...
{code}
This query is interpreted to mean: "The file colors.json defines a mock data source, perhaps with repeated columns, perhaps with multiple fragments. From that mock data source, select the three columns red, blue and green."
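The lookup described above might be sketched as follows. This is a hedged illustration rather than the code in Drill: the class `MockTableRefSketch`, both method names, and the rule that a table name ending in `.json` denotes a definition file are assumptions. The only library call used is the standard `ClassLoader.getResourceAsStream`, which returns `null` when the resource is not on the class path.

```java
import java.io.InputStream;
import java.util.Locale;

// Hypothetical sketch, not Drill code: decides how to treat a mock table name.
class MockTableRefSketch {

    /**
     * Assumption of this sketch: a table name ending in ".json" refers to a
     * definition file; anything else uses the inline name encoding.
     */
    static boolean isJsonSpec(String tableName) {
        return tableName.toLowerCase(Locale.ROOT).endsWith(".json");
    }

    /**
     * Opens the definition file from the class path.
     * Returns null if no such resource exists.
     */
    static InputStream openSpec(String tableName) {
        ClassLoader loader = MockTableRefSketch.class.getClassLoader();
        return loader.getResourceAsStream(tableName);
    }

    public static void main(String[] args) {
        String tableName = "foo/colors.json";
        if (isJsonSpec(tableName)) {
            InputStream in = openSpec(tableName);
            System.out.println(in == null
                ? "Definition not found on class path: " + tableName
                : "Definition found: " + tableName);
        } else {
            System.out.println("Inline table spec: " + tableName);
        }
    }
}
```

Resolving the name through the class loader is what lets a test keep its definition file in a resources directory without configuring any file system path.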
With this change, tests can include quite sophisticated mock data sources, simplifying debugging of plans with multiple fragments and/or more complex table structures.