Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/05/16 19:37:04 UTC

[jira] [Resolved] (DRILL-5204) Extend mock data source to use table specs from SQL

     [ https://issues.apache.org/jira/browse/DRILL-5204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers resolved DRILL-5204.
--------------------------------
    Resolution: Fixed

Not sure why this was not closed earlier. The feature has been checked into master.

Set up the mock data source. Then:

{code}
SELECT id_i, name_s50 FROM `mock`.`customers_1M`
{code}

The column and table names are arbitrary; only the suffix matters. For columns, "_i" means integer, "_sx" means a string of length x, and so on. For tables, "_x" means x rows, "_xK" means x thousand rows, and "_xM" means x million rows.
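
To make the naming convention concrete, here is a rough Python sketch of how such names could be decoded. This is not Drill's actual parser: the function names, the regular expressions, and the assumption that "K" and "M" must be upper case are all illustrative. It handles only the suffixes mentioned above and rejects everything else.

```python
import re

# Hypothetical decoder for the mock naming convention described above.
# Only "_i", "_s<width>", and the "_<n>", "_<n>K", "_<n>M" table sizes
# are recognized; these patterns are assumptions, not Drill's own code.
COLUMN_RE = re.compile(r"^(?P<name>.+)_(?P<kind>i|s(?P<width>\d+))$")
TABLE_RE = re.compile(r"^(?P<name>.+)_(?P<count>\d+)(?P<unit>[KM]?)$")
MULTIPLIER = {"": 1, "K": 1_000, "M": 1_000_000}


def decode_column(column):
    """Return (base name, type, width) for a mock column name."""
    match = COLUMN_RE.match(column)
    if match is None:
        raise ValueError(f"unrecognized column suffix: {column!r}")
    if match.group("kind") == "i":
        return match.group("name"), "integer", None
    return match.group("name"), "string", int(match.group("width"))


def decode_table(table):
    """Return (base name, row count) for a mock table name."""
    match = TABLE_RE.match(table)
    if match is None:
        raise ValueError(f"unrecognized table suffix: {table!r}")
    rows = int(match.group("count")) * MULTIPLIER[match.group("unit")]
    return match.group("name"), rows


if __name__ == "__main__":
    # The names from the query above.
    print(decode_column("id_i"))
    print(decode_column("name_s50"))
    print(decode_table("customers_1M"))
```

Under these assumptions, the query above asks for an integer column and a 50-character string column from a table of one million rows.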

See the {{ExampleTest}} class for details.

> Extend mock data source to use table specs from SQL
> ---------------------------------------------------
>
>                 Key: DRILL-5204
>                 URL: https://issues.apache.org/jira/browse/DRILL-5204
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Tools, Build & Test
>    Affects Versions: 1.9.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>
> DRILL-5152 provided a simple way to generate mock data from SQL:
> {code}
> SELECT colName_type FROM `mock`.`tableName_size` ...
> {code}
> The fix in that release encoded types and record counts directly in the SQL, which is very handy for many simple cases.
> The original mock data source has another feature: it lets you create multiple mock blocks of data that can be read in multiple threads. Later additions made it easy to repeat a column definition (to generate, say, a table with 1000 columns), to choose the data generator class, etc. All of this was available only when writing physical plans by hand and encoding the definition in the sub scan for the mock data source.
> This enhancement extends the SQL feature to allow the definitions to appear in a JSON file that is easily referenced from SQL. The JSON file must be somewhere on the class path (typically in a resources directory). Then:
> {code}
> SELECT red, blue, green FROM `mock`.`foo/colors.json` ...
> {code}
> This is interpreted to mean, "the file colors.json defines a mock data source, perhaps with repeated columns, perhaps with multiple fragments. From that mock data source, select the three columns red, blue and green."
> With this change, tests can include quite sophisticated mock data sources, simplifying debugging of plans with multiple fragments and/or more complex table structures.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)