You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Mehant Baid <ba...@gmail.com> on 2013/12/03 11:14:40 UTC

Re: Review Request 15822: Remove _MAP[] from dirll; JIRA# 307

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15822/
-----------------------------------------------------------

(Updated Dec. 3, 2013, 10:14 a.m.)


Review request for drill and Jacques Nadeau.


Changes
-------

Rebased to the latest master.

Made code changes as per Jacques's review comments. 


Repository: drill-git


Description (updated)
-------

Currently while executing SQL we need to use _MAP['columnname'] in the select statements.

The problem while using select statements with columnnames instead of _MAP['columnname'] is that Optiq tries to verify that these columns exist in the schema and raises validation errors.

The way we've solved the problem is to override the 'rowtype' returned to Optiq while creating a Drill table. We have defined a new class DynamicRecordType which is the 'rowtype' returned to Optiq and will be invoked for any validation purposes. DynamicRecordType always returns true when queried if a certain field exists in the table, with its type as 'ANY'. This gets us past the validation in Optiq and allows us to use column names directly in select statements.

Details:

JSONRecordReader.java:
Previously we had the notion that the scan operator has a single column called _MAP, which used to be the reference for the scan operator. So all the actual columns would be referenced as _MAP.columnname. We have removed reference (ref) in the scan operator (since we don't need it) so we simply use column name to get the value vector. 

UnbackedRecord.java:
Modified setClearAndSetRoot to be setRoot. Any row read via the reference interpreter's JSON reader previously needed to be surrounded by a MAP type so that the project expression _MAP.columnname would be correctly evaluated. Since we don't have the notion of _MAP anymore, I have changed the plans such that the expression (project) contains only column names, hence we don't need to surround every row with a MAP type. 

DrillTable.java:
Instead of using Optiq's rowtype, the DrillTable will use the new DynamicRecordType as the rowtype. We need to use this rowtype, so that when Optiq is performing validation we invoke functions in this class that will allow validation to pass smoothly. More details about how this works is in DynamicRecordType.java

DynamicRecordType.java:
We use an instance of this class as the row type for DrillTable. Since we don't know the schema before hand whenever Optiq requires us to validate that a field exists we always return true and indicate that the type of that field is 'ANY'. By default we have one field in this table = '*', type is 'ANY'. This field is basically used during 'select *' type queries. We use '*' as a special character which essentially means get all columns in the record batch. In BatchLoaderMap while returning the results, we detect that the requested field is using this special character ('*') and return all fields in the record batch as opposed to in the normal case where we return only the requested field.

DrillScan.java:
We no longer need a 'ref' for the scan operator. Removed it.

BatchLoaderMap.java:
Modified the special character to be used to detect that all the fields in the record batch are to be returned as result as opposed to only the requested field from "_MAP" to "*"

FileSystemSchema.java:
The schema for the table is generated from the query, meaning the field list is essentially all the fields requested. Since different queries can request different fields it does not make sense to store schema information. In the next iteration it would be useful to store the schema once we perform a scan and know the actual fields from the data. 

DrillProjectRel.java:
If the ProjectRel contains only a single projection, whose expression is our special character (= name of default field in DynamicRecordType) then we know that all the fields in the record batch are to be returned. So we don't add a Project operator in that case. 

Test changes (*.json and test/**/*.java)
All test changes fall into one of the following categories:

* Upper case column names. Optiq upper cases the column names specified in the query, so we look for upper cased column names while fetching the data. If our column names in the data isn't upper cased we will return NULL. This wasn't a problem previously since we used to specify the column name within single quotes (something like _MAP['columnname']) which optiq didn't upper case.

* Remove the use of 'ref'. Since we don't have a 'ref' for the scan operator we don't need to use tablename.columnname or _MAP.columnname.


Diffs (updated)
-----

  exec/java-exec/src/main/java/org/apache/drill/exec/store/json/JSONRecordReader.java 49aea22 
  exec/ref/src/main/java/org/apache/drill/exec/ref/UnbackedRecord.java 29c83d4 
  exec/ref/src/main/java/org/apache/drill/exec/ref/rse/JSONRecordReader.java f73bee5 
  exec/ref/src/test/java/org/apache/drill/exec/ref/rops/CollapsingAggregateTest.java d63c0cc 
  exec/ref/src/test/java/org/apache/drill/exec/ref/rops/OrderROPTest.java d4ea473 
  exec/ref/src/test/java/org/apache/drill/exec/ref/rops/WindowFrameROPTest.java 735c9cb 
  exec/ref/src/test/resources/collapse/test1.json d4b1e24 
  exec/ref/src/test/resources/departments.json 3cf0a85 
  exec/ref/src/test/resources/donuts.json d16ef46 
  exec/ref/src/test/resources/employees.json 83f68bd 
  exec/ref/src/test/resources/join/full_join.json 59a53f3 
  exec/ref/src/test/resources/join/left_join.json 96e6a79 
  exec/ref/src/test/resources/join/simple_join.json 1587a78 
  exec/ref/src/test/resources/order/nulls-first.json 6e541e9 
  exec/ref/src/test/resources/order/nulls-last.json 83338b7 
  exec/ref/src/test/resources/union/distinct.json 9361009 
  exec/ref/src/test/resources/union/nondistinct.json 35ad70e 
  sqlparser/src/main/java/org/apache/drill/jdbc/DrillTable.java f51021e 
  sqlparser/src/main/java/org/apache/drill/jdbc/DynamicRecordType.java PRE-CREATION 
  sqlparser/src/main/java/org/apache/drill/optiq/DrillProjectRel.java fabfb77 
  sqlparser/src/main/java/org/apache/drill/optiq/DrillScan.java a165fea 
  sqlparser/src/main/java/org/apache/drill/sql/client/full/BatchLoaderMap.java d444348 
  sqlparser/src/main/java/org/apache/drill/sql/client/full/FileSystemSchema.java 1650aaf 
  sqlparser/src/test/java/org/apache/drill/jdbc/test/JdbcTest.java 9ae9682 
  sqlparser/src/test/resources/test-models.json 23895c9 

Diff: https://reviews.apache.org/r/15822/diff/


Testing
-------


Thanks,

Mehant Baid