
Posted to dev@drill.apache.org by Venki Korukanti <ve...@gmail.com> on 2015/09/27 16:49:41 UTC

Review Request 38796: DRILL-3209: Support reading Hive tables using Drill's native parquet reader

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38796/
-----------------------------------------------------------

Review request for drill and Jinfeng Ni.


Repository: drill-git


Description
-------

Please see JIRA DRILL-3209 for details.


Diffs
-----

  pom.xml 8d4b318f26c0d9723cc1bd8d842d38e8fa7f9cea 

Diff: https://reviews.apache.org/r/38796/diff/


Testing
-------

Added unit tests covering reading of all supported types, project pushdown, and partition pruning. Manually tested with Hive tables containing a large amount of data (these tests will become part of the regression suite).


Thanks,

Venki Korukanti

Re: Review Request 38796: DRILL-3209: Support reading Hive tables using Drill's native parquet reader

Posted by Venki Korukanti <ve...@gmail.com>.


> On Sept. 29, 2015, 12:59 p.m., Jinfeng Ni wrote:
> > contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java, line 267
> > <https://reviews.apache.org/r/38796/diff/3/?file=1087071#file1087071line267>
> >
> >     I have one question about partition column. 
> >     
> >     Let's say Hive has "year" as a partition column. For value 2015, does Hive put "year=2015" as the directory name? If that's the case, then "year=2015" would be treated as "dir0" by the native parquet reader, instead of "2015"? Do we need to handle the difference in partition columns between the Hive scan and the native scan?

Hive already stores the partition values (e.g., 2015) as strings in the metastore. If the partition doesn't have a location defined in the ADD PARTITION command, Hive creates a default partition location by appending partcol1=value1/partcol2=value2 to the table location. In our case we get the dir0 values from metastore directory and pass them to ScanBatch to fill the partition vectors. So we just need to cast them from VARCHAR to the partition column type.
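
To make the two steps above concrete, here is a minimal sketch in plain Java. It is not the code from the patch: the class name, method names, and the handful of type names handled are hypothetical, and it ignores the escaping Hive applies to special characters in partition paths. It only shows (a) how a default partition location is formed from the table location and (b) how a string value from the metastore could be cast to the partition column's type.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not part of Drill or Hive.
final class HivePartitionSketch {

  /** Builds a default location of the form tableLocation/col1=val1/col2=val2. */
  static String defaultPartitionLocation(String tableLocation, Map<String, String> partitionSpec) {
    StringBuilder sb = new StringBuilder(tableLocation);
    // Iteration order matters, so callers should pass an ordered map.
    for (Map.Entry<String, String> e : partitionSpec.entrySet()) {
      sb.append('/').append(e.getKey()).append('=').append(e.getValue());
    }
    return sb.toString();
  }

  /** Casts a partition value (stored as a string) to the given column type. */
  static Object castPartitionValue(String value, String columnType) {
    switch (columnType) {
      case "int":     return Integer.valueOf(value);
      case "bigint":  return Long.valueOf(value);
      case "double":  return Double.valueOf(value);
      case "boolean": return Boolean.valueOf(value);
      default:        return value; // string-like types are passed through in this sketch
    }
  }
}
```

With a spec of year=2015 the value handed to the reader is the string "2015", not the directory name "year=2015", which is why only the cast is needed.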


- Venki


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38796/#review100998
-----------------------------------------------------------


On Sept. 29, 2015, 9:23 a.m., Venki Korukanti wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38796/
> -----------------------------------------------------------
> 
> (Updated Sept. 29, 2015, 9:23 a.m.)
> 
> 
> Review request for drill and Jinfeng Ni.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> Please see JIRA DRILL-3209 for details.
> 
> 
> Diffs
> -----
> 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionDescriptor.java 11c6455 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetSubScan.java PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScan.java 9ada569 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveStoragePlugin.java 23aa37f 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveSubScan.java 2181c2a 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/DrillHiveTable.java b459ee4 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHivePartitionPruning.java f0b4bdc 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHiveProjectPushDown.java 6423a36 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestHiveStorage.java 9211af6 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestInfoSchemaOnHiveStorage.java 6118be5 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/store/hive/HiveTestDataGenerator.java 34a7ed6 
>   exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 66f9f03 
>   exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java 5838bd1 
> 
> Diff: https://reviews.apache.org/r/38796/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests covering reading of all supported types, project pushdown, and partition pruning. Manually tested with Hive tables containing a large amount of data (these tests will become part of the regression suite).
> 
> 
> Thanks,
> 
> Venki Korukanti
> 
>

Re: Review Request 38796: DRILL-3209: Support reading Hive tables using Drill's native parquet reader

Posted by Jinfeng Ni <jn...@maprtech.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38796/#review100998
-----------------------------------------------------------

Ship it!


The revised patch addresses my comments.


contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java (line 267)
<https://reviews.apache.org/r/38796/#comment158300>

    I have one question about partition column. 
    
    Let's say Hive has "year" as a partition column. For value 2015, does Hive put "year=2015" as the directory name? If that's the case, then "year=2015" would be treated as "dir0" by the native parquet reader, instead of "2015"? Do we need to handle the difference in partition columns between the Hive scan and the native scan?


- Jinfeng Ni



Re: Review Request 38796: DRILL-3209: Support reading Hive tables using Drill's native parquet reader

Posted by Aman Sinha <as...@maprtech.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38796/#review100991
-----------------------------------------------------------

Ship it!


Ship It!

- Aman Sinha


On Sept. 29, 2015, 4:23 p.m., Venki Korukanti wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38796/
> -----------------------------------------------------------
> 
> (Updated Sept. 29, 2015, 4:23 p.m.)
> 
> 
> Review request for drill and Jinfeng Ni.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> Please see JIRA DRILL-3209 for details.
> 
> 
> Diffs
> -----
> 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionDescriptor.java 11c6455 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetSubScan.java PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScan.java 9ada569 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveStoragePlugin.java 23aa37f 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveSubScan.java 2181c2a 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/DrillHiveTable.java b459ee4 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHivePartitionPruning.java f0b4bdc 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHiveProjectPushDown.java 6423a36 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestHiveStorage.java 9211af6 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestInfoSchemaOnHiveStorage.java 6118be5 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/store/hive/HiveTestDataGenerator.java 34a7ed6 
>   exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 66f9f03 
>   exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java 5838bd1 
> 
> Diff: https://reviews.apache.org/r/38796/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests covering reading of all supported types, project pushdown, and partition pruning. Manually tested with Hive tables containing a large amount of data (these tests will become part of the regression suite).
> 
> 
> Thanks,
> 
> Venki Korukanti
> 
>

Re: Review Request 38796: DRILL-3209: Support reading Hive tables using Drill's native parquet reader

Posted by Jason Altekruse <al...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38796/#review101007
-----------------------------------------------------------

Ship it!


Looks good. One thing we should do before we merge this is open a sub-JIRA to do something similar for scans of Hive text-based tables. DRILL-3209 discusses this as a possibility, but I think this should just be merged as is to enable the feature. As more formats are added as Drill FormatPlugins, it might make sense to revisit this to make sure we have a nice interface for enhancing this feature easily whenever someone adds a format that would be useful to plug in here.

- Jason Altekruse



Re: Review Request 38796: DRILL-3209: Support reading Hive tables using Drill's native parquet reader

Posted by Venki Korukanti <ve...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38796/
-----------------------------------------------------------

(Updated Sept. 29, 2015, 9:23 a.m.)


Review request for drill and Jinfeng Ni.


Changes
-------

Addressed review comments except the issue of HiveScan.getColumns not expanding the '*'. Logged a separate bug, DRILL-3852, as it is a known issue.


Repository: drill-git


Description
-------

Please see JIRA DRILL-3209 for details.


Diffs (updated)
-----

  contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionDescriptor.java 11c6455 
  contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java PRE-CREATION 
  contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java PRE-CREATION 
  contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetSubScan.java PRE-CREATION 
  contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java PRE-CREATION 
  contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScan.java 9ada569 
  contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveStoragePlugin.java 23aa37f 
  contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveSubScan.java 2181c2a 
  contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/DrillHiveTable.java b459ee4 
  contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHivePartitionPruning.java f0b4bdc 
  contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHiveProjectPushDown.java 6423a36 
  contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestHiveStorage.java 9211af6 
  contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestInfoSchemaOnHiveStorage.java 6118be5 
  contrib/storage-hive/core/src/test/java/org/apache/drill/exec/store/hive/HiveTestDataGenerator.java 34a7ed6 
  exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 66f9f03 
  exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java 5838bd1 

Diff: https://reviews.apache.org/r/38796/diff/


Testing
-------

Added unit tests covering reading of all supported types, project pushdown, and partition pruning. Manually tested with Hive tables containing a large amount of data (these tests will become part of the regression suite).


Thanks,

Venki Korukanti

Re: Review Request 38796: DRILL-3209: Support reading Hive tables using Drill's native parquet reader

Posted by Venki Korukanti <ve...@gmail.com>.


> On Sept. 28, 2015, 2:21 p.m., Jinfeng Ni wrote:
> > contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java, line 128
> > <https://reviews.apache.org/r/38796/diff/2/?file=1085484#file1085484line128>
> >
> >     the format seems to expect 3 inputs; is "e" unused?

I am passing the exception to logger so that it is printed in logs. More details here: http://stackoverflow.com/questions/6371638/slf4j-how-to-log-formatted-message-object-array-exception
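
For readers unfamiliar with the convention, the following is a simplified imitation in plain Java, not SLF4J's actual implementation. The class and method names are hypothetical. It shows the idea: the "{}" placeholders consume the leading arguments, and a Throwable left over as the last argument is kept aside so it can be logged with its stack trace rather than being dropped.

```java
// Hypothetical imitation of the trailing-Throwable convention; not SLF4J code.
final class TrailingThrowableSketch {

  /** The rendered message plus the throwable to log alongside it, if any. */
  static final class Formatted {
    final String message;
    final Throwable throwable;

    Formatted(String message, Throwable throwable) {
      this.message = message;
      this.throwable = throwable;
    }
  }

  static Formatted format(String pattern, Object... args) {
    // Count the "{}" placeholders in the pattern.
    int placeholders = 0;
    for (int i = pattern.indexOf("{}"); i >= 0; i = pattern.indexOf("{}", i + 2)) {
      placeholders++;
    }

    // A Throwable left over after all placeholders are filled is the exception.
    Throwable throwable = null;
    int usable = args.length;
    if (args.length > placeholders && args[args.length - 1] instanceof Throwable) {
      throwable = (Throwable) args[args.length - 1];
      usable--;
    }

    // Substitute the remaining arguments into the placeholders, left to right.
    StringBuilder sb = new StringBuilder();
    int from = 0;
    int argIdx = 0;
    int at;
    while (argIdx < usable && (at = pattern.indexOf("{}", from)) >= 0) {
      sb.append(pattern, from, at).append(args[argIdx++]);
      from = at + 2;
    }
    sb.append(pattern.substring(from));
    return new Formatted(sb.toString(), throwable);
  }
}
```

So a call with three placeholders and four arguments, the last being the exception, uses all four: three for the message and one for the stack trace.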


> On Sept. 28, 2015, 2:21 p.m., Jinfeng Ni wrote:
> > contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java, line 176
> > <https://reviews.apache.org/r/38796/diff/2/?file=1085484#file1085484line176>
> >
> >     I feel the block checking for "*" seems unnecessary. If I understand correctly, a Hive table is regarded as a table with schema. Therefore, we should not see "*" in hiveScanRel.getRowType; the * column should have been expanded into the list of columns in the Hive table.

You are right. Columns are always expanded in RelDataType, because we call DrillHiveTable.getRowType() which by default returns the full schema. Removed the block.


> On Sept. 28, 2015, 2:21 p.m., Jinfeng Ni wrote:
> > contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java, line 195
> > <https://reviews.apache.org/r/38796/diff/2/?file=1085484#file1085484line195>
> >
> >     Seems like during the conversion, you only maintain the column names. What about the column type? Do we need to maintain the column type as well? DrillFixedRelDataTypeImpl assumes every column has "any" type; meaning the column type in the Hive table is lost?

Changed to use typeFactory.createStructType(typeList, nameList). Also removed the casts added to regular columns.


> On Sept. 28, 2015, 2:21 p.m., Jinfeng Ni wrote:
> > contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java, line 199
> > <https://reviews.apache.org/r/38796/diff/2/?file=1085484#file1085484line199>
> >
> >     I feel there is probably a bug in the existing code: HiveScan.getColumn() should not contain "*" if Drill treats a Hive table as a table with schema.

As it is an existing bug, I logged DRILL-3852 to fix it separately from this patch.


- Venki


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38796/#review100855
-----------------------------------------------------------


On Sept. 27, 2015, 7:50 a.m., Venki Korukanti wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38796/
> -----------------------------------------------------------
> 
> (Updated Sept. 27, 2015, 7:50 a.m.)
> 
> 
> Review request for drill and Jinfeng Ni.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> Please see JIRA DRILL-3209 for details.
> 
> 
> Diffs
> -----
> 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionDescriptor.java 11c6455 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetSubScan.java PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScan.java 9ada569 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveStoragePlugin.java 23aa37f 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveSubScan.java 2181c2a 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/DrillHiveTable.java b459ee4 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHivePartitionPruning.java f0b4bdc 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHiveProjectPushDown.java 6423a36 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestHiveStorage.java 9211af6 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestInfoSchemaOnHiveStorage.java 6118be5 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/store/hive/HiveTestDataGenerator.java 34a7ed6 
>   exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 0f6a5bb 
>   exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java 118f7ad 
> 
> Diff: https://reviews.apache.org/r/38796/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests covering reading of all supported types, project pushdown, and partition pruning. Manually tested with Hive tables containing a large amount of data (these tests will become part of the regression suite).
> 
> 
> Thanks,
> 
> Venki Korukanti
> 
>

Re: Review Request 38796: DRILL-3209: Support reading Hive tables using Drill's native parquet reader

Posted by Jinfeng Ni <jn...@maprtech.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38796/#review100855
-----------------------------------------------------------


I mainly looked through the changes on the planning side. You may want to have another person look at the execution-side changes.

The planning side overall looks good to me. I have a couple of comments below.


contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java (line 89)
<https://reviews.apache.org/r/38796/#comment158101>

    You may consider checking the option when adding this rule to Drill's rule set. See [1].
    
    Doing this will save the overhead of matching this rule.
    
    [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java#L229



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java (line 102)
<https://reviews.apache.org/r/38796/#comment158102>

    partitions.size() ==0 would be covered by line 114 (the block 107-112 will be skipped)?



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java (line 128)
<https://reviews.apache.org/r/38796/#comment158104>

    the format seems to expect 3 inputs; is "e" unused?



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java (line 176)
<https://reviews.apache.org/r/38796/#comment158108>

    I feel the block checking for "*" seems unnecessary. If I understand correctly, a Hive table is regarded as a table with schema. Therefore, we should not see "*" in hiveScanRel.getRowType; the * column should have been expanded into the list of columns in the Hive table.



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java (line 195)
<https://reviews.apache.org/r/38796/#comment158110>

    Seems like during the conversion, you only maintain the column names. What about the column type? Do we need to maintain the column type as well? DrillFixedRelDataTypeImpl assumes every column has "any" type; meaning the column type in the Hive table is lost?



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java (line 199)
<https://reviews.apache.org/r/38796/#comment158111>

    I feel there is probably a bug in the existing code: HiveScan.getColumn() should not contain "*" if Drill treats a Hive table as a table with schema.



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java (line 259)
<https://reviews.apache.org/r/38796/#comment158123>

    Again, if the column type is maintained during the conversion, it seems unnecessary to cast every column; if the Hive type happens to be the same as the Parquet type, we do not have to cast.


- Jinfeng Ni



Re: Review Request 38796: DRILL-3209: Support reading Hive tables using Drill's native parquet reader

Posted by Venki Korukanti <ve...@gmail.com>.


> On Sept. 28, 2015, 10:18 p.m., Aman Sinha wrote:
> > contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java, line 63
> > <https://reviews.apache.org/r/38796/diff/2/?file=1085485#file1085485line63>
> >
> >     Since the RecordCount is the same regardless of the type of the reader, we should not divide it by the factor.  Dividing the cpu cost and disk cost seems ok.
> 
> Venki Korukanti wrote:
>     If I understand correctly, we are using only the rowcount while calculating the self cost of the scan in ScanPrel.computeSelfCost. So we need to alter the rowcount here.
> 
> Aman Sinha wrote:
>     True..the current cost model for Scans is computing cpuCost as a function of rowCount and columnCount.  I will open an enhancement JIRA to change that such that 2 different scan methods (such as Hive scan vs. Drill native scan) that produce the same row count but differ in cpu cost and I/O cost can be modeled accurately. 
>     
>     Given that, you don't have to change the cost here...my only other suggestion would be to use a static constant as a factor: e.g HIVE_COST_FACTOR (or something similar).

Added HIVE_SERDE_SCAN_OVERHEAD_FACTOR constant.
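
A rough sketch of the idea discussed above, in plain Java. Only the constant name comes from this review; its value, the class, the methods, and the cost formula are assumptions for illustration and are not the code in the patch or in ScanPrel. Because the scan's cost is derived from the row count and column count, scaling down the row count reported by the native scan is what makes the planner see it as cheaper than the SerDe-based Hive scan.

```java
// Hypothetical illustration; only the constant's name appears in the review.
final class ScanCostSketch {

  // Assumed value, chosen only for the example.
  static final int HIVE_SERDE_SCAN_OVERHEAD_FACTOR = 100;

  /** Simplified CPU cost that depends only on row count and column count. */
  static double cpuCost(long rowCount, int columnCount) {
    return (double) rowCount * columnCount;
  }

  /** Row count reported by the native scan, scaled down by the overhead factor. */
  static long nativeScanRowCount(long hiveRowCount) {
    // Never report fewer than one row.
    return Math.max(1, hiveRowCount / HIVE_SERDE_SCAN_OVERHEAD_FACTOR);
  }
}
```

Using a named constant keeps the scaling in one place, so it can be removed once the cost model can distinguish CPU and I/O cost independently of row count.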


- Venki


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38796/#review100878
-----------------------------------------------------------


On Sept. 29, 2015, 9:23 a.m., Venki Korukanti wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38796/
> -----------------------------------------------------------
> 
> (Updated Sept. 29, 2015, 9:23 a.m.)
> 
> 
> Review request for drill and Jinfeng Ni.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> Please jira DRILL-3209 for details.
> 
> 
> Diffs
> -----
> 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionDescriptor.java 11c6455 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetSubScan.java PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScan.java 9ada569 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveStoragePlugin.java 23aa37f 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveSubScan.java 2181c2a 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/DrillHiveTable.java b459ee4 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHivePartitionPruning.java f0b4bdc 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHiveProjectPushDown.java 6423a36 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestHiveStorage.java 9211af6 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestInfoSchemaOnHiveStorage.java 6118be5 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/store/hive/HiveTestDataGenerator.java 34a7ed6 
>   exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 66f9f03 
>   exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java 5838bd1 
> 
> Diff: https://reviews.apache.org/r/38796/diff/
> 
> 
> Testing
> -------
> 
> Added unit tests covering reads of all supported types, project pushdown, and partition pruning. Manually tested with Hive tables containing a large amount of data (these tests will become part of the regression suite).
> 
> 
> Thanks,
> 
> Venki Korukanti
> 
>

Re: Review Request 38796: DRILL-3209: Support reading Hive tables using Drill's native parquet reader

Posted by Aman Sinha <as...@maprtech.com>.


> On Sept. 29, 2015, 5:18 a.m., Aman Sinha wrote:
> > contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java, line 63
> > <https://reviews.apache.org/r/38796/diff/2/?file=1085485#file1085485line63>
> >
> >     Since the RecordCount is the same regardless of the type of the reader, we should not divide it by the factor.  Dividing the cpu cost and disk cost seems ok.
> 
> Venki Korukanti wrote:
>     If I understand correctly, we are using only the row count while calculating the self cost of the scan in ScanPrel.computeSelfCost. So we need to alter the row count here.

True, the current cost model for scans computes cpuCost as a function of rowCount and columnCount. I will open an enhancement JIRA to change that, so that two different scan methods (such as Hive scan vs. Drill native scan) that produce the same row count but differ in CPU cost and I/O cost can be modeled accurately.

Given that, you don't have to change the cost here. My only other suggestion would be to use a static constant as the factor, e.g., HIVE_COST_FACTOR (or something similar).


> On Sept. 29, 2015, 5:18 a.m., Aman Sinha wrote:
> > contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java, line 64
> > <https://reviews.apache.org/r/38796/diff/2/?file=1085484#file1085484line64>
> >
> >     The name of a function should not have a product name in it.
> 
> Venki Korukanti wrote:
>     I am using the existing function in master. It is a convert function that converts data from the Impala/Hive-specific format to the Drill format. Let me know if you want this changed; I can log a separate JIRA to track it.

I see. You can leave it as is for now.


- Aman


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38796/#review100878
-----------------------------------------------------------



Re: Review Request 38796: DRILL-3209: Support reading Hive tables using Drill's native parquet reader

Posted by Venki Korukanti <ve...@gmail.com>.


> On Sept. 28, 2015, 10:18 p.m., Aman Sinha wrote:
> > contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java, line 64
> > <https://reviews.apache.org/r/38796/diff/2/?file=1085484#file1085484line64>
> >
> >     The name of a function should not have a product name in it.

I am using the existing function in master. It is a convert function that converts data from the Impala/Hive-specific format to the Drill format. Let me know if you want this changed; I can log a separate JIRA to track it.


> On Sept. 28, 2015, 10:18 p.m., Aman Sinha wrote:
> > contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java, line 63
> > <https://reviews.apache.org/r/38796/diff/2/?file=1085485#file1085485line63>
> >
> >     Since the RecordCount is the same regardless of the type of the reader, we should not divide it by the factor.  Dividing the cpu cost and disk cost seems ok.

If I understand correctly, we are using only the row count while calculating the self cost of the scan in ScanPrel.computeSelfCost. So we need to alter the row count here.


> On Sept. 28, 2015, 10:18 p.m., Aman Sinha wrote:
> > contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java, line 123
> > <https://reviews.apache.org/r/38796/diff/2/?file=1085487#file1085487line123>
> >
> >     Should complex type data be considered here, similar to ParquetScanBatchCreator?

Currently we don't support Hive complex types. I added a check in the rule so that we exit early.
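
As a rough illustration of such an early-exit check (not the code in the patch), the sketch below rejects a table as soon as any column has a complex Hive type. The class and method names are hypothetical, and matching on the type-name prefixes array<, map<, struct<, and uniontype< is an assumption about how column types are exposed to the rule.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; the names here are illustrative and not taken from the patch.
final class ComplexTypeCheckSketch {
    // Type-name prefixes that Hive uses for its complex types.
    private static final List<String> COMPLEX_PREFIXES =
        Arrays.asList("array<", "map<", "struct<", "uniontype<");

    // Returns true when the given Hive type name denotes a complex type.
    static boolean isComplex(String hiveTypeName) {
        String normalized = hiveTypeName.trim().toLowerCase(Locale.ROOT);
        for (String prefix : COMPLEX_PREFIXES) {
            if (normalized.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // The rule would bail out (keep the regular Hive scan) when this returns false.
    static boolean canUseNativeReader(List<String> columnTypeNames) {
        for (String typeName : columnTypeNames) {
            if (isComplex(typeName)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canUseNativeReader(Arrays.asList("int", "string")));
        System.out.println(canUseNativeReader(Arrays.asList("int", "map<string,int>")));
    }
}
```

Doing the check in the planning rule, rather than in the batch creator, means an unsupported table never reaches the native reader at all.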


- Venki


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38796/#review100878
-----------------------------------------------------------



Re: Review Request 38796: DRILL-3209: Support reading Hive tables using Drill's native parquet reader

Posted by Aman Sinha <as...@maprtech.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38796/#review100878
-----------------------------------------------------------



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java (line 64)
<https://reviews.apache.org/r/38796/#comment158200>

    The name of a function should not have a product name in it.



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java (line 63)
<https://reviews.apache.org/r/38796/#comment158130>

    Since the RecordCount is the same regardless of the type of the reader, we should not divide it by the factor.  Dividing the cpu cost and disk cost seems ok.



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java (line 123)
<https://reviews.apache.org/r/38796/#comment158197>

    Should complex type data be considered here, similar to ParquetScanBatchCreator?



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java (line 169)
<https://reviews.apache.org/r/38796/#comment158196>

    Typo: "Login" => "Logic"?


- Aman Sinha



Re: Review Request 38796: DRILL-3209: Support reading Hive tables using Drill's native parquet reader

Posted by Venki Korukanti <ve...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38796/
-----------------------------------------------------------

(Updated Sept. 27, 2015, 7:50 a.m.)


Review request for drill and Jinfeng Ni.


Repository: drill-git


Description
-------

Please see JIRA DRILL-3209 for details.


Diffs (updated)
-----

  contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionDescriptor.java 11c6455 
  contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java PRE-CREATION 
  contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java PRE-CREATION 
  contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetSubScan.java PRE-CREATION 
  contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java PRE-CREATION 
  contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScan.java 9ada569 
  contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveStoragePlugin.java 23aa37f 
  contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveSubScan.java 2181c2a 
  contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/DrillHiveTable.java b459ee4 
  contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHivePartitionPruning.java f0b4bdc 
  contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHiveProjectPushDown.java 6423a36 
  contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestHiveStorage.java 9211af6 
  contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestInfoSchemaOnHiveStorage.java 6118be5 
  contrib/storage-hive/core/src/test/java/org/apache/drill/exec/store/hive/HiveTestDataGenerator.java 34a7ed6 
  exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 0f6a5bb 
  exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java 118f7ad 

Diff: https://reviews.apache.org/r/38796/diff/


Testing
-------

Added unit tests covering reads of all supported types, project pushdown, and partition pruning. Manually tested with Hive tables containing a large amount of data (these tests will become part of the regression suite).


Thanks,

Venki Korukanti