Posted to dev@sqoop.apache.org by Boglarka Egyed <bo...@apache.org> on 2018/06/14 13:30:44 UTC

Re: Review Request 66548: Importing as ORC file to support full ACID Hive tables

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66548/#review204773
-----------------------------------------------------------



Hi Daniel,

Thanks for this improvement! Your change generally LGTM, and I also like the number of test cases you attached to it.

I have a couple of minor findings; please find them below. I also ran the unit tests, and the newly added TestOrcConversionContext failed for me with the following error message:

Testcase: testGetConverterDatetimeTypes took 0.011 sec
	Caused an ERROR
org.apache.orc.storage.serde2.io.DateWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DateWritable
java.lang.ClassCastException: org.apache.orc.storage.serde2.io.DateWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DateWritable
	at org.apache.sqoop.util.TestOrcConversionContext.testGetConverterDatetimeTypes(TestOrcConversionContext.java:97)

Could you please take a look?
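
For reference, as far as I can tell the ORC "nohive" artifact relocates the Hive storage-api classes, so org.apache.orc.storage.serde2.io.DateWritable and org.apache.hadoop.hive.serde2.io.DateWritable are unrelated types and a cast between them can never succeed. Purely as an illustration (not the actual fix, and the helper class below is made up), a conversion would have to copy the underlying day count instead of casting, assuming both jars are on the classpath:

    import org.apache.hadoop.hive.serde2.io.DateWritable;

    public class DateWritableConversionSketch {
        static DateWritable fromOrcDate(org.apache.orc.storage.serde2.io.DateWritable orcDate) {
            // both classes store the date as a day count since the epoch,
            // so copying that value is enough
            return new DateWritable(orcDate.getDays());
        }
    }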

Thanks in advance,
Bogi


src/java/org/apache/sqoop/util/OrcUtil.java
Lines 38-47 (patched)
<https://reviews.apache.org/r/66548/#comment287500>

    The input parameter 'overrides' is missing from the javadoc here.
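
    Something along these lines would do (hypothetical sketch only -- the method, its
    name and the other parameters are made up; the point is just the missing @param
    tag for 'overrides'):

        /**
         * Resolves the ORC type name to use for a column.
         *
         * @param colName     name of the column (hypothetical parameter)
         * @param defaultType the type derived from the table metadata (hypothetical parameter)
         * @param overrides   user-supplied per-column type overrides, checked before the default
         * @return the type name to use for the column
         */
        static String resolveType(String colName, String defaultType, java.util.Map<String, String> overrides) {
            return overrides.getOrDefault(colName, defaultType);
        }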



src/test/org/apache/sqoop/TestAllTables.java
Line 41 (original), 52 (patched)
<https://reviews.apache.org/r/66548/#comment287501>

    Could you please revert this to avoid wildcard import usage?
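
    I.e. keep the explicit form rather than the wildcard one (illustrative only, not
    the actual import list of the file):

        import org.junit.Test;    // explicit import -- preferred
        // import org.junit.*;    // wildcard import -- please avoid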



src/test/org/apache/sqoop/TestOrcImport.java
Lines 62-81 (patched)
<https://reviews.apache.org/r/66548/#comment287502>

    Would you mind using the newly introduced ArgumentArrayBuilder logic in the tests instead? It was added in https://reviews.apache.org/r/65607/
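
    Roughly something like this (a sketch only, assuming the withOption()/build()
    methods of org.apache.sqoop.testutil.ArgumentArrayBuilder from r/65607; the option
    names and the test helpers below are illustrative, not taken from this patch):

        String[] args = new ArgumentArrayBuilder()
            .withOption("connect", getConnectString())
            .withOption("table", getTableName())
            .withOption("as-orcfile")   // hypothetical flag name for the new ORC format
            .build();
        runImport(args);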


- Boglarka Egyed


On May 2, 2018, 12:12 p.m., daniel voros wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66548/
> -----------------------------------------------------------
> 
> (Updated May 2, 2018, 12:12 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3311
>     https://issues.apache.org/jira/browse/SQOOP-3311
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Hive 3 will introduce a switch (HIVE-18294) to create eligible tables as ACID by default. This will probably result in increased usage of ACID tables and the need to support importing into ACID tables with Sqoop.
> 
> Currently the only file format that supports full ACID tables is ORC.
> 
> The easiest and most effective way to support importing into these tables would be to write out the files as ORC and keep using LOAD DATA as we do for all other Hive tables (LOAD DATA into ACID tables has been supported since HIVE-17361).
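> 
> As a rough illustration of the idea (not the actual code in this patch; the table and path names below are made up), after the mappers have written the ORC files to a target directory, the Hive side only needs a single LOAD DATA statement:
> 
>     String orcDir = "/tmp/sqoop-import/orders";   // where the mappers wrote the ORC files
>     String loadStmt = "LOAD DATA INPATH '" + orcDir + "' INTO TABLE orders_acid";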
> 
> A workaround could be to create the table as a textfile (as before) and then CTAS from that. This would push the responsibility of creating the ORC format to Hive. However, it would result in writing every record twice: once in text format and once in ORC.
> 
> Note that ORC is only necessary for full ACID tables. Insert-only (a.k.a. micromanaged) ACID tables can use an arbitrary file format.
> 
> Supporting full ACID tables would also be the first step in making "lastmodified" incremental imports work with Hive.
> 
> 
> Diffs
> -----
> 
>   ivy.xml 1f587f3e 
>   ivy/libraries.properties 565a8bf5 
>   src/docs/man/import-common-args.txt 22e3448e 
>   src/docs/man/sqoop-import-all-tables.txt 6db38ad8 
>   src/docs/user/export-purpose.txt def6ead3 
>   src/docs/user/hcatalog.txt 2ae1d54d 
>   src/docs/user/help.txt 8a0d1477 
>   src/docs/user/import-all-tables.txt fbad47b2 
>   src/docs/user/import-purpose.txt c7eca606 
>   src/docs/user/import.txt 2d074f49 
>   src/java/org/apache/sqoop/SqoopOptions.java d9984af3 
>   src/java/org/apache/sqoop/hive/TableDefWriter.java 27d988c5 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 3b542102 
>   src/java/org/apache/sqoop/mapreduce/OrcImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java c62ee98c 
>   src/java/org/apache/sqoop/tool/ExportTool.java 060f2c07 
>   src/java/org/apache/sqoop/tool/ImportTool.java 2c474b7e 
>   src/java/org/apache/sqoop/util/OrcConversionContext.java PRE-CREATION 
>   src/java/org/apache/sqoop/util/OrcUtil.java PRE-CREATION 
>   src/test/org/apache/sqoop/TestAllTables.java 16933a82 
>   src/test/org/apache/sqoop/TestOrcImport.java PRE-CREATION 
>   src/test/org/apache/sqoop/hive/TestHiveServer2TextImport.java f6d591b7 
>   src/test/org/apache/sqoop/hive/TestTableDefWriter.java 3ea61f64 
>   src/test/org/apache/sqoop/util/TestOrcConversionContext.java PRE-CREATION 
>   src/test/org/apache/sqoop/util/TestOrcUtil.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66548/diff/11/
> 
> 
> Testing
> -------
> 
> - added some unit tests
> - tested basic Hive import scenarios on a cluster
> 
> 
> Thanks,
> 
> daniel voros
> 
>