Posted to dev@sqoop.apache.org by Qian Xu <qi...@intel.com> on 2015/04/11 19:31:54 UTC

Review Request 33104: SQOOP-Hive import with Parquet should append automatically

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33104/
-----------------------------------------------------------

Review request for Sqoop.


Bugs: SQOOP-2295
    https://issues.apache.org/jira/browse/SQOOP-2295


Repository: sqoop-trunk


Description
-------

Currently, importing into an existing dataset throws an exception. This differs from `--as-textfile`. I checked the user manual; the handling of HDFS and Hive is indeed different. For HDFS, unless `--append` is specified, the job fails when the destination already exists. For Hive, unless `--create-hive-table` is specified, the job runs in append mode. The patch makes the handling of `--as-textfile` and `--as-parquetfile` consistent. (Note that `--as-avrodatafile` is not supported; it could be supported similarly to Parquet in a follow-up JIRA.)
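
For illustration, the destination-handling rules described above can be sketched as follows. This is a hypothetical simplification, not the actual Sqoop code; the class, enum, and method names are invented for the example:

```java
// Hypothetical sketch only: ImportMode and these method names are invented
// for illustration and do not exist in the Sqoop code base.
public class HiveParquetModeSketch {
  enum ImportMode { CREATE, APPEND, OVERWRITE }

  // Hive import semantics described above: create when the dataset does not
  // exist; otherwise append, unless the user asked to overwrite.
  static ImportMode decideHiveParquetMode(boolean datasetExists,
                                          boolean hiveOverwrite) {
    if (!datasetExists) {
      return ImportMode.CREATE;
    }
    return hiveOverwrite ? ImportMode.OVERWRITE : ImportMode.APPEND;
  }

  // Plain HDFS import semantics: fail on an existing destination unless
  // --append was given.
  static ImportMode decideHdfsMode(boolean destinationExists, boolean append) {
    if (destinationExists && !append) {
      throw new IllegalStateException("Destination directory already exists");
    }
    return destinationExists ? ImportMode.APPEND : ImportMode.CREATE;
  }
}
```

The point of the patch, in these terms, is that the Hive/Parquet path now follows the Hive rule (append by default) rather than the HDFS rule (fail by default).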


Diffs
-----

  src/docs/man/hive-args.txt 7d9e427 
  src/docs/man/sqoop-create-hive-table.txt 7aebcc1 
  src/docs/user/create-hive-table.txt 3aa34fd 
  src/docs/user/hive-args.txt 53de92d 
  src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java e70d23c 
  src/java/org/apache/sqoop/mapreduce/ParquetJob.java df55dbc 

Diff: https://reviews.apache.org/r/33104/diff/


Testing
-------

Manually tested append, new create and overwrite cases.


Thanks,

Qian Xu


Re: Review Request 33104: SQOOP-Hive import with Parquet should append automatically

Posted by Abraham Elmahrek <ab...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33104/#review79927
-----------------------------------------------------------


Can you also add test cases for this in TestParquetImport?

- Abraham Elmahrek


On April 12, 2015, 7:26 a.m., Qian Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33104/
> -----------------------------------------------------------
> 
> (Updated April 12, 2015, 7:26 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-2295
>     https://issues.apache.org/jira/browse/SQOOP-2295
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Currently, importing into an existing dataset throws an exception. This differs from `--as-textfile`. I checked the user manual; the handling of HDFS and Hive is indeed different. For HDFS, unless `--append` is specified, the job fails when the destination already exists. For Hive, unless `--create-hive-table` is specified, the job runs in append mode. The patch makes the handling of `--as-textfile` and `--as-parquetfile` consistent.
> 
> 
> Diffs
> -----
> 
>   src/docs/man/hive-args.txt 7d9e427 
>   src/docs/man/sqoop-create-hive-table.txt 7aebcc1 
>   src/docs/user/create-hive-table.txt 3aa34fd 
>   src/docs/user/hive-args.txt 53de92d 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java e70d23c 
>   src/java/org/apache/sqoop/mapreduce/ParquetJob.java df55dbc 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java c97bb58 
> 
> Diff: https://reviews.apache.org/r/33104/diff/
> 
> 
> Testing
> -------
> 
> Manually tested append, new create and overwrite cases.
> 
> 
> Thanks,
> 
> Qian Xu
> 
>


Re: Review Request 33104: SQOOP-Hive import with Parquet should append automatically

Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33104/#review82349
-----------------------------------------------------------


The patch looks good to me. Just one question - it seems that it's causing two test failures on my machine:

Testcase: testFieldWithHiveDelims took 5.021 sec
Testcase: testGenerateOnly took 0.329 sec
Testcase: testHiveExitFails took 1.7 sec
Testcase: testDate took 1.695 sec
Testcase: testFieldWithHiveDelimsReplacement took 1.617 sec
Testcase: testCustomDelimiters took 1.598 sec
Testcase: testHiveDropAndReplaceOptionValidation took 0.036 sec
Testcase: testCreateOverwriteHiveImport took 0.103 sec
Testcase: testCreateOnlyHiveImport took 0.055 sec
Testcase: testAppendHiveImportAsParquet took 15.383 sec
	Caused an ERROR
null
java.util.NoSuchElementException
	at org.kitesdk.data.spi.filesystem.MultiFileDatasetReader.next(MultiFileDatasetReader.java:144)
	at com.cloudera.sqoop.hive.TestHiveImport.verifyHiveDataset(TestHiveImport.java:292)
	at com.cloudera.sqoop.hive.TestHiveImport.testAppendHiveImportAsParquet(TestHiveImport.java:383)

Testcase: testNormalHiveImport took 1.58 sec
Testcase: testNormalHiveImportAsParquet took 3.46 sec
Testcase: testImportWithBadPartitionKey took 3.068 sec
Testcase: testCreateOverwriteHiveImportAsParquet took 4.107 sec
	Caused an ERROR
Failure during job; return status 1
java.io.IOException: Failure during job; return status 1
	at com.cloudera.sqoop.testutil.ImportJobTestCase.runImport(ImportJobTestCase.java:236)
	at com.cloudera.sqoop.testutil.ImportJobTestCase.runImport(ImportJobTestCase.java:210)
	at com.cloudera.sqoop.hive.TestHiveImport.runImportTest(TestHiveImport.java:215)
	at com.cloudera.sqoop.hive.TestHiveImport.testCreateOverwriteHiveImportAsParquet(TestHiveImport.java:356)

Testcase: testImportHiveWithPartitions took 1.51 sec
Testcase: testNumeric took 1.476 sec

I'm wondering if you see the same failures, Stanley?

- Jarek Cecho


On May 3, 2015, 3:40 p.m., Qian Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33104/
> -----------------------------------------------------------
> 
> (Updated May 3, 2015, 3:40 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-2295
>     https://issues.apache.org/jira/browse/SQOOP-2295
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Currently, importing into an existing dataset throws an exception. This differs from `--as-textfile`. I checked the user manual; the handling of HDFS and Hive is indeed different. For HDFS, unless `--append` is specified, the job fails when the destination already exists. For Hive, unless `--create-hive-table` is specified, the job runs in append mode. The patch makes the handling of `--as-textfile` and `--as-parquetfile` consistent.
> 
> 
> Diffs
> -----
> 
>   src/docs/man/hive-args.txt 7d9e427 
>   src/docs/man/sqoop-create-hive-table.txt 7aebcc1 
>   src/docs/user/create-hive-table.txt 3aa34fd 
>   src/docs/user/hive-args.txt 53de92d 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java d5bfae2 
>   src/java/org/apache/sqoop/mapreduce/ParquetJob.java df55dbc 
>   src/test/com/cloudera/sqoop/hive/TestHiveImport.java fa717cb 
>   src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java 7934791 
>   testdata/hive/scripts/normalImportAsParquet.q e434e9b 
> 
> Diff: https://reviews.apache.org/r/33104/diff/
> 
> 
> Testing
> -------
> 
> Manually tested append, new create and overwrite cases.
> 
> 
> Thanks,
> 
> Qian Xu
> 
>


Re: Review Request 33104: SQOOP-Hive import with Parquet should append automatically

Posted by Qian Xu <qi...@intel.com>.

> On May 7, 2015, 9:46 a.m., Abraham Elmahrek wrote:
> > src/test/com/cloudera/sqoop/hive/TestHiveImport.java, lines 296-314
> > <https://reviews.apache.org/r/33104/diff/4-5/?file=948229#file948229line296>
> >
> >     Why not just create a Set object and compare sets? It should be much less code.

We would not be able to test a dataset with duplicated records if we used a set instead of a list. `List.remove` removes only the first matched element.
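
The difference can be seen in a small self-contained example (plain Java, independent of the test code under review; `sameRecords` is an invented helper, not part of TestHiveImport):

```java
import java.util.ArrayList;
import java.util.List;

public class DuplicateVerification {
  // Compare expected and actual records while honoring duplicates:
  // List.remove(Object) removes only the first matching element, so every
  // duplicate in 'actual' must be matched by a duplicate in 'expected'.
  static boolean sameRecords(List<String> expected, List<String> actual) {
    List<String> remaining = new ArrayList<>(expected);
    for (String record : actual) {
      if (!remaining.remove(record)) {
        return false;  // an unexpected record, or an extra duplicate
      }
    }
    return remaining.isEmpty();  // every expected record was consumed
  }
}
```

A set-based comparison would treat `[a, a, b]` and `[a, b]` as equal, so a missing or extra duplicate would go undetected.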


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33104/#review82786
-----------------------------------------------------------


On May 6, 2015, 1:49 p.m., Qian Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33104/
> -----------------------------------------------------------
> 
> (Updated May 6, 2015, 1:49 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-2295
>     https://issues.apache.org/jira/browse/SQOOP-2295
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Currently, importing into an existing dataset throws an exception. This differs from `--as-textfile`. I checked the user manual; the handling of HDFS and Hive is indeed different. For HDFS, unless `--append` is specified, the job fails when the destination already exists. For Hive, unless `--create-hive-table` is specified, the job runs in append mode. The patch makes the handling of `--as-textfile` and `--as-parquetfile` consistent.
> 
> 
> Diffs
> -----
> 
>   src/docs/man/hive-args.txt 7d9e427 
>   src/docs/man/sqoop-create-hive-table.txt 7aebcc1 
>   src/docs/user/create-hive-table.txt 3aa34fd 
>   src/docs/user/hive-args.txt 53de92d 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java d5bfae2 
>   src/java/org/apache/sqoop/mapreduce/ParquetJob.java df55dbc 
>   src/test/com/cloudera/sqoop/TestParquetImport.java 07e140a 
>   src/test/com/cloudera/sqoop/hive/TestHiveImport.java fa717cb 
>   src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java 7934791 
>   src/test/com/cloudera/sqoop/testutil/ImportJobTestCase.java 293bf10 
>   testdata/hive/scripts/normalImportAsParquet.q e434e9b 
> 
> Diff: https://reviews.apache.org/r/33104/diff/
> 
> 
> Testing
> -------
> 
> Manually tested append, new create and overwrite cases.
> 
> 
> Thanks,
> 
> Qian Xu
> 
>


Re: Review Request 33104: SQOOP-Hive import with Parquet should append automatically

Posted by Abraham Elmahrek <ab...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33104/#review82786
-----------------------------------------------------------



src/test/com/cloudera/sqoop/hive/TestHiveImport.java
<https://reviews.apache.org/r/33104/#comment133601>

    Why not just create a Set object and compare sets? It should be much less code.


- Abraham Elmahrek


On May 6, 2015, 5:49 a.m., Qian Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33104/
> -----------------------------------------------------------
> 
> (Updated May 6, 2015, 5:49 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-2295
>     https://issues.apache.org/jira/browse/SQOOP-2295
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Currently, importing into an existing dataset throws an exception. This differs from `--as-textfile`. I checked the user manual; the handling of HDFS and Hive is indeed different. For HDFS, unless `--append` is specified, the job fails when the destination already exists. For Hive, unless `--create-hive-table` is specified, the job runs in append mode. The patch makes the handling of `--as-textfile` and `--as-parquetfile` consistent.
> 
> 
> Diffs
> -----
> 
>   src/docs/man/hive-args.txt 7d9e427 
>   src/docs/man/sqoop-create-hive-table.txt 7aebcc1 
>   src/docs/user/create-hive-table.txt 3aa34fd 
>   src/docs/user/hive-args.txt 53de92d 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java d5bfae2 
>   src/java/org/apache/sqoop/mapreduce/ParquetJob.java df55dbc 
>   src/test/com/cloudera/sqoop/TestParquetImport.java 07e140a 
>   src/test/com/cloudera/sqoop/hive/TestHiveImport.java fa717cb 
>   src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java 7934791 
>   src/test/com/cloudera/sqoop/testutil/ImportJobTestCase.java 293bf10 
>   testdata/hive/scripts/normalImportAsParquet.q e434e9b 
> 
> Diff: https://reviews.apache.org/r/33104/diff/
> 
> 
> Testing
> -------
> 
> Manually tested append, new create and overwrite cases.
> 
> 
> Thanks,
> 
> Qian Xu
> 
>


Re: Review Request 33104: SQOOP-Hive import with Parquet should append automatically

Posted by Abraham Elmahrek <ab...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33104/#review82981
-----------------------------------------------------------

Ship it!


Ship It!

- Abraham Elmahrek


On May 6, 2015, 5:49 a.m., Qian Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33104/
> -----------------------------------------------------------
> 
> (Updated May 6, 2015, 5:49 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-2295
>     https://issues.apache.org/jira/browse/SQOOP-2295
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Currently, importing into an existing dataset throws an exception. This differs from `--as-textfile`. I checked the user manual; the handling of HDFS and Hive is indeed different. For HDFS, unless `--append` is specified, the job fails when the destination already exists. For Hive, unless `--create-hive-table` is specified, the job runs in append mode. The patch makes the handling of `--as-textfile` and `--as-parquetfile` consistent.
> 
> 
> Diffs
> -----
> 
>   src/docs/man/hive-args.txt 7d9e427 
>   src/docs/man/sqoop-create-hive-table.txt 7aebcc1 
>   src/docs/user/create-hive-table.txt 3aa34fd 
>   src/docs/user/hive-args.txt 53de92d 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java d5bfae2 
>   src/java/org/apache/sqoop/mapreduce/ParquetJob.java df55dbc 
>   src/test/com/cloudera/sqoop/TestParquetImport.java 07e140a 
>   src/test/com/cloudera/sqoop/hive/TestHiveImport.java fa717cb 
>   src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java 7934791 
>   src/test/com/cloudera/sqoop/testutil/ImportJobTestCase.java 293bf10 
>   testdata/hive/scripts/normalImportAsParquet.q e434e9b 
> 
> Diff: https://reviews.apache.org/r/33104/diff/
> 
> 
> Testing
> -------
> 
> Manually tested append, new create and overwrite cases.
> 
> 
> Thanks,
> 
> Qian Xu
> 
>


Re: Review Request 33104: SQOOP-Hive import with Parquet should append automatically

Posted by Qian Xu <qi...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33104/
-----------------------------------------------------------

(Updated May 6, 2015, 1:49 p.m.)


Review request for Sqoop.


Changes
-------

The new patch has two changes: (1) make sure the table directory is cleaned up in the setup stage for every test case, and (2) make sure the record verification is stable.
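
A minimal sketch of the setup-stage cleanup idea, using only the JDK. The actual change lives in the test utilities (BaseSqoopTestCase / TestHiveImport); the helper below is an invented stand-in:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class TableDirCleanup {
  // Recursively delete the table directory if it exists, so each test case
  // starts from a clean slate regardless of what earlier cases left behind.
  static void cleanTableDir(Path tableDir) throws IOException {
    if (!Files.exists(tableDir)) {
      return;  // nothing to clean; safe to call for every test
    }
    try (Stream<Path> paths = Files.walk(tableDir)) {
      paths.sorted(Comparator.reverseOrder())  // children before parents
           .forEach(p -> {
             try {
               Files.delete(p);
             } catch (IOException e) {
               throw new UncheckedIOException(e);
             }
           });
    }
  }
}
```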


Bugs: SQOOP-2295
    https://issues.apache.org/jira/browse/SQOOP-2295


Repository: sqoop-trunk


Description
-------

Currently, importing into an existing dataset throws an exception. This differs from `--as-textfile`. I checked the user manual; the handling of HDFS and Hive is indeed different. For HDFS, unless `--append` is specified, the job fails when the destination already exists. For Hive, unless `--create-hive-table` is specified, the job runs in append mode. The patch makes the handling of `--as-textfile` and `--as-parquetfile` consistent.


Diffs (updated)
-----

  src/docs/man/hive-args.txt 7d9e427 
  src/docs/man/sqoop-create-hive-table.txt 7aebcc1 
  src/docs/user/create-hive-table.txt 3aa34fd 
  src/docs/user/hive-args.txt 53de92d 
  src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java d5bfae2 
  src/java/org/apache/sqoop/mapreduce/ParquetJob.java df55dbc 
  src/test/com/cloudera/sqoop/TestParquetImport.java 07e140a 
  src/test/com/cloudera/sqoop/hive/TestHiveImport.java fa717cb 
  src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java 7934791 
  src/test/com/cloudera/sqoop/testutil/ImportJobTestCase.java 293bf10 
  testdata/hive/scripts/normalImportAsParquet.q e434e9b 

Diff: https://reviews.apache.org/r/33104/diff/


Testing
-------

Manually tested append, new create and overwrite cases.


Thanks,

Qian Xu


Re: Review Request 33104: SQOOP-Hive import with Parquet should append automatically

Posted by Qian Xu <qi...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33104/
-----------------------------------------------------------

(Updated May 3, 2015, 11:40 p.m.)


Review request for Sqoop.


Changes
-------

Removed the `--create-hive-table` related code per Jarcec's comments.


Bugs: SQOOP-2295
    https://issues.apache.org/jira/browse/SQOOP-2295


Repository: sqoop-trunk


Description
-------

Currently, importing into an existing dataset throws an exception. This differs from `--as-textfile`. I checked the user manual; the handling of HDFS and Hive is indeed different. For HDFS, unless `--append` is specified, the job fails when the destination already exists. For Hive, unless `--create-hive-table` is specified, the job runs in append mode. The patch makes the handling of `--as-textfile` and `--as-parquetfile` consistent.


Diffs (updated)
-----

  src/docs/man/hive-args.txt 7d9e427 
  src/docs/man/sqoop-create-hive-table.txt 7aebcc1 
  src/docs/user/create-hive-table.txt 3aa34fd 
  src/docs/user/hive-args.txt 53de92d 
  src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java d5bfae2 
  src/java/org/apache/sqoop/mapreduce/ParquetJob.java df55dbc 
  src/test/com/cloudera/sqoop/hive/TestHiveImport.java fa717cb 
  src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java 7934791 
  testdata/hive/scripts/normalImportAsParquet.q e434e9b 

Diff: https://reviews.apache.org/r/33104/diff/


Testing
-------

Manually tested append, new create and overwrite cases.


Thanks,

Qian Xu


Re: Review Request 33104: SQOOP-Hive import with Parquet should append automatically

Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33104/#review82254
-----------------------------------------------------------


Just few notes:


src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java
<https://reviews.apache.org/r/33104/#comment132951>

    Honestly, this is the first time I'm seeing the "doFailIfHiveTableExists" method :) It seems unused in the current Sqoop code base, so I'm wondering whether it would be better not to use it here (and perhaps drop it completely in a different JIRA).



src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java
<https://reviews.apache.org/r/33104/#comment132950>

    Just a note for the log output: I believe the semantics of --create-hive-table are: create the table if it doesn't exist, and do nothing if it does exist.
    
    I'm wondering whether the comment should just mention that this will "append" to existing Hive tables, and that the user might consider --hive-overwrite if a rewrite is needed? E.g. no mention of --create-hive-table. What do you think?



src/java/org/apache/sqoop/tool/BaseSqoopTool.java
<https://reviews.apache.org/r/33104/#comment132952>

    The --append parameter doesn't really make sense with Hive, because a Hive import behaves differently than an HDFS one, right?
    
    It's quite unfortunate, but it seems better to preserve the check so as not to confuse people even more?


Jarcec

- Jarek Cecho


On April 30, 2015, 5:54 p.m., Qian Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33104/
> -----------------------------------------------------------
> 
> (Updated April 30, 2015, 5:54 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-2295
>     https://issues.apache.org/jira/browse/SQOOP-2295
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Currently, importing into an existing dataset throws an exception. This differs from `--as-textfile`. I checked the user manual; the handling of HDFS and Hive is indeed different. For HDFS, unless `--append` is specified, the job fails when the destination already exists. For Hive, unless `--create-hive-table` is specified, the job runs in append mode. The patch makes the handling of `--as-textfile` and `--as-parquetfile` consistent.
> 
> 
> Diffs
> -----
> 
>   src/docs/man/hive-args.txt 7d9e427 
>   src/docs/man/sqoop-create-hive-table.txt 7aebcc1 
>   src/docs/user/create-hive-table.txt 3aa34fd 
>   src/docs/user/hive-args.txt 53de92d 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java d5bfae2 
>   src/java/org/apache/sqoop/mapreduce/ParquetJob.java df55dbc 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java c97bb58 
>   src/test/com/cloudera/sqoop/hive/TestHiveImport.java fa717cb 
>   src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java 7934791 
>   testdata/hive/scripts/normalImportAsParquet.q e434e9b 
> 
> Diff: https://reviews.apache.org/r/33104/diff/
> 
> 
> Testing
> -------
> 
> Manually tested append, new create and overwrite cases.
> 
> 
> Thanks,
> 
> Qian Xu
> 
>


Re: Review Request 33104: SQOOP-Hive import with Parquet should append automatically

Posted by Qian Xu <qi...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33104/
-----------------------------------------------------------

(Updated May 1, 2015, 1:54 a.m.)


Review request for Sqoop.


Bugs: SQOOP-2295
    https://issues.apache.org/jira/browse/SQOOP-2295


Repository: sqoop-trunk


Description
-------

Currently, importing into an existing dataset throws an exception. This differs from `--as-textfile`. I checked the user manual; the handling of HDFS and Hive is indeed different. For HDFS, unless `--append` is specified, the job fails when the destination already exists. For Hive, unless `--create-hive-table` is specified, the job runs in append mode. The patch makes the handling of `--as-textfile` and `--as-parquetfile` consistent.


Diffs (updated)
-----

  src/docs/man/hive-args.txt 7d9e427 
  src/docs/man/sqoop-create-hive-table.txt 7aebcc1 
  src/docs/user/create-hive-table.txt 3aa34fd 
  src/docs/user/hive-args.txt 53de92d 
  src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java d5bfae2 
  src/java/org/apache/sqoop/mapreduce/ParquetJob.java df55dbc 
  src/java/org/apache/sqoop/tool/BaseSqoopTool.java c97bb58 
  src/test/com/cloudera/sqoop/hive/TestHiveImport.java fa717cb 
  src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java 7934791 
  testdata/hive/scripts/normalImportAsParquet.q e434e9b 

Diff: https://reviews.apache.org/r/33104/diff/


Testing
-------

Manually tested append, new create and overwrite cases.


Thanks,

Qian Xu


Re: Review Request 33104: SQOOP-Hive import with Parquet should append automatically

Posted by Qian Xu <qi...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33104/
-----------------------------------------------------------

(Updated April 12, 2015, 3:26 p.m.)


Review request for Sqoop.


Bugs: SQOOP-2295
    https://issues.apache.org/jira/browse/SQOOP-2295


Repository: sqoop-trunk


Description (updated)
-------

Currently, importing into an existing dataset throws an exception. This differs from `--as-textfile`. I checked the user manual; the handling of HDFS and Hive is indeed different. For HDFS, unless `--append` is specified, the job fails when the destination already exists. For Hive, unless `--create-hive-table` is specified, the job runs in append mode. The patch makes the handling of `--as-textfile` and `--as-parquetfile` consistent.


Diffs (updated)
-----

  src/docs/man/hive-args.txt 7d9e427 
  src/docs/man/sqoop-create-hive-table.txt 7aebcc1 
  src/docs/user/create-hive-table.txt 3aa34fd 
  src/docs/user/hive-args.txt 53de92d 
  src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java e70d23c 
  src/java/org/apache/sqoop/mapreduce/ParquetJob.java df55dbc 
  src/java/org/apache/sqoop/tool/BaseSqoopTool.java c97bb58 

Diff: https://reviews.apache.org/r/33104/diff/


Testing
-------

Manually tested append, new create and overwrite cases.


Thanks,

Qian Xu