Posted to dev@sqoop.apache.org by Venkat Ranganathan <n....@live.com> on 2013/06/02 22:33:20 UTC

Re: Review Request: SQOOP-931 - Integration of Sqoop and HCatalog

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10688/
-----------------------------------------------------------

(Updated June 2, 2013, 8:33 p.m.)


Review request for Sqoop and Jarek Cecho.


Changes
-------

The following are the changes in this version:

All review comments addressed
Two new tests to check for invalid options (--as-avrofile and --as-sequencefile) with HCatalog jobs
Moved HCatalog tests to integration tests temporarily, pending the release of HCatalog artifacts for Hadoop 2.x
Added HCatalog docs to the user guide
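
As a rough illustration of the invalid-option check mentioned above, a validation step could look like the sketch below. This is not the code from the patch: the class name HCatOptionCheck and the method findConflict are hypothetical, and the only details taken from this thread are the option names --hcatalog-table, --as-avrofile and --as-sequencefile.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not the validation code from the patch. */
final class HCatOptionCheck {
    // File-format options that the change list says are invalid for HCatalog jobs.
    private static final List<String> UNSUPPORTED =
        Arrays.asList("--as-avrofile", "--as-sequencefile");

    /** Returns the first conflicting option, or null when the arguments are compatible. */
    static String findConflict(List<String> args) {
        if (!args.contains("--hcatalog-table")) {
            return null; // not an HCatalog job, so there is nothing to check
        }
        for (String arg : args) {
            if (UNSUPPORTED.contains(arg)) {
                return arg;
            }
        }
        return null;
    }

    public static void main(String[] argv) {
        String conflict = findConflict(Arrays.asList(argv));
        System.out.println(conflict == null
            ? "options look compatible"
            : "not allowed with HCatalog jobs: " + conflict);
    }
}
```

A real check would more likely work on the parsed options object than on raw arguments; the raw list is used here only to keep the sketch self-contained.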


Description
-------

This patch implements the new feature of integrating HCatalog and Sqoop. With this feature, it is possible to import and export data between Sqoop and HCatalog tables. The document attached to the SQOOP-931 JIRA issue discusses the high-level approaches.

With this integration, more fidelity can be brought to the process of moving data between enterprise data stores and the Hadoop ecosystem.


Diffs (updated)
-----

  build.xml 636c103 
  ivy.xml 1fa4dd1 
  ivy/ivysettings.xml c4cc561 
  src/docs/user/SqoopUserGuide.txt 01ac1cf 
  src/docs/user/hcatalog.txt PRE-CREATION 
  src/java/org/apache/sqoop/SqoopOptions.java f18d43e 
  src/java/org/apache/sqoop/config/ConfigurationConstants.java 5354063 
  src/java/org/apache/sqoop/hive/HiveImport.java 838f083 
  src/java/org/apache/sqoop/manager/ConnManager.java a1ac38e 
  src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java ef1d363 
  src/java/org/apache/sqoop/mapreduce/ExportJobBase.java 1065d0b 
  src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 2465f3f 
  src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java 20636a0 
  src/java/org/apache/sqoop/mapreduce/JobBase.java 0df1156 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportMapper.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatImportMapper.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatInputSplit.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatRecordReader.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java PRE-CREATION 
  src/java/org/apache/sqoop/tool/BaseSqoopTool.java 42f521f 
  src/java/org/apache/sqoop/tool/CodeGenTool.java dd34a97 
  src/java/org/apache/sqoop/tool/ExportTool.java 215addd 
  src/java/org/apache/sqoop/tool/ImportTool.java 2627726 
  src/perftest/ExportStressTest.java 0a41408 
  src/test/com/cloudera/sqoop/hive/TestHiveImport.java 462ccf1 
  src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java cf41b96 
  src/test/com/cloudera/sqoop/testutil/ExportJobTestCase.java e13f3df 
  src/test/org/apache/sqoop/hcat/HCatalogExportTest.java PRE-CREATION 
  src/test/org/apache/sqoop/hcat/HCatalogImportTest.java PRE-CREATION 
  src/test/org/apache/sqoop/hcat/HCatalogTestUtils.java PRE-CREATION 
  src/test/org/apache/sqoop/hcat/TestHCatalogBasic.java PRE-CREATION 
  testdata/hcatalog/conf/hive-log4j.properties PRE-CREATION 
  testdata/hcatalog/conf/hive-site.xml PRE-CREATION 
  testdata/hcatalog/conf/log4j.properties PRE-CREATION 

Diff: https://reviews.apache.org/r/10688/diff/


Testing
-------

Two new integration test suites with more than 20 tests in total have been added to test various aspects of the integration. A unit test for option management has also been added. All tests pass.


Thanks,

Venkat Ranganathan


Re: Review Request: SQOOP-931 - Integration of Sqoop and HCatalog

Posted by Venkat Ranganathan <n....@live.com>.

> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > Hi Venkat,
> > Thank you for incorporating my comments, greatly appreciated. I've taken a deep look again and I have the following additional comments:
> > 
> > 1) Can we add the HCatalog tests to the ThirdPartyTest suite? https://github.com/apache/sqoop/blob/trunk/src/test/com/cloudera/sqoop/ThirdPartyTests.java
> > 
> > 2) It seems that using --create-hcatalog-table will create the table and exit Sqoop without doing the import:
> > 
> > [root@bousa-hcat ~]# sqoop import --connect jdbc:mysql://mysql.ent.cloudera.com/sqoop --username sqoop --password sqoop --table text --hcatalog-table text --create-hcatalog-table
> > 13/06/04 15:44:39 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
> > 13/06/04 15:44:39 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
> > 13/06/04 15:44:39 INFO tool.CodeGenTool: Beginning code generation
> > 13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1
> > 13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1
> > 13/06/04 15:44:39 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
> > 13/06/04 15:44:39 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core.jar
> > Note: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.java uses or overrides a deprecated API.
> > Note: Recompile with -Xlint:deprecation for details.
> > 13/06/04 15:44:42 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.jar
> > 13/06/04 15:44:42 WARN manager.MySQLManager: It looks like you are importing from mysql.
> > 13/06/04 15:44:42 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
> > 13/06/04 15:44:42 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
> > 13/06/04 15:44:42 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
> > 13/06/04 15:44:42 INFO mapreduce.ImportJobBase: Beginning import of text
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog for import job
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog specific details for job
> > 13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: Hive home is not set. job may fail if needed jar files are not found correctly.  Please set HIVE_HOME in sqoop-env.sh or provide --hive-home option.  Setting HIVE_HOME  to /usr/lib/hive
> > 13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: HCatalog home is not set. job may fail if needed jar files are not found correctly.  Please set HCAT_HOME in sqoop-env.sh or provide --hcatalog-home option.   Setting HCAT_HOME to /usr/lib/hcatalog
> > 13/06/04 15:44:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column names projected : [id, txt]
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column name - type map :
> >         Names: [id, txt]
> >         Types : [4, 12]
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Creating HCatalog table default.text for import
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: HCatalog Create table statement: 
> > 
> > create table default.text (
> >         id int,
> >         txt string)
> > stored as rcfile
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Executing HCatalog CLI in-process.
> > Hive history file=/tmp/root/hive_job_log_65f4f145-0b1e-4e09-8e40-b7edcfc15f83_2077084453.txt
> > OK
> > Time taken: 25.121 seconds
> > [root@bousa-hcat ~]#
> > 
> >
> 
> Venkat Ranganathan wrote:
>     Sure, I can add it to that.
>     
>     --create-hcatalog-table: it seems to work by chance. That is, after creating the table, a number of steps run that are not needed. I will add additional checks there.

Sorry, I misunderstood your observation. There is even a test case for this. I thought you were saying that using --create-hcatalog-table alone also works like the --create-hive-table option without a Hive import. Let me recheck this.

Thanks


- Venkat


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10688/#review21420
-----------------------------------------------------------


On June 3, 2013, 4:16 a.m., Venkat Ranganathan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10688/
> -----------------------------------------------------------
> 
> (Updated June 3, 2013, 4:16 a.m.)
> 
> 
> Review request for Sqoop and Jarek Cecho.
> 
> 
> Description
> -------
> 
> This patch implements the new feature of integrating HCatalog and Sqoop. With this feature, it is possible to import and export data between Sqoop and HCatalog tables. The document attached to the SQOOP-931 JIRA issue discusses the high-level approaches.
> 
> With this integration, more fidelity can be brought to the process of moving data between enterprise data stores and the Hadoop ecosystem.
> 
> 
> Diffs
> -----
> 
>   build.xml 636c103 
>   ivy.xml 1fa4dd1 
>   ivy/ivysettings.xml c4cc561 
>   src/docs/user/SqoopUserGuide.txt 01ac1cf 
>   src/docs/user/hcatalog.txt PRE-CREATION 
>   src/java/org/apache/sqoop/SqoopOptions.java f18d43e 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 5354063 
>   src/java/org/apache/sqoop/hive/HiveImport.java 838f083 
>   src/java/org/apache/sqoop/manager/ConnManager.java a1ac38e 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java ef1d363 
>   src/java/org/apache/sqoop/mapreduce/ExportJobBase.java 1065d0b 
>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 2465f3f 
>   src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java 20636a0 
>   src/java/org/apache/sqoop/mapreduce/JobBase.java 0df1156 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatInputSplit.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatRecordReader.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 42f521f 
>   src/java/org/apache/sqoop/tool/CodeGenTool.java dd34a97 
>   src/java/org/apache/sqoop/tool/ExportTool.java 215addd 
>   src/java/org/apache/sqoop/tool/ImportTool.java 2627726 
>   src/perftest/ExportStressTest.java 0a41408 
>   src/test/com/cloudera/sqoop/hive/TestHiveImport.java 462ccf1 
>   src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java cf41b96 
>   src/test/com/cloudera/sqoop/testutil/ExportJobTestCase.java e13f3df 
>   src/test/org/apache/sqoop/hcat/HCatalogExportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/HCatalogImportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/HCatalogTestUtils.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/TestHCatalogBasic.java PRE-CREATION 
>   testdata/hcatalog/conf/hive-log4j.properties PRE-CREATION 
>   testdata/hcatalog/conf/hive-site.xml PRE-CREATION 
>   testdata/hcatalog/conf/log4j.properties PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/10688/diff/
> 
> 
> Testing
> -------
> 
> Two new integration test suites with more than 20 tests in total have been added to test various aspects of the integration.  A unit test to test the option management is also added.   All tests pass
> 
> 
> Thanks,
> 
> Venkat Ranganathan
> 
>


Re: Review Request: SQOOP-931 - Integration of Sqoop and HCatalog

Posted by Venkat Ranganathan <n....@live.com>.

> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > Hi Venkat,
> > Thank you for incorporating my comments, greatly appreciated. I've taken a deep look again and I have the following additional comments:
> > 
> > 1) Can we add the HCatalog tests to the ThirdPartyTest suite? https://github.com/apache/sqoop/blob/trunk/src/test/com/cloudera/sqoop/ThirdPartyTests.java
> > 
> > 2) It seems that using --create-hcatalog-table will create the table and exit Sqoop without doing the import:
> > 
> > [root@bousa-hcat ~]# sqoop import --connect jdbc:mysql://mysql.ent.cloudera.com/sqoop --username sqoop --password sqoop --table text --hcatalog-table text --create-hcatalog-table
> > 13/06/04 15:44:39 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
> > 13/06/04 15:44:39 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
> > 13/06/04 15:44:39 INFO tool.CodeGenTool: Beginning code generation
> > 13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1
> > 13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1
> > 13/06/04 15:44:39 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
> > 13/06/04 15:44:39 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core.jar
> > Note: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.java uses or overrides a deprecated API.
> > Note: Recompile with -Xlint:deprecation for details.
> > 13/06/04 15:44:42 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.jar
> > 13/06/04 15:44:42 WARN manager.MySQLManager: It looks like you are importing from mysql.
> > 13/06/04 15:44:42 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
> > 13/06/04 15:44:42 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
> > 13/06/04 15:44:42 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
> > 13/06/04 15:44:42 INFO mapreduce.ImportJobBase: Beginning import of text
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog for import job
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog specific details for job
> > 13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: Hive home is not set. job may fail if needed jar files are not found correctly.  Please set HIVE_HOME in sqoop-env.sh or provide --hive-home option.  Setting HIVE_HOME  to /usr/lib/hive
> > 13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: HCatalog home is not set. job may fail if needed jar files are not found correctly.  Please set HCAT_HOME in sqoop-env.sh or provide --hcatalog-home option.   Setting HCAT_HOME to /usr/lib/hcatalog
> > 13/06/04 15:44:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column names projected : [id, txt]
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column name - type map :
> >         Names: [id, txt]
> >         Types : [4, 12]
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Creating HCatalog table default.text for import
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: HCatalog Create table statement: 
> > 
> > create table default.text (
> >         id int,
> >         txt string)
> > stored as rcfile
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Executing HCatalog CLI in-process.
> > Hive history file=/tmp/root/hive_job_log_65f4f145-0b1e-4e09-8e40-b7edcfc15f83_2077084453.txt
> > OK
> > Time taken: 25.121 seconds
> > [root@bousa-hcat ~]#
> > 
> >
> 
> Venkat Ranganathan wrote:
>     Sure, I can add it to that.
>     
>     --create-hcatalog-table: it seems to work by chance. That is, after creating the table, a number of steps run that are not needed. I will add additional checks there.
> 
> Venkat Ranganathan wrote:
>     Sorry, I misunderstood your observation. There is even a test case for this. I thought you were saying that using --create-hcatalog-table alone also works like the --create-hive-table option without a Hive import. Let me recheck this.
>     
>     Thanks
> 
> Jarek Cecho wrote:
>     Hi Venkat,
>     please accept my apology for the confusion and let me explain a bit better. I've noticed that when I use the --create-hcatalog-table parameter, the logger gets reconfigured and no Sqoop log output is available after the table is created. Notice that there is no log after the "Time taken...".

Yes, that is what I am debugging. My system tests on a real cluster passed, but they only compared the results of the action. There are a few issues. Hive does not ship with a logging configuration in place: only a template is provided, and it is not helpful until the user creates a logger configuration. I tried to pass in the Hive logging configuration on the command line, but then we would have to pass in a whole lot of other things on the command line as well. So I have decided to disable inline execution of HCat scripts in real usage mode and to support inline execution for tests only; the configuration files I have already checked in should help with that.

BTW, I think this is also an issue with HiveImport. There it is the last part of the import, so it is OK, but we will still have issues with any output there.
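
One way to picture the non-inline mode described above is to launch the HCatalog CLI as a separate process, so that its logging setup cannot reconfigure the logger of the Sqoop JVM. The sketch below is an assumption-laden illustration, not the patch's implementation: the class ExternalHCatSketch and its methods are hypothetical, the launcher is assumed to live at bin/hcat under the HCatalog home, and the "-e" flag is the HCatalog CLI option for passing a statement.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not the code from the patch. */
final class ExternalHCatSketch {

    /** Builds the command line; assumes the launcher is at <hcatHome>/bin/hcat. */
    static List<String> buildCommand(String hcatHome, String ddl) {
        String launcher = new File(new File(hcatHome, "bin"), "hcat").getPath();
        return Arrays.asList(launcher, "-e", ddl);
    }

    /** Runs the command in a child process and returns its exit code. */
    static int run(List<String> command) throws Exception {
        // The child's stdout/stderr are passed through, but its logging
        // configuration stays in the child and does not touch this JVM.
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Only prints the command; calling run() requires an HCatalog install.
        System.out.println(buildCommand("/usr/lib/hcatalog",
            "create table default.text (id int, txt string) stored as rcfile"));
    }
}
```

The trade-off is the one hinted at above: an external process needs its own environment and classpath set up correctly, whereas in-process execution inherits whatever Sqoop already loaded.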


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, line 491
> > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line491>
> >
> >     Both Hive and HBase are idempotent when creating tables, so it might make sense to add "IF NOT EXISTS" in order to remain consistent.
> 
> Venkat Ranganathan wrote:
>     Good point. I think we would otherwise fail earlier, but for consistency I think we should do this. Will change.

I went through this. We use --create-hcatalog-table to mean that the table has to be created, on the assumption that the table does not exist yet. We will fail if the table is already there. So I have decided to leave this as it is, but I have added a test case to test this scenario specifically.
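
To make the two behaviours under discussion concrete, the sketch below builds a create-table statement shaped like the one in the quoted log, with an optional "if not exists" clause. It is illustrative only: the class CreateTableSketch and the method createTableDdl are hypothetical and do not reflect how SqoopHCatUtilities actually generates its statement. Without the clause, Hive rejects the statement when the table already exists, which matches the fail-if-present semantics described above; with it, the statement becomes idempotent as the review comment suggested.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not the code from the patch. */
final class CreateTableSketch {

    /** Builds a DDL string; columns maps column name to Hive type in order. */
    static String createTableDdl(String db, String table,
                                 Map<String, String> columns, boolean ifNotExists) {
        StringBuilder sb = new StringBuilder("create table ");
        if (ifNotExists) {
            sb.append("if not exists "); // makes the statement idempotent
        }
        sb.append(db).append('.').append(table).append(" (\n");
        boolean first = true;
        for (Map.Entry<String, String> column : columns.entrySet()) {
            if (!first) {
                sb.append(",\n");
            }
            sb.append("        ").append(column.getKey())
              .append(' ').append(column.getValue());
            first = false;
        }
        sb.append(")\nstored as rcfile");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<String, String>();
        columns.put("id", "int");
        columns.put("txt", "string");
        System.out.println(createTableDdl("default", "text", columns, false));
        System.out.println(createTableDdl("default", "text", columns, true));
    }
}
```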


- Venkat


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10688/#review21420
-----------------------------------------------------------




Re: Review Request: SQOOP-931 - Integration of Sqoop and HCatalog

Posted by Jarek Cecho <ja...@apache.org>.

> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > Hi Venkat,
> > Thank you for incorporating my comments, greatly appreciated. I've taken a deep look again and I have the following additional comments:
> > 
> > 1) Can we add the HCatalog tests to the ThirdPartyTest suite? https://github.com/apache/sqoop/blob/trunk/src/test/com/cloudera/sqoop/ThirdPartyTests.java
> > 
> > 2) It seems that using --create-hcatalog-table will create the table and exit Sqoop without doing the import:
> > 
> > [root@bousa-hcat ~]# sqoop import --connect jdbc:mysql://mysql.ent.cloudera.com/sqoop --username sqoop --password sqoop --table text --hcatalog-table text --create-hcatalog-table
> > 13/06/04 15:44:39 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
> > 13/06/04 15:44:39 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
> > 13/06/04 15:44:39 INFO tool.CodeGenTool: Beginning code generation
> > 13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1
> > 13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1
> > 13/06/04 15:44:39 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
> > 13/06/04 15:44:39 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core.jar
> > Note: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.java uses or overrides a deprecated API.
> > Note: Recompile with -Xlint:deprecation for details.
> > 13/06/04 15:44:42 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.jar
> > 13/06/04 15:44:42 WARN manager.MySQLManager: It looks like you are importing from mysql.
> > 13/06/04 15:44:42 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
> > 13/06/04 15:44:42 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
> > 13/06/04 15:44:42 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
> > 13/06/04 15:44:42 INFO mapreduce.ImportJobBase: Beginning import of text
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog for import job
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog specific details for job
> > 13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: Hive home is not set. job may fail if needed jar files are not found correctly.  Please set HIVE_HOME in sqoop-env.sh or provide --hive-home option.  Setting HIVE_HOME  to /usr/lib/hive
> > 13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: HCatalog home is not set. job may fail if needed jar files are not found correctly.  Please set HCAT_HOME in sqoop-env.sh or provide --hcatalog-home option.   Setting HCAT_HOME to /usr/lib/hcatalog
> > 13/06/04 15:44:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column names projected : [id, txt]
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column name - type map :
> >         Names: [id, txt]
> >         Types : [4, 12]
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Creating HCatalog table default.text for import
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: HCatalog Create table statement: 
> > 
> > create table default.text (
> >         id int,
> >         txt string)
> > stored as rcfile
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Executing HCatalog CLI in-process.
> > Hive history file=/tmp/root/hive_job_log_65f4f145-0b1e-4e09-8e40-b7edcfc15f83_2077084453.txt
> > OK
> > Time taken: 25.121 seconds
> > [root@bousa-hcat ~]#
> > 
> >
> 
> Venkat Ranganathan wrote:
>     Sure, I can add it to that.
>     
>     --create-hcatalog-table: it seems to work by chance. That is, after creating the table, a number of steps run that are not needed. I will add additional checks there.
> 
> Venkat Ranganathan wrote:
>     Sorry, I misunderstood your observation. There is even a test case for this. I thought you were saying that using --create-hcatalog-table alone also works like the --create-hive-table option without a Hive import. Let me recheck this.
>     
>     Thanks

Hi Venkat,
please accept my apology for the confusion and let me explain a bit better. I've noticed that when I use the --create-hcatalog-table parameter, the logger gets reconfigured and no Sqoop log output is available after the table is created. Notice that there is no log after the "Time taken...".


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java, lines 131-137
> > <https://reviews.apache.org/r/10688/diff/9/?file=299874#file299874line131>
> >
> >     This method seems to be required only for the debug message. Is it the only reason or did I miss something?
> 
> Venkat Ranganathan wrote:
>     Yes, it is needed for debugging purposes, when we want to know whether the sub record reader or the main record reader is called.

I see, thank you.


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, line 523
> > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line523>
> >
> >     It seems that at this point we are not reading the Hive configuration files, yet we execute the in-process Hive CLI. As a result, the CLI will not pick up the configuration files and will use defaults, which is inconsistent with the executed MapReduce job, which will use the proper configuration files. The table will therefore be created in a different metastore than the one into which we are importing data.
> 
> Venkat Ranganathan wrote:
>     The Hive and HCatalog configuration files and jars have to be in the classpath brought in by hcat -classpath. Do you think that is not always in the configuration? When I update the configure-sqoop script, I will make sure the Hive conf directory is added.

Yeah it seems that HCatalog 0.5.0 is not putting the hive configuration directory in the classpath - at least in my environment.
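
A quick way to check whether the Hive configuration directory is actually on the classpath in a given environment is to ask the class loader for hive-site.xml, since Hive locates that file as a classpath resource. The probe below is only a diagnostic sketch; the class HiveConfProbe and the method locate are hypothetical names and are not part of Sqoop or HCatalog.

```java
import java.net.URL;

/** Hypothetical diagnostic sketch; not part of Sqoop or HCatalog. */
final class HiveConfProbe {

    /** Returns the URL of the named classpath resource, or null if it is not visible. */
    static URL locate(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = HiveConfProbe.class.getClassLoader();
        }
        return loader.getResource(resourceName);
    }

    public static void main(String[] args) {
        URL hiveSite = locate("hive-site.xml");
        if (hiveSite == null) {
            System.out.println("hive-site.xml is NOT on the classpath; defaults would be used");
        } else {
            System.out.println("hive-site.xml found at " + hiveSite);
        }
    }
}
```

Running this with the same classpath that Sqoop ends up with would show whether the in-process CLI can see the intended metastore settings.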


- Jarek


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10688/#review21420
-----------------------------------------------------------




Re: Review Request: SQOOP-931 - Integration of Sqoop and HCatalog

Posted by Venkat Ranganathan <n....@live.com>.

> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > Hi Venkat,
> > Thank you for incorporating my comments, greatly appreciated. I've took a deep look again and I do have following additional comments:
> > 
> > 1) Can we add the HCatalog tests into ThirdPartyTest suite? https://github.com/apache/sqoop/blob/trunk/src/test/com/cloudera/sqoop/ThirdPartyTests.java
> > 
> > 2) It seems that using --create-hcatalog-table will create the table and exit Sqoop without doing the import:
> > 
> > [root@bousa-hcat ~]# sqoop import --connect jdbc:mysql://mysql.ent.cloudera.com/sqoop --username sqoop --password sqoop --table text --hcatalog-table text --create-hcatalog-table
> > 13/06/04 15:44:39 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
> > 13/06/04 15:44:39 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
> > 13/06/04 15:44:39 INFO tool.CodeGenTool: Beginning code generation
> > 13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1
> > 13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1
> > 13/06/04 15:44:39 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
> > 13/06/04 15:44:39 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core.jar
> > Note: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.java uses or overrides a deprecated API.
> > Note: Recompile with -Xlint:deprecation for details.
> > 13/06/04 15:44:42 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.jar
> > 13/06/04 15:44:42 WARN manager.MySQLManager: It looks like you are importing from mysql.
> > 13/06/04 15:44:42 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
> > 13/06/04 15:44:42 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
> > 13/06/04 15:44:42 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
> > 13/06/04 15:44:42 INFO mapreduce.ImportJobBase: Beginning import of text
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog for import job
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog specific details for job
> > 13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: Hive home is not set. job may fail if needed jar files are not found correctly.  Please set HIVE_HOME in sqoop-env.sh or provide --hive-home option.  Setting HIVE_HOME  to /usr/lib/hive
> > 13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: HCatalog home is not set. job may fail if needed jar files are not found correctly.  Please set HCAT_HOME in sqoop-env.sh or provide --hcatalog-home option.   Setting HCAT_HOME to /usr/lib/hcatalog
> > 13/06/04 15:44:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column names projected : [id, txt]
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column name - type map :
> >         Names: [id, txt]
> >         Types : [4, 12]
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Creating HCatalog table default.text for import
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: HCatalog Create table statement: 
> > 
> > create table default.text (
> >         id int,
> >         txt string)
> > stored as rcfile
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Executing HCatalog CLI in-process.
> > Hive history file=/tmp/root/hive_job_log_65f4f145-0b1e-4e09-8e40-b7edcfc15f83_2077084453.txt
> > OK
> > Time taken: 25.121 seconds
> > [root@bousa-hcat ~]#
> > 
> >

Sure, I can add it to that.

--create-hcatalog-table: it seems to work only by chance. That is, after creating the table, a number of steps are performed that are not needed. I will add additional checks there.


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/docs/user/hcatalog.txt, lines 284-285
> > <https://reviews.apache.org/r/10688/diff/9/?file=299864#file299864line284>
> >
> >     This seems unnecessary; can we tweak the bash scripts to do this automatically if the hcat command is present?

Good point. Since I modified the Hive unit tests to function correctly in the presence of a real Hive environment, this can be easily done.


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/ExportJobBase.java, line 95
> > <https://reviews.apache.org/r/10688/diff/9/?file=299870#file299870line95>
> >
> >     Nit: I think that this line can also be refactored to the parent class, right?

Yes. One thing to note is that by moving isHCatJob to the parent class we lose the ability to mark it as final. Let me rework it.


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/ImportJobBase.java, line 85
> > <https://reviews.apache.org/r/10688/diff/9/?file=299871#file299871line85>
> >
> >     Nit: I think that this line can also be refactored to the parent class, right?

Please see above


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java, lines 131-137
> > <https://reviews.apache.org/r/10688/diff/9/?file=299874#file299874line131>
> >
> >     This method seems to be required only for the debug message. Is it the only reason or did I miss something?

Yes, it is needed for debugging purposes, when we want to know when the sub record reader or the main record reader is called.


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, lines 237-241
> > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line237>
> >
> >     Nit: It seems that we are doing options = opts; in all cases, so maybe it would be worth putting this line before the "if" statement?

Sure


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, line 249
> > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line249>
> >
> >     Nit: Shouldn't the default Hive home be in SqoopOptions.getDefaultHiveHome()?

Yes.   The message needs fixing


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, line 257
> > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line257>
> >
> >     Shouldn't the default HCatalog home be in SqoopOptions.getDefaultHcatHome()?

Yes.  As above


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, line 491
> > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line491>
> >
> >     Both Hive and HBase are idempotent when creating tables, so it might make sense to add "IF NOT EXISTS" in order to remain consistent.

Good point. I think we would otherwise fail earlier, but for consistency I think we should do this. Will change.
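
For illustration only, a minimal sketch of how an idempotent create statement could be assembled. The class and method names are hypothetical and are not taken from SqoopHCatUtilities; the column list and storage stanza mirror the log output quoted above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not the code under review.
final class CreateTableSketch {

    // Builds a CREATE TABLE statement that is a no-op when the table exists.
    static String buildCreateTable(String db, String table,
                                   Map<String, String> columns,
                                   String storageStanza) {
        StringBuilder sb = new StringBuilder();
        sb.append("create table if not exists ")
          .append(db).append('.').append(table).append(" (\n");
        boolean first = true;
        for (Map.Entry<String, String> col : columns.entrySet()) {
            if (!first) {
                sb.append(",\n");
            }
            // One "name type" pair per line, in insertion order.
            sb.append('\t').append(col.getKey()).append(' ').append(col.getValue());
            first = false;
        }
        sb.append(")\n").append(storageStanza);
        return sb.toString();
    }
}
```

With a LinkedHashMap holding id -> int and txt -> string, and "stored as rcfile" as the stanza, this yields the statement from the log with "if not exists" added, so rerunning the import against an existing table would not fail at the create step.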


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, line 523
> > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line523>
> >
> >     It seems that at this point we are not reading the Hive configuration files, yet we execute the in-process Hive CLI. As a result, the CLI will not pick up the configuration files and will use defaults, which is inconsistent with the executed MapReduce job that does use the proper configuration files. The table will therefore be created in a different metastore than the one into which we are importing data.

Hive and HCatalog configuration files and jars have to be on the classpath brought in by hcat -classpath. Do you think that is not always the case? When I update the configure-sqoop script, I will make sure the Hive conf is added.


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, lines 749-750
> > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line749>
> >
> >     Shouldn't we use here the SqoopOptions.getDefaultHiveHome()?

Yes. Will fix.


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, lines 871-875
> > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line871>
> >
> >     Nit: Those lines seem to be unused.

Good catch - earlier I had the ability to execute a command line but removed it in favor of a simpler model.  Will remove it


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, line 876
> > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line876>
> >
> >     Can we write the file in temporary directory rather than in current working directory? (that might not be writable).

Sure will change
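
As a rough sketch of the temporary-directory approach, assuming Java 7 NIO is available: the class name, method name, and file prefix below are made up for illustration and do not come from the patch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not the code under review.
final class TempScriptSketch {

    // Writes the script under java.io.tmpdir instead of the working directory.
    static Path writeScript(String contents) throws IOException {
        // createTempFile picks a unique name in the default temp directory.
        Path script = Files.createTempFile("hcat-script-", ".txt");
        // Best-effort cleanup when the JVM exits normally.
        script.toFile().deleteOnExit();
        Files.write(script, contents.getBytes(StandardCharsets.UTF_8));
        return script;
    }
}
```

The returned path can then be handed to whatever executes the script, and the current working directory never needs to be writable.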


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/docs/user/hcatalog.txt, line 160
> > <https://reviews.apache.org/r/10688/diff/9/?file=299864#file299864line160>
> >
> >     Can we add information here about what happens if the table already exists and this parameter is specified?

Sure.   Will do.


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, lines 898-899
> > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line898>
> >
> >     I would suggest altering this to a single line:
> >     
> >     LOG.error("Error writing HCatalog load-in script: ", ioe);
> >     
> >     That will also print the stack trace.

Sure will do
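
To illustrate why the two-argument form matters, here is a small sketch using java.util.logging as a stand-in so that it runs without extra jars. The patch's LOG is a different logger whose error call, as suggested above, takes the exception as a second argument; the class and method names here are hypothetical.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Stand-in example; java.util.logging replaces the logger used in the patch.
final class LogWithCauseSketch {

    static final Logger LOG = Logger.getLogger(LogWithCauseSketch.class.getName());

    static void reportWriteFailure(IOException ioe) {
        // Passing the exception itself, rather than concatenating its message
        // into the string, attaches it to the log record so that handlers
        // can print the full stack trace.
        LOG.log(Level.SEVERE, "Error writing HCatalog load-in script: ", ioe);
    }
}
```

Concatenating ioe.getMessage() into the string would keep only the message text and drop the stack trace, which is the information usually needed to locate the failure.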


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, lines 906-907
> > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line906>
> >
> >     I would suggest to change this line to :
> >     
> >     LOG.warn("IOException closing stream to HCatalog script: ", ioe);
> >     
> >     That will also print out the stack trace.

Sure will do


- Venkat


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10688/#review21420
-----------------------------------------------------------


On June 3, 2013, 4:16 a.m., Venkat Ranganathan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10688/
> -----------------------------------------------------------
> 
> (Updated June 3, 2013, 4:16 a.m.)
> 
> 
> Review request for Sqoop and Jarek Cecho.
> 
> 
> Description
> -------
> 
> This patch implements the new feature of integrating HCatalog and Sqoop. With this feature, it is possible to import and export data between Sqoop and HCatalog tables. The document attached to the SQOOP-931 JIRA issue discusses the high-level approaches.
> 
> With this integration, more fidelity can be brought to the process of moving data between enterprise data stores and the Hadoop ecosystem.
> 
> 
> Diffs
> -----
> 
>   build.xml 636c103 
>   ivy.xml 1fa4dd1 
>   ivy/ivysettings.xml c4cc561 
>   src/docs/user/SqoopUserGuide.txt 01ac1cf 
>   src/docs/user/hcatalog.txt PRE-CREATION 
>   src/java/org/apache/sqoop/SqoopOptions.java f18d43e 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 5354063 
>   src/java/org/apache/sqoop/hive/HiveImport.java 838f083 
>   src/java/org/apache/sqoop/manager/ConnManager.java a1ac38e 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java ef1d363 
>   src/java/org/apache/sqoop/mapreduce/ExportJobBase.java 1065d0b 
>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 2465f3f 
>   src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java 20636a0 
>   src/java/org/apache/sqoop/mapreduce/JobBase.java 0df1156 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatInputSplit.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatRecordReader.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 42f521f 
>   src/java/org/apache/sqoop/tool/CodeGenTool.java dd34a97 
>   src/java/org/apache/sqoop/tool/ExportTool.java 215addd 
>   src/java/org/apache/sqoop/tool/ImportTool.java 2627726 
>   src/perftest/ExportStressTest.java 0a41408 
>   src/test/com/cloudera/sqoop/hive/TestHiveImport.java 462ccf1 
>   src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java cf41b96 
>   src/test/com/cloudera/sqoop/testutil/ExportJobTestCase.java e13f3df 
>   src/test/org/apache/sqoop/hcat/HCatalogExportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/HCatalogImportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/HCatalogTestUtils.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/TestHCatalogBasic.java PRE-CREATION 
>   testdata/hcatalog/conf/hive-log4j.properties PRE-CREATION 
>   testdata/hcatalog/conf/hive-site.xml PRE-CREATION 
>   testdata/hcatalog/conf/log4j.properties PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/10688/diff/
> 
> 
> Testing
> -------
> 
> Two new integration test suites with more than 20 tests in total have been added to test various aspects of the integration.  A unit test to test the option management is also added.   All tests pass
> 
> 
> Thanks,
> 
> Venkat Ranganathan
> 
>


Re: Review Request: SQOOP-931 - Integration of Sqoop and HCatalog

Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10688/#review21420
-----------------------------------------------------------


Hi Venkat,
Thank you for incorporating my comments, greatly appreciated. I've taken a deep look again and I have the following additional comments:

1) Can we add the HCatalog tests into ThirdPartyTest suite? https://github.com/apache/sqoop/blob/trunk/src/test/com/cloudera/sqoop/ThirdPartyTests.java

2) It seems that using --create-hcatalog-table will create the table and exit Sqoop without doing the import:

[root@bousa-hcat ~]# sqoop import --connect jdbc:mysql://mysql.ent.cloudera.com/sqoop --username sqoop --password sqoop --table text --hcatalog-table text --create-hcatalog-table
13/06/04 15:44:39 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/06/04 15:44:39 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
13/06/04 15:44:39 INFO tool.CodeGenTool: Beginning code generation
13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1
13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1
13/06/04 15:44:39 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
13/06/04 15:44:39 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core.jar
Note: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/06/04 15:44:42 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.jar
13/06/04 15:44:42 WARN manager.MySQLManager: It looks like you are importing from mysql.
13/06/04 15:44:42 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
13/06/04 15:44:42 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
13/06/04 15:44:42 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
13/06/04 15:44:42 INFO mapreduce.ImportJobBase: Beginning import of text
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog for import job
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog specific details for job
13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: Hive home is not set. job may fail if needed jar files are not found correctly.  Please set HIVE_HOME in sqoop-env.sh or provide --hive-home option.  Setting HIVE_HOME  to /usr/lib/hive
13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: HCatalog home is not set. job may fail if needed jar files are not found correctly.  Please set HCAT_HOME in sqoop-env.sh or provide --hcatalog-home option.   Setting HCAT_HOME to /usr/lib/hcatalog
13/06/04 15:44:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text` AS t LIMIT 1
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column names projected : [id, txt]
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column name - type map :
        Names: [id, txt]
        Types : [4, 12]
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Creating HCatalog table default.text for import
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: HCatalog Create table statement: 

create table default.text (
        id int,
        txt string)
stored as rcfile
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Executing HCatalog CLI in-process.
Hive history file=/tmp/root/hive_job_log_65f4f145-0b1e-4e09-8e40-b7edcfc15f83_2077084453.txt
OK
Time taken: 25.121 seconds
[root@bousa-hcat ~]#




src/docs/user/hcatalog.txt
<https://reviews.apache.org/r/10688/#comment44346>

    Can we add information here about what happens if the table already exists and this parameter is specified?



src/docs/user/hcatalog.txt
<https://reviews.apache.org/r/10688/#comment44347>

    This seems unnecessary; can we tweak the bash scripts to do this automatically if the hcat command is present?



src/java/org/apache/sqoop/mapreduce/ExportJobBase.java
<https://reviews.apache.org/r/10688/#comment44350>

    Nit: I think that this line can also be refactored to the parent class, right?



src/java/org/apache/sqoop/mapreduce/ImportJobBase.java
<https://reviews.apache.org/r/10688/#comment44349>

    Nit: I think that this line can also be refactored to the parent class, right?



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java
<https://reviews.apache.org/r/10688/#comment44351>

    This method seems to be required only for the debug message. Is it the only reason or did I miss something?



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44369>

    Nit: It seems that we are doing options = opts; in all cases, so maybe it would be worth putting this line before the "if" statement?



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44371>

    Nit: Shouldn't the default Hive home be in SqoopOptions.getDefaultHiveHome()?



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44372>

    Shouldn't the default HCatalog home be in SqoopOptions.getDefaultHcatHome()?



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44397>

    Both Hive and HBase are idempotent when creating tables, so it might make sense to add "IF NOT EXISTS" in order to remain consistent.



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44395>

    It seems that at this point we are not reading the Hive configuration files, yet we execute the in-process Hive CLI. As a result, the CLI will not pick up the configuration files and will use defaults, which is inconsistent with the executed MapReduce job that does use the proper configuration files. The table will therefore be created in a different metastore than the one into which we are importing data.



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44378>

    Shouldn't we use here the SqoopOptions.getDefaultHiveHome()?



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44379>

    Shouldn't we use here the SqoopOptions.getDefaultHCatHome()?



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44382>

    Nit: Considering that there might be Hadoop3 in the future, would it be simpler to change the condition to (isLocalMode and isHadoop1) instead of enumerating all other possible Hadoop versions?



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44386>

    Nit: Those lines seem to be unused.



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44387>

    Can we write the file in temporary directory rather than in current working directory? (that might not be writable).



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44388>

    I would suggest altering this to a single line:
    
    LOG.error("Error writing HCatalog load-in script: ", ioe);
    
    That will also print the stack trace.



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44389>

    I would suggest to change this line to :
    
    LOG.warn("IOException closing stream to HCatalog script: ", ioe);
    
    That will also print out the stack trace.


Jarcec

- Jarek Cecho


On June 3, 2013, 4:16 a.m., Venkat Ranganathan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10688/
> -----------------------------------------------------------
> 
> (Updated June 3, 2013, 4:16 a.m.)
> 
> 
> Review request for Sqoop and Jarek Cecho.
> 
> 
> Description
> -------
> 
> This patch implements the new feature of integrating HCatalog and Sqoop. With this feature, it is possible to import and export data between Sqoop and HCatalog tables. The document attached to the SQOOP-931 JIRA issue discusses the high-level approaches.
> 
> With this integration, more fidelity can be brought to the process of moving data between enterprise data stores and the Hadoop ecosystem.
> 
> 
> Diffs
> -----
> 
>   build.xml 636c103 
>   ivy.xml 1fa4dd1 
>   ivy/ivysettings.xml c4cc561 
>   src/docs/user/SqoopUserGuide.txt 01ac1cf 
>   src/docs/user/hcatalog.txt PRE-CREATION 
>   src/java/org/apache/sqoop/SqoopOptions.java f18d43e 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 5354063 
>   src/java/org/apache/sqoop/hive/HiveImport.java 838f083 
>   src/java/org/apache/sqoop/manager/ConnManager.java a1ac38e 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java ef1d363 
>   src/java/org/apache/sqoop/mapreduce/ExportJobBase.java 1065d0b 
>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 2465f3f 
>   src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java 20636a0 
>   src/java/org/apache/sqoop/mapreduce/JobBase.java 0df1156 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatInputSplit.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatRecordReader.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 42f521f 
>   src/java/org/apache/sqoop/tool/CodeGenTool.java dd34a97 
>   src/java/org/apache/sqoop/tool/ExportTool.java 215addd 
>   src/java/org/apache/sqoop/tool/ImportTool.java 2627726 
>   src/perftest/ExportStressTest.java 0a41408 
>   src/test/com/cloudera/sqoop/hive/TestHiveImport.java 462ccf1 
>   src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java cf41b96 
>   src/test/com/cloudera/sqoop/testutil/ExportJobTestCase.java e13f3df 
>   src/test/org/apache/sqoop/hcat/HCatalogExportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/HCatalogImportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/HCatalogTestUtils.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/TestHCatalogBasic.java PRE-CREATION 
>   testdata/hcatalog/conf/hive-log4j.properties PRE-CREATION 
>   testdata/hcatalog/conf/hive-site.xml PRE-CREATION 
>   testdata/hcatalog/conf/log4j.properties PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/10688/diff/
> 
> 
> Testing
> -------
> 
> Two new integration test suites with more than 20 tests in total have been added to test various aspects of the integration.  A unit test to test the option management is also added.   All tests pass
> 
> 
> Thanks,
> 
> Venkat Ranganathan
> 
>


Re: Review Request: SQOOP-931 - Integration of Sqoop and HCatalog

Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10688/#review21577
-----------------------------------------------------------

Ship it!


Ship It!

- Jarek Cecho


On June 7, 2013, 2:03 a.m., Venkat Ranganathan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10688/
> -----------------------------------------------------------
> 
> (Updated June 7, 2013, 2:03 a.m.)
> 
> 
> Review request for Sqoop and Jarek Cecho.
> 
> 
> Description
> -------
> 
> This patch implements the new feature of integrating HCatalog and Sqoop. With this feature, it is possible to import and export data between Sqoop and HCatalog tables. The document attached to the SQOOP-931 JIRA issue discusses the high-level approaches.
> 
> With this integration, more fidelity can be brought to the process of moving data between enterprise data stores and the Hadoop ecosystem.
> 
> 
> Diffs
> -----
> 
>   bin/configure-sqoop 61ff3f2 
>   bin/configure-sqoop.cmd f5fd608 
>   build.xml 636c103 
>   ivy.xml 1fa4dd1 
>   ivy/ivysettings.xml c4cc561 
>   src/docs/user/SqoopUserGuide.txt 01ac1cf 
>   src/docs/user/hcatalog.txt PRE-CREATION 
>   src/java/org/apache/sqoop/SqoopOptions.java f18d43e 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 5354063 
>   src/java/org/apache/sqoop/hive/HiveImport.java 838f083 
>   src/java/org/apache/sqoop/manager/ConnManager.java a1ac38e 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java ef1d363 
>   src/java/org/apache/sqoop/mapreduce/ExportJobBase.java 1065d0b 
>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 2465f3f 
>   src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java 20636a0 
>   src/java/org/apache/sqoop/mapreduce/JobBase.java 0df1156 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatInputSplit.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatRecordReader.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 42f521f 
>   src/java/org/apache/sqoop/tool/CodeGenTool.java dd34a97 
>   src/java/org/apache/sqoop/tool/ExportTool.java 215addd 
>   src/java/org/apache/sqoop/tool/ImportTool.java 2627726 
>   src/perftest/ExportStressTest.java 0a41408 
>   src/test/com/cloudera/sqoop/ThirdPartyTests.java 06f7122 
>   src/test/com/cloudera/sqoop/hive/TestHiveImport.java 462ccf1 
>   src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java cf41b96 
>   src/test/com/cloudera/sqoop/testutil/ExportJobTestCase.java e13f3df 
>   src/test/org/apache/sqoop/hcat/HCatalogExportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/HCatalogImportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/HCatalogTestUtils.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/TestHCatalogBasic.java PRE-CREATION 
>   testdata/hcatalog/conf/hive-log4j.properties PRE-CREATION 
>   testdata/hcatalog/conf/hive-site.xml PRE-CREATION 
>   testdata/hcatalog/conf/log4j.properties PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/10688/diff/
> 
> 
> Testing
> -------
> 
> Two new integration test suites with more than 20 tests in total have been added to test various aspects of the integration.  A unit test to test the option management is also added.   All tests pass
> 
> 
> Thanks,
> 
> Venkat Ranganathan
> 
>


Re: Review Request: SQOOP-931 - Integration of Sqoop and HCatalog

Posted by Venkat Ranganathan <n....@live.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10688/
-----------------------------------------------------------

(Updated June 7, 2013, 2:03 a.m.)


Review request for Sqoop and Jarek Cecho.


Changes
-------

Latest changes, with the identified issue fixed.

Thanks, Jarek, for a thorough review; it is very much appreciated. I will upload to JIRA as well.


Description
-------

This patch implements the new feature of integrating HCatalog and Sqoop. With this feature, it is possible to import and export data between Sqoop and HCatalog tables. The document attached to the SQOOP-931 JIRA issue discusses the high-level approaches.

With this integration, more fidelity can be brought to the process of moving data between enterprise data stores and the Hadoop ecosystem.


Diffs (updated)
-----

  bin/configure-sqoop 61ff3f2 
  bin/configure-sqoop.cmd f5fd608 
  build.xml 636c103 
  ivy.xml 1fa4dd1 
  ivy/ivysettings.xml c4cc561 
  src/docs/user/SqoopUserGuide.txt 01ac1cf 
  src/docs/user/hcatalog.txt PRE-CREATION 
  src/java/org/apache/sqoop/SqoopOptions.java f18d43e 
  src/java/org/apache/sqoop/config/ConfigurationConstants.java 5354063 
  src/java/org/apache/sqoop/hive/HiveImport.java 838f083 
  src/java/org/apache/sqoop/manager/ConnManager.java a1ac38e 
  src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java ef1d363 
  src/java/org/apache/sqoop/mapreduce/ExportJobBase.java 1065d0b 
  src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 2465f3f 
  src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java 20636a0 
  src/java/org/apache/sqoop/mapreduce/JobBase.java 0df1156 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportMapper.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatImportMapper.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatInputSplit.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatRecordReader.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java PRE-CREATION 
  src/java/org/apache/sqoop/tool/BaseSqoopTool.java 42f521f 
  src/java/org/apache/sqoop/tool/CodeGenTool.java dd34a97 
  src/java/org/apache/sqoop/tool/ExportTool.java 215addd 
  src/java/org/apache/sqoop/tool/ImportTool.java 2627726 
  src/perftest/ExportStressTest.java 0a41408 
  src/test/com/cloudera/sqoop/ThirdPartyTests.java 06f7122 
  src/test/com/cloudera/sqoop/hive/TestHiveImport.java 462ccf1 
  src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java cf41b96 
  src/test/com/cloudera/sqoop/testutil/ExportJobTestCase.java e13f3df 
  src/test/org/apache/sqoop/hcat/HCatalogExportTest.java PRE-CREATION 
  src/test/org/apache/sqoop/hcat/HCatalogImportTest.java PRE-CREATION 
  src/test/org/apache/sqoop/hcat/HCatalogTestUtils.java PRE-CREATION 
  src/test/org/apache/sqoop/hcat/TestHCatalogBasic.java PRE-CREATION 
  testdata/hcatalog/conf/hive-log4j.properties PRE-CREATION 
  testdata/hcatalog/conf/hive-site.xml PRE-CREATION 
  testdata/hcatalog/conf/log4j.properties PRE-CREATION 

Diff: https://reviews.apache.org/r/10688/diff/


Testing
-------

Two new integration test suites with more than 20 tests in total have been added to test various aspects of the integration. A unit test for option management has also been added. All tests pass.


Thanks,

Venkat Ranganathan


Re: Review Request: SQOOP-931 - Integration of Sqoop and HCatalog

Posted by Venkat Ranganathan <n....@live.com>.

> On June 7, 2013, 1:03 a.m., Jarek Cecho wrote:
> > bin/configure-sqoop, line 61
> > <https://reviews.apache.org/r/10688/diff/10-11/?file=301227#file301227line61>
> >
> >     Nit: there seems to be an extra "`" that is breaking the script.

My apologies. After testing, I made sure the merged branch was all right, and I might have introduced this by mistake. I also noticed that I had not made the corresponding change in configure-sqoop.cmd for Windows; I have added that.

Thanks

Venkat


- Venkat


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10688/#review21552
-----------------------------------------------------------




Re: Review Request: SQOOP-931 - Integration of Sqoop and HCatalog

Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10688/#review21552
-----------------------------------------------------------

Ship it!


Hi Venkat,
please fix this one last typo and attach the patch to JIRA; I'll then go ahead and commit it!


bin/configure-sqoop
<https://reviews.apache.org/r/10688/#comment44592>

    Nit: there seems to be an extra "`" that is breaking the script.


Jarcec

- Jarek Cecho




Re: Review Request: SQOOP-931 - Integration of Sqoop and HCatalog

Posted by Venkat Ranganathan <n....@live.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10688/
-----------------------------------------------------------

(Updated June 6, 2013, 10:55 p.m.)


Review request for Sqoop and Jarek Cecho.


Changes
-------

Updated the patch to address the review comments: use an API that is backwards compatible for joining Strings.
Fixed the script so that it works correctly with HCatalog jobs on BigTop.
Other fixes.


Description
-------

This patch implements the new feature of integrating HCatalog and Sqoop. With this feature, it is possible to import and export data between Sqoop and HCatalog tables. The document attached to the SQOOP-931 JIRA issue discusses the high-level approaches.

With this integration, more fidelity can be brought to the process of moving data between enterprise data stores and the Hadoop ecosystem.


Diffs (updated)
-----

  bin/configure-sqoop 61ff3f2 
  build.xml 636c103 
  ivy.xml 1fa4dd1 
  ivy/ivysettings.xml c4cc561 
  src/docs/user/SqoopUserGuide.txt 01ac1cf 
  src/docs/user/hcatalog.txt PRE-CREATION 
  src/java/org/apache/sqoop/SqoopOptions.java f18d43e 
  src/java/org/apache/sqoop/config/ConfigurationConstants.java 5354063 
  src/java/org/apache/sqoop/hive/HiveImport.java 838f083 
  src/java/org/apache/sqoop/manager/ConnManager.java a1ac38e 
  src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java ef1d363 
  src/java/org/apache/sqoop/mapreduce/ExportJobBase.java 1065d0b 
  src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 2465f3f 
  src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java 20636a0 
  src/java/org/apache/sqoop/mapreduce/JobBase.java 0df1156 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportMapper.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatImportMapper.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatInputSplit.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatRecordReader.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java PRE-CREATION 
  src/java/org/apache/sqoop/tool/BaseSqoopTool.java 42f521f 
  src/java/org/apache/sqoop/tool/CodeGenTool.java dd34a97 
  src/java/org/apache/sqoop/tool/ExportTool.java 215addd 
  src/java/org/apache/sqoop/tool/ImportTool.java 2627726 
  src/perftest/ExportStressTest.java 0a41408 
  src/test/com/cloudera/sqoop/ThirdPartyTests.java 06f7122 
  src/test/com/cloudera/sqoop/hive/TestHiveImport.java 462ccf1 
  src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java cf41b96 
  src/test/com/cloudera/sqoop/testutil/ExportJobTestCase.java e13f3df 
  src/test/org/apache/sqoop/hcat/HCatalogExportTest.java PRE-CREATION 
  src/test/org/apache/sqoop/hcat/HCatalogImportTest.java PRE-CREATION 
  src/test/org/apache/sqoop/hcat/HCatalogTestUtils.java PRE-CREATION 
  src/test/org/apache/sqoop/hcat/TestHCatalogBasic.java PRE-CREATION 
  testdata/hcatalog/conf/hive-log4j.properties PRE-CREATION 
  testdata/hcatalog/conf/hive-site.xml PRE-CREATION 
  testdata/hcatalog/conf/log4j.properties PRE-CREATION 

Diff: https://reviews.apache.org/r/10688/diff/


Testing
-------

Two new integration test suites with more than 20 tests in total have been added to test various aspects of the integration. A unit test for option management has also been added. All tests pass.


Thanks,

Venkat Ranganathan


Re: Review Request: SQOOP-931 - Integration of Sqoop and HCatalog

Posted by Venkat Ranganathan <n....@live.com>.

> On June 6, 2013, 6:34 p.m., Jarek Cecho wrote:
> > Hi Venkat,
> > thank you very much for incorporating all my suggestions. I believe that we are almost at the end. I was doing some more testing and I've noticed a few issues (some of them created by my own suggestions):
> > 
> > 1) I see a compilation failure:
> >     [javac] /home/jarcec/apache/repos/sqoop/src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java:877: join(java.lang.CharSequence,java.lang.Iterable<?>) in org.apache.hadoop.util.StringUtils cannot be applied to (java.lang.String,java.lang.String[])
> >     [javac]     String argLine = StringUtils.join(",", argArray);
> > 
> > I've fixed that by changing the line to String argLine = StringUtils.join(",", Arrays.asList(argArray)) to unblock the review; however, the proper solution is up to you :-)
> > 
> > 2) We've changed the hardcoded paths to Hive and HCatalog home to SqoopOptions.getHiveHomeDefault() (or the HCatalog equivalent); however, those two methods can actually return null, which causes ClassNotFoundExceptions later in the code. What about improving them along these lines:
> > 
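> >   public static String getHiveHomeDefault() {
> >     // Set this with $HIVE_HOME, but -Dhive.home can override.
> >     String hiveHome = System.getenv("HIVE_HOME");
> >     if (hiveHome == null) {
> >       // System.getenv() takes no default argument, so fall back explicitly.
> >       hiveHome = "/usr/lib/hive";
> >     }
> >     return System.getProperty("hive.home", hiveHome);
> >   }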

Thanks for the review.

1) I did run all the tests with the hadoop100 profile, but it looks like StringUtils.join(String, String[]) is a new addition. Unfortunately, there is no @since in the javadocs :( Sorry about that.
2) Good catch - I will fix it and use the default values I was using before for these two.


> On June 6, 2013, 6:34 p.m., Jarek Cecho wrote:
> > bin/configure-sqoop, line 118
> > <https://reviews.apache.org/r/10688/diff/10/?file=301227#file301227line118>
> >
> >     Nit: Add HCatalog to dependency list

Will fix


> On June 6, 2013, 6:34 p.m., Jarek Cecho wrote:
> > bin/configure-sqoop, line 118
> > <https://reviews.apache.org/r/10688/diff/10/?file=301227#file301227line118>
> >
> >     Nit: Add HCatalog to dependency list

Will fix


> On June 6, 2013, 6:34 p.m., Jarek Cecho wrote:
> > bin/configure-sqoop, line 120
> > <https://reviews.apache.org/r/10688/diff/10/?file=301227#file301227line120>
> >
> >     The rest of Sqoop expects the variable HADOOP_COMMON_HOME, whereas the underlying hcat script expects HADOOP_HOME, so on BigTop this line ends with:
> >     
> >     Hadoop not found.
> >     
> >     I was able to work around it by adding the following line before the highlighted line:
> >     
> >     export HADOOP_HOME=$HADOOP_COMMON_HOME
> >     
> >     However I'm not sure whether this is the best solution or not :-/

I think that sounds like a good fix. Thanks for that. Let me add it, along with a comment so that it is not accidentally removed in the future.


- Venkat


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10688/#review21527
-----------------------------------------------------------




Re: Review Request: SQOOP-931 - Integration of Sqoop and HCatalog

Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10688/#review21527
-----------------------------------------------------------


Hi Venkat,
thank you very much for incorporating all my suggestions. I believe that we are almost at the end. I was doing some more testing and I've noticed a few issues (some of them created by my own suggestions):

1) I see a compilation failure:
    [javac] /home/jarcec/apache/repos/sqoop/src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java:877: join(java.lang.CharSequence,java.lang.Iterable<?>) in org.apache.hadoop.util.StringUtils cannot be applied to (java.lang.String,java.lang.String[])
    [javac]     String argLine = StringUtils.join(",", argArray);

I've fixed that by changing the line to String argLine = StringUtils.join(",", Arrays.asList(argArray)) to unblock the review; however, the proper solution is up to you :-)

2) We've changed the hardcoded paths to Hive and HCatalog home to SqoopOptions.getHiveHomeDefault() (or the HCatalog equivalent); however, those two methods can actually return null, which causes ClassNotFoundExceptions later in the code. What about improving them along these lines:

  public static String getHiveHomeDefault() {
    // Set this with $HIVE_HOME, but -Dhive.home can override.
    String hiveHome = System.getenv("HIVE_HOME");
    if (hiveHome == null) {
      // System.getenv() takes no default argument, so fall back explicitly.
      hiveHome = "/usr/lib/hive";
    }
    return System.getProperty("hive.home", hiveHome);
  }
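
For item 1, the workaround can be sketched with the JDK alone. This is only an illustration, not the code in the patch: the class name JoinSketch, the local join helper, and the sample arguments are all hypothetical. The helper has the same (CharSequence, Iterable<?>) shape as the overload named in the compiler error, so a String[] has to be wrapped with Arrays.asList before it can be passed.

```java
import java.util.Arrays;
import java.util.Iterator;

public class JoinSketch {

    // Hypothetical stand-in for an Iterable-based join; it is NOT the Hadoop
    // StringUtils implementation, only something with the same parameter types.
    static String join(CharSequence separator, Iterable<?> items) {
        StringBuilder sb = new StringBuilder();
        Iterator<?> it = items.iterator();
        while (it.hasNext()) {
            sb.append(it.next());
            if (it.hasNext()) {
                sb.append(separator);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative arguments only; the real argArray comes from the patch.
        String[] argArray = {"-e", "show tables", "-v"};

        // join(",", argArray) would not compile against this signature,
        // because String[] is not an Iterable. Wrapping the array works.
        String argLine = join(",", Arrays.asList(argArray));
        System.out.println(argLine);
    }
}
```

Wrapping the array keeps the call compatible with any version that only offers the Iterable overload, at the cost of one small list view over the array.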


bin/configure-sqoop
<https://reviews.apache.org/r/10688/#comment44563>

    Nit: Add HCatalog to dependency list



bin/configure-sqoop
<https://reviews.apache.org/r/10688/#comment44566>

    Nit: Add HCatalog to dependency list



bin/configure-sqoop
<https://reviews.apache.org/r/10688/#comment44573>

    The rest of Sqoop expects the variable HADOOP_COMMON_HOME, whereas the underlying hcat script expects HADOOP_HOME, so on BigTop this line ends with:
    
    Hadoop not found.
    
    I was able to work around it by adding the following line before the highlighted line:
    
    export HADOOP_HOME=$HADOOP_COMMON_HOME
    
    However I'm not sure whether this is the best solution or not :-/


Jarcec

- Jarek Cecho




Re: Review Request: SQOOP-931 - Integration of Sqoop and HCatalog

Posted by Venkat Ranganathan <n....@live.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10688/
-----------------------------------------------------------

(Updated June 6, 2013, midnight)


Review request for Sqoop and Jarek Cecho.


Changes
-------

New review changes. Fixed the documentation and added a new test to validate that we fail when create-hcatalog-table is specified for a preexisting table.

Removed inline hcat client execution. It causes issues because Hive resets the logger configuration, and specifying the Hive configuration on the command line would entail more significant changes.

For tests, we still use inline hcat execution.


Description
-------

This patch implements the new feature of integrating HCatalog and Sqoop. With this feature, it is possible to import and export data between Sqoop and HCatalog tables. The document attached to the SQOOP-931 JIRA issue discusses the high-level approaches.

With this integration, more fidelity can be brought to the process of moving data between enterprise data stores and the Hadoop ecosystem.


Diffs (updated)
-----

  bin/configure-sqoop 61ff3f2 
  build.xml 636c103 
  ivy.xml 1fa4dd1 
  ivy/ivysettings.xml c4cc561 
  src/docs/user/SqoopUserGuide.txt 01ac1cf 
  src/docs/user/hcatalog.txt PRE-CREATION 
  src/java/org/apache/sqoop/SqoopOptions.java f18d43e 
  src/java/org/apache/sqoop/config/ConfigurationConstants.java 5354063 
  src/java/org/apache/sqoop/hive/HiveImport.java 838f083 
  src/java/org/apache/sqoop/manager/ConnManager.java a1ac38e 
  src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java ef1d363 
  src/java/org/apache/sqoop/mapreduce/ExportJobBase.java 1065d0b 
  src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 2465f3f 
  src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java 20636a0 
  src/java/org/apache/sqoop/mapreduce/JobBase.java 0df1156 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportMapper.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatImportMapper.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatInputSplit.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatRecordReader.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java PRE-CREATION 
  src/java/org/apache/sqoop/tool/BaseSqoopTool.java 42f521f 
  src/java/org/apache/sqoop/tool/CodeGenTool.java dd34a97 
  src/java/org/apache/sqoop/tool/ExportTool.java 215addd 
  src/java/org/apache/sqoop/tool/ImportTool.java 2627726 
  src/perftest/ExportStressTest.java 0a41408 
  src/test/com/cloudera/sqoop/ThirdPartyTests.java 06f7122 
  src/test/com/cloudera/sqoop/hive/TestHiveImport.java 462ccf1 
  src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java cf41b96 
  src/test/com/cloudera/sqoop/testutil/ExportJobTestCase.java e13f3df 
  src/test/org/apache/sqoop/hcat/HCatalogExportTest.java PRE-CREATION 
  src/test/org/apache/sqoop/hcat/HCatalogImportTest.java PRE-CREATION 
  src/test/org/apache/sqoop/hcat/HCatalogTestUtils.java PRE-CREATION 
  src/test/org/apache/sqoop/hcat/TestHCatalogBasic.java PRE-CREATION 
  testdata/hcatalog/conf/hive-log4j.properties PRE-CREATION 
  testdata/hcatalog/conf/hive-site.xml PRE-CREATION 
  testdata/hcatalog/conf/log4j.properties PRE-CREATION 

Diff: https://reviews.apache.org/r/10688/diff/


Testing
-------

Two new integration test suites with more than 20 tests in total have been added to test various aspects of the integration. A unit test for option management has also been added. All tests pass.


Thanks,

Venkat Ranganathan


Re: Review Request: SQOOP-931 - Integration of Sqoop and HCatalog

Posted by Venkat Ranganathan <n....@live.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10688/
-----------------------------------------------------------

(Updated June 3, 2013, 4:16 a.m.)


Review request for Sqoop and Jarek Cecho.


Changes
-------

Same as the previous one, except that the trailing blanks in the hcatalog.txt documentation are fixed. No real change to the rendered HTML. Sorry for one more minor update.


Description
-------

This patch implements the new feature of integrating HCatalog and Sqoop. With this feature, it is possible to import and export data between Sqoop and HCatalog tables. The document attached to the SQOOP-931 JIRA issue discusses the high-level approaches.

With this integration, more fidelity can be brought to the process of moving data between enterprise data stores and the Hadoop ecosystem.


Diffs (updated)
-----

  build.xml 636c103 
  ivy.xml 1fa4dd1 
  ivy/ivysettings.xml c4cc561 
  src/docs/user/SqoopUserGuide.txt 01ac1cf 
  src/docs/user/hcatalog.txt PRE-CREATION 
  src/java/org/apache/sqoop/SqoopOptions.java f18d43e 
  src/java/org/apache/sqoop/config/ConfigurationConstants.java 5354063 
  src/java/org/apache/sqoop/hive/HiveImport.java 838f083 
  src/java/org/apache/sqoop/manager/ConnManager.java a1ac38e 
  src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java ef1d363 
  src/java/org/apache/sqoop/mapreduce/ExportJobBase.java 1065d0b 
  src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 2465f3f 
  src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java 20636a0 
  src/java/org/apache/sqoop/mapreduce/JobBase.java 0df1156 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportMapper.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatImportMapper.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatInputSplit.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatRecordReader.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java PRE-CREATION 
  src/java/org/apache/sqoop/tool/BaseSqoopTool.java 42f521f 
  src/java/org/apache/sqoop/tool/CodeGenTool.java dd34a97 
  src/java/org/apache/sqoop/tool/ExportTool.java 215addd 
  src/java/org/apache/sqoop/tool/ImportTool.java 2627726 
  src/perftest/ExportStressTest.java 0a41408 
  src/test/com/cloudera/sqoop/hive/TestHiveImport.java 462ccf1 
  src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java cf41b96 
  src/test/com/cloudera/sqoop/testutil/ExportJobTestCase.java e13f3df 
  src/test/org/apache/sqoop/hcat/HCatalogExportTest.java PRE-CREATION 
  src/test/org/apache/sqoop/hcat/HCatalogImportTest.java PRE-CREATION 
  src/test/org/apache/sqoop/hcat/HCatalogTestUtils.java PRE-CREATION 
  src/test/org/apache/sqoop/hcat/TestHCatalogBasic.java PRE-CREATION 
  testdata/hcatalog/conf/hive-log4j.properties PRE-CREATION 
  testdata/hcatalog/conf/hive-site.xml PRE-CREATION 
  testdata/hcatalog/conf/log4j.properties PRE-CREATION 

Diff: https://reviews.apache.org/r/10688/diff/


Testing
-------

Two new integration test suites with more than 20 tests in total have been added to test various aspects of the integration. A unit test for option management has also been added. All tests pass.


Thanks,

Venkat Ranganathan