Posted to dev@sqoop.apache.org by "Ruslan Dautkhanov (JIRA)" <ji...@apache.org> on 2016/10/18 20:17:59 UTC

[jira] [Commented] (SQOOP-3030) Export to Oracle using (--direct + --hcatalog + --columns) with columns having the wrong case sensitivity fails with NPE

    [ https://issues.apache.org/jira/browse/SQOOP-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586552#comment-15586552 ] 

Ruslan Dautkhanov commented on SQOOP-3030:
------------------------------------------

thank you [~markuskemper@me.com]

> Export to Oracle using (--direct + --hcatalog + --columns) with columns having the wrong case sensitivity fails with NPE
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-3030
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3030
>             Project: Sqoop
>          Issue Type: Bug
>          Components: connectors/oracle, hive-integration
>            Reporter: Markus Kemper
>
> It appears that if the column names given to the Sqoop --columns option do not match the case of the columns in the Oracle RDBMS (with export + --hcatalog + --direct), the Sqoop map task fails with an NPE, whereas the non-direct path fails correctly (with ORA-00904).  My test case is below, please review and let me know if you have any questions.
> Additionally, it would be nice if we could detect this column case mismatch at job-setup time rather than submit the job only to have it fail in YARN.
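The job-setup-time check suggested above could be sketched along these lines. This is a hypothetical helper, not existing Sqoop code; the class and method names are illustrative. The idea is to compare the user-supplied --columns values against the table's actual column names, case-sensitively, and fail before the job is submitted (in a real implementation the actual names would come from DatabaseMetaData.getColumns() on the target connection):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-flight check, not part of Sqoop: reject --columns values
// that do not exactly match the database's actual column names. Oracle stores
// unquoted identifiers in upper case, so "c1" would not match "C1".
public class ColumnCaseCheck {
    public static List<String> mismatches(Set<String> actualColumns,
                                          String[] requestedColumns) {
        List<String> bad = new ArrayList<>();
        for (String col : requestedColumns) {
            if (!actualColumns.contains(col)) { // exact, case-sensitive match
                bad.add(col);
            }
        }
        return bad; // non-empty => fail fast instead of submitting the job
    }

    public static void main(String[] args) {
        Set<String> actual = new HashSet<>();
        actual.add("C1");
        actual.add("C2");
        // --columns "c1,c2" against Oracle's C1/C2: both names mismatch.
        System.out.println(mismatches(actual, new String[] {"c1", "c2"}));
    }
}
```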
> TEST CASE:
> STEP 01 : Create Oracle Source/Target Table
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c1 int, c2 varchar(10))"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, 'some data')"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1"
> -------------------------------------
> | C1                   | C2         | 
> -------------------------------------
> | 1                    | some data  | 
> -------------------------------------
> STEP 02 : Import Oracle Table using Sqoop --hcatalog options
> beeline -u jdbc:hive2:// -e "use default; drop table t1_text;"
> sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --num-mappers 1 --hcatalog-database default --hcatalog-table t1_text --create-hcatalog-table --hcatalog-storage-stanza 'stored as textfile'
> beeline -u jdbc:hive2:// -e "use default; select * from t1_text;"
> Output:
> <SNIP>
> 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Database column names projected : [c1, c2]
> 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Database column name - info map :
> 	c1 : [Type : 2,Precision : 38,Scale : 0]
> 	c2 : [Type : 12,Precision : 10,Scale : 0]
> 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: Creating HCatalog table default.t1_text for import
> 16/10/18 09:11:31 INFO hcat.SqoopHCatUtilities: HCatalog Create table statement: 
> create table `default`.`t1_text` (
> 	`c1` decimal(38),
> 	`c2` varchar(10))
> stored as textfile
> <SNIP>
> 16/10/18 09:12:14 INFO mapreduce.ImportJobBase: Transferred 12 bytes in 34.4222 seconds (0.3486 bytes/sec)
> 16/10/18 09:12:14 INFO mapreduce.ImportJobBase: Retrieved 1 records.
> ---
> +-------------+-------------+--+
> | t1_text.c1  | t1_text.c2  |
> +-------------+-------------+--+
> | 1           | some data   |
> +-------------+-------------+--+
> STEP 03 : Attempt Export to Oracle using Sqoop --hcatalog and --columns options
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T2 --num-mappers 2 --hcatalog-database default --hcatalog-table t1_text --columns "c1,c2"
> Output: (failure = correct behavior)
> <SNIP>
> 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: Database column names projected : [c1, c2]
> 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: Database column name - info map :
> 	c1 : [Type : 2,Precision : 38,Scale : 0]
> 	c2 : [Type : 12,Precision : 10,Scale : 0]
> <SNIP>
> 16/10/18 09:13:48 INFO hive.metastore: Connected to metastore.
> 16/10/18 09:13:48 INFO hcat.SqoopHCatUtilities: HCatalog full table schema fields = [c1, c2]
> 16/10/18 09:13:49 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
> 16/10/18 09:13:49 INFO hcat.SqoopHCatUtilities: HCatalog table partitioning key fields = []
> 16/10/18 09:13:49 INFO hcat.SqoopHCatUtilities: HCatalog projected schema fields = [c1, c2]
> <SNIP>
> 16/10/18 09:14:22 INFO mapreduce.ExportJobBase: Exported 0 records.
> 16/10/18 09:14:22 ERROR tool.ExportTool: Error during export: Export job failed!
> ---
> 2016-10-18 09:14:19,418 INFO [main] org.apache.hive.hcatalog.mapreduce.InternalUtil: Initializing org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe with properties {transient_lastDdlTime=1476807097, name=default.t1_text, serialization.null.format=\N, columns=c1,c2, serialization.lib=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, serialization.format=1, columns.types=decimal(38,0),varchar(10)}
> 2016-10-18 09:14:19,660 INFO [Thread-12] org.apache.sqoop.mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
> 2016-10-18 09:14:19,952 ERROR [Thread-11] org.apache.sqoop.mapreduce.AsyncSqlOutputFormat: Got exception in update thread: java.sql.SQLSyntaxErrorException: ORA-00904: "c2": invalid identifier
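For context, the ORA-00904 in the non-direct path is the expected outcome: the generated export statement quotes the user-supplied names, and Oracle treats quoted identifiers case-sensitively, so "c2" does not resolve to a column created as C2. A minimal sketch of that statement generation (illustrative only, not the actual Sqoop code path):

```java
// Illustrative sketch, not actual Sqoop code: quoting a lower-case name in
// generated SQL makes Oracle look for a case-sensitive identifier, which does
// not exist for a column created as C2, hence ORA-00904: invalid identifier.
public class QuotedInsertSketch {
    public static String insertStatement(String table, String[] columns) {
        StringBuilder sql = new StringBuilder("INSERT INTO " + table + " (");
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) sql.append(", ");
            sql.append('"').append(columns[i]).append('"'); // quotes preserve case
        }
        sql.append(") VALUES (?, ?)");
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(insertStatement("T2", new String[] {"c1", "c2"}));
    }
}
```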
> STEP 04 : Attempt Export to Oracle using Sqoop --hcatalog and --columns options with --direct
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T2 --num-mappers 2 --hcatalog-database default --hcatalog-table t1_text --columns "c1,c2" --direct
> Output: (failure with NPE = incorrect behavior)
> <SNIP>
> 16/10/18 09:17:47 INFO oracle.OraOopManagerFactory: 
> **************************************************
> *** Using Data Connector for Oracle and Hadoop ***
> **************************************************
> 16/10/18 09:17:47 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
> 16/10/18 09:17:47 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.
> 16/10/18 09:17:47 INFO Configuration.deprecation: mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
> 16/10/18 09:17:47 INFO tool.CodeGenTool: Beginning code generation
> 16/10/18 09:17:47 INFO manager.SqlManager: Executing SQL statement: SELECT "C1","C2" FROM T2 WHERE 0=1
> <SNIP>
> 16/10/18 09:17:50 INFO mapreduce.ExportJobBase: Configuring HCatalog for export job
> 16/10/18 09:17:50 INFO hcat.SqoopHCatUtilities: Configuring HCatalog specific details for job
> 16/10/18 09:17:50 INFO manager.SqlManager: Executing SQL statement: SELECT "C1","C2" FROM "T2" WHERE 1=0
> 16/10/18 09:17:50 INFO hcat.SqoopHCatUtilities: Database column names projected : [c1, c2]
> 16/10/18 09:17:50 INFO hcat.SqoopHCatUtilities: Database column name - info map :
> 	c1 : [Type : 2,Precision : 38,Scale : 0]
> 	c2 : [Type : 12,Precision : 10,Scale : 0]
> <SNIP>
> 16/10/18 09:17:51 INFO hive.metastore: Connected to metastore.
> 16/10/18 09:17:52 INFO hcat.SqoopHCatUtilities: HCatalog full table schema fields = [c1, c2]
> <SNIP>
> 16/10/18 09:18:25 INFO mapreduce.ExportJobBase: Exported 0 records.
> 16/10/18 09:18:25 ERROR tool.ExportTool: Error during export: Export job failed!
> ---
> 2016-10-18 09:18:23,561 INFO [main] org.apache.sqoop.manager.oracle.OraOopOutputFormatBase: The number of rows per batch is: 100
> 2016-10-18 09:18:23,561 INFO [main] org.apache.sqoop.manager.oracle.OraOopOutputFormatBase: The number of batches per commit is: 100
> 2016-10-18 09:18:23,721 INFO [main] org.apache.sqoop.manager.oracle.OraOopOutputFormatBase: This record writer is connected to Oracle via the JDBC URL: 
> 	"oracle.jdbc.driver.T4CConnection@34133979"
> 	to the Oracle instance: "ORCL"
> 2016-10-18 09:18:23,828 INFO [main] org.apache.sqoop.manager.oracle.OraOopOracleQueries: Session Time Zone set to GMT
> 2016-10-18 09:18:23,883 INFO [main] org.apache.sqoop.manager.oracle.OracleConnectionFactory: Initializing Oracle session with SQL :
> begin 
>   dbms_application_info.set_module(module_name => 'Data Connector for Oracle and Hadoop', action_name => 'export 20161018091747PDT'); 
> end;
> 2016-10-18 09:18:23,883 WARN [main] org.apache.sqoop.manager.oracle.OracleConnectionFactory: No Oracle 'session initialization' statements were found to execute.
> Check that your oraoop-site-template.xml and/or oraoop-site.xml files are correctly installed in the ${SQOOP_HOME}/conf directory.
> 2016-10-18 09:18:24,116 INFO [main] org.apache.hive.hcatalog.mapreduce.InternalUtil: Initializing org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe with properties {transient_lastDdlTime=1476807097, name=default.t1_text, serialization.null.format=\N, columns=c1,c2, serialization.lib=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, serialization.format=1, columns.types=decimal(38,0),varchar(10)}
> 2016-10-18 09:18:24,345 INFO [Thread-12] org.apache.sqoop.mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
> 2016-10-18 09:18:24,357 INFO [main] org.apache.sqoop.manager.oracle.OraOopOutputFormatBase: Batch-Mode insert statement:
> insert  into "SQOOP"."T2"
> ("C1"
> ,"C2")
> values
> (:C1
> ,:C2)
> 2016-10-18 09:18:24,358 ERROR [main] org.apache.sqoop.manager.oracle.OraOopOutputFormatInsert: The following error occurred during configurePreparedStatement()
> java.lang.NullPointerException
> 	at org.apache.sqoop.manager.oracle.OraOopOutputFormatBase$OraOopDBRecordWriterBase.setBindValueAtName(OraOopOutputFormatBase.java:432)
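The NPE above is consistent with a case-sensitive lookup going wrong somewhere in the direct path: bind metadata keyed by the database's upper-case column names, looked up with the user-supplied lower-case name, yielding null that is later dereferenced. The following is only a sketch of that failure mode and one possible hardening, not the actual Sqoop source; the names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the likely failure mode (not the actual Sqoop source):
// if bind metadata is keyed by the database's upper-case column names, a
// lookup with the user-supplied lower-case name returns null, and a later
// dereference would throw an NPE like the one seen in setBindValueAtName().
public class BindLookupSketch {
    // Possible hardening: fall back to a case-insensitive match, and fail
    // with a descriptive error instead of an NPE when nothing matches.
    public static int bindPosition(Map<String, Integer> positionsByDbName,
                                   String userColumn) {
        Integer pos = positionsByDbName.get(userColumn); // null for "c2" vs C2
        if (pos == null) {
            for (Map.Entry<String, Integer> e : positionsByDbName.entrySet()) {
                if (e.getKey().equalsIgnoreCase(userColumn)) {
                    return e.getValue();
                }
            }
            throw new IllegalArgumentException(
                "Column not found in table metadata: " + userColumn);
        }
        return pos;
    }

    public static void main(String[] args) {
        Map<String, Integer> positions = new HashMap<>();
        positions.put("C1", 1);
        positions.put("C2", 2);
        System.out.println(bindPosition(positions, "c2")); // resolves despite case
    }
}
```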
> STEP 05 : Verify Export without --direct and with correct column case works
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --num-mappers 2 --hcatalog-database default --hcatalog-table t1_text --columns "C1,C2" 
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1"
> Output:
> 16/10/18 09:26:46 INFO mapreduce.ExportJobBase: Transferred 7.79 KB in 30.9303 seconds (257.9025 bytes/sec)
> 16/10/18 09:26:46 INFO mapreduce.ExportJobBase: Exported 1 records.
> ---
> -------------------------------------
> | C1                   | C2         | 
> -------------------------------------
> | 1                    | some data  | 
> | 1                    | some data  | 
> -------------------------------------
> STEP 06 : Verify Export with --direct and correct column case works
> sqoop export --connect $MYCONN --username $MYUSER --password $MYPSWD --table T1 --num-mappers 2 --hcatalog-database default --hcatalog-table t1_text --columns "C1,C2" --direct
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1"
> Output:
> <SNIP>
> 16/10/18 09:30:40 INFO oracle.OraOopManagerFactory: 
> **************************************************
> *** Using Data Connector for Oracle and Hadoop ***
> **************************************************
> <SNIP>
> 16/10/18 09:31:16 INFO mapreduce.ExportJobBase: Transferred 7.79 KB in 30.9987 seconds (257.333 bytes/sec)
> 16/10/18 09:31:16 INFO mapreduce.ExportJobBase: Exported 1 records.
> ---
> -------------------------------------
> | C1                   | C2         | 
> -------------------------------------
> | 1                    | some data  | 
> | 1                    | some data  | 
> | 1                    | some data  | 
> -------------------------------------



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)