You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by "Fero Szabo (Jira)" <ji...@apache.org> on 2019/09/25 14:18:00 UTC

[jira] [Comment Edited] (SQOOP-3451) Importing FLOAT from Oracle to Hive results in INTEGER

    [ https://issues.apache.org/jira/browse/SQOOP-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937767#comment-16937767 ] 

Fero Szabo edited comment on SQOOP-3451 at 9/25/19 2:17 PM:
------------------------------------------------------------

Hi [~dionusos],

Yeah, I think you are right and Oracle is a pain in this regard to work with.

I've had the same issue when developing the fixed point decimal support for Avro and Parquet, namely that a column defined as NUMBER (without precision and scale), comes back with invalid metadata from the database. (I believe something like -127 as scale, though please double check this). And under the hood, I suspect Oracle is using NUMBER, again, to store the type Float. 

In my case, the only missing thing was a proper scale to be able to pad a BigDecimal within sqoop. So, I created a flag to allow this to the user in SQOOP-2976. Not sure what to do in your case, as it's neither of those files, (it's orc, if I'm seeing this correctly). In any case, I believe you'll need to watch out for these "special" values for scale and precision returned by Oracle and implement a logic that maps these to proper values.

I used user input for this via properties. Seemed the best at the time, for that particular case. I'm not sure if my approach is the right one for you as well, though certainly an option.

So, TL;DR:

Track down where the Hive schema gets created and debug whether you can identify a Float coming from Oracle based on the precision and scale. You might want to check other number types, too.

Hope this helps!

 

(edited a mistake)


was (Author: fero):
Hi [~dionusos],

Yeah, I think you are right and Oracle is a pain in this regard to work with.

I've had the same issue when developing the fixed point number support for Avro and Parquet, namely that a column defined as NUMBER (without precision and scale), comes back with invalid metadata from the database. (I believe something like -127 as scale, though please double check this). And under the hood, I suspect Oracle is using NUMBER, again, to store the type Float. 

In my case, the only missing thing was a proper scale to be able to pad a BigDecimal within sqoop. So, I created a flag to allow this to the user in SQOOP-2976. Not sure what to do in your case, as it's neither of those files, (it's orc, if I'm seeing this correctly). In any case, I believe you'll need to watch out for these "special" values for scale and precision returned by Oracle and implement a logic that maps these to proper values.

I used user input for this via properties. Seemed the best at the time, for that particular case. I'm not sure if my approach is the right one for you as well, though certainly an option.

So, TL;DR:

Track down where the Hive schema gets created and debug whether you can identify a Float coming from Oracle based on the precision and scale. You might want to check other number types, too.

Hope this helps!

 

(edited a mistake)

> Importing FLOAT from Oracle to Hive results in INTEGER
> ------------------------------------------------------
>
>                 Key: SQOOP-3451
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3451
>             Project: Sqoop
>          Issue Type: Bug
>          Components: codegen, connectors/oracle, hive-integration
>    Affects Versions: 1.4.7
>            Reporter: Denes Bodo
>            Priority: Major
>
> We ran into an issue where there is a table created in Oracle 11g:
> {noformat}
> create table floattest (column1 float(30), column2 number(30,-127), column3 number(30));
> {noformat}
> We want to import date from Oracle to Hive:
> {noformat}
> sqoop import -D mapred.child.java.opts='-Djava.security.egd=file:/dev/../dev/urandom' -Dmapreduce.job.queuename=default --connect "jdbc:oracle:thin:@DBHOST:1521/xe" --username sqoop --password sqoop --table floattest --hcatalog-database default --hcatalog-table floattest --create-hcatalog-table --hcatalog-external-table --hcatalog-storage-stanza "stored as orc" -m 1 --columns COLUMN1,COLUMN2,COLUMN3 --verbose
> {noformat}
> In Sqoop logs we see the following:
> {noformat}
> 19/09/24 13:51:45 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM floattest t WHERE 1=0
> 19/09/24 13:51:45 DEBUG manager.SqlManager: Found column COLUMN1 of type [2, 30, -127]
> 19/09/24 13:51:45 DEBUG manager.SqlManager: Found column COLUMN2 of type [2, 30, -84]
> 19/09/24 13:51:45 DEBUG manager.SqlManager: Found column COLUMN3 of type [2, 30, 0]
> 19/09/24 13:51:45 INFO hcat.SqoopHCatUtilities: Database column names projected : [COLUMN1, COLUMN2, COLUMN3]
> 19/09/24 13:51:45 INFO hcat.SqoopHCatUtilities: Database column name - info map :
>         COLUMN3 : [Type : 2,Precision : 30,Scale : 0]
>         COLUMN2 : [Type : 2,Precision : 30,Scale : -84]
>         COLUMN1 : [Type : 2,Precision : 30,Scale : -127]
> 19/09/24 13:51:45 INFO hcat.SqoopHCatUtilities: Creating HCatalog table default.floattest for import
> 19/09/24 13:51:45 INFO hcat.SqoopHCatUtilities: HCatalog Create table statement:
> create external table `default`.`floattest` (
>         `column1` decimal(30),
>         `column2` decimal(30),
>         `column3` decimal(30))
> stored as orc
> {noformat}
> From this output we can see that Oracle states about column1 has Type=2 which is NUMERIC (regarding to https://docs.oracle.com/javase/7/docs/api/constant-values.html#java.sql.Types.FLOAT). Sqoop translates NUMERIC to DECIMAL (https://github.com/apache/sqoop/blob/trunk/src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java#L1050L1107). Due to Oracle uses {{scale=-127}} to sign about a NUMERIC that it is a FLOAT instead of stating {{Type=6}}, Sqoop creates integers (decimal with 0 scale) from NUMBER.
> I think it is the fault of Oracle as it does not use Java Type=6 to sign type of a float. What do you think?
> ----
> Thank you for the details and investigation to [~mbalakrishnan] and Andrew Miller



--
This message was sent by Atlassian Jira
(v8.3.4#803005)