Posted to dev@sqoop.apache.org by "Kunal Sharma (Jira)" <ji...@apache.org> on 2019/08/27 17:25:00 UTC

[jira] [Created] (SQOOP-3448) Pulling timestamp over year 2038/2039 and storing it to parquet file causes unix timestamp stored to be inaccurate.

Kunal Sharma created SQOOP-3448:
-----------------------------------

             Summary: Pulling timestamp over year 2038/2039 and storing it to parquet file causes unix timestamp stored to be inaccurate.
                 Key: SQOOP-3448
                 URL: https://issues.apache.org/jira/browse/SQOOP-3448
             Project: Sqoop
          Issue Type: Bug
    Affects Versions: 1.4.6
            Reporter: Kunal Sharma


Background:

We are pulling data from the source and storing it directly into a Parquet file. With the default Sqoop configuration, every timestamp/date value is converted during the sqoop process into a unix timestamp in *milliseconds* and stored in the Parquet file as data type long. (This is what we want, since it avoids the whole timezone issue Parquet files have across different data engines.)
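
For reference, here is a minimal sketch (plain JDK code, not the actual Sqoop code path) of the value we would expect to land in the Parquet file for a timestamp past 2038, i.e. epoch milliseconds held in a 64-bit long:

    import java.sql.Timestamp;

    public class ExpectedEpochMillis {
        public static void main(String[] args) {
            // A timestamp past the classic 2038 cutoff.
            Timestamp ts = Timestamp.valueOf("2040-01-01 00:00:00");

            // What we expect to be stored: epoch milliseconds as a 64-bit long.
            long epochMillis = ts.getTime();
            System.out.println(epochMillis); // ~2208988800000, exact value depends on the JVM timezone
        }
    }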

 

Issue:

When pulling timestamps past the classic year 2038 cutoff ([https://en.wikipedia.org/wiki/Year_2038_problem]) we get a negative number. This is strange, because the unix timestamp is stored in milliseconds, and milliseconds need to be stored as a big int. So somewhere in the process the value appears to be handled as an int or double, multiplied by 1000, and only then cast to the big int data type; that is the end result we see stored in the Parquet file as data type long (big int).
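
Here is a minimal sketch of the suspected failure mode (the exact Sqoop code path is our assumption; this only reproduces the arithmetic):

    public class OverflowSketch {
        public static void main(String[] args) {
            // Epoch seconds for 2040-01-01 00:00:00 UTC, past the 2038 cutoff.
            long epochSeconds = 2208988800L;

            // Correct conversion: stay in 64-bit arithmetic the whole way.
            long correctMillis = epochSeconds * 1000L;
            System.out.println(correctMillis);         // 2208988800000

            // Suspected faulty path: the seconds value is narrowed to a 32-bit int
            // somewhere before the * 1000, so it wraps negative; widening back to
            // long afterwards cannot recover the lost high bits.
            int truncatedSeconds = (int) epochSeconds; // wraps to -2085978496
            long brokenMillis = (long) truncatedSeconds * 1000L;
            System.out.println(brokenMillis);          // -2085978496000 (negative)
        }
    }

The negative long value we observe in the Parquet file is consistent with this kind of narrowing happening somewhere between the JDBC read and the Parquet write.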

 

Key Configuration:

Oracle jars - "HADOOP_CLASSPATH=ojdbc6.jar"

See the attached file for the sqoop command reference.


--
This message was sent by Atlassian Jira
(v8.3.2#803003)