Posted to issues@orc.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2021/01/12 16:59:00 UTC

[jira] [Closed] (ORC-502) Hive ORC read INT, BIGINT as NULL for Data created by Spark

     [ https://issues.apache.org/jira/browse/ORC-502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun closed ORC-502.
-----------------------------
    Resolution: Duplicate

> Hive ORC read INT, BIGINT as NULL for Data created by Spark
> -----------------------------------------------------------
>
>                 Key: ORC-502
>                 URL: https://issues.apache.org/jira/browse/ORC-502
>             Project: ORC
>          Issue Type: Bug
>            Reporter: Oleksiy Sayankin
>            Priority: Major
>         Attachments: data.orc
>
>
> *Preconditions*
> Create the file {{ratings.csv}} and put it into HDFS at {{/user/test/rating/ratings.csv}}:
> {code}
> userId,movieId,rating,timestamp
> 1,2,4.5,1784325658
> {code}
> The corresponding {{data.orc}} file is attached.
> *Steps to reproduce:*
> 1. Using Spark (tested on versions 2.2.1 and 2.3.1), create a DataFrame ({{df}}) from the CSV file with {{inferSchema}} enabled:
> {code}
> val df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("/user/test/rating/ratings.csv")
> {code}
> 2. Save {{df}} in ORC format:
> {code}
> df.write.format("orc").save("/user/test/spark_rating_orc_typesafe")
> {code}
> 3. Using Hive 2.3, create a corresponding external table over the ORC output:
> {code}
> create external table rating_orc_hive_type_1(userId int, movieId int, rating double, `timestamp` int) stored as ORC location "/user/test/spark_rating_orc_typesafe/";
> {code}
> 4. Query the table:
> {code}
> select * from rating_orc_hive_type_1;
> {code}
> Only the {{rating}} and {{timestamp}} values are returned; {{userId}} and {{movieId}} come back as NULL, whether they are declared as INT or BIGINT.
> {code}
> OK
> NULL    NULL    4.5     1784325658
> {code}
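
The types declared in the Hive table appear to match what Spark infers from the sample row, which suggests the NULLs are not caused by a plain type mismatch. The Scala sketch below illustrates that comparison with a deliberately simplified model of numeric inference (try int, then long, then double). It is an assumption for illustration rather than Spark's actual CSV inference code, and {{InferSketch}} and {{inferType}} are hypothetical names.

```scala
import scala.util.Try

// Hypothetical, simplified model of CSV type inference (NOT Spark's real
// implementation): try Int, then Long, then Double, otherwise String.
object InferSketch {
  def inferType(field: String): String =
    if (Try(field.toInt).isSuccess) "int"
    else if (Try(field.toLong).isSuccess) "bigint"
    else if (Try(field.toDouble).isSuccess) "double"
    else "string"

  def main(args: Array[String]): Unit = {
    // Header and data row from ratings.csv in the report.
    val header = Seq("userId", "movieId", "rating", "timestamp")
    val row = Seq("1", "2", "4.5", "1784325658")
    // 1784325658 is below Int.MaxValue (2147483647), so it still infers as int.
    header.zip(row).foreach { case (name, value) =>
      println(s"$name: ${inferType(value)}")
    }
  }
}
```

Run as-is, it prints {{userId: int}}, {{movieId: int}}, {{rating: double}} and {{timestamp: int}}, which lines up with the {{int, int, double, int}} columns in the {{create external table}} statement.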

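The columns that come back NULL, {{userId}} and {{movieId}}, are also the only ones with upper-case letters in their names, while {{rating}} and {{timestamp}} are already lower case. Hive stores column names in lower case, so one plausible explanation, assumed here rather than confirmed by this notification, is a case-sensitive match between the column names in the Spark-written ORC file and the Hive table's columns. The hypothetical helper {{caseMismatches}} below lists the names such a match would miss:

```scala
import java.util.Locale

object OrcColumnCaseCheck {
  // Column names from the CSV header in the report (assumed to be written
  // unchanged into the ORC file schema by Spark).
  val sparkColumns: Seq[String] = Seq("userId", "movieId", "rating", "timestamp")

  // Hive keeps table column names in lower case.
  def hiveName(name: String): String = name.toLowerCase(Locale.ROOT)

  // Hypothetical helper: file columns whose name differs from the lower-cased
  // Hive name, i.e. those a case-sensitive lookup would fail to find.
  def caseMismatches(fileColumns: Seq[String]): Seq[String] =
    fileColumns.filter(c => c != hiveName(c))

  def main(args: Array[String]): Unit =
    println(caseMismatches(sparkColumns).mkString(", "))
}
```

It prints {{userId, movieId}}, the same two columns that read as NULL. If that assumption holds, one possible workaround is to lower-case the column names on the Spark side before writing, e.g. {{df.toDF(df.columns.map(_.toLowerCase): _*).write.format("orc").save("/user/test/spark_rating_orc_typesafe")}}; this has not been verified against this issue.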


--
This message was sent by Atlassian Jira
(v8.3.4#803005)