Posted to dev@hive.apache.org by Anubhav Tarar <an...@knoldus.in> on 2018/03/01 07:31:00 UTC

How to Load Data From a CSV to a parquet table

Hi, I'm trying to load data from a CSV file into a Parquet table in Hive, but I got
this exception:

hive> create table if not exists REGION( R_NAME string, R_REGIONKEY string,
R_COMMENT string ) stored as parquet;
OK
Time taken: 0.414 seconds
hive> load data local inpath
'file:///home/anubhav/Downloads/dbgen/region.tbl' into table region;
Loading data to table default.region
OK
Time taken: 1.011 seconds
hive> select * from region;
OK
Failed with exception java.io.IOException:java.lang.RuntimeException:
hdfs://localhost:54311/user/hive/warehouse/region/region.tbl is not a
Parquet file. expected magic number at tail [80, 65, 82, 49] but found
[115, 108, 124, 10]
Time taken: 0.108 seconds

Can anyone help? The Hive version is 2.1.

-- 
Thanks and Regards

*   Anubhav Tarar     *


* Software Consultant*
      *Knoldus Software LLP <http://www.knoldus.com/home.knol>       *
       LinkedIn <http://in.linkedin.com/in/rahulforallp>     Twitter
<https://twitter.com/RahulKu71223673>    fb <ra...@facebook.com>
          mob : 8588915184

Re: How to Load Data From a CSV to a parquet table

Posted by Jörn Franke <jo...@gmail.com>.
You have defined a Parquet-only table, so Hive interprets your CSV file as Parquet. You can, for instance, define two tables:

* an external one for the CSV file
* one for the Parquet data

Afterwards, select from the first table and insert into the second.
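
A minimal sketch of the two-table approach, under a few assumptions. LOAD DATA only copies or moves the file into the table directory and does not convert it, which is why the read fails: the expected tail bytes [80, 65, 82, 49] spell "PAR1", while the bytes found, [115, 108, 124, 10], decode to "sl|" plus a newline. That suggests the file is pipe-delimited text. The staging table name region_csv and its LOCATION path are hypothetical, and the column order in the staging table assumes the standard TPC-H dbgen layout (key, name, comment), so check it against your file.

```sql
-- Staging table over the raw text file.
-- region_csv and the LOCATION path are hypothetical names; adjust as needed.
CREATE EXTERNAL TABLE IF NOT EXISTS region_csv (
  r_regionkey STRING,
  r_name      STRING,
  r_comment   STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'   -- assumed from the "|" seen at the end of the file
STORED AS TEXTFILE
LOCATION '/user/anubhav/staging/region_csv';

-- Copy the local file into the staging table's directory (no conversion happens here).
LOAD DATA LOCAL INPATH 'file:///home/anubhav/Downloads/dbgen/region.tbl'
INTO TABLE region_csv;

-- Rewrite the rows as Parquet. OVERWRITE also replaces the text file that was
-- loaded into the Parquet table earlier; columns are listed in the order the
-- REGION table declares them.
INSERT OVERWRITE TABLE region
SELECT r_name, r_regionkey, r_comment
FROM region_csv;

SELECT * FROM region;
```

Using INSERT OVERWRITE rather than INSERT INTO matters here, because the Parquet table's directory still contains region.tbl from the earlier load and any query would keep failing on it.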
